@raph
How close to theoretical peak do you want to get? What shape of problem are you trying to optimize for?
Have you heard of Epic's Verse language project?
@dneto I realize what I posted here isn't very clear about exactly what I'm going for. There's more detail on my current thinking in a Zulip thread: https://xi.zulipchat.com/#narrow/stream/197075-gpu/topic/Vello-like.20pipeline.20on.20parallel.20CPU
@raph @dneto I'm interested in prior art, so pointers are welcome (I'll look into this). There's also CUDA streams, which is maybe the closest existing thing, though I haven't yet carefully studied the alternatives in the CUDA world.
@raph E.g. https://www.iwocl.org/wp-content/uploads/iwocl2017-andrew-ling-fpga-sdk.pdf The CNN work was published more formally too.
@dneto Ideally very close to theoretical peak. Basically, problems with "interesting" parallelism, including 2D rendering of course, but also sparse matrix multiplication, parsers, and so on. Yes, I've heard of Verse, but that's at a much higher level than what I'm talking about; Futhark and Taichi are probably closer to the mark as languages that might compile down to run on GPC (as of course would PyTorch and the MLIR ecosystem).