@raph How close to theoretical peak do you want to...

@raph
How close to theoretical peak do you want to get? What shape of problem are you trying to optimize for?
Have you heard of Epic's Verse language project?

14 Sep 2023 at 2:56 | Open on mastodon.gamedev.place

5 comments

Raph Levien

@dneto Ideally very close to theoretical peak. Basically problems with "interesting" parallelism including 2D rendering of course but also including sparse matrix multiplication, parsers, and so on. Yes, but that's at a much higher level than what I'm talking about; Futhark and Taichi are probably closer to the mark as far as languages that might compile down to run on GPC (as of course would be PyTorch and the MLIR ecosystem).

14 Sep 2023 at 2:59 | Open on mastodon.online

Raph Levien

@dneto I realize what I posted here isn't very clear about exactly what I'm going for. There's more detail on my current thinking in a Zulip thread: https://xi.zulipchat.com/#narrow/stream/197075-gpu/topic/Vello-like.20pipeline.20on.20parallel.20CPU

14 Sep 2023 at 3:03 | Open on mastodon.online

David Neto

@raph
This made me think of the parallel kernels connected with real channels that Altera made about 10 years ago. It's in their OpenCL FPGA optimization guide. I don't recall whether the channel operations synchronized global memory writes; my hunch is they don't/didn't.

14 Sep 2023 at 4:11 | Open on mastodon.gamedev.place

Raph Levien

@dneto I'm interested in prior art, so pointers are welcome (I'll look into this). There's also CUDA streams, which is maybe the closest existing thing, though I haven't yet carefully studied the alternatives in CUDA world.

14 Sep 2023 at 4:15 | Open on mastodon.online

David Neto

@raph
Intel presented some overviews of this at IWOCL.

E.g.

https://www.iwocl.org/wp-content/uploads/iwocl2017-andrew-ling-fpga-sdk.pdf

The CNN work was published more formally too.

14 Sep 2023 at 4:35 | Open on mastodon.gamedev.place
