David Neto

@raph
How close to theoretical peak do you want to get? What shape of problem are you trying to optimize for?
Have you heard of Epic's Verse language project?

Raph Levien

@dneto Ideally very close to theoretical peak. Basically problems with "interesting" parallelism, including 2D rendering of course, but also sparse matrix multiplication, parsers, and so on. Yes, I've heard of Verse, but that's at a much higher level than what I'm talking about; Futhark and Taichi are probably closer to the mark as far as languages that might compile down to run on GPC (as of course would PyTorch and the MLIR ecosystem).

Raph Levien

@dneto I realize what I posted here isn't very clear about exactly what I'm going for. There's more detail of my current thinking in a Zulip thread: xi.zulipchat.com/#narrow/strea

David Neto

@raph
This made me think of the parallel kernels connected with real channels that Altera made about 10 years ago. It's in their OpenCL FPGA optimization guide. I don't recall whether the channel operations synchronized global memory writes; my hunch is they don't/didn't.

Raph Levien

@dneto I'm interested in prior art, so pointers are welcome (I'll look into this). There's also CUDA streams, which is maybe the closest existing thing, though I haven't yet carefully studied the alternatives in CUDA world.

David Neto

@raph
Intel presented some overviews of this at IWOCL.

E.g.

iwocl.org/wp-content/uploads/i

The CNN work was published more formally too.
