Raph Levien

I've been thinking quite a bit about a "Good Parallel Computer" which would overcome the worst limitations of GPUs, which I find increasingly frustrating. Basically, instead of launching compute shaders in an (x, y, z) cube, you have a programmable controller which launches workgroups (multiple kinds) when inputs are ready, enabling queues and other things.

I know of the GRAMPS paper, Vortex, and Tenstorrent. What other things are out there? Who should I be talking to?

Blog post before long.

8 comments
David Neto

@raph
How close to theoretical peak do you want to get? What shape of problem are you trying to optimize for?
Have you heard of Epic's Verse language project?

Raph Levien

@dneto Ideally very close to theoretical peak. Basically problems with "interesting" parallelism, including 2D rendering of course, but also sparse matrix multiplication, parsers, and so on. Yes, I've heard of Verse, but that's at a much higher level than what I'm talking about; Futhark and Taichi are probably closer to the mark as far as languages that might compile down to run on GPC (as of course would PyTorch and the MLIR ecosystem).

Raph Levien

@dneto I realize what I posted here isn't very clear about exactly what I'm going for. There's more detail of my current thinking in a Zulip thread: xi.zulipchat.com/#narrow/strea

David Neto

@raph
This made me think of the parallel kernels connected with real channels that Altera made about 10 years ago. It's in their OpenCL FPGA optimization guide. I don't recall whether the channel operations synchronized global memory writes; my hunch is they don't/didn't.

Raph Levien

@dneto I'm interested in prior art, so pointers are welcome (I'll look into this). There's also CUDA streams, which is maybe the closest existing thing, though I haven't yet carefully studied the alternatives in CUDA world.

David Neto

@raph
Intel presented some overviews of this at IWOCL.

E.g.

iwocl.org/wp-content/uploads/i

The CNN work was published more formally too.

Raph Levien

@aras Right, that's what basically started the exploration. I'm unhappy with them because of three limitations: no joins (yet, though that's planned), fixed size data only, and no ordering guarantees. That makes them pretty much unsuitable for Vello. The centerpiece of my idea is that the launch logic is programmable, so you can express joins and other things.
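
As a rough illustration of what "programmable launch logic" with a join could look like, here is a toy CPU-side model in Python. It is only a sketch under stated assumptions, not anything from the thread or from an existing API: the `Launcher` class, the `post` and `run` methods, and the "left"/"right" input names are all hypothetical. It shows a join (a workgroup launches only once both of its inputs have arrived), variable-size inputs, and launches issued in a well-defined order.

```python
from collections import defaultdict, deque


class Launcher:
    """Toy model of a programmable launch controller (hypothetical, CPU only)."""

    def __init__(self):
        self.pending = defaultdict(dict)  # key -> {input name: value}
        self.ready = deque()              # launched workgroups, in launch order

    def post(self, key, name, value):
        """Record one input; launch the workgroup once both inputs are present."""
        slot = self.pending[key]
        slot[name] = value
        # Join rule: a workgroup for `key` becomes ready only when
        # both its "left" and "right" inputs have been posted.
        if "left" in slot and "right" in slot:
            del self.pending[key]
            self.ready.append((key, slot["left"], slot["right"]))

    def run(self, kernel):
        """Run launched workgroups in FIFO order and collect their results."""
        results = []
        while self.ready:
            key, left, right = self.ready.popleft()
            results.append((key, kernel(left, right)))
        return results


def demo():
    launcher = Launcher()
    # Inputs arrive out of order and with different sizes.
    launcher.post(0, "left", [1, 2])
    launcher.post(1, "right", [30])
    launcher.post(1, "left", [10, 20])   # key 1 is complete: launches first
    launcher.post(0, "right", [3])       # key 0 is complete: launches second
    return launcher.run(lambda a, b: sum(a) + sum(b))
```

Calling `demo()` runs the workgroup for key 1 before the one for key 0, because key 1's inputs were completed first. A real design would have to do this in hardware or firmware next to the workgroup scheduler, which this model does not attempt to capture.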
