@dneto I realize what I posted here isn't very clear about exactly what I'm going for. There's more detail on my current thinking in a Zulip thread: https://xi.zulipchat.com/#narrow/stream/197075-gpu/topic/Vello-like.20pipeline.20on.20parallel.20CPU

14 Sep 2023 at 3:03 | Open on mastodon.online

3 comments

David Neto

@raph
This made me think of the parallel kernels connected with real channels that Altera made about 10 years ago. It's in their OpenCL FPGA optimization guide. I don't recall whether the channel operations synchronized global memory writes; my hunch is that they don't/didn't.

14 Sep 2023 at 4:11 | Open on mastodon.gamedev.place

Raph Levien

@dneto I'm interested in prior art, so pointers are welcome (I'll look into this). There's also CUDA streams, which is maybe the closest existing thing, though I haven't yet carefully studied the alternatives in CUDA world.

14 Sep 2023 at 4:15 | Open on mastodon.online

David Neto

@raph
Intel presented some overviews of this at IWOCL.

E.g.

https://www.iwocl.org/wp-content/uploads/iwocl2017-andrew-ling-fpga-sdk.pdf

The CNN work was published more formally too.

14 Sep 2023 at 4:35 | Open on mastodon.gamedev.place