Raph Levien

One of the main things I can't do in the current graphics API is run a 2D renderer within bounded memory, at least without a fence and readback to the CPU, which could tank performance. The underlying algorithm can do it, but you need to be able to dynamically dispatch the various stages and use queues to connect the pieces, which compute shaders can't do. The recent development of work graphs can do queues and bounded memory, but...

(2/4)

Raph Levien

...can't sustain the ordering guarantees you need for correct 2D rendering.

I'm more bullish on work graphs now, especially after attending HPG (highperformancegraphics.org/20) and seeing the two work graph talks there. However, they need more baking; the current version is still pretty limited.

(3/4)

Raph Levien

Another thing missing from this talk is a deeper look at the Cell architecture. I think that's every bit as relevant as Larrabee. A great introduction is Copetti's site (copetti.org/writings/consoles/).

(4/4)

Tom Forsyth

@raph Well now - I may have some very strong views on this! The mantra I constantly yelled at people when developing Larrabee was "don't build the Cell". I think we very much succeeded in that goal.

Raph Levien

@TomF I'd love to hear you expand on that, in any format you'd like. As I say, I have a half-finished blog post, and polishing that might be an opportunity to include your perspective, if nothing else by linking to something.

Tom Forsyth

@raph Just watching the talk now. I very much disagree with the "AVX-512 was too power hungry" statement - that was not a factor.

There's a robust argument to be made that requiring a 100% coherent memory fabric took more power than the GPU's much weaker fabric. On the other hand, your whole lecture is kinda wishing they HAD that fabric, so... :-)

The real problem was that it was 20 years too early. Ironically, the one thing it absolutely destroyed contemporary GPUs at was very short AA lines and splines.

Raph Levien

@TomF I will be happy to update and correct the talk. There *might* be a bit of an element of Cunningham's Law there.

To respond though, I don't think I need a coherent memory fabric; for the stuff I'm doing, I'm fairly happy using atomics to indicate explicit communication between workgroups.

Interesting correction re texture queries. From my perspective I don't see a huge difference between "CISC instruction" and "send a packet" but from yours I can see it's pretty different.
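(The "atomics instead of a coherent fabric" handoff Raph mentions can be sketched on the CPU. This is not code from his renderer - the names `BoundedQueue` and `stream_sum` are invented for illustration - but it shows the shape of the idea: one thread plays the producer workgroup, another the consumer, and a fixed-size ring buffer stands in for a bounded GPU allocation, with acquire/release atomics as the only synchronization.)

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Fixed-capacity single-producer/single-consumer queue built only on
/// atomics. The bounded ring buffer stands in for a fixed GPU buffer
/// allocation; no locks, no dynamic growth.
struct BoundedQueue {
    slots: Vec<AtomicU32>,
    head: AtomicUsize, // next slot the consumer will read
    tail: AtomicUsize, // next slot the producer will write
}

impl BoundedQueue {
    fn new(capacity: usize) -> Self {
        BoundedQueue {
            slots: (0..capacity).map(|_| AtomicU32::new(0)).collect(),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Returns false when full: memory is bounded, so the producer
    /// must back off rather than allocate.
    fn push(&self, v: u32) -> bool {
        let tail = self.tail.load(Ordering::Relaxed);
        if tail - self.head.load(Ordering::Acquire) == self.slots.len() {
            return false; // full
        }
        self.slots[tail % self.slots.len()].store(v, Ordering::Relaxed);
        // Release-publish the slot so the consumer's Acquire load sees it.
        self.tail.store(tail + 1, Ordering::Release);
        true
    }

    fn pop(&self) -> Option<u32> {
        let head = self.head.load(Ordering::Relaxed);
        if head == self.tail.load(Ordering::Acquire) {
            return None; // empty
        }
        let v = self.slots[head % self.slots.len()].load(Ordering::Relaxed);
        // Release the slot back to the producer for reuse.
        self.head.store(head + 1, Ordering::Release);
        Some(v)
    }
}

/// Stream `n` items through a queue of the given capacity and return
/// their sum on the consumer side.
fn stream_sum(n: u32, capacity: usize) -> u32 {
    let q = Arc::new(BoundedQueue::new(capacity));
    let producer = {
        let q = Arc::clone(&q);
        thread::spawn(move || {
            for i in 0..n {
                while !q.push(i) {
                    thread::yield_now(); // bounded: wait, don't grow
                }
            }
        })
    };
    let (mut sum, mut received) = (0, 0);
    while received < n {
        if let Some(v) = q.pop() {
            sum += v;
            received += 1;
        } else {
            thread::yield_now();
        }
    }
    producer.join().unwrap();
    sum
}

fn main() {
    // 0 + 1 + ... + 99 = 4950, moved through an 8-slot buffer.
    println!("{}", stream_sum(100, 8));
}
```

On a GPU the picture differs in important ways - forward progress between workgroups is not guaranteed the way it is between OS threads, which is part of why work graphs are interesting - but the bounded-buffer-plus-atomics structure is the same.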

Tom Forsyth

@raph Yup - unless you literally need the machine to run an off-the-shelf OS with very few changes, you clearly want to be able to bypass the coherent fabric for all sorts of traffic. It burns a lot of power and limits your bandwidth.

We had lots of plans for turning it off for certain areas of memory and making the traffic look more like a GPU's, but we never got the chance to implement those. Ah well.

Tom Forsyth

@raph By the way, enjoying the whole discussion of your renderer because it sounds like all the same problems we had with the Larrabee renderer.

We had a tile-based renderer, also for load-balancing reasons, and we also had problems with potentially massive intermediate buffers. Even though the cores were general x86 cores that could indeed call malloc() in the middle of things (though note there's no backing store if you run out!), the overhead of that is huge, so you try to avoid it.

Tom Forsyth

@raph Thanks - happy to collaborate in any way you want - feel free to send me email or whatever.
