@TomF I'd love to hear you expand on that, in any format you'd like. As I say, I have a half-finished blog post, and polishing that might be an opportunity to include your perspective, if nothing else by linking to something.
@TomF I will be happy to update and correct the talk. There *might* be a bit of an element of Cunningham's Law there. To respond, though: I don't think I need a coherent memory fabric; for the stuff I'm doing, I'm fairly happy using atomics to indicate explicit communication between workgroups. Interesting correction re texture queries. From my perspective I don't see a huge difference between "CISC instruction" and "send a packet", but from yours I can see it's pretty different.
@raph Yup - unless you literally need the machine to run an off-the-shelf OS with very few changes, you clearly want to be able to bypass the coherent fabric for all sorts of traffic. It burns a lot of power and limits your bandwidth. We had lots of plans for turning it off for certain areas of memory and making the traffic look more like a GPU's, but we never got the chance to implement those. Ah well.
@raph By the way, enjoying the whole discussion of your renderer because it sounds like all the same problems we had with the Larrabee renderer. We had a tile-based renderer, also for load-balancing reasons, and we also had problems with potentially massive intermediate buffers. Even though the cores were general x86 cores that could indeed call malloc() in the middle of things (though note there's no backing store if you run out!), the overhead of that is huge, so you try to avoid it.
@raph Thanks - happy to collaborate in any way you want - feel free to send me email or whatever.
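For readers following along, the "atomics to indicate explicit communication" idea mentioned above can be sketched on a CPU roughly like this. It is only an analogy, not code from the talk or from either renderer: two threads stand in for workgroups, and `Mailbox`, `produce`, `consume` and `run_handoff` are hypothetical names made up for the illustration. The data itself is an ordinary non-atomic write; only the flag is atomic, and the release/acquire pair is what makes the data visible to the reader.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical mailbox shared by two "workgroups" (modelled here as CPU threads).
struct Mailbox {
    int payload = 0;             // plain data, published via the flag
    std::atomic<int> ready{0};   // 0 = empty, 1 = payload is valid
};

// Producer: write the data first, then publish it with a release store.
void produce(Mailbox& box, int value) {
    box.payload = value;
    box.ready.store(1, std::memory_order_release);
}

// Consumer: spin until the flag is set (acquire), then read the data.
int consume(Mailbox& box) {
    while (box.ready.load(std::memory_order_acquire) == 0) {
        std::this_thread::yield();
    }
    return box.payload;
}

// Runs one hand-off and returns the value the consumer observed.
int run_handoff(int value) {
    Mailbox box;
    int seen = -1;
    std::thread consumer([&] { seen = consume(box); });
    std::thread producer([&] { produce(box, value); });
    producer.join();
    consumer.join();
    assert(box.ready.load() == 1);  // the flag was published exactly once
    return seen;
}
```

Calling `run_handoff(42)` should hand 42 from one thread to the other. On a real GPU the spin loop is the risky part: whether a waiting workgroup ever sees the flag depends on forward-progress guarantees, which this CPU sketch sidesteps.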
@raph Just watching the talk now. I very much disagree with the "AVX-512 was too power hungry" statement - that was not a factor.
There's a robust argument to be made that requiring a 100% coherent memory fabric took more power than the GPU's much weaker fabric. On the other hand, your whole lecture is kinda wishing they HAD that fabric, so... :-)
The real problem was it was 20 years too early. Ironically what it absolutely destroyed contemporary GPUs at was very short AA lines and splines.