@raph Just watching the talk now. I very much disagree with the "AVX-512 was too power hungry" statement; that was not a factor.
There's a robust argument to be made that requiring a 100% coherent memory fabric took more power than the GPU's much weaker fabric. On the other hand, your whole lecture is kinda wishing they HAD that fabric, so... :-)
The real problem was that it was 20 years too early. Ironically, where it absolutely destroyed contemporary GPUs was at very short AA lines and splines.
@TomF I will be happy to update and correct the talk. There *might* be a bit of an element of Cunningham's Law there.
To respond, though: I don't think I need a coherent memory fabric. For the stuff I'm doing, I'm fairly happy using atomics to indicate explicit communication between workgroups.
Interesting correction re texture queries. From my perspective I don't see a huge difference between "CISC instruction" and "send a packet", but from yours I can see it's pretty different.