I've uploaded a video of my "I want a good parallel computer" talk: https://www.youtube.com/watch?v=c52ziyKOArc
I also have a half-finished blog post where I intended to go into more detail, but I'm posting this now because I hope it can provoke some interesting discussion. I'll put a little more context here, and I'm also happy to answer questions.
(1/4)
One of the main things I can't do with current graphics APIs is run a 2D renderer within bounded memory, at least not without a fence and readback to the CPU, which could tank performance. The underlying problem is solvable in bounded memory, but you need to be able to dynamically dispatch the various parts of the problem and use queues to connect the pieces, which compute shaders can't do. Work graphs, a recent development, do support queues and bounded memory, but...
(2/4)