@dneto Ah, uVkCompute looks good, and I agree an analog of that for WebGPU would be great.

I've thought seriously about doing the prefix sum part of that (and dipped my toe into it in the piet-gpu days), and could possibly be cajoled if someone else would run the project.

Now I'm reading up on the sort literature, and it's a pretty deep rabbit hole. On CUDA, Onesweep looks very good, but I might be finding out that, for this algorithm, the gap between CUDA and WebGPU is a yawning chasm.