Email or username:

Password:

Forgot your password?
Top-level
Az

@raph Funny, I wrote a (naive) bitonic sort in #wgpu no later than on Tuesday! Two resources that helped me were the webgpu sample (in typescript) webgpu.github.io/webgpu-sample and a great blogpost by @tgfrerer poniesandlight.co.uk/reflect/b
Have fun, seeing those pixels getting sorted is really satisfying.

4 comments
Raph Levien

@Az @tgfrerer Very cool! My understanding is that this implements only a single workgroup, while one of my constraints is dealing with a large and potentially quite variable number of elements. Also, from what I'm seeing so far, bitonic is not a great match from this application, but at the end of the day you have to measure and compare.

Az

@raph The post optimizes with workgroup shared memory when possible but fallbacks on a global implementation if not possible, depending on the step size. In my case I'm sorting 1M floats (not necessarily a power of 2) with a few thousand workgroups.

Raph Levien

@Az Right, I've read the blog post more closely since last replying to you, and see pretty clearly how it works. Do you have any kind of performance numbers for that?

The other thing I'm looking at (based on another tip) is OneSweep, which is state of the art from Nvidia:

arxiv.org/pdf/2206.01784.pdf

Single pass might work here, even on WebGPU, because the value plus flags can fit in 32 bits.

Az

@raph I'm still very new to this so I won't be able to say much, and my implementation doesn't exploit locality yet. (which also helps diminishing dispatches). Still my rough implementation seems to take around 5ms for 1M elements on a 3080Ti. Take it with a grain of salt though, I'm sure it will do way better once I find the time to improve it (and my profiling isn't the most accurate).

Go Up