@dequbed Turns out it was just using a silly strategy for parallelising vertex shader invocations, which in this case translated to 'don't parallelise', so I was paying the overhead of spinning up threads for nothing.
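
For anyone curious what that kind of failure can look like, here's a rough sketch in Rust. It is not the actual code. The fixed-chunk-size splitting and the names `chunk_count` and `shade_vertices` are assumptions made up for illustration. The idea: if work is split into chunks of a fixed size and the vertex count is below that size, there's exactly one chunk, so handing it to a worker thread buys nothing. The sketch guards against that by running inline in the single-chunk case.

```rust
use std::thread;

/// Hypothetical work splitter: how many chunks of `chunk_size` vertices
/// are needed to cover `vertex_count` vertices.
fn chunk_count(vertex_count: usize, chunk_size: usize) -> usize {
    // Ceiling division; chunk_size is clamped to 1 to avoid dividing by zero.
    let chunk_size = chunk_size.max(1);
    (vertex_count + chunk_size - 1) / chunk_size
}

/// Applies `shade` (a stand-in for a vertex shader) to every vertex.
/// Threads are only spawned when there is more than one chunk of work.
fn shade_vertices<F>(vertices: &[f32], chunk_size: usize, shade: F) -> Vec<f32>
where
    F: Fn(f32) -> f32 + Sync,
{
    if chunk_count(vertices.len(), chunk_size) <= 1 {
        // Zero or one chunk: a worker thread would add overhead without
        // adding any parallelism, so do the work on the calling thread.
        return vertices.iter().map(|&v| shade(v)).collect();
    }

    let shade = &shade;
    thread::scope(|s| {
        // Spawn one scoped thread per chunk, collecting the handles first
        // so that all threads are started before any is joined.
        let handles: Vec<_> = vertices
            .chunks(chunk_size.max(1))
            .map(|chunk| s.spawn(move || chunk.iter().map(|&v| shade(v)).collect::<Vec<f32>>()))
            .collect();

        // Join in spawn order so the output keeps the input ordering.
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect()
    })
}
```

With a chunk size of 256, a 100-vertex mesh yields a single chunk and takes the inline path, while a 1000-vertex mesh yields four chunks and is spread across four threads.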