A bit of a follow-up to the barrier miscompilation issue I found. So far in developing piet-gpu I've found about 5 of these, in about 5k lines of GPU shader code. So I estimate that if you're writing this kind of intricate compute code, you'll likely run into roughly one serious miscompilation per 1,000 lines.
My goal is to get each of these into an appropriate test. A previous cycle is https://github.com/KhronosGroup/VK-GL-CTS/issues/295, which will also be featured in an ASPLOS paper.
The new one is:
@raph Blessed are the test makers.
Great find.
My first theory in the office-hours discussion was that the MSL compiler assumed the threads reconverge after the "if", and then concluded the barrier could be elided because it believed the subgroup was executing convergently, or in "lockstep".
Who knows what Apple will conclude, though.