@alyssa I'm intensely curious about the Apple Silicon GPU, and you're probably the #1 expert on it who can speak publicly. First question: if you're implementing Vulkan, does the hardware support barriers with device scope? Metal doesn't expose this.
I'm also very curious whether the hardware can dynamically spawn work, or do other dynamic things (malloc!), from within a dispatch.