Dmabuf screencasting is crazy good. Here's a histogram of the screencasting overhead on my 2560×1600@165 screen: the median is 300 microseconds, and the worst across 12,669 frames was just below 1 ms. Most of that time is spent rendering the frame; perhaps something could even be optimized further in Smithay.
And yeah, if you look at the profiling timeline, I zoomed in so that almost the entire width is taken up by a single frame, which is 6.05 ms long. Most of it is completely empty!
Today in Wayland compositor profiling! Turns out closing a shm pool file descriptor can result in a fat stall of up to like 6 ms with the kernel waiting on some spinlocks. Which is extra fun when you realize it covers the entire frame budget of your 165 Hz screen, and some clients are sometimes doing it every frame!
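For a sense of scale, here's the frame-budget arithmetic as a tiny Rust sketch. The function names are made up for illustration; the only inputs are the refresh rate and the stall length mentioned above.

```rust
/// Time available per frame, in milliseconds, at the given refresh rate.
/// (Hypothetical helper, just for the arithmetic.)
pub fn frame_budget_ms(refresh_hz: f64) -> f64 {
    1000.0 / refresh_hz
}

/// Fraction of one frame's budget that a stall of `stall_ms` eats up.
pub fn budget_fraction(stall_ms: f64, refresh_hz: f64) -> f64 {
    stall_ms / frame_budget_ms(refresh_hz)
}
```

At 165 Hz the budget comes out to roughly 6.06 ms, so a 6 ms stall is about 99% of a frame, while the 300 microsecond screencasting median is about 5%.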
I'm trying a "dropping thread" workaround where the fd closing happens on a separate thread. It appears to work at first glance.
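Roughly, the idea could look like the sketch below. This is a minimal illustration using only the Rust standard library, not necessarily the shape of the actual patch; `DropThread` and `drop_later` are hypothetical names. It assumes the fd is held in something like `std::os::fd::OwnedFd`, whose `Drop` implementation closes the descriptor, so moving the value to another thread moves the `close()` there too.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical helper: a background thread whose only job is to drop
/// whatever values are sent to it.
pub struct DropThread<T: Send + 'static> {
    tx: Option<Sender<T>>,
    handle: Option<JoinHandle<()>>,
}

impl<T: Send + 'static> DropThread<T> {
    pub fn new() -> std::io::Result<Self> {
        let (tx, rx) = mpsc::channel::<T>();
        let handle = thread::Builder::new()
            .name("dropper".into())
            .spawn(move || {
                // The loop ends once every sender has been dropped.
                for value in rx {
                    // For an OwnedFd this is where close() would run,
                    // off the compositor's main thread.
                    drop(value);
                }
            })?;
        Ok(Self {
            tx: Some(tx),
            handle: Some(handle),
        })
    }

    /// Hands `value` to the background thread to be dropped there.
    pub fn drop_later(&self, value: T) {
        if let Some(tx) = &self.tx {
            // If the thread is gone, send() returns the value inside the
            // error, and discarding that error drops it on this thread.
            let _ = tx.send(value);
        }
    }
}

impl<T: Send + 'static> Drop for DropThread<T> {
    fn drop(&mut self) {
        // Close the channel so the loop exits, then wait for pending drops.
        self.tx.take();
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
    }
}
```

The main thread would then call something like `dropper.drop_later(fd)` instead of letting the fd go out of scope. The channel is unbounded, so the sending side never blocks; the trade-off is that fds stay open slightly longer than before.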