@ahltorp @thomasfuchs ...but at least right now, this has the major problem that you can't be sure it didn't also change something else.
(And we *know* that they don't currently work well for larger-scale system maintenance: their performance on the SWE-bench benchmark, where they're given actual GitHub issues from actual GitHub repos rather than LeetCode problems, is *abysmal*, a 0-4% success rate.)