@shauna@hynek I think I know enough about programming to assess the tests: I use tricks like changing the implementation, confirming the test breaks, then fixing the implementation and confirming the test passes
@shauna@hynek I have 20+ years of programming to rely on here though - I don’t think “shipping production code in a language you don’t know” is something that’s a great idea with a LOT of that existing experience
@simon@shauna@hynek - only frontier models routinely find bugs with unit tests. 3.5 wrote vacuous tests in comparison to 4 or 4o
- once it fixed the bug via monkey patching before the test ran to make it pass (malicious compliance!)
- the bots write so many unit tests that after a while quantity becomes a quality all of its own & the value comes with the next change I make, I'll see how sensitive the rest of the app was to a change in any part of the app (which points out design flaws)
@shauna @hynek I have 20+ years of programming to rely on here though - I don’t think “shipping production code in a language you don’t know” is something that’s a great idea with a LOT of that existing experience