Shauna GM

@simon @hynek do you know Go well enough to assess the tests? I have had a number of contributors to a project use AI, and often their tests pass but don't actually test the right thing.

Simon Willison

@shauna @hynek I think I know enough about programming to assess the tests: I use tricks like changing the implementation, confirming the test breaks, then fixing the implementation and confirming the test passes
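A minimal Go sketch of that trick, under stated assumptions: `clamp`, `brokenClamp`, `vacuousTest`, `realTest`, and `catchesBreakage` are hypothetical names made up for illustration and do not come from any project mentioned in the thread. The idea is that a test only earns trust if it passes against the real implementation and fails against a deliberately broken one.

```go
package main

import "fmt"

// clampFn is the shape of the (hypothetical) function under test.
type clampFn func(x, lo, hi int) int

// clamp is the intended implementation: it limits x to the range [lo, hi].
func clamp(x, lo, hi int) int {
	if x < lo {
		return lo
	}
	if x > hi {
		return hi
	}
	return x
}

// brokenClamp is a deliberately sabotaged copy: the upper bound is ignored.
func brokenClamp(x, lo, hi int) int {
	if x < lo {
		return lo
	}
	return x
}

// vacuousTest passes, but only exercises an input that never reaches
// either bound, so it cannot tell the two implementations apart.
func vacuousTest(f clampFn) bool {
	return f(5, 0, 10) == 5
}

// realTest checks a value below, inside, and above the range.
func realTest(f clampFn) bool {
	return f(-3, 0, 10) == 0 && f(5, 0, 10) == 5 && f(42, 0, 10) == 10
}

// catchesBreakage reports whether a test passes on the good implementation
// and fails on the broken one, which is the manual check described above.
func catchesBreakage(test func(clampFn) bool, good, broken clampFn) bool {
	return test(good) && !test(broken)
}

func main() {
	fmt.Println("vacuous test catches the break:", catchesBreakage(vacuousTest, clamp, brokenClamp))
	fmt.Println("real test catches the break:", catchesBreakage(realTest, clamp, brokenClamp))
}
```

Here the vacuous test is green for both versions, so it reports `false`, while the real test reports `true`. In an actual Go project the same check would more likely be done by hand: edit the function, run `go test`, confirm a failure, then revert the edit.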

Simon Willison

@shauna @hynek I have 20+ years of programming to rely on here though - I don’t think “shipping production code in a language you don’t know” is something that’s a great idea without a LOT of that existing experience

Hynek Schlawack

@simon @shauna Yes, that’s a HUGE qualifier. Given how careers typically work in IT, I’m guessing that’s the top 1 percent.

🥥Matthew Martin🥥☑

@simon @shauna @hynek
- Only frontier models routinely find bugs with unit tests. 3.5 wrote vacuous tests in comparison to 4 or 4o.
- Once it "fixed" the bug by monkey patching before the test ran to make it pass (malicious compliance!).
- The bots write so many unit tests that after a while quantity becomes a quality all of its own. The value comes with the next change I make: I'll see how sensitive the rest of the app is to a change in any one part (which points out design flaws).
