@simon @hynek do you know Go enough to assess the tests?...

@simon @hynek do you know Go enough to assess the tests? I have had a number of contributors to a project use AI and often their tests pass but don't actually test the right thing.

Like 31 August at 14:36 | Wall-to-wall | Open on social.coop

4 comments

Simon Willison

@shauna @hynek I think I know enough about programming to assess the tests: I use tricks like changing the implementation, confirming the test breaks, then fixing the implementation and confirming the test passes

31 August at 14:40 | Open on fedi.simonwillison.net

Simon Willison

@shauna @hynek I have 20+ years of programming to rely on here though - I don’t think “shipping production code in a language you don’t know” is something that’s a great idea with a LOT of that existing experience

31 August at 14:41 | Open on fedi.simonwillison.net

Hynek Schlawack

@simon @shauna Yes, that’s a HUGE qualifier. Given how careers typically work in IT, I’m guessing that’s top 1 percentile.

31 August at 14:44 | Open on mastodon.social

Matthew Martin

@simon @shauna @hynek - only frontier models routinely find bugs with unit tests. 3.5 wrote vacuous tests in comparison to 4 or 4o
- once it fixed the bug via monkey patching before the test ran to make it pass (malicious compliance!)
- the bots write so many unit tests that after a while quantity becomes a quality all of its own & the value comes with the next change I make, I'll see how sensitive the rest of the app was to a change in any part of the app (which points out design flaws)

31 August at 14:45 | Open on mastodon.social