Bill Mill

@simon …as long as nobody invents any new languages or techniques

Even if they keep training new models, will they be able to overcome their own poisoning of the well with AI slop?

It feels to me like we’re in a temporary awakening before the world’s greatest corpus of language is ruined

Simon Willison

@llimllib I don’t believe in the “model collapse” idea personally; AI labs have been deliberately training models on “synthetic data” for the last 12 months, with increasingly impressive results

How quickly models can pick up new tech is definitely an interesting question. I’ve been pasting dozens of pages of documentation directly into them with good results, e.g. this example: gist.github.com/simonw/97e29b8
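
In code terms that technique is just long-context prompting: concatenate the documentation and put it in front of the question. Here is a minimal sketch using the OpenAI Python SDK; the docs/ directory, model name, and question are hypothetical stand-ins, and any client with a large enough context window would work the same way:

from pathlib import Path
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Concatenate every documentation page we want the model to read.
# "docs/" is a hypothetical directory of markdown files.
docs = "\n\n".join(p.read_text() for p in sorted(Path("docs").glob("*.md")))

response = client.chat.completions.create(
    model="gpt-4o",  # any model with a large enough context window
    messages=[
        # The pasted documentation rides along as system context...
        {"role": "system",
         "content": "Answer using only the documentation below.\n\n" + docs},
        # ...and the actual question is a normal user message.
        {"role": "user",
         "content": "Show an example that uses the new streaming API."},
    ],
)
print(response.choices[0].message.content)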

Simon Willison

@llimllib the idea of “model collapse” is almost irresistible, because it’s a story of LLMs being brought down by their dual sins of polluting the web and then training on unverified and unlicensed scraped data

If AI labs continued to train indiscriminately, it might be a problem, but those researchers are smarter than that: their whole game is about sourcing (and often deliberately generating) high-quality training data

Bill Mill

@simon that's pretty dystopian: the only source of consistently un-slopped data is locked up in the AI companies' vaults; the rest of us make do with the crap that's on the web
