@llimllib the idea of “model collapse” is almost irresistible, because it’s a story of LLMs being brought down by their dual sins of polluting the web and then training on unverified and unlicensed scraped data
If AI labs continued to train indiscriminately, it might be a problem, but those researchers are smarter than that: their whole game is sourcing (and often deliberately generating) high-quality training data
@simon that's pretty dystopian: the only source of consistently un-slopped data is locked up in the AI companies' vaults; the rest of us make do with the crap that's on the web