Even with a huge corpus of data, LLMs are useless, and here's why:
They generate text that looks equivalent to a researched and reviewed paper, but isn't.
They take bits and pieces from across the entire set of articles and stitch them together into something functionally meaningless that looks acceptable at a casual glance.
And I mean individual words! Sentence fragments! Syllables!
They don't know ANYTHING. But they give the illusion of doing so.
@theogrin @kkolakowski @alyssa LLMs don't necessarily need to generate text from scratch.
There are signs of promise for LLMs that avoid hallucination by paraphrasing and permuting source material instead.
I recommend checking out perplexity.ai
LLMs are also quite helpful for automation; the base training data just gets the relations right in the first place, and then constraints, output checks, low temperature, and human validation can vet the results (rough sketch below).
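A minimal sketch of what that vetting loop can look like, assuming a generic `llm_generate` call (hypothetical; it stands in for whatever provider you actually use): low temperature to reduce randomness, a JSON-shape constraint as the automated check, retries when the check fails, and escalation to a human when the retries run out.

```python
import json

# Hypothetical stand-in for any LLM client call; `temperature` is the
# usual sampling knob, where low values make output more deterministic.
def llm_generate(prompt: str, temperature: float = 0.2) -> str:
    raise NotImplementedError("wire up your LLM provider here")

# The constraint: the JSON keys we require the output to contain.
REQUIRED_KEYS = {"title", "summary"}

def validated_generate(prompt: str, max_retries: int = 3) -> dict:
    """Generate, check the output against the constraint, retry on failure.
    Anything that still fails gets escalated for human validation."""
    for _ in range(max_retries):
        raw = llm_generate(prompt, temperature=0.2)
        try:
            data = json.loads(raw)           # check 1: parses as JSON
        except json.JSONDecodeError:
            continue                         # malformed output, retry
        if REQUIRED_KEYS <= data.keys():     # check 2: required fields present
            return data
    # Automated checks exhausted: hand off to a human instead of guessing.
    raise ValueError(
        f"Output failed validation after {max_retries} tries; "
        f"needs human review: {prompt!r}"
    )
```

The design point is that the model never gets the last word: anything it emits either passes a mechanical check or lands in front of a person.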