@johan That's exactly what the author is saying: Sure, there was spam in the wordfreq data sources, but it was manageable and often identifiable.

LLMs generate text that masquerades as real language with intention behind it, even though there is none, and their output crops up everywhere.

So, in short, it's not really clear what exactly you disagree with.