Did you see this? The whole thing with "the stack".
https://post.lurk.org/@emenel/112111014479288871
Some jerks mass-scraped open source projects into a collection called "the stack", which they specifically recommend other people use as machine learning sources. If you look at their "GitHub opt-out repository" you'll find just page after page of people asking to have their stuff removed:
https://github.com/bigcode-project/opt-out-v2/issues
(1/2)
…but wait! If you look at what they actually did (correct me if I'm wrong), they aren't doing any machine learning in the "stack" repo itself. The "stack" just collects zillions of repos in one place. Mirroring my content as part of a corpus of open source software, torrenting it, putting it on microfilm in a seedbank: that's the kind of thing I want to encourage. The problem is that they then *suggest* people create derivative works of those repos in contravention of their licenses. (2/2)