Your's truly has a quote in the San Jose Mercury News in article:
"New artificial intelligence: Will Silicon Valley ride again to riches on other people’s products?"
about the unauthorized use of web content for training of #AI models. In the interview, I pointed out to the writer that in the #Fediverse we have the same problem with indexing and archiving permissions, and the proposed solution is marking up posts with metadata. But that didn't make it into the article.
@J12t
Where do you draw the not-OK line in this sequence?
* Human reading websites to learn a subject.
* Hand-creating a website that summarizes and comments on other websites that cover a particular subject.
* Automating creation of an index of websites covering a particular subject.
* Automating creation of an index of all websites [search engine].
* Implementing summary extraction and advanced semantic matching on a search engine.
* Training an LLM on all websites.