@nixCraft
Blocking AI Scraper Bots
https://chriscoyier.net/2023/09/19/blocking-ai-scraper-bots/
By @chriscoyier
Blocking bots
https://ethanmarcotte.com/wrote/blockin-bots/
By @beep
@pixelriot @iamdtms @nixCraft @chriscoyier @beep
If I were an AI company, I would never use any user agent in this list.
@fay @iamdtms @nixCraft @chriscoyier @beep
We crawl very slowly and very politely, always respecting robots.txt. We have been doing so for years, well before LLMs. Yes, some companies have used our crawls for AI training, but we’re mainly a research crawl: our goal is to provide resources to researchers, to archive the web, and to actually increase the visibility of underrepresented parts of it.
@fay @iamdtms @nixCraft @chriscoyier @beep
There are also people who are starting to use our crawls to build indexes and alternative open web search engines, which I love. I don’t believe a handful of companies should be deciding what content people consume on the web.
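"Crawling slowly and respecting robots.txt" boils down to two checks before each request: is this path allowed for my user agent, and how long should I wait between fetches? A minimal sketch using Python's standard urllib.robotparser follows. It is not the crawler mentioned above. The robots.txt text, the agent name "ExampleResearchBot", the example.com URLs and the 5-second minimum delay are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served by a site (illustration only).
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def plan_crawl(robots_txt, user_agent, urls, min_delay=5.0):
    """Return (allowed_urls, delay_seconds) for a polite crawler.

    Assumption: the crawler skips disallowed paths and never waits less
    than the larger of the site's Crawl-delay and its own (hypothetical)
    minimum delay between requests.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = [url for url in urls if parser.can_fetch(user_agent, url)]
    site_delay = parser.crawl_delay(user_agent)  # None if the site sets none
    delay = max(float(site_delay or 0), min_delay)
    return allowed, delay


if __name__ == "__main__":
    urls = ["https://example.com/", "https://example.com/private/notes"]
    allowed, delay = plan_crawl(ROBOTS_TXT, "ExampleResearchBot", urls)
    print(allowed, delay)
```

Taking the maximum of the two delays lets the crawler go slower than a site asks, but never faster.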
@iamdtms @nixCraft @chriscoyier @beep
Here's a maintained and updated robots.txt from the author of that blog:
https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.txt
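A list like this works by naming crawlers' user-agent tokens. That means it only affects bots that identify themselves with those names, which is the concern raised earlier in the thread. The sketch below illustrates this with Python's urllib.robotparser. It uses a short excerpt written for illustration. GPTBot and CCBot are real crawler tokens, but this is not a copy of the linked file, and "RenamedScraper" is a hypothetical agent.

```python
from urllib.robotparser import RobotFileParser

# Illustrative excerpt in the style of a per-agent block list (not the linked file).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Check whether robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/blog/post"
    for agent in ("GPTBot", "CCBot", "RenamedScraper"):
        # Rules match on the name the client sends; honoring them is voluntary.
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

The two named agents are refused. An agent sending any other name falls through to the `User-agent: *` entry and is allowed.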