Bad and freeloading behaviour
AnthropicAI: https://www.reddit.com/r/linux/comments/1ceco4f/claude_ai_name_and_shame/ #linux #linuxmint #opensource. Please boost to shame them.
@nixCraft we blocked them at #nikonians some time ago since they are scraping like mad.

@f4grx Just checked the logs on my web server. I see they've used "ClaudeBot" and "claudebot" in my case [see the log-scan sketch after the thread].

@nixCraft Google did this to one forum I was a part of years ago - and their stubborn refusal to obey robots.txt's crawl-delay directive still angers me to this day. It's not just new crawlers that do this. Even the big old search engines have done it for ages.

@nixCraft I wrote my master's thesis on data poisoning LLMs, especially as a self-defense mechanism against bot scraping. The poisoning tool is pretty basic and shitty for now; I am aiming to release a better version in the summer.

@nixCraft The University of Chicago has released an amazing data-poisoning tool for images called Nightshade. Have a look, it's fantastic.

@luigirenna @nixCraft Yes, but we need something for website protection. Following you to get future info about your tool!

@nixCraft If these parasites had any shame they wouldn't be stealing others' work for their AI grift in the first place.

@nixCraft Interesting. I just decided that enough was enough, and since yesterday I have been redirecting (301 permanent) ClaudeBot to large files filled with random bytes elsewhere on the web [see the redirect sketch after the thread]. 🤣 Didn't know it was scraping for AI. But I'm sure the "info" they get out of that will be useful to them. 🤭

@nixCraft Maybe we should start prefixing AI companies with "parasitic" when talking about them on Reddit et al.? E.g. "parasitic AI company Anthropic bla bla bla".

@nixCraft "Blocking AI Scraper Bots" / "Blocking bots"

@iamdtms @nixCraft @chriscoyier @beep Here's a maintained and updated robots.txt from the author of that blog: https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.txt [see the robots.txt refresh sketch after the thread]

@pixelriot @iamdtms @nixCraft @chriscoyier @beep If I were an AI company I would never use any user agent on this list.

@fay @iamdtms @nixCraft @chriscoyier @beep We crawl very slowly and very politely, always respecting robots.txt. We have been doing so for years, way before LLMs. Yes, some companies have used our crawls for AI training, but we're mainly a research crawl; our goal is to provide resources to researchers, archive, and actually increase the visibility of underrepresented parts of the web.

@fay @iamdtms @nixCraft @chriscoyier @beep There are also people who are starting to use our crawls to build indexes and alternative open web search engines, which I love; I don't believe a handful of companies should be deciding the content that people consume on the web.

@nixCraft For anyone's reference, I've checked my logs and see user agents "ClaudeBot" and "claudebot" from the IPs below since December 2023 (block at your own risk; I don't know what else they may be used for). PS: they're all AWS IPs [see the AWS-range check after the thread].
3.84.110.120

@nixCraft It's a shame you can't return a random block of text instead of the page content.

@nixCraft Can't wait for there to be a new spinoff of Hoarders called "digital hoarders" lmao. This is the internet version of living off your mom at 30.

@nixCraft I'm not so sure I'd call it bad behavior. As far as I can tell their bot respects robots.txt, and looking at the one for the LM forums, they don't have any crawl delay set or any restrictions on indexing. So any new large crawler that finds the site for the first time would likely have the same effect: anything indexing the site that hasn't crawled it before will just keep branching without any delay. Google will nuke you like this too if you put a big dataset up and they index it without a delay.
@nixCraft damn, that blows. I really like using their bot over ChatGPT too.
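One reply above checks web-server logs for the "ClaudeBot" / "claudebot" user agents. Here is a minimal sketch of that check, assuming the common nginx/Apache "combined" log format (client IP as the first field, user agent in the last quoted field); LOG_PATH is a placeholder to adjust for your own server:

```python
#!/usr/bin/env python3
"""Count hits and source IPs for Claude's crawler in an access log."""
import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # assumption: adjust to your setup

# The thread reports both "ClaudeBot" and "claudebot", so match case-insensitively.
AGENT = re.compile(r"claudebot", re.IGNORECASE)

hits = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        # In the combined format the user agent appears in the last "..." field.
        if AGENT.search(line):
            ip = line.split(maxsplit=1)[0]  # client IP is the first field
            hits[ip] += 1

for ip, count in hits.most_common():
    print(f"{count:8d}  {ip}")
print(f"total: {sum(hits.values())} requests from {len(hits)} IPs")
```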
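Another reply 301-redirects ClaudeBot to large files of random bytes. In practice that rule would live in the web server's configuration; purely to illustrate the idea, here is a toy Python stdlib server doing the same, with DECOY_URL as a hypothetical stand-in for "a large file of random bytes elsewhere on the web":

```python
#!/usr/bin/env python3
"""Toy server that 301-redirects Claude's crawler to a decoy URL."""
from http.server import BaseHTTPRequestHandler, HTTPServer

DECOY_URL = "https://example.com/noise.bin"  # hypothetical decoy target

class Redirector(BaseHTTPRequestHandler):
    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if "claudebot" in agent.lower():
            # Permanent redirect, as described in the thread.
            self.send_response(301)
            self.send_header("Location", DECOY_URL)
            self.end_headers()
        else:
            # Everyone else gets the normal page.
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"hello, human\n")

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Redirector).serve_forever()
```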
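The thread also links the community-maintained AI-crawler robots.txt in the ai-robots-txt/ai.robots.txt repository. A small sketch that refreshes a site's robots.txt from that list follows; the raw-file URL is inferred from the repository path shared above, and ROBOTS_PATH and SITE_RULES are placeholders, so verify all three before putting this in cron:

```python
#!/usr/bin/env python3
"""Refresh a site's robots.txt from the maintained AI-crawler list."""
import urllib.request

LIST_URL = ("https://raw.githubusercontent.com/"
            "ai-robots-txt/ai.robots.txt/main/robots.txt")  # inferred raw URL
ROBOTS_PATH = "/var/www/html/robots.txt"                    # assumption

# Your own site-wide rules, appended after the AI-crawler blocks.
# Note: Crawl-delay is advisory; as a reply above says, Google ignores it.
SITE_RULES = """\
User-agent: *
Crawl-delay: 10
"""

with urllib.request.urlopen(LIST_URL, timeout=30) as resp:
    ai_rules = resp.read().decode("utf-8")

with open(ROBOTS_PATH, "w", encoding="utf-8") as out:
    out.write(ai_rules.rstrip() + "\n\n" + SITE_RULES)

print(f"wrote {ROBOTS_PATH} ({len(ai_rules)} bytes of AI-crawler rules)")
```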
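Finally, the IP report above notes the crawler addresses are all AWS. AWS publishes its address ranges as a JSON feed, so a sighting such as the 3.84.110.120 from the thread can be checked against those ranges before deciding to block it; a minimal sketch:

```python
#!/usr/bin/env python3
"""Check whether an IP falls inside AWS's published address ranges."""
import ipaddress
import json
import sys
import urllib.request

# AWS's documented feed of its current IP ranges.
RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

def aws_networks():
    with urllib.request.urlopen(RANGES_URL, timeout=30) as resp:
        data = json.load(resp)
    # IPv4 prefixes live under "prefixes" with an "ip_prefix" key each.
    return [ipaddress.ip_network(p["ip_prefix"]) for p in data["prefixes"]]

if __name__ == "__main__":
    ip = ipaddress.ip_address(sys.argv[1] if len(sys.argv) > 1 else "3.84.110.120")
    matches = [net for net in aws_networks() if ip in net]
    if matches:
        print(f"{ip} is in AWS range(s): {', '.join(map(str, matches))}")
    else:
        print(f"{ip} is not in AWS's published ranges")
```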