Martinus Hoevenaar

@leah I have added several lines to my .htaccess file and also adjusted the firewall on the server where my website is hosted. That's it.
Robots.txt is, first of all, not binding; it's an agreement with search-engine companies. So if they ignore it, they stay within the boundaries of the law, even though they're assholes.
I think it is better to do as I did, and if you have the possibility to go even further, like blocking via the OS of the server, you're good for now.
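Martinus doesn't share his exact rules, but a minimal .htaccess sketch along these lines could look like the following. The bot names are illustrative examples of known AI-crawler user agents, not his actual list; maintain your own.

```apache
# Return 403 Forbidden for requests whose User-Agent matches
# known AI crawlers. Bot names are examples only.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|CCBot|ClaudeBot|Bytespider) [NC]
RewriteRule .* - [F,L]
</IfModule>
```

Unlike robots.txt, this is enforced by the server, so it works against bots that ignore the polite conventions, as long as they send an honest user-agent string.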

Martinus Hoevenaar

@leah
I think web designers, coders, and hosting companies should work together to do something about this pest. It completely ruins the internet, consumes tonnes of resources, and so damages the climate even more than we already are.
We should take back the internet, along the lines of the fediverse.
Most mainstream social media and AI scrapers, if not all, are a virus: not just for the infrastructure and content of the internet, but also for society as a whole.

DJM (freelance for hire)

@martinus @leah Did the same:
- wall 0: ai.txt
- 1st wall: robots.txt
- 2nd wall: .htaccess
- 3rd wall: IP

All info in French here: didiermary.fr/bloquer-ai-bots-
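For reference, the robots.txt layer in that list is just a plain-text file at the site root. A minimal sketch, with example bot tokens (check each vendor's documented user-agent name for your own list):

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```

As noted above in the thread, this layer is purely advisory; the .htaccess and IP walls exist because some crawlers ignore it.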

Martinus Hoevenaar

@cybeardjm @leah How is your server load now? Mine sped up a lot, since there's less banging at the door. And I saw an increase in regular and unique visitors, who stay longer on pages.

Earth Notes

@martinus @leah I disagree. If you tell a remote entity, e.g. by registered mail to the CEO and board or similar, that it is forbidden from accessing your server at all for any reason ("withdrawing implied rights of access"), then in the UK at least, all continued access is unauthorised and illegal.

It is an approach that I have used a few times to keep out persistently badly behaving identifiable bots and spiders.

Martinus Hoevenaar

@EarthOrgUK @leah I don't approach any CEO, I just use the method described, and guess what? My web server statistics show me that scraping is, more or less, done. There is still some scraping, but those are either bots from individuals that are not mentioned in the .htaccess file or in the firewall, or brand-new bots that I wasn't aware of.
It's a continuous job, which, now that it works, is fun to do.
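Spotting those "brand-new bots" usually comes down to watching the access log. A rough sketch that ranks user agents by request count, assuming a combined-format Apache log (the default path is an assumption; adjust for your server):

```shell
#!/bin/sh
# Rank user agents by request count in a combined-format Apache log.
# Usage: ./top-agents.sh [/path/to/access.log]
log="${1:-/var/log/apache2/access.log}"

# In combined log format the user agent is the third quoted field,
# so field 6 when splitting the line on double quotes.
awk -F'"' '{ print $6 }' "$log" | sort | uniq -c | sort -rn | head -20
```

Anything near the top of that list that isn't a browser or a bot you already block is a candidate for the next .htaccess or firewall rule.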

Earth Notes

@martinus @leah Sorry, I wasn't being clear!

What I meant to point out is that at least within the UK you can make their action illegal by telling them (human-to-human) that their access to your servers is now unauthorised.

Usually, as you point out, the troublemaker goes away relatively soon anyway. But for persistent long-term repeat offenders it's a useful tool. I have used it with bad search spiders, SEO companies, and recently AI nonsense.

Martinus Hoevenaar

@EarthOrgUK @leah I live in Belgium, and what you mention is a very interesting route to take, or to add to the toolkit. I'm not sure what our legislation says on this subject, but I guess there won't be much. The governments here are quite apathetic about new technologies, even backwards.
This is an interesting method to study.
