Leah (Cloudstylistin)

I would go so far as to say that for small sites, 30-50% of traffic is now artificial bot traffic, and 20-75% for bigger sites with a lot of content. It's a huge waste of resources that they outsource to all of us, in terms of both money and environmental damage.
And this is part of the much bigger footprint all this AI shit has, on top of the irresponsibly large amounts of resources they officially waste.

32 comments
morihofi

@leah For me, an individual who runs a small private website, this is true. 50% - 60% of the traffic on my website comes from bots in an average month.

tsia

@morihofi @leah I would like to compare my request logs. How are you measuring this? Some sort of user agent list?
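
One way to measure this with a user agent list, as a minimal sketch: it assumes the access log is in the combined log format (user agent as the last quoted field), and the marker list `BOT_MARKERS` is a made-up, incomplete example rather than any maintained list. Bots that fake a browser user agent are not caught, so the result is a lower bound.

```python
import re

# Hypothetical, incomplete list of substrings found in crawler user agents.
BOT_MARKERS = ("bot", "crawler", "spider", "scrapy", "python-requests")

# Assumption: combined log format, where the user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def is_bot(user_agent: str) -> bool:
    """Return True if the user agent contains one of the known markers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def bot_share(log_lines) -> float:
    """Fraction of parseable requests whose user agent looks like a bot."""
    total = 0
    bots = 0
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if match is None:
            continue  # skip lines that do not end in a quoted field
        total += 1
        if is_bot(match.group(1)):
            bots += 1
    return bots / total if total else 0.0
```

Calling `bot_share(open("access.log"))` on a real log (the file name is a placeholder) gives the share by number of requests, not by bytes transferred.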

Darrin West

@morihofi @leah @tsia_ We might crowd source detection by sharing (trimmed) logs somewhere. Seeing IPs and ids repeated across zillions of websites is a pretty good fingerprint of a bad actor. If all that were automated, the results could be auto added to block lists.
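
The cross-site fingerprint idea could be sketched roughly like this. It assumes each participating site contributes only the set of client IPs it has seen; the function name `shared_offenders` and the threshold `min_sites` are hypothetical, and the sharing and trimming of logs is left out entirely.

```python
from collections import defaultdict


def shared_offenders(site_logs, min_sites=3):
    """Return the IPs that appear in the logs of at least `min_sites` sites.

    `site_logs` maps a site name to an iterable of client IPs seen there.
    """
    sites_per_ip = defaultdict(set)
    for site, ips in site_logs.items():
        for ip in ips:
            sites_per_ip[ip].add(site)
    return sorted(
        ip for ip, sites in sites_per_ip.items() if len(sites) >= min_sites
    )
```

A real service would also need to handle shared addresses (carrier NAT, VPN exits), since one IP showing up on many sites is not always a single bad actor.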

Leah (Cloudstylistin)

@obviousdwest @morihofi @tsia_ there are a lot of providers offering such services, and I would prefer to fix the source of the problem and not the symptoms.

morihofi

@leah @obviousdwest @tsia_ I just discovered in my Matomo analytics that some traffic came from clients with a referer of xtraffic.plus. Some sort of traffic generator for SEO optimisation (how should that even work?)

tsia

@morihofi @leah @obviousdwest (knowing some of those SEO people I wouldn’t be surprised if it didn’t work at all)

Werawelt

@morihofi @leah

Maybe: couldn't you reduce the number of hits per minute and block the IP for a certain period of time? In WP, for example, you can do this with Wordfence
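
The general technique (cap hits per time window, then block the IP for a while) can be sketched as follows. This is not how Wordfence implements it; the class name and the default limits are assumptions chosen for illustration.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Per-IP sliding-window limiter with a temporary block (illustrative only)."""

    def __init__(self, max_hits=60, window=60.0, block_for=600.0):
        self.max_hits = max_hits    # allowed requests per window
        self.window = window        # window length in seconds
        self.block_for = block_for  # block duration in seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self.blocked_until = {}         # ip -> time at which the block ends

    def allow(self, ip, now):
        """Return True if a request from `ip` at time `now` should be served."""
        until = self.blocked_until.get(ip)
        if until is not None:
            if now < until:
                return False
            del self.blocked_until[ip]  # block has expired
        recent = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        recent.append(now)
        if len(recent) > self.max_hits:
            self.blocked_until[ip] = now + self.block_for
            recent.clear()
            return False
        return True
```

The limit has to stay well above what one page view generates (HTML, CSS, fonts, images, JS), otherwise normal visitors get blocked too.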

Leah (Cloudstylistin)

@werawelt @morihofi that's only fixing the symptoms, not the problem.

maybit

@leah @werawelt @morihofi
agree also, but fixing the problem, while it is something I'd like to see happen or even contribute to, is not something actionable which would relieve the anxiety* I feel reading about this

* light anxiety level, no need for a CW, I'm just expressing my feelings

mirabilos

@werawelt @morihofi @leah nah, typical visitors do bursts (page, CSS, fonts, images, js), so you’d hurt them too much

.oO(maybe one could just block all “cloud” services, since they don’t list their customers’ actual ranges in WHOIS and change them too often anyway… if enough people did this, maybe we could get good IP subnet to customer mapping in WHOIS…)

Earth Notes

@morihofi @leah >50% of my traffic has been non-human for a long time, though that has included search engine spiders etc. I have a target of keeping them to < 50%. See top line in table.

earth.org.uk/note-on-site-tech

The AI bots are a huge slice added to that.

Leah (Cloudstylistin)

In my opinion we should strongly regulate the AI stuff and just forbid most of it, like it's forbidden to run a very dirty coal plant too. For everything else we do it a little like in the pharmaceutical sector. You have to prove that your shiny new product is better/a _real_ benefit compared to the old one. I would bet that only very few very specialized use cases would be the result. But this would require an evaluation that is not based on capitalist logic and this just won't happen.

Nicole Parsons

@leah

Hybrid warfare & cyberwarfare have a new ally in the AI hype.

It's unsurprising that LLM & AI development is being funded by investors from hostile state actors like Saudi Arabia's Mohammed bin Salman, Iran, Russia, & China.

The scams, election interference, & climate denial from AI are huge.
reuters.com/technology/artific

It was never intended to be a legitimate business product, except for deluded CEOs who were promised mass layoffs & wage suppression schemes.
futurism.com/investors-concern

Nicolai Hähnle

@leah It ought to be possible to sue people who disregard robots.txt under some kind of computer-related crime law.

Mathaetaes

@leah I wonder if there wouldn’t be a market for a DDoS protection-like service that detects heavy traffic from a single IP and starts returning garbage data.

Ignore robots.txt with your ML scraper, get poisoned data in your ML scraper.

Seems like a fair outcome to me.

T1gerlilly

@leah I work for a large software firm with huge amounts of online content - which the AI firms have publicly said they trained on. We've since indicated through our various mechanisms they are not allowed to scrape content... But they very clearly are. We just moved to g4 analytics and started removing bot traffic from our metrics. It was nearly 70% of our traffic, so your numbers are dead on.

T1gerlilly

@leah And here's the thing, they're scraping and surfacing content that's our IP. They're literally stealing from us and redirecting traffic from our business. This is definitely criminal conduct, even if we don't have laws that mark it as such.

bookandswordblog

@T1gerlilly @leah I had a similar experience with my Wordpress site hosted in Canada and my static site hosted in Austria

OddOpinions5

@leah
probably stupid question from a non programmer
can't you crowd source a reverse attack ?

Rachel Rawlings

@leah At minimum, it should follow the European model that says do what thou wilt until there's significant evidence of harm. We have all that and more with AI (also fossil fuels).

panther

@leah agreed. It should be regulated with rules and quality check as for fake news, images and stuff

Florian Lohoff

@leah I deployed config snippets, available for inclusion, that block OpenAI, Claude and some others by user agent, as they really misbehave. Pasting random stuff into search boxes etc.

Just a single line in the vhost and they get the HTTP code for Removed.

Leah (Cloudstylistin)

Ok that went viral. I have two comments regarding the most frequent replies.

1. Yes, Cloudflare is a short-term solution, but the real solution can't be that we all need to book services at company A to protect against company B. The result would be a huge monopoly battle.

2. Stop suggesting to poison them with huge amounts of fake data. That would only further increase the waste of resources on all sites.

hukl

@leah Without knowing all the details myself, do you think HAProxy IP based rate limiting could have a chance for mitigation? IIRC the evaluation window for the rate limiting can be quite short.

pitch R.

@leah the only positive thing in that Cloudflare argument: at least the next cold war would be partially federated...

I don't get why people are so harshly opposed to legislation or even say it won't work. Especially with regard to electronic systems.
GDPR showed that Europe is a relevant enough market to set rules.

And it's mostly the same people claiming that a 2% subsidy cut will drive hundreds of thousands of jobs away. 🙄

💫💖 Haecksen-Maya 💖💫

@leah Idk seems to me like it's a war of attrition, so poisoning their data could very well work?

Leah (Cloudstylistin)

@MayaMitKind and just hoping that something will change because some little sites started poisoning? That's a bold bet. That's like having no car and hoping this will change the car industry.

rappet

@leah what is traffic in this case? Does the bot load the whole page with pictures and everything?

Leah (Cloudstylistin)

@rappet that's different depending on the case and bot. See traffic as number of requests if it helps.
