Alexander S. Kunz

@leah I’ve put my site behind a free Cloudflare account with their web application firewall, and began to “challenge” all traffic from major hosting companies (AWS, Hetzner, Google, Microsoft, and more over time as I kept monitoring my logs). It took some fine-tuning to whitelist legitimate services, of course. Bandwidth usage has dropped by two-thirds now…

Crawlers such as “DataForSEOBot” or “FriendlyCrawler” and others still fly under pretty much everyone’s radar. It’s a huge mess.
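The ASN-based challenge rule described above can be sketched as a Cloudflare custom WAF rule expression. The ASNs below are illustrative picks for the providers mentioned, not the actual rule from the post:

```
(ip.geoip.asnum in {16509 24940 15169 8075})
```

Paired with the “Managed Challenge” action, this challenges traffic from AWS (AS16509), Hetzner (AS24940), Google (AS15169), and Microsoft (AS8075). Legitimate services arriving from those networks then need explicit skip/allow rules, which is the fine-tuning mentioned above.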

5 comments
mirabilos

@alexskunz @leah some ignore Crawl-Delay in robots.txt, too… grr… I 429 those with .htaccess.

Because, fuck Cloudflare.
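The .htaccess approach mirabilos describes might look roughly like this with mod_rewrite; the matched user agents are just the examples named earlier in the thread:

```apache
# Return 429 Too Many Requests to crawlers that ignore Crawl-Delay.
# Requires Apache 2.4+ (arbitrary status codes via the R flag).
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (DataForSEOBot|FriendlyCrawler) [NC]
RewriteRule ^ - [R=429,L]
```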

Alexander S. Kunz

@mirabilos I know that CF has a mixed reputation. From a small website owner's perspective, it's a great help to curb this madness.

(There's a lot more going on than just ignored crawl delays. Third parties that identify themselves as GoogleBot, for example, are super easy to block with CF — hard to do that with .htaccess.)
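Detecting spoofed GoogleBot traffic normally relies on Google's documented reverse-then-forward DNS check, which Cloudflare automates via its verified-bots list. A minimal sketch of that check in Python (function names are mine, not from the thread):

```python
import socket

def host_is_google(hostname: str) -> bool:
    # Genuine Googlebot reverse-DNS names end in googlebot.com or google.com.
    return hostname.endswith((".googlebot.com", ".google.com"))

def verify_googlebot(ip: str) -> bool:
    """Reverse-DNS the IP, check the domain, then forward-resolve the
    name and confirm it maps back to the same IP (double lookup)."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return False
    if not host_is_google(hostname):
        return False
    try:
        # Forward confirmation: the claimed hostname must resolve to the IP.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False
```

Anything sending a GoogleBot user agent that fails this check can be blocked outright.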

mirabilos

@alexskunz as a lynx user I have extra reasons to hate them, ofc

iolaire

@alexskunz @leah this sounds like good advice, and it also helps explain why I see that challenge a lot more these days (Comcast and cell IPs)

Alexander S. Kunz

@iolaire yes, it's a bit unfortunate. I try not to "challenge" dial-up/residential/cell traffic, but some combinations (outdated browser, etc.) might still trigger a challenge.

Cloudflare provides a "challenge solve rate" (CSR), and it's a little bit high for me today: 1.2% of the 751 requests challenged so far were solved. Nine slightly inconvenienced visitors is better than getting scraped to death, with resource warnings from my hosting company, etc.
