Codeberg.org

You might have guessed it already: We are struggling with excessive crawling today. We have - again - blocked several large IP ranges, but were not yet able to identify the new actor.

We are working on restoring service availability and fine-tuning our rate-limiting.

If someone is interested in implementing improved native rate-limiting in #Forgejo that also protects other instances from abusive crawlers, please reach out 😉

14 comments
Rev. Roger BW 😷

@Codeberg We block all Azure now. Users whine but if they want good service they shouldn't be using an address in the bad part of Internet Town.

Felipe M.

@Codeberg Just wondering what you folks use in front of forgejo. I experience abusive crawling as well, but my instance is a small personal one on my homelab, so it's really annoying to be losing bandwidth to abusive actors. Considering any self-hostable WAF in front of my homelab services.

Codeberg.org

@fmartingr We're using haproxy and have a custom blacklist loaded here: codeberg.org/Codeberg-Infrastr

It's not public (yet), but we should probably consider opening it. We would need to check that it only contains publicly known IP addresses, though. I'm not fully up to date with how the law treats publishing IP ranges of bad actors. ~f

Felipe M.

@Codeberg I was considering trying something like CrowdSec, but I'm unsure how they handle the bad IP ranges and what they consider "bad actors". If we could have something like that but with community-maintained lists, like adblockers do, it could be nice. Will take a look :blobfoxeyes: Thanks and hope you resolve the issue soon!

Codeberg.org

Does "Aceville" ring bells for anyone by chance? Related to tencent probably? We are blocking one IP range after another ...

Daniel Böhmer

@Codeberg I won’t be able to provide an implementation but for better understanding: How does rate limiting work now and what kind of improvement would be helpful in your current scenario?

Codeberg.org

@dboehmer One of the primary constraints of the current rate-limiting is that there is only a global counter that increases for each request.

So a user watching Forgejo Actions logs scroll through will fire a lot of small requests. And a botnet that is distributed over many, many IP addresses does not trigger the rate-limiting at all, because each server only fires a few requests.
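
To make the distributed-botnet problem concrete, here is a minimal sketch of one possible direction: counting requests per network prefix instead of per address, so many addresses from the same range share one budget. This is not how Forgejo or Codeberg do it. The class name `PrefixRateLimiter`, the /24 grouping, and the limits are all assumptions for illustration, and the sketch only handles IPv4 and never prunes old counters.

```python
import ipaddress
from collections import defaultdict


class PrefixRateLimiter:
    """Fixed-window request counter keyed by network prefix (hypothetical helper)."""

    def __init__(self, limit, window_seconds, prefix_len=24):
        self.limit = limit              # allowed requests per prefix per window
        self.window = window_seconds    # window length in seconds
        self.prefix_len = prefix_len    # assumption: group IPv4 clients by /24
        self.counts = defaultdict(int)  # (network, window index) -> request count

    def _key(self, ip, now):
        # strict=False masks the host bits, so 203.0.113.7 maps to 203.0.113.0/24.
        network = ipaddress.ip_network(f"{ip}/{self.prefix_len}", strict=False)
        return (network, int(now // self.window))

    def allow(self, ip, now):
        """Count one request and report whether the prefix is still within its limit."""
        key = self._key(ip, now)
        self.counts[key] += 1
        return self.counts[key] <= self.limit


# Ten different addresses from one range, one request each, in the same window.
limiter = PrefixRateLimiter(limit=5, window_seconds=60)
results = [limiter.allow(f"203.0.113.{i}", now=0.0) for i in range(10)]
```

No single address sends more than one request here, yet the range as a whole exhausts its budget after five. The trade-off is that legitimate users who share a range with a crawler are throttled along with it.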

Harald

@Codeberg @dboehmer Is there an issue for this, to allow a more focused discussion?

Marcus Rohrmoser 🌻

@Codeberg as an emergency measure I'd probably block non-authenticated HTTP (except signup + login) altogether.

Simon

@Codeberg I know how you feel... Same on gitnet.fr.

if ($http_user_agent ~* "facebookexternalhit|bytespider|Amazonbot|ClaudeBot|AhrefsBot") { return 429; }

Rachel Rawlings

@Codeberg Any chance you put some honeypot paths in robots.txt that trigger fail2ban against any requestors?
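
A rough sketch of the honeypot idea: list a path under `Disallow` in robots.txt that no well-behaved client would request, then flag every address that requests it anyway. The trap path `/trap-do-not-crawl/`, the function name `offenders`, and the log layout (common/combined log format) are assumptions for illustration. The actual banning step, for example through fail2ban, is left out.

```python
import re

# Hypothetical trap path, assumed to be listed under Disallow in robots.txt.
TRAP_PATHS = ("/trap-do-not-crawl/",)

# Matches the start of a common/combined log format line:
# client IP, two ignored fields, [timestamp], then "METHOD path
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<path>\S+)'
)


def offenders(log_lines):
    """Return the set of client IPs that requested any trap path."""
    hits = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").startswith(TRAP_PATHS):
            hits.add(match.group("ip"))
    return hits


sample = [
    '203.0.113.9 - - [10/Oct/2024:13:55:36 +0000] "GET /trap-do-not-crawl/a HTTP/1.1" 200 12',
    '198.51.100.4 - - [10/Oct/2024:13:55:37 +0000] "GET /user/repo HTTP/1.1" 200 512',
]
flagged = offenders(sample)
```

Only clients that ignore robots.txt end up in the result, which keeps false positives low, but a crawler that rotates addresses will only lose the one address that touched the trap.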

Torsten Grote

@Codeberg
Note that @fdroidorg can't build apps hosted on codeberg anymore due to this. Its buildserver clones the repo for each app and soon gets 429.

Codeberg.org

@grote
This is interesting feedback. There have been no recent changes to the rate-limiting, and the last two changes over the past three months were both increases.

We have blocked several offending IP ranges. Is there information about which hosting providers Fdroid uses?
@fdroidorg

Hans-Christoph Steiner

@Codeberg @grote @fdroidorg Where our production buildservers are located is not public information, and it is not necessarily static. But I imagine it would be easy to figure out which IP addresses they use by looking at the logs on the Codeberg side. We haven't been blocked before by any other git/scm hoster, to my knowledge.
