
I'm not an expert on website hosting, but after reading some of the blog posts on Anubis, those people were truly at their wits' end trying to block AI scrapers with techniques like the ones you imply.

https://xeiaso.net/blog/2025/anubis/ links to https://pod.geraspora.de/posts/17342163 which says:

> If you try to rate-limit them, they'll just switch to other IPs all the time. If you try to block them by User Agent string, they'll just switch to a non-bot UA string (no, really). This is literally a DDoS on the entire internet.

My gut is that the switching between IP addresses can't be that hard to follow, that the access pattern is pretty obvious to track across identities.

But it would be non-trivial: it would entail crafting new systems and doing new work per request (when traffic starts to be elevated, as a first gate).
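
As a minimal sketch of what that per-request work could look like: count requests per behavioral fingerprint (User-Agent, header names, Accept-Language) instead of per IP, so a crawler that rotates addresses still accumulates into one bucket. The choice of traits and the threshold here are hypothetical, just an illustration of the idea, not anything Anubis actually does.

    package main

    import (
        "crypto/sha256"
        "fmt"
        "net/http"
        "sort"
        "strings"
        "sync"
    )

    var (
        mu     sync.Mutex
        counts = map[string]int{}
    )

    // fingerprint hashes traits that tend to stay constant while IPs rotate:
    // the User-Agent, the set of header names, and the Accept-Language value.
    // (Hypothetical trait selection; a real deployment would tune this.)
    func fingerprint(r *http.Request) string {
        names := make([]string, 0, len(r.Header))
        for k := range r.Header {
            names = append(names, k)
        }
        sort.Strings(names)
        h := sha256.Sum256([]byte(r.UserAgent() + "|" +
            strings.Join(names, ",") + "|" + r.Header.Get("Accept-Language")))
        return fmt.Sprintf("%x", h[:8])
    }

    func handler(w http.ResponseWriter, r *http.Request) {
        fp := fingerprint(r)
        mu.Lock()
        counts[fp]++
        n := counts[fp]
        mu.Unlock()
        if n > 1000 { // arbitrary threshold, for illustration only
            http.Error(w, "slow down", http.StatusTooManyRequests)
            return
        }
        fmt.Fprintln(w, "ok")
    }

    func main() {
        http.HandleFunc("/", handler)
        http.ListenAndServe(":8080", nil)
    }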

Just making the client run through some math gauntlet is an obvious win that aggressors probably can't break. But I still think there's probably some really good low-hanging fruit for identifying and rate limiting even these rather more annoying traffic patterns, that the behavior itself leaves a fingerprint that can't be hidden and which can absolutely be rate limited. And I'd like to see that area explored.
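
For context, the "math gauntlet" is typically a proof-of-work challenge: the client has to find a nonce whose hash of the server's challenge has a given number of leading zero bits, which is expensive to find and cheap to verify. A rough sketch of that idea (not Anubis's actual scheme; the encoding and difficulty here are made up):

    package main

    import (
        "crypto/sha256"
        "encoding/binary"
        "fmt"
        "math/bits"
    )

    // verify checks that sha256(challenge || nonce) starts with at least
    // `difficulty` zero bits -- the property the client has to satisfy.
    func verify(challenge string, nonce uint64, difficulty int) bool {
        buf := make([]byte, 8)
        binary.BigEndian.PutUint64(buf, nonce)
        sum := sha256.Sum256(append([]byte(challenge), buf...))
        zeros := 0
        for _, b := range sum {
            z := bits.LeadingZeros8(b)
            zeros += z
            if z < 8 {
                break
            }
        }
        return zeros >= difficulty
    }

    // solve brute-forces a nonce; this loop is the cost the client pays.
    func solve(challenge string, difficulty int) uint64 {
        for n := uint64(0); ; n++ {
            if verify(challenge, n, difficulty) {
                return n
            }
        }
    }

    func main() {
        nonce := solve("example-challenge", 16) // ~65k hashes on average
        fmt.Println("nonce:", nonce, "valid:", verify("example-challenge", nonce, 16))
    }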

Edit: oh heck yes, a new submission with 1.7 TB of logs of what AI crawlers do. Now we can machine-learn some better rate limiting techniques! https://news.ycombinator.com/item?id=44450352 https://huggingface.co/datasets/lee101/webfiddle-internet-ra...


This isn't as helpful as you think. If it included all of the HTTP headers that the bots sent and other metadata like TLS ClientHelloInfo it would be a lot more useful.
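
For anyone collecting their own logs, Go's crypto/tls exposes that handshake metadata through GetConfigForClient, which receives a ClientHelloInfo per connection. A rough sketch (the log format and cert paths are placeholders):

    package main

    import (
        "crypto/tls"
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        srv := &http.Server{
            Addr: ":8443",
            TLSConfig: &tls.Config{
                // Runs for every handshake and sees the raw ClientHello
                // metadata -- the signal the dataset above is missing.
                GetConfigForClient: func(hi *tls.ClientHelloInfo) (*tls.Config, error) {
                    log.Printf("sni=%q ciphers=%v curves=%v alpn=%v versions=%v",
                        hi.ServerName, hi.CipherSuites, hi.SupportedCurves,
                        hi.SupportedProtos, hi.SupportedVersions)
                    return nil, nil // nil keeps the server's default config
                },
            },
            Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
                fmt.Fprintln(w, "ok")
            }),
        }
        // cert.pem / key.pem are placeholder paths for illustration.
        log.Fatal(srv.ListenAndServeTLS("cert.pem", "key.pem"))
    }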

There are headers, but I hadn't noticed that they are the response headers. :facepalm:


