There's a page on one project's GitHub Wiki with a lot of images (font specimens) that will get you flagged for abusing GitHub's infrastructure and hit with a rate limit:
2 years ago, I realised we were accidentally DoS'ing GitHub and they didn't care. I was impressed.
We had software for checking a piece of data. To do this (we realised later), it fetched a schema from GitHub every time. The context is that this software was used infrequently, so that was fine - and there were also good reasons why it should reload the schema every time.
We then took this bit of software and dumbly used it in a tool for checking masses of data at once. We did this quite happily for 2 months or so, until one day I tried to work out why it ran so slowly (it could only check 2 or 3 pieces of data a second).
At this point I realised/discovered all of the above, went "oh shit", and quickly slapped in a request-caching library. We stopped DoS'ing GitHub and the number of checks we could do per second went right up.
But I'm pretty sure that at no point did GitHub rate limit us or block us during this - which I was very impressed by.
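For the curious, the fix amounted to memoising the schema fetch. A minimal TypeScript sketch of the idea (the tool's actual language, URLs and caching library aren't mentioned above, so treat this as purely illustrative):

    // Cache schema fetches in memory so repeated checks reuse one HTTP
    // request instead of hitting GitHub every single time. Illustrative only.
    const schemaCache = new Map<string, Promise<unknown>>();

    async function fetchSchema(url: string): Promise<unknown> {
      let pending = schemaCache.get(url);
      if (!pending) {
        // First request for this URL: fetch it and remember the promise,
        // so concurrent callers share the same in-flight request.
        pending = fetch(url).then((res) => {
          if (!res.ok) throw new Error(`Schema fetch failed: ${res.status}`);
          return res.json();
        });
        schemaCache.set(url, pending);
      }
      return pending;
    }

With something like that in place, only the first check pays for the network round trip; everything after it reads from memory.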
I'm curious if it's a legitimate abuse flag or more like a rate limit with some unfortunate phrasing in the humanized message. Like, am I just getting blocked at the edge for a bit, or would there be a flag on my account somewhere now?
Yeah, it's probably a very short-lived rate limit. The other night I was searching a repo for how it returns its version information in the code, and I got the error shown in the screenshot. I then clicked Back in my browser, clicked another link, and it was fine. /shrug
It makes sense; loading that wiki page performs almost 1700 requests for GitHub resources. That's probably far outside regular GitHub usage and definitely enough to trigger a regular user's rate limit.
In Chrome you can view the source here: "view-source:https[DELETE]://github.com/olikraus/u8g2/wiki/fntlistall" (added the [DELETE] after the protocol to prevent people from accidentally loading the actual website rather than source)
It's just a page with a lot of small images (all directly loaded from the repo) and a short description for each. After loading half of the images GitHub throttled me and blocked requests with an error message about abuse.
After 2 minutes I was able to access GitHub again but I imagine they will block you for longer if you try to do it again and again.
Could GitHub mitigate the load on their servers by serving these files through a caching layer? Obviously the first request would still have to hit the origin, but there's no reason subsequent requests should cause problems.
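To make that concrete: assuming the images are addressed by a fixed revision, they're effectively immutable and could carry long-lived cache headers, so a CDN or the browser serves repeats without touching the origin. A hypothetical Node/TypeScript sketch (not GitHub's actual setup):

    import { createServer } from "node:http";

    // Hypothetical origin behind a CDN: files addressed by a fixed revision
    // never change, so downstream caches can hold them for a long time.
    createServer((_req, res) => {
      res.setHeader("Cache-Control", "public, max-age=31536000, immutable");
      res.setHeader("Content-Type", "image/png");
      res.end("...image bytes would be streamed here...");
    }).listen(3000);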
I remember that back in the old days, when a page had many, many images, we used to call it a "56k killer", as it would take ages to load on a 33k or 56k modem.
HN’s rate limiting can be pretty aggressive. I’ve hit it seemingly “just” clicking an upvote immediately after submitting a response (a common pattern as I often only remember to reward a good comment in the voting system after I’ve given it verbal praise).
Just tried from a residential connection on a MacBook. Only about a quarter of the images loaded, with the rest returning a 429 error. Browsing to other pages afterwards showed me the "Access has been restricted" warning.
Simply lazy loading the images would save them a huge amount of bandwidth. This could be implemented in 10 lines of JS, plus a few lines of backend code to rewrite the raw HTML so the images are lazy loaded - a rough sketch is below.
If you work at GitHub, do this in a few hours and be the hero :)
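Something along these lines would do it, assuming the backend rewrite puts the real URL in a data-src attribute (and note that modern browsers can do most of this with a plain loading="lazy" attribute, no script needed):

    // Load each image only when it scrolls near the viewport.
    // Assumes the server-side rewrite emitted <img data-src="..."> tags.
    const observer = new IntersectionObserver((entries) => {
      for (const entry of entries) {
        if (!entry.isIntersecting) continue;
        const img = entry.target as HTMLImageElement;
        if (img.dataset.src) img.src = img.dataset.src; // swap in the real URL
        observer.unobserve(img); // each image only needs loading once
      }
    });

    document
      .querySelectorAll<HTMLImageElement>("img[data-src]")
      .forEach((img) => observer.observe(img));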