This can be appropriate as a tactical short-term hack, but for anyone reading who doesn't have a good sense for when to cut corners here, it isn't a great general-purpose solution. You're building assumptions about how your machines receive requests into your service logic instead of externalizing them to your load balancer, which isn't a good practice.
In practical terms this means that whenever the characteristics of your environment change, your rate limiting suddenly gets wonky. If you're doing a rolling deployment and take 1/3 of your machines out of service, or some machines go down for maintenance, or any number of other things happen to your fleet, you end up with fluctuating effective rate limits.
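To make the failure mode concrete, here's a rough sketch of the pattern in question. All the names and numbers (`GLOBAL_LIMIT`, `ASSUMED_FLEET_SIZE`, etc.) are made up for illustration, not anything from OP's setup:

```python
import time

GLOBAL_LIMIT = 900        # intended requests/sec across the whole fleet
ASSUMED_FLEET_SIZE = 9    # baked-in assumption about how many machines share traffic

class LocalRateLimiter:
    """Token bucket that only knows about this one machine."""

    def __init__(self, rate_per_sec):
        self.rate = rate_per_sec
        self.tokens = rate_per_sec
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for elapsed time, capped at one second's worth.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Each machine enforces 100 req/s. If a rolling deploy takes 3 of the 9
# machines out of rotation, the fleet now enforces 600 req/s, not 900:
# the "global" limit silently moves with fleet size.
limiter = LocalRateLimiter(GLOBAL_LIMIT / ASSUMED_FLEET_SIZE)
```

The usual fix is to keep the budget in one shared place (the load balancer, or a central store) so the fleet size stops mattering.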
It sounds like OP is running a relatively mature service, so this probably isn't the best idea for them.