This can be appropriate as a tactical short-term hack, but for anyone reading who doesn't have a good sense for when to cut corners here, it isn't a great general-purpose solution. You're building assumptions about how your machines receive requests into your service logic instead of externalizing them to your load balancer, which isn't a good practice.
In practical terms this means that whenever the characteristics of your environment change, your rate limiting suddenly gets wonky. If you're doing a rolling deployment and take 1/3 of your machines out of service, or some machines go down for maintenance, or any number of other things happen to your fleet, you end up with fluctuating effective rate limits.
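To make the failure mode concrete, here's a rough sketch of the pattern in question. All the names and numbers (`GLOBAL_LIMIT`, `ASSUMED_FLEET_SIZE`, etc.) are made up for illustration, not anything from OP's setup:

```python
import time

GLOBAL_LIMIT = 900        # intended requests/sec across the whole fleet
ASSUMED_FLEET_SIZE = 9    # baked-in assumption about how many machines share traffic

class LocalRateLimiter:
    """Token bucket that only knows about this one machine."""

    def __init__(self, rate_per_sec):
        self.rate = rate_per_sec
        self.tokens = rate_per_sec
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for elapsed time, capped at one second's worth.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Each machine enforces 100 req/s. If a rolling deploy takes 3 of the 9
# machines out of rotation, the fleet now enforces 600 req/s, not 900:
# the "global" limit silently moves with fleet size.
limiter = LocalRateLimiter(GLOBAL_LIMIT / ASSUMED_FLEET_SIZE)
```

The usual fix is to keep the budget in one shared place (the load balancer, or a central store) so the fleet size stops mattering.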
It sounds like OP is running a relatively mature service, so this probably isn't the best idea for them.