I understand the reason behind the rate limits, but it would be great to have an endpoint that rate-limits without returning an error, simply holding the request until the next slot is available.
With an async server such as Nginx (as opposed to Apache's model), this could presumably be achieved without extra error handling on the application side, which matters especially when the API is used by many components, as in a multi-agent application.
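To illustrate what I mean by extra error handling: the usual client-side pattern is to catch 429s and retry with backoff, roughly like the sketch below (the endpoint URL is a placeholder and the retry count is arbitrary):

```python
import random
import time

import requests  # assumed HTTP client; any client that exposes status codes works

API_URL = "https://api.example.com/v1/endpoint"  # placeholder URL

def call_with_backoff(payload, max_retries=5):
    """Retry on 429, honoring Retry-After when present, otherwise exponential backoff."""
    for attempt in range(max_retries):
        resp = requests.post(API_URL, json=payload)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # 429: wait as long as the server asks, or back off exponentially with jitter
        retry_after = resp.headers.get("Retry-After")
        delay = float(retry_after) if retry_after else (2 ** attempt) + random.random()
        time.sleep(delay)
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```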
If there is a better way to handle this on the client side, please let me know.
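The best client-side alternative I have come up with so far is a shared throttle, so every component waits for a free slot instead of triggering a 429. A minimal sketch (the 60 requests per minute figure is just an assumed limit):

```python
import asyncio
import time

class Throttle:
    """Shared sliding-window limiter: at most `rate` calls per `period` seconds; callers wait."""

    def __init__(self, rate: int, period: float = 60.0):
        self.rate = rate
        self.period = period
        self._timestamps: list[float] = []
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        while True:
            async with self._lock:
                now = time.monotonic()
                # keep only the calls that are still inside the window
                self._timestamps = [t for t in self._timestamps if now - t < self.period]
                if len(self._timestamps) < self.rate:
                    self._timestamps.append(now)
                    return
                wait = self.period - (now - self._timestamps[0])
            await asyncio.sleep(wait)

# One instance shared by every agent/component in the application.
throttle = Throttle(rate=60, period=60.0)  # assumed limit: 60 requests per minute

async def agent_call(payload):
    await throttle.acquire()  # waits for a free slot instead of letting the API return 429
    # ... send the actual request here ...
```

The downside is that all components have to share the same limiter state, which gets awkward across multiple processes, which is why a server-side option would be nicer.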
Regarding speed, most of these calls are not performance-critical, but I suspect the rejected 429 calls also create extra processing and logging overhead on your side.