I already implemented a rate limiter in my code, but now it fails after seven calls per minute.
A lower limit is no problem for me, but it would be good to know whether I need to adjust my code to seven calls per minute.
@hello1 Yes, for the free tier. The free tier limits can change dynamically based on demand. We do have a pay-as-you-go tier that you can subscribe to until we have the true developer tier ready.
-Coby
Thanks, Coby,
As I said, the limits themselves are no problem, but it would be great if an API returned the current limits so that the client calling the model could adapt to them.
For example, a JSON response such as:
rate_limits = {
    "sambanova_llama31_8b": 30,
    "sambanova_llama31_70b": 20,
    "sambanova_llama31_405b": 7,
    "sambanova_llama32_1b": 30,
    "sambanova_llama32_3b": 30,
    "sambanova_llama32_11b": 10,
    "sambanova_llama32_90b": 1
}
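If such a mapping were available, a client could drive its limiter from it. The following is only a minimal sketch under that assumption: the class name PerModelRateLimiter, its methods, and the 60-second sliding window are hypothetical, the numbers are the ones from the example above, and nothing here reflects an existing SambaNova API.

```python
import time
from collections import deque

# Example limits (calls per minute), copied from the mapping above.
# These are illustrative values, not limits guaranteed by the service.
RATE_LIMITS = {
    "sambanova_llama31_8b": 30,
    "sambanova_llama31_70b": 20,
    "sambanova_llama31_405b": 7,
}


class PerModelRateLimiter:
    """Hypothetical sliding-window limiter keyed by model name."""

    def __init__(self, limits, window_seconds=60.0, clock=time.monotonic):
        self.limits = dict(limits)      # model -> allowed calls per window (>= 1)
        self.window = window_seconds
        self.clock = clock              # injectable so the class can be tested
        self.calls = {model: deque() for model in self.limits}

    def update_limits(self, new_limits):
        """Apply new limits, e.g. after the service reports changed values."""
        self.limits.update(new_limits)
        for model in new_limits:
            self.calls.setdefault(model, deque())

    def wait_time(self, model):
        """Return the seconds to wait before the next call is allowed."""
        now = self.clock()
        history = self.calls[model]
        # Drop timestamps that have left the window.
        while history and now - history[0] >= self.window:
            history.popleft()
        limit = self.limits[model]
        if len(history) < limit:
            return 0.0
        # Wait until enough old calls expire to get back under the limit.
        blocking = history[len(history) - limit]
        return self.window - (now - blocking)

    def record(self, model):
        """Register that a call to the model was just made."""
        self.calls[model].append(self.clock())
```

A caller would sleep for wait_time(model) seconds, make the request, and then call record(model). If the limits change, update_limits lets the same limiter pick up the new values without a restart.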
Another solution, though I know it requires more development on your side, would be to use asynchronous runs instead of returning 429 errors. That would minimize the resources used by open sessions and would adapt to the required rate limits, so no special workaround would be needed on the client side. For now, I have written custom error handling around the calls. I will switch to a paid model once my development is ready for a business case.
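For anyone needing the same client-side workaround, the error handling around the calls can be sketched roughly as follows. This is an illustration, not my exact code: call_with_retry and the send callable are hypothetical names, and it is an assumption that the server might send a Retry-After header with a 429 response. When the header is absent, the sketch falls back to exponential backoff.

```python
import time


def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry a request while the server answers 429 (Too Many Requests).

    `send` is a hypothetical callable that performs one request and
    returns a (status_code, headers_dict, body) tuple.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # Assumption: the server may send Retry-After in seconds.
        # The lookup is case-sensitive because headers is a plain dict here.
        try:
            delay = float(headers.get("Retry-After"))
        except (TypeError, ValueError):
            # Header missing or not numeric: back off exponentially.
            delay = base_delay * (2 ** attempt)
        sleep(delay)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Passing the request as a callable keeps the retry logic independent of the HTTP library in use.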
@hello1 I do know that a request to include the rate limit information in the response headers has been submitted to Product Management and is on the roadmap. I do not have a precise ETA for when it will be available.
-Coby
Thanks, Coby!
Excellent news!