Rate limit information

antoinedemangeon · September 11, 2024, 7:50pm

Hello everyone, I just tried the inference and I’m really impressed by the speed—it’s nearly instant. However, the lack of detailed rate limits is a bit challenging for me, as it makes it hard to gauge whether the endpoint can scale when analyzing datasets. Would it be possible to get some basic information on the rate limits for the current tiers, as well as the upcoming ones? Terms like “low” or “very low” are a bit unclear. I understand these are new products, but having more precise information, especially for the developer tier (which could be the most relevant for me in the future), would be very helpful. Thanks in advance!

coby.adams · September 12, 2024, 12:40am

@antoinedemangeon.

Welcome to our community. We appreciate you using our offerings and taking the time to provide valuable feedback.

We understand your frustration and your requirements. Clarification will be coming soon, and I will make sure to update this thread with the proper pointers as soon as we publish them.

Thank you for your patience,
-Coby

nikhilswami1 · October 10, 2024, 10:38am

@coby.adams please update; I have been looking for a month.
“Low rate limits” is very subjective and not quantitative.
I have a workload that summarizes around 50 news items, and I'm still unsure whether Samba will handle it or return a 429.

coby.adams · October 11, 2024, 12:11am

@nikhilswami1 we published the rate limits 20 days ago. I am sorry that I did not come back and link them.

-Coby

harisnop · December 9, 2024, 11:48am

Hello. I'm a new user, so sorry for this question. Could you explain what a rate limit of 30 per minute means? I am assuming it means I can ask 30 questions per minute in the chat dialogue box. Please enlighten me if I am wrong.

coby.adams · December 9, 2024, 8:36pm

@harisnop welcome to the community.

The rate limit is the number of API calls allowed within a given minute. If you build a chat application that calls a model API 2-3 times per user request, the app can serve fewer user interactions per minute than the raw limit suggests.
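To make the arithmetic concrete, here is a minimal sketch. The function name `interactions_per_minute` is hypothetical, and the figure of 30 calls per minute is taken from the question above rather than from any published tier.

```python
def interactions_per_minute(rate_limit_per_minute: int, calls_per_interaction: int) -> int:
    """Return how many complete user interactions fit under a per-minute API call limit."""
    if calls_per_interaction <= 0:
        raise ValueError("calls_per_interaction must be positive")
    # Each interaction consumes several API calls, so only whole interactions count.
    return rate_limit_per_minute // calls_per_interaction


# Assumed limit of 30 calls per minute (the number from the question above).
print(interactions_per_minute(30, 1))  # one API call per question
print(interactions_per_minute(30, 3))  # three API calls per question
```

With one call per question the app gets the full 30 questions per minute; with three calls per question it gets 10.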

Now let’s say you build an app used by 100 people concurrently. You would need to establish some sort of queueing process so as not to blow past the rate limits for your API key, because your app would be using one key to serve all 100 users.
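One possible shape for such a queueing process is a shared sliding-window limiter that every request thread passes through before calling the API. This is only a sketch under assumptions: `SlidingWindowLimiter` and `call_model` are hypothetical names, the limit of 30 calls per 60 seconds is illustrative, and the actual API call is left as a placeholder.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window of period seconds."""

    def __init__(self, max_calls, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock          # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._stamps = deque()       # start times of calls still inside the window
        self._lock = threading.Lock()

    def acquire(self):
        """Block until a call slot is free, then claim it."""
        while True:
            with self._lock:
                now = self._clock()
                # Forget calls that have aged out of the window.
                while self._stamps and now - self._stamps[0] >= self.period:
                    self._stamps.popleft()
                if len(self._stamps) < self.max_calls:
                    self._stamps.append(now)
                    return
                # Window is full: wait until the oldest call expires.
                wait = self.period - (now - self._stamps[0])
            self._sleep(wait)  # sleep outside the lock so other threads are not blocked


# One limiter shared by the whole app, because all users share one API key.
limiter = SlidingWindowLimiter(max_calls=30, period=60.0)


def call_model(prompt):
    """Hypothetical wrapper: wait for a slot, then make the real API call."""
    limiter.acquire()
    return f"response to {prompt!r}"  # placeholder for the real API call
```

Because the limiter is shared, 100 concurrent users simply wait their turn instead of triggering 429 responses. A production app would likely also retry on 429 in case the server counts the window differently.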

Coby