SambaNova Cloud enforces per-model rate limits on inference requests so that all developers can try the fastest inference on the best open-source models.
Rate limits in the Free tier
Model | Requests per Minute |
---|---|
Llama 3.1 8B | 30 |
Llama 3.1 70B | 20 |
Llama 3.1 405B | 10 |
Llama 3.2 1B | 30 |
Llama 3.2 3B | 30 |
Llama 3.2 11B | 10 |
Llama 3.2 90B | 1 (temporarily limited due to high demand) |
Llama 3.3 70B | 20 |
Llama Guard 3 8B | 30 |
Qwen 2.5 72B | 20 |
Qwen 2.5 Coder 32B | 20 |
QwQ 32B Preview | 10 |
Rate limits may be enforced over shorter time intervals in the Free tier.
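
Because a per-minute limit may be enforced over a shorter window, a burst of requests at the start of a minute can be rejected even when the minute's total is under the limit. One way to stay within the limit is to space requests evenly on the client side. The sketch below is only an illustration: `RequestPacer` and `min_interval_seconds` are hypothetical helper names, not part of any SambaNova SDK, and the even-spacing assumption (for example, 30 requests per minute treated as one request every 2 seconds) is a conservative guess, since the actual enforcement window is not published here.

```python
import time

# Free-tier limits copied from the table above (requests per minute).
FREE_TIER_RPM = {
    "Llama 3.1 8B": 30,
    "Llama 3.1 70B": 20,
    "Llama 3.1 405B": 10,
}


def min_interval_seconds(rpm):
    """Spacing that spreads `rpm` requests evenly across one minute."""
    return 60.0 / rpm


class RequestPacer:
    """Hypothetical helper: keeps calls to wait() at least `interval` seconds apart."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval_seconds(rpm)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request has been sent yet

    def wait(self):
        """Block until the next request may be sent, then reserve that slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

To use it, create one pacer per model, for example `pacer = RequestPacer(FREE_TIER_RPM["Llama 3.1 8B"])`, and call `pacer.wait()` immediately before each inference request to that model. The `clock` and `sleep` parameters exist only so the pacing logic can be exercised without real delays.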
If you need higher rate limits than the Free tier offers, let us know!
Send a private message to @coby.adams to tell us about your project.
Rate limits in other tiers
- Rate limits for the Developer tier will be published when the tier becomes available. Stay tuned!
- For rate limits in the Enterprise tier, contact sales so we can accommodate your project's needs.