Rate Limits

SambaNova Cloud enforces per-model rate limits on inference requests so that all developers can try the fastest inference on the best open-source models.

Rate limits in the Free tier

Model                  Requests per Minute
Llama 3.1 8B           30
Llama 3.1 70B          20
Llama 3.1 405B         10
Llama 3.2 1B           30
Llama 3.2 3B           30
Llama 3.2 11B          10
Llama 3.2 90B          1 (temporarily limited due to high demand)
Llama 3.3 70B          20
Llama Guard 3 8B       30
Qwen 2.5 72B           20
Qwen 2.5 Coder 32B     20
QwQ 32B Preview        10

Rate limits may be enforced over shorter time intervals in the Free tier.
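
As a rough illustration of what enforcement over shorter intervals can mean for a client, here is a minimal Python sketch that spaces requests evenly instead of sending a full minute's quota in one burst. The even-spacing rule is an assumption (this post does not say how the shorter intervals are computed), and `FREE_TIER_RPM`, `min_interval_seconds`, and `Pacer` are hypothetical names made up for this example, not part of any SambaNova SDK.

```python
import time

# Free-tier requests-per-minute values copied from the table in this post.
FREE_TIER_RPM = {
    "Llama 3.1 8B": 30,
    "Llama 3.1 70B": 20,
    "Llama 3.1 405B": 10,
    "Llama 3.2 1B": 30,
    "Llama 3.2 3B": 30,
    "Llama 3.2 11B": 10,
    "Llama 3.2 90B": 1,
    "Llama 3.3 70B": 20,
    "Llama Guard 3 8B": 30,
    "Qwen 2.5 72B": 20,
    "Qwen 2.5 Coder 32B": 20,
    "QwQ 32B Preview": 10,
}


def min_interval_seconds(rpm: int) -> float:
    """Spacing that spreads `rpm` requests evenly across 60 seconds.

    ASSUMPTION: even spacing is a conservative guess, not a documented rule.
    """
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return 60.0 / rpm


class Pacer:
    """Blocks so that consecutive requests are at least one interval apart."""

    def __init__(self, rpm: int, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval_seconds(rpm)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request has been sent yet

    def wait(self) -> float:
        """Sleep until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._clock()  # re-read in case the sleep overshot
        self._next_allowed = now + self.interval
        return delay
```

To use it, create one pacer per model, for example `pacer = Pacer(FREE_TIER_RPM["Llama 3.1 70B"])`, and call `pacer.wait()` right before each request. With a limit of 20 requests per minute this leaves at least 3 seconds between requests.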

If you need rate limits higher than the Free tier allows, let us know!
Send a private message to @coby.adams to tell us about your project.

Rate limits in other tiers

  • Rate limits for the Developer tier will be published when the tier becomes available. Stay tuned!
  • For rate limits in the Enterprise tier, contact sales so we can accommodate your project's needs.