Rate Limits

SambaNova Cloud enforces per-model rate limits on inference requests so that every developer can try the fastest inference on the best open-source models.

Rate limits in the Free tier

Model          | Requests per minute
Llama 3.1 8B   | 30
Llama 3.1 70B  | 20
Llama 3.1 405B | 10
Llama 3.2 1B   | 30
Llama 3.2 3B   | 30

Rate limits may be enforced over shorter time intervals in the Free tier.
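
Because limits may be enforced over intervals shorter than a minute, sending a full minute's quota in one burst could be rejected. One way to stay under the limit is to space requests evenly on the client side. The sketch below is only an illustration: `FREE_TIER_RPM`, `min_interval_seconds`, and `Pacer` are hypothetical names, not part of any SambaNova SDK, and the even-spacing rule is an assumption about how to be safe, not a description of how the service actually enforces its limits.

```python
import time

# Free-tier limits copied from the table above (requests per minute).
FREE_TIER_RPM = {
    "Llama 3.1 8B": 30,
    "Llama 3.1 70B": 20,
    "Llama 3.1 405B": 10,
    "Llama 3.2 1B": 30,
    "Llama 3.2 3B": 30,
}


def min_interval_seconds(model: str) -> float:
    """Seconds between requests if the per-minute quota is spread evenly."""
    return 60.0 / FREE_TIER_RPM[model]


class Pacer:
    """Hypothetical client-side helper that spaces requests evenly.

    The clock and sleep functions are injectable so the pacing logic
    can be exercised without actually waiting.
    """

    def __init__(self, model, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval_seconds(model)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request sent yet

    def wait(self) -> float:
        """Block until the next request may be sent; return seconds waited."""
        now = self._clock()
        if self._next_allowed is None:
            delay = 0.0
        else:
            delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following request one interval after this one.
        self._next_allowed = now + delay + self.interval
        return delay
```

To use it, create one `Pacer` per model and call `pacer.wait()` immediately before each inference request. For Llama 3.1 70B this works out to at most one request every 3 seconds.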

If you need higher rate limits than the Free tier offers, let us know!
Send a private message to @coby.adams or @sarosh.naseem to tell us about your project.

Rate limits in other tiers

  • Rate limits for the Developer tier will be published when the tier becomes available. Stay tuned!
  • For rate limits in the Enterprise tier, contact sales so we can accommodate your project's needs.