Rate limits are a deal breaker "TOO MANY REQUESTS [retrying in 3m 8s attempt #8]"

noltrix · March 18, 2026, 3:49pm

I’m currently evaluating your service for agent-based workflows and running into a fundamental limitation: the current rate limits make practical agent usage infeasible.

While the high token-per-second throughput is excellent and well-suited for low-latency generation, it is effectively offset by restrictive request rate limits. A single agent execution can easily exceed 60 requests per minute, which leads to throttling and interrupts the workflow.
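To make the numbers concrete, here is a minimal sketch of client-side pacing that keeps an agent loop under a per-minute request cap. It assumes a 60 RPM limit measured over a sliding 60-second window; `SlidingWindowLimiter` and `acquire` are hypothetical names for illustration, not part of any provider SDK, and the actual enforcement window on the server may differ.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls in any `window_s`-second window.

    Hypothetical helper: the 60-per-60s defaults are an assumption,
    not a documented provider setting.
    """

    def __init__(self, max_requests=60, window_s=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_s = window_s
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of requests still inside the window

    def acquire(self):
        """Block until one more request fits; return the seconds waited."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_s:
            self._sent.popleft()
        waited = 0.0
        if len(self._sent) >= self.max_requests:
            # Window is full: wait until the oldest request leaves it.
            waited = self.window_s - (now - self._sent[0])
            self._sleep(waited)
            now = self._clock()
            self._sent.popleft()
        self._sent.append(now)
        return waited
```

Calling `acquire()` before each model request would pace the loop instead of letting it run into HTTP 429 responses. It does not raise the cap: an agent that needs more than 60 requests per minute simply runs slower.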

In its current state, this creates a mismatch: the infrastructure supports fast responses, but the rate limits prevent sustained execution. This significantly reduces the value proposition and undermines the primary reason to adopt your platform.

Are there any options to increase the request rate limits, ideally today, or to enable a configuration better suited for agent-style workloads?

Without an adjustment, I will need to consider moving this workload to another provider where the rate limits align with this use case.

shivani.moze · March 18, 2026, 5:05pm

@noltrix,
Thank you for highlighting this issue. We understand that the current request rate limits are restricting agent-based workflows despite high token throughput. We’re reviewing options to better support sustained agent executions and will update you shortly.

Regards,
Shivani Moze

shivani.moze · March 18, 2026, 5:45pm

Hi @noltrix,
We understand that the current request rate limits are affecting agent-based workflows despite high token throughput. Could you please share the model, configuration, and workflow specifications you are using so we can better assess options? We’ll review and update you on potential solutions as soon as possible.

Regards,
Shivani Moze

Coby · March 19, 2026, 12:04pm

@noltrix

We clearly state the rate limits for the Free and Developer tiers in our documentation: Rate Limits Policy - SambaNova Documentation

If you require higher limits, we do have Enterprise tiers available. Would you like to have someone from our sales organization reach out to you about Enterprise tier commit plans?

Alternatively, we have an EU inference partner, Infercom, based in Germany: Infercom Inference Dashboard

They offer 80 RPM on MiniMax on their Developer tier. You should definitely give them a try and reach out.

Coby