GPT-OSS 120B is Now Live on SambaCloud

OpenAI’s GPT-OSS 120B is now generally available to all developers on SambaCloud. We are running the model at the full 131K context length and at speeds over 700 tokens/second/user, powered by the SambaNova RDU. It delivers two advanced features essential for enterprise deployments:

  1. Reasoning Effort Control: Developers can optimize performance and cost by specifying reasoning intensity (low/medium/high) per query. This parameter dynamically adjusts computational effort: rapid responses for simple tasks, maximum accuracy for complex chain-of-thought prompts. Our API defaults to the balanced medium setting, which can easily be changed per request.

  2. Chain-of-Thought Tool Calling: Unlike most other open-source models, GPT-OSS supports real-time tool invocation during reasoning cycles. Developers can inject tool responses directly into the reasoning process, significantly enhancing output accuracy for agentic workflows and RAG implementations.
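As a concrete illustration of the first feature, a request body with per-query reasoning effort might be assembled like this. This is a hedged sketch: the `reasoning_effort` parameter name and its low/medium/high values are taken from the announcement above, but the exact request shape should be confirmed against the SambaCloud API documentation.

```python
# Sketch: building an OpenAI-compatible chat-completion payload for
# gpt-oss-120b with a per-query reasoning effort setting.
def build_request(prompt: str, effort: str = "medium") -> dict:
    """Return a request body with the given reasoning effort level."""
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"unsupported reasoning effort: {effort!r}")
    return {
        "model": "gpt-oss-120b",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # server defaults to "medium" if omitted
    }

quick = build_request("What is 2 + 2?", effort="low")
deep = build_request("Prove the triangle inequality.", effort="high")
```

The low setting suits simple lookups where latency matters; high suits complex multi-step prompts where accuracy matters more than speed.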

Read more about it in our blog post. We are excited to be hosting this model on SambaCloud.

2 Likes

Working great, thanks for doing this. Just one question: multiple tool calls are having execution issues for me. It’s like the Harmony format isn’t quite right in the way the model is set up. With this setup (sample of a multiple tool call request):
{
  "model": "gpt-oss-120b",
  "parallel_tool_calls": true,   // explicitly enabled
  "reasoning_effort": "high",    // maximum effort
  "tool_choice": "auto",
  "temperature": 0.7,
  "tools": [/* stock_price and coin_data tools */]
}

we only get back:

"tool_calls": [
  {
    "id": "call_c1bd33effb134a34b3",
    "type": "function",
    "function": {
      "name": "stock_price",
      "arguments": "{\"ticker\": \"CBA.AX\"}"
    }
  }
]

Has anybody had success yet with gpt-oss and multiple tool calls?
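While waiting for a fix, a defensive client-side parser can at least surface however many tool calls the model does emit, so a single-call response is handled the same way as a multi-call one. A minimal sketch, assuming the response follows the OpenAI chat-completions `tool_calls` shape shown in the snippet above:

```python
import json

def extract_tool_calls(message: dict) -> list[tuple[str, dict]]:
    """Return (name, parsed_arguments) for every tool call in an assistant message."""
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call.get("function", {})
        args = json.loads(fn.get("arguments") or "{}")
        calls.append((fn.get("name"), args))
    return calls

# Example using the single call reported above:
message = {
    "tool_calls": [
        {"id": "call_c1bd33effb134a34b3", "type": "function",
         "function": {"name": "stock_price",
                      "arguments": "{\"ticker\": \"CBA.AX\"}"}},
    ]
}
```

If the model starts returning both `stock_price` and `coin_data` calls, this loop picks them all up without any changes.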

3 Likes

🚀 Just built an AI-powered CV screening workflow in n8n with SambaNova GPT-OSS 120B + Mistral OCR.

Is this fully featured in the Vercel AI SDK community provider (Community Providers: SambaNova), including for v5, with full parallel tool calling and structured outputs, plus flexibility on reasoning in provider configs? Who is maintaining that? @david.perez? Thank you!

It works like a charm with Groq, so I would love to switch over to SambaNova.

1 Like

@olivier our Vercel integration is documented here: Vercel - SambaNova Documentation. It is maintained by a member of our internal Integrations working group. I have contacted them to find out whether it supports parallel tool calls.

-Coby

@david.keane I apologize for my team not seeing this sooner. For a tighter SLA on break-fix issues, open a support case by emailing help@sambanova.ai.

-Coby

All good @coby. Just excited to get the full Vercel AI SDK going and to have parallel tool calling working; those will empower lots of new applications.

1 Like

@david.keane I verified that they have updated our integration for Vercel v5. I am still working through some of the GPT-OSS parallel tool calling and will report back when I have that sorted.

-Coby

1 Like

The rate limits for gpt-oss are not on the limits page (couldn’t paste the link). Could you clarify? Also, we can’t even start using gpt-oss on SambaNova, or any other model, unless we can get at least 500 RPM, due to bursts in batch workflows. Would you consider raising the limits? Groq’s free tier is similar to or better than SambaNova’s developer tier. The only reason we haven’t switched is the very low limits on the developer tier (e.g., gpt-oss-120b on Groq has a 1K RPM limit on the dev tier).

1 Like

Groq keeps their RPM pretty high but enforces a fairly strict TPM of 250K even on their dev tier (about 250 tokens per request if you use the full 1K RPM).

Would love to understand more about your use case and what the usage pattern might look like.

Yes, the 250K TPM limit with Groq is annoying. Our use case is essentially bursts of parallel calls with small-to-medium contexts, e.g., running 100 parallel calls within 20 seconds to analyse 100 different chunks or partial documents: parallel extraction, analysis, and summarization. So we don’t often hit the 250K TPM, but we do hit the RPM limit.
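For what it’s worth, that kind of burst can also be smoothed on the client side so request starts stay under a given RPM ceiling. A sketch using asyncio; the limit value and the `analyse` coroutine are illustrative stand-ins, not part of any SambaNova SDK:

```python
import asyncio

async def run_with_rpm_cap(factories, rpm: int) -> list:
    """Start coroutines with enough spacing to stay under roughly `rpm` requests/minute."""
    interval = 60.0 / rpm  # minimum spacing between request starts
    pending = []
    for factory in factories:
        pending.append(asyncio.create_task(factory()))
        await asyncio.sleep(interval)
    return await asyncio.gather(*pending)

async def demo() -> list:
    async def analyse(chunk_id: int) -> str:  # stand-in for a real API call
        await asyncio.sleep(0.01)
        return f"chunk-{chunk_id}"
    factories = [lambda i=i: analyse(i) for i in range(5)]
    return await run_with_rpm_cap(factories, rpm=6000)

results = asyncio.run(demo())
```

With a real 500 RPM cap the spacing works out to 120 ms between starts, so 100 chunks would drain in about 12 seconds, within the 20-second window described above.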

When adding the GPT-OSS-120B model in Roo Code or any other IDE, it returns this error:
400 Unable to tokenize message with status code 400 for model gpt-oss-120b: Invalid 'content' type. Expected one of: ['str'], got list.
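A likely cause (an assumption based on the error text, not a confirmed diagnosis) is that the IDE sends `content` as a list of content parts, OpenAI’s multimodal message format, while this endpoint expects a plain string. A hedged workaround is to flatten text parts into a string before sending:

```python
def flatten_content(messages: list[dict]) -> list[dict]:
    """Coerce list-style message content into the plain string the endpoint expects."""
    fixed = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            # Keep only text parts; other part types (images, etc.) are dropped.
            content = "".join(
                part.get("text", "") for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            )
        fixed.append({**msg, "content": content})
    return fixed
```

Whether this can be applied depends on the IDE exposing a request hook; otherwise the fix has to land in the provider integration itself.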

1 Like