Today, we’re introducing the SN50 RDU, our 5th-generation dataflow accelerator, designed specifically for agentic inference workloads.
As more applications move from single LLM calls to multi-step agent loops (planner → tool use → verifier → iteration), inference becomes a latency and memory movement problem — not just a compute problem.
The SN50 is built to address that.
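The shift from a single LLM call to an agent loop can be sketched in a few lines. Everything below is illustrative (the function names, the fixed three-iteration loop, and the 50 ms stand-in latency are assumptions, not SN50 specifics); the point is that one user request fans out into many sequential inference calls:

```python
import time

def call_model(prompt):
    # Stand-in for one inference call; in a real agent each of these
    # is a round-trip to an LLM endpoint.
    time.sleep(0.05)  # pretend 50 ms per call (illustrative)
    return f"response to: {prompt}"

def agent_loop(task, max_iters=3):
    """Planner -> tool use -> verifier loop: one task triggers many
    sequential model calls, so per-call latency compounds."""
    calls = 0
    for _ in range(max_iters):
        plan = call_model(f"plan next step for: {task}")
        calls += 1
        tool_result = f"tool output for: {plan}"  # tool execution (no model call)
        verdict = call_model(f"verify: {tool_result}")
        calls += 1
        if "done" in verdict:
            break
    return calls

# Three iterations already mean six sequential model calls, so the
# end-to-end latency is roughly six times a single call's latency.
print(agent_loop("book a flight"))  # -> 6
```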
Token Economics that Make Sense for Agents
The SN50 RDU combines ultra-low latency, high throughput, and power-efficient performance for AI inference workloads, reshaping the economics of token generation.
Compared to Blackwell B200 GPUs, the SN50 delivers 5X the maximum speed and over 3X the throughput for agentic inference across a range of models, including Meta’s Llama 3.3 70B, an open-source model that remains widely used since its release.
This performance comes while averaging just 20 kW of power per SambaRack, which allows the rack to operate in existing air-cooled data centers. For inference service providers running models like gpt-oss, this combination of performance, efficiency, and scalability translates into a total-cost-of-ownership (TCO) advantage of up to 8X over B200 GPUs.
The SambaRack SN50 combines 16 SN50 chips per system and can scale to 256 accelerators with multi-terabyte-per-second interconnect bandwidth.
It supports very large models and long contexts, and is designed for real-time agent workloads where latency compounds across multiple inference calls.
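The compounding effect can be made concrete with simple arithmetic. The per-call latencies and step count below are illustrative assumptions, not measured SN50 figures; they only show how sequential agent steps multiply per-call latency:

```python
def end_to_end_latency(per_call_s, calls_per_task):
    """Sequential agent steps add up: total task latency is the
    per-call latency times the number of chained inference calls."""
    return per_call_s * calls_per_task

# Illustrative 10-call agent task: a 5x per-call speedup shrinks the
# whole task from 20 s to 4 s, because every step in the chain benefits.
baseline = end_to_end_latency(2.0, 10)  # 2.0 s per call -> 20.0 s total
faster = end_to_end_latency(0.4, 10)    # 0.4 s per call ->  4.0 s total
print(baseline, faster)  # -> 20.0 4.0
```

This is why latency matters more for agents than for single-shot chat: any per-call delay is paid once per step, not once per request.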