Today, we’re introducing the SN50 RDU, our 5th-generation dataflow accelerator, designed specifically for agentic inference workloads.
As more applications move from single LLM calls to multi-step agent loops (planner → tool use → verifier → iteration), inference becomes a latency and memory movement problem — not just a compute problem.
The SN50 is built to address that.
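The shift from a single LLM call to an agent loop can be sketched in a few lines. Everything below is illustrative (the function names, the fixed three-iteration loop, and the 50 ms stand-in latency are assumptions, not SN50 specifics); the point is that one user request fans out into many sequential inference calls:

```python
import time

def call_model(prompt):
    # Stand-in for one inference call; in a real agent each of these
    # is a round-trip to an LLM endpoint.
    time.sleep(0.05)  # pretend 50 ms per call (illustrative)
    return f"response to: {prompt}"

def agent_loop(task, max_iters=3):
    """Planner -> tool use -> verifier loop: one task triggers many
    sequential model calls, so per-call latency compounds."""
    calls = 0
    for _ in range(max_iters):
        plan = call_model(f"plan next step for: {task}")
        calls += 1
        tool_result = f"tool output for: {plan}"  # tool execution (no model call)
        verdict = call_model(f"verify: {tool_result}")
        calls += 1
        if "done" in verdict:
            break
    return calls

# Three iterations already mean six sequential model calls, so the
# end-to-end latency is roughly six times a single call's latency.
print(agent_loop("book a flight"))  # -> 6
```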
Token Economics that Make Sense for Agents
The SN50 RDU combines ultra-low latency, high throughput, and power-efficient performance for AI inference workloads, reshaping the economics of token generation.
Compared to Blackwell B200 GPUs, the SN50 delivers 5X the maximum speed and over 3X the throughput for agentic inference across a range of models, including Meta’s Llama 3.3 70B, an open-source model that remains widely used since its release.
This performance comes while averaging just 20 kW of power per SambaRack, which allows the rack to operate in existing air-cooled data centers. For inference service providers running models like gpt-oss, this combination of performance, efficiency, and scalability translates into a total-cost-of-ownership (TCO) advantage of up to 8X over B200 GPUs.
The SambaRack SN50 combines 16 SN50 chips per system and can scale to 256 accelerators with multi-terabyte-per-second interconnect bandwidth.
It supports very large models and long contexts, and is designed for real-time agent workloads where latency compounds across multiple inference calls.
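The compounding effect can be made concrete with simple arithmetic. The per-call latencies and step count below are illustrative assumptions, not measured SN50 figures; they only show how sequential agent steps multiply per-call latency:

```python
def end_to_end_latency(per_call_s, calls_per_task):
    """Sequential agent steps add up: total task latency is the
    per-call latency times the number of chained inference calls."""
    return per_call_s * calls_per_task

# Illustrative 10-call agent task: a 5x per-call speedup shrinks the
# whole task from 20 s to 4 s, because every step in the chain benefits.
baseline = end_to_end_latency(2.0, 10)  # 2.0 s per call -> 20.0 s total
faster = end_to_end_latency(0.4, 10)    # 0.4 s per call ->  4.0 s total
print(baseline, faster)  # -> 20.0 4.0
```

This is why latency matters more for agents than for single-shot chat: any per-call delay is paid once per step, not once per request.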