Overview:
SambaNova Cloud is purpose-built to accelerate AI workloads powered by SambaNova RDUs, empowering enterprises to deploy, fine-tune, and scale cutting-edge models with ease. Whether you’re building a chatbot, summarizing legal documents, tuning a vision-language model, or powering multimodal applications, SambaNova delivers a curated ecosystem of performant, cost-effective models optimized for real-world use cases.
This comprehensive guide walks through the available models, their capabilities, pricing, and specifications, including context lengths, completion limits, modalities, and how SambaNova keeps you informed as the platform evolves.
What’s Available on SambaNova Cloud?
SambaNova Cloud supports a diverse suite of models ranging from compact, ultra-efficient models to long-context, instruction-tuned giants, covering a wide array of AI tasks:
- Chatbots & assistants
- Document summarization & question answering
- RAG pipelines
- Audio transcription
- Multimodal AI applications
Model Context Lengths, Token Capabilities & Pricing
These models offer large context windows and high completion token capacities, enabling advanced use cases such as long-form summarization, chain-of-thought reasoning, and multistep workflows.
Supported Models and Pricing (Per 1 Million Tokens)
| Model Family | Model Name | Supports | Max Context Length | Max Completion Tokens | Prompt $/M tokens | Completion $/M tokens |
|---|---|---|---|---|---|---|
| DeepSeek | DeepSeek-R1 | Text | 32,768 | 16,384 | $5.00 | $7.00 |
| DeepSeek | DeepSeek-R1-Distill-Llama-70B | Text | 131,072 | 4,096 | $0.70 | $1.40 |
| DeepSeek | DeepSeek-V3-0324 | Text | 32,768 | 16,384 | $3.00 | $4.50 |
| Meta (LLaMA) | Llama-4-Maverick-17B-128E-Instruct | Text | 131,072 | 4,096 | $0.63 | $1.80 |
| Meta (LLaMA) | Llama-4-Scout-17B-16E-Instruct | Text | 8,192 | 4,096 | $0.40 | $0.70 |
| Meta (LLaMA) | Meta-Llama-3.1-405B-Instruct | Text | 16,384 | 4,096 | $5.00 | $10.00 |
| Meta (LLaMA) | Meta-Llama-3.1-8B-Instruct | Text | 16,384 | 4,096 | $0.10 | $0.20 |
| Meta (LLaMA) | Meta-Llama-3.2-1B-Instruct | Text | 16,384 | 4,096 | $0.04 | $0.08 |
| Meta (LLaMA) | Meta-Llama-3.2-3B-Instruct | Text | 4,096 | 4,096 | $0.08 | $0.16 |
| Meta (LLaMA) | Meta-Llama-3.3-70B-Instruct | Text | 131,072 | 3,072 | $0.60 | $1.20 |
| Meta (LLaMA) | Meta-Llama-Guard-3-8B | Text | 16,384 | 4,096 | $0.30 | $0.30 |
| Qwen | Qwen2-Audio-7B-Instruct | Text, Audio | 4,096 | 4,096 | $10.00 | $1.00 |
| Qwen | Qwen3-32B | Text | 8,192 | 4,096 | $0.40 | $0.80 |
| Other | QwQ-32B | Text | 16,384 | 4,096 | $0.50 | $1.00 |
| Mistral | E5-Mistral-7B-Instruct | Text | 4,096 | 4,096 | $0.13 | Free |
| Whisper | Whisper-Large-v3 | Text | 4,096 | 4,096 | Free | $17.50 |
Note: Pricing is subject to change. For the most up-to-date rates, see the Pricing page.
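Because rates are quoted per 1 million tokens, the cost of a single request is straightforward to estimate. Here is a minimal sketch (the `estimate_cost` helper and the example token counts are illustrative, not part of any SDK); the DeepSeek-R1 rates are taken from the table above and may change:

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  prompt_price_per_m: float, completion_price_per_m: float) -> float:
    """Estimate the dollar cost of one request from per-million-token rates."""
    return (prompt_tokens * prompt_price_per_m
            + completion_tokens * completion_price_per_m) / 1_000_000

# Example: a DeepSeek-R1 call with a 10,000-token prompt and a 2,000-token reply
# at $5.00/M prompt and $7.00/M completion (rates from the table above).
cost = estimate_cost(10_000, 2_000, prompt_price_per_m=5.00, completion_price_per_m=7.00)
print(f"${cost:.3f}")  # → $0.064
```

The same arithmetic applies to any model in the table; only the two per-million rates change.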
Model Highlights
DeepSeek Series
- DeepSeek-R1-Distill-Llama-70B offers impressive long-context handling (up to 131k tokens!) with highly affordable pricing. Great for document-heavy tasks like legal, research, or summarization pipelines.
- DeepSeek-V3-0324 balances context window and cost, ideal for high-throughput AI applications.
Meta Llama Family
- Meta-Llama-3.1-405B-Instruct stands out as one of the most powerful models available, with capabilities suitable for enterprise-scale generative tasks.
- Meta-Llama-3.2-1B-Instruct and 3.2-3B-Instruct offer lightweight, ultra-cheap inference ideal for real-time or edge applications.
Llama 4 Models
- The Maverick and Scout variants of Llama 4 provide expanded token context and lower latency inference with competitive pricing.
Qwen and QwQ Series
- Qwen2-Audio-7B-Instruct brings multimodal support, perfect for audio-based generative AI.
- QwQ-32B delivers strong performance while remaining cost-conscious.
E5-Mistral-7B-Instruct
- The only model in the lineup with free completion tokens. A strong choice for prototyping, experiments, and even production inference when cost is a concern.
Cost-Aware Model Selection
If your goal is maximum performance, consider:
- Meta-Llama-3.1-405B-Instruct
- DeepSeek-R1
- QwQ-32B
For cost-effective inference, lean on:
- Meta-Llama-3.2-1B-Instruct
- E5-Mistral-7B-Instruct
- Llama-4-Scout-17B-16E-Instruct
For large-context applications, these models shine:
- DeepSeek-R1-Distill-Llama-70B (131k tokens)
- Meta-Llama-3.3-70B-Instruct (131k tokens)
- Llama-4-Maverick-17B-128E-Instruct (131k tokens)
Model Lifecycle & Customer Notification
We actively maintain and evolve the model catalog to match industry trends and user needs.
- New models are added frequently
- Specs and prices may change to improve performance and value
- Model retirements are communicated in advance
- Customers are notified via email with sufficient time to transition workloads
- Migration guidance is provided for affected models
This approach ensures reliability and minimal disruption for production systems.
How to Stay Synced
We provide a public models API to let you monitor the real-time model catalog: https://api.sambanova.ai/v1/models
The catalog provides:
- Model IDs
- Max context and completion tokens
- Pricing (prompt/completion)
Use this for:
- Dynamic cost estimation tools
- Custom dashboards
- Automated alerts on catalog changes
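A catalog-change alert can be built by polling the models endpoint and diffing snapshots. The sketch below is a minimal, hedged example: it assumes an OpenAI-style response shape (`{"data": [{"id": ...}]}`), which should be verified against the actual payload, and `catalog_diff` is an illustrative helper, not part of any SDK:

```python
import json
import urllib.request

MODELS_URL = "https://api.sambanova.ai/v1/models"

def fetch_model_ids(api_key: str) -> set[str]:
    """Fetch the live catalog; assumes an OpenAI-style {"data": [{"id": ...}]} payload."""
    req = urllib.request.Request(MODELS_URL,
                                 headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return {model["id"] for model in payload["data"]}

def catalog_diff(old: set[str], new: set[str]) -> dict[str, set[str]]:
    """Compare two catalog snapshots -- the basis for an automated change alert."""
    return {"added": new - old, "removed": old - new}

# Offline example with two hypothetical snapshots:
yesterday = {"DeepSeek-R1", "Meta-Llama-3.1-8B-Instruct"}
today = {"DeepSeek-R1", "Qwen3-32B"}
print(catalog_diff(yesterday, today))
```

Running `catalog_diff` on successive snapshots (e.g. from a daily cron job) flags additions and retirements as soon as they appear in the public API, complementing the email notifications described above.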
Summary
Whether you’re optimizing cost, scaling performance, or enabling long-form understanding, SambaNova Cloud delivers a robust, scalable platform. Our model selection, transparent pricing, and dynamic infrastructure make it easier than ever to deploy AI that meets your evolving needs.