Unlocking the Power of AI: A Deep Dive into Available Models on SambaNova Cloud

:rocket: Overview:

SambaNova Cloud is purpose-built on SambaNova RDUs to accelerate AI workloads, empowering enterprises to deploy, fine-tune, and scale cutting-edge models with ease. Whether you’re building a chatbot, summarizing legal documents, tuning a vision-language model, or powering multimodal applications, SambaNova delivers a curated ecosystem of performant, cost-effective models optimized for real-world use cases.

This comprehensive guide walks through the available models, their capabilities, pricing, and specifications, including context lengths, completion limits, modalities, and how SambaNova keeps you informed as the platform evolves.

:brain: What’s Available on SambaNova Cloud?

SambaNova Cloud supports a diverse suite of models ranging from compact, ultra-efficient models to long-context, instruction-tuned giants, covering a wide array of AI tasks:

:sparkles: Chatbots & assistants
:page_facing_up: Document summarization & question answering
:bar_chart: RAG pipelines
:studio_microphone: Audio transcription
:speech_balloon: Multimodal AI applications

:magnifying_glass_tilted_left: Model Context Lengths, Token Capabilities & Pricing

These models offer large context windows and high completion token capacities, enabling advanced use cases such as long-form summarization, chain-of-thought reasoning, and multistep workflows.

:blue_book: Supported Models and Pricing (Per 1 Million Tokens)

| Model Family | Model Name | Supports | Max Context Length | Max Completion Tokens | Prompt ($/M tokens) | Completion ($/M tokens) |
|---|---|---|---|---|---|---|
| DeepSeek | DeepSeek-R1 | Text | 32,768 | 16,384 | $5.00 | $7.00 |
| | DeepSeek-R1-Distill-Llama-70B | Text | 131,072 | 4,096 | $0.70 | $1.40 |
| | DeepSeek-V3-0324 | Text | 32,768 | 16,384 | $3.00 | $4.50 |
| Meta (LLaMA) | Llama-4-Maverick-17B-128E-Instruct | Text | 131,072 | 4,096 | $0.63 | $1.80 |
| | Llama-4-Scout-17B-16E-Instruct | Text | 8,192 | 4,096 | $0.40 | $0.70 |
| | Meta-Llama-3.1-405B-Instruct | Text | 16,384 | 4,096 | $5.00 | $10.00 |
| | Meta-Llama-3.1-8B-Instruct | Text | 16,384 | 4,096 | $0.10 | $0.20 |
| | Meta-Llama-3.2-1B-Instruct | Text | 16,384 | 4,096 | $0.04 | $0.08 |
| | Meta-Llama-3.2-3B-Instruct | Text | 4,096 | 4,096 | $0.08 | $0.16 |
| | Meta-Llama-3.3-70B-Instruct | Text | 131,072 | 3,072 | $0.60 | $1.20 |
| | Meta-Llama-Guard-3-8B | Text | 16,384 | 4,096 | $0.30 | $0.30 |
| Qwen | Qwen2-Audio-7B-Instruct | Text, Audio | 4,096 | 4,096 | $10.00 | $1.00 |
| | Qwen3-32B | Text | 8,192 | 4,096 | $0.40 | $0.80 |
| Other | QwQ-32B | Text | 16,384 | 4,096 | $0.50 | $1.00 |
| Mistral | E5-Mistral-7B-Instruct | Text | 4,096 | 4,096 | $0.13 | Free |
| Whisper | Whisper-Large-v3 | Audio | 4,096 | 4,096 | Free | $17.50 |

Note: Pricing is subject to change. For the most up-to-date rates, see the Pricing page.
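As a quick sanity check on the rates above, per-request cost is simply (tokens ÷ 1,000,000) × rate, summed across prompt and completion. A minimal sketch with rates hard-coded from the table (verify them against the live Pricing page before relying on this):

```python
# Per-million-token rates (USD) copied from the table above -- verify
# against the live Pricing page, since rates are subject to change.
RATES = {
    "Meta-Llama-3.3-70B-Instruct": {"prompt": 0.60, "completion": 1.20},
    "DeepSeek-R1": {"prompt": 5.00, "completion": 7.00},
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of a single request."""
    r = RATES[model]
    return (prompt_tokens * r["prompt"] + completion_tokens * r["completion"]) / 1_000_000

# 100k prompt tokens + 2k completion tokens on Llama 3.3 70B:
cost = request_cost("Meta-Llama-3.3-70B-Instruct", 100_000, 2_000)
print(f"${cost:.4f}")  # 100k x $0.60/M + 2k x $1.20/M = $0.0624
```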

:magnifying_glass_tilted_left: Model Highlights

:microscope: DeepSeek Series

  • DeepSeek-R1-Distill-Llama-70B offers impressive long-context handling (up to 131k tokens!) with highly affordable pricing. Great for document-heavy tasks like legal, research, or summarization pipelines.
  • DeepSeek-V3-0324 balances context window and cost, ideal for high-throughput AI applications.

:graduation_cap: Meta Llama Family

  • Meta-Llama-3.1-405B-Instruct stands out as one of the most powerful models available, with capabilities suitable for enterprise-scale generative tasks.
  • Meta-Llama-3.2-1B-Instruct and 3.2-3B-Instruct offer lightweight, ultra-cheap inference ideal for real-time or edge applications.

:llama: Llama 4 Models

  • The Maverick and Scout variants of Llama 4 provide expanded context windows and lower-latency inference at competitive pricing.

:test_tube: Qwen and QwQ Series

  • Qwen2-Audio-7B-Instruct brings multimodal support, perfect for audio-based generative AI.
  • QwQ-32B delivers strong performance while remaining cost-conscious.

:free_button: E5-Mistral-7B-Instruct

  • The only model in the lineup with free completion tokens! Use it for prototyping, experiments, and even production inference when cost is a concern.

:money_with_wings: Cost-Aware Model Selection

If your goal is maximum performance, you can consider:

  • Meta-Llama-3.1-405B-Instruct
  • DeepSeek-R1
  • QwQ-32B

For cost-effective inference, you can lean on:

  • Meta-Llama-3.2-1B-Instruct
  • E5-Mistral-7B-Instruct
  • Llama-4-Scout-17B-16E-Instruct

For large-context applications, these models shine:

  • DeepSeek-R1-Distill-Llama-70B (131k tokens)
  • Meta-Llama-3.3-70B-Instruct (131k tokens)
  • Llama-4-Maverick-17B-128E-Instruct (131k tokens)
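Choosing between these tiers can be automated: filter the catalog by the context window your workload needs, then take the cheapest match. A sketch over a hand-copied subset of the table above (a hypothetical helper, not an official SDK feature):

```python
# (model, max_context, prompt $/M, completion $/M) -- subset of the pricing table.
CATALOG = [
    ("Meta-Llama-3.2-1B-Instruct", 16_384, 0.04, 0.08),
    ("Llama-4-Scout-17B-16E-Instruct", 8_192, 0.40, 0.70),
    ("DeepSeek-R1-Distill-Llama-70B", 131_072, 0.70, 1.40),
    ("Meta-Llama-3.3-70B-Instruct", 131_072, 0.60, 1.20),
]

def cheapest_with_context(min_context: int) -> str:
    """Return the cheapest model (by prompt rate) whose window fits min_context."""
    candidates = [m for m in CATALOG if m[1] >= min_context]
    if not candidates:
        raise ValueError(f"no model offers a {min_context}-token context window")
    return min(candidates, key=lambda m: m[2])[0]

print(cheapest_with_context(100_000))  # -> Meta-Llama-3.3-70B-Instruct
print(cheapest_with_context(10_000))   # -> Meta-Llama-3.2-1B-Instruct
```

The same filter can rank by completion rate instead when your workload is generation-heavy.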

:counterclockwise_arrows_button: Model Lifecycle & Customer Notification

We actively maintain and evolve the model catalog to match industry trends and user needs.

:white_check_mark: New models are added frequently
:wrench: Specs and prices may change to improve performance and value
:cross_mark: Model retirements are communicated in advance

:e_mail: Customers are notified via email with sufficient time to transition workloads
:blue_book: Migration guidance is provided for affected models

This approach ensures reliability and minimal disruption for production systems.

:satellite_antenna: How to Stay Synced

We provide a public models API to let you monitor the real-time model catalog: :backhand_index_pointing_right: https://api.sambanova.ai/v1/models

The catalog provides:

:white_check_mark: Model IDs
:white_check_mark: Max context and completion tokens
:white_check_mark: Pricing (prompt/completion)

Use this for:

:wrench: Dynamic cost estimation tools
:bar_chart: Custom dashboards
:police_car_light: Automated alerts on catalog changes
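For automated alerts, one approach is to snapshot the `/v1/models` response periodically and diff the model IDs between runs. A sketch, assuming the endpoint returns an OpenAI-style `{"data": [{"id": ...}, ...]}` payload (check the actual response shape before relying on it):

```python
import json
from urllib.request import urlopen

MODELS_URL = "https://api.sambanova.ai/v1/models"

def fetch_model_ids() -> set[str]:
    """Fetch the current catalog; assumes an OpenAI-style {"data": [...]} payload."""
    with urlopen(MODELS_URL) as resp:
        payload = json.load(resp)
    return {m["id"] for m in payload["data"]}

def catalog_diff(previous: set[str], current: set[str]) -> dict[str, set[str]]:
    """Compare two catalog snapshots, e.g. yesterday's vs. today's."""
    return {"added": current - previous, "removed": previous - current}

# Offline demo with two hand-made snapshots:
old = {"DeepSeek-R1", "Meta-Llama-3.1-8B-Instruct"}
new = {"DeepSeek-R1", "Meta-Llama-3.3-70B-Instruct"}
print(catalog_diff(old, new))
# {'added': {'Meta-Llama-3.3-70B-Instruct'}, 'removed': {'Meta-Llama-3.1-8B-Instruct'}}
```

Wiring `catalog_diff` to a scheduler plus a webhook gives you the "automated alerts" use case above with a few dozen lines of code.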

:white_check_mark: Summary

Whether you’re optimizing cost, scaling performance, or enabling long-form understanding, SambaNova Cloud delivers a robust, scalable platform. Our model selection, transparent pricing, and dynamic infrastructure make it easier than ever to deploy AI that meets your evolving needs.