Hi SambaNova community! I just discovered SambaNova earlier today after seeing it mentioned in Cline’s changelog. I read the whole SN40L RDU datasheet, and I have to say, this is utterly brilliant, and I am already considering replacing Anthropic with SambaNova as my preferred LLM API provider. That said, I have several questions about the nature of SambaNova’s Preview models as opposed to Production models:
DeepSeek themselves cite the new DeepSeek-V3-0324 model as supporting a 128k context length (source). Does SambaNova have plans to expand the context window size from the current limit of 8k (source) if/when DeepSeek-V3-0324 goes from Preview to Production?
If/when DeepSeek-V3-0324 goes from Preview to Production, will the pricing change at all? The current listed prices of $1/1M input tokens and $1.50/1M output tokens (source) are a huge part of the appeal for me, but if the price were to rise to $5/1M in | $7/1M out like the DeepSeek-R1 model, it gets a lot less attractive. If this is a question that can’t be answered yet, have there been price changes to other models in the past as they went from Preview to Production?
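To put the difference in concrete terms, here’s the back-of-envelope math behind my concern, as a quick Go sketch. The monthly token volumes are hypothetical numbers I made up for illustration, not my actual usage:

```go
package main

import "fmt"

func main() {
	// Hypothetical monthly volumes for a coding-assistant workload
	// (made up for illustration).
	inputTokens := 50_000_000.0 // 50M input tokens/month
	outputTokens := 5_000_000.0 // 5M output tokens/month

	// Current DeepSeek-V3-0324 Preview pricing (per 1M tokens).
	previewCost := inputTokens/1e6*1.00 + outputTokens/1e6*1.50

	// The same workload at DeepSeek-R1-level pricing (per 1M tokens).
	r1Cost := inputTokens/1e6*5.00 + outputTokens/1e6*7.00

	fmt.Printf("Preview pricing:  $%.2f/month\n", previewCost) // $57.50
	fmt.Printf("R1-level pricing: $%.2f/month\n", r1Cost)      // $285.00
}
```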
At an architectural level, is SambaNova equipped to offer input caching, so that what would otherwise be full-price input tokens could instead be billed as much cheaper cache hits / reads? I understand that the RDU architecture differs substantially from traditional GPU architecture and may not lend itself to such efficiencies, but the way I see it, every other LLM API provider is cooked if those same kinds of cost savings are possible on RDUs.
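To make concrete what I mean by cache hits / reads, here’s a sketch of the blended input price under a hypothetical cache-read discount. The 10x discount is an assumption borrowed from how providers like OpenAI and Anthropic price cache reads, not anything SambaNova has announced:

```go
package main

import "fmt"

// effectiveInputPrice returns the blended price per 1M input tokens when a
// fraction of them is served from cache at a discounted rate.
//   basePrice:    normal input price per 1M tokens
//   cacheHitRate: fraction of input tokens served as cache reads
//   discount:     cache-read price as a fraction of basePrice (0.1 = 10x cheaper)
func effectiveInputPrice(basePrice, cacheHitRate, discount float64) float64 {
	return basePrice * ((1 - cacheHitRate) + cacheHitRate*discount)
}

func main() {
	base := 1.00 // $1 per 1M input tokens (current Preview pricing)
	fmt.Printf("no caching:     $%.3f/1M\n", effectiveInputPrice(base, 0.0, 0.1)) // $1.000
	fmt.Printf("90%% cache hits: $%.3f/1M\n", effectiveInputPrice(base, 0.9, 0.1)) // $0.190
}
```

Even a modest discount drives the blended price well below list price once most of each prompt is a repeated prefix, which is exactly the shape of agentic coding traffic.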
Thank you to the SambaNova team for all the hard work they’ve done to create such a compelling offering so far!
First, welcome to the community! Second, great questions!
Yes, we are making weekly updates to the model, so expect the context length to continue to increase.
In general, we update pricing on a constant basis to stay up to date and competitive in the market. For example, a few weeks back we cut the prices on our Qwen models. DeepSeek-V3-0324 is the first model we have released in Preview mode, so the team is evaluating and will provide more updates when it’s ready for Production.
Architecturally, yes, and we will share updates once we make it available.
Would love to learn a bit more about your use case with AI and what you are looking to build. I’d also be really curious to hear how you plan to take advantage of prompt caching.
Thank you for the quick reply! I’m thrilled to hear about the planned context window expansion and potential for prompt caching with the RDU architecture.
Regarding my use case: I extensively use Cline with Claude 3.7 Sonnet and o3-mini for Golang development. Cline adds significant context from my editor, resulting in a 10:1 to 15:1 ratio of cache reads to fresh input tokens with OpenAI. On busy days, this has translated to millions of tokens served as cache reads, making prompt caching extremely valuable for both cost savings and reduced latency - the latter being particularly important on obsolete GPU-based platforms that lack the inherent performance advantages of your RDU-based platform.
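To put numbers on those ratios, here’s the same blended-price math applied to my observed workload, again assuming a hypothetical 10x cache-read discount (SambaNova hasn’t published any cache pricing, so this is purely illustrative):

```go
package main

import "fmt"

func main() {
	const base, discount = 1.00, 0.1 // $1/1M input; hypothetical 10x cache-read discount
	for _, ratio := range []float64{10, 15} { // my observed cache-read : fresh-input ratios
		hitRate := ratio / (ratio + 1) // 15:1 means ~93.8% of input tokens are cache reads
		blended := base * ((1 - hitRate) + hitRate*discount)
		fmt.Printf("%.0f:1 ratio => %.1f%% cache hits => $%.3f per 1M input tokens\n",
			ratio, hitRate*100, blended) // $0.182 at 10:1, $0.156 at 15:1
	}
}
```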
My project is sensitive in nature (potential trade secrets), so I’m closely following your privacy thread. How my prompts and responses will be used is critical - if they’re stored or used by SambaNova or SambaNova’s partners for purposes beyond simply ingesting my prompts and serving responses, I’d need to restrict which code components I work on via your API. I understand that in some cases, prompts flagged for potential abuse / AUP violations may be retained for investigative purposes - that type of clause is not a dealbreaker for me with Anthropic or OpenAI, and it wouldn’t be one with SambaNova either. I’m not planning on doing anything naughty, after all.
If SambaNova can maintain current (or close to current) DeepSeek-V3-0324 pricing while expanding to 128k context, adding prompt caching, and ensuring strong data privacy, your platform’s hardware advantages, paired with these important API features, would make for an unmatched value proposition for my use case. I’d be eager to become both a customer and an advocate if that’s where SambaNova is headed.
P.S. The technical superiority of SambaNova’s RDU is truly remarkable - this brilliant design seems to outclass not only traditional GPUs but also other ‘Post-GPU’ architectures. From what I understand, you’re achieving with a single rack what would take Groq 9 racks or Cerebras 4 racks to match. Kudos once again to the whole SambaNova team for this truly innovative engineering feat!