We're using the Vercel AI SDK with SambaNova via the streamText method. When calling streamText with a DeepSeek model, we get NaN values for token usage in the onFinish callback. It looks like the provider implementation may need a bit of massaging.
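For context, here is a minimal sketch of the kind of call that exhibits the problem. This assumes the generic @ai-sdk/openai-compatible provider pointed at our endpoint and the AI SDK 4.x usage fields; the provider setup and model id are illustrative, not our exact code:

import { streamText } from 'ai';
import { createOpenAICompatible } from '@ai-sdk/openai-compatible';

// Illustrative setup: SambaNova exposed through the generic
// OpenAI-compatible provider.
const sambanova = createOpenAICompatible({
  name: 'sambanova',
  baseURL: 'https://api.sambanova.ai/v1',
  apiKey: process.env.SAMBANOVA_API_KEY,
});

const result = streamText({
  model: sambanova('DeepSeek-R1'), // illustrative model id
  messages: [{ role: 'user', content: 'Hello' }],
  onFinish: ({ usage }) => {
    // usage.promptTokens / usage.completionTokens arrive as NaN here.
    console.log(usage);
  },
});

for await (const textPart of result.textStream) {
  process.stdout.write(textPart);
}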
Here's a sample of how to query us for usage, and the shape of the data that comes back:
import os
import json

import openai

# Point the OpenAI client at SambaNova's OpenAI-compatible endpoint.
client = openai.OpenAI(
    api_key=os.environ.get("SAMBANOVA_API_KEY"),
    base_url="https://api.sambanova.ai/v1",
)

response = client.chat.completions.create(
    model="Llama-4-Scout-17B-16E-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello"},
    ],
    temperature=0.1,
    top_p=0.1,
    # Request a final chunk that carries token usage.
    stream_options={"include_usage": True},
    stream=True,
)

# Handle the streaming response.
for chunk in response:
    if chunk.choices:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
    if chunk.usage:
        # Usage arrives on the final chunk of the stream.
        print(f"\n\nUsage: {json.dumps(chunk.usage.model_dump(), indent=2)}")
Response:
Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
Usage: {
  "completion_tokens": 24,
  "prompt_tokens": 21,
  "total_tokens": 45,
  "completion_tokens_details": null,
  "prompt_tokens_details": null,
  "acceptance_rate": 1,
  "completion_tokens_after_first_per_sec": 319.08824550900187,
  "completion_tokens_after_first_per_sec_first_ten": 86.32741942123245,
  "completion_tokens_per_sec": 42.49429727135262,
  "end_time": 1745245649.3937593,
  "is_last_response": true,
  "start_time": 1745245648.8289776,
  "stop_reason": "stop",
  "time_to_first_token": 0.49270129203796387,
  "total_latency": 0.564781665802002,
  "total_tokens_per_sec": 79.67680738378617
}
Please let me know if you have any other questions.
Further details about our API can be found here: