Overview:
SambaNova Cloud offers advanced text generation via an OpenAI-compatible API. Supported modes:
- Non-streaming (standard request/response)
- Streaming (token-by-token)
- Async (non-blocking)
This guide explains how to use these capabilities internally.
Supported Models & API Info:
Models available:
- Meta-Llama-3.1-8B-Instruct
- Meta-Llama-3.1-70B-Instruct
API Details:
- Base URL:
https://api.sambanova.ai/v1
- Authentication: API Key (ask DevOps or Admin)
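Rather than hard-coding the key, it can be loaded from the environment. The variable name `SAMBANOVA_API_KEY` below is a local convention for this sketch, not an official requirement:

```python
import os

def get_api_key() -> str:
    """Read the SambaNova API key from the environment instead of hard-coding it."""
    key = os.environ.get("SAMBANOVA_API_KEY")  # variable name is our own convention
    if not key:
        raise RuntimeError("Set SAMBANOVA_API_KEY before calling the API.")
    return key
```

Pass the returned value as `api_key=` when constructing the client.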
Sample Code – Basic Text Completion:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",
    api_key="YOUR_API_KEY",  # request a key from DevOps or Admin
)

response = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain the benefits of prompt tuning."},
    ],
)
print(response.choices[0].message.content)
```
Streaming Generation Example:

```python
stream = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You're a real-time assistant."},
        {"role": "user", "content": "Summarize this document in real-time."},
    ],
    stream=True,
)
for chunk in stream:
    # delta.content can be None on some chunks (e.g., the final one)
    print(chunk.choices[0].delta.content or "", end="")
```
Async Usage:

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://api.sambanova.ai/v1",
    api_key="YOUR_API_KEY",
)

async def main():
    response = await client.chat.completions.create(
        model="Meta-Llama-3.1-8B-Instruct",
        messages=[
            {"role": "user", "content": "What's few-shot learning?"}
        ],
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```
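The main payoff of the async client is fanning out several prompts concurrently with `asyncio.gather`. To keep the sketch self-contained and runnable, `fake_completion` below stands in for `await client.chat.completions.create(...)`; swap in the real call in practice:

```python
import asyncio

async def fake_completion(prompt: str) -> str:
    # Stand-in for `await client.chat.completions.create(...)`;
    # the sleep simulates network latency so the calls overlap.
    await asyncio.sleep(0.01)
    return f"answer to: {prompt}"

async def main() -> list[str]:
    prompts = ["What is few-shot learning?", "What is prompt tuning?"]
    # gather schedules all requests at once instead of awaiting them one by one;
    # results come back in the same order as the inputs
    return await asyncio.gather(*(fake_completion(p) for p in prompts))

results = asyncio.run(main())
```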
Prompt Engineering Tips:
- Define the assistant’s role clearly.
- Specify output format (JSON, list, etc.).
- Include examples or context if needed.
- Use chain-of-thought prompts for reasoning.
- Ensure prompt + history stays within the model’s token limit.
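The last tip — keeping prompt plus history within the token limit — can be approximated without a tokenizer. The sketch below drops the oldest non-system turns first, using a rough four-characters-per-token heuristic (an assumption, not the model's exact tokenization):

```python
def trim_history(messages, max_tokens=8192, chars_per_token=4):
    """Drop the oldest non-system messages until the estimated size fits the budget."""
    budget = max_tokens * chars_per_token
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(len(m["content"]) for m in system + rest) > budget:
        rest.pop(0)  # discard the oldest conversational turn first
    return system + rest

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "x" * 50},
]
# Tiny budget purely for demonstration: the long user turn gets dropped,
# the system message survives.
trimmed = trim_history(history, max_tokens=40, chars_per_token=1)
```

For production use, an actual tokenizer for the chosen model gives exact counts; this heuristic only prevents gross overruns.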
Example Prompt:
System: You are an expert in AI tuning.
User: Explain “prompt tuning” with a practical use-case in bullet points.
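In code, that example prompt maps directly onto the `messages` array:

```python
messages = [
    {"role": "system", "content": "You are an expert in AI tuning."},
    {"role": "user", "content": "Explain “prompt tuning” with a practical use-case in bullet points."},
]
```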
Model Selection Guidelines:

| Model | Use Case | Speed | Cost |
|---|---|---|---|
| Meta-Llama-3.1-8B-Instruct | General tasks; fast, low cost | Fast | Low |
| Meta-Llama-3.1-70B-Instruct | High-depth, complex reasoning | Slower | Higher |
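The guidance above can be encoded in a small helper. The heuristic (default to 8B, escalate to 70B only for complex reasoning) is a local convention for this sketch, not an official rule:

```python
def pick_model(complex_reasoning: bool = False) -> str:
    """Default to the cheaper, faster 8B model; use 70B only when depth matters."""
    return (
        "Meta-Llama-3.1-70B-Instruct"
        if complex_reasoning
        else "Meta-Llama-3.1-8B-Instruct"
    )
```

Pass the result as the `model=` argument in any of the examples above.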