SambaNova Text Generation

Overview:

SambaNova Cloud provides text generation through an OpenAI-compatible API. Supported modes include:

  • :white_check_mark: Non-Streaming (standard)
  • :counterclockwise_arrows_button: Streaming (token-by-token)
  • :thread: Async (non-blocking)

This guide explains how to use these capabilities internally.

:brick: Supported Models & API Info:

Models available:

  • Meta-Llama-3.1-8B-Instruct
  • Meta-Llama-3.1-70B-Instruct

API Details:

  • Base URL: https://api.sambanova.ai/v1
  • Authentication: API Key (ask DevOps or Admin)
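
A minimal sketch for keeping the key out of source code by reading it from an environment variable; the variable name SAMBANOVA_API_KEY below is our convention for this guide, not something mandated by the API:

import os
from openai import OpenAI

# SAMBANOVA_API_KEY is an assumed variable name; export it in your shell first.
client = OpenAI(
    base_url="https://api.sambanova.ai/v1",
    api_key=os.environ["SAMBANOVA_API_KEY"],
)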

:laptop: Sample Code – Basic Chat Completion:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",  # SambaNova's OpenAI-compatible endpoint
    api_key="YOUR_API_KEY"  # replace with your key (ask DevOps or Admin)
)

response = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain the benefits of prompt tuning."}
    ]
)

print(response.choices[0].message.content)
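
Because the endpoint is OpenAI-compatible, the usual sampling parameters apply. A sketch of the same call with temperature, top_p, and max_tokens set explicitly (the values here are illustrative, not recommendations):

response = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[
        {"role": "user", "content": "Explain the benefits of prompt tuning."}
    ],
    temperature=0.7,  # illustrative: higher values give more varied output
    top_p=0.9,        # illustrative: nucleus-sampling cutoff
    max_tokens=512,   # illustrative cap on generated tokens
)

print(response.choices[0].message.content)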

:globe_with_meridians: Streaming Generation Example:

stream = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You're a real-time assistant."},
        {"role": "user", "content": "Summarize this document in real-time."}
    ],
    stream=True
)

for chunk in stream:
    # delta.content can be None on some chunks (e.g., the final one), hence the `or ""` guard
    print(chunk.choices[0].delta.content or "", end="")
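
If you also need the full reply once streaming finishes, here is a variant of the loop above that accumulates the deltas while printing them. Use it instead of, not after, that loop, since a stream can only be consumed once:

full_text = []
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)  # show tokens as they arrive
    full_text.append(delta)
print()
reply = "".join(full_text)  # complete response text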

:thread: Async Usage:

from openai import AsyncOpenAI
import asyncio

client = AsyncOpenAI(
    base_url="https://api.sambanova.ai/v1",
    api_key="YOUR_API_KEY"
)

async def main():
    response = await client.chat.completions.create(
        model="Meta-Llama-3.1-8B-Instruct",
        messages=[
            {"role": "user", "content": "What's few-shot learning?"}
        ]
    )
    print(response.choices[0].message.content)

asyncio.run(main())
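
The two modes above compose: passing stream=True to the async client returns an async iterator. A minimal sketch combining async and streaming:

from openai import AsyncOpenAI
import asyncio

client = AsyncOpenAI(
    base_url="https://api.sambanova.ai/v1",
    api_key="YOUR_API_KEY"
)

async def main():
    stream = await client.chat.completions.create(
        model="Meta-Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": "What's few-shot learning?"}],
        stream=True,
    )
    async for chunk in stream:  # chunks arrive as they are generated
        print(chunk.choices[0].delta.content or "", end="")
    print()

asyncio.run(main())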

:writing_hand: Prompt Engineering Tips:

  • Define the assistant’s role clearly.
  • Specify the output format (JSON, list, etc.); a sketch applying these tips follows the example prompt below.
  • Include examples or context if needed.
  • Use chain-of-thought prompts for reasoning.
  • Ensure prompt + history stays within the model’s token limit.

Example Prompt:

System: You are an expert in AI tuning.
User: Explain “prompt tuning” with a practical use-case in bullet points.
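
Putting the tips together, a sketch of a request that fixes the assistant's role, demands a specific output format, and includes a one-shot example. It reuses the synchronous client from the basic example; the example content is ours and purely illustrative:

response = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[
        # Define the assistant's role and output format up front.
        {"role": "system", "content": "You are an expert in AI tuning. Answer in bullet points only."},
        # One-shot example to anchor the expected format.
        {"role": "user", "content": "Explain 'fine-tuning' with a practical use-case in bullet points."},
        {"role": "assistant", "content": "- Fine-tuning adapts a pretrained model to a task.\n- Use-case: adapting a base LLM to a support-ticket triage dataset."},
        # The actual question, phrased like the example prompt above.
        {"role": "user", "content": "Explain 'prompt tuning' with a practical use-case in bullet points."},
    ]
)
print(response.choices[0].message.content)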

:bar_chart: Model Selection Guidelines:

Model | Use Case | Speed | Cost
Meta-Llama-3.1-8B-Instruct | General tasks; fast, low cost | :rocket: Fast | :heavy_dollar_sign: Low
Meta-Llama-3.1-70B-Instruct | High-depth, complex reasoning | :turtle: Slow | :money_with_wings: High