What is a Large Language Model?
A Large Language Model (LLM) is a type of artificial intelligence trained to understand and generate human language. LLMs like Meta-Llama-3.3-70B-Instruct and
Llama-4-Maverick-17B-128E-Instruct learn from massive amounts of text data (books, websites, articles) to:
- Answer questions
- Write code
- Summarize content
- Translate text
- And more!
Core Concepts Behind How LLMs Work
1. Tokens
- LLMs don’t read sentences like we do—they break them into tokens.
- One token ≈ a word or part of a word.
- For example:
"I love Python!"→["I", " love", " Python", "!"]
2. Parameters
- LLMs have millions or billions of parameters (think of them as memory or knobs).
- These are what the model adjusts during training to “learn” the structure of language.
3. Prompting
- You give the model an input (prompt), and it replies based on what it has learned.
- You can add context to guide its response (like previous conversation or instructions).
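Concretely, conversation context is just a growing list of messages: appending each assistant reply before the next user turn is what lets the model "remember" earlier exchanges. A minimal sketch (not tied to any particular API call):

```python
# Context is carried as an ordered list of role/content messages
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of Italy?"},
]

# After each model reply, append it so the next turn sees the full history
messages.append({"role": "assistant", "content": "The capital of Italy is Rome."})
messages.append({"role": "user", "content": "And what is its population?"})

print(len(messages))  # the prompt now carries all four turns
```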
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| `model` | String | The name of the model to use (e.g., `Meta-Llama-3.3-70B-Instruct`). |
| `messages` | Array | Array of message objects, each containing a `role` and `content`, that make up the conversation. |
Message Object Structure
| Field | Type | Description |
|---|---|---|
| `role` | String | One of `system`, `user`, or `assistant`. |
| `content` | Mixed | A string, or an array of content parts (for multimodal input). |
Chat API Roles: system, user, assistant
When using the SambaNova Chat API, messages are structured using roles.

| Role | Represents | Purpose |
|---|---|---|
| `system` | The app or developer | Sets behavior, tone, and context |
| `user` | The end user (you) | Asks a question or gives input |
| `assistant` | The AI | Responds to the user |
Text Example:

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of Italy?"},
    {"role": "assistant", "content": "The capital of Italy is Rome."},
]
```

A minimal raw JSON request body needs only a single user message:

```json
"messages": [
    { "role": "user", "content": "What is the capital of Italy?" }
]
```
Multimodal Example:

```json
"messages": [
    {
        "role": "user",
        "content": [
            { "type": "text", "text": "What's in this image?" },
            { "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }
        ]
    }
]
```
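The base64 data URL in a multimodal message is typically built from raw image bytes. A minimal helper sketch (the function name and the placeholder bytes are illustrative; in practice the bytes come from reading an image file):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    # Encode raw image bytes as a base64 data URL for an image_url content part
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{b64}"

# Placeholder bytes stand in for a real image read with open(path, "rb").read()
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": to_data_url(b"\x89PNG...")}},
    ],
}
```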
Chat API Parameters Explained
When calling the model with client.chat.completions.create(), you can use several parameters to control how the AI responds:
Optional Parameters
| Parameter | Type | Description | Values / Default |
|---|---|---|---|
| `max_tokens` | Integer | Maximum number of tokens to generate. The total is limited by the model's context length. | Default: None |
| `temperature` | Float | Controls randomness in output. Higher values (e.g., 0.8) give more creative, diverse responses; lower values (e.g., 0.2) make output more focused. | Range: 0 to 1 |
| `top_p` | Float | Controls nucleus sampling. The model considers only the most probable tokens whose cumulative probability reaches `top_p`. | Range: 0 to 1 |
| `top_k` | Integer | Limits sampling to the highest-probability tokens when generating text. | Range: 1 to 100 |
| `stop` | String, Array, Null | Specifies up to 4 sequences at which the API stops generating further tokens. Useful for controlling output structure. | Default: null |
| `stream` | Boolean, Null | If true, enables response streaming (token by token). If false, returns the full completion at once. | Default: false |
| `stream_options` | Object, Null | Additional options for streaming mode. Example: `{ "include_usage": true }` adds token usage to the stream. | Default: null |
Code example:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",
    api_key="YOUR_SAMBACLOUD_API_KEY",
)

completion = client.chat.completions.create(
    model="Meta-Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "Answer the question in a couple of sentences."},
        {"role": "user", "content": "Share a happy story with me"},
    ],
)

print(completion.choices[0].message.content)
```
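With stream=True, the response arrives as chunks whose choices[0].delta.content holds an incremental piece of text (often None for the first and last chunks). A sketch of assembling those pieces; the helper name and the sample deltas are illustrative, and a real loop would iterate over the API response as noted in the comment:

```python
def assemble_stream(content_deltas):
    # Join the incremental delta.content strings a streamed response yields,
    # skipping the None entries that bracket the stream
    return "".join(piece for piece in content_deltas if piece)

# A real streaming loop would look like:
#   for chunk in client.chat.completions.create(..., stream=True):
#       print(chunk.choices[0].delta.content or "", end="", flush=True)
print(assemble_stream([None, "The ", "capital ", "of Italy ", "is Rome.", None]))
```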
Function Calling (Tool Calling)
Function calling lets the model request that your application run an external function: you describe the available functions in the request, and instead of (or alongside) plain text, the model replies with the function name and arguments for your code to execute.
Function Calling Parameters
| Parameter | Type | Description |
|---|---|---|
| `tools` | Array | List of functions the model can call. |
| `response_format` | Object | Forces structured output (e.g., valid JSON). |
| `tool_choice` | String / Object | Controls if/which function is called: `auto`, `required`, or a specific function. |
Example: Using tools

```json
{
    "model": "Meta-Llama-3.3-70B-Instruct",
    "messages": [
        { "role": "user", "content": "What will the weather be in Pune tomorrow?" }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Fetch weather info for a city and date.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": { "type": "string", "description": "City name" },
                        "date": { "type": "string", "description": "Date in YYYY-MM-DD" }
                    },
                    "required": ["city", "date"]
                }
            }
        }
    ],
    "tool_choice": "auto"
}
```
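Note that the model never executes get_weather itself; it returns the function name and a JSON string of arguments, and your code performs the call. A minimal dispatch sketch (the local get_weather implementation and its canned result are hypothetical):

```python
import json

def get_weather(city: str, date: str) -> dict:
    # Hypothetical local implementation; a real one would query a weather API
    return {"city": city, "date": date, "forecast": "sunny"}

TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(name: str, arguments: str) -> str:
    # The model supplies arguments as a JSON string: decode, call, re-encode
    args = json.loads(arguments)
    result = TOOLS[name](**args)
    return json.dumps(result)

# Simulating the tool call the model might return for the request above
print(dispatch_tool_call("get_weather", '{"city": "Pune", "date": "2025-01-01"}'))
```

The JSON string returned here would then be sent back to the model as a tool result message so it can compose its final answer.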
Values for tool_choice
| Value | Description |
|---|---|
| `auto` | Default. The model decides when to use the function. |
| `required` | Forces the model to use a function, not just reply with plain text. |
| `{"type":"function","function":{"name":"get_weather"}}` | Forces a specific function call. |
Example with response_format
To force the model to return only structured JSON output:

```json
"response_format": {
    "type": "json_object"
}
```
Or to match a custom schema:

```json
"response_format": {
    "type": "json_schema",
    "json_schema": {
        "type": "object",
        "properties": {
            "answer": { "type": "string" },
            "source": { "type": "string" }
        },
        "required": ["answer"]
    }
}
```
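Even with response_format set, it is good practice to parse and validate the reply in your own code before using it. A small sketch (the helper name is illustrative; the field check mirrors the schema's required list above):

```python
import json

def parse_structured_reply(raw: str) -> dict:
    # With a JSON response_format, the reply text should be valid JSON
    data = json.loads(raw)
    if "answer" not in data:
        raise ValueError("missing required field 'answer'")
    return data

print(parse_structured_reply('{"answer": "Rome", "source": "geography"}'))
```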
Summary:

| Feature | SambaNova |
|---|---|
| Endpoint | `https://api.sambanova.ai/v1/chat/completions` |
| Model Examples | `Llama-3.3-Swallow-70B-Instruct-v0.4`, `Meta-Llama-3.3-70B-Instruct`, etc. |
| Auth | `Bearer YOUR_SAMBANOVA_API_KEY` |
| Streaming | Supported (set `"stream": true`) |
| Function Calling | Supported (via `tools`, `tool_choice`) |