Core Concepts Behind Large Language Models (LLMs)

What is a Large Language Model?

A Large Language Model (LLM) is a type of artificial intelligence trained to understand and generate human language. LLMs like Meta-Llama-3.3-70B-Instruct and
Llama-4-Maverick-17B-128E-Instruct learn from massive amounts of text data (books, websites, articles) to:

  • Answer questions
  • Write code
  • Summarize content
  • Translate text
  • And more!

Core Concepts Behind How LLMs Work

1. Tokens

  • LLMs don’t read sentences the way we do; they break them into tokens.
  • One token ≈ a word or part of a word.
  • For example, "I love Python!" becomes ["I", " love", " Python", "!"]
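Real tokenizers (Llama models use a byte-pair-encoding tokenizer learned from data) split text far more cleverly, but the idea can be sketched with a deliberately naive splitter:

```python
import re

# Deliberately naive illustration: real LLM tokenizers (e.g. BPE) learn
# sub-word splits from training data and often break rare words into pieces.
def toy_tokenize(text):
    # Split into (optionally space-prefixed) words and single punctuation marks.
    return re.findall(r" ?\w+|[^\w\s]", text)

print(toy_tokenize("I love Python!"))  # ['I', ' love', ' Python', '!']
```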

2. Parameters

  • LLMs have millions or billions of parameters (think of them as memory or knobs).
  • These are what the model adjusts during training to “learn” the structure of language.

3. Prompting

  • You give the model an input (prompt), and it replies based on what it has learned.
  • You can add context to guide its response (like previous conversation or instructions).
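A minimal sketch of carrying context: the API is stateless, so "memory" is just the message list your code resends with each request (the message texts here are illustrative):

```python
# Context is just the messages you resend with each request. After every
# reply, append it (and the next user turn) to the history.
history = [
    {"role": "system", "content": "You are a concise travel guide."},
    {"role": "user", "content": "What is the capital of Italy?"},
]

# ...send `history` to the API, then extend it with the reply and follow-up:
history.append({"role": "assistant", "content": "Rome."})
history.append({"role": "user", "content": "What should I see there?"})

print(len(history))  # 4 messages now travel with the next request
```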

Required Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| model | String | The name of the model to use (e.g., Meta-Llama-3.3-70B-Instruct). |
| messages | Array | Array of message objects, each containing a role and content, forming the conversation. |

Message Object Structure

| Field | Type | Description |
| --- | --- | --- |
| role | String | One of system, user, or assistant. |
| content | Mixed | Can be a string or an array (for multimodal content). |

Chat API Roles: system, user, assistant

When using the SambaNova Chat API, messages are structured using roles.

| Role | Represents | Purpose |
| --- | --- | --- |
| system | The app or developer | Sets behavior, tone, and context |
| user | The end user (you) | Asks a question or gives input |
| assistant | The AI | Responds to the user |

Text Example (Python SDK format):

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of Italy?"},
    {"role": "assistant", "content": "The capital of Italy is Rome."}
]

The same field in a raw JSON request body; a minimal request needs only a user message:

"messages": [
  { "role": "user", "content": "What is the capital of Italy?" }
]

Multimodal Example:

"messages": [
  {
    "role": "user",
    "content": [
      { "type": "text", "text": "What's in this image?" },
      { "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }
    ]
  }
]

Chat API Parameters Explained

When calling the model with client.chat.completions.create(), you can use several parameters to control how the AI responds:

Optional Parameters

| Parameter | Type | Description | Values / Default |
| --- | --- | --- | --- |
| max_tokens | Integer | Maximum number of tokens to generate. The total is limited by the model’s context length. | Default: None |
| temperature | Float | Controls randomness in output. Higher values (e.g., 0.8) give more creative, diverse responses; lower values (e.g., 0.2) make output more focused. | Range: 0 to 1 |
| top_p | Float | Controls nucleus sampling: the model considers only the most probable tokens whose cumulative probability reaches top_p. | Range: 0 to 1 |
| top_k | Integer | Limits generation to the k highest-probability tokens. | Range: 1 to 100 |
| stop | String, Array, Null | Up to 4 sequences at which the API stops generating further tokens. Useful for controlling output structure. | Default: null |
| stream | Boolean, Null | If true, streams the response token by token; if false, returns the full completion at once. | Default: false |
| stream_options | Object, Null | Additional options for streaming mode, e.g. { "include_usage": true } adds token usage to the stream. | Default: null |
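A sketch of a request payload combining these optional parameters (the values are illustrative, not recommendations):

```python
# Illustrative payload for the OpenAI-compatible chat completions call;
# pass these as keyword arguments, e.g. client.chat.completions.create(**params).
params = {
    "model": "Meta-Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "Write a haiku about rivers."}],
    "max_tokens": 64,      # cap the length of the completion
    "temperature": 0.7,    # lower = more focused, higher = more varied
    "top_p": 0.9,          # nucleus sampling cutoff
    "stop": ["\n\n"],      # stop at the first blank line (up to 4 sequences)
    "stream": False,       # return the full completion at once
}
```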

Code example:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",
    api_key="YOUR SAMBACLOUD API KEY"
)

completion = client.chat.completions.create(
    model="Meta-Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "Answer the question in a couple of sentences."},
        {"role": "user", "content": "Share a happy story with me"}
    ]
)

# The reply text lives on the assistant message's content field.
print(completion.choices[0].message.content)
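With stream=True, the client instead yields chunks whose delta.content carries the next text fragment. The consumption pattern can be sketched offline; the fragments list below stands in for what a real stream would deliver:

```python
# Sketch of consuming a streamed completion. In a real call you would
# iterate client.chat.completions.create(..., stream=True) and read each
# chunk.choices[0].delta.content; here the fragments are simulated.
def collect_stream(fragments):
    """Concatenate non-empty delta fragments into the full reply."""
    return "".join(f for f in fragments if f)

fragments = ["The", " capital", " of", " Italy", " is", " Rome", ".", None]
print(collect_stream(fragments))  # The capital of Italy is Rome.
```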

Function Calling (Tool Calling)

This is a powerful feature that lets the model request calls to external functions you describe in the request. The model does not execute anything itself: it returns the function name and arguments, and your application runs the function and sends the result back.

Function Calling Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| tools | Array | List of functions the model can call. |
| response_format | Object | Forces structured output (e.g., valid JSON). |
| tool_choice | String / Object | Controls if/which function is called: auto, required, or a specific function. |

Example: Using tools

{
  "model": "Meta-Llama-3.3-70B-Instruct",
  "messages": [
    { "role": "user", "content": "What will the weather be in Pune tomorrow?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Fetch weather info for a city and date.",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string", "description": "City name" },
            "date": { "type": "string", "description": "Date in YYYY-MM-DD" }
          },
          "required": ["city", "date"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
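When the model decides to use a tool, its response carries tool_calls whose arguments arrive as a JSON string; your code runs the matching function and typically sends the result back in a follow-up message. A local dispatch sketch, with a hypothetical get_weather implementation:

```python
import json

# Hypothetical local implementation; a real one would query a weather API.
def get_weather(city, date):
    return {"city": city, "date": date, "forecast": "sunny"}

AVAILABLE_TOOLS = {"get_weather": get_weather}

# Stands in for completion.choices[0].message.tool_calls[0].function.arguments,
# which the API returns as a JSON-encoded string.
raw_args = '{"city": "Pune", "date": "2025-01-15"}'

result = AVAILABLE_TOOLS["get_weather"](**json.loads(raw_args))
print(result)  # {'city': 'Pune', 'date': '2025-01-15', 'forecast': 'sunny'}
```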

Values for tool_choice

| Value | Description |
| --- | --- |
| auto | Default. The model decides when to use the function. |
| required | Forces the model to use a function, not just reply with plain text. |
| {"type": "function", "function": {"name": "get_weather"}} | Forces a specific function call. |

Example with response_format

To force the model to return only structured JSON output:

"response_format": {
  "type": "json_object"
}

Or to match a custom schema:

"response_format": {
  "type": "json_schema",
  "json_schema": {
    "type": "object",
    "properties": {
      "answer": { "type": "string" },
      "source": { "type": "string" }
    },
    "required": ["answer"]
  }
}
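When response_format forces JSON, the assistant message content is a JSON string you can parse directly; raw below stands in for completion.choices[0].message.content (the values are illustrative):

```python
import json

# `raw` stands in for completion.choices[0].message.content when the
# request sets response_format to a JSON mode.
raw = '{"answer": "Rome", "source": "general knowledge"}'
data = json.loads(raw)
print(data["answer"])  # Rome
```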

Summary:

| Feature | SambaNova |
| --- | --- |
| Endpoint | https://api.sambanova.ai/v1/chat/completions |
| Model examples | Llama-3.3-Swallow-70B-Instruct-v0.4, Meta-Llama-3.3-70B-Instruct, etc. |
| Auth | Bearer YOUR_SAMBANOVA_API_KEY |
| Streaming | Supported (set "stream": true) |
| Function calling | Supported (via tools, tool_choice) |