What is a Large Language Model?
A Large Language Model (LLM) is a type of artificial intelligence trained to understand and generate human language. LLMs like Meta-Llama-3.3-70B-Instruct and
Llama-4-Maverick-17B-128E-Instruct learn from massive amounts of text data (books, websites, articles) to:
- Answer questions
- Write code
- Summarize content
- Translate text
- And more!
Core Concepts Behind How LLMs Work
1. Tokens
- LLMs don’t read sentences like we do—they break them into tokens.
- One token ≈ a word or part of a word.
- For example:
"I love Python!"→["I", " love", " Python", "!"]
2. Parameters
- LLMs have millions or billions of parameters (think of them as memory or knobs).
- These are what the model adjusts during training to “learn” the structure of language.
3. Prompting
- You give the model an input (prompt), and it replies based on what it has learned.
- You can add context to guide its response (like previous conversation or instructions).
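Concretely, conversation context is just a growing list of messages: appending each assistant reply before the next user turn is what lets the model "remember" earlier exchanges. A minimal sketch (not tied to any particular API call):

```python
# Context is carried as an ordered list of role/content messages
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of Italy?"},
]

# After each model reply, append it so the next turn sees the full history
messages.append({"role": "assistant", "content": "The capital of Italy is Rome."})
messages.append({"role": "user", "content": "And what is its population?"})

print(len(messages))  # the prompt now carries all four turns
```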
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| `model` | String | The name of the model to use (e.g., `Meta-Llama-3.3-70B-Instruct`). |
| `messages` | Array | Array of message objects, each containing a `role` and `content`, that make up the conversation. |
Message Object Structure
| Field | Type | Description |
|---|---|---|
| `role` | String | One of `system`, `user`, or `assistant`. |
| `content` | Mixed | A string, or an array of content parts (for multimodal input). |
Chat API Roles: system, user, assistant
When using the SambaNova Chat API, messages are structured using roles.

| Role | Represents | Purpose |
|---|---|---|
| `system` | The app or developer | Sets behavior, tone, and context |
| `user` | The end user (you) | Asks a question or gives input |
| `assistant` | The AI | Responds to the user |
Text Example:

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of Italy?"},
    {"role": "assistant", "content": "The capital of Italy is Rome."},
]
```

A minimal raw JSON request body needs only a single user message:

```json
"messages": [
    { "role": "user", "content": "What is the capital of Italy?" }
]
```
Multimodal Example:

```json
"messages": [
    {
        "role": "user",
        "content": [
            { "type": "text", "text": "What's in this image?" },
            { "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }
        ]
    }
]
```
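The base64 data URL in a multimodal message is typically built from raw image bytes. A minimal helper sketch (the function name and the placeholder bytes are illustrative; in practice the bytes come from reading an image file):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    # Encode raw image bytes as a base64 data URL for an image_url content part
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{b64}"

# Placeholder bytes stand in for a real image read with open(path, "rb").read()
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": to_data_url(b"\x89PNG...")}},
    ],
}
```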
Chat API Parameters Explained
When calling the model with client.chat.completions.create(), you can use several parameters to control how the AI responds:
Optional Parameters
| Parameter | Type | Description | Values / Default |
|---|---|---|---|
| `max_tokens` | Integer | Maximum number of tokens to generate. The total is limited by the model's context length. | Default: None |
| `temperature` | Float | Controls randomness in output. Higher values (e.g., 0.8) give more creative, diverse responses; lower values (e.g., 0.2) make output more focused. | Range: 0 to 1 |
| `top_p` | Float | Controls nucleus sampling. The model considers only the most probable tokens whose cumulative probability reaches `top_p`. | Range: 0 to 1 |
| `top_k` | Integer | Limits sampling to the highest-probability tokens when generating text. | Range: 1 to 100 |
| `stop` | String, Array, Null | Specifies up to 4 sequences at which the API stops generating further tokens. Useful for controlling output structure. | Default: null |
| `stream` | Boolean, Null | If true, enables response streaming (token by token). If false, returns the full completion at once. | Default: false |
| `stream_options` | Object, Null | Additional options for streaming mode. Example: `{ "include_usage": true }` adds token usage to the stream. | Default: null |
Code example:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",
    api_key="YOUR_SAMBACLOUD_API_KEY",
)

completion = client.chat.completions.create(
    model="Meta-Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "Answer the question in a couple of sentences."},
        {"role": "user", "content": "Share a happy story with me"},
    ],
)

print(completion.choices[0].message.content)
```
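With stream=True, the response arrives as chunks whose choices[0].delta.content holds an incremental piece of text (often None for the first and last chunks). A sketch of assembling those pieces; the helper name and the sample deltas are illustrative, and a real loop would iterate over the API response as noted in the comment:

```python
def assemble_stream(content_deltas):
    # Join the incremental delta.content strings a streamed response yields,
    # skipping the None entries that bracket the stream
    return "".join(piece for piece in content_deltas if piece)

# A real streaming loop would look like:
#   for chunk in client.chat.completions.create(..., stream=True):
#       print(chunk.choices[0].delta.content or "", end="", flush=True)
print(assemble_stream([None, "The ", "capital ", "of Italy ", "is Rome.", None]))
```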
Function Calling (Tool Calling)
Function calling lets the model request that your application run an external function: you describe the available functions in the request, and instead of (or alongside) plain text, the model replies with the function name and arguments for your code to execute.
Function Calling Parameters
| Parameter | Type | Description |
|---|---|---|
| `tools` | Array | List of functions the model can call. |
| `response_format` | Object | Forces structured output (e.g., valid JSON). |
| `tool_choice` | String / Object | Controls if/which function is called: `auto`, `required`, or a specific function. |
Example: Using tools

```json
{
    "model": "Meta-Llama-3.3-70B-Instruct",
    "messages": [
        { "role": "user", "content": "What will the weather be in Pune tomorrow?" }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Fetch weather info for a city and date.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": { "type": "string", "description": "City name" },
                        "date": { "type": "string", "description": "Date in YYYY-MM-DD" }
                    },
                    "required": ["city", "date"]
                }
            }
        }
    ],
    "tool_choice": "auto"
}
```
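Note that the model never executes get_weather itself; it returns the function name and a JSON string of arguments, and your code performs the call. A minimal dispatch sketch (the local get_weather implementation and its canned result are hypothetical):

```python
import json

def get_weather(city: str, date: str) -> dict:
    # Hypothetical local implementation; a real one would query a weather API
    return {"city": city, "date": date, "forecast": "sunny"}

TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(name: str, arguments: str) -> str:
    # The model supplies arguments as a JSON string: decode, call, re-encode
    args = json.loads(arguments)
    result = TOOLS[name](**args)
    return json.dumps(result)

# Simulating the tool call the model might return for the request above
print(dispatch_tool_call("get_weather", '{"city": "Pune", "date": "2025-01-01"}'))
```

The JSON string returned here would then be sent back to the model as a tool result message so it can compose its final answer.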
Values for tool_choice
| Value | Description |
|---|---|
| `auto` | Default. The model decides when to use the function. |
| `required` | Forces the model to use a function, not just reply with plain text. |
| `{"type":"function","function":{"name":"get_weather"}}` | Forces a specific function call. |
Example with response_format
To force the model to return only structured JSON output:

```json
"response_format": {
    "type": "json_object"
}
```
Or to match a custom schema:

```json
"response_format": {
    "type": "json_schema",
    "json_schema": {
        "type": "object",
        "properties": {
            "answer": { "type": "string" },
            "source": { "type": "string" }
        },
        "required": ["answer"]
    }
}
```
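Even with response_format set, it is good practice to parse and validate the reply in your own code before using it. A small sketch (the helper name is illustrative; the field check mirrors the schema's required list above):

```python
import json

def parse_structured_reply(raw: str) -> dict:
    # With a JSON response_format, the reply text should be valid JSON
    data = json.loads(raw)
    if "answer" not in data:
        raise ValueError("missing required field 'answer'")
    return data

print(parse_structured_reply('{"answer": "Rome", "source": "geography"}'))
```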
Summary:

| Feature | SambaNova |
|---|---|
| Endpoint | `https://api.sambanova.ai/v1/chat/completions` |
| Model Examples | `Llama-3.3-Swallow-70B-Instruct-v0.4`, `Meta-Llama-3.3-70B-Instruct`, etc. |
| Auth | `Bearer YOUR_SAMBANOVA_API_KEY` |
| Streaming | Supported (set `"stream": true`) |
| Function Calling | Supported (via `tools`, `tool_choice`) |