SambaNova Cloud API Reference

Chat

Create chat completions

Creates a model response for the given chat conversation.

POST https://api.sambanovacloud.com/v1/chat/completions

Request body

Reference

model (string)
  The name of the model to query. Refer to the Model List.

messages (array of objects)
  A list of messages comprising the conversation so far. Each message object contains:
  - role (string, required): The role of the message author. One of system, user, or assistant.
  - content (required): The contents of the message.
    - For text-only messages, provide the content as a simple string (e.g., "content": "Answer the question in a couple sentences.").
    - For multimodal content, use an array of objects to represent different content types, such as text and image. Example format for multimodal:
      [{ "type": "text", "text": "What's in this image?" }, { "type": "image_url", "image_url": { "url": "base64 encoded string of image" } }]

max_tokens (integer)
  The maximum number of tokens to generate. The total length of input tokens and generated tokens is limited by the model's context length. The default value is the context length of the model.

temperature (float)
  Determines the degree of randomness in the response. The value can be between 0 and 1.

top_p (float)
  The nucleus sampling parameter; dynamically adjusts the number of choices for each predicted token based on their cumulative probability. The value can be between 0 and 1.

top_k (integer)
  Limits the number of choices considered for the next predicted token. The value can be between 1 and 100.

stop (string, array, or null)
  Up to 4 sequences where the API will stop generating further tokens. Default is null.

stream (boolean or null)
  If set, partial message deltas will be sent. Default is false.

stream_options (object or null)
  Options for the streaming response. Only set this when stream is true. Default is null.
  Available option:
  - include_usage (boolean)

tools (array)
  A list of tools the model may call. Currently, only functions are supported as a tool. Each tool has the form:

  {
    "type": "function",
    "function": {
      "name": "string",
      "description": "string",
      "parameters": {
        "type": "object",
        "properties": {
          "parameter_name_1": {
            "type": "data_type",
            "description": "Description of parameter_name_1"
          },
          "parameter_name_2": {
            "type": "data_type",
            "description": "Description of parameter_name_2"
          }
        },
        "required": ["parameter_name_1", "parameter_name_2"]
      }
    }
  }

response_format (object)
  Set response_format to {"type": "json_object"} in your request to ensure that the model outputs valid JSON. If the model is not able to generate valid JSON, an error is returned.

tool_choice (string or object)
  Controls which (if any) tool is called by the model. When sending a request to SN Cloud, include the function definition in the tools parameter and set tool_choice to one of:
  - auto: Allows the model to choose between generating a message or calling a function. This is the default when the field is not specified.
  - required: Forces the model to generate a function call. The model will always select one or more functions to call.
  - To enforce a specific function call, set tool_choice = {"type": "function", "function": {"name": "solve_quadratic"}}. In this case, the model will only use the specified function. A complete request using this form is sketched below.
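
To illustrate tool calling end to end, below is a minimal Python sketch, not an official client. It assumes bearer-token authentication with an API key read from a SAMBANOVA_API_KEY environment variable, and it defines solve_quadratic (the function named in the tool_choice description above) with a hypothetical parameter schema; substitute a tool-capable model from the Model List as needed.

import json
import os
import requests

payload = {
    "model": "Meta-Llama-3.1-8B-Instruct",  # substitute a tool-capable model
    "messages": [{"role": "user", "content": "Solve x^2 - 5x + 6 = 0"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "solve_quadratic",  # hypothetical example function
            "description": "Solve a quadratic equation ax^2 + bx + c = 0",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number", "description": "Coefficient of x^2"},
                    "b": {"type": "number", "description": "Coefficient of x"},
                    "c": {"type": "number", "description": "Constant term"}
                },
                "required": ["a", "b", "c"]
            }
        }
    }],
    # Force a call to the named function instead of a free-form reply.
    "tool_choice": {"type": "function", "function": {"name": "solve_quadratic"}}
}

response = requests.post(
    "https://api.sambanovacloud.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['SAMBANOVA_API_KEY']}"},
    json=payload,
)
print(json.dumps(response.json(), indent=2))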

Sample Request (Text Model)

Below is a sample request body for a streaming response.

{
   "messages": [
      {"role": "system", "content": "Answer the question in a couple sentences."},
      {"role": "user", "content": "Share a happy story with me"}
   ],
   "max_tokens": 800,
   "stop": ["[INST", "[INST]", "[/INST]", "[/INST]"],
   "model": "Meta-Llama-3.1-8B-Instruct",
   "stream": true, 
   "stream_options": {"include_usage": true}
}
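
A minimal Python sketch of sending this body and reading the stream is shown below. It assumes bearer-token authentication via a SAMBANOVA_API_KEY environment variable and the common data:-prefixed server-sent-event framing ending with [DONE]; treat both as assumptions rather than guarantees about the wire format.

import json
import os
import requests

payload = {
    "messages": [
        {"role": "system", "content": "Answer the question in a couple sentences."},
        {"role": "user", "content": "Share a happy story with me"}
    ],
    "max_tokens": 800,
    "model": "Meta-Llama-3.1-8B-Instruct",
    "stream": True,
    "stream_options": {"include_usage": True}
}

with requests.post(
    "https://api.sambanovacloud.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['SAMBANOVA_API_KEY']}"},
    json=payload,
    stream=True,
) as response:
    for line in response.iter_lines():
        if not line:
            continue
        data = line.decode("utf-8").removeprefix("data: ")
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        # Each chunk's delta carries an increment of the assistant message.
        for choice in chunk.get("choices", []):
            print(choice["delta"].get("content", ""), end="", flush=True)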

Sample Request (Image)

Below is a sample request body for an image model. The max_tokens (default 1000), temperature (default 0), top_p (default 0), top_k (default 100), and stop parameters are optional; temperature and top_p take values between 0 and 1, and top_k takes values between 1 and 100. The values shown here are illustrative.

{
    "model": "Llama-3.2-11B-Vision-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What's in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/jpeg;base64,<base64_encoded_image>"
            }
          },
          {
            "type": "text",
            "text": "Summarize"
          }
        ]
      }
    ],
    "max_tokens": 300,
    "temperature": 0.7,
    "top_p": 0.1,
    "top_k": 40,
    "stop": ["<eot>"]
}
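
In the url field above, <base64_encoded_image> stands for the base64-encoded bytes of the image. A minimal Python sketch of building that data URL, assuming a hypothetical local JPEG at photo.jpg:

import base64

# Read a local image and encode it as a base64 data URL (hypothetical path).
with open("photo.jpg", "rb") as f:
    base64_image = base64.b64encode(f.read()).decode("utf-8")

image_part = {
    "type": "image_url",
    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}
}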

Response

The API returns a chat completion object, or a streamed sequence of chat completion chunk objects if the request is streamed.

Chat completion object

Represents a chat completion response returned by the model, based on the provided input.

Reference

id (string)
  A unique identifier for the chat completion.

choices (array)
  A list containing a single chat completion.

created (integer)
  The Unix timestamp (in seconds) of when the chat completion was created.

model (string)
  The model used to generate the completion.

object (string)
  The object type, which is always chat.completion.

usage (object)
  Token usage statistics for the request. Values returned are:
  - throughput_after_first_token: The rate (in tokens per second) at which output tokens are generated after the first token has been delivered.
  - time_to_first_token: The time (in seconds) the model takes to generate the first token.
  - model_execution_time: The time (in seconds) to generate the complete response, or all tokens.
  - output_tokens_count: The number of tokens generated in the response.
  - input_tokens_count: The number of tokens in the input prompt.
  - total_tokens_count: The sum of input and output tokens.

Sample

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "Llama-3-8b-chat",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?",
    },
    "logprobs": null,
    "finish_reason": "stop"
  }]
}
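
A minimal Python sketch of reading the reply out of this object, assuming completion holds the parsed JSON above (e.g., response.json()):

# completion is the parsed chat completion object shown above
answer = completion["choices"][0]["message"]["content"]
finish_reason = completion["choices"][0]["finish_reason"]
print(answer)         # Hello there, how may I assist you today?
print(finish_reason)  # stop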

Chat completion chunk object

Represents a streamed chunk of a chat completion response returned by the model, based on the provided input.

Reference

id (string)
  A unique identifier for the chat completion.

choices (array)
  A list containing a single chat completion.

created (integer)
  The Unix timestamp (in seconds) of when the chat completion was created. Each chunk has the same timestamp.

model (string)
  The model used to generate the completion.

object (string)
  The object type, which is always chat.completion.chunk.

usage (object)
  An optional field, present when stream_options: {"include_usage": true} is set. When present, it contains a null value on every chunk except the last, which contains the token usage statistics for the entire request. Values returned are:
  - throughput_after_first_token: The rate (in tokens per second) at which output tokens are generated after the first token has been delivered.
  - time_to_first_token: The time (in seconds) the model takes to generate the first token.
  - model_execution_time: The time (in seconds) to generate the complete response, or all tokens.
  - output_tokens_count: The number of tokens generated in the response.
  - input_tokens_count: The number of tokens in the input prompt.
  - total_tokens_count: The sum of input and output tokens.
  - queue_time: The time (in seconds) a request spends waiting in the queue before being processed by the model.

Sample

{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "Llama-3-8b-chat",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "logprobs": null,
      "finish_reason": "stop"
    }
  ]
}
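
Because usage is null on every chunk except the last, a streaming client typically accumulates deltas as they arrive and keeps the final non-null usage. A minimal Python sketch, assuming chunks is an iterable of parsed chunk objects like the one above:

text_parts = []
usage = None
for chunk in chunks:
    for choice in chunk.get("choices", []):
        text_parts.append(choice["delta"].get("content", ""))
    if chunk.get("usage"):
        usage = chunk["usage"]  # only the last chunk carries real values

print("".join(text_parts))
if usage is not None:
    print("total tokens:", usage["total_tokens_count"])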

Errors

If a request fails, the response body provides a JSON object with details about the error.
For more information on errors, refer to the API Error Codes article.
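
A minimal Python sketch of surfacing that error object, assuming response comes from one of the requests.post calls above; the exact error fields are described in the API Error Codes article.

if not response.ok:
    try:
        print("Request failed:", response.status_code, response.json())
    except ValueError:  # body was not valid JSON
        print("Request failed:", response.status_code, response.text)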