Chat
Create chat completions
Creates a model response for the given chat conversation.
POST https://api.sambanovacloud.com/v1/chat/completions
Request body
Reference
Parameter | Definition | Type | Values |
---|---|---|---|
model | The name of the model to query. | string | Refer to the Model List. |
messages | A list of messages comprising the conversation so far. | array of objects | Array of message objects, each containing: • role (string, required): The role of the message author. Choice between: system, user, or assistant. • content (required): The contents of the message. For text-only messages, provide the content as a simple string (e.g., "content": "Answer the question in a couple sentences."). For multimodal content, use an array of objects to represent different content types, such as text and image. Example format for multimodal: [{ "type": "text", "text": "What's in this image?" }, { "type": "image_url", "image_url": { "url": "base64 encoded string of image" } }] |
max_tokens | The maximum number of tokens to generate. | integer | The total length of input tokens and generated tokens is limited by the model’s context length. Default value is the context length of the model. |
temperature | Determines the degree of randomness in the response. | float | The temperature value can be between 0 and 1. |
top_p | The top_p (nucleus) parameter dynamically adjusts the number of choices for each predicted token based on the cumulative probabilities. | float | The value can be between 0 and 1. |
top_k | The top_k parameter limits the number of choices for the next predicted word or token. | int | The top_k value can be between 1 and 100. |
stop | Up to 4 sequences where the API will stop generating further tokens. | string, array or null | Default is null. |
stream | If set, partial message deltas will be sent. | boolean or null | Default is false. |
stream_options | Options for the streaming response. Only set this when stream: true is set. | object or null | Default is null. Available option: include_usage (boolean). |
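These parameters map onto a standard HTTP POST to the endpoint above. The sketch below shows a minimal non-streaming call; the Bearer Authorization header and the SAMBANOVA_API_KEY environment variable are assumptions for illustration, not details taken from this reference.

```python
import os
import requests

# Minimal sketch of a non-streaming chat completion request.
# Assumption: the API accepts a Bearer token in the Authorization header;
# SAMBANOVA_API_KEY is a hypothetical environment variable used for illustration.
url = "https://api.sambanovacloud.com/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {os.environ['SAMBANOVA_API_KEY']}",
    "Content-Type": "application/json",
}
payload = {
    "model": "Meta-Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "system", "content": "Answer the question in a couple sentences."},
        {"role": "user", "content": "Share a happy story with me"},
    ],
    "max_tokens": 800,
}

response = requests.post(url, headers=headers, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```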
Sample Request (Text Model)
Below is a sample request body for a streaming response.
{
"messages": [
{"role": "system", "content": "Answer the question in a couple sentences."},
{"role": "user", "content": "Share a happy story with me"}
],
"max_tokens": 800,
"stop": ["[INST", "[INST]", "[/INST]", "[/INST]"],
"model": "Meta-Llama-3.1-8B-Instruct",
"stream": true,
"stream_options": {"include_usage": true}
}
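When stream is set to true, the response arrives incrementally. The sketch below assumes server-sent-events framing (lines prefixed with "data: ", terminated by a "[DONE]" sentinel) and the same hypothetical Bearer auth as above; it prints delta content as it arrives and the usage statistics from the final chunk.

```python
import json
import os
import requests

# Sketch of consuming a streamed response. Assumptions: SSE-style framing
# ("data: ..." lines, "[DONE]" sentinel) and a hypothetical SAMBANOVA_API_KEY
# environment variable holding the API key.
url = "https://api.sambanovacloud.com/v1/chat/completions"
headers = {"Authorization": f"Bearer {os.environ['SAMBANOVA_API_KEY']}"}
payload = {
    "model": "Meta-Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Share a happy story with me"}],
    "stream": True,
    "stream_options": {"include_usage": True},
}

with requests.post(url, headers=headers, json=payload, stream=True, timeout=60) as r:
    r.raise_for_status()
    for line in r.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        # Delta chunks carry partial content; the final chunk carries usage.
        for choice in chunk.get("choices", []):
            print(choice.get("delta", {}).get("content", ""), end="", flush=True)
        if chunk.get("usage"):
            print("\n", chunk["usage"])
```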
Sample Request (Image)
Below is a sample request body for a multimodal request that combines text and an image. Replace <base64_image> with the base64-encoded string of your image.
{
  "model": "Llama-3.2-11B-Vision-Instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,<base64_image>"
          }
        },
        {
          "type": "text",
          "text": "Summarize"
        }
      ]
    }
  ],
  "max_tokens": 300,
  "temperature": 0.7,
  "top_p": 0.1,
  "top_k": 40,
  "stop": ["<eot>"]
}
The max_tokens, temperature, top_p, top_k, and stop parameters are optional. When omitted, max_tokens defaults to 1000, temperature and top_p default to 0, and top_k defaults to 100. The values shown above are illustrative; temperature and top_p accept values between 0 and 1, and top_k between 1 and 100.
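To build the image_url payload from a local file, the image bytes are base64-encoded and embedded as a data URL. A small sketch follows; the file path is illustrative, not part of this reference.

```python
import base64
import json

# Sketch: build a multimodal message from a local JPEG.
# "photo.jpg" is an illustrative path.
with open("photo.jpg", "rb") as f:
    base64_image = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": "Llama-3.2-11B-Vision-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
    "max_tokens": 300,
}
print(json.dumps(payload)[:200])  # preview the serialized request body
```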
Response
The API returns a chat completion object, or a streamed sequence of chat completion chunk objects if the request is streamed.
Chat completion object
Represents a chat completion response returned by the model, based on the provided input.
Reference
Property | Type | Description |
---|---|---|
id | string | A unique identifier for the chat completion. |
choices | array | A list containing a single chat completion. |
created | integer | The Unix timestamp (in seconds) of when the chat completion was created. |
model | string | The model used to generate the completion. |
object | string | The object type, which is always chat.completion. |
usage | object | An optional field present when stream_options: {"include_usage": true} is set. When present, it contains a null value except for the last chunk, which contains the token usage statistics for the entire request. Values returned are: • throughput_after_first_token: the rate (tokens per second) at which output tokens are generated after the first token has been delivered. • time_to_first_token: the time (in seconds) the model takes to generate the first token. • model_execution_time: the time (in seconds) to generate a complete response or all tokens. • output_tokens_count: number of tokens generated in the response. • input_tokens_count: number of tokens in the input prompt. • total_tokens_count: the sum of input and output tokens. |
Sample
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1677652288,
"model": "Llama-3-8b-chat",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "\n\nHello there, how may I assist you today?",
},
"logprobs": null,
"finish_reason": "stop"
}]
}
Chat completion chunk object
Represents a streamed chunk of a chat completion response returned by the model, based on the provided input.
Reference
Property | Type | Description |
---|---|---|
id | string | A unique identifier for the chat completion. |
choices | array | A list containing a single chat completion. |
created | integer | The Unix timestamp (in seconds) of when the chat completion was created. Each chunk has the same timestamp. |
model | string | The model used to generate the completion. |
object | string | The object type, which is always chat.completion.chunk. |
usage | object | An optional field present when stream_options: {"include_usage": true} is set. When present, it contains a null value except for the last chunk, which contains the token usage statistics for the entire request. Values returned are: • throughput_after_first_token: the rate (tokens per second) at which output tokens are generated after the first token has been delivered. • time_to_first_token: the time (in seconds) the model takes to generate the first token. • model_execution_time: the time (in seconds) to generate a complete response or all tokens. • output_tokens_count: number of tokens generated in the response. • input_tokens_count: number of tokens in the input prompt. • total_tokens_count: the sum of input and output tokens. • queue_time: the time (in seconds) a request spends waiting in the queue before being processed by the model. |
Sample
{
"id": "chatcmpl-123",
"object": "chat.completion.chunk",
"created": 1694268190,
"model": "Llama-3-8b-chat",
"system_fingerprint": "fp_44709d6fcb",
"choices": [
{
"index": 0,
"delta": {},
"logprobs": null,
"finish_reason": "stop"
}
]
}
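A typical client concatenates the delta fields of successive chunks to rebuild the full assistant message, and reads usage from the final chunk when include_usage was requested. A minimal sketch, where chunks is assumed to be an iterable of parsed chunk objects (for example, from the streaming loop shown earlier):

```python
# Sketch: rebuild the full reply from a stream of parsed chunk objects.
# `chunks` is assumed to be an iterable of dicts shaped like the sample above.
def collect_stream(chunks):
    text_parts = []
    usage = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                text_parts.append(delta["content"])
        # Only the final chunk carries non-null usage statistics.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage
```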
Errors
If a request fails, the response body provides a JSON object with details about the error.
For more information on errors, refer to the API Error Codes article.
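As a hedged example of surfacing that error object on the client side (the exact fields of the error body are defined in the API Error Codes article, not here), a client might do the following:

```python
import requests

# Sketch: surface the JSON error body when a request fails.
# The structure of the error object is defined in the API Error Codes article;
# this code simply prints whatever JSON the API returns.
def post_chat(url, headers, payload):
    response = requests.post(url, headers=headers, json=payload, timeout=60)
    if response.status_code != 200:
        try:
            print("API error:", response.json())
        except ValueError:
            print("API error (non-JSON body):", response.text)
        response.raise_for_status()
    return response.json()
```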