Chat
Create chat completions
Creates a model response for the given chat conversation.
POST https://api.sambanovacloud.com/v1/chat/completions
Request body
Reference
Parameter | Definition | Type | Values |
---|---|---|---|
model | The name of the model to query. | string | Refer to the Model List. |
messages | A list of messages comprising the conversation so far. | array of objects | Array of message objects, each containing: • role (string, required): the role of the message author; one of system, user, or assistant. • content (string, required): the contents of the message. |
max_tokens | The maximum number of tokens to generate. | integer | The combined length of input tokens and generated tokens is limited by the model's context length. Defaults to the model's context length. |
temperature | Determines the degree of randomness in the response. | float | Between 0 and 1. |
top_p | The nucleus sampling parameter; dynamically adjusts the number of candidate tokens based on their cumulative probability. | float | Between 0 and 1. |
top_k | Limits the number of candidate tokens considered for the next predicted word or token. | integer | Between 1 and 100. |
stop | Up to 4 sequences where the API will stop generating further tokens. | string, array, or null | Default is null. |
stream | If set, partial message deltas are sent as they become available. | boolean or null | Default is false. |
stream_options | Options for the streaming response. Only set this when stream: true. | object or null | Default is null. Available option: include_usage (boolean). |
Sample
Below is a sample request body for a streaming response.
{
"messages": [
{"role": "system", "content": "Answer the question in a couple sentences."},
{"role": "user", "content": "Share a happy story with me"}
],
"max_tokens": 800,
"stop": ["[INST", "[INST]", "[/INST]", "[/INST]"],
"model": "Meta-Llama-3.1-8B-Instruct",
"stream": true,
"stream_options": {"include_usage": true}
}
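The body above can be sent with any HTTP client. The sketch below uses Python's requests library and assumes the API key is passed as a Bearer token (the SAMBANOVA_API_KEY environment variable name is illustrative) and that the streamed response arrives as server-sent events whose data: lines each carry a chat completion chunk object.
```python
import json
import os
import requests

url = "https://api.sambanovacloud.com/v1/chat/completions"
headers = {
    # Assumption: the key is sent as a Bearer token; the env var name is illustrative.
    "Authorization": f"Bearer {os.environ['SAMBANOVA_API_KEY']}",
    "Content-Type": "application/json",
}
body = {
    "messages": [
        {"role": "system", "content": "Answer the question in a couple sentences."},
        {"role": "user", "content": "Share a happy story with me"},
    ],
    "max_tokens": 800,
    "stop": ["[INST", "[INST]", "[/INST]", "[/INST]"],
    "model": "Meta-Llama-3.1-8B-Instruct",
    "stream": True,
    "stream_options": {"include_usage": True},
}

with requests.post(url, headers=headers, json=body, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        # Each event line looks like "data: {...}"; blank keep-alive lines are skipped.
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        print(chunk)  # a chat completion chunk object (see Response below)
```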
Response
The API returns a chat completion object, or a streamed sequence of chat completion chunk objects if the request is streamed.
Chat completion object
Represents a chat completion response returned by the model, based on the provided input.
Reference
Property | Type | Description |
---|---|---|
id | string | A unique identifier for the chat completion. |
choices | array | A list of chat completion choices. |
created | integer | The Unix timestamp (in seconds) of when the chat completion was created. |
model | string | The model used to generate the completion. |
object | string | The object type, which is always chat.completion. |
usage | object | Token usage statistics for the request. Values returned are: • throughput_after_first_token: the rate (tokens per second) at which output tokens are generated after the first token has been delivered. • time_to_first_token: the time (in seconds) the model takes to generate the first token. • model_execution_time: the time (in seconds) to generate the complete response, i.e. all tokens. • output_tokens_count: the number of tokens generated in the response. • input_tokens_count: the number of tokens in the input prompt. • total_tokens_count: the sum of input and output tokens. • queue_time: the time (in seconds) a request spends waiting in the queue before being processed by the model. |
Sample
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1677652288,
"model": "Llama-3-8b-chat",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "\n\nHello there, how may I assist you today?",
},
"logprobs": null,
"finish_reason": "stop"
}]
}
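For a non-streaming request (stream omitted or false), the response body is a single object like the one above. The short sketch below, with the sample hard-coded in place of a parsed response body, shows how the fields are typically read.
```python
# `completion` stands in for the parsed response body of a non-streaming request
# (for example, response.json() with the requests library).
completion = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "model": "Llama-3-8b-chat",
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": "\n\nHello there, how may I assist you today?",
        },
        "logprobs": None,
        "finish_reason": "stop",
    }],
}

choice = completion["choices"][0]
print(completion["id"])              # unique identifier for the completion
print(completion["model"])           # model used to generate the completion
print(choice["message"]["content"])  # the assistant's reply text
print(choice["finish_reason"])       # "stop" when generation ended normally
```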
Chat completion chunk object
Represents a streamed chunk of a chat completion response returned by the model, based on the provided input.
Reference
Property | Type | Description |
---|---|---|
id | string | A unique identifier for the chat completion. |
choices | array | A list of chat completion choices. |
created | integer | The Unix timestamp (in seconds) of when the chat completion was created. Each chunk has the same timestamp. |
model | string | The model used to generate the completion. |
object | string | The object type, which is always chat.completion.chunk. |
usage | object | An optional field present when stream_options: {"include_usage": true} is set. When present, it contains a null value except for the last chunk, which contains the token usage statistics for the entire request. Values returned are: • throughput_after_first_token: the rate (tokens per second) at which output tokens are generated after the first token has been delivered. • time_to_first_token: the time (in seconds) the model takes to generate the first token. • model_execution_time: the time (in seconds) to generate the complete response, i.e. all tokens. • output_tokens_count: the number of tokens generated in the response. • input_tokens_count: the number of tokens in the input prompt. • total_tokens_count: the sum of input and output tokens. • queue_time: the time (in seconds) a request spends waiting in the queue before being processed by the model. |
Sample
{
"id": "chatcmpl-123",
"object": "chat.completion.chunk",
"created": 1694268190,
"model": "Llama-3-8b-chat",
"system_fingerprint": "fp_44709d6fcb",
"choices": [
{
"index": 0,
"delta": {},
"logprobs": null,
"finish_reason": "stop"
}
]
}
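As a sketch of how these chunks are typically consumed, the helper below (hypothetical, not part of the API) concatenates the delta contents into the full reply and picks up the usage statistics that arrive on the final chunk when stream_options: {"include_usage": true} is set.
```python
from typing import Iterable, Optional, Tuple


def collect_stream(chunks: Iterable[dict]) -> Tuple[str, Optional[dict]]:
    """Accumulate a streamed reply from chat completion chunk objects.

    `chunks` is assumed to be an iterable of already-parsed chunk dicts, e.g.
    json.loads() applied to each "data:" line of the streaming response.
    """
    parts = []
    usage = None
    for chunk in chunks:
        # When include_usage is set, the final chunk carries the usage statistics.
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
    return "".join(parts), usage
```
A caller would feed this the parsed data: payloads from the streaming request shown earlier and then read fields such as total_tokens_count or time_to_first_token from the returned usage dictionary.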