Chat
Create chat completions
Creates a model response for the given chat conversation.
POST https://api.sambanovacloud.com/v1/chat/completions
Request body
Reference
Parameter | Definition | Type | Values |
---|---|---|---|
model | The name of the model to query. | string | Refer to the Model List. |
messages | A list of messages comprising the conversation so far. | array of objects | Array of message objects, each containing: • role (string, required): the role of the message author; one of system, user, or assistant. • content (string, required): the contents of the message. |
max_tokens | The maximum number of tokens to generate. | integer | The combined length of input tokens and generated tokens is limited by the model's context length. Defaults to the model's context length. |
temperature | Determines the degree of randomness in the response. | float | Between 0 and 1. |
top_p | The nucleus sampling parameter; dynamically adjusts the number of candidate tokens based on their cumulative probability. | float | Between 0 and 1. |
top_k | Limits the number of candidate tokens considered for the next predicted word or token. | integer | Between 1 and 100. |
stop | Up to 4 sequences where the API will stop generating further tokens. | string, array, or null | Default is null. |
stream | If set, partial message deltas are sent as they become available. | boolean or null | Default is false. |
stream_options | Options for the streaming response. Only set this when stream: true. | object or null | Default is null. Available option: include_usage (boolean). |
Sample
Below is a sample request body for a streaming response.
{
"messages": [
{"role": "system", "content": "Answer the question in a couple sentences."},
{"role": "user", "content": "Share a happy story with me"}
],
"max_tokens": 800,
"stop": ["[INST", "[INST]", "[/INST]", "[/INST]"],
"model": "Meta-Llama-3.1-8B-Instruct",
"stream": true,
"stream_options": {"include_usage": true}
}
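The body above can be sent with any HTTP client. The sketch below uses Python's requests library and assumes the API key is passed as a Bearer token (the SAMBANOVA_API_KEY environment variable name is illustrative) and that the streamed response arrives as server-sent events whose data: lines each carry a chat completion chunk object.
```python
import json
import os
import requests

url = "https://api.sambanovacloud.com/v1/chat/completions"
headers = {
    # Assumption: the key is sent as a Bearer token; the env var name is illustrative.
    "Authorization": f"Bearer {os.environ['SAMBANOVA_API_KEY']}",
    "Content-Type": "application/json",
}
body = {
    "messages": [
        {"role": "system", "content": "Answer the question in a couple sentences."},
        {"role": "user", "content": "Share a happy story with me"},
    ],
    "max_tokens": 800,
    "stop": ["[INST", "[INST]", "[/INST]", "[/INST]"],
    "model": "Meta-Llama-3.1-8B-Instruct",
    "stream": True,
    "stream_options": {"include_usage": True},
}

with requests.post(url, headers=headers, json=body, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        # Each event line looks like "data: {...}"; blank keep-alive lines are skipped.
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        print(chunk)  # a chat completion chunk object (see Response below)
```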
Response
The API returns a chat completion object, or a streamed sequence of chat completion chunk objects if the request is streamed.
Chat completion object
Represents a chat completion response returned by the model, based on the provided input.
Reference
Property | Type | Description |
---|---|---|
id | string | A unique identifier for the chat completion. |
choices | array | A list of chat completion choices. |
created | integer | The Unix timestamp (in seconds) of when the chat completion was created. |
model | string | The model used to generate the completion. |
object | string | The object type, which is always chat.completion. |
usage | object | Token usage statistics for the request. Values returned are: • throughput_after_first_token: the rate (tokens per second) at which output tokens are generated after the first token has been delivered. • time_to_first_token: the time (in seconds) the model takes to generate the first token. • model_execution_time: the time (in seconds) to generate the complete response, i.e. all tokens. • output_tokens_count: the number of tokens generated in the response. • input_tokens_count: the number of tokens in the input prompt. • total_tokens_count: the sum of input and output tokens. • queue_time: the time (in seconds) a request spends waiting in the queue before being processed by the model. |
Sample
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1677652288,
"model": "Llama-3-8b-chat",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "\n\nHello there, how may I assist you today?",
},
"logprobs": null,
"finish_reason": "stop"
}]
}
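For a non-streaming request (stream omitted or false), the response body is a single object like the one above. The short sketch below, with the sample hard-coded in place of a parsed response body, shows how the fields are typically read.
```python
# `completion` stands in for the parsed response body of a non-streaming request
# (for example, response.json() with the requests library).
completion = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "model": "Llama-3-8b-chat",
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": "\n\nHello there, how may I assist you today?",
        },
        "logprobs": None,
        "finish_reason": "stop",
    }],
}

choice = completion["choices"][0]
print(completion["id"])              # unique identifier for the completion
print(completion["model"])           # model used to generate the completion
print(choice["message"]["content"])  # the assistant's reply text
print(choice["finish_reason"])       # "stop" when generation ended normally
```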
Chat completion chunk object
Represents a streamed chunk of a chat completion response returned by the model, based on the provided input.
Reference
Property | Type | Description |
---|---|---|
id | string | A unique identifier for the chat completion. |
choices | array | A list of chat completion choices. |
created | integer | The Unix timestamp (in seconds) of when the chat completion was created. Each chunk has the same timestamp. |
model | string | The model used to generate the completion. |
object | string | The object type, which is always chat.completion.chunk. |
usage | object | An optional field present when stream_options: {"include_usage": true} is set. When present, it contains a null value except for the last chunk, which contains the token usage statistics for the entire request. Values returned are: • throughput_after_first_token: the rate (tokens per second) at which output tokens are generated after the first token has been delivered. • time_to_first_token: the time (in seconds) the model takes to generate the first token. • model_execution_time: the time (in seconds) to generate the complete response, i.e. all tokens. • output_tokens_count: the number of tokens generated in the response. • input_tokens_count: the number of tokens in the input prompt. • total_tokens_count: the sum of input and output tokens. • queue_time: the time (in seconds) a request spends waiting in the queue before being processed by the model. |
Sample
{
"id": "chatcmpl-123",
"object": "chat.completion.chunk",
"created": 1694268190,
"model": "Llama-3-8b-chat",
"system_fingerprint": "fp_44709d6fcb",
"choices": [
{
"index": 0,
"delta": {},
"logprobs": null,
"finish_reason": "stop"
}
]
}
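As a sketch of how these chunks are typically consumed, the helper below (hypothetical, not part of the API) concatenates the delta contents into the full reply and picks up the usage statistics that arrive on the final chunk when stream_options: {"include_usage": true} is set.
```python
from typing import Iterable, Optional, Tuple


def collect_stream(chunks: Iterable[dict]) -> Tuple[str, Optional[dict]]:
    """Accumulate a streamed reply from chat completion chunk objects.

    `chunks` is assumed to be an iterable of already-parsed chunk dicts, e.g.
    json.loads() applied to each "data:" line of the streaming response.
    """
    parts = []
    usage = None
    for chunk in chunks:
        # When include_usage is set, the final chunk carries the usage statistics.
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
    return "".join(parts), usage
```
A caller would feed this the parsed data: payloads from the streaming request shown earlier and then read fields such as total_tokens_count or time_to_first_token from the returned usage dictionary.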