Hi all, I hope you can help me with the following problem.
I have a use case where I need to generate different samples for the same prompt, which is typically done by setting a high temperature. However, the following request, for example, always returns the same response:
curl -H "Authorization: Bearer $SAMBANOVA_API_KEY" -H "Content-Type: application/json" \
  https://api.sambanova.ai/v1/chat/completions \
  -d '{
    "stream": false,
    "model": "Meta-Llama-3.2-3B-Instruct",
    "temperature": 1.0,
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "write me a poem"
      }
    ]
  }'
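To make the repro concrete, repeating the call in a loop returns byte-identical text each time. Here is a sketch of how I checked (it assumes jq is installed and just prints the start of each completion):

for i in 1 2 3; do
  # same request every iteration; only a different response would indicate sampling
  curl -s -H "Authorization: Bearer $SAMBANOVA_API_KEY" -H "Content-Type: application/json" \
    https://api.sambanova.ai/v1/chat/completions \
    -d '{
      "stream": false,
      "model": "Meta-Llama-3.2-3B-Instruct",
      "temperature": 1.0,
      "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "write me a poem"}
      ]
    }' | jq -r '.choices[0].message.content' | head -c 80
  echo
done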
So it seems that responses are always cached. Is there a way to avoid caching? Alternatively, the OpenAI API has an "n" parameter for requesting multiple candidate generations in one call, but in the SambaNova API this does not seem to work.
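For completeness, this is the kind of request I mean. The "n": 3 field follows the OpenAI convention; as far as I can tell, SambaNova either ignores it or rejects it, which is exactly what I am asking about:

curl -H "Authorization: Bearer $SAMBANOVA_API_KEY" -H "Content-Type: application/json" \
  https://api.sambanova.ai/v1/chat/completions \
  -d '{
    "model": "Meta-Llama-3.2-3B-Instruct",
    "temperature": 1.0,
    "n": 3,
    "messages": [
      {"role": "user", "content": "write me a poem"}
    ]
  }'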