Does anybody have a comment on this? To truly implement streaming responses, the SambaNova API should be responding with something like this:
```
Token 1 [0ms]:   'Hello' (5 chars)
Token 2 [55ms]:  ',' (1 char)
Token 3 [108ms]: ' world' (6 chars)
...
Progressive display over 488ms
```
However, when we run streaming with the text generation process, as documented here: Text generation - SambaNova Documentation, what we get is:
```
Chunk 1 [1650ms]: 'Hello' (5 chars)
Chunk 2 [1665ms]: ', world! How are you today?' (27 chars)
```
It looks like we are getting large semantic chunks (sentences/paragraphs) delivered rapidly (all chunks arrive within milliseconds of each other), but no progressive token-by-token display.

The challenge I'm seeing is that we are unable to get rapid, streaming-style responses. But we could have it wrong?
Here is the sample code we are using (the client setup below is a sketch on our side: we assume the OpenAI-compatible endpoint and pass our SambaNova API key and base URL there):
```python
from openai import OpenAI

# Assumed client setup: SambaNova exposes an OpenAI-compatible endpoint,
# so the base_url and API key placeholder here are our own configuration.
client = OpenAI(api_key="YOUR_SAMBANOVA_API_KEY",
                base_url="https://api.sambanova.ai/v1")

stream = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You're a real-time assistant."},
        {"role": "user", "content": "Count from 1 to 10, one number at a time."}
    ],
    stream=True
)
```
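For completeness, this is roughly how we consume the stream and capture the per-chunk timings and sizes shown above. The delta access pattern follows the OpenAI-compatible chunk format; the timing code is just our own instrumentation, not part of the API:

```python
import time

start = time.perf_counter()
chunk_no = 0
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if not delta:
        continue  # skip role-only/empty deltas
    chunk_no += 1
    elapsed_ms = int((time.perf_counter() - start) * 1000)
    # e.g. "Chunk 1 [1650ms]: 'Hello' (size: 5)"
    print(f"Chunk {chunk_no} [{elapsed_ms}ms]: {delta!r} (size: {len(delta)})")
```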
**Expected Token-by-Token Output:**
```
Chunk 1: '1'
Chunk 2: '\n'
Chunk 3: '2'
Chunk 4: '\n'
Chunk 5: '3'
...
```
**Actual Output:**
```
Chunk 1: '1' (size: 1)
Chunk 2: '. \n\nWould you like me to continue?' (size: 34)
```