Token-by-token streaming

Anybody have a comment on this? To truly implement streaming responses, the SambaNova API should be responding with something like this:

```
Token 1 [0ms]: 'Hello' (5 chars)
Token 2 [55ms]: ',' (1 char)
Token 3 [108ms]: ' world' (6 chars)
...
Progressive display over 488ms
```

But when we run streaming with the text generation process, as documented here: Text generation - SambaNova Documentation,

what we get is:

```
Chunk 1 [1650ms]: 'Hello' (5 chars)
Chunk 2 [1665ms]: ', world! How are you today?' (27 chars)
```

which looks like we are getting large semantic chunks (sentences/paragraphs) with rapid delivery (all chunks arriving within milliseconds of each other), but no progressive token display.

The challenge I'm seeing is that we are unable to get rapid, streaming-type responses.

But we could have it wrong?

Sample code we are using:

```python
# `client` is an OpenAI-compatible chat completions client configured for the SambaNova endpoint
stream = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You're a real-time assistant."},
        {"role": "user", "content": "Count from 1 to 10, one number at a time."}
    ],
    stream=True
)
```
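
For completeness, here is a rough, self-contained sketch of how we read and time the chunks. The client setup (base URL and `SAMBANOVA_API_KEY` environment variable) and the exact log format are illustrative, not the precise code we run:

```python
import os
import time

from openai import OpenAI

# Assumed setup: OpenAI-compatible client pointed at the SambaNova Cloud endpoint.
# The base URL and API-key environment variable are placeholders for illustration.
client = OpenAI(
    base_url="https://api.sambanova.ai/v1",
    api_key=os.environ["SAMBANOVA_API_KEY"],
)

stream = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You're a real-time assistant."},
        {"role": "user", "content": "Count from 1 to 10, one number at a time."},
    ],
    stream=True,
)

# Print each chunk as it arrives, with its size and the elapsed time since the request,
# to see whether delivery is token-by-token or in large semantic chunks.
start = time.time()
chunk_index = 0
for chunk in stream:
    if not chunk.choices or chunk.choices[0].delta.content is None:
        continue  # skip empty-delta chunks
    chunk_index += 1
    text = chunk.choices[0].delta.content
    elapsed_ms = int((time.time() - start) * 1000)
    print(f"Chunk {chunk_index} [{elapsed_ms}ms]: {text!r} (size: {len(text)})")
```

If each printed chunk were a single token with steadily increasing timestamps, that would match the progressive display we expected; instead we see one or two large chunks, as logged below.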

**Expected Token-by-Token Output:**

```
Chunk 1: '1'
Chunk 2: '\n'
Chunk 3: '2'
Chunk 4: '\n'
Chunk 5: '3'
```

**Actual Output:**

```
Chunk 1: '1' (size: 1)
Chunk 2: '. \n\nWould you like me to continue?' (size: 34)
```

Hi @david.keane1,

We appreciate you bringing this to our attention and will investigate the matter further.
We will get back to you with our findings shortly.

Regards,
Shivani Moze

Thanks - just to say the system is very, very fast - it's just that it doesn't seem to be doing token-by-token streaming. But maybe it is, and it's just the speed that makes it seem like it's not... No big deal here, just interested if you know.