Token-by-token streaming

Anybody have a comment on this? To truly implement streaming responses, the SambaNova API should be responding with something like this:
Token 1 [0ms]: 'Hello' (5 chars)
Token 2 [55ms]: ',' (1 char)
Token 3 [108ms]: ' world' (6 chars)
…
Progressive display over 488ms

But when we run streaming with the text generation process, as documented here: Text generation - SambaNova Documentation,

what we get is:

Chunk 1 [1650ms]: 'Hello' (5 chars)
Chunk 2 [1665ms]: ', world! How are you today?' (27 chars)

which looks like we are getting large semantic chunks (sentences/paragraphs) with rapid delivery (all chunks arriving within milliseconds of each other), but no progressive token display.

The challenge I'm seeing is that we are unable to get rapid, streaming-type responses.

But maybe we have it wrong?

Here is the sample code we are using:

```python
# client is an OpenAI-compatible client configured for the SambaNova endpoint
stream = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You're a real-time assistant."},
        {"role": "user", "content": "Count from 1 to 10, one number at a time."},
    ],
    stream=True,
)
```

**Expected Token-by-Token Output:**

```
Chunk 1: '1'
Chunk 2: '\n'
Chunk 3: '2'
Chunk 4: '\n'
Chunk 5: '3'
```

**Actual Output:**

```
Chunk 1: '1' (size: 1)
Chunk 2: '. \n\nWould you like me to continue?' (size: 34)
```
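
For anyone reproducing this, the per-chunk sizes and timings quoted above can be captured by iterating the stream and timestamping each chunk as it arrives. A simplified sketch (it assumes `client` is an OpenAI-compatible client pointed at the SambaNova endpoint):

```python
import time

start = time.monotonic()
stream = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Count from 1 to 10, one number at a time."}],
    stream=True,
)

for i, chunk in enumerate(stream, 1):
    if not chunk.choices:
        continue  # skip chunks that carry no delta (e.g. a final usage-only chunk)
    text = chunk.choices[0].delta.content or ""
    elapsed_ms = (time.monotonic() - start) * 1000
    # One line per SSE chunk: arrival time since the request was sent, plus chunk size
    print(f"Chunk {i} [{elapsed_ms:.0f}ms]: {text!r} ({len(text)} chars)")
```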


Hi @david.keane1 ,

We appreciate you bringing this to our attention and will investigate the matter further.
We will get back to you with our findings shortly.

Regards,
Shivani Moze

Thanks. Just to say, the system is very, very fast. It's just that it doesn't seem to be doing token-by-token streaming, though maybe it is and it's just the speed that makes it seem like it isn't… No big deal here, just interested if you know.


@david.keane1 This chunked streaming behaviour is by design. I am filing a doc bug to get this clarified. If you do need token by token, you can add a post parser to break down the chunks; a rough sketch is below. Thank you for calling it out though, so we can update the docs. 🙂
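
As a rough sketch (the function and parameter names here are just illustrative, not part of the SambaNova API), such a post parser can split each streamed delta on whitespace and emit the pieces one at a time, so the UI still renders progressively even when the API returns sentence-sized chunks:

```python
import re
import time

def progressive_print(stream, delay_s=0.02):
    """Re-chunk large streamed deltas into word-sized pieces for display."""
    for chunk in stream:
        if not chunk.choices:
            continue  # skip chunks that carry no delta
        text = chunk.choices[0].delta.content or ""
        # Split on whitespace but keep it, so the re-joined text is unchanged
        for piece in re.split(r"(\s+)", text):
            if piece:
                print(piece, end="", flush=True)
                time.sleep(delay_s)  # small pause to simulate steady token delivery
    print()
```

Calling `progressive_print(stream)` on the stream from the earlier snippet gives a word-by-word display regardless of how the deltas are chunked on the wire.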

-Coby

We don't need token by token right now; we were just trying to run some streaming testing, and token by token is the standard way. But it is nice to know that what we found was correct.
