Llama 3.1 iPython Environment

When using Llama 3.1 70B/405B with a system message that includes Environment: ipython, I expect the model response to begin with the special token '<|python_tag|>' to indicate a code execution block, as shown in the model card.
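
For context, both requests look roughly like this; the endpoint, key, and model id below are placeholders rather than our actual config:

```python
from openai import OpenAI

# Placeholder endpoint and key, not our real servers.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder model id
    messages=[
        # Per the Llama 3.1 model card, this system line enables the code
        # interpreter environment, so the reply should open with <|python_tag|>.
        {"role": "system", "content": "Environment: ipython"},
        {"role": "user", "content": "Plot the first 10 Fibonacci numbers."},
    ],
    stream=True,
)

for chunk in stream:
    # Each chunk.choices[0] is a Choice like the ones quoted below.
    print(chunk.choices[0])
```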

This is the first response of a stream using vLLM's OpenAI-compatible API on our internal servers:
Choice(delta=ChoiceDelta(content='<|python_tag|>', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)

This is the response from your API for the same prompt and API options:
Choice(delta=ChoiceDelta(content='import pandas ', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)

Notice how the content goes straight to the import line, skipping over the special token.
Is there an option on the api for the special tokens to be included in the response stream?


@matthew.malek Thank you for participating in the community. The special tokens do not appear in our stream because they are removed during de-tokenization. On your internal systems you must be getting the raw stream before it is de-tokenized.
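
If it helps as a point of comparison, vLLM controls that de-tokenization step with the skip_special_tokens sampling parameter. A rough offline sketch, not verified against your internal deployment (the model id and prompt are placeholders):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-70B-Instruct")  # placeholder model id

params = SamplingParams(
    max_tokens=256,
    skip_special_tokens=False,  # keep tokens like <|python_tag|> in the decoded text
)

# Llama 3.1 prompt format with the ipython environment enabled.
prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "Environment: ipython<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "Plot the first 10 Fibonacci numbers.<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)  # should begin with <|python_tag|> when kept
```

I believe vLLM's OpenAI-compatible server also accepts this as an extra sampling parameter, which may be why your internal stream keeps the token.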

-Coby

Hi @coby.adams, is there an option to keep the special tokens in the output or return the raw stream over the API?


@matthew.malek

Not at this time. I can file an enhancement request, though I cannot promise it will be considered. I will at least file it.

-Coby