When using Llama 3.1 70B/405B with a system message that includes Environment: ipython, I expect the model response to begin with the special token '<|python_tag|>' to indicate a code execution block, as shown in the model card.
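For reference, a minimal sketch of the request shape I'm sending (the endpoint, key, model name, and user prompt below are placeholders, not our actual values):

```python
from openai import OpenAI

# Placeholder endpoint and key; our real values differ.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[
        # Per the Llama 3.1 model card, this system message enables the
        # built-in code interpreter, so code-running responses should
        # start with the <|python_tag|> special token.
        {"role": "system", "content": "Environment: ipython"},
        {"role": "user", "content": "Plot a sine wave."},  # illustrative only
    ],
    stream=True,
)
```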
This is the first response chunk of a stream using vLLM's OpenAI-compatible API on our internal servers:
Choice(delta=ChoiceDelta(content='<|python_tag|>', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)
This is the response from your API for the same prompt and API options:
Choice(delta=ChoiceDelta(content='import pandas ', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)
Notice how the content goes straight to the import line, skipping over the special token.
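For completeness, this is roughly how I'm reading the stream on the client side (same client and request as the sketch above); the first printed repr is where the two backends diverge:

```python
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content is not None:
        # Against our vLLM server the first content delta is
        # '<|python_tag|>'; against your API it is 'import pandas '.
        print(repr(delta.content))
```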
Is there an option on the API for the special tokens to be included in the response stream?
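In case it helps pin down what I'm asking for: with vLLM, special-token handling in the output is controlled by the skip_special_tokens sampling parameter, which the openai client can pass through as an extra request field. Whether your API accepts or forwards something equivalent is an assumption on my part:

```python
# skip_special_tokens is a vLLM sampling parameter; whether your API
# honors it (or exposes an equivalent option) is an assumption here.
stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[
        {"role": "system", "content": "Environment: ipython"},
        {"role": "user", "content": "Plot a sine wave."},
    ],
    stream=True,
    extra_body={"skip_special_tokens": False},
)
```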