When using Llama 3.1 70B/405B with a system message that includes Environment: ipython, I expect the model response to begin with the special token '<|python_tag|>' to indicate a code execution block, as shown in the model card.
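For reference, a minimal sketch of the request shape I'm sending (the endpoint, key, model name, and user prompt below are placeholders, not our actual values):

```python
from openai import OpenAI

# Placeholder endpoint and key; our real values differ.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[
        # Per the Llama 3.1 model card, this system message enables the
        # built-in code interpreter, so code-running responses should
        # start with the <|python_tag|> special token.
        {"role": "system", "content": "Environment: ipython"},
        {"role": "user", "content": "Plot a sine wave."},  # illustrative only
    ],
    stream=True,
)
```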
This is the first response chunk of a stream using vLLM's OpenAI-compatible API on our internal servers:
Choice(delta=ChoiceDelta(content='<|python_tag|>', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)
This is the response from your API for the same prompt and API options:
Choice(delta=ChoiceDelta(content='import pandas ', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)
Notice how the content goes straight to the import line, skipping over the special token.
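For completeness, this is roughly how I'm reading the stream on the client side (same client and request as the sketch above); the first printed repr is where the two backends diverge:

```python
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content is not None:
        # Against our vLLM server the first content delta is
        # '<|python_tag|>'; against your API it is 'import pandas '.
        print(repr(delta.content))
```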
Is there an option on the API for the special tokens to be included in the response stream?
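In case it helps pin down what I'm asking for: with vLLM, special-token handling in the output is controlled by the skip_special_tokens sampling parameter, which the openai client can pass through as an extra request field. Whether your API accepts or forwards something equivalent is an assumption on my part:

```python
# skip_special_tokens is a vLLM sampling parameter; whether your API
# honors it (or exposes an equivalent option) is an assumption here.
stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[
        {"role": "system", "content": "Environment: ipython"},
        {"role": "user", "content": "Plot a sine wave."},
    ],
    stream=True,
    extra_body={"skip_special_tokens": False},
)
```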