Issue Summary:
Users may observe that responses from the QwQ-32B model (also referred to as the “thinking model”) contain a closing </think> tag without a corresponding opening <think> tag. This may cause issues in parsing logic or structured post-processing pipelines.
Root Cause:
This is expected behavior when the prompt is built with the apply_chat_template function and add_generation_prompt=True is set. As per the official QwQ-32B usage guidelines published on Hugging Face:
Enforce Thoughtful Output: Ensure the model starts with <think>\n to prevent generating empty thinking content, which can degrade output quality. If you use apply_chat_template and set add_generation_prompt=True, this is already automatically implemented, but it may cause the response to lack the <think> tag at the beginning. This is normal behavior.
Explanation:
When add_generation_prompt=True is used:
- The chat template ends the generation prompt with <think>\n, so the opening <think> tag is part of the prompt rather than the generated text.
- The model therefore continues directly with the thinking content and does not output the opening <think> tag itself, even though the thinking content is still generated and closed properly with </think>.
- This results in only a closing </think> tag being visible in the output.
This behavior is by design and does not indicate a malfunction or model issue.
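For parsing logic that has to cope with this, the following is a minimal sketch of a tolerant splitter. It is not part of the QwQ-32B guidelines: the helper name split_thinking is hypothetical, and it assumes the response has already been decoded to a plain string and contains at most one thinking block. It treats everything before the first </think> as thinking content, whether or not an opening <think> tag is present.

```python
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_thinking(response: str) -> tuple[str, str]:
    """Split a decoded response into (thinking, answer).

    Hypothetical helper: works whether or not the opening <think> tag
    appears in the generated text.
    """
    text = response.lstrip()
    if THINK_CLOSE not in text:
        # No closing tag at all: treat the whole response as the answer.
        return "", text.strip()

    # Everything before the first closing tag is thinking content.
    thinking, answer = text.split(THINK_CLOSE, 1)

    # Drop the opening tag if it happens to be present in the output.
    if thinking.startswith(THINK_OPEN):
        thinking = thinking[len(THINK_OPEN):]

    return thinking.strip(), answer.strip()


# Example: a response that lacks the opening tag, as described above.
raw = "Let me add 2 and 2.\n</think>\n\nThe answer is 4."
thinking, answer = split_thinking(raw)
print(answer)  # prints "The answer is 4."
```

Splitting on the closing tag rather than matching a full <think>...</think> pair keeps the parser working for both output shapes, so it does not depend on how the prompt was built.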