Issue Summary:
Users may observe that responses from the QwQ-32B model (also referred to as the “thinking model”) contain a closing </think> tag without a corresponding opening <think> tag. This may cause issues in parsing logic or structured post-processing pipelines.
Root Cause:
This is expected behavior when the prompt is built with the tokenizer's apply_chat_template function and add_generation_prompt=True. The official QwQ-32B usage guidelines, published on the model's Hugging Face page, state:
Enforce Thoughtful Output: Ensure the model starts with "<think>\n" to prevent generating empty thinking content, which can degrade output quality. If you use apply_chat_template and set add_generation_prompt=True, this is already automatically implemented, but it may cause the response to lack the <think> tag at the beginning. This is normal behavior.
Explanation:
When add_generation_prompt=True is used:
- The chat template appends <think>\n to the end of the prompt, so the opening tag is part of the input rather than the generated text.
- The model therefore usually does not repeat the opening <think> tag, even though the thinking content is still generated and closed properly with </think>.
- As a result, only the closing </think> tag is visible in the output.
This behavior is by design and does not indicate a malfunction or model issue.
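For parsing logic that expects a matched pair of tags, one option is to tolerate the missing opening tag instead of requiring it. The following is a minimal sketch in plain Python, not part of the official guidelines; the helper name split_thinking is hypothetical, and it assumes the response contains at most one thinking block at the start.

```python
from typing import Tuple

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_thinking(response: str) -> Tuple[str, str]:
    """Split a raw response into (thinking, answer).

    Hypothetical helper. Handles three cases:
    - both tags present
    - only the closing tag present (opening tag was part of the prompt)
    - no tags at all (the whole response is treated as the answer)
    """
    if THINK_CLOSE not in response:
        return "", response.strip()

    # Everything before the first closing tag is thinking content.
    thinking, _, answer = response.partition(THINK_CLOSE)

    # Drop the opening tag if the model did emit one.
    thinking = thinking.lstrip()
    if thinking.startswith(THINK_OPEN):
        thinking = thinking[len(THINK_OPEN):]

    return thinking.strip(), answer.strip()


if __name__ == "__main__":
    raw = "The user asks for 2 + 2.\n</think>\n\nThe answer is 4."
    thinking, answer = split_thinking(raw)
    print(repr(thinking))
    print(repr(answer))
```

Splitting on the closing tag alone means the same code path works whether or not the opening tag appears in the output, so the parser does not depend on how the prompt was built.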