Hey team,
I'm writing an app that uses the Llama vision model (Llama-3.2-90B-Vision-Instruct) to return structured text from an image, but I seem to be hitting a limit on output token length: the response below stops with stop_reason "length" at 4,096 total tokens.
Are there any plans to increase this?
Best,
Matt
{
  "model": "Llama-3.2-90B-Vision-Instruct",
  "object": "chat.completion",
  "system_fingerprint": "fastcoe",
  "usage": {
    "completion_tokens": 4059,
    "completion_tokens_after_first_per_sec": 105.0270045726293,
    "completion_tokens_after_first_per_sec_first_ten": 103.96115165272784,
    "completion_tokens_per_sec": 103.6753340246086,
    "end_time": 1740521727.4034967,
    "is_last_response": true,
    "prompt_tokens": 37,
    "start_time": 1740521688.2524292,
    "stop_reason": "length",
    "time_to_first_token": 0.513385534286499,
    "total_latency": 39.15106749534607,
    "total_tokens": 4096,
    "total_tokens_per_sec": 104.62039126996719
  }
}
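In case it helps, here's roughly what my call looks like, plus the kind of continuation loop I'd fall back on while the cap is in place. This is just a minimal sketch assuming the OpenAI-compatible chat completions API via the openai Python client; the base URL, API key, file name, and prompts are placeholders rather than my actual app code.

import base64
from openai import OpenAI

# Sketch only -- assumes an OpenAI-compatible chat completions endpoint.
# BASE_URL and API_KEY are placeholders.
BASE_URL = "https://example.invalid/v1"
API_KEY = "sk-..."

client = OpenAI(api_key=API_KEY, base_url=BASE_URL)

# Encode the input image for the vision message format.
with open("page.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Return the text in this image as structured JSON."},
        {"type": "image_url",
         "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
    ],
}]

# Accumulate output; if the model stops because it hit the token cap,
# ask it to continue from where it left off.
parts = []
for _ in range(4):  # cap the number of continuation rounds
    resp = client.chat.completions.create(
        model="Llama-3.2-90B-Vision-Instruct",
        messages=messages,
        max_tokens=4096,
    )
    choice = resp.choices[0]
    parts.append(choice.message.content or "")
    if choice.finish_reason != "length":
        break
    # Feed the partial answer back and request the remainder.
    messages.append({"role": "assistant", "content": choice.message.content})
    messages.append({"role": "user", "content": "Continue exactly where you left off."})

full_output = "".join(parts)
print(full_output)

The continuation approach isn't ideal for structured output, since the model doesn't always resume mid-JSON cleanly, so a higher output limit (or guidance on the recommended way to handle long extractions, e.g. splitting the image or prompt) would be much appreciated.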