Llama-3.2-90B-Vision-Instruct not working in the playground

Following up on my earlier post about this model via the API. I wanted to confirm it was working in the playground, and it did successfully process a small, simple image (under 100 KB), but it failed on two larger files with these errors:

Can you describe this image?

[3MB image]

Llama-3.2-90B-Vision-Instruct

There is no image to describe.

============

Can you describe this image?

[700k image]

Llama-3.2-90B-Vision-Instruct

I cannot describe an image that I cannot see. I’m a large language model, I don’t have the capability to visually perceive or access images. I can only respond to text-based inputs and provide information based on my training data. If you’d like to describe the image to me, I’d be happy to help you understand it or provide information related to it.
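My guess, unverified, is that the image is being dropped before it ever reaches the model once the request gets too big: if the playground sends the file as a base64 data URL, the payload grows by about a third, so a 3 MB image becomes roughly 4 MB of JSON. A quick sketch to measure that inflation (filenames are hypothetical):

```python
# Rough check of how much a data-URL upload inflates the request body.
# base64 grows a file by about 4/3, so a 3 MB image becomes roughly 4 MB
# of JSON, possibly enough to trip a request-size limit before the model
# ever sees the image. Filenames below are hypothetical.
import base64
import os

for path in ["small_100k.png", "large_3mb.png"]:
    raw = os.path.getsize(path)
    with open(path, "rb") as f:
        encoded = len(base64.b64encode(f.read()))
    print(f"{path}: {raw:,} bytes on disk -> {encoded:,} bytes as base64")
```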

@edward.cruz

Can you do a browser inspect and show the network information, including the return codes?

This is what mine looks like with a 3.4 MB image on 90B.

Same on 11B
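If it's easier than digging through DevTools, here's a minimal sketch for reproducing the call outside the browser so the status code prints directly. The endpoint, headers, and payload shape are assumptions based on a typical OpenAI-compatible chat API; copy the real URL and body from the failing request in the Network tab:

```python
# Sketch of reproducing the playground request so the HTTP return code is
# visible. The URL, key, and payload shape are assumptions; replace them
# with the actual values from the failing request in the Network tab.
import base64
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
API_KEY = "YOUR_KEY"                                     # placeholder key

with open("screenshot.png", "rb") as f:  # hypothetical filename
    b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "Llama-3.2-90B-Vision-Instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Can you describe this image?"},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
    },
)
print(resp.status_code)  # the return code from the request
print(resp.text[:500])   # start of the response body for context
```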

As an FYI, Llama 4 will be replacing these models once multimodal support is turned on for it.

-Coby


I also replicated the scenario by capturing a screenshot with a file size of 2.2 MB and then resizing it to 700 KB. Both versions were processed with the 90B model, and in both cases the descriptions were accurate. As expected, the larger image produced a more detailed description.

Additionally, I tested this in my region (India), and it’s functioning as expected.
Attaching the images for reference.
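For anyone who wants to repeat the resize step, here is a minimal sketch, assuming Pillow; this isn't necessarily the exact method I used, and the filenames and quality value are placeholders:

```python
# A sketch of one way to do the resize step, assuming Pillow is installed.
# Filenames and the quality setting are placeholders; the exact output
# size depends on the image content.
from PIL import Image

img = Image.open("screenshot_2.2mb.png")
img = img.resize((img.width // 2, img.height // 2))  # halve both dimensions
img.convert("RGB").save("screenshot_resized.jpg", quality=85)
```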