Challenges with Llama-3.2-90B-Vision-Instruct model

sonubabut · February 27, 2025, 11:21am

We are currently facing challenges with the accuracy of the Llama-3.2-90B-Vision-Instruct model when detecting and counting components in technical drawings.

Issues Faced:

Preprocessing Techniques: Despite applying several preprocessing methods (zooming, sharpening, and contrast enhancement), we have not observed any improvement in the model’s performance.

Image Chunking: To refine detection, we experimented with dividing images into smaller sections. However, this approach led to additional inaccuracies, with the model frequently identifying components in areas where none were present.

Complexity Impact: As the complexity of the technical drawings increases, the model’s accuracy declines further.

Given that developers at SambaNova have successfully addressed similar issues, we would greatly appreciate any insights or alternative approaches that could improve the model’s performance for this use case.

prajwal.balapure · March 5, 2025, 8:58am

Hi,

Thank you for sharing your challenges with detecting and counting components in technical drawings using the Llama-3.2-90B-Vision-Instruct model. We understand the complexity of this task and appreciate your detailed insights.

We’ll discuss this with our team and let you know how we plan to proceed. In the meantime, please feel free to share any additional details that might help us refine our strategy.

Thank you,
Prajwal

prajwal.balapure · March 6, 2025, 7:52am

Hi,

To better analyze the issue and explore possible solutions, could you share the technical drawing you’re working on? Having the actual drawing will help us better understand the specific challenges and refine our approach accordingly.

Thanks,
Prajwal

coby.adams · March 10, 2025, 3:22pm

@sonubabut Were you able to share one of the images you are having trouble with, so we can reproduce the issue in-house?

-Coby

sonubabut · March 11, 2025, 1:25pm

Hi @prajwal.balapure & @coby.adams,
Thank you for reaching out. I have attached an example technical drawing where we are experiencing challenges with the Llama-3.2-90B-Vision-Instruct model.

Please let us know if you need any additional details. Looking forward to your insights on improving detection accuracy.