Are Llama Vision Models 'Blind,' or Do I Just Not Know How to Use Them?

Hi @malawad ,
Can you please try this code example:

```python
import os
import base64
import openai

# Encode the image file as a base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

client = openai.OpenAI(
    api_key=os.environ.get("SAMBANOVA_API_KEY"),  # set this env var to your SambaNova API key
    base_url="https://api.sambanova.ai/v1",
)

# Path to your image
image_path = "/Users/omkarg/Downloads/IMG_2117.jpg"  # replace with your image path

# Get the base64 string
base64_image = encode_image(image_path)

response = client.chat.completions.create(
    model="Llama-3.2-90B-Vision-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this image?",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

“Llama-3.2-11B-Vision-Instruct” has a rate limit of 10 requests per minute.
“Llama-3.2-90B-Vision-Instruct” has a rate limit of 1 request per minute (temporarily limited due to high demand).

More information about rate limits can be found at rate_limits, and you can also check api_error_codes.
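Given those limits, requests can come back rate-limited (HTTP 429), so it may help to retry with exponential backoff rather than failing immediately. Below is a minimal sketch; the `call_with_retry` helper and the delay values are illustrative assumptions, not part of the SambaNova API:

```python
import time

def backoff_delays(base=2.0, retries=4):
    """Exponential backoff delays in seconds, e.g. 2, 4, 8, 16 for base=2."""
    return [base ** (i + 1) for i in range(retries)]

def call_with_retry(make_request, retries=4, base=2.0):
    """Call make_request(); on a rate-limit error, wait and retry.

    make_request is any zero-argument callable (e.g. a lambda wrapping
    client.chat.completions.create) that raises an exception carrying a
    `status_code` attribute of 429 when rate-limited.
    """
    for delay in backoff_delays(base, retries):
        try:
            return make_request()
        except Exception as exc:
            if getattr(exc, "status_code", None) != 429:
                raise  # not a rate-limit error; surface it immediately
            time.sleep(delay)  # back off before retrying
    return make_request()  # final attempt; let any remaining error propagate
```

For example, with the 90B model's 1 request/minute limit you could wrap the call as `call_with_retry(lambda: client.chat.completions.create(...), base=60.0)` so the client waits out the window instead of erroring.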

Thanks & Regards
