I’m working on an app where users can make a sketch, then the app sends the image to a model to generate a basic HTML page from it. This setup works fine with models like ‘4o’ and others, but with the Llama Vision models, it feels like they don’t even recognize the image I’m sending. I’ve tried various prompts, but nothing seems to help—they can’t even accurately describe what they’re seeing.
Is this behavior normal for Llama Vision, or is there a specific way to use these models and guide them to produce the output I want?
Hi @malawad,
Can you please re-confirm whether you are using this request format for the vision models:

[
  {
    "type": "text",
    "text": "What's in this image?"
  },
  {
    "type": "image_url",
    "image_url": {
      "url": "base64 encoded string of image"
    }
  }
]
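For reference, a minimal sketch of producing that base64 string in Python (the file name here is just an illustration; the data: URL prefix matches the full example further down this thread):

import base64

# Read the image file and base64-encode it for the "url" field
with open("sketch.jpg", "rb") as image_file:
    encoded = base64.b64encode(image_file.read()).decode("utf-8")

image_url_value = f"data:image/jpeg;base64,{encoded}"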
Hi @malawad,
We are currently updating the console documentation to improve clarity and usability. Regarding your testing, please note that rate limits are applied per minute, so if you hit one, wait a minute and try again.
Let me know if you need any further assistance!
Thanks so much for your help! I tried the format suggested by @omkar.gangan, but I keep running into issues like “Rate limit exceeded” or “unexpected_error.” I’m not quite sure how to troubleshoot this. If you have a code example I can try locally to see how the vision model works, that would be incredibly helpful.
Hi @malawad ,
Can you please try this code example:
import openai
import base64

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

client = openai.OpenAI(
    api_key='SAMBANOVA_API_KEY',
    base_url="https://api.sambanova.ai/v1",
)

# Path to your image
image_path = "/Users/omkarg/Downloads/IMG_2117.jpg"  # your image path

# Getting the base64 string
base64_image = encode_image(image_path)

response = client.chat.completions.create(
    model="Llama-3.2-90B-Vision-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this image?",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
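One note: 'SAMBANOVA_API_KEY' above is a placeholder string. A common pattern (an assumption here, not part of the original example) is to read the real key from an environment variable:

import os

client = openai.OpenAI(
    api_key=os.environ["SAMBANOVA_API_KEY"],  # export SAMBANOVA_API_KEY=... first
    base_url="https://api.sambanova.ai/v1",
)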
“Llama-3.2-11B-Vision-Instruct” has a rate limit of 10 requests per minute.
“Llama-3.2-90B-Vision-Instruct” has a rate limit of 1 request per minute (temporarily limited due to high demand).
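Given those limits, it can help to wrap the call in a simple retry. A minimal sketch, assuming the endpoint raises openai.RateLimitError through the OpenAI SDK when the per-minute limit is hit:

import time
import openai

def create_with_retry(client, max_attempts=5, **kwargs):
    # Retry the chat completion, sleeping between attempts so the
    # per-minute rate limit has time to reset.
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.RateLimitError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(60)

You would then call create_with_retry(client, model=..., messages=...) in place of client.chat.completions.create(...).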
Thanks so much for your help! Please correct me if I’m wrong, but it turns out the model doesn’t accept a system prompt, which is why I kept getting the “unexpected_error” message. Once I removed the system prompt, it started working. Yet now, whenever I ask it to create HTML from an image, it just replies:
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": null,
"message": {
"content": "I can't help with that.",
"role": "assistant"
}
}
]
It’s a bit frustrating haha! On the plus side, it does describe the image to some degree, so that’s a start. Now, I just need to figure out how to communicate with it effectively. I really appreciate your assistance!
@malawad I was able to get something working when I changed my text prompt to this:

"text": "Please generate the HTML to build the item in this image"

And it gave me both the HTML and CSS for the image of a web table that I provided. Have you tried passing what normally would have been in your system prompt into the text field? (A sketch of that is below.)
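For example, a minimal sketch of folding the system-style instructions into the user text (the prompt wording, and the reuse of client and base64_image from the earlier example, are assumptions for illustration):

prompt = (
    # What would normally go in a system prompt goes here instead
    "You are a front-end developer. Generate a single HTML file "
    "(with inline CSS) that recreates the page sketched in this image. "
    "Return only the code."
)

response = client.chat.completions.create(
    model="Llama-3.2-90B-Vision-Instruct",
    messages=[
        {
            "role": "user",  # no "system" message; everything lives here
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)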