Meta Llama 3.3 model

Happy Friday!

I recently saw the post about the release of Llama 3.3.
Any news on this version?

I'm curious about the token speed and real-world results compared to the 405B model.

Also, do you know of any benchmarking method that would help compare the results?

Meta says the 70B model matches the results of the 3.1 405B version.
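For token speed specifically, maybe something as simple as timing one request and dividing by the reported completion tokens would be enough. A rough sketch of what I have in mind (OpenAI-compatible client; endpoint and key are placeholders):

```python
# Rough tokens/sec estimate: time a single completion and divide the
# reported completion tokens by the elapsed wall-clock time.
import time
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",                  # placeholder key
)

start = time.perf_counter()
resp = client.chat.completions.create(
    model="Meta-Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Explain speculative decoding in one paragraph."}],
)
elapsed = time.perf_counter() - start

print(f"{resp.usage.completion_tokens / elapsed:.1f} tokens/sec")
```

Running the same prompt against the 3.1 405B model ID would give a like-for-like comparison.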

Thanks, Laszlo


@hello1

3.3 is available.


Thanks, Coby and the whole team! I'm starting to test my code updates right now!

Is it possible to get beta access to the 64/128k context?

I faced this error:
Error code: 503 - {'error': {'code': None, 'message': 'Meta-Llama-3.3-70B-Instruct-16k is temporarily unavailable. Please try again later!', 'param': None, 'type': ''}}

I called the Meta-Llama-3.3-70B-Instruct model with my code.
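For reference, the call itself is nothing unusual; roughly this (OpenAI-compatible client, placeholder key):

```python
# Minimal sketch of the call that triggered the 503 above; note the
# model ID is passed without any -16k suffix.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",
    api_key="YOUR_API_KEY",  # placeholder
)

resp = client.chat.completions.create(
    model="Meta-Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```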

Hi @hello1, the Meta-Llama-3.3-70B model currently supports a maximum context length of 4096 tokens.

Thanks & Regards

Thanks for the information.

I also got the same error as @hello1 just now on Dec 13: Error code: 503 - {'error': {'code': None, 'message': 'Meta-Llama-3.3-70B-Instruct-8k is temporarily unavailable. Please try again later!', 'param': None, 'type': ''}}. I wanted to see if 3.3 would speed up my apps, Stock vs. Stock and Parallel Unime.


@wwmcheung

You should not have -8k in your code. The correct model ID is:

Meta-Llama-3.3-70B-Instruct

In my code I passed the given model ID, but the response returned a message for a different model.

In my code I'm passing Meta-Llama-3.3-70B-Instruct as the model; I'm not using the -8k suffix. That suffix only appears in the error message. Maybe you should check the default on your side?
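To rule out anything on my side, this is essentially what goes over the wire; a sketch with httpx (placeholder key) showing the model field carries no suffix:

```python
# Send the raw request and print exactly what comes back; the payload's
# "model" field has no -8k suffix, yet the 503 message reports one.
import httpx

payload = {
    "model": "Meta-Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "ping"}],
}
r = httpx.post(
    "https://api.sambanova.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # placeholder
    json=payload,
)
print(r.status_code, r.text)
```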


Thank you for raising this. I'll write up a bug this evening.


Is there an ETA for when this will be fixed? Thanks

@wwmcheung
I do not have an exact date, but I can say soon. I apologize for not being able to be more specific at this time.

-Coby


@wwmcheung @hello1 I tested this morning and API access seems to be working. Please try again and report back if there are any issues.

Thanks!
Seth


Thanks, Seth,

I tested just now and got the following error message in my code; the error code comes from the API:
2024.12.18 17:23:17.288 [WARNING] - [SambaNova] - Error while generating text: Error code: 503 - {'error': {'code': None, 'message': 'Meta-Llama-3.3-70B-Instruct-64k is temporarily unavailable. Please try again later!', 'param': None, 'type': ''}}

Yes, sorry, I spoke too soon. Context lengths over 4k tokens are still throwing errors. I'll update once this has been corrected.


Thanks!
Until the context window is larger, I will not use the new model in my code.


Hi, any ETA on this? I tried again today and still get: Error code: 429 - {'error': {'code': None, 'message': 'Meta-Llama-3.3-70B-Instruct-8k is temporarily unavailable. Please try again later!', 'param': None, 'type': ''}} httpx.HTTPStatusError: Client error '429 Too Many Requests' for url 'https://api.sambanova.ai/v1/chat/completions'
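In the meantime I've wrapped the call in a simple retry with exponential backoff so these 429/503 "temporarily unavailable" responses don't kill my runs; a sketch:

```python
# Sketch: retry the chat call with exponential backoff whenever the API
# answers 429 or 503, since both surface as "temporarily unavailable".
import time
import httpx

def chat_with_retry(payload, api_key, max_retries=5):
    for attempt in range(max_retries):
        r = httpx.post(
            "https://api.sambanova.ai/v1/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json=payload,
            timeout=60,
        )
        if r.status_code not in (429, 503):
            r.raise_for_status()  # surface any other HTTP error
            return r.json()
        time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    raise RuntimeError("Model still unavailable after retries")
```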