I see that the documentation currently only mentions a 4k context length. Has support for longer context lengths already been added?
Hi @pcchen !
Welcome to the community.
You can find our currently supported models/context lengths in the following post: Supported Models
We are always working on new releases and improvements to models and their context lengths, so do check back for updates in the future!
Kind Regards
Is the ‘max_tokens’ argument required for API inference in order to take advantage of the increased context length (e.g. 8k → 64k for Llama 3.1 70B)?
The hackathon apparently caused a significant slowdown over the past week, and even timeout issues, which I resolved by commenting out max_tokens.
But inference without max_tokens seems to fail for larger contexts, even though they are well under 64k.
So is max_tokens required? Can’t the Samba server accommodate the increased limits without us specifying max_tokens? Thanks!
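For context, here is a minimal sketch of the kind of call I mean, assuming an OpenAI-compatible chat completions endpoint. The base URL, API key, and model name below are placeholders, not necessarily the exact values from my setup:

```python
# Minimal sketch of a chat completion request with an explicit max_tokens.
# base_url, api_key, and model are placeholders; substitute your own values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",                  # placeholder key
)

long_prompt = "..."  # a long prompt, still well under the 64k context limit

response = client.chat.completions.create(
    model="Meta-Llama-3.1-70B-Instruct",     # placeholder model name
    messages=[{"role": "user", "content": long_prompt}],
    max_tokens=1024,  # the question: is this required for long-context requests?
)
print(response.choices[0].message.content)
```

Omitting the max_tokens line fixed the timeouts for me, but then the same long prompts started failing, which is what prompted the question.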