I see that the documentation currently only mentions a 4k context length. Has support for longer context lengths already been added?
Hi @pcchen !
Welcome to the community.
You can find our currently supported models/context lengths in the following post: Supported Models
We are always working on new releases and improvements to models and their context lengths, so do check back for updates in the future!
Kind Regards
Is the ‘max_tokens’ argument required for API inference in order to take advantage of the increased context length (e.g. 8k → 64k for Llama 3.1 70B)?
The hackathon apparently caused a significant slowdown over the past week, and even timeout issues, which I resolved by commenting out max_tokens.
But inference without max_tokens seems to fail for larger contexts, even though they are well under 64k.
So is max_tokens required? Can’t the Samba server accommodate the increased limits without us specifying max_tokens? Thanks!
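For context, here is a minimal sketch of the kind of call I mean, assuming an OpenAI-compatible chat completions endpoint. The base URL, API key, and model name below are placeholders, not necessarily the exact values from my setup:

```python
# Minimal sketch of a chat completion request with an explicit max_tokens.
# base_url, api_key, and model are placeholders; substitute your own values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",                  # placeholder key
)

long_prompt = "..."  # a long prompt, still well under the 64k context limit

response = client.chat.completions.create(
    model="Meta-Llama-3.1-70B-Instruct",     # placeholder model name
    messages=[{"role": "user", "content": long_prompt}],
    max_tokens=1024,  # the question: is this required for long-context requests?
)
print(response.choices[0].message.content)
```

Omitting the max_tokens line fixed the timeouts for me, but then the same long prompts started failing, which is what prompted the question.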