Also, the All Models view apparently shows 8B usage only, not actual usage across everything.
Can you explain more? What does the API request look like, and which models did you try?
I see now that usage for all the 8k-ctx endpoints is included in the total usage, but we can't see it if we drill into the 8B/70B/405B pages.
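To answer the request question: here is roughly what my calls look like (a minimal sketch assuming an OpenAI-compatible chat completions endpoint; the base URL and model identifier below are placeholders, not the provider's real values):

```python
import os
import requests

BASE_URL = "https://api.example.com/v1"   # placeholder base URL
MODEL = "llama-3.1-405b-instruct-4k"      # placeholder model identifier

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Swapping the `model` field between the 8B, 70B, and 405B identifiers is the only change between runs.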
Also, it appears (at least as of yesterday) that the behavior of the usage page is a bit wonky.
Yesterday I made 11 requests to 405b-4k, 2 requests to 70b-4k, and 1 request to 8b-4k.
The All Models page shows only 11 requests in total, not 14 (11 + 2 + 1), and the token count matches what I see on the 405b page for yesterday.
Yeah, one big surprise for me: I was expecting a 128k context, so the 8k context window caught me off guard.
Perhaps a note about context size should be documented at:
@sam.saffron Thank you for the feedback. I will take that back to engineering. I know it is mentioned in other places, but having it there would make it easier on developers.
Hello, yes, a larger sequence length is expected. Even a simple request puts me over the 8k limit:
```
Requested generation length 1 is not possible! The provided prompt is 16811 tokens long, so generating 1 tokens requires a sequence length of 16812, but the maximum supported sequence length is just 8192!
```
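For now I pre-check prompts client-side before sending anything (a rough sketch; the 8192 limit comes from the error above, but the 4-characters-per-token ratio is only a heuristic I'm assuming, and the server tokenizer's count is the only authoritative one):

```python
# Pre-flight check against the 8k sequence limit reported by the server.
MAX_SEQ_LEN = 8192      # from the error message above
CHARS_PER_TOKEN = 4     # crude heuristic; the actual tokenizer will differ

def fits_context(prompt: str, max_new_tokens: int) -> bool:
    """Rough estimate of whether prompt + generation fits the window."""
    est_prompt_tokens = len(prompt) / CHARS_PER_TOKEN
    return est_prompt_tokens + max_new_tokens <= MAX_SEQ_LEN

# ~36k characters estimates to ~9k tokens, so this reports False.
print(fits_context("hello " * 6000, 256))
```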