The Usage page doesn't show usage that goes to the -8k endpoints.

Also, "All Models" apparently points to 8B only, not actual usage across everything.

Can you explain more? What does the API request look like, and which models did you try?

I see now that usage for all 8k-context endpoints is included in the total usage, but it isn't broken out when we go into the 8b/70b/405b pages.
Also, it appears (at least yesterday) that the behavior of the usage page is a bit wonky.
Yesterday I made 11 requests to 405b-4k, 2 requests to 70b-4k, and 1 request to 8b-4k.
The All Models page shows 11 requests in total, not 14, and the number of tokens shown is the same as what I see on the 405b page for yesterday.
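For reference, the per-endpoint counts above should aggregate like this (numbers taken from the post; the dict and variable names are just illustrative):

```python
# Requests reported per endpoint yesterday (from the post above)
requests = {"405b-4k": 11, "70b-4k": 2, "8b-4k": 1}

total = sum(requests.values())
print(total)  # 14, though the All Models page showed only 11
```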

Yeah, one big surprise for me was that I was expecting a 128k context.

The 8k context window is a huge surprise.

Perhaps a note about the context size should be documented at:


@sam.saffron Thank you for the feedback. I will take that back to engineering. I know it is mentioned in other places, but having it there would make it easier on developers.


Hello, yes, a larger sequence length is expected; with a simple request I'm already over 8k :sweat_smile:

Requested generation length 1 is not possible! The provided prompt is 16811 tokens long, so generating 1 tokens requires a sequence length of 16812, but the maximum supported sequence length is just 8192!
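The arithmetic behind that error is simple: the prompt length plus the requested generation length must fit within the model's maximum sequence length. A minimal sketch of the check (the 8192 limit is taken from the error message above; the function name is illustrative, not part of any API):

```python
MAX_SEQ_LEN = 8192  # maximum supported sequence length, per the error message


def can_generate(prompt_tokens: int, gen_tokens: int,
                 max_seq_len: int = MAX_SEQ_LEN) -> bool:
    """Return True if prompt + requested generation fits the context window."""
    return prompt_tokens + gen_tokens <= max_seq_len


# The failing request from the error: 16811 + 1 = 16812 > 8192
print(can_generate(16811, 1))  # False
```

So even generating a single token fails once the prompt alone exceeds the window.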
