Also, the All Models view apparently shows 8B usage only, not actual usage across everything.
Can you explain more? What does the API request look like, and which models did you try?
I see now that usage for all the 8k-ctx endpoints is included in the total usage, but we can't see it if we drill into the 8B/70B/405B pages.
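To answer the request question: here is roughly what my calls look like (a minimal sketch assuming an OpenAI-compatible chat completions endpoint; the base URL and model identifier below are placeholders, not the provider's real values):

```python
import os
import requests

BASE_URL = "https://api.example.com/v1"   # placeholder base URL
MODEL = "llama-3.1-405b-instruct-4k"      # placeholder model identifier

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Swapping the `model` field between the 8B, 70B, and 405B identifiers is the only change between runs.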
Also, it appears (at least as of yesterday) that the behavior of the usage page is a bit wonky.
Yesterday I made 11 requests to 405b-4k, 2 requests to 70b-4k, and 1 request to 8b-4k.
The All Models page shows only 11 requests in total, not 14 (11 + 2 + 1), and the token count matches what I see on the 405b page for yesterday.
Yeah, one big surprise for me: I was expecting a 128k context, so the 8k context window caught me off guard.
Perhaps a note about context size should be documented at:
@sam.saffron Thank you for the feedback. I will take that back to engineering. I know it is mentioned in other places, but having it there would make it easier on developers.
Hello, yes, a larger sequence length is expected. Even a simple request puts me over the 8k limit:
```
Requested generation length 1 is not possible! The provided prompt is 16811 tokens long, so generating 1 tokens requires a sequence length of 16812, but the maximum supported sequence length is just 8192!
```
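For now I pre-check prompts client-side before sending anything (a rough sketch; the 8192 limit comes from the error above, but the 4-characters-per-token ratio is only a heuristic I'm assuming, and the server tokenizer's count is the only authoritative one):

```python
# Pre-flight check against the 8k sequence limit reported by the server.
MAX_SEQ_LEN = 8192      # from the error message above
CHARS_PER_TOKEN = 4     # crude heuristic; the actual tokenizer will differ

def fits_context(prompt: str, max_new_tokens: int) -> bool:
    """Rough estimate of whether prompt + generation fits the window."""
    est_prompt_tokens = len(prompt) / CHARS_PER_TOKEN
    return est_prompt_tokens + max_new_tokens <= MAX_SEQ_LEN

# ~36k characters estimates to ~9k tokens, so this reports False.
print(fits_context("hello " * 6000, 256))
```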