Understanding CompletionUsage attributes

angpers95 · September 9, 2024, 8:22am

I would like to first express appreciation to the team for providing me with the API key to explore Llama models with SambaNova.
I was trying to analyse the speed of each request I made, and I found the info stored in the CompletionUsage class.
Below is my result:

CompletionUsage(completion_tokens=505, prompt_tokens=934, total_tokens=1439, completion_tokens_after_first_per_sec=97.85392382608508, completion_tokens_after_first_per_sec_first_ten=100.00384770462531, completion_tokens_per_sec=93.61228421798707, end_time=1725867271.8211262, is_last_response=True, start_time=1725867266.3258064, time_to_first_token=0.3447854518890381, total_latency=5.394591150281611, total_tokens_per_sec=266.74866730630373)

The recent blog post mentioned that SambaNova was able to run Llama 3.1 405B at 114 tok/sec. Is completion_tokens_per_sec the attribute that corresponds to that tok/sec figure?

coby.adams · September 9, 2024, 11:32pm

@angpers95 I apologize for the delay. I am consulting with back-end engineering and will get you an answer.

coby.adams · September 10, 2024, 12:54am

@angpers95 the value used in our published numbers is completion_tokens_after_first_per_sec, that is, the rate at which tokens are returned after the time to first token.

Please note that the free tier may not always reach the published numbers. If you are interested in moving beyond the free tier, you may contact our sales department.
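
As a rough sanity check on the figures in this thread, the sketch below recomputes three of the reported rates from the raw fields in the CompletionUsage output posted above. The formulas are inferred by fitting the posted values; they are assumptions, not documented SambaNova definitions, and `derived_rates` is a hypothetical helper name used only for illustration.

```python
# Raw fields copied from the CompletionUsage output posted in this thread.
usage = {
    "completion_tokens": 505,
    "prompt_tokens": 934,
    "total_tokens": 1439,
    "start_time": 1725867266.3258064,
    "end_time": 1725867271.8211262,
    "time_to_first_token": 0.3447854518890381,
    "total_latency": 5.394591150281611,
}


def derived_rates(u):
    """Recompute throughput figures from raw fields.

    ASSUMPTION: these formulas were inferred from the posted numbers,
    not taken from official documentation.
    """
    # Wall-clock duration of the request, from the two timestamps.
    wall_clock = u["end_time"] - u["start_time"]
    # Time spent generating after the first token arrived.
    time_after_first = wall_clock - u["time_to_first_token"]
    return {
        # All completion tokens over the reported total latency.
        "completion_tokens_per_sec": u["completion_tokens"] / u["total_latency"],
        # Prompt + completion tokens over the reported total latency.
        "total_tokens_per_sec": u["total_tokens"] / u["total_latency"],
        # Tokens after the first one, over the time after the first token.
        "completion_tokens_after_first_per_sec": (u["completion_tokens"] - 1)
        / time_after_first,
    }


rates = derived_rates(usage)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

Under these assumed formulas the recomputed values agree with the reported ones to within rounding, which suggests that completion_tokens_per_sec includes the time to first token while completion_tokens_after_first_per_sec excludes it. That would explain why the latter is the higher of the two numbers.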