I would like to first express appreciation to the team for providing me with the API key to explore Llama models with SambaNova.
I was trying to analyse the speed of each request I made, and I found the relevant information stored in the CompletionUsage class.
Below is my result:
CompletionUsage(completion_tokens=505, prompt_tokens=934, total_tokens=1439, completion_tokens_after_first_per_sec=97.85392382608508, completion_tokens_after_first_per_sec_first_ten=100.00384770462531, completion_tokens_per_sec=93.61228421798707, end_time=1725867271.8211262, is_last_response=True, start_time=1725867266.3258064, time_to_first_token=0.3447854518890381, total_latency=5.394591150281611, total_tokens_per_sec=266.74866730630373)
The recent blog post mentioned that SambaNova was able to run Llama 3.1 405B at 114 tok/sec. Is completion_tokens_per_sec the attribute that corresponds to that tok/s figure?
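
To narrow the question down, here is a small sanity check on the numbers above. It is only a sketch based on my own assumption that the two overall rates are simple token counts divided by total_latency; I have not found this confirmed in any documentation. The dictionary just holds values copied from the output above, and recompute_rates is a helper name I made up.

```python
# Values copied from the CompletionUsage output above.
usage = {
    "completion_tokens": 505,
    "prompt_tokens": 934,
    "total_tokens": 1439,
    "completion_tokens_per_sec": 93.61228421798707,
    "total_tokens_per_sec": 266.74866730630373,
    "time_to_first_token": 0.3447854518890381,
    "total_latency": 5.394591150281611,
}


def recompute_rates(u):
    """Recompute the overall rates under an ASSUMED definition:
    rate = token count / total_latency (not confirmed by any docs)."""
    return {
        "completion_tokens_per_sec": u["completion_tokens"] / u["total_latency"],
        "total_tokens_per_sec": u["total_tokens"] / u["total_latency"],
    }


rates = recompute_rates(usage)
for name, value in rates.items():
    # Print recomputed and reported values side by side for comparison.
    print(f"{name}: recomputed={value:.3f} reported={usage[name]:.3f}")
```

For this one request the recomputed values agree with the reported ones, so completion_tokens_per_sec appears to be measured over the whole request, including the time to first token. Judging by its name, completion_tokens_after_first_per_sec presumably excludes that initial wait, which is why I am unsure which of the two the blog post figure refers to.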