About the SambaNova Documentation category

SambaNova Cloud is a high-performance inference service that delivers rapid and precise results. Customers can seamlessly leverage SambaNova technology to enhance their user experience by integrating FastAPI inference APIs with their applications. This service provides an easy-to-use REST interface for streaming the inference results. Users are able to customize the inference parameters and pass the ML model on to the service.

SambaNova Cloud models are the industry’s latest and highest performing models. These include the Llama 3.1-8B at >1000 tokens/s, the Llama 3.1-70B as well as the most capable Llama 3.1-405B at >114 tokens/s

Get Started with our Documentation to Kick Off your AI Journey!