Context
Modern AI applications often leverage multiple large language models (LLMs) deployed via cloud APIs like SambaNova Cloud. For seamless development and troubleshooting, it’s essential to maintain clear visibility into model endpoint responsiveness and behavior.
That’s where a simple, continuous diagnostic tool becomes useful — helping developers quickly validate endpoint behavior and share actionable output when needed.
Why PulseProbe?
While full-featured monitoring platforms exist, they may be overkill for everyday development and debugging needs.
PulseProbe is intentionally lightweight, CLI-friendly, and easy to integrate. It is designed to help teams validate model interactions and provide quick diagnostics for SambaNova Cloud APIs.
Problem Statement
Teams building with SambaNova Cloud often integrate multiple models across a range of workflows. At times, you may encounter:
- Unclear response behavior during development
- Inconsistent results in CLI or GUI workflows
- Challenges replicating conditions during support requests
Without a quick way to test API behavior across models, it’s difficult to:
- Pinpoint whether an issue is model-specific
- Collect reproducible examples
- Provide actionable information to support teams
Goal
Create a lightweight, script-based diagnostic tool that:
- Dynamically fetches the latest model list from SambaNova Cloud
- Cycles through each model with a standardized test prompt
- Logs success/failure in real-time for quick triage
- Helps users capture and share API response behavior easily
- Minimizes resource/credit usage during checks
- Handles clean exits for easy CLI or scheduled use
Solution: PulseProbe
`pulseprobe.py` is a Python utility for real-time, low-cost diagnostics of LLM APIs on SambaNova Cloud.
Key Features:
- Accepts a SambaNova API key as a CLI argument (no hardcoding)
- Fetches the current active model list via `/v1/models`
- Skips test/guard models like `Meta-Llama-Guard-*`
- Sends a minimal prompt ("Say hello!") to each model
- Logs results clearly:
  - Successful `200 OK` responses
  - Failures with the HTTP code and message
  - Graceful handling of exceptions and timeouts
- Includes a 1-second delay between checks to limit credit use
- Supports a clean exit on `Ctrl+C`
How It Works
Loop Steps:
- Initialize: load the API key from the CLI and set up HTTP headers.
- Fetch Models: send a GET request to `https://api.sambanova.ai/v1/models` and filter out any models with `"Guard"` in their ID (see the sketch after this list).
- Send Request: POST to `/v1/chat/completions` for each model with the prompt "Say hello!".
- Log Result: print success on HTTP 200; log the error code and reason on failure; print a warning on exceptions (e.g., timeouts).
- Repeat: after all models are tested, print a cycle summary and start the next cycle.
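For reference, here is a minimal sketch of the filtering in the Fetch Models step. It assumes the OpenAI-style response shape (`{"data": [{"id": ...}, ...]}`) that the script parses; the model IDs are illustrative, taken from the sample output further below.

```python
# Sketch of the Guard filter; the response shape is the OpenAI-style
# schema the script assumes, with illustrative model IDs.
sample_response = {
    "data": [
        {"id": "DeepSeek-R1"},
        {"id": "Meta-Llama-Guard-3-8B"},  # guard model, filtered out
        {"id": "E5-Mistral-7B-Instruct"},
    ]
}
models = [m["id"] for m in sample_response["data"] if "Guard" not in m["id"]]
print(models)  # ['DeepSeek-R1', 'E5-Mistral-7B-Instruct']
```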
Prerequisites
- Python 3.7 or higher
- The `requests` package (`pip install requests`)
- A valid SambaNova Cloud API key
Code Overview
Filename: pulseprobe.py
Core Modules:
- `requests`: for API interactions
- `signal`: for handling clean exits
- `time`: for delays between requests
- `sys`: for CLI argument parsing
Highlights:
- Runtime model discovery
- Simple filtering and test loop
- Human-readable, real-time logging
- Useful in CI/CD, dev environments, and issue reporting (see the example below)
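PulseProbe runs until interrupted, so for CI/CD or scheduled checks a bounded run is often more practical. One possible approach on Unix-like systems with GNU coreutils (assuming the key is stored in an environment variable, here named SAMBANOVA_API_KEY as a convention of this article, not a requirement of the script) is to cap the run with `timeout`:

```
# Probe for 60 seconds, then stop; exit code 124 indicates the time limit was hit.
timeout 60 python pulseprobe.py "$SAMBANOVA_API_KEY"
```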
Usage Example & Code:
Run the script by passing your SambaNova API key as a command-line argument:
```
python pulseprobe.py <YOUR_API_KEY>
```

Example:

```
python pulseprobe.py 34db0ebf-4962-4124-b209-5a3c33f06d4b
```
Sample Output:
```
📥 Found 14 active models (excluding 'Guard').
✅ DeepSeek-R1 responded successfully.
✅ E5-Mistral-7B-Instruct responded successfully.
❌ QwQ-32B error: 503 - Service Unavailable
⚠️ Meta-Llama-3.3-70B-Instruct failed: ReadTimeout
🔁 Completed cycle #1. Continuing...
```
Code:
pulseprobe.py
```python
import time
import requests
import signal
import sys

# Constants
API_URL = "https://api.sambanova.ai/v1/chat/completions"
MODEL_LIST_URL = "https://api.sambanova.ai/v1/models"
REQUEST_TIMEOUT = 30  # seconds; lets slow endpoints surface as ReadTimeout

# Graceful termination on Ctrl+C
def signal_handler(sig, frame):
    print("\n⛔ Terminated by user.")
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)

# Check for API key
if len(sys.argv) != 2:
    print("❗ Usage: python pulseprobe.py <YOUR_API_KEY>")
    sys.exit(1)

API_KEY = sys.argv[1]
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Fetch the latest model list from SambaNova
def fetch_models():
    try:
        response = requests.get(MODEL_LIST_URL, headers=HEADERS, timeout=REQUEST_TIMEOUT)
        response.raise_for_status()
        data = response.json()
        # Skip models that include 'Guard' in their ID
        models = [m["id"] for m in data.get("data", []) if "id" in m and "Guard" not in m["id"]]
        print(f"📥 Found {len(models)} active models (excluding 'Guard').")
        return models
    except requests.RequestException as e:
        print(f"❌ Error fetching model list: {e}")
        sys.exit(1)

# Send the minimal prompt to one model
def make_request(model):
    payload = {
        "model": model,
        "messages": [
            {"role": "user", "content": "Say hello!"}
        ],
        "max_tokens": 16,  # small cap keeps each check cheap
        "temperature": 0.5,
        "stream": False
    }
    try:
        res = requests.post(API_URL, headers=HEADERS, json=payload, timeout=REQUEST_TIMEOUT)
        if res.status_code == 200:
            print(f"✅ {model} responded successfully.")
        else:
            print(f"❌ {model} error: {res.status_code} - {res.text}")
    except Exception as e:
        # Report the exception type (e.g., ReadTimeout) for quick triage
        print(f"⚠️ {model} failed: {type(e).__name__}")

# Main loop: test one model per iteration, summarize after each full cycle
def monitor_models():
    models = fetch_models()
    if not models:
        print("⚠️ No models available. Check your credentials or the model API.")
        sys.exit(1)
    idx = 0
    cycle = 1
    while True:
        model = models[idx]
        make_request(model)
        idx = (idx + 1) % len(models)
        if idx == 0:
            print(f"\n🔁 Completed cycle #{cycle}. Continuing...\n")
            cycle += 1
        time.sleep(1)  # 1-second delay between checks to limit credit use

if __name__ == "__main__":
    monitor_models()
```
Use Cases
| Use Case | Description |
|---|---|
| Diagnostic Visibility | Quickly validate whether models are responding as expected. |
| CLI/GUI Troubleshooting | Check response behavior when results are unclear. |
| Support Ticket Aid | Capture logs and share them with SambaNova support to streamline triage (example below). |
| Credit-Conscious Checks | Uses a minimal prompt plus a 1-second delay to reduce credit cost. |
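For the Support Ticket Aid case, output can be saved to a file while still streaming to the terminal; on Unix-like systems, `tee` works well (Python's `-u` flag disables output buffering so lines appear immediately when piped):

```
# Mirror live output to the terminal and save it for a support ticket.
python -u pulseprobe.py <YOUR_API_KEY> | tee pulseprobe.log
```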
API Credits & Usage Notes
- PulseProbe was designed to run efficiently and continuously without consuming excessive API credits.
- It uses a lightweight prompt and delay between checks to conserve resources.
- You can adjust the timing or the number of models tested to further control usage, as sketched below.
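As one illustration of that kind of tuning (a hypothetical extension, not part of `pulseprobe.py` as published above), the CLI handling could move from `sys.argv` to `argparse` to expose the inter-check delay and a per-cycle model cap; the flag names here are illustrative:

```python
import argparse

# Hypothetical extension: expose the inter-check delay and a cap on how
# many models are tested per cycle. Flag names are illustrative.
parser = argparse.ArgumentParser(description="PulseProbe diagnostics")
parser.add_argument("api_key", help="SambaNova Cloud API key")
parser.add_argument("--delay", type=float, default=1.0,
                    help="Seconds to wait between checks (default: 1)")
parser.add_argument("--max-models", type=int, default=None,
                    help="Test only the first N models each cycle")
args = parser.parse_args()

# Inside monitor_models(), these would replace the hardcoded values:
#   models = fetch_models()[:args.max_models]
#   time.sleep(args.delay)
```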