Context
Modern AI applications often leverage multiple large language models (LLMs) deployed via cloud APIs like SambaNova Cloud. For seamless development and troubleshooting, it’s essential to maintain clear visibility into model endpoint responsiveness and behavior.
That’s where a simple, continuous diagnostic tool becomes useful — helping developers quickly validate endpoint behavior and share actionable output when needed.
Why PulseProbe?
While full-featured monitoring platforms exist, they may be overkill for everyday development and debugging needs.
PulseProbe is intentionally lightweight, CLI-friendly, and easy to integrate. It is designed to help teams validate model interactions and provide quick diagnostics for SambaNova Cloud APIs.
Problem Statement
Teams building with SambaNova Cloud often integrate multiple models across a range of workflows. At times, you may encounter:
- Unclear response behavior during development
- Inconsistent results in CLI or GUI workflows
- Challenges replicating conditions during support requests
Without a quick way to test API behavior across models, it’s difficult to:
- Pinpoint whether an issue is model-specific
- Collect reproducible examples
- Provide actionable information to support teams
Goal
Create a lightweight, script-based diagnostic tool that:
- Dynamically fetches the latest model list from SambaNova Cloud
- Cycles through each model with a standardized test prompt
- Logs success/failure in real-time for quick triage
- Helps users capture and share API response behavior easily
- Minimizes resource/credit usage during checks
- Handles clean exits for easy CLI or scheduled use
Solution: PulseProbe
`pulseprobe.py` is a Python utility for real-time, low-cost diagnostics of LLM APIs on SambaNova Cloud.
Key Features:
- Accepts a SambaNova API key as a CLI argument (no hardcoding)
- Fetches the current active model list via `/v1/models`
- Skips test/guard models like `Meta-Llama-Guard-*`
- Sends a minimal prompt ("Say hello!") to each model
- Logs results clearly:
  - Successful `200 OK` responses
  - Failures with the HTTP code and message
  - Graceful handling of exceptions and timeouts
- Includes a 1-second delay between checks to limit credit use
- Supports a clean exit on `Ctrl+C`
How It Works
Loop Steps:
- Initialize: load the API key from the CLI and set up HTTP headers.
- Fetch Models: send a GET request to `https://api.sambanova.ai/v1/models` and filter out any models with `"Guard"` in their ID (see the sketch after this list).
- Send Request: POST to `/v1/chat/completions` for each model with the prompt "Say hello!".
- Log Result: print success on HTTP 200; log the error code and reason on failure; print a warning on exceptions (e.g., timeouts).
- Repeat: after all models are tested, print a cycle summary and start the next cycle.
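For reference, here is a minimal sketch of the filtering in the Fetch Models step. It assumes the OpenAI-style response shape (`{"data": [{"id": ...}, ...]}`) that the script parses; the model IDs are illustrative, taken from the sample output further below.

```python
# Sketch of the Guard filter; the response shape is the OpenAI-style
# schema the script assumes, with illustrative model IDs.
sample_response = {
    "data": [
        {"id": "DeepSeek-R1"},
        {"id": "Meta-Llama-Guard-3-8B"},  # guard model, filtered out
        {"id": "E5-Mistral-7B-Instruct"},
    ]
}
models = [m["id"] for m in sample_response["data"] if "Guard" not in m["id"]]
print(models)  # ['DeepSeek-R1', 'E5-Mistral-7B-Instruct']
```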
Prerequisites
- Python 3.7 or higher
- The `requests` package (`pip install requests`)
- A valid SambaNova Cloud API key
Code Overview
Filename: pulseprobe.py
Core Modules:
- `requests`: for API interactions
- `signal`: for handling clean exits
- `time`: for delays between requests
- `sys`: for CLI argument parsing
Highlights:
- Runtime model discovery
- Simple filtering and test loop
- Human-readable, real-time logging
- Useful in CI/CD, dev environments, and issue reporting (see the example below)
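PulseProbe runs until interrupted, so for CI/CD or scheduled checks a bounded run is often more practical. One possible approach on Unix-like systems with GNU coreutils (assuming the key is stored in an environment variable, here named SAMBANOVA_API_KEY as a convention of this article, not a requirement of the script) is to cap the run with `timeout`:

```
# Probe for 60 seconds, then stop; exit code 124 indicates the time limit was hit.
timeout 60 python pulseprobe.py "$SAMBANOVA_API_KEY"
```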
Usage Example & Code:
Run the script by passing your SambaNova API key as a command-line argument:
```
python pulseprobe.py <YOUR_API_KEY>
```

Example:

```
python pulseprobe.py 34db0ebf-4962-4124-b209-5a3c33f06d4b
```
Sample Output:
```
📥 Found 14 active models (excluding 'Guard').
✅ DeepSeek-R1 responded successfully.
✅ E5-Mistral-7B-Instruct responded successfully.
❌ QwQ-32B error: 503 - Service Unavailable
⚠️ Meta-Llama-3.3-70B-Instruct failed: ReadTimeout
🔁 Completed cycle #1. Continuing...
```
Code:
pulseprobe.py
```python
import time
import requests
import signal
import sys

# Constants
API_URL = "https://api.sambanova.ai/v1/chat/completions"
MODEL_LIST_URL = "https://api.sambanova.ai/v1/models"
REQUEST_TIMEOUT = 30  # seconds; lets slow endpoints surface as ReadTimeout

# Graceful termination on Ctrl+C
def signal_handler(sig, frame):
    print("\n⛔ Terminated by user.")
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)

# Check for API key
if len(sys.argv) != 2:
    print("❗ Usage: python pulseprobe.py <YOUR_API_KEY>")
    sys.exit(1)

API_KEY = sys.argv[1]
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Fetch the latest model list from SambaNova
def fetch_models():
    try:
        response = requests.get(MODEL_LIST_URL, headers=HEADERS, timeout=REQUEST_TIMEOUT)
        response.raise_for_status()
        data = response.json()
        # Skip models that include 'Guard' in their ID
        models = [m["id"] for m in data.get("data", []) if "id" in m and "Guard" not in m["id"]]
        print(f"📥 Found {len(models)} active models (excluding 'Guard').")
        return models
    except requests.RequestException as e:
        print(f"❌ Error fetching model list: {e}")
        sys.exit(1)

# Send the minimal prompt to one model
def make_request(model):
    payload = {
        "model": model,
        "messages": [
            {"role": "user", "content": "Say hello!"}
        ],
        "max_tokens": 16,  # small cap keeps each check cheap
        "temperature": 0.5,
        "stream": False
    }
    try:
        res = requests.post(API_URL, headers=HEADERS, json=payload, timeout=REQUEST_TIMEOUT)
        if res.status_code == 200:
            print(f"✅ {model} responded successfully.")
        else:
            print(f"❌ {model} error: {res.status_code} - {res.text}")
    except Exception as e:
        # Report the exception type (e.g., ReadTimeout) for quick triage
        print(f"⚠️ {model} failed: {type(e).__name__}")

# Main loop: test one model per iteration, summarize after each full cycle
def monitor_models():
    models = fetch_models()
    if not models:
        print("⚠️ No models available. Check your credentials or the model API.")
        sys.exit(1)
    idx = 0
    cycle = 1
    while True:
        model = models[idx]
        make_request(model)
        idx = (idx + 1) % len(models)
        if idx == 0:
            print(f"\n🔁 Completed cycle #{cycle}. Continuing...\n")
            cycle += 1
        time.sleep(1)  # 1-second delay between checks to limit credit use

if __name__ == "__main__":
    monitor_models()
```
Use Cases
| Use Case | Description |
|---|---|
| Diagnostic Visibility | Quickly validate whether models are responding as expected. |
| CLI/GUI Troubleshooting | Check response behavior when results are unclear. |
| Support Ticket Aid | Capture logs and share them with SambaNova support to streamline triage (example below). |
| Credit-Conscious Checks | Uses a minimal prompt plus a 1-second delay to reduce credit cost. |
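For the Support Ticket Aid case, output can be saved to a file while still streaming to the terminal; on Unix-like systems, `tee` works well (Python's `-u` flag disables output buffering so lines appear immediately when piped):

```
# Mirror live output to the terminal and save it for a support ticket.
python -u pulseprobe.py <YOUR_API_KEY> | tee pulseprobe.log
```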
API Credits & Usage Notes
- PulseProbe was designed to run efficiently and continuously without consuming excessive API credits.
- It uses a lightweight prompt and delay between checks to conserve resources.
- You can adjust the timing or the number of models tested to further control usage, as sketched below.
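As one illustration of that kind of tuning (a hypothetical extension, not part of `pulseprobe.py` as published above), the CLI handling could move from `sys.argv` to `argparse` to expose the inter-check delay and a per-cycle model cap; the flag names here are illustrative:

```python
import argparse

# Hypothetical extension: expose the inter-check delay and a cap on how
# many models are tested per cycle. Flag names are illustrative.
parser = argparse.ArgumentParser(description="PulseProbe diagnostics")
parser.add_argument("api_key", help="SambaNova Cloud API key")
parser.add_argument("--delay", type=float, default=1.0,
                    help="Seconds to wait between checks (default: 1)")
parser.add_argument("--max-models", type=int, default=None,
                    help="Test only the first N models each cycle")
args = parser.parse_args()

# Inside monitor_models(), these would replace the hardcoded values:
#   models = fetch_models()[:args.max_models]
#   time.sleep(args.delay)
```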