Analysis
Most teams running AI agents are flying blind. The agent works in testing, ships to production, and then the bills arrive: a token spend nobody budgeted for, a latency spike a customer noticed before you did, an error rate creeping up while everyone assumed things were fine. The model itself rarely tells you any of this. You have to go looking.
That gap is the problem a monitoring dashboard solves. Instead of digging through provider invoices at the end of the month or grepping logs after something breaks, you get a live picture of what your agents are actually doing: how many requests they handle, how long each one takes, how much each one costs, and which models are carrying the load.
The build below puts that picture on a screen. A Python backend collects the numbers, a WebSocket pushes them to the browser as they happen, and a React dashboard charts them. None of it is exotic. It's the same stack a lot of Australian engineering teams already run, wired together for one job: telling you the truth about your AI system while it's running, not after.
Here's how the pieces fit.
Prerequisites
- Python 3.10+, Node.js 20+
- InfluxDB or TimescaleDB
- Docker for one-command deployment
- Basic React knowledge for frontend
Step-by-Step Framework
Step 1: Metrics Collection
Everything starts with capturing each request as it happens. The collector below batches raw events in memory and flushes them to a time-series database, either InfluxDB or TimescaleDB, both of which are built for exactly this kind of metrics storage. Batching matters: writing every single request to the database one at a time will hammer it under load, so the buffer holds points until there are 100 of them or the flush timer fires.
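Before the collector itself, it may help to see the shape of the event it expects. The sketch below builds one such dict from raw request data. The helper name `build_request_event`, the model and provider names, and the per-million-token prices are all hypothetical placeholders rather than real provider rates, so substitute your own.

```python
from typing import Dict

# Hypothetical prices in USD per one million tokens; replace with real rates.
PRICES_PER_MILLION = {
    "example-model": {"input": 1.00, "output": 2.00},
}


def build_request_event(model: str, provider: str, status: str,
                        input_tokens: int, output_tokens: int,
                        latency_ms: float, endpoint: str = "default") -> Dict:
    """Build a dict with the keys that record_request() reads."""
    # Unknown models fall back to a zero price instead of raising
    price = PRICES_PER_MILLION.get(model, {"input": 0.0, "output": 0.0})
    cost_usd = (input_tokens * price["input"]
                + output_tokens * price["output"]) / 1_000_000
    return {
        "model": model,
        "provider": provider,
        "status": status,  # success, error, timeout
        "endpoint": endpoint,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency_ms": latency_ms,
        "cost_usd": cost_usd,
    }


event = build_request_event("example-model", "example-provider", "success",
                            input_tokens=1200, output_tokens=300,
                            latency_ms=840.0)
```

A dict like `event` is what a caller would hand to `record_request`. Computing the cost at the call site keeps pricing logic out of the collector, which only stores whatever it is given.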
# monitoring/collector.py
import asyncio
from datetime import datetime, timezone
from typing import Dict


class MetricsCollector:
    """Buffers request metrics in memory and writes them out in batches."""

    def __init__(self, influx_client):
        # influx_client must expose an async write_points(points) method
        self.influx = influx_client
        self.buffer = []
        self.flush_interval = 10  # seconds

    async def record_request(self, data: Dict):
        """Record a single request metric."""
        point = {
            "measurement": "llm_requests",
            "tags": {
                "model": data["model"],
                "provider": data["provider"],
                "status": data["status"],  # success, error, timeout
                "endpoint": data.get("endpoint", "default")
            },
            "fields": {
                "input_tokens": data["input_tokens"],
                "output_tokens": data["output_tokens"],
                "total_tokens": data["input_tokens"] + data["output_tokens"],
                "latency_ms": data["latency_ms"],
                "cost_usd": data.get("cost_usd", 0),
                "error": 1 if data["status"] == "error" else 0
            },
            # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12
            "time": datetime.now(timezone.utc)
        }
        self.buffer.append(point)
        # Flush early once the batch is full instead of waiting for the timer
        if len(self.buffer) >= 100:
            await self._flush()

    async def _flush(self):
        """Write all buffered points in one batch."""
        if not self.buffer:
            return
        # Swap the buffer out before awaiting, so points recorded
        # while the write is in flight are not thrown away
        points, self.buffer = self.buffer, []
        await self.influx.write_points(points)

    async def start(self):
        """Flush on a fixed interval; run this as a background task."""
        while True:
            await asyncio.sleep(self.flush_interval)
            await self._flush()

Step 2: FastAPI Backend with WebSockets
Next, the backend serves the data to the browser. FastAPI handles WebSocket connections natively through the @app.websocket decorator, with websocket.accept() to open the connection and receive_text/send_text to pass messages back and forth. See the Better Stack guide to FastAPI WebSockets for the full pattern. The broadcast_metrics loop wakes every five seconds, queries the latest aggregates, and pushes them to every connected client, dropping any that have gone dead.
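The send-and-prune step in that loop can be tried in isolation. The sketch below is a standard-library stand-in, not the FastAPI code: `FakeClient` is a hypothetical class that mimics only the `send_text` coroutine of a WebSocket, and `broadcast_once` is an illustrative name.

```python
import asyncio
import json


class FakeClient:
    """Hypothetical stand-in for a WebSocket; only send_text is mimicked."""

    def __init__(self, alive: bool = True):
        self.alive = alive
        self.received = []

    async def send_text(self, message: str) -> None:
        if not self.alive:
            raise ConnectionError("client went away")
        self.received.append(message)


async def broadcast_once(clients: set, payload: dict) -> int:
    """Send payload to every client, drop the ones that fail, return the drop count."""
    message = json.dumps({"type": "metrics_update", "data": payload})
    dead = set()
    # Iterate over a snapshot: the set may change while a send is awaited
    for client in list(clients):
        try:
            await client.send_text(message)
        except Exception:
            dead.add(client)
    # Mutate in place so every holder of the set sees the removal
    clients.difference_update(dead)
    return len(dead)


good, gone = FakeClient(), FakeClient(alive=False)
clients = {good, gone}
dropped = asyncio.run(broadcast_once(clients, {"requests_per_minute": 42}))
```

After the call, `clients` holds only the client that accepted the message. Collecting failures in a separate set and removing them after the loop avoids modifying the set while it is being walked.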
One thing worth flagging before you ship this: the CORS config below uses allow_origins=["*"] together with allow_credentials=True, and the connection handlers swallow errors with bare except blocks. That's fine for a local build, but lock down the allowed origins and tighten the error handling before this faces the public internet.
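Locking down the origins could start with reading them from configuration instead of hard-coding a wildcard. The sketch below is a minimal version under stated assumptions: the environment variable name `DASHBOARD_ALLOWED_ORIGINS`, the helper `parse_allowed_origins`, and the `http://localhost:3000` fallback are all hypothetical, and the resulting list would be passed as `allow_origins` to `CORSMiddleware`.

```python
import os
from typing import List, Optional


def parse_allowed_origins(raw: Optional[str]) -> List[str]:
    """Split a comma-separated origin list and reject the wildcard."""
    origins = [o.strip().rstrip("/") for o in (raw or "").split(",") if o.strip()]
    if "*" in origins:
        raise ValueError("wildcard origin is not allowed when credentials are enabled")
    return origins


# Hypothetical variable name; falls back to a local dev origin only.
ALLOWED_ORIGINS = parse_allowed_origins(
    os.environ.get("DASHBOARD_ALLOWED_ORIGINS", "http://localhost:3000")
)

# In api.py this list would take the place of the wildcard, for example:
# app.add_middleware(CORSMiddleware, allow_origins=ALLOWED_ORIGINS,
#                    allow_credentials=True, allow_methods=["GET"],
#                    allow_headers=["*"])
```

Failing loudly on a wildcard means a misconfigured deployment stops at startup instead of quietly serving every origin.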
# monitoring/api.py
import asyncio
import json
from datetime import datetime, timezone

from fastapi import FastAPI, WebSocket
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
# Local-development CORS settings; restrict allow_origins before deploying
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"]
)

connected_clients = set()


@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    connected_clients.add(websocket)
    try:
        while True:
            data = await websocket.receive_text()
            # Client can send filter preferences
    except:  # bare except: acceptable locally, tighten before production
        connected_clients.discard(websocket)


# Note: nothing in this snippet starts broadcast_metrics(); it has to be
# scheduled as a background task when the application starts.
async def broadcast_metrics():
    """Broadcast latest metrics to all connected clients."""
    while True:
        await asyncio.sleep(5)
        metrics = await get_latest_metrics()
        message = json.dumps({
            "type": "metrics_update",
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "data": metrics
        })
        dead_clients = set()
        # Iterate over a snapshot: clients can connect while a send is awaited
        for client in list(connected_clients):
            try:
                await client.send_text(message)
            except:  # bare except: acceptable locally, tighten before production
                dead_clients.add(client)
        # In-place removal; `connected_clients -= dead_clients` would rebind the
        # name as a local variable and raise UnboundLocalError
        connected_clients.difference_update(dead_clients)


# get_rpm(), get_avg_latency() and the other query helpers used below,
# including query_history(), are not shown in this section.
async def get_latest_metrics():
    """Query aggregated metrics."""
    return {
        "requests_per_minute": await get_rpm(),
        "avg_latency_ms": await get_avg_latency(),
        "error_rate": await get_error_rate(),
        "tokens_per_minute": await get_tpm(),
        "cost_per_hour": await get_hourly_cost(),
        "active_models": await get_model_distribution(),
        "top_endpoints": await get_top_endpoints()
    }


@app.get("/api/metrics/current")
async def get_current_metrics():
    return await get_latest_metrics()


@app.get("/api/metrics/history")
async def get_history(metric: str, period: str = "1h"):
    """Get historical data for charting."""
    return await query_history(metric, period)

Step 3: React Dashboard Frontend
The frontend opens a WebSocket to the backend, listens for metrics_update messages, and keeps the last 50 readings in state so the charts have something to plot over time. Charting runs on Recharts, a composable React library built on D3. The LineChart, Line, XAxis, YAxis, Tooltip and ResponsiveContainer components imported here are its standard building blocks, and ResponsiveContainer is what makes the chart resize cleanly with the layout.
// dashboard/src/App.tsx
import { useEffect, useState } from 'react';
import { LineChart, Line, XAxis, YAxis, Tooltip, ResponsiveContainer } from 'recharts';

interface Metrics {
  requests_per_minute: number;
  avg_latency_ms: number;
  error_rate: number;
  tokens_per_minute: number;
  cost_per_hour: number;
}

function App() {
  const [metrics, setMetrics] = useState<Metrics | null>(null);
  const [history, setHistory] = useState<any[]>([]);
  const [ws, setWs] = useState<WebSocket | null>(null);

  useEffect(() => {
    // Open the socket once on mount and close it on unmount
    const socket = new WebSocket('ws://localhost:8000/ws');
    socket.onmessage = (event) => {
      const data = JSON.parse(event.data);
      if (data.type === 'metrics_update') {
        setMetrics(data.data);
        // Keep the previous 49 readings plus the new one: 50 in total
        setHistory(prev => [...prev.slice(-49), {
          time: new Date().toLocaleTimeString(),
          rpm: data.data.requests_per_minute,
          latency: data.data.avg_latency_ms,
          errors: data.data.error_rate * 100
        }]);
      }
    };
    setWs(socket);
    return () => socket.close();
  }, []);

  return (
    <div className="dashboard">
      <h1>AI Agent Monitoring</h1>
      <div className="metrics-grid">
        <MetricCard title="Requests/min" value={metrics?.requests_per_minute ?? 0} />
        <MetricCard title="Avg Latency" value={




