How-to Guide

How to build a real-time AI monitoring dashboard.

Create a live monitoring dashboard for AI agent systems with token usage tracking, latency metrics, error rates, cost analysis, and model performance comparisons.

Daniel Fleuren · 2026-06-19 · 15 min read · Australian business teams · Updated 2026-06-19

Written by

Daniel Fleuren

Founder, AI Kick Start. 20+ years enterprise IT



Decision

Start narrow

Use the article to decide the smallest useful workflow worth testing before expanding the system.

Risk to watch

Hype drift

Avoid turning a practical adoption step into a broad transformation promise nobody can verify.

Proof to collect

Business signal

Write down the owner, data boundary, review point, and measurable outcome before the first build.

TL;DR

A real-time monitoring dashboard gives you visibility into the core operating signals of your AI agent system: token consumption, latency distributions, error rates, cost per request, and model performance. This guide builds the dashboard from a FastAPI backend, WebSocket streaming, and a React frontend, reportedly deployable in under an hour.

Key takeaways

Metrics: Tokens, latency, errors, costs, model distribution
Streaming: WebSockets push updates to the browser as they are broadcast (every five seconds in this build)
Aggregation: Roll-up from raw events to minute/hour/day views
Alerting: Threshold-based alerts for anomalies
Storage: Time-series database (InfluxDB or TimescaleDB)
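
As a rough illustration of the threshold-based alerting mentioned above, the sketch below checks a metrics snapshot against fixed limits. The key names (error_rate, avg_latency_ms, cost_per_hour) match the metrics payload used later in this guide; the Threshold class, the check_thresholds function, and every limit value are assumptions made for illustration, not part of the original build.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Threshold:
    metric: str   # key in the metrics snapshot
    limit: float  # alert when the value rises above this
    label: str    # human-readable name used in the alert message


# Hypothetical limits: tune these to your own traffic and budget.
THRESHOLDS = (
    Threshold("error_rate", 0.05, "Error rate"),
    Threshold("avg_latency_ms", 2000.0, "Average latency (ms)"),
    Threshold("cost_per_hour", 10.0, "Cost per hour (USD)"),
)


def check_thresholds(
    snapshot: Dict[str, float],
    thresholds: Sequence[Threshold] = THRESHOLDS,
) -> List[str]:
    """Return one alert message per metric that exceeds its limit."""
    alerts = []
    for t in thresholds:
        value = snapshot.get(t.metric)
        # Metrics missing from the snapshot are skipped rather than alerted on.
        if value is not None and value > t.limit:
            alerts.append(f"{t.label} is {value:g}, above the limit of {t.limit:g}")
    return alerts
```

Calling check_thresholds on each snapshot keeps the alert rules in one place; where the messages go (email, chat, pager) is a separate decision.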

Analysis

Most teams running AI agents are flying blind. The agent works in testing, ships to production, and then the bills arrive: a token spend nobody budgeted for, a latency spike a customer noticed before you did, an error rate creeping up while everyone assumed things were fine. The model itself rarely tells you any of this. You have to go looking.

That gap is the problem a monitoring dashboard solves. Instead of digging through provider invoices at the end of the month or grepping logs after something breaks, you get a live picture of what your agents are actually doing: how many requests they handle, how long each one takes, how much it costs, and which models are carrying the load.

The build below puts that picture on a screen. A Python backend collects the numbers, a WebSocket pushes them to the browser as they happen, and a React dashboard charts them. None of it is exotic. It's the same stack a lot of Australian engineering teams already run, wired together for one job: telling you the truth about your AI system while it's running, not after.

Here's how the pieces fit.

Prerequisites

Python 3.10+, Node.js 20+
InfluxDB or TimescaleDB
Docker for one-command deployment
Basic React knowledge for frontend

Step-by-Step Framework

Step 1: Metrics Collection

Everything starts with capturing each request as it happens. The collector below batches raw events in memory and flushes them to a time-series database, either InfluxDB or TimescaleDB, both of which are built for exactly this kind of metrics storage. Batching matters: writing every single request to the database one at a time will hammer it under load, so the buffer holds points until there are 100 of them or the flush timer fires.

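# monitoring/collector.py
from datetime import datetime, timezone
from typing import Dict, List
import asyncio

class MetricsCollector:
    def __init__(self, influx_client):
        # influx_client is expected to expose an awaitable write_points(points)
        self.influx = influx_client
        self.buffer: List[Dict] = []
        self.flush_interval = 10  # seconds

    async def record_request(self, data: Dict):
        """Record a single request metric."""
        point = {
            "measurement": "llm_requests",
            "tags": {
                "model": data["model"],
                "provider": data["provider"],
                "status": data["status"],  # success, error, timeout
                "endpoint": data.get("endpoint", "default")
            },
            "fields": {
                "input_tokens": data["input_tokens"],
                "output_tokens": data["output_tokens"],
                "total_tokens": data["input_tokens"] + data["output_tokens"],
                "latency_ms": data["latency_ms"],
                "cost_usd": data.get("cost_usd", 0),
                "error": 1 if data["status"] == "error" else 0
            },
            # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated in Python 3.12+)
            "time": datetime.now(timezone.utc)
        }

        self.buffer.append(point)

        # Flush early once the batch is full instead of waiting for the timer
        if len(self.buffer) >= 100:
            await self._flush()

    async def _flush(self):
        """Write the buffered points in one batch."""
        if not self.buffer:
            return
        # Swap the buffer out before awaiting, so points recorded while the
        # write is in flight are kept for the next flush instead of being wiped.
        batch, self.buffer = self.buffer, []
        await self.influx.write_points(batch)

    async def start(self):
        """Periodic flush loop; run it as a background task,
        e.g. asyncio.create_task(collector.start())."""
        while True:
            await asyncio.sleep(self.flush_interval)
            await self._flush()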
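
The collector expects a dictionary with the model, provider, status, token counts, latency and cost already filled in. One way to produce that payload is to wrap each model call in a small timing helper, sketched below. The timed_call and estimate_cost functions, the example-model name, and the per-token prices are all hypothetical; substitute your provider's real call and current pricing.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical USD prices per 1,000 tokens; replace with your provider's rates.
PRICES_PER_1K = {
    "example-model": {"input": 0.0005, "output": 0.0015},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost from token counts; unknown models cost 0."""
    price = PRICES_PER_1K.get(model)
    if price is None:
        return 0.0
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1000


def timed_call(model: str, provider: str, call: Callable[[], Tuple[int, int]]) -> Dict:
    """Run `call`, which returns (input_tokens, output_tokens), and build
    the payload shape that MetricsCollector.record_request reads."""
    started = time.perf_counter()
    status = "success"
    input_tokens = output_tokens = 0
    try:
        input_tokens, output_tokens = call()
    except TimeoutError:
        status = "timeout"
    except Exception:
        # Sketch only: a real wrapper would likely re-raise after recording.
        status = "error"
    latency_ms = (time.perf_counter() - started) * 1000
    return {
        "model": model,
        "provider": provider,
        "status": status,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency_ms": latency_ms,
        "cost_usd": estimate_cost(model, input_tokens, output_tokens),
    }
```

The returned dictionary can be passed straight to record_request. Failed calls are recorded with zero tokens, so they still count toward the error rate without distorting token totals.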

Step 2: FastAPI Backend with WebSockets

Next, the backend serves the data to the browser. FastAPI handles WebSocket connections natively through the @app.websocket decorator, with websocket.accept() to open the connection and receive_text/send_text to pass messages back and forth; see the Better Stack guide to FastAPI WebSockets for the full pattern. The broadcast_metrics loop wakes every five seconds, queries the latest aggregates, and pushes them to every connected client, dropping any that have gone dead.

One thing worth flagging before you ship this: the CORS config below uses allow_origins=["*"] together with allow_credentials=True, and the connection handlers swallow errors with bare except blocks. That's fine for a local build, but lock down the allowed origins and tighten the error handling before this faces the public internet.
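
For the origin lockdown, one possible approach is to read an explicit allow-list from the environment and refuse the wildcard outright. This is a minimal sketch: the allowed_origins helper and the DASHBOARD_ORIGINS variable name are assumptions, not part of the original build. The resulting list is what you would pass as allow_origins to CORSMiddleware.

```python
import os
from typing import List, Optional


def allowed_origins(raw: Optional[str] = None) -> List[str]:
    """Parse a comma-separated origin list.

    Falls back to the DASHBOARD_ORIGINS environment variable
    (a hypothetical name) when no string is passed in.
    """
    if raw is None:
        raw = os.environ.get("DASHBOARD_ORIGINS", "")
    # Trim whitespace and trailing slashes, and drop empty entries.
    origins = [o.strip().rstrip("/") for o in raw.split(",") if o.strip()]
    if "*" in origins:
        raise ValueError("wildcard origin is not allowed outside local development")
    return origins
```

An empty result means no cross-origin browser access at all, which is a safer default than the wildcard.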

# monitoring/api.py
from fastapi import FastAPI, WebSocket
from fastapi.middleware.cors import CORSMiddleware
import asyncio
import json
from datetime import datetime, timezone

# Query helpers such as get_rpm() and query_history() are not defined in this
# snippet; they are expected to read from the time-series database.

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # local build only: list real origins before going public
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"]
)

connected_clients = set()

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    connected_clients.add(websocket)
    try:
        while True:
            data = await websocket.receive_text()
            # Client can send filter preferences
    except:  # broad on purpose for the local build; narrow before production
        connected_clients.discard(websocket)

async def broadcast_metrics():
    """Broadcast latest metrics to all connected clients.

    Must be scheduled once at startup, e.g. asyncio.create_task(broadcast_metrics()).
    """
    while True:
        await asyncio.sleep(5)

        metrics = await get_latest_metrics()
        message = json.dumps({
            "type": "metrics_update",
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "data": metrics
        })

        dead_clients = set()
        # Iterate over a copy: clients may connect or drop while we await sends
        for client in list(connected_clients):
            try:
                await client.send_text(message)
            except:  # broad on purpose for the local build
                dead_clients.add(client)

        # Mutate in place; "connected_clients -= ..." would make the name local
        # to this function and raise UnboundLocalError.
        connected_clients.difference_update(dead_clients)

async def get_latest_metrics():
    """Query aggregated metrics."""
    return {
        "requests_per_minute": await get_rpm(),
        "avg_latency_ms": await get_avg_latency(),
        "error_rate": await get_error_rate(),
        "tokens_per_minute": await get_tpm(),
        "cost_per_hour": await get_hourly_cost(),
        "active_models": await get_model_distribution(),
        "top_endpoints": await get_top_endpoints()
    }

@app.get("/api/metrics/current")
async def get_current_metrics():
    return await get_latest_metrics()

@app.get("/api/metrics/history")
async def get_history(metric: str, period: str = "1h"):
    """Get historical data for charting."""
    return await query_history(metric, period)
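
The aggregate helpers called in get_latest_metrics (get_rpm, get_avg_latency, get_error_rate, get_tpm) are left undefined above. In production they would most likely be queries against the time-series database, but the arithmetic behind them can be sketched in plain Python. The summarise function below is an illustration only, and it assumes each raw event is a flat dictionary carrying time, latency_ms, error and total_tokens, which is a simplification of the tags/fields layout the collector writes.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable


def summarise(
    events: Iterable[Dict],
    now: datetime,
    window_seconds: int = 60,
) -> Dict[str, float]:
    """Aggregate raw request events from the last `window_seconds`
    into the headline numbers the dashboard displays."""
    cutoff = now - timedelta(seconds=window_seconds)
    recent = [e for e in events if cutoff < e["time"] <= now]
    count = len(recent)
    if count == 0:
        # Avoid dividing by zero when the window is empty.
        return {
            "requests_per_minute": 0.0,
            "avg_latency_ms": 0.0,
            "error_rate": 0.0,
            "tokens_per_minute": 0.0,
        }
    per_minute = 60 / window_seconds  # scale window totals to a per-minute rate
    return {
        "requests_per_minute": count * per_minute,
        "avg_latency_ms": sum(e["latency_ms"] for e in recent) / count,
        "error_rate": sum(e["error"] for e in recent) / count,
        "tokens_per_minute": sum(e["total_tokens"] for e in recent) * per_minute,
    }
```

Note that error_rate comes back as a fraction between 0 and 1, which is consistent with the frontend multiplying it by 100 before charting.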

Step 3: React Dashboard Frontend

The front end opens a WebSocket to the backend, listens for metrics_update messages, and keeps the last 50 readings in state so the charts have something to plot over time. Charting runs on Recharts, a composable React library built on D3. The LineChart, Line, XAxis, YAxis, Tooltip and ResponsiveContainer components imported here are its standard building blocks, and ResponsiveContainer is what makes the chart resize cleanly with the layout.

// dashboard/src/App.tsx
import { useEffect, useState } from 'react';
import { LineChart, Line, XAxis, YAxis, Tooltip, ResponsiveContainer } from 'recharts';

interface Metrics {
  requests_per_minute: number;
  avg_latency_ms: number;
  error_rate: number;
  tokens_per_minute: number;
  cost_per_hour: number;
}

function App() {
  const [metrics, setMetrics] = useState<Metrics | null>(null);
  const [history, setHistory] = useState<any[]>([]);
  const [ws, setWs] = useState<WebSocket | null>(null);

  useEffect(() => {
    const socket = new WebSocket('ws://localhost:8000/ws');
    socket.onmessage = (event) => {
      const data = JSON.parse(event.data);
      if (data.type === 'metrics_update') {
        setMetrics(data.data);
        // Keep 49 previous readings plus the new one, so history holds at most 50
        setHistory(prev => [...prev.slice(-49), {
          time: new Date().toLocaleTimeString(),
          rpm: data.data.requests_per_minute,
          latency: data.data.avg_latency_ms,
          errors: data.data.error_rate * 100
        }]);
      }
    };
    setWs(socket);
    return () => socket.close();
  }, []);

  return (
    <div className="dashboard">
      <h1>AI Agent Monitoring</h1>

      <div className="metrics-grid">
        <MetricCard title="Requests/min" value={metrics?.requests_per_minute ?? 0} />
        <MetricCard title="Avg Latency" value={

Source trail

Primary references to keep this briefing grounded

AI and automation information changes quickly. Use these official or primary references to verify the claims, pricing, product behaviour, and compliance details before committing budget or production data.

What to do next

Pick the smallest useful workflow that proves the pattern.
Write down the owner, data boundary, review point, and success measure.
Review the result after the first real run and decide whether to scale, change, or stop.


AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.


