Metrics & Monitoring
Metrics collection and visualization for Semantic Router using Prometheus and Grafana.
1. Metrics & Endpoints
| Component | Endpoint | Notes | 
|---|---|---|
| Router metrics | :9190/metrics | Prometheus format (flag: --metrics-port) | 
| Router health | :8080/health | HTTP readiness/liveness | 
| Envoy metrics (optional) | :19000/stats/prometheus | If Envoy is enabled | 
Configuration location: tools/observability/
Dashboard: tools/observability/llm-router-dashboard.json
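To inspect the router metrics endpoint without Prometheus, a minimal Python sketch like the following may help. It assumes the endpoint serves the standard Prometheus text format without per-sample timestamps; the function names `parse_metrics` and `fetch_metrics` are hypothetical helpers, not part of the router.

```python
import urllib.request


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text-format samples into {series: value}.

    Simplified sketch: comment lines (# HELP / # TYPE) are skipped and
    each sample line is assumed to be "<series> <value>" with no timestamp.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value follows the last space; label values may contain spaces.
        series, _, value = line.rpartition(" ")
        if not series:
            continue
        try:
            samples[series.strip()] = float(value)
        except ValueError:
            # Skip lines that do not end in a numeric value.
            continue
    return samples


def fetch_metrics(url: str = "http://localhost:9190/metrics",
                  timeout: float = 5.0) -> dict[str, float]:
    """Fetch and parse the metrics endpoint (requires a running router)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running router returns a dictionary keyed by series name, which makes it easy to confirm that the `llm_*` series are present.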
2. Local Mode (Router on Host)
Run the router natively on the host and the observability stack (Prometheus and Grafana) in Docker.
Quick Start
# Start router
make run-router
# Start observability
make o11y-local
Access:
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (admin/admin)
Verify targets:
# Check that Prometheus is scraping localhost:9190 (`open` is macOS-specific; on Linux use xdg-open)
open http://localhost:9090/targets
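The same check can be scripted against the Prometheus HTTP API, which exposes target state at `/api/v1/targets`. The sketch below is illustrative; `summarize_targets` and `fetch_targets` are hypothetical helper names, and the base URL assumes the local-mode port shown above.

```python
import json
import urllib.request


def summarize_targets(payload: dict) -> list[tuple[str, str, str]]:
    """Return (scrape_url, health, last_error) for each active target."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl", ""), t.get("health", "unknown"), t.get("lastError", ""))
        for t in targets
    ]


def fetch_targets(base_url: str = "http://localhost:9090",
                  timeout: float = 5.0) -> dict:
    """Query the Prometheus targets API (requires a running Prometheus)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


# Usage, with Prometheus running:
#   for url, health, error in summarize_targets(fetch_targets()):
#       print(url, health, error)
```

A target whose health is not `up` usually carries a non-empty `lastError` explaining why the scrape failed.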
Stop:
make stop-observability
Configuration
All configs in tools/observability/:
- prometheus.yaml - Scrapes the target from the ROUTER_TARGET env var (default: localhost:9190)
- grafana-datasource.yaml - Points to localhost:9090
- grafana-dashboard.yaml - Dashboard provisioning
- llm-router-dashboard.json - Dashboard definition
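As an illustration of the ROUTER_TARGET default, the sketch below resolves the variable the way a helper script might. How tools/observability/ actually performs the substitution is not shown here, so `resolve_router_target` is a hypothetical function that only demonstrates the fallback to localhost:9190.

```python
import os


def resolve_router_target(env=None) -> tuple[str, int]:
    """Return (host, port) from ROUTER_TARGET, defaulting to localhost:9190.

    Hypothetical helper: it mirrors the documented default only, not the
    repository's own templating mechanism.
    """
    env = os.environ if env is None else env
    target = env.get("ROUTER_TARGET") or "localhost:9190"
    host, sep, port = target.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"ROUTER_TARGET must be host:port, got {target!r}")
    return host, int(port)
```

With no variable set this yields the local-mode target; setting `ROUTER_TARGET=semantic-router:9190` yields the container hostname instead.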
Troubleshooting
| Issue | Fix | 
|---|---|
| Target DOWN | Start router: make run-router | 
| No metrics | Generate traffic, check :9190/metrics | 
| Port conflict | Change port or stop conflicting service | 
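To find out whether a port conflict is the cause, a quick TCP probe can show which of the ports used above (9190, 9090, 3000) already has a listener. This is a minimal sketch using only the standard library; `port_in_use` is a hypothetical helper name.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


# Usage: report the state of the ports this setup relies on.
#   for p in (9190, 9090, 3000):
#       print(p, "in use" if port_in_use(p) else "free")
```

A port reported as in use before the stack starts points to a conflicting service; the same probe after startup confirms that each component is listening.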
3. Docker Compose Mode
All services run in Docker containers.
Quick Start
# Start full stack (includes observability)
docker compose up --build
# Or with testing profile
docker compose --profile testing up --build
Access:
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (admin/admin)
Expected targets:
- semantic-router:9190
- envoy-proxy:19000 (optional)
Configuration
Same configs as local mode (tools/observability/), but:
- ROUTER_TARGET=semantic-router:9190
- PROMETHEUS_URL=prometheus:9090
- Uses the semantic-network bridge network
4. Kubernetes Mode
Production-ready Prometheus + Grafana for K8s clusters.
Namespace: vllm-semantic-router-system
Components
| Component | Purpose | Location | 
|---|---|---|
| Prometheus | Scrapes router metrics, 15d retention | deploy/kubernetes/observability/prometheus/ | 
| Grafana | Dashboard visualization | deploy/kubernetes/observability/grafana/ | 
| Ingress | Optional external access | deploy/kubernetes/observability/ingress.yaml | 
Deploy
# Apply manifests
kubectl apply -k deploy/kubernetes/observability/
# Verify
kubectl get pods -n vllm-semantic-router-system
Access
Port-forward:
kubectl port-forward svc/prometheus 9090:9090 -n vllm-semantic-router-system
kubectl port-forward svc/grafana 3000:3000 -n vllm-semantic-router-system
Ingress: Customize ingress.yaml with your domain and TLS settings.
Key Configuration
Prometheus uses Kubernetes service discovery:
scrape_configs:
  - job_name: semantic-router
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [vllm-semantic-router-system]
Grafana credentials (change in production):
kubectl create secret generic grafana-admin \
  --namespace vllm-semantic-router-system \
  --from-literal=admin-user=admin \
  --from-literal=admin-password='your-password'
5. Key Metrics
| Metric | Type | Description | 
|---|---|---|
| llm_category_classifications_count | counter | Category classifications | 
| llm_model_completion_tokens_total | counter | Tokens per model | 
| llm_model_routing_modifications_total | counter | Model routing changes | 
| llm_model_completion_latency_seconds | histogram | Completion latency | 
Example queries:
rate(llm_model_completion_tokens_total[5m])
histogram_quantile(0.95, rate(llm_model_completion_latency_seconds_bucket[5m]))
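To make the second query less opaque, the sketch below reproduces the idea behind `histogram_quantile`: find the cumulative bucket containing the requested rank, then interpolate linearly inside it. It is a simplified illustration, not Prometheus's implementation; it assumes positive bucket bounds with a lower edge of 0 for the first bucket, and the bucket values in the usage note are made up.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations at all
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound  # only reachable when rank is 0
            # Linear interpolation between the bucket's lower and upper bound.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return math.nan
```

For example, with the hypothetical cumulative buckets `[(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]`, the median comes out at about 0.42 seconds, while the 0.95 quantile falls in the +Inf bucket and is reported as 1.0, the highest finite bound. This is why bucket boundaries should bracket the latencies of interest.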
6. Troubleshooting
| Issue | Check | Fix | 
|---|---|---|
| Target DOWN | Prometheus /targets | Verify router is running and exposing :9190/metrics | 
| No metrics | Generate traffic | Send requests through router | 
| Dashboard empty | Grafana datasource | Check Prometheus URL configuration |
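When a dashboard stays empty, running a query directly against the Prometheus HTTP API (`/api/v1/query`) helps separate a datasource problem from missing data. The following is a minimal sketch; the helper names are hypothetical and the base URL should match whatever the Grafana datasource is configured to use.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def has_data(payload: dict) -> bool:
    """Return True if a query response succeeded and returned any series."""
    return payload.get("status") == "success" and bool(
        payload.get("data", {}).get("result")
    )


def run_query(base_url: str, promql: str, timeout: float = 5.0) -> dict:
    """Execute an instant query (requires a reachable Prometheus)."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return json.load(resp)


# Usage, with Prometheus running on the local-mode port:
#   payload = run_query("http://localhost:9090",
#                       "rate(llm_model_completion_tokens_total[5m])")
#   print("data present" if has_data(payload) else "no series returned")
```

If the query succeeds here but the dashboard is still empty, the Grafana datasource URL is the likely culprit; if it returns no series, the router has not produced metrics yet.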