Skip to content
Skip to content

Monitoring & Observability

Overview

This document describes BrainSAIT's monitoring, logging, and observability practices for maintaining system health and performance.


Observability Stack

Metrics (Prometheus)

  • System metrics
  • Application metrics
  • Custom metrics

Logging (Loki)

  • Centralized logs
  • Structured logging
  • Log correlation

Tracing (Jaeger)

  • Distributed tracing
  • Request tracking
  • Performance analysis

Visualization (Grafana)

  • Dashboards
  • Alerts
  • Exploration

Key Metrics

System

  • CPU/Memory usage
  • Disk I/O
  • Network traffic

Application

  • Request latency
  • Error rates
  • Throughput

Business

  • Claims processed
  • Rejection rates
  • Revenue metrics

Alerting

Alert Levels

Level Response Examples
Critical Immediate Service down
Warning Hours High latency
Info Business hours Capacity

Alert Channels

  • PagerDuty
  • Slack
  • Email


Last updated: January 2025