From Alert to Action: AI-Driven API Outage Analysis
Maersk walks through building “Stargate,” an AI ops agent that automates incident triage and root cause analysis (RCA) across Kong Gateway and Kuma. See how Grafana alerts trigger the analysis, how latency features are extracted from logs, and how mesh health checks pinpoint faults to cut mean time to detect (MTTD) and mean time to resolve (MTTR).
What you’ll learn:
- Kong Gateway + Kuma runtime architecture (edge → data plane → mesh → services)
- Automated detection via Grafana webhooks and Prometheus rules
- Log feature extraction (proxy time, target time, total time) to localize faults
- RCA using Azure OpenAI (GPT-4) with few-shot prompts and safe log preprocessing
- Blast radius analysis (impacted consumers) and ownership routing
- Next steps: proactive mitigation and regional failover
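
To make the log feature extraction idea concrete, here is a minimal sketch of how the three timings could be compared to localize a fault. It is not Stargate's implementation. The field names (`proxy_time_ms`, `target_time_ms`, `total_time_ms`), the thresholds, and the reading of "proxy time" as time spent in the gateway and "target time" as time spent in the upstream service are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LatencyFeatures:
    proxy_ms: float   # assumed: time spent inside the gateway
    target_ms: float  # assumed: time the upstream service took to respond
    total_ms: float   # assumed: end-to-end time seen by the client


def extract_features(entry: dict) -> LatencyFeatures:
    """Pull the three timings out of one parsed log entry (hypothetical keys)."""
    return LatencyFeatures(
        proxy_ms=float(entry["proxy_time_ms"]),
        target_ms=float(entry["target_time_ms"]),
        total_ms=float(entry["total_time_ms"]),
    )


def localize_fault(f: LatencyFeatures,
                   slow_ms: float = 1000.0,
                   dominant_share: float = 0.6) -> str:
    """Guess which hop dominates a slow request.

    A request under slow_ms is treated as healthy. Otherwise the hop that
    accounts for at least dominant_share of the total time is blamed; if
    neither does, the remainder is attributed to the network or mesh.
    """
    if f.total_ms < slow_ms:
        return "healthy"
    if f.target_ms / f.total_ms >= dominant_share:
        return "upstream"
    if f.proxy_ms / f.total_ms >= dominant_share:
        return "gateway"
    return "network_or_mesh"


if __name__ == "__main__":
    sample = {"proxy_time_ms": 20, "target_time_ms": 1900, "total_time_ms": 2000}
    print(localize_fault(extract_features(sample)))
```

In a setup like the one described, a label of this kind would be one input among several (alongside mesh health checks and the alert payload) handed to the RCA step, rather than a verdict on its own.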