An intelligent Kubernetes troubleshooting system using AI agents for automated incident response and root cause analysis.
This project demonstrates how to build an agentic AIOps system that can automatically investigate Kubernetes issues, analyze observability data, and provide actionable insights for SRE teams.
- 🔍 Intelligent Diagnostics: AI-powered Kubernetes cluster analysis
- 📊 Observability Integration: CloudWatch metrics, logs, and alarms analysis
- 💾 Database Insights: DynamoDB performance and throttling detection
- 🤖 Multi-Agent Coordination: Specialized agents working together
- 🔗 Amazon Q Integration: Natural language interface for investigations
- k8sgpt 0.4.22+ (make sure the amazonbedrock backend has been configured)
- docker 27.3.1+
- python 3.13+
- kubectl 1.33+
- aws cli 2.27.2+
- Export your AWS credentials in the terminal session
- Install the retail-store-sample-app
- Manually install CloudWatch Container Insights
- Make sure the Docker daemon is running (e.g. Docker Desktop)
- Install the AWS MCP Server for CloudWatch and the AWS MCP Server for DynamoDB
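The tool prerequisites above can be checked before setup with a short script. This is a minimal sketch and not part of the project: the helper name `missing_tools` is hypothetical, and it only confirms that each executable is on `PATH`, not that it meets the listed minimum version.

```python
import shutil
from typing import Callable, Iterable, Optional

# Executables named in the prerequisites list (the AWS CLI binary is `aws`).
REQUIRED_TOOLS = ("k8sgpt", "docker", "python", "kubectl", "aws")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All prerequisite executables were found on PATH.")
```

On systems where the interpreter is only available as `python3`, adjust the tuple accordingly. The `which` parameter exists so the lookup can be replaced in tests.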
# Install dependencies
uv sync
# (Optional) Install the package in editable mode
uv pip install -e .
# Run aws eks update-kubeconfig so that your local tools can authenticate against the remote EKS cluster.
# k8sgpt needs this to analyze pods, deployments, and events in the cluster.
aws eks update-kubeconfig --region us-east-1 --name retail-store
# Testing
python scripts/test_orchestrator.py
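Before running the test script, it can help to confirm that kubectl is pointing at the intended cluster. The following is a minimal sketch, not part of the project: both function names are hypothetical, and the check assumes the default context name written by `aws eks update-kubeconfig` (the cluster ARN, which ends in `cluster/<name>`). A context created with a custom alias would not match.

```python
import subprocess


def context_targets_cluster(context: str, cluster_name: str) -> bool:
    """Return True if a context name appears to point at the named EKS cluster.

    Assumption: the context name is the cluster ARN, so it ends in
    `cluster/<cluster_name>`.
    """
    return context.strip().endswith(f"cluster/{cluster_name}")


def current_context() -> str:
    """Ask kubectl for the name of the active kubeconfig context."""
    result = subprocess.run(
        ["kubectl", "config", "current-context"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if kubectl exits non-zero
    )
    return result.stdout.strip()
```

Calling `context_targets_cluster(current_context(), "retail-store")` should return `True` after the `update-kubeconfig` command above has succeeded with its default settings.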
# ~/.aws/amazonq/mcp.json
{
  "mcpServers": {
    "sherlock": {
      "command": "sherlock-mcp-server",
      "args": [],
      "env": {
        "AWS_REGION": "us-east-1",
        "KUBECONFIG": "~/.kube/config",
        "BYPASS_TOOL_CONSENT": "true"
      },
      "disabled": false,
      "autoApprove": []
    }
  }
}
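A malformed or incomplete configuration file is an easy mistake to make here, so a quick sanity check can be useful. The sketch below is an illustration only, not part of the project: `check_sherlock_entry` is a hypothetical helper, and the set of fields it checks is an assumption drawn from the example above rather than a documented schema. Note that the `# ~/.aws/amazonq/mcp.json` line is a path label and must not appear in the file itself, since JSON has no comment syntax.

```python
import json


def check_sherlock_entry(config_text: str) -> list[str]:
    """Return a list of problems found in the `sherlock` MCP server entry.

    Hypothetical helper: the checked fields mirror the example config only.
    """
    config = json.loads(config_text)  # raises json.JSONDecodeError on bad JSON
    server = config.get("mcpServers", {}).get("sherlock")
    if server is None:
        return ["no 'sherlock' entry under 'mcpServers'"]

    problems = []
    if not server.get("command"):
        problems.append("'command' is missing or empty")
    if server.get("disabled", False):
        problems.append("server is marked as disabled")
    for key in ("AWS_REGION", "KUBECONFIG"):
        if key not in server.get("env", {}):
            problems.append(f"env is missing {key}")
    return problems
```

To use it, read the file at the expanded path (for example with `pathlib.Path("~/.aws/amazonq/mcp.json").expanduser().read_text()`) and pass the text to the function. An empty list means none of the checked problems were found.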
Troubleshooting:
tail -f ~/.aws/amazonq/sherlock-mcp.log
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.
