KAAR: AI-Powered Kubernetes Cluster Analysis and Remediation
Introducing KAAR: AI-Powered Kubernetes Cluster Analysis and Remediation
KAAR (Kubernetes AI-powered Analysis and Remediation) is a tool that automates Kubernetes Pod issue detection and resolution using k8sgpt and AWS Bedrock.
In this post, I’ll introduce KAAR, explain its value, and guide you on getting started with it.
The Kubernetes Pod Challenge
Pods are the core of Kubernetes, but they often encounter issues that halt your applications:
- ImagePullBackOff: Pods fail to start due to invalid or inaccessible container images.
- CrashLoopBackOff: Containers crash repeatedly from misconfigured commands or errors like “executable file not found.”
- OOMKilled: Pods are terminated for exceeding memory limits.
- Pending: Pods can’t schedule due to resource constraints or node affinity issues.
Manually debugging these with kubectl describe and kubectl logs is time-consuming, especially in large clusters.
KAAR automates this process, leveraging AI to resolve Pod issues quickly and reliably, saving valuable time for DevOps teams.
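To make the manual triage concrete, here is a minimal sketch of how the four states above can be read out of the JSON produced by kubectl get pods -o json. The helper names classify_pod and triage are hypothetical and this is not KAAR's own code. It only assumes the standard Pod status fields (phase, containerStatuses, state, lastState).

```python
from typing import Optional

# Waiting-state reasons that indicate a container that cannot start or stay up.
WAITING_REASONS = {"ImagePullBackOff", "ErrImagePull", "CrashLoopBackOff"}


def classify_pod(pod: dict) -> Optional[str]:
    """Return a failure label for one Pod object, or None if nothing stands out."""
    status = pod.get("status", {})
    containers = status.get("containerStatuses", [])

    # Check OOM kills first: an OOM-killed container usually also sits in
    # CrashLoopBackOff, and the OOM reason is the more useful label.
    for cs in containers:
        for key in ("state", "lastState"):
            terminated = cs.get(key, {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                return "OOMKilled"

    for cs in containers:
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") in WAITING_REASONS:
            return waiting["reason"]

    if status.get("phase") == "Pending":
        return "Pending"
    return None


def triage(pod_list: dict) -> dict:
    """Map 'namespace/name' to a failure label for each flagged Pod in a List object."""
    flagged = {}
    for pod in pod_list.get("items", []):
        label = classify_pod(pod)
        if label:
            meta = pod.get("metadata", {})
            flagged[f"{meta.get('namespace', 'default')}/{meta.get('name')}"] = label
    return flagged
```

Feeding it the parsed output of kubectl get pods -A -o json gives a quick list of Pods worth a closer look. It says nothing about the root cause, which is where the AI-assisted steps come in.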
What is KAAR?
KAAR is an open-source tool designed to simplify Kubernetes Pod management. It integrates:
- k8sgpt: A diagnostic tool that scans Kubernetes clusters for Pod issues.
- AWS Bedrock: Uses the Claude v2 model to classify issues and recommend fixes.
- AWS SNS and CloudWatch: Sends notifications and logs results for team visibility.
- kubectl: Applies automated fixes to get Pods back on track.
KAAR is perfect for DevOps engineers, SREs, and architects who want to minimize manual intervention. It currently focuses on Pod remediation, with plans to support Services and Deployments in future releases.
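As a rough illustration of the SNS and CloudWatch piece listed above, the sketch below publishes one status message and writes the same line to a log stream. The function name notify and the message wording are assumptions, not KAAR's internals. The two client arguments are expected to behave like boto3.client("sns") and boto3.client("logs"), and the log group and stream are assumed to exist already.

```python
import time


def notify(sns_client, logs_client, topic_arn: str, log_group: str,
           log_stream: str, pod: str, namespace: str, status: str) -> str:
    """Publish a Pod status message to SNS and record it in CloudWatch Logs."""
    message = f"Pod {pod} in {namespace} is {status}."

    # SNS: one message per remediated Pod.
    sns_client.publish(
        TopicArn=topic_arn,
        Subject="KAAR Pod status",  # hypothetical subject line
        Message=message,
    )

    # CloudWatch Logs: timestamps are milliseconds since the epoch.
    logs_client.put_log_events(
        logGroupName=log_group,
        logStreamName=log_stream,
        logEvents=[{"timestamp": int(time.time() * 1000), "message": message}],
    )
    return message
```

Passing the clients in, rather than creating them inside the function, keeps the sketch easy to exercise with stand-in objects before pointing it at a real AWS account.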
How KAAR Works
KAAR follows a streamlined, AI-powered workflow to fix Pod issues:
- Cluster Scanning: KAAR uses k8sgpt to analyze your cluster and identify Pod issues.
- Issue Classification: Findings are processed by AWS Bedrock to determine issue type.
- Remediation: Applies fixes via kubectl, like updating images or adjusting limits.
- Verification: Ensures Pods are Running and healthy.
- Notification: Logs results to CloudWatch and sends SNS notifications.
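The five steps above can be sketched as a small driver loop. This is an illustrative outline under stated assumptions, not KAAR's actual implementation: Finding, run_once, and the five callables are hypothetical names, and each callable stands in for one stage (k8sgpt scan, Bedrock classification, kubectl fix, health check, SNS/CloudWatch notification). The retry defaults mirror the remediation values in the sample config later in this post.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Finding:
    namespace: str
    pod: str
    details: str  # raw diagnostic text from the scanner


def run_once(
    scan: Callable[[], Iterable[Finding]],
    classify: Callable[[Finding], str],
    remediate: Callable[[Finding, str], None],
    is_healthy: Callable[[Finding], bool],
    notify: Callable[[Finding, str], None],
    max_attempts: int = 5,
    retry_interval_seconds: float = 5.0,
) -> dict:
    """Run scan -> classify -> remediate -> verify -> notify for each finding."""
    results = {}
    for finding in scan():
        issue = classify(finding)
        remediate(finding, issue)

        # Verification: poll until the Pod reports healthy or attempts run out.
        healthy = False
        for attempt in range(max_attempts):
            if is_healthy(finding):
                healthy = True
                break
            if attempt < max_attempts - 1:
                time.sleep(retry_interval_seconds)

        status = "Healthy" if healthy else "Unhealthy"
        notify(finding, status)
        results[(finding.namespace, finding.pod)] = status
    return results
```

Keeping each stage behind a plain callable means any one of them can be swapped out, for example a different LLM backend for classification, without touching the loop.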
Real-World Examples
Scenario 1: CrashLoopBackOff
A Pod, nginx-pod, is stuck due to an invalid command. KAAR:
- Detects the issue with k8sgpt.
- Uses Bedrock to classify it as CrashLoopBackOff.
- Updates the Pod’s command.
- Verifies it is Running.
- Sends an SNS alert: “Pod nginx-pod in default is Healthy.”
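One way to picture the "update the command" step: a bare Pod's container command cannot be changed in place, so a fix generally means re-creating the Pod from a corrected manifest. The sketch below builds such a manifest from the existing Pod object. The post does not describe how KAAR does this internally, so treat with_command and the field list as assumptions.

```python
import copy

# Metadata the API server fills in; it should not be sent back on re-creation.
SERVER_FIELDS = ("resourceVersion", "uid", "creationTimestamp", "managedFields", "selfLink")


def with_command(pod: dict, container: str, command: list[str]) -> dict:
    """Return a copy of the Pod manifest with one container's command replaced."""
    fixed = copy.deepcopy(pod)  # leave the caller's object untouched
    fixed.pop("status", None)   # status is server-owned
    metadata = fixed.get("metadata", {})
    for field in SERVER_FIELDS:
        metadata.pop(field, None)

    for c in fixed["spec"]["containers"]:
        if c["name"] == container:
            c["command"] = list(command)
            return fixed
    raise ValueError(f"container {container!r} not found in Pod spec")
```

The resulting manifest could then be written to a file and applied with kubectl replace --force -f, which deletes and re-creates the Pod. For Pods owned by a Deployment, the change belongs in the Deployment's Pod template instead.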
Scenario 2: OOMKilled
A Pod, memory-hog, is terminated for exceeding its memory limit. KAAR:
- Identifies the OOMKilled issue.
- Increases the memory limit.
- Confirms stability.
- Sends an alert: “Pod memory-hog in default is Healthy.”
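The "increase the memory limit" step needs some policy for how far to raise it. A minimal sketch, assuming a simple doubling rule with a hard ceiling: bump_memory, the factor of 2, and the 4Gi ceiling are all illustrative choices, not values taken from KAAR. Only the binary suffixes Ki, Mi, and Gi are handled here, although Kubernetes accepts other quantity formats.

```python
import re

_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def _to_bytes(quantity: str) -> int:
    """Parse a quantity such as '128Mi' into bytes (binary suffixes only)."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi)", quantity)
    if not match:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def bump_memory(limit: str, factor: float = 2.0, ceiling: str = "4Gi") -> str:
    """Scale a memory limit by `factor`, capped at `ceiling`, and return it in Mi."""
    new_bytes = min(int(_to_bytes(limit) * factor), _to_bytes(ceiling))
    mib = -(-new_bytes // 2**20)  # round up to a whole MiB
    return f"{mib}Mi"
```

A ceiling matters because a leaking process will be OOM-killed at any limit. Without a cap, repeated automated bumps would only postpone the failure and consume node memory.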
Why Choose KAAR?
KAAR offers compelling benefits for Kubernetes admins:
- Time Savings: Automates remediation.
- Accuracy: AI-driven classification and suggestions.
- AWS Integration: Works with Bedrock, SNS, CloudWatch.
- Future-Ready: Coming support for Services and Deployments.
As an AWS Community Builder, I built KAAR to reflect AWS’s commitment to automation and innovation.
Getting Started with KAAR
Prerequisites
Ensure you have:
Kubernetes Cluster: AWS EKS, minikube, or kind.
minikube start --driver=docker
kubectl get nodes
AWS Credentials:
aws configure
Python 3.11 with boto3 and pyyaml:
python3.11 -m venv python311_env
source python311_env/bin/activate
pip install boto3 pyyaml
Install KAAR via pip:
pip install kaar
Create my_config.yaml:
aws:
  region: us-east-1
  sns_topic_arn: arn:aws:sns:us-east-1:YOUR_AWS_ACCOUNT_ID:KAARAlerts
  log_group: /kaar/notifications
  log_stream: kaar-notifier

k8sgpt:
  backend: amazonbedrock
  explain: true

bedrock:
  model: anthropic.claude-v2:1
  max_tokens: 50
  temperature: 0.5

remediation:
  max_attempts: 5
  retry_interval_seconds: 5
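To show how the bedrock settings in this file might map onto an actual model call, here is a minimal sketch that turns the parsed config into keyword arguments for the Bedrock runtime invoke_model call, using the Claude v2 text-completion request format. The function name build_bedrock_request and the prompt wording are assumptions. KAAR's real prompt is not shown in this post.

```python
import json


def build_bedrock_request(cfg: dict, finding_text: str) -> dict:
    """Build invoke_model keyword arguments from the parsed my_config.yaml."""
    bedrock = cfg["bedrock"]

    # Claude v2 text completions expect the Human/Assistant turn markers.
    prompt = (
        "\n\nHuman: Classify this Kubernetes Pod issue as one of "
        "ImagePullBackOff, CrashLoopBackOff, OOMKilled, Pending. "
        "Answer with the label only.\n\n"
        f"{finding_text}\n\nAssistant:"
    )
    body = {
        "prompt": prompt,
        "max_tokens_to_sample": bedrock["max_tokens"],
        "temperature": bedrock["temperature"],
    }
    return {
        "modelId": bedrock["model"],
        "contentType": "application/json",
        "accept": "application/json",
        "body": json.dumps(body),
    }
```

With pyyaml and boto3 installed, cfg would come from yaml.safe_load on my_config.yaml, and the request could be sent with boto3.client("bedrock-runtime", region_name=cfg["aws"]["region"]).invoke_model(**request). The model's answer is then available as json.loads(response["body"].read())["completion"]. A 50-token budget is enough for a short label but would truncate longer explanations.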
Future Plans
KAAR is poised to become a comprehensive Kubernetes management tool. Planned enhancements include:
- Service Remediation: Support for issues like SelectorMismatch and LoadBalancerPending.
- Deployment Support: Fixes for Deployment misconfigurations.
- EventBridge Integration: Scheduled runs for proactive monitoring, inspired by my KAAR project.
- Local LLMs: Support for Ollama as an alternative to Bedrock.