Post

KAAR AI-Powered Kubernetes Cluster Analysis and Remediation

KAAR AI-Powered Kubernetes Cluster Analysis and Remediation

kaar

Introducing KAAR AI-Powered Kubernetes Cluster Analysis and Remediation

KAAR (Kubernetes AI-powered Analysis and Remediation), a tool that automates Kubernetes Pod issue detection and resolution using k8sgpt and AWS Bedrock.

In this post, I’ll introduce KAAR, explain its value, and guide you on getting started with it.


The Kubernetes Pod Challenge

Pods are the core of Kubernetes, but they often encounter issues that halt your applications:

  • ImagePullBackOff: Pods fail to start due to invalid or inaccessible container images.
  • CrashLoopBackOff: Containers crash repeatedly from misconfigured commands or errors like “executable file not found.”
  • OOMKilled: Pods are terminated for exceeding memory limits.
  • Pending: Pods can’t schedule due to resource constraints or node affinity issues.

Manually debugging these with kubectl describe and logs is time-consuming, especially in large clusters.

KAAR automates this process, leveraging AI to resolve Pod issues quickly and reliably, saving valuable time for DevOps teams.


What is KAAR?

KAAR is an open-source tool designed to simplify Kubernetes Pod management. It integrates:

  • k8sgpt: A diagnostic tool that scans Kubernetes clusters for Pod issues.
  • AWS Bedrock: Uses the Claude v2 model to classify issues and recommend fixes.
  • AWS SNS and CloudWatch: Sends notifications and logs results for team visibility.
  • kubectl: Applies automated fixes to get Pods back on track.

KAAR is perfect for DevOps engineers, SREs, and architects who want to minimize manual intervention. It currently focuses on Pod remediation, with plans to support Services and Deployments in future releases.


How KAAR Works

KAAR follows a streamlined, AI-powered workflow to fix Pod issues:

  1. Cluster Scanning: KAAR uses k8sgpt to analyze your cluster and identify Pod issues.
  2. Issue Classification: Findings are processed by AWS Bedrock to determine issue type.
  3. Remediation: Applies fixes via kubectl, like updating images or adjusting limits.
  4. Verification: Ensures Pods are Running and healthy.
  5. Notification: Logs results to CloudWatch and sends SNS notifications.

Real-World Examples

Scenario 1: CrashLoopBackOff

A Pod nginx-pod is stuck due to an invalid command. KAAR:

  • Detects the issue with k8sgpt.
  • Uses Bedrock to classify it as CrashLoopBackOff.
  • Updates the Pod’s command.
  • Verifies it is Running.
  • Sends an SNS alert:

    “Pod nginx-pod in default is Healthy.”


Scenario 2: OOMKilled

A Pod memory-hog is terminated for low memory. KAAR:

  • Identifies the OOMKilled issue.
  • Increases the memory limit.
  • Confirms stability.
  • Sends an alert:

    “Pod memory-hog in default is Healthy.”


Python 3.11 with boto3 and pyyaml:

Why Choose KAAR?

KAAR offers compelling benefits for Kubernetes admins:

  • Time Savings: Automates remediation.
  • Accuracy: AI-driven classification and suggestions.
  • AWS Integration: Works with Bedrock, SNS, CloudWatch.
  • Future-Ready: Coming support for Services and Deployments.

As an AWS Community Builder, I built KAAR to reflect AWS’s commitment to automation and innovation.


Getting Started with KAAR

Prerequisites

Ensure you have:

Kubernetes Cluster: AWS EKS, minikube, or kind.

1
2
   minikube start --driver=docker
   kubectl get nodes

AWS Credentials:

1
  aws configure

Python 3.11 with boto3 and pyyaml:

1
2
3
  python3.11 -m venv python311_env
  source python311_env/bin/activate
  pip install boto3 pyyaml

Install KAAR via pip:

1
 pip install kaar

Create my_config.yaml:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
 aws:
  region: us-east-1
  sns_topic_arn: arn:aws:sns:us-east-1:YOUR_AWS_ACCOUNT_ID:KAARAlerts
  log_group: /kaar/notifications
  log_stream: kaar-notifier

k8sgpt:
  backend: amazonbedrock
  explain: true

bedrock:
  model: anthropic.claude-v2:1
  max_tokens: 50
  temperature: 0.5

remediation:
  max_attempts: 5
  retry_interval_seconds: 5

Future Plans

KAAR is poised to become a comprehensive Kubernetes management tool. Planned enhancements include:

  • Service Remediation: Support for issues like SelectorMismatch and LoadBalancerPending.
  • Deployment Support: Fixes for Deployment misconfigurations.
  • EventBridge Integration: Scheduled runs for proactive monitoring, inspired by my KAAR project.
  • Local LLMs: Support for Ollama as an alternative to Bedrock.

This post is licensed under CC BY 4.0 by the author.