Engineering, Advanced, Kubernetes, Helm, Container Orchestration

Kubernetes Specialist

Kubernetes operations: deployments, services, ingress, Helm charts, operators, RBAC, resource limits, debugging pods, and cluster management for production.

Production Kubernetes expertise that goes beyond YAML writing. This skill covers resource sizing, pod scheduling, network policies, RBAC design, Helm chart patterns, and systematic pod debugging. It knows the common failure modes — OOMKilled, ImagePullBackOff, CrashLoopBackOff — and the exact kubectl commands to diagnose them fast.

Added Mar 20, 2026

$ npx skills add johnefemer/skillfish --skill kubernetes-specialist


What This Skill Can Do

Concrete capabilities you get when you install this skill.

Design Deployment, StatefulSet, and DaemonSet configurations for production workloads

Debug pod failures with systematic kubectl diagnosis and log analysis

Create and maintain Helm charts with values templating and lifecycle hooks

Configure RBAC roles, ClusterRoles, and service accounts for least-privilege access

Tune resource requests and limits to prevent OOMKill and CPU throttling

Set up Ingress controllers, TLS termination, and network policies

Real Conversations

See how the Kubernetes Specialist skill responds to actual requests. These are representative examples of what your AI agent will produce.

You

My pod keeps crashing with CrashLoopBackOff. How do I diagnose what's wrong?

Agent

CrashLoopBackOff means the container starts, crashes, and is restarted by Kubernetes with an increasing back-off delay between attempts. Systematic diagnosis:

Step 1 — Get the exit code and reason

kubectl describe pod <pod-name> -n <namespace>
# Look for: Last State, Exit Code, Reason
# Exit Code 137 = killed by SIGKILL, usually OOMKilled (confirm with Reason: OOMKilled)
# Exit Code 1 = app crashed (check logs)
# Exit Code 126 = command not executable; 127 = command not found
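
The same fields can be pulled out without scanning the full describe output. The following is a sketch, not part of the skill: `last_terminations` and `explain_exit_code` are hypothetical helper names, and the fallback branch relies on the usual convention that an exit code above 128 means the process was killed by signal number (code - 128).

```shell
# Hypothetical helpers; the function names are illustrative, not kubectl features.

# Print "<container> <reason> <exit code>" for each container's last terminated state.
# $1 = pod name, $2 = namespace
last_terminations() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}

# Translate a numeric exit code into a likely cause.
# $1 = exit code
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly; the main process finished instead of staying up" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    137) echo "killed by SIGKILL (9); OOMKilled if Reason says so" ;;
    143) echo "killed by SIGTERM (15)" ;;
    *)
      if [ "$1" -gt 128 ]; then
        echo "killed by signal $(( $1 - 128 ))"
      else
        echo "application error; check the logs"
      fi
      ;;
  esac
}
```

For example, `last_terminations <pod-name> <namespace>` followed by `explain_exit_code 137` gives a first hypothesis before reading any logs.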

Step 2 — Get logs (current + previous container)

kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous
# --previous gets logs from the crashed container, not the new one

Step 3 — Common causes and fixes

Exit Code	Cause	Fix
137	OOMKilled	Increase `resources.limits.memory` or reduce the app's memory use
1	App error	Check logs for stack trace
0	Main process exits immediately	Temporarily set `command: ["sleep", "infinity"]` to keep the container up while you debug
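
For the 137 row, one way to raise the limit without editing YAML by hand is `kubectl set resources`. This is a minimal sketch: `raise_memory` is a hypothetical wrapper, and the deployment name, container name, and sizes in the example are placeholders to be replaced with values based on observed usage.

```shell
# Hypothetical wrapper around `kubectl set resources`.
# $1 = deployment, $2 = namespace, $3 = container, $4 = memory request, $5 = memory limit
raise_memory() {
  kubectl set resources "deployment/$1" -n "$2" -c "$3" \
    --requests="memory=$4" --limits="memory=$5"
}

# Example (placeholder values):
#   raise_memory my-app default app 256Mi 512Mi
```

If the Deployment is managed by Helm, change the chart values instead, since the next helm upgrade would overwrite a manual change.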

Step 4 — Debug interactively if logs are empty

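# Run a busybox debug container in a copy of the pod (the original pod is untouched)
kubectl debug pod/<pod-name> -n <namespace> -it --image=busybox --copy-to=debug-pod --share-processes
# Or override the crashing container's command in the copy so it stays up
kubectl debug pod/<pod-name> -n <namespace> -it --copy-to=debug-pod --container=<container-name> -- sh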

Share the describe output and --previous logs and I'll pinpoint the root cause.

You

I updated a value in my Helm values.yaml but the deployment didn't change after helm upgrade. The pod is still using the old image tag.

Agent

This is almost always one of three issues:

Issue 1 — Template doesn't reference the value

# Check what the rendered template actually produces
helm template my-release ./my-chart -f values.yaml | grep image:

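If you see a hardcoded image tag in the output, the template has something like image: "myapp:hardcoded" instead of referencing the values, e.g. image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}" (the exact keys depend on your chart).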

Issue 2 — Wrong values file or override precedence

# Values override order (last wins):
# 1. chart's values.yaml
# 2. -f values.yaml flags (left to right)
# 3. --set flags
helm get values my-release  # user-supplied values for the deployed release (add --all for the merged result)
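
To make the precedence concrete, here is a hedged sketch of an upgrade that layers all three sources. The function names are hypothetical, values-prod.yaml and the tag in the example are made up, and `image.tag` assumes the chart exposes the tag under that key.

```shell
# Hypothetical example: file names and the image tag are placeholders.
# $1 = release, $2 = chart path, $3 = image tag
upgrade_with_overrides() {
  # Later sources win: values-prod.yaml overrides values.yaml,
  # and --set overrides both files and the chart's own defaults.
  helm upgrade "$1" "$2" \
    -f values.yaml \
    -f values-prod.yaml \
    --set "image.tag=$3"
}

# Show the merged values (chart defaults plus overrides) for the deployed release.
# $1 = release
effective_values() {
  helm get values "$1" --all
}

# Example:
#   upgrade_with_overrides my-release ./my-chart 1.4.2
#   effective_values my-release
```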

Issue 3 — Deployment unchanged so rollout doesn't trigger

If only a ConfigMap changed but the Deployment spec didn't, Kubernetes won't restart pods. Add a checksum annotation:

# In the Deployment's pod template, under spec.template.metadata:
annotations:
  checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
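
The annotation works because any change to the rendered ConfigMap changes its hash, which changes the pod template, which in turn triggers a rollout. The effect can be illustrated with plain `sha256sum`. This is a sketch with made-up ConfigMap contents, not Helm's rendering itself, and `config_checksum` is a hypothetical helper.

```shell
# Illustration only: the two strings stand in for two renderings of configmap.yaml.
config_checksum() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

old=$(config_checksum 'LOG_LEVEL: info')
new=$(config_checksum 'LOG_LEVEL: debug')

# A different hash means a different annotation value in the pod template.
if [ "$old" != "$new" ]; then
  echo "checksum changed: pod template differs, so pods are rolled"
fi
```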

Verify the upgrade actually ran:

helm history my-release
# Should show a new revision with STATUS = deployed
kubectl rollout status deployment/my-app
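
Since the symptom was a stale image tag, it can also help to compare what the Deployment spec asks for with what the pods actually run. A sketch, with hypothetical function names, assuming the pods carry a label such as app=my-app (the label is an assumption; check your chart's selector):

```shell
# Image(s) requested by the Deployment's pod template.
# $1 = deployment, $2 = namespace
deployment_images() {
  kubectl get deployment "$1" -n "$2" \
    -o jsonpath='{.spec.template.spec.containers[*].image}'
}

# Image(s) each matching pod is running, one pod per line.
# $1 = label selector, $2 = namespace
pod_images() {
  kubectl get pods -n "$2" -l "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
}

# Example (placeholder names):
#   deployment_images my-app default
#   pod_images app=my-app default
```

If the Deployment already shows the new tag but some pods still show the old one, the rollout is in progress or stuck rather than the values being wrong.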

Share helm get values my-release and the relevant template section and I'll identify the exact mismatch.

Get Started

How to install and use this skill in your preferred environment.

Skills are designed for AI coding agents (Claude Code, Cursor, Windsurf) and IDE-based workflows where the agent can read files, run scripts, and act on your codebase.

Models & Context

Which AI models and context windows work best with this skill.

Recommended Models

Claude Sonnet or GPT-4o recommended. Kubernetes YAML generation and debugging are reliable across most frontier models; operator development benefits from stronger models.

Context Window

SKILL.md is ~9KB. Include kubectl describe and log output directly in context for effective debugging sessions.

Pro tips for best results

1. Be specific. Include numbers (users, budget, RPS) so the skill can size the architecture.

2. Share constraints. Compliance needs, team size, and existing stack all improve the output.

3. Iterate. Start with a high-level design, then ask follow-ups for IaC, cost analysis, or security review.

4. Combine skills. Pair with companion skills below for end-to-end coverage.

Works Great With

These skills complement Kubernetes Specialist for end-to-end coverage. Install them together for better results.

Senior DevOps

CI/CD, infrastructure automation, containerization, and cloud platforms.

DevOps, CI/CD

Terraform Engineer

Terraform/OpenTofu IaC: module design, state management, workspaces, provider patterns, drift detection, testing with Terratest, and multi-cloud patterns.

Terraform, IaC

Observability Designer

SLO design, alert optimization, and dashboard generation.

Observability, SLO

AWS Solution Architect

Design AWS architectures using serverless patterns and IaC templates.

AWS, Serverless

CI/CD Pipeline Builder

Analyze stack and generate GitHub Actions or GitLab CI configs.

CI/CD, GitHub Actions

$ skillfish add johnefemer/skillfish --all # install all skills at once

Ready to try Kubernetes Specialist?

Install the skill and start getting expert-level guidance in your workflow — any agent, any IDE.