Engineering · DevOps · CI/CD · Cloud

Senior DevOps

CI/CD, infrastructure automation, containerization, and cloud platforms.

$skillfish add ~skills/senior-devops

Real Conversations

See how the Senior DevOps skill responds to actual requests. These are representative examples of what your AI agent will produce.

You

Our k8s rolling deploys cause ~30 seconds of 503 errors. We have 3 replicas behind an ALB, Node.js app, and our readiness probe is just checking port 3000.

Agent

Root Cause: Pods receive traffic before they're ready, and terminate before draining in-flight requests.

Fix 1 — Proper readiness probe (checks app logic, not just port):

readinessProbe:
  httpGet:
    path: /health/ready  # must return 200 only when app is fully initialized
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /health/live
    port: 3000
  initialDelaySeconds: 30  # give app time to start before killing it
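A quick sanity check on those numbers (a rough sketch using the values from the probe config above, ignoring probe timeouts and endpoint-propagation delay):

```javascript
// Probe timing from the manifest above.
const periodSeconds = 5;
const failureThreshold = 3;
const initialDelaySeconds = 10;

// Worst case before a failing pod is pulled from the Service endpoints:
// failureThreshold consecutive failed checks, one per period.
const worstCaseUnready = failureThreshold * periodSeconds; // 15s

// Earliest a new pod can pass its first readiness check after starting.
const earliestReady = initialDelaySeconds; // 10s

console.log({ worstCaseUnready, earliestReady });
```

If your app reliably starts faster than 10s, lowering `initialDelaySeconds` shortens rollouts without affecting safety.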

Node.js health endpoint:

app.get('/health/ready', async (req, res) => {
  try {
    // Verify a real dependency (DB, cache), not just pool state:
    await dbPool.query('SELECT 1');
    res.sendStatus(200);
  } catch {
    res.sendStatus(503);
  }
});
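A common refinement on top of the dependency check (a sketch, not part of the conversation above): gate readiness on an explicit flag that flips off the moment shutdown begins, so the pod fails its readiness probe and stops receiving new traffic before draining starts.

```javascript
// Explicit readiness flag: true only after initialization completes,
// false as soon as SIGTERM arrives, so the ALB stops routing new
// requests to this pod before the server begins draining.
let ready = false;

function readinessCode() {
  return ready ? 200 : 503;
}

// Wiring (Express, as in the endpoint above):
// app.get('/health/ready', (req, res) => res.sendStatus(readinessCode()));

ready = true;                            // after DB connect, warm-up, etc.
const afterInit = readinessCode();       // 200

ready = false;                           // first line of the SIGTERM handler
const duringShutdown = readinessCode();  // 503
```

With `failureThreshold: 3` and `periodSeconds: 5`, the flag flip takes effect within roughly 15 seconds, so keep it well inside `terminationGracePeriodSeconds`.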

Fix 2 — Graceful shutdown on SIGTERM:

process.on('SIGTERM', async () => {
  server.close(async () => {    // stop accepting new connections
    await dbPool.end();          // drain DB connections
    process.exit(0);
  });
  setTimeout(() => process.exit(1), 25000);  // force exit after 25s
});
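One ALB-specific caveat worth noting: the load balancer can keep routing requests for a few seconds after Kubernetes sends SIGTERM, while the target deregisters. A short `preStop` sleep keeps the server accepting during that window (a sketch; the 10s value is an assumption to tune against your target group's deregistration delay, and it assumes the image ships a `sleep` binary):

```yaml
lifecycle:
  preStop:
    exec:
      # Keep serving while the ALB finishes deregistering this pod;
      # SIGTERM is only sent after this hook completes.
      command: ["sleep", "10"]
```

The sleep counts against `terminationGracePeriodSeconds`, so budget for sleep + drain time when setting the grace period.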

Fix 3 — Deployment spec:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0  # never remove a pod before a new one is ready
  template:
    spec:
      terminationGracePeriodSeconds: 30

After these changes: 503s during deploy → 0. ALB waits for readiness, app drains in-flight requests before shutdown.

Get Started

How to install and use this skill in your preferred environment.

Skills are designed for AI coding agents (Claude Code, Cursor, Windsurf) and IDE-based workflows where the agent can read files, run scripts, and act on your codebase. Web-based AI can use the knowledge and frameworks, but won't have tool access.

Models & Context

Which AI models and context windows work best with this skill.

Recommended Models

Best: Claude Opus 4, Claude Sonnet 4, GPT-4.1, Gemini 2.5 Pro, Grok 3, Kimi K2
Good: Claude Haiku 4.5, GPT-4.1 mini, Gemini 2.5 Flash, Grok 3 mini

Larger models produce more detailed, production-ready outputs.

Context Window

This skill's SKILL.md is typically 3–10 KB — fits in any modern context window.

8K: Skill only
32K+: Skill + conversation
100K+: Skill + references + codebase

All current frontier models (Claude, GPT, Gemini) support 100K+ context. Use the full window for complex multi-service work.

Pro tips for best results

1. Be specific: include numbers (users, budget, RPS) so the skill can size the architecture.

2. Share constraints: compliance needs, team size, and existing stack all improve the output.

3. Iterate: start with a high-level design, then ask follow-ups for IaC, cost analysis, or security review.

4. Combine skills: pair with companion skills below for end-to-end coverage.

Ready to try Senior DevOps?

Install the skill and start getting expert-level guidance in your workflow — any agent, any IDE.

$skillfish add ~skills/senior-devops