Engineering · DevOps · CI/CD · Cloud

Senior DevOps

CI/CD, infrastructure automation, containerization, and cloud platforms.

$skillfish add ~skills/senior-devops

Real Conversations

See how the Senior DevOps skill responds to actual requests. These are representative examples of what your AI agent will produce.

You

Our k8s rolling deploys cause ~30 seconds of 503 errors. We have 3 replicas behind an ALB, Node.js app, and our readiness probe is just checking port 3000.

Agent

Root Cause: Pods receive traffic before they're ready, and terminate before draining in-flight requests.

Fix 1 — Proper readiness probe (checks app logic, not just port):

readinessProbe:
  httpGet:
    path: /health/ready  # must return 200 only when app is fully initialized
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /health/live
    port: 3000
  initialDelaySeconds: 30  # give app time to start before killing it
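A quick sanity check on those numbers (a rough sketch using the values from the probe config above, ignoring probe timeouts and endpoint-propagation delay):

```javascript
// Probe timing from the manifest above.
const periodSeconds = 5;
const failureThreshold = 3;
const initialDelaySeconds = 10;

// Worst case before a failing pod is pulled from the Service endpoints:
// failureThreshold consecutive failed checks, one per period.
const worstCaseUnready = failureThreshold * periodSeconds; // 15s

// Earliest a new pod can pass its first readiness check after starting.
const earliestReady = initialDelaySeconds; // 10s

console.log({ worstCaseUnready, earliestReady });
```

If your app reliably starts faster than 10s, lowering `initialDelaySeconds` shortens rollouts without affecting safety.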

Node.js health endpoint:

app.get('/health/ready', async (req, res) => {
  try {
    // Verify a real dependency (DB, cache), not just pool state:
    await dbPool.query('SELECT 1');
    res.sendStatus(200);
  } catch {
    res.sendStatus(503);
  }
});
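A common refinement on top of the dependency check (a sketch, not part of the conversation above): gate readiness on an explicit flag that flips off the moment shutdown begins, so the pod fails its readiness probe and stops receiving new traffic before draining starts.

```javascript
// Explicit readiness flag: true only after initialization completes,
// false as soon as SIGTERM arrives, so the ALB stops routing new
// requests to this pod before the server begins draining.
let ready = false;

function readinessCode() {
  return ready ? 200 : 503;
}

// Wiring (Express, as in the endpoint above):
// app.get('/health/ready', (req, res) => res.sendStatus(readinessCode()));

ready = true;                            // after DB connect, warm-up, etc.
const afterInit = readinessCode();       // 200

ready = false;                           // first line of the SIGTERM handler
const duringShutdown = readinessCode();  // 503
```

With `failureThreshold: 3` and `periodSeconds: 5`, the flag flip takes effect within roughly 15 seconds, so keep it well inside `terminationGracePeriodSeconds`.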

Fix 2 — Graceful shutdown on SIGTERM:

process.on('SIGTERM', async () => {
  server.close(async () => {    // stop accepting new connections
    await dbPool.end();          // drain DB connections
    process.exit(0);
  });
  setTimeout(() => process.exit(1), 25000);  // force exit after 25s
});
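One ALB-specific caveat worth noting: the load balancer can keep routing requests for a few seconds after Kubernetes sends SIGTERM, while the target deregisters. A short `preStop` sleep keeps the server accepting during that window (a sketch; the 10s value is an assumption to tune against your target group's deregistration delay, and it assumes the image ships a `sleep` binary):

```yaml
lifecycle:
  preStop:
    exec:
      # Keep serving while the ALB finishes deregistering this pod;
      # SIGTERM is only sent after this hook completes.
      command: ["sleep", "10"]
```

The sleep counts against `terminationGracePeriodSeconds`, so budget for sleep + drain time when setting the grace period.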

Fix 3 — Deployment spec:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0  # never remove a pod before a new one is ready
  template:
    spec:
      terminationGracePeriodSeconds: 30

After these changes: 503s during deploy → 0. ALB waits for readiness, app drains in-flight requests before shutdown.

Get Started

How to install and use this skill in your preferred environment.

Skills are designed for AI coding agents (Claude Code, Cursor, Windsurf) and IDE-based workflows where the agent can read files, run scripts, and act on your codebase. Web-based AI can use the knowledge and frameworks, but won't have tool access.

Models & Context

Which AI models and context windows work best with this skill.

Recommended Models

Best: Claude Opus 4, Claude Sonnet 4, GPT-4.1, Gemini 2.5 Pro, Grok 3, Kimi K2
Good: Claude Haiku 4.5, GPT-4.1 mini, Gemini 2.5 Flash, Grok 3 mini

Larger models produce more detailed, production-ready outputs.

Context Window

This skill's SKILL.md is typically 3–10 KB — fits in any modern context window.

8K: Skill only
32K+: Skill + conversation
100K+: Skill + references + codebase

All current frontier models (Claude, GPT, Gemini) support 100K+ context. Use the full window for complex multi-service work.

Pro tips for best results

1. Be specific: include numbers (users, budget, RPS) so the skill can size the architecture.

2. Share constraints: compliance needs, team size, and existing stack all improve the output.

3. Iterate: start with a high-level design, then ask follow-ups for IaC, cost analysis, or security review.

4. Combine skills: pair with companion skills below for end-to-end coverage.

Ready to try Senior DevOps?

Install the skill and start getting expert-level guidance in your workflow — any agent, any IDE.

$skillfish add ~skills/senior-devops