Engineering · Advanced · Testing · QA · AI Skills

Skill Tester

Quality assurance testing for AI agent skills.

$skillfish add ~skills/skill-tester

Real Conversations

See how the Skill Tester skill responds to actual requests. These are representative examples of what your AI agent will produce.

You

I just finished building a new skill called log-analyzer — it has SKILL.md, README.md, two Python scripts in scripts/, and a references/ directory. I want to make sure it meets POWERFUL tier requirements before I open a PR. How do I run the full quality check?

Agent

This skill includes executable scripts — I'll need your permission before running them.

Here's the full three-tool validation sequence for POWERFUL tier:

# Step 1 — Structural validation
python3 scripts/skill_validator.py skills/log-analyzer --tier POWERFUL

# Step 2 — Script testing (syntax, imports, runtime, output format)
python3 scripts/script_tester.py skills/log-analyzer --timeout 30 --sample-data

# Step 3 — Quality scoring
python3 scripts/quality_scorer.py skills/log-analyzer --detailed --recommendations
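For convenience, the three steps can be chained with a fail-fast runner; a minimal sketch (the script paths are copied from the commands above, and `run_sequence` is a hypothetical helper, not part of the skill itself):

```python
import subprocess
import sys

def run_sequence(commands):
    # Fail-fast: run each validation step in order and stop at the
    # first nonzero exit code (the same behaviour CI would enforce).
    for cmd in commands:
        if subprocess.run(cmd).returncode != 0:
            return False
    return True

# The three POWERFUL-tier steps, paths as in the transcript above:
STEPS = [
    [sys.executable, "scripts/skill_validator.py", "skills/log-analyzer",
     "--tier", "POWERFUL"],
    [sys.executable, "scripts/script_tester.py", "skills/log-analyzer",
     "--timeout", "30", "--sample-data"],
    [sys.executable, "scripts/quality_scorer.py", "skills/log-analyzer",
     "--detailed", "--recommendations"],
]
```

Stopping at the first failure keeps the later, slower steps from running against a skill that is already structurally broken.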

What POWERFUL tier requires (checklist before running):

  • SKILL.md: minimum 300 lines
  • 2-3 Python scripts, each 500-800 lines
  • --json and human-readable output modes on all scripts
  • Comprehensive argparse with multiple modes/subcommands
  • references/ directory with documentation
  • expected_outputs/ directory for regression testing
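The dual output-mode requirement in the checklist can be met with a small argparse pattern; a minimal sketch (the `scan` subcommand and result fields are illustrative, not part of any real script):

```python
import argparse
import json

def build_parser():
    # Argparse skeleton with subcommands plus a --json flag, as the
    # POWERFUL tier checklist requires.
    parser = argparse.ArgumentParser(description="Example skill script")
    sub = parser.add_subparsers(dest="command", required=True)
    scan = sub.add_parser("scan", help="Scan input files")
    scan.add_argument("path", help="File or directory to scan")
    scan.add_argument("--json", action="store_true",
                      help="Machine-readable output instead of text")
    return parser

def report(results, as_json):
    # A single reporting path keeps the two output modes consistent.
    if as_json:
        print(json.dumps(results))
    else:
        for key, value in results.items():
            print(f"{key}: {value}")

args = build_parser().parse_args(["scan", "logs/", "--json"])
report({"path": args.path, "errors": 0}, args.json)
# prints {"path": "logs/", "errors": 0}
```

Routing both modes through one `report()` function means a new field automatically appears in the text and JSON outputs at the same time.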

Interpreting results:

=== SKILL VALIDATION REPORT ===
Skill: skills/log-analyzer
Tier: POWERFUL
Overall Score: 82/100 (B)

Structure Validation: ✓ PASS
├─ SKILL.md: ✓ EXISTS (312 lines)
├─ scripts/: ✓ EXISTS (2 files)
├─ references/: ✓ EXISTS
└─ expected_outputs/: ⚠ MISSING  ← add this

Script Testing:
├─ log_scanner.py: ✓ PASS (syntax, imports, --help, --json)
└─ log_summarizer.py: ⚠ WARN — no --json output mode

Recommendations:
• Add expected_outputs/ directory with sample runs
• Add --json flag to log_summarizer.py
• Add --format text/json to argument parser

Fix the warnings before the PR — a B score (80+) is acceptable for POWERFUL tier. Anything below 75 will be flagged in CI.

# Run with --json for CI-parseable output
python3 scripts/quality_scorer.py skills/log-analyzer --json | jq '.overall_score'
# Must be >= 75

Get Started

How to install and use this skill in your preferred environment.

Skills are designed for AI coding agents (Claude Code, Cursor, Windsurf) and IDE-based workflows where the agent can read files, run scripts, and act on your codebase. Web-based AI can use the knowledge and frameworks, but won't have tool access.

Models & Context

Which AI models and context windows work best with this skill.

Recommended Models

Best: Claude Opus 4, Claude Sonnet 4, GPT-4.1, Gemini 2.5 Pro, Grok 3, Kimi K2
Good: Claude Haiku 4.5, GPT-4.1 mini, Gemini 2.5 Flash, Grok 3 mini

Larger models produce more detailed, production-ready outputs.

Context Window

This skill's SKILL.md is typically 3–10 KB — fits in any modern context window.

  • 8K: skill only
  • 32K+: skill + conversation
  • 100K+: skill + references + codebase

All current frontier models (Claude, GPT, Gemini) support 100K+ context. Use the full window for complex multi-service work.

Pro tips for best results

1. Be specific: include numbers (users, budget, RPS) so the skill can size the architecture.

2. Share constraints: compliance needs, team size, and existing stack all improve the output.

3. Iterate: start with a high-level design, then ask follow-ups for IaC, cost analysis, or security review.

4. Combine skills: pair with companion skills below for end-to-end coverage.

Ready to try Skill Tester?

Install the skill and start getting expert-level guidance in your workflow — any agent, any IDE.

$skillfish add ~skills/skill-tester