TDD Guide
Test-driven development workflow with test generation and multi-framework support.
$skillfish add ~skills/tdd-guide
Real Conversations
See how the TDD Guide skill responds to actual requests. These are representative examples of what your AI agent will produce.
Starting a new Node.js shopping cart service. Team is skeptical of TDD. Can you show me the red-green-refactor cycle on a real example?
TDD Red-Green-Refactor — Shopping Cart:
RED — Write a failing test first:
// cart.test.ts
import { Cart } from './cart';
describe('Cart', () => {
  it('adds an item and returns correct total', () => {
    const cart = new Cart();
    cart.addItem({ id: 'sku-1', name: 'Widget', price: 999, quantity: 2 }); // price in cents
    expect(cart.total()).toBe(1998);
  });
});
// Run: npx jest -> FAILS: Cannot find module './cart'
GREEN — Write the minimum code to pass:
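// cart.ts
// Prices are integer cents, matching the test above.
export interface CartItem {
  id: string;
  name: string;
  price: number;
  quantity: number;
}
export class Cart {
  private items: CartItem[] = [];
  addItem(item: CartItem) {
    this.items.push(item);
  }
  total(): number {
    return this.items.reduce((sum, i) => sum + i.price * i.quantity, 0);
  }
}
// Run: npx jest -> PASSES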
REFACTOR — Improve without breaking:
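// Merge duplicate SKUs into one line item. total() returns the same value as before,
// so the existing test keeps passing. New observable behavior (for example a line-item
// count) would need its own failing test first.
addItem(item: CartItem) {
  const existing = this.items.find(i => i.id === item.id);
  if (existing) { existing.quantity += item.quantity; }
  else { this.items.push({ ...item }); } // copy so later merges don't mutate the caller's object
}
// Run: npx jest -> still PASSES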
Next test (repeat cycle):
it('applies 10% discount coupon', () => {
  const cart = new Cart();
  cart.addItem({ id: 'sku-1', name: 'Widget', price: 1000, quantity: 1 });
  cart.applyCoupon({ code: 'SAVE10', type: 'percent', value: 10 });
  expect(cart.total()).toBe(900);
});
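One way the GREEN step for the coupon test could look, as a minimal sketch. The Coupon shape, the applyCoupon body, and rounding to the nearest cent are assumptions; the original only shows the test. The class is repeated in full so the block runs on its own.

```typescript
// Hypothetical shapes inferred from the call sites in the tests.
interface CartItem { id: string; name: string; price: number; quantity: number; } // price in cents
interface Coupon { code: string; type: 'percent'; value: number; } // value = percent off

class Cart {
  private items: CartItem[] = [];
  private coupon?: Coupon;

  addItem(item: CartItem): void {
    const existing = this.items.find(i => i.id === item.id);
    if (existing) { existing.quantity += item.quantity; }
    else { this.items.push({ ...item }); }
  }

  // Minimum needed for the new test: remember a single coupon.
  applyCoupon(coupon: Coupon): void {
    this.coupon = coupon;
  }

  total(): number {
    const subtotal = this.items.reduce((sum, i) => sum + i.price * i.quantity, 0);
    if (!this.coupon) return subtotal;
    // Assumption: discounts are rounded to the nearest cent.
    return Math.round((subtotal * (100 - this.coupon.value)) / 100);
  }
}
```

Stacking coupons, fixed-amount coupons, and invalid codes are deliberately left out. Each would be driven by its own failing test in a later cycle.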
The pitch to skeptics: TDD doesn't slow you down — it shifts debugging time from after deployment to during writing. The test suite you build becomes the spec.
Get Started
How to install and use this skill in your preferred environment.
Skills are designed for AI coding agents (Claude Code, Cursor, Windsurf) and IDE-based workflows where the agent can read files, run scripts, and act on your codebase. Web-based AI can use the knowledge and frameworks, but won't have tool access.
Models & Context
Which AI models and context windows work best with this skill.
Recommended Models
Larger models produce more detailed, production-ready outputs.
Context Window
This skill's SKILL.md is typically 3–10 KB — fits in any modern context window.
All current frontier models (Claude, GPT, Gemini) support 100K+ context. Use the full window for complex multi-service work.
Pro tips for best results
Be specific
Include numbers — users, budget, RPS — so the skill can size the architecture.
Share constraints
Compliance needs, team size, and existing stack all improve the output.
Iterate
Start with a high-level design, then ask follow-ups for IaC, cost analysis, or security review.
Combine skills
Pair with companion skills below for end-to-end coverage.
Good to Know
Advanced guide and reference material for TDD Guide. Background, edge cases, and patterns worth understanding.
When TDD Helps vs Hurts
TDD is a design tool, not a religious obligation. Apply it where the feedback loop pays off.
| Scenario | TDD appropriate? | Notes |
|---|---|---|
| New feature with clear spec | Yes | Red-green-refactor shines; tests become the living spec |
| Exploratory / spike work | No | Write a spike, extract understanding, throw it away, then TDD the real implementation |
| Legacy code without tests | Characterization tests first | Write tests describing current behavior (even if buggy) before modifying anything |
| Performance-critical hot path | Profile first | TDD the correctness; benchmark separately. Don't optimize before measuring. |
| UI components (visual) | Maybe | TDD business logic and state; skip TDD for visual assertions — use snapshot or visual regression testing |
| Third-party integration wrappers | Yes | Mock the external call; TDD the wrapper's behavior and error handling |
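The "characterization tests first" row can be sketched as follows. The legacy function legacySlug and its quirks are hypothetical, invented for illustration; the point is that the table of cases records what the code does today, including behavior that looks wrong.

```typescript
// Hypothetical legacy function with no tests. It has quirks:
// repeated spaces become repeated hyphens, and leading spaces are kept.
function legacySlug(title: string): string {
  return title.toLowerCase().replace(/ /g, '-');
}

// Characterization cases: inputs paired with the output observed from the current code.
const observedBehavior: Array<[string, string]> = [
  ['Hello World', 'hello-world'],
  ['Hello  World', 'hello--world'], // arguably a bug, pinned on purpose
  [' Leading', '-leading'],         // arguably a bug, pinned on purpose
];

// Returns one message per case where an implementation deviates from the pinned behavior.
function runCharacterization(fn: (title: string) => string): string[] {
  return observedBehavior
    .filter(([input, observed]) => fn(input) !== observed)
    .map(([input, observed]) => `"${input}": pinned "${observed}", got "${fn(input)}"`);
}

// A "cleaned up" rewrite trips the pinned cases, which makes the behavior change explicit.
function cleanedSlug(title: string): string {
  return title.trim().toLowerCase().replace(/\s+/g, '-');
}
```

Running runCharacterization(legacySlug) returns no failures, while cleanedSlug fails the two pinned quirks. That failure is the prompt to decide whether the old behavior is relied on before changing it.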
The Test Pyramid in Practice
A commonly cited target ratio is 70% unit / 20% integration / 10% E2E. Treat it as a rule of thumb: the point is to keep the test suite fast and failures localized.
How teams end up with an inverted pyramid:
- E2E tests are written first because they "feel like real testing"
- Integration tests catch the bugs unit tests miss, so more get added
- Unit tests require design discipline (DI, pure functions) that teams skip under deadline pressure
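The design discipline in the last point can be illustrated with a small dependency-injection sketch. The function isExpired and the Clock type are hypothetical names; the idea is only that passing the clock in turns a time-dependent check into a pure, unit-testable one.

```typescript
// Hard to unit test: the result depends on when the test happens to run.
// function isExpired(expiresAtMs: number) { return Date.now() > expiresAtMs; }

// Injecting the clock makes the function deterministic under test.
type Clock = () => number;

function isExpired(expiresAtMs: number, now: Clock = Date.now): boolean {
  return now() > expiresAtMs;
}

// In a test, a fixed clock replaces the system clock.
const fixedClock: Clock = () => 1000;
const expired = isExpired(999, fixedClock);     // true: 1000 > 999
const notExpired = isExpired(1000, fixedClock); // false: 1000 is not > 1000
```

Production callers can keep calling isExpired(expiresAtMs) unchanged, because the parameter defaults to Date.now.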
How to detect an inverted pyramid from CI metrics:
If: test suite runtime > 10 minutes
And: disabling E2E tests cuts runtime by >60%
Then: your pyramid is inverted
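The rule above can be written as a small check against CI timings. This is a sketch: the field names are hypothetical, and the 10-minute and 60% thresholds are taken directly from the rule.

```typescript
interface SuiteTimings {
  fullRuntimeMinutes: number;       // runtime with every test enabled
  runtimeWithoutE2eMinutes: number; // runtime with E2E tests disabled
}

function isPyramidInverted(t: SuiteTimings): boolean {
  if (t.fullRuntimeMinutes <= 0) return false; // no meaningful timing data
  // Fraction of total runtime that disappears when E2E tests are disabled.
  const reduction =
    (t.fullRuntimeMinutes - t.runtimeWithoutE2eMinutes) / t.fullRuntimeMinutes;
  return t.fullRuntimeMinutes > 10 && reduction > 0.6;
}

// 25 min total, 8 min without E2E: (25 - 8) / 25 = 0.68, so the rule fires.
const inverted = isPyramidInverted({ fullRuntimeMinutes: 25, runtimeWithoutE2eMinutes: 8 });
```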
The other signal: flaky tests. E2E and integration tests are the primary source of flakiness (network, timing, shared state). If you have >2% flake rate on your suite, you probably have too many high-level tests.
Recovery: Don't delete E2E tests — add unit tests until the ratio normalizes. Deleting tests leaves undetected regressions.
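To size the recovery work, the number of unit tests to add follows from the target ratio. The sketch below assumes the ratio is measured by test count (the text does not say whether it means count or runtime), and the function name is hypothetical.

```typescript
interface TestCounts { unit: number; integration: number; e2e: number; }

// Assumption: the ratio is measured by test count, not runtime.
// targetUnitPercent must be below 100.
function unitTestsToAdd(counts: TestCounts, targetUnitPercent = 70): number {
  const higherLevel = counts.integration + counts.e2e;
  // Solve unit / (unit + higherLevel) >= target / 100 for unit:
  // unit >= target * higherLevel / (100 - target)
  const requiredUnit = Math.ceil(
    (targetUnitPercent * higherLevel) / (100 - targetUnitPercent)
  );
  return Math.max(0, requiredUnit - counts.unit);
}

// 100 unit, 150 integration, 50 E2E: need ceil(70 * 200 / 30) = 467 unit tests, so add 367.
const toAdd = unitTestsToAdd({ unit: 100, integration: 150, e2e: 50 });
```

The count is a planning aid, not a quota. Unit tests added only to move the ratio tend to be the assertion-free kind.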
Coverage Is Not Quality
100% line coverage is achievable with tests that assert nothing about behavior:
it('runs without throwing', () => {
  expect(() => processPayment(validOrder)).not.toThrow();
});
// Counts every executed line as covered. Detects almost no behavioral regressions.
What coverage misses:
- Missing test cases (the code path you didn't think to write)
- Wrong assertions (testing the wrong output)
- Incorrect edge case handling that still returns a value
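The gap between coverage and behavior can be shown with a deliberately buggy function. Everything here is hypothetical (orderTotal, the Order shape, the sample order); the checks are written as plain functions rather than Jest tests so the sketch runs on its own.

```typescript
interface Order { subtotalCents: number; taxRate: number; }

// Deliberately buggy: tax is subtracted instead of added.
function orderTotal(order: Order): number {
  return Math.round(order.subtotalCents - order.subtotalCents * order.taxRate);
}

const validOrder: Order = { subtotalCents: 1000, taxRate: 0.08 };

// "Runs without throwing" check: executes every line of orderTotal, passes despite the bug.
function smokeCheck(): boolean {
  try { orderTotal(validOrder); return true; } catch { return false; }
}

// Behavioral check: asserts the expected value (1000 + 8% tax = 1080), fails on the bug.
function behaviorCheck(): boolean {
  return orderTotal(validOrder) === 1080;
}
```

Both checks give orderTotal the same line coverage. Only the second one notices that the function returns 920.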
Mutation score is the better signal. A mutation testing tool (Stryker for JS/TS, mutmut for Python, pitest for Java) makes small code changes — flipping > to >=, changing a return value — and runs your tests. If your tests don't catch the mutation, the test is weak.
Target: mutation score >70% on critical business logic. Don't apply it to all code — it's expensive to run and the ROI is highest on payment, auth, and calculation logic.
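What a mutation tool automates can be sketched by hand. The shipping rule and both tests are hypothetical; the mutant is the > to >= flip mentioned above. Mutation score is the percentage of generated mutants that at least one test kills.

```typescript
type Rule = (subtotalCents: number) => boolean;

// Original rule (hypothetical): free shipping strictly above 5000 cents.
const qualifiesOriginal: Rule = (subtotal) => subtotal > 5000;
// Mutant: > flipped to >=.
const qualifiesMutant: Rule = (subtotal) => subtotal >= 5000;

// Weak test: only checks values far from the boundary.
const weakTest = (rule: Rule): boolean => rule(10000) === true && rule(0) === false;
// Strong test: also pins the boundary value.
const strongTest = (rule: Rule): boolean => weakTest(rule) && rule(5000) === false;

// A mutant is killed when a test that passes on the original fails on the mutant.
const kills = (test: (rule: Rule) => boolean): boolean =>
  test(qualifiesOriginal) && !test(qualifiesMutant);

// Mutation score as a percentage of killed mutants.
function mutationScore(killed: number, total: number): number {
  return total === 0 ? 0 : (killed / total) * 100;
}

const weakKills = kills(weakTest);     // false: the mutant survives
const strongKills = kills(strongTest); // true: the boundary assertion catches it
```

A surviving mutant usually points at a missing boundary or edge-case assertion rather than a missing test file.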
AAA vs BDD Test Style
Both are structuring conventions. The difference is readability for different audiences.
AAA (Arrange/Act/Assert) — clear for developers reading code:
it('applies percentage discount to cart total', () => {
  // Arrange
  const cart = new Cart();
  cart.addItem({ id: 'sku-1', name: 'Widget', price: 1000, quantity: 1 });
  const coupon = { code: 'SAVE10', type: 'percent', value: 10 } as const;
  // Act
  cart.applyCoupon(coupon);
  // Assert
  expect(cart.total()).toBe(900);
});
BDD (Given/When/Then) — reads like a product spec:
// Jest has no given/when/then globals; nested describe blocks carry the same structure.
describe('given a cart containing one item at $10.00', () => {
  let cart: Cart;
  beforeEach(() => {
    cart = new Cart();
    cart.addItem({ id: 'sku-1', name: 'Widget', price: 1000, quantity: 1 });
  });
  describe('when a 10% discount coupon is applied', () => {
    beforeEach(() => {
      cart.applyCoupon({ code: 'SAVE10', type: 'percent', value: 10 });
    });
    it('then the total should be $9.00', () => {
      expect(cart.total()).toBe(900);
    });
  });
});
When to use each: AAA for team-internal tests where all readers are engineers. BDD when tests double as acceptance criteria reviewed by product or QA — the Given/When/Then maps directly to user story format. Avoid mixing styles in the same file.
"Obvious Implementation" vs "Fake It Till You Make It"
Kent Beck described these two strategies, along with a third (triangulation), in Test-Driven Development: By Example:
Obvious implementation: When you know exactly how to write the code correctly, just write it. Don't introduce artificial fakery for its own sake. Skip the stub if the real implementation is three lines.
Fake it till you make it: When the correct implementation isn't clear, start with a hardcoded return value that makes the test pass. Let subsequent tests force you to generalize.
// Test 1: passes with hardcoded return
total() { return 1998; }
// Test 2: new items — hardcoded breaks
// Forces you to write: return this.items.reduce(...)
The value of "fake it" isn't the fake code — it's the discipline of writing tests that force you toward the real implementation one constraint at a time. Use it when you're unsure of the design, not as a default.