Data Analyst
Data analysis workflows: pandas/SQL patterns, statistical analysis, visualization (matplotlib, plotly), dashboard design, A/B test analysis, and business metrics.
A practical data analysis skill that turns raw data into defensible insights. It handles the full workflow: data cleaning, exploratory analysis, statistical testing, visualization, and stakeholder-ready summaries. It is opinionated about common pitfalls (Simpson's paradox, p-hacking, misleading visualizations, and metric misalignment) so the analysis holds up to scrutiny.
Added Mar 20, 2026
$ npx skills add johnefemer/skillfish --skill data-analyst
What This Skill Can Do
Concrete capabilities you get when you install this skill.
Clean and reshape datasets using pandas, SQL window functions, and dbt transforms
Run statistical tests (t-test, chi-square, Mann-Whitney) and interpret results correctly
Design and analyze A/B experiments with proper power calculations and guardrail metrics
Build interactive dashboards with Plotly, Streamlit, or Observable
Define business metrics that align with company objectives and resist gaming
Identify and flag misleading analysis patterns: confounding, survivorship bias, and p-hacking
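As an illustration of the kind of pattern flagged here, the sketch below shows Simpson's paradox on a small A/B dataset. The counts are made up for the example, and the function names are hypothetical helpers rather than part of the skill: the treatment converts better in every device segment yet worse overall, because the two variants have very different mobile/desktop mixes.

```python
# Hypothetical, made-up counts: (conversions, users) per variant and device segment.
data = {
    "control":   {"mobile": (4, 200),  "desktop": (48, 800)},
    "treatment": {"mobile": (20, 800), "desktop": (13, 200)},
}


def segment_rates(variant):
    """Conversion rate per segment for one variant."""
    return {seg: conv / users for seg, (conv, users) in data[variant].items()}


def overall_rate(variant):
    """Conversion rate across all segments pooled together."""
    conversions = sum(conv for conv, _ in data[variant].values())
    users = sum(n for _, n in data[variant].values())
    return conversions / users


def simpsons_paradox(a, b):
    """True if one variant wins every segment but the other wins overall."""
    seg_a, seg_b = segment_rates(a), segment_rates(b)
    b_wins_all = all(seg_b[s] > seg_a[s] for s in seg_a)
    a_wins_all = all(seg_a[s] > seg_b[s] for s in seg_a)
    b_wins_overall = overall_rate(b) > overall_rate(a)
    a_wins_overall = overall_rate(a) > overall_rate(b)
    return (b_wins_all and a_wins_overall) or (a_wins_all and b_wins_overall)


for variant in data:
    print(variant, segment_rates(variant), f"overall={overall_rate(variant):.3f}")
print("Simpson's paradox:", simpsons_paradox("control", "treatment"))
```

With these numbers the segment-level comparison and the pooled comparison point in opposite directions, which is why a segment breakdown is worth checking before reporting a single headline rate.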
Real Conversations
See how the Data Analyst skill responds to actual requests. These are representative examples of what your AI agent will produce.
We want to run an A/B test on our checkout button. Current conversion rate is 3.2%. We want to detect a 0.5 percentage point lift. How many users do we need?
Here's the power calculation:
import numpy as np
from scipy import stats

def sample_size_two_proportion(
    baseline: float,
    mde: float,  # minimum detectable effect (absolute, e.g. 0.005 = 0.5pp)
    alpha: float = 0.05,
    power: float = 0.80,
) -> int:
    """Per-variant sample size for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline + mde
    p_bar = (p1 + p2) / 2  # pooled rate under the null hypothesis
    z_alpha = stats.norm.ppf(1 - alpha / 2)  # two-tailed
    z_beta = stats.norm.ppf(power)
    n = (
        (z_alpha * np.sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * np.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
        / (p2 - p1) ** 2
    )
    return int(np.ceil(n))

n = sample_size_two_proportion(baseline=0.032, mde=0.005)
print(f"Per variant: {n:,} users")  # → ~20,900 per variant

Result: ~20,900 users per variant (about 41,800 total).
Critical decisions to make now:
- Runtime: At your traffic volume, how long to hit ~42k users? If >4 weeks, consider raising MDE to 1pp.
- Primary metric: conversion rate. Guardrail metrics: revenue per user, cart abandonment — log these too.
- Segment pre-registration: decide mobile vs desktop splits BEFORE launch, not after peeking.
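To make the runtime trade-off concrete, here is a minimal sketch that reuses the same two-proportion formula as the calculation above, but with the standard library's `statistics.NormalDist` in place of scipy. The daily traffic figure and the `days_to_run` helper are assumptions for illustration only; substitute your own numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline, mde, alpha=0.05, power=0.80):
    """Per-variant sample size for a two-sided two-proportion z-test."""
    p1, p2 = baseline, baseline + mde
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def days_to_run(n_per_variant, daily_users, variants=2):
    """Days needed to fill all variants, assuming every daily user is enrolled."""
    return ceil(n_per_variant * variants / daily_users)


DAILY_USERS = 2_000  # hypothetical traffic, not taken from the conversation

for mde in (0.005, 0.0075, 0.01):
    n = sample_size_per_variant(baseline=0.032, mde=mde)
    days = days_to_run(n, DAILY_USERS)
    print(f"MDE {mde * 100:.2f}pp: {n:,} per variant, about {days} days")
```

Required sample size falls roughly with the square of the MDE, so moving from 0.5pp to 1pp cuts the requirement to about a quarter.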
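Do NOT peek at results early and stop as soon as they look significant. Repeated unadjusted looks can inflate the false positive rate from the nominal 5% to 20% or more. If you need to monitor continuously, use sequential testing (e.g., a group sequential design with O'Brien-Fleming boundaries, which needs a dedicated implementation because scipy.stats does not ship one).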
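The effect of peeking can be checked with a small simulation. The sketch below runs A/A tests (both arms share the same true conversion rate, so every "significant" result is a false positive) and compares stopping at the first significant look against testing once at the end. The number of looks, batch size, and experiment count are arbitrary assumptions chosen to keep the run short; it uses only the standard library.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 5% significance threshold


def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z statistic for two arms of equal size n."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 0.0
    return (conv_b / n - conv_a / n) / se


def simulate_aa(n_experiments, looks, batch, rate, rng):
    """Return (false positive rate with peeking, false positive rate at final look)."""
    peek_hits = 0
    final_hits = 0
    for _ in range(n_experiments):
        conv_a = conv_b = n = 0
        significant_at_any_look = False
        for _ in range(looks):
            # Add one batch of users to each arm, then take an interim look.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            if abs(z_stat(conv_a, conv_b, n)) > Z_CRIT:
                significant_at_any_look = True
        peek_hits += significant_at_any_look
        final_hits += abs(z_stat(conv_a, conv_b, n)) > Z_CRIT
    return peek_hits / n_experiments, final_hits / n_experiments


rng = random.Random(42)
peek_rate, final_rate = simulate_aa(
    n_experiments=400, looks=10, batch=300, rate=0.032, rng=rng
)
print(f"False positive rate, stop at first significant look: {peek_rate:.1%}")
print(f"False positive rate, single test at the end:         {final_rate:.1%}")
```

Because the final look is one of the ten looks, the peeking rate can never be lower than the single-test rate; in simulations of this kind it usually comes out several times higher.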
Get Started
How to install and use this skill in your preferred environment.
Skills are designed for AI coding agents (Claude Code, Cursor, Windsurf) and IDE-based workflows where the agent can read files, run scripts, and act on your codebase.
Models & Context
Which AI models and context windows work best with this skill.
Recommended Models
Works well with any frontier model. Complex statistical reasoning benefits from GPT-4o or Claude Sonnet; pandas/SQL generation is reliable on most models.
Context Window
SKILL.md is ~8KB. Fits in 32K context. Load dataset schema and sample rows in context for best results.
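One way to prepare that context is to generate a compact schema-plus-sample summary and paste it into the prompt. The sketch below is an assumption about how this might be done with the standard library only; `schema_summary`, `infer_type`, and the sample CSV are hypothetical and not part of the skill.

```python
import csv
import io


def infer_type(values):
    """Guess a column type from string values: int, float, str, or unknown."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "unknown"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "str"


def schema_summary(csv_text, sample_rows=3):
    """Return column names, guessed types, and the first few rows as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    lines = ["Schema:"]
    for col in columns:
        lines.append(f"  {col}: {infer_type([r[col] for r in rows])}")
    lines.append(f"Sample rows (first {min(sample_rows, len(rows))} of {len(rows)}):")
    for row in rows[:sample_rows]:
        lines.append("  " + ", ".join(f"{k}={v}" for k, v in row.items()))
    return "\n".join(lines)


# Hypothetical example data for illustration only.
SAMPLE_CSV = (
    "user_id,variant,converted,revenue\n"
    "1,control,0,0.0\n"
    "2,treatment,1,19.99\n"
    "3,control,0,0.0\n"
    "4,treatment,0,\n"
)

print(schema_summary(SAMPLE_CSV))
```

A summary like this keeps the prompt small while still telling the agent which columns exist, what types they hold, and where values are missing.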
Pro tips for best results
Be specific
Include numbers (baseline rates, traffic volume, sample sizes) so the skill can size the analysis.
Share constraints
Compliance needs, team size, and existing stack all improve the output.
Iterate
Start with a high-level analysis plan, then ask follow-ups for statistical tests, visualizations, or a stakeholder-ready summary.
Combine skills
Pair with companion skills below for end-to-end coverage.
Works Great With
These skills complement Data Analyst for end-to-end coverage. Install them together for better results.
Ready to try Data Analyst?
Install the skill and start getting expert-level guidance in your workflow — any agent, any IDE.
$ npx skills add johnefemer/skillfish --skill data-analyst