Senior Data Engineer
Data pipelines, ETL/ELT systems, Spark, Airflow, dbt, and Kafka.
$skillfish add ~skills/senior-data-engineer

Real Conversations
See how the Senior Data Engineer skill responds to actual requests. These are representative examples of what your AI agent will produce.
We run dbt on 200+ models in BigQuery. Full refresh takes 4 hours. I want to set up Airflow to run only models changed since the last run plus their downstream dependencies.
Incremental dbt + Airflow Pattern:
Step 1 — Get changed models from git diff:
# dags/dbt_incremental_dag.py
from airflow.decorators import dag, task
from airflow.operators.bash import BashOperator
import subprocess

@task
def get_changed_models() -> str:
    result = subprocess.run(
        ['git', 'diff', '--name-only', 'HEAD~1', 'HEAD', '--', 'models/'],
        capture_output=True, text=True, cwd='/opt/dbt/project', check=True,
    )
    changed_files = [f for f in result.stdout.strip().split('\n') if f.endswith('.sql')]
    # Convert file paths to dbt selectors: models/staging/stg_orders.sql -> staging.stg_orders
    models = [f.replace('models/', '').removesuffix('.sql').replace('/', '.') for f in changed_files]
    # Trailing + includes each model's downstream dependents
    return ' '.join(f'{m}+' for m in models)

@dag(schedule='@hourly', catchup=False)
def dbt_incremental():
    changed = get_changed_models()
    dbt_run = BashOperator(
        task_id='dbt_run',
        # Pull the selector string from XCom at runtime via Jinja templating;
        # an f-string here would freeze the (unrendered) value at parse time.
        bash_command="dbt run --select {{ ti.xcom_pull(task_ids='get_changed_models') }} --profiles-dir /opt/dbt",
    )
    changed >> dbt_run

dbt_incremental()
The model_name+ selector (trailing +) runs the model AND all of its downstream dependents, which is what you want for correctness. Note the direction: a leading +, as in +model_name, selects upstream parents instead.
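To make the selector semantics concrete, here is a plain-Python sketch of the set that model_name+ picks: the model itself plus the transitive closure of its children. The graph below is a hypothetical three-model example, not dbt's actual implementation.

```python
from collections import deque

def select_downstream(graph: dict[str, list[str]], model: str) -> set[str]:
    """Return what a dbt 'model+' selector would pick: the model
    itself plus every transitive downstream dependent."""
    selected, queue = {model}, deque([model])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected

# Hypothetical dependency graph: parent -> children
graph = {
    "stg_orders": ["fct_orders"],
    "fct_orders": ["orders_daily", "orders_by_region"],
}
print(sorted(select_downstream(graph, "stg_orders")))
# ['fct_orders', 'orders_by_region', 'orders_daily', 'stg_orders']
```

Changing a staging model therefore rebuilds every mart that reads from it, which is why the trailing-+ form is the safe default for incremental runs.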
For a more robust setup, use the dbt Cloud API or dbt-airflow provider:
pip install astronomer-cosmos  # Astronomer's dbt-Airflow integration
Cosmos parses the dbt manifest and creates individual Airflow tasks per model — giving you granular retry, skip, and observability.
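A minimal Cosmos DAG declaration might look like the sketch below. Treat it as a configuration outline, not a drop-in file: the paths, profile name, and target name are placeholders for your project, and the exact constructor arguments should be checked against the Cosmos docs for the version you install.

```python
from cosmos import DbtDag, ProfileConfig, ProjectConfig

dbt_cosmos_dag = DbtDag(
    dag_id="dbt_incremental_cosmos",
    project_config=ProjectConfig("/opt/dbt/project"),   # placeholder path
    profile_config=ProfileConfig(
        profile_name="my_profile",                      # placeholder
        target_name="prod",                             # placeholder
        profiles_yml_filepath="/opt/dbt/profiles.yml",  # placeholder path
    ),
    schedule="@hourly",
    catchup=False,
)
```

Cosmos expands this into one Airflow task per dbt model at parse time, so a single failed model can be retried on its own instead of re-running the whole dbt invocation.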
Expected result: 4h full refresh → 20-40min for a typical PR touching 10-20 models.
Get Started
How to install and use this skill in your preferred environment.
Skills are designed for AI coding agents (Claude Code, Cursor, Windsurf) and IDE-based workflows where the agent can read files, run scripts, and act on your codebase. Web-based AI can use the knowledge and frameworks, but won't have tool access.
Models & Context
Which AI models and context windows work best with this skill.
Recommended Models
Larger models produce more detailed, production-ready outputs.
Context Window
This skill's SKILL.md is typically 3–10 KB — fits in any modern context window.
All current frontier models (Claude, GPT, Gemini) support 100K+ context. Use the full window for complex multi-service work.
Pro tips for best results
Be specific
Include numbers — users, budget, RPS — so the skill can size the architecture.
Share constraints
Compliance needs, team size, and existing stack all improve the output.
Iterate
Start with a high-level design, then ask follow-ups for IaC, cost analysis, or security review.
Combine skills
Pair with companion skills below for end-to-end coverage.
Ready to try Senior Data Engineer?
Install the skill and start getting expert-level guidance in your workflow — any agent, any IDE.
$skillfish add ~skills/senior-data-engineer