RAG Architect
RAG pipeline builder, chunking optimizer, and retrieval evaluator.
$skillfish add ~skills/rag-architect

Real Conversations
See how the RAG Architect skill responds to actual requests. These are representative examples of what your AI agent will produce.
We want to build an internal knowledge base chatbot over our Confluence wiki — about 50,000 pages. Engineers should be able to ask questions about our architecture, runbooks, and processes. We're using Python. What's the best architecture for this?
At 50k pages, the architecture decisions that matter most are chunking strategy, vector store selection, and hybrid retrieval.
Recommended stack:
- Chunking: Hierarchical (page title + section heading + content block)
- Embedding model: text-embedding-3-small (1536 dim, $0.02/1M tokens; ~$0.50 to embed all pages once)
- Vector store: pgvector on PostgreSQL (you likely already have Postgres, so no new infra; a schema sketch follows this list)
- Retrieval: Hybrid (dense + BM25 sparse) with cross-encoder reranking
- LLM: Claude 3.5 Sonnet or GPT-4o for generation
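pgvector needs only the extension and one table. A minimal setup sketch, assuming psycopg 3 and pgvector 0.5+ on the server; the `wiki_chunks` table name and DSN are illustrative:

```python
# Minimal pgvector setup sketch. Assumes psycopg 3 and pgvector >= 0.5;
# the table name `wiki_chunks` and the DSN are illustrative choices.
import psycopg

with psycopg.connect("dbname=kb") as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS wiki_chunks (
            id bigserial PRIMARY KEY,
            text text NOT NULL,
            metadata jsonb NOT NULL,
            embedding vector(1536)  -- text-embedding-3-small dimension
        )
    """)
    # HNSW index for fast approximate nearest-neighbor search
    conn.execute("""
        CREATE INDEX IF NOT EXISTS wiki_chunks_embedding_idx
        ON wiki_chunks USING hnsw (embedding vector_cosine_ops)
    """)
```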
Chunking for Confluence pages:
```python
def chunk_confluence_page(page: dict) -> list[dict]:
    chunks = []
    # Always include page title + space in every chunk for context
    page_context = f"Page: {page['title']} | Space: {page['space']}"
    # Split on H2/H3 headings (Confluence structure); helper sketched below
    sections = split_by_headings(page['body_storage'], max_tokens=400)
    for section in sections:
        chunks.append({
            'text': f"{page_context}\n{section['heading']}\n{section['content']}",
            'metadata': {
                'page_id': page['id'],
                'title': page['title'],
                'space': page['space'],
                'url': page['_links']['webui'],
                'last_modified': page['version']['when'],
            }
        })
    return chunks
```
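split_by_headings is a helper you would supply. One rough sketch, assuming Confluence storage-format HTML with headings at the top level of the body, parsed with BeautifulSoup, and tiktoken for token counts; all of these are assumptions, not requirements of the stack:

```python
# Rough sketch of the split_by_headings helper used above. Assumes
# headings sit at the top level of the storage-format body; adjust the
# parsing to your export format.
import tiktoken
from bs4 import BeautifulSoup

enc = tiktoken.get_encoding("cl100k_base")

def split_by_headings(html: str, max_tokens: int = 400) -> list[dict]:
    soup = BeautifulSoup(html, 'html.parser')
    sections: list[dict] = []
    heading = ''
    buf: list[str] = []

    def flush() -> None:
        text = ' '.join(buf).strip()
        if text:
            # Hard truncation keeps the sketch short; production code would
            # split oversized sections on paragraph boundaries instead
            tokens = enc.encode(text)[:max_tokens]
            sections.append({'heading': heading, 'content': enc.decode(tokens)})
        buf.clear()

    for el in soup.find_all(recursive=False):  # top-level elements only
        if el.name in ('h2', 'h3'):
            flush()
            heading = el.get_text(strip=True)
        else:
            buf.append(el.get_text(' ', strip=True))
    flush()
    return sections
```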
Hybrid retrieval with RRF fusion:
```python
def hybrid_search(query: str, k: int = 10):
    # Dense: semantic similarity (embed the query with the same model
    # used at index time)
    query_embedding = embed_query(query)
    dense_results = pgvector_search(query_embedding, k=20)
    # Sparse: keyword matching
    sparse_results = bm25_search(query, k=20)
    # Fuse with Reciprocal Rank Fusion (sketched below)
    return reciprocal_rank_fusion(dense_results, sparse_results, k=k)
```
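reciprocal_rank_fusion takes only a few lines. A sketch, assuming each hit is a dict carrying a stable 'id' key (e.g. the pgvector row id):

```python
# Reciprocal Rank Fusion sketch. Assumes each result list is ordered
# best-first and each hit dict carries a stable 'id' key.
def reciprocal_rank_fusion(*result_lists: list[dict], k: int = 10) -> list[dict]:
    RRF_C = 60  # standard smoothing constant from the original RRF paper
    scores: dict = {}
    hits: dict = {}
    for results in result_lists:
        for rank, hit in enumerate(results, start=1):
            scores[hit['id']] = scores.get(hit['id'], 0.0) + 1.0 / (RRF_C + rank)
            hits[hit['id']] = hit
    best = sorted(scores, key=scores.get, reverse=True)[:k]
    return [hits[doc_id] for doc_id in best]
```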
Why hybrid matters here: Engineers search for exact command names (kubectl rollout restart) and semantic concepts ('how to scale a service') — pure dense retrieval misses exact terms, pure BM25 misses semantic intent.
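The cross-encoder reranking step from the stack above fits in a few lines too. A sketch using sentence-transformers; the MiniLM model named here is one common lightweight choice, not a requirement:

```python
# Cross-encoder reranking sketch, assuming the sentence-transformers
# package; ms-marco-MiniLM-L-6-v2 is one common lightweight reranker.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

def rerank(query: str, candidates: list[dict], top_n: int = 5) -> list[dict]:
    # Score each (query, chunk) pair jointly; slower than embedding
    # similarity, but much better at final ordering
    scores = reranker.predict([(query, c['text']) for c in candidates])
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order[:top_n]]
```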
Estimated cost: ~$0.50 for initial embedding. $0.002/query at 10 chunks retrieved. At 1k queries/day: ~$60/month.
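That ~$0.50 figure comes from one batched embedding pass over the corpus. A minimal indexing sketch, assuming the OpenAI Python SDK, psycopg 3, and the wiki_chunks table from the schema sketch above:

```python
# Batched embedding + insert sketch. Assumes the OpenAI Python SDK and
# psycopg 3, plus the wiki_chunks table sketched earlier; the embeddings
# API accepts batched inputs, so one request covers many chunks.
from openai import OpenAI
from psycopg.types.json import Jsonb
import psycopg

client = OpenAI()

def index_chunks(conn: psycopg.Connection, chunks: list[dict],
                 batch_size: int = 512) -> None:
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i + batch_size]
        resp = client.embeddings.create(
            model='text-embedding-3-small',
            input=[c['text'] for c in batch],
        )
        for chunk, item in zip(batch, resp.data):
            # pgvector accepts the '[x,y,...]' text form cast to vector
            vec = '[' + ','.join(str(x) for x in item.embedding) + ']'
            conn.execute(
                "INSERT INTO wiki_chunks (text, metadata, embedding) "
                "VALUES (%s, %s, %s::vector)",
                (chunk['text'], Jsonb(chunk['metadata']), vec),
            )
        conn.commit()
```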
Get Started
How to install and use this skill in your preferred environment.
Skills are designed for AI coding agents (Claude Code, Cursor, Windsurf) and IDE-based workflows where the agent can read files, run scripts, and act on your codebase. Web-based AI can use the knowledge and frameworks, but won't have tool access.
Models & Context
Which AI models and context windows work best with this skill.
Recommended Models
Larger models produce more detailed, production-ready outputs.
Context Window
This skill's SKILL.md is typically 3–10 KB — fits in any modern context window.
All current frontier models (Claude, GPT, Gemini) support 100K+ context. Use the full window for complex multi-service work.
Pro tips for best results
Be specific
Include numbers — users, budget, RPS — so the skill can size the architecture.
Share constraints
Compliance needs, team size, and existing stack all improve the output.
Iterate
Start with a high-level design, then ask follow-ups for IaC, cost analysis, or security review.
Combine skills
Pair with companion skills below for end-to-end coverage.
Ready to try RAG Architect?
Install the skill and start getting expert-level guidance in your workflow — any agent, any IDE.
$skillfish add ~skills/rag-architect