Engineering · Advanced · LLM · Fine-tuning · Model Selection

LLM Architect

LLM system design: model selection, fine-tuning strategies, inference optimization, context window management, multi-model routing, and cost analysis.

A deep systems skill for teams building serious LLM infrastructure. It covers model selection trade-offs across GPT-4o, Claude, Gemini, Llama, and Mistral; fine-tuning strategies (LoRA, QLoRA, RLHF, DPO); inference optimization with vLLM and TGI; context window management with sliding windows and KV cache tuning; and multi-model routing for cost-performance optimization.

Added Mar 20, 2026

$ npx skills add johnefemer/skillfish --skill llm-architect

What This Skill Can Do

Concrete capabilities you get when you install this skill.

Select the right model for latency, cost, and capability requirements

Design fine-tuning pipelines using LoRA, QLoRA, or DPO on custom datasets

Optimize inference throughput with vLLM, TGI, or llama.cpp configurations

Architect multi-model routing with fallback, load balancing, and A/B testing

Manage context windows: chunking strategies, KV cache, and prompt compression

Analyze cost-performance trade-offs across model families and hosting options
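The multi-model routing capability above can be sketched as a cost-aware router with capability escalation. A minimal sketch; the model names, prices, and context limits here are illustrative placeholders, not real endpoints:

```python
# Toy multi-model router: pick the cheapest model that satisfies the
# request. All model names and prices below are illustrative.
from dataclasses import dataclass

@dataclass
class ModelConfig:
    name: str
    cost_per_1k_input: float   # USD; placeholder pricing
    max_context: int           # tokens
    strong_reasoning: bool

# Ordered cheapest-first, so the first viable match is the cheapest.
ROUTES = [
    ModelConfig("small-fast-model", 0.00015, 128_000, strong_reasoning=False),
    ModelConfig("large-reasoning-model", 0.0025, 200_000, strong_reasoning=True),
]

def route(prompt_tokens: int, needs_reasoning: bool) -> ModelConfig:
    """Return the cheapest model that fits the prompt and capability need."""
    for cfg in ROUTES:
        if prompt_tokens > cfg.max_context:
            continue  # prompt does not fit this model's window
        if needs_reasoning and not cfg.strong_reasoning:
            continue  # escalate reasoning-heavy requests to a larger model
        return cfg
    raise ValueError("no configured model can serve this request")
```

A production router would layer retries, fallback on provider errors, and A/B bucketing on top of this selection step, but the cheapest-first ordering is the core of the cost-performance trade-off.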

Real Conversations

See how the LLM Architect skill responds to actual requests. These are representative examples of what your AI agent will produce.

You

We have 50,000 internal support docs and want our LLM to answer questions accurately. Should we fine-tune or use RAG?

Agent

For document Q&A, RAG beats fine-tuning almost every time. Here's why:

Fine-tuning limitations for your case:

  • Fine-tuning teaches style and behavior, not facts. Even after training on your 50k docs, the model will still hallucinate confidently.
  • Your docs change. Retraining costs $500-2000/run and takes days.
  • You lose attribution — users can't see which document answered their question.

RAG advantages:

  • Ground answers in retrieved chunks — hallucination drops dramatically
  • Update your knowledge base without retraining: add/remove documents instantly
  • Native citations: return the source doc URL with every answer
  • Cost: embeddings for 50k docs ≈ $2 one-time with text-embedding-3-small
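That ≈$2 figure holds up on a quick back-of-envelope check, assuming roughly 2,000 tokens per doc and text-embedding-3-small at its listed $0.02 per 1M tokens:

```python
# Back-of-envelope check on the embedding cost estimate above.
docs = 50_000
tokens_per_doc = 2_000        # assumption; measure your real corpus
price_per_million = 0.02      # USD per 1M tokens, text-embedding-3-small

total_tokens = docs * tokens_per_doc              # 100M tokens
cost = total_tokens / 1_000_000 * price_per_million
print(f"${cost:.2f}")  # → $2.00
```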

When fine-tuning DOES make sense:

  • You need a specific output format or tone the base model doesn't naturally produce
  • You have <10k high-quality instruction pairs with clear input→output mapping
  • Latency is critical and you want to bake domain vocabulary into weights
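When fine-tuning does apply, LoRA keeps it cheap because a rank-r adapter on a d_out × d_in weight matrix trains only r·(d_in + d_out) parameters instead of d_in·d_out. A quick illustration of the ratio, with a hidden size typical of 7B-class models (the exact dimension is an assumption):

```python
# Trainable-parameter count for a LoRA adapter vs. full fine-tuning
# of one projection matrix. Hidden size is illustrative.
def lora_params(d_in: int, d_out: int, r: int) -> int:
    # LoRA factorizes the update as B @ A: A is (r x d_in), B is (d_out x r)
    return r * (d_in + d_out)

d = 4096                        # typical hidden size for a 7B model
full = d * d                    # ~16.8M params per projection matrix
lora = lora_params(d, d, r=8)   # 65,536 params, ~0.4% of full
print(full, lora, f"{lora / full:.2%}")
```

Multiply across the attention projections and layers and the same ratio holds, which is why a LoRA run fits on a single GPU while full fine-tuning often does not.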

Recommendation: Build RAG first with pgvector or Pinecone. If retrieval quality plateaus below 85% on your eval set, then consider fine-tuning the retriever or reranker — not the generator.
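The retrieve-then-cite loop recommended above can be sketched in a few lines. This uses toy in-memory vectors; the two-document index and hand-written embeddings stand in for a real embedding API plus pgvector or Pinecone:

```python
# Minimal retrieval sketch: rank docs by cosine similarity and return
# the source URL with each hit, so every answer ships a citation.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# doc_id -> (embedding, source URL); toy data in place of 50k real docs
INDEX = {
    "reset-password": ([0.9, 0.1, 0.0], "https://docs.example.com/reset"),
    "billing-cycle":  ([0.1, 0.9, 0.2], "https://docs.example.com/billing"),
}

def retrieve(query_vec, k=1):
    """Return the top-k (doc_id, url) pairs for a query embedding."""
    ranked = sorted(INDEX.items(),
                    key=lambda kv: cosine(query_vec, kv[1][0]),
                    reverse=True)
    return [(doc_id, url) for doc_id, (_, url) in ranked[:k]]

hits = retrieve([0.8, 0.2, 0.1])  # query embedding near "reset-password"
```

In production the sort becomes an approximate-nearest-neighbor index query, but the contract is the same: chunks plus source URLs go to the generator, and the URLs become the citations.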

Get Started

How to install and use this skill in your preferred environment.

Skills are designed for AI coding agents (Claude Code, Cursor, Windsurf) and IDE-based workflows where the agent can read files, run scripts, and act on your codebase.

Models & Context

Which AI models and context windows work best with this skill.

Recommended Models

Requires a strong reasoning model — Claude Opus, GPT-4o, or Gemini 2.5 Pro. Architecture decisions need multi-step trade-off analysis that smaller models handle poorly.

Context Window

SKILL.md is ~12KB. Use 100K+ context for sessions covering fine-tuning datasets, inference configs, and eval results simultaneously.

Pro tips for best results

1. Be specific: include numbers — users, budget, RPS — so the skill can size the architecture.

2. Share constraints: compliance needs, team size, and existing stack all improve the output.

3. Iterate: start with a high-level design, then ask follow-ups for IaC, cost analysis, or security review.

4. Combine skills: pair with the companion skills below for end-to-end coverage.

Works Great With

These skills complement LLM Architect for end-to-end coverage. Install them together for better results.

$ skillfish add johnefemer/skillfish --all # install all skills at once

Ready to try LLM Architect?

Install the skill and start getting expert-level guidance in your workflow — any agent, any IDE.

$ npx skills add johnefemer/skillfish --skill llm-architect