AI Engineer
Build production AI applications: LLM integration, function calling, streaming, embedding pipelines, guardrails, cost optimization, and evaluation.
A hands-on production AI skill that helps you ship LLM-powered features reliably. It covers the full stack: streaming completions, function/tool calling, retrieval-augmented generation, embedding pipelines, structured outputs, guardrails, and eval harnesses. Opinionated on cost optimization — knows when to use GPT-4o-mini vs Sonnet vs Gemini Flash, and how to cut token spend by 60%+ without sacrificing quality.
Added Mar 20, 2026
$ npx skills add johnefemer/skillfish --skill ai-engineer

What This Skill Can Do
Concrete capabilities you get when you install this skill.
Integrate LLMs with streaming, function calling, and structured JSON outputs
Build and optimize embedding pipelines for semantic search and RAG
Implement guardrails: input validation, output moderation, and hallucination detection
Design eval harnesses to measure accuracy, latency, and cost regressions
Optimize token usage and model selection to minimize cost per request
Debug LLM failures: prompt sensitivity, context window issues, and tool-call errors
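The embedding-pipeline capability above comes down to ranking documents by vector similarity. Here is a minimal in-memory sketch, assuming the embeddings have already been computed by whatever model you use; `cosineSimilarity`, `topK`, and the `Doc` type are hypothetical helpers for illustration, not part of any SDK:

```typescript
// Hypothetical in-memory semantic search over precomputed embeddings.
type Doc = { id: string; embedding: number[] };

// Cosine similarity: dot product of the vectors divided by the product of their norms.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k documents most similar to the query embedding.
function topK(query: number[], docs: Doc[], k: number): Doc[] {
  return [...docs]
    .sort((x, y) => cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, k);
}
```

In production you would swap the in-memory array for a vector store, but the ranking logic is the same.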
Real Conversations
See how the AI Engineer skill responds to actual requests. These are representative examples of what your AI agent will produce.
How do I implement streaming with function calling in the OpenAI API? I want to show partial text while also handling tool calls.
Streaming + function calling requires careful handling because tool calls arrive as delta chunks, not a complete object. Here's the pattern:
```typescript
import OpenAI from 'openai';

const client = new OpenAI();

// Placeholder dispatcher: replace with your real tool implementations.
async function callTool(name: string, args: unknown): Promise<unknown> {
  return { tool: name, args };
}

async function streamWithTools(userMessage: string) {
  const stream = await client.chat.completions.create({
    model: 'gpt-4o',
    stream: true,
    messages: [{ role: 'user', content: userMessage }],
    tools: [{
      type: 'function',
      function: {
        name: 'get_weather',
        parameters: { type: 'object', properties: { city: { type: 'string' } }, required: ['city'] },
      },
    }],
  });

  // Tool calls arrive in pieces: id and name on the first chunk, the JSON
  // arguments spread across later chunks. Accumulate until finish_reason fires.
  let toolCallAccumulator: { id: string; name: string; args: string } | null = null;

  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta;
    if (delta?.content) {
      process.stdout.write(delta.content); // stream partial text to the user
    }
    if (delta?.tool_calls?.[0]) {
      const tc = delta.tool_calls[0];
      if (tc.id) toolCallAccumulator = { id: tc.id, name: tc.function?.name ?? '', args: '' };
      if (tc.function?.arguments) toolCallAccumulator!.args += tc.function.arguments;
    }
    if (chunk.choices[0]?.finish_reason === 'tool_calls' && toolCallAccumulator) {
      const args = JSON.parse(toolCallAccumulator.args);
      const result = await callTool(toolCallAccumulator.name, args);
      console.log('\nTool result:', result);
    }
  }
}
```
Key gotchas: tool_calls[0].id only appears on the first chunk — accumulate arguments across subsequent chunks. Always check finish_reason === 'tool_calls' before dispatching.
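That accumulation logic can be isolated into a pure function so the delta-merging behavior is unit-testable without a live API. This is a sketch: `ToolCallDelta`, `AccumulatedCall`, and `accumulateToolCall` are hypothetical names that mirror, but are not part of, the OpenAI SDK:

```typescript
// Flattened view of one streamed tool-call delta chunk (hypothetical shape).
type ToolCallDelta = { id?: string; name?: string; args?: string };
// The fully merged tool call, ready to dispatch once the stream finishes.
type AccumulatedCall = { id: string; name: string; args: string };

// Merge one delta into the running accumulator: a chunk carrying an id starts
// a fresh call; any chunk carrying argument text appends to the JSON string.
function accumulateToolCall(
  acc: AccumulatedCall | null,
  delta: ToolCallDelta
): AccumulatedCall | null {
  if (delta.id) acc = { id: delta.id, name: delta.name ?? '', args: '' };
  if (acc && delta.args) acc = { ...acc, args: acc.args + delta.args };
  return acc;
}
```

Only when `finish_reason === 'tool_calls'` should the accumulated `args` string be parsed as JSON and dispatched.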
Get Started
How to install and use this skill in your preferred environment.
Skills are designed for AI coding agents (Claude Code, Cursor, Windsurf) and IDE-based workflows where the agent can read files, run scripts, and act on your codebase.
Models & Context
Which AI models and context windows work best with this skill.
Recommended Models
Best with Claude Sonnet 4, GPT-4o, or Gemini 2.5 Pro for complex pipeline design. GPT-4o-mini handles simpler integration questions.
Context Window
SKILL.md is ~10KB, so it fits comfortably in a 32K context window. For multi-service AI pipelines, 100K+ context is recommended to track tool schemas and eval results.
Pro tips for best results
Be specific
Include numbers — users, budget, RPS — so the skill can size the architecture.
Share constraints
Compliance needs, team size, and existing stack all improve the output.
Iterate
Start with a high-level design, then ask follow-ups for IaC, cost analysis, or security review.
Combine skills
Pair with companion skills below for end-to-end coverage.
Works Great With
These skills complement AI Engineer for end-to-end coverage. Install them together for better results.
Ready to try AI Engineer?
Install the skill and start getting expert-level guidance in your workflow — any agent, any IDE.
$ npx skills add johnefemer/skillfish --skill ai-engineer