
Senior Computer Vision

Object detection, image segmentation, CNN/ViT architectures, and deployment.


Real Conversations

See how the Senior Computer Vision skill responds to actual requests. These are representative examples of what your AI agent will produce.

You

Building a retail shelf audit system — need to detect ~80 product SKUs in real time on a store tablet (Snapdragon 8cx). Should I use YOLOv8 or DETR?

Agent

Recommendation: YOLOv8n or YOLOv8s — not DETR

DETR is a transformer-based detector built for accuracy and an end-to-end design, not for low latency. On the Snapdragon 8cx's CPU, expect DETR inference on the order of 500 ms to 2 s per frame (a rough estimate that depends on backbone and input size), which is unusable for real time.

YOLOv8 comparison for your constraint:

Model	mAP50-95 (COCO val)	Latency (CPU, approx.)	Size (weights)
YOLOv8n	37.3	~45ms	6.3MB
YOLOv8s	44.9	~90ms	22MB
YOLOv8m	50.2	~200ms	52MB

For 80 SKUs on-device, start with YOLOv8s — good accuracy/speed balance.
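Whether a model counts as "real time" depends on the frame rate you actually need. A minimal sketch that turns a target FPS into a per-frame budget and picks the most accurate model from the table that fits, assuming the table's approximate CPU latencies hold on the tablet (they may not) and ignoring pre- and post-processing time:

```python
# Approximate figures from the comparison table: (name, COCO mAP50-95, CPU latency ms).
# Re-measure latency on the actual Snapdragon 8cx device before relying on these.
MODELS = [
    ("YOLOv8n", 37.3, 45.0),
    ("YOLOv8s", 44.9, 90.0),
    ("YOLOv8m", 50.2, 200.0),
]

def pick_model(target_fps: float):
    """Return the most accurate model whose latency fits the per-frame budget, or None."""
    budget_ms = 1000.0 / target_fps
    fitting = [m for m in MODELS if m[2] <= budget_ms]
    if not fitting:
        return None
    return max(fitting, key=lambda m: m[1])

for fps in (5, 10, 20, 30):
    print(fps, "FPS ->", pick_model(fps))
```

For a shelf audit, a handful of frames per second is often enough, which is why YOLOv8s is a reasonable default here.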

Fine-tuning pipeline for your 80 SKUs:

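from ultralytics import YOLO

model = YOLO('yolov8s.pt')  # start from COCO-pretrained weights
results = model.train(
    data='retail_skus.yaml',  # your dataset config: image folders + the 80 SKU class names
    epochs=100,
    imgsz=640,
    batch=16,
    # Training augmentation is set per hyperparameter; `augment=True` only affects prediction (TTA)
    mosaic=1.0,    # mosaic (on by default)
    mixup=0.1,     # mixup (off by default)
    hsv_v=0.5,     # stronger brightness jitter, important for retail lighting variance
    degrees=10.0,  # small random rotations (default 0)
)
# Export the best trained weights for on-device inference
model.export(format='onnx', opset=12, simplify=True)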
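The training call expects a dataset config named retail_skus.yaml. A minimal sketch that writes one in the Ultralytics dataset format (root path, train/val image folders, and an id-to-name map); the folder layout and the sku_000 to sku_079 names are hypothetical placeholders:

```python
import json
from pathlib import Path

def write_dataset_yaml(out_path, dataset_root, class_names):
    """Write an Ultralytics-style dataset YAML (root, split dirs, id -> name map)."""
    lines = [
        f"path: {dataset_root}",
        "train: images/train",  # relative to path; hypothetical folder layout
        "val: images/val",
        "names:",
    ]
    # Ids must match the integer class ids in the YOLO label .txt files;
    # json.dumps gives a double-quoted string that YAML also accepts.
    lines += [f"  {i}: {json.dumps(name)}" for i, name in enumerate(class_names)]
    Path(out_path).write_text("\n".join(lines) + "\n")
    return lines

# Hypothetical placeholder names; replace with the real SKU catalogue
skus = [f"sku_{i:03d}" for i in range(80)]
lines = write_dataset_yaml("retail_skus.yaml", "datasets/retail_skus", skus)
```

Quoting the names keeps SKU strings containing colons or other YAML-significant characters from breaking the file.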

Data requirements: as a rough starting point, ~200-500 images per SKU, captured under store lighting conditions. Use Label Studio for annotation. Augment heavily for rotation and partial occlusion, since products are often partly blocked by other items.
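To check the per-SKU target once labels are exported, a minimal sketch that counts how many images contain each class, assuming one YOLO-format .txt label file per image (the format Ultralytics trains on); the 200-image threshold mirrors the lower end of the range above:

```python
from collections import Counter
from pathlib import Path

def images_per_class(label_dir):
    """Count how many images contain each class id, from YOLO-format label files.

    Each non-empty line is: class_id x_center y_center width height (normalized).
    """
    counts = Counter()
    for txt in Path(label_dir).glob("*.txt"):
        ids = {int(line.split()[0]) for line in txt.read_text().splitlines() if line.strip()}
        counts.update(ids)  # a set, so each image counts once per class
    return counts

def under_represented(counts, num_classes, minimum=200):
    """Class ids seen in fewer than `minimum` images (classes never seen included)."""
    return [c for c in range(num_classes) if counts.get(c, 0) < minimum]
```

Running under_represented(images_per_class("labels/train"), 80) on a hypothetical labels/train folder would list the SKUs that need more photos before training.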

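On-device runtime: ONNX Runtime with the XNNPACK execution provider can be noticeably faster than the default CPU provider on Snapdragon; treat a ~2x speedup as a rough expectation rather than a guarantee, and benchmark both on the target tablet.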
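A minimal latency harness for that comparison, written against any zero-argument callable so it runs without a model; the warmup and iteration counts are arbitrary choices:

```python
import statistics
import time

def benchmark_ms(run_once, warmup=10, iters=50):
    """Median and p90 latency in milliseconds for a zero-argument callable."""
    for _ in range(warmup):  # let caches, thread pools and lazy initialization settle
        run_once()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p90 = samples[min(len(samples) - 1, int(0.9 * len(samples)))]
    return statistics.median(samples), p90
```

On the tablet, run_once could wrap a single session.run call on an onnxruntime.InferenceSession created with providers=["XnnpackExecutionProvider", "CPUExecutionProvider"], assuming your ONNX Runtime build includes the XNNPACK provider (onnxruntime.get_available_providers() lists what is available). Compare it against a session created with only "CPUExecutionProvider".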

You

My ResNet50 + FPN defect detection model gets mAP50 of 0.41 on validation but only 0.18 on the test set from the factory floor. Training on 2,000 labeled images.

Agent

A drop from 0.41 to 0.18 mAP50 between validation and factory-floor test data most likely points to distribution shift, not a capacity problem.

Diagnose first:

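from pathlib import Path
from PIL import Image
import numpy as np

# Hypothetical layout: point these globs at your own train/test image folders
train_paths = sorted(Path('data/train').glob('*.jpg'))
test_paths = sorted(Path('data/test').glob('*.jpg'))

# Compare image statistics between train and test sets (grayscale mean, 0-255)
train_means = [np.asarray(Image.open(p).convert('L')).mean() for p in train_paths[:200]]
test_means = [np.asarray(Image.open(p).convert('L')).mean() for p in test_paths[:200]]
print(f"Train brightness: {np.mean(train_means):.1f} | Test: {np.mean(test_means):.1f}")
# Large gap here = lighting condition mismatch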
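Two sets can share an average brightness while differing in spread (for example, a production line that alternates between bright and dim shifts). A minimal sketch of the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical CDFs, which can be applied to the per-image brightness lists train_means and test_means:

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # The supremum of |F_a - F_b| is reached at one of the observed values
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

Values near 0 mean the brightness distributions largely overlap; values near 1 mean they barely overlap at all. If SciPy is available, scipy.stats.ks_2samp computes the same statistic along with a p-value.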

Common factory floor distribution shifts:

Lighting — training images from controlled setup, test from production line with varying ambient light
Camera angle/distance — slight mount variation changes defect apparent size
Product variation — new batch of material looks slightly different
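The camera-angle/distance shift can be checked from labels the test set already has. A minimal sketch comparing the median apparent defect size (square root of box area, in pixels) between train and test, assuming boxes are available as (x1, y1, x2, y2) tuples:

```python
import statistics

def median_box_side(boxes):
    """Median sqrt(area) in pixels for boxes given as (x1, y1, x2, y2)."""
    return statistics.median(((x2 - x1) * (y2 - y1)) ** 0.5 for x1, y1, x2, y2 in boxes)

def apparent_scale_ratio(train_boxes, test_boxes):
    """Ratio > 1 means defects look larger on the test line than in training."""
    return median_box_side(test_boxes) / median_box_side(train_boxes)
```

If the ratio falls well outside the scale range used for augmentation (0.9 to 1.1 in the RandomAffine call of Fix 1), widening that range is a reasonable next step.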

Fix 1 — Aggressive augmentation during training:

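from torchvision.transforms import v2

# transforms.v2 moves tv_tensors.BoundingBoxes together with the image (torchvision 0.16+);
# the v1 RandomAffine would shift pixels but leave the box labels misaligned.
train_transforms = v2.Compose([
    v2.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.3, hue=0.1),
    v2.RandomAffine(degrees=5, translate=(0.05, 0.05), scale=(0.9, 1.1)),
    v2.GaussianBlur(kernel_size=3, sigma=(0.1, 2.0)),
])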

Fix 2 — Collect 100-200 labeled images from the actual test environment and add them to the training set. This alone often closes a large share of the gap, though how much depends on how different the two environments are.
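Those images are most useful when they cover the range of production conditions rather than a single session. A minimal sketch that spreads the labeling budget evenly across capture groups, assuming each factory image can be tagged with a group key such as shift or camera (the grouping is a hypothetical choice, not something the original setup defines):

```python
import random
from collections import defaultdict

def spread_sample(image_groups, n_total, seed=0):
    """Pick up to n_total images spread evenly across capture groups.

    image_groups: list of (image_path, group_key) pairs, e.g. group = shift or camera id.
    Small groups contribute all their images, so the total can fall below n_total.
    """
    by_group = defaultdict(list)
    for path, group in image_groups:
        by_group[group].append(path)
    if not by_group:
        return []
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    per_group = max(1, n_total // len(by_group))
    picked = []
    for group in sorted(by_group):
        paths = by_group[group]
        picked += rng.sample(paths, min(per_group, len(paths)))
    return picked
```

Keep the selected images out of the held-out factory test set so the test mAP remains an honest estimate.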

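Fix 3 — If you can't relabel: try test-time augmentation (TTA). Run the model on several augmented versions of each test image (for example 4), map the predicted boxes back to the original image coordinates, and merge them. Gains are often a few mAP points (around +3-5 is a typical hope, not a guarantee), at the cost of one extra forward pass per augmented view.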
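A minimal sketch of the map-back-then-merge step for one view, a horizontal flip, assuming a single defect class and boxes as (x1, y1, x2, y2, score) tuples. It merges with greedy NMS, keeping the higher-scoring duplicate; weighted box fusion, which averages overlapping boxes instead, is a common alternative. More views (scales, vertical flips) follow the same pattern:

```python
def iou(a, b):
    """Intersection-over-union of two boxes; only the first four fields are used."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def unflip(boxes, image_width):
    """Map boxes predicted on a horizontally flipped image back to original coordinates."""
    return [(image_width - x2, y1, image_width - x1, y2, score)
            for x1, y1, x2, y2, score in boxes]

def nms(boxes, iou_thr=0.5):
    """Greedy non-maximum suppression: keep the best box, drop overlapping lower ones."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_thr for k in kept):
            kept.append(box)
    return kept

def flip_tta(preds_original, preds_flipped, image_width, iou_thr=0.5):
    """Merge detections from the original image and its horizontal flip."""
    return nms(preds_original + unflip(preds_flipped, image_width), iou_thr)
```

A flip is only a safe view if defects look the same when mirrored, which holds for most surface defects but not, for example, for defects defined relative to a part's orientation.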

Get Started

How to install and use this skill in your preferred environment.

Skills are designed for AI coding agents (Claude Code, Cursor, Windsurf) and IDE-based workflows where the agent can read files, run scripts, and act on your codebase.

Models & Context

Which AI models and context windows work best with this skill.

Recommended Models

Best

Claude Opus 4.8 Claude Sonnet 4 GPT-4.1 Gemini 2.5 Pro Grok 3 Kimi K2

Good

Claude Haiku 4.5 GPT-4.1 mini Gemini 2.5 Flash Grok 3 mini

Larger models tend to produce more detailed, production-ready output.

Context Window

This skill's SKILL.md is typically 3–10 KB — fits in any modern context window.

8K: skill only

32K+: skill + conversation

100K+: skill + references + codebase

All current frontier models support 100K+ context.

Pro tips for best results

1. Be specific: include numbers (users, budget, RPS) so the skill can size the architecture.

2. Share constraints: compliance needs, team size, and existing stack all improve the output.

3. Iterate: start with a high-level design, then ask follow-ups for IaC, cost analysis, or security review.

4. Combine skills: pair with companion skills for end-to-end coverage.
