Senior Computer Vision
Object detection, image segmentation, CNN/ViT architectures, and deployment.
$skillfish add ~skills/senior-computer-vision
Real Conversations
See how the Senior Computer Vision skill responds to actual requests. These are representative examples of what your AI agent will produce.
Building a retail shelf audit system — need to detect ~80 product SKUs in real time on a store tablet (Snapdragon 8cx). Should I use YOLOv8 or DETR?
Recommendation: YOLOv8n or YOLOv8s — not DETR
DETR is a transformer-based detector designed for accuracy, not latency. Running on the Snapdragon 8cx CPU without offloading to a dedicated NPU, expect DETR inference to take roughly 500 ms to 2 s per frame, which is unusable for real time.
YOLOv8 comparison for your constraint:
| Model | mAP50-95 (COCO val) | Latency (CPU) | Size |
|---|---|---|---|
| YOLOv8n | 37.3 | ~45ms | 6.3MB |
| YOLOv8s | 44.9 | ~90ms | 22MB |
| YOLOv8m | 50.2 | ~200ms | 52MB |
For 80 SKUs on-device, start with YOLOv8s — good accuracy/speed balance.
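To relate the latency column to a frame-rate requirement, a minimal sketch of the arithmetic follows. It assumes inference is the only per-frame cost (camera capture, pre-processing, and NMS are ignored), and the 10 FPS target is a hypothetical requirement, not a figure from the request.

```python
# Approximate CPU latencies copied from the comparison table (milliseconds).
LATENCY_MS = {"YOLOv8n": 45, "YOLOv8s": 90, "YOLOv8m": 200}


def frames_per_second(latency_ms: float) -> float:
    """Upper bound on throughput if inference is the only per-frame cost."""
    return 1000.0 / latency_ms


def models_meeting_target(target_fps: float) -> list[str]:
    """Return the models whose estimated throughput reaches target_fps."""
    return [
        name
        for name, ms in LATENCY_MS.items()
        if frames_per_second(ms) >= target_fps
    ]


if __name__ == "__main__":
    for name, ms in LATENCY_MS.items():
        print(f"{name}: ~{frames_per_second(ms):.1f} FPS")
    # Hypothetical target: 10 FPS is assumed here purely for illustration.
    print(models_meeting_target(target_fps=10.0))
```

Real throughput on the tablet will be lower than these bounds, so measuring on the target device remains the deciding step.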
Fine-tuning pipeline for your 80 SKUs:
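from ultralytics import YOLO

model = YOLO('yolov8s.pt')  # start from COCO pretrained weights
results = model.train(
    data='retail_skus.yaml',
    epochs=100,
    imgsz=640,
    batch=16,
    # Training augmentation is set per hyperparameter; the `augment` flag
    # applies to prediction (test-time augmentation), not to training.
    mosaic=1.0,    # mosaic augmentation
    mixup=0.1,     # mixup is off by default
    hsv_v=0.4,     # brightness jitter, critical for retail lighting variance
    degrees=10.0,  # small rotations
)

# Export for on-device inference
model.export(format='onnx', opset=12, simplify=True)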
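The `data='retail_skus.yaml'` argument in the training call points at a dataset config. As a rough sketch, the helper below renders one in the layout Ultralytics detection datasets use (`path`, `train`, `val`, `names`). The directory names and the `sku_000`-style class names are placeholders, and `build_dataset_yaml` is a hypothetical helper, not part of the Ultralytics API.

```python
def build_dataset_yaml(root: str, class_names: list[str]) -> str:
    """Render a detection dataset config as YAML text.

    Class names are written unquoted, so this sketch assumes they contain
    no characters that are special in YAML (colons, '#', leading dashes).
    """
    lines = [
        f"path: {root}",        # dataset root directory
        "train: images/train",  # training images, relative to path
        "val: images/val",      # validation images, relative to path
        "names:",               # class index -> class name
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder SKU names; a real list would come from the product catalog.
    skus = [f"sku_{i:03d}" for i in range(80)]
    print(build_dataset_yaml("datasets/retail_skus", skus))
```

Saving the returned text as `retail_skus.yaml` gives the training call a file to load, provided the image and label folders actually exist under the chosen root.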
Data requirements: ~200-500 images per SKU, captured under store lighting conditions. Use Label Studio for annotation. Augment heavily for rotation and partial occlusion, since products are often partly blocked by neighboring items.
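One way to check the per-SKU image target before training is to count how many annotated images contain each class. The sketch below assumes labels are exported in YOLO text format (one `.txt` file per image, each line starting with the class id); the function names and the default threshold of 200 are illustrative choices.

```python
from collections import Counter
from pathlib import Path


def images_per_class(label_dir: str) -> Counter:
    """Count how many label files (images) contain each class id at least once."""
    counts: Counter = Counter()
    for label_file in Path(label_dir).glob("*.txt"):
        # A set, so several boxes of one class in an image count once.
        class_ids = {
            int(line.split()[0])
            for line in label_file.read_text().splitlines()
            if line.strip()
        }
        counts.update(class_ids)
    return counts


def underrepresented(counts: Counter, num_classes: int, minimum: int = 200) -> list[int]:
    """Class ids with fewer than `minimum` images, including classes never seen."""
    return [c for c in range(num_classes) if counts[c] < minimum]
```

Calling `underrepresented(images_per_class("labels/train"), num_classes=80)` would list the SKUs that still need more photos; the `labels/train` path is an assumed location.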
On-device runtime: ONNX Runtime with the XNNPACK execution provider can give roughly a 2x speedup over the default CPU provider on Snapdragon. Benchmark on the target tablet to confirm.
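A minimal sketch of loading the exported model with ONNX Runtime, preferring XNNPACK when available, might look like the following. It assumes the `onnxruntime` package is installed and that the build in use was compiled with the XNNPACK execution provider (many prebuilt desktop wheels are not, in which case the code falls back to the default CPU provider). `pick_providers` and `make_session` are hypothetical helper names.

```python
# Preferred order: XNNPACK first, default CPU provider as fallback.
PREFERRED_PROVIDERS = ["XnnpackExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available: list[str]) -> list[str]:
    """Keep the preferred providers that this ONNX Runtime build offers."""
    return [p for p in PREFERRED_PROVIDERS if p in available]


def make_session(model_path: str):
    """Create an inference session for the exported ONNX model."""
    import onnxruntime as ort  # imported lazily so the helper above has no dependency

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

After `session = make_session("yolov8s.onnx")` (an assumed file name), `session.get_providers()` reports which provider was actually selected, which is worth logging before comparing latencies.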
Get Started
How to install and use this skill in your preferred environment.
Skills are designed for AI coding agents (Claude Code, Cursor, Windsurf) and IDE-based workflows where the agent can read files, run scripts, and act on your codebase. Web-based AI can use the knowledge and frameworks, but won't have tool access.
Models & Context
Which AI models and context windows work best with this skill.
Recommended Models
Larger models produce more detailed, production-ready outputs.
Context Window
This skill's SKILL.md is typically 3–10 KB — fits in any modern context window.
All current frontier models (Claude, GPT, Gemini) support 100K+ context. Use the full window for complex multi-service work.
Pro tips for best results
Be specific
Include numbers — users, budget, RPS — so the skill can size the architecture.
Share constraints
Compliance needs, team size, and existing stack all improve the output.
Iterate
Start with a high-level design, then ask follow-ups for IaC, cost analysis, or security review.
Combine skills
Pair with companion skills below for end-to-end coverage.