Cerebras
World-class LLM inference speeds via wafer-scale AI chips, ideal for agentic workloads.
From $0.10/MTok
About
Cerebras delivers industry-leading inference throughput on its wafer-scale CS-3 AI chips, achieving speeds far beyond those of GPU-based systems. It serves LLaMA 3.1 70B and 405B models, making it well suited to latency-sensitive agentic pipelines.
Features
Wafer-scale chip for extreme speed
LLaMA 3.1 70B & 405B support
Ideal for agentic low-latency workloads
Simple REST API
Competitive token pricing
Streaming responses
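Since the API is OpenAI-compatible REST, a chat-completion call is a single JSON POST. Below is a minimal sketch; the base URL and the model id `llama3.1-70b` are assumptions here, so check the provider's current documentation before use.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint; verify against the provider docs.
API_URL = "https://api.cerebras.ai/v1/chat/completions"

def build_payload(prompt, model="llama3.1-70b", stream=True):
    """Build the JSON body for an (optionally streaming) chat completion."""
    return {
        "model": model,  # model id is an assumption; list models via the API
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # stream tokens as they are generated
    }

def send(payload, api_key):
    """POST the payload with bearer-token auth and return the response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    return urllib.request.urlopen(req)

payload = build_payload("Plan the next agent step in one sentence.")
# send(payload, os.environ["CEREBRAS_API_KEY"])  # uncomment with a real key
```

With `stream=True` the response arrives as server-sent events, which keeps time-to-first-token low for agent loops.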
Specifications
| Specification | Details |
| --- | --- |
| Context Window | 128K tokens |
| Tool Use | |
| Vision | |
| Streaming | |
| Open Source | |
| Self-Host | |
| Starting Price | $0.10/MTok input |