Groq
Ultra-fast LLM inference via custom LPU chips, running LLaMA, Mixtral, and Gemma.
Pricing: free tier available; paid usage from $0.05/MTok
About
Groq delivers very fast LLM inference using its proprietary Language Processing Unit (LPU) chips. It runs popular open-source models such as LLaMA 3, Mixtral, and Gemma behind a developer-friendly, OpenAI-compatible API, with a generous free tier.
Features
Industry-leading inference speed via custom LPU hardware
LLaMA 3, Mixtral, Gemma models
OpenAI-compatible API
Free tier with generous rate limits
Low-latency for agentic workflows
Streaming support
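Because the API is OpenAI-compatible, a request is just the familiar chat-completions payload pointed at Groq's base URL. A minimal sketch of building such a request (the model name is illustrative and may differ from Groq's current lineup; API-key handling is assumed to come from the environment):

```python
import json

# Groq mirrors the OpenAI chat-completions request format at this base URL.
GROQ_BASE_URL = "https://api.groq.com/openai/v1"

def build_chat_request(prompt, model="llama-3.1-8b-instant", stream=True):
    """Build an OpenAI-style chat completion payload for Groq.

    The model name here is an example; check Groq's model list for
    what is currently available.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # token-by-token streaming via server-sent events
    }

payload = build_chat_request("Summarize what an LPU is in one sentence.")
# This payload would be POSTed to f"{GROQ_BASE_URL}/chat/completions"
# with an "Authorization: Bearer <your API key>" header.
print(json.dumps(payload, indent=2))
```

Because the payload shape matches OpenAI's, existing OpenAI client libraries can typically be reused by overriding only the base URL and API key.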
Specifications
| Specification | Value |
| --- | --- |
| Context Window | 128K tokens |
| Tool Use | Yes |
| Vision | — |
| Streaming | Yes |
| Open Source | No (proprietary hardware, hosted service) |
| Self-Host | No |
| Starting Price | Free tier available |