Fireworks AI
High-speed inference for open-source LLMs with function calling and structured outputs.
From $0.20/MTok · Free tier available
About
Fireworks AI delivers production-grade inference for open-source models such as Llama 3 and Mixtral with sub-200ms latency. It supports function calling, structured JSON output, vision models, and both serverless and dedicated-GPU deployments.
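Because the API is OpenAI-compatible, a standard chat-completions request body works as-is; only the base URL and model name change. A minimal sketch of building such a request with a structured-output constraint (the base URL is the commonly documented Fireworks endpoint and the model name is a placeholder, so verify both against the current Fireworks docs):

```python
import json

# Assumed OpenAI-compatible endpoint -- confirm against Fireworks docs.
BASE_URL = "https://api.fireworks.ai/inference/v1"

def build_chat_request(prompt: str,
                       model: str = "accounts/fireworks/models/llama-v3-8b-instruct"):
    """Build the JSON body for a chat completion call (not sent here)."""
    return {
        "model": model,  # placeholder model identifier
        "messages": [{"role": "user", "content": prompt}],
        # Structured output: constrain the reply to valid JSON.
        "response_format": {"type": "json_object"},
        "max_tokens": 256,
    }

body = build_chat_request("List three primary colors as a JSON array.")
print(json.dumps(body, indent=2))
```

Sending the body is then a normal `POST` to `{BASE_URL}/chat/completions` with a bearer-token `Authorization` header, exactly as with the OpenAI API.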
Features
Sub-200ms inference latency
LLaMA 3, Mixtral, Gemma, Qwen models
Function calling & structured output
OpenAI-compatible API
Serverless & dedicated GPU options
Vision & multimodal models
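The function-calling feature above follows the OpenAI tool-definition schema: tools are declared as JSON Schema objects and the model decides when to invoke one. A sketch with an illustrative tool (the function name, parameters, and model identifier are assumptions, not from Fireworks docs):

```python
# Illustrative tool (function) definition in the OpenAI-compatible format.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function name
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Request body that offers the tool to the model.
request_body = {
    "model": "accounts/fireworks/models/llama-v3-8b-instruct",  # placeholder
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
```

When the model elects to call the tool, the response carries a `tool_calls` entry with the function name and JSON arguments; the client executes the function and feeds the result back as a `tool`-role message.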
Specifications
| Specification | Details |
| --- | --- |
| Context Window | 128K tokens |
| Tool Use | Yes |
| Vision | Yes |
| Streaming | Yes |
| Open Source | |
| Self-Host | |
| Starting Price | $0.20/MTok input |
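Per-MTok pricing means cost scales linearly with token count. A quick worked example at the listed $0.20/MTok input rate (output-token pricing varies by model and is not shown here):

```python
def input_cost_usd(input_tokens: int, rate_per_mtok: float = 0.20) -> float:
    """Cost in USD of input tokens at a per-million-token rate."""
    return input_tokens / 1_000_000 * rate_per_mtok

# 2.5M input tokens at $0.20/MTok costs $0.50.
print(round(input_cost_usd(2_500_000), 2))  # -> 0.5
```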