A public API at 3,000 developer customers is a different threat model than an internal service — you need rate limiting that's fair, hard to game, and observable. Here's how to design this properly.
## Rate Limiting Strategy
Three common algorithms; pick based on your burst tolerance:

| Algorithm | Burst Handling | Implementation | Best For |
| --- | --- | --- | --- |
| Token Bucket | Allows bursts up to bucket capacity | Redis + Lua script | Most APIs; flexible and developer-friendly |
| Leaky Bucket | Smooths bursts to constant output rate | Queue-based | Protecting fragile downstream services |
| Sliding Window | Exact counts, no burst allowance | Redis sorted set | Strict compliance (billing, per-unit quotas) |
Recommendation: Token bucket for your developer API. Developers expect occasional burst tolerance (batch processing, retries), and token bucket gives you that without sacrificing protection.
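Before reaching for Redis, it helps to see that the core algorithm fits in a few lines. A minimal in-memory sketch (single process only; the class name and parameters are illustrative, and a multi-node API needs shared state like the Redis version that follows):

```typescript
// Minimal in-memory token bucket. Tokens refill continuously at
// refillRate per second, capped at capacity (the allowed burst size).
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,   // max burst size
    private refillRate: number, // tokens per second
    now: number = Date.now() / 1000
  ) {
    this.tokens = capacity; // start full so new clients can burst
    this.lastRefill = now;
  }

  // Returns true and consumes tokens if enough are available.
  tryRemove(count = 1, now: number = Date.now() / 1000): boolean {
    const elapsed = Math.max(0, now - this.lastRefill);
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
    if (this.tokens >= count) {
      this.tokens -= count;
      return true;
    }
    return false;
  }
}
```

Injecting the clock as a parameter keeps the refill math testable without sleeping in tests.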
## Redis-Backed Rate Limiter (Node.js)
Two layers: per-customer and global.
```typescript
// lib/rateLimiter.ts
import { Redis } from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: number; // unix timestamp (seconds)
}

// Atomic token bucket via Lua; prevents race conditions between the
// read, refill, and decrement steps.
const TOKEN_BUCKET_SCRIPT = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2]) -- tokens per second
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now

-- Refill tokens based on elapsed time, capped at capacity
local elapsed = math.max(0, now - last_refill)
local new_tokens = math.min(capacity, tokens + (elapsed * refill_rate))

if new_tokens >= requested then
  redis.call('HMSET', key, 'tokens', new_tokens - requested, 'last_refill', now)
  redis.call('EXPIRE', key, 86400)
  return {1, math.floor(new_tokens - requested)}
else
  redis.call('HMSET', key, 'tokens', new_tokens, 'last_refill', now)
  redis.call('EXPIRE', key, 86400)
  return {0, math.floor(new_tokens)}
end
`;

export async function checkRateLimit(customerId: string): Promise<RateLimitResult> {
  const now = Date.now() / 1000;

  // Per-customer: 1,000 req/min (16.67 tokens/s), burst up to 200
  const customerKey = `rl:customer:${customerId}`;
  const [custAllowed, custRemaining] = (await redis.eval(
    TOKEN_BUCKET_SCRIPT, 1, customerKey, 200, 16.67, now, 1
  )) as [number, number];

  // Global: ~50,000 req/min across all customers (833 tokens/s), burst up to 5,000.
  // Note: this check consumes a global token even when the per-customer
  // check above was denied; acceptable slack for a protective cap.
  const globalKey = 'rl:global';
  const [globalAllowed] = (await redis.eval(
    TOKEN_BUCKET_SCRIPT, 1, globalKey, 5000, 833, now, 1
  )) as [number, number];

  const allowed = custAllowed === 1 && globalAllowed === 1;
  const resetAt = Math.ceil(now) + 60; // approximate: worst-case full refill
  return { allowed, remaining: custRemaining, resetAt };
}
```
```typescript
// middleware/rateLimitMiddleware.ts
import { Request, Response, NextFunction } from 'express';
import { checkRateLimit } from '../lib/rateLimiter';

export async function rateLimitMiddleware(req: Request, res: Response, next: NextFunction) {
  const customerId =
    (req.headers['x-customer-id'] as string) || (req as any).apiKey?.customerId;
  if (!customerId) return res.status(401).json({ error: 'Unauthorized' });

  const result = await checkRateLimit(customerId);

  // Always set headers; developers depend on these
  res.set({
    'X-RateLimit-Limit': '1000',
    'X-RateLimit-Remaining': result.remaining.toString(),
    'X-RateLimit-Reset': result.resetAt.toString(),
  });

  if (!result.allowed) {
    // Retry-After only on rejections; res.set rejects undefined values
    res.set('Retry-After', '60');
    return res.status(429).json({
      error: 'rate_limit_exceeded',
      message: 'Rate limit exceeded. See X-RateLimit-Reset for retry time.',
      docs: 'https://docs.yourapi.com/rate-limits',
    });
  }
  next();
}
```
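On the client side, those headers are enough to implement polite backoff. A sketch of the wait-time calculation (the helper name is illustrative; header names match the middleware above):

```typescript
// Compute how long a client should wait before retrying, from the
// rate-limit response headers. Retry-After (seconds) is only present
// on 429s; otherwise fall back to X-RateLimit-Reset (unix seconds).
function msUntilRetry(
  headers: Record<string, string | undefined>,
  nowMs: number = Date.now()
): number {
  const retryAfter = headers['retry-after'];
  if (retryAfter !== undefined) return parseInt(retryAfter, 10) * 1000;
  const reset = parseInt(headers['x-ratelimit-reset'] ?? '0', 10);
  return Math.max(0, reset * 1000 - nowMs);
}
```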
## API Versioning
| Approach | Example | Trade-offs |
| --- | --- | --- |
| URL path | `/v1/orders` | Most visible, easy to route; pollutes URLs |
| Header | `Accept: application/vnd.api+json;version=2` | Clean URLs; harder to test in browser |
| Query param | `/orders?version=2` | Easy to test; messy for caching |
Recommendation: URL path versioning for a public developer API. It's explicit, curl-friendly, and easy to document. Reserve header versioning for when you need URL stability (e.g., webhooks).
Version strategy: support N and N-1. Deprecate with Sunset and Deprecation response headers 6 months before removal.
```typescript
// Sunset header for deprecated versions (HTTP dates must use the
// correct weekday: 2026-07-01 is a Wednesday, 2027-01-01 a Friday)
if (req.apiVersion === 'v1') {
  res.set('Sunset', 'Fri, 01 Jan 2027 00:00:00 GMT');
  res.set('Deprecation', 'Wed, 01 Jul 2026 00:00:00 GMT');
  res.set('Link', '<https://docs.yourapi.com/v2/migration>; rel="successor-version"');
}
```
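With path versioning, routing keys off the first path segment. A sketch of the extraction (the helper and regex are illustrative; in Express you would usually just mount per-version routers, e.g. `app.use('/v1', v1Router)`):

```typescript
// Extract the version segment from a path-versioned URL,
// e.g. "/v1/orders" yields "v1"; unversioned paths yield null.
function apiVersionFromPath(path: string): string | null {
  const match = path.match(/^\/(v\d+)(\/|$)/);
  return match ? match[1] : null;
}
```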
## Circuit Breaker for Downstream Failures
Use `opossum`, a battle-tested Node.js circuit breaker library.
```typescript
import CircuitBreaker from 'opossum';
import { callPaymentService } from './paymentClient'; // your downstream client
import { metrics, alerting } from './observability';  // your metrics/alerting clients

const paymentServiceBreaker = new CircuitBreaker(callPaymentService, {
  timeout: 3000,                // fail requests taking >3s
  errorThresholdPercentage: 50, // open circuit if 50% of requests fail
  resetTimeout: 30000,          // try a half-open probe after 30s
  volumeThreshold: 10,          // need 10 requests before tripping
});

// Served instead of an error while the circuit is open
paymentServiceBreaker.fallback(() => ({
  status: 'degraded',
  message: 'Payment processing is temporarily unavailable. Your order has been queued.',
  retryAfter: 30,
}));

// Expose circuit state transitions to metrics and alerting
paymentServiceBreaker.on('open', () => {
  metrics.increment('circuit_breaker.payment_service.open');
  alerting.warn('Payment service circuit breaker opened');
});

export async function processPayment(data: unknown) {
  return paymentServiceBreaker.fire(data);
}
```
Health endpoint convention — expose circuit states so your operations team can see degraded dependencies without digging through logs:
```
GET /health

{
  "status": "degraded",
  "services": {
    "payment_service": { "status": "open", "since": "2026-03-09T11:23:00Z" },
    "notification_service": { "status": "closed" }
  }
}
```
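Deriving the top-level `status` from per-service circuit states can be a pure function. A sketch, assuming you collect each breaker's state into a map first (names here are illustrative, not an opossum API):

```typescript
type BreakerState = 'open' | 'half-open' | 'closed';

// Overall health is degraded if any dependency's circuit is not closed.
function overallStatus(services: Record<string, BreakerState>): 'ok' | 'degraded' {
  return Object.values(services).some((state) => state !== 'closed')
    ? 'degraded'
    : 'ok';
}
```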
## Recommended Stack
- **Rate limiting:** `express-rate-limit` for simple cases; Redis + Lua for multi-tier per-customer limits
- **Circuit breaker:** `opossum` (Node.js), `resilience4j` (JVM), Polly (.NET)
- **API versioning routing:** Express Router with a version prefix, or an API gateway (Kong, AWS API Gateway) to handle routing declaratively
- **Observability:** emit `rate_limit_exceeded`, `circuit_breaker_open`, and `api_version_used` metrics from day one; you'll need them to tune thresholds