Distributed Rate Limiting with Redis
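TL;DR: Redis Lua scripts give atomic rate-limit operations across distributed services. Token bucket fits in ~20 lines of Lua and needs O(1) memory per key. Sliding window uses ZADD + ZREMRANGEBYSCORE and needs O(N) memory per key, where N is the number of requests in the window. Hot key contention is the main scaling concern.
This post follows the one on rate limit algorithms and covers the distributed implementation. Single-process rate limiters don’t work across N service replicas: each replica counts only its own traffic, so the effective limit becomes N times the configured one. Redis is the standard shared store.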
Why Redis
- Single-node atomic ops via Lua scripts
- Sorted sets perfect for sliding window
- Cheap (in-memory)
- Supports TTL for cleanup
- Universal client libraries
For non-Redis users: memcached can do simple counters; Cassandra works at huge scale; in-process limiters coordinated via Raft exist but are rare. For most teams: Redis.
Token bucket in Lua
-- KEYS[1] = bucket key, ARGV[1] = capacity, ARGV[2] = refill_rate, ARGV[3] = now, ARGV[4] = requested
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])
local bucket = redis.call("HMGET", KEYS[1], "tokens", "last_refill")
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now
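-- Clamp at zero so clock skew between app servers never subtracts tokens
local elapsed = math.max(0, now - last_refill)
tokens = math.min(capacity, tokens + elapsed * refill_rate)
local allowed = 0
if tokens >= requested then
  tokens = tokens - requested
  allowed = 1
end
-- HSET accepts multiple field/value pairs since Redis 4.0; HMSET is deprecated
redis.call("HSET", KEYS[1], "tokens", tokens, "last_refill", now)
redis.call("EXPIRE", KEYS[1], 86400)
return allowed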
From the application:
import time
import redis
r = redis.Redis()
LUA = """<the script above>"""
script = r.register_script(LUA)
def allow_request(user_id: str, capacity: int = 100, refill: float = 10) -> bool:
    # refill is in tokens per second; now comes from this app server's clock
    now = time.time()
    allowed = script(keys=[f"rl:{user_id}"], args=[capacity, refill, now, 1])
    return allowed == 1
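Redis executes the whole script atomically, so no other command can interleave between the read and the write. register_script sends the script by its SHA (EVALSHA) and loads it first if Redis does not have it cached.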
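To see the refill arithmetic without a Redis instance, here is a minimal pure-Python sketch of the same steps the Lua script performs. The names `BucketState` and `take` are made up for illustration, and clamping negative elapsed time to zero is an assumption added to tolerate clock skew.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BucketState:
    tokens: float
    last_refill: float


def take(state: Optional[BucketState], capacity: float, refill_rate: float,
         now: float, requested: float = 1) -> Tuple[bool, BucketState]:
    """Refill by elapsed time, then try to spend `requested` tokens."""
    if state is None:
        # A missing key starts as a full bucket, like the `or capacity` default in Lua
        state = BucketState(tokens=capacity, last_refill=now)
    elapsed = max(0.0, now - state.last_refill)  # assumption: ignore backwards clocks
    tokens = min(capacity, state.tokens + elapsed * refill_rate)
    allowed = tokens >= requested
    if allowed:
        tokens -= requested
    return allowed, BucketState(tokens=tokens, last_refill=now)
```

With capacity 10 and a refill rate of 1 token per second, ten requests at t=0 drain the bucket, an eleventh is denied, and three seconds later three tokens are back.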
Sliding window via sorted sets
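import uuid
def allow_sliding(user_id: str, limit: int, window_s: int) -> bool:
    now = time.time()
    key = f"rl:slide:{user_id}"
    cutoff = now - window_s
    # Unique member so two requests with the same timestamp don't overwrite each other
    member = f"{now}:{uuid.uuid4().hex}"
    pipe = r.pipeline()  # redis-py pipelines run as MULTI/EXEC by default
    pipe.zremrangebyscore(key, 0, cutoff)  # drop entries older than the window
    pipe.zcard(key)  # count what is left, before our own add
    pipe.zadd(key, {member: now})
    pipe.expire(key, window_s)
    _, count, _, _ = pipe.execute()
    if count >= limit:
        r.zrem(key, member)  # undo our add
        return False
    return True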
Pros: precise sliding window.
Cons: O(N) memory per key (N = current count). For limits in the millions, sorted set is heavy.
For high-volume use, prefer token bucket; reserve sliding window for low-rate, fairness-sensitive endpoints.
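The memory cost is easier to see in an in-process model. This sketch mirrors the trim, count, and add steps with a deque standing in for the sorted set. `SlidingWindowLog` is a hypothetical name, and unlike the Redis version it checks the count before adding instead of adding and undoing.

```python
from collections import deque


class SlidingWindowLog:
    """In-memory stand-in for the sorted-set log: one stored entry per allowed request."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self.events = deque()  # timestamps in arrival order

    def allow(self, now: float) -> bool:
        cutoff = now - self.window_s
        # Equivalent of ZREMRANGEBYSCORE key 0 cutoff (cutoff is inclusive)
        while self.events and self.events[0] <= cutoff:
            self.events.popleft()
        # Equivalent of ZCARD
        if len(self.events) >= self.limit:
            return False
        # Equivalent of ZADD
        self.events.append(now)
        return True
```

The deque holds up to `limit` timestamps per key, which is the O(N) cost noted above. A token bucket stores two numbers regardless of the limit.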
What I use in production
For most rate limits: Lua-based token bucket.
import (
    "fmt"
    "math"
    "net/http"
    "strconv"
    "github.com/go-redis/redis_rate/v9"
)
var limiter = redis_rate.NewLimiter(rdb)
func loginHandler(w http.ResponseWriter, r *http.Request) {
    res, err := limiter.Allow(r.Context(),
        fmt.Sprintf("login:%s", clientIP(r)),
        redis_rate.PerMinute(10))
    if err != nil {
        // Redis down: fail open or closed depending on policy
        http.Error(w, "internal", http.StatusInternalServerError)
        return
    }
    if res.Allowed == 0 {
        // Retry-After takes whole seconds, so round up
        retry := int(math.Ceil(res.RetryAfter.Seconds()))
        w.Header().Set("Retry-After", strconv.Itoa(retry))
        http.Error(w, "rate limited", http.StatusTooManyRequests)
        return
    }
    // ... actual login logic
}
redis_rate is a Go library that implements GCRA (a leaky-bucket variant that behaves much like a token bucket) in Lua. Equivalent libraries exist for Python (limits) and Node (rate-limiter-flexible).
Multi-tier limits
Often you want multiple limits at the same time:
- 10 logins/minute per IP
- 100 logins/hour per IP
- 5 logins/minute per email
res1, err1 := limiter.Allow(ctx, fmt.Sprintf("login:ip:%s", ip), redis_rate.PerMinute(10))
res2, err2 := limiter.Allow(ctx, fmt.Sprintf("login:ip:h:%s", ip), redis_rate.PerHour(100))
res3, err3 := limiter.Allow(ctx, fmt.Sprintf("login:email:%s", email), redis_rate.PerMinute(5))
if err1 != nil || err2 != nil || err3 != nil {
    // On error the result is nil, so don't dereference it; login fails closed
    rateLimit(w)
    return
}
if res1.Allowed == 0 || res2.Allowed == 0 || res3.Allowed == 0 {
    rateLimit(w)
    return
}
Each tier is a separate Redis round trip, usually under 1 ms each. Three sequential calls add roughly 3 ms of latency, which is acceptable.
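If the extra round trips matter, the tiers can be checked in order with an early exit. A minimal sketch, assuming a hypothetical `check(key, limit, window_s)` callable that wraps whichever limiter is in use:

```python
from typing import Callable, Iterable, Tuple

Tier = Tuple[str, int, int]  # (key, limit, window in seconds)


def allow_all(check: Callable[[str, int, int], bool], tiers: Iterable[Tier]) -> bool:
    """Check tiers in the given order and stop at the first denial."""
    for key, limit, window_s in tiers:
        if not check(key, limit, window_s):
            return False  # remaining tiers are never called
    return True
```

Put the tier most likely to deny first. One caveat: tiers checked before the denying one have already consumed a token for a request that ends up rejected.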
Fail-open vs fail-closed
When Redis is down, what does your rate limiter do?
Fail open: allow all requests when Redis unavailable. Service stays up; rate limiting temporarily disabled. Default for most cases.
Fail closed: reject all requests. Service appears down. Use only when rate limiting is a security boundary (e.g., login).
res, err := limiter.Allow(...)
if err != nil {
    if isLoginEndpoint {
        // Critical: fail closed
        return errLimited
    }
    // Other endpoints: fail open
    return nil
}
Decide per endpoint, and document the choice.
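The same policy in Python, as a sketch. `guarded_allow` and its parameters are hypothetical names. The default error tuple uses built-in exceptions so the example runs standalone; with redis-py you would pass `redis.exceptions.RedisError` instead.

```python
import logging
from typing import Callable, Tuple, Type

logger = logging.getLogger(__name__)


def guarded_allow(
    check: Callable[[str], bool],
    key: str,
    *,
    fail_closed: bool,
    backend_errors: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> bool:
    """Run the limiter check; on a backend failure apply the endpoint's policy."""
    try:
        return check(key)
    except backend_errors:
        # Log every fallback so the fail-open path shows up in monitoring
        logger.warning("rate limiter unavailable for %s, fail_closed=%s", key, fail_closed)
        return not fail_closed
```

A login handler would call it with `fail_closed=True`; most other endpoints would pass `fail_closed=False`.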
Distributed considerations
Single Redis instance: simple, single point of failure.
Redis Sentinel: primary + replicas, automatic failover. Standard.
Redis Cluster: sharded. Rate limit keys distributed across shards. Operations stay on one shard (hash slot per key). Scales horizontally.
For most teams: Sentinel is sufficient. Cluster when single-instance memory becomes the limit.
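On Cluster, a Lua script may only touch keys in one hash slot, so it helps to know how keys map to slots. The sketch below follows the published rule (CRC16 of the key modulo 16384, hashing only the `{...}` hash tag when one is present). `key_slot` is a name made up for this example, not a client API.

```python
import binascii


def key_slot(key: str) -> int:
    """Compute the Redis Cluster hash slot for a key, honoring hash tags."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" means the whole key is hashed
        if end != -1 and end > start + 1:
            data = data[start + 1:end]
    # crc_hqx with initial value 0 is the CRC16/XMODEM variant Redis Cluster uses
    return binascii.crc_hqx(data, 0) % 16384
```

Keys such as `{user42}:login:min` and `{user42}:login:hour` share a slot, so one script could update both tiers. Keys without a common tag will usually land on different slots.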
Hot key contention
A few keys getting hammered (e.g., a single famous user’s bucket) = Redis instance bottleneck for those operations. Mitigation:
- Splitting the hot key into several sub-keys, each with a share of the limit, so the load spreads across shards
- Local caching with eventual consistency (sacrificing precision)
- Pre-warming during traffic spikes
For most applications, hot key contention isn’t an issue. For consumer-scale apps with celebrity users, it can be.
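One way to read the sharding mitigation is to split a hot bucket into several sub-keys and give each an equal share of the limit. A rough sketch with hypothetical helper names. It trades precision for spread, since a request can be denied on one sub-key while others still have room.

```python
import random


def split_key(key: str, shards: int, rng=random) -> str:
    """Pick one of `shards` sub-keys at random so writes spread across slots."""
    # No hash tag here on purpose: a shared tag would pin every sub-key to one slot
    return f"{key}:{rng.randrange(shards)}"


def per_shard_limit(limit: int, shards: int) -> int:
    """Each sub-key enforces an equal share of the overall limit (at least 1)."""
    return max(1, limit // shards)
```

For example, a limit of 100 split four ways becomes four buckets of 25, and each request is checked against a randomly chosen one.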
Observability
Emit metrics:
- Rate-limit hits (counter, labeled by endpoint)
- Rate-limit current rate (gauge)
- Redis call latency (histogram)
Alert on:
- Spike in rate-limit hits (possible attack)
- Redis call latency growing (Redis overloaded)
- Connection failures to Redis (fail open active)
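A minimal sketch of where those measurements could be taken, using only the standard library. `LimiterMetrics` and `instrumented` are hypothetical names; a real setup would feed a metrics client (counters and a latency histogram) rather than in-memory structures.

```python
import time
from collections import Counter
from typing import Callable


class LimiterMetrics:
    def __init__(self):
        # (endpoint, "allowed" | "limited" | "error") -> count
        self.decisions = Counter()
        # Raw latency samples in seconds; stands in for a histogram
        self.latencies_s = []


def instrumented(check: Callable[[str], bool], metrics: LimiterMetrics,
                 endpoint: str, clock=time.perf_counter) -> Callable[[str], bool]:
    """Wrap a limiter check so every call records its outcome and latency."""
    def wrapper(key: str) -> bool:
        start = clock()
        try:
            allowed = check(key)
        except Exception:
            metrics.decisions[(endpoint, "error")] += 1
            raise
        finally:
            metrics.latencies_s.append(clock() - start)
        metrics.decisions[(endpoint, "allowed" if allowed else "limited")] += 1
        return allowed
    return wrapper
```

The "limited" count per endpoint backs the attack alert, the latency samples back the overload alert, and the "error" count shows when the fail-open path is active.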
Common Pitfalls
Not using Lua/atomic operations. Separate read and write calls race: two replicas read the same token count, both allow the request, and the bucket goes negative.
Forgetting EXPIRE. Keys accumulate; Redis fills up.
Multi-tier limits checked in the wrong order. Inefficient: check the tier most likely to deny first and bail early.
Per-IP limits only. Botnets on cloud IPs evade them, and mobile NAT puts many legitimate users behind one address.
No graceful degradation. Redis blip = whole service erroring.
Synchronous Redis call blocking handler. For 10K req/s, this adds up. Connection pooling + reasonable timeouts.
Wrapping Up
Token bucket via Lua, or a library like go-redis/redis_rate, = production rate limiting. Per-IP + per-user tiers. Fail-open default; fail-closed for auth. Friday: API keys vs OAuth for third parties.