Distributed Rate Limiting with Redis

November 16, 2022 · 4 min read · by Muhammad Amal · programming

TL;DR: Redis Lua scripts give atomic rate-limit operations across distributed services. Token bucket fits in ~20 lines of Lua and needs O(1) memory per key. Sliding window uses ZADD + ZREMRANGEBYSCORE on a sorted set and needs O(N) memory per key, where N is the number of requests currently in the window. Hot key contention is the main scaling concern.

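After rate limit algorithms, the distributed implementation. Single-process rate limiters don’t work across N service replicas: each replica only counts its own share of the traffic, so a client can get up to N times the intended limit. Redis is the standard shared store.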

Why Redis

Single-node atomic ops via Lua scripts
Sorted sets perfect for sliding window
Cheap (in-memory)
Supports TTL for cleanup
Universal client libraries

For non-Redis users: memcached can do simple counters; Cassandra works at huge scale; in-process state replicated with Raft is possible but rare. For most teams: Redis.

Token bucket in Lua

-- KEYS[1] = bucket key
-- ARGV[1] = capacity, ARGV[2] = refill_rate (tokens per second),
-- ARGV[3] = now (seconds, from the caller's clock), ARGV[4] = requested
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])

-- A missing key means a fresh bucket: start full, last refilled "now".
local bucket = redis.call("HMGET", KEYS[1], "tokens", "last_refill")
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now

-- Clamp to zero so clock skew between callers never removes tokens.
local elapsed = math.max(0, now - last_refill)
tokens = math.min(capacity, tokens + elapsed * refill_rate)

local allowed = 0
if tokens >= requested then
    tokens = tokens - requested
    allowed = 1
end

-- HSET takes multiple field/value pairs since Redis 4.0; HMSET is deprecated.
redis.call("HSET", KEYS[1], "tokens", tokens, "last_refill", now)
redis.call("EXPIRE", KEYS[1], 86400)
return allowed

From the application:

import time, redis

r = redis.Redis()
LUA = """<the script above>"""
script = r.register_script(LUA)

def allow_request(user_id: str, capacity: int = 100, refill: float = 10) -> bool:
    now = time.time()
    allowed = script(keys=[f"rl:{user_id}"], args=[capacity, refill, now, 1])
    return allowed == 1

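The script runs atomically: Redis executes it start to finish before serving any other command, so there is no race inside the read-modify-write. (register_script sends it with EVALSHA and loads the script first if Redis doesn’t have it cached.)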
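To sanity-check the refill arithmetic without a Redis server, the same logic can be modelled in plain Python. This is only a sketch: `TokenBucket` and `take` are hypothetical names, state lives in process memory instead of a Redis hash, and timestamps are passed in explicitly so the run is deterministic. The retry hint is an addition that the Lua script above does not return.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TokenBucket:
    """In-memory model of the Lua script's arithmetic (hypothetical helper)."""
    capacity: float
    refill_rate: float                  # tokens added per second
    tokens: Optional[float] = None      # None mimics a missing Redis key
    last_refill: float = 0.0

    def take(self, now: float, requested: float = 1) -> Tuple[bool, float]:
        """Return (allowed, seconds until the request could succeed)."""
        if self.tokens is None:         # first request: start with a full bucket
            self.tokens, self.last_refill = self.capacity, now
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now          # written on every call, like the script
        if self.tokens >= requested:
            self.tokens -= requested
            return True, 0.0
        return False, (requested - self.tokens) / self.refill_rate


bucket = TokenBucket(capacity=3, refill_rate=1.0)
burst = [bucket.take(now=100.0)[0] for _ in range(4)]   # three allowed, fourth denied
allowed_again, retry_after = bucket.take(now=100.0)     # still empty
later = bucket.take(now=101.0)[0]                       # one token has refilled
```

The burst of four at the same instant drains the bucket, and one second later exactly one more request fits, which is the behaviour the Lua version should show for the same inputs.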

Sliding window via sorted sets

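import uuid

def allow_sliding(user_id: str, limit: int, window_s: int) -> bool:
    now = time.time()
    key = f"rl:slide:{user_id}"
    cutoff = now - window_s
    # Unique member, so two requests with the same timestamp don't overwrite each other.
    member = f"{now}:{uuid.uuid4().hex}"
    pipe = r.pipeline()                     # redis-py pipelines run as MULTI/EXEC by default
    pipe.zremrangebyscore(key, 0, cutoff)   # drop entries that fell out of the window
    pipe.zcard(key)                         # count before adding this request
    pipe.zadd(key, {member: now})
    pipe.expire(key, window_s)
    _, count, _, _ = pipe.execute()
    if count >= limit:
        r.zrem(key, member)   # undo our add
        return False
    return True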

Pros: precise sliding window.

Cons: O(N) memory per key (N = current count). For limits in the millions, sorted set is heavy.

For high-volume use, prefer token bucket; reserve sliding window for low-rate, fairness-sensitive endpoints.
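The memory trade-off is easier to see with an in-memory stand-in for one sorted-set key. The sketch below is an assumption-laden model, not the Redis code: `SlidingWindowLog` is a hypothetical name, a deque replaces the sorted set, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindowLog:
    """In-memory stand-in for one sorted-set key (hypothetical helper)."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self.stamps = deque()   # one entry per admitted request

    def allow(self, now: float) -> bool:
        cutoff = now - self.window_s
        # Same role as ZREMRANGEBYSCORE key 0 cutoff: drop entries at or before the cutoff.
        while self.stamps and self.stamps[0] <= cutoff:
            self.stamps.popleft()
        # Same role as ZCARD followed by a conditional ZADD.
        if len(self.stamps) >= self.limit:
            return False
        self.stamps.append(now)
        return True


log = SlidingWindowLog(limit=2, window_s=10)
decisions = [log.allow(t) for t in (0.0, 1.0, 2.0, 10.5, 11.5)]
entries_held = len(log.stamps)
```

The structure holds one entry per admitted request in the window, so its size is bounded by the limit. That is fine at `limit=2` and expensive at limits in the millions, whereas the token bucket stores two numbers regardless of the limit.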

What I use in production

For most rate limits: Lua-based token bucket.

import (
    "fmt"
    "math"
    "net/http"
    "strconv"

    "github.com/go-redis/redis_rate/v9"
)

// rdb is an existing go-redis client (setup not shown).
var limiter = redis_rate.NewLimiter(rdb)

func loginHandler(w http.ResponseWriter, r *http.Request) {
    res, err := limiter.Allow(r.Context(),
        fmt.Sprintf("login:%s", clientIP(r)),
        redis_rate.PerMinute(10))
    if err != nil {
        // Redis down: fail open or closed depending on policy
        http.Error(w, "internal", 500)
        return
    }
    if res.Allowed == 0 {
        // Retry-After expects whole seconds, so round up.
        retry := int(math.Ceil(res.RetryAfter.Seconds()))
        w.Header().Set("Retry-After", strconv.Itoa(retry))
        http.Error(w, "rate limited", 429)
        return
    }
    // ... actual login logic
}

redis_rate is a Go library that implements GCRA (a leaky-bucket variant whose burst behaviour is close to a token bucket) in a Lua script. Comparable libraries exist for Python (limits) and Node (rate-limiter-flexible).

Multi-tier limits

Often you want multiple limits at the same time:

10 logins/minute per IP
100 logins/hour per IP
5 logins/minute per email

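res1, err1 := limiter.Allow(ctx, fmt.Sprintf("login:ip:%s", ip), redis_rate.PerMinute(10))
res2, err2 := limiter.Allow(ctx, fmt.Sprintf("login:ip:h:%s", ip), redis_rate.PerHour(100))
res3, err3 := limiter.Allow(ctx, fmt.Sprintf("login:email:%s", email), redis_rate.PerMinute(5))
if err1 != nil || err2 != nil || err3 != nil {
    // On error the result is nil, so apply the fail-open/fail-closed policy before reading it.
    http.Error(w, "internal", 500)
    return
}
if res1.Allowed == 0 || res2.Allowed == 0 || res3.Allowed == 0 {
    rateLimit(w)
    return
}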

Each tier is a separate Redis op. Each one adds latency, usually <1 ms, so three sequential operations add at most ~3 ms. Acceptable.
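If the extra round trips matter, the tiers can be checked in order and abandoned at the first denial. A minimal Python sketch of that control flow, with assumptions stated up front: `first_denied_tier` and `fake_check` are hypothetical names, and the zero-argument callables stand in for real Redis-backed limiter calls.

```python
from typing import Callable, List, Optional, Tuple

# (label, zero-argument check that returns True when the request is allowed)
Tier = Tuple[str, Callable[[], bool]]


def first_denied_tier(tiers: List[Tier]) -> Optional[str]:
    """Run checks in order; return the label of the first tier that denies, else None."""
    for label, check in tiers:
        if not check():
            return label        # bail early: later tiers are never called
    return None


calls: List[str] = []


def fake_check(label: str, allowed: bool) -> Callable[[], bool]:
    """Stand-in for a Redis-backed limiter call; records that it ran."""
    def run() -> bool:
        calls.append(label)
        return allowed
    return run


tiers = [
    ("email/min", fake_check("email/min", True)),
    ("ip/min", fake_check("ip/min", False)),     # this tier denies
    ("ip/hour", fake_check("ip/hour", True)),
]
denied_by = first_denied_tier(tiers)
```

One side effect to keep in mind: tiers that passed before the denial have already consumed from their own limits, so ordering affects accounting as well as latency.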

Fail-open vs fail-closed

When Redis is down, what does your rate limiter do?

Fail open: allow all requests when Redis unavailable. Service stays up; rate limiting temporarily disabled. Default for most cases.

Fail closed: reject all requests. Service appears down. Use only when rate limiting is a security boundary (e.g., login).

res, err := limiter.Allow(...)
if err != nil {
    if isLoginEndpoint {
        // Critical — fail closed
        return errLimited
    }
    // Other endpoints — fail open
    return nil
}

Make this decision per endpoint, and document it.
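The same policy switch can be sketched in Python as a small wrapper. The names `allow_with_policy` and `redis_is_down` are hypothetical, and the built-in `ConnectionError` and `TimeoutError` are used only to keep the example self-contained; with redis-py you would catch the library's own exception classes from `redis.exceptions` instead.

```python
from typing import Callable


def allow_with_policy(check: Callable[[], bool], fail_closed: bool) -> bool:
    """Call the limiter; if the store is unreachable, apply the endpoint's policy."""
    try:
        return check()
    except (ConnectionError, TimeoutError):
        # Store unavailable: deny when fail_closed, otherwise let the request through.
        return not fail_closed


def redis_is_down() -> bool:
    """Stand-in for a limiter call made while Redis is unreachable."""
    raise ConnectionError("redis unavailable")


login_allowed = allow_with_policy(redis_is_down, fail_closed=True)     # security boundary
search_allowed = allow_with_policy(redis_is_down, fail_closed=False)   # ordinary endpoint
```

Keeping the flag as an explicit argument at each call site makes the per-endpoint choice visible in code review, which goes some way toward documenting it.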

Distributed considerations

Single Redis instance: simple, single point of failure.

Redis Sentinel: primary + replicas, automatic failover. Standard.

Redis Cluster: sharded. Rate limit keys distributed across shards. Operations stay on one shard (hash slot per key). Scales horizontally.

For most teams: Sentinel is sufficient. Cluster when single-instance memory becomes the limit.
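The "hash slot per key" rule for Redis Cluster can be reproduced in a few lines, which helps when one script needs to touch several limit keys for the same client: Cluster requires all keys in a single command or script to map to the same slot, and hash tags (the part between `{` and `}`) are the way to force that. The function name `key_hash_slot` and the example keys are illustrative assumptions.

```python
import binascii


def key_hash_slot(key: str) -> int:
    """Redis Cluster slot: CRC16 (XMODEM) of the key, or of its hash tag, mod 16384."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:   # only a non-empty tag counts
            data = data[start + 1:end]
    # binascii.crc_hqx with an initial value of 0 is the CRC16 variant Cluster uses.
    return binascii.crc_hqx(data, 0) % 16384


# Both keys share the tag {203.0.113.7}, so they land in the same slot.
slot_minute = key_hash_slot("login:ip:{203.0.113.7}")
slot_hour = key_hash_slot("login:ip:h:{203.0.113.7}")
reference = key_hash_slot("123456789")   # CRC16 test vector from the Cluster spec
```

Without the braces, the per-minute and per-hour keys for one IP would generally hash to different slots and could not be updated by one script.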

Hot key contention

A few keys getting hammered (e.g., a single famous user’s bucket) make one Redis instance the bottleneck, because every operation on a key runs on the node that owns it. Mitigation:

Sharding the hot key into several sub-keys, each holding a share of the limit
Local caching with eventual consistency (sacrificing precision)
Pre-warming during traffic spikes

For most applications, hot key contention isn’t an issue. For consumer-scale apps with celebrity users, it can be.
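One way to read the sharding mitigation for a single hot key is to split it into several sub-keys, each enforcing a fraction of the limit, and send each request to a randomly chosen sub-key. The sketch below only shows the key and limit selection; `pick_shard`, the `:shard:` suffix, and the even split are assumptions, and the returned pair would be fed to whichever limiter is in use.

```python
import random
from typing import Tuple


def pick_shard(key: str, limit: int, shards: int, rng: random.Random) -> Tuple[str, int]:
    """Return (sub-key, per-shard limit) for one request against a hot key."""
    shard = rng.randrange(shards)
    per_shard_limit = max(1, limit // shards)
    return f"{key}:shard:{shard}", per_shard_limit


rng = random.Random(42)     # seeded only to make the example repeatable
picks = [pick_shard("rl:celebrity", limit=1000, shards=4, rng=rng) for _ in range(100)]
distinct_keys = {sub_key for sub_key, _ in picks}
per_shard_limits = {shard_limit for _, shard_limit in picks}
```

This trades precision for throughput: a request can be denied on one sub-key while the others still have headroom, so the effective limit is approximate.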

Observability

Emit metrics:

Rate-limit hits (counter, labeled by endpoint)
Rate-limit current rate (gauge)
Redis call latency (histogram)

Alert on:

Spike in rate-limit hits (possible attack)
Redis call latency growing (Redis overloaded)
Connection failures to Redis (fail open active)
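A rough idea of where those measurements come from, using only the standard library. Everything here is a placeholder: `observed_check` is a hypothetical wrapper, the module-level counters and the raw latency list stand in for a real metrics client, and the built-in `ConnectionError` stands in for the Redis client's error type.

```python
import time
from collections import Counter
from typing import Callable, List

denials: Counter = Counter()        # rate-limit hits, keyed by endpoint
store_errors: Counter = Counter()   # failed limiter calls, keyed by endpoint
latencies_ms: List[float] = []      # raw samples; a real exporter would bucket these


def observed_check(endpoint: str, check: Callable[[], bool]) -> bool:
    """Run a limiter check and record latency, denials, and store errors."""
    start = time.perf_counter()
    try:
        allowed = check()
    except ConnectionError:
        store_errors[endpoint] += 1
        raise
    finally:
        latencies_ms.append((time.perf_counter() - start) * 1000)
    if not allowed:
        denials[endpoint] += 1
    return allowed


observed_check("/login", lambda: True)
observed_check("/login", lambda: False)
observed_check("/search", lambda: False)
```

Wrapping the limiter call in one place means every endpoint emits the same three signals, so the alerts listed above can be defined once.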

Common Pitfalls

Not using Lua/atomic operations. With read-then-write, two replicas can read the same token count and both allow, so the bucket goes negative and requests that should be denied get through.

Forgetting EXPIRE. Keys accumulate; Redis fills up.
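For the token bucket, the TTL does not have to be a fixed day. An idle bucket is full again after capacity / refill_rate seconds, and the script treats a missing key as a full bucket, so expiring the key any time after that loses nothing. A small sketch of that arithmetic; the helper name and the slack factor of 2 are assumptions, not something the script above does.

```python
import math


def bucket_ttl_seconds(capacity: float, refill_rate: float, slack: float = 2.0) -> int:
    """Time for an empty, idle bucket to refill completely, times a safety factor."""
    if refill_rate <= 0:
        raise ValueError("refill_rate must be positive")
    return math.ceil(capacity / refill_rate * slack)


# With capacity=100 and refill=10 (the defaults of allow_request), a full refill takes 10 s.
ttl = bucket_ttl_seconds(capacity=100, refill_rate=10)
```

A shorter TTL keeps the number of live keys proportional to recently active clients rather than to everyone seen in the last 24 hours.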

Multi-tier limits checked in an arbitrary order. Inefficient: check the tier most likely to deny first and bail there, so the remaining Redis calls are skipped.

Per-IP only. Botnets spread across cloud IPs stay under every per-IP limit, and mobile carrier NAT puts many legitimate users behind one IP.

No graceful degradation. Redis blip = whole service erroring.

Synchronous Redis call blocking the handler. At 10K req/s, this adds up. Use connection pooling and short timeouts.

Wrapping Up

Token bucket via Lua, or the go-redis/redis_rate library = production rate limiting. Per-IP + per-user tiers. Fail-open default; fail-closed for auth. Friday: API keys vs OAuth for third parties.
