Distributed Rate Limiting with Redis
November 16, 2022 · 4 min read · by Muhammad Amal · programming

TL;DR: Redis Lua scripts give atomic rate-limit operations across distributed services. Token bucket fits in ~20 lines of Lua and needs O(1) memory per key. Sliding window uses ZADD + ZREMRANGEBYSCORE and needs O(N) memory per key, where N is the number of requests in the window. Hot key contention is the main scaling concern.

After the post on rate limit algorithms, this one covers the distributed implementation. Single-process rate limiters don’t work across N service replicas, because each replica only sees its own share of the traffic. Redis is the standard shared store.

Why Redis

  • Single-node atomic ops via Lua scripts
  • Sorted sets perfect for sliding window
  • Cheap (in-memory)
  • Supports TTL for cleanup
  • Universal client libraries

If you don’t run Redis: memcached can handle simple counters, Cassandra works at huge scale, and an in-process limiter replicated with Raft is possible but rare. For most teams: Redis.

Token bucket in Lua

-- KEYS[1] = bucket key, ARGV[1] = capacity, ARGV[2] = refill_rate, ARGV[3] = now, ARGV[4] = requested
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])

local bucket = redis.call("HMGET", KEYS[1], "tokens", "last_refill")
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now

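-- Clamp to zero: clocks on different replicas can disagree, and a negative
-- elapsed time would otherwise drain tokens.
local elapsed = math.max(0, now - last_refill)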
tokens = math.min(capacity, tokens + elapsed * refill_rate)

local allowed = 0
if tokens >= requested then
    tokens = tokens - requested
    allowed = 1
end

-- HSET accepts multiple field/value pairs since Redis 4.0; HMSET is deprecated.
redis.call("HSET", KEYS[1], "tokens", tokens, "last_refill", now)
redis.call("EXPIRE", KEYS[1], 86400)
return allowed

From the application:

import time, redis

r = redis.Redis()
LUA = """<the script above>"""
script = r.register_script(LUA)

def allow_request(user_id: str, capacity: int = 100, refill: float = 10) -> bool:
    now = time.time()
    allowed = script(keys=[f"rl:{user_id}"], args=[capacity, refill, now, 1])
    return allowed == 1

Redis runs the whole script atomically (EVAL, or EVALSHA when you go through register_script), so there is no race between the read and the write.
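To sanity-check the refill arithmetic without a Redis instance, the script's logic can be mirrored in plain Python. This is only a sketch: `BucketState` and `take` are names invented here, and clamping a negative elapsed time to zero is an assumption added to guard against clock skew between replicas.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BucketState:
    tokens: float
    last_refill: float


def take(state: Optional[BucketState], capacity: float, refill_rate: float,
         now: float, requested: float = 1) -> Tuple[bool, BucketState]:
    """Mirror of the Lua script's arithmetic, with no Redis involved."""
    if state is None:
        # Missing key: the script starts from a full bucket.
        state = BucketState(tokens=capacity, last_refill=now)
    # Assumption: never let clock skew produce a negative refill.
    elapsed = max(0.0, now - state.last_refill)
    tokens = min(capacity, state.tokens + elapsed * refill_rate)
    allowed = tokens >= requested
    if allowed:
        tokens -= requested
    return allowed, BucketState(tokens=tokens, last_refill=now)


# Capacity 2, refill 1 token/s: two requests pass at t=0, the third is denied,
# and one more passes once a full second of refill has accumulated.
ok1, s = take(None, 2, 1.0, now=0.0)
ok2, s = take(s, 2, 1.0, now=0.0)
ok3, s = take(s, 2, 1.0, now=0.0)
ok4, s = take(s, 2, 1.0, now=1.0)
```

Because the function takes `now` as an argument, tests can step time deterministically instead of sleeping.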

Sliding window via sorted sets

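import uuid

def allow_sliding(user_id: str, limit: int, window_s: int) -> bool:
    now = time.time()
    key = f"rl:slide:{user_id}"
    cutoff = now - window_s
    # Unique member, so two requests with the same timestamp don't overwrite each other
    member = f"{now}:{uuid.uuid4().hex}"
    pipe = r.pipeline()                      # redis-py wraps this in MULTI/EXEC by default
    pipe.zremrangebyscore(key, 0, cutoff)    # drop entries older than the window
    pipe.zcard(key)                          # count before our own add
    pipe.zadd(key, {member: now})
    pipe.expire(key, window_s)
    _, count, _, _ = pipe.execute()
    if count >= limit:
        r.zrem(key, member)   # undo our add
        return False
    return True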

Pros: precise sliding window.

Cons: O(N) memory per key (N = current count). For limits in the millions, sorted set is heavy.

For high-volume use, prefer token bucket; reserve sliding window for low-rate, fairness-sensitive endpoints.
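The memory trade-off is easier to see in an in-memory model of the same log-based window. The sketch below is not the Redis implementation; `SlidingWindowLog` is a name invented here, and each step is annotated with the sorted-set command it stands in for.

```python
from collections import deque


class SlidingWindowLog:
    """In-memory model of the sorted-set sliding window (one instance per key)."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self.hits = deque()  # request timestamps, oldest first

    def allow(self, now: float) -> bool:
        cutoff = now - self.window_s
        # Same role as ZREMRANGEBYSCORE key 0 cutoff (inclusive upper bound).
        while self.hits and self.hits[0] <= cutoff:
            self.hits.popleft()
        # Same role as ZCARD: count what is still inside the window.
        if len(self.hits) >= self.limit:
            return False
        # Same role as ZADD: record this request.
        self.hits.append(now)
        return True


window = SlidingWindowLog(limit=3, window_s=10)
results = [window.allow(t) for t in (0, 1, 2, 3, 10, 10.5)]
stored = len(window.hits)  # never exceeds `limit`: one entry per admitted request
```

The log holds one entry per admitted request, so a limit of a million means up to a million entries per key, while the token bucket stores two fields regardless of the limit.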

What I use in production

For most rate limits: Lua-based token bucket.

import "github.com/go-redis/redis_rate/v9"

limiter := redis_rate.NewLimiter(rdb)

func loginHandler(w http.ResponseWriter, r *http.Request) {
    res, err := limiter.Allow(r.Context(),
        fmt.Sprintf("login:%s", clientIP(r)),
        redis_rate.PerMinute(10))
    if err != nil {
        // Redis down — fail open or closed depending on policy
        http.Error(w, "internal", 500)
        return
    }
    if res.Allowed == 0 {
        // Retry-After takes whole seconds; round up so clients don't retry early
        w.Header().Set("Retry-After", fmt.Sprint(int(res.RetryAfter.Seconds())+1))
        http.Error(w, "rate limited", 429)
        return
    }
    // ... actual login logic
}

redis_rate is a Go library that implements GCRA (a leaky-bucket variant that behaves much like a token bucket) as a Lua script. Equivalent libraries exist in Python (limits) and Node (rate-limiter-flexible).

Multi-tier limits

Often you want multiple limits at the same time:

  • 10 logins/minute per IP
  • 100 logins/hour per IP
  • 5 logins/minute per email

res1, err1 := limiter.Allow(ctx, fmt.Sprintf("login:ip:%s", ip), redis_rate.PerMinute(10))
res2, err2 := limiter.Allow(ctx, fmt.Sprintf("login:ip:h:%s", ip), redis_rate.PerHour(100))
res3, err3 := limiter.Allow(ctx, fmt.Sprintf("login:email:%s", email), redis_rate.PerMinute(5))
if err1 != nil || err2 != nil || err3 != nil {
    // On error the result is nil, so check before dereferencing.
    // This is a login endpoint: fail closed.
    rateLimit(w)
    return
}
if res1.Allowed == 0 || res2.Allowed == 0 || res3.Allowed == 0 {
    rateLimit(w)
    return
}

Each tier is a separate Redis op, so each one adds a round trip: usually <1ms per call. Three sequential operations = ~3ms additional latency. Acceptable.
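One way to keep that cost down is to stop at the first tier that denies, so the remaining tiers cost no round trip. A minimal Python sketch follows; `Tier`, `first_denied`, and `login_tiers` are hypothetical names, and `allow_fn` is assumed to be any callable that takes a key, a limit, and a window in seconds.

```python
from typing import Callable, List, NamedTuple, Optional, Sequence


class Tier(NamedTuple):
    key: str
    limit: int
    window_s: int


def first_denied(tiers: Sequence[Tier],
                 allow_fn: Callable[[str, int, int], bool]) -> Optional[Tier]:
    """Check tiers in order; return the first one that denies, else None."""
    for tier in tiers:
        if not allow_fn(tier.key, tier.limit, tier.window_s):
            return tier  # stop early: later tiers are never queried
    return None


def login_tiers(ip: str, email: str) -> List[Tier]:
    # The three limits listed above, in the same order.
    return [
        Tier(f"login:ip:{ip}", 10, 60),
        Tier(f"login:ip:h:{ip}", 100, 3600),
        Tier(f"login:email:{email}", 5, 60),
    ]
```

Since `allow_sliding` from earlier has a matching signature, `first_denied(login_tiers(ip, email), allow_sliding)` would work as-is. The order of the list matters: tiers checked before the denying one have already recorded the request.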

Fail-open vs fail-closed

When Redis is down, what does your rate limiter do?

Fail open: allow all requests when Redis unavailable. Service stays up; rate limiting temporarily disabled. Default for most cases.

Fail closed: reject all requests. Service appears down. Use only when rate limiting is a security boundary (e.g., login).

res, err := limiter.Allow(...)
if err != nil {
    if isLoginEndpoint {
        // Critical — fail closed
        return errLimited
    }
    // Other endpoints — fail open
    return nil
}

Decide per endpoint, and document the choice.
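The same policy can be expressed in Python as a small wrapper around whatever performs the Redis check. This is a sketch, not a library API: `allow_with_policy` is a hypothetical helper, and the `errors` tuple defaults to `Exception` only to keep the example self-contained. With redis-py you would likely pass `(redis.exceptions.RedisError,)` instead.

```python
from typing import Callable, Tuple, Type


def allow_with_policy(check: Callable[[], bool], fail_open: bool,
                      errors: Tuple[Type[BaseException], ...] = (Exception,)) -> bool:
    """Run the limiter check; if the store is unreachable, apply the policy."""
    try:
        return check()
    except errors:
        # fail_open=True  -> allow the request (limiting is temporarily off)
        # fail_open=False -> reject the request (limiter is a security boundary)
        return fail_open


def redis_is_down() -> bool:
    # Stand-in for a limiter call that fails because Redis is unavailable.
    raise ConnectionError("redis unavailable")


search_allowed = allow_with_policy(redis_is_down, fail_open=True)   # generic endpoint
login_allowed = allow_with_policy(redis_is_down, fail_open=False)   # login endpoint
```

Keeping the policy as an explicit argument at each call site makes the per-endpoint decision visible in code review.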

Distributed considerations

Single Redis instance: simple, single point of failure.

Redis Sentinel: primary + replicas, automatic failover. Standard.

Redis Cluster: sharded. Rate limit keys distributed across shards. Operations stay on one shard (hash slot per key). Scales horizontally.

For most teams: Sentinel is sufficient. Cluster when single-instance memory becomes the limit.
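For Cluster, it helps to know how a key maps to a shard. Redis Cluster hashes the key with CRC16 modulo 16384, and if the key contains a non-empty `{...}` hash tag, only the tag is hashed. The sketch below reimplements that rule in Python for illustration; the function names are invented here, and in practice `CLUSTER KEYSLOT` gives the authoritative answer.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0, as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Slot (0..16383) a key maps to, honoring a non-empty {hash tag}."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]  # hash only the tag contents
    return crc16_xmodem(raw) % 16384


# Keys sharing a hash tag land on the same slot, so one Lua script may touch both.
same_slot = hash_slot("{user:42}:minute") == hash_slot("{user:42}:hour")
```

This matters if a single script ever needs to update several tiers at once: in Cluster mode all keys passed to one script must live in the same slot, which a shared hash tag guarantees.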

Hot key contention

A few keys getting hammered (e.g., a single famous user’s bucket) = Redis instance bottleneck for those operations. Mitigation:

  • Sharding: splitting the hot key into several sub-keys, each holding a fraction of the limit, so the load spreads across shards
  • Local caching with eventual consistency (sacrificing precision)
  • Pre-warming during traffic spikes

For most applications, hot key contention isn’t an issue. For consumer-scale apps with celebrity users, it can be.
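As an illustration of the sharding idea applied to a single hot key, one possible reading is to spread the bucket over several sub-keys and give each a slice of the limit. This is an assumption about how to apply it, not a prescribed scheme; `split_key` is a hypothetical helper.

```python
import math
import random
from typing import Optional, Tuple


def split_key(key: str, limit: int, n_shards: int,
              rng: Optional[random.Random] = None) -> Tuple[str, int]:
    """Pick a random sub-key for this request and the limit that sub-key enforces."""
    picker = rng if rng is not None else random
    shard = picker.randrange(n_shards)
    # Each sub-bucket enforces its share of the limit, rounded up.
    return f"{key}:{shard}", math.ceil(limit / n_shards)


# A 1000 req/min bucket for one very popular user, spread over 4 sub-keys.
sub_key, sub_limit = split_key("rl:celebrity", limit=1000, n_shards=4)
```

The price is precision: a request can hit a full sub-bucket while another still has room, and rounding up means the combined limit can slightly exceed the original.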

Observability

Emit metrics:

  • Rate-limit hits (counter, labeled by endpoint)
  • Rate-limit current rate (gauge)
  • Redis call latency (histogram)

Alert on:

  • Spike in rate-limit hits (possible attack)
  • Redis call latency growing (Redis overloaded)
  • Connection failures to Redis (fail open active)
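A rough sketch of where those measurements could be taken, using only in-memory counters so it stays self-contained. `LimiterMetrics` is a name invented here; a real service would forward the same three signals to its metrics client rather than keep them in a list.

```python
import time
from collections import Counter
from typing import Callable, List


class LimiterMetrics:
    """Wraps a limiter check and records denials, errors, and call latency."""

    def __init__(self) -> None:
        self.denied = Counter()            # rate-limit hits, keyed by endpoint
        self.errors = Counter()            # failed limiter calls, keyed by endpoint
        self.latencies_s: List[float] = [] # one sample per limiter call

    def observe(self, endpoint: str, check: Callable[[], bool]) -> bool:
        start = time.perf_counter()
        try:
            allowed = check()
        except Exception:
            self.errors[endpoint] += 1     # signal for the connection-failure alert
            raise
        finally:
            # Latency is recorded for both successful and failed calls.
            self.latencies_s.append(time.perf_counter() - start)
        if not allowed:
            self.denied[endpoint] += 1     # signal for the spike-in-hits alert
        return allowed


metrics = LimiterMetrics()
metrics.observe("login", lambda: True)
metrics.observe("login", lambda: False)
```

Recording latency in the `finally` block means slow failures show up in the histogram too, which is usually when the latency alert matters most.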

Common Pitfalls

Not using Lua/atomic operations. Read-then-write races let concurrent requests both pass the check, so the bucket goes negative and the limit is exceeded.

Forgetting EXPIRE. Keys accumulate; Redis fills up.

Multi-tier limits checked in arbitrary order. Inefficient: check the most-likely-to-deny tier first and bail early.

Per-IP only. Botnets spread across cloud IPs slip under per-IP limits, and mobile carrier NAT puts many legitimate users behind one address.

No graceful degradation. Redis blip = whole service erroring.

Synchronous Redis call blocking the handler. At 10K req/s, this adds up. Use connection pooling and reasonable timeouts.

Wrapping Up

A Lua-based token bucket, or the redis_rate library in Go = production rate limiting. Per-IP + per-user tiers. Fail-open default; fail-closed for auth. Friday: API keys vs OAuth for third parties.