
Building a Rate Limiter in Pure Python — No Redis Required

No Redis. No django-ratelimit. Just a sliding window, a dictionary, and 40 lines of code.


Most rate limiters hide behind libraries. Here's what's actually inside one — a sliding window, a dictionary, and timestamps you control.


While Redis-backed rate limiters and django-ratelimit work well at scale, there are situations where they feel excessive: small internal tools, personal projects, or lightweight APIs. Building one from scratch is also the best way to understand the underlying mechanics.


The Algorithm: Sliding Window Counter


For every incoming request, examine the last N seconds of activity. Count requests from that client in this window. If the count has already reached your limit, reject the request; otherwise record the timestamp and allow it.


The window doesn't reset; it slides. Anything older than the window (60 seconds in the examples below) gets pruned on every request. This avoids the burst problem at window boundaries that plagues fixed-window approaches.
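A quick numeric sketch makes the boundary burst concrete. The scenario below is hypothetical (the timestamps and the `limit` value are illustrative, not from the post):

```python
from collections import Counter

limit = 100  # hypothetical cap per 60-second window

# A client sends 100 requests at t=59.5 and 100 more at t=60.5,
# straddling the boundary between fixed windows [0, 60) and [60, 120).
timestamps = [59.5] * limit + [60.5] * limit

# Fixed windows bucket by window index: each bucket stays within the
# limit, so all 200 requests are allowed within about one second.
fixed_counts = Counter(int(ts // 60) for ts in timestamps)
print(fixed_counts)  # Counter({0: 100, 1: 100})

# A sliding window looks back 60 seconds from the latest request and
# sees all 200 -- it would start rejecting after the first 100.
now = 60.5
in_window = [ts for ts in timestamps if ts > now - 60]
print(len(in_window))  # 200
```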


The Core Logic


import time
from collections import defaultdict

# Stores request timestamps per client key
request_log = defaultdict(list)

def is_rate_limited(client_key: str, limit: int, window_seconds: int) -> bool:
    now = time.time()
    window_start = now - window_seconds
    # Keep only timestamps within the current window
    request_log[client_key] = [
        ts for ts in request_log[client_key]
        if ts > window_start
    ]
    if len(request_log[client_key]) >= limit:
        return True
    request_log[client_key].append(now)
    return False

`request_log` maps client identifiers (IP addresses, user IDs, API keys) to timestamp lists. On each request the function prunes stale timestamps, checks the count, and either blocks or allows the request while recording the new timestamp.
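To see the function in action, here is a minimal self-contained run (the function is repeated so the snippet works on its own; the IP is a placeholder):

```python
import time
from collections import defaultdict

request_log = defaultdict(list)

def is_rate_limited(client_key: str, limit: int, window_seconds: int) -> bool:
    now = time.time()
    window_start = now - window_seconds
    # Keep only timestamps within the current window
    request_log[client_key] = [
        ts for ts in request_log[client_key]
        if ts > window_start
    ]
    if len(request_log[client_key]) >= limit:
        return True
    request_log[client_key].append(now)
    return False

# With limit=3, the first three calls pass and the fourth is blocked
results = [is_rate_limited("10.0.0.1", limit=3, window_seconds=60) for _ in range(4)]
print(results)  # [False, False, False, True]
```

Note that rejected requests don't record a timestamp, so a blocked client isn't pushed further into the penalty box by retrying.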


Plugging It Into Django Middleware


from django.http import JsonResponse

# Reuse the function from the previous snippet; the module
# path is an assumption -- adjust it to wherever you put it.
from yourapp.ratelimit import is_rate_limited

class RateLimitMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response
        self.limit = 60        # max requests
        self.window = 60       # per 60 seconds

    def __call__(self, request):
        client_key = self._get_client_key(request)
        if is_rate_limited(client_key, self.limit, self.window):
            return JsonResponse(
                {"error": "Too many requests. Slow down."},
                status=429
            )
        return self.get_response(request)

    def _get_client_key(self, request):
        # Use forwarded IP if behind a proxy, else REMOTE_ADDR
        forwarded_for = request.META.get("HTTP_X_FORWARDED_FOR")
        if forwarded_for:
            return forwarded_for.split(",")[0].strip()
        return request.META.get("REMOTE_ADDR", "unknown")

Register it in settings.py:


MIDDLEWARE = [
    "yourapp.middleware.RateLimitMiddleware",
    # ... rest of your middleware
]

The IP Extraction Detail Worth Knowing


When Django sits behind a load balancer or reverse proxy (Nginx, AWS ALB), REMOTE_ADDR contains the proxy's IP, not the client's. HTTP_X_FORWARDED_FOR may contain comma-separated IPs; by convention the first one is the original client. Be aware that clients can set this header themselves, so only trust it when your proxy strips or overwrites untrusted values.
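The parsing logic can be pulled out and tested in isolation. This is a sketch (the helper name and fallback value are my own, mirroring what `_get_client_key` does above):

```python
def client_ip_from_forwarded(header_value: str, fallback: str = "unknown") -> str:
    """Return the left-most IP from an X-Forwarded-For value.

    Proxies append to this header, so the left-most entry is the
    original client -- assuming your proxy sanitizes incoming values.
    """
    if not header_value:
        return fallback
    return header_value.split(",")[0].strip()

print(client_ip_from_forwarded("203.0.113.7, 10.0.0.2, 10.0.0.1"))  # 203.0.113.7
print(client_ip_from_forwarded(""))                                  # unknown
```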


Making It Configurable Per-View


For endpoint-specific limits, use a decorator:


from functools import wraps
from django.http import JsonResponse

# Same assumed module path as in the middleware example
from yourapp.ratelimit import is_rate_limited

def rate_limit(limit=30, window=60):
    def decorator(view_func):
        @wraps(view_func)
        def wrapper(request, *args, **kwargs):
            client_key = f"{request.META.get('REMOTE_ADDR')}:{view_func.__name__}"
            if is_rate_limited(client_key, limit, window):
                return JsonResponse({"error": "Rate limit exceeded."}, status=429)
            return view_func(request, *args, **kwargs)
        return wrapper
    return decorator

# Usage
@rate_limit(limit=5, window=60)
def login(request):
    ...

@rate_limit(limit=100, window=60)
def public_feed(request):
    ...

The client_key includes the view name, scoping limits per endpoint so activity on one endpoint doesn't consume quota elsewhere.


What This Doesn't Do — And When to Reach for Redis


This approach has real limitations:


No persistence: the in-memory log vanishes on server restart
No distribution: multiple Gunicorn workers each maintain their own log, allowing clients to multiply quota by hitting different workers
Unbounded growth: under heavy unique-client traffic, memory grows unless cleanup is added
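The memory-growth issue is fixable with a periodic sweep. Here is one possible sketch (the helper name is mine; run it from a background thread, a management command, or a cron-style task):

```python
import time
from collections import defaultdict

request_log = defaultdict(list)

def prune_request_log(window_seconds: int) -> int:
    """Drop stale timestamps and delete empty keys; return keys removed."""
    cutoff = time.time() - window_seconds
    removed_count = 0
    for key in list(request_log):  # list() so we can delete while iterating
        fresh = [ts for ts in request_log[key] if ts > cutoff]
        if fresh:
            request_log[key] = fresh
        else:
            del request_log[key]
            removed_count += 1
    return removed_count

# Example: one client last seen 2 minutes ago, one seen just now
request_log["old-client"] = [time.time() - 120]
request_log["active-client"] = [time.time()]
removed_keys = prune_request_log(60)
print(removed_keys)                # 1
print(sorted(request_log.keys()))  # ['active-client']
```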

Use this when you have a single-process application, a personal project, or an internal tool. Reach for Redis when you have multiple dynos, multiple EC2 instances behind a load balancer, or any distributed setup requiring shared state.


Build the simple version first. You'll know exactly when you've outgrown it.

