
Implement Rate Limiting

A practical, framework-by-framework guide to protecting your APIs from abuse.

What is Rate Limiting?

Rate limiting is a technique that restricts how many requests a user, IP address, or API token can make to a service within a certain time window. It protects resources, ensures fair usage, and helps prevent abuse like accidental request storms, brute-force attacks, or excessive scraping.

Why it matters

Without rate limiting, an API is vulnerable to rapid request spikes that can cause degraded performance or complete outages. Rate limiting enforces predictable load, improves reliability, and protects downstream systems (databases, third-party APIs) from being overwhelmed.

If ignored, attackers or misconfigured clients can consume resources, increase costs, expose data, and create a poor experience for legitimate users.

Common approaches

  • Fixed window: Count requests in fixed time windows (e.g., 100 requests/min). Simple, but permits bursts at window boundaries.
  • Sliding window: Smooths counts across a rolling window; more accurate than fixed windows.
  • Leaky bucket / Token bucket: Good for smoothing bursts; tokens are consumed per request and refilled at a steady rate (a sketch follows this list).
  • Per-key vs. global: Limits can be per-user, per-IP, per-API-key, or global, depending on your needs.
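
To make the token-bucket mechanics concrete, here is a minimal in-memory sketch in TypeScript. It is single-process only, never evicts idle keys, and the capacity and refill rate are illustrative:

// token-bucket.ts (illustrative, single-process, in-memory)
type BucketState = { tokens: number; lastRefill: number };

const CAPACITY = 100;                // maximum tokens per key
const REFILL_PER_SECOND = 100 / 60;  // ~100 tokens per minute

const buckets = new Map<string, BucketState>();

export function tryConsume(key: string): boolean {
  const now = Date.now() / 1000; // seconds
  const state = buckets.get(key) ?? { tokens: CAPACITY, lastRefill: now };

  // Refill in proportion to elapsed time, capped at capacity.
  const elapsed = now - state.lastRefill;
  state.tokens = Math.min(CAPACITY, state.tokens + elapsed * REFILL_PER_SECOND);
  state.lastRefill = now;

  if (state.tokens < 1) {
    buckets.set(key, state);
    return false; // over the limit: caller should respond with 429
  }

  state.tokens -= 1;
  buckets.set(key, state);
  return true;
}

Fixed and sliding windows replace the refill step with per-window counters; the trade-off is simplicity versus burst tolerance.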

Examples & code (by framework)

ASP.NET (C#) — simple middleware with in-memory token bucket

This example shows a minimal middleware using an in-memory token bucket. For production, prefer a distributed store (Redis) when running multiple instances.

// RateLimitingMiddleware.cs
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

public class RateLimitingMiddleware
{
    // Per-key state: remaining tokens and the Unix time of the last refill.
    private static readonly ConcurrentDictionary<string, (int tokens, long lastRefill)> _store = new();
    private readonly RequestDelegate _next;
    private readonly int _capacity = 100;        // bucket size (tokens)
    private readonly int _refillPerMinute = 100; // tokens added per minute

    public RateLimitingMiddleware(RequestDelegate next) { _next = next; }

    public async Task InvokeAsync(HttpContext context)
    {
        var key = context.Connection.RemoteIpAddress?.ToString() ?? "anon";
        var now = DateTimeOffset.UtcNow.ToUnixTimeSeconds();

        // AddOrUpdate returns the value it just stored, so we check that result
        // directly instead of re-reading the dictionary (which would race).
        var current = _store.AddOrUpdate(key,
            _ => (_capacity - 1, now),
            (_, state) => {
                var (tokens, lastRefill) = state;
                var minutes = (now - lastRefill) / 60;
                if (minutes > 0)
                {
                    var refill = (int)(minutes * _refillPerMinute);
                    // Math.Max clears the -1 sentinel before refilling.
                    tokens = Math.Min(_capacity, Math.Max(tokens, 0) + refill);
                    lastRefill = now;
                }

                // -1 signals that no token was available for this request.
                if (tokens <= 0) return (-1, lastRefill);
                return (tokens - 1, lastRefill);
            });

        if (current.tokens < 0)
        {
            context.Response.StatusCode = StatusCodes.Status429TooManyRequests;
            await context.Response.WriteAsync("Too many requests");
            return;
        }

        await _next(context);
    }
}

Explanation: This middleware tracks tokens per IP in a concurrent dictionary. On every request it refills tokens based on elapsed time and consumes one. If no tokens remain, it returns HTTP 429. For multi-instance deployments use Redis or another shared datastore.

Spring Boot (Java) — using bucket4j + Redis

Bucket4j is a proven token-bucket library. The snippet below shows a basic filter wiring a Redis-backed bucket.

// RateLimitFilter.java (simplified; package names follow Bucket4j 8.x and may differ in other versions)
import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import io.github.bucket4j.BucketConfiguration;
import io.github.bucket4j.Refill;
import io.github.bucket4j.distributed.proxy.ProxyManager;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.time.Duration;
import java.util.function.Supplier;

public class RateLimitFilter implements Filter {
    private ProxyManager<String> buckets; // assume configured with Redis or JCache

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        HttpServletRequest req = (HttpServletRequest) request;
        String key = req.getRemoteAddr();

        // The supplier builds the bucket's configuration, not the bucket itself;
        // the ProxyManager materializes a distributed bucket for the key.
        Supplier<BucketConfiguration> configSupplier = () -> BucketConfiguration.builder()
            .addLimit(Bandwidth.classic(100, Refill.intervally(100, Duration.ofMinutes(1))))
            .build();
        Bucket bucket = buckets.builder().build(key, configSupplier);

        if (bucket.tryConsume(1)) {
            chain.doFilter(request, response);
        } else {
            ((HttpServletResponse) response).setStatus(429);
            response.getWriter().write("Too many requests");
        }
    }
}

Explanation: Bucket4j abstracts token buckets and supports distributed stores. Configure a ProxyManager backed by Redis (or other store) so buckets are shared between instances. This filter consumes a token per request and rejects when empty.

Express (Node) — express-rate-limit (memory) and Redis alternative

For single-instance apps, express-rate-limit is easy. For clustered apps, use a Redis store like rate-limit-redis.

// app.js
const express = require('express');
const rateLimit = require('express-rate-limit');
// const RedisStore = require('rate-limit-redis');
// const Redis = require('ioredis');

const app = express();

const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100, // limit each IP to 100 requests per windowMs
  message: 'Too many requests, please try again later.'
});

app.use(limiter);

app.get('/', (req, res) => res.send('hello'));
app.listen(3000);

Explanation: This sets a simple fixed-window limit per IP. Replace the in-memory store with a Redis-backed store for multiple processes to share counters.
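
For the clustered case, the wiring might look like the sketch below. It assumes rate-limit-redis v4 and ioredis; the import shape has changed between major versions, so check the documentation for the versions you install.

// app.js (clustered variant, a sketch assuming rate-limit-redis v4 + ioredis)
const express = require('express');
const rateLimit = require('express-rate-limit');
const { RedisStore } = require('rate-limit-redis'); // named export in v4; older versions export the class directly
const Redis = require('ioredis');

const app = express();
const client = new Redis(); // defaults to redis://localhost:6379

const limiter = rateLimit({
  windowMs: 60 * 1000,
  max: 100,
  // Counters now live in Redis, so every Express process shares the same limits.
  store: new RedisStore({
    sendCommand: (...args) => client.call(...args),
  }),
});

app.use(limiter);
app.get('/', (req, res) => res.send('hello'));
app.listen(3000);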

Next.js (Edge / API routes) — using middleware

Next.js middleware (Edge) can intercept requests before hitting API routes. Use an external store (Redis) to track counts.

// middleware.ts (Next.js app router)
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

export async function middleware(request: NextRequest) {
  const ip = request.ip ?? 'unknown';
  // pseudo-code: increment counter in Redis and check
  // const remaining = await redis.decr(`rate:${ip}`);
  // if (remaining < 0) return new NextResponse('Too many requests', { status: 429 });

  return NextResponse.next();
}

export const config = { matcher: '/api/:path*' };

Explanation: The middleware runs at the edge and is a good place to enforce global API limits. Use Redis INCR with EXPIRE, or a Lua script, for atomic counters.
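
A minimal sketch of that counter, assuming a Node runtime and ioredis (the Edge runtime cannot open raw TCP connections, so there you would reach for an HTTP-based Redis client instead); the key scheme and limit are illustrative:

// lib/rate-limit.ts (illustrative fixed-window counter with atomic INCR)
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

export async function allowRequest(ip: string, limit = 100): Promise<boolean> {
  const key = `rate:${ip}`;             // one counter per IP per window
  const count = await redis.incr(key);  // atomic across all app instances
  if (count === 1) {
    // First hit of the window: start the 60-second TTL. A Lua script or
    // SET with EX and NX would fuse increment and expiry into one atomic step.
    await redis.expire(key, 60);
  }
  return count <= limit;
}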

Flask (Python) — Flask-Limiter (Redis backend)

Flask-Limiter is a mature extension. Configure it with Redis to work across multiple workers.

# app.py
from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)
# Flask-Limiter 3.x takes key_func as the first positional argument;
# older 2.x releases used Limiter(app, key_func=...).
limiter = Limiter(get_remote_address, app=app, storage_uri="redis://localhost:6379")

@app.route('/')
@limiter.limit("100/minute")
def index():
    return "Hello"

if __name__ == '__main__':
    app.run()

Explanation: The decorator @limiter.limit("100/minute") sets the policy. Flask-Limiter supports many storage backends and flexible rate expressions.

Laravel (PHP) — built-in rate limiting

Since Laravel 8, the framework provides rate limiting via the RateLimiter facade. The example below defines a per-user limit that falls back to the client IP for unauthenticated requests.

// App\Providers\RouteServiceProvider.php (snippet)
use Illuminate\Cache\RateLimiting\Limit;
use Illuminate\Support\Facades\RateLimiter;
use Illuminate\Http\Request;

public function boot()
{
    RateLimiter::for('api', function (Request $request) {
        return Limit::perMinute(100)->by($request->user()?->id ?: $request->ip());
    });
}

Explanation: Laravel lets you name rate limiters and attach them to routes via the throttle middleware (e.g., Route::middleware('throttle:api')). The limit is applied per user ID, falling back to IP if unauthenticated. For distributed apps, the cache driver must be shared (Redis, Memcached).

Testing & deployment notes

  • Always test limits with realistic clients (simulate bursts and slow clients).
  • Expose helpful headers (e.g., X-RateLimit-Limit, X-RateLimit-Remaining, Retry-After) so clients can back off gracefully (see the sketch after this list).
  • Use a distributed store (Redis) for multi-instance apps and to avoid "split-brain" counters.
  • Monitor and alert on 429 rates — a sudden rise can mean an attack or misconfiguration.
  • Consider exemptions (internal services, health checks) and separate quotas for important clients.
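
To illustrate the headers bullet above, here is a minimal hand-rolled fixed-window middleware for Express; the header names and values are illustrative, and express-rate-limit can emit equivalent headers for you.

// rate-limit-headers.ts (illustrative Express middleware)
import express from 'express';

const app = express();
const LIMIT = 100;
const WINDOW_MS = 60_000; // 1 minute
const hits = new Map<string, { count: number; resetAt: number }>();

app.use((req, res, next) => {
  const key = req.ip ?? 'anon';
  const now = Date.now();

  let entry = hits.get(key);
  if (!entry || now >= entry.resetAt) {
    entry = { count: 0, resetAt: now + WINDOW_MS }; // start a fresh window
    hits.set(key, entry);
  }
  entry.count += 1;

  // Tell clients where they stand on every response.
  res.set('X-RateLimit-Limit', String(LIMIT));
  res.set('X-RateLimit-Remaining', String(Math.max(0, LIMIT - entry.count)));

  if (entry.count > LIMIT) {
    // Tell rejected clients when to come back.
    res.set('Retry-After', String(Math.ceil((entry.resetAt - now) / 1000)));
    return res.status(429).send('Too many requests');
  }
  next();
});

app.listen(3000);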

Conclusion

Rate limiting is a critical part of any resilient API. It prevents abuse, keeps costs predictable, and protects downstream services. Ignoring it can lead to downtime, inflated bills, and a poor experience for legitimate consumers. Implement sensible defaults, choose an approach that fits your scale (in-memory for simple apps, Redis or another distributed store for clusters), and communicate limits clearly to clients via headers and documentation.
