Taming the "Thundering Herd": Protecting Your Database from Mass Cache Misses

3 Apr 20226 min read

In our previous post, we designed a scalable schema for a Reddit-like comment system. We chose our primary keys, set up parent-child relationships, and implemented a Cache-Aside pattern to keep our database from sweating.

But imagine this: A breaking news story hits the front page. Within seconds, 50,000 users click the thread. The cache is empty because the story is brand new.

Suddenly, 50,000 concurrent threads all realize the cache is empty at the exact same millisecond. They all "thunder" toward your database to run the same expensive SQL query. This is the Thundering Herd Problem (also known as a Cache Stampede).

The Anatomy of a Stampede

In a standard LLD implementation, your "Get Comments" logic probably looks like this:

async function getComments(postId: string) {
    let comments = await redis.get(postId);
    
    if (!comments) {
        // THE DANGER ZONE: 50k requests enter here simultaneously
        comments = await db.query("SELECT * FROM comments WHERE post_id = ?", [postId]);
        await redis.set(postId, comments, 'EX', 3600);
    }
    
    return comments;
}

When the cache expires or is cold, the database isn't just hit once; it’s hit by every single active worker. This spike in CPU and connection usage can crash your DB before the first worker even has a chance to write the result back to the cache.

Strategy A: Request Coalescing (Promises as a Shield)

The most elegant "Low-Level" fix is to ensure that for any given key, only one request is allowed to talk to the database at a time.

// Map<key, Promise>
const inFlight = new Map();

async function getItem(key) {
  const cached = cache.get(key);
  if (cached) return cached;

  // Promise shield will be added here
}

async function getItem(key) {
  const cached = cache.get(key);
  if (cached) return cached;

  if (inFlight.has(key)) {
    // 🛡 Shield active — wait for same promise
    return inFlight.get(key);
  }

  // First request becomes the leader
  const promise = (async () => {
    try {
      const value = await db.fetch(key);
      cache.set(key, value);
      return value;
    } finally {
      // IMPORTANT: cleanup
      inFlight.delete(key);
    }
  })();

  inFlight.set(key, promise);
  return promise;
}

  // All waiting callers receive the same resolved value
  const [a, b, c] = await Promise.all([
    getItem("42"),
    getItem("42"),
    getItem("42")
  ]);
  
  // Only ONE db.fetch("42") executed

How it works: Instead of storing just data in your application's memory, you store a Promise.
The Logic: If Request A sees a cache miss, it creates a "Loading Promise" and puts it in a local map. When Request B comes in a millisecond later, it sees the Promise is already "In-Flight" and simply waits for Request A to finish.

Instagram famously used this "Promises" approach to scale their infrastructure. They didn't just cache the value; they cached the intent to fetch the value.

Strategy B: Adding "Jitter" to TTLs

Sometimes we cause our own thundering herds by being too organized. If you set a 24-hour expiry on 1 million comments at exactly midnight, they will all expire at exactly midnight tomorrow.

// ❌ All keys expire at the same time
cache.set("user:1", data1, { ttl: 60_000 });
cache.set("user:2", data2, { ttl: 60_000 });
cache.set("user:3", data3, { ttl: 60_000 });

The Fix: Add Jitter (randomness).

  function withJitter(baseTtlMs, jitterMs) {
    const delta = Math.floor(Math.random() * jitterMs);
    return baseTtlMs - delta;
  }
  
  // Usage
  cache.set("user:1", data, {
    ttl: withJitter(60_000, 10_000) // 50s – 60s
  });

There are more ways to add Jitter which will be covered in upcoming post.

Formula: $TTL = BaseTTL + random(0, MaxJitter)$

By spreading the expiration over a 10-minute window, you transform a vertical spike into a manageable "hill" of traffic.

Strategy C: The "Stale-While-Revalidate" Pattern

For a system like Reddit, is it life-threatening if a user sees a comment 5 seconds late? Usually not.

We can use a Soft TTL and a Hard TTL:

Soft TTL (e.g., 5 mins): When this expires, the first user who hits it triggers a background refresh but is immediately served the old data.
Hard TTL (e.g., 60 mins): The data is deleted only if it hasn't been refreshed for an hour.

This ensures your users never wait on a database query, and your database never sees a spike.

Strategy D: The Tamed Herd (Lock + Cache Protection)

If Strategy A (Coalescing) is about managing memory, and Strategy B (Jitter) is about spreading the load, the Tamed Herd is about absolute enforcement.

In this model, we use a Distributed Lock (typically via Redis) to ensure that even if 100 separate application servers realize the cache is empty, only one single server is granted "permission" to query the database. The others are "tamed"—they are forced to wait and then re-check the cache.

How the "Tame" Flow Works:

The Trigger: A hot post (e.g., a viral Reddit thread) expires from the cache.
The Race: 1,000 concurrent requests hit your API. They all see a cache miss.
The Lock (SET NX): Each request tries to acquire a lock in Redis using the SET NX (Not Exists) command.
- The Winner: One request gets the lock and proceeds to the Database.
- The Herd: The other 999 requests fail to get the lock and enter a brief "sleep and retry" loop.
The Update: The Winner fetches from the DB, populates the cache, and releases the lock.
The Resolution: The 999 waiting requests wake up, see the cache is now full, and return the data without ever touching the database.

LLD Implementation (The Double-Check Lock)

This pattern is very similar to the "Double-Checked Locking" used in Singleton patterns, but scaled for distributed systems.

async function getCommentsTamed(postId: string) {
    // 1. Fast path: Check Cache
    let data = await cache.get(postId);
    if (data) return data;

    const lockKey = `lock:${postId}`;
    
    // 2. Try to acquire the lock (expires in 10s to prevent deadlocks)
    const acquired = await redis.set(lockKey, "locked", "NX", "EX", 10);

    if (acquired) {
        try {
            // 3. DOUBLE CHECK: Someone might have filled the cache 
            // while we were waiting for the lock!
            data = await cache.get(postId);
            if (data) return data;

            // 4. The Protected Query
            const result = await db.query("SELECT * FROM comments WHERE post_id = ?", [postId]);
            await cache.set(postId, result, "EX", 3600);
            return result;
        } finally {
            // 5. Always release the lock
            await redis.del(lockKey);
        }
    } else {
        // 6. Tamed: Wait 50ms and retry
        await sleep(50);
        return getCommentsTamed(postId);
    }
}

Database Safety: It guarantees $O(1)$ load on your database for any specific key, regardless of how many millions of users are active.
Cluster-Wide Sync: Unlike Strategy A (which only works within one server), the Tamed Herd works across your entire global infrastructure.

Comparison

Strategy	Complexity	Best For	DB Load
Jitter	Low	General traffic spikes	Reduced, but still $O(N)$
Request Coalescing	Medium	Saving App Memory / Node.js apps	1 query per server
Tamed Herd (Locking)	High	Hot Keys (Viral content)	1 query per cluster