📘 Load Is Not Traffic

3 Apr 2022 · 3 min read

Why “10k RPS” tells you almost nothing

Systems don’t fail because of traffic.
They fail because of work.

1️⃣ The Most Dangerous Metric in System Design

RPS (Requests Per Second)

It’s easy to measure.
It’s easy to graph.
It’s deeply misleading.

Two systems at 1k RPS can differ by 10× or 100× in actual load.

2️⃣ Traffic vs Load

Traffic

  • Number of incoming requests

  • Measured in RPS

  • External view

Load

  • CPU time

  • Memory pressure

  • DB queries

  • Network I/O

  • Lock contention

Load is internal. Traffic is external.

3️⃣ Why This Distinction Matters

You can have:

  • Low traffic

  • High load

Or:

  • High traffic

  • Low load

Only load determines failure.

4️⃣ Example 1: Same Traffic, Different Load

Two endpoints

Endpoint   CPU cost per request
/ping      0.2 ms
/feed      50 ms

At 1,000 RPS:

/ping  → 0.2 CPU-seconds/sec
/feed  → 50 CPU-seconds/sec

/feed melts the system: it needs the equivalent of 50 fully busy cores.
/ping barely registers, at a fifth of one core.
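
The arithmetic above is request rate multiplied by per-request CPU cost. A minimal sketch of that calculation follows; the helper name `cpuSecondsPerSecond` is made up for illustration, and the costs are the ones from the table.

```javascript
// Hypothetical helper: CPU-seconds consumed per wall-clock second.
function cpuSecondsPerSecond(rps, cpuMsPerRequest) {
  return (rps * cpuMsPerRequest) / 1000;
}

// CPU cost per request in ms, taken from the table above.
const endpoints = { "/ping": 0.2, "/feed": 50 };

for (const [path, costMs] of Object.entries(endpoints)) {
  const demand = cpuSecondsPerSecond(1000, costMs);
  // One core supplies at most 1 CPU-second per second,
  // so `demand` is also the minimum number of busy cores.
  console.log(`${path}: ${demand.toFixed(1)} CPU-seconds/sec`);
}
```

Because one core supplies at most one CPU-second per second, the result can be read directly as a minimum core count.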

5️⃣ Why Autoscaling Fails Here

Autoscaling typically triggers on request rate or on CPU usage averaged over a window, not on how work is distributed across requests.

If traffic shifts from /ping to /feed:

  • RPS unchanged

  • CPU explodes

  • Autoscaling reacts too late
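
To see how a shift in request mix changes CPU demand while RPS stays flat, here is a rough sketch. The 90/10 and 50/50 splits are hypothetical; the per-request costs come from Example 1.

```javascript
// CPU cost per request in ms, taken from Example 1.
const COST_MS = { "/ping": 0.2, "/feed": 50 };

// `mix` maps each path to its share of total traffic (shares sum to 1).
function cpuDemandCores(totalRps, mix) {
  let cpuMsPerSec = 0;
  for (const [path, share] of Object.entries(mix)) {
    cpuMsPerSec += totalRps * share * COST_MS[path];
  }
  return cpuMsPerSec / 1000; // cores' worth of CPU needed
}

// Same 1,000 RPS in both cases; only the mix changes (hypothetical splits).
const before = cpuDemandCores(1000, { "/ping": 0.9, "/feed": 0.1 });
const after = cpuDemandCores(1000, { "/ping": 0.5, "/feed": 0.5 });

console.log(`before: ${before.toFixed(2)} cores, after: ${after.toFixed(2)} cores`);
```

With these assumed splits, demand moves from roughly 5 cores to roughly 25 while an RPS graph stays flat.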

6️⃣ Load Is Multi-Dimensional

Load is not one thing.

Resource   Load example
CPU        JSON serialization
Memory     Large in-flight buffers
DB         Cache misses
Network    Large responses
Locks      Contended mutex

You overload the weakest dimension first.
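
One way to find the weakest dimension is to compare usage against capacity for each resource and take the highest ratio. The sketch below assumes you already collect these figures; the metric names and all numbers are hypothetical.

```javascript
// Returns the resource with the highest utilization (usage / capacity).
function bottleneck(usage, capacity) {
  let worst = { resource: null, utilization: -Infinity };
  for (const resource of Object.keys(capacity)) {
    const utilization = (usage[resource] ?? 0) / capacity[resource];
    if (utilization > worst.utilization) {
      worst = { resource, utilization };
    }
  }
  return worst;
}

// Hypothetical capacity and usage figures, for illustration only.
const capacity = { cpuMsPerSec: 8000, memoryMb: 16000, dbQueriesPerSec: 2000, networkMbPerSec: 1000 };
const usage = { cpuMsPerSec: 3200, memoryMb: 4000, dbQueriesPerSec: 1900, networkMbPerSec: 200 };

console.log(bottleneck(usage, capacity));
```

In this made-up case the CPU is at 40% while the database is at 95%, so a CPU graph alone would hide the dimension that fails first.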

7️⃣ Example 2: DB Load Hiding Behind Low Traffic

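// Anti-pattern: every request reads and returns the entire users table.
app.get("/stats", async (req, res) => {
  const users = await db.query("SELECT * FROM users"); // full table scan on each call
  res.send(users); // response size grows with the table
});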

Traffic:

10 RPS

DB Load:

Full table scan × 10/sec

System dies quietly.
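
To put rough numbers on this, the sketch below multiplies the request rate by the table size. The row count and average row size are assumptions, since the example does not say how large `users` is.

```javascript
// Hypothetical table dimensions; the example above does not specify them.
const TABLE_ROWS = 2_000_000;
const AVG_ROW_BYTES = 200;

// Work generated by an endpoint that scans and returns the whole table.
function fullScanLoad(rps, rows, rowBytes) {
  return {
    rowsScannedPerSec: rps * rows,
    bytesReturnedPerSec: rps * rows * rowBytes,
  };
}

const load = fullScanLoad(10, TABLE_ROWS, AVG_ROW_BYTES);
console.log(load);
```

Under these assumptions, 10 RPS translates into tens of millions of rows scanned per second, which no RPS dashboard would flag.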

8️⃣ The Real Unit of Load: Work Units

Experienced teams measure load in:

  • CPU-milliseconds

  • DB queries/sec

  • Rows scanned/sec

  • Bytes moved/sec

Not RPS.
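
As one way to obtain a CPU-milliseconds figure in Node.js, the sketch below wraps a synchronous function with `process.cpuUsage()`. The helper name `measureCpuMs` is hypothetical. Note that `process.cpuUsage()` reports CPU time for the whole process, so the number is only meaningful when nothing else is running concurrently.

```javascript
// Runs a synchronous function and returns its result plus the CPU time
// (user + system, in ms) the process consumed while it ran.
function measureCpuMs(fn) {
  const start = process.cpuUsage();
  const result = fn();
  const delta = process.cpuUsage(start); // microseconds elapsed since `start`
  return { result, cpuMs: (delta.user + delta.system) / 1000 };
}

// Example workload: serialize a moderately large array of objects.
const payload = Array.from({ length: 50_000 }, (_, i) => ({ id: i, name: `user-${i}` }));
const { result, cpuMs } = measureCpuMs(() => JSON.stringify(payload));

console.log(`serialized ${result.length} characters using ~${cpuMs.toFixed(1)} CPU-ms`);
```

Recording this per endpoint gives a cost table like the one in Example 1, built from measurements rather than guesses.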

9️⃣ Capacity Planning Without Load Awareness Fails

❌ Naive capacity plan

“We handle 5k RPS.”

✅ Real capacity plan

“We handle 200 CPU-ms/sec per core, with 5 DB queries per request.”
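
A work-based plan can still be turned into an RPS figure, but only as an output derived from per-request costs. The sketch below is a simplified model with entirely hypothetical inputs (core count, target utilization, per-request CPU cost, DB query ceiling); substitute measured values.

```javascript
// Derives a sustainable request rate from per-request work and resource limits.
// All parameter names and values are illustrative assumptions.
function sustainableRps({ cores, targetUtilization, cpuMsPerRequest, dbQueriesPerRequest, dbMaxQueriesPerSec }) {
  // Each core supplies 1000 CPU-ms per second; keep headroom via targetUtilization.
  const cpuBound = (cores * 1000 * targetUtilization) / cpuMsPerRequest;
  const dbBound = dbMaxQueriesPerSec / dbQueriesPerRequest;
  return { cpuBound, dbBound, limit: Math.min(cpuBound, dbBound) };
}

const plan = sustainableRps({
  cores: 8,
  targetUtilization: 0.7,
  cpuMsPerRequest: 20,
  dbQueriesPerRequest: 5,
  dbMaxQueriesPerSec: 1000,
});

console.log(plan);
```

With these numbers the database caps throughput well before the CPU does, which a single RPS figure would never reveal.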

1️⃣0️⃣ Load Is Variable Per Request

Even the same endpoint varies:

  • Cache hit vs miss

  • Hot shard vs cold shard

  • Small vs large payload

Planning for averages guarantees tail failures.
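
A small sketch of why the average misleads: with a hypothetical mix of 95 cache hits at 2 ms and 5 cache misses at 80 ms, the mean and the 99th percentile differ by more than an order of magnitude. The percentile uses the nearest-rank method.

```javascript
// Nearest-rank percentile; p is an integer percentage (for example 99).
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Hypothetical per-request costs: 95 cache hits at 2 ms, 5 misses at 80 ms.
const costsMs = [...Array(95).fill(2), ...Array(5).fill(80)];

console.log(`mean: ${mean(costsMs)} ms, p99: ${percentile(costsMs, 99)} ms`);
```

A plan sized for the mean cost leaves no room for the requests that cost many times more.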

1️⃣1️⃣ Code Smell: Treating Traffic as Load ❌

if (rps > 5000) {
  reject();
}

This protects nothing: a few hundred expensive requests can saturate the CPU long before the counter reaches 5,000.

1️⃣2️⃣ Correct Pattern: Load-Based Admission Control

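// Budget in estimated CPU-ms (illustrative value). As written, `used` tracks
// the cost of requests currently in flight, so this caps concurrent work.
const cpuBudget = 8000;
let used = 0;

app.get("/data", async (req, res) => {
  // estimateCost() and work() are application-specific and not shown here.
  const cost = estimateCost(req);

  // Reject up front if admitting this request would exceed the budget.
  if (used + cost > cpuBudget) {
    return res.status(503).send("Overloaded");
  }

  used += cost; // reserve budget before starting the work
  try {
    res.send(await work());
  } finally {
    used -= cost; // release the reservation even if work() throws
  }
});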

Control work, not requests.
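
The pattern depends on a cost estimate per request, which the snippet leaves undefined. One possible approach, sketched here as an assumption rather than the author's method, keeps an exponentially weighted moving average of measured cost per route. The class name `CostEstimator` and its parameters are hypothetical.

```javascript
// Hypothetical per-route cost estimator based on an exponentially
// weighted moving average (EWMA) of measured CPU-ms.
class CostEstimator {
  constructor(alpha = 0.2, defaultCostMs = 10) {
    this.alpha = alpha;                 // weight given to the newest measurement
    this.defaultCostMs = defaultCostMs; // used for routes with no history yet
    this.estimates = new Map();
  }

  estimate(route) {
    return this.estimates.get(route) ?? this.defaultCostMs;
  }

  observe(route, measuredCostMs) {
    const prev = this.estimates.get(route);
    const next = prev === undefined
      ? measuredCostMs
      : this.alpha * measuredCostMs + (1 - this.alpha) * prev;
    this.estimates.set(route, next);
  }
}

const estimator = new CostEstimator(0.5, 10);
estimator.observe("/feed", 50);
estimator.observe("/feed", 30);

console.log(estimator.estimate("/feed"), estimator.estimate("/unknown"));
```

A handler would call `estimate()` before admitting a request and `observe()` with the measured cost once it finishes, so the estimate follows changes such as a falling cache hit rate.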

1️⃣3️⃣ Why Tail Latency Depends on Load, Not Traffic

  • Low traffic + heavy requests → high P99

  • High traffic + light requests → low P99

Tail latency tracks resource contention, not request count.
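
A rough way to see this is a textbook M/M/1 queue: one server, Poisson arrivals, exponentially distributed service times, first-come first-served. Real services rarely match those assumptions, so treat the sketch below as an illustration of the trend, not a prediction. The request rates and service times are hypothetical.

```javascript
// p99 response time (ms) for an M/M/1 queue. Response time in this model is
// exponentially distributed with rate (serviceRate - arrivalRate).
function p99ResponseMs(rps, meanServiceMs) {
  const utilization = (rps * meanServiceMs) / 1000;
  if (utilization >= 1) return Infinity; // the queue grows without bound
  return (meanServiceMs * Math.log(100)) / (1 - utilization);
}

const heavy = p99ResponseMs(15, 60);    // low traffic, heavy requests (90% utilized)
const light = p99ResponseMs(1000, 0.2); // high traffic, light requests (20% utilized)

console.log(`heavy: ${heavy.toFixed(0)} ms, light: ${light.toFixed(2)} ms`);
```

In this model the 15 RPS workload has a P99 measured in seconds, while the 1,000 RPS workload stays near one millisecond, because utilization rather than request count drives the queue.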