📘 Load Is Not Traffic

3 Apr 2022 · 3 min read

Why “10k RPS” tells you almost nothing

Systems don’t fail because of traffic.
They fail because of work.

1️⃣ The Most Dangerous Metric in System Design

RPS (Requests Per Second)

It’s easy to measure.
It’s easy to graph.
It’s deeply misleading.

Two systems at 1k RPS can differ by 10× or 100× in actual load.

2️⃣ Traffic vs Load

Traffic

Number of incoming requests
Measured in RPS
External view

Load

CPU time
Memory pressure
DB queries
Network I/O
Lock contention

Load is internal. Traffic is external.

3️⃣ Why This Distinction Matters

You can have:

Low traffic
High load

Or:

High traffic
Low load

Only load determines failure.

4️⃣ Example 1 — Same Traffic, Different Load

Two endpoints

Endpoint	CPU Cost
`/ping`	0.2 ms
`/feed`	50 ms

At 1,000 RPS:

/ping  → 0.2 CPU-seconds/sec
/feed  → 50 CPU-seconds/sec

One melts the system.
One barely registers.
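To make the arithmetic concrete, here is a minimal sketch that turns a request rate and a per-request CPU cost into a core count. The function name `coresNeeded` and the 70% target utilization are assumptions chosen for illustration; the per-request costs come from the table above.

```javascript
// CPU-seconds of work demanded per second, i.e. how many cores are kept busy.
function cpuSecondsPerSecond(rps, cpuMsPerRequest) {
  return (rps * cpuMsPerRequest) / 1000;
}

// Cores required to serve that demand while staying at or below a target
// utilization (assumed value; pick your own headroom).
function coresNeeded(rps, cpuMsPerRequest, targetUtilization) {
  return Math.ceil(cpuSecondsPerSecond(rps, cpuMsPerRequest) / targetUtilization);
}

const costMs = { "/ping": 0.2, "/feed": 50 }; // CPU ms per request, from the table

for (const [path, ms] of Object.entries(costMs)) {
  console.log(`${path}: ${coresNeeded(1000, ms, 0.7)} core(s) at 1,000 RPS`);
}
```

Under these assumptions the same 1,000 RPS fits on a single core for /ping and needs dozens of cores for /feed.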

5️⃣ Why Autoscaling Fails Here

Autoscaling usually triggers on request rate or on averaged CPU usage. Neither sees a change in the request mix until the extra work has already landed.

If traffic shifts from /ping to /feed:

RPS unchanged
CPU explodes
Autoscaling reacts too late
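The mix shift can be sketched as a small calculation: hold RPS constant and vary the share of requests that hit /feed. The function `cpuDemand` and the chosen shares are illustrative assumptions; the costs are the ones from Example 1.

```javascript
// Per-request CPU cost in ms, taken from Example 1.
const COST_MS = { ping: 0.2, feed: 50 };

// CPU-seconds/sec demanded at a fixed RPS when `feedShare` (0..1) of the
// requests hit /feed and the remainder hit /ping.
function cpuDemand(rps, feedShare) {
  const avgCostMs = feedShare * COST_MS.feed + (1 - feedShare) * COST_MS.ping;
  return (rps * avgCostMs) / 1000;
}

// RPS stays at 1,000 in every row; only the mix changes.
for (const share of [0, 0.1, 0.5, 1]) {
  console.log(`feed share ${share}: ${cpuDemand(1000, share).toFixed(2)} CPU-seconds/sec`);
}
```

An RPS graph is flat across all four rows, which is why a traffic-based scaling signal gives no warning here.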

6️⃣ Load Is Multi-Dimensional

Load is not one thing.

Resource	Load Example
CPU	JSON serialization
Memory	Large in-flight buffers
DB	Cache misses
Network	Large responses
Locks	Contended mutex

You overload the weakest dimension first.
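One way to find the weakest dimension is to compare demand against capacity per resource and take the highest ratio. This is only a sketch: the resource names and every number below are hypothetical, and lock contention is left out because it does not reduce to a simple capacity figure.

```javascript
// Returns the resource with the highest utilization (demand / capacity).
function bottleneck(demand, capacity) {
  let worst = null;
  for (const resource of Object.keys(capacity)) {
    const utilization = (demand[resource] ?? 0) / capacity[resource];
    if (worst === null || utilization > worst.utilization) {
      worst = { resource, utilization };
    }
  }
  return worst;
}

// Hypothetical capacity and current demand, for illustration only.
const capacity = { cpuMsPerSec: 8000, memoryMb: 16000, dbQueriesPerSec: 2000, networkMbPerSec: 1000 };
const demand = { cpuMsPerSec: 2400, memoryMb: 6000, dbQueriesPerSec: 1900, networkMbPerSec: 150 };

const weakest = bottleneck(demand, capacity);
console.log(weakest); // the DB dimension, although CPU sits at only 30%
```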

7️⃣ Example 2 — DB Load Hiding Behind Low Traffic

app.get("/stats", async (req, res) => {
  const users = await db.query("SELECT * FROM users");
  res.send(users);
});

Traffic:

10 RPS

DB Load:

Full table scan × 10/sec

System dies quietly.

8️⃣ The Real Unit of Load: Work Units

Mature systems measure load in:

CPU-milliseconds
DB queries/sec
Rows scanned/sec
Bytes moved/sec

Not RPS.
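A minimal sketch of what recording work units could look like. The `WorkMeter` class, its field names, and the sample numbers (a 1,000,000-row table behind the /stats endpoint from Example 2) are all assumptions made for illustration, not part of any library.

```javascript
// Accumulates work units across requests and reports per-second rates.
class WorkMeter {
  constructor() {
    this.totals = { requests: 0, cpuMs: 0, dbQueries: 0, rowsScanned: 0, bytesMoved: 0 };
  }

  // Call once per finished request with whatever the handler measured.
  record({ cpuMs = 0, dbQueries = 0, rowsScanned = 0, bytesMoved = 0 }) {
    this.totals.requests += 1;
    this.totals.cpuMs += cpuMs;
    this.totals.dbQueries += dbQueries;
    this.totals.rowsScanned += rowsScanned;
    this.totals.bytesMoved += bytesMoved;
  }

  // Average rate of each unit over a window of `seconds`.
  ratesOver(seconds) {
    const rates = {};
    for (const [unit, total] of Object.entries(this.totals)) {
      rates[`${unit}PerSec`] = total / seconds;
    }
    return rates;
  }
}

// Hypothetical: ten /stats requests in one second, each scanning 1,000,000 rows.
const meter = new WorkMeter();
for (let i = 0; i < 10; i++) {
  meter.record({ cpuMs: 40, dbQueries: 1, rowsScanned: 1_000_000, bytesMoved: 50_000_000 });
}
const rates = meter.ratesOver(1);
console.log(rates);
```

The request rate in this output is small; the rows-scanned and bytes-moved rates are where the problem shows up.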

9️⃣ Capacity Planning Without Load Awareness Fails

❌ Naive capacity plan

“We handle 5k RPS.”

✅ Real capacity plan

“We handle 200 CPU-ms/sec per core, with 5 DB queries per request.”
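A load-aware plan can be reduced to a small calculation: work out the request rate each resource can sustain and take the minimum. The sketch below uses hypothetical inputs throughout (core count, per-request CPU cost, DB query ceiling, target utilization); only the 5 queries per request echoes the plan quoted above.

```javascript
// Sustainable RPS is set by whichever resource runs out first.
function maxSustainableRps({ cores, cpuMsPerRequest, dbQueriesPerRequest, dbMaxQueriesPerSec, targetUtilization }) {
  // Each core offers 1,000 CPU-ms per second; keep some headroom.
  const cpuLimit = (cores * 1000 * targetUtilization) / cpuMsPerRequest;
  const dbLimit = (dbMaxQueriesPerSec * targetUtilization) / dbQueriesPerRequest;
  return { cpuLimit, dbLimit, rps: Math.min(cpuLimit, dbLimit) };
}

// Hypothetical inputs, for illustration only.
const plan = maxSustainableRps({
  cores: 8,
  cpuMsPerRequest: 4,
  dbQueriesPerRequest: 5,
  dbMaxQueriesPerSec: 6000,
  targetUtilization: 0.7,
});
console.log(plan);
```

With these numbers the CPU could carry more traffic than the database can, so the plan is DB-bound. A single RPS figure would hide that.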

1️⃣0️⃣ Load Is Variable Per Request

Even the same endpoint varies:

Cache hit vs miss
Hot shard vs cold shard
Small vs large payload

Planning for averages guarantees tail failures.
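A quick sketch of why the average hides the tail. The cost distribution is hypothetical: 95 cache hits at 2 ms and 5 misses at 80 ms for the same endpoint. The `percentile` helper uses the nearest-rank method.

```javascript
// Nearest-rank percentile of a list of numbers (p between 0 and 100).
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Hypothetical per-request CPU cost in ms: 95 cache hits, 5 cache misses.
const costs = [...Array(95).fill(2), ...Array(5).fill(80)];

console.log("mean ms:", mean(costs));
console.log("p50 ms:", percentile(costs, 50));
console.log("p99 ms:", percentile(costs, 99));
```

A plan sized for the mean cost leaves no room for the misses, and those are exactly the requests that arrive together when a cache goes cold.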

1️⃣1️⃣ Code Smell — Treating Traffic as Load ❌

if (rps > 5000) {
  reject();
}

This protects nothing: 5,000 RPS of /ping and 5,000 RPS of /feed differ by 250× in CPU cost.

1️⃣2️⃣ Correct Pattern — Load-Based Admission Control

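// Budget for the estimated CPU cost (ms) of all requests in flight at once.
// This caps concurrent work; it is not a per-second rate.
const cpuBudgetMs = 8000;
let inFlightMs = 0;

// estimateCost(req) and work() are assumed to be defined elsewhere.
app.get("/data", async (req, res) => {
  const cost = estimateCost(req);

  // Shed load before doing any work if the budget would be exceeded.
  if (inFlightMs + cost > cpuBudgetMs) {
    return res.status(503).send("Overloaded");
  }

  inFlightMs += cost;
  try {
    res.send(await work());
  } catch (err) {
    // Without this, a rejected promise leaves the request hanging.
    res.status(500).send("Internal error");
  } finally {
    inFlightMs -= cost; // always release the budget, even on failure
  }
});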

Control work, not requests.

1️⃣3️⃣ Why Tail Latency Depends on Load, Not Traffic

Low traffic + heavy requests → high P99
High traffic + light requests → low P99

Tail latency tracks resource contention, not request count.
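The two cases can be illustrated with a standard queueing approximation. The sketch below assumes an M/M/1 queue (one server, random arrivals, exponentially distributed service times), where response time is exponentially distributed with rate μ − λ, so P99 = ln(100) / (μ − λ). Real services rarely match this model exactly, and the request rates and service times are hypothetical.

```javascript
// P99 response time in seconds for an M/M/1 queue.
// arrivalsPerSec is lambda; 1000 / serviceMs is the service rate mu.
function mm1P99Seconds(arrivalsPerSec, serviceMs) {
  const mu = 1000 / serviceMs;
  if (arrivalsPerSec >= mu) {
    return Infinity; // overloaded: the queue grows without bound
  }
  return Math.log(100) / (mu - arrivalsPerSec);
}

// Low traffic, heavy requests: 10 RPS at 90 ms each (90% utilization).
const heavy = mm1P99Seconds(10, 90);
// High traffic, light requests: 1,000 RPS at 0.5 ms each (50% utilization).
const light = mm1P99Seconds(1000, 0.5);

console.log(`heavy: ${heavy.toFixed(3)} s, light: ${light.toFixed(4)} s`);
```

Under this model the 10 RPS workload has a P99 measured in seconds while the 1,000 RPS workload stays in milliseconds, because utilization, not request count, drives the queue.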