Load Is Not Traffic
Why "10k RPS" tells you almost nothing
Systems don't fail because of traffic.
They fail because of work.
1️⃣ The Most Dangerous Metric in System Design
RPS (Requests Per Second)
It's easy to measure.
It's easy to graph.
It's deeply misleading.
Two systems at 1k RPS can differ by 10× or 100× in actual load.
2️⃣ Traffic vs Load
Traffic
Number of incoming requests
Measured in RPS
External view
Load
CPU time
Memory pressure
DB queries
Network I/O
Lock contention
Load is internal. Traffic is external.
3️⃣ Why This Distinction Matters
You can have:
Low traffic
High load
Or:
High traffic
Low load
Only load determines failure.
4️⃣ Example 1: Same Traffic, Different Load
Two endpoints
| Endpoint | CPU Cost |
|---|---|
| /ping | 0.2 ms |
| /feed | 50 ms |
At 1,000 RPS:
/ping → 0.2 CPU-seconds/sec
/feed → 50 CPU-seconds/sec
One melts the system.
One barely registers.
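The figures above are request rate multiplied by per-request CPU cost. A minimal sketch of that arithmetic, where `cpuSecondsPerSecond` is a hypothetical helper name and the inputs are the values from the table:

```javascript
// CPU demand = request rate × CPU cost per request.
// The result is CPU-seconds consumed per wall-clock second, which is
// roughly the number of fully busy cores the endpoint needs.
function cpuSecondsPerSecond(rps, cpuMsPerRequest) {
  return (rps * cpuMsPerRequest) / 1000;
}

const ping = cpuSecondsPerSecond(1000, 0.2);
const feed = cpuSecondsPerSecond(1000, 50);

console.log(`/ping: ${ping} CPU-seconds/sec`);
console.log(`/feed: ${feed} CPU-seconds/sec`);
```

Read as cores, /feed needs about 50 of them at the same request rate where /ping needs a fraction of one.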
5️⃣ Why Autoscaling Fails Here
Autoscaling reacts to traffic spikes or CPU usage, not work distribution.
If traffic shifts from /ping to /feed:
RPS unchanged
CPU explodes
Autoscaling reacts too late
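To make the shift concrete, here is a small sketch that holds RPS constant and changes only the traffic mix. The 90/10 split is an assumption chosen for illustration; the per-request costs are the ones from Example 1.

```javascript
// Per-request CPU cost in milliseconds, from the table in Example 1.
const CPU_MS = { "/ping": 0.2, "/feed": 50 };

// Total CPU demand (CPU-seconds/sec) for a given RPS and traffic mix.
// `mix` maps each endpoint to its share of requests; shares should sum to 1.
function cpuDemand(totalRps, mix) {
  let cpuMsPerSec = 0;
  for (const [endpoint, share] of Object.entries(mix)) {
    cpuMsPerSec += totalRps * share * CPU_MS[endpoint];
  }
  return cpuMsPerSec / 1000;
}

// Same 1,000 RPS, mix flipped from mostly /ping to mostly /feed (assumed split).
const before = cpuDemand(1000, { "/ping": 0.9, "/feed": 0.1 });
const after = cpuDemand(1000, { "/ping": 0.1, "/feed": 0.9 });

console.log({ before, after });
```

An RPS dashboard shows a flat line across this change, while CPU demand grows by almost 9×.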
6️⃣ Load Is Multi-Dimensional
Load is not one thing.
| Resource | Load Example |
|---|---|
| CPU | JSON serialization |
| Memory | Large in-flight buffers |
| DB | Cache misses |
| Network | Large responses |
| Locks | Contended mutex |
You overload the weakest dimension first.
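One way to find the weakest dimension is to compare utilization (demand divided by capacity) across resources. The sketch below uses made-up demand and capacity numbers; `bottleneck` is a hypothetical helper, not part of any library.

```javascript
// Hypothetical demand and capacity per resource, each in its own unit.
const resources = [
  { name: "cpu", demand: 3.2, capacity: 8 },        // cores
  { name: "memory", demand: 9, capacity: 16 },      // GB
  { name: "db", demand: 450, capacity: 500 },       // queries/sec
  { name: "network", demand: 200, capacity: 1000 }, // MB/sec
];

// Returns the resource with the highest utilization (demand / capacity).
function bottleneck(list) {
  return list
    .map((r) => ({ name: r.name, utilization: r.demand / r.capacity }))
    .reduce((worst, r) => (r.utilization > worst.utilization ? r : worst));
}

const worst = bottleneck(resources);
console.log(`${worst.name} at ${(worst.utilization * 100).toFixed(0)}% utilization`);
```

With these numbers the CPU sits at 40% while the database is at 90%, so a CPU-only dashboard would miss the resource that fails first.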
7️⃣ Example 2: DB Load Hiding Behind Low Traffic
app.get("/stats", async (req, res) => {
  // Unbounded query: reads every row and every column on each request.
  const users = await db.query("SELECT * FROM users");
  res.send(users);
});
Traffic:
10 RPS
DB Load:
Full table scan × 10/sec
System dies quietly.
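A rough way to see the hidden cost is to count rows read per second instead of requests per second. The table size and page size below are assumptions for illustration, and the paginated figure assumes an indexed query that touches only the rows it returns.

```javascript
// Rows read per second = request rate × rows touched per request.
function rowsScannedPerSecond(rps, rowsPerRequest) {
  return rps * rowsPerRequest;
}

const TABLE_ROWS = 2_000_000; // assumed size of the users table
const PAGE_SIZE = 100;        // assumed page size for a bounded query

const unbounded = rowsScannedPerSecond(10, TABLE_ROWS);
const paginated = rowsScannedPerSecond(10, PAGE_SIZE);

console.log({ unbounded, paginated });
```

Under these assumptions, 10 RPS means 20 million rows read per second for the unbounded query and 1,000 for the bounded one.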
8️⃣ The Real Unit of Load: Work Units
Well-instrumented systems measure load in:
CPU-milliseconds
DB queries/sec
Rows scanned/sec
Bytes moved/sec
Not RPS.
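As a sketch of measuring one of these units, Node.js exposes `process.cpuUsage()`, which reports user and system CPU time in microseconds. The `measureCpuMs` wrapper and the sample payload are hypothetical, and the reading is process-wide, so it is only meaningful when nothing else is running in the process at the same time.

```javascript
// Measures CPU time (not wall-clock time) consumed while `fn` runs.
function measureCpuMs(fn) {
  const start = process.cpuUsage();
  const result = fn();
  const delta = process.cpuUsage(start); // CPU time used since `start`
  return { result, cpuMs: (delta.user + delta.system) / 1000 };
}

// Hypothetical CPU-bound work: serialize a moderately large array.
const payload = Array.from({ length: 50_000 }, (_, i) => ({
  id: i,
  name: `user-${i}`,
}));
const { result, cpuMs } = measureCpuMs(() => JSON.stringify(payload));

console.log(`serialized ${result.length} chars using ${cpuMs.toFixed(2)} CPU-ms`);
```

Recording a number like this per endpoint gives a cost table that admission control and capacity planning can use instead of raw request counts.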
9️⃣ Capacity Planning Without Load Awareness Fails
❌ Naive capacity plan
"We handle 5k RPS."
✅ Real capacity plan
"We handle 200 CPU-ms/sec per core, with 5 DB queries per request."
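A load-aware plan can still produce an RPS figure, but as an output derived from resource budgets and per-request costs. The sketch below assumes 8 cores at the 200 CPU-ms/sec budget, 4 CPU-ms per request, and a database that sustains 1,500 queries/sec; all of those values and the `maxRps` helper are illustrative assumptions.

```javascript
// For each resource: sustainable RPS = budget per second / cost per request.
// The plan is limited by whichever resource allows the fewest requests.
function maxRps(budgets, costPerRequest) {
  let limit = Infinity;
  let limitedBy = null;
  for (const [resource, budget] of Object.entries(budgets)) {
    const cost = costPerRequest[resource];
    if (!cost) continue; // request does not use this resource
    const rps = budget / cost;
    if (rps < limit) {
      limit = rps;
      limitedBy = resource;
    }
  }
  return { rps: limit, limitedBy };
}

const budgets = {
  cpuMs: 8 * 200,  // assumed: 8 cores × 200 CPU-ms/sec each
  dbQueries: 1500, // assumed: queries/sec the database can sustain
};
const costPerRequest = {
  cpuMs: 4,     // assumed CPU-ms per request
  dbQueries: 5, // from the plan above
};

const plan = maxRps(budgets, costPerRequest);
console.log(plan);
```

Here the CPU budget would allow 400 RPS, but the database caps the system at 300, and the plan names the limiting resource.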
1️⃣0️⃣ Load Is Variable Per Request
Even the same endpoint varies:
Cache hit vs miss
Hot shard vs cold shard
Small vs large payload
Planning for averages guarantees tail failures.
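A short sketch of why the average hides the tail, using an assumed two-outcome cost distribution (95% cache hits at 2 CPU-ms, 5% misses at 80 CPU-ms). The numbers and helper names are hypothetical.

```javascript
// Assumed cost distribution for a single endpoint.
const outcomes = [
  { label: "cache hit", cpuMs: 2, probability: 0.95 },
  { label: "cache miss", cpuMs: 80, probability: 0.05 },
];

// Expected cost per request.
function meanCost(dist) {
  return dist.reduce((sum, o) => sum + o.cpuMs * o.probability, 0);
}

// Smallest cost c such that P(cost <= c) >= q.
function percentileCost(dist, q) {
  const sorted = [...dist].sort((a, b) => a.cpuMs - b.cpuMs);
  let cumulative = 0;
  for (const o of sorted) {
    cumulative += o.probability;
    if (cumulative >= q) return o.cpuMs;
  }
  return sorted[sorted.length - 1].cpuMs;
}

const mean = meanCost(outcomes);
const p99 = percentileCost(outcomes, 0.99);
console.log({ mean, p99 });
```

The mean is about 5.9 CPU-ms, yet one request in twenty costs 80. A burst of misses, for example after a cache flush, demands more than ten times what an average-based plan provisioned.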
1️⃣1️⃣ Code Smell: Treating Traffic as Load ❌
if (rps > 5000) {
  reject();
}
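This protects nothing. Using the Example 1 costs, 5,000 /ping-style requests need 1 CPU-second/sec, while 500 /feed-style requests need 25 and pass the check untouched.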
1️⃣2️⃣ Correct Pattern: Load-Based Admission Control
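const cpuBudget = 8000; // ms/sec
let used = 0;           // estimated CPU-ms of requests currently in flight

app.get("/data", async (req, res) => {
  // estimateCost() and work() are application-specific placeholders.
  const cost = estimateCost(req);
  if (used + cost > cpuBudget) {
    // Shed load before doing any work.
    return res.status(503).send("Overloaded");
  }
  used += cost;
  try {
    res.send(await work());
  } catch (err) {
    // Without this, a rejected promise in an async handler goes unhandled.
    res.status(500).send("Internal error");
  } finally {
    used -= cost; // release the budget whether the request succeeded or failed
  }
});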
Control work, not requests.
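The handler above leaves `estimateCost` undefined. One possible shape, shown here without a web framework so it can be run directly, is a static per-route cost table plus a small controller that tracks admitted work. The route costs, the fallback value, the budget of 120, and the `AdmissionController` class are all assumptions for illustration.

```javascript
// Hypothetical cost table: estimated CPU-ms per request, keyed by route.
const ESTIMATED_CPU_MS = { "/ping": 0.2, "/feed": 50 };
const DEFAULT_CPU_MS = 10; // assumed fallback for routes not in the table

function estimateCost(req) {
  return ESTIMATED_CPU_MS[req.path] ?? DEFAULT_CPU_MS;
}

// Tracks the estimated cost of in-flight work against a fixed budget.
class AdmissionController {
  constructor(budget) {
    this.budget = budget;
    this.used = 0;
  }

  // Reserves `cost` if it fits in the remaining budget.
  tryAdmit(cost) {
    if (this.used + cost > this.budget) return false;
    this.used += cost;
    return true;
  }

  // Returns `cost` to the budget once the request has finished.
  release(cost) {
    this.used = Math.max(0, this.used - cost);
  }
}

const controller = new AdmissionController(120);
const admitted = ["/feed", "/feed", "/feed", "/ping"].map((path) =>
  controller.tryAdmit(estimateCost({ path }))
);

console.log(admitted, controller.used);
```

The third /feed request is rejected because it would exceed the budget, while the cheap /ping that follows is still admitted. A request-count limit cannot make that distinction.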
1️⃣3️⃣ Why Tail Latency Depends on Load, Not Traffic
Low traffic + heavy requests → high P99
High traffic + light requests → low P99
Tail latency tracks resource contention, not request count.
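A toy simulation can illustrate the contrast. It assumes a single FIFO server, randomly spaced (Poisson) arrivals, and a fixed service time per request; the rates, service times, and helper names are invented for the sketch, and a real system has many more moving parts.

```javascript
// Small seeded linear congruential generator so runs are repeatable.
function makeRng(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // uniform in [0, 1)
  };
}

// Simulates one FIFO server and returns the P99 latency in milliseconds.
function simulateP99(rps, serviceMs, requests, seed) {
  const rng = makeRng(seed);
  const latencies = [];
  let arrival = 0;      // arrival time of the current request, in ms
  let serverFreeAt = 0; // time at which the server finishes its backlog
  for (let i = 0; i < requests; i++) {
    // Exponential inter-arrival gap with mean 1000 / rps milliseconds.
    arrival += (-Math.log(1 - rng()) / rps) * 1000;
    const start = Math.max(arrival, serverFreeAt);
    serverFreeAt = start + serviceMs;
    latencies.push(serverFreeAt - arrival); // queueing delay + service time
  }
  latencies.sort((a, b) => a - b);
  return latencies[Math.floor(0.99 * (latencies.length - 1))];
}

// 10 RPS × 80 ms keeps the server 80% busy; 1,000 RPS × 0.2 ms keeps it 20% busy.
const heavyP99 = simulateP99(10, 80, 20000, 1);
const lightP99 = simulateP99(1000, 0.2, 20000, 1);

console.log({ heavyP99, lightP99 });
```

The low-traffic workload carries 100 times fewer requests but keeps the server far busier, so requests queue behind each other and its P99 is much higher.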
