Capacity Planning Is Harder Than It Looks
Why “we’re under 50% CPU” is a dangerous lie
Capacity planning is not about how fast your system is.
It’s about how it fails under load.
1️⃣ The Most Common (Wrong) Mental Model
❌ What teams think
“Our service can handle 10k RPS.”
✅ Reality
Capacity is work per second, not requests per second.
Two requests are never equal.
2️⃣ The First Big Mistake: RPS ≠ Load
Example
| Request | CPU Cost |
|---|---|
| GET /profile | 1 ms |
| GET /feed | 40 ms |
| POST /checkout | 120 ms |
At 1,000 RPS:
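All profiles → fine (1,000 × 1 ms = 1 CPU-second of work per second)
All checkout → system dead (1,000 × 120 ms = 120 CPU-seconds of work per second)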
Traffic shape matters more than traffic volume.
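To make this concrete, here is a minimal sketch that turns a traffic mix into CPU demand using the costs from the table above. The function name `cpuDemand` and the two mixes (`browsing`, `saleDay`) are illustrative assumptions, not measurements.

```javascript
// CPU cost per request in milliseconds, taken from the table above.
const CPU_COST_MS = {
  "GET /profile": 1,
  "GET /feed": 40,
  "POST /checkout": 120,
};

// `mix` maps each endpoint to its share of traffic (shares should sum to 1).
// Returns CPU-seconds of work generated per wall-clock second.
function cpuDemand(totalRps, mix) {
  let cpuSeconds = 0;
  for (const [endpoint, share] of Object.entries(mix)) {
    cpuSeconds += (totalRps * share * CPU_COST_MS[endpoint]) / 1000;
  }
  return cpuSeconds;
}

// Hypothetical mixes: the same 1,000 RPS, very different load.
const browsing = cpuDemand(1000, { "GET /profile": 0.9, "GET /feed": 0.1 });
const saleDay = cpuDemand(1000, {
  "GET /profile": 0.5,
  "GET /feed": 0.3,
  "POST /checkout": 0.2,
});

console.log(browsing.toFixed(1), saleDay.toFixed(1));
```

Under these assumed mixes the request rate never changes, yet the sale-day mix needs more than seven times the CPU (about 36.5 CPU-seconds per second versus about 4.9).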
3️⃣ The Real Unit of Capacity: CPU-Seconds
Key Rule
A system has N CPU-seconds per second.
Example:
8 cores
Each core provides 1 CPU-second / second
Total budget:
8 CPU-seconds / second
If one request costs:
50 ms CPU = 0.05 CPU-seconds
Max sustainable throughput:
8 / 0.05 = 160 RPS
No amount of async changes this. Async hides waiting; it doesn’t make CPU work cheaper.
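The arithmetic above fits in one function. This is a sketch for purely CPU-bound work; `maxSustainableRps` is a hypothetical helper name, and it ignores everything else that eats CPU (GC, kernel, other processes).

```javascript
// Upper bound on throughput when every request burns `cpuMsPerRequest` of CPU.
// Each core contributes 1,000 CPU-milliseconds per wall-clock second.
function maxSustainableRps(cores, cpuMsPerRequest) {
  return (cores * 1000) / cpuMsPerRequest;
}

console.log(maxSustainableRps(8, 50)); // 160
```

Treat the result as a ceiling, not a target: the sections below explain why you hit trouble well before it.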
4️⃣ Little’s Law (The Law Everyone Ignores)
Concurrency = Throughput × Latency
If:
Throughput = 200 RPS
Latency = 500 ms
Then:
Concurrency = 100 in-flight requests
Increase latency → concurrency explodes → memory explodes.
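Here is a small sketch of that chain. The 256 KiB of memory per in-flight request is an invented figure for illustration, and `concurrency` and `memoryMiB` are hypothetical helper names.

```javascript
// Little's Law: in-flight requests = throughput × latency.
function concurrency(throughputRps, latencyMs) {
  return (throughputRps * latencyMs) / 1000;
}

// Rough memory held by in-flight requests (per-request size is an assumption).
function memoryMiB(inFlightCount, kibPerRequest) {
  return (inFlightCount * kibPerRequest) / 1024;
}

// Same 200 RPS, latency degrading from 500 ms to 2 s.
for (const latencyMs of [500, 1000, 2000]) {
  const inFlightCount = concurrency(200, latencyMs);
  console.log(latencyMs, inFlightCount, memoryMiB(inFlightCount, 256));
}
```

Throughput never moved, but a 4x latency regression means 4x the in-flight requests and 4x the memory they pin.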
5️⃣ Why “Headroom” Doesn’t Save You
❌ Common rule
“Keep CPU under 70%”
Why this fails
GC pauses
Kernel scheduling
Cache misses
Lock contention
At ~70–80% CPU:
Context switching skyrockets
Tail latency spikes
Throughput drops
Systems don’t degrade linearly.
6️⃣ The Knee of the Curve (Critical Insight)
Every system has a knee point:
Before knee → stable
After knee → latency explodes
Slight load increase → collapse
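One way to see why a knee exists is the textbook M/M/1 queueing model, where mean latency equals service time divided by (1 - utilization). This is an assumption-heavy sketch (single server, random arrivals, exponentially distributed service times), not a model of any particular system, and the 50 ms service time is made up.

```javascript
// Mean time in system for an M/M/1 queue: serviceTime / (1 - utilization).
// Returns Infinity at or above 100% utilization, where the queue never drains.
function meanLatencyMs(serviceMs, utilization) {
  if (utilization >= 1) return Infinity;
  return serviceMs / (1 - utilization);
}

for (const utilization of [0.5, 0.7, 0.8, 0.9, 0.95, 0.99]) {
  const latency = meanLatencyMs(50, utilization);
  console.log(`${Math.round(utilization * 100)}% -> ${latency.toFixed(0)} ms`);
}
```

Under this model, going from 50% to 90% utilization multiplies mean latency by five, and the last few percent cost far more than all the load before them. Real systems have different curves, but the shape is the point.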
7️⃣ Capacity Planning Mistake #2: Ignoring Variance
Even if average load is safe:
P95 requests
Cache misses
Cold shards
Slow disks
…will dominate P99.
Capacity planning must consider worst-case paths, not averages.
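A quick sketch of how far the average can sit from the tail. The sample is synthetic (97 fast requests at 10 ms, 3 slow ones at 1,000 ms), and `percentile` uses the simple nearest-rank method; both are assumptions chosen for illustration.

```javascript
// Nearest-rank percentile over an ascending-sorted array; p is a whole percent.
function percentile(sortedValues, p) {
  const rank = Math.ceil((p * sortedValues.length) / 100);
  return sortedValues[Math.max(rank, 1) - 1];
}

// Synthetic sample: most requests hit cache, a few hit a cold shard.
const requestCostsMs = [...Array(97).fill(10), ...Array(3).fill(1000)];
const sortedCosts = [...requestCostsMs].sort((a, b) => a - b);

const meanCostMs =
  requestCostsMs.reduce((sum, cost) => sum + cost, 0) / requestCostsMs.length;
const p50CostMs = percentile(sortedCosts, 50);
const p99CostMs = percentile(sortedCosts, 99);

console.log({ meanCostMs, p50CostMs, p99CostMs });
```

In this sample the mean is about 40 ms while P99 is 1,000 ms. Size for the mean and the slow 3% will own your tail.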
8️⃣ Async, Queues, and the Capacity Illusion
❌ “We added async + a queue”
What actually happened:
Requests stopped blocking
In-flight count increased
Memory usage spiked
Tail latency worsened
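The effect is easy to reproduce with a deliberately crude model: requests arrive at a fixed rate, workers drain at a fixed rate, and whatever is left over waits. The rates below (220 RPS in, 200 RPS out) and the name `backlogAfter` are assumptions for illustration.

```javascript
// Backlog of an unbounded queue after `seconds`, stepping one second at a time.
function backlogAfter(arrivalRps, serviceRps, seconds) {
  let queued = 0;
  for (let t = 0; t < seconds; t++) {
    // Each second adds arrivals and removes what the workers can finish.
    queued = Math.max(0, queued + arrivalRps - serviceRps);
  }
  return queued;
}

const serviceRps = 200;
const queuedAfterMinute = backlogAfter(220, serviceRps, 60);
// Time the newest request waits before a worker even looks at it.
const queueWaitSeconds = queuedAfterMinute / serviceRps;

console.log(queuedAfterMinute, queueWaitSeconds);
```

Just 10% over capacity for one minute leaves 1,200 requests queued and a 6-second wait for the newest one. The queue did not add capacity; it converted overload into latency and memory.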
9️⃣ The Only Capacity That Matters: Bottlenecks
Your system’s capacity is the minimum of:
CPU
Memory
DB connections
Disk IOPS
Network
External dependencies
You don’t scale a system.
You scale its tightest bottleneck.
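In code, "minimum of" is literal. The sketch below assumes you have already estimated a maximum RPS per resource; every number in `limits` is hypothetical, as is the helper name `findBottleneck`.

```javascript
// Returns the resource with the lowest sustainable RPS, or null if none given.
function findBottleneck(limits) {
  let tightest = null;
  for (const [resource, maxRps] of Object.entries(limits)) {
    if (tightest === null || maxRps < tightest.maxRps) {
      tightest = { resource, maxRps };
    }
  }
  return tightest;
}

// Hypothetical per-resource estimates for one service.
const limits = {
  cpu: 160,
  dbConnections: 2000,
  network: 5000,
  paymentsApi: 120,
};

console.log(findBottleneck(limits));
```

With these numbers, doubling the CPU buys nothing: the external payments dependency still caps the system at 120 RPS.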
1️⃣0️⃣ Practical Capacity Model (Simple & Useful)
For each critical dependency, compute:
Capacity = (Resource Budget) / (Cost per request)
Example:
DB connections = 100
Avg query time = 50 ms
Max DB RPS (assuming one query per request):
100 / 0.05 = 2000 RPS
Now apply:
Cache miss rate
Retries
Fanout
Real capacity is much lower.
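A rough way to apply those three factors, assuming each user request fans out into `fanout` lookups, a `cacheMissRate` fraction of them reach the database, and each query is retried `retryRate` extra times on average. The model and all parameter values are illustrative assumptions.

```javascript
// Effective request capacity of the DB tier after fanout, cache misses, retries.
function effectiveRps({ connections, avgQueryMs, fanout, cacheMissRate, retryRate }) {
  // Each connection completes 1000 / avgQueryMs queries per second.
  const maxDbQps = (connections * 1000) / avgQueryMs;
  // Average DB queries that one user request ends up generating.
  const queriesPerRequest = fanout * cacheMissRate * (1 + retryRate);
  return maxDbQps / queriesPerRequest;
}

const realCapacity = effectiveRps({
  connections: 100,
  avgQueryMs: 50,
  fanout: 4,          // assumed: 4 lookups per request
  cacheMissRate: 0.5, // assumed: cold-ish cache
  retryRate: 0.25,    // assumed: 1 in 4 queries retried once
});

console.log(realCapacity);
```

With these assumptions the 2,000 RPS headline figure shrinks to 800 RPS, and a colder cache or a retry storm pushes it lower still.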
1️⃣1️⃣ Why Load Tests Lie
Load tests often:
Use uniform traffic
Ignore cache warmup
Ignore retries
Ignore long tails
Result:
“It handled 5k RPS in staging!”
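Fixing the first item, uniform traffic, can start with something as small as a weighted request picker. This is a sketch, not a load-testing tool: `PRODUCTION_MIX` holds invented shares, and `pickEndpoint` and `buildRequestPlan` are hypothetical helpers you would wire into whatever load generator you use.

```javascript
// Assumed production traffic shares; replace with your own measurements.
const PRODUCTION_MIX = [
  { endpoint: "GET /profile", share: 0.7 },
  { endpoint: "GET /feed", share: 0.25 },
  { endpoint: "POST /checkout", share: 0.05 },
];

// Maps a random number r in [0, 1) to an endpoint according to its share.
function pickEndpoint(mix, r) {
  let cumulative = 0;
  for (const { endpoint, share } of mix) {
    cumulative += share;
    if (r < cumulative) return endpoint;
  }
  // Guard against floating-point rounding when shares sum to just under 1.
  return mix[mix.length - 1].endpoint;
}

// Builds a list of `count` endpoints to hit, weighted by the mix.
function buildRequestPlan(mix, count, random = Math.random) {
  return Array.from({ length: count }, () => pickEndpoint(mix, random()));
}

console.log(buildRequestPlan(PRODUCTION_MIX, 10));
```

Passing `random` as a parameter keeps the plan reproducible when you need to rerun the exact same test.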
1️⃣2️⃣ What Good Capacity Planning Actually Looks Like
✅ Principles
Plan for P99 cost, not average
Assume partial failure
Model fanout
Enforce hard limits
Fail fast beyond capacity
1️⃣3️⃣ Capacity Without Limits Is Fiction
❌ No limit
app.get("/data", async (req, res) => {
  res.send(await work());
});
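✅ Capacity-aware
let inFlight = 0; // requests currently being processed
app.get("/data", async (req, res) => {
  if (inFlight >= MAX) return res.status(503).send("Over capacity"); // shed load
  inFlight++;
  try { res.send(await work()); } finally { inFlight--; } // always free the slot
});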
Capacity only exists if it’s enforced.
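The same idea works outside an HTTP framework. Below is a framework-free sketch of an in-flight limit; `AdmissionGate` and `runLimited` are hypothetical names, and the response shape (`status`, `body`) is an assumption for illustration.

```javascript
// Tracks in-flight work and refuses new work beyond a hard limit.
class AdmissionGate {
  constructor(maxInFlight) {
    this.maxInFlight = maxInFlight;
    this.inFlight = 0;
    this.rejected = 0;
  }

  // Returns true and takes a slot, or returns false if the gate is full.
  tryAcquire() {
    if (this.inFlight >= this.maxInFlight) {
      this.rejected++;
      return false;
    }
    this.inFlight++;
    return true;
  }

  // Frees a slot; never drops below zero.
  release() {
    if (this.inFlight > 0) this.inFlight--;
  }
}

// Runs `task` only if a slot is free; otherwise fails fast with a 503.
async function runLimited(gate, task) {
  if (!gate.tryAcquire()) return { status: 503, body: "Over capacity" };
  try {
    return { status: 200, body: await task() };
  } finally {
    gate.release(); // free the slot even if the task throws
  }
}
```

The `rejected` counter is worth exporting as a metric: a rising rejection rate is a far more honest capacity signal than CPU percentage.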
1️⃣4️⃣ The Dark Truth
Capacity planning is not about preventing overload.
It’s about deciding how you fail.
Fail modes:
Slow failure ❌
Cascading failure ❌
Fast, bounded failure ✅
