📘 Why Autoscaling Often Makes Things Worse

3 Apr 2022 · 3 min read

When “adding capacity” destabilizes the system

Autoscaling reacts to symptoms.
Real problems are causal.

1️⃣ The Dangerous Belief

“If traffic increases, autoscaling will save us.”

This assumes:

  • Load is linear

  • Metrics are immediate

  • New instances are instantly useful

  • Bottlenecks scale horizontally

In production, none of these assumptions holds reliably.

2️⃣ Autoscaling Is a Feedback Loop (Control Theory)

Autoscaling is a delayed negative feedback loop.

Timeline:

  1. Load increases

  2. Latency rises

  3. Metrics detect it

  4. Scale-out happens

  5. New instances start

  6. Load redistributes

Each step adds delay.

Feedback loops with significant delay tend to overshoot and oscillate.
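A toy simulation makes the oscillation visible. This is a sketch, not a model of any real autoscaler: the `simulate` function, its proportional rule (desired = current × observed utilization ÷ target utilization), and every number in it are assumptions chosen for illustration. The only thing that changes between the two runs is how stale the utilization reading is.

```javascript
// Toy model: constant load, proportional scaling rule, stale utilization metric.
// All names and numbers are illustrative assumptions.
function simulate(lagTicks, ticks = 30) {
  const load = 1000;        // req/s arriving, held constant
  const perInstance = 100;  // req/s one instance can serve
  const targetUtil = 0.5;   // utilization the scaler aims for
  const maxInstances = 100; // hard ceiling on fleet size
  const history = [2];      // instance count at each tick, starting at 2

  for (let t = 0; t < ticks; t++) {
    const current = history[t];
    // The metric reflects the fleet as it was `lagTicks` ago.
    const stale = history[Math.max(0, t - lagTicks)];
    // Same as current * observedUtil / targetUtil, where
    // observedUtil = load / (perInstance * stale).
    const desired = Math.ceil((current * load) / (perInstance * targetUtil * stale));
    history.push(Math.min(maxInstances, Math.max(1, desired)));
  }
  return history;
}

console.log(simulate(0).slice(0, 6));  // fresh metric
console.log(simulate(3).slice(0, 14)); // metric three ticks old
```

With a fresh metric the fleet jumps from 2 to 20 instances and stays there. With a three-tick lag the same rule keeps reacting to a fleet that no longer exists: it runs up to the 100-instance ceiling, collapses to 1, and then climbs again.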

3️⃣ Failure Mode #1: Cold Start Amplification

What happens during scale-up

  • JVM warmup

  • Cache cold

  • Connection pools empty

  • JIT not optimized

New instances are slower, not faster.

Autoscaling often increases tail latency first.
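A rough way to see why the tail moves first: under even load balancing, the share of requests landing on cold instances equals their share of the fleet. The sketch below assumes, as a simplification, that every request served by a cold instance is slower than any request served by a warm one. Both function names are hypothetical.

```javascript
// Share of requests landing on cold instances under even load balancing.
function coldTrafficShare(warmInstances, coldInstances) {
  return coldInstances / (warmInstances + coldInstances);
}

// If more than 1% of requests hit the slow (cold) group, the 99th percentile
// of the whole fleet falls inside that slow group.
function p99SetByColdInstances(warmInstances, coldInstances) {
  return coldTrafficShare(warmInstances, coldInstances) > 0.01;
}

console.log(coldTrafficShare(10, 2));        // 2 of 12 instances are cold
console.log(p99SetByColdInstances(10, 2));   // true
console.log(p99SetByColdInstances(1000, 1)); // false
```

Adding two cold instances to a fleet of ten puts about one request in six on a slow path, far more than the 1% needed to own the P99.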

4️⃣ Failure Mode #2: Scaling the Wrong Bottleneck

Example

  • App scales horizontally

  • Database does not

Result:

  • More app instances

  • More DB connections

  • DB melts faster

Autoscaling increases pressure on the bottleneck.
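The arithmetic is worth writing down. In this sketch the pool size and the database connection limit are made-up numbers, and the function names are hypothetical; substitute your own values.

```javascript
// Total connections the app tier will try to hold open against the database.
function dbConnectionDemand(appInstances, poolSizePerInstance) {
  return appInstances * poolSizePerInstance;
}

// Largest app fleet the database can accept before its connection limit is hit.
function maxInstancesForDb(dbMaxConnections, poolSizePerInstance) {
  return Math.floor(dbMaxConnections / poolSizePerInstance);
}

const poolSize = 20;          // hypothetical per-instance pool size
const dbMaxConnections = 200; // hypothetical database connection limit

console.log(maxInstancesForDb(dbMaxConnections, poolSize)); // 10
console.log(dbConnectionDemand(16, poolSize));              // 320, well past the limit
```

With these numbers, any autoscaler ceiling above 10 instances lets the app tier ask for more connections than the database will grant.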

5️⃣ Failure Mode #3: Retry & Autoscaling Feedback Loop

Sequence

  1. Latency increases

  2. Clients retry

  3. Load doubles

  4. Autoscaler scales up

  5. New instances start cold, respond slowly, and cause even more retries

  6. System collapses

Autoscaling can turn temporary slowness into total failure.
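Step 3 can be quantified. The sketch below estimates how many attempts each original request turns into, assuming every attempt fails independently with the same probability. That assumption is optimistic, since in a real overload the failure rate rises as load rises. The function name is hypothetical.

```javascript
// Expected attempts per original request when each attempt fails with
// probability `failureRate` and clients retry up to `maxRetries` times.
function retryAmplification(failureRate, maxRetries) {
  let attempts = 0;
  let probabilityOfReachingAttempt = 1; // the first attempt always happens
  for (let k = 0; k <= maxRetries; k++) {
    attempts += probabilityOfReachingAttempt;
    probabilityOfReachingAttempt *= failureRate;
  }
  return attempts;
}

console.log(retryAmplification(0, 3));   // 1: healthy system, no extra load
console.log(retryAmplification(1, 1));   // 2: everything times out, one retry, load doubles
console.log(retryAmplification(0.5, 3)); // 1.875
```

The middle case is the "load doubles" step in the sequence above: when every request times out and clients retry once, the service sees twice the traffic at the moment it can least afford it.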

6️⃣ Failure Mode #4: Thrashing (Scale Up ↔ Scale Down)

Why it happens

  • Metrics lag (30–60s)

  • Bursty traffic

  • Aggressive scale rules

Instances are:

  • Created

  • Destroyed

  • Recreated

Instances never live long enough to warm up, so caches, connection pools, and JIT-compiled code stay permanently cold.
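To make "aggressive scale rules" concrete, the sketch below counts how many scaling actions a rule triggers on a bursty utilization series. The function, the thresholds, and the series are all illustrative assumptions; one tick stands for one metric evaluation.

```javascript
// Count scale-up/scale-down actions a threshold rule takes over a series
// of utilization samples. After any action, the rule waits `cooldownTicks`
// evaluations before acting again.
function countScaleEvents(utilizationSeries, { upAt, downAt, cooldownTicks }) {
  let events = 0;
  let lastChange = -Infinity;
  utilizationSeries.forEach((utilization, tick) => {
    if (tick - lastChange < cooldownTicks) return; // still cooling down
    if (utilization > upAt || utilization < downAt) {
      events += 1;
      lastChange = tick;
    }
  });
  return events;
}

const bursty = [0.9, 0.3, 0.9, 0.3, 0.9, 0.3, 0.9, 0.3];

// Narrow band, no cooldown: every sample triggers a change.
console.log(countScaleEvents(bursty, { upAt: 0.7, downAt: 0.5, cooldownTicks: 0 })); // 8

// Wider band plus a cooldown: the same traffic triggers far fewer changes.
console.log(countScaleEvents(bursty, { upAt: 0.7, downAt: 0.2, cooldownTicks: 4 })); // 2
```

The traffic is identical in both runs. Only the rule differs, which is why thrashing is usually a configuration problem rather than a traffic problem.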

7️⃣ Autoscaling Hides Capacity Planning Failures

Teams stop asking:

  • What is our real capacity?

  • Where is the bottleneck?

  • Where is the knee point (the load beyond which latency climbs sharply)?

They rely on:

“Autoscaling will handle it.”

Until it doesn’t.
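The "real capacity" question has a usable first-order answer in Little's Law: average concurrency equals throughput times average latency. The sketch below rearranges it to estimate the throughput a fixed concurrency limit can sustain. The function name and the numbers are hypothetical, and the result is an upper bound for a single bottleneck, not a measured figure.

```javascript
// Little's Law: concurrency = throughput * latency,
// so throughput = concurrency / latency.
function maxThroughputPerSecond(concurrencyLimit, avgLatencyMs) {
  return (concurrencyLimit * 1000) / avgLatencyMs;
}

console.log(maxThroughputPerSecond(200, 100)); // 2000 req/s at 100 ms
console.log(maxThroughputPerSecond(200, 250)); // 800 req/s once latency degrades to 250 ms
```

The second line shows why capacity is not a constant: the same 200 slots serve less than half the traffic once latency rises.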

8️⃣ Code Example: Naive Autoscaling Trigger ❌

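# Illustrative pseudo-config, not a real autoscaler schema
scaleUp:
  cpuUtilization: 70%   # fire once average CPU passes 70%
  add: 2 pods           # fixed step, regardless of how far behind we are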

Why this is bad

  • CPU is a lagging indicator

  • By the time CPU crosses the threshold, P99 latency has already exploded

  • Scaling happens too late

9️⃣ The Right Mental Model

Autoscaling is a cost optimization tool — not a reliability tool.

It is good for:

  • Diurnal traffic

  • Predictable growth

  • Cost efficiency

It is bad for:

  • Spikes

  • Cascading failures

  • Bottlenecked systems

🔑 What Actually Prevents Failure (Before Autoscaling)

In order of importance:

  1. Hard concurrency limits

  2. Backpressure

  3. Load shedding

  4. Timeouts

  5. Circuit breakers

  6. Autoscaling (last)
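As one example from this list, here is a minimal circuit breaker sketch. It is deliberately simplified: the class, its option names, and the injectable `now` clock are assumptions for illustration, and after the reset interval it lets all traffic through as a trial instead of a single probe request, which a production breaker would normally restrict.

```javascript
// Minimal circuit breaker state machine (illustrative sketch).
class CircuitBreaker {
  constructor({ failureThreshold, resetAfterMs, now = () => Date.now() }) {
    this.failureThreshold = failureThreshold; // consecutive failures before opening
    this.resetAfterMs = resetAfterMs;         // how long to stay open
    this.now = now;                           // injectable clock, handy for tests
    this.failures = 0;
    this.openedAt = null;                     // null means the circuit is closed
  }

  // Ask before calling the dependency. False means fail fast.
  allowRequest() {
    if (this.openedAt === null) return true;
    return this.now() - this.openedAt >= this.resetAfterMs; // trial traffic
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure() {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) {
      this.openedAt = this.now(); // open, or re-open after a failed trial
    }
  }
}

const breaker = new CircuitBreaker({ failureThreshold: 3, resetAfterMs: 1000 });
breaker.recordFailure();
breaker.recordFailure();
breaker.recordFailure();
console.log(breaker.allowRequest()); // false: calls are rejected without touching the dependency
```

The point for this article is that the breaker reacts in the request path, within a few failed calls, while an autoscaler is still waiting for its next metric sample.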

1️⃣0️⃣ Autoscaling Done Right (When It Helps)

Autoscaling helps only if:

  • Bottleneck scales horizontally

  • Cold start cost is low

  • Load changes slowly

  • Limits already exist

✅ Better Pattern

// Reject new work as soon as the in-flight limit is reached, before doing any processing.
if (inFlight >= MAX_CAPACITY) {
  return res.status(503).send("Overloaded");
}

Autoscaling then:

  • Handles long-term growth

  • Not short-term survival
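The overload check in the Better Pattern only works if something keeps the in-flight count accurate. A minimal sketch follows, assuming an Express-style `(req, res, next)` middleware signature. The `limitConcurrency` name and the `Retry-After` value are hypothetical, and the limit itself should come from load testing, not from a guess.

```javascript
// Express-style middleware that caps concurrent requests (illustrative sketch).
function limitConcurrency(maxInFlight) {
  let inFlight = 0;

  return function concurrencyLimiter(req, res, next) {
    if (inFlight >= maxInFlight) {
      // Shed load immediately instead of queueing behind a saturated service.
      res.set("Retry-After", "1");
      return res.status(503).send("Overloaded");
    }

    inFlight += 1;
    let released = false;
    const release = () => {
      if (released) return; // "finish" and "close" can both fire
      released = true;
      inFlight -= 1;
    };
    res.on("finish", release); // response fully sent
    res.on("close", release);  // connection closed, possibly before finishing

    next();
  };
}

// Usage, assuming an existing Express app:
// app.use(limitConcurrency(100));
```

Rejected requests cost almost nothing to serve, so the instance keeps its latency for the requests it does accept, and the autoscaler can add capacity on its own slower schedule.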

1️⃣1️⃣ Real-World Examples

Netflix

  • Heavy load shedding

  • Autoscaling secondary

AWS APIs

  • Strict throttling

  • Autoscaling behind limits

Google

  • Explicit admission control

  • Autoscaling only after safety