📘 Clock Skew & Time-Based Bugs

3 Apr 2022 · 4 min read

When “now” is different everywhere

In distributed systems, time is a guess.
And guesses break correctness.

1️⃣ The False Assumption (Root of All Time Bugs)

“Time moves forward at the same rate everywhere.”

This is never true in distributed systems.

Reasons:

  • Clock drift (hardware)

  • NTP adjustments

  • VM pauses

  • GC pauses

  • Leap seconds

  • Network delays

Two machines disagreeing by milliseconds is normal.
By seconds is common.
By minutes happens in production.

2️⃣ What Is Clock Skew?

Clock skew = the difference between the readings of two machines’ clocks at the same instant.

Machine A: 10:00:01
Machine B: 09:59:58

Both are “correct” locally.

3️⃣ Why Time-Based Logic Is Dangerous

Time is often used to:

  • Order events

  • Enforce uniqueness

  • Expire data

  • Resolve conflicts

  • Detect staleness

Every one of these can break under skew.

4️⃣ Bug #1 — Time-Based Ordering Is Wrong

❌ Naive Event Ordering

events.sort((a, b) => a.timestamp - b.timestamp);

What you assume

  • Earlier timestamp → happened first

Reality

  • Event B may have happened after A but recorded earlier

Sorting by timestamp lies about causality.
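
A small simulation makes the failure concrete. The node names, the skew values, and the `makeNode` helper are illustrative assumptions, not part of any real system; the point is only that a modest skew is enough to put an effect ahead of its cause.

```javascript
// Hypothetical simulation: each node reads "true" time plus its own skew.
function makeNode(name, skewMs) {
  return {
    name,
    // Wall-clock reading this node would record at a given true time (ms).
    stamp: (trueTimeMs) => trueTimeMs + skewMs,
  };
}

const nodeA = makeNode("A", 3000);  // clock runs 3 s fast (assumed)
const nodeB = makeNode("B", -1000); // clock runs 1 s slow (assumed)

// True order: the request happens on A at t=1000, the response on B at t=1500.
const events = [
  { label: "request", node: nodeA.name, timestamp: nodeA.stamp(1000) },
  { label: "response", node: nodeB.name, timestamp: nodeB.stamp(1500) },
];

// The naive sort from above.
const sorted = [...events].sort((a, b) => a.timestamp - b.timestamp);
const sortedLabels = sorted.map((e) => e.label);
// The response now sorts before the request that caused it.
```

Both timestamps are honest local readings, yet the sorted log shows a reply arriving before anyone asked.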

5️⃣ Bug #2 — “Latest Write Wins” Loses Writes

❌ LWW (Last-Write-Wins) Conflict Resolution

if (incoming.timestamp > stored.timestamp) {
  overwrite();
}

Failure Scenario

  • A node with a fast clock overwrites newer data written by other nodes

  • A node with a slow clock has its updates silently discarded for as long as it lags

Correctness depends on clock accuracy, which you don’t control.
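
The lost write can be reproduced in a few lines. This is a sketch with assumed skew values; `lwwMerge` is a hypothetical stand-in for whatever merge function a store applies.

```javascript
// Hypothetical last-write-wins merge, mirroring the comparison above.
function lwwMerge(stored, incoming) {
  return incoming.timestamp > stored.timestamp ? incoming : stored;
}

// Assumed scenario: the "fast" node's clock is 5 s ahead, the other is accurate.
const trueT1 = 10000; // fast node writes first
const trueT2 = 12000; // accurate node writes 2 s later, with newer data

const writeFromFast = { value: "old", timestamp: trueT1 + 5000 }; // stamped 15000
const writeFromAccurate = { value: "new", timestamp: trueT2 };    // stamped 12000

let register = { value: null, timestamp: 0 };
register = lwwMerge(register, writeFromFast);
register = lwwMerge(register, writeFromAccurate); // rejected: 12000 < 15000

// register.value is still "old": the later write was dropped without any error.
```

Nothing fails loudly here, which is what makes this class of bug hard to notice in production.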

6️⃣ Bug #3 — Time-Based IDs Are Not Unique

❌ Timestamp-based IDs

const id = Date.now();

Under concurrency:

  • Same millisecond

  • Same ID

  • Collision

Worse across machines:

  • Two nodes read the same millisecond → duplicates

  • Clock jumps backward → previously issued IDs repeat
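
One common way out is to stop treating the timestamp as the whole ID. The sketch below combines a timestamp with a node ID and a per-node sequence number; `makeIdGenerator` and the `nodeId` value are hypothetical, and the scheme assumes each process is given a distinct `nodeId` by configuration.

```javascript
// Sketch of a collision-resistant ID: timestamp + node ID + per-node sequence.
// Assumption: `nodeId` is unique per running process.
function makeIdGenerator(nodeId, now = Date.now) {
  let lastMs = 0;
  let seq = 0;
  return function nextId() {
    // Never let the timestamp part go backward, even if the clock does.
    const ms = Math.max(now(), lastMs);
    if (ms === lastMs) {
      seq += 1; // same millisecond, or the clock went backward: bump the sequence
    } else {
      lastMs = ms;
      seq = 0;
    }
    return `${ms}-${nodeId}-${seq}`;
  };
}

// Fake clock that repeats a millisecond and then jumps backward.
const readings = [1000, 1000, 1000, 990, 1001];
let readingIndex = 0;
const fakeNow = () => readings[readingIndex++];

const nextId = makeIdGenerator("node-1", fakeNow);
const ids = readings.map(() => nextId());
```

The state lives in memory, so a restart combined with a backward clock step could still repeat an ID; a real deployment would need to persist `lastMs` or use a random component as well.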

7️⃣ Bug #4 — TTL & Expiry Bugs

❌ Local Time Expiry

if (Date.now() > expiresAt) {
  invalidate();
}

Failure Modes

  • Clock jumps forward → premature expiry

  • Clock jumps backward → late expiry (data outlives its TTL)

  • Different nodes disagree on validity
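
To see the failure modes side by side, the check can be written with the clock reading passed in as a parameter. The numbers are made up for illustration; `isExpired` is the same comparison as above.

```javascript
// The naive check, with the clock reading injected so jumps can be simulated.
function isExpired(expiresAt, nowMs) {
  return nowMs > expiresAt;
}

const createdAt = 100000;            // wall-clock ms when the entry was written
const ttlMs = 60000;                 // intended lifetime: 60 s
const expiresAt = createdAt + ttlMs; // 160000

// 10 s of real time have passed and the clock is accurate.
const accurate = isExpired(expiresAt, 110000); // false, as intended

// 10 s of real time have passed, but the clock was stepped forward by 90 s.
const jumpedForward = isExpired(expiresAt, 200000); // true: expired 50 s early

// 90 s of real time have passed, but the clock was stepped back by 40 s.
const jumpedBackward = isExpired(expiresAt, 150000); // false: outlives its TTL
```

Two nodes evaluating the same `expiresAt` with different clock errors would return different answers at the same instant.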

8️⃣ Bug #5 — Timeouts That Aren’t Comparable

❌ Distributed Timeout Logic

if (Date.now() - start > timeout) {
  fail();
}

If start was recorded on another machine:

  • Negative elapsed time

  • Waits extended by the size of the skew

  • Premature failures
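
A common alternative is to send the remaining budget (a duration) across the wire instead of an absolute start time, and let each machine measure against its own clock. The sketch below assumes a hypothetical `makeDeadline` helper and simulated clocks; it is one possible shape, not a standard API.

```javascript
// Sketch: pass a remaining budget between machines, never an absolute start time.
// `now` defaults to the local monotonic clock; tests inject a simulated one.
function makeDeadline(budgetMs, now = () => performance.now()) {
  const startedAt = now(); // local reading, only ever compared with local readings
  return {
    remainingMs: () => Math.max(0, budgetMs - (now() - startedAt)),
    expired: () => now() - startedAt >= budgetMs,
  };
}

// Caller: 500 ms total budget, 120 ms already spent locally (simulated clock).
let callerClock = 0;
const callerDeadline = makeDeadline(500, () => callerClock);
callerClock += 120;
const message = { payload: "work", timeoutMs: callerDeadline.remainingMs() };

// Callee: its clock has an unrelated origin, which does not matter.
let calleeClock = 987654;
const calleeDeadline = makeDeadline(message.timeoutMs, () => calleeClock);
calleeClock += 379;
const stillOk = !calleeDeadline.expired();
calleeClock += 1;
const timedOut = calleeDeadline.expired();
```

Network transit time is not counted by this scheme, so the callee's budget is slightly generous; subtracting an estimated transit allowance is a reasonable refinement.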

9️⃣ Why NTP Doesn’t “Fix” This

NTP:

  • Usually adjusts clocks gradually (slewing)

  • Sometimes jumps time (stepping)

  • A step can move the clock backward

NTP reduces skew.
It does not eliminate it.
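
The residual error follows from how the offset is estimated. The sketch below uses the standard four-timestamp calculation from an NTP-style exchange; the scenario numbers are assumptions chosen to show an asymmetric network.

```javascript
// NTP-style offset estimate from one request/response exchange.
// t0: client send, t3: client receive (both read from the client clock).
// t1: server receive, t2: server send (both read from the server clock).
function estimateOffset(t0, t1, t2, t3) {
  const roundTripMs = (t3 - t0) - (t2 - t1);    // time spent on the network
  const offsetMs = ((t1 - t0) + (t2 - t3)) / 2; // server clock minus client clock
  // The estimate assumes both directions took equally long; the true offset
  // can differ from it by up to half the round trip.
  return { offsetMs, errorBoundMs: roundTripMs / 2 };
}

// Assumed scenario: the server is really 100 ms ahead, and the network is
// asymmetric: 10 ms out, 50 ms back, with 5 ms of server processing.
const t0 = 1000;             // client clock
const t1 = t0 + 10 + 100;    // server clock
const t2 = t1 + 5;           // server clock
const t3 = t0 + 10 + 5 + 50; // client clock
const estimate = estimateOffset(t0, t1, t2, t3);
```

Here the estimate comes out at 80 ms against a true offset of 100 ms, within the 30 ms bound but not equal to it. Synchronization narrows the gap; it cannot close it.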

1️⃣0️⃣ The Golden Rule of Time

Never use wall-clock time to establish ordering or correctness.

Use time only for:

  • Human display

  • Logging

  • Approximate expiry (with tolerance)
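
"With tolerance" can be made explicit by treating expiry as a band rather than a point. In this sketch `maxSkewMs` is an assumed bound on clock error that has to be chosen per environment, and `expiryState` is a hypothetical helper.

```javascript
// Sketch: three-way expiry check with an explicit skew tolerance.
function expiryState(expiresAt, nowMs, maxSkewMs) {
  if (nowMs > expiresAt + maxSkewMs) return "expired"; // expired on any clock within the bound
  if (nowMs < expiresAt - maxSkewMs) return "valid";   // valid on any clock within the bound
  return "uncertain"; // inside the tolerance band: take the safe action
}

const expiresAt = 160000;
const maxSkewMs = 2000; // assumed bound on clock error

const states = [150000, 159000, 161500, 163000].map((nowMs) =>
  expiryState(expiresAt, nowMs, maxSkewMs)
);
// states: valid, uncertain, uncertain, expired
```

What "safe" means depends on the data: for a cache entry, refetch; for a lease or lock, assume it may already be gone.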

1️⃣1️⃣ Correct Tool #1 — Monotonic Clocks

Monotonic clocks:

  • Always move forward

  • Not stepped by NTP (the rate may still be slewed slightly)

  • Local only

✅ Correct Timeout Measurement

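const start = performance.now(); // monotonic reading, in milliseconds
// ... do the work being timed ...
if (performance.now() - start > timeoutMs) {
  fail(); // both readings come from the same local clock, so the difference is meaningful
}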

Monotonic readings are meaningful only on the machine that took them. Never send them to another node.

1️⃣2️⃣ Correct Tool #2 — Versioning (Not Time)

Replace timestamps with:

  • Versions

  • Counters

  • CAS tokens

UPDATE doc
SET value = ?, version = version + 1
WHERE id = ? AND version = ?;
-- 0 rows updated means another writer got there first: re-read and retry

Correctness without time.
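
The same compare-and-set idea can be sketched in memory. `VersionedStore` is a hypothetical class used only to show the control flow; a real system would rely on the database's conditional update.

```javascript
// In-memory stand-in for a versioned conditional update.
class VersionedStore {
  constructor() {
    this.docs = new Map(); // id -> { value, version }
  }

  read(id) {
    const doc = this.docs.get(id) ?? { value: undefined, version: 0 };
    return { ...doc };
  }

  // Writes only if the caller saw the current version; returns whether it won.
  compareAndSet(id, expectedVersion, value) {
    const current = this.read(id);
    if (current.version !== expectedVersion) return false; // someone else won
    this.docs.set(id, { value, version: expectedVersion + 1 });
    return true;
  }
}

const store = new VersionedStore();
const seenByA = store.read("doc-1"); // version 0
const seenByB = store.read("doc-1"); // version 0

const aWon = store.compareAndSet("doc-1", seenByA.version, "from A");
const bWon = store.compareAndSet("doc-1", seenByB.version, "from B"); // stale version

// B must re-read and retry instead of silently overwriting A.
const fresh = store.read("doc-1");
const bRetry = store.compareAndSet("doc-1", fresh.version, "from B");
const finalDoc = store.read("doc-1");
```

No clock is consulted anywhere: the loser of a race is told it lost, and decides what to do with the fresh value.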

1️⃣3️⃣ Correct Tool #3 — Logical Clocks

Lamport Clock (Simple)

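on local event or send:  counter = counter + 1
on receive(msg):         counter = max(counter, msg.counter) + 1

Attach the counter to every event and every message.

Guarantee:

  • If A happened before B, A’s counter is smaller than B’s (the converse does not hold)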
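
A Lamport clock needs two rules: increment on every local event or send, and on receive jump past the sender's counter before incrementing. A minimal sketch, with `LamportClock` as an illustrative class name:

```javascript
class LamportClock {
  constructor() {
    this.counter = 0;
  }

  // Local event or message send: advance and return the new value.
  tick() {
    this.counter += 1;
    return this.counter;
  }

  // Message receive: take the larger of the two counters, then advance.
  receive(remoteCounter) {
    this.counter = Math.max(this.counter, remoteCounter) + 1;
    return this.counter;
  }
}

const a = new LamportClock();
const b = new LamportClock();

b.tick();
b.tick();
b.tick();                                    // b does unrelated local work: 1, 2, 3
const sendStamp = a.tick();                  // a sends a message stamped 1
const receiveStamp = b.receive(sendStamp);   // b: max(3, 1) + 1 = 4
const replyStamp = b.tick();                 // b replies, stamped 5
const replyReceived = a.receive(replyStamp); // a: max(1, 5) + 1 = 6
```

Along the causal chain send → receive → reply → reply received, the stamps strictly increase (1, 4, 5, 6). Two unrelated events may still get ordered stamps, so a smaller counter alone does not prove "happened before".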

1️⃣4️⃣ Correct Tool #4 — Vector Clocks (Advanced)

Track causality across nodes.

Use when:

  • Conflict resolution matters

  • Order matters

  • Strong correctness required

Cost:

  • Metadata size

  • Complexity
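
What vector clocks add over a single counter is the ability to detect concurrent updates. The sketch below represents a clock as a plain object mapping node ID to counter; the function names are illustrative.

```javascript
// Compares two vector clocks.
// Returns "before", "after", "equal", or "concurrent".
function compareVectorClocks(x, y) {
  let xAhead = false;
  let yAhead = false;
  for (const node of new Set([...Object.keys(x), ...Object.keys(y)])) {
    const xv = x[node] ?? 0; // a missing entry counts as 0
    const yv = y[node] ?? 0;
    if (xv > yv) xAhead = true;
    if (yv > xv) yAhead = true;
  }
  if (xAhead && yAhead) return "concurrent"; // neither saw the other: a real conflict
  if (xAhead) return "after";
  if (yAhead) return "before";
  return "equal";
}

// Element-wise maximum, used when a node receives another node's clock.
function mergeVectorClocks(x, y) {
  const merged = { ...x };
  for (const [node, count] of Object.entries(y)) {
    merged[node] = Math.max(merged[node] ?? 0, count);
  }
  return merged;
}

const base = { a: 1 };
const editOnA = { a: 2 };       // node a edits after base
const editOnB = { a: 1, b: 1 }; // node b edits after base, without seeing editOnA

const baseVsA = compareVectorClocks(base, editOnA);
const aVsB = compareVectorClocks(editOnA, editOnB);
const merged = mergeVectorClocks(editOnA, editOnB);
```

A "concurrent" result is the signal that the application has to merge or ask the user; a timestamp comparison would have picked a winner arbitrarily. The metadata cost is visible too: one counter per node that has ever written.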

1️⃣5️⃣ Correct Tool #5 — Server-Assigned Time

If time is needed:

  • Assign it in one place

  • Accept bottleneck

INSERT INTO events (created_at)
VALUES (CURRENT_TIMESTAMP);

Centralized time is slower, but every timestamp comes from one clock, so the values are comparable.
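
In application code the same idea looks like a single stamping point that ignores whatever time the client claims. `makeStamper` is a hypothetical helper; the clamp is an assumption added here to cover the case where even the one server's clock steps backward.

```javascript
// Sketch of a single stamping point. `clientTime` on incoming events is kept
// only as untrusted metadata; ordering uses `seq`, assigned here and nowhere else.
function makeStamper(now = Date.now) {
  let seq = 0;
  let lastMs = 0;
  return function stamp(event) {
    seq += 1;
    // Clamp so created_at never decreases even if this server's clock steps back.
    lastMs = Math.max(lastMs, now());
    return { ...event, seq, created_at: lastMs };
  };
}

// Simulated server clock that steps backward between the two events.
const serverReadings = [5000, 4990];
let serverReadingIndex = 0;
const stamp = makeStamper(() => serverReadings[serverReadingIndex++]);

const first = stamp({ type: "login", clientTime: 999999 });
const second = stamp({ type: "click", clientTime: 1 });
```

Ordering by `seq` is exact for everything that passed through this stamper; `created_at` stays useful for display and coarse queries.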

1️⃣6️⃣ Why “Time-Based Sharding” Is Risky

logs_2026_10
logs_2026_11

Clock skew →

  • Writes go to wrong shard

  • Reads miss data

  • Data looks lost (it sits in a neighboring shard)

Always read with an overlap or buffer around shard boundaries.
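
On the read side, a buffer can be as simple as widening the queried range before mapping it to shard names. The sketch assumes monthly shards named in the `logs_YYYY_MM` style above, computed in UTC; `shardFor`, `shardsToRead`, and the buffer size are illustrative.

```javascript
// Shard name for a timestamp, in the logs_YYYY_MM style, computed in UTC.
function shardFor(tsMs) {
  const d = new Date(tsMs);
  const month = String(d.getUTCMonth() + 1).padStart(2, "0");
  return `logs_${d.getUTCFullYear()}_${month}`;
}

// Shards to query for [fromMs, toMs], widened by a skew buffer so records
// stamped by a fast or slow writer near a boundary are still found.
function shardsToRead(fromMs, toMs, bufferMs) {
  const shards = [];
  const start = new Date(fromMs - bufferMs);
  const end = new Date(toMs + bufferMs);
  // Walk month by month from the first day of the starting month.
  const cursor = new Date(Date.UTC(start.getUTCFullYear(), start.getUTCMonth(), 1));
  while (cursor <= end) {
    shards.push(shardFor(cursor.getTime()));
    cursor.setUTCMonth(cursor.getUTCMonth() + 1);
  }
  return shards;
}

const from = Date.UTC(2026, 10, 1, 0, 0, 0); // 2026-11-01T00:00:00Z (months are 0-based)
const to = Date.UTC(2026, 10, 1, 0, 5, 0);   // five minutes later
const withoutBuffer = shardsToRead(from, to, 0);
const withBuffer = shardsToRead(from, to, 60000); // 1 min buffer (assumed)
```

Without the buffer only `logs_2026_11` is queried; with it, `logs_2026_10` is included as well, which is where a writer running a few seconds slow would have put its records.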

1️⃣7️⃣ Production Rulebook (Hard-Won)

  1. Never compare timestamps from different machines

  2. Never rely on client time

  3. Never order events by wall-clock time

  4. Use monotonic clocks for durations

  5. Use versions for correctness

  6. Assume clocks go backward