Correctness Under Concurrency
When your system is fast, scalable, and still wrong
Most production bugs are not caused by slowness.
They are caused by multiple things happening at the same time.
Concurrency bugs exist:
- At low traffic
- On one machine
- Even with perfect hardware
They are logic bugs, not capacity bugs.
1️⃣ What "Correctness" Actually Means
A system is correct if it behaves as expected under all valid interleavings of operations.
Not:
- "Works on my machine"
- "Passes unit tests"
- "Works at low traffic"
But:
Works when operations overlap in unpredictable ways.
2️⃣ Why Concurrency Breaks Correctness
Because time is not linear in distributed systems.
You think this happens:
A → B → C
Reality:
A →
B
A → C
Operations interleave.
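To make the interleaving concrete, here is a minimal sketch. The names `task`, `log`, and `runBoth` are made up for this illustration, and `await null` stands in for any I/O call that hands control back to the event loop.

```javascript
// Illustrative only: `task`, `log`, and `runBoth` are hypothetical names.
const log = [];

async function task(name) {
  log.push(`${name} start`);
  await null; // stands in for any I/O call; the task is suspended here
  log.push(`${name} finish`);
}

async function runBoth() {
  // Both tasks are started before either one is allowed to finish.
  await Promise.all([task("A"), task("B")]);
  return log;
}
```

In this sketch, B starts while A is suspended, so neither task runs from start to finish without the other stepping in.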
3️⃣ The Simplest Concurrency Bug: Lost Update
❌ Naive Code
async function likePost(postId) {
  const post = await db.get(postId);
  post.likes += 1;
  await db.save(post);
}
What you expect
2 likes → likes + 2
What actually happens
2 likes → likes + 1 ❌
Both requests read the same value, each adds 1, and the second save overwrites the first.
Concurrency overwrote correctness.
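The lost update can be reproduced without a real database. The sketch below assumes a hypothetical in-memory `db` with the same `get`/`save` shape as the naive code; `store` and `demoLostUpdate` are names invented for the demo.

```javascript
// Hypothetical in-memory stand-in for the `db` object in the naive code.
const store = new Map([["p1", { id: "p1", likes: 0 }]]);

const db = {
  // Returns a private copy, like a row loaded into application memory.
  async get(id) { return { ...store.get(id) }; },
  async save(post) { store.set(post.id, { ...post }); },
};

// Same read-modify-write logic as the naive code.
async function likePost(postId) {
  const post = await db.get(postId);
  post.likes += 1;
  await db.save(post);
}

async function demoLostUpdate() {
  // Two likes arrive at the same time.
  await Promise.all([likePost("p1"), likePost("p1")]);
  return store.get("p1").likes;
}
```

Here both calls read `likes = 0` before either one saves, so `demoLostUpdate()` resolves to 1 rather than 2.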
4️⃣ Why This Happens (Key Insight)
The bug is not "missing a lock".
The bug is read-modify-write without protection.
This pattern is always unsafe under concurrency.
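5️⃣ Naive Fix #1: Application Locks ❌
await mutex.lock();
try {
  const post = await db.get(postId);
  post.likes += 1;
  await db.save(post);
} finally {
  // Release the lock even if get or save throws.
  await mutex.unlock();
}
Why this fails
- Only works in one process
- Fails with multiple instances, because each instance holds its own lock
- Serializes every request, which kills throughput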
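The multi-instance failure can be simulated in one file. This is a sketch under stated assumptions: `Mutex` is a minimal promise-chain lock written for the demo (not a library class), each call to `makeInstance` plays the role of one server process, and `sharedStore` plays the role of the database they all share.

```javascript
// Minimal promise-based mutex, written for this demo only.
class Mutex {
  constructor() { this.tail = Promise.resolve(); }
  // Resolves to a release function once the lock is held.
  lock() {
    let release;
    const released = new Promise((resolve) => { release = resolve; });
    const acquired = this.tail.then(() => release);
    this.tail = this.tail.then(() => released);
    return acquired;
  }
}

// Shared "database" that every instance talks to.
const sharedStore = { likes: 0 };
const sharedDb = {
  async get() { return { ...sharedStore }; },
  async save(post) { sharedStore.likes = post.likes; },
};

// Each instance has its own private mutex, like a separate process would.
function makeInstance() {
  const mutex = new Mutex();
  return async function likePost() {
    const release = await mutex.lock();
    try {
      const post = await sharedDb.get();
      post.likes += 1;
      await sharedDb.save(post);
    } finally {
      release();
    }
  };
}

async function demoMutexScope() {
  const instanceA = makeInstance();
  const instanceB = makeInstance();

  sharedStore.likes = 0;
  await Promise.all([instanceA(), instanceA()]); // same lock protects both
  const sameInstance = sharedStore.likes;

  sharedStore.likes = 0;
  await Promise.all([instanceA(), instanceB()]); // two unrelated locks
  const acrossInstances = sharedStore.likes;

  return { sameInstance, acrossInstances };
}
```

In this simulation the two likes on one instance produce 2, while the two likes split across instances produce 1: the lock only protects callers that share it.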
6️⃣ Correct Fix #1: Atomic Operations (Best)
UPDATE posts
SET likes = likes + 1
WHERE id = ?;
Why this works
- Atomic at DB level
- No read-modify-write in application code
- Scales correctly
Push correctness down to the lowest layer possible.
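As a rough illustration of what the atomic statement buys you, the sketch below models the storage layer with an in-memory map. `likeCounts`, `atomicDb.increment`, and `likePostAtomic` are hypothetical names; the point is only that the read and the write happen in one step that no other request can split.

```javascript
// Hypothetical in-memory model of the storage layer.
const likeCounts = new Map([["p1", 0]]);

const atomicDb = {
  // Mimics `SET likes = likes + 1`: there is no await between the read
  // and the write, so no other request can run in between.
  async increment(id, by = 1) {
    const next = (likeCounts.get(id) ?? 0) + by;
    likeCounts.set(id, next);
    return next;
  },
};

// The application never sees the old value; it only states the intent.
async function likePostAtomic(postId) {
  return atomicDb.increment(postId);
}

async function demoAtomicLikes() {
  await Promise.all([likePostAtomic("p1"), likePostAtomic("p1")]);
  return likeCounts.get("p1");
}
```

With the increment owned by the storage layer, two concurrent likes in this sketch end at 2.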
7️⃣ Correct Fix #2: Optimistic Concurrency Control (OCC)
Version-based update
const post = await db.get(postId);
// The update applies only if the stored version still matches the one we read.
await db.update(
  { id: postId, version: post.version },
  { likes: post.likes + 1, version: post.version + 1 }
);
If the version no longer matches, nothing is updated: re-read and retry.
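The retry step could look like the sketch below. It assumes a hypothetical `occDb.update(match, changes)` that returns the number of rows it changed (0 or 1); the in-memory `rows` map, the `maxAttempts` limit, and the function names are inventions for the demo, not a real driver API.

```javascript
// Hypothetical in-memory table with a version column.
const rows = new Map([["p1", { likes: 0, version: 0 }]]);

const occDb = {
  async get(id) { return { ...rows.get(id) }; },
  // Applies `changes` only if the stored version equals `match.version`.
  // Returns the number of rows updated: 1 on success, 0 on a version conflict.
  async update(match, changes) {
    const row = rows.get(match.id);
    if (!row || row.version !== match.version) return 0;
    rows.set(match.id, { ...row, ...changes });
    return 1;
  },
};

async function likePostOcc(postId, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const post = await occDb.get(postId);
    const updated = await occDb.update(
      { id: postId, version: post.version },
      { likes: post.likes + 1, version: post.version + 1 }
    );
    if (updated === 1) return attempt; // our write won
    // Someone else changed the row first: loop to re-read and try again.
  }
  throw new Error(`gave up after ${maxAttempts} attempts`);
}

async function demoOcc() {
  await Promise.all([likePostOcc("p1"), likePostOcc("p1")]);
  return rows.get("p1");
}
```

In this sketch one of the two concurrent calls hits a version conflict, re-reads, and succeeds on its second attempt, so the row ends with `likes: 2` and `version: 2`. Capping the attempts is a design choice to avoid spinning forever under heavy contention.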
8️⃣ Correctness Bug #2: Check-Then-Act
❌ Broken Logic
if (balance >= amount) {
  balance -= amount;
}
Why it fails
- The condition can change between the check and the act
- The write then runs on an assumption that is no longer true
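A small simulation shows the gap between the check and the act. The `account` object, the `bank` helper, and `withdrawUnsafe` are hypothetical; the balance is read and written through async calls, as it would be against a real data store.

```javascript
// Hypothetical account storage with async reads and writes.
const account = { balance: 100 };
const bank = {
  async getBalance() { return account.balance; },
  async setBalance(value) { account.balance = value; },
};

async function withdrawUnsafe(amount) {
  const balance = await bank.getBalance(); // check uses a snapshot
  if (balance >= amount) {
    await bank.setBalance(balance - amount); // act uses the same stale snapshot
    return true;
  }
  return false;
}

async function demoDoubleWithdraw() {
  const approved = await Promise.all([withdrawUnsafe(80), withdrawUnsafe(80)]);
  return { approved, balance: account.balance };
}
```

In this sketch both withdrawals see a balance of 100 and both are approved, so 160 is paid out of an account that held 100, and the stored balance ends at 20.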
9️⃣ Correct Fix: Combine Check + Act
UPDATE accounts
SET balance = balance - 100
WHERE id = ? AND balance >= 100;
Rows affected:
- 1 → success
- 0 → insufficient funds
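On the application side, the only job left is to interpret the affected-row count. The sketch below models the conditional update with a hypothetical in-memory `debitIfSufficient`; the `accounts` map and the function names are assumptions made for the demo.

```javascript
// Hypothetical in-memory accounts table.
const accounts = new Map([["acct-1", { balance: 100 }]]);

// Mimics `UPDATE ... WHERE id = ? AND balance >= ?`: the check and the write
// happen in one step, and the return value is the number of rows affected.
async function debitIfSufficient(id, amount) {
  const account = accounts.get(id);
  if (!account || account.balance < amount) return 0;
  account.balance -= amount;
  return 1;
}

async function withdraw(id, amount) {
  const rowsAffected = await debitIfSufficient(id, amount);
  return rowsAffected === 1 ? "success" : "insufficient funds";
}

async function demoSafeWithdraw() {
  const outcomes = await Promise.all([
    withdraw("acct-1", 80),
    withdraw("acct-1", 80),
  ]);
  return { outcomes, balance: accounts.get("acct-1").balance };
}
```

Here only one of the two concurrent withdrawals succeeds and the balance ends at 20. Note that with the SQL version, 0 rows affected also covers an `id` that matches no account, so a real handler may want to distinguish that case.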
🔟 Correctness Bug #3: Idempotency
❌ Duplicate Request Bug
createOrder(order);
createOrder(order); // retry
Result:
- Two orders ❌
1️⃣1️⃣ Correct Fix: Idempotency Keys
// A retry carries the same key, so it gets the stored result instead of a second order.
if (seen(idempotencyKey)) return cachedResult;
const result = createOrder(order);
store(idempotencyKey, result);
Correctness is about recognizing repeated intent.
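One way to sketch this in runnable form is to record the key before the work finishes, so that a retry arriving mid-flight also reuses the first attempt. Everything here is hypothetical: `resultsByKey` is an in-process map, `createOrder` is a stub, and a multi-instance deployment would likely need a shared store with a uniqueness guarantee on the key instead.

```javascript
// Hypothetical in-process record: idempotencyKey -> Promise of the order result.
const resultsByKey = new Map();
let ordersCreated = 0;

// Stub for the real order creation.
async function createOrder(order) {
  ordersCreated += 1;
  return { orderId: ordersCreated, ...order };
}

function createOrderOnce(idempotencyKey, order) {
  const existing = resultsByKey.get(idempotencyKey);
  if (existing) return existing; // repeated intent: reuse the first attempt

  // Store the promise right away, before it settles, so a concurrent
  // retry with the same key finds it.
  const pending = createOrder(order);
  resultsByKey.set(idempotencyKey, pending);
  // If creation fails, forget the key so a later retry can try again.
  pending.catch(() => resultsByKey.delete(idempotencyKey));
  return pending;
}

async function demoRetry() {
  const order = { item: "book" };
  const [first, retry] = await Promise.all([
    createOrderOnce("key-123", order),
    createOrderOnce("key-123", order), // client retry with the same key
  ]);
  return { ordersCreated, sameOrder: first.orderId === retry.orderId };
}
```

In this sketch the retry receives the same order as the first call and only one order is created.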
1️⃣2️⃣ Why Correctness Is Harder Than Performance
| Performance | Correctness |
|---|---|
| Gradual failure | Binary failure |
| Can degrade | Cannot be "almost right" |
| Measurable | Often invisible |
| Testable | Needs reasoning |
1️⃣3️⃣ Why Load Doesn't Matter Here
Concurrency bugs:
- Happen at 2 requests
- Reproduce rarely
- Pass load tests
- Fail in prod
This is why senior engineers fear them.
🎯 Golden Rules for Correctness Under Concurrency
- Avoid read-modify-write
- Use atomic primitives
- Prefer database-enforced correctness
- Make operations idempotent
- Assume retries happen
- Design for reordering
