📘 SQS vs Kafka: Correctness Tradeoffs

30 Mar 2022 · 4 min read

Choosing between “event delivery” and “event history”

SQS and Kafka solve different correctness problems.
Throughput is secondary.

1️⃣ The Wrong Question People Ask

❌ Wrong

“Which is faster? SQS or Kafka?”

✅ Right

“What correctness guarantees does my system actually need?”

Because once you pick:

  • SQS → you accept lossy history

  • Kafka → you accept coordination cost

You cannot escape the tradeoff.

2️⃣ Core Mental Model (This Matters)

🟦 SQS Mental Model

A distributed work queue

  • Message exists until it is processed and deleted, or until its retention period (at most 14 days) expires

  • After deletion → gone forever

  • Queue represents pending work

🟥 Kafka Mental Model

An append-only distributed log

  • Messages are immutable

  • Consumers track position

  • Log represents event history

This difference explains every correctness tradeoff.
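To make the two models concrete, here is a minimal in-memory sketch. `WorkQueue` and `EventLog` are hypothetical toy classes, not the SQS or Kafka APIs, and the toy queue merges receive and delete into one step, which real SQS keeps separate.

```python
from collections import deque


class WorkQueue:
    """Toy work queue: once a message is taken and deleted, it is gone."""

    def __init__(self):
        self._pending = deque()

    def send(self, body):
        self._pending.append(body)

    def receive_and_delete(self):
        # Hands out the next message and removes it; nothing is kept afterwards.
        return self._pending.popleft() if self._pending else None

    def depth(self):
        return len(self._pending)


class EventLog:
    """Toy append-only log: records stay put, readers choose where to start."""

    def __init__(self):
        self._records = []

    def append(self, body):
        self._records.append(body)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset):
        # Reading never removes anything, so any offset can be read again.
        return self._records[offset:]


queue, log = WorkQueue(), EventLog()
for event in ["created", "paid", "shipped"]:
    queue.send(event)
    log.append(event)

drained = [queue.receive_and_delete() for _ in range(3)]
print(drained, queue.depth())  # the queue is empty after processing
print(log.read_from(0))        # the log still holds every event
```

The queue answers "what work is left?". The log answers "what happened?".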

3️⃣ Delivery Semantics (The Foundation)

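| Property | SQS | Kafka |
| --- | --- | --- |
| Delivery | At-least-once | At-least-once |
| Exactly-once | ❌ | ⚠️ (within constraints) |
| Ordering | FIFO queues only (per MessageGroupId) | Per partition |
| Message retention | Until deleted, or retention expiry (max 14 days) | Time/size-based |
| Replay | ❌ | ✅ |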

Kafka remembers.
SQS forgets.

4️⃣ Duplicate Processing (Both Have It, Differently)

SQS Duplicates

  • Consumer crashes (or runs long) before deleting the message

  • Visibility timeout expires

  • Message reappears

Process → crash → process again

Kafka Duplicates

  • Consumer crashes before commit

  • Offset not committed

  • Message replayed

Process → crash → replay from offset

Both are at-least-once.
But Kafka lets you replay intentionally.
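The SQS-style redelivery path can be sketched with a toy queue driven by a logical clock. `VisibilityQueue` and its methods are hypothetical names for illustration, not the SQS API. The Kafka path differs in mechanism (an uncommitted offset instead of a timeout) but produces the same double processing.

```python
class VisibilityQueue:
    """Toy queue with a visibility timeout, driven by a logical clock."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._bodies = {}      # message id -> body
        self._visible_at = {}  # message id -> logical time it becomes visible
        self._next_id = 0

    def send(self, body):
        msg_id = self._next_id
        self._next_id += 1
        self._bodies[msg_id] = body
        self._visible_at[msg_id] = 0
        return msg_id

    def receive(self, now):
        # Hand out the first visible message and hide it for the timeout.
        for msg_id, body in self._bodies.items():
            if self._visible_at[msg_id] <= now:
                self._visible_at[msg_id] = now + self.visibility_timeout
                return msg_id, body
        return None

    def delete(self, msg_id):
        self._bodies.pop(msg_id, None)
        self._visible_at.pop(msg_id, None)


queue = VisibilityQueue(visibility_timeout=30)
queue.send("charge order 42")

deliveries = []
first = queue.receive(now=0)    # consumer A receives, then crashes before delete
deliveries.append(first[1])
hidden = queue.receive(now=10)  # still invisible, so nobody else sees it
second = queue.receive(now=30)  # timeout expired, so the message reappears
deliveries.append(second[1])
queue.delete(second[0])         # consumer B finishes and deletes

print(deliveries)  # the same body was handed out twice
```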

5️⃣ Correctness Implication #1: Idempotency

In SQS

Idempotency is mandatory.

No replay = no recovery

If you mess up:

  • Data is wrong forever

In Kafka

Idempotency is strongly recommended.

But you have:

  • Rewind

  • Reprocess

  • Fix-forward

Kafka tolerates mistakes.
SQS does not.
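A minimal sketch of an idempotent consumer, assuming every event carries a stable unique ID. `IdempotentLedger` and the `evt-*` IDs are hypothetical. In a real system the applied-ID set and the balance would have to be persisted together, otherwise a crash between the two writes brings the duplicate problem back.

```python
class IdempotentLedger:
    """Applies each event at most once, keyed by event ID."""

    def __init__(self):
        self.balance = 0
        self._applied = set()

    def apply(self, event_id, amount):
        if event_id in self._applied:
            return False  # duplicate delivery: ignore it
        self.balance += amount
        self._applied.add(event_id)
        return True


ledger = IdempotentLedger()
# evt-1 is delivered twice, as an at-least-once system is allowed to do.
deliveries = [("evt-1", 100), ("evt-2", -30), ("evt-1", 100)]
results = [ledger.apply(event_id, amount) for event_id, amount in deliveries]

print(results)         # [True, True, False]
print(ledger.balance)  # 70
```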

6️⃣ Ordering Guarantees (Subtle but Huge)

SQS

  • Standard: ❌ no ordering

  • FIFO: ordering per MessageGroupId

If one message is slow:

  • Entire group blocks

Kafka

  • Ordering per partition

  • You control partitioning key

orderId → same partition

Parallelism + ordering = possible.
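The routing idea behind "orderId → same partition" can be sketched as follows. `partition_for` and `NUM_PARTITIONS` are hypothetical, and `zlib.crc32` is only a stand-in hash (the Kafka Java client's default partitioner uses murmur2). The property that matters is that equal keys always land in the same partition.

```python
import zlib

NUM_PARTITIONS = 4  # illustrative value


def partition_for(key, num_partitions=NUM_PARTITIONS):
    # Stand-in hash: any deterministic hash of the key gives the same property.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = [[] for _ in range(NUM_PARTITIONS)]
events = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "paid"),
    ("order-2", "cancelled"),
    ("order-1", "shipped"),
]
for order_id, status in events:
    partitions[partition_for(order_id)].append((order_id, status))


def history(order_id):
    # All events for one order sit in one partition, in append order.
    return [s for oid, s in partitions[partition_for(order_id)] if oid == order_id]


print(history("order-1"))
print(history("order-2"))
```

Different orders can be processed in parallel on different partitions, while each order still sees its own events in sequence.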

7️⃣ Correctness Implication #2: Causality

SQS

Once message is deleted:

  • Causality is lost

  • History is gone

You cannot answer:

“What happened before this?”

Kafka

Log preserves causality.

You can:

  • Reconstruct timelines

  • Debug bugs

  • Validate invariants

Kafka is debuggable.
SQS is operational.

8️⃣ Failure Recovery (Huge Difference)

SQS Failure Recovery

| Failure | Outcome |
| --- | --- |
| Buggy consumer | Data corrupted |
| Bad deploy | Messages gone |
| Wrong logic | No rewind |

Only options:

  • Manual repair

  • Re-run upstream jobs

Kafka Failure Recovery

| Failure | Outcome |
| --- | --- |
| Buggy consumer | Reset offset |
| Bad deploy | Replay |
| New logic | Reprocess |

Kafka gives you time travel.
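A rough sketch of what "time travel" buys you: derived state is rebuilt by replaying the retained log through corrected logic. The list `log` stands in for a topic partition, and `buggy_apply`, `fixed_apply` and `replay` are hypothetical helpers, not Kafka client calls.

```python
# A retained log of events (a stand-in for one topic partition).
log = [("deposit", 100), ("withdraw", 30), ("deposit", 50)]


def buggy_apply(balance, event):
    kind, amount = event
    return balance + amount  # bug: withdrawals are added instead of subtracted


def fixed_apply(balance, event):
    kind, amount = event
    return balance + amount if kind == "deposit" else balance - amount


def replay(events, apply, from_offset=0, initial=0):
    # Rebuild derived state by re-reading the log from a chosen offset.
    state = initial
    for event in events[from_offset:]:
        state = apply(state, event)
    return state


corrupted = replay(log, buggy_apply)  # what the bad deploy produced
repaired = replay(log, fixed_apply)   # reset to offset 0 and reprocess

print(corrupted, repaired)  # 180 120
```

With a queue, the three input events would already be deleted, so only the corrupted total would remain.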

9️⃣ DLQ vs Replay (Philosophical Difference)

SQS DLQ

  • “This message is broken”

  • Isolate and move on

Correctness boundary

Kafka Replay

  • “Processing was wrong”

  • Fix logic

  • Re-run history

Correctness recovery
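The "isolate and move on" behavior of a DLQ can be sketched as a retry loop with a receive limit, similar in spirit to the `maxReceiveCount` setting in an SQS redrive policy. `drain_with_dlq` and `parse_amount` are hypothetical, and the loop is a simulation rather than real queue polling.

```python
def drain_with_dlq(messages, handler, max_receives=3):
    """Process messages; park any that fail max_receives times."""
    processed, dead_letters = [], []
    pending = [(body, 0) for body in messages]
    while pending:
        body, receives = pending.pop(0)
        receives += 1
        try:
            processed.append(handler(body))
        except ValueError:
            if receives >= max_receives:
                dead_letters.append(body)          # isolate the poison message
            else:
                pending.append((body, receives))   # make it available again
    return processed, dead_letters


def parse_amount(body):
    return int(body)  # raises ValueError for malformed input


processed, dead_letters = drain_with_dlq(["10", "oops", "25"], parse_amount)
print(processed)     # [10, 25]
print(dead_letters)  # ['oops']
```

The DLQ protects the healthy messages from the broken one. It does nothing for messages that were processed "successfully" by wrong logic, which is where replay comes in.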

10️⃣ Backpressure & Load (Correctness Angle)

SQS

  • Producers keep producing

  • Queue depth grows

  • Processing delay grows silently

Correctness risk:

  • Time-sensitive events become meaningless

Kafka

  • Consumers lag

  • Lag is observable

  • Replay window bounded by retention

Correctness risk:

  • Lag-based staleness

  • But visible and measurable
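Lag is measurable because it is just arithmetic on offsets: the log end offset minus the committed offset, per partition. A minimal sketch with made-up offsets (`consumer_lag` is a hypothetical helper, and the numbers are illustrative only):

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: records written but not yet committed as consumed."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


# Illustrative numbers, not real measurements.
log_end_offsets = {0: 1500, 1: 1480, 2: 1510}
committed_offsets = {0: 1500, 1: 1200, 2: 1505}

lag = consumer_lag(log_end_offsets, committed_offsets)
total_lag = sum(lag.values())

print(lag)        # {0: 0, 1: 280, 2: 5}
print(total_lag)  # 285
```

Here partition 1 is the one falling behind, which is the kind of signal you can alert on before events go stale.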

1️⃣1️⃣ Exactly-Once Semantics (Reality Check)

SQS

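❌ Not available end to end. FIFO queues deduplicate sends within a 5-minute window, but a message whose consumer fails before deleting it is delivered again.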

Kafka

⚠️ Possible only if:

  • Idempotent producers

  • Transactional writes

  • Single Kafka cluster

  • Controlled sinks

Even then:

Exactly-once is contextual, not absolute.
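One way to read "controlled sinks": the sink stores the result and the consumer's position in the same transaction, so a crash can never leave one without the other. The sketch below uses an in-memory SQLite database as the sink and a plain list as the log. The table names and the `consume` helper are hypothetical, and this is not Kafka's transactional API.

```python
import sqlite3


def make_sink():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE totals (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL)")
    db.execute("CREATE TABLE progress (id INTEGER PRIMARY KEY, next_offset INTEGER NOT NULL)")
    db.execute("INSERT INTO totals VALUES (1, 0)")
    db.execute("INSERT INTO progress VALUES (1, 0)")
    db.commit()
    return db


def consume(db, log, crash_at=None):
    # Resume from the offset stored in the sink, not from consumer memory.
    next_offset = db.execute("SELECT next_offset FROM progress WHERE id = 1").fetchone()[0]
    for offset in range(next_offset, len(log)):
        with db:  # one transaction: both updates commit, or neither does
            db.execute("UPDATE totals SET amount = amount + ? WHERE id = 1", (log[offset],))
            if offset == crash_at:
                raise RuntimeError("simulated crash before the offset is stored")
            db.execute("UPDATE progress SET next_offset = ? WHERE id = 1", (offset + 1,))


def total(db):
    return db.execute("SELECT amount FROM totals WHERE id = 1").fetchone()[0]


log = [10, 20, 30, 40]
db = make_sink()

try:
    consume(db, log, crash_at=2)  # offsets 0 and 1 commit, offset 2 rolls back
except RuntimeError:
    pass
after_crash = total(db)

consume(db, log)                  # restart: resumes at offset 2
final = total(db)
consume(db, log)                  # a further run finds nothing new to apply

print(after_crash, final, total(db))  # 30 100 100
```

The guarantee only holds because the sink is transactional and owns the offset. Swap in a sink with side effects you cannot roll back (an email, a third-party API call) and you are back to at-least-once plus idempotency.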

1️⃣2️⃣ Multi-Consumer Correctness

SQS

Each message → one consumer

Bad for:

  • Fan-out

  • Independent consumers

You must duplicate queues, typically one per consumer, fed from a shared SNS topic.

Kafka

Many consumers can read same log.

Good for:

  • Analytics

  • Auditing

  • Side effects

Kafka separates event storage from event consumption.
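A minimal sketch of that separation: one stored log, and one offset per consumer group. The group names and the `poll` helper are hypothetical, and real consumer groups also handle partition assignment, which is left out here.

```python
log = ["order created", "order paid", "order shipped"]

# Each group keeps its own position; reading does not affect other groups.
offsets = {"analytics": 0, "audit": 0}


def poll(group, max_records=10):
    start = offsets[group]
    records = log[start:start + max_records]
    offsets[group] = start + len(records)
    return records


audit_seen = poll("audit")                          # reads everything at once
analytics_first = poll("analytics", max_records=1)  # a slower consumer
analytics_rest = poll("analytics")

print(audit_seen)
print(analytics_first + analytics_rest)
```

Both groups end up seeing the full history, each at its own pace, without a second copy of the data.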

1️⃣3️⃣ D2: Mental Model Comparison

[Diagram: SQS mental model vs Kafka mental model]

1️⃣4️⃣ When SQS Is the Correct Choice

Use SQS when:

  • You want work distribution

  • You don’t care about history

  • You want minimal ops

  • You can tolerate duplicates

  • You want managed simplicity

Examples:

  • Email sending

  • Image processing

  • Background jobs

  • Async side effects

1️⃣5️⃣ When Kafka Is the Correct Choice

Use Kafka when:

  • You need event history

  • Replay matters

  • Debuggability matters

  • Multiple consumers exist

  • Correctness > simplicity

Examples:

  • Payments

  • Order lifecycle

  • Analytics

  • CDC

  • Audit logs

1️⃣6️⃣ The Hidden Cost Tradeoff

| Dimension | SQS | Kafka |
| --- | --- | --- |
| Ops cost | Low | High |
| Correctness recovery | ❌ | ✅ |
| Debuggability | ❌ | ✅ |
| Time travel | ❌ | ✅ |
| Simplicity | ✅ | ❌ |

Kafka charges operational complexity
in exchange for correctness leverage.

1️⃣7️⃣ Design Rule (Hard-Won)

If you cannot afford to lose history, do not use SQS.
If you cannot afford operational complexity, do not use Kafka.