How Batch Push to SQS Can Still Melt Your Consumers

4 Apr 2022 · 4 min read

At first glance, batching with Amazon Web Services SQS looks like the perfect optimization.

Send 10 messages in one API call.
Reduce network overhead.
Increase throughput.
Lower cost.

So teams aggressively batch-produce messages into Amazon Simple Queue Service queues.

Everything works beautifully in staging.

Then production traffic arrives.

Suddenly:

Consumers start slowing down.
Delete APIs begin throttling.
Messages reappear.
Duplicate processing starts happening.
Queue depth explodes.

And the surprising part?

The bottleneck was not sending messages. It was deleting them.

The Architecture Most Teams Build

A common architecture looks like this:

Producer Service
      |
      | SendMessageBatch (10 msgs/request)
      v
+-------------------+
|       SQS         |
+-------------------+
      |
      v
Consumer Workers
      |
      | process each message
      |
      | DeleteMessage
      v
SQS Acknowledgement

The producer uses:

SendMessageBatch

which sends up to 10 messages per request.

This is efficient.

But many teams unknowingly do this on the consumer side:

for (Message msg : messages) {
    process(msg);
    // One DeleteMessage request per message
    sqs.deleteMessage(queueUrl, msg.getReceiptHandle());
}

That means:

1 batch receive call
BUT 10 individual delete calls

This becomes dangerous at scale.

The Hidden Problem

Imagine this traffic:

Metric	Value
Messages/sec	100,000
Batch size	10
Receive requests/sec	10,000
Delete requests/sec	100,000

Even though producers optimized API usage with batching, consumers accidentally multiplied API traffic again during deletion.

Now your system performs:

10x more delete requests
Higher TCP overhead
More AWS API throttling
Increased latency
Retry storms

The queue itself stays healthy.

But the acknowledgement path collapses.

What Happens During Throttling

When DeleteMessage calls are throttled or time out (FIFO queues, for example, enforce per-action request quotas):

Consumer processed message successfully
        |
DeleteMessage failed
        |
Visibility timeout expires
        |
Message becomes visible again
        |
Another consumer reprocesses it

Now duplicate processing begins.

This creates secondary problems:

Duplicate payments
Duplicate emails
Double inventory deduction
Repeated notifications
Idempotency pressure on downstream systems
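Because a failed delete leads to redelivery, consumers usually need some form of duplicate guard. The following is a minimal sketch, assuming deduplication by message ID inside a single process. `IdempotentHandler` and its method names are hypothetical, and a real system would need a durable, shared store (for example a database unique constraint) rather than an in-memory set.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical guard: runs the handler at most once per message ID within this process.
final class IdempotentHandler {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    private final Consumer<String> handler;

    IdempotentHandler(Consumer<String> handler) {
        this.handler = handler;
    }

    // Returns true if the handler ran, false if this message ID was already handled.
    boolean handle(String messageId, String body) {
        if (!seen.add(messageId)) {
            return false; // redelivery of a message we already processed
        }
        try {
            handler.accept(body);
            return true;
        } catch (RuntimeException ex) {
            seen.remove(messageId); // let a later redelivery retry the work
            throw ex;
        }
    }
}
```

A redelivered message with the same ID is skipped instead of charging a customer twice, which takes some of the pressure off downstream systems while the delete path is being fixed.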

The real issue was never SQS delivery.

It was acknowledgement scalability.

The Dangerous Feedback Loop

This creates a nasty feedback cycle.

Delete throttling
      ↓
Messages reappear
      ↓
Consumers receive more messages
      ↓
More delete attempts
      ↓
Even more throttling

Eventually:

Consumer CPU spikes
Retry queues grow
Visibility timeout tuning becomes unstable
DLQs start filling

Teams often incorrectly blame:

SQS
AWS networking
Consumer autoscaling
Visibility timeout settings

But the root cause is usually:

Individual deletes after batched receives.
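The feedback loop can be illustrated with a toy model. This is a sketch under strong assumptions: throttling is modelled as a fixed budget of delete requests per tick, every unacknowledged message is redelivered on the next tick, and all names (`AckLoopModel`, `duplicates`) are hypothetical. It is not a simulation of real SQS behaviour.

```java
// Toy model of the acknowledgement feedback loop; not a simulation of real SQS.
final class AckLoopModel {
    // Returns how many duplicate processings occur over the given number of ticks.
    static long duplicates(int ticks, long arrivalsPerTick,
                           long deleteRequestsPerTick, int messagesPerDelete) {
        long pending = 0;    // processed but not deleted, so redelivered next tick
        long duplicates = 0;
        long deleteCapacity = deleteRequestsPerTick * messagesPerDelete;
        for (int t = 0; t < ticks; t++) {
            duplicates += pending;                      // redelivered messages run again
            long processed = pending + arrivalsPerTick; // everything visible is processed
            long deleted = Math.min(processed, deleteCapacity);
            pending = processed - deleted;              // the rest reappear
        }
        return duplicates;
    }
}
```

With 1,000 arrivals per tick and a budget of 500 delete requests per tick, three ticks of individual deletes produce 1,500 duplicate processings in this model, and the backlog keeps growing. With 10 messages per delete request the same budget produces none.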

The Correct Design

If you batch receive messages:

ReceiveMessage(MaxNumberOfMessages=10)

you should also batch delete them:

DeleteMessageBatch
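DeleteMessageBatch accepts at most 10 entries per request, so a consumer holding more receipt handles than that has to split them into groups. A minimal sketch, assuming a hypothetical `BatchDeleteClient` interface that stands in for the real SQS client so the example stays self-contained:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the SQS client; a real consumer would call DeleteMessageBatch here.
interface BatchDeleteClient {
    // Deletes up to 10 receipt handles in one request and returns the handles that failed.
    List<String> deleteBatch(List<String> receiptHandles);
}

final class BatchDeleter {
    static final int MAX_ENTRIES = 10; // SQS limit per DeleteMessageBatch request

    // Splits the handles into groups of at most 10 and issues one request per group.
    // Returns every handle the client reported as failed.
    static List<String> deleteAll(BatchDeleteClient client, List<String> receiptHandles) {
        List<String> failed = new ArrayList<>();
        for (int from = 0; from < receiptHandles.size(); from += MAX_ENTRIES) {
            int to = Math.min(from + MAX_ENTRIES, receiptHandles.size());
            failed.addAll(client.deleteBatch(receiptHandles.subList(from, to)));
        }
        return failed;
    }
}
```

Returning the failed handles matters because a batch delete can partially fail. Anything in that list was not acknowledged and will be redelivered unless it is retried before its visibility timeout expires.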

Correct architecture:

Producer
   |
SendMessageBatch
   |
   v
SQS
   |
ReceiveMessage (up to 10 msgs)
   |
Consumer
   |
Process messages, track successes
   |
DeleteMessageBatch

Now instead of:

100,000 delete requests/sec

you get:

10,000 delete requests/sec

That is a massive reduction.
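The figures above are just ceiling division of the message rate by the number of messages each request carries. A small sketch using the article's illustrative numbers (the class and method names are hypothetical):

```java
// Request-rate arithmetic using the article's illustrative traffic numbers.
final class RequestRates {
    // Requests needed per second when each request carries up to batchSize messages.
    static long requestsPerSecond(long messagesPerSecond, int batchSize) {
        return (messagesPerSecond + batchSize - 1) / batchSize; // ceiling division
    }

    public static void main(String[] args) {
        long messagesPerSecond = 100_000;
        long individualDeletes = requestsPerSecond(messagesPerSecond, 1); // DeleteMessage
        long batchedDeletes = requestsPerSecond(messagesPerSecond, 10);   // DeleteMessageBatch
        System.out.println("individual delete requests/sec: " + individualDeletes);
        System.out.println("batched delete requests/sec:    " + batchedDeletes);
    }
}
```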

Why This Improves More Than Cost

Most people think batching is only about reducing AWS billing.

But batching also improves:

1. Network Efficiency

Fewer:

TLS handshakes
TCP packets
HTTP requests

2. Better Consumer Throughput

Workers spend less time waiting on acknowledgement APIs.

3. Lower Retry Amplification

Throttling probability drops significantly.

4. More Stable Visibility Timeouts

Messages are acknowledged faster and more consistently.

5. Better Horizontal Scaling

Consumers can scale without overwhelming SQS APIs.

The Production-Grade Consumer Pattern

A robust consumer flow usually looks like this:

1. Receive batch of messages
2. Process in parallel
3. Track successful messages
4. Batch delete only successful ones
5. Retry failed messages later

Pseudo-flow:

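// Assumes the AWS SDK for Java v1 (com.amazonaws.services.sqs.model.*)
List<DeleteMessageBatchRequestEntry> successful = new ArrayList<>();

for (Message msg : messages) {
    try {
        process(msg);
        // Entry IDs only need to be unique within one batch request
        String entryId = String.valueOf(successful.size());
        successful.add(new DeleteMessageBatchRequestEntry(entryId, msg.getReceiptHandle()));
    } catch (Exception ex) {
        // Not deleted: the message reappears after its visibility timeout
        log.error("processing failed for message " + msg.getMessageId(), ex);
    }
}

// DeleteMessageBatch rejects an empty entry list
if (!successful.isEmpty()) {
    DeleteMessageBatchResult result = sqs.deleteMessageBatch(queueUrl, successful);
    // Entries in getFailed() were not deleted and will be redelivered
    result.getFailed().forEach(f -> log.warn("delete failed: " + f.getMessage()));
}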

This avoids:

deleting failed messages
unnecessary retries
excessive API calls
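The loop above processes messages one at a time, while step 2 of the flow calls for parallel processing. One possible way to do that while still tracking successes is sketched below, assuming a caller-supplied thread pool. `ParallelBatchProcessor` is a hypothetical helper, and the message type is left generic so the example does not depend on the AWS SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class ParallelBatchProcessor {
    // Runs the handler for every message on the pool and returns, in the original
    // order, the messages whose handler completed without throwing.
    static <M> List<M> processAll(List<M> messages, Consumer<M> handler, ExecutorService pool) {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (M msg : messages) {
            tasks.add(() -> {
                handler.accept(msg);
                return null;
            });
        }
        List<M> successful = new ArrayList<>();
        try {
            // invokeAll blocks until all tasks finish; futures come back in task order.
            List<Future<Void>> results = pool.invokeAll(tasks);
            for (int i = 0; i < results.size(); i++) {
                try {
                    results.get(i).get();
                    successful.add(messages.get(i));
                } catch (ExecutionException ex) {
                    // Handler failed: leave the message undeleted so it is redelivered.
                }
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted while processing batch", ex);
        }
        return successful;
    }
}
```

The returned list is what would be handed to the batch delete, so failed messages are never acknowledged by accident.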

Another Common Mistake

Some systems do this:

Receive 10 messages
Process 1 message
Immediately delete 1 message
Repeat

This destroys batching benefits entirely.

Instead:

accumulate acknowledgements
flush periodically
batch deletes intelligently

Many high-scale systems maintain:

in-memory delete buffers
timed flush intervals
max batch thresholds

This is much the same idea as Kafka producers batching records internally before sending them.
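A delete buffer with a size threshold and a timed flush could look roughly like the sketch below. All names here (`DeleteBuffer`, `add`, `tick`, `flush`) are hypothetical, the flusher callback stands in for a DeleteMessageBatch call, and time is passed in explicitly so the behaviour is easy to test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory buffer that groups receipt handles into batch deletes.
final class DeleteBuffer {
    private final int maxBatch;          // size threshold, at most 10 for SQS
    private final long maxAgeMillis;     // timed flush interval
    private final Consumer<List<String>> flusher; // stand-in for DeleteMessageBatch
    private final List<String> pending = new ArrayList<>();
    private long oldestAddedAt;

    DeleteBuffer(int maxBatch, long maxAgeMillis, Consumer<List<String>> flusher) {
        this.maxBatch = maxBatch;
        this.maxAgeMillis = maxAgeMillis;
        this.flusher = flusher;
    }

    // Buffers a handle and flushes immediately once the size threshold is reached.
    synchronized void add(String receiptHandle, long nowMillis) {
        if (pending.isEmpty()) {
            oldestAddedAt = nowMillis;
        }
        pending.add(receiptHandle);
        if (pending.size() >= maxBatch) {
            flush();
        }
    }

    // Call periodically, e.g. from a scheduled task; flushes a partial batch
    // once its oldest entry has waited for maxAgeMillis.
    synchronized void tick(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - oldestAddedAt >= maxAgeMillis) {
            flush();
        }
    }

    // Hands the buffered handles to the flusher in one call (invoked under the lock).
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        flusher.accept(batch);
    }
}
```

The flush interval has to stay well below the queue's visibility timeout. Otherwise messages sitting in the buffer become visible again before their delete is ever sent, which recreates the duplicate problem the buffer was meant to avoid.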

Real-World Scaling Insight

At large scale, queue systems are rarely bottlenecked by:

enqueue throughput
storage
message delivery

They are bottlenecked by:

acknowledgements
retries
visibility timeout churn
duplicate processing amplification

The “delete path” becomes the real scalability limit.

That is why production-grade messaging systems obsess over:

batch acknowledgements
offset commits
checkpointing
ack aggregation

Apache Kafka takes the same approach: a consumer commits a single offset per partition instead of acknowledging each record individually.

Final Takeaway

Batching only the producer side is half an optimization.

If you:

batch send
batch receive
BUT individually delete

then your architecture still behaves like a high-request-rate system.

The real optimization comes when the entire pipeline becomes batch-aware:

Batch Produce
    ↓
Batch Consume
    ↓
Batch Acknowledge

In distributed systems, the slowest path is often not processing.

It is coordination.

And in SQS-based systems, deletion is coordination.