How Batch Push to SQS Can Still Melt Your Consumers
At first glance, batching with Amazon Simple Queue Service (SQS) looks like the perfect optimization.
Send 10 messages in one API call.
Reduce network overhead.
Increase throughput.
Lower cost.
So teams aggressively batch-produce messages into Amazon Simple Queue Service queues.
Everything works beautifully in staging.
Then production traffic arrives.
Suddenly:
Consumers start slowing down.
DeleteMessage calls start getting throttled.
Messages reappear.
Duplicate processing starts happening.
Queue depth explodes.
And the surprising part?
The bottleneck was not sending messages. It was deleting them.
The Architecture Most Teams Build
A common architecture looks like this:
Producer Service
        |
        | SendMessageBatch (10 msgs/request)
        v
+-------------------+
|        SQS        |
+-------------------+
        |
        v
Consumer Workers
        |
        | process each message
        |
        | DeleteMessage (1 msg/request)
        v
SQS Acknowledgement
The producer uses:
SendMessageBatch
which sends up to 10 messages per request.
This is efficient.
But many teams unknowingly do this on the consumer side:
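// Anti-pattern, shown in AWS SDK for Java 2.x style; sqs and queueUrl are assumed to be in scope
for (Message msg : messages) {
    process(msg);
    // One DeleteMessage request, and one network round trip, per message
    sqs.deleteMessage(r -> r.queueUrl(queueUrl).receiptHandle(msg.receiptHandle()));
}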
That means:
1 batch receive call
BUT 10 individual delete calls
This becomes dangerous at scale.
The Hidden Problem
Imagine this traffic:
| Metric | Value |
|---|---|
| Messages/sec | 100,000 |
| Batch size | 10 |
| Receive requests/sec | 10,000 |
| Delete requests/sec | 100,000 |
Even though producers optimized API usage with batching, consumers accidentally multiplied API traffic again during deletion.
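The arithmetic behind the table can be checked with a few lines of Java. This is only a sketch: `RequestRates` and `requestsPerSecond` are hypothetical names invented for illustration, and the calculation assumes every batch is full, which real traffic rarely guarantees.

```java
// Hypothetical helper: estimates SQS request rates for a given batch size.
final class RequestRates {
    // SQS batch operations accept at most 10 entries per request.
    static final int MAX_BATCH_SIZE = 10;

    // Requests per second needed when each request carries `batchSize` messages.
    static long requestsPerSecond(long messagesPerSecond, int batchSize) {
        if (batchSize < 1 || batchSize > MAX_BATCH_SIZE) {
            throw new IllegalArgumentException("batchSize must be between 1 and 10");
        }
        // Ceiling division: a partially filled batch still costs one request.
        return (messagesPerSecond + batchSize - 1) / batchSize;
    }
}
```

With 100,000 messages/sec, `requestsPerSecond(100_000, 10)` gives the 10,000 receive requests in the table, while `requestsPerSecond(100_000, 1)` gives the 100,000 delete requests produced by deleting one message at a time.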
Now your system incurs:
10x more delete requests
Higher TCP overhead
More AWS API throttling
Increased latency
Retry storms
The queue itself remains healthy.
But the acknowledgement path collapses.
What Happens During Throttling
When DeleteMessage gets throttled:
Consumer processed message successfully
|
DeleteMessage failed
|
Visibility timeout expires
|
Message becomes visible again
|
Another consumer reprocesses it
Now duplicate processing begins.
This creates secondary problems:
Duplicate payments
Duplicate emails
Double inventory deduction
Repeated notifications
Idempotency pressure on downstream systems
The real issue was never SQS delivery.
It was acknowledgement scalability.
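Because a redelivered message reaches the handler a second time, downstream work usually needs some form of duplicate guard. The sketch below shows one possible shape, keyed on the message ID. `IdempotentHandler` is a hypothetical class, not part of any AWS SDK, and the in-memory set is an assumption made to keep the example small; with several consumers, the set of processed IDs would have to live in a shared, durable store.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical guard that skips messages whose ID was already processed.
final class IdempotentHandler {
    // In-memory only, for illustration; this does not survive restarts or span processes.
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    private final Consumer<String> sideEffect;

    IdempotentHandler(Consumer<String> sideEffect) {
        this.sideEffect = sideEffect;
    }

    // Returns true if the side effect ran, false if the message was a duplicate.
    boolean handle(String messageId, String body) {
        // add() returns false when the ID is already present, so only the first delivery proceeds.
        if (!processedIds.add(messageId)) {
            return false;
        }
        try {
            sideEffect.accept(body);
            return true;
        } catch (RuntimeException ex) {
            // Forget the ID so that a later redelivery can retry the failed work.
            processedIds.remove(messageId);
            throw ex;
        }
    }
}
```

A guard like this limits the damage from duplicates, but it does not remove the cause: the redeliveries still consume receive and delete capacity.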
The Dangerous Feedback Loop
This creates a nasty feedback cycle.
Delete throttling
↓
Messages reappear
↓
Consumers receive more messages
↓
More delete attempts
↓
Even more throttling
Eventually:
Consumer CPU spikes
Retry queues grow
Visibility timeout tuning becomes unstable
DLQs start filling
Teams often incorrectly blame:
SQS
AWS networking
Consumer autoscaling
Visibility timeout settings
But the root cause is usually:
Individual deletes after batched receives.
The Correct Design
If you batch receive messages:
ReceiveMessage (MaxNumberOfMessages=10)
you should also batch delete them:
DeleteMessageBatch
Correct architecture:
Producer
   |
   | SendMessageBatch
   v
  SQS
   |
   | ReceiveMessage (MaxNumberOfMessages=10)
   v
Consumer
   |
   | process messages, track successes
   |
   | DeleteMessageBatch
   v
SQS Acknowledgement
Now instead of:
100,000 delete requests/sec
you get:
10,000 delete requests/sec
That is a massive reduction.
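DeleteMessageBatch accepts at most 10 entries per request, so a consumer that collects more than 10 receipt handles has to split them before deleting. A minimal sketch of that split follows. `DeleteBatches` and `partition` are hypothetical names, and receipt handles are treated as plain strings for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits receipt handles into groups that fit one DeleteMessageBatch call.
final class DeleteBatches {
    // SQS limit on entries in a single batch request.
    static final int MAX_ENTRIES_PER_BATCH = 10;

    static List<List<String>> partition(List<String> receiptHandles) {
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < receiptHandles.size(); start += MAX_ENTRIES_PER_BATCH) {
            int end = Math.min(start + MAX_ENTRIES_PER_BATCH, receiptHandles.size());
            // Copy the slice so each batch is independent of the input list.
            batches.add(new ArrayList<>(receiptHandles.subList(start, end)));
        }
        return batches;
    }
}
```

For 25 handles this yields three batches of 10, 10, and 5, which means three delete requests instead of 25.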
Why This Improves More Than Cost
Most people think batching is only about reducing AWS billing.
But batching also improves:
1. Network Efficiency
Fewer:
TLS handshakes (when connections are not reused)
TCP packets
HTTP requests
2. Better Consumer Throughput
Workers spend less time waiting on acknowledgement APIs.
3. Lower Retry Amplification
With roughly one-tenth as many delete requests, throttling becomes far less likely, and so do the retries it triggers.
4. More Stable Visibility Timeouts
Messages are acknowledged faster and more consistently.
5. Better Horizontal Scaling
Consumers can scale without overwhelming SQS APIs.
The Production-Grade Consumer Pattern
A robust consumer flow usually looks like this:
1. Receive batch of messages
2. Process in parallel
3. Track successful messages
4. Batch delete only successful ones
5. Leave failed messages undeleted so they are redelivered after the visibility timeout
Pseudo-flow:
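// AWS SDK for Java 2.x style; sqs, queueUrl, log, and messages are assumed to be in scope
List<DeleteMessageBatchRequestEntry> toDelete = new ArrayList<>();
for (Message msg : messages) {
    try {
        process(msg);
        // Queue an acknowledgement only for messages that were processed successfully
        toDelete.add(DeleteMessageBatchRequestEntry.builder()
                .id(msg.messageId())
                .receiptHandle(msg.receiptHandle())
                .build());
    } catch (Exception ex) {
        // Not deleted: the message becomes visible again after the visibility timeout
        log.error("processing failed for message " + msg.messageId(), ex);
    }
}
// DeleteMessageBatch rejects an empty entry list, so skip the call when nothing succeeded
if (!toDelete.isEmpty()) {
    sqs.deleteMessageBatch(r -> r.queueUrl(queueUrl).entries(toDelete));
    // The response can report per-entry failures; those entries need a retry
}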
This avoids:
deleting failed messages
unnecessary retries
excessive API calls
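One detail is easy to miss: a batch delete can succeed as a request while individual entries inside it fail, and those entries come back in the response rather than as an exception. The sketch below retries only the failed entries, a bounded number of times. `BatchDeleteRetrier` is a hypothetical class, and the `Function` passed to it is an assumed stand-in for a real DeleteMessageBatch call that returns the receipt handles reported as failed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical wrapper that retries only the entries a batch delete reported as failed.
final class BatchDeleteRetrier {
    // Stand-in for one batch delete call: takes receipt handles, returns the ones that failed.
    private final Function<List<String>, List<String>> batchDelete;
    private final int maxAttempts;

    BatchDeleteRetrier(Function<List<String>, List<String>> batchDelete, int maxAttempts) {
        this.batchDelete = batchDelete;
        this.maxAttempts = maxAttempts;
    }

    // Returns the handles that were still undeleted after all attempts.
    List<String> deleteAll(List<String> receiptHandles) {
        List<String> pending = new ArrayList<>(receiptHandles);
        for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
            // Each round sends only what the previous round failed to delete.
            pending = new ArrayList<>(batchDelete.apply(pending));
        }
        return pending;
    }
}
```

The sketch omits backoff between attempts to stay short; under throttling, a delay with jitter between rounds would be the safer choice. Anything still pending at the end simply reappears after its visibility timeout.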
Another Common Mistake
Some systems do this:
Receive 10 messages
Process 1 message
Immediately delete 1 message
Repeat
This destroys batching benefits entirely.
Instead:
accumulate acknowledgements
flush periodically
batch deletes intelligently
Many high-scale systems maintain:
in-memory delete buffers
timed flush intervals
max batch thresholds
This is much the same idea as Kafka producers batching records in memory before sending them.
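A delete buffer of this kind can be sketched in a few dozen lines. Everything here is an assumption made for illustration: `DeleteBuffer` is a hypothetical class, the `Consumer` stands in for a real DeleteMessageBatch call, and timestamps are passed in explicitly so the timing logic is easy to follow and test. In a real consumer, a scheduled task would call `flushIfDue` periodically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory buffer that flushes on a size threshold or an age threshold.
final class DeleteBuffer {
    // SQS limit on entries in a single batch request.
    private static final int MAX_BATCH = 10;

    // Stand-in for one batch delete call over a list of receipt handles.
    private final Consumer<List<String>> batchDelete;
    private final long maxAgeMillis;
    private final List<String> pending = new ArrayList<>();
    private long oldestAddedAt;

    DeleteBuffer(Consumer<List<String>> batchDelete, long maxAgeMillis) {
        this.batchDelete = batchDelete;
        this.maxAgeMillis = maxAgeMillis;
    }

    synchronized void add(String receiptHandle, long nowMillis) {
        if (pending.isEmpty()) {
            oldestAddedAt = nowMillis; // remember when the current batch started
        }
        pending.add(receiptHandle);
        if (pending.size() >= MAX_BATCH) {
            flush(); // size threshold reached
        }
    }

    // Flushes a partial batch once its oldest entry has waited long enough.
    synchronized void flushIfDue(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - oldestAddedAt >= maxAgeMillis) {
            flush();
        }
    }

    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        batchDelete.accept(batch);
    }
}
```

The age threshold has to stay well below the queue's visibility timeout. Otherwise a message can become visible again while its acknowledgement is still sitting in the buffer, which recreates the duplicate-processing problem the buffer was meant to avoid.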
Real-World Scaling Insight
At large scale, queue systems are rarely bottlenecked by:
enqueue throughput
storage
message delivery
They are bottlenecked by:
acknowledgements
retries
visibility timeout churn
duplicate processing amplification
The “delete path” becomes the real scalability limit.
That is why production-grade messaging systems obsess over:
batch acknowledgements
offset commits
checkpointing
ack aggregation
Apache Kafka takes the same idea further: a consumer acknowledges progress by committing one offset per partition instead of acknowledging each message individually.
Final Takeaway
Batching only the producer side is half an optimization.
If you:
batch send
batch receive
BUT individually delete
then your architecture still behaves like a high-request-rate system.
The real optimization comes when the entire pipeline becomes batch-aware:
Batch Produce
↓
Batch Consume
↓
Batch Acknowledge
In distributed systems, the slowest path is often not processing.
It is coordination.
And in SQS-based systems, deletion is coordination.
