Queue Semantics and Share Consumer Groups in Apache Kafka (KIP-932)
🚇 no more over-partitioning when all you need is a queue.
Kafka’s regular consumer groups are great for scalability and preserving message order.
But. They don’t address all use cases.
The ordering requirement forces a single consumer to have exclusive access to a set of partitions - a one-to-many mapping.
Scaling consumer groups up can become tricky - you can’t have more consumers than partitions - at most one-to-one.
Because of this, people usually over-partition their topics just to leave headroom for more consumers.
Further, there is no real per-record state. Consumers pull many messages and only advance to an offset if all records before that have been processed.
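To see why offset-based progress hurts, here's a toy illustration in plain Python (not Kafka client code - `committable_offset` is a made-up helper): the highest offset a consumer can commit is capped by the earliest unprocessed record, so one slow or stuck record holds back the whole batch.

```python
def committable_offset(processed: set[int], start: int, end: int) -> int:
    """Highest offset such that ALL records before it are processed."""
    offset = start
    while offset < end and offset in processed:
        offset += 1
    return offset

# Records 0..4 were fetched; everything except record 1 is done.
processed = {0, 2, 3, 4}
print(committable_offset(processed, 0, 5))  # 1 -- stuck behind record 1
```

Records 2-4 are finished, but the consumer can still only commit offset 1 - head-of-line blocking, with no per-record state to express "done, except for that one".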
But what if:

- you don’t need ordering
- you need per-record acknowledgement/retries

The current model doesn’t work well with traditional message queue semantics. Those usually support the job-queue use case - many workers pulling from a shared pool of jobs in some order.
KIP-932 - Queues for Kafka
KIP-932 proposes to extend Kafka to introduce the notion of share (consumer) groups.
Share groups allow a many-to-many mapping of partitions → consumers.
They are basically a different flavor of consumer groups.
One doesn’t even need to write much new code - a share group can be consumed with the exact same API that Kafka consumers support today, just with a different configuration.
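A sketch of what that configuration might look like. Note that KIP-932 is still under discussion at the time of writing, so the property names below (in particular `group.type`) are illustrative assumptions, not a final API:

```properties
# Illustrative only - KIP-932 is not finalized and names may change.
bootstrap.servers=localhost:9092
# Same group.id mechanics as today...
group.id=my-share-group
# ...but the group is declared to be a share group rather than a
# classic consumer group.
group.type=share
```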
Share groups treat the whole topic as a single queue.
All the consumers read from all of the partitions - there is no sticky mapping.
This allows you to have a job queue with the extra Kafka benefits of:
✅ no max queue depth
✅ the ability to replay records
✅ Kafka’s greater ecosystem
These queues have at-least-once semantics. There are also no ordering guarantees.
It elegantly allows you to use Kafka as if it were a queue without changing anything underlying.
How it Works 🤔
Brokers will have a new component that manages share groups - the Share Group Coordinator.
It will keep a sliding window of records per partition; every record inside that window is consumable by the share group. These are called the in-flight records.
The broker will track the state of processing for each record in that window, and serve them to share consumers. When consumers acknowledge their consumption, the window will progressively move forward.
Once fetched, a record is marked as acquired by that particular consumer. A lock is kept on it for up to share.record.lock.duration.ms, so that if the consumer crashes, the record is released and becomes available again.
Poison records are also handled - the delivery attempts of acquired-but-not-processed messages are counted, up to a maximum. After that, the record is marked as archived and is not sent anymore.
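A companion sketch of the lock-and-retry behaviour described above - again plain Python, where `AcquiredRecord`, `max_attempts` and the seconds-based `lock_duration_s` are illustrative stand-ins (the real lock is governed by share.record.lock.duration.ms):

```python
class AcquiredRecord:
    """Toy model of acquisition locks and poison-record handling."""

    def __init__(self, offset: int, lock_duration_s: float, max_attempts: int = 5):
        self.offset = offset
        self.lock_duration_s = lock_duration_s  # stand-in for share.record.lock.duration.ms
        self.max_attempts = max_attempts
        self.delivery_count = 0
        self.state = "available"
        self.lock_expiry = 0.0

    def acquire(self, now: float) -> bool:
        """Deliver the record to a consumer, counting the attempt."""
        if self.state != "available":
            return False
        if self.delivery_count >= self.max_attempts:
            self.state = "archived"  # poison record: never sent again
            return False
        self.delivery_count += 1
        self.state = "acquired"
        self.lock_expiry = now + self.lock_duration_s
        return True

    def maybe_release(self, now: float) -> None:
        """If the consumer crashed and the lock expired, release the record."""
        if self.state == "acquired" and now >= self.lock_expiry:
            self.state = "available"

rec = AcquiredRecord(offset=0, lock_duration_s=30.0, max_attempts=2)
rec.acquire(now=0.0)         # first delivery
rec.maybe_release(now=31.0)  # consumer crashed; lock timed out -> available
rec.acquire(now=31.0)        # second (and last allowed) delivery
rec.maybe_release(now=62.0)  # timed out again...
assert rec.acquire(now=62.0) is False and rec.state == "archived"
```

The second timeout exhausts the delivery budget, so the next acquisition attempt archives the record instead of handing it out a third time.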
More Kafka? 🔥
🦢 The most popular social media post of the week was not about Kafka - rather, about S3’s 99.999999999% durability and its Durable Chain of Custody:
Amazon S3 processes more than 100,000,000 requests a second.
And it offers 11 nines of durability - 99.999999999%.
It’s not easy.
Any theoretical failure that could happen - they’ve already hit.
Durability is usually thought of as something that’s inside the system - but…
— Stanislav Kozlovski (@BdKozlovski)
Jun 21, 2023
Apache®, Apache Kafka®, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.