KIP-848: The New Kafka Consumer Group Protocol
a 2-min intro to the complete re-design of consumer groups in Kafka
Ah, Consumer Groups…
Many changes are coming to them.
(…and I’m not talking about KIP-932 and its Queue / Share Consumer Groups either…)
First, in case you don’t know the basics 👉
Second, introducing…
✨ KIP-848: The Next Generation of the Consumer Group Protocol ✨
A complete re-design, the newly-proposed protocol fixes many of the problems in the current version of consumer groups:
⛔️ scalability cap
🤯 complexity
🐛 bugs
🐌 slow fixes
🔍 hard to debug
⚙️ too extensible
😵‍💫 interoperability
😢 inconsistent metadata
And it fixes them with these four pillars:
1. Logic: Client Side → Broker Side
no more group leader: the assignment is computed on the broker.
2. Rebalance Disruption: Stop the World → Incremental
rebalances become incremental and much finer-grained.
when a new member joins, it no longer needs to wait for all N members (the ones the group leader used to choose) to revoke their partitions.
It can start receiving partitions as soon as the first (fastest) member revokes its own.
When you think about it, a rebalance is simply a reassignment of some partitions from some consumers to others. 💡
Why does the whole group need to stop and know about this?
It doesn’t.
3. Protocol: Generic → Consumer
the generic group protocol becomes a consumer-optimized protocol.
4. Model: Pull-based → Push-Based
the broker now pushes the new partition assignments to the consumers, instead of the consumers pulling them from the broker (which used to get them from the group leader).
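Here is a toy sketch of pillars 2 and 4 together: a new member picks up partitions one at a time, as soon as each previous owner lets go. Everything in it (`ToyCoordinator`, its `heartbeat` method, the member names and partition numbers) is made up for illustration and is not how the real Group Coordinator is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCoordinator:
    """Hypothetical broker-side coordinator; not Kafka's real implementation."""
    target: dict                               # member -> partitions it should end up with
    owned: dict = field(default_factory=dict)  # member -> partitions it reports owning

    def heartbeat(self, member, reported_owned):
        """Record what the member owns now and return what it may own."""
        self.owned[member] = set(reported_owned)
        held_by_others = set()
        for other, partitions in self.owned.items():
            if other != member:
                held_by_others |= partitions
        # Only hand out target partitions that no other member still holds.
        return self.target[member] - held_by_others


# A and B own three partitions each, then C joins and the target changes.
coordinator = ToyCoordinator(
    target={"A": {0, 1}, "B": {3, 4}, "C": {2, 5}},
    owned={"A": {0, 1, 2}, "B": {3, 4, 5}},
)

c_first = coordinator.heartbeat("C", set())         # nothing is free yet
a_assigned = coordinator.heartbeat("A", {0, 1, 2})  # A is told to keep only {0, 1}
coordinator.heartbeat("A", a_assigned)              # A confirms it revoked partition 2
c_second = coordinator.heartbeat("C", c_first)      # C gets 2 while B still holds 5
b_assigned = coordinator.heartbeat("B", {3, 4, 5})  # B is told to keep only {3, 4}
coordinator.heartbeat("B", b_assigned)              # B confirms it revoked partition 5
c_third = coordinator.heartbeat("C", c_second)      # C now owns its full target
```

Notice that C starts consuming partition 2 without waiting for B, and that nobody ever asks the broker for an assignment: it simply comes back in the heartbeat response.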
🤔 How?
Elegantly.
Everything is streamlined into a single new Heartbeat API (ConsumerGroupHeartbeat).
The protocol maintains three types of epochs:
Group Epoch - tracks when a new rebalance should happen
Assignment Epoch - tracks the generation of the assignment
Member Epoch - tracks each member’s last-synced epoch
💡 A new assignment is computed every time the group epoch changes.
All epochs should converge to be the same number.
The order is the following:
the group-wide epoch is bumped.
the target assignment epoch is bumped.
each consumer's member epoch catches up to the target assignment epoch via the heartbeat request, individually. (fine-grained)
In general, what you have is a simple state machine inside the Group Coordinator broker that’s running a constant reconciliation loop. 💥
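The epoch dance above can be sketched in a few lines. This is a deliberately simplified model, assuming a hypothetical `ToyGroup` class with made-up method names; the real coordinator tracks far more state per member.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGroup:
    """Hypothetical model of the three epochs; not Kafka's real data structures."""
    group_epoch: int = 0
    assignment_epoch: int = 0
    member_epochs: dict = field(default_factory=dict)

    def join(self, member):
        """Step 1: a membership change bumps the group-wide epoch."""
        self.member_epochs[member] = 0
        self.group_epoch += 1

    def compute_target_assignment(self):
        """Step 2: the target assignment catches up to the group epoch."""
        if self.assignment_epoch < self.group_epoch:
            self.assignment_epoch = self.group_epoch

    def heartbeat(self, member):
        """Step 3: one member catches up, independently of all the others."""
        self.member_epochs[member] = self.assignment_epoch
        return self.member_epochs[member]

    def converged(self):
        """True once every epoch in the group is the same number."""
        return self.assignment_epoch == self.group_epoch and all(
            epoch == self.group_epoch for epoch in self.member_epochs.values()
        )


group = ToyGroup()
group.join("A")
group.compute_target_assignment()
group.heartbeat("A")
after_a = group.converged()         # A is alone and fully caught up

group.join("B")                     # group epoch goes from 1 to 2
group.compute_target_assignment()
group.heartbeat("B")
while_a_lags = group.converged()    # A is still on epoch 1
group.heartbeat("A")
after_both = group.converged()      # everyone is on epoch 2
```

The point of the sketch: B converges without waiting for A, and the group as a whole is "done" only when all three epoch types agree.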
And for those that learn better through visuals?
Your humble author has a video for you:
Other Details
the session timeout is now defined on the server - group.consumer.session.timeout.ms.
the heartbeat interval is also now defined on the server - group.consumer.heartbeat.interval.ms.
static membership (group.instance.id) is still supported.
Extended Protocol - extra APIs are added to extend the protocol dance for power-users like Kafka Streams that still want to have a client-side assignor.
A preview of the protocol is targeted to ship with Apache Kafka 3.7.
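To make the server-side session timeout and heartbeat interval concrete, here is a rough sketch of a broker that owns both values. `ToySessionTracker` and its methods are hypothetical, and the 45 second and 5 second values are assumptions used for illustration, so check your broker's configuration for the real ones.

```python
# Assumed values, standing in for the broker-side session timeout
# and heartbeat interval settings mentioned above.
SESSION_TIMEOUT_MS = 45_000
HEARTBEAT_INTERVAL_MS = 5_000


class ToySessionTracker:
    """Hypothetical broker-side liveness tracking; not Kafka's real code."""

    def __init__(self, session_timeout_ms, heartbeat_interval_ms):
        self.session_timeout_ms = session_timeout_ms
        self.heartbeat_interval_ms = heartbeat_interval_ms
        self.last_heartbeat_ms = {}

    def heartbeat(self, member, now_ms):
        """Record the heartbeat and tell the client how often to send the next one."""
        self.last_heartbeat_ms[member] = now_ms
        return self.heartbeat_interval_ms

    def expire(self, now_ms):
        """Remove and return members silent for longer than the session timeout."""
        expired = sorted(
            member
            for member, last_seen in self.last_heartbeat_ms.items()
            if now_ms - last_seen > self.session_timeout_ms
        )
        for member in expired:
            del self.last_heartbeat_ms[member]
        return expired


tracker = ToySessionTracker(SESSION_TIMEOUT_MS, HEARTBEAT_INTERVAL_MS)
tracker.heartbeat("A", now_ms=0)
tracker.heartbeat("B", now_ms=0)
interval = tracker.heartbeat("A", now_ms=40_000)  # A stays alive, B goes quiet
expired = tracker.expire(now_ms=46_000)           # only B has been silent too long
```

Since the broker decides both numbers, an operator can tune them in one place instead of chasing down every client's configuration.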