2 Minute Streaming

Apache Kafka Lines of Code Analysis (Java, Scala, Python)

🔍 An analysis of Apache Kafka's codebase from version 0.7.2 to 3.7.0

Stanislav Kozlovski
January 15, 2024

Talk is Cheap. Show Me The Code

A distributed streaming platform like Kafka is the infrastructure backbone of many, many companies today.

But how much code does it actually take to create such a platform?

Around 1.2 million lines.

That is… a lot of code when you think about it.

Apache Kafka 3.7 lines of code by module

If we have to give them a rough grouping:

Backend Server (420k)
- core - 242k
- metadata - 57k
- group-coordinator - 50k
- storage - 28k
- raft - 27k
- server-common - 16k
Clients (329k)
- clients - 287k
- tools - 24k
- trogdor - 18k
Kafka Streams
- streams - 329k
Connect
- connect - 134k
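
As a quick sanity check on the grouping above, the per-module figures can be summed in a few lines of Python. This is only a sketch: the numbers are copied by hand from the list (in thousands of lines), and the names `MODULES_BY_GROUP` and `group_totals` are my own. The four groups add up to about 1,212k, in line with the roughly 1.2 million quoted earlier.

```python
# Module sizes in thousands of lines, copied by hand from the grouping above.
MODULES_BY_GROUP = {
    "Backend Server": {
        "core": 242, "metadata": 57, "group-coordinator": 50,
        "storage": 28, "raft": 27, "server-common": 16,
    },
    "Clients": {"clients": 287, "tools": 24, "trogdor": 18},
    "Kafka Streams": {"streams": 329},
    "Connect": {"connect": 134},
}


def group_totals(modules_by_group):
    """Sum the module sizes inside each group."""
    return {group: sum(sizes.values()) for group, sizes in modules_by_group.items()}


totals = group_totals(MODULES_BY_GROUP)
listed_total = sum(totals.values())

# Print each group's size and its share of the modules listed here.
for group, size in totals.items():
    share = 100 * size / listed_total
    print(f"{group:15s} {size:5d}k  {share:4.1f}% of listed modules")
print(f"{'All listed':15s} {listed_total:5d}k")
```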

Grouped this way, the number starts to make more sense: the Kafka repository is split across a few distinct projects.

🐣 Started From the Bottom, Now We’re Here

Can you guess how many lines of code Kafka started with?

Picture a number in your mind.

I will now give you a hint: the repository grew at an average rate of 24% per release. There were 24 releases. (cue mental algebra 🙂 )

…

Kafka started with 24,400 lines of code!

That’s literally as much as the tools module today!
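
For anyone who wants to check the mental algebra, here is a rough sketch of the compound growth implied by the two endpoints. It assumes 24 growth steps between the 24,400 starting lines and the 1,262k lines reported for the start of 2024; the variable names are mine. The implied steady rate comes out near 18% per release. That is lower than the 24% average, which is what you would expect if the 24% is a plain mean of per-release percentages (my assumption) pulled up by a few very large early jumps. Compounding a literal 24% at every step would overshoot to more than 4 million lines.

```python
start_lines = 24_400     # first analyzed version (0.7.2), from the post
end_lines = 1_262_000    # start-of-2024 total, from the post
growth_steps = 24        # assumption: one growth step per release

# Steady per-release rate that would turn start_lines into end_lines.
implied_rate = (end_lines / start_lines) ** (1 / growth_steps) - 1

# What a literal 24% compounded over every step would produce instead.
compounded_at_24_percent = start_lines * 1.24 ** growth_steps

print(f"overall growth: {end_lines / start_lines:.1f}x")
print(f"implied steady rate: {implied_rate:.1%} per release")
print(f"24% compounded {growth_steps} times: {compounded_at_24_percent / 1e6:.1f}M lines")
```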

But then, developers got cracking…

Release over release code growth rate (in percentage)

The very first release tripled the codebase’s size, and each of the next two roughly doubled it.

With such a growth rate, you didn’t need many releases to grow the size substantially.

2012 - 24k
2015 - 138k
2017 - 400k
2020 - 726k
2022 - 994k
(start of) 2024 - 1,262k

One thing is clear - development has NOT slowed at all.

If anything, Kafka is having more code contributed to it than ever.

Talk about a healthy community.
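
A rough way to check the "has not slowed" claim is to divide the growth between the milestones above by the number of years between them. The sketch below treats each entry as a point at that calendar year, which is an approximation since the post only gives years, and `yearly_additions` is a hypothetical helper name.

```python
# (year, total size in thousands of lines), copied from the milestone list above.
MILESTONES = [
    (2012, 24), (2015, 138), (2017, 400),
    (2020, 726), (2022, 994), (2024, 1262),
]


def yearly_additions(points):
    """Average thousands of lines added per year between consecutive milestones."""
    spans = []
    for (year_a, size_a), (year_b, size_b) in zip(points, points[1:]):
        per_year = (size_b - size_a) / (year_b - year_a)
        spans.append((year_a, year_b, per_year))
    return spans


additions = yearly_additions(MILESTONES)
for year_a, year_b, per_year in additions:
    print(f"{year_a}-{year_b}: about {per_year:.0f}k lines added per year")
```

With these figures, the yearly additions go from about 38k per year in 2012-2015 to about 134k per year in each of the two most recent spans.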

👑 Top Contributors

What’s a newsletter without some shout-outs?

While many people have written a lot of code, the top contributors in Apache Kafka have consistently contributed for the last ~7 years. This is an amazing feat.

Ismael Juma 👑
Jason Gustafson
Guozhang Wang
Matthias J. Sax
Colin Patrick McCabe
Rajini Sivaram
(many more, but this is all that fit on my screen)

Thank you to all the open source contributors!

🐍 Languages

Kafka is mainly Java. But there was a fair amount of Scala code back in the day, and large parts of the server (core) are still in Scala.

The project is slowly migrating away from Scala by simply writing most new code in Java. For example - all the new major Kafka features are written in Java:

KIP-848: The Next Generation of the Consumer Group Protocol
KIP-405: Tiered Storage
KRaft

The Java codebase growing at hyperspeed

Bonus: Largest Files

As a final gift, here are the top 5 Java/Scala files by size. Challenge yourself to go read some of them 😉

📝 Less-Interesting Notes

this data was counted via cloc
we only count the three main languages in Kafka - Java, Scala and Python
we count blank lines, comments and code lines
the data includes test code, which is in all likelihood a large majority of the codebase (that’s a good thing!)

More than half of Kafka’s code is test code!
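
To get a feel for the counting rules in the notes above, here is a minimal Python stand-in. It is not cloc and will not reproduce cloc's numbers exactly. It simply counts every physical line (blank, comment, or code) in `.java`, `.scala`, and `.py` files, and labels a file as test code when one of its parent directories is named `test` or `tests`. That directory rule and the function names are my own assumptions, not something the post specifies.

```python
from collections import Counter
from pathlib import Path

# The three languages counted in the post, keyed by file extension.
LANGUAGE_BY_EXTENSION = {".java": "Java", ".scala": "Scala", ".py": "Python"}

# Assumption: code under a directory named "test" or "tests" is test code.
TEST_DIR_NAMES = {"test", "tests"}


def count_lines(repo_root):
    """Return a Counter of physical line counts keyed by (language, kind)."""
    root = Path(repo_root)
    totals = Counter()
    for path in root.rglob("*"):
        language = LANGUAGE_BY_EXTENSION.get(path.suffix)
        if language is None or not path.is_file():
            continue
        # Only look at directories inside the repository, not the path to it.
        parent_dirs = path.relative_to(root).parts[:-1]
        kind = "test" if TEST_DIR_NAMES.intersection(parent_dirs) else "main"
        with path.open("rb") as handle:
            # Every physical line counts: blank, comment, or code.
            totals[(language, kind)] += sum(1 for _ in handle)
    return totals


def share_of_test_lines(totals):
    """Fraction of all counted lines that sit in test directories."""
    all_lines = sum(totals.values())
    test_lines = sum(n for (_, kind), n in totals.items() if kind == "test")
    return test_lines / all_lines if all_lines else 0.0
```

Usage would look like `share_of_test_lines(count_lines("path/to/kafka"))`, where the path is a placeholder for a local checkout.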

🗣 This Week’s Socials

We were quiet on the socials this time.

🍳 Apache Kafka 101 in 1 minute

Back to basics. This is a great one to share with any newbie you know:

Apache Kafka 101 in 1 minute. 🔥
Let’s go! 👇
It’s a distributed commit log.
A log is the simplest data structure - an ordered sequence of records that only supports appends.
🔒 It’s immutable, so you can’t delete or edit the records in place.
Kafka stores its data in topics.… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
3:23 PM • Jan 13, 2024

Apache®, Apache Kafka®, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.