2 Minute Streaming

KIP-405: Tiered Storage

🍫in which Kafka goes on a diet to lose a lot of weight (around 10TB+)

Stanislav Kozlovski
June 09, 2023

If you’re using Kafka to its full extent, you’re storing a lot of historical data in it.

But. This can be a bottleneck.

A key limitation in Kafka’s design is that it couples its storage with its compute.

The Problems

⛔️ Scale Up/Down

If you want to reassign partitions, you have to move all the data associated with them.

Moving the data off a single 10TB broker disk can take 27.7 hours at a decent 100MB/s replication rate. You cannot react to workload changes in a timely manner at that pace.
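The arithmetic behind that figure is easy to check. A minimal sketch, assuming decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and a perfectly sustained rate; `transfer_hours` is a helper name made up for this example:

```python
def transfer_hours(data_bytes: float, rate_bytes_per_sec: float) -> float:
    """Hours needed to copy data_bytes at a sustained rate."""
    if rate_bytes_per_sec <= 0:
        raise ValueError("rate must be positive")
    return data_bytes / rate_bytes_per_sec / 3600


# Assumption: decimal units, as disk vendors use them.
TB = 10**12
MB = 10**6

hours = transfer_hours(10 * TB, 100 * MB)
print(f"{hours:.2f} hours")  # 100,000 seconds, a bit under 28 hours
```

Real reassignments rarely hold a constant rate, so treat the result as a lower bound.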

⛔️ Disk Loss

If your broker dies and the disk is wiped out, it has to start from scratch with an empty disk and replicate in all that 10TB of data. Same problem.

There is room to improve this by 120x - turn a 4-hour recovery time down to 2 minutes. 👇

⛔️ Hard Restarts

When the broker starts up from an ungraceful shutdown, it has to rebuild all the local log index files associated with its partitions in a process called log recovery. With a 10TB disk, this can take many hours or even days.

⛔️ Competition for IOPS

HDDs have improved exponentially since they were introduced, but some of their characteristics have not kept pace. Random I/O has been stuck at roughly 120 IOPS for the last two decades.

When consumers try to read historical data, Kafka is forced to read from the disk (as the data isn’t in page cache) and that uses up the precious IOPS on the HDD.

Tests from KIP-405 showed a 43% producer performance decrease when historical consumers were present.

⛔️ Tail Latency

Similarly, latency has not kept pace with HDD improvements at all.

Capacity for HDDs has increased around 48,000 times faster than latency, which means that:

❝ On a per-byte basis, HDDs are becoming slower.

Due to their nature, HDDs are more susceptible to higher outlier latencies than SSDs. In an increasingly latency-sensitive world, this is unacceptable.

KIP-405: Tiered Storage

Tiered storage is the simple idea of storing most of the broker’s data in an external system - e.g. S3.

KIP-405 introduces this by adding a pluggable external store.

Pluggable is a key word here, as it will enable the open-source community to develop different implementations for different external stores in parallel.
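To make "pluggable" concrete, here is a minimal sketch of what such a plugin contract could look like. It is not the KIP's actual API (the KIP defines a Java interface called `RemoteStorageManager`); the class and method names below are simplified, hypothetical stand-ins, and the in-memory store exists only to show that the broker-side code depends on the contract and not on any particular backend.

```python
from abc import ABC, abstractmethod
from typing import Optional


class RemoteStore(ABC):
    """Hypothetical plugin contract: what a broker needs from any external store."""

    @abstractmethod
    def copy_segment(self, segment_id: str, data: bytes) -> None:
        """Persist a closed log segment in the external store."""

    @abstractmethod
    def fetch_segment(self, segment_id: str, start: int = 0,
                      end: Optional[int] = None) -> bytes:
        """Return the bytes [start, end) of a previously copied segment."""

    @abstractmethod
    def delete_segment(self, segment_id: str) -> None:
        """Remove a segment once remote retention has expired."""


class InMemoryStore(RemoteStore):
    """Toy backend for illustration; an S3 or HDFS plugin would implement the same methods."""

    def __init__(self) -> None:
        self._segments = {}

    def copy_segment(self, segment_id, data):
        self._segments[segment_id] = bytes(data)

    def fetch_segment(self, segment_id, start=0, end=None):
        return self._segments[segment_id][start:end]

    def delete_segment(self, segment_id):
        self._segments.pop(segment_id, None)


# The caller only knows about RemoteStore, so backends are swappable.
store: RemoteStore = InMemoryStore()
store.copy_segment("orders-0/segment-0", b"hello kafka")  # hypothetical segment id
print(store.fetch_segment("orders-0/segment-0", 0, 5))
```

Because every backend sits behind the same small interface, different teams can build and test their own store implementations independently.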

With this, Kafka will have two tiers of storage:

local (hot)
remote (cold)

This will be abstracted away seamlessly - clients will not be able to tell where they’re fetching data from.

Tiered Storage visualization

Leader brokers are responsible for tiering the data (persisting it to the remote store).

Both leaders and followers can read from the remote store to serve historical data to consumers.
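The two-tier read path can be sketched as a toy model. This is an illustration under heavy assumptions, not how the broker is implemented: real Kafka tiers whole log segments, not single records, and the `TieredPartition` class and its method names are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TieredPartition:
    """Toy model of one partition: records keyed by offset, held in two tiers."""
    local: dict = field(default_factory=dict)   # hot tier: data on the broker's disk
    remote: dict = field(default_factory=dict)  # cold tier: data in the external store

    def tier(self, offset: int) -> None:
        """Leader-side step: copy a record to the remote tier."""
        self.remote[offset] = self.local[offset]

    def expire_local(self, offset: int) -> None:
        """Drop a record from the local tier, but only once it is safely tiered."""
        if offset not in self.remote:
            raise ValueError("refusing to delete data that has not been tiered")
        del self.local[offset]

    def fetch(self, offset: int) -> bytes:
        """Read path: the caller never learns which tier served the record."""
        if offset in self.local:
            return self.local[offset]
        return self.remote[offset]


p = TieredPartition(local={0: b"a", 1: b"b", 2: b"c"})
p.tier(0)          # the oldest record is persisted remotely...
p.expire_local(0)  # ...and then aged out of the local disk
print([p.fetch(o) for o in range(3)])  # same answers as before tiering
```

The point of the sketch is the `fetch` method: a consumer asks for an offset and gets bytes back, with no hint of whether the hot or the cold tier served them.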

You will be able to enable tiered storage individually per topic, with varying local and remote retention settings.
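As a rough sketch of what per-topic settings could look like: the property names below (`remote.storage.enable`, `local.retention.ms`, `retention.ms`) are the ones proposed in the KIP text, so treat them as subject to change until the feature ships. The helper function, the topic names, and the retention values are hypothetical.

```python
DAY_MS = 24 * 60 * 60 * 1000
HOUR_MS = 60 * 60 * 1000


def tiered_topic_config(local_retention_ms: int, total_retention_ms: int) -> dict:
    """Build topic-level overrides for a tiered topic (hypothetical helper).

    total_retention_ms covers the data's whole lifetime (local + remote),
    so the local window cannot be longer than it.
    """
    if local_retention_ms > total_retention_ms:
        raise ValueError("local retention cannot exceed total retention")
    return {
        "remote.storage.enable": "true",                # opt this topic in
        "local.retention.ms": str(local_retention_ms),  # how long data stays on broker disk
        "retention.ms": str(total_retention_ms),        # how long data is kept overall
    }


# Two made-up topics with different hot/cold splits.
configs = {
    "clickstream": tiered_topic_config(1 * HOUR_MS, 30 * DAY_MS),
    "audit-log": tiered_topic_config(1 * DAY_MS, 365 * DAY_MS),
}
for topic, cfg in configs.items():
    print(topic, cfg)
```

A short local window keeps broker disks small, while the total retention decides how much history consumers can still replay from the remote tier.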

Tests showed a minimal latency impact - 21ms → 25ms of p99 produce latency.

When will it release?

It just missed the 3.5 release and is currently slated for Early Access in 3.6 - so around September 2023.

…

More Kafka? 🔥

I posted two very interesting things on social media today.

S3 Deep Dive

A 34-page slide deck with a deep dive into S3. While this isn’t related to Kafka, the content was so interesting and groundbreaking (to me) that I had to share it. It is essentially the story of leveraging a massively multi-tenant, cloud-native design at extreme scale to offer something that would be impossible otherwise. See it here:

Stanislav Kozlovski on LinkedIn: AWS S3 Deep Dive

The best post about AWS S3 that you will find on LinkedIn. Dive with me as we explore their system: 🔥 - 400 TERABITS of throughput a second 💥 - 100 MILLION…

www.linkedin.com/posts/stanislavkozlovski_aws-s3-deep-dive-activity-7072826135792754688-I5pY?utm_source=share&utm_medium=member_desktop

Top 3 Kafka Metrics

If you operate Kafka in any way in production, you will be interested in this. 🔥

I share the top 3 Kafka metrics you need to monitor, plus some gifts 🎁

See it here:

If you can steal one ops tip from me, let it be this:
The top 3 Kafka Metrics you need to monitor:
1.  ❌ Offline Partition
2.  🚩 Under Min ISR Partitions
3.  🔶 Under Replicated Partition (URP)
👇
PS: I’ll leave you with a gift at the end 🎁
— Stanislav Kozlovski (@BdKozlovski)
7:35 AM • Jun 9, 2023

HDD → SSD → HDD

To keep it on topic with tiered storage, I posted a few words about the cost benefits Kafka can enjoy with tiered storage by moving to SSDs:

The more things change, the more they stay the same…
Kafka was heavily optimized to run on HDDs. 💿
🔥 - random seeks on HDDs are slow, but linear reads/writes can be fast.
🔥 - log data structure perfectly maps to linear reads/writes.
🔥 - linear reads/writes work well with… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
9:52 AM • Jun 8, 2023

People are telling me that my meme game is only getting better 😁

_{Apache®, Apache Kafka®, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks}_.
