Zero Copy Basics

0️⃣💾 the most concise explanation of the operating system's zero-copy concept in 2 minutes

Stanislav Kozlovski
August 07, 2023

Zero Copy

If you’ve ever read about Kafka, a particular optimization it makes use of might have caught your eye — the operating system’s zero-copy optimization.

A zero-copy operation is one which does not make unnecessary copies of the data.
(it doesn’t actually mean you make literally zero copies)

In Kafka’s case → it is when the OS copies the data from the page cache directly into the socket buffer, effectively bypassing the Kafka broker Java program entirely.

This saves you a few extra copies and user <-> kernel mode switches.

Let us follow an example:

No Zero Copy

If your app’s job is to read a file from the disk and send it over the network, a bunch of unnecessary copies and user/kernel mode switches can be made.

Some terminology:

read buffer - this is the OS page cache.
socket buffer - this is an OS byte buffer for managing packets.
NIC buffer - a byte buffer in the network card.
DMA copy - DMA stands for Direct Memory Access - a hardware feature which allows devices (graphics card, sound card, network card, etc.) to access the memory (RAM) without the CPU’s involvement. A DMA copy is a copy done this way, so the CPU doesn’t have to move the bytes itself.

In this example, we have 4 mode switches and 4 data copies.

1. app initiates the disk → read buffer DMA copy (user → kernel mode)
2. read buffer → app buffer copy (kernel → user mode)
   (steps 1 and 2 can be run in a loop if you have to read more than what the read buffer can hold)
3. app buffer → socket buffer copy (user → kernel mode)
4. socket buffer → NIC buffer DMA copy (kernel → user mode after the response is written out)
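To make these steps concrete, here is a minimal Java sketch of the copy-through-the-app path. The class and method names (`TraditionalCopy`, `send`) and the 8 KiB buffer size are made up for illustration, and any `OutputStream` stands in for the socket. It is not Kafka's code, just the generic read/write loop described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class TraditionalCopy {
    // Sends a file through a user-space buffer and returns the bytes sent.
    // Hypothetical helper for illustration only.
    static long send(Path file, OutputStream socketOut) throws IOException {
        byte[] appBuffer = new byte[8192]; // the "app buffer"; size is arbitrary
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // Each read() enters the kernel and copies read buffer -> app buffer.
            while ((n = in.read(appBuffer)) != -1) {
                // Each write() enters the kernel again and copies
                // app buffer -> socket buffer.
                socketOut.write(appBuffer, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

Notice that the app never looks at the bytes. It only carries them from one kernel buffer to another.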

We can do better.

Zero Copy

Kafka stores the data in the same binary format it responds to requests with.

It made no sense to do the original steps 2 and 3, as Kafka didn’t do anything with the given data - it would simply pass it back to the kernel.

With zero-copy, the data is NOT copied to Kafka - it directly goes to the NIC buffer.

Notice that there is another optimization here - the read buffer directly copies data to the NIC buffer - not to the socket buffer.

This is the so-called scatter-gather operation (a.k.a. vectored I/O).

scatter-gather - the act of only storing read buffer pointers in the socket buffer, and having the DMA engine read those addresses directly from memory.
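In Java, the usual way to ask the OS for this is `FileChannel.transferTo`, which is the call Kafka relies on and which the JDK typically backs with `sendfile` on Linux. Below is a minimal sketch, assuming a blocking target channel and a file that isn't truncated mid-transfer. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ZeroCopySend {
    // Hands the file to the OS and lets it move the bytes to the target.
    // Returns the number of bytes transferred. Illustration only.
    static long send(Path file, WritableByteChannel socket) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop.
            // Assumption: blocking channel, file not truncated meanwhile.
            while (position < size) {
                position += in.transferTo(position, size - position, socket);
            }
            return position;
        }
    }
}
```

There is no app buffer anywhere in this version. Whether a copy is actually skipped depends on the OS and on the target channel: with a real socket the kernel can do the work, while with other channel types the JDK falls back to an ordinary buffered copy.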

The end result?

2 user/kernel mode switches. (2 fewer)
2 DMA copies. (the same)
1 minuscule CPU copy of pointers. (instead of 2 full CPU copies of the data)
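To put rough numbers on the difference, here is a back-of-the-envelope model in Java. It is a simplification under stated assumptions: every `read()` and `write()` call costs two mode switches (in and out of the kernel), the traditional path makes one of each per buffer-sized chunk, and the pointer copy on the zero-copy path is treated as free. The names are made up for illustration.

```java
class CopyCost {
    // Bytes the CPU itself copies on the traditional path:
    // read buffer -> app buffer, then app buffer -> socket buffer.
    static long cpuBytesTraditional(long fileBytes) {
        return 2 * fileBytes;
    }

    // Mode switches on the traditional path: one read() and one write()
    // per chunk, each a round trip into the kernel and back.
    static long modeSwitchesTraditional(long fileBytes, int bufferBytes) {
        long chunks = (fileBytes + bufferBytes - 1) / bufferBytes; // ceiling
        return chunks * 4;
    }

    // Mode switches on the zero-copy path: one round trip per call.
    static long modeSwitchesZeroCopy(int calls) {
        return 2L * calls;
    }
}
```

For a 1 GiB file pushed through an 8 KiB app buffer, this model gives 2 GiB of CPU-copied bytes and 524,288 mode switches on the traditional path, versus 2 mode switches for a single zero-copy call.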

In Kafka

And now for the hard truth - zero-copy isn’t that impactful in most Kafka deployments.

CPU is rarely the bottleneck. The network gets saturated much faster, so the lack of in-memory copies doesn’t move the needle in most cases.

Plus, enabling SSL/TLS encryption already prevents Kafka from using zero-copy, because the broker has to pull the data into user space to encrypt it before sending - and Kafka still performs!

🗣This Week’s Socials

I’ve started posting a bit less frequently on the socials recently, and I do expect it to stay intermittent throughout next month (traveling) - but the ones I did post, I reckon, were very good:

💦 Kafka’s High Watermark Offset in 2 minutes

I used to misunderstand Kafka’s concept of the high watermark.
That’s normal - the concept is entangled with advanced distributed systems terms - replication, fault tolerance & durability.
But once I visualized it, it became clear.
Let me explain it first, and then you can… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
3:29 PM • Jul 22, 2023

🧩 A Visualization of Replica Distribution in Kafka

One thing determines your long-term success in managing your own Apache Kafka cluster…
Replica distribution.
It’s probably THE determining factor in how well your cluster scales and is used throughout its lifespan.
It is a huge topic in Apache Kafka. (no pun intended)
And… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
7:08 AM • Aug 1, 2023

⭐️ Uber’s Optimal Real-Time Data Analytics Infra Stack

This is what an optimal real-time analytics data infrastructure looks like:👇
Uber has paved the way in showing how to both:
• build infrastructure to support massive amounts of data.
• leverage the data in diverse, often conflicting, use cases.
Each day, they process… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
3:06 PM • Jul 19, 2023

🔥 10 Performance Tips to 10x Kafka

Ever wish someone had made a performance checklist for Kafka?
Here it is. 🔥
PS: What did I miss? twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
3:46 PM • Jul 21, 2023

👩‍🏫 14 Lessons from Atlassian’s 14 Day Outage

🚨 3AM: wake up by page.
😳 4AM: Realize customer's deployment is deleted.
😥 5AM: Scramble & page teams to find remediation.
😵‍💫 11PM: Contact affected customers.
💀 14 Days Later: Incident fully mitigated.
This should scare you.
Which lesson would have prevented this? http
— Stanislav Kozlovski (@BdKozlovski)
5:28 AM • Jul 28, 2023

🐘 How Cloudflare Manages PostgreSQL

Cloudflare serves around 20% of the web with 46 million requests a second.
Surely they must have a lot of data.
Where do they store it?
Plain old PostgreSQL. 🐘
Around 15-20 clusters of them.
Each cluster consists of 3 servers split into two regions.
The primary region is… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
7:08 AM • Jul 26, 2023

🛡🐘 PgBouncer at Cloudflare

A company like Cloudflare knows a thing or two about protecting systems against client stampedes.
If Cloudflare protects customers with their DDoS Protection, Firewall & DNS Resolver products...
...and PostgreSQL serves the transactional workloads for those services… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
2:23 PM • Aug 3, 2023

The Secret to Startup Success? Boring Technology 🥱

The best startups don’t use the latest fancy language, framework, or cloud service.
The best startups use:
boring technology.
Today’s case in point:
👉 Loom & Postgres 🐘
Loom is a nifty little app that allows you to quickly send out screen + video recordings - a sort of… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
2:10 PM • Aug 2, 2023

🛖 Abstracting Kafka Streams & Flink at Wise

😰 Writing your stream processing with low-level APIs
😠 Writing your stream processing with high-level abstractions
What's right?
The right abstraction, of course! 😇
But most companies and projects lack them.
Which is normal - as one size does not fit all.
The only way to… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
6:25 PM • Jul 20, 2023

_{Apache®, Apache Kafka®, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks}_.