mTLS 🤝 Kafka (2 minute kafka)
3 simple examples of mTLS Kafka security setups at Uber, Zendesk & Wise
mTLS is a complex way of securing your Kafka deployment. Let’s define the terms:
☂️ TLS - a protocol for encrypting data before it travels over the wire, so that no bad actor can inspect it. The name is often used interchangeably with SSL, although SSL is the older protocol that TLS replaced.
The way it works is that the broker holds a certificate signed by a Certificate Authority (CA). The client verifies that certificate during the handshake, and the two sides then set up an encrypted connection.
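To make the client half of that concrete, here is a minimal sketch using Python's standard `ssl` module. It is not Kafka client code; the `make_client_context` helper and the optional `ca_file` path are assumptions for illustration only.

```python
import ssl


def make_client_context(ca_file=None):
    """Build a client-side TLS context that verifies the server's certificate."""
    # create_default_context turns on certificate verification and hostname checks.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    if ca_file is not None:
        # Hypothetical path: additionally trust the CA that signed the broker certs.
        ctx.load_verify_locations(cafile=ca_file)
    # Refuse the legacy protocol versions (the old SSL family and early TLS).
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_client_context()
```

With this context, a connection to a server whose certificate is not signed by a trusted CA fails during the handshake, before any data is sent.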
☂️ mTLS - mutual TLS. 🤝 (TLS + Auth)
Here, the client also has a signed certificate. ⭐️
Both the broker and client verify each other’s certificates, and this allows the broker to authenticate the client since it now knows WHO that client is.
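Here is a rough sketch of the "mutual" part, again with Python's standard `ssl` module rather than Kafka itself. The server-side context refuses clients that do not present a certificate, and a small helper turns the verified certificate into a name the server can authorize. Kafka derives its principal from the certificate's distinguished name; the CN-only mapping in `principal_from_peercert` is a simplification assumed for this example, and the file names in the comments are hypothetical.

```python
import ssl


def make_broker_context():
    """Server-side context that demands a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # On a server context, CERT_REQUIRED makes the handshake fail unless the
    # client presents a certificate signed by a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    # A real deployment would also load its own identity and the trusted CA:
    #   ctx.load_cert_chain("broker.pem", "broker.key")   # hypothetical paths
    #   ctx.load_verify_locations(cafile="ca.pem")        # hypothetical path
    return ctx


def principal_from_peercert(peercert):
    """Map the dict returned by SSLSocket.getpeercert() to a 'User:CN=...' name."""
    # 'subject' is a tuple of RDNs, each RDN a tuple of (key, value) pairs.
    for rdn in peercert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return f"User:CN={value}"
    raise ValueError("certificate has no commonName")


broker_ctx = make_broker_context()
sample_cert = {
    "subject": (
        (("organizationName", "Example Corp"),),
        (("commonName", "orders-service"),),
    )
}
principal = principal_from_peercert(sample_cert)
```

The point of the second function is the "WHO": once the handshake succeeds, the identity comes from a certificate the server has already verified, not from anything the client merely claims.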
❌ Problems With It
It’s notoriously hard to manage mTLS.
It requires extra infrastructure around Kafka and your apps to ensure you can rotate certificates properly.
You NEED to be able to revoke certificates efficiently for cases where:
💧 the certificate leaks (the equivalent of your password being leaked)
💩 the certificate is stale enough to require a refresh (it’s proper security practice to refresh every now and then)
And, unfortunately, there is no industry-wide consensus on how to revoke.
The two most popular mechanisms, Certificate Revocation Lists (CRLs) and the Online Certificate Status Protocol (OCSP), have their own sets of problems.
Not all clients or Certificate Authorities (CAs) implement them consistently, and tooling and library support is patchy.
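As one illustration of the operational burden, here is a minimal sketch of switching on CRL checking with Python's standard `ssl` module. The `make_context_with_crl_check` helper and the `crl_file` path are assumptions for the example; how Kafka clients and brokers handle revocation depends on their own TLS stack.

```python
import ssl


def make_context_with_crl_check(crl_file=None):
    """Client context that also checks the peer's certificate against a CRL."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Check only the peer (leaf) certificate against the loaded CRLs.
    ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF
    if crl_file is not None:
        # CRLs are loaded through the same call as CA certificates.
        # Hypothetical path to a PEM file containing the issuer's CRL.
        ctx.load_verify_locations(cafile=crl_file)
    return ctx


crl_ctx = make_context_with_crl_check()
```

Note the catch: with this flag set and no valid CRL loaded, verification fails. So somebody has to fetch a fresh CRL, distribute it to every host, and reload it on a schedule, which is exactly the extra infrastructure the section complains about.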
What Do Companies Use?
Many engineers are against mTLS because of this lackluster support, which results in every company implementing it in its own way.
This risks security gaps and maintenance overhead in the future.
The Manage-It-Yourself Example
Vault as the CA that generates X.509 certs.
Consul as the source of truth regarding who the CA is.
A PKI Auth Manager sidecar to generate certs remotely in Vault & store them locally.
A TLS Monitor sidecar to watch for these changes & make sure Kafka reloads the cert.
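To give a feel for the monitor piece of a setup like this, here is a minimal sketch of a watcher that notices when a certificate file changes and fires a reload hook. This is not the actual sidecar: the `CertWatcher` class, the polling approach, and the `on_change` callback are all assumptions for illustration.

```python
import hashlib
from pathlib import Path


class CertWatcher:
    """Detects when a certificate file changes on disk and triggers a reload."""

    def __init__(self, cert_path, on_change):
        self.cert_path = Path(cert_path)
        self.on_change = on_change  # hypothetical hook that tells Kafka to reload
        self._last_digest = self._digest()

    def _digest(self):
        # Hash the contents instead of trusting mtime, so rewriting the file
        # with identical bytes does not trigger a pointless reload.
        return hashlib.sha256(self.cert_path.read_bytes()).hexdigest()

    def poll_once(self):
        """Return True (and fire the callback) if the certificate changed."""
        current = self._digest()
        if current == self._last_digest:
            return False
        self._last_digest = current
        self.on_change(self.cert_path)
        return True
```

A sidecar would call `poll_once()` in a loop with a sleep in between. The sketch leaves out the hard parts, such as a cert and key being swapped non-atomically, or the reload itself failing.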
The Wise Example (pun intended)
The two other examples I have both use SPIFFE and SPIRE.
🔹 SPIFFE (Secure Production Identity Framework For Everyone) - a framework and set of standards for secure cross-service communication and identification. It is a graduated Cloud Native Computing Foundation (CNCF) project.
🔹 SPIRE - a reference implementation of SPIFFE, also a graduated CNCF project.
This automates cert rotation and expiry for you.
SPIFFE relies on frequent cert expiry as the mechanism to handle leaked certificates.
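The trade-off behind short-lived certs is simple arithmetic, sketched below. The function names, the one-hour lifetime in the usage note, and the rotate-at-half-lifetime default are assumptions picked for illustration, not SPIFFE or SPIRE settings taken from this article.

```python
from datetime import datetime, timedelta, timezone


def should_rotate(issued_at, expires_at, now, rotate_fraction=0.5):
    """Rotate once a chosen fraction of the certificate lifetime has elapsed."""
    lifetime = expires_at - issued_at
    return now >= issued_at + lifetime * rotate_fraction


def max_exposure(expires_at, leaked_at):
    """How long a leaked cert stays usable when expiry is the only 'revocation'."""
    return max(expires_at - leaked_at, timedelta(0))


issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
expires = issued + timedelta(hours=1)  # assumed one-hour certificate lifetime
```

With these assumed numbers, a cert that leaks 45 minutes after being issued is only usable by an attacker for the remaining 15 minutes. The shorter the lifetime, the smaller that window, and the more you depend on rotation being fully automated.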
See how Uber does it:
Uber has one of the largest Kafka deployments in the WORLD…
So how do they secure it?
mTLS and strong authorization rules.
Uber models its production environment as a zero-trust network. Since any host can be compromised, they rely on strong cryptographic primitives to… twitter.com/i/web/status/1…
— Stanislav Kozlovski (@BdKozlovski)
Dec 3, 2023
With TLS, we lose the zero-copy optimization too! We cover that in a separate 2-minute edition.
Apache®, Apache Kafka®, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.