WarpStream
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Why Tiered Storage? Retention
● Cloud disks are expensive.
● For long-retention workloads, disks can be 80% of TCO even at low throughput.
● EBS vs. instance storage doesn’t matter
○ double/triple replication is expensive compared to S3
EBS (GP2): $0.1/GiB pre-replication
EBS (GP2): $0.3/GiB post-replication
S3: $0.02/GiB post-replication
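The gap between the three prices above is easiest to see on a concrete retention size. The following is a minimal sketch, assuming the slide's prices are per GiB per month and that Kafka runs with a replication factor of 3 (which is what turns $0.1 into $0.3). The 10 TiB retention figure is hypothetical and only there to give the arithmetic something to work on.

```python
# Prices from the slide, treated here as USD per GiB per month (assumption).
EBS_GP2_PER_GIB = 0.10   # one copy, before Kafka replication
S3_PER_GIB = 0.02        # replication is already included in the S3 price
REPLICATION_FACTOR = 3   # assumption: triple replication on broker disks


def monthly_storage_cost(retained_gib: float) -> dict[str, float]:
    """Return the monthly cost of holding `retained_gib` of logical data."""
    return {
        # Every logical GiB is stored REPLICATION_FACTOR times on EBS.
        "ebs": retained_gib * EBS_GP2_PER_GIB * REPLICATION_FACTOR,
        # S3 is billed once per logical GiB.
        "s3": retained_gib * S3_PER_GIB,
    }


costs = monthly_storage_cost(10 * 1024)  # hypothetical: 10 TiB retained
ratio = costs["ebs"] / costs["s3"]
print(f"EBS: ${costs['ebs']:,.0f}/mo, S3: ${costs['s3']:,.0f}/mo, ratio: {ratio:.0f}x")
```

Whatever the retained volume, the ratio stays at $0.3 / $0.02 = 15x, which is the motivation for moving older data off broker disks.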
Tiered storage helps, but not enough
Why Drop the Disks? Operations
● Stateful brokers with attached storage make operations complex, difficult, and inelastic
● Requires consensus, topic-partition leaders, and custom operations for scaling in/out and replacing nodes
● Balancing in general is always a problem
Why Drop the Disks? Networking
● 80%+ of TCO for high-throughput Apache Kafka clusters can be networking fees
● $0.053 / compressed GiB transferred under 100% ideal conditions with fetch-from-follower enabled
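The slide does not show where $0.053 comes from. One way to arrive at it, sketched below, assumes a cross-AZ transfer price of $0.02 per GiB ($0.01 billed on each side), three availability zones, a replication factor of 3, and producers spread evenly across zones. All of these are assumptions for illustration, not figures taken from the deck.

```python
CROSS_AZ_PER_GIB = 0.02   # assumption: $0.01 egress + $0.01 ingress per GiB
AZS = 3                   # assumption: cluster spans three zones
REPLICATION_FACTOR = 3    # assumption: one replica per zone


def cross_az_cost_per_gib() -> float:
    """Estimate the inter-zone transfer cost of one produced GiB."""
    # Producer -> leader: the partition leader is in a different zone
    # from the producer with probability (AZS - 1) / AZS.
    produce = (AZS - 1) / AZS * CROSS_AZ_PER_GIB
    # Leader -> followers: each of the other replicas sits in another zone.
    replicate = (REPLICATION_FACTOR - 1) * CROSS_AZ_PER_GIB
    # Consumers: fetch-from-follower lets them read a same-zone replica.
    consume = 0.0
    return produce + replicate + consume


per_gib = cross_az_cost_per_gib()
print(f"${per_gib:.3f} per GiB")
```

Under these assumptions the result is 2/3 × $0.02 + 2 × $0.02, which rounds to $0.053. Without fetch-from-follower the consumer term would no longer be zero.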
Zero disks would be better
WarpStream’s Cloud-Native design greatly reduces the cost and complexity of Apache Kafka
How It Works
Optimize for Cloud Unit Economics
● Entire storage engine redesigned around minimizing PUT / GET operations
● Networking is free, and storage is cheap.
● S3 PUT: $0.000005 per request
● S3 GET: $0.0000004 per request
● S3 Storage: $0.023/GB-mo
● S3 Cross-AZ Networking: Free
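Because requests are billed individually, the number of files written matters far more than their size. The sketch below compares writing one file per topic-partition with writing one combined file per writer on each flush. The partition count, writer count, and 250 ms flush interval are hypothetical values chosen for illustration; only the PUT price comes from the slide.

```python
S3_PUT = 0.000005                 # USD per PUT request (from the slide)
SECONDS_PER_MONTH = 30 * 24 * 3600


def monthly_put_cost(files_per_flush: int, flush_interval_s: float) -> float:
    """Monthly PUT bill when `files_per_flush` files are written every flush."""
    puts_per_second = files_per_flush / flush_interval_s
    return puts_per_second * SECONDS_PER_MONTH * S3_PUT


# Hypothetical workload: 1,000 active partitions, 3 writers, 250 ms flushes.
per_partition_files = monthly_put_cost(files_per_flush=1000, flush_interval_s=0.25)
one_file_per_writer = monthly_put_cost(files_per_flush=3, flush_interval_s=0.25)

print(f"one file per partition: ${per_partition_files:,.2f}/mo")
print(f"one file per writer:    ${one_file_per_writer:,.2f}/mo")
```

The cost scales linearly with the number of files per flush, so decoupling file count from partition count is what keeps the request bill small.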
Step 1: Eliminate per-TopicPartition Files
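The diagrams for this step did not survive extraction. As a rough illustration of the idea in the title, the sketch below packs buffered records from several topic-partitions into a single file with a small index, so that one PUT covers all of them. The layout (length-prefixed records, an in-memory index keyed by topic and partition) and the names `BatchFile`, `build_file`, and `read_partition` are hypothetical and not WarpStream's actual file format.

```python
from dataclasses import dataclass

TopicPartition = tuple[str, int]


@dataclass
class BatchFile:
    """One object-storage file holding records from many topic-partitions."""
    payload: bytes
    # (topic, partition) -> (byte offset, byte length, record count)
    index: dict[TopicPartition, tuple[int, int, int]]


def build_file(buffered: dict[TopicPartition, list[bytes]]) -> BatchFile:
    """Concatenate all buffered records into one payload, grouped by partition."""
    payload = bytearray()
    index: dict[TopicPartition, tuple[int, int, int]] = {}
    for tp in sorted(buffered):
        start = len(payload)
        for record in buffered[tp]:
            # 4-byte big-endian length prefix, then the record bytes.
            payload += len(record).to_bytes(4, "big") + record
        index[tp] = (start, len(payload) - start, len(buffered[tp]))
    return BatchFile(bytes(payload), index)


def read_partition(f: BatchFile, tp: TopicPartition) -> list[bytes]:
    """Decode only the byte range that belongs to one topic-partition."""
    start, length, _ = f.index[tp]
    chunk = f.payload[start:start + length]
    records, pos = [], 0
    while pos < len(chunk):
        size = int.from_bytes(chunk[pos:pos + 4], "big")
        records.append(chunk[pos + 4:pos + 4 + size])
        pos += 4 + size
    return records


buffered = {("orders", 0): [b"a", b"bb"], ("clicks", 3): [b"ccc"]}
batch = build_file(buffered)  # one file, hence one PUT, for both partitions
print(read_partition(batch, ("orders", 0)))
```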
Step 2: Separate Data from Metadata
No per-TopicPartition leaders!
Step 3: Introduce Data Locality for Live Reads
Step 4: Introduce Data Locality for Historical Reads
Hard Mode: Compacted Topics
● Tiered Storage in open source Apache Kafka does not support compacted topics
● WarpStream already does compaction internally
● … how hard could it possibly be?
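For readers unfamiliar with compacted topics, the sketch below shows the general semantics a compacted log has to preserve: only the latest record per key needs to survive, surviving records keep their original offsets, and a record with a null value (a tombstone) marks the key as deleted. This illustrates the Kafka-level contract only. It is not WarpStream's implementation, and the `compact` function and its `drop_tombstones` flag are hypothetical.

```python
from typing import Optional

Record = tuple[int, str, Optional[str]]  # (offset, key, value); value None = tombstone


def compact(records: list[Record], drop_tombstones: bool = False) -> list[Record]:
    """Keep only the most recent record for each key, preserving offsets."""
    latest: dict[str, Record] = {}
    for record in records:          # records are assumed to be in offset order
        latest[record[1]] = record  # a later record replaces an earlier one
    survivors = sorted(latest.values(), key=lambda r: r[0])
    if drop_tombstones:
        # Tombstones can eventually be removed once consumers have had time to see them.
        survivors = [r for r in survivors if r[2] is not None]
    return survivors


log = [(0, "a", "1"), (1, "b", "1"), (2, "a", "2"), (3, "b", None)]
print(compact(log))
print(compact(log, drop_tombstones=True))
```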
WarpStream costs ~85% less than self-hosted Kafka for high volume workloads
Workload profile (both deployments): avg. ingress 1 GiB/sec, avg. egress 3 GiB/sec, retention 1 day, replication factor 3, 3 Availability Zones

Deployment Model           Hardware     Network      Object Storage  Total Costs
WarpStream                 $223k/year   <$2k/year    $61k/year       $286k/year
Self-Hosted Apache Kafka   $223k/year   $1.68M/year  $0              $1.9M/year
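The totals and the ~85% headline can be checked from the table's own numbers. The sketch below treats the "<$2k" network entry as its $2k upper bound. It also cross-checks the self-hosted network figure by assuming a 365-day year and a cross-AZ cost of about $0.0533 per GiB of ingress (2/3 × $0.02 for produce plus 2 × $0.02 for replication); that per-GiB breakdown is an assumption and is not stated on this slide.

```python
# Yearly costs in thousands of USD, copied from the table.
COSTS_K = {
    "warpstream": {"hardware": 223, "network": 2, "object_storage": 61},  # "<$2k" taken as 2
    "self_hosted": {"hardware": 223, "network": 1680, "object_storage": 0},
}

totals = {name: sum(parts.values()) for name, parts in COSTS_K.items()}
savings = 1 - totals["warpstream"] / totals["self_hosted"]
print(f"totals: {totals}, savings: {savings:.1%}")

# Cross-check of the self-hosted network line (assumptions in the lead-in).
SECONDS_PER_YEAR = 365 * 24 * 3600
INGRESS_GIB_PER_S = 1
CROSS_AZ_PER_GIB = 0.02 * (2 / 3 + 2)
network_usd_per_year = INGRESS_GIB_PER_S * SECONDS_PER_YEAR * CROSS_AZ_PER_GIB
print(f"estimated self-hosted network cost: ${network_usd_per_year:,.0f}/year")
```

Under these assumptions the estimate lands close to the table's $1.68M/year, which suggests the network column is driven almost entirely by inter-zone transfer.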
WarpStream Is Still Real Time
● P99 producer latency of ~400ms
● Producer-to-consumer end-to-end latency <1.5s
WarpStream Supports S3 Express One Zone
● P99 producer latency as low as 150ms
● Uses a majority quorum of 3 buckets to provide regional high-availability
● Data is moved to S3 Standard asynchronously to reduce storage costs
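The majority-quorum bullet can be illustrated with a small model: a write is acknowledged once it has landed in at least 2 of the 3 single-zone buckets, so losing any one zone does not block producers. The sketch below uses in-memory stand-ins rather than a real S3 client, and writes sequentially for clarity where a real system would presumably issue the requests concurrently. `InMemoryBucket` and `quorum_put` are hypothetical names.

```python
class InMemoryBucket:
    """Stand-in for a single-zone bucket; `healthy=False` simulates a zone outage."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        if not self.healthy:
            raise IOError(f"bucket {self.name} is unavailable")
        self.objects[key] = data


def quorum_put(buckets: list[InMemoryBucket], key: str, data: bytes) -> bool:
    """Write to every bucket and report success if a majority accepted the write."""
    needed = len(buckets) // 2 + 1  # 2 out of 3
    acks = 0
    for bucket in buckets:
        try:
            bucket.put(key, data)
            acks += 1
        except IOError:
            continue  # tolerate individual bucket failures
    return acks >= needed


one_zone_down = [InMemoryBucket("az-a"), InMemoryBucket("az-b"), InMemoryBucket("az-c", healthy=False)]
two_zones_down = [InMemoryBucket("az-a"), InMemoryBucket("az-b", healthy=False), InMemoryBucket("az-c", healthy=False)]
print(quorum_put(one_zone_down, "file-1", b"records"))
print(quorum_put(two_zones_down, "file-1", b"records"))
```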