© Copyright 2016 HomeAway, Inc.
Kafka: The “Dial Tone” for Data
HomeAway
The world leader for vacation rentals
> 1 million listings
(and growing!)
Agenda
• Overview
• The Problem
• The Experiment
• Results: Use Cases
• Lessons Learned
• Next Steps
Overview
Difference between Dinosaurs and Unicorns
In the old days: “Dial Tone” looked like this
ATDT
Today: Kafka is the modern “Dial Tone” for Data
Producer
Consumer
The Problem
Our original problem/motivation
search head
indexer
indexer
app server forwarder
app server forwarder
1 TB/day ingress and growing!
40,000 calls/sec
Also… Historical Analytic Pipeline was slow/expensive
app server
OLTP OLAP
analytics
ETL
Fill the Lake! Alternatives
?
Problem: Fill Hadoop!
Problem Data Lake
What we wanted… the Big Idea
If you can log it… … you can analyze it!
How to build self-service?
Hypothesis: Use Kafka!
2 ms median latency
http://bit.ly/jay_on_logs
the log
2 Million Events / Sec!
(3 cheap machines)
http://goo.gl/pv5GoL
“Benchmarking Apache Kafka”
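The log abstraction behind this hypothesis can be illustrated with a minimal in-memory sketch. This is not Kafka's API: `AppendOnlyLog`, `append`, and `read_from` are hypothetical names, chosen only to show how producers append records while each consumer reads from its own offset.

```python
class AppendOnlyLog:
    """Toy stand-in for a single log partition (not Kafka's API)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Producer side: add a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset, max_records=100):
        """Consumer side: return records starting at the given offset."""
        return self._records[offset:offset + max_records]


log = AppendOnlyLog()
for event in ("search", "click", "booking"):
    log.append(event)

# Two consumers keep independent offsets, so neither affects the other.
analytics_offset = 0
fraud_offset = 2
analytics_batch = log.read_from(analytics_offset)
fraud_batch = log.read_from(fraud_offset)
```

Because records are never modified after being appended, a slow consumer can simply resume from its last offset without coordinating with the producer.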
The Experiment
HACommonsLogging
• KafkaAppender
Schema-on-read
• KafkaAvroLogger
Schema-on-write
Experiment: Schema-on-Read, Schema-on-Write
Data Lake
Schema Registry
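The two approaches in the experiment differ in when structure is imposed on the data. The sketch below shows that difference in plain Python, without Kafka or Avro. It is not the HACommonsLogging implementation: `BOOKING_SCHEMA`, `conforms`, `write_with_schema`, and `read_with_schema` are hypothetical names used only for illustration.

```python
import json

# Hypothetical schema: field name -> required Python type.
BOOKING_SCHEMA = {"listing_id": int, "nights": int}


def conforms(record, schema):
    """True if the record has exactly the schema's fields with the right types."""
    return (set(record) == set(schema)
            and all(isinstance(record[name], kind) for name, kind in schema.items()))


def write_with_schema(record, schema):
    """Schema-on-write: validate before producing, so bad records never enter the pipe."""
    if not conforms(record, schema):
        raise ValueError(f"record does not match schema: {record!r}")
    return json.dumps(record)


def read_with_schema(raw_line, schema):
    """Schema-on-read: accept any text at write time, impose structure when consuming."""
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or not conforms(record, schema):
        return None
    return record


good = write_with_schema({"listing_id": 42, "nights": 3}, BOOKING_SCHEMA)
parsed = read_with_schema(good, BOOKING_SCHEMA)
unparsed = read_with_schema("free-form log line", BOOKING_SCHEMA)
```

With schema-on-write the producer pays the validation cost once and every consumer can trust the data; with schema-on-read the producer stays simple, but each consumer has to handle lines that do not parse.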
Architecture: Kafka + Camus = BigData Ingress
Camus
The Results
Use Cases: ITOA / SLA Reporting
Use Cases: Fraud
Use Cases: Search + ClickStream
User Behavior
Search Requests
A/B Test Readouts
Proctor
EDAP
Use Cases: Traveler Segmentation
EDAP
Data
Model
Lessons Learned
Lesson #1: The Schema [registry] is Everything!
Data Lake
Schema Registry
• Decouples producers from consumers
• Enforces backwards compatibility
• Enables self-service / democratization
• Source of truth (SOT) for schemas in the pipe
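The backwards-compatibility point can be made concrete with a simplified check. The sketch below assumes schemas are plain dictionaries mapping field names to a type and an optional default; `is_backward_compatible` and the example schemas are hypothetical, and the rule shown is only a rough approximation of what a real registry enforces (actual Avro schema resolution also covers type promotion, unions, and aliases).

```python
def is_backward_compatible(old_schema, new_schema):
    """Simplified check: can a reader using new_schema decode data written with old_schema?"""
    for name, spec in new_schema.items():
        if name in old_schema:
            # A shared field must keep its type (no promotions in this sketch).
            if old_schema[name]["type"] != spec["type"]:
                return False
        elif "default" not in spec:
            # A field missing from old data needs a default to fall back on.
            return False
    return True


v1 = {"listing_id": {"type": "long"}}
v2_ok = {"listing_id": {"type": "long"},
         "currency": {"type": "string", "default": "USD"}}
v2_bad = {"listing_id": {"type": "long"},
          "currency": {"type": "string"}}

ok = is_backward_compatible(v1, v2_ok)
bad = is_backward_compatible(v1, v2_bad)
```

Rejecting a change like `v2_bad` at registration time is what lets consumers upgrade without coordinating with every producer.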
Lesson #2: A Kafka/SR governance module is helpful
Data Lake
• TURN OFF Auto Topic Creation!
• Need a place for developers to request topics
• Retention Policy
• Expected Load
• Compaction
• Partition Size / Partition Key
• Owner
• LTS Date
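One way to picture the governance checklist above is as a topic-request record that is validated before an administrator creates the topic. Auto topic creation itself is controlled by the Kafka broker setting `auto.create.topics.enable`. Everything else in this sketch is an assumption made for illustration: the `TopicRequest` class, its field names and units, and the validation rules are hypothetical, and the `lts_date` field is kept as an opaque string because the slide does not define it.

```python
from dataclasses import dataclass


@dataclass
class TopicRequest:
    """Hypothetical topic-request record mirroring the slide's checklist."""
    name: str
    owner: str
    retention_hours: int
    expected_events_per_sec: int
    compaction: bool
    partitions: int
    partition_key: str
    lts_date: str  # opaque string; its meaning and format are assumptions

    def problems(self):
        """Return human-readable problems; an empty list means the request is complete."""
        found = []
        if not self.owner:
            found.append("owner is required")
        if self.retention_hours <= 0:
            found.append("retention must be positive")
        if self.partitions <= 0:
            found.append("partitions must be positive")
        if self.expected_events_per_sec < 0:
            found.append("expected load cannot be negative")
        return found


request = TopicRequest(
    name="search-requests", owner="", retention_hours=168,
    expected_events_per_sec=5000, compaction=False,
    partitions=12, partition_key="session_id", lts_date="",
)
issues = request.problems()
```

Capturing the request as structured data means the same record can later drive topic creation and ownership reporting, rather than living only in a ticket.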
Lesson #3: Make it easy to do stream processing
Schema Registry
• samza-archetype
• samza-job-deployer
• Will evaluate Kafka Streams (k-streams)!
http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
Next Steps
Consistency : 3 types of Data
Event
Document
Transactional
Kafka Producer Spooling
Conclusion
Yesterday
Systems of Record
Today
Systems of Engagement
Tomorrow
Systems of Intelligence
Don’t be a dinosaur…
ATDT
Thank you
End of Presentation

Kafka, the "DialTone for Data": Building a self-service, scalable, streaming analytics system @ HomeAway, Rene Parra

Editor's Notes

  • #6: Is often the difference between dinosaurs and unicorns.
  • #7: “We are living at the dawn of a new age, where how we listen to data…
  • #8: Today, we @ HomeAway believe that Kafka is the “Dial Tone” for data, enabling present businesses to turn themselves into rainbows and unicorns.