Twitter Real Time Stack
Processing Billions of Events Using
Distributed Log and Heron
Karthik Ramasamy
Twitter
@karthikz
Value of Data
It's contextual

[Figure: value of data to decision-making vs. time. The information half-life in decision-making: real-time data (seconds) supports preventive/predictive decisions; minutes to hours support actionable decisions; days support reactive decisions; months, the realm of traditional "batch" business intelligence, are historical. Time-critical decisions need fresh data.]
[1] Courtesy Michael Franklin, BIRTE, 2015.
4
What is Real-Time?
BATCH
high throughput
> 1 hour
monthly active users
relevance for ads
adhoc
queries
REAL TIME
low latency
< 1 ms
Financial
Trading
ad impressions count
hash tag trends
approximate
10 ms - 1 sec
Near Real
Time
latency sensitive
< 500 ms
fanout Tweets
search for Tweets
deterministic
workflows
OLTP
It’s contextual
Why Real Time?

Real time trends: emerging breakout trends on Twitter (in the form of #hashtags)
Real time conversations: real time sports conversations related to a topic (a recent goal or touchdown)
Real time recommendations: real time product recommendations based on your behavior & profile
Real time search: real time search of Tweets

ANALYZING BILLIONS OF EVENTS IN REAL TIME IS A CHALLENGE!
Real Time: Analytics

STREAMING: analyze data as it is being produced
INTERACTIVE: store data and provide results instantly when a query is posed
Real Time Use Cases

Online Services (10s of ms): transaction log, queues, RPCs
Near Real Time (100s of ms): change propagation, streaming analytics
Data for Batch Analytics (secs to mins): log aggregation, client events
Real Time Stack
Components: Many moving parts

The Twitter real time stack: Scribe, Heron, Event Bus, and DLog (Distributed Log).
Scribe

Open source log aggregation: originally from Facebook; Twitter made significant enhancements for real time event aggregation.
High throughput and scale: delivers 125M messages/min and provides tight SLAs on data reliability.
Runs on every machine: simple, very reliable, and uses memory and CPU efficiently.
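Scribe's model is simple: clients hand (category, message) pairs to a local daemon, which buffers them and forwards batches upstream. The buffering idea above can be sketched in a few lines of Python; the class and method names here are illustrative, not Scribe's actual API:

```python
from collections import defaultdict

class ScribeBuffer:
    """Toy model of a local log daemon: buffer (category, message)
    pairs and flush them upstream in batches."""

    def __init__(self, upstream, max_buffer=3):
        self.upstream = upstream          # callable taking (category, batch)
        self.max_buffer = max_buffer
        self.buffers = defaultdict(list)

    def log(self, category, message):
        self.buffers[category].append(message)
        if len(self.buffers[category]) >= self.max_buffer:
            self.flush(category)

    def flush(self, category):
        batch = self.buffers.pop(category, [])
        if batch:
            self.upstream(category, batch)

received = []
daemon = ScribeBuffer(lambda cat, batch: received.append((cat, batch)))
for i in range(4):
    daemon.log("client_events", f"event-{i}")
# the first three messages were flushed as one batch; the fourth is still buffered
```

Batching per category is what lets a daemon on every machine stay cheap in memory and CPU while still meeting delivery SLAs.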
Event Bus & Distributed Log
Next Generation Messaging
Twitter Messaging

Many systems served different workloads: Kestrel for core business logic (tweets, fanouts, ...), Kestrel and BookKeeper for deferred RPC, Scribe into HDFS for log aggregation, and MySQL, Kafka, and Gizzard backing database and search workloads.
Kestrel Limitations

Adding subscribers is expensive
Scales poorly as the number of queues increases
Durability is hard to achieve
Read-behind degrades performance: too many random I/Os
Cross-DC replication is difficult
Kafka Limitations

Relies on the file system page cache
Performance degrades when subscribers fall behind: too much random I/O
Rethinking Messaging

Durable writes, intra-cluster and geo-replication
Scale resources independently
Cost efficiency
Unified stack: tradeoffs for various workloads
Multi-tenancy
Ease of manageability
Event Bus

Durable writes, intra-cluster and geo-replication
Scale resources independently
Cost efficiency
Unified stack: tradeoffs for various workloads
Multi-tenancy
Ease of manageability
Event Bus - Pub-Sub

[Diagram: a Publisher sends to a Write Proxy, which appends to the Distributed Log; a Read Proxy serves Subscribers, with a Metadata service coordinating both proxies.]
Distributed Log

[Diagram: the same pub-sub path with the Distributed Log as the durable core: Publisher -> Write Proxy -> Distributed Log -> Read Proxy -> Subscriber, coordinated via Metadata.]
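The core abstraction in the diagrams above is an append-only log: the write proxy serializes appends and hands back a sequence number, and the read proxy lets any subscriber tail the log from an offset. A stdlib-only sketch of those semantics (a toy model, not DistributedLog's real API):

```python
class DistributedLogSketch:
    """Toy append-only log: writers get back a sequence number,
    readers tail from any offset."""

    def __init__(self):
        self.records = []

    def append(self, record):
        # role of the write proxy: serialize appends, assign a sequence number
        self.records.append(record)
        return len(self.records) - 1

    def read_from(self, offset):
        # role of the read proxy: serve the tail of the log from an offset
        return self.records[offset:]

log = DistributedLogSketch()
seqs = [log.append(f"tweet-{i}") for i in range(5)]
# a subscriber that has already consumed up to offset 3 reads only the tail
tail = log.read_from(3)
```

Because readers are just offsets into a shared durable log, adding subscribers is cheap and read-behind consumers do not disturb writers, which is exactly the pain point with Kestrel and Kafka described earlier.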
Distributed Log @Twitter

01 Manhattan key-value store
02 Durable deferred RPC
03 Real time search indexing
04 Pub-sub system
05 Globally replicated log
Distributed Log @Twitter

400 TB/day IN
10 PB/day OUT
2 trillion events/day PROCESSED
100 ms latency
Algorithms
Mining Streaming Data
Twitter Heron
Next Generation Streaming Engine
Better Storm
Twitter Heron

Container-based architecture
Separate monitoring and scheduling
Simplified execution model
Much better performance
Twitter Heron
Design: Goals

Fully API compatible with Storm: directed acyclic graph topologies, spouts, and bolts
Batching of tuples: amortizing the cost of transferring tuples
Task isolation: ease of debuggability, isolation, and profiling
Support for back pressure: topologies should be self-adjusting
Use of mainstream languages: C++, Java, and Python
Efficiency: reduce resource consumption
Twitter Heron

Guaranteed message passing
Horizontal scalability
Robust fault tolerance
Concise code: focus on logic
Heron Terminology

Topology: a directed acyclic graph; vertices = computation, edges = streams of data tuples
Spouts: sources of data tuples for the topology (examples: Kafka, Kestrel, MySQL, Postgres)
Bolts: process incoming tuples and emit outgoing tuples (examples: filtering, aggregation, join, any function)
Heron Topology

[Diagram: an example topology in which Spout 1 and Spout 2 feed Bolt 1 and Bolt 2, whose streams flow on to Bolt 3, Bolt 4, and Bolt 5.]
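Since Heron is API compatible with Storm, a topology is wired up the way Storm's TopologyBuilder does it: declare spouts and bolts, then connect them by name into a DAG. The sketch below simulates that wiring with plain Python; the toy executor and its names are illustrative, not Heron's real API:

```python
class Topology:
    """Toy DAG executor: spouts emit tuples, bolts transform them,
    and edges are declared by component name."""

    def __init__(self):
        self.spouts, self.bolts, self.edges = {}, {}, {}

    def add_spout(self, name, source_fn):
        self.spouts[name] = source_fn

    def add_bolt(self, name, process_fn, inputs):
        self.bolts[name] = process_fn
        for parent in inputs:
            self.edges.setdefault(parent, []).append(name)

    def run(self):
        outputs = {}
        for name, source in self.spouts.items():
            self._emit(name, source(), outputs)
        return outputs

    def _emit(self, parent, tuples, outputs):
        # push each parent's output tuples downstream along the DAG edges
        for child in self.edges.get(parent, []):
            emitted = [self.bolts[child](t) for t in tuples]
            outputs.setdefault(child, []).extend(emitted)
            self._emit(child, emitted, outputs)

topo = Topology()
topo.add_spout("tweet-spout", lambda: ["heron", "storm", "heron"])
topo.add_bolt("upper-bolt", str.upper, inputs=["tweet-spout"])
topo.add_bolt("tag-bolt", lambda w: "#" + w, inputs=["upper-bolt"])
result = topo.run()
```

In the real system each spout or bolt runs as many parallel instances, and the edges carry a stream grouping that decides which instance receives each tuple.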
Stream Groupings

01 Shuffle grouping: random distribution of tuples
02 Fields grouping: group tuples by one or more fields
03 All grouping: replicates tuples to all tasks
04 Global grouping: sends the entire stream to one task
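The two most common groupings above are easy to picture as routing functions over a set of task IDs: shuffle spreads tuples evenly with no key affinity, while fields grouping hashes the chosen field(s) so equal values always land on the same task. A sketch under those assumptions (illustrative routing functions, not Heron's internals):

```python
import itertools
import zlib

def shuffle_grouping(num_tasks):
    """Round-robin router: even distribution, no key affinity."""
    counter = itertools.count()
    return lambda tup: next(counter) % num_tasks

def fields_grouping(num_tasks, *fields):
    """Hash router: tuples with equal field values always go to the
    same task. zlib.crc32 keeps the mapping stable across runs."""
    def route(tup):
        key = "|".join(str(tup[f]) for f in fields)
        return zlib.crc32(key.encode()) % num_tasks
    return route

route = fields_grouping(4, "user")
t1 = route({"user": "alice", "text": "hi"})
t2 = route({"user": "alice", "text": "bye"})
# same user -> same task, regardless of the other fields
```

Fields grouping is what makes per-key aggregations (counts per hashtag, per user) correct under parallelism; all grouping and global grouping are the two degenerate cases of broadcasting to every task or collapsing onto one.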
Heron
Architecture: High Level

[Diagram: topologies are submitted to a Scheduler, which runs Topology 1 through Topology N on the shared cluster.]
Heron
Architecture: Topology

[Diagram: each topology runs as a set of containers. A Topology Master stores the logical plan, physical plan, and execution state in a ZK cluster. Each container holds a Stream Manager, a Metrics Manager, and instances (I1-I4); the Topology Master syncs the physical plan to the Stream Managers.]
Heron
Stream Manager: BackPressure

[Diagram: a topology with spout S1 and bolts B2, B3, and B4.]
Stream Manager: BackPressure

[Diagram: four containers, each with a Stream Manager hosting instances of S1, B2, B3, and B4. When a bolt slows down, its Stream Manager signals back pressure to the other Stream Managers, which clamp the S1 instances they host.]
Heron
Stream Manager: Spout BackPressure

[Diagram: once back pressure propagates to the Stream Managers hosting the spouts, the spouts stop emitting, so the topology self-adjusts to the speed of its slowest component.]
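The mechanism in the diagrams above can be reduced to a bounded buffer between a spout's stream manager and a bolt's: when the bolt's buffer fills, the spout side is told to stop pulling until the bolt drains. A minimal sketch of that idea (a toy channel, not Heron's actual protocol):

```python
from collections import deque

class BackpressureChannel:
    """Bounded buffer between a spout and a bolt; the spout is
    throttled whenever the buffer is full."""

    def __init__(self, capacity):
        self.buffer = deque()
        self.capacity = capacity
        self.throttled = 0

    def offer(self, tup):
        # spout side: refuse the tuple when the bolt is behind
        if len(self.buffer) >= self.capacity:
            self.throttled += 1
            return False
        self.buffer.append(tup)
        return True

    def poll(self):
        # bolt side: drain one tuple when ready
        return self.buffer.popleft() if self.buffer else None

chan = BackpressureChannel(capacity=2)
accepted = [chan.offer(i) for i in range(4)]   # slow bolt: nothing drained yet
chan.poll()                                    # bolt catches up by one tuple
late = chan.offer(99)                          # spout may resume
```

Refusing tuples at the source, instead of buffering without bound or dropping data, is what makes the topology self-adjusting: throughput converges to the rate of its slowest component.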
Heron Use Cases

Real time ETL, real time BI, spam detection, real time trends, real time ML, real time ops
Heron
Sample Topologies
Heron @Twitter

Topologies range from 1 stage to 10 stages
3x reduction in cores and memory
Heron has been in production for 2 years
Heron
Performance: Settings

COMPONENTS          EXPT #1   EXPT #2   EXPT #3
Spout               25        100       200
Bolt                25        100       200
# Heron containers  25        100       200
# Storm workers     25        100       200
Heron
Performance: At Most Once

[Chart: throughput (million tuples/min) vs. spout parallelism of 25/100/200 for Heron (paper) and Heron (master); values plotted: 249, 965, 1,545, 1,920, 5,820, 10,200. Throughput improved 5-6x.]
[Chart: CPU usage (# cores used) vs. spout parallelism of 25/100/200 for the same two versions; values plotted: 32, 54, 137, 217.5, 261, 397.5. CPU usage improved 1.4-1.6x.]
Heron
Performance: CPU Usage

[Chart: million tuples/min (0-40) vs. spout parallelism of 25/100/200 for Heron (paper) and Heron (master): a 4-5x improvement.]
Heron @Twitter

> 400 real time jobs
500 billion events/day PROCESSED
25-200 ms latency
Tying It Together
Combining batch and real time
Lambda Architecture

[Diagram: new data is fed to both a batch layer and a real time layer; the client merges the two sets of results.]
Lambda Architecture - The Good

Scribe collection pipeline -> Event Bus -> Heron analytics pipeline -> Results
Lambda Architecture - The Bad

Have to fix everything (maybe twice)!
How much duct tape is required?
Have to write everything twice!
Subtle differences in semantics
What about graphs, ML, SQL, etc.?
Summingbird to the Rescue

[Diagram: a single Summingbird program compiles to both a Scalding/MapReduce job that reads HDFS and writes a batch key-value result store, and a Heron topology that reads a message broker and writes an online key-value result store; the client merges the two stores.]
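The client-side merge that Summingbird relies on is a monoid sum over the two stores: the batch store covers everything up to the last completed batch run, and the real time store covers the tail since then. For counts, the merge is plain addition, as this sketch shows (the store layout here is hypothetical):

```python
from collections import Counter

def merged_view(batch_store, realtime_store):
    """Combine batch and real time partial aggregates.
    Counts form a monoid, so the merge is plain addition."""
    return Counter(batch_store) + Counter(realtime_store)

batch_store = {"#heron": 1000, "#storm": 400}   # aggregated up to the last batch run
realtime_store = {"#heron": 25, "#kafka": 3}    # events seen since that run
view = merged_view(batch_store, realtime_store)
```

Restricting aggregations to monoids is the key design choice: it is what lets one Summingbird program compile to both Scalding and Heron and still produce a single consistent answer at read time.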
Curious to Learn More?

Twitter Heron: Stream Processing at Scale
Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel*, Karthik Ramasamy, Siddarth Taneja
@sanjeevrk, @challenger_nik, @Louis_Fumaosong, @vikkyrk, @cckellogg, @saileshmittal, @pateljm, @karthikz, @staneja
Twitter, Inc., *University of Wisconsin – Madison
ABSTRACT
Storm has long served as the main platform for real-time analytics
at Twitter. However, as the scale of data being processed in real-
time at Twitter has increased, along with an increase in the
diversity and the number of use cases, many limitations of Storm
have become apparent. We need a system that scales better, has
better debug-ability, has better performance, and is easier to
manage – all while working in a shared cluster infrastructure. We
considered various alternatives to meet these needs, and in the end
concluded that we needed to build a new real-time stream data
processing system. This paper presents the design and
implementation of this new system, called Heron. Heron is now
system process, which makes debugging very challenging. Thus, we
needed a cleaner mapping from the logical units of computation to
each physical process. The importance of such clean mapping for
debug-ability is really crucial when responding to pager alerts for a
failing topology, especially if it is a topology that is critical to the
underlying business model.
In addition, Storm needs dedicated cluster resources, which requires
special hardware allocation to run Storm topologies. This approach
leads to inefficiencies in using precious cluster resources, and also
limits the ability to scale on demand. We needed the ability to work
in a more flexible way with popular cluster scheduling software that
Storm @Twitter
Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel*, Sanjeev Kulkarni,
Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, Dmitriy Ryaboy
@ankitoshniwal, @staneja, @amits, @karthikz, @pateljm, @sanjeevrk,
@jason_j, @krishnagade, @Louis_Fumaosong, @jakedonham, @challenger_nik, @saileshmittal, @squarecog
Twitter, Inc., *University of Wisconsin – Madison
Interested in Heron?

HERON IS OPEN SOURCED. CONTRIBUTIONS ARE WELCOME!
https://github.com/twitter/heron
http://heronstreaming.io
FOLLOW US @HERONSTREAMING
Interested in Distributed Log?

DISTRIBUTED LOG IS OPEN SOURCED. CONTRIBUTIONS ARE WELCOME!
https://github.com/twitter/distributedlog
http://distributedlog.io
FOLLOW US @DISTRIBUTEDLOG
Any Questions?

Get in Touch: @karthikz
THANKS FOR ATTENDING!