Lessons Learnt from Migrating to a Stateful Streaming Framework
Wei-Che (Tony) Wei
Software Engineer, Appier
Outline
• Use cases at Appier
• Moving from micro-batch to true streaming
• Tips for overcoming obstacles during the migration
• Challenges of designing in a streaming way
© Appier Inc. All rights reserved.
About Appier
Appier is a technology company which aims to provide artificial intelligence platforms to help enterprises solve their most challenging business problems.
http://www.appier.com/en/index.html
Use cases
● Retargeting
● Fraud detection
● Real-time recommendation
● ...
Outline
• Use cases at Appier
• Moving from micro-batch to true streaming
- Disadvantages of micro-batch
- New requirements
- Our solution using Flink
• Tips for overcoming obstacles during the migration
• Challenges of designing in a streaming way
Dynamic Rule Service: Previous design
Disadvantages of micro-batch
• High latency
• Struggles with back pressure
• State is hard to maintain
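To make the latency point concrete, here is a minimal sketch of how long an event waits before a micro-batch engine even starts processing it. It assumes fixed batch intervals aligned at time 0 and ignores processing time; the function name and the numbers are illustrative only and are not taken from the talk.

```python
def micro_batch_wait(arrival_time, batch_interval):
    """Seconds an event waits until the batch it belongs to is closed.

    Assumes batches cover [k * interval, (k + 1) * interval) and that a
    batch is only processed once its interval has ended.
    """
    batch_end = (arrival_time // batch_interval + 1) * batch_interval
    return batch_end - arrival_time


# With a 10-second batch interval, an event arriving at t=12 waits 8 seconds.
wait = micro_batch_wait(12, 10)
```

Under these assumptions the wait is bounded below only by how close the event arrives to the end of its batch, whereas a true streaming engine can begin processing each event as it arrives.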
New requirements
• “Who visited this site twice in the past week?”
• “Who viewed this product on the site but didn’t put it into the cart?”
• “Who followed a certain pattern of page visits and bought this product?”
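Rules like the first one need per-user state that survives across events. The following is a minimal sketch in plain Python of what "visited this site twice in the past week" could look like as keyed state; it is not the Flink job from the talk. The class name, the in-order-timestamp assumption, and the in-memory dictionary standing in for a state backend are all assumptions made for illustration.

```python
from collections import defaultdict, deque

WEEK_SECONDS = 7 * 24 * 60 * 60


class VisitCountRule:
    """Matches users with at least `min_visits` visits inside a sliding window."""

    def __init__(self, min_visits=2, window_seconds=WEEK_SECONDS):
        self.min_visits = min_visits
        self.window_seconds = window_seconds
        # Keyed state: user id -> timestamps of visits still inside the window.
        self.visits = defaultdict(deque)

    def on_event(self, user_id, timestamp):
        """Process one visit event; return True if the user currently matches."""
        window = self.visits[user_id]
        window.append(timestamp)
        # Evict visits that have fallen out of the window.
        # Assumes timestamps arrive in order for each user.
        while window and window[0] <= timestamp - self.window_seconds:
            window.popleft()
        return len(window) >= self.min_visits


rule = VisitCountRule()
DAY = 24 * 60 * 60
first = rule.on_event("u1", 0)          # one visit so far
second = rule.on_event("u1", 3 * DAY)   # two visits within a week
later = rule.on_event("u1", 11 * DAY)   # earlier visits have expired
```

The point of the sketch is that the state (the per-user timestamp queue) has to be kept, expired, and recovered after failures, which is what a stateful streaming framework manages for you.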
Dynamic Rule Service with Apache Flink
Detail of JobGraph (Apache Flink 1.4.0 release)
Improvements
• Supports more of the requirements for stateful rules
• Flexible architecture
• Performance efficiency
• Cost efficiency
Tip 1: Documentation and the mailing list are your best friends
Case Study:
taskmanager.exit-on-fatal-akka-error: true
https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/config.html#jobmanager--taskmanager
Tip 2: Monitoring and alerting are important
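As an illustration of this tip, the sketch below checks a snapshot of job metrics against alert thresholds. The slides do not say which metrics Appier monitored, so the metric names, the threshold values, and the `evaluate_alerts` helper are all hypothetical.

```python
def evaluate_alerts(metrics, thresholds):
    """Return sorted alert messages for metrics above their thresholds."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # A missing metric is suspicious too: the job may not be reporting.
            alerts.append(f"{name}: missing")
        elif value > limit:
            alerts.append(f"{name}: {value} > {limit}")
    return sorted(alerts)


# Hypothetical thresholds and a hypothetical metrics snapshot.
thresholds = {
    "checkpoint_duration_ms": 60_000,
    "consumer_lag_records": 100_000,
    "restart_count": 0,
}
snapshot = {"checkpoint_duration_ms": 95_000, "consumer_lag_records": 1_200}
alerts = evaluate_alerts(snapshot, thresholds)
```

Treating a missing metric as an alert is a design choice in this sketch: a silent job is often a sign of trouble rather than health.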
Tip 3: Be familiar with your environment and your job
Case Study: Checkpoints got stuck with the RocksDB state backend and the S3 filesystem
Case Study: Verifying the bottleneck of our streaming job
⨯ Network performance issue
✔ Memory-bound job hit the resource limit
Challenges of designing in a streaming way
• Replaying expired data or rebuilding state is complex
• How to expose user-defined metrics well
- Current matched count for each rule
• Queryable state is too expensive to use
• End-to-end verification is a tough job
- How to verify that the replayed data is ready
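To show what a "current matched count for each rule" metric involves, here is a minimal single-process sketch. It is not how the talk's job exposes metrics; the `MatchedCountGauge` class and its methods are hypothetical names. In a parallel streaming job each operator instance would hold only its own share of users, so the per-instance values would still need to be summed elsewhere, which is part of what makes this hard to expose well.

```python
from collections import defaultdict


class MatchedCountGauge:
    """Tracks how many distinct users currently match each rule."""

    def __init__(self):
        # rule id -> set of user ids that currently match the rule
        self.matched = defaultdict(set)

    def update(self, rule_id, user_id, is_match):
        """Record the latest match result for one user and one rule."""
        users = self.matched[rule_id]
        if is_match:
            users.add(user_id)
        else:
            users.discard(user_id)

    def snapshot(self):
        """Return the current matched count per rule."""
        return {rule_id: len(users) for rule_id, users in self.matched.items()}


gauge = MatchedCountGauge()
gauge.update("visited-twice", "u1", True)
gauge.update("visited-twice", "u2", True)
gauge.update("visited-twice", "u1", False)  # u1 no longer matches
counts = gauge.snapshot()
```

Keeping a set of user ids per rule is the simplest correct approach but costs memory; a real job might instead adjust a counter only when a user's match status changes.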
TL;DR
• Documentation and information from the mailing list help a lot.
• Monitoring and alerting let you respond to problems and diagnose them quickly.
• Stateful streaming is environment-sensitive. Be careful with it.
• The community is always behind you.
Thank you

Flink Forward Berlin 2018: Wei-Che (Tony) Wei - "Lessons learned from Migrating to a Stateful Streaming Framework"