A Gentle Start with Apache Flink
Liangjun Jiang
10/21/2019
Me
• Staff Software Engineer
• Has given two talks before (“Alibaba Technology” and “A Template for Enterprise App with Azure”)
• Has worked across different technology stacks (frontend, backend, and some data infrastructure and engineering)
• Contributes to the UI automation framework macaca, a popular and feature-rich cross-platform automation tool
• Has two kids
• Assistant softball coach at the Oak Hills Youth Association
• Volunteers with Technology Education and Literacy in Schools (TEALS)
Outline
• Should we care about stream processing?
• Stream processing technologies and challenges
• Flink features and why Flink
• Demo 1 – Flink stream word count
• Demo 2 – Flink SQL and Table API
• Demo 3 – Flink in real-time recommendation
• Online Resources
• Questions
Stream processing – should you care?
• Why is traffic info in Google Maps getting more accurate?
• Credit card charge fraud detection
• Tracking users' shopping activity on e-commerce sites
• Real-time data warehouse
• Server monitoring
• Sensor (IoT) events
• …
Should We Care?
• Uber, Lyft, Google Maps (and Apple Maps)
• Moving from an offline data warehouse to an online (real-time) one, to save money and give decision makers more up-to-date information
• Real-time recommendations for online shopping
https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink
Stream processing technologies and challenges
• Apache Storm
• Spark Streaming
• Apache Flink
• Apache Beam
The Challenges
• You are on the subway and the train enters a tunnel with no cellular reception, so events from your phone arrive late and out of order
• Your stream processing job (Flink, Spark, etc.) fails, a server crashes, or the network becomes unreachable
• Your sink (Hive, Cassandra, MySQL, etc.) could fail
• Achieving end-to-end exactly-once processing
• Will the source resend the same message? If so, writes must be idempotent
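The last two challenges are linked. If a source may resend a message after a failure, the sink stays correct by applying each message only once. Below is a minimal Python sketch of that idea. It assumes every message carries a unique id; the `msg_id` field is hypothetical and is not specified on the slide. This is plain Python, not a Flink API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    msg_id: str   # hypothetical unique id assigned by the producer
    amount: int


class IdempotentSink:
    """Toy sink that ignores messages it has already applied.

    A real system would keep `seen` in durable storage (or upsert by msg_id);
    an in-memory set is used here for illustration only.
    """

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.total = 0

    def write(self, msg: Message) -> bool:
        if msg.msg_id in self.seen:
            return False          # duplicate delivery: skip it
        self.seen.add(msg.msg_id)
        self.total += msg.amount
        return True


if __name__ == "__main__":
    sink = IdempotentSink()
    # The source resends "m2" after a (simulated) reconnect.
    for m in [Message("m1", 10), Message("m2", 5), Message("m2", 5)]:
        sink.write(m)
    print(sink.total)  # 15, not 20
```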
Flink
How Flink implements end-to-end exactly-once
• Checkpoint and Savepoint
• Two-phase commit (leader election, pre-commit)
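A toy Python sink can illustrate how two-phase commit ties into checkpoints. The sketch makes three simplifying assumptions: there is a single sink instance, "transactions" are in-memory lists, and checkpoints are identified by integer ids. The method names loosely echo Flink's `snapshotState` and `notifyCheckpointComplete` hooks, but this is not Flink's `TwoPhaseCommitSinkFunction`.

```python
class TwoPhaseCommitSink:
    """Toy sink whose writes become visible only after a checkpoint completes."""

    def __init__(self) -> None:
        self.committed: list[str] = []           # visible to downstream readers
        self.current: list[str] = []             # open transaction
        self.pending: dict[int, list[str]] = {}  # pre-committed, keyed by checkpoint id

    def invoke(self, record: str) -> None:
        # Normal processing: write into the open transaction only.
        self.current.append(record)

    def snapshot_state(self, checkpoint_id: int) -> None:
        # Phase 1 (pre-commit): the checkpoint barrier arrived, so seal the
        # open transaction and start a new one.
        self.pending[checkpoint_id] = self.current
        self.current = []

    def notify_checkpoint_complete(self, checkpoint_id: int) -> None:
        # Phase 2 (commit): the whole checkpoint succeeded, so publish every
        # transaction sealed at or before it.
        for cid in sorted(c for c in self.pending if c <= checkpoint_id):
            self.committed.extend(self.pending.pop(cid))

    def recover(self, last_completed: int) -> None:
        # After a failure: commit what the last completed checkpoint covers
        # (its notification may have been lost) and abort everything else;
        # the source then replays records from that checkpoint.
        self.notify_checkpoint_complete(last_completed)
        self.pending.clear()
        self.current = []


if __name__ == "__main__":
    sink = TwoPhaseCommitSink()
    sink.invoke("a")
    sink.invoke("b")
    sink.snapshot_state(1)               # barrier for checkpoint 1
    sink.invoke("c")
    print(sink.committed)                # []
    sink.notify_checkpoint_complete(1)
    print(sink.committed)                # ['a', 'b']
    sink.recover(last_completed=1)       # crash before checkpoint 2: "c" is dropped, to be replayed
    print(sink.committed)                # ['a', 'b']
```

The design point is that nothing reaches `committed` between checkpoints, so a replay after a crash cannot produce duplicates. The cost is output latency of roughly one checkpoint interval.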
Flink Features and Performance
• Time: event time, processing time, and ingestion time
• Windows: tumbling, sliding, and session windows
• State (queryable state, broadcast state) and the State Processor API
• Table API and SQL
• Batch (DataSet API)
• Complex Event Processing (CEP)
• Operators (stream joins, splitting, side outputs, async I/O)
• Deployment and Monitoring
• Kerberos security model
• …
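The three window types can be made concrete by showing how an event timestamp maps to windows. This minimal Python sketch assumes three things: integer event-time timestamps, a zero window offset, and session windows that merge when the next event falls within `gap` of the previous one. The function names are illustrative only; they are not Flink's `WindowAssigner` API.

```python
def tumbling_window(ts: int, size: int) -> tuple[int, int]:
    """Return the [start, end) tumbling window containing timestamp ts."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts: int, size: int, slide: int) -> list[tuple[int, int]]:
    """Return every [start, end) sliding window that contains ts."""
    windows = []
    start = ts - ts % slide            # latest window start <= ts
    while start > ts - size:           # this window still covers ts
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


def session_windows(timestamps: list[int], gap: int) -> list[tuple[int, int]]:
    """Group timestamps into sessions separated by more than `gap` of inactivity."""
    sessions: list[tuple[int, int]] = []
    for ts in sorted(timestamps):
        if sessions and ts <= sessions[-1][1]:
            # Within `gap` of the previous event: extend the current session.
            sessions[-1] = (sessions[-1][0], ts + gap)
        else:
            sessions.append((ts, ts + gap))
    return sessions


if __name__ == "__main__":
    print(tumbling_window(7, 5))                         # (5, 10)
    print(sliding_windows(7, 10, 5))                     # [(0, 10), (5, 15)]
    print(session_windows([1, 2, 10, 11, 30], gap=5))    # [(1, 7), (10, 16), (30, 35)]
```

An event lands in exactly one tumbling window, in `size / slide` sliding windows when the size is a multiple of the slide, and in a session whose length depends on the data.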
Performance
• Throughput
• Latency
https://www.ververica.com/blog/extending-the-yahoo-streaming-benchmark
Demo 1: A Basic Flink Streaming Application: Word Count
• Code
• Build & run with socket input
• Submit the Flink job on the web UI: http://localhost:8081
• Web UI features
Image source: https://flink.apache.org/usecases.html
> Stream processing with flink
> Stream
> processing
> Stream processing
> flink
> (stream, 3)
> (processing, 3)
> (flink, 2)
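The counting logic behind the sample output can be sketched in plain Python. This is not the demo's Flink code. It assumes words are lower-cased and split on non-word characters, which is consistent with the output above. It also counts over all input at once, whereas a Flink streaming job updates per-key counts (and per-window counts, if a window is used) as lines arrive.

```python
import re
from collections import Counter


def word_count(lines: list[str]) -> Counter:
    """Lower-case each line, split on non-word characters, and count the words."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(w for w in re.split(r"\W+", line.lower()) if w)
    return counts


if __name__ == "__main__":
    typed = ["Stream processing with flink", "Stream", "processing",
             "Stream processing", "flink"]
    for word, n in word_count(typed).most_common():
        print(f"({word}, {n})")
    # (stream, 3)
    # (processing, 3)
    # (flink, 2)
    # (with, 1)
```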
Use cases:
1. A user puts an item into the shopping cart, then checks out
2. A user adds an item to the cart but does not check out
3. A temperature sensor measures temperature variation
etc.
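Use case 2 is a pattern-detection problem: an "add to cart" event that is not followed by a checkout within some time limit. Flink's CEP library targets this kind of pattern. The sketch below is only a plain-Python illustration of the logic, not the CEP API. The event names (`add_to_cart`, `checkout`) and the timeout value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    user: str
    action: str   # hypothetical values: "add_to_cart" or "checkout"
    ts: int       # event time, in seconds


def abandoned_carts(events: list[Event], timeout: int, now: int) -> list[str]:
    """Users whose cart has been open (added, not checked out) for >= timeout seconds at `now`."""
    open_since: dict[str, int] = {}       # user -> time of the first unmatched add_to_cart
    for e in sorted(events, key=lambda e: e.ts):
        if e.action == "add_to_cart":
            open_since.setdefault(e.user, e.ts)
        elif e.action == "checkout":
            open_since.pop(e.user, None)  # pattern completed: cart not abandoned
    return sorted(u for u, t in open_since.items() if now - t >= timeout)


if __name__ == "__main__":
    evs = [Event("alice", "add_to_cart", 0), Event("alice", "checkout", 60),
           Event("bob", "add_to_cart", 10), Event("carol", "add_to_cart", 500)]
    print(abandoned_carts(evs, timeout=300, now=600))  # ['bob']
```

Carol is not reported because her cart has been open for only 100 seconds. A streaming version would instead register a timer per open cart and fire when it expires.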
Demo 2: NYC Taxi
Demo:
• SQL CLI client
• SQL queries on streams
• Window aggregates and non-window aggregates
• Source is Kafka
• Sinks are Kafka and Elasticsearch
Stack:
1. Flink
2. Elasticsearch
3. Kafka
4. ZooKeeper
> docker-compose up -d
> docker-compose exec sql-client ./sql-client.sh
Flink SQL> show tables;
Flink SQL> describe Rides;
Flink SQL> select * from Rides where isInNYC(lon, lat); -- filter and show NYC rides
Web UI: http://localhost:8081
SELECT psgCnt, COUNT(*) AS cnt
FROM Rides
WHERE isInNYC(lon, lat)
GROUP BY psgCnt; -- count rides with 1 passenger, 2 passengers, etc.
http://wuchong.me/blog/2019/08/20/flink-sql-training/#more
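Unlike the windowed queries, the non-window `GROUP BY psgCnt` query never finishes. Each new ride changes a count that was already emitted, so the result is a continuously updating table. The minimal Python sketch below illustrates this with a retract-style changelog: retract the old row, then add the new one. This is similar in spirit to Flink's retract streams, but the function name and tuple format are hypothetical.

```python
from collections import defaultdict


def continuous_group_count(psg_counts: list[int]) -> list[tuple[str, int, int]]:
    """Changelog for an unbounded SELECT psgCnt, COUNT(*) ... GROUP BY psgCnt.

    "+" adds a result row; "-" retracts a row that was emitted earlier.
    """
    counts: dict[int, int] = defaultdict(int)
    changelog: list[tuple[str, int, int]] = []
    for key in psg_counts:                  # one passenger count per arriving ride
        if counts[key] > 0:
            changelog.append(("-", key, counts[key]))  # retract the stale row
        counts[key] += 1
        changelog.append(("+", key, counts[key]))      # emit the updated row
    return changelog


if __name__ == "__main__":
    for change in continuous_group_count([1, 2, 1]):
        print(change)
    # ('+', 1, 1)
    # ('+', 2, 1)
    # ('-', 1, 1)
    # ('+', 1, 2)
```

This is why such a query can be written to an updatable sink, but not to an append-only one.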
Demo 2: NYC Taxi – Cont’d
Number of taxis departing from each area in each 5-minute window (areas with at least 5 departures)
SELECT
  toAreaId(lon, lat) AS area,
  TUMBLE_END(rideTime, INTERVAL '5' MINUTE) AS window_end,
  COUNT(*) AS cnt
FROM Rides
WHERE isInNYC(lon, lat) AND isStart
GROUP BY
  toAreaId(lon, lat),
  TUMBLE(rideTime, INTERVAL '5' MINUTE)
HAVING COUNT(*) >= 5; -- the source replays data sped up, so results appear roughly every 30 seconds
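In batch terms, this query groups ride starts by area and by tumbling window, then filters with HAVING. The minimal Python analogue below makes two assumptions. Ride times are integer seconds, and the `isInNYC` / `toAreaId` user-defined functions have already been applied; their implementations are not on the slide. Unlike Flink, the sketch sees all rows at once instead of emitting each window when the watermark passes its end.

```python
from collections import Counter


def departures_per_area(
    rides: list[tuple[int, int, bool]],
    window_secs: int = 300,
    min_count: int = 5,
) -> dict[tuple[int, int], int]:
    """Count ride starts per (area, window_end) and keep busy windows only.

    rides: (area_id, ride_time_secs, is_start) tuples, assumed to be already
    filtered to NYC and mapped to area ids.
    """
    counts: Counter = Counter()
    for area, ride_time, is_start in rides:
        if not is_start:                              # WHERE ... AND isStart
            continue
        start = ride_time - ride_time % window_secs   # tumbling window start
        window_end = start + window_secs              # like TUMBLE_END (exclusive end)
        counts[(area, window_end)] += 1               # GROUP BY area, window
    # HAVING COUNT(*) >= min_count
    return {key: n for key, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    rides = [(7, t, True) for t in range(0, 300, 50)]  # 6 departures from area 7 in [0, 300)
    rides += [(7, 310, True), (8, 20, True), (7, 30, False)]
    print(departures_per_area(rides))  # {(7, 300): 6}
```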
INSERT INTO Sink_TenMinPsgCnts
SELECT
  TUMBLE_START(rideTime, INTERVAL '10' MINUTE) AS cntStart,
  TUMBLE_END(rideTime, INTERVAL '10' MINUTE) AS cntEnd,
  CAST(SUM(psgCnt) AS BIGINT) AS cnt
FROM Rides
GROUP BY TUMBLE(rideTime, INTERVAL '10' MINUTE);
Every 10 minutes, count how many passengers are riding and publish the result to Kafka (not demoed)
Insert query results into Elasticsearch
http://localhost:9200/area-cnts/_stats
Demo 3: A Real-Time, User-Based Collaborative Filtering Recommendation System
Flink tasks:
• Logs: process raw log data to compute statistics
• Context: compute users' behavior and the timestamps of certain actions
• User and product profiling: user preferences, product characteristics, and user-product relationships
• Collaborative filtering-based recommendation
• Top list: the most clicked, favorited, or viewed products
https://github.com/CheckChe0803/flink-recommandSystem-demo
Stack:
1. HBase
2. Kafka + ZooKeeper
3. MySQL
4. IntelliJ to run the Flink tasks
5. Spring Boot to run the web UI
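The top-list task comes down to counting interactions per product over a window and keeping the N largest. Below is a minimal Python sketch for one window's worth of events. It assumes a hypothetical `(product_id, action)` event format and gives clicks, favorites, and views equal weight. The linked repository may weigh or window them differently.

```python
from collections import Counter


def top_products(events: list[tuple[str, str]], n: int = 3) -> list[tuple[str, int]]:
    """Return the n products with the most actions in one window of events.

    events: hypothetical (product_id, action) pairs; "click", "favorite",
    and "view" each count once, and other actions are ignored.
    """
    counts = Counter(product for product, action in events
                     if action in {"click", "favorite", "view"})
    return counts.most_common(n)


if __name__ == "__main__":
    evs = [("p1", "click"), ("p2", "view"), ("p1", "favorite"), ("p3", "click"),
           ("p2", "click"), ("p1", "view"), ("p4", "buy")]
    print(top_products(evs, n=2))  # [('p1', 3), ('p2', 2)]
```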
Demo 3: A Real-Time, User-Based Collaborative Filtering Recommendation System – Cont’d
• Top products
• Product profiling and recommendation (profiling the top products by user age, color preference, product origin, and style)
• Collaborative filtering-based recommendation: recommends similar products to the user based on product (item) similarity
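The item-similarity idea in the last bullet can be sketched as classic item-based collaborative filtering with cosine similarity. This is a minimal Python illustration, not the code from the linked repository, whose similarity measure is not given on the slide. It assumes implicit feedback, where any interaction counts as 1, and uses made-up product names.

```python
import math
from collections import defaultdict


def item_similarity(user_items: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Cosine similarity between items, from which users interacted with them."""
    item_users: dict[str, set[str]] = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)
    sims: dict[tuple[str, str], float] = {}
    for a in item_users:
        for b in item_users:
            if a != b:
                overlap = len(item_users[a] & item_users[b])
                if overlap:
                    sims[(a, b)] = overlap / math.sqrt(len(item_users[a]) * len(item_users[b]))
    return sims


def recommend(user: str, user_items: dict[str, set[str]], k: int = 2) -> list[str]:
    """Score unseen items by their summed similarity to items the user already has."""
    sims = item_similarity(user_items)
    seen = user_items[user]
    scores: dict[str, float] = defaultdict(float)
    for (a, b), s in sims.items():
        if a in seen and b not in seen:
            scores[b] += s
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


if __name__ == "__main__":
    data = {"u1": {"shoes", "socks"}, "u2": {"shoes", "socks", "hat"},
            "u3": {"shoes", "hat"}, "u4": {"scarf"}}
    print(recommend("u1", data))  # ['hat']
```

A streaming job would update these co-occurrence counts incrementally as events arrive, instead of recomputing all pairs on every call.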
Online Resources
• https://flink.apache.org/
• https://www.ververica.com
• http://streamingsystems.net/
• My book, Flink In Practice, which focuses on examples, is in progress
Questions
