© 2019, Amazon Web Services, Inc. or its Affiliates.
Dr. Steffen Hausmann (@sthmmm)
Specialist Solutions Architect Analytics, EMEA
Amazon Web Services
Build and run streaming applications with Apache
Flink and Amazon Kinesis Data Analytics for Java
Applications
Architecture for Streaming Analytics with Apache Flink
• Ingestion layer: Amazon Kinesis Data Streams
• Processing layer: Amazon Kinesis Data Analytics for Java Applications
• Presentation layer: Amazon Elasticsearch Service
Let’s go build!
Kinesis Data Analytics under the
hood and best practices
Amazon Kinesis Data Analytics under the hood
Kinesis Data Analytics manages the underlying infrastructure
• Operates ZooKeeper for high availability
• Configures checkpoints and savepoints
• Fails over automatically to healthy nodes
Kinesis Processing Units (or KPUs)
• Basic scaling unit
• Determines parallelism of the application
• Can be further adapted for I/O-bound applications with ParallelismPerKPU
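The relationship between KPUs and parallelism can be sketched in a few lines of Java. This is a minimal sketch, assuming the number of KPUs hosting the application equals the application parallelism divided by ParallelismPerKPU, rounded up. The class and method names are hypothetical, and any extra capacity the service provisions for orchestration is ignored.

```java
/** Hypothetical helper, not part of any AWS or Flink API. */
final class KpuEstimate {
    private KpuEstimate() {}

    /** KPUs needed to host `parallelism` parallel subtasks, rounding up. */
    static int requiredKpus(int parallelism, int parallelismPerKpu) {
        if (parallelism < 1 || parallelismPerKpu < 1) {
            throw new IllegalArgumentException("both values must be >= 1");
        }
        // Integer ceiling division: ceil(parallelism / parallelismPerKpu)
        return (parallelism + parallelismPerKpu - 1) / parallelismPerKpu;
    }

    public static void main(String[] args) {
        // CPU-bound job: one subtask per KPU
        System.out.println(requiredKpus(8, 1));
        // I/O-bound job: four subtasks share each KPU
        System.out.println(requiredKpus(8, 4));
    }
}
```

Raising ParallelismPerKPU packs more subtasks onto each KPU, which tends to suit applications that spend most of their time waiting on external systems rather than using CPU.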
Scaling and Autoscaling
A scaling operation causes downtime of the application
• Scaling triggers a savepoint
• A new application is then started with the adapted parallelism
Autoscaling
• Scale-up is triggered within minutes of a sustained spike in CPU usage
• Scale-down is triggered a few hours after a drop in CPU usage
• Disable autoscaling if your application is not CPU bound
Monitoring and Metrics
Application metrics are exposed through Amazon CloudWatch
• IncomingRecords / IncomingBytes
• WriteProvisionedThroughputExceeded / ReadProvisionedThroughputExceeded
• millisBehindLatest
Logging
Application logs are exposed through CloudWatch Logs
• Send custom messages with Log4J & SLF4J
• Avoid extensive logging on the data plane path
• Search and analyze logs with CloudWatch Logs Insights
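One way to keep logging off the per-record hot path is to sample it. The sketch below is a minimal illustration in plain Java, not something taken from the talk: `SampledLogGuard` is a hypothetical helper that lets one call in every N through, and the guarded statement stands in for an SLF4J call such as `LOG.info(...)` inside an operator.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: lets one call in every `interval` calls through. */
final class SampledLogGuard {
    private final long interval;
    private final AtomicLong seen = new AtomicLong();

    SampledLogGuard(long interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        this.interval = interval;
    }

    /** Returns true on the first call and on every `interval`-th call after it. */
    boolean shouldLog() {
        return seen.getAndIncrement() % interval == 0;
    }

    public static void main(String[] args) {
        SampledLogGuard guard = new SampledLogGuard(1000);
        int logged = 0;
        for (int record = 0; record < 5000; record++) {
            if (guard.shouldLog()) {
                // In an operator this would be the actual logger call
                logged++;
            }
        }
        System.out.println("records logged: " + logged);
    }
}
```

Each parallel subtask would hold its own guard, so the effective log volume still grows with parallelism; the interval is a tuning knob rather than a hard limit.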
Data protection and security
Data Protection
• You can encrypt data on the incoming Kinesis data stream
• All data stored in running application storage is encrypted at rest
• You can encrypt data in transit and at rest in Amazon Elasticsearch Service
Avoid baking credentials into your code
• Use temporary credentials for integrating with AWS services
• Use AWS Secrets Manager to retrieve and rotate passwords
// "AUTO" resolves credentials through the default provider chain instead of hard-coded keys
kinesisConsumerConfig.setProperty(
    AWSConfigConstants.AWS_CREDENTIALS_PROVIDER, "AUTO"
);
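For context, the properties object used in that call can be built as follows. This is a minimal, dependency-free sketch: the class name is hypothetical, and the two string keys are assumed to match the values of `AWSConfigConstants.AWS_REGION` and `AWSConfigConstants.AWS_CREDENTIALS_PROVIDER` in the Flink Kinesis connector. In a real application, prefer the constants themselves.

```java
import java.util.Properties;

/** Hypothetical helper that builds Kinesis consumer settings without static keys. */
final class ConsumerConfigSketch {
    // Assumed to equal AWSConfigConstants.AWS_REGION and
    // AWSConfigConstants.AWS_CREDENTIALS_PROVIDER in the Flink Kinesis connector.
    static final String AWS_REGION = "aws.region";
    static final String AWS_CREDENTIALS_PROVIDER = "aws.credentials.provider";

    private ConsumerConfigSketch() {}

    static Properties kinesisConsumerConfig(String region) {
        Properties config = new Properties();
        config.setProperty(AWS_REGION, region);
        // AUTO defers to the default credentials provider chain, so no access
        // key or secret key appears in the code or in the configuration.
        config.setProperty(AWS_CREDENTIALS_PROVIDER, "AUTO");
        return config;
    }

    public static void main(String[] args) {
        // "eu-west-1" is only an example region.
        Properties config = kinesisConsumerConfig("eu-west-1");
        config.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The resulting object would typically be passed to the Kinesis consumer source when it is constructed, together with the stream name and a deserialization schema.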
Further Reading
The AWS Big Data blog is a great resource to learn more about Apache Flink and
stream processing on AWS in general.
• https://aws.amazon.com/blogs/big-data/build-and-run-streaming-applications-with-apache-flink-and-amazon-kinesis-data-analytics-for-java-applications/
• https://aws.amazon.com/blogs/big-data/build-a-real-time-stream-processing-pipeline-with-apache-flink-on-aws/
• https://github.com/aws-samples/amazon-kinesis-analytics-taxi-consumer
Thank you!
