SERHAT CAN • @SRHTCN
AWS Kinesis
AWS Kinesis - Streams, Firehose, Analytics
Table of Contents
Streaming data?
Big Data Processing Approaches
AWS Kinesis Family
Amazon Kinesis Streams in detail
Amazon Kinesis Firehose
Amazon Kinesis Analytics
Streaming Data: Life As It Happens
After the event occurs -> at rest (batch)
As the event occurs -> in motion (streaming)
Big Data Processing Approaches
• Common Big Data Processing Approaches
• Query Engine Approach (Data Warehouse, SQL, NoSQL Databases)
• Repeated queries over the same well-structured data
• Pre-computations like indices and dimensional views improve query performance
• Batch Engines (Map-Reduce)
• The “query” is run on the data. There are no pre-computations
• Streaming Big Data Processing Approach
• Real-time response to content in semi-structured data streams
• Relatively simple computations on data (aggregates, filters, sliding window, etc.)
• Enables data lifecycle by moving data to different stores / open source systems
Kinesis Family
Amazon Kinesis Streams
• A fully managed service for real-time processing of
high-volume streaming data.
• Kinesis can store and process terabytes of data an
hour from hundreds of thousands of sources.
• Data is replicated across multiple Availability Zones
to ensure high durability and availability.
Amazon Kinesis Streams Concepts
Shard
• Streams are made of Shards. A shard is the base
throughput unit of an Amazon Kinesis stream.
• One shard provides a capacity of 1MB/sec data input
and 2MB/sec data output.
• One shard can support up to 1000 PUT records per
second.
• You can monitor shard-level metrics in Amazon Kinesis
Streams
• Add or remove shards from your stream dynamically
as your data throughput changes by resharding the
stream.
Data Record
• A record is the unit of data stored in an Amazon Kinesis stream.
• A record is composed of a:
• partition key
• sequence number
• data blob (the data you want to send)
• The maximum size of a data blob (the data payload after
Base64-decoding) is 1 megabyte (MB).
Partition Key
• Partition key is used to segregate and route data records to different
shards of a stream.
• A partition key is specified by your data producer while putting data
into an Amazon Kinesis stream.
• For example, assume you have an Amazon Kinesis stream with two
shards (Shard 1 and Shard 2). You can configure your data producer
to use two partition keys (Key A and Key B) so that all data records
with Key A are added to Shard 1 and all data records with Key B are
added to Shard 2.
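The routing rule behind this example can be sketched in a few lines. Kinesis maps a partition key to a shard by taking the MD5 hash of the key as a 128-bit integer and finding the shard whose hash key range contains it. The shard labels, the half-and-half ranges, and the big-endian reading of the digest are illustrative assumptions here; in practice, which shard a given key lands in depends entirely on its MD5 hash.

```python
import hashlib

def shard_for_key(partition_key, shard_ranges):
    """Pick the shard whose inclusive hash key range contains
    MD5(partition_key), read as a 128-bit integer (a sketch of the
    Kinesis routing rule, not the real API)."""
    h = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")
    for shard_id, (start, end) in shard_ranges.items():
        if start <= h <= end:
            return shard_id
    raise ValueError("no shard covers hash key %d" % h)

# Two shards splitting the 128-bit hash key space in half,
# as in the Key A / Key B example above.
MAX_HASH = 2**128 - 1
ranges = {
    "Shard 1": (0, MAX_HASH // 2),
    "Shard 2": (MAX_HASH // 2 + 1, MAX_HASH),
}

print(shard_for_key("Key A", ranges))
print(shard_for_key("Key B", ranges))
```

The routing is deterministic: the same partition key always maps to the same shard while the shard layout is unchanged, which is what makes per-key ordering possible.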
Sequence Number
• Each data record has a sequence number that is unique within its
shard.
• The sequence number is assigned by Streams after you write to the
stream with client.putRecords or client.putRecord.
• Sequence numbers for the same partition key generally increase over
time; the longer the time period between write requests, the larger the
sequence numbers become.
Resharding the Stream
• Streams supports resharding, which enables you to adjust the number of
shards in your stream in order to adapt to changes in the rate of data flow
through the stream.
• There are two types of resharding operations: shard split and shard
merge.
• Shard split: divide a single shard into two shards.
• Shard merge: combine two shards into a single shard.
Resharding the Stream
• Resharding is always “pairwise”: splitting into or merging more than two shards
in a single operation is NOT allowed
• Resharding is typically performed by an administrative application that
is distinct from the producer (put) applications and the consumer (get)
applications
• The administrative application also needs a broader set of IAM
permissions for resharding
Splitting a Shard
• Specify how hash key values from the parent shard should be redistributed to the child shards
• The possible hash key values for a given shard constitute a set of ordered, contiguous,
non-negative integers. This range of possible hash key values is given by
shard.getHashKeyRange().getStartingHashKey();
shard.getHashKeyRange().getEndingHashKey();
• When you split the shard, you specify a value in this range.
• That hash key value and all higher hash key values are distributed to one of the child shards.
• All the lower hash key values are distributed to the other child shard.
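The split rule above can be made concrete. For an even split, the midpoint of the parent's range is the natural value to pass (the `SplitShard` API calls it `NewStartingHashKey`); the lower child takes everything below it, the upper child takes the midpoint and everything above. The function name is an illustration, not the SDK's.

```python
def even_split(start, end):
    """Given a parent shard's inclusive hash key range, pick the midpoint
    as the new starting hash key and return the two child ranges. The
    chosen value and all higher values go to one child; all lower values
    go to the other, matching the rule described above."""
    new_starting_hash_key = (start + end) // 2
    low_child = (start, new_starting_hash_key - 1)
    high_child = (new_starting_hash_key, end)
    return new_starting_hash_key, low_child, high_child
```

The two child ranges are contiguous and together cover exactly the parent range, which is what allows a later merge to reverse the split.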
Merging Two Shards
• In order to merge two shards, the shards must be adjacent.
• Two shards are considered adjacent if the union of the hash key ranges
for the two shards form a contiguous set with no gaps.
• To identify shards that are candidates for merging, you should filter out all
shards that are in a CLOSED state.
• Shards that are OPEN—that is, not CLOSED—have an ending sequence
number of null.
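The two merge preconditions above — adjacency and OPEN state — are easy to check mechanically. A minimal sketch, with an assumed input shape (`shard_id -> (hash_range, ending_sequence_number)`) rather than the real `DescribeStream` response:

```python
def are_adjacent(range_a, range_b):
    """Two shards can be merged only if the union of their inclusive
    hash key ranges is contiguous with no gap."""
    (a_start, a_end), (b_start, b_end) = sorted([range_a, range_b])
    return a_end + 1 == b_start

def merge_candidates(shards):
    """Filter out CLOSED shards (OPEN shards have an ending sequence
    number of None) and return pairs of adjacent OPEN shards."""
    open_shards = {sid: rng for sid, (rng, end_seq) in shards.items()
                   if end_seq is None}
    return [(a, b)
            for a in open_shards for b in open_shards
            if a < b and are_adjacent(open_shards[a], open_shards[b])]
```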
After Resharding
• After you call a resharding operation (either splitShard or mergeShards),
you need to wait for the stream to become active again, just as after
stream creation.
• During resharding, a parent shard transitions from an OPEN
state to a CLOSED state to an EXPIRED state.
• When the process completes, the stream returns to the ACTIVE state.
Retention Period
• Data records are accessible for a default of 24 hours from the
time they are added to a stream
• Configurable in hourly increments
• From 24 to 168 hours (1 to 7 days)
Amazon Kinesis Producer Library (KPL)
• The KPL is an easy-to-use, highly configurable library that helps you
write to an Amazon Kinesis stream.
• Writes to one or more Amazon Kinesis streams with an automatic and configurable
retry mechanism
• Collects records and uses PutRecords to write multiple records to multiple shards
per request
• Aggregates user records to increase payload size and improve throughput
• Integrates seamlessly with the Amazon Kinesis Client Library (KCL) to de-aggregate
batched records on the consumer
• Submits Amazon CloudWatch metrics on your behalf to provide visibility into
producer performance
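The aggregation bullet above is the KPL's key throughput trick: many small user records are packed into one Kinesis record so each PUT carries more payload. The real KPL wraps batches in a protobuf envelope that the KCL de-aggregates on the consumer side; this sketch shows only the greedy batching idea, with an assumed 1 MB record payload limit.

```python
def aggregate(user_records, max_bytes=1024 * 1024):
    """Greedily pack small user records (bytes) into batches that stay
    under the per-record payload limit. A sketch of the batching idea
    only; not the KPL's actual wire format."""
    batches, current, size = [], [], 0
    for record in user_records:
        if current and size + len(record) > max_bytes:
            batches.append(current)   # batch full: start a new one
            current, size = [], 0
        current.append(record)
        size += len(record)
    if current:
        batches.append(current)
    return batches
```

Fewer, larger records means fewer PUT payload units consumed per user record, which is why aggregation improves both throughput and cost.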
Amazon Kinesis Client Library (Life Saver)
• Develop a consumer application for Amazon Kinesis Streams
• The KCL acts as an intermediary between your record processing logic and
Streams.
• A KCL application instantiates a worker with configuration information, and then
uses a record processor to process the data received from an Amazon Kinesis
stream.
• You can run a KCL application on any number of instances. Multiple instances
of the same application coordinate on failures and load-balance dynamically.
• You can also have multiple KCL applications working on the same stream,
subject to throughput limits.
Amazon Kinesis Client Library
• Connects to the stream
• Enumerates the shards
• Coordinates shard associations with other workers (if any)
• Instantiates a record processor for every shard it manages
• Pulls data records from the stream
• Pushes the records to the corresponding record processor
• Checkpoints processed records
• Balances shard-worker associations when the worker instance count changes
• Balances shard-worker associations when shards are split or merged
Amazon Kinesis Client Library
• KCL uses a unique Amazon DynamoDB table to keep
track of the application's state
• KCL creates the table with a provisioned throughput of
10 reads per second and 10 writes per second
• Each row in the DynamoDB table represents a shard that
is being processed by your application. The hash key for
the table is the shard ID.
Amazon Kinesis Client Library
• In addition to the shard ID, each row also includes the following data:
• checkpoint: The most recent checkpoint sequence number for the shard. This value is unique across
all shards in the stream.
• checkpointSubSequenceNumber: When using the Kinesis Producer Library's aggregation feature,
this is an extension to checkpoint that tracks individual user records within the Amazon Kinesis record.
• leaseCounter: Used for lease versioning so that workers can detect that their lease has been taken by
another worker.
• leaseKey: A unique identifier for a lease. Each lease is particular to a shard in the stream and is held
by one worker at a time.
• leaseOwner: The worker that is holding this lease.
• ownerSwitchesSinceCheckpoint: How many times this lease has changed workers since the last
time a checkpoint was written.
• parentShardId: Used to ensure that the parent shard is fully processed before processing starts on
the child shards. This ensures that records are processed in the same order they were put into the
stream.
Using Shard Iterators
• You retrieve records from the stream on a per-shard basis, using a
shard iterator of one of the following types:
• AT_SEQUENCE_NUMBER
• AFTER_SEQUENCE_NUMBER
• AT_TIMESTAMP
• TRIM_HORIZON
• LATEST
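The iterator types above differ only in where reading begins within a shard. That can be modeled over a plain in-memory list of sequence numbers (oldest first); the real `GetShardIterator` API takes stream and shard identifiers, not a list, and `AT_TIMESTAMP` is omitted here because this toy model carries no timestamps.

```python
def iterator_start(sequence_numbers, iterator_type, seq=None):
    """Return the index where reading begins for each iterator type,
    over an in-memory list of sequence numbers (oldest first).
    An illustrative model, not the Kinesis API."""
    if iterator_type == "TRIM_HORIZON":
        return 0                           # oldest untrimmed record
    if iterator_type == "LATEST":
        return len(sequence_numbers)       # only records added from now on
    if iterator_type == "AT_SEQUENCE_NUMBER":
        return sequence_numbers.index(seq)
    if iterator_type == "AFTER_SEQUENCE_NUMBER":
        return sequence_numbers.index(seq) + 1
    raise ValueError(iterator_type)
```

The `*_SEQUENCE_NUMBER` types are what a KCL checkpoint resolves to on restart: resume at (or just after) the last checkpointed record.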
Recovering from Failures
• Record Processor Failure
• The worker invokes record processor methods using Java ExecutorService tasks.
• If a task fails, the worker retains control of the shard that the record processor was
processing.
• The worker starts a new record processor task to process that shard
• Worker or Application Failure
• If a worker — or an instance of the Amazon Kinesis Streams application — fails,
you should detect and handle the situation.
Handling Duplicate Records
(Idempotency)
• There are two primary reasons why records may be
delivered more than one time to your Amazon
Kinesis Streams application:
• producer retries
• consumer retries
• Your application must anticipate and appropriately
handle processing individual records multiple times.
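One common way to handle those duplicate deliveries is to key each record by its shard ID and sequence number and skip anything already seen. The class below is a minimal sketch; the in-memory set is an assumption for illustration, and a real application would persist this state (for example in DynamoDB) so the guard survives worker restarts.

```python
class IdempotentProcessor:
    """Process each record at most once, keyed by (shard_id,
    sequence_number). In-memory only: a sketch, not production code."""

    def __init__(self):
        self._seen = set()
        self.results = []

    def process(self, shard_id, sequence_number, data):
        key = (shard_id, sequence_number)
        if key in self._seen:
            return False  # duplicate delivery (producer or consumer retry)
        self._seen.add(key)
        self.results.append(data)
        return True
```

Note that producer retries can create genuinely distinct records with the same payload but different sequence numbers, so fully idempotent pipelines often also embed an application-level ID inside the record itself.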
Pricing
• Shard Hour (1 MB/sec ingress, 2 MB/sec egress): $0.015
• PUT Payload Units, per 1,000,000 units: $0.014
• Extended Data Retention (up to 7 days), per Shard Hour: $0.020
• DynamoDB cost if you use the KCL
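A quick worked estimate using the prices on this slide (which may be outdated): a PUT payload unit covers up to 25 KB of a record, and 720 hours approximates a month. Extended retention and the KCL's DynamoDB table are left out.

```python
SHARD_HOUR_USD = 0.015   # per shard-hour (prices from the slide)
PUT_UNITS_USD = 0.014    # per 1,000,000 PUT payload units (25 KB each)

def monthly_cost(shards, records_per_sec, record_kb, hours=720):
    """Rough monthly bill for a stream: shard-hours plus PUT payload
    units. Excludes extended retention and the KCL DynamoDB table."""
    units_per_record = -(-record_kb // 25)   # ceil(record_kb / 25)
    put_units = records_per_sec * 3600 * hours * units_per_record
    return (shards * hours * SHARD_HOUR_USD
            + put_units / 1_000_000 * PUT_UNITS_USD)

# 2 shards ingesting 400 records/sec of 5 KB each (~2 MB/sec, at capacity)
print(round(monthly_cost(2, 400, 5), 2))  # → 36.12
```

Most of the bill here comes from shard-hours at low record counts, but PUT payload units dominate as record volume grows.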
Kafka vs. Kinesis Streams
• In Kafka you can configure, for each topic, the replication factor and how many replicas
have to acknowledge a message before it is considered successful. So you can definitely
make it highly available.
• Amazon ensures that you won't lose data, but that comes with a performance cost
(messages are written to 3 different AZs synchronously).
• There are several benchmarks online comparing Kafka and Kinesis, but the result is
always the same: you'll have a hard time replicating Kafka's performance in Kinesis, at
least for a reasonable price.
• This is partly because Kafka is insanely fast, but also because Kinesis writes each
message synchronously to 3 different machines, which is quite costly in terms of
latency and throughput.
• Kafka is one of the preferred options for the Apache stream processing frameworks
• Unsurprisingly, Kinesis is really well integrated with other AWS services
DynamoDB Streams vs. Kinesis Streams
• DynamoDB Streams actions are similar to their
counterparts in Amazon Kinesis Streams, but they
are not 100% identical.
• You can write applications for Amazon Kinesis
Streams using the Amazon Kinesis Client Library
(KCL).
• You can leverage the design patterns found
within the KCL to process DynamoDB Streams
shards and stream records. To do this, you use
the DynamoDB Streams Kinesis Adapter.
SQS vs. Kinesis Streams
• Amazon Kinesis Streams enables real-time
processing of streaming big data.
• It provides ordering of records, as well as the
ability to read and/or replay records in the same
order to multiple Amazon Kinesis Applications.
• The Amazon Kinesis Client Library (KCL)
delivers all records for a given partition key to
the same record processor, making it easier to
build multiple applications reading from the same
Amazon Kinesis stream (for example, to perform
counting, aggregation, and filtering).
• Amazon Simple Queue Service (Amazon SQS)
offers a reliable, highly scalable hosted queue
for storing messages as they travel between
computers.
• Amazon SQS lets you easily move data between
distributed application components and helps
you build applications in which messages are
processed independently (with message-level
ack/fail semantics), such as automated
workflows.
Amazon Kinesis Firehose
Amazon Kinesis Firehose
• Amazon Kinesis Firehose is the easiest way to load streaming data into AWS.
• It can capture, transform, and load streaming data into Amazon Kinesis
Analytics, Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service.
• Fully managed service that automatically scales to match the throughput of
your data and requires no ongoing administration.
• It can also batch, compress, and encrypt the data before loading it,
minimizing the amount of storage used at the destination and increasing
security.
Amazon Kinesis Analytics
• Process streaming data in real time with standard SQL
• Query streaming data or build entire streaming applications using SQL, so
that you can gain actionable insights and respond to your business and
customer needs promptly.
• Scales automatically to match the volume and throughput rate of your
incoming data
• Only pay for the resources your queries consume. There is no minimum fee
or setup cost.
Amazon Kinesis Analytics
Step 1: Configure Input Stream
Step 2: Write your SQL queries
Step 3: Configure Output Stream
Thank you!
Time to show you real life examples
from OpsGenie