AWS Lambda Supports Parallelization Factor for Kinesis and DynamoDB Event Sources
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Mastering AWS Lambda streaming event sources
SVS323-R
Adam Wagner
Solutions Architect
Amazon Web Services
Related breakouts
SVS317-R – Serverless stream processing pipeline best practices
SVS401-R – Optimizing your serverless applications
SVS335-R – Serverless at scale: Design patterns and optimizations
API304 – Scalable serverless event-driven applications using Amazon SQS & Lambda
Agenda
Introduction to streaming event sources for AWS Lambda
Scaling
Monitoring and error handling
Common issues
Performance and optimization
Session expectations
• Chalk-talk format – Please ask questions
• What we will cover
  • The details of using Lambda with streaming event sources
  • Scaling
  • Monitoring
  • Error handling
  • Performance and optimization
• What we won’t cover
  • What is serverless?
  • What is Lambda?
  • Event sources outside of Amazon Kinesis Data Streams and Amazon DynamoDB Streams
Amazon Kinesis
Easily collect, process, and analyze video and data streams in real time
• Amazon Kinesis Video Streams: capture, process, and store video streams for analytics
• Amazon Kinesis Data Streams: build custom applications that analyze data streams
• Amazon Kinesis Data Firehose: load data streams into AWS data stores
• Amazon Kinesis Data Analytics: analyze data streams with SQL
Amazon DynamoDB
• Document or key-value
• Scales to any workload
• Fully managed NoSQL
• Access control
• Event-driven programming
• Fast and consistent
DynamoDB Streams
[Diagram: DynamoDB table → DynamoDB stream]
✓ Stream of item changes
✓ Exactly once, guaranteed delivery
✓ Strictly ordered by key
✓ Durable, scalable
✓ Fully managed
✓ 24-hour data retention
✓ Sub-second latency
✓ Event source for Lambda
What we’re talking about today
[Diagram: data producers → Kinesis Data Streams → Lambda → downstream system]
[Diagram: clients → DynamoDB → DynamoDB stream → Lambda → downstream system]
Kinesis Data Streams
[Diagram: data producers → Kinesis Data Streams → Lambda service → Lambda function A and Lambda function B]
DynamoDB Streams
[Diagram: clients → DynamoDB → DynamoDB stream → Lambda service → Lambda function A and Lambda function B]
Kinesis data stream shard detail
[Diagram: data producers → Kinesis data stream (four shards) → Lambda service → function]
Kinesis data stream shard-level detail
1. The Lambda service polls the shard once per second for a set of records, then synchronously invokes the Lambda function with the batch of records.
2. If the function returns successfully, the Lambda service advances to the next set of records and repeats step 1.
3. If the function errors, by default the Lambda service invokes the function again with the same set of records and continues to do so until it succeeds or the records age out of the stream.
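The three steps above can be sketched as a small simulation. This is only an illustration of the default behavior, not the Lambda service's implementation: `process_shard`, `batch_size`, and `max_polls` are hypothetical names, and `max_polls` stands in for records eventually aging out of the stream.

```python
from collections import deque


def process_shard(records, handler, batch_size=2, max_polls=20):
    """Simulate the default per-shard loop: poll, invoke, retry on error.

    Returns (number_of_invocations, records_still_unprocessed).
    """
    pending = deque(records)
    invocations = 0
    while pending and invocations < max_polls:
        batch = list(pending)[:batch_size]  # step 1: poll a set of records
        invocations += 1
        try:
            handler(batch)                  # synchronous invocation
        except Exception:
            continue                        # step 3: retry the same batch
        for _ in batch:                     # step 2: advance past the batch
            pending.popleft()
    return invocations, list(pending)
```

A handler that always fails on one record blocks everything behind it in the shard, which is why the error handling options later in the deck matter.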
Kinesis data stream scaling
[Diagram: Kinesis data stream with four shards feeding one function]
aws kinesis update-shard-count --stream-name reinvent19-01 \
    --target-shard-count 8 --scaling-type UNIFORM_SCALING
{
    "StreamName": "reinvent19-01",
    "CurrentShardCount": 4,
    "TargetShardCount": 8
}
[Diagram: after the update completes, the stream has eight shards feeding the function]
Kinesis data stream scaling
Limits on update-shard-count. By default, you cannot:
• Scale more than twice per rolling 24-hour period per stream
• Scale up to more than double your current shard count for a stream
• Scale down below half your current shard count for a stream
• Scale up to more than 500 shards in a stream
• Scale a stream with more than 500 shards down unless the result is less than 500 shards
• Scale up to more than the shard limit for your account
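A pre-flight check for the limits listed above might look like the sketch below. It is not an AWS API: `check_shard_update` and its parameters are hypothetical, the thresholds are the ones on this slide, and the default `account_shard_limit` is an assumed placeholder that you would replace with your own account's limit.

```python
def check_shard_update(current, target, updates_last_24h=0,
                       account_shard_limit=500):
    """Return the reasons a shard-count change would break the listed limits."""
    problems = []
    if updates_last_24h >= 2:
        problems.append("already scaled twice in the rolling 24-hour period")
    if target > 2 * current:
        problems.append("target is more than double the current shard count")
    if target < current / 2:
        problems.append("target is below half the current shard count")
    if target > current and target > 500:
        problems.append("cannot scale up to more than 500 shards")
    if current > 500 and 500 <= target < current:
        problems.append("above 500 shards, scale-down must end below 500")
    if target > account_shard_limit:  # assumed placeholder limit
        problems.append("target exceeds the account shard limit")
    return problems
```

For the 4-to-8 example in this deck the function returns an empty list; asking for 9 shards from 4 reports the doubling limit.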
Kinesis data stream scaling … more detail
[Diagram: parent shards splitting into child shards within the Kinesis data stream, feeding the function]
• The stream scales up by splitting shards
• Splitting a shard creates two new child shards that split the partition keyspace of the parent shard
• Lambda will not start receiving records from the child shards until it’s processed all records from the parent shard
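To make the keyspace split concrete: Kinesis maps each partition key to a 128-bit integer with MD5, and each shard owns a contiguous range of those integers. The sketch below assumes an even split of the parent range; the helper names are illustrative only.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1  # hash keys are 128-bit unsigned integers


def split_range(start, end):
    """Split a parent shard's hash key range into two child ranges."""
    mid = (start + end) // 2
    return (start, mid), (mid + 1, end)


def hash_key(partition_key):
    """MD5 of the partition key, read as a 128-bit integer."""
    return int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)


def owning_shard(partition_key, ranges):
    """Index of the range that contains this partition key's hash."""
    value = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("hash key not covered by any range")


children = split_range(0, MAX_HASH_KEY)
```

Every partition key that used to land in the parent now lands in exactly one child, so per-key ordering is preserved only if the parent is drained first, which is the behavior described in the last bullet.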
Throughput considerations
[Diagram: Kinesis data stream with four shards feeding one function]
Parallelization Factor
[Diagram: Kinesis stream with four shards, one concurrent function invocation per shard]
--parallelization-factor 1
• Adds Lambda parallelization per shard
• A setting of 1 is the same as the current behavior; the maximum setting is 10
• Batching via partition keys to maintain in-order processing per partition key
• Works with both Kinesis Data Streams and DynamoDB Streams
Parallelization Factor
[Diagram: the same four-shard stream, two concurrent function invocations per shard]
--parallelization-factor 2
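One way to picture "batching via partition keys" is the sketch below. The slides do not say how Lambda assigns keys to concurrent batches, so hashing the key modulo the factor is an assumption made purely for illustration; `assign_batches` is a hypothetical helper.

```python
import hashlib
from collections import defaultdict


def assign_batches(records, parallelization_factor):
    """Group one shard's (partition_key, payload) records into concurrent batches.

    Records that share a partition key always land in the same batch, in
    their original order, so per-key ordering survives the parallelism.
    """
    batches = defaultdict(list)
    for partition_key, payload in records:
        digest = hashlib.md5(partition_key.encode("utf-8")).digest()
        slot = int.from_bytes(digest, "big") % parallelization_factor
        batches[slot].append((partition_key, payload))
    return dict(batches)
```

With a factor of 1 everything stays in a single batch in arrival order, which matches the pre-existing behavior.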
Kinesis data stream scaling
• Auto-scale your shard count using Application Auto Scaling: https://aws.amazon.com/blogs/big-data/scaling-amazon-kinesis-data-streams-with-aws-application-auto-scaling/
• Scale conservatively to leave overhead for bursts of traffic
• Scale your shard count to match your Lambda throughput and/or use Parallelization Factor
• Test! Test! Test! Measure unit tests to watch for performance regressions, and also test at scale!
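As a rough sizing aid for matching shard count to Lambda throughput, the arithmetic can be sketched as below. The traffic figures are hypothetical, `shards_needed` is an illustrative helper, and the 1,000 records/sec cap is the per-shard write limit quoted later in this deck.

```python
import math


def shards_needed(incoming_per_sec, function_per_sec, parallelization_factor=1):
    """Estimate shards needed so Lambda keeps up with incoming records.

    function_per_sec is the measured throughput of one concurrent invocation.
    """
    per_shard = min(function_per_sec * parallelization_factor, 1000)
    return math.ceil(incoming_per_sec / per_shard)
```

For example, 3,000 records/sec with a function that handles 200 records/sec needs 15 shards at a factor of 1, but only 3 shards at a factor of 5, at which point the per-shard write limit becomes the constraint.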
DynamoDB Streams scaling
[Diagram: DynamoDB table → DynamoDB stream (four shards) → function]
DynamoDB on-demand vs. provisioned capacity
DynamoDB on-demand scaling
DynamoDB tables using on-demand capacity mode automatically adapt to your application’s traffic
volume. On-demand capacity mode instantly accommodates up to double the previous peak traffic on a
table. For example, if your application’s traffic pattern varies between 25,000 and 50,000 strongly
consistent reads per second where 50,000 reads per second is the previous traffic peak, on-demand
capacity mode instantly accommodates sustained traffic of up to 100,000 reads per second. If your
application sustains traffic of 100,000 reads per second, that peak becomes your new previous peak,
enabling subsequent traffic to reach up to 200,000 reads per second.
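The doubling rule in this paragraph reduces to a few lines of arithmetic. The sketch below only restates the quoted example; `instant_capacity` and `next_peak` are hypothetical helpers, not DynamoDB APIs.

```python
def instant_capacity(previous_peak):
    """Traffic that on-demand mode accommodates instantly."""
    return 2 * previous_peak


def next_peak(previous_peak, sustained_traffic):
    """New 'previous peak' after sustaining the given traffic level."""
    if sustained_traffic > instant_capacity(previous_peak):
        raise ValueError("traffic is above double the previous peak")
    return max(previous_peak, sustained_traffic)


peak = 50_000
peak = next_peak(peak, 100_000)  # sustaining 100,000 raises the peak
```

After that step `instant_capacity(peak)` is 200,000 reads per second, as in the text.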
Kinesis data stream monitoring
[Diagram: data producers → Kinesis Data Streams → Lambda → downstream system]
DynamoDB stream monitoring
[Diagram: clients → DynamoDB → DynamoDB stream → Lambda → downstream system]
Error handling options
A number of new options are available to tune error handling
• Maximum retry attempts – min 0, default/max 10,000
• Maximum Record Age in seconds – min 60, default/max 604,800
• Bisect Batch on Function Failure
• On-Failure Destination
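These four options are set on the event source mapping. The sketch below builds the keyword arguments for the Lambda `UpdateEventSourceMapping` call and validates them against the ranges on this slide; the UUID, the queue ARN, and the chosen values are placeholders, and `error_handling_settings` is a hypothetical helper.

```python
def error_handling_settings(mapping_uuid, on_failure_arn, max_retries=2,
                            max_record_age_seconds=3600, bisect=True):
    """Build keyword arguments for Lambda's UpdateEventSourceMapping call."""
    if not 0 <= max_retries <= 10_000:
        raise ValueError("max_retries must be between 0 and 10,000")
    if not 60 <= max_record_age_seconds <= 604_800:
        raise ValueError("max_record_age_seconds must be between 60 and 604,800")
    return {
        "UUID": mapping_uuid,
        "MaximumRetryAttempts": max_retries,
        "MaximumRecordAgeInSeconds": max_record_age_seconds,
        "BisectBatchOnFunctionError": bisect,
        "DestinationConfig": {"OnFailure": {"Destination": on_failure_arn}},
    }


settings = error_handling_settings(
    "hypothetical-mapping-uuid",                          # placeholder
    "arn:aws:sqs:us-east-1:123456789012:failed-batches",  # placeholder
)
# With boto3 and AWS credentials available, the call would look like:
# boto3.client("lambda").update_event_source_mapping(**settings)
```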
Bisect Batch on Function Failure
Recursively split the failed batch and retry on a smaller subset of records, eventually isolating the problematic records
• Boolean – false by default
• These retries do NOT count towards MaximumRetryAttempts
• Make sure your function is idempotent
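The recursive split can be sketched as follows. This is a simplified model, not the service's implementation: it gives up on a record as soon as it fails alone, whereas the real service would still apply the configured retry and record-age limits. `process_with_bisect` is a hypothetical helper.

```python
def process_with_bisect(batch, handler):
    """Return (succeeded, isolated_failures) after recursive bisection."""
    try:
        handler(batch)
        return list(batch), []
    except Exception:
        if len(batch) == 1:
            return [], list(batch)  # the problematic record is isolated
        mid = len(batch) // 2
        ok_left, bad_left = process_with_bisect(batch[:mid], handler)
        ok_right, bad_right = process_with_bisect(batch[mid:], handler)
        return ok_left + ok_right, bad_left + bad_right
```

Note that good records are handed to the handler more than once along the way (first in the failing batch, then again in a smaller one), which is why the slide insists on idempotent functions.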
On-Failure Destination
An SNS topic or SQS queue that is sent metadata about a failed batch of records
• Used only after the configured retry limit or maximum record age is reached
• Remember that bisected batch retries are not counted towards the retry limit
• Does not contain the actual records, but does contain all the information needed to retrieve them
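Retrieving the failed records from that metadata could look like the sketch below. The `KinesisBatchInfo` field names are an assumption about the message shape and should be checked against a real message from your destination; the sample values are made up. The function only builds arguments for the Kinesis `GetShardIterator` call and does not contact AWS.

```python
import json


def iterator_request(message_body):
    """Turn failure metadata into GetShardIterator arguments and a record count."""
    info = json.loads(message_body)["KinesisBatchInfo"]  # assumed field name
    request = {
        "StreamName": info["streamArn"].split("/")[-1],
        "ShardId": info["shardId"],
        "ShardIteratorType": "AT_SEQUENCE_NUMBER",
        "StartingSequenceNumber": info["startSequenceNumber"],
    }
    return request, info["batchSize"]


# Made-up sample message for illustration only.
sample = json.dumps({
    "KinesisBatchInfo": {
        "shardId": "shardId-000000000001",
        "startSequenceNumber": "49590338271490256608559692538361571095921575989136588898",
        "endSequenceNumber": "49590338271490256608559692540925702759324208523137515618",
        "batchSize": 500,
        "streamArn": "arn:aws:kinesis:us-east-1:123456789012:stream/reinvent19-01",
    }
})
request, batch_size = iterator_request(sample)
# With boto3: iterator = kinesis.get_shard_iterator(**request)["ShardIterator"]
#             kinesis.get_records(ShardIterator=iterator, Limit=batch_size)
```

Records must be fetched before they age out of the stream, so the destination should be drained well inside the retention period.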
Common issues
• Kinesis Data Streams
  • IteratorAge is growing rapidly
  • ReadProvisionedThroughputExceeded throttles
• DynamoDB Streams
  • IteratorAge is growing rapidly
  • Rapid growth in Lambda concurrency
Kinesis Data Streams IteratorAge is growing rapidly
• Initial questions
  • How many Lambda functions are subscribed to the stream?
  • Does the Lambda function show any errors?
  • Does the Lambda function show any throttles?
  • Is there a large increase in the Kinesis Data Streams metrics IncomingRecords or IncomingBytes?
[Diagram: data producers → Kinesis Data Streams → Lambda → downstream system]
Kinesis Data Streams IteratorAge is growing rapidly
• Solutions
  • If the Lambda function is erroring
    • Configure an SQS queue or SNS topic for failed batches
    • Configure MaximumRetryAttempts, BisectBatchOnFunctionError, and MaximumRecordAgeInSeconds
    • Update the Lambda function to log records causing errors and return successfully
  • If the Lambda function is throttling
    • Increase the per-function limit/reservation, or raise the account-level limit
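The "log records causing errors and return successfully" option might look like the handler sketch below. It assumes JSON payloads; `process` is a hypothetical stand-in for business logic, and the returned dictionary is only for illustration.

```python
import base64
import json
import logging

logger = logging.getLogger(__name__)


def process(payload):
    """Hypothetical business logic that rejects payloads without an 'id'."""
    if "id" not in payload:
        raise KeyError("id")
    return payload["id"]


def handler(event, context):
    """Process a Kinesis batch, logging bad records instead of failing."""
    skipped = []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        try:
            payload = json.loads(base64.b64decode(kinesis["data"]))
            process(payload)
        except Exception:
            # Log and move on so one bad record does not block the shard.
            logger.exception("Skipping record %s", kinesis["sequenceNumber"])
            skipped.append(kinesis["sequenceNumber"])
    return {"skipped": skipped}
```

The trade-off is that skipped records are only visible in the logs, so this pattern suits data where occasional loss is acceptable or where logs are monitored.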
Kinesis Data Streams IteratorAge is growing rapidly
• Solutions
  • If there is a large increase in the Kinesis Data Streams metrics IncomingRecords or IncomingBytes
    • If this is temporary, you may be able to wait it out. Watch IteratorAge to make sure it doesn’t climb too high
    • Increase the stream data retention; this can be increased up to 7 days
    • Increase the Parallelization Factor
    • Increase the number of shards in the stream
    • Increase the memory assigned to the Lambda function or otherwise optimize the function’s performance
Kinesis Data Streams ReadProvisionedThroughputExceeded
The per-shard limit of 5 reads/sec or 2 MiB/sec is being hit
• Use enhanced fan-out or remove one or more subscribers
• Remember that Kinesis Data Firehose and Kinesis Data Analytics are subscribers as well!
DynamoDB Streams IteratorAge is growing rapidly
• Initial questions
  • How many Lambda functions are subscribed to the stream?
  • Does the Lambda function show any errors or throttles?
  • Does the Lambda function show an increase in duration?
  • Is there a large increase in the DynamoDB table write (WCU) metrics?
  • Is there a large increase in the DynamoDB stream metrics?
DynamoDB Streams IteratorAge is growing rapidly
• Solutions
  • If there is a large increase in writes on the DynamoDB table:
    • If this is temporary, you may be able to wait it out. Watch IteratorAge to make sure it doesn’t climb too high
    • Unlike Kinesis Data Streams, you can NOT increase the data retention time, so you need to take action more quickly
    • Increase the memory assigned to the Lambda function or otherwise optimize the function’s performance
    • Increase the Parallelization Factor
  • If there are more than two Lambda functions subscribed to the stream, consider adding a Kinesis data stream to increase the fan-out
DynamoDB stream fanout
[Diagram: clients → DynamoDB → DynamoDB stream → Lambda → Kinesis Data Streams]
Performance
• What matters to your application?
  • End-to-end latency
  • Overall cost
• Kinesis Data Streams enhanced fan-out (EFO)
• DynamoDB Streams
• Small messages in Kinesis Data Streams
  • Aggregation/de-aggregation libraries
  • Compression
• Low-throughput streams: batch window to the rescue
Lambda supports Kinesis Data Streams Enhanced Fan-Out and HTTP/2 for faster streaming
Enhanced fan-out allows customers to scale the number of functions reading from a stream in parallel while maintaining performance
HTTP/2 data retrieval API improves data delivery speed between data producers and Lambda functions by more than 65%
Kinesis Data Streams: Enhanced Fan-Out
When to use standard consumers:
• Total number of consuming applications is low (< 3)
• Consumers are not latency-sensitive
• Newer error handling options are needed*
• Minimize cost
When to use Enhanced Fan-Out consumers:
• Multiple consumer applications for the same Kinesis Data Stream
  • Default limit of 5 registered consuming applications. More can be supported with a service limit increase request
• Low-latency requirements for data processing
  • Messages are typically delivered to a consumer in less than 70 ms
Optimizing Small Messages in Kinesis
• Kinesis Data Streams per-shard write limits
  • 1 MiB/sec or 1,000 messages/sec
• With high volumes of small messages you reach the 1,000 messages/sec limit easily
  • This leads to lower throughput per shard and higher costs
Aggregation is the answer!
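A quick calculation shows why. The sketch below uses the two per-shard write limits from this slide; the traffic figures are hypothetical, `shards_for` is an illustrative helper, and the framing overhead that aggregation adds is ignored.

```python
import math

SHARD_BYTES_PER_SEC = 1024 * 1024  # 1 MiB/sec per shard
SHARD_RECORDS_PER_SEC = 1000       # 1,000 messages/sec per shard


def shards_for(messages_per_sec, message_bytes, messages_per_record=1):
    """Shards needed to ingest the traffic, with optional aggregation."""
    records_per_sec = math.ceil(messages_per_sec / messages_per_record)
    bytes_per_sec = messages_per_sec * message_bytes
    return max(
        math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC),
        math.ceil(bytes_per_sec / SHARD_BYTES_PER_SEC),
    )
```

At 20,000 messages/sec of 100 bytes each, unaggregated traffic needs 20 shards because of the record-count limit, while packing 100 messages into each record needs only 2 shards, at which point the byte limit is what binds.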
Aggregation / de-aggregation options
• Producer side
  • Kinesis Producer Library (KPL) (https://github.com/awslabs/amazon-kinesis-producer)
  • Kinesis Aggregation Library (https://github.com/awslabs/kinesis-aggregation)
• Consumer side within Lambda
  • Kinesis Aggregation Library (https://github.com/awslabs/kinesis-aggregation)
  • Java, Node.js, and Python versions available
• Another option if your data has a consistent format is Avro
Low-throughput streams
Lambda triggered with very small batches
Leads to higher cost per message
For archiving workloads the resulting payload is too small
[Diagram: Kinesis data stream with four shards feeding one function]
Batch window
[Diagram: Kinesis stream with four shards feeding one function]
• Additional knob to tune the stream trigger
• Set a time to wait before triggering, in seconds (maximum five minutes)
• Batch size is still respected, so a full batch triggers the function before the batch window is up
• Works for both Kinesis Data Streams and DynamoDB Streams triggers
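How batch size and batch window interact can be sketched with a small simulation. It assumes the window starts when the first record of a batch arrives, which is a simplification rather than a statement of Lambda's exact buffering behavior; `simulate_batches` is a hypothetical helper.

```python
def simulate_batches(arrivals, batch_size, window_seconds):
    """arrivals: time-sorted (arrival_time_seconds, record) pairs.

    Returns (invoke_time, records) pairs: a batch is sent when it is full
    or when its window has elapsed, whichever comes first.
    """
    invocations = []
    current, window_start = [], None
    for ts, record in arrivals:
        if current and ts - window_start >= window_seconds:
            invocations.append((window_start + window_seconds, current))
            current, window_start = [], None
        if not current:
            window_start = ts
        current.append(record)
        if len(current) == batch_size:      # full batch: trigger early
            invocations.append((ts, current))
            current, window_start = [], None
    if current:                             # last partial batch waits out its window
        invocations.append((window_start + window_seconds, current))
    return invocations
```

With a batch size of 3 and a 60-second window, three quick records trigger immediately, while a lone record waits the full window before the function is invoked.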
Conclusion
• Be clear on the goals of your streaming system
• Understand how your system scales
• Prepare for failures, and make use of the new error handling options
• Test individual components as well as end to end
Learn serverless with AWS Training and Certification
Resources created by the experts at AWS to help you learn modern application development
Free, on-demand courses on serverless, including:
• Introduction to Serverless Development
• Getting into the Serverless Mindset
• AWS Lambda Foundations
• Amazon API Gateway for Serverless Applications
• Amazon DynamoDB for Serverless Architectures
Additional digital and classroom trainings cover modern application development and computing
Visit the Learning Library at https://aws.training
Thank you!


AWS Lambda Supports Parallelization Factor for Kinesis and DynamoDB Event Sources

  • 2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Mastering AWS Lambda streaming event sources Adam Wagner S V S 3 2 3 - R Solutions Architect Amazon Web Services
  • 3. Related breakouts SVS317-R – Serverless stream processing pipeline best practices SVS401-R – Optimizing your serverless applications SVS335-R – Serverless at scale: Design patterns and optimizations API304 – Scalable serverless event-driven applications using Amazon SQS & Lambda
  • 4. Agenda Introduction to streaming event sources for AWS Lambda Scaling Monitoring and error handling Common issues Performance and optimization
  • 5. Session expectations • Chalk-talk format – Please ask questions • What we will cover • The details of using Lambda with streaming event sources • Scaling • Monitoring • Error handling • Performance and optimization • What we won’t cover • What is serverless? • What is Lambda? • Event sources outside of Amazon Kinesis Data Streams and Amazon DynamoDB Streams
  • 6. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 7. Amazon Kinesis Easily collect, process, and analyze video and data streams in real time Capture, process, and store video streams for analytics Load data streams into AWS data stores Analyze data streams with SQL Build custom applications that analyze data streams Amazon Kinesis Video Streams Amazon Kinesis Data Streams Amazon Kinesis Data Firehose Amazon Kinesis Data Analytics
  • 8. Amazon DynamoDB Amazon DynamoDB Document or key-value Scales to any workload Fully managed NoSQL Access control Event-driven programming Fast and consistent
  • 9. DynamoDB Streams DynamoDB DynamoDB stream ✓ Stream of item changes ✓ Exactly once, guaranteed delivery ✓ Strictly ordered by key ✓ Durable, scalable ✓ Fully managed ✓ 24-hour data retention ✓ Sub-second latency ✓ Event source for Lambda DynamoDB Streams
  • 10. What we’re talking about today Kinesis Data Streams Lambda Data Produce r Data Produce r Data producer Downstream system Clients Produce r Clients Produce r Clients DynamoDB DynamoDB stream Lambda Downstream system
  • 11. Kinesis Data Streams Kinesis Data Streams Lambda service Data ProducerData ProducerData producer Lambda function A Lambda function B
  • 12. DynamoDB Streams Clients Produce r Clients Produce r Clients DynamoDB DynamoDB stream Lambda service Lambda function A Lambda function B
  • 13. Kinesis data stream shard detail Data ProducerData ProducerData producer FunctionKinesis data stream Shard Shard Shard Shard Lambda service
  • 15. Kinesis data stream FunctionKinesis data stream Shard Shard Shard Shard
  • 16. Kinesis data stream shard-level detail Shard 1. Lambda service polls the shard once per second for a set of records. Then synchronously invokes the Lambda function with the batch of records. 2. If the Lambda returns successfully, the Lambda service advances to the next set of records and repeats step 1. 3. If the Lambda errors, by default the Lambda service invokes the function with the same set of records and will continue to do so until it succeeds or the records age out of the stream.
  • 17. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 18. Kinesis data stream scaling FunctionKinesis data stream Shard Shard Shard Shard
  • 19. Kinesis data stream scaling aws kinesis update-shard-count --stream-name reinvent19-01 --target-shard-count 8 --scaling-type UNIFORM_SCALING { "StreamName": "reinvent19-01", "CurrentShardCount": 4, "TargetShardCount": 8 } FunctionKinesis data stream Shard Shard Shard Shard
  • 20. Kinesis data stream scaling FunctionKinesis data stream Shard Shard Shard Shard Shard Shard Shard Shard aws kinesis update-shard-count --stream-name reinvent19-01 --target-shard-count 8 --scaling-type UNIFORM_SCALING { "StreamName": "reinvent19-01", "CurrentShardCount": 4, "TargetShardCount": 8 }
  • 21. Kinesis data stream scaling FunctionKinesis data stream Shard Shard Shard Shard • Scale more than twice per rolling 24-hour period per stream • Scale up to more than double your current shard count for a stream • Scale down below half your current shard count for a stream • Scale up to more than 500 shards in a stream • Scale a stream with more than 500 shards down unless the result is less than 500 shards • Scale up to more than the shard limit for your account
  • 22. Kinesis data stream scaling … more detail Function Shard Shard Kinesis data stream Shard Shard Shard Shard • The stream scales up by splitting shards • Splitting a shard creates two new child shards that split the partition keyspace of the parent shard • Lambda will not start receiving records from the child shards until it’s processed all records from the parent shard
  • 24. Parallelization Factor FunctionKinesis Stream Shard Shard Shard Shard • Adds Lambda parallelization per shard • Setting of 1 is the same as the current behavior, maximum setting is 10 • Batching via partition keys to maintain in order processing per partition key • Works with both Kinesis Data Streams and DynamoDB Streams --parallelization-factor 1
  • 25. Parallelization Factor Kinesis Stream Shard Shard Shard Shard Function --parallelization-factor 2 • Adds Lambda parallelization per shard • Setting of 1 is the same as the current behavior, maximum setting is 10 • Batching via partition keys to maintain in order processing per partition key • Works with both Kinesis Data Streams and DynamoDB Streams
  • 27. Kinesis data stream scaling FunctionKinesis data stream Shard Shard Shard Shard • Auto-scale your shard count using Application Auto Scaling: https://aws.amazon.com/blogs/big- data/scaling-amazon-kinesis-data-streams- with-aws-application-auto-scaling/ • Scale conservatively to leave overhead for bursts of traffic • Scale your shard count to match your Lambda throughput and/or use Parallelization Factor • Test! Test! Test! Measure unit tests to watch for performance regressions, and also test at scale!
  • 28. DynamoDB Streams scaling FunctionDynamoDB stream Shard Shard Shard Shard DynamoDB
  • 29. DynamoDB on-demand vs. provisioned capacity
  • 30. DynamoDB on-demand scaling DynamoDB tables using on-demand capacity mode automatically adapt to your application’s traffic volume. On-demand capacity mode instantly accommodates up to double the previous peak traffic on a table. For example, if your application’s traffic pattern varies between 25,000 and 50,000 strongly consistent reads per second where 50,000 reads per second is the previous traffic peak, on-demand capacity mode instantly accommodates sustained traffic of up to 100,000 reads per second. If your application sustains traffic of 100,000 reads per second, that peak becomes your new previous peak, enabling subsequent traffic to reach up to 200,000 reads per second.
  • 32. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 33. Kinesis data stream monitoring Kinesis Data Streams Lambda Data ProducerData ProducerData producer Downstream system
  • 35. Error handling options A number of new options are available to tune error handling • Maximum retry attempts – min 0, default/max 10,000 • Maximum Record Age in seconds – min 60, default/max 604,800 • Bisect Batch on Function Failure • On-Failure Destination
  • 36. Bisect Batch on Function Failure Recursively split the failed batch and retry on a smaller subset of records, eventually isolating the problematic records • Boolean – false by default • These retries do NOT count towards MaximumRetryAttempts • Make sure your function is idempotent
  • 37. On-Failure Destination An SNS Topic or SQS Queue, which is sent the metadata about a failed batch of records • Used only after configured retry limit or maximum record age are reached. • Remember the bisected batch retries are not counted towards retry limit. • Does not contain the actual records, but does contain all the information needed to retrieve them!!
  • 38. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 39. Common issues • Kinesis Data Streams • IteratorAge is growing rapidly • ReadProvisionedThroughputExceeded throttles • DynamoDB Streams • IteratorAge is growing rapidly • Rapid growth in Lambda concurrency
  • 40. Kinesis Data Streams IteratorAge is growing rapidly • Initial questions • How many Lambda functions subscribed to the stream? • Does the Lambda function show any errors? • Does the Lambda function show any throttles? • Is there a large increase in Kinesis Data Streams metrics IncomingRecords or IncomingBytes?
  • 41. Kinesis Data Streams IteratorAge is growing rapidly Kinesis Data Streams Lambda Data ProducerData ProducerData producer Downstream system
  • 42. Kinesis Data Streams IteratorAge is growing rapidly • Solutions • If the Lambda is erroring • Configure an SQS Queue or SNS Topic for failed batches • Configure MaximumRetryAttempts, BisectBatchOnFunctionError, and MaximumRecordAgeInSeconds • Update the Lambda function to log records causing errors and return successfully • If the Lambda is throttling? • Increase per function limit/reservation, or raise the account level limit
  • 43. Kinesis Data Streams IteratorAge is growing rapidly • Solutions • If there is a large increase in KDS Metrics IncommingRecords or IncommingBytes • If this is temporary, you may be able to wait it out. Watch IteratorAge to make sure it doesn’t climb too high • Increase the stream data retention, this can be increased up to 7 days • Increase the Parallelization Factor • Increase the number of shards in the stream • Increase the memory assigned to the Lambda function or otherwise optimize the function’s performance
  • 44. Kinesis Data Streams ReadProvisionedThroughputExceeded The 5 read/sec. or 2 MiB/sec. limit is being hit • Use enhanced fanout or remove one or more subscribers • Remember that Kinesis Data Firehose and Kinesis Data Analytics are subscribers as well!
  • 45. DynamoDB Streams IteratorAge is growing rapidly • Initial questions • How many Lambda functions subscribed to the stream? • Does the Lambda function show any errors or throttles? • Does the Lambda function show an increase in duration? • Is there a large increase in the DynamoDB table write (WCU) metrics • Is there a large increase in the DynamoDB stream metrics
  • 46. DynamoDB Stream IteratorAge is growing rapidly • Solutions • If there is a large increase in writes on the DDB Table: • If this is temporary, you may be able to wait it out. Watch IteratorAge to make sure it doesn’t climb too high • Unlike KDS you can NOT increase the data retention time, so you need to take action more quickly • Increase the memory assigned to the Lambda function or otherwise optimize the function’s performance • Increase the Parallelization Factor • If there are more than two Lambda functions subscribed to the stream, consider adding a Kinesis Data Stream for increasing the fan-out
  • 49. Performance • What matters to your application? • End-to-end latency • Overall cost • Kinesis Data Streams enhanced fanout (EFO) • DynamoDB Streams • Small messages in Kinesis Data Streams • Aggregation/de-aggregation libraries • Compression • Low throughput streams—batch window to the rescue
  • 50. Lambda supports Kinesis Data Streams Enhanced Fan-Out and HTTP/2 for faster streaming • Enhanced fan-out allows customers to scale the number of functions reading from a stream in parallel while maintaining performance • HTTP/2 data retrieval API improves data delivery speed between data producers and Lambda functions by more than 65%
  • 51. Kinesis Data Streams: Enhanced Fan-Out • When to use standard consumers: • Total number of consuming applications is low (< 3) • Consumers are not latency-sensitive • Newer error handling options are needed* • Minimize cost • When to use Enhanced Fan-Out consumers: • Multiple consumer applications for the same Kinesis data stream • Default limit of 5 registered consuming applications. More can be supported with a service limit increase request • Low-latency requirements for data processing • Messages are typically delivered to a consumer in less than 70 ms
  • 52. Optimizing Small Messages in Kinesis • Kinesis Data Streams per-shard write limits • 1 MiB/sec or 1,000 messages/sec • With high volumes of small messages you reach the 1,000 messages/sec limit easily • This leads to lower throughput per shard and higher costs • Aggregation is the answer!
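The effect of the two per-shard write limits can be worked through numerically. This sketch ignores any per-record overhead added by aggregation; the function name and the sample workload are assumptions for illustration.

```python
import math

MAX_BYTES_PER_SEC = 1024 * 1024  # 1 MiB/sec per shard
MAX_RECORDS_PER_SEC = 1000       # 1,000 records/sec per shard


def shards_needed(messages_per_sec, message_bytes, messages_per_record=1):
    """Shards required to satisfy both the record-count and byte limits."""
    records_per_sec = math.ceil(messages_per_sec / messages_per_record)
    bytes_per_sec = messages_per_sec * message_bytes
    by_records = math.ceil(records_per_sec / MAX_RECORDS_PER_SEC)
    by_bytes = math.ceil(bytes_per_sec / MAX_BYTES_PER_SEC)
    return max(by_records, by_bytes, 1)


# 50,000 messages/sec of 100 bytes each
unaggregated = shards_needed(50_000, 100)                        # count-bound
aggregated = shards_needed(50_000, 100, messages_per_record=100)  # byte-bound
```

Without aggregation the workload is bound by the record-count limit and needs 50 shards; packing 100 messages into each record makes it byte-bound and brings the requirement down to 5.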
  • 53. Aggregation / de-aggregation options • Producer side • Kinesis Producer Library (KPL) (https://github.com/awslabs/amazon-kinesis-producer) • Kinesis Aggregation Library (https://github.com/awslabs/kinesis-aggregation) • Consumer side within Lambda • Kinesis Aggregation Library (https://github.com/awslabs/kinesis-aggregation) • Java, Node.js, and Python versions available • Another option if your data has a consistent format is Avro
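To show the idea behind aggregation and compression without depending on the libraries above, here is a hand-rolled sketch that packs many small JSON messages into one gzip-compressed payload. It is not the KPL record format and is not compatible with the Kinesis Aggregation Library; `aggregate` and `deaggregate` are hypothetical helpers.

```python
import gzip
import json


def aggregate(messages):
    """Pack JSON-serializable messages into one compressed payload."""
    # One compact JSON document per line; json.dumps escapes embedded newlines
    body = "\n".join(json.dumps(m, separators=(",", ":")) for m in messages)
    return gzip.compress(body.encode("utf-8"))


def deaggregate(blob):
    """Recover the original messages from an aggregated payload."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


messages = [{"sensor": "a", "reading": i} for i in range(100)]
payload = aggregate(messages)  # would be sent as a single Kinesis record
restored = deaggregate(payload)
```

A producer would send `payload` as the data of one record, and the Lambda consumer would call `deaggregate` after base64-decoding it. For production use, the libraries listed on the slide handle partition keys and record boundaries that this sketch ignores.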
  • 54. Low-throughput streams • Lambda triggered with very small batches • Leads to higher cost per message • For archiving workloads the resulting payload is too small • [Diagram: Kinesis data stream with four shards triggering a Lambda function]
  • 55. Batch window • Additional knob to tune the stream trigger • Set a time to wait before triggering. Max five minutes, set in seconds. • Batch size is still respected and will trigger on full batches before the batch window is up • Works for both Kinesis Data Streams and DynamoDB Streams triggers • [Diagram: Kinesis data stream with four shards triggering a Lambda function]
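The batch window maps to the `MaximumBatchingWindowInSeconds` setting on the event source mapping. The helper below is a sketch that only builds and validates the settings; the function name and the sample values are assumptions.

```python
MAX_BATCH_WINDOW_SECONDS = 300  # five minutes, as stated on the slide


def batching_settings(mapping_uuid, batch_size, window_seconds):
    """Build keyword arguments that tune batching for a stream trigger."""
    if not 0 <= window_seconds <= MAX_BATCH_WINDOW_SECONDS:
        raise ValueError("window_seconds must be between 0 and 300")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return {
        "UUID": mapping_uuid,
        # A full batch still triggers the function before the window ends
        "BatchSize": batch_size,
        "MaximumBatchingWindowInSeconds": window_seconds,
    }


# Wait up to 60 seconds to collect as many as 500 records per invocation
settings = batching_settings("example-mapping-uuid", 500, 60)
```

With boto3 installed, these could be applied with `boto3.client("lambda").update_event_source_mapping(**settings)`. A longer window lowers the cost per message on quiet streams at the price of added end-to-end latency.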
  • 56. Conclusion • Be clear on the goals of your streaming system • Understand how your system scales • Prepare for failures and make use of the new error handling options • Test individual components as well as end to end
  • 57. Learn serverless with AWS Training and Certification • Resources created by the experts at AWS to help you learn modern application development • Free, on-demand courses on serverless, including • Introduction to Serverless Development • Getting into the Serverless Mindset • AWS Lambda Foundations • Amazon API Gateway for Serverless Applications • Amazon DynamoDB for Serverless Architectures • Additional digital and classroom trainings cover modern application development and computing • Visit the Learning Library at https://aws.training
  • 58. Thank you!