Real-time serverless
analytics at Shedd
Overview and hands-on workshop
Dobo Radichkov
OLX Data Summit, March 2018
2
What to expect…
▪ The goal is to give you a sweeping view of the Shedd serverless real-time analytics stack
▪ We will cover a lot of new tools and tech building blocks, though we will steer clear of the nitty-gritty details
▪ Expect technical content and hands-on exercises – for the non-technical folk in the audience, try to focus on the high-level understanding of the concepts
▪ We hope the presentation gives you inspiration and smooths the learning curve in case you decide to pursue a similar approach
3
Contents
▪ Introduction
▪ Enabling technology
▪ Putting it all together
▪ Future direction
▪ Q&A
5
Why real-time analytics?
Offline vs real-time
Real-time analytics enables products that adapt and respond to changing user behaviour instantly and continuously
8
Example: Consider this insight regarding first-time Shedd users
▪ Browser (does not view any ads on day 1): 2.9 ad views, 0.02 replies, 1.3 active days in days 2-30
▪ Viewer (views 1 or more ads on day 1): 150 ad views, 0.4 replies, 4.7 active days in days 2-30
▪ Buyer (makes 1 or more replies on day 1): 670 ad views, 6.7 replies, 11.2 active days in days 2-30
How can real-time analytics help?
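As a rough illustration of how the day-1 segments above could be assigned in code, here is a minimal Python sketch. The function name and the rule that a reply takes precedence over ad views are our assumptions; the slide only gives the three segment definitions.

```python
def first_day_segment(ad_views: int, replies: int) -> str:
    """Classify a first-time user from day-1 activity counts.

    Assumption: a user who replied counts as a Buyer even with
    zero recorded ad views; the slide does not state a precedence.
    """
    if replies >= 1:
        return "Buyer"
    if ad_views >= 1:
        return "Viewer"
    return "Browser"
```

For example, `first_day_segment(3, 0)` returns "Viewer". With real-time analytics this label is available during the user's first session rather than the next day.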
9
Real-time analytics unlocks a number of capabilities
▪ Segmentation: Segment user behaviour and build a real-time single customer view
▪ Personalisation: Instantly personalise product experience based on up-to-date user preferences and behaviour
▪ Targeting: Target users with push notifications, in-app messaging and custom product flows based on real-time triggers and rules
▪ Reporting: Build mission-critical reports for real-time decision-making (e.g. during large live marketing campaigns or new product releases)
▪ A/B testing: Continuously optimise live A/B tests based on real-time results
▪ Data-driven products: Enable integration of data analytics & models within our products
10
Real-time analytics enables us to unlock the full value of data
The diminishing value of data
▪ Recent data is highly valuable, if you act on it in time
▪ Old + recent data is more valuable, if you have the means to combine t…
Source: Perishable Insights (M. Gualtieri, F…
11
Today we will take a peek at Shedd’s real-time data stack
[Architecture diagram: batch and real-time data stacks side by side]
▪ BATCH DATA STACK
  - Sources: Tracking (Ninja / Hydra), Platform DB (Mongo), Adjust / Facebook / Google, …
  - Raw data layer (data lake)
  - Operational data layer (listings, replies, users, orders, etc.)
  - Data warehouse
  - Consumers: BI, segmentation, performance marketing, CLM, batch recommender, …
▪ REAL-TIME DATA STACK
  - Raw data streams: Tracking (Ninja / Hydra), Platform DB (Mongo), …
  - Real-time data processing (Lambda functions, shown as λ)
  - Real-time database (online customer view)
  - API gateway
  - Applications: real-time recommender, real-time segmentation, other real-time applications
13
We leverage 3 AWS building blocks for real-time data analytics
KINESIS
Stream data
LAMBDA
Process data
ELASTICACHE
Store data
15
Kinesis includes 3 flavours
▪ Amazon Kinesis Data Streams (stream → process): build custom applications that process and analyze streaming data
▪ Amazon Kinesis Data Analytics (stream → analyse): easily process and analyze streaming data with standard SQL
▪ Amazon Kinesis Data Firehose (stream → ingest): easily load streaming data into AWS
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
16
Kinesis Data Stream architecture
▪ A stream is made up of shards
▪ Producers write events to a stream shard; consumers read events from a stream shard
▪ Event / data record: e.g. a JSON object
Per-shard limits and pricing:
▪ 1 MB / sec data input
▪ 2 MB / sec data output
▪ 1000 records / sec input
▪ 24 hours data retention (default)
▪ $0.015 / shard / hour ($10.80 / shard / month)
▪ $0.014 / 1M records ($14 / 1B records)
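To make the per-shard numbers concrete, here is a small Python sketch that estimates shard count and monthly cost from an expected event rate. It uses only the 2018 figures quoted on this slide; the function names, the 30-day month, the decimal megabyte and the simplification that every record is billed as one unit are our assumptions, so treat the output as a ballpark rather than a quote.

```python
import math

# Per-shard figures quoted on the slide (2018 prices; check current AWS pricing).
SHARD_INPUT_BYTES_PER_SEC = 1_000_000   # 1 MB / sec data input (decimal MB assumed)
SHARD_INPUT_RECORDS_PER_SEC = 1_000     # 1000 records / sec
SHARD_USD_PER_HOUR = 0.015
USD_PER_MILLION_RECORDS = 0.014
HOURS_PER_MONTH = 24 * 30               # assumption: 30-day month, as on the slide


def shards_needed(records_per_sec: float, avg_record_bytes: float) -> int:
    """Smallest shard count that satisfies both the byte and record input limits."""
    by_bytes = records_per_sec * avg_record_bytes / SHARD_INPUT_BYTES_PER_SEC
    by_count = records_per_sec / SHARD_INPUT_RECORDS_PER_SEC
    return max(1, math.ceil(max(by_bytes, by_count)))


def monthly_cost_usd(records_per_sec: float, avg_record_bytes: float) -> float:
    """Shard-hour cost plus per-record cost for a constant event rate."""
    shards = shards_needed(records_per_sec, avg_record_bytes)
    shard_cost = shards * SHARD_USD_PER_HOUR * HOURS_PER_MONTH
    records_per_month = records_per_sec * 3600 * HOURS_PER_MONTH
    record_cost = records_per_month / 1_000_000 * USD_PER_MILLION_RECORDS
    return round(shard_cost + record_cost, 2)
```

For example, 2,500 events per second at 500 bytes each is limited by the record count, not the byte rate, and needs 3 shards.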
17
Exercise: Create stream and feed with sample data
1. Create Kinesis data stream: https://us-west-2.console.aws.amazon.com/kinesis/home?region=us-west-2#/streams/create
2. Feed sample real-time data: https://awslabs.github.io/amazon-kinesis-data-generator/
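If you prefer to see what a sample record looks like in code, the Python sketch below builds one and shows roughly how a partition key selects a shard. Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash-key range contains it. The event field names, the helper names and the assumption that shards split the hash space evenly are ours, so the shard index is an approximation.

```python
import hashlib
import json


def make_record(user_id: str, event_type: str) -> dict:
    """Build a record with the Data / PartitionKey fields that PutRecord expects.

    The JSON field names (user_id, type) are hypothetical sample fields.
    """
    payload = json.dumps({"user_id": user_id, "type": event_type})
    return {"Data": payload.encode("utf-8"), "PartitionKey": user_id}


def shard_index(partition_key: str, shard_count: int) -> int:
    """Approximate target shard, assuming shards split the hash space evenly."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")      # 128-bit integer
    return hash_key * shard_count // 2**128
```

Using the user ID as partition key keeps all events of one user on the same shard, so they are read in the order they were written.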
18
Kinesis Analytics enables real-time data analysis,
transformation, enrichment and visualisation
19
Exercise: Create Kinesis Analytics application and run some real-time SQL analysis
1. Create Kinesis Analytics app
2. Run real-time SQL analysis
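A typical real-time SQL analysis counts events per time window. The Python sketch below emulates a tumbling-window count so the idea is visible outside the console. It is not Kinesis Analytics SQL, and the function name and the (timestamp, event type) input shape are our assumptions.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds=60):
    """Count events per (window start, event type).

    `events` is an iterable of (unix_timestamp, event_type) pairs.
    Each event falls into exactly one fixed, non-overlapping window.
    """
    counts = Counter()
    for timestamp, event_type in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, event_type)] += 1
    return dict(counts)
```

An event at second 59 lands in the window starting at 0, and one at second 60 lands in the next window.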
21
Evolution of computing models
▪ ON-PREMISE: physical servers
▪ SERVER as a service: virtual server in the cloud (Amazon EC2)
▪ APP as a service: virtual app container (Amazon ECS)
▪ FUNCTION as a service: serverless computing (AWS Lambda)
22
Lambda is Amazon’s serverless event-driven compute service
▪ Write code in Python, Node.js, Java, and others and upload to Lambda
▪ Trigger code from other AWS services, HTTP endpoints or in-app activity
▪ Scale seamlessly and elastically with number of events, only using required compute resource
▪ Only pay for the compute time used (per 100ms execution time)
Forget about infrastructure, administration and scaling – focus 100% on your app logic
23
Exercise: Let’s create 2 simple Lambda functions
1. Create Hello World
2. Create stream processor
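The two functions in this exercise might look roughly like the Python sketch below. Lambda delivers Kinesis records base64-encoded under `Records[i].kinesis.data`. The handler names, the return values and the assumption that every payload is a JSON object are ours.

```python
import base64
import json


def hello_handler(event, context):
    """Simplest possible Lambda: ignore the input and return a greeting."""
    return "Hello World"


def stream_handler(event, context):
    """Decode each Kinesis record in the batch into a Python dict.

    Assumes every record payload is a JSON object.
    """
    decoded = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        decoded.append(json.loads(payload))
    # A real processor would update a data store here; we only report back.
    return {"processed": len(decoded), "events": decoded}
```

Both handlers can be called locally with a hand-built event dict, which is a convenient way to test before uploading.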
24
Combining Lambda with API gateway empowers the data
professional to create serverless APIs
25
The Serverless Framework streamlines and automates deployment
26
Exercise: Create APIs with serverless + API gateway + Lambda
1. Create Hello World endpoint
2. Create mock API endpoint
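For reference, a Lambda behind API gateway (proxy integration) returns a dict with `statusCode`, `headers` and a string `body`. The Python sketch below shows both endpoints in that shape. The handler names, the `name` query parameter, the `user_id` path parameter and the canned ad IDs are hypothetical.

```python
import json


def _json_response(status_code, payload):
    """Wrap a payload in the response shape API gateway proxy integration expects."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def hello_endpoint(event, context):
    """GET /hello?name=... (the query parameter is a hypothetical example)."""
    params = event.get("queryStringParameters") or {}
    return _json_response(200, {"message": f"Hello {params.get('name', 'World')}"})


def mock_recommendations(event, context):
    """GET /recommendations/{user_id} returning canned data, not a real model."""
    user_id = (event.get("pathParameters") or {}).get("user_id", "unknown")
    return _json_response(200, {"user_id": user_id, "ads": ["ad-1", "ad-2", "ad-3"]})
```

A mock endpoint like this lets the frontend team integrate against a stable contract while the real logic is still being built.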
28
ElastiCache is Amazon’s managed service for Redis: an INSANELY fast in-memory key-value database
▪ In-memory
▪ Low latency
▪ Ridiculously fast
▪ NoSQL → key-value store
▪ Open source
29
Redis + Redshift =
▪ Redshift: run few queries infrequently; process billions of records per query; standard SQL; batch
▪ Redis: run millions of commands continuously; process few records per command; 200 Redis commands + Lua scripting; real-time
30
Redis is a key-value store supporting 5 basic data types
Key => { Data Structures }
▪ String (strings / blobs / bitmaps): "I'm a Plain Text String!"
▪ Hash (hash tables, objects!): Key1 → Val1, Key2 → Val 2
▪ List (linked lists): C B B A C
▪ Set (sets): A B C D
▪ Sorted set (sorted sets): A: 0.1, B: 0.3, C: 500, D: 500
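If Redis is new to you, it may help to think of the five types in terms of Python built-ins. The sketch below is only an analogy that runs without a Redis server. The key names are made up, and the Redis commands in the comments indicate what one would typically use for the same operation.

```python
# A plain dict stands in for the Redis keyspace: key => data structure.
store = {}

store["greeting"] = "I'm a Plain Text String!"          # String: SET / GET
store["user:1"] = {"Key1": "Val1", "Key2": "Val 2"}     # Hash: HSET / HGETALL
store["recent"] = ["C", "B", "B", "A", "C"]             # List: RPUSH / LRANGE (duplicates kept)
store["tags"] = {"A", "B", "C", "D"}                    # Set: SADD / SMEMBERS (unique members)
store["scores"] = {"A": 0.1, "B": 0.3, "C": 500, "D": 500}  # Sorted set: ZADD / ZRANGE


def zrange(sorted_set):
    """Members ordered by score, ties broken by member name, as Redis does."""
    return [member for member, _ in sorted(sorted_set.items(), key=lambda kv: (kv[1], kv[0]))]
```

Here `zrange(store["scores"])` yields the members in the order A, B, C, D, with C and D tied on score and ordered by name.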
31
Exercise: Let’s have a look at Redis in action
1. Play with Redis commands
2. Test Redis speed
32
Recap: We covered the 3 AWS building blocks for real-time data
KINESIS
Stream data
LAMBDA
Process data
ELASTICACHE
Store data
34
Real-time vs offline data stacks
▪ Raw data: offline stack uses files on S3; real-time stack uses Kinesis streams
▪ Database: offline stack uses Redshift; real-time stack uses Redis
▪ Volume: offline is high (processing millions / billions of records at the same time); real-time is low (processing single records at a time)
▪ Velocity: offline is low (running few queries at a time); real-time is high (running thousands / millions of queries at the same time)
▪ Query language: offline uses SQL; real-time uses Python + Redis commands
▪ End-user: offline serves humans and BI tools; real-time serves Lambda, APIs and products
35
Shedd end-to-end data stack architecture
[Architecture diagram repeated from slide 11: batch data stack and real-time data stack side by side]
37
Example: Shedd real-time recommendations
[Architecture diagram]
▪ FRONTEND: Shedd app (Android / iOS SDK)
▪ API: Endpoint(s) (API gateway)
▪ TRACKING: Shedd app; Ninja Hydra tracker (EC2); Platform DB (Mongo)
▪ BACKEND:
  - Event stream (Kinesis)
  - Event processor (Lambda)
  - Online customer view (ElastiCache / Redis)
  - Recommendation service orchestrator (Lambda)
  - Segmentation API (Lambda), Kingsman service
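The flow in this diagram can be sketched in a few lines of Python: an event processor keeps per-user counters up to date, and the recommendation service reads them on request. This is a toy version under our own assumptions. A nested dict stands in for Redis, and the event fields, function names and the "most viewed categories" rule are hypothetical, not the actual Shedd logic.

```python
from collections import defaultdict

# Stand-in for the online customer view in Redis: user_id -> {category: view count}.
customer_view = defaultdict(lambda: defaultdict(int))


def process_event(event: dict) -> None:
    """Event processor step: update the customer view for one tracking event."""
    if event["type"] == "ad_view":
        # With Redis this would be an HINCRBY on the user's hash.
        customer_view[event["user_id"]][event["category"]] += 1


def recommend_categories(user_id: str, top_n: int = 3) -> list:
    """Recommendation step: return the user's most viewed categories."""
    counts = customer_view.get(user_id, {})
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [category for category, _ in ranked[:top_n]]
```

Because the write path and the read path only share the customer view, each can be a separate Lambda function and scale independently.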
38
Example: Shedd analytics APIs
[Architecture diagram]
▪ FRONTEND: Shedd app (Android / iOS SDK)
▪ API: Endpoint(s) (API gateway)
▪ TRACKING: Shedd app; Ninja Hydra tracker (EC2); Platform DB (Mongo)
▪ BACKEND:
  - Data warehouse (Redshift)
  - Redis bulk loader (Lambda)
  - Online customer view (ElastiCache / Redis)
  - Analytics API handler (Lambda)
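One way to picture the Redis bulk loader is as a function that turns warehouse rows into Redis hash writes and sends them in batches. The Python sketch below only builds the commands and does not talk to Redshift or Redis. The row layout, the `user:` key prefix and the helper names are our assumptions.

```python
def to_redis_commands(rows, key_prefix="user:"):
    """Turn each warehouse row (a dict with a user_id) into one HSET command."""
    for row in rows:
        key = f"{key_prefix}{row['user_id']}"
        # Redis stores hash values as strings, so convert explicitly.
        fields = {name: str(value) for name, value in row.items() if name != "user_id"}
        yield ("HSET", key, fields)


def in_batches(commands, batch_size=1000):
    """Group commands so each batch can be sent in a single pipeline round trip."""
    batch = []
    for command in commands:
        batch.append(command)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Batching matters because the cost of loading millions of precomputed rows is dominated by network round trips rather than by Redis itself.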
Thank you
Questions? Feedback?
Dobo Radichkov
Analytics summit, Jan 2018

Real-time serverless analytics at Shedd – OLX data summit, Mar 2018, Barcelona
