Building Serverless Data Infrastructure in the AWS Cloud
Ryan Plant
@ryan_plant
November 10, 2017
WHAT WE’LL COVER
The New Data Economy
Reference Architecture
Using the AWS Cloud
The world’s most valuable resource is no longer oil, but data…
(The Economist, May 6th, 2017)
Data => Revenue
(but only after extraction, refinement, packaging, and distribution)
Traditional Data Warehousing (DW)
Volume, variety, and velocity…
Advanced analytics…
Artificial intelligence…
“What got us here won’t (entirely) get us there…”
Mostly proprietary…
Costly and complex to scale…
Next Generation Data Infrastructure
(i.e. the “data lake”)
James “Data Lake” Dixon:
“If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state…”
From Data Warehouses to Lakes
A data pond, lake, or ocean is not a product; it’s an architecture…
(and architecture is a principled and pattern-oriented approach to building systems)
Any and all data…
Any source and format…
Any time…
Reference Architecture
(Diagram: reference architecture)
APPS & SOURCES
STORAGE AND PROCESSING LAYER: Ingestion, Storage, Catalog, Processing, Analytics & Artificial Intelligence
SERVING LAYER: Models & Marts, API, Search
DATA OPS: Security, Config, Telemetry, Cost Mgmt
Data Ingestion Pipelines
Sources: SERVICE and MONOLITH applications; monoliths via Change Data Capture (CDC)
Ingestion channels into STORAGE:
• STREAMS (append)
• MESSAGING (append)
• FILE EXTRACTS (PUT)
STORAGE: source data aggregated and stored indefinitely, in many supported formats
Security: segregation & encryption
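The ingestion pattern above, where sources append events or PUT file extracts into storage that keeps them indefinitely, can be sketched as an append-only writer. This assumes newline-delimited JSON as the storage format and any writable text stream as a stand-in for storage; `append_event` and the envelope fields are hypothetical, not part of any AWS API.

```python
import io
import json
from datetime import datetime, timezone


def append_event(sink, source, payload, now=None):
    """Append one source event to an append-only sink as a JSON line.

    `sink` is any writable text stream (hypothetical stand-in for storage).
    Lines already written are never modified; each event is wrapped in an
    envelope recording its source and ingestion time.
    """
    now = now or datetime.now(timezone.utc)
    envelope = {
        "source": source,
        "ingested_at": now.isoformat(),
        "payload": payload,
    }
    line = json.dumps(envelope, sort_keys=True)
    sink.write(line + "\n")
    return line


if __name__ == "__main__":
    sink = io.StringIO()
    fixed = datetime(2017, 11, 10, tzinfo=timezone.utc)
    # A CDC-style change event from a monolith's database (hypothetical shape).
    append_event(sink, "orders-db", {"op": "update", "id": 42}, now=fixed)
    append_event(sink, "clickstream", {"page": "/home"}, now=fixed)
    print(sink.getvalue())
```

Keeping the envelope separate from the payload means new source formats can be stored without changing the writer.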
Storage and Catalog
STORAGE: RAW | REFINED
Catalog
• Register source and schema
• Data attribute inventory
• Relationships and dependencies
• Etc…
(Diagram: data → Ingestion → STORAGE, alongside the Catalog)
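The catalog responsibilities listed above can be illustrated with a small in-memory model. This is a hypothetical sketch, not the API of the AWS Glue Data Catalog or any other product: `Catalog`, `register`, `attribute_inventory`, and `downstream_of` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One catalog entry: where a dataset came from and what it contains."""
    name: str
    source: str
    schema: dict  # attribute name -> type name
    depends_on: set = field(default_factory=set)


class Catalog:
    """Minimal in-memory catalog (hypothetical; not a real product API)."""

    def __init__(self):
        self._entries = {}

    def register(self, name, source, schema, depends_on=()):
        # Register source and schema; dependencies must already be cataloged.
        missing = set(depends_on) - set(self._entries)
        if missing:
            raise KeyError(f"unknown dependencies: {sorted(missing)}")
        entry = DatasetEntry(name, source, dict(schema), set(depends_on))
        self._entries[name] = entry
        return entry

    def attribute_inventory(self):
        # Data attribute inventory: attribute -> datasets that carry it.
        inventory = {}
        for entry in self._entries.values():
            for attr in entry.schema:
                inventory.setdefault(attr, set()).add(entry.name)
        return inventory

    def downstream_of(self, name):
        # Relationships and dependencies: every dataset derived, directly
        # or transitively, from `name`.
        result, frontier = set(), {name}
        while frontier:
            children = {e.name for e in self._entries.values()
                        if e.depends_on & frontier} - result
            result |= children
            frontier = children
        return result
```

Tracking dependencies at registration time is what makes impact questions ("what breaks if this source changes?") answerable later.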
Raw to Refined Processing Pipelines
• Preserve RAW data; enrich only
• Apply transforms to create new, REFINED datasets (e.g. customer-partitioned views)
• Catalog new datasets
• Enable new use cases:
  • Reporting/Analytical views
  • Machine/Deep Learning
(Diagram: data → Ingestion → STORAGE RAW → Processing Pipelines → REFINED views C1 … Cn and X, Y, Z over ALL DATA, registered in the Catalog)
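The raw-to-refined rules above (preserve RAW, enrich only, create new REFINED datasets such as customer-partitioned views) can be sketched in a few lines. This assumes records are plain dictionaries with hypothetical `customer_id`, `quantity`, and `unit_price` fields, and `refine_by_customer` is an illustrative name; the point is that refined views are built from copies, so RAW input is never modified.

```python
from collections import defaultdict
from copy import deepcopy


def refine_by_customer(raw_records):
    """Build customer-partitioned REFINED views from RAW records.

    RAW is treated as read-only: each record is deep-copied before
    enrichment, so the raw input is preserved unchanged.
    """
    views = defaultdict(list)
    for record in raw_records:
        refined = deepcopy(record)
        # Example enrichment (hypothetical): derive a line total.
        refined["total"] = refined["quantity"] * refined["unit_price"]
        views[refined["customer_id"]].append(refined)
    return dict(views)
```

Each resulting view would then be written to the REFINED zone and registered in the catalog as a new dataset.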
Analytics and AI
(Diagram: Analytics and Artificial Intelligence workloads draw on STORAGE (RAW, REFINED) and ALL DATA, including views C1 … Cn and X, Y, Z)
Curation and Serving
(Diagram: data in STORAGE (RAW, REFINED), cataloged and transformed by Processing Pipelines and Analytics and Artificial Intelligence, is curated into Models and Marts, Search, and API)
Using the AWS Cloud
TRADITIONAL INVESTMENT IN NEXT GENERATION DATA
Lots of software, hardware, etc.
CAPITAL AND RISK BARRIERS
• acquire/write and maintain software
• procure, install, and maintain hardware
• get commercial real estate license
PUBLIC CLOUD ECONOMIES OF SCALE
CLOUD OPTIMIZATION
Infrastructure as a Service
Someone else’s hardware and real estate
Your software, your (virtual) servers
Platform as a Service
Someone else’s software, servers, hardware, and real estate
Your custom application software
Software as a Service
Someone else’s application software; you provide the data
(everything else doesn’t matter)
Cycle Time, Capital Optimization, Differentiation Focus: High (IaaS) → Higher (PaaS) → Highest (SaaS)
Go Serverless! (as much as possible)
Principles for event-driven, reactive data infrastructure primed for serverless architectures:
• everything is an event: messages, log entries, file I/Os, clock alarms, etc.
• listen for events: trigger a handler with an event
• stateless event handling: avoid state, persist state as an event source, hand off as soon as possible
• automation through orchestration and coordination
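The principles above can be illustrated with a minimal sketch of a stateless handler, assuming AWS Lambda's Python `handler(event, context)` convention and the record shape of S3 event notifications (`Records[].eventName`, `s3.bucket.name`, `s3.object.key`). The returned `work_items` structure is hypothetical; the handler keeps nothing between invocations and hands its output off immediately.

```python
from urllib.parse import unquote_plus


def handler(event, context=None):
    """Stateless Lambda-style handler for S3 event notifications.

    Nothing is kept between invocations: the handler extracts what it
    needs from the event and returns work items for the next stage.
    """
    work_items = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue  # only react to newly created objects
        s3 = record["s3"]
        work_items.append({
            "bucket": s3["bucket"]["name"],
            # Object keys in S3 event notifications are URL-encoded.
            "key": unquote_plus(s3["object"]["key"]),
        })
    return {"work_items": work_items}
```

In a real deployment the returned items would be handed to the next stage (for example a queue or an AWS Step Functions execution) rather than processed in the same invocation.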
Ingestion and Storage
• Event sources (event triggers): SQS, SNS, Kinesis, DynamoDB/RDS; or write to S3 direct
• Event handlers (AWS Lambda): y = f(x), y = f(x, y), y = f([x, y])
• AWS Step Functions (coordinated state)
• AWS S3 (ready), with the key layout:
  /{source}-raw/{key}/YYYY-MM-DD
  /{source}-refined/{key}/YYYY-MM-DD
• Lifecycle policies → AWS Glacier (archival)
• KMS (encryption)
• IAM + Directory (access control)
• CloudWatch/CloudTrail
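The key layout and lifecycle policies above can be expressed as two small helpers. This is a sketch under assumptions: `lake_key` and `archive_rule` are hypothetical names, the leading `/{source}-raw` segment is treated as a key prefix inside a single bucket (written without the leading slash), and the 90-day transition to Glacier is an arbitrary example value. The rule dictionary follows the shape boto3's `put_bucket_lifecycle_configuration` accepts under `LifecycleConfiguration={"Rules": [...]}`.

```python
from datetime import date


def lake_key(source, key, day, zone="raw"):
    """Build an object key following {source}-{zone}/{key}/YYYY-MM-DD."""
    if zone not in ("raw", "refined"):
        raise ValueError(f"unknown zone: {zone}")
    # No leading slash: S3 object keys are conventionally written without one.
    return f"{source}-{zone}/{key}/{day.isoformat()}"


def archive_rule(source, days_to_glacier=90):
    """One S3 lifecycle rule moving a source's RAW data to Glacier.

    The dict shape matches an entry in the Rules list accepted by boto3's
    put_bucket_lifecycle_configuration; the 90-day default is an assumption.
    """
    return {
        "ID": f"{source}-raw-to-glacier",
        "Filter": {"Prefix": f"{source}-raw/"},
        "Status": "Enabled",
        "Transitions": [{"Days": days_to_glacier, "StorageClass": "GLACIER"}],
    }
```

Applying it would look like `s3.put_bucket_lifecycle_configuration(Bucket="my-lake", LifecycleConfiguration={"Rules": [archive_rule("orders")]})`, where `my-lake` is a hypothetical bucket name.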
Catalog
AWS Glue (serverless ETL/ELT)
• source crawlers, using classifiers, populate Catalog metadata from Sources and Storage
• a trigger starts jobs (doSomething(…) {…}) on the job runner, which feed Processing Pipelines and write to targets
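The crawler-and-trigger flow above might be wired up roughly as follows. This is a minimal sketch, assuming the handler receives S3 event notifications for new objects and is given a boto3 Glue client (`boto3.client("glue")`); `start_crawl_on_new_raw_data` and the crawler name are hypothetical. Only `start_crawler(Name=...)` is called, and a crawler that is already running is treated as a no-op rather than an error.

```python
def start_crawl_on_new_raw_data(event, glue, crawler_name, raw_marker="-raw/"):
    """Start a Glue crawler when objects land under a RAW prefix.

    `glue` is a boto3 Glue client or any object exposing a compatible
    start_crawler(Name=...) method. Returns True if a crawl was started.
    """
    keys = [r["s3"]["object"]["key"] for r in event.get("Records", [])]
    if not any(raw_marker in key for key in keys):
        return False  # nothing new in a RAW prefix; leave the catalog alone
    try:
        glue.start_crawler(Name=crawler_name)
    except Exception as exc:
        # boto3 raises glue.exceptions.CrawlerRunningException when a crawl
        # is already in progress; skip instead of failing the invocation.
        if type(exc).__name__ != "CrawlerRunningException":
            raise
        return False
    return True
```

Passing the client in as a parameter keeps the handler testable without AWS credentials.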
Processing Pipelines
• AWS Glue (serverless ETL/ELT)
• AWS EMR (managed Hadoop)
• Streaming: Kinesis
• Batch: AWS Batch
(Diagram: pipelines connect Sources & Targets, Ingestion, Storage, and Catalog)
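For the streaming branch, a Lambda function subscribed to a Kinesis stream receives records in batches, each with a base64-encoded payload under `Records[i]["kinesis"]["data"]`. The sketch below decodes such a batch; it assumes producers wrote JSON, and `decode_kinesis_batch` is a hypothetical name.

```python
import base64
import json


def decode_kinesis_batch(event):
    """Decode the batch of records Lambda delivers from a Kinesis stream.

    Each payload arrives base64-encoded; this assumes producers wrote JSON.
    """
    decoded = []
    for record in event.get("Records", []):
        raw = base64.b64decode(record["kinesis"]["data"])
        decoded.append(json.loads(raw))
    return decoded
```

Handling the whole batch in one invocation keeps per-record overhead low; the decoded items can then be appended to the RAW zone.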
Serving Layer
• AWS Elasticsearch (managed ES)
• AWS Redshift Spectrum (parallel DW)
• AWS Athena (ad-hoc query)
(Diagram: the serving layer is fed by AWS Glue (serverless ETL/ELT) processing pipelines over Storage and Catalog)
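Ad-hoc queries against cataloged data in S3 can be submitted to Athena through boto3. A minimal sketch, assuming an Athena client is passed in and that the database name and S3 result location (both hypothetical here) already exist; because Athena executes asynchronously, the function only returns the query execution id, and a caller would check `get_query_execution` before reading results.

```python
def run_adhoc_query(athena, sql, database, output_location):
    """Submit an ad-hoc SQL query to Athena and return its execution id.

    `athena` is a boto3 Athena client (boto3.client("athena")) or any object
    with a compatible start_query_execution method.
    """
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```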
Serving Layer (continued)
• AWS API Gateway (serverless APIs)
• AWS QuickSight (visualization)
• AWS Cognito (web/mobile identity and SSO)
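A serverless API over a data mart can be sketched as an AWS Lambda function behind API Gateway using Lambda proxy integration, which passes path parameters in `event["pathParameters"]` and expects a response with `statusCode`, `headers`, and a string `body`. The `MART` dictionary and the `{customer_id}` route are hypothetical stand-ins for a real model or mart; authentication (for example via Cognito) is left out.

```python
import json

# Hypothetical stand-in for a data mart (e.g. a refined, per-customer view).
MART = {"C1": {"orders": 2, "total": 13.0}}


def api_handler(event, context=None):
    """Lambda proxy integration handler behind API Gateway.

    Reads the {customer_id} path parameter and returns the response shape
    proxy integration expects: statusCode, headers, and a string body.
    """
    params = event.get("pathParameters") or {}  # may be None when absent
    customer_id = params.get("customer_id")
    if customer_id not in MART:
        status, body = 404, {"error": "customer not found"}
    else:
        status, body = 200, MART[customer_id]
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }
```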
CLOUD OPTIMIZATION
Infrastructure as a Service
Platform as a Service
Software as a Service
You are likely here…
Aim here…
TBD
Opportunity!
Public Cloud R&D Investment
SERVERLESS: USE CAUTION
The floor is wet (and is constantly getting mopped!)
The edges are sharp:
• Development, Test, Debug tools and experience
• Configuration and Deployment challenges
• Variable, non-deterministic performance
Extremely new (but inevitable) paradigm…