How to enable the
Lean Enterprise
Johann Romefort
co-founder @ rainbow
My Background
• Seesmic - Co-founder & CTO
Video conversation platform
Social media clients…lots of pivots :)
• Rainbow - Co-founder & CTO
Enterprise App Store
Goal of this presentation
• Understand what the Lean
Enterprise is, and how it relates to big data
and to the software architecture you
build
• Have a basic understanding of the
technologies and tools involved
What is the Lean
Enterprise?
http://en.wikipedia.org/wiki/Lean_enterprise
“Lean enterprise is a
practice focused on
value creation for the
end customer with
minimal waste and
processes.”
Enabling the
OODA Loop
USAF Colonel John Boyd on Combat:
“Get inside your adversaries'
OODA loop to disorient them”
OBSERVE
ORIENT
DECIDE
ACT
The OODA Loop
for software
image credit: Adrian Cockcroft
OODA Loop
• Observe (Innovation) and Decide
(Culture) are mainly human-driven
• Orient (Big Data) and Act (Cloud) can
be automated
ORIENT
What is Big Data?
• It’s data at the intersection of the 3 Vs:
• Velocity (Batch / Real time / Streaming)
• Volume (Terabytes/Petabytes)
• Variety (structured/semi-structured/unstructured)
Why is everybody talking about it?
• The cost of generating data has gone down
• By 2015, 3B people will be online, pushing the volume
of data created to 8 zettabytes
• More data = More insights = Better decisions
• Processing is getting easier and cheaper thanks to
cloud platforms
Data flow and constraints
Generate
Ingest / Store
Process
Visualize / Share
The 3 Vs introduce
heterogeneity, which
makes each of these
steps hard to achieve
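A minimal sketch of the four stages as plain Python functions may help make the flow concrete. Everything here is hypothetical (the event shapes, sensor names and function names are not from the talk); it only illustrates how mixed-format input, the Variety problem, has to be normalised before processing.

```python
import json
import random
from collections import Counter


def generate(n, seed=0):
    """Generate: emit n fake events of mixed shape (the Variety problem)."""
    rng = random.Random(seed)
    for i in range(n):
        event = {"sensor": f"s{rng.randint(1, 3)}", "value": rng.random()}
        # Even events arrive structured (dict), odd ones as JSON text.
        yield event if i % 2 == 0 else json.dumps(event)


def ingest(events):
    """Ingest / Store: normalise every event to a dict and keep it in a list."""
    return [json.loads(e) if isinstance(e, str) else e for e in events]


def process(store):
    """Process: count events per sensor."""
    return Counter(record["sensor"] for record in store)


def visualize(counts):
    """Visualize / Share: render a plain-text bar chart."""
    return "\n".join(f"{sensor} {'#' * count}" for sensor, count in sorted(counts.items()))


if __name__ == "__main__":
    print(visualize(process(ingest(generate(20)))))
```

Each stage only depends on the output of the previous one, which is what lets each be scaled or replaced independently.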
What is AWS?
• AWS is a cloud computing platform
• On-demand delivery of IT resources
• Pay-as-you-go pricing model
Cloud Computing
Storage + Compute + Networking
Adapts dynamically to ever-changing
needs, closely tracking infrastructure
and application requirements
How does AWS help
with Big Data?
• Removes constraints on the ingestion, storage, and
processing layers, and adapts closely to demand.
• Provides a collection of integrated tools to address
the 3 Vs of Big Data
• Virtually unlimited storage and processing capacity
fits changing data storage and analysis
requirements.
Computing Solutions
for Big Data on AWS
Kinesis
EC2 EMR
Redshift
Computing Solutions
for Big Data on AWS
EC2
All-purpose computing instances.
Dynamic provisioning and resizing
Lets you scale your infrastructure
at low cost
Use Case: Well suited for running custom or proprietary
applications (e.g. SAP HANA, Tableau…)
Computing Solutions
for Big Data on AWS
EMR
‘Hadoop in the cloud’
Adapts to the complexity of the analysis
and the volume of data to process
Use Case: Offline processing of very large volumes of data,
possibly unstructured (Variety variable)
Computing Solutions
for Big Data on AWS
Kinesis
Stream Processing
Real-time data
Scales to adapt to the flow of
inbound data
Use Case: Complex Event Processing, click streams,
sensor data, computation over a window of time
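To illustrate "computation over a window of time" on a click stream, here is a small pure-Python sketch of tumbling (fixed, non-overlapping) windows. It does not use the Kinesis API; the function name and the sample clicks are assumptions made up for the example.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window start, key) over fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, key) pairs.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


if __name__ == "__main__":
    clicks = [(0.5, "/home"), (3.2, "/home"), (9.9, "/pricing"), (10.0, "/home"), (14.1, "/home")]
    for (start, page), n in sorted(tumbling_window_counts(clicks, 10).items()):
        print(f"[{start}s, {start + 10}s) {page}: {n}")
```

In a real stream the input never ends, so a production version would also emit and discard each window once it closes instead of keeping every count in memory.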
Computing Solutions
for Big Data on AWS
Redshift
Data Warehouse in the cloud
Scales to Petabytes
Supports SQL Querying
Start small for just $0.25/h
Use Case: BI analysis; using legacy ODBC/JDBC software
to analyze or visualize data
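As a rough idea of the kind of SQL a BI tool would send to the warehouse, the sketch below runs an aggregate query locally. It uses Python's built-in sqlite3 purely as a stand-in, not Redshift itself, and the table, columns and factory names are hypothetical.

```python
import sqlite3

# Local, in-memory stand-in for a warehouse table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_readings (factory TEXT, sensor TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO sensor_readings VALUES (?, ?, ?)",
    [
        ("lyon", "s1", 70.0),
        ("lyon", "s2", 80.0),
        ("turin", "s1", 60.0),
        ("turin", "s2", 65.0),
    ],
)

# A typical BI question: average and peak temperature per factory.
QUERY = """
    SELECT factory, AVG(temperature) AS avg_temp, MAX(temperature) AS max_temp
    FROM sensor_readings
    GROUP BY factory
    ORDER BY factory
"""
rows = conn.execute(QUERY).fetchall()
for factory, avg_temp, max_temp in rows:
    print(f"{factory}: avg={avg_temp:.1f} max={max_temp:.1f}")
```

Because the interface is plain SQL, existing reporting tools can run this sort of query without knowing how the data is stored.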
Storage Solution
for Big Data on AWS
DynamoDB Redshift
S3 Glacier
Storage Solution
for Big Data on AWS
DynamoDB
NoSQL Database
Consistent
Low latency access
Column-based, flexible
data model
Use Case: Offline processing of very large volumes of data,
possibly unstructured (Variety variable)
Storage Solution
for Big Data on AWS
S3
Versatile storage system
Low-cost
Fast data retrieval
Use Case: Backups and disaster recovery, media storage,
storage for data analysis
Storage Solution
for Big Data on AWS
Glacier
Archive storage of cold data
Extremely low-cost
Optimized for infrequently accessed
data
Use Case: Storing raw logs of data. Storing media archives.
Magnetic tape replacement
What makes AWS different
when it comes to big data?
Integrated Environment for Big Data
Given the 3 Vs, you will usually need a collection of tools
for your data processing and storage.
AWS Big Data solutions already come integrated with each
other
AWS Big Data solutions also integrate with the whole AWS
ecosystem (Security, Identity Management, Logging, Backups,
Management Console…)
Example of products interacting with
each other.
Tightly integrated, rich
environment of tools
+
On-demand scaling that tracks
processing requirements
=
Extremely cost-effective and easy-to-deploy
solution for big data needs
Use Case:
Real-time IoT Analytics
Gathering data in real time from sensors deployed in a
factory and sending it for immediate processing
• Error Detection: Real-time detection of hardware
problems
• Optimization and Energy management
First version of the
infrastructure
[Diagram labels: On customer site; Aggregate sensor data; Node.js stream processor; Evaluate rules over time window; In-house Hadoop cluster; MongoDB; Feed algorithm; Write raw data for further processing; Backup]
Version of the infrastructure
ported to AWS
[Diagram labels: On customer site; Aggregate sensor data; Evaluate rules over time window; Write raw data for archiving; Kinesis; Redshift for BI analysis; Glacier]
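The "evaluate rules over time window" step for error detection could look roughly like the sketch below. It is an illustration only, not the actual rule engine from this project: the class name, the 10-second window, the 90.0 threshold and the readings are all assumptions.

```python
from collections import deque


class SlidingWindowRule:
    """Fire when the mean of readings in the last `window_seconds` exceeds `threshold`."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._readings = deque()  # (timestamp, value) pairs, oldest first

    def observe(self, timestamp, value):
        """Add one reading and return True if the rule fires."""
        self._readings.append((timestamp, value))
        # Evict readings that have fallen out of the time window.
        while self._readings[0][0] <= timestamp - self.window_seconds:
            self._readings.popleft()
        mean = sum(v for _, v in self._readings) / len(self._readings)
        return mean > self.threshold


if __name__ == "__main__":
    rule = SlidingWindowRule(window_seconds=10, threshold=90.0)
    readings = [(0, 80.0), (4, 85.0), (8, 95.0), (12, 100.0), (16, 101.0)]
    alerts = [t for t, v in readings if rule.observe(t, v)]
    print("alerts at t =", alerts)
```

Averaging over a window rather than reacting to a single reading avoids alerting on one noisy sample, at the cost of reacting a little later.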
ACT
Cloud and Lean
Enterprise
Let’s start with a
personal example
Lean Enterprise, Microservices and Big Data
First year @seesmic
• Prototype becomes production
• Monolithic architecture
• No analytics/metrics
• Little monitoring
• Little automated testing
I built a monolith
or…at least I tried
Early days at Seesmic
Everybody loves a good
horror story
We crashed TechCrunch
What did we do?
Add a QA Manager
Add bearded
SysAdmin
We added tons of process
so nothing could go wrong
Impact on dev team
• Frustration with the slow release process
• Lots of back and forth due to bugs and
the need to retest the whole app each
time
• Chain of command too long
• Feeling no power in the process
• Low trust
Impact on product team
• Frustration of not executing fast
enough
• Frustration of having to ask for
everything (like metrics)
• Feeling engineers always have the last
word
Impact on Management
What can you do?
• Break down software into smaller
autonomous units
• Break down teams into smaller
autonomous units
• Automating and tooling, CI / CD
• Plan for the worst
Break down software into
smaller autonomous units
Introduction to
Microservices
Monolith vs Microservices
- 10000ft view -
Monolith vs Microservices
- databases -
Monolith vs Microservices
- servers -
Microservices
- example -
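As a hedged illustration of what one "small autonomous unit" can look like, here is a tiny single-purpose HTTP service built only on the Python standard library. The inventory domain, routes and data are hypothetical and not taken from the slides.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class InventoryHandler(BaseHTTPRequestHandler):
    """One small service with one responsibility and its own private data."""

    STOCK = {"widget": 12, "gadget": 0}  # owned by this service only

    def do_GET(self):
        if self.path == "/health":
            self._send(200, {"status": "ok"})
        elif self.path.startswith("/stock/"):
            item = self.path[len("/stock/"):]
            if item in self.STOCK:
                self._send(200, {"item": item, "count": self.STOCK[item]})
            else:
                self._send(404, {"error": "unknown item"})
        else:
            self._send(404, {"error": "not found"})

    def _send(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def start_service():
    """Start the service on a free local port; return (server, base_url)."""
    server = HTTPServer(("127.0.0.1", 0), InventoryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}"


def get_json(url):
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    server, base_url = start_service()
    print(get_json(base_url + "/health"))
    print(get_json(base_url + "/stock/widget"))
    server.shutdown()
```

Other services would only ever talk to this one over HTTP, never by reading its data directly, which is what keeps it independently deployable.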
Break down teams into
smaller units
Amazon’s
“two-pizza teams”
• 6 to 10 people; you can feed them
with two pizzas.
• It’s not about size, but about
accountability and autonomy
• Each team has its own fitness
function
Friction points
• Full devops model: good tooling needed
• Still need to be designed for resiliency
• Harder to test
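One common way to design for resiliency when a service calls another over the network is to retry with exponential backoff. The sketch below is a generic illustration under that assumption, not a technique named in the talk; the function name and default values are hypothetical.

```python
import time


def call_with_retry(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call func(); on failure retry with exponential backoff, then re-raise."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            # Wait base_delay, then 2x, then 4x ... before the next attempt.
            sleep(base_delay * (2 ** attempt))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        # Fails twice, then succeeds, to simulate a briefly unavailable service.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("service unavailable")
        return "ok"

    print(call_with_retry(flaky), "after", calls["n"], "calls")
```

The `sleep` parameter is injectable so the behaviour can be tested without actually waiting.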
Continuous Integration
(CI) is the practice, in software engineering, of
merging all developer working copies with a
shared mainline several times a day
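Merging several times a day is only safe if every merge triggers fast automated checks. As a sketch of the kind of test a CI server would run on each push, assuming a hypothetical `parse_reading` helper:

```python
import unittest


def parse_reading(line):
    """Parse 'sensor_id,value' into (sensor_id, float(value))."""
    sensor_id, raw_value = line.strip().split(",")
    return sensor_id, float(raw_value)


class ParseReadingTest(unittest.TestCase):
    def test_parses_valid_line(self):
        self.assertEqual(parse_reading("s1,21.5\n"), ("s1", 21.5))

    def test_rejects_malformed_line(self):
        # A line without a comma cannot be split into two fields.
        with self.assertRaises(ValueError):
            parse_reading("not-a-reading")


if __name__ == "__main__":
    unittest.main()
```

A non-zero exit code from the test run is what lets the CI server mark the build as broken before the change reaches anyone else.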
Continuous Deployment
Tools for Continuous
Integration
• Jenkins (Open Source, lots of plugins,
hard to configure)
• Travis CI (Looks better, fewer plugins)
Tools for Continuous
Deployment
• GoCD (Open Source)
• shippable.com (SaaS, Docker
support)
• CodeDeploy (AWS)
+ Puppet, Chef, Ansible, Salt, Docker…
Impact on dev
• Autonomy
• Not afraid to try new things
• More confident in codebase
• Don’t have to live with old
bugs until there’s a release
Impact on product team
• Iterate faster on features
• Can make, bake and break hypotheses
faster
• Product gets improved incrementally
every day
Impact on Management
• Enabling Microservices architecture
• Enabling better testing
• Enabling devops model
• Come talk to the Docker team
tomorrow!
Thank You
follow me: @romefort
romefort@gmail.com
