Big Data Analytics
with Amazon Web Services



            Dr. Matt Wood
   An Online Seminar. Tuesday 16th October.
Hello, and thank you.
Big Data Analytics

   An introduction

   The story of analytics on AWS

   AWS Marketplace

   Success story: Brightcove
1




INTRODUCING BIG DATA
Data for competitive
     advantage.
Using data

  Customer segmentation,
  financial modeling,
  system analysis,
  line-of-sight,
  business intelligence.
Generation




  Collection & storage




Analytics & computation




Collaboration & sharing
Cost of data generation
       is falling.
lower cost,
increased throughput

Generation


                          HIGHLY CONSTRAINED

  Collection & storage




Analytics & computation




Collaboration & sharing
Very high barrier to turning
  data into information.
Move from a
data generation challenge to
    an analytics challenge.
Enter the Cloud.
Remove the constraints.
Enable data-driven innovation.
Move to a distributed data
        approach.
Maturation of two things.
Software for distributed
      storage and analysis




  Infrastructure for distributed
       storage and analysis
Software

  Frameworks for
  data-intensive workloads.

  Distributed by design.
Infrastructure

  Platform for
  data-intensive workloads.

  Distributed by design.
Support the
data timeline.
Generation


                          HIGHLY CONSTRAINED

  Collection & storage




Analytics & computation




Collaboration & sharing
Generation




  Collection & storage




Analytics & computation




Collaboration & sharing
Lower the
barrier to entry.
Accelerate time to market
   and increase agility.
Enable new business
   opportunities.
Washington Post

   Pinterest

    NASA
“AWS enables Pfizer to explore
difficult or deep scientific
questions in a timely, scalable
manner and helps us make better
decisions more quickly”

Michael Miller, Pfizer
2




THE STORY OF ANALYTICS
EC2
Utility computing.
 6 years young.
Scale out systems


 Embarrassingly parallel problems.
 Queue based distribution.
 Small, medium and high scale.
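
A minimal sketch of queue-based distribution for an embarrassingly parallel problem. The slides do not say which queue or worker setup is used, so everything here is an assumption: an in-process queue and threads stand in for a hosted queue and a fleet of instances, and `process` and `run_workers` are hypothetical names.

```python
import queue
import threading


def process(item):
    # Hypothetical unit of work: each item is independent of the others,
    # which is what makes the problem embarrassingly parallel.
    return item * item


def run_workers(items, num_workers=4):
    """Distribute items to workers through a shared queue and collect results."""
    tasks = queue.Queue()
    for item in items:
        tasks.put(item)

    results = {}
    lock = threading.Lock()

    def worker():
        # Each worker pulls tasks until the queue is empty, then exits.
        while True:
            try:
                item = tasks.get_nowait()
            except queue.Empty:
                return
            value = process(item)
            with lock:
                results[item] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_workers(range(5), num_workers=2))
```

Because workers only pull from the queue, changing `num_workers` changes throughput without changing the work itself, which is the property the small, medium and high scale bullet relies on.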
Cost optimization.



Achieving economies of scale

 [Chart: vertical axis up to 100%, horizontal axis Time.
  Regions labeled Reserved capacity, On-demand, and Unused capacity.]
Spot Instances


 Bid on unused EC2 capacity.
 Very large discount.
 Perfect for batch runs.
 Balance cost and scale.
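
The cost and scale trade-off can be sketched as simple arithmetic. This is an illustration only: the function name `batch_cost` is hypothetical, and the prices in the usage line are made-up placeholders, not actual EC2 or Spot prices.

```python
def batch_cost(instance_hours, on_demand_price, spot_price, spot_fraction):
    """Return (blended_cost, all_on_demand_cost) for a batch run.

    spot_fraction is the share of instance-hours served by Spot capacity;
    the rest is assumed to run at the on-demand price.
    """
    if not 0.0 <= spot_fraction <= 1.0:
        raise ValueError("spot_fraction must be between 0 and 1")
    spot_hours = instance_hours * spot_fraction
    on_demand_hours = instance_hours - spot_hours
    blended = spot_hours * spot_price + on_demand_hours * on_demand_price
    baseline = instance_hours * on_demand_price
    return blended, baseline


if __name__ == "__main__":
    # Hypothetical prices: 1.00 per hour on-demand, 0.25 per hour on Spot.
    print(batch_cost(1000, 1.0, 0.25, 0.5))  # → (625.0, 1000.0)
```

The same budget can instead be spent on more instance-hours, which is one way to read the balance between cost and scale.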
<$1000 per hour
Map/reduce

 Pattern for distributed computing.

 Software frameworks such as
 Hadoop.

 Write two functions. Scale up.

 Complex cluster configuration
 and management.
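
A minimal sketch of the "write two functions" idea, using word counting as the example. This is a single-process stand-in, not Hadoop: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the in-memory grouping step plays the role a framework would handle across a cluster.

```python
from collections import defaultdict


def map_fn(line):
    # Map: emit a (key, value) pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    # Reduce: combine all values that share a key into one result.
    return word, sum(counts)


def run_mapreduce(lines, mapper, reducer):
    """Apply mapper to each record, group by key, then apply reducer per key."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(run_mapreduce(["big data", "Big analytics"], map_fn, reduce_fn))
```

Only the two functions are specific to the problem; the grouping and distribution logic is what a framework, and the cluster behind it, would take over.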
Amazon Elastic MapReduce

 Managed Hadoop clusters.

 Easy to provision and monitor.

 Write two functions. Scale up.

 Optimized for S3 access.
Amazon Elastic MapReduce data flow

 [Diagram, built up over several slides:]
 Input data: S3
 Code: Elastic MapReduce
 Name node, elastic cluster, HDFS
 Queries + BI: via JDBC, Pig, Hive
 Output: S3 + SimpleDB
Performance
 Compute performance
Cluster Compute

 Intel Xeon E5-2670
 10 GigE non-blocking network
 60.5 GB memory
 Placement groups

 + GPU enabled instances
Performance
 Compute performance
IO performance



NoSQL
Unstructured data storage.
DynamoDB

 Predictable, consistent performance
 Unlimited storage
 Single digit millisecond latencies
 No schema for unstructured data
 Backed by solid-state drives
...and SSDs for all.
  New Hi1 storage instances.
hi1.4xlarge

  2 x 1 TB SSDs
  10 GigE network
  HVM: 90k IOPS read, 9k to 75k write
  PV: 120k IOPS read, 10k to 85k write
“The hi1.4xlarge configuration is
about half the system cost for the
same throughput.”


Netflix
http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
Generation




  Collection & storage




Analytics & computation




Collaboration & sharing
Performance + ease of use
3




AWS MARKETPLACE
Extend platform with
     partners
Innovate on behalf of
    customers
Remove undifferentiated
    heavy lifting
AWS Marketplace
aws.amazon.com/marketplace
Generation




  Collection & storage




Analytics & computation




Collaboration & sharing
Collection & storage



    Acunu Reflex
    Apache Cassandra NoSQL database


    MongoDB
    With and without EBS RAID storage


    Couchbase
    Community and Enterprise editions



    ScaleArc
    MySQL load balancing
Generation




  Collection & storage




Analytics & computation




Collaboration & sharing
Analytics & computation



      KarmaSphere Analytics
      for Amazon Elastic MapReduce



      MapR M5
      Hadoop Distribution



      Metamarkets
      Event based data processing
Analytics & computation



      StackIQ Rocks+
      HPC clusters with MPI, Grid Engine



      Univa Grid Engine
      One click cluster deployment



      Quantivo
      Data association analytics
Generation




  Collection & storage




Analytics & computation




Collaboration & sharing
Collaboration & sharing




Aspera Faspex
   20 Mbps data transfer
4




SUCCESS STORY
