An introduction to cloud
computing with
Amazon Web Services
and
MongoDB
Samuel Demharter
DTC, 10 March 2016
Cloud Computing
“Everybody's in it and nobody's in it. It's like
a cloud that everybody has given a little puff
of mist to, and then the cloud does all the
heavy thinking for everybody. I don't mean
there's really a cloud. I just mean it's
something like that.”
The Sirens of Titan, Kurt Vonnegut, 1959
Definition
• Gartner Group: “A style of computing in
which massively scalable and
elastic IT-enabled capabilities are
delivered as a service using Internet
technologies.”
Cloud Computing Service Models
Software as a Service (SaaS)
Platform as a Service (PaaS)
Infrastructure as a Service (IaaS)
Amazon Web Services
• Development started in 2002
• In 2006, Amazon launched its Elastic
Compute Cloud (EC2) and S3 storage
service
• Amazon EC2/S3 was the first widely
accessible cloud computing infrastructure
service
Amazon Web Services (AWS)
• Computing: EC2, MapReduce
• Storage: S3, EBS
• Databases: SimpleDB, DynamoDB
• Others
AWS Computing
• Elastic Compute Cloud (EC2)
– Access to individual instances as you would
with any other machine
– Customisable configuration
– Auto Scaling
• Amazon Elastic MapReduce
– Process vast amounts of data
– Utilise Hadoop framework
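The map, shuffle and reduce steps that Hadoop runs across a cluster can be sketched on a single machine. The following is a minimal illustration in plain Python, assuming a word-count job; it does not use Hadoop or Elastic MapReduce, and all function names are made up for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map and reduce calls run in parallel on different machines; only the grouping by key has to bring related data together.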
AWS Storage
• Simple Storage Service (S3)
– Scalable cloud storage
– HTTP access
– Object store not a file system
– Cheap
• Elastic Block Store (EBS)
– Block storage volumes that behave like local disks
– For use with EC2 instances
– Take snapshot backups
– Fast
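To make "object store, not a file system" concrete, here is a toy in-memory sketch. It assumes the simplest possible model: a flat mapping from key to bytes, where "folders" are only key prefixes. The class and its methods are hypothetical and are not the S3 API.

```python
class ToyObjectStore:
    """In-memory stand-in for a bucket: a flat mapping from key to bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        # Objects are written whole; there is no partial in-place update.
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]

    def list(self, prefix=""):
        # There are no real directories, only keys that share a prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ToyObjectStore()
store.put("runs/1/raw.dat", b"ACGT")
store.put("runs/2/raw.dat", b"TTGA")
store.put("logs/a.txt", b"started")
```

Listing with the prefix `"runs/"` returns both run keys, which is how a folder view is emulated on top of a flat key space.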
AWS Databases
• Amazon SimpleDB (noSQL)
– Ease of administration
• Amazon DynamoDB (noSQL)
– Scalability & durability
• Amazon Relational Database Service
(SQL)
– Efficient indexing & querying
• Amazon ElastiCache
– Fast data access
huMONGOus – scalable
– natural
What is a database?
A database is a collection of information that
is organized so that it can easily be
accessed, managed, and updated.
Why use a database?
• Reusability: You need a single, public
interface for your data storage that all parts of
your application can use.
• Availability: You need to be sure that your
application will always be able to read and
write data.
• Durability : You need to be sure that your
data will stick around.
• Scalability : You need your data storage to
be able to grow with your application.
Typical SQL and noSQL databases
• SQL (Structured Query Language): Oracle, MySQL, Microsoft SQL Server
• NoSQL (Not Only SQL): key-value, column, document, graph-based
• NoSQL examples: MongoDB, CouchDB, Riak
SQL vs MongoDB
http://sql-vs-nosql.blogspot.co.uk
MongoDB
• Distributed
• Document-oriented
• Schema-less storage solution
• Uses JSON-style documents
• Supports Python, PHP, Java, Ruby, C++, etc.
• Replica sets for failovers and speeding up
reads
• Sharding for high performance
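As a small illustration of "schema-less" and "JSON-style", the sketch below uses plain Python dictionaries and the standard `json` module rather than a MongoDB driver. The field names and values are invented for the example.

```python
import json

# Two hypothetical documents in the same collection; they need not share fields.
reads = [
    {"_id": 1, "sample": "A", "length": 1520},
    {"_id": 2, "sample": "B", "length": 980,
     "tags": ["control"], "run": {"flowcell": "X1"}},
]


def fields_used(documents):
    """Return the union of top-level field names across all documents."""
    return sorted({name for doc in documents for name in doc})


encoded = json.dumps(reads[1])   # JSON-style text form of one document
decoded = json.loads(encoded)    # back to a Python dict, nesting preserved
```

The second document carries a list and a nested sub-document that the first one lacks, which a fixed table schema would not allow without extra columns or tables.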
SQL vs MongoDB (noSQL)
SQL | MongoDB (noSQL)
Requires structured data / well-designed schema | Semi-structured, unstructured & polymorphic data
Table based | Document based
Database atomicity | Document atomicity / eventual consistency
Rules enforced by database | Rules enforced by user
Scale-up | Scale-out (suitable for distributed computing)
Flexible & fast
Table - Who is the account holder
for account ID 3?
Document - Who is the account
holder for account ID 3?
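The same question can be answered against both data models. The sketch below is an assumption-laden illustration: the original slides showed diagrams, so the table layout, document shape and names here are invented. The table side uses Python's built-in `sqlite3`; the document side uses plain dictionaries in place of a MongoDB collection.

```python
import sqlite3

# Table model: holders and accounts live in separate tables linked by an ID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE holders  (holder_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE accounts (account_id INTEGER PRIMARY KEY,
                           holder_id INTEGER REFERENCES holders(holder_id));
    INSERT INTO holders  VALUES (10, 'Ada'), (11, 'Grace');
    INSERT INTO accounts VALUES (1, 10), (2, 11), (3, 10);
""")


def holder_from_tables(account_id):
    """Answer the question with a join across the two tables."""
    row = conn.execute(
        "SELECT h.name FROM accounts a "
        "JOIN holders h ON h.holder_id = a.holder_id "
        "WHERE a.account_id = ?",
        (account_id,),
    ).fetchone()
    return row[0] if row else None


# Document model: each account document embeds its holder.
accounts = [
    {"_id": 1, "holder": {"name": "Ada"}},
    {"_id": 2, "holder": {"name": "Grace"}},
    {"_id": 3, "holder": {"name": "Ada"}},
]


def holder_from_documents(account_id):
    """Answer the question by reading a single document."""
    for doc in accounts:
        if doc["_id"] == account_id:
            return doc["holder"]["name"]
    return None
```

The table version needs a join but stores each holder once; the document version reads one record but repeats the holder in every account that belongs to them.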
Redundancy and Data Availability -
Replication
Scaling out - Sharding
• A means for partitioning data across
servers for high performance
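Partitioning by a hash of a shard key can be sketched as follows. This is a simplified stand-in for the idea, not MongoDB's implementation; the function names and the use of `_id` as the shard key are assumptions for illustration.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a shard-key value to a shard index using a stable hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def partition(documents, shard_count, shard_key="_id"):
    """Distribute documents across shard_count lists by hashed shard key."""
    shards = [[] for _ in range(shard_count)]
    for doc in documents:
        shards[shard_for(doc[shard_key], shard_count)].append(doc)
    return shards
```

Because the hash is deterministic, a lookup by shard key can go straight to the one shard that holds the document instead of asking every server.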
Real-time Analytics
Usage Example 1: DNA Sequencing
• Real-time DNA sequencing
• Raw data (PC) → Basecalling (AWS) → Basecalled data (PC)
Usage Example 1: DNA Sequencing
• Use AWS EC2 computing and S3 storage
• Spot market – auction of unused EC2
instances
• Pay-per-use is an important economic
factor for Nanopore
• Use a combination of MongoDB and SQL
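Why pay-per-use matters economically can be shown with a rough break-even calculation. The sketch below is only an illustration: the cost model is deliberately simple and every number passed to it would be a hypothetical input, not a real AWS price.

```python
def on_demand_cost(hours, hourly_rate):
    """Cost of renting an instance only for the hours actually used."""
    return hours * hourly_rate


def owned_cost(purchase_price, hours, running_cost_per_hour):
    """Cost of buying a machine up front plus running it for the same hours."""
    return purchase_price + hours * running_cost_per_hour


def break_even_hours(purchase_price, hourly_rate, running_cost_per_hour):
    """Hours of use beyond which owning becomes cheaper than renting.

    Returns None when renting is never more expensive per hour than
    running an owned machine, so no break-even point exists.
    """
    if hourly_rate <= running_cost_per_hour:
        return None
    return purchase_price / (hourly_rate - running_cost_per_hour)
```

For bursty workloads that stay well below the break-even point, renting capacity on demand is the cheaper option.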
Usage Example 2: Genome Analysis
Genetic Variant Calling
Peter White et al., Ohio State University in collaboration with Genome Next
https://youtu.be/upAtK_SOtsY
Resources
• AWS Tutorials - https://qwiklabs.com
• MapReduce -
http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html
• AWS for Research -
https://aws.amazon.com/grants/
• MongoDB - http://university.mongodb.com/
Definitions
• Instance: A copy of an Amazon Machine
Image running as a virtual server in the
AWS cloud
• Instance type: A specification that defines
the memory, CPU, storage capacity, and
hourly cost for an instance.
• Amazon Machine Image: AMIs are like a
template of a computer's root drive.
• Pixar accidentally wipes out nearly every
file of "Toy Story 2" about 10 months into
production. Fortunately, supervising
technical director Galyn Susman had just
become a new mom and had an entire
copy of the movie on her home computer
so that she could work from home. Woody
and Buzz live to see another day, and
movie.

Editor's Notes

  • #6: In 2006, Amazon launched its Elastic Compute Cloud (EC2) as a commercial web service that allows small companies and individuals to rent computers on which to run their own applications. Other key factors that have enabled cloud computing to evolve include the maturing of virtualisation technology, the development of universal high-speed bandwidth, and universal software interoperability standards.
  • #7: AWS is a collection of cloud computing services. Amazon markets AWS as a service that provides large computing capacity more quickly and more cheaply than a client company building an actual physical server farm.
  • #8: Hadoop is a framework for distributing data and processing across a cluster; on Amazon Elastic MapReduce this is a resizable cluster of EC2 instances.
  • #10: EMR: A web service that makes it easy to process large amounts of data efficiently. Amazon EMR uses Hadoop processing combined with several AWS products to do such tasks as web indexing, data mining, log file analysis, machine learning, scientific simulation, and data warehousing.
  • #15: Open source. Popular with start-ups.
  • #17: Consider a simple application that stores data in a file and wants to read it later. What if another programme wants to read the data? What if it is not written in the same language? What if multiple programmes use the data at the same time? The store becomes overloaded. Scale up or scale out? Scaling up means improving the hardware, which eventually runs out. Scaling out means distributing the data and managing it across multiple hosts.
  • #18: The term NoSQL came into popular use in 2009.
  • #20: MongoDB uses JSON-style documents to represent, query and modify data. It is similar to Couchbase and CouchDB. MongoDB's success is largely due to having easy-to-use, familiar tools.
  • #21: MongoDB's MMAPv1 storage engine uses memory-mapped files (data is structured per record).
  • #28: A shard is a replica set that contains a subset of the data for the sharded cluster. Together, the cluster’s shards hold the entire data set for the cluster.
  • #38: A virtual machine is a software computer that, like a physical computer, runs an operating system and applications. The virtual machine is comprised of a set of specification and configuration files and is backed by the physical resources of a host. Some instance types are designed for standard applications, whereas others are designed for CPU-intensive, memory-intensive applications, and so on. AMI contains the operating system and can also include software and layers of your application, such as database servers, middleware, web servers, and so on.