© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Dr. Kevin Jorissen, AWS Research Computing
Amazon cloud resources as part of scientific workflows & HPC
Sept 13, 2018
Kevin Jorissen
Seattle
Kevin has 10 years of experience in
computational science, and holds a Ph.D. in
Physics. He developed codes solving the
quantum physics equations for light absorption
by materials, taught workshops to scientists
worldwide, and wrote about high performance
computing in the cloud before it was fashionable.
He worked as a postdoctoral researcher in
Antwerp, Lausanne, Seattle, and Zurich. He
contributed to the WIEN2k code (Density
Functional Theory calculations of material
properties, www.wien2k.at) and the FEFF code
(X-ray and Electron absorption spectra,
www.feffproject.org).
Kevin joined Amazon in 2015 to help accelerate
the adoption of cloud computing in the scientific
community globally.
BIO
Hot off the presses: WWPS AWS Summit
Real-Time Machine Learning on Satellite Imagery: How DigitalGlobe Uses Amazon SageMaker to Massively
Scale-up Information Extraction from Satellite Imagery
Using AWS and Open Data to Meet the Demands of Disaster Response Situations
Transitioning Geoscience Research to the Cloud: Opportunities and Challenges
AWS Public Datasets: Learnings from Staging Petabytes of Data for Analysis in AWS
Enabling Sustainable Research Platforms in the Cloud
Enabling Research using Hybrid HPC Cloud Computing
Precision Medicine on the Cloud
Transforming Research in Collaboration with Funding Agencies
Innovation on the Edge: How Rapid Experimentation with Technology is Achieving Results in the Enterprise
With NASA JPL
Accelerating Analytics for the Future of Genomics
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Data Lake Ingestion
Empowering Every Brain! How Brain Power is using AWS-Powered AI in their Mission to Help People with
Autism and Other Brain-Related Challenges
… Now available at https://www.youtube.com/user/AmazonWebServices/videos
Hot off the presses: WWPS AWS Summit
Processing and Streaming GOES-16 Data with AWS Managed Services
Hot off the presses: WWPS AWS Summit
DevelopmentSeed: Machine Learning with Earth Observation Imagery
[Figure panels: machine learning outputs; professional mapper tracing; the final output of high-voltage grids]
Key Strengths of AWS for Scientific Discoveries
Time to discovery
• Availability of resources, scalability, right-sizing
• Experiment fast
• Avoid undifferentiated work
Availability of resources: We’re off to a cute start …
Right-sized resources: Genomics processing on FPGA Accelerators
Children’s Hospital of Philadelphia and Edico Genome Achieve Fastest-Ever Analysis of
1,000 Genomes
Orlando, Fla., Oct 19, 2017 – The
Children’s Hospital of Philadelphia
(CHOP) and Edico Genome today set
a new scientific world standard in
rapidly processing whole human
genomes into data files usable for
researchers aiming to bring precision
medicine into mainstream clinical
practice. Utilizing Edico Genome’s
DRAGEN™ Genome Pipeline,
deployed on 1,000 Amazon EC2 F1
instances on the Amazon Web
Services (AWS) Cloud, 1,000
pediatric genomes were processed
in two hours and 25 minutes.
… Available in “AWS App Store” (AWS Marketplace) for ~$24 / genome
Moving quickly with managed services
DNA Sequencing using AWS container services
RFP != INNOVATION
What if you could …
• Access FPGA servers today instead of waiting for a procurement?
• See if your code runs better on modern GPUs (and tear it down again immediately
if it doesn’t)?
• Spin up a compute cluster now instead of waiting to apply for an allocation?
• Rearchitect a science application into serverless or microservices in a few weeks,
because AWS does the undifferentiated heavy lifting?
• Let your collaborators try out a new ML algorithm on your data, without having to ask
you for resources?
Key Strengths of AWS for Scientific Discoveries
Time to discovery
• Availability of resources, scalability, right-sizing
• Experiment fast
• Avoid undifferentiated work
Collaboration
• Data lake model
• Security & compliance
• Sharing
• Infrastructure, ML, Analytics
Collaborating on scientific data in the cloud
Athena, Glue
Collaborating on scientific data in the cloud
NOAA NEXRAD on Amazon S3: usage increased 2.3x, greater scientific impact
NIH initiatives: National Cancer Institute – Cloud Resources
http://www.cancergenomicscloud.org
Funded projects to create collaborative environments on cloud
NASA Image and Video Library (2017)
https://aws.amazon.com/partners/success/nasa-image-library/
• Easy Access to the Wonders of Space. Fully
compliant with Section 508 of the
Rehabilitation Act.
• Built-in Scalability. “On-demand scalability
will be invaluable for events such as the solar
eclipse that’s happening later this summer—
both as we upload new media and as the
public comes to view that content,” says
Bryan Walls, Imagery Experts Deputy
Program Manager at NASA.
• Good Use of Taxpayer Dollars. By building
its Image and Video Library in the cloud,
NASA avoided the costs associated with
deploying and maintaining server and storage
hardware in-house. Instead, the agency can
simply pay for the AWS resources it uses at
any given time.
U.K. Met Office Uses AWS to Deliver Tailored Meteorological Data
The Met Office has been a widely respected national weather service in the United Kingdom for 160 years.
“We are using the AWS Cloud to drive the mass-market availability of customizable weather information.”
– James Tomkins, Head of Enterprise IT Architecture, Met Office
• Needed the means to send weather data to device users and third-party customers.
• Deployed Amazon ElastiCache to respond to peak demands.
• Attracted more than half a million users with its WeatherCloud app.
• Scaled data storage tenfold and reduced solution costs by 50 percent.
• Enabled innovation of big data services in a competitive landscape.
https://aws.amazon.com/solutions/case-studies/the-met-office/
https://aws.amazon.com/about-aws/whats-new/2017/08/uk-met-office-high-resolution-weather-forecast-data-is-now-on-aws/
Public Datasets on AWS
To stimulate innovation, AWS hosts a selection of datasets that anyone can
access for free. Data in our public datasets is available for rapid access to our
flexible and low-cost computing resources.
Earth Science
• Landsat
• NEXRAD
• NASA NEX
Life Science
• TCGA & ICGC (used at OICR)
• 1000 Genomes
• Genome in a Bottle
• Human Microbiome Project
• 3000 Rice Genome
Internet Science
• Common Crawl Corpus
• Google Books Ngrams
• Multimedia Commons
https://aws.amazon.com/public-datasets/
Registry of Open Data on AWS (RODA)
https://registry.opendata.aws
Register YOUR datasets here and tell people how to use them.
Browse what your peers are sharing.
Getting started: AWS Researcher’s Handbook
The 150-page missing manual for science in the cloud.
Written by Amazon’s Research Computing
community for scientists.
• Explains foundational concepts about how AWS
can accelerate time-to-science in the cloud.
• Step-by-step best practices for securing your
environment to ensure your research data is safe
and your privacy is protected.
• Tools for budget management that will help you
control your spending, limit costs, and
prevent overruns.
• Catalogue of scientific solutions from partners
chosen for their outstanding work with scientists.
aws.amazon.com/rcp
Getting started: institutional solutions @ Emory
https://edscoop.com/emory-university-research-aws-cloud-rich-mendola
RISELab (Real-time Intelligent Secure Execution)
Collaborative 5-year effort (2017-2021) between UC Berkeley, the National Science Foundation,
and industry partners. AWS is a founding partner. https://riselab.cs.berkeley.edu
• Students and researchers at RISELab use AWS to rapidly prototype and develop
new systems at a scale and with a speed not possible before.
• Resulted in Apache Spark, developed on AWS, and integrated with AWS core services.
Berkeley Data Analytics Stack
GOAL:
TL;DR (takeaway messages)
• AWS collaborates with research community on the biggest research
challenges
• AWS cloud has scale, services and data capabilities beyond research
cyberinfrastructure
• AWS Research Cloud Program eases the on-ramp
• Why AWS
Scale and Elasticity
HPC Workloads in the Cloud
• Life Sciences
• Financial Services
• Energy & Geo Sciences
• Design & Engineering
• Media & Entertainment
• Autonomous Vehicles
Elasticity: Natural Language Processing at Clemson University
550,000 cores using EC2 Spot Instances
https://aws.amazon.com/blogs/aws/natural-language-processing-at-clemson-university-1-1-million-vcpus-ec2-spot-instances/
“I am absolutely thrilled with the outcome of this experiment. The graduate students on the project are
amazing. They used resources from AWS and Omnibond and developed a new software infrastructure to
perform research at a scale and time-to-completion not possible with only campus resources. Per-second
billing was a key enabler of these experiments.” – Prof. Amy Apon, Co-Director of the Complex Systems,
Analytics and Visualization Institute
“Spot market”: cheap AWS computing, a good fit for research
https://blogs.univa.com/2018/06/univa-demonstrates-extreme-scale-automation-by-deploying-more-than-one-million-cores-in-a-single-univa-grid-engine-cluster-using-aws/
2 months ago – 500,000 cores
BNL/Fermilab and High Throughput Computing at Scale
High Energy Physics
• Discovery of the Higgs Boson Particle
• Added 58,000 Spot Cores Elastically
• Monte Carlo Simulations Searching for Particles
• Reduced workload from 6 weeks to 10 days
Source: https://aws.amazon.com/blogs/aws/experiment-that-discovered-the-higgs-boson-uses-aws-to-probe-nature/
ATLAS Architecture with AWS
[Architecture diagram; labels: West, Collision Data, Data, EC2 instances, Results]
NASA – Climate Research
• Mosaicking 2,500+ QuickBird satellite images into
100-kilometer (km) x 100-km tiles, which are then
broken into 25-km x 25-km sub-tiles for
processing.
• Orthorectifying and mosaicking all satellite data in
ADAPT
• Identifying trees and shrubs using adaptive
vegetation classifier algorithms. Estimating
biomass. Incorporating algorithms to calculate
tree and shrub height for biomass estimates.
The combined resources of ADAPT and AWS potentially
reduce total processing time from 10 months to less than 1
month
Source: https://www.nas.nasa.gov/SC15/demos/demo31.html
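The tiling step described in the first bullet can be sketched in a few lines of Python. This is only an illustration: the function name `sub_tiles` and the lower-left-corner coordinate convention are assumptions made here, not part of the NASA workflow.

```python
def sub_tiles(x0_km, y0_km, tile_km=100, sub_km=25):
    """Split one square tile into square sub-tiles.

    Returns a list of (x_min, y_min, x_max, y_max) bounds in km.
    The lower-left-corner convention is an assumption for illustration.
    """
    if tile_km % sub_km != 0:
        raise ValueError("tile size must be a multiple of the sub-tile size")
    n = tile_km // sub_km  # number of sub-tiles along each side
    return [
        (x0_km + i * sub_km, y0_km + j * sub_km,
         x0_km + (i + 1) * sub_km, y0_km + (j + 1) * sub_km)
        for j in range(n)
        for i in range(n)
    ]


tiles = sub_tiles(0, 0)
print(len(tiles))  # 16 sub-tiles per 100 km x 100 km tile
```

Each sub-tile can then be processed independently, which is what makes this kind of workload easy to spread over many cloud instances.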
• Why AWS
Constant technological evolution allows
technical experimentation
• Why AWS
Compliance and Security
Compliance Programs
• Global: SOC 1, SOC 2, SOC 3
• Regional programs: United States, Asia Pacific, Europe
https://aws.amazon.com/compliance/pci-data-privacy-protection-hipaa-soc-fedramp-faqs/
GovCloud region for sensitive workloads
AWS cloud services
Compute; Storage; Networking; Machine Learning
AWS Global Infrastructure
18 Regions – 52 Availability Zones – 100+ Edge Locations
Region & Number of Availability Zones
• US East: N. Virginia (6), Ohio (3)
• US West: Oregon (3), Northern California (3)
• AWS GovCloud: US-West (2)
• Canada: Central (2)
• South America: São Paulo (3)
• EU: Ireland (3), Frankfurt (2), London (2)
• Asia Pacific: Singapore (2), Sydney (2), Tokyo (3), Seoul (2), Mumbai (2)
• China: Beijing (2)
Announced Regions
China, France, Hong Kong, Sweden, Bahrain, and a second AWS GovCloud Region in the US.
The AWS Console (in your web browser)
Common Services
for HPC Applications
Controlling your AWS resources
1. Web browser (point-and-click)
2. Command-line interface (script, automate)
3. SDKs (GUIs, platforms, Science Gateways)
https://aws.amazon.com/cli/
AWS Storage Options for HPC Workloads
Amazon S3
• Secure, durable, highly-scalable object storage. Fast access, low cost.
• For long-term durable storage of data, in a readily accessible get/put access format.
• Primary durable and scalable storage for HPC data
Amazon Glacier
• Secure, durable, long term, highly cost-effective object storage.
• For long-term storage and archival of data that is infrequently accessed.
• Use for long-term, lower-cost archival of HPC data
EC2+EBS
• Block storage device (SSD or HDD) for file system attached to EC2 instance. Can build parallel file system (e.g., using Intel Lustre).
• For near-line storage of files optimized for high I/O performance.
• Use for high-IOPS, temporary working storage
EFS
• Highly available, multi-AZ, fully managed network-attached elastic file system.
• For near-line, highly-available storage of files in a traditional NFS format (NFSv4).
• Use for read-often, temporary working storage
HPC Data Flow on AWS Storage
[Diagram: data transfer between a campus data center and AWS]
• Ingress/egress paths: AWS Direct Connect, ISV Connectors, Storage Gateway, AWS Snowball, Internet/VPN, Kinesis Data Firehose, S3 Transfer Acceleration, Amazon CloudFront
• Object, block, file storage: Amazon S3, Amazon Glacier (Lifecycle), EBS, Instance Store, EFS, Other Shared File System
• Compute: EC2 Instance, 25 Gbps to S3
Spectrum of Compute Instance Types
• General purpose: M4, M5
• General purpose, burstable: T2, T2 Unlimited
• Compute optimized: C4, C5
• Memory optimized: R4, X1, X1e
• Storage optimized, high I/O: I2, I3
• Dense storage: D2, H1
• GPU compute: P2, P3
• Graphics intensive: G2, G3
• FPGA: F1
• EC2 Bare Metal: direct access to physical server resources
We’re ready to build a cluster:
Using a compute cluster in the cloud
Shared File Storage
Cloud-based, scaling HPC cluster
Cluster head node with
job scheduler
Amazon S3
and
Amazon
Glacier
Thin or Zero Client
- No local data -
Template-based compute clusters
• Infrastructure as code: all the elements of a compute cluster are
defined in an AWS CloudFormation template.
• AWS executes the template and stands up the prescribed
infrastructure in 5-10 minutes.
• Come with popular job schedulers, shared NFS storage, etc.
• Install your applications & stage your data.
• Of course, you can customize it endlessly.
• You can create and destroy clusters at will. Creating a cluster can
become a 1-line action in your bash scripts.
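To make the "infrastructure as code" idea concrete, here is a minimal Python sketch that generates a CloudFormation template as JSON. It is an illustration under stated assumptions, not a production cluster template: the function `build_cluster_template`, the logical resource names, and the placeholder AMI ID are hypothetical, and real cluster templates (networking, schedulers, shared storage, Auto Scaling groups) are considerably larger.

```python
import json


def build_cluster_template(instance_type, node_count, image_id):
    """Return a minimal CloudFormation template as a JSON string.

    One head node plus `node_count` compute nodes, all plain EC2 instances.
    Logical names (HeadNode, ComputeNode0, ...) are illustrative only.
    """
    def instance():
        return {
            "Type": "AWS::EC2::Instance",
            "Properties": {"InstanceType": instance_type, "ImageId": image_id},
        }

    resources = {"HeadNode": instance()}
    for i in range(node_count):
        resources[f"ComputeNode{i}"] = instance()

    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative compute cluster (sketch only)",
        "Resources": resources,
    }
    return json.dumps(template, indent=2)


# "ami-PLACEHOLDER" is a stand-in, not a real image ID.
print(build_cluster_template("c5.large", 2, "ami-PLACEHOLDER"))
```

Saved to a file, a template like this could be launched with a single CLI call such as `aws cloudformation create-stack --stack-name my-cluster --template-body file://cluster.json` (the stack and file names are examples), which is the kind of 1-line action the slide refers to.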
Compute clusters in the cloud are elastic
[Figure: cluster size (c) versus time (t), shown for morning, afternoon, and evening]
Compute clusters in the cloud are fit for purpose
[Diagram: Amazon S3 (storage of input/output) connected to a CPU cluster (weather model) and a GPU cluster (visualization); instance types shown: C5, P3, R4]
Compute clusters in the cloud are ephemeral
[Diagram: Amazon S3 (storage of input/output)]
Tightly and loosely coupled workloads
Cluster HPC
Tightly coupled,
latency sensitive
applications
Use larger EC2
compute instances,
placement groups,
enhanced networking
Grid HPC
Loosely coupled, HTC,
pleasingly parallel
Use a variety of EC2
instances, multiple AZs,
Spot, Auto Scaling,
Amazon SQS
Ensemble?
Run all members at once!
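The "run all members at once" idea for loosely coupled ensembles can be sketched locally with Python's standard library. In this sketch a thread pool stands in for a fleet of independent instances, and `run_member` is a toy placeholder for a real model run; both are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_member(seed):
    """Toy stand-in for one ensemble member (a logistic-map iteration)."""
    x = (seed % 97 + 1) / 100.0  # initial condition derived from the seed
    for _ in range(1000):
        x = 3.7 * x * (1.0 - x)
    return x


def run_ensemble(seeds, max_workers=8):
    """Run all members concurrently; no member depends on another."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_member, seeds))


results = run_ensemble(range(16))
print(len(results))  # one result per ensemble member
```

Because no member communicates with another, the same pattern maps onto a work queue (the slide mentions Amazon SQS) feeding many instances across Availability Zones.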
Autoscaling
Automatic re-sizing of compute clusters based upon need
• Define minimum and maximum pool sizes and when scaling and cool down occurs.
• CloudWatch usage metrics, for example CPU utilization or job queue depth, trigger the auto-scaling policy.
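The scaling logic described here (pool limits, a cooldown, and a metric such as job queue depth) can be sketched as a small decision function. This is a simplified model, not the AWS Auto Scaling API: the names `desired_nodes` and `next_size` and all default values are hypothetical.

```python
import math


def desired_nodes(queued_jobs, jobs_per_node, min_nodes, max_nodes):
    """Nodes needed for the current queue depth, clamped to the pool limits."""
    needed = math.ceil(queued_jobs / jobs_per_node)
    return max(min_nodes, min(max_nodes, needed))


def next_size(current, queued_jobs, now_s, last_change_s,
              jobs_per_node=4, min_nodes=1, max_nodes=10, cooldown_s=300):
    """Return the new pool size, leaving it unchanged during the cooldown."""
    if now_s - last_change_s < cooldown_s:
        return current  # still cooling down after the last scaling action
    return desired_nodes(queued_jobs, jobs_per_node, min_nodes, max_nodes)


print(next_size(current=2, queued_jobs=17, now_s=400, last_change_s=0))  # 5
```

The cooldown prevents the pool from oscillating when the queue depth fluctuates quickly.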
Burst for fast results
# CPUs
time
# CPUs
time
Wall clock time: ~1 hour
Wall clock time: ~1 week
Cost: equal
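The "cost: equal" claim follows from simple arithmetic when billing is proportional to cores multiplied by time. The sketch below assumes perfect scaling and a made-up price per core-hour; both are simplifying assumptions.

```python
import math

HOURS_PER_WEEK = 7 * 24  # 168


def run_cost(cores, hours, price_per_core_hour):
    """Cost of a run billed purely per core-hour."""
    return cores * hours * price_per_core_hour


price = 0.05  # hypothetical price per core-hour
small_and_slow = run_cost(16, HOURS_PER_WEEK, price)      # ~1 week on 16 cores
large_and_fast = run_cost(16 * HOURS_PER_WEEK, 1, price)  # ~1 hour on 2,688 cores

print(math.isclose(small_and_slow, large_and_fast))  # True
```

In practice parallel efficiency is below 100%, so the burst run usually costs somewhat more; the scaling chart later in the deck addresses that trade-off.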
Using a compute cluster in the cloud
Self-scaling HPC clusters that are ready to compute within minutes, are billed by the hour, and
use the AWS Spot market by default, so they are very low cost.
• Popular scientific applications prepackaged.
• Deploys in ~5 minutes.
• Familiar job schedulers, scientific applications, and shared file system.
• Install any software you need.
• No job queues – it’s your personal cluster.
• Access to the graphical console.
• Scales as large as needed when you add jobs to the queue, and scales back down when the jobs are done.
Command Line (ssh) Graphical Console
NAMD example shown: http://docs.alces-flight.com/en/stable/getting-started/environment-usage/using-openfoam-with-alces-flight-compute.html
Create an Alces cluster from the AWS Marketplace
• Specify the details
of template
instantiation
• Called a “stack”
• Allows you to tailor
stack to needs
Schedulers
Your choice:
•SGE
•Slurm
•Torque
•OpenLava
•PBS Pro
•You have full rights so you can always install your favorite, custom scheduler
•Or skip it
Smart scaling:
•Scheduler knows how much work is waiting in the job queue
•Triggers expansion of compute fleet if needed (up to limit)
•Terminates idle compute nodes so you don’t pay for idle nodes.
Alces has 1,300+ applications
[alces@login1(myAWSomeHPCDemo) ~]$ alces gridware list
https://gridware.alces-flight.com/software
Performance for Fluid Dynamics on AWS
ANSYS Fluent
• AWS c4.8xlarge
• 140M cells
• F1 car CFD benchmark
http://www.ansys-blog.com/simulation-on-the-cloud/
Performance Considerations for HPC on AWS
Test using larger, real-world examples
• Use large cases for testing: do not benchmark scalability using only small examples
Domain decomposition
• Choose number of cells per core for either per-core efficiency or for faster results
Instance types
• C4 or M4 are best choices today
Network
• Use a placement group
• Enable enhanced networking
OS version
• Use Amazon Linux, or another distribution with Linux kernel 3.10 or later
Processor states and affinity
• Use P-states to reduce processor variability
• Use CPU affinity to pin threads to CPU cores
MPI libraries
• Intel MPI recommended
Hyper-threading
• Test with Hyper-threading on and off
• Usually off is best, but not always
“Best practices” references:
• Well architected for HPC paper
• HPC best practices paper
• RCP Researchers Handbook
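As a small illustration of the CPU-affinity advice, the following Python sketch pins the current process to a single core using `os.sched_setaffinity`, which is available on Linux only. The helper name `pin_to_core` is hypothetical, and real HPC codes usually set affinity through the MPI launcher or OpenMP runtime rather than by hand.

```python
import os


def pin_to_core(core=None):
    """Pin the current process to one CPU core and return the core used.

    Returns None on platforms without os.sched_setaffinity (e.g. macOS).
    If no core is given, the lowest-numbered allowed core is chosen.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = sorted(os.sched_getaffinity(0))  # 0 means "this process"
    target = allowed[0] if core is None else core
    os.sched_setaffinity(0, {target})
    return target


pinned = pin_to_core()
print("pinned to core:", pinned)
```

Pinning keeps a thread's data in the caches of one core, which reduces run-to-run timing variability.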
How do I get the lowest network latency?
[Diagram: three instances (172.31.1.7, 172.31.1.8, 172.31.1.9) in subnet 172.31.1.0/24 within network 172.31.0.0/18]
• Use a current instance type
– With network optimization
– EBS Optimized
• Use Enhanced Networking
• Launch with a placement group
• … important for the performance of
tightly-coupled HPC codes
http://www2.mmm.ucar.edu/wrf/OnLineTutorial/wrf_in_cloud_aws_tutorial.php
http://cloud-gc.readthedocs.io/en/latest/chapter02_beginner-tutorial/quick-start.html#quick-start-label
WRF Weather Prediction
WRF Scaling and Performance on AWS
Weather and climate models are popular workloads on AWS:
Researchers, businesses (The Weather Channel), financial sector, …
Cost versus solution time is what matters
[Chart: c4.8xlarge run time (s) and scale-up versus number of cores (0 to 350)]
It’s your choice:
• Lowest Cost
• Fastest run time
• Something in the middle?
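The trade-off between lowest cost and fastest run time can be made explicit once timings at several core counts are available. In the sketch below the timings and the price are invented for illustration; they are not values read from the chart.

```python
# Hypothetical measurements: cores -> wall-clock seconds (NOT from the chart).
timings = {36: 18000, 72: 9500, 144: 5200, 288: 3100}
price_per_core_hour = 0.05  # hypothetical


def cost(cores, seconds, price):
    """Cost of one run, billed per core-hour."""
    return cores * seconds / 3600 * price


costs = {c: cost(c, s, price_per_core_hour) for c, s in timings.items()}
cheapest = min(costs, key=costs.get)     # configuration with the lowest cost
fastest = min(timings, key=timings.get)  # configuration with the shortest run time

for c in sorted(timings):
    print(f"{c:4d} cores: {timings[c]:6d} s, ${costs[c]:.2f}")
print("lowest cost:", cheapest, "cores; fastest:", fastest, "cores")
```

With imperfect scaling, as in these made-up numbers, the cheapest run uses the fewest cores and the fastest run uses the most; anything in between is a valid choice.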
Clusters in the cloud: What we’ve seen so far
Cloud Clusters are ephemeral. You can destroy an old cluster and replace it
with a new cluster on a whim. This brings many advantages:
• You can script (automate) the creation of a cluster e.g. in a bash script
• Every user can have their own supercomputer – no more queues
• Software flexibility: you can install any of your custom codes
• You can choose the right server type for each job (e.g. CPU-constrained,
memory-constrained, GPU, …)
• You can let the number of compute servers fluctuate with the job queue
• You can scale the problem size to real-world dimensions because there’s
always room for n+1
• You can experiment and develop quickly (“agility”)
• You can get quicker results by using a larger cluster for a shorter time
• Every time AWS releases a new instance type, you benefit from a
performance boost (hardware refresh is easy and free!)
Homework
Sign up for the Researchers Handbook for AWS at aws.amazon.com/rcp . Browse data at https://registry.opend
1. Alces Flight compute cluster, NAMD tutorial: launch a “Performance Compute (SGE)” cluster at
https://launch.alces-flight.com/default/launch , wait for e-mail confirmation, then follow the tutorial at
http://docs.alces-flight.com/en/stable/getting-started/environment-usage/using-openfoam-with-alces-flight-compute.html
2. Containers + AWS Batch for DNA sequencing: https://github.com/awslabs/aws-batch-genomics
3. Containers – WRF Big Weather Web: www.bigweatherweb.org
4. Serverless Computing – PyWren: http://pywren.io/pages/gettingstarted.html
then https://github.com/pywren/examples/
5. SageMaker Machine Learning labs: files from https://bit.ly/2HhD2SG ; instructions at
https://github.com/wleepang/sagemaker4research-workshop ; further labs at
https://developmentseed.org/blog/2018/01/19/sagemaker-label-maker-case/ and
https://aws.amazon.com/blogs/machine-learning/simulate-quantum-systems-on-amazon-sagemaker/
Spot: ICRAR/CHILES finding neutral hydrogen galaxies
• A global radio astronomy consortium (led by Columbia University in New York)
needed to process observational data from the Very Large Array telescope in New
Mexico. Meeting a 12-hour SLA would have required ~$2 million of conventional HPC hardware,
which was impossible on a budget of only $50k.
• Using the EC2 Spot market in AWS’s Northern Virginia region, they were able to
deploy their HPC workload at a much larger scale, so they always beat their SLA,
whilst averaging only $1,200 per month of EC2 compute resources, well within
their 2-year budget of $50k.
• The project produced a major discovery: a neutral hydrogen galaxy at nearly
twice the redshift of the previous record holder.
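The budget claim can be checked with the figures quoted on the slide. The variable names below are illustrative.

```python
monthly_spot_spend = 1_200  # average EC2 Spot spend per month (from the slide)
months = 24                 # 2-year project
budget = 50_000             # total budget (from the slide)
hardware_quote = 2_000_000  # approximate conventional HPC hardware cost (from the slide)

total_spend = monthly_spot_spend * months
print(total_spend)                    # 28800
print(total_spend <= budget)          # True: well within the $50k budget
print(hardware_quote // total_spend)  # hardware would cost ~69 times as much
```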
AWS Compute Consumption Models
On-Demand: Pay for compute capacity by the second with no long-term commitments. For bursty workloads.
Reserved: Commit upfront to 1 year and receive a significant discount on the hourly charge. For steady utilization.
Spot: Bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand. For non-urgent, fault-tolerant workloads.
The “Spot Market” stretches your research budget $$
[Figure: # CPUs versus time, with the Spot Market shown as "our ultimate space filler".]
Spot Instances allow you to name your own price for spare AWS capacity
Spot Rules are Simple
Spot is a market in which the price
of compute changes based on
supply and demand
You’ll never pay more than your
bid. When the market exceeds your
bid you get 2 minutes to wrap up
your work. Time to checkpoint!
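The "2 minutes to wrap up" rule suggests a simple pattern: check for an interruption notice between work steps and save state as soon as one appears. The sketch below assumes the instance metadata path `spot/termination-time`, which returns a timestamp once an interruption is scheduled and HTTP 404 before that, and assumes the metadata service can be read without a session token. The `steps` and `save_checkpoint` callables are hypothetical placeholders for your own application.

```python
import urllib.error
import urllib.request

# Instance metadata path for the Spot interruption notice (assumed reachable
# without a session token; 404 until an interruption is scheduled).
TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"


def fetch_metadata(url, timeout=1.0):
    """Return the response body, or None if the path is absent or unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode()
    except (urllib.error.URLError, OSError):
        return None


def run_with_checkpoints(steps, save_checkpoint, fetch=fetch_metadata):
    """Run work steps in order; checkpoint and stop when a notice appears."""
    done = 0
    for step in steps:
        if fetch(TERMINATION_URL) is not None:
            save_checkpoint(done)   # hypothetical: persist progress, e.g. to S3
            return done
        step()
        done += 1
    save_checkpoint(done)
    return done
```

Passing `fetch` as a parameter keeps the loop testable off EC2. Each step should be short compared with the two-minute warning so the check runs often enough.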
Applications of the Spot Market
Many examples we saw earlier used
Spot EC2 instances:
•Fermilab (60,000 cores)
•Novartis
•Clemson (550,000 cores)
•New record (650,000+ cores)
•Discounts of 50%-90% are common
Many cluster tools use Spot
automatically:
•Alces Flight
•CfnCluster
•NICE EnginFrame
•CloudyCluster
•…
Spot is an excellent match for HTC / Grid Computing
•Fault tolerance
•Stateless
•Multi-AZ
•Loosely coupled
•Instance flexibility
But also often used for HPC (tightly coupled) workloads
Advanced Spot usage
Spot Fleet
•“Give me 400 cores - choose the
best Availability Zone and
instance types for me”
•You can select and weight
instance types
•AWS chooses cheapest compute
fleet for you
Spot Block
•Your spot instances are
guaranteed for up to 6 hours
•Slightly lower discount
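The Spot Fleet idea of "give me 400 cores, choose the cheapest option" can be illustrated with a small selection routine. This is a simplified stand-in for what the service does, not its actual algorithm. The pools, weights (cores per instance) and prices below are made-up illustrative values.

```python
import math


def cheapest_pool(pools, target_cores):
    """Pick the pool with the lowest price per core and size it to the target."""
    best = min(pools, key=lambda p: p["price_per_hour"] / p["cores"])
    count = math.ceil(target_cores / best["cores"])
    return {
        "instance_type": best["instance_type"],
        "availability_zone": best["availability_zone"],
        "instances": count,
        "hourly_cost": round(count * best["price_per_hour"], 2),
    }


# Hypothetical Spot prices (USD/hour); "cores" is the weight of each instance type.
pools = [
    {"instance_type": "c4.8xlarge", "availability_zone": "us-east-1a", "cores": 36, "price_per_hour": 0.50},
    {"instance_type": "m4.16xlarge", "availability_zone": "us-east-1b", "cores": 64, "price_per_hour": 0.80},
    {"instance_type": "c4.4xlarge", "availability_zone": "us-east-1c", "cores": 16, "price_per_hour": 0.25},
]

choice = cheapest_pool(pools, target_cores=400)
print(choice)
```

Weighting by cores is what makes different instance sizes comparable: the routine compares price per core, not price per instance.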
Using Containers
Elastic Container Service for Kubernetes
Using Containers
AWS Batch – a managed service for container-based jobs
Tutorial using AWS Batch for DNA sequencing: https://github.com/awslabs/aws-batch-genomics
Container Based: Each job is a Docker container with runtime parameters. Submit
tens to millions of jobs to a queue, with priority and job dependency options.
Fully Managed: No software to install or servers to manage. AWS Batch provisions,
manages, and scales the infrastructure needed to run the jobs.
Cost Optimization: Use Spot Instances or Reserved Instances to get the most research
possible out of your research budget.
[Figure: workflow of jobs “Job-A” through “Job-E”, including array jobs B (B:0 … B:9), C (C:0 … C:9999) and D (D:0 … D:9999). Stage labels: Setup, Heavy Network I/O, CPU Intensive, Large Memory, Cleanup.]
Using Containers
DNA Sequencing
Building High-Throughput Genomics Batch Workflows on AWS
https://aws.amazon.com/blogs/compute/building-high-throughput-genomics-batch-workflows-on-aws-introduction-part-1-
AWS Batch Concepts
•Job Queue
•Compute Environments
•Job Definitions
•Jobs
• Single jobs vs array jobs
•Scheduler
Job Queues
Jobs are submitted to a Job Queue, where they reside until they can be
scheduled onto a compute resource. Information related to completed jobs
persists in the queue for 24 hours.
$ aws batch create-job-queue --job-queue-name genomics \
    --priority 500 --compute-environment-order ...
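The same request can be expressed as parameters for the AWS Batch `CreateJobQueue` API. The sketch below only builds the parameter dictionary. The compute environment names `spot-ce` and `ondemand-ce` are hypothetical, and sending the request assumes boto3 is installed and credentials are configured.

```python
def job_queue_request(name, priority, compute_environments):
    """Build keyword arguments for the AWS Batch CreateJobQueue call."""
    return {
        "jobQueueName": name,
        "state": "ENABLED",
        "priority": priority,
        # Compute environments are tried in ascending "order".
        "computeEnvironmentOrder": [
            {"order": position, "computeEnvironment": ce}
            for position, ce in enumerate(compute_environments, start=1)
        ],
    }


# Hypothetical compute environment names, listed in order of preference.
request = job_queue_request("genomics", 500, ["spot-ce", "ondemand-ce"])
print(request)

# To send it (requires boto3 and AWS credentials, not assumed here):
#   import boto3
#   boto3.client("batch").create_job_queue(**request)
```

Listing a Spot environment first and an On-Demand one second is one way to prefer cheap capacity while keeping a fallback.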
Compute Environments
Job queues are mapped to one or more Compute Environments containing
the EC2 instances used to run containerized batch jobs.
Managed compute environments enable you to describe your business
requirements (instance types, min/max/desired vCPUs, and EC2 Spot bid as
a % of On-Demand) and we launch and scale resources on your behalf.
You can choose specific instance types (e.g. c4.8xlarge), instance families
(e.g. C4, M4, P3), or simply choose “optimal” and AWS Batch will launch
appropriately sized instances from our latest C/M/R instance families.
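As a rough illustration, the settings named above (instance types, min/max/desired vCPUs, Spot bid as a percentage of On-Demand) map onto the `computeResources` block of the `CreateComputeEnvironment` API. The sketch builds that request as a plain dictionary. Every name, subnet, security group and role value passed in is a hypothetical placeholder.

```python
def managed_spot_environment(name, max_vcpus, bid_percentage, subnets,
                             security_groups, instance_role,
                             spot_fleet_role, service_role):
    """Build keyword arguments for a managed, Spot-backed compute environment."""
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "serviceRole": service_role,
        "computeResources": {
            "type": "SPOT",
            "minvCpus": 0,                    # scale to zero when the queue is empty
            "maxvCpus": max_vcpus,
            "desiredvCpus": 0,
            "instanceTypes": ["optimal"],     # let AWS Batch choose from C/M/R families
            "bidPercentage": bid_percentage,  # Spot bid as a % of the On-Demand price
            "subnets": subnets,
            "securityGroupIds": security_groups,
            "instanceRole": instance_role,
            "spotIamFleetRole": spot_fleet_role,
        },
    }


# All identifiers below are placeholders, not real resources.
request = managed_spot_environment(
    name="spot-ce",
    max_vcpus=1024,
    bid_percentage=60,
    subnets=["subnet-EXAMPLE"],
    security_groups=["sg-EXAMPLE"],
    instance_role="ecsInstanceRole",
    spot_fleet_role="arn:aws:iam::123456789012:role/EXAMPLE-spot-fleet-role",
    service_role="arn:aws:iam::123456789012:role/EXAMPLE-batch-service-role",
)
print(request["computeResources"]["bidPercentage"])
```

With boto3 and credentials available, the dictionary could be passed as `boto3.client("batch").create_compute_environment(**request)`.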
Compute Environments
Alternatively, you can launch and manage your own resources within an
Unmanaged compute environment. Your instances need to include the ECS agent
and run supported versions of Linux and Docker.
AWS Batch will then create an Amazon ECS cluster which can accept the
instances you launch. Jobs can be scheduled to your Compute Environment as
soon as your instances are healthy and register with the ECS Agent.
$ aws batch create-compute-environment \
    --compute-environment-name unmanagedce --type UNMANAGED ...
Job Definitions
Similar to ECS Task Definitions, AWS Batch Job Definitions specify how
jobs are to be run. While each job must reference a job definition, many
parameters can be overridden.
Some of the attributes specified in a job definition:
•IAM role associated with the job
•vCPU and memory requirements
•Retry strategy
•Mount points
•Container properties
•Environment variables
$ aws batch register-job-definition --job-definition-name gatk \
    --container-properties ...
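To make the attribute list concrete, here is a sketch of how those attributes appear in a `RegisterJobDefinition` request, together with per-job overrides as they would be passed to `SubmitJob`. The image name, command, environment variable and role ARN are hypothetical placeholders.

```python
def job_definition_request(name, image, vcpus, memory_mib, command,
                           environment=None, attempts=1, job_role_arn=None):
    """Build keyword arguments for the AWS Batch RegisterJobDefinition call."""
    container = {
        "image": image,
        "vcpus": vcpus,
        "memory": memory_mib,          # MiB
        "command": command,
        "environment": [
            {"name": key, "value": value}
            for key, value in sorted((environment or {}).items())
        ],
    }
    if job_role_arn:
        container["jobRoleArn"] = job_role_arn   # IAM role associated with the job
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": container,
        "retryStrategy": {"attempts": attempts},
    }


# Placeholder image, command and role; only the job definition name comes from the slide.
definition = job_definition_request(
    name="gatk",
    image="example/gatk:latest",
    vcpus=4,
    memory_mib=16000,
    command=["run-variant-calling.sh"],
    environment={"REFERENCE": "hg38"},
    attempts=3,
    job_role_arn="arn:aws:iam::123456789012:role/EXAMPLE-job-role",
)

# Per-job overrides, as they would be passed to submit_job(containerOverrides=...).
overrides = {"vcpus": 8, "memory": 32000}
print(definition["containerProperties"]["vcpus"], "->", overrides["vcpus"])
```

The definition holds defaults shared by every job, while overrides let a single large sample request more vCPUs or memory without registering a new definition.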
Jobs
Jobs are the unit of work executed by AWS Batch as containerized
applications running on Amazon EC2.
Containerized jobs can reference a container image, command, and
parameters or users can simply provide a .zip containing their application
and we will run it on a default Amazon Linux container.
$ aws batch submit-job --job-name variant-calling \
    --job-definition gatk --job-queue genomics
Easily run massively parallel jobs
Instead of submitting a large number of independent “simple jobs”, we also
support “array jobs” that run many copies of an application against an array
of elements.
Array jobs are an efficient way to run:
•Parametric sweeps
•Monte Carlo simulations
•Processing a large collection of objects
[Figure: Job A, array job B (children Job B:0, Job B:1, …, Job B:n), and Job C.]
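Inside an array job, each child can work out which element it owns from the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable that AWS Batch sets for array children. A minimal sketch for a parametric sweep follows. The sweep values are invented for illustration, and the fallback to index 0 is an assumption made so the script also runs locally.

```python
import os


def element_for_this_child(elements, environ=os.environ):
    """Return (index, element) for the array child running this code."""
    # Falls back to index 0 when run outside AWS Batch (assumption for local testing).
    index = int(environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    return index, elements[index]


# Hypothetical parameter sweep: ten values from 0.1 to 1.0.
sweep = [round(0.1 * k, 1) for k in range(1, 11)]

index, parameter = element_for_this_child(sweep)
print(f"child {index} handles parameter {parameter}")
```

The array size itself is given at submission time, so one `submit-job` call with an array size of 10 would fan out to ten children that each pick a different `parameter`.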
Workflows, Pipelines, and Job Dependencies
Jobs can express a dependency on the successful
completion of other jobs or specific elements of an array
job.
Use your preferred workflow engine and language to
submit jobs. Flow-based systems simply submit jobs
serially, while DAG-based systems submit many jobs at
once, identifying inter-job dependencies.
$ aws batch submit-job --depends-on jobId=606b3ad1-aa31-48d8-92ec-f154bfc8215f ...
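A DAG-based submission can be sketched as follows: walk the workflow in topological order and pass each job the IDs of its parents in the `dependsOn` format that `SubmitJob` accepts. The `submit` callable and the job names are hypothetical. In practice `submit` would wrap a call such as boto3's `submit_job` and return the new job's ID.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def submit_dag(dag, submit):
    """Submit every job after its parents, wiring up inter-job dependencies.

    dag maps each job name to the list of job names it depends on.
    submit(name, depends_on) must submit the job and return its job ID.
    """
    job_ids = {}
    for name in TopologicalSorter(dag).static_order():
        depends_on = [{"jobId": job_ids[parent]} for parent in dag.get(name, [])]
        job_ids[name] = submit(name, depends_on)
    return job_ids


def fake_submit(name, depends_on):
    """Stand-in for a real submission; prints the dependencies it would send."""
    print(name, "depends on", [d["jobId"] for d in depends_on])
    return f"id-{name}"


# Hypothetical genomics pipeline: setup -> align -> call-variants -> cleanup.
workflow = {
    "align": ["setup"],
    "call-variants": ["align"],
    "cleanup": ["call-variants"],
}
ids = submit_dag(workflow, fake_submit)
```

All jobs are submitted up front, and the scheduler holds each one until the jobs it depends on have succeeded.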
Serverless Computing: AWS Lambda
Bring your own code
• Node.js, Java, Python
• Java = any JVM-based language such as Scala, Clojure, etc.
• Bring your own libraries
Flexible invocation paths
• Event or RequestResponse invoke options
• Existing integrations with various AWS services
Simple resource model
• Select memory from 128MB to 1.5GB in 64MB steps
• CPU & network allocated proportionately to RAM
• Reports actual usage
Fine-grained permissions
• Uses IAM role for Lambda execution permissions
• Uses resource policy for AWS event sources
AWS Lambda is a service that lets software functions written in a variety of languages be deployed
natively in the cloud and triggered directly or by events in the cloud. The infrastructure
(hardware, operating system and software environment) for Lambda is managed by AWS and scales rapidly.
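For orientation, a Python Lambda function is just a handler that receives an event and a context object. The sketch below uses a made-up event shape (a list under the key `values`) and calls the handler locally. In Lambda, the service supplies both arguments.

```python
import json


def handler(event, context):
    """Entry point that AWS Lambda calls once per invocation."""
    # Hypothetical event shape: {"values": [numbers...]}
    values = event.get("values", [])
    mean = sum(values) / len(values) if values else None
    return {"count": len(values), "mean": mean}


# Local check only; no AWS resources are involved.
print(json.dumps(handler({"values": [1.0, 2.0, 4.5]}, None)))
```

The same function could be invoked synchronously (RequestResponse) or wired to an event source. The handler code does not change between the two.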
Two examples of HPC on Lambda
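import numpy as np
import pywren

# Function executed once per input value, each call in its own Lambda invocation
def my_function(b):
    x = np.random.normal(0, b, 1024)
    A = np.random.normal(0, b, (1024, 1024))
    return np.dot(A, x)

# Map the function over 1000 values of b using the default PyWren executor
pwex = pywren.default_executor()
res = pwex.map(my_function, np.linspace(0.1, 100, 1000))
PyWren.io
PyWren lets you run your existing Python code at massive scale via AWS Lambda
CSIRO have built quickly scaling genomics analysis on AWS Lambda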
IoT Real-time Flood Mapping with PetaBencana.id
Critical Web Services for Emergency Management
● Custom interface for Emergency
Control Room
● Real time flood data entered into
system via web interface and sourced
from Twitter
● IoT water level sensing devices, to
cheaply increase the monitoring
across the waterway network in
Jakarta using AWS IoT services
Machine Learning Process is Hard…
Fetch data → Clean & format data → Prepare & transform data → Train model → Evaluate model → Integrate with prod → Monitor / debug / refresh
Data wrangling
• Set up and manage notebook environments
• Get data to notebooks securely
Experimentation
• Set up and manage clusters
• Scale/distribute ML algorithms
Deployment
• Set up and manage inference clusters
• Manage and auto scale inference APIs
• Testing, versioning, and monitoring
The Amazon AI/ML Stack
APPLICATION SERVICES: Rekognition, Transcribe, Translate, Polly, Comprehend, Lex
PLATFORM SERVICES: Amazon SageMaker, AWS DeepLens
FRAMEWORKS & INTERFACES: Caffe2, CNTK, Apache MXNet, PyTorch, TensorFlow, Torch, Keras, Gluon; AWS Deep Learning AMIs
INFRASTRUCTURE: CPU, GPU (P3), IoT & Edge, Mobile
Amazon SageMaker
1. Notebook Instances
2. Algorithms
3. ML Training Service
4. ML Hosting Service
1. Notebook Instances
Zero Setup For Exploratory Data Analysis
Authoring & Notebooks
ETL Access to AWS Database services
Access to S3 Data Lake
• Recommendations/Personalization
• Fraud Detection
• Forecasting
• Image Classification
• Churn Prediction
• Marketing Email/Campaign Targeting
• Log processing and anomaly detection
• Speech to Text
• More…
“Just add data”
2. Algorithms
Amazon SageMaker: algorithms
Training code
Amazon provided Algorithms
• Matrix Factorization
• Regression
• Principal Component Analysis
• K-Means Clustering
• Gradient Boosted Trees
• And More!
Bring Your Own Script (SM builds the Container)
SM Estimators in Apache Spark
Bring Your Own Algorithm (You build the Container)
Streaming datasets, for cheaper training
Train faster, in a single pass
Greater reliability on extremely large datasets
3. ML Training Service
Managed Distributed Training with Flexibility
Training code
Amazon provided Algorithms
• Matrix Factorization
• Regression
• Principal Component Analysis
• K-Means Clustering
• Gradient Boosted Trees
• And More!
Bring Your Own Script (SM builds the Container)
SM Estimators in Apache Spark
Bring Your Own Algorithm (You build the Container)
Fetch Training data
Save Model Artifacts
Save Inference Image
Amazon ECR
Fully managed – Secured
CPU, GPU, HPO
4. ML Hosting Service
Easy Model Deployment to Amazon SageMaker
Model versions: Versions of the same inference code saved in inference containers. Prod is the primary one, 50% of the traffic must be served there!
[Figure: Model Artifacts and Inference Images (Amazon ECR, Amazon Provided Algorithms) feed ProductionVariants with traffic weights 30, 50, 10 and 10, grouped in an EndpointConfiguration behind a One-Click Inference Endpoint in Amazon SageMaker.]
ProductionVariant
InstanceType: c3.4xlarge
InitialInstanceCount: 3
ModelName: prod
VariantName: primary
InitialVariantWeight: 50
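The share of traffic each ProductionVariant receives is its weight divided by the sum of all variant weights. The sketch below works that out for the weights shown on the slide (30, 50, 10, 10). Only the `primary` variant and `prod` model are named on the slide, so the other variant and model names are hypothetical.

```python
def traffic_shares(variants):
    """Return each variant's fraction of traffic: weight / sum of weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# "primary"/"prod" come from the slide; the other names are made up.
variants = [
    {"VariantName": "primary", "ModelName": "prod", "InitialVariantWeight": 50},
    {"VariantName": "candidate-a", "ModelName": "model-a", "InitialVariantWeight": 30},
    {"VariantName": "candidate-b", "ModelName": "model-b", "InitialVariantWeight": 10},
    {"VariantName": "candidate-c", "ModelName": "model-c", "InitialVariantWeight": 10},
]

shares = traffic_shares(variants)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Because the weights are relative, 5/3/1/1 would give the same split. Using values that sum to 100 simply makes them readable as percentages.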
SageMaker: under the hood
[Figure: Amazon SageMaker architecture diagram with components: Amazon ECR (training code, inference code, helper code), Model Training (on EC2), Model Hosting (on EC2), training data, model artifacts, ground truth, client application, and an Inference Endpoint handling inference requests and inference responses.]
AWS Global Infrastructure
18 Regions – 52 Availability Zones – 100+ Edge Locations
Region & Number of Availability Zones
US East: N. Virginia (6), Ohio (3)
US West: Oregon (3), Northern California (3)
AWS GovCloud: (US-West) (2)
Canada: Central (2)
South America: São Paulo (3)
EU: Ireland (3), Frankfurt (2), London (2)
Asia Pacific: Singapore (2), Sydney (2), Tokyo (3), Seoul (2), Mumbai (2)
China: Beijing (2)
Announced Regions
China, France, Hong Kong, Sweden, Bahrain, and a second AWS GovCloud Region in the US.
From You to AWS
[Figure: network paths from "This is YOU!" to AWS: Regional Network, Internet2 Research & Education Network, Public Bilateral Peering, Commercial Transit, Internet2 Transit Rail, Commercial Peering Service, Privately Owned or Carrier Network, AWS Direct Connect Location.]
Amazon Virtual Private Cloud (VPC)
[Figure: Root Account (Payer) with accounts Sandbox, Central IT, Researcher, Department.]
Virtual Private Cloud
A VPC is a virtual network within an AWS
Region where you can launch resources
You define:
• Address space (RFC 1918)
• Network subnets
• Route tables
• Firewall and ACL rules
• Internet connectivity
Peer multiple VPCs across one or more
accounts
Extend your on-premises network into
AWS
View the “From One To Many: Evolving VPC
Design” video
[Figure: VPC 10.0.0.0/16 containing subnets 10.0.0.0/24 and 10.0.1.0/24.]
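The address plan in this example (a 10.0.0.0/16 VPC with two /24 subnets) can be sanity-checked with Python's standard `ipaddress` module before anything is created in AWS. This is a planning sketch only and does not call any AWS API.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = [
    ipaddress.ip_network("10.0.0.0/24"),
    ipaddress.ip_network("10.0.1.0/24"),
]

# The VPC range should be private (RFC 1918) address space.
print("VPC is private address space:", vpc.is_private)

# Every subnet must fall inside the VPC range (subnet_of needs Python 3.7+).
print("Subnets inside VPC:", all(s.subnet_of(vpc) for s in subnets))

# Subnets within one VPC must not overlap each other.
print("Subnets overlap:", subnets[0].overlaps(subnets[1]))

# Address counts per block, before any addresses reserved by AWS in each subnet.
print("Addresses in VPC:", vpc.num_addresses)
print("Addresses per subnet:", subnets[0].num_addresses)
```

A /16 leaves room for up to 256 such /24 subnets, which matters when the same VPC later needs separate public, private and per-Availability-Zone subnets.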
Standard Architecture Deployed by AWS Quick Start
[Figure: AWS account with CloudTrail, AWS Config, CloudWatch Alarms, and an archive logs bucket with S3 lifecycle policies to Glacier. The Production VPC spans us-east-1b and us-east-1c, with DMZ subnets (proxies, NAT) and private subnets (RDS DB).]
Quick Start Design with Management, Production, and Notional Development VPCs
[Figure: Management VPC across us-east-1b and us-east-1c with NAT and Bastion, potentially also used for security appliances for monitoring, logging, etc. It connects to users and, through VPC peering, to the other VPCs, one of them marked NOTIONAL. Account-level services: CloudTrail, AWS Config Rules, CloudWatch Alarms, and an archive logs bucket with S3 lifecycle policies to Glacier.]
Common Scenarios
[Figure: network paths from "This is YOU!" to AWS: Regional Network, Internet2 Research & Education Network, Public Bilateral Peering, Commercial Transit, Internet2 Transit Rail, Commercial Peering Service, Privately Owned or Carrier Network, AWS Direct Connect Location.]
Scenario 1 - Commercial Transit
[Figure: "This is YOU!" reaching AWS over Commercial Transit.]
• PROS
• Readily available
• Multiple redundant paths built in
• CONS
• At the mercy of the Internet
• Public
• Higher data egress cost
Scenario 3 - Regional Network - Peering
[Figure: "This is YOU!" reaching AWS through a Regional Network (CENIC) and Public Bilateral Peering.]
Scenario 4 - Internet2 Research & Education Network
[Figure: "This is YOU!" reaching AWS through a Regional Network and the Internet2 Research & Education Network.]
• PROS
• Multiple redundant paths built in
• FAST!!!
• CONS
• Shared infrastructure
• Semi-public
• Higher data egress cost (AWS sees it as out
over the Internet)
Use Internet2 to Access Your Workloads
[Figure: map of AWS and Internet2 network peering locations and bandwidth (80 Gbps and 20 Gbps) for AWS Regions us-west-2 and us-east-1.]
Scenario 5 - AWS Direct Connect
[Figure: "This is YOU!" reaching AWS through a Privately Owned or Carrier Network and an AWS Direct Connect Location.]
• PROS
• Private circuit
• Low latency
• Lower data egress costs
• CONS
• Longer setup times at start
• Can be more expensive to do
redundantly
• Need network engineering help
AWS Direct Connect
• Dedicated, private connection into AWS
• 1 Gbps / 10 Gbps
• Smaller options through partners
• Create private (VPC) or public virtual interfaces to AWS
• Consistent network performance
• Option for redundant connections
• Multiple AWS accounts can share a connection
• Uses BGP to exchange routing information over a VLAN
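To put the 1 Gbps and 10 Gbps port speeds in perspective, the sketch below estimates ideal transfer times for a dataset. The 10 TB dataset size is a made-up example, and the calculation assumes the link is fully used with no protocol overhead, so real transfers will take longer.

```python
def transfer_hours(size_terabytes, link_gbps, efficiency=1.0):
    """Ideal time in hours to move a dataset over a link of the given speed."""
    bits = size_terabytes * 1e12 * 8                    # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)     # efficiency < 1 models overhead
    return seconds / 3600


dataset_tb = 10  # hypothetical dataset size
for gbps in (1, 10):
    print(f"{dataset_tb} TB over {gbps} Gbps: {transfer_hours(dataset_tb, gbps):.1f} hours")
```

Lowering `efficiency` (for example to 0.7) gives a more conservative estimate when the circuit is shared with other traffic.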
Direct Connect Locations (US)
[Map legend: AWS Region; Direct Connect location(s)]
AWS Regions: Oregon, N. Virginia, GovCloud, N. California, Ohio
Direct Connect locations: New York City, Dallas, Chicago, Reston, Ashburn, Seattle, Las Vegas, Los Angeles, Santa Clara, San Jose, Portland
Scenario n:
Use them all!
Common Scenarios
[Figure: network paths from "This is YOU!" to AWS: Regional Network, Internet2 Research & Education Network, Public Bilateral Peering, Commercial Transit, Internet2 Transit Rail, Commercial Peering Service, Privately Owned or Carrier Network, AWS Direct Connect Location.]
Thank You
1. Register and enroll in the AWS Research Cloud Program: https://aws.amazon.com/rcp
2. Launch your own personal cluster using Alces Flight: http://alces-flight.com/community
jorissen@amazon.com

Amazon Cloud Resources as Part of Scientific Workflows & HPC - Kevin Jorissen

  • 1. Dr. Kevin Jorissen, AWS Research Computing Amazon cloud resources as part of scientific workflows & HPC Sept 13, 2018
  • 2. Kevin Jorissen Seattle Kevin has 10 years of experience in computational science, and holds a Ph.D. in Physics. He developed codes solving the quantum physics equations for light absorption by materials, taught workshops to scientists worldwide, and wrote about high performance computing in the cloud before it was fashionable. He worked as a postdoctoral researcher in Antwerp, Lausanne, Seattle, and Zurich. He contributed to the WIEN2k code (Density Functional Theory calculations of material properties, www.wien2k.at) and the FEFF code (X-ray and Electron absorption spectra, www.feffproject.org). Kevin joined Amazon in 2015 to help accelerate the adoption of cloud computing in the scientific community globally. BIO
  • 3. Hot off the presses: WWPS AWS Summit Real-Time Machine Learning on Satellite Imagery: How DigitalGlobe Uses Amazon SageMaker to Massively Scale-up Information Extraction from Satellite Imagery Using AWS and Open Data to Meet the Demands of Disaster Response Situations Transitioning Geoscience Research to the Cloud: Opportunities and Challenges AWS Public Datasets: Learnings from Staging Petabytes of Data for Analysis in AWS Enabling Sustainable Research Platforms in the Cloud Enabling Research using Hybrid HPC Cloud Computing Precision Medicine on the Cloud Transforming Research in Collaboration with Funding Agencies Enabling Research using Hybrid HPC Cloud Computing Innovation on the Edge: How Rapid Experimentation with Technology is Achieving Results in the Enterprise With NASA JPL Accelerating Analytics for the Future of Genomics Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Data Lake Ingestion Empowering Every Brain! How Brain Power is using AWS-Powered AI in their Mission to Help People with Autism and Other Brain-Related Challenges … Now available at https://www.youtube.com/user/AmazonWebServices/videos
  • 4. Processing and Streaming GOES-16 Data with AWS Managed Services Hot off the presses: WWPS AWS Summit
  • 5. Machine learning outputs Professional mapper tracing The final output of high-voltage grids DevelopmentSeed: Machine Learning with Earth Observation Imagery Hot off the presses: WWPS AWS Summit
  • 6. Key Strengths of AWS for Scientific Discoveries Time to discovery • Availability of resources, scalability, right-sizing • Experiment fast • Avoid undifferentiated work
  • 7. Availability of resources: We’re off to a cute start …
  • 8. Right-sized resources: Genomics processing on FPGA Accelerators Children’s Hospital of Philadelphia and Edico Genome Achieve Fastest-Ever Analysis of 1,000 Genomes Orlando, Fla., Oct 19, 2017 – The Children’s Hospital of Philadelphia (CHOP) and Edico Genome today set a new scientific world standard in rapidly processing whole human genomes into data files usable for researchers aiming to bring precision medicine into mainstream clinical practice. Utilizing Edico Genome’s DRAGEN™ Genome Pipeline, deployed on 1,000 Amazon EC2 F1 instances on the Amazon Web Services (AWS) Cloud, 1,000 pediatric genomes were processed in two hours and 25 minutes. … Available in “AWS App Store” (AWS Marketplace) for ~$24 / genome
  • 9. Moving quickly with managed services DNA Sequencing using AWS container services
  • 10. RFP != INNOVATION What if you could … • Access FPGA servers today instead of waiting for a procurement? • See if your code runs better on modern GPUs (and tear it down again immediately if it doesn’t?) • Spin up a compute cluster now instead of waiting to apply for an allocation? • Rearchitect a science application into serverless or microservices in a few weeks, because AWS does the undifferentiated heavy lifting? • Let your collaborators try out a new ML algorithm on your data, without having to ask you for resources?
  • 11. Key Strengths of AWS for Scientific Discoveries Time to discovery • Availability of resources, scalability, right-sizing • Experiment fast • Avoid undifferentiated work Collaboration • Data lake model • Security & compliance • Sharing • Infrastructure, ML, Analytics
  • 12. Collaborating on scientific data in the cloud (diagram; services shown: Athena, Glue)
  • 13. Collaborating on scientific data in the cloud: NOAA NEXRAD on Amazon S3. Usage increased 2.3x; greater scientific impact.
  • 14. NIH initiatives: National Cancer Institute – Cloud Resources. http://www.cancergenomicscloud.org Funded projects to create collaborative environments in the cloud.
  • 15. NASA Image and Video Library (2017). https://aws.amazon.com/partners/success/nasa-image-library/ • Easy Access to the Wonders of Space. Fully compliant with Section 508 of the Rehabilitation Act. • Built-in Scalability. “On-demand scalability will be invaluable for events such as the solar eclipse that’s happening later this summer – both as we upload new media and as the public comes to view that content,” says Bryan Walls, Imagery Experts Deputy Program Manager at NASA. • Good Use of Taxpayer Dollars. By building its Image and Video Library in the cloud, NASA avoided the costs associated with deploying and maintaining server and storage hardware in-house. Instead, the agency can simply pay for the AWS resources it uses at any given time.
  • 16. U.K. Met Office Uses AWS to Deliver Tailored Meteorological Data. The Met Office has been a widely respected national weather service in the United Kingdom for 160 years. “We are using the AWS Cloud to drive the mass-market availability of customizable weather information.” – James Tomkins, Head of Enterprise IT Architecture, Met Office. • Needed the means to send weather data to device users and third-party customers. • Deployed Amazon ElastiCache to respond to peak demands. • Attracted more than half a million users with its WeatherCloud app. • Scaled data storage tenfold and reduced solution costs by 50 percent. • Enabled innovation of big data services in a competitive landscape. https://aws.amazon.com/solutions/case-studies/the-met-office/ https://aws.amazon.com/about-aws/whats-new/2017/08/uk-met-office-high-resolution-weather-forecast-data-is-now-on-aws/
  • 17. Public Datasets on AWS. To stimulate innovation, AWS hosts a selection of datasets that anyone can access for free. Data in our public datasets is available for rapid access to our flexible and low-cost computing resources. Earth Science: • Landsat • NEXRAD • NASA NEX. Life Science: • TCGA & ICGC (used at OICR) • 1000 Genomes • Genome in a Bottle • Human Microbiome Project • 3000 Rice Genome. Internet Science: • Common Crawl Corpus • Google Books Ngrams • Multimedia Commons. https://aws.amazon.com/public-datasets/
  • 18. Registry of Open Data on AWS (RODA): https://registry.opendata.aws Register YOUR datasets here and tell people how to use them. Browse what your peers are sharing.
  • 19. Getting started: AWS Researcher’s Handbook. The 150-page missing manual for science in the cloud. Written by Amazon’s Research Computing community for scientists. • Explains foundational concepts about how AWS can accelerate time-to-science in the cloud. • Step-by-step best practices for securing your environment to ensure your research data is safe and your privacy is protected. • Tools for budget management that help you control your spending, limit costs, and prevent overruns. • Catalogue of scientific solutions from partners chosen for their outstanding work with scientists. aws.amazon.com/rcp
  • 20. Getting started: institutional solutions @ Emory. https://edscoop.com/emory-university-research-aws-cloud-rich-mendola
  • 21. RISELab (Real-time Intelligent Secure Execution). Collaborative 5-year effort (2017-2021) between UC Berkeley, the National Science Foundation, and industry partners; AWS is a founding partner. https://riselab.cs.berkeley.edu • Students and researchers at RISELab use AWS to rapidly prototype and develop new systems at a scale and with a speed not possible before. • Resulted in Apache Spark, developed on AWS, and integrated with AWS core services. (Figure: Berkeley Data Analytics Stack)
  • 22. TL;DR (takeaway messages) • AWS collaborates with the research community on the biggest research challenges • The AWS cloud has scale, services, and data capabilities beyond research cyberinfrastructure • The AWS Research Cloud Program eases the on-ramp
  • 23. Why AWS: Scale and Elasticity
  • 24. HPC Workloads in the Cloud: • Life Sciences • Financial Services • Energy & Geo Sciences • Design & Engineering • Media & Entertainment • Autonomous Vehicles
  • 25. Elasticity: Natural Language Processing at Clemson University, 550,000 cores using EC2 Spot Instances. https://aws.amazon.com/blogs/aws/natural-language-processing-at-clemson-university-1-1-million-vcpus-ec2-spot-instances/ “I am absolutely thrilled with the outcome of this experiment. The graduate students on the project […] used resources from AWS and Omnibond and developed a new software infrastructure to perform research at a scale and time-to-completion not possible with only campus resources.” – Prof. Amy Apon, Co-Director of the Complex Systems, Analytics and Visualization Institute. The “spot market”: cheap AWS computing, a good fit for research.
  • 26. Elasticity: Natural Language Processing at Clemson University, 550,000 cores & EC2 Spot Instances. https://aws.amazon.com/blogs/aws/natural-language-processing-at-clemson-university-1-1-million-vcpus-ec2-spot-instances/ “I am absolutely thrilled with the outcome of this experiment. The graduate students on the project are amazing. They used resources from AWS and Omnibond and developed a new software infrastructure to perform research at a scale and time-to-completion not possible with only campus resources. Per-second billing was a key enabler of these experiments.” – Prof. Amy Apon, Co-Director of the Complex Systems, Analytics and Visualization Institute. Browse data at https://registry.opendata.aws https://blogs.univa.com/2018/06/univa-demonstrates-extreme-scale-automation-by-deploying-more-than-one-million-cores-in-a-single-univa-grid-engine-cluster-using-aws/ 2 months ago – 500,000 cores
  • 27. BNL/Fermilab and High Throughput Computing at Scale. High Energy Physics: • Discovery of the Higgs Boson Particle • Added 58,000 Spot Cores Elastically • Monte Carlo Simulations Searching for Particles • Reduced workload from 6 weeks to 10 days. Source: https://aws.amazon.com/blogs/aws/experiment-that-discovered-the-higgs-boson-uses-aws-to-probe-nature/
  • 28. ATLAS Architecture with AWS
  • 29. ATLAS Architecture + West (diagram; labels: West, Collision Data, Data, EC2 instances, Results)
  • 30. NASA – Climate Research • Mosaicking 2,500+ QuickBird satellite images into 100-kilometer (km) x 100-km tiles, which are then broken into 25-km x 25-km sub-tiles for processing. • Orthorectifying and mosaicking all satellite data in ADAPT. • Identifying trees and shrubs using adaptive vegetation classifier algorithms. Estimating biomass. Incorporating algorithms to calculate tree and shrub height for biomass estimates. The combined resources of ADAPT and AWS potentially reduce total processing time from 10 months to less than 1 month. Source: https://www.nas.nasa.gov/SC15/demos/demo31.html
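The tiling step on this slide can be sketched in a few lines. This is only an illustration of the arithmetic (a 100 km tile split into 25 km sub-tiles), not the NASA pipeline; the function name and the local km coordinate system are assumptions made for the example.

```python
def sub_tiles(tile_km=100, sub_km=25):
    """Return (x_min, y_min, x_max, y_max) bounds in km for each sub-tile of one square tile."""
    if tile_km % sub_km != 0:
        raise ValueError("sub-tile size must divide the tile size evenly")
    n = tile_km // sub_km  # number of sub-tiles along each edge
    return [
        (col * sub_km, row * sub_km, (col + 1) * sub_km, (row + 1) * sub_km)
        for row in range(n)
        for col in range(n)
    ]


tiles = sub_tiles()
print(len(tiles), "sub-tiles per tile")
```

Each sub-tile is independent of the others, which is what makes this kind of workload easy to spread over many instances.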
  • 31. Genomics processing on FPGA accelerators. “Children’s Hospital of Philadelphia and Edico Genome Achieve Fastest-Ever Analysis of 1,000 Genomes.” Orlando, Fla., Oct 19, 2017 – The Children’s Hospital of Philadelphia (CHOP) and Edico Genome today set a new scientific world standard in rapidly processing whole human genomes into data files usable for researchers aiming to bring precision medicine into mainstream clinical practice. Utilizing Edico Genome’s DRAGEN™ Genome Pipeline, deployed on 1,000 Amazon EC2 F1 instances on the Amazon Web Services (AWS) Cloud, 1,000 pediatric genomes were processed in two hours and 25 minutes. … Available in the AWS Marketplace (the “AWS App Store”) for ~$24 / genome.
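The figures quoted on this slide imply a simple back-of-the-envelope calculation. The sketch below only restates the slide's numbers; the even split of genomes across instances and the use of the approximate Marketplace price as a per-genome cost are assumptions.

```python
# Figures quoted on the slide
GENOMES = 1000
INSTANCES = 1000
WALL_MINUTES = 2 * 60 + 25      # "two hours and 25 minutes"
PRICE_PER_GENOME_USD = 24       # approximate ("~$24 / genome")

# Derived quantities (assuming genomes are spread evenly over the instances)
genomes_per_instance = GENOMES / INSTANCES
genomes_per_minute = GENOMES / WALL_MINUTES
total_cost_usd = GENOMES * PRICE_PER_GENOME_USD

print(f"{genomes_per_instance:.0f} genome per instance")
print(f"{genomes_per_minute:.1f} genomes per minute in aggregate")
print(f"about ${total_cost_usd:,} for the whole batch")
```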
  • 32. Why AWS: Constant technological evolution allows technical experimentation
  • 33. Why AWS: Compliance and Security
  • 34. Compliance Programs (logo grid grouped into Global, United States, Asia Pacific, Europe; Global includes SOC 1, SOC 2, SOC 3). https://aws.amazon.com/compliance/pci-data-privacy-protection-hipaa-soc-fedramp-faqs/
  • 35. GovCloud region for sensitive workloads
  • 36. AWS cloud services: Compute; Storage; Networking; Machine Learning
  • 37. AWS Global Infrastructure: 18 Regions – 52 Availability Zones – 100+ Edge Locations. Region & number of Availability Zones: US East: N. Virginia (6), Ohio (3) • US West: Oregon (3), Northern California (3) • AWS GovCloud (US-West) (2) • Canada: Central (2) • South America: São Paulo (3) • EU: Ireland (3), Frankfurt (2), London (2) • Asia Pacific: Singapore (2), Sydney (2), Tokyo (3), Seoul (2), Mumbai (2) • China: Beijing (2). Announced Regions: China, France, Hong Kong, Sweden, Bahrain, and a second AWS GovCloud Region in the US.
  • 38. The AWS Console (in your web browser): Common Services for HPC Applications
  • 39. Controlling your AWS resources: 1. Web browser (point-and-click) 2. Command-line interface (script, automate) 3. SDKs (GUIs, platforms, Science Gateways). https://aws.amazon.com/cli/
  • 40. AWS Storage Options for HPC Workloads. • Amazon S3: Secure, durable, highly scalable object storage. Fast access, low cost. For long-term durable storage of data, in a readily accessible get/put access format. Primary durable and scalable storage for HPC data. • Amazon Glacier: Secure, durable, long-term, highly cost-effective object storage. For long-term storage and archival of data that is infrequently accessed. Use for long-term, lower-cost archival of HPC data. • EC2+EBS: Block storage device (SSD or HDD) for a file system attached to an EC2 instance. Can build a parallel file system (e.g., using Intel Lustre). For near-line storage of files optimized for high I/O performance. Use for high-IOPS, temporary working storage. • EFS: Highly available, multi-AZ, fully managed network-attached elastic file system. For near-line, highly available storage of files in a traditional NFS format (NFSv4). Use for read-often, temporary working storage.
  • 41. HPC Data Flow on AWS (diagram). Data transfer (ingress/egress) between the campus data center and AWS: AWS Direct Connect, ISV Connectors, Storage Gateway, AWS Snowball, Internet/VPN, Kinesis Data Firehose, S3 Transfer Acceleration, Amazon CloudFront. Storage (object, block, file): Amazon S3, Amazon Glacier (via lifecycle), EBS, Instance Store, EFS, other shared file system. EC2 instance: 25 Gbps to S3.
  • 42. Spectrum of Compute Instance Types. General purpose: M4, M5 • General purpose burstable: T2, T2 Unlimited • Compute optimized: C4, C5 • Memory optimized: R4, X1, X1e • Storage optimized (high I/O): I2, I3 • Dense storage: D2, H1 • GPU compute: P2, P3 • Graphics intensive: G2, G3 • FPGA: F1 • EC2 Bare Metal: direct access to physical server resources
  • 43. We’re ready to build a cluster:
  • 44. Using a compute cluster in the cloud (diagram): thin or zero client (no local data) • cluster head node with job scheduler • cloud-based, scaling HPC cluster • shared file storage • Amazon S3 and Amazon Glacier
  • 45. Template-based compute clusters • Infrastructure as code: all the elements of a compute cluster are defined in an AWS CloudFormation template. • AWS executes the template and stands up the prescribed infrastructure in 5-10 minutes. • Templates come with popular job schedulers, shared NFS storage, etc. • Install your applications and stage your data. • Of course, you can customize it endlessly. • You can create and destroy clusters at will. Creating a cluster can become a 1-line action in your bash scripts.
  • 46. Compute clusters in the cloud are elastic (figure: cluster size changing over morning, afternoon, and evening)
  • 47. Compute clusters in the cloud are fit for purpose (diagram): a CPU cluster (R4 head node, C5 compute nodes) for the weather model and a GPU cluster (R4 head node, P3 compute nodes) for visualization, both using Amazon S3 for storage of input/output
  • 48. Compute clusters in the cloud are ephemeral (diagram: Amazon S3 storage of input/output)
  • 49. Tightly and loosely coupled workloads. Cluster HPC: tightly coupled, latency-sensitive applications. Use larger EC2 compute instances, placement groups, enhanced networking. Grid HPC: loosely coupled, HTC, pleasingly parallel. Use a variety of EC2 instances, multiple AZs, Spot, Auto Scaling, Amazon SQS. Ensemble? Run all members at once!
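The "run all members at once" idea for loosely coupled ensembles can be sketched locally. In this illustration a thread pool stands in for a fleet of instances and a toy deterministic function stands in for the model; both are assumptions for the example, not an AWS API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_member(seed: int) -> float:
    """Toy stand-in for one ensemble member: a deterministic function of its seed."""
    x = seed
    for _ in range(1000):
        x = (1103515245 * x + 12345) % 2**31
    return x / 2**31


def run_ensemble(seeds):
    """Run every member concurrently; members share no state, so order does not matter."""
    seeds = list(seeds)
    with ThreadPoolExecutor(max_workers=max(1, len(seeds))) as pool:
        # pool.map returns results in the same order as the input seeds
        return list(pool.map(run_member, seeds))


results = run_ensemble(range(8))
print(len(results), "members finished")
```

Because no member waits on another, the wall-clock time of the ensemble is roughly that of its slowest member, provided enough workers are available.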
  • 50. Autoscaling: automatic re-sizing of compute clusters based upon need. Define minimum and maximum pool sizes and when scaling and cool-down occur. CloudWatch usage metrics, for example CPU utilization or job queue depth, drive scaling by triggering an auto-scaling policy.
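A scaling decision of this kind boils down to clamping a demand estimate between the pool limits. The sketch below is a simplified policy driven by job queue depth; the function and its parameters are hypothetical, and cool-down periods and the CloudWatch integration are deliberately left out.

```python
import math


def desired_nodes(queued_jobs, jobs_per_node, min_nodes, max_nodes):
    """Return how many nodes the pool should have for the current queue depth."""
    needed = math.ceil(queued_jobs / jobs_per_node)
    # Never go below the minimum pool size or above the maximum
    return max(min_nodes, min(max_nodes, needed))


for depth in (0, 9, 37):
    print(depth, "queued jobs ->", desired_nodes(depth, 4, 1, 10), "nodes")
```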
  • 51. Burst for fast results (two plots of # CPUs versus time): many CPUs for a short time, wall clock time ~1 hour; few CPUs for a long time, wall clock time ~1 week. Cost: equal.
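The "cost: equal" claim follows from paying per CPU-hour. A minimal sketch, assuming perfect scaling and a flat per-CPU-hour price; the price and the CPU counts below are hypothetical values chosen for illustration.

```python
import math

HOURS_PER_WEEK = 7 * 24  # 168


def cost(cpus, hours, price_per_cpu_hour):
    """Total cost when billing is proportional to CPU-hours."""
    return cpus * hours * price_per_cpu_hour


PRICE = 0.05  # hypothetical USD per CPU-hour

slow = cost(cpus=100, hours=HOURS_PER_WEEK, price_per_cpu_hour=PRICE)
fast = cost(cpus=100 * HOURS_PER_WEEK, hours=1, price_per_cpu_hour=PRICE)
print(slow, fast, math.isclose(slow, fast))
```

Real codes rarely scale perfectly, so in practice the burst run costs somewhat more; the point is that the premium for a much shorter wall-clock time can be small.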
  • 52. Using a compute cluster in the cloud: self-scaling HPC clusters that are instantly ready to compute, are billed by the hour, and use the AWS Spot market by default, so they are very low cost. • Popular scientific applications prepackaged. • Deploys in ~5 minutes. • Familiar job schedulers, scientific applications, and shared file system. • Install any software you need. • No job queues: it’s your personal cluster. • Access to the graphical console. • Deploys in minutes. • Scales as large as needed when you add jobs to the queue, and scales back down when the jobs are done.
  • 53. Using a compute cluster in the cloud: self-scaling HPC clusters that are instantly ready to compute, are billed by the hour, and use the AWS Spot market by default, so they are very low cost. Access via command line (ssh) or graphical console. NAMD example shown: http://docs.alces-flight.com/en/stable/getting-started/environment-usage/using-openfoam-with-alces-flight-compute.html
  • 54. Create an Alces cluster from the AWS Marketplace • Specify the details of the template instantiation, which is called a “stack” • This allows you to tailor the stack to your needs
  • 55. Schedulers. Your choice: • SGE • Slurm • Torque • OpenLava • PBS Pro • You have full rights, so you can always install your favorite custom scheduler • Or skip it. Smart scaling: • The scheduler knows how much work is waiting in the job queue • It triggers expansion of the compute fleet if needed (up to a limit) • It terminates idle compute nodes so you don’t pay for them.
  • 56. Alces has 1,300+ applications: [alces@login1(myAWSomeHPCDemo) ~]$ alces gridware list https://gridware.alces-flight.com/software
  • 57. Performance for Fluid Dynamics on AWS: ANSYS Fluent • AWS c4.8xlarge • 140M cells • F1 car CFD benchmark. http://www.ansys-blog.com/simulation-on-the-cloud/
  • 58. Performance Considerations for HPC on AWS. Test using larger, real-world examples: • Use large cases for testing; do not benchmark scalability using only small examples. Domain decomposition: • Choose the number of cells per core for either per-core efficiency or faster results. Instance types: • C4 or M4 are the best choices today. Network: • Use a placement group • Enable enhanced networking. OS version: • Use Amazon Linux, or a Linux kernel version 3.10 or later. Processor states and affinity: • Use P-states to reduce processor variability • Use CPU affinity to pin threads to CPU cores. MPI libraries: • Intel MPI recommended. Hyper-threading: • Test with Hyper-threading on and off • Usually off is best, but not always. “BEST PRACTICES”: Well-Architected for HPC paper • HPC best practices paper • RCP Researcher’s Handbook
  • 59. How do I get the lowest network latency? (Diagram: instances at 172.31.1.7, 172.31.1.8, and 172.31.1.9 in subnet 172.31.1.0/24 within 172.31.0.0/18.) • Use a current instance type, with network optimization and EBS optimization • Use Enhanced Networking • Launch with a placement group • … important for the performance of tightly coupled HPC codes
  • 60. http://www2.mmm.ucar.edu/wrf/OnLineTutorial/wrf_in_cloud_aws_tutorial.php
  • 61. http://www2.mmm.ucar.edu/wrf/OnLineTutorial/wrf_in_cloud_aws_tutorial.php http://cloud-gc.readthedocs.io/en/latest/chapter02_beginner-tutorial/quick-start.html#quick-start-label
  • 62. WRF Weather Prediction: WRF Scaling and Performance on AWS. Weather and climate models are popular workloads on AWS: researchers, businesses (The Weather Channel), the financial sector, …
  • 63. Cost versus solution time is what matters (chart: c4.8xlarge run time in seconds and scale-up, plotted against number of cores). It’s your choice: • Lowest cost • Fastest run time • Something in the middle?
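The trade-off on this slide can be made concrete with a small selection routine. The timings below are hypothetical placeholders, not the benchmark data behind the chart, and cost is approximated as core-seconds (a flat price per core is assumed).

```python
# Hypothetical (cores, run time in seconds) measurements; NOT the data from the chart
RUNS = [(36, 1800.0), (144, 520.0), (576, 160.0), (2304, 70.0)]


def core_seconds(run):
    """Cost proxy: cores multiplied by run time."""
    cores, seconds = run
    return cores * seconds


def cheapest_within(runs, deadline_s):
    """Cheapest configuration that still finishes within the deadline, or None."""
    candidates = [run for run in runs if run[1] <= deadline_s]
    return min(candidates, key=core_seconds) if candidates else None


cheapest = min(RUNS, key=core_seconds)
fastest = min(RUNS, key=lambda run: run[1])
middle = cheapest_within(RUNS, deadline_s=600)
print("lowest cost:", cheapest)
print("fastest:", fastest)
print("cheapest under 10 minutes:", middle)
```

With imperfect scaling, adding cores shortens the run but raises the core-second total, so the "something in the middle" choice is usually set by a deadline.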
  • 64. Clusters in the cloud: what we’ve seen so far. Cloud clusters are ephemeral: you can destroy an old cluster and replace it with a new one on a whim. This brings many advantages: • You can script (automate) the creation of a cluster, e.g. in a bash script • Every user can have their own supercomputer, with no more queues • Software flexibility: you can install any of your custom codes • You can choose the right server type for each job (e.g. CPU-constrained, memory-constrained, GPU, …) • You can let the number of compute servers fluctuate with the job queue • You can scale the problem size to real-world dimensions because there’s always room for n+1 • You can experiment and develop quickly (“agility”) • You can get quicker results by using a larger cluster for a shorter time • Every time AWS releases a new instance type, you benefit from a performance boost (hardware refresh is easy and free!)
  • 65. Homework. Sign up for the Researcher’s Handbook for AWS at aws.amazon.com/rcp. Browse data at https://registry.opend 1. Alces Flight compute cluster, NAMD tutorial: launch a “Performance Compute (SGE)” cluster at https://launch.alces-flight.com/default/launch , wait for e-mail confirmation, then follow the tutorial at http://docs.alces-flight.com/en/stable/getting-started/environment-usage/using-openfoam-with-alces-flight-compute.html 2. Containers + AWS Batch for DNA sequencing: https://github.com/awslabs/aws-batch-genomics 3. Containers – WRF Big Weather Web: www.bigweatherweb.org 4. Serverless Computing – PyWren: http://pywren.io/pages/gettingstarted.html then https://github.com/pywren/examples/ 5. SageMaker Machine Learning labs: files from https://bit.ly/2HhD2SG ; instructions at https://github.com/wleepang/sagemaker4research-workshop ; further labs at https://developmentseed.org/blog/2018/01/19/sagemaker-label-maker-case/ and https://aws.amazon.com/blogs/machine-learning/simulate-quantum-systems-on-amazon-sagemaker/
  • 66. Spot: ICRAR/CHILES finding neutral hydrogen galaxies • A global radio astronomy consortium (led by Columbia University in New York) needed to process observational data from the Very Large Array telescope in New Mexico. A 12-hour SLA meant they needed ~$2 million of conventional HPC hardware. This was impossible because they had only $50k. • Using the EC2 Spot market in AWS’s Northern Virginia region, they were able to deploy their HPC workload at a much larger scale, so they always beat their SLA, whilst averaging only $1,200 per month of EC2 compute resources, well within their 2-year budget of $50k. • The project produced a major discovery which smashed the previous record for identifying a neutral hydrogen galaxy by nearly twice the redshift of its predecessor.
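The budget claim on this slide can be checked with simple arithmetic. The sketch uses only the figures quoted above and assumes the $1,200 monthly average holds for all 24 months.

```python
# Figures quoted on the slide
MONTHLY_SPEND_USD = 1_200
MONTHS = 24                      # the 2-year budget period
BUDGET_USD = 50_000
HARDWARE_ESTIMATE_USD = 2_000_000

total_spend = MONTHLY_SPEND_USD * MONTHS
remaining = BUDGET_USD - total_spend
share_of_hardware = total_spend / HARDWARE_ESTIMATE_USD

print(f"two-year EC2 spend: ${total_spend:,}")
print(f"left in the budget: ${remaining:,}")
print(f"fraction of the hardware estimate: {share_of_hardware:.2%}")
```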
  • 67. AWS Compute Consumption Models. On-Demand: pay for compute capacity by the second with no long-term commitments. For bursty workloads. Reserved: commit upfront to 1 year and receive a significant discount on the hourly charge. For steady utilization. Spot: bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand. For non-urgent, fault-tolerant workloads.
  • 68. Spot Market filler (figure: # CPUs versus time, with Spot Market capacity filling the gaps). Our ultimate space filler: Spot Instances allow you to name your own price for spare AWS capacity. The “Spot Market” stretches your research budget.
  • 69. Spot Rules are Simple. Spot is a market in which the price of compute changes based on supply and demand. You’ll never pay more than your bid. When the market price exceeds your bid, you get 2 minutes to wrap up your work. Time to checkpoint!
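These rules can be illustrated with a toy simulation. The hourly price trace and the bid below are hypothetical, prices are in integer cents to keep the arithmetic exact, and the two-minute warning is reduced to a note on where a checkpoint would be written.

```python
def simulate_spot(prices_cents, bid_cents):
    """Walk an hourly price trace; return (hours_run, paid_cents, interrupted_at_hour)."""
    hours_run = 0
    paid_cents = 0
    for hour, price in enumerate(prices_cents):
        if price > bid_cents:
            # Market price exceeded the bid: the instance is reclaimed.
            # A real job would write its checkpoint during the 2-minute warning.
            return hours_run, paid_cents, hour
        paid_cents += price  # you pay the market price, never more than the bid
        hours_run += 1
    return hours_run, paid_cents, None


# Hypothetical trace: the price spikes above a 25-cent bid in the third hour
print(simulate_spot([10, 12, 30, 11], bid_cents=25))
```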
  • 70. Applications of the Spot Market. Many examples we saw earlier used Spot EC2 instances: • Fermilab (60,000 cores) • Novartis • Clemson (550,000 cores) • New record (650,000+ cores) • Discounts of 50%-90% are common. Many cluster tools use Spot automatically: • Alces Flight • CfnCluster • NICE EnginFrame • CloudyCluster • …
  • 71. Spot is an excellent match for HTC / Grid Computing: • Fault tolerance • Stateless • Multi-AZ • Loosely coupled • Instance flexibility. But it is also often used for HPC (tightly coupled) workloads.
  • 72. Advanced Spot usage. Spot Fleet: • “Give me 400 cores; choose the best Availability Zone and instance types for me” • You can select and weight instance types • AWS chooses the cheapest compute fleet for you. Spot Block: • Your Spot instances are guaranteed for up to 6 hours • Slightly lower discount
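The Spot Fleet request on this slide ("give me 400 cores") can be sketched as a cheapest-per-core selection. This is a simplified illustration, not the actual Spot Fleet allocation logic: the prices are hypothetical, the per-instance core counts are used as weights by assumption, and only a single instance type is chosen.

```python
import math

# Hypothetical offers: instance type -> (cores per instance used as weight, price in cents per hour)
OFFERS = {
    "c4.8xlarge": (18, 40),
    "m4.4xlarge": (8, 20),
    "c5.9xlarge": (18, 36),
}


def cheapest_fleet(offers, target_cores):
    """Pick the type with the lowest price per core and size the fleet to cover the target."""
    name = min(offers, key=lambda n: offers[n][1] / offers[n][0])
    cores_per_instance, _price = offers[name]
    count = math.ceil(target_cores / cores_per_instance)
    return name, count


print(cheapest_fleet(OFFERS, target_cores=400))
```

The weights let differently sized instance types compete on equal terms: what is compared is the price per unit of capacity, not the price per instance.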
  • 73. Using Containers
  • 74. Elastic Container Service for Kubernetes
  • 75. Using Containers: AWS Batch, a managed service for container-based jobs. Tutorial using AWS Batch for DNA sequencing: https://github.com/awslabs/aws-batch-genomics Container based: each job is a Docker container with runtime parameters. Submit tens to millions of jobs to a queue, with priority and job dependency options. Fully managed: no software to install or servers to manage. AWS Batch provisions, manages, and scales the infrastructure needed to run the jobs. Cost optimization: use Spot instances or Reserved instances to get the most research possible out of your research budget.
  • 76. AWS Batch workflow example (diagram): “Job-A”; “Job-B”, an array job with children B:0 … B:9; “Job-C”, an array job with children C:0 … C:9999; “Job-D”, an array job with children D:0 … D:9999; “Job-E”. Stage annotations: Setup, Heavy Network I/O, CPU Intensive, Large Memory, Cleanup.
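A workflow like the one in this diagram is a dependency graph, and a valid submission order is a topological sort of it. The array sizes below are taken from the slide labels, but the edges between jobs are an assumption made for illustration, since the arrows of the diagram are not recoverable from the text.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumed dependencies: each job maps to the set of jobs it must wait for
DEPENDS_ON = {
    "Job-A": set(),
    "Job-B": {"Job-A"},
    "Job-C": {"Job-B"},
    "Job-D": {"Job-B"},
    "Job-E": {"Job-C", "Job-D"},
}

# Number of child tasks per job, from the labels on the slide (B:0..B:9, C:0..C:9999, ...)
ARRAY_SIZE = {"Job-A": 1, "Job-B": 10, "Job-C": 10_000, "Job-D": 10_000, "Job-E": 1}


def submission_order(depends_on):
    """Return the jobs in an order where every job comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = submission_order(DEPENDS_ON)
print(order, "total tasks:", sum(ARRAY_SIZE.values()))
```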
  • 77. Using Containers: DNA Sequencing
  • 78. Building High-Throughput Genomics Batch Workflows on AWS. https://aws.amazon.com/blogs/compute/building-high-throughput-genomics-batch-workflows-on-aws-introduction-part-1-
  • 79. AWS Batch Concepts • Job Queue • Compute Environments • Job Definitions • Jobs (single jobs vs array jobs) • Scheduler
  • 80. Job Queues. Jobs are submitted to a Job Queue, where they reside until they are able to be scheduled to a compute resource. Information related to completed jobs persists in the queue for 24 hours. $ aws batch create-job-queue --job-queue-name genomics --priority 500 --compute-environment-order ...
  • 81. Compute Environments. Job queues are mapped to one or more Compute Environments containing the EC2 instances used to run containerized batch jobs. Managed compute environments enable you to describe your business requirements (instance types, min/max/desired vCPUs, and EC2 Spot bid as a % of On-Demand) and we launch and scale resources on your behalf. You can choose specific instance types (e.g. c4.8xlarge), instance families (e.g. C4, M4, P3), or simply choose “optimal” and AWS Batch will launch appropriately sized instances from our latest C/M/R instance families.
  • 82. Compute Environments: Alternatively, you can launch and manage your own resources within an Unmanaged compute environment. Your instances need to include the ECS agent and run supported versions of Linux and Docker. AWS Batch will then create an Amazon ECS cluster which can accept the instances you launch. Jobs can be scheduled to your Compute Environment as soon as your instances are healthy and register with the ECS Agent. $ aws batch create-compute-environment --compute-environment-name unmanagedce --type UNMANAGED ...
  • 83. Job Definitions: Similar to ECS Task Definitions, AWS Batch Job Definitions specify how jobs are to be run. While each job must reference a job definition, many parameters can be overridden. Some of the attributes specified in a job definition: • IAM role associated with the job • vCPU and memory requirements • Retry strategy • Mount points • Container properties • Environment variables. $ aws batch register-job-definition --job-definition-name gatk --container-properties ...
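To make the attribute list concrete, here is a sketch of a job definition and a per-job override, shaped as keyword arguments for boto3's `register_job_definition` and `submit_job`. The image URI, role ARN, command, environment variable, paths, and resource sizes are all hypothetical; only the names `gatk`, `genomics`, and `variant-calling` come from the slides.

```python
def gatk_job_definition(image, job_role_arn):
    """Build keyword arguments for boto3's batch.register_job_definition."""
    return {
        "jobDefinitionName": "gatk",
        "type": "container",
        "containerProperties": {
            "image": image,
            "vcpus": 4,
            "memory": 16384,                 # MiB
            "command": ["gatk", "--version"],  # hypothetical default command
            "jobRoleArn": job_role_arn,      # IAM role associated with the job
            "environment": [{"name": "REFERENCE", "value": "hg38"}],
            "volumes": [{"name": "scratch", "host": {"sourcePath": "/mnt/scratch"}}],
            "mountPoints": [{"sourceVolume": "scratch", "containerPath": "/scratch"}],
        },
        "retryStrategy": {"attempts": 3},
    }


def submit_request(job_name, queue, definition_name, vcpus=None, memory=None):
    """Build keyword arguments for batch.submit_job, overriding resources if given."""
    request = {"jobName": job_name, "jobQueue": queue, "jobDefinition": definition_name}
    overrides = {}
    if vcpus is not None:
        overrides["vcpus"] = vcpus
    if memory is not None:
        overrides["memory"] = memory
    if overrides:
        request["containerOverrides"] = overrides
    return request


definition = gatk_job_definition(
    image="123456789012.dkr.ecr.us-east-1.amazonaws.com/PLACEHOLDER/gatk:latest",
    job_role_arn="arn:aws:iam::123456789012:role/PLACEHOLDER-job-role",
)
big_job = submit_request("variant-calling", "genomics", "gatk", vcpus=8, memory=32768)
print(big_job)
```

The job definition holds the defaults, and a single submission raises vCPUs and memory without registering a new definition.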
  • 84. Jobs: Jobs are the unit of work executed by AWS Batch as containerized applications running on Amazon EC2. Containerized jobs can reference a container image, command, and parameters, or users can simply provide a .zip containing their application and we will run it on a default Amazon Linux container. $ aws batch submit-job --job-name variant-calling --job-definition gatk --job-queue genomics
  • 86. Easily run massively parallel jobs: Instead of submitting a large number of independent “simple jobs”, we also support “array jobs” that run many copies of an application against an array of elements. Array jobs are an efficient way to run: • Parametric sweeps • Monte Carlo simulations • Processing a large collection of objects. [Diagram: Job A; array job B (B:0, B:1, … B:n); Job C]
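Inside an array job, each child learns which element it owns from the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable that AWS Batch sets. The sketch below shows one way a child could pick its input from a shared list; the object keys are hypothetical, and the fallback to index 0 is an assumption made so the code also runs outside Batch.

```python
import os


def element_for_this_child(elements, environ=os.environ):
    """Return (index, element) for the array child running this code."""
    index = int(environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    return index, elements[index]


def array_job_request(job_name, queue, definition_name, size):
    """Build keyword arguments for boto3's batch.submit_job for an array job."""
    return {
        "jobName": job_name,
        "jobQueue": queue,
        "jobDefinition": definition_name,
        "arrayProperties": {"size": size},  # children are numbered 0 .. size-1
    }


# Hypothetical collection of objects to process, one per array child.
samples = [f"samples/sample-{k:04d}.bam" for k in range(10)]

request = array_job_request("Job-B", "genomics", "gatk", size=len(samples))
index, sample = element_for_this_child(samples, {"AWS_BATCH_JOB_ARRAY_INDEX": "3"})
print(index, sample)
```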
  • 87. Workflows, Pipelines, and Job Dependencies: Jobs can express a dependency on the successful completion of other jobs or specific elements of an array job. Use your preferred workflow engine and language to submit jobs. Flow-based systems simply submit jobs serially, while DAG-based systems submit many jobs at once, identifying inter-job dependencies. $ aws batch submit-job --depends-on jobId=606b3ad1-aa31-48d8-92ec-f154bfc8215f ...
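The DAG-based approach can be sketched as follows: walk the graph in topological order, submit each job, and pass the job IDs of its parents in `dependsOn`. `RecordingClient` is a hypothetical stand-in for `boto3.client("batch")` so the example runs without AWS access; the three-job graph is illustrative, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


class RecordingClient:
    """Stand-in for boto3.client('batch') that records submit_job calls."""

    def __init__(self):
        self.calls = []

    def submit_job(self, **kwargs):
        self.calls.append(kwargs)
        return {"jobId": f"job-{len(self.calls)}", "jobName": kwargs["jobName"]}


def submit_dag(client, dag, queue, definition_name):
    """Submit every job in `dag` (job name -> set of parent names) at once.

    Parents are submitted first so their job IDs are known when a child
    declares its dependencies.
    """
    job_ids = {}
    for name in TopologicalSorter(dag).static_order():
        request = {"jobName": name, "jobQueue": queue, "jobDefinition": definition_name}
        parents = sorted(dag.get(name, ()))
        if parents:
            request["dependsOn"] = [{"jobId": job_ids[p]} for p in parents]
        job_ids[name] = client.submit_job(**request)["jobId"]
    return job_ids


dag = {"Job-A": set(), "Job-B": {"Job-A"}, "Job-C": {"Job-B"}}
client = RecordingClient()
job_ids = submit_dag(client, dag, "genomics", "gatk")
print(job_ids)
```

With a real client, all three jobs would sit in the queue immediately and the scheduler would hold Job-B and Job-C until their parents succeed.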
  • 88. Serverless Computing: AWS Lambda. AWS Lambda is a service which allows software functions in a variety of languages to be deployed natively into the cloud and to be triggered directly or driven by events in the cloud. The infrastructure (hardware, operating system, and software environment) for Lambda is managed by AWS and scales rapidly. Bring your own code: Node.js, Java, Python (Java = any JVM-based language such as Scala, Clojure, etc.); bring your own libraries. Flexible invocation paths: Event or RequestResponse invoke options; existing integrations with various AWS services. Simple resource model: • Select memory from 128 MB to 1.5 GB in 64 MB steps • CPU & network allocated proportionately to RAM • Reports actual usage. Fine-grained permissions: • Uses IAM role for Lambda execution permissions • Uses resource policy for AWS event sources.
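As a small illustration of the resource model and the "bring your own code" point, the sketch below enumerates the memory settings exactly as the slide describes them (128 MB to 1.5 GB in 64 MB steps, which may differ from current service limits) and defines a Python function with the `(event, context)` signature Lambda uses. The event shape with a `values` key is an assumption for the example.

```python
import json

# Memory settings as described on the slide: 128 MB to 1.5 GB in 64 MB steps.
VALID_MEMORY_MB = list(range(128, 1536 + 1, 64))


def handler(event, context):
    """Entry point with the (event, context) signature Lambda uses for Python.

    The event shape ({"values": [...]}) is hypothetical.
    """
    values = event.get("values", [])
    return {"count": len(values), "total": sum(values)}


# Local invocation for testing; in Lambda the service supplies both arguments.
result = handler({"values": [1, 2, 3]}, None)
print(len(VALID_MEMORY_MB), VALID_MEMORY_MB[0], VALID_MEMORY_MB[-1])
print(json.dumps(result))
```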
  • 89. Two examples of HPC on Lambda. (1) PyWren (PyWren.io) lets you run your existing Python code at massive scale via AWS Lambda:
        import numpy as np
        import pywren

        def my_function(b):
            x = np.random.normal(0, b, 1024)
            A = np.random.normal(0, b, (1024, 1024))
            return np.dot(A, x)

        pwex = pywren.default_executor()
        res = pwex.map(my_function, np.linspace(0.1, 100, 1000))
    (2) CSIRO have built quickly scaling genomics analysis on AWS Lambda.
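The map pattern on this slide can be tried locally before involving Lambda. The following sketch is not PyWren: it swaps in the standard library's `ThreadPoolExecutor` as a local stand-in for the PyWren executor, replaces NumPy with plain Python lists, and shrinks the problem (64 elements, 10 parameter values) so it runs quickly anywhere.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def my_function(b, n=64):
    """Multiply a random n x n matrix by a random vector, both with std dev b."""
    rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
    x = [rng.gauss(0, b) for _ in range(n)]
    A = [[rng.gauss(0, b) for _ in range(n)] for _ in range(n)]
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Ten evenly spaced scale parameters between 0.1 and 100.
scales = [0.1 + k * (100 - 0.1) / 9 for k in range(10)]

# Local stand-in for the executor's map: one independent call per parameter.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(my_function, scales))

print(len(results), len(results[0]))
```

Because each call depends only on its own argument, the same function body is the kind of work that can be fanned out to many Lambda invocations.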
  • 90. IoT: Real-time Flood Mapping with PetaBencana.id. Critical Web Services for Emergency Management: ● Custom interface for Emergency Control Room ● Real-time flood data entered into the system via web interface and sourced from Twitter ● IoT water-level sensing devices, built with AWS IoT services, to cheaply increase monitoring across the waterway network in Jakarta
  • 91. Machine Learning Process is Hard… Steps: Fetch data; Clean & format data; Prepare & transform data; Train model; Evaluate model; Integrate with prod; Monitor / debug / refresh. Data wrangling: • Set up and manage notebook environments • Get data to notebooks securely. Experimentation: • Set up and manage clusters • Scale/distribute ML algorithms. Deployment: • Set up and manage inference clusters • Manage and auto scale inference APIs • Testing, versioning, and monitoring
  • 92. The Amazon AI/ML Stack. APPLICATION SERVICES: Rekognition, Transcribe, Translate, Polly, Comprehend, Lex. PLATFORM SERVICES: Amazon SageMaker, AWS DeepLens. FRAMEWORKS & INTERFACES: Caffe2, CNTK, Apache MXNet, PyTorch, TensorFlow, Torch, Keras, Gluon, AWS Deep Learning AMIs. INFRASTRUCTURE: CPU, GPU (P3), IoT & Edge, Mobile.
  • 93. Amazon SageMaker: (1) Notebook Instances, (2) Algorithms, (3) ML Training Service, (4) ML Hosting Service
  • 94. (1) Notebook Instances: Zero setup for exploratory data analysis. Authoring & notebooks; ETL; access to AWS database services; access to S3 data lake. • Recommendations/Personalization • Fraud Detection • Forecasting • Image Classification • Churn Prediction • Marketing Email/Campaign Targeting • Log processing and anomaly detection • Speech to Text • More… “Just add data”
  • 95. (2) Algorithms. Training code options: Amazon-provided algorithms (• Matrix Factorization • Regression • Principal Component Analysis • K-Means Clustering • Gradient Boosted Trees • And more!); Bring Your Own Script (SM builds the container); SM Estimators in Apache Spark; Bring Your Own Algorithm (you build the container). Amazon SageMaker algorithms: streaming datasets, for cheaper training; train faster, in a single pass; greater reliability on extremely large datasets.
  • 96. (3) ML Training Service: Managed Distributed Training with Flexibility. Training code options: Amazon-provided algorithms (• Matrix Factorization • Regression • Principal Component Analysis • K-Means Clustering • Gradient Boosted Trees • And more!); Bring Your Own Script (SM builds the container); SM Estimators in Apache Spark; Bring Your Own Algorithm (you build the container). Fully managed, secured: Fetch Training Data; Save Model Artifacts; Save Inference Image (Amazon ECR). CPU, GPU, HPO.
  • 97. (4) ML Hosting Service: Easy Model Deployment to Amazon SageMaker. One-Click! Model versions: versions of the same inference code saved in inference containers. Prod is the primary one, 50% of the traffic must be served there! ProductionVariant: InstanceType: c3.4xlarge; InitialInstanceCount: 3; ModelName: prod; VariantName: primary; InitialVariantWeight: 50. [Diagram: Amazon-provided algorithms, model artifacts, inference image (Amazon ECR), model versions, EndpointConfiguration with variant weights 30 / 50 / 10 / 10, Inference Endpoint]
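The weights on this slide determine traffic share: each variant receives its weight divided by the sum of all weights. The sketch below works that out for the 30 / 50 / 10 / 10 split. The `primary` variant uses the values shown on the slide; the other three variant and model names are hypothetical, and which of them carries 30 versus 10 is an assumption.

```python
def traffic_shares(variants):
    """Return the fraction of requests each production variant receives."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


def variant(name, model, weight):
    """Build one ProductionVariant entry for an endpoint configuration."""
    return {
        "VariantName": name,
        "ModelName": model,
        # Instance type as written on the slide; SageMaker API instance
        # type names carry an "ml." prefix.
        "InstanceType": "c3.4xlarge",
        "InitialInstanceCount": 3,
        "InitialVariantWeight": weight,
    }


variants = [
    variant("primary", "prod", 50),           # values from the slide
    variant("candidate-a", "candidate-a", 30),  # hypothetical names below
    variant("candidate-b", "candidate-b", 10),
    variant("candidate-c", "candidate-c", 10),
]

shares = traffic_shares(variants)
print(shares)
```

Because the four weights sum to 100, the primary variant's weight of 50 corresponds to the 50% of traffic the slide calls for.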
  • 98. SageMaker: under the hood. [Diagram: Amazon SageMaker with Model Training (on EC2) and Model Hosting (on EC2); Amazon ECR holding training code, inference code, and helper code; training data and ground truth in, model artifacts out; a client application sends an inference request to the Inference Endpoint and receives an inference response]
  • 99. AWS Global Infrastructure: 18 Regions, 52 Availability Zones, 100+ Edge Locations. Region & number of Availability Zones: US East: N. Virginia (6), Ohio (3). US West: Oregon (3), Northern California (3). AWS GovCloud (US-West) (2). Canada: Central (2). South America: São Paulo (3). EU: Ireland (3), Frankfurt (2), London (2). Asia Pacific: Singapore (2), Sydney (2), Tokyo (3), Seoul (2), Mumbai (2). China: Beijing (2). Announced Regions: China, France, Hong Kong, Sweden, Bahrain, and a second AWS GovCloud Region in the US.
  • 100. From You to AWS. [Diagram: network paths from “This is YOU!” to AWS: Regional Network; Internet2 Research & Education Network; Public Bilateral Peering; Commercial Transit; Internet2 Transit Rail Commercial Peering Service; Privately Owned or Carrier Network; AWS Direct Connect Location]
  • 101. Amazon Virtual Private Cloud (VPC). [Diagram: Root Account (Payer); Sandbox; Central IT; Researcher; Department]
  • 102. Virtual Private Cloud: A VPC is a virtual network within an AWS Region where you can launch resources. You define: • Address space (RFC 1918) • Network subnets • Route tables • Firewall and ACL rules • Internet connectivity. Peer multiple VPCs across one or more accounts. Extend your on-premises network into AWS. View the “From One To Many: Evolving VPC Design” video. [Diagram: VPC 10.0.0.0/16 with subnets 10.0.0.0/24 and 10.0.1.0/24]
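The address plan in this slide's diagram can be checked with Python's standard `ipaddress` module. This is a minimal sketch, assuming the three rules a plan like this usually has to satisfy: the VPC range lies in RFC 1918 space, every subnet sits inside the VPC, and no two subnets overlap.

```python
import ipaddress

# The three private address blocks defined by RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def validate_address_plan(vpc, subnets):
    """Raise ValueError if the plan breaks one of the three rules; else return True."""
    if not any(vpc.subnet_of(block) for block in RFC1918_BLOCKS):
        raise ValueError(f"{vpc} is not within RFC 1918 address space")
    for subnet in subnets:
        if not subnet.subnet_of(vpc):
            raise ValueError(f"{subnet} is outside the VPC range {vpc}")
    for i, first in enumerate(subnets):
        for second in subnets[i + 1:]:
            if first.overlaps(second):
                raise ValueError(f"{first} overlaps {second}")
    return True


vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = [ipaddress.ip_network("10.0.0.0/24"), ipaddress.ip_network("10.0.1.0/24")]

plan_ok = validate_address_plan(vpc, subnets)
available_24s = len(list(vpc.subnets(new_prefix=24)))
print(plan_ok, available_24s)
```

A /16 VPC divides into 256 possible /24 subnets, so the two subnets in the diagram leave plenty of room for growth.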
  • 103. Standard Architecture Deployed by AWS Quick Start. [Diagram: AWS account with CloudTrail, AWS Config, CloudWatch Alarms, and an Archive Logs Bucket with S3 Lifecycle Policies to Glacier; Production VPC spanning us-east-1b and us-east-1c, each with a DMZ subnet (Proxies, NAT) and private subnets (RDS DB)]
  • 104. Quick Start Design with Management, Production, and Notional Development VPCs. [Diagram: Users; Management VPC with NAT and Bastion across us-east-1b and us-east-1c, with potential use for security appliances for monitoring, logging, etc.; VPC Peer links; a VPC marked NOTIONAL; CloudTrail, AWS Config Rules, CloudWatch Alarms, and an Archive Logs Bucket with S3 Lifecycle Policies to Glacier]
  • 105. Common Scenarios. [Diagram: network paths from “This is YOU!” to AWS: Regional Network; Internet2 Research & Education Network; Public Bilateral Peering; Commercial Transit; Internet2 Transit Rail Commercial Peering Service; Privately Owned or Carrier Network; AWS Direct Connect Location]
  • 106. Scenario 1 - Commercial Transit. [Diagram: This is YOU!; Commercial Transit] PROS: • Readily available • Multiple redundant paths built in. CONS: • At the mercy of the Internet • Public • Higher data egress cost
  • 107. Scenario 3 - Regional Network - Peering. [Diagram: This is YOU!; Regional Network (CENIC); Public Bilateral Peering]
  • 108. Scenario 4 - Internet2 Research & Education Network. [Diagram: This is YOU!; Regional Network; Internet2 Research & Education Network] PROS: • Multiple redundant paths built in • FAST!!! CONS: • Shared infrastructure • Semi-public • Higher data egress cost (AWS sees it as out over the Internet)
  • 109. Use Internet2 to Access Your Workloads. [Diagram: AWS and Internet2 network peering locations and bandwidth; AWS Regions us-west-2 and us-east-1; bandwidth labels 80 Gbps and 20 Gbps]
  • 110. Scenario 5 - AWS Direct Connect. [Diagram: This is YOU!; Privately Owned or Carrier Network; AWS Direct Connect Location] PROS: • Private circuit • Low latency • Lower data egress costs. CONS: • Longer setup times at start • Can be more expensive to do redundantly • Need network engineering help
  • 111. AWS Direct Connect: • Dedicated, private connection into AWS • 1 Gbps / 10 Gbps • Smaller options through partners • Create private (VPC) or public virtual interfaces to AWS • Consistent network performance • Option for redundant connections • Multiple AWS accounts can share a connection • Uses BGP to exchange routing information over a VLAN
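To put the 1 Gbps and 10 Gbps options in perspective, a back-of-the-envelope transfer time is data volume in bits divided by link rate. The sketch below is an idealised estimate: it assumes decimal units (1 TB = 10^12 bytes), a fully dedicated link, and no protocol overhead unless an `efficiency` factor below 1.0 is supplied. The 10 TB dataset size is an illustrative choice.

```python
def transfer_hours(terabytes, gbps, efficiency=1.0):
    """Estimate hours needed to move `terabytes` over a `gbps` link.

    efficiency is the fraction of the nominal rate actually achieved
    (1.0 means the full line rate, which real transfers rarely reach).
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (gbps * 1e9 * efficiency)
    return seconds / 3600


for rate in (1, 10):
    print(f"10 TB at {rate} Gbps: {transfer_hours(10, rate):.1f} h")
```

At full line rate, 10 TB takes 80,000 seconds (about 22 hours) at 1 Gbps and 8,000 seconds (about 2.2 hours) at 10 Gbps.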
  • 112. Direct Connect Locations (US). [Map: AWS Regions: Oregon, N. Virginia, GovCloud, N. California, Ohio. Direct Connect location(s): New York City, Dallas, Chicago, Reston, Ashburn, Seattle, Las Vegas, Los Angeles, Santa Clara, San Jose, Portland]
  • 113. Scenario n: Use them all!
  • 114. Common Scenarios. [Diagram: network paths from “This is YOU!” to AWS: Regional Network; Internet2 Research & Education Network; Public Bilateral Peering; Commercial Transit; Internet2 Transit Rail Commercial Peering Service; Privately Owned or Carrier Network; AWS Direct Connect Location]
  • 115. 1. Register and enroll in the AWS Research Cloud Program: https://aws.amazon.com/rcp 2. Launch your own personal cluster using Alces Flight: http://alces-flight.com/community Thank You [email protected]