Migrating Existing Open Source Machine Learning to Azure

Replicable and scriptable
Consistent syntax on Windows (cmd / PowerShell), macOS, Linux, and WSL
cda.ms/sH

Visual Studio [Code] Tools for AI
• VS and VS Code extensions that streamline computation on servers, Azure ML, Batch AI, …
• End-to-end development environment, from new project through training
• Support for remote training and job management
• Built on top of all the goodness of VS (Python, Jupyter, Git, etc.)
THR3129 Getting Started with Visual Studio Tools for AI, Chris Lauren

https://aka.ms/dsvm/overview
https://github.com/Azure/DataScienceVM

Single VM Development
• Local tools
• Local debug
• Faster experimentation
Scale Up
• Larger VMs
• GPU
Scale Out
• Multi-node
• Remote Spark
• Batch nodes
• VM scale sets

VM size          RAM     vCPUs  GPU                     Approx. cost
Standard_B1s     1 GB    1      None                    Free [*]
Standard_DS3_v2  14 GB   4      None                    $0.23/hr
Standard_DS4_v2  28 GB   8      None                    $0.46/hr
Standard_A8_v2   16 GB   8      None                    $0.82/hr
Standard_NC6     56 GB   6      0.5x NVIDIA Tesla K80   $0.93/hr
Standard_ND6s    112 GB  6      1x NVIDIA Tesla P40     $2.14/hr
[*] Not recommended: Standard_B1s is free, but too small to be useful.
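A first-order cost estimate for any size in the table is just rate × hours. The R sketch below uses the table's approximate hourly rates; the helper `estimate_cost` is hypothetical, and real prices vary by region, OS, and date.

```r
# Approximate hourly rates (USD/hr) taken from the sizing table above;
# these are the deck's figures, not current Azure prices.
hourly_rate <- c(
  Standard_DS3_v2 = 0.23,
  Standard_DS4_v2 = 0.46,
  Standard_A8_v2  = 0.82,
  Standard_NC6    = 0.93,
  Standard_ND6s   = 2.14
)

# Hypothetical helper: cost of running one VM of `size` for `hours` hours.
estimate_cost <- function(size, hours) {
  if (!size %in% names(hourly_rate)) stop("unknown VM size: ", size)
  hourly_rate[[size]] * hours
}

# Example: an 8-hour working day on each size, rounded to cents.
daily <- sapply(names(hourly_rate), estimate_cost, hours = 8)
print(round(daily, 2))
```

For example, a full 8-hour day on Standard_NC6 comes to about $7.44 at the listed rate.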

JupyterHub: https://xxx.xxx.xxx.xxx:8000/
RStudio Server: http://xxx.xxx.xxx.xxx:8787/
https://cda.ms/s0

Azure Batch
Batch pools
• Configure and create VMs to cater for any scale: tens to thousands.
• Automatically scale the number of VMs to maximize utilization.
• Choose the VM size most suited to your application.
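Automatic pool scaling boils down to a rule that maps the amount of queued work to a target node count. Azure Batch expresses such rules in its own autoscale formula language; the R sketch below does not use that language and only illustrates the kind of rule involved. `target_nodes` and its parameters are hypothetical.

```r
# Hypothetical autoscale rule: one node per `tasks_per_node` pending tasks,
# clamped between min_nodes and max_nodes. Illustrative only; this is not
# the Azure Batch autoscale formula syntax.
target_nodes <- function(pending_tasks, tasks_per_node = 4,
                         min_nodes = 0, max_nodes = 100) {
  wanted <- ceiling(pending_tasks / tasks_per_node)
  max(min_nodes, min(max_nodes, wanted))
}

target_nodes(0)     # empty queue: scale down to min_nodes (0)
target_nodes(10)    # ceiling(10 / 4) = 3 nodes
target_nodes(1000)  # 250 wanted, capped at max_nodes (100)
```

Setting `min_nodes = 0` lets an idle pool cost nothing, at the price of VM start-up latency when new work arrives.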
Batch jobs and tasks
• A task is a unit of execution; each task is a command-line application.
• Jobs are created and tasks are submitted to a pool; tasks are queued, then assigned to VMs.
• Any application, any execution time: run applications unchanged.
• Frozen or failing tasks are detected and retried automatically.
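The job/task lifecycle described above (queue, assign to VMs, retry on failure) can be modeled in a few lines. This R sketch is a toy simulation under stated assumptions, not the Azure Batch API: `run_job`, `n_vms`, and `max_retries` are hypothetical names, each task is an R function that receives its attempt number and returns TRUE on success, and a task is given up after 1 + `max_retries` attempts.

```r
# Toy model of a Batch-style job: tasks wait in a FIFO queue, up to n_vms
# are dispatched per round, and a failed task goes to the back of the queue
# until it has used 1 + max_retries attempts.
run_job <- function(tasks, n_vms = 2, max_retries = 3) {
  queue <- names(tasks)
  attempts <- setNames(integer(length(tasks)), names(tasks))
  status <- setNames(rep("pending", length(tasks)), names(tasks))
  while (length(queue) > 0) {
    batch <- head(queue, n_vms)            # one task per free VM
    queue <- tail(queue, -length(batch))
    for (id in batch) {
      attempts[[id]] <- attempts[[id]] + 1L
      if (tasks[[id]](attempts[[id]])) {
        status[[id]] <- "completed"
      } else if (attempts[[id]] > max_retries) {
        status[[id]] <- "failed"           # retries exhausted
      } else {
        queue <- c(queue, id)              # re-queue for a retry
      }
    }
  }
  list(status = status, attempts = attempts)
}

# Usage: one reliable task, one flaky task, one that always fails.
tasks <- list(
  a = function(attempt) TRUE,
  b = function(attempt) attempt >= 3,      # succeeds on its 3rd attempt
  c = function(attempt) FALSE
)
res <- run_job(tasks, n_vms = 2, max_retries = 3)
res$status    # a: completed, b: completed, c: failed
res$attempts  # a: 1, b: 3, c: 4
```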

Cost savings
• Scale cluster size up and down as needed
• Reserved Instances for persistent infrastructure
• Per-second billing for VMs
• Flexible consumption and savings with low-priority VMs
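Per-second billing matters most for short jobs. The R sketch below compares a 15-minute job at the DS3_v2 rate ($0.23/hr, from the sizing table) under per-second billing and under a hypothetical coarser scheme that rounds up to whole hours, used here only as a comparison baseline.

```r
# DS3_v2 rate from the sizing table (USD per hour).
rate_per_hour <- 0.23
job_seconds <- 15 * 60                      # a 15-minute job

# Per-second billing: pay for exactly the seconds used.
per_second_cost <- rate_per_hour * job_seconds / 3600

# Hypothetical baseline: usage rounded up to whole hours.
per_hour_cost <- rate_per_hour * ceiling(job_seconds / 3600)

per_second_cost   # 0.0575
per_hour_cost     # 0.23
```

Under these assumptions the short job costs a quarter of what whole-hour rounding would charge.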

Scaling AI with DSVM and Batch AI
• DSVM (Dev/Test Workstation): create Py scripts
• Azure File Store: store Py scripts in the File Store
• Azure Batch AI Cluster: Batch AI runs the script
• Result: trained AI model

BRK3320 The Developer Data Scientist – Creating New Analytics Driven Applications using Apache Spark with Azure Databricks
May 8 10:30 AM-11:45 AM, Sheraton Grand Ballroom A

Traditional / On-Premises Paradigm
• Traditionally, static-sized clusters were the standard, so compute and storage had to be co-located.
• A single cluster was provisioned with every necessary application installed on it (typically managed by YARN or something similar).
• The cluster was either over-utilized (jobs had to be queued for lack of capacity) or under-utilized (idle cores burned money).
• Teams of data scientists had to submit jobs against a single shared cluster, so the cluster had to be generic, which prevented users from customizing it for their specific jobs.
[Diagram: DataStore]

Modern / Cloud Paradigm
• With cloud computing, customers are no longer limited to static-sized clusters.
• Each job, or set of jobs, can have its own cluster, so the customer is charged only for the minutes the job runs.
• Each user can have their own cluster, so users don’t have to compete for resources.
• Each user can have a custom cluster created specifically for their workload, installing exactly the software they need without affecting other users’ environments.
• IT admins don’t need to worry about running out of capacity or paying for idle cores.
[Diagram: DataStore]
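The cost difference between the two paradigms can be made concrete by counting node-hours. The R sketch below uses a made-up daily workload (the `jobs` table is hypothetical) and the DS4_v2 rate from the sizing table as an assumed per-node price. It compares a static cluster sized for the largest job and left running all day against per-job clusters that exist only while their job runs.

```r
# Hypothetical daily workload: cluster size and run time of each job.
jobs <- data.frame(
  nodes = c(10, 4, 20),
  hours = c(2, 6, 1)
)
rate <- 0.46  # assumed USD per node-hour (DS4_v2 figure from the table)

used_node_hours <- sum(jobs$nodes * jobs$hours)   # 64 node-hours of work
static_node_hours <- max(jobs$nodes) * 24         # 480 node-hours provisioned

# Static cluster: sized for the peak and billed around the clock.
static_cost <- static_node_hours * rate
# Per-job clusters: each job pays only for its own nodes while it runs.
per_job_cost <- used_node_hours * rate
# Fraction of the static cluster's capacity that did useful work.
utilization <- used_node_hours / static_node_hours

c(static = static_cost, per_job = per_job_cost, utilization = utilization)
```

The gap comes entirely from idle capacity: with this workload the static cluster is busy for only about 13% of its provisioned node-hours.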

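Connect to the Spark cluster (a standalone Spark master on this host, default port 7077):
library(sparklyr)
cluster_url <- paste0("spark://", system("hostname -i", intern = TRUE), ":7077")
sc <- spark_connect(master = cluster_url)
Load in some data (copies the nycflights13 flights table into Spark; requires the nycflights13 package):
library(dplyr)
flights_tbl <- copy_to(sc, nycflights13::flights, "flights")
Munge with dplyr (runs in Spark; collect() brings the summary back into R):
delay <- flights_tbl %>%
  group_by(tailnum) %>%
  # na.rm = TRUE states explicitly that Spark SQL's AVG skips missing values
  summarise(count = n(), dist = mean(distance, na.rm = TRUE),
            delay = mean(arr_delay, na.rm = TRUE)) %>%
  filter(count > 20, dist < 2000, !is.na(delay)) %>%
  collect()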
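To see what the group/summarise/filter pipeline computes without a Spark cluster, here is the same logic in base R on a tiny hypothetical data frame standing in for `nycflights13::flights`. The toy threshold `min_count = 2` replaces `count > 20` so that the filter has something to do on five rows; `mean(..., na.rm = TRUE)` mirrors Spark SQL's AVG, which ignores NULLs.

```r
# Hypothetical toy rows standing in for nycflights13::flights.
flights <- data.frame(
  tailnum   = c("N1", "N1", "N1", "N2", "N2"),
  distance  = c(500, 700, 600, 2500, 2500),
  arr_delay = c(10, NA, 20, 5, 15)
)
min_count <- 2  # toy stand-in for the `count > 20` threshold

# Per-aircraft row count, mean distance, and mean arrival delay.
n_rows     <- tapply(flights$distance, flights$tailnum, length)
mean_dist  <- tapply(flights$distance, flights$tailnum, mean)
mean_delay <- tapply(flights$arr_delay, flights$tailnum, mean, na.rm = TRUE)

summary_df <- data.frame(tailnum = names(n_rows), count = as.vector(n_rows),
                         dist = as.vector(mean_dist),
                         delay = as.vector(mean_delay))

# Same filter as the dplyr pipeline (with the toy count threshold).
result <- subset(summary_df, count > min_count & dist < 2000 & !is.na(delay))
print(result)
```

Note that `n()` counts every row, including rows whose delay is missing, so N1 has a count of 3 but a mean delay of 15 computed from two values.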

> m <- ml_linear_regression(delay ~ dist, data=delay_near)
* No rows dropped by 'na.omit' call
> summary(m)
Call: ml_linear_regression(delay ~ dist, data = delay_near)
Deviance Residuals::
Min 1Q Median 3Q Max
-19.9499 -5.8752 -0.7035 5.1867 40.8973
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.6904319 1.0199146 0.677 0.4986
dist 0.0195910 0.0019252 10.176 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-Squared: 0.09619
Root Mean Squared Error: 8.075
>
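The fitted model reads as: predicted average arrival delay (minutes) = intercept + slope × average distance (miles). The R sketch below plugs the coefficients from the `summary(m)` output into that formula by hand; `predict_delay` is a hypothetical helper, not a sparklyr function.

```r
# Coefficients copied from the summary(m) output above.
intercept <- 0.6904319
slope     <- 0.0195910   # extra minutes of average delay per mile

# Hypothetical helper: evaluate the fitted line at given distances.
predict_delay <- function(dist) intercept + slope * dist

pred <- predict_delay(c(500, 1000, 1500))
round(pred, 1)   # roughly 10.5, 20.3 and 30.1 minutes
```

Each additional 100 miles adds about 2 minutes of predicted delay, but with an R-squared of 0.096 distance explains less than 10% of the variation, and the RMSE of about 8 minutes shows how far individual aircraft scatter around the line.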

cda.ms/sf
https://code.visualstudio.com/
cda.ms/sH
aka.ms/dsvm/overview
github.com/Azure/aztk
