Migrating Existing Open Source Machine Learning to Azure
cda.ms/7f
cda.ms/7g
Replicable and scriptable
Consistent syntax on Windows (cmd / PowerShell), Mac, Linux, WSL
cda.ms/sH
Visual Studio [Code] Tools for AI
• VS & VS Code extensions to streamline computations in servers, Azure ML, Batch AI, …
• End-to-end development environment, from new project through training
• Support for remote training & job management
• On top of all of the goodness of VS (Python, Jupyter, Git, etc.)
THR3129 Getting Started with Visual Studio Tools for AI, Chris Lauren
https://aka.ms/dsvm/overview
https://github.com/Azure/DataScienceVM
cda.ms/sN
Single VM Development
• Local tools
• Local debug
• Faster experimentation
Scale Up
• Larger VMs
• GPU
Scale Out
• Multi-node
• Remote Spark
• Batch nodes
• VM Scale Sets
Series          RAM      vCPU   GPU                      Approx. cost
Standard_B1s    1 GB     1      None                     Free [*]
DS3_v2          14 GB    4      None                     $0.23 / hr
DS4_v2          28 GB    8      None                     $0.46 / hr
A8v2            16 GB    8      None                     $0.82 / hr
Standard_NC6    56 GB    6      0.5x NVIDIA Tesla K80    $0.93 / hr
Standard_ND6s   112 GB   6      1x Tesla P40             $2.14 / hr
[*] Not recommended: Standard_B1s (free, but too small to be useful)
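To get a feel for what these rates mean for a real session, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the rates are copied from the table above and are approximate, and the workloads in the usage lines are hypothetical.

```python
# Approximate hourly rates (USD) copied from the table above.
HOURLY_RATE_USD = {
    "DS3_v2": 0.23,
    "DS4_v2": 0.46,
    "A8v2": 0.82,
    "Standard_NC6": 0.93,
    "Standard_ND6s": 2.14,
}


def estimate_cost(vm_size, hours, vm_count=1):
    """Return the approximate cost in USD of running vm_count VMs for hours."""
    return round(HOURLY_RATE_USD[vm_size] * hours * vm_count, 2)


if __name__ == "__main__":
    # Hypothetical workloads: a working day of development on a small VM,
    # then a three-hour training run on a GPU VM.
    print(estimate_cost("DS3_v2", 8))
    print(estimate_cost("Standard_NC6", 3))
```

At these rates, moving from a CPU-only development VM to a GPU VM for a short training run costs only a few dollars, which is the argument for developing on a small VM and scaling up just for the heavy step.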
https://xxx.xxx.xxx.xxx:8000/
http://xxx.xxx.xxx.xxx:8787/
https://cda.ms/s0
Not Hotdog:
cda.ms/sT
Azure Batch
Batch pools
• Configure and create VMs to cater for any scale: tens to thousands.
• Automatically scale the number of VMs to maximize utilization.
• Choose the VM size most suited to your application.
Batch jobs and tasks
• Task is a unit of execution; task = command line application.
• Jobs created and tasks submitted to a pool; tasks are queued, then assigned to VMs.
• Any application, any execution time; run applications unchanged.
• Automatic detection and retry of frozen or failing tasks.
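The queue-then-assign model with automatic retry can be illustrated with a toy simulation. The sketch below is not the Azure Batch API: `run_tasks`, `make_flaky`, and the `max_retries` parameter are hypothetical names used only to show how queued tasks might be handed to a fixed number of VMs and requeued on failure.

```python
from collections import deque


def make_flaky(failures):
    """Return a task (a callable) that fails the first `failures` calls."""
    state = {"left": failures}

    def task():
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("simulated task failure")

    return task


def run_tasks(tasks, vm_count, max_retries=2):
    """Simulate queued tasks being assigned to VMs, retrying failures.

    tasks maps a task name to a callable that raises on failure.
    Returns (completed, failed) lists of task names.
    """
    queue = deque((name, 0) for name in tasks)
    completed, failed = [], []
    while queue:
        # Each round, up to vm_count queued tasks are assigned to VMs.
        batch = [queue.popleft() for _ in range(min(vm_count, len(queue)))]
        for name, attempts in batch:
            try:
                tasks[name]()
                completed.append(name)
            except Exception:
                if attempts < max_retries:
                    queue.append((name, attempts + 1))  # requeue for a retry
                else:
                    failed.append(name)  # retries exhausted
    return completed, failed


if __name__ == "__main__":
    demo = {"ok": make_flaky(0), "flaky": make_flaky(1), "broken": make_flaky(5)}
    print(run_tasks(demo, vm_count=2))
```

In this toy model a task that fails once still completes after a retry, while a task that keeps failing is reported once its retries are used up.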
Cost savings
• Scale cluster size up and down as needed
• Reserved Instances for persistent infrastructure
• Per-second billing for VMs
• Flexible consumption and savings with low-priority VMs
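Why per-second billing matters for short jobs can be shown with a small calculation. The sketch below compares it with a hypothetical scheme that rounds usage up to whole hours; the rounding scheme is an assumption made purely for contrast, and the $0.93 / hr rate is the Standard_NC6 figure quoted earlier in the deck.

```python
import math


def cost_per_second_billing(rate_per_hour, seconds):
    """Cost when every second of VM time is billed individually."""
    return rate_per_hour * seconds / 3600


def cost_hourly_rounding(rate_per_hour, seconds):
    """Cost under a hypothetical scheme that rounds usage up to whole hours."""
    return rate_per_hour * math.ceil(seconds / 3600)


if __name__ == "__main__":
    ten_minutes = 10 * 60
    print(cost_per_second_billing(0.93, ten_minutes))
    print(cost_hourly_rounding(0.93, ten_minutes))
```

For a ten-minute job the per-second cost is one sixth of the whole-hour cost, so the gap is largest for clusters that are created for one job and then deleted.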
Scaling AI with DSVM and Batch AI
[Diagram labels: DSVM (Dev/Test Workstation); Azure File Store; Azure Batch AI Cluster; Batch AI Run Script; Store Py Scripts in File Store; Create Py Scripts; Trained AI Model]
github.com/Azure/BatchAI
BRK3320 The Developer Data Scientist – Creating New Analytics Driven Applications using Apache Spark with Azure Databricks
May 8 10:30 AM-11:45 AM, Sheraton Grand Ballroom A
Traditional / On-Premise Paradigm
• Traditionally, static-sized clusters were the standard, so compute and storage had to be collocated
• A single cluster would have all necessary applications installed onto it (typically managed by YARN, or something similar)
• The cluster was either over-utilized (jobs had to be queued due to lack of capacity) or under-utilized (there were idle cores that burned costs)
• Teams of data scientists would have to submit jobs against a single cluster. This meant that the cluster had to be generic, preventing users from truly customizing their clusters specifically for their jobs
DataStore
Modern / Cloud Paradigm
• With cloud computing, customers are no longer limited to static-size clusters
• Each job, or set of jobs, can have its own cluster, so that a customer is only charged for the minutes that the job runs
• Each user can have their own cluster, so that they don’t have to compete for resources
• Each user can have their own custom cluster that is created specifically for their experience and their workload. Each user can install exactly the software they need without polluting other users’ experiences
• IT admins don’t need to worry about running out of capacity or burning dollars on idle cores
DataStore
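The cost argument behind the two paradigms can be made concrete with a small model. This is a sketch under stated assumptions: the job sizes, the 64-core static cluster, and the $0.05 per core-hour rate are all hypothetical numbers chosen for illustration, not Azure prices.

```python
def static_cluster_cost(cores, hours, rate_per_core_hour):
    """A fixed-size cluster is billed for all of its cores, busy or idle."""
    return cores * hours * rate_per_core_hour


def per_job_cluster_cost(jobs, rate_per_core_hour):
    """Per-job clusters are billed only while each job runs.

    jobs is a list of (cores, hours) tuples, one per job.
    """
    return sum(cores * hours * rate_per_core_hour for cores, hours in jobs)


def utilization(jobs, cores, hours):
    """Fraction of a static cluster's core-hours that the jobs actually use."""
    used = sum(job_cores * job_hours for job_cores, job_hours in jobs)
    return used / (cores * hours)


if __name__ == "__main__":
    # Hypothetical day of work: three jobs of different shapes.
    jobs = [(16, 2), (64, 1), (8, 4)]
    rate = 0.05  # hypothetical USD per core-hour
    print(static_cluster_cost(64, 24, rate))
    print(per_job_cluster_cost(jobs, rate))
    print(utilization(jobs, 64, 24))
```

With these made-up numbers the static cluster sits more than 90% idle, which is the "idle cores that burned costs" case, while per-job clusters are charged only for the core-hours the jobs consume.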
www.github.com/azure/aztk
spark.rstudio.com
Connect to the Spark cluster:
library(sparklyr)
cluster_url <- paste0("spark://", system("hostname -i", intern = TRUE), ":7077")
sc <- spark_connect(master = cluster_url)
Load in some data:
library(dplyr)
flights_tbl <- copy_to(sc, nycflights13::flights, "flights")
Munge with dplyr:
delay <- flights_tbl %>%
  group_by(tailnum) %>%
  summarise(count = n(), dist = mean(distance), delay = mean(arr_delay)) %>%
  filter(count > 20, dist < 2000, !is.na(delay)) %>%
  collect()
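For readers less familiar with dplyr, the aggregation above can be mirrored in plain Python. This is an analogy rather than what sparklyr executes (sparklyr translates the pipeline to Spark SQL); the function name `summarise_delays` and the dict-based record layout are assumptions made for this sketch.

```python
from collections import defaultdict
from statistics import mean


def summarise_delays(flights, min_count=20, max_dist=2000):
    """Group flights by tail number and summarise distance and arrival delay.

    flights is an iterable of dicts with keys "tailnum", "distance" and
    "arr_delay" (None when the delay is missing).
    """
    groups = defaultdict(list)
    for flight in flights:
        groups[flight["tailnum"]].append(flight)

    rows = []
    for tailnum, group in groups.items():
        # SQL averages skip missing values, so only recorded delays count;
        # a plane with no recorded delays gets a missing mean.
        recorded = [f["arr_delay"] for f in group if f["arr_delay"] is not None]
        delay = mean(recorded) if recorded else None
        dist = mean(f["distance"] for f in group)
        # Same filter as the pipeline: enough flights, short-haul, delay known.
        if len(group) > min_count and dist < max_dist and delay is not None:
            rows.append(
                {"tailnum": tailnum, "count": len(group), "dist": dist, "delay": delay}
            )
    return rows
```

Each output row corresponds to one aircraft, so the later regression relates an aircraft's average arrival delay to its average flight distance.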
> m <- ml_linear_regression(delay ~ dist, data=delay_near)
* No rows dropped by 'na.omit' call
> summary(m)
Call: ml_linear_regression(delay ~ dist, data = delay_near)
Deviance Residuals::
Min 1Q Median 3Q Max
-19.9499 -5.8752 -0.7035 5.1867 40.8973
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.6904319 1.0199146 0.677 0.4986
dist 0.0195910 0.0019252 10.176 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-Squared: 0.09619
Root Mean Squared Error: 8.075
>
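The fitted coefficients can be read as a simple prediction rule. The sketch below copies the estimates and standard errors from the summary above and assumes the units of the nycflights13 data (distance in miles, arrival delay in minutes); the function names are hypothetical.

```python
# Estimates and standard errors copied from the summary output above.
INTERCEPT, INTERCEPT_SE = 0.6904319, 1.0199146
SLOPE, SLOPE_SE = 0.0195910, 0.0019252


def predict_delay(dist_miles):
    """Predicted mean arrival delay (minutes) for a mean flight distance."""
    return INTERCEPT + SLOPE * dist_miles


def t_value(estimate, std_error):
    """t statistic as shown in the coefficient table: estimate / std. error."""
    return estimate / std_error


if __name__ == "__main__":
    print(predict_delay(1000))
    print(round(t_value(SLOPE, SLOPE_SE), 3))
    print(round(t_value(INTERCEPT, INTERCEPT_SE), 3))
```

The slope says each extra 100 miles of average distance is associated with roughly 2 more minutes of average delay. The R-squared of about 0.096 in the output means distance explains only a small share of the variation, so the rule is a weak predictor even though the slope is statistically significant.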
cda.ms/sf
https://code.visualstudio.com/
cda.ms/sH
aka.ms/dsvm/overview
github.com/Azure/aztk


Editor's Notes

  • #10: Can go download and use this today. All goodness of VSCode, etc. is part of this
  • #12: Tools: R/Python/Julia/etc.; data platforms (SQL Server 2016 on Windows, Azure Data Lake and HDInsight tools in VS 2015, Spark and Hadoop on Linux), plus tools like Squirrel SQL and ODBC/JDBC drivers; data movement tools: these, plus Azure CLI; ML + AI: deep learning + GPU support on new images, LightGBM, MicrosoftML, MRS. We enable many workflows. Also use it to experiment with new tools. Discuss: Windows 2012 versus 2016, Ubuntu vs CentOS. Deep learning + GPU support on new images. https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-data-science-virtual-machine-overview
  • #15: A typical data scientist coding workflow looks something like this: You start small with a single DSVM and perfect the code on just a subset of the data. Don’t worry about big data right away, so that you keep a good coding pace. Once you are satisfied that the code works on a single machine, try to scale it up to larger VMs: can GPUs or HPC-type configurations help? Finally, start working with your full dataset. You should have a good idea of what kind of configuration you need based on the single-VM and scale-up scenarios. From your DSVM desktop you can connect to remote Spark nodes, submit jobs to a batch pool, or leverage a scale set, which can autoscale. The bottom line: use the DSVM for the tools, and use different Azure services to help you scale up and scale out.
  • #21: Azure Batch provides APIs for creating pools of resources, and then scheduling jobs and tasks to those resources. And the best part is that there is no charge for using Batch: you just pay for the compute and storage resources.
  • #22: We understand that cost savings are paramount for customers. Azure offers flexible consumption: mix and match low-priority VMs at discounted rates with on-demand VMs, along with per-minute billing, to fit your priorities and budget. Our portal offers highly granular insight into your usage, associated costs, and the groups using your resources. Azure's built-in, policy-based governance helps create a rich and integrated collaborative experience.
  • #23: Talk about complete end-to-end development using the DSVM. Left: shared data stores, both cloud and on-prem. Center: DSVMs as dev environments in the cloud. Right: trained models and code deployed from the DSVM to other production systems, or DSVMs used in production as well.
  • #29: R is #8 in January 2018 Tiobe language rankings. #6 in IEEE Spectrum 2017 top programming languages.