Free Servers to Build Big Data System on: Bing’s Approach
• 10 years of experience in Big Data and AI platforms
• PMP, MBA, MCSE (Data Mgmt and Analytics)
Now:
• Large-scale (100K-server) offline processing platform for Bing
• OSS stack evangelization and adoption
Past:
• Curated Data Sets for Office 365
• Compliant DL training platform for Office 365
• Data-Driven Engineering Culture Building
kailiu@microsoft.com
What is the Bing MagneTar Platform?
• Imagine you have 1 million machines
• Not all of them are fully utilized
• The underutilized capacity can be reused…
• …to host DL and open-source pipelines
[Figure: utilization curve for 1 million machines against a 100% line, labeled "Big Data and Deep Learning (Hadoop, Spark, Kafka, TensorFlow, ONNX, etc.)"]
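The idea of reusing underutilized capacity can be made concrete with a small calculation. The following is a minimal sketch, not the platform's actual accounting: the `Machine` fields, the `headroom` parameter, and the sample fleet are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    cores: int
    utilization: float  # fraction of cores busy with the primary workload, 0.0 to 1.0


def spare_cores(machines, headroom=0.1):
    """Sum the cores left over after the primary workload plus a safety headroom.

    `headroom` is an assumed fraction of each machine that is never handed out,
    so that a spike in the primary workload still has room to grow.
    """
    total = 0.0
    for m in machines:
        free_fraction = max(0.0, 1.0 - m.utilization - headroom)
        total += m.cores * free_fraction
    return total


# Hypothetical three-machine fleet: one lightly used, one saturated, one half used.
fleet = [
    Machine("m1", 32, 0.25),
    Machine("m2", 32, 0.95),
    Machine("m3", 64, 0.5),
]
print(spare_cores(fleet))
```

A saturated machine contributes nothing, so the reclaimable capacity comes entirely from machines that sit below the headroom-adjusted ceiling.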
Challenges and Solutions for Using Free Servers
Yet-to-Retire Machines
• Key characteristics: relatively stable, but subject to return at any time
• Key challenge: maintain data availability during bulk machine moves
Maintenance Buffer Machines
• Key characteristics: large in number, but churning quickly
• Key challenge: predict machine return and allocate tasks smartly
Online Serving Machines
• Key characteristics: running production-critical services; may have spare cycles from time to time
• Key challenge: isolate data tasks from production services
Solutions
• PerfISO
• Advanced YARN NodeLabels
• HDFS Block Placement Policy
[Figure: machine resources split into Primary, Secondary, and Idle; the total memory for primary + secondary is divided into primary memory usage, secondary memory usage, and a buffer]
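The memory split named in the diagram (primary usage, secondary usage, and a buffer inside a fixed total) suggests a simple budget rule. The sketch below is an assumption about how such a budget could be computed; the slides do not describe the actual isolation policy, and both function names are hypothetical.

```python
def secondary_memory_limit(total_mb, primary_usage_mb, buffer_mb):
    """Memory the secondary may use: the total minus primary usage and the buffer.

    The buffer is kept free so the primary (production) workload can grow
    without waiting for the secondary to release memory.
    """
    return max(0, total_mb - primary_usage_mb - buffer_mb)


def should_shrink_secondary(total_mb, primary_usage_mb, secondary_usage_mb, buffer_mb):
    """True when the secondary has grown into the buffer reserved for the primary."""
    limit = secondary_memory_limit(total_mb, primary_usage_mb, buffer_mb)
    return secondary_usage_mb > limit
```

For example, with 128,000 MB in total, 80,000 MB used by the primary, and a 16,000 MB buffer, the secondary would be capped at 32,000 MB, and the cap shrinks as soon as the primary grows.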
spark-submit.cmd --conf "spark.yarn.executor.nodeLabelExpression=*besteffort*|*persistent*"
spark-submit.cmd --conf "spark.yarn.executor.nodeLabelExpression=*persistent*#multivm,!IsCtrlEnv"
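The first expression above looks like a set of wildcard alternatives, which goes beyond the single-label expressions of stock YARN. The sketch below shows one plausible reading of that form only: it assumes `|` separates alternatives and `*` matches any run of characters. These semantics, the function name, and the sample label strings are assumptions, and the `#multivm,!IsCtrlEnv` suffix of the second expression is not interpreted here because the slides do not define it.

```python
from fnmatch import fnmatchcase


def node_matches(expression, node_labels):
    """Return True if any node label matches any '|'-separated wildcard pattern.

    Assumed semantics (not confirmed by the slides): '|' means OR, and '*'
    is a shell-style wildcard.
    """
    patterns = [p.strip() for p in expression.split("|") if p.strip()]
    return any(
        fnmatchcase(label, pattern)
        for label in node_labels
        for pattern in patterns
    )


expr = "*besteffort*|*persistent*"
# Hypothetical label names, invented for this example.
print(node_matches(expr, ["pool-besteffort-01"]))
print(node_matches(expr, ["online-serving"]))
```

Under this reading, an executor could land on either best-effort or persistent nodes, while nodes carrying neither kind of label would be skipped.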
[Figure: block placement across a Local/Start rack, Rack A, and a Remote rack]
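For context, the stock HDFS placement rule for three replicas puts the first on the writer's node and the other two on different nodes of a single remote rack. The sketch below models only that default rule; it is not the custom Block Placement Policy from the talk, which the slides do not spell out. The function name and rack layout are hypothetical, and the code assumes at least two racks with two or more nodes on the chosen remote rack.

```python
import random


def place_replicas(writer, nodes_by_rack, rng=random):
    """Pick three replica targets following the default HDFS rack-aware rule.

    Replica 1: the writer's own node (local rack).
    Replicas 2 and 3: two different nodes on one randomly chosen remote rack.
    """
    local_rack = next(r for r, nodes in nodes_by_rack.items() if writer in nodes)
    remote_racks = [
        r for r, nodes in nodes_by_rack.items()
        if r != local_rack and len(nodes) >= 2
    ]
    remote_rack = rng.choice(remote_racks)
    remote_nodes = rng.sample(list(nodes_by_rack[remote_rack]), 2)
    return [writer] + remote_nodes


# Hypothetical two-rack cluster.
racks = {"rackA": ["a1", "a2"], "rackB": ["b1", "b2", "b3"]}
print(place_replicas("a1", racks))
```

Losing the whole local rack still leaves two replicas, which is the property a custom policy would need to preserve when machines are returned in bulk.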
Thank You!