Michael Rys
Principal Program Manager, Big Data @ Microsoft
@MikeDoesBigData, {mrys, usql}@microsoft.com
U-SQL Query Execution and Performance
Basics
Simplified U-SQL Job Workflow
Job Front End
Job Scheduler Compiler Service
Job Queue
Job Manager
U-SQL Catalog
YARN
Job submission
Job execution
U-SQL Runtime Vertex execution
U-SQL Compilation Process
C#
C++
Algebra
Other files
(system files, deployed resources)
managed dll
Unmanaged dll
Compilation output (in job folder)
Compiler &
Optimizer
U-SQL Metadata
Service
Deployed to
Vertices
Job Status in
Visual Studio
Preparing
Queued
Running
Finalizing
Ended
(Succeeded, Failed, Cancelled)
New
Compiling
Queued
Scheduling
Starting
Running
Ended
UX Job State
The script is being compiled by the Compiler Service
All jobs enter the queue.
Are there enough ADLAUs to start the job?
If yes, then allocate those ADLAUs for the job
The U-SQL runtime is now executing the code on 1
or more ADLAUs or finalizing the outputs
The job has concluded.
The Job Queue
The queue is ordered by
job priority.
Lower numbers -> higher
priority.
1 = highest.
Running jobs
When a job is at the top
of the queue, it will start
running.
Defaults:
Max Running Jobs = 3
Max Tokens per job = 20
Max Queue Size = 200
Priority Doesn’t Preempt Running Jobs
X has Pri=1.
X
A
B
C
X will NOT preempt running jobs. X will have to wait.
These are all running
and have very low
priority (pri=1000)
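The queue rules above can be sketched as a small simulation. This is a toy model, not the real Job Scheduler: the class `JobQueue`, its method names, and the tie-breaking by submission order are assumptions made for illustration. Only the ordering rule (lower number = higher priority), the no-preemption rule, and the default of 3 running jobs come from the slides.

```python
import heapq
import itertools

MAX_RUNNING_JOBS = 3  # default from the slides


class JobQueue:
    """Toy model: jobs wait in priority order; running jobs are never preempted."""

    def __init__(self, max_running=MAX_RUNNING_JOBS):
        self.max_running = max_running
        self.running = []                # names of jobs currently running
        self._waiting = []               # heap of (priority, submit_order, name)
        self._order = itertools.count()  # assumed tie-breaker: submission order

    def submit(self, name, priority):
        heapq.heappush(self._waiting, (priority, next(self._order), name))
        self._start_waiting_jobs()

    def finish(self, name):
        self.running.remove(name)
        self._start_waiting_jobs()

    def waiting(self):
        return [name for _, _, name in sorted(self._waiting)]

    def _start_waiting_jobs(self):
        # Only free slots are filled; nothing already running is evicted.
        while self._waiting and len(self.running) < self.max_running:
            _, _, name = heapq.heappop(self._waiting)
            self.running.append(name)


queue = JobQueue()
for name in ("A", "B", "C"):
    queue.submit(name, priority=1000)   # low priority, but they start first
queue.submit("X", priority=1)           # highest priority, still has to wait
running_before = list(queue.running)
waiting_before = queue.waiting()
queue.finish("B")                       # a slot frees up, X starts
running_after = list(queue.running)
```

In this model X only starts once one of A, B, or C ends, which matches the slide: priority decides the order of the queue, not who gets to keep running.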
Resources
Blue items: the output of the
compiler
Grey items: U-SQL runtime bits
Download all the resources
Download a specific resource
The Job Folder
Inside the Default ADL Store:
/system/jobservice/jobs/Usql/YYYY/MM/DD/hh/mm/JOBID
/system/jobservice/jobs/Usql/2016/01/20/00/00/17972fc2-4737-48f7-81fb-49af9a784f64
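The path pattern above can be assembled with a short helper. This is a sketch under assumptions: the function name `job_folder_path` is hypothetical, and the slides do not say which timestamp (or time zone) the service uses for the date components, so the `submitted` argument is only a stand-in.

```python
from datetime import datetime

JOB_FOLDER_ROOT = "/system/jobservice/jobs/Usql"  # prefix taken from the slide


def job_folder_path(submitted: datetime, job_id: str) -> str:
    """Build /system/jobservice/jobs/Usql/YYYY/MM/DD/hh/mm/JOBID.

    Which timestamp the service really uses is an assumption here.
    """
    return f"{JOB_FOLDER_ROOT}/{submitted:%Y/%m/%d/%H/%M}/{job_id}"


example = job_folder_path(
    datetime(2016, 1, 20, 0, 0),
    "17972fc2-4737-48f7-81fb-49af9a784f64",
)
```

With the date and job ID from the slide, `example` reproduces the sample path shown above.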
Query Execution
Plans, Vertices, Stages, Parallelism, ADLAUs
Job
Scheduler
& Queue
Front-End Service
Optimizer
Vertex Scheduling
Compiler
Runtime
Visual Studio
Portal / API
Query Life
U-SQL Query Execution and Performance Basics (SQLBits 2016)
Parallelism
100 (ADLAUs)
Work composed of
12K Vertices
U-SQL Script -> Job Graph
Logical -> Physical Plan
Each square = “a vertex”
represents a fraction of the
total
Vertexes in each SuperVertex (aka
“Stage”) are doing the same
operation on a different part of the
same data.
Visualized as a
“Job Graph”
ADLAUs
Azure
Data
Lake
Analytics
Unit
Parallelism N = N ADLAUs
1 ADLAU ~=
A VM with 2 cores and 6 GB of
memory
Execution with Requested Parallelism
Requested Parallelism = 1
(reserve enough to do 1
vertex at a time)
Requested Parallelism = 4
(reserve enough to do 4
vertices at a time)
Notes
The next stage can
start before the
previous one has
finished
It may not be possible
to use all the reserved
parallelism during a
Stage
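A rough way to reason about requested parallelism is to count "waves" of vertices within one stage. The sketch below is a simplification, not the scheduler's algorithm: it assumes every vertex in a stage takes the same time and that stages do not overlap (the slides note that in practice the next stage can start early). The function names are hypothetical.

```python
import math


def stage_waves(vertex_count: int, parallelism: int) -> int:
    """Number of rounds needed if `parallelism` vertices run at a time."""
    return math.ceil(vertex_count / parallelism)


def peak_usage(vertex_count: int, parallelism: int) -> int:
    """How many of the reserved slots a stage can actually keep busy."""
    return min(vertex_count, parallelism)


# A stage with 12 vertices (illustrative number):
waves_p1 = stage_waves(12, parallelism=1)   # one vertex at a time
waves_p4 = stage_waves(12, parallelism=4)   # four vertices at a time

# A stage with only 2 vertices cannot use 4 reserved slots.
busy_slots = peak_usage(2, parallelism=4)
idle_slots = 4 - busy_slots
```

The second case illustrates the note above: a stage with fewer vertices than the requested parallelism leaves part of the reservation idle.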
Notes
The Job Resources are copied to each vertex
JOB
RESOURCES
Stage Details
252 Pieces of work
AVG Vertex
execution time
4.3 Billion rows
Data Read & Written
Super Vertex = Stage
Automatic Vertex retry
ORANGE: A vertex
failed … but was retried
automatically
Overall Stage
Completed Successfully
Vertex Execution View
All the vertexes
Filter which vertexes to
see
The Critical Path
Vertex Relationships
The vertex on the bottom depends on the output of the vertex
on the top
Critical Path
The dependency
chain of vertexes that
kept the job running
to the very end.
Efficiency
Cost vs Latency
JobCost = 5c + minutes × ADLAUs × ADLAU cost per min
Allocation
Allocating 10 ADLAUs
for a 10 minute job.
Cost
= 10 min * 10 ADLAUs
= 100 ADLAU minutes
Time
Blue line: Allocated
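The allocation arithmetic above, combined with the job cost formula, can be written out as a short calculation. This is a sketch: the rate of 3 cents per ADLAU minute is a made-up placeholder, not a real price, and reading the "5c" term as a flat 5 cents per job is an assumption.

```python
FIXED_FEE_CENTS = 5  # the "5c" term, read here as 5 cents per job (assumption)


def adlau_minutes(minutes: int, adlaus: int) -> int:
    """Billable quantity: allocated ADLAUs multiplied by job duration."""
    return minutes * adlaus


def job_cost_cents(minutes: int, adlaus: int, cents_per_adlau_minute: int) -> int:
    """JobCost = fixed fee + minutes x ADLAUs x cost per ADLAU minute."""
    return FIXED_FEE_CENTS + adlau_minutes(minutes, adlaus) * cents_per_adlau_minute


HYPOTHETICAL_RATE = 3  # cents per ADLAU minute; placeholder, not a real price

usage = adlau_minutes(minutes=10, adlaus=10)
cost = job_cost_cents(minutes=10, adlaus=10, cents_per_adlau_minute=HYPOTHETICAL_RATE)
```

Here `usage` is the 100 ADLAU minutes from the slide. The cost depends on the allocation, not on how many ADLAUs the job actually kept busy.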
Over-allocation: consider using fewer ADLAUs
You are paying for the area under the
blue line
You are only using the area under the
red line
Time
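The "area under the line" comparison can be approximated from per-minute samples. The sketch below is illustrative only: the sample values are invented, the function name is hypothetical, and it assumes one running vertex occupies one ADLAU.

```python
def allocation_efficiency(allocated_adlaus, active_vertices_per_minute):
    """Return (paid, used, ratio) in ADLAU minutes.

    paid = area under the allocation (blue) line
    used = area under the actually-running (red) line
    """
    paid = allocated_adlaus * len(active_vertices_per_minute)
    used = sum(min(active, allocated_adlaus) for active in active_vertices_per_minute)
    return paid, used, used / paid


# Invented samples: running vertices in each minute of a 10 minute job.
samples = [10, 8, 4, 1, 1, 1, 4, 8, 10, 3]
paid, used, ratio = allocation_efficiency(10, samples)
```

With these invented samples, half of the paid ADLAU minutes go unused, which is the situation the slide describes as a reason to consider a smaller allocation.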
Profile isn’t loaded
Profile is loaded now
Click Resource usage
Blue: Allocation
Red: Actual running
Dips down to 1 active vertex at
these times
Smallest estimated time when
given 2425 ADLAUs
1410 seconds
= 23.5 minutes
Model with 100 ADLAUs
8709 seconds
≈ 145.2 minutes
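The two modelled runs above can be compared in terms of latency and allocated ADLAU time. This is a rough sketch: it assumes the bill is proportional to allocated ADLAUs multiplied by duration and ignores any fixed per-job fee. The function name is hypothetical.

```python
def adlau_seconds(adlaus: int, seconds: int) -> int:
    """Allocated capacity multiplied by duration."""
    return adlaus * seconds


fast = {"adlaus": 2425, "seconds": 1410}   # smallest estimated time
slow = {"adlaus": 100, "seconds": 8709}    # model with 100 ADLAUs

speedup = slow["seconds"] / fast["seconds"]
cost_ratio = adlau_seconds(**fast) / adlau_seconds(**slow)
```

Under these assumptions the 2425 ADLAU run finishes roughly 6.2 times sooner but reserves roughly 3.9 times as much ADLAU time, which is the cost versus latency trade-off this section is about.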
http://aka.ms/AzureDataLake


Editor's Notes

  • #15: https://github.com/Azure/usql/tree/master/Examples/AmbulanceDemos/AmbulanceDemos/5-Ambulance-StreamSets-PartitionedTables