Michael Rys
Principal Program Manager, Big Data @ Microsoft
@MikeDoesBigData, {mrys, usql}@microsoft.com
U-SQL Does SQL
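“Top 5” Surprises for SQL Users
AS is not as
• U-SQL keywords must be written in UPPERCASE
• C# keywords and SQL keywords overlap
• Making keywords case-insensitive would be costly -> better to build capabilities than to tinker with syntax
= != ==
• Remember: expressions use the C# expression language, so equality is ==
null IS NOT NULL
• C# nulls are two-valued: comparisons involving null yield true or false, never UNKNOWN
PROCEDURES but no WHILE
No UPDATE or MERGE
• Transform/recook the data instead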
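To make the two-valued null surprise concrete, here is a small Python model (not U-SQL) that puts C#-style lifted comparison operators next to SQL's three-valued logic. None stands for null/NULL, and in the SQL model a None result stands for UNKNOWN. The helper names (cs_ne, cs_lt, sql_ne, where) are hypothetical and exist only for this illustration.

```python
from typing import List, Optional

# Toy emulation, not U-SQL: C# lifted operators on nullable values vs. SQL
# three-valued logic. None plays the role of C# null / SQL NULL.

def cs_ne(a: Optional[int], b: Optional[int]) -> bool:
    # C# lifted !=: null != 5 is true, null != null is false (Python's != agrees).
    return a != b

def cs_lt(a: Optional[int], b: Optional[int]) -> bool:
    # C# lifted <: false whenever either operand is null.
    if a is None or b is None:
        return False
    return a < b

def sql_ne(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    # SQL <>: UNKNOWN (modeled as None) whenever either operand is NULL.
    if a is None or b is None:
        return None
    return a != b

def where(rows: List[Optional[int]], pred) -> List[Optional[int]]:
    # A WHERE clause keeps a row only when the predicate is exactly true.
    return [r for r in rows if pred(r) is True]

rows = [5, None, 7]
cs_result = where(rows, lambda r: cs_ne(r, 5))    # C#-style:  x != 5
sql_result = where(rows, lambda r: sql_ne(r, 5))  # SQL-style: x <> 5
print(cs_result, sql_result)
```

The practical consequence: a C#-style `!=` filter keeps rows whose value is null, whereas the equivalent SQL filter silently drops them.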
U-SQL Does SQL (SQLBits 2016)
Transforming Rowsets
@customers =
    SELECT Customer.ToUpper() AS Customer
    FROM @orders
    WHERE Customer.Contains("Contoso");
• Customer.ToUpper() and Customer.Contains("Contoso") are C# expressions
• Use WHERE for filtering
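As a rough analogy, the rowset transformation above behaves like a filter followed by a projection over a list of rows. The Python sketch below emulates that; the sample rows are hypothetical, and Python's `in` and `str.upper()` stand in for C#'s case-sensitive `String.Contains` and `ToUpper()`.

```python
from typing import Dict, List

# Hypothetical sample data standing in for @orders.
orders: List[Dict[str, object]] = [
    {"Customer": "Contoso Ltd", "Amount": 100},
    {"Customer": "Fabrikam", "Amount": 250},
    {"Customer": "Contoso Pharma", "Amount": 40},
]

def transform(rows: List[Dict[str, object]]) -> List[Dict[str, str]]:
    # WHERE Customer.Contains("Contoso"): case-sensitive substring test on the
    # input column, applied before the projection.
    # SELECT Customer.ToUpper() AS Customer: compute the output column.
    return [{"Customer": str(r["Customer"]).upper()}
            for r in rows if "Contoso" in str(r["Customer"])]

customers = transform(orders)
print(customers)
```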
Grouping & Aggregation
@rows =
    SELECT
        Customer,
        SUM(Amount) AS TotalAmount
    FROM @orders
    GROUP BY Customer;
Many other aggregations are possible. You can define your own aggregator with C#!
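To show what GROUP BY with a pluggable aggregator does, here is a Python sketch. The aggregator's init/accumulate/terminate lifecycle is modeled loosely on the Init/Accumulate/Terminate pattern used by U-SQL's C# user-defined aggregators; the class and function names here are hypothetical, and this is not the U-SQL API.

```python
from typing import Callable, Dict, List

class SumAggregator:
    """Hypothetical aggregator with an init/accumulate/terminate lifecycle."""

    def init(self) -> None:
        self.total = 0

    def accumulate(self, value: int) -> None:
        self.total += value

    def terminate(self) -> int:
        return self.total

def group_by(rows: List[dict], key: str, value: str,
             make_agg: Callable[[], SumAggregator]) -> Dict[object, int]:
    groups: Dict[object, SumAggregator] = {}
    for row in rows:
        k = row[key]
        if k not in groups:
            # First row of a new group: create and initialize its aggregator.
            agg = make_agg()
            agg.init()
            groups[k] = agg
        groups[k].accumulate(row[value])
    # One output row per group, like GROUP BY Customer.
    return {k: agg.terminate() for k, agg in groups.items()}

orders = [
    {"Customer": "Contoso", "Amount": 100},
    {"Customer": "Fabrikam", "Amount": 250},
    {"Customer": "Contoso", "Amount": 50},
]
print(group_by(orders, "Customer", "Amount", SumAggregator))
```

Swapping in a different aggregator class changes the aggregation without touching the grouping logic, which is the point of user-defined aggregators.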
Grouping & Aggregation (2)
@rows =
    SELECT
        Customer,
        SUM(Amount) AS TotalAmount
    FROM @orders
    GROUP BY Customer
    HAVING TotalAmount > 1000000;
HAVING filters the output of a GROUP BY, while WHERE filters the input rows before grouping
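The difference between HAVING and WHERE is easiest to see with data where they disagree. In this Python sketch (hypothetical sample rows, not U-SQL), no single order exceeds 1,000,000, but one customer's total does.

```python
from typing import Dict, List

def sum_by_customer(rows: List[dict]) -> Dict[str, int]:
    # GROUP BY Customer with SUM(Amount) AS TotalAmount.
    totals: Dict[str, int] = {}
    for r in rows:
        totals[r["Customer"]] = totals.get(r["Customer"], 0) + r["Amount"]
    return totals

def having(totals: Dict[str, int], threshold: int) -> Dict[str, int]:
    # HAVING TotalAmount > threshold: filters groups after aggregation.
    return {c: t for c, t in totals.items() if t > threshold}

orders = [
    {"Customer": "Contoso", "Amount": 600_000},
    {"Customer": "Contoso", "Amount": 600_000},
    {"Customer": "Fabrikam", "Amount": 900_000},
]
big_customers = having(sum_by_customer(orders), 1_000_000)
# A row-level WHERE Amount > 1000000 would discard every input row here.
big_rows = [r for r in orders if r["Amount"] > 1_000_000]
print(big_customers, big_rows)
```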
Sorting a rowset
@customers =
    SELECT *
    FROM @customers
    ORDER BY Amount ASC
    FETCH FIRST 3 ROWS;
In a SELECT, ORDER BY requires FETCH FIRST!
Sorting on OUTPUT
OUTPUT @customers
TO @"/output.tsv"
ORDER BY Amount ASC
USING Outputters.Tsv();
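One way to read the FETCH FIRST rule: an intermediate rowset is unordered, so sorting it only has an observable effect when the sort decides which rows survive. The Python sketch below (hypothetical data, not U-SQL) emulates `ORDER BY Amount ASC FETCH FIRST 3 ROWS` as "sort, then keep the first n".

```python
from typing import List

def order_by_fetch_first(rows: List[dict], key: str, n: int) -> List[dict]:
    # ORDER BY key ASC FETCH FIRST n ROWS: sort ascending, keep the first n rows.
    # Python's sort is stable; U-SQL makes no such promise for ties.
    return sorted(rows, key=lambda r: r[key])[:n]

customers = [{"Customer": c, "Amount": a}
             for c, a in [("A", 50), ("B", 10), ("C", 40), ("D", 30), ("E", 20)]]
top3 = order_by_fetch_first(customers, "Amount", 3)
print([r["Customer"] for r in top3])
```

Ordering the final file is a separate concern, which is why it is expressed on OUTPUT rather than on the SELECT.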
Creating Constant Rowsets in Script
@departments =
    SELECT * FROM
        (VALUES
            (31, "Sales"),
            (33, "Engineering"),
            (34, "Clerical"),
            (35, "Marketing")
        ) AS D(DepID, DepName);
@m =
    SELECT new SQL.ARRAY<string>(tweet.Split(' ').Where(x => x.StartsWith("@"))) AS refs
    FROM @t;
@t =
    SELECT author, "authored" AS category
    FROM @t
    UNION ALL
    SELECT r.Substring(1) AS r, "referenced" AS category
    FROM @m CROSS APPLY EXPLODE(refs) AS Refs(r);
(Diagram: CROSS APPLY EXPLODE(refs) AS Refs(r) turns each array in @m(refs), such as [@me, @you] and [@him, @her], into one row per element in Refs(r): @me, @you, @him, @her.)
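The CROSS APPLY EXPLODE pattern above is essentially "build an array per row, then flatten it into one row per element". This Python sketch emulates the whole script on two hypothetical tweets; it is an illustration of the semantics, not U-SQL, and U-SQL does not guarantee the row order that Python produces here.

```python
from typing import Dict, List, Tuple

# Hypothetical sample rows standing in for @t(author, tweet).
tweets: List[Dict[str, str]] = [
    {"author": "alice", "tweet": "hi @me and @you"},
    {"author": "bob", "tweet": "@him meets @her"},
]

def mentions_graph(rows: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    # @m: one array of "@..." tokens per tweet
    # (C#: tweet.Split(' ').Where(x => x.StartsWith("@"))).
    m = [[w for w in r["tweet"].split(" ") if w.startswith("@")] for r in rows]
    # CROSS APPLY EXPLODE(refs) AS Refs(r): one output row per array element.
    exploded = [ref for refs in m for ref in refs]
    # UNION ALL of authors and referenced names; ref[1:] mirrors r.Substring(1).
    return ([(r["author"], "authored") for r in rows] +
            [(ref[1:], "referenced") for ref in exploded])

print(mentions_graph(tweets))
```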
U-SQL Joins
Join operators
• INNER JOIN
• LEFT, RIGHT, or FULL OUTER JOIN
• CROSS JOIN
• SEMIJOIN
  • Equivalent to an IN subquery
• ANTISEMIJOIN
  • Equivalent to a NOT IN subquery
Notes
• ON clause comparisons must have the simple form
  rowset.column == rowset.column
  or be AND conjunctions of such equality comparisons
• If a comparand is not a column, wrap it into a column in a previous SELECT
• If the comparison operator is not ==, move the comparison into the WHERE clause
  • If no equality comparison remains, turn the join into a CROSS JOIN
Reason: the syntax calls out which joins are efficient
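A plausible intuition for the equality-only ON rule is that equality keys allow hash-based matching, while any other predicate forces every pair to be examined. The Python sketch below illustrates that contrast and the SEMIJOIN/ANTISEMIJOIN semantics; it reuses the @departments values from the constant-rowset example, the employee rows are hypothetical, and nothing here describes U-SQL's actual optimizer.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple

departments: List[Row] = [(31, "Sales"), (33, "Engineering"), (34, "Clerical"), (35, "Marketing")]
employees: List[Row] = [("Ann", 31), ("Bo", 33), ("Cy", 33), ("Di", 34)]

def dep_id(d: Row) -> int:
    return d[0]

def emp_dep(e: Row) -> int:
    return e[1]

def inner_join(left, right, lkey: Callable, rkey: Callable) -> List[Tuple[Row, Row]]:
    # Equality keys: build a hash index on one side and probe it.
    index: Dict[object, List[Row]] = {}
    for r in right:
        index.setdefault(rkey(r), []).append(r)
    return [(l, r) for l in left for r in index.get(lkey(l), [])]

def semijoin(left, right, lkey: Callable, rkey: Callable) -> List[Row]:
    # Left rows with at least one match, each emitted once (like IN).
    keys = {rkey(r) for r in right}
    return [l for l in left if lkey(l) in keys]

def antisemijoin(left, right, lkey: Callable, rkey: Callable) -> List[Row]:
    # Left rows with no match (like NOT IN over non-null keys).
    keys = {rkey(r) for r in right}
    return [l for l in left if lkey(l) not in keys]

def cross_join_where(left, right, pred: Callable) -> List[Tuple[Row, Row]]:
    # Non-equality predicate: examine every pair, then filter (CROSS JOIN + WHERE).
    return [(l, r) for l in left for r in right if pred(l, r)]

print(semijoin(departments, employees, dep_id, emp_dep))
print(antisemijoin(departments, employees, dep_id, emp_dep))
```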
U-SQL Analytics
Windowing Expression
Windowing_Expression :=
    Window_Function_Call 'OVER' '('
        [ Over_Partition_By_Clause ]
        [ Order_By_Clause ]
        [ Row_Clause ]
    ')'.
Window_Function_Call :=
      Aggregate_Function_Call
    | Analytic_Function_Call
    | Ranking_Function_Call.
Windowing Aggregate Functions
ANY_VALUE, AVG, COUNT, MAX, MIN, SUM, STDEV, STDEVP, VAR, VARP
Analytic Functions
CUME_DIST, FIRST_VALUE, LAST_VALUE, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK; soon: LEAD/LAG
Ranking Functions
DENSE_RANK, NTILE, RANK, ROW_NUMBER
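The three most common ranking functions differ only in how they treat ties. This Python sketch (hypothetical data, not U-SQL) computes what `ROW_NUMBER()`, `RANK()` and `DENSE_RANK()` `OVER (PARTITION BY Customer ORDER BY Amount)` would assign.

```python
from itertools import groupby
from typing import Dict, List

def ranking(rows: List[Dict], part: str, order: str) -> List[Dict]:
    """ROW_NUMBER, RANK and DENSE_RANK OVER (PARTITION BY part ORDER BY order)."""
    out: List[Dict] = []
    ordered = sorted(rows, key=lambda r: (r[part], r[order]))
    for _, group in groupby(ordered, key=lambda r: r[part]):
        prev = None
        rank = dense = 0
        for row_number, r in enumerate(group, start=1):
            if row_number == 1 or r[order] != prev:
                rank = row_number  # RANK: position of the first row in a tie, leaves gaps
                dense += 1         # DENSE_RANK: counts distinct values, no gaps
                prev = r[order]
            out.append({**r, "ROW_NUMBER": row_number, "RANK": rank, "DENSE_RANK": dense})
    return out

rows = [
    {"Customer": "A", "Amount": 20}, {"Customer": "A", "Amount": 10},
    {"Customer": "B", "Amount": 5}, {"Customer": "A", "Amount": 30},
    {"Customer": "A", "Amount": 20},
]
result = ranking(rows, "Customer", "Amount")
print([(r["Customer"], r["Amount"], r["ROW_NUMBER"], r["RANK"], r["DENSE_RANK"]) for r in result])
```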
INSERT
• INSERT constant values
• INSERT from queries
• Multiple INSERTs
INSERT constant values
INSERT INTO T VALUES (1, "text", new SQL.MAP<string,string>("key","value"));
INSERT from queries
INSERT INTO T SELECT col1, col2, col3 FROM @rowset;
Multiple INSERTs into the same table
• Are supported
• Generate a separate file per insert in physical storage
  • This can lead to performance degradation
• Recommendations:
  • Avoid small inserts where possible
  • Rebuild the table after frequent insertions with:
    ALTER TABLE T REBUILD;
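Why many small inserts hurt, and what REBUILD buys back, can be pictured with a toy model: each INSERT adds one file, every scan has to open all files, and a rebuild compacts them. The ToyTable class below is hypothetical and models only the idea stated above; U-SQL's real physical layout is not represented.

```python
from typing import List

class ToyTable:
    """Toy model only: each INSERT adds one file; REBUILD compacts them."""

    def __init__(self) -> None:
        self.files: List[List[tuple]] = []

    def insert(self, rows: List[tuple]) -> None:
        # Every INSERT statement produces its own file.
        self.files.append(list(rows))

    def rebuild(self) -> None:
        # Compact all files into one (the real layout is not modeled here).
        merged = [r for f in self.files for r in f]
        self.files = [merged] if merged else []

    def scan(self) -> List[tuple]:
        # A query has to open every file, so cost grows with the file count.
        return [r for f in self.files for r in f]

t = ToyTable()
for i in range(3):
    t.insert([(i, "text")])
before = len(t.files)
t.rebuild()
print(before, len(t.files), t.scan())
```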
Additional Resources
Documentation
U-SQL Reference Doc: http://aka.ms/usql_reference
Sample Projects
https://github.com/Azure/usql/tree/master/Examples/AmbulanceDemos/AmbulanceDemos/2-Ambulance-Structured%20Data
https://github.com/Azure/usql/tree/master/Examples/TweetAnalysis
http://aka.ms/AzureDataLake
