Visually Transform Data in Azure Data Factory or Azure Synapse Analytics
Cathrine Wilhelmsen
#PASSDataSummit
Data Warehousing Big Data and Analytics
Learning Pathway:
The Battle of the Data Transformation Tools
Visually Transform Data in Azure Data Factory or Azure Synapse Analytics
Cathrine Wilhelmsen
Power up Your Transformations Game with Power Query in Power BI and Fabric
Marthe Moengen
Azure Databricks and Notebooks in Fabric – A Transformation Dream Come True?
Emilie Rønning
The Battle of the Data Transformation Tools
Cathrine Wilhelmsen, Marthe Moengen, Emilie Rønning
Session Description
Do you need to clean, convert, aggregate, prepare, or transform large amounts of data, but don't want to spend your time learning a new programming language or writing lots of code? If so, Data Flows in Azure Data Factory or Azure Synapse Analytics could be the tool for you!
With Data Flows, you can build both simple and complex data transformations in a visual editor. Data Flows run on managed Spark clusters, giving you scale-out performance for big data analytics without having to worry about the nitty-gritty details.
We will look at the capabilities and use cases for Data Flows, where they best fit into your architecture, and how they compare to Power Query (used in Dataflows Gen2 in Microsoft Fabric). Then, we will work through a few Data Flows demos to dig deeper into the available transformations, the expression language, and the visual expression builder. Finally, we will cover how to orchestrate and monitor Data Flows, discuss lessons learned, and explain the pricing model.
Cathrine Wilhelmsen
She / Her
Solutions Architect
Evidi
hi@cathrinew.net
cathrinew.net
@cathrinew
I love data and coding, as well as teaching and sharing knowledge
Microsoft Data Platform MVP
Organizing Fabric February
Renovating a house
Quick Overview
What is Azure Data Factory?
Standalone service for:
• Data Integration
• Workflow Orchestration
• Scheduling
What is Azure Synapse Analytics?
Unified analytics platform:
• Data Integration
• Data Lake
• Data Warehousing
• Big Data Analytics
• Time-Series Analytics
• Data Science
Orchestration
Ingest Data → Transform Data
Triggers
Linked Services
Activities
Datasets
Pipelines
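These building blocks reference each other in one direction: a trigger starts a pipeline, a pipeline groups activities, activities read and write datasets, and each dataset points to a linked service that holds the connection details. The Python sketch below models those references with plain dataclasses. Every class, field, and name in it is hypothetical and simplified; it is not the Azure Data Factory JSON schema or SDK, and the connection strings are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LinkedService:
    # Connection information for a data store or compute service (hypothetical fields).
    name: str
    connection: str


@dataclass
class Dataset:
    # A named view of data inside the store that the linked service connects to.
    name: str
    linked_service: LinkedService
    path: str


@dataclass
class Activity:
    # One step in a pipeline, reading from and writing to datasets.
    name: str
    inputs: List[Dataset] = field(default_factory=list)
    outputs: List[Dataset] = field(default_factory=list)


@dataclass
class Pipeline:
    # A logical grouping of activities.
    name: str
    activities: List[Activity] = field(default_factory=list)

    def linked_services(self) -> List[str]:
        # Collect every connection the pipeline depends on, in first-seen order.
        names: List[str] = []
        for activity in self.activities:
            for dataset in activity.inputs + activity.outputs:
                if dataset.linked_service.name not in names:
                    names.append(dataset.linked_service.name)
        return names


@dataclass
class Trigger:
    # Decides when a pipeline run starts.
    name: str
    pipeline: Pipeline


# Usage: a trigger that starts a pipeline copying a CSV file into a SQL table.
blob = LinkedService("BlobStorage", "https://<account>.blob.core.windows.net")
sql = LinkedService("AzureSql", "Server=<server>;Database=<database>")
raw = Dataset("RawCsv", blob, "raw/sales.csv")
table = Dataset("SalesTable", sql, "dbo.Sales")
copy_sales = Activity("CopySales", inputs=[raw], outputs=[table])
nightly = Trigger("Nightly", Pipeline("LoadSales", [copy_sales]))
print(nightly.pipeline.linked_services())  # ['BlobStorage', 'AzureSql']
```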
Ingesting Data
Copy Data Activity
The core activity *
Supports 100+ connectors
Powerful built-in capabilities
* Cathrine's opinion
Copy Data Activity: Binary Files
Source → Sink (files are copied as-is, without parsing)
Copy Data Activity: Complex Data
Source → (Serialization / Deserialization, Compression / Decompression, Column Mapping) → Sink
• Serialization / Deserialization: Convert file formats
• Compression / Decompression: Zip or unzip files
• Column Mapping: Map columns implicitly or explicitly
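To make these stages concrete, the Python sketch below walks a gzip-compressed CSV file through decompression, deserialization, column mapping, and serialization into uncompressed JSON Lines. It only illustrates what each stage does to the data; it is not how the copy activity is implemented, and the column names and the explicit mapping are hypothetical.

```python
import csv
import gzip
import io
import json
from typing import Dict, Optional


def copy_csv_gz_to_jsonl(source: bytes, mapping: Optional[Dict[str, str]] = None) -> bytes:
    """Illustrate copy stages: decompress, deserialize, map columns, serialize."""
    # Decompression: unzip the gzip-compressed source file.
    text = gzip.decompress(source).decode("utf-8")
    # Deserialization: parse the CSV text into rows keyed by column name.
    rows = list(csv.DictReader(io.StringIO(text)))
    # Column mapping: implicit keeps the source names, explicit renames source -> sink.
    if mapping is not None:
        rows = [{sink: row[src] for src, sink in mapping.items()} for row in rows]
    # Serialization: one JSON object per line (JSON Lines), left uncompressed for the sink.
    return "\n".join(json.dumps(row) for row in rows).encode("utf-8")


source = gzip.compress(b"id,name\n1,Ada\n2,Linus\n")
explicit = copy_csv_gz_to_jsonl(source, {"id": "CustomerId", "name": "CustomerName"})
print(explicit.decode("utf-8"))
# {"CustomerId": "1", "CustomerName": "Ada"}
# {"CustomerId": "2", "CustomerName": "Linus"}
```

Calling `copy_csv_gz_to_jsonl(source)` without a mapping keeps the source column names, which corresponds to implicit mapping.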
Demo
Ingesting Data
Transforming Data
• Designer-First: Data Flows
• Code-First: Notebooks, SQL Scripts
Transforming Data: Data Flows
What are Data Flows?
• Data transformation at scale
• Visual editor, low-code experience
• Runs on serverless, managed Spark clusters
Why use Data Flows?
Transform big data without writing code
Modify complex structures using the expression language instead of Python, Scala, etc.
Why use Data Flows?
Optimized for data warehousing scenarios
Slowly changing dimensions, fact table loading, fuzzy lookups, data quality validation, etc.
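As an example of these data warehousing scenarios, a Type 2 slowly changing dimension keeps history by closing the current row and inserting a new one whenever a tracked attribute changes. In a data flow this could be built from transformations such as Lookup, Conditional Split, and Alter Row; the Python sketch below only shows the underlying logic on in-memory rows. The column names (`CustomerId`, `City`, `ValidFrom`, `ValidTo`, `IsCurrent`) are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional

Row = Dict[str, object]


def apply_scd2(dimension: List[Row], incoming: List[Row], load_date: date) -> List[Row]:
    """Merge source rows into a Type 2 dimension keyed on CustomerId, tracking City."""
    result = [dict(row) for row in dimension]  # copy rows so the input is not mutated
    current: Dict[object, Row] = {row["CustomerId"]: row for row in result if row["IsCurrent"]}
    for src in incoming:
        existing: Optional[Row] = current.get(src["CustomerId"])
        if existing is not None and existing["City"] == src["City"]:
            continue  # unchanged: keep the current version as-is
        if existing is not None:
            # Changed: close the current version instead of overwriting it.
            existing["ValidTo"] = load_date
            existing["IsCurrent"] = False
        # New or changed: insert a new current version.
        new_row: Row = {"CustomerId": src["CustomerId"], "City": src["City"],
                        "ValidFrom": load_date, "ValidTo": None, "IsCurrent": True}
        result.append(new_row)
        current[src["CustomerId"]] = new_row
    return result


dimension = [{"CustomerId": 1, "City": "Oslo", "ValidFrom": date(2023, 1, 1),
              "ValidTo": None, "IsCurrent": True}]
incoming = [{"CustomerId": 1, "City": "Bergen"}, {"CustomerId": 2, "City": "Seattle"}]
for row in apply_scd2(dimension, incoming, date(2023, 11, 14)):
    print(row["CustomerId"], row["City"], row["ValidFrom"], row["ValidTo"], row["IsCurrent"])
# 1 Oslo 2023-01-01 2023-11-14 False
# 1 Bergen 2023-11-14 None True
# 2 Seattle 2023-11-14 None True
```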
Why use Data Flows?
Can handle flexible schemas and schema drift
Column pattern matching, rule-based mappings, byNames, byPosition, etc.
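Rule-based mapping applies one rule to every column that matches a condition instead of listing columns by name, so columns that appear later (schema drift) are still handled. In a Select transformation, a comparable rule could use a matching condition such as `type == 'string'` and an output name such as `'str_' + $$`, where `$$` refers to the matched column name. The Python sketch below mimics that idea; `rule_based_select`, `is_string`, and `add_prefix` are hypothetical helpers, and inferring the column type from each Python value is a simplification of this sketch.

```python
from typing import Callable, Dict

Row = Dict[str, object]


def rule_based_select(row: Row, matches: Callable[[str, type], bool],
                      rename: Callable[[str], str]) -> Row:
    """Keep the columns that match the rule and rename them, whatever the schema."""
    return {rename(name): value for name, value in row.items() if matches(name, type(value))}


def is_string(name: str, col_type: type) -> bool:
    # Matching condition: every string column, regardless of its name.
    return col_type is str


def add_prefix(name: str) -> str:
    # Output name: prefix the matched column name.
    return "str_" + name


# The second load has an extra column (schema drift); the same rule still applies.
day1 = {"id": 1, "name": "Ada", "city": "Oslo"}
day2 = {"id": 2, "name": "Linus", "city": "Helsinki", "email": "linus@example.com"}
print(rule_based_select(day1, is_string, add_prefix))
# {'str_name': 'Ada', 'str_city': 'Oslo'}
print(rule_based_select(day2, is_string, add_prefix))
# {'str_name': 'Linus', 'str_city': 'Helsinki', 'str_email': 'linus@example.com'}
```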
Data Flows: Transformations
What are transformations?
• One step in the data flow
• Executed sequentially
• Order generally doesn’t matter for performance
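Each transformation takes the output of the previous step as its input. The Python sketch below chains a hypothetical filter, derived column, and aggregate step over in-memory rows to show that flow. On Spark the chain is compiled into one execution plan that the engine can optimize, which is why the order of the steps usually matters more for correctness than for performance; this sketch does no such optimization.

```python
from functools import reduce
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def filter_rows(rows: List[Row]) -> List[Row]:
    # Filter: keep only completed orders.
    return [r for r in rows if r["status"] == "completed"]


def derive_total(rows: List[Row]) -> List[Row]:
    # Derived Column: add total = quantity * price.
    return [{**r, "total": r["quantity"] * r["price"]} for r in rows]


def aggregate_by_country(rows: List[Row]) -> List[Row]:
    # Aggregate: sum the totals per country.
    sums: Dict[str, float] = {}
    for r in rows:
        sums[r["country"]] = sums.get(r["country"], 0) + r["total"]
    return [{"country": c, "total": t} for c, t in sums.items()]


def run_flow(source: List[Row], steps: List[Step]) -> List[Row]:
    # Each step receives the previous step's output, like transformations in a data flow.
    return reduce(lambda rows, step: step(rows), steps, source)


orders = [
    {"country": "NO", "status": "completed", "quantity": 2, "price": 10.0},
    {"country": "NO", "status": "cancelled", "quantity": 1, "price": 99.0},
    {"country": "US", "status": "completed", "quantity": 3, "price": 5.0},
]
print(run_flow(orders, [filter_rows, derive_total, aggregate_by_country]))
# [{'country': 'NO', 'total': 20.0}, {'country': 'US', 'total': 15.0}]
```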
Which transformations exist?
• Inputs / outputs (Blue)
• Multiple inputs / outputs (Purple)
• Schema modifiers (Green)
• Row modifiers (Orange)
• Formatters (Teal)
• Flowlets (Turquoise)
Demo
Transforming Data
Data Flows: Orchestration
What does orchestration mean?
• Defining Workflows: Which activities to run in which order?
• Configuring Alerts and Error Handling: How to handle unexpected results and failures?
• Adding Triggers: When to execute pipelines?
Activity Dependencies
Activity Dependencies: Logical… ?
Activity Dependencies: Logical AND
An activity with multiple dependencies runs only when all of them are met.
Activity Dependencies: Logical OR
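In a pipeline's JSON definition, each activity lists its predecessors in a `dependsOn` array, where each entry names an activity and the dependency conditions (`Succeeded`, `Failed`, `Skipped`, or `Completed`) it waits for. Across several dependencies the conditions are combined with AND, so OR logic typically needs a workaround, for example the `Completed` condition (succeeded or failed) or extra activities. The Python sketch below evaluates a `dependsOn` list under that AND rule. It is a simplified model: which statuses `Completed` accepts and treating several conditions on one dependency as alternatives are assumptions of this sketch.

```python
from typing import Dict, List

# Predecessor statuses each dependency condition accepts (simplified model).
ACCEPTED = {
    "Succeeded": {"Succeeded"},
    "Failed": {"Failed"},
    "Skipped": {"Skipped"},
    "Completed": {"Succeeded", "Failed"},
}


def dependencies_met(depends_on: List[Dict], statuses: Dict[str, str]) -> bool:
    """Logical AND across all dependencies in a dependsOn list."""
    return all(
        # Assumption: several conditions on one dependency are treated as alternatives.
        any(statuses[dep["activity"]] in ACCEPTED[cond] for cond in dep["dependencyConditions"])
        for dep in depends_on
    )


statuses = {"CopyCustomers": "Succeeded", "CopyOrders": "Failed"}

# Run only if both copies succeeded (logical AND).
both_succeeded = [
    {"activity": "CopyCustomers", "dependencyConditions": ["Succeeded"]},
    {"activity": "CopyOrders", "dependencyConditions": ["Succeeded"]},
]
# Run once both copies have finished, whatever the outcome.
both_completed = [
    {"activity": "CopyCustomers", "dependencyConditions": ["Completed"]},
    {"activity": "CopyOrders", "dependencyConditions": ["Completed"]},
]
print(dependencies_met(both_succeeded, statuses))  # False
print(dependencies_met(both_completed, statuses))  # True
```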
Triggers
Execute the last published version of a pipeline:
• On a set Schedule
• In a Tumbling Window
• When an Event happens
• Now
Triggers: Schedule
Execute one or more pipelines on a set schedule:
• Every Wednesday at 06:00
• Last day of the month at 18:00
• Every Monday at 04:00 and Friday at 20:00
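A schedule trigger's recurrence combines a frequency and interval with a schedule of values such as `weekDays`, `hours`, and `minutes`. The Python sketch below expands a simplified weekly schedule into its run times within one week, assuming that every combination of the listed week days, hours, and minutes becomes a run. `weekly_runs` is a hypothetical helper; time zones, start and end dates, and monthly schedules such as "last day of the month" are not covered.

```python
from datetime import datetime, timedelta
from itertools import product
from typing import Dict, List

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def weekly_runs(schedule: Dict[str, List], week_start: datetime) -> List[datetime]:
    """Expand a simplified weekly schedule into run times within one week.

    week_start must be a Monday at 00:00. Every combination of the listed
    week days, hours, and minutes becomes one run (time zones are ignored).
    """
    runs = [
        week_start + timedelta(days=WEEKDAYS.index(day), hours=hour, minutes=minute)
        for day, hour, minute in product(schedule["weekDays"], schedule["hours"], schedule["minutes"])
    ]
    return sorted(runs)


week = datetime(2023, 11, 13)  # Monday, 13 November 2023

# Every Wednesday at 06:00: one run per week.
wednesday = {"weekDays": ["Wednesday"], "hours": [6], "minutes": [0]}

# Monday at 04:00 and Friday at 20:00, modeled as two schedules and merged.
monday_friday = sorted(
    weekly_runs({"weekDays": ["Monday"], "hours": [4], "minutes": [0]}, week)
    + weekly_runs({"weekDays": ["Friday"], "hours": [20], "minutes": [0]}, week)
)

# Listing both days and both hours in one schedule yields four runs instead.
combined = weekly_runs({"weekDays": ["Monday", "Friday"], "hours": [4, 20], "minutes": [0]}, week)

for run in monday_friday:
    print(run.strftime("%A %H:%M"))
```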
Triggers: Tumbling Window
Execute a single pipeline for each time slice:
• For every 15 minutes
• For every 1 hour
• For every 24 hours
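A tumbling window trigger splits time into fixed-size, non-overlapping, contiguous intervals and starts one pipeline run per interval; the run can read its window boundaries through `@trigger().outputs.windowStartTime` and `@trigger().outputs.windowEndTime`. The Python sketch below only computes the windows between a start and an end time. `tumbling_windows` is a hypothetical helper that ignores delays, retries, concurrency, and dependencies, and emitting only complete windows is an assumption of this sketch.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def tumbling_windows(start: datetime, end: datetime,
                     size: timedelta) -> Iterator[Tuple[datetime, datetime]]:
    """Yield contiguous, non-overlapping [window_start, window_end) slices."""
    if size <= timedelta(0):
        raise ValueError("window size must be positive")
    window_start = start
    while window_start + size <= end:  # assumption: only complete windows are emitted
        yield window_start, window_start + size
        window_start += size


# Usage: 15-minute windows over one hour give four pipeline runs.
windows = list(tumbling_windows(datetime(2023, 11, 14, 8, 0),
                                datetime(2023, 11, 14, 9, 0),
                                timedelta(minutes=15)))
for window_start, window_end in windows:
    print(window_start.strftime("%H:%M"), "-", window_end.strftime("%H:%M"))
# 08:00 - 08:15
# 08:15 - 08:30
# 08:30 - 08:45
# 08:45 - 09:00
```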
Triggers: Storage or Custom Events
Execute one or more pipelines when:
• Blob is Created
• Blob is Deleted
• Custom Event Happens
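A storage event trigger reacts to Azure Event Grid events such as `Microsoft.Storage.BlobCreated` and `Microsoft.Storage.BlobDeleted`, and can be narrowed down with "blob path begins with" and "blob path ends with" filters. The Python sketch below shows that filtering logic on a simplified event. The event dictionary is trimmed to two fields with a simplified path, and `should_fire` is a hypothetical helper, not part of any Azure SDK.

```python
from typing import Dict, Optional, Set


def should_fire(event: Dict[str, str], event_types: Set[str],
                begins_with: Optional[str] = None, ends_with: Optional[str] = None) -> bool:
    """Return True if a simplified blob event matches the trigger's filters."""
    if event["eventType"] not in event_types:
        return False
    path = event["path"]  # simplified path: container/folder/file
    if begins_with and not path.startswith(begins_with):
        return False
    if ends_with and not path.endswith(ends_with):
        return False
    return True


created = {"eventType": "Microsoft.Storage.BlobCreated", "path": "sales/incoming/2023-11-14.csv"}
deleted = {"eventType": "Microsoft.Storage.BlobDeleted", "path": "sales/incoming/2023-11-14.csv"}
on_created = {"Microsoft.Storage.BlobCreated"}
print(should_fire(created, on_created, begins_with="sales/incoming/", ends_with=".csv"))  # True
print(should_fire(deleted, on_created, begins_with="sales/incoming/", ends_with=".csv"))  # False
```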
Triggers: Now
Execute a single pipeline immediately
Demo
Orchestration and Monitoring
Pricing
Azure Data Factory Data Flows
• Basic (General Purpose): $0.274 per vCore-hour
• Standard (Memory Optimized): $0.343 per vCore-hour
* Prices in USD from November 2023.
All activities are prorated by the minute and rounded up.
The minimum cluster size is 8 vCores.
Azure Synapse Analytics Data Flows
• Basic (General Purpose): $0.257 per vCore-hour
• Standard (Memory Optimized): $0.325 per vCore-hour
* Prices in USD from November 2023.
All activities are prorated by the minute and rounded up.
The minimum cluster size is 8 vCores.
Data Flows: Rounding Up
All activities are prorated by the minute and rounded up.
The minimum cluster size is 8 vCores.
Data Flows: Cluster Size
Smallest cluster: 4 worker cores (+4 driver cores) = 8 vCores
• Basic (General Purpose): $0.257 per vCore-hour × 8 vCores = $2.056 per hour
• Standard (Memory Optimized): $0.325 per vCore-hour × 8 vCores = $2.60 per hour
Large cluster: 256 worker cores (+16 driver cores) = 272 vCores
• Basic (General Purpose): $0.257 per vCore-hour × 272 vCores = $69.90 per hour
• Standard (Memory Optimized): $0.325 per vCore-hour × 272 vCores = $88.40 per hour
* Prices in USD from November 2023.
All activities are prorated by the minute and rounded up.
The minimum cluster size is 8 vCores.
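The per-hour figures above are total vCores (worker plus driver cores) multiplied by the price per vCore-hour, and runs are prorated by the minute and rounded up. The Python sketch below turns that into a cost estimate for a single data flow run. It uses the November 2023 Azure Synapse Analytics prices from the slides; `data_flow_cost` is a hypothetical helper, and treating the given duration as the full billed cluster time while ignoring all other charges is a simplification.

```python
import math

# Azure Synapse Analytics data flow prices in USD per vCore-hour (November 2023, from the slides).
PRICE_PER_VCORE_HOUR = {"Basic": 0.257, "Standard": 0.325}


def data_flow_cost(tier: str, worker_cores: int, driver_cores: int, seconds: float) -> float:
    """Estimate cost as total vCores x price x billed hours, with minutes rounded up."""
    vcores = worker_cores + driver_cores
    billed_minutes = math.ceil(seconds / 60)
    return vcores * PRICE_PER_VCORE_HOUR[tier] * billed_minutes / 60


print(round(data_flow_cost("Basic", 4, 4, 3600), 3))        # 2.056 (8 vCores for one hour)
print(round(data_flow_cost("Standard", 256, 16, 3600), 2))  # 88.4 (272 vCores for one hour)
print(round(data_flow_cost("Basic", 4, 4, 61), 4))          # 0.0685 (61 seconds billed as 2 minutes)
```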
Continued Learning
Kamil Nowinski’s Cheat Sheet
github.com/Azure-Player/CheatSheets
Keeping Up with Data Flows
Resources: aka.ms/dflinks
Videos: aka.ms/dataflowvideos
Lessons Learned
«Overkill for small data»
- Cathrine
«Smaller datasets for testing will save your butt»
- Cathrine
«Turn. It. Off. When. Finished.»
- Cathrine
Q&A
Session evaluation
Your feedback is important to us!
Evaluate this session at:
PASSDataCommunitySummit.com/evaluation
Thank you!
Special thanks to Mark Kromer
Cathrine Wilhelmsen
cathrine@fabricfebruary.com
cathrinew.net
@cathrinew

Visually Transform Data in Azure Data Factory or Azure Synapse Analytics (PASS Data Community Summit 2023)