Airflow 101
Saar Bergerbest
Data Engineer

What is Airflow?
● Airflow is a platform to programmatically author, schedule, and monitor workflows.
● Started as an Airbnb project in 2014.
● Joined the Apache Software Foundation’s incubation program in March 2016.
● Used by a range of companies (logos shown on the original slide).

Why Airflow?
● Scheduling - gives you the ability to schedule data pipelines easily.
● Dependencies - conveniently define sequences of tasks.
● Triggering - target task instances in specific states (failed or success).
● Downtime recovery - if Airflow is restarted, it can fill the gaps by running the schedules it missed.
● UI - the Airflow UI makes it easy to monitor and troubleshoot your data pipelines.
● Error handling (retries) and logging.
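The "fill the gaps" idea from the downtime recovery bullet can be illustrated with a small, stdlib-only sketch. It is not Airflow code and ignores Airflow's real scheduling semantics; the function name `missed_runs` and the dates are hypothetical.

```python
from datetime import datetime, timedelta

def missed_runs(last_run, now, interval):
    """Return the scheduled times after `last_run` that are already due at `now`."""
    runs = []
    next_run = last_run + interval
    while next_run <= now:
        runs.append(next_run)
        next_run += interval
    return runs

# Hypothetical outage: the last run was on March 1 and the scheduler
# comes back on March 4 at noon, with a daily schedule.
gaps = missed_runs(
    last_run=datetime(2016, 3, 1),
    now=datetime(2016, 3, 4, 12),
    interval=timedelta(days=1),
)
for run in gaps:
    print("would backfill:", run.date())
```

Each returned timestamp is a run the scheduler would still owe after the restart.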

Basic Components
● Task / Operator: an individual task that needs to be done.
● Dependencies (upstream/downstream): the relations between the different tasks/operators (for example, taskA needs to run after taskB).
● DAG:
○ Directed Acyclic Graph - https://en.wikipedia.org/wiki/Directed_acyclic_graph
○ A collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies.
○ A DAG can have sub-DAGs.
○ A DAG can invoke another DAG.
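To make the "directed acyclic graph" idea concrete, here is a minimal sketch that models tasks and their dependencies with Python's standard `graphlib` module (Python 3.9+). It does not use Airflow; the task names and the helper `execution_order` are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Map each task to the set of tasks it depends on (its upstream tasks).
# Task names are hypothetical.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}

def execution_order(deps):
    """Return one valid run order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # The detected cycle is the second element of the exception's args.
        raise ValueError(f"not a DAG, cycle found: {exc.args[1]}") from exc

order = execution_order(dependencies)
print(order)
```

A graph such as `{"a": {"b"}, "b": {"a"}}` is rejected, which is the "acyclic" requirement: a task can never end up waiting on itself.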

DAG
● Describes a single workflow.
● Scheduled with a cron expression.
● Determines the dependencies between tasks. A task can depend on one or more other tasks before it triggers.
● A task can be configured to execute after all its dependencies have succeeded (default), once one has succeeded, once one has failed, etc.
● If a DAG fails at task X, we can restart that task and its dependencies.
● If a subDAG fails, we can restart all the tasks of that subDAG.
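The trigger conditions described above can be sketched as a small pure-Python function. The rule names follow Airflow's `trigger_rule` values (`all_success`, `one_success`, `one_failed`); the function `should_trigger` and the state strings are a toy model assumed for illustration, not Airflow's implementation.

```python
def should_trigger(rule, upstream_states):
    """Decide whether a task may run, given the states of its upstream tasks.

    `upstream_states` is a list of "success" / "failed" / "running" strings.
    """
    succeeded = [state == "success" for state in upstream_states]
    failed = [state == "failed" for state in upstream_states]
    if rule == "all_success":   # default: every upstream task succeeded
        return all(succeeded)
    if rule == "one_success":   # at least one upstream task succeeded
        return any(succeeded)
    if rule == "one_failed":    # at least one upstream task failed
        return any(failed)
    raise ValueError(f"unknown rule: {rule}")

states = ["success", "failed", "running"]
for rule in ("all_success", "one_success", "one_failed"):
    print(rule, should_trigger(rule, states))
```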

Operators: Python + Branch + Bash
1. PythonOperator - executes a Python callable.
2. BranchPythonOperator - allows a workflow to "branch", i.e. follow a single path after this task executes. It derives from the PythonOperator and expects a Python function that returns the task_id to follow.
3. BashOperator - executes a Bash script, command, or set of commands.
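The three operators wrap three kinds of work. The sketch below shows only that work as plain Python functions, without importing Airflow: a callable, a branch function that returns a task_id, and a helper that runs a Bash command. All function names and task ids are hypothetical, and `run_bash` assumes a `bash` executable is available on the PATH.

```python
import subprocess

def say_hello(name):
    """A plain callable, the kind of function a PythonOperator would execute."""
    return f"hello {name}"

def choose_branch(row_count):
    """A branch callable: return the task_id of the single path to follow."""
    return "process_rows" if row_count > 0 else "skip_processing"

def run_bash(command):
    """Run a Bash command and return its stdout (roughly the job of a BashOperator)."""
    result = subprocess.run(
        ["bash", "-c", command], capture_output=True, text=True, check=True
    )
    return result.stdout

print(say_hello("airflow"))
print(choose_branch(row_count=0))
```

The branch function returns a task id rather than performing the work itself, so the decision logic can be tested without running a workflow.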

Jinja Template
● Jinja2 is a modern and designer-friendly templating language for Python.
● Airflow leverages Jinja templating and provides the pipeline author with a set of built-in parameters and macros.
● Provides a concise and elegant syntax.
Jinja documentation: http://jinja.pocoo.org/docs/dev/
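To show what templating a task parameter means, the sketch below uses a deliberately tiny stand-in for Jinja built on the standard `re` module. It only substitutes `{{ name }}` placeholders; real Jinja also supports expressions, filters, and macros. The variable name `ds` mirrors Airflow's built-in template variable for the run date (YYYY-MM-DD); the `render` helper, the command, and the path are hypothetical.

```python
import re
from datetime import date

def render(template, context):
    """Replace each {{ name }} placeholder with context[name] (toy, not Jinja)."""
    def substitute(match):
        return str(context[match.group(1)])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

# A templated shell command; the date is filled in when the task runs.
command = "echo 'processing /data/{{ ds }}/events.csv'"
rendered = render(command, {"ds": date(2016, 3, 1).isoformat()})
print(rendered)
```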

XCom (cross communication via db)
● Airflow tasks can run on several workers; XComs let tasks exchange messages.
● An XCom is defined by a key, a value, and a timestamp, and also tracks attributes such as the task/DAG that created it and when it should become visible.
● Any object that can be pickled* can be used as an XCom value, so users should make sure to use objects of appropriate size.
● Tasks can push XComs at any time by calling the xcom_push() method. In addition, if a task returns a value, an XCom containing that value is pushed automatically.
● Tasks call xcom_pull() to retrieve XComs.
*pickle: https://docs.python.org/2/library/pickle.html
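The push/pull pattern can be sketched with an in-memory store that pickles values, standing in for the table Airflow keeps in its database. `ToyXComStore` and its methods are hypothetical and are not Airflow's API; the default key `return_value` mirrors the key Airflow uses for a task's returned value.

```python
import pickle

class ToyXComStore:
    """In-memory stand-in for the XCom table in Airflow's metadata database."""

    def __init__(self):
        self._rows = {}

    def push(self, task_id, value, key="return_value"):
        # Values are pickled, so anything picklable can be stored.
        self._rows[(task_id, key)] = pickle.dumps(value)

    def pull(self, task_id, key="return_value"):
        blob = self._rows.get((task_id, key))
        return None if blob is None else pickle.loads(blob)

    def size_in_bytes(self, task_id, key="return_value"):
        # Useful for checking that a value is of "appropriate size".
        return len(self._rows[(task_id, key)])

store = ToyXComStore()
store.push("extract", {"rows": 42, "path": "/tmp/out.csv"})  # like a returned value
store.push("extract", "2016-03-01", key="batch_date")        # like an explicit push
print(store.pull("extract"), store.pull("extract", key="batch_date"))
```

Because every value passes through `pickle.dumps` and lands in a database row, XComs suit small messages (ids, paths, counts) rather than whole datasets.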

Connections
● Information such as hostname, port, login, and password for other systems and services is handled in the ‘Connection’ section of the UI.
● The pipeline code you author references the ‘conn_id’ of the Connection objects.
● The information is saved in the database that Airflow manages; there is an option to encrypt passwords.
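As an illustration of why pipeline code only needs a `conn_id`, the sketch below keeps connection details in one place and looks them up by id. It assumes the connection is written as a URI (a form Airflow also accepts) and parses it with the standard library; the helper `parse_connection_uri`, the `warehouse_db` id, and the credentials are all hypothetical.

```python
from urllib.parse import urlsplit

def parse_connection_uri(uri):
    """Split a connection URI into the fields a Connection holds."""
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "login": parts.username,
        "password": parts.password,
        "schema": parts.path.lstrip("/"),
    }

# Central registry keyed by conn_id (stands in for Airflow's database).
connections = {
    "warehouse_db": parse_connection_uri(
        "postgres://etl_user:s3cret@db.example.com:5432/analytics"
    ),
}

# Pipeline code refers only to the id, never to the credentials themselves.
conn = connections["warehouse_db"]
print(conn["host"], conn["port"], conn["schema"])
```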

Variables
● Key-value storage within Airflow.
● Variables can be listed, created, updated, and deleted from the UI, code, or CLI.
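A minimal sketch of the key-value idea, assuming values are stored as strings and structured settings are kept as JSON text. The dictionary stands in for Airflow's storage; `get_variable` is a hypothetical helper whose `deserialize_json` flag mirrors the option of the same name on Airflow's `Variable.get`.

```python
import json

# Stand-in for the Variables stored by Airflow; keys and values are hypothetical.
variables = {
    "env": "staging",
    "etl_config": '{"batch_size": 500, "tables": ["users", "orders"]}',
}

def get_variable(key, default=None, deserialize_json=False):
    """Look up a variable, optionally decoding its string value as JSON."""
    raw = variables.get(key)
    if raw is None:
        return default
    return json.loads(raw) if deserialize_json else raw

print(get_variable("env"))
print(get_variable("etl_config", deserialize_json=True)["batch_size"])
```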

Operator/Sensor: SimpleHttp + HttpSensor
An operator performs a task; a sensor monitors a status.
1. SimpleHttpOperator - calls an endpoint on an HTTP system to execute an action.
2. HttpSensor - executes an HTTP GET request and returns False on failure: a 404 Not Found response, or the response_check function returning False.
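The sensor behaviour (poll until a check passes) can be sketched without any network access. Below, a fake endpoint returns canned status codes in place of a real HTTP GET. The names `wait_for` and `make_http_poke` are hypothetical; `poke_interval` and `timeout` echo the usual sensor settings, and this toy returns False on timeout instead of modelling how Airflow reports it.

```python
import time

def wait_for(poke, poke_interval=0.01, timeout=1.0, sleep=time.sleep):
    """Call poke() until it returns True; return False if the timeout passes first."""
    deadline = time.monotonic() + timeout
    while True:
        if poke():
            return True
        if time.monotonic() + poke_interval > deadline:
            return False
        sleep(poke_interval)

def make_http_poke(fetch_status, response_check=None):
    """Build a poke function: False on 404 or when response_check rejects the status."""
    def poke():
        status = fetch_status()
        if status == 404:
            return False
        return response_check(status) if response_check else True
    return poke

# Fake endpoint that answers 404 twice and then 200.
responses = iter([404, 404, 200])
ready = wait_for(make_http_poke(lambda: next(responses), lambda status: status == 200))
print(ready)
```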

Operators: Subdag
● The subdag id must be ‘parent.child’.
● Used to pack workflows that are used multiple times (modules).
● Gives the ability to retry a whole logical unit.
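The "module" idea and the ‘parent.child’ naming rule can be sketched with a factory function. Here a plain dict stands in for a DAG object; `make_subdag`, the dag ids, and the table names are hypothetical.

```python
def make_subdag(parent_dag_id, child_task_id, tables):
    """Describe a reusable sub-workflow as a plain dict (toy stand-in for a DAG)."""
    return {
        # The required 'parent.child' id.
        "dag_id": f"{parent_dag_id}.{child_task_id}",
        "tasks": [f"load_{table}" for table in tables],
    }

# The same factory is reused by two different parent workflows.
daily = make_subdag("daily_etl", "load_tables", ["users", "orders"])
hourly = make_subdag("hourly_etl", "load_tables", ["events"])
print(daily["dag_id"], daily["tasks"])
print(hourly["dag_id"], hourly["tasks"])
```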

More Examples of Operators
● HiveOperator
● PrestoToMysqlOperator
● S3FileTransformOperator
● SlackOperator
● DockerOperator
● EmailOperator

Scheduler
● Monitors DAGs according to their cron configuration.
● Monitors the tasks within DAGs and triggers the task instances whose dependencies have been met.
● The scheduler starts an instance of the executor, which executes the tasks. The executor is specified in your airflow.cfg file (the default executor) or defined for the task in the code.
Example of starting the scheduler...
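The bullet about triggering task instances whose dependencies have been met can be modelled as a simple loop. This is a toy sketch, not how Airflow's scheduler is implemented; `ready_tasks`, `run_dag`, the state strings, and the task names are hypothetical.

```python
def ready_tasks(deps, states):
    """Tasks not yet started whose upstream tasks have all succeeded."""
    return sorted(
        task
        for task, upstream in deps.items()
        if states[task] == "none" and all(states[u] == "success" for u in upstream)
    )

def run_dag(deps, run_task):
    """Repeatedly trigger ready tasks until nothing more can run."""
    states = {task: "none" for task in deps}
    history = []
    while True:
        batch = ready_tasks(deps, states)
        if not batch:
            return states, history
        for task in batch:
            states[task] = "success" if run_task(task) else "failed"
            history.append(task)

deps = {
    "extract": [],
    "transform": ["extract"],
    "load": ["transform"],
    "report": ["transform"],
}
states, history = run_dag(deps, run_task=lambda task: True)
print(history)
```

If `transform` fails, `load` and `report` never become ready and stay in the `none` state.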

Executors
1. SequentialExecutor (default with SQLite):
● Runs one task instance at a time.
2. LocalExecutor:
● Runs tasks in parallel.
● A configured number of LocalWorkers execute the tasks.
3. CeleryExecutor:
● Parallel and distributed.
● Uses a Celery backend: a queue of tasks for execution (RabbitMQ, Redis).
● The scheduler inserts the relevant tasks into the Celery queue, and Celery workers execute them.
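The difference between running one task at a time and letting several workers drain a shared queue can be sketched with the standard library. In this toy model a `queue.Queue` stands in for the broker (RabbitMQ or Redis in the Celery case) and threads stand in for workers; the `execute` function and the task names are hypothetical and unrelated to Airflow's executor classes.

```python
import queue
import threading

def execute(tasks, workers):
    """Run callables from a shared queue using `workers` threads; return results by name."""
    todo = queue.Queue()
    for name, fn in tasks.items():
        todo.put((name, fn))
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                name, fn = todo.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            value = fn()
            with lock:
                results[name] = value

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results

tasks = {f"task_{i}": (lambda i=i: i * i) for i in range(4)}
sequential = execute(tasks, workers=1)  # one task instance at a time
parallel = execute(tasks, workers=3)    # several workers drain the same queue
print(sequential == parallel)
```

The results are the same in both modes; only the degree of concurrency changes, which is the trade-off the executor choice controls.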
UI
