Orchestration
Created: May 14, 2021, Updated: March 16, 2022
So far, whenever we wanted to run something, we had to trigger it manually via the Airflow UI. That works, but let's think bigger. You don't want to wake up at 6 every morning just to click your DAGs. You want to set up a way for Airflow to run your tasks automatically. This is what orchestrations are for.
Configuration
Orchestrations are a way to glue all the individual components together. Imagine our scenario with a single extractor, a single transformation and a single datamart. What we want Bizzflow to do is:
Every day at 6:00 AM, run the extractor; when it completes,
run the transformation; and when that completes, run the datamart.
Again, Bizzflow doesn't exactly speak English, so let's put all this information into JSON:
{
  "id": "{orchestration_id}",
  "schedule": "{crontab_notation}",
  "tasks": [
    {
      "type": "{task_component_type}",
      "id": "{component_id}"
    }
  ]
}
id
The orchestration id is, again, a way for us to distinguish multiple orchestrations. Let's call ours main.
schedule
The orchestration schedule uses crontab notation. To run at 6:00 AM every day, the crontab notation is 0 6 * * * (minute 0, hour 6, any day of the month, any month, any day of the week).
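To make the crontab notation concrete, here is a minimal Python sketch of how a five-field schedule such as 0 6 * * * maps to run times. It is illustrative only: the helpers `matches` and `next_run` are hypothetical, not part of Bizzflow or Airflow, and they support only * or a single number per field (no ranges, lists or steps). They also AND all fields together, which differs from standard cron when both day fields are restricted.

```python
from datetime import datetime, timedelta


def _field_matches(field: str, value: int) -> bool:
    # "*" matches any value; otherwise this sketch supports a single integer only
    return field == "*" or int(field) == value


def matches(schedule: str, moment: datetime) -> bool:
    """Return True if `moment` (to the minute) fits the five-field crontab schedule."""
    minute, hour, day, month, weekday = schedule.split()
    cron_weekday = (moment.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
    return (
        _field_matches(minute, moment.minute)
        and _field_matches(hour, moment.hour)
        and _field_matches(day, moment.day)
        and _field_matches(month, moment.month)
        and _field_matches(weekday, cron_weekday)
    )


def next_run(schedule: str, after: datetime) -> datetime:
    """Find the first whole minute strictly after `after` that fits the schedule."""
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    limit = candidate + timedelta(days=366)
    while candidate < limit:
        if matches(schedule, candidate):
            return candidate
        candidate += timedelta(minutes=1)
    raise ValueError(f"no run time found for {schedule!r} within a year")


print(next_run("0 6 * * *", datetime(2022, 3, 16, 7, 30)))
```

Because 7:30 AM is already past today's 6:00 AM slot, the next run of 0 6 * * * falls on the following morning.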
tasks
tasks is an array of tasks. Every task needs a type parameter and an id parameter. With type, you specify what kind of component should run the task (extractor, transformation, datamart, …).
id specifies the id of the configuration. For extractors, it is the file name of the extractor configuration. All other components have their id defined in their configuration files.
Let's put it all together. The file orchestrations.json holds an array of orchestrations, so ours looks like this:
orchestrations.json
[
  {
    "id": "main",
    "schedule": "0 6 * * *",
    "tasks": [
      {
        "type": "extractor",
        "id": "classicmodels"
      },
      {
        "type": "transformation",
        "id": "main"
      },
      {
        "type": "datamart",
        "id": "main"
      }
    ]
  }
]
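To see how this file relates to what Airflow shows, here is a minimal Python sketch that turns orchestrations.json into DAG names and ordered task names. The naming pattern (00_Orchestration_{id}, and the ex_, tr_ and dm_ prefixes) is only inferred from the names this guide shows for the main orchestration; it is an assumption, not Bizzflow's actual code. The `describe` function and the `TASK_PREFIXES` mapping are hypothetical. The list order mirrors the order of the tasks array, which is the order the tasks run in.

```python
import json

# Hypothetical mapping inferred from ex_classicmodels, tr_main and dm_main;
# other component types are not covered here.
TASK_PREFIXES = {"extractor": "ex", "transformation": "tr", "datamart": "dm"}


def describe(orchestrations_json: str) -> dict:
    """Map each orchestration's DAG name to its task names, in run order."""
    plan = {}
    for orch in json.loads(orchestrations_json):
        dag_name = f"00_Orchestration_{orch['id']}"
        plan[dag_name] = [f"{TASK_PREFIXES[task['type']]}_{task['id']}" for task in orch["tasks"]]
    return plan


SAMPLE = """[{"id": "main", "schedule": "0 6 * * *", "tasks": [
  {"type": "extractor", "id": "classicmodels"},
  {"type": "transformation", "id": "main"},
  {"type": "datamart", "id": "main"}]}]"""

print(describe(SAMPLE))
```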
And that's it. Commit the changes and run the 90_update_project DAG once again. A new orchestration DAG should pop up.
Go ahead and click on the DAG's name, 00_Orchestration_main. The DAG details should appear.
Notice that there are three tasks in our orchestration: ex_classicmodels, tr_main and dm_main. This is awesome, because it means we no longer have to run the individual tasks ourselves. If you hit the Trigger DAG button, the whole orchestration will run. Check out Consoles -> Latest tasks: you will see three tasks queued, and they will run one after another.
From now on, this orchestration will run daily at 6 AM.
Where to next?
Aaaand it's gone. You have finished the Bizzflow guide - BASIC. If you wish to continue, go back to Project design and review all the steps from there. Try adding your own data source or processing another free dataset from the Relational Dataset Repository.