Orchestration

Created: May 14, 2021, Updated: February 28, 2022

So far, when we wanted to run something, we had to trigger it manually via the Airflow UI. This is awesome, but let’s think bigger. You don’t want to wake up at 6 every morning to click through your DAGs. You want to set up a way to make Airflow run your tasks automatically. This is what orchestrations are for.

Configuration

Orchestrations are a way to glue all the individual components together. Imagine our scenario with a single extractor, transformation and a datamart. What we want Bizzflow to do is:

Every day at 6:00 AM run extractor and when it completes,
run transformation and when it completes, run datamart

Again, Bizzflow doesn’t exactly speak English, so let’s find a way to put all this information into JSON:

{
  "id": "{orchestration_id}",
  "schedule": "{crontab_notation}",
  "tasks": [
    {
      "type": "{task_component_type}",
      "id": "{component_id}"
    }
  ]
}

id

The orchestration id is, again, a way for us to distinguish between multiple orchestrations. Let’s call ours main.

schedule

The orchestration schedule uses crontab notation. To run at 6:00 AM every day, the crontab notation is 0 6 * * * (minute 0, hour 6, every day of the month, every month, every day of the week).
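To make the five fields easier to read, here is a minimal Python sketch that splits a crontab expression into named parts. The function name describe_schedule and the field labels are only illustrative; they are not part of Bizzflow or Airflow, and the sketch does not validate ranges or special syntax.

```python
# Labels for the standard five-field crontab layout (illustrative names).
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_schedule(expression: str) -> dict:
    """Split a five-field crontab expression into named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))


schedule = describe_schedule("0 6 * * *")
print(schedule["hour"], schedule["minute"])  # prints: 6 0
```

Reading the result field by field is a quick way to check that a schedule means what you intended before you commit it.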

tasks

tasks is an array of tasks. Every task needs a type parameter and an id parameter. With type, you specify what kind of component should run the task (extractor, transformation, datamart, …). id specifies the id of the configuration. For extractors, it is the file name of the extractor configuration. All other components have their id in their configuration files.

Let’s put it all together:


orchestrations.json

[
  {
    "id": "main",
    "schedule": "0 6 * * *",
    "tasks": [
      {
        "type": "extractor",
        "id": "classicmodels"
      },
      {
        "type": "transformation",
        "id": "main"
      },
      {
        "type": "datamart",
        "id": "main"
      }
    ]
  }
]
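Before committing, it can help to check that the file has the shape described above. The following is a rough Python sketch, not Bizzflow's own validation: the function check_orchestrations is hypothetical, and it only checks for the keys this guide mentions (id, schedule, tasks, and type and id on each task).

```python
import json

# Keys every task needs according to this guide.
REQUIRED_TASK_KEYS = {"type", "id"}


def check_orchestrations(raw: str) -> list:
    """Return a list of problems; an empty list means the shape looks fine."""
    problems = []
    for index, orchestration in enumerate(json.loads(raw)):
        for key in ("id", "schedule", "tasks"):
            if key not in orchestration:
                problems.append(f"orchestration #{index}: missing '{key}'")
        for position, task in enumerate(orchestration.get("tasks", [])):
            for key in sorted(REQUIRED_TASK_KEYS - set(task)):
                problems.append(
                    f"orchestration #{index}, task #{position}: missing '{key}'"
                )
    return problems


# The same content as orchestrations.json in this guide.
EXAMPLE = """
[
  {
    "id": "main",
    "schedule": "0 6 * * *",
    "tasks": [
      {"type": "extractor", "id": "classicmodels"},
      {"type": "transformation", "id": "main"},
      {"type": "datamart", "id": "main"}
    ]
  }
]
"""

print(check_orchestrations(EXAMPLE))  # prints: []
```

In practice you would read the text from your project's orchestrations.json instead of the inline EXAMPLE string.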

And that’s it. Commit the changes and run the 90_update_project DAG once again. An orchestration DAG should pop up:

Airflow with orchestration DAG

Go ahead and click on the DAG’s name, 00_Orchestration_main. The DAG details should appear and look like this:

Airflow orchestration tasks

Notice that there are three tasks in our orchestration: ex_classicmodels, tr_main and dm_main. This is awesome, because it means we no longer have to run the individual tasks one by one. If you hit the Trigger DAG button, the whole orchestration will run. Check out Consoles -> Latest tasks. You will see three tasks queued, and they will run one after another.

Airflow orchestration task instances
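The task names in the DAG appear to combine a short prefix for the component type with the component id. The Python sketch below reproduces that pattern purely as an illustration. The prefix table is inferred from the three names shown in this guide and is an assumption; prefixes for other component types are not covered here, and this is not how Bizzflow itself builds the names.

```python
# Prefixes inferred from ex_classicmodels, tr_main and dm_main (assumption).
PREFIXES = {"extractor": "ex", "transformation": "tr", "datamart": "dm"}


def task_names(tasks: list) -> list:
    """Build the expected task names, in run order, for the given tasks."""
    return [f"{PREFIXES[task['type']]}_{task['id']}" for task in tasks]


# The tasks of the "main" orchestration from orchestrations.json.
main_tasks = [
    {"type": "extractor", "id": "classicmodels"},
    {"type": "transformation", "id": "main"},
    {"type": "datamart", "id": "main"},
]

print(" -> ".join(task_names(main_tasks)))
# prints: ex_classicmodels -> tr_main -> dm_main
```

The order of the printed names matches the order of the tasks array, which is also the order in which the tasks run.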

From now on, this orchestration will run daily at 6 AM.

Where to next?

Gone meme
Feeling the emptiness?

Aaaand it’s gone. You have finished the Bizzflow guide - BASIC. If you wish to continue, go back to Project design and review all the steps from there. Try adding your own data source or processing another free dataset from the Relational Dataset Repository.