Environment
Created: May 17, 2021, Updated: November 28, 2022
This chapter sums up everything you should have set up by now. It does not cover installing and setting things up. If you still need to install everything first, please refer to our Getting started checklist.
Database access tool
You should have some kind of database tool installed. The most commonly used are DBeaver and DataGrip. This guide will show how to set up your sandbox connection using DBeaver, but everything should work as well with DataGrip and all other tools that support Google BigQuery.
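This guide uses a GUI client, but if you want a quick, GUI-free sanity check that your credentials can actually reach the warehouse, a minimal sketch using the google-cloud-bigquery Python package might look like the following. The project ID and the key file name are placeholders you need to replace with your own values.

```python
# Sanity check: can we reach the BigQuery project behind the Bizzflow sandbox?
# Assumes `pip install google-cloud-bigquery` and a service-account JSON key.
from google.cloud import bigquery

# Both values below are placeholders -- use your own key file and project ID.
client = bigquery.Client.from_service_account_json(
    "service-account-key.json",
    project="my-bizzflow-project",
)

# List the datasets visible to this account.
for dataset in client.list_datasets():
    print(dataset.dataset_id)
```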
Bizzflow project repository
During installation, a Bizzflow project repository was created for you, or maybe you are using your own. Either way, the repository should look the same.
Above is what the repository will look like in GitLab. If you are using GitHub, Bitbucket or any other git host and your structure looks the same, you are good to go!
Apache Airflow UI access
Apache Airflow is the heart of Bizzflow. It manages scheduling of tasks and provides us with a nifty UI we will use to control what happens in our project.
You should be able to access your Airflow web interface. If you had someone else install Bizzflow for you, they should be able to let you know how to access it. It will look something like this:
Main layout
The main layout consists of a navigation bar at the top of the page with various links we will go through in some of the next steps in this guide. The part we are interested in right now is the list of DAGs.
Right after installation, you should see two DAGs - 90_update_project and 90_update_toolkit. The number prefix serves to sort the DAGs in a way that makes the most sense, and you can safely ignore it.
On / Off button [1]
This button serves to either enable or disable a DAG.
When a DAG is switched Off, it is disabled and will NEVER run, even if it is triggered manually.
This tends to be a source of a lot of misunderstandings. If your DAG does not run, please make sure it is enabled.
DAG [2]
This is the DAG’s name to help you better navigate between them (soon there will be a lot more than just those two).
Schedule [3]
Once you decide to let Airflow run your tasks periodically based on a schedule, you will see the settings here.
The schedule you can see next to 90_update_toolkit is actually a cron notation. If you want to understand better how it works, we recommend using a service such as crontab.guru to make sure you know what you are doing.
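If you prefer checking a cron expression from your terminal instead of a website, here is a small sketch using the croniter package. The expression 0 3 * * * ("every day at 03:00") is only an illustration, not a schedule Bizzflow ships with.

```python
# Translate a cron expression into concrete upcoming run times.
# Assumes `pip install croniter`; the expression below is only an example.
from datetime import datetime
from croniter import croniter

schedule = "0 3 * * *"  # minute hour day-of-month month day-of-week
itr = croniter(schedule, datetime(2022, 11, 28))

# Print the next three times a scheduler would fire this schedule.
for _ in range(3):
    print(itr.get_next(datetime))
```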
Links [4]
You will find useful links here, such as a list of the latest DAG runs and so on.
What we are interested in right now is the first one, the little play button ▶. This makes it possible to run your DAG at any time. But more on that later.
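Besides the play button, you can also unpause and trigger a DAG programmatically. The sketch below assumes an Airflow 2.x deployment with the stable REST API and basic authentication enabled; the URL, credentials and DAG id are placeholders, not values Bizzflow provides.

```python
# Unpause and trigger a DAG through Airflow's stable REST API (Airflow 2.x).
# All values below are placeholders -- adjust them to your own deployment.
import requests

BASE_URL = "https://airflow.example.com/api/v1"
AUTH = ("airflow-user", "airflow-password")
DAG_ID = "90_update_toolkit"

# A paused (Off) DAG never runs, so make sure it is switched on first.
requests.patch(f"{BASE_URL}/dags/{DAG_ID}", json={"is_paused": False}, auth=AUTH)

# Then queue a new DAG run, same as pressing the play button in the UI.
response = requests.post(f"{BASE_URL}/dags/{DAG_ID}/dagRuns", json={"conf": {}}, auth=AUTH)
print(response.json())
```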
Consoles [5]
This is a Bizzflow-exclusive navigation bar extension with useful links.
Cooltivator - A useful application to help you clean your data. See here
Latest Tasks - A link to the list of tasks sorted by their execution date. You will need this a lot when working with Airflow.
Flow UI - A UI created to make your life easier. Read more here.
What the heck is a kex
You may be asking yourself, what the heck is a Kex?
As you may have seen in Bizzflow's Key Concepts in the Bizzflow wiki, your storage (data warehouse) is split horizontally into stages (raw, input, transform, output and datamart). For better clarity, Bizzflow also splits the stages vertically, meaning there may be more related units of data in a single stage. E.g. the raw stage serves for storing raw data from your data sources, but since there may be more data sources than one, it only makes sense to store them separately. This is what Kexes are all about. You have raw data, but you have accounts and transactions from your database and a balance sheet from an ERP. This would result in three tables across two Kexes:
raw_database.accounts
raw_database.transactions
raw_erp.balance_sheet
If you still have no clue what we are talking about, please check Bizzflow's wiki for ETL Process Structure, as that chapter covers everything you may possibly need to know about Kexes.
Kexes are nothing but database schemas in the background, but since the terminology does not fully address the relation of the underlying data, we decided to call them Kexes.
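To make the mapping concrete: on Google BigQuery a schema corresponds to a dataset, so each Kex shows up as a dataset and you can query its tables directly. Below is a minimal sketch using the google-cloud-bigquery Python client; the project ID is a placeholder and the Kex/table names are the examples from above.

```python
# A Kex is just a database schema (a BigQuery dataset) under the hood,
# so its tables can be queried like any other table.
from google.cloud import bigquery

# Placeholder project ID -- replace with your own.
client = bigquery.Client(project="my-bizzflow-project")

# Count rows in the `accounts` table of the `raw_database` Kex.
query = "SELECT COUNT(*) AS row_count FROM `raw_database.accounts`"
for row in client.query(query).result():
    print(row.row_count)
```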
Good to go
If you have checked that you have easy access to all the things listed above, you are good to go!