This chapter sums up everything you should have set up by now. This guide does not cover installation or initial setup. If you need to install everything first, please refer to our Getting started checklist.
Database access tool
You should have some kind of database tool installed. The most commonly used are DBeaver and DataGrip. This guide will show how to set up your sandbox connection using DBeaver, but everything should work as well with DataGrip and all other tools that support
Bizzflow project repository
During installation, a Bizzflow project repository was created for you, or maybe you are using your own. Either way, the repository should look the same.
Above is what the repository will look like in GitLab. If you are using Bitbucket or any other Git host and your structure looks the same, you are good to go!
Apache Airflow UI access
Apache Airflow is the heart of Bizzflow. It manages scheduling of tasks and provides us with a nifty UI we will use to control what happens in our project.
You should be able to access your Airflow web interface. If you had someone else install Bizzflow for you, they should be able to let you know how to access it. It will look something like this:
The main layout consists of a navigation bar at the top of the page with various links we will go through in some of the next steps in this guide. The part we are interested in right now is the list of DAGs.
Right after installation, you should see two DAGs -
90_update_toolkit. The number prefix only serves to sort the DAGs in an order that makes the most sense, and you can safely ignore it.
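To see why the number prefix matters, here is a minimal sketch. It assumes the DAG list is ordered alphabetically by DAG name; apart from 90_update_toolkit, the DAG names below are made up for illustration and do not exist in a fresh installation.

```python
# Only "90_update_toolkit" comes from a real installation.
# The other two names are hypothetical examples.
dag_ids = [
    "90_update_toolkit",
    "50_hypothetical_transform",
    "10_hypothetical_extract",
]

# A plain alphabetical sort is enough: two-digit prefixes decide the order.
ordered = sorted(dag_ids)

for dag_id in ordered:
    print(dag_id)
```

With two-digit prefixes, alphabetical order and numeric order are the same, so low numbers end up at the top of the list and 90_update_toolkit stays at the bottom.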
On / Off button
This button serves to either enable or disable a DAG.
When it is set to Off, the DAG is disabled and will NEVER run, even if it is triggered manually. This is a common source of misunderstandings: if your DAG does not run, please make sure it is enabled.
This is the DAG’s name to help you better navigate between them (soon there will be a lot more than just those two).
Once you decide to let Airflow run your tasks periodically based on a schedule, you will see the settings here.
You will find useful links here, such as the list of latest DAG runs and so on.
What we are interested in right now is the first one, the little play button ▶. This makes it possible to run your DAG at any time. But more on that later.
This is a Bizzflow-exclusive navigation bar extension with useful links.
- A useful application to help you clean your data. See here
- A link to the list of tasks sorted by their execution date. You will need this a lot when working with Airflow.
- A UI created to make your life easier. Read more here.
What the heck is a Kex
You may be asking yourself, what the heck is a Kex?
As you may have seen in Bizzflow's Key Concept in the Bizzflow wiki, your storage (data warehouse) is split horizontally into stages (such as raw and datamart). For better clarity, Bizzflow also splits the stages vertically, meaning a single stage may hold several related units of data. For example, the raw stage stores raw data from your data sources, but since there may be several data sources, it only makes sense to keep them separate. This is what Kexes are all about. You have raw data, but part of it, including transactions, comes from your database and part of it comes from an ERP. This would result in three tables across two Kexes: two tables in the Kex for your database and one table in the Kex for the ERP.
If you still have no clue what we are talking about, please check Bizzflow’s wiki for ETL Process Structure as this chapter covers everything you may possibly need to know about Kexes.
Kexes are nothing but database schemas in the background, but since the terminology does not fully address the relation of the underlying data, we decided to call them Kexes.
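Since a Kex is just a schema, the idea can be sketched in a few lines of Python. This is only an illustration: the Kex and table names below (except transactions) are hypothetical and do not reflect Bizzflow's actual naming rules.

```python
from collections import defaultdict

# Hypothetical fully qualified names in the form "<kex>.<table>".
# All names except "transactions" are made up for illustration.
tables = [
    "raw_database.customers",
    "raw_database.transactions",
    "raw_erp.invoices",
]


def group_by_kex(qualified_names):
    """Group table names by the schema (Kex) they belong to."""
    kexes = defaultdict(list)
    for name in qualified_names:
        kex, table = name.split(".", 1)
        kexes[kex].append(table)
    return dict(kexes)


kexes = group_by_kex(tables)
for kex, members in kexes.items():
    print(f"{kex}: {', '.join(members)}")
```

Three tables end up in two groups, one per data source, which is exactly the vertical split inside a single stage described above.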
Good to go
If you have checked that you have easy access to everything listed above, you are good to go!