Created: May 14, 2021, Updated: February 8, 2022
Bizzflow is a data pipeline tool for every use case.
Get Started with Bizzflow
Before starting with Bizzflow, you should take some time to take a look at the Key Concept figure below in order to understand the basics of what Bizzflow is, what it does and how does it do.
As seen in the simplified picture above, there are few components that come into play when talking about a Bizzflow project.
Data sources represent your primary data in their native location. Be it a
a table in a
database or an endpoint of an
API, Bizzflow can extract data from it.
Every Bizzflow project must have a
git repository that contains your
YAML configuration and your project documentation.
Cloud Project Container
Bizzflow runs in a cloud environment. It doesn’t matter whether it’s a
Google Cloud Platform project,
Microsoft Azure resource group or
Amazon Web Services account. You need an access to your cloud
project container in order to install Bizzflow and set it up.
In the container there will always be at least three main resource types -
Cloud Compute is a service providing us with Virtual Machines. We usually use two of those:
- runs a 24/7 Apache Airflow’s scheduler and every few seconds or so tries to figure out whether there are any tasks that are supposed to run
- also provides you with a user interface to monitor your tasks and run them on demand
- get started every time there is some heavy lifting to be done
- basically runs whenever an extractor od Docker transformation tasks run
Cloud Storage can be understood as a cloud file storage. Bizzflow uses this storage to
CSV files and logs. You will mostly be able to ignore this resource as our Virtual Machines
deal with without any need of interaction.
Analytical warehouse is the place where the magic happens. All the data extracted from your data sources will be
available to you in your warehouse. You can write
SQL transformations that will run within the warehouse
and finally you can create a
datamart, which is a separate space within your warehouse you may use
to make your output data available to 3rd party tools (such as visualization tools).
Components communicate with each other, while most of the communication is initiated by the scheduler running
In a nutshell, the data pipeline starts with your
Data Source. Bizzflow extracts data from the
in an operation called
extraction and places it as
CSV files in the
Cloud Storage. Then the
get loaded into the
Data Warehouse in the form of tables.
If you opt to go with the
ETL approach, you can write
SQL transformations that will run within the
Data Warehouse, creating output tables you can use in 3rd party tools (such as visualisation tools).
Alternatively, if you prefer the
ELT approach, you may use Bizzflow to orchestrate
dbt and create views and
dbt models instead.