Prerequisites
Created: May 17, 2021, Updated: September 14, 2023
Before you install and operate Bizzflow, there are some prerequisites you should obtain and understand.
Cloud requirements
- one of:
  - GCP account + BigQuery
  - AWS account + Snowflake
  - Azure account + SQL Server
- two virtual machines (VMs)
- two public IP addresses
- one relational database service (PostgreSQL)
- cloud storage service
Before you install
Before you install Bizzflow in your cloud, you should at least take a look at the Key Concept of Bizzflow to understand which resources Bizzflow uses and why. Whichever provider you use, make sure you have billing, or an alternative such as your provider’s free tier, enabled.
By installing, you acknowledge that you may be billed by your cloud provider for resources created during the installation and for their usage after the installation.
Required knowledge
To install and use Bizzflow, you should be able to click your way around your cloud provider’s user interface. In addition, you should be familiar with the following technologies.
JSON/YAML
JavaScript Object Notation (JSON) and YAML Ain’t Markup Language (YAML) both provide ways to serialize structured data. We use them for the project configuration, e.g.:
// This is what an SQL transformation configuration could look like in JSON
{
  "id": "my-transformation",
  "input_tables": ["in_main.my_table"],
  "type": "sql"
}

# This is what an SQL transformation configuration could look like in YAML
id: "my-transformation"
input_tables:
  - in_main.my_table
type: "sql"
If you are completely unfamiliar with what you see above, you should take your time to learn how to understand and write either JSON or YAML yourself before using Bizzflow.
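To make the idea of "serialized structured data" concrete, here is a minimal Python sketch that parses the JSON example above into an ordinary dictionary. It is only an illustration of the format and not how Bizzflow itself loads configuration; the names `CONFIG_JSON`, `EXPECTED` and `load_config` are hypothetical, and the `//` comment line is left out because standard JSON does not allow comments.

```python
import json

# The JSON transformation example from above, without the comment line
# (standard JSON has no comment syntax).
CONFIG_JSON = """
{
  "id": "my-transformation",
  "input_tables": ["in_main.my_table"],
  "type": "sql"
}
"""

# The YAML example above describes the same structure: one mapping with
# two string values and one list of strings.
EXPECTED = {
    "id": "my-transformation",
    "input_tables": ["in_main.my_table"],
    "type": "sql",
}


def load_config(text: str) -> dict:
    """Parse a JSON document and return it as a Python dict."""
    return json.loads(text)


config = load_config(CONFIG_JSON)
print(config["id"])            # the transformation identifier
print(config["input_tables"])  # a list, even when it has a single item
```

Parsing the YAML variant needs a third-party library (for example PyYAML's `yaml.safe_load`), since Python's standard library has no YAML parser; the resulting dictionary should be the same.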
SQL
Structured Query Language (SQL) is a language used to query data from databases and to modify database structure. If you intend to use Bizzflow for SQL transformations, you should be familiar with the SQL dialect used by your selected analytical warehouse.
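As a rough self-check of the level of SQL meant here, the following sketch both modifies database structure and queries data. It uses Python's built-in SQLite purely as a stand-in: SQLite is not one of the warehouses Bizzflow supports, and the table and column names are made up for illustration. The syntax for the same steps may differ in your warehouse's dialect.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a real warehouse.
conn = sqlite3.connect(":memory:")

# Modify database structure: create a (hypothetical) input table.
conn.execute("CREATE TABLE my_table (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)],
)

# A typical transformation step: build a new table from a query.
conn.execute(
    """
    CREATE TABLE customer_totals AS
    SELECT customer, SUM(amount) AS total
    FROM my_table
    GROUP BY customer
    """
)

# Query data from the transformed table.
rows = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
print(rows)
conn.close()
```

If you can read this and predict what `rows` contains, you likely know enough SQL to start; the remaining work is learning where your warehouse's dialect differs.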
git
Git is the most popular version control system. As you have already seen in the Key Concept chapter, Bizzflow uses git for project management. You will have to be able to maintain your code in a git repository. It doesn’t matter whether you clone the repo locally or use your git host’s web IDE.
We recommend Roger Dudler’s git guide to help you wrap your head around git. For basic Bizzflow usage, you need to be familiar with at least the following git subcommands:
- git clone
- git pull/push
- git commit
- git add
Excellent free git hosting is provided by, for example, GitLab or GitHub, but any git hosting with SSH key authentication will work with Bizzflow.
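The subcommands listed above usually appear together in one clone, edit, commit and push cycle. The sketch below builds that sequence in Python and, by default, only prints it. The repository URL, directory, file name and commit message are hypothetical placeholders, and `git_workflow` and `run` are illustrative helpers, not part of Bizzflow.

```python
import shlex
import subprocess
from typing import List


def git_workflow(repo_url: str, directory: str,
                 changed_file: str, message: str) -> List[List[str]]:
    """Return the argument lists for a basic clone/pull/add/commit/push cycle."""
    return [
        ["git", "clone", repo_url, directory],              # get a local copy
        ["git", "-C", directory, "pull"],                   # fetch the latest changes
        ["git", "-C", directory, "add", changed_file],      # stage your edit
        ["git", "-C", directory, "commit", "-m", message],  # record it locally
        ["git", "-C", directory, "push"],                   # publish it to the host
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Hypothetical values, for illustration only.
commands = git_workflow(
    "git@example.com:me/my-project.git",
    "my-project",
    "transformations.json",
    "Add my-transformation",
)
run(commands)  # dry run: prints the commands without touching any repository
```

Calling `run(commands, dry_run=False)` would execute the commands for real, which requires git to be installed and an SSH key accepted by your git host.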
ssh
You may be familiar with OpenSSH client tools such as ssh and ssh-keygen. If so, good for you. If not, please note that you cannot install Bizzflow without a private SSH key. If you do not want to install OpenSSH, you can try generating an SSH key using online services, but we strongly discourage doing so, because the service would see your private key. You can find detailed instructions on how to generate an SSH key later in the installation guide.
Database tools
It doesn’t matter which tool you use, as long as it is compatible with the warehouse you have selected for your Bizzflow installation.
We recommend using a multi-database tool such as DBeaver or DataGrip.
Minimal cloud requirements
The setup creates virtual machines and databases of the following types:
Resource | Azure | GCP | AWS |
---|---|---|---|
vm-airflow | Standard_B2ms | n2-standard-2 | t3.medium |
vm-worker | Standard_B2ms | n2-standard-2 | t3.large |
scheduler postgres database | B_Standard_B1ms | db-g1-small | db.t3.small |
main database | SQL Server | BigQuery | Snowflake |
The specified VM and database types should be considered the minimum configuration required to run a small-scale project. Keep in mind that as your project grows, you will need to scale up your virtual machines with it to keep your pipelines from failing.