Prerequisites

Created: May 17, 2021, Updated: September 14, 2023

Before you install and operate Bizzflow, make sure you have the following prerequisites in place and understand what they are for.

Cloud requirements

  • one of:
    • GCP account + BigQuery
    • AWS account + Snowflake
    • Azure account + SQL Server
  • two virtual machines (VMs)
  • two public IP addresses
  • one relational database service (PostgreSQL)
  • a cloud storage service

Before you install

Before you install Bizzflow in your cloud, you should at least read the Key Concept chapter to understand which resources Bizzflow uses and why. Whichever provider you choose, make sure you have either billing or your provider's free tier (or an equivalent) enabled.

By installing Bizzflow, you acknowledge that your cloud provider may bill you for the resources created during the installation and for their usage after the installation.

Required knowledge

To install and use Bizzflow, you should be able to find your way around your cloud provider's user interface. In addition, you should be familiar with the following technologies.

JSON/YAML

JavaScript Object Notation (JSON) and YAML Ain’t Markup Language (YAML) both provide ways to serialize structured data. We use them for the project configuration, e.g.:

This is what an SQL transformation configuration could look like in JSON (JSON itself does not allow comments):
{
  "id": "my-transformation",
  "input_tables": ["in_main.my_table"],
  "type": "sql"
}

# This is what an SQL transformation configuration could look like in YAML
id: "my-transformation"
input_tables:
  - in_main.my_table
type: "sql"

If the examples above look completely unfamiliar, take the time to learn how to read and write JSON or YAML yourself before using Bizzflow.
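To make "serialize structured data" a little more concrete, here is a small Python sketch. Python is used only for illustration; Bizzflow reads its configuration files on its own. The sketch parses the JSON example with the standard `json` module and compares the result with the same data written out as a Python dict, showing that the JSON and YAML snippets above describe one and the same structure. Parsing the YAML version directly would need a third-party library such as PyYAML, so the YAML side is transcribed by hand here.

```python
import json

# The JSON configuration from the example above, stored as a string.
JSON_CONFIG = """
{
  "id": "my-transformation",
  "input_tables": ["in_main.my_table"],
  "type": "sql"
}
"""

# The YAML example transcribed by hand into a Python dict
# (parsing YAML itself would require a third-party library such as PyYAML).
YAML_EQUIVALENT = {
    "id": "my-transformation",
    "input_tables": ["in_main.my_table"],
    "type": "sql",
}

# Deserialize the JSON text into Python objects: a dict containing a list.
config = json.loads(JSON_CONFIG)

print(config["id"])               # my-transformation
print(config["input_tables"][0])  # in_main.my_table
print(config == YAML_EQUIVALENT)  # True: both notations encode the same data
```

Note how the YAML list item (`- in_main.my_table`) and the JSON array (`["in_main.my_table"]`) both end up as the same Python list.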

SQL

Structured Query Language (SQL) is a language used to query data from databases and to modify database structure. If you intend to use Bizzflow for SQL transformations, you should be familiar with the SQL dialect of your selected Analytical Warehouse (BigQuery, Snowflake or SQL Server).
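As a quick illustration of the two roles mentioned above, modifying database structure and querying data, the following Python sketch uses the standard-library `sqlite3` module with an in-memory database. The table names `my_table` and `out_totals` and their columns are hypothetical. SQLite's dialect differs from BigQuery, Snowflake and SQL Server in many details, so treat this as a sketch of the general idea (read an input table, write a derived table) rather than code to run in your warehouse.

```python
import sqlite3

# In-memory SQLite database, used only for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Modify database structure (DDL): create a hypothetical input table.
cur.execute("CREATE TABLE my_table (customer TEXT, amount INTEGER)")
cur.executemany(
    "INSERT INTO my_table (customer, amount) VALUES (?, ?)",
    [("alice", 10), ("bob", 5), ("alice", 7)],
)

# Query data and store the result as a new table: aggregate the
# amounts per customer into a hypothetical output table.
cur.execute(
    """
    CREATE TABLE out_totals AS
    SELECT customer, SUM(amount) AS total
    FROM my_table
    GROUP BY customer
    """
)

rows = cur.execute(
    "SELECT customer, total FROM out_totals ORDER BY customer"
).fetchall()
print(rows)  # [('alice', 17), ('bob', 5)]

conn.close()
```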

git

Git is the most widely used version control system. As you have already seen in the Key Concept chapter, Bizzflow uses git for project management, so you will need to maintain your code in a git repository. It doesn't matter whether you clone the repository locally or use your git host's web IDE.

We recommend Roger Dudler's git guide to help you get to grips with git. For basic Bizzflow usage, you need to be familiar with at least the following git subcommands:

  • git clone
  • git pull/push
  • git commit
  • git add

Excellent free git hosting is provided, for example, by GitLab or GitHub, but any git hosting with SSH key authentication will work with Bizzflow.

ssh

You may already be familiar with OpenSSH client tools such as ssh and ssh-keygen. If so, good for you. If not, please note that you cannot install Bizzflow without a private SSH key. If you do not want to install OpenSSH, you can try generating an SSH key using an online service, but we strongly discourage doing so, since your private key would then pass through a third party. You can find detailed instructions on how to generate an SSH key later in the installation guide.

Database tools

It doesn't matter which tool you use, as long as it is compatible with the warehouse you have selected for your Bizzflow installation.

We recommend using a multi-database tool such as DBeaver or DataGrip.

Minimal cloud requirements

The setup creates virtual machines and databases of the following types:

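| Resource                    | Azure           | GCP           | AWS          |
|-----------------------------|-----------------|---------------|--------------|
| vm-airflow                  | Standard_B2ms   | n2-standard-2 | t3.medium    |
| vm-worker                   | Standard_B2ms   | n2-standard-2 | t3.large     |
| scheduler postgres database | B_Standard_B1ms | db-g1-small   | db.t3.small  |
| main database               | MS SQL Server   | BigQuery      | Snowflake    |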

The specified VM and database types should be considered the minimal configuration required to run a small-scale project. Keep in mind that as your project grows, the performance of your virtual machines will have to grow with it to keep your pipelines from failing.