Extractors

Created: May 20, 2021, Updated: June 20, 2022

Extractors are components within your data pipeline that copy the data from your data source to Bizzflow's data storage (analytical warehouse).

Extractor configurations are kept in the /extractors directory in the root of the project repository. Each extractor’s task is described in a separate JSON (or YAML) file that must follow the structure described below.

🤓

Extractors are expected to use some CPU and RAM while running. Every time an extractor component runs in your Bizzflow project, a worker machine (see Cloud Compute) is started if it is not already running, the component runs on that machine, and the machine is then shut down unless it is set to keep running in Project Configuration.

For a list of available extractor components, see Data Sources.

Extractor Configuration

Every extractor configuration needs a separate file. The length of filenames is limited; for details, see Naming. The configuration inside the file needs two keys: type and config. The type key tells Bizzflow which component to use, and the value of config is passed to the component when it runs.

/extractors/example.json

{
  "type": "ex-mysql",
  "config": {
    // mysql component-specific configuration
  }
}

Or the same example in YAML:

/extractors/example.yaml

type: ex-mysql
config:
  # mysql component-specific configuration
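The two-key structure described above can be checked before a file is committed. The following is a minimal sketch, not Bizzflow's own validation: the function name check_extractor_config and the error messages are assumptions made for illustration, and it only handles the JSON form of the file.

```python
import json

# The two top-level keys the documentation says every extractor file needs.
REQUIRED_KEYS = ("type", "config")


def check_extractor_config(text: str) -> dict:
    """Parse the JSON text of an extractor file and check its top-level keys.

    Hypothetical helper for illustration; Bizzflow may validate differently.
    """
    parsed = json.loads(text)
    if not isinstance(parsed, dict):
        raise ValueError("extractor configuration must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in parsed]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))
    return parsed


if __name__ == "__main__":
    example = '{"type": "ex-mysql", "config": {}}'
    checked = check_extractor_config(example)
    print(checked["type"])
```

Note that a strict JSON parser such as Python's json module rejects // comments, so the placeholder comment in the example file would have to be replaced with real configuration before a check like this passes.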

How to find out the component-specific configuration

Every officially supported extractor has a description and a configuration sample in the README.md file of the component’s repository. In most cases we also include a JSON schema in a separate file.

Storing credentials and sensitive data

You should never store credentials or sensitive data in your git repository. Bizzflow comes prepared for this. Anywhere you would need to put a password in the configuration file, you can instead refer to encrypted Airflow Connection data using #!#:connection_id. The #!#: prefix tells Bizzflow not to interpret the following string literally but to look up the connection with id connection_id in your Airflow Connections and use its password. See the Basic Tutorial for more.
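To make the substitution concrete, here is a rough sketch of how a #!#: reference could be swapped for a stored password. It is not Bizzflow's implementation: the function resolve_secrets is hypothetical, and a plain dictionary stands in for the Airflow Connections store so that the example runs on its own.

```python
# Prefix that marks a value as a reference to a connection, per the text above.
PREFIX = "#!#:"


def resolve_secrets(value, passwords):
    """Return a copy of `value` with every "#!#:<id>" string replaced.

    `passwords` is a stand-in mapping of connection id -> password
    (an assumption; the real lookup goes through Airflow Connections).
    """
    if isinstance(value, dict):
        return {key: resolve_secrets(item, passwords) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_secrets(item, passwords) for item in value]
    if isinstance(value, str) and value.startswith(PREFIX):
        connection_id = value[len(PREFIX):]
        # A missing connection raises KeyError instead of passing the raw string on.
        return passwords[connection_id]
    return value


if __name__ == "__main__":
    config = {"user": "mario", "password": "#!#:maindata"}
    fake_store = {"maindata": "not-a-real-password"}  # placeholder secret
    resolved = resolve_secrets(config, fake_store)
    print(resolved["user"])
```

Failing loudly on an unknown connection id is a deliberate choice in this sketch: it keeps the literal reference string from being sent to the database as if it were the password.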

Custom Extractor Component Configuration

If you want to use your own components instead of Bizzflow’s public components, see Component configuration.

Example: Setting up `MySQL` extractor

Let’s say, for the sake of our example, that our database maindata is running on a server named supermysqldb.com. The following example shows how to tell Bizzflow to extract the tables users and invoices from the database. We have created an Airflow Connection with id maindata that contains our database password.

The MySQL extractor’s repository contains an example, so specifying the extractor in a JSON file should be fairly simple.

/extractors/superdb.json

{
  "type": "ex-mysql",
  "config": {
    "user": "mario",
    "password": "#!#:maindata",
    "host": "supermysqldb.com",
    "database": "maindata",
    "query": {
      "users": "SELECT * FROM `users`",
      "invoices": "SELECT * FROM `invoices`"
    }
  }
}

Or using YAML:

/extractors/superdb.yaml

type: ex-mysql
config:
  user: mario
  password: "#!#:maindata"
  host: supermysqldb.com
  database: maindata
  query:
    users: SELECT * FROM `users`
    invoices: SELECT * FROM `invoices`

❗

You should always avoid using the asterisk * in SELECT statements. If the structure of a table changes suddenly (a column goes missing or an extra column is added), you want the pipeline either to fail as soon as possible (before extracting data) or not to fail at all (skipping the new column because it is not named in the SELECT statement). We only use * in examples so that they are more readable.
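One way to follow this advice is to scan the query section of a configuration for SELECT * before committing it. The sketch below is an assumption-laden illustration, not a Bizzflow feature: the function queries_using_star is hypothetical, the regular expression is a simple heuristic rather than a SQL parser, and the column names id and email are invented for the example.

```python
import re

# Heuristic: "SELECT" followed by "*" or "alias.*". Aggregates such as
# COUNT(*) are not matched because "*" does not directly follow SELECT.
SELECT_STAR = re.compile(r"\bselect\s+(?:\w+\.)?\*", re.IGNORECASE)


def queries_using_star(queries: dict) -> list:
    """Return the names of queries whose SQL starts its column list with *."""
    return [name for name, sql in queries.items() if SELECT_STAR.search(sql)]


if __name__ == "__main__":
    queries = {
        "users": "SELECT * FROM `users`",
        # Hypothetical column names, shown only to contrast with the line above.
        "invoices": "SELECT `id`, `email` FROM `invoices`",
    }
    for name in queries_using_star(queries):
        print("query uses *:", name)
```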
