Extractors

Created: May 20, 2021, Updated: June 20, 2022

Extractors are components within your data pipeline that copy the data from your data source to Bizzflow's data storage (analytical warehouse).

Extractor configurations are kept in the /extractors directory in the root of the project repository. Each extractor’s task is described in a separate JSON (or YAML) file that must follow the structure described below.

For a list of available extractor components, see Data Sources.

Extractor Configuration

Every extractor configuration needs a separate file. The length of filenames is limited; see Naming for details. The configuration inside the file needs two keys: type and config. The type key tells Bizzflow which component to use, and the value of config is passed to that component when it runs.

/extractors/example.json

{
  "type": "ex-mysql",
  "config": {
    // mysql component-specific configuration
  }
}

Or the same example in YAML:

/extractors/example.yaml

type: ex-mysql
config:
  # mysql component-specific configuration
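
The two-key requirement can be checked before a file is committed. The following is a minimal Python sketch, not part of Bizzflow: the function name check_extractor_config is hypothetical, and the rules that type must be a string and config must be an object are assumptions inferred from the examples above.

```python
import json

REQUIRED_KEYS = ("type", "config")


def check_extractor_config(text):
    """Return a list of problems found in a JSON extractor configuration.

    An empty list means the two required top-level keys are present.
    The type checks below are assumptions based on the documented examples.
    """
    data = json.loads(text)
    if not isinstance(data, dict):
        return ["top level must be an object"]
    problems = ["missing key: " + key for key in REQUIRED_KEYS if key not in data]
    if "type" in data and not isinstance(data["type"], str):
        problems.append("'type' must be a string")
    if "config" in data and not isinstance(data["config"], dict):
        problems.append("'config' must be an object")
    return problems


if __name__ == "__main__":
    print(check_extractor_config('{"type": "ex-mysql", "config": {}}'))
    print(check_extractor_config('{"type": "ex-mysql"}'))
```

The sketch only inspects the top level. Anything inside config is component-specific and is best checked against the component's own documentation.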

How to find out the component-specific configuration

Every officially supported extractor includes a description and a configuration sample in the README.md file of the component’s repository. In most cases we also include a JSON schema in a separate file.

Storing credentials and sensitive data

You should never store credentials or sensitive data in your git repository. Bizzflow comes prepared for this. Whenever you would need to put a password in the configuration file, you can instead refer to encrypted Airflow Connection data using #!#:connection_id. The #!#: prefix tells Bizzflow not to interpret the following string literally, but instead to look up the connection with id connection_id in your Airflow Connections and use its password. See the Basic Tutorial for more.
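
To make the substitution concrete, here is a minimal Python sketch of the idea. It is an illustration only and not Bizzflow's actual implementation: the function resolve_secrets is hypothetical, and a plain dictionary stands in for the encrypted Airflow Connections store.

```python
PREFIX = "#!#:"


def resolve_secrets(value, passwords):
    """Replace every '#!#:<connection_id>' string with the matching password.

    `passwords` is a stand-in for Airflow Connections (an assumption made
    for this sketch): it maps a connection id to that connection's password.
    Nested dictionaries and lists are walked recursively.
    """
    if isinstance(value, dict):
        return {key: resolve_secrets(item, passwords) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_secrets(item, passwords) for item in value]
    if isinstance(value, str) and value.startswith(PREFIX):
        connection_id = value[len(PREFIX):]
        # Raises KeyError when no connection with this id exists.
        return passwords[connection_id]
    return value


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    config = {"user": "alice", "password": "#!#:my_connection", "port": 3306}
    passwords = {"my_connection": "s3cret"}
    print(resolve_secrets(config, passwords))
```

Only strings that start with the prefix are replaced, so every other value in the configuration is passed through unchanged.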

Custom Extractor Component Configuration

If you want to use your own components instead of Bizzflow’s public ones, see Component configuration.

Example: Setting up MySQL extractor

Let’s say, for the sake of our example, that our database maindata is running on a server named supermysqldb.com. The following example shows how to tell Bizzflow to extract the tables users and invoices from that database. We have created an Airflow Connection with id maindata that contains our password to the database.

The MySQL extractor’s repository contains an example, so specifying the extractor in a JSON file should be fairly simple.

/extractors/superdb.json

{
  "type": "ex-mysql",
  "config": {
    "user": "mario",
    "password": "#!#:maindata",
    "host": "supermysqldb.com",
    "database": "maindata",
    "query": {
      "users": "SELECT * FROM `users`",
      "invoices": "SELECT * FROM `invoices`"
    }
  }
}

Or using YAML:

/extractors/superdb.yaml

type: ex-mysql
config:
  user: mario
  password: "#!#:maindata"
  host: supermysqldb.com
  database: maindata
  query:
    users: SELECT * FROM `users`
    invoices: SELECT * FROM `invoices`
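
When many tables need to be extracted, writing one query per table by hand gets repetitive. The following Python sketch generates the same structure as the example above from a list of table names. The helper build_mysql_extractor is hypothetical and not part of Bizzflow; the key layout is copied from the example, and table names are inserted into the SQL without escaping, so only trusted names should be passed in.

```python
import json


def build_mysql_extractor(user, connection_id, host, database, tables):
    """Build an ex-mysql extractor configuration as a Python dictionary.

    The password is written as a '#!#:' reference to an Airflow Connection,
    so no secret ends up in the generated file.
    """
    return {
        "type": "ex-mysql",
        "config": {
            "user": user,
            "password": "#!#:" + connection_id,
            "host": host,
            "database": database,
            # One full-table query per table name, as in the example above.
            "query": {table: "SELECT * FROM `%s`" % table for table in tables},
        },
    }


if __name__ == "__main__":
    extractor = build_mysql_extractor(
        user="mario",
        connection_id="maindata",
        host="supermysqldb.com",
        database="maindata",
        tables=["users", "invoices"],
    )
    # Save this output as /extractors/superdb.json.
    print(json.dumps(extractor, indent=2))
```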