General Project Configuration
Created: May 20, 2021, Updated: September 14, 2023
General project configuration is specified in the `project.json` file in the repository's root. You will only rarely need to edit this file.
project.json example
This is what a typical `project.json` file may look like (the actual values will differ depending on your cloud provider):
```json
{
  "project_id": "",
  "git_project_path": "git@gitlab.com:tomas.votava/bizzflow-azure.git",
  "git_toolkit_path": "",
  "git_toolkit_tag": "",
  "dataset_location": "",
  "compute_zone": "",
  "compute_region": "",
  "notification_email": ["tomas.votava@bizztreat.com"],
  "debug": false,
  "live_bucket": "bizzflow-live",
  "archive_bucket": "bizzflow-archive",
  "worker_machine": [
    {
      "id": "",
      "name": "vm-worker",
      "host": "10.0.2.5",
      "user": "bizzflow",
      "components_path": "/home/bizzflow/components",
      "data_path": "/home/bizzflow/data",
      "config_path": "/home/bizzflow/config",
      "keep_running": false
    }
  ],
  "user": "bizzflow",
  "query_timeout": 600,
  "hostname": "40.89.158.93",
  "public_ip": "",
  "classes": {
    "storage_manager": "AzureSQLStorageManager",
    "sandbox_manager": "AzureSqlSandboxManager",
    "vault_manager": "AirflowVaultManager",
    "worker_manager": "AzureWorkerManager",
    "file_storage_manager": "ABSFileStorageManager",
    "datamart_manager": "AzureSQLDatamartManager",
    "credentials_manager": "AzureSQLCredentialManager",
    "transformation_executor": "AzureSQLTransformationExecutor",
    "step": "AzureSQLStep"
  },
  "azure_blob_account_name": "bizzflowbizzflowbf58bkji",
  "resource_group": "bizzflow-bf58bkji",
  "storage": {
    "host": "bizzflow-bf58bkji.database.windows.net",
    "database": "bizzflow",
    "port": 1433,
    "backend": "azuresql",
    "default_column_type": "NVARCHAR(MAX)"
  },
  "telemetry": {
    "generate": true,
    "schedule": "0 4 * * *",
    "backend": "azuresql"
  }
}
```
Configuration keys
A few of the keys you will probably want to edit at some point are listed in the table below.
| Key | Type | Description |
|---|---|---|
| `notification_email` | list of strings | List of e-mail addresses for notifications |
| `worker_machine.keep_running` | boolean | Whether or not to keep the worker machine running after orchestration |
| `query_timeout` | int | Query timeout for SQL queries |
| `default_column_type` | string | Optional; default column type for created tables |
| `telemetry` | object | Telemetry configuration |
| `telemetry.generate` | boolean | Whether or not to generate the telemetry datamart (default: `false`) |
| `telemetry.schedule` | string | Cron schedule for telemetry refresh (default: `0 1 * * *`) |
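For example, to receive notifications at multiple addresses and extend the SQL query timeout, you would change the corresponding keys in `project.json` (the addresses and timeout value below are illustrative, and all other keys are omitted for brevity):

```json
{
  "notification_email": ["ops@example.com", "data-team@example.com"],
  "query_timeout": 1200
}
```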
If you want to specify a different `default_column_type`, you need to add it to your `project.json` file manually. When adding it to a project running on GCP, you will also need to add the `storage` key and nest `default_column_type` inside it.
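On GCP the `storage` key is not present by default, so the sketch below shows the shape you would add; the `STRING` value is just one possible choice, and any provider-specific keys under `storage` are omitted:

```json
{
  "storage": {
    "default_column_type": "STRING"
  }
}
```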
If `default_column_type` is not specified, Bizzflow will create tables with the following types:

- Snowflake: `VARCHAR(16777216)`
- Azure: `VARCHAR(8000)`
- GCP: `STRING`
Telemetry
Bizzflow can generate a telemetry datamart for you. This datamart contains information about your Bizzflow jobs, which you can use to analyze your past jobs, their failures, and their performance.
By default, telemetry is disabled. To enable it, set `telemetry.generate` to `true`. If you do not specify a cron schedule, the datamart will be generated every day at 1 AM.
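Putting this together, a minimal `telemetry` object that enables generation and overrides the default schedule might look like this (the `0 4 * * *` expression, meaning every day at 4 AM, is just an example):

```json
{
  "telemetry": {
    "generate": true,
    "schedule": "0 4 * * *"
  }
}
```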
After enabling telemetry, you can find a new DAG with id `00_Orchestration_bizzflow_telemetry` in your Airflow instance. You can run this DAG manually to generate the datamart immediately.
After telemetry orchestration has finished, you can find the datamart in your storage.
Navigate to Flow UI -> Storage. A new kex `dm_bizzflow_telemetry` should be present in the list.