General Project Configuration

Created: May 20, 2021, Updated: September 14, 2023

General project configuration is specified in the project.json file in the repository’s root. You will rarely need to edit this file.

project.json example

This is what a typical project.json file may look like (the actual values may differ based on your cloud provider):

{
  "project_id": "",
  "git_project_path": "git@gitlab.com:tomas.votava/bizzflow-azure.git",
  "git_toolkit_path": "",
  "git_toolkit_tag": "",
  "dataset_location": "",
  "compute_zone": "",
  "compute_region": "",
  "notification_email": ["tomas.votava@bizztreat.com"],
  "debug": false,
  "live_bucket": "bizzflow-live",
  "archive_bucket": "bizzflow-archive",
  "worker_machine": [
    {
      "id": "",
      "name": "vm-worker",
      "host": "10.0.2.5",
      "user": "bizzflow",
      "components_path": "/home/bizzflow/components",
      "data_path": "/home/bizzflow/data",
      "config_path": "/home/bizzflow/config",
      "keep_running": false
    }
  ],
  "user": "bizzflow",
  "query_timeout": 600,
  "hostname": "40.89.158.93",
  "public_ip": "",
  "classes": {
    "storage_manager": "AzureSQLStorageManager",
    "sandbox_manager": "AzureSqlSandboxManager",
    "vault_manager": "AirflowVaultManager",
    "worker_manager": "AzureWorkerManager",
    "file_storage_manager": "ABSFileStorageManager",
    "datamart_manager": "AzureSQLDatamartManager",
    "credentials_manager": "AzureSQLCredentialManager",
    "transformation_executor": "AzureSQLTransformationExecutor",
    "step": "AzureSQLStep"
  },
  "azure_blob_account_name": "bizzflowbizzflowbf58bkji",
  "resource_group": "bizzflow-bf58bkji",
  "storage": {
    "host": "bizzflow-bf58bkji.database.windows.net",
    "database": "bizzflow",
    "port": 1433,
    "backend": "azuresql",
    "default_column_type": "NVARCHAR(MAX)",
  },
  "telemetry": {
    "generate": true,
    "schedule": "0 4 * * *",
    "backend": "azuresql"
  }
}

Configuration keys

A few of the keys you will probably want to edit at some point are listed in the table below.

| Key | Type | Description |
| --- | --- | --- |
| notification_email | list of strings | List of e-mails for notifications |
| worker_machine.keep_running | boolean | Whether to keep the machine running after orchestration |
| query_timeout | int | Query timeout for SQL queries (in seconds) |
| default_column_type | string | Optional; default column type for created tables |
| telemetry | object | Telemetry configuration |
| telemetry.generate | boolean | Whether to generate the telemetry datamart (default: false) |
| telemetry.schedule | string | Cron schedule for telemetry refresh (default: 0 1 * * *) |

If you want to specify a different default_column_type, you need to add it to your project.json file manually. When adding it to a project running on GCP, you will also need to add the storage key and place default_column_type inside it.

If default_column_type is not specified, Bizzflow will create tables with the following types:

  • Snowflake: VARCHAR(16777216)
  • Azure: VARCHAR(8000)
  • GCP: STRING
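For example, on a GCP project the override would be nested under a storage key added to project.json (a minimal sketch; the other storage keys and the parameterized STRING(256) type are illustrative, not defaults):

```json
{
  "storage": {
    "default_column_type": "STRING(256)"
  }
}
```

On Azure the storage key already exists (see the example above), so you only add the default_column_type entry inside it.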

Telemetry

Bizzflow can generate a telemetry datamart for you. This datamart contains information about your Bizzflow jobs. You can use it to analyze your past jobs, their failures, and their performance.

By default, telemetry is disabled. To enable it, set telemetry.generate to true. If you do not specify a cron schedule, the datamart will be generated every day at 1 AM.
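Enabling telemetry therefore only requires the telemetry object in project.json, for example (the 0 4 * * * schedule, meaning daily at 4 AM, is illustrative; omit it to keep the 1 AM default):

```json
{
  "telemetry": {
    "generate": true,
    "schedule": "0 4 * * *"
  }
}
```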

After enabling telemetry, you can find a new DAG with id 00_Orchestration_bizzflow_telemetry in your Airflow instance. You can run this DAG manually to generate the datamart immediately.

After telemetry orchestration has finished, you can find the datamart in your storage. Navigate to Flow UI -> Storage. A new kex dm_bizzflow_telemetry should be present in the list.