January 2021

Posted January 1, 2021 ‐ 2 min read

Automatic task retry, improved project configuration handling, DAG dependency and others.

New features

  • Automatic retries within the toolkit’s pipeline, applied to:
    • git operations
    • docker operations
    • loading tables
  • Writers support
    • you can now send data from output stages to a component that writes the data to external systems
    • usage is very similar to that of extractors and is described in the Bizzflow wiki
  • Improved DAG generator
    • DAG generator was refactored and split into multiple files
  • Project configuration handling was improved
    • If the project configuration in git is invalid, you can repair or revert it in the git project and you should then be able to run the update_project DAG, even though the previous configuration was invalid (no need to SSH into vm-airflow anymore)
    • Deleting an extractor configuration from the project should no longer cause configuration validation errors (issue 26)
  • Toolkit updates
    • we are slowly starting to make updating more sensible - from now on, toolkit will be updated automatically only once a week
    • in the future, we will try to trigger updates only if there actually are some
  • Project updates (aka git pull)
    • the project is no longer updated automatically during orchestration, you have to run update_project manually after every edit
    • this is due to our philosophy that you should always be running ‘latest working’ and not necessarily ‘latest’
    • in the future, we will try to automatically apply latest configuration upon push / merge to master branch of the project
  • DAG Dependency
    • this feature was released previously but we feel it should be noted here as well
    • see more in Bizzflow wiki
  • ‘Always running’ configuration for worker machine
    • you can specify "keep_running": true in the project.json config file and Bizzflow will never turn your worker machine off
    • useful when frequent orchestrations are slowed down by the overhead of the VM turning on and off
    • see more in Bizzflow wiki
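
As a rough illustration of the "keep_running" option mentioned above, the snippet below parses a minimal project.json and reads the flag. Only the "keep_running" key comes from these notes; the helper function, the default for a missing key, and the absence of other keys are assumptions made for the sketch, so check the Bizzflow wiki for the actual file layout.

```python
import json

# Minimal project.json content; only "keep_running" is taken from the notes above.
PROJECT_JSON = """
{
  "keep_running": true
}
"""


def worker_should_stay_on(raw_config: str) -> bool:
    """Return True when the config asks for the worker machine to stay on.

    Hypothetical helper, not part of the Bizzflow toolkit.
    """
    config = json.loads(raw_config)
    # Assumption: a missing key means the worker may be turned off as before.
    return bool(config.get("keep_running", False))


keep_on = worker_should_stay_on(PROJECT_JSON)
```

Here keep_on evaluates to True for the sample content, while an empty JSON object would yield False under the assumed default.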

Bug fixes

  • Duplicate notifications
    • you should not receive duplicate notifications on pipeline errors anymore
    • warnings on retry should no longer send notifications (issue 36)
  • Cleanup live storage
    • live storage is now cleaned at both the start and the end of a component’s run, so you should never again have to deal with tables reappearing in your storage even though they are no longer in the extractor’s configuration
  • Docker transformation failure on ‘no output tables’
    • docker transformations should no longer fail in GCP when there are no output tables (issue 33)
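
The retry behaviour mentioned in these notes (automatic retries for git, docker and table-loading operations, with warnings on retry no longer triggering notifications) can be pictured roughly as follows. This is a generic sketch of the pattern, not the toolkit's code: the function names, the attempt count and the delay are all hypothetical.

```python
import logging
import time

logger = logging.getLogger("retry_sketch")


def run_with_retries(operation, attempts=3, delay_seconds=0.0):
    """Call `operation` until it succeeds or `attempts` is exhausted.

    Failed attempts that will be retried are only logged as warnings.
    The exception is re-raised (the point where a notification would
    be sent) only after the final attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:
            if attempt == attempts:
                raise
            logger.warning(
                "Attempt %d/%d failed: %s; retrying", attempt, attempts, error
            )
            time.sleep(delay_seconds)


# Hypothetical flaky operation standing in for a git pull,
# a docker run or a table load.
calls = {"count": 0}


def flaky_operation():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "done"


result = run_with_retries(flaky_operation, attempts=3)
```

In this example the first two calls fail and are only logged, and the third call succeeds, so no error reaches the caller.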

Known issues

  • Docker transformations in AWS
    • currently, docker transformations do not work well within AWS because of bad behaviour of awscli in combination with glob patterns
    • in GCP this feature has full support
  • See toolkit project issues for more