January 2021
Posted January 1, 2021 ‐ 2 min read
Automatic task retry, improved project configuration handling, DAG dependency and others.
New features
- Automatic retries within the `toolkit`’s pipeline, applies to:
  - git operations
  - docker operations
  - loading tables
- Writers support
  - you can now send data from output stages to a component that will be able to write the data to external systems
  - usage is very similar to that of extractors and is described in the Bizzflow wiki
- Improved DAG generator
  - DAG generator was refactored and split into multiple files
- Project configuration handling was improved
  - If the git configuration is invalid, you can do your repairs and/or revert in the git project and you should be able to run the `update_project` DAG even if the configuration was invalid (no need to SSH into `vm-airflow` anymore)
  - Deleting extractor configuration from project should not cause configuration validation errors anymore (issue 26)
- Toolkit updates
  - we are slowly starting to make updating more sensible: from now on, `toolkit` will be updated automatically only once a week
  - in the future, we will try to trigger updates only if there actually are some
- Project updates (aka `git pull`)
  - project is no longer updated automatically during orchestration, you have to run `update_project` manually after every edit
  - this is due to our philosophy that you should always be running ‘latest working’ and not necessarily ‘latest’
  - in the future, we will try to automatically apply the latest configuration upon push / merge to the `master` branch of the project
- DAG Dependency
  - this feature was released previously but we feel it should be noted here as well
  - see more in Bizzflow wiki
- ‘Always running’ configuration for worker machine
  - you can specify `"keep_running": true` in the `project.json` config file and Bizzflow will never turn your worker machine off
  - useful when frequent orchestrations are slowed down by the overhead of the VM turning on and off
  - see more in Bizzflow wiki
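To give a rough idea of what the automatic retries listed among the new features mean in practice, here is a minimal Python sketch of the general retry pattern. It is not the toolkit's actual code: `run_with_retries`, its parameters and the `flaky_git_pull` stand-in are hypothetical names made up for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 0.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call `operation` until it succeeds or `attempts` is exhausted.

    Hypothetical helper, not part of the Bizzflow toolkit.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay_seconds)  # wait before trying again
    raise AssertionError("unreachable")


# Hypothetical flaky step standing in for a git, docker or table-load operation.
calls = {"count": 0}


def flaky_git_pull() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary network failure")
    return "ok"


result = run_with_retries(flaky_git_pull, attempts=3)
```

In this sketch the first two calls fail and the third succeeds, so the caller never sees the temporary errors. If every attempt fails, the last error is raised as usual.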
Bug fixes
- Duplicate notifications
  - you should not receive duplicate notifications on pipeline errors anymore
  - warnings on retry should not send notifications anymore (issue 36)
- Cleanup live storage
  - live storage is now cleaned at both the start and the end of a component's run, so you should never again have to deal with tables reappearing in your storage even though they are no longer in the extractor’s configuration
- Docker transformation failure on ‘no output tables’
  - docker transformations should no longer fail in GCP when there are no output tables (issue 33)
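The live storage fix above boils down to cleaning up on both sides of a run. The following Python sketch shows that pattern on a toy in-memory model. Live storage is represented by a plain set of table names, and `run_component` and `extractor` are hypothetical names, not the toolkit's real functions.

```python
from typing import Callable, List, Set


def run_component(live_storage: Set[str], component: Callable[[Set[str]], None]) -> None:
    """Clear live storage before and after the component runs (toy model)."""
    live_storage.clear()  # cleanup at start: drop leftovers from earlier runs
    try:
        component(live_storage)  # the component writes its tables here
    finally:
        live_storage.clear()  # cleanup at end, even if the component failed


# A stale table left behind by an earlier run of a since-removed configuration.
live_storage: Set[str] = {"old_table"}
seen_during_run: List[str] = []


def extractor(storage: Set[str]) -> None:
    storage.add("orders")
    seen_during_run.extend(sorted(storage))  # record what the run could see


run_component(live_storage, extractor)
```

Because the storage is cleared before the component starts, the stale `old_table` is not visible during the run, and nothing is left behind afterwards.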
Known issues
- Docker transformations in AWS
  - currently, docker transformations do not work well within AWS, because of bad behaviour of `awscli` in combination with glob patterns
  - in GCP this feature has full support
- See toolkit project issues for more