Skip to content

Commit

Permalink
Release v0.2.0
Browse files Browse the repository at this point in the history
* Added uniqueness check([#200](#200)). A uniqueness check has been added, which reports an issue for each row containing a duplicate value in a specified column. This resolves issue [154](#154).
* Added column expression support for limits in not less and not greater than checks, and updated docs ([#200](#200)). This commit introduces several changes to simplify and enhance data quality checking in PySpark workloads for both streaming and batch data. The naming conventions of rule functions have been unified, and the `is_not_less_than` and `is_not_greater_than` functions now accept column names or expressions as limits. The input parameters for range checks have been unified, and the logic of `is_not_in_range` has been updated to be inclusive of the boundaries. The project's documentation has been improved, with the addition of comprehensive examples, and the contribution guidelines have been clarified. This change includes a breaking change for some of the checks. Users are advised to review and test the changes before implementation to ensure compatibility and avoid any disruptions. Reslves issues: [131](#131), [197](#200), [175](#175), [205](#205)
* Include predefined check functions by default when applying custom checks by metadata ([#203](#203)). The data quality engine has been updated to include predefined check functions by default when applying custom checks using metadata in the form of YAML or JSON. This change simplifies the process of defining custom checks, as users no longer need to manually import predefined functions, which were previously required and could be cumbersome. The default behavior now is to import all predefined checks. The `validate_checks` method has been updated to accept a dictionary of custom check functions instead of global variables. This improvement resolves issue [#48](#48).
  • Loading branch information
mwojtyczka committed Mar 10, 2025
1 parent d2df543 commit e6d81f8
Show file tree
Hide file tree
Showing 2 changed files with 6 additions and 1 deletion.
5 changes: 5 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,10 @@
# Version changelog

## 0.2.0

* Added uniqueness check([#200](https://github.com/databrickslabs/dqx/issues/200)). A uniqueness check has been added, which reports an issue for each row containing a duplicate value in a specified column. This resolves issue [154](https://github.com/databrickslabs/dqx/issues/154).
* Added column expression support for limits in not less and not greater than checks, and updated docs ([#200](https://github.com/databrickslabs/dqx/issues/200)). This commit introduces several changes to simplify and enhance data quality checking in PySpark workloads for both streaming and batch data. The naming conventions of rule functions have been unified, and the `is_not_less_than` and `is_not_greater_than` functions now accept column names or expressions as limits. The input parameters for range checks have been unified, and the logic of `is_not_in_range` has been updated to be inclusive of the boundaries. The project's documentation has been improved, with the addition of comprehensive examples, and the contribution guidelines have been clarified. This change includes a breaking change for some of the checks. Users are advised to review and test the changes before implementation to ensure compatibility and avoid any disruptions. Reslves issues: [131](https://github.com/databrickslabs/dqx/issues/131), [197](https://github.com/databrickslabs/dqx/pull/200), [175](https://github.com/databrickslabs/dqx/issues/175), [205](https://github.com/databrickslabs/dqx/issues/205)
* Include predefined check functions by default when applying custom checks by metadata ([#203](https://github.com/databrickslabs/dqx/issues/203)). The data quality engine has been updated to include predefined check functions by default when applying custom checks using metadata in the form of YAML or JSON. This change simplifies the process of defining custom checks, as users no longer need to manually import predefined functions, which were previously required and could be cumbersome. The default behavior now is to import all predefined checks. The `validate_checks` method has been updated to accept a dictionary of custom check functions instead of global variables. This improvement resolves issue [#48](https://github.com/databrickslabs/dqx/issues/48).
## 0.1.13

* Fixed cli installation and demo ([#177](https://github.com/databrickslabs/dqx/issues/177)). In this release, changes have been made to adjust the dashboard name, ensuring compliance with new API naming rules. The dashboard name now only contains alphanumeric characters, hyphens, or underscores, and the reference section has been split for clarity. In addition, demo for the tool has been updated to work regardless if a path or UC table is provided in the config. Furthermore, documentation has been refactored and udpated to improve clarity. The following issue have been closed: [#171](https://github.com/databrickslabs/dqx/issues/171) and [#198](https://github.com/databrickslabs/dqx/issues/198).
Expand Down
2 changes: 1 addition & 1 deletion src/databricks/labs/dqx/__about__.py
Original file line number Diff line number Diff line change
@@ -1 +1 @@
__version__ = "0.1.13"
__version__ = "0.2.0"

0 comments on commit e6d81f8

Please sign in to comment.