Include predefined check functions by default when applying custom checks by metadata #203

mwojtyczka · 2025-03-03T12:05:44Z

Changes

Include predefined check functions by default when applying checks defined as metadata (YAML/JSON). Previously, anyone defining custom check functions also had to import the predefined functions explicitly, which was cumbersome and unintuitive.
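To illustrate the behavior described above, here is a minimal, self-contained sketch of name resolution in which predefined checks are always available and custom functions are layered on top. It is not the project's actual implementation: `PREDEFINED_CHECKS`, `resolve_check`, `ends_with_foo`, the plain-Python check bodies, and the shape of the metadata entries are all hypothetical names and assumptions chosen for the example.

```python
from typing import Callable, Dict, Optional


# Hypothetical stand-ins for the library's predefined check functions.
def is_not_null(value) -> bool:
    return value is not None


def is_not_empty(value) -> bool:
    return value != ""


PREDEFINED_CHECKS: Dict[str, Callable] = {
    "is_not_null": is_not_null,
    "is_not_empty": is_not_empty,
}


def resolve_check(name: str, custom_checks: Optional[Dict[str, Callable]] = None) -> Callable:
    """Look up a check function by name (hypothetical helper)."""
    # Predefined checks are included by default; custom functions are merged
    # on top, so callers only pass the functions they wrote themselves.
    available = {**PREDEFINED_CHECKS, **(custom_checks or {})}
    try:
        return available[name]
    except KeyError:
        raise ValueError(f"function '{name}' is not defined") from None


# A custom check supplied by the user.
def ends_with_foo(value) -> bool:
    return isinstance(value, str) and value.endswith("foo")


# Metadata as it might look after loading YAML/JSON (assumed shape).
checks = [
    {"check": {"function": "is_not_null"}},
    {"check": {"function": "ends_with_foo"}},
]

custom = {"ends_with_foo": ends_with_foo}
resolved = [resolve_check(c["check"]["function"], custom) for c in checks]
```

In this sketch the caller passes only `ends_with_foo`; `is_not_null` resolves without being imported or listed explicitly.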

Linked issues

Resolves #48

Tests

manually tested
added unit tests
added integration tests

github-actions · 2025-03-03T12:23:24Z

✅ 127/127 passed, 1 skipped, 16m54s total

Running from acceptance #533

* Added uniqueness check ([#200](#200)). A uniqueness check has been added, which reports an issue for each row containing a duplicate value in a specified column. This resolves issue [154](#154).
* Added column expression support for limits in not less and not greater than checks, and updated docs ([#200](#200)). This commit introduces several changes to simplify and enhance data quality checking in PySpark workloads for both streaming and batch data. The naming conventions of rule functions have been unified, and the `is_not_less_than` and `is_not_greater_than` functions now accept column names or expressions as limits. The input parameters for range checks have been unified, and the logic of `is_not_in_range` has been updated to be inclusive of the boundaries. The project's documentation has been improved with comprehensive examples, and the contribution guidelines have been clarified. This is a breaking change for some of the checks; users are advised to review and test the changes before adopting them. Resolves issues: [131](#131), [197](#200), [175](#175), [205](#205).
* Include predefined check functions by default when applying custom checks by metadata ([#203](#203)). The data quality engine now includes predefined check functions by default when applying custom checks defined as metadata (YAML or JSON). This simplifies defining custom checks, because users no longer need to import the predefined functions manually, which was previously required and cumbersome. The default behavior is now to import all predefined checks. The `validate_checks` method has been updated to accept a dictionary of custom check functions instead of global variables. This resolves issue [#48](#48).
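The last item mentions validating checks against a dictionary of custom check functions rather than global variables. A rough sketch of that idea follows; it does not reproduce the real `validate_checks` signature or return type. `PREDEFINED_NAMES`, `validate_check_names`, and the metadata shape are hypothetical and used only for illustration.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical set of names of the predefined check functions.
PREDEFINED_NAMES = {"is_not_null", "is_not_empty"}


def validate_check_names(
    checks: List[dict],
    custom_check_functions: Optional[Dict[str, Callable]] = None,
) -> List[str]:
    """Return an error message for every check whose function is unknown."""
    # Known names are the predefined ones plus the keys of the explicit
    # dictionary of custom functions; no lookup in globals() is needed.
    known = PREDEFINED_NAMES | set(custom_check_functions or {})
    errors = []
    for entry in checks:
        name = entry.get("check", {}).get("function")
        if name not in known:
            errors.append(f"function '{name}' is not defined")
    return errors


checks = [
    {"check": {"function": "is_not_null"}},
    {"check": {"function": "ends_with_foo"}},
]
```

Passing the custom functions explicitly keeps validation independent of whatever happens to be in the caller's module scope.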

include predefined checks when custom check functions are provided

9f4d2db

mwojtyczka requested a review from a team as a code owner March 3, 2025 12:05

mwojtyczka requested review from tombonfert and removed request for a team March 3, 2025 12:05

mwojtyczka temporarily deployed to tool March 3, 2025 12:05 — with GitHub Actions Inactive

mwojtyczka changed the title ~~Include predefined check functions by default when applying checks with metadata (yaml/json)~~ Include predefined check functions by default when applying custom checks by metadata Mar 3, 2025

mwojtyczka requested review from alexott and nehamilak-db March 3, 2025 12:06

updated docs

551d49a

mwojtyczka temporarily deployed to tool March 3, 2025 12:30 — with GitHub Actions Inactive

refactor

c9edf28

mwojtyczka temporarily deployed to tool March 3, 2025 13:22 — with GitHub Actions Inactive

updated links

a57b44e

mwojtyczka had a problem deploying to tool March 3, 2025 13:36 — with GitHub Actions Error

refactor

bc25a4c

mwojtyczka had a problem deploying to tool March 3, 2025 13:39 — with GitHub Actions Error

refactor

3d1e5ae

mwojtyczka had a problem deploying to tool March 3, 2025 13:44 — with GitHub Actions Error

refactor

d5121b5

mwojtyczka temporarily deployed to tool March 3, 2025 13:45 — with GitHub Actions Inactive

alexott approved these changes Mar 3, 2025

mwojtyczka merged commit 5835138 into main Mar 3, 2025
9 checks passed

mwojtyczka deleted the func_registry branch March 3, 2025 14:18

mwojtyczka mentioned this pull request Mar 10, 2025

Release v0.2.0 #207

Merged
