Refactor to remove try_cast spark function #163

mwojtyczka · 2025-02-07T22:38:34Z

Changes

Removed try_cast as it is not available in all runtimes.
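
For illustration only, here is a minimal sketch of the `cast` plus null-check pattern that can stand in for `try_cast`. The helper name `is_not_valid_date_expr` is hypothetical and is not taken from the DQX code base. The sketch assumes Spark runs with ANSI mode disabled, where a failed `CAST` yields `NULL` instead of raising an error.

```python
def is_not_valid_date_expr(column: str) -> str:
    """Build a Spark SQL predicate that is true for non-null values that do not cast to DATE.

    Hypothetical helper: it relies on plain CAST returning NULL on failure,
    which holds only when ANSI mode is disabled (an assumption of this sketch).
    Column names containing backticks are not handled.
    """
    quoted = f"`{column}`"
    # A value is invalid when it is present but the cast produces NULL.
    return f"({quoted} IS NOT NULL AND CAST({quoted} AS DATE) IS NULL)"


predicate = is_not_valid_date_expr("order_date")
print(predicate)
```

The resulting string could be passed to `pyspark.sql.functions.expr`. The same condition written with the Column API would be `F.col(c).isNotNull() & F.col(c).cast("date").isNull()`.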

Linked issues

Resolves #158

Tests

manually tested
added unit tests
added integration tests

github-actions · 2025-02-07T22:45:40Z

✅ 121/121 passed, 1 skipped, 30m59s total

Running from acceptance #449

sundarshankar89

LGTM

* Provided option to customize reporting column names ([#127](#127)). In this release, the DQEngine library has been enhanced to allow for customizable reporting column names. A new constructor has been added to DQEngine, which accepts an optional ExtraParams object for extra configurations. A new Enum class, DefaultColumnNames, has been added to represent the columns used for error and warning reporting. New tests have been added to verify the application of checks with custom column naming. These changes aim to improve the customizability, flexibility, and user experience of DQEngine by providing more control over the reporting columns and resolving issue [#46](#46).
* Fixed parsing error when loading checks from a file ([#165](#165)). In this release, we have addressed a parsing error that occurred when loading checks (data quality rules) from a file, fixing issue [#162](#162). The specific issue being resolved is a SQL expression parsing error. The changes include refactoring tests to eliminate code duplication and improve maintainability, as well as updating method and variable names to use `filepath` instead of "path". Additionally, new unit and integration tests have been added and manually tested to ensure the correct functionality of the updated code.
* Removed usage of try_cast spark function from the checks to make sure DQX can be run on more runtimes ([#163](#163)). In this release, we have refactored the code to remove the usage of the `try_cast` Spark function and replace it with `cast` and `isNull` checks to improve code compatibility, particularly for runtimes where `try_cast` is not available. The affected functionality includes null and empty column checks, checking if a column value is in a list, and checking if a column value is a valid date or timestamp. We have added unit and integration tests to ensure functionality is working as intended.
* Added filter to rules so that you can make conditional checks ([#141](#141)). The filter serves as a condition that data must meet to be evaluated by the check function. The filters restrict the evaluation of checks to only apply to rows that meet the specified conditions. This feature enhances the flexibility and customizability of data quality checks in the DQEngine.
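
The filter behaviour described in the last item can be sketched in plain Python. This is not the DQX API: `apply_check`, the row dictionaries, and the sample predicate are all hypothetical and only show the idea that rows failing the filter are never passed to the check function.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]


def apply_check(
    rows: List[Row],
    check: Callable[[Row], Optional[str]],
    row_filter: Optional[Callable[[Row], bool]] = None,
) -> List[Optional[str]]:
    """Return one message per row: an error text, or None when there is nothing to report.

    Rows that do not satisfy row_filter are skipped, so the check never runs on them.
    """
    results: List[Optional[str]] = []
    for row in rows:
        if row_filter is not None and not row_filter(row):
            results.append(None)  # filtered out: not evaluated
        else:
            results.append(check(row))
    return results


rows = [
    {"country": "PL", "vat": None},
    {"country": "US", "vat": None},
    {"country": "PL", "vat": "123"},
]
messages = apply_check(
    rows,
    check=lambda r: "vat is null" if r["vat"] is None else None,
    row_filter=lambda r: r["country"] == "PL",
)
print(messages)
```

In this sketch the second row has a null `vat` but produces no message, because the filter excludes it before the check is evaluated.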

refactored to remove try_cast function

faa847e

mwojtyczka requested a review from a team as a code owner February 7, 2025 22:38

mwojtyczka requested review from pratikk-databricks and removed request for a team February 7, 2025 22:38

mwojtyczka temporarily deployed to tool February 7, 2025 22:38 — with GitHub Actions Inactive

mwojtyczka requested a review from alexott February 7, 2025 22:44

alexott approved these changes Feb 8, 2025

sundarshankar89 approved these changes Feb 10, 2025

alexott merged commit 3ca34dc into main Feb 10, 2025
9 checks passed

alexott deleted the try_cast_remove branch February 10, 2025 08:19

mwojtyczka mentioned this pull request Feb 12, 2025

Release v0.1.11 #168

Merged
