[BUG]: Demo notebook "dqx_demo_tool" fails if you use table name in your run_config #171
Closed
It was a bit intentional to use the native Spark API in the demo to show that you don't actually need any special functions to interact with DQX. But I agree it would make more sense to use …
mwojtyczka added a commit that referenced this issue on Feb 27, 2025:
* Fixed cli installation and demo ([#177](#177)). In this release, changes have been made to adjust the dashboard name, ensuring compliance with new API naming rules. The dashboard name now only contains alphanumeric characters, hyphens, or underscores, and the reference section has been split for clarity. In addition, the demo for the tool has been updated to work regardless of whether a path or a UC table is provided in the config. Furthermore, the documentation has been refactored and updated to improve clarity. The following issues have been closed: [#171](#171) and [#198](#198).
* [Feature] Update is_(not)_in_range ([#87](#87)) to support max/min limits from col ([#153](#153)). In this release, the `is_in_range` and `is_not_in_range` quality rule functions have been updated to support a column as the minimum or maximum limit, in addition to a literal value. This change is accomplished through the introduction of optional `min_limit_col_expr` and `max_limit_col_expr` arguments, allowing users to specify a column expression as the minimum or maximum limit. Extensive testing, including unit tests and integration tests, has been conducted to ensure the correct behavior of the new functionality. These enhancements offer increased flexibility when defining quality rules, catering to a broader range of use cases and scenarios.
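For illustration, here is a rough sketch of how a metadata-defined check might use the new column-based limit. Only `is_in_range`, `min_limit_col_expr`, and `max_limit_col_expr` come from the changelog above; the surrounding keys and column names are assumptions and should be verified against the DQX documentation for your installed version.

```python
# Hypothetical sketch: only is_in_range, min_limit_col_expr and max_limit_col_expr
# are taken from the changelog above; every other key and name is illustrative.
checks = [
    {
        "criticality": "error",
        "check": {
            "function": "is_in_range",
            "arguments": {
                "col_name": "sensor_reading",        # column under test (name assumed)
                "min_limit": 0,                      # literal lower bound
                "max_limit_col_expr": "sensor_max",  # upper bound read from another column
            },
        },
    },
]
```

Such a list would then be passed to the engine's metadata-driven check method (for example `apply_checks_by_metadata`, if that is the entry point in your DQX version).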
Is there an existing issue for this?
Current Behavior
This line in the demo only works if `run_config.input_location` is a path and fails if it is a table name:
bronze_df = spark.read.format(run_config.input_format).load(run_config.input_location).limit(1000)
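For context, `DataFrameReader.load()` expects a storage location; a UC table name would instead be read with `spark.read.table`, for example:

```python
# Works when run_config.input_location is a UC table name such as "catalog.schema.table"
bronze_df = spark.read.table(run_config.input_location).limit(1000)
```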
Expected Behavior
Documentation states:
" input_location: s3://iot-ingest/raw # <- Input location for profiling (UC table or cloud path)"
https://databrickslabs.github.io/dqx/docs/guide/#using-cli
Code comment in config.py:
"input_location: str | None = None # input data path or a table"
`read_input_data()` in utils.py reads the `input_location` correctly, so it is only the demo that is incorrect.
I would expect the demo not to fail if a table name is used in this config field.
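As a minimal sketch of what the demo could do instead (roughly mirroring the behaviour described for `read_input_data()`), the helper below branches on whether the location looks like a table name; the heuristic is an assumption for illustration, not the actual DQX implementation:

```python
from pyspark.sql import DataFrame, SparkSession


def read_input(spark: SparkSession, input_location: str, input_format: str | None) -> DataFrame:
    """Read the profiling input from either a UC table or a storage path (sketch only)."""
    # Assumption: a location without slashes and with at least two dots is treated as a
    # fully qualified table name ("catalog.schema.table"); anything else is read as a path.
    if "/" not in input_location and input_location.count(".") >= 2:
        return spark.read.table(input_location)
    return spark.read.format(input_format or "delta").load(input_location)


# Usage in the demo notebook:
# bronze_df = read_input(spark, run_config.input_location, run_config.input_format).limit(1000)
```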
Steps To Reproduce
Set `input_location` to a fully qualified table name and run the dqx_demo_tool.py demo notebook on a shared cluster with DBR 15.4.
Cloud
AWS
Operating System
macOS
Relevant log output