[BUG]: Demo notebook "dqx_demo_tool" fails if you use table name in your run_config #171

Closed
1 task done
gergo-databricks opened this issue Feb 13, 2025 · 1 comment · Fixed by #177 or #199
Assignees
Labels
bug Something isn't working enhancement New feature or request

Comments

@gergo-databricks

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

The following line in the demo only works if "run_config.input_location" is a path; it fails if it is a table name:
bronze_df = spark.read.format(run_config.input_format).load(run_config.input_location).limit(1000)

Expected Behavior

Documentation states:
" input_location: s3://iot-ingest/raw # <- Input location for profiling (UC table or cloud path)"
https://databrickslabs.github.io/dqx/docs/guide/#using-cli

Code comment in config.py:
"input_location: str | None = None # input data path or a table"

read_input_data() in utils.py reads input_location correctly, so only the demo is incorrect.

I would expect the demo not to fail if a table name is used in this config field.
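
For illustration, here is a minimal sketch of how a reader could accept either kind of value. This is not the DQX implementation of read_input_data(); the helper names `looks_like_table` and `read_input`, the regular expression, and the "delta" default format are all assumptions made for this example. It relies only on the standard PySpark calls `spark.read.table(...)` and `spark.read.format(...).load(...)`.

```python
import re

# Assumption: a Unity Catalog table name has the form catalog.schema.table,
# i.e. three dot-separated identifiers. This pattern is illustrative only
# and does not handle backtick-quoted names.
TABLE_NAME = re.compile(r"^\w+\.\w+\.\w+$")


def looks_like_table(location: str) -> bool:
    """Return True if the location looks like a three-level table name."""
    return bool(TABLE_NAME.match(location))


def read_input(spark, input_location: str, input_format: str = "delta"):
    """Read from a table name or a path (hypothetical helper, not DQX API)."""
    if looks_like_table(input_location):
        # Table names are resolved through the catalog.
        return spark.read.table(input_location)
    # Anything else is treated as a storage path.
    return spark.read.format(input_format).load(input_location)
```

In the demo, the result could then be limited as before, for example `read_input(spark, run_config.input_location, run_config.input_format).limit(1000)`.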

Steps To Reproduce

Set input_location to a fully qualified table name and run the dqx_demo_tool.py demo notebook on a shared cluster with DBR 15.4.

Cloud

AWS

Operating System

macOS

Relevant log output

Error:
Path must be absolute: my_catalog.my_schema.my_table/_delta_log

JVM stacktrace:
java.lang.IllegalArgumentException
	at com.databricks.common.path.AbstractPath$.fromHadoopPath(AbstractPath.scala:114)
...
@gergo-databricks gergo-databricks added the bug Something isn't working label Feb 13, 2025
@mwojtyczka mwojtyczka added enhancement New feature or request and removed bug Something isn't working labels Feb 14, 2025
@mwojtyczka
Contributor

mwojtyczka commented Feb 14, 2025

Using the native Spark API in the demo was somewhat intentional, to show that you don't actually need any special functions to interact with DQX. But I agree it would make more sense to use the read_input_data function in the tool demo.

@mwojtyczka mwojtyczka added the bug Something isn't working label Feb 14, 2025
@mwojtyczka mwojtyczka self-assigned this Feb 18, 2025
mwojtyczka added a commit that referenced this issue Feb 27, 2025
* Fixed cli installation and demo ([#177](#177)). In this release, changes have been made to adjust the dashboard name, ensuring compliance with new API naming rules. The dashboard name now only contains alphanumeric characters, hyphens, or underscores, and the reference section has been split for clarity. In addition, the demo for the tool has been updated to work regardless of whether a path or UC table is provided in the config. Furthermore, documentation has been refactored and updated to improve clarity. The following issues have been closed: [#171](#171) and [#198](#198).
* [Feature] Update is_(not)_in_range ([#87](#87)) to support max/min limits from col ([#153](#153)). In this release, the `is_in_range` and `is_not_in_range` quality rule functions have been updated to support a column as the minimum or maximum limit, in addition to a literal value. This change is accomplished through the introduction of optional `min_limit_col_expr` and `max_limit_col_expr` arguments, allowing users to specify a column expression as the minimum or maximum limit. Extensive testing, including unit tests and integration tests, has been conducted to ensure the correct behavior of the new functionality. These enhancements offer increased flexibility when defining quality rules, catering to a broader range of use cases and scenarios.