[BUG]: installing dbx in workspace leading to error #175

michael-galli · 2025-02-24T09:24:15Z

Is there an existing issue for this?

I have searched the existing issues

Current Behavior

I connected to my workspace by:
databricks auth login --host [workspace url]

then I ran:
databricks labs install dqx

Error message:
Traceback (most recent call last):
  File "C:\Users\michael.galli.databricks\labs\dqx\lib\src\databricks\labs\dqx\installer\install.py", line 38, in <module>
    from databricks.labs.dqx.runtime import Workflows
  File "C:\Users\michael.galli.databricks\labs\dqx\lib\src\databricks\labs\dqx\runtime.py", line 10, in <module>
    from databricks.labs.dqx.profiler.workflow import ProfilerWorkflow
  File "C:\Users\michael.galli.databricks\labs\dqx\lib\src\databricks\labs\dqx\profiler\workflow.py", line 3, in <module>
    from databricks.labs.dqx.contexts.workflows import RuntimeContext
  File "C:\Users\michael.galli.databricks\labs\dqx\lib\src\databricks\labs\dqx\contexts\workflows.py", line 3, in <module>
    from pyspark.sql import SparkSession
ModuleNotFoundError: No module named 'pyspark'
Error: installer: exit status 1

Expected Behavior

No response

Steps To Reproduce

No response

Cloud

AWS

Operating System

macOS

Relevant log output

michael-galli · 2025-02-24T11:15:10Z

Resolution: I needed Databricks CLI v0.241 or later.
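For anyone hitting the same traceback, here is a minimal sketch of checking the installed CLI against the v0.241 minimum from the resolution above. It assumes the version text looks like `Databricks CLI v0.241.0` (for example, what `databricks --version` is expected to print); the function names are hypothetical and not part of the CLI or DQX.

```python
import re

# Minimum version taken from the resolution in this thread.
MIN_CLI_VERSION = (0, 241)


def parse_cli_version(text):
    """Extract (major, minor) from text such as 'Databricks CLI v0.241.0'.

    The exact output format is an assumption; only the first
    'major.minor' pair found in the text is used.
    """
    match = re.search(r"v?(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def cli_is_new_enough(text, minimum=MIN_CLI_VERSION):
    """Return True if the parsed version is at least `minimum`."""
    return parse_cli_version(text) >= minimum
```

Paste the version line into `cli_is_new_enough(...)`. If it returns `False`, upgrading the Databricks CLI before running `databricks labs install dqx` again is the first thing to try.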

… not less / greater than checks, and updated docs (#200)

## Changes

* Added a uniqueness check to verify that values in a column are unique. It reports an issue for each row that contains a duplicate value and allows specifying a custom window spec.
* Renamed rule functions to unify the naming conventions across all checks.
* Extended `is_not_less_than` and `is_not_greater_than` to accept a column name or column expression as the limit.
* Unified input parameters to have a single field for min and max limits in the `is_in_range` and `is_not_in_range` checks.
* Updated the logic of `is_not_in_range` to be inclusive of the boundaries, for consistency with the `is_in_range` check.
* Updated quality checks API descriptions.
* Improved documentation and provided comprehensive examples of checks.
* Added info on using a private PyPI package and on installing the latest Databricks CLI to avoid installation issues.

This change unifies the naming convention across all checks and introduces a breaking change!

### Linked issues

Resolves #154 #131 #197 #175 #205

### Tests

- [x] manually tested
- [x] added unit tests
- [x] added integration tests

* Added uniqueness check ([#200](#200)). A uniqueness check has been added, which reports an issue for each row containing a duplicate value in a specified column. This resolves issue [154](#154).
* Added column expression support for limits in not less and not greater than checks, and updated docs ([#200](#200)). This commit introduces several changes to simplify and enhance data quality checking in PySpark workloads for both streaming and batch data. The naming conventions of rule functions have been unified, and the `is_not_less_than` and `is_not_greater_than` functions now accept column names or expressions as limits. The input parameters for range checks have been unified, and the logic of `is_not_in_range` has been updated to be inclusive of the boundaries. The project's documentation has been improved with comprehensive examples, and the contribution guidelines have been clarified. This includes a breaking change for some of the checks, so users should review and test the changes before upgrading. Resolves issues: [131](#131), [197](#197), [175](#175), [205](#205).
* Include predefined check functions by default when applying custom checks by metadata ([#203](#203)). The data quality engine now includes predefined check functions by default when applying custom checks defined as YAML or JSON metadata. Users no longer need to import the predefined functions manually, which was previously required and could be cumbersome. The default behavior is now to import all predefined checks. The `validate_checks` method has been updated to accept a dictionary of custom check functions instead of global variables. This resolves issue [#48](#48).
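To illustrate the uniqueness semantics described in the release notes above (an issue reported for each row that contains a duplicate value, not just for the second occurrence), here is a plain-Python sketch. It is not the DQX API and does not use PySpark; `flag_duplicates`, the row format, and the message text are hypothetical.

```python
from collections import Counter


def flag_duplicates(rows, column):
    """Return one message per row whose value in `column` occurs more than once.

    Illustration only: rows are plain dicts, not Spark rows.
    """
    # Count how often each value appears in the column.
    counts = Counter(row[column] for row in rows)
    # Every row sharing a repeated value is reported, including the first one.
    return [
        f"row {i}: value {row[column]!r} in column {column!r} is not unique"
        for i, row in enumerate(rows)
        if counts[row[column]] > 1
    ]


rows = [{"id": 1}, {"id": 2}, {"id": 1}]
issues = flag_duplicates(rows, "id")
```

Here `issues` has two entries, one for each of the rows where `id` is 1, while the row with `id` 2 passes.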

michael-galli added the bug Something isn't working label Feb 24, 2025

michael-galli closed this as completed Feb 24, 2025

wouove mentioned this issue Mar 5, 2025

[BUG]: SSL issue when installing tool through databricks labs install ... #197

Closed

1 task

mwojtyczka mentioned this issue Mar 6, 2025

Added uniqueness check, added column expression support for limits in not less / greater than checks, and updated docs #200

Merged

3 tasks

mwojtyczka mentioned this issue Mar 10, 2025

Release v0.2.0 #207

Merged
