[BUG]: SQL Expression Parsing Error, when expression includes quotes #162

Closed
bullah12 opened this issue Feb 7, 2025 · 1 comment · Fixed by #165
bullah12 commented Feb 7, 2025

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When defining a SQL expression rule within a YAML file, if the expression contains any quotation marks, the parser fails to validate the checks when they are loaded with the dq_engine.load_checks_from_workspace_file(file) function.

For example, when applying the following YAML rule to dq_engine.load_checks_from_workspace_file():

 - name: example_test
   criticality: error
   check:
     function: sql_expression
     arguments:
       expression: col1 like "Team %"

It will return the following error:

ParserError: while parsing a flow mapping
  in "<unicode string>", line 1, column 45:
     ...  "sql_expression", "arguments": {"expression": "col1 like "Team %""}}
                                         ^
expected ',' or '}', but got '<scalar>'
  in "<unicode string>", line 1, column 70:
     ... ents": {"expression": "col1 like "Team %""}}
                                         ^
File <command-7538297306428071>, line 1
----> 1 checks = dq_engine.load_checks_from_workspace_file(workspace_path="/Workspace/Users/<user-name>/<rules_file.yml>")
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/yaml/parser.py:549, in Parser.parse_flow_mapping_key(self, first)
    547     else:
    548         token = self.peek_token()
--> 549         raise ParserError("while parsing a flow mapping", self.marks[-1],
    550                 "expected ',' or '}', but got %r" % token.id, token.start_mark)
    551 if self.check_token(KeyToken):
    552     token = self.get_token()

The same problem occurs if you use SQL expressions like:

  • col1 = 'Team A'
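
For reference, a minimal sketch that reproduces the underlying error with plain PyYAML, assuming the engine re-serializes each check into a flow-style mapping with double quotes before parsing it (as the traceback above suggests):

import yaml

# This string mirrors the one shown in the traceback: the unescaped inner
# quotes terminate the scalar early, so the flow mapping cannot be parsed.
raw = '{"function": "sql_expression", "arguments": {"expression": "col1 like "Team %""}}'
yaml.safe_load(raw)  # raises yaml.parser.ParserError: while parsing a flow mapping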

Expected Behavior

The function should accept quotation marks as part of the SQL expression argument and ensure they are carried over during JSON parsing.

Since it does not, a workaround is needed so that dq_engine.load_checks_from_workspace_file can parse the checks despite the quotation marks (the workaround is described in the Steps To Reproduce section).
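
For comparison, a minimal sketch of the expected behavior, assuming the checks are serialized to a string before being re-parsed: json.dumps escapes the inner quotes, so the round trip is lossless.

import json

# Escaping the inner quotes makes the round trip lossless.
check = {"function": "sql_expression", "arguments": {"expression": 'col1 like "Team %"'}}
serialized = json.dumps(check)          # the inner quotes become \"
assert json.loads(serialized) == check  # the expression survives intact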

Steps To Reproduce

As a temporary workaround, do the following:

  1. Replace any quotation marks you would typically use in your SQL expression with the backtick (`) [see Notes]. e.g.: col1 like `Team %`
  2. Run your DQX load function. e.g.: checks = dq_engine.load_checks_from_workspace_file(workspace_path="")
  3. Define the following function in your notebook/workspace:
import copy

def replace_quotes(checks, symbol='`'):
    # Restore double quotes in sql_expression rules: replace every
    # occurrence of the placeholder symbol with a double quote.
    cleaned_checks = copy.deepcopy(checks)  # leave the caller's checks untouched
    for check in cleaned_checks:
        if check['check']['function'] == 'sql_expression':
            expression = check['check']['arguments']['expression']
            if symbol in expression:
                check['check']['arguments']['expression'] = expression.replace(symbol, '"')
    return cleaned_checks
  4. Pass your checks variable through the function. e.g.: checks = replace_quotes(checks)
  5. Check the JSON form of your checks to ensure the rule has been carried over correctly (a verification sketch follows below)
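
A minimal verification sketch for step 5, assuming the checks are plain dictionaries:

import json

checks = replace_quotes(checks)
# Inspect the JSON form; the inner quotes should now appear escaped as \".
print(json.dumps(checks, indent=2))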

Notes:
The backtick can be replaced with any other symbol (or combination of symbols) you'd like; you just need to pass the same value to the symbol argument of your replace_quotes function, as in the sketch below.
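
For instance, a hedged one-liner with a custom placeholder (the placeholder string here is an arbitrary choice):

# Any placeholder works, as long as the YAML file and the function agree on it.
checks = replace_quotes(checks, symbol='~~')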

Cloud

Azure

Operating System

Windows

@bullah12 bullah12 added the bug Something isn't working label Feb 7, 2025
@mwojtyczka mwojtyczka self-assigned this Feb 10, 2025
mwojtyczka commented Feb 11, 2025

@bullah12 thank you for reporting this.
Fix should be merged and released this week: #165

mwojtyczka added a commit that referenced this issue Feb 12, 2025
* Provided option to customize reporting column names ([#127](#127)). In this release, the DQEngine library has been enhanced to allow for customizable reporting column names. A new constructor has been added to DQEngine, which accepts an optional ExtraParams object for extra configurations. A new Enum class, DefaultColumnNames, has been added to represent the columns used for error and warning reporting. New tests have been added to verify the application of checks with custom column naming. These changes aim to improve the customizability, flexibility, and user experience of DQEngine by providing more control over the reporting columns and resolving issue [#46](#46).
* Fixed parsing error when loading checks from a file ([#165](#165)). In this release, we have addressed a parsing error that occurred when loading checks (data quality rules) from a file, fixing issue [#162](#162). The specific issue being resolved is a SQL expression parsing error. The changes include refactoring tests to eliminate code duplication and improve maintainability, as well as updating method and variable names to use `filepath` instead of "path". Additionally, new unit and integration tests have been added and manually tested to ensure the correct functionality of the updated code.
* Removed usage of try_cast spark function from the checks to make sure DQX can be run on more runtimes ([#163](#163)). In this release, we have refactored the code to remove the usage of the `try_cast` Spark function and replace it with `cast` and `isNull` checks to improve code compatibility, particularly for runtimes where `try_cast` is not available. The affected functionality includes null and empty column checks, checking if a column value is in a list, and checking if a column value is a valid date or timestamp. We have added unit and integration tests to ensure functionality is working as intended.
* Added filter to rules so that you can make conditional checks ([#141](#141)). The filter serves as a condition that data must meet to be evaluated by the check function. The filters restrict the evaluation of checks to only apply to rows that meet the specified conditions. This feature enhances the flexibility and customizability of data quality checks in the DQEngine.
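
As an illustration of the try_cast replacement described above, a minimal PySpark sketch (not the library's exact code; the column name and target type are assumptions):

from pyspark.sql import functions as F

# try_cast returns null on failure; the same signal can be rebuilt portably:
# cast to the target type and flag rows where a non-null input became null.
casted = F.col("col1").cast("date")
is_invalid_date = F.col("col1").isNotNull() & casted.isNull()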