Improve xml hierarchy test methods, refactor test_trigger.py tests to use them #570

lindsay-stevens · 2021-11-01T04:35:51Z

Closes #563

Why is this the best possible solution? Were any other approaches considered?

This follows on from #548 to add the discussed xml__xpath_match kwarg for PyxformTestCase and to switch to the lxml package from ElementTree for the purposes of test XPath evaluation. The benefit of lxml is that it's XPath 1.0 processor can do a lot more than ElementTree's XPath subset. Gains include being able to write compound predicates (with and) in one predicate instead of like .//bind[1=1][1=1], the use of axes, and numerous XPath functions. Also it allows the old code for switching XML tokenizers for Python 3.7 to be removed. More thorough tests for the xml__xpath* matchers are added as well.

The way around the default namespace limitation in lxml (mentioned in #548) was to tell the XPath processor about the same namespace URI/URL but assign a non-empty and not-None prefix to it. The lxml docs say the namespace match is actually on the URI not the prefix, so it's a bit like SQL aliases in different SELECT clauses. Since the default namespace is http://www.w3.org/2002/xforms I went with a prefix of x for this purpose. The assignment happens in pyxform_test_case.py on L231 which is finally used at L556. This means that in any usages of the xml__xpath* test matchers, XPath expressions referring to elements in the default namespace need to be written like .//x:bind instead of .//bind. Attributes without a prefix don't require this x prefix. There are numerous examples in test_pyxform_test_case.py.

This PR also delivers on the main purpose of #563 to refactor the test_trigger.py tests to use the new xml__xpath* matchers. With above improvements and addition of xml__xpath_match, it was all possible with xml__xpath_match.

Further details in the commit messages.

What are the regression risks?

This is a new testing feature so there would be no regressions unless other branches had already started using it.

It's possible that the refactors of test_trigger.py test are not perfect, despite being careful. Each of these tests passes. The XPath results can be checked by setting debug=True, which prints out the XML document, each XPath expression and what it matched.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

Not applicable.

Before submitting this PR, please make sure you have:

included test cases for core behavior and edge cases in tests
run nosetests and verified all tests pass
run black pyxform tests to format code
verified that any code or assets from external sources are properly credited in comments

- adds xpath_match test matcher, which is a convenience on xpath_count where exactly one match is required. - adds more tests for xpath_* matchers. Previous test cases were minimal, the new suite covers 1 or n XPaths with 1 or n matches, for each of the matchers. The test data fed to these test cases cover a range of common XPath patterns, including a search for an attribute value (see s1c5). - changes the XPath engine. The previous approach in XLSForm#548 used stdlib elementtree xpath, which is like XPath but has limited expressions capabilities. The use of lxml makes available a wider range of XPath capabilities. - changes XPath requirements. lxml is a bit more strict on the XPath spec so elements in the default namespace must be prefixed with something (x), since the document contains other namespaces.

- now used by tests/pxform_test_case.py

- fix n_match tests: cases weren't using the suite2 cases because of typo in the convenience lists. For xpath_match, skip n_xpath_n_match_pass scenario and rework n_xpath_n_match_fail. - fix s2c1: incorrect count - fix s2c2: incorrect exact match strings - add docs: describe suites and cases. Clarification in check_content. - add validation: check the tuple types for xpath_exact and xpath_count.

- move matcher code: to facilitate formalised func signatures / types / docstrings, and to allow neater / less indented code (vs keeping it all inline with the check_content func). - remove tokenizer funcs: not needed anymore since change to using lxml. - update invocation (L337-L346) and docstring: usage of xpath matchers intended and tested with whole XML document only, not the instance/model/itext fragments. - update xml__xpath_exact input args: for the expected strings, now takes a set instead of list. XPath processes in document order but document order not necessarily stable between runs.

- Equivalent for model__contains, xml__contains is generally a XPath for the element in the string with attribute predicates to check the content is the same. The improvement though is the check for that element in a precise document location. So in most cases the full path was used rather than a full search (e.g. ".//x:bind") or axis. - Equivalent for xml__excludes is a XPath for the root element ("/h:html") so that there is 1 match, but with a predicate asserting that no descendant elements have the name and attributes of interest.

lognaturel

I focused mostly on the updated tests which look good to me. I agree with the rationale for introducing lxml. The approach for the default namespace makes sense.

I'm wondering whether we should strongly recommend using XPath matchers for new tests now that all this investment has gone into making them nice? The biggest benefit I see is that it removes the attribute order issue entirely and makes tests more robust against changes to things like whitespace in the XForms output. If you agree, @lindsay-stevens, would be great to add to the README.

codecov-commenter · 2021-11-04T09:23:13Z

Codecov Report

Merging #570 (161c3e9) into master (6ca3484) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #570   +/-   ##
=======================================
  Coverage   84.20%   84.20%           
=======================================
  Files          25       25           
  Lines        3672     3672           
  Branches      865      865           
=======================================
  Hits         3092     3092           
  Misses        440      440           
  Partials      140      140

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6ca3484...161c3e9. Read the comment docs.

lindsay-stevens · 2021-11-04T09:31:21Z

Thanks @lognaturel I agree and have added a section to the README for this. I've focussed on xml__xpath_match because after having used it I think that will probably be all that's needed in >95% of cases.

lindsay-stevens · 2021-11-04T09:39:40Z

Appveyor build error seems to be random pip error, I only added to the Readme since the passing build in baec985

lindsay-stevens added 5 commits October 30, 2021 09:15

add: lxml to dev dependencies

3cd53c2

- now used by tests/pxform_test_case.py

lindsay-stevens requested review from lognaturel and yanokwa November 1, 2021 04:40

lindsay-stevens changed the title ~~Pyxform 563~~ Improve xml hierarchy test methods, refactor test_trigger.py tests to use them Nov 1, 2021

lognaturel approved these changes Nov 3, 2021

View reviewed changes

add: readme section about xml__xpath* matchers

ef2f906

fix: double backticks needed for inline code formatting style in RST

7dd4e31

dev: re-trigger checks

161c3e9

lindsay-stevens merged commit 66c0d5b into XLSForm:master Nov 17, 2021

lindsay-stevens deleted the pyxform-563 branch November 17, 2021 09:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve xml hierarchy test methods, refactor test_trigger.py tests to use them #570

Improve xml hierarchy test methods, refactor test_trigger.py tests to use them #570

lindsay-stevens commented Nov 1, 2021 •

edited

Loading

lognaturel left a comment •

edited

Loading

codecov-commenter commented Nov 4, 2021 •

edited

Loading

lindsay-stevens commented Nov 4, 2021

lindsay-stevens commented Nov 4, 2021

Improve xml hierarchy test methods, refactor test_trigger.py tests to use them #570

Improve xml hierarchy test methods, refactor test_trigger.py tests to use them #570

Conversation

lindsay-stevens commented Nov 1, 2021 • edited Loading

Why is this the best possible solution? Were any other approaches considered?

What are the regression risks?

Does this change require updates to documentation? If so, please file an issue here and include the link below.

Before submitting this PR, please make sure you have:

lognaturel left a comment • edited Loading

Choose a reason for hiding this comment

codecov-commenter commented Nov 4, 2021 • edited Loading

Codecov Report

lindsay-stevens commented Nov 4, 2021

lindsay-stevens commented Nov 4, 2021

lindsay-stevens commented Nov 1, 2021 •

edited

Loading

lognaturel left a comment •

edited

Loading

codecov-commenter commented Nov 4, 2021 •

edited

Loading