Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPARQL results starting with RDFLib 6.3.0 induce new inlined type check needs with mypy --strict #2283

Open
ajnelson-nist opened this issue Mar 17, 2023 · 15 comments
Labels
core Relates to core functionality of RDFLib, i.e. `rdflib.{graph,store,term}` enhancement New feature or request

Comments

@ajnelson-nist
Copy link
Contributor

I just hit an unpleasant surprise on running unit tests on a project that worked yesterday, but failed today. The issue I hit was in using mypy --strict on a SPARQL CONSTRUCT query. A similar issue exists for SELECT queries, but I've resolved that one.

With RDFLib 6.2.0, this worked:

    for result in in_graph.query(construct_query):
        assert isinstance(result[0], URIRef)
        assert isinstance(result[1], URIRef)

Those asserts are a pattern I've had to implement to convince mypy --strict about types when I have later code assuming e.g. result[0] (and [1]) is a URIRef.

With 6.3.0 (released today), I get a complaint that I haven't guaranteed result is indexable. This is true for SELECT and CONSTRUCT queries.

error: Value of type "Union[Tuple[Node, Node, Node], bool, ResultRow]" is not indexable  [index]

I can get by with the following extra bits for SELECT queries:

#!/usr/bin/env/python

# ...
from rdflib.query import ResultRow
# ...
for result in graph.query(select_query):
    assert isinstance(result, ResultRow)

That pretty much followed from the mypy complaint I excerpted above, and checking for where ResultRow was defined. But, for CONSTRUCT queries, the Tuple syntax doesn't quite work the same.

#!/usr/bin/env/python

# ...
for result in graph.query(construct_query):
    assert isinstance(result, Tuple[URIRef, URIRef, URIRef])

mypy --strict throws a different error on that:

error: Argument 2 to "isinstance" has incompatible type "object"; expected "_ClassInfo"  [arg-type]

I don't see a unit test demonstrating how to get this to work. I also am not understanding how the return type of rdflib.graph.Graph.query() -> rdflib.query.Result eventually ties to that Union that covers SELECT, CONSTRUCT and ASK return types.

How should the results of CONSTRUCT queries interact with mypy --strict? Is there a type alias I should use instead of that Tuple? I thought there was an alias somewhere in term.py to cover a general triple, but I'm having trouble finding the symbol now.

@aucampia
Copy link
Member

aucampia commented Mar 17, 2023

What is happening here is:

rdflib.graph.Graph.query returns a rdflib.query.Result:

rdflib/rdflib/graph.py

Lines 1509 to 1518 in 334787b

def query(
self,
query_object: Union[str, Query],
processor: Union[str, query.Processor] = "sparql",
result: Union[str, Type[query.Result]] = "sparql",
initNs: Optional[Mapping[str, Any]] = None, # noqa: N803
initBindings: Optional[Mapping[str, Identifier]] = None,
use_store_provided: bool = True,
**kwargs: Any,
) -> query.Result:

rdflib.query.Result's iter method returns a Union of Iterator[Union["_TripleType", bool, ResultRow]].

rdflib/rdflib/query.py

Lines 358 to 360 in 334787b

def __iter__(
self,
) -> Iterator[Union["_TripleType", bool, ResultRow]]:

This Union exists because depending on the type of the query, the result will be different.

The problem is, there is not enough information available to the type checker to determine what kind of result is being returned, so users have to do this themselves.

If you know you passed a CONSTRUCT query, then I think either of the following should eliminate type errors:

for result in graph.query(...):
    assert isinstance(result, tuple)
    ...
for result in cast(Iterable[Tuple[URIRef, URIRef, URIRef]], graph.query(...)):
    ...

I think the reason why assert isinstance(result, Tuple[URIRef, URIRef, URIRef]) causes problems is that you should not use types (i.e. stuff from the typing module) with isinstance and issubclass:

From https://docs.python.org/3/library/typing.html#typing.TypeVar:

In general, isinstance() and issubclass() should not be used with types.

@aucampia
Copy link
Member

One way to make this less annoying is to add an enum valued query_type parameter to Graph.query, and then use overload to annotate that the result will be different depending on the value of query_type passed in. That way you could do:

for triple in graph.query(..., query_type=QueryType.CONSTRUCT):
    ...

@ajnelson-nist
Copy link
Contributor Author

Thank you @aucampia , your first post addressed all of my questions.

Your second post ... I get what the extra parameter would do, but I feel like that's breaking some obtrusiveness boundary of the type system. I'm not sure, just a gut response that will probably change on another day.

@ajnelson-nist
Copy link
Contributor Author

I'm satisfied with this thread being closed, unless you think there's a graceful way to handle it.

My best guess would be supplanting the first argument to query with an optional exactly one of three keyword arguments, select_query: str, construct_query: str, ask_query: str, and maybe each of those could be union'd with a subclass of PreparedQuery.

@ajnelson-nist
Copy link
Contributor Author

Eh, pardon:

s/graceful way to handle it/more graceful way to handle automatic type hinting/

s/best guess/reflexive guess/

@aucampia
Copy link
Member

The graph.query(..., query_type=QueryType.CONSTRUCT) is not a super great solution, if we did something like that we would then do a check inside query() and raise a value error if the query is not of the requested type before executing, but still it is not nice, and I agree it does not feel right.

But definitely the current Graph.query() interface is very annoying with mypy.

I will close this issue, but we should think of ways to make the interface friendlier to use with type checking than it is now.

@ajnelson-nist
Copy link
Contributor Author

I caught another typo in my split-out suggestion: I'd confused PreparedQuery with rdflib.plugins.sparql.processor.prepareQuery, and its return result rdflib.plugins.sparql.sparql.Query. Maybe that Query class could be subclassed with the various SPARQL query types, and that could be used for type-sensitive calls to rdflib.graph.Graph.query.

@aucampia
Copy link
Member

Maybe that Query class could be subclassed with the various SPARQL query types, and that could be used for type-sensitive calls to rdflib.graph.Graph.query.

This would be a cleaner and better solution.

@ajnelson-nist
Copy link
Contributor Author

I'm now fine with leaving this Issue open, to relate to a PR implementing that.

I can't start that PR today.

@aucampia
Copy link
Member

I'm now fine with leaving this Issue open, to relate to a PR implementing that.

I can't start that PR today.

Sounds good, the title should maybe be adjusted, but the interface should definitely be improved and best to track it.

@ajnelson-nist ajnelson-nist changed the title mypy --strict now confused on RDFLib 6.3.0 SPARQL results SPARQL results starting with RDFLib 6.3.0 induce new inlined type check needs with mypy --strict Mar 17, 2023
ajnelson-nist added a commit to casework/CASE-Utilities-Python that referenced this issue Mar 17, 2023
RDFLib 6.3.0, released today, included an update to type hints for the
`Graph.query` function.  Issue 2283 logs some of the new induced type
hints required to get `mypy --strict` to pass.

This patch is needed to respond to the newly induced type assertion
requirements.  It is intentionally based off of `main` in case a
hotfix `case-utils` release is required.

References:
* RDFLib/rdflib#2283

Signed-off-by: Alex Nelson <[email protected]>
ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Mar 17, 2023
@aucampia aucampia added enhancement New feature or request core Relates to core functionality of RDFLib, i.e. `rdflib.{graph,store,term}` labels Mar 19, 2023
ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Mar 20, 2023
ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Mar 20, 2023
ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Mar 20, 2023
ajnelson-nist added a commit to ajnelson-nist/CASE-Examples-QC that referenced this issue Mar 20, 2023
This patch upgrades all scripts to pass static type review, including
updates due to RDFLib 6.3.1.

References:
* RDFLib/rdflib#2283
* casework/CASE-Utilities-Python#62

Signed-off-by: Alex Nelson <[email protected]>
ajnelson-nist added a commit to casework/CASE-Implementation-GNU-Time that referenced this issue Mar 20, 2023
Due to a revision in RDFLib type signature requirements with 6.3.0, and
an overlapping-in-maintenance-period break in a `pre-commit` package,
this patch batches updates to fix two dependencies' issues at once.

This patch is built off of `main` because this could potentially be part
of a bugfix release.

References:
* RDFLib/rdflib#2283
@tobiasschweizer
Copy link

Hi there,

I am facing a similar issue in my code. I am doing a SELECT query and the possible result types are Union["_TripleType", bool, ResultRow]. However, for a SELECT query only ResultRow may be returned. I solved this by using isinstance for now.

I would strongly support what you proposed in #2283 (comment).

Let me know if I can be of any assistance.

ajnelson-nist added a commit to ajnelson-nist/rdflib that referenced this issue Mar 27, 2023
This patch is known to be an incomplete implementation.  Running
`mypy test/test_typing.py` before this patch raised 0 type errors.
After just adding the two new tests, 3.  After starting the subclasses
of Query, 30.

References:
* RDFLib#2283

Signed-off-by: Alex Nelson <[email protected]>
@ajnelson-nist
Copy link
Contributor Author

Started PR 2320 to address this Issue.

I started in "the middle" of the implementation, enacting the most obvious implementation steps I'd noted above and that @tobiasschweizer also pointed at.

After doing so, I now think the challenge for the type reviewer is the simplest type interface, Graph.query(x: str) can't pass type review stand-alone because ultimately some interpretation would need to happen on the strong. This issue bubbles up to parseQuery not having a good way inside it to assign a Query subclass type at type-review time, e.g.:

my_select_query = "SELECT ..."
my_query_object: SelectQuery = parseQuery(my_select_query)

The string has to be parsed to satisfy that type designation. I know there's some things that can be done with expected string literal values and the type system, but I'm not so sure determining which SPARQL query form is in a string (especially with preceding PREFIX statements) is a fair ask.

My current thinking is that parseQuery needs to be specialized for the most "fluid" type review experience for the user.

my_select_query = "SELECT ..."
my_query_object: SelectQuery = parseSelectQuery(my_select_query)
for result in graph.query(my_query_object): ...

parseSelectQuery would just be effectively a pass-through to parseQuery, but would type-assert that the return value of parseQuery is a SelectQuery.

That currently feels like the least intrusive type assertion framework to me. This feels like a good point to pause for sanity-checking. I'll need to come back to this in a few days.

@aucampia
Copy link
Member

parseSelectQuery would just be effectively a pass-through to parseQuery, but would type-assert that the return value of parseQuery is a SelectQuery.

There sadly are not that many better options, and even if there were, they would not be that distinct in operation.

I think maybe the right function may be prepareQuery instead of parseQuery though:

def prepareQuery(
queryString: str, initNs: Mapping[str, Any] = {}, base: Optional[str] = None
) -> Query:
"""
Parse and translate a SPARQL Query
"""
ret = translateQuery(parseQuery(queryString), base, initNs)
ret._original_args = (queryString, initNs, base)
return ret

We would need to create different query types, though, so one option is to have:

class SelectQuery

    @classmethod
    def prepare(cls,  queryString: str, initNs: Mapping[str, Any] = {}, base: Optional[str] = None) -> SelectQuery:
        ...

That would then still do the type-asserts internally, but really from a user point of view it is not that relevant, all they see is a function that either does not return, or returns a SelectQuery. Whether that is because it can't parse anything else, or because it parses other things and just raises exceptions for them, is essentially the same to users (given we ingore the compute cost of parsing)

The benefit of having a method on each subclass is that it makes for maybe slightly more succinct code, and seems a bit more natural to than to have the different free functions.

@aucampia
Copy link
Member

aucampia commented Mar 27, 2023

I know there's some things that can be done with expected string literal values and the type system, but I'm not so sure determining which SPARQL query form is in a string (especially with preceding PREFIX statements) is a fair ask.

Just to confirm here, you are right, the type system can't really distinguish from the current arguments what is going on with the query.

Our options are:

  • Use separate functions (what you suggest)
  • Add a differentiator that the type system can understand (e.g. query_type: QueryType = QueryType.SELECT or query_type: Literal["select", "construct", ...] = "select")

The differentiator works, but it makes for more complex type signatures also, and more complex documentation. Separate functions are better in that regard, but there are tradeoffs both ways.

I somewhat prefer having class methods on the differentiated subclasses, this still comes down to separate functions, but places them in a bit more natural place from my point of view.

@aucampia
Copy link
Member

aucampia commented Mar 27, 2023

We would need to create different query types, though, so one option is to have:

class SelectQuery

    @classmethod
    def prepare(cls,  queryString: str, initNs: Mapping[str, Any] = {}, base: Optional[str] = None) -> SelectQuery:
        ...

This could actually go to the base class and be typed as returning typing_extensions.Self (or just return cls(...)), which should be picked up as Self.

Then as long as the subclass constructors validate the query (i.e. so long as SelectQueury.__init__ raises an exception if it gets parameters that are not for selecting) then it should be fine.

So, a very rudimentary option is:

class Query:
    def __init__(self, prologue: Prologue, algebra: CompValue):
        ...

    @classmethod
    def prepare(cls, ...):
        result = prepareQuery(...)
        return cls(result.prologue, result.algebra)

class SelectQuery(Query):
    def __init_(self, prologue: Prologue, algebra: CompValue):
        # assert that prologue and/or algebra is for select, if not raise ValueError
        # init super type

It could be more sophisticated behind the scenes, but this should do everything we need and presents the right interface to users.

Users can then do

result = graph.query(SelectQuery.prepare(...))

And should get an appropriately typed result.

The interface of graph.query will still have to be overloaded though, so it is not that different from passing an enum, differentiator, but this is a bit more natural.

ajnelson-nist added a commit to casework/CASE-Implementation-PROV-O that referenced this issue Mar 31, 2023
Some of these issues are due to RDFLib Issue 2283.  Some are due to
symbol reuse.

A follow-on patch will regenerate Make-managed files.

References:
* RDFLib/rdflib#2283

Signed-off-by: Alex Nelson <[email protected]>
ajnelson-nist added a commit to casework/CASE-Implementation-PROV-O that referenced this issue Mar 31, 2023
Some of these issues are due to RDFLib Issue 2283.  Some are due to
symbol reuse.

A follow-on patch will regenerate Make-managed files.

References:
* RDFLib/rdflib#2283

Signed-off-by: Alex Nelson <[email protected]>
ajnelson-nist added a commit to casework/CASE-Implementation-ExifTool that referenced this issue Apr 4, 2023
No effects were observed on Make-managed files.

References:
* RDFLib/rdflib#2283

Signed-off-by: Alex Nelson <[email protected]>
ajnelson-nist added a commit to ajnelson-nist/pySHACL that referenced this issue Aug 28, 2024
This patch handles a type review matter not yet automatically provided
by `Graph.query`, though rdflib Issue 2283 is planned to provide a
mechanism for this.

References:
* RDFLib/rdflib#2283

Signed-off-by: Alex Nelson <[email protected]>
ajnelson-nist added a commit to ajnelson-nist/rdflib that referenced this issue Sep 10, 2024
Potentially, adding an assert() addresses the subclass type review,
per `@classmethod` docs:

> ... If a class method is called for a derived class, the derived class
> object is passed as the implied first argument.

This patch also reverts the first-draft change to `Graph.query`.

References:
* https://docs.python.org/3/library/functions.html#classmethod
* RDFLib#2283 (comment)

Suggested-by: Iwan Aucamp <[email protected]>
Signed-off-by: Alex Nelson <[email protected]>
ajnelson-nist added a commit to ajnelson-nist/rdflib that referenced this issue Sep 10, 2024
References:
* RDFLib#2283

Signed-off-by: Alex Nelson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Relates to core functionality of RDFLib, i.e. `rdflib.{graph,store,term}` enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants