SPARQL results starting with RDFLib 6.3.0 induce new inlined type check needs with `mypy --strict` #2283

ajnelson-nist · 2023-03-17T18:04:55Z

I just hit an unpleasant surprise on running unit tests on a project that worked yesterday, but failed today. The issue I hit was in using mypy --strict on a SPARQL CONSTRUCT query. A similar issue exists for SELECT queries, but I've resolved that one.

With RDFLib 6.2.0, this worked:

    for result in in_graph.query(construct_query):
        assert isinstance(result[0], URIRef)
        assert isinstance(result[1], URIRef)

Those asserts are a pattern I've had to implement to convince mypy --strict about types when I have later code assuming e.g. result[0] (and [1]) is a URIRef.

With 6.3.0 (released today), I get a complaint that I haven't guaranteed result is indexable. This is true for SELECT and CONSTRUCT queries.

error: Value of type "Union[Tuple[Node, Node, Node], bool, ResultRow]" is not indexable  [index]

I can get by with the following extra bits for SELECT queries:

#!/usr/bin/env/python

# ...
from rdflib.query import ResultRow
# ...
for result in graph.query(select_query):
    assert isinstance(result, ResultRow)

That pretty much followed from the mypy complaint I excerpted above, and checking for where ResultRow was defined. But, for CONSTRUCT queries, the Tuple syntax doesn't quite work the same.

#!/usr/bin/env/python

# ...
for result in graph.query(construct_query):
    assert isinstance(result, Tuple[URIRef, URIRef, URIRef])

mypy --strict throws a different error on that:

error: Argument 2 to "isinstance" has incompatible type "object"; expected "_ClassInfo"  [arg-type]

I don't see a unit test demonstrating how to get this to work. I also am not understanding how the return type of rdflib.graph.Graph.query() -> rdflib.query.Result eventually ties to that Union that covers SELECT, CONSTRUCT and ASK return types.

How should the results of CONSTRUCT queries interact with mypy --strict? Is there a type alias I should use instead of that Tuple? I thought there was an alias somewhere in term.py to cover a general triple, but I'm having trouble finding the symbol now.

The text was updated successfully, but these errors were encountered:

aucampia · 2023-03-17T18:29:15Z

What is happening here is:

rdflib.graph.Graph.query returns a rdflib.query.Result:

rdflib/rdflib/graph.py

Lines 1509 to 1518 in 334787b

    
           def query( 
        
               self, 
        
               query_object: Union[str, Query], 
        
               processor: Union[str, query.Processor] = "sparql", 
        
               result: Union[str, Type[query.Result]] = "sparql", 
        
               initNs: Optional[Mapping[str, Any]] = None,  # noqa: N803 
        
               initBindings: Optional[Mapping[str, Identifier]] = None, 
        
               use_store_provided: bool = True, 
        
               **kwargs: Any, 
        
           ) -> query.Result:

rdflib.query.Result's iter method returns a Union of Iterator[Union["_TripleType", bool, ResultRow]].

rdflib/rdflib/query.py

Lines 358 to 360 in 334787b

    
           def __iter__( 
        
               self, 
        
           ) -> Iterator[Union["_TripleType", bool, ResultRow]]:

This Union exists because depending on the type of the query, the result will be different.

The problem is, there is not enough information available to the type checker to determine what kind of result is being returned, so users have to do this themselves.

If you know you passed a CONSTRUCT query, then I think either of the following should eliminate type errors:

for result in graph.query(...):
    assert isinstance(result, tuple)
    ...

for result in cast(Iterable[Tuple[URIRef, URIRef, URIRef]], graph.query(...)):
    ...

I think the reason why assert isinstance(result, Tuple[URIRef, URIRef, URIRef]) causes problems is that you should not use types (i.e. stuff from the typing module) with isinstance and issubclass:

From https://docs.python.org/3/library/typing.html#typing.TypeVar:

In general, isinstance() and issubclass() should not be used with types.

aucampia · 2023-03-17T18:33:12Z

One way to make this less annoying is to add an enum valued query_type parameter to Graph.query, and then use overload to annotate that the result will be different depending on the value of query_type passed in. That way you could do:

for triple in graph.query(..., query_type=QueryType.CONSTRUCT):
    ...

ajnelson-nist · 2023-03-17T18:35:22Z

Thank you @aucampia , your first post addressed all of my questions.

Your second post ... I get what the extra parameter would do, but I feel like that's breaking some obtrusiveness boundary of the type system. I'm not sure, just a gut response that will probably change on another day.

ajnelson-nist · 2023-03-17T18:36:57Z

I'm satisfied with this thread being closed, unless you think there's a graceful way to handle it.

My best guess would be supplanting the first argument to query with an optional exactly one of three keyword arguments, select_query: str, construct_query: str, ask_query: str, and maybe each of those could be union'd with a subclass of PreparedQuery.

ajnelson-nist · 2023-03-17T18:38:20Z

Eh, pardon:

s/graceful way to handle it/more graceful way to handle automatic type hinting/

s/best guess/reflexive guess/

aucampia · 2023-03-17T18:39:27Z

The graph.query(..., query_type=QueryType.CONSTRUCT) is not a super great solution, if we did something like that we would then do a check inside query() and raise a value error if the query is not of the requested type before executing, but still it is not nice, and I agree it does not feel right.

But definitely the current Graph.query() interface is very annoying with mypy.

I will close this issue, but we should think of ways to make the interface friendlier to use with type checking than it is now.

ajnelson-nist · 2023-03-17T19:08:50Z

I caught another typo in my split-out suggestion: I'd confused PreparedQuery with rdflib.plugins.sparql.processor.prepareQuery, and its return result rdflib.plugins.sparql.sparql.Query. Maybe that Query class could be subclassed with the various SPARQL query types, and that could be used for type-sensitive calls to rdflib.graph.Graph.query.

aucampia · 2023-03-17T19:18:46Z

Maybe that Query class could be subclassed with the various SPARQL query types, and that could be used for type-sensitive calls to rdflib.graph.Graph.query.

This would be a cleaner and better solution.

ajnelson-nist · 2023-03-17T19:22:23Z

I'm now fine with leaving this Issue open, to relate to a PR implementing that.

I can't start that PR today.

aucampia · 2023-03-17T19:24:19Z

I'm now fine with leaving this Issue open, to relate to a PR implementing that.

I can't start that PR today.

Sounds good, the title should maybe be adjusted, but the interface should definitely be improved and best to track it.

RDFLib 6.3.0, released today, included an update to type hints for the `Graph.query` function. Issue 2283 logs some of the new induced type hints required to get `mypy --strict` to pass. This patch is needed to respond to the newly induced type assertion requirements. It is intentionally based off of `main` in case a hotfix `case-utils` release is required. References: * RDFLib/rdflib#2283 Signed-off-by: Alex Nelson <[email protected]>

References: * RDFLib/rdflib#2283

References: * RDFLib/rdflib#2283 Signed-off-by: Alex Nelson <[email protected]>

This patch upgrades all scripts to pass static type review, including updates due to RDFLib 6.3.1. References: * RDFLib/rdflib#2283 * casework/CASE-Utilities-Python#62 Signed-off-by: Alex Nelson <[email protected]>

Due to a revision in RDFLib type signature requirements with 6.3.0, and an overlapping-in-maintenance-period break in a `pre-commit` package, this patch batches updates to fix two dependencies' issues at once. This patch is built off of `main` because this could potentially be part of a bugfix release. References: * RDFLib/rdflib#2283

tobiasschweizer · 2023-03-27T06:45:49Z

Hi there,

I am facing a similar issue in my code. I am doing a SELECT query and the possible result types are Union["_TripleType", bool, ResultRow]. However, for a SELECT query only ResultRow may be returned. I solved this by using isinstance for now.

I would strongly support what you proposed in #2283 (comment).

Let me know if I can be of any assistance.

This patch is known to be an incomplete implementation. Running `mypy test/test_typing.py` before this patch raised 0 type errors. After just adding the two new tests, 3. After starting the subclasses of Query, 30. References: * RDFLib#2283 Signed-off-by: Alex Nelson <[email protected]>

ajnelson-nist · 2023-03-27T15:37:56Z

Started PR 2320 to address this Issue.

I started in "the middle" of the implementation, enacting the most obvious implementation steps I'd noted above and that @tobiasschweizer also pointed at.

After doing so, I now think the challenge for the type reviewer is the simplest type interface, Graph.query(x: str) can't pass type review stand-alone because ultimately some interpretation would need to happen on the strong. This issue bubbles up to parseQuery not having a good way inside it to assign a Query subclass type at type-review time, e.g.:

my_select_query = "SELECT ..."
my_query_object: SelectQuery = parseQuery(my_select_query)

The string has to be parsed to satisfy that type designation. I know there's some things that can be done with expected string literal values and the type system, but I'm not so sure determining which SPARQL query form is in a string (especially with preceding PREFIX statements) is a fair ask.

My current thinking is that parseQuery needs to be specialized for the most "fluid" type review experience for the user.

my_select_query = "SELECT ..."
my_query_object: SelectQuery = parseSelectQuery(my_select_query)
for result in graph.query(my_query_object): ...

parseSelectQuery would just be effectively a pass-through to parseQuery, but would type-assert that the return value of parseQuery is a SelectQuery.

That currently feels like the least intrusive type assertion framework to me. This feels like a good point to pause for sanity-checking. I'll need to come back to this in a few days.

aucampia · 2023-03-27T21:45:07Z

parseSelectQuery would just be effectively a pass-through to parseQuery, but would type-assert that the return value of parseQuery is a SelectQuery.

There sadly are not that many better options, and even if there were, they would not be that distinct in operation.

I think maybe the right function may be prepareQuery instead of parseQuery though:

rdflib/rdflib/plugins/sparql/processor.py

Lines 21 to 29 in 0d07f9b

    
           def prepareQuery( 
        
               queryString: str, initNs: Mapping[str, Any] = {}, base: Optional[str] = None 
        
           ) -> Query: 
        
               """ 
        
               Parse and translate a SPARQL Query 
        
               """ 
        
               ret = translateQuery(parseQuery(queryString), base, initNs) 
        
               ret._original_args = (queryString, initNs, base) 
        
               return ret

We would need to create different query types, though, so one option is to have:

class SelectQuery

    @classmethod
    def prepare(cls,  queryString: str, initNs: Mapping[str, Any] = {}, base: Optional[str] = None) -> SelectQuery:
        ...

That would then still do the type-asserts internally, but really from a user point of view it is not that relevant, all they see is a function that either does not return, or returns a SelectQuery. Whether that is because it can't parse anything else, or because it parses other things and just raises exceptions for them, is essentially the same to users (given we ingore the compute cost of parsing)

The benefit of having a method on each subclass is that it makes for maybe slightly more succinct code, and seems a bit more natural to than to have the different free functions.

aucampia · 2023-03-27T21:49:04Z

I know there's some things that can be done with expected string literal values and the type system, but I'm not so sure determining which SPARQL query form is in a string (especially with preceding PREFIX statements) is a fair ask.

Just to confirm here, you are right, the type system can't really distinguish from the current arguments what is going on with the query.

Our options are:

Use separate functions (what you suggest)
Add a differentiator that the type system can understand (e.g. query_type: QueryType = QueryType.SELECT or query_type: Literal["select", "construct", ...] = "select")

The differentiator works, but it makes for more complex type signatures also, and more complex documentation. Separate functions are better in that regard, but there are tradeoffs both ways.

I somewhat prefer having class methods on the differentiated subclasses, this still comes down to separate functions, but places them in a bit more natural place from my point of view.

aucampia · 2023-03-27T22:08:36Z

We would need to create different query types, though, so one option is to have:

class SelectQuery

    @classmethod
    def prepare(cls,  queryString: str, initNs: Mapping[str, Any] = {}, base: Optional[str] = None) -> SelectQuery:
        ...

This could actually go to the base class and be typed as returning typing_extensions.Self (or just return cls(...)), which should be picked up as Self.

Then as long as the subclass constructors validate the query (i.e. so long as SelectQueury.__init__ raises an exception if it gets parameters that are not for selecting) then it should be fine.

So, a very rudimentary option is:

class Query:
    def __init__(self, prologue: Prologue, algebra: CompValue):
        ...

    @classmethod
    def prepare(cls, ...):
        result = prepareQuery(...)
        return cls(result.prologue, result.algebra)

class SelectQuery(Query):
    def __init_(self, prologue: Prologue, algebra: CompValue):
        # assert that prologue and/or algebra is for select, if not raise ValueError
        # init super type

It could be more sophisticated behind the scenes, but this should do everything we need and presents the right interface to users.

Users can then do

result = graph.query(SelectQuery.prepare(...))

And should get an appropriately typed result.

The interface of graph.query will still have to be overloaded though, so it is not that different from passing an enum, differentiator, but this is a bit more natural.

Some of these issues are due to RDFLib Issue 2283. Some are due to symbol reuse. A follow-on patch will regenerate Make-managed files. References: * RDFLib/rdflib#2283 Signed-off-by: Alex Nelson <[email protected]>

No effects were observed on Make-managed files. References: * RDFLib/rdflib#2283 Signed-off-by: Alex Nelson <[email protected]>

This patch handles a type review matter not yet automatically provided by `Graph.query`, though rdflib Issue 2283 is planned to provide a mechanism for this. References: * RDFLib/rdflib#2283 Signed-off-by: Alex Nelson <[email protected]>

Potentially, adding an assert() addresses the subclass type review, per `@classmethod` docs: > ... If a class method is called for a derived class, the derived class > object is passed as the implied first argument. This patch also reverts the first-draft change to `Graph.query`. References: * https://docs.python.org/3/library/functions.html#classmethod * RDFLib#2283 (comment) Suggested-by: Iwan Aucamp <[email protected]> Signed-off-by: Alex Nelson <[email protected]>

References: * RDFLib#2283 Signed-off-by: Alex Nelson <[email protected]>

ajnelson-nist changed the title ~~mypy --strict now confused on RDFLib 6.3.0 SPARQL results~~ SPARQL results starting with RDFLib 6.3.0 induce new inlined type check needs with mypy --strict Mar 17, 2023

ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Mar 17, 2023

Update Python type signatures for compatibility with RDFLib 6.3.0

938ceb9

References: * RDFLib/rdflib#2283

aucampia added enhancement New feature or request core Relates to core functionality of RDFLib, i.e. `rdflib.{graph,store,term}` labels Mar 19, 2023

ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Mar 20, 2023

Update SPARQL query result strict typing

7679d19

References: * RDFLib/rdflib#2283 Signed-off-by: Alex Nelson <[email protected]>

ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Mar 20, 2023

Update SPARQL query result strict typing

8634bb7

References: * RDFLib/rdflib#2283 Signed-off-by: Alex Nelson <[email protected]>

ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Mar 20, 2023

Update SPARQL query result strict typing

12cebcc

References: * RDFLib/rdflib#2283 Signed-off-by: Alex Nelson <[email protected]>

ajnelson-nist added a commit to casework/CASE-Implementation-ExifTool that referenced this issue Apr 4, 2023

Address type issues

e930faa

No effects were observed on Make-managed files. References: * RDFLib/rdflib#2283 Signed-off-by: Alex Nelson <[email protected]>

ajnelson-nist mentioned this issue Aug 28, 2024

Give firm type hint on SELECT query result RDFLib/pySHACL#243

Merged

ajnelson-nist added a commit to ajnelson-nist/rdflib that referenced this issue Sep 10, 2024

Draft remaining tests

a22213c

References: * RDFLib#2283 Signed-off-by: Alex Nelson <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SPARQL results starting with RDFLib 6.3.0 induce new inlined type check needs with `mypy --strict` #2283

SPARQL results starting with RDFLib 6.3.0 induce new inlined type check needs with `mypy --strict` #2283

ajnelson-nist commented Mar 17, 2023

aucampia commented Mar 17, 2023 •

edited

Loading

aucampia commented Mar 17, 2023

ajnelson-nist commented Mar 17, 2023

ajnelson-nist commented Mar 17, 2023

ajnelson-nist commented Mar 17, 2023

aucampia commented Mar 17, 2023

ajnelson-nist commented Mar 17, 2023

aucampia commented Mar 17, 2023

ajnelson-nist commented Mar 17, 2023

aucampia commented Mar 17, 2023

tobiasschweizer commented Mar 27, 2023

ajnelson-nist commented Mar 27, 2023

aucampia commented Mar 27, 2023

aucampia commented Mar 27, 2023 •

edited

Loading

aucampia commented Mar 27, 2023 •

edited

Loading

SPARQL results starting with RDFLib 6.3.0 induce new inlined type check needs with mypy --strict #2283

SPARQL results starting with RDFLib 6.3.0 induce new inlined type check needs with mypy --strict #2283

Comments

ajnelson-nist commented Mar 17, 2023

aucampia commented Mar 17, 2023 • edited Loading

aucampia commented Mar 17, 2023

ajnelson-nist commented Mar 17, 2023

ajnelson-nist commented Mar 17, 2023

ajnelson-nist commented Mar 17, 2023

aucampia commented Mar 17, 2023

ajnelson-nist commented Mar 17, 2023

aucampia commented Mar 17, 2023

ajnelson-nist commented Mar 17, 2023

aucampia commented Mar 17, 2023

tobiasschweizer commented Mar 27, 2023

ajnelson-nist commented Mar 27, 2023

aucampia commented Mar 27, 2023

aucampia commented Mar 27, 2023 • edited Loading

aucampia commented Mar 27, 2023 • edited Loading

SPARQL results starting with RDFLib 6.3.0 induce new inlined type check needs with `mypy --strict` #2283

SPARQL results starting with RDFLib 6.3.0 induce new inlined type check needs with `mypy --strict` #2283

aucampia commented Mar 17, 2023 •

edited

Loading

aucampia commented Mar 27, 2023 •

edited

Loading

aucampia commented Mar 27, 2023 •

edited

Loading