Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix return of the context of a given identifier #845

Open
wants to merge 3 commits into
base: 8.x
Choose a base branch
from

Conversation

GreatYYX
Copy link

The query with GRAPH doesn't work properly if graph is added from rdflib.graph.Dataset.add_graph(). See the following example:

from rdflib.graph import Graph, Dataset

data = '''
<http://data.yyx.me/jack> <http://onto.yyx.me/work_for> <http://data.yyx.me/company_a> .
<http://data.yyx.me/david> <http://onto.yyx.me/work_for> <http://data.yyx.me/company_b> .
'''

ds = Dataset()
g = Graph(identifier='subgraph')
g.parse(data=data, format='n3')
ds.add_graph(g)
res = ds.query('''
SELECT ?s ?p ?o
WHERE {
    GRAPH <subgraph> {
        ?s ?p ?o
    }
}
''')
for r in res:
    print(r)

Nothing returns.

But if you replace query (replace named graph subgraph to a variable) to

SELECT ?g ?s ?p ?o
WHERE {
    GRAPH ?g {
        ?s ?p ?o
    }
}

you can get correct results:

(rdflib.term.URIRef('subgraph'), rdflib.term.URIRef('http://data.yyx.me/david'), rdflib.term.URIRef('http://onto.yyx.me/work_for'), rdflib.term.URIRef('http://data.yyx.me/company_b'))
(rdflib.term.URIRef('subgraph'), rdflib.term.URIRef('http://data.yyx.me/jack'), rdflib.term.URIRef('http://onto.yyx.me/work_for'), rdflib.term.URIRef('http://data.yyx.me/company_a'))

This issue is caused by the implementation of rdflib.graph.ConjunctiveGraph.get_context().

@coveralls
Copy link

coveralls commented Aug 31, 2018

Coverage Status

Coverage increased (+0.06%) to 75.631% when pulling ab0ad14 on GreatYYX:master into 827eabd on RDFLib:master.

@gromgull
Copy link
Member

gromgull commented Oct 9, 2018

This is re: #167

@gromgull
Copy link
Member

gromgull commented Oct 9, 2018

It's been a while, but I prefer this: #409 - I think 🤔

@GreatYYX
Copy link
Author

GreatYYX commented Oct 9, 2018

@gromgull Wow, 4 years ago 🤦‍♂️ I personally agree with your solution, two levels of APIs, one for store one for graph (Why it's not merged? Haven't finished yet?). Even if changes need from plugin developers, it's still better then affecting end-users.

But I really confuse of current dataset? My understanding is, dataset is something that can contain multiple graphs. API level for dataset is that if call dataset.add_graph, a graph attaches to it (or merge graphs if already there); if call dataset.add_quad, the triple finally stores into corresponding graph (create one if not exists). If graph URI is not specified (or by calling dataset.add_triple), default graph in dataset is used. But the implementation right now does not make sense.

If I create graph first and add data to it then attach it to dataset:

ds = Dataset()
g = Graph(identifier='subgraph')
g.add((URIRef('http://yyx.me/jack'),URIRef('http://onto.yyx.me/work_for'),URIRef('http://data.yyx.me/company_a')))
ds.add_graph(g)

c = ds.store._IOMemory__all_contexts.pop()
print('in sub graph:', id(c.store), c.store._IOMemory__obj2int)
print('in dataset:', id(ds.store), ds.store._IOMemory__obj2int)

I printed out _IOMemory__obj2int because I think this is a part of triple indices in IOMemory store (correct me if I'm wrong).

The output:

in sub graph: 4567309560 {None: None, rdflib.term.URIRef('http://yyx.me/jack'): -1299107138, rdflib.term.URIRef('http://onto.yyx.me/work_for'): -1562118502, rdflib.term.URIRef('http://data.yyx.me/company_a'): -192532751, <Graph identifier=subgraph (<class 'rdflib.graph.Graph'>)>: -1988344489}
in dataset: 4541685488 {None: None}

Triple stored in graph, which is correct.

But if I change the code to add quad to it:

ds = Dataset()
ds.add((URIRef('http://yyx.me/jack'),URIRef('http://onto.yyx.me/work_for'),URIRef('http://data.yyx.me/company_a'),URIRef('subgraph')))

c = ds.store._IOMemory__all_contexts.pop()
print('in sub graph:', id(c.store), c.store._IOMemory__obj2int)
print('in dataset:', id(ds.store), ds.store._IOMemory__obj2int)

The output:

in sub graph: 4375498480 {None: None, rdflib.term.URIRef('http://yyx.me/jack'): -424282055, rdflib.term.URIRef('http://onto.yyx.me/work_for'): -1479980341, rdflib.term.URIRef('http://data.yyx.me/company_a'): -721699414, <Graph identifier=subgraph (<class 'rdflib.graph.Graph'>)>: 1870356057}
in dataset: 4375498480 {None: None, rdflib.term.URIRef('http://yyx.me/jack'): -424282055, rdflib.term.URIRef('http://onto.yyx.me/work_for'): -1479980341, rdflib.term.URIRef('http://data.yyx.me/company_a'): -721699414, <Graph identifier=subgraph (<class 'rdflib.graph.Graph'>)>: 1870356057}

dataset.store refers to graph.store. Why?

Now if I concatenate above snippets (changed a little bit in URIs but same graph URI):

ds = Dataset()

g = Graph(identifier='subgraph')
g.add((URIRef('http://yyx.me/jack'),URIRef('http://onto.yyx.me/work_for'),URIRef('http://data.yyx.me/company_a')))
ds.add_graph(g)

ds.add((URIRef('http://yyx.me/jack2'),URIRef('http://onto.yyx.me/work_for2'),URIRef('http://data.yyx.me/company_a2'),URIRef('subgraph')))

c = ds.store._IOMemory__all_contexts.pop()
print('in sub graph:', id(c.store), c.store._IOMemory__obj2int)
print('in dataset:', id(ds.store), ds.store._IOMemory__obj2int)

The output:

in sub graph: 4337822968 {None: None, rdflib.term.URIRef('http://yyx.me/jack'): -910005344, rdflib.term.URIRef('http://onto.yyx.me/work_for'): 770331380, rdflib.term.URIRef('http://data.yyx.me/company_a'): -335385546, <Graph identifier=subgraph (<class 'rdflib.graph.Graph'>)>: -559991583}
in dataset: 4312198896 {None: None, rdflib.term.URIRef('http://yyx.me/jack2'): 1534701893, rdflib.term.URIRef('http://onto.yyx.me/work_for2'): -1673133418, rdflib.term.URIRef('http://data.yyx.me/company_a2'): 1155685328, <Graph identifier=subgraph (<class 'rdflib.graph.Graph'>)>: 1022967075}

:-( I think that's why get_context is so hard to deal with.

If store handles graph store (by graph id), graph handles triples and dataset handles multiple graphs (default graph can store in dataset.store since dataset a concrete graph or create a graph named default) or quads, data finally stores in corresponding graphs, then get_context is really clear and simple.

@nicholascar
Copy link
Member

Hi @GreatYYX, are you still interested in solving this issue? We area just about to release 5.0.0 which won't include any changes like this or PR #409 however we will then move to working on 6.0.0, due in perhaps a couple of months, and that release will include a fix to graph ID issues in rdflib, hopefully comprehensively (i.e. for Graph, ConjunctiveGraph, Store & Dataset).

@GreatYYX
Copy link
Author

Hi @nicholascar , it's been a while, but I'm happy to hear that this project is still being pushed forward. LMK if there is anything i can help.

@white-gecko white-gecko requested review from ashleysommer and nicholascar and removed request for ashleysommer and nicholascar May 1, 2020 22:32
@white-gecko
Copy link
Member

Do we want to close this in favor of #409 or do we want to accept it for now?

@nicholascar
Copy link
Member

Close/Open Travis

@nicholascar nicholascar closed this May 3, 2020
@nicholascar nicholascar reopened this May 3, 2020
@nicholascar
Copy link
Member

nicholascar commented May 3, 2020

It looks like the proponent prefers #409 too, but raised a number of questions. I propose leaving this open but with the expectation that we carry #409 forward after address the questions in this PR, which will likely mean changes/additions to #409.

@nicholascar
Copy link
Member

This issue is all about what type of object is needed to refer to a graph. The object in the example is a string but you need to use a URIRef.

Proposed solution for 6.0.0: use type hints (since Python 3.6 will be the minimum supported version) indicating that a URIRef is needed. We plan to deprecate the use of Graph.

@ghost ghost added the id-as-cntxt tracking related issues label Feb 12, 2022
@ghost ghost added the 7.0 Changes planned for version 7 label Jun 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
7.0 Changes planned for version 7 cleanup id-as-cntxt tracking related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants