fix return of the context of a given identifier #845

GreatYYX · 2018-08-31T18:29:22Z

The query with GRAPH doesn't work properly if graph is added from rdflib.graph.Dataset.add_graph(). See the following example:

from rdflib.graph import Graph, Dataset

data = '''
<http://data.yyx.me/jack> <http://onto.yyx.me/work_for> <http://data.yyx.me/company_a> .
<http://data.yyx.me/david> <http://onto.yyx.me/work_for> <http://data.yyx.me/company_b> .
'''

ds = Dataset()
g = Graph(identifier='subgraph')
g.parse(data=data, format='n3')
ds.add_graph(g)
res = ds.query('''
SELECT ?s ?p ?o
WHERE {
    GRAPH <subgraph> {
        ?s ?p ?o
    }
}
''')
for r in res:
    print(r)

Nothing returns.

But if you replace query (replace named graph subgraph to a variable) to

SELECT ?g ?s ?p ?o
WHERE {
    GRAPH ?g {
        ?s ?p ?o
    }
}

you can get correct results:

(rdflib.term.URIRef('subgraph'), rdflib.term.URIRef('http://data.yyx.me/david'), rdflib.term.URIRef('http://onto.yyx.me/work_for'), rdflib.term.URIRef('http://data.yyx.me/company_b'))
(rdflib.term.URIRef('subgraph'), rdflib.term.URIRef('http://data.yyx.me/jack'), rdflib.term.URIRef('http://onto.yyx.me/work_for'), rdflib.term.URIRef('http://data.yyx.me/company_a'))

This issue is caused by the implementation of rdflib.graph.ConjunctiveGraph.get_context().

coveralls · 2018-08-31T18:43:43Z

Coverage increased (+0.06%) to 75.631% when pulling ab0ad14 on GreatYYX:master into 827eabd on RDFLib:master.

gromgull · 2018-10-09T14:38:40Z

This is re: #167

gromgull · 2018-10-09T14:41:00Z

It's been a while, but I prefer this: #409 - I think 🤔

GreatYYX · 2018-10-09T23:45:26Z

@gromgull Wow, 4 years ago 🤦‍♂️ I personally agree with your solution, two levels of APIs, one for store one for graph (Why it's not merged? Haven't finished yet?). Even if changes need from plugin developers, it's still better then affecting end-users.

But I really confuse of current dataset? My understanding is, dataset is something that can contain multiple graphs. API level for dataset is that if call dataset.add_graph, a graph attaches to it (or merge graphs if already there); if call dataset.add_quad, the triple finally stores into corresponding graph (create one if not exists). If graph URI is not specified (or by calling dataset.add_triple), default graph in dataset is used. But the implementation right now does not make sense.

If I create graph first and add data to it then attach it to dataset:

ds = Dataset()
g = Graph(identifier='subgraph')
g.add((URIRef('http://yyx.me/jack'),URIRef('http://onto.yyx.me/work_for'),URIRef('http://data.yyx.me/company_a')))
ds.add_graph(g)

c = ds.store._IOMemory__all_contexts.pop()
print('in sub graph:', id(c.store), c.store._IOMemory__obj2int)
print('in dataset:', id(ds.store), ds.store._IOMemory__obj2int)

I printed out _IOMemory__obj2int because I think this is a part of triple indices in IOMemory store (correct me if I'm wrong).

The output:

in sub graph: 4567309560 {None: None, rdflib.term.URIRef('http://yyx.me/jack'): -1299107138, rdflib.term.URIRef('http://onto.yyx.me/work_for'): -1562118502, rdflib.term.URIRef('http://data.yyx.me/company_a'): -192532751, <Graph identifier=subgraph (<class 'rdflib.graph.Graph'>)>: -1988344489}
in dataset: 4541685488 {None: None}

Triple stored in graph, which is correct.

But if I change the code to add quad to it:

ds = Dataset()
ds.add((URIRef('http://yyx.me/jack'),URIRef('http://onto.yyx.me/work_for'),URIRef('http://data.yyx.me/company_a'),URIRef('subgraph')))

c = ds.store._IOMemory__all_contexts.pop()
print('in sub graph:', id(c.store), c.store._IOMemory__obj2int)
print('in dataset:', id(ds.store), ds.store._IOMemory__obj2int)

The output:

in sub graph: 4375498480 {None: None, rdflib.term.URIRef('http://yyx.me/jack'): -424282055, rdflib.term.URIRef('http://onto.yyx.me/work_for'): -1479980341, rdflib.term.URIRef('http://data.yyx.me/company_a'): -721699414, <Graph identifier=subgraph (<class 'rdflib.graph.Graph'>)>: 1870356057}
in dataset: 4375498480 {None: None, rdflib.term.URIRef('http://yyx.me/jack'): -424282055, rdflib.term.URIRef('http://onto.yyx.me/work_for'): -1479980341, rdflib.term.URIRef('http://data.yyx.me/company_a'): -721699414, <Graph identifier=subgraph (<class 'rdflib.graph.Graph'>)>: 1870356057}

dataset.store refers to graph.store. Why?

Now if I concatenate above snippets (changed a little bit in URIs but same graph URI):

ds = Dataset()

g = Graph(identifier='subgraph')
g.add((URIRef('http://yyx.me/jack'),URIRef('http://onto.yyx.me/work_for'),URIRef('http://data.yyx.me/company_a')))
ds.add_graph(g)

ds.add((URIRef('http://yyx.me/jack2'),URIRef('http://onto.yyx.me/work_for2'),URIRef('http://data.yyx.me/company_a2'),URIRef('subgraph')))

c = ds.store._IOMemory__all_contexts.pop()
print('in sub graph:', id(c.store), c.store._IOMemory__obj2int)
print('in dataset:', id(ds.store), ds.store._IOMemory__obj2int)

The output:

in sub graph: 4337822968 {None: None, rdflib.term.URIRef('http://yyx.me/jack'): -910005344, rdflib.term.URIRef('http://onto.yyx.me/work_for'): 770331380, rdflib.term.URIRef('http://data.yyx.me/company_a'): -335385546, <Graph identifier=subgraph (<class 'rdflib.graph.Graph'>)>: -559991583}
in dataset: 4312198896 {None: None, rdflib.term.URIRef('http://yyx.me/jack2'): 1534701893, rdflib.term.URIRef('http://onto.yyx.me/work_for2'): -1673133418, rdflib.term.URIRef('http://data.yyx.me/company_a2'): 1155685328, <Graph identifier=subgraph (<class 'rdflib.graph.Graph'>)>: 1022967075}

:-( I think that's why get_context is so hard to deal with.

If store handles graph store (by graph id), graph handles triples and dataset handles multiple graphs (default graph can store in dataset.store since dataset a concrete graph or create a graph named default) or quads, data finally stores in corresponding graphs, then get_context is really clear and simple.

nicholascar · 2020-03-28T07:21:09Z

Hi @GreatYYX, are you still interested in solving this issue? We area just about to release 5.0.0 which won't include any changes like this or PR #409 however we will then move to working on 6.0.0, due in perhaps a couple of months, and that release will include a fix to graph ID issues in rdflib, hopefully comprehensively (i.e. for Graph, ConjunctiveGraph, Store & Dataset).

GreatYYX · 2020-04-15T05:05:32Z

Hi @nicholascar , it's been a while, but I'm happy to hear that this project is still being pushed forward. LMK if there is anything i can help.

white-gecko · 2020-05-01T22:36:01Z

Do we want to close this in favor of #409 or do we want to accept it for now?

nicholascar · 2020-05-03T02:28:04Z

Close/Open Travis

nicholascar · 2020-05-03T02:30:34Z

It looks like the proponent prefers #409 too, but raised a number of questions. I propose leaving this open but with the expectation that we carry #409 forward after address the questions in this PR, which will likely mean changes/additions to #409.

nicholascar · 2020-07-30T04:44:32Z

This issue is all about what type of object is needed to refer to a graph. The object in the example is a string but you need to use a URIRef.

Proposed solution for 6.0.0: use type hints (since Python 3.6 will be the minimum supported version) indicating that a URIRef is needed. We plan to deprecate the use of Graph.

GreatYYX added 2 commits August 31, 2018 11:12

fix return of the context of a given identifier

c9958d3

make sure context is type Graph

c828029

nicholascar added this to the rdflib 6.0.0 milestone Mar 28, 2020

nicholascar added the cleanup label Mar 28, 2020

nicholascar mentioned this pull request Mar 28, 2020

Move Store API to work with identifiers, not graphs #409

Closed

white-gecko mentioned this pull request Apr 2, 2020

Update Lexical-to-value mapping for xsd:boolean #913

Closed

Merge branch 'master' into HEAD

ab0ad14

white-gecko requested review from ashleysommer and nicholascar and removed request for ashleysommer and nicholascar May 1, 2020 22:32

nicholascar closed this May 3, 2020

nicholascar reopened this May 3, 2020

ghost added the id-as-cntxt tracking related issues label Feb 12, 2022

ghost added the 7.0 Changes planned for version 7 label Jun 8, 2022

white-gecko modified the milestones: rdflib 6.x.x, 2022 June release, 2022 July release (6.2.0), 2022 October release (7.0.0) Jun 20, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix return of the context of a given identifier #845

fix return of the context of a given identifier #845

GreatYYX commented Aug 31, 2018

coveralls commented Aug 31, 2018 •

edited

Loading

gromgull commented Oct 9, 2018

gromgull commented Oct 9, 2018

GreatYYX commented Oct 9, 2018 •

edited

Loading

nicholascar commented Mar 28, 2020

GreatYYX commented Apr 15, 2020

white-gecko commented May 1, 2020

nicholascar commented May 3, 2020

nicholascar commented May 3, 2020 •

edited

Loading

nicholascar commented Jul 30, 2020

fix return of the context of a given identifier #845

Are you sure you want to change the base?

fix return of the context of a given identifier #845

Conversation

GreatYYX commented Aug 31, 2018

coveralls commented Aug 31, 2018 • edited Loading

gromgull commented Oct 9, 2018

gromgull commented Oct 9, 2018

GreatYYX commented Oct 9, 2018 • edited Loading

nicholascar commented Mar 28, 2020

GreatYYX commented Apr 15, 2020

white-gecko commented May 1, 2020

nicholascar commented May 3, 2020

nicholascar commented May 3, 2020 • edited Loading

nicholascar commented Jul 30, 2020

coveralls commented Aug 31, 2018 •

edited

Loading

GreatYYX commented Oct 9, 2018 •

edited

Loading

nicholascar commented May 3, 2020 •

edited

Loading