timeout parameter is not always respected by fuseki server #3044
Comments
@mpagni12 -- there isn't enough to recreate the situation from the description. Could you provide the Fuseki server log file for the relevant first requests (timeouts) and the later ones (non-timeouts)? If that can be "verbose" (server "… What is the storage? TDB2? Any inference?
It does seem necessary to know what the queries are. Note: if the server has just started, first requests are slower. Java starts optimizing code and the file cache becomes more active over time.
I quite recently fixed several cases where timeouts were not working properly. Without your query load it's hard to tell whether this is related to something I did not fix, or perhaps to something that I accidentally broke. Are update requests involved?
Here is the log: fuseki.log.gz. Storage is TDB with no inference.
I am using 5.3.0. Should I downgrade my version for testing?
That would be very helpful in order to determine whether the issue has existed for a while or was introduced recently.
I rebuilt a database and reran the same test using 5.0.0: all lengthy queries time out after 10 s :-) This strongly suggests that the bug was introduced later.
I looked for … I didn't find any 200s over 10 seconds. I did find one outlier … The timeout is tested every so often - it isn't an interrupt - so the query is in a busy CPU loop (probably fixable; the caveat is that polling the timeout too often is a bit expensive).
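To illustrate the point about the timeout being polled rather than delivered as an interrupt, here is a minimal Python sketch (not Jena code). The function name `run_with_polled_cancel` and the `check_every` parameter are hypothetical; the idea is only that a cancel signal set in the middle of a busy loop is not noticed until the next poll.

```python
import threading


def run_with_polled_cancel(items, cancel, check_every, on_item=None):
    """Process items, polling the cancel flag only every `check_every` items.

    Hypothetical sketch: returns how many items were processed before the
    loop noticed the cancel signal (or ran out of items).
    """
    processed = 0
    for item in items:
        # The flag is only looked at on polling boundaries, so a signal
        # raised between two polls is noticed late.
        if processed % check_every == 0 and cancel.is_set():
            break
        if on_item is not None:
            on_item(item)
        processed += 1
    return processed
```

With a large `check_every` the loop keeps working after the signal has been raised; with `check_every=1` it stops promptly but pays for a check on every item, which is the trade-off mentioned above.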
So I can reproduce timeouts being ignored on property paths, such as this one on a DBpedia dataset (lots of labels):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?x { ?x rdfs:label/rdfs:label ?z }
time curl http://localhost:3030/ds --data-urlencode 'query=PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> SELECT ?x { ?x rdfs:label/rdfs:label ?z }' --data-urlencode 'timeout=1'
I have not analyzed property paths further. I can also reproduce a similar (perhaps the same) problem with this query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?x {
?x rdfs:label ?o
{ ?x rdfs:label ?z }
UNION
{ BIND("hi" AS ?s) }
}
The problem is synchronization in conjunction with an eagerly executing iterator. The timeout fires and tries to abort, but doesn't get the lock: jena/jena-arq/src/main/java/org/apache/jena/sparql/exec/QueryExecDataset.java Lines 364 to 367 in 6a86b5d
The lock is acquired here: jena/jena-arq/src/main/java/org/apache/jena/sparql/exec/QueryExecDataset.java Lines 438 to 445 in 6a86b5d
And while the lock is held: jena/jena-arq/src/main/java/org/apache/jena/sparql/exec/QueryExecDataset.java Lines 472 to 475 in 6a86b5d
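The locking interaction described above can be sketched in Python as follows. This is not the Jena code: `QueryExecSketch`, `abort` and `start` are hypothetical names, and the abort here gives up after a short wait so that the demo terminates, whereas a blocking abort would simply wait until the eager work is done.

```python
import threading


class QueryExecSketch:
    """Hypothetical stand-in for a query execution guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.cancelled = False

    def abort(self, wait_seconds):
        # The timeout callback needs the lock to deliver the cancel signal.
        if not self._lock.acquire(timeout=wait_seconds):
            return False  # lock still held by start(): abort not delivered
        try:
            self.cancelled = True
            return True
        finally:
            self._lock.release()

    def start(self, build_iterator):
        # The lock is held for the whole (possibly eager) iterator build.
        with self._lock:
            return build_iterator()


def demo():
    qe = QueryExecSketch()
    outcome = {}

    def eager_build():
        # Simulate the timeout firing on another thread mid-build.
        t = threading.Thread(target=lambda: outcome.update(during=qe.abort(0.05)))
        t.start()
        t.join()
        return "iterator"

    qe.start(eager_build)
    outcome["after"] = qe.abort(0.05)
    return outcome
```

In this sketch the abort attempted during the eager build fails to get the lock, and only the attempt made after `start` returns succeeds, which matches the observation that the timeout is delayed rather than lost.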
A sequence path should be expanded to triple patterns, so
{ ?x rdfs:label / rdfs:label ?z }
becomes
{
?x rdfs:label ??P1 .
??P1 rdfs:label ?z .
}
Looking at the log, the timeout isn't getting forgotten - it is delayed (presumably by being busy in some tight loop). The execution plan for the …
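The sequence path expansion mentioned at the start of this comment can be sketched in Python as below. This is only an illustration of the rewrite, not how ARQ implements it; the function name `expand_sequence_path` is hypothetical, and the `??P1` naming simply follows the example above.

```python
from itertools import count


def expand_sequence_path(subject, predicates, obj, fresh=None):
    """Rewrite `subject p1/p2/.../pn obj` into a chain of triple patterns.

    Hypothetical sketch: intermediate nodes get fresh variables named
    ??P1, ??P2, ... as in the example in this thread.
    """
    if not predicates:
        raise ValueError("a sequence path needs at least one predicate")
    if fresh is None:
        fresh = count(1)
    triples = []
    current = subject
    for index, predicate in enumerate(predicates):
        is_last = index == len(predicates) - 1
        # The last step ends at the original object; earlier steps end at
        # a fresh intermediate variable.
        target = obj if is_last else f"??P{next(fresh)}"
        triples.append((current, predicate, target))
        current = target
    return triples
```

For the query above, `expand_sequence_path("?x", ["rdfs:label", "rdfs:label"], "?z")` yields the two triple patterns shown, joined on the intermediate variable.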
Removing the lock around … For the sequence path query, query execution hangs while trying to produce a result with this stack trace. It does not appear to be affected by the locking issue: the context has the cancel signal correctly set to true, but the signal is not checked. So these should be two separate issues.
The problem I see with eager iterators in the snippet below is that if … jena/jena-arq/src/main/java/org/apache/jena/sparql/exec/QueryExecDataset.java Lines 472 to 475 in 6a86b5d
IMO the clean solution would be to change AbstractIterHashJoin to be lazy, i.e. move buildHashTable to the hasNext method. Then the lock would only be held briefly for the iterator construction, and the timeout can then kick in normally.
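The difference between an eager and a lazy hash join can be sketched in Python like this. It is an illustration of the idea only, not the AbstractIterHashJoin code; the class names `EagerHashJoin` and `LazyHashJoin` and the row-as-dict representation are assumptions made for the example.

```python
from collections import defaultdict


def build_hash_table(rows, key):
    """Index the build side by join key (the potentially expensive step)."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


class EagerHashJoin:
    """Builds the hash table in the constructor."""

    def __init__(self, build_rows, probe_rows, key):
        table = build_hash_table(build_rows, key)  # runs during construction
        self._results = (
            {**match, **probe}
            for probe in probe_rows
            for match in table.get(probe[key], [])
        )

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._results)


class LazyHashJoin:
    """Defers building the hash table until the first result is requested."""

    def __init__(self, build_rows, probe_rows, key):
        self._build_rows = build_rows
        self._probe_rows = probe_rows
        self._key = key
        self._results = None

    def __iter__(self):
        return self

    def __next__(self):
        if self._results is None:
            key = self._key
            table = build_hash_table(self._build_rows, key)  # first call only
            self._results = (
                {**match, **probe}
                for probe in self._probe_rows
                for match in table.get(probe[key], [])
            )
        return next(self._results)
```

With the lazy variant, constructing the iterator does not touch the build side at all, so a lock held only during construction would be released before any expensive work starts.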
…s initial hash probe table.
Version
5.3.0
What happened?
Using the Python requests client, I am sequentially running a list of diverse SPARQL queries against a Fuseki endpoint at localhost:3030, using POST requests with a timeout parameter set to 10 s. For the first few lengthy queries in the list, the timeout works as expected, i.e. the request terminates after approx. 10 s with a 503 status code. However, at some point the timeout of subsequent queries is ignored, i.e. the queries execute correctly and completely.
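For reference, a request of the kind described above could be built roughly as follows. This is a sketch using only the Python standard library rather than the requests package, and it only constructs the request without sending it. The dataset path `/ds` is taken from the curl example elsewhere in this thread and is an assumption about the setup; `build_query_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(endpoint, query, timeout_seconds):
    """Build a form-encoded POST carrying the query and a timeout parameter.

    Hypothetical helper: mirrors the curl call in this thread, which passes
    `query` and `timeout` as url-encoded form fields.
    """
    body = urlencode({"query": query, "timeout": str(timeout_seconds)})
    return Request(
        endpoint,
        data=body.encode("ascii"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Example: the endpoint URL is an assumption about the local setup.
request = build_query_request(
    "http://localhost:3030/ds", "SELECT * { ?s ?p ?o }", 10
)
```

Sending the request with `urllib.request.urlopen(request)` would raise `urllib.error.HTTPError` for a 503 response, so the status code of a timed-out query can be read from the exception.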
As I suspect that this problem might be linked to the fuseki HTTP server, I have not investigated the content of the queries that are not timing out. Should I?
Thanks in advance,
Marco
Relevant output and stacktrace
Are you interested in making a pull request?
None