
timeout parameter is not always respected by fuseki server #3044

Open
mpagni12 opened this issue Mar 4, 2025 · 11 comments · May be fixed by #3047

mpagni12 commented Mar 4, 2025

Version

5.3.0

What happened?

Using the Python requests client, I am sequentially running a list of diverse SPARQL queries against a Fuseki endpoint at localhost:3030, using POST requests and a timeout parameter set to 10 s. For the first few lengthy queries in the list, the timeout works as expected, i.e. the request terminates after approximately 10 s with a 503 status code. However, at some point the timeout of subsequent queries is ignored, i.e. the queries execute completely and return normally.

As I suspect that this problem might be linked to the Fuseki HTTP server, I have not investigated the content of the queries that are not timing out. Should I?

Thanks in advance,

Marco

Relevant output and stacktrace

Are you interested in making a pull request?

None

mpagni12 added the bug label Mar 4, 2025
afs (Member) commented Mar 4, 2025

@mpagni12 -- there isn't enough here to recreate the situation from the description.

Could you provide the Fuseki server log file for the relevant first requests (timeouts) and the later ones (non-timeouts)? If that can be "verbose" (server run with "-v"), that would be best.

What is the storage? TDB2? Any inference?

I have not investigated the content of the queries that are not timing out. Should I?

It does seem we need to know what the queries are.

Note: if the server has just started, the first requests are slower. Java is still optimizing code, and the file cache becomes more effective over time.

Aklakan (Contributor) commented Mar 4, 2025

I quite recently fixed several cases where timeouts were not working properly. Without your query load it's hard to tell whether this is related to something I did not fix - or perhaps to something that I accidentally broke.

Are update requests involved?
Do the timeouts work properly for your query load with Jena 5.0.0?

mpagni12 (Author) commented Mar 5, 2025

Here is the log: fuseki.log.gz

Storage is TDB with no inference.

mpagni12 (Author) commented Mar 5, 2025

I quite recently fixed several cases where timeouts were not working properly. Without your query load its hard to tell whether this is related to something I did not fix - or perhaps to something that I accidentally broke.

Are update requests involved? Do the timeouts work properly for your query load with jena 5.0.0?

I am using 5.3.0, should I downgrade my version for testing?

Aklakan (Contributor) commented Mar 5, 2025

I am using 5.3.0, should I downgrade my version for testing?

That would be very helpful in order to determine whether the issue existed for a while or was introduced recently.

mpagni12 (Author) commented Mar 5, 2025

I rebuilt the database and reran the same test using 5.0.0: all lengthy queries time out after 10 s :-)

This strongly suggests that the bug was introduced later.

afs (Member) commented Mar 5, 2025

I looked for timeout= and matched queries to their response code: e.g. [34] 503 Service Unavailable (10.879 s) or [26] 200 OK (7.066 s)

I didn't find any 200's over 10 seconds.
I didn't find any 503's under 10 seconds.

I did find one outlier: [20] 503 Service Unavailable (377.566 s). It is like the query from #3021 without the suggested workaround, but with extra work added. The OPTIONALs look like they have a partial cross-product in them.

The timeout is tested every so often - it isn't an interrupt - so the query is stuck in a busy CPU loop (probably fixable; the caveat is that polling the timeout too often is a bit expensive).
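The pattern is roughly the one sketched below (an illustrative sketch only, not Jena's actual implementation): the deadline is polled at checkpoints during result production, so a stretch of work that never reaches a checkpoint delays cancellation.

class PolledTimeoutIterator {
    // Illustrative only - shows why a polled (non-interrupting) timeout can be delayed.
    private final java.util.Iterator<Object> inner;
    private final long deadline;     // absolute time in milliseconds
    private long count = 0;

    PolledTimeoutIterator(java.util.Iterator<Object> inner, long timeoutMillis) {
        this.inner = inner;
        this.deadline = System.currentTimeMillis() + timeoutMillis;
    }

    Object next() {
        // The clock is checked only every so often, because polling on every
        // element would be comparatively expensive.
        if (++count % 1000 == 0 && System.currentTimeMillis() > deadline)
            throw new RuntimeException("query cancelled (timeout)");
        // If this call itself spins in a tight loop for a long time before
        // returning, the checkpoint above is never reached and the timeout is late.
        return inner.next();
    }
}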

Aklakan (Contributor) commented Mar 5, 2025

So I can reproduce timeouts being ignored on property paths, such as this one on a DBpedia dataset (lots of labels):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?x { ?x rdfs:label/rdfs:label ?z }

time curl http://localhost:3030/ds --data-urlencode 'query=PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> SELECT ?x { ?x rdfs:label/rdfs:label ?z }' --data-urlencode 'timeout=1'

I have not analyzed property paths further.

I can also reproduce a similar (perhaps the same) problem with this query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?x {
  ?x rdfs:label ?o
    { ?x rdfs:label ?z }
  UNION
    { BIND("hi" AS ?s) }
}

The problem is synchronization in conjunction with an eagerly executing iterator:

The timeout fires and tries to abort, but doesn't get the lock:

class TimeoutCallback implements Runnable {
    @Override
    public void run() {
        synchronized (lockTimeout) {

The lock is acquired here:

private void startQueryIterator() {
    synchronized (lockTimeout) {
        if (cancelSignal.get()) {
            // Fail before starting the iterator if cancelled already
            throw new QueryCancelledException();
        }
        startQueryIteratorActual();

And while the lock is held, QueryExecDataset.startQueryIteratorActual executes a hash join eagerly in the line getPlan().iterator(), so the abort signal due to the timeout does not get through:

if ( !isTimeoutSet(timeout1) && !isTimeoutSet(timeout2) ) {
    // Case -1,-1
    queryIterator = getPlan().iterator();
    return;
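
Put differently, the interaction is roughly the one below (a toy sketch of the pattern, not the actual QueryExecDataset code): the timeout callback and the iterator start synchronize on the same lock, and because the eager hash join runs inside the locked section, the abort cannot be delivered until that work finishes.

class EagerStartVsTimeout {
    private final Object lockTimeout = new Object();
    private volatile boolean cancelled = false;

    // Runs on the timeout timer thread.
    void timeoutCallback() {
        synchronized (lockTimeout) {       // blocks while startQueryIterator() holds the lock
            cancelled = true;              // ... so the abort arrives only after the eager work
        }
    }

    // Runs on the request thread.
    void startQueryIterator() {
        synchronized (lockTimeout) {
            if (cancelled)
                throw new RuntimeException("cancelled before start");
            buildHashTableEagerly();       // long-running work inside the locked section
        }
    }

    private void buildHashTableEagerly() { /* reads the whole build side of the join */ }
}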

afs (Member) commented Mar 5, 2025

A sequence path should be expanded to triple patterns.

{ ?x rdfs:label / rdfs:label ?z }

{
    ?x rdfs:label ??P1 .
    ??P1 rdfs:label ?z .
}

Looking at the log, the timeout isn't getting forgotten - it is delayed (presumably by being busy in some tight loop).

The execution plan for [20] does not have a hash join in it. It uses conditional and sequence operators, but I think the OPTIONALs can fan out massively and should be hash left joins.

Aklakan (Contributor) commented Mar 5, 2025

Removing the lock around QueryExecDataset.startQueryIterator() makes the query with UNION { BIND() } work (the eager hash join then gets cancelled because the signal is no longer blocked).

For the sequence path query, query execution hangs trying to produce a result, with the stack trace below. It does not appear to be affected by the locking issue: the context has the cancel signal correctly set to true, but the signal is not considered.

(stack trace screenshot not reproduced here)

So these appear to be two separate issues.

Aklakan (Contributor) commented Mar 5, 2025

The problem I see with eager iterators in the snippet below is that if getPlan().iterator() fails, e.g. due to a concurrent abort, then there is no iterator that can be closed -> likely a resource leak, or at least warnings from QueryIteratorCheck about non-closed iterators in the execCxt. (Or would closing the plan prevent that? I don't yet know what resources that would close.)

if ( !isTimeoutSet(timeout1) && !isTimeoutSet(timeout2) ) {
    // Case -1,-1
    queryIterator = getPlan().iterator();
    return;

IMO the clean solution would be to change AbstractIterHashJoin to be lazy, i.e. move buildHashTable to the hasNext method. Then the lock would only be held briefly for iterator construction, and the timeout could kick in normally.
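
Something along these lines, as a sketch of the suggested direction only (not the actual AbstractIterHashJoin code): construction stays cheap, and the expensive buildHashTable step is deferred to the first hasNext() call, which runs outside the startup lock and can therefore be aborted by the timeout.

abstract class LazyHashJoinIterator {
    private boolean initialized = false;

    // Constructor work stays cheap, so holding the startup lock during
    // iterator construction is harmless.
    final boolean hasNext() {
        if (!initialized) {
            buildHashTable();              // now runs outside the startup lock,
            initialized = true;            // so the cancel signal can interrupt it
        }
        return hasNextAfterBuild();
    }

    protected abstract void buildHashTable();
    protected abstract boolean hasNextAfterBuild();
}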

Aklakan added a commit to Aklakan/jena that referenced this issue Mar 5, 2025
Aklakan added several commits to Aklakan/jena that referenced this issue Mar 6, 2025