When querying a native DuckDB file over HTTP (with range requests), I am seeing corrupt reads when hitting the cache. i.e.:
1. Load the page: the query works.
2. Refresh (sometimes more than once): the query fails with the following error:
   Error: IO Error: Corrupt database file: computed checksum 16933857704068960742 does not match stored checksum 0 in block at location 4206592
       at ma.startPendingQuery (bindings_base.ts:188:19)
       at Fo.onMessage (worker_dispatcher.ts:228:51)
       at Wc.globalThis.onmessage (duckdb-browser-eh.worker.ts:29:19)
3. Clear the cache: the query works again for one load, then the next refresh returns to state 2.
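One hypothesis that fits these symptoms, though nothing in this report confirms it, is that a range response served from the cache does not contain the bytes that were actually requested. The sketch below shows the checks a client could apply to a single response to a `Range: bytes=start-end` request: a 206 status, a `Content-Range` header that matches the requested span, and a body of the expected length. The function `check_range_response` is hypothetical and not part of DuckDB-Wasm or any browser API.

```python
import re
from typing import List, Optional

# Matches e.g. "bytes 4206592-4468735/500000000"; the total may also be "*".
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def check_range_response(
    start: int,
    end: int,
    status: int,
    content_range: Optional[str],
    body_length: int,
) -> List[str]:
    """Return the problems found in a response to 'Range: bytes=start-end'.

    Hypothetical diagnostic helper: an empty list means the response is
    consistent with the request (it says nothing about the byte contents).
    """
    problems = []
    if status != 206:
        problems.append(f"expected status 206, got {status}")
    if content_range is None:
        problems.append("missing Content-Range header")
    else:
        m = _CONTENT_RANGE.match(content_range.strip())
        if m is None:
            problems.append(f"unparseable Content-Range: {content_range!r}")
        elif (int(m.group(1)), int(m.group(2))) != (start, end):
            problems.append(
                f"Content-Range {m.group(1)}-{m.group(2)} "
                f"does not match requested {start}-{end}"
            )
    expected = end - start + 1  # Range end offsets are inclusive
    if body_length != expected:
        problems.append(f"body is {body_length} bytes, expected {expected}")
    return problems
```

For example, a full-file 200 response to a range request (status 200, no `Content-Range`, whole body) would produce three problems, while a well-formed 206 response produces none.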
Some observations:
The stored checksum in the error message is always 0.
The file and the query result seem to need to be around 100 MB or larger to trigger the problem.
It likely requires HTTP range requests. I've been testing with the Node http-server package, but have also seen the same behaviour on IIS, so I'm assuming it's a browser/library issue.
Sometimes the location in the TypeScript stack trace is different, but this is probably a red herring, e.g.:
Error: IO Error: Corrupt database file: computed checksum 11427024748155090702 does not match stored checksum 0 in block at location 337653760
at ma.pollPendingQuery (bindings_base.ts:201:19)
at Fo.onMessage (worker_dispatcher.ts:245:51)
at Wc.globalThis.onmessage (duckdb-browser-eh.worker.ts:29:19)
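Both reported locations line up exactly with a block grid if one assumes the defaults of DuckDB's native file format: three 4 KiB headers (12,288 bytes) at the start of the file, followed by 256 KiB (262,144-byte) blocks whose first 8 bytes hold the stored checksum. Under that assumption, location 4206592 is block 16 and location 337653760 is block 1288, so both errors point at the start of a block and the "stored checksum 0" is simply the first 8 bytes read at that offset. The layout constants are assumptions about the storage format, not values taken from this report, and the helper names are hypothetical.

```python
# Assumed defaults of DuckDB's native storage format (not stated in the report).
HEADER_BYTES = 3 * 4096      # main file header + two database headers
BLOCK_ALLOC_SIZE = 262_144   # 256 KiB per block, checksum included


def block_index(location: int) -> int:
    """Map a byte offset from the error message to a block number."""
    offset = location - HEADER_BYTES
    if offset < 0 or offset % BLOCK_ALLOC_SIZE != 0:
        raise ValueError(f"{location} is not at the start of a block")
    return offset // BLOCK_ALLOC_SIZE


def block_range_header(location: int) -> str:
    """HTTP Range header value covering the whole block at `location`."""
    block_index(location)  # raises if the offset is not block-aligned
    return f"bytes={location}-{location + BLOCK_ALLOC_SIZE - 1}"


for loc in (4206592, 337653760):
    print(loc, block_index(loc), block_range_header(loc))
# 4206592 16 bytes=4206592-4468735
# 337653760 1288 bytes=337653760-337915903
```

That both offsets divide exactly is consistent with whole-block reads returning a zeroed block header, though it does not by itself show where the zeros come from.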
To Reproduce
Reproducible example (not minimal)
Create a largish dummy database with Python, e.g.:
"""create a dummy duckdb native database with a timeseries table"""
import duckdb  # v1.2.0
import pandas as pd
import numpy as np

con = duckdb.connect("prices.db")
con.sql(
    """CREATE OR REPLACE TABLE prices (datetime TIMESTAMP, forecast_scenario int64, member int64, price float)"""
)
t = pd.date_range("2020-01-01", end="2026", freq="h")
df = pd.concat(
    (
        pd.DataFrame(
            {
                "datetime": t,
                "forecast_scenario": j,
                "member": i,
                "price": np.random.rand(len(t)),
            }
        )
        for i in range(100)
        for j in range(10)
    )
)
con.execute("""INSERT INTO prices select * from df order by (forecast_scenario, datetime, member)""")
con.query("SELECT * FROM prices LIMIT 10")
con.close()
Serve the created database file and the following HTML with the Node http-server package:
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Document</title>
  </head>
  <body>
    <script>
      const getDb = async () => {
        const duckdb = window.duckdb;
        // @ts-ignore
        if (window._db) return window._db;
        const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();
        // Select a bundle based on browser checks
        const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);
        const worker_url = URL.createObjectURL(
          new Blob([`importScripts("${bundle.mainWorker}");`], {
            type: "text/javascript",
          })
        );
        // Instantiate the asynchronous version of DuckDB-wasm
        const worker = new Worker(worker_url);
        // const logger = null //new duckdb.ConsoleLogger();
        const logger = new duckdb.ConsoleLogger();
        const db = new duckdb.AsyncDuckDB(logger, worker);
        await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
        URL.revokeObjectURL(worker_url);
        window._db = db;
        return db;
      };
    </script>
    <script type="module">
      import * as duckdb from 'https://cdn.jsdelivr.net/npm/@duckdb/duckdb-wasm@1.29.0/+esm';
      window.duckdb = duckdb;
      getDb().then(async (db) => {
        // 4 = DuckDBDataProtocol.HTTP (read the file via HTTP range requests)
        await db.registerFileURL('prices.db', new URL('../prices.db', window.location.href).href, 4)
        const conn = await db.connect();
        await conn.query(`ATTACH 'prices.db' (READ_ONLY)`)
        for await (const batch of await conn.send(`
          SELECT * FROM prices.prices
          WHERE datetime > '2025-01-01' and datetime <= '2025-01-02';
        `)) {
          console.log(batch);
        }
      });
    </script>
    <div id="output"></div>
  </body>
</html>
Load the page in the browser: the query should succeed.
Refresh (potentially more than once): the error above appears.
Clear the browser cache and repeat: the query works again.
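To narrow down where the zero checksum comes from, the bytes served over HTTP can be compared with the local file outside the browser, so the browser cache is not involved at all. A minimal sketch with several assumptions: http-server is running on its default port 8080 and serves prices.db at the root (adjust `URL` to match the directory layout used above); each block starts with its checksum stored as a little-endian uint64 (DuckDB writes it in native byte order, which is little-endian on x86-64); and all helper names are hypothetical. The offsets in the error messages come from the reporter's file, so the regenerated prices.db must be at least that large for them to be readable. If the served bytes match the file while the browser still fails after a refresh, that would point at the cached path rather than the server; this is a hypothesis to test, not a confirmed cause.

```python
import struct
import urllib.request
from typing import Optional

# Assumptions: http-server's default port, prices.db served at the root.
URL = "http://localhost:8080/prices.db"
PATH = "prices.db"
BLOCK_ALLOC_SIZE = 262_144  # assumed default DuckDB block size (256 KiB)


def fetch_range(url: str, start: int, end: int) -> bytes:
    """Fetch bytes start..end (inclusive) with an HTTP Range request."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        if resp.status != 206:
            raise RuntimeError(f"expected 206 Partial Content, got {resp.status}")
        return resp.read()


def read_local(path: str, start: int, end: int) -> bytes:
    """Read bytes start..end (inclusive) from a local file."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start + 1)


def stored_checksum(block: bytes) -> int:
    """First 8 bytes of a block as a little-endian uint64 (assumed layout)."""
    return struct.unpack_from("<Q", block, 0)[0]


def first_mismatch(a: bytes, b: bytes) -> Optional[int]:
    """Index of the first differing byte, or None if the buffers are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def compare_block(location: int) -> None:
    """Print the stored checksum of one block as served and as on disk."""
    end = location + BLOCK_ALLOC_SIZE - 1
    served = fetch_range(URL, location, end)
    local = read_local(PATH, location, end)
    print(f"block at {location}:")
    print("  local stored checksum: ", stored_checksum(local))
    print("  served stored checksum:", stored_checksum(served))
    print("  first mismatch:", first_mismatch(served, local))
```

With the server running, `compare_block(4206592)` and `compare_block(337653760)` check the two blocks named in the errors above.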
Browser/Environment:
Chrome 131
Device:
Windows 10 x86-64
DuckDB-Wasm Version:
1.29.0
DuckDB-Wasm Deployment:
JSDelivr
Full Name:
David Horsley
Affiliation:
Hydro Tasmania