perf: concurrency issues #1299
I am talking about a semver-major change here: redesigning this library to solve these problems. I think the `.serialize()` and `.parallelize()` idea is the right goal, it's just the implementation that is wrong. Maybe remove all query methods (`.run` etc.) from the `Database` class and only expose `dbManager.writeLock`/`dbManager.readLock` (better names welcome), each taking an async callback and returning a promise:

```js
dbManager.writeLock(async (db) => {
  // The db variable is "promoted" to a new type.
  // The entire writeLock holds the exclusive DB lock and mutex, so there is no write contention.
  // writeLock should use an rwlock to queue up writes and not
  // acquire the mutex until its turn comes.
  await db.query("BEGIN");
  const stmt = await db.prepare("INSERT ..."); // returns a WritableStatement
  for (let i = 0; i < 100000; i++) {
    stmt.update([i]); // We don't need the result, so don't wait for it
    // before enqueuing the next statement; enqueue them all so that
    // the thread can work as fast as possible.
  }
  // Inserts are guaranteed in order even without awaiting.
  // COMMIT will not execute and return until every previous
  // statement has finished. Single-threaded queue.
  await db.query("COMMIT");
});
```

Here the `db` parameter for `writeLock` is a `WritableDatabase`, and the `db` parameter for `readLock` is a `ReadableDatabase`. The readable type should not expose any methods that can mutate the DB, and should error if it detects `INSERT`/`UPDATE`/`DELETE`/`CREATE`/`ALTER` etc. The writable type should have `.update` (instead of `run`/`exec`) for write operations. Java (JDBC) has a good split of the two operations, and I suggest following its style: `.query` returns results, `.update` performs updates and returns the number of affected rows. So `WritableDatabase` has `.update` and `ReadableDatabase` does not.
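To illustrate the read side of the same proposal, here is a hypothetical sketch of what `readLock` usage might look like. `dbManager`, `readLock`, `db.query`, and `ReadableDatabase` are all proposed names from this comment, not existing node-sqlite3 API:

```js
// Hypothetical API sketch only: none of these names exist in node-sqlite3 today.
async function loadItems(dbManager) {
  // readLock hands out a ReadableDatabase: it can query, but has no
  // .update surface, and should reject write statements outright.
  return dbManager.readLock(async (db) => {
    // Multiple readLock callbacks could run in parallel, since reads
    // do not contend for the write mutex.
    return db.query("SELECT id, value FROM items WHERE value >= ?", [42]);
  });
}

// Something like this should be rejected by the ReadableDatabase:
// dbManager.readLock((db) => db.query("DELETE FROM items")); // -> error
```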
Regarding #1292: I am personally not able to take on such a large task, but I am hoping I can help design the solution to this critical issue for someone else to run with if they are up for the challenge. Ultimately we do need the maintainers' thoughts on whether they are OK with this approach before anyone starts running with it.
I just noticed that `db.serialize()` called without a callback can be used to leave the connection in serialized mode. Doing that in the example code does make it return 100k results as expected, so there is a userland workaround for now, and the documentation could be improved here. It might also be possible to make `.serialize()` and `.parallelize()` use an rwlock under the hood. I still argue that the library needs a fundamentally better approach on the C++ side so that end users don't have to jump through hoops to send queries correctly. While the documentation does mention that `serialize()` can be used without a callback, this was extremely unintuitive since all the examples show otherwise. Additionally, this solution ONLY works when combined with external concurrency protection (an rwlock) to ensure the serialize flag isn't flipped by another operation mid-process; see the sketch below. I would update the wiki with valid workarounds for now, but it is not open to public editing.
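As a rough illustration of that workaround (a minimal sketch, not code from the original comment; the table and column names are made up), calling `serialize()` with no callback before issuing the writes keeps them on a single ordered queue for this connection:

```js
const sqlite3 = require('sqlite3');
const db = new sqlite3.Database(':memory:');

// No callback: the connection stays in serialized mode from here on,
// so queued statements execute one at a time, in submission order.
// As noted above, other code flipping the mode concurrently can still
// break this without external locking.
db.serialize();

db.run('CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)');
const stmt = db.prepare('INSERT INTO items (value) VALUES (?)');
for (let i = 0; i < 100000; i++) {
  stmt.run(`row-${i}`); // not awaited; order is still preserved in serialized mode
}
stmt.finalize();

db.all('SELECT COUNT(*) AS n FROM items', (err, rows) => {
  if (err) throw err;
  console.log(rows[0].n); // expected: 100000
});
```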
Hi @aikar, is it possible to implement this functionality whilst keeping the existing implementation until we are sure it can be deprecated? Thank you for your time.
Based on my latest findings, I believe we can add a NEW API for accessing the DB in a safe and performant manner, then update all documentation to encourage its use, without breaking the existing API. We might also be able to silently add promise support to the `.serialize()` and `.parallelize()` methods so they don't flip the mode back until the async function returns; those methods could then be updated to use the `.readLock` and `.writeLock` concepts too. Doing this would make anyone already using callbacks on those methods safer, without any behavior-change risk (a rough sketch of the promise-aware idea is below). Bonus points if, in follow-up work, we can identify the query type and automatically switch the runtime mode to serialized for all write operations when it isn't already. Unless someone can show me otherwise, I do not believe it is possible to perform concurrent writes, so all write operations should always run in serialized mode. Read operations can run in parallel, but should be serialized when inside a transaction. Also, what is the project's minimum Node.js version?
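To make the promise idea concrete, here is a hypothetical userland sketch of a promise-aware serialize wrapper. The `withSerialize` helper is invented for illustration; it is not part of node-sqlite3, and as discussed above it still needs external locking to be fully safe:

```js
// Hypothetical helper: enter serialized mode, run an async body,
// and only restore parallel mode after the promise settles.
// Without external locking, another caller could still flip the mode
// mid-flight, which is exactly the hazard described above.
async function withSerialize(db, body) {
  db.serialize();           // no callback: stays serialized
  try {
    return await body(db);  // mode is not flipped back until this resolves
  } finally {
    db.parallelize();       // restore parallel mode afterwards
  }
}
```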
Hi @aikar, thank you for the analysis and interest! I suggest:
@aikar Thanks for sharing this, it really helps a lot.
I brought up some information in #703, but wanted to make a direct issue on this topic.
Real-world use of this library will be in a highly asynchronous context. However, db.serialize() is a synchronous operation:
https://github.com/mapbox/node-sqlite3/blob/master/src/database.cc#L294
As soon as the callback returns, the serialized mode is reset back to its previous value.
Issue example (pretend db.run returns a promise):
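The original snippet appears to have been dropped from this page; the following is only an illustrative reconstruction of the pattern being described, assuming `db.run` has been wrapped with `util.promisify`:

```js
const { promisify } = require('util');
const sqlite3 = require('sqlite3');

const db = new sqlite3.Database(':memory:');
const run = promisify(db.run.bind(db));

db.serialize(async () => {
  // serialize() is synchronous: it only applies until this callback
  // *returns*, and an async function returns at its first await.
  // Everything after that first await runs with the mode already
  // reset to its previous (parallelized) value.
  await run('CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)');
  for (let i = 0; i < 100000; i++) {
    await run('INSERT INTO t (id, val) VALUES (?, ?)', [i, 'x']);
  }
});
```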
OK, so one might think this can be worked around by simply not awaiting each statement:
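That snippet also seems to be missing; a hedged reconstruction of the attempt, continuing with the `db` and `run` from the sketch above, might look like this:

```js
db.serialize(async () => {
  // The awaited statement below returns control to the caller, so
  // serialize() has already reset the mode by the time the loop runs.
  await run('CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)');

  // Fire-and-forget: queue every write without awaiting, and only
  // read the result at the end.
  for (let i = 0; i < 100000; i++) {
    db.run('INSERT INTO t (id, val) VALUES (?, ?)', [i, 'x']);
    db.run('UPDATE t SET val = ? WHERE id = ?', ['y', i]);
  }
  db.all("SELECT id FROM t WHERE val = 'y'", (err, rows) => {
    if (err) throw err;
    console.log(rows.length); // expectation per this issue: 100000; reported: ~60-70k
  });
});
```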
Expectation: 100k rows returned (we set all values to 'y' after insert).
Got: 60-70k, or random values.
The reason is that each of these queries is submitted to the uv_default_loop():
https://github.com/mapbox/node-sqlite3/blob/master/src/statement.cc#L432
https://github.com/mapbox/node-sqlite3/blob/master/src/statement.cc#L438
https://github.com/mapbox/node-sqlite3/blob/master/src/macros.h#L128
There are multiple threads in the uv work queue, so your queries are running across multiple threads, out of order!
What's worse, an update needs to lock the database for the write, so these threads are now also fighting over the mutex, slowing down the uv queue too!
Solution
I'm not sure how to solve this perfectly. The primary idea is for the library to run its own single thread for writes.
Maybe each query needs to be issued inside some kind of promise-supporting callback that defines which thread pool to use: a single-threaded pool for writes, a multi-threaded pool for reads.
Additionally, it may be best not to use the default uv work queue at all, ensuring that file locks don't block the uv queue.
I ultimately went down this investigation path when I noticed my HTTP server started timing out during heavy insert periods.
I'm not sure it's fully possible to fix this at the application level, outside of the solution I supplied on that linked issue: every write operation goes behind an rwlock.writeLock and every query is awaited, so that only a single query runs at a time (see the sketch below).
This has a huge time-cost overhead, though.
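For reference, a minimal sketch of that application-level workaround, assuming the `rwlock` npm package's callback-style API (an assumption on my part; any promise-friendly mutex would serve the same role):

```js
const { promisify } = require('util');
const sqlite3 = require('sqlite3');
const ReadWriteLock = require('rwlock');

const db = new sqlite3.Database('data.db');
const run = promisify(db.run.bind(db));
const lock = new ReadWriteLock();

// Every write goes through the write lock, and every statement is
// awaited, so only one query is ever in flight at a time.
function writeLocked(body) {
  return new Promise((resolve, reject) => {
    lock.writeLock((release) => {
      body()
        .then(resolve, reject)
        .finally(release);
    });
  });
}

// Usage: inserts stay ordered, at the cost of one round trip per statement.
(async () => {
  await writeLocked(async () => {
    for (let i = 0; i < 1000; i++) {
      await run('INSERT INTO items (id, val) VALUES (?, ?)', [i, 'x']);
    }
  });
})();
```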
Thoughts?