Implement stmt.locked getter #424
Comments
You seem to have the right idea about what's going on.
It can be relatively expensive, yes. However, the only way to know if it's worth caching is to benchmark it yourself on the machine that you're actually using.
Your cache would need to manage multiple statements containing the same SQL. An array will probably work fine. You only need to create a new statement object when trying to use an SQL string whose associated statements are ALL in-use. In practice, that should only happen when nesting iterators. So unless someone is nesting hundreds of iterators (which nobody would do), the number of statements per SQL string should remain pretty small (~1-4). Therefore, I don't see a need to ever retire them. If you're really worried, you could implement a rule that your cache will only keep a maximum of 10 statements per SQL string—any extras must be compiled on the spot. That would cause a performance hit in extreme (extremely unlikely) cases, but would protect your memory usage from ballooning (in extremely unlikely cases).
This is not an artificial limitation of better-sqlite3. At the C level, the parsed statement and the cursor that iterates it are the same object (an sqlite3_stmt), so SQLite itself cannot run two concurrent iterations through one prepared statement handle.
This means you need to keep track of which statements are in-use, so you'll need an API for tracking that information. Here is one way to structure such a cache:

```js
const cache = new Map();

// Borrow a statement from the cache, or create a new one
function claim(sql) {
  let list = cache.get(sql);
  if (!list) cache.set(sql, list = []);
  return list.pop() || db.prepare(sql);
}

// Return a statement to the cache (up to a maximum of 10 per SQL string)
function relinquish(stmt) {
  const list = cache.get(stmt.source); // stmt.source is the statement's SQL string
  if (list.length < 10) list.push(stmt);
}
```

You could have your application use these functions directly (`claim` and `relinquish`), or build a small abstraction on top of them.

**Borrowing callback.** With a helper function that borrows a statement for the duration of a callback:

```js
function borrow(sql, fn) {
  const stmt = claim(sql);
  try {
    return fn(stmt);
  } finally {
    relinquish(stmt);
  }
}
```
```js
borrow('SELECT * FROM data', (stmt) => {
  // do stuff with stmt...
});
```

**Proxy class.** With a proxy class whose methods each borrow a real statement:

```js
class Statement {
  constructor(sql) {
    this.sql = sql;
  }
  run(...args) {
    return borrow(this.sql, stmt => stmt.run(...args));
  }
  get(...args) {
    return borrow(this.sql, stmt => stmt.get(...args));
  }
  all(...args) {
    return borrow(this.sql, stmt => stmt.all(...args));
  }
  *iterate(...args) {
    const stmt = claim(this.sql);
    try {
      for (const row of stmt.iterate(...args)) {
        yield row;
      }
    } finally {
      relinquish(stmt);
    }
  }
}
```

And there are probably other approaches too; you must decide which abstraction is right for your application. Also, it's worth mentioning that all this caching and extra logic isn't free. You should definitely measure the performance of whichever approach you choose, to know if it's worth doing at all.
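To make the pooling behavior concrete, here is a self-contained run of the proxy idea with a stubbed `db.prepare` standing in for better-sqlite3. The stub, its `id` counter, and the two-row result set are all illustrative assumptions; the point is that overlapping iterations of the same SQL draw on two pooled statements, which are then reused.

```javascript
let compiled = 0;
// Stub database: each prepare() "compiles" a new statement object.
const db = {
  prepare(sql) {
    return { sql, id: ++compiled, *iterate() { yield 'a'; yield 'b'; } };
  },
};

const cache = new Map();
function claim(sql) {
  let list = cache.get(sql);
  if (!list) cache.set(sql, list = []);
  return list.pop() || db.prepare(sql);
}
function relinquish(stmt) {
  const list = cache.get(stmt.sql); // real better-sqlite3 exposes this as stmt.source
  if (list.length < 10) list.push(stmt);
}

class Statement {
  constructor(sql) { this.sql = sql; }
  *iterate(...args) {
    const stmt = claim(this.sql);
    try { yield* stmt.iterate(...args); }
    finally { relinquish(stmt); }
  }
}

const proxy = new Statement('SELECT x FROM data');
const it1 = proxy.iterate();
it1.next();                 // first iteration is in flight...
const it2 = proxy.iterate();
it2.next();                 // ...so the overlapping one compiles a second statement
console.log(compiled);      // 2

for (const _ of it1) {}     // draining an iterator returns its statement to the pool
for (const _ of it2) {}
for (const _ of proxy.iterate()) {} // a later iteration reuses a pooled statement
console.log(compiled);      // still 2
```

Note that because generators are lazy, the statement is only claimed on the first `next()` call, which is exactly when the overlap becomes observable.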
Wow, thanks for the detailed response and sample code!

It occurred to me that one very simple thing could greatly reduce the brittleness of the first approach and obviate the need for the complexity of the second and third: a simple `Statement` property or method that reports whether the statement is currently in use. Using this primitive, it would be possible for us (or, perhaps in the future, better-sqlite3) to implement a safe statement cache.

What do you think about adding a `.locked -> boolean` getter that is true if you have called `.iterate()` and the iterator has not yet finished? I took a look at what existing functions will throw when the statement is busy.

By the way, in my application I will normally have 10-100 iterators going simultaneously on a small number of big tables. They are not really "nested" in any sense; they are scanning various tables in parallel, where the next iterator to be advanced depends on the values read from existing iterators (think walking a set of sorted lists in parallel). This is actually a pretty common operation for certain kinds of text processing, so it is not that far-fetched. Most key-value database packages (the descendants of dbm) have some kind of lightweight cursor operation to facilitate this kind of parallel table-walking, and it would be awesome if better-sqlite3 could do it too. It seems like sqlite3 has everything we need; we just have to find a clean and non-brittle way of knowing when we need to create more statement handles.
There's another way I thought of to get the same effect without library support. The idea is that rather than calling a getter, you just attempt the operation and catch the "This statement is busy executing a query" TypeError, preparing a fresh statement when it fires. This produces a ghetto version of `.locked`. It's still brittle because it relies on matching a specific error, so unrelated failures could be misclassified.
I know it's a bit late, but this is now supported via the `busy` property.

Docs: https://github.com/JoshuaWise/better-sqlite3/blob/master/docs/api.md#properties-1
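With that property, the manual in-use bookkeeping sketched earlier in the thread can be replaced by scanning the pool for an idle statement. A hedged sketch follows: it assumes each statement exposes a boolean `busy` (as better-sqlite3's documented `.busy` property does), and `claimIdle` is an illustrative name, not a library API.

```javascript
const cache = new Map();

// Return a cached statement that is not currently executing,
// compiling a new one only when every cached copy is busy.
function claimIdle(db, sql) {
  let list = cache.get(sql);
  if (!list) cache.set(sql, list = []);
  const idle = list.find(stmt => !stmt.busy);
  if (idle) return idle;
  const fresh = db.prepare(sql);
  list.push(fresh);
  return fresh;
}
```

Because `busy` flips back to false when an iterator finishes, no explicit relinquish step is needed; the pool only grows when iterations genuinely overlap.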
This is related to #203 and #367 but not the same.
I am SO happy that you did the fix in #203 to allow multiple simultaneous read-only iterate()s (and read-write with an unsafe mode flag in not-yet-shipped code) as I need that.
But if I am understanding the results I am seeing correctly, it looks like the implementation of multiple simultaneous iterate()s prevents another useful optimization, re-using statement objects. Let me explain...
I am still hitting cases of iterate() returning "This statement is busy executing a query" on read-only queries even with v6.0.1.
I tracked it down and the behavior seems to occur when both of these are true:
1. I re-use the same JavaScript statement object from my cache for a new iterate(), and
2. an earlier iterate() on that same object has not yet reached its end.

This even happens when there are NO other intervening statements being executed other than SELECTs (no read-write statements on any table).
The reason why I am re-using the same JavaScript statement object is that I understand it is expensive to parse the SQL and do other things in prepare(), and not something one would want to do on every statement execution, so I keep a cache of prepared statements keyed by SQL string.
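A minimal reconstruction of such a cache (the reporter's actual snippet was lost from this page; `prepareCached` and the Map layout are assumptions, but any one-statement-per-SQL cache has the same failure mode):

```javascript
const cache = new Map();

// One statement object per SQL string: the same object is handed out
// every time, so two concurrent iterate() calls collide on it.
function prepareCached(db, sql) {
  let stmt = cache.get(sql);
  if (!stmt) cache.set(sql, stmt = db.prepare(sql));
  return stmt;
}
```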
However, when I hit the cache and a case happens to satisfy both part 1 and 2 above, I get the error. For my application, I DEFINITELY need to have a sizable number of iterators (say 10-100) that have not yet hit their end active simultaneously, and some of those will be from the same prepare()d SQL statement.
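The failure mode can be mimicked without a database. The stub below rejects overlapping iterations; this models the reported error message, not better-sqlite3's actual internals, but it shows why a second iterate() on the same cached object fails while the first is still open.

```javascript
// Stub mimicking the reported behavior: one statement object cannot
// service two in-flight iterations at once.
function makeStmt() {
  let busy = false;
  return {
    *iterate() {
      if (busy) throw new TypeError('This statement is busy executing a query');
      busy = true;
      try { yield 1; yield 2; } finally { busy = false; }
    },
  };
}

const stmt = makeStmt();   // the single cached object
const it1 = stmt.iterate();
it1.next();                // first iteration starts and is left open
let threw = false;
try {
  stmt.iterate().next();   // second, overlapping iteration on the same object
} catch (e) {
  threw = true;            // "This statement is busy executing a query"
}
console.log(threw);        // true
```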
While I don't 100% understand the code, the restriction seems to come from node_modules/better-sqlite3/src/objects/statement-iterator.lzz, where a given stmt becomes completely locked against subsequent iterate() calls until its Cleanup(). Cleanup() apparently happens from Throw(), from Return() (which is called from the final JS_next() and Next() at the end of iteration, even for consumers not using for...of), and presumably also when the iterator is garbage-collected?
So, the questions are:
1. Is my understanding of the problem correct?
2. Is prepare() in fact expensive and worth caching?
3. How can we achieve multiple simultaneous iterate()s and also get the performance benefits of re-using prepare()? I could perhaps envision complicating my cache with some kind of "pool" of JavaScript statement objects for the same SQL string, but that seems like a sketchy venture... not sure I have enough info at the library-user level to know when to add more objects for a given SQL and when to retire them.
4. Is the limitation artificial at the better-sqlite3 implementation level? In sqlite3 at the C level, are the expensive parsing actions in fact separate from the identity of the object which does iteration, and can we somehow split those two at the better-sqlite3 API level to get the best of both worlds?
Thanks!