iterate(): small but important doc clarification #406
Comments
So here's the thing: for all practical purposes, it will probably work as you expect in your first example. Any memory that this library allocates (that isn't managed by V8) is de-allocated after each loop iteration. Also:
This made me laugh.
Interesting. The main question is: when fetching a large result set, what is the worst-case amount of memory that better-sqlite3 will allocate and hold onto during the full iteration, and is that amount proportional to the number of results? Unreferenced memory held by the JS engine is less of a concern, because there should only ever be a constant amount of it before the GC kicks in, so it is not proportional to the number of results. So some dox like this will do the trick:
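A minimal sketch of the kind of check that wording implies, assuming a `test.db` with a large `data` table (as in the scripts further down); `process.memoryUsage()` is standard Node.js, and the query is hypothetical:

```js
// Sketch: sample resident set size while iterating a large result set,
// to check that held memory stays roughly constant rather than growing
// with the number of rows. Assumes test.db contains a large `data` table.
const db = require('better-sqlite3')('test.db', { readonly: true });
const stmt = db.prepare('SELECT chunk FROM data');

let rows = 0;
let peakRss = 0;
for (const _ of stmt.iterate()) {
  ++rows;
  peakRss = Math.max(peakRss, process.memoryUsage().rss);
}
console.log(`${rows} rows, peak RSS ~${(peakRss / 1024 / 1024).toFixed(0)} MiB`);
```

If iterate() streams, the peak RSS should stay flat no matter how many rows the query returns.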
I think it would be helpful to include some dox like the example in the previous message in the official dox for iterate() at: https://github.com/JoshuaWise/better-sqlite3/blob/master/docs/api.md
I agree it would be helpful to have that in the documentation. It got me a little confused, because what I observe when I execute my query seems to be much more like the second example. This question is a bit more than three years old, so maybe this is not the best place to ask, but has something changed since then? And this should definitely be documented if this is the actual behavior, because I used this function thinking it would be awesome and use little memory, and I got quite disappointed when I saw that 😅
@pelletier197 I am not able to reproduce the behavior you described. I filled a database with 10,000,000 rows (~5GiB of data), using the following script:

```js
const db = require('better-sqlite3')('test.db');
db.pragma('journal_mode = MEMORY'); // Makes it faster
db.pragma('locking_mode = EXCLUSIVE'); // Makes it faster
db.exec('BEGIN EXCLUSIVE; COMMIT'); // Needed for locking_mode = EXCLUSIVE
db.exec(`
  CREATE TABLE data (
    id INTEGER PRIMARY KEY,
    chunk BLOB NOT NULL
  ) STRICT;
`);

const chunk = Buffer.alloc(512);
const stmt = db.prepare('INSERT INTO data (id, chunk) VALUES (?, ?)');
for (let id = 0; id < 10000000; ++id) {
  stmt.run(id, chunk);
}

db.close();
```

Then I ran the following script to iterate through all the rows:

```js
console.log('opening database...');
const db = require('better-sqlite3')('test.db', { readonly: true });
console.log('preparing statement...');
const stmt = db.prepare('SELECT chunk FROM data ORDER BY id ASC').pluck();
console.log('running query...');
const iterator = stmt.iterate();
let index = 0;
console.log('getting row #0...');
for (const _ of iterator) {
  console.log(`getting row #${++index}...`);
}
console.log('done!');
```

The process memory usage never went above ~70MiB. Then I tried replacing
Note that if you're using
@JoshuaWise thank you for your help. I realised that there was an ORDER BY clause in my query that was causing the extra memory usage.
@pelletier197, that seems like an interesting find about the ORDER BY clause. The question is where you got that clue, and whether it also happens for numeric primary-key columns, since I assume those become the clustered index for the table they are stored in. Also, doesn't SQLite implement an on-disk merge sort for large datasets? Every database needs one, since it never knows how large a result will be... or am I missing something?
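One way to get such a clue (a sketch, not something stated in the thread): SQLite itself reports whether a query needs a sorting pass. EXPLAIN QUERY PLAN prints `USE TEMP B-TREE FOR ORDER BY` when no index satisfies the ordering, which is exactly the case where rows must be buffered before the first one is handed back. The table and column names below are assumptions borrowed from the test script above:

```js
// Sketch: ask SQLite whether a query can stream in order (via an index)
// or must sort first (temp B-tree). Table/column names are hypothetical.
const db = require('better-sqlite3')('test.db', { readonly: true });

// Streams: `id` is the INTEGER PRIMARY KEY, so rows come back in rowid order.
console.log(db.prepare(
  'EXPLAIN QUERY PLAN SELECT chunk FROM data ORDER BY id'
).all());

// Sorts: no index on length(chunk), so SQLite builds a temp B-tree
// for the ORDER BY before producing the first row.
console.log(db.prepare(
  'EXPLAIN QUERY PLAN SELECT chunk FROM data ORDER BY length(chunk)'
).all());
```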
Thanks for your awesome library! The KISS design and lack of async and other zombie borg "you will be assimilated" thinking are truly a breath of fresh air!
The dox for iterate() at
https://github.com/JoshuaWise/better-sqlite3/blob/master/docs/api.md
don't actually say what the memory usage of iterate() is.
If iterate() guarantees to hold only a constant number of rows in memory at once in the worst case (all the way down to the C library level), then this is SUPER useful to know and should be documented.
If iterate() is actually a wrapper around all(), or otherwise could hold the whole record set in memory in the worst case (at any level), then this is ALSO super useful to know and should be documented.
Right now, there is a comment about performance vs. all() which makes me lean towards the second interpretation; whatever is true should be explicitly documented.
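To make the two interpretations concrete, here is a hedged sketch of the usage patterns whose worst-case memory the docs should distinguish (the table name and query are hypothetical; nothing here is claimed about the actual implementation):

```js
const db = require('better-sqlite3')('test.db', { readonly: true });
const stmt = db.prepare('SELECT chunk FROM data');

// iterate(): the statement is stepped one row at a time; only the current
// row needs to stay referenced, so earlier rows can be garbage-collected.
let bytes = 0;
for (const row of stmt.iterate()) {
  bytes += row.chunk.length; // touch each row without retaining it
}

// all(): the entire result set is materialized into one JS array before
// the call returns, so memory grows with the number of rows.
const rows = stmt.all();
console.log(bytes, rows.length);
```

Under the first interpretation the loop runs in roughly constant memory; under the second, both calls would peak at memory proportional to the row count.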