-
Notifications
You must be signed in to change notification settings - Fork 115
Consider an alternative disk representation for datom flags #29
Comments
Note that each of these columns also appears transiently when inserting datoms into |
SQLite 3.15 allows the use of deterministic functions in partial index expressions, so we can now go whole-hog on bitfields if we choose. TryGhost/node-sqlite3#724 adds this for |
Hooray: https://bugzilla.mozilla.org/show_bug.cgi?id=1310361 already landed. |
|
I did this in https://github.com/thomcc/mentat/commit/ba29de5998d8a905e024339e9e5167ee56f2551c (note that not all tests pass, but it was just a test to see how beneficial it was). Sadly, it does not appear to make a meaningful difference in the on-disc database size :( |
You need to make |
Yeah, it's not nothing (average of 2-3 bytes per datom), but not really clear that it's worth the version breakage or additional complexity. It probably saves more for things using
|
We have four fields in
datoms
:index_vaet
,index_avet
,index_fulltext
andunique_value
.These fields appear in
datoms
and in the indicesidx_datoms_eavt
andidx_datoms_aevt
.Some rows also appear in
idx_datoms_avet
,idx_datoms_vaet
,idx_datoms_fulltext
, andidx_datoms_unique_value
.That is: each datom is responsible for anywhere from 12 to 28
TINYINT
s in the DB. Sometimes these won't contribute to space (if they're zero, I've read that sqlite should pack them down to nothing), but assuming one byte per field, half a million datoms will add up to 14MB to the DB just for these flags.These flags are entirely derived from the schema: a datom's attribute is the sole determinant (AFAICS) of whether these flags are
1
or0
for a row.Now, partial indexes cannot refer to other tables or call functions, so we can't simply join against the schema in the
CREATE INDEX … WHERE
clause. But we could use a more complex operator-driven expression, including bitmasks, to compress these flags.We could also consider approaches to simply removing columns:
index_vaet
andindex_fulltext
are mutually exclusive, so they could be two integer values in a single field.index_vaet
corresponds to:db/valueType :db.type/ref
, which is already implicitly represented by avalue_type_tag
of 0, so we can filter on that instead.Finally, we could consider direct schema creation as an approach: rather than having
idx_datoms_fulltext
, for example, we could create per-attribute indices when we register a schema fragment that includes:fulltext true
:CREATE INDEX idx_datoms_fulltext_page_title ON datoms (value_type_tag, v, a, e) WHERE a = 65538 -- The interned attribute.
I don't know if sqlite will happily use a collection of such indices (perhaps it'd be quicker for real-world queries!), but it's worth exploring.
The text was updated successfully, but these errors were encountered: