Thoughts: what is a sync, really?

Syncing consists of the following operations, typically in this order:

Replication in of remote state.
Deduction of equivalences.
Resolution of conflicts, and rewriting or eliding of local changes with respect to equivalences and new remote state.
Optional flattening of resolved local changes à la git rebase.
Replication out of local state.
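
To make the separation concrete, here is a minimal sketch of the five stages as independent functions. Everything in it is an assumption made for illustration: records are plain dicts with `url` and `title` keys, equivalence means "same URL", and the local change wins when the two sides differ. None of the function names come from Firefox Sync or Mentat.

```python
def replicate_in(server):
    """Stage 1: copy remote state down verbatim, without interpreting it."""
    return {rid: dict(rec) for rid, rec in server.items()}


def deduce_equivalences(local_changes, incoming):
    """Stage 2: map local ids to remote ids that denote the same record.

    Assumption: two records are equivalent when their URLs match.
    """
    by_url = {rec["url"]: rid for rid, rec in incoming.items()}
    return {lid: by_url[rec["url"]]
            for lid, rec in local_changes if rec["url"] in by_url}


def resolve(local_changes, incoming, equivalences):
    """Stage 3: rewrite local changes onto remote ids and elide redundant ones.

    Assumption: when both sides differ, the local change wins.
    """
    resolved = []
    for lid, rec in local_changes:
        target = equivalences.get(lid, lid)
        if incoming.get(target) == rec:
            continue  # the remote already holds exactly this state
        resolved.append((target, rec))
    return resolved


def flatten(changes):
    """Stage 4 (optional): keep only the last change per id."""
    latest = {}
    for target, rec in changes:
        latest[target] = rec
    return list(latest.items())


def replicate_out(server, changes):
    """Stage 5: upload whatever survived resolution."""
    for target, rec in changes:
        server[target] = dict(rec)


def sync(server, local_changes):
    incoming = replicate_in(server)
    equivalences = deduce_equivalences(local_changes, incoming)
    outgoing = flatten(resolve(local_changes, incoming, equivalences))
    replicate_out(server, outgoing)
    return equivalences, outgoing
```

Because each stage returns a value instead of mutating shared state, the intermediate results (the downloaded records, the equivalences, the resolved changes) can be inspected or stored between stages.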

Original implementations of Firefox Sync conflated all of these stages:

Records are downloaded one-by-one.
Equivalences are calculated on the fly, in memory.
Conflicts are immediately resolved by overwriting local state in place or discarding incoming records.
Replication out is blind copying of post-resolution local state.

This design is flawed in many ways, which are well documented in bug reports.

(Being designed to support the original clients, the Firefox Sync server itself doesn't preserve the prior state of a record, or indeed exchange changes at all — after a client has resolved changes to some entity, the previous server state is entirely replaced. This makes backup/restore, cross-record consistency, tracking down bugs, data recovery, and less simplistic conflict resolution and detection nearly impossible.)

Improved versions of Sync began to tease these client-side stages apart:

Two-phase application (buffering downloaded records before applying them en masse) partly and temporarily separated out replication, which offers more options for detecting and resolving conflicts and inconsistencies.
Structural/transactional record application on iOS separated out equivalence and resolution (albeit only within the scope of a sync, with no permanent record of those states).
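
A rough sketch of what two-phase application buys, under assumed record shapes (each record is a dict with an `id` and a `parent`, where `parent` is `None` for a root). The function name and the orphan check are hypothetical and are not taken from any Sync client; the point is only that a whole-batch check becomes possible once records are buffered.

```python
def apply_two_phase(local, incoming_records):
    """Buffer a batch, validate it as a whole, then apply it in one step.

    Returns the ids of records whose parent is unknown. If that list is
    non-empty, nothing has been applied and `local` is unchanged.
    """
    # Phase 1: buffer everything; local state is not touched yet.
    buffer = {rec["id"]: rec for rec in incoming_records}

    # With the whole batch in hand, arrival order no longer matters:
    # a child may be downloaded before its parent.
    known = set(local) | set(buffer)
    orphans = sorted(
        rid for rid, rec in buffer.items()
        if rec["parent"] is not None and rec["parent"] not in known
    )
    if orphans:
        return orphans

    # Phase 2: apply en masse.
    local.update(buffer)
    return []
```

With one-by-one application, a child that arrives before its parent has to be applied (or discarded) before the client knows whether the parent will ever show up. Here that decision is deferred until the batch is complete.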

Mentat offers the opportunity to truly separate these:

Replicate down remote datoms, storing them whole.¹
Derive equivalences and new facts by comparing local and remote data. Some of this can be explicit in the datom stream (e.g., by describing new entities via lookup refs), and some can be schema-derived (cardinality constraints).
Store those equivalences. These are part of the data model: they’ll be used when merging datom streams, and are necessary for examining history. If two systems can both operate offline, then merging their data requires either a dictionary of equivalences or a history-rewriting mechanism. Rewriting history is expensive, so…
Detect and resolve conflicts. This is relatively easy compared to Sync: the full history of changes on both sides (modulo excision and history rewriting) is available, so the only conflicts will be real conflicts. Ideally all of these conflicts will be schema-encoded: two cardinality-one assertions for the same entity, for example. Some can be domain-level and detected by code: prior art around automatic conflict detection doesn't convince me that domain-level conflict resolution is redundant.
Store appropriate assertions and retractions to record resolved conflicts. We now have a concrete, permanent record of exactly what happened during a sync!
Commit the transaction and make it available for replication. Now other devices can also see exactly how we resolved conflicts.
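
The steps above can be sketched with datoms modelled as plain `(entity, attribute, value)` tuples. This is not Mentat's API: the schema sets, attribute names, function names, and the `pick` policy are all hypothetical, and using a unique-identity attribute to derive equivalences is an assumption of the sketch.

```python
# Hypothetical schema: which attributes are unique identities and which
# allow only one value per entity.
UNIQUE_IDENTITY = {"page/url"}
CARDINALITY_ONE = {"page/title"}


def derive_equivalences(local, remote):
    """Map local entity ids to remote ids that share a unique-identity value."""
    remote_by_key = {(a, v): e for e, a, v in remote if a in UNIQUE_IDENTITY}
    return {
        e: remote_by_key[(a, v)]
        for e, a, v in local
        if a in UNIQUE_IDENTITY
        and (a, v) in remote_by_key
        and remote_by_key[(a, v)] != e
    }


def rewrite(datoms, equivalences):
    """Express local datoms in terms of remote entity ids."""
    return [(equivalences.get(e, e), a, v) for e, a, v in datoms]


def find_conflicts(local, remote):
    """Return {(e, a): (local_v, remote_v)} for clashing cardinality-one values."""
    remote_values = {(e, a): v for e, a, v in remote if a in CARDINALITY_ONE}
    conflicts = {}
    for e, a, v in local:
        if a in CARDINALITY_ONE and remote_values.get((e, a), v) != v:
            conflicts[(e, a)] = (v, remote_values[(e, a)])
    return conflicts


def resolution_datoms(conflicts, pick):
    """Record each resolution explicitly as a retraction plus an assertion.

    `pick(local_v, remote_v)` is a caller-supplied policy returning the winner.
    """
    out = []
    for (e, a), (local_v, remote_v) in conflicts.items():
        winner = pick(local_v, remote_v)
        loser = remote_v if winner == local_v else local_v
        out.append(("retract", e, a, loser))
        out.append(("assert", e, a, winner))
    return out
```

For example, if the remote has entity 100 with URL `https://example.com/` and title `Example`, and the local store has entity 7 with the same URL but title `Example Domain`, the derived equivalence is `{7: 100}`, the only conflict is on `(100, "page/title")`, and a remote-wins policy yields a retraction of `Example Domain` and an assertion of `Example`. Those two datoms are the permanent record of the resolution that other devices would see.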

¹ The presence of cardinality and uniqueness constraints implies that this isn’t direct storage; in an RDF/OWL world, this would be direct storage! However, if we squint at Mentat’s concept of a transaction — which, after all, includes states that would be invalid if the transaction were split in pieces — then we might be able to achieve this. Think about SQLite’s PRAGMA defer_foreign_keys.
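
For readers unfamiliar with the SQLite pragma mentioned in the footnote, the following sketch uses Python's standard `sqlite3` module to show the behaviour being alluded to: a transaction that passes through a state that would be invalid on its own, but that is valid as a whole at commit. The table names and the helper function are made up for the illustration.

```python
import sqlite3


def insert_child_then_parent(defer):
    """Insert a child row before its parent inside one transaction.

    Returns True if the transaction committed, False if SQLite rejected it.
    """
    # isolation_level=None: we issue BEGIN/COMMIT/ROLLBACK ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER NOT NULL REFERENCES parent(id))"
    )
    conn.execute("BEGIN")
    try:
        if defer:
            # Delay foreign-key enforcement until COMMIT. SQLite switches
            # this back off at the end of every transaction.
            conn.execute("PRAGMA defer_foreign_keys = ON")
        conn.execute("INSERT INTO child (id, parent_id) VALUES (1, 10)")
        conn.execute("INSERT INTO parent (id) VALUES (10)")
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return False
    finally:
        conn.close()
```

Without the pragma, the first insert fails immediately because the parent row does not exist yet. With it, the same statements succeed, because the constraint is only checked once the transaction as a whole is complete.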
