This repository was archived by the owner on Sep 12, 2018. It is now read-only.

Thoughts: what is a sync, really?

Richard Newman edited this page Apr 1, 2017 · 3 revisions

Syncing consists of the following operations, typically in this order:

  • Replication in of remote state.
  • Deduction of equivalences.
  • Resolution of conflicts, and rewriting or eliding local changes with respect to equivalences and new remote state.
  • Optional flattening of resolved local changes à la git rebase.
  • Replication out of local state.
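
To make the ordering concrete, here is a minimal sketch of the five stages as separate functions. Everything in it is an assumption made for illustration: a change is modelled as an `(entity, attribute, value)` triple, equivalence is decided by a shared value for a hypothetical key attribute (`url`), and flattening treats every attribute as single-valued. It does not detect conflicts; it only shows the stages kept apart.

```python
# Hypothetical model: a change is an (entity, attribute, value) triple.

def replicate_in(remote):
    """Stage 1: take a complete copy of the remote changes before acting."""
    return list(remote)

def deduce_equivalences(local, remote, key_attr):
    """Stage 2: map local entity ids to remote ids that share a key value."""
    remote_by_key = {v: e for (e, a, v) in remote if a == key_attr}
    return {e: remote_by_key[v]
            for (e, a, v) in local
            if a == key_attr and v in remote_by_key and remote_by_key[v] != e}

def resolve(local, remote, equivalences):
    """Stage 3: rewrite local changes to use remote ids, eliding duplicates."""
    seen = set(remote)
    resolved = []
    for (e, a, v) in local:
        change = (equivalences.get(e, e), a, v)
        if change not in seen:
            resolved.append(change)
    return resolved

def flatten(changes):
    """Stage 4 (optional): keep only the last change per (entity, attribute)."""
    latest = {}
    for (e, a, v) in changes:
        latest.pop((e, a), None)  # re-insert so order follows the last write
        latest[(e, a)] = v
    return [(e, a, v) for (e, a), v in latest.items()]

def sync(local, remote, key_attr="url"):
    incoming = replicate_in(remote)
    equivalences = deduce_equivalences(local, incoming, key_attr)
    resolved = resolve(local, incoming, equivalences)
    return flatten(resolved)  # Stage 5: this is what gets replicated out

remote = [("r1", "url", "https://example.com/"), ("r1", "title", "Example")]
local = [
    ("l1", "url", "https://example.com/"),
    ("l1", "title", "Example Domain"),
    ("l1", "title", "Example Site"),
]
outgoing = sync(local, remote)
print(outgoing)  # [('r1', 'title', 'Example Site')]
```

The local `url` change is elided because the remote already has it under the equivalent entity, and the two local title changes are flattened into one.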

Original implementations of Firefox Sync conflated all of these stages:

  • Records are downloaded one-by-one.
  • Equivalences are calculated on the fly, in memory.
  • Conflicts are immediately resolved by overwriting local state in place or discarding incoming records.
  • Replication out is blind copying of post-resolution local state.

This approach is flawed in many ways, which are well documented in bugs.

Improved versions of Sync began to tease these apart:

  • Two-phase application — buffering downloaded records — partly and temporarily separated out replication, which offers more options for detection and resolution of conflicts and inconsistencies.
  • Structural/transactional record application on iOS separated out equivalence and resolution (albeit only within the scope of a sync, with no permanent record of those states).
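
As a rough illustration of why buffering helps, the sketch below separates "download everything" from "apply everything". It is not Firefox Sync's actual code: the record shape (`id` and `parent` keys) and the function names are hypothetical. Because the whole batch is visible before anything is applied, an inconsistency can be detected without having already modified local state.

```python
def buffer_records(incoming):
    """Phase 1: collect every downloaded record without touching local state."""
    return {record["id"]: record for record in incoming}

def find_orphans(buffer, local):
    """Inspect the whole buffer: which records name a parent nobody has?"""
    known = set(buffer) | set(local)
    return sorted(r["id"] for r in buffer.values()
                  if r["parent"] is not None and r["parent"] not in known)

def apply_buffer(buffer, local):
    """Phase 2: apply the buffered records only if the set is consistent."""
    orphans = find_orphans(buffer, local)
    if orphans:
        return False, orphans  # local state is left untouched
    local.update(buffer)
    return True, []

local = {"root": {"id": "root", "parent": None}}

# The child arrives before its parent; order no longer matters once buffered.
incoming = [
    {"id": "child", "parent": "folder"},
    {"id": "folder", "parent": "root"},
]
ok, orphans = apply_buffer(buffer_records(incoming), local)
print(ok, sorted(local))

# A record whose parent never arrives is rejected as a batch.
broken = [{"id": "stray", "parent": "missing"}]
ok2, orphans2 = apply_buffer(buffer_records(broken), local)
print(ok2, orphans2)
```

A record-by-record applier would have had to either reject `child` on arrival or apply it speculatively; with a buffer, that decision is deferred until the full picture is available.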

Mentat offers the opportunity to truly separate these:

  • Replicate down remote datoms, storing them whole.[1]
  • Derive equivalences and new facts by comparing local and remote data. Some of this can be explicit in the datom stream (e.g., by defining new entities with lookup refs), and some can be schema-derived (cardinality constraints).
  • Store those equivalences. These are part of the data: they’ll be used when merging datom streams, and are necessary for examining history. If two systems can both operate offline, then either a dictionary or a history-rewriting mechanism is necessary to merge data. Rewriting history is expensive, so…
  • Detect and resolve conflicts. This is relatively easy compared to Sync: the full history of changes on both sides (modulo excision and history rewriting) is available, so the only conflicts will be real conflicts. Ideally all of these conflicts will be schema-encoded: two cardinality-one assertions for the same entity, for example. Some can be domain-level and detected by code.
  • Store assertions and retractions to resolve conflicts. We now have a concrete, permanent record of exactly what happened during a sync!
  • Commit the transaction and make it available for replication. Now other devices can also see exactly how we resolved conflicts.
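
The steps above can be sketched in a few functions. This is not Mentat's API: the datom shape, the schema dictionary, the `sync/same-as` attribute, and the "local value wins" policy are all assumptions made for the example. It also ignores retractions within each stream and looks only at assertions, which a real implementation could not do.

```python
from collections import namedtuple

# Hypothetical datom shape: entity, attribute, value, transaction, added flag.
Datom = namedtuple("Datom", "e a v tx added")

# Hypothetical schema: which attributes identify an entity, which hold one value.
SCHEMA = {
    "page/url":   {"unique": True,  "cardinality": "one"},
    "page/title": {"unique": False, "cardinality": "one"},
}

def derive_equivalences(local, remote):
    """Pair local and remote entities that assert the same unique value."""
    remote_ids = {(d.a, d.v): d.e for d in remote
                  if d.added and SCHEMA[d.a]["unique"]}
    return {d.e: remote_ids[(d.a, d.v)] for d in local
            if d.added and SCHEMA[d.a]["unique"] and (d.a, d.v) in remote_ids}

def find_conflicts(local, remote, equivalences):
    """Find cardinality-one attributes given different values on each side."""
    remote_vals = {(d.e, d.a): d.v for d in remote if d.added}
    conflicts = []
    for d in local:
        if not d.added or SCHEMA[d.a]["cardinality"] != "one":
            continue
        e = equivalences.get(d.e, d.e)
        if (e, d.a) in remote_vals and remote_vals[(e, d.a)] != d.v:
            conflicts.append((e, d.a, d.v, remote_vals[(e, d.a)]))
    return conflicts

def resolution_tx(equivalences, conflicts, tx):
    """Record the merge as datoms: equivalences, then retract/assert pairs."""
    datoms = [Datom(l, "sync/same-as", r, tx, True)
              for l, r in sorted(equivalences.items())]
    for e, a, local_v, remote_v in conflicts:
        # Arbitrary policy for this sketch: the local value wins.
        datoms.append(Datom(e, a, remote_v, tx, False))  # retract remote value
        datoms.append(Datom(e, a, local_v, tx, True))    # assert local value
    return datoms

remote = [
    Datom("r7", "page/url", "https://example.com/", 100, True),
    Datom("r7", "page/title", "Example", 100, True),
]
local = [
    Datom("l1", "page/url", "https://example.com/", 200, True),
    Datom("l1", "page/title", "Example Domain", 200, True),
]

equivalences = derive_equivalences(local, remote)
conflicts = find_conflicts(local, remote, equivalences)
merge = resolution_tx(equivalences, conflicts, tx=300)
for datom in merge:
    print(datom)
```

The resulting transaction contains one equivalence datom and one retract/assert pair for the title. Because the resolution is itself a list of datoms, it can be stored and replicated like any other transaction, which is the point of the last two bullets.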

[1] The presence of cardinality and uniqueness constraints implies that this isn’t direct storage; in an RDF/OWL world, this would be direct storage! However, if we squint at Mentat’s concept of a transaction — which, after all, includes states that would be invalid if the transaction were split in pieces — then we might be able to achieve this. Think about SQLite’s PRAGMA defer_foreign_keys.
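
For readers unfamiliar with the SQLite pragma mentioned in the footnote, here is a small demonstration using Python's standard `sqlite3` module. It illustrates only the analogy (constraints checked at commit rather than per statement); it says nothing about how Mentat stores datoms. The table names are made up, and it assumes an SQLite build with foreign key support enabled.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN/COMMIT statements below are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, "
             "parent_id INTEGER NOT NULL REFERENCES parent(id))")

# Without deferral, an out-of-order insert is rejected immediately.
conn.execute("BEGIN")
try:
    conn.execute("INSERT INTO child VALUES (1, 10)")
    immediate_failed = False
except sqlite3.IntegrityError:
    immediate_failed = True
conn.execute("ROLLBACK")

# With deferral, the constraint is checked only at COMMIT, so the
# transaction may pass through states that would be invalid on their own.
conn.execute("BEGIN")
conn.execute("PRAGMA defer_foreign_keys = ON")
conn.execute("INSERT INTO child VALUES (1, 10)")  # parent 10 does not exist yet
conn.execute("INSERT INTO parent VALUES (10)")    # now it does
conn.execute("COMMIT")

rows = conn.execute("SELECT id, parent_id FROM child").fetchall()
print(immediate_failed, rows)
```

The pragma applies to the current transaction only and is reset at commit, so deferral has to be requested each time.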