Future Internet (May 2023)
Synchronizing Many Filesystems in Near Linear Time
Abstract
Finding a provably correct subquadratic synchronization algorithm for many filesystem replicas is one of the main theoretical problems in operational transformation (OT) and conflict-free replicated data type (CRDT) frameworks. Based on the algebraic theory of filesystems, which incorporates non-commutative filesystem commands natively, we developed an algorithm suite, together with a proof-of-concept implementation, that synchronizes an arbitrary number of replicas. The result is provably correct, and the synchronized system is created in linear space and time after an initial sorting phase. The algorithm works by identifying conflicting command pairs and requesting that one command of each pair be removed. The method can be guided to reach any of the theoretically possible synchronized states. The algorithm also allows asynchronous usage: after the client sends a synchronization request, the local replica remains available for further modifications, and when the synchronization instructions arrive, they can be merged with the changes made since the request. The suite also works on filesystems with a directed acyclic graph-based path structure in place of the traditional tree-like arrangement. Consequently, our algorithms apply to filesystems with hard or soft links as long as the links create no loops.
Keywords