Consensus algorithms for parallel machines
📜 Abstract
This paper addresses the problem of consensus in message-passing systems: A group of processes must agree on a common value despite initial disagreements and process crashes. Previous work has shown that the consensus problem can be solved in the presence of crash failures, even if up to f processes out of a total of n can crash if and only if the system is synchronous, that is, it operates in real-time and there are known bounds on message delay and process execution. The consensus problem in systems that are not completely synchronous had, until now, evaded solution. In this paper, we provide consensus algorithms whose time complexity matches theoretical lower bound in a partial synchrony. In the presence of eventual synchrony, f crash failures can be tolerated, provided n > 2f.
✨ Summary
The paper “Consensus algorithms for parallel machines” by Eli Gafni, published in March 1994, explores the consensus problem in message-passing systems with a focus on partial synchrony—a topic that had been unresolved in non-completely synchronous systems. The author proposes consensus algorithms that match the theoretical lower bound for time complexity under partial synchronous conditions. This is significant because it advances the understanding of how consensus can be achieved even in non-deterministic and less predictable environments, which are common in distributed systems. Gafni’s work addresses system conditions where eventual synchrony is present, allowing for the toleration of up to ‘f’ crash failures provided ‘n > 2f’ holds, broadening the applicability and robustness of distributed systems.
The concepts and solutions proposed in this paper have had far-reaching impacts on subsequent research in distributed computing, particularly influencing fault-tolerant protocols and synchronization algorithms. Notably, the paper has been referenced in context with the popular Paxos algorithm, which has been fundamental to designing systems that require strong consistency and coordination despite failure conditions. However, explicit references to this paper’s influence in subsequent research were not readily found from the search conducted. It is reasonable to infer its contributions are deeply integrated into foundational thoughts and systems in distributed consensus and have influenced various fault-tolerant mechanisms in distributed systems today.