Zab: High-performance broadcasting using a non-blocking voting scheme

Abstract

In this paper we present Zab, a protocol for achieving high-throughput strongly consistent replication, while avoiding blocking or inhibiting normal operations after a leader failure. Our protocol addresses the complex problem of choosing between ordering of broadcasts and availability during leader failure and recovery. We describe an implementation of the Zab protocol that forms the backbone of the system synchronization of ZooKeeper, a coordination service for distributed applications.

Description

✨ Summary

The paper “Zab: High-performance broadcasting using a non-blocking voting scheme” introduces Zab, a broadcast protocol that provides high-throughput, strongly consistent replication without blocking normal operations after a leader failure. It targets distributed systems that need a reliable broadcast service. The implementation described in the paper forms the backbone of system synchronization in ZooKeeper, a coordination service for distributed applications.

The paper addresses the trade-off between ordering of broadcasts and availability during leader failure and recovery. Its influence shows in later work: it is cited in “ZooKeeper: Wait-free coordination for Internet-scale systems” (https://dl.acm.org/doi/10.1145/1863103.1863117), which highlights its impact on ZooKeeper’s development. ZooKeeper, in turn, is a critical component of the Apache Hadoop ecosystem and of other large-scale distributed systems that rely on it for synchronization and configuration maintenance.