Spanner: Google's Globally-Distributed Database

  • Authors:

📜 Abstract

Spanner is Google’s scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. Spanner is designed to scale up to millions of machines across hundreds of datacenters and trillions of database rows. This paper describes how Spanner is structured, its novel features, the rationale underlying various design decisions, and the experience of creating, supporting, and using Spanner.

✨ Summary

This paper introduces Spanner, a globally-distributed database that Google built to manage large-scale data replicated across many datacenters. Spanner combines global distribution with externally-consistent distributed transactions. It achieves this by assigning commit timestamps from tightly synchronized clocks: GPS receivers and atomic clocks keep clock uncertainty small and bounded, and the paper's TrueTime API exposes that uncertainty to the database so that timestamp order matches the real-time order in which transactions commit. The paper details the architecture and design choices behind Spanner's scalability, availability, and consistency.
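The role of timestamps can be illustrated with a small sketch of the commit-wait rule that the paper describes for its TrueTime API. In the paper, TT.now() returns an interval guaranteed to contain the true time; a transaction picks a timestamp no earlier than the interval's upper bound and delays making its commit visible until that timestamp has definitely passed. The Python below is a toy model, not Spanner's implementation: `ToyTrueTime`, `commit`, and the fixed 5 ms `epsilon` are hypothetical names and values chosen for illustration, and a local monotonic clock stands in for the GPS and atomic-clock time sources.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    """Interval assumed to contain the true time."""
    earliest: float
    latest: float


class ToyTrueTime:
    """Toy clock with a fixed, assumed uncertainty bound (epsilon, in seconds)."""

    def __init__(self, epsilon: float = 0.005, clock=time.monotonic):
        self.epsilon = epsilon
        self._clock = clock  # injectable so the sketch can be tested deterministically

    def now(self) -> TTInterval:
        t = self._clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t: float) -> bool:
        # True only once t has definitely passed, even under worst-case clock error.
        return self.now().earliest > t


def commit(tt: ToyTrueTime, sleep=time.sleep) -> float:
    """Pick a commit timestamp, then wait until it is definitely in the past."""
    s = tt.now().latest          # timestamp no earlier than the upper bound of "now"
    while not tt.after(s):       # commit wait
        sleep(tt.epsilon / 10)
    return s                     # only now would the commit become visible


if __name__ == "__main__":
    tt = ToyTrueTime()
    first = commit(tt)
    second = commit(tt)          # starts only after the first commit has returned
    print(first < second)
```

Because each commit waits out the uncertainty, a transaction that starts after an earlier one has returned always receives a larger timestamp. The cost is a wait of roughly twice `epsilon` per commit, which is why tight clock bounds matter.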

Spanner has significantly influenced both the research community and industry by demonstrating that a globally distributed, highly scalable database can still provide strong consistency. It became a reference point for the systems now known as NewSQL databases, which aim to combine the scalability traditionally associated with NoSQL systems with the ACID guarantees of traditional relational databases.

Spanner's concepts have been built upon in later systems such as CockroachDB, an open-source database that aims to offer a similar distributed, strongly-consistent design (CockroachDB). Cloud providers have also introduced distributed databases inspired by Spanner, and Google itself offers it as a managed service (Google Cloud Spanner).

Overall, Spanner’s approach to global data distribution, consistency, and scale has paved the way for later distributed databases and shaped how such systems are designed. The paper is widely cited in academic and industry literature on distributed systems and database scalability (DBLP).