
Bigtable: A Distributed Storage System for Structured Data

  • Authors: Fay Chang et al.

📜 Abstract

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size and latency requirements. Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper, we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.

✨ Summary

The paper “Bigtable: A Distributed Storage System for Structured Data” by Fay Chang et al. was published at the OSDI conference in November 2006. Google developed Bigtable to manage structured data at scale: it is designed to span thousands of commodity servers and to hold petabytes of data. Several Google products, including web indexing, Google Earth, and Google Finance, store their data in Bigtable, and the paper credits its flexible data model and high performance for serving such varied workloads.

Bigtable has influenced both academic research and industrial practice. In academia, it has motivated further work on distributed storage systems, data models, and database performance, and it is frequently cited in the literature on NoSQL databases and large-scale data management. Its design principles were adopted by popular open-source projects such as Apache HBase (which began within the Apache Hadoop project) and Apache Cassandra. Both have been discussed extensively in research papers and technical blogs on modern data infrastructure (for example, the survey paper on NoSQL databases https://ieeexplore.ieee.org/document/7006512, or the performance comparison study https://link.springer.com/article/10.1007/s10115-013-0691-z). These projects brought Bigtable’s underlying concepts to organizations beyond Google, including Facebook (which developed Cassandra) and many enterprises that use HBase for scalable data storage.
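The “simple data model” mentioned in the abstract is described in the paper as a sparse, sorted, multidimensional map indexed by row key, column key, and timestamp. The following is a minimal single-process sketch of that idea only, not Bigtable's implementation or any real client API. The class name `ToyTable` and its methods `put`, `get`, and `scan` are hypothetical names chosen for illustration, and the sample rows are illustrative data.

```python
from bisect import bisect_left, insort


class ToyTable:
    """Hypothetical in-memory toy: (row, column, timestamp) -> value."""

    def __init__(self):
        self._rows = {}      # row key -> {column key -> [(-timestamp, value), ...]}
        self._row_keys = []  # row keys kept in sorted (lexicographic) order

    def put(self, row, column, timestamp, value):
        """Store one timestamped version of a cell."""
        if row not in self._rows:
            insort(self._row_keys, row)
            self._rows[row] = {}
        versions = self._rows[row].setdefault(column, [])
        # Negated timestamp keeps the newest version at index 0.
        insort(versions, (-timestamp, value))

    def get(self, row, column):
        """Return the newest value of a cell, or None if it is absent."""
        versions = self._rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start_row, end_row):
        """Yield (row, newest cells) for start_row <= row < end_row, in order."""
        lo = bisect_left(self._row_keys, start_row)
        hi = bisect_left(self._row_keys, end_row)
        for row in self._row_keys[lo:hi]:
            cells = self._rows[row]
            yield row, {col: vers[0][1] for col, vers in cells.items()}


# Illustrative data: reversed-hostname row keys keep pages of one domain adjacent.
table = ToyTable()
table.put("com.cnn.www", "contents:", 3, "<html>v1")
table.put("com.cnn.www", "contents:", 5, "<html>v2")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
table.put("org.example", "contents:", 1, "<html>ex")

newest = table.get("com.cnn.www", "contents:")
com_rows = [row for row, _ in table.scan("com.", "com/")]
```

Because rows are kept in sorted order, a range scan over a key prefix touches only neighbouring rows. This is one way a client can influence data layout through its choice of row keys, which is the kind of control the abstract alludes to.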