
Making reliable distributed systems in the presence of software errors

  • Authors: Joe Armstrong

📜 Abstract

This thesis addresses the problem of constructing reliable, fault-tolerant software systems. We develop a novel approach by integrating concepts from the functional, logical, and concurrent programming paradigms. Our model is strongly inspired by ideas from the Erlang programming language. We show how programs written in an error-faulting, concurrent language such as Erlang can be made highly reliable by structuring them as communicating processes. The thesis makes contributions in several areas: concurrent programming, reliable systems, testing, and the theory and practice of functional programming.

✨ Summary

This doctoral thesis by Joe Armstrong, presented in December 2003, is a seminal work on constructing reliable distributed systems in the presence of software errors. The thesis introduces and elaborates on the idea of using the Erlang programming language to build systems where reliability and fault-tolerance are paramount. Erlang’s approach to error handling, based on the “let it crash” philosophy, organizes programs into isolated processes that communicate only via message passing, so that a fault in one process can be detected and recovered from without corrupting the others. This model is central to achieving high reliability, particularly in systems requiring continuous operation, such as telecommunication systems.
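The “let it crash” idea described above can be illustrated with a minimal sketch. It is written in Python rather than Erlang and is not taken from the thesis: the names `worker`, `supervise`, `demo`, and `STOP` are hypothetical, and Python threads share memory, so they only approximate Erlang's isolated processes. The worker contains no defensive error handling; a separate supervisor notices the crash and starts a fresh worker on the same mailbox.

```python
import queue
import threading

STOP = object()  # sentinel message asking the worker to shut down cleanly


def worker(inbox, outbox):
    """Handle messages with no defensive code: a bad message simply raises."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            return
        outbox.put(10 // msg)  # raises ZeroDivisionError when msg == 0


def supervise(inbox, outbox, max_restarts=3):
    """Run the worker, restarting it whenever it crashes.

    Returns the number of restarts that were needed. Raises RuntimeError
    if the worker crashes more than max_restarts times.
    """
    restarts = 0
    while True:
        crashed = []

        def run():
            # The wrapper only records the failure, much like an exit
            # signal; the recovery decision belongs to the supervisor.
            try:
                worker(inbox, outbox)
            except Exception as exc:
                crashed.append(exc)

        thread = threading.Thread(target=run)
        thread.start()
        thread.join()
        if not crashed:
            return restarts
        restarts += 1
        if restarts > max_restarts:
            raise RuntimeError("restart limit exceeded") from crashed[0]


def demo():
    """Send four messages, one of which crashes the worker, then STOP."""
    inbox, outbox = queue.Queue(), queue.Queue()
    for msg in (1, 2, 0, 5, STOP):
        inbox.put(msg)
    restarts = supervise(inbox, outbox)
    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return restarts, results
```

Calling `demo()` sends the messages 1, 2, 0, and 5: the message 0 crashes the worker, the supervisor restarts it once, and the remaining message is still processed. The restart limit is a design choice in this sketch, so that a worker which crashes on every message eventually causes the supervisor itself to fail instead of looping forever.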

The thesis has had a significant impact on both academia and industry, most visibly in the adoption of Erlang in telecommunications and in other industries that require robust distributed systems. Research influenced by this thesis includes work on concurrent and functional programming languages, distributed computing models, and approaches to improving system reliability.

This thesis established an influential foundation for understanding and implementing fault-tolerant distributed systems. Its ideas have been widely adopted and remain relevant in current applications.