20 Jun 24

TLA+ is a high-level language for modeling programs and systems – especially concurrent and distributed ones. It’s based on the idea that the best way to describe things precisely is with simple mathematics. TLA+ and its tools are useful for eliminating fundamental design errors, which are hard to find and expensive to correct in code.


13 Jul 21

This talk explores architectures that have emerged based on event logs, what they are useful for, and the consequences of different approaches. The focus is on quality attributes such as fault tolerance, scalability, operations, integration, and development concerns.


30 May 20

A git repository that serves as a tutorial for analyzing a blocking queue implementation using TLA+; each git commit introduces a new concept.


15 Feb 20

Physalia is a transactional key-value store, optimized for use in large-scale cloud control planes, which takes advantage of knowledge of transaction patterns and infrastructure design to offer both high availability and strong consistency to millions of clients. Physalia uses its knowledge of datacenter topology to place data where it is most likely to be available. Instead of being highly available for all keys to all clients, Physalia focuses on being extremely available for only the keys it knows each client needs, from the perspective of that client. This paper describes Physalia in context of Amazon EBS, and some other uses within Amazon Web Services.


09 Aug 16

This list is for the new distributed systems engineer to guide their thinking about the field they are taking on. It’s not comprehensive, but it’s a good beginning.
