
- Chapter 1: Reliable, Scalable, and Maintainable Applications.
- Chapter 2: Data Models and Query Languages.
- What happens if nodes crash or temporarily go offline? Are messages lost?
- What happens if the producer is faster than the consumer?
- In practice, making data available quickly, even in a quirky, difficult-to-use format, is more valuable than trying to decide on the ideal data model up front.
- XA transactions: "just" a C API for interfacing with the 2PC coordinator.
- Two-Phase Commit (2PC) blocks if the coordinator crashes.
- Causal consistency is the strongest consistency model that does not slow down due to network delays and remains available in the face of network failures.
- I need to reread this chapter 10 more times.
- Linearizability is slow, and this is true all the time, not only during a network fault.
- Due to network delays, quorums do not guarantee linearizability.
- Linearizability: make a system appear as if there were only one copy of the data and all operations on it are atomic.
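On the producer-faster-than-consumer question: one common answer is backpressure via a bounded buffer, so the producer blocks instead of messages being dropped or memory growing without bound. A minimal sketch (the names are mine, not from the book):

```python
import queue
import threading

# A bounded queue applies backpressure: `put` blocks once the buffer is
# full, throttling the producer to the consumer's rate.
buf = queue.Queue(maxsize=3)
consumed = []

def consumer():
    while True:
        item = buf.get()
        if item is None:      # sentinel: producer is done
            break
        consumed.append(item)

t = threading.Thread(target=consumer)
t.start()
for i in range(10):           # producer "bursts" 10 items
    buf.put(i)                # blocks whenever 3 items are in flight
buf.put(None)
t.join()
print(consumed)               # all 10 items arrive, in order, none lost
```

Message brokers make a different trade-off: they buffer on disk instead of blocking the producer, which is why the "are messages lost?" question matters.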
- The server can check whether a client still holds the lock/lease by remembering the highest fencing token it has seen so far and rejecting any write with an older token.
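The fencing-token check can be sketched like this (a toy storage server of my own invention, not the book's code): the lock service hands out monotonically increasing tokens, and the storage server refuses writes whose token is lower than the highest one seen, so a client that paused and lost its lease cannot corrupt data.

```python
# Hypothetical sketch of fencing tokens. The storage server tracks the
# highest token seen and rejects writes carrying a stale (lower) token.
class StorageServer:
    def __init__(self):
        self.max_token_seen = 0
        self.data = None

    def write(self, token, value):
        if token < self.max_token_seen:
            raise PermissionError(f"stale fencing token {token}")
        self.max_token_seen = token
        self.data = value

server = StorageServer()
server.write(33, "from client 1")   # client 1 holds a lease, token 33
server.write(34, "from client 2")   # lease expired; client 2 got token 34
try:
    server.write(33, "late write")  # client 1 wakes from a long GC pause
except PermissionError as e:
    print(e)                        # stale fencing token 33
```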
- A clock reading should return a range of time plus a confidence level, instead of a point in time.
- Google assumes 6 ms of drift for a clock synchronized with NTP every 30 seconds, and 17 seconds of drift if the clock is synchronized only once a day.
- Human error is the major cause of network outages.
- When a timeout occurs, you still don't know whether the remote node got your request, or whether it is still queued.
- If you send a request to another node and don't receive a response, it is impossible to tell why.
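Both drift figures above follow from assuming a 200 ppm (parts-per-million) maximum clock drift rate, the figure the book attributes to Google; the worked arithmetic:

```python
# Clock drift = drift rate x time since last synchronization,
# assuming a 200 ppm maximum drift rate.
DRIFT_RATE = 200e-6  # 200 ppm = 200 microseconds of drift per second

drift_30s = DRIFT_RATE * 30          # synced with NTP every 30 seconds
drift_1d = DRIFT_RATE * 24 * 3600    # synced only once a day

print(f"{drift_30s * 1000:.0f} ms")  # 6 ms
print(f"{drift_1d:.2f} s")           # 17.28 s, i.e. roughly 17 seconds
```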
- In a system with thousands of nodes, something is always broken.
- The SQL standard's definition of isolation levels is flawed.
- Consistency is a property of the application, not the database.
- High availability, low latency, occasional stale reads.
- Multi-datacenter operation, offline clients, collaborative editing.
- Scalability via read-only replicas requires asynchronous replication.
- Two operations are concurrent if neither "happens before" the other.
- RPC and location transparency: there is no point in making a remote service look too much like a local object, because it is a fundamentally different thing.
- Base64 encoding increases data size by 33%.
- Less convenient for simple one-off queries.
- Good for evolvability: easy to add new relations and properties.
- Data models affect how we think about the problem we are solving.
- Every legacy system is unpleasant in its own way.
- Reliability: systems should work correctly even in the face of adversity, including human error.

These are my notes on Designing Data-Intensive Applications by Martin Kleppmann. It made me smile that there is one chapter dedicated to the perils of distributed programming, when in fact the whole book is one warning after another about all the things that can go wrong. Martin also explains some of the book's contents in his distributed systems course.
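As a footnote on the happens-before note above: the relation can be made mechanical with version vectors. A sketch of my own (not the book's code): operation A happens before B if A's vector is less than or equal to B's in every entry and strictly less in at least one; if neither dominates, the operations are concurrent.

```python
# Comparing version vectors to classify two operations as ordered
# (happens-before) or concurrent. Missing entries count as 0.
def happens_before(a, b):
    keys = set(a) | set(b)
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))

def concurrent(a, b):
    return not happens_before(a, b) and not happens_before(b, a)

v1 = {"node1": 1, "node2": 0}
v2 = {"node1": 1, "node2": 1}   # saw v1, then wrote on node2
v3 = {"node1": 2, "node2": 0}   # wrote on node1 without seeing v2

print(happens_before(v1, v2))   # True: v2 causally follows v1
print(concurrent(v2, v3))       # True: neither saw the other's write
```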