Lectures

The following topics will be presented over the course of the semester. Each topic will be covered in (roughly) one lecture. Lecture notes are linked as they become available.

Course introduction
- CAs’ homework series intro
- Go tutorial
Distributed systems primer
- challenges and goals of distributed systems
- example architectures
Distributed computation
- MapReduce
- Spark
- tradeoffs
Communication models
- remote procedure calls (RPC)
- RPC libraries
- failure models
- semantics
Time and coordination
- challenges
- physical and logical clocks
- distributed mutual exclusion
Agreement in distributed systems
- the atomic commitment problem
- the consensus problem
- use cases for each
- FLP impossibility result for consensus
The transaction abstraction
- ACID semantics
- concurrency control mechanisms
- recovery mechanisms
Atomic commitment protocols
- two-phase commit (2PC)
- blocking nature
Consensus protocols
- Paxos overview, key ideas, basic algorithm
- examples of normal operation and operation under failures
- liveness failure mode
- multi-Paxos
- applications
Case studies from industry
Broader view of isolation and consistency semantics
- isolation: serializability, repeatable reads, read committed, read uncommitted
- consistency: external, sequential, causal, eventual
- mechanisms for each
- performance/usability tradeoffs
Beyond storage and MapReduce: Broader infrastructure systems
- Google’s software stack
- Meta’s software stack
- Hadoop and Spark software stacks
Cluster scheduling
- scheduler architectures and considerations
- frameworks: YARN, Mesos, Borg
- algorithms: dominant resource fairness, bin packing
Testing and model checking
- testing approaches and challenges
- formal specification and model checking
- TLA+ primer
Security and Byzantine fault tolerance

Distributed Systems Fundamentals
