The following topics will be presented over the course of the semester. Each topic will be covered in (roughly) one week of lectures. Lecture notes are linked as they become available.

  1. Course introduction
  2. Distributed systems primer
    • challenges and goals of distributed systems
    • example architectures
  3. Distributed computation (Asaf Cidon invited lecture)
    • MapReduce
    • Spark
    • tradeoffs
  4. Communication models
    • remote procedure calls (RPC)
    • RPC libraries
    • failure models
    • semantics
  5. Time and coordination
    • challenges
    • physical and logical clocks
    • distributed mutual exclusion
  6. Agreement in distributed systems
    • the atomic commitment problem
    • the consensus problem
    • use cases for each
    • the FLP impossibility result for consensus
  7. The transaction abstraction
    • ACID semantics
    • concurrency control mechanisms
    • recovery mechanisms
  8. Atomic commitment protocols
    • two-phase commit (2PC)
    • blocking nature
  9. Consensus protocols
    • Paxos overview, key ideas, basic algorithm
    • examples of normal operation and operation under failures
    • liveness failure mode
    • multi-Paxos
    • applications
  10. Case studies from industry
  11. Broader view of isolation and consistency semantics
    • isolation: serializability, repeatable reads, read committed, read uncommitted
    • consistency: external, sequential, causal, eventual
    • mechanisms for each
    • performance/usability tradeoffs
  12. Beyond storage and MapReduce: broader infrastructure systems
    • Google’s software stack
    • Facebook’s software stack
    • open-source software stacks
  13. Other models of interaction in distributed systems
    • producer-consumer interaction
    • publish/subscribe systems, streaming systems, examples
    • event-driven and microservice architectures
  14. Selected topics in distributed systems security
    • Byzantine fault tolerance
    • authentication protocols: Needham-Schroeder, Kerberos