This page is for the previous offering of this course in Fall 2021, which was fully online.

The next offering of this course will be in Spring 2022 and is expected to be in person.


Distributed systems help programmers aggregate the resources of many networked computers to construct highly available and scalable services. Most of the applications and services we interact with today are distributed, some at enormous scales.

This class teaches design and implementation techniques that enable the building of fast, scalable, fault-tolerant distributed systems. Topics include distributed communication models (e.g., sockets, remote procedure calls, distributed shared memory), distributed synchronization (clock synchronization, logical clocks, distributed mutex), distributed file systems, replication, consistency models, fault tolerance, distributed transactions, agreement and commitment, Paxos-based consensus, MapReduce infrastructures, and scalable distributed databases.

The class combines concepts and algorithms with descriptions of real-world implementations at Google, Facebook, Yahoo, Microsoft, LinkedIn, etc. In addition to lectures, students will get hands-on experience building distributed systems through a series of coding-oriented homeworks. The series, adapted from MIT’s course, implements a fault-tolerant, sharded key/value store.


Grading

Because of the online nature of the Fall 2020 course, there will be no “in-class” quizzes or exams. The grade will be assigned based on performance on the five homeworks.

Additionally, 10% extra credit may be awarded to students who make significant and particularly insightful contributions on Piazza and/or in class throughout the semester. There is no fixed number of these awards, but you should think of awardees as people who have stood out consistently and have improved the class in some significant way.


Prerequisites

The homework series will require a lot of coding. Hence, in this class, we require that you have solid coding experience, particularly in building systems-level components (i.e., not just apps). This can come either from personal or industry experience, or from the following Columbia courses or equivalents:

  1. COMS W3137 Data Structures and Algorithms
  2. COMS W3157 Advanced Programming
  3. COMS W3827 Fundamentals of Computer Systems
  4. COMS W4118 Operating Systems is not required, but it is a big plus for the homework assignments

Please make sure you can meet the resource requirements listed in the homeworks section.


Acknowledgements

This class, along with the materials distributed for it, was inspired by Distributed Systems courses at various institutions: