Transparent, Lightweight Application Execution Replay on Commodity Multiprocessor Operating Systems

Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2010), New York, NY, June 14-18, 2010, pp. 155-166


We present SCRIBE, the first system to provide transparent, low-overhead application record-replay and the ability to go live from replayed execution. SCRIBE introduces new lightweight operating system mechanisms, rendezvous and sync points, to efficiently record nondeterministic interactions such as related system calls, signals, and shared memory accesses. Rendezvous points make a partial ordering of execution based on system call dependencies sufficient for replay, avoiding the recording overhead of maintaining an exact execution ordering. Sync points convert asynchronous interactions that can occur at arbitrary times into synchronous events that are much easier to record and replay. We have implemented SCRIBE without changing, relinking, or recompiling applications, libraries, or operating system kernels, and without any specialized hardware support such as hardware performance counters. It works on commodity Linux operating systems, and commodity multi-core and multiprocessor hardware. Our results show for the first time that an operating system mechanism can correctly and transparently record and replay multi-process and multi-threaded applications on commodity multiprocessors. SCRIBE recording overhead is less than 2.5% for server applications including Apache and MySQL, and less than 15% for desktop applications including Firefox, Acrobat, OpenOffice, parallel kernel compilation, and movie playback.
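To illustrate why a partial ordering can be enough for replay, the following toy sketch records only the order in which threads touch each shared resource, rather than one global order of all events. It is not SCRIBE's implementation: the `Recorder` class, the `replay` function, and the per-resource sequence-number scheme are hypothetical names and simplifications chosen for illustration, and the "threads" are simulated rather than real.

```python
from collections import defaultdict


class Recorder:
    """Toy recorder (hypothetical): logs, per thread, the sequence number
    it obtained on each shared resource it touched."""

    def __init__(self):
        self.next_seq = defaultdict(int)  # resource -> next sequence number
        self.log = defaultdict(list)      # thread -> [(resource, seq), ...]

    def record(self, thread, resource):
        seq = self.next_seq[resource]
        self.next_seq[resource] += 1
        self.log[thread].append((resource, seq))


def replay(log, schedule_order):
    """Replay per-thread logs, polling threads in schedule_order.

    A thread advances only when the resource's replay counter equals the
    sequence number it logged, so accesses to the same resource repeat in
    recorded order while accesses to different resources stay unordered.
    Returns the access order observed on each resource.
    """
    position = {t: 0 for t in log}    # next unreplayed event per thread
    current = defaultdict(int)        # resource -> replay counter
    observed = defaultdict(list)      # resource -> threads in access order
    remaining = sum(len(events) for events in log.values())
    while remaining:
        progressed = False
        for t in schedule_order:
            if position[t] == len(log[t]):
                continue              # this thread has finished
            resource, seq = log[t][position[t]]
            if current[resource] == seq:
                observed[resource].append(t)
                current[resource] += 1
                position[t] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            raise RuntimeError("log is inconsistent: no thread can advance")
    return dict(observed)


# Record an interleaving of two simulated threads over two resources.
rec = Recorder()
rec.record("A", "pipe")
rec.record("B", "file")
rec.record("B", "pipe")
rec.record("A", "file")

# Two different replay schedules reproduce the same per-resource order.
result_ab = replay(rec.log, ["A", "B"])
result_ba = replay(rec.log, ["B", "A"])
```

In this sketch, both schedules yield the same access order on `pipe` and on `file`, even though the relative timing of the `pipe` and `file` events is never logged. That is the intuition behind recording dependencies instead of an exact execution ordering.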



Columbia University Department of Computer Science