Record and vPlay: Problem Determination with Virtual Replay Across Heterogeneous Systems
Ph.D. Thesis, Department of Computer Science, Columbia University, January 2012
Application down time is one of the major reasons for revenue loss in the modern enterprise. While aggressive release schedules cause frail software to be released, application failures occurring in the field cost millions to the technical support organizations in personnel time. Since developers usually don’t have direct access to the field environment for a variety of privacy and security reasons, problems are reproduced, analyzed and fixed in very different lab environments. However, the complexity and diversity of application environments make it difficult to accurately replicate the production environment. The indiscriminate collection of data provided by the bug reports often overwhelm or even mislead the developer. A typical issue requires time consuming rounds of clarifications and interactions with the end user, even after which the issue may not manifest.
This dissertation introduces vPlay, a software problem determination system which captures software bugs as they occur in the field into small and self-contained recordings, and allows them to be deterministically reproduced across different operating systems and heterogeneous environments. vPlay makes two key advances over the state of the art. First, the recorded bug can be reproduced in a completely different operating system environment without any kind of dependency on the source. vPlay packages up every piece of data necessary to correctly reproduce the bug on any stateless target machine in the developer environment, without the application, its binaries, and other support data. Second, the data captured by vPlay is small, typically amounting to a few megabytes. vPlay achieves this without requiring changes to the applications, base kernel or hardware.
vPlay employs a recording mechanism which provides data level independence between the application and its source environment by adopting a state machine model of the application to capture every piece of state accessed by the application. vPlay minimizes the size of the recording through a new technique called partial checkpointing, to efficiently capture the partial intermediate state of the application required to replay just the last few moments of its execution prior to the failure. The recorded state is saved as a partial checkpoint along with metadata representing the information specific to the source environment, such as calling convention used for the system calls on the source system, to make it portable across operating systems. A partial checkpoint is loaded by a partial checkpoint loader, which itself is designed to be portable across diÂ¥erent operating systems. Partial checkpointing is combined with a logging mechanism, which monitors the application to identify and record relevant accessed state for root cause analysis and to record application’s nondeterministic events.
vPlay introduces a new type of virtualization abstraction called vPlay Container, to natively replay an application built for one operating system on another. vPlay Container relies on the self-contained recording produced by vPlay to decouple the application from the target operating system environment in three key areas. The application is decoupled from (1) the address space and its content by transparently fulfilling its memory accesses, (2) the instructions and the processor MMU structures such as segment descriptor tables through a binary translation technique designed specifically for user application code, (3) the operating system interface and its services by abstracting the system call interface through emulation and replay. To facilitate root cause analysis, vPlay Container integrates with a standard debugger to enable the user to set breakpoints and single step the replayed execution of the application to examine the contents of variables and other program state at each source line. We have implemented a vPlay prototype which can record unmodified Linux applications and natively replay them on different versions of Linux as well as Windows. Experiments with several applications including Apache and MySQL show that vPlay can reproduce real bugs and be used in production with modest recording overhead.subhravetithesis