Recent Publications
Teaching Operating Systems Using Android
Abstract
The computing landscape is shifting towards mobile and embedded devices. To learn about operating systems, it is increasingly important for students to gain hands-on kernel programming experience in these environments, which are quite different from traditional desktops and servers. We present our work at Columbia University to teach operating systems by leveraging Android, an open, commercially supported software platform increasingly used on mobile and embedded devices. We introduce a series of five Android kernel programming projects suitable for a one semester introductory operating systems course. Each project teaches a core operating systems concept infused with Android or mobile device-specific context, such as Android-specific process relationships, use of sensors, and design considerations for resource constrained mobile devices. We also introduce an Android virtual laboratory based on virtual appliances, distributed version control, and live demonstrations which gives students hands-on Android experience, all with minimal computing infrastructure. We have used these Android kernel programming projects and virtual lab to teach an introductory operating systems course. Despite mistakes and mis-steps from teaching the course for the first time using Android, over 80% of students surveyed enjoyed using Android in the course, and students preferred Android to traditional desktop development by 3 to 1.
Record and vPlay: Problem Determination with Virtual Replay Across Heterogeneous Systems
Abstract
Application down time is one of the major reasons for revenue loss in the modern enterprise. While aggressive release schedules cause frail software to be released, application failures occurring in the field cost millions to the technical support organizations in personnel time. Since developers usually don’t have direct access to the field environment for a variety of privacy and security reasons, problems are reproduced, analyzed and fixed in very different lab environments. However, the complexity and diversity of application environments make it difficult to accurately replicate the production environment. The indiscriminate collection of data provided by the bug reports often overwhelm or even mislead the developer. A typical issue requires time consuming rounds of clarifications and interactions with the end user, even after which the issue may not manifest.
This dissertation introduces vPlay, a software problem determination system which captures software bugs as they occur in the field into small and self-contained recordings, and allows them to be deterministically reproduced across different operating systems and heterogeneous environments. vPlay makes two key advances over the state of the art. First, the recorded bug can be reproduced in a completely different operating system environment without any kind of dependency on the source. vPlay packages up every piece of data necessary to correctly reproduce the bug on any stateless target machine in the developer environment, without the application, its binaries, and other support data. Second, the data captured by vPlay is small, typically amounting to a few megabytes. vPlay achieves this without requiring changes to the applications, base kernel or hardware.
vPlay employs a recording mechanism which provides data level independence between the application and its source environment by adopting a state machine model of the application to capture every piece of state accessed by the application. vPlay minimizes the size of the recording through a new technique called partial checkpointing, to efficiently capture the partial intermediate state of the application required to replay just the last few moments of its execution prior to the failure. The recorded state is saved as a partial checkpoint along with metadata representing the information specific to the source environment, such as calling convention used for the system calls on the source system, to make it portable across operating systems. A partial checkpoint is loaded by a partial checkpoint loader, which itself is designed to be portable across diÂ¥erent operating systems. Partial checkpointing is combined with a logging mechanism, which monitors the application to identify and record relevant accessed state for root cause analysis and to record application’s nondeterministic events.
vPlay introduces a new type of virtualization abstraction called vPlay Container, to natively replay an application built for one operating system on another. vPlay Container relies on the self-contained recording produced by vPlay to decouple the application from the target operating system environment in three key areas. The application is decoupled from (1) the address space and its content by transparently fulfilling its memory accesses, (2) the instructions and the processor MMU structures such as segment descriptor tables through a binary translation technique designed specifically for user application code, (3) the operating system interface and its services by abstracting the system call interface through emulation and replay. To facilitate root cause analysis, vPlay Container integrates with a standard debugger to enable the user to set breakpoints and single step the replayed execution of the application to examine the contents of variables and other program state at each source line. We have implemented a vPlay prototype which can record unmodified Linux applications and natively replay them on different versions of Linux as well as Windows. Experiments with several applications including Apache and MySQL show that vPlay can reproduce real bugs and be used in production with modest recording overhead.
Pervasive Detection of Process Races in Deployed Systems
Abstract
Process races occur when multiple processes access shared operating system resources, such as files, without proper synchronization. We present the first study of real process races and the first system designed to detect them. Our study of hundreds of applications shows that process races are numerous, difficult to debug, and a real threat to reliability. To address this problem, we created RacePro, a system for automatically detecting these races. RacePro checks deployed systems in-vivo by recording live executions then deterministically replaying and checking them later. This approach increases checking coverage beyond the configurations or executions covered by software vendors or beta testing sites. RacePro records multiple processes, detects races in the recording among system calls that may concurrently access shared kernel objects, then tries different execution orderings of such system calls to determine which races are harmful and result in failures. To simplify race detection, RacePro models under-specified system calls based on load and store micro-operations. To reduce false positives and negatives, RacePro uses a replay and go-live mechanism to distill harmful races from benign ones. We have implemented RacePro in Linux, shown that it imposes only modest recording overhead, and used it to detect a number of previously unknown bugs in real applications caused by process races.
Practical Software Model Checking via Dynamic Interface Reduction
Abstract
Implementation-level software model checking explores the state space of a system implementation directly to find potential software defects without requiring any specification or modeling. Despite early successes, the effectiveness of this approach remains severely constrained due to poor scalability caused by state-space explosion. DeMeter makes software model checking more practical with the following contributions: (i) proposing dynamic interface reduction, a new state-space reduction technique, (ii) introducing a framework that enables dynamic interface reduction in an existing model checker with a reasonable amount of effort, and (iii) providing the framework with a distributed runtime engine that supports parallel distributed model checking.
We have integrated DeMeter into two existing model checkers, MaceMC and MoDist, each involving changes of around 1,000 lines of code. Compared to the original MaceMC and MoDist model checkers, our experiments have shown state-space reduction from a factor of five to up to five orders of magnitude in representative distributed applications such as Paxos, Berkeley DB, Chord, and Pastry. As a result, when applied to a deployed Paxos implementation, which has been running in production data centers for years to manage tens of thousands of machines, DeMeter manages to explore completely a logically meaningful state space that covers both phases of the Paxos protocol, offering higher assurance of software reliability that was not possible before.
Efficient Deterministic Multithreading through Schedule Relaxation
Abstract
Deterministic multithreading (DMT) eliminates many pernicious software problems caused by nondeterminism. It works by constraining a program to repeat the same thread interleavings, or schedules, when given same input. Despite much recent research, it remains an open challenge to build both deterministic and efficient DMT systems for general programs on commodity hardware. To deterministically resolve a data race, a DMT system must enforce a deterministic schedule of shared memory accesses, or mem-schedule, which can incur prohibitive overhead. By using schedules consisting only of synchronization operations, or sync-schedule, this overhead can be avoided. However, a sync-schedule is deterministic only for race-free programs, but most programs have races.
Our key insight is that races tend to occur only within minor portions of an execution, and a dominant majority of the execution is still race-free. Thus, we can resort to a mem-schedule only for the "racy" portions and enforce a sync-schedule otherwise, combining the efficiency of sync-schedules and the determinism of mem-schedules. We call these combined schedules hybrid schedules.
Based on this insight, we have built PEREGRINE, an efficient deterministic multithreading system. When a program first runs on an input, PEREGRINE records an execution trace. It then relaxes this trace into a hybrid schedule and reuses the schedule on future compatible inputs efficiently and deterministically. PEREGRINE further improves efficiency with two new techniques: determinism-preserving slicing to generalize a schedule to more inputs while preserving determinism, and schedule-guided simplification to precisely analyze a program according to a specific schedule. Our evaluation on a diverse set of programs shows that PEREGRINE is deterministic and efficient, and can frequently reuse schedules for half of the evaluated programs.
Cells: A Virtual Mobile Smartphone Architecture
Abstract
Smartphones are increasingly ubiquitous, and many users carry multiple phones to accommodate work, personal, and geographic mobility needs. We present Cells, a virtualization architecture for enabling multiple virtual smartphones to run simultaneously on the same physical cellphone in an isolated, secure manner. Cells introduces a usage model of having one foreground virtual phone and multiple background virtual phones. This model enables a new device namespace mechanism and novel device proxies that integrate with lightweight operating system virtualization to multiplex phone hardware across multiple virtual phones while providing native hardware device performance. Cells virtual phone features include fully accelerated 3D graphics, complete power management features, and full telephony functionality with separately assignable telephone numbers and caller ID support. We have implemented a prototype of Cells that supports multiple Android virtual phones on the same phone. Our performance results demonstrate that Cells imposes only modest runtime and memory overhead, works seamlessly across multiple hardware devices including Google Nexus 1 and Nexus S phones, and transparently runs Android applications at native speed without any modifications.
Regaining Control over Cloud and Mobile Data
Abstract
While emerging computing technologies – such as cloud computing and small, powerful, mobile devices – offer previously unimaginable global access to data and applications, they also threaten users’ sense of control over data ownership, distribution, and properties. For example, uploading some data to a Web service – such as a document to Google Docs, a photo to Facebook, or an email to Hotmail – causes the user to lose control over that data. The user cannot ensure that the service deletes all copies of her data when she asks it to do so, that her data is not shared with advertisers, or that her data is replicated sufficiently to ensure its long-term availability. Similarly, storing data on a mobile device causes the user to lose control over that data when the device is stolen or lost; the user cannot ensure that the data can never be compromised and she cannot tell whether it has been compromised.
This dissertation examines the broad data security, privacy, and management challenges raised by modern technology and proposes a set of techniques that address these issues. We present four systems, each aiming to re-empower users with a specific aspect of their lost data control. Keypad offers remote access control and auditability for data stored on a stolen device. Vanish provides control over the lifetime of data stored in untrusted clouds. Comet lets clients customize the functionality of trusted cloud storage, while Menagerie provides a uniform view of a user’s scattered Web data. We present the design, implementation, and detailed evaluation for each of the four systems, demonstrating the feasibility of our approaches.
New Directions for Self-destructing Data
Abstract
This paper seeks to advance the state of the art in practical selfdestructing data systems that secure sensitive data from disclosure in our highly mobile, social-networked, cloud-computing world. Our work facilitates the automatic, timed, and simultaneous destruction of all copies of a self-destructing data object (such as a message or file) without any explicit action by the user and without relying on any single trusted third party.
We make three contributions to the study of self-destructing data. First, we present Cascade, an extensible framework for integrating multiple key-storage mechanisms into a single self-destructing data system. Cascade enhances resistance to attack by combining the security advantages of a diverse set of key-storage approaches. Second, we introduce Tide, a new key-storage system for self-destructing data that leverages the ubiquity and easy deployment of Apache Web servers throughout the Internet. Third, based on our earlier work on Vanish and in light of recent attacks against the Vuze DHT, we demonstrate how to significantly harden Vuze and other DHTs against Sybil data-harvesting attacks, making DHTs applicable as key-storage systems under Cascade.
To validate our approach, we designed, implemented, deployed, and measured these systems. We prototyped the extensible Cascade system with support for Tide, Vuze, and OpenDHT. We prototyped the Tide key-storage system on Apache, deployed it on over 400 PlanetLab nodes in the Internet, and demonstrated that the structure is highly immune to attack. Finally, we designed and deployed a set of defenses to Sybil data-harvesting attacks in the live Vuze P2P system and measured them at full scale in the million-node DHT; our results demonstrate that these defenses provide a three-orderof- magnitude improvement over the original Vuze DHT, rendering data-harvesting attacks extremely impractical.
Context-based Online Configuration-Error Detection
Abstract
Software failures due to configuration errors are commonplace as computer systems continue to grow larger and more complex. Troubleshooting these configuration errors is a major administration cost, especially in server clusters where problems often go undetected without user interference.
This paper presents CODE—tool that automatically detects software configuration errors. Our approach is based on identifying invariant configuration access rules that predict what access events follow what contexts. It requires no source code, application-specific semantics, or heavyweight program analysis. Using these rules, CODE can sift through a voluminous number of events and detect deviant program executions. This is in contrast to previous approaches that focus on only diagnosis. In our experiments, CODE successfully detected a real configuration error in one of our deployment machines, in addition to 20 user-reported errors that we reproduced in our test environment. When analyzing month-long event logs from both user desktops and production servers, CODE yielded a low false positive rate. The efficiency of CODE makes it feasible to be deployed as a practical management tool with low overhead.
Record and Transplay: Partial Checkpointing for Replay Debugging Across Heterogeneous Systems
Abstract
Software bugs that occur in production are often difficult to reproduce in the lab due to subtle differences in the application environment and nondeterminism. To address this problem, we present Transplay, a system that captures production software bugs into small per-bug recordings which are used to reproduce the bugs on a completely different operating system without access to any of the original software used in the production environment. Transplay introduces partial checkpointing, a new mechanism that efficiently captures the partial state necessary to reexecute just the last few moments of the application before it encountered a failure. The recorded state, which typically consists of a few megabytes of data, is used to replay the application without requiring the specific application binaries, libraries, support data, or the original execution environment. Transplay integrates with existing debuggers to provide standard debugging facilities to allow the user to examine the contents of variables and other program state at each source line of the application’s replayed execution. We have implemented a Transplay prototype that can record unmodified Linux applications and replay them on different versions of Linux as well as Windows. Experiments with several applications including Apache and MySQL show that Transplay can reproduce real bugs and be used in production with modest recording overhead.
Optimizing Data Partitioning for Data-Parallel Computing
Abstract
Performance of data-parallel computing (e.g., MapReduce, DryadLINQ) heavily depends on its data partitions. Solutions implemented by the current state of the art systems are far from optimal. Techniques proposed by the database community to find optimal data partitions are not directly applicable when complex user-defined functions and data models are involved. We outline our solution, which draws expertise from various fields such as programming languages and optimization, and present our preliminary results.
Finding Concurrency Errors in Sequential Code—OS-level, In-vivo Model Checking of Process Races
Abstract
While thread races have drawn huge attention from the research community, little has been done for process races, where multiple, possibly sequential, processes access a shared resource, such as a file, without proper synchronization. We present a preliminary study of real process races and show that they are numerous, dangerous, and difficult to detect. To address this problem, we present the design of RACEPRO, an in-vivo model checking system for automatically detecting process races in deployed systems, along with preliminary results from a RACEPRO prototype. To the best of our knowledge, we are the first to study real process races, and RACEPRO is the first system to detect them.
