Recent Publications
VMTorrent: Scalable P2P Virtual Machine Streaming
CleanOS: Limiting Mobile Data Exposure with Idle Eviction
Abstract
Mobile-device theft and loss have reached gigantic proportions. Despite these threats, today’s mobile devices are saturated with sensitive information due to operating systems that never securely erase data and applications that hoard it on the vulnerable device for performance or convenience. This paper presents CleanOS, a new Android-based operating system that manages sensitive data rigorously and maintains a clean environment at all times. To do so, CleanOS leverages a key property of today’s mobile applications — the use of trusted, cloud-based services. Specifically, CleanOS identifies and tracks sensitive data in RAM and on stable storage, encrypts it with a key, and evicts that key to the cloud when the data is not in active use on the device. We call this process idle eviction of sensitive data. To implement CleanOS, we used the TaintDroid mobile taint-tracking system to identify sensitive data locations and instrumented Android’s Dalvik interpreter to securely evict that data after a specified period of non-use. Our experimental results show that CleanOS limits sensitive-data exposure drastically while incurring acceptable overheads on mobile networks.
FairTorrent: A Deficit-Based Distributed Algorithm to Ensure Fairness in Peer-to-Peer Systems
Abstract
Peer-to-peer file-sharing applications suffer from a fundamental problem of unfairness. Free-riders cause slower download times for others by contributing little or no upload bandwidth while consuming much download bandwidth. Previous attempts to address this fair bandwidth allocation problem suffer from slow peer discovery, inaccurate predictions of neighboring peers’ bandwidth allocations, underutilization of bandwidth, and complex parameter tuning. We present FairTorrent, a new deficit-based distributed algorithm that accurately rewards peers in accordance with their contribution. A FairTorrent peer simply uploads the next data block to a peer to whom it owes the most data as measured by a deficit counter. FairTorrent is resilient to exploitation by free-riders and strategic peers, is simple to implement, requires no bandwidth overallocation, no prediction of peers’ rates, no centralized control, and no parameter tuning. We implemented FairTorrent in a BitTorrent client without modifications to the BitTorrent protocol and evaluated its performance against other widely used BitTorrent clients. Our results show that FairTorrent provides up to two orders of magnitude better fairness, up to five times better download times for contributing peers, and 60%–100% better performance on average in live BitTorrent swarms.
The Design, Implementation, and Evaluation of Cells: A Virtual Smartphone Architecture
Abstract
Smartphones are increasingly ubiquitous, and many users carry multiple phones to accommodate work, personal, and geographic mobility needs. We present Cells, a virtualization architecture for enabling multiple virtual smartphones to run simultaneously on the same physical cellphone in an isolated, secure manner. Cells introduces a usage model of having one foreground virtual phone and multiple background virtual phones. This model enables a new device namespace mechanism and novel device proxies that integrate with lightweight operating system virtualization to multiplex phone hardware across multiple virtual phones while providing native hardware device performance. Cells virtual phone features include fully accelerated 3D graphics, complete power management features, and full telephony functionality with separately assignable telephone numbers and caller ID support. We have implemented a prototype of Cells that supports multiple Android virtual phones on the same phone. Our performance results demonstrate that Cells imposes only modest runtime and memory overhead, works seamlessly across multiple hardware devices including Google Nexus 1 and Nexus S phones, and transparently runs Android applications at native speed without any modifications.
Sound and Precise Analysis of Parallel Programs through Schedule Specialization
Abstract
Parallel programs are known to be difficult to analyze. A key reason is that they typically have an enormous number of execution interleavings, or schedules. Static analysis over all schedules requires over-approximations, resulting in poor precision; dynamic analysis rarely covers more than a tiny fraction of all schedules. We propose an approach called schedule specialization to analyze a parallel program over only a small set of schedules for precision, and then enforce these schedules at runtime for soundness of the static analysis results. We build a schedule specialization framework for C/C++ multithreaded programs that use Pthreads. Our framework avoids the need to modify every analysis to be schedule-aware by specializing a program into a simpler program based on a schedule, so that the resultant program can be analyzed with stock analyses for improved precision. Moreover, our framework provides a precise schedule-aware def-use analysis on memory locations, enabling us to build three highly precise analyses: an alias analyzer, a data-race detector, and a path slicer. Evaluation on 17 programs, including 2 real-world programs and 15 popular benchmarks, shows that analyses using our framework reduced may-aliases by 61.9%, false race reports by 69%, and path slices by 48.7%; and detected 7 unknown bugs in well-checked programs.
Concurrency Attacks
Abstract
Just as errors in sequential programs can lead to security exploits, errors in concurrent programs can lead to concurrency attacks. Questions such as whether these attacks are real and what characteristics they have remain largely unknown. In this paper, we present a preliminary study of concurrency attacks and the security implications of real concurrency errors. Our study yields several interesting findings. For instance, we observe that the exploitability of a concurrency error depends on the duration of the timing window within which the error may occur. We further observe that attackers can increase this window through carefully crafted inputs. We also find that four out of five commonly used sequential defense mechanisms become unsafe when applied to concurrent programs. Based on our findings, we propose new defense directions and fixes to existing defenses.
[download_link link="http://systems.cs.columbia.edu/files/concurrency-attacks_errors.tar.bz2" variation="green"]errors.tar.bz2[/download_link]
Teaching Operating Systems Using Android
Abstract
The computing landscape is shifting towards mobile and embedded devices. To learn about operating systems, it is increasingly important for students to gain hands-on kernel programming experience in these environments, which are quite different from traditional desktops and servers. We present our work at Columbia University to teach operating systems by leveraging Android, an open, commercially supported software platform increasingly used on mobile and embedded devices. We introduce a series of five Android kernel programming projects suitable for a one semester introductory operating systems course. Each project teaches a core operating systems concept infused with Android or mobile device-specific context, such as Android-specific process relationships, use of sensors, and design considerations for resource constrained mobile devices. We also introduce an Android virtual laboratory based on virtual appliances, distributed version control, and live demonstrations which gives students hands-on Android experience, all with minimal computing infrastructure. We have used these Android kernel programming projects and virtual lab to teach an introductory operating systems course. Despite mistakes and mis-steps from teaching the course for the first time using Android, over 80% of students surveyed enjoyed using Android in the course, and students preferred Android to traditional desktop development by 3 to 1.
NCCloud: Applying Network Coding for the Storage Repair in a Cloud-of-Clouds
Abstract
To provide fault tolerance for cloud storage, recent studies propose to stripe data across multiple cloud vendors. However, if a cloud suffers from a permanent failure and loses all its data, then we need to repair the lost data from other surviving clouds to preserve data redundancy. We present a proxy-based system for multiple-cloud storage called NCCloud, which aims to achieve cost-effective repair for a permanent single-cloud failure. NCCloud is built on top of network-coding-based storage schemes called regenerating codes. Specifically, we propose an implementable design for the functional minimum-storage regenerating code (F-MSR), which maintains the same data redundancy level and same storage requirement as in traditional erasure codes (e.g., RAID-6), but uses less repair traffic. We implement a proof-of-concept prototype of NCCloud and deploy it atop local and commercial clouds. We validate the cost effectiveness of F-MSR in storage repair over RAID-6, and show that both schemes have comparable response time performance in normal cloud storage operations.
Record and vPlay: Problem Determination with Virtual Replay Across Heterogeneous Systems
Abstract
Application down time is one of the major reasons for revenue loss in the modern enterprise. While aggressive release schedules cause frail software to be released, application failures occurring in the field cost millions to the technical support organizations in personnel time. Since developers usually don’t have direct access to the field environment for a variety of privacy and security reasons, problems are reproduced, analyzed and fixed in very different lab environments. However, the complexity and diversity of application environments make it difficult to accurately replicate the production environment. The indiscriminate collection of data provided by the bug reports often overwhelm or even mislead the developer. A typical issue requires time consuming rounds of clarifications and interactions with the end user, even after which the issue may not manifest.
This dissertation introduces vPlay, a software problem determination system which captures software bugs as they occur in the field into small and self-contained recordings, and allows them to be deterministically reproduced across different operating systems and heterogeneous environments. vPlay makes two key advances over the state of the art. First, the recorded bug can be reproduced in a completely different operating system environment without any kind of dependency on the source. vPlay packages up every piece of data necessary to correctly reproduce the bug on any stateless target machine in the developer environment, without the application, its binaries, and other support data. Second, the data captured by vPlay is small, typically amounting to a few megabytes. vPlay achieves this without requiring changes to the applications, base kernel or hardware.
vPlay employs a recording mechanism which provides data level independence between the application and its source environment by adopting a state machine model of the application to capture every piece of state accessed by the application. vPlay minimizes the size of the recording through a new technique called partial checkpointing, to efficiently capture the partial intermediate state of the application required to replay just the last few moments of its execution prior to the failure. The recorded state is saved as a partial checkpoint along with metadata representing the information specific to the source environment, such as calling convention used for the system calls on the source system, to make it portable across operating systems. A partial checkpoint is loaded by a partial checkpoint loader, which itself is designed to be portable across diÂ¥erent operating systems. Partial checkpointing is combined with a logging mechanism, which monitors the application to identify and record relevant accessed state for root cause analysis and to record application’s nondeterministic events.
vPlay introduces a new type of virtualization abstraction called vPlay Container, to natively replay an application built for one operating system on another. vPlay Container relies on the self-contained recording produced by vPlay to decouple the application from the target operating system environment in three key areas. The application is decoupled from (1) the address space and its content by transparently fulfilling its memory accesses, (2) the instructions and the processor MMU structures such as segment descriptor tables through a binary translation technique designed specifically for user application code, (3) the operating system interface and its services by abstracting the system call interface through emulation and replay. To facilitate root cause analysis, vPlay Container integrates with a standard debugger to enable the user to set breakpoints and single step the replayed execution of the application to examine the contents of variables and other program state at each source line. We have implemented a vPlay prototype which can record unmodified Linux applications and natively replay them on different versions of Linux as well as Windows. Experiments with several applications including Apache and MySQL show that vPlay can reproduce real bugs and be used in production with modest recording overhead.
Improving Virtual Appliance Management through Virtual Layered File Systems
Abstract
Managing many computers is difficult. Recent virtualization trends exacerbate this problem by making it easy to create and deploy multiple virtual appliances per physical machine, each of which can be configured with different applications and utilities. This results in a huge scaling problem for large organizations as management overhead grows linearly with the number of appliances.
To address this problem, we introduce Strata, a system that combines unioning file system and package management semantics to enable more efficient creation, provisioning and management of virtual appliances. Unlike traditional systems that depend on monolithic file systems, Strata uses a collection of individual sotware layers that are composed together into the Virtual Layered File System (VLFS) to provide the traditional file system view. Individual layers are maintained in a central repository and shared across all file systems that use them. Layer changes and upgrades only need to be done once in the repository and are then automatically propagated to all virtual appliances, resulting in management overhead independent of the number of appliances. Our Strata Linux prototype requires only a single loadable kernel module providing the VLFS support and doesn’t require any application or source code level kernel modifications. Using this prototype, we demonstrate how Strata enables fast system provisioning, simplifies system maintenance and upgrades, speeds system recovery from security exploits, and incurs only modest performance overhead.
Pervasive Detection of Process Races in Deployed Systems
Abstract
Process races occur when multiple processes access shared operating system resources, such as files, without proper synchronization. We present the first study of real process races and the first system designed to detect them. Our study of hundreds of applications shows that process races are numerous, difficult to debug, and a real threat to reliability. To address this problem, we created RacePro, a system for automatically detecting these races. RacePro checks deployed systems in-vivo by recording live executions then deterministically replaying and checking them later. This approach increases checking coverage beyond the configurations or executions covered by software vendors or beta testing sites. RacePro records multiple processes, detects races in the recording among system calls that may concurrently access shared kernel objects, then tries different execution orderings of such system calls to determine which races are harmful and result in failures. To simplify race detection, RacePro models under-specified system calls based on load and store micro-operations. To reduce false positives and negatives, RacePro uses a replay and go-live mechanism to distill harmful races from benign ones. We have implemented RacePro in Linux, shown that it imposes only modest recording overhead, and used it to detect a number of previously unknown bugs in real applications caused by process races.
Practical Software Model Checking via Dynamic Interface Reduction
Abstract
Implementation-level software model checking explores the state space of a system implementation directly to find potential software defects without requiring any specification or modeling. Despite early successes, the effectiveness of this approach remains severely constrained due to poor scalability caused by state-space explosion. DeMeter makes software model checking more practical with the following contributions: (i) proposing dynamic interface reduction, a new state-space reduction technique, (ii) introducing a framework that enables dynamic interface reduction in an existing model checker with a reasonable amount of effort, and (iii) providing the framework with a distributed runtime engine that supports parallel distributed model checking.
We have integrated DeMeter into two existing model checkers, MaceMC and MoDist, each involving changes of around 1,000 lines of code. Compared to the original MaceMC and MoDist model checkers, our experiments have shown state-space reduction from a factor of five to up to five orders of magnitude in representative distributed applications such as Paxos, Berkeley DB, Chord, and Pastry. As a result, when applied to a deployed Paxos implementation, which has been running in production data centers for years to manage tens of thousands of machines, DeMeter manages to explore completely a logically meaningful state space that covers both phases of the Paxos protocol, offering higher assurance of software reliability that was not possible before.
