- AboutThis should describe the systems research collaboration, and present the overall research goals of the new group.
- PeopleHere are the different labs in the SRC…
- PublicationsA page where you will find categorized publications!
- ProjectsA page where you will find our projects
- ResourcesVarious resources for prospective students, current students, alumni. Maybe put something here about life in NYC and at Columbia…
Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, March 2014
As ARM CPUs become increasingly common in mobile devices and servers, there is a growing demand for providing the benefits of virtualization for ARM-based devices. We present our experiences building the Linux ARM hypervisor, KVM/ARM, the first full system ARM virtualization solution that can run unmodified guest operating systems on ARM multicore hardware. KVM/ARM introduces split-mode virtualization, allowing a hypervisor to split its execution across CPU modes and be integrated into the Linux kernel. This allows KVM/ARM to leverage existing Linux hardware support and functionality to simplify hypervisor development and maintainability while utilizing recent ARM hardware virtualization extensions to run virtual machines with comparable performance to native execution. KVM/ARM has been successfully merged into the mainline Linux kernel, ensuring that it will gain wide adoption as the virtualization platform of choice for ARM. We provide the first measurements on real hardware of a complete hypervisor using ARM hardware virtualization support. Our results demonstrate that KVM/ARM has modest virtualization performance and power costs, and can achieve lower performance and power costs compared to x86-based Linux virtualization on multicore hardware.
Proceedings of the 45th ACM Technical Symposium on Computer Science Education, March 2014
Learning about operating systems often involves modifying a large and complex code base. Grading student projects can be difficult and time consuming, yet students often do not learn from their programming errors and struggle to understand core operating system concepts. We present GradeBoard, a code review system designed to simplify grading for instructors and enable students to understand and learn from their errors. GradeBoard provides an easy-to-use Web interface that allows instructors to annotate student code submissions with grading comments and scores, and students to discuss the comments and scores with instructors. GradeBoard presents student code changes with syntax highlighting and lets users collapse or expand code sections to provide a desired level of context, making it easier to read and understand student programming project submissions. Comments and scores are easily identifiable by visual cues, improving interaction between instructors and students. We have deployed and used GradeBoard in a large operating systems course involving Linux kernel programming projects. GradeBoard provided robust, easy-to-use functionality for reviewing Linux kernel code changes, improved the instructional staff grading experience, and over 90% of students surveyed indicated that GradeBoard improved their understanding of the kernel programming projects better than other alternatives.
Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, March 2014
We present Cider, an operating system compatibility architecture that can run applications built for different mobile ecosystems, iOS or Android, together on the same smartphone or tablet. Cider enhances the domestic operating system, Android, of a device with kernel-managed, per-thread personas to mimic the application binary interface of a foreign operating system, iOS, enabling it to run unmodified foreign binaries. This is accomplished using a novel combination of binary compatibility techniques including two new mechanisms: compile-time code adaptation, and diplomatic functions. Compile-time code adaptation enables existing unmodified foreign source code to be reused in the domestic kernel, reducing implementation effort required to support multiple binary interfaces for executing domestic and foreign applications. Diplomatic functions leverage per-thread personas, and allow foreign applications to use domestic libraries to access proprietary software and hardware interfaces. We have built a Cider prototype, and demonstrate that it imposes modest performance overhead and runs un-modified iOS and Android applications together on a Google Nexus tablet running the latest version of Android.
Proceedings of the 20th ACM Conference on Computer and Communications Security (CCS 2013), November 2013
Dynamic data flow tracking (DFT) is a technique broadly used in a variety of security applications that, unfortunately, exhibits poor performance, preventing its adoption in production systems. We present ShadowReplica, a new and efficient approach for accelerating DFT and other shadow memory-based analyses, by decoupling analysis from execution and utilizing spare CPU cores to run them in parallel. Our approach enables us to run a heavyweight technique, like dynamic taint analysis (DTA), twice as fast, while concurrently consuming fewer CPU cycles than when applying it in-line. DFT is run in parallel by a second shadow thread that is spawned for each application thread, and the two communicate using a shared data structure. We avoid the problems suffered by previous approaches, by introducing an off-line application analysis phase that utilizes both static and dynamic analysis methodologies to generate optimized code for decoupling execution and implementing DFT, while it also minimizes the amount of information that needs to be communicated between the two threads. Furthermore, we use a lock-free ring buffer structure and an N- way buffering scheme to efficiently exchange data between threads and maintain high cache-hit rates on multi-core CPUs. Our evaluation shows that ShadowReplica is on average ~2.3x faster than in-line DFT (~2.75x slowdown over native execution) when running the SPEC CPU2006 benchmark, while similar speed ups were observed with command-line utilities and popular server software. Astoundingly, ShadowReplica also reduces the CPU cycles used up to 30%.
Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP '13), November 2013
Multithreaded programs are hard to get right. A key reason is that the contract between developers and runtimes grants exponentially many schedules to the runtimes. We present PARROT, a simple, practical runtime with a new contract to developers. By default, it orders thread synchronizations in the well-defined round-robin order, vastly reducing schedules to provide determinism (more precisely, deterministic synchronizations) and stability (i.e., robustness against input or code perturbations, a more useful property than determinism). When default schedules are slow, it allows developers to write intuitive performance hints in their code to switch or add schedules for speed. We believe this â€œmeet in the middleâ€ contract eases writing correct, efficient programs.
We further present an ecosystem formed by integrating PARROT with a model checker called DBUG. This ecosystem is more effective than either system alone: DBUG checks the schedules that matter to PARROT, and PARROT greatly increases the coverage of DBUG.
Results on a diverse set of 108 programs, roughly 10x more than any prior evaluation, show that PARROT is easy to use (averaging 1.2 lines of hints per program); achieves low overhead (6.9% for 55 real-world programs and 12.7% for all 108 programs), 10x better than two prior systems; scales well to the maximum allowed cores on a 24-core server and to different scales/types of workloads; and increases DBUG's coverage by 106 - 1019734 for 56 programs. PARROT's source code, entire benchmark suite, and raw results are available at github.com/columbia/smt-mc.
Proceedings of the 4th ACM Symposium on Cloud Computing (SOCC '13), October 2013
Cloud-sourced virtual appliances (VAs) have been touted as powerful solutions for many software maintenance, mobility, backward compatibility, and security challenges. In this paper, we ask whether it is possible to create a VA cloud service that supports fluid, interactive user experience even over mobile networks. More specifically, we wish to support a YouTube-like streaming service for executable content, such as games, interactive books, research artifacts, etc. Users should be able to post, browse through, and interact with executable content swiftly and without long interruptions. Intuitively, this seems impossible; the bandwidths, latencies, and costs of last-mile networks would be prohibitive given the sheer sizes of virtual machines! Yet, we show that a set of carefully crafted, novel prefetching and streaming techniques can bring this goal surprisingly close to reality. We show that vTube, a VA streaming system that incorporates our techniques, supports fluid interaction even in challenging network conditions, such as 4G LTE.
Proceedings of the 9th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE '13), August 2013
Alias analysis is perhaps one of the most crucial and widely used analyses, and has attracted tremendous research efforts over the years. Yet, advanced alias analyses are extremely difficult to get right, and the bugs in these analyses are one key reason that they have not been adopted to production compilers. This paper presents NEONGOBY, a system for effectively detecting errors in alias analysis implementations, improving their correctness and hopefully widening their adoption. NEONGOBY detects the worst type of bugs where the alias analysis claims that two pointers never alias, but they actually alias at runtime. NEONGOBY works by dynamically observing pointer addresses during the execution of a test program and then checking these addresses against an alias analysis for errors. It is explicitly designed to (1) be agnostic to the alias analysis it checks for maximum applicability and ease of use and (2) detect alias analysis errors that manifest on real-world programs and workloads. It emits no false positives as long as test programs do not have undefined behavior per ANSI C specification or call external functions that interfere with our detection algorithm. It reduces performance overhead using a practical selection of techniques. Evaluation on three popular alias analyses and real-world programs Apache and MySQL shows that NEONGOBY effectively finds 29 alias analysis bugs with zero false positives and reasonable overhead; the most serious four bugs have been patched by the developers. To enable alias analysis builders to start using NEONGOBY today, we have released it open-source at https://github.com/columbia/neongoby, along with our error detection results and proposed patches.
Proceedings of the 8th ACM Conference on emerging Networking EXperiments and Technologies, (CoNEXT 2012), December 2012
Clouds commonly store Virtual Machine (VM) images on networked storage. This poses a serious potential scalability bottleneck as launching a single fresh VM instance requires, at minimum, several hundred MB of network reads. As this bottleneck occurs most severely during read-intensive launching of new VMs, we focus on scalably minimizing time to boot a VM and load its critical applications.
While effective scalable P2P streaming techniques for Video on Demand (VOD) scenarios where blocks arrive in-order and at constant rate are available, no techniques address scalable large-executable streaming. VM execution is non-deterministic, divergent, variable rate, and cannot miss blocks. VMTorrent introduces a novel combination of block prioritization, profile-based execution prefetch, on-demand fetch, and decoupling of VM image presentation from underlying data-stream. VMTorrent provides the first complete and effective solution to this growing scalability problem that is based on making better use of existing capacity, instead of throwing more hardware at it.
Supported by analytic modeling, we present comprehensive experimental evaluation of VMTorrent on real systems at scale, demonstrating the effectiveness of VMTorrent. We find that VMTorrent supports comparable execution time to that achieved using local disk. VMTorrent maintains this performance while scaling to 100 instances, providing up to 11x speedup over current state-of-the-art and 30x over traditional network storage.
Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI'12), October 2012
Mobile-device theft and loss have reached gigantic proportions. Despite these threats, today's mobile devices are saturated with sensitive information due to operating systems that never securely erase data and applications that hoard it on the vulnerable device for performance or convenience. This paper presents CleanOS, a new Android-based operating system that manages sensitive data rigorously and maintains a clean environment at all times. To do so, CleanOS leverages a key property of today's mobile applications -- the use of trusted, cloud-based services. Specifically, CleanOS identifies and tracks sensitive data in RAM and on stable storage, encrypts it with a key, and evicts that key to the cloud when the data is not in active use on the device. We call this process idle eviction of sensitive data. To implement CleanOS, we used the TaintDroid mobile taint-tracking system to identify sensitive data locations and instrumented Android's Dalvik interpreter to securely evict that data after a specified period of non-use. Our experimental results show that CleanOS limits sensitive-data exposure drastically while incurring acceptable overheads on mobile networks.
IEEE/ACM Transactions on Networking, Volume 20, Issue 5, October 2012
Peer-to-peer file-sharing applications suffer from a fundamental problem of unfairness. Free-riders cause slower download times for others by contributing little or no upload bandwidth while consuming much download bandwidth. Previous attempts to address this fair bandwidth allocation problem suffer from slow peer discovery, inaccurate predictions of neighboring peers' bandwidth allocations, underutilization of bandwidth, and complex parameter tuning. We present FairTorrent, a new deficit-based distributed algorithm that accurately rewards peers in accordance with their contribution. A FairTorrent peer simply uploads the next data block to a peer to whom it owes the most data as measured by a deficit counter. FairTorrent is resilient to exploitation by free-riders and strategic peers, is simple to implement, requires no bandwidth overallocation, no prediction of peers' rates, no centralized control, and no parameter tuning. We implemented FairTorrent in a BitTorrent client without modifications to the BitTorrent protocol and evaluated its performance against other widely used BitTorrent clients. Our results show that FairTorrent provides up to two orders of magnitude better fairness, up to five times better download times for contributing peers, and 60%â€“100% better performance on average in live BitTorrent swarms.