Proceedings of the 1st ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS 2005), October 2005
We present Group Round-Robin (GRR) scheduling, a hybrid fair packet scheduling framework based on a grouping strategy that narrows down the traditional trade-off between fairness and computational complexity. GRR combines its grouping strategy with a specialized round-robin scheduling algorithm that utilizes the properties of GRR groups to schedule flows within groups in a manner that provides O(1) bounds on fairness with only O(1) time complexity. Under the practical assumption that GRR employs a small constant number of groups, we apply GRR to popular fair queuing scheduling algorithms and show how GRR can be used to achieve constant bounds on fairness and time complexity for these algorithms. We also present and prove new results on the fairness bounds for several of these fair queuing algorithms using a consistent fairness measure. We analyze the behavior of GRR and present experimental results that demonstrate how GRR can be combined with existing scheduling algorithms to provide much lower scheduling overhead and more than an order of magnitude better scheduling accuracy in practice than scheduling algorithms without GRR.
Web Content Delivery (Web Information Systems Engineering and Internet Technologies Book Series), September 2005
Proceedings of the 2005 IEEE International Conference on Cluster Computing (Cluster 2005), September 2005
We have created ZapC, a novel system for transparent coordinated checkpoint-restart of distributed network applications on commodity clusters. ZapC provides a thin virtualization layer on top of the operating system that decouples a distributed application from dependencies on the cluster nodes on which it is executing. This decoupling enables ZapC to checkpoint an entire distributed application across all nodes in a coordinated manner such that it can be restarted from the checkpoint on a different set of cluster nodes at a later time. ZapC checkpoint-restart operations execute in parallel across different cluster nodes, providing faster checkpoint-restart performance. ZapC uniquely supports network state in a transport protocol independent manner, including correctly saving and restoring socket and protocol state for both TCP and UDP connections. We have implemented a ZapC Linux prototype and demonstrate that it provides low virtualization overhead and fast checkpoint-restart times for distributed network applications without any application, library, kernel, or network protocol modifications.
Proceedings of the 4th International Conference on Web-based Learning (ICWL 2005), July 2005
The increasing popularity of online courses has highlighted the lack of collaborative tools for student groups. In addition, the introduction of lecture videos into the online curriculum has drawn attention to the disparity in the network resources used by students. We present an e-Learning architecture and adaptation model called AI2 TV (Adaptive Internet Interactive Team Video), which allows virtual students, possibly some or all disadvantaged in network resources, to collaboratively view a video in synchrony. AI2 TV upholds the invariant that each student will view semantically equivalent content at all times. Video player actions, like play, pause and stop, can be initiated by any student and their results are seen by all the other students. These features allow group members to review a lecture video in tandem, facilitating the learning process. Experimental trials show that AI2 TV can successfully synchronize video for distributed students while, at the same time, optimizing the video quality, given fluctuating bandwidth, by adaptively adjusting the quality level for each student.
Proceedings of the 2nd IEEE International Conference on Autonomic Computing (ICAC 2005), June 2005
Patching, upgrading, and maintaining operating system software is a growing management complexity problem that can result in unacceptable system downtime. We introduce AutoPod, a system that enables unscheduled operating system updates while preserving application service availability. AutoPod provides a group of processes and associated users with an isolated machine-independent virtualized environment that is decoupled from the underlying operating system instance. This virtualized environment is integrated with a novel checkpoint-restart mechanism which allows processes to be suspended, resumed, and migrated across operating system kernel versions with different security and maintenance patches. AutoPod incorporates a system status service to determine when operating system patches need to be applied to the current host, then automatically migrates application services to another host to preserve their availability while the current host is updated and rebooted.
Proceedings of the first Workshop on the Evaluation of Software Defect Detection Tools (BUGS '05), June 2005
File systems, RAID systems, and applications that care about data consistency, among others, assure data integrity by carefully forcing valuable data to stable storage. Unfortunately, verifying that a system recovers from a crash to a valid state at any program counter is very difficult. Previous techniques for finding data integrity bugs have been heavyweight, requiring extensive effort for each OS and file system to be checked. We demonstrate a lightweight, flexible, easy-to-apply technique by developing a tool called eXplode and show how we used it to find 25 serious bugs in eight Linux file systems, Linux software RAID 5, Linux NFS, and three version control systems.
Proceedings of the 2nd USENIX/ACM Symposium on Networked Systems Design and Implementation (NSDI 2005), May 2005
An important trend in information technology is the use of increasingly large distributed systems to deploy increasingly complex and mission-critical applications. In order for these systems to achieve the ultimate goal of having similar ease-of-use properties as centralized systems they must allow fast, reliable, and lightweight management and synchronization of their configuration state. This goal poses numerous technical challenges in a truly Internet-scale system, including varying degrees of network connectivity, inevitable machine failures, and the need to distribute information globally in a fast and reliable fashion. In this paper we discuss the design and implementation of a configuration management system for the Akamai Network. It allows reliable yet highly asynchronous delivery of configuration information, is significantly fault-tolerant, and can scale if necessary to hundreds of thousands of servers. The system is fully functional today providing configuration management to over 15,000 servers deployed in 1200+ different networks in 60+ countries.
Proceedings of the 14th International World Wide Web Conference (WWW 2005), May 2005
We present WebPod, a portable system that enables mobile users to use the same persistent, personalized web browsing session on any Internet-enabled device. No matter what computer is being used, WebPod provides a consistent browsing session, maintaining all of a user's plugins, bookmarks, browser web content, open browser windows, and browser configuration options and preferences. This is achieved by leveraging rapid improvements in capacity, cost, and size of portable storage devices. WebPod provides a virtualization and checkpoint/restart mechanism that decouples the browsing environment from the host, enabling web browsing sessions to be suspended to portable storage, carried around, and resumed from the storage device on another computer. WebPod virtualization also isolates web browsing sessions from the host, protecting the browsing privacy of the user and preventing malicious web content from damaging the host. We have implemented a Linux WebPod prototype and demonstrate its ability to quickly suspend and resume web browsing sessions, enabling a seamless web browsing experience for mobile users as they move among computers.
Group Ratio Round-Robin: O(1) Proportional Share Scheduling for Uniprocessor and Multiprocessor Systems
Proceedings of the 2005 USENIX Annual Technical Conference, April 2005
We present Group Ratio Round-Robin (GR3), the first proportional share scheduler that combines accurate proportional fairness scheduling behavior with O(1) scheduling overhead on both uniprocessor and multiprocessor systems. GR3 uses a simple grouping strategy to organize clients into groups of similar processor allocations which can be more easily scheduled. Using this strategy, GR3 combines the benefits of low overhead round-robin execution with a novel ratio-based scheduling algorithm. GR3 introduces a novel frontlog mechanism and weight readjustment algorithm to operate effectively on multiprocessors. GR3 provides fairness within a constant factor of the ideal generalized processor sharing model for client weights with a fixed upper bound and preserves its fairness properties on multiprocessor systems. We have implemented GR3 in Linux and measured its performance. Our experimental results show that GR3 provides much lower scheduling overhead and much better scheduling accuracy than other schedulers commonly used in research and practice.
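The grouping idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say how "groups of similar processor allocations" are formed, so the power-of-two weight ranges used here are an assumption made for illustration, as are the function names `group_by_weight` and `ideal_shares` and the sample weights. The sketch only buckets clients and computes the ideal proportional share each would receive; it does not implement the ratio-based scheduling, frontlog, or weight readjustment mechanisms.

```python
from collections import defaultdict


def group_by_weight(weights):
    """Bucket clients so that group k holds weights in [2**k, 2**(k+1)).

    Assumption for illustration: weights are positive integers and groups
    are power-of-two ranges. Clients in one group then differ in weight by
    less than a factor of two, so a simple round-robin within the group
    stays close to proportional.
    """
    groups = defaultdict(list)
    for client, weight in weights.items():
        if weight < 1:
            raise ValueError("weights are assumed to be integers >= 1")
        # bit_length() - 1 is floor(log2(weight)) for a positive integer.
        groups[weight.bit_length() - 1].append(client)
    return dict(groups)


def ideal_shares(weights):
    """Ideal proportional share of each client: weight / total weight."""
    total = sum(weights.values())
    return {client: weight / total for client, weight in weights.items()}


# Hypothetical clients and weights.
weights = {"a": 1, "b": 2, "c": 3, "d": 8}
groups = group_by_weight(weights)
shares = ideal_shares(weights)
```

With a fixed upper bound on client weights, the number of such groups is bounded by a small constant, which is what makes choosing among groups cheap under this assumed layout.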