June, 2011 Archive

Record and Transplay: Partial Checkpointing for Replay Debugging Across Heterogeneous Systems

Dinesh Subhraveti, Jason Nieh
Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2011), San Jose, CA, June 7-11, 2011

Abstract
Software bugs that occur in production are often difficult to reproduce in the lab due to subtle differences in the application environment and nondeterminism. To address this problem, we present Transplay, a system that …

Context-based Online Configuration-Error Detection

Ding Yuan, Yinglian Xie, Rina Panigrahy, Junfeng Yang, Chad Verbowski, Arunvijay Kumar
Proceedings of the USENIX Annual Technical Conference (USENIX ’11), June, 2011

Abstract
Software failures due to configuration errors are commonplace as computer systems continue to grow larger and more complex. Troubleshooting these configuration errors is a major administration cost, especially in server clusters where problems often go undetected …

Columbia University Department of Computer Science