Process races occur when multiple processes access shared operating system resources, such as files, without proper synchronization. We present the first study of real process races and the first system designed to detect them. Our study of hundreds of applications shows that process races are numerous, difficult to debug, and a real threat to reliability. To address this problem, we created RacePro, a system for automatically detecting these races. RacePro checks deployed systems in-vivo by recording live executions, then deterministically replaying and checking them later. This approach extends checking coverage beyond the configurations and executions exercised by software vendors or beta testing sites. RacePro records multiple processes, detects races in the recording among system calls that may concurrently access shared kernel objects, and then tries different execution orderings of those system calls to determine which races are harmful and result in failures. To simplify race detection, RacePro models under-specified system calls as load and store micro-operations. To reduce false positives and negatives, RacePro uses a replay and go-live mechanism to distill harmful races from benign ones. We have implemented RacePro in Linux, shown that it imposes only modest recording overhead, and used it to detect a number of previously unknown bugs in real applications caused by process races.
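To make the load/store modeling idea concrete, here is a minimal sketch of how system calls recorded from several processes might be reduced to micro-operations on kernel objects and then checked for conflicting pairs. It is an illustration only, not RacePro's implementation: the `MicroOp` and `Syscall` classes, the object names such as `dentry:/tmp/out`, and the per-call micro-operation lists are all hypothetical. It also assumes that any two calls from different processes may run concurrently, whereas a real detector would first have to establish which calls can actually overlap.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MicroOp:
    kind: str  # "load" or "store"
    obj: str   # hypothetical kernel object name, e.g. "dentry:/tmp/out"


@dataclass(frozen=True)
class Syscall:
    pid: int
    name: str
    ops: tuple  # micro-operations this call is modeled as (assumed, not from RacePro)


def conflicts(a: Syscall, b: Syscall) -> bool:
    """Two calls conflict if they come from different processes, touch the
    same object, and at least one of the two accesses is a store."""
    if a.pid == b.pid:
        return False
    return any(
        x.obj == y.obj and "store" in (x.kind, y.kind)
        for x in a.ops
        for y in b.ops
    )


def find_races(trace):
    """Return every conflicting pair in the recorded trace.
    Simplification: all cross-process pairs are treated as concurrent."""
    return [(a, b) for a, b in combinations(trace, 2) if conflicts(a, b)]


# Hypothetical recording: process 1 checks for a file and then opens it,
# while process 2 creates the same file.
trace = [
    Syscall(1, "stat", (MicroOp("load", "dentry:/tmp/out"),)),
    Syscall(2, "creat", (MicroOp("store", "dentry:/tmp/out"),
                         MicroOp("store", "inode:/tmp/out"))),
    Syscall(1, "open", (MicroOp("load", "dentry:/tmp/out"),
                        MicroOp("load", "inode:/tmp/out"))),
    Syscall(2, "getpid", ()),
]

races = find_races(trace)
for a, b in races:
    print(f"{a.name}(pid {a.pid}) <-> {b.name}(pid {b.pid})")
```

Each reported pair is only a candidate. Deciding whether a candidate is harmful would still require replaying the recording with the two calls reordered and observing whether the application fails, which this sketch does not attempt.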
Resources
Pervasive Detection of Process Races in Deployed Systems
Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP '11), October 2011
Finding Concurrency Errors in Sequential Code—OS-level, In-vivo Model Checking of Process Races
Proceedings of the 13th USENIX Workshop on Hot Topics in Operating Systems (HotOS '11), May 2011