Context-based Online Configuration-Error Detection
Proceedings of the USENIX Annual Technical Conference (USENIX ’11), June, 2011
Software failures due to configuration errors are commonplace as computer systems continue to grow larger and more complex. Troubleshooting these configuration errors is a major administration cost, especially in server clusters where problems often go undetected without user interference.
This paper presents CODE—tool that automatically detects software configuration errors. Our approach is based on identifying invariant configuration access rules that predict what access events follow what contexts. It requires no source code, application-specific semantics, or heavyweight program analysis. Using these rules, CODE can sift through a voluminous number of events and detect deviant program executions. This is in contrast to previous approaches that focus on only diagnosis. In our experiments, CODE successfully detected a real configuration error in one of our deployment machines, in addition to 20 user-reported errors that we reproduced in our test environment. When analyzing month-long event logs from both user desktops and production servers, CODE yielded a low false positive rate. The efficiency of CODE makes it feasible to be deployed as a practical management tool with low overhead.code:usenix11