Many previous systems can aid race detection, and diagnosing. However, they do not directly address deployed races. A conventional solution to fixing deployed race is software update, but this method requires application restarts. Live update system can avoid restarts by adapting conventional patches into hot patches and applying them to live systems, but conventional patches can introduce new errors, and creating a releasable patch can still take time.
Loom is a “live-workaround” system designed to quickly protect applications against races until correct conventional patches are available and the applications can be restarted. It reflects our belief that the true power of live update is its ability to provide immediate workarounds. To use Loom, developers first compile their application with Loom. At runtime, to workaround a race, an application developer writes an execution filter that synchronizes the application source to filter out racy thread interleavings. This filter is kept separate from the source. Application users can then download the filter and, for immediate protection, install it to their application without a restart.
We evaluated LOOM on nine real races from a diverse set of six applications: two server applications, MySQL and Apache; one desktop application PBZip2 (a parallel compression tool); and implementations of three scientific algorithms in SPLASH2. Our results show that Loom is effective, fast, scalable, and easy to use.
Deployed multithreaded applications contain many races because these applications are difficult to write, test, and debug. Worse, the number of races in deployed applications may drastically increase due to the rise of multicore hardware and the immaturity of current race detectors.
LOOM is a “live-workaround” system designed to quickly and safely bypass application races at runtime. LOOM provides a flexible and safe language for developers to write execution filters that explicitly synchronize code. It then uses an evacuation algorithm to safely install the filters to live applications to avoid races. It reduces its performance overhead using hybrid instrumentation that combines static and dynamic instrumentation.
We evaluated LOOM on nine real races from a diverse set of six applications, including MySQL and Apache. Our results show that (1) LOOM can safely fix all evaluated races in a timely manner, thereby increasing application availability; (2) LOOM incurs little performance overhead; (3) LOOM scales well with the number of application threads; and (4) LOOM is easy to use.
We released Loom open-source at GitHub.