This work is supported by the Office of Naval Research (ONR) through Contract N00014-12-1-0166. Opinions, findings, conclusions and recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the US Government or ONR.
Dynamic data flow tracking (DFT) is a technique broadly used in a variety of security applications that, unfortunately, exhibits poor performance, preventing its adoption in production systems. We present ShadowReplica, a new and efficient approach for accelerating DFT and other shadow memory-based analyses, by decoupling analysis from execution and utilizing spare CPU cores to run them in parallel. Our approach enables us to run a heavyweight technique, like dynamic taint analysis (DTA), twice as fast, while concurrently consuming fewer CPU cycles than when applying it in-line. DFT is run in parallel by a second shadow thread that is spawned for each application thread, and the two communicate using a shared data structure. We avoid the problems suffered by previous approaches, by introducing an off-line application analysis phase that utilizes both static and dynamic analysis methodologies to generate optimized code for decoupling execution and implementing DFT, while it also minimizes the amount of information that needs to be communicated between the two threads. Furthermore, we use a lock-free ring buffer structure and an N- way buffering scheme to efficiently exchange data between threads and maintain high cache-hit rates on multi-core CPUs. Our evaluation shows that ShadowReplica is on average ~2.3x faster than in-line DFT (~2.75x slowdown over native execution) when running the SPEC CPU2006 benchmark, while similar speed ups were observed with command-line utilities and popular server software. Astoundingly, ShadowReplica also reduces the CPU cycles used up to 30%.