Alias analysis is perhaps one of the most crucial and widely used analyses, and has attracted tremendous research efforts over the years. Yet, advanced alias analyses are extremely difficult to get right, and the bugs in these analyses are most likely the reason that they have not been adopted to production compilers.
We are building NeonGoby, a system for effectively detecting errors in alias analysis implementations, improving their correctness and hopefully vastly widening their adoption. We explicitly designed NeonGoby to be agnostic to the alias analysis it checks: the only requirement is a standard MayAlias(p, q) interface that returns true if p and q may alias and false otherwise. This minimum requirement ensures maximum applicability and ease of use. To check an alias analysis with NeonGoby, a user additionally chooses a test program and workload at her will. For instance, she can choose a large program such as Apache and MySQL and a stressful workload that together exercise many diverse program constructs, such as the corner cases listed in the previous paragraph. This flexibility enables NeonGoby to catch alias analysis bugs that manifest on real-world programs and workloads.
We currently implemented NeonGoby within the LLVM compiler, and checked three popular LLVM alias analyses, including (1) basicaa, LLVM's default alias analysis; (2) anders-aa, LLVM's implementation of an inter-procedural Andersen's algorithm; (3) and ds-aa, a context-sensitive, field-sensitive algorithm with full heap cloning. To check these analyses, we selected two real-world programs MySQL and Apache and the workloads their developers use. NeonGoby found 29 bugs in anders-aa and ds-aa, including 24 previously unknown bugs, with only 2 false positives and reasonable performance overhead. We have reported five bugs to ds-aa developers, 4 of which have been patched.