Scalable and Systematic Detection of Buggy Inconsistencies in Source Code

Mark Gabel, Junfeng Yang, Yuan Yu, Moises Goldszmidt, Zhendong Su

Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA ’10), October, 2010


Software developers often duplicate source code to replicate functionality. This practice can hinder the maintenance of a software project: bugs may arise when two identical code segments are edited inconsistently. This paper presents DejaVu, a highly scalable system for detecting these general syntactic inconsistency bugs. DejaVu operates in two phases. Given a target code base, a parallel inconsistent clone analysis first enumerates all groups of source code fragments that are similar but not identical. Next, an extensible buggy change analysis framework refines these results, separating each group of inconsistent fragments into a fine-grained set of inconsistent changes and classifying each as benign or buggy. On a 75+ million line pre-production commercial code base, DejaVu executed in under five hours and produced a report of over 8,000 potential bugs. Our analysis of a sizable random sample suggests with high likelihood that at this report contains at least 2,000 true bugs and 1,000 code smells. These bugs draw from a diverse class of software defects and are often simple to correct: syntactic inconsistencies both indicate problems and suggest solutions.



Columbia University Department of Computer Science