Guanyin: A Cloud Computing Infrastructure for Perpetual Checking of Deployed Software

Software reliability affects virtually everyone. Thorough software checking is crucial to improving software reliability, but the coverage of most existing checking techniques is severely limited by where they are applied: a software product is typically checked only at the site where it is developed, so the number of distinct states checked is bounded by that site's resources (e.g., machines, testers and users, software and hardware configurations).

To address this fundamental problem, we are investigating mechanisms that enable software vendors to continue checking for bugs after a product is deployed, and thus to check a far more diverse set of states. Our research focuses on the investigation, development, and deployment of: (1) a wide-area autonomic software checking infrastructure that supports continuous checking of deployed software in a transparent, efficient, and scalable manner; (2) a simple yet general and powerful checking interface that facilitates the creation of new checking techniques and the combination of existing techniques into more powerful means of finding subtle bugs that conventional pre-deployment testing often misses; (3) lightweight isolation, checkpoint, migration, and deterministic replay mechanisms that enable replication of application processes as checking launch points, isolation of replicas from users, migration of replicas across hosts, and replay of identified bugs without requiring the original execution environment; and (4) distributed computing mechanisms that efficiently and scalably leverage geographically dispersed idle resources by determining where and when replicas should be executed to improve the speed and coverage of software checking, thereby converting available hardware cycles into improved software reliability.
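
To make items (2) and (4) more concrete, the following is a minimal sketch of how a checking interface and a replica placement policy might fit together. It is not the Guanyin design: the names `Checkpoint`, `Host`, `Checker`, `schedule`, and `run_checks`, the notion of integer "free slots" per host, and the greedy most-idle-host policy are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical snapshot of a deployed process used as a checking launch point."""
    app: str
    state_id: int


@dataclass
class Host:
    """Hypothetical host description; free_slots counts replicas it can run while idle."""
    name: str
    free_slots: int


# Assumed checking interface: a checker inspects a replica started from a
# checkpoint and returns a bug description, or None if nothing was found.
Checker = Callable[[Checkpoint], Optional[str]]


def schedule(
    checkpoints: List[Checkpoint], hosts: List[Host]
) -> Tuple[Dict[int, str], List[Checkpoint]]:
    """Greedily place each checkpoint on the host with the most free slots.

    Returns a mapping from state_id to host name, plus the checkpoints that
    could not be placed. The caller's Host objects are not modified.
    """
    slots = {h.name: h.free_slots for h in hosts}
    placed: Dict[int, str] = {}
    pending: List[Checkpoint] = []
    for cp in checkpoints:
        # max() keeps the first host on ties, so placement is deterministic.
        best = max(slots, key=lambda name: slots[name], default=None)
        if best is None or slots[best] <= 0:
            pending.append(cp)
            continue
        slots[best] -= 1
        placed[cp.state_id] = best
    return placed, pending


def run_checks(
    checkpoints: List[Checkpoint], placed: Dict[int, str], checker: Checker
) -> List[Tuple[str, int, str]]:
    """Run the checker on every placed checkpoint and collect bug reports.

    Each report is (host name, state_id, bug description). A real system
    would run the replica remotely and in isolation; here it runs in-process.
    """
    reports = []
    for cp in checkpoints:
        host = placed.get(cp.state_id)
        if host is None:
            continue
        bug = checker(cp)
        if bug is not None:
            reports.append((host, cp.state_id, bug))
    return reports
```

In this sketch, a vendor would supply a `Checker` function, pass the current checkpoints and host list to `schedule`, and hand the resulting placement to `run_checks`; checkpoints left in the pending list would wait until more idle capacity appears. A realistic policy would also weigh factors the sketch ignores, such as migration cost and host configuration diversity.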