Automatically detecting and removing ineffective database schema mutants
Introduction
Since data is one of an organization’s most valuable and strategic assets, testing the relational database schema, which protects the integrity of this data, is of paramount importance. Mutation analysis is a means of estimating the fault-finding “strength” of a test suite. As with program mutation, however, relational database schema mutation results in many “ineffective” mutants that both degrade test suite quality estimates and make mutation analysis more time consuming. My colleagues and I recently published a paper that presents a taxonomy of ineffective mutants for relational database schemas, summarizing the root causes of ineffectiveness with a series of key patterns (McMinn et al. 2019)
Using these patterns, the paper also presents algorithms that automatically detect and remove ineffective mutants. In an experimental study involving the mutation analysis of 34 schemas used with three popular relational database management systems—HyperSQL, PostgreSQL, and SQLite—the results show that the presented techniques can identify and discard large numbers of ineffective mutants that can account for up to 24% of mutants overall, leading to a change in mutation score for 33 out of 34 schemas. The tests for seven schemas were found to achieve 100% scores, indicating that they were capable of detecting and killing all non-equivalent mutants. The results also reveal that the execution cost of mutation analysis may be significantly reduced, especially for popular, yet “heavyweight”, DBMSs like PostgreSQL.
Background
Relational database schemas define the structure and type of data that will reside within a database, declaring any relationships between pieces of data that may exist. Integrity constraints are vital for protecting the validity and authenticity of database data. There are five common types of constraints: PRIMARY KEY
, UNIQUE
, NOT NULL
, FOREIGN KEY
, and CHECK
constraints. Testing these constraints is crucial to ensure the integrity of the data stored in the relational database. Mutation analysis helps to evaluate the quality of the tests.
Specifically, mutation analysis estimates the fault-finding capability of a test suite by generating copies of an artifact under test and seeding small faults, known as “mutants”, into those copies. The percentage of mutants detected by a test suite is known as the “mutation score”: the higher the mutation score, the stronger the suite is judged to be at trapping real faults. However, mutation analysis can result in the generation of many “ineffective mutants”, which can be invalid, equivalent, or redundant. These ineffective mutants reduce the usefulness of the final mutation score and incur an execution time overhead.
Ineffective Mutants
Ineffective mutants can manifest in various ways. For instance, PRIMARY KEY
constraints ensure the uniqueness of database table rows, which is also a property of UNIQUE
constraints. This leads to a source of equivalent mutants in the SQLite DBMS. This paper identifies a wide range of representative root causes of ineffectiveness in the mutants of relational database schemas and summarizes these root causes into a number of patterns in database schemas that a tool can use for ineffective mutant detection. The paper also introduces a new type of ineffective mutant: the “impaired” mutant, which represents infeasible database schemas that are not completely invalid but are trivially killed by test cases.
Building on this understanding of ineffective mutations, the paper also introduces ways to statically analyze database schema mutants, identifying those that are ineffective and removing them from the mutant pool used in mutation analysis. These algorithms are implemented in our open-source tool called SchemaAnalyst. An experimental study using real-world database schemas hosted by three relational database management systems shows that these techniques can detect and remove large numbers of ineffective mutants, making mutation scores more useful and reducing the execution cost of mutation analysis.
Conclusion
The paper (McMinn et al. 2019)
As my colleagues and I continue to explore both mutation testing and database schema testing, your insights and suggestions are appreciated! If you have ideas on this topic, please contact me. Or, if you want to stay informed about new developments and blog posts related to mutation testing and other software testing topics, you can subscribe to my mailing list.