Can human testers understand automatically generated test cases?
Introduction
Throughout my career, my research has focused on creating and evaluating methods for automatically generating test cases and test inputs, thereby supporting programmers with clever tests that may find defects in, for instance, their programs or database schemas. As an example, (Alsharif, Kapfhammer, and McMinn 2018)
Research
Since relational databases are a key component of software systems ranging from small mobile to large enterprise applications, there are well-studied methods that automatically generate test cases for database-related functionality. As explained in a previous blog post called Introducing a research foundation for testing relational database schemas, a schema testing tool automatically generates INSERT statements with data values designed to either satisfy the schema (i.e., be accepted into the database) or violate it (i.e., be rejected from the database). From my own experience, writing schema tests is difficult because a human tester has to carefully reason about the sometimes complex relational schema that protects the database’s contents. While having tools like SchemaAnalyst (McMinn et al. 2016)
The aforementioned paper, (Alsharif, Kapfhammer, and McMinn 2019) INSERTs influences human understandability: the use of default values for elements not involved in the test (but necessary for adhering to SQL’s syntax rules) aided participants, allowing them to easily identify and understand the important test values. Yet, negative numbers and “garbage” strings hindered this process.
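To make the contrast between default values and "garbage" values concrete, here is a minimal sketch using Python's built-in sqlite3 module. The account table, its columns, and the helper names are hypothetical illustrations and are not taken from the study or from SchemaAnalyst. Both tests exercise the same UNIQUE constraint on email; only the surrounding values differ.

```python
import sqlite3

# Hypothetical schema used only for illustration.
SCHEMA = """
CREATE TABLE account (
    id      INTEGER PRIMARY KEY,
    email   TEXT NOT NULL UNIQUE,
    name    TEXT NOT NULL,
    balance INTEGER NOT NULL
);
"""

INSERT = "INSERT INTO account (id, email, name, balance) VALUES (?, ?, ?, ?)"


def is_accepted(conn, statement, values):
    """Return True if the database accepts the INSERT, False if the schema rejects it."""
    try:
        conn.execute(statement, values)
        return True
    except sqlite3.IntegrityError:
        return False


def run_unique_test(first_row, second_row):
    """Insert two rows into a fresh database and report whether each was accepted."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    outcomes = [is_accepted(conn, INSERT, row) for row in (first_row, second_row)]
    conn.close()
    return outcomes


# Simple, repeated defaults: the duplicated email is the only value that stands out.
readable = run_unique_test((1, "a@example.com", "a", 0),
                           (2, "a@example.com", "a", 0))

# Negative numbers and random strings: same duplicated email, but harder to spot.
noisy = run_unique_test((-731, "xkqjw", "pzvrt", -52),
                        (418, "xkqjw", "ghmqs", -9904))

print(readable, noisy)
```

In both cases the first INSERT is accepted and the second is rejected, so the two tests are equivalent from the database's point of view. The difference lies only in how quickly a human reader can see why the second statement fails.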
The second finding is more far-reaching: humans found the outcome of test cases very difficult to predict when NULL was used in conjunction with foreign keys and CHECK constraints. This suggests that, while including NULLs can surface the confusing semantics of database schemas, their use makes tests less understandable for humans. While these results specifically apply to database schema tests, we anticipate that they may also apply to automatically generated tests for general-purpose programs in languages like Java.
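The following sketch illustrates why NULL makes outcomes hard to predict. It assumes SQLite (through Python's sqlite3 module) and a hypothetical author/book schema that does not come from the study. In SQLite, a CHECK constraint is violated only when its expression evaluates to false, and a NULL foreign key column does not need a matching parent row, so INSERTs that look like they should be rejected are accepted.

```python
import sqlite3


def accepted(conn, sql, params=()):
    """Return True if the statement is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
# Hypothetical schema used only for illustration.
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY);
CREATE TABLE book (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER REFERENCES author(id),
    price     INTEGER CHECK (price > 0)
);
""")

accepted(conn, "INSERT INTO author (id) VALUES (1)")
insert_book = "INSERT INTO book (id, author_id, price) VALUES (?, ?, ?)"

results = {
    "valid author, valid price": accepted(conn, insert_book, (1, 1, 10)),
    "missing author":            accepted(conn, insert_book, (2, 99, 10)),
    "NULL author":               accepted(conn, insert_book, (3, None, 10)),
    "price of zero":             accepted(conn, insert_book, (4, 1, 0)),
    "NULL price":                accepted(conn, insert_book, (5, 1, None)),
}
conn.close()

for description, outcome in results.items():
    print(f"{description}: {'accepted' if outcome else 'rejected'}")
```

The rows with a missing author and a price of zero are rejected, while the rows with a NULL author and a NULL price are accepted. A tester who reads the CHECK constraint as "price must be positive" or the foreign key as "every book needs an author" would likely predict the opposite for the NULL cases.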
Recommendations
In the context of database schema testing, my co-authors and I make the following suggestions, both for the software engineers who manually write tests and for the developers who create tools that automatically generate tests.
Negative numbers and NULL values are confusing for human testers and tests should only include them when they are needed to reveal a defect.
Tests should use simple repetitions of numerical, categorical, and textual values for unimportant values because they make it easier for human testers to focus on the critical values on which the test’s status hinges.
Since readable strings, in comparison to random textual values, help to ensure that human testers better understand the intention of the test case, they should be used if possible.
While some automated test data generation methods are starting to adopt these suggestions, more work remains to create automated test data generation tools and manual testing guidelines that ensure that tests are understandable by human testers. Do you have ideas about how to make tests more readable? Do you want to collaborate with me on research to make the next generation of automated test data generators? If so, then I hope that you will contact me with your ideas!