Regression testing of software is costly — but you can do something about it!
Introduction
To establish a confidence in the correctness of their software, developers will often write test suites that they will re-run as the program is modified, attempting to ensure that they have not introduced regressions, or faults that arise as an unintended side effect of changes. This process, known as regression testing, is important and yet expensive.
Regression testing is costly because, as software grows in size and complexity, a suite accumulate more tests and therefore take longer to run. For instance, a 2011 conference paper reports that the regression testing of a Microsoft product required several days. Of course, even if a regression test suite only takes minutes to run, developers may find its execution distracting.
Techniques
Prior work has developed many techniques to address the substantial computational cost associated with regression testing, with test suite reduction and prioritization emerging as two of the most promising. Test suite reduction aims to find a smaller test suite that covers the same requirements as does the full test suite (Lin, Tang, and Kapfhammer 2014)
Another approach to regression testing involves selecting only those test cases that exercise the program functionality that was most recently changed. A recent paper reports that test selection saves Microsoft millions of dollars every year — all without compromising the quality of the program under test. So, interested in learning more about regression testing? If you are, then please read (Kapfhammer 2010)
Do you perform regression testing of software in industry? Or, do you conduct regression testing on an open-source tool? Are you interested in discovering ways to transition cutting-edge research into your practice of regressing testing? If you answered “yes” to any of these questions, please contact me with your own ideas and experiences. Finally, since I wrote this blog post, my colleagues and students and I have published additional papers about regression testing, with a focus on re-running tests for database schemas, in papers like (Alsharif, Kapfhammer, and McMinn 2020)