Testing the Tests via Mutation
To set the stage for this post, I first need to repeat a message you have probably heard from many other sources: code coverage can only tell you how bad your test suite is, not how good it is. If you have bad coverage you know something is wrong, but a high coverage number tells you virtually nothing. This seems obvious to many developers out there, but a surprisingly large number of them have not taken it to heart.
I have been a teaching assistant for several undergraduate and graduate testing classes. Every year when I grade the unit testing projects, about a quarter of the students hand in tests with “good” coverage but not a single assert statement to check the results. This happens even after I get up in front of the class and rant about the exact same problem from previous years. Needless to say, those students tend to do poorly.
So, if test coverage is not a good metric, how can developers measure the quality of their test suites? I’m going to talk about one approach in this post, mutation testing.
What is Mutation Testing?
When testing code, developers want to confirm that the application does what it is supposed to do by trying a whole bunch of different cases where the end result is known. Mutation testing applies a similar idea to testing the test suite by intentionally inserting bugs into the application’s source code. If the test suite can’t find these bugs, then we know something is wrong. We can further study the effectiveness of individual test cases by observing the number of mutations each test case can find. If there are test cases that detect no mutations, then they need to be examined to find out exactly what functionality they are trying to test.
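To make the idea concrete, here is a minimal sketch in Python. The function names (add, add_mutant, weak_test, strong_test, is_killed) are made up for illustration, and "killed" is the usual term for a mutant that a test detects. The weak test mirrors the student projects described earlier: it executes the code but asserts nothing.

```python
def add(a, b):
    """Original function under test."""
    return a + b


def add_mutant(a, b):
    """Hand-written mutant: '+' replaced by '-'."""
    return a - b


def weak_test(fn):
    """Executes the code (so it counts toward coverage) but checks nothing."""
    fn(2, 3)
    return True


def strong_test(fn):
    """Checks the result, so it can notice a changed operator."""
    return fn(2, 3) == 5


def is_killed(test, mutant):
    """A mutant is killed when a test passes on the original but fails on the mutant."""
    return test(add) and not test(mutant)


if __name__ == "__main__":
    print("weak test kills mutant:", is_killed(weak_test, add_mutant))
    print("strong test kills mutant:", is_killed(strong_test, add_mutant))
```

Both tests give the same coverage of add, but only the one that checks its result can tell the original from the mutant.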
Now at this point I do need to say something that should be obvious. Like most forms of testing, mutation can only show you the presence of problems, not guarantee their absence. If mutations are found, it does not necessarily mean that the test suite is finding all bugs. The goal of mutation testing is to give developers more confidence in their test suite, and, as we talk about the kinds of mutations automated tools do, you’ll see where the limitations of this form of testing are.
What Kinds of Mutations are There?
The mutations performed by automated tools are usually fairly straightforward. They are not large logic changes, but instead small replacements of logical and arithmetic operators. For instance, mutated code may replace the ‘+’ character with a ‘-’ character and vice versa. It may change logical AND operators such as && to logical OR operators such as ||. Mutation will also change comparisons such as == to < or > as well as swap increments like ++ for decrements such as --. In some cases, mutation may also change constant values, change return codes, or even reorder case statements.
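As a rough illustration of how a tool might apply such an operator swap, the sketch below uses Python's standard ast module to exchange '+' and '-' in a piece of source code. The names SwapAddSub, mutate_source, and total are hypothetical, and for brevity this swaps every occurrence at once, whereas real tools typically generate one mutant per change.

```python
import ast


class SwapAddSub(ast.NodeTransformer):
    """Replace every binary '+' with '-' and every binary '-' with '+'."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # mutate nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        elif isinstance(node.op, ast.Sub):
            node.op = ast.Add()
        return node


def mutate_source(source):
    """Parse source, apply the swap, and return the mutant's namespace."""
    tree = SwapAddSub().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace


SOURCE = "def total(a, b, c):\n    return a + b - c\n"

if __name__ == "__main__":
    mutant = mutate_source(SOURCE)
    # The original computes 5 + 3 - 1; the mutant computes 5 - 3 + 1.
    print(mutant["total"](5, 3, 1))
```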
Developers need to be aware that some mutations may not really change the semantic meaning of the source code. For instance, if a loop is incrementing an index value (i++) until a check (i == 5) is made, swapping the == comparator for >= will create an equivalent mutant: in this particular case, i == 5 will trigger at the same time as i >= 5. Performing this mutation should not trigger a failure of any of the unit tests, no matter how good they are. Some mutation tools will attempt to reduce the number of equivalent mutants, but their success rates will vary with every case.
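The loop example can be written out as follows. This is only a sketch with invented function names, and it assumes the index starts at 0 and increases by exactly one per iteration, which is what makes the two checks interchangeable.

```python
def count_steps_original():
    """Loop until the index equals 5."""
    i = 0
    steps = 0
    while True:
        if i == 5:
            break
        steps += 1
        i += 1
    return steps


def count_steps_mutant():
    """Same loop with '==' mutated to '>='."""
    i = 0
    steps = 0
    while True:
        if i >= 5:
            break
        steps += 1
        i += 1
    return steps


if __name__ == "__main__":
    # Both versions stop at the same iteration, so no test can tell them apart.
    print(count_steps_original(), count_steps_mutant())
```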
What Code Should be Mutated?
One of the biggest reasons mutation has not been as popular as some think it should be is that it leads to a large number of mutant applications, each of which has to be built and tested. So, if the time to compile the application or the time to execute the test suite is long, developers have to be careful about how their applications are to be mutated.
A simple way to organize mutation testing is to first figure out how many mutants you have the time to test. The formula for that is simple: int NumMutants = MutationTime/(CompileTime+TestingTime). If we only introduce one mutation per version, mutation testing may not get you very far. It is clear that for any non-trivial application we will have to do better.
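Here is the formula as a small Python function. The numbers in the usage example (an eight-hour overnight budget, a two-minute build, and a six-minute test run) are invented purely for illustration.

```python
def num_mutants(mutation_time, compile_time, testing_time):
    """Number of mutant versions that fit in the time budget.

    All three arguments must use the same unit (seconds here).
    """
    return int(mutation_time // (compile_time + testing_time))


if __name__ == "__main__":
    budget = 8 * 60 * 60   # hypothetical overnight budget: 8 hours
    build = 2 * 60         # hypothetical compile time: 2 minutes
    tests = 6 * 60         # hypothetical test suite time: 6 minutes
    print(num_mutants(budget, build, tests))
```

With these made-up numbers only 60 versions fit in a night, which is why a single mutation per version does not go far.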
It is not clear, however, just how many mutations should be introduced in a version. Simply changing every operator that can be changed won’t do much good as a single mutation may always be caught as a bug, well before other mutations are even touched. We need to do something smarter.
What we really want is to introduce as many independent mutations as we can and run only those test cases that may be affected by those mutations. This is exactly the information we get from coverage. To target a test case, we can study what code is covered by that test case and then introduce mutations, one at a time, only in that code. Then, by separating the test cases into sets such that each test case in the set covers a different portion of the code, we can introduce mutations in the different regions.
For an example consider the following piece of code:
//SET UP
if(a == TRUE){
    //DO SOMETHING
}else{
    //DO SOMETHING ELSE
}
Suppose we have two test cases that both execute SET UP, after which test case 1 executes DO SOMETHING while test case 2 executes DO SOMETHING ELSE. We can then introduce mutations in DO SOMETHING as well as DO SOMETHING ELSE in the same version, while keeping the rest of the code the same. This way we exercise two different test cases with a single build. If we mutate SET UP, both cases should fail and therefore detect that bug. Introducing mutations in DO SOMETHING and DO SOMETHING ELSE in addition to mutating SET UP will get us nowhere, since the bug in SET UP will be caught first.
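One possible way to find such independent regions from coverage data is sketched below. It assumes coverage is available as a mapping from test name to the set of code regions that test executes. The function name independent_regions and the region labels are hypothetical, and a real tool would work with line or branch numbers instead.

```python
from collections import Counter


def independent_regions(coverage):
    """Map each test to the regions that only it covers.

    Regions shared by several tests (such as SET UP) are left out, because a
    mutation there would be caught by every test and mask the others.
    """
    counts = Counter(
        region for regions in coverage.values() for region in regions
    )
    return {
        test: sorted(r for r in regions if counts[r] == 1)
        for test, regions in coverage.items()
    }


if __name__ == "__main__":
    coverage = {
        "test_case_1": {"SET UP", "DO SOMETHING"},
        "test_case_2": {"SET UP", "DO SOMETHING ELSE"},
    }
    # One mutation per test can go into the same mutant version.
    print(independent_regions(coverage))
```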
Thus, we have to be smart about where mutations are introduced. Some mutation tools still take the simple approach of only mutating the code in one place and then executing the entire test suite. When applying this technique, you may be able to reduce the amount of time spent by only running tests that cover the mutated code. In addition, if your setup allows for incremental compiling, or is a scripted system, your build time may be short as well.
What Tools are Out There?
Mutation testing is heavily dependent on the existence of tools. Therefore I want to end this by pointing to a few of the ones out there:
- Jumble for Java
- Jester for Java and Pester for Python
- Heckle for Ruby
- Nester for C#
- SQLMutation for SQL
If you know of other free tools, please post them in the comments and I will add them to the list. Two other great resources to check out are Mutation Testing Online and the Wikipedia entry for Mutation Testing.