False Positives and False Negatives in Software Testing: Learning to Trust Your Tests

Teams and organizations are placing an increasing amount of trust in automated checks when determining whether or not an increment is fit for production, for the simple reason that there's less time available for testing. But what happens when those tests return the wrong results?
Bas Dijkstra | May 5, 2020 | 6 min read

Running the risk of preaching to the choir here, we all know (or should know by now) that software testing is an important and valuable part of the software development lifecycle. We also know that in many organizations, this lifecycle has seen radical changes in recent years. Where previously teams released new increments of their software a couple of times per year, they have now shifted to (or are in the process of shifting to) a Continuous Integration and Delivery model, where new increments can potentially be released multiple times per week or even per day.

This shift towards shorter release cycles and continuous testing has not decreased the importance of and need for testing, obviously. We all still want to release software that we can be proud of, and quality is an important aspect of that. However, it has had a significant impact on the time available for testing and, as a result, on the way testing is performed. Teams and organizations are placing an increasing amount of trust in automated checks when determining whether or not an increment is fit for production, for the simple reason that there's less time available for testing from the moment a developer checks their changes into version control to the moment the software is available to the end user.

In these Continuous Delivery pipelines, many important decisions, such as:

  • Does the code still meet our quality and formatting standards to a large enough extent that we can continue building on it?
  • Does the application still meet the codified expectations that we expressed in our unit, integration and full stack tests?
  • Does the application still meet the standards we set as a team in terms of performance, security and other quality attributes?
  • Do we feel ready to release our product to the production environment and make it available to our end users?

are made automatically, that is: without any form of human intervention. This means that a lot of trust is being placed in the automated tests that try to answer these questions. And there's a risk here, one that I'm sure many teams are (vaguely) aware of, but that I often don't see addressed properly: the risk of untrustworthy and downright deceptive tests.

In the remainder of this article, I’d like to identify some ways in which tests can deceive us, as well as give suggestions on how to make sure that we can trust our tests more and lessen the risk of untrustworthy tests.

False Positives in Software Testing

These are tests that fail for a reason other than a defect in your application under test. Common causes for false positives are:

  • a lack of reliable waiting and synchronization strategies (these occur especially often in tests that run through a graphical user interface; see the sketch after this list)
  • a failing test data strategy, resulting in a wrong initial application state before the test is run
  • test results being influenced by previous tests, either explicitly (test 2 building on the resulting state of test 1) or implicitly (test 2 suffering from residual state after test 1 has finished executing)
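
As an illustration of the first and last of these causes, here's a minimal sketch, assuming Python with pytest and Selenium, and a purely hypothetical page and element IDs: an explicit wait replaces a fixed sleep, and a fresh browser session per test prevents residual state from leaking from one test into the next.

```python
import pytest
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


@pytest.fixture
def driver():
    # A fresh browser session per test avoids test 2 suffering from
    # residual state left behind by test 1.
    driver = webdriver.Chrome()
    yield driver
    driver.quit()


def test_search_shows_results(driver):
    # Hypothetical URL and element IDs, for illustration only.
    driver.get("https://example.com/search")
    driver.find_element(By.ID, "search-button").click()

    # An explicit wait instead of a hard-coded sleep: the test waits (up to
    # ten seconds) for the element to become visible before asserting,
    # which removes a common cause of false positives in GUI-level tests.
    results = WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located((By.ID, "search-results"))
    )
    assert "results found" in results.text
```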

Having a lot of false positives, and as a result, a lot of failing builds, is a common source of frustration and dwindling trust in test automation, up to the point where teams disable tests or remove them from the delivery pipeline altogether.

False Negatives in Software Testing

While false positives can be a source of frustration for a team, false negatives can prove to be an even bigger risk, both to the extent to which teams trust their automation and to the product it's supposed to test.

False negatives are tests that are designed to verify a specific characteristic of the behavior or implementation of an application, but fail to detect the defects they are supposed to identify. Where false positives announce themselves with every test execution, false negatives show a pretty green checkmark while silently letting defects proceed to the next stage of the pipeline, and possibly all the way to production, unnoticed. Cue a support team member coming in with the latest report on user complaints, asking ‘Why didn’t you test this?’, while the development team members look at each other, thinking ‘… but we DID have a test for this! How could this have happened?’.
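
To make this concrete, here's a minimal, purely hypothetical sketch (Python, pytest-style) of a false negative: the test appears to verify the discount calculation, but its assertion can never execute, so the defect sails through with a green checkmark.

```python
def apply_discount(price, percentage):
    # Defect: the discount is accidentally applied twice.
    discounted = price * (1 - percentage / 100)
    return discounted * (1 - percentage / 100)


def test_apply_discount():
    prices = []  # test data setup accidentally left empty
    for price in prices:
        # The loop body never runs, so this assertion is never evaluated
        # and the test passes no matter what apply_discount returns.
        assert apply_discount(price, 10) == price * 0.9
```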

Luckily, there are a number of strategies to help you cure false positives and false negatives in your test automation, and, even better, prevent introducing them in the first place.

  • Perform code reviews on tests

Writing automated tests is writing software, so it's a good idea to treat your tests as first-class citizens in your code base. One activity to include is performing code reviews on your test automation whenever a change is about to be committed to version control. Did the creator of the test actually cover what was meant to be covered? Did they inadvertently make a mistake when defining the expected outcome?
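
As a hypothetical example of the kind of mistake a review can catch: the expected outcome below encodes the current (buggy) behavior rather than the requirement, so the test passes while the defect stays in place.

```python
def shipping_cost(order_total):
    # Requirement: orders of 50.00 or more ship for free.
    return 0 if order_total > 50 else 4.95  # bug: should be >=


def test_free_shipping_threshold():
    # A reviewer should ask: is 4.95 really what we expect at exactly 50.00?
    # The requirement says free shipping, so this expectation is wrong; the
    # test only passes because it mirrors the defect in the implementation.
    assert shipping_cost(50.00) == 4.95
```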

Oh, and these reviews should not be limited to tests that are written directly in code. It’s perfectly possible to perform reviews on low-code tests, too, before they’re added to version control and made part of automated delivery pipelines.

  • Test your tests

For every test you write, learn to ask a simple question: ‘Can it fail?’. Ideally, make a small change to the application you’re testing (introducing a timeout, changing the configuration slightly) to see if the test picks it up and fails gracefully, with an unambiguous and actionable message. If that’s not possible or feasible, at least change the expected outcome to see if that makes the test fail. Don’t just do this when a test is created; repeat it from time to time for existing tests, too, to check that they haven’t lost their defect detection power.
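
Here's a small, hypothetical illustration of asking ‘Can it fail?’ when injecting a fault into the application isn't practical: temporarily change the expected outcome, run the test once, and check that it fails with a message a colleague could act on.

```python
def normalize_email(address):
    return address.strip().lower()


def test_normalize_email():
    actual = normalize_email("  Jane.Doe@Example.COM ")
    # To test the test, temporarily change the expected value below (or
    # introduce a small fault in normalize_email) and run it once: it should
    # fail with the explicit message, not a cryptic stack trace. Restore the
    # correct expectation afterwards.
    assert actual == "jane.doe@example.com", (
        f"expected a trimmed, lower-cased address, got {actual!r}"
    )
```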

  • Review your test suite periodically

Every now and then, go through your entire test suite and check whether all of your automation is still relevant. Especially when the required feedback loop becomes shorter, every test should matter. Don’t keep and maintain a test only because of the effort it took to create it (i.e., beware of the sunk cost fallacy). Also, beware of thinking that more tests are always better, or that 100% test coverage (what does that even mean?) is the end goal to strive for. Make sure all of your tests deserve their place in your code base and delivery pipelines.

  • Look into techniques like mutation testing

There are several tools and techniques out there that can help you perform the above activities more efficiently through automation. One of those is mutation testing, a technique that helps improve the quality of your tests, typically of your unit tests. Mutation testing tools create small variations in your code base (these are referred to as ‘mutants’) and subsequently run your (unit) tests to see if they fail. If your test fails, then all is good, since the test successfully detected the change in your application. If the tests keep passing, however, there might be a problem with your tests!
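
Conceptually, that is all a mutation testing tool (PIT for Java or mutmut for Python, for example) automates for you; the hand-written, hypothetical sketch below shows a single mutant, a test that would survive it, and a test that would kill it.

```python
def is_adult(age):
    return age >= 18


# A mutation testing tool would generate a 'mutant' like this one, changing
# '>=' into '>', and then rerun the test suite against it.
def is_adult_mutant(age):
    return age > 18


def test_is_adult_weak():
    # If is_adult were replaced by the mutant, this test would still pass:
    # 30 and 5 give the same answer for '>=' and '>', so the mutant survives,
    # which signals that the test is missing a boundary check.
    assert is_adult(30)
    assert not is_adult(5)


def test_is_adult_boundary():
    # This test would fail against the mutant (18 > 18 is False), killing it.
    assert is_adult(18)
```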

Hopefully, this article will help you take a closer look at your tests, as well as at the trust you and your organization place in them. Are you ready to trust your tests?

Bas Dijkstra

I help teams and organizations improve their testing efforts through smart application of tools. I’m currently mostly active as a trainer, teaching people and teams how to make their first or their next move in the world of test automation. I love to share and expand my knowledge of the test automation field by delivering talks, workshops and training courses, both at conferences as well as on-site with my clients.