Tracking and Resolving Software Regressions

“Regressions” are a serious problem in many software projects. Especially in legacy systems, regressions can indicate a code base that is “fragile” — that is, one where a change in one area can break something in a totally unrelated feature. In more modern systems, regressions are generally indicators of process problems (e.g., an overly complicated source code branching strategy, or not using an “Infrastructure as Code” approach to version control your environments). In either case, regressions are generally perceived as indicators of some underlying problem, and a high rate of regressions is a legitimate cause for concern.

A “regression” is something that used to work that no longer does. In other words, the software has gone backward (regressed) in response to a change, rather than moving forward. The nightmare regression situation is code that has become so fragile that it breaks whenever you make a change. This often happens in legacy systems that were, or have become, tightly coupled and monolithic. In fact, this fragility can progress to the point where a system becomes literally unmaintainable: any change, no matter how small, incurs such a high cost of testing and debugging that further changes become economically unviable.

However, there are other causes for apparent regressions besides a fragile code base, and identifying the true cause early can speed resolution of the issue and avoid misdiagnosis.

Tracking Regressions

In the made-up “regression tracker” example below, cells E8 and E11 appear to be regressions in the classical “code change breaking something else” sense. Judging by the version numbers, there were no major changes made to the environment, the data, or the test case itself that would have caused a test that had previously passed to suddenly fail. The only significant change is the build itself. Our first suspicion, therefore, is that this is a coding problem: something is broken that used to work. While analysis is needed to be certain, we have good reason to suspect this to be a literal regression.

Figure 1: Example regression tracker

For cells K10 and L13, we should suspect, absent specific knowledge to the contrary, that a change in the test case itself may have caused the failure. Both tests had been consistently passing in previous builds, even in multiple environments and data configurations. The only significant change here, in addition to the build, is that the test case version has changed. We should therefore look at changes to the test case as a potential cause for this apparent regression, in addition to looking at the code itself. Note that the outcome of this analysis can go either way: the new test case might be right, while the old test case that had been passing was wrong. However, we at least know there is a bug somewhere (in the test case, in the code, or both) and that this is probably not a classical regression. It is more likely either an issue with the new test, or a long-term bug in the code that was not caught by the old test.

On the subject of long-uncaught bugs, I like the saying, “Don’t confuse the end of an illusion with the beginning of a crisis.” A bug that has been in the code base for a long time does not become a regression when it is finally discovered. Unless it’s something that used to work and no longer does, it’s not a regression. If it never worked (or hasn’t worked for a very long time), it’s your state of knowledge that has changed, not the state of the system. While fixing this bug may be important now that you know about it, treating it as a “regression” mischaracterizes it and can lead to the wrong root-cause action. The right action — besides fixing it plus any related issues — is to improve the test process so that future bugs of this nature will be caught sooner.
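
The distinction above can be sketched in a few lines of Python. This is only an illustration, assuming a hypothetical pass/fail history per test (oldest run first); the function name and the labels it returns are invented for this example.

```python
from typing import Sequence


def classify_failure(history: Sequence[bool]) -> str:
    """Label the most recent failure in a test's pass/fail history.

    `history` is ordered oldest to newest; True means the run passed.
    The last entry must be a failure.
    """
    if not history or history[-1]:
        raise ValueError("history must end with a failing run")
    # A regression requires evidence that the behavior once worked.
    if any(history[:-1]):
        return "regression"
    # No earlier pass on record: our knowledge changed, not the system.
    return "newly discovered bug"
```

A real tracker would also need a policy for the “hasn’t worked for a very long time” case, for example by only counting passes within a recent window; that choice is left open here.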

Resolving Regressions

As you see in the tracker, we strongly recommend “versioning” everything: the environments, the data, the test cases and the code. Versioning “data” can mean different things in different contexts. In some cases, it might mean versioning the database schema or schemas; in others it might be an identifier for a given test data set. In the case of NoSQL and event-driven architectures, the “data version” might describe a configuration-controlled set of contracts. In any case, we recommend that whenever you version anything, you follow the convention that a “breaking change” gets a new major version number. For example, a change from data version 3.8.1 to data version 4.0.0 would indicate a breaking change, while a change from version 3.7 to 3.8 would not. Breaking changes need to be planned for and staged; minor changes should be backward-compatible, and everything should “just work.”
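
As a minimal sketch of this convention, the check below treats any increase in the major version number as a breaking change. It assumes plain dotted numeric versions such as 3.8 or 3.8.1; the function names are hypothetical and not taken from any particular tool.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse 'MAJOR[.MINOR[.PATCH]]' into integers, padding missing parts with 0."""
    parts = [int(p) for p in version.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported version format: {version!r}")
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def is_breaking_change(old: str, new: str) -> bool:
    """True when the major version number increased between old and new."""
    return parse_version(new)[0] > parse_version(old)[0]


print(is_breaking_change("3.8.1", "4.0.0"))  # True
print(is_breaking_change("3.7", "3.8"))      # False
```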

In the case of cell I9, there was a major change made to the data configuration between the previous test run (where the test had passed) and the current test run. This may well be an indicator that the code being tested has not been adapted to accommodate the new data version. If this is the case, cell I9 is not a regression in the sense that one code change broke something else; rather it’s a failure of the code to support a (presumably planned) major data version change. While perhaps a fine distinction, knowing the probable cause of the issue can speed resolution.

The failures in cells J11, J12 and F17 point to possible environment issues. For cells J11 and J12, the same version of the test case had been passing in multiple previous environment versions. The major breaking change to the UAT environment from version 1.8.0 to 2.0 is a possible source of these failures, which should be evaluated in addition to the code itself. In the case of cell F17, the same version of the test case against the same version of the code failed in the staging environment, but it passed in the UAT environment. We should suspect some environmental dependency there, or else code that is not tolerant of certain environment features, rather than a regression per se.
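
The triage reasoning in this article can be approximated in code. The sketch below compares a failing run with the last passing run of the same test and lists what changed, as a first guess at where to look. The record fields, function names, labels and build names are assumptions made for illustration, not the layout of the tracker in Figure 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RunRecord:
    """One tracker cell: a test result plus the versions it ran against."""
    build: str
    test_case_version: str
    data_version: str
    env_name: str
    env_version: str
    passed: bool


def major(version: str) -> int:
    """Return the major component of a dotted version such as '2.0' or '1.8.0'."""
    return int(version.split(".")[0])


def suspects(last_pass: RunRecord, failure: RunRecord) -> List[str]:
    """List what to examine first, comparing a failure with the last passing run."""
    if not last_pass.passed or failure.passed:
        raise ValueError("expected a passing run followed by a failing run")
    found = []
    if failure.test_case_version != last_pass.test_case_version:
        found.append("test case change")
    if major(failure.data_version) != major(last_pass.data_version):
        found.append("breaking data change")
    if failure.env_name != last_pass.env_name:
        found.append("environment dependency")
    elif major(failure.env_version) != major(last_pass.env_version):
        found.append("breaking environment change")
    # Nothing but the build differs: the classical regression case.
    return found or ["code regression"]


# Hypothetical records; only the UAT 1.8.0 -> 2.0 change comes from the text.
uat_pass = RunRecord("build-41", "1.2", "3.8.1", "UAT", "1.8.0", True)
uat_fail = RunRecord("build-42", "1.2", "3.8.1", "UAT", "2.0", False)
print(suspects(uat_pass, uat_fail))  # ['breaking environment change']
```

The returned labels are starting points for analysis rather than verdicts: as noted above, the code itself should still be evaluated even when a test case, data or environment change is the leading suspect.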

Conclusion

While simple in concept, tracking a test’s pass-fail history together with the relevant environmental and execution factors is a powerful tool to understand the root cause of failures that are, or that can appear to be, regressions. This tracking requires a number of best practices, such as getting your test cases under source control (and versioning them), as well as versioning execution environments, test data, and sometimes schemas or contracts. The benefit, however, is the ability to identify the root cause of an apparent regression quickly and objectively.