As the novel coronavirus (COVID-19) pandemic spread across the globe in the past couple of months, we’ve been learning a lot about the disease and the efforts to control its impact.
One of the subjects that almost always came up in any discussion of COVID-19, especially in the USA, where I live, is “testing”. This intrigued me as a DevOps (and software testing) professional. While I am not an expert in healthcare, and while COVID-19 testing and its implications are quite different from those of software testing, I quickly began to notice the similarities between the two.
This blog post attempts to capture some of the parallels I have noticed between the two testing domains, and the lessons we can learn from each that may apply to the other.
COVID-19 Testing and Software Testing and Quality Assurance bear a great degree of resemblance
The landscape around COVID-19 (and its testing) is changing rapidly from day to day; this article attempts to capture the facts that reflect the current reality. Given the pace of change, I recognize that some of the COVID-related facts in this article may soon become outdated; however, some of the principles and lessons learnt may still apply.
I will organize my observations and thoughts into the following subsections.
1. Testing is very important for both
In almost all discussions on COVID-19, we heard that testing is of paramount importance to understand where we stand in terms of the quality of public health. The testing data (and the number of positive cases identified) are also vital inputs into the planning, containment and remediation processes. I heard experts say that they needed to do more and more COVID-19 testing to improve their understanding of the situation and plan accordingly. The following screenshot from one of New York Governor Andrew Cuomo’s daily COVID-19 briefings crisply summarizes how important testing is.
The same is true for software of course. Testing is the leading process for identifying defects and flaws in the software. Especially in an age of agile delivery (where speed and time to market are often the most important drivers), there is a need to balance speed with quality (and testing) – and quality has significant impact on business outcomes.
While it is clear why testing is important in both domains, we need to distinguish the differences in the implications of testing. In both, the purpose of testing (the “activity”) is to produce quality as the “outcome”: the quality of public health or the quality of software, respectively.
In the case of COVID-19 testing, that outcome (quality of public health) is the minimization of the number of infections and most importantly the number of fatalities. The way to do that is to contain the spread of the disease. Since the infection spreads so fast, it is imperative to identify infected subjects as early as possible so that containment measures can be put in place.
In the case of software testing, similarly, the outcome (quality of software) is the minimization of the number of serious defects leaked to production and the improvement of the reliability and UX/CX of the software or service. And we all know that in software testing, early detection and containment of defects is very important, since the cost of remediation keeps increasing the later a defect is detected.
In both cases, then, we see that early detection and containment are important. However, the role of quality assurance (of which testing is an integral part) is not only to detect and contain defects but to assure overall quality. We need to evaluate our testing efforts in the context of this bigger picture.
2. Good testing is not easy, and needs to be done early
We all heard, especially in the USA, that there were a lot of challenges with COVID testing. Not enough tests were available early on (see Figure below), the process was cumbersome (and uncomfortable for the subjects being tested), and results took a long time (initially up to a week or more). We also heard that many tests were unreliable and returned false positives.
All of these factors had significant implications for the quality of public health, given that the virus propagated rapidly and every impediment in testing posed downstream challenges for containment. To quote an expert from this article:
“The testing fiasco was the original sin of America’s pandemic failure, the single flaw that undermined every other countermeasure”
Even though software testing is an old discipline, surprisingly, we find the same challenges there. Until the recent evolution of Continuous Testing and Test Driven Development (TDD), software testing tended to lag behind development; tests take a lot of time to develop and maintain; a significant amount of testing is still manual (and takes a long time to execute, delaying feedback to development); and test results are sometimes flaky (i.e. they do not produce consistent results).
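One common way to confirm that a test is flaky, as opposed to genuinely failing, is to rerun it several times against unchanged code and see whether the outcomes disagree. The following Python sketch illustrates the idea; `classify_test` and the toggling test in the usage example are hypothetical names for illustration, and the default of 20 runs is an arbitrary assumption.

```python
import itertools
from typing import Callable


def classify_test(test: Callable[[], bool], runs: int = 20) -> str:
    """Rerun a test against unchanged code and classify its behavior.

    Returns "pass" if every run passed, "fail" if every run failed, and
    "flaky" if the runs disagreed (the test is not deterministic).
    """
    if runs < 2:
        raise ValueError("need at least two runs to detect flakiness")
    outcomes = {test() for _ in range(runs)}
    if outcomes == {True}:
        return "pass"
    if outcomes == {False}:
        return "fail"
    return "flaky"


if __name__ == "__main__":
    # Hypothetical test whose outcome alternates between runs,
    # e.g. because it depends on timing or shared state.
    toggle = itertools.cycle([True, False])
    print(classify_test(lambda: next(toggle)))  # flaky
    print(classify_test(lambda: True))          # pass
```

Rerunning only tells us that a test is flaky, not why; the usual follow-up is to look for hidden dependencies on timing, ordering or shared state.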
With Continuous Testing came techniques like Shift-Left and Test Driven Development (TDD) where tests were available before (or as soon as) software was developed, so that there was no lag in the availability of tests.
3. We need automated testing in fast changing landscapes
When COVID-19 tests were finally available, the initial tests took up to a week to return results. Very soon, pharmaceutical companies started to create better, automated tests that produced results faster. As of this writing, newer tests produce results in about 5 to 15 minutes (see figure below). This agility in testing made it easier to plan countermeasures to tackle the aggressive spread of the virus.
Not surprisingly, automation in software testing has been key to enabling agile delivery. It provides fast feedback to development teams (for example after a commit or build) and enables them to act quickly to address problems. The longer the feedback time, the greater the loss of developer productivity and the higher the cost of fixing the problem.
4. But test automation is not enough, we also need good testing processes and infrastructure
Even with faster automated COVID-19 tests, we saw that there were other impediments. Testing sites were few, samples needed to be sent to a remote lab, there were long lines (which negated the purpose of “social distancing”), and supplies were limited (resulting in people being turned away untested). We also heard about challenges with reporting test results as more and more private facilities started to offer the tests.
We see the same pattern with software testing. Test automation only reduces the time it takes to execute the tests. To test efficiently, we also need to create proper test environments correctly and quickly, populate them with appropriate test data, and remove other impediments, such as dependencies on other applications that are not readily available for testing (using techniques like service virtualization). It is just as important to automate the processes for provisioning the tests and capturing the results.
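Full service-virtualization tools typically stand up a simulated version of a dependency that the application calls over the network. A lighter-weight, in-process variant of the same idea can be sketched with Python's standard `unittest.mock`: the test replaces an unavailable downstream dependency with a stand-in that returns canned responses. `PaymentGatewayClient` and `checkout` are hypothetical names used only for illustration.

```python
from unittest import mock


class PaymentGatewayClient:
    """Hypothetical client for a downstream service that is not
    reachable from the test environment."""

    def charge(self, account_id: str, amount_cents: int) -> dict:
        raise ConnectionError("payment gateway not reachable from test environment")


def checkout(client: PaymentGatewayClient, account_id: str, amount_cents: int) -> str:
    """Hypothetical application logic under test."""
    response = client.charge(account_id, amount_cents)
    return "confirmed" if response.get("status") == "approved" else "declined"


def test_checkout_with_virtual_gateway() -> None:
    # autospec makes the stand-in reject calls that don't match the real
    # client's method signatures, so the "virtual" service stays faithful.
    virtual_gateway = mock.create_autospec(PaymentGatewayClient, instance=True)

    virtual_gateway.charge.return_value = {"status": "approved"}
    assert checkout(virtual_gateway, "acct-42", 1999) == "confirmed"
    virtual_gateway.charge.assert_called_once_with("acct-42", 1999)

    # Canned responses also make hard-to-trigger paths easy to test.
    virtual_gateway.charge.return_value = {"status": "declined"}
    assert checkout(virtual_gateway, "acct-42", 1999) == "declined"


if __name__ == "__main__":
    test_checkout_with_virtual_gateway()
    print("ok")
```

The trade-off is fidelity: an in-process mock is fast and needs no infrastructure, but it does not exercise the network layer the way a networked virtual service would.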
5. It helps to bring the tests closer to the subjects (or the application) rather than bring the subjects (or the application) to centralized tests
As a result of the impediments noted above, officials started setting up more convenient and accessible COVID-19 testing locations, such as drive-thru testing sites and even mobile testing facilities that were driven closer to subjects’ homes. Some testing kits are self-service and do not require active intervention by a health-care worker.
The same applies to software testing. In traditional testing models, testing Centers of Excellence (COEs) existed to develop and execute tests. This created a bottleneck from both a resource and an agility perspective: testing jobs were queued (just like the queues for COVID-19 tests), and feedback was slow. In modern Continuous Testing, testing is democratized. Developers and testers work collaboratively on testing and test assets. Regardless of who builds the tests, they are available to be run at any time as part of the CI/CD process. For example, automated tests developed by testers can be executed by developers or by the CI engine as part of the build verification process.
6. Testing techniques need to evolve to look at alternate approaches
Initially, COVID-19 testing involved analyzing throat swabs or sputum samples. This works only for patients who are currently infected (i.e. have an “active infection”). Later, testing evolved to detect specific antibodies in the blood, which identifies not only subjects who are currently infected but also subjects who have already had (and may have subsequently recovered from) the virus. The latter technique obviously provides a better picture of the total population infected by the virus. It also provides additional benefits: this approach is now being extended to investigate the development of COVID-19 vaccines. This is helping to step up efforts in disease prevention and in bringing the pandemic under control.
Similarly, software testing has evolved from traditional approaches of running tests on the software to using novel AI/ML techniques to identify whether defects are hiding in the software. These techniques pick up possible defects by analyzing other types of data (e.g. application logs, incident logs) without running any actual tests. Likewise, AIOps techniques use AI/ML not only to reduce the mean time to detect (MTTD), but also to provide extensive root cause analysis information that helps us prevent the problem from recurring.
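Real AIOps and log-analytics products use far richer models than this, but the core idea of spotting problems from operational data rather than from test runs can be illustrated with a deliberately simple statistical detector. This sketch assumes error counts have already been aggregated from application logs into fixed time windows, and flags any window whose count lies more than `threshold` standard deviations from the mean of the preceding `baseline` windows (a z-score test). The function name and default parameters are assumptions for illustration.

```python
from statistics import mean, stdev
from typing import List


def anomalous_windows(error_counts: List[int], threshold: float = 3.0,
                      baseline: int = 10) -> List[int]:
    """Return the indices of windows whose error count is a z-score outlier
    relative to the `baseline` windows immediately before it."""
    if baseline < 2:
        raise ValueError("baseline must contain at least two windows")
    flagged = []
    for i in range(baseline, len(error_counts)):
        history = error_counts[i - baseline:i]
        mu = mean(history)
        # Fall back to 1.0 when the history is perfectly flat, to avoid
        # dividing by zero.
        sigma = stdev(history) or 1.0
        if abs(error_counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Errors per minute extracted from (hypothetical) application logs.
    counts = [2, 3, 2, 4, 3, 2, 3, 2, 3, 4, 3, 40, 2]
    print(anomalous_windows(counts))  # [11]
```

Note that the spike at index 11 then inflates the baseline for the following windows; production detectors usually use robust statistics or exclude known anomalies from the baseline.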
7. It is not practical to test everyone (or everything) – do monitoring and build immunity/resilience instead
Since many people infected with the virus are asymptomatic, it would be ideal to be able to test everybody. However, this is impractical due to all the testing challenges (as well as the cost considerations) mentioned above. Hence, techniques are being developed to better monitor both the overall spread of the disease (for example, by using web-connected thermometers) and high-risk subjects. We also see data from cellphones being used to track the spread of the disease. This allows health officials to take both proactive and responsive actions without having to test the population extensively.
We see a very similar approach in the “Shift-Right” of software testing. Monitoring applications in production provides insights into application health in a way that may not be possible (or practical) in test environments. In addition, developers and testers may create synthetic monitors and deploy them into production to provide fast feedback on the health of the application so that remediation actions can be taken.
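A synthetic monitor is essentially a scripted probe that exercises the application the way a user would and reports whether it behaved as expected. The sketch below shows a single HTTP check using only Python's standard library; the health-check URL, the expectation of a 200 status and the 1,000 ms latency budget are assumptions, and in practice a scheduler or monitoring platform would run the check periodically and alert on failures.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    healthy: bool
    status: Optional[int]  # None if the endpoint could not be reached
    latency_ms: float


def synthetic_check(url: str, timeout_s: float = 5.0,
                    max_latency_ms: float = 1000.0) -> CheckResult:
    """Probe `url` once and judge health by status code and latency."""
    start = time.monotonic()
    status: Optional[int]
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (e.g. 404, 500) are raised as HTTPError.
        status = err.code
    except OSError:
        # Connection refused, DNS failure, timeout, etc.
        status = None
    latency_ms = (time.monotonic() - start) * 1000
    healthy = status == 200 and latency_ms <= max_latency_ms
    return CheckResult(healthy, status, latency_ms)


if __name__ == "__main__":
    # Hypothetical production health endpoint.
    print(synthetic_check("https://shop.example.com/health"))
```

Treating slow responses as unhealthy, not just errors, is a deliberate choice here: users experience a 10-second page as broken even if it eventually returns 200.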
Some countries in Europe have conducted very few tests per capita. The approach they seem to be taking is to develop “herd immunity”, which occurs when a significant percentage of the population becomes immune to a disease (after recovery), slowing or stopping its spread.
While this approach is questionable in the COVID-19 context, with no vaccine available at present, building resilience is in fact a popular technique in software engineering. Building fault-tolerant, highly resilient software may reduce the need for extensive up-front testing. In fact, I recently noticed a Gartner report on approaches for developing AI-enabled, resilient and bug-resistant applications.
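One everyday building block of fault-tolerant software is retrying transient failures with exponential backoff, so that a brief outage in a dependency does not immediately become a user-facing failure. The sketch below is a generic illustration, not a technique taken from the Gartner report mentioned above; which exceptions count as transient, the attempt count and the base delay are all assumptions to tune per dependency.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    attempts: int = 4,
    base_delay_s: float = 0.1,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `op`, retrying transient failures with exponential backoff.

    Before retry number k (k = 0, 1, ...), waits a random duration in
    [0, base_delay_s * 2**k] ("full jitter"), so that many clients
    recovering at once do not retry in lockstep.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(random.uniform(0, base_delay_s * 2 ** attempt))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_dependency() -> str:
        # Hypothetical dependency that fails twice, then recovers.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient outage")
        return "ok"

    print(retry_with_backoff(flaky_dependency), calls["n"])  # ok 3
```

The injectable `sleep` parameter exists so the retry logic itself can be tested instantly, without real delays.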
8. Risk-based testing approach is preferred
We heard from health officials that persons most at risk from COVID-19 complications (and even death) are seniors (above the age of 60) or those with underlying health conditions (see Figure below). We sadly observed this in reality as COVID-19 ravaged a senior nursing home in Seattle, WA, causing more than 19 deaths in a short span of time.
In software testing, we can immediately relate this to the well-defined concept of risk-based testing: focusing test effort on the areas with the highest likelihood and impact of failure. Since testing time and test resources are scarce, it is also important to optimize the testing we do, using techniques such as model-based testing.
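Risk-based test selection can be sketched as scoring each candidate test by the likelihood and impact of the failure it guards against, then spending a limited execution budget on the highest-risk tests first. The 1 to 5 scales, the likelihood × impact score and the greedy selection below are common simplifications assumed for illustration; `CandidateTest` and the example tests are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CandidateTest:
    name: str
    failure_likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)
    business_impact: int     # assumed scale: 1 (cosmetic) to 5 (critical)
    duration_min: float

    @property
    def risk(self) -> int:
        return self.failure_likelihood * self.business_impact


def select_by_risk(tests: List[CandidateTest], budget_min: float) -> List[CandidateTest]:
    """Greedily pick the highest-risk tests that still fit in the time budget."""
    selected: List[CandidateTest] = []
    used = 0.0
    for test in sorted(tests, key=lambda t: t.risk, reverse=True):
        if used + test.duration_min <= budget_min:
            selected.append(test)
            used += test.duration_min
    return selected


if __name__ == "__main__":
    suite = [
        CandidateTest("checkout", 5, 5, 30),
        CandidateTest("login", 4, 5, 10),
        CandidateTest("search", 3, 3, 15),
        CandidateTest("footer_links", 1, 1, 5),
    ]
    print([t.name for t in select_by_risk(suite, budget_min=45)])
    # ['checkout', 'login', 'footer_links']
```

Here `search` is skipped because it no longer fits once the two highest-risk tests are scheduled; a greedy pass is simple and transparent, though it does not guarantee the maximum total risk covered for a given budget.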
9. Prevention is easier and better than cure – isolation and anti-viral approaches
One of the key initiatives being advocated in the prevention or containment of COVID-19 is “social distancing” (as well as techniques like isolation and quarantine).
The software analogue is building loosely coupled systems (such as microservices) to improve failure (or fault) isolation. Software sandboxes are also used to promote better fault isolation.
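The circuit breaker pattern is one way loosely coupled services enforce this kind of isolation at runtime: after repeated failures, calls to an unhealthy dependency are short-circuited for a cooling-off period instead of letting the failure spread to its callers, much like quarantining a case. The following is a minimal single-threaded sketch; the threshold, timeout and class names are assumptions, and production implementations typically add thread safety, metrics and fallbacks.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0                    # consecutive failures seen
        self.opened_at: Optional[float] = None

    def call(self, op: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                # Open: fail fast so the faulty dependency stays isolated.
                raise CircuitOpenError("dependency isolated; failing fast")
            # Half-open: allow a trial call; a single failure re-opens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = op()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip the breaker
            raise
        self.failures = 0  # success closes the circuit
        return result
```

Wrapping each outbound call as `breaker.call(lambda: client.fetch(...))` (with a hypothetical `client`) keeps one failing service from tying up every caller that depends on it; the injectable `clock` makes the timing behavior easy to test.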
Other approaches advocated for COVID-19 prevention include washing of hands and use of disinfectants.
In the software domain, we can immediately relate this to the use of security scanning and testing tools, anti-virus tools, and enterprise security systems to prevent hacker attacks.
10. Prevention and containment require collaboration
One of the key lessons we learnt from COVID-19 containment and prevention efforts is that it is a community effort (Figure below). Just because younger and healthier segments of the population are less at risk of complications does not mean they need not take part in the broader community’s isolation efforts.
Similarly, in software testing, the best defect prevention happens when different teams, such as developers, testers, product owners and SREs, work collaboratively. This is especially key in agile software development, where the pace of change is rapid.
11. Prevention and early testing allow better capacity and contingency management
One of the key factors driving early COVID-19 testing and containment is better capacity planning. We were told that we needed to “flatten the curve” so that hospitals and care facilities are not overwhelmed. We have seen the development of peak healthcare stress models and “surge planning” to address the rapidly growing cases.
This is analogous to software performance testing, stress testing and scalability analysis, which allow us to better prepare for high-usage scenarios (e.g. Black Friday demand on retail sites). Early reliability testing of software allows us to identify and fix such scalability issues, or to plan for the required scale (for example, by provisioning additional compute or storage resources).
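Dedicated load-testing tools do this at scale, but the core of a load test can be sketched in a few lines: fire many concurrent requests at the system, record each latency, and compare percentiles against a target. In this sketch, `handler` stands in for a call to the system under test (an assumption: in practice it would issue a real request), and thread-based concurrency is a simplification.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles
from typing import Callable, Dict, List


def load_test(handler: Callable[[], None], concurrency: int,
              requests: int) -> Dict[str, float]:
    """Run `handler` `requests` times across `concurrency` threads and
    summarize per-request latency in milliseconds."""
    if requests < 2:
        raise ValueError("need at least two requests to compute percentiles")

    def timed_call(_: int) -> float:
        start = time.perf_counter()
        handler()
        return (time.perf_counter() - start) * 1000

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies: List[float] = list(pool.map(timed_call, range(requests)))

    cuts = quantiles(latencies, n=100)  # 99 cut points: cuts[49] is p50, cuts[94] is p95
    return {"p50_ms": cuts[49], "p95_ms": cuts[94], "max_ms": max(latencies)}


if __name__ == "__main__":
    # Hypothetical stand-in for a request that takes about 10 ms to serve.
    print(load_test(lambda: time.sleep(0.01), concurrency=5, requests=50))
```

Increasing `concurrency` across runs and watching where p95 latency starts to climb is a crude way to find the knee of the capacity curve, roughly the software counterpart of surge planning.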
12. Overall hygiene is just as important as testing for overall quality
In the context of COVID-19 we saw a great deal of emphasis on general hygiene (such as periodic washing of hands, wearing of masks).
This is also analogous to the engineering best practices for maintaining software hygiene and health. We have already discussed that it is not practical to test everything, and that software quality is everybody’s job, not just the testers’.
13. Data privacy is important
All of the COVID-19 testing challenges notwithstanding, we mostly saw the privacy of personally identifiable information (PII) about specific test subjects protected (see Figure below), unless they chose to reveal their test results themselves. This is not just for regulatory reasons, but also to prevent social ostracism in a sensitive circumstance such as this.
Similarly in software testing it is of paramount importance to protect access to PII data during testing, especially when test data is derived from production. Test Data Management solutions provide the ability to both identify sensitive PII data as well as mask such data in test and development environments.
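As a rough illustration of what masking involves, the sketch below replaces e-mail addresses with deterministic pseudonyms and redacts US Social Security numbers in free text. Using a keyed hash (HMAC) means the same real address always maps to the same pseudonym, so records still join across tables, while keeping the key secret prevents anyone from confirming a guessed address by hashing it themselves. The two regular expressions are simplified assumptions; Test Data Management tools detect far more PII types and formats.

```python
import hashlib
import hmac
import re

# Simplified patterns (assumptions): real PII detection is much broader.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def pseudonym(value: str, key: bytes, length: int = 12) -> str:
    """Deterministic, keyed pseudonym for a sensitive value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:length]


def mask_pii(text: str, key: bytes) -> str:
    """Replace e-mail addresses with stable pseudonyms and redact SSNs."""
    text = EMAIL_RE.sub(
        lambda m: f"user_{pseudonym(m.group(0).lower(), key)}@example.com", text)
    return SSN_RE.sub("XXX-XX-XXXX", text)


if __name__ == "__main__":
    key = b"load-from-a-secrets-manager"  # hypothetical; never hard-code real keys
    print(mask_pii("Contact jane.doe@corp.com, SSN 123-45-6789", key))
```

Deterministic pseudonyms preserve referential integrity in the masked test data; fully random replacement is safer when no joins across records are needed.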
14. Analytics, modeling, insights and AI are key evolving trends
Sophisticated analytics models, such as this one, have been developed to forecast the spread of the virus, the number of fatalities, peak dates, etc., based on emerging patterns. The availability of so much data (which also changes daily) allows modelers to develop correlations and ML algorithms that improve such predictions. These models are absolutely critical for health officials to plan proactive actions.
Intelligent software testing also takes advantage of such analytics techniques, for example defect/failure prediction, which allows us to better plan our testing efforts as well as better contain defects.
Summary and Key Takeaways
This blog post captures the remarkable similarities between COVID-19 testing and software testing, even though they address very different domains. The key similarities for me lie in the management of scarce test resources, test automation for rapid feedback, and the use of test data for response planning.
The situation on COVID-19 testing and containment is very fluid right now and is changing rapidly. Specifically in the USA, we learnt that the latest projection for COVID-19 fatalities is more than 200,000, a significantly higher number than previously anticipated. Remember, early in March, US healthcare officials predicted a much lower impact?
The most striking takeaway for me is the mis-assessment of the risk and the apparent lack of risk-based testing and containment.
Risk-based testing is a well-established and proven discipline in software engineering that would appear to be very applicable to the COVID-19 situation.
Since, as discussed above, it is not practical to test everyone, it would have been preferable, for example, to focus first on testing the highest-risk subjects (and folks close to them) and isolating them accordingly before testing other sections of the population. These are typically folks in nursing homes and senior care centers. We also learnt that folks in congested, low-income inner-city neighborhoods are more vulnerable. Rigorously testing and sanitizing the “zone of influence” of the most vulnerable populations may have helped protect those most likely to become fatalities. This concept is not new. For example, airports use this technique to rigorously test and sanitize the zone of influence in and around them. Sweden is one country that has adopted this approach successfully, without using extensive social distancing.
As we build out more distributed and autonomous systems that cross the digital/cyber-physical boundary (for example smart robots or embedded IoT systems like smart pacemakers), we’re likely to see a greater intertwining of testing disciplines across domains; for example, see here. In such systems, the testing approach is a remarkable fusion of techniques from both disciplines. For example: how do we detect and contain the spread of a “software” virus in pacemakers embedded in human subjects, which then endangers the health of the wearers?
As I monitor the COVID-19 situation, especially in the USA, I see other striking similarities with other aspects of DevOps and Continuous Delivery. Maybe that is for a separate blog in the days to come. Until then, stay safe and healthy, and get yourself tested (and contained) if you feel unwell.
I would like to thank my colleague Paul Meresanu for his valuable guidance in improving the blog.