The Hitch-hiker’s Guide to Continuous Delivery
Hopefully, having read the title of this article, you are now picturing in your mind the front cover of the Hitch-hiker’s Guide to the Galaxy. If so, there is hope for you, my friend! In said guide, I have it on good authority that the entry for “Continuous Delivery” reads thus:
“A right royal pain in the caboose brought to you by the same geniuses that, having inflicted Agile development on everyone whether they asked for it or not, and let’s face it no-one asked for it, decided they weren’t done tormenting the galaxy. Best avoided unless you actually want to be cost-effective, competitive and relevant.”
Maybe you’ve read the continuous delivery scriptures, or been hanging out in the wrong parts of California listening to spotty oiks showing off about how their chaos monkey has just growth-hacked a canary until it turned an odd shade of blue and green, and can now deploy any code change to production in 3 milliseconds while simultaneously cooking lunch. If you did, I hope you had fun, learned something and got all inspired. But then I am willing to bet you returned to your day job in Enterprise software and came face to face with the crushing reality that you had no freaking clue how to begin to make your world look anything vaguely like that!
You’re not alone. I know this because I have spent a lot of time talking to application development leaders in large enterprises about this, and they are all, in one way or another, peering and squinting at the Continuous Delivery promised land in the far-off hazy distance and attempting to plot a course in its general direction. It’s very tough sledding for all of them for a host of reasons, but they all know they need to be on that road because their competitors are, because they do “actually want to be cost-effective, competitive and relevant”, and because their organizations are spending a lot of money on “digital transformation” projects around Agile, DevOps, and application refreshes for exactly this reason.
Alright, well grab yourself a towel … we’re going in.
Agile and the Law of Diminishing Returns
First, let’s step back a bit and think about what we’ve all done with Agile. A few years ago, development teams started going all agile on us and throwing out releases at far higher frequency. This put traditional QA/test and ops organizations firmly on the critical path between ideas (requirements) and outcomes (working software being used), and under intense pressure to adapt and invest in getting much more efficient about how they test and release, so they could handle the higher cadence without spiraling costs (primarily labour costs).
So you’d expect that several years into all this investment and upheaval we’d at least now have most application testing automated and shifted left, quality being fully owned by the agile teams, with test and release organizations well off that critical path and well on the way to being retrained and redeployed into agile / devops teams.
And there goes a flying pig.
In Enterprise software, 70% of testing is still being done manually even after all the investment. Most teams are struggling to get to release cadences shorter than about 3 months, and testing is still the long pole in the tent. And there is a pretty good reason for that.
The Limiting Factor: Release Taxes
Think for a minute about the economics of this from a testing and release perspective. You are testing to mitigate the risk around the release of a software change. That risk mitigation has a cost, let’s call it the “release tax”. If you halve the release cadence (e.g. 12 months to 6 months), you pay the tax twice as often. If you are going to do that without spiraling costs, you have to get twice as efficient in how you do the testing. To do that you invest in test automation and shifting as much of the testing as you can “left” into automated test suites and ultimately Continuous Integration in order to reduce the tax rate, keeping the total tax burden reasonable.
But this typically hits a wall somewhere around 3-month releases. Why? Let’s say your release tax is $n, and in a year you do r releases, for a total annual tax of $r×n. Plot that total for a range of release cadences, from twice a year (26 weeks between releases) down to once a week, and the annual burden climbs steeply as the interval shrinks.
Getting down to 12 weeks between releases (a 3-month release cadence) requires only relatively small incremental efficiency improvements to how you test to remain economically viable, i.e. you can invest sensibly in test automation to lower the tax rate enough to offset the effect of paying more often. But beyond that, it gets rapidly tougher and more expensive to continue automating your way to shorter and shorter releases. The marginal investment required to keep making marginal improvements soon becomes, frankly, uninvestable.
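To make the arithmetic concrete, here is a minimal sketch of the release-tax math above. The $100k baseline per-release tax is an assumed, illustrative figure (not from any real budget); the point is how the naive annual burden scales, and how much testing efficiency you would need to keep the total flat.

```python
baseline_interval = 26   # weeks between releases at the starting cadence
base_tax = 100_000       # assumed per-release cost ($) of testing/releasing once

baseline_releases = 52 / baseline_interval           # 2 releases per year
baseline_burden = baseline_releases * base_tax       # the annual tax you pay today

for weeks in (26, 12, 6, 2, 1):
    r = 52 / weeks                                   # releases per year at this cadence
    naive_burden = r * base_tax                      # pay the full tax every release
    required_tax = baseline_burden / r               # per-release tax that keeps the burden flat
    efficiency = base_tax / required_tax             # efficiency multiplier testing must achieve
    print(f"{weeks:>2}-week cadence: {r:4.1f} rel/yr, "
          f"naive burden ${naive_burden:>9,.0f}, "
          f"need {efficiency:4.1f}x testing efficiency to stay flat")
```

At weekly releases the naive burden is 26 times the baseline, which is exactly why incremental automation stops paying for itself somewhere well before that point.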
Now, am I saying Agile is not all that? No. As long as you are actually getting customer feedback in some form from your shorter release cycles (3 months is way better than 18) and feeding that into your planning for subsequent cycles, you are way ahead of where we all were with long-running waterfall projects in the all too recent past, because confidence that you are building the right thing, and the ability to continuously run lots of little experiments, is actually priceless. But assuming you’ve made good progress with Agile, don’t get religious and precious about it, because what got you here won’t get you where you need to go next. Agile will only take you so far on the journey to being world class when it comes to being cost-effective, competitive and relevant.
We all want testing off of the critical path for delivering value, shorter release cadences, higher frequency feedback loops and all that jazz, without sacrificing quality or getting generally cavalier about the business risk associated with software change, which in enterprise software is a major concern for very good reasons. But folks, we are not going to get there given realistic budgetary constraints by just doing more incremental investments in test automation and shift-left, as important as those investments are.
Managing Release Risk at Speed
Release risk mitigation strategies for new world enterprise software engineering are not about being incrementally more efficient, they require a very different approach to software release risk, which has really three main aspects to it:
- risk surface containment, in various forms: small change payloads (thank you, Agile) to small components (thank you, microservices), rather than massive changes (thank you, waterfall) to massive components (thank you, monoliths and n-tier apps).
- smart risk perception, based on an insightful understanding of where the risk is and isn’t in any given release of any given application. What actually changed? What is the potential functional, non-functional and business impact? What is the worst that could happen? What would it take to recover if it did?
- smart risk management, mitigating properly understood risk with a smart and coherent risk management strategy consisting of mitigation and contingency. Testing is a way to mitigate risk. So is architecture. So are some deployment mechanisms (e.g. canary roll-outs).
It goes without saying (and therefore I am going to say it) that there is no point mitigating risks you don’t have e.g. running regression tests on things that are outside the risk surface. Which of course we all do. All the time. This is dumb and we should stop this.
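One way to stop doing that is change-based test selection: run only the suites that cover components actually touched by a change. This is a hypothetical sketch; the component names, suite names and coverage map are invented for illustration and would come from your own build and coverage tooling in practice.

```python
# Assumed mapping of test suite -> components it exercises.
# In a real pipeline this would be derived from coverage data, not hand-written.
COVERAGE = {
    "billing_regression": {"billing", "invoicing"},
    "auth_regression": {"auth", "sessions"},
    "checkout_smoke": {"billing", "checkout"},
}

def suites_for_change(changed_components):
    """Select only the suites whose covered components intersect the change."""
    changed = set(changed_components)
    return sorted(suite for suite, covered in COVERAGE.items() if covered & changed)

# A change confined to the billing component triggers only the suites that touch it.
print(suites_for_change(["billing"]))   # ['billing_regression', 'checkout_smoke']
```

Even a crude mapping like this keeps the regression tax proportional to the actual risk surface of a release, rather than to the size of the whole application.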
Also, mitigation is not always the right approach to take for managing all risks. In some cases the mitigation plan may be disproportionately expensive (in time and/or cost) to the size and nature of the risk and it may be far smarter to posit that the risk won’t come to pass and rely on the contingency plan to fast fail forward if it does. This is not cavalier if it’s done rationally based on an assessment of the risk. It is cavalier if it’s just development teams running amok with no understanding or concern for the risk.
In any event, this is a major level up from a “test plan”, and I don’t even think it’s particularly helpful to think about this in terms of “changing how I test”. It’s more useful, I think, to come up a level and think about how I build and deploy software at speed with well contained and managed risk. Smart, evolved testing is part of the answer, but it’s only really one leg of a four-legged stool, of which the other three legs are architecture, smart coding, and the Continuous Integration / Continuous Delivery (CI/CD) pipeline.
The Agile and Devops literature talks extensively about people (org, skills, mindsets), organizational culture, and processes. All of which are super important topics. But it doesn’t talk anything like enough about architecture and it should, because when it comes to enabling speed with quality in the application development process, architecture matters.
Architecture can entirely avoid certain types of risk that otherwise have to be mitigated through expensive activities like testing, giving you simpler problems to solve with people, process and tools. It can ensure that release risk surfaces are systemically contained. For instance, let’s say you are dealing with some nice shiny microservices where all regression and other release-level testing was automated from day one. Well, congratulations: you have effectively sidestepped the release tax issue that drives Agile into the law of diminishing returns, through a combination of architecturally driven risk surface containment and process discipline around test automation. You’ll have to deal with some other problems I’ll be talking about in later articles in this series, but at least not that one.
Incidentally, how much of your enterprise app development world actually consists of cute little butter-wouldn’t-melt-in-their-mouths microservices? I have done straw-poll surveys on this question with rooms full of enterprise application development leaders, and across the industry right now it is less than 5%. And is moving to microservices the only architectural answer to the problems of speed with controlled risk? No, no it absolutely isn’t. We need to talk about telemetry, feature flags, API policies and discipline, the pros and cons of cloud-native architectures and function-as-a-service, containerization, the dependency and complexity problems that result from having thousands of microservices, and more. Again, architecture matters.
The hyper-drive to Enterprise Continuous Delivery nirvana kicks in when architecture, pipeline, coding and Continuous Testing practices are engineered as one coherent system, with the objective of enabling speed with contained and managed risk. The new projects being driven by application refresh initiatives should be very thoughtful about that. But it isn’t the case that you can make no progress with any of this until your apps are fundamentally refreshed. It will help a lot when they are, but there is a continuum of excellence from where you are today to the next level, and you can progress on the other legs of the stool even if the architecture leg is a bit wobbly for many or most of your apps, and will be for the foreseeable future.
Next Up On Your Continuous Delivery Journey
In this series of articles I will delve into a number of topics on Continuous Delivery and Continuous Testing that I see enterprises grappling with as they set a course towards the Valley Beyond of Continuous Delivery. Don’t overthink that analogy, fellow Westworld fans!
Continuous Delivery & Continuous Testing Resources:
Continuous Testing Novel – The Kitty Hawk Venture
To help you on your Continuous Delivery and Continuous Testing journey, we have recently published “The Kitty Hawk Venture,” a novel about Continuous Testing in DevOps to support Continuous Delivery and business success.
As a companion piece to “The Kitty Hawk Venture,” you can also download the “Definitive Guide to Continuous Testing” so you can learn the strategies, technologies and techniques that go into a successful Continuous Testing program.