Software Rhythm: End-Game

This is part three of the Software Rhythm series.

I define the end-game as everything between “feature-complete” and release. After all features are complete, the primary activities are integration, bug-fixing, testing, and documentation.

Generally, you want to be doing all of these activities throughout the release so that you avoid a big-bang integration at the end. However, releases are date-driven and your goal is always to satisfy the business by putting as much value as possible into the release. This inevitably means that you will be finishing several features at the end of the development, especially the most important and riskiest long-pole items.

Thus, there is always integration work to be done at this point. As much as possible, you want to manage the risk of this end-of-release integration. If at all possible, develop and test big features in isolated branches so that you can manage the individual feature risk (by deciding not to integrate). However, you then incur that risk all at once at integration time, and integrating multiple big features at the same time compounds it.

Testing

You should be testing throughout the release, but generally once things are considered “feature complete” the test team can start officially doing verification activities. There are many kinds of testing and my best advice is to not put all your eggs in any one basket. [Note that I’m not including unit testing here as that must occur as part of development and thus should already be done.]

You should use a mix of testing techniques and extract the 80/20 from all of them. That means doing acceptance testing based on use cases, focused functional testing, interface testing, performance testing, stress testing, exploratory testing, regression testing, etc.

Pay attention to where you find bugs. If hot spots appear, focus on the hot spots as you’ve likely found a problem area. As you develop new tests, record them for future regression testing. This builds a suite of tests that over time can be used strategically to fully test the product.
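As a rough illustration of spotting hot spots, here is a minimal sketch that counts bugs per component. The bug records and component names are made up for the example; in practice the data would come from whatever bug tracker you use.

```python
from collections import Counter

# Hypothetical bug records: (bug id, component where the bug was found).
bugs = [
    (101, "parser"),
    (102, "ui"),
    (103, "parser"),
    (104, "network"),
    (105, "parser"),
    (106, "ui"),
]

def hot_spots(bug_records, top_n=2):
    """Return the top_n components with the most bugs, most-buggy first."""
    counts = Counter(component for _, component in bug_records)
    return counts.most_common(top_n)

print(hot_spots(bugs))  # [('parser', 3), ('ui', 2)]
```

A component that keeps showing up at the top of this list is a good candidate for more focused testing.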

Bug Barbecue

As you test, you will inevitably find bugs. The more you test, the more bugs you will find. This is good – you want to collect as many of these bugs as you can possibly find.

I suspect it would seem weird to most people outside the software world to know that we release products with hundreds or thousands of known issues, but it happens in many or maybe most projects today. This used to bother me a lot, but I’ve come to accept it as simply the state of the world.

It’s not that we can’t find and fix those problems – it’s rather that the cost of doing so is not worth it. Every day we use a stack of software on every device we have where every layer of the stack from hardware up to a web app on a browser is full of bugs. Yet, this doesn’t impede our productivity much. If that software was literally bug free, it might cost 10x or 100x as much to make and then people wouldn’t use it at all, so would derive no value from it. That’s not to say there isn’t vast room for improvement of course.

I’ve come to see the art of issue management as figuring out which bugs should be fixed. To do this, you want the widest possible input channel – all the bugs you can find. You then need to really manage that stream of bugs, just like a product manager manages a stream of feature requests. Managing means regularly having a key group of decision makers sit down, review all new issues and any important updates and make decisions. Regularly means at minimum once a week and ideally every single day.

This applies not just during the end-game, but all the time. Even during the Opening and the Mid-game, older releases experience problems or bugs are found while developing new features or doing maintenance on old ones. But during the End-game, managing those bugs rules what the dev team does, so must be given priority.

I’m somewhat ambivalent about test-first development or TDD, but bug-fixing is one area where I don’t think you should do anything till you can reproduce the problem. There are exceptions, but this should be your rule. Over time you build a regression suite so that you can avoid breaking and fixing the same problem over and over, which creates a positive feedback cycle.
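Here is a small sketch of what “reproduce it first” can look like. Everything in it is hypothetical: the function parse_quantity, the bug (input with surrounding whitespace was rejected), and the bug number 1234 are invented for illustration. The first test is the one you would write before touching the code; it fails until the fix goes in and then stays in the regression suite.

```python
def parse_quantity(text):
    """Parse a quantity field (hypothetical function, for illustration only).

    Hypothetical bug report: input such as " 3 " was rejected.
    The strip() call below is the fix.
    """
    cleaned = text.strip()
    if not cleaned.isdigit():
        raise ValueError(f"not a quantity: {text!r}")
    return int(cleaned)

def test_bug_1234_whitespace_is_accepted():
    # Written first, to reproduce the report; it failed before the fix.
    assert parse_quantity(" 3 ") == 3

def test_garbage_is_still_rejected():
    # Guards against the fix being too permissive.
    try:
        parse_quantity("three")
    except ValueError:
        return
    raise AssertionError("expected ValueError for non-numeric input")
```

The tests are plain functions, so a runner such as pytest can collect them, or you can call them directly.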

It is also important to “trust but verify” as Reagan said. In other words, don’t trust. It’s not enough to have someone submit a bug, and have a developer fix it. A tester must go back and verify the fix. The number of bug fixes that don’t actually fix the bug is surprisingly high – but you won’t know how high till you start verification. You may notice that this is yet another little feedback loop.
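To make that loop concrete, here is a minimal sketch of a bug lifecycle in which a fix only counts once a tester has verified it. The status names and the failed_fix_rate helper are my own assumptions for the example, not a description of any particular bug tracker.

```python
from enum import Enum

class Status(Enum):
    OPEN = "open"
    FIXED = "fixed"        # a developer claims the bug is fixed
    VERIFIED = "verified"  # a tester has confirmed the fix

# Allowed moves: a tester either verifies a fix or reopens the bug.
ALLOWED = {
    Status.OPEN: {Status.FIXED},
    Status.FIXED: {Status.VERIFIED, Status.OPEN},
    Status.VERIFIED: set(),
}

def transition(current, new):
    """Return the new status, or raise if the move skips a step."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {new.value}")
    return new

def failed_fix_rate(transitions):
    """Fraction of claimed fixes that a tester sent back to OPEN."""
    checked = [new for old, new in transitions if old is Status.FIXED]
    if not checked:
        return 0.0
    return sum(1 for new in checked if new is Status.OPEN) / len(checked)

# Hypothetical history: one fix was reopened, one was verified.
history = [
    (Status.OPEN, Status.FIXED),
    (Status.FIXED, Status.OPEN),
    (Status.OPEN, Status.FIXED),
    (Status.FIXED, Status.VERIFIED),
]
print(failed_fix_rate(history))  # 0.5
```

Note that there is no path from OPEN straight to VERIFIED, which is the whole point: the number you get from failed_fix_rate only exists once verification is part of the process.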

Lockdown

Putting all this together, how do you actually get to the point of release? You must ruthlessly ratchet up the quality (by testing, bug fixing, and verification) while ratcheting down the risk (by increased scrutiny of changes). Of course, these conflict to some degree so you must make increasingly hard choices.

To me, this process feels like planning for a party. A few weeks out you can make sweeping changes about what you’re going to make, who’s coming, etc. But as you get closer and closer to the day of the party you’ve already bought some of the food, you have things planned, and changes get harder and harder to make. Finally, there is a last-minute execution of pre-planned preparations and then ding-dong the doorbell rings. Things unfold rapidly from there, often not according to plan as people arrive at different times, weather changes, someone sets something on fire, etc. Everyone works together to get a product to the point of release, which inevitably means having a known set of issues that you must live with to get the product out the door.

As the release point approaches, you must make finer and finer decisions about what is really important to get into the release. At some point, you have to start leaving out things that really should go in just because the risk is too great. You can change a message in an exception, but you can’t rewrite a web service. It’s just too risky.

The best way to do this lockdown is to schedule a series of phases ahead of time. Usually these have timelines attached to them, but you shouldn’t necessarily plan to hit them. It’s better to define a series of quality gates that trigger the next phase. The quality gates can be whether certain tests have been completed, the number (or existence) of bugs at certain priority levels, incoming bug rate, incoming bug severity rate, and so on. If you’re not hitting quality gates fast enough, you’ll need to take corrective action.
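A quality gate can be as simple as a named check over a handful of metrics. The sketch below is only one way to express it; the metric names and the thresholds (zero priority-1 bugs, at most 10 incoming bugs per week) are assumptions I picked for the example, and you would choose your own.

```python
from dataclasses import dataclass

@dataclass
class Metrics:
    planned_tests_run: bool       # have the planned tests been completed?
    open_p1_bugs: int             # open bugs at the highest priority
    incoming_bugs_last_week: int  # new bugs filed in the last week

# Hypothetical gates for moving to the next lockdown phase.
GATES = [
    ("planned tests completed", lambda m: m.planned_tests_run),
    ("no open priority-1 bugs", lambda m: m.open_p1_bugs == 0),
    ("at most 10 incoming bugs per week", lambda m: m.incoming_bugs_last_week <= 10),
]

def failed_gates(metrics):
    """Return the names of the gates that are not yet satisfied."""
    return [name for name, check in GATES if not check(metrics)]

def may_enter_next_phase(metrics):
    """The next phase starts only when every gate passes."""
    return not failed_gates(metrics)

current = Metrics(planned_tests_run=True, open_p1_bugs=2, incoming_bugs_last_week=7)
print(failed_gates(current))          # ['no open priority-1 bugs']
print(may_enter_next_phase(current))  # False
```

Listing the failed gates by name, rather than returning a bare yes or no, tells you where corrective action is needed.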

You also need to define the level of scrutiny that gets applied during each phase. During the first phase, you might require only that developers write a test to reproduce the bug and review the change with another developer. In a later phase, add an additional review by another developer or by a manager. And ultimately, whoever is responsible for managing the release itself should be approving any change.
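The escalating scrutiny can be written down as a simple table of required sign-offs per phase. This is a sketch under my own assumptions: three phases and these particular requirement names are just an example of the progression described above.

```python
# Hypothetical sign-offs required for a change in each lockdown phase.
PHASE_REQUIREMENTS = {
    1: ["reproducing test", "peer review"],
    2: ["reproducing test", "peer review", "second review"],
    3: ["reproducing test", "peer review", "second review",
        "release manager approval"],
}

def missing_signoffs(phase, obtained):
    """Return the sign-offs a change still needs in the given phase."""
    return [req for req in PHASE_REQUIREMENTS[phase] if req not in obtained]

# A change with a test and one review is fine early on, but not at the end.
change = {"reproducing test", "peer review"}
print(missing_signoffs(1, change))  # []
print(missing_signoffs(3, change))  # ['second review', 'release manager approval']
```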

Beta

One common practice is the “beta” release. The goal of the beta release is to give users the product early to get feedback. This can be very useful because it widens your bug input funnel in ways that you either can’t afford or simply can’t manage on your own, since you can’t run the product under real customer scenarios.

The trick is that the beta needs to work “enough” that users are able to actually test it and not feel like it’s a bunch of crap, and you need to release it early enough that you can address any problems that are found before the final release. This is often hard to fit into a release schedule unless you have a significant portion of your overall schedule dedicated to the end-game.

Thanks!