Achieving Zero-Defects Software: Software Engineering Institute's Watts Humphrey thinks the traditional code/test/fix cycle is no longer good enough

During his 27-year career at IBM, Watts S. Humphrey spent a lot of time thinking about how good programmers, despite their best efforts, can produce bad code. In the mid-1960s, for example, he recalls testing the OS/360 operating system. The development team had access to "acres of machines" to exercise the code. Yet customers always found bugs that were so seemingly obvious-at least to them-that they couldn't understand how Humphrey's team could have overlooked them.

Humphrey's answer is still controversial. Testing alone, he says, cannot solve the defect problem. No matter how many tests are performed, someone somewhere will run the software in a way that the developers never considered. Customers are unpredictable, and they'll fool you every time. They'll use unexpectedly large values. They'll connect a thousand files, instead of the expected 15. They'll run the code on some machine configuration you never expected. Given all the variables, says Humphrey, the question is not why complex software systems don't work, but why they should work at all.

While Humphrey has long retired from IBM, he doesn't have much time to play the golf courses near his home in Sarasota, Florida. As a Fellow at Carnegie Mellon's Software Engineering Institute, Humphrey devotes his time to teaching developers how to reduce defects ahead of testing, as well as how to make software companies as a whole more quality conscious. The payoff, he says, is better software, produced on schedule for less cost-with the added bonus of having a development team that actually enjoys its work. It is a full-time calling, and Humphrey seems born to the task. Humphrey has written nine books including, most recently, "Winning with Software: An Executive Strategy." He has worked with major organizations, including Microsoft, Intuit, ABB, Xerox, and the U.S. Navy, to improve their internal development processes. In Chennai, India, there's even an institute named after him: the Watts Humphrey Software Quality Institute.

"Defects have been an enormous problem for software developers," he says. "Organizations spend roughly half their time and money finding and fixing them. What people don't typically recognize is that the defects problem really paralyzes the whole development process. Because the number of defects and the time it takes to fix them are both highly variable, defects can make it all but impossible to predict development schedules and cost." And testing, alone, he reiterates, is not the answer.

That argument goes against conventional software development practices-in which developers write code as quickly as possible, then use the debugger to find and fix the glitches. On the surface, the process seems to work. Code is written quickly and the mistakes are caught eventually. The problem is that as code gets more complex, the number of defects missed by testing increases-and becomes the problem not just of beta testers, but of paying customers. And of course, some software defects can cause major problems: power blackouts, train stoppages, aborted rocket launches.

Humphrey says that catching 100 percent of the defects is tough because defects have grown so common. Programmers, he says, inject a defect for every nine or 10 lines of code. The finished software contains about five to six defects per thousand lines of code. "We have data now on some 30,000 programs. And the defect rate is very consistent throughout." Of course one could quibble with him about what exactly is a defect. A bunch of attorneys could argue the question into the ground and still not come up with a definition. So Humphrey suggests, simply, that a defect is anything that must be changed in the source code before you deliver the program. If it's broke and you have to fix it, that's one defect less in a world full of defects. You've corrected it, but on Humphrey's scorecard, it still counts.

Humphrey says that defects are injected throughout the development process. Consider, he says, a million-line program. By his estimate, developers will have reviewed the requirements, produced a brief design, and written the code as quickly as they can, and in the process will have injected, for argument's sake, about 100,000 defects. Most of these will be uncovered and corrected at various stages of testing:

The compiler, which is the first line of defense, will find about half of them: leaving about 50,000 defects.
Unit testing-where the developer actually tries to run the compiled code-catches half of the remaining defects: leaving 25,000.
Integration testing, to make certain each programmer-assigned module works with the others, leaves about 12,000 defects.
System testing, which assures the system as a whole works according to specification, leaves about 6,000 defects.

And that is how the finished, million-line program ships-with 6,000 defects. The defect rate varies with the type of software, Humphrey says. Operating systems typically have ten to 20 times fewer defects than most applications because they receive so much use. When an OS crashes, it gets your attention.
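The halving at each stage described above can be written out as a short calculation. This is only a sketch of the article's round numbers: the stage names, the function name `remaining_defects`, and the uniform 50 percent removal rate are assumptions made for illustration, not measured data from Humphrey.

```python
# Illustrative sketch of the defect funnel for a million-line program.
# The uniform 50 percent removal rate per stage is an assumption taken
# from the article's round numbers, not from measured data.
STAGES = ["compile", "unit test", "integration test", "system test"]


def remaining_defects(injected, removal_rate=0.5, stages=STAGES):
    """Return (stage, defects left after that stage) pairs."""
    results = []
    left = float(injected)
    for stage in stages:
        left = left * (1 - removal_rate)  # each stage removes a fixed share
        results.append((stage, left))
    return results


funnel = remaining_defects(100_000)
for stage, left in funnel:
    print(f"after {stage}: about {left:,.0f} defects")

# Shipped defects per thousand lines for a 1,000,000-line program.
shipped_per_kloc = funnel[-1][1] / 1_000
print(f"shipped: about {shipped_per_kloc:.2f} defects per thousand lines")
```

Strict halving gives 12,500 and 6,250 for the last two stages, which the article rounds to 12,000 and 6,000.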

Humphrey says that tests are limited, because the number of ways to test a complex system is almost infinite. You can't test for everything-the possibilities are just too great. So you test for just a small subset-a testing "footprint," he calls it-of the possible scenarios. How small is that footprint? "Most people guess 20 to 30 percent," he says. "The people designing Microsoft operating systems think one percent. My guess is that it's less than 0.01 percent-it's a very small number, because the number of possible data values, the number of ways job streams can run, the number of possible equipment combinations are all so great." And that's why Humphrey thinks that testing alone will not uncover all the defects. You can test for all the ways you anticipate customers using the system. But people, being who they are, will inevitably invent new ways that the developers never imagined.

Humphrey is a practiced teacher-and so he poses a question to me: "Suppose I've got a software system that is flying an airplane or running a nuclear power plant, and I suddenly have an emergency. Where am I on the testing footprint?" I stumble a bit, but the answer is quite obvious: if a defect could lead to an emergency, the developers would have fixed it-if they had uncovered it. But their tests didn't uncover it. Therefore, the defect is outside the testing footprint. That's the way it is with all catastrophic software glitches-the underlying defects went undetected by testing, because the developers never tested for them, because they never imagined a scenario in which they needed to test for them.

By way of illustration, Humphrey recounts talking to a seasoned investigator of nuclear power plant emergencies. "Invariably, it wasn't just one event that caused the emergency, but a combination of four to six unlikely events that all went wrong at once." If you looked at this sequence of events the day before, you'd declare it an impossible scenario. It could never happen, any more than a royal flush in poker, a big prize lottery ticket, or a golfer struck by lightning on the 17th green. But these events do happen, and so do multiple-event breakdowns: they occur all the time.

Internet security is another problem that testing alone won't solve. The software industry so far is dealing with the problem strictly in reactive mode: a problem is uncovered, publicized, possibly exploited, and patched. Humphrey thinks that the malicious and criminal attacks of today are almost certain to be followed by more severe terrorist attacks, and that poor-quality software is inherently more vulnerable. And if the software industry can't build more secure software on its own, governments will mandate it in law-even though the lawmakers have no idea how to do it. (He points out that the U.S. Congress is passing comparable laws governing power plants in the wake of a recent power outage affecting the northeast U.S. The lawmakers don't know how to fix the problem, but they want it fixed.)

"The point I'm making is that the software quality problem is bad, and it can't be solved with more testing." So what do you do? Humphrey's answer: you have to go into the test with zero defects-or something near that-in the footprint. "Because if you find zero or very few defects inside the footprint, the odds are I have zero or very few defects outside. Suppose I run the tests on the million lines of code-testing every way I can test. But instead of finding 6,000 defects, I find just three? What does that say about the quality of the code outside the footprint?" Humphrey argues that the odds are good that few defects will be found there, as well. Small as it is, the statistical sample found inside the footprint should reflect the defect rate outside the footprint. If the first looks favorable, the second ought to be, as well.
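One way to read the sampling argument above is as a simple extrapolation. The sketch below assumes, as a strong simplification, that defects are spread evenly across tested and untested scenarios; the function name `estimate_outside` and the one percent footprint are hypothetical choices for illustration, and the article itself gives no formula.

```python
import math


def estimate_outside(found_inside, footprint_fraction):
    """Extrapolate defects outside the test footprint.

    Assumes defects are spread evenly over all usage scenarios, so the
    density seen inside the footprint also holds outside it. That
    uniformity is an illustrative assumption, not a measured fact.
    """
    if not 0 < footprint_fraction < 1:
        raise ValueError("footprint_fraction must be between 0 and 1")
    density = found_inside / footprint_fraction  # defects per whole space
    return density * (1 - footprint_fraction)


footprint = 0.01  # hypothetical: tests cover one percent of scenarios
typical = estimate_outside(6_000, footprint)
clean = estimate_outside(3, footprint)

# Whatever the footprint size, the estimate scales with what testing finds.
ratio = typical / clean
print(f"estimated outside defects drop by a factor of about {ratio:,.0f}")
```

Under this assumption the absolute estimates depend heavily on the guessed footprint size, but the ratio does not: finding three defects instead of 6,000 implies roughly 2,000 times fewer defects outside the footprint as well.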

"That means re-thinking the development process so that I have no potential defects left when I get into system tests. Which means, somehow, going to the next tier up-integration tests-with no defects. Which means going up to unit test, and then...." Well, you can see where he's going with this: the developer working at the coding phase is the first line of defense. Commit fewer defects here, and you'll have fewer to catch later on. Humphrey also recommends that several programmers pore over the code in detail. "As a group, you'll find mistakes that you would never find on your own."

Manufacturing quality assurance-applied to software development

Humphrey's idea of catching mistakes early in the software-building process echoes the ideas of the American statistician W. Edwards Deming and of Joseph Juran, author of the Quality Control Handbook. Those ideas gained early acceptance in the Japanese automobile industry of the 1970s, where mistakes were uncovered on the assembly line, not on the road. Other industries, including semiconductor manufacturing, have followed suit. Humphrey says that software development has been slow to follow.

The possible reason is that code seems so comparatively easy to fix-just issue a patch over the Internet. By contrast, a defective automobile design must be recalled and fixed one car at a time. The same is true for semiconductor chips-even a minor mistake in a chip means you have to produce an entirely new one. The fix and rebuild cycle on complex chips, says Humphrey, can take up to six months. Therefore, chip designers have tried to get it right the first time. Humphrey argues that the lesson applies to software developers, as well: it takes longer and costs more money to produce poor quality products than it does to do quality work in the first place. Just as quality control became the business of the machine welder and spray painter on the assembly line, it should also be the business of the programmer writing code. Make fewer mistakes at every stage of the process and you'll have fewer mistakes to find and fix later on.

"The best way to determine if an organization has learned this lesson is to ask their management why they don't produce higher quality software," Humphrey said in a panel on Internet security. "If they say: 'We can't afford to,' they have not learned the fundamental lesson of all quality work, hardware or software. What they are telling you is that they can't afford to do more testing. While it is almost certainly true that they cannot afford more testing, they can afford to produce higher quality products and they soon won't be able to afford not to."

But is that really possible? To find out, Humphrey spent three years writing code with the idea of fixing defects up front. That was the idea, but Humphrey soon discovered that while he could reduce the number of defects, he couldn't eliminate them. "Because I'm human, I make mistakes. But-being human, I tend to make the same kind of mistakes over and over. But that turns out to be an advantage, because when I got data on the kinds of mistakes I make, I now know what to look for." That, says Humphrey, is true for all developers. None are perfect, all make mistakes, but each in his or her own way. "I also found that, as my methods changed and I gained experience, the kinds of errors I committed changed over time."

Humphrey's experience now informs his programmer course: Personal Software Process (PSP). "We show developers how to define the process they are going to use, how to measure and track it-and they get remarkably good. In time, programmers can find a very high percentage of defects before they compile and test. Feedback is the key to reducing software defects, as well as every other kind of human improvement." Of course, compiler errors are also a kind of feedback, but Humphrey argues that record keeping is key. Without that, you merely fix the defect and move on, without considering the process. As a result, developers don't see the pattern of defects, and underestimate just how many defects they inject.

Developers aren't necessarily fond of the process-especially when they first encounter it. Who wants to spend time logging defects, when you could be correcting them? But if you just correct defects without thinking about how you made them in the first place, you haven't learned anything. "It's only when developers systematically pay attention to the numbers that they get shocked into reality: They actually committed 60 defects in this simple module." The Software Engineering Institute claims that programmers taking the PSP course inject, on average, 58 percent fewer defects after PSP training than before. They also save time, moving from 39.4 percent, on average, behind schedule to averaging 10.4 percent ahead of schedule.
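The record keeping described above can be pictured as a small personal defect log. The sketch below is not the PSP's actual form: the `DefectEntry` fields, the phase names, and the sample entries are all hypothetical, chosen only to show how logged data exposes a programmer's recurring kinds of mistakes.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical phase names; phases that come before the first compile.
EARLY_PHASES = ("design review", "code review")


@dataclass(frozen=True)
class DefectEntry:
    kind: str         # what sort of mistake it was
    removed_in: str   # phase in which the defect was found and fixed
    fix_minutes: int  # time spent fixing it


def most_common_kinds(log, top=3):
    """Tally defects by kind so recurring mistakes stand out."""
    return Counter(entry.kind for entry in log).most_common(top)


def found_before_compile(log):
    """Fraction of logged defects removed in review, before compiling."""
    if not log:
        return 0.0
    early = sum(1 for entry in log if entry.removed_in in EARLY_PHASES)
    return early / len(log)


# Made-up sample data for illustration only.
sample_log = [
    DefectEntry("off-by-one", "code review", 4),
    DefectEntry("missing null check", "unit test", 25),
    DefectEntry("off-by-one", "code review", 3),
    DefectEntry("wrong operator", "compile", 2),
]

print(most_common_kinds(sample_log, top=1))
print(f"{found_before_compile(sample_log):.0%} found before compile")
```

Even a log this small shows the pattern Humphrey describes: one kind of mistake recurs, which tells the developer what to look for in the next review.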

As one PSP student put it: "In week one, engineers complain endlessly about why they have to collect compile defect data. In week two, engineers complain about 'that one compile defect I should have found in my review!'"

Humphrey argues that tracking and feedback work for every aspect of software development, from requirements on through system testing. But to his surprise, even after programmers learned his techniques-and experienced the benefits-they didn't tend to use them on the job. The problem turned out to be a lack of management support. "Programmers would say-'my management didn't understand it, they didn't believe it, they kept asking why I'm not doing testing.'" So Humphrey added a set of courses called the Team Software Process (TSP)-geared for the larger team, including management.

"My message is this: do it right the first time, fix it up front, measure quality, manage it, and care about it. And get support from management." Humphrey says that when you do all this, the work gets more fun, because it is more self-directed. "The key to developing quality software is to do quality intellectual work. High-quality intellectual work is not done by people who are upset, tired, and unmotivated. They have to be trying to do really good work. People need to be excited about their work, feel it's important, have some control over it-and doing it in a way that makes sense to them."

Sidebar 1: Connecting with Your Work

In addition to Deming and Juran, Watts Humphrey has another intellectual predecessor: Frederick Winslow Taylor, known as the first industrial manager. "He was responsible for fifty-fold improvements in manual labor. He got a bad name for being very directed, but if you read his work, you find it isn't true-he was very much focused on the quality of life. Taylor showed people how to organize and manage manual labor. His point was that you have to take breaks, you have to encourage them. Quite frankly, we are extending the same principles for intellectual work."

Ultimately, says Humphrey, software quality comes down to motivation: how programmers feel about their jobs, company and work matters a lot when you are trying to eliminate defects.

Humphrey describes an opening meeting at Microsoft in which the senior management and marketing people "come in and explain what we want, and why. We spend the next four days putting together a plan, designing the roles and the goals for each step of the process. We come up with the processes, develop a strategy, make a plan for the team and for each developer, extending in detail over at least the next few months-so that the developers know what they are doing and how they will work together. We show them how to track quality step by step, as well as the timeline, so that they can report, for example, that they are 2.6 weeks behind schedule."

The team comes back with a plan, and it rarely meets management expectations, especially when it comes to the due date. "I've watched over 50 teams myself, and in only two cases, have they come in on time."

Without these methods, says Humphrey, management usually prevails. The programmers agree to deadlines they can't meet, and, not surprisingly, they either don't meet them, or don't deliver what's promised. Humphrey thinks there's a better way. The development team, he says, should hold its ground-but they should be able to understand and explain exactly why they can't meet expectations, so that management has some understanding of the problem. The development team should also provide some alternatives: adding more people, taking more time, building in fewer functions, or building in the called-for functions over multiple releases. While at IBM, Humphrey himself took this last approach when struggling to meet the schedule for OS/360-a schedule Humphrey called a complete fiction. The first release was strictly bare-bones. The follow-on releases layered in the functionality originally promised in the first release.

As for the Microsoft team, it came back with ten alternate plans. "Over the long run," says Humphrey, "it delivered on the day they planned, with quality a hundred times improved."

Sidebar 2: Some Guiding Principles from Watts Humphrey

Every engineer is different; to be most effective, engineers must plan their work and they must base their plans on personal data.
To consistently improve their performance, engineers must measure their work and use their results to improve.
To produce quality products, engineers must feel personally responsible for the quality of their products. Superior products are not produced by accident; engineers must strive to do quality work.
It costs less to find and fix defects earlier in a process than later.
It is more efficient to prevent defects than to find and fix them.
The right way is always the fastest and cheapest way to do a job.
There are two unwritten laws of software and technology. One is that a project gets to be one year late one day at a time. The other is that surprises always cause more work to be done.
What people overlook is that when the schedule becomes the only goal, people still need to deliver a product that works and meets customer requirements.