This technique really shows its power in systems of several
hundred thousand components. For instance, binary search could find a single
component in a system of 1,048,576 components (a moderate-sized automated
system) using only 20 tests.
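The arithmetic behind that figure is that 1,048,576 is 2 to the 20th power, so halving the search area 20 times leaves exactly one component. The following is a minimal sketch, not a real diagnostic tool. It assumes the components form a simple chain and that a hypothetical test, works_up_to(i), can tell you whether everything through component i is still good. Both the function names and the chain model are illustrative assumptions.

```python
def locate_fault(n_components, works_up_to):
    """Find the single faulty component in a chain by repeated halving.

    works_up_to(i) is a hypothetical test supplied by the caller: it
    returns True if the system is still good through component i,
    False if the fault is at component i or upstream of it.
    Returns (index of the faulty component, number of tests used).
    """
    lo, hi = 0, n_components - 1   # the fault is somewhere in lo..hi
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if works_up_to(mid):
            lo = mid + 1           # good through mid, so the fault is downstream
        else:
            hi = mid               # the fault is at mid or upstream
    return lo, tests


# Illustration only: pretend component 777,777 is the bad one.
fault = 777_777
found, tests = locate_fault(1_048_576, lambda i: i < fault)
print(found, tests)   # 777777 20
```

Real systems are rarely a neat chain, so each test in practice splits the remaining search area only approximately in half, but the count of tests still grows very slowly as the system gets bigger.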
NOTE: Implicit in all this is that if you keep narrowing the search area, by binary search or otherwise, and you don't repeatedly double back into areas you've already tested, it is a MATHEMATICAL CERTAINTY you'll eventually solve the problem.
Intermittence invalidates most tests that could split the search area, resulting in backtracking. It thus renders binary search a useful but insufficient tool for troubleshooting. Intermittence eliminates the mathematical certainty of a solution -- indeed many intermittents remain unsolved. There are several techniques to maximize your chance of solving an intermittent.
Remember that the order comes from your knowledge of the system, and nobody knows everything; even the system documentation is incomplete. The less complete your knowledge, the more trial and error is necessary. Nevertheless, in real life even a minimum of knowledge allows a reasonable approximation of binary search, so this isn't much of a limitation.
Quadruple Tradeoff: Ease vs. Likelihood vs. Even Divisions vs. Safety:
A more significant limitation is the fact that troubleshooting tests are often time-consuming and risky. The test that would most exactly split the remaining search area in half is often the toughest. Thus we temper our desire for even divisions with the reality that we need to minimize time and risk. Often our troubleshooting instinct tells us the problem likely resides in a tiny portion of the remaining search area. In that case it's perfectly permissible to run a test that proves or disproves that the problem is in the tiny area, but it's never permissible just to assume it. If a test carries a credible risk of harm to the system, property or person, try to find a safer test, even if it's harder, less likely, and doesn't divide the remaining problem scope as evenly.
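One way to picture the tradeoff is as a rough ranking of candidate tests. The sketch below is purely illustrative: the CandidateTest fields, the scoring formula and the example numbers are all assumptions made for this example, not a prescribed method. It treats safety as a filter applied first, folds likelihood and even division into one estimate (your best guess at the chance the problem lies on one side of the test), and divides by the time the test takes to account for ease.

```python
from dataclasses import dataclass


@dataclass
class CandidateTest:
    name: str
    split: float     # estimated chance (0 to 1) the problem is on one side of the test
    minutes: float   # estimated time to run the test; must be greater than zero
    safe: bool       # False if there is a credible risk of harm


def pick_test(candidates):
    """Pick the next test using a hypothetical safety-first scoring rule."""
    # Safety first: risky tests are considered only if no safe test exists.
    safe_tests = [t for t in candidates if t.safe]
    pool = safe_tests or candidates

    def score(t):
        # Evenness is 1.0 for a 50/50 split and falls toward 0.0 as the
        # split becomes lopsided; dividing by minutes rewards easy tests.
        evenness = 1.0 - abs(0.5 - t.split) * 2.0
        return evenness / t.minutes

    return max(pool, key=score)


# Made-up candidates for illustration.
candidates = [
    CandidateTest("swap main board", split=0.50, minutes=90, safe=False),
    CandidateTest("probe mid-chain test point", split=0.30, minutes=5, safe=True),
    CandidateTest("reseat one cable", split=0.05, minutes=2, safe=True),
]
print(pick_test(candidates).name)   # probe mid-chain test point
```

The perfectly even but risky board swap loses to a quick, safe probe that splits the search area less evenly, which matches the advice above. The numbers fed into such a ranking are only estimates, so it is a way of organizing judgment rather than a substitute for it.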
Letting the Problem Out of the Box:
The Divide and Conquer process can be thought of as continually forcing the problem into ever smaller boxes, until it's trapped. Some of the worst troubleshooting debacles I've seen involved the problem escaping the box. In other words, the troubleshooter thought he had proved it was in one area, when it was really in another. When that happens, tests become inconclusive and the troubleshooter starts to doubt himself. Whole days can be wasted. Take every precaution to avoid this -- don't skip steps.
NOTE: The March 1998 issue of Troubleshooting Professional Magazine, themed "Bottleneck Analysis", is essential reading for narrowing problems in systems whose symptom description includes words like "too" or "insufficient". You can see it at http://www.troubleshooters.com/tpromag/9803.htm. The December 1998 TPM describes the narrowing process on intermittent problems, and can be read at http://www.troubleshooters.com/tpromag/9812.htm.
Copyright (C) 1996, 2006, 2012 by Steve Litt. -- Legal