
This technique really shows its power in systems of several
hundred thousand components. For instance, binary search could find a single
component in a system of 1,048,576 components (a moderately sized automated
system) using only 20 tests, because each test halves the search area and
2 to the 20th power is 1,048,576.
NOTE: Implicit in all this is that if you keep narrowing it down, whether binary or not, as long as you don't repeatedly double back in areas you've already tested, it is a MATHEMATICAL CERTAINTY you'll eventually solve the problem.
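
The 20-test figure can be checked with a minimal sketch. It assumes the components can be numbered in order along the signal or data path, and that a test exists which tells you whether the fault lies at or before a given component. The names find_faulty and is_fault_at_or_before are hypothetical, invented for this illustration.

```python
def find_faulty(n_components, is_fault_at_or_before):
    """Binary search over components numbered 0..n_components-1.

    is_fault_at_or_before(i) is a hypothetical test returning True when
    the fault lies somewhere in components 0..i.
    Returns (index of the faulty component, number of tests performed).
    """
    lo, hi = 0, n_components - 1
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_fault_at_or_before(mid):
            hi = mid          # fault is in the lower half, keep it
        else:
            lo = mid + 1      # lower half is ruled out
    return lo, tests


# Assumed scenario: the bad component is number 777,777 out of 1,048,576.
faulty = 777_777
component, tests = find_faulty(1_048_576, lambda i: faulty <= i)
print(component, tests)  # prints: 777777 20
```

Each pass through the loop rules out half of what remains and never revisits an area already tested, which is why the search must terminate.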
Intermittence:
Intermittence invalidates most tests which could split the search area,
resulting in backtracking. It thus renders binary search a useful but insufficient
tool for troubleshooting. Intermittence eliminates the mathematical certainty
of solution -- indeed many intermittents remain unsolved. There are several
techniques to maximize your chance of solving an intermittent.
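
To see why a single passing test cannot rule out an area when the fault is intermittent, consider this rough sketch. It assumes, purely for illustration, that the symptom shows up independently with a fixed probability each time the test is run; real intermittents are rarely that tidy. The function names are hypothetical.

```python
import math


def false_pass_probability(p_show, repeats):
    """Chance that every one of `repeats` runs passes even though the
    fault IS in the tested area, assuming independent runs."""
    return (1.0 - p_show) ** repeats


def repeats_needed(p_show, max_false_pass):
    """Smallest number of passing runs that pushes the false-pass
    probability down to max_false_pass or below."""
    return math.ceil(math.log(max_false_pass) / math.log(1.0 - p_show))


# Assumed figures: the symptom appears on 10% of runs.
print(false_pass_probability(0.10, 1))   # one pass still leaves a 90% chance of being fooled
print(repeats_needed(0.10, 0.05))        # prints: 29
```

Under these assumptions, a test that would be conclusive on a reproducible problem has to be repeated 29 times before a pass means much, and even then the area is only probably ruled out.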
Ordered Set:
Remember that the order comes from your knowledge of the system, and nobody
knows everything; even the system documentation has gaps. The less complete
your knowledge, the more trial and error is necessary. Nevertheless, in
real life even a minimum of knowledge allows a reasonable approximation
of binary search, so this isn't much of a limitation.

Quadruple Tradeoff: Ease vs. Likelihood vs. Even Divisions vs. Safety:
A more significant limitation is the fact that troubleshooting tests
are often time-consuming and risky. The test that would most exactly split
the remaining search area in half is often the toughest. Thus we temper
our desire for even divisions with the reality that we need to minimize
time and risk. Often our troubleshooting instinct tells us it's likely
the problem resides in a tiny portion of the remaining search area. In
that case it's perfectly permissible to test to prove or disprove it's
in the tiny area, but it's never permissible just to assume it. If a test
carries a credible risk of harm to the system, property or person, try
to find a safer test, even if it's harder, less likely, and doesn't divide
the remaining problem scope as evenly.
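
One way to picture the tradeoff is the sketch below. It is not a formula from the Universal Troubleshooting Process; the scoring rule (expected shrinkage of the search area per minute, with unsafe tests excluded outright) and every number in it are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CandidateTest:
    name: str
    minutes: float        # ease: how long the test takes
    area_fraction: float  # even divisions: share of the remaining area the test isolates
    p_fault_there: float  # likelihood: your estimate that the fault is in that share
    safe: bool            # safety: no credible risk to system, property or person


def expected_remaining(t):
    """Expected fraction of the search area left after running the test."""
    return (t.p_fault_there * t.area_fraction
            + (1 - t.p_fault_there) * (1 - t.area_fraction))


def pick_test(candidates):
    """Among safe tests, pick the best expected shrinkage per minute."""
    safe = [t for t in candidates if t.safe]
    return max(safe, key=lambda t: (1 - expected_remaining(t)) / t.minutes)


candidates = [
    CandidateTest("even split, hard to reach", 60, 0.50, 0.50, True),
    CandidateTest("test the suspected tiny area", 5, 0.05, 0.60, True),
    CandidateTest("quick but risky", 1, 0.50, 0.50, False),
]
print(pick_test(candidates).name)  # prints: test the suspected tiny area
```

Note that the hunch about the tiny area is tested, not assumed: if the test disproves it, 95% of the search area remains, but that 5% has been ruled out for certain.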

Letting the Problem Out of the Box:
The Divide and Conquer process can be thought of as continually forcing
the problem into ever smaller boxes, until it's trapped. Some of the worst
troubleshooting debacles I've seen involved the problem escaping the box.
In other words, the troubleshooter thought he had proved it was in one
area, when it was really in another. When that happens, tests become inconclusive
and the troubleshooter starts to doubt himself. Whole days can be wasted.
Take every precaution to avoid this -- don't skip steps.
NOTE: The March 1998 issue of Troubleshooting Professional Magazine, themed "Bottleneck Analysis", is essential reading for narrowing problems in systems whose symptom description includes words like "too" or "insufficient". You can see it at http://www.troubleshooters.com/tpromag/9803.htm. The December 1998 TPM describes the narrowing process on intermittent problems, and can be read at http://www.troubleshooters.com/tpromag/9812.htm.