Troubleshooting Professional Magazine
The Many Faces of Troubleshooting and Problem Solving

"The mere formulation of a problem is far more often essential than its solution, which may be merely a matter of mathematical or experimental skill. To raise new questions, new possibilities, to regard old problems from a new angle requires creative imagination and marks real advances in science." - Albert Einstein
trou·ble·shoot also trou·ble-shoot
v. trou·ble·shot, trou·ble·shoot·ing, trou·ble·shoots. v. intr. To work or serve as a troubleshooter. v. tr. To solve problems.
Source: The American Heritage® Dictionary of the English Language, Third Edition

trou·ble·shoot·er also trou·ble-shoot·er
n. 1. A worker whose job is to locate and eliminate sources of trouble, as in mechanical...
Source: The American Heritage® Dictionary of the English Language, Third Edition

prob·lem
n. 1. A question to be considered, solved, or answered: math problems; the problem of how to...
Source: The American Heritage® Dictionary of the English Language, Third Edition
Basically, troubleshooting and problem solving are the same thing. They both relate to solving problems. The word "troubleshoot" tends to be used more for repair of well defined systems (as implied in the references to "mechanical" devices), or in human disputes. But really, according to the dictionary they're both the same thing, and can apply to well defined systems (systems with a known and documented "as designed" state and behavior), or fuzzily defined systems without a known or documented "as designed" state and behavior.
The generality in definition has led to some interesting choices and mischoices in the hiring of consultants and expert trainers, as well as the adoption of methodologies to "solve problems". The good choices have spawned success, while the mischoices have failed and been labeled "program of the month".
I'm not immune to making mischoices. Although I have, for years, steadfastly defined my brand of Troubleshooting as "The act of restoring a sub-performing system back to its as-designed state", and alternatively (as used with Bottleneck Analysis) "The act of improving the performance of a system beyond its as-designed state", several times early in my career I made the mistake of selling it to those needing to solve problems in "systems" lacking an "as designed state". You know, business problems, human relationships and organizational issues and the like.
And of course, I've seen folks sell generic problem solving courses to those who needed to "restore a sub-performing system back to its as-designed state" -- an equally bad mistake.
In both cases, desire for revenue partially accounted for such mistakes. But not entirely. The different types of problem solving, and the distinctions between them, are obvious only after considerable thought, even to experts.
This issue of Troubleshooting Professional Magazine discusses the various types of problem solving (and troubleshooting), the distinctions and the root differences between types of systems, and the problem solving methodologies optimized for those types of systems. Be sure to see the article titled Cars and Tanks, which asserts by analogy the point that although generic problem solving methods CAN be used to solve well defined problems, doing so is suboptimal.
If you're a Troubleshooter or Problem Solver of any type, this issue of TPM was made especially for you.
Also, we're having our first annual Troubleshooting Professional Magazine trivia contest. Simply name the issue (by year and month) and article names for the ten different articles containing these ten phrases:
Anyone answering all 10 correctly will be mentioned in the January 2001 issue, and, if they'd like, will have an email link and an http link from the 1/2001 issue. If nobody gets all 10, the person with the most correct answers will get their name mentioned, and if desired, email and/or http links.
dis·tinc·tion n. 2. The condition or fact of being dissimilar or distinct; difference: "the crucial distinction between education and indoctrination" (A. Bartlett Giamatti). See Synonyms at difference.
Source: The American Heritage® Dictionary of the English Language, Third Edition
You might wonder why I don't use the word "difference" instead of "distinction". It's an issue of connotation, not definition. Use of the word "difference" usually focuses attention on the two entities being compared. Use of the word "distinction" focuses on the difference itself. In other words:

Difference: 4 > 3

Distinction: 4 > 3
"Distinction" is more of an entity in and of itself, which is what we want for many of the explanations that follow. In many types of mental activity, the key to mastery is understanding the distinction, not just the two items being contrasted.
Analyzing the solved state is basically asking "how would you like things to be?", followed by inductive reasoning to formulate a system that can deliver those results. Also included is design work in actually realizing the conceived solved state. Because design and inductive reasoning require much more creativity, analyzing the solved state requires quite a bit of creativity (and therefore quite a bit of brainpower to supply that creativity). Remember this point.
Sometimes only one of these two steps is required. In fixing a machine, the solved state degenerates into "the as designed state and behavior". No analysis, induction or creativity needed. This point will be made time and again throughout this issue of Troubleshooting Professional.
Likewise, analyzing the problem state is sometimes not necessary, as pointed out by problem solving expert Fred Nickols (link in URL's section). Fred points out that sometimes it's impossible to return to the pre-problem state anyway, so analyzing the problem state is a needless exercise. As an example he points out the stock market crash of 1987, and the fact that though the crash caused many problems, the crash could not be undone.
Perhaps a more common example of irrelevant problem state occurs in problems created by progress. When the free Linux operating system acquired a power level comparable with UNIX, many UNIX vendors went out of business. The UNIX vendors were still operating as designed, but now they were in an environment hostile to that design. No amount of assigning cause would make the UNIX vendors profitable. Only a new solved state would do that.
Most advocates of general problem solving methodologies make additions to the two steps of analyzing the problem state and analyzing the solved state. Many include a step or substep that could be called "how do we get there from here?". In other words, how do we move from our present condition to the solved state? Oftentimes that requires co-worker support, management "buy in", a budget, and much, much more. It's not simple.
Another frequently added step is "how do we prevent future problems?". That can be either future occurrences of the same problem, problems caused by the solution, or even brand new problems. So a possible generic problem solving process could look like this:
2. Get a complete and accurate symptom description
3. Make damage control plan
4. Reproduce the symptom
5. Do the appropriate general maintenance
6. Narrow it down to the root cause
Most experts have an extremely complex methodology for determining the desired solution. It's tough to jump-start creativity.
So we have an interesting distinction in problem solving. Some systems have a documented as-designed state and behavior -- typically machines, computerized systems and networks. The desired solution in such systems degenerates to the restoration of as-designed state and behavior. No inductive solution finding is necessary. Several problem solving methodologies are optimized for systems with defined and documented state and behavior. Those methodologies produce ultra-fast solutions because the Troubleshooter doesn't waste his time doing steps whose purpose is to creatively determine the solved state, which in this case is already defined.
I choose to call systems with a defined and documented state and behavior "well defined systems". I choose to call systems without a defined and documented state and behavior "fuzzily defined systems". System definition is not a binary absolute, but rather a spectrum. On one end are machines designed by humans, with complete schematic diagrams. Everything about the system is known. Simple to moderately complex machines, and extremely well documented computer programs are great examples.
On the other end of the spectrum is the human mind, which is probably the most erratic, unpredictable and variable system imaginable. Any documentation of the human mind, or of relationships between a small number of people, is a matter of statistics at best, or conjecture and anecdote at worst. When dealing with the human mind or relationships between a few people, solutions are almost completely creative in nature -- it's just too hard to deduce a "root cause". This is especially true in relationship disputes, where each party assigns causation to the other :-)
Human physiology is somewhere in the middle. It's absolutely documented and defined that a human without a liver or a heart will soon die. It's absolutely documented and defined that the organ used to see is the eyes. At the most general level, doctors and physiologists can draw a very accurate block diagram of a human. But when it gets down to details like the effect of hormones on various systems in the body, the definition goes way down, once again supported by statistics and anecdotal evidence. To the extent that human physiology is well defined, the "fix" is to restore the bad "part" to its as-designed (or in this case normal) state. At lower levels where physiology is fuzzily defined, much creativity is necessary to fix symptoms with a minimum of side effects.
Somewhere between human physiology and simple machines are complex mechanisms like computer operating systems. Although the point could be made that an operating system is completely documented by its body of source code, the fact is there's no human capable of knowing that whole body of source code. Therefore, its level of definition depends on block diagrams and other documentation. Some operating systems are more predictable and straightforward than others. UNIX and most of its workalikes (Linux included) are fairly modular, lending themselves well to rather complete documentation of state and behavior. The Windows operating system, on the other hand, is rather non-modular and unpredictable, so that at many levels it is not well defined in spite of the fact that it was built by humans and its entire body of source code exists.
Somewhere between human physiology and the human mind is the human organization. Humans are like gaseous material -- the more of them there are, the more predictable (deducible) they become. The existence of rules and policies make them even more predictable, in some cases to the point where they can be accurately modeled. Human organizations can be documented, defined and modeled when mixed with machines, technology and policies, such as in a factory. This is one of the reasons for the extraordinary success of the Theory of Constraints in the manufacturing sector.
So to solve problems, the system's level of definition determines whether it's necessary to analyze the problem state, the solved state, or both.
Within the reproducibility distinction, there's a sub-distinction consisting of whether intermittence is caused by a fuzzily defined system (example: the Windows operating system), or by a well defined component whose malfunction happens to vary with time, temperature, stress, distortion, etc. (a thermally intermittent transistor is an example). They are both very difficult to solve. The latter is usually soluble given sufficient time. The former often is not. They're both troubleshot approximately the same way, so this is the last we'll say about this sub-distinction.
The most effective weapon against intermittence is maintenance, both preventative (proactive: before the fact) and repair-consequent (reactive: after the fact, called General Maintenance in the Universal Troubleshooting Process). Intermittence is often caused by worn, dirty or bent connections between components, rather than components themselves. Such non-component causes are harder to find by normal deductive reasoning, but they're a primary target of maintenance. Repair consequent maintenance is forbidden and extremely dangerous in safety critical situations, but preventative maintenance is acceptable and necessary in safety critical situations.
Because maintenance is such an effective weapon against intermittence, those problem solving methodologies emphasizing maintenance are most effective against intermittence. The champion here is the Intelliworxx Era 4 troubleshooting tool, which raises repair-consequent maintenance to an artform. The Universal Troubleshooting Process also places emphasis on repair-consequent maintenance, having it as step 5 of the process (General Maintenance).
The Root Cause Analysis problem solving methodology contains a step called Barrier Analysis, which examines barriers to failure. Certainly preventative maintenance, and the policies and procedures that support it, is a major barrier. Though not specifically named in Root Cause Analysis, preventative maintenance is obviously a major part of that methodology.
As a matter of fact, intermittence seldom survives Root Cause Analysis. Besides its emphasis on preventative maintenance, Root Cause Analysis demands the finding of the true root cause. For instance, a power plant's reactor tripped because of a bad power supply board, but what was wrong with the board? The board had a bad solder joint, but why did that bad solder joint happen? The solder joint failed because of constantly elevated temperatures in the room, but why were the temperatures high? The temperatures were high because one of the room's air conditioners had conked out, but why wasn't that fact discovered before it caused damage? The fact wasn't discovered because there was no reporting procedure for temperature in the room. So ultimately, lack of a procedure to report the room's temperature tripped the reactor. Once the procedure is put in place, the air conditioner is repaired or replaced, the solder joint is resoldered, and the reactor is put back on line, it will never happen again. Well, except that my simplification forgot to follow the fault tree down to the air conditioner to find why it failed, but assuming you follow that fault tree too, you can be pretty sure that failure mechanism will never occur again.
Contrast this with the repair of consumer equipment, where the solder joint itself is considered the root cause. If that stereo is returned to a hot room...
Most evidently, the repair-consequent maintenance that speeds repair so much in non safety sensitive systems can kill in safety sensitive systems. I have no idea how nuclear defense systems work, but imagine an armed missile goes into launch sequence and is fortunately manually disarmed. Would you go in and clean all switches and controls?
Not likely. Let's say cleaning those switches and controls fixed the problem because the problem was a dirty switch. You don't know which switch. You don't know what caused it to get dirty. You never will. You just erased the evidence. Some day that switch or another one will get dirty, a missile will go into launch sequence again, and maybe, just maybe, nobody will shut it down in time. No repair-consequent maintenance is acceptable in extreme safety sensitive situations.
Another obvious difference in the treatment of safety sensitive systems is that you don't try to reproduce the problem. It would be just a little too gutsy to try, for instance, to reproduce the missile's spontaneous launch sequence initiation, thus placing the world within a minute of nuclear war. Incidentally, this is sort of what happened at Chernobyl. The technicians wanted to investigate the system's behavior in the absence of various safety mechanisms, so they defeated those safety mechanisms.
And then there's the fact that in extreme safety sensitive systems, the term "root cause" has an entirely different meaning, as illustrated by the discussion of the power plant in the preceding section of this article. Also, in extreme safety sensitive systems, one never dismisses an apparently disappeared intermittent with "it's probably fixed". Consumer testing is wonderful for televisions, but not for jumbo jets.
Between the extremes of battery powered radios and nuclear defense systems lies a wide spectrum of systems whose problem solving methodologies represent a tradeoff between cost effectiveness and safety.
Bad car brakes can kill, but nobody would spend the thousands of dollars it would take to trace a lone event of brake failure. Instead, symptom reproduction is attempted, repair-consequent maintenance is done, and finally a non-rigorous analysis is done of likely causes, and all implicated parts are replaced. This is halfway between what would be done with a battery operated radio (sorry, we can't reproduce the symptom), and a nuclear power plant (full Root Cause Analysis).
Inadequacies don't just happen in machines and technology:
Using one problem solving methodology, for all categories of problems, would be like building houses using only a hammer. In this competitive world, for each type of problem solving you do, you need a problem solving methodology optimized for that category of problem.
The good news is once you've learned one, the rest are easier to learn. They all have commonalities.
In spite of the differences between systems and their associated methodologies, there are remarkable similarities between the methodologies. After learning one, it's likely that learning others will be significantly easier. That's a good thing, because it's likely the expert problem solver will need multiple methodologies because he solves multiple problem categories.
When safety concerns permit, we often ignore sparse intermittents. How many invoices come back "could not reproduce problem"? But in an electric power station, to name one example, future occurrence must be prevented. And that means finding the root cause.
Root Cause Analysis is a troubleshooting process optimized specifically for sparse intermittents. Max Ammerman, formerly of Florida Power & Light, literally "wrote the book on the subject". His book is called "The Root Cause Analysis Handbook". This 135 page book is certainly not simple, but it's understandable by a person who solves problems for a living, especially if he's studied any other problem solving methodology.
In the introduction, the book describes Root Cause Analysis as using the following process:
In fact, every chapter offers hints, tips, guidelines and pitfalls. He even tells you exactly how to interview a person, how to avoid "leading" him, and what to do if the person appears to be hiding something (a likelihood when interviewing someone about a serious messup). It's the little "how to" explanations that make this book so useful.
Task analysis is the act of analyzing the task related to the event. Task analysis reveals how the task should be done, as a baseline to evaluate what actually happened.
Change analysis is the comparison of the activities during or leading up to the event, contrasted with those same activities done successfully. The trick here is to ask questions leading to distinctions. These distinctions prove useful in later steps.
Control barrier analysis is the study of failed control barriers. A control barrier is anything, be it physical, policy, or anything else, that prevents problems. Either the existing control barriers are insufficient and need augmentation, or a control barrier failed to prevent the event. In the latter case, evaluate what caused the control barrier to fail. It's absolutely vital you identify all control barriers.
Event and causal factor charting is the creation of a flow chart detailing all the activities before, during, and after the event, as well as conditions, changes, control barriers. The idea is to trace the cause and effect back to the various causes, so that in a later step a root cause can be correctly identified. This step is complex, so read chapter 5 carefully.
We all think we know how to conduct interviews. Chapter 6 goes over the process with a fine tooth comb. Many times during the chapter, I thought to myself "that's a good point -- I didn't think of that". The final work product of the interview is an interview sheet and an observation sheet. These are used to reveal and to substantiate hypotheses necessary to finish the Root Cause Analysis.
Determining the root cause is where the rubber meets the road. A wrong root cause will doubtlessly create a flawed solution, at best a "coathanger" solution fixing a symptom but creating or allowing side effects. At worst it will simply not work, or make things worse. In this respect Root Cause Analysis is identical to the Universal Troubleshooting Process. Chapter 7 discusses how to put together all the info gleaned earlier to correctly deduce the root cause.
Developing corrective actions requires knowledge of the root cause, failed barriers, distinctions between functional and dysfunctional tasks, economics, company policies, government regulations, and a host of other information. In the terminology of generic problem solving, this step analyzes the solved state.
The final work product of the Root Cause Analysis is the report. It must give conclusions and recommendations, and must be presented correctly.
Root Cause Analysis is vital for sparse intermittents, and for safety critical situations where letting the problem recur or reproducing the problem is out of the question. It's an excellent tool for the diagnosis of failed processes, policies, even human interactions. When you read this book, be aware that it mentions PIC without ever defining it. PIC stands for Problem Identification & Correction, Florida Power & Light's internal name for the Root Cause Analysis process.
If you solve problems, you need to know Root Cause Analysis. Max Ammerman's book is called "The Root Cause Analysis Handbook", ISBN 0-527-76326-8. It's widely available. Get it.
Bottlenecks leave clues. Put a 40 watt, a 100 watt and a 150 watt lightbulb in series, and only the 40 watt bulb lights. The clue is that it's got most of the voltage, so it lights up. On the factory floor, the bottlenecked machine or process has a huge stack of inventory awaiting work by that machine or process. And on a long trail too narrow for people to pass, the slowest person has a huge space in front of him :-)
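The lightbulb clue can be checked with a little arithmetic. The Python sketch below is a rough model assuming 120 volt bulbs whose filament resistance is estimated from rated power (R = V²/P), and it ignores the fact that filament resistance changes with temperature -- but it shows why the 40 watt bulb hogs the voltage:

```python
# Rough model: filament resistance estimated from rated power at 120 V
# (R = V^2 / P). Real filaments change resistance with temperature,
# so treat this as an illustration, not a measurement.
RATED_V = 120.0

def series_voltages(watts):
    """Voltage across each bulb when the set is wired in series across RATED_V."""
    resistances = [RATED_V**2 / p for p in watts]
    total_r = sum(resistances)
    return [RATED_V * r / total_r for r in resistances]

for p, v in zip([40, 100, 150], series_voltages([40, 100, 150])):
    print(f"{p:>3} W bulb: {v:.1f} V")
```

The 40 watt bulb, having the highest resistance, ends up with 72 of the 120 volts -- most of the voltage, just as the clue predicts.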
Sometimes the bottleneck's clues must be uncovered. In a computer, you need software to display the memory and CPU usage in order to see if either is bottlenecking your computer. Any suspected bottleneck can be tested by reducing its throughput and seeing if system throughput decreases commensurately. For instance, if you don't know whether your network is the bottleneck, slow it down either by adding traffic to it, or by decreasing its bits per second figure. If your computer programs slow commensurately, that's the bottleneck. Otherwise not. In computer programs for which you have source code, you can sometimes temporarily comment out a suspected bottleneck and see if the program speeds up significantly. Or you can add a delay to it and see if the program slows commensurately.
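That slow-it-down test can be sketched in a few lines of Python. The pipeline, the stage names and the per-item times below are all made up for illustration; the point is that doubling a stage's time halves system throughput only if that stage is the bottleneck:

```python
# Toy model of a serial pipeline: each stage takes a fixed time per item,
# and stages run concurrently, so steady-state throughput is limited by
# the slowest stage (the bottleneck).
def throughput(stage_times):
    """Items per second for the pipeline."""
    return 1.0 / max(stage_times)

stages = {"parse": 0.01, "network": 0.05, "render": 0.02}  # seconds/item
base = throughput(list(stages.values()))

# Slow the suspected bottleneck by 2x: system throughput halves.
slowed = dict(stages, network=stages["network"] * 2)
print(throughput(list(slowed.values())) / base)   # 0.5 -> network is the bottleneck

# Slow a non-bottleneck by 2x: no change (it still has slack).
slowed2 = dict(stages, parse=stages["parse"] * 2)
print(throughput(list(slowed2.values())) / base)  # 1.0 -> parse is not the bottleneck
```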
The theory of constraints doesn't stop with finding the bottleneck. It offers alternatives on how to "fix" the bottleneck. Of course the most straightforward way is to add capacity to the bottleneck. Buy a faster CPU. Add another drillpress and operator. Switch from 10mb to 100mb Ethernet. Get a faster modem. Hire more salesmen.
Adding capacity to the bottleneck can be a great solution. But it's often expensive, especially in cases like the microprocessor where you need to replace, rather than add to, the resource. Luckily there are other ways to "fix" a bottleneck. One of the easiest is to make sure the bottleneck is running full blast all the time. If it's a machine in a factory, make sure it runs three shifts, make sure its operator does nothing but run that machine, and make sure upstream machines supply it with parts as needed, with an adequate stockpile so it can continue through those inevitable variations that crop up. In an audio amplifier, make sure the power supply runs full blast all the time by putting huge capacitors across the DC so that when the music is soft, the power supply goes full blast charging that capacitor, which is then discharged when loud music sucks out more electricity than the power supply can produce.
OK, so it's running full blast and it's still not enough. Often you can offload some of the bottleneck's work to a non-bottleneck. If you read "The Goal" by Eliyahu M. Goldratt and Jeff Cox, you know you can get a fast guy to carry all of Herbie's (the slowest guy on the march) equipment. You can get a less efficient machine to process the parts instead of the bottleneck, or even outsource the processing. Sure, it's expensive, but remember, the increased expense is only for that one process, but because it's the bottleneck, the one-process investment increases throughput for the entire system. Perhaps the coolest example of offloading is cache memory in a computer. Dynamic memory, the only kind cheap enough to use in quantity in computers, is unacceptably slow. It's the bottleneck. So computer designers long ago learned to offload most of the dynamic memory's work to the CPU, which must synchronize dynamic memory's contents with a small, fast and expensive static RAM cache memory. For each memory access, the CPU decides where to go for it, and if the accessed memory is not in cache, places it there, getting rid of what the CPU calculates is the most dormant data in the cache.
Whew, that CPU has to work. But because it's not the bottleneck, by definition it has excess capacity available. The result of memory caching is that up to 90% of the time, the CPU doesn't need to go through the slow process of requisitioning data from dynamic RAM. On Windows computers, you can have the CPU offload some work from the hard disk. Obviously disk cache does that, but in Windows there's a subtler way. If you have a fast processor (or a slow disk) you can speed system throughput by using disk compression. Sure, the CPU must work overtime compressing and uncompressing those files, but the resulting disk reads and writes are smaller. Given that disk access could be 100 times slower than memory access, those smaller reads and writes increase system throughput. So unless your CPU is already close to overworked, disk compression speeds system performance.
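A back-of-the-envelope Python sketch makes the compression tradeoff concrete. The numbers (a 10 MB/s disk, 2:1 compression, 200 MB/s decompression) are purely illustrative assumptions, not measurements:

```python
# When does disk compression pay off? Total read time is disk transfer
# of the (possibly compressed) bytes plus CPU decompression of the full
# file. All figures here are illustrative assumptions.
def read_time(size_mb, disk_mb_s, ratio=1.0, cpu_mb_s=None):
    """Seconds to read a file of size_mb megabytes."""
    t = (size_mb * ratio) / disk_mb_s      # fewer bytes cross the slow disk
    if cpu_mb_s is not None:
        t += size_mb / cpu_mb_s            # CPU decompresses the full size
    return t

plain = read_time(100, disk_mb_s=10)                            # 10.0 s
packed = read_time(100, disk_mb_s=10, ratio=0.5, cpu_mb_s=200)  # 5.5 s
print(plain, packed)
```

As long as the CPU's decompression rate dwarfs the disk's transfer rate -- that is, as long as the CPU is not itself the bottleneck -- the compressed read wins.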
Sometimes the bottleneck busies itself doing unnecessary work. In a factory, maybe some of the parts put in the oven don't really need the heat treatment. Or maybe, because there isn't an inspection station right before the bottleneck, bad parts are going into the bottleneck, displacing potential good parts. Or maybe downstream people and processes are ruining parts that came out of the bottleneck. All these sources of unnecessary work can be corrected.
The theory of constraints tells us there are many ways to "fix" a bottleneck. But so far what I've described is just bottleneck analysis on steroids. The theory of constraints goes much farther. As an introduction, let's discuss the difference between compressible and incompressible systems.
But you can make it a compressible circuit by adding a 1000mfd capacitor across each resistor. Now, upon initial closure of the circuit, the capacitors take every bit of current the battery can supply. They "build up inventory", if you will. Early in the stabilization process, the voltage across each capacitor is based on the amount of DC current that's "flowed" through it, meaning each cap has the same voltage. So early in the stabilization process the 100 ohm resistor will conduct 10 times the current conducted by the 1000 ohm resistor. As things continue, the slower flow through the 1000 ohm resistor results in a greater voltage across its capacitor, meaning that the current discrepancy starts to disappear. When things reach steady state, once again the battery and resistors pass identical current.
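That stabilization process can be simulated numerically. The Python sketch below uses simple Euler integration, and it assumes a 10 volt battery with a small internal resistance (my addition -- without it, an ideal battery across empty capacitors would demand infinite current):

```python
# Euler-method sketch of the circuit described above: a battery (10 V,
# with an assumed 1 ohm internal resistance Rs) feeding two series
# resistor/capacitor pairs: R1 = 100 ohm and R2 = 1000 ohm, each with
# a 1000 uF capacitor across it.
V, Rs = 10.0, 1.0
R1, R2, C = 100.0, 1000.0, 1000e-6
v1 = v2 = 0.0                    # capacitors start empty ("no inventory")
dt = 1e-5                        # seconds per simulation step

for _ in range(int(5.0 / dt)):   # 5 simulated seconds, ample to settle
    i = (V - v1 - v2) / Rs       # battery current
    v1 += dt * (i - v1 / R1) / C # each cap absorbs whatever current its
    v2 += dt * (i - v2 / R2) / C # resistor doesn't yet pass

# At steady state both resistors pass the same current again.
print(round(v1 / R1, 5), round(v2 / R2, 5))  # both ~ V/(Rs+R1+R2) ~ 0.00908 A
```

Early in the run the resistor currents differ by the predicted factor of ten; by the end, both resistors carry the identical steady-state current.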
That's similar to a factory. Each machine has room for a stockpile in front of it. Each stockpile is like a capacitor. It can grow and shrink. And in a factory, you can always increase the stockpile potential by adding space, or even warehouse space. It's like adding capacitors while a circuit is running. Another difference between a factory and a circuit is that machines are not like resistors. A resistor conducts more as the pressure on it increases. Not so a machine. A machine either runs at its rated throughput, or if it runs out of incoming parts, less. But having a huge stockpile does not increase the throughput of the machine. So electronically, a factory machine would be modeled as a constant current source, not as a resistor.
The final difference between the factory and the circuit is one of degree. Stockpiles resemble 100 kilofarad capacitors in a 1 amp circuit -- they can take a month or more to "fill up".
(Calculations *VERY* approximate: I = dQ/dt and Q = CV)
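Here's that approximate calculation spelled out in Python. The 26 volt swing is an illustrative assumption chosen to match the month figure:

```python
# Sanity check of the "month to fill" claim using Q = C*V and I = dQ/dt,
# so fill time t = C * dV / I. All figures are deliberately rough.
C = 100_000.0   # farads ("100 kilofarad" stockpile analogue)
I = 1.0         # amps of steady charging current
dV = 26.0       # assumed working voltage swing

t_seconds = C * dV / I
print(t_seconds / 86_400)   # ~30 days -- a month or more, as claimed
```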
And they exhibit little or no loss during the month. That's very hard for an electronics person to wrap his mind around. But that's what it would take to model a factory. From here on we'll discuss the factory in factory terms.
Released raw materials are "pushed" into the system, after which each machine "pushes" materials to the next. In a factory, order counts. Everything upstream of the bottleneck can be functioning at breakneck speed, while everything downstream of the bottleneck is starved. Order becomes more significant when combined with the deadly duo of dependent events (cascaded stages in a process) and statistical fluctuation (variation).
Take two machines, with the faster machine four times the throughput of the slower. If the slower machine feeds the faster machine and the faster machine goes down for an hour, there is no lost production. The slow machine simply builds a stockpile for the fast machine. When the fast machine comes back on line, it makes quick work of that stockpile because it's four times faster than the slow machine.
Now reverse the machines. The fast machine feeds the slow one, and the fast one once again breaks down for an hour. Let's say that after 1/4 hour the slow machine finishes its stockpile of incoming parts. Now it has to wait for more parts from the fast machine. Now there's lost production in the amount of 3/4 of the slow machine's hourly throughput. Remember, the slow machine can't "catch up" the way the fast one can. The only way to have prevented that was to keep a continuous stockpile in front of the slow machine equal to the worst case downtime throughput of the fast machine.
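The two-machine scenario is easy to replay in code. This Python sketch simulates the second case minute by minute; the absolute rates (40 and 10 parts per hour) are my assumed numbers, keeping the text's 4:1 ratio, one-hour breakdown, and quarter-hour stockpile:

```python
# A fast machine (40 parts/hr, assumed) feeds a slow one (10 parts/hr),
# the fast machine goes down for one hour, and the stockpile in front of
# the slow machine holds 15 minutes' worth of the slow machine's work.
FAST, SLOW = 40 / 60, 10 / 60        # parts per minute
stock = 15 * SLOW                    # 2.5 parts buffered
made = 0.0

for minute in range(60):             # the hour the fast machine is down
    pulled = min(SLOW, max(stock, 0.0))  # slow machine eats the stockpile
    stock -= pulled
    made += pulled

lost = 60 * SLOW - made
print(round(made, 2), round(lost, 2))  # 2.5 made, 7.5 lost
```

The slow machine runs for 15 minutes, then starves: it loses 3/4 of its hourly throughput, exactly as described, and it can never catch up afterward.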
This isn't especially impressive with just two machines. But if a factory were unfortunate enough to be cascaded with machines sorted fastest to slowest from upstream to downstream, the requisite stockpiles in front of each machine would be enormous. Those stockpiles represent money spent but not recovered from customers. They represent potential obsolescence. They represent a mess making it harder to work in the factory. They represent storage costs. They can drive you to the poorhouse.
On the other hand, a factory fortunate enough to have the slowest machine at the beginning would require no stockpiles. Every other machine could "catch up" to the slowest machine's pace if something in front of it went down.
But the bottleneck is seldom at the front. So the theory of constraints presents methods of calculating material release so that the bottleneck always has a stockpile of incoming parts.
It gets even more complex. Typically the bottleneck machine will process several different types of parts. The different parts are destined to go into finished products in different customer orders, each of which must be delivered on time. How do machines upstream schedule delivery of appropriate parts to the bottleneck such that those parts contribute to ontime delivery? Remember, the bottleneck must run continuously at full speed. Among other things, that means minimal parts changes (each parts change requires a setup). In other words, for a given incoming part, process all of that part in the incoming stockpile. Or at least enough to satisfy all current orders.
In the name of (false) efficiency, some folks extend the "long run" philosophy to non-bottlenecks. The result is that the bottleneck is starved for certain needed parts, because the upstream processes are creating excessive numbers of a different part. So one order ships late, and excessive inventory of the long run part builds up.
The trick is to use shorter runs on the non-bottleneck machines. After all, by definition they have excess capacity, so they can afford to be idled by extra setups. Meanwhile, the shorter runs on non-bottlenecks mean a more even parts distribution arrives at the incoming stockpile of the bottleneck, so it can fulfill orders.
But wait, there's more. As taught in the Universal Troubleshooting Course, bottleneck analysis is all about speeding up bottlenecks, which are considered bad. But in the Theory of Constraints, bottlenecks can be useful. Think of your car. It would continuously go 120mph if you didn't have a bottleneck called "the accelerator" to slow system throughput (via fuel delivery). The theory of constraints endeavors to minimize stockpiles by calculating the release of raw materials at a time such that they will arrive (as part of finished goods) at the shipping dock just in time for ontime delivery to the customer. They USE the bottleneck to calculate the time from parts release to shipping dock, and can even use it to calculate parts orders and promise dates to their customers. It's the bottleneck that allows them to accurately calculate these things.
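To make the scheduling idea concrete, here's a minimal sketch of using the bottleneck to quote a promise date. All the numbers (bottleneck rate, queued work, post-bottleneck hours) are hypothetical, and real Theory of Constraints scheduling is far more involved than this:

```python
from datetime import datetime, timedelta

# Hypothetical figures for illustration only.
BOTTLENECK_RATE = 50        # parts/hour through the bottleneck
QUEUED_PARTS = 400          # parts already committed ahead of this order
POST_BOTTLENECK_HOURS = 6   # assembly, packing, etc. after the bottleneck

def promise_date(order_parts: int, now: datetime) -> datetime:
    """Everything waits behind the bottleneck, so the lead time is just
    (queued work + this order) / bottleneck rate, plus downstream time."""
    bottleneck_hours = (QUEUED_PARTS + order_parts) / BOTTLENECK_RATE
    return now + timedelta(hours=bottleneck_hours + POST_BOTTLENECK_HOURS)

start = datetime(2001, 3, 1, 8, 0)
# (400 + 100)/50 = 10 h at the bottleneck, plus 6 h downstream = 16 h.
print(promise_date(100, start))   # → 2001-03-02 00:00:00
```

The point is that only the bottleneck's rate appears in the calculation; the non-bottlenecks have slack and therefore don't constrain the promise date.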
Trickier still, there are situations in which throughput is significantly slower than the bottleneck. The classic example is the fixed speed streaming tape drive, which must be supplied with sufficient data to fill the tape passing the tape head. If the streamer moves faster than the data supply feeding it, many streaming tape drives cannot simply stop. There's a finite time until they've stopped, after which they must reverse and re-cue before recording more data. Depending on the mismatch between the tape and the data flow into it, this constant "shoe shining" could cut the theoretical capacity of the bottleneck severalfold. In fact, such inefficiencies happen in any system where a component has speed and inertia. An assembly line comes to mind. If the line is sped up beyond the capacity of a specific nut tightener to tighten his assigned nut, he must either skip certain units and put them aside, pass them on (creating rework down the line), stop the line, or take some equally counterproductive action. True assembly lines, as opposed to cascades of stations with stockpiles, react very badly to oversupply of incoming material.
So you've eliminated some bottlenecks, eliminated excessive supply to the existing bottlenecks, and exploited bottlenecks to schedule orders. Now you're running a phenomenally smooth factory. But as you continue to speed up the bottleneck, there comes a time when you seem to have many roving bottlenecks, even though calculations indicate that those machines have more than enough capacity to supply the true bottleneck, and indeed the order flow.
Here's what's happening. As the bottleneck's throughput approaches that of certain other machines, those machines are no longer able to "catch up" when inevitable upstream variations deplete their stockpiles. So they don't move needed parts to the bottleneck in time, and there's lost production. Basically, everything's running too close to "the edge". The solution is to release parts early enough to maintain non-bottleneck stockpiles big enough to cover any expected upstream downtime.
And of course, if the bottleneck's throughput continues to be improved, you reach a point where another machine becomes the bottleneck. It's very important to recognize that change, and continue your calculations using the new bottleneck. And remember, sales can be the bottleneck (meaning the factory as a whole has excess capacity). That's the time to acquire new customers and get more orders from the old by promising the delivery times that your excess capacity can provide.
Whew! Quite an explanation, and it barely scratched the surface. Luckily, Eliyahu Goldratt and Jeff Cox have written a book on the Theory of Constraints called "The Goal" (ISBN 0-88427-061-0). I've read it probably once a year for the past 5 years, and it's my favorite book. The Goal has a "can't put it down" plot, in which characters we can all recognize (some we can identify with, and some we recognize as enemies) blunder through life slowly learning the lessons of the Theory of Constraints. By the end of the book they're Ninjas, and the reader has learned right along with them. There are good guys, bad guys, good guys who are forced to be adversarial by circumstance, love and plenty of conflict. I'm a Stephen King fan, and I can tell you that "The Goal" is every bit as exciting as the best of King's books.
But by far the most memorable parts of the book are the ingenious analogies Goldratt and Cox use to demonstrate the complexities of factory logistics. One, a boy scout hike, uses a boy named "Herbie" as a metaphor for a bottleneck. Every time someone slows down in front of him, he has to stop, but when the faster boy in front of him sprints to close up his own gap, Herbie can't catch up, so he falls ever more behind, opening a huge gap in the line. Putting Herbie at the front eliminates all gaps, and allows the group to progress at Herbie's maximum speed. They then offload work from Herbie the bottleneck by carrying the heavy stuff in his backpack, and the whole line speeds up. "Herbie" probably has more name recognition than many state governors.
The second well known analogy from the book is "the match game". It's a game invented by the hero to model the effects of the combination of dependent events (cascaded steps in a process) and statistical fluctuation (variation). By the time you're done with the book, you can do more than understand the principles. You can do more than recite the principles. You can FEEL the principles. And if your experience is like mine, every time you read the book you obtain a greater level of understanding and feeling.
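For readers who'd rather not wait for the book, a rough simulation of a match-game-style line is easy to write. This is my own sketch of the idea, not Goldratt and Cox's exact rules: a row of stations, each passing a dice-determined quantity downstream, but never more than the work actually on hand:

```python
import random

def match_game(roll, stations=5, rounds=1000):
    """Simulate cascaded stations. Each round, every station 'rolls'
    and moves that many matches downstream -- but never more than it
    actually has on hand. The first station draws from an unlimited
    supply of raw matches. Returns average units shipped per round."""
    on_hand = [0] * stations       # work-in-process waiting at each station
    shipped = 0
    for _ in range(rounds):
        for i in range(stations):
            want = roll()
            moved = want if i == 0 else min(want, on_hand[i - 1])
            if i > 0:
                on_hand[i - 1] -= moved
            if i == stations - 1:
                shipped += moved   # last station ships finished goods
            else:
                on_hand[i] += moved
    return shipped / rounds

rng = random.Random(1)
print(match_game(lambda: rng.randint(1, 6)))  # with fluctuation: typically below 3.5
print(match_game(lambda: 3))                  # no fluctuation: exactly 3.0 per round
```

The constant-roll run shows that with zero variation a balanced line loses nothing; add dice, and the combination of dependent events and statistical fluctuation tends to drag throughput below the 3.5 per round that each individual die averages.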
I recommend that everyone who ever solves any kind of problem (and who isn't included in that definition?) read this book. At $20.00 for the book and maybe 10 hours to read it, the learning to investment ratio is right off the charts.
It won't work, of course. And the main problem won't be the ensuing morale problems, or even turnover. The problem is that very few of the fired employees contributed to the root cause of the company's problems. Very few of them contributed to the company's bottleneck. Sadder, some of the fired employees might be among the company's best. How can that be?
Everything has variation. Some of the variation has a cause, and some is statistically insignificant "random noise". With extreme amounts of employee variation, all the variation could be random noise. In other words, the worst guy this year could be the best guy next year, in spite of the fact that nobody changed what they were doing.
To find out whether a particular person's performance is statistically significant, you do some statistical evaluations on the performance figures to obtain an Upper Control Limit (UCL) and a Lower Control Limit (LCL). All performances between the upper and lower control limits are considered normal performance, not subject to any discipline, correction or awards. Anyone above the UCL should be evaluated to find out what he's doing right, so everyone can do the same thing. Anyone falling below the LCL should be, for lack of a better word, troubleshot, to find out what is going wrong with his performance. There are cases where this statistical analysis revealed that the employee needed a new eyeglass prescription, and in fact once the new eyeglasses were obtained, the problem vanished. Below is a cartoon, based on a control chart, lampooning those who would "grade on the curve" in a corporation:
The gunslinger is making a fool of himself because all employees, including Barnes, are within the statistical limits, thus none of them is the cause of any company problems, and the company cannot be improved by getting rid of, or tampering with, any one of them. If the gunslinger wants to improve productivity, he must look at the production system common to all those employees.
The process of assigning causation through statistical analysis is called Statistical Process Control (SPC). This was W. Edwards Deming's starting point. It's an extremely powerful tool. SPC isn't used only to evaluate the performance of people. It can evaluate anything that can be expressed numerically. Basically, when any given "thing" falls outside of the UCL/LCL borders, it should be investigated and either fixed or propagated, as appropriate. Notice that this bestows the opportunity to answer the question "why was this month so good?" (of course after determining that the month's figures were above the UCL). That answer can then be used to make permanent improvements.
So SPC first alerts us to problems, and then gives us some tools to diagnose those problems. Note that at any point in the investigation we can move from SPC to a diagnostic process like the Universal Troubleshooting Process, Jim Roach's Diagnosis, the Six Step Loop, Root Cause Analysis, or Theory of Constraints, to finish getting down to the root cause.
SPC is deeply rooted in the quality movement. One hallmark of "quality" is the reduction of variation. As variation decreases, the UCL and LCL approach each other. In the case of the people in the preceding cartoon, the reduction of variation would come not from hiring new people, but from creating a system that allows them more consistent (and hopefully higher) production. This would be equally true if the preceding control chart were ball bearing diameters instead of human performance.
In the vernacular of SPC, anything outside the UCL/LCL limits warrants investigation and is called an "assignable cause", or sometimes a "special cause". Anything within those limits is termed a "chance cause", a variation born of random noise needing no investigation. If one wants to reduce the variation or raise the numbers en-masse, the system common to all the numbers must be investigated.
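Computing the limits is straightforward. Below is a sketch of an individuals (XmR) chart, using the widely published formulas (limits at the mean plus or minus 2.66 times the average moving range); the weekly performance figures are invented purely for illustration:

```python
def xmr_limits(data):
    """Individuals (XmR) chart limits:
    LCL/UCL = mean -/+ 2.66 * average moving range."""
    mean = sum(data) / len(data)
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return mean - 2.66 * mr_bar, mean + 2.66 * mr_bar

# Hypothetical weekly units-per-employee figures:
output = [52, 48, 55, 50, 47, 53, 49, 51, 46, 54]
lcl, ucl = xmr_limits(output)
flagged = [x for x in output if x < lcl or x > ucl]   # assignable causes
print(f"LCL={lcl:.1f}  UCL={ucl:.1f}  assignable causes: {flagged}")
```

With these invented figures, nothing falls outside the limits: every point is a chance cause, so (like the gunslinger's employees) none of them individually explains the system's performance.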
There are many books on the subject. The book I own is called "Understanding Statistical Process Control" by Donald J. Wheeler and David S. Chambers. This book contains all the equations you need to calculate the UCL and LCL, as well as the math to diagnose numerous problems that can be evaluated statistically. As is obvious from this article, I haven't even scratched the surface of the information contained in this book.
It's the only technical Troubleshooting process I know of that addresses the mental outlook of the Troubleshooter, instead of considering him or her a perfectly rational robot.
To really understand the beauty of Jack's methodology, you need to visit his Troubleshooting web page (in the URL's section of this magazine). Throughout the explanation of his method runs a message of "be careful, trust nothing". Obviously, an in-process (and therefore incomplete) design is buggy, and may not behave as expected. There could be double and triple root causes, and the system may not be designed as you think it is.
If you're in the middle of creating an electronic design, use Jack's methodology. Although I personally use the UTP in troubleshooting software under design, it's very possible that Jack's would be better optimized for that. Basically, when designing anything, make sure you're familiar with Jack's 6 step loop.
Era | Name | Range | Description |
1 | Observational Troubleshooting | From invention of the bow and arrow until the invention of the steam engine (8000 BC to 1700's) | Observation only. Systems under repair have all components visible, so the problem is obvious. Little diagnosis needed. On the other hand, repair/replacement of component requires precision, one of a kind work. |
2 | Intuitive Troubleshooting | From invention of the steam engine until the 1970's | Observation and non-rigorous diagnostic process. Systems under repair still contain only a few components, though some aren't visible to the naked eye. Diagnosis required, but doesn't need to be rigorous. Replacement parts likely to be available from a vendor, but may be difficult to replace. |
3 | Process Troubleshooting | From 1970's until the present | Observation and rigorous diagnostic process. Systems under repair contain many (>10,000) components, most abstract or invisible to the naked eye. Non-rigorous diagnosis produces circular search and rework. Rigorous diagnosis required. Replacement parts available from a vendor, and due to modularity often easy to install. Software components are often replaced in five minutes with a few keystrokes. |
4 | Technologically Enhanced Troubleshooting | From now until the next era | Observation and rigorous diagnostic process, aided by context-relevant technology-served information (Troubleshooting process aware smart manuals). Systems under repair are now hugely complex, not always completely modular. Observation and rigorous diagnostic process alone takes too long, because no human can have the complete Mental Model, manual and diagnostic information in his or her head. Replacement parts are stock. |
I met Jim when he emailed me in 1996, wanting some advice on placing a Troubleshooting Process in an automated diagnostic system. At the time he was a highly placed training executive in GM. If it had been anyone else I would have chalked it up to "another expert system marketed to replace a human Troubleshooter". But Jim knew his stuff, and it was obvious he understood Troubleshooting Process through and through. So we talked frequently throughout 1996 and the first half of 1997.
But I really didn't understand what the finished product would be like, even though I'd heard so much about it. Until 1999, when I tried it at a conference. It was incredibly easy to use. Jim Roach gave several demos where people used it to find bugs (intentional malfunctions) placed in a Cadillac.
This Troubleshooting Process is highly optimized for situations in which the vendor has provided voluminous service documentation, including quickchecks, error code documentation, predefined diagnostics and the like. The process starts out with symptom acquisition and reproduction. Next comes what would be called General Maintenance in the UTP, including tech bulletins, diagnostic codes, and the like. This is followed possibly by a divide and conquer session, guided to the extent possible by existing predefined diagnostics. The final step is repair and testing.
So far it sounds like the Universal Troubleshooting Process. But if you look at the details, the Intelliworxx model assumes that most problems will be solved with the help of existing documentation, and that the Troubleshooter won't need to devise his own diagnostic tests. Given the volume of system information in the smart manual, for the first time this becomes a viable assumption. Indeed, combined with a smart manual on a voice actuated, ruggedized hands-free computer, the Troubleshooter has instant, just in time access to exactly the necessary information. At every stage of the game, the first priority is to look at existing documentation. That documentation is a smart manual (or as Intelliworxx would call it, a mentoring application). Only when all documentation has been delivered to the Troubleshooter, without a solution being found, does the Troubleshooter go "offroad", creating and testing his own hypotheses. At that point, the well equipped Troubleshooter would know the Universal Troubleshooting Process, which is optimized for those times when relevant system documentation is not available.
Authoring such a smart manual is costly, but so is a truly detailed paper manual. This Troubleshooting-Process-enhanced, voice actuated, hands free smart manual is the first tool integrating effective, low cost information lookup with Troubleshooting Process. For the first time it's quicker to follow predefined diagnostics than to create your own. In industries providing detailed, accurate and timely system information (the automotive industry is a perfect example), it can multiply productivity.
Contrast this with an industry like proprietary computer equipment and software, where documentation is incomplete and scattered. In software diagnosis, if you haven't found the info in 10 minutes you're probably better off diagnosing it yourself. We software guys can only dream how fast Troubleshooting could be if we had instant, as needed access to all accumulated knowledge of the machine or system.
A link to the Intelliworxx website appears in the URL's section of this magazine.
Well Defined Systems | Fuzzily Defined Systems |
Method | Reproducibility | System definition/documentation | Safety | Availability | Comment |
Universal Troubleshooting Process | Reproducible to semi frequent intermittents | Well defined to almost well defined | Non-critical to individual injury risk | Free to read, very extensive description on the web. Modest licensing fee for in-house training. www.troubleshooters.com/tuni.htm | Optimized for machine and computer system troubleshooting, especially when vendor support is lacking. |
Intelliworxx Era 4 tools (Jim Roach et al) | Reproducible to semi frequent intermittents | Extremely well defined only | Non-critical to individual injury risk | Available for purchase from Intellex. Was free to read on web, and might be again. Requires system specific software and electronic documentation with scripts. | The fastest way to troubleshoot machines and systems for which process-based troubleshooting scripts are available. |
6 Step Loop (Jack Ganssle) | Reproducible to semi frequent intermittents | Well defined to almost well defined | Non-critical to individual injury risk | Free to read on web. www.ganssle.com/articles/atblsho.htm | Optimized for troubleshooting used as a part of the design process. |
Statistical Process Control | Reproducible to semi frequent intermittents | Well defined to barely defined | ? | Readily available in book/textbook form. Training and college courses available. | Optimized to fix problems with processes running over a long interval of time. |
Theory of Constraints | Reproducible to frequent intermittents | Well defined to deducible/documentable | Non-critical. Augment with other methodologies in safety critical situations. | Available in inexpensive and easy to read books. Heavy consulting/training available from the Goldratt institute. | Optimized to solve problems whose symptom descriptions include "too little", "too much", "too long", etc. Works extremely well in fuzzily defined systems; used extensively in manufacturing. |
Generic Problem Solving | Reproducible to somewhat sparse intermittents | Deducible/documentable to barely defined | Non-critical to seriously critical | Almost impossible to find on web. Typically marketed to large corporations as expensive training. | Optimized for fuzzily defined problems in which little can be assumed. |
Re-engineering | N/A | Well defined to undefined | Best relegated to non-critical | Very well explained in Hammer's "Reengineering the Corporation". Many different flavors. | Optimized for businesses so far off the mark that Theory of Constraints and "Constant Improvement" are not an option. |
Root Cause Analysis | Sparse intermittents to single events | Well defined to deducible/documentable | Used in nuclear power plants. Addresses all safety issues. | Very well explained in Ammerman's "Root Cause Analysis" book. Other books available. Training available. | Optimized for sparse intermittents and single events -- detective work. Especially necessary in safety critical environments. |
There was a time I answered that question affirmatively. Not that I ever successfully used it to solve an interpersonal relationship problem, but problems are problems, right? I actually made the mistake of using relationship problems as examples in some courses. Sometimes it worked, and sometimes it bombed very badly.
Eventually I backed off my claim of being able to solve relationship problems, guaranteeing only to solve problems in "well defined systems". But I still didn't know why the Universal Troubleshooting Process didn't work on fuzzily defined systems.
To figure that out, I had to learn about the anatomy of a problem. You see, a well defined system problem is a subset of generic problems. A well defined system comes with two additional pieces of information for the Troubleshooter:
But the as-designed state and behavior is a vital distinction. Here's why...
The most basic definition of problem solving is the following 2 step process:
They both work 75 hours a week, and have responsibilities for their son and his activities. This obviously leaves no time. Should they:
Generic problem solving methodologies are equipped to address both the problem state and the solved state. The Universal Troubleshooting Process is not a generic problem solving methodology, and therefore is sufficient only in the subset of problems in which the solved state is "restore to the as-designed state and behavior". So why would anyone use the Universal Troubleshooting Process, when generic processes handle all problems?
Simply stated, tools meant to design and to weigh alternatives are wasteful in problems where the solved state degenerates to "the as-designed state and behavior". This issue has an article called Cars and Tanks that discusses just how wasteful this can be.
So the purpose of this article so far has been to show that generic problem solving methodologies and problem solving methodologies optimized for well defined systems are neither interchangeable, nor viable substitutes for each other.
The previously listed two step representation of generic problem solving actually leaves out a couple things. A more realistic generic problem solving process would look like this:
[Flowchart: the more realistic generic problem solving process]
There are different kinds of future problems to prevent. One is a recurrence of this same problem. That's achieved primarily by the problem state analysis. Then there's the creation of different problems. This is an area for study in the solved state analysis.
Many of us think most creatively when exercising. I've designed many a computer program while skateboarding, bike riding, or walking through the woods.
The Universal Troubleshooting Process's Troubleshooter's mantra, "How can I narrow this down just one more time?" can easily be adapted to creativity jump starting. There are other questions that can jump start creativity:
If you're lucky enough to be in a company whose budget allows you to receive generic problem solving training from a vendor, do it. If not, I'd suggest a thorough reading of the web material of Dr. Charles Camp and Fred Nickols. Both are excellent, and the URLs are in the URL's section of this magazine.
And a word of warning. One day a generic problem solving vendor may try to sell you generic problem solving training for your technologists, even though the technologists' primary troubleshooting is technical (restoring well defined systems back to their as designed state). These vendors may even offer to train you to be a trainer in their methodology, with the obvious resume benefits.
Sounds great, but be very careful. Because when it comes to fixing well defined systems, if your organization uses a generic problem solving methodology and your competitor uses something like the Universal Troubleshooting Process, they'll clean your clock.
Read the next article.
Imagine driving to work in an M1A1 Main Battle Tank, also called an Abrams tank. It has tracks instead of wheels, and it can go absolutely anywhere. It can roll over obstacles 42 inches high. It can cross trenches 9 feet wide. It can go up a 60 degree slope. And due to its almost impenetrable armor, its 120mm main gun, and three auxiliary machine guns, it can traverse the most hostile environments. An M1A1 main battle tank can go almost anywhere on land, including the freeway. So why use a car to go to work, when cars accommodate only a small subset of terrains?
One reason is that the M1A1 gets less than 1 mile per gallon of gas. Working only 10 miles from home, you'd pay $150/week in fuel alone. On those rare occasions when the freeway travels full speed, the M1A1's 45 mph maximum speed is a liability. At a length of 32 feet, one inch, a height of 9.5 feet, and a width of 9.5 feet, parking is a problem.
There's no doubt the M1A1 can get you to work. But your friends driving Chevy Luminas get there faster, cheaper and more conveniently. Yes, the M1A1 can go anywhere, but that ability is costly indeed.
Reminds me of companies selling general problem solving training to those requiring electronic, mechanical or computer Troubleshooting.
Mechanical, electronic and computer troubleshooting is a subset of problem solving. Machines and automated systems are well defined systems. By that I mean they have a documented and well defined state and behavior. Fixing them requires only returning them to their as-designed state and behavior. You needn't analyze the solved state, with its heavy design and creative thinking requirements. You needn't ask how you want the machine to perform after repair -- you already know that. It must perform as designed. You needn't ask if there's some better way you can do it. All that's necessary is to get it back to its as-designed state and behavior.
Some training vendors are all too happy to sell you a generic problem solving course for your technical people to use on machine/computer/software problems. Such generic problem solving methodologies contain several time consuming steps necessary only to design the solved state (which degenerates into the as designed state and behavior for machine, computer and software problems). The vendor might justify this by mentioning that the generic problem solving methodology can solve all problems, including those of machines, computers and software. They're telling the truth, and it's about as practical as trading in your car for an Abrams tank.
If you want to win, you go to war in a tank and to the office in a car. If you want to win, you fix technical problems with a Troubleshooting Process optimized for technical problems, and fuzzily defined problems with a generic problem solving methodology. If a person solves both types of problems, train him in both methodologies.
So the question you need to ask is this: How would it affect my business if my competitors used more optimized Troubleshooting methodologies than my company?
Except for the Universal Troubleshooting Process. The UTP's Step 1 (Get the Attitude) and Step 9 (Take Pride) are inward focused.
Outside of Star Trek and Isaac Asimov novels, I've never met a Troubleshooting robot. What I have met are technicians angered to the point of throwing equipment across the room, and computer managers panicked to the point of paralysis. Frequent failure results from methodologies that don't at least consider the person doing the fixing. None of us is a robot, and if we haven't learned to control our emotions, those emotions can wash away all logic.
But the Universal Troubleshooting Process isn't human-centric. Only two of its ten steps are; the majority are procedural. In machine/technology repair, all you need to achieve is the as-designed state and behavior. By contrast, many methodologies are almost entirely focused on the person doing the problem solving. I call them human-centric methodologies.
And that's exactly what's needed when the system under repair is the problem solver himself. In such cases, the problem solver is troubleshooting himself, surrounded by an environment over which he has limited control. Personal problems are typically triggered (I didn't say caused, I said triggered) by either changes in the environment around the person, or a "bad break".
Personal problem solving is complicated by several distinctions. Note: Such "self help" can be propagated by trainers training others in these methodologies. They can even be extrapolated to solution by consultants using human-centric principles to fix business problems, but this runs the risk of the "program of the month" label if not done in a way that respects employees' intelligence, time and motivation.
When it comes to insulting the intentions of the reader (or in this case listener), certainly radio's pop-psychologists corner the market. The listener calls in with a serious and complex problem, and the first thing the radio guy does is paint the listener as weak, stupid and unethical. Yeah, that's real productive!
A problem solving methodology is effective only if followed. It is followed by a reader only if the investment is justified by the expected return. The average reader has been unsuccessful with some human-centric methodologies in the past, and is thus skeptical, meaning the expected return is uncertain. Therefore, the book expounding an overly complex or demanding methodology is read but not followed. The methodology offering a quick and easy route to small benefits, followed by a staircase of additional investments with additional gains, is followed.
There are effective methods of "putting across" complex and difficult methodologies. One method is charging a lot of money, which makes it worth the student's while to devote a month to learning and practicing the methodology. This is accompanied by live lectures, one on one consultation, tapes and exercises. This will work with methods that, once fully learned, are truly beneficial. But it requires extraordinary faith in the program. And unfortunately, some are in it just for the money. As the old saying goes, once bitten twice shy.
Another component of benefiting even the skeptical reader is use of plain language. Methodologies defining their own terms, and especially those engaging in psychobabble, are much less credible to the intelligent person.
Purveyors of human-centric methodologies would be wise to study the merits of the Universal Troubleshooting Process, which itself is not human-centric. The UTP requires very little buy-in -- merely hanging the list of steps on your wall produces improvement. A single day of studying step 6 produces another vast improvement. And as the reader gains confidence, there's a continuing stepwise route to study and improvement. And the reader can pretty much determine the order and pace of the steps.
The documentation of the Universal Troubleshooting Process is devoid of jargon. Extra effort has been taken to use plain language to define and discuss its concepts.
Human-centric problem solving methodologies are optimized to solve personal problems, but they can be helpful solving other problems. Consider that many highly paid technical Troubleshooters are limited by their emotions, especially in the face of extremely difficult problems. All other things being equal, including training in technical Troubleshooting Process, the technical Troubleshooter who has mastered an effective human-centric problem solving methodology will be more effective solving technical problems.
There are many outstanding human-centric methodologies out there. Following are discussions of several of them:
#3 deals with use of time and prioritization. If you're a regular reader of Troubleshooting Professional, you know I believe in easiest-first prioritization. But that's WITHIN the prioritization methodology Covey discusses in #3:
              | Urgent                                      | Not Urgent
Important     | This is firefighting, and should be avoided | Spend your time here, so important stuff gets done before it's urgent
Not important | Waste of time                               | Waste of time
Blow off unimportant tasks. If a task doesn't help you reach your goal, and not doing it won't prevent you from reaching your goal, don't do it. Now of course, somebody else might consider it urgent that you do unimportant tasks. Try to work it out so you're not required to do them, because they're a waste of time. Or figure out a way to make them important in reaching your goal.
Note the similarity here to my admonition not to troubleshoot unprofitable work. This is discussed in the August 2000 issue of Troubleshooting Professional.
One might instinctively think important and urgent tasks are where we should spend our time. But in fact, that's not true. Tasks always take longer when they're urgent. Urgent tasks spawn the need for explanations, written reports and meetings. Many individuals you work with react to urgency with anger or panic, both of which lead to costly mistakes. And of course, if you go over the deadline and don't sell the product or get something to market, that's extremely costly.
So the object is to have all the important stuff done before it gets urgent, which is why once you've gotten your fires put out, spend all possible time on important tasks that are not yet urgent, and make every effort to prevent important tasks from becoming urgent.
Covey explains that the first three habits are done in isolation -- you don't need to collaborate to accomplish them. The next three are collaborative.
Thinking Win/Win usually promotes success, depending on the definition of success. We all have seen enough of the world to know it's not an absolute prerequisite for success. We all know of companies succeeding through playing dirty tricks on their competitors rather than making good product. But in the vast majority of cases, the win/lose crowd finally become losers themselves, either because the ethical emptiness of their lives leads to substance abuse or other problems, or because they run afoul of the law. Or both. The intelligent person knows that sometimes each of us must "go to war", but generally speaking, Win/Win is the best policy.
Most sales books I've read contain a paraphrase of habit 5, "Seek First to Understand... Then to be Understood". Sales books typically mention that you have two ears and one mouth, and should use them in proportion. In a technical Troubleshooting scenario, you'd never attempt a fix before knowing the root cause -- you need to understand first. But understanding is not always so easy. Psychology 101 teaches the principle that most people attribute their own actions to their situation, but attribute the actions of others to the others' personalities. I drive 80 mph on the freeway because I'm late to my wedding. The guy in the red car drives 80 mph because he's reckless. One cannot really deal with others until one understands their situation. You need to walk a mile in the other guy's shoes.
Covey's habit #6 is "Synergize". Work with a group in such a way that the whole becomes greater than the sum of the parts. I've found that if you put 10 people in a room they can accomplish just about anything, because there's almost no piece of knowledge not possessed by at least one. If the people practice the first 5 habits, #6 can be correctly accomplished. Once one can work synergistically with a group, he or she can accomplish amazing things. So what's left? Why is there a habit #7? For exactly the same reason there's a step 9 (Take Pride) in the Universal Troubleshooting Process.
Consider this: If you saw wood for a living, you'd surely sharpen your saw when it got dull. Otherwise, your productivity would plunge. Sharpening the saw is Covey's 7th habit.
People get dull after protracted periods of hard and effective work. They must be sharpened. Covey lists four categories of such sharpening -- physical (exercise, nutrition, sleep), mental, spiritual, and social/emotional. I'd like to add a fifth -- savoring triumph, which definitely encompasses mental and social/emotional. If done during a walk, skating, bike ride etc., it encompasses the physical. And you know what? I've had cases where savoring triumph approached a spiritual activity.
Covey's 7 habits are my favorite human-centric problem solving methodology. The book is common sense, without inordinate amounts of early 90's mission statement pabulum. And it's so simple that an average individual can read the book and begin to put it into practice. One thing I really like about the 7 habits is that they avoid the "believe in it, and it will come" trap, instead portraying faith as one necessity among many.
Schuller wrote this book to help the fearful and hopeless in that dreadful year. Above all, this book is exquisitely inspirational. Read just the first chapter to see how Schuller weaves the present (in 1982) economic disaster with his own hard and poor childhood, working back forward to the present, ending with what all but the most hardcore Atheist would call a gift from God. Even the Atheist would call it a 9.2 on the Richter scale of lucky breaks. Armed with the "anything is possible" feeling the first chapter bestows, Schuller lays down several sound principles, tips and techniques for snatching victory from the jaws of defeat.
Schuller gives some great marketing advice. His "how do you catch a marlin" discussion details the need for access to a market for your product. His "possibility thinking" discussion, with its 10-alternatives method, is the self-help equivalent of the "how can I narrow it down just one more time?" question in technology Troubleshooting.
In every chapter, Schuller has lists to be followed, and examples of people succeeding by following those lists.
This book is a must-read for anyone facing seemingly insoluble problems. After all, it was written for just such an audience.
The basic idea is that you're effective when you're in a good and creative mental state. Rather than waiting for external events to put you in that state, you can put yourself in that state and then reap the benefits. As Abraham Lincoln said, "a man is just about as happy as he decides to be". Of course, I understand Lincoln had a problem with depression :-)
I believe that the NLP techniques I've read in Anthony Robbins' books "Awaken the Giant Within" and "Unlimited Power" are excellent personal problem solving techniques that cannot be learned by skeptics, or even the unconvinced. They require a huge commitment, maybe a month or more, to practice and master the techniques. I see no "low hanging fruit" that can boost results in a day or two. Robbins includes many real-life exercises you need to do. If you don't do them, you'll get little from the books.
If you believe your problems can be helped by a better life outlook or a better use of your mental resources, and you're willing to invest a lot of time, I'd highly recommend getting these books and taking the significant time it takes to work through and master their exercises. I cannot recommend these books, or the methods they espouse, to those unwilling to make a substantial commitment to mastering their techniques.
Note: There's a Tony Robbins book called "Notes from a Friend: A Quick and Simple Guide to Taking Control of Your Life". It's based on "Awaken the Giant Within" and "Unlimited Power". It may be a way to ease yourself into these techniques without a big up-front time commitment.
To solve this type of problem, Keith Ellis recommends, in his article titled "Affirmations", saying something like this:
I choose to joyfully become a great Troubleshooter.

"I choose": Mr. Ellis recommends we always recognize that every change we make in our life is a choice.
"joyfully": Mr. Ellis recommends we maximize the power of every affirmation by injecting positive emotion, such as the adverb "joyfully".
"become a great Troubleshooter": This is the actual goal of the affirmation.
Once again, this solves only problems whose root cause is attitude. But it's powerful, because a single destructive attitude leads to tunnel vision. If your goal requires $100,000 to start, and you've got only $859 in the bank, it's just possible that an affirmation like this could break the conflict:
I choose to joyfully find inexpensive ways to quickly reach my stated goal.
The attention and interest steps are done during prospecting, with desire split between prospecting and sales calls. Conviction and action are done during an actual face to face meeting (usually). This meshes very nicely with the sales funnel, described later in this article.
After reading many books, and observing the successes and failures of myself and others, I've reached the conclusion that sales and marketing boil down to having Access to a Market for your Product. It's the matchup of Access, Market and Product that IMHO is a prerequisite for good sales. You don't sell your acting skills in Wisconsin or your farming skills in Los Angeles. Because access is hardest to obtain, choice of new products should always favor the markets to whom you already have access. You can slowly grow the boundaries of your market access area, but don't make a product outside it.
Some people think they don't have access to any market. Not true. All one needs to do is look around at his friends. That's access. What kinds of people does one get along with well? Some of us get along with the upper crust, and some of us have friends in low places. We sell to the people we get along with.
Almost everyone gets requests for advice and help. Many of these requests are for favors that don't involve money, but make no mistake, if people are asking you for advice and help, you have something to offer. The trick is to expand your access outside your immediate group, because it's often hard to charge a fee to friends. Sometimes you can even sell your knowledge in the form of books.
Before you go out and base your entire sales strategy on my ideas, keep in mind that I'm just a middle class guy, and with my products, if I were a great salesman I'd be a multi-millionaire. But I think once I get the execution down correctly...
He also concentrated on getting more business from other departments in businesses he sold to. For instance, he was in the Sears Tower so much that he was able to expand from labels to tags, selling them to many different departments.
Nope, he just made sure his quality was as good as the Japanese, and then buried the extra cost in the cost of the entire item. A $40.00 dress has a single label. A Japanese one might cost a penny, the American one two cents. That penny difference is 0.025% of the sales price. It's negligible, and Dad approached it that way. It was much easier to buy from him, and the cost was 0.025% of the dress. It's a no brainer.
It could be said that burying the cost in the big picture is practical only with very cheap items. But in fact it's simply easier to explain and understand with cheap items. If you program computers and charge $20.00 more per hour than the competition, but you can better demonstrate that you're likely to do it right, you're easy to buy from. Now multiply the $20.00 differential by 2500 hours for a one year project making a cost differential of $50,000. Now total the hardware and software costs of the project, training, administration, as well as the coding cost that is not differential. In many cases, your differential comes out to be peanuts. And for the icing on the cake, find some way you can save them more than that small percentage, making you cheaper.
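The arithmetic above is easy to verify. Here's a quick shell check of both figures, using the numbers from the examples (the dollar amounts are the text's illustrations, not real pricing data):

```shell
# Label example: a $0.01 price differential on a $40.00 dress, as a percentage:
awk 'BEGIN { printf "%.3f%%\n", 0.01 / 40.00 * 100 }'

# Programming example: a $20/hour differential over a 2500-hour project:
echo "$(( 20 * 2500 ))"
```

The first command prints 0.025%, the second 50000, matching the figures in the discussion.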
When Ralph complimented, he complimented the person. When he criticized, he criticized the action. I think he basically believed people are good.
Unfortunately, there's no way you can package Ralph's basic belief in the good of humanity. Few of us have that outlook. But to the extent possible, try to see your customers and prospects as cool people you might like to get friendly with some day. You'll be glad you did.
Not too shabby for a site with no IPO and no venture capitalists. But I had great marketing advice from Kassof.Com. Basically, Mark emphasizes seeking an audience you can realistically attract, and super-serving them. Being a technology and Open Source kind of guy, I super-served the technologist and Linux audience.
Kassof emphasizes that with so much noise in the media, you need to be a little outrageous to be heard. So I did. I told my true feelings about Microsoft. Many friends told me I was doing the wrong thing. They told me I'd be closing doors on myself. They told me I'd blow off 95% of my potential audience. And they were right.
But that remaining 5% loved Troubleshooters.Com, just as Mark predicts in his insights. One of his quotes is that "These days, the 'middle of the road' is a good place to get run over".
I've just scratched the surface here, but it would take 10,000 words to fully explain Kassof's marketing strategies, and I don't have the bandwidth. So go to Kassof.Com, and study both his "Research Insights" and his "Quotations of Chairman Mark". Links to both, as well as to his main page, are in the URL's section of this magazine.
You might also wonder what marketing has to do with sales. In a big company maybe very little. But in a small company, prospecting begins with marketing and transitions to sales as the prospects become more interested.
Don't let it throw you that Mr. Kassof's tips are targeted at the radio industry. They work on the web too. Believe me :-) And for any of you out there who are involved with radio, use him. My reading of his website tells me that if I needed radio research, I'd go to him, because he's independent and not affiliated with a radio chain or consultant, he's unbiased, and he knows radio.
This book makes the point that too many salespeople and organizations hammer the best few until that level is sucked dry, and then discover that their abandonment of the upper levels means they must wait months to refill it. So they hammer the upper levels and finally get an influx of best few. This explains certain extreme cyclical sales events.
In many other ways, this book takes a systemic and scientific view of sales. I recommend it.
First, be aware that my copy of "How to Master the Art of Selling" is the second edition, which appears to have been copyrighted in 1982. It's probable that Mr. Hopkins has changed the book a lot since then.
My second edition book gives many, many examples. That's a good thing. My wife's a Realtor(R), and I just mentioned an example from this book that directly applied to a situation she was in. Especially refreshing is that Mr. Hopkins hasn't gone overboard on "consultative selling". Tom isn't afraid to admit that he likes to close, and likes to close hard. And he's not afraid to imply that not all prospective customers are peers of Einstein.
Which brings up the problem. Many of the techniques he espouses in my Second Edition book are insulting to an intelligent person. Anybody trying Hopkins' "Ben Franklin Balance Sheet Close" or the "Porcupine Test Close" on me would find themselves out the door (without an order) in a hurry.
"How to Master the Art of Selling" is a must-read for anyone in sales. The author has obviously been there and documented his strategies and tactics. Just carefully temper your adoption of those suggestions that seem to insult the buyer's intelligence. And if you're contemplating a sales career, or new to sales, be sure to pick up a copy of Selling for Dummies".
There's a myth that consultative selling started in the 1980's. Check out this line from the 1949 edition:
"I resolved right then to dedicate the rest of my selling career to this principle: Finding out what people want, and helping them get it."
Don't worry though, Bettger's not just a consultatively correct wet noodle -- he also suggests the "sign by the X" assumptive close.
Unbelievably, this book is still in print, copyright 1992. It's available around the clock at Amazon, where it's ranked #7,966 and has a five star review average. Frank Bettger looks about 35 years old on the cover of my 1949 copy, which would have made him 78 in 1992. To me, this book's staying power makes it not only a classic, but a tried and true resource.
This book is obviously not a complete sales problem solving methodology, but serves beautifully in the function of example. I believe the seasoned salesperson would profit from this book, right along with the newbie.
In a world where even we salespeople start to believe in the plaid sportcoat car salesman and Arthur Miller's Willy Loman, Shook's deep love and admiration for his profession makes the reader stand a bit taller. If you EVER contemplate selling anything, get this book.
So from my understanding of Hammer and Champy's definition of reengineering, it's optimized for those cases where the business is so out of touch with present day realities as to be "totalled", like a car hitting a bridge at 120 mph or a house burned to the ground. If you'd like to "improve" your business while keeping its core advantages, my understanding is that reengineering is not the way to go.
Obviously, my understanding of the term is crude at best, so please don't take my word for it. Read the book for yourself, and see if I've been overly critical. I invite all more knowledgeable on reengineering to write letters to the editor (instructions near the bottom of this page) so that next month the truth about reengineering will surface.
I believe that one of these tools, Covey's "The 7 Habits of Highly Effective People", rises above the others to take its place among general problem solving methodologies. It's true you can't fix a car using his methodology, but you can sure use it to make yourself a better mechanic.
Category | TITLE | AUTHOR | Comments
Troubleshooting well defined systems | Troubleshooting Techniques of the Successful Technologist | Steve Litt | Troubleshooting methodology optimized for well defined systems, and numerous easy, cheap, and safe test points. Includes Bottleneck Analysis for problems of degree in well defined systems.
Tangible general problem solving | The Root Cause Analysis Handbook | Max Ammerman | Optimized for sparse intermittents, events, and extreme safety criticality. Can drill down root cause past components all the way to human factors.
Tangible general problem solving | The Goal | Eliyahu M. Goldratt | Optimized for problems of degree in both well defined and fuzzily defined systems.
Tangible general problem solving | The New Rational Manager | Charles H. Kepner and Benjamin B. Tregoe | Optimized for expensive, unsafe or difficult testpoints, as well as fuzzily defined problems.
Tangible/Personal problem solving | The 7 Habits of Highly Effective People | Stephen R. Covey | Optimized for personal performance of the Troubleshooter/Problem solver himself. A classic, very credible and helpful.
Personal problem solving | Tough Times Never Last, But Tough People Do | Robert H. Schuller | Optimized for personal performance. A very credible treatment of positive mental attitude, written for the sufferers of the 10% unemployment in the 1982 recession, but applicable to anyone facing personal challenges or wanting to rise beyond his/her comfort zone.
Personal problem solving | Awaken the Giant Within | Anthony Robbins | Optimized for personal performance. Very detailed and demanding, not for the faint of heart.
Sales problem solving | Strategic Selling | Miller and Heiman | In my opinion, extremely credible, process oriented, emphasizes the "sales funnel". Top notch.
Sales problem solving | Selling for Dummies | Tom Hopkins | In my opinion, this beginner's guide to sales is factual, and walks the middle path between wimpy "nonmanipulative" sales and aggressive sales tricks.
Sales problem solving | How to Master the Art of Selling | Tom Hopkins | In my opinion, contains many great ideas, but also contains many aggressive sales tricks that would infuriate an intelligent buyer.
Sales problem solving | How I Raised Myself from Failure to Success in Selling | Frank Bettger | In my opinion, this half-century old classic is good, common sense, with tips on sales techniques and also an emphasis that you must satisfy the customer.
Sales problem solving | The Greatest Sales Stories Ever Told: From the World's Best Salespeople | Robert L. Shook | In my opinion, this book's stories teach by example. Each story illustrates a sales technique or quality. A must-read.
According to Fred Nickols' website, he's a writer, consultant, and executive. A further perusal of his site reveals his credentials to be, putting it mildly, outstanding. Be sure to read Fred's articles and his distance consulting page. His articles on problem solving are insightful, and all too rare on the web. His URL is in the URL's section of this magazine.
Charles V. Camp is a Professor of Civil Engineering at the University of Memphis. His web notes for the CIVL 1101 course he teaches, Civil Engineering Measurements, contain some of the best problem solving content on the Internet. I'd suggest you read all the content related to problem solving. The URL is in the URL's section of this magazine.
For time reasons, I didn't detail Fred's and Charles' contributions to problem solving, but their content is outstanding and a must-see for any problem solver.
Imagine how much easier diagnosis of intermittents would have been with a continuously running strip chart recorder. I could have looked for the onset of the symptom, and seen what else happened around that time. I could have observed which voltages changed.
Enter the Linux logs, which act pretty much like strip chart recorders. By looking back on the logs, you can see not only the onset of the problem, but the entire encompassed syndrome. Here are some of the major logs used in Linux:
/var/log/messages | The granddaddy of all logs, this is the major repository of system messages. |
/var/log/boot.log | This log records messages issued during the boot process, and is handy for deducing the as-booted state of the computer, as well as getting hints as to what went wrong during bootup. |
/var/log/dmesg | Messages from the kernel ring buffer -- low level boot info |
/var/log/secure | Basic security messages including telnet logons. |
/var/log/cron | Log of running of cron jobs. |
/var/log/maillog | Mail log |
/var/log/netconf.log | A record of network events such as route changes and dhcpd |
/var/log/security.log | This log shows security concerns such as world-writable files, open and listening ports, changes in security, and the like. Pay special attention to the "diff check". This file is written by a daily cron job. |
/var/log/security/ | This directory contains intermediate files contributing to the maintenance of the security.log file. |
/var/log/xdm-error.log | Use this when troubleshooting your Xwindows video. |
/var/log/httpd/ | This directory contains logs pertaining to the web server. |
/var/log/samba/ | This directory contains the various logs for the Samba file and print server. |
These logs can be used for more than just after the fact detective work. Using the tail -f command, you can view messages in real time as you exercise the system. Better yet, most daemons have methods to increase or decrease the verbosity of what they write to logs, enabling you to "drill down" to execution details. When combined with access to the source code, this becomes a powerful Troubleshooting tool. To find how to change the log verbosity of a daemon, look at the man page of the daemon.
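Here's a sketch of the filtering technique. The sample log contents and the dhcpd messages are invented for illustration; in real life you'd read /var/log/messages itself:

```shell
# Create a tiny stand-in for /var/log/messages (contents invented):
printf '%s\n' \
  'Jun  3 10:00:01 mybox kernel: eth0: link up' \
  'Jun  3 10:00:05 mybox dhcpd: DHCPDISCOVER from 00:a0:c9:12:34:56' \
  'Jun  3 10:00:05 mybox dhcpd: no free leases' > sample.log

# After-the-fact detective work: pull only the lines from the suspect daemon:
grep -i 'dhcpd' sample.log

# Real-time observation while you exercise the system (Ctrl-C to stop):
#   tail -f /var/log/messages
```

Combining `tail -f` with a pipe through `grep` narrows the real-time view to just the daemon you're investigating.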
When troubleshooting Linux, always look at the relevant log files.
Enter source available software. Now you can peer into the actual design of the software. This can be done at several levels.
The simplest way to use the source is to crank up log levels, search for the source statement issuing a suspicious log message, and work backwards through the source until you come upon an interesting question. This can be done with minimal knowledge of the software.
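A sketch of that workflow, using an invented log message and a toy stand-in for a source tree (any real daemon's source tree works the same way):

```shell
# Toy stand-in for a real source tree (file name and message are invented):
mkdir -p toy-src
cat > toy-src/conn.c <<'EOF'
/* ...surrounding code elided... */
void refuse(void) { log_msg("refusing connection from untrusted host"); }
EOF

# Suppose the log (at high verbosity) showed "refusing connection from
# untrusted host". Find the statement that issues it:
grep -r -n "refusing connection" toy-src/
```

The `-n` flag gives you the file and line number, which is your starting point for working backwards through the source.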
As you gain a little more expertise, you can use the source to form a Mental Model of the software, and make deductions and form new tests based on that Mental Model.
Finally, you can actually change and recompile the source to test hypotheses. Although this is extremely powerful, care must be taken not to cause harm. Original source files must be backed up, the original executable files must be backed up, and it's best to do this on a non-live system, if one exists and if the symptom can be reproduced on it.
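A minimal sketch of the backup discipline described above (file names are hypothetical):

```shell
# Stand-in for the source file under investigation:
echo 'int main(void) { return 0; }' > mydaemon.c

# Preserve the original before adding diagnostics:
cp mydaemon.c mydaemon.c.orig

# ...edit mydaemon.c, recompile, run your test...

# Restore the as-designed state when finished:
cp mydaemon.c.orig mydaemon.c
```

The same pattern applies to the compiled executable: copy it aside before installing a test build, and copy it back when done.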
Source code needn't be thoroughly digested to be helpful in Troubleshooting. Often you can start in the top directory of the source distribution, and search for a term or phrase like this:

$ grep -i -r "searchterm" *

Such searches often yield just the right information within a few minutes.
General Maintenance is especially powerful against non-component flaws -- dirty contacts, loose bolts and other things not appearing on the schematic diagram. It is also extremely powerful against intermittents. But neither of these situations pops up frequently in Linux. As a rule, software has few "non components". Especially software whose source code is available. And as far as intermittents, Linux is a stable and robust operating system that seldom exhibits intermittents.
General Maintenance has one huge drawback: it prevents positive ID of the root cause. If the symptom goes away after you clean all switches and controls, you don't know which switch, you don't know which contact within the switch, and you don't know why the contact got dirty or oxidized. That's obviously taboo in a safety sensitive situation, but even in a business computing environment it's not desirable.
In fact, the system logs are often a more productive tool in diagnosing intermittents. And it's all too easy to conduct tests with the various software tools that come with Linux, in order to positively determine the root cause, and toggle it.
So except in cases of complete befuddlement, don't reboot a Linux box as a Troubleshooting step. It might not even be desirable to restart a service, although this is done often. Try to find the root cause without General Maintenance.
This isn't necessarily true of Linux. It's often better to upgrade, thereby keeping dns, dhcp, samba, apache and other configurations intact. Installing fresh means restoring each of those configurations (and many, many more) from documentation or (human) memory.
The downside is that on a Linux system that's gone through a long series of upgrades, the system state is not fully known. I recently heard of somebody who had continuously upgraded a mid-90's copy of Yggdrasil and still has it running today. Almost all the Yggdrasil code is gone, but over the continuum of time, it's the same system. Obviously, such a system could not be replicated from distribution.
When should you finally bite the bullet and install Linux fresh? I guess when the system becomes so unknown that you cannot troubleshoot it.
Remember to restart the appropriate daemons or services after changing a config file. My experience is that daemons recognize certain config file changes immediately, but need to restart in order to recognize them all. After changing a config file, you're in an undefined state until restarting the daemon or service.
If you do any Troubleshooting tests that compromise security, try to get the box off the network before doing them. Write them down so you can quickly undo them.
The bad news is that an owned Linux box is more dangerous than an owned Windows box. The good news is that a Linux box can be made much more secure than any Windows box.
A discussion of security is very much beyond the scope of this article, but there are many fine books and websites devoted to the subject. Unless you're running a single machine that never connects to the Internet or a LAN, you need to understand security.
By submitting content, you give Troubleshooters.Com the non-exclusive, perpetual right to publish it on Troubleshooters.Com or any A3B3 website. Other than that, you retain the copyright and sole right to sell or give it away elsewhere. Troubleshooters.Com will acknowledge you as the author and, if you request, will display your copyright notice and/or a "reprinted by permission of author" notice. Obviously, you must be the copyright holder and must be legally able to grant us this perpetual right. We do not currently pay for articles.
Troubleshooters.Com reserves the right to edit any submission for clarity or brevity. Any published article will include a two sentence description of the author, a hypertext link to his or her email, and a phone number if desired. Upon request, we will include a hypertext link, at the end of the magazine issue, to the author's website, providing that website meets the Troubleshooters.Com criteria for links and that the author's website first links to Troubleshooters.Com. Authors: please understand we can't place hyperlinks inside articles. If we did, only the first article would be read, and we can't place every article first.
Submissions should be emailed to Steve Litt's email address, with subject line Article Submission. The first paragraph of your message should read as follows (unless other arrangements are previously made in writing):