Troubleshooters.Com and Steve Litt's HR Tips Present

Problem Solving

Copyright (C) 2007 by Steve Litt



As a Human Resources professional, you might be brought into the loop if there's a perception that the employees "can't solve problems". Or perhaps you'll be involved in some other sort of complaint that, to you, looks like employees' inability to solve problems. Either way, it's a complex subject, so your response should be thoughtful and measured. This document provides an overview of problem solving, and in doing so, perhaps gives you the insight needed to research and formulate an answer to the "can't solve problems" problem, from an organizational and Human Resources viewpoint.

Problem solving is the most basic and most important human mental activity. We all want to hire excellent problem solvers. We all want to improve our employees' problem solving ability. But how?

The best problem solvers solve problems with a process. Once your employees learn the process, their problem solving productivity will skyrocket. How can the process be taught to them?

Here's where things get challenging. There are many categories of troubleshooting, and many problem solving processes. Each problem solving process (methodology) is optimized for a specific problem solving category. Sometimes a process can be used on a problem for which it's not optimized, but if that's done, it's likely to be slow going. It's always best to approach each problem with a process optimized for that category of problems.

Here's a very partial list of problem solving categories, along with their most famous problem solving methodologies:

Problem                        Optimized Methodology
Math problems                  Algebra, geometry, calculus, etc.
Science problems               Scientific method
Technical troubleshooting      Universal Troubleshooting Process
Factory throughput problems    Theory of Constraints
Business decision problems     Methods of Kepner and Tregoe
Safety events                  Root Cause Analysis

When training employees to solve problems, the key is to train them in methodologies optimized for the category of problem they will be solving. If they'll be solving more than one class of problems, train them in methodologies optimized for each problem class, rather than trying to shoehorn several problem classes into a "one size fits all" methodology.

Who's This Steve Litt Guy?

I created, teach and market the Universal Troubleshooting Process (UTP for short). This process is highly optimized for quick, economical and accurate solutions of well defined systems with abundant test points. In other words, machines, electronics, computerized systems, software and the like.

Early in my career I tried to market the UTP for solving ALL problems, but quickly found out that it doesn't work well for things like business and relationship problems. Meanwhile, "competitors" with problem solving methodologies optimized for vastly different problem categories tried to sell my customers their methodologies for technical troubleshooting, and I had to explain why those non-optimized methodologies would have been costly, even if the training had been free.

Besides being an authority on the Universal Troubleshooting Process, I also have a good working knowledge of the Theory of Constraints and Root Cause Analysis.

The Most General Problem Solving Process

The most general problem solving process involves three steps:
  1. Analyze the problem state
  2. Analyze the solved state
  3. Analyze how to make the transition
In other words:
  1. What don't I like about the current situation?
  2. What would I like the situation to be instead?
  3. How can I go from what is to what I want?
This description is too general to solve real world problems, but it gives you the basic structure of most problem solving.

Universal Troubleshooting Process

This is what you use to fix machines, electronics, computerized systems and software. It's optimized for well defined systems with abundant, easy, safe and cheap test points. One of the ways it's optimized is that the solved state is simply "get it to work as designed". There's no need to evaluate alternate solutions, no need to manage expectations. Making the transition is as simple as replacing the bad component.

The Universal Troubleshooting Process incorporates the fact that steps 2 and 3 of the most generic problem solving methodology are obvious in technical troubleshooting. That's one of the many reasons it's much more efficient at technical troubleshooting than more generic problem solving methods.

I've written about it at

Theory of Constraints

The Theory of Constraints (TOC) is what you use to identify and correct bottlenecks. It's most often associated with factory floors, but it can be used for any problem in which the entire system is slowed by a bottleneck. The authoritative text on the TOC is "The Goal" by Eliyahu M. Goldratt and Jeff Cox, ISBN-13 # 978-0884270614.

I'm not going to summarize the book in this article, but here are a few concepts from the book:
That last point seems counterintuitive so let me explain. A factory could theoretically run at 100% capacity if every single component of the factory ran at 100% all the time. However, the first time a bad part is created, or an operator comes back from lunch late, or an extra-large order is received, every factory machine plays the "hurry up and wait" game, resulting in grossly compromised production. This is explained quite well, in the book, by "the match game". Additionally, if non-bottlenecks are run at 100%, the factory drowns in work-in-process inventory. "Full capacity" and "100% efficiency" are myths.
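
To make the work-in-process point concrete, here's a minimal Python sketch of a three-station line. The station capacities, the eight-hour shift and the function name simulate_line are made-up assumptions for illustration, not figures from "The Goal". It compares releasing work at the first station's full speed against releasing only what the slowest station can absorb.

```python
def simulate_line(capacities, hours, run_flat_out):
    """Simulate a serial production line, one step per hour.

    capacities: units per hour each station can process, in line order
                (hypothetical numbers, not taken from the book).
    run_flat_out: if True, the first station releases work at its own full
                  capacity; if False, it releases only what the slowest
                  station (the bottleneck) can absorb.
    Returns (finished_units, work_in_process_units).
    """
    bottleneck = min(capacities)
    release = capacities[0] if run_flat_out else bottleneck
    queues = [0] * len(capacities)  # units waiting in front of each station
    finished = 0
    for _ in range(hours):
        queues[0] += release
        for i, cap in enumerate(capacities):
            done = min(queues[i], cap)  # a station can't exceed its capacity
            queues[i] -= done
            if i + 1 < len(capacities):
                queues[i + 1] += done   # hand the work to the next station
            else:
                finished += done        # last station ships the unit
    return finished, sum(queues)


# Assumed line: station 2 (4 units/hour) is the bottleneck.
capacities = [10, 4, 8]
print(simulate_line(capacities, 8, run_flat_out=True))   # (32, 48)
print(simulate_line(capacities, 8, run_flat_out=False))  # (32, 0)
```

Under these assumptions both runs ship the same 32 units, because the slowest station sets the pace either way. Running the first station flat out only adds 48 units of unfinished work piled up in front of the bottleneck.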

Root Cause Analysis

Imagine this situation: An employee backs up your company's commercial website, and the website immediately goes down, stranding a hundred thousand customers. What do you do?

Obviously you get the website back up, but that's the tip of the iceberg. You need to find out WHY the website went down so that you can prevent future occurrence. It's likely you'll use Root Cause Analysis.

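Root Cause Analysis is backward looking detective work. It looks something like this:
  1. Task analysis
  2. Change analysis
  3. Control barrier analysis
  4. Event causal factor chart
  5. Interviews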
Task analysis is just what it means -- analyze the task that failed (backup). What's involved with backup?

Change analysis involves analyzing the difference between the way the task was performed when the problem occurred, and the way it's normally performed.

Control barrier analysis is the analysis of control barriers. A control barrier is a component or procedure in place to prevent a bad outcome. Modern presses have safety mechanisms that shut off if a person's hand is in the jaws. That's a control barrier. Your company might have a policy that backups are not done until the webserver is failed over to the alternate server. That's a control barrier. Theoretically, if a system had all necessary control barriers and all those control barriers worked as designed, there would be no mishaps. In performing a control barrier analysis you might find a control barrier missing, or one that was ineffective.

An event causal factor chart is technical and beyond the scope of this little article.

The next step is to conduct interviews to find out what really happened that day. This reduces the possibility of proceeding on false assumptions.

Armed with the task analysis, change analysis, control barrier analysis, event causal factor chart and interviews, you're in an excellent position to find out what went wrong, and fix things so it doesn't happen again. Perhaps in this case, the person performing the backup was inadequately trained, so he didn't fail over the website to the alternate server before performing the backup, resulting in a crash of the web server software. He didn't fail over to the alternate server because he wasn't trained to do so. He wasn't trained to do so because full training is scheduled only four times a year, so his training was only unofficial training from his supervisor, and his supervisor forgot to tell him to switch servers before backup.

The root cause is that the system failed to train him to fail over. You can think of that as the four-times-a-year training control barrier failing, or you can think of it as a missing control barrier for new hires hired between trainings. To fix the root cause and prevent future occurrence, you could either fix the existing control barrier by specifying that any new employee get full official training immediately upon hire, and not operate the servers until trained, or you could install a new control barrier by creating an official training guide and checklist for the supervisor to provide immediate training.
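
The chain of "because" statements in the website example can be written down and walked mechanically. Here's a minimal Python sketch, assuming each event has a single recorded cause. The dictionary layout and the name trace_root_cause are hypothetical conveniences, not part of any formal Root Cause Analysis toolkit, and real incidents often have several contributing causes per event.

```python
def trace_root_cause(event, caused_by):
    """Follow 'X happened because Y' links until no further cause is recorded.

    caused_by maps each event to the single cause found for it (an assumed,
    simplified structure). Returns the chain from the event to the root cause.
    """
    chain = [event]
    # Stop when the last item has no recorded cause, or when a link would
    # loop back onto something already in the chain.
    while chain[-1] in caused_by and caused_by[chain[-1]] not in chain:
        chain.append(caused_by[chain[-1]])
    return chain


# Findings from the website example, paraphrased from the text.
caused_by = {
    "web server software crashed":
        "backup ran without failing over to the alternate server",
    "backup ran without failing over to the alternate server":
        "operator was not trained to fail over first",
    "operator was not trained to fail over first":
        "full training is held only four times a year",
}

chain = trace_root_cause("web server software crashed", caused_by)
for depth, step in enumerate(chain):
    print("  " * depth + step)
```

The last item in the chain is the candidate root cause. Whether it really is the root is still a judgment call that the analyses and interviews have to support.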

Method of Kepner and Tregoe

A business decision support methodology, this is one of the most generic problem solving methodologies around. From a high level it looks like this:
  1. Analyze the problem state
  2. Analyze the solved state
  3. Analyze the transition
  4. Analyze how to prevent future problems and find future opportunities
The best known technique of the Method of Kepner and Tregoe is its "is/isn't analysis", used in problem state analysis, where questions like these are asked and answered:
Basically, the problem solver constructs a substantial series of such questions, using the 6 W's as a starting point:
If you don't have abundant easy and safe test points, this is a very powerful way of narrowing down a problem. If you have abundant easy and safe test points, diagnosis by diagnostic test is quicker, more certain and more accurate. That's why this methodology is suboptimal for fixing most machines, electronics, data systems and software.

With the methodology of Kepner and Tregoe, once you've analyzed the problem domain you analyze the solved domain, using a prescribed set of metrics. Solved state features are divided into wants and needs. Any proposed solution not addressing every need is thrown out. The wants are computed using metrics, and the list of possible solutions is sorted by numerical value. As stated before, this is not necessary in fixing machines, electronics, data systems and software, because the one and only solved state is "performs as designed", and the one and only transition path is "repair or replace the defective component".
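
The needs-then-wants procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text says the wants are "computed using metrics" without giving the formula, so the weighted sum used here (weight times rating, added up) is an assumed scoring rule, and the options, needs, weights and the name rank_solutions are all hypothetical.

```python
def rank_solutions(solutions, needs, want_weights):
    """Throw out any solution that misses a need, then rank the rest.

    solutions: list of dicts with "name", "meets" (set of needs satisfied)
               and "ratings" (want -> rating on a 0-10 scale).
    needs: set of must-have features.
    want_weights: want -> importance weight (assumed weighted-sum scoring).
    Returns (score, name) pairs, best first.
    """
    viable = [s for s in solutions if needs <= s["meets"]]

    def score(solution):
        return sum(weight * solution["ratings"].get(want, 0)
                   for want, weight in want_weights.items())

    return sorted(((score(s), s["name"]) for s in viable), reverse=True)


# Hypothetical decision: three proposals, two needs, two weighted wants.
needs = {"within budget", "supports failover"}
want_weights = {"easy to learn": 3, "fast rollout": 2}
solutions = [
    {"name": "Option A", "meets": {"within budget", "supports failover"},
     "ratings": {"easy to learn": 8, "fast rollout": 5}},
    {"name": "Option B", "meets": {"within budget"},
     "ratings": {"easy to learn": 10, "fast rollout": 10}},
    {"name": "Option C", "meets": {"within budget", "supports failover"},
     "ratings": {"easy to learn": 6, "fast rollout": 9}},
]

ranked = rank_solutions(solutions, needs, want_weights)
print(ranked)  # [(36, 'Option C'), (34, 'Option A')]
```

Option B has the best ratings on the wants, yet it never reaches the scoring step because it misses a need. That two-stage filter is the part taken from the description above. Everything numeric is made up.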

The methodology of Kepner and Tregoe is great for decision analysis. It can be well suited to redesigning machines, electronics, data systems and software because redesign is really a business decision and requires analysis of all states. However, for fixing broken machines, electronics, data systems and software, this methodology is horribly suboptimal. Unfortunately, it is sometimes sold as a troubleshooting tool for broken machines, electronics, data systems and software, perhaps on the theory that you can teach one process to help with both mechanical/electrical/data problems and business problems. Don't fall for that. Teach the method of Kepner and Tregoe for business decisions, the Universal Troubleshooting Process for troubleshooting, and teach both to anyone performing both types of tasks.

Cars and Tanks

By Steve Litt, reprinted with permission from the December 2000 Troubleshooting Professional Magazine.
Rush hour is frustrating. Ever wish you could drive a tank to work?

Imagine driving to work in an M1A1 Main Battle Tank, also called an Abrams tank. It has tracks instead of wheels, and it can go absolutely anywhere. It can roll over obstacles 42 inches high. It can cross trenches 9 feet wide. It can go up a 60 degree slope. And due to its almost impenetrable armor, its 120mm main gun, and three auxiliary machine guns, it can traverse the most hostile environments. An M1A1 main battle tank can go almost anywhere on land, including the freeway. So why use a car to go to work, when cars accommodate only a small subset of terrains?

One reason is that the M1A1 gets less than 1 mile per gallon of gas. Working only 10 miles from home, you'd pay $300/week in fuel alone. On those rare occasions when the freeway travels full speed, the M1A1's 45 mph maximum speed is a liability. At a length of 32 feet, one inch, a height of 9.5 feet, and a width of 9.5 feet, parking is a problem.
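
The $300 figure isn't broken down in the text, but it can be reconstructed with a short Python calculation. Two inputs are assumptions chosen for illustration: five round trips per week, and a fuel price of $3.00 per gallon. The 1 mile per gallon figure is the upper bound stated above ("less than 1 mile per gallon"), so the real cost would be somewhat higher. The name weekly_fuel_cost is hypothetical.

```python
def weekly_fuel_cost(miles_one_way, round_trips_per_week,
                     miles_per_gallon, price_per_gallon):
    """Fuel cost of commuting for one week, in dollars."""
    weekly_miles = 2 * miles_one_way * round_trips_per_week
    weekly_gallons = weekly_miles / miles_per_gallon
    return weekly_gallons * price_per_gallon


cost = weekly_fuel_cost(
    miles_one_way=10,         # from the text: 10 miles from home
    round_trips_per_week=5,   # assumption: a five-day work week
    miles_per_gallon=1.0,     # from the text, as an upper bound
    price_per_gallon=3.00,    # assumption: price per gallon
)
print(f"${cost:.2f} per week")  # $300.00 per week
```

With these inputs the commute is 100 miles and 100 gallons a week, which lands on the $300 in the text.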

There's no doubt the M1A1 can get you to work. But your friends driving Ford Focuses get there faster, cheaper and more conveniently. Yes, the M1A1 can go anywhere, but that ability is costly indeed.

Reminds me of companies selling general problem solving training to those requiring electronic, mechanical or computer Troubleshooting.

Mechanical, electronic and computer troubleshooting is a subset of problem solving. Machines and automated systems are well defined systems. By that I mean they have a documented and well defined state and behavior. Fixing them requires only returning them to their as-designed state and behavior. You needn't analyze the solved state, with its heavy design and creative thinking requirements. You needn't ask how you want the machine to perform after repair -- you already know that. It must perform as designed. You needn't ask if there's some better way you can do it. All that's necessary is to get it back to its as-designed state and behavior.

Some training vendors are all too happy to sell you a generic problem solving course for your technical people to use on machine/computer/software problems. Such generic problem solving methodologies contain several time consuming steps necessary only to design the solved state (which degenerates into the as-designed state and behavior for machine, computer and software problems). The vendor might justify this by mentioning that the generic problem solving methodology can solve all problems, including those of machines, computers and software. They're telling the truth, and it's about as practical as trading in your car for an Abrams tank.

If you want to win, you go to war in a tank and to the office in a car. If you want to win, you fix fuzzily defined problems with a generic problem solving methodology, and technical problems with a Troubleshooting Process optimized for technical problems. If a person solves both types of problems, train him in both methodologies.

So the question you need to ask is this: How would it affect my business if my competitors used more optimized Troubleshooting methodologies than my company?

Deciding on Training

If you're partially responsible for decisions on who gets what training, you have a tough job. This document hasn't told you how to make such decisions, but perhaps it has given you a high level framework to use when working with others on training decisions.



The information in this document is presented "as is", without warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the quality and performance of the information is with you. Should this information prove defective, you assume the cost of all necessary servicing, repair, legal costs, negotiations with insurance companies or others, correction or medical care.

In no event unless required by applicable law or agreed to in writing will the copyright holder, authors, or any other party who may modify and/or redistribute the information, be liable to you for damages, including any general, special, incidental or consequential damages or personal injury arising out of the use or inability to use the information, even if such holder or other party has been advised of the possibility of such damages.

If this is not acceptable to you, you may not read this information.
