Copyright (C) 2005 by Steve Litt. All rights reserved.
Materials from guest authors copyrighted by them and licensed for
use to Troubleshooting Professional Magazine. All rights reserved to
copyright holder, except for items specifically marked otherwise
(free software source code, GNU/GPL, etc.). All material herein provided
User assumes all risk and responsibility for any outcome.
Volume 9 Issue
A Few Computer Repair
Where a calculator on the ENIAC is equipped with 18,000 vacuum
tubes and weighs 30 tons, computers of the future may have only
1,000 vacuum tubes and perhaps weigh 1 1/2 tons. -- Popular Mechanics, March 1949
By Steve Litt
Computers. Can't live with em, can't live without em.
One magazine cannot possibly serve as a general purpose repair manual,
but in this Troubleshooting Professional Magazine issue I'll offer you
a few tips that have saved me time and effort in the past.
This issue of Troubleshooting Professional Magazine is devoted to
choosing the right tool for the job. So kick back, relax, and enjoy the
read. And remember, if you're a Troubleshooter, this is your magazine.
Does it Count Memory?
By Steve Litt
The voice on the other end of the
line is strident: "My computer's broken. You've gotta fix it right
away. My report's due in 2 hours!"
Breathing deeply, you summon The
Attitude, and ask the granddaddy of all symptom description questions:
"What indicates to you that there's a malfunction?"
"It doesn't work -- aren't you
listening! You IT guys are all the same!"
You count to 5. This is nothing
personal -- it's just your job. "What do you see on the screen when you
turn the computer on?", you ask.
"Nothing! Would you listen!"
"Is your monitor turned on? Is the
monitor's power light on?", you ask.
"Are you calling me stupid?" the user screams. "Of course it's on --
otherwise I would have turned it on."
"I'll be right down there", you
assure him. You grab a 10 foot video extension cable, a video card and a
couple of RAM sticks and head for the user's desk.
In 2 minutes you completely eliminated software as a cause. Long before
the computer boots an operating system, the BIOS counts the memory
attached to the system. If you don't see the BIOS count memory, you
know for sure it's a hardware problem.
You boot his computer and reproduce the symptom. You look behind his
computer and verify that the monitor cable is correctly plugged into
the computer and that the monitor is powered up. Everything's properly
connected. Seeing another working computer in the cubicle, you use the
10 foot monitor extension cable to connect the nonworking computer to
the other monitor. The other monitor is black. You've now eliminated
the monitor from the root cause scope. You reconnect the original monitor.
Now it's time to investigate the computer itself. You open the user's
box and swap the video card. No change. You revert to his old video
card, and disconnect everything from the motherboard except the video
card, the RAM sticks, the wiring to the power button, and the power
supply connection to the motherboard.
If it fails to count memory now, the problem is in either the
motherboard, the CPU, the RAM, the power switch or the video card, and
you already eliminated the video card as the cause. In that case, you
could remove the power switch from contention by manually using a
screwdriver to short the switch's connectors on the motherboard. It
would then be pretty easy to swap out the RAM and if necessary, the
CPU, and if nothing changes, it's the motherboard.
If you're lucky enough to have a POST card you can eliminate many of
those steps and reach a conclusion faster. Most of us don't have POST cards.
However, in this particular case none of that is necessary, because
once everything but the video card, the RAM sticks, the wiring to the
power button, and the power supply connection to the motherboard were
disconnected, it began to count memory. Naturally, because no disks
were connected, it halted with a rather ominous sounding message about
a missing boot sector, but this is to be expected.
Now you start replacing things, a few at a time, until the symptom
recurs. Then you start disconnecting the last wave of connections one
at a time until you find the one component that toggles the symptom,
which in this case is the internal modem, which isn't even used. You
remove the modem, connect everything back up, and the user is up and
running, with plenty of time to finish his report. He apologizes to you
for his harsh words, explaining that he's had a tough day.
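The reconnect-in-waves procedure above can be sketched in a few lines of Python. This is only an illustration, not a tool from the article: the function name find_culprit, the wave_size parameter and the symptom_with callback (which stands in for "power up and see whether it counts memory") are all hypothetical. It also assumes a single faulty component.

```python
from typing import Callable, Optional, Sequence, Set


def find_culprit(components: Sequence[str],
                 symptom_with: Callable[[Set[str]], bool],
                 wave_size: int = 3) -> Optional[str]:
    """Reconnect components a few at a time until the symptom recurs,
    then pull the parts of the last wave one at a time."""
    connected: Set[str] = set()
    for start in range(0, len(components), wave_size):
        wave = list(components[start:start + wave_size])
        connected.update(wave)
        if symptom_with(connected):
            # The symptom came back, so the culprit is in this wave.
            for part in wave:
                # Disconnect just this part (the others stay connected).
                if not symptom_with(connected - {part}):
                    return part  # removing it makes the symptom go away
            return None  # no single part toggles the symptom
    return None  # everything reconnected and the symptom never recurred


# Hypothetical parts list; the "test" says the symptom appears
# whenever the modem is connected.
parts = ["hard disk", "CD drive", "floppy", "network card", "modem", "sound card"]
culprit = find_culprit(parts, lambda connected: "modem" in connected)
```

In real life symptom_with is you, a power button and a monitor, but the bookkeeping is the same: remember what was in the last wave, and test its members one at a time.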
The lowest common denominator in the boot process is the part where it
counts memory. If that doesn't happen, you need to get it to happen.
Note that some BIOS's have a feature whereby a "splash screen" comes up
instead of memory counting and the rest of the POST (Power On Self
Test) activities. There should be a way to turn off that splash screen.
Personally, I always disable the splash screen -- I don't think a few
lines of text offend the user's sensibilities, and if there's a
problem, you'll be very glad to see the prompt telling you how to
get to the BIOS configuration utility, along with all the other
relevant text.
Abbreviated Look at the Boot Process
By Steve Litt
This article presents a very abbreviated look at the boot process for a
commodity X86 computer or clone.
The computer has startup code built into its ROM BIOS chip. Upon startup,
the x86 processor jumps to the BIOS code, which typically starts at F0000 hex.
That code works directly on the CPU, memory and disks. It absolutely does not
have a kernel, operating system, or system libraries through which to work.
One of the first things this startup code does is give the
troubleshooter a chance to press a key (usually the Del key) in order
to change some data parameters stored in the ROM BIOS. You can change
which disks it tries to boot from and their order. You can change the
perceived disk geometry. Most of the rest of what you can change is
beyond the scope of this book.
The startup code then goes into its Power On Self Test (POST) which
does just what its name says -- it tests the hardware in the computer.
What is tested, and what errors are considered fatal, depends a
great deal on how you have configured the BIOS data parameters
(commonly called the BIOS setup or CMOS setup). Typically it first
checks the video card, then counts memory, then checks all hard drives
for their existence, then checks the floppy and CDROM. Note that many
monitors take a second or so to "warm up", so you might not see the
video card test, but by the time the computer counts memory you should see
Depending on the BIOS setup settings, the memory counting might simply
be a display of how much RAM is installed, or it might actually check
each RAM address, thus "counting up".
If everything checks out OK, the BIOS then tries to boot from a disk.
Which disks it tries to boot from, and the order in which they're
tried, is defined in the BIOS setup. For troubleshooting purposes, the
best sequence is the floppy first, then the CDROM drive, then the first
hard disk. This can be set in the BIOS setup screen, available by
pressing the Del key (or some other key) during POST.
The BIOS hands off to the first device it finds that is bootable.
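Here is a minimal Python sketch of that "first bootable device in the configured order" rule. It is a simulation, not BIOS code. The device names are made up, and the test for "bootable" is an assumption: it uses the common convention that a bootable first sector is 512 bytes ending in the bytes 55 AA hex.

```python
from typing import Dict, Optional, Sequence

BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a bootable first sector


def is_bootable(first_sector: Optional[bytes]) -> bool:
    """True if a first sector was read and it ends with the boot signature."""
    return (first_sector is not None
            and len(first_sector) == 512
            and first_sector[510:512] == BOOT_SIGNATURE)


def pick_boot_device(boot_order: Sequence[str],
                     first_sectors: Dict[str, Optional[bytes]]) -> Optional[str]:
    """Walk the boot order and return the first device that is bootable."""
    for name in boot_order:
        if is_bootable(first_sectors.get(name)):
            return name
    return None  # nothing bootable: the "missing boot sector" situation


# Hypothetical machine: empty floppy drive, non-bootable CD, bootable hard disk.
sectors = {
    "floppy": None,                         # no disk in the drive
    "cdrom": bytes(512),                    # readable, but no signature
    "hd0": bytes(510) + BOOT_SIGNATURE,     # bootable
}
chosen = pick_boot_device(["floppy", "cdrom", "hd0"], sectors)
```

With the troubleshooting-friendly order (floppy, CDROM, hard disk), the hard disk boots only when neither removable drive holds bootable media, which is exactly what you want when you carry a rescue disk.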
Booting from Disk
Did you ever wonder where booting got its name? It's short for
"bootstrap", which is a reference to the phrase "pulling yourself up by
your bootstraps". After you read this section you'll understand how
aptly that name applies.
When the BIOS hands off to a device, it specifically hands off to the
first sector of that device: Cylinder 0, Head 0, Sector 1. Actually it
pulls that sector into memory and starts executing it, but for the
purposes of this article you can just think of the BIOS jumping to the
code on the first sector of the device.
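That first sector is 512 bytes long. On a hard disk it's called the
Master Boot Record (MBR). Those bytes are distributed as follows:
446 bytes |Boot code (bootloader)
64 bytes |Partition table
2 bytes |Boot code signature (hard coded 55 AA hex)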
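To make the layout concrete, here is a hedged Python sketch that splits a 512 byte first sector into its parts. It assumes the conventional PC layout of 446 bytes of boot code, a 64 byte partition table made of four 16 byte entries, and a 2 byte signature of 55 AA hex. The function name split_mbr and the dictionary keys are made up for this example, and the fields inside each 16 byte entry come from the usual PC convention rather than from this article.

```python
import struct

# One 16-byte partition entry: status byte, 3 bytes CHS start (skipped),
# type byte, 3 bytes CHS end (skipped), starting sector, sector count.
ENTRY_FORMAT = "<B3xB3xII"


def split_mbr(sector: bytes) -> dict:
    """Split a 512-byte first sector into boot code, partitions, signature."""
    if len(sector) != 512:
        raise ValueError("a first sector must be exactly 512 bytes")
    table = sector[446:510]
    partitions = []
    for i in range(4):
        raw = table[i * 16:(i + 1) * 16]
        status, ptype, first_sector, count = struct.unpack(ENTRY_FORMAT, raw)
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append({
                "active": status == 0x80,
                "type": ptype,
                "first_sector": first_sector,
                "sector_count": count,
            })
    return {
        "boot_code": sector[:446],
        "partitions": partitions,
        "signature_ok": sector[510:512] == b"\x55\xaa",
    }


# Build a fake sector: empty boot code, one active partition, valid signature.
entry = struct.pack(ENTRY_FORMAT, 0x80, 0x83, 63, 1000)
fake = bytes(446) + entry + bytes(48) + b"\x55\xaa"
info = split_mbr(fake)
```

On a live Linux system you could feed it the first 512 bytes of a disk device, read as root, but a hand-built sector like the one above is a safer way to experiment.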
While we're at it, let's give the program on the MBR a name. It's
called a bootloader. Some famous bootloaders from the UNIX world are
LILO and grub. Windows has its own bootloader.
So the computer program on the MBR, commonly called the bootloader,
must do its magic in 446 bytes. Not Megabytes. Not Kilobytes. Bytes.
There is no operating system loaded yet, so there is no access to high
level structures like files. Or even medium level structures like
clusters (DOS, Windows) or inodes (Linux, Unix, BSD). Any disk access
must use the low level routines on the ROM BIOS (int 13H, to be
specific), which access the disk by Cylinder, Head and Sector (CHS, or a CHS
About all the bootloader can do in its 446 bytes is to look at the
partition table with which it shares the MBR, and jump to code
elsewhere. Depending on the capabilities of the bootloader in
question, that somewhere else could be sector 1, head 0 of the
first cylinder of a partition (that partition's "boot sector"), or it
could be a "file" that is in a known and static cylinder, head and
sector, that cylinder, head and sector being kept as part of the 446
bytes. Wherever it jumps, it's the job of that program to actually load
the kernel of the operating system. Sometimes there are two jumps.
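Since everything at this stage is addressed by cylinder, head and sector, it helps to see how a CHS triple maps to a position on the disk. The Python sketch below uses the standard conversion from CHS to a linear sector number (often called LBA, a term this article doesn't use). The geometry values of 16 heads and 63 sectors per track are assumptions for illustration only; real values come from the disk geometry the BIOS perceives.

```python
def chs_to_linear(cylinder: int, head: int, sector: int,
                  heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Convert a CHS address to a zero-based linear sector number.

    Cylinders and heads count from 0, but sectors count from 1,
    which is why the very first sector is Cylinder 0, Head 0, Sector 1.
    """
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range for this geometry")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range for this geometry")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


# Assumed geometry: 16 heads, 63 sectors per track.
first = chs_to_linear(0, 0, 1, 16, 63)        # the MBR
next_track = chs_to_linear(0, 1, 1, 16, 63)   # first sector under head 1
next_cyl = chs_to_linear(1, 0, 1, 16, 63)     # first sector of cylinder 1
```

The arithmetic also shows why a bootloader that stores a fixed CHS address breaks when the perceived geometry changes: the same triple lands on a different place on the disk.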
Because grub is the bootloader I know the best, let me describe what
happens with grub.
The grub program is actually two programs: stage1 and stage2. The first
446 bytes of stage1 are copied to the first 446 bytes of the MBR,
together with the CHS (cylinder, head, sector) address of stage2.
Then, when you boot from the disk containing that MBR, stage1 runs, and
passes control to stage2, which is on the disk at the CHS address
written to the MBR (heaven help you if somebody moved stage2). Then
stage2 runs. stage2 has code built in which can understand some
filesystems, including those used by Linux, BSD and Unix. stage2 reads
a file called menu.lst and finds the kernel and other things based on
that.
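To give a feel for what stage2 gets out of menu.lst, here is a small Python sketch that parses a sample file into boot entries. Both the sample contents (the title, the (hd0,0) device, the kernel and initrd paths) and the parse_menu_lst function are made up for illustration. The parser is deliberately simplified and is not how grub itself reads the file.

```python
SAMPLE_MENU_LST = """\
# Hypothetical menu.lst for illustration
default 0
timeout 5

title Linux
root (hd0,0)
kernel /vmlinuz root=/dev/hda1 ro
initrd /initrd.img
"""


def parse_menu_lst(text: str) -> list:
    """Return one dict per 'title' entry, mapping each command to its argument."""
    entries = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        command = parts[0]
        argument = parts[1] if len(parts) > 1 else ""
        if command == "title":
            current = {"title": argument}  # a title line starts a new entry
            entries.append(current)
        elif current is not None:
            current[command] = argument    # kernel, root, initrd, ...
        # Lines before the first title (default, timeout) are ignored here.
    return entries


entries = parse_menu_lst(SAMPLE_MENU_LST)
```

The kernel line is the interesting one: it names the kernel by filesystem path, which stage2 can follow only because it understands the filesystem.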
Other bootloaders can't read filesystems, which is why addresses of all
relevant files must either be included in the MBR or in the map file
pointed to by the MBR.
Once the kernel is running, the last step is the operating system boot.
Linux runs the init
program to boot. Windows has something similar.
Using Knoppix as a Diagnostic Tool
By Steve Litt
Knoppix is a Linux distribution on a bootable CD. What makes it really
When you have a question like "is it hardware or software", you can
boot Knoppix to get a completely different operating system. If the
problem still occurs, it's probably hardware. The exception is if
Knoppix happens not to support the hardware in question. Many hardware
manufacturers do not write Linux drivers, so the Linux community writes
them, and they are placed on the Knoppix CD. But sometimes a piece of
hardware hasn't been out long enough for the Linux community to reverse
engineer it, or sometimes the hardware is so esoteric that nobody's
reverse engineered it. In that case, if the Windows symptom is "can't
recognize the Esoterica IIID video card", the Knoppix symptom could be
the same, even though the real problem was a Windows configuration. But
in most cases, Knoppix is a good way of answering the question "is it
hardware or software?"
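The reasoning above boils down to a three-way decision, sketched here in Python. The function name and its two yes/no inputs are hypothetical, and the answers are deliberately worded as "probably", because a Knoppix boot narrows the search rather than proving anything.

```python
def hardware_or_software(symptom_under_knoppix: bool,
                         knoppix_supports_hardware: bool) -> str:
    """Interpret the result of booting Knoppix on a misbehaving machine."""
    if not symptom_under_knoppix:
        # The hardware works under a completely different operating
        # system, so suspect the installed OS or its configuration.
        return "probably software"
    if not knoppix_supports_hardware:
        # Knoppix may be failing for its own reasons (no driver),
        # so the test says nothing about the original problem.
        return "inconclusive"
    return "probably hardware"


# Example: the sound card is silent in Windows and also silent in
# Knoppix, and the card is one Knoppix is known to support.
verdict = hardware_or_software(symptom_under_knoppix=True,
                               knoppix_supports_hardware=True)
```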
- With its superior hardware detection, it often runs hardware
other operating systems can't.
- It mounts all available Linux, Unix and Windows partitions, so
it's easy to investigate every partition on every hard drive.
- If operated normally, it does not write to any hard disks, making
its tests non-destructive.
- It's a good way to boot if you can't boot
Your sound card stopped working. Is it hardware, or software? Assuming
that sound card is supported by Knoppix (and that's usually a good
assumption), just boot Knoppix and see whether the sound card works in
Knoppix. The same can be done with video cards and most peripherals.
Have you ever had Windows stop working, but you haven't backed up? You
might be in luck. Try booting Knoppix and mounting the Windows
partition(s) read-only. Then transfer the data to another machine via
If Windows won't boot, sometimes you can find the Windows partition
with Knoppix, and then use grub
Internet access just took a dump on your Windows machine, and in order
to restore internet access you must download a file from the Internet.
No problem. Boot Linux. Mount a partition read/write, and download the
Have you ever noticed that installing certain network cards on a
Windows computer is like pulling teeth? To determine whether the
problem is software or hardware, boot Knoppix on the computer and see
whether you can see the network card and the LAN. Knoppix has the best
network card detection around. If it's software, you can redouble your
efforts to configure Windows for the card. If it's hardware, you
needn't waste your time.
This just scratches the surface. Every computer professional, whether
they use Linux or not, should have a Knoppix CD in their toolbox.
By Steve Litt
Dirty or corroded electronic contacts are a frequent cause of computer
problems, especially intermittent problems. By using a good electronic
contact cleaner/lubricant you can prevent many such problems from ever
happening. These days I use Lube Job Electronics Lubricant, available
from www.blowoff.com. Every time I reinsert a daughtercard or RAM stick,
every time I reconnect an IDE or floppy cable, every time I connect a
mouse, keyboard or network cable, I carefully apply Lube Job. Since
I started lubricating electronic contacts, I've found a noticeable decrease in
computer problems, especially intermittents.
Electronic contact lubrication isn't perfect. You must make sure not to
spread it beyond the metal contact (or the plastic housing those
contacts). It's possible that some electronic lubricants have some
degree of conductivity, which could alter the functioning of your
computer. It's also possible that some lubricants could hurt some
plastics used in a computer. All I can say is I've been very happy with
the performance of Lube Job in my fleet of computers.
Letters to the Editor
All letters become the property of the publisher (Steve Litt), and may
be edited for clarity or brevity. We especially welcome additions,
corrections or flames from vendors whose products have been reviewed in this
magazine. We reserve the right to not publish letters we deem in bad taste
(bad language, obscenity, hate, lewd, violence, etc.).
Submit letters to the editor to Steve Litt's email address, and be sure
the subject reads "Letter to the Editor". We regret that we cannot return
your letter, so please make a copy of it for future reference.
How to Submit an Article
We anticipate two to five articles per issue, with issues coming out
We look for articles that pertain to the Troubleshooting Process, or
on tools, equipment or systems with a Troubleshooting slant. This can
done as an essay, with humor, with a case study, or some other literary
A Troubleshooting poem would be nice. Submissions may mention a
but must be useful without the purchase of that product. Content must
overpower advertising. Submissions should be between 250 and 2000 words
Any article submitted to Troubleshooting Professional Magazine must
licensed with the Open Publication License, which you can view at
At your option you may elect the option to prohibit substantive
However, in order to publish your article in Troubleshooting
Magazine, you must decline the option to prohibit commercial use,
Troubleshooting Professional Magazine is a commercial publication.
Obviously, you must be the copyright holder and must be legally able
so license the article. We do not currently pay for articles.
Troubleshooters.Com reserves the right to edit any submission for
or brevity, within the scope of the Open Publication License. If you
to prohibit substantive modifications, we may elect to place editors
outside of your material, or reject the submission, or send it back for
Any published article will include a two sentence description of the
a hypertext link to his or her email, and a phone number if desired.
request, we will include a hypertext link, at the end of the magazine
to the author's website, providing that website meets the
criteria for links and that the
website first links to Troubleshooters.Com. Authors: please understand
can't place hyperlinks inside articles. If we did, only the first
would be read, and we can't place every article first.
Submissions should be emailed to Steve Litt's email address, with
line Article Submission. The first paragraph of your message should
as follows (unless other arrangements are previously made in writing):
Copyright (c) 2001 by <your name>. This
may be distributed only subject to the terms and conditions set forth
the Open Publication License, version Draft v1.0, 8 June 1999
at http://www.troubleshooters.com/openpub04.txt/ (wordwrapped for
at http://www.troubleshooters.com/openpub04_wrapped.txt). The latest
is presently available at http://www.opencontent.org/openpub/).
Open Publication License Option A [ is | is not]
so this document [may | may not] be modified. Option B is not elected,
this material may be published for commercial purposes.
After that paragraph, write the title, text of the article, and a
sentence description of the author.
Why not Draft v1.0, 8 June 1999 OR LATER
The Open Publication License recommends using the word "or later" to
the version of the license. That is unacceptable for Troubleshooting
Magazine because we do not know the provisions of that newer version,
it makes no sense to commit to it. We all hope later versions will be
but there's always a chance that leadership will change. We cannot take
chance that the disclaimer of warranty will be dropped in a later
All trademarks are the property of their respective owners.
(R) is a registered trademark of Steve Litt.