Troubleshooters.Com Presents

Troubleshooting Professional Magazine

Volume 4 Issue 7, July 2000
Making it in a Post Microsoft World, Part II
Troubleshooting Linux
Copyright (C) 2000 by Steve Litt. All rights reserved. Materials from guest authors copyrighted by them and licensed for perpetual use to Troubleshooting Professional Magazine. All rights reserved to the copyright holder, except for items specifically marked otherwise (certain free software source code, GNU/GPL, etc.). All material herein provided "As-Is". User assumes all risk and responsibility for any outcome.



Editor's Desk

By Steve Litt
Remember the Troubleshooting Process you used for Windows? It probably went something like this:
  1. Get the Symptom Description
  2. Reboot
  3. If that didn't work, reinstall the app
  4. If that didn't work, delete and redetect device drivers
  5. If that didn't work, use the "Windows Troubleshooter" to figure out conflicts
  6. If that didn't work, wipe your disk clean and install Windows and apps from scratch
  7. If that didn't work, live with it
The preceding is a valid Troubleshooting Process for a massively intermittent, undocumented system. It was pure general maintenance (and a little superstition). And it was fairly easy -- mostly manual labor of installation.

Repairing Windows was similar to repairing those ancient vacuum tube radios. With tube radios, you removed all the tubes, went to the drugstore, and tested them on a tube tester. If any were bad you replaced them and tried again. If you had a tube radio with all good tubes that still didn't work, you threw it away. The wiring was a mess, the components were low quality, and modularity was nonexistent.

Tube radios are a relic of history, and Windows isn't far behind.

Now comes the new operating system, Linux. Windows was like a spaghetti wired tube radio, and Linux is like modern electronic equipment sporting removable circuit boards. It's modular, well laid out, and very well featured. Every adjustment (configuration change) can be made with a text editor, although there are tools to automate configuration changes.

Lacking the intermittence and non-documentation of Windows, Linux requires a different Troubleshooting Process, which might look something like this:

  1. Get the Attitude
  2. Get a complete and accurate symptom description
  3. Make damage control plan
  4. Reproduce the symptom
  5. Do the appropriate general maintenance
  6. Narrow it down to the root cause
  7. Repair or replace the defective component
  8. Test
  9. Take pride in your solution
  10. Prevent future occurrence of this problem
Yes, unlike Windows, Linux is fixed with the same Troubleshooting Process you use on all reproducible problems on well defined systems. This process bestows both a benefit and a responsibility. The benefit is that on reproducibles, it's a mathematical certainty that using this process you will solve the problem, and that most problems will be solved in much less time than a reinstall takes. The responsibility is the need for some system expertise, Troubleshooting Process competency, and a little deductive reasoning.

It's precisely this responsibility that causes those from the Windows retroculture to say "Linux is harder to work with". They're absolutely right. I could "troubleshoot" tube radios at the age of 12. I had to wait until my mid 20's before I had the knowledge and maturity to troubleshoot modular solid state equipment. But with the latter, I had about a 99% success rate. Tube radios were throwaway items -- kind of like Windows installations.

After so many years of the incantations and rain dances we used to keep those old Windows systems running, it's refreshing to have high quality, modular and repairable Linux systems. This issue of Troubleshooting Professional Magazine discusses the tactics and strategies of troubleshooting Linux systems. "Enhancement" and "configuration" might be a better description than "Troubleshooting", because once working, Linux systems seldom "screw up".

Guest author Steve Epstein makes his Troubleshooting Professional debut with his great article, "Using Ipchains", which demystifies much of this important security tool. Thanks Steve!

The July 2000 issue is actually part 2 of the "Making it in a Post Microsoft World" series. So kick back, relax, and read this issue. You'll find info to make your Linux config/troubleshooting straightforward. And remember, if you're a Troubleshooter, this is your magazine. Enjoy!

Steve Litt can be reached at Steve Litt's email address.

Tools, Solutions and the Universal Troubleshooting Process

By Steve Litt
Solutions cannot be bought or sold. Only tools can be bought and sold.

You don't need to take my word for this. See what Gerald Nadler and Shozo Hibino say in the first few sentences of the fourth chapter of the book "Breakthrough Thinking" (ISBN 1-55958-004-6):

One of the most pervasive and persistent mistakes made in problem solving is to assume that one problem is identical with another. Why do people make this mistake? Often the adage, "don't reinvent the wheel," is used as their rationale. A prime tenet of Breakthrough Thinking, which we will explain in this chapter, is that each problem is unique. If the optimal solution is to be found, the problem must be treated as unique from the outset.
To avoid reinventing the wheel, you use the same *tools* to solve problems, adding your brainpower to discover and solve the unique aspects of the problem. Most of this issue of Troubleshooting Professional Magazine is devoted to tools you can use to solve problems in Linux systems. It's important to be constantly aware that I'm giving you tools, not solutions. You will create your own solutions, using these and other tools, and the Universal Troubleshooting Process (UTP).

The UTP is a 10 step process optimized to quickly find the root cause of any reproducible problem in a well defined system (and Linux systems are almost always well defined). I'd highly suggest using the UTP to solve computer problems. Here are the ten steps of the Universal Troubleshooting Process:

  1. Get the Attitude
  2. Get a complete and accurate symptom description
  3. Make damage control plan
  4. Reproduce the symptom
  5. Do the appropriate general maintenance
  6. Narrow it down to the root cause
  7. Repair or replace the defective component
  8. Test
  9. Take pride in your solution
  10. Prevent future occurrence of this problem
Steps 1 and 9 are for the benefit of *you*, the Troubleshooter. They reaffirm that people make solutions, not CD burners. Most of the rest are pretty obvious. Step 6 is done by repeatedly dividing the system into two pieces, and ruling out one or the other. Ideally each piece would be 1/2 the remaining possible problem domain, but in reality it's a quadruple tradeoff between these four factors:
  1. Ease
  2. Likelihood
  3. Even divisions
  4. Safety (for people, equipment and materials)
Step 5 can be thought of as "playing the odds", or alternatively "rounding up the usual suspects", or alternatively observing the obvious (error messages and the like). Different situations call for different general maintenance procedures.

The entire UTP is built to deliver quality solutions. That's quality by design. In addition, Step 8 is the "quality by inspection" part of the Troubleshooting process. Testing includes verification of adequacy of solution by providing answers to the following four questions:
  1. Did the symptom disappear?
  2. Were any new problems created?
  3. Do I know *why* the symptom disappeared?
  4. Did I fix it the right way?
It's obvious the symptom must disappear, or else there's no solution. Note that this implies adequate completion of step 2, Get the symptom description. Fixing the wrong symptom is never a solution. How many times do we get a fourth-hand report that "the network's too slow", only to find that the user's gripe was a slow dialup connection to the Internet? Giving him a 100Mbps Ethernet card would not solve his dialup problem.

But it's also vital that no additional problems were created. Parts of the system likely to have changed are investigated to see whether they in fact did. Note that checking for new problems implies adequate completion of step 4, Reproduce the symptom. Without a baseline of the system's performance, it's impossible to determine whether new problems were produced.

The third and fourth questions tackle the problem of "coathanger solutions", named after the practice of hanging a car's muffler with coathangers instead of replacing the bolts. "Coathanger solutions" are solutions which strongarm the symptom into submission, but leave the root cause intact. The usual outcome of a coathanger solution is a side effect problem.

Here's an example. Imagine that an incorrect parallel port setting makes Charlie Coathanger's printer incredibly slow when printing graphics. Charlie doesn't take the time to find the root cause (the incorrectly set parallel port), but instead he "fixes" the problem with a software tweak that prints much coarser (fewer bytes) graphics. The symptom is gone. Charlie's print jobs now print fast. But he's created the side effect of incredibly coarse and ugly graphics.

Note that the last two testing questions cannot be answered for problems fixed in step 5 (General Maintenance). In most cases, that's a legitimate tradeoff for Troubleshooting speed. However, in safety critical systems, Step 5 should include no steps that would compromise answering the last 2 questions of step 8. In fact, I often recommend against general maintenance in safety critical systems. If you absolutely must troubleshoot to the root cause, skip step 5 (except for its observational component).


Perhaps the funniest phrase in the English language is "packaged solution". The salesmen and glib-talking managers with impressive sounding titles hate me for saying this, but those of us who troubleshoot know it's true. This issue of Troubleshooting Professional Magazine discusses many measuring and injection tools for Linux systems. This article has scratched the surface of the greatest Troubleshooting Tool, the Universal Troubleshooting Process. At first opportunity, I'd highly recommend you learn as much about the UTP as possible, starting with the Universal Troubleshooting Process web page at Troubleshooters.Com. Not only will it help you with Linux, but also in solving every technological problem you encounter.

The Universal Troubleshooting Process is valid for any well defined system. Most of the UTP is constant over all systems. The only thing that changes from system to system is that entity often called "subject matter expertise". I just call it the Mental Model. Read on...

Steve Litt can be reached at Steve Litt's email address.

The /etc Tree

By Steve Litt
Almost all Linux configuration can be done with a simple text editor. If you're anything like me, that's an enormous load off your shoulders. Remember the Windows registry, with the Regedit tool? Remember dueling DLL's? Remember needing to click through all sorts of junk to make a minor config change?

All gone. Linux is here, and almost all of its configuration is contained in text files. And almost all those text files are in the /etc tree. Most of the major configuration files are directly in the /etc directory, but before discussing those let's list some of the major subtrees of the /etc tree:
|-- X11                      <Systemwide GUI config>
|-- cron.d                   <some cron config, added to /etc/anacrontab>
|-- cron.daily               <daily cron jobs>
|-- cron.hourly              <hourly cron jobs>
|-- cron.monthly             <monthly cron jobs>
|-- cron.weekly              <weekly cron jobs>
|-- httpd                    <all apache config goes in this tree>
|-- isdn                     <isdn line config>
|-- logrotate.d              <config for rotating logs, restarting, etc>
|-- mgetty+sendfax           <config for virtual terminals, faxing, and voice>
|-- news                     <config your inn newsreader here>
|-- pam.d                    <configure logon security for various services here>
|-- ppp                      <configure your dialup connection here>
|-- profile.d                <aliases and other systemwide profile config here>
|-- rc.d                     <system and daemon startup/shutdown configuration>
    |-- init.d               <contains start/stop scripts for all services>
    |-- rc3.d                <Level 3 (console) startup config via symlinks to init.d>
    `-- rc5.d                <Level 5 (gui) startup config via symlinks to init.d>
|-- sane.d                   <scanner configuration>
|-- security                 <some general computer access configuration>
|-- skel                     <pattern for individual user configurations>
`-- sysconfig                <some fundamental system config, esp. network, power mgmt, etc>
    `-- network-scripts      <these are the configurable scripts that govern the network>
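
You can compare the tree above against your own box. A minimal sketch (the exact subdirectory names vary by distribution):

```shell
# List the immediate subdirectories of /etc to compare against the
# tree above; exact names vary by distribution.
ls -d /etc/*/
```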

Important Config Files

Bootup and Initiation Related Configuration


/etc/lilo.conf

Configures LILO. LILO is the boot loader that allows you to choose the proper kernel, and the proper root and boot partitions. Once lilo.conf is configured and all partitions mentioned in lilo.conf are mounted at the same mountpoints as mentioned in lilo.conf's image= statements, you can run the LILO program to tell the computer how to boot next time.

Running LILO on an incorrect lilo.conf file can prevent your system from booting. There are ways to get around and fix the problem, but it's not easy.

Always be careful when editing lilo.conf and/or running LILO.
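
A cautious workflow might look like the following sketch. It assumes a stock layout with lilo.conf in /etc, and relies on LILO's -t flag, which parses the configuration and reports errors without writing the boot sector:

```shell
# Keep a known-good copy before editing (skip if there is no lilo.conf).
if [ -f /etc/lilo.conf ]; then
    cp /etc/lilo.conf /etc/lilo.conf.bak
fi

# Dry run: -t parses lilo.conf WITHOUT writing the boot sector;
# -v adds verbose output. Commit with a plain 'lilo -v' only after
# a clean test run.
if command -v lilo >/dev/null; then
    lilo -t -v
else
    echo "lilo not installed here"
fi
```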

You can find out more about lilo.conf and LILO as follows:

man -a lilo
man -a lilo.conf


/etc/fstab

Controls mounting of drives and partitions to mount points. LILO's root= statement tells the location of the root partition, from which the /etc directory can be found. Once /etc is found, /etc/fstab is consulted to determine all mounted partitions and devices, and their mount points.

You can find out more about fstab with:

man -a fstab
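
For illustration, a hypothetical fstab might contain lines like these (the device names and filesystem types are assumptions; check your own setup):

```
# device       mount point   fs type   options          dump  fsck
/dev/hda1      /             ext2      defaults         1     1
/dev/hda5      /home         ext2      defaults         1     2
/dev/cdrom     /mnt/cdrom    iso9660   noauto,ro,user   0     0
```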


/etc/resolv.conf

DNS *client* configuration. Although more properly thought of as a network config file, bad reverse DNS resolution can in certain cases prevent bootup. Such problems can be temporarily worked around by renaming resolv.conf, after which the underlying DNS problem can be fixed.

You can find out more about resolv.conf with the following command:

man -a resolver
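
For illustration, a hypothetical resolv.conf using the domain.cxm internal domain mentioned later in this issue (the addresses are made up) might look like this:

```
search domain.cxm
nameserver 192.168.100.1
nameserver 192.168.100.2
```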


/etc/services

This file defines all services and their port numbers. The port number is optionally the number listed after the last colon in a URL, as in:

http://www.troubleshooters.com:80/

The preceding URL comes in at port 80, which is the default port for http. With my present web host, trying to come in at port 8080 would fail. However, many web hosts put http at 8080, the normal port for web caching services.

Most of the ports in this file are "standards", so it's unwise to change them without a good reason.

You can find out more about the services file with the following command:

man -a services
The services listed in /etc/services are available for the inetd daemon as configured by /etc/inetd.conf. Read on...
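
For illustration, typical entries look like this (service name, port/protocol, optional aliases; the exact aliases vary by distribution):

```
ftp          21/tcp
telnet       23/tcp
www          80/tcp    http        # WorldWideWeb HTTP
pop-3       110/tcp    pop3
```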


/etc/inetd.conf

The inetd daemon starts up various services. It's an alternative to starting the services in the /etc/rc.d/rc3.d (or whatever) directory. Startup via inetd has advantages and disadvantages. The inetd.conf file governs the startup of services started by the inetd daemon.

This file has security implications because it determines what services run, and whether they are wrapped in TCP wrappers (tcpd). For instance, if you start up telnet in inetd.conf (and this is the default), anyone with an account can remotely telnet into your system and act like he's on a terminal. If you're Internet connected, you probably do not want this. If you're an ISP, you for sure don't want this, because instead you'll use secure shell (ssh) so that passwords aren't sent in the clear over the net.

You can read more about inetd.conf with the following commands:

man -a inetd.conf
man -a inetd
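
For illustration, a typical wrapped entry looks like this. The fields are service name, socket type, protocol, wait flag, user, server program (here tcpd, the TCP wrapper), and server arguments; a leading # disables the service:

```
ftp     stream  tcp     nowait  root    /usr/sbin/tcpd  in.ftpd -l -a
#telnet stream  tcp     nowait  root    /usr/sbin/tcpd  in.telnetd
```

After editing, tell inetd to reread its configuration by sending it a HUP signal (killall -HUP inetd).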


/etc/profile

This sets various system-wide settings each time a user logs in. The main thing you do here is set the system default umask (which governs default file mode upon file creation). To learn more about this file, view it in read only mode:
less /etc/profile
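
As an illustration of what the umask set in /etc/profile controls, here's a sketch you can run in any shell (the file name is arbitrary):

```shell
# With a umask of 022, new files get mode 666 & ~022 = 644,
# i.e. -rw-r--r--.
umask 022
touch /tmp/umask_demo
ls -l /tmp/umask_demo
rm /tmp/umask_demo
```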

Logon Related Configuration

These are some of the few files that are not completely configurable by a text editor. The reason, of course, is they involve encrypted passwords, which certainly should not be typed in by hand. So programs like useradd and passwd maintain these files. Nevertheless, these files *are* editable text files, and as a matter of fact can be configured, in every respect but the passwords, by text editor.


/etc/passwd

This is a list of all users, one line per user. Each line contains seven colon delimited fields, as follows:
account -- The username with which the person logs on.
password -- On systems with shadow passwords, which are more secure, this field contains only a single lowercase x, and the actual password is encrypted in a file called /etc/shadow. On systems without shadow passwords, this field contains an encrypted version of the password. Any edit to this field can result in a failure to log on.
UID -- A numeric identifier for this account. There's a one to one relationship between account and UID. This is a "primary key" used to relate this file to other files and processes.
GID -- The numeric group identifier of the group serving as this account's primary group. The actual group name and other info about this group is looked up in /etc/group by this number. Note that an account can be a member of several groups, but only the primary group is identified in the passwd file. The other groups of which this account is a member are identified in /etc/group.
GECOS -- A spare field, often containing the same text as the account field or some other informational text. Linux, UNIX and BSD have no *official* use for this field.
directory -- The home directory for this account. For most accounts it's a directory with the same name as the account, below the /home directory. For the root account it's typically /root. But it can actually be anything.
shell -- The shell this user will use once authenticated. On Linux systems it's usually /bin/bash, but of course it could be /bin/csh, or any other shell. I have a menu program called UMENU, and in the past I've placed its startup command, /usr/bin/mm x, into the shell field, after which a user who logged in was immediately presented with the menu interface, and was logged out when he terminated the menu interface.

To find out more about this file, use the following command:

man 5 passwd
This file can be programmatically manipulated by the passwd, useradd, usermod and userdel programs. You can look up any of those programs with the man -a command, followed by the program name.
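
Putting the seven fields together, a hypothetical line (the account name is borrowed from the examples in this issue; the numbers are made up) looks like this:

```
slitt:x:500:500:Steve Litt:/home/slitt:/bin/bash
```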


/etc/shadow

On systems using shadow passwords, this file contains encrypted password strings for the various accounts. For further info, use the following command:
man 5 shadow


/etc/group

The group file lists every group on the system, one line per group, and for each group it lists all accounts in the group. Each line has four colon delimited fields:
group_name -- The name of the group, such as users. There's a 1 to 1 correspondence between group_name and GID.
passwd -- Occasionally a group has a password, although typically only accounts have passwords. If the group has a password, this field contains an encrypted string. If the group has no password (which is typical), this field is blank.
GID -- The numeric identifier of the group. This is a "primary key" used to relate this file to other files and processes.
user_list -- A comma delimited list of every user in this group. Note that users are listed by account name, *not* UID. An account need not appear here if the group is listed as that account's primary group in the /etc/passwd file. In addition to the option of configuration via editor, the user list can be automatically manipulated by the usermod program with its -G option. The following puts umenu in the user_list field of the slitt and blitt groups:
usermod -G slitt,blitt umenu
You can then delete umenu from the slitt and blitt groups' user_list fields with the following command:
usermod -G "" umenu
Warning: usermod -G is Nonintuitive

One might *wrongly* think that the preceding commands modify group umenu. In fact, they modify all groups listed in the comma delimited list contained in the -G argument. The final argument is an *account*, NOT a group.

If you're not familiar with this or clear on its meaning, please reread until you understand the ramifications.

It might seem sensible to modify the user list for a group with the groupmod command. Unfortunately, the groupmod command has no provisions for changing the group's user list. You must use the usermod command.

For further information on the /etc/group file use the following command:

man -a group
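
Putting the four fields together, a hypothetical /etc/group line (the names are borrowed from the examples above; the GID is made up) looks like this:

```
slitt::500:umenu
```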

Other Security Related Configuration


/etc/inetd.conf

The inetd daemon starts up various services. It's an alternative to starting the services in the /etc/rc.d/rc3.d (or whatever) directory. Startup via inetd has advantages and disadvantages. The inetd.conf file governs the startup of services started by the inetd daemon.

This file has security implications because it determines what services run, and whether they are wrapped in TCP wrappers (tcpd). For instance, if you start up telnet in inetd.conf (and this is the default), anyone with an account can remotely telnet into your system and act like he's on a terminal. If you're Internet connected, you probably do not want this. If you're an ISP, you for sure don't want this, because instead you'll use secure shell (ssh) so that passwords aren't sent in the clear over the net.

This is a very complex file. You can read more about inetd.conf with the following commands:

man -a inetd.conf
man -a inetd


/etc/hosts.allow

This defines which hosts are allowed to run which tcpd-wrapped services started by inetd. You can learn more about both the hosts.allow file and the hosts.deny file with the following command:
man 5 hosts_access


/etc/hosts.deny

This defines which hosts are forbidden to run which tcpd-wrapped services started by inetd. You can learn more about both the hosts.allow file and the hosts.deny file with the following command:
man 5 hosts_access
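
For illustration, a common conservative pattern is a hosts.deny that denies everything, with a hosts.allow that opens specific services to specific hosts (the address is an assumption):

```
# /etc/hosts.deny -- deny anything not explicitly allowed
ALL: ALL

# /etc/hosts.allow -- allow telnet only from the local LAN
in.telnetd: 192.168.100.
```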

Network Related Configuration


/etc/resolv.conf

DNS *client* configuration file. This file tells the computer which DNS servers to use. If those DNS servers have defective reverse DNS resolution, it can prevent or badly delay telnet, ftp, and email access, and can even prevent bootup of your computer. Such problems can temporarily be worked around by renaming resolv.conf, after which the problem can be fixed on the DNS server (or else a different DNS server used).

You can get a little bit of additional info here:

man -a resolver


/etc/named.conf

DNS *server* config, top level. This file identifies the domains to be resolved, and points each to the proper resolution (zone) files, generally in the /var/named directory.

Configuring the named.conf file isn't simple. For more info:

man -a named.conf


/etc/networks

Here's where you put your various LAN segments.


/etc/host.conf

The following are the contents of a typical /etc/host.conf file:
order hosts, bind
multi on
The first line says that to resolve a name, first look in the hosts file (/etc/hosts), and then look in bind (DNS). The second line allows a single host name to resolve to multiple IP addresses.


/etc/hosts

This file contains names, aliases and IP addresses of often used computers (hosts). This yields ultra-quick lookup of the most often used computers. Perhaps more important, it enables your desktop machine to reach all local networks even after your DNS system crashes or malfunctions.
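
A hypothetical /etc/hosts for a couple of local machines (the names and addresses are made up, using the domain.cxm internal domain mentioned below) might look like this:

```
127.0.0.1       localhost localhost.localdomain
192.168.100.2   mydesk.domain.cxm     mydesk
192.168.100.1   mainserv.domain.cxm   mainserv
```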


/etc/HOSTNAME

This is one of the many locations of the host name. The host name is written as an FQDN (Fully Qualified Domain Name). Here's the one for my desktop computer, which is part of an internal domain called domain.cxm [sic, cxm prevents my making accidental mischief on the Internet]:


/etc/lmhosts

This is another place to put name resolutions for often used hosts. It is used by the SMB/CIFS (Samba) protocols. See more at:
man -a lmhosts


/etc/nsswitch.conf

This file defines, for each listed database (passwd, hosts, and so on), which services are used to find it, and in what order. Services include: nis+, nis, dns, files (local filesystem), db (local database (.db) files), compat (NIS in compat mode), hesiod, and [NOTFOUND=return]. The last is basically like a break statement in C -- it says "give up". This is part of the "Name Service Switch" system, and is the first step in various built-in Linux functions finding the specific information they need (like a passwd entry, for instance).

The preceding explanation is only partially accurate and barely scratches the surface. See the man page:

man -a nsswitch.conf
NEVER change this file without making a backup and documenting the change far and wide.

If you encounter a situation where Linux cannot find a file, nsswitch.conf is one of the many configurations to investigate. I'd personally start by walking the switch path. For instance, look at this line from my nsswitch.conf:

passwd:     files nisplus nis
So I'd look at the /etc/passwd before I start looking in the nis systems.


smb.conf

This is the Samba configuration file. Samba is file and print server software that replaces NT, Win2K and Netware file servers. Samba is an essential part of a desktop migration from Windows to Linux. You can find out more about Samba in my new book, "Samba Unleashed", which is available at your local bookstore. You can also look at the Samba pages of Troubleshooters.Com's Linux Library, URL listed in the URLs section of this issue. Finally, check out the man page:
man -a smb.conf
Caution, the preceding man page is HUGE.

Unfortunately, many folks create their smb.conf files by modifying a huge example file with voluminous comments and extraneous parameters that already match the defaults. In my opinion you're much better off starting from scratch, letting Samba's intelligent defaults do most of the work, and making a small file.

NOTE: This file's location is dependent on distribution and Samba installation. On Red Hat and Mandrake, it's in /etc. On Caldera it's in /etc/samba.d. On a default source compile, it's located in /usr/local/samba/lib.

Here are some very common errors and adjustments for the smb.conf file.

If your Windows box keeps asking for a password and rejecting the one you type, your [global] section's encrypt passwords= line is probably wrong. Here's the proper setting when Windows clients use encrypted passwords:

encrypt passwords=yes
And the following is proper if the clients don't use password encryption:
encrypt passwords=no
If some clients use encryption and others don't, you can use include statements to handle each properly, but that's beyond the scope of this article. See Samba Unleashed.

Another common problem is that Windows doesn't see the computer and/or its shares. If that happens, make sure you have the proper [global] section netbios name= and workgroup= settings. The following are appropriate for a Samba server whose host name is "mainserv", serving a Windows workgroup called "MYGROUP":

netbios name=mainserv
workgroup=MYGROUP
If your Windows clients still can't see it, and if there are no other Windows servers acting as domain controllers or WINS servers (typically this would require an NT or W2K box, so if you only have Win9x boxes on the LAN you probably have no Windows servers acting as domain controllers or WINS servers), enable WINS service and domain control in the [global] section as follows:
wins support=yes
domain master=yes
domain logons=yes
os level=65
preferred master = yes
If, after adding the preceding, your network starts acting "flaky", back out those changes and start looking for a Windows box that is acting as a WINS server or domain controller.
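
In the small-file spirit argued above, a minimal hypothetical smb.conf serving each user's home directory might be as short as this (names borrowed from the examples above):

```
[global]
netbios name=mainserv
workgroup=MYGROUP
encrypt passwords=yes

[homes]
read only=no
browseable=no
```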


/etc/smbpasswd

Samba users and passwords are stored in /etc/smbpasswd, not to be confused with the executable /usr/bin/smbpasswd, which is the program used to update /etc/smbpasswd. The data file is documented at:
man 5 smbpasswd
The executable is documented at:
man 8 smbpasswd
You can see them both by:
man -a smbpasswd
NOTE: This file's location is dependent on distribution and Samba installation. On Red Hat and Mandrake, it's in /etc. On Caldera it's in /etc/samba.d. On a default source compile, it's located in /usr/local/samba/private.
Always remember to add a password for each Windows user with the following command. For instance, if the Windows user logs into his Windows box with user name slitt:
smbpasswd -a slitt
The preceding prompts you to type in a password for user slitt. Be sure to input the same password slitt uses on the Windows box. Note that there are other ways to keep passwords synchronized between Windows clients and Samba servers, but they're beyond the scope of this article. See "Samba Unleashed".

Other Configuration Files in /etc


/etc/crontab

This is the upper level configuration of cron jobs. See
man -a crontab


/etc/termcap

This file contains terminal definitions, converting various generic video output functionalities to escape sequences specific to your terminal. It's big and complicated. You can learn more at:
man -a termcap


/etc/printcap

This file defines all printers available on the system, whether those printers are plugged into your parallel port, available on the network via NFS, or Windows printers available on the network via Samba.

The printcap file uses a fairly complex syntax, so many administrators change this file only with tools such as Red Hat's printtool. The problem with such tools is they keep voluminous configuration information in comments formatted in a manner proprietary to the tool. That means that editing the file with vi or with a different tool can prevent further changes via the original tool.

My suggestion: If you're a Red Hat or Mandrake user, use printtool. Then, gradually, learn the syntax of the printcap file and the theory of printing so you're never at the mercy of a tool.

Voluminous documentation on the printcap file, including syntax standards, is available here:

man -a printcap
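
For illustration, a minimal hypothetical printcap entry for a local printer on the first parallel port (the spool directory and device are assumptions) looks like this:

```
lp|hplj:\
        :sd=/var/spool/lpd/lp:\
        :lp=/dev/lp0:\
        :sh:
```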


/etc/mime.types

This defines MIME types, i.e. file types by file extension. MIME types govern which apps are started by default when a file is opened in a file manager. They also govern what happens when a file of a specific extension is downloaded via browser. Note that there is also an /etc/httpd/conf/apache-mime.types used by the Apache web server.
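
For illustration, each entry maps a MIME type to one or more extensions:

```
text/html        html htm
image/jpeg       jpeg jpg jpe
application/pdf  pdf
```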
Steve Litt can be reached at Steve Litt's email address.

Comparing to a Working System

By Steve Litt
In various places in Troubleshooters.Com, and in my book "Troubleshooting: Tools, Tips and Techniques", I discuss the tactic of comparing a problematic system to a working system, and exploiting the differences. In Linux, I do this all the time. I keep a 2 Gig partition free to temporarily install Linux. When I have a problem I can't figure out in other ways, I install the same distro on the new partition, verify that the symptom does not occur on the new experimental partition, and then begin to exploit the differences.

Exploiting the differences amounts to observing differences, and either changing an aspect of the non-functional system to resemble the working system, or changing an aspect of the working system to resemble the non-functional system. In the former case, if the symptom goes away I've found a cause. In the latter case, if the symptom appears in the working system, I've found a cause. From there it's necessary to determine if that cause is the *root cause*, but that's usually easy.

Another use of the temporary distro partition is to exploit the "smartness" of the installation program. With its hardware detection and smart defaults, the installation program points up exactly how to accomplish various configurations.

Installation takes roughly 30 minutes. Most of that time is unattended, meaning you can do other work on a different box during that time, but it's still time consuming. So every time I use it I document my hard-earned knowledge so I won't need to do this next time. Always be careful when installing distros not to make your system non-bootable. Also be very careful not to format any partitions with existing data. Be *very* careful when using fdisk or any other partition manager.

Steve Litt can be reached at Steve Litt's email address.

To Install Fresh, or Not to Install Fresh

By Steve Litt
Every one of my buddies vehemently states that new Linux versions should not be installed clean, but instead upgraded. I'm not so sure. Their response is that I'm too "stuck in the Windows world". These are people whose opinions I highly respect.

But they're wrong about one thing. I didn't learn the value of fresh, clean installs in the Windows world. I learned it in 1984, before Windows existed.

I was one of four programmers for a medical management package running on the TSX multiuser environment on top of RT11 running on DEC PDP11's. I want to repeat that a different way. An installation consisted of installing RT11 on a PDP11, then TSX on top of RT11, then our package on top of TSX. Things were not going well.

The boss cornered another programmer and myself in the hall, and reminded us that 90% of the installations were malfunctioning, and we were spending all our time putting out fires, and "if you can't get our software to work, there's not much reason for you to be working here". This was one of the nicest bosses I've ever had, but his meaning was clear enough. I think he told this to me because he knew I was the best Troubleshooter in the organization, and if anyone could solve the problem I could.

And I did. I quickly determined that the problem was that every single installation was a conglomeration of whatever RT11 tape the system rep happened to have, and whatever TSX configurations he or she happened to institute, and a medical management package consisting of whatever versions of various components he or she could beg, borrow or steal from other customers. Can you spell "segfault"?

So I took a snapshot of the latest known good version of our software, OS and environment, and put them all on tape. I duplicated the tapes, gave them to all the system reps, and got management to declare that all installs must be off those tapes, and any malfunctioning systems not originally installed by a standard tape must be installed clean with a standard tape. Problems went down to a manageable level, to the point where almost all problems were user error.

Never underestimate the value of a known state! If you don't know the state of your system, you don't have a well defined system. And if you don't have a well defined system, it is not mathematically certain that you can solve reproducible problems in that system.

As a practical matter, because Linux's configuration is overwhelmingly observable in editable text files, its state is always reasonably deducible. That's some pretty good justification for upgrading rather than installing fresh. Nevertheless, I *strongly* recommend taking steps to at least retain the possibility of making fresh installs. That way you can install clean on those rare occasions when you find your system in a hopelessly unknown state.

The most important step is to separate data (stuff you can't buy or download in a convenient fashion) in a separate tree from software. I recommend all important data be under a /d directory, and that everything under /home be considered temporary. The /home directory acquires too much junk and compilation scraps to serve as a good data store.

The second most important thing is to document everything you install on the system. Did you install it with .deb, .rpm, source compile, or something else? Where's the installation file now? (Keep installation files so you can do fresh installs to the same state.) Did you use any special compile options or rpm options? What directory was it installed in? It takes 5 minutes to document such things, and it can save days of fooling around trying to return the system to its state. It can also save many sleepless nights.
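
A plain text install log answers those questions later. Here's a sketch of the sort of entry I mean; the log file name, field names, and the package shown are all my own invention, not a standard:

```shell
# Append one entry per installed package to a running install log.
LOG="$HOME/install-log.txt"   # could just as well live under /d
cat >> "$LOG" <<EOF
date:      $(date)
package:   somepackage-1.0.src.rpm   (example entry)
method:    rpm --rebuild
options:   none
into:      /usr/local
file kept: /d/installfiles/somepackage-1.0.src.rpm
EOF
```

Five lines per package, and a year from now you can reconstruct exactly how the box got into its current state.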

Because every single one of my friends recommends upgrade over install, I'll recommend it too. Most of the time. But be sure you leave the door open to install from scratch if your system departs too far from a known state.

Steve Litt can be reached at Steve Litt's email address.

Find and Grep are Your Friends

By Steve Litt
Quick -- what are the files you'll need to change in order to change the IP address of your host while keeping the same netmask? Here's a command to find them:
find / -type f | grep -iv "^\/d\/" | grep -v "^\/home\/" | grep -v "^\/proc\/" | xargs grep -il 192.168.100 > oldsubnet.txt

The find command produces a list of all files in the root tree. The first grep statement filters out everything in the /d tree. The second grep statement filters out all files in the /home tree, which can be assumed to be data or temp stuff rather than network config info, and the third grep statement filters out everything in the /proc tree, which obviously doesn't contain config data. The xargs statement searches all remaining files for the string "192.168.100", and reports on any which match.
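
When you're reasonably sure the setting lives under /etc, GNU grep's -r option gives a much faster (but narrower) version of the same search:

```shell
# Recursively list every file under /etc mentioning the old subnet.
grep -rl "192.168.100" /etc 2>/dev/null
```

Unlike the full find pipeline, this misses anything outside /etc, so treat it as a quick first pass, not a complete inventory.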

Here's the list this command produced on my desktop computer:

The list confirms what we suspected all along. Everything outside the /etc tree mentioning this subnet is a log, documentation, or core dump. Well, except for the following:

The lock file is probably something that can be deleted. The /var/spool/mail/slitt file is actually a mailbox file, and the IP is mentioned in an email, so it's not applicable. The two in /var/named are part of the DNS system, and *absolutely* must be changed. The perl file looks like it might be part of another configuration tool. It should be backed up in case needed, and possibly changed to accommodate the new subnet.

Note that this command takes several minutes to run on my dual Celeron 450 with 512 meg of RAM, so it's a command you'll want to run just before going to lunch or going home. But it's the best method of finding *every* file that could conceivably affect IP addressing on the system.

Steve Litt can be reached at Steve Litt's email address.

Using Configuration Tools

By Steve Litt
All configuration not involving passwords can be accomplished by editing text files. For the intelligent, thorough and experienced system administrator, that's probably the best way. But there's an extreme shortage of knowledgeable technologists, so it's sometimes necessary to turn to config tools that act as front ends to the text configuration files. To the extent possible, the interaction should be two-way, meaning that the config tool can read its initial state from *any legal* state of the config file, and any tool usage writes a legal config file. A great deal of Linuxconf is like that. You can change the "name service access" screen's "Multiple IPs for one host" field, save and exit Linuxconf, and observe the result in /etc/host.conf. You can then change it in host.conf, and note that the change is recognized in the Linuxconf program.

Then there are one-way tools, like Red Hat's printtool. Printtool always writes legal /etc/printcap files, but it cannot read just any legal printcap file created with an editor. This is because printtool depends on some of its config info being written into comments within the printcap file.

One-way tools can still be useful, even after the configuration file has been hand-edited. Take printtool for example. Let's say you want to add a network printer with an HP LaserJet IIID filter. Simply rename the existing printcap, run printtool, and create the new printer definition for the IIID. Then rename the newly created printcap, rename the original back, and use vi cut and paste to copy the new printer definition into the existing printcap.

Here are some of the best known configuration tools:
Tool | Interface | Distro | Command | Description
Linuxconf | Curses, GUI | Redhat derived | linuxconf | Most system configuration can be done from here
Drakconf | GUI | Mandrake | DrakConf & | Configure X, resolution, users, security level, startup services, keyboard choice, rpm packages (Kpackage), linuxconf entry point, hardware, network, and printers
Control Panel | GUI | Redhat derived | control-panel & | Changes the current state of the operating system
Netconf | Curses, GUI | Redhat derived | netconf | Network part of Linuxconf
Userconf | Curses, GUI | Redhat derived | userconf | User maintenance part of Linuxconf
FSconf | Curses, GUI | Redhat derived | fsconf | Filesystem part of Linuxconf
Printtool | GUI | Redhat derived | printtool & | Front end to the /etc/printcap file
Xconfigurator | Curses | Most distros | Xconfigurator | Front end to /etc/X11/XF86Config
XF86Setup | GUI | Most distros | XF86Setup | Front end to /etc/X11/XF86Config
SWAT | Browser | With Samba | Browse to serverIP:901 | Front end to smb.conf
wmconfig | ? | ? | wmconfig | Create menu system for your window manager
lisa | Curses | Caldera | lisa | Old but excellent Caldera config program
COAS | GUI | Caldera | From menu system | New Caldera configuration system
kpackage | ? | With KDE | ? | KDE interface to Red Hat rpm files
dpkg | Curses | Debian | dpkg | Very low level .deb package management
apt-get | Curses | Debian | apt-get | Higher level .deb package management, complete with Internet search and sophisticated requirement search
rpm | Teletype | Caldera, Redhat derived | rpm (lots of options) | Command line .rpm install, upgrade and uninstall utility
ifconfig | Teletype | Linux | ifconfig (lots of options) | Look at and configure network interfaces
ipchains | Teletype | Linux | ipchains (lots of options) | Look at and configure IP forwarding, IP masquerading, IP firewalling, and various packet level security
route | Teletype | Linux | route (lots of options) | Look at and modify the routing tables. Often included in boot scripts
comanche | ? | With Apache | ? | GUI front end to /etc/httpd/conf/httpd.conf
KDE user mount tool | GUI | With KDE | ? | Mounting drives and partitions
pppconfig | ? | Debian | ? | Configure ppp
KDE Menu Editor | GUI | With KDE | kmenuedit -caption "Menu Editor" -icon kmenuedit.xpm -m | Edit the KDE menu

Steve Litt can be reached at Steve Litt's email address.

Linux Measuring Tools

By Steve Litt
As the owner of Steve's Stereo Repair in the early 80's, I invested in only two pieces of measuring equipment: a multimeter and a series lightbulb.

Nowadays I troubleshoot computers, and I need to measure RAM, CPU, cache, disk usage, network connectivity and packets. Lucky for me, all the tools come on that little Linux install CD I get for $50.00 (or $2.00 from LinuxCentral or CheapBytes).

Most computer measuring tools are software. Once you've gotten past the Power On Self Test (POST), you can pretty much measure everything with software.

Hardware and Low Level System Info

This stuff's almost as good as a voltmeter. The /proc directory is a wishing well of low level hardware and software information. As a matter of fact, I think I could sell Linux for about $500 as a Windows diagnostic tool. Just install Linux in a little partition, and look in the /proc directory to see interrupts, CPU information, memory info, all partitions in the machine (no fdisk needed), and much, much more. But I'd have trouble getting $500 because LinuxCentral and CheapBytes are selling it for $2.00. Kind of makes it tough to sell Windows diagnostic software, doesn't it :-)
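
As a sketch of that diagnostic-tool idea, a few lines of shell bundle the most interesting /proc files into a single printable report (the file list and report name are my own choices):

```shell
# Concatenate several /proc status files into one report, with a
# banner line before each so you can tell them apart.
for f in cpuinfo meminfo interrupts partitions; do
  echo "===== /proc/$f ====="
  cat "/proc/$f"
done > hwreport.txt
```

Run it on any Linux box, print hwreport.txt, and you have a hardware snapshot without opening the case.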

Every single Linux system has these tools. They're free with Linux. Use them early and often.

Every one of these runs in a teletype interface, meaning they can be accessed on any Linux system, no matter how simplistic.

Here are some quick peeks at your system's hardware and low level system software.

cat /proc/interrupts

Interrupt conflicts are much less common in Linux than in DOS and Windows. But if you suspect one, here's a quick window into your interrupts. Here are the interrupts on my desktop computer:
[slitt@mydesk /proc]$ cat /proc/interrupts
           CPU0       CPU1
  0:     403992     404595    IO-APIC-edge  timer
  1:      21532      21535    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 10:       6831       7129   IO-APIC-level  eth0
 11:          0          0   IO-APIC-level  es1371
 12:       8954       9243    IO-APIC-edge  PS/2 Mouse
 13:          1          0          XT-PIC  fpu
 14:      77266      69593    IO-APIC-edge  ide0
 15:          7          2    IO-APIC-edge  ide1
NMI:          0
ERR:          0
[slitt@mydesk /proc]$
Isn't it interesting that /dev/cua0 and /dev/cua1 don't have their interrupts listed here?

cat /proc/cpuinfo

What's up with your system speed? Is your CPU performing to expectation? Is it even the CPU you thought it was? How many CPUs are operating? Check out my system's two 300MHz Celerons, clocked up to 450MHz (and running cool as a cucumber):
[slitt@mydesk slitt]$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 6
model name      : Celeron (Mendocino)
stepping        : 5
cpu MHz         : 451.029273
cache size      : 128 KB
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 mmx fxsr
bogomips        : 448.92

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 6
model name      : Celeron (Mendocino)
stepping        : 5
cpu MHz         : 451.029273
cache size      : 128 KB
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 mmx fxsr
bogomips        : 450.56

[slitt@mydesk slitt]$

cat /proc/ioports

Suspect an addressing problem? Wondering if your serial ports are set right? Want to write an assembler program to mess with the hardware? Here's where you find the info:
[slitt@mydesk /proc]$ cat /proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
02f8-02ff : serial(auto)
0376-0376 : ide1
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial(auto)
a400-a43f : es1371
f000-f007 : ide0
f008-f00f : ide1
e083b000-e083b01f : Intel Speedo3 Ethernet
[slitt@mydesk /proc]$

cat /proc/modules

The past several versions of Linux allow some of the kernel's functionality to be in loadable modules, reducing the size of the core kernel and enabling more features. Wondering what modules are loaded? Check here:
[slitt@mydesk /proc]$ cat /proc/modules
nfs                    31960   1 (autoclean)
lockd                  33736   1 (autoclean) [nfs]
sunrpc                 57732   1 (autoclean) [nfs lockd]
slhc                    4408   0
eepro100               14184   1 (autoclean)
vfat                   11004   0 (unused)
fat                    33120   0 [vfat]
es1371                 29412   0
soundcore               4164   4 [es1371]
[slitt@mydesk /proc]$

cat /proc/meminfo

There are several memory information utilities in Linux, and they get their info from /proc/meminfo. What I like about /proc/meminfo as an information source is that it's formatted very cleanly, suitable for incorporation into a report-writing script. The top line lists column headers. The next line lists various RAM memory figures in bytes, not human-convenient kB or MB. The line after that lists swap memory -- your swap partition -- how much there is, how much is used, and how much is free, again in bytes. The remaining lines list single figures in human-convenient kB. Note that the total memory in line 2, divided by 1024, equals the kB value in the MemTotal line.
[slitt@mydesk slitt]$ cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  529518592 103690240 425828352 93466624  9449472 45191168
Swap: 133885952        0 133885952
MemTotal:    517108 kB
MemFree:     415848 kB
MemShared:    91276 kB
Buffers:       9228 kB
Cached:       44132 kB
BigTotal:         0 kB
BigFree:          0 kB
SwapTotal:   130748 kB
SwapFree:    130748 kB
[slitt@mydesk slitt]$
Interestingly enough, my computer has 512 MB of hardware RAM installed. 512*1024*1024 = 536870912 bytes, 7352320 bytes more than reported by this command. Is my RAM defective? Is some of it inaccessible because of my hardware? These are things to investigate. (One innocent explanation: the kernel reserves some RAM for its own code and data, and that reservation never shows up in the total.)

The total memory minus the used memory equals the free memory. Of the used memory, the preceding command shows 91276 kB as shared memory, 9228 kB as buffers, and 44132 kB used as disk cache.

The swap total is the size of your swap partition(s).
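
As an example of that report-writing use, here's a one-liner sketch pulling the MemTotal and SwapTotal figures out of /proc/meminfo with awk (variable names are mine):

```shell
# Extract single kB figures from /proc/meminfo by their row labels.
memtotal=$(awk '/^MemTotal:/  {print $2}' /proc/meminfo)
swaptotal=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
echo "RAM: $memtotal kB, swap: $swaptotal kB"
```

Because each label sits at the start of its own line, the same pattern works for any of the single-figure lines.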

cat /proc/partitions

This is something you should print out and leave taped to the wall, so if you ever goof up LILO and can't get back in, you can bust in with a boot disk and mount enough to fix your problem. It's also very handy for making scripts, and anything else requiring you know about all the partitions on your system. What's cool about this is it reports on all partitions, mounted or not.
[slitt@mydesk /proc]$ cat /proc/partitions
major minor  #blocks  name

   3     0   20044080 hda
   3     1      32098 hda1
   3     2          1 hda2
   3     5     136521 hda5
   3     6    1831378 hda6
   3     7    6032376 hda7
   3     8    3076416 hda8
   3     9    2048256 hda9
   3    10     514048 hda10
  22     0 1073741823 hdc
[slitt@mydesk /proc]$

cat /proc/mounts

Here's the second thing you should print out and leave taped to the wall. If you goof up LILO and need to bust back in (I hack -- for me that happens about once a month), this is your roadmap for doing so. In the case of my system, my /boot partition is shown to be on /dev/hda1 and my /d on /dev/hda7, while my / is shown only as /dev/root.
NOTE: Be sure to record which partition is actually /dev/root. This can be deduced from the mount command, which shows my / to be on /dev/hda8. The mount command is probably more useful because it lists the actual partition.
[slitt@mydesk /proc]$ cat /proc/mounts
/dev/root / ext2 rw 0 0
/proc /proc proc rw 0 0
/dev/hda1 /boot ext2 rw 0 0
/dev/hda7 /d ext2 rw 0 0
none /dev/pts devpts rw 0 0
mydesk:(pid550) /net nfs rw,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,noac,add
r=pid550@mydesk:/net 0 0
[slitt@mydesk /proc]$
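
To make those wall printouts, something like this sketch captures the partition map, the kernel's mount view, and the mount command's view (which names the actual root partition) in one file; the file name is my own choice:

```shell
# Gather everything needed to remount the system from a rescue boot.
{
  echo "== /proc/partitions =="; cat /proc/partitions
  echo "== /proc/mounts ==";     cat /proc/mounts
  echo "== mount ==";            mount
} > rescue-map.txt
```

Print rescue-map.txt and tape it up; regenerate it whenever you repartition.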

Memory and CPU Measurement


top

This program lists the "top" 17 processes. They can be sorted by process ID, age, CPU usage, resident memory usage or cumulative time. You change the sort order with a single keystroke. There are many other adjustments you can make in real time. To investigate, press the h key while top is running.

Top answers questions like: which processes are hogging the CPU, and which are hogging memory?

Above the 17 processes is a header giving all sorts of stats about the machine's state. Here's the header for my Linux desktop box:
  6:40am  up  1:52,  2 users,  load average: 0.00, 0.00, 0.00
64 processes: 62 sleeping, 1 running, 1 zombie, 0 stopped
CPU states:  0.0% user,  0.3% system,  0.0% nice, 99.5% idle
Mem:   517108K av,  100416K used,  416692K free,   89212K shrd,    9220K buff
Swap:  130748K av,       0K used,  130748K free                   44080K cached


free

This is a "quickie" look at memory, duplicating the info from top, vmstat, /proc/meminfo, and others:
[slitt@mydesk slitt]$ free
             total       used       free     shared    buffers     cached
Mem:        517108     105760     411348      99180      10436      44728
-/+ buffers/cache:      50596     466512
Swap:       130748          0     130748

[slitt@mydesk slitt]$ free -t
             total       used       free     shared    buffers     cached
Mem:        517108     105780     411328      99180      10436      44728
-/+ buffers/cache:      50616     466492
Swap:       130748          0     130748
Total:      647856     105780     542076

[slitt@mydesk slitt]$ free -to
             total       used       free     shared    buffers     cached
Mem:        517108     105780     411328      99180      10436      44728
Swap:       130748          0     130748
Total:      647856     105780     542076
[slitt@mydesk slitt]$


vmstat

This is an excellent memory, swap, buffer and CPU measurement utility. You can make it continuously print updated information by placing a numeric argument at the end specifying the number of seconds between updates.
[slitt@mydesk slitt]$ vmstat
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0 415668   9220  44092   0   0     1     0   66   145   1   0  99
[slitt@mydesk slitt]$               

malloc and nice

These aren't measurement utilities, but by creating a small C program that uses malloc() to "gobble up" memory, combined with the preceding measurements, you can do some fairly serious bottleneck analysis.

Likewise, you can use the nice command on a tight loop program you create to "gobble" cpu in order to do bottleneck analysis on CPU.
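
Here's a minimal sketch of the CPU-gobbler idea. A real gobbler is an endless tight loop (while :; do :; done) that you kill when you're done measuring; this version is bounded so it stops by itself, and the loop count is arbitrary:

```shell
# Run a bounded busy-loop at the friendliest priority (niceness 19),
# so it soaks up only otherwise-idle CPU cycles.
nice -n 19 sh -c '
  i=0
  while [ $i -lt 100000 ]; do i=$((i+1)); done
  echo "gobbler finished at niceness $(nice)"
'
```

Watch vmstat or top in another window while the gobbler runs; because of the niceness, interactive work should stay responsive even with the CPU pegged.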

The exact methods are beyond the scope of this magazine, but they are completely documented in Samba Unleashed's chapter 35, "Optimizing Samba Performance". Although that chapter is intended to be Samba specific, it's one of the best documents on Bottleneck Analysis I ever wrote.

Disk Measurement


df

For each mounted partition, the df command lists the used and free disk space. This is vital for system architecture decisions, and for finding and correcting impending "disk full" conditions before they happen.
[slitt@mydesk slitt]$ df
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda8             2.9G  1.5G  1.2G  56% /
/dev/hda1              30M   16M   13M  55% /boot
/dev/hda7             5.6G  1.1G  4.1G  21% /d
[slitt@mydesk slitt]$ 
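
To catch "disk full" before it happens, a sketch like this (the 90% threshold and wording are mine) can run from cron. POSIX df -P puts the use percentage in field 5 and the mount point in field 6:

```shell
# Report any filesystem more than 90% full.
df -P | awk 'NR > 1 && $5 + 0 > 90 {print $6 " is " $5 " full"}'
```

The $5 + 0 trick makes awk treat the "56%" string numerically, discarding the percent sign.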


du

This utility takes a directory as an argument, then prints and totals all file sizes in the tree. Never use this command on the root (or without arguments), as it will take a long time. Note that this command counts allocated disk blocks (similar to clusters in DOS), so you find out how much space you'd actually save by deleting the tree. This is very different from programs that traverse a tree and add up the file sizes.
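
A common du recipe is finding where the space went. Since du prints sizes first, sorting numerically floats the biggest subtrees to the bottom; /var here is just an example starting point:

```shell
# Show the five largest subtrees under /var (sizes in 1 kB blocks on Linux).
du /var 2>/dev/null | sort -n | tail -5
```

The 2>/dev/null discards "permission denied" noise when run as a non-root user.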

Network Management


ping

This is how you check connectivity. The ping program determines whether you can see a network interface (it could be on the same box, or on a different one). This tests hardware, network drivers, subnetting and routing, and even firewalling and IP forwarding and masquerading. Ping is important because if you can't ping, there's no use trying to establish email, web, or file service connectivity.
[slitt@mydesk slitt]$ ping -c3 localhost
PING localhost.localdomain ( 56 data bytes
64 bytes from icmp_seq=0 ttl=255 time=0.1 ms
64 bytes from icmp_seq=1 ttl=255 time=0.1 ms
64 bytes from icmp_seq=2 ttl=255 time=0.0 ms

--- localhost.localdomain ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.0/0.0/0.1 ms

[slitt@mydesk slitt]$ ping -c3
PING ( 56 data bytes
64 bytes from icmp_seq=0 ttl=255 time=0.1 ms
64 bytes from icmp_seq=1 ttl=255 time=0.0 ms
64 bytes from icmp_seq=2 ttl=255 time=0.0 ms

--- ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.0/0.0/0.1 ms

[slitt@mydesk slitt]$ ping -c3
PING ( 56 data bytes
64 bytes from icmp_seq=0 ttl=255 time=0.2 ms
64 bytes from icmp_seq=1 ttl=255 time=0.2 ms
64 bytes from icmp_seq=2 ttl=255 time=0.2 ms

--- ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.2/0.2/0.2 ms
[slitt@mydesk slitt]$

Here's a simple predefined diagnostic using ping: ping localhost first, then your own NIC's IP address, then another host on the subnet. Each step tests more of the chain -- the TCP/IP stack, then the card and its driver, then the cabling and the rest of the subnet.

Here's why this predefined diagnostic is so handy: in less than 1 minute, you can significantly reduce the scope of the root cause.


ifconfig

If you can't ping, the next step is to see your network card's IP address, netmask, and other information. Do this:
[slitt@mydesk slitt]$ /sbin/ifconfig
eth0      Link encap:Ethernet  HWaddr 00:D0:B7:09:2A:58
          inet addr:  Bcast:  Mask:
          RX packets:24440 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8551 errors:0 dropped:0 overruns:0 carrier:0
          collisions:13 txqueuelen:100
          Interrupt:10 Base address:0xb000

eth0:0    Link encap:Ethernet  HWaddr 00:D0:B7:09:2A:58
          inet addr:  Bcast:  Mask:
          Interrupt:10 Base address:0xb000

eth0:1    Link encap:Ethernet  HWaddr 00:D0:B7:09:2A:58
          inet addr:  Bcast:  Mask:
          Interrupt:10 Base address:0xb000

lo        Link encap:Local Loopback
          inet addr:  Mask:
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:1399 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1399 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

[slitt@mydesk slitt]$

The preceding command shows the network card eth0 at IP address, netmask, and broadcast address. It's up. Also shown is that eth0 supports two "alias" IP addresses, and These are used for serving two different websites. Note also that aliases can be on different subnets, thereby enabling communications to other subnets.


nslookup

Configuration and troubleshooting would be a lot easier if all network traffic were referenced by numerical IP address instead of by name. But that would drive users (including us admins) nuts. So we must provide mechanisms for translating between names and numbers. When those mechanisms go wrong, nslookup is one tool used to diagnose the problem.
[slitt@mainserv slitt]$ nslookup www.domain.cxm
Server:  mainserv.domain.cxm

Name:    mainserv.domain.cxm
Aliases:  www.domain.cxm

[slitt@mainserv slitt]$ nslookup www.domain.cxm localhost
Server:  localhost

Name:    mainserv.domain.cxm
Aliases:  www.domain.cxm

[slitt@mainserv slitt]$ 
In the preceding, the first command looks it up on the default nameserver, which is (the DNS server for this LAN). You can specify a second argument telling nslookup to do the lookup on a specific machine. In the case of the second command above, that lookup machine is localhost.


dnswalk

The dnswalk tool is a detailed check on your DNS system. I've found that if your system passes dnswalk without errors or warnings, it will probably function completely right. If dnswalk throws errors or warnings, sooner or later you'll have DNS problems, and you should fix them.

The dnswalk program probably does not ship with your Linux CD, but it's free on the Internet. The URL is listed in the URLs section of this magazine.


tcpdump

Sure, I'd have liked an oscilloscope and a distortion analyzer at Steve's Stereo Repair, but the cheapest decent scope cost $600, and I don't even want to think about the distortion analyzer. So I used what I could afford.

Isn't it nice to know that the tcpdump program on a cheapo Linux box performs many of the functions of expensive network analyzers? That's the beauty of Linux.

You can learn more about tcpdump like this:

man -a tcpdump
Steve Litt can be reached at Steve Litt's email address.

Linux Network Troubleshooting

By Steve Litt
Here's a nice little predefined diagnostic for network problems: ping localhost, then your own NIC, then another host on the subnet. The following automated script, called nettest, performs this predefined diagnostic. You give it the IP address of another host on the subnet as an argument, and the script does the rest, writing the results to a file called pingtest.txt.
[ $1 ] || { echo "usage: nettest IPofOtherHostOnSubnet"; exit 1; }

OTHERHOST=$1
LOCALHOST=
NIC=$(/sbin/ifconfig eth0 | grep "inet addr" | sed -e "s/.*inet addr:\(.*\)\s*Bcast.*/\1/")

echo Writing ping tests to pingtest.txt...

echo Testing localhost at $LOCALHOST...
if ! ping -c2 $LOCALHOST > pingtest.txt; then
  echo $LOCALHOST failed! See pingtest.txt.
  exit 1
fi

echo Testing eth0 at $NIC...
if ! ping -c2 $NIC >> pingtest.txt; then
  echo $NIC failed! See pingtest.txt.
  exit 1
fi

echo "Testing other host on subnet at $OTHERHOST (argument to this command) ..."
if ! ping -c2 $OTHERHOST >> pingtest.txt; then
  echo $OTHERHOST failed! See pingtest.txt.
  exit 1
fi

cat pingtest.txt

If it fails for one host on the subnet, try other hosts on that subnet.

DO NOT attempt to troubleshoot lack of Samba, telnet, ftp, email, or http connectivity until you can ping. You can't telnet to what you can't ping.

Steve Litt can be reached at Steve Litt's email address.

Using Ipchains

By Steve Epstein
As the number of computers used in homes and small offices increases, so too does the likelihood of networked systems. Today, whether networked or not, nearly all computers are connected to the Internet. The Internet, and many small networks, use TCP/IP (Transmission Control Protocol/Internet Protocol). Ipchains is a powerful program that provides complex IP filtering and accounting for TCP/IP connections. Ipchains can be used with the Linux 2.2 kernels. Support is compiled into most kernel distributions, but if ipchains is not in your kernel, it can be compiled in. To check which version you have, or whether you have it at all, enter the following:
# ipchains -V
I have version 1.3.9 installed on my Red Hat 6.2 machine.

Accessing the Internet through a gateway

For many, the most common use of ipchains is to provide access to the Internet through a common network gateway. The simplest (and least secure) way to do this is by enabling IP forwarding and setting up some simple chains. IP forwarding is enabled with the following command:
# echo "1" > /proc/sys/net/ipv4/ip_forward
Once that is done, two simple ipchains commands will get you started:
# ipchains -P forward DENY
# ipchains -A forward -s -j MASQ
Substitute your network's IP address and subnet mask in the above. The first command simply sets the policy of the forward chain to deny everything, and the second command is an append (-A) that will masquerade any packets with a source (-s) of (the local network). You might also want to verify all the current chains with the following command:
# ipchains -L
This will list the current chains. There are three chains in the default ipchains setup: input, output and forward. Both input and output need to be configured with an ACCEPT policy (this is not very secure, but it works).

If you have a properly setup network using TCP/IP, the above commands will allow the rest of the network to connect to the Internet through an Internet-connected Linux server. Additional services such as Web browsing will require setup of the DNS entries in the client machines. DNS and DHCP do not need to be running in the Linux server for proper operation of the above. In other words, a simple PING command to the Internet will work.

To get most Web Browser functions operating will require DNS implementation at some level. The simplest way to do this is to include an address of a DNS server in the hosts file of each machine on the network. A typical hosts file might include the following:

       localhost
     gateway
     worldnet
The above file has three entries, the address for the localhost, the address of a machine called gateway, which in this case is the gateway to the Internet, and the address of a DNS server on the Internet (AT&T Worldnet's). When (the) gateway is connected to the Internet, and ipchains is properly running, you should be able to PING each of the machines by address and by name. As a final test, try:
# ping
If this is successful, ipchains and DNS are operational. You can then bring up your browser and try it. In MS Windows machines, you may need to set the default gateway under Network Neighborhood -> Properties -> TCP/IP (Ethernet card) -> Properties -> Gateway.

The above information should be enough to get a small (test) network up and running with Internet connectivity using a dial-up, DSL or cable modem connection to the Internet. I have always found that a simple proof of concept can provide the necessary momentum to convince those involved to move forward and complete the project. The danger here is leaving the system configured in its current less-than-secure state.

How it works

By default, there are three chains installed: input, forward and output. Additional chains can be added using:
ipchains -N chainname
or deleted with:
ipchains -D chainname
For many applications, the three default chains provide sufficient functionality. The figure provides a basic flow through the first stage of ipchains:
                ______________      ________      _________
From the       |              |    |        |    | Forward |    To
input chain--->| Demasquerade |--->| Sanity |--->| chain   |--->next
               `------.-------'    `---.----'    `----.----'    chain
                      |                |              | 
                      |                |              | 
             _________V_______       __V___       ____V________
            |                 |     |      |     |             |
            | To output chain |     | DENY |     | DENY/REJECT |
            `-----------------'     `------'     `-------------'

Packets entering this machine from an outside source (the local network, a dial-up connection, or the loopback device) are first verified with a checksum and a sanity check. Good packets are passed on to the input chain, where they are checked against its rules. To allow or deny a connection based on IP address, use the following:

# ipchains -A input -j DENY -p all -s address/mask -d address/mask
In this example, any packet whose source address matches the given address (under the given subnet mask) will be denied passage through the input chain. Filters can also be set up to block packets based on addresses, port numbers or interfaces, for both source and destination.

 After the input chain, a demasquerade process is applied if the packet is a reply to a previously masqueraded packet. If the demasquerade is performed, the packet is routed directly to the output chain. If not, a routing decision is made based on the packet's destination. Packets for the local machine are delivered locally; otherwise they traverse the forward chain and are possibly sent on to the remote destination. Locally created packets go through the routing decision process and are then sent to the output chain as necessary.

 Each chain is simply a set of rules. Each packet is examined and, based on the rules in place, is accepted, denied or redirected. You can see how many packets have been sent through a chain using:

# ipchains -L chain -v
And, the following can be used to reset the counters:
# ipchains -Z chain
Once your chains are in place, they can be saved using ipchains-save, and restored using ipchains-restore. For a very simple setup, the commands can be added to a startup file such as /etc/rc.d/rc.local. Additional information is available through man ipchains and the IPCHAINS-HOWTO. Both of these are included in most distributions. As noted throughout the text, the setup described is rather rudimentary and insecure. It will work reasonably well for a small network that is only connected to the Internet for brief periods. If you wish to have a permanent Internet connection, consider using a pre-configured firewall package or researching the topic further, as setting up a secure firewall is a complex task.
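As a sketch of that save-and-restore cycle (the file name is an assumption; use whatever location suits your distribution), the rules can be captured once and reloaded at boot:

```shell
# After setting up the chains by hand, save them:
ipchains-save > /etc/ipchains.rules

# Then, in a startup file such as /etc/rc.d/rc.local:
echo 1 > /proc/sys/net/ipv4/ip_forward        # re-enable forwarding
ipchains-restore < /etc/ipchains.rules        # reload the saved chains
```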
Steve Epstein is a technical editor/writer and broadcast engineer, with more than 20 years experience in electronic, mechanical and computer troubleshooting. Currently, he is technical editor of several magazines including Broadcast Engineering. He can be reached at
[Editor's Note: Steve Epstein was also one of the two technical editors of Samba Unleashed, and did an excellent job catching my technical booboos and suggesting better ways to present some of the technical material.]

Getting Answers Online

By Steve Litt
The "Linux user community" won Infoworld's 1997 Best Technical Support award. Not Microsoft, or Sun, or a huge outsource helpdesk organization. How can a bunch of users beat out the biggest and best in the tech support world?

It's entirely too easy to do a better job of tech support than commercial vendors do. As a matter of fact, most commercial tech support borders on pitiful. Look back on your own experience (or your tech employees' experience if you're in management). You wait for an hour on hold, during which time you're repeatedly cautioned to "have your credit card ready". You're finally greeted by an obsequious clerk with half your knowledge whose job seems to be to belittle callers. After 5 minutes the guy realizes you really do know more than he does, and he passes the buck ('scuse me, escalates you) up to the next level of tech support, probably with another stint on hold, repetition of the symptom description, and another belittling session. This continues until at some point somebody takes your number and tells you the guy who *really* knows the answers will call you.

But you don't receive the call, because they're in California and you're in Boston, and they call you at 8:30pm your time. You call back at 8am the next morning, but they're still fast asleep. Finally you talk to the real guy, a developer, who assures you it's a known problem, will be solved in the next release, and gives you a rough idea how to work around it for now.

Contrast this experience with fixing a Samba problem. You do your best to narrow the scope of the problem. You create a tiny smb.conf file that produces the symptom. You find anything in the log that sheds light. You do a little preliminary troubleshooting. Then you report it on the Samba mailing list, and if you've done your job well in the preliminary work, someone will likely report a solution within a few hours.

But what if it's a bug, as in the previous example of commercial tech support? You've got the source code, and it's much easier than you'd think to fix it in source code. Then you email in the solution as a patch, and *everyone* can use your solution. You're famous!

Getting the Most Out of the Open Source Support Model

Not everyone gets such great results from mailing lists. Some get no result at all. What factors account for that discrepancy?

The difference, in a word, is respect. Respect for the technologist's time. For his or her intelligence. For his or her finances. For his or her humanity. Until you've been on the receiving end of disrespectful help requests, you can't imagine how aggravating some help requests can be.

If any of this applies to you, please take it as the most gentle of criticism. I'm not calling you names, I'm giving you very easy advice that will help skyrocket your effective use of online support.

Respect! It's easy to understand. But it can also be elusive, especially if the person making the help request doesn't understand the situation of the person on the other end of the email. Take me, for example...

Until a year ago, I answered almost every emailed help request received by Troubleshooters.Com. Those days are gone. Today, if the person gives a woefully inadequate symptom description, I trash it. I don't have time for a detailed email interview. Conversely, if the person gives me a huge oration instead of a concise symptom description, it's trashed. If I had voluminous time to read, I'd read Gone With The Wind. Then there are the guys emailing me long lines of font-filled HTML email that gets garbled in quoted reply text, requiring switching back and forth between the original and the reply. No time for that.

It's simple. The technologist on the other end of your email considers himself a fun-loving technologist. Not a slave. Not a psychic. And *certainly* not cheap labor. Sometimes the best way to tell how to do something is to show how not to do it. I therefore present the Open Source Support Dis List.

Open Source Support Dis List

Here are some common mistakes in online help requests. If you've never been on the receiving end of such requests, they may not seem like mistakes. But when you consider that the person on the other end may field several hundred emails a day, and is contributing his time to the community instead of earning high consulting fees with that same time, it becomes apparent why these mistakes lead to stony silence on the other end.

This is a list of these mistakes, and some internal dialog a harried technologist might think (think, not say) as he or she reads the email. These are phrased rather harshly, but please don't take them personally. The purpose here is to reveal what it's like to encounter these common mistakes in several of your several hundred emails per day:

Respect, and You Shall Receive

It's pretty simple. All that's required for success in the "Open Source Support Model" is respect. It's easy once the support person's situation is understood. He's got expertise worth a big salary, he's got the ear of several hundred people per day, and he's got very little time. He appreciates those who take the time to formulate complete yet concise symptom descriptions, do the Troubleshooting spade work, and create a short and simple reproduction example or sequence. Do that, and you'll receive your answer (probably several) in a very timely fashion.

And no little voice in your ear saying "There are only three ahead of you. Your business is very important to us. Please hang on."

Steve Litt can be reached at Steve Litt's email address.

"But We Can't Find Linux Talent"

By Steve Litt
The old "human capital" argument. "Sure, we'd like the superior Linux operating system, but we can't find people to support it". It sounds so valid in these days when knowledgeable technologists often command six-figure salaries and still jump ship every year or so. But this "human capital" argument, so exploited by Microsoft and the Windows retroculture, has some basic logic flaws.

Aren't Linux Technologists in Short Supply?

The first logic flaw is the supposition that you'll need as many Linux technologists as you needed Windows technologists. That's not true -- it's not even close. For many reasons, Linux technologists are much more productive than their Windows counterparts.

The most obvious reason is that Windows malfunctions regularly, for absolutely no reason. The old steam locomotives required a fireman to constantly throw coal into the boiler. In the same way, Windows systems require an administrator to constantly repair strange problems that result from continued operation: memory leaks, DLL fragments, etc. Contrast that with Linux admins, who are needed only for major configuration changes, incredibly stupid user errors, or natural disasters.

As if the Windows system's flakiness weren't enough reason for a reduction in Windows technologist productivity, there's also the problem of the supply of Windows technologists. Sure, there are a lot of MCSE's (Microsoft Certified System "Engineers"), but many are "paper MCSE's": inexperienced people who have proven they can pass a test but can't fix systems. In fact, word on the street is that Linux, UNIX and BSD administrators, with their superior knowledge of networking, are often in a better position to solve Windows problems than the paper MCSE's.

So you can get along with far fewer technologists in a Linux environment, greatly reducing the impact of the Linux expert shortage. But what about that shortage?

Home Grown Linux Experts

If you perceive a Linux technologist shortage, you're far-sighted. Today, right in your organization, you have talented, tried and true technologists. Many of them *hate* Microsoft, and would jump at the chance to work in Linux. Let em!

But oh, my, what about the training costs for Linux?

Is Windows training so cheap? Every year Microsoft comes out with newer, incompatible technology, each with its own exceptions, workarounds, incantations and rain dances. Every year your technologists need training on changes. Considering that Windows technologists must relearn everything every couple years, it's hard to identify an "investment" in Windows expertise. Turn around, and the "investment" is gone.

Contrast this to Linux, which has as its foundation the thirty year old UNIX system. In Linux, training is an investment, not an ongoing cost center.

NOTE: I can hear the Windows retroculture sycophants digging into the preceding statement about a 30 year old foundation. Give it up guys, I said *foundation*. You can put a modern central air conditioning system in a 16th century castle. You can put the latest sound, video, and everything else into UNIX. It's the *foundation* that's old (and sturdy). The feature set is state of the art. And it doesn't blow away in the first wind.

Let's take it a step farther. Suppose for a minute that the hordes of paper MCSE's really did represent an "investment". You could quickly and cheaply duplicate that in the Linux world. Find the in-house technologists who are ready, willing and able (chomping at the bit might be a more apt description) to switch to Linux. Give each a hand-me-down desktop and a $2.00 Mandrake CD from Cheapbytes or LinuxCentral, and watch em quickly learn Linux. If you want to speed up the process, get them a copy of my book, "Rapid Learning: Secret Weapon of the Successful Technologist", which describes self-guided learning by experimentation.

But isn't learning by experimentation slower than classroom learning?

No. Classroom learning is necessitated primarily by proprietary technologies that cost thousands just to get in the game; Linux and most software running on it are free, so that barrier is gone. Classroom learning is also necessitated by technologies for which there's little useful documentation. With certain ERP apps, you must prove you've been using the technology just to get into the user group. With Open Source software, absolutely anybody can join a software project's mailing list and see the thoughts and decisions of the software's developers.

Using experimentation and other techniques outlined in "Rapid Learning: Secret Weapon of the Successful Technologist", within a day the technologist can achieve the proficiency bestowed by three days of classroom training. Within 3 days he or she can be truly useful to the organization in that technology. With continued use and experimentation he or she will become an acknowledged guru. You didn't spend a dime on classroom courses, airline tickets, parking, or anything else. Oh yeah, you spent $2.00 on a CD and the salvage cost of an old PC.

But what if you're a tiny company and you don't have any in-house technologists to train? Where could you possibly find a Linux expert?

Want Linux Genius? Go to the local LUG!

If you're in Central Florida, come to a Linux Enthusiasts and Professionals meeting and I'll introduce you to at least 10 people who can walk into your organization, and improve your organization's automation manifold. They'll do it with Open Source software, starting with the Linux (or possibly BSD) technology. You'll find top notch developers, Ecommerce veterans (remember, Ecommerce started on UNIX), and ninja administrators.

Don't live in Central Florida? All LUGs are equally stocked with Linux expertise. Go on the web and search for a local LUG. Every major city has a LUG. Don't assume the LUG members are out of your price range. Many are so anxious to bail from the legacy Windows system as to moderate their compensation demands. And once again remember that Linux technologists are likely to be much more productive than equivalent Windows talent.

Don't Forget Outsourcing

Many vendors offer outsourcing. Start by rounding up the usual suspects: Red Hat, Linuxcare, VA Linux and the like. Then go for those less well known.

Summary: The "Human Capital" Gambit

There was a time the Windows retroculture espoused their "superior quality". We saw the emperor had no clothes and they switched to "innovation". Once again, the emperor was seen naked, and now they discuss the shortage of "human capital". Well obviously, with the falling but still powerful empire sucking up immense resources implementing their workarounds, there's a human capital shortage. The more Microsoft systems you have, the more human capital you're short.
Steve Litt is the author of "Rapid Learning: Secret Weapon of the Successful Technologist". He can be reached at Steve Litt's email address.

Linux Log: We're the Ambassadors

Linux Log is now a regular column in Troubleshooting Professional Magazine, authored by Steve Litt. Each month we'll explore a facet of Linux as it relates to that month's theme.
I hope I didn't bore you with the contents of this month's issue. As an experienced Linux user, you already know everything I said. I wrote this information for what are commonly called "newbies".

Writing info for Linux "newbies" is vital, because that's how Linux will grow to dominate the market. And once market domination is achieved, hardware vendors will ship Linux drivers with their hardware. And "development environments" will run on Linux. And we'll have a choice of office suites to run on our always-up Linux boxes.

Don't count Microsoft out. Even now, as they slide down the slippery slope toward oblivion, Uncle Billy is up to his old tricks, creating a new programming language called C# with which to build apps of the future. Although Billy's spokesman bills C# as interoperable across platforms, I predict that it will encapsulate enough Microsoft proprietarisms that it will never see the light of day on non-Windows machines. Now here's the kicker. If Microsoft can create a new version of Office, in C#, before the government busts them up, then it will be very difficult for the new M$ Applications Company to port Office to other platforms. I have no more of a crystal ball than anyone else, but I believe this is Uncle Billy's plan.

We, the Linux User Community, must continue the process of propelling Linux over the top for one more year. If we do everything right, Microsoft's toast. If we stumble (and we haven't stumbled so far), Linux is toast, and Microsoft will survive another couple years until yet another great technology displaces their competitor-clobbering kludge.

To be blunt, we need to be VERY NICE to Linux newbies and wannabes. "Read the Fantastic Manual" is not an appropriate response, even to a question we know to be gratuitously lazy. We need over 50% of the OS market, which means we need these newbies. We need to let them know our community is the better community. We need to write good documentation for them to follow. And most of all, we must accept them, and not hold their recent Windows citizenship against them.

You may notice the preceding paragraph seems to conflict with the article entitled "Getting Answers Online" earlier in this magazine. That's true. In "Getting Answers Online" I pointedly asked people to respect the people on the mailing list by not asking gratuitously lazy questions. For the next year or so, we need to answer those questions nicely (Linux questions, not Windows questions). And when we get an inadequate or overly wordy symptom description, a failure to encapsulate the reproduction environment, or questions about something answered hundreds of times, we need to *nicely* explain to the users how to do the spade work themselves next time.

If, during the next year, we continue to attract interest to Linux, Microsoft will move from an empire to a JAV (Just Another Vendor). They might even die (you heard it here first). You can't long feed a five-figure payroll with nothing coming in. If they have something to offer, it's certainly not in the areas of quality or innovation. They may not be able to make the transition to competing on a level playing field. And wouldn't it be nice not to have those guys around any more :-)

So let's all be really nice to the newbies and wannabes. Invite them into our culture and our community. Celebrate our diversity. They bring a lot of corporate correctness to the table. Add that to our ability to create a quality, reliable app, and we've really got something.

So our job for the next year or so is to be ambassadors. To the extent that we're good ones, we'll be highly respected in the new era of software just around the corner. The era when "you can't get fired for buying Open Source".

Steve Litt is a member of Linux Enthusiasts and Professionals of Central Florida (LEAP-CF). He can be reached at Steve Litt's email address.

Letters to the Editor

All letters become the property of the publisher (Steve Litt), and may be edited for clarity or brevity. We especially welcome additions, clarifications, corrections or flames from vendors whose products have been reviewed in this magazine. We reserve the right to not publish letters we deem in bad taste (bad language, obscenity, hate, lewd, violence, etc.).
Submit letters to the editor to Steve Litt's email address, and be sure the subject reads "Letter to the Editor". We regret that we cannot return your letter, so please make a copy of it for future reference.

How to Submit an Article

We anticipate two to five articles per issue, with issues coming out monthly. We look for articles that pertain to the Troubleshooting Process, or articles on tools, equipment or systems with a Troubleshooting slant. This can be done as an essay, with humor, with a case study, or some other literary device. A Troubleshooting poem would be nice. Submissions may mention a specific product, but must be useful without the purchase of that product. Content must greatly overpower advertising. Submissions should be between 250 and 2000 words long.

By submitting content, you give Troubleshooters.Com the non-exclusive, perpetual right to publish it on Troubleshooters.Com or any A3B3 website. Other than that, you retain the copyright and sole right to sell or give it away elsewhere. Troubleshooters.Com will acknowledge you as the author and, if you request, will display your copyright notice and/or a "reprinted by permission of author" notice. Obviously, you must be the copyright holder and must be legally able to grant us this perpetual right. We do not currently pay for articles.

Troubleshooters.Com reserves the right to edit any submission for clarity or brevity. Any published article will include a two sentence description of the author, a hypertext link to his or her email, and a phone number if desired. Upon request, we will include a hypertext link, at the end of the magazine issue, to the author's website, providing that website meets the Troubleshooters.Com criteria for links and that the author's website first links to Troubleshooters.Com. Authors: please understand we can't place hyperlinks inside articles. If we did, only the first article would be read, and we can't place every article first.

Submissions should be emailed to Steve Litt's email address, with subject line Article Submission. The first paragraph of your message should read as follows (unless other arrangements are previously made in writing):

I (your name), am submitting this article for possible publication in Troubleshooters.Com. I understand that by submitting this article I am giving the publisher, Steve Litt, perpetual license to publish this article on Troubleshooters.Com or any other A3B3 website. Other than the preceding sentence, I understand that I retain the copyright and full, complete and exclusive right to sell or give away this article. I acknowledge that Steve Litt reserves the right to edit my submission for clarity or brevity. I certify that I wrote this submission and no part of it is owned by, written by or copyrighted by others.
After that paragraph, write the title, text of the article, and a two sentence description of the author.

URLs Mentioned in this Issue