Troubleshooting Professional Magazine
Making it in a Post Microsoft World, Part II: Troubleshooting Linux
Repairing Windows was similar to repairing those ancient vacuum tube radios. With tube radios, you removed all the tubes, went to the drugstore, and tested them on a tube tester. If any were bad you replaced them and tried again. If you had a tube radio with all good tubes that still didn't work, you threw it away. The wiring was a mess, the components were low quality, and modularity was nonexistent.
Tube radios are a relic of history, and Windows isn't far behind.
Now comes the new operating system, Linux. Windows was like a spaghetti wired tube radio, and Linux is like modern electronic equipment sporting removable circuit boards. It's modular, well laid out, and very well featured. Every adjustment (configuration change) can be made with a text editor, although there are tools to automate configuration changes.
Lacking the intermittence and non-documentation of Windows, Linux requires a different Troubleshooting Process, which might look something like this:
It's precisely this responsibility that causes those from the Windows retroculture to say "Linux is harder to work with". They're absolutely right. I could "troubleshoot" tube radios at the age of 12. I had to wait till my mid 20's before I had the knowledge and maturity to troubleshoot modular solid state equipment. But with the latter, I had about a 99% success rate. Tube radios were throwaway items -- kind of like Windows installations.
After so many years of the incantations and rain dances we used to keep those old Windows systems running, it's refreshing to have high quality, modular and repairable Linux systems. This issue of Troubleshooting Professional Magazine discusses the tactics and strategies of troubleshooting Linux systems. "Enhancement" and "configuration" might be a better description than "Troubleshooting", because once working, Linux systems seldom "screw up".
Guest author Steve Epstein makes his Troubleshooting Professional debut with his great article, "Using Ipchains", which demystifies much of this important security tool. Thanks Steve!
The July 2000 issue is actually part 2 of the "Making it in a Post Microsoft World" series. So kick back, relax, and read this issue. You'll find info to make your Linux config/troubleshooting straightforward. And remember, if you're a Troubleshooter, this is your magazine. Enjoy!
You don't need to take my word for this. See what Gerald Nadler and Shozo Hibino say in the first few sentences of the fourth chapter of the book "Breakthrough Thinking" (ISBN 1-55958-004-6):
The UTP is a 10 step process optimized to quickly find the root cause of any reproducible problem in a well defined system (and Linux systems are almost always well defined). I'd highly suggest using the UTP to solve computer problems. Here are the ten steps of the Universal Troubleshooting Process:
But it's also vital that no additional problems were created. Parts of the system likely to have changed are investigated to see whether they in fact did. Note that checking for new problems implies adequate completion of step 4, Reproduce the symptom. Without a baseline of the system's performance, it's impossible to determine whether new problems were produced.
The third and fourth questions tackle the problem of "coathanger solutions", named after the practice of hanging a car's muffler with coathangers instead of replacing the bolts. "Coathanger solutions" are solutions which strongarm the symptom into submission, but leave the root cause intact. The usual outcome of a coathanger solution is a side effect problem.
Here's an example. Imagine that an incorrect parallel port setting makes Charlie Coathanger's printer incredibly slow when printing graphics. Charlie doesn't take the time to find the root cause (the incorrectly set parallel port); instead he "fixes" the problem with a software tweak that prints much coarser (fewer bytes) graphics. The symptom is gone. Charlie's print jobs now print fast. But he's created the side effect of incredibly coarse and ugly graphics.
Note that the last two testing questions cannot be answered for problems fixed in step 5 (General Maintenance). In most cases, that's a legitimate tradeoff for Troubleshooting speed. However, in safety critical systems, step 5 should include no steps that would compromise answering the last two questions of step 8. In fact, I often recommend against general maintenance in safety critical systems. If you absolutely must troubleshoot to the root cause, skip step 5 (except for its observational component).
The Universal Troubleshooting Process is valid for any well defined system. Most of the UTP is constant over all systems. The only thing that changes from system to system is that entity often called "subject matter expertise". I just call it the Mental Model. Read on...
All gone. Linux is here, and almost all of its configuration is contained
in text files. And almost all those text files are in the /etc
tree. Most of the major configuration files are directly in the /etc directory,
but before discussing those let's list some of the major subtrees of the
/etc tree:
/etc
|-- X11              <Systemwide GUI config>
|-- cron.d           <some cron config, added to /etc/anacrontab>
|-- cron.daily       <daily cron jobs>
|-- cron.hourly      <hourly cron jobs>
|-- cron.monthly     <monthly cron jobs>
|-- cron.weekly      <weekly cron jobs>
|-- httpd            <all apache config goes in this tree>
|-- isdn             <isdn line config>
|-- logrotate.d      <config for rotating logs, restarting, etc>
|-- mgetty+sendfax   <config for virtual terminals, faxing, and voice>
|-- news             <config your inn newsreader here>
|-- pam.d            <configure logon security for various services here>
|-- ppp              <configure your dialup connection here>
|-- profile.d        <aliases and other systemwide profile config here>
|-- rc.d             <system and daemon startup/shutdown configuration>
|   |-- init.d       <contains start/stop scripts for all services>
|   |-- rc3.d        <Level 3 (console) startup config via symlinks to init.d>
|   `-- rc5.d        <Level 5 (gui) startup config via symlinks to init.d>
|-- sane.d           <scanner configuration>
|-- security         <some general computer access configuration>
|-- skel             <pattern for individual user configurations>
`-- sysconfig        <some fundamental system config, esp. network, power mgmt, etc>
    `-- network-scripts  <these are the configurable scripts that govern the network>
Running LILO against an incorrect lilo.conf file can leave the system unable to boot. There are ways to recover and fix the problem, but none of them is easy.
Always be careful when editing lilo.conf and/or running LILO.
You can find out more about lilo.conf and LILO as follows:
man -a lilo
man -a lilo.conf
You can find out more about fstab with:
man -a fstab
You can find out more about resolv.conf with the following command:
man -a resolver
http://www.troubleshooters.com:80

The preceding URL comes in at port 80, which is the default port for HTTP. With my present web host, trying to come in at port 8080 would fail. However, many web hosts put HTTP at 8080, the customary port for web caching services.
Most of the ports in this file are "standards", so it's unwise to change them without a good reason.
You can find out more about the services file with the following command:
man -a services

The services listed in /etc/services are available for the inetd daemon as configured by /etc/inetd.conf. Read on...
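Each line of the services file pairs a service name with a port/protocol, so it can be picked apart with standard tools. A minimal sketch, using a sample line in the same format as the real file (the line itself is illustrative, not read from your system):

```shell
# A sample line in /etc/services format: name, port/protocol, aliases, comment.
line="http            80/tcp          www www-http    # WorldWideWeb HTTP"

# Pull out just the port number from the second field.
port=$(echo "$line" | awk '{split($2, a, "/"); print a[1]}')
echo "$port"
```

The same awk one-liner works against the real file if you feed it `grep -w http /etc/services` instead of the sample line.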
This file has security implications because it determines what services run, and whether they are wrapped in TCP wrappers (tcpd). For instance, if you start up telnet in inetd.conf (and this is the default), anyone with an account can remotely telnet into your system and act like he's on a terminal. If you're Internet connected, you probably do not want this. If you're an ISP, you for sure don't want this, because instead you'll use secure shell (ssh) so that passwords aren't sent in the clear over the net.
You can read more about inetd.conf with the following commands:
man -a inetd.conf
man -a inetd
less /etc/profile
account:password:UID:GID:GECOS:directory:shell
FIELD | USAGE |
account | This is the username with which the person logs on. |
password | On systems with shadow passwords, which are more secure, this field contains only a single lowercase x, and the encrypted password is kept in a file called /etc/shadow. On systems without shadow passwords, this field contains an encrypted version of the password. Careless edits to this field can result in a failure to log on. |
UID | A numeric identifier for this account. There's a one to one relationship between account and UID. This is a "primary key" used to relate this file to other files and processes. |
GID | The numeric group identifier of the group serving as this account's primary group. The actual group name and other info about this group is looked up in /etc/group by this number. Note that an account can be a member of several groups, but only the primary group is identified in the passwd file. The other groups which this account is a member is identified in /etc/group. |
GECOS | This is a spare field, often containing the same text as the account field or some other informational text. Linux, UNIX and BSD have no *official* use for this field. |
directory | This is the home directory for this account. For most accounts it's a directory with the same name as the account, but below the /home directory. For the root account it's typically /root. But it can actually be anything. |
shell | This identifies the shell this user will use once authenticated. On Linux systems it's usually /bin/bash, but of course it could be /bin/csh, or any other shell. I have a menu program called UMENU, and in the past I've placed its startup command, /usr/bin/mm x into the shell field, after which when the user logged in he was immediately presented with the menu interface, and when he terminated the menu interface he was logged out. |
To find out more about this file, use the following command:
man 5 passwd

This file can be programmatically manipulated by the passwd, useradd, usermod and userdel programs. You can look up any of those programs with the man -a command, followed by the program name.
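Because the fields are colon delimited, a passwd line splits cleanly in the shell. Here's a sketch using a hypothetical entry (the sample line, UID and home directory are made up for illustration):

```shell
# A hypothetical /etc/passwd entry, in the format
# account:password:UID:GID:GECOS:directory:shell
entry="slitt:x:500:500:Steve Litt:/home/slitt:/bin/bash"

# Split on colons into the seven named fields.
IFS=: read account password uid gid gecos directory shell <<EOF
$entry
EOF

echo "account=$account uid=$uid shell=$shell"
```

Setting IFS only for the read keeps the GECOS field intact even when it contains spaces.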
man 5 shadow
group_name:passwd:GID:user_list
FIELD NAME | PURPOSE |
group_name | The name of the group, such as users. There's a 1 to 1 correspondence between group_name and GID. |
passwd | Occasionally a group has a password, although typically only accounts have passwords. If the group has a password, this field has an encrypted string. If the group has no password (which is typical), this field is blank. |
GID | The numeric identifier of the group. This is a "primary key" used to relate this file to other files and processes. |
user_list | This is a comma-delimited list of every user in this group. Note that these are listed by account name, *not* UID. An account need not appear here if the group is listed as that account's primary group in the /etc/passwd file. In addition to editing the file directly, you can manipulate the user list with the usermod program's -G option. The following puts umenu in the user_list field of the slitt and blitt groups:

usermod -G slitt,blitt umenu

You can then delete umenu from the slitt and blitt groups' user_list fields with the following command:

usermod -G "" umenu
|
For further information on the /etc/group file use the following command:
man -a group
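Because user_list is the fourth colon-delimited field, you can answer "which groups is this account in?" with a short awk filter. A sketch against hypothetical group data (the group names and GIDs below are examples, not from a real system):

```shell
# Hypothetical /etc/group-format data: group_name:passwd:GID:user_list
groupdata='slitt:x:501:umenu
blitt:x:502:umenu,slitt
users:x:100:'

# Print the name of every group whose user_list contains "umenu".
# The regex anchors on start-of-field or comma so "umenu" can't
# falsely match inside a longer account name.
in_groups=$(printf '%s\n' "$groupdata" |
  awk -F: -v user=umenu '$4 ~ "(^|,)" user "(,|$)" {print $1}')
echo "$in_groups"
```

Point the same filter at the real file with `awk -F: ... /etc/group` to check a live system; remember it won't show the account's primary group, which lives in /etc/passwd.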
This is a very complex file. You can read more about inetd.conf with the following commands:
man -a inetd.conf
man -a inetd
man 5 hosts_access
man 5 hosts_access
You can get a little bit of additional info here:
man -a resolver
Configuring named.conf file isn't simple. For more info:
man -a named.conf
order hosts, bind
multi on

The first line says that to resolve a name, first look in the hosts file (/etc/hosts), and then look in bind (DNS). The second line allows a single host in /etc/hosts to have multiple IP addresses.
mydesk.domain.cxm
man -a lmhosts
The preceding explanation is only partially accurate and barely scratches the surface. See the man page at
man -a nsswitch.conf

NEVER change this file without making a backup and documenting the change far and wide.
If you encounter a situation where Linux cannot find a file, nsswitch.conf is one of the many configurations to investigate. I'd personally start by walking the switch path. For instance, look at this line from my nsswitch.conf:
passwd:     files nisplus nis

So I'd look at /etc/passwd before I start looking in the nis systems.
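Walking the switch path can itself be scripted: strip the database name off the front of the line, then visit each source left to right, just as the resolver would. A sketch using the line quoted above:

```shell
# The nsswitch.conf line from the article: database, then lookup sources.
nss_line="passwd:     files nisplus nis"

# Everything after the colon is the ordered list of sources.
sources=$(echo "$nss_line" | cut -d: -f2)

# Consult each source in declared order -- the same order the
# C library's name service switch would use.
for src in $sources; do
  echo "consult $src"
done
```

For the passwd database, "consult files" means look in /etc/passwd first, exactly as described above.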
man -a smb.conf

Caution: the preceding man page is HUGE.
Unfortunately, many folks create their smb.conf files by modifying a
huge example file with voluminous comments and extraneous parameters that
already match the defaults. In my opinion you're much better off starting
from scratch, letting Samba's intelligent defaults do most of the work,
and making a small file.
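A from-scratch smb.conf really can be that small. Here's a sketch of one (the netbios name and workgroup match the examples discussed later in this article; the [homes] share and the /tmp path are illustrative assumptions, not recommendations):

```shell
# Write a minimal smb.conf, letting Samba's defaults do the rest.
# Written to /tmp here only so the sketch doesn't touch a live config.
cat > /tmp/smb.conf.minimal <<'EOF'
[global]
   netbios name = mainserv
   workgroup = MYGROUP
   encrypt passwords = yes

[homes]
   read only = no
EOF

cat /tmp/smb.conf.minimal
```

Every parameter you leave out falls back to Samba's built-in default, which you can inspect with `testparm -v`.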
NOTE: This file's location is dependent on distribution and Samba installation. On Red Hat and Mandrake, it's in /etc. On Caldera it's in /etc/samba.d. On a default source compile, it's located in /usr/local/samba/lib.
Here are some very common errors and adjustments for the smb.conf file.
If your Windows box keeps asking for a password and rejecting it, your [global] section's encrypt passwords= line is probably wrong. Here's the proper setting when Windows clients use encrypted passwords:
encrypt passwords=yes

And the following is proper if the clients don't use password encryption:

encrypt passwords=no

If some clients use encryption and others don't, you can use include statements to handle each properly, but that's beyond the scope of this article. See Samba Unleashed.
Another common problem is that Windows doesn't see the computer and/or its shares. If that happens, make sure you have the proper [global] section netbios name= and workgroup= settings. The following are appropriate for a Samba server whose host name is "mainserv", serving a Windows workgroup called "MYGROUP":
netbios name=mainserv
workgroup=MYGROUP

If your Windows clients still can't see it, and there are no other Windows machines acting as domain controllers or WINS servers (typically that role requires an NT or W2K box, so if you have only Win9x boxes on the LAN you probably have none), enable WINS service and domain control in the [global] section as follows:
wins support=yes
domain master=yes
domain logons=yes
os level=65
preferred master=yes

If, after adding the preceding, your network starts acting "flaky", back out those changes and start looking for a Windows box that is acting as a WINS server or domain controller.
man 5 smbpasswd

The executable is documented at:

man 8 smbpasswd

You can see them both by:
man -a smbpasswd
NOTE: This file's location is dependent on distribution and Samba installation. On Red Hat and Mandrake, it's in /etc. On Caldera it's in /etc/samba.d. On a default source compile, it's located in /usr/local/samba/private.
smbpasswd -a slitt

The preceding enables typing in a password for user slitt. Be sure to input the same password as user slitt uses on the Windows box. Note that there are other ways to maintain synchronicity between Windows clients and Samba servers, but they're beyond the scope of this article. See "Samba Unleashed".
man -a crontab
man -a termcap
The printcap file uses a fairly complex syntax, so many administrators change this file only with various tools such as Red Hat's printtool. The problem with such tools is they keep voluminous configuration information in comments formatted in a manner proprietary to the tool. That means that editing the file with VI or with a different tool can prevent further change via the tools.
My suggestion: If you're a Red Hat or Mandrake user, use printtool. Then, gradually, learn the syntax of the printcap file and the theory of printing so you're never at the mercy of a tool.
Voluminous documentation on the printcap file, including syntax standards, is here:
man -a printcap
Exploiting the differences amounts to observing differences, and either changing an aspect of the non-functional system to resemble the working system, or changing an aspect of the working system to resemble the non-functional system. In the former case, if the symptom goes away, I've found a cause. In the latter case, if the symptom appears in the working system, I've found a cause. From there it's necessary to determine whether that cause is the *root cause*, but that's usually easy.
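When the systems differ in their config files, a plain diff between the working and non-functional copies gives you the candidate causes directly. A sketch with made-up file contents (the filenames, settings and values are hypothetical):

```shell
# Hypothetical copies of the same config file from a working and a
# non-working system.
printf 'dns=on\nport=80\n'  > /tmp/working.conf
printf 'dns=off\nport=80\n' > /tmp/broken.conf

# Every line diff reports is a candidate cause worth testing, one at
# a time, by making the broken system resemble the working one.
if diff /tmp/working.conf /tmp/broken.conf > /tmp/confdiff.txt; then
  echo "no differences"
else
  echo "candidate causes:"
  cat /tmp/confdiff.txt
fi
```

Changing one differing setting at a time, and retesting the symptom after each change, is what turns the diff output into a cause.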
Another use of the temporary distro partition is to exploit the "smartness" of the installation program. With its hardware detection and smart defaults, the installation program points up exactly how to accomplish various configurations.
Installation takes roughly 30 minutes. Most of that time is non-attended, meaning you can do other work on a different box during that time, but it's still time consuming. So every time I use it I document my hard-earned knowledge so I won't need to do this next time. Always be careful when installing distros not to make your system non-bootable. Also be very careful not to format any partitions with existing data. Be *very* careful when using fdisk or any other partition manager.
But they're wrong about one thing. I didn't learn the value of fresh, clean installs in the Windows world. I learned it in 1984, before Windows existed.
I was one of four programmers for a medical management package running on the TSX multiuser environment on top of RT11 running on DEC PDP11's. I want to repeat that a different way. An installation consisted of installing RT11 on a PDP11, then TSX on top of RT11, then our package on top of TSX. Things were not going well.
The boss cornered another programmer and myself in the hall, and reminded us that 90% of the installations were malfunctioning, and we were spending all our time putting out fires, and "if you can't get our software to work, there's not much reason for you to be working here". This was one of the nicest bosses I've ever had, but his meaning was clear enough. I think he told this to me because he knew I was the best Troubleshooter in the organization, and if anyone could solve the problem I could.
And I did. I quickly determined that the problem was that every single installation was a conglomeration of whatever RT11 tape the system rep happened to have, and whatever TSX configurations he or she happened to institute, and a medical management package consisting of whatever versions of various components he or she could beg, borrow or steal from other customers. Can you spell "segfault"?
So I took a snapshot of the latest known good version of our software, OS and environment, and put them all on tape. I duplicated the tapes, gave them to all the system reps, and got management to declare that all installs must be off those tapes, and any malfunctioning systems not originally installed by a standard tape must be installed clean with a standard tape. Problems went down to a manageable level, to the point where almost all problems were user error.
Never underestimate the value of a known state! If you don't know the state of your system, you don't have a well defined system. And if you don't have a well defined system, it is not mathematically certain that you can solve reproducible problems in that system.
As a practical matter, because Linux's configuration is overwhelmingly observable in editable text files, its state is always reasonably deducible. There's some pretty good justification for upgrading rather than installing fresh. Nevertheless, I *strongly* recommend taking steps to at least retain the possibility of making fresh installs. That way you can install clean on those rare occasions when you find your system in a hopelessly unknown state.
The most important step is to separate data (stuff you can't buy or download in a convenient fashion) in a separate tree from software. I recommend all important data be under a /d directory, and that everything under /home be considered temporary. The /home directory acquires too much junk and compilation scraps to serve as a good data store.
The second most important thing is to document everything you install on the system. Did you install it with .deb, .rpm, source compile, or something else? Where's the installation file now? (Keep installation files so you can do fresh installs to the same state.) Did you use any special compile options or rpm options? What directory was it installed in? It takes 5 minutes to document such things, and it can save days of fooling around trying to return the system to its former state. It can also save many sleepless nights.
Because every single one of my friends recommends upgrade over install, I'll recommend it too. Most of the time. But be sure you leave the door open to install from scratch if your system departs too far from a known state.
find / -type f | grep -iv "^\/d\/" | grep -v "^\/home\/" | grep -v "^\/proc\/" | xargs grep -il 192.168.100 > oldsubnet.txt
The find command produces a list of all files in the root tree. The first grep statement filters out everything in the /d tree. The second grep statement filters out all files in the /home tree, which can be assumed to be data or temp stuff rather than network config info, and the third grep statement filters out everything in the /proc tree, which obviously doesn't contain config data. The xargs statement searches all remaining files for the string "192.168.100", and reports on any which match.
Here's the list this command produced on my desktop computer, that's
at 192.168.100.10:
/etc/sysconfig/network-scripts/ifcfg-eth0
/etc/hosts
/etc/isdn/profile/ippp.default
/etc/smb.conf
/etc/named.conf
/etc/resolv.conf
/etc/resolv.conf.lan
/etc/resolv.conf.bup
/tmp/kfm-cache-502/959437415.5
/var/lib/slocate/slocate.db
/var/log/messages
/var/log/secure
/var/log/httpd/error_log
/var/log/httpd/access_log
/var/log/httpd/access_log.1
/var/log/httpd/error_log.1
/var/log/samba/log.mydesk
/var/log/samba/log.wincli
/var/log/samba/log.nmb.1
/var/log/samba/log.nmb.2
/var/log/samba/log.nmb.3
/var/log/samba/log.nmb.4
/var/log/messages.1
/var/log/secure.1
/var/log/messages.2
/var/log/secure.2
/var/log/messages.3
/var/log/secure.3
/var/log/messages.4
/var/log/secure.4
/var/lock/samba/STATUS..LCK
/var/spool/mail/slitt
/var/named/slave.192.168.100
/var/named/slave.domain.cxm
/root/auto_inst.cfg.pl
/root/.bash_history
/root/.vnc/mydesk.domain.cxm:1.log
/root/.vnc/mydesk.domain.cxm:2.log
/root/.vnc/core
/root/messages
/root/oldsubnet.txt
/usr/doc/squid-2.2.STABLE5/FAQ-19.html
/usr/doc/squid-2.2.STABLE5/FAQ.sgml
/usr/doc/isdn4net-1.4.1/defaults/ippp.default
/usr/doc/isdn4net-1.4.1/samples/ippp/proto/dialin
The list confirms what we suspected all along. Everything outside the /etc tree mentioning this subnet is a log, documentation, or core dump. Well, except for the following:
/var/lock/samba/STATUS..LCK
/var/spool/mail/slitt
/var/named/slave.192.168.100
/var/named/slave.domain.cxm
/root/auto_inst.cfg.pl

The lock file is probably something that can be deleted. The /var/spool/mail/slitt file is actually a mailbox file, and the IP is mentioned in an email, so it's not applicable. The two in /var/named are part of the DNS system, and *absolutely* must be changed. The Perl file looks like it might be part of another configuration tool. It should be backed up in case it's needed, and possibly changed to accommodate the new subnet.
Note that this command takes several minutes to run on my dual Celeron 450 with 512 meg of RAM, so it's a command you'll want to run just before going to lunch or going home. But it's the best method of finding *every* file that could conceivably affect IP addressing on the system.
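The same pipeline can be parameterized and pointed at a small tree for a quick test before you commit to the multi-minute run over /. A sketch (the demo directory, filenames and contents below are fabricated for illustration):

```shell
# Subnet to hunt for, and a small demo tree standing in for /.
subnet="192.168.100"
tree="/tmp/subnet-demo"

# Fabricate a tree: one file that mentions the subnet, one that doesn't.
mkdir -p "$tree"
echo "IPADDR=192.168.100.10"   > "$tree/ifcfg-eth0"
echo "nothing relevant here"   > "$tree/README"

# Same idea as the full command: list files, grep each for the subnet,
# record the names of files that match.
find "$tree" -type f | xargs grep -il "$subnet" > /tmp/oldsubnet.txt
cat /tmp/oldsubnet.txt
```

Once you're satisfied the filters do what you expect, swap "$tree" for / and add back the /d, /home and /proc exclusions.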
Then there are one-way tools, like Red Hat's printtool. Printtool always writes legal /etc/printcap files, but it cannot read an arbitrary legal printcap file created with an editor. This is because printtool depends on some of its config info being written into comments within the printcap file.
One-way tools can still be useful, even after the configuration file has been hand-edited. Take printtool for example. Let's say you want to add a network printer with an HP Laserjet IIID filter. Simply rename the existing printcap file, run printtool, and create the new printer definition for the IIID. Now rename the newly created printcap, rename the original back, and use VI cut and paste to copy the new printer definition into the existing printcap.
Here are some of the best known configuration tools:
PROGRAM NAME | USER INTERFACE | DISTRO | COMMAND TO START | COMMENTS |
Linuxconf | Curses, GUI | Redhat derived | linuxconf | Most system configuration can be done from here |
Drakconf | GUI | Mandrake | DrakConf & | Configure X, resolution, users, security level, startup services, keyboard choice, rpm packages (Kpackage), linuxconf entry point, hardware, network, and printers |
Control Panel | GUI | Redhat derived | control-panel & | This program changes the current state of the operating system. |
Netconf | Curses, GUI | Redhat derived | netconf | Network part of Linuxconf |
Userconf | Curses, GUI | Redhat derived | userconf | User maintenance part of Linuxconf |
FSconf | Curses, GUI | Redhat derived | fsconf | Filesystem part of Linuxconf |
Printtool | GUI | Redhat derived | printtool & | Front end to the /etc/printcap file |
Xconfigurator | Curses | Most distros | Xconfigurator | Front end to /etc/X11/XF86Config |
XF86Setup | GUI | Most distros | XF86Setup | Front end to /etc/X11/XF86Config |
SWAT | Browser | With Samba | Browse to serverIP:901 | Front end to smb.conf |
wmconfig | ? | ? | wmconfig | Create menu system for your window manager |
lisa | Curses | Caldera | lisa | Old but excellent Caldera config program |
COAS | GUI | Caldera | From menu system | New Caldera configuration system |
kpackage | ? | With KDE | ? | KDE interface to Red Hat rpm files |
dpkg | Curses | Debian | dpkg | Very low level .deb package management |
apt-get | Curses | Debian | apt-get | Higher-level .deb package management, complete with Internet retrieval and sophisticated dependency handling |
rpm | Teletype | Caldera, Redhat derived | rpm (lots of options) | Command line .rpm install, upgrade and uninstall utility |
ifconfig | Teletype | Linux | ifconfig (lots of options) | Look at and configure network interfaces |
ipchains | Teletype | Linux | ipchains (lots of options) | Look at and configure IP forwarding, IP masquerading, IP firewalling, and various packet level security. |
route | Teletype | Linux | route (lots of options) | Look at and modify the routing tables. Often included in boot scripts |
comanche | ? | With Apache | ? | GUI front end to /etc/httpd/conf/httpd.conf |
KDE user mount tool | GUI | With KDE | ? | Mounting drives and partitions |
pppconfig | ? | Debian | ? | Configure ppp |
KDE Menu Editor | GUI | With KDE | kmenuedit -caption Menu Editor -icon kmenuedit.xpm -m | Edit the KDE menu |
Nowadays I troubleshoot computers, and I need to measure RAM, CPU, cache, disk usage, network connectivity and packets. Lucky for me, all the tools come on that little Linux install CD I get for $50.00 (or $2.00 from LinuxCentral or CheapBytes).
Most computer measuring tools are software. Once you've gotten past
the Power On Self Test (POST), you can pretty much measure everything with
software.
Every single Linux system has these tools. They're free with Linux. Use them early and often.
Every one of these runs in a teletype interface, meaning they can be accessed on any Linux system, no matter how simplistic.
Here are some quick peeks at your system's hardware and low level system software.
[slitt@mydesk /proc]$ cat /proc/interrupts
           CPU0       CPU1
  0:     403992     404595    IO-APIC-edge  timer
  1:      21532      21535    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 10:       6831       7129   IO-APIC-level  eth0
 11:          0          0   IO-APIC-level  es1371
 12:       8954       9243    IO-APIC-edge  PS/2 Mouse
 13:          1          0          XT-PIC  fpu
 14:      77266      69593    IO-APIC-edge  ide0
 15:          7          2    IO-APIC-edge  ide1
NMI:          0
ERR:          0
[slitt@mydesk /proc]$
[slitt@mydesk slitt]$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 6
model name      : Celeron (Mendocino)
stepping        : 5
cpu MHz         : 451.029273
cache size      : 128 KB
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr
bogomips        : 448.92

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 6
model name      : Celeron (Mendocino)
stepping        : 5
cpu MHz         : 451.029273
cache size      : 128 KB
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr
bogomips        : 450.56
[slitt@mydesk slitt]$
[slitt@mydesk /proc]$ cat /proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
02f8-02ff : serial(auto)
0376-0376 : ide1
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial(auto)
a400-a43f : es1371
f000-f007 : ide0
f008-f00f : ide1
e083b000-e083b01f : Intel Speedo3 Ethernet
[slitt@mydesk /proc]$
[slitt@mydesk /proc]$ cat /proc/modules
nfs        31960  1 (autoclean)
lockd      33736  1 (autoclean) [nfs]
sunrpc     57732  1 (autoclean) [nfs lockd]
slhc        4408  0
eepro100   14184  1 (autoclean)
vfat       11004  0 (unused)
fat        33120  0 [vfat]
es1371     29412  0
soundcore   4164  4 [es1371]
[slitt@mydesk /proc]$
[slitt@mydesk slitt]$ cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  529518592 103690240 425828352 93466624  9449472 45191168
Swap: 133885952        0 133885952
MemTotal:    517108 kB
MemFree:     415848 kB
MemShared:    91276 kB
Buffers:       9228 kB
Cached:       44132 kB
BigTotal:         0 kB
BigFree:          0 kB
SwapTotal:   130748 kB
SwapFree:    130748 kB
[slitt@mydesk slitt]$
The total memory minus the used memory equals the free memory. Of the used memory, the preceding command shows 91276 kB as shared memory, 9228 kB as buffers, and 44132 kB used as disk cache.
The swap total is the size of your swap partition(s).
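The total-minus-used-equals-free relationship can be checked directly with shell arithmetic on the byte counts shown above:

```shell
# The byte counts from the Mem: line of /proc/meminfo above.
total=529518592
used=103690240

# Free memory is simply what's left after subtracting used from total.
free_calc=$((total - used))
echo "$free_calc"   # matches the 425828352 bytes reported as free
```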
[slitt@mydesk /proc]$ cat /proc/partitions
major minor  #blocks  name

   3     0   20044080 hda
   3     1      32098 hda1
   3     2          1 hda2
   3     5     136521 hda5
   3     6    1831378 hda6
   3     7    6032376 hda7
   3     8    3076416 hda8
   3     9    2048256 hda9
   3    10     514048 hda10
  22     0 1073741823 hdc
[slitt@mydesk /proc]$
[slitt@mydesk /proc]$ cat /proc/mounts
/dev/root / ext2 rw 0 0
/proc /proc proc rw 0 0
/dev/hda1 /boot ext2 rw 0 0
/dev/hda7 /d ext2 rw 0 0
none /dev/pts devpts rw 0 0
mydesk:(pid550) /net nfs rw,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,noac,addr=pid550@mydesk:/net 0 0
[slitt@mydesk /proc]$
Top answers questions like this:
  6:40am  up  1:52,  2 users,  load average: 0.00, 0.00, 0.00
64 processes: 62 sleeping, 1 running, 1 zombie, 0 stopped
CPU states:  0.0% user,  0.3% system,  0.0% nice, 99.5% idle
Mem:  517108K av, 100416K used, 416692K free,  89212K shrd,   9220K buff
Swap: 130748K av,      0K used, 130748K free,                44080K cached
[slitt@mydesk slitt]$ free
             total       used       free     shared    buffers     cached
Mem:        517108     105760     411348      99180      10436      44728
-/+ buffers/cache:      50596     466512
Swap:       130748          0     130748
[slitt@mydesk slitt]$ free -t
             total       used       free     shared    buffers     cached
Mem:        517108     105780     411328      99180      10436      44728
-/+ buffers/cache:      50616     466492
Swap:       130748          0     130748
Total:      647856     105780     542076
[slitt@mydesk slitt]$ free -to
             total       used       free     shared    buffers     cached
Mem:        517108     105780     411328      99180      10436      44728
Swap:       130748          0     130748
Total:      647856     105780     542076
[slitt@mydesk slitt]$
[slitt@mydesk slitt]$ vmstat
   procs                  memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0 415668   9220  44092   0   0     1     0   66   145   1   0  99
[slitt@mydesk slitt]$
Likewise, you can use the nice command on a tight-loop program you create to "gobble" CPU cycles in order to do bottleneck analysis on the CPU.
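A CPU gobbler can be as simple as a do-nothing loop. This sketch (the variable name is illustrative) runs one at the lowest scheduling priority, so it consumes only cycles nothing else wants. Watch top or vmstat while it runs; if your real workload slows down anyway, the CPU had less spare capacity than you thought:

```shell
# Start a do-nothing loop at nice 19 (lowest priority) in the background.
# It soaks up otherwise-idle CPU; kill it when the measurement is done.
nice -n 19 sh -c 'while :; do :; done' &
GOBBLER=$!
echo "gobbler running as pid $GOBBLER; stop it with: kill $GOBBLER"
```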
The exact methods are beyond the scope of this magazine, but they are completely documented in Samba Unleashed's chapter 35, "Optimizing Samba Performance". Although that chapter is intended to be Samba specific, it's one of the best documents on Bottleneck Analysis I ever wrote.
[slitt@mydesk slitt]$ df
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda8             2.9G  1.5G  1.2G  56% /
/dev/hda1              30M   16M   13M  55% /boot
/dev/hda7             5.6G  1.1G  4.1G  21% /d
[slitt@mydesk slitt]$
[slitt@mydesk slitt]$ ping -c3 localhost
PING localhost.localdomain (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=0.1 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=255 time=0.1 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=255 time=0.0 ms

--- localhost.localdomain ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.0/0.0/0.1 ms
[slitt@mydesk slitt]$ ping -c3 192.168.100.10
PING 192.168.100.10 (192.168.100.10): 56 data bytes
64 bytes from 192.168.100.10: icmp_seq=0 ttl=255 time=0.1 ms
64 bytes from 192.168.100.10: icmp_seq=1 ttl=255 time=0.0 ms
64 bytes from 192.168.100.10: icmp_seq=2 ttl=255 time=0.0 ms

--- 192.168.100.10 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.0/0.0/0.1 ms
[slitt@mydesk slitt]$ ping -c3 192.168.100.1
PING 192.168.100.1 (192.168.100.1): 56 data bytes
64 bytes from 192.168.100.1: icmp_seq=0 ttl=255 time=0.2 ms
64 bytes from 192.168.100.1: icmp_seq=1 ttl=255 time=0.2 ms
64 bytes from 192.168.100.1: icmp_seq=2 ttl=255 time=0.2 ms

--- 192.168.100.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.2/0.2/0.2 ms
[slitt@mydesk slitt]$
Here's a simple predefined diagnostic using ping:
[slitt@mydesk slitt]$ /sbin/ifconfig
eth0      Link encap:Ethernet  HWaddr 00:D0:B7:09:2A:58
          inet addr:192.168.100.10  Bcast:192.168.100.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:24440 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8551 errors:0 dropped:0 overruns:0 carrier:0
          collisions:13 txqueuelen:100
          Interrupt:10 Base address:0xb000

eth0:0    Link encap:Ethernet  HWaddr 00:D0:B7:09:2A:58
          inet addr:192.168.168.200  Bcast:192.168.168.200  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:10 Base address:0xb000

eth0:1    Link encap:Ethernet  HWaddr 00:D0:B7:09:2A:58
          inet addr:192.168.168.201  Bcast:192.168.168.201  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:10 Base address:0xb000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:1399 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1399 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

[slitt@mydesk slitt]$
The preceding command shows the network card eth0 at IP address 192.168.100.10, netmask 255.255.255.0, and broadcast address 192.168.100.255. It's up. Also shown is that eth0 supports two "alias" IP addresses, 192.168.168.200 and 192.168.168.201. These are used for serving two different websites. Note also that aliases can be on different subnets, thereby enabling communications to other subnets.
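For reference, aliases like the two shown above are typically created with commands along these lines. This is a sketch: run it as root, the addresses are taken from the listing above, and the syntax is that of the net-tools ifconfig of this era.

```
# /sbin/ifconfig eth0:0 192.168.168.200 netmask 255.255.255.255 up
# /sbin/ifconfig eth0:1 192.168.168.201 netmask 255.255.255.255 up
```

Aliases created this way don't survive a reboot unless you add the commands to a startup script or to your distribution's network configuration.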
[slitt@mainserv slitt]$ nslookup www.domain.cxm
Server:   mainserv.domain.cxm
Address:  192.168.100.1

Name:     mainserv.domain.cxm
Address:  192.168.100.1
Aliases:  www.domain.cxm

[slitt@mainserv slitt]$ nslookup www.domain.cxm localhost
Server:   localhost
Address:  127.0.0.1

Name:     mainserv.domain.cxm
Address:  192.168.100.1
Aliases:  www.domain.cxm

[slitt@mainserv slitt]$
The dnswalk program probably does not ship with your Linux CD, but it's free on the Internet. The URL is listed in the URLs section of this magazine.
Isn't it nice to know that the tcpdump program on a cheapo Linux box delivers much of the functionality of expensive network analyzers? That's the beauty of Linux.
You can learn more about tcpdump like this:
man -a tcpdump
[ $1 ] || { echo "usage: nettest IPofOtherHostOnSubnet"; exit 1; }
OTHERHOST=$1
LOCALHOST=127.0.0.1
NIC=$(/sbin/ifconfig -n eth0 | grep inet | sed -e "s/.*inet addr:\(.*\)\s*Bcast.*/\1/g")
echo Writing ping tests to pingtest.txt...
echo Testing localhost at $LOCALHOST...
if ! ping -c2 $LOCALHOST > pingtest.txt; then
   echo $LOCALHOST failed! See pingtest.txt.
   exit 1
fi
echo Testing eth0 at $NIC...
if ! ping -c2 $NIC >> pingtest.txt; then
   echo $NIC failed! See pingtest.txt.
   exit 1
fi
echo "Testing other host on subnet at $OTHERHOST (argument to this command) ..."
if ! ping -c2 $OTHERHOST >> pingtest.txt; then
   echo $OTHERHOST failed! See pingtest.txt.
   exit 1
fi
cat pingtest.txt
If the ping to one host on the subnet fails, try other hosts before drawing conclusions.
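One way to try several hosts is a small loop that pings each candidate and tallies the failures. This is a sketch; the function name and the addresses in the usage example are illustrative:

```shell
# Ping each host given on the command line and report which ones answer.
# If one host fails but the others succeed, suspect that host; if all
# fail, suspect your own NIC, cabling, or hub.
check_hosts() {
    failed=0
    for host in "$@"; do
        if ping -c2 "$host" > /dev/null 2>&1; then
            echo "$host: reachable"
        else
            echo "$host: NO ANSWER"
            failed=$((failed + 1))
        fi
    done
    echo "$failed host(s) failed"
}
```

Typical use: check_hosts 192.168.100.1 192.168.100.2 192.168.100.20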
DO NOT attempt to troubleshoot lack of Samba, telnet, ftp, email, or http connectivity until you can ping. You can't telnet to what you can't ping.
# ipchains -V

I have version 1.3.9 installed on my Red Hat 6.2 machine.
# echo "1" > /proc/sys/net/ipv4/ip_forward

Once that is done, two simple ipchains commands will get you started:
# ipchains -P forward DENY
# ipchains -A forward -s 192.168.0.0/24 -j MASQ

Substitute your network's IP address and subnet mask for the 192.168.0.0/24 above. The first command simply sets the policy of the forward chain to deny everything, and the second is an append (-A) that masquerades any packet with a source (-s) of 192.168.0.0/24 (the local network). You might also want to verify all the current ipchains rules with the following command:
# ipchains -L

This will list the current chains. There are three chains in the default ipchains setup: input, output and forward. Both input and output need to be configured with an ACCEPT policy (this is not very secure, but it works).
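The ACCEPT policies just mentioned would be set like this (a sketch; run as root, and note again that blanket ACCEPT policies are workable for a test network but not for a permanent connection):

```
# ipchains -P input ACCEPT
# ipchains -P output ACCEPT
```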
If you have a properly set up TCP/IP network, the above commands will allow the rest of the network to connect to the Internet through an Internet-connected Linux server. Additional services such as Web browsing will require setup of the DNS entries on the client machines. DNS and DHCP do not need to be running on the Linux server for the above to work; in other words, a simple PING command to the Internet will succeed.
Getting most Web browser functions operating requires DNS implementation at some level. The simplest way to do this is to include the address of a DNS server in the hosts file of each machine on the network. A typical hosts file might include the following:
127.0.0.1    localhost
192.168.0.2  gateway
204.127.2.1  worldnet

The above file has three entries: the address of localhost; the address of a machine called gateway, which in this case is the gateway to the Internet; and the address of a DNS server on the Internet (AT&T Worldnet's). When gateway is connected to the Internet and ipchains is properly running, you should be able to PING each of the machines by address and by name. As a final test, try:
# ping troubleshooters.com

If this is successful, ipchains and DNS are operational. You can then bring up your browser and try it. On MS Windows machines, you may need to set the default gateway under Network Neighborhood -> Properties -> TCP/IP (Ethernet card) -> Properties -> Gateway.
The above information should be enough to get a small (test) network up and running with Internet connectivity using a dial-up, DSL or cable modem connection to the Internet. I have always found that a simple proof-of-concept can provide the necessary momentum to convince those involved to move forward and complete the project. The danger here is leaving the system configured in its current less-than-secure state.
ipchains -N chainname

or deleted with:
ipchains -D chainname

For many applications, the three default chains provide sufficient functionality. The figure provides a basic flow through the first stage of ipchains:
                 ______________      ________      .--------.
From the        |              |    |        |    | Forward |    To
input chain---->| Demasquerade |--->| Sanity |--->| chain   |--->next
                `------.-------'    `---.----'    \_________/    chain
                       |                |              |
                       |                |              |
              _________V_______      __V___      ____V________
             |                 |    |      |    |             |
             | To output chain |    | DENY |    | DENY/REJECT |
             `-----------------'    `------'    `-------------'
Packets entering this machine from outside sources (such as the local network, a dial-up connection, or the loopback device) are verified to be good through both a checksum and a sanity check. Good packets are allowed through and are then checked in the input chain. To allow or deny a connection based on IP address, use the following:
# ipchains -A input -j DENY -p all -s 1.2.3.4/24 -d 0.0.0.0/24

In this example, any packet with a source address of 1.2.3.4 (subnet mask 255.255.255.0) will be denied passage through the input chain. Filters can also be set up to block packets based on addresses, port numbers or interfaces, for both source and destination.
After the input chain, a demasquerade step is applied if the packet is a reply to a previously masqueraded packet. If the demasquerade is performed, the packet is routed directly to the output chain. If not, a routing decision is made based on the packet's destination. Packets for the local machine are delivered locally; otherwise they traverse the forward chain and are possibly sent on to the remote destination. Locally created packets go through the routing decision process and are then sent to the output chain as necessary.
Each chain is simply a set of rules. Each packet is examined and, based on the rules in place, accepted, denied or redirected. You can see how many packets have been sent through a chain using:
# ipchains -L chain -v

And the following can be used to reset the counters:
# ipchains -Z chain

Once your chains are in place, they can be saved using the save command and restored using restore. For a very simple setup, the commands can be added to a startup file such as /etc/rc.d/rc.local. Additional information is available through man ipchains and the IPCHAINS-HOWTO; both are included in most distributions. As noted throughout the text, the setup described is rather rudimentary and insecure. It will work reasonably well for a small network that is only connected to the Internet for brief periods. If you wish to have a permanent Internet connection, consider using a pre-configured firewall package or researching the topic further, as setting up a secure firewall is a complex task.
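On most distributions of this era, the save and restore commands referred to above are ipchains-save and ipchains-restore. A minimal sketch, run as root (the rules filename is an assumption, not a standard):

```
# ipchains-save > /etc/ipchains.rules
```

Then a line such as the following in /etc/rc.d/rc.local reloads the rules at boot:

```
/sbin/ipchains-restore < /etc/ipchains.rules
```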
It's entirely too easy to do a better job of tech support than commercial vendors do. As a matter of fact, most commercial tech support borders on pitiful. Look back on your own experience (or your tech employees' experience if you're in management). You wait for an hour on hold, during which time you're repeatedly cautioned to "have your credit card ready". You're finally greeted by an obsequious clerk with half your knowledge whose job seems to be to belittle callers. After 5 minutes the guy realizes you really do know more than he does, and he passes the buck (scuse me, escalates you) up to the next level of tech support, probably with another stint on hold, another repetition of the symptom description, and another belittling session. This continues until at some point somebody takes your number and tells you the guy who *really* knows the answers will call you.
But you don't receive the call, because they're in California and you're in Boston, and they call you at 8:30pm your time. You call back at 8am the next morning, but they're still fast asleep. Finally you talk to the real guy, a developer, who assures you it's a known problem, that it will be solved in the next release, and gives you a rough idea how to work around it for now.
Contrast this experience with fixing a Samba problem. You do your best to narrow the scope of the problem. You create a tiny smb.conf file that produces the symptom. You find anything in the log that sheds light. You do a little preliminary troubleshooting. Then you report it on the Samba mailing list, and if you've done your job well in the preliminary work, someone will likely report a solution within a few hours.
But what if it's a bug, as in the previous example of commercial tech support? You've got the source code, and it's much easier than you'd think to fix it in source code. Then you email in the solution as a patch, and *everyone* can use your solution. You're famous!
The difference, in a word, is respect. Respect for the technologist's time. For his or her intelligence. For his or her finances. For his or her humanity. Until you've been on the receiving end of disrespectful help requests, you can't imagine how aggravating some help requests can be.
If any of this applies to you, please take it as the most gentle of criticism. I'm not calling you names, I'm giving you very easy advice that will help skyrocket your effective use of online support.
Respect! It's easy to understand. But it can also be elusive, especially if the person making the help request doesn't understand the situation of the person on the other end of the email. Take me, for example...
Until a year ago, I answered almost every emailed help request received by Troubleshooters.Com. Those days are gone. Today, if the person gives a woefully inadequate symptom description, I trash it. I don't have time for a detailed email interview. Conversely, if the person gives me a huge oration instead of a concise symptom description, it's trashed. If I had voluminous time to read, I'd read Gone With The Wind. Then there are the guys emailing me long lines of font-filled HTML email that gets garbled in quoted reply text, requiring switching back and forth between the original and the reply. No time for that.
It's simple. The technologist on the other end of your email considers himself a fun-loving technologist. Not a slave. Not a psychic. And *certainly* not cheap labor. Sometimes the best way to tell how to do something is to show how not to do it. I therefore present the Open Source Support Dis List.
This is a list of those mistakes, along with some internal dialog a harried technologist might think (think, not say) as he or she reads the email. These are phrased rather harshly, but please don't take it personally. The purpose here is to reveal what it's like to encounter these common mistakes in several of your several hundred emails per day:
And no little voice in your ear saying "There are only three ahead of you. Your business is very important to us. Please hang on."
The most obvious reason is that Windows malfunctions regularly, for absolutely no reason. The old steam locomotives required a fireman to constantly throw coal into the boiler. In the same way, Windows systems require an administrator to constantly repair the strange problems that result from continued operation: memory leaks, dll fragments, and the like. Contrast that with Linux admins, who are needed only for major configuration changes, incredibly stupid user errors, or natural disasters.
As if the Windows system's flakiness weren't enough reason for reduced Windows technologist productivity, there's also the problem with the supply of Windows technologists. Sure, there are a lot of MCSEs (Microsoft Certified System "Engineers"), but many are "paper MCSEs": inexperienced people who have proven they can pass a test but can't fix systems. In fact, word on the street is that Linux, UNIX and BSD administrators, with their superior knowledge of networking, are often in a better position to solve Windows problems than the paper MCSEs.
So you can get along with far fewer technologists in a Linux environment, greatly reducing the impact of the Linux expert shortage. But what about that shortage?
But oh, my, what about the training costs for Linux?
Is Windows training so cheap? Every year Microsoft comes out with newer, incompatible technology, each with its own exceptions, workarounds, incantations and rain dances. Every year your technologists need training on changes. Considering that Windows technologists must relearn everything every couple years, it's hard to identify an "investment" in Windows expertise. Turn around, and the "investment" is gone.
Contrast this to Linux, which has as its foundation the thirty-year-old UNIX system. In Linux, training is an investment, not an ongoing cost center.
Let's take it a step farther. Suppose for a minute that the hordes of paper MCSE's really did represent an "investment". You could quickly and cheaply duplicate that in the Linux world. Find the in-house technologists who are ready, willing and able (chomping at the bit might be a more apt description) to switch to Linux. Give each a hand-me-down desktop and a $2.00 Mandrake CD from Cheapbytes or LinuxCentral, and watch em quickly learn Linux. If you want to speed up the process, get them a copy of my book, "Rapid Learning: Secret Weapon of the Successful Technologist", which describes self-guided learning by experimentation.
But isn't learning by experimentation slower than classroom learning?
No. Classroom learning is necessitated primarily by proprietary technologies costing thousands to get in the game. Linux and most software running on Linux is free. No problem. Additionally, classroom learning is necessitated by technologies for which there's little useful documentation. With certain ERP apps, you must prove you've been using the technology just to get into the user group. With Open Source software, absolutely anybody can join a software project's mailing list, and see the thoughts and decisions of the software's developers.
Using experimentation and other techniques outlined in "Rapid Learning: Secret Weapon of the Successful Technologist", within a day the technologist can achieve the proficiency bestowed by three days of classroom training. Within 3 days he or she can be truly useful to the organization in that technology. With continued use and experimentation he or she will become an acknowledged guru. You didn't spend a dime on classroom courses, airline tickets, parking, or anything else. Oh yeah, you spent $2.00 on a CD and the salvage cost of an old PC.
But what if you're a tiny company and you don't have any in-house technologists to train? Where could you possibly find a Linux expert?
Don't live in Central Florida? Other LUGs are equally stocked with Linux expertise. Go on the web and search for a local LUG; every major city has one. Don't assume the LUG members are out of your price range. Many are so anxious to bail out of the legacy Windows system as to moderate their compensation demands. And once again, remember that Linux technologists are likely to be much more productive than equivalent Windows talent.
Writing info for Linux "newbies" is vital, because that's how Linux will grow to dominate the market. And once market domination is achieved, hardware vendors will ship Linux drivers with their hardware. And "development environments" will run on Linux. And we'll have a choice of office suites to run on our always-up Linux boxes.
Don't count Microsoft out. Even now, as they slide down the slippery slope toward oblivion, Uncle Billy is up to his old tricks, creating a new programming language called C# with which to build apps of the future. Although Billy's spokesman bills C# as interoperable across platforms, (http://www.computerworld.com/cwi/story/0,1199,NAV47-68-84-88-94_STO46293_TOPWindows,00.html), I predict that it will encapsulate enough Microsoft proprietarisms that it will never see the light of day on non-Windows machines. Now here's the kicker. If Microsoft can create a new version of Office, in C#, before the government busts them up, then it will be very difficult for the new M$ Applications Company to port Office to other platforms. I have no more of a crystal ball than anyone else, but I believe this is Uncle Billy's plan.
We, the Linux User Community, must continue the process of propelling Linux over the top for one more year. If we do everything right, Microsoft's toast. If we stumble (and we haven't stumbled so far), Linux is toast, and Microsoft will survive another couple years until yet another great technology displaces their competitor-clobbering kludge.
To be blunt, we need to be VERY NICE to Linux newbies and wannabes. "Read the Fantastic Manual" is not an appropriate response, even to a question we know to be gratuitously lazy. We need over 50% of the OS market, which means we need these newbies. We need to let them know our community is the better community. We need to write good documentation for them to follow. And most of all, we must accept them, and not hold their recent Windows citizenship against them.
You may notice the preceding paragraph seems to conflict with the article entitled "Getting Answers Online" earlier in this magazine. That's true. In "Getting Answers Online" I pointedly asked people to respect the people on the mailing list by not asking gratuitously lazy questions. For the next year or so, we need to answer those questions nicely (Linux questions, not Windows questions). And when we get an inadequate or overly wordy symptom description, or failure to encapsulate reproduction environment, or questions about something answered hundreds of times, we need to *nicely* explain to the users how to do the spade work themselves next time.
If, during the next year, we continue to attract interest to Linux, Microsoft will move from an empire to a JAV (Just Another Vendor). They might even die (you heard it here first). You can't long feed a five-figure payroll with nothing coming in. If they have something to offer, it's certainly not in the areas of quality or innovation. They may not be able to make the transition to competing on a level playing field. And wouldn't it be nice not to have those guys around any more :-)
So let's all be really nice to the newbies and wannabes. Invite them into our culture and our community. Celebrate our diversity. They bring a lot of corporational correctness to the table. Add that to our ability to create a quality, reliable app, and we've really got something.
So our job for the next year or so is to be ambassadors. To the extent that we're good at it, we'll be highly respected in the new era of software just around the corner. The era when "you can't get fired for buying Open Source".
By submitting content, you give Troubleshooters.Com the non-exclusive, perpetual right to publish it on Troubleshooters.Com or any A3B3 website. Other than that, you retain the copyright and sole right to sell or give it away elsewhere. Troubleshooters.Com will acknowledge you as the author and, if you request, will display your copyright notice and/or a "reprinted by permission of author" notice. Obviously, you must be the copyright holder and must be legally able to grant us this perpetual right. We do not currently pay for articles.
Troubleshooters.Com reserves the right to edit any submission for clarity or brevity. Any published article will include a two sentence description of the author, a hypertext link to his or her email, and a phone number if desired. Upon request, we will include a hypertext link, at the end of the magazine issue, to the author's website, providing that website meets the Troubleshooters.Com criteria for links and that the author's website first links to Troubleshooters.Com. Authors: please understand we can't place hyperlinks inside articles. If we did, only the first article would be read, and we can't place every article first.
Submissions should be emailed to Steve Litt's email address, with subject line Article Submission. The first paragraph of your message should read as follows (unless other arrangements are previously made in writing):