Volume 1 Issue 1, August 2002
Copyright (C) 2002 by Steve Litt. All rights reserved. Materials from guest authors copyrighted by them and licensed for perpetual use to Linux Productivity Magazine. All rights reserved to the copyright holder, except for items specifically marked otherwise (certain free software source code, GNU/GPL, etc.). All material herein provided "As-Is". User assumes all risk and responsibility for any outcome.
Steve Litt is the author of Troubleshooting Techniques of the Successful Technologist and Rapid Learning: Secret Weapon of the Successful Technologist.
Those four years saw changes here at Troubleshooters.Com. Backup media transitioned from 100MB Zip media to 650MB CDR and CDRW. PKZip was dropped in favor of the Open Source tar and gzip. We switched from Windows to Linux. And last but not least, Troubleshooting Professional Magazine split in two: a new, quarterly Troubleshooting Professional Magazine that's all troubleshooting all the time, and the monthly Linux Productivity Magazine that you're reading now. This is the premier issue of Linux Productivity Magazine.
All these changes notwithstanding, the backup principles espoused in the July 1998 issue remain conspicuously unchanged. Restorability, be it short term, mid term or long term, is still the top priority. Reliable media is still vital for all backups. Use of ubiquitous, standard media, formats and compression is still the key to long term restorability.
The principles remain the same, but in a GNU/Linux and Open Source/free software environment, implementation of those principles has become so much nicer. This month's magazine details that implementation.
This month's material applies mainly to SOHO (Small Office/Home Office) computers and networks because that's what I'm familiar with. Big iron shops with hundreds of users have no choice but to back up to tape, and the tape machines they use cost a fortune. They typically have several system administrators, each of whom has backup as a significant part of his or her job responsibilities. Luckily, big iron shops have the budget to make this work. SOHO environments typically back up with sweat equity, and their small volume of data makes that possible.
The preceding paragraph notwithstanding, many of the underlying concepts and principles apply to shops of any size. One thing remains constant in any highly automated business -- if you lose all your data and can't restore it, you're out of business.
So kick back, put your feet up, and contemplate this important aspect
of data security. And remember, if you're a Troubleshooter, this is your
magazine. Enjoy!
Winter | January
Spring | April
Summer | July
Fall | October
All the Linux, Open Source and free software content has been removed
from Troubleshooting Professional Magazine and placed in a new monthly
magazine, Linux Productivity Magazine. Right now you're reading the August
2002 premier issue of Linux Productivity Magazine.
Troubleshooting Professional Magazine is now quarterly (January, April, July and October). This is the premier issue of Linux Productivity Magazine, dated August 2002.
This split was executed in order to give our Troubleshooting audience exactly what they want, and our Open Source audience exactly what they want. The results of our informal election indicated that readers of both persuasions overwhelmingly preferred the split. Interestingly enough, this was in spite of the fact that a large portion of Troubleshooting Professional's readership were interested both in Troubleshooting and in Open Source content. Such "dual citizens" will now be happy to read both magazines.
What makes a good backup? It varies depending on the purpose and nature of the backup, but the following are a great set of criteria:
The best way to test the predictability and trustworthiness of a backup solution (software and hardware) is to repeatedly back up and restore (to a different drive, obviously) a complex backup. Are the results the same every time?
In addition, any good backup system provides a method to confirm backup accuracy. This comes in two flavors, comparison against the original and CRC comparison. Each has its advantages, and using Linux, there's nothing to stop you from implementing both.
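To make the two flavors concrete, here's a minimal sketch assuming a compressed tar archive named backup.tgz and a checksum file named backup.md5 (both names are hypothetical):

tar dzf backup.tgz                  # data comparison: archive vs. the files now on disk
md5sum backup.tgz > backup.md5      # CRC flavor: record the checksum at backup time
md5sum -c backup.md5                # ...and verify it any time later

The data comparison proves the archive matched the disk when it was made; the checksum proves the archive itself hasn't changed since.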
Good systems provide a log of files that don't match. In such a case productivity can continue, and one can simply compare the list of non-matching files against the list of files modified during the backup.
There are backup systems that compare each file immediately after it's backed up. For the most part that allows productivity to continue.
Other than the above, CRC comparison has a host of advantages.
To be accurate, the backup system must provide the user with a method of choosing, by various criteria, which files and directories are to be backed up.
Set up backup criteria, then repeatedly back up and restore with your present backup setup and the one you're thinking of moving to. Are the results the same? If so, you're OK. If not, figure out which one isn't doing its job. Repeat this test with various backup criteria at various times.
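Here's a minimal sketch of such a repeatability test using tar, assuming the tree under test is /d and that /tmp/restore1 and /tmp/restore2 are scratch directories (all names hypothetical):

tar czf /tmp/test1.tgz /d
mkdir -p /tmp/restore1 && tar xzf /tmp/test1.tgz -C /tmp/restore1
tar czf /tmp/test2.tgz /d
mkdir -p /tmp/restore2 && tar xzf /tmp/test2.tgz -C /tmp/restore2
diff -r /tmp/restore1 /tmp/restore2      # the two restores should be identical
diff -r /d /tmp/restore1/d               # and identical to the original tree

Substitute whatever backup solution you're evaluating for the tar commands; the diffs are what tell you whether the results are the same every time.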
Millions more backups were once good, but became unrestorable due to age, or new hardware/software environments. You needn't search hard to find stories of 1960's or 1970's source code and data lost because the knowledge of the data's format has been lost. In some cases the information is transcribed from paper copies, while in others it's just lost forever.
The entire purpose of backup is to be able to restore, so restorability is essential. There are three categories of restorability: short term, mid term and long term.
In general, magnetic oxide is less reliable than desirable. Oxide can easily become damaged, causing readability-destroying "dropouts". Certainly, floppies cannot be trusted with your data. Tape media range from junk to highly reliable. Spectacular manufacturing quality contributes to some of that reliability. Another undoubtedly vital factor in tape reliability is redundancy. The greater the redundancy, and the more intelligently it's implemented, the larger the oxide dropouts that can be tolerated by deducing what was there. From what I've heard, helical-scan tapes (with spinning heads like a video tape) are more reliable than tape that passes over a static head. Finally, some magnetic oxide media are intrinsically reliable. Iomega Zip disks are an example.
Optical media seem more reliable in general. The 650MB CDR format is certainly an example. If you use reasonable quality blanks, keep CDs covered, and keep them in any kind of reasonable environment, they'll be byte for byte restorable for years.
I think it's safe to say that five years from now proprietary hardware/software combinations WILL NOT RESTORE. You might be able to read the tape, but it's unlikely you'll find software to read that proprietary tape on your newer (and from a different manufacturer) drive. No matter what hardware is involved, you're much better off recording the backup in a standard format, like .zip or .tgz or whatever. That way, once you get that single file off the tape or other media, you can work with it on your hard disk in a hardware-independent environment.
You don't want a proprietary binary file format. My early Zip drive backups are unreadable on my Linux system due to that silly .qic format. To read them I need Microsoft Backup.
Likewise, standard hardware media and formats are essential. Over the years there have been floppies that spun ten times as fast, recording ten times the data. There have been backup programs that put 9 sectors per track instead of 8, just to gain that 12%. There have been all sorts of optical disk media formats. How can you read those media now? You can't.
Here's your best bet: have the hardware and software for the backup completely separate and independent. Both hardware and software should be a ubiquitous industry standard, and both should have held that status for several years. Iomega Zip disks and ISO9660 650MB CDR's are great examples. Even the QIC-80 tapes I was using in 1994 are readable by drives you can buy on eBay, using standard Linux drivers. Stick with media that has been a ubiquitous standard for a prolonged period of time.
Likewise, the software format should be a ubiquitous standard. The backup program that shipped with your tape drive likely writes the backup in a format readable only by that same software. Five years from now your OS won't support that software -- what do you do then? The solution is to have the format in a ubiquitous standard. The ZIP format (brought to us by PKWare) is an outstanding example. The tar and gzip formats are age-old UNIX favorites, and will be supported for a long time to come.
Make the backup on the media a simple file created by the backup software. Have the media look like a disk drive which can have files copied to and from it. Keep a copy of the backup software, including the part that restores. Then, even years later, all you need to do is find hardware and drivers to read the drive, copy the file off it, and use the backup software to restore.
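If you've done that, the restore years later might amount to nothing more than this (device, mount point and filename are hypothetical):

mount /dev/cdrom /mnt/cdrom       # any hardware and driver that can read the disc
cp /mnt/cdrom/d020815.tgz /tmp    # copy the single backup file onto the hard disk
cd /tmp
tar xzvf d020815.tgz              # the ubiquitous format does the rest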
Last, but not least, restoring from years ago requires existence of such backups. Backup media are expensive, so there's a natural inclination to re-use them. Strike a balance between re-use and data conservation. It might be something like this:
Not so with my 1996-1998 Zip drive backups. Soon (before my 5 year old Zip drive wears out) I'll transfer those backups to CDR format, so when my Zip drive gives up the ghost my Zip drive backups don't go with it.
As old media and backup formats become obsolete, you should transfer them to the latest media and formats while you can still decode them. Fortunately this isn't difficult. Because media sizes double every 2 or 3 years and get cheaper to boot, you can transfer those old backups to more modern media/formats easily and cheaply. And because hard disks keep growing every year, you'll have enough scratch space to assemble new images for those old backups.
When transferring old backups, one or two per year is sufficient. And if your business has a policy of dumping aged data (many people do this to limit discovery in litigation), data older than the maximum saved data age needn't be transferred -- it should be destroyed.
The backup was restorable, and I didn't need it anyway. My Gateway 486-66 took a licking and kept on ticking (thanks Gateway).
But what if it had been more like the 1906 San Francisco earthquake, with a firestorm driving us out with only the clothes on our back? Or a flash flood? Or a home-invasion robbery? Both the backups and the computer might have been gone. That's why offsite backup is so vital.
Not just any offsite backup. Backup out of the region -- out of reach of a regional disaster like a large earthquake, a hurricane or a flood. It's expensive to buy and ship media out of state, so you might want to do it only once or twice a year. Balance the expense against how far you're willing to be "set back" in event of a regional disaster-caused data loss. Here's a possibility:
On the other hand, entirely too many write-only backups have been done in the name of "ease". "One button" backups often don't include the right stuff, and many times don't restore, or restore to the wrong drive, or whatever.
The ideal backup is a fully configurable one whose configuration can be remembered between backups.
[Table: advantages and disadvantages of single file vs. file by file archives]
The simplest possible backup of a directory tree called /d would be as follows:
cp -p -R /d /mnt/zip

The preceding is a file by file archive with no compression and no CRC. Verification is done as follows:
diff -r /d /mnt/zip/d

With a Zip drive, it may also be possible to write down a separate checksum like this:
md5sum /dev/sda4

Obviously that checksum can't be added to the disk, because doing so would change the checksum of the disk...
Device wide checksums aren't reliable on CDs. I've never found a reliable method to read the entirety of a CD device -- no less and no more. The reason is that it errors out toward the end, whether or not all the data (and no more) is read. In other words,
md5sum /dev/cdrom

produces different results on different boxes with different CD drives. The following produces slightly more consistent results across boxes, but still has unacceptable variation:
cat /dev/cdrom | md5sum

Still more consistent, but nevertheless unacceptably variable, is the following:
mount /dev/cdrom
blocksize=`/sbin/blockdev --getss /dev/cdrom`
blockcount=`/sbin/blockdev --getsize /dev/cdrom`
dd if=/dev/cdrom bs=$blocksize count=$blockcount conv=notrunc,noerror | md5sum

Perhaps the most consistent checksum comes when the preceding blocksize and blockcount are replaced with numbers obtained from the .iso that produced the CD. With those numbers plugged into the command, it's accurate on most boxes -- but not all.
The bottom line is that device based checksums of file by file archives are problematic. A much better method would be to make a series of per-file checksums, and then checksum that series. Or better yet, have per-file and per-directory checksums, making it possible to verify individual files and directories at restore time.
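Here's a minimal sketch of that idea, assuming the tree being backed up is /d and the manifest filenames are hypothetical:

find /d -type f -print0 | xargs -0 md5sum | sort > /tmp/d.files.md5   # one checksum per file
md5sum /tmp/d.files.md5 > /tmp/d.manifest.md5                         # checksum of the manifest itself
md5sum -c /tmp/d.files.md5                                            # at restore time, verify every file

Include both .md5 files on the backup media and you can verify a single file or the whole archive without reading the entire device.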
The simple filecopy archive can be compressed by compressing (gzip) each individual file. The easiest way to do this is to copy the directory to a scratch location and then gzip -r new_directory, after which checksum files can be made and the tree can be used to create a .iso from which a CD is burned, tree-copied to a Zip disk, or put in a format suitable for taping.
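A rough sketch of that sequence, with all directory and file names hypothetical:

cp -p -R /d /scratch/dcopy                                                    # copy the tree to a scratch location
gzip -r /scratch/dcopy                                                        # compress each file individually
find /scratch/dcopy -type f -print0 | xargs -0 md5sum > /scratch/dcopy.md5    # per-file checksums
mkisofs -pad -o /scratch/dbackup.iso /scratch/dcopy /scratch/dcopy.md5        # roll into a .iso
cdrecord dev=0,0,0 speed=16 -v -eject /scratch/dbackup.iso                    # burn it

The same tree (and its checksum file) could just as easily be tree-copied to a Zip disk or handed to a tape program.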
If you're interested in file by file backup but want something more
sophisticated than what's described in this article, take a look at Bryan
Smith's back2cd utility (URL in URL's section).
I have no fear of the major single file disadvantage -- a single byte of corruption making the entire backup unreadable. Tests on my oldest CDR backups (December 1998 and early 1999 ISO9660 CDs containing multiple .zip files) show that of the 13 .zip files checked, all tested perfect -- not a single bit of corruption. Some 1996 .zip file backups recorded on Zip disks were also 100% perfect. So according to my tests, the risk of losing data is tiny. Combine this with the fact that CDR media is so cheap and so space efficient (if you use paper sleeves) that I have monthly backups going back years. So if one or two fail, there are plenty more.
The other benefit is easy whole-backup checksum verification. The CD contains the archive file, and in addition it contains a file containing the checksum created by md5sum. So 10 years from now I can verify that the entire backup is valid, even if I've since copied it to different disks or media.
Here at Troubleshooters.Com, a script to back up two trees to two separate
compressed archives, enclose those archives in a .iso image file,
and burn a CD from that .iso image, is a simple 45 line shell script. The
script to burn that .iso is a single command. The script to verify
both archives (regardless of the dates in their names) by both the data
comparison and checksum methods, and issue both specific errors and general
go-nogo messages for each archive, is a 72 line shell script.
[root@mydesk root]# du -s -k /d
880704  /d
[root@mydesk root]# du -s -k /classic/a
622816  /classic/a
[root@mydesk root]# du -s -k /inst
182948  /inst
[root@mydesk root]# du -s -k /
11123046        /
[root@mydesk root]# du -s -k /usr
2550204 /usr
[root@mydesk root]# du -s -k /var
137980  /var
[root@mydesk root]#
It's not that difficult to back up 1.7GB on CDR, especially when less than 650MB consists of fast moving data. Backing up 11.1GB onto CDR is clumsy and impractical, so system backups are almost always done to tape. And most inexpensive tape solutions are not nearly as error free as CDR. Which means if you want a complete system backup, you're more or less forced to use either a very expensive backup method or an iffy one.
There are some gray areas. I consider my copy of Micrografx Windows Draw to be data. Why? Because Micrografx long ago sold it to an entity that no longer sells it. It's unpurchasable, and much of my true data (drawings) is readable only in Micrografx Windows Draw. So even though it's just a computer program, I have it backed up every which way from Sunday.
Another gray area is configuration. Certainly I consider all my scripts data. Nobody sells them. Likewise my GPL projects are data for me because I'm the source; for anyone else they wouldn't be data. Conversely, my copy of LyX is not data, because I can download it any place.
Interestingly, the Netscape that came with my Mandrake 8.1 is data. Why? Because Troubleshooters.Com is written and maintained with Netscape Composer. Mandrake stopped packaging Netscape with their distro, and the Mozilla Composer they do package has problems that make it useless for maintaining Troubleshooters.Com. As far as I know, Netscape 6.x is also unsuitable for maintaining Troubleshooters.Com. So Netscape has become something I cannot purchase.
Backups are insurance. Insurance is meant to protect against unbearable losses. If it helps pay for small misfortunes, so much the better, but its essence is to protect against unbearable losses. So if you have a choice between an all-inclusive system backup that's not all that reliable and a data-only backup that's utterly reliable (or better yet, several such backups), choose the data backup.
One great way to divide your data is to have a partition for fast changing data mounted at /d, a directory for installation programs that have achieved data status (like Netscape as discussed in the Data Backups vs. System Backups articles earlier in this magazine) mounted at /inst, and static data (work from years ago that's no longer maintained) at /classic/a, /classic/b, /classic/c, etc. Each partition is sized so that, when compressed, it fits on one piece of media. Thus, if you're using 650MB CDR's for backup, /d might be 800MB. When it threatens to overflow, you move some of it (the stuff that you anticipate never changing again) to /classic/a. If that overflows you make a /classic/b. Thus you need to back up the directories under /classic maybe twice a year, or whenever you move new data to them -- whichever comes first. Meanwhile, /inst is backed up weekly, with daily backups of the day's changed data.
Every half hour | Copy your current project to a backup directory on a scratch partition. No compression. Consecutively name such copies so you have a poor man's revision history. |
Every day | Incremental backup for data changed that day. Send to a rewritable CD or to another machine (via NFS) -- anything that won't be destroyed by a disk crash on the backed up machine. If you use CDRW's you need 7 -- one for each day. |
Every week | Full backup of all fast changing data. Back up to CDRW. You need four CDRW's -- one for each week in the month. |
Every month | Permanent full backup of all fast changing data. Back up to a CDR, keep for many years. They're cheap. |
When needed | Permanent full backup of slower changing data. |
The preceding is just one possible example (it resembles what I do). The point to remember is that short term backups are quick and easy, while long term backups are built to survive many years. Many people do the revision backups (what I call half hourly) with a true revision tool like CVS or RCS. Some forego the daily backups and rely on the half hourly backups instead. CDR's are so cheap that some people back up the slow moving data every month just to have complete backup sets.
Half hourly backups should take maybe 10 seconds because they protect only 1/2 hour of work. Daily backups should take maybe 5 minutes. Weekly backups should take maybe 1/2 hour. Monthly and beyond take whatever time is necessary -- they're what keep you from being bombed back to the stone age.
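The half-hourly snapshot can be as simple as a dated copy onto a scratch partition. A minimal sketch, with all paths hypothetical:

snapdir=/scratch/snapshots/myproject.`date +%Y%m%d.%H%M`
cp -p -R /d/myproject $snapdir

Because each copy gets a consecutive timestamped name, the snapshots double as the poor man's revision history mentioned above.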
Speaking of time, you can maximize the concurrency of backing up and working by doing the archive/data comparison early. For instance, if you're burning CDR's, you can tar dzvf the .tgz files immediately after making them and recording their md5sum numbers in a file. Once the tar dzvf runs are done, you can change data to your heart's content while creating .iso files and then burning the CDs. CD confirmation is done via md5sum checking, knowing that the original md5sum files were created from files that byte for byte compared with the original data (via tar dzvf).
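In other words, the ordering looks roughly like this (filenames are hypothetical; my actual scripts appear later in this issue):

tar czf /tmp/tgz/d020815.tgz /d                      # archive the fast moving data
md5sum /tmp/tgz/d020815.tgz > /tmp/tgz/d020815.md5   # record its checksum
tar dzvf /tmp/tgz/d020815.tgz                        # data-compare the archive against the disk
# --- from this point on, work on /d can resume ---
mkisofs -pad -o /tmp/tgz/mainbup.iso /tmp/tgz/d020815.tgz /tmp/tgz/d020815.md5
cdrecord dev=0,0,0 speed=16 -v -eject /tmp/tgz/mainbup.iso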
In my opinion, the existence of cheap CDR's makes Zip drives redundant for backup. Based on sources researched on the Internet, you can buy 20 blank CDR's for the price of one Zip disk. And you can buy 5 or so CDRW's for the price of one Zip disk, if rewriteability is an issue. The coolest thing about CDR's is that because they're not rewriteable, you're never tempted to cannibalize your media.
You might consider recording DVD's. They hold up to 7 times as much data. Indeed, if your data backups are several GB this is the only practical way. But DVD writers are in the neighborhood of $500.00, as opposed to $79.00 CDR drives. And DVD media are in the neighborhood of $6.00 -- over twice the price per GB of CDR media. Consider also that DVD isn't yet as standard or ubiquitous as ISO9660. It's likely the day will come that DVD media is the media of choice, but for most people and small businesses today, CDR is still the better way to go.
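To put rough numbers on that: a 35 cent CDR holding 650MB works out to about 54 cents per GB, while a $6.00 DVD blank (assuming roughly 4.7GB capacity) works out to about $1.28 per GB -- a bit more than twice the cost per GB, as noted above.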
Tape is great because it holds huge amounts of data, and the cost per
GB is very low, and it's re-recordable. But my past experience tells me
that many tape formats fall far short of the reliability exhibited by CDR
and Zip Disk media. I have no data on how they degrade over time, but given
their thinness, flexibility, and reliance on oxides, I'd be cautious about
depending on tapes for long term backup. Some tapes are highly reliable,
but from what I've heard those are the very expensive ones. One can also
use a compression format with huge amounts of redundancy, so data errors
can be corrected. Personally, I would use tape for system backups, but
unless I had no other choice I'd stay clear of tape for backing up valuable
data.
As mentioned previously in this magazine, the two types of verification are data comparison and CRC. A successful data comparison is proof positive that the .tgz file accurately mirrors the disk data it contains, at the moment of the data comparison. A successful CRC comparison is proof positive that the .tgz file is identical to the file originally created, and is excellent for proving accuracy of older backups. I use both.
The data comparison is problematic because you cannot alter any data between the beginning of the tar command and the conclusion of the data comparison. This disadvantage is minimized by data-comparing the .tgz immediately after a CRC is created for it, but before it is rolled into a .iso and before it's burned. Once the original .tgz is proved good with data comparison, work can proceed. Once the CD is burned, a CRC comparison will prove that the .tgz on the CD is identical to the .tgz first created.
To minimize the time that work must halt, the fast-moving data should be done first, and a clear indication should be given that it's safe to proceed. So the procedure is like this:
1. ISO Creation Script
#!/bin/sh
yymmdd=$1
#isomountpoint=/mnt/iso
tgz=/tmp/tgz
dtgz="$tgz/d$yymmdd.tgz"
itgz="$tgz/i$yymmdd.tgz"
catgz="$tgz/ca$yymmdd.tgz"
dmd5="$tgz/d$yymmdd.md5"
imd5="$tgz/i$yymmdd.md5"
camd5="$tgz/ca$yymmdd.md5"
mainiso="$tgz/mainbup.iso"      # matches the burning scripts below
classiso="$tgz/classbup.iso"    # name assumed for the classic backup's .iso
dstatus=GOOD
istatus=GOOD
castatus=GOOD

#REMOVE LAST WEEK'S FILES
rm -f $tgz/*.tgz
rm -f $tgz/*.iso
rm -f $tgz/*.md5

#BACK UP /d
cd /d
pwd
tar czvf $dtgz /d
md5sum $dtgz > $dmd5

#DIFF /d
cd /
echo -n "Diffing $dtgz, please wait... "
dtgzdiff=`tar dzf $dtgz`
echo "$dtgzdiff diffed."
if (! test "$dtgzdiff" = ""); then
   dstatus=BAD
   echo
   echo
   echo ERROR: COMPARE MISMATCH $dtgz
   echo
   echo
   echo -n "Press enter to continue==>"
   read tempp
fi

#TELL USER HE CAN NOW WORK
echo YOU CAN NOW WORK ON FILES IN THE /d DIRECTORIES
sleep 10

#BACK UP /inst
cd /inst
pwd
tar czvf $itgz /inst
md5sum $itgz > $imd5

#DIFF /inst
cd /
echo -n "Diffing $itgz, please wait... "
itgzdiff=`tar dzf $itgz`
echo "$itgzdiff diffed."
if (! test "$itgzdiff" = ""); then
   istatus=BAD
   echo
   echo
   echo ERROR: COMPARE MISMATCH $itgz
   echo
   echo
   echo -n "Press enter to continue==>"
   read tempp
fi

echo "d status=$dstatus"
echo "i status=$istatus"

#MAKE MAIN BACKUP ISO
cd $tgz
mkisofs -pad -o $mainiso $dtgz $itgz $dmd5 $imd5

#BACK UP /classic/a
cd /classic/a
pwd
tar czvf "$catgz" /classic/a
md5sum $catgz > $camd5

#DIFF CLASSIC BACKUP
cd /
echo -n "Diffing $catgz, please wait... "
catgzdiff=`tar dzf $catgz`
echo "$catgzdiff diffed."
if (! test "$catgzdiff" = ""); then
   castatus=BAD
   echo
   echo
   echo ERROR: COMPARE MISMATCH $catgz
   echo
   echo
   echo -n "Press enter to continue==>"
   read tempp
fi

#MAKE CLASSIC BACKUP ISO
cd $tgz
mkisofs -pad -o $classiso $catgz $camd5

echo "d status=$dstatus"
echo "i status=$istatus"
echo "ca status=$castatus"
echo If all is well, burn the CDs
2. CD Burning Script, CDRW
cdrecord dev=0,0,0 blank=fast speed=12 -v -eject /tmp/tgz/mainbup.iso
3. CD Burning Script, CDR
cdrecord dev=0,0,0 speed=16 -v -eject /tmp/tgz/mainbup.iso
4. Checksum Verification Script, /d and /inst
DEVICE=/dev/cdrom
MOUNTPOINT=/mnt/cdrom
mount $DEVICE
dstatus=GOOD
istatus=GOOD
dtgz=`ls $MOUNTPOINT/d*.tgz`
dmd5=`ls $MOUNTPOINT/d*.md5`
itgz=`ls $MOUNTPOINT/i*.tgz`
imd5=`ls $MOUNTPOINT/i*.md5`

echo -n "Comparing Checksums for $dtgz, please wait... "
dmd5val=`cut -d " " -f 1 $dmd5`
dtgzval=`md5sum $dtgz | cut -d " " -f 1`
echo " finished."
echo "$dmd5 value==>$dmd5val<=="
echo "$dtgz value==>$dtgzval<=="
if (! test "$dmd5val" = "$dtgzval"); then
   dstatus=BAD
   echo
   echo
   echo ERROR: MD5SUM MISMATCH: $dtgz!
   echo
   echo
fi
echo

echo -n "Comparing Checksums for $itgz, please wait... "
imd5val=`cut -d " " -f 1 $imd5`
itgzval=`md5sum $itgz | cut -d " " -f 1`
echo " finished."
echo "$imd5 value==>$imd5val<=="
echo "$itgz value==>$itgzval<=="
if (! test "$imd5val" = "$itgzval"); then
   istatus=BAD
   echo
   echo
   echo ERROR: MD5SUM MISMATCH: $itgz!
   echo
   echo
fi

echo "d status=$dstatus"
echo "i status=$istatus"
5. Checksum Verification Script, /classic/a
DEVICE=/dev/cdrom
MOUNTPOINT=/mnt/cdrom
mount $DEVICE
cstatus=GOOD
ctgz=`ls $MOUNTPOINT/ca*.tgz`
cmd5=`ls $MOUNTPOINT/ca*.md5`

echo -n "Comparing Checksums for $ctgz, please wait... "
cmd5val=`cut -d " " -f 1 $cmd5`
ctgzval=`md5sum $ctgz | cut -d " " -f 1`
echo " finished."
echo "$cmd5 value==>$cmd5val<=="
echo "$ctgz value==>$ctgzval<=="
if (! test "$cmd5val" = "$ctgzval"); then
   cstatus=BAD
   echo
   echo
   echo ERROR: MD5SUM MISMATCH: $ctgz!
   echo
   echo
fi
echo

echo "ca status=$cstatus"
If the CBDTPA passes, depending on how it's implemented both legally and technologically, and if it's judged constitutional, it's likely that backups as we know them will be impossible. You might not be able to copy data from the hard disk to the CD unless such data includes special copy protection codes. It's very unlikely that you would be able to restore data from a backup CD to a new hard disk in the event of a hard disk failure. It's possible that even the UNIX/Linux/BSD cp and dd commands would be illegal.
Of course, vendors will be more than happy to provide copy protection enabled backup media and software. But without competitive forces from Open Source software, what's their incentive to provide reliable products at a good price? The days of using free software to back up to a 35 cent CD on a $79.00 drive will be gone.
Write your senators and congresspeople, and let them know you won't take kindly to legislators voting in favor of the CBDTPA.
My first supposition was that this was simply one more case of Microsoft paying writers to pretend to be grassroots Windows fans (two such instances are documented in the URLs section of this magazine). But when Tony responded to my email, it was clear he was for real. However, a careful reading of his response made me suspect that the problems mentioned in the preceding paragraph were not the root cause of his switching, and that the bottom line reason was simply time.
Tony tweaked. He tweaked a lot. He tried different distros. He bought hardware not especially supported by the current Linux. He joined a LUG and even became a board member. He started an IRC channel for a games group, and created SuSE packages for several free software projects. And somewhere in all of this, time became scarce. If I read between the lines, he gave up Linux for the same reason I don't load fun games -- I'd spend too much time using them.
Most Linux people have the exact same difficulty. Linux is just so darn tweakable. I spend an hour a day or more revamping scripts, enhancing my menu system, and doing other stuff to make my work go faster. But is it worth it, or am I just hacking for fun and letting family and business stagnate?
I look at it like this: In 1990 my DOS computer was tweaked to the max, with a menu system and scripts to do everything I did on a regular basis. It was a lean, mean production machine. Then came Windows, and I spent the next 10 years pointing and clicking and spending lots of time on the most mundane, redundant tasks. I got very productive at such pointing and clicking, to the point I almost forgot how nice it was to run a script rather than point and click ten times through a procedure. By the time I switched to Linux, my work was very inefficient. And I've spent the last year catching up.
But the bottom line is this: If Linux makes you spend too much time, perhaps Windows is an alternative. Without the possibility of major tweaking, maybe you'll get down to business, however cumbersome that business may be. If so, here are my recommendations for the migration:
Is the problem Linux's ugly fonts? There's no doubt that a standard Windows machine has nicer fonts than a standard Linux machine. However, consider that the appearance of Linux fonts can easily be improved to the point where they're quite readable and attractive on the screen, and downright beautiful on paper. Perhaps you long for the hundreds of fonts available in Windows. If so, keep in mind that all authorities on style agree that a document should use very few different typefaces, or else it will look like a kindergartner's scrapbook.
Is your motivation the need to spend too much time tweaking Linux? If so, ask yourself whether your problem might be solved by picking a distro -- any distro, and sticking with it, learning the GUI environment the same way you learned Windows in the old days.
Whatever your motivation, examine it carefully before taking a step that could take away your freedom to access your data and take away the fact that your computer is your castle.
You've become accustomed to a high level of security. Of using a normal login and not needing root unless you're really adjusting the system. Of seeing email viruses bounce harmlessly off your system. Once you switch to Windows, those days are gone.
Today you think nothing of grabbing an installer disk and installing your operating system. Do you really want to give that up? If you switch to Windows XP you have forced registration. And if your hardware changes, you must beg Bill Gates to let you reinstall your bought and paid for software on the new hardware.
Have you ever stopped to think how much software you get on that $10.00 distribution? Without going to the store, without downloading on the net, without any effort at all, you get many programming languages (C, C++, Java, Perl, Python, Ruby). You get several word processing programs (KWord, AbiWord, LyX and very possibly OpenOffice). Spreadsheets include the versatile, powerful and robust Gnumeric, KSpread, and often the OpenOffice spreadsheet program. You get the hugely powerful GIMP image manipulation program -- a program compared by many to Photoshop. You get presentation programs, and many other different types of office utilities. Your choice of the Mozilla, Galeon or Konqueror web browsers. Many different email clients, including the user friendly and virus impervious KMail. Countless text editors, including Vim, an editor as powerful as those selling for $300.00 in the Windows world. And Emacs, which surpasses the power of all those editors.
Speaking of money, you've probably become accustomed to spending nothing for software. Prepare for that to end. My tax records indicate my 2001 software purchase expense to be 0. Nada. Zip. I think I'll write a check to the Mandrake folks just to let them know how much I appreciate everything they've given me. I think back to the mid 90's, when my software expenses were in the four figure category most years. Going back to Windows will cost serious money.
Say good bye to data portability. You've probably come to take for granted the text format of LyX, or the XML format of OpenOffice, dia, and many other programs. No matter what kind of data conversion you want to do, if worse comes to worst you can always write scripts to parse and build these files. Not pleasant, and if you're not a programmer you'll need to hire someone to do it, but it's doable. When you move to proprietary apps, the data is in unfathomable binary format, often legally "protected" by anti-reverse-engineering license language or even software patents.
And that brings up the most basic privilege you'll be giving up. You'll be giving up ownership of your data. To the extent that you use proprietary software, you rent your own data in the form of software license fees and software maintenance fees. And this landlord has nastier legal language in his lease agreement than the worst slumlord, and you don't even sign the lease before buying, nor can you even look at the lease before buying. Before making the switch, ask yourself "Who owns my data?".
The minute you put your data in Microsoft Word, Bill Gates owns you. He can make it difficult or impossible to migrate your Word document to another format (LyX, let's say). If you upgrade to a new version, it's that much tougher. I've heard stories of people whose Word docs were unreadable by MS Word on a Mac, or even on a PC if it's an older version of Word. Bill Gates leads you around on a leash.
This problem repeats itself with every piece of software you use. Proprietary software is usually unfathomable binary for the simple reason that they want to make it as tough as possible to migrate away from their product. Once it's hard enough, they can ram any price down your throat, and you're helpless.
Application lock in is nothing compared to what Bill Gates plans for you on the Internet. With Passport, .net, applications, Internet Explorer and Windows all thoroughly entangled, you'll soon be sucked into the ultimate tar baby.
The solution, of course, is to use Open Source software on Windows. When you write books, use LyX, not MS Word. For an office suite, use OpenOffice, not MS Office. For finances, use GnuCash, not Quicken. Use Mozilla or Netscape, not Internet Explorer, to browse the net. Use a browser for email instead of the Outlook virus machine. Use ActiveState's implementation of the free software Python or Perl instead of VB, and if you need a C/C++ compiler, take the trouble to install the GNU compiler and the various .dll and header files needed to make it write native Windows code.
Understand that if, in the future, you want to migrate away from Windows, vendor lock-in will make it extremely difficult to do so. Unless you make plans to prevent vendor lock-in, you are writing yourself a one way ticket to Windows land.
You see, the minute you switch to Windows, Windows will begin playing silly tricks with the case of your filenames (Myfile.JPG, etc), and the permissions and attributes of your files. Back converting will be a huge hassle. Also, you want to back up before changing text data to the Windows form of line termination.
Next, change all text files to DOS format, with line termination
crlf instead of just lf. The April 2001 Troubleshooting Professional Magazine
has scripts to go the other way (Windows to Linux), so simply change the
scripts accordingly. It's important to do ALL this work on the Linux side,
because all those handy little techniques piping find into grep
into conversion scripts just don't work as well under Windows. Be sure
this work does not change the file's modification date. Using the touch
command in the script can do that for you. Once again, see the April 2001
Troubleshooting Professional Magazine.
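Here's a minimal sketch of such a conversion, assuming the text files live under /d and end in .txt, and that you want each file's modification date preserved (all names hypothetical -- and test on a copy first):

find /d -type f -name '*.txt' | while read f; do
   touch -r "$f" /tmp/stamp                                             # remember the original timestamp
   awk '{ printf "%s\r\n", $0 }' "$f" > "$f.tmp" && mv "$f.tmp" "$f"    # lf to crlf
   touch -r /tmp/stamp "$f"                                             # put the original timestamp back
done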
Next, on the Windows box use PKZip or WinZip to archive only those data files that have changed since your transition. This gives you a second tree -- an incremental tree, you could say. FTP that .zip file over to the new Linux box, restore it in an alternative directory, and then devise a script to restore proper UNIX filename case and proper UNIX line termination, and then move the files to their original home.
From there, the person needs to follow a procedure to make sure Windows can serve him well, and make sure there's no vendor lock-in. Finally, don't burn your bridges. Get an ironclad backup before beginning your transition, and have a plan to go back if things don't work out as well in Windows as you thought they would.
Any article submitted to Linux Productivity Magazine must be licensed with the Open Publication License, which you can view at http://opencontent.org/openpub/. At your option you may elect the option to prohibit substantive modifications. However, in order to publish your article in Linux Productivity Magazine, you must decline the option to prohibit commercial use, because Linux Productivity Magazine is a commercial publication.
Obviously, you must be the copyright holder and must be legally able to so license the article. We do not currently pay for articles.
Troubleshooters.Com reserves the right to edit any submission for clarity or brevity, within the scope of the Open Publication License. If you elect to prohibit substantive modifications, we may elect to place editor's notes outside of your material, or reject the submission, or send it back for modification. Any published article will include a two sentence description of the author, a hypertext link to his or her email, and a phone number if desired. Upon request, we will include a hypertext link, at the end of the magazine issue, to the author's website, providing that website meets the Troubleshooters.Com criteria for links and that the author's website first links to Troubleshooters.Com. Authors: please understand we can't place hyperlinks inside articles. If we did, only the first article would be read, and we can't place every article first.
Submissions should be emailed to Steve Litt's email address, with subject line Article Submission. The first paragraph of your message should read as follows (unless other arrangements are previously made in writing):
Copyright (c) 2001 by <your name>. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, version Draft v1.0, 8 June 1999 (Available at http://www.troubleshooters.com/openpub04.txt/ (wordwrapped for readability at http://www.troubleshooters.com/openpub04_wrapped.txt). The latest version is presently available at http://www.opencontent.org/openpub/).
Open Publication License Option A [ is | is not] elected, so this document [may | may not] be modified. Option B is not elected, so this material may be published for commercial purposes.
After that paragraph, write the title, text of the article, and a two sentence description of the author.