Steve Litt's Perls of Wisdom
The 10% you need -- for 90% of your work
Introduction
Perl is a portable, command line driven, interpreted development environment.
Used in appropriate problem domains, you can develop in it 5 to 20 times
faster than in its supposedly RAD competitors. Written properly, the same
code will run identically on Win95, Win NT, and UNIX (all flavors). I've
heard it will also run on Macintosh.
Perl's pretty fast for an interpreter. Sure, it's slower than an equivalent
C++ program, but I'd put it up against VB any day.
The Perl development environment is command line driven, with source
files being plain text. I view it as a refreshing change from GUI development
environments, whose apps are so easy to develop but so hard to figure out
six months later. Also, since source is pure text, there's less chance of
file corruption trashing your source code.
Perl has no GUI. It's pure STDIN/STDOUT/STDERR plus UNIX-like
file streams. For quick reports, file integrity checking, system administration
or file conversion, this is exactly what you want. If you need a GUI user
interface, simply add your favorite Web Browser and an HTML form.
Perl is perfect for small to medium text based "web apps". It develops
ultra-quick, runs ultra-quick, and can handle small to medium database
requirements. And it's totally portable. To change servers or ISPs, just
FTP the site over and change the shebang (#!) line that says where the
Perl interpreter lives.
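For instance, here's a trivial CGI-style script. The only server-specific
part is the first line; /usr/bin/perl is an assumption, and on another
host it might be /usr/local/bin/perl:

#!/usr/bin/perl
# The shebang line above is the only thing you change when you move
# this script to a server that keeps Perl somewhere else.
print "Content-type: text/plain\n\n";
print "Hello from Perl\n";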
Perl isn't the answer to everything. You certainly wouldn't create a
draw or paint program in Perl. If run-time speed is critical, Perl isn't
the answer. You probably can't directly hit your corporate database with
Perl (although with new support being added all the time, this too may
change). And Perl is "corporationally incorrect". There's no drag and drop.
All source is text. It's free. And it's not made in Redmond.
I like Perl. A lot.
Where Perl Excels
Impossible Deadlines
Assignment:
We need to import each of our hundreds of individual user databases into
our new corporate databases. We need this done in 48 hours.
Perl Solution:
Subroutine 1:
Walk the directory tree containing each individual database. For each directory,
call (via system()) a simple database utility or program to export
each table to delimited text, then copy the delimited text records to
single files at the root of the tree.
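A minimal sketch of that walk, using the standard File::Find module. The
tree root and the dbexport utility are placeholders for whatever your
database ships with; note no_chdir, which keeps the script in one place
(see the "Don't change directories" landmine below):

#!/usr/bin/perl
use strict;
use File::Find;

my $root = '/data/userdbs';    # assumption: top of the database tree

# Visit everything under $root without chdir()ing, and run the
# (hypothetical) export utility against each directory found.
find({ no_chdir => 1, wanted => sub {
    return unless -d $_;
    system('dbexport', $_) == 0
        or warn "export failed in $_: exit ", $? >> 8, "\n";
}}, $root);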
Subroutine 2:
Walk through the concatenated delimited files, changing field lengths and
positions to match those of the final destination, and changing delimiters
to those used by the new database's import utility. Also check for any
errors detectable at this point. Note that duplicates can be detected using
sort() and a simple control-break algorithm, as sketched below.
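The control-break idea in miniature: once the records are sorted, a
duplicate is simply a record that equals its predecessor. A minimal
sketch, assuming one record per line on STDIN with the whole line as
the key:

#!/usr/bin/perl
use strict;

my @records = sort <STDIN>;    # sorting brings duplicates together
my $previous = '';
foreach my $record (@records) {
    print "DUP: $record" if $record eq $previous;   # control break
    $previous = $record;
}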
Subroutine 3:
FTP or otherwise transfer the files to the location required by the new
database's import utility.
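If FTP is the transfer method, the standard Net::FTP module does the
whole job in a few lines. The host, login and file name below are
placeholders:

#!/usr/bin/perl
use strict;
use Net::FTP;

my $ftp = Net::FTP->new('import.example.com')
    or die "cannot connect: $@\n";
$ftp->login('importuser', 'secret') or die "login failed\n";
$ftp->put('exported_records.txt')   or die "put failed\n";
$ftp->quit;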
Batch File and Shellscript Substitutes
Any time a batch file or shellscript grows beyond 30 statements, calls
other batch files or shellscripts, contains moderate to complex program
flow logic, calls grep or a text editor, or requires parsing of files,
use Perl.
Assignment:
Our nightly network backups intermittently fail. Some nights they fail
because the corporate transfer program is still running. The corporate
transfer program takes anywhere from 10 minutes on an easy night to 4 hours
at the end of the month. We could always set our backup to go off 4 hours
after starting the corporate transfer program, but many nights the /wange
directory has so many files that if we started the backup that late, it
would fail when users get on the system at 7am. Most of the files in the
/wange directory are .wan files, which aren't absolutely critical but should
be backed up if possible. The rest of the files in /wange must be backed
up.
Perl Solution:
Subroutine 1:
Loop, checking whether the corporate transfer program is running, sleeping
10 minutes between checks -- that's sleep(600), since sleep() counts seconds.
Once the corporate transfer program is found not to be running,
return with the current time.
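A sketch of that wait loop. It assumes a UNIX-style ps, and that the
transfer program shows up in the process table as corptrans; both are
placeholders:

#!/usr/bin/perl
use strict;

# Poll every 10 minutes until the corporate transfer program is gone,
# then return the completion time in epoch seconds.
sub wait_for_transfer {
    while (`ps -e` =~ /\bcorptrans\b/) {
        sleep(600);                    # 10 minutes
    }
    return time();
}

my $done = wait_for_transfer();
print scalar(localtime($done)), ": transfer finished\n";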
Subroutine 2:
Subtract the current time from 7am to determine the time available for
backup. Use opendir(), readdir() and closedir() to count the files
in a total backup, and the .wan files in /wange. Do
arithmetic based on average backup time per file to estimate whether the
full backup can complete before 7am. If so, return TOTAL_BACKUP. If not,
subtract the number of .wan files in /wange and redo the arithmetic. If
the backup can complete by 7am without them, return EXCLUDE_WANGE. If it
still can't complete by 7am, check whether the /wange/*.wan files are more
than 15% of the total: if so, return EXCLUDE_WANGE, since dropping them
at least shortens the overrun; otherwise return TOTAL_BACKUP, since dropping
them buys almost nothing.
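A sketch of that decision, with the opendir()/readdir()/closedir() counting
pulled into a helper. The directory names, file patterns and the
seconds-per-file figure are all assumptions:

#!/usr/bin/perl
use strict;

# Count files in $dir whose names match $pattern.
sub count_files {
    my($dir, $pattern) = @_;
    opendir(DIR, $dir) or die "cannot open $dir: $!\n";
    my @files = grep { /$pattern/ && -f "$dir/$_" } readdir(DIR);
    closedir(DIR);
    return scalar(@files);
}

sub backup_plan {
    my($secs_until_7am) = @_;
    my $per_file = 2;    # measured average seconds per file (assumption)
    my $total = count_files('/wange', '.');       # stand-in for the whole backup set
    my $wan   = count_files('/wange', '\.wan$');  # just the .wan files
    return 'TOTAL_BACKUP'  if $total * $per_file <= $secs_until_7am;
    return 'EXCLUDE_WANGE' if ($total - $wan) * $per_file <= $secs_until_7am;
    return ($wan / $total > 0.15) ? 'EXCLUDE_WANGE' : 'TOTAL_BACKUP';
}

# Seconds from now until 7am (wraps correctly past midnight).
my($sec, $min, $hour) = localtime();
my $until_7am = (((7 - $hour) % 24) * 3600) - $min * 60 - $sec;
print backup_plan($until_7am), "\n";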
Subroutine 3:
Create a backup inclusion file that, depending on the return of subroutine
2, excludes or includes /wange/*.wan.
Subroutine 4:
Append to a log file. Record a timestamp and the backup command you're
about to invoke. Then start the backup with Perl's system() command. When
the backup finishes, write a timestamp and the return code of the backup
program to the log file.
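Subroutine 4 is only a few lines of Perl. The log file name and backup
command below are placeholders:

#!/usr/bin/perl
use strict;

my $cmd = 'backup /wange';                    # hypothetical backup command
open(LOG, '>>backup.log') or die "cannot append to log: $!\n";
print LOG scalar(localtime()), " starting: $cmd\n";
my $rc = system($cmd);                        # run the backup
print LOG scalar(localtime()), " finished, exit code ", $rc >> 8, "\n";
close(LOG);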
Parsing
Assignment:
We need to get sales data out of the mainframe and into Excel spreadsheets
(I know, bad idea, but management demands it). The program is 20 years
old and the mainframe programmers don't know how to give you just the data,
so they're giving you a disk image of a report. Don't worry, the report's
very intuitive, with word wrapping on the data fields, and data fields
separated either by spaces, or sometimes one or more tabs. Oh, and also,
spurious JCL commands appear in the output occasionally, so you'll need
to filter those out. Don't worry though, you've got a couple days to complete
this.
Perl Solution :-)
- Open input (disk report) and output files.
- while(<INPUT_REPORT>)    # pseudo code, not exact
  - Read the next line.
  - Use a regular expression to filter out JCL.
  - Use a regular expression to deduce whether the line is a new record
    or just word wrap.
  - On a new record:
    - Finish the old record.
    - Write the old record.
    - Zero the totals.
    - Start the new record.
  - Using regular expressions, parse the data from the prose and update
    all totals.
- Close input and output files.
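A sketch of that loop. JCL statements really do start with //, but the
record-start pattern and the file names are invented for illustration:

#!/usr/bin/perl
use strict;

open(INPUT_REPORT, 'report.txt') or die "cannot open report: $!\n";
open(OUTPUT, '>records.txt')     or die "cannot open output: $!\n";

my @fields;
while (<INPUT_REPORT>) {
    chomp;
    next if /^\/\//;             # filter out JCL (// in column 1)
    if (/^\d{6}\s/) {            # assumption: a record starts with a 6-digit key
        print OUTPUT join("\t", @fields), "\n" if @fields;  # flush old record
        @fields = split ' ';     # start the new record (splits on runs of whitespace)
    } else {
        push @fields, split ' '; # word wrap: continuation of the current record
    }
}
print OUTPUT join("\t", @fields), "\n" if @fields;          # the last record
close(INPUT_REPORT);
close(OUTPUT);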
Quick and Dirty Reports, and Error Checking
Assignment:
We need independent confirmation of the timesheet items and hours reported
in the database after import.
Perl Solution:
Subroutine 1: Deduce totals from individual timesheet files:
Use opendir(), readdir() and closedir() to get names of individual timesheet
files. For each file call do_one_file().
Subroutine do_one_file:
Open the file, then read and parse it, obtaining the number of entries
and hours. Add those to the "input file totals".
Subroutine 2: Deduce totals from intermediate file imported by database:
Open the file and read it line by line. Using line counters and regular
expressions, parse for new entries and number of hours. Add those to the
"intermediate file totals".
Subroutine 3: Print the results:
Print the number of entries and hours as reported from the input files,
and as reported from the intermediate file. These should match each other,
and should match the reports from the database.
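A sketch of Subroutine 1 and do_one_file(). The timesheets directory,
the .txt extension, and the assumption that each entry line ends in an
hours figure are all placeholders for your real format:

#!/usr/bin/perl
use strict;

# Tally entries and hours from one timesheet file.
sub do_one_file {
    my($file) = @_;
    my($entries, $hours) = (0, 0);
    open(SHEET, $file) or die "cannot open $file: $!\n";
    while (<SHEET>) {
        if (/(\d+(?:\.\d+)?)\s*$/) {   # entry line ending in an hours figure
            $entries++;
            $hours += $1;
        }
    }
    close(SHEET);
    return ($entries, $hours);
}

my($total_entries, $total_hours) = (0, 0);
opendir(DIR, 'timesheets') or die "cannot open timesheets: $!\n";
foreach my $f (grep { /\.txt$/ } readdir(DIR)) {
    my($e, $h) = do_one_file("timesheets/$f");
    $total_entries += $e;
    $total_hours   += $h;
}
closedir(DIR);
print "Input file totals: $total_entries entries, $total_hours hours\n";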
File Conversions
For a great example of this, see Impossible Deadlines above.
Perl Hints and Landmines
Don't change directories
The Windows version of Perl (the Activeware version, which I like the best)
has a bug which does strange things with directories and paths if you attempt
to change directories. Instead, stay in one place and refer to all files
by pathnames relative to where you are now, or by absolute pathnames. I
don't know whether UNIX versions also have this bug.
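In other words, instead of something like this:

chdir('data');                 # risky on the Windows port
open(FILE, 'info.txt');

stay put and fold the directory into the filename:

open(FILE, 'data/info.txt') or die "cannot open data/info.txt: $!\n";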
Readdir() bugs on other drives
I've seen an intermittent bug in the Windows version, where readdir() can
report wrong or incomplete file lists in a directory on a drive other than
your current drive. I know of no way around this.
Weak Type Checking
It's real cool to be able to read a string into a variable, then add it
to a floating point accumulator or compare it against PI. Weak
type checking furthers Perl's purpose of extremely rapid development. But
it can turn a reproducible compile time error into an intermittent runtime
error, or even worse, a silently wrong result. Just be aware and be careful.
If something should be a number, make sure you test it first with a regular
expression.
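A minimal sketch of such a test, reading one value from STDIN. The pattern
accepts an optional sign, digits, and an optional decimal part:

#!/usr/bin/perl
use strict;

my $total = 0;
my $field = <STDIN>;     # e.g. a value parsed out of a report
chomp($field);
if ($field =~ /^-?\d+(\.\d+)?$/) {   # looks like a number
    $total += $field;
} else {
    warn "not a number: '$field'\n";
}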
Variables are Global by Default
This is one of the two minor criticisms I have of this language (the other
concerns object encapsulation issues and is beyond the scope of this document).
Check this out:
sub getName
{
    $name = <STDIN>;    # no my() -- $name is a global!
    chomp($name);
    return($name);
}
sub doLabel
{
    print "What is your name?==>";
    $name = getName();
    print "\nWhat is your spouse's name?==>";
    $spousename = getName();
    print "\n";
    print PRINTERDEVICE "Hi. I'm $name.\n";
    print PRINTERDEVICE "My spouse is $spousename.\n";
}
This program prints your spouse's name on *both* lines, because $name is
global, so the second call to getName() resets global $name to your
spouse's name (after you typed it in). To get the program to act as intended,
in subroutine getName() replace this:
$name = <STDIN>;
with this:
my($name) = <STDIN>;
Get in the habit of always using the my() construct on all variables,
unless you really want them to be global (and in anything but a quick and
dirty throwaway program, globals are asking for trouble).
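A related safety net: the strict pragma makes Perl reject undeclared
globals at compile time, turning this whole class of bug into an immediate,
reproducible error:

#!/usr/bin/perl
use strict;              # demand declared (my) variables

my $name = "declared";   # fine
# $oops = "undeclared";  # uncommented, this dies at compile time:
#                        # Global symbol "$oops" requires explicit package name
print "$name\n";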
NOTE: I've seen rare and intermittent cases where variables
constructed as my($varname) = expression act funny in loops. If you find
something like that, declare the var on one line and assign it in the next
as follows:
my($name);
$name = <STDIN>;
Copyright (C) 1998-2000, 2009 by Steve Litt