Steve Litt's Perls of Wisdom
The 10% you need -- for 90% of your work
Introduction
Perl is a portable, command line driven, interpreted development environment.
Used in appropriate problem domains, you can develop in it 5 to 20 times
faster than in its supposedly RAD competitors. Written properly, the same
code will run identically on Win95, Win NT, and UNIX (all flavors). I've
heard it will also run on Macintosh.
Perl's pretty fast for an interpreter. Sure, it's slower than an equivalent
C++ program, but I'd put it up against VB any day.
The Perl development environment is command line driven, with source
files being plain text. I view it as a refreshing change from GUI development
environments, whose apps are so easy to develop but so hard to figure out
six months later. Also, since source is pure text, there's less chance of
file corruption trashing your source code.
Perl has no GUI. It's pure STDIN/STDOUT/STDERR plus UNIX-like
file streams. For quick reports, file integrity checking, system administration
or file conversion, this is exactly what you want. If you need a GUI user
interface, simply add your favorite Web Browser and an HTML form.
Perl is perfect for small to medium text based "web apps". It develops
ultra-quick, runs ultra-quick, and can handle small to medium database
requirements. And it's totally portable. To change servers or ISPs, just
FTP the site over and change the shebang (#!) line that says where the
Perl interpreter lives.
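For instance, here's a trivial CGI-style script. The only server-specific
part is the first line; /usr/bin/perl is an assumption, and on another
host it might be /usr/local/bin/perl:

#!/usr/bin/perl
# The shebang line above is the only thing you change when you move
# this script to a server that keeps Perl somewhere else.
print "Content-type: text/plain\n\n";
print "Hello from Perl\n";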
Perl isn't the answer to everything. You certainly wouldn't create a
draw or paint program in Perl. If run-time speed is critical, Perl isn't
the answer. You probably can't directly hit your corporate database with
Perl (although with new support being added all the time, this too may
change). And Perl is "corporationally incorrect". There's no drag and drop.
All source is text. It's free. And it's not made in Redmond.
I like Perl. A lot.
Where Perl Excels
Impossible Deadlines
Assignment:
We need to import each of our hundreds of individual user databases into
our new corporate databases. We need this done in 48 hours.
Perl Solution:
Subroutine 1:
Walk the directory tree containing each individual database. For each directory,
call (via system()) a simple database utility or program to export
each table to delimited text, then copy the delimited text records to
single files at the root of the tree.
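A minimal sketch of that walk, using the standard File::Find module. The
tree root and the dbexport utility are placeholders for whatever your
database ships with; note no_chdir, which keeps the script in one place
(see the "Don't change directories" landmine below):

#!/usr/bin/perl
use strict;
use File::Find;

my $root = '/data/userdbs';    # assumption: top of the database tree

# Visit everything under $root without chdir()ing, and run the
# (hypothetical) export utility against each directory found.
find({ no_chdir => 1, wanted => sub {
    return unless -d $_;
    system('dbexport', $_) == 0
        or warn "export failed in $_: exit ", $? >> 8, "\n";
}}, $root);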
Subroutine 2:
Walk through the concatenated delimited files, changing field lengths and
positions to match those of the final destination, and changing delimiters
to those used by the new database's import utility. Also check for any
errors detectable at this point. Note that duplicates can be detected using
sort() and a simple control-break algorithm, as sketched below.
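The control-break idea in miniature: once the records are sorted, a
duplicate is simply a record that equals its predecessor. A minimal
sketch, assuming one record per line on STDIN with the whole line as
the key:

#!/usr/bin/perl
use strict;

my @records = sort <STDIN>;    # sorting brings duplicates together
my $previous = '';
foreach my $record (@records) {
    print "DUP: $record" if $record eq $previous;   # control break
    $previous = $record;
}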
Subroutine 3:
FTP or otherwise transfer the files to the location required by the new
database's import utility.
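If FTP is the transfer method, the standard Net::FTP module does the
whole job in a few lines. The host, login and file name below are
placeholders:

#!/usr/bin/perl
use strict;
use Net::FTP;

my $ftp = Net::FTP->new('import.example.com')
    or die "cannot connect: $@\n";
$ftp->login('importuser', 'secret') or die "login failed\n";
$ftp->put('exported_records.txt')   or die "put failed\n";
$ftp->quit;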
Batch File and Shellscript Substitutes
Any time a batch file or shellscript grows beyond 30 statements, calls
other batch files or shellscripts, contains moderate to complex program
flow logic, calls grep or a text editor, or requires parsing of files,
use Perl.
Assignment:
Our nightly network backups intermittently fail. Some nights they fail
because the corporate transfer program is still running. The corporate
transfer program takes anywhere from 10 minutes on an easy night to 4 hours
at the end of the month. We could always set our backup to go off 4 hours
after starting the corporate transfer program, but many nights the /wange
directory has so many files that if we started the backup that late, it
would fail when users get on the system at 7am. Most of the files in the
/wange directory are .wan files, which aren't absolutely critical but should
be backed up if possible. The rest of the files in /wange must be backed
up.
Perl Solution:
Subroutine 1:
Loop, checking whether the corporate transfer program is running, sleeping
10 minutes between checks -- that's sleep(600), since sleep() counts seconds.
Once the corporate transfer program is found not to be running,
return with the current time.
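A sketch of that wait loop. It assumes a UNIX-style ps, and that the
transfer program shows up in the process table as corptrans; both are
placeholders:

#!/usr/bin/perl
use strict;

# Poll every 10 minutes until the corporate transfer program is gone,
# then return the completion time in epoch seconds.
sub wait_for_transfer {
    while (`ps -e` =~ /\bcorptrans\b/) {
        sleep(600);                    # 10 minutes
    }
    return time();
}

my $done = wait_for_transfer();
print scalar(localtime($done)), ": transfer finished\n";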
Subroutine 2:
Subtract the current time from 7am to determine the time available for
backup. Use opendir(), readdir() and closedir() to count the files
in a total backup, and the .wan files in /wange. Do
arithmetic based on average backup time per file to estimate whether the
full backup can complete before 7am. If so, return TOTAL_BACKUP. If not,
subtract the number of .wan files in /wange and redo the arithmetic. If
the backup can complete by 7am without them, return EXCLUDE_WANGE. If it
still can't complete by 7am, check whether the /wange/*.wan files are more
than 15% of the total: if so, return EXCLUDE_WANGE, since dropping them
at least shortens the overrun; otherwise return TOTAL_BACKUP, since dropping
them buys almost nothing.
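A sketch of that decision, with the opendir()/readdir()/closedir() counting
pulled into a helper. The directory names, file patterns and the
seconds-per-file figure are all assumptions:

#!/usr/bin/perl
use strict;

# Count files in $dir whose names match $pattern.
sub count_files {
    my($dir, $pattern) = @_;
    opendir(DIR, $dir) or die "cannot open $dir: $!\n";
    my @files = grep { /$pattern/ && -f "$dir/$_" } readdir(DIR);
    closedir(DIR);
    return scalar(@files);
}

sub backup_plan {
    my($secs_until_7am) = @_;
    my $per_file = 2;    # measured average seconds per file (assumption)
    my $total = count_files('/wange', '.');       # stand-in for the whole backup set
    my $wan   = count_files('/wange', '\.wan$');  # just the .wan files
    return 'TOTAL_BACKUP'  if $total * $per_file <= $secs_until_7am;
    return 'EXCLUDE_WANGE' if ($total - $wan) * $per_file <= $secs_until_7am;
    return ($wan / $total > 0.15) ? 'EXCLUDE_WANGE' : 'TOTAL_BACKUP';
}

# Seconds from now until 7am (wraps correctly past midnight).
my($sec, $min, $hour) = localtime();
my $until_7am = (((7 - $hour) % 24) * 3600) - $min * 60 - $sec;
print backup_plan($until_7am), "\n";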
Subroutine 3:
Create a backup inclusion file that, depending on the return of subroutine
2, excludes or includes /wange/*.wan.
Subroutine 4:
Append to a log file. Record a timestamp and the backup command you're
about to invoke. Then start the backup with Perl's system() command. When
the backup finishes, write a timestamp and the return code of the backup
program to the log file.
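Subroutine 4 is only a few lines of Perl. The log file name and backup
command below are placeholders:

#!/usr/bin/perl
use strict;

my $cmd = 'backup /wange';                    # hypothetical backup command
open(LOG, '>>backup.log') or die "cannot append to log: $!\n";
print LOG scalar(localtime()), " starting: $cmd\n";
my $rc = system($cmd);                        # run the backup
print LOG scalar(localtime()), " finished, exit code ", $rc >> 8, "\n";
close(LOG);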
Parsing
Assignment:
We need to get sales data out of the mainframe and into Excel spreadsheets
(I know, bad idea, but management demands it). The program is 20 years
old and the mainframe programmers don't know how to give you just the data,
so they're giving you a disk image of a report. Don't worry, the report's
very intuitive, with word wrapping on the data fields, and data fields
separated either by spaces, or sometimes one or more tabs. Oh, and also,
spurious JCL commands appear in the output occasionally, so you'll need
to filter those out. Don't worry though, you've got a couple days to complete
this.
Perl Solution :-)
- Open input (disk report) and output files.
- while(<INPUT_REPORT>)    # pseudo code, not exact
  - Read the next line.
  - Use a regular expression to filter out JCL.
  - Use a regular expression to deduce whether the line is a new record
    or just word wrap.
  - On a new record:
    - Finish the old record.
    - Write the old record.
    - Zero the totals.
    - Start the new record.
  - Using regular expressions, parse the data from the prose and update
    all totals.
- Close input and output files.
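A sketch of that loop. JCL statements really do start with //, but the
record-start pattern and the file names are invented for illustration:

#!/usr/bin/perl
use strict;

open(INPUT_REPORT, 'report.txt') or die "cannot open report: $!\n";
open(OUTPUT, '>records.txt')     or die "cannot open output: $!\n";

my @fields;
while (<INPUT_REPORT>) {
    chomp;
    next if /^\/\//;             # filter out JCL (// in column 1)
    if (/^\d{6}\s/) {            # assumption: a record starts with a 6-digit key
        print OUTPUT join("\t", @fields), "\n" if @fields;  # flush old record
        @fields = split ' ';     # start the new record (splits on runs of whitespace)
    } else {
        push @fields, split ' '; # word wrap: continuation of the current record
    }
}
print OUTPUT join("\t", @fields), "\n" if @fields;          # the last record
close(INPUT_REPORT);
close(OUTPUT);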
Quick and Dirty Reports, and Error Checking
Assignment:
We need independent confirmation of the timesheet items and hours reported
in the database after import.
Perl Solution:
Subroutine 1: Deduce totals from individual timesheet files:
Use opendir(), readdir() and closedir() to get names of individual timesheet
files. For each file call do_one_file().
Subroutine do_one_file:
Open the file, then read and parse it, obtaining the number of entries
and hours. Add those to the "input file totals".
Subroutine 2: Deduce totals from intermediate file imported by database:
Open the file and read it line by line. Using line counters and regular
expressions, parse for new entries and number of hours. Add those to the
"intermediate file totals".
Subroutine 3: Print the results:
Print the number of entries and hours as reported from the input files,
and as reported from the intermediate file. These should match each other,
and should match the reports from the database.
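A sketch of Subroutine 1 and do_one_file(). The timesheets directory,
the .txt extension, and the assumption that each entry line ends in an
hours figure are all placeholders for your real format:

#!/usr/bin/perl
use strict;

# Tally entries and hours from one timesheet file.
sub do_one_file {
    my($file) = @_;
    my($entries, $hours) = (0, 0);
    open(SHEET, $file) or die "cannot open $file: $!\n";
    while (<SHEET>) {
        if (/(\d+(?:\.\d+)?)\s*$/) {   # entry line ending in an hours figure
            $entries++;
            $hours += $1;
        }
    }
    close(SHEET);
    return ($entries, $hours);
}

my($total_entries, $total_hours) = (0, 0);
opendir(DIR, 'timesheets') or die "cannot open timesheets: $!\n";
foreach my $f (grep { /\.txt$/ } readdir(DIR)) {
    my($e, $h) = do_one_file("timesheets/$f");
    $total_entries += $e;
    $total_hours   += $h;
}
closedir(DIR);
print "Input file totals: $total_entries entries, $total_hours hours\n";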
File Conversions
For a great example of this, see Impossible Deadlines above.
Perl Hints and Landmines
Don't change directories
The Windows version of Perl (the Activeware version, which I like the best)
has a bug which does strange things with directories and paths if you attempt
to change directories. Instead, stay in one place and refer to all files
by pathnames relative to where you are now, or by absolute pathnames. I
don't know whether UNIX versions also have this bug.
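In other words, instead of something like this:

chdir('data');                 # risky on the Windows port
open(FILE, 'info.txt');

stay put and fold the directory into the filename:

open(FILE, 'data/info.txt') or die "cannot open data/info.txt: $!\n";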
Readdir() bugs on other drives
I've seen an intermittent bug in the Windows version, where readdir() can
report wrong or incomplete file lists in a directory on a drive other than
your current drive. I know of no way around this.
Weak Type Checking
It's real cool to be able to read a string into a variable, then add it
to a floating point accumulator or compare it against PI. Weak
type checking furthers Perl's purpose of extremely rapid development. But
it can turn a reproducible compile time error into an intermittent runtime
error, or even worse, a silently wrong result. Just be aware and be careful.
If something should be a number, make sure you test it first with a regular
expression.
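A minimal sketch of such a test, reading one value from STDIN. The pattern
accepts an optional sign, digits, and an optional decimal part:

#!/usr/bin/perl
use strict;

my $total = 0;
my $field = <STDIN>;     # e.g. a value parsed out of a report
chomp($field);
if ($field =~ /^-?\d+(\.\d+)?$/) {   # looks like a number
    $total += $field;
} else {
    warn "not a number: '$field'\n";
}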
Variables are Global by Default
This is one of the two minor criticisms I have of this language (the other
concerns object encapsulation issues and is beyond the scope of this document).
Check this out:
sub getName
{
    $name = <STDIN>;    # no my() -- $name is a global!
    chomp($name);
    return($name);
}
sub doLabel
{
    print "What is your name?==>";
    $name = getName();
    print "\nWhat is your spouse's name?==>";
    $spousename = getName();
    print "\n";
    print PRINTERDEVICE "Hi. I'm $name.\n";
    print PRINTERDEVICE "My spouse is $spousename.\n";
}
This program prints your spouse's name on *both* lines, because $name is
global, so the second call to getName() resets global $name to your
spouse's name (after you typed it in). To get the program to act as intended,
in subroutine getName() replace this:
$name = <STDIN>;
with this:
my($name) = <STDIN>;
Get in the habit of always using the my() construct on all variables,
unless you really want them to be global (and in anything but a quick and
dirty throwaway program, globals are asking for trouble).
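A related safety net: the strict pragma makes Perl reject undeclared
globals at compile time, turning this whole class of bug into an immediate,
reproducible error:

#!/usr/bin/perl
use strict;              # demand declared (my) variables

my $name = "declared";   # fine
# $oops = "undeclared";  # uncommented, this dies at compile time:
#                        # Global symbol "$oops" requires explicit package name
print "$name\n";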
NOTE: I've seen rare and intermittent cases where variables
constructed as my($varname) = expression act funny in loops. If you find
something like that, declare the var on one line and assign it in the next
as follows:
my($name);
$name = <STDIN>;
Copyright (C) 1998-2000, 2009 by Steve Litt