Troubleshooters.Com Presents

Linux Productivity Magazine

Volume 3 Issue 7, July 2004

Node.pm

Copyright (C) 2004 by Steve Litt. All rights reserved. Materials from guest authors copyrighted by them and licensed for perpetual use to Linux Productivity Magazine. All rights reserved to the copyright holder, except for items specifically marked otherwise (certain free software source code, GNU/GPL, etc.). All material herein provided "As-Is". User assumes all risk and responsibility for any outcome.

Have Steve Litt write you a quick application!

See also Troubleshooting Techniques of the Successful Technologist
and Rapid Learning: Secret Weapon of the Successful Technologist
by Steve Litt

[ Troubleshooters.Com | Back Issues |Troubleshooting Professional Magazine ]


Man is a tool-using animal.... Without tools he is nothing, with tools he is all. -- Thomas Carlyle

CONTENTS

Editor's Desk

Help Publicize Linux Productivity Magazine

GNU/Linux, open source and free software

Node.pm Installation

Node.pm Hello World

Parsing an Outline

Outputting Markup

Using Attributes

Accessing Nodes by Level

Deleting Nodes

Inserting Nodes

Determining the Node.pm Path

The Node.pm API

The Outline to HTML Outline program

Life After Windows: High Quality Tools

Letters to the Editor

How to Submit an Article

Trademarks

URLs Mentioned in this Issue

Editor's Desk

By Steve Litt

The original EMDL to UMENU converter was the worst code I ever wrote. Totally OOP and totally inscrutable, it consistently defied maintenance.

That's not to say it didn't work. It worked beautifully. I'd run it on my 2200 line EMDL (Easy Menu Definition Language) menu tree, and 15 seconds later I'd have a perfectly conforming set of UMENU menu definition files, ready to use with my UMENU system.

What is UMENU?

UMENU is a keyboard driven menu program that accepts user keystrokes, and in response invokes either submenus or programs. It runs on any Linux or Unix computer, and can substitute for the desktop manager's menu. It can also serve as the front end for commands with numerous options.

Crafted for the touch typist, only a single keystroke is required to trigger a submenu or program. It is not necessary to press the Enter key.

UMENU runs from a group of menu definition files, all ending in .mnu, each of which defines a single submenu and everything on it. The reason it was decided to have one file run one menu, instead of a single file defining the entire menu hierarchy, is because small files load fast relative to human keystrokes, so the user perceives no delay.

The trouble with the file per submenu paradigm is that it's cumbersome to author. To eliminate that disadvantage, EMDL (Easy Menu Definition Language), and an EMDL to UMENU converter, were created. Now an entire menu system can be created and maintained as a single tab indented outline file, which can then be compiled into performance enhancing UMENU files.

The only problem was, I wanted to make the EMDL parser generic, outputting not only UMENU menu definition files, but also IceWM menu files and whatever other menus were needed. But UMENU is based on many peer files, one for each menu, while IceWM menus are based on a single hierarchical file. There was no single place within the old EMDL parser to connect the IceWM Writer object. It would have been necessary to place hooks in several different places. Given the fact that 6 hours of study was required before making even the slightest modification to this program, adding IceWM capabilities would have made this program collapse under its own complexity.

The EMDL to UMENU parser was written in March 2002. Within 6 months I realized that adding an IceWM menu writer would be impossible. I put it aside.

-*-*-*-

A year before writing the EMDL to UMENU converter, my big project was a massive Troubleshooting Professional Magazine exploration of XML, including the DOM (Document Object Model) spec. There are many ways to describe DOM, but perhaps the most basic is that a DOM tree consists of a tree of nodes that can be navigated by functions defined in the DOM spec. Researching DOM, I found the following functions:

getFirstChild()
getLastChild()
getNextSibling()
getPreviousSibling()

Why, I asked myself, would they have getNextSibling() and getPreviousSibling()? Instead of navigating siblings, why not have getNextChild() and getPreviousChild()? If you're going to recurse through a tree, that's the way it's normally done. These sibling-centric are useful only if one continually changes his reference point -- sort of like moving a checker (from a game of checkers) around the node tree. Why would one do that?

Then it hit me like a ton of bricks. By moving the reference point around the node tree, one can navigate it using iteration instead of recursion:

#                     TOP
#                      |
#       .--------------+--------------.
#       |              |              |
#       N              N              N
#       |              |              |
#   .---+---.      .---+---.      .---+---.
#   |   |   |      |   |   |      |   |   |
#   N   N   N      N   N   N      N   N   N
#           |
#         .-+-.  
#         |   | 
#         N   N


while(1)
	{
	if ascending
		if back at top node 
			quit
	else
		perform action on this node

	if you can go down
		go down
	else if you can go right
		go right
	else
		go up
	}

The preceding algorithm is conceptually valuable, but to make it workable, when you test if you can go down, you must make sure you've never been at this node before, or you'll infinite-loop. In practice, you'd try to go down only if you are not ascending.

The XML issue of Troubleshooting Professional Magazine was a hit, I converted my business to Linux, and time moved on...

-*-*-*-

I finished "Troubleshooting Techniques of the Successful Technologist" in December 2001. It had been authored in LyX, after being outlined in VimOutliner. I had created scripts to display a subtree at a time, display an entire outline to a specific level, convert an outline to pages, and number the outline. As time went on I realized all these scripts had something in common -- their data was a hierarchy (tree). Remembering the DOM spec, I created a Node object to represent each piece of data.

Over time, I migrated the Node object to its own .pm file (Node.pm). Over time, I added an object to parse a tab indented outline (OutlineParser) and an object to walk and manipulate a tree of Node objects (Walker). Over time, Node.pm was debugged and became more robust. Time marched on...

-*-*-*-

I had little time to spare, but at the monthly installfests in June, July and August 2003, I rewrote the EMDL to UMENU converter, from scratch, using Node.pm. Doing so revealed several subtle bugs in Node.pm, which I fixed. Finally, regression tests proved that for a given small EMDL input, my new program produced the same UMENU files as the old program.

Now it was time to test with my 2200 line personal EMDL file, and I was afraid. The new algorithm repeatedly walked the Node tree, tweaking one little property at a time, instead of walking it once or twice and tweaking everything at once. The single property tweak algorithm made for a much simpler program, but at what cost to speed?

The old program took 15 seconds to run against my 2200 line personal EMDL file -- barely tolerable. If the new algorithm took much longer, it would be useless. I crossed my fingers and ran the program.

It took 2 seconds!

I danced around the room, rejoicing the 7.5x speed increase bestowed by Node.pm. In hindsight it was predictable. Node.pm keeps all data in memory, linked by very simple pointers (references actually). Methods primarily return such pointers. The application algorithm might have been suboptimal, but it was built on a very efficient Node.pm.

By early August 2003, Node.pm was a rock solid tool, and I had become expert with the tool by crafting the new EMDL to UMENU parser. I fully understood the power of Node.pm, but wasn't yet cognizant of either its ease of use or its wide applicability. That would come soon enough...

-*-*-*-

Between 1997 and 2003 I did OOP for OOP's sake. You can't blame me -- I was just mirroring what was going on in the IT world at large. The old "Structured paradigm" was considered obsolete -- the domain of old programmers who couldn't adapt.

As the new century progressed, I became increasingly disillusioned with OOP for OOP's sake. OOP's scalability advantages were greatly overrated. Unless done brilliantly, pure OOP produced code almost as unfathomable as the spaghetti code produced by our goto ancestors. OOP adherants had many design methodologies, most of which were more intuition than method. I began to feel there's got to be a better way.

In October 2003 there was a marathon thread on the VimOutliner list and some other lists concerning programming productivity. One thing led to another, and Noel Henson recommended using UNIX pipes to connect all modules, citing the \RDB RAD tool as an example. I tried it on a domain reporting program, and the results were horrible, slow and complex.

But the discussion led me to an understanding:

It's the Data, Stupid!

The more logic that can be moved out of procedural code and into data, the more maintainable the program. The more the program's data matches the underlying problem domain, the simpler and more elegant the program.

A long time ago I started every design with functional decomposition. In the late 1990's and early 2000's I started by asking: "if the program is considered a machine, what parts would it have?". Now I started every design with the question: "What is (are) the underlying data structure(s)?". Often the answer is "a hierarchy", or "a stack" or "a short list of entities with properties". Node.pm is a natural for hierarchies, stacks and lists with entities. I began using it more often.

In December 2003 I decided that UMENU needed a rewrite...

-*-*-*-

The UMENU code wasn't as bad as the EMDL to UMENU converter. But it wasn't good.

The code was written in 2 days, with an OOP design that sounded great in concept, but produced complexity upon coding. It sported four objects types:

CHOICE <pertains to a menu choice>
CONFIG <stores and handles configuration info>
OS <stores and handles operating system dependencies>
UMENUGLOBAL <repository for global variables>

It was ugly. CHOICE.pm was over 196 lines non-blank, non-comment, with CONFIG.pm at 84 lines, and OS.pm weighing in at around 120 lines. The main program, ./umenu.pl, was a whopping 400 lines. The objects called each other -- it was a mess.

The new program replaces each of the four objects with Node trees, so that the new program consists only of a main program (umenu.pl) and the Node.pm tool. Instead of needing to understand four different object, the maintenance programmer now needs to understand only the fully documented and debugged Node object. It's likely the programmer already has experience with Node objects, either from EMDL or elsewhere.

Without all the back and forth of objects calling objects calling objects, the program is much simpler to understand. At 649 non-blank, non-comment lines, umenu.pl isn't small, but it's much more straightforward and maintainable. And I've learned some valuable lessons...

-*-*-*-

Objects are wonderful as data wrappers and as emulators for physical objects. In other roles, objects are not ideal, and their use often becomes contrived and convoluted.

If all you have is a hammer, everything looks like a nail. Many recent programs have been written all OOP all the time, with lousy readability, maintainability, and yes, reusability. Programmers need tools other than OOP. Sometimes we need functional decomposition. Sometimes we need data analysis. Obviously, before any of this, we need to analyze the system being automated.

Occasionally we need to make tools. Tools should be widely applicable, reusable, easy and fast. Node.pm is such a tool. It's applicable any time hierarchical information needs to be manipulated. Its interface is simple, generic and complete, making it amazingly reusable with absolutely no modification. Its ability to granularize data manipulation makes it easy and fast to write and maintain. And the fact that it's built from simple pointers gives it fast runtime performance.

This month's Linux Productivity Magazine is devoted to the Node.pm tool. If you're a Perl developer, or a Linux or free software user, this is your magazine. Enjoy!

Steve Litt is the author of Samba Unleashed. Steve can be reached at his email address.

Help Publicize Linux Productivity Magazine

By Steve Litt

Loyal readers, I need your help.

For months I've publicized Linux Productivity Magazine, expanding it from a new magazine to a mainstay read by thousands. There's a limit to what I can do alone, but if you take one minute to help, the possibilities are boundless.

If you like this magazine, please report it to one of the Linux magazines. Tell them the URL, why you like it, and ask them to link to it.

I report it to them, but they don't take it very seriously when an author blows his own horn. When a hundred readers report the magazine, they'll sit up and take notice.

Reporting is simple enough. Just click on one of these links, and report the magazine. It will take less than 5 minutes.

News Mag	Submission URL or Email address	Comments
Slashdot	http://slashdot.org/submit.pl	Just fill in the short form.
LinuxToday	http://linuxtoday.com/contribute.php3	Just fill in the short form.
Linux Weekly News	lwn@lwn.net	Just tell them the URL, why you like it.
NewsForge	webmaster@linux.com	Just tell them the URL, why you like it.
VarLinux	http://www.varlinux.org/vlox/html/modules/news/submit.php	Just fill in the short form.
LinuxInsider.Com, Newsfactor Network	http://www.newsfactor.com/perl/contact_form.pl?to=contact	Just tell them the URL, why you like it.
The Linux Knowledge Portal	webmaster@linux-knowledge-portal.org	Just tell them the URL, why you like it.
OS News	http://www.osnews.com/submit.php	Just tell them the URL, why you like it.
DesktopLinux	http://www.desktoplinux.com/cgi-bin/news_post.cgi	Only for LPM issues involving the Linux desktop, not for programming or server issues.

If you really like this magazine, please take 5 minutes to help bring it to a wider audience. Submit it to one of the preceding sites.

Steve Litt is the founder and acting president of Greater Orlando Linux User Group (GoLUG). Steve can be reached at his email address.

GNU/Linux, open source and free software

By Steve Litt

Linux is a kernel. The operating system often described as "Linux" is that kernel combined with software from many different sources. One of the most prominent, and oldest of those sources, is the GNU project.

"GNU/Linux" is probably the most accurate moniker one can give to this operating system. Please be aware that in all of Troubleshooters.Com, when I say "Linux" I really mean "GNU/Linux". I completely believe that without the GNU project, without the GNU Manifesto and the GNU/GPL license it spawned, the operating system the press calls "Linux" never would have happened.

I'm part of the press and there are times when it's easier to say "Linux" than explain to certain audiences that "GNU/Linux" is the same as what the press calls "Linux". So I abbreviate. Additionally, I abbreviate in the same way one might abbreviate the name of a multi-partner law firm. But make no mistake about it. In any article in Troubleshooting Professional Magazine, in the whole of Troubleshooters.Com, and even in the technical books I write, when I say "Linux", I mean "GNU/Linux".

There are those who think FSF is making too big a deal of this. Nothing could be farther from the truth. The GNU General Public License, combined with Richard Stallman's GNU Manifesto and the resulting GNU-GPL License, are the only reason we can enjoy this wonderful alternative to proprietary operating systems, and the only reason proprietary operating systems aren't even more flaky than they are now.

For practical purposes, the license requirements of "free software" and "open source" are almost identical. Generally speaking, a license that complies with one complies with the other. The difference between these two is a difference in philosophy. The "free software" crowd believes the most important aspect is freedom. The "open source" crowd believes the most important aspect is the practical marketplace advantage that freedom produces.

I think they're both right. I wouldn't use the software without the freedom guaranteeing me the right to improve the software, and the guarantee that my improvements will not later be withheld from me. Freedom is essential. And so are the practical benefits. Because tens of thousands of programmers feel the way I do, huge amounts of free software/open source is available, and its quality exceeds that of most proprietary software.

In summary, I use the terms "Linux" and "GNU/Linux" interchangably, with the former being an abbreviation for the latter. I usually use the terms "free software" and "open source" interchangably, as from a licensing perspective they're very similar. Occasionally I'll prefer one or the other depending if I'm writing about freedom, or business advantage.

Steve Litt is the author of Troubleshooting Techniques of the Successful Technologist. Steve can be reached at his email address.

Node.pm Installation

By Steve Litt

For the purposes of this magazine, install Node.pm in /home/yourid/Node. as follows:

Download http://troubleshooters.a3b3.com/projects/Node/download/0.2.0/Node.0.2.0.tgz into your home directory.
tar xzvf Node.0.2.0.tgz
cd ~/Node
./example_hello.pl

If everything's gone correctly, example_hello.pl should yield the following results:

[slitt@mydesk Node]$ ./example_hello.pl

::: myname ::: mytype ::: myvalue :::
[slitt@mydesk Node]$

If so, go on to the next article. If not, check that Node.pm exists in the directory, as well as example_hello.pl. Troubleshoot as necessary.

Steve Litt is the author of the Universal Troubleshooting Process Courseware. Steve can be reached at his email address.

Node.pm Hello World

By Steve Litt

Create the following hello.pl in your home directory:

#!/usr/bin/perl -w

# Copyright (C) 2004 by Steve Litt
# Licensed with the GNU General Public License, Version 2
# ABSOLUTELY NO WARRANTY, USE AT YOUR OWN RISK
# See http://www.gnu.org/licenses/gpl.txt

use strict;	# prevent hard to find errors
use Node;	

my $topNode = Node->new("myname", "mytype", "myvalue");
print "\n::: ";
print $topNode->getName(), " ::: ";
print $topNode->getType(), " ::: ";
print $topNode->getValue(), " :::\n";

The "use Node" line includes the Node.pm tool. The Node->new() command instantiates a new Node object with name, type and value according to its three arguments.

Now run it, and watch what happens:

[slitt@mydesk slitt]$ ./hello.pl
Can't locate Node.pm in @INC (@INC contains: 
/usr/lib/perl5/5.8.1/i386-linux-thread-multi 
/usr/lib/perl5/5.8.1 
/usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi 
/usr/lib/perl5/site_perl/5.8.1 
/usr/lib/perl5/site_perl 
/usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi 
/usr/lib/perl5/vendor_perl/5.8.1 
/usr/lib/perl5/vendor_perl/5.8.0 
/usr/lib/perl5/vendor_perl .) at ./hello.pl line 9.
BEGIN failed--compilation aborted at ./hello.pl line 9.
[slitt@mydesk slitt]$

As you can see, the program could not find Node.pm because it was not installed on Perl's module path. There are many ways to address this problem. To give this document the ultimate applicability with minimal change, we'll create a shellscript to run the program, so that no matter where you home directory, it will run correctly. The shellscript is called run, is placed in your home directory, and looks like this:

#!/bin/sh
perl -w -I$HOME/Node $@

Be sure to chmod the run file executable. Now watch what happens when you run hello.pl through the run script:

[slitt@mydesk slitt]$ ./run hello.pl

::: myname ::: mytype ::: myvalue :::
[slitt@mydesk slitt]$

The hello.pl script runs correctly. All exercises in this magazine will use the run script to pass the proper Node.pm path to the perl program.

The preceding program created a single Node object with a name, a type and a value, and then output the information from that Node object.

Congratulations! You've just created a Node.pm program, including tackling the sticky issue of defining the path to Node.pm.

Steve Litt is the author of the Universal Troubleshooting Process Courseware. Steve can be reached at his email address.

Parsing an Outline

By Steve Litt

The preceding article was a simple proof of concept instantiating a single Node object and printing its name, type and value. This article guides you through parsing an outline into a Node tree, and then outputting the results.

Node.pm was originally created to parse and manipulate tab indented outlines. That's why it has an OutlineParser object whose entire job is to read an outline and convert it to a Node tree.

This article presents a typical, if oversimplified, outline manipulation program. At the most basic level, this is the logic of outline manipulation programs:

Use an OutlineParser object to parse the outline into a Node tree
Instantiate a Callbacks object.
Use one or more Walker objects to manipulate the Node tree
Use a Walker object to print the Node tree

The program in this article does not do #3, manipulate the Node tree. In that respect it's still a proof of concept.

Use an OutlineParser object to parse the outline into a Node tree

The OutlineParser object has several properties. You can define a comment character so it skips lines whose first nonblank is that character. You must define its input source either as stdin (fromStdin()) or a file (fromFile()). If from a file, the filename is passed as an argument to the parse() method. Once the OutlineParser object is configured, its parse() method is called to do the actual work of parsing the outline into a Node tree.

One more interesting point. The OutlineParser object creates a brand new Node object to serve as the ultimate ancestor of all Node objects representing lines in the outline. That way, an outline having multiple top level lines can be represented, with those top level lines being immediate children of the brand new Node object. This is a difference between the Node.pm OutlineParser object and the DOM XML parser, which cannot accommodate multiple top level entries. The OutlineParser created Node object has name="Header Node", type="Head", and value="Inserted by OutlineParser".

Once the outline is parsed into a tree of Node objects, it's ready to manipulate. Such manipulation is performed by Walker objects, which traverse the tree as if being recursed. At each node a callback function is called to perform work on the Node object.

Instantiate a Callbacks object.

Callback routines are called by the Walker object at two events:

Upon first entry into a Node <called the Entry callback>
Upon reentry into the Node from its children <called the return callback>

Note that the code in Node.pm calls the return callback the "exit callback". This is misleading because the return callback is NOT called on Node objects with no children.

Any callback routines used by the Node.pm tool's Walker object MUST be methods of an object. They cannot be free standing subroutines. The reason for making callbacks part of an object is so that persistent data can be accessed and manipulated by the callbacks, without the need for global variables.

The object containing the callback routine(s) needs no special naming convention. Readability favors the name Callbacks, but that's just custom. Also, there can be several different Callback objects. However, be sure to group Callback routines by needed persistent data.

A Callback object is typically instantiated like this:

my $callbacks = Callbacks->new();

However, there's nothing to stop a Callback object's new() method from instantiating variables or performing other functions.

Once the Callbacks object is instantiated, Walker objects can be instantiated using the Callbacks object's methods.

Use a Walker object to print the Node tree

A Walker object "walks" the Node tree, taking action on each Node object. The Walker object implements the algorithm discussed in this month's Editor's Desk article, as follows:

#                     TOP
#                      |
#       .--------------+--------------.
#       |              |              |
#       N              N              N
#       |              |              |
#   .---+---.      .---+---.      .---+---.
#   |   |   |      |   |   |      |   |   |
#   N   N   N      N   N   N      N   N   N
#           |
#         .-+-.  
#         |   | 
#         N   N


while(1)
	{
	if ascending
		if back at top node 
			quit
	else
		perform action on this node

	if you can go down
		go down
	else if you can go right
		go right
	else
		go up
	}

The preceding is for the most part the exact algorithm implemented by the Walker object. Notice the perform action on this node line in the preceding pseudocode. The action obviously cannot be defined within the Walker code because it changes from application to application. Therefore, the Walker code calls a dummy subroutine reference, called a callback routine, to do the work. It is then up to the application programmer to link the callback routine with the Walker object, after which the Walker object "magically does the right thing".

The only remaining question is, what does the callback routine do? Here it is:

package Callbacks;
sub cbPrintNode()
	{
	my($self, $checker, $level) = @_;
	unless (defined($checker)) {return;}
	print $level, " ::: ";
	print $checker->getValue();
	print "\n";
	}

As you can see in the preceding, the callback, which is a method of an object, has three arguments: 0) The object it's attached to, 1) The Node for which it was called, and 2) The level of that node in the node hierarchy. The Node argument is called $checker because the Walker action resembles a checker slid from Node to Node. The $level argument isn't strictly necessary, as the level of any Node could be deduced by repeated calls to getParent(). However, such repeated getParent() calls would be a major performance hit, and would also complicate callback routines. By passing the level as an argument to the callback, the Walker object makes the program faster and the programming easier.

The line starting with unless prevents the algorithm from being run on an undefined Node. In most real callback routines, several lines at the top return under certain conditions. By writing the routine like that, nested if statements are avoided, and the algorithm is kept simple.

This particular callback simply prints the level, followed by " ... ", followed by the value of the Node object.

The program code follows:

#!/usr/bin/perl -w

# Copyright (C) 2004 by Steve Litt
# Licensed with the GNU General Public License, Version 2
# ABSOLUTELY NO WARRANTY, USE AT YOUR OWN RISK
# See http://www.gnu.org/licenses/gpl.txt

use strict;

use Node;

package Callbacks;
sub new()
	{
	my($type) = $_[0];
	my($self) = {};
	bless($self, $type);
	return($self);
	}


sub cbPrintNode()
	{
	my($self, $checker, $level) = @_;
	unless (defined($checker)) {return;}
	print $level, " ::: ";
	print $checker->getValue();
	print "\n";
	}

package Main;

sub main()
	{
	#### PARSE FROM FILE README.otl
	my $parser = OutlineParser->new();
	$parser->setCommentChar("#");
	$parser->fromFile();
	my $topNode=$parser->parse("Node/README.otl");

	#### INSTANTIATE THE Callbacks OBJECT
	my $callbacks = Callbacks->new();

	#### WALK THE NODE TREE,
	#### OUTPUTTING LEVEL AND TEXT
	my $walker = Walker->new
		(
		$topNode,
		[\&Callbacks::cbPrintNode, $callbacks]
		);
	$walker->walk();
	}

main();

Run the preceding, and notice the output:

./run test.pl | less
0 ::: Inserted by OutlineParser
1 ::: MANUAL FOR THE Node.pm Tool
2 ::: Version 0.2.0 released 5/13/2004
1 ::: License
2 ::: Litt Perl Development Tools License, version 1
3 ::: See COPYING file
3 ::: This license is the GNU GPL with an exception
4 ::: See COPYING.GPL
2 ::: NO WARRANTY!!!!! See COPYING.GPL
1 ::: Purpose
2 ::: Handling hierarchies in Perl
2 ::: Implements a tree of nodes
2 ::: Each node has a name, a type, a value, and optionally attributes
2 ::: Each node can have zero, one or many attributes
2 ::: Each attribute has a name and a value
2 ::: Especially made to handle tab indented outlines in memory
1 ::: Learning Node.pm
2 ::: Learn from the example programs: Study them in this order:
3 ::: example_hello.pl
3 ::: example_parse.pl
3 ::: example_otl2markup.pl
3 ::: example_attribs.pl
3 ::: example_bylevel.pl

Lessons Learned

This article covered most of the elements of a Node.pm application. We used an OutlineParser object to parse the outline into a Node tree, instantiated a callback object, instantiated a Walker object linked to the callback routine designed to print out the nodes, and then set the walker in action with the walk() method.

Callback routines must be methods of an object, and have three arguments: 1) The callback object, 2) the Node object on which it's called, and 3) the level of that Node.

Steve Litt is the author of Rapid Learning: Secret Weapon of the Successful Technologist. Steve can be reached at his email address.

Outputting Markup

By Steve Litt

The preceding article has all the elements of a practical Node.pm app, but it isn't really practical. This article converts an outline to markup, with start and end tags.

If you think about the concept of tags and end tags, they're nested. If an object contains subobjects, its end tag is not printed until all its subobjects' tags and end tags are printed. This feature can be implemented with complex break logic, or with a stack. Fortunately, with Node.pm it's as simple as printing the end tags with the return callback.

Also introduced in this program is indentation, which is as simple as a tab printing loop indexed by $level. Here's the code:

#!/usr/bin/perl -w

# Copyright (C) 2004 by Steve Litt
# Licensed with the GNU General Public License, Version 2
# ABSOLUTELY NO WARRANTY, USE AT YOUR OWN RISK
# See http://www.gnu.org/licenses/gpl.txt

use strict;

use Node;

package Callbacks;
sub new()
	{
	my($type) = $_[0];
	my($self) = {};
	bless($self, $type);
	return($self);
	}

sub cbPrintTag()
	{
	my($self, $checker, $level) = @_;
	unless (defined($checker)) {return;}
	
	#### PRINT START TAG AND CONTENT
	for(my $n = 0; $n < $level; $n++) {print "\t";}
	print "<node level=", $level, ">";
	print $checker->getValue() if $checker->hasValue();

	#### IF THIS IS A LEAF LEVEL ITEM, PRINT THE
	#### END TAG IMMEDIATELY. OTHERWISE, THE
	#### RETURN CALLBACK WILL TAKE CARE OF THE END TAG.
	unless($checker->hasFirstChild())
		{
		print "</node>";
		}

	#### PRINT NEWLINE
	print "\n";
	}

sub cbPrintEndTag()
	{
	my($self, $checker, $level) = @_;
	unless (defined($checker)) {return;}
	
	#### PRINT END TAG FOR PARENT
	for(my $n = 0; $n < $level; $n++) {print "\t";}
	print "</node>";
	print "\n";
	}

package Main;

sub main()
	{
	#### PARSE FROM FILE README.otl
	my $parser = OutlineParser->new();
	$parser->setCommentChar("#");
	$parser->fromFile();	
	my $topNode=$parser->parse("Node/README.otl");

	#### INSTANTIATE THE Callbacks OBJECT
	my $callbacks = Callbacks->new();

	#### WALK THE NODE TREE,
	#### OUTPUTTING TAG, VALUE, AND END TAG
	my $walker = Walker->new
		(
		$topNode,
		[\&Callbacks::cbPrintTag, $callbacks],
		[\&Callbacks::cbPrintEndTag, $callbacks]
		);
	$walker->walk();
	}

main();

When running it through the less command, you see that it succeeds:

./run test.pl | less
<node level=0>Inserted by OutlineParser
        <node level=1>MANUAL FOR THE Node.pm Tool
                <node level=2>Version 0.2.0 released 5/13/2004</node>
        </node>
        <node level=1>License
                <node level=2>Litt Perl Development Tools License, version 1
                        <node level=3>See COPYING file</node>
                        <node level=3>This license is the GNU GPL with an exception
                                <node level=4>See COPYING.GPL</node>
                        </node>
                </node>
                <node level=2>NO WARRANTY!!!!! See COPYING.GPL</node>
        </node>
        <node level=1>Purpose
                <node level=2>Handling hierarchies in Perl</node>
                <node level=2>Implements a tree of nodes</node>
                <node level=2>Each node has a name, a type, a value, and optionally attributes</node>
                <node level=2>Each node can have zero, one or many attributes</node>
                <node level=2>Each attribute has a name and a value</node>
                <node level=2>Especially made to handle tab indented outlines in memory</node>
        </node>
        <node level=1>Learning Node.pm
                <node level=2>Learn from the example programs: Study them in this order:
                        <node level=3>example_hello.pl</node>
                        <node level=3>example_parse.pl</node>
                        <node level=3>example_otl2markup.pl</node>
                        <node level=3>example_attribs.pl</node>
                        <node level=3>example_bylevel.pl</node>
                        <node level=3>example_delete.pl</node>
                        <node level=3>example_insert.pl</node>
                        <node level=3>example_nodepath.pl</node>
                </node>

It's pretty self-explanatory. The only notable code feature is the instantiation of the Walker object, which is instantiated with two callback routines, not just one:

my $walker = Walker->new
		(
		$topNode,
		[\&Callbacks::cbPrintTag, $callbacks],
		[\&Callbacks::cbPrintEndTag, $callbacks]
		);

The second callback routine is the return callback, triggering when a Node object is revisited after all its children have been visited. That second callback routine prints the end tag. Notice that the return callback is initiated only upon return to the Node object from its children, meaning that it's never called for leaf level Node objects. To print an end tag for leaf level nodes, the entry callback itself prints the end tag, but only if the node passes the test for being childless.

Lessons Learned

Sometimes you need to perform an action on a node, but only after all its children have been considered. End tags, tree totals, and percentage completion on trees are just a few examples. When this is a requirement, use the return callback to print the final information. Remember, return callbacks are not called on childless nodes, so if childless nodes need this same feature, the entry callback must test for children, and if there are none, print the final information.

Correct indentation is as simple as printing a Tab character $level times, typically in a loop.

Steve Litt is the author of Samba Unleashed. Steve can be reached at his email address.

Using Attributes

By Steve Litt

A node can have zero or more attributes. An attribute is a piece of information about the node. Each attribute has a name and a value, so they're key/value pairs.

Almost anything you can do with an attribute can be done with a subnode, so you might wonder why attributes are used, and when to use each. Here are some general guidelines:

Attributes are usually facts about a node. Subnodes are typically something contained in the node. It's a fuzzy difference, but you can use it as a guideline.
Attributes cannot have children. If you need children, use a subnode.

Exception: Because in Perl an attribute can take a reference as a value, this is not strictly true.

Attributes of a single node must be named uniquely. Multiple subnodes of a single node can be named identically.
Attributes can quickly be called by name. Subnodes must be iterated through, and name matched.

Node.pm gives the application programmer two different ways to access attributes of a node:

Collectively, as a Perl hash
Singly, by name

Collective access to attributes includes these:

%myAttribs = $myNode->getAttributes();
if($myNode->hasAttributes()) {...}
$myNode->setAttributes(%myAttribs); # Seldom used, not well tested: beware

Single attribute access is provided by these methods

if($myNode->hasAttribute('phone') {$myPhone = $myNode->getAttribute('phone');}
$myNode->setAttribute('phone', "800-555-1212");
$myNode->removeAttribute('phone');

Notice that although there's a remove method for single access, there is no remove method for collective attribute access. This is an oversight that should probably be corrected in a future version.

The program discussed in this article prints out each node, properly indented, with an asterisk as a bullet. Below each node is a list of its attributes, in key=value format, enclosed in parentheses. Below the attribute line, if and only if the node has children, is a message stating how many children. This message is in the program strictly for instructional purposes in that it shows how to retrieve a single attribute.

When you run the program, you'll notice that every Node object except the top one has a _lineno attribute. This was added by the OutlineParser object so that you can give line numbers with error messages. The top node has no such attribute because it was created by the OutlineParser object, not by the parsing of the file.

Here is the source code:

#!/usr/bin/perl -w

# Copyright (C) 2004 by Steve Litt
# Licensed with the GNU General Public License, Version 2
# ABSOLUTELY NO WARRANTY, USE AT YOUR OWN RISK
# See http://www.gnu.org/licenses/gpl.txt

use strict;

use Node;

package Callbacks;
sub new()
	{
	my($type) = $_[0];
	my($self) = {};
	bless($self, $type);
	return($self);
	}

sub cbCountChildren()
	{
	my($self, $checker, $level) = @_;
	unless (defined($checker)) {return;}
	
	my $childCount=0;
	if($checker->hasFirstChild())
		{
		$childCount++;
		my $checker2 = $checker->getFirstChild();
		while($checker2->hasNextSibling())
			{
			$childCount++;
			$checker2 = $checker2->getNextSibling();
			}
		$checker->setAttribute("children", $childCount);
		}
	}

sub cbPrintNode()
	{
	my($self, $checker, $level) = @_;
	unless (defined($checker)) {return;}

	#### PRINT NODE'S VALUE
	for(my $n=0; $n < $level; $n++) {print "\t";}
	print "* ";
	print $checker->getValue();
	print "\n";

	#### PRINT NODE'S ATTRIBUTES
	for(my $n=0; $n <= $level; $n++) {print "\t";}
	print "(";

	my %attribs = ();
	%attribs = $checker->getAttributes() if $checker->hasAttributes();

	my @keys = keys(%attribs);
	foreach my $key (sort @keys)
		{
		print $key, "=", $attribs{$key}, "; ";
		}

	print ")\n";

	#### PRINT SPECIAL MESSAGE ABOUT CHILDREN IF IT HAS ANY
	if($checker->hasAttribute("children"))
		{
		for(my $n=0; $n <= $level; $n++) {print "\t";}
		print "This node has ";
		print $checker->getAttribute("children");
		print " children.\n";
		}
	}

package Main;

sub main()
	{
	#### PARSE FROM FILE README.otl
	my $parser = OutlineParser->new();
	$parser->setCommentChar("#");
	$parser->fromFile();
	my $topNode=$parser->parse("Node/README.otl");


	#### INSTANTIATE THE Callbacks OBJECT
	my $callbacks = Callbacks->new();

	#### WALK THE NODE TREE,
	#### COUNTING EACH NODE'S CHILDREN
	my $walker = Walker->new
		(
		$topNode,				
		[\&Callbacks::cbCountChildren, $callbacks]
		);
	$walker->walk();

	#### WALK THE NODE TREE,
	#### PRINTING EACH NODE'S VALUE, ATTRIBUTE LIST AND CHILD MESSAGE
	$walker = Walker->new
		(
		$topNode,				
		[\&Callbacks::cbPrintNode, $callbacks]
		);
	$walker->walk();
	}

main();

The following is a partial output:

[slitt@mydesk slitt]$ ./run test.pl | head -n20
* Inserted by OutlineParser
        (children=7; )
        This node has 7 children.
        * MANUAL FOR THE Node.pm Tool
                (_lineno=1; children=1; )
                This node has 1 children.
                * Version 0.2.0 released 5/13/2004
                        (_lineno=2; )
        * License
                (_lineno=3; children=2; )
                This node has 2 children.
                * Litt Perl Development Tools License, version 1
                        (_lineno=4; children=2; )
                        This node has 2 children.
                        * See COPYING file
                                (_lineno=5; )
                        * This license is the GNU GPL with an exception
                                (_lineno=6; children=1; )
                                This node has 1 children.
                                * See COPYING.GPL
./run: line 2: 15281 Broken pipe             perl -w -I$HOME/Node $@
[slitt@mydesk slitt]$

Lessons Learned

Any node can have zero, one or multiple attributes. Each attribute is a key/value pair. No two attributes of the same node can have the same key. Attributes can be accessed either collectively with hasAttributes(), getAttributes() and/or setAttributes(), or singly via hasAttribute($name), getAttribute($name), setAttribute($name, $value), and/or removeAttribute($name).

Steve Litt is the author of the Universal Troubleshooting Process Courseware. Steve can be reached at his email address.

Accessing Nodes by Level

By Steve Litt

Typical access through a hierarchy is via recursive order. In other words, go deep before going next. In a normal outline, recursive order is simply reading down the outline from the top of the page to the bottom.

Sometimes recursive order isn't what you want. Sometimes you might need to present everything at the top level, then everything at the second level, the third, continuing down to the bottom leaf levels. You might need to print each node with its direct children, but no deeper descendents. If so, you can do that by giving your callback object two properties: currentLevel and childrenAtLevel.

The currentLevel property serves as a marker for the callback, which returns with no action unless the level matches currentLevel. At each level, the callback records any children found, incrementing childrenAtLevel. The level by level loop terminates after any iteration failing to increment childrenAtLevel.

The program displayed in this article traverses and prints a node tree level by level. In the output, each level is preceded by a level header. The value of each node at that level is printed, indented. So that you can know that node's parentage, above each group of sibling nodes is a header displaying the full ancestry. That full ancestry is calculated by the Callbacks::cbCalculateFullPath() callback routine, and stored in each node's fullpath attribute.

#!/usr/bin/perl -w

# Copyright (C) 2004 by Steve Litt
# Licensed with the GNU General Public License, Version 2
# ABSOLUTELY NO WARRANTY, USE AT YOUR OWN RISK
# See http://www.gnu.org/licenses/gpl.txt

use strict;

use Node;


package Callbacks;
sub new()
	{
	my($type) = $_[0];
	my($self) = {};
	bless($self, $type);
	$self->{'childrenatlevel'} = 0;
	$self->{'currentlevel'} = 0;
	$self->{'previousparentfullpath'} = "initialize";
	return($self);
	}

sub getChildrenAtLevel(){return $_[0]->{'childrenatlevel'};}
sub setChildrenAtLevel(){$_[0]->{'childrenatlevel'} = $_[1];}
sub incChildrenAtLevel(){$_[0]->{'childrenatlevel'}++;}

sub getCurrentLevel(){return $_[0]->{'currentlevel'};}
sub setCurrentLevel(){$_[0]->{'currentlevel'} = $_[1];}

sub cbCalculateFullPath()
	{
	my($self, $checker, $level) = @_;
	unless (defined($checker)) {return;} # don't process undef node

	if($checker->hasParent)
		{
		my $fullpath = $checker->getParent()->getAttribute("fullpath");
		$fullpath .= "/";
		$fullpath .= $checker->getValue();
		$checker->setAttribute("fullpath", $fullpath);
		}
	else
		{
		$checker->setAttribute("fullpath", $checker->getValue());
		}
	}

sub cbPrintNode()
	{
	my($self, $checker, $level) = @_;
	unless (defined($checker)) {return;} # don't process undef node

	#### DO NOTHING UNLESS THIS NODE IS AT THE CURRENTLY SOUGHT LEVEL
	return unless $level == $self->getCurrentLevel();

	#### DO NOTHING UNLESS THIS NODE HAS CHILDREN
	return unless $checker->hasFirstChild();

	#### PRINT HEADER
	print "\n", $checker->getAttribute("fullpath"), "\n";
	
	#### PRINT CHILDREN AND COUNT CHILDREN AT LEVEL
	my $checker2 = $checker->getFirstChild();
	print "\t", $checker2->getValue(), "\n";
	$self->incChildrenAtLevel();

	while($checker2->hasNextSibling())
		{
		$checker2 = $checker2->getNextSibling();
		print "\t", $checker2->getValue(), "\n";
		$self->incChildrenAtLevel();
		}
	}


package Main;

sub main()
	{
	#### PARSE FROM FILE README.otl
	my $parser = OutlineParser->new();
	$parser->setCommentChar("#");
	$parser->fromFile();
	my $topNode=$parser->parse("Node/README.otl");


	#### INSTANTIATE THE Callbacks OBJECT
	my $callbacks = Callbacks->new();

	#### WALK THE NODE TREE,
	#### CALCULATING FULL PATHS AND PUTTING THEM IN AN ATTRIBUTE
	my $walker = Walker->new
		(
		$topNode,
		[\&Callbacks::cbCalculateFullPath, $callbacks]
		);
	$walker->walk();

	#### PRINT LEVEL 0
	print "\n\n********** BEGIN LEVEL ", "0", "\n";
	print "\t", $topNode->getValue(), "\n";

	#### SET STARTING PARENT LEVEL,
	#### AND SET $childCount SO THE LOOP WILL FIRE THE FIRST TIME
	my $level=0;
	my $childCount=9999;

	#### LOOP THROUGH EACH LEVEL, QUIT WHEN NO MORE CHILDREN
	while($childCount > 0)
		{
		print "\n\n********** BEGIN LEVEL ", $level + 1, "\n";
		$callbacks->setChildrenAtLevel(0);
		$callbacks->setCurrentLevel($level);
		my $walker = Walker->new
			(
			$topNode,
			[\&Callbacks::cbPrintNode, $callbacks]
			);
		$walker->walk();
		$childCount = $callbacks->getChildrenAtLevel();
		$level++;
		}
	}

main();

./run ./test.pl | head -n70

********** BEGIN LEVEL 0
	Inserted by OutlineParser


********** BEGIN LEVEL 1

Inserted by OutlineParser
	MANUAL FOR THE Node.pm Tool
	License
	Purpose
	Learning Node.pm
	File manifest
	Objects
	Installation


********** BEGIN LEVEL 2

Inserted by OutlineParser/MANUAL FOR THE Node.pm Tool
	Version 0.2.0 released 5/13/2004

Inserted by OutlineParser/License
	Litt Perl Development Tools License, version 1
	NO WARRANTY!!!!! See COPYING.GPL

Inserted by OutlineParser/Purpose
	Handling hierarchies in Perl
	Implements a tree of nodes
	Each node has a name, a type, a value, and optionally attributes
	Each node can have zero, one or many attributes
	Each attribute has a name and a value
	Especially made to handle tab indented outlines in memory

Inserted by OutlineParser/Learning Node.pm
	Learn from the example programs: Study them in this order:
	That's the only way to learn this tool
	Do each example program in order
	Example programs

Inserted by OutlineParser/File manifest
	Documentation
	Licensing
	Node.pm file
	Example Programs
	Sample node path config file (for example_nodepath.pl)
	Sample outline (used for example_delete.pl)

Inserted by OutlineParser/Objects
	Node.pm implements three object types:
	Node
	OutlineParser
	Walker

Inserted by OutlineParser/Installation
	See INSTALL file


********** BEGIN LEVEL 3

Inserted by OutlineParser/License/Litt Perl Development Tools License, version 1
	See COPYING file
	This license is the GNU GPL with an exception

Inserted by OutlineParser/Learning Node.pm/Learn from the example programs: Study them in this order:
	example_hello.pl
	example_parse.pl
	example_otl2markup.pl
	example_attribs.pl

Lessons Learned

We kept three variables in the callback object (children at level, current level, and full path). Access to this capability is the reason that Node.pm demands all callbacks be an object method. And we walked the tree multiple times -- one for each level. This was done in the main routine.

Steve Litt is the author of the Universal Troubleshooting Process Courseware . Steve can be reached at his email address.

Deleting Nodes

By Steve Litt

Because Perl is a garbage collection language, node deletion DOES NOT deallocate memory and the like. However, in the absense of a copy of the node, it will be garbage collected and unavailable. The deletion process also specificly undef's the deleted node's first and last children.

You noticed I mentioned keeping a copy of the deleted node. The algorithm of a Walker object moves a node around the tree like a checker. Calling $checker->deleteSelf() does not render $checker undefined. In fact, $checker still has its parent, nextSibling and previousSibling pointers intact. What this means is that the Walker object's next iteration goes to exactly the same node as it would have if the deletion had not taken place. In other words, you do not need to "move the checker back one" after a deletion. This makes deletion algorithms very simple.

There may come a time when you want to delete a node but keep its children. In that case, you must first attach its children to nodes that will not be deleted.

This article showcases a program that deletes every node containing the word deleteme, and all of its children. The input file is the deletetest.otl file packaged with the Node.pm distribution. This file looks like this:

[slitt@mydesk slitt]$ cat Node/deletetest.otl
Top
        Level2
        Level2b
                Level3
                        deleteme
                                gone
                                gone
                        deleteme
                                gone
                                gone
                        this should stay
                                deleteme
                        deleteme
                                gone
                                gone
                Level3b
        2level2
Top2
[slitt@mydesk slitt]$

A Walker object walks the tree, calling a callback routine that tests for the deleteme string, and if found deletes the calling node. The result is as follows:
z

[slitt@mydesk slitt]$ ./run ./test.pl
Inserted by OutlineParser
        Top
                Level2
                Level2b
                        Level3
                                this should stay
                        Level3b
                2level2
        Top2
[slitt@mydesk slitt]$

Please remember,
x

#!/usr/bin/perl -w

# Copyright (C) 2004 by Steve Litt
# Licensed with the GNU General Public License, Version 2
# ABSOLUTELY NO WARRANTY, USE AT YOUR OWN RISK
# See http://www.gnu.org/licenses/gpl.txt

use strict;

use Node;

package Callbacks;
sub new()
	{
	my($type) = $_[0];
	my($self) = {};
	bless($self, $type);
	return($self);
	}

sub cbDelete()
	{
	my($self, $checker, $level) = @_;
	unless (defined($checker)) {return;}

	#### DELETE THIS NODE IF ITS VALUE CONTAINS deleteme
	my $text = "init";
	$text = $checker->getValue() if $checker->hasValue();
	if($text =~ m/deleteme/)
		{
		$checker->deleteSelf();
		}
	}

sub cbPrintNode()
	{
	my($self, $checker, $level) = @_;
	unless (defined($checker)) {return -999;}

	for(my $n=0; $n < $level; $n++) {print "\t";}
	print $checker->getValue(), "\n";
	}

package Main;

sub main()
	{
	#### PARSE FROM FILE deletetest.otl
	my $parser = OutlineParser->new();
	$parser->setCommentChar("#");
	$parser->fromFile();
	my $topNode=$parser->parse("Node/deletetest.otl");

	#### INSTANTIATE THE Callbacks OBJECT
	my $callbacks = Callbacks->new();

	#### WALK THE NODE TREE,
	#### DELETING NODES WITH "deleteme" IN THEM
	my $walker = Walker->new
		(
		$topNode,
		[\&Callbacks::cbDelete, $callbacks]
		);
	$walker->walk();

	#### WALK THE NODE TREE,
	#### OUTPUTTING INDENTED VALUE
	$walker = Walker->new
		(
		$topNode,
		[\&Callbacks::cbPrintNode, $callbacks]
		);
	$walker->walk();
	}

main();

Lessons Learned

Deletion is as simple as calling the Node::deleteSelf() method. This is Perl, so the deletion doesn't actually deallocate the node's memory. Instead, it sets pointers of its parents and siblings to "cut it out of the picture". Because the deleted node's parent and sibling pointers are still intact, there is no need for the Walker object or any other algorithm to "backslide" the pointer. The algorithm can continue. Once all references to the deleted node are gone, Perl's garbage collection reclaims the node's memory.

Steve Litt is the author of the Universal Troubleshooting Process Courseware. Steve can be reached at his email address.

Inserting Nodes

By Steve Litt

Node.pm isn't rocket science. A node holds these pieces of data:

Name
Type
Value
Attribute list

A node also holds the following location/navigation information:

Parent pointer (if not top node)
Pointer to first child (if exists)
Pointer to last child (if exists)
Pointer to next sibling (if exists)
Pointer to previous sibling (if exists)

To insert a new first child for node $currNode, you must:

Create a new node
Set the new node's parent pointer to $currNode
If $currNode has a first child pointer, set the new node's next sibling pointer to the current first child
If $currNode has a first child pointer, set the current first child's previous sibling pointer to the new node
Set $currNode's first child pointer to the new node

If the preceding seems like too much of a hassle, simply do this:

$currNode->insertFirstChild(Node->new("myName", "myType", "myValue"));

There's a Node::insertLastChild() that does the same, except inserts the node as the last child.

What if you want to insert a child in the middle, instead of as the first or last child? In that case, navigate to the node just before where you want to insert the new one, and do this, assuming the node before the one you want inserted is called $currNode:

$currNode->insertSiblingAfterYou(Node->new("myName", "myType", "myValue"));

This single call does the following:

Create the new node
Set the new node's parent pointer to the parent of $currNode
Set the new node's previous sibling pointer to $currNode
If $currNode has a next sibling pointer, set the new node's next sibling pointer to the next sibling pointer of $currNode
If $currNode has a next sibling pointer, set the previous sibling pointer of $currNode's next sibling to the new node
If $currNode does not have a next sibling pointer, set $currNode's parent's last child pointer to the new node
Set $currNode's next sibling pointer to the new node

As you can see, methods like Node::insertFirstChild() and Node::insertSiblingAfterYou() make tree insertions a breeze. The biggest challenge, and it's not much of a challenge at that, is to navigate either to the node before the intended insertion, after the intended insertion, or if the intended insertion is the first or last child of a parent node, move to that parent node.

This article builds a program to create and manipulate an appointment calender as a node tree, the hierarchy being:

Calender

Year

Month

timeslot

The calendar is built with the help of two arrays: @monthNames and @monthLengths. @monthLengths does not take into account leap years because this is an exercise, not a practical program. The following is a general description of what this program does:

Create the calendar node as the top node, and the 2004 node below it.
For each month in @monthNames, insert a month node, and then under it insert day nodes in the amount specified by @monthLengths[monthNumber].
Insert appointment hours in each day using a Walker object and associated callback.
Make several appointments by manually setting the appointment hour's value, which was by default undefined, to a string.
Mark days, months and years with an asterisk for each appointment in that day, month or year. This is done by a Walker object associated with a return callback routine that totals all appointments for all children.
Print the calendar using a Walker object with a callback routine that prints.
OOPS! We accidentally switched the June 22 and Sept 22 appointments. Switch them back.
Re-mark days, months and years with an asterisk for each appointment in that day, month or year. This is done by a Walker object associated with a return callback routine that totals all appointments for all children.
Print a separator to separate the erroneous calender from the correct one.
Print the corrected calendar using a Walker object with a callback routine that prints.

This is a pretty challenging program. It's long, with lots of steps. The section that switches the two nodes is extremely complex, but valuable because it teaches you working with clone nodes.

Here's the code, all 468 lines of it:

#!/usr/bin/perl -w

# Copyright (C) 2004 by Steve Litt
# Licensed with the GNU General Public License, Version 2
# ABSOLUTELY NO WARRANTY, USE AT YOUR OWN RISK
# See http://www.gnu.org/licenses/gpl.txt

use strict;	# prevent hard to find errors

use Node;	# Use Node.pm tool

package Callbacks;
sub new()
	{
	my($type) = $_[0];
	my($self) = {};
	bless($self, $type);
	return($self);
	}

#=================================================================
# This callback places asterisks in the value of years, months and
# days whose anscestors have appointments, one asterisk per appointment
#=================================================================
sub cbMakeMarks()
	{
	my($self, $checker, $level) = @_;
	return unless defined($checker); # don't process undef node

	#### PROCESS ONLY DAY, MONTH OR YEAR NODES
	return unless	(
			$checker->getType() eq "Day" ||
			$checker->getType() eq "Month" ||
			$checker->getType() eq "Year"
			);

	#### COUNT APPOINTMENTS IN DESCENDENTS
	my $count = 0;
	my $childNode = $checker->getFirstChild();
	while(defined($childNode))
		{
		if($checker->getType() eq "Day")
			{
			if(defined($childNode->getValue()))
				{
				$count++;
				}
			}
		else
			{
			if($childNode->hasAttribute("appointments"))
				{
				$count += $childNode->getAttribute("appointments");
				}
			}
		$childNode = $childNode->getNextSibling();
		}

	#### PLACE COUNT IN ATTRIBUTE
	$checker->setAttribute("appointments", $count);

	#### MARK STARS, ONE FOR EACH APPOINTMENT IN DESCENDENTS
	if($count > 0)
		{
		my $string;
		for(my $n=0; $n < $count; $n++){$string .= '*';}
		$checker->setValue($string);
		}
	}

#=================================================================
# This callback operates ONLY on day nodes. When
# called from a day node, it inserts hourlong appointment slots
# starting at 8am and ending at 5pm. The code is pretty 
# straightforward.
#=================================================================
sub cbInsertHours()
	{
	my($self, $checker, $level) = @_;
	unless (defined($checker)) {return -999;} # don't process undef node


	return unless $checker->getType() eq "Day"; # Insert hours under days only

	my $checker2;
	for(my $n=8; $n <= 16; $n++)
		{
		my $startHour = "$n:00";
		my $n2 = $n + 1;
		my $endHour = "$n2:00";
		my $node = Node->new("$startHour" . "-" . "$endHour", "Hour", undef);
		if($checker->hasFirstChild())
			{
			$checker2 = $checker2->insertSiblingAfterYou($node);
			}
		else
			{
			$checker2 = $checker->insertFirstChild($node);
			}
		}
	}

#=================================================================
# The cbPrintNode() callback prints the name of the node,
# and its value if a value is defined. It's very straighforward.
#=================================================================
sub cbPrintNode()
	{
	my($self, $checker, $level) = @_;
	unless (defined($checker)) {return -999;} # don't process undef node

	#### DON'T PRINT LEVEL 0 (CALENDER)
	return if $level == 0;

	for(my $n=1; $n < $level; $n++) { print "\t";}

	print $checker->getName() if $checker->hasName();
	print ":   ";

	print $checker->getValue() if $checker->hasValue();
	print "\n";
	}


package Main;

###########################################################################
# The makeAppointments() subroutine manually makes several appointments
# To facilitate a later demonstration of node swapping, the June 22
# and Sept 22 appoinments are accidentally switched
###########################################################################
sub makeAppointments($)
	{
	my $yearNode = shift;

	#### MARCH 22  AT 8AM
	my $monthNode = $yearNode->getFirstChild() ->		#January
			getNextSibling() ->			#February
			getNextSibling();			#March
	my $dayNode = $monthNode->getFirstChild();
	while($dayNode->getName() != 22)
		{
		$dayNode = $dayNode->getNextSibling();
		unless(defined($dayNode))
			{
			die "No March 22\n";
			}
		}
	my $hourNode = $dayNode->getFirstChild();
	$hourNode->setValue("Spring Cleaning");

	#### JUNE 22  AT 9AM
	#### WRONGLY LABELED AS FALL FESTIVAL
	#### INSTEAD OF SUMMER BREAK
	$monthNode = $monthNode->getNextSibling() ->		# April
			getNextSibling() ->			# May
			getNextSibling();			# June
	$dayNode = $monthNode->getFirstChild();
	while($dayNode->getName() != 22)
		{
		$dayNode = $dayNode->getNextSibling();
		unless(defined($dayNode))
			{
			die "No June 22\n";
			}
		}
	$hourNode = $dayNode->getFirstChild()->getNextSibling();
	$hourNode->setValue("Fall Festival");

	#### SEPTEMBER 22  AT 10AM
	#### WRONGLY LABELED AS FALL FESTIVAL
	#### INSTEAD OF SUMMER BREAK
	$monthNode = $monthNode->getNextSibling() ->		# July
			getNextSibling() ->			# August
			getNextSibling();			# September
	$dayNode = $monthNode->getFirstChild();
	while($dayNode->getName() != 22)
		{
		$dayNode = $dayNode->getNextSibling();
		unless(defined($dayNode))
			{
			die "No September 22\n";
			}
		}
	$hourNode = $dayNode -> getFirstChild() ->		#8-9
				getNextSibling() ->		# 9-10
				getNextSibling();		# 10-11
	$hourNode->setValue("Summer Break");

	#### DECEMBER 22 FROM 3PM TO 5PM (2 TIMESLOTS)
	#### HAPPY HOLIDAYS PARTY
	$monthNode = $monthNode->getNextSibling() ->		# October
			getNextSibling() ->			# November
			getNextSibling();			# December
	$dayNode = $monthNode->getFirstChild();
	while($dayNode->getName() != 22)
		{
		$dayNode = $dayNode->getNextSibling();
		unless(defined($dayNode))
			{
			die "No December 22\n";
			}
		}
	$hourNode = $dayNode->getFirstChild();
	while($hourNode->getName() ne "15:00-16:00")
		{
		$hourNode = $hourNode->getNextSibling();
		unless(defined($hourNode))
			{
			die "No 4pm slot\n";
			}
		}
	$hourNode->setValue("Happy Holidays Party");
	$hourNode = $hourNode->getNextSibling();
	$hourNode->setValue("Happy Holidays Party");

	#### DECEMBER 30 AT 9AM BUY PARTY SUPPLIES
	while($dayNode->getName() != 30)
		{
		$dayNode = $dayNode->getNextSibling();
		unless(defined($dayNode))
			{
			die "No December 30\n";
			}
		}
	$hourNode = $dayNode->getFirstChild()->getNextSibling();
	$hourNode->setValue("Buy Party Supplies");
	}

###########################################################################
# The insertMonthsAndDays() iterates through @monthNames. For each, it
# creates a month node, and then under than node iterates days from 1 to
# that month's entry in the @monthLengths array.
#
# Note that we could have avoided using a nested loop by using a Walker
# and associated callback to install the days under every month. In such
# a case the array of month lengths would have been placed in the Callback
# object. However, for the sake of variety, we chose to use a nested loop
# to load the months and days.
###########################################################################
sub insertMonthsAndDays($)
	{
	my $yearNode = shift;
	my $checker = $yearNode;
	my $checker2;
	my @monthNames=("January", "February", "March", "April", "May",
		"June", "July", "August", "September", "October",
		"November", "December");
	my @monthLengths=(31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
	my $monthSS = 0;
	foreach my $monthName (@monthNames)
		{
		my $node = Node->new($monthName, "Month", undef);
		$node->setAttribute("days", $monthLengths[$monthSS]);
		if($yearNode->hasFirstChild())
			{
			$checker = $checker->insertSiblingAfterYou($node);
			}
		else
			{
			$checker = $yearNode->insertFirstChild($node);
			}
		for(my $n=1; $n <= $monthLengths[$monthSS]; $n++)
			{
			$node = Node->new($n, "Day", undef);
			if($checker->hasFirstChild())
				{
				$checker2 = $checker2->insertSiblingAfterYou($node);
				}
			else
				{
				$checker2 = $checker->insertFirstChild($node);
				}
			}
		$monthSS++;
		}
	}

###########################################################################
# This subroutine switches the June 22 9am appointment and the
# September 22 10am appointment. In each case, both the appointment
# text and the time needed switching.
#
# The sane way to accomplish this task would have been to modify
# the nodes in place. However, this subroutine was created solely to 
# demonstrate the movement of nodes, so that's what we did.
#
# Note that the fact that the two are at different times complicates the
# situation. It's not enough to just trade nodes -- the Sept 9am node
# must be placed after the existing June 10am node, which itself is after
# the erroneous June 9am node containing what should be September's 
# appointment. After such placement, the original June 9am node must
# have its name updated so that it is a 10am node. A similar process
# takes place for September. The original nodes are also deleted.
#
# Please follow the (convoluted and contrived) logic:
#   1. Store the June hour node in $juneNode
#   2. Store the September hour node in $septNode
#   3. After the existing June 10am, place a CLONE of the Sept appointment
#   4. Before the existing Sept 9am,  place a CLONE of the June appointment
#   5. Delete the original June appointment
#   6. Delete the original September appointment
#   7. On the original June 10am node, make it 9am
#   8. On the original September 9am node, make it 10am
###########################################################################
sub switchJuneAndSeptemberAppointments($)
	{
	my $yearNode = shift;

	#### FIND NODE FOR JUNE 22 9AM APPOINTMENT
	my $juneNode = $yearNode->getFirstChild();
	while(defined($juneNode))
		{
		last if $juneNode->getName() eq "June";
		$juneNode = $juneNode->getNextSibling();
		}
	die "Cannot find month of June\n" unless defined($juneNode);

	$juneNode = $juneNode->getFirstChild();
	while(defined($juneNode))
		{
		last if $juneNode->getName() eq "22";
		$juneNode = $juneNode->getNextSibling();
		}
	die "Cannot find June 22\n" unless defined($juneNode);

	$juneNode = $juneNode->getFirstChild();
	while(defined($juneNode))
		{
		last if $juneNode->getName() eq "9:00-10:00";
		$juneNode = $juneNode->getNextSibling();
		}
	die "Cannot find June 22 at 9am\n" unless defined($juneNode);

	#### FIND NODE FOR SEPTEMBER 22 10AM APPOINTMENT
	my $septNode = $yearNode->getFirstChild();
	while(defined($septNode))
		{
		last if $septNode->getName() eq "September";
		$septNode = $septNode->getNextSibling();
		}
	die "Cannot find month of September\n" unless defined($septNode);

	$septNode = $septNode->getFirstChild();
	while(defined($septNode))
		{
		last if $septNode->getName() eq "22";
		$septNode = $septNode->getNextSibling();
		}
	die "Cannot find September 22\n" unless defined($septNode);

	$septNode = $septNode->getFirstChild();
	while(defined($septNode))
		{
		last if $septNode->getName() eq "10:00-11:00";
		$septNode = $septNode->getNextSibling();
		}
	die "Cannot find September 22 at 9am\n" unless defined($septNode);

	#### SWITCH THE NODES
	my $newJune = $juneNode->getNextSibling()->insertSiblingAfterYou($septNode->clone());
	my $newSept = $septNode->getPrevSibling()->insertSiblingBeforeYou($juneNode->clone());
	$juneNode->deleteSelf();
	$septNode->deleteSelf();

	#### FIX NAMES OF SURROUNDING CLONES
	$newJune->getPrevSibling()->setName("9:00-10:00");
	$newSept->getNextSibling()->setName("10:00-11:00");

	return;
	}


###########################################################################
# In the main routine, you carry out or delegate the following tasks
# in order to create an appointment calendar:
#   1. Insert single level 0 and 1 nodes
#   2. Instantiate the Callbacks object
#   3. Insert all month and day nodes
#   4. Insert all hour nodes
#   5. Make appointments
#         erroneously switching the june 22 & Sept 22 appointments
#   6. Mark days, months and years containing appointments
#   7. Output the calendar
#   8. Switch back June22 and Sept22
#   9. Re mark days, months and years
#   10. Output a separator between bad and good calendars
#   11. Re output the calendar
#
###########################################################################
sub main()
	{
	#### INSERT SINGLE LEVEL 0 AND 1 NODES
	my $topNode=Node->new("Calender", "Calender", "Calender");
	my $yearNode=$topNode->insertFirstChild(Node->new("2004", "Year", undef));

	#### INSTANTIATE THE Callbacks OBJECT
	my $callbacks = Callbacks->new();

	#### INSERT MONTH AND DAY NODES
	insertMonthsAndDays($yearNode);

	#### INSERT THE HOURS USING A Walker
	my $walker = Walker->new
		(
		$topNode,
		[\&Callbacks::cbInsertHours, $callbacks]
		);
	$walker->walk();


	#### MAKE A FEW APPOINTMENTS
	#### ACCIDENTALLY SWITCHING SUMMER AND FALL
	makeAppointments($yearNode);

	#### MARK DAYS, MONTHS AND YEAR THAT HAVE APPOINTMENTS
	#### USING A WALKER WITH ONLY A RETURN CALLBACK
	$walker = Walker->new
		(
		$topNode,
		undef,
		[\&Callbacks::cbMakeMarks, $callbacks]
		);
	$walker->walk();

	#### WALK THE NODE TREE,
	#### OUTPUTTING THE CALENDER
	$walker = Walker->new
		(
		$topNode,
		[\&Callbacks::cbPrintNode, $callbacks]
		);
	$walker->walk();

	#### CORRECT THE MISTAKE
	#### SWITCH JUNE 22 AND SEPT 22
	switchJuneAndSeptemberAppointments($yearNode);

	#### RE-MARK DAYS, MONTHS AND YEAR THAT HAVE APPOINTMENTS
	#### USING A WALKER WITH ONLY A RETURN CALLBACK
	$walker = Walker->new
		(
		$topNode,
		undef,
		[\&Callbacks::cbMakeMarks, $callbacks]
		);
	$walker->walk();

	#### OUTPUT A SEPARATOR BETWEEN ORIGINAL AND CORRECTED CALENDARS
	for (my $n=0; $n<5; $n++)
		{
		print "######################################################\n";
		}

	#### RE-WALK THE NODE TREE,
	#### RE-OUTPUTTING THE CALENDER
	$walker = Walker->new
		(
		$topNode,				# start with this node
		[\&Callbacks::cbPrintNode, $callbacks]	# do this on entry to each node
		);
	$walker->walk();
	}

main();

View the output of this program like this:

./run ./test.pl | less

Key Points

This exercise introduces several key points.

Child Insertion Riff

Look at the cbInsertHours() method and notice how hour nodes are inserted:

for(my $n=8; $n <= 16; $n++)
	{
	# ...
	my $node = Node->new($range, "Hour", undef);

	if($checker->hasFirstChild())
		{
		$checker2 = $checker2->insertSiblingAfterYou($node);
		}
	else
		{
		$checker2 = $checker->insertFirstChild($node);
		}
	}

In the preceding, $checker is the node under consideration, with $checker2 being the node repeatedly added as a child of $checker. The first addition is performed via insertFirstChild(), returning itself as $checker2. Thereafter, insert more children of $checker by performing $checker2->insertSiblingAfterYou(). This riff is ubitquitous in the Node.pm world.

Child Search Riff

Look at the switchJuneAndSeptemberAppointments() method and notice how the children of the original $juneNode are searched:

$juneNode = $juneNode->getFirstChild();
while(defined($juneNode))
	{
	last if $juneNode->getName() eq "22";
	$juneNode = $juneNode->getNextSibling();
	}
die "Cannot find June 22\n" unless defined($juneNode);

In the preceding, we perform the equivalent of a priming read on the parent node, and then loop through the children using getNextSibling(). If the search is unsuccessful, $juneNode is undefined; a condition that can be caught at the bottom. This riff is ubitquitous in the Node.pm world.

Returns at Top of Callbacks

My circa 1983 professors drummed it into our heads: structured programming! They didn't mean substituting data structures for code. For the most part they meant elimination of gotos and pseudo gotos:

No gotos
Return from subroutines only at the end, in a single return statement
Loops terminated only by fulfillment of loop condition; never by break (or last) statements
Loops may not have continue statements

The preceding rules make for clean, modifyable code. Usually they make for readable code. They make it easier for the maintenance programmer. Nobody wants to decide how their newly inserted code should interact with a return, break (last), or continue statement half way through a 200 line subroutine. But...

The preceding rules usually result in heavily nested if statements. The rules against break and continue make for extra if statements inside the loop, and often extra variables to hold state across iteration.

In large subroutines these are small prices to pay for the ability to find an unambiguous place to drop in new code. But in small subroutines that are likely to stay small, these rules and the branch nesting they engender just hurt readability.

Callback routines, by their nature, are small and likely will stay small. They are event handlers. Callback routines typically act on only specific types of nodes, leaving the rest untouched.

For these reasons, Node.pm programmers often place branched returns at the top of the callback routine, ensuring that only the desired nodes will have the callback routine's code run on them. These topside returns eliminate the need for nested if statements and compound conditions in the actual callback code. As an example, look at the cbMakeMarks() callback routine:

sub cbMakeMarks()
	{
	my($self, $checker, $level) = @_;
	return unless defined($checker); # don't process undef node

	#### PROCESS ONLY DAY, MONTH OR YEAR NODES
	return unless	(
			$checker->getType() eq "Day" ||
			$checker->getType() eq "Month" ||
			$checker->getType() eq "Year"
			);
	
	#Perform callback code on anything that makes it to this point

By the time the code makes it past the two return unless statements, you know the node is defined and is either a day, month or year node. The remaining code can assume these things, and not have to test for them, making for much simpler and readable code.

This riff is ubitquitous in the Node.pm world.

Lessons Learned

Outline parsing is not the only way to build a node tree. Node trees can be built by node insertion. Node insertion is performed with one of the following methods:

Node::insertFirstChild()
Node::insertLastChild()
Node::insertSiblingAfterYou()
Node::insertSiblingBeforeYou()

In each case, the node to be inserted must be passed in as an argument, meaning it must be created first. Such precreation can be on a previous line:

my $newNode = Node->new("myName", "myType", "myValue);
my $newSibling = $currNode->insertSiblingAfterYou($newNode);

Or the precreation can be nested into the argument itself:

my $newSibling = $currNode->insertSiblingAfterYou(Node->new("myName", "myType", "myValue));

This article demonstrated several riffs commonly used by Node.pm programmers:

The Child Insertion riff
The Child Search riff
The Returns at Top of Callbacks riff

Steve Litt is the author of Rapid Learning: Secret Weapon of the Successful Technologist. Steve can be reached at his email address.

Determining the Node.pm Path

By Steve Litt

So far we've run all programs via the ./run shellscript. As you remember from the Hello World article, running the Perl script straight caused an error in which Node.pm could not be found in the @INC path. We overcame that problem by running the Perl scripts through the ./run shellscript.

There are many ways for the application program to deduce the path to Node.pm:

Place Node.pm on the existing Perl module path
Shebang line (#!/usr/bin/perl -w -I/data/perl/Node)
Modify @INC, using use lib, within the application program
Shellscript (like ./run)
Import Node.pm at runtime based on a configuration file

Which method you use depends on machine access and knowledge:

Do you know the absolute location of Node.pm on the target machine(s)
Do you know the entirety of the Perl module path on the target machine(s)
Do you have root access to the target machine(s)

Place Node.pm on the existing Perl module path

This is by far the easiest. By placing Node.pm in a directory on the existing Perl module path (@INC), you can access Node.pm from any Perl program on the machine with a use statement:

use Node;

Typically all directories on the Perl module path are writeable only by root, so this alternative works only if you have root access on the box.

A disadvantage of this method is that the Node.pm distribution is not placed in its own directory. Possible workarounds are:

Place the distribution in its own directory, and then copy only Node.pm to a directory on @INC
Place the distribution in its own directory, and create a Node.pm symlink from a directory on @INC pointing to Node.pm in the distribution directory. For instance:

ln -s /data/perl/Node/Node.pm /usr/lib/perl5/5.8.1/Node.pm

Another disadvantage of this method occurs if you use data-only backups, which don't back up directories under /usr. Nevertheless, if the program is used on a single machine for which you have root access, placing Node.pm or a symlink to it on the existing Perl module path is easy, foolproof, and allows all sorts of apps on that machine to use it with a minimum of hassle.

Shebang line (#!/usr/bin/perl -w -I/data/perl/Node)

If you don't have root access, or if you don't want the hassle of symlinks, but you definitely know the absolute location of Node.pm on the target machine(s), you can use the -I parameter on the shebang line. It has the advantage of being clean and simple. The disadvantage is that if the Node.pm location happens to change, you must change the shebang line of every program using Node.pm -- a real hassle.

Once again, this works only if you know the absolute location of Node.pm, this cannot work. The shebang line CANNOT process environment variables, so it must be a true absolute path. Therefore, this method is not suitable for applications being distributed over a wide range of computers and organizations -- UMENU is an example of a program that can't use this methodology.

Modify @INC, using use lib, within the application program

This is basically a more complicated way of doing the same thing as the shebang line does, but with one huge advantage. Unlike the shebang line, the use lib line can decipher environment variables. For instance, try this:

#!/usr/bin/perl -w

# Copyright (C) 2004 by Steve Litt
# Licensed with the GNU General Public License, Version 2
# ABSOLUTELY NO WARRANTY, USE AT YOUR OWN RISK
# See http://www.gnu.org/licenses/gpl.txt

use strict;
use lib "$ENV{'HOME'}/Node";
use Node;	

my $topNode = Node->new("myname", "mytype", "myvalue");
print "\n::: ";
print $topNode->getName(), " ::: ";
print $topNode->getType(), " ::: ";
print $topNode->getValue(), " :::\n";

Now run it without the shellscript:

[slitt@mydesk slitt]$ ./test.pl

::: myname ::: mytype ::: myvalue :::
[slitt@mydesk slitt]$

It works! If you can count on always having an environment variable pointing to the location of Node.pm, this is an ideal method. If the location changes, you just change the environment variable.

However, it's not trivial to guarantee an accurate environment variable, for all users, over time and operating system upgrades. One way to guarantee an environment variable is to run the perl program through a shellscript...

Shellscript (like ./run)

The shellscript method looks hoakie, but it's actually the most robust and versatile way to reveal the location of Node.pm to the application. The shellscript is typically very simple; thus easy for users to modify at various locations. The original programmer needn't know anything about target locations, because the person installing the program can easily change a few environment settings in the initialization script. Better still, the user can configure many directories and environment variables which are passed on to the Perl program as long as the perl program and its arguments are arguments of the init shellscript.

For instance, here's emdlinit.sh from the EMDL distribution, which uses Node.pm:

#!/bin/sh

# Copyright (C) 2004 by Steve Litt, all rights reserved.
# Licensed under version 2 of the GNU General Public License
# See the COPYING file
# NO WARRANTY!!!!

################################################################
# To use emdl in the easiest possible way, change the following
# variables to reflect the locations of their respective files.
################################################################

EMDL_DIRECTORY=/data/perl/emdl		#LOCATION OF emdlparser.pl
UMENU_DIRECTORY=~/.umenu		#TOP OF UMENU TREE
NODE_DIRECTORY=/data/perl/Node		#LOCATION OF Node.pm

EMDL_CONFIG=$EMDL_DIRECTORY/emdl.cfg	#FULL PATH OF emdlparser.pl config file
UMENU_MENUDIR=$UMENU_DIRECTORY/menudir	#LOCATION OF UMENU MENU DEF FILES
UMENU_PROGRAMDIR=$UMENU_DIRECTORY/program	#LOCATION OF UMENU.PL
EMDL_TEMP=$EMDL_DIRECTORY/temp		#WRITEABLE TEMP DIR FOR COMPILED .mnu FILES

export EMDL_DIRECTORY EMDL_CONFIG UMENU_DIRECTORY NODE_DIRECTORY UMENU_MENUDIR UMENU_PROGRAMDIR EMDL_TEMP

cd $EMDL_DIRECTORY

$1 $2 $3 $4 $5 $6 $7 $8 $9

As you can see above, we have configured environment variables for many locations, not just that of Node.pm. What means is once the user configures this one self explanatory script, he or she can run all programs using those environment variables, just by preceding the command with ./emdlinit.sh. Better yet, ./emdlinit.sh can be copied to a directory on the executable path and run without the directory designator.

If you want your application to reside in its own directory, without intermixing with other applications and tools, the shellscript method is the best. The shellscript method works especially well when supplying environment variables to use lib statements in Perl programs.

Import Node.pm at runtime based on a configuration file

This is probably overkill, but I use it sometimes.

One problem with the use lib method is that it does not occur at runtime, but instead between the time Perl initializes itself and the time the program is executed. As a result, the use lib line IS aware of environment variables, but IS NOT aware of any data processing the program does, including file reads. Bottom line, it cannot deduce the Node.pm path by reading a configuration file. Sometimes a configuration file is just what you need.

In such a case, you can use require and import as a substitute for use. require and import happen at runtime, not Perl initialization time, so as long as they're done before the use Node statement, everything's fine. Before performing the require and import statements, you can read a config file, and prepend any found Node directories to @INC in real time. The following is an example:

#!/usr/bin/perl -w

# Copyright (C) 2004 by Steve Litt
# Licensed with the GNU General Public License, Version 2
# ABSOLUTELY NO WARRANTY, USE AT YOUR OWN RISK
# See http://www.gnu.org/licenses/gpl.txt

use strict;

#####################################################################
# The loadNodeModule() subroutine is a complete substitute for:
#    use Node
# 
# It includes:
#	require Node;
#	import Node;
# 
# The preceding two calls completely replace a use Node statement,
# and better still, unlike the use statement, they happen at 
# runtime instead of compile time. Therefore, this subroutine reads
# the directory from a config file, then executes that directory
# with the proper require and import statements. Obviously, the
# loadNodeModule() subroutine must be executed before any code depending
# on the Node.pm module is executed.
#####################################################################
sub loadNodeModule()
	{
	#### CHANGE THE FOLLOWING TO CHANGE THE DEFAULT APP FILENAME
	my $defaultConfFileName = "./myapp.cfg";

	#### CHANGE THE FOLLOWING TO CHANGE APP FILENAME ENVIRONMENT VAR
	my $envVarName = "MY_APP_CONFIG";

	my($conffile) = $ENV{$envVarName};
	print $conffile, "\n" if defined $conffile;
	$conffile = $defaultConfFileName unless defined($conffile);
	print "Using config file $conffile.\n";

	open CONF, '<' . $conffile or die "FATAL ERROR: Could not open config file $conffile.";
	my @lines = <CONF>;
	close CONF;

	my @nodedirs;
	foreach my $line (@lines)
		{
		chomp $line;
		if($line =~ m/^\s*nodedir\s*=\s*([^\s]*)/)
			{
			my $dir = $1;
			if($dir =~ m/(.*)\$HOME(.*)/)
				{
				$dir = $1 . $ENV{'HOME'} . $2;
				}
			push @nodedirs, ($dir);
			}
		}

	if(@nodedirs)
		{
		unshift @INC, @nodedirs;
		}

	require Node;
	import Node;
	}

#####################################################################
# The main() routine calls loadNodeModule to include Node.pm,
# and then runs a few lines of code to conclusively prove that
# Node.pm is loaded. It also prints out the @INC array to show that
# directory in which Node.pm resides is now in the @INC path.
#
# Note that in the absense of any change to the environment variable
# defined in loadNodeModule(), the configuration file will be ./myapp.cfg.
#####################################################################
sub main()
	{
	loadNodeModule();
	my $topNode = Node->new("myname", "mytype", "myvalue");
	print "\n::: ";
	print $topNode->getName(), " ::: ";
	print $topNode->getType(), " ::: ";
	print $topNode->getValue(), " :::\n";
	foreach my $line (@INC)
		{
		print $line, "\n";
		}
	}

main();

In the preceding, as long as you have a myapp.cfg in the current directory, and it has a line like this:

nodedir=/data/perl/Node

then you can run the program without a shellscript:

[slitt@mydesk slitt]$ ./test.pl
Using config file ./myapp.cfg.

::: myname ::: mytype ::: myvalue :::
/path/to/nodemodule
/data/perl/Node
/usr/lib/perl5/5.8.1/i386-linux-thread-multi
/usr/lib/perl5/5.8.1
/usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.1
/usr/lib/perl5/site_perl
/usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.1
/usr/lib/perl5/vendor_perl/5.8.0
/usr/lib/perl5/vendor_perl
.
[slitt@mydesk slitt]$

As I said, this is overkill, but occasionally it might come in handy.

Steve Litt is the author of Rapid Learning: Secret Weapon of the Successful Technologist. Steve can be reached at his email address.

The Node.pm API

By Steve Litt

This article offers a synopsis of the Node.pm API in outline form. This outline was created in VimOutliner, which is the fastest way to create an outline. Using Node.pm, I then wrote a program to convert an outline to a nested HTML unordered list, producing the HTML version of the outline that you see in this article.

package Node

For Construction

sub new()

Arguments

0: typeOfClass
1: Name of Node
2: Type of Node
3: Value of Node

Returns

The newly created Node object

For single attributes

sub setAttribute()

Arguments

0: $self
1: Attribute name
2: Attribute Value

Returns

Nothing

sub removeAttribute()

Arguments

0: $self
1: Name of attribute to be removed

Returns

Nothing

sub getAttribute()

Arguments

0: $self
1: Name of attribute to be retrieved

Returns

Value of requested attribute, or undef if it doesn't exist

sub hasAttribute()

Arguments

0: $self
1: Name of queried attribute

Returns

1 if there is an attribute with that name, 0 otherwise

For attribute array

sub hasAttributes()

Arguments

0: $self

Returns

1 if the $self node has attributes, 0 if it does not

sub getAttributes()

Arguments

0: $self

Returns

The attribute hash (not a reference, but an actual hash)

sub setAttributes()

Arguments

0: $self
1: A reference to a hash of intended attributes

Returns

Nothing

For Navigation

Has routines

sub hasFirstChild()

Arguments

0: $self

Returns

1 if the $self node has a first child, 0 otherwise

sub hasLastChild()

Arguments

0: $self

Returns

1 if the $self node has a last child, 0 otherwise

: Probably unnecessary, because normally if a node has
: a first child, it will also have a last child even
: if they're the same node.

sub hasNextSibling()

Arguments

0: $self

Returns

1 if the $self node has a next sibling, 0 otherwise

sub hasPrevSibling()

Arguments

0: $self

Returns

1 if the $self node has a next sibling, 0 otherwise

sub hasParent()

Arguments

0: $self

Returns

1 if the $self node has a parent, 0 otherwise

Get routines

sub getFirstChild()

Arguments

0: $self

Returns

The first child of the $self node, undef if there is none

sub getLastChild()

Arguments

0: $self

Returns

The last child of the $self node, undef if there is none

sub getNextSibling()

Arguments

0: $self

Returns

The next sibling of the $self node, undef if there is none

sub getPrevSibling()

Arguments

0: $self

Returns

The previous sibling of the $self node, undef if there is none

sub getParent()

Arguments

0: $self

Returns

The parent of the $self node, undef if there is none

Set routines

sub setFirstChild()

Arguments

0: $self
1: Node to be assigned as first child of $self

Returns

Nothing

sub setLastChild()

Arguments

0: $self
1: Node to be assigned as last child of $self

Returns

Nothing

sub setNextSibling()

Arguments

0: $self
1: Node to be assigned as next sibling of $self

Returns

Nothing

sub setPrevSibling()

Arguments

0: $self
1: Node to be assigned as previous sibling of $self

Returns

Nothing

sub setParent()

Arguments

0: $self
1: Node to be assigned as parent of $self

Returns

Nothing

For content

Has routines

sub hasName()

Arguments

0: $self

Returns

1 if the $self node has a defined name, 0 otherwise

sub hasType()

Arguments

0: $self

Returns

1 if the $self node has a defined type, 0 otherwise

sub hasValue()

Arguments

0: $self

Returns

1 if the $self node has a defined value, 0 otherwise

Get routines

sub getName()

Arguments

0: $self

Returns

The name of the $self node,
undef if the node's value is undefined

sub getType()

Arguments

0: $self

Returns

The type of the $self node,
undef if the node's value is undefined

sub getValue()

Arguments

0: $self

Returns

The value of the $self node,
undef if the node's value is undefined

Set routines

sub setName()

Arguments

0: $self
1: The name of the $self node

Returns

Nothing

sub setType()

Arguments

0: $self
1: The type of the $self node

Returns

Nothing

sub setValue()

Arguments

0: $self
1: The value of the $self node

Returns

Nothing

For Insertion

sub insertSiblingBeforeYou()

Arguments

0: $self
1: The node to be inserted before $self

Returns

The newly inserted node

sub insertSiblingAfterYou()

Arguments

0: $self
1: The node to be inserted after $self

Returns

The newly inserted node

sub insertFirstChild()

Arguments

0: $self
1: The node to be inserted as the first child of $self

Returns

The newly inserted node

sub insertLastChild()

Arguments

0: $self
1: The node to be inserted as the last child of $self

Returns

The newly inserted node

For deletion

sub deleteSelf()

Arguments

0: $self

Returns

Nothing

For cloning

sub clone()

Arguments

0: $self

Returns

A new node with the same content and pointers as $self

package OutlineParser

For Creation

sub new()

Arguments

0: Type of class

Returns

A new OutlineParser object

Defaults

head: Node->new("Header Node", "Head", "Inserted by OutlineParser");
fromstdin = 1;
zapblanks = 1;

For setting parser properties

sub setCommentChar()

Arguments

0: $self
1: Intended comment character for this parser

Returns

Nothing

sub getCommentChar()

Arguments

0: $self

Returns

Comment character for this parser

sub hasCommentChar()

Arguments

0: $self

Returns

1 if parser has comment character, 0 otherwise

sub fromStdin()

Arguments

0: $self

Returns

Nothing

Action

Sets parser to accept input from a file instead of stdin

sub fromFile()

Arguments

0: $self

Returns

Nothing

Action

Sets parser to accept input from stdin instead of a file

sub zapBlanks()

Arguments

0: $self

Returns

Nothing

Action

Sets parser to ignore whitespace-only lines

sub dontZapBlanks()

Arguments

0: $self

Returns

Nothing

Action

Sets parser to process whitespace-only lines

Other routines

sub getHead()

Arguments

0: $self

Returns

Head node of node tree

sub getFirstNonBlankChar()

Arguments

0: $self
1: line in which to find first nonblank

Returns

Single character: First non whitespace character on the line

Comment

Primarily used internally

sub parse()

Arguments

0: $self
1: Name of file to parse (if parsing from file)

Returns

The top level node in the newly created node tree

package Walker

sub new()

Arguments

0: Type of class
1: Node whose tree to walk
2: Reference to entry callback routine (undef if none)
3: Reference to return callback routine (optional)

Returns

A new Walker object

sub walk()

Arguments

0: $self

Returns

Nothing

Action

Walks the tree headed by $self->getHead()
At each node executes the entry callback routine
At every node reentered from its children,

executes the return callback routine

Steve Litt is the author of the Universal Troubleshooting Process Courseware. Steve can be reached at his email address.

The Outline to HTML Outline program

By Steve Litt

As mentioned in the preceding article, I wrote a Node.pm program to convert a VimOutliner outline to an HTML nested unordered list. Different levels have different text colors. It's an incredibly simple program that took maybe 45 minutes to write and debug. Here it is:

#!/usr/bin/perl -w

# Copyright (C) 2004 by Steve Litt
# Licensed with the GNU General Public License, Version 2
# ABSOLUTELY NO WARRANTY, USE AT YOUR OWN RISK
# See http://www.gnu.org/licenses/gpl.txt

use strict;

use lib "$ENV{'HOME'}/Node";

use Node;

package Callbacks;
sub new()
	{
	my($type) = $_[0];
	my($self) = {};
	bless($self, $type);
	return($self);
	}


sub cbPrintNode()
	{
	my($self, $checker, $level) = @_;
	return unless defined $checker;
	return if $level == 0; #### Don't print the contrived first node

	my $bulletMod = ($level - 1) % 3;
	my $colorMod = ($level - 1) % 7;
	my @bulletTypes = ("disc", "circle", "square");
	my @levelColors = ("000000", "990000", "006600", "000099",
		"663300", "660066", "003366");

	print "<li type=\"";
	print $bulletTypes[$bulletMod];
	print "\">";

	print "<font color=\"#";
	print $levelColors[$colorMod];
	print "\">";

	print $checker->getValue() if $checker->hasValue();
	print "<\/font>";
	print "<\/li>\n";
	print "<ul>\n" if $checker->hasFirstChild();
	}

sub cbFinishList()
	{
	my($self, $checker, $level) = @_;
	return unless defined $checker;
	return if $level == 0; #### Don't print the contrived first node

	print "</ul>\n";
	}

package Main;

sub main()
	{
	#### START HTML
	print "<html><head>\n";
	print "<title>CREATED BY otl2htmloutline.pl<\/title>\n";
	print "<\/head><body>\n";
	print "<ul>";

	#### PARSE FROM STDIN
	my $parser = OutlineParser->new();
	$parser->setCommentChar("#");
	$parser->fromStdin();
	my $topNode=$parser->parse();

	#### INSTANTIATE THE Callbacks OBJECT
	my $callbacks = Callbacks->new();

	#### WALK THE NODE TREE,
	#### OUTPUTTING AN HTML FILE WITH NESTED UNORDERED LIST
	my $walker = Walker->new
		(
		$topNode,
		[\&Callbacks::cbPrintNode, $callbacks],
		[\&Callbacks::cbFinishList, $callbacks]
		);
	$walker->walk();

	#### FINISH HTML
	print "<\/ul>";
	print "\n<\/body><\/html>\n";
	}

main();

The preceding is a great example of the power of Node.pm. In a few lines of code, most of which is boilerplate, we've converted a tab indented outline to an html unordered list outline. Not only that, but we've assigned specific bullet types and specific colors to specific levels. We've accomplished that with no nested loops, no nested branches, no break logic, and no global variables. The code is amazingly readable.

Think of other ways you could have performed this transformation. My first thought would have been a VI script. But how would one convert specific tab levels to nested <ul>? I can't think of a way.

So we would need a computer program. Without Node.pm, we'd need to do something like this for each line:

Read the line
Count the indentation level
change = indentation level - previous indentation level
if change > 0 for (i=0; i < change; i++) print "<ul>"
if change < 0 for (i=change; i > 0; i--) print "</ul>"
print "<li>"
print the bullet and color tags based on level
print the line
print the color and bullet end tags
print "<li>"
previous indentation level = indentation level

The preceding pseudocode isn't horrible, and it would certainly be less lines of code than thebut it involves comparison to the previous iteration. It involves counting tabs at the beginning of each line. True, it could be implemented in less lines than the Node.pm version, but for the person who knows Node.pm, the Node.pm version reads like a book.

Better yet, the Node.pm design is obvious. You know the level dependent end tags (</ul>) must be placed in the return callback routine, and they'll be printed in the right place and sequence.

Steve Litt is the author of the Universal Troubleshooting Process Courseware. Steve can be reached at his email address.

A Node.pm Based Binary Tree

By Steve Litt

Remember Comp Sci class where they taught binary trees as an efficient way to both read and write sorted data, and keep it sorted over time? Node.pm is NOT the optimal tool to use for a binary tree, but in order to show the amazing versatility of Node.pm, I've created a binary tree out of Node objects.

Node objects can have any number of children, but in a binary tree each node can have a maximum of 2 children, and if there are two, one must be less than and one must be greater than the parent node. The child with the lesser value is called the "left" child, while the one whose value is greater is called the "right" object.

The easiest way to envision a binary tree is as a tree with the top node at the top, and the others spread out from left to right. At any node, you know that any descendants to the left of it is a lesser value than the node, and any descendents to the right of it are of a greater value than the node.

#                            12 
#                             |
#             .---------------`-------------------.
#             |                                   |
#             4                                  16
#             |                                   |
#      .------`--------.                   .------`--------.
#      |               |                   |               |
#      2               6                  14              20
#      |               |                   |               |
#  .---`---.       .---`---.           .---`---.       .---`---.
#  |       |       |       |           |       |       |       |
#  1       3       5       7          13      15      19      21
#                          `--.                        |
#                             |                    .---`
#                             8                    |
#                             |                   18
#                             `--.                 |
#                                |             .---`
#                               10             |
#                                |            17
#                            .---`---.
#                            |       |
#                            9      11

Let's process the following list into a binary tree:

Montgomery
Richardson
Washington
Anderson
Osbourne
Peterson
Ethridge
Zelinski
Douglas
Jackson
Gardner
Quigley
Stevens
Thomas
Yamaha
Ibenez
Victor
Xavier
Heller
Barnes
Ullman
Carter
Farber
Louis
Kelly
Nagle

We'll create a program that does the following:

Places the preceding list, in the same order as the file, into a node tree. We could just read the values directly from the file, but because this is already a Node.pm program, it's just as easy to use an OutlineParser object.
Read those words one at a time and place into a binary tree, which is implemented as another node tree.
Print out the unsorted list
Print out the Btree in outline order, complete with level-aware indentation
Print the nodes in alphabetic order
Search for several strings, including one that doesn't exist

#!/usr/bin/perl -w

# Copyright (C) 2004 by Steve Litt
# Licensed with the GNU General Public License, Version 2
# ABSOLUTELY NO WARRANTY, USE AT YOUR OWN RISK
# See http://www.gnu.org/licenses/gpl.txt

use strict;

use lib "$ENV{'HOME'}/Node";

use Node;

package Callbacks;
sub new()
	{
	my($type) = $_[0];
	my($self) = {};
	bless($self, $type);
	return($self);
	}


sub cbPrintNode()
	{
	my($self, $checker, $level) = @_;
	return unless defined $checker;

	for(my $i=0; $i < $level; $i++){print "\t";}
	print $checker->getValue();
	print " (", $checker->getType(), ")";
	print " (", $checker->getName(), ")";
	if($checker->hasAttribute("_lineno"))
		{
		print " (Line ", $checker->getAttribute("_lineno"), ")";
		}
	print "\n";
	}

sub cbPrintSortedEntry()
	{
	my($self, $checker, $level) = @_;
	return unless defined $checker;

	if(($checker->getType() eq 'R'))
		{
		if($checker->hasParent())
			{
			print $checker->getParent->getValue();
			print " (", $checker->getParent->getType(), ")\n";
			}
		}
	if($checker->hasFirstChild() == 0)
		{
		print $checker->getValue();
		print " (", $checker->getType(), ")\n";
		}
	}

sub cbPrintSortedReturn()
	{
	my($self, $checker, $level) = @_;
	return unless defined $checker;
	return unless defined $checker->getLastChild();

	if($checker->getLastChild()->getType eq 'L')
		{
		if($checker->hasParent())
			{
			print $checker->getValue();
			print " (", $checker->getType(), ")\n";
			}
		}
	}

package Main;

sub howManyChildren($)
	{
	my $node = shift;
	return 0 unless $node->hasFirstChild();
	return 1 unless $node->getFirstChild()->hasNextSibling();
	return 2;
	}

sub insertOneNodeIntoBtree($$)
	{
	my $btreeHead = shift;
	my $flattreeNode = shift;

	return "undefined_btreeHead" unless defined $btreeHead;
	return "undefined_flattreeNode" unless defined $flattreeNode;
	return "undefined_flattreeNode_value" unless $flattreeNode;
	
	my $value = $flattreeNode->getValue();

	my $newNode = Node->new("","",$value);
	$newNode->setAttribute("_lineno", $flattreeNode->getAttribute("_lineno"));
	return "could_not_create_new_node" unless defined $newNode;

	my $returnString;
	my $btreeChecker = $btreeHead;
	my $inserted = 0;

	while(! $inserted)
		{
		my $comparison = ($value cmp $btreeChecker->getValue());
		my $children = howManyChildren($btreeChecker);
		if($comparison == 0)
			{
			$returnString = "duplicate_" . $value;
			$inserted = -1;
			}
		elsif($children == 0)
			{
			if($comparison == 1)
				{
				$newNode->setType('R');
				}
			else
				{
				$newNode->setType('L');
				}
			$newNode->setName('0');
			$btreeChecker->insertFirstChild($newNode);
			$returnString = "inserted";
			$inserted = 1;
			}
		elsif($children == 2)
			{
			if($comparison == 1)
				{
				$btreeChecker = $btreeChecker->getLastChild();
				}
			else
				{
				$btreeChecker = $btreeChecker->getFirstChild();
				}
			}
		elsif($children == 1)
			{
			if($comparison == 1)
				{
				if($btreeChecker->getFirstChild()->getType() eq 'L')
					{
					$newNode->setType('R');
					$newNode->setName('1R');
					$btreeChecker->insertLastChild($newNode);
					$returnString = "inserted";
					$inserted = 1;
					}
				else
					{
					$btreeChecker = $btreeChecker->getFirstChild();
					}
				}
			else
				{
				if($btreeChecker->getFirstChild()->getType() eq 'R')
					{
					$newNode->setType('L');
					$newNode->setName('1L');
					$btreeChecker->insertFirstChild($newNode);
					$returnString = "inserted";
					$inserted = 1;
					}
				else
					{
					$btreeChecker = $btreeChecker->getFirstChild();
					}
				}
			}
		}
	return $returnString;
	}


sub buildBtree($)
	{
	my $flattreeNode = shift;
	return undef unless defined $flattreeNode;
	return unless $flattreeNode->hasFirstChild();

	my $result;
	my $btreeChecker;

	my $flatChecker = $flattreeNode->getFirstChild();
	my $btreeHead =  Node->new("top", "T", $flatChecker->getValue());
	$flatChecker = $flatChecker->getNextSibling();
	while(defined($flatChecker))
		{
		$result = insertOneNodeIntoBtree($btreeHead, $flatChecker);
		$flatChecker = $flatChecker->getNextSibling();
		}
	return $btreeHead;
	}

sub displayTree($)
	{
	my $topNode = shift;
	
	my $callbacks = Callbacks->new();
	my $walker = Walker->new
		(
		$topNode,
		[\&Callbacks::cbPrintNode, $callbacks]
		);
	$walker->walk();
	}

sub printSortedList($)
	{
	my $topNode = shift;
	my $callbacks = Callbacks->new();
	my $walker = Walker->new
		(
		$topNode,
		[\&Callbacks::cbPrintSortedEntry, $callbacks],
		[\&Callbacks::cbPrintSortedReturn, $callbacks]
		);
	$walker->walk();
	}

sub stdin2flattree()
	{
	my $parser = OutlineParser->new();
	$parser->setCommentChar("#");
	$parser->fromStdin(); my $topNode=$parser->parse();
	return $topNode;
	}

sub findNode($$)
	{
	my $btreeTopNode = shift;
	my $value = shift;
	my $found = 0;
	my $tests = 0;


	my $checker = $btreeTopNode;
	while(! $found)
		{
		$tests++;
		my $comparison = ($value cmp $checker->getValue());
		if($comparison == 0)
			{
			$found = 1;
			last;
			}
		elsif(!$checker->hasFirstChild())
			{
			$found = -1;
			last;
			}
		if(! $found)
			{
			if($comparison == -1)
				{
				$checker = $checker->getFirstChild();
				if($checker->getType() ne 'L')
					{
					$found = -1;
					}
				}
			else
				{
				$checker = $checker->getLastChild();
				if($checker->getType() ne 'R')
					{
					$found = -1;
					}
				}
			}
		}

	if($found == 1)
		{
		print "$value found, line number ";
		print $checker->getAttribute('_lineno');
		print " $tests nodes tested.\n";
		return $checker;
		}
	else
		{
		print "$value not found, $tests nodes tested.\n";
		return undef;
		}
	}

sub main()
	{
	my $flattreeTopNode = stdin2flattree();
	my $btreeTopNode = buildBtree($flattreeTopNode);

	print "########## ORIGINAL WORDS FOLLOW...\n";
	displayTree($flattreeTopNode);

	print "\n\n########## BINARY TREE FOLLOWS...\n";
	displayTree($btreeTopNode);


	print "\n\n########## SORTED LIST FOLLOWS...\n";
	printSortedList($btreeTopNode);


	print "\n\n########## FINDING CARTER...\n";
	findNode($btreeTopNode, "Carter");

	print "\n\n########## FINDING ULLMAN...\n";
	findNode($btreeTopNode, "Ullman");

	print "\n\n########## FINDING WASHINGTON...\n";
	findNode($btreeTopNode, "Washington");

	print "\n\n########## FINDING NoSuch...\n";
	findNode($btreeTopNode, "NoSuch");
	}

main();

We'll now discuss the preceding program. In the following discussion, be sure to think of the tree with the root at the top, and growing downward:

#                            12 
#                             |
#             .---------------`-------------------.
#             |                                   |
#             4                                  16
#             |                                   |
#      .------`--------.                   .------`--------.
#      |               |                   |               |
#      2               6                  14              20
#      |               |                   |               |
#  .---`---.       .---`---.           .---`---.       .---`---.
#  |       |       |       |           |       |       |       |
#  1       3       5       7          13      15      19      21
#                          `--.                        |
#                             |                    .---`
#                             8                    |
#                             |                   18
#                             `--.                 |
#                                |             .---`
#                               10             |
#                                |            17
#                            .---`---.
#                            |       |
#                            9      11

In the preceding program, notice several things:

No provision for Duplicates

The preceding code has no provision for duplicates. For simplicity's sake, we simply forbid duplicates. It wouldn't be particularly hard to allow duplicates, or to have a duplicate update the value of a node.

Node Insertion

Node insertion is done in a loop, starting from the top. At each node, the new value is compared to the node's value, and if it's less it falls to the left, but if it's more it falls to the right. When I say "falls to", one of two things can happen. If the new value is smaller than the node, and there's already a left child, the next iteration features the left child. However, if there's no left child, then the new node becomes the current node's left child. A similar algorithm happens if the new value is bigger -- it falls to the right, and if there's already a right node that node is featured in the next iteration, but if there's no right node the new node is inserted as the right node.

Node objects don't natively support the concept of left and right, so we store that information in the Node object's type.

Sorting

Sorting is done by a walker object connected to an entry callback routine and a return callback routine. The entry callback prints the node's parent's value if the node is a right node, and if the parent exists. Then the entry prints the value of the node itself if the node is a leaf node.

The return callback prints the current node's value if the current node's only child is a left node.

At first this sounds wierd, but it really makes sense. Start with the entry callback. A node should not be printed until every one of its left hand ancestors is printed. You know its left hand decendents are all printed when you process a right child. Because leaf nodes have no decendents, they're printed right away. The entry callback takes care of almost every eventuality. Almost...

What happens if a node has only a left child, with no right child. The entry callback would not print that node. You might be able to add some code to the entry callback to cobble together the ability to print nodes with only a left child, but it would be very hard to get everything in the right order.

It would be easy to get the right order with a return callback, because a node should be printed AFTER all its left hand descendents.

Searching

Searching is done in a loop starting with the topmost node. At each node, if the sought string is less than the node's value, you go down and to the left. If it's greater, down and to the right. If equal, report it found and pass back the node as the function return.

Summary

This was strictly a theoretical exercise. First, there's no need for binary trees in Perl because Perl offers the Perl hash. Second, Node.pm Node objects are not ideally suited for binary trees, because a node can have any number of children, and there's no native concept of left and right nodes. We had to program that into the node types.

What this exercise showed us is the massively wide applicability of Node.pm. With the possible exception of insertion, building and maintaining a binary tree with Node.pm was very easy.

Steve Litt is the author of Troubleshooting Techniques of the Successful Technologist. Steve can be reached at his email address.

Life After Windows: High Quality Tools

Life After Windows is a regular Linux Productivity Magazine column, by Steve Litt, bringing you observations and tips subsequent to Troubleshooters.Com's Windows to Linux conversion.

By Steve Litt

A good tool is hard to find. Do you remember MFC? Not to pick on Microsoft, how bout Borland's old OWL? I briefly dabbled with qt and GTK, but the learning curve was significant.

Doesn't it seem sometimes like tools make things more complicated rather than less?

That's not to say there aren't some great tools. I wrote an app to control a worldwide fleet of routers with Perl tools BER.pm and SNMP_Session.pm. Certainly life would be dismal without Getopt::Long. And who in their right mind would attempt XML without tools such as Apache Software Foundation's XML tools for Perl and Java.

Here's the distinction. Even the most accomplished toolsman typically uses less than 100 tools. Compare that to the thousands of tools in CPAN (Comprehensive Perl Archive Network). How many times have you not been able to install a CPAN tool? Or installed it but it doesn't work as advertised? Or once you install it you find you could have written a small amount of code to get it to work?

A good tool is hard to find.

Sometimes a good tool is also hard to comprehend. I've used Node.pm, in one form or another, for several years. Only in the last year have I understood its true power and wide applicability.

I first made Node.pm to manipulate outlines in Perl programs I wanted to distribute along with the VimOutliner project. My first choice would have been a Perl DOM tool from the Apache Software Foundation, but that tool does not ship with standard Perl. When creating programs for free software distribution, I try my best to make my distribution work with the lowest common denominator Perl installed on the user's box. The average user can't or won't upgrade their Perl. Also, CPAN upgrades do compilation, and that can get pretty scary. I've actually broken perl performing CPAN updates. Yes, the break probably resulted from a mistake I made, but such mistakes are much harder to make if one simply copies a .pm file to the hard disk.

I decided to create a "poor man's DOM" in the form of an object with pointers to its parent, first and last children, and next and previous siblings, placing it in each of the outline manipulation files. Later it grew a parser object to convert from an outline file to a node tree. With early version I actually wrote application code to walk the node tree, but later I created a walker object to walk the tree, calling callback routines. Seeing the wider applicability of this tool, I pulled it out into its own file.

In this magazine's Editor's Desk I discussed how beautifully Node.pm simplified the incredibly complexity of the EMDL to UMENU converter, and how in writing that converter I was able to test and improve Node.pm itself. A few months later UMENU was rewritten with Node.pm. It was then I began to grasp the incredibly wide applicability of Node.pm, and the reliability of using such a well defined and tested tool.

The Node.pm tool has revolutionized my ability to quickly create high quality programs. This tool saves so much time that it's almost worth including it in every program I write.

Evaluating Programming Tools

Evaluating a programming tool is a matter of cost/benefit analysis. Return on Investment.

Costs include:

Purchase price (0 for free software)
License hassles (None for free software unless you're building a non-free product)
Learning time
Bug workarounds

Benefits should include

Faster development
More robust code
Wide applicability
Costs of tool spread over a wide group of applications

I think there are two kinds of programming tools:

Tools narrowly focused on a specific type of task
Tools with wide applications

Certainly the SNMP_Session.pm tool is an example of a narrowly focused tool. You use it when you need SNMP, which for most of us isn't very often.

The Getopt::Long tool is a perfect example of a tool with wide application. It's hard to write a program that can't benefit from Getopt::Long.

Narrowly focused tools are harder to justify because the learning time is allocated to just a handful of programs. Also, narrowly focused tools tend to be buggier, perhaps because they're not hammered by as many users. Narrowly focused tools are best used when you have absolutely no idea how to proceed without them. Certainly my SNMP project was an example of that.

It's possible that by using the narrowly focused product, you'll learn enough to write your own code more suited to your needs. In fact, it was my study of XML::DOM that enabled me to create Node.pm. Once I understood the DOM API, I found it easy to create the similar but smaller API of Node.pm.

Widely applicable tools are easier to justify, if their wide applicability is recognized, and if they're good, easy and robust tools. Here's a series of three questions to help evaluate a tool:

What capabilities does the tool give you?
Thinking broadly, how widely could you apply these capabilities?
How easy is it to realize these capabilities?
What are the alternatives?

What capabilities does the tool give you?

It's obviously important to understand the capabilities bestowed by the tool. Does it give you what you need? Does it throw in features willy-nilly, complexifying what could otherwise be a clean and simple interface?

Applied to Node.pm, the answer to this question would be:

Manipulation of hierarchies
Parsing of tab delimited outlines

Thinking broadly, how widely could you apply these capabilities?

If there's one thing Node.pm taught me, it's that many applications are not initially obvious. When first written, I viewed Node.pm simply as a tool for handling tab indented outlines. That's a very narrow focus.

Several months I defined EMDL (Easy Menu Definition Language) as an outline with certain structural elements, and created an EMDL to UMENU parser. Even though EMDL is obviously an outline, it never occurred to me to use it in the parser.

Over the next year I used Node.pm for several tasks, finally understanding that beyond parsing outlines, Node.pm exuded value in its in-memory handling of hierarchies. That led me to rewrite the cumbersome EMDL to UMENU parser using Node.pm. The results were astounding -- a 7 fold increase in runtime speed accompanied by a huge reduction in complexity and a huge increase in maintainability.

More months went by and I realized the ubiquity of hierarchies. Menus are hierarchies. Configuration information is, or should be, a hierarchy. Filesystems are hierarchies. XML is a hierarchy. Todo lists are hierarchies. Help systems are primarily hierarchies. OOP is hierarchy based, both in the IS-A sense and the HAS-A sense. Computer program structures are often hierarchies, with nest upon nest of typedefs. Much computer code is hierarchical, with subroutines calling subroutines calling subroutines. The Internet's DNS system is a hierarchy. To a great degree an organization's personnel form a hierarchy. Finances can be seen as a hierarchy:

Demand accounts

First National of Moore

Checking
Savings

Leonard national Bank

Savings
CD

Mutual Funds

Weaver Funds

Corporate Bond Fund
Lebanese Small Cap Fund

Tate High Yield Fund
Thomas No Load Mutual Fund

Stocks

IBM
Microsoft
Troubleshooters.Com

Once you find a really good tool, exploit it for all it's worth, and make glue routines making its use easier.

How easy is it to realize these capabilities?

You can write a very nice graphical app with Microsoft MFC. But I'd be willing to bet 90% of you reading this article haven't written more than 3 MFC apps. Invalidating rectangles, pens and device contexts, and the three musketeers (View object, a Frame object, and Document object) are more stumbling blocks than helpful tools.

MFC was popular for awhile, primarily because it was propped up by a convicted monopolist, but I never met anyone who said they liked it.

Contrast MFC with Node.pm. Any time you need any sort of hierarchy, you include Node.pm, copy and paste an old callback object, edit all the callback routines so they return at the top for nodes not needing work, and then accomplish their work with a combination of Node.pm calls and regular Perl. Build a node tree with simple algorithms or from an outline with an OutlineParser object. Finally, use Walker objects to walk the node tree and invoke specified callback routines on each node.

When you find an easy tool with wide application that's quick to run, quick to write, and results in stable, bug free code, it's like alchemy. You can build apps out of nothing in no time.

Be on the Lookout

Now that you know what constitutes an excellent tool and how to exploit it, always be on the lookout for new tools you can use. If a tool is close but no cigar, perhap you can surround it with Perl or shellscript wrappers. Or maybe you can write a "lite" version of the tool, like Node.pm is essentially a "DOM lite".

Steve Litt is the founder and acting president of Greater Orlando Linux User Group (GoLUG). Steve can be reached at his email address.

Letters to the Editor

All letters become the property of the publisher (Steve Litt), and may be edited for clarity or brevity. We especially welcome additions, clarifications, corrections or flames from vendors whose products have been reviewed in this magazine. We reserve the right to not publish letters we deem in bad taste (bad language, obscenity, hate, lewd, violence, etc.).

Submit letters to the editor to Steve Litt's email address, and be sure the subject reads "Letter to the Editor". We regret that we cannot return your letter, so please make a copy of it for future reference.

How to Submit an Article

We anticipate two to five articles per issue, with issues coming out monthly. We look for articles that pertain to the GNU/Linux or open source. This can be done as an essay, with humor, with a case study, or some other literary device. A Troubleshooting poem would be nice. Submissions may mention a specific product, but must be useful without the purchase of that product. Content must greatly overpower advertising. Submissions should be between 250 and 2000 words long.

Any article submitted to Linux Productivity Magazine must be licensed with the Open Publication License, which you can view at http://opencontent.org/openpub/. At your option you may elect the option to prohibit substantive modifications. However, in order to publish your article in Linux Productivity Magazine, you must decline the option to prohibit commercial use, because Linux Productivity Magazine is a commercial publication.

Obviously, you must be the copyright holder and must be legally able to so license the article. We do not currently pay for articles.

Troubleshooters.Com reserves the right to edit any submission for clarity or brevity, within the scope of the Open Publication License. If you elect to prohibit substantive modifications, we may elect to place editors notes outside of your material, or reject the submission, or send it back for modification. Any published article will include a two sentence description of the author, a hypertext link to his or her email, and a phone number if desired. Upon request, we will include a hypertext link, at the end of the magazine issue, to the author's website, providing that website meets the Troubleshooters.Com criteria for links and that the author's website first links to Troubleshooters.Com. Authors: please understand we can't place hyperlinks inside articles. If we did, only the first article would be read, and we can't place every article first.

Submissions should be emailed to Steve Litt's email address, with subject line Article Submission. The first paragraph of your message should read as follows (unless other arrangements are previously made in writing):

Copyright (c) 2003 by <your name>. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, version Draft v1.0, 8 June 1999 (Available at http://www.troubleshooters.com/openpub04.txt/ (wordwrapped for readability at http://www.troubleshooters.com/openpub04_wrapped.txt). The latest version is presently available at http://www.opencontent.org/openpub/).

Open Publication License Option A [ is | is not] elected, so this document [may | may not] be modified. Option B is not elected, so this material may be published for commercial purposes.

After that paragraph, write the title, text of the article, and a two sentence description of the author.

Why not Draft v1.0, 8 June 1999 OR LATER

The Open Publication License recommends using the word "or later" to describe the version of the license. That is unacceptable for Troubleshooting Professional Magazine because we do not know the provisions of that newer version, so it makes no sense to commit to it. We all hope later versions will be better, but there's always a chance that leadership will change. We cannot take the chance that the disclaimer of warranty will be dropped in a later version.

Trademarks

All trademarks are the property of their respective owners. Troubleshooters.Com^(R) is a registered trademark of Steve Litt.

URLs Mentioned in this Issue

Node.pm and programs written with Node.pm

http://www.troubleshooters.com/projects/Node: Home of Node.pm.
http://www.troubleshooters.com/projects/VimOutliner: Home of VimOutliner, the best tool for constructing tab indented outlines.
http://www.troubleshooters.com/projects/emdl: Home of the EMDL project and EMDL to UMENU converter, which was created with Node.pm.
http://www.troubleshooters.com/umenu: Home of UMENU, a command line menu program created with Node.pm.

Miscelleneous URLs

http://www.icewm.org: Home of IceWM, the desktop manager that I use.
http://www.troubleshooters.com/tpromag/200103/200103.htm: A huge XML treatise that explains DOM.
http://www.troubleshooters.com/bookstore/ttech.htm: Home of the book "Troubleshooting Tools of the Successful Technologist.

http://www.switch.ch/misc/leinen/snmp/perl/: Home of BER.pm and SNMP_Session.pm
http://www.gtk.org/: Home of the GTK development tool
http://www.trolltech.com/: Home of the QT development tool
http://www.cpan.org: The CPAN Perl tool repository.
http://xml.apache.org/xerces-p/index.html: XML and DOM for Perl
http://xml.apache.org/xerces2-j/index.html: XML and DOM for Java
http://xml.apache.org/xerces-c/index.html: XML and DOM for C++
http://www.perldoc.com/perl5.6/lib/Getopt/Long.html: Documentation for Getopt::Long