Troubleshooters.Com and Code Corner Present

Steve Litt's Perls of Wisdom:
Subroutines and References
in Perl
(With Snippets)

Copyright (C) 1998-2003 by Steve Litt


Introduction

This page discusses both subroutines and references. They're on the same page because references are often passed into and out of subroutines.

References

In Perl, you can pass only one kind of argument to a subroutine: a scalar. To pass any other kind of argument, you need to convert it to a scalar. You do that by passing a reference to it. A reference to anything is a scalar. If you're a C programmer you can think of a reference as a pointer (sort of).

The following table discusses the referencing and de-referencing of variables. Note that in the case of lists and hashes, you reference and dereference the list or hash as a whole, not individual elements (at least not for the purposes of this discussion).
 
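Variable: $scalar
   Instantiating it:                  $scalar = "steve";
   Instantiating a reference to it:   $ref = \"steve";
   Referencing it:                    $ref = \$scalar
   Dereferencing it:                  $$ref or ${$ref}
   Accessing an element:              N/A

Variable: @list
   Instantiating it:                  @list = ("steve", "fred");
   Instantiating a reference to it:   $ref = ["steve", "fred"];
   Referencing it:                    $ref = \@list
   Dereferencing it:                  @{$ref}
   Accessing an element:              ${$ref}[3] or $ref->[3]

Variable: %hash
   Instantiating it:                  %hash = ("name" => "steve",
                                         "job" => "Troubleshooter");
   Instantiating a reference to it:   $ref = {"name" => "steve",
                                         "job" => "Troubleshooter"};
   Referencing it:                    $ref = \%hash
   Dereferencing it:                  %{$ref}
   Accessing an element:              ${$ref}{"president"} or $ref->{"president"}

Variable: FILE
   Referencing it:                    $ref = \*FILE
   Dereferencing it:                  {$ref} or scalar <$ref>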

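These principles are demonstrated in the source code below. Note the following anomalies, both flagged in the comments of dohash(): a dereferenced hash (%{$ref}) is not interpolated inside double quotes, and hash elements may print in any order.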

sub doscalar
   {
   my($scalar) = "This is the scalar";
   my($ref) = \$scalar;
   print "${$ref}\n";   # Prints "This is the scalar".
   }

sub dolist
   {
   my(@list) = ("Element 0", "Element 1", "Element 2");
   my($ref) = \@list;
   print "@{$ref}\n";    # Prints "Element 0 Element 1 Element 2".
   print "${$ref}[1]\n"; # Prints "Element 1".
   }

sub dohash
   {
   my(%hash) = ("president"=>"Clinton",
                "vice president" => "Gore",
                "intern" => "Lewinsky");
   my($ref) = \%hash;

   # NOTE: Can't put %{$ref} inside doublequotes!!! Doesn't work!!!
   # Prints "internLewinskyvice presidentGorepresidentClinton".
   # NOTE: Hash elements might print in any order!
   print %{$ref}; print "\n";

   # NOTE: OK to put ${$ref}{} in doublequotes.
   # NOTE: Prints "Gore".
   print "${$ref}{'vice president'}\n";
   }

&doscalar;
&dolist;
&dohash;
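
The table's anonymous constructors ([...] and {...}) and its arrow notation do not appear in the snippet above, so here is a minimal sketch of them. The variable names $listref and $hashref and the sample data are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Anonymous list and hash: the reference is created directly,
# with no named @list or %hash behind it.
my $listref = ["steve", "fred"];
my $hashref = {"name" => "steve", "job" => "Troubleshooter"};

# Both notations reach the same element.
print "${$listref}[1]\n";        # Prints "fred".
print "$listref->[1]\n";         # Prints "fred".
print "${$hashref}{'job'}\n";    # Prints "Troubleshooter".
print "$hashref->{'job'}\n";     # Prints "Troubleshooter".

# Dereferencing as a whole gives back the full list.
my $count = scalar(@{$listref});
print "$count elements\n";       # Prints "2 elements".
```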

Subroutines: A Discussion

Subroutines are the basic computer science methodology for dividing tasks into subtasks. They take zero or more scalar arguments as input (and possibly output), and they return zero or more scalars as a return value. Note that the scalar arguments and/or return values can be references to lists, hashes, or any other type of complex data, so the possibilities are limitless.

In computer science, there are two methods of passing arguments to a subroutine: by value and by reference.

When passing by value, the language makes a copy of the argument, and all access inside the subroutine is to that copy. Therefore, changes made inside the subroutine do not affect the calling routine. Such arguments cannot be used as output from the subroutine. The preferred method of outputting from a subroutine is via the return value. Unfortunately, the Perl language doesn't support passing by value. Instead, the programmer must explicitly make the copy inside the subroutine.

In general, I believe it's best to use arguments as input-only.

When passing by reference, the language makes the argument's exact variable available inside the subroutine, so any changes the subroutine makes to the argument affect the argument's value in the calling procedure (after the subroutine call, of course). This tends to reduce encapsulation, as there's no way of telling in the calling routine that the called routine changed it. Passing by reference harks back to the days of global variables, and in general creates less robust code.

All arguments in Perl are passed by reference! If the programmer wishes to make a copy of the argument to simulate passing by value (and I believe in most cases he should), he must explicitly make the copy in the subroutine and not otherwise access the original arguments.
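
Here is a small sketch of the two styles side by side. The subroutine names bumpInPlace and bumpCopy are made up for this illustration.

```perl
use strict;
use warnings;

# Works directly on $_[0], which is an alias for the caller's variable.
sub bumpInPlace
    {
    $_[0]++;
    }

# Copies the argument first, so the caller's variable is untouched.
sub bumpCopy
    {
    my($n) = @_;
    $n++;
    return($n);
    }

my($count) = 5;
bumpInPlace($count);
print "$count\n";              # Prints "6": the caller's variable changed.

my($result) = bumpCopy($count);
print "$count $result\n";      # Prints "6 7": only the copy changed.
```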


NOTE: Modern Perl versions (5.003 and newer) enable you to do function prototyping somewhat similar to C. Doing so lessens the chance for weird runtime errors. Because this page was created before Perl prototyping was common, much of its code is old school. This will change as time goes on.

Danger! Warning! Peligro! Achtung! Watch it!
As you would probably imagine, subroutine order matters when prototyping. A subroutine call must call a subroutine defined previously. The danger lies in the fact that if you do not, you get a non-obvious runtime error, not a compile error.
SUBROUTINE ORDER MATTERS IN PROTOTYPING
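
One way to keep the order problem at bay, sketched below, is a forward declaration: a sub statement with the prototype but no body, placed ahead of the first call. The fullName subroutine is hypothetical and only serves the example.

```perl
use strict;
use warnings;

# Forward declaration: tells Perl the prototype before any call is compiled.
sub fullName($$);

# This call appears before the body, yet it is still checked
# against the ($$) prototype declared above.
my($name) = fullName("Steve", "Litt");
print "$name\n";    # Prints "Steve Litt".

sub fullName($$)
    {
    my($first, $last) = @_;
    return("$first $last");
    }
```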

Bare Bones Subroutine Syntax

Old school, no prototyping

Calling the subroutine:

&mysub();

Constructing the subroutine:

sub mysub
  {
  }

Note that in the above the ampersand (&) is used before the subroutine call, and that no parentheses are used in the function definition.
 
Prototyping, no arguments

Calling the subroutine:

mysub();

Constructing the subroutine:

sub mysub()
  {
  }

The preceding is prototyped. Note that there is no ampersand before the function. Note also that the function definition has parentheses, but because there are no args expected those parens are empty. Contrast that with the following, which expects two scalars. Experiment and note that Perl gripes when your prototype and call don't match.
 
Prototyping, two string arguments

Calling the subroutine:

mysub($filename, $title);

Constructing the subroutine:

sub mysub($$)
  {
  }
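
To watch Perl gripe without killing the script, the mismatched call can be wrapped in a string eval, as in this sketch. The body given to mysub here and the sample arguments are assumptions for the sake of a runnable example.

```perl
use strict;
use warnings;

sub mysub($$)
    {
    my($filename, $title) = @_;
    return("$filename: $title");
    }

# A matching call compiles and runs.
my($ok) = mysub("index.htm", "Home");

# A call with one argument will not compile. Wrapping it in a string
# eval lets the script survive and inspect the complaint in $@.
my($bad) = eval 'mysub("index.htm")';
my($complaint) = $@;
print "Perl griped: $complaint" if $complaint;
```

Without the eval, the same one-argument call stops the whole script at compile time.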

Returning a Scalar

Use the return statement.
 
Calling the subroutine:

my($name) = &getName();
print "$name\n";
# Prints "Bill Clinton"

Constructing the subroutine:

sub getName
    {
    return("Bill Clinton");
    }

NOTE: In C++ there are cases where the calling code can "reach into" the function via the returned pointer or reference. This is apparently not true of passed back scalars. Check out this code:

$GlobalName = "Clinton";

sub getGlobalName
    {
    return($GlobalName);
    }

print "Before: " . &getGlobalName() . "\n";
$ref = \&getGlobalName();
$$ref = "Gore";
print "After: " . &getGlobalName() . "\n";
#All print statements printed "Clinton"
I have been unable to hack into a subroutine via its scalar return. If you know of a way it can be done, please let me know, as this would be a horrid violation of encapsulation.

Returning a List

Calling the subroutine:

my($first, $last) = &getFnameLname();
print "$last, $first\n";
# Prints "Clinton, Bill"

Constructing the subroutine:

sub getFnameLname
    {
    return("Bill", "Clinton");
    }

Returning a Hash

Calling the subroutine:

my(%officers) = &getOfficers();
print $officers{"vice president"};
# prints Al Gore

Constructing the subroutine:

sub getOfficers
    {
    return("president"=>"Bill Clinton",
           "vice president"=>"Al Gore",
           "intern"=>"Monica Lewinsky"
           );
    }
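
Since a return value can also be a reference, a subroutine can hand back one scalar that points at a whole hash. The following is only a sketch of that variation; getOfficersRef is a hypothetical name, not part of the code above.

```perl
use strict;
use warnings;

sub getOfficersRef
    {
    my(%officers) = ("president" => "Bill Clinton",
                     "vice president" => "Al Gore");
    return(\%officers);    # A single scalar: a reference to the hash.
    }

my($officers) = getOfficersRef();
my($vp) = $officers->{"vice president"};
print "$vp\n";             # Prints "Al Gore".

# Dereference the whole hash when a real hash is needed.
my(%copy) = %{$officers};
my($howmany) = scalar(keys(%copy));
print "$howmany\n";        # Prints "2".
```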

Subroutine With Scalar Input/Output Arguments

Arguments to a subroutine are accessible inside the subroutine as list @_. Any change the subroutine makes to the elements of @_, such as $_[0] or $_[1], is a change to the original argument. HOWEVER, assigning @_ or its elements to other variables makes a separate copy. Changes to the separate copy are unknown outside of the subroutine.

For readability, on output or input/output arguments it is therefore important to use the output argument as $_[] or @_ throughout the function, so the reader knows it's an output argument.

Below is how to change the value of an argument outside the function.
 
Calling the subroutine:

my($mm, $dd, $yyyy) = ("12", "10", "1998");
print "Before: $mm/$dd/$yyyy\n";
&firstOfNextMonth($mm, $dd, $yyyy);
print "After : $mm/$dd/$yyyy\n";
# Second print will print 01/01/1999

Constructing the subroutine:

sub firstOfNextMonth
    {
    $_[1] = "01";
    $_[0] = $_[0] + 1;
    if($_[0] > 12)
      {
      $_[0] = "01";
      $_[2]++;
      }
    }

By the way, the above is an excellent example of the advantages of a loosely typed language. Note the implicit conversions between string and integer.
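
One side effect of the implicit conversion is worth knowing: adding 1 to "03" yields the number 4, which prints as "4" rather than "04". Below is a sketch of one way to keep two-digit months using sprintf. The nextMonth helper is hypothetical and not part of the code above.

```perl
use strict;
use warnings;

# "12" is a string, but + treats it as the number 12.
my($sum) = "12" + 1;
print "$sum\n";                 # Prints "13".

# Hypothetical helper: returns the following month, zero-padded.
sub nextMonth
    {
    my($mm) = @_;
    $mm = $mm + 1;              # String becomes a number here.
    $mm = 1 if $mm > 12;
    return(sprintf("%02d", $mm));
    }

my($april) = nextMonth("03");
my($january) = nextMonth("12");
print "$april\n";               # Prints "04".
print "$january\n";             # Prints "01".
```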

Subroutine With Scalar Input-Only Arguments

Arguments to a subroutine are accessible inside the subroutine as list @_. Any change the subroutine makes to the elements of @_, such as $_[0] or $_[1], is a change to the original argument. HOWEVER, assigning @_ or its elements to other variables makes a separate copy. Changes to the separate copy are unknown outside of the subroutine.

For readability, it is therefore important to immediately assign the input-only arguments to local variables, and only work on the local variables.

Below is how to print changed values without changing the arguments outside the function's scope.
 
Calling the subroutine:

my($mm, $dd, $yyyy) = ("12", "10", "1998");
print "Before: $mm/$dd/$yyyy\n";
&printFirstOfNextMonth($mm, $dd, $yyyy);
print "After : $mm/$dd/$yyyy\n";
# Before and after will print 12/10/1998.
# Inside will print 01/01/1999

Constructing the subroutine:

sub printFirstOfNextMonth
    {
    my($mm, $dd, $yyyy) = @_;
    $dd = "01";
    $mm = $mm + 1;
    if($mm > 12)
      {
      $mm = "01";
      $yyyy++;
      }
    print "Inside: $mm/$dd/$yyyy\n";
    }

Subroutine With List Input/Output Arguments

Arguments to a subroutine are accessible inside the subroutine as list @_, which is a list of scalars. Any change the subroutine makes to the elements of @_, such as $_[0] or $_[1], is a change to the original argument. HOWEVER, assigning @_ or its elements to other variables makes a separate copy. Changes to the separate copy are unknown outside of the subroutine.

For readability, on output or input/output arguments it is therefore important to use the output argument as $_[] or @_ throughout the function, so the reader knows it's an output argument.

If a member of @_ (in other words, an argument) is a reference to a list, it can be dereferenced and used inside the subroutine.

Here's an example of a listcat() function which appends the second list to the first. From that point forward the caller will see the new value of the first argument:
 
Calling the subroutine:

my(@languages) = ("C","C++","Delphi");
my(@newlanguages) = ("Java","Perl");
print "Before: @languages\n";
&listcat(\@languages, \@newlanguages);
print "After : @languages\n";

# Before prints "C C++ Delphi"
# After prints "C C++ Delphi Java Perl"

Constructing the subroutine:

sub listcat
   {
   # Purpose of @append is only to
   # self-document input-only status
   my(@append) = @{$_[1]};

   my($temp);
   foreach $temp (@append)
      {
      # note direct usage of arg0
      push(@{$_[0]}, $temp);
      }
   }


Subroutine With List Input-Only Arguments

Arguments to a subroutine are accessible inside the subroutine as list @_. Any change the subroutine makes to the elements of @_, such as $_[0] or $_[1], is a change to the original argument. HOWEVER, assigning @_ or its elements to other variables makes a separate copy. Changes to the separate copy are unknown outside of the subroutine.

For readability, it is therefore important to immediately assign the input-only arguments to local variables, and only work on the local variables.

If a member of @_ (in other words, an argument) is a reference to a list, it can be dereferenced and used inside the subroutine.

Here's an example of an improved listcat() function which appends the second list to a copy of the first, without affecting the first outside the subroutine. Instead, it returns the combined list.
 
Calling the subroutine:

my(@languages) = ("C","C++","Delphi");
my(@newlanguages) = ("Java","PERL");
print "Before: @languages\n";
print "Inside: ";
print &listcat(\@languages,\@newlanguages);
print "\n";
print "After : @languages\n";

# Before and after prints "C C++ Delphi"
# Inside prints "CC++DelphiJavaPERL"

Constructing the subroutine:

sub listcat
   {
   # Local copies: @original protects the caller's list,
   # @append self-documents input-only status
   my(@original) = @{$_[0]};
   my(@append) = @{$_[1]};
   my($temp);
   foreach $temp (@append)
      {
      push(@original, $temp);  # pushes onto the local copy only
      }
   return(@original);
   }

Use parentheses with the shift command!

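The following does not work as intended, because Perl reads @{shift} as the array named @shift rather than as a call to the shift function: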
sub handleArray
  {
  my(@localArray) = @{shift};
  my($element);
  foreach $element (@localArray) {print $element . "\n";}
  }
&handleArray(\@globalArray);


But once you place the shift command in parens, everything's fine:

sub handleArray
  {
  my(@localArray) = @{(shift)};
  my($element);
  foreach $element (@localArray) {print $element . "\n";}
  }
&handleArray(\@globalArray);
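
Parentheses around shift are not the only cure. The sketch below shows three other spellings that also make Perl treat shift as a function call. The countWith... subroutine names are invented for this comparison.

```perl
use strict;
use warnings;

# Explicit call parentheses: shift() cannot be mistaken for a name.
sub countWithCall
    {
    my(@localArray) = @{shift()};
    return(scalar(@localArray));
    }

# Unary plus: forces Perl to treat what follows as an expression.
sub countWithPlus
    {
    my(@localArray) = @{+shift};
    return(scalar(@localArray));
    }

# Named reference: shift into a scalar first, then dereference.
sub countWithName
    {
    my($ref) = shift;
    my(@localArray) = @{$ref};
    return(scalar(@localArray));
    }

my(@globalArray) = ("one", "two", "three");
my($n) = countWithCall(\@globalArray);
print "$n\n";    # Prints "3".
```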

Using prototypes

Be careful prototyping with lists:
sub printList(@$) {print @{(shift)}; print shift; print "\n";};
printList(\@globalArray);
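The preceding gives some runtime warnings. But the call is missing an arg -- it shouldn't run at all. The trouble is that an unbackslashed @ in a prototype accepts any number of remaining arguments, so the $ after it is never enforced. Instead, use \@ for the list in the prototype, and pass just the list in the call, as follows: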
sub printList(\@$) {print @{(shift)}; print shift; print "\n";};
printList(@globalArray);
Now it gives you a "not enough arguments" error and ends with a compile error, which is what you want. Place an additional scalar in the call so the call matches the prototype, and it runs perfectly:
sub printList(\@$) {print @{(shift)}; print shift; print "\n";};
printList(@globalArray, "Hello World");
Remember, using an unbackslashed @ in the prototype defeats the purpose of prototyping. Precede the @ with a backslash. Note that this is also true for passed hashes (%). Unless you have a very good reason to do otherwise, precede all @ and % with backslashes in the prototype.
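
The same idea applied to a hash might look like the following sketch. The countKeys subroutine and the sample data are assumptions made for illustration.

```perl
use strict;
use warnings;

# The \% prototype makes Perl pass a reference to the caller's hash,
# and refuses at compile time anything that is not a hash.
sub countKeys(\%)
    {
    my($hashref) = shift;
    return(scalar(keys(%{$hashref})));
    }

my(%globals) = ("currentdir" => "/corporate/data",
                "programdir" => "/corporate/bin");

# Note: the call passes %globals itself, not \%globals.
my($howmany) = countKeys(%globals);
print "$howmany\n";    # Prints "2".
```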

Subroutine With Hash Input/Output Arguments

Arguments to a subroutine are accessible inside the subroutine as list @_, which is a list of scalars. Any change the subroutine makes to the elements of @_, such as $_[0] or $_[1], is a change to the original argument. HOWEVER, assigning @_ or its elements to other variables makes a separate copy. Changes to the separate copy are unknown outside of the subroutine.

For readability, on output or input/output arguments it is therefore important to use the output argument as $_[] or @_ throughout the function, so the reader knows it's an output argument.

If a member of @_ (in other words, an argument) is a reference to a hash, it can be dereferenced and used inside the subroutine.

Here's an example of a setGlobals() function which takes an existing %globals passed in as a reference argument and sets the proper elements. From that point forward the caller will see the new value of the elements:
 
Calling the subroutine:

my(%globals);
&setGlobals(\%globals);
# printGlobals is defined in the next section
&printGlobals(\%globals);

Constructing the subroutine:

sub setGlobals
   {
   ${$_[0]}{"currentdir"} = "/corporate/data";
   ${$_[0]}{"programdir"} = "/corporate/bin";
   ${$_[0]}{"programver"} = "5.21";
   ${$_[0]}{"accesslevel"} = "root";
   }

Subroutine With Hash Input-Only Arguments

Arguments to a subroutine are accessible inside the subroutine as list @_. Any change the subroutine makes to the elements of @_, such as $_[0] or $_[1], is a change to the original argument. HOWEVER, assigning @_ or its elements to other variables makes a separate copy. Changes to the separate copy are unknown outside of the subroutine.

For readability, it is therefore important to immediately assign the input-only arguments to local variables, and only work on the local variables.

If a member of @_ (in other words, an argument) is a reference to a hash, it can be dereferenced and used inside the subroutine.

Here's an example of a printGlobals() function which copies the hash referenced by its argument into a local hash and prints from that copy, so nothing it does can affect the hash outside the subroutine.
 
Calling the subroutine:

my(%globals);
# ...
# set globals
# ...
# now print globals
&printGlobals(\%globals);

Constructing the subroutine:

sub printGlobals
   {
   # copy of argument precludes extra-scope change
   my(%globals) = %{$_[0]};
   print "Current Dir: $globals{'currentdir'}\n";
   print "Program Dir: $globals{'programdir'}\n";
   print "Version    : $globals{'programver'}\n";
   print "Accesslevel: $globals{'accesslevel'}\n";
   }
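
To see that the copy really does shield the caller, here is a small sketch in which the subroutine deliberately changes its local hash. The scrubAccessLevel subroutine is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical subroutine that changes its local copy on purpose.
sub scrubAccessLevel
    {
    my(%globals) = %{$_[0]};          # Copy of the caller's hash.
    $globals{'accesslevel'} = "guest";
    return($globals{'accesslevel'});
    }

my(%globals) = ("accesslevel" => "root");
my($inside) = scrubAccessLevel(\%globals);
print "Inside : $inside\n";                     # Prints "Inside : guest".
print "Outside: $globals{'accesslevel'}\n";     # Prints "Outside: root".
```

The copy is only one level deep. If a hash value is itself a reference, the copy and the original still point at the same underlying data.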


Dereferencing in Place: The -> Operator

By FAR the easiest way to handle references, especially when they're being passed into and out of subroutines, is the -> operator. This operator works the same as it does in C. It means "element so and so of the dereferenced reference". This is ABSOLUTELY vital when using objects, because most Perl objects are references to a hash. Nest a few of those, and without the -> operator you're dead meat. The -> operator also enables you to easily modify arguments in place, which is vital in typical OOP applications.

One typical usage is an object containing a list of hashes. The list of hashes could easily represent a data table, with array elements being rows (records) and hash elements being columns (fields). Here's how it's easily done in Perl:

#!/usr/bin/perl -w
use strict;

package Me;

sub new
{
my($type) = $_[0];
my($self) = {};
$self->{'name'} = 'Bill Brown';

### Make a reference to an empty array of jobs
$self->{'jobs'} = [];

### Now make each element of array referenced by
### $self->{'jobs'} a REFERENCE to a hash!
$self->{'jobs'}->[0]={'ystart'=>'1998','yend'=>'1999','desc'=>'Bus driver'};
$self->{'jobs'}->[1]={'ystart'=>'1999','yend'=>'1999','desc'=>'Bus mechanic'};
$self->{'jobs'}->[2]={'ystart'=>'1999','yend'=>'2001','desc'=>'Software Developer'};

bless($self, $type);
return($self);
}

### showResume is coded to show off the -> operator. In real
### life you'd probably use a foreach loop, but the following
### while(1) loop better demonstrates nested -> operators.
sub showResume
{
my($self)=$_[0];
print "Resume of " . $self->{'name'} . "\n\n";
print "Start\tEnd\tDescription\n";
my $ss = 0;

# Loop through array referenced by $self->{'jobs'},
# and for each subscript, print the value corresponding
# to the hash key. In other words, print every field of
# every record of the jobs array
while (1)
{
last unless defined $self->{'jobs'}->[$ss];
print "$self->{'jobs'}->[$ss]->{'ystart'}\t";
print "$self->{'jobs'}->[$ss]->{'yend'}\t";
print "$self->{'jobs'}->[$ss]->{'desc'}\n";
$ss++;
}
}

package main;

my $me = Me->new();
$me->showResume();
print "\nFirst job was $me->{'jobs'}->[0]->{'desc'}.\n";

I think you'll agree that the reference nesting in the preceding code would have been extremely hard to understand without the in-place dereferencing provided by the -> operator. The following is the resulting output:

[slitt@mydesk slitt]$ ./test.pl
Resume of Bill Brown

Start End Description
1998 1999 Bus driver
1999 1999 Bus mechanic
1999 2001 Software Developer

First job was Bus driver.
[slitt@mydesk slitt]$
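
The comments in showResume mention that a foreach loop is the more likely real-life choice. A minimal sketch of that loop follows, using a standalone $jobs reference as a stand-in for $self->{'jobs'}; the @lines array is only there to make the result easy to inspect.

```perl
use strict;
use warnings;

# Stand-in for $self->{'jobs'} from the listing above.
my $jobs = [
    {'ystart'=>'1998', 'yend'=>'1999', 'desc'=>'Bus driver'},
    {'ystart'=>'1999', 'yend'=>'1999', 'desc'=>'Bus mechanic'},
    {'ystart'=>'1999', 'yend'=>'2001', 'desc'=>'Software Developer'},
];

# Each $job is a reference to one hash (one record).
my @lines;
foreach my $job (@{$jobs})
    {
    push(@lines, "$job->{'ystart'}\t$job->{'yend'}\t$job->{'desc'}");
    }
print "$_\n" foreach @lines;
```

With foreach there is no subscript to maintain and no end-of-array test, at the cost of showing one fewer level of -> nesting.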


