Copyright (C) 1998-2003 by Steve Litt
Introduction
This page discusses both subroutines and references. They're on the same
page because references are often passed into and out of subroutines.
References
In Perl, you can pass only one kind of argument to a subroutine: a scalar.
To pass any other kind of argument, you need to convert it to a scalar. You
do that by passing a reference to it. A reference to anything is a
scalar. If you're a C programmer you can think of a reference as a pointer
(sort of).
The following table discusses the referencing and de-referencing of variables.
Note that in the case of lists and hashes, you reference and dereference the
list or hash as a whole, not individual elements (at least not for the purposes
of this discussion).
| Variable | Instantiating the variable | Instantiating a reference to it | Referencing it | Dereferencing it | Accessing an element |
|---|---|---|---|---|---|
| $scalar | $scalar = "steve"; | $ref = \"steve"; | $ref = \$scalar | $$ref or ${$ref} | N/A |
| @list | @list = ("steve", "fred"); | $ref = ["steve", "fred"]; | $ref = \@list | @{$ref} | ${$ref}[3] or $ref->[3] |
| %hash | %hash = ("name" => "steve", "job" => "Troubleshooter"); | $ref = {"name" => "steve", "job" => "Troubleshooter"}; | $ref = \%hash | %{$ref} | ${$ref}{"president"} or $ref->{"president"} |
| FILE | | | $ref = \*FILE | {$ref} or scalar <$ref> | |
These principles are demonstrated in the source code below. Note the following
anomaly:
- A variable with a % sign won't evaluate out when placed in doublequotes.
Variables with @ and $ will, because Perl interpolates only scalars and arrays
inside double-quoted strings.
sub doscalar
{
    my($scalar) = "This is the scalar";
    my($ref) = \$scalar;
    print "${$ref}\n";       # Prints "This is the scalar".
}

sub dolist
{
    my(@list) = ("Element 0", "Element 1", "Element 2");
    my($ref) = \@list;
    print "@{$ref}\n";       # Prints "Element 0 Element 1 Element 2".
    print "${$ref}[1]\n";    # Prints "Element 1".
}

sub dohash
{
    my(%hash) = ("president"=>"Clinton",
                 "vice president" => "Gore",
                 "intern" => "Lewinsky");
    my($ref) = \%hash;

    # NOTE: Can't put %{$ref} inside doublequotes!!! Doesn't work!!!
    # Prints "internLewinskyvice presidentGorepresidentClinton".
    # NOTE: Hash elements might print in any order!
    print %{$ref};
    print "\n";

    # NOTE: OK to put ${$ref}{} in doublequotes.
    # NOTE: Prints "Gore".
    print "${$ref}{'vice president'}\n";
}

&doscalar;
&dolist;
&dohash;
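The FILE row of the table is not exercised by the code above. The following is a minimal sketch of it, not a definitive recipe. The subroutine name writeLine is hypothetical, and the example assumes an in-memory filehandle (opening a reference to a scalar, available in Perl 5.8 and newer) so that no file on disk is needed:

```perl
use strict;
use warnings;

# Hypothetical helper: writes one line to whatever filehandle
# reference it is handed.
sub writeLine
{
    my($fileref, $text) = @_;
    print {$fileref} "$text\n";      # {$ref} dereferences the handle
}

my($buffer) = "";
open(FILE, '>', \$buffer) or die "Cannot open in-memory file: $!";
my($ref) = \*FILE;                   # reference to the filehandle
writeLine($ref, "steve");
writeLine($ref, "fred");
close(FILE);

# Reopen the same handle for reading; $ref still refers to it.
open(FILE, '<', \$buffer) or die "Cannot reopen in-memory file: $!";
my($first) = scalar <$ref>;          # reads the first line only
close(FILE);

print "First line: $first";          # Prints "First line: steve"
```

Because the filehandle reference is a scalar, it travels through @_ like any other argument.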
Subroutines: A Discussion
Subroutines are the basic computer science methodology to divide tasks into
subtasks. They take zero or more scalar arguments as input (and possibly
output), and they return a list of zero or more scalars as a return value. Note that
the scalar arguments and/or return values can be references to lists, hashes,
or any other type of complex data, so the possibilities are limitless.
In computer science, there are two methods of passing arguments to a subroutine:
by value and by reference.
When passing by value, the language makes a copy of the argument, and all
access inside the subroutine is to that copy. Therefore, changes made inside
the subroutine do not affect the calling routine. Such arguments cannot
be used as output from the subroutine. The preferred method of outputting
from a subroutine is via the return value. Unfortunately, the Perl language
doesn't support passing by value. Instead, the programmer must explicitly make the copy
inside the subroutine.
In general, I believe it's best to use arguments as input-only.
When passing by reference, the language makes the argument's exact variable
available inside the subroutine, so any changes the subroutine makes to the
argument affect the argument's value in the calling procedure (after the subroutine
call, of course). This tends to reduce encapsulation, as there's no way of
telling in the calling routine that the called routine changed it. Passing
by reference harkens back to the days of global values, and in general creates
less robust code.
All arguments in Perl are passed by reference! If the programmer wishes
to make a copy of the argument to simulate passing by value (and I believe
in most cases he should), he must explicitly make the copy in the subroutine
and not otherwise access the original arguments.
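The difference can be seen in a few lines. This is a minimal sketch; the subroutine names doubleInPlace and doubleCopy are hypothetical and exist only for illustration:

```perl
use strict;
use warnings;

# Changes its argument in the caller, because $_[0] is an
# alias for the caller's variable.
sub doubleInPlace
{
    $_[0] = $_[0] * 2;
}

# Simulates passing by value: copies the argument into a
# local variable and works only on the copy.
sub doubleCopy
{
    my($number) = @_;
    $number = $number * 2;
    return($number);
}

my($count) = 5;
my($result) = doubleCopy($count);
print "After doubleCopy:    count=$count result=$result\n";   # count=5 result=10
doubleInPlace($count);
print "After doubleInPlace: count=$count\n";                  # count=10
```

Only doubleInPlace changes $count in the caller; doubleCopy leaves it alone and communicates through its return value.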
NOTE: Modern Perl versions (5.003 and newer) enable you
to do function prototyping somewhat similar to C. Doing so lessens the chance
for weird runtime errors. Because this page was created before Perl prototyping
was common, much of its code is old school. This will change as time goes
on.
Danger! Warning! Peligro! Achtung! Watch it!
As you would probably imagine, subroutine order matters when prototyping.
A subroutine call must call a subroutine defined previously. The danger lies
in the fact that if you do not, you get a non-obvious runtime error, not a
compile error.
SUBROUTINE ORDER MATTERS IN PROTOTYPING
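One way to keep the ordering under control is a forward declaration: Perl lets you declare a subroutine's prototype near the top of the file and define its body later. The following is a minimal sketch of that approach; fullName is a hypothetical subroutine used only for illustration:

```perl
use strict;
use warnings;

# Forward declaration: the prototype is known from here on,
# even though the body appears further down the file.
sub fullName($$);

my($name) = fullName("Steve", "Litt");   # checked against ($$)
print "$name\n";                         # Prints "Steve Litt"

sub fullName($$)
{
    my($first, $last) = @_;
    return("$first $last");
}
```

With the declaration in place, the call is checked against the prototype even though the definition comes after it.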
Bare Bones Subroutine Syntax
Old school, no prototyping

| Calling the subroutine | Constructing the subroutine |
|---|---|
| &mysub(); | sub mysub { } |
Note that in the above the ampersand (&) is used before the subroutine
call, and that no parentheses are used in the function definition.
Prototyping, no arguments

| Calling the subroutine | Constructing the subroutine |
|---|---|
| mysub(); | sub mysub() { } |
The preceding is prototyped. Note that there is no ampersand before the function.
Note also that the function definition has parentheses, but because there
are no args expected those parens are empty. Contrast that with the following,
which expects two scalars. Experiment and note that Perl gripes when your
prototype and call don't match.
Prototyping, two string arguments

| Calling the subroutine | Constructing the subroutine |
|---|---|
| mysub($filename, $title); | sub mysub($$) { } |
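To watch Perl gripe without killing the script, the mismatched call can be compiled inside a string eval. This is a minimal sketch: the body of mysub and the sample arguments are assumptions made for illustration.

```perl
use strict;
use warnings;

sub mysub($$)
{
    my($filename, $title) = @_;
    return("$title ($filename)");
}

# A matching call compiles and runs normally.
my($label) = mysub("index.html", "Home");
print "$label\n";                        # Prints "Home (index.html)"

# A mismatched call is rejected when it is compiled. Wrapping it
# in a string eval lets this script survive to show the message.
eval 'mysub("index.html");';
my($error) = $@;
print "Perl complained: $error" if $error;
```

The message stored in $error begins with "Not enough arguments for main::mysub", and the mismatched call never runs.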
Returning a Scalar
Use the return statement.
Calling the subroutine:

my($name) = &getName();
print "$name\n";
# Prints "Bill Clinton"

Constructing the subroutine:

sub getName
{
    return("Bill Clinton");
}
NOTE: In C++ there are cases where the calling code can "reach into" the
function via the returned pointer or reference. This is apparently not true
of passed back scalars. Check out this code:
$GlobalName = "Clinton";
sub getGlobalName
{
return($GlobalName);
}
print "Before: " . &getGlobalName() . "\n";
$ref = \&getGlobalName();
$$ref = "Gore";
print "After: " . &getGlobalName() . "\n";
#All print statements printed "Clinton"
I have been unable to hack into a subroutine via its scalar return. If you
know of a way it can be done, please let me know, as this would be
a horrid violation of encapsulation.
Returning a List
Calling the subroutine:

my($first, $last) = &getFnameLname();
print "$last, $first\n";
# Prints "Clinton, Bill"

Constructing the subroutine:

sub getFnameLname
{
    return("Bill", "Clinton");
}
Returning a Hash
Calling the subroutine:

my(%officers) = &getOfficers();
print $officers{"vice president"};
# prints Al Gore

Constructing the subroutine:

sub getOfficers
{
    return("president"=>"Bill Clinton",
           "vice president"=>"Al Gore",
           "intern"=>"Monica Lewinsky"
          );
}
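Returning a hash this way copies every key and value into the caller's hash. Because a reference is itself a scalar, a subroutine can also hand back a reference to the hash instead. The following is a minimal sketch of that alternative; getOfficersRef is a hypothetical name, not part of the code above:

```perl
use strict;
use warnings;

sub getOfficersRef
{
    my(%officers) = ("president"      => "Bill Clinton",
                     "vice president" => "Al Gore");
    return(\%officers);      # a single scalar: a reference to the hash
}

my($ref) = getOfficersRef();
print "${$ref}{'vice president'}\n";           # Prints "Al Gore"

my(%copy) = %{$ref};                           # dereference the whole hash
print scalar(keys(%copy)) . " officers\n";     # Prints "2 officers"
```

The caller decides whether to work through the reference or to copy the whole hash out of it.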
Subroutine With Scalar Input/Output Arguments
Arguments to a subroutine are accessible inside the subroutine as list @_.
Any change the subroutine performs to @_ or any of its members like $_[0],
$_[1], etc, are changes to the original argument. HOWEVER, assigning @_ or
its elements to other variables makes a separate copy. Changes to the separate
copy are unknown outside of the subroutine.
For readability, it is therefore important on output or input/output arguments
to use the output argument as $_[] or @_ throughout the function,
to let the reader know it's an output argument.
Below is how to change the value of an argument outside the function.
Calling the subroutine:

my($mm, $dd, $yyyy) = ("12", "10", "1998");
print "Before: $mm/$dd/$yyyy\n";
&firstOfNextMonth($mm, $dd, $yyyy);
print "After : $mm/$dd/$yyyy\n";
# Second print will print 01/01/1999

Constructing the subroutine:

sub firstOfNextMonth
{
    $_[1] = "01";
    $_[0] = $_[0] + 1;
    if($_[0] > 12)
    {
        $_[0] = "01";
        $_[2]++;
    }
}
By the way, the above is an excellent example of the advantages of a loosely
typed language. Note the implicit conversions between string and integer.
Subroutine With Scalar Input-Only Arguments
Arguments to a subroutine are accessible inside the subroutine as list @_.
Any change the subroutine performs to @_ or any of its members like $_[0],
$_[1], etc, are changes to the original argument. HOWEVER, assigning @_ or
its elements to other variables makes a separate copy. Changes to the separate
copy are unknown outside of the subroutine.
For readability, it is therefore important to immediately assign the input-only
arguments to local variables, and only work on the local variables.
Below is how to print changed values without changing the arguments outside
the function's scope.
Calling the subroutine:

my($mm, $dd, $yyyy) = ("12", "10", "1998");
print "Before: $mm/$dd/$yyyy\n";
&printFirstOfNextMonth($mm, $dd, $yyyy);
print "After : $mm/$dd/$yyyy\n";
# Before and after will print 12/10/1998.
# Inside will print 01/01/1999

Constructing the subroutine:

sub printFirstOfNextMonth
{
    my($mm, $dd, $yyyy) = @_;
    $dd = "01";
    $mm = $mm + 1;
    if($mm > 12)
    {
        $mm = "01";
        $yyyy++;
    }
    print "Inside: $mm/$dd/$yyyy\n";
}
Subroutine With List Input/Output Arguments
Arguments to a subroutine are accessible inside the subroutine as list @_,
which is a list of scalars. Any change the subroutine performs to @_ or any
of its members like $_[0], $_[1], etc, are changes to the original argument.
HOWEVER, assigning @_ or its elements to other variables makes a separate
copy. Changes to the separate copy are unknown outside of the subroutine.
For readability, it is therefore important on output or input/output arguments
to use the output argument as $_[] or @_ throughout the function,
to let the reader know it's an output argument.
If a member of @_ (in other words, an argument) is a reference to a list,
it can be dereferenced and used inside the subroutine.
Here's an example of a listcat() function which appends the second list
to the first. From that point forward the caller will see the new value of
the first argument:
Calling the subroutine:

my(@languages) = ("C","C++","Delphi");
my(@newlanguages) = ("Java","Perl");
print "Before: @languages\n";
&listcat(\@languages, \@newlanguages);
print "After : @languages\n";
# Before prints "C C++ Delphi"
# After prints "C C++ Delphi Java Perl"

Constructing the subroutine:

sub listcat
{
    # Purpose of @append is only to
    # self-document input-only status
    my(@append) = @{$_[1]};

    my($temp);
    foreach $temp (@append)
    {
        # note direct usage of arg0
        push(@{$_[0]}, $temp);
    }
}
Subroutine With List Input-Only Arguments
Arguments to a subroutine are accessible inside the subroutine as list @_.
Any change the subroutine performs to @_ or any of its members like $_[0],
$_[1], etc, are changes to the original argument. HOWEVER, assigning @_ or
its elements to other variables makes a separate copy. Changes to the separate
copy are unknown outside of the subroutine.
For readability, it is therefore important to immediately assign the input-only
arguments to local variables, and only work on the local variables.
If a member of @_ (in other words, an argument) is a reference to a list,
it can be dereferenced and used inside the subroutine.
Here's an example of an improved listcat() function which appends the
second list to the first without affecting the first outside the subroutine.
Instead, it returns the combined list.
Calling the subroutine:

my(@languages) = ("C","C++","Delphi");
my(@newlanguages) = ("Java","PERL");
print "Before: @languages\n";
print "Inside: ";
print &listcat(\@languages,\@newlanguages);
print "\n";
print "After : @languages\n";
# Before and after prints "C C++ Delphi"
# Inside prints "CC++DelphiJavaPERL"

Constructing the subroutine:

sub listcat
{
    # Purpose of @original and @append is only to
    # self-document input-only status
    my(@original) = @{$_[0]};
    my(@append) = @{$_[1]};
    my($temp);
    foreach $temp (@append)
    {
        push(@original, $temp);    # note: pushes onto the local copy
    }
    return(@original);
}
Use parentheses with the shift command!
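The following generates an error, because Perl reads @{shift} as the array
variable @shift rather than as a call to the shift function: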
sub handleArray
{
my(@localArray) = @{shift};
my($element);
foreach $element (@localArray) {print $element . "\n";}
}
&handleArray(\@globalArray);
But once you place the shift command in parens, everything's fine:
sub handleArray
{
my(@localArray) = @{(shift)};
my($element);
foreach $element (@localArray) {print $element . "\n";}
}
&handleArray(\@globalArray);
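Two other spellings avoid the same trap. The following is a minimal sketch of both; the subroutine names countWithCall and countWithIndex, and the contents of @globalArray, are hypothetical:

```perl
use strict;
use warnings;

sub countWithCall
{
    my(@localArray) = @{shift()};    # empty parens mark shift as a call
    return(scalar(@localArray));
}

sub countWithIndex
{
    my(@localArray) = @{$_[0]};      # no shift at all
    return(scalar(@localArray));
}

my(@globalArray) = ("one", "two", "three");
my($viaCall)  = countWithCall(\@globalArray);
my($viaIndex) = countWithIndex(\@globalArray);
print "$viaCall $viaIndex\n";        # Prints "3 3"
```

Which spelling to use is a matter of taste. The point is only that shift must be recognizable as a function call, not a variable name, inside the braces.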
Using prototypes
Be careful prototyping with lists:
sub printList(@$) {print @{(shift)}; print shift; print "\n";};
printList(\@globalArray);
The preceding gives some runtime warnings. But the call is missing an arg
-- it shouldn't run at all. Instead, use \@ for the list in the prototype,
and pass just the list in the call, as follows:
sub printList(\@$) {print @{(shift)}; print shift; print "\n";};
printList(@globalArray);
Now it gives you a "not enough arguments" error and ends with a compile
error, which is what you want. Place an additional scalar in the call so
the call matches the prototype, and it runs perfectly:
sub printList(\@$) {print @{(shift)}; print shift; print "\n";};
printList(@globalArray, "Hello World");
Remember, using an unbackslashed @ in the prototype defeats the purpose of
prototyping. Precede the @ with a backslash. Note that this is also true for
passed hashes (%). Unless you have a very good reason to do otherwise, precede
all @ and % with backslashes in the prototype.
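The same pattern applied to a hash might look like the following minimal sketch. The subroutine printEntry and the sample data are hypothetical, chosen only to illustrate the \% prototype:

```perl
use strict;
use warnings;

# The \% in the prototype makes Perl pass a reference to the
# caller's hash, and reject calls whose first argument is not a hash.
sub printEntry(\%$)
{
    my($hashref, $key) = @_;
    my($line) = "$key: ${$hashref}{$key}";
    print "$line\n";
    return($line);
}

my(%officers) = ("president" => "Clinton", "vice president" => "Gore");

# The call passes the hash itself; the subroutine receives a reference.
my($shown) = printEntry(%officers, "president");   # Prints "president: Clinton"
```

As with the list version, the caller writes the plain hash, and the subroutine receives a reference.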
Subroutine With Hash Input/Output Arguments
Arguments to a subroutine are accessible inside the subroutine as list @_,
which is a list of scalars. Any change the subroutine performs to @_ or any
of its members like $_[0], $_[1], etc, are changes to the original argument.
HOWEVER, assigning @_ or its elements to other variables makes a separate
copy. Changes to the separate copy are unknown outside of the subroutine.
For readability, it is therefore important on output or input/output arguments
to use the output argument as $_[] or @_ throughout the function,
to let the reader know it's an output argument.
If a member of @_ (in other words, an argument) is a reference to a hash,
it can be dereferenced and used inside the subroutine.
Here's an example of a setGlobals() function which takes an existing %globals
passed in as a reference argument and sets the proper elements. From that
point forward the caller will see the new value of the elements:
Calling the subroutine:

my(%globals);
&setGlobals(\%globals);
&printGlobals(\%globals);

Constructing the subroutine:

sub setGlobals
{
    ${$_[0]}{"currentdir"} = "/corporate/data";
    ${$_[0]}{"programdir"} = "/corporate/bin";
    ${$_[0]}{"programver"} = "5.21";
    ${$_[0]}{"accesslevel"} = "root";
}
Subroutine With Hash Input-Only Arguments
Arguments to a subroutine are accessible inside the subroutine as list @_.
Any change the subroutine performs to @_ or any of its members like $_[0],
$_[1], etc, are changes to the original argument. HOWEVER, assigning @_ or
its elements to other variables makes a separate copy. Changes to the separate
copy are unknown outside of the subroutine.
For readability, it is therefore important to immediately assign the input-only
arguments to local variables, and only work on the local variables.
If a member of @_ (in other words, an argument) is a reference to a hash,
it can be dereferenced and used inside the subroutine.
Here's an example of a printGlobals() function which copies the hash passed
in by reference and prints its elements, without affecting the original hash
outside the subroutine.
Calling the subroutine:

my(%globals);
# ...
# set globals
# ...

# now print globals
&printGlobals(\%globals);

Constructing the subroutine:

sub printGlobals
{
    # copy of argument precludes extra-scope change
    my(%globals) = %{$_[0]};
    print "Current Dir: $globals{'currentdir'}\n";
    print "Program Dir: $globals{'programdir'}\n";
    print "Version    : $globals{'programver'}\n";
    print "Accesslevel: $globals{'accesslevel'}\n";
}
Dereferencing in Place: The -> Operator
By FAR the easiest way to handle references, especially when they're being
passed into and out of subroutines, is the -> operator. This operator
works the same as it does in C. It means "element so and so of the dereferenced
reference". This is ABSOLUTELY vital when using objects, because most Perl
objects are references to a hash. Nest a few of those, and without the ->
operator you're dead meat. The -> operator also enables you to
easily modify arguments in place, which is vital in typical OOP applications.
One typical usage is an object containing a list of hashes. The list of hashes
could easily represent a data table, with array elements being rows (records)
and hash elements being columns (fields). Here's how it's easily done in Perl:
#!/usr/bin/perl -w
use strict;

package Me;

sub new
{
    my($type) = $_[0];
    my($self) = {};
    $self->{'name'} = 'Bill Brown';

    ### Make a reference to an empty array of jobs
    $self->{'jobs'} = [];

    ### Now make each element of array referenced by
    ### $self->{'jobs'} a REFERENCE to a hash!
    $self->{'jobs'}->[0]={'ystart'=>'1998','yend'=>'1999','desc'=>'Bus driver'};
    $self->{'jobs'}->[1]={'ystart'=>'1999','yend'=>'1999','desc'=>'Bus mechanic'};
    $self->{'jobs'}->[2]={'ystart'=>'1999','yend'=>'2001','desc'=>'Software Developer'};

    bless($self, $type);
    return($self);
}

### showResume is coded to show off the -> operator. In real
### life you'd probably use a foreach loop, but the following
### while(1) loop better demonstrates nested -> operators.
sub showResume
{
    my($self)=$_[0];
    print "Resume of " . $self->{'name'} . "\n\n";
    print "Start\tEnd\tDescription\n";
    my $ss = 0;

    # Loop through array referenced by $self->{'jobs'},
    # and for each subscript, print the value corresponding
    # to the hash key. In other words, print every field of
    # every record of the jobs array
    while (1)
    {
        last unless defined $self->{'jobs'}->[$ss];
        print "$self->{'jobs'}->[$ss]->{'ystart'}\t";
        print "$self->{'jobs'}->[$ss]->{'yend'}\t";
        print "$self->{'jobs'}->[$ss]->{'desc'}\n";
        $ss++;
    }
}

package Main;

my $me = Me->new();
$me->showResume();
print "\nFirst job was $me->{'jobs'}->[0]->{'desc'}.\n";
I think you'll agree that the reference nesting in the preceding code would
have been extremely hard to understand without the in-place dereferencing
provided by the -> operator. The following is the resulting output:
[slitt@mydesk slitt]$ ./test.pl
Resume of Bill Brown

Start   End     Description
1998    1999    Bus driver
1999    1999    Bus mechanic
1999    2001    Software Developer

First job was Bus driver.
[slitt@mydesk slitt]$
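The comments in showResume mention that real code would probably use a foreach loop. A minimal sketch of that variation follows. To keep the example self-contained, the data is rebuilt here as a plain (unblessed) hash reference, which is an assumption of this sketch, and the names $job and @lines are hypothetical:

```perl
use strict;
use warnings;

my($me) = {
    'name' => 'Bill Brown',
    'jobs' => [
        {'ystart'=>'1998','yend'=>'1999','desc'=>'Bus driver'},
        {'ystart'=>'1999','yend'=>'1999','desc'=>'Bus mechanic'},
        {'ystart'=>'1999','yend'=>'2001','desc'=>'Software Developer'},
    ],
};

my(@lines);

# @{$me->{'jobs'}} dereferences the whole array; each $job is
# then a reference to one record's hash.
foreach my $job (@{$me->{'jobs'}})
{
    push(@lines, "$job->{'ystart'}\t$job->{'yend'}\t$job->{'desc'}");
}

print "$_\n" foreach @lines;
```

The loop needs no subscript and no end-of-array test, at the cost of showing one less level of -> nesting.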