The Perl You Need To Know - Part 2
by Stas Bekman
The Remedy
The diagnostics pragma suggests that the problem can be solved by
making the inner subroutine anonymous.
An anonymous subroutine can act as a closure with respect to lexically scoped variables. Basically, this means that if you define a subroutine in a particular lexical context at a particular moment, then it will run in that same context later, even if called from outside that context. The upshot of this is that when the subroutine runs, you get the same copies of the lexically scoped variables that were visible when the subroutine was defined. So you can pass arguments to a function when you define it, as well as when you invoke it.
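To make the last point concrete, here is a minimal sketch of "passing an argument at definition time." The helper name make_multiplier() is made up for this illustration and does not appear elsewhere in the article.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# make_multiplier() is a hypothetical helper used only for illustration.
# Every call creates a new anonymous subroutine that closes over
# its own copy of the lexical $factor.
sub make_multiplier {
    my $factor = shift;          # fixed when the closure is defined
    return sub {
        my $n = shift;           # supplied when the closure is invoked
        return $n * $factor;
    };
}

my $double = make_multiplier(2);
my $triple = make_multiplier(3);

print $double->(5), "\n";   # prints 10
print $triple->(5), "\n";   # prints 15
```

Each closure keeps the $factor that was visible when it was created, even though make_multiplier() has already returned by the time the closure is called.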
Let's rewrite the code to use this technique:
anonymous.pl
--------------
#!/usr/bin/perl

use strict;

sub print_power_of_2 {
    my $x = shift;

    my $func_ref = sub {
        return $x ** 2;
    };

    my $result = &$func_ref();
    print "$x^2 = $result\n";
}

print_power_of_2(5);
print_power_of_2(6);
Now $func_ref contains a reference to an anonymous function, which
we later call when we need to compute $x to the power of two. (In Perl,
a function is the same thing as a subroutine.) Since the function is
anonymous, a new closure is created on every call of print_power_of_2(),
so it is automatically bound to the current value of the outer scoped
variable $x, and the results will now be as expected.
Let's verify:
% ./anonymous.pl
5^2 = 25
6^2 = 36
So we can see that the problem is solved.
When You Cannot Get Rid of the Inner Subroutine
First, you might wonder why in the world anyone would need to define an inner subroutine. Well, for example, to reduce some of Perl's script startup overhead you might decide to write a daemon that compiles the scripts and modules only once and caches the precompiled code in memory. When a script is to be executed, you just tell the daemon the name of the script to run and it does the rest, and does it much faster since compilation has already taken place.
Seems like an easy task; and it is. The only problem is once the
script is compiled, how do you execute it? Or let's put it another
way: After it was executed for the first time and it stays compiled in
the daemon's memory, how do you call it again? If you could get all
developers to code their scripts so each has a subroutine called run()
that will actually execute the code in the script, then we've solved
half the problem.
But how does the daemon know to refer to some specific script if they
all run in the main:: name space? One solution might be to ask the
developers to declare a package in each and every script, and for the
package name to be derived from the script name. However, since more
than one script may have the same name while residing in different
directories, the directory has to be a part of the package name, too,
in order to prevent namespace collisions. And don't forget that the
script may be moved from one
directory to another, so you will have to make sure that the package
name is corrected each time the script gets moved.
But why enforce these strange rules on developers, when we can arrange for our daemon to do this work? For every script that the daemon is about to execute for the first time, the daemon should wrap the script's code in a subroutine called run(), inside a package whose name is constructed from the mangled path to the script. For example, if the daemon is about to execute the script /tmp/hello.pl:
hello.pl
--------
#!/usr/bin/perl
print "Hello\n";
then prior to running it, the daemon will change the code to be:
wrapped_hello.pl
----------------
package cache::tmp::hello_2epl;

sub run{
    #!/usr/bin/perl
    print "Hello\n";
}
The package name is constructed from the prefix cache::, each
directory separation slash is replaced with ::, and nonalphanumeric
characters are encoded so that, for example, . (a dot) becomes _2e (an
underscore followed by the ASCII code for a dot in hex representation).
You can check the code yourself:
% perl -e 'printf "%x",ord(".")'
prints: 2e. This is the same scheme you see in URL encoding, except
that URL encoding uses the % character as the prefix (%2E). Since % has
a special meaning in Perl (prefix of hash variable), it couldn't be
used, and an underscore takes its place.
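The mangling rule described above can be sketched in a few lines. The function name path_to_package() is an assumption of this sketch, and so is the decision to encode the underscore itself (it is not alphanumeric, and encoding it keeps two different paths from mapping to the same package name). A real implementation may treat it differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# path_to_package() is a hypothetical name chosen for this sketch.
# It turns a script path into a package name using the rules above.
sub path_to_package {
    my $path = shift;

    $path =~ s{^/+}{};                  # drop the leading slash(es)
    my @parts = split m{/+}, $path;     # one element per path component

    for my $part (@parts) {
        # Replace each nonalphanumeric character with an underscore
        # followed by its character code as two hex digits.
        $part =~ s/([^A-Za-z0-9])/sprintf("_%02x", ord($1))/ge;
    }

    return join '::', 'cache', @parts;  # slashes become ::
}

print path_to_package('/tmp/hello.pl'), "\n";   # prints cache::tmp::hello_2epl
```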
Now when the daemon is requested to execute the script
/tmp/hello.pl, all it has to do is to build the package name as
before based on the location of the script and call its run()
subroutine:
use cache::tmp::hello_2epl;
cache::tmp::hello_2epl::run();
We have just written a partial prototype of the daemon we wanted. The only outstanding problem is how to pass the path to the script to the daemon. This detail is left as an exercise for the reader.
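For readers who want to experiment, the wrap-once, call-many-times idea might be sketched as follows. This is only one possible approach, assuming the wrapped source is compiled with a string eval. The names run_cached() and %compiled are made up for this sketch, and the script body is passed in as a string instead of being read from a file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %compiled;   # package names of scripts that were already compiled

# run_cached() is a hypothetical helper, not the author's daemon code.
# $package is the mangled package name, $code the script's source text.
sub run_cached {
    my ($package, $code) = @_;

    unless ($compiled{$package}) {
        # Wrap the script body in a package and a run() subroutine,
        # then compile the result once with a string eval.
        my $wrapped = "package $package;\nsub run {\n$code\n}\n1;\n";
        eval $wrapped or die "compilation of $package failed: $@";
        $compiled{$package} = 1;
    }

    # Later requests skip compilation and only call run().
    my $run = $package->can('run')
        or die "no run() subroutine in $package";
    return $run->();
}

# Executing the same script twice compiles it only once.
run_cached('cache::tmp::hello_2epl', 'print "Hello\n";') for 1 .. 2;
```

Because the script body ends up inside run(), any named subroutine the script defines becomes an inner subroutine of run().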
If you are familiar with the Apache::Registry module, then you know that
it works in almost the same way. It uses a different package prefix
and the generic function is called handler() and not run(). The
scripts to run are passed through the HTTP protocol's headers.
Now you can see that there are cases where your normal subroutines become inner ones. Take a simple script:
simple.pl
---------
#!/usr/bin/perl
sub hello { print "Hello" }
hello();
Wrapped into a run() subroutine it becomes:
simple.pl
---------
package cache::simple_2epl;

sub run{
    #!/usr/bin/perl
    sub hello { print "Hello" }
    hello();
}
Therefore, hello() is an inner subroutine, and if hello() uses my()
scoped variables that are defined and altered outside of it, it won't
work as you expect starting from the second call of run(), as was
explained in the previous section.
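A small sketch shows the effect in a wrapped script. The variable $name and the argument passed to run() are made up for this illustration. With warnings enabled, Perl also prints the 'Variable "$name" will not stay shared' warning when compiling it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# run() stands in for the wrapper generated by the daemon.
sub run {
    my $name = shift;

    # hello() is compiled only once and stays bound to the $name
    # that existed during the first call of run().
    sub hello { return "Hello $name" }

    return hello();
}

print run('Alice'), "\n";   # prints Hello Alice
print run('Bob'),   "\n";   # prints Hello Alice again, not Hello Bob
```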
Remedies for Inner Subroutines
First of all, there is nothing to worry about, as long as you don't forget to turn warnings on. If you do happen to have the "my() Scoped Variable in Nested Subroutines" problem, then Perl will always alert you.
Given that you have a script that has this problem, what are the ways to solve it? There are many of them and we will discuss some of them here.
We will use the following code to show the different solutions.
multirun.pl
-----------
#!/usr/bin/perl -w

use strict;

for (1..3){
    print "run: [time $_]\n";
    run();
}

sub run{

    my $counter = 0;

    increment_counter();
    increment_counter();

    sub increment_counter{
        $counter++;
        print "Counter is equal to $counter !\n";
    }

} # end of sub run
This code executes the run() subroutine three times, which in turn
initializes the $counter variable to 0 each time it is executed
and then calls the inner subroutine increment_counter() twice. Sub
increment_counter() prints $counter's value after incrementing
it. One might expect to see the following output:
run: [time 1]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 2]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 3]
Counter is equal to 1 !
Counter is equal to 2 !
But as we have already learned from the previous sections, this is not what we are going to see. Indeed, when we run the script we see:
% ./multirun.pl
Variable "$counter" will not stay shared at ./multirun.pl line 18.
run: [time 1]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 2]
Counter is equal to 3 !
Counter is equal to 4 !
run: [time 3]
Counter is equal to 5 !
Counter is equal to 6 !
Obviously, the $counter variable is not reinitialized on each
execution of run(). It retains its value from the previous execution,
and sub increment_counter() increments that.
One of the workarounds is to use globally declared variables, with the
vars pragma.
multirun1.pl
------------
#!/usr/bin/perl -w

use strict;
use vars qw($counter);

for (1..3){
    print "run: [time $_]\n";
    run();
}

sub run {

    $counter = 0;

    increment_counter();
    increment_counter();

    sub increment_counter{
        $counter++;
        print "Counter is equal to $counter !\n";
    }

} # end of sub run
If you run this and the other solutions offered below, then the expected output will be generated:
% ./multirun1.pl
run: [time 1]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 2]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 3]
Counter is equal to 1 !
Counter is equal to 2 !
By the way, the warning we saw before is gone, and so is the
problem, since the nested subroutine no longer uses a my()
(lexically scoped) variable.
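As a further sketch, the anonymous subroutine remedy from the beginning of this page can be applied to the same script. The variable name $increment_counter and the explicit return values are choices made for this illustration only.

```perl
#!/usr/bin/perl -w

use strict;

for (1..3) {
    print "run: [time $_]\n";
    run();
}

sub run {
    my $counter = 0;

    # A new closure is created on every call of run(), so it is
    # bound to the fresh $counter of that particular call.
    my $increment_counter = sub {
        $counter++;
        print "Counter is equal to $counter !\n";
        return $counter;
    };

    $increment_counter->();
    return $increment_counter->();
} # end of sub run
```

Here $counter stays a my() variable, and each execution of run() again counts from 1 to 2.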

