The Perl You Need To Know
by Stas Bekman
The Scope of the Special Perl Variables
Now let's talk about Special Perl Variables.
Special Perl variables like $| (buffering), $^T (script's start
time), $^W (warnings mode), $/ (input record separator), $\
(output record separator) and many more are all true global variables.
Whatever package is current, Perl always resolves them in main::, so they
are universally available. This means that if you change them, you
change them everywhere in the program; furthermore, you cannot
scope them with my(). However, you can local()ize them, which means that
any changes you apply last only until the end of the enclosing
scope. Under mod_perl the child server doesn't usually exit, so if one
of your scripts modifies a global variable, the change persists for the
rest of the process's life and affects all the scripts executed by the
same process. Therefore, localizing these variables is highly
recommended; I'd say even mandatory.
I will demonstrate this with the input record separator variable. If you undefine it, the diamond operator (readline) reads in the whole file at once, provided you have enough memory. With that in mind, you should never write code like the example below:
$/ = undef; # BAD!
open IN, "file" ....
# slurp it all into a variable
$all_the_file = <IN>;
The proper way is to put the local() keyword in front of the special
variable when you change it, like this:
local $/ = undef;
open IN, "file" ....
# slurp it all inside a variable
$all_the_file = <IN>;
But there is a catch. local() keeps the changed value in effect for all the
code that follows it in the same scope. At file level this means that the
modified value stays in effect until the script terminates, unless it is
changed again somewhere else in the script.
A cleaner approach is to enclose all the code that depends on the modified variable in a block, like this:
{
    local $/ = undef;
    open IN, "file" ....
    # slurp it all inside a variable
    $all_the_file = <IN>;
}
That way when Perl leaves the block it restores the original value of
the $/ variable, and you don't need to worry elsewhere in your
program about its value being changed here.
Note that local() is dynamically scoped: if you call a subroutine from within the enclosing block after you've set the global variable, the subroutine sees the new value as well.
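The scoping behavior described above can be checked with a short sketch. The helper separator_state() is a hypothetical name introduced only for this illustration; it reports whether $/ is defined at the moment it is called.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether $/ is currently defined.
sub separator_state {
    return defined $/ ? 'defined' : 'undef';
}

my @states;
push @states, separator_state();        # before the block: default "\n"
{
    local $/ = undef;                   # slurp mode, but only inside this block
    push @states, separator_state();    # the called subroutine sees the new value
}
push @states, separator_state();        # original value restored on block exit

print join(', ', @states), "\n";        # prints: defined, undef, defined
```

The middle entry shows that the subroutine called inside the block sees the localized value, and the last entry shows that the original value comes back once the block is left.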
Compiled Regular Expressions
Finally, I want to cover a pitfall that many people have fallen into: the use of regular expressions under mod_perl.
When using a regular expression that contains an interpolated Perl
variable, if it is known that the variable (or variables) will not
change during the execution of the program, a standard optimization
technique is to add the /o modifier to the regex pattern. This
directs the compiler to build the internal table once, for the entire
lifetime of the script, rather than each time the pattern is
executed. Consider:
my $pat = '^foo$'; # likely to be input from an HTML form field
foreach( @list ) {
    print if /$pat/o;
}
This is usually a big win in loops over lists, or when using the
grep() or map() operators.
In long-lived mod_perl scripts, however, the variable may change with each invocation and this can pose a problem. The first invocation of a fresh httpd child will compile the regex and perform the search correctly. However, all subsequent uses by that child will continue to match the original pattern, regardless of the current contents of the Perl variables the pattern is supposed to depend on. Your script will appear to be broken.
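The problem can be reproduced outside mod_perl with a small sketch. Here the subroutine count_matches() is a hypothetical stand-in for a script that the same long-lived process runs more than once, each time with a different pattern.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a script executed repeatedly by one process.
sub count_matches {
    my ($pat, @list) = @_;
    my $count = 0;
    foreach (@list) {
        $count++ if /$pat/o;    # /o: $pat is interpolated on the first execution only
    }
    return $count;
}

my @list = qw(foo bar);
my $first  = count_matches('^foo$', @list);    # compiles /^foo$/ and finds one match
my $second = count_matches('^qux$', @list);    # still matches against /^foo$/

print "first: $first, second: $second\n";      # prints: first: 1, second: 1
```

Nothing in the list matches ^qux$, yet the second call still reports one match, because the pattern compiled during the first call is reused.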
There are two solutions to this problem:
The first is to use eval q{}, to force the code to be compiled afresh
each time it runs. Just make sure that the string passed to eval covers the
entire processing loop, and not just the pattern match itself.
The above code fragment would be rewritten as:
my $pat = '^foo$';
eval q{
    foreach( @list ) {
        print if /$pat/o;
    }
};
Just saying:
foreach( @list ) {
    eval q{ print if /$pat/o; };
}
would mean that the regex is recompiled for every element in the list, even though the regex doesn't change.
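As a rough check of the first solution, the sketch below wraps the whole loop in a string eval inside a hypothetical subroutine, count_matches(), which stands in for a script invoked repeatedly by one process. The "1;" at the end of the evaluated string and the "or die $@" are additions for error reporting and are not part of the fragment above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a script executed repeatedly by one process.
sub count_matches {
    my ($pat, @list) = @_;
    my $count = 0;
    # The string is compiled anew on every call, so /o binds to the
    # current value of $pat once per call rather than once per process.
    eval q{
        foreach (@list) {
            $count++ if /$pat/o;
        }
        1;
    } or die $@;
    return $count;
}

my @list = qw(foo bar);
my $first  = count_matches('^foo$', @list);
my $second = count_matches('^qux$', @list);

print "first: $first, second: $second\n";    # prints: first: 1, second: 0
```

The second call now reports zero matches, as it should, while the regex is still compiled only once per call rather than once per list element.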
You can use this approach if you require more than one pattern match
operator in a given section of code. If the section contains only one
operator (be it an m// or s///), then you can rely on the property of
the null pattern, that reuses the last pattern seen. This leads to the
second solution, which also eliminates the use of eval.
The above code fragment becomes:
my $pat = '^foo$';
"foo" =~ /$pat/; # dummy match (MUST NOT FAIL!)
foreach( @list ) {
    print if //;
}
The only gotcha is that the dummy match that boots the regular
expression engine must absolutely, positively succeed. Otherwise the
pattern will not be remembered, and the // will reuse whichever pattern
last matched successfully or, if there is none, match everything. If
you can't count on fixed text to ensure the match succeeds, then you have
two possibilities.
If you can guarantee that the pattern variable contains no meta-characters (things like *, +, ^, $...), then you can use the dummy match:
$pat =~ /\Q$pat\E/; # guaranteed if no meta-characters present
If there is a possibility that the pattern can contain meta-characters, then you should search for the pattern or the nonsearchable \377 character as follows:
"\377" =~ /$pat|^\377$/; # guaranteed if meta-characters present
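A minimal sketch of this last variant, assuming a sample pattern and a sample word list chosen only for illustration:

```perl
use strict;
use warnings;

my $pat = '^fo+$';    # sample pattern containing metacharacters

# Dummy match: the string "\377" always satisfies the second alternative,
# so the match succeeds whatever $pat contains.
"\377" =~ /$pat|^\377$/ or die "dummy match failed";

my @hits;
foreach (qw(foo fooo bar)) {
    push @hits, $_ if //;    # null pattern reuses the last successful pattern
}

print "@hits\n";
```

One consequence of this trick is that the remembered pattern is the whole alternation, so a string consisting of the single character \377 would also match inside the loop.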
Another approach:
It depends on the complexity of the regex to which you apply this technique. One common usage where a compiled regex is usually more efficient is to "match any one of a group of patterns" over and over again.
A helper routine may make this easier to remember. Here is one, slightly modified from Jeffrey Friedl's example in his book "Mastering Regular Expressions".
#####################################################
# Build_MatchMany_Function
# -- Input: list of patterns
# -- Output: A code ref which matches its $_[0]
#    against ANY of the patterns given in the
#    "Input", efficiently.
#
sub Build_MatchMany_Function {
    my @R = @_;
    my $expr = join '||', map { "\$_[0] =~ m/\$R[$_]/o" } ( 0..$#R );
    my $matchsub = eval "sub { $expr }";
    die "Failed in building regex @R: $@" if $@;
    $matchsub;
}
Example usage:
@some_browsers = qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww);
$Known_Browser = Build_MatchMany_Function(@some_browsers);
while (<ACCESS_LOG>) {
    # ...
    $browser = get_browser_field($_);
    if ( ! &$Known_Browser($browser) ) {
        print STDERR "Unknown Browser: $browser\n";
    }
    # ...
}
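To try the helper without an access log, here is a self-contained sketch. The user-agent strings in @agents are hypothetical sample data that replace the ACCESS_LOG filehandle and get_browser_field(), neither of which is defined in this article. For two patterns, the source that the helper passes to eval has the form sub { $_[0] =~ m/$R[0]/o||$_[0] =~ m/$R[1]/o }, so each pattern gets its own match operator and is compiled once.

```perl
use strict;
use warnings;

# Same helper as above, repeated so that this sketch runs on its own.
sub Build_MatchMany_Function {
    my @R = @_;
    my $expr = join '||', map { "\$_[0] =~ m/\$R[$_]/o" } ( 0..$#R );
    my $matchsub = eval "sub { $expr }";    # closes over the lexical @R
    die "Failed in building regex @R: $@" if $@;
    $matchsub;
}

my @some_browsers = qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww);
my $Known_Browser = Build_MatchMany_Function(@some_browsers);

# Hypothetical user-agent strings standing in for fields parsed from a log.
my @agents = (
    'Mozilla/4.0 (compatible; MSIE 5.0)',
    'Lynx/2.8.3',
    'SomeRobot/1.0',
);

my @unknown = grep { !$Known_Browser->($_) } @agents;
print "Unknown Browser: $_\n" for @unknown;    # prints: Unknown Browser: SomeRobot/1.0
```

Because every call to Build_MatchMany_Function() evaluates a fresh string, the /o modifiers inside the generated subroutine are bound to the patterns of that particular call, so building a second matcher with different patterns is safe.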
In the next article, I'll present a few other Perl basics directly related to mod_perl programming.
References
- The book "Mastering Regular Expressions" by Jeffrey E. F. Friedl.
- The book "Learning Perl" by Randal L. Schwartz (also known as the "Llama" book, named after the llama picture on the cover of the book).
- The book "Programming Perl" by L. Wall, T. Christiansen and J. Orwant (also known as the "Camel" book, named after the camel picture on the cover of the book).
- The Exporter, perlre, perlvar, perlmod and perlmodlib man pages.

