Perl Design Patterns, Part 2
by Phil Crow
So, hashes are superior structures for simple to moderately complex data. To see how to build a hash structure, consider an example: visualizing an outline. For simplicity, I'll represent the outline purely through indentation (not with Roman or other numerals). Here's an example outline:
Grocery Store
    Milk
    Juice
    Butcher
        Thin sliced ham
        Chuck roast
    Cheese
Cleaners
Home Center
    Door
    Lock
    Shims
This outline describes a theoretical shopping trip. I want to represent it internally in my program so I can play with it. (One of my favorite games is turning outlines into pictures; see below.)
Instead of a full-blown object, I'll use a little hash-based data container for each node in the tree. Each node will keep track of three things:
- Name
- Level
- Children (a list of other nodes)
To keep track of who is a child of whom, I'll use a stack of these nodes. The node on the top of the stack is usually the parent of the next line of input. To show my method, I'll intersperse comments with the script. At the bottom of this section the script appears in one piece.
#!/usr/bin/perl
use strict; use warnings;
These lines are always a good idea.
my $root = {
    name     => "ROOT",
    level    => -1,
    children => [],
};
This is the root node. It's a hash reference containing the three keys mentioned earlier. The root node is special. Since it isn't in the file, I give it an artificial name and a level that is lower than anyone else's. (In a moment, we will see that levels in the input will be zero or positive.) Initially the list of children is empty.
my @stack;
push @stack, $root;
The stack will keep track of the ancestry of each new node. For starters it needs the root node, which won't ever be popped, because it is an ancestor of all the nodes.
while (<>) {
    /^(\s*)(.*)/;
    my $indentation = length $1;
    my $name = $2;
To read the file, I chose a magic while. Each line has two parts: the
indentation (the leading spaces) and the name (the rest of the line).
The regular expression captures any leading space into $1 and
everything else (except the newline) into $2. The length of the
indentation is the important part: the bigger it is, the more ancestors
the node has. Lines starting at the margin have an indentation of 0
(which is why the ROOT has a level of -1).
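To see what the regular expression hands back, here is a small sketch that runs it on a few sample lines. The helper name split_line and the four-space indentation are assumptions made for illustration; neither is part of the script.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical helper (not part of the script): split one line of input
# into the width of its indentation and its name.
sub split_line {
    my $line = shift;
    $line =~ /^(\s*)(.*)/;    # always matches; either capture may be empty
    return (length($1), $2);
}

# Sample lines, assuming four spaces per outline level.
for my $line ("Grocery Store\n", "    Milk\n", "        Chuck roast\n") {
    my ($indentation, $name) = split_line($line);
    print "$indentation: $name\n";
}
```

This prints 0, 4 and 8 as the three indentations. The newline never reaches the name, because . does not match a newline by default.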
    while ($indentation <= $stack[-1]{level}) {
        pop @stack;
    }
This loop handles ancestry. It pops the stack until the node on top of
the stack is the parent of the new node. For example, when Home Center
comes along, Cleaners and ROOT are on the stack. Home Center's level is 0
(it's at the margin), and so is Cleaners'. Thus, Cleaners is popped
(since 0 <= 0). Then only ROOT remains, so popping stops
(0 is not <= -1).
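A deeper case can be traced the same way. The sketch below pulls the loop into a hypothetical helper, pop_to_parent, and runs it on a stack I filled in by hand. The levels 0, 4 and 8 assume four spaces per outline level, and the position of Cheese directly under Grocery Store is also an assumption.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical helper: pop until the top of the stack is the parent of a
# line with the given indentation, then return that parent.
sub pop_to_parent {
    my ($stack, $indentation) = @_;
    while ($indentation <= $stack->[-1]{level}) {
        pop @$stack;
    }
    return $stack->[-1];
}

# Assumed state of the stack just after "Chuck roast" was read.
my @stack = (
    { name => "ROOT",          level => -1 },
    { name => "Grocery Store", level => 0  },
    { name => "Butcher",       level => 4  },
    { name => "Chuck roast",   level => 8  },
);

# "Cheese" arrives with an indentation of 4.
my $parent = pop_to_parent(\@stack, 4);
print "parent: $parent->{name}\n";
print "nodes left on the stack: ", scalar(@stack), "\n";
```

Chuck roast (4 <= 8) and Butcher (4 <= 4) are popped, and the loop stops at Grocery Store (4 is not <= 0). Two nodes are left on the stack.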
    my $node = {
        name     => $name,
        level    => $indentation,
        children => [],
    };
This builds a new node for the current line. Its name and level are set. We haven't seen any children yet, but I make room for them in an empty list.
    push @{$stack[-1]{children}}, $node;
This line adds the new node to its parent's list of children. Remember
that the parent is sitting on top of the stack. The top of the stack is
$stack[-1] or the last element in the array.
    push @stack, $node;
}
This pushes the new node onto the stack, in case it has children. The
closing brace ends the magic while loop. For simplicity, I chose to
display the output with Data::Dumper:
use Data::Dumper; print Dumper($root);
Running this shows the tree (sideways) on standard out.
Here's the whole code without interruption:
#!/usr/bin/perl
use strict; use warnings;

my $root = {
    name     => "ROOT",
    level    => -1,
    children => [],
};

my @stack;
push @stack, $root;

while (<>) {
    /^(\s*)(.*)/;
    my $indentation = length $1;
    my $name = $2;
    while ($indentation <= $stack[-1]{level}) {
        pop @stack;
    }
    my $node = {
        name     => $name,
        level    => $indentation,
        children => [],
    };
    push @{$stack[-1]{children}}, $node;
    push @stack, $node;
}

use Data::Dumper; print Dumper($root);
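Once the tree exists, a short recursive walk is enough to play with it. The following is only a sketch: outline_lines is a hypothetical helper, and the tree is built by hand in the same shape the script produces so that the block runs on its own.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical helper: return everything below $node as a list of lines,
# indented two spaces per generation.
sub outline_lines {
    my ($node, $depth) = @_;
    $depth = 0 unless defined $depth;
    my @lines;
    for my $child (@{ $node->{children} }) {
        push @lines, ("  " x $depth) . $child->{name};
        push @lines, outline_lines($child, $depth + 1);
    }
    return @lines;
}

# A small hand-built tree with the same three keys per node.
my $root = {
    name     => "ROOT",
    level    => -1,
    children => [
        { name => "Cleaners", level => 0, children => [] },
        { name => "Home Center", level => 0, children => [
            { name => "Door", level => 4, children => [] },
        ] },
    ],
};

print "$_\n" for outline_lines($root);
```

The walk skips ROOT itself, since it was never in the input, and prints Cleaners, Home Center, and an indented Door.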
I promised to explain how structures like the one above can be turned into
pictures. The CPAN module UML::Sequence builds a structure similar to the
one shown here. It then uses that to generate a UML Sequence diagram of
the steps in SVG (Scalable Vector Graphics) format. That format can
be converted with standard tools like Batik to PNG or JPEG. In practice
the outlines which I turn into pictures represent call sequences for
programs. Perl can even generate the outline by running the program.
See UML::Sequence for more details.
When you have some interesting structured input, a builder might help
make a good internal structure. One high-value builder is XML::DOM.
Another with a slightly different approach is XML::Twig. It is not
coincidental that XML parsers are really builders, as XML files are
non-binary trees.
Interpreter
If you haven't looked in GoF yet, start with the interpreter pattern. Laughter is good for the soul. The person who taught me patterns in Java did not even know why this pattern would not work in practice. He had heard it was somewhat slow, but he wasn't sure. Well, I'm sure.
Luckily for us, Perl has alternatives. These range from quick and dirty to full blown. Here's the litany covered with examples below:
- split
- eval'ing Perl code
- Config::Auto
- Parse::RecDescent
Since we already have a language we like (that's Perl for those who haven't been paying attention), interpreting is limited to small languages that do something for us. Usually these turn out to be configuration files, so I will focus on those. (See the builder section above if a tree can represent your data file.)
Splitting
The easiest route involves split. Suppose I have a config file which
uses variable=value settings. Comments and blank lines should be ignored;
all other lines should hold a variable=value pair. That's easy:
sub parse_config {
    my $file = shift;
    my %answer;
    open my $config, '<', $file
        or die "Couldn't read config file $file: $!\n";
    while (<$config>) {
        next if (/^#|^\s*$/);  # skip blanks and comments
        chomp;                 # keep the newline out of the value
        my ($variable, $value) = split /=/, $_, 2;  # split on the first '=' only
        $answer{$variable} = $value;
    }
    close $config;
    return %answer;
}
This subroutine expects a config file name. It opens and reads that file.
Inside the magic while loop the regex rejects lines which start with '#'
and those which contain only whitespace. All other lines are split on '='.
The variables become keys in the %answer hash. When all the lines are read,
the caller gets the hash back.
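Here is a sketch of the routine in use. The copy of parse_config below is my own variation: it chomps each line and splits on the first '=' only, so values carry no trailing newline and may themselves contain '='. The temporary file and its settings are made up for the example.

```perl
#!/usr/bin/perl
use strict; use warnings;
use File::Temp qw(tempfile);

# Variation on parse_config: chomp each line, split on the first '=' only.
sub parse_config {
    my $file = shift;
    my %answer;
    open my $config, '<', $file
        or die "Couldn't read config file $file: $!\n";
    while (<$config>) {
        next if /^#|^\s*$/;    # skip blanks and comments
        chomp;
        my ($variable, $value) = split /=/, $_, 2;
        $answer{$variable} = $value;
    }
    close $config;
    return %answer;
}

# Write a throwaway config file (hypothetical settings).
my ($out, $filename) = tempfile(UNLINK => 1);
print $out "# database settings\n\nname=projectdb\nquery=a=b\n";
close $out;

my %config = parse_config($filename);
print "$_ => $config{$_}\n" for sort keys %config;
```

The comment and the blank line are skipped, leaving two keys: name holds projectdb and query holds a=b.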
You could go much further along these lines, but see below for those who've
gone before you (see especially Config::Auto).
Evaluating Perl Code
My current favorite way to bring configuration information into a Perl program is to specify the config file in Perl. So, I might have a config file like this:
our $db_name = "projectdb";
our $db_pass = "my_special_password_no_one_will_think_of";
our %personal = (
    name    => "Phil Crow",
    address => 'philcrow2000@yahoo.com',  # single quotes: "@yahoo" would interpolate
);
To use this in a Perl program all I have to do is eval it:
...
open CONFIG, "config.txt" or die "couldn't...\n";
my $config = join "", <CONFIG>;
close CONFIG;
eval $config;
die "Couldn't eval your config: $@\n" if $@;
...
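To read the file, I open it, then use join to put the angle read operator
in list context. This lets me bring the whole file into a scalar. Once
it's in (and the file is closed for tidiness), I just eval the string I read.
I need to check $@ to make sure the file was good Perl. After that, the
values are ordinary package variables, and I can use them almost as if they
had appeared in the program originally. The one catch is that an our
declaration inside the eval'd string ends with the eval, so under use strict
the program needs its own our declaration (or fully qualified names such as
$main::db_name).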
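A minimal sketch of the whole round trip follows. To keep it self-contained, a here-document stands in for the contents of the config file, and the program declares the variables it expects with our so that strict accepts the unqualified names. Which variables to declare is an assumption about what the config sets.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Declare the package variables the config is expected to set, so that
# strict allows their unqualified names after the eval.
our ($db_name, %personal);

# Stand-in for the text read from the config file.
my $config = <<'END_CONFIG';
our $db_name = "projectdb";
our %personal = (
    name => "Phil Crow",
);
END_CONFIG

eval $config;
die "Couldn't eval your config: $@\n" if $@;

print "database: $db_name\n";
print "owner:    $personal{name}\n";
```

The eval'd text assigns to $main::db_name and %main::personal, which are the same variables the outer our declaration names.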

