Perl Design Patterns, Part 2
by Phil Crow
Config::Auto -- For Those Who Can't be Bothered
If you're too lazy to write your own config handler, or if you have lots
of configs outside your control, Config::Auto may be for you. Basically,
it takes a file and guesses how to turn it into a config hash. (It
can even guess the name of your config file.) Using it is easy (if it works):
#!/usr/bin/perl
use strict; use warnings;
use Config::Auto;
my $config = Config::Auto::parse("your.config");
...
What ends up in $config depends on what your config file looks like (shock).
For files which use variable=value pairs, you get what you expect, which
is exactly what the first example above generates for the same input. It
is possible to specify a config file that Config::Auto cannot understand
(shock and amazement).
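To make "what you expect" concrete, here is a minimal core-Perl sketch of the kind of hash a variable=value file suggests. It does not call Config::Auto; the config text and its keys (name, retries) are invented for illustration, and the hash Config::Auto builds may differ in detail.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical config text; these keys are not from any real file.
my $text = "# sample settings\nname = demo\nretries=3\n";

my %config;
for my $line (split /\n/, $text) {
    next if $line =~ /^\s*(?:#|$)/;                  # skip comments and blank lines
    my ($key, $value) = split /\s*=\s*/, $line, 2;   # split on the first = only
    $config{$key} = $value;
}

print "$_ => $config{$_}\n" for sort keys %config;
```

Each variable becomes a hash key and each value a plain scalar, which is the shape the rest of a program usually wants from a simple config file.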
Real Hackers Use Parse::RecDescent
If the file you need to parse is complex, consider Parse::RecDescent.
It implements a clever top-down parser scheme. To use it, you specify
a grammar. (You remember grammars, don't you? If not, see below.) It
builds a parser from your grammar. You feed text to the parser. It does
whatever the grammar specifies in its actions.
To give you a feel for how this works, I'll parse small Roman numerals. The program below takes numbers from the keyboard and translates them from Roman numerals to decimal integers, so XXIX becomes 29.
#!/usr/bin/perl
use strict; use warnings;
use Parse::RecDescent;
my $grammar = q{
    Numeral      : TenList FiveList OneList /\Z/
                     { $item[1] + $item[2] + $item[3]; }
                 | /quit/i { exit(0); }
                 | <error>

    TenList      : Ten(3)              { 30 }
                 | Ten(2) OptionalNine { 20 + $item[2] }
                 | Ten OptionalNine    { 10 + $item[2] }
                 | OptionalNine        { $item[1] }

    OptionalNine : One Ten { 9 }
                 |         { 0 }

    FiveList     : One Five { 4 }
                 | Five     { 5 }
                 |          { 0 }

    OneList      : /(I{0,3})/i { length $1 }

    Ten          : /X/i
    Five         : /V/i
    One          : /I/i
};
my $parse = Parse::RecDescent->new($grammar);
while (<>) {
    chomp;
    my $value = $parse->Numeral($_);
    # A failed parse returns undef; <error> has already reported it.
    print "value: $value\n" if defined $value;
}
As you can see, $grammar takes up most of the space in this program.
The rest is pretty simple. Once I receive the parser from the
Parse::RecDescent constructor, I just call its Numeral method repeatedly.
So what does the grammar mean? Let's start at the top. Grammars are built from rules. The rule for a Numeral (the Roman kind) says:
A Numeral takes the form of one of these choices
    a TenList then a FiveList then a OneList then the end of the string
  OR
    the word quit in any case (not a Numeral, but a way to quit)
  OR
    anything else, which is an error
We'll see what TenList and its friends are shortly. The code after
the first choice is called an action. If the rule matches a possibility,
it performs that possibility's action. So if a valid Numeral is seen,
the action is executed. This particular action adds up the values
TenList, FiveList, and OneList have accumulated. The items are numbered
starting with 1, so TenList's value is in $item[1], etc.
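Here is a minimal sketch of how @item lines up with the pieces of a rule. The Sum grammar is invented for illustration and is much smaller than the Roman numeral one; it assumes Parse::RecDescent is installed.

```perl
#!/usr/bin/perl
use strict; use warnings;
use Parse::RecDescent;

# A deliberately tiny grammar, made up for this sketch.
my $grammar = q{
    Sum    : Number '+' Number /\Z/
               { $item[1] + $item[3] }
    Number : /\d+/
};

my $parser = Parse::RecDescent->new($grammar);

# $item[1] is the first Number, $item[2] the '+', $item[3] the second Number.
my $sum = $parser->Sum("2 + 40");
print defined($sum) ? "sum: $sum\n" : "no match\n";
```

The rule's value is whatever its action returns, so the caller receives the sum rather than the matched text.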
How does TenList get a value? Well, when Numeral starts matching, it looks first for a valid TenList. There are four choices:
A TenList takes the form of one of these choices
    three Tens
  OR
    two Tens then an OptionalNine
  OR
    a Ten then an OptionalNine
  OR
    an OptionalNine
These choices are tried in order. A Ten is simply an upper- or lower-case X (see the Ten rule). The result of an action is the result of its last statement. So, if there are three tens, the TenList returns 30. If there are two tens, it returns 20 plus whatever OptionalNine returned.
The Roman numeral IX is our 9. I call this an OptionalNine. (The names are completely arbitrary.) So after zero, one, or two X's, there can be an IX which adds 9 to the total. If there is no IX, the OptionalNine will match the empty rule. That consumes no text from the input and returns zero according to its action.
Roman numerals are a lot more complex than my little grammar can handle. For starters, by my calendar, we're now in the year MMIII. There are no M's in my grammar. Further, some Romans thought that IIIIII was perfectly valid. In my grammar three is the limit for all repetitions, and only I and X can repeat. Further, reductions can only take one away. So, IIX is not eight, it's invalid. This grammar can recognize any normalized Roman numeral up to 38. Feel free to expand it.
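To see what the rules do step by step, here is a hand-written sketch that mirrors the grammar with one sub per rule. It is not the code Parse::RecDescent generates, the sub names are my own, and unlike the generated parser it does not skip whitespace.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Each sub takes a reference to the remaining text, removes what it
# matches from the front, and returns that rule's value.
sub optional_nine { my ($t) = @_; return $$t =~ s/\AIX//i ? 9 : 0 }

sub ten_list {
    my ($t) = @_;
    return 30                     if $$t =~ s/\AXXX//i;   # Ten(3)
    return 20 + optional_nine($t) if $$t =~ s/\AXX//i;    # Ten(2) OptionalNine
    return 10 + optional_nine($t) if $$t =~ s/\AX//i;     # Ten OptionalNine
    return optional_nine($t);                             # OptionalNine
}

sub five_list {
    my ($t) = @_;
    return 4 if $$t =~ s/\AIV//i;   # One Five
    return 5 if $$t =~ s/\AV//i;    # Five
    return 0;                       # empty choice
}

sub one_list { my ($t) = @_; $$t =~ s/\A(I{0,3})//i; return length $1 }

sub numeral {
    my ($text) = @_;
    my $tens  = ten_list(\$text);
    my $fives = five_list(\$text);
    my $ones  = one_list(\$text);
    return undef if length($text);  # leftover text: not a valid Numeral
    return $tens + $fives + $ones;
}

for my $roman (qw(XXIX XXXVIII IIX)) {
    my $value = numeral($roman);
    print "$roman: ", defined($value) ? $value : "invalid", "\n";
}
```

Like the grammar, this never goes back to retry an earlier rule once it has succeeded, which is why XXXIX is rejected: the three X's are consumed first and the trailing IX can no longer be read as a nine.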
Parse::RecDescent is not as fast as a yacc-generated parser, but it is
easier to use. See the documentation in the distribution for more
information, especially the tutorial which originally appeared in The
Perl Journal.
If you look at what's inside the parser (say with Data::Dumper) you might
think this actually implements the interpreter pattern. After all, it
makes a tree of objects from the grammar. Look closer and you will see
the key difference. All of the objects in the tree are members of
classes like Parse::RecDescent::Action, which were written by Damian
Conway when he wrote the module. In the GoF interpreter pattern we are
expected to build a class for each non-terminal in the grammar (above
those classes would be Numeral, TenList, etc.). Thus, in the GoF
pattern the tree node types are different for each grammar, while
Parse::RecDescent uses the same node classes for every grammar.
This difference has two implications: (1) it makes the RecDescent parser
generator simpler and (2) its result faster.
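For contrast, here is a rough sketch of what the GoF style would ask for: one hand-written class per non-terminal, each with its own interpret method. The constructors, field names, and the hand-built tree are all hypothetical; a real implementation would build the tree while parsing.

```perl
#!/usr/bin/perl
use strict; use warnings;

package OptionalNine;    # one class per non-terminal
sub new       { my ($class, %arg) = @_; return bless {%arg}, $class }
sub interpret { my ($self) = @_; return $self->{present} ? 9 : 0 }

package TenList;
sub new       { my ($class, %arg) = @_; return bless {%arg}, $class }
sub interpret {
    my ($self) = @_;
    return 10 * $self->{tens} + $self->{nine}->interpret;
}

package main;

# A tree a GoF-style builder might produce for "XXIX" (built by hand here).
my $tree = TenList->new(
    tens => 2,
    nine => OptionalNine->new(present => 1),
);
print $tree->interpret, "\n";
```

Every new grammar would need its own set of classes like these, which is the work Parse::RecDescent avoids by reusing its own node classes.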
Summary
In this installment we have seen how to use code references to implement
the Strategy and Template Method patterns. We even saw how to force
our code into someone else's class. Builder turns text into an internal
structure, which most Interpreters also do. Those
structures can often be simple combinations of hashes, lists, and scalars.
If what you need to read is simpler, use split or Config::Auto.
If it is more complex, use Parse::RecDescent. If that won't do it fast
enough, you might need one of the yaccs.
Next time I will look at patterns which actually rely on objects.

