
Perl Design Patterns, Part 2
by Phil Crow

Config::Auto -- For Those Who Can't be Bothered

If you're too lazy to write your own config handler, or if you have lots of configs outside your control, Config::Auto may be for you. Basically, it takes a file and guesses how to turn it into a config hash. (It can even guess the name of your config file.) Using it is easy (if it works):


    #!/usr/bin/perl
    use strict; use warnings;

    use Config::Auto;

    my $config = Config::Auto::parse("your.config");
    ...

What ends up in $config depends on what your config file looks like (shock). For files which use variable=value pairs, you get what you expect, which is exactly what the first example above generates for the same input. It is possible to specify a config file that Config::Auto cannot understand (shock and amazement).
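To get a feel for what the variable=value case produces, here is a minimal core-Perl sketch of such a parser. It is not Config::Auto's code. The helper name parse_pairs and the rules it follows (skip blank lines and # comments, split on the first =, trim the spaces around it) are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical helper, not part of Config::Auto: turn the lines of a
# variable=value config into a reference to a hash.
sub parse_pairs {
    my ($text) = @_;
    my %config;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(?:#|$)/;         # skip comments and blank lines
        my ($key, $value) = split /\s*=\s*/, $line, 2;
        next unless defined $value;             # ignore lines without an '='
        $key   =~ s/^\s+//;                     # trim leading space on the key
        $value =~ s/\s+$//;                     # trim trailing space on the value
        $config{$key} = $value;
    }
    return \%config;
}

my $config = parse_pairs("# sample\nname = demo\nport=8080\n");
print "$_ => $config->{$_}\n" for sort keys %$config;
```

Splitting with a limit of 2 keeps any later = signs inside the value, so a line like "url = a=b" yields the value "a=b".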

Real Hackers Use Parse::RecDescent

If the file you need to parse is complex, consider Parse::RecDescent. It implements a clever top-down (recursive-descent, hence the name) parser scheme. To use it, you specify a grammar. (You remember grammars, don't you? If not, see below.) It builds a parser from your grammar. You feed text to the parser. It does whatever the grammar specifies in its actions.
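Parse::RecDescent writes the parser for you, but the top-down idea itself fits in a few lines of plain Perl: one subroutine per grammar rule, each removing what it matches from the front of the input and returning a value, or undef on failure. The sketch below uses a made-up grammar (a sum is a number followed by any number of "+ number" pairs); the grammar and the sub names Number, Sum, and parse_sum are hypothetical and have nothing to do with the module's internals.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Informal grammar, for illustration only:
#   Sum    : Number, then zero or more of ( '+' Number )
#   Number : one or more digits
# Each rule takes a reference to the remaining input, removes what it
# matches, and returns a value (undef on failure).

sub Number {
    my ($input) = @_;
    return $$input =~ s/^\s*(\d+)// ? $1 : undef;
}

sub Sum {
    my ($input) = @_;
    my $total = Number($input);
    return undef unless defined $total;
    while ($$input =~ s/^\s*\+//) {     # each '+' must be followed by a Number
        my $next = Number($input);
        return undef unless defined $next;
        $total += $next;
    }
    return $total;
}

sub parse_sum {
    my ($text) = @_;
    my $value = Sum(\$text);
    # like /\Z/ in a grammar: succeed only if all input was consumed
    return (defined $value && $text =~ /^\s*$/) ? $value : undef;
}

print parse_sum("1 + 22 + 300"), "\n";    # prints 323
```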

To give you a feel for how this works, I'll parse small Roman numerals. The program below takes numbers from the keyboard and translates them from Roman numerals to decimal integers, so XXIX becomes 29.


    #!/usr/bin/perl
    use strict; use warnings;

    use Parse::RecDescent;

    my $grammar = q{
        Numeral : TenList FiveList OneList /\Z/
                    { $item[1] + $item[2] + $item[3]; }
                | /quit/i { exit(0); }
                | <error>

        TenList : Ten(3)                  { 30            }
                | Ten(2) OptionalNine     { 20 + $item[2] }
                | Ten OptionalNine        { 10 + $item[2] }
                | OptionalNine            { $item[1]      }

        OptionalNine : One Ten { 9 }
                     |         { 0 }

        FiveList : One Five { 4 }
                 | Five     { 5 }
                 |          { 0 }

        OneList : /(I{0,3})/i { length $1 }

        Ten : /X/i

        Five : /V/i

        One : /I/i
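    };

    my $parser = Parse::RecDescent->new($grammar)
        or die "Invalid grammar\n";

    while (<>) {
        chomp;
        my $value = $parser->Numeral($_);
        # Numeral returns undef when the line is not a valid numeral
        print defined $value ? "value: $value\n" : "no value\n";
    }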

As you can see, $grammar takes up most of the space in this program. The rest is pretty simple. Once I receive the parser from the Parse::RecDescent constructor, I just call its Numeral method once for each line of input.

So what does the grammar mean? Let's start at the top. Grammars are built from rules. The rule for a Numeral (the Roman kind) says:


    A Numeral takes the form of one of these choices
        a TenList then a FiveList then a OneList then the end of the string
        OR
        the word quit in any case (not a Numeral, but a way to quit)
        OR
        anything else, which is an error

We'll see what TenList and its friends are shortly. The code after the first choice is called an action. If the rule matches a possibility, it performs that possibility's action. So if a valid Numeral is seen, the action is executed. This particular action adds up the values TenList, FiveList, and OneList have accumulated. The items are numbered starting with 1, so TenList's value is in $item[1], and so on ($item[0] holds the name of the rule itself).

How does TenList get a value? Well, when Numeral starts matching, it looks first for a valid TenList. There are four choices:


    A TenList takes the form of one of these choices
        three Tens
        OR
        two Tens then an OptionalNine
        OR
        a Ten then an OptionalNine
        OR
        an OptionalNine

These choices are tried in order. A Ten is simply an upper- or lower-case X (see the Ten rule). The result of an action is the result of its last statement. So, if there are three tens, the TenList returns 30. If there are two tens, it returns 20 plus whatever OptionalNine returned.

The Roman numeral IX is our 9. I call this an OptionalNine. (The names are completely arbitrary.) So after zero, one, or two X's, there can be an IX which adds 9 to the total. If there is no IX, the OptionalNine will match the empty rule. That consumes no text from the input and returns zero according to its action.
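To see what the generated parser is doing, here is a hand-written core-Perl translation of the same grammar: one subroutine per rule, choices tried in order, and the empty choices of OptionalNine and FiveList returning 0 without consuming anything. It is a sketch of the idea, not the code Parse::RecDescent actually generates; the sub names mirror the rule names, token is a hypothetical helper, and unlike the module this version does not skip whitespace between tokens.

```perl
#!/usr/bin/perl
use strict; use warnings;

# $rest holds the input not yet consumed. Each rule returns its value
# (undef for failure) and restores $rest before trying its next choice.
my $rest = '';

sub token {                                        # one letter, either case
    my ($letter) = @_;
    return $rest =~ s/^\Q$letter\E//i ? 1 : undef;
}

sub Ten  { return token('X') }
sub Five { return token('V') }
sub One  { return token('I') }

sub OptionalNine {
    my $saved = $rest;
    return 9 if One() && Ten();                    # One Ten { 9 }
    $rest = $saved;                                # empty choice { 0 }
    return 0;
}

sub TenList {
    my $saved = $rest;
    return 30 if Ten() && Ten() && Ten();          # Ten(3)
    $rest = $saved;
    return 20 + OptionalNine() if Ten() && Ten();  # Ten(2) OptionalNine
    $rest = $saved;
    return 10 + OptionalNine() if Ten();           # Ten OptionalNine
    $rest = $saved;
    return OptionalNine();                         # OptionalNine
}

sub FiveList {
    my $saved = $rest;
    return 4 if One() && Five();                   # One Five { 4 }
    $rest = $saved;
    return 5 if Five();                            # Five { 5 }
    return 0;                                      # empty choice { 0 }
}

sub OneList {
    $rest =~ s/^(I{0,3})//i;                       # always matches, maybe empty
    return length $1;
}

sub Numeral {
    ($rest) = @_;
    my $tens  = TenList();                         # $item[1] in the grammar
    my $fives = FiveList();                        # $item[2]
    my $ones  = OneList();                         # $item[3]
    return undef if length $rest;                  # /\Z/: input must be used up
    return $tens + $fives + $ones;
}

for my $numeral (qw(XXIX xiv XXXVIII XXXIX)) {
    my $value = Numeral($numeral);
    print "$numeral: ", (defined $value ? $value : 'invalid'), "\n";
}
```

Note that once TenList has returned, Numeral never asks it to try a different choice, which is why XXXIX is rejected: TenList takes XXX, OneList takes the I, and the leftover X fails the end-of-string check.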

Roman numerals are a lot more complex than my little grammar can handle. For starters, by my calendar, we're now in the year MMIII. There are no M's in my grammar. Further, some Romans thought that IIIIII was perfectly valid. In my grammar three is the limit for all repetitions, and only I and X can repeat. Further, reductions can only take one away. So, IIX is not eight, it's invalid. This grammar can recognize any normalized Roman numeral up to 38. Feel free to expand it.
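One way to cover more of the number system, sketched here with a different technique rather than a grammar, is a table-driven converter that also knows M, D, C, and L and applies the common rule that a symbol smaller than the one after it is subtracted. The name roman_to_int is hypothetical. Unlike the grammar, it is a lenient decoder, not a validator: IIII comes out as 4, and malformed strings such as IIX still produce a number.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Value of each Roman digit.
my %digit = (I => 1, V => 5, X => 10, L => 50, C => 100, D => 500, M => 1000);

# Hypothetical lenient converter: returns undef for empty input or
# unknown letters, but does not check that the numeral is normalized.
sub roman_to_int {
    my ($numeral) = @_;
    my @values = map { $digit{$_} } split //, uc $numeral;
    return undef if !@values || grep { !defined } @values;
    my $total = 0;
    for my $i (0 .. $#values) {
        my $next = $i < $#values ? $values[$i + 1] : 0;
        # subtract a digit that is smaller than the one after it (IV, XC)
        $total += $values[$i] < $next ? -$values[$i] : $values[$i];
    }
    return $total;
}

print roman_to_int('MMIII'), "\n";    # prints 2003
```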

Parse::RecDescent is not as fast as a yacc-generated parser, but it is easier to use. See the documentation in the distribution for more information, especially the tutorial which originally appeared in The Perl Journal.

If you look at what's inside the parser (say with Data::Dumper), you might think this actually implements the Interpreter pattern. After all, it makes a tree of objects from the grammar. Look closer and you will see the key difference. All of the objects in the tree are members of classes like Parse::RecDescent::Action, which Damian Conway wrote along with the rest of the module. In the GoF Interpreter pattern, we are expected to build a class for each non-terminal in the grammar (for the grammar above, those classes would be Numeral, TenList, OptionalNine, and so on). Thus, in the GoF version the tree node types are different for each grammar.

This difference has two implications: (1) it makes the RecDescent parser generator simpler, and (2) it makes its result faster.
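For contrast, here is a minimal core-Perl sketch of the GoF shape: one small class per non-terminal of the Roman numeral grammar, each with an interpret method. The class names simply reuse the rule names and are hypothetical; no parser is included, so the tree for XXIX is assembled by hand where a real Interpreter would build it from text.

```perl
#!/usr/bin/perl
use strict; use warnings;

package Numeral;        # Numeral : TenList FiveList OneList
sub new { my ($class, @parts) = @_; return bless { parts => [@parts] }, $class }
sub interpret {
    my ($self) = @_;
    my $sum = 0;
    $sum += $_->interpret for @{ $self->{parts} };   # add up the children
    return $sum;
}

package TenList;        # a count of X's plus an OptionalNine
sub new { my ($class, $tens, $nine) = @_; return bless { tens => $tens, nine => $nine }, $class }
sub interpret { my ($self) = @_; return 10 * $self->{tens} + $self->{nine}->interpret }

package OptionalNine;   # IX or nothing
sub new { my ($class, $present) = @_; return bless { present => $present }, $class }
sub interpret { my ($self) = @_; return $self->{present} ? 9 : 0 }

package FiveList;       # IV, V, or nothing: stores 4, 5, or 0
sub new { my ($class, $value) = @_; return bless { value => $value }, $class }
sub interpret { my ($self) = @_; return $self->{value} }

package OneList;        # zero to three I's
sub new { my ($class, $count) = @_; return bless { count => $count }, $class }
sub interpret { my ($self) = @_; return $self->{count} }

package main;

# The tree a parser might build for XXIX, assembled by hand.
my $tree = Numeral->new(
    TenList->new(2, OptionalNine->new(1)),
    FiveList->new(0),
    OneList->new(0),
);
print $tree->interpret, "\n";    # prints 29
```

Every new grammar would need its own set of such classes, which is exactly the work Parse::RecDescent avoids by reusing its fixed node classes.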

Summary

In this installment we have seen how to use code references to implement the Strategy and Template Method patterns. We even saw how to force our code into someone else's class. Builder turns text into an internal structure, which most Interpreters also do. Those structures can often be simple combinations of hashes, lists, and scalars. If what you need to read is simpler, use split or Config::Auto. If it is more complex, use Parse::RecDescent. If that won't do it fast enough, you might need one of the yaccs.

Next time I will look at patterns which actually rely on objects.