Perl Design Patterns, Part 2

This is the second in a series of articles which form one Perl programmer's response to the book, Design Patterns (also known as the Gang of Four book or simply as GoF, because four authors wrote it).

As I showed in the first article, Perl provides the best patterns in its core and many others are modules which ship with Perl or are available from CPAN. There I considered Iterator (foreach), Decorator (pipes and list filters), Flyweight (Memoize.pm), and Singleton (bless an object in a BEGIN block).

People into patterns often talk about how knowing patterns makes describing designs easier. The parenthetical comments in the last sentence show how Perl takes this to new heights by including the patterns internally.

This article continues my treatment by considering patterns which rely on data containers and/or code references (which are also called callbacks). Before showing the patterns let me explain these terms.

Data Containers

I use data containers to mean any reference that holds a data structure. Arrays and hashes are common data containers, but hashes of lists of hashes storing things are more interesting. Careful use of these structure containers can often eliminate the need for objects.

Here's a concrete example. Suppose I want a phone list. I might use a container like this:


    my $phone_list = {
        'phil' => [
            { type => 'home',  number => '555-0001' },
            { type => 'pager', number => '555-1000' },
        ],
        'frank' => [
            { type => 'cell',  number => '555-9012' },
            { type => 'pager', number => '555-5678' },
            { type => 'home',  number => '555-1234' },
        ],
    };

This container is housed in a hash. Its keys are names; its values are phone numbers. The numbers are listed in the order the person would like them used. (Call Frank on his cell phone first, then try his pager. If all else fails, use his home phone.)

To use this structure I might do the following:


    #!/usr/bin/perl
    use strict; use warnings;

    my $phone_list = {
        'Phil' => [
            { type => 'home',  number => '555-0001' },
            { type => 'pager', number => '555-1000' },
        ],
        'Frank' => [
            { type => 'cell',  number => '555-9012' },
            { type => 'pager', number => '555-5678' },
            { type => 'home',  number => '555-1234' },
        ],
    };

    my $person = shift or die "usage: $0 person\n";

    foreach my $number (@{$phone_list->{$person}}) {
        print "$number->{type} $number->{number}\n";
    }

Here, the user supplies the name of the person he or she wants to reach as a command line argument which I store as $person. I then loop through all the phone numbers for that person, printing the type and number.

Of course, in practice your data would live outside your script. The example just shows what one data container can hold.

If you need to use a structure made of data nodes, you can often avoid the need for a node object by using a data container instead. Object Oriented programming proponents would probably want me to make an object for each person. In that object they might even want me to store an object for each phone type in some cumbersome list container. My advice: don't give in to pedants. Even in Java, I can build a structure like the one above (though not as easily). Doing so is often wise. Objects work better in more complex situations.

What's a Code Reference?

A code reference is like any reference in Perl, but what it points to is a subroutine you can call. For instance, I could write:


    my $doubler = sub { return 2 * $_[0]; };

Then later in my program I would call that routine as:


    my $doubled = &$doubler(5);  # $doubled is now 10

This example is contrived. But it lets you see the basic syntax of code references. If you assign a sub to a variable, you receive a code reference by the grace of Perl. To call the sub stored in the reference put an & in front of the variable which stores it. This is like we do for other references, as in this standard hash walker:


    foreach my $key (keys %$hash_reference) { ... }

The & is the sigil (or funny character) for subroutines, just like @ and % are the sigils for arrays and hashes.

Many patterns in GoF and outside it can be implemented well in Perl with code references. Languages which don't provide code references are missing an important type.

Having explained these tools, I'm ready to show you some patterns which use them.

Strategy

When you want to select from a series of choices for how something should be done, you need a strategy scheme. For example, you might want to sort based on a comparison function. Each time you sort, you should be able to specify the order strategy.

Since Perl has code references, we can easily implement the strategy pattern without bloating our code base with a proliferation of classes whose sole purpose is to provide one function.

Here's an example with the built-in sort:


    sort { lc($a) cmp lc($b) } @items

This sorts without regard to case. Notice how sort is receiving the function directly in the call. Though we could do this for our own functions, it is more common to take a reference to the function as a required positional parameter.

Suppose, for example, that we want to list all files in the current directory, or any of its subdirectories, with some property. There are two pieces to this task: (1) Scan down the directory tree for all the entries, and (2) Test each file to see if it meets the criterion. Ideally we would like to separate these tasks so we can reuse them independently (for instance scanning a directory tree is more common than any particular criterion). We will make the criterion a strategy executed by the directory scanner.


    #!/usr/bin/perl
    use strict; use warnings;

    my @files = find_files(\&is_hidden, ".");
    local $" = "\n";
    print "@files\n";

    sub is_hidden {
        my $file = shift;
        $file    =~ s!.*/!!;
        return 0 if ($file =~ /^\./);
        return 1;
    }

    sub find_files {
        my $callback = shift;
        my $path     = shift;
        my @retval;

        push   @retval, $path if &$callback($path);
        return @retval unless (-d $path);

        # identify children
        opendir DIR, $path or return;
        my @files = readdir DIR;
        closedir DIR;

        # visit each child
        foreach my $file (@files) {
            next if ($file =~ /^\.\.?$/);  # skip . and ..
            push @retval, find_files("$path/$file", $callback);
        }

        return @retval;
    }

To understand this example, start with the initial call to find_files. It passes two arguments. The first is a code reference. Note the syntax. As I pointed out in the introduction, to let Perl know I mean a subroutine, I put the & sigil in front of is_hidden. To make a reference to that routine (instead of calling it immediately), I put a backslash in front, just as I would to take any other kind of reference.

When I use the callback in find_files, $callback has the reference to the code. To dereference it I put the & sigil in front of it.

The find_files subroutine takes a path where the search begins and a code reference called $callback. At each invocation, it stores the path in the return list, if callback returns true for that path. This allows you to reuse find_files for many applications, changing only the callback subroutine to change the outcome. This is the strategy pattern, but without the hassle of subclassing the find_files abstract base class and overriding the criterion method.

In find_files, I use recursion to descend the directory tree and its subtrees. First, I call the callback to see if the current path should go into the output. Then the real routine begins. What the callback does makes no difference to this routine. Any true or false value is OK with find_files.

The recursion stops if the file is not a directory. At that point the list is immediately returned. (It could be empty or have the current path in it, depending on the callback's return value.) Otherwise, all the files and subdirectories in the current path are read into @files. Each of those entries is scanned by the recursive call to find_files (unless the file is . or .., which would create endless recursion). Whatever the recursive call to find_files returns, it is pushed onto the end of the final output. When all children have been visited, @result is returned to the caller.

The CPAN module File::Find robustly solves the problem approached quickly in my example above. It relies on exactly this kind of function callback.

The Strategy Pattern uses a callback to perform a single task that varies from use to use. The next pattern uses a series of callbacks to implement the steps of an algorithm.

Template Method

In some calculations the steps are known, but what the steps do is not. For example, computing charges for a rental might involve three steps:

  1. Calculate amount due from rates.
  2. Calculate taxes.
  3. Add these together.

Yet different rentals might have different schemes for calculating the amount due from rates, and different jurisdictions usually have different tax schemes. A template method can implement the outline, deferring to callers for the individual schemes.


    package Calc;
    use strict; use warnings;

    sub calculate {
        my $class     = shift;   # discarded
        my $data      = shift;
        my $rate_calc = shift;   # a code ref
        my $tax_calc  = shift;   # also a code ref

        my $rate      = &$rate_calc($data);
        my $taxes     = &$tax_calc($data, $rate);
        my $answer    = $rate + $taxes;
    }

Here the caller supplies a data reference (probably to a hash or object) together with two code references which are used as callbacks. Each callback must expect the data reference as its first parameter. The tax_calc code reference also receives the amount due from the rate calculator. This allows it to use a percentage of the amount together with information in the data reference.

A caller might look like this:


    #!/usr/bin/perl
    use strict; use warnings;

    use Calc;

    my $rental = {
        days_used    => 5,
        day_rate     => 19.95,
        tax_rate     => .13,
    };

    my $amount_owed = Calc->calculate($rental, \&rate_calc, \&taxes);
    print "You owe $amount_owed\n";

    sub rate_calc {
        my $data = shift;
        return $data->{days_used} * $data->{day_rate};
    }

    sub taxes {
        my $data     = shift;  # discarded
        my $subtotal = shift;

        return $data->{tax_rate} * $subtotal;
    }

I made this contrived caller so you can see the calling sequence. The data here is a simple hash. To save exporting from Calc, I made calculate a class method, so I call it through its class. In the call, I pass a reference to my data hash and references to the two calculation routines.

This can be made more complex if you like. One could even make a full-blown class hierarchy of calculators, allowing callers to select the one they want. This example is about as simple as I could make the template method pattern.

Another approach to templates is to have the caller place methods in the template package. This approach amounts to an implementation of mixins a la Ruby. Here's a sample that is more object oriented.


    package Calc;

    sub calculate {
        my $self = shift;
        my $rate = $self->calculate_rate();
        my $tax  = $self->calculate_tax($rate);
        return $rate + $tax;
    }

    1;

The whole module is really only the template method. To use it, you have to code calculate_rate and calculate_tax methods, or your script will die. Here's a particular implementation of the scheme:


    package CalcDaily;
    package Calc;
    use strict; use warnings;

    sub new {
        my $class = shift;
        my $self  = {
            days_used    => shift,
            day_rate     => shift,
            tax_rate     => shift,
        };
        return bless $self, $class;
    }

    sub calculate_rate {
        my $data = shift;
        return $data->{days_used} * $data->{day_rate};
    }

    sub calculate_tax {
        my $data     = shift;  # discarded
        my $subtotal = shift;

        return $data->{tax_rate} * $subtotal;
    }

    1;

Note that I added a constructor and two methods to the Calc package in a different source file. This is perfectly legal and occasionally useful. By doing this, the template is totally isolated. It doesn't even know what sort of data will be stored in the objects of its own type. That does mean that only one Calc subtype can be used at a time. If that's a problem for you, do the standard thing: have Calc call methods on objects in some separate hierarchy.

There are two package statements at the top of the file, this is on purpose. The first one tells people (and crawlers) that this is the CalcDaily package which rightfully belongs in CalcDaily.pm, not the original Calc, which belongs in Calc.pm.

Finally, here's the caller, which is only slightly modified:


    #!/usr/bin/perl
    use strict; use warnings;
    use Calc;
    use CalcDaily;

    my $rental      = Calc->new(5, 19.95, .13);
    my $amount_owed = $rental->calculate();
    print "You owe $amount_owed\n";

This technique is similar to the one used in the debugger architecture for Perl. To make my own debugger, I need a name for it. I might choose PhilDebug.pm. Then I have to make a file with that name in a Devel directory which is in my @INC list. The first line in the file should be (but doesn't have to be):


    package Devel::PhilDebug;

This allows the CPAN indexer to properly catalog my module.

The base package for debuggers is fixed as DB. Perl expects to call the DB function in that package. So all together it might look something like this:


    package Devel::PhilDebug;
    package DB;

    sub DB {
        my @info = caller(0);
        print "@info\n";
    }

    1;

Any script will use this debugger if it is invoked as:


    perl -d:PhilDebug script

Each time the debugger notices that a new statement is about to start, it first calls DB::DB. This is a very powerful example of plug-and-play.

It is not usually wise to pollute foreign classes with your own code. Yet, Perl permits this, because it is sometimes highly useful. There seems to be a theme here:

Don't rule out dangerous things. Just avoid them, unless you have a good reason to use them.

The Strategy and Template patterns use code references to allow the caller to adjust the behavior of an algorithm. The template I showed used a data container to hold rental information. The next pattern makes more use of data containers.

Builder

Many structures external to your program should be represented with composites (like trees or the data container in the introduction) inside your program. There are two fundamentally different ways to represent these structures. For an object-oriented way to compose such structures see the Composite Pattern in GoF (which I will discuss in my next article).

Here we'll look at how to build a composite structure in a hash of hashes. You might rather build the objected-oriented version. Which you choose should depend on the complexity of the data and the methods to act on it. If data and methods are simple, you should probably use the hash structure. It will be faster, have built-in support, and be more familiar to Perl programmers who might need to maintain your code. If the complexities are large, you should use full-blown objects. They make the structure easier to understand for object-oriented programmers and provide more code-based documentation than simple hashes.

So, hashes are superior structures for simple to moderately complex data. To see how to build a hash structure consider an example: visualizing an outline. For simplicity, I'll represent the outline purely through indentation (not with Roman or other numerals). Here's an example outline:


    Grocery Store
        Milk
        Juice
        Butcher
            Thin sliced ham
            Chuck roast
        Cheese
    Cleaners
    Home Center
        Door
        Lock
        Shims

This outline describes a theoretical shopping trip. I want to represent it internally in my program so I can play with it. (One of my favorite games is turning outlines into pictures, see below.)

Instead of a full-blown object, I'll use a little hash-based data container for each node in the tree. Each node will keep track of three things:

  1. Name
  2. Level
  3. Children (a list of other nodes)

To keep track of who is a child of whom, I'll use a stack of these nodes. The node on the top of the stack is usually the parent of the next line of input. To show my method, I'll intersperse comments with the script. At the bottom of this section the script appears in one piece.


    #!/usr/bin/perl
    use strict; use warnings;

These lines are always a good idea.


    my $root = {
        name     => "ROOT",
        level    => -1,
        children => [],
    };

This is the root node. It's a hash reference containing the three keys mentioned earlier. The root node is special. Since it isn't in the file, I give it an artificial name and a level that is lower than anyone else's. (In a moment, we will see that levels in the input will be zero or positive.) Initially the list of children is empty.


    my @stack;
    push @stack, $root;

The stack will keep track of the ancestry of each new node. For starters it needs the root node, which won't ever be popped, because it is an ancestor of all the nodes.


    while (<>) {
        /^(\s*)(.*)/;
        my $indentation = length $1 if defined ($1);
        my $name        = $2;

To read the file, I chose a magic while. For each line there will be two parts: the indentation (the leading spaces) and the name (the rest of the line). The regular expression captures any leading space into $1 and everything else (except the new line) into $2. The length of the indentation is the important part, the bigger this is the more ancestors the node has. Lines starting at the margin have an indentation of 0 (which is why the ROOT has a level of -1).


        while ($indentation <= $stack[-1]{level}) {
            pop @stack;
        }

This loop handles ancestry. It pops the stack, until the node on top of the stack is the parent of the new node. Think of an example. When Home Center comes along, Cleaners and ROOT are on the stack. Home Center's level is 0 (it's at the margin), so is Cleaners'. Thus, Cleaners is popped (since 0 <= 0). Then only ROOT remains, so popping stops (0 is not <= -1).


        my $node = {
            name     => $name,
            level    => $indentation,
            children => [],
        };

This builds a new node for the current line. It's name and level are set. We haven't seen any children yet, but I make room for them in an empty list.


        push @{$stack[-1]{children}}, $node;

This line adds the new node to its parent's list of children. Remember that the parent is sitting on top of the stack. The top of the stack is $stack[-1] or the last element in the array.


        push @stack, $node;
    }

This pushes the new node onto the stack, in case it has children. The closing brace ends the magic while loop. For simplicity, I chose to display the output with Data::Dumper:


    use Data::Dumper; print Dumper($root);

Running this shows the tree (sideways) on standard out.

Here's the whole code without interruption:


    #!/usr/bin/perl
    use strict; use warnings;

    my $root = {
        name     => "ROOT",
        level    => -1,
        children => [],
    };

    my @stack;
    push @stack, $root;

    while (<>) {
        /^(\s*)(.*)/;
        my $indentation = length $1;
        my $name        = $2;
        while ($indentation <= $stack[-1]{level}) {
            pop @stack;
        }
        my $node = {
            name     => $name,
            level    => $indentation,
            children => [],
        };
        push @{$stack[-1]{children}}, $node;
        push @stack, $node;
    }

    use Data::Dumper; print Dumper($root);

I promised to explain how structures like the one above can be turned into pictures. The CPAN module UML::Sequence builds a structure similar to the one shown here. It then uses that to generate a UML Sequence diagram of the steps in SVG (Scalable Vector Graphics) format. That format can be converted with standard tools like Batik to PNG or JPEG. In practice the outlines which I turn into pictures represent call sequences for programs. Perl can even generate the outline by running the program. See UML::Sequence for more details.

When you have some interesting structured input, a builder might help make a good internal structure. One high value builder is XML::DOM. Another with a slightly different approach is XML::Twig. It is not coincidental that XML parsers are really builders, as XML files are non-binary trees.

Interpreter

If you haven't looked in GoF yet, start with the interpreter pattern. Laughter is good for the soul. The person who taught me patterns in Java did not even know why this pattern would not work in practice. He had heard it was somewhat slow, but he wasn't sure. Well I'm sure.

Luckily for us, Perl has alternatives. These range from quick and dirty to full blown. Here's the litany covered with examples below:

  • split
  • eval'ing Perl code
  • Config::Auto
  • Parse::RecDescent

Since we already have a language we like (that's Perl for those who haven't been paying attention), interpreting is limited to small languages that do something for us. Usually these turn out to be configuration files, so I will focus on those. (See the builder section above if a tree can represent your data file.)

Splitting

The easiest route involves split. Suppose I have a config file which uses variable=value settings. Comments and blanks should be ignored, all other lines should have a variable, value pair. That's easy:


    sub parse_config {
        my $file = shift;
        my %answer;

        open CONFIG, "$file" or die "Couldn't read config file $file: $!\n";
        while (<CONFIG>) {
            next if (/^#|^\s*$/);  # skip blanks and comments
            my ($variable, $value) = split /=/;
            $answer{$variable} = $value;
        }
        close CONFIG;

        return %answer;
    }

This subroutine expects a config file name. It opens and reads that file. Inside the magic while loop the regex rejects lines which start with '#' and those which contain only whitespace. All other lines are split on '='. The variables become keys in the %answer hash. When all the lines are read, the caller gets the hash back.

You could go much further along these lines, but see below for those who've gone before you (see especially Config::Auto).

Evaluating Perl Code

My current favorite way to bring configuration information into a Perl program is to specify the config file in Perl. So, I might have a config file like this:


    our $db_name = "projectdb";
    our $db_pass = "my_special_password_no_one_will_think_of";
    our %personal = (
        name    => "Phil Crow",
        address => "philcrow2000@yahoo.com",
    );

To use this in a Perl program all I have to do is eval it:


    ...
    open CONFIG, "config.txt" or die "couldn't...\n";
    my $config = join "", <CONFIG>;
    close CONFIG;

    eval $config;
    die "Couldn't eval your config: $@\n" if $@;
    ...

To read the file, I open it, then use join to put the angle read operator in list context. This lets me bring the whole file into a scalar. Once it's in (and the file is closed for tidiness), I just eval the string I read. I need to check $@ to make sure the file was good Perl. After that, I'm ready to use the values just as if they appeared in the program originally.

Config::Auto -- For Those Who Can't be Bothered

If you're too lazy to write your own config handler, or if you have lots of configs outside your control, Config::Auto may be for you. Basically, it takes a file and guesses how to turn it into a config hash. (It can even guess the name of your config file). Using it is easy (if it works):


    #!/usr/bin/perl
    use strict; use warnings;

    use Config::Auto;

    my $config = Config::Auto::parse("your.config");
    ...

What ends up in $config depends on what your config file looks like (shock). For files which use variable=value pairs, you get what you expect, which is exactly what the first example above generates for the same input. It is possible to specify a config file that Config::Auto cannot understand (shock and amazement).

Real Hackers Use Parse::RecDescent

If the file you need to parse is complex, consider Parse::RecDescent. It implements a clever top/down parser scheme. To use it, you specify a grammar. (You remember grammars, don't you? If not, see below.) It builds a parser from your grammar. You feed text to the parser. It does whatever the grammar specifies in its actions.

To give you a feel for how this works, I'll parse small Roman numerals. The program below takes numbers from the keyboard and translates them from Roman numerals to decimal integers, so XXIX becomes 29.


    #!/usr/bin/perl
    use strict; use warnings;

    use Parse::RecDescent;

    my $grammar = q{
        Numeral : TenList FiveList OneList /\Z/
                    { $item[1] + $item[2] + $item[3]; }
                | /quit/i { exit(0); }
                | <error>

        TenList : Ten(3)                  { 30            }
                | Ten(2) OptionalNine     { 20 + $item[2] }
                | Ten OptionalNine        { 10 + $item[2] }
                | OptionalNine            { $item[1]      }

        OptionalNine : One Ten { 9 }
                     |         { 0 }

        FiveList : One Five { 4 }
                 | Five     { 5 }
                 |          { 0 }

        OneList : /(I{0,3})/i { length $1 }

        Ten : /X/i

        Five : /V/i

        One : /I/i
};

my $parse = new Parse::RecDescent($grammar);

while (<>) { chomp; my $value = $parse->Numeral($_); print ``value: $value\n''; }

As you can see $grammar takes up most of the space in this program. The rest is pretty simple. Once I receive the parser from the Parse::RecDescent constructor, I just call its Numeral method repeatedly.

So what does the grammar mean? Let's start at the top. Grammars are built from rules. The rule for a Numeral (the Roman kind) says:


    A Numeral takes the form of one of these choices
        a TenList then a FiveList then a OneList then the end of the string
        OR
        the word quit in any case (not a Numeral, but a way to quit)
        OR
        anything else, which is an error

We'll see what TenList and its friends are shortly. The code after the first choice is called an action. If the rule matches a possibility, it performs that possibility's action. So if a valid Numeral is seen, the action is executed. This particular action adds up the values TenList, FiveList, and OneList have accumulated. The items are numbered starting with 1, so TenList's value is in $item[1], etc.

How does TenList get a value? Well, when Numeral starts matching, it looks first for a valid TenList. There are four choices:


    A TenList takes the form of one of these choices
        three Tens
        OR
        two Tens then an OptionalNine
        OR
        a Ten then an OptionalNine
        OR
        an OptionalNine

These choices are tried in order. A Ten is simply an upper- or lower-case X (see the Ten rule). The result of an action is the result of its last statement. So, if there are three tens, the TenList returns 30. If there are two tens, it returns 20 plus whatever OptionalNine returned.

The Roman numeral IX is our 9. I call this an OptionalNine. (The names are completely arbitrary.) So after zero, one, or two X's, there can be an IX which adds 9 to the total. If there is no IX, the OptionalNine will match the empty rule. That consumes no text from the input and returns zero according to its action.

Roman numerals are a lot more complex than my little grammar can handle. For starters, by my calendar, we're now in the year MMIII. There are no M's in my grammar. Further, some Romans thought that IIIIII was perfectly valid. In my grammar three is the limit for all repetitions, and only I and X can repeat. Further, reductions can only take one away. So, IIX is not eight, it's invalid. This grammar can recognize any normalized Roman numeral up to 38. Feel free to expand it.

Parse::RecDescent is not as fast as a yacc-generated parser, but it is easier to use. See the documentation in the distribution for more information, especially the tutorial which originally appeared in The Perl Journal.

If you look at what's inside the parser (say with Data::Dumper) you might think this actually implements the interpreter pattern. After all, it makes a tree of objects from the grammar. Look closer and you will see the key difference. All of the objects in the tree are members of classes of like Parse::RecDescent::Action, which were written by Damian Conway when he wrote the module. In the GoF interpreter pattern we are expected to build a class for each non-terminal in the grammar (above those classes would be Numeral, ReducedTen, etc.). Thus, the tree node types are different for each grammar.

This difference has two implications: (1) it makes the RecDescent parser generator simpler and (2) it's result faster.

Summary

In this installment we have seen how to use code references to implement the Strategy and Template Method patterns. We even saw how to force our code into someone else's class. Builder turns text into an internal structure, which most Interpreters also do. Those structures can often be simple combinations of hashes, lists, and scalars. If what you need to read is simpler, use split or Config::Auto. If it is more complex, use Parse::RecDescent. If that won't do it fast enough, you might need one of the yaccs.

Next time I will look at patterns which actually rely on objects.

Visit the home of the Perl programming language: Perl.org

Sponsored by

Monthly Archives

Powered by Movable Type 5.13-en