Perl Design Patterns, Part 3

Aug 15, 2003 by Phil Crow

This is the third (and final) article in a series which form one Perl programmer’s response to the book Design Patterns (also known as the Gang of Four book or simply as GoF, because four authors wrote it). As I showed in the second article, Perl provides the types needed to implement many patterns. The Strategy and Template Method patterns can be implemented with code references. Builder usually builds a structure based on references to some combination of hashes and lists. Interpreters can be implemented with simple tools like split or with the king: Parse::RecDescent, which brings the best of yacc into your Perl script (albeit with somewhat less efficiency than yacc).

This article continues my treatment by considering patterns which rely on objects. As such, this article’s patterns bears the most resemblance to the GoF book. Before presenting some patterns, I’ll give you my two cents about object applicability.

When Are Objects Good?

As Larry Wall reminds us about all programming constructs, you should use objects when they make sense and not when they don’t. So when do they make sense? This is partly a matter of taste. This subsection gives you my tastes.

It’s easier to say when objects are bad, which they are in these cases:

More on Perl Design Patterns:

• Perl Design Patterns, Part 1
• Perl Design Patterns, Part 2

There is only data, the methods are either trivial or non-existent. Data containers (also called nodes) are like this. For example, I should not need an object to return three numbers and a string to my caller.
There are only methods. The Java Math class is like this. It won’t even let you make a Math object. Clearly its methods should just be built-in functions of the language.

Seeing the poor uses of objects gives insight into their effective use. Use objects when complexity is high and data is tightly coupled to the methods which act on it. High complexity makes these chief advantages of objects more important: separate namespaces, inheritance, and polymorphism.

Now that I’ve spoken my peace, I’ll go on to the patterns which use objects.

Abstract Factory

If you want to build platform independent programs, you need a way to access the underlying systems without having to recode for each one’s API. This is where a factory comes into play. The source code asks for an instance of a class, the class delivers a subclass instance suitable for use on the current platform. That class is called an abstract factory (or simply a factory). As we will see below, the platform might be a database. So the factory would return an object suitable for use with a particular database, but all the objects would have the same API.

To show the basic idea, here is an example which delivers one of two types. There are four code files in this example. The first two are the greeters.

package Greet::Repeat;

sub new {
    my $class    = shift;
    my $self     = {
        greeting => shift,
        repeat   => shift,
    };
    return bless $self, $class;
}

sub greet {
    my $self = shift;
    print ($self->{greeting} x $self->{repeat});
}

1;

This greeter’s constructor expects a greeting and a repeat count. It stores these in a hash, returning a blessed reference to it. When asked to greet, it prints the greeting repeatedly (hence the name). (I didn’t say this example was practical, but it is small.)

    package Greet::Stamp;
    use strict; use warnings;

    sub new {
        my $class    = shift;
        my $greeting = shift;
        return bless \$greeting, $class;
    }

    sub greet {
        my $greeting = shift;
        my $stamp    = localtime();
        print "$stamp $$greeting";
    }

    1;

This greeter only expects a greeting string, so it blesses a reference to the one it receives. When asked to greet, it prints the current time followed by the greeting.

Here’s the factory:

    package GreetFactory;
    use strict; use warnings;

    sub instantiate {
        my $self           = shift;
        my $requested_type = shift;
        my $location       = "Greet/$requested_type.pm";
        my $class          = "Greet::$requested_type";

        require $location;

        return $class->new(@_);
    }

    1;

A Perl factory looks a lot like factories in other languages. This one has only one method. It returns the requested type to the caller. It uses the caller’s requested type as the name of the class to instantiate and as the name of the Perl module in which the class lives.

Finally, you can use this factory with a script like this:

    #!/usr/bin/perl
    use strict; use warnings;

    use GreetFactory;

    my $greeter_n = GreetFactory->instantiate("Repeat", "Hello\n", 3);
    $greeter_n->greet();

    my $greeter_stamp = GreetFactory->instantiate("Stamp", "Good-bye\n");
    $greeter_stamp->greet();

To make each greeter, call the instantiate method of GreetFactory, passing it the name of the class you want and any arguments that class’s constructor is expecting.

This example shows you the basic idea. It is simple on purpose. But it does show how the factory can be ignorant of the underlying classes. Any new greeter added to the system must have a name of the form Greet::Name and be placed into a Greet subdirectory of an @INC path member as Name.pm. Then callers can use it without changing the factory. Now that you have seen a simple example, here is a more useful one.

The Perl DBI (DataBase Interface) provides an excellent example of a factory. Each call to DBI-connect>, expects the type of database and whatever information that database needs to establish a connection. This is a classic factory. It will load any DBD (DataBase Driver) you have installed on your system, upon request. Additional DBD’s can be added at any time. Once they are installed, any client can use them through the same DBI API. Here’s an example use of DBI:

    use DBI;
    my $dbh      =
        DBI->connect("dbi:mysql:mydb:localhost", "user", "password");
    ...
    my $sth      = $dbh->prepare('select * from table');
    ...

Once the database handle is obtained (which is usually called $dbh), it can be used almost without regard to the underlying engine. If you later move to Oracle, you would merely change the connect call. If a new database comes on the scene, some smart person in contact with Tim Bunce will implement a class for it. You can install and switch to it as soon as they finish their work. You might even be the implementer, but I doubt I will be.

Composite

This pattern shows how to use the fully OO composite pattern. If you are interested in a simpler non-OO implementation see the Builder Pattern in Part 2 of this article series.

Many applications require hierarchies of related items linked into a tree by relationship. Many people see a hierarchy of this type: a directory structure. At the top is the root directory. In the simplest case it includes two types of items: files and subdirectories. Each subdirectory is like the root directory. Note that this definition of the structure is recursive, which is typical of composites.

One of the most popular examples of a composite structure today is an XML file. These files have a root element which contains various types of subelements, including tags and comments. Tags have attributes and some can contain subelements. This makes the classic composite tree. There are two important steps for a composite structure. The first is building it. The second is using it. We’ll see simple examples of both here.

For the genuine pattern, there must be methods that act on both regular and composite elements (the elements with children are called composite elements). Invoking such a method on the root of a composite tree, or subtree, causes that root to do work on its own data AND to forward the request to its children. Those children do the same, collecting their own data and that of their children, until the bottom of the tree is reached. The return value is a collection of all this data.

For a practical example consider using the DOM model to process XML. (You may obtain the XML::DOM module from CPAN.) To find all the paragraphs in a document we could do something like this:

    use XML::DOM;
    my $parser = XML::DOM::Parser->new();
    my $doc    = $parser->parsefile("file.xml");
    foreach my $paragraph ($doc->getElementsByTagName("paragraph")) {
        print "<p>";
        foreach my $child ($paragraph->getChildNodes) {
            print $child->getNodeValue if ($child->getNodeType eq TEXT_NODE);
        }
    }
    $doc->dispose();

The call to getElementsByTagName begins at the root (since I called it through $doc). The root returns any of its children which are paragraphs, but it also forwards the request to all of its tag elements asking them to return their paragraphs. They do the same.

An unrelated note: Notice that the above example ends with a call to dispose. XML::DOM composite structures have references from parents to children and from children to parents. We usually call these circular links. Perl 5 garbage collection cannot harvest such structures. We must call dispose to break the extra links so the structure’s memory can be recovered. If you build structures with circular links, you must break those links yourself, otherwise your program will leak memory.

We’ve seen how useful a well crafted composite structure can be, but how could we build one for ourselves? The objects in the structure must all respond to the methods meant to walk the composite. They may return undef immediately, but they must exist. Further, the version of those methods in the composite objects (the ones which can have children), must take care to pass the message along to their children.

To make this concrete, consider a non-binary tree (as we have been all along). Suppose we want to know how many nodes are in the tree. We can ask the root to count_nodes. It should count itself and add that to the sum of count_nodes calls to each child. Nodes which are not composite (i.e. have no children) return one. Composite nodes, return one plus the sums from their children. The code follows.

There are four pieces of code: (1) A base class for tree nodes: Node.pm, (2) A class for nodes that could have children: Composite.pm, (3) A class for nodes that can’t have children: Regular.pm, and (4) a driver to demonstrate that the system works: comp. I’ll show these one at a time, in the order listed above.

    package Node;
    use strict; use warnings;

    sub count_nodes {
        my $self       = shift;
        my $class_name = ref $self;
        die "$class_name does not implement count_nodes\n";
    }

    1;

The only method here is count_nodes. This serves as an implementation requirement (also called an abstract method). Attempting to use a Node subclass which doesn’t provide count_nodes results in a fatal run-time error. Every subclass should have an appropriate test to make sure this error never happens to users.

    package Regular;

    use Node;
    @ISA = qw(Node);

    use strict; use warnings;

    sub new {
        my $class = shift;
        my $name  = shift;
        return bless \$name, $class;
    }

    sub count_nodes {
        return 1;
    }

    1;

Regular nodes are blessed references to their names. They always count as a single node. (An unrelated note: it is sometimes convenient to turn on strict after the preamble of a package, here that let’s me use @ISA without qualifying it.)

    package Composite;

    use Node;
    @ISA = qw(Node);

    use strict; use warnings;

    sub new {
        my $class = shift;
        my $name  = shift;
        my $self  = { name => $name, children => [] };
        return bless $self, $class;
    }

    sub add_child {
        my $self      = shift;
        my $new_child = shift;

        push @{$self->{children}}, $new_child;
        return $new_child;
    }

    sub count_nodes {
        my $self  = shift;
        my $count = 1;

        foreach my $child (@{$self->{children}}) {
            $count += $child->count_nodes();
        }
        return $count;
    }

    1;

This class is similar to Regular, but it needs a way to keep track of children. Since it also keeps its name, I used a hash for the object type. New children are just pushed onto a list. Counting includes one for the parent node, plus the total for each child. Since leaves of the tree also implement count_nodes, we can process all Node types together. This is the polymorphism advantage of objects and the heart of the Composite Pattern.

    #!/usr/bin/perl
    use strict; use warnings;

    use Composite;
    use Regular;

    my $root     = Composite->new("Root");

    my $eldest   = $root->add_child(Composite->new("Jim"));
    my $middle   = $root->add_child(Composite->new("Jane"));
                   $root->add_child(Regular->new("Bob"));
    my $youngest = $root->add_child(Composite->new("Joe"));

                   $eldest->add_child(Regular->new("JII"));
    my $kayla    = $eldest->add_child(Composite->new("Kayla"));
                   $kayla->add_child(Regular->new("Max"));

    my $count = $root->count_nodes();

    print "count: $count\n";

This contrived example manually builds a simple tree, then asks for a node count. The correct answer is 8.

Proxy

In GoF the proxy pattern example shows a way to delay loading expensive components until the user actually wants them. In the course of the example they show a genuine proxy. Proxies refer all requests to some other object. Think of it like an intermediary for the mob. You make your request to your local thug, as if he could do the thing himself. He passes that on to someone else you never meet who actually does the job. (Note to John Ashcroft: I am only imaging this process, having NO personal experience with it. Honest.)

Suppose an application could use several large files, but usually only needs one or two. Instead of reading all these files, I will delay loading the file until the caller wants to see it. The usual warning applies: this is contrived to explain the concept.

Here is the class that actually stores and prints the files:

    package File;
    use strict; use warnings;

    sub new {
        my $class = shift;
        my $file  = shift;
        open FILE, "$file" or die "Couldn't read $file: $!\n";
        my @data  = <FILE>;
        close FILE;
        return bless \@data, $class;
    }

    sub print_file {
        my $data = shift;
        print @$data;
    }

    sub DESTROY { }

    1;

When the File constructor is called, it reads the file into an array for later use, returning a blessed reference to the data to the caller. When asked to print, it sends the data to the currently selected output handle (usually standard out).

The DESTROY subroutine is called by Perl whenever a blessed reference is about to go out of scope. This allows us to perform clean-up which is guaranteed to happen. In this case, there is no necessary clean-up, but the approach I’m about to show for the proxy class ends up calling this method explicitly. That explicit call offends Perl so much that it complains to the screen. To avoid the warning, I included the stub.

There is nothing special about the File class shown above. The proxy follows.

    package FileProxy;
    use strict; use warnings;

    use File;

    sub new {
        my $class = shift;
        my $self  = {
            params         => \@_,
            wrapped_object => undef,
        };
        return bless $self, $class;
    }

    sub AUTOLOAD {
        my $self    = shift;
        my $command = our $AUTOLOAD;
        $command    =~ s/.*://;

        unless (defined $self->{wrapped_object}) {
            $self->{wrapped_object} = File->new(@{$self->{params}});
        }
        $self->{wrapped_object}->$command(@_);
    }

    1;

The constructor for the proxy takes the things necessary to build an actual File object (namely the file name) and stores them as its params attribute. The other attribute will eventually hold the wrapped File object. The attributes are stored in hash, the hash’s reference is blessed and returned to the caller.

Whenever Perl has no where else to go with a method call, it calls AUTOLOAD (if there is one). So, the AUTOLOAD in FileProxy handles all requests except new and DESTROY, which appear explicitly. AUTOLOAD is all caps to remind us that Perl calls it for us. While making this call, Perl sets the package global variable $AUTOLOAD to the name of the method the caller invoked. The regular expression strips off the package names from $AUTOLOAD, leaving only the method name.

If the object is not yet defined, AUTOLOAD calls File->new passing it the arguments stored during construction. After that, the object is defined, so AUTOLOAD calls the requested method on the wrapped object. The beauty of this mechanism is that the FileProxy class only knows that the constructor is called new. It does not need to change as changes to File.pm are made. Any errors, such as no such method, will be fatal as usual.

To use this proxied scheme we might employ a caller like this:

    #!/usr/bin/perl
    use strict; use warnings;

    use FileProxy;

    my $file1 = FileProxy->new("art1");
    my $file2 = FileProxy->new("art2");

    $file1->print_file();
    $file1->print_file();
    $file2->print_file();

With a couple of changes we could use this for any class. Here’s the new generic version:

    package DelayLoad;
    use strict; use warnings;

    our %proxied_classes;

    sub import {
        shift;  # discard class name

        %proxied_classes = @_;

        foreach my $class (keys %proxied_classes) {
            require "$class.pm";
        }
    }

    sub new {
        my $class = shift;
        my $self  = {
            type           => shift,
            constructor    => shift,
            params         => \@_,
            wrapped_object => undef,
        };
        return bless $self, $class;
    }

    sub AUTOLOAD {
        my $self    = shift;
        my $command = our $AUTOLOAD;
        $command    =~ s/.*://;

        if ($proxied_classes{$command}) {
            return $self->new($command, $proxied_classes{$command}, @_);
        }
        else {
            unless (defined $self->{wrapped_object}) {
                my $proxied_class       = $self->{type};
                my $constructor         = $self->{constructor};
                $self->{wrapped_object} = $proxied_class
                                        ->$constructor(@{$self->{params}});
            }
            $self->{wrapped_object}->$command(@_);
        }
    }

    1;

The first change is cosmetic: the name now reflects the nature of the proxy. Other changes include a new method: import. Even though its name is lower case, Perl calls it whenever the caller says use DelayLoad (see below). It does two things. First, it stores the name of each proxied class in the %proxied_classes package global. Second, it requires each module. require is like use, but it happens at run time instead of compile time. (use also imports symbols, but then your object oriented module shouldn’t be exporting anything anyway.)

The constructor now stores a bit more information. In addition to saving room for the wrapped object and storing the params, it also records the name of the class and of that class’s constructor. These will be used in AUTOLOAD.

The only other changes are in the AUTOLOAD method. There are two changes. The easiest one is to look up the class and constructor names in the DelayLoad object instead of just calling File->new.

The other change is used during construction. My explanation of it will make more sense, if you see the new caller first.

The new version requires a couple of changes to the caller. One change is on the use line which becomes:

    use DelayLoad "File" => "new";

This uses DelayLoad, tells it we want to be able to delay loads for File objects, and that File’s constructor is called new.

The other change is in how we construct the delayed object:

    my $file1 = DelayLoad->File("art1");
    my $file2 = DelayLoad->File("art2");

This explains the unexplained piece in AUTOLOAD above. When the user calls the File method, AUTOLOAD notices that this ``method’’ is really the name of a delay loaded class. When the if in AUTOLOAD is true (i.e. the method is really a key in %proxied_classes), the caller is given a new DelayLoad object primed for later use. When the if fails, DelayLoad works like FileLoad: it constructs the object, if needed and calls the requested method.

The fundamental point of this example is that Perl allows us to implement proxies without knowing very much about the underlying class. In this case, import receives the necessary information from the caller, AUTOLOAD takes care of the rest. Making the caller work is not always a good idea. Here it makes sense. If she knows she wants to delay loading objects until they are really needed, she must at least know the API for those objects. In the API is the name of the constructor, which she mentions in the use statement so Perl can pass to DelayLoad::import for her.

Keep in mind that AUTOLOAD is not designed for this sort of work. Its real purpose in life is to load subroutines on demand for the current package. It can’t do that here, since changing the subroutines affects all instances of a class. Here we are AUTOLOADing data, not routines. By suitably adjusting import and AUTOLOAD, you can make the proxy do many other things.

Summary

In this article, I have finally shown object oriented patterns. We saw how to implement a Factory so our callers can choose their favorite driver, how to build composite structures and routines that traverse them (without explicit first_child and next pointers that would be needed in languages without quality built-in lists), and how to stand as a proxy between a caller and a class with import and AUTOLOAD.