Perl Design Patterns

Introduction

In 1995, Design Patterns was published, and during the intervening years, it has had a great influence on how many developers write software. In this series of articles, I present my take on how the Design Patterns book (the so-called Gang of Four book, which I will call GoF) and its philosophy applies to Perl. While Perl is an OO language -- you could code the examples from GoF directly in Perl -- many of the problems the GoF is trying to solve are better solved in Perl-specific ways, using techniques not open to Java developers or those C++ developers who insist on using only objects. Even if developers in other languages are willing to consider procedural approaches, they can't, for instance, use Perl's immensely powerful built-in pattern support.

Though these articles are self-contained, you will get more out of them if you are familiar with the GoF book (or better yet have it open on your desk while you read). If you don't have the book, then try searching the Web - many people talk about these patterns. Since the Web and the book have diagrams of the object versions of the patterns, I will not reproduce those here, but can direct you to this fine site.

I will show you how to implement the highest value patterns in Perl, most often by using Perl's rich core language. I even include some objects.

For the object-oriented implementations, I need you to understand the basics of Perl objects. You can learn that from printed sources like the Perl Cookbook by Tom Christiansen and Nat Torkington or Objected Oriented Perl by Damian Conway. But the simplest way to learn the basics is from perldoc perltoot.

As motivation for my approach, let me start with a little object-oriented philosophy. Here are my two principles of objects:

  1. Objects are good when data and methods are tightly bound.
  2. In most other cases, objects are overkill.

Let me elaborate briefly on these principles.

Objects are good when data and methods are tightly bound.

When you are working for a company that rents cars (as I do), an object to represent a rental agreement makes sense. The data on the agreement is tightly bound to the methods you need to perform. To calculate the amount owed, you take the various rates and add them together, etc. This is a good use of an object (or actually several aggregated objects).

In most other cases, objects are overkill.

Consider a few examples from other languages. Java has the java.lang.Math class. It provides things such as sine and cosine. It only provides class methods and a couple of class constants. This should not be forced into an object-oriented framework, since there are no Math objects. Rather the functions should be put in the core, left out completely, or made into non-object-oriented functions. The last option is not even available in Java.

Or think of the C++ standard template library. The whole templating framework is needed to make C++ backward compatible with C and to handle strong static-type checking. This makes for awkward object-oriented constructs for things that should be simple parts of the core language. To be specific, why shouldn't the language just have a better array type at the outset? Then a few well-named built-in operations take care of stacks, queues, dequeues and many other structures we learned in school.

So, in particular, I take exception to one consistent GoF trick: turning an idea into a full-blown class of objects. I prefer the Perl way of incorporating the most-important concepts into the core of the language. Since I prefer this Perl way, I won't be showing how to objectify things that could more easily be a simple hash with no methods or a simple function with no class. I will invert the GoF trick: implement full-blown pattern classes with simpler Perl concepts.

The patterns in this first article rely primarily on built-in features of Perl. Later articles will address other groups of patterns. Now that I've told you what I'm about to do, let's start.

Iterator

There are many structures that you need to walk one element at a time. These include simple things such as arrays, moderate things such as the keys of a hash, and complex things such as the nodes of a tree.

The Gang of Four suggest solving this problem with the above mentioned trick: turn a concept into an object. Here that means you should make an iterator object. Each class of objects that can reasonably be walked should have a method that returns an iterator object. The object itself always behaves in a uniform way. For example, consider the following code, which uses an iterator to walk the keys of a hash in Java.


    for (Iterator iter = hash.keySet().iterator(); iter.hasNext();) {
        Object key   = iter.next();
        Object value = hash.get(key);
        System.out.println(key + "\t" + value);
    }

The HashMap object has something that can be walked: its keys. You can ask it for this keySet. That Set will give you an Iterator on request to its iterator method. The Iterator responds to hasNext with a true value if there are more things to be walked, and false otherwise. Its next method delivers the next object in whatever sequence the Iterator is managing. With that key, the HashMap delivers the next value in response to get(key). This is neat and tidy in the completely OO framework of a language with limited operators and built-in types. It also perfectly exhibits the GoF iterator pattern.

In Perl any built-in or user defined object which can be walked has a method which returns an ordered list of the items to be walked. To walk the list, simply place it inside the parentheses of a foreach loop. So the Perl version of the above hash key walker is:


    foreach my $key (keys %hash) {
        print "$key\t$hash{$key}\n";
    }

I could implement the pattern exactly as it is diagrammed in GoF, but Perl provides a better way. In Perl 6, it will even be possible to return a list that expands lazily, so the above will be more efficient than it is now. In Perl 5, the keys list is built completely when I call keys. In the future, the keys list will be built on demand, saving memory in most cases, and time in cases where the loop ends early.

The inclusion of iteration as a core concept represents Perl design at its finest. Instead of providing a clumsy mechanism in non-core code, as Java and C++ (through its standard template library) do, Perl incorporates this pattern into the core of the language. As I alluded to in the introduction, there is a Perl principle here:

If a pattern is really valuable, then it should be part of the core language.

The above example is from the core of the language. To see that foreach fully implements the iterator pattern, even for user-defined modules, consider an example from CPAN: XML::DOM. The DOM for XML was specified by Java programmers. One of the methods you can call on a DOM Document is getElementsByTagName. In the DOM specification this returns a NodeList, which is a Java Collection. Thus, the NodeList works like the Set in the Java code above. You must ask it for an Iterator, then walk the Iterator.

When Perl people implemented the DOM, they decided that getElementsByTagName would return a proper Perl list. To walk the list one says something like:


    foreach my $element ($doc->getElementsByTagName("tag")) {
        # ... process the element ...
    }

This stands in stark contrast to the overly verbose Java version:


    NodeList elements = doc.getElementsByTagName("tag");
    for (Iterator iter = elements.iterator(); iter.hasNext();) {
        Element element = (Element)iter.next();
        // ... process the element ...
    }

One beauty of Perl is its ability to combine procedural, object-oriented, and core concepts in such powerful ways. The facts that GoF suggests implementing a pattern with objects and that object only languages like Java require it do not mean that Perl programmers should ignore the non-object features of Perl.

Perl succeeds largely by excellent use of the principle of promotion. Essential patterns are integrated into the core of the language. Useful things are implemented in modules. Useless things are usually missing.

So the iterator pattern from GoF is a core part of Perl we hardly think about. The next pattern might actually require us to do some work.

Decorator

In normal operation, a decorator wraps an object, responding to the same API as the wrapped object. For example, suppose I add a compressing decorator to a file writing object. The caller passes a file writer to the decorator's constructor, and calls write on the decorator. The decorator's write method first compresses the data, then calls the write method of the file writer it wraps. Any other type of writer could be wrapped with the same decorator, so long as all writers respond to the same API. Other decorators can also be used in a chain. The text could be converted from ASCII to unicode by one decorator and compressed by another. The order of the decorators is important.

In Perl, I can do this with objects, but I can also use a couple of language features to obtain most of the decorations I need, sometimes relying solely on built-in syntax.

I/O is the most common use of decoration. Perl provides I/O decoration directly. Consider the above example: compressing while writing. Here are two ways to do this.

Use the Shell and Its Tools

When I open a file for writing in Perl, I can decorate via shell tools. Here is the above example in code:


    open FILE, "| gzip > output.gz"
        or die "Couldn't open gzip and/or output.gz: $!\n";

Now everything I write is passed through gzip on its way to output.gz. This works great so long as (1) you are willing to use the shell, which sometimes raises security issues; and (2) the shell has a tool to do what you need done. There is also an efficiency concern here. The operating system will spawn a new process for the gzip step. Process creation is about the slowest thing the OS can do without performing I/O.

Tying

If you need more control over what happens to your data, then you can decorate it yourself with Perl's tie mechanism. It will be even faster, easier to use, and more powerful in Perl 6, but it works in Perl 5. It does work within Perl's OO framework; see perltie for more information.

Suppose I want to preface each line of output on a handle with a time stamp. Here's a tied class to do it.


    package AddStamp;
    use strict; use warnings;

    sub TIEHANDLE {
        my $class  = shift;
        my $handle = shift;
        return bless \$handle, $class;
    }

    sub PRINT {
        my $handle = shift;
        my $stamp  = localtime();

        print $handle "$stamp ", @_;
    }

    sub CLOSE {
        my $self = shift;
        close $self;
    }

    1;

This class is minimal, in real life you need more code to make the decorator more robust and complete. For example, the above code does not check to make sure the handle is writable nor does it provide PRINTF, so calls to printf will fail. Feel free to fill in the details. (Again, see perldoc perltie for more information.)

Here's what these pieces do. The constructor for a tied file handle class is called TIEHANDLE. Its name is fixed and uppercase, because Perl calls this for you. This is a class method, so the first argument is the class name. The other argument is an open output handle. The constructor merely blesses a reference to this handle and returns that reference.

The PRINT method receives the object constructed in TIEHANDLE plus all the arguments supplied to print. It calculates the time stamp and sends that together with the original arguments to the handle using the real print function. This is typical decoration at work. The decorating object responds to print just like a regular handle would. It does a little work, then calls the same method on the wrapped object.

The CLOSE method closes the handle. I could have inherited from Tie::StdHandle to gain this method and many more like it.

Once I put AddTimeStamp.pm in my lib path, I can use it like this:


    #!/usr/bin/perl
    use strict; use warnings;

    use AddStamp;

    open LOG, ">output.tmp" or die "Couldn't write output.tmp: $!\n";
    tie *STAMPED_LOG, "AddStamp", *LOG;

    while (<>) {
        print STAMPED_LOG;
    }

    close STAMPED_LOG;

After opening the file for writing as usual, I use the built-in tie function to bind the LOG handle to the AddStamp class under the name STAMPED_LOG. After that, I refer exclusively to STAMPED_LOG.

If there are other tied decorators, then I can pass the tied handle to them. The only downside is that Perl 5 ties are slower than normal operations. Yet, in my experience, disks and networks are my bottlenecks so in memory inefficiency like this tends not to matter. Even if I make the script code execute 90 percent faster, I don't save a noticeable amount of time, because it wasn't using much time in the first place.

This technique works for many of the built-in types: scalars, arrays, hashes, as well as file handles. perltie explains how to tie each of those.

Ties are great since they don't require the caller to understand the magic you are employing behind their back. That is also true of GoF decorators with one clear exception: In Perl, you can change the behavior of built-in types.

Decorating Lists

One of the most common tasks in Perl is to transform a list in some way. Perhaps you need to skip all entries in the list that start with underscore. Perhaps you need to sort or reverse the list. Many built-in functions are list filters. They take a list, do something to it and return a resultant list. This is similar to Unix filters, which expect lines of data on standard input, which they manipulate in some way, before sending the result to standard output. Just as in Unix, Perl list filters can be chained together. For example, suppose you want a list of all subdirectories of the current directory in reverse alphabetical order. Here's one possible solution.


 1  #!/usr/bin/perl
 2  use strict; use warnings;
 3
 4  opendir DIR, ".",
 5      or die "Can't read this directory, how did you get here?\n";
 6  my @files = reverse sort map { -d $_ ? $_ : () } readdir DIR;
 7  closedir DIR;
 8  print "@files\n";

Perl 6 will introduce a more meaningful notation for these operations, but you can learn to read them in Perl 5, with a little effort. Line 6 is the interesting one. Start reading it on the right (this is backward for Unix people). First, it reads the directory. Since map expects a list, readdir returns a list of all files in the directory. map generates a list with the name of each file which is a directory (or undef if the -d test fails). sort puts the list in ASCII-betical order. reverse reverses that. The result is stored in @files for later printing.

You can make your own list filter quite easily. Suppose you wanted to replace the ugly map usage above (I tend to think map is always ugly) with a special purpose function, here's how:


    #!/usr/bin/perl
    use strict; use warnings;

    sub dirs_only (@) {
        my @retval;
        foreach my $entry (@_) {
            push @retval, $entry if (-d $entry);
        }
        return @retval;
    }

    opendir DIR, "."
        or die "Can't read this directory, how did you get here?\n";
    my @files = reverse sort { lc($a) cmp lc($b) } dirs_only readdir DIR;
    closedir DIR;
    local $" = ";";
    print "@files\n";

The new dirs_only routine replaces map above, leaving out the entries we don't want to see.

The sort now has an explicit comparison subroutine. This is to prevent it from thinking that dirs_only is its comparison routine. Since I had to include this, I chose to take advantage of the situation and sort with more finesse: ignoring case.

You can make such list filters to your heart's content.

I have now shown you the most important types of decoration. Any others you need could be implemented in the traditional GoF way.

The next pattern feels like cheating, but then Perl often gives me that feeling.

Flyweight

The idea of reusing objects is the essence of the flyweight pattern. Thanks to Mark-Jason Dominus, Perl takes this far beyond what the GoF had in mind. Further, he did the work once and for all. Larry Wall likes this idea so much he's promoting it to the core for Perl 6 (there's that promotion concept again).

What I want is this:

For objects whose instances don't matter (they are constants or random), those requesting a new object should be given the same one they already received whenever possible.

This pattern fails dramatically if separate instances matter. But if they don't, then it would save time and memory.

Here's an example of how this works in Perl. Suppose I want to provide a die class for games like Monopoly or Craps. My die class might look like this: (Warning: This example is contrived to show you the technique.)


    package CheapDie;
    use strict; use warnings;

    use Memoize;
    memoize('new');

    sub new {
        my $class = shift;
        my $sides = shift;
        return bless \$sides, $class;
    }

    sub roll {
        my $sides         = shift;
        my $random_number = rand;

        return int ($random_number * $sides) + 1;
    }

    1;

On first glance, this looks like many other classes. It has a constructor called new. The constructor stores the received number of sides into a subroutine lexical variable (a.k.a. a my variable), returning a blessed reference to it. The roll method calculates a random number, scales it according to the number of sides, and returns the result.

The only thing strange here are these two lines:


    use Memoize;
    memoize('new');

These exploit Perl's magic extraordinarily well. The memoize function modifies the calling package's symbol table so that new is wrapped. The wrapping function examines the incoming arguments (the number of sides in this case). If it has not seen those arguments before, then it would call the function as the user intended, storing the result in a cache and returning it to the user. This takes more time and memory than if I had not used the module.

The savings come when the method is called again. When the wrapper notices a call with the same arguments it used before, it does not call the method. Rather, it sends the cached object instead. We don't have to do anything special as a caller or as an object implementor. If your object is big, or slow to construct, then this technique would save you time and memory. In my case, it wastes both since the objects are so small.

The only thing to keep in mind is that some methods don't benefit from this technique. For example, if I memoize roll, then it would return the same number each time, which is not exactly the desired result.

Note too that Memoize can be used in non-object situations - in fact the documentation for it doesn't seem to contemplate using it for object factories.

Not only do languages such as Java not have core functions for caching method returns, they don't allow clever users to implement them. Mark-Jason Dominus did a fine thing implementing Memoize, but Larry Wall did a better thing by letting him. Imagine Java letting a user write a class that manipulated the caller's symbol table at run time - I can almost hear the screams of terror. Of course, these techniques can be abused, but precluding them is a greater loss than rejecting poor code on the few occasions that some less-than-stellar programmer improperly adjusts the symbol table.

In Perl all things are legal, but some are best left to modules with strong development communities. This allows regular users to take advantage of magic manipulations without worrying about whether our own magic will work. Memoize is an example. Instead of rolling your own wrapped call and caching scheme, use the well-tested one that ships with Perl (and looked for the 'is cached' trait to do this for routines in Perl 6).

The next pattern is related to this one, so you can use flyweight to implement it.

Singleton

In the flyweight pattern, we saw that there are sometimes resources that everyone can share. GoF calls the special case when there is a single resource that everyone needs to share the singleton pattern. Perhaps the resource is a hash of configuration parameters. Everyone should be able to look there, but it should only be built on startup (and possibly rebuilt on some signal).

In most cases, you could just use Memoize. That seems most reasonable to me. (See the flyweight section above.) In that case, everyone who wants access to the resource calls the constructor. The first person to do so causes the construction to happen and receives the object. Subsequent people call the constructor, but they receive the originally constructed object.

There are many other ways to achieve this same effect. For instance, if you think your callers might pass you unexpected arguments, then Memoize would make multiple instances, one for each set of arguments. In this case, managing the singleton with modules like Cache::FastMemoryCache from CPAN may make more sense. You could even use a file lexical, assigning it a value in a BEGIN block. Remember bless doesn't have to be used in a method. You could say:


    package Name;
    my $singleton;
    BEGIN {
        $singleton = {
            attribute => 'value',
            another   => 'something',
        };
        bless $singleton, "Name";
    }

    sub new {
        my $class = shift;
        return $singleton;
    }

This avoids some of the overhead of Memoize and shows what I'm doing more directly. I made no attempt to take subclassing into account here. Maybe I should, but the pattern says a singleton should belong always to one class. The fundamental statement about singletons is:

``There can only be one singleton.''

Summary

All four of the patterns shown in this article use built-in features, or standard modules. The iterator is implemented with foreach. The decorator is implemented for I/O with Unix pipe and redirection syntax or with a tied file handle. For lists, decorators are just functions which take and return lists. So, I might call decorators filters. Flyweights are shared objects easily implemented with the Memoize module. Singletons can be implemented as flyweights or with simple object techniques.

The next time some uppity OO programmer starts going on about patterns, rest assured, you know how to use them. In fact, they are built-in to the core of your language (at least if you have the sense to use Perl).

Next time, I will look at patterns which rely on code references or data containers.

Acknowledgements and Background

I wrote these articles after taking a training course using GoF from a well-known training and consulting company. My writing is also informed by many people in the Perl community, including Mark-Jason Dominus, who showed at YAPC 2002, using his unique flair, how Perl deals with the iterator pattern. Though the writing here is mine, the inspiration comes from Dominus and many others in the Perl community, most of all Larry Wall, who have incorporated patterns into the heart of Perl during the years. As these patterns show, time and time again, Perl employs the principle of promotion carefully and well. Instead of adding a collection framework in source code modules, as Java and C++ do, Perl has only two collections: arrays and hashes. Both are core to the language. I think Perl's greatest strength is the community's choices of what to include in the core, what to ship along with the core, and what to leave out. Perl 6 will only make Perl more competitive in the war of language design ideas.

Visit the home of the Perl programming language: Perl.org

Sponsored by

Monthly Archives

Powered by Movable Type 5.13-en