Recently in Objects Category

Generating UML and Sequence Diagrams

Imagine yourself in a meeting with management. You're about to begin your third attempt to explain how to process online credit card payments. After a couple of sentences, you see some eyes glazing over. Someone says, "Perhaps you could draw us a picture."

Imagine me handling a recent request from my boss. He came into the bat cave and said (in summary), "We want customers to sign up for email accounts without calling customer service. All the account creation code is in the customer care app." It didn't take long to find the relevant web screen, where the CSR presses Save to kick off the account creation, but there sure were a lot of layers between there and the final result. Keeping them in mind is hard enough when I'm deep in the problem. Three months from now, when an odd bug surfaces, it'll be nearly impossible without the right memory aid.

In both of these cases, the right diagram is the sequence diagram. (I'd show you mine for the situations above, but they're secret.) Sequence diagrams clearly show the time flow of method or function calls between modules. For complex systems, these diagrams can save a lot of time--like the time you and your fellow programmers spend during initial design, the time spent explaining what's possible to management, the time you spend remembering how things work when you revisit an old system that needs a new feature, and especially the time it takes a new programmer in your shop to get up to speed.

In short, sequence diagrams help with complex call stacks just as data model diagrams help with complex database schemas.

While the sequence diagram is useful to me, I don't like on-screen drawing tools. Therefore, I wrote the original UML::Sequence to make the drawings for me. With recent help from Dean Arnold, the current version has many nice features and is closer to standards compliance (though both Dean and I prefer a useful diagram to a compliant one). Using UML::Sequence, you can quickly make proposed diagrams of systems not yet built. You can even run it against existing programs to have it diagram what they actually do.

Reading a Sequence Diagram

If you already know how to read sequence diagrams, you can skip to the next section.

Because most uses of UML involve object-oriented projects, that's where I've drawn my examples. Don't think that objects are necessary for sequence diagrams. I've diagrammed many non-OO programs with it (including some in COBOL).

A simple example will work best for a first look at UML sequence diagrams, so consider rolling two dice. My over-engineered solution gives a nice diagram to discuss. In it, I made each die an object of the Die class and the pair of dice an object of the DiePair class. To roll the dice, I wrote a little script. Here are these pieces:

    package Die;
    use strict;

    sub new {
        my $class = shift;
        my $sides = shift || 6;
        return bless { SIDES => $sides }, $class;
    }

    sub roll {
        my $self       = shift;
        $self->{VALUE} = int( rand( $self->{SIDES} ) ) + 1;

        return $self->{VALUE};
    }

    1;

The Die constructor takes an optional number of sides for the new die object, but supplies six as a default. It bundles that number of sides into a hash reference, blesses, and returns it.

The roll() method makes a random number and uses it to pick a new value for the die, which it returns.

DiePair is equally scintillating:

    package DiePair;
    use strict;

    use Die;

    sub new {
        my $class     = shift;
        my $self      = {};
        $self->{DIE1} = Die->new( shift );
        $self->{DIE2} = Die->new( shift );

        return bless $self, $class;
    }

    sub roll {
        my $self   = shift;
        my $value1 = $self->{DIE1}->roll();
        my $value2 = $self->{DIE2}->roll();

        $self->{TOTAL}   = $value1 + $value2;
        $self->{DOUBLES} = ( $value1 == $value2 ) ? 1 : 0;

        return $self->{TOTAL}, $self->{DOUBLES};
    }

    1;

The constructor makes two die objects and stores them in a hash reference, which it blesses and returns.

The roll() method rolls each die, storing the value, then totals them and decides whether the roll was doubles. It returns both total and doubles, saving the driver from having to call back for them.

Rather than modeling a real game like craps, I use a small driver, which will simplify the resulting diagram.

    #!/usr/bin/perl
    use strict;

    use DiePair;

    my $die_pair          = DiePair->new(6, 6);
    my ($total, $doubles) = $die_pair->roll();

    print "Your total is $total ";
    print "it was doubles" if $doubles;
    print "\n";

Figure 1 shows the sequence diagram for this driver.

Figure 1. The sequence diagram for the die roller

Each package has a box at the top of the diagram. The script is in the main package (which is always Perl's default). Time flows from top to bottom. Arrows represent method (or function) calls.

The vertical boxes, or activations, represent the life of a call. Between the activations are dashed lines called the life lines of the objects.

You can see that main begins first (because its first activation is higher than the others). It calls new() on the DiePair class. That call lasts long enough for DiePair's constructor to call new() on the Die class twice.

After making the objects, the script calls roll() on the DiePair, which forwards the request to the individual dice.

This diagram is unorthodox. The boxes at the top should represent individual instances, not classes. Sometimes I prefer this style because it compacts the diagram horizontally. Figure 2 shows a more orthodox diagram (divergent only in the lack of name underlining).

Figure 2. A more orthodox UML diagram

You can see the individual Die objects that the DiePair instance aggregates, because there is now a box at the top for each object (use your imagination when thinking about the driver as an instance). The names do not come from the code; they are sequentially assigned from the class name.

Diagrams like this are especially helpful when many classes interact. For instance, many of them start with a user event (like a button press on a GUI application) and show how the view communicates with the controller and how the controller in turn communicates with the data model.

Another particularly useful application is for programs communicating via network sockets. In their diagrams, each program has a box, and the arrows represent writing on a socket. Note that UML sequence diagrams may also have dashed arrows, which show return values going back to callers. Unless there is something unusual about that value, there is no need to waste space on the diagram for those returns. However, in a network situation, showing the back and forth can be quite helpful. UML::Sequence now has support for return arrows.

Using UML::Sequence

Now that you understand how to read a sequence diagram, I can show you how to make them without mouse-driven drawing tools.

Making diagrams with UML::Sequence is a three-step process:

  1. Create a program or a text file.
  2. Use genericseq.pl to create an XML description of the diagram.
  3. Use a rendering script to turn the XML into an image file.

If the image is in the wrong format for your purposes, you might need an extra step to convert to another format.

Running Perl Programs

Here is how I generated Figure 1 above by running the driver program. If your program is in Perl, you can use this approach (see the next subsection for Java programs).

First, create a file listing the subs you want to see in the diagram:

    DiePair::new
    DiePair::roll
    Die::new
    Die::roll

I called this file roller.methods to correspond to the script's name, roller. When you make your method list, remember that sequence diagrams are visual space hogs, so pick a short list of the most important methods.

Then, run the program through the genericseq.pl script:

$ genericseq.pl UML::Sequence::PerlSeq roller.methods roller > roller.xml

UML::Sequence::PerlSeq uses the Perl debugger's hooks to profile the code as it runs, watching for the methods listed in roller.methods. The result is an XML file describing the calls that actually happened during this run.
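To get a feel for what "watching for the listed methods" produces, here is a minimal sketch that records a call outline for the dice example. It is not how UML::Sequence::PerlSeq works internally (that uses the debugger's hooks); this sketch swaps each listed sub for a logging wrapper instead. The watch() helper, the @trace array, and the compact Die and DiePair stand-ins are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Compact stand-ins for the Die and DiePair classes shown earlier.
package Die;
sub new  { my ($class, $sides) = @_; return bless { SIDES => $sides || 6 }, $class }
sub roll { my $self = shift; return int( rand( $self->{SIDES} ) ) + 1 }

package DiePair;
sub new {
    my $class = shift;
    return bless { DIE1 => Die->new(shift), DIE2 => Die->new(shift) }, $class;
}
sub roll {
    my $self = shift;
    my ($v1, $v2) = ( $self->{DIE1}->roll, $self->{DIE2}->roll );
    return ( $v1 + $v2, $v1 == $v2 ? 1 : 0 );
}

package main;

my @trace;       # indented outline, one line per watched call
my $depth = 0;   # current nesting depth among watched subs

# Replace each named sub with a wrapper that logs the call, then delegates.
sub watch {
    for my $name (@_) {
        my ($class, $method) = $name =~ /^(.+)::(\w+)$/
            or die "bad sub name: $name";
        no strict 'refs';
        no warnings 'redefine';
        my $orig = \&{$name};
        *{$name} = sub {
            push @trace, ( '    ' x $depth ) . "$class.$method";
            $depth++;
            my @result = $orig->(@_);
            $depth--;
            return wantarray ? @result : $result[-1];
        };
    }
}

# The same four names as in roller.methods.
watch(qw( DiePair::new DiePair::roll Die::new Die::roll ));

my $pair = DiePair->new(6, 6);
my ($total, $doubles) = $pair->roll();
print "$_\n" for @trace;
```

The printed outline has DiePair.new followed by two indented Die.new lines, then DiePair.roll followed by two indented Die.roll lines, which is the same nesting the arrows in Figure 1 show.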

To turn this into a picture, use one of the image scripts:

$ seq2svg.pl roller.xml > roller.svg

Obviously, seq2svg.pl makes SVG images. If you have no way to view those, get Firefox 1.5, use a tool like the batik rasterizer, or use seq2rast.pl, which makes PNG images directly using the GD module.

If you want diagrams like Figure 2, use UML::Sequence::PerlOOSeq in place of UML::Sequence::PerlSeq when you run genericseq.pl.

Running Java Programs

I wrote UML::Sequence while working as a Java programmer, so I made it work on Java (at least sometimes it works). The process is similar to the above. First, make a methods file:

    ALL
    Roller
    DiePair
    Die

Here I use ALL to mean all methods from the following classes. You can also list full signatures (but they have to be full, valid, and expressed in the internal signature format as if generated by javap).

Then run genericseq.pl with UML::Sequence::JavaSeq in place of UML::Sequence::PerlSeq. Of course, this requires you to have a Java development environment on your machine. In particular, it must be able to find tools.jar, which provides the debugger hooks necessary to watch the calls.

Produce the image from the resulting XML file as shown earlier for Perl programs.

Text File Input

While I pat myself on the back every time I make a sequence diagram of a running program, that's not always (or even usually) practical. For instance, you might want to show the boss what you have planned for code you haven't written yet. Alternately, you might have a program that is so complex that no amount of tweaking the methods file will restrict the diagram enough to make it useful.

In these cases, there is a small text language you can use to specify the diagram. It is based on indentation and uses dot notation for method names. Here is a sample:

At Home.Wash Car
    Garage.retrieve bucket
    Kitchen.prepare bucket
        Kitchen.pour soap in bucket
        Kitchen.fill bucket
    Garage.get sponge
    Garage.open door
    Driveway.apply soapy water
    Driveway.rinse
    Driveway.empty bucket
    Garage.close door
    Garage.replace sponge
    Garage.replace bucket

Each line will become an arrow in the final diagram (except the first line). Indentation indicates the call depth. The "class" name comes before the dot and the "method" name after it.
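The indentation rule is easy to see in code. The following is a minimal sketch of reading such an outline, assuming four spaces per level as in the sample; parse_outline() is a hypothetical helper of mine and not part of UML::Sequence, whose own parser may differ.

```perl
use strict;
use warnings;

# Parse an indented outline into records of depth, class, and method.
# Assumes four spaces per indentation level, as in the sample above.
sub parse_outline {
    my ($text) = @_;
    my @calls;
    for my $line ( split /\n/, $text ) {
        next unless $line =~ /\S/;    # skip blank lines
        my ($indent, $class, $method) = $line =~ /^(\s*)([^.]+)\.(.+?)\s*$/
            or die "Cannot parse line: $line";
        push @calls, {
            depth  => length($indent) / 4,
            class  => $class,
            method => $method,
        };
    }
    return @calls;
}

my $outline = <<'END';
At Home.Wash Car
    Garage.retrieve bucket
    Kitchen.prepare bucket
        Kitchen.pour soap in bucket
        Kitchen.fill bucket
    Garage.get sponge
END

my @calls = parse_outline($outline);
printf "%d %s -> %s\n", @{$_}{qw(depth class method)} for @calls;
```

Each record's depth says how deeply nested the call is, so "Kitchen.pour soap in bucket" comes out at depth 2, inside the depth 1 call to "Kitchen.prepare bucket".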

There is no need for a methods file in this case, because presumably you didn't bother to type things you didn't care about. You may go directly to running genericseq.pl:

$ genericseq.pl UML::Sequence::SimpleSeq inputfile > wash.xml

Once you have the XML file, render it as before.

Getting Fancy

As I mentioned earlier, Dean Arnold recently added lots of cool features to amaze and impress your bosses and/or clients. In particular, he expanded the legal syntax for text outlines. Here is his sample of car washing with the new features:

AtHome.Wash Car
        /* the bucket is in the garage */
    Garage.retrieve bucket
    Kitchen.prepare bucket
        Kitchen.pour soap in bucket
        Kitchen.fill bucket
    Garage.get sponge
    Garage.checkDoor
            -> clickDoorOpener
        [ ifDoorClosed ] Garage.open door
    * Driveway.apply soapy water
    ! Driveway.rinse
    Driveway.empty bucket
    Garage.close door
    Garage.replace sponge
    Garage.replace bucket

There are several new features here:

  • You can include UML annotations by using C-style comments, as shown on the second line of the example. Each annotation attaches to the following line as a footnote (or tooltip, if you install a third-party open source library).
  • There is a -> in front of clickDoorOpener. This becomes an asynchronous message arrow. When -> comes between a method and additional text, it indicates that a regular method is returning the value on the righthand side of the arrow. The return appears as a dashed arrow from the called activation back to the caller.
  • ifDoorClosed is in brackets, which mark a conditional in UML. These appear in the diagram in front of the method name.
  • There is a star in front of Driveway.apply, which indicates a loop construct in UML. (UML people call this iteration.)
  • There is an exclamation point in front of Driveway.rinse, indicating urgency.

In addition to these changes to the outline syntax, both seq2svg.pl and seq2rast.pl now support options to control appearance (including colors) and to generate HTML imagemaps for raster versions of the diagrams. The imagemaps hyperlink diagram elements (column headers and method call names) to supporting documents. For example, clicking on the Garage header will open Garage.html, while clicking on checkDoor will also open Garage.html, but at the #checkDoor anchor.

Summary

UML sequence diagrams are a great way to see how function or method calls (or network messages) flow through a multi-module application, whether it is object-oriented or not. Using UML::Sequence and its helper scripts, you can make those diagrams without having to point and click in a drawing program.

References

The imagemapped HTML version of car washing is viewable online.

To read more about UML diagrams, check out the aptly named UML Distilled, by Martin Fowler, available from your favorite bookseller.

I recommend Walter Zorn's JavaScript, DHTML tooltips package to display embedded annotations.

Batik is an Apache project for managing and viewing SVG.

Overloading

Introduction: What is Overloading?

Many object-oriented programming languages have a feature called overloading, but in most of them this term means something different from what it means in Perl. Take a look at this Java example:

public Fraction(int num, int den);
public Fraction(Fraction F);
public Fraction();

In this example, we have three methods called Fraction. Java, like many languages, is very strict about the number and type of arguments that you can pass to a function. We therefore need three different methods to cover the three possibilities. In the first example, the method takes two integers (a numerator and a denominator) and it returns a Fraction object based on those numbers. In the second example, the method takes an existing Fraction object as an argument and returns a copy (or clone) of that object. The final method takes no arguments and returns a default Fraction object, maybe representing 1/1 or 0/1. When you call one of these methods, the Java compiler determines which of the three methods you wanted by looking at the number and type of the arguments.

In Perl, of course, we are far more flexible about what arguments we can pass to a method. Therefore the same method can be used to handle all of the three cases from the Java example. (We'll see an example of this in a short while.) This means that in Perl we can save the term "overloading" for something far more interesting — operator overloading.

Number::Fraction — The Constructor

Imagine you have a Perl object that represents fractions (or, more accurately, rational numbers, but we'll call them fractions as we're not all math geeks). In order to handle the same situations as the Java class we mentioned above, we need to be able to run code like this:

use Number::Fraction;

my $half       = Number::Fraction->new(1, 2);
my $other_half = Number::Fraction->new($half);
my $default    = Number::Fraction->new;

To do this, we would write a constructor method like this:

sub new {
    my $class = shift;
    my $self;
    if (@_ >= 2) {
        return if $_[0] =~ /\D/ or $_[1] =~ /\D/;
        $self->{num} = $_[0];
        $self->{den} = $_[1];
    } elsif (@_ == 1) {
        if (ref $_[0]) {
            if (UNIVERSAL::isa($_[0], $class)) {
                return $class->new($_[0]->num, $_[0]->den);
            } else {
                croak "Can't make a $class from a ", ref $_[0];
            }
        } else {
            return unless $_[0] =~ m|^(\d+)/(\d+)|;

            $self->{num} = $1;
            $self->{den} = $2;
        }
    } elsif (!@_) {
        $self->{num} = 0;
        $self->{den} = 1;
    }

    bless $self, $class;
    $self->normalise;
    return $self;
}

As promised, there's just one method here and it does everything that the three Java methods did and even more, so it's a good example of why we don't need method overloading in Perl. Let's look at the various parts in some detail.

sub new {
    my $class = shift;
    my $self;

The method starts out just like most Perl object constructors. It grabs the class which is passed in as the first argument and then declares a variable called $self which will contain the object.

    if (@_ >= 2) {
        return if $_[0] =~ /\D/ or $_[1] =~ /\D/;
        $self->{num} = $_[0];
        $self->{den} = $_[1];

This is where we start to work out just how the method was called. We look at @_ to see how many arguments we have been given. If we've got two arguments then we assume that they are the numerator and denominator of the fraction. Notice that there's also another check to ensure that both arguments contain only digits. If this check fails, we return undef from the constructor.

    } elsif (@_ == 1) {
        if (ref $_[0]) {
            if (UNIVERSAL::isa($_[0], $class)) {
                return $class->new($_[0]->num, $_[0]->den);
            } else {
                croak "Can't make a $class from a ", ref $_[0];
            }
        } else {
            return unless $_[0] =~ m|^(\d+)/(\d+)|;
            $self->{num} = $1;
            $self->{den} = $2;
        }

If we've been given just one argument, then there are a couple of things we can do. First we see if the argument is a reference, and if it is, we check that it's a reference to another Number::Fraction object (or a subclass). If it's the right kind of object then we get the numerator and denominator (using the accessor functions) and use them to call the two-argument form of new. If the argument is the wrong type of reference then we complain bitterly to the user.

If the single argument isn't a reference then we assume it's a string of the form num/den, which we can split apart to get the numerator and denominator of the fraction. Once more we check for the correct format using a regex and return undef if the check fails.

    } elsif (!@_) {
        $self->{num} = 0;
        $self->{den} = 1;
    }

If we are given no arguments, then we just create a default fraction which is 0/1.

    bless $self, $class;
    $self->normalise;
    return $self;
}

At the end of the constructor we do more of the normal OO Perl stuff. We bless the object into the correct class and return the reference to our caller. Between these two actions we pause to call the normalise method, which converts the fraction to its simplest form. For example, it will convert 12/16 to 3/4.
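The article doesn't show normalise itself, so here is a minimal sketch of how such a method could reduce a fraction, assuming the object is a hash reference with num and den keys as in the constructor. The gcd() helper and this particular body are my assumptions, not the code from Number::Fraction.

```perl
use strict;
use warnings;

# Greatest common divisor by Euclid's algorithm (non-negative integers).
sub gcd {
    my ($x, $y) = @_;
    ($x, $y) = ($y, $x % $y) while $y;
    return $x;
}

# Reduce a { num, den } hash reference to lowest terms, in place.
sub normalise {
    my $self = shift;
    my $gcd  = gcd( $self->{num}, $self->{den} );
    return $self unless $gcd;    # 0/0 cannot be reduced
    $self->{num} /= $gcd;
    $self->{den} /= $gcd;
    return $self;
}

my $fraction = normalise( { num => 12, den => 16 } );
print "$fraction->{num}/$fraction->{den}\n";    # prints 3/4
```

Dividing both parts by their greatest common divisor (4 in this case) is what turns 12/16 into 3/4.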

Number::Fraction — Doing Calculations

Having now created fraction objects, we will want to start doing calculations with them. For that we'll need methods that implement the various mathematical functions. Here's the add method:

sub add {
    my ($self, $delta) = @_;

    if (ref $delta) {
        if (UNIVERSAL::isa($delta, ref $self)) {
            $self->{num} = $self->num  * $delta->den
                + $delta->num * $self->den;
            $self->{den} = $self->den  * $delta->den;
        } else {
            croak "Can't add a ", ref $delta, " to a ", ref $self;
        }
    } else {
        if ($delta =~ m|(\d+)/(\d+)|) {
            $self->add(Number::Fraction->new($1, $2));
        } elsif ($delta !~ /\D/) {
            $self->add(Number::Fraction->new($delta, 1));
        } else {
            croak "Can't add $delta to a ", ref $self;
        }
    }
    $self->normalise;
}

Once more we try to handle a number of different types of arguments. We can add the following things to our fraction object:

  • Another object of the same class (or a subclass).
  • A string in the format num/den.
  • An integer. This is converted to a fraction with a denominator of 1.

This then allows us to write code like this:

my $half           = Number::Fraction->new(1, 2);
my $quarter        = Number::Fraction->new(1, 4);
my $three_quarters = $half;
$three_quarters->add($quarter);

In my opinion, this code looks pretty horrible. It also has a nasty, subtle bug. Can you spot it? (Hint: What will be in $half after running this code?) To tidy up this code we can turn to operator overloading.
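If you'd rather see the bug than hunt for it, here is a minimal sketch. TinyFraction is a hypothetical stand-in for Number::Fraction with a mutating add and no normalising, so the numbers stay easy to follow.

```perl
use strict;
use warnings;

package TinyFraction;    # hypothetical stand-in for Number::Fraction

sub new {
    my ($class, $num, $den) = @_;
    return bless { num => $num, den => $den }, $class;
}

# Mutating add, like the first version of add in the article.
sub add {
    my ($self, $delta) = @_;
    $self->{num} = $self->{num} * $delta->{den} + $delta->{num} * $self->{den};
    $self->{den} = $self->{den} * $delta->{den};
    return $self;
}

package main;

my $half           = TinyFraction->new(1, 2);
my $quarter        = TinyFraction->new(1, 4);
my $three_quarters = $half;    # copies the reference, not the object
$three_quarters->add($quarter);

print "$half->{num}/$half->{den}\n";    # prints 6/8: $half changed too
```

Both variables hold a reference to the same object, so adding to one of them changes what the other sees.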

Number::Fraction — Operator Overloading

The module overload.pm is a standard part of the Perl distribution. It allows your objects to define how they will react to a number of Perl's operators. For example, we can add code like this to Number::Fraction:

use overload '+' => 'add';

Whenever a Number::Fraction is used as one of the operands to the + operator, the add method will be called instead. Code like:

$three_quarters = $half + '1/4';

is converted to:

$three_quarters = $half->add('1/4');

This is getting closer, but it still has a serious problem. The add method works on the $half object. In general, however, that's not how an assignment should work. If you were working with ordinary scalars and had code like:

$foo = $bar + 0.75;

You would be very surprised if this altered the value of $bar. Our objects need to work in the same way. We need to change our add method so that it doesn't alter $self but instead returns the new fraction.

sub add {
    my ($l, $r) = @_;
    if (ref $r) {
        if (UNIVERSAL::isa($r, ref $l)) {
            return Number::Fraction->new($l->num * $r->den + $r->num * $l->den,
                    $l->den * $r->den);
        } else {
            ...
        }
    } else {
        ...
    }
}

In this example, I've only shown one of the sections, but I hope it's clear how it would work. Notice that I've also renamed $self and $delta to $l and $r. I find this makes more sense as we are working with the left and right operands of the + operator.
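For a version you can run from start to finish, here is a minimal sketch of a non-mutating add wired to the + operator. TinyFraction is a hypothetical stand-in for Number::Fraction; it reads the hash directly instead of using accessors, skips normalising, and doesn't check the class of a reference argument.

```perl
use strict;
use warnings;

package TinyFraction;    # hypothetical stand-in for Number::Fraction
use overload '+' => 'add';

sub new {
    my ($class, $num, $den) = @_;
    return bless { num => $num, den => $den }, $class;
}

# Accepts another TinyFraction, a "num/den" string, or an integer,
# and returns a new object without touching either operand.
sub add {
    my ($l, $r) = @_;
    if ( !ref $r ) {
        if    ( $r =~ m|^(\d+)/(\d+)$| ) { $r = TinyFraction->new($1, $2) }
        elsif ( $r =~ /^\d+$/ )          { $r = TinyFraction->new($r, 1) }
        else                             { die "Can't add $r to a " . ref $l }
    }
    return TinyFraction->new(
        $l->{num} * $r->{den} + $r->{num} * $l->{den},
        $l->{den} * $r->{den},
    );
}

package main;

my $half = TinyFraction->new(1, 2);
my $sum  = $half + '1/4';

print "$sum->{num}/$sum->{den}\n";      # prints 6/8 (this sketch does not normalise)
print "$half->{num}/$half->{den}\n";    # prints 1/2: the operand is unchanged
```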

Overloading Non-Commutative Operators

We can now happily handle code like:

$three_quarters = $half + '1/4';

Our object will do the right thing — $three_quarters will end up as a Number::Fraction object that contains the value 3/4. What will happen if we write code like this?

$three_quarters = '1/4' + $half;

The overload module handles this case as well. If your object is either operand of one of the overloaded operators, then your method will be called. You get passed an extra argument which indicates whether your object was the left or right operand of the operator. This argument is false if your object is the left operand and true if it is the right operand.

For commutative operators you probably don't need to take any notice of this argument as, for example:

$half + '1/4'

is the same as:

'1/4' + $half

However, for non-commutative operators (like - and /) you will need to do something like this:

sub subtract {
    my ($l, $r, $swap) = @_;

    ($l, $r) = ($r, $l) if $swap;
    ...
}
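Here is a minimal sketch that fills in the rest of such a subtract method so you can watch the swap flag at work. TinyFraction is again a hypothetical stand-in for Number::Fraction, and to keep things short this version only accepts objects and plain integers.

```perl
use strict;
use warnings;

package TinyFraction;    # hypothetical stand-in for Number::Fraction
use overload '-' => 'subtract';

sub new {
    my ($class, $num, $den) = @_;
    return bless { num => $num, den => $den }, $class;
}

sub subtract {
    my ($l, $r, $swap) = @_;

    $r = TinyFraction->new($r, 1) unless ref $r;    # integers only in this sketch
    ($l, $r) = ($r, $l) if $swap;                   # restore the written order

    return TinyFraction->new(
        $l->{num} * $r->{den} - $r->{num} * $l->{den},
        $l->{den} * $r->{den},
    );
}

package main;

my $half  = TinyFraction->new(1, 2);
my $left  = $half - 1;    # object on the left, swap flag is false
my $right = 1 - $half;    # object on the right, swap flag is true

print "$left->{num}/$left->{den}\n";      # prints -1/2
print "$right->{num}/$right->{den}\n";    # prints 1/2
```

Without the swap line, both expressions would give -1/2.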

Overloadable Operators

Just about any Perl operator can be overloaded in this way. This is a partial list:

  • Arithmetic: +, +=, -, -=, *, *=, /, /=, %, %=, **, **=, <<, <<=, >>, >>=, x, x=, ., .=
  • Comparison: <, <=, >, >=, ==, !=, <=>, lt, le, gt, ge, eq, ne, cmp
  • Increment/Decrement: ++, -- (both pre- and post- versions)

A full list is given in overload.

It's a very long list, but thankfully you rarely have to supply an implementation for more than a few operators. Perl is quite happy to synthesize (or autogenerate) many of the missing operators. For example:

  • ++ can be derived from +
  • += can be derived from +
  • - (unary) can be derived from - (binary)
  • All numeric comparisons can be derived from <=>
  • All string comparisons can be derived from cmp
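
To see autogeneration in action, here is a minimal sketch of a class that defines only + and <=> and leaves the rest to Perl. The Counter class is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

package Counter;    # hypothetical class that overloads only + and <=>
use overload
    '+'   => sub {
        my ($l, $r) = @_;
        return Counter->new( $l->{n} + ( ref($r) ? $r->{n} : $r ) );
    },
    '<=>' => sub {
        my ($l, $r, $swap) = @_;
        my $cmp = $l->{n} <=> ( ref($r) ? $r->{n} : $r );
        return $swap ? -$cmp : $cmp;
    };

sub new { my ($class, $n) = @_; return bless { n => $n }, $class }

package main;

my $count = Counter->new(1);
$count += 2;    # no += method, so Perl derives it from +
$count++;       # no ++ method either, also derived from +

print $count->{n}, "\n";                        # prints 4
print "still below 10\n" if $count < 10;        # < is derived from <=>
```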

Two other special operators give finer control over this autogeneration of methods. nomethod defines a subroutine that is called when no other function is found and fallback controls how hard Perl tries to autogenerate a method. fallback can have one of three values:

  • undef: Attempt to autogenerate methods and die if a method can't be autogenerated. This is the default.
  • 0: Never try to autogenerate methods.
  • 1: Attempt to autogenerate methods but fall back on Perl's default behavior for the object if a method can't be autogenerated.

Here's an example of an object that will die gracefully when an unknown operator is called. Notice that the nomethod subroutine is passed the usual three arguments (left operand, right operand, and the swap flag) together with an extra argument containing the operator that was used.

use overload
    '-' => 'subtract',
    fallback => 0,
    nomethod => sub { 
        croak "illegal operator $_[3]" 
};
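
Here is a minimal sketch of that declaration inside a complete class, so you can watch the error appear. The Picky class and its single n field are hypothetical; croak comes from the standard Carp module, which the class has to load.

```perl
use strict;
use warnings;

package Picky;    # hypothetical class that only supports subtraction
use Carp;
use overload
    '-'      => 'subtract',
    fallback => 0,
    nomethod => sub { croak "illegal operator $_[3]" };

sub new { my ($class, $n) = @_; return bless { n => $n }, $class }

sub subtract {
    my ($l, $r, $swap) = @_;
    $r = Picky->new($r) unless ref $r;
    ($l, $r) = ($r, $l) if $swap;
    return Picky->new( $l->{n} - $r->{n} );
}

package main;

my $five  = Picky->new(5);
my $three = $five - 2;                          # fine: - is overloaded
my $ok    = eval { my $bad = $five * 2; 1 };    # * is not, so nomethod croaks
my $error = $ok ? '' : $@;

print "difference: $three->{n}\n";
print "error: $error";
```

The error message names the offending operator because it arrives as the fourth argument to the nomethod subroutine.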

Three special operators are provided to control type conversion. They define methods to be called if the object is used in string, numeric, and boolean contexts. These operators are denoted by q{""}, 0+, and bool. Here's how we can use these in Number::Fraction:

use overload
    q{""} => 'to_string',
    '0+'  => 'to_num';

sub to_string {
    my $self = shift;
    return "$self->{num}/$self->{den}";
}

sub to_num {
    my $self = shift;
    return $self->{num} / $self->{den};
}

Now, when we print a Number::Fraction object, it will be displayed in num/den format. When we use the object in a numeric context, Perl will automatically convert it to its numeric equivalent.

We can use these type-conversion and fallback operators to cut down the number of operators we need to define even further.

use overload
    '0+' => 'to_num',
    fallback => 1;

Now, whenever our object is used where Perl is expecting a number and we haven't already defined an overloading method, Perl will try to use our object as a number, which will, in turn, trigger our to_num method. This means that we only need to define operators where their behavior will differ from that of a normal number. In the case of Number::Fraction, we don't need to define any numeric comparison operators since the numeric value of the object will give the correct behavior. The same is true of the string comparison operators if we define to_string.
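As a minimal sketch of how far the conversion operators and fallback get you, the hypothetical TinyFraction class below defines no arithmetic or comparison operators at all. Everything numeric goes through to_num and everything stringy through to_string.

```perl
use strict;
use warnings;

package TinyFraction;    # hypothetical stand-in for Number::Fraction
use overload
    q{""}    => 'to_string',
    '0+'     => 'to_num',
    fallback => 1;

sub new {
    my ($class, $num, $den) = @_;
    return bless { num => $num, den => $den }, $class;
}

sub to_string { my $self = shift; return "$self->{num}/$self->{den}" }
sub to_num    { my $self = shift; return $self->{num} / $self->{den} }

package main;

my $half  = TinyFraction->new(1, 2);
my $third = TinyFraction->new(1, 3);
my $sum   = $half + 1;    # no + method: Perl falls back to the numeric value

print "half is $half\n";                              # prints "half is 1/2"
print "half is bigger\n" if $half > $third;           # compared as 0.5 and 0.333...
print "sum is $sum\n";                                # prints "sum is 1.5"
```

Note that $sum is a plain number, not a fraction object. That is the price of leaving + undefined, and the reason Number::Fraction still needs its own add.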

Overloading Constants

We've come a long way with our overloaded objects. Instead of nasty code like:

use Number::Fraction;

$f = Number::Fraction->new(1, 2);
$f->add('1/4');

we can now write code like:

use Number::Fraction;

$f = Number::Fraction->new(1, 2) + '1/4';

There are still, however, two places where we need to use the full name of the class — when we load the module and when we create a new fraction object. We can't do much about the first of these, but we can remove the need for that ugly new call by overloading constants.

You can use overload::constant to control how Perl interprets constants in your program. overload::constant expects a hash where the keys identify various kinds of constants and the values are subroutines which handle the constants. The keys can be any of integer (for integers), float (for floating point numbers), binary (for binary, octal, and hex numbers), q (for strings), and qr (for the constant parts of regular expressions).

When a constant of the right type is found, Perl will call the associated subroutine, passing it the string representation of the constant and the way that Perl would interpret the constant by default. Subroutines associated with q or qr also get a third argument (either qq, q, s, or tr), which indicates how the string is being used in the program.

As an example, here is how we would set up constant handlers so that strings of the form num/den are always converted to the equivalent Number::Fraction object:

my %_const_handlers = 
    (q => sub { 
        return __PACKAGE__->new($_[0]) || $_[1] 
});

sub import {
    overload::constant %_const_handlers if defined $_[1] and $_[1] eq ':constants';
}

sub unimport {
    overload::remove_constant(q => undef);
}

We've defined a hash, %_const_handlers, which only contains one entry as we are only interested in strings. The associated subroutine calls the new method in the current package (which will be Number::Fraction or a subclass) passing it the string as found in the program source. If this string can be used to create a valid Number::Fraction object, a reference to that object is returned.

If a valid object isn't returned then the subroutine returns its second argument, which is Perl's default interpretation of the constant. As a result, any strings in the program that can be interpreted as a fraction are converted to the correct Number::Fraction object and other strings are left unchanged.
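
Here is a minimal single-file sketch of a q handler at work. Because everything lives in one file, it installs the handler from a BEGIN block inside a bare block rather than from an import subroutine; that arrangement, and the stripped-down TinyFraction class, are my assumptions for the sake of a runnable example.

```perl
use strict;
use warnings;

package TinyFraction;    # hypothetical stand-in for Number::Fraction
use overload q{""} => sub { "$_[0]{num}/$_[0]{den}" };

# Returns an object for "num/den" strings and nothing for anything else.
sub new {
    my ($class, $string) = @_;
    return unless defined $string and $string =~ m|^(\d+)/(\d+)$|;
    return bless { num => $1, den => $2 }, $class;
}

package main;

my ($half, $plain);
{
    # Install the handler at compile time; it applies to the rest of this block.
    BEGIN {
        overload::constant( q => sub { TinyFraction->new( $_[0] ) || $_[1] } );
    }
    $half  = '1/2';      # looks like a fraction, so it becomes an object
    $plain = 'hello';    # not a fraction, so it stays a plain string
}

print ref($half), ": $half\n";
print ref($plain) ? "object\n" : "plain string: $plain\n";
```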

The constant handler is loaded as part of our package's import subroutine. Notice that it is only loaded if the import subroutine is passed the optional argument :constants. This is because this is a potentially big change to the way that a program's source code is interpreted so we only want to turn it on if the user wants it. Number::Fraction can be used in this way by putting the following line in your program:

use Number::Fraction ':constants';

If you don't want the scary constant-refining stuff you can just use:

use Number::Fraction;

Also note that we've defined an unimport subroutine which removes the constant handler. An unimport subroutine is called when a program calls no Number::Fraction; it's the opposite of use. If you're going to make major changes to the way that Perl parses a program then it's only polite to undo your changes if the programmer asks you to.

Conclusion

We've finally managed to get rid of most of the ugly class names from our code. We can now write code like this:

use Number::Fraction ':constants';

my $half = '1/2';
my $three_quarters = $half + '1/4';
print $three_quarters;  # prints 3/4

I hope you can agree that this has the potential to make code far easier to read and understand.

Number::Fraction is available on the CPAN. Please feel free to take a closer look at how it is implemented. If you come up with any more interesting overloaded modules, I'd love to hear about them.

POOL

In this article, we're going to look at POOL, a handy "little language" I recently created for templating object-oriented modules. Now you may not write many object-oriented modules, so this may not sound too interesting to you. Don't worry; I also plan to discuss, among other things, Ruby, how to use the Template Toolkit, profiling, computational linguistic trie structures, Ruby again, and the oil paintings of the Great Masters. Hopefully, something in here will be enough to keep your interest.

Splashing Around

One of the reasons that I always feel I never get anything substantial done in Perl is that I'm always distracted unduly by subtasks, particularly metaprogramming. I write so many "labor-saving" modules that I never get around to doing the original labor in the first place.

For instance, I wanted to write something to handle my accounts; I needed something to handle command-line options but I couldn't be bothered with the Getopt::Long rigmarole, so I wrote Getopt::Auto. Next, I needed to parse a simple configuration file and I couldn't face writing yet another colon-separated file parser, so I wrote Config::Auto. Finally, I wrote something that examined a database schema and wrote Class::DBI packages for each table -- all helpful tasks, and they meant I would never have to worry about configuration files or command-line options again. But, of course, I forgot about my accounts-handling application.

Something like this happened again recently. I started writing a module I'm calling Devel::DProfPP, which parses the data output by Devel::DProf in a neat object-oriented way. I was about five minutes into it when I found myself writing:


    =head1 CONSTRUCTOR

        $object = Devel::DProfPP->new( %options )

    Creates a new C<Devel::DProfPP> instance.

    =cut

    sub new {
        my ($class, %opts) = @_;
        bless { %opts }, $class;
    }

And then I wrote my test. (I'm a bad boy: I write my tests after I write the code.)


    use Test::More;
    use_ok("Devel::DProfPP");
    my $x = Devel::DProfPP->new();
    isa_ok($x, "Devel::DProfPP");
    ...

Nothing strange about that, you might think. It's something I've written a dozen times before, and something I will probably write a dozen times again. And that's when it hit me. I don't want to spend my time pounding out constructors and accessors, documentation that's practically identical from class to class, and tests that are eminently predictable, and I never, ever want to have to write my $self = shift; again.

There are two ways to solve this. Now I'm not going to last long as editor of perl.com if I suggest everyone should do the first, and switching to Ruby wouldn't help with the documentation and tests part of the problem anyway.

The second is, of course, to make the computer do the hard work. I sketched out a short description of what I wanted in my Devel::DProfPP module, wrote a little parser to read in that description, and then had some code generate the module. Then, I made a critical decision. I had just been doing some work with Template Toolkit, and wanted some more opportunity to play with it. So instead of hard-coding what the output module ought to look like, I simply passed the parsed data structure to a bunch of templates and let them do the work. This gave me an amazing amount of flexibility in terms of styling the output, and that's where I think the power of this system, the Perl Object Oriented Language (POOL), lies.

In order to investigate that, let's look at some POOL files and the output they generate, and then we'll examine how the templates work and fit together.

Diving In

The POOL language is very, very ad hoc. It's bizarre and inconsistent, but that's because it was created by rationalizing a module description I scribbled down late one night. But it does the job. It's essentially a brain dump, and if your brain happens not to work like mine, then you might not like it; if yours does work like mine, my commiserations.

The POOL distribution, available from CPAN, ships with a handy reference manual, and that very description; these are the original notes I made when I was mocking up Devel::DProfPP, and they look like this:


    Devel::DProfPP - Parse C<Devel::DProf> output
        DESCRIPTION

    This module takes the output file from L<Devel::DProf> (typically
    F<tmon.out>) and parses it into a Perl data structure. Additionally, it
    can be used in an event-driven style to produce reports more
    immediately.

        EOD
        @fh
        ->@enter    || sub {}
        ->@leave    || sub {}
        ro->@stack  []
        @syms       []
        parse
        top_frame   ->stack->[0]

The first line should be familiar to anyone who writes or uses Perl modules: it's the description line that goes at the top of the documentation. It's enough to identify the class name and provide some documentation about it.

The next thing is the module description, again for the documentation, which begins with DESCRIPTION and ends with EOD.

When you look at the things starting with @, don't think Perl, think Ruby - they're not arrays, they're instance variables. Our Devel::DProfPP object will contain a filehandle to read the profiling data from, a subroutine reference for what to do when we enter a subroutine and when we leave one, the current Perl stack, and an array of symbols to hold the subroutine names.

These instance variables come in three types. The first are variables that the user doesn't set, but that come with every new instance. I call these "set" variables, because they come set to a particular value. Then there are "defaulting" instance variables, which the user may specify but which otherwise default to a particular value. And then there are just ordinary ones, which are not initialized at all. Thankfully for me, the Devel::DProfPP brain dump contained all three types.

The symbol table and the stack are "set" variables. They come set to an empty array reference; we signify this by simply putting an empty array reference on the same line.


        @syms       []

The enter subroutine, on the other hand, is a defaulting variable. (Don't worry about the arrow for now.) If the user doesn't specify an action to be performed when the profiler says that a subroutine has been entered, then we want to default to a coderef that does nothing.

In Perl, when we want to default to a value, we say something like:


    $object->{enter} = $args{enter} || sub {};

So the POOL encoding of that is:


        @enter      || sub {}

Finally, there's the filehandle, which the user supplies. Nothing special has to occur for this instance variable, so we just name it:


        @fh

From this, we know enough to create a constructor like so:


    sub new {
        my $class = shift;
        my %args = @_;
        my $self = bless {
            fh => $args{fh},
            enter => $args{enter} || sub {},
            leave => $args{leave} || sub {},
            syms => [],
            stack => [],
        }, $class;
        return $self;
    }

And that's precisely what POOL does. After the constructor come the accessors; we want to be able to say $obj->enter to retrieve the on-enter action, for instance. This thought led naturally to the syntax


    ->@enter || sub {}

When POOL sees an arrow attached to an instance variable, it creates an accessor for it:


    sub enter {
        my $self = shift;
        # Only assign when a new value was actually passed in.
        if (@_) { $self->{enter} = shift }

        return $self->{enter};
    }
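
To see the three kinds of instance variable and a read-write accessor working together, here is a minimal sketch. The package name My::Profile is a hypothetical stand-in, not the real Devel::DProfPP, and the code is hand-written in the shape described above rather than copied from POOL's output.

```perl
use strict;
use warnings;

package My::Profile;    # hypothetical stand-in for the generated class

sub new {
    my ($class, %args) = @_;
    return bless {
        fh    => $args{fh},                 # ordinary: taken as given
        enter => $args{enter} || sub { },   # defaulting: falls back to a no-op
        syms  => [],                        # set: always starts empty
    }, $class;
}

# Read-write accessor: sets the value only when an argument is passed.
sub enter {
    my $self = shift;
    $self->{enter} = shift if @_;
    return $self->{enter};
}

package main;

my $plain  = My::Profile->new();
my $hook   = sub { "entered $_[0]" };
my $custom = My::Profile->new(enter => $hook, syms => ['ignored']);

my $default_type = ref($plain->enter);              # the no-op default
my $message      = $custom->enter->('main::foo');   # the user's hook
my $syms_count   = scalar @{ $custom->{syms} };     # "set" ignores user input

print "$default_type / $message / $syms_count\n";
```

Note that the "set" variable syms stays empty even though the caller tried to supply a value, which is the behavior the constructor above implies.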

The stack accessor is an interesting one. First, we only want this to be an accessor and not a mutator -- we really don't want people modifying the profiler's idea of the stack behind its back. This is signified by the letters ro (read-only) before the accessor arrow.


    ro->@stack []

A further twist comes from the fact that POOL is still trying to DWIM around my brain and my brain expects POOL to be very clever indeed. Because we have declared stack to be set to an array reference, we know that the stack accessor deals with arrays. Hence, when $obj->stack is called, it should know to dereference the reference and return a list. This means the code ends up looking like this:


    sub stack {
        my $self = shift;
        return @{$self->{stack}};
    }

Aside from constructors and accessors, POOL knows about two other styles of method. (For now; there are more coming.) There are ordinary methods, which are simply named:


    parse

Sadly we can't DWIM the entire code for this method, so we generate the best we can:


    sub parse {
        my $self = shift;
        #...
    }

And the final style is the delegate. The sad thing about the delegate given in the DProfPP example is that it doesn't actually work, but we'll pretend that it does. Delegates are useful when you have an object that contains another object; you can provide a method in your class that simply diverts off to the contained object. For instance, if we have a class representing a mail message, then we may wish to store the head and body as separate objects inside our main object. (Mail::Internet does something like this, containing a Mail::Header object.) Now we can provide a method called get_header, which simply locates the header object and passes on its arguments to that:


    sub get_header {
        my $self = shift;
        $self->header->get(@_);
    }

In POOL lingo, this is a delegate via the header method, and get tells us how to do the delegation. It would be specified like this:


    get_header  ->header->get

Notice that this is precisely what appears in the middle of the Perl code for this method. An additional feature is that the "how" part of the delegation is optional. If we were happy for our top-level method to be called get instead of get_header, then we could say:


    get         ->header->

To me, this symbolizes going "through" the header method in order to call the get method.
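
Both delegate spellings can be tried out with a small sketch. My::Header and My::Message are hypothetical classes invented for illustration (they are not Mail::Internet or Mail::Header), and the two methods are written by hand in the shape the text describes.

```perl
use strict;
use warnings;

package My::Header;     # hypothetical contained object

sub new { my ($class, %fields) = @_; return bless { %fields }, $class }
sub get { my ($self, $name) = @_; return $self->{$name} }

package My::Message;    # hypothetical container

sub new {
    my ($class, %args) = @_;
    my $header = My::Header->new(%{ $args{header} || {} });
    return bless { header => $header }, $class;
}

sub header { return $_[0]{header} }

# POOL line:  get_header  ->header->get
sub get_header {
    my $self = shift;
    return $self->header->get(@_);
}

# POOL line:  get         ->header->
sub get {
    my $self = shift;
    return $self->header->get(@_);
}

package main;

my $msg       = My::Message->new(header => { Subject => 'Hello' });
my $via_named = $msg->get_header('Subject');
my $via_same  = $msg->get('Subject');
print "$via_named $via_same\n";
```

The two methods have identical bodies; only the name of the top-level method differs, which is why the "how" part can be left off when the names match.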

These are the basics of the POOL language, and we've seen a little of the code it generates. It also generates a full set of documentation and tests, as well as a MANIFEST file and Makefile.PL or Build.PL file, but we'll look at those a little later.
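
To make the syntax rules concrete, here is a rough sketch of how a single member line might be classified. This is not POOL's actual parser: the function classify_member, its regular expressions, and the keys of the hashes it returns are all assumptions made up for this illustration.

```perl
use strict;
use warnings;

# Classify one member line of a POOL class body.
# Hypothetical helper; the returned hash keys are invented for this sketch.
sub classify_member {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;

    # name  ->via->how   (the "how" part may be empty)
    if ($line =~ /^(\w+)\s+->(\w+)->(\S*)$/) {
        my ($name, $via, $how) = ($1, $2, $3);
        $how = $name unless length($how);
        return { kind => 'delegate', name => $name, via => $via, how => $how };
    }

    # [ro]->@name, then an optional "|| default" or a literal initial value
    if ($line =~ /^(?:(ro)?(->))?\@(\w+)\s*(?:(\|\|)\s*(.+)|(.+))?$/) {
        my ($ro, $arrow, $name, $or, $default, $initial) = ($1, $2, $3, $4, $5, $6);
        return {
            kind     => 'variable',
            name     => $name,
            style    => $or ? 'defaulting' : defined $initial ? 'set' : 'ordinary',
            value    => $or ? $default : $initial,
            accessor => $arrow ? ($ro ? 'ro' : 'rw') : 'none',
        };
    }

    return { kind => 'method', name => $line } if $line =~ /^\w+$/;
    return { kind => 'unknown', text => $line };
}

my @members = map { classify_member($_) } (
    '@fh',
    '->@enter    || sub {}',
    'ro->@stack  []',
    'parse',
    'get         ->header->',
);
printf "%-8s %s\n", $_->{kind}, $_->{name} for @members;
```

A data structure along these lines is the sort of thing that could then be handed to the templates, one entry per method or variable.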

In case you're interested, the reason why the delegation in the example doesn't work is because I was being too clever. I thought I could say:


    top_frame   ->stack->[0]

and have a top_frame method which "calls" [0] on the stack array reference, returning its first entry. This doesn't work for two reasons. First, I was too clever about ->stack and now it returns a list instead of an array reference. Second, delegates need to pass arguments, so POOL ends up generating code that looks like:


    return $self->stack->[0](@_);

(The third reason, of course, is that the top of the stack when represented as an array is element -1, not element 0. Oops.)

I thought about fixing this to do what I really, really mean, but decided that would be too nasty.
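
For comparison, here is a hand-written sketch of a top_frame that does work with a list-returning stack accessor. My::Profiler is a hypothetical stand-in for the generated class; the method is ordinary Perl, not something POOL produces.

```perl
use strict;
use warnings;

package My::Profiler;   # hypothetical stand-in for the generated class

sub new { my $class = shift; return bless { stack => [] }, $class }

# Read-only accessor as described above: returns a list, not a reference.
sub stack {
    my $self = shift;
    return @{ $self->{stack} };
}

# Slice the returned list and take the last element,
# which is the most recently entered frame.
sub top_frame {
    my $self = shift;
    return ($self->stack)[-1];
}

package main;

my $p = My::Profiler->new;
push @{ $p->{stack} }, 'main::outer', 'main::inner';
my $top = $p->top_frame;
print "$top\n";
```

The list slice sidesteps all three problems at once: it copes with a list rather than a reference, passes no arguments along, and indexes from the end.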

Another Example

Now that I had this neat tool for generating modules, I set it to work on the next module I wrote; this was a variant of Tree::Trie, a class to represent the trie data structure. Tries are a handy way of representing prefix and suffix relationships between words. They're conceptually simple; each letter in a word is inserted into a tree as the child of the previous letter. If we wanted a trie to count the prefixes in the phrase THERE IS A TAVERN IN THE TOWN, then we would first insert the chain T-H-E-R-E-#, then I-S-#, then A-#, and so on, where # represents "end-of-word". We'd end up with a trie looking like this:

Figure 1: An Example of a Trie
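
The insertion procedure can be sketched with nested hashes. This is only an illustration of the idea, not the code of Tree::Trie or Kasei::Trie; the function names and the count/children keys are assumptions.

```perl
use strict;
use warnings;

# Insert a word into a trie of nested hashes, counting how many words
# pass through each node. '#' marks end-of-word, as in the figure.
sub trie_insert {
    my ($trie, $word) = @_;
    my $node = $trie;
    for my $letter (split(//, $word), '#') {
        $node = $node->{children}{$letter} ||= { count => 0, children => {} };
        $node->{count}++;
    }
    return $trie;
}

# Number of inserted words that start with the given (non-empty) prefix.
sub prefix_count {
    my ($trie, $prefix) = @_;
    my $node = $trie;
    for my $letter (split //, $prefix) {
        $node = $node->{children}{$letter} or return 0;
    }
    return $node->{count};
}

my $trie = { children => {} };
trie_insert($trie, $_) for qw(THERE IS A TAVERN IN THE TOWN);

my $t_words   = prefix_count($trie, 'T');     # THERE, TAVERN, THE, TOWN
my $the_words = prefix_count($trie, 'THE');   # THERE, THE
my $i_words   = prefix_count($trie, 'I');     # IS, IN
print "T=$t_words THE=$the_words I=$i_words\n";
```

Because THERE and THE share the chain T-H-E, they share nodes, and the count on the E node records that two words pass through it.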

Tree::Trie is good at this sort of thing, but it didn't do a few extra things I needed, so I wrote an internal-use module called Kasei::Trie; this was the POOL file I used to generate it:


    Kasei::Trie - Implement the trie data structure
        DESCRIPTION

    "Trie"s are compact tree-like representations of multiple words, where
    each successive letter is introduced as the child of the previous.

        EOD
        @children {}
        insert

    Kasei::Trie::Node
        @children
        ->@data

The main class, Kasei::Trie, has a constructor with one instance variable that is initialized to be an empty hash reference, and one method, insert. There's also a secondary class representing each node in the trie, which has its own children, and has a data variable with its own accessor.

After generating this with POOL, all I needed to do was to fill in the code for the insert method, and modify some tests. A manifest, Makefile.PL, test suite with nine tests, and 161 lines of code and documentation were automatically created for me. I suspect that POOL saved me one to two hours.

The High Dive

Let's now take a look at the templates that make this all happen. The main template is called module, and it looks like this:


    package [% module.package %];
    [% INCLUDE module_header %]
    =head1 SYNOPSIS

    [% INCLUDE synopsis %]
    =head1 DESCRIPTION

    [% module.description  %]

    =head1 METHODS
    [% 
        FOREACH method = module.methods;
        INCLUDE do_method;
        END
    %]

    [% INCLUDE module_footer %]

    1;

As you can probably guess, in the Template Toolkit language, interesting things happen between [% and %]. Everything else is just output wholesale, but all kinds of things can happen inside the brackets. The first thing that happens is that we look at the module's package name. All the data we've collated from the parsing phase is stuffed into a hash reference, which we have passed into the template as module. The dot operator in Template Toolkit is a general do-the-right-thing operator that can be a method call, a hash reference look-up or an array reference look-up. In this case, it's a hash reference look-up, and we perform the equivalent of $module->{package} to extract the name.

Template Toolkit's [% INCLUDE template %] directive looks for the file template in its template path, processes it passing in all the relevant variables, and includes its output. So after the initial package ...; line, we include another template that contains everything that goes at the top of the module. As we'll see later, part of the beauty of templating things this way is that you can override templates by placing your own idea of what should go at the top of a module into your private version of module_header earlier in the template path, in a sense "inheriting" from the base set of templates.

Similarly, we include a file that will output the synopsis, and output the description that we collected between the DESCRIPTION and EOD lines of our POOL definition file.

Next, we want to document the various methods and output the code for them. POOL will have placed all the metadata for the methods we've defined, plus a constructor, in the appropriate order in the methods hash entry of module. As this is an array reference, we want to use a foreach-style loop to look at each method in turn. Not surprisingly, Template Toolkit's foreach-style loop is called FOREACH.

So this code:


    [% 
        FOREACH method = module.methods;
        INCLUDE do_method;
        END
    %]

will set a variable called method to each method in the array, and then call the do_method template. This simply dispatches to appropriate templates for each type of method. For instance, there's the set of templates for the "delegate" style; delegate_code looks like this:


    sub [% method.name %] {
        my $self = shift;
        return $self->[% method.via %]->[% method.how %](@_);
    }

Whereas the documentation template contains some generic commentary:


    =head2 [% method.name %]

    [% INCLUDE delegate_synopsis -%]
    Delegates to the [%method.how%] method of this object's [%method.via%].

    =cut

The synopsis that appears in the documentation here and in the synopsis at the top of the file simply explains how the delegation is done:


    $self->[% method.name %](...); # $self->[% method.via %]->[%method.how%]

Of course, there are some templates that are a little more complex, particularly those that generate the tests, but the main thing is that you can override any or none of those. If you don't like the standard same-terms-as-Perl-itself licensing block that appears at the end of the module, then create a file called ~/.pool/license containing:


    =head1 LICENSE

    This module is licensed under the Crowley Public License: do what thou
    wilt shall be the whole of the license.

POOL will pick up this template and use it instead of the standard one.
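
The override mechanism comes down to the order of directories in Template Toolkit's INCLUDE_PATH. Here is a minimal sketch, assuming Template Toolkit is installed from CPAN. The two temporary directories stand in for ~/.pool and POOL's stock template directory, the template contents are cut-down inventions, and putting the private directory first is an assumption about how POOL sets up its path.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use Template;

# Two throwaway directories stand in for ~/.pool and the stock templates.
my $private = tempdir(CLEANUP => 1);
my $stock   = tempdir(CLEANUP => 1);

sub write_template {
    my ($dir, $name, $text) = @_;
    my $file = File::Spec->catfile($dir, $name);
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} $text;
    close $fh or die "Cannot close $file: $!";
}

write_template($stock,   'module',  "package [% module.package %];\n[% INCLUDE license %]\n1;\n");
write_template($stock,   'license', "=head1 LICENSE\n\nSame terms as Perl itself.\n");
write_template($private, 'license', "=head1 LICENSE\n\nDo what thou wilt.\n");

# Directories are searched in order, so the private copy shadows the stock one.
my $tt = Template->new({ INCLUDE_PATH => [ $private, $stock ] })
    or die Template->error;

my $output = '';
$tt->process('module', { module => { package => 'Kasei::Trie' } }, \$output)
    or die $tt->error;
print $output;
```

Only license was overridden; module still comes from the stock directory, which is the "inheritance" effect described earlier.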

There's No P in Our POOL

When I started planning this article in the bath this morning, I realized that POOL is actually fantastically badly named; there's nothing actually Perl-specific about the language itself, and it's a handy definition language for any object-oriented system. Hence, I hereby retroactively name the project "the POOL Object Oriented Language", which also satisfies the recursive acronym freaks. But can we, using the same parser and templating system, turn POOL files into other languages? Of course we can; this is all part of the flexibility of the Template Toolkit system. What's more, we don't even have to override all of the templates in order to do so, just some of them. For instance, here's a Ruby equivalent of accessor_code:


    [% IF method.ro == "ro"; %]
        attr_reader :[% method.name %]
    [% ELSE; %]
        attr_accessor :[% method.name %]
    [% END; %]

do_method and module_footer, however, never need to change, since all they do is include other templates. With a complete set of templates, the same POOL description can be used to output a Perl, Ruby, Python, Java, and C++ implementation of a given class.

Going Deeper

When Frans Hals' famous painting "The Laughing Cavalier" was being examined in a museum's labs, someone had the bright idea of putting it through an X-ray machine. When they did this, they were amazed to find underneath the famous painting a completely different work -- a painting of a young girl. They then adjusted the settings on the X-ray machine and tried again, and underneath the young girl, they found another painting. Since then, it's been common practice to X-ray pictures, and art historians have found many layers of paint underneath some of the most-famous pictures.

What's this got to do with POOL? Well, very little, but I wanted to throw that in. Since I realised that POOL's templates can be inherited so easily, I've had the idea of POOL "flavors"; coherent sets of templates that can be layered like oil paintings to impart certain properties to the output.

For instance, at the moment, POOL outputs unit tests in separate files in the t/ directory, one for each class. Some people, however, prefer to have their tests in the module right alongside the documentation and implementation, using the method described in Test::Inline. Well, there's no reason why POOL shouldn't be able to support this. All you'd need to do is create a new directory, let's say testinline/, and put a modified version of do_method in there which says something like:


    [% INCLUDE method_pod %]
    =begin testing

    [% INCLUDE method_test %]

    =end testing

    =cut
    [% INCLUDE method_code %]

Next, arrange for testinline/ to appear in the Template Toolkit template path, and magically your tests will appear in the right place.

It's not inconceivable that multiple "flavors" could combine in order to theme a module; for instance, you might want a module which uses Test::Class for its tests, and Module::Build for its build file, with a BSD license flavor and Class::Accessor for its accessors instead of having them explicitly coded. Conceptually, you'd then say:


    pool --flavours=testclass,modulebuild,bsdlicense,classaccessor mymodule.pool

and the module would come out just as you want. This hasn't happened yet for two reasons: First, although it's only a two- or three-line change to the pool parser to support pushing these directories onto the template path, I haven't needed it yet so I haven't done it, and second, because I haven't written any flavors yet. But it's easy enough to do.
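
As a rough idea of what that small change might look like, here is a sketch that turns a --flavours option into an ordered list of template directories. Nothing here is taken from the pool program: the function template_path, the one-directory-per-flavor layout, and the directory names 'flavours' and 'templates' are all assumptions.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use File::Spec;

# Turn --flavours=a,b,c into an ordered list of template directories.
# $base and the one-directory-per-flavor layout are assumptions.
sub template_path {
    my ($argv, $base, $stock) = @_;
    my @flavours;
    GetOptionsFromArray($argv, 'flavours=s' => \@flavours)
        or die "Bad options\n";
    @flavours = map { split /,/, $_ } @flavours;

    # Flavor directories come first so they shadow the stock templates.
    return [ (map { File::Spec->catdir($base, $_) } @flavours), $stock ];
}

my @args = ('--flavours=testclass,modulebuild', 'mymodule.pool');
my $path = template_path(\@args, 'flavours', 'templates');

print join("\n", @$path), "\n";
print "remaining: @args\n";
```

The resulting array reference could be passed as Template Toolkit's INCLUDE_PATH, with earlier flavors taking precedence over later ones and over the stock templates.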

Other future directions for POOL include a syntax for class methods and class variables, support for other languages as mentioned above (which basically means ripping out the hard-coding of MANIFEST, Makefile.PL and so on and replacing that with a more flexible method), and other minor modifications. For instance, I'd like some syntax to specify dependencies: other Perl modules that will then be use'd in the main modules and named at the appropriate place in the Makefile.PL. And, of course, there's building up a library of flavors, including "total conversion" flavors like Ruby and Python.

The one thing that's becoming really, really important is the need for nondestructive editing -- the ability to fill in some additional code for a method, then regenerate the class from a slight change to the POOL file without losing the new method's code. I'm going to need to add that soon to allow for iterative redesigning of modules.

But the main thing about POOL is what it does now -- it saves me time, and it takes away the drudgery of writing OO classes in Perl.

And I will finish Devel::DProfPP soon. I promise.
