Using Java Classes in Perl

I started a new job recently to refocus my career from systems administration to web development. Part of that move meant using Java as my primary language at work and using a relatively new technology from the Java Community Process (JCP), the Content Repository API for Java (JCR), which is a hierarchical database standard for storing content. However, not wanting to let the skills in my favorite language waste away, I've been toying with similar technologies at home using Perl. I decided to make a direct port of the JCR to Perl and did so by making Perl use an existing Java implementation via Inline::Java. While I ran into some snags along the way, I was happily surprised to find the process of using Java classes from Perl was fabulously easy.

Bringing the JCR to Perl

The key to using JCR from Perl is Inline::Java. This library allows a Perl program to call Java methods with very little effort. For an introduction to Inline::Java, I suggest starting where I did: Phil Crow's 2003 Perl.com article, "Bringing Java into Perl." I also relied heavily upon the documentation for Inline::Java, which is very complete, if not exhaustive.

To get started on using the JCR, I used the reference implementation, Jackrabbit. I downloaded the Jackrabbit JAR file, along with all the prerequisites, which I found on the Jackrabbit website under First Hops. Then, I wrote a small script using Inline::Java to load the Java classes from Jackrabbit, create a repository, and then quit. I was able to take the First Hop with Jackrabbit in Perl as fast or faster than in Java:

#!/usr/bin/perl
use strict; 
use warnings;
use Inline
    Java => 'STUDY',
    STUDY => [ qw(
        org.apache.jackrabbit.core.TransientRepository
        javax.jcr.Repository
    ) ],
    AUTOSTUDY => 1;

my $repository = org::apache::jackrabbit::core::TransientRepository->new;
my $session = $repository->login;
eval {
    my $user = $session->getUserID;
    my $name = $repository
        ->getDescriptor($javax::jcr::Repository::REP_NAME_DESC);
    print "Logged in as $user to a $name repository.\n";
};

if ($@) {
    print STDERR "Exception: ", $@->getMessage, "\n";
}

$session->logout;

This code is a direct Perl port of the first tutorial on the Jackrabbit website. To run the code, you must make sure your class path is correct. Because I initially dropped the JCR files into my working directory, I just ran these commands to get it to work:

% export CLASSPATH=$CLASSPATH:`echo *.jar | tr ' ' ':'`
% perl firsthop.pl

Within five minutes, I had a Perl script that could access the Jackrabbit libraries, create a repository, and log in as anonymous. This answered my first question: Can I port the JCR to Perl? Yes.

First Snags

After proceeding to the "Second Hop" in the Jackrabbit tutorial, I ran into my first snag. To create nodes and properties with Jackrabbit, you must log in using a username and password. However, the JCR uses an array of characters for the password argument. Because Inline::Java helpfully translates Java String objects into Perl scalars, I could not find a way to pass a character array from Perl.

I also realized that I did not want to use lengthy Java namespaces in my Perl code. Writing out org::apache::jackrabbit::core::TransientRepository or javax::jcr::Repository is not a very productive use of my time and makes for odd-looking Perl code.

In addition, I didn't want a library that depended on Jackrabbit. There are several other JCR implementations either already written or on the way. Day has CRX, there's another Open Source implementation named Jaceira in the works, and eXo has also created a JCR implementation, to name a few.

Given these difficulties and the potential for other problems that I knew would come up, it was time to build this project as a Perl module.

Creating the Wrappers

To create the abstraction I desired, it quickly became apparent that I needed a way to build wrappers around the stubs generated by Inline::Java. Therefore, I set about writing a script that could generate a Perl package for each library in the JCR. Each wrapper package would, in addition to helping wrap special cases, clean up the Java namespace using naming conventions that are more common to Perl code (particularly my Perl code, which is similar to Conway's conventions from Perl Best Practices).

Using Java Reflection

I first needed to discover the classes, methods, and fields to wrap. There are more than 50 classes, interfaces, and exceptions in the JCR specification--I'm too lazy to type all that. Furthermore, the JCR is currently under revision via JSR 283, and I don't want to update the class list again later. Finally, I want my wrappers to handle each method specifically because the use of AUTOLOAD() is evil (sometimes useful, but still evil).

I wrote a Java program to find all the classes in the JCR JAR file and write those class names out with additional information about methods, constructors, and fields. I used a YAML-formatted file to store the information. I made heavy use of the Java Reflection API to make this happen. You can see the full source of the JCR package generator in the Java::JCR distribution. Here's one entry in the YAML JCR package output file:

javax.jcr.SimpleCredentials:
  isa:
   - java.lang.Object
   - javax.jcr.Credentials
  has_constructors: 1
  methods:
    instance:
      getAttributeNames: Array:java.lang.String
      getUserID: java.lang.String
      toString: java.lang.String
      getPassword: Array:char
      getAttribute: java.lang.Object
      setAttribute: void
      removeAttribute: void

The information I chose to place in the YAML file is mostly the outcome of experimentation with the Perl generator script. Because I wrote a generic handler to perform the required unwrapping that can handle any set of arguments, I didn't bother to record the argument types here. On the other hand, knowing the return type, recorded after each method name, is helpful to my implementation.

Code Generation with Perl

Next, I wrote a Perl script to load the information in the YAML file and generate the packages. You can see the full source for package-generator.pl as well. This script is pretty ugly. I do all the work of generating the information in Perl with embedded here-documents. A much better way to do this would be to use a templating tool like Andy Wardley's Template Toolkit, which is what I'd ultimately like to do.

Basically, this program iterates over all the entries loaded from the YAML file and generates a package for each class. It creates a Perl package name from the Java package name and a Perl package file at the appropriate location in the distribution.

For example, javax.jcr.nodetype.ItemDefinition gets a Perl package name of Java::JCR::Nodetype::ItemDefinition and a file location of lib/Java/JCR/Nodetype/ItemDefinition.pm.
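The name mapping can be sketched in a few lines of Perl. This is an illustrative reconstruction, not the code from package-generator.pl: the helper name perl_names_for() is hypothetical, and it assumes every class lives under javax.jcr and that file paths use forward slashes.

```perl
use strict;
use warnings;

# Hypothetical helper: map a Java class name under javax.jcr to a Perl
# package name and a file path inside the distribution.
sub perl_names_for {
    my $java_class = shift;

    # Drop the javax.jcr prefix; what remains is subpackages plus class name.
    (my $rest = $java_class) =~ s/^javax\.jcr\.//
        or die "Not a javax.jcr class: $java_class\n";

    # Capitalize each part (nodetype -> Nodetype) and prepend Java::JCR.
    my @parts = ('Java', 'JCR', map { ucfirst $_ } split /\./, $rest);

    my $package = join '::', @parts;
    my $file    = join('/', 'lib', @parts) . '.pm';
    return ($package, $file);
}

my ($package, $file) = perl_names_for('javax.jcr.nodetype.ItemDefinition');
print "$package\n$file\n";
```

Run on the example above, this prints the package name and file location given in the text.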

The code injects a stock header and footer into the package file. All the real magic happens in between these.

Handling Static Fields

The code adds static fields by modifying the symbol table so that the wrappers point to the automatically generated stubs. For example, Java::JCR::PropertyType gets several entries like:

*STRING = *Java::JCR::javax::jcr::PropertyType::STRING;
*BINARY = *Java::JCR::javax::jcr::PropertyType::BINARY;
*LONG = *Java::JCR::javax::jcr::PropertyType::LONG;

For those who may not know, the first line makes the name Java::JCR::PropertyType::STRING an alias for the longer name, Java::JCR::javax::jcr::PropertyType::STRING, by modifying the symbol table directly.
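Here is a minimal, self-contained sketch of that aliasing technique in plain Perl, with no Inline::Java involved. The package names Long::Stub::PropertyType and Short::PropertyType are hypothetical stand-ins for a generated stub and its wrapper, and the values are placeholders.

```perl
use strict;
use warnings;

# Stand-in for a stub package generated by Inline::Java (hypothetical name).
package Long::Stub::PropertyType;
our $STRING = 1;    # placeholder value for illustration
our $BINARY = 2;    # placeholder value for illustration

# Stand-in for the friendlier wrapper package (hypothetical name).
package Short::PropertyType;
*STRING = *Long::Stub::PropertyType::STRING;
*BINARY = *Long::Stub::PropertyType::BINARY;

package main;

# Both names now refer to the same variable.
print "short name: $Short::PropertyType::STRING\n";

# A change made through one name is visible through the other.
$Long::Stub::PropertyType::BINARY = 42;
print "after update: $Short::PropertyType::BINARY\n";
```

Because the glob assignment aliases rather than copies, the wrapper never goes stale if the underlying value changes.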

OK, looking at that, you probably want to know why all the Inline::Java stubs now have Java::JCR on the front of them. The reason is that in the generated code, I use the study_classes() routine to import the Java code and specify that the base package for the import should be Java::JCR:

study_classes(['javax.jcr.PropertyType'], 'Java::JCR');

Why? It's really not that critical, but I figured that because the name of the package I was putting on CPAN was Java::JCR, I really didn't want to drop packages into an external namespace while I was at it. Because the wrappers hide all the long names, the actual length of the internal names doesn't matter anyway.

Dealing with Constructors and Methods

After fields, the code checks whether the Java class provides a constructor (that is, if it's a class rather than an interface). As it turns out, I never actually use the code for dealing with constructors for two reasons:

  • Exceptions. For reasons I'll explain later, I don't generate the exception classes. Therefore, these constructors go unused.
  • SimpleCredentials. The only remaining class that has a constructor is javax.jcr.SimpleCredentials, which is the special case I've already mentioned. Therefore, I only need to cope with constructors as a special case. I'll cover the special cases later as well.

After the constructor, the program runs through each method and generates both the static and instance method wrappers. Here's a typical method wrapper from Java::JCR::Repository:

sub login {
    my $self = shift;
    my @args = Java::JCR::Base::_process_args(@_);

    my $result = eval { $self->{obj}->login(@args) };
    if ($@) { my $e = Java::JCR::Exception->new($@); croak $e }

    return Java::JCR::Base::_process_return($result, "javax.jcr.Session", "Java::JCR::Session");
}
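To give a feel for how a here-document can produce a wrapper like this one, here is a rough sketch. It is not the code from package-generator.pl; the function wrapper_source() and its argument order are hypothetical, and it only covers the simple instance-method case.

```perl
use strict;
use warnings;

# Hypothetical generator: build the text of one wrapper method from the
# Java method name, the Perl method name, the Java return type, and the
# Perl wrapper class for that type (omit it when there is no wrapper).
sub wrapper_source {
    my ($java_name, $perl_name, $java_type, $perl_class) = @_;

    # Wrap the result only when a wrapper class exists for the return type.
    my $return = defined $perl_class
        ? qq{Java::JCR::Base::_process_return(\$result, "$java_type", "$perl_class")}
        : q{$result};

    # Escaped sigils end up literally in the generated code; unescaped
    # variables are filled in now, at generation time.
    return <<"END_OF_SUB";
sub ${perl_name} {
    my \$self = shift;
    my \@args = Java::JCR::Base::_process_args(\@_);

    my \$result = eval { \$self->{obj}->${java_name}(\@args) };
    if (\$\@) { my \$e = Java::JCR::Exception->new(\$\@); croak \$e }

    return $return;
}
END_OF_SUB
}

print wrapper_source('login', 'login', 'javax.jcr.Session', 'Java::JCR::Session');
```

The backslash bookkeeping is the main reason a templating tool would be more pleasant for this job.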

Camel Case

This particular example doesn't show it, but I also changed the camel-case Java names of every method to all lowercase with underscores, which is a much more common way of naming methods in Perl. I may add aliases using the Java names in the future, but I don't care for Java-style naming conventions in Perl code. The most interesting part of this process was handling names that include all-caps abbreviations. That required two lines of Perl:

my $perl_method_name = $method_name;
$perl_method_name =~ s/(\p{IsLu}+)/_\L$1\E/g;

The /(\p{IsLu}+)/ matches any uppercase letter or run of uppercase letters. In the replacement, \L converts the matched text to lowercase (up to the \E), and I prepend an underscore to complete the conversion. Thus, the method named getDescriptor becomes get_descriptor and the method named getNodeByUUID becomes get_node_by_uuid. This won't work very well, by the way, if there are any names that have abbreviations before the end (for example, if there had been a getUUIDNode, which would become get_uuidnode). Fortunately, this case never shows up in the JCR API.
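The following sketch wraps that substitution in a sub and, for comparison, shows one possible variant that also copes with an abbreviation in the middle of a name. The sub names and the second variant are hypothetical; the JCR API does not need the variant.

```perl
use strict;
use warnings;

# The two-line conversion described above, wrapped in a sub.
sub simple_snake {
    my $name = shift;
    $name =~ s/(\p{IsLu}+)/_\L$1\E/g;
    return $name;
}

# Hypothetical variant: split an abbreviation from a following word first,
# then split lowercase-to-uppercase boundaries, then lowercase everything.
sub careful_snake {
    my $name = shift;
    $name =~ s/(\p{IsLu}+)(\p{IsLu}\p{IsLl})/$1_$2/g;   # UUIDNode -> UUID_Node
    $name =~ s/(\p{IsLl}|\d)(\p{IsLu})/$1_$2/g;         # getUUID  -> get_UUID
    return lc $name;
}

for my $java_name (qw(getDescriptor getNodeByUUID getUUIDNode)) {
    printf "%-15s %-20s %s\n",
        $java_name, simple_snake($java_name), careful_snake($java_name);
}
```

Both subs agree on the names that appear in the JCR; they differ only on the made-up getUUIDNode, where the variant yields get_uuid_node.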

Method Wrappers

Java::JCR::Base::_process_args() processes the arguments passed to each method. This function looks for any of the generated wrapper objects (anything that isa Java::JCR::Base) in the list of arguments and unwraps the generated stub by pulling the obj key out of the blessed hash.

sub _process_args {
    my @args;
    for my $arg (@_) {
        if (UNIVERSAL::isa($arg, 'Java::JCR::Base')) {
            push @args, $arg->{obj};
        }
        else {
            push @args, $arg;
        }
    }

    return @args; 
}

The wrapper then executes the wrapped method on the generated stub by passing it the unwrapped arguments (as if the wrappers weren't there).

I make sure to wrap every call in an eval because Inline::Java passes Java exceptions as Perl exception objects. If an exception is thrown, I wrap it in a custom class named Java::JCR::Exception, which I wrote by hand.

Finally, the code returns the result. If the return type has a wrapper, as is the case in login(), I use Java::JCR::Base::_process_return() to cast the class and wrap it.

sub _process_return {
    my $result = shift;
    my $java_package = shift;
    my $perl_package = shift;

    # Null is null
    if (!defined $result) {
        return $result;
    }

    # Process array results
    elsif ($java_package =~ /^Array:(.*)$/) {
        my $real_package = $1;
        return [
            map { bless { obj => cast($real_package, $_) }, $perl_package }
                @{ $result }
        ];
    }

    # Process scalar results
    else {
        return bless {
            obj => cast($java_package, $result),
        }, $perl_package;
    }
}

This brings up two considerations: Why the custom exception class? Why do I need to cast the object? In both cases, I do this to handle minor issues in Inline::Java.

In the case of exceptions, the generated exception objects don't handle Perl stringification very well. Because a lot of exception handlers assume that exceptions are strings or properly stringified, this can be (and has been for me) a problem. My exception class makes sure stringification works the right way.
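As a rough illustration of the stringification fix, here is a sketch that uses Perl's overload pragma. The class names are hypothetical and the real Java::JCR::Exception may well be written differently; My::FakeJavaException merely stands in for an Inline::Java exception stub that offers getMessage().

```perl
use strict;
use warnings;

# Hypothetical exception wrapper; the real Java::JCR::Exception may differ.
package My::WrappedException;

use Scalar::Util qw(blessed);
use overload
    '""'     => sub { $_[0]->message },   # used whenever the object is treated as a string
    fallback => 1;                        # let eq, ., and friends use the string form

sub new {
    my ($class, $original) = @_;
    return bless { original => $original }, $class;
}

# Ask the wrapped object for its message if it can provide one.
sub message {
    my $self     = shift;
    my $original = $self->{original};
    return blessed($original) && $original->can('getMessage')
        ? $original->getMessage
        : "$original";
}

# Minimal stand-in for a Java exception stub, for demonstration only.
package My::FakeJavaException;
sub new        { return bless { msg => $_[1] }, $_[0] }
sub getMessage { return $_[0]{msg} }

package main;

eval { die My::WrappedException->new(My::FakeJavaException->new('no such workspace')) };
print "Caught: $@\n";
```

Handlers that simply print or pattern-match $@ then see the Java message rather than an opaque object reference.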

As for the cast, Inline::Java works on the assumption that you want to use the class in its most specific form, but if it hasn't studied that form, you get a generic object on which you cannot call any methods. Rather than engage the potentially costly AUTOSTUDY option to make sure Inline::Java studies everything and then smarten up the wrappers more, I've chosen to cast the objects into the expected return type. This does limit some of the flexibility.

Loading Packages

Other than the custom pieces, I needed some additional helpers to get the job done. I didn't want to write out a lot of use statements to use this library. As a JAPH, I like to keep things simple. Therefore, if I need to use the JCR and Jackrabbit, I just want to say:

use Java::JCR;
use Java::JCR::Jackrabbit;

I included a package loader in the main package, Java::JCR, that will take care of these details and then created a package for each of the subpackages in the JCR. The loader looks like:

sub import_my_packages {
    my ($package_name, $package_file) = caller;
    my %excludes = map { $_ => 1 } @_;

    my $package_dir = $package_file;
    $package_dir =~ s/\.pm$//;
    my $package_glob = File::Spec->catfile($package_dir, '*.pm');

    for my $package (glob $package_glob) {
        $package =~ s/^\Q$package_dir\E\///;
        $package =~ s/\.pm$//;
        $package =~ s/\//::/g;

        next if $excludes{$package};

        eval "use ${package_name}::$package;";
        if ($@) { carp "Error loading $package: $@" }
    }
}

I make sure to call that method once the package has finished loading and pass in exclusions to keep it from loading all the subpackages. This needs further enhancement to allow for future extensions under the Java::JCR namespace, so as not to load them automatically, but this is a good starting point. I built one class for each subpackage, then, that inherits from Java::JCR and then calls this method to load each of those classes.

Connecting to Jackrabbit

Obviously, the next step was to create the code to connect to Jackrabbit. This was done in Java::JCR::Jackrabbit. The initial implementation is very simple:

use base qw( Java::JCR::Base Java::JCR::Repository );

use Inline (
    Java => 'STUDY',
    STUDY => [],
);
use Inline::Java qw( study_classes );

study_classes(['org.apache.jackrabbit.core.TransientRepository'], 'Java::JCR');

sub new {
    my $class = shift;

    return bless {
        obj => Java::JCR::org::apache::jackrabbit::core::TransientRepository
                ->new(@_),
    }, $class;
}

I extended Java::JCR::Repository to add a constructor that calls the Jackrabbit constructor. Done.

Handling Special Cases

With all that work, I still couldn't make the second hop because I still hadn't resolved the whole problem of passing an array of characters. However, with the infrastructure I had in place, this was now solvable.

I created an additional YAML configuration file named specials.yml. This file contains hand-coded alternatives to use where appropriate. I then wrote an alternative for the new constructor:

javax.jcr.SimpleCredentials:
  new: |-
    sub new {
        my $class = shift;
        my $user = shift;
        my $password = shift;

        my $charArray = Java::JCR::PerlUtils->charArray($password);

        return bless {
            obj => Java::JCR::javax::jcr::SimpleCredentials->new($user, $charArray),
        }, $class;
    }

Then, I reran the generator script. Fortunately, I had already improved it to use any implemented method or constructor rather than generating one automatically.

To perform the conversion, I also needed to embed a little extra Java code. I wrote a very small Java class called PerlUtils for handling the conversion:

use Inline (
    Java => <<'END_OF_JAVA',

class PerlUtils {
    public static char[] charArray(String str) {
        return str.toCharArray();
    }
}

END_OF_JAVA
);

Given a string, it returns an array of characters to pass back into the SimpleCredentials constructor. No other work is necessary. I could now perform the JCR second hop in Perl. That script attaches to a Jackrabbit repository, logs in as "username" with password "password" and then creates a node.

Using Handles as InputStreams

The third (and final) hop of the Jackrabbit tutorial demonstrates node import using an XML file. However, in order to perform the import shown, you must pass an InputStream off to the importXML() method. While Inline::Java provides the ability to use Java InputStreams as Perl file handles, it doesn't provide the mapping in the opposite direction. Thus I needed another special handler and an additional set of helper methods.

The special code configuration looks like:

javax.jcr.Session:
  import_xml: |-
    sub import_xml {
        my $self = shift;
        my $path = shift;
        my $handle = shift;
        my $behavior = shift;

        my $input_stream = Java::JCR::JavaUtils::input_stream($handle);

        $self->{obj}->importXML($path, $input_stream, $behavior);
    }

This calls the input_stream() method, which is a Perl subroutine.

sub input_stream {
    my $glob = shift;
    my $glob_val = $$glob;
    $glob_val =~ s/^\*//;
    my $glob_caller = Java::JCR::GlobCaller->new($glob_val);
    return Java::JCR::GlobInputStream->new($glob_caller);
}

As you can see, this subroutine uses two separate Java classes to provide the interface from a Perl file handle to Java InputStream. The first class, Java::JCR::GlobCaller, performs most of the real work using the callback features provided by Inline::Java. It gets passed to the Java::JCR::GlobInputStream, which calls read() whenever the JCR reads from the stream:

public int read() throws InlineJavaException, InlineJavaPerlException {
    String ch = (String) CallPerlSub(
            "Java::JCR::JavaUtils::read_one_byte", new Object[] {
                this.glob
           });
    return ch != null ? ch.charAt(0) : -1;
}

The read_one_byte() function is a very basic wrapper for the Perl built-in getc.

sub read_one_byte {
    my $glob = shift;
    my $c = getc $glob;
    return $c;
}

With this in place, you can now perform the third JCR hop in Perl. By executing this script, you will connect to a repository, log in, and then create nodes and properties from an XML file.
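The end-of-stream handling relies on getc returning undef once the handle is exhausted, which the Java side turns into -1. The sketch below exercises only the Perl half. It assumes a lexical in-memory file handle passed directly to the sub, whereas the article's code hands a glob name through Inline::Java.

```perl
use strict;
use warnings;

# Same shape as read_one_byte() above: one character per call,
# undef once the handle is exhausted.
sub read_one_byte {
    my $glob = shift;
    my $c = getc $glob;
    return $c;
}

# An in-memory file handle stands in for a real XML file.
open my $handle, '<', \"<a/>" or die "Cannot open in-memory handle: $!";

my @codes;
while (defined(my $c = read_one_byte($handle))) {
    push @codes, ord $c;    # the value Java's read() would return
}
push @codes, -1;            # what the Java side reports at end of stream

print "@codes\n";
```

Calling back into Perl once per character is simple but slow; a buffered read would be the obvious refinement for large imports.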

Getting Ready to Distribute

The implementation is now, more or less, complete. You can use Java::JCR to connect to a Jackrabbit repository, log in, create nodes and properties, and import data from XML. There's a lot left untested, but the essentials are now present. With this done, I was ready to prepare the distribution. However, because the library requires some Java libraries at run time, it has special needs if it is to build and install easily. You should be able to install it by just running:

% cpan Java::JCR

I needed a way to build this library. My preferred build tool is Ken Williams' Module::Build. It's in common use, compatible with the CPAN installer, and cooperates well with g-cpan.pl, which is a packaging tool for my favorite Linux distribution, Gentoo. Finally, it's easy to extend.

When customizing Module::Build, I prefer to create a custom build module rather than placing the extension directly inline in the Build.PL file. In this case, I've called the module Java::JCR::Build. I placed it inside a directory named inc/ with the rest of the tools I built for generating the package.

After creating the basic module that extends Module::Build, I added a custom action called get_jars to fetch the JAR files. I also added code to run this action during the build by overriding the code action, ACTION_code:

sub ACTION_get_jars {
    my $self = shift;

    eval "require LWP::UserAgent"
        or die "Failed to load LWP::UserAgent: $@";

    my $mirror_dir
        = File::Spec->catdir($self->blib, 'lib', 'Java', 'JCR');
    mkpath( $mirror_dir, 1);

    my $ua = LWP::UserAgent->new;

    print "Checking for needed jar files...\n";
    while (my ($file, $url) = each %jars) {
        my $path = File::Spec->catfile($mirror_dir, $file);
        $self->add_to_cleanup($path);

        next if -f $path;

        my $response = $ua->mirror($url, $path);
        if ($response->is_success) {
            print "Mirroring $url to $file.\n";
        }

        elsif ($response->is_error) {
            die "An error occurred fetching $url to $file: ",
                $response->status_line, "\n";
        }
    }
}

sub ACTION_code {
    my $self = shift;

    $self->ACTION_get_jars;
    $self->SUPER::ACTION_code;
}

I use Gisle Aas's LWP::UserAgent to fetch the JAR files from the public Maven repositories and drop them into the build library folder, blib. Module::Build will take care of the rest by copying those JAR files to the appropriate location during the install process.

I also needed some code in Java::JCR to set the CLASSPATH correctly ahead of time:

my $classpath;
BEGIN {
    my @classpath;
    my $this_path = $INC{'Java/JCR.pm'};
    $this_path =~ s/\.pm$//;
    my $jar_glob = File::Spec->catfile($this_path, "*.jar");
    for my $jar_file (glob $jar_glob) {
        push @classpath, $jar_file;
    }
    $classpath = join ':', @classpath, ($ENV{'CLASSPATH'} || '');
    $ENV{'CLASSPATH'} = $classpath;
}

This bit of code asks Perl for the path to the location of this library, which I assume is the installed location of the JAR files. Then, I find each file ending with .jar in that directory and put them into the CLASSPATH. Unfortunately, my code assumes a Unix environment when it uses the colon as the path separator. A future revision could make sure that this works on other systems as well, but because I use only Unix-based operating systems, my motivation is lacking.
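One way such a revision might look is to take the separator from Perl's own configuration instead of hard-coding a colon. This is only a sketch: build_classpath() is a hypothetical helper, not part of Java::JCR, and it assumes the platform's PATH separator recorded in %Config is also the right one for the Java class path.

```perl
use strict;
use warnings;
use Config;

# Join JAR paths (and any existing CLASSPATH) with the platform's path
# separator: ':' on Unix-like systems, ';' on Windows.
sub build_classpath {
    my ($existing, @jars) = @_;
    my $sep = $Config{path_sep};

    # Append the existing value only if there is one, to avoid a dangling separator.
    return join $sep, @jars,
        (defined $existing && length $existing ? $existing : ());
}

my $classpath = build_classpath($ENV{CLASSPATH}, 'a.jar', 'b.jar');
print "$classpath\n";
```

Dropping this into the BEGIN block would mean replacing the join ':' line and nothing else.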

With all that, you can now deploy this by downloading the tarball and running:

% perl Build.PL
% ./Build
% ./Build test
% ./Build install

It works!

Testing

I haven't mentioned this yet, but during the whole process of building this library, I also built a series of test cases. You can find these in the t/ directory of the distribution. The first few tests are actually just variations on the Jackrabbit tutorial, as well as a test to make sure the POD documentation contains no errors (every module author should use this test; you can just copy and paste it into any project).

Final Thoughts

I love Perl. This port from Java to Perl was easier than I would have thought possible. I wanted to share my success in the hopes of spurring on others. Kudos go to Ken Williams and Patrick LeBoutillier, and to the others who have helped them build the tools that made this possible.

Cheers.

Generating UML and Sequence Diagrams

Imagine yourself in a meeting with management. You're about to begin your third attempt to explain how to process online credit card payments. After a couple of sentences, you see some eyes glazing over. Someone says, "Perhaps you could draw us a picture."

Imagine me handling a recent request from my boss. He came in to the bat cave and said (in summary), "We want customers to sign up for email accounts without calling customer service. All the account creation code is in the customer care app." It didn't take long to find the relevant web screen, where the CSR presses Save to kick off the account creation, but there sure were a lot of layers between there and the final result. Keeping them in mind is hard enough when I'm deep in the problem. Three months from now, when an odd bug surfaces, it'll be nearly impossible without the right memory aid.

In both of these cases, the right diagram is the sequence diagram. (I'd show you mine for the situations above, but they're secret.) Sequence diagrams clearly show the time flow of method or function calls between modules. For complex systems, these diagrams can save a lot of time--like the time you and your fellow programmers spend during initial design, the time spent explaining what's possible to management, the time you spend remembering how things work when you revisit an old system that needs a new feature, and especially the time it takes a new programmer in your shop to get up to speed.

In short, sequence diagrams help with complex call stacks just as data model diagrams help with complex database schemas.

While the sequence diagram is useful to me, I don't like on-screen drawing tools. Therefore, I wrote the original UML::Sequence to make the drawings for me. With recent help from Dean Arnold, the current version has many nice features and is closer to standards compliance (but both Dean and I prefer a useful diagram to a compliant one). Using UML::Sequence, you can quickly make proposed diagrams of systems not yet built. You can even run it against existing programs to have it diagram what they actually do.

Reading a Sequence Diagram

If you already know how to read sequence diagrams, you can skip to the next section.

Because most uses of UML involve object-oriented projects, that's where I've drawn my examples. Don't think that objects are necessary for sequence diagrams. I've diagrammed many non-OO programs with it (including some in COBOL).

A simple example will work best for a first look at UML sequence diagrams, so consider rolling two dice. My over-engineered solution gives a nice diagram to discuss. In it, I made each die an object of the Die class and the pair of dice an object of the DiePair class. To roll the dice, I wrote a little script. Here are these pieces:

    package Die;
    use strict;

    sub new {
        my $class = shift;
        my $sides = shift || 6;
        return bless { SIDES => $sides }, $class;
    }

    sub roll {
        my $self       = shift;
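        # Parenthesize rand's argument: a bare "rand * ..." can be parsed
        # as rand(*...) rather than rand() * ...
        $self->{VALUE} = int( rand( $self->{SIDES} ) ) + 1;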

        return $self->{VALUE};
    }

    1;

The Die constructor takes an optional number of sides for the new die object, but supplies six as a default. It bundles that number of sides into a hash reference, blesses, and returns it.

The roll() method makes a random number and uses it to pick a new value for the die, which it returns.

DiePair is equally scintillating:

    package DiePair;
    use strict;

    use Die;

    sub new {
        my $class     = shift;
        my $self      = {};
        $self->{DIE1} = Die->new( shift );
        $self->{DIE2} = Die->new( shift );

        return bless $self, $class;
    }

    sub roll {
        my $self   = shift;
        my $value1 = $self->{DIE1}->roll();
        my $value2 = $self->{DIE2}->roll();

        $self->{TOTAL}   = $value1 + $value2;
        $self->{DOUBLES} = ( $value1 == $value2 ) ? 1 : 0;

        return $self->{TOTAL}, $self->{DOUBLES};
    }

    1;

The constructor makes two die objects and stores them in a hash reference, which it blesses and returns.

The roll() method rolls each die, storing the value, then totals them and decides whether the roll was doubles. It returns both total and doubles, saving the driver from having to call back for them.

Rather than modeling a real game like craps, I use a small driver, which will simplify the resulting diagram.

    #!/usr/bin/perl
    use strict;

    use DiePair;

    my $die_pair          = DiePair->new(6, 6);
    my ($total, $doubles) = $die_pair->roll();

    print "Your total is $total ";
    print "it was doubles" if $doubles;
    print "\n";

Figure 1 shows the sequence diagram for this driver.

the sequence diagram for the die roller
Figure 1. The sequence diagram for the die roller

Each package has a box at the top of the diagram. The script is in the main package (which is always Perl's default). Time flows from top to bottom. Arrows represent method (or function) calls.

The vertical boxes, or activations, represent the life of a call. Between the activations are dashed lines called the life lines of the objects.

You can see that main begins first (because its first activation is higher than the others). It calls new() on the DiePair class. That call lasts long enough for DiePair's constructor to call new() on the Die class twice.

After making the objects, the script calls roll() on the DiePair, which forwards the request to the individual dice.

This diagram is unorthodox. The boxes at the top should represent individual instances, not classes. Sometimes I prefer this style because it compacts the diagram horizontally. Figure 2 shows a more orthodox diagram (divergent only in the lack of name underlining).

a more orthodox UML diagram
Figure 2. A more orthodox UML diagram

You can see the individual Die objects that the DiePair instance aggregates, because there is now a box at the top for each object (use your imagination when thinking about the driver as an instance). The names do not come from the code; they are sequentially assigned from the class name.

Diagrams like this are especially helpful when many classes interact. For instance, many of them start with a user event (like a button press on a GUI application) and show how the view communicates with the controller and how the controller in turn communicates with the data model.

Another particularly useful application is for programs communicating via network sockets. In their diagrams, each program has a box, and the arrows represent writing on a socket. Note that UML sequence diagrams may also have dashed arrows, which show return values going back to callers. Unless there is something unusual about that value, there is no need to waste space on the diagram for those returns. However, in a network situation, showing the back and forth can be quite helpful. UML::Sequence now has support for return arrows.

Using UML::Sequence

Now that you understand how to read a sequence diagram, I can show you how to make them without mouse-driven drawing tools.

Making diagrams with UML::Sequence is a three-step process:

  1. Create a program or a text file.
  2. Use genericseq.pl to create an XML description of the diagram.
  3. Use a rendering script to turn the XML into an image file.

If the image is in the wrong format for your purposes, you might need an extra step to convert to another format.

Running Perl Programs

Here is how I generated Figure 1 above by running the driver program. If your program is in Perl, you can use this approach (see the next subsection for Java programs).

First, create a file listing the subs you want to see in the diagram:

    DiePair::new
    DiePair::roll
    Die::new
    Die::roll

I called this file roller.methods to correspond to the script's name, roller. When you make your method list, remember that sequence diagrams are visual space hogs, so pick a short list of the most important methods.

Then, run the program through the genericseq.pl script:

$ genericseq.pl UML::Sequence::PerlSeq roller.methods roller > roller.xml

UML::Sequence::PerlSeq uses the Perl debugger's hooks to profile the code as it runs, watching for the methods listed in roller.methods. The result is an XML file describing the calls that actually happened during this run.
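To make the idea of "watching a list of methods" concrete, here is a small sketch that records a call trace. It deliberately uses a different and much simpler technique than UML::Sequence::PerlSeq: rather than debugger hooks, it replaces each listed sub with a recording wrapper. The watch() function and the stripped-down Die and DiePair packages are stand-ins invented for this example.

```perl
use strict;
use warnings;

# Two tiny stand-in packages so the sketch runs on its own.
package Die;
sub new  { my $class = shift; return bless {}, $class }
sub roll { return 4 }

package DiePair;
sub new {
    my $class = shift;
    return bless { dice => [ Die->new, Die->new ] }, $class;
}
sub roll {
    my $self  = shift;
    my $total = 0;
    $total += $_->roll for @{ $self->{dice} };
    return $total;
}

package main;

our $depth = 0;    # current call depth among watched subs
my @trace;         # one indented line per watched call

# Replace each listed sub with a wrapper that records the call and then
# delegates to the original.
sub watch {
    for my $name (@_) {
        no strict 'refs';
        no warnings 'redefine';
        my $original = \&{$name};
        *{$name} = sub {
            push @trace, ('    ' x $depth) . $name;
            local $depth = $depth + 1;
            return $original->(@_);
        };
    }
}

watch(qw(DiePair::new DiePair::roll Die::new Die::roll));
DiePair->new->roll;
print "$_\n" for @trace;
```

The indented output mirrors the shape of the diagram: each level of indentation corresponds to one level of nesting among the activations.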

To turn this into a picture, use one of the image scripts:

$ seq2svg.pl roller.xml > roller.svg

Obviously, seq2svg.pl makes SVG images. If you have no way to view those, get Firefox 1.5, use a tool like the batik rasterizer, or use seq2rast.pl, which makes PNG images directly using the GD module.

If you want diagrams like Figure 2, use UML::Sequence::PerlOOSeq in place of UML::Sequence::PerlSeq when you run genericseq.pl.

Running Java Programs

I wrote UML::Sequence while working as a Java programmer, so I made it work on Java (at least sometimes it works). The process is similar to the above. First, make a methods file:

    ALL
    Roller
    DiePair
    Die

Here I use ALL to mean all methods from the following classes. You can also list full signatures (but they have to be full, valid, and expressed in the internal signature format as if generated by javap).

Then run genericseq.pl with UML::Sequence::JavaSeq in place of UML::Sequence::PerlSeq. Of course, this requires you to have a Java development environment on your machine. In particular, it must be able to find tools.jar, which provides the debugger hooks necessary to watch the calls.

Produce the image from the resulting XML file as shown earlier for Perl programs.

Text File Input

While I pat myself on the back every time I make a sequence diagram of a running program, that's not always (or even usually) practical. For instance, you might want to show the boss what you have planned for code you haven't written yet. Alternately, you might have a program that is so complex that no amount of tweaking the methods file will restrict the diagram enough to make it useful.

In these cases, there is a small text language you can use to specify the diagram. It is based on indentation and uses dot notation for method names. Here is a sample:

At Home.Wash Car
    Garage.retrieve bucket
    Kitchen.prepare bucket
        Kitchen.pour soap in bucket
        Kitchen.fill bucket
    Garage.get sponge
    Garage.open door
    Driveway.apply soapy water
    Driveway.rinse
    Driveway.empty bucket
    Garage.close door
    Garage.replace sponge
    Garage.replace bucket

Each line will become an arrow in the final diagram (except the first line). Indentation indicates the call depth. The "class" name comes before the dot and the "method" name after it.
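To make the outline rules concrete, here is a minimal sketch of how such a file could be read into call records. It is not the code inside UML::Sequence::SimpleSeq; the parse_outline helper is hypothetical, and it assumes four spaces per indentation level, as in the sample above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an indented outline into a list of call records.
# Assumes four spaces per indentation level, as in the sample above.
sub parse_outline {
    my ($text) = @_;
    my @calls;
    for my $line ( split /\n/, $text ) {
        next unless $line =~ /\S/;    # skip blank lines

        # Leading whitespace, then "class", a dot, and the "method".
        my ( $indent, $class, $method ) = $line =~ /^(\s*)([^.]+)\.(.+?)\s*$/
            or next;                  # skip lines without a dot

        push @calls, {
            depth  => length($indent) / 4,
            class  => $class,
            method => $method,
        };
    }
    return @calls;
}

my $outline = <<'END';
At Home.Wash Car
    Garage.retrieve bucket
    Kitchen.prepare bucket
        Kitchen.pour soap in bucket
END

# Print depth, class, and method for each call in the outline.
printf "%d %s -> %s\n", @{$_}{qw(depth class method)}
    for parse_outline($outline);
```

Each record carries the call depth taken from the indentation, which is all a renderer needs to decide where an arrow starts and ends.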

There is no need for a methods file in this case, because presumably you didn't bother to type things you didn't care about. You may go directly to running genericseq.pl:

$ genericseq.pl UML::Sequence::SimpleSeq inputfile > wash.xml

Once you have the XML file, render it as before.

Getting Fancy

As I mentioned earlier, Dean Arnold recently added lots of cool features to amaze and impress your bosses and/or clients. In particular, he expanded the legal syntax for text outlines. Here is his sample of car washing with the new features:

AtHome.Wash Car
        /* the bucket is in the garage */
    Garage.retrieve bucket
    Kitchen.prepare bucket
        Kitchen.pour soap in bucket
        Kitchen.fill bucket
    Garage.get sponge
    Garage.checkDoor
            -> clickDoorOpener
        [ ifDoorClosed ] Garage.open door
    * Driveway.apply soapy water
    ! Driveway.rinse
    Driveway.empty bucket
    Garage.close door
    Garage.replace sponge
    Garage.replace bucket

There are several new features here:

  • You can include UML annotations by using C-style comments, as shown on the second line of the example. Each annotation attaches to the following line as a footnote (or tooltip, if you install a third-party open source library).
  • There is a -> in front of clickDoorOpener. This becomes an asynchronous message arrow. When -> comes between a method and additional text, it indicates that a regular method is returning the value on the righthand side of the arrow. The return appears as a dashed arrow from the called activation back to the caller.
  • ifDoorClosed is in brackets, which mark a conditional in UML. These appear in the diagram in front of the method name.
  • There is a star in front of Driveway.apply, which indicates a loop construct in UML. (UML people call this iteration.)
  • There is an exclamation point in front of Driveway.rinse, indicating urgency.
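As a rough illustration of the markers in the list above, the following sketch classifies a single outline line by its prefix. The classify_line helper is hypothetical and is not the parser that ships with UML::Sequence; it also ignores the form where -> separates a method from its return value.

```perl
use strict;
use warnings;

# Hypothetical classifier for the markers listed above. It is not the
# parser shipped with UML::Sequence and ignores the "method -> value" form.
sub classify_line {
    my ($line) = @_;
    $line =~ s/^\s+//;
    $line =~ s/\s+$//;

    return { kind => 'annotation',   text => $1 }
        if $line =~ m{^/\*\s*(.*?)\s*\*/$};
    return { kind => 'asynchronous', text => $1 }
        if $line =~ /^->\s*(.+)$/;
    return { kind => 'conditional',  text => $2, condition => $1 }
        if $line =~ /^\[\s*(.+?)\s*\]\s*(.+)$/;
    return { kind => 'iteration',    text => $1 }
        if $line =~ /^\*\s*(.+)$/;
    return { kind => 'urgent',       text => $1 }
        if $line =~ /^!\s*(.+)$/;
    return { kind => 'call',         text => $line };
}

for my $line (
    '/* the bucket is in the garage */',
    '-> clickDoorOpener',
    '[ ifDoorClosed ] Garage.open door',
    '* Driveway.apply soapy water',
    '! Driveway.rinse',
    'Garage.close door',
) {
    my $info = classify_line($line);
    print "$info->{kind}: $info->{text}\n";
}
```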

In addition to these changes to the outline syntax, both seq2svg.pl and seq2rast.pl now support options to control appearance (including colors) and to generate HTML imagemaps for raster versions of the diagrams. The imagemaps hyperlink diagram elements--column headers and method call names--to supporting documents. For example, clicking on the Garage header will open Garage.html, while clicking on checkDoor will also open Garage.html, but at the #checkDoor anchor.

Summary

UML Sequence diagrams are a great way to see how function or method calls (or network messages) flow through a multi-module application, whether it is object-oriented or not. Using UML::Sequence and its helper scripts, you can make those diagrams without having to point and click in a drawing program.

References

The imagemapped HTML version of car washing is viewable online.

To read more about UML diagrams, check out the aptly named UML Distilled, by Martin Fowler, available from your favorite bookseller.

I recommend Walter Zorn's JavaScript, DHTML tooltips package to display embedded annotations.

Batik is an Apache project for managing and viewing SVG.

Still More Perl Lightning Articles

It has been common practice within the Perl community for ages to ship distributions with a Makefile.PL so that the user will be able to install the packages when he retrieves them, either via the shell which the CPAN/CPANPLUS modules offer or via manual CPAN download.

The Makefile.PL consists of meta-information, which in the case of the distribution HTML::Tagset is:

 # This -*-perl-*- program writes the Makefile for installing this distribution.
 #
 # See "perldoc perlmodinstall" or "perldoc ExtUtils::MakeMaker" for
 # info on how to control how the installation goes.

 require 5.004;
 use strict;
 use ExtUtils::MakeMaker;

 WriteMakefile(
     NAME            => 'HTML::Tagset',
     AUTHOR          => 'Andy Lester <andy@petdance.com>',
     VERSION_FROM    => 'Tagset.pm', # finds $VERSION
     ABSTRACT_FROM   => 'Tagset.pm', # retrieve abstract from module
     PMLIBDIRS       => [qw(lib/)],
     dist            => { COMPRESS => 'gzip -9f', SUFFIX => 'gz', },
     clean           => { FILES => 'HTML-Tagset-*' },
 );

Of interest are the arguments to WriteMakefile(), because they influence the Makefile written by ExtUtils::MakeMaker after the user has invoked the usual build and install procedure:

 % perl Makefile.PL
 % make
 % make test
 # make install

Module::Build, Successor of ExtUtils::MakeMaker?

Ken Williams grew tired of ExtUtils::MakeMaker and its portability issues, so he invented Module::Build as a successor. One goal of Module::Build is to run smoothly on most operating systems. It achieves this by generating only files written in valid Perl syntax instead of relying upon crufty Makefiles, which are often subject to misinterpretation because so many incompatible flavors of make exist in the wild.

The current maintainer of ExtUtils::MakeMaker, Michael G. Schwern, elaborated on this problem in his talk "MakeMaker is DOOMED."

Module::Build Distribution "Skeleton"

Consider the HTML::Tagset distribution again. Converting its Makefile.PL into a Build.PL with Module::Build::Convert produces this rough skeleton suitable for Module::Build:

 # This -*-perl-*- program writes the Makefile for installing this distribution.
 #
 # See "perldoc perlmodinstall" or "perldoc ExtUtils::MakeMaker" for
 # info on how to control how the installation goes.
 # Note: this file has been initially generated by Module::Build::Convert 0.24_01

 require 5.004;
 use strict;
 use warnings;

 use Module::Build;

 my $build = Module::Build->new
   (
    module_name => 'HTML::Tagset',
    dist_author => 'Andy Lester <andy@petdance.com>',
    dist_version_from => 'Tagset.pm',
    add_to_cleanup => [
                        'HTML-Tagset-*'
                      ],
    license => 'unknown',
    create_readme => 1,
    create_makefile_pl => 'traditional',
   );
  
 $build->create_build_script;

As you can see, while ExtUtils::MakeMaker prefers uppercased arguments, Module::Build goes by entirely lowercased arguments, which obey the rule of least surprise by being as intuitive as a description can be.

The build and installation procedure for a Module::Build distribution is:

 % perl Build.PL
 % perl Build
 % perl Build test
 # perl Build install

Module::Build::Convert's State of Operation

Module::Build::Convert actually does all of the background work and can be safely considered the back end, whereas make2build is the practical front-end utility. Module::Build::Convert currently exposes two kinds of operation: static approach and dynamic execution. The static approach parses the arguments contained within the Makefile.PL's WriteMakefile() call, whereas dynamic execution runs the Makefile.PL and captures the arguments provided to WriteMakefile().

Module::Build::Convert parses statically by default. Dynamic execution has a downside: the Makefile.PL code is interpreted and the interpreted output is written to the Build.PL, so the user of the distribution ends up with predefined values computed on the author's system. This is something to avoid whenever possible! If the parsing approach fails, perhaps by looping endlessly on input, Module::Build::Convert will reinitialize to perform dynamic execution of the Makefile.PL instead.

Data Section

Module::Build::Convert comes with a rather huge data section containing the argument conversion table, default arguments, sorting order, and begin and end code. If you wish to change this data, consider making a ~/.make2buildrc file by launching make2build with the -rc switch. Do not edit the Data section within Module::Build::Convert directly, unless you are sure you want to submit a patch.

Argument Conversion

On the left-hand side is MakeMaker's argument name, and on the right-hand side is Module::Build's equivalent.

 NAME                  module_name
 DISTNAME              dist_name
 ABSTRACT              dist_abstract
 AUTHOR                dist_author
 VERSION               dist_version
 VERSION_FROM          dist_version_from
 PREREQ_PM             requires
 PL_FILES              PL_files
 PM                    pm_files
 MAN1PODS              pod_files
 XS                    xs_files
 INC                   include_dirs
 INSTALLDIRS           installdirs
 DESTDIR               destdir
 CCFLAGS               extra_compiler_flags
 EXTRA_META            meta_add
 SIGN                  sign
 LICENSE               license
 clean.FILES           @add_to_cleanup
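One way to picture the conversion table is as a plain hash lookup. The sketch below is only an illustration under that assumption, not the code inside Module::Build::Convert; the convert_args helper is hypothetical, and it carries just a few rows of the table.

```perl
use strict;
use warnings;

# A few rows from the table above (subset chosen for illustration).
my %convert = (
    NAME         => 'module_name',
    AUTHOR       => 'dist_author',
    VERSION_FROM => 'dist_version_from',
    PREREQ_PM    => 'requires',
);

# Hypothetical helper: rename the keys it knows and report the rest.
sub convert_args {
    my (%makemaker) = @_;
    my ( %build, @unknown );
    for my $key ( sort keys %makemaker ) {
        if ( exists $convert{$key} ) {
            $build{ $convert{$key} } = $makemaker{$key};
        }
        else {
            push @unknown, $key;
        }
    }
    return ( \%build, \@unknown );
}

my ( $build, $unknown ) = convert_args(
    NAME         => 'HTML::Tagset',
    VERSION_FROM => 'Tagset.pm',
    PMLIBDIRS    => ['lib/'],
);

print "$_ => $build->{$_}\n" for sort keys %$build;
print "not converted: @$unknown\n";
```

Arguments without a table entry, such as PMLIBDIRS here, are collected separately so that they can be reported rather than silently dropped.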

Default Arguments

These are default Module::Build arguments to be added. Arguments with a leading # are ignored.

 #build_requires       HASH
 #recommends           HASH
 #conflicts            HASH
 license               unknown
 create_readme         1
 create_makefile_pl    traditional

Sorting Order

This is the sorting order for Module::Build arguments.

 module_name
 dist_name
 dist_abstract
 dist_author
 dist_version
 dist_version_from
 requires
 build_requires
 recommends
 conflicts
 PL_files
 pm_files
 pod_files
 xs_files
 include_dirs
 installdirs
 destdir
 add_to_cleanup
 extra_compiler_flags
 meta_add
 sign
 license
 create_readme
 create_makefile_pl
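A predefined order like this can be applied with a rank table. The following is a minimal sketch, not the actual implementation in Module::Build::Convert; the sort_build_args helper is hypothetical, the order list is shortened, and placing unknown arguments last in alphabetical order is my own assumption.

```perl
use strict;
use warnings;

# A shortened copy of the sorting order above (subset for illustration).
my @order = qw(module_name dist_author dist_version_from requires license);

# Map each known argument to its position in the list.
my %rank;
@rank{@order} = ( 0 .. $#order );

# Hypothetical helper: known arguments first, in table order; unknown
# arguments last, alphabetically (an assumption made for this sketch).
sub sort_build_args {
    my @names = @_;
    my $last  = scalar @order;
    return sort {
        ( exists $rank{$a} ? $rank{$a} : $last )
            <=> ( exists $rank{$b} ? $rank{$b} : $last )
            || $a cmp $b
    } @names;
}

print join( "\n", sort_build_args(qw(license my_custom_arg module_name requires)) ), "\n";
```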

Begin Code

Code that precedes converted Module::Build arguments. $(UPPERCASE) are stubs being substituted by Module::Build code.

 use strict;
 use warnings;

 use Module::Build;

 $MAKECODE

 my $b = Module::Build->new
 $INDENT(

End Code

Code that follows converted Module::Build arguments. $(UPPERCASE) are stubs being substituted by Module::Build code.

 $INDENT);

 $b->create_build_script;

 $MAKECODE

make2build Basic Usage

Using make2build is as easy as launching it in the directory of the distribution whose Makefile.PL you wish to convert.

For example:

% make2build

You may also provide the full path to the distribution, assuming, for example, you didn't cd directly into the distribution directory.

% make2build /path/to/HTML-Tagset*

In both cases, the command will convert any Makefile.PL files it finds and will generate no output, because make2build is quiet by default.

make2build Switches

As make2build aims to be a proper script, it provides, of course, both the -h (help screen) and -V (version) switches.

 % make2build -h
 % make2build -V

In case you end up with a mangled Build.PL written, you can examine the parsing process by launching make2build with the -d switch, enabling the pseudo-interactive debugging mode.

 % make2build -d

Should you not like the indentation length or judge it to be too small, increase it via the -l switch followed by an integer.

 % make2build -l length

If you don't agree with the sorting order predefined in Module::Build::Convert, you may enforce the native sorting order, which strives to arrange standard arguments with those seen available in the Makefile.PL.

 % make2build -n

The argument conversion table, default arguments to add, the sorting order of the arguments, and the begin and end code aren't absolute, either. Change them by invoking make2build with the -rc switch to create a resource configuration file in the home directory of the current user; that is likely ~/.make2buildrc.

 % make2build -rc

While make2build is quiet by default, there are two verbosity levels. To enforce verbosity level 1, launch make2build with -v. To enforce verbosity level 2, use -vv.

With -v, the code will warn about Makefile.PL options it does not understand or skips. With -vv, it will print the -v output plus the entire generated Build.PL.

 % make2build -v
 % make2build -vv

You may execute the Makefile.PL in the first place, but such usage is deprecated because Module::Build::Convert falls back to dynamic execution automatically when needed.

 % make2build -x (deprecated)

Swinging with Perl

Phil Crow

Perl does not have a native graphical user interface (GUI) toolkit. So we use all manner of existing GUI tools in front of our Perl applications. Often we use a web browser. We have long had Perl/Tk and other libraries based on C/C++. Now we can also use Java's Swing toolkit with similar ease.

In my sample application, when the user presses a button, Perl evaluates an arithmetic expression from the input text box. The result appears in another text box. I'll show the code for this application a piece at a time with a discussion after each piece. To see the whole thing, look in the examples directory of the Java::Swing distribution.

    #!/usr/bin/perl
    use strict; use warnings;

    BEGIN {
        $ENV{CLASSPATH} .= ':/path/to/Java/Swing/java'
    }

Java::Swing needs certain Java classes to be in the class path before it loads, so I've appended a path to those classes in a BEGIN block (this block must come before using Java::Swing).

    use Java::Swing;

This innocuous statement magically sets up namespaces for each Java Swing component, among other things.

    my $expression  = JTextField->new();
    my $answer      = JTextField->new( { columns => 10 } );
    my $submit      = JButton   ->new("Evaluate");
    my $frame       = JFrame    ->new();
    my $root_pane   = $frame->getContentPane();
    my $south_panel = JPanel->new();

After using Java::Swing, you can refer to Swing components as Perl classes. You can even pass named parameters to their constructors, as shown for the second JTextField.

    $south_panel->add(JLabel->new("Answer:"), "West");
    $south_panel->add($answer,                "Center");
    $south_panel->add($submit,                "East");

    $root_pane->add($expression,  "North");
    $root_pane->add($south_panel, "South");

    $frame->setSize(300, 100);
    $frame->show();

Most work with the components is the same as in any Java program. If you don't understand the above code, consult a good book on Swing (like the one from O'Reilly).

    my $swinger = Java::Swing->new();

This creates a Java::Swing instance to connect event listeners and to control the event loop.

    $swinger->connect(
        "ActionListener", $submit, { actionPerformed => \&evaluate }
    );

    $swinger->connect(
        "WindowListener", $frame, { windowClosing => \&ending }
    );

Connection is simple. Pass the listener type, the object to listen to, and a hash of code references to call back as events arrive.

    $swinger->start();

Start the event loop. After this, the program passively waits for event callbacks. It stops when one of the callbacks stops the event loop.

    sub evaluate {
        my $sender_name = shift;
        my $event       = shift;

        $answer->setText(eval $expression->getText());
    }

My evaluation is simple. I retrieve the text from the expression JTextField, eval it, and pass the result to setText on the answer JTextField. Using eval raises possible security concerns, so use it wisely.
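One way to reduce that risk is to check the text before handing it to eval. The sketch below is only an illustration; safe_eval is a hypothetical helper, not part of Java::Swing, and the whitelist of characters is my own assumption about what an arithmetic expression needs.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Java::Swing): evaluate the text only if
# it consists solely of digits, whitespace, dots, parentheses, and + - * /.
sub safe_eval {
    my ($expr) = @_;
    return unless defined $expr && $expr =~ m{^[\d\s.+\-*/()]+$};

    my $result = eval $expr;    # still a string eval, but on vetted input
    return if $@;               # e.g. a syntax error or division by zero
    return $result;
}

print safe_eval('2 + 3 * 4'), "\n";
print defined safe_eval('unlink "important"') ? "evaluated\n" : "rejected\n";
```

Inside evaluate, you could assign the result to a scalar first and show an error message in the answer field when it is undefined. A whitelist like this narrows what reaches eval; it is not a guarantee of safety.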

    sub ending {
        $swinger->stop();
    }

When the user closes the window, I stop the event loop by calling stop on the Java::Swing instance created earlier. This ends the program.

With Java::Swing, you can build Swing apps in Perl with some important bits of syntactic sugar. First, you don't need to have separate Java files or inline sections. Second, you can pass named arguments to constructors. Finally, you can easily connect event listeners to Perl callback code.

Scriptify Your Module

Josh McAdams

Recently during an MJD talk at Chicago.pm, I saw a little Perl trick that was so amazingly simple and yet so useful that it was hard to believe that more mongers in the crowd hadn't heard of it. The trick involved taking your module and adding a driver routine to it so the module could run as a script.

To illustrate, start with an example module that contains two utility subroutines that convert weights between pounds and kilograms. Each subroutine accepts a number and multiplies it by a conversion factor.

  package WeightConverter;
  
  use strict;
  use warnings;
  use constant LB_PER_KG => 2.20462262;
  use constant KG_PER_LB => 1/LB_PER_KG;
  
  sub kilograms_to_pounds { $_[0] * LB_PER_KG; }
  
  sub pounds_to_kilograms { $_[0] * KG_PER_LB; }
  
  1;

Assuming that the real module has a little error checking and POD, this module would serve you just fine. However, what if you decided that you needed to be able to easily do weight conversions from the command line? One option would be to write a Perl script that used WeightConverter. If that seems like too much effort, there is a one-liner that would do conversions.

  perl -MWeightConverter -e 'print WeightConverter::kilograms_to_pounds(1),"\n"'

This would do the trick, but it is a lot to remember and isn't very fun to type. There is a lot of benefit available from saving some form of script, and believe it or not, the module can hold that script. All that you have to do is write some driver subroutine and then call that subroutine if the module is not being used by another script. Here is an example driver for WeightConverter.

This example driver script just loops through the command-line arguments and tries to find instances where the argument contains either a k or p equal to some value. Based on whether or not you are starting with pounds or kilograms, it calls the appropriate subroutine and prints the results.

  sub run {
    for (@ARGV) {
      if(/^[-]{0,2}(k|p)\w*=(.+)$/) {
        $1 eq 'k' ?
          print "$2 kilograms is ", kilograms_to_pounds($2), " pounds\n" :
          print "$2 pounds is ", pounds_to_kilograms($2), " kilograms\n" ;
      }
    }
  }

Now all that is left is to tell the module to run the run subroutine if someone has run the module on its own. This is as easy as adding one line somewhere in the main body of the module.

  run unless caller;

All this statement does is execute the run subroutine unless the caller function returns a value. caller will only return true if WeightConverter is being used in another script. Now, this module is usable in other scripts as well as on the command line.

  $> perl WeightConverter.pm -kilos=2 -pounds=145 -k=.345
  2 kilograms is 4.40924524 pounds
  145 pounds is 65.7708937051548 kilograms
  .345 kilograms is 0.7605948039 pounds

Mocks in Your Test Fixtures

by chromatic

Since writing Test::MockObject, I've used it in nearly every complex test file I've written. It makes my life much easier to be able to control only what I need for the current group of tests.

I wish I'd written Test::MockObject::Extends earlier than I did; that module allows you to decorate an existing object with a mockable wrapper. It works just as the wrapped object does, but if you add any mocked methods, it will work like a regular mock object.

This is very useful when you don't want to go through all of the overhead of setting up your own mock object but do want to override one or two methods. (It's almost always the right thing to do instead of using Test::MockObject.)

Another very useful test module is Test::Class. It takes more work to understand and to use than Test::More, but it pays back that investment by allowing you to group, reuse, and organize tests in the same way you would group, reuse, and organize objects in your code. Instead of writing your tests procedurally, from the start to the end of a test file, you organize them into classes.

This is most useful when you've organized your code along similar lines. If you have a base class with a lot of behavior and a handful of subclasses that add and override a little bit of behavior, write a Test::Class-based test for the base class and smaller tests that inherit from the base test for the subclasses.

Goodbye, duplicate code.

Fixtures

Test::Class encourages you to group related tests into test methods. This allows you to override and extend those groups of tests in test subclasses. (Good OO design principles apply here; tests are still just code, after all.) One of the benefits of grouping tests in this way is that you can use test fixtures.

A test fixture is another method that runs before every test method. You can use fixtures to set up the test environment--creating a new object to test, resetting test data, and generally making sure that tests don't interfere with each other.

A standard test fixture might resemble:

  sub make_fixture :Test( setup )
  {
      my $self        = shift;
      $self->{object} = $self->test_class()->new();
  }

Assuming that there's a test_class() method that returns the name of the class being tested, this fixture creates a new instance before every test method and stores it as the object attribute. The test methods can then fetch this as normal.

Putting Them Together

I recently built some tests for a large system using Test::Class. Some of the tests had mockable features--they dealt with file or database errors, for example. I found myself creating a lot of little Test::MockObject::Extends instances within most of the tests.

Then inspiration struck. Duplication is bad. Repetition is bad. Factor it out into one place.

The insight was quick and sudden. If Test::MockObject::Extends is transparent (and if it isn't, please file a bug--I'll fix it), I can use it in the test fixture all the time and then be able to mock whenever I want without doing any setup. I changed my fixture to:

  sub make_fixture :Test( setup )
  {
      my $self        = shift;
      my $object      = $self->test_class()->new();
      $self->{object} = Test::MockObject::Extends->new( $object );
  }

The rest of my code remained unchanged, except that now I could delete several identical lines from several test methods.

Do note that, for this to work, you must adhere to good OO design principles in the code being tested. Don't assume that ref is always what you think it should be (and use the isa() method instead).
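The difference between ref and isa() is easy to see in a small sketch. The My::Recorder class and its save and record methods are hypothetical, made up for this illustration; the sketch assumes that Test::MockObject::Extends is installed.

```perl
use strict;
use warnings;
use Test::More tests => 3;
use Test::MockObject::Extends;

# Hypothetical class under test, defined inline to keep the sketch short.
package My::Recorder;
sub new    { return bless {}, shift }
sub save   { die "pretend this writes to a database\n" }
sub record { my ( $self, $msg ) = @_; return $self->save("LOG: $msg") }

package main;

my $object = Test::MockObject::Extends->new( My::Recorder->new() );

# The wrapper is blessed into a generated class, so compare with isa().
isnt( ref $object, 'My::Recorder', 'ref() no longer reports the original class' );
ok( $object->isa('My::Recorder'),  '... but isa() still recognizes it' );

# Override only the method that would touch the outside world.
$object->set_true('save');
ok( $object->record('hello'), 'record() works once save() is mocked' );
```

Code that checks ref($object) eq 'My::Recorder' would reject the wrapped object, while code that calls isa() keeps working.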

Sure, this is a one-line trick, but it removed a lot of busy work from my life and it illustrates two interesting techniques for managing tests. If you need simpler, more precise mocks, use Test::MockObject::Extends. If you need better organization and less duplication in your test files, use Test::Class. Like all good test modules, they work together almost flawlessly.
