Sign In/My Account | View Cart  
advertisement


Listen Print Discuss

Building Good CPAN Modules
by Rob Kinyon | Pages: 1, 2

Major New Features

Some new features are so large that they change the name of the game. These include Unicode and threading. Unicode has had support, in one form or another, in every version of Perl 5. That support has slowly moved from modules (such as Unicode::String) to the Perl core itself.

Threading

In 5.8.0, Perl's threading model changed from the 5.005 model (which never worked very well) to ithreads (which do). Additionally, multi-core processors are coming to the smaller servers. More and more, developers using 5.8+ choose to write threaded applications.

This means that your module might have to play in a threaded playground, which is a weird place indeed to process-oriented folks. Now, Perl's threading model is unshared by default, which means that global variables are safe from clobbering each other. This is different from the standard threading model, like Java's, which shares all variables by default. Because of this decision, most modules will run under threads with little to no changes.

The main issue you will need to resolve is what happens with your stateful variables. These are the variables that persist and keep a value in between invocations of a subroutine, yet need coordination across threads. A good example is:

{
    my $counter;
    sub next_value ( return ++$counter; }
}

If you depend on this counter being coordinated across every invocation of the next_value() subroutine, you need to take three steps.

  • Sharing

    Because Perl doesn't share your variables for you, you must explicitly share $counter to make sure that it is correctly updated across threads.

  • Locking

    Because a context-switch between threads can happen at any time, you need to lock $counter within the next_value() subroutine.

  • Version safety

    Also, because ithreads is an optional 5.8.0+ feature and the lock() subroutine is undefined before 5.6.0+, you may want to do some version checks.

    {
        my $counter = 0;
        if ( $] >= 5.008 && exists $INC{'threads.pm'} ) {
            require threads::shared;
            import threads::shared qw(share);
            share( $counter );
        }
        else {
            *lock = sub (*) {}
        }
    
        sub next_value {
            lock( $counter );
            $counter++;
        }
    }

The best description that I've seen of what you need to do to port your application to a threaded works successfully is "Where Wizards Fear to Tread" on Perl 5.8 threads.

Unicode

Although Unicode had some support prior to 5.8.0, a major feature in 5.8.0 was the near-seamless handling of Unicode within Perl itself. Prior that that, developers had to use Unicode::String and other modules. This means that you should look to handling strings as gingerly as possible if you consider support for Unicode on Perls prior to 5.8.0 as important. Luckily, most major modules already do this for you without you having to worry about it.

Discussing how to handle Unicode cleanly is an article in itself. Please see perlunicode and perluniintro for more information.

Playing Nicely with Others

If you're like me, you heard "Doesn't play well with others" a lot in kindergarten. While that's an admirable trait for a hacker, it's not something to praise in any modules that production systems depend upon. There are several common items to look out for when trying to play nicely with others.

Persistent Environments

Persistent environments, like mod_perl and FastCGI, are a fact of life. They make the WWW work. They are also a very different beast than a basic script that runs, does its thing, and ends. Basically, a persistent environment, such as mod_perl, does a few things.

  • Persistent interpreter

    Launching the Perl executable is expensive, relatively speaking. In an environment such as a web application, every request is a separate invocation of a Perl script. Persistence keeps a Perl interpreter around in memory between invocations, reducing the startup overhead dramatically.

  • Forked children

    In order to handle multiple requests at once, persistent environments tend to provide the capability for forked child processes, each with its own interpreter. Normally, this requires a copy of each module in every child's memory area.

  • Shared memory

    Nearly every request will use the same modules (CGI, DBI, etc). Instead of loading them every time, persistent environments load them into shared memory that each of the child processes can access. This can save a lot of memory that would otherwise be required to load DBI once for every child. This allows the same machine to create many more children to handle many more requests simultaneously on the same machine.

Caching needs a special mention. Because most persistent environments load most of the code into shared memory before forking off children, it makes sense to load as much code that won't change as possible before forking. (If the code does change, the child process receives a fresh copy of the modified memory space, reducing the benefit of shared memory.) This means that modules need to be able to pre-load what they need on demand. This is why CGI, which normally defers loading anything as much as possible, provides the :all option to load everything at once.

The mod_perl folks have an excellent set of documentation as to what's different about persistent environments, why you should care, and what you need to do for your module to work right.

Overloading

It's very easy to create an overloaded class that cannot work with other overloaded classes. For example, if I'm using Overload::Num1 and Overload::Num2, I would expect $num1 + $num2 to DWIM. Unfortunately, with most overloaded classes written as below, they won't. (For more information as to how this code works, please read overload, or the excellent article "Overloading.")

sub add {
    my ($l, $r, $inv) = @_;
    ($l, $r) = ($r, $l) if $inv;

    $l = ref $l ? $l->numify : $l;
    $r = ref $r ? $r->numify : $r;

    return $l + $r;
}

Overload::Num1 uses the numify() method to retrieve the number associated with the class. Overload::Num2 uses the get_number() method. If I tried to use the two classes together, I would receive an error that looks something like Can't locate object method "numify" via package "Overload::Num2".

The solution is very simple--don't define an add() method. Define a numify (0+) method, set fallback to true, and walk away. You don't need to define a method for each option. You only need to do so if you have to do something special as part of doing that operation. For example, complex numbers have to add the rational and complex parts separately.

If you absolutely have to define add(), though, use something like this:

sub add {
    my ($l, $r, $inv) = @_;
    ($l, $r) = ($r, $l) if $inv;

    my $pkg = ref($l) || ref($r);

    # This is to explicitly call the appropriate numify() method
    $l = do {
        my $s = overload::Method( $l, '0+' );
        $s ? $s->($l) : $l
    };

    $r = do {
        my $s = overload::Method( $r, '0+' );
        $s ? $s->($r) : $r
    };

    return $pkg->new( $l + $r );
}

This way, each overloaded class can handle things its way. The assumption, you'll notice, is to bless the return value into the class whose add() the caller called. This is acceptable; someone called its method, so someone thought it was top dog! (If you have an add method, no numify method, and fallback activated, you will enter an infinite loop because numify falls back to $x + 0.)

Finding Out What Something Is

At some point, your module needs to accept some data from somewhere. If you're like me, you want your module to DWIM based on what data it has received. Eventually, you want to know "Is it a scalar, arrayref, or hashref?" (Yes, I know there are seven different types in Perl.) There are many, many ways to do this. Some even work.

  • ref()

    ref() is the time-honored way to dispatch based on datatype, resulting in code that looks like:

    my $is_hash = ref( $data ) eq 'HASH';

    The problem is that ref( $data ) will return the class name of $data if it's an object. If someone has defined a class named HASH (don't do that!) that uses blessed array references, this will also break spectacularly.

  • isa()

    isa() will tell you whether a reference inherits from a class. The various datatypes are actually class-like. Some people suggest writing code like:

    my $is_hash = UNIVERSAL::isa( $data, 'HASH' );

    This will work whether or not $data is blessed. Again, though, if someone is mean enough to call a class HASH and bless an arrayref into it, you'll have trouble. Worse, this technique may break polymorphism spectacularly if $data is an object with an overloaded isa() method.

  • eval blocks

    Just try the data as a hashref and see if it succeeds.

    my $is_hash = eval { %{$data}; 1 };

    This avoids the primary issue of the two options listed above, but this may unexpectedly succeed in the case of overloaded objects. If $data is a Number::Fraction, you will mistakenly use $data as a hash because Number::Fraction uses blessed hashes for objects, even though the intent is to use them as scalars.

  • Assume that objects are special

    By using Scalar::Util's blessed() and reftype() functions, you can determine if a given scalar is a blessed reference or what type of reference it really is. If you want to find out if something is a hash reference, but you want to avoid the pitfalls listed above, write:

    my $is_hash = ( !blessed( $data ) && ref $data eq 'HASH' );
    # or
    my $is_hash = reftype( $data ) eq 'HASH';

    Nearly every use of overloading is to make an object behave as a scalar, as in Number::Fraction and similar classes. Using this technique allows you to respect the client's wishes more easily. You will still miss a few possibilities, such as (the somewhat eccentric) Object::MultiType (an excellent example of what you can do in Perl, if you put your mind to it).

    My personal preference is to let $data tell you what it can do.

  • Object representations

    Not all objects are blessed hashrefs. I like to represent my objects as arrayrefs, and other people use Inside-Out objects which are references to undef that work with hidden data. This means that my overloaded numbers are arrays, but I want you to treat them as scalars. Unless you ask $data how it wants you to treat it, how will you handle it correctly?

  • Overloading accessors

    overload allows you to overload the accessor operators, such as @{} and %{}. This means that one can theoretically bless an array reference and provide the ability to access it as a hash reference. Object::MultiType is an example of this. It is a hashref that provides array-like access.

    Unfortunately, the CPAN module that would do this doesn't exist, yet.

Letting Others Do Your Dirty Work

The modules that you and I use on a daily basis are, in general, as OS-portable, version-independent, and polite as possible. This means that the more your module depends upon other modules to do the dirty work, the less you have to worry about it. Modules like File::Spec and Scalar::Util exist to help you out. Other modules like XML::Parser will do their jobs, but also handle things like any Unicode you encounter so that you don't have to.

That said, you still have to be careful with whom your young module fraternizes with. Every module you add as a dependency is another module that can restrict where your module can live. If one of your module's dependencies is Windows-only, such as anything from the Win32 namespace, then your module is now Windows-only. If one of your dependencies has a bug, then you also have that bug. Fortunately, there are a few ways to bypass these problems.

  • Buggy dependencies

    Generally, module authors fix bugs relatively quickly, especially if you've provided a test file that demonstrates the bug and a patch that makes those tests pass. Once your module's dependency has a new version released, you can release a new version that requires the version with the bug fix.

  • OS-specific dependencies

    The first option is to accept it. If no one on Atari MiNT cares, then why should you? Alternatively, you can encapsulate the OS-dependent module and find another module that provides the same features on the OS you're trying to support. File::Spec is an excellent example of how to encapsulate OS-specific behavior behind a common API.

There's a lot to keep in mind when writing a module for CPAN: OS and Perl versions, Unicode, threading, persistence--it can be very overwhelming at times. With a few simple steps and a willingness to let your users tell you what they need, you'll be the toast of the town.