Perl.com 
 Published on Perl.com http://www.perl.com/pub/a/2005/04/14/cpan_guidelines.html
See this if you're having trouble printing code examples

 

Building Good CPAN Modules
By Rob Kinyon
April 14, 2005

When you are planning to release a module to CPAN, one of your first tasks is figure out what OS, Perl version(s), and other environments you will and will not support. Often, the answers will come from what you can and cannot support, based on the features you want to provide and the modules and libraries you have used.

Many CPAN modules, however, unintentionally limit the places where they can work. There are several steps you can take to remove those limitations. Often, these steps are very simple changes that can actually enhance your module's functionality and maintainability.

It Runs On My Machine

You have the latest PowerBook, update from CPAN every day, and run the latest Perl version. The people using your module are not. Remember, just because an application or OS is older than your grandmother doesn't mean that it isn't useful anymore. Code doesn't spontaneously develop bugs over time, nor does it collect cruft that makes it run slower. Some vitally important applications have run untouched for 30+ years in languages that were deprecated when you were in diapers. These applications keep the lights on and keep track of all the money in the world, for example, and they typically run on very old computers.

Companies want to keep using their older systems because these systems work and they want to use Perl because Perl works everywhere. If you can leverage CPAN, you already have 90 percent of every Perl application written.

When in Rome

Perl runs on at least 93 different operating systems. In addition, there are 18 different productionized Perl 5 versions floating around out there (not counting the development branches and build options). 93 x 18 = 1674. That means your module could run on one of well over 1500 different OS/Perl version environments. Add in threading, Unicode, and other options, and there is simply no way you can test your poor module in all of the places it will end up!

Luckily, Perl also provides (many of) the answers.

Defining Your Needs

If you know that your module simply will not run in a certain environment, you should set up prerequisites. These allow you to provide a level of safety for your users. Prerequisites include:

Related Reading

Learning Perl Objects, References, and Modules
By Randal L. Schwartz

Operating System

What OS your module happens to land on is both less and more of an issue than most people realize. Most of us have had to work in both Unix-land and Windows-land, so we know of pitfalls with directory separators and hard-coding outside executables. However, there are other problems that only arise when your module lands in a place like VMS.

The VMS filesystem, for example, has the idea of a volume in a fully qualified filename. VMS also handles file permissions and file versioning very differently than the standard Unix/Win32/Mac model. An excellent example of how to handle these differences is the core module File::Spec.

Because this is an issue most authors have had to face at some point, there is a standard perlpod called, fittingly, perlport. If you follow what's in there, you will be just fine.

Perl Version

It's been over ten years since the release of Perl 5.0.0, and Perl has changed a lot in that time. Most installations, however, are not the latest and greatest version. The main reason is "If it ain't broke, don't fix it." There is no such thing as a safe upgrade.

Most applications have no need for the latest features and will never trip most of the bugs or security holes. They just aren't that complex. If you restrict your module to features only found in 5.8, or even 5.6, you will ignore a large number of potential users.

Security Improvements

Most security fixes are transparent to the programmer. If the algorithms behind Perl hashes improve, you won't see it. If a new release fixes a hole in suidperl, your module won't care.

Sometimes, however, a security fix is a new feature whose usage will (and should) become the accepted norm: for example, the three-arg form of open() of 5.6. In these cases, I use string-eval to try to use the new feature and default to the old feature if it doesn't work. (Checking $] here isn't helpful because if your Perl version is pre-5.6, it will still try to compile the three-arg form and complain.)

eval q{
    open( INFILE, ">", $filename )
          or die "Cannot open '$filename' for writing: $!\n";
}; if ( $@ ) {
    # Check to see if it's a compile error
    if ( $@ =~ /Too many arguments for open/ ) {
        open( INFILE, "> $filename" )
            or die "Cannot open '$filename' for writing: $!\n";
    }
    else {
        # Otherwise, rethrow the error
        die $@;
    }
}
Bug Fixes

Like security fixes, most bug fixes are transparent to the programmer. Most of us didn't notice that the hashing algorithm was less than optimal in 5.8.0 and had several improvements in 5.8.1. I know I didn't. In general, these will not affect you at all.

Unlike security fixes, if your module breaks on a bug in a prior version of Perl, there's probably not much you can do other than require the version where the bug fix occurred.

New Features

Everyone knows about use warnings; and our appearing in 5.6.0. You may, however, not know about the smaller changes. A good example is sorting.

5.8.0 changed sorting to be stable. This means that if the two items compare equally, the resulting list will preserve their original order. Prior versions of Perl made no such guarantee. This means that code like this may not do what you expect:

my @input = qw( abcd abce efgh );
my @output = sort {
    substr( $a, 0, 3 ) cmp substr( $b, 0, 3 )
} @input;

If you depend on the fact that @output will contain qw( abcd abce efgh ), your module may be run into problems on versions prior to 5.8.0. @output could contain qw( abce abcd efgh) because the sorting function considers abcd and abce identical.

Gotchas With OS and Perl Versions

Your module may be pristine when it comes to OS or Perl versions. Is the rest of your distribution? Your tests may betray a dependency that you weren't aware of.

For example, 5.6.0 added lexically scoped warnings. Instead of using the -w flag to the Perl executable, you can now say use warnings. Because enabling warnings is generally a good thing, this is a very common header for test files written by conscientious programmers using Perl 5.6.0+:

use strict;
use warnings;

use Test::More tests => 42;

Now, even if your module runs with Perls older than 5.6.0, your tests won't! This means your distribution will not install through CPAN or CPANPLUS. For administrators who install modules this way and who have better things to do that debug a module's tests, they won't install it.

by Rob Kinyon

Major New Features

Some new features are so large that they change the name of the game. These include Unicode and threading. Unicode has had support, in one form or another, in every version of Perl 5. That support has slowly moved from modules (such as Unicode::String) to the Perl core itself.

Threading

In 5.8.0, Perl's threading model changed from the 5.005 model (which never worked very well) to ithreads (which do). Additionally, multi-core processors are coming to the smaller servers. More and more, developers using 5.8+ choose to write threaded applications.

This means that your module might have to play in a threaded playground, which is a weird place indeed to process-oriented folks. Now, Perl's threading model is unshared by default, which means that global variables are safe from clobbering each other. This is different from the standard threading model, like Java's, which shares all variables by default. Because of this decision, most modules will run under threads with little to no changes.

The main issue you will need to resolve is what happens with your stateful variables. These are the variables that persist and keep a value in between invocations of a subroutine, yet need coordination across threads. A good example is:

{
    my $counter;
    sub next_value ( return ++$counter; }
}

If you depend on this counter being coordinated across every invocation of the next_value() subroutine, you need to take three steps.

The best description that I've seen of what you need to do to port your application to a threaded works successfully is "Where Wizards Fear to Tread" on Perl 5.8 threads.

Unicode

Although Unicode had some support prior to 5.8.0, a major feature in 5.8.0 was the near-seamless handling of Unicode within Perl itself. Prior that that, developers had to use Unicode::String and other modules. This means that you should look to handling strings as gingerly as possible if you consider support for Unicode on Perls prior to 5.8.0 as important. Luckily, most major modules already do this for you without you having to worry about it.

Discussing how to handle Unicode cleanly is an article in itself. Please see perlunicode and perluniintro for more information.

Playing Nicely with Others

If you're like me, you heard "Doesn't play well with others" a lot in kindergarten. While that's an admirable trait for a hacker, it's not something to praise in any modules that production systems depend upon. There are several common items to look out for when trying to play nicely with others.

Persistent Environments

Persistent environments, like mod_perl and FastCGI, are a fact of life. They make the WWW work. They are also a very different beast than a basic script that runs, does its thing, and ends. Basically, a persistent environment, such as mod_perl, does a few things.

Caching needs a special mention. Because most persistent environments load most of the code into shared memory before forking off children, it makes sense to load as much code that won't change as possible before forking. (If the code does change, the child process receives a fresh copy of the modified memory space, reducing the benefit of shared memory.) This means that modules need to be able to pre-load what they need on demand. This is why CGI, which normally defers loading anything as much as possible, provides the :all option to load everything at once.

The mod_perl folks have an excellent set of documentation as to what's different about persistent environments, why you should care, and what you need to do for your module to work right.

Overloading

It's very easy to create an overloaded class that cannot work with other overloaded classes. For example, if I'm using Overload::Num1 and Overload::Num2, I would expect $num1 + $num2 to DWIM. Unfortunately, with most overloaded classes written as below, they won't. (For more information as to how this code works, please read overload, or the excellent article "Overloading.")

sub add {
    my ($l, $r, $inv) = @_;
    ($l, $r) = ($r, $l) if $inv;

    $l = ref $l ? $l->numify : $l;
    $r = ref $r ? $r->numify : $r;

    return $l + $r;
}

Overload::Num1 uses the numify() method to retrieve the number associated with the class. Overload::Num2 uses the get_number() method. If I tried to use the two classes together, I would receive an error that looks something like Can't locate object method "numify" via package "Overload::Num2".

The solution is very simple--don't define an add() method. Define a numify (0+) method, set fallback to true, and walk away. You don't need to define a method for each option. You only need to do so if you have to do something special as part of doing that operation. For example, complex numbers have to add the rational and complex parts separately.

If you absolutely have to define add(), though, use something like this:

sub add {
    my ($l, $r, $inv) = @_;
    ($l, $r) = ($r, $l) if $inv;

    my $pkg = ref($l) || ref($r);

    # This is to explicitly call the appropriate numify() method
    $l = do {
        my $s = overload::Method( $l, '0+' );
        $s ? $s->($l) : $l
    };

    $r = do {
        my $s = overload::Method( $r, '0+' );
        $s ? $s->($r) : $r
    };

    return $pkg->new( $l + $r );
}

This way, each overloaded class can handle things its way. The assumption, you'll notice, is to bless the return value into the class whose add() the caller called. This is acceptable; someone called its method, so someone thought it was top dog! (If you have an add method, no numify method, and fallback activated, you will enter an infinite loop because numify falls back to $x + 0.)

Finding Out What Something Is

At some point, your module needs to accept some data from somewhere. If you're like me, you want your module to DWIM based on what data it has received. Eventually, you want to know "Is it a scalar, arrayref, or hashref?" (Yes, I know there are seven different types in Perl.) There are many, many ways to do this. Some even work.

Letting Others Do Your Dirty Work

The modules that you and I use on a daily basis are, in general, as OS-portable, version-independent, and polite as possible. This means that the more your module depends upon other modules to do the dirty work, the less you have to worry about it. Modules like File::Spec and Scalar::Util exist to help you out. Other modules like XML::Parser will do their jobs, but also handle things like any Unicode you encounter so that you don't have to.

That said, you still have to be careful with whom your young module fraternizes with. Every module you add as a dependency is another module that can restrict where your module can live. If one of your module's dependencies is Windows-only, such as anything from the Win32 namespace, then your module is now Windows-only. If one of your dependencies has a bug, then you also have that bug. Fortunately, there are a few ways to bypass these problems.

There's a lot to keep in mind when writing a module for CPAN: OS and Perl versions, Unicode, threading, persistence--it can be very overwhelming at times. With a few simple steps and a willingness to let your users tell you what they need, you'll be the toast of the town.

Perl.com Compilation Copyright © 1998-2006 O'Reilly Media, Inc.