Building Good CPAN Modules
by Rob Kinyon
Major New Features
Some new features are so large that they change the name of the game. These include Unicode and threading. Unicode has had support, in one form or another, in every version of Perl 5. That support has slowly moved from modules (such as Unicode::String) to the Perl core itself.
Threading
In 5.8.0, Perl's threading model changed from the 5.005 model (which never worked very well) to ithreads (which do). Additionally, multi-core processors are coming to the smaller servers. More and more, developers using 5.8+ choose to write threaded applications.
This means that your module might have to play in a threaded playground, which is a weird place indeed to process-oriented folks. Now, Perl's threading model is unshared by default, which means that global variables are safe from clobbering each other. This is different from the standard threading model, like Java's, which shares all variables by default. Because of this decision, most modules will run under threads with little to no changes.
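To make the "unshared by default" point concrete, here is a minimal sketch (not part of the original discussion). The helper name demo_unshared_default is made up for illustration, and the sketch assumes a Perl built with ithreads; on any other build it simply returns nothing.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: returns [ plain, shared ] as seen by the parent
# after one child thread has incremented both variables.
# Returns nothing on a perl built without ithreads.
sub demo_unshared_default {
    return unless $Config{useithreads};

    # Loaded at run time so this file still compiles on non-threaded perls.
    require threads;
    require threads::shared;

    my $plain = 0;    # every thread gets its own private copy
    my $shared;
    &threads::shared::share( \$shared );    # & form bypasses the prototype
    $shared = 0;

    threads->create( sub {
        $plain++;                            # changes the child's copy only
        { lock($shared); $shared++; }        # changes the one shared value
    } )->join;

    return [ $plain, $shared ];
}

my $result = demo_unshared_default();
print "plain=$result->[0] shared=$result->[1]\n" if $result;
```

The parent still sees 0 in the plain variable because the child incremented its own clone; only the explicitly shared variable carries the update back.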
The main issue you will need to resolve is what happens with your stateful variables. These are the variables that persist and keep a value in between invocations of a subroutine, yet need coordination across threads. A good example is:
{
    my $counter;
    sub next_value { return ++$counter; }
}
If you depend on this counter being coordinated across every invocation of
the next_value() subroutine, you need to take three steps.
Sharing
Because Perl doesn't share your variables for you, you must explicitly share $counter to make sure that it is correctly updated across threads.
Locking
Because a context-switch between threads can happen at any time, you need to lock $counter within the next_value() subroutine.
Version safety
Also, because ithreads is an optional 5.8.0+ feature and the lock() subroutine is undefined before 5.6.0, you may want to do some version checks.
{
    my $counter = 0;
    if ( $] >= 5.008 && exists $INC{'threads.pm'} ) {
        require threads::shared;
        import threads::shared qw(share);
        share( $counter );
    }
    else {
        *lock = sub (*) {}
    }
    sub next_value {
        lock( $counter );
        $counter++;
    }
}
The best description that I've seen of what you need to do to port your application so that it works successfully under threads is the article "Where Wizards Fear to Tread," which covers Perl 5.8 threads.
Unicode
Although Unicode had some support prior to 5.8.0, a major feature in 5.8.0 was the near-seamless handling of Unicode within Perl itself. Prior to that, developers had to use Unicode::String and other modules. This means that you should handle strings as gingerly as possible if you consider support for Unicode on Perls prior to 5.8.0 important. Luckily, most major modules already do this for you without you having to worry about it.
Discussing how to handle Unicode cleanly is an article in itself. Please
see perlunicode and perluniintro for more
information.
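As one small illustration (a sketch of mine, assuming Perl 5.8.0 or later, where the Encode module ships with the core), the usual pattern is to decode octets into characters on the way in and encode them again on the way out. The variable names are arbitrary.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The UTF-8 octets for "cafe" with an acute accent on the e, written
# with escapes so that the encoding of this source file doesn't matter.
my $octets = "caf\xc3\xa9";

# Decode at the boundary: octets in, characters inside the program.
my $chars = decode( 'UTF-8', $octets );

my $octet_len = length $octets;    # 5: counts bytes
my $char_len  = length $chars;     # 4: counts characters

# Encode again on the way out.
my $round_trip = encode( 'UTF-8', $chars );
```

The two lengths differ because the accented letter takes two octets in UTF-8 but is a single character once decoded.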
Playing Nicely with Others
If you're like me, you heard "Doesn't play well with others" a lot in kindergarten. While that's an admirable trait for a hacker, it's not something to praise in any modules that production systems depend upon. There are several common items to look out for when trying to play nicely with others.
Persistent Environments
Persistent environments, like mod_perl and FastCGI, are a fact of life.
They make the WWW work. They are also a very different beast than a basic
script that runs, does its thing, and ends. Basically, a persistent
environment, such as mod_perl, does a few things.
Persistent interpreter
Launching the Perl executable is expensive, relatively speaking. In an environment such as a web application, every request is a separate invocation of a Perl script. Persistence keeps a Perl interpreter around in memory between invocations, reducing the startup overhead dramatically.
Forked children
In order to handle multiple requests at once, persistent environments tend to provide the capability for forked child processes, each with its own interpreter. Normally, this requires a copy of each module in every child's memory area.
Shared memory
Nearly every request will use the same modules (CGI, DBI, etc). Instead of loading them every time, persistent environments load them into shared memory that each of the child processes can access. This can save a lot of memory that would otherwise be required to load DBI once for every child, which lets the same machine create many more children and handle many more requests simultaneously.
Caching needs a special mention. Because most persistent environments load
most of the code into shared memory before forking off children, it makes sense
to load as much code that won't change as possible before forking. (If the code does
change, the child process receives a fresh copy of the modified memory space,
reducing the benefit of shared memory.) This means that modules need to be able
to pre-load what they need on demand. This is why CGI, which normally defers
loading anything as much as possible, provides the -compile pragma (as in
use CGI qw(-compile :all)) to load everything at once.
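A module of your own can offer the same courtesy. The following is only a sketch: My::Report, its helper list, and the preload() method are hypothetical names chosen for illustration, and the helpers are core modules so that the example runs anywhere.

```perl
use strict;
use warnings;

package My::Report;    # hypothetical module name

# Helpers that only some calls need. The list is illustrative.
my @HELPERS = qw( File::Spec Data::Dumper Text::Wrap );

# Load one helper on first use (the lazy path).
sub _load {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    require $file;
    return $module;
}

# Load everything now. A persistent environment would call this once
# in the parent process, before forking any children.
sub preload {
    _load($_) for @HELPERS;
    return scalar @HELPERS;
}

# Ordinary scripts pay for Data::Dumper only if they call this.
sub dump_data {
    my ( $class, $data ) = @_;
    _load('Data::Dumper');
    return Data::Dumper::Dumper($data);
}

package main;

# What a startup script might do before the fork:
my $loaded = My::Report->preload;
```

One-shot scripts keep the cheap startup, while a mod_perl startup file can ask for everything up front so the compiled code lands in memory shared with the children.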
The mod_perl folks have an excellent set of documentation as to what's
different about persistent environments, why you should care, and what you need
to do for your module to work right.
Overloading
It's very easy to create an overloaded class that cannot work with other
overloaded classes. For example, if I'm using Overload::Num1 and
Overload::Num2, I would expect $num1 + $num2 to DWIM.
Unfortunately, with most overloaded classes written as below, they won't. (For
more information as to how this code works, please read overload, or the excellent article "Overloading.")
sub add {
    my ($l, $r, $inv) = @_;
    ($l, $r) = ($r, $l) if $inv;
    $l = ref $l ? $l->numify : $l;
    $r = ref $r ? $r->numify : $r;
    return $l + $r;
}
Overload::Num1 uses the numify() method to retrieve the number
associated with the class. Overload::Num2 uses the get_number()
method. If I tried to use the two classes together, I would receive an error
that looks something like Can't locate object method "numify" via package
"Overload::Num2".
The solution is very simple--don't define an add() method.
Define a numify (0+) method, set fallback to true, and walk away.
You don't need to define a method for each operator. You only need to do so if
you have to do something special as part of doing that operation. For example,
complex numbers have to add the real and imaginary parts separately.
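Here is a minimal sketch of that advice. The two classes are stand-ins I wrote for the Overload::Num1 and Overload::Num2 examples above, not real CPAN modules; each one overloads only 0+ and sets fallback, and they deliberately use different method names and different internals.

```perl
use strict;
use warnings;

package Overload::Num1;    # stand-in class: hashref, numify()
use overload '0+' => \&numify, fallback => 1;
sub new    { my ( $class, $n ) = @_; return bless { n => $n }, $class }
sub numify { return $_[0]{n} }

package Overload::Num2;    # stand-in class: arrayref, get_number()
use overload '0+' => \&get_number, fallback => 1;
sub new        { my ( $class, $n ) = @_; return bless [$n], $class }
sub get_number { return $_[0][0] }

package main;

my $num1 = Overload::Num1->new(2);
my $num2 = Overload::Num2->new(3);

# Neither class defines '+'. Perl converts each operand through its
# own 0+ handler and then adds the plain numbers.
my $sum = $num1 + $num2;    # 5
```

Neither class needs to know the other exists, which is the point. The price is that the result is a plain number rather than an object of either class.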
If you absolutely have to define add(), though, use something
like this:
sub add {
    my ($l, $r, $inv) = @_;
    ($l, $r) = ($r, $l) if $inv;
    my $pkg = ref($l) || ref($r);
    # This is to explicitly call the appropriate numify() method
    $l = do {
        my $s = overload::Method( $l, '0+' );
        $s ? $s->($l) : $l
    };
    $r = do {
        my $s = overload::Method( $r, '0+' );
        $s ? $s->($r) : $r
    };
    return $pkg->new( $l + $r );
}
This way, each overloaded class can handle things its way. The assumption,
you'll notice, is to bless the return value into the class whose
add() the caller called. This is acceptable; someone called its
method, so someone thought it was top dog! (If you have an add method,
no numify method, and fallback activated, you will enter an infinite loop
because numify falls back to $x + 0.)
Finding Out What Something Is
At some point, your module needs to accept some data from somewhere. If you're like me, you want your module to DWIM based on what data it has received. Eventually, you want to know "Is it a scalar, arrayref, or hashref?" (Yes, I know there are seven different types in Perl.) There are many, many ways to do this. Some even work.
ref()
ref() is the time-honored way to dispatch based on datatype, resulting in code that looks like:
my $is_hash = ref( $data ) eq 'HASH';
The problem is that ref( $data ) will return the class name of $data if it's an object. If someone has defined a class named HASH (don't do that!) that uses blessed array references, this will also break spectacularly.
isa()
isa() will tell you whether a reference inherits from a class. The various datatypes are actually class-like. Some people suggest writing code like:
my $is_hash = UNIVERSAL::isa( $data, 'HASH' );
This will work whether or not $data is blessed. Again, though, if someone is mean enough to call a class HASH and bless an arrayref into it, you'll have trouble. Worse, this technique may break polymorphism spectacularly if $data is an object with an overloaded isa() method.
eval blocks
Just try the data as a hashref and see if it succeeds.
my $is_hash = eval { %{$data}; 1 };
This avoids the primary issue of the two options listed above, but this may unexpectedly succeed in the case of overloaded objects. If $data is a Number::Fraction, you will mistakenly use $data as a hash because Number::Fraction uses blessed hashes for objects, even though the intent is to use them as scalars.
Assume that objects are special
By using Scalar::Util's blessed() and reftype() functions, you can determine if a given scalar is a blessed reference or what type of reference it really is. If you want to find out if something is a hash reference, but you want to avoid the pitfalls listed above, write:
my $is_hash = ( !blessed( $data ) && ref $data eq 'HASH' );
# or
my $is_hash = reftype( $data ) eq 'HASH';
Nearly every use of overloading is to make an object behave as a scalar, as in Number::Fraction and similar classes. Using this technique allows you to respect the client's wishes more easily. You will still miss a few possibilities, such as (the somewhat eccentric) Object::MultiType (an excellent example of what you can do in Perl, if you put your mind to it).
My personal preference is to let $data tell you what it can do.
Object representations
Not all objects are blessed hashrefs. I like to represent my objects as arrayrefs, and other people use Inside-Out objects which are references to undef that work with hidden data. This means that my overloaded numbers are arrays, but I want you to treat them as scalars. Unless you ask $data how it wants you to treat it, how will you handle it correctly?
Overloading accessors
overload allows you to overload the accessor operators, such as @{} and %{}. This means that one can theoretically bless an array reference and provide the ability to access it as a hash reference. Object::MultiType is an example of this. It is a hashref that provides array-like access.
Unfortunately, the CPAN module that would do this doesn't exist, yet.
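In the meantime, a rough approximation is possible with overload::Method() and Scalar::Util. The sketch below is one possible policy, not an established API: acts_like_hash(), My::HashLike, and My::Plain are names invented for illustration. It treats an unblessed reference by its real type, and treats an object as hash-like only if the object says so by overloading %{}.

```perl
use strict;
use warnings;

package My::HashLike;    # hypothetical: an arrayref that overloads %{}
use overload '%{}' => sub { $_[0][0] }, fallback => 1;
sub new { my ( $class, %data ) = @_; return bless [ {%data} ], $class }

package My::Plain;       # hypothetical: a blessed hashref, no overloading
sub new { return bless {}, shift }

package main;
use overload ();
use Scalar::Util qw(blessed reftype);

# Hypothetical helper: may the caller use %{$data}?
sub acts_like_hash {
    my ($data) = @_;
    return 0 unless ref $data;
    if ( blessed($data) ) {
        # An object's internals are its own business; it is hash-like
        # only if it advertises that by overloading %{}.
        return overload::Method( $data, '%{}' ) ? 1 : 0;
    }
    return reftype($data) eq 'HASH' ? 1 : 0;
}

my @answers = map { acts_like_hash($_) } (
    { a => 1 },                      # plain hashref
    [ 1, 2 ],                        # plain arrayref
    My::HashLike->new( a => 1 ),     # array inside, hash interface
    My::Plain->new,                  # hash inside, no hash interface
    'just a string',
);
# @answers is (1, 0, 1, 0, 0)
```

Note that the blessed hashref is deliberately rejected while the blessed arrayref is accepted: the answer depends on the interface the object offers, not on how it is built.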
Letting Others Do Your Dirty Work
The modules that you and I use on a daily basis are, in general, as OS-portable, version-independent, and polite as possible. This means that the more your module depends upon other modules to do the dirty work, the less you have to worry about it. Modules like File::Spec and Scalar::Util exist to help you out. Other modules like XML::Parser will do their jobs, but also handle things like any Unicode you encounter so that you don't have to.
That said, you still have to be careful about whom your young module fraternizes with. Every module you add as a dependency is another module that can restrict where your module can live. If one of your module's dependencies is Windows-only, such as anything from the Win32 namespace, then your module is now Windows-only. If one of your dependencies has a bug, then you also have that bug. Fortunately, there are a few ways to bypass these problems.
Buggy dependencies
Generally, module authors fix bugs relatively quickly, especially if you've provided a test file that demonstrates the bug and a patch that makes those tests pass. Once your module's dependency has a new version released, you can release a new version that requires the version with the bug fix.
OS-specific dependencies
The first option is to accept it. If no one on Atari MiNT cares, then why should you? Alternatively, you can encapsulate the OS-dependent module and find another module that provides the same features on the OS you're trying to support. File::Spec is an excellent example of how to encapsulate OS-specific behavior behind a common API.
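To see what that encapsulation buys you, here is a short sketch using File::Spec itself. The path components are arbitrary; the OS-specific subclasses are loaded explicitly only to show what the common API hides.

```perl
use strict;
use warnings;
use File::Spec;
use File::Spec::Unix;
use File::Spec::Win32;

my @parts = ( 'lib', 'My', 'Module.pm' );

# Portable code calls the common API and never mentions a separator.
my $native = File::Spec->catfile(@parts);

# Behind that API, File::Spec picks an OS-specific implementation.
# Calling two of them directly shows the difference it papers over.
my $unix  = File::Spec::Unix->catfile(@parts);     # lib/My/Module.pm
my $win32 = File::Spec::Win32->catfile(@parts);    # lib\My\Module.pm
```

A module of your own can follow the same shape: one public package that callers use, which selects a backend based on $^O when it loads.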
There's a lot to keep in mind when writing a module for CPAN: OS and Perl versions, Unicode, threading, persistence--it can be very overwhelming at times. With a few simple steps and a willingness to let your users tell you what they need, you'll be the toast of the town.