March 2010 Archives

I'm working on a project with Curtis "Ovid" Poe and Adrian Howard. We use Perl 5.10.1, but because we control which version of Perl 5 we use, there's no reason not to test with Perl 5.12.0 -- and if we find bugs, we can report them and get them fixed in the proper place.

This application has its own quirks for setup and installation. I managed to clean up some of the worst offenses as my first work on the project; it installs and passes tests on my server with Perl 5.10.1, so it should install cleanly if all of its dependencies work with Perl 5.12.

My first approach was to manage my own parallel installation of Perl 5 with local::lib and a custom installation of Perl 5.12, but the manual intervention required to make all of that work was enough of a hassle that I took a tip from Chris Prather and installed App::perlbrew to manage my various installations (system Perl 5.10.0 built with threading, custom Perl 5.10.1 without threads, and now Perl 5.12.0 RC1).

    $ cpan App::perlbrew
    $ perlbrew init
    $ echo 'source /home/chromatic/perl5/perlbrew/etc/bashrc' >> ~/.bashrc
    $ source /home/chromatic/perl5/perlbrew/etc/bashrc
    $ perlbrew install perl-5.12.0-RC1 -as p512

The -as p512 option is optional; it lets me use p512 as a short name to refer to that particular installation when switching between versions.

After a while with no obvious output (which is fine), the end result is the ability to switch between parallel Perl 5 installations without them stomping on each other. They're all installed locally in my own home directory, so I can use CPAN or cpanminus to install modules without worrying about root access or messing up the system for anyone else.

I had already installed local::lib, but I'm not sure it's necessary in this case.

With the changes to my .bashrc, perl is now a symlink. Switching versions with perlbrew swaps that symlink, so every time I invoke perl directly, it uses the intended version. Shebang lines remain unaffected, though, so anything launched through its shebang line still runs a hard-coded path to Perl. Unfortunately, this includes cpanm, so I took to using an alias which does perl `which cpanm` as a temporary workaround (sketched below). Miyagawa suggested not using CPAN to install cpanminus. Instead, he recommends:

    $ curl -L http://cpanmin.us | perl - App::cpanminus

Note that you'll have to do this for every new version of Perl you install with perlbrew.
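For the record, the temporary alias mentioned above was nothing fancier than this (a sketch; it assumes cpanm is already somewhere in your PATH):

    $ alias cpanm='perl `which cpanm`'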

Here's the nice part of perlbrew. I can also install Perl 5.10.1 through it (replacing my custom installation) and switch between the two with a simple command:

    $ perlbrew switch p512
    $ perlbrew switch perl-5.10.1

You can see what you have installed with:

    $ perlbrew installed

For those of you curious as to the results of my experiments with 5.12.0, Devel::Cover doesn't work correctly yet, but that's not a requirement for this project. Devel::BeginLift needs a patch to build. Fortunately, that's available in the RT queue. A manual build and test worked just fine. Other than that, a little bit of babysitting on the installation satisfied all of the dependencies.

If I'd had to manage the installation (and module paths and...) of all of this software, I'd have spent a lot more time on the fiddly details of installing dependencies and not the interesting part. App::perlbrew allowed me to concentrate on what really matters: does my software work?

Perl 5.12.0 will come out soon. Use App::perlbrew to test code you care about with it.

Suppose that you want to load a module dynamically (you have the name in a scalar), then alias a function from that module to a new name in another class. In other words, you want a renaming import. How do you do that in Perl 5?

{
    no strict 'refs';
    eval qq{require $class} or die $@;
    *{$other_class."::".$alias} = $class->can($func);
}

There's a lot of magic going on there. Aliasing requires symbolic refs, which means turning off strict. Because you want strict off in as small a hunk of code as possible, you have to enclose it in braces. Then, require Class and require $class work differently, so you have to trick require into seeing a bareword by running it through a string eval. Don't forget to catch and rethrow the error! Finally, to do the aliasing you need to get a code ref with can() and assign it to the symbol table via the magic of typeglobs.

Guh. There's an idea in interface design called The Gulf of Execution which measures the distance between the user's goal and the actions she must take to achieve that goal. The goals here are to:

  1. Load a class from a variable.
  2. Alias a function in that class.

The actions are:

  1. Enclose the code in a block.
  2. Turn off strict.
  3. require $class inside a string eval so that require sees a bareword.
  4. Catch and rethrow any error which might result.
  5. Use can() to get a reference to the function.
  6. Construct a fully qualified name for the alias.
  7. Turn that into a typeglob.
  8. Assign the code ref to the typeglob.
  9. Drink.

Try explaining that to a non-Perl guru.

Now consider the perl5i (specifically perl5i::2) way:

$class->require
      ->can($func)
      ->alias($other_class, $alias);

Release the breath you've been holding in for the last 15 years of Perl 5.

Through the magic of autoboxing, perl5i lets you call methods on unblessed scalars, hashes, arrays, regexes, references... anything. It also implements some handy methods. Some, like require(), are core functions redone as methods. Others, like alias(), should have been core functions but never made it in for whatever reason. Autoboxing gives perl5i the freedom to add handy features without polluting the global function/method namespace with new keywords.

Recall the goals:

  1. Load a class from a variable.
  2. Alias a function in that class.

... and consider perl5i's actions:

  1. Call require() to load the class.
  2. Call can() to get a reference to the function.
  3. Call alias() on that reference to alias it to the other class.

The gulf has narrowed to a stream you can hop over while hardly getting your feet wet.

The goal of perl5i is to bring modern conveniences back to Perl 5. In the 15 years since the release of Perl 5, we've learned a lot. Our views of good practices have changed. 15 years ago, aliasing a function was black magic available only to the wildest of gurus. Now it's a technique many module authors take advantage of. Why should it remain complicated and error prone?

Autoboxing is a big part of perl5i, allowing it to add convenience methods without having to add new keywords. Adding new keywords--which shrinks the set of function names available to programmers--is a big thing holding Perl 5 back! Every potential new keyword is a debate over compatibility. Autoboxing eliminates that debate. It takes off the brakes.

Some other examples: how do I check if a scalar contains a number, an integer, or a float? The Perl FAQ entry on the subject is two pages long offering five different possibilities, two of which require pasting code. Code in FAQs tends to rot, and perlfaq is no exception; without testing nobody noticed that those venerable regexes fail to catch "+1.23". How does perl5i do it?

say "It's a number"   if $thing->is_number;
say "It's an integer" if $thing->is_integer;
say "It's a decimal"  if $thing->is_decimal;

It's clear, simple, fast, and tested. TMTOWTDI is great and all, but FAQs are about getting things done, not about writing dissertations on the subject. perl5i picks one way that's pretty good and makes it available with no fuss.

The bar for what is "simple" has moved since Perl 5 first came out. perl5i takes the goal of "simple things should be simple" and helps us all catch up.

Idioms, or How to Write Perlish Perl

Any language—programming or natural—develops idioms, or common patterns of expression. The earth revolves, but we speak of the sun rising or setting. We talk of clever hacks and nasty hacks and slinging code. We ping each other on IRC to discuss spaghetti code, and we factor and refactor away the artifacts of copy pasta.

As you learn Perl 5 in more detail, you will begin to see and understand common idioms. They're not quite language features—you don't have to use them—and they're not quite large enough that you can encapsulate them away behind functions and methods. They're something more than habits. They're mannerisms. They're our shared jargon of code. They're ways of writing Perl with a Perlish accent.

The Object as $self

Perl 5's object system treats the invocant of a method as a mundane parameter. The invocant of a class method (a string containing the name of the class) is that method's first parameter. The invocant of an object or instance method, the object itself, is that method's first parameter. You are free to use or ignore it as you see fit.

Idiomatic Perl 5 uses $class as the name of the class method invocant and $self for the name of the object invocant. This is a convention not enforced by the language itself, but it is a convention strong enough that useful extensions such as MooseX::Method::Signatures assume you will use $self as the name of the invocant by default.

This is true even if you use Moose.
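A minimal sketch of both conventions (the Cat class and its methods are invented for illustration):

    package Cat;

    # class method: the invocant is the name of the class
    sub new
    {
        my ($class, %args) = @_;
        return bless { %args }, $class;
    }

    # instance method: the invocant is the object itself
    sub speak
    {
        my $self = shift;
        return "$self->{name} says meow";
    }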

Named Parameters

Without a module such as signatures or MooseX::MultiMethods, Perl 5's argument passing mechanism is simple: all arguments flatten into a single list accessible through @_. While this simplicity is occasionally too simple—named parameters can be very useful at times—it does not preclude the use of idioms to provide named parameters.

The list context evaluation and assignment of @_ allows you to unpack named parameters pairwise. Even though this function call is equivalent to passing a comma-separated or qw//-created list, arranging the arguments as if they were true pairs of keys and values makes the caller-side look like the function supports named parameters:

    make_ice_cream_sundae(
        whipped_cream => 1,
        sprinkles     => 1,
        banana        => 0,
        ice_cream     => 'mint chocolate chip',
    );

The callee side can unpack these parameters into a hash and treat the hash as if it were a single argument:

    sub make_ice_cream_sundae
    {
        my %args = @_;

        my $ice_cream = get_ice_cream( $args{ice_cream} );
        ...
    }

This technique works well with import(); you can process as many parameters as you like before slurping the remainder into a hash:

    sub import
    {
        my ($class, %args)  = @_;
        my $calling_package = caller();

        ...
    }

Note how this idiom falls naturally out of list assignment; that makes this idiom Perlish.

The Schwartzian Transform

People new to Perl sometimes overlook the importance of lists and list processing as a fundamental component of expression evaluation (footnote: People explaining its importance in this fashion do not help). Put more simply, the ability for Perl programmers to chain expressions which evaluate to variable-length lists gives them countless ways to manipulate data effectively.

The Schwartzian transform is an elegant demonstration of that principle as an idiom handily borrowed from the Lisp family of languages. (Randal Schwartz's initial posting of the Schwartzian transform mentions "Speak[ing] with a lisp in Perl.")

Suppose you have a Perl hash which associates the names of your co-workers with their phone extensions:

    use 5.010;

    my %extensions =
    (
        1004 => 'Jerryd',
        1005 => 'Rudy',
        1006 => 'Juwan',
        1007 => 'Brandon',
        1010 => 'Joel',
        1012 => 'LaMarcus',
        1021 => 'Marcus',
        1024 => 'Andre',
        1023 => 'Martell',
        1052 => 'Greg',
        1088 => 'Nic',
    );

Suppose you want to print a list of extensions and co-workers sorted by their names, not their extensions. In other words, you need to sort a hash by its values. Sorting the values of the hash in string order is easy:

    my @sorted_names = sort values %extensions;

... but that loses the association of names with extensions. The beauty of the Schwartzian transform is that it solves this problem almost trivially. All you have to do is transform the data before and after sorting it to preserve the necessary information. This is most obvious when explained in multiple steps. First, convert the hash into a list of data structures which contain the vital information in sortable fashion. In this case, converting the hash pairs into two-element anonymous arrays will help:

    my @pairs = map { [ $_, $extensions{$_} ] } keys %extensions;

sort gets the list of anonymous arrays and can compare the second elements (the names) with a stringwise comparison:

    my @sorted_pairs = sort { $a->[1] cmp $b->[1] } @pairs;

Given @sorted_pairs, a second map operation can convert the data structure to a more usable form:

    my @formatted_exts = map { "$_->[1], ext. $_->[0]" } @sorted_pairs;

... and now you can print the whole thing:

    say for @formatted_exts;

Of course, this uses several temporary variables (with admittedly bad names). It's a worthwhile technique and good to understand, but the real magic is in the combination:

    say for
        map  { " $_->[1], ext. $_->[0]"          }
        sort {   $a->[1] cmp   $b->[1]           }
        map  { [ $_      =>    $extensions{$_} ] }
            keys %extensions;

Read the expression from right to left, in the order of evaluation. For each key in the extensions hash, make a two-item anonymous array containing the key and the value from the hash. Sort that list of anonymous arrays by their second elements, the values from the hash. Create a nicely formatted string of output from those sorted arrays.

The Schwartzian transform is this pipeline of map-sort-map where you transform a data structure into another form easier for sorting and then transform it back into your preferred form for modification.

In this case the transformation is relatively simple. Consider the case where calculating the right value to sort is expensive in time or memory, such as calculating a cryptographic hash for a large file. In that case, the Schwartzian transform is also useful because you can perform those expensive operations once (in the rightmost map), compare them repeatedly from a de-facto cache in the sort, and then remove them in the leftmost map.
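Here's a sketch of that situation, assuming a list of file names in @files and the core Digest::SHA module; each digest gets computed once per file, not once per comparison:

    use Digest::SHA;

    my @sorted_files =
        map  { $_->[0] }
        sort { $a->[1] cmp $b->[1] }
        map  { [ $_, Digest::SHA->new(256)->addfile($_)->hexdigest ] }
            @files;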

The original example in the comp.lang.perl.misc post shows an effective use of the transform, and a good programming technique in general. When the data you have isn't in the optimal form for what you want to do with it, first transform it into that optimal form, then manipulate it.

Phrased that way, the technique is so obvious as to seem trivial... but what is an idiom but a brilliant idea made vulgar by its ubiquity?

Every software distribution is a bunch of files written and maintained by programmers. The files are of three types: code, documentation, and crap—though this distinction is too subtle. Much of the documentation and code is crap, too. It's pointless. It's boring to write and to maintain, but convention dictates that it exist.

Perl's killer feature is the CPAN, and Dist::Zilla is a tool for packaging code to release to the CPAN. The central notion of Dzil is that no programmer should ever have to waste his or her precious time on boring things like README files, prerequisite accounting, duplicated license statements, or anything else other than solving real problems.

It's worth noting, too, that the "CPAN distribution" format is useful even if your code never escapes to the CPAN. Libraries packaged in any way are much easier to manage than their unpackaged counterparts, and any library packaged the CPAN way can interact with all the standard CPAN tools. As long as you're going to package up your code, you might as well use the same tools as everyone else in the game.

A Step-by-Step Conversion

Switching your old code to use Dist::Zilla is easy. You can be conservative and work in small steps, or you can go whole hog. This article demonstrates the process with one of my distributions, Number::Nary. To follow along, clone its git repository and start with the commit tagged pre-dzil. If you don't want to use git, that's fine. You'll still be able to see what's going on.

Replacing Makefile.PL

The first thing to do is to replace Makefile.PL, the traditional program for building and installing distributions (or dists). If you started with a Module::Build-based distribution, you'd replace Build.PL, instead. Dist::Zilla will build those files for you in the dist you ship so that installing users have them, but you'll never need to think about them again.

I packaged Number::Nary with Module::Install, the library that inspired me to build Dist::Zilla. Its Makefile.PL looked like:

  use inc::Module::Install;
  all_from('lib/Number/Nary.pm');
  requires('Carp'            => 0);
  requires('Test::More'      => 0);
  requires('List::MoreUtils' => 0.09);
  requires('Sub::Exporter'   => 0.90);
  requires('UDCode'          => 0);
  auto_manifest;
  extra_tests;
  WriteAll;

If I'd used ExtUtils::MakeMaker, it might've looked something like:

  use ExtUtils::MakeMaker;

  WriteMakefile(
    NAME      => 'Number::Nary',
    DISTNAME  => 'Number-Nary',
    AUTHOR    => 'Ricardo Signes <rjbs@cpan.org>',
    ABSTRACT  => 'encode and decode numbers as n-ary strings',
    VERSION   => '0.108',
    LICENSE   => 'perl',
    PREREQ_PM => {
      'Carp'                => 0,
      'List::MoreUtils'     => '0.09',
      'Sub::Exporter'       => 0,
      'Test::More'          => 0,
      'UDCode'              => 0,
    }
  );

Delete that file and replace it with the file dist.ini:

  name    = Number-Nary
  version = 0.108
  author  = Ricardo Signes <rjbs@cpan.org>
  license = Perl_5
  copyright_holder = Ricardo Signes

  [AllFiles]
  [MetaYAML]
  [MakeMaker]
  [Manifest]

  [Prereq]
  Carp            = 0
  Test::More      = 0
  List::MoreUtils = 0.09
  Sub::Exporter   = 0.90
  UDCode          = 0

Yes, this file contains more lines than the original version, but don't worry—that won't last long.

Most of this should be self-explanatory, but the cluster of square-bracketed names isn't. Each line enables a Dzil plugin, and every plugin helps with part of the well-defined process of building your dist. The plugins I've used here enable the absolute minimum behavior needed to replace Makefile.PL: they pull in all the files in your checkout. When you build the dist, they add the extra files you need to ship.

At this point, you can build a releasable tarball by running dzil build (instead of perl Makefile.PL && make dist). There are more savings on the way, too.
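In other words, the packaging step now looks like this:

  # previously
  $ perl Makefile.PL && make dist

  # with Dist::Zilla
  $ dzil build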

Eliminating Pointless Packaging Files

The MANIFEST.SKIP file tells other packaging tools which files to exclude when building a distribution. You can keep using it (with the ManifestSkip plugin), but you can almost always just drop the file and use the PruneCruft plugin instead. It prunes all the files people usually put in their skip file.

The CPAN community has a tradition of shipping lots of good documentation written in Pod. Even so, several tools expect you also to provide a plain README file. The Readme plugin will generate one for you.

Downstream distributors (like Linux distributions) like to see really clear license statements, especially in the form of a LICENSE file. Because your dist.ini knows the details of your license, the License plugin can generate this file for you.

All three of these plugins are part of the Dist::Zilla distribution. Thus you can delete three whole files—MANIFEST.SKIP, LICENSE, and README—at the cost of a couple of extra lines in dist.ini:

  [PruneCruft]
  [License]
  [Readme]

That's not bad, especially when you remember that now when you edit your dist version, license, or abstract, these generated files will always contain the new data.

Stock Tests

People expect CPAN authors to run several tests before releasing a distribution to the public. Number::Nary had three of them:

  xt/release/perl-critic.t
  xt/release/pod-coverage.t
  xt/release/pod-syntax.t

(Storing them under the ./xt/release directory indicates that only people interested in testing a new release should run them.)

These files are pretty simple, but the last thing you want is to find out that you've copied and pasted a slightly buggy version of the file around. Instead, you can generate these files as needed. If there's a bug, fix the plugin once and everything gets the fix on the next rebuild. Once again, you can delete those three files in favor of three plugins:

  [ExtraTests]
  [CriticTests]
  [PodTests]

CriticTests and PodTests add test files to your ./xt directory. ExtraTests rewrites them to live in ./t, but only under the correct circumstances, such as during release testing.

If you've customized your Pod coverage tests to consider certain methods trusted despite having no docs, you can move that configuration into your Pod itself. Add a line like:

  =for Pod::Coverage some_method some_other_method this_is_covered_too

The CriticTests plugin, by the way, does not come with Dist::Zilla. It's a third party plugin, written by Jerome Quelin. There are a bunch of those on the CPAN, and they're easy to install. [CriticTests] tells Dist::Zilla to load Dist::Zilla::Plugin::CriticTests. Install it with cpan or your package manager and you're ready to use the plugin.
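For example, with the stock cpan client:

  $ cpan Dist::Zilla::Plugin::CriticTests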

The @Classic Bundle and Cutting Releases

Because most of the time you want to use the same config everywhere, Dist::Zilla makes it easy to reuse configuration. The current dist.ini file is very close to the "Classic" old-school plugin bundle shipped with Dist::Zilla. You can replace all the plugin configuration (except for Prereq) with:

  [CriticTests]
  [@Classic]

...which makes for a nice, small config file.

Classic enables a few other plugins, most of which aren't worth mentioning right now. A notable exception is UploadToCPAN. It enables the command dzil release, which will build a tarball and upload it to the CPAN, assuming you have a ~/.dzil/config.ini which resembles:

  [!release]
  user     = rjbs
  password = PeasAreDelicious

Letting Dist::Zilla Alter Your Modules

So far, this Dist::Zilla configuration builds extra files like tests and packaging files. You can get a lot more out of Dist::Zilla if you also let it mess around with your library files.

Add the PkgVersion and PodVersion plugins to let Dist::Zilla take care of setting the version in every library file. They find .pm files and add an our $VERSION = ... declaration and a =head1 VERSION section to the Pod—which means you can delete all those lines from the code and not worry about keeping them up to date anymore.
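The built copy of a module then picks up something like this (a rough sketch of the added lines; the exact text and placement are up to the plugins):

  package Number::Nary;
  our $VERSION = '0.108';

  =head1 VERSION

  version 0.108

  =cut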

Prereq Detection

Now the dist.ini looks like:

  name    = Number-Nary
  version = 0.108
  author  = Ricardo Signes <rjbs@cpan.org>
  license = Perl_5
  copyright_holder = Ricardo Signes

  [CriticTests]
  [PodVersion]
  [PkgVersion]
  [@Classic]

  [Prereq]
  Carp            = 0
  Test::More      = 0
  List::MoreUtils = 0.09
  Sub::Exporter   = 0.90
  UDCode          = 0

Way too much of this file handles prerequisites. AutoPrereq fixes all of that by analyzing the code to determine all of the necessary dependencies and their versions. Install this third-party plugin (also by Jerome Quelin!) and replace Prereq with AutoPrereq. This plugin requires the use of the use MODULE VERSION form for modules which require specific versions. This is actually a very good thing, because it means that your code will no longer even compile if Perl cannot meet those prerequisites. It also keeps code and installation data in sync. (Make sure that you're requiring the right version in your code. Many dists require one version in the code and one in the prereq listing. Now that you have only one place to list the required version, make sure you get it right.)

You don't have to modify all use statements to that form. In this example, it's only necessary for List::MoreUtils and Sub::Exporter.
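A sketch of the version-carrying form for those two (the empty import lists here are purely for illustration; keep whatever import arguments you already use):

  use List::MoreUtils 0.09 ();
  use Sub::Exporter   0.90 ();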

Pod Rewriting

Now it's time to bring out some heavy guns. Pod::Weaver is a system for rewriting documentation. It can add sections, rejigger existing sections, or even translate non-Pod syntax into Pod as needed. Its basic built-in configuration can take the place of PodVersion, which allows you to delete gobs of boring boilerplate Pod. For example, you can get rid of all the NAME sections. All you need to do is provide an abstract in a comment. If your library says:

  package Number::Nary;
  # ABSTRACT: encode and decode numbers as n-ary strings

... then you'll get a NAME section containing that abstract. You can document methods and attributes and functions with =method and =attr and =func respectively. Pod::Weaver will gather them up, put them under a top-level heading, and make them into real Pod.
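For example, a function might be documented like this (the function name is invented for illustration):

  =func encode_number

  Encodes a number into an n-ary string using the digit set you provide.

  =cut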

You can delete your "License and Copyright" sections. Pod::Weaver will generate those just like Dist::Zilla generates a LICENSE file. It'll generate an AUTHOR section, so you can drop that too.

Release Automation

Now you're in the home stretch, ready to understand the "maximum overkill" approach to using Dist::Zilla. First, get rid of the version setting in the dist.ini and load the AutoVersion plugin. It will set a new version per day, or use any other sort of scheme you configure. Then add NextRelease, which will update the changelog with every new release. In other words, the changelog file now starts with:

  {{$NEXT}}
            updated distribution to use Dist::Zilla
            expect lots more releases now that it's so easy!

When you next run dzil release, the distribution will pick a new version number and build a dist using it. It will replace {{$NEXT}} with that version number (and the date and time of the build). After it has uploaded the release, it will update the changelog on disk to replace the marker with the release that was made and re-add it above, making room for notes on the next release.
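In dist.ini terms, the changes from this section amount to deleting the version = 0.108 line and adding two plugins:

  [AutoVersion]
  [NextRelease]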

Version Control

Finally, you can tie the whole thing into your version control system. I use Git. (That's convenient, because it's the only VCS with a Dist::Zilla plugin so far.) Add a single line to dist.ini:

  [@Git]

The Git plugin bundle will refuse to cut a release if there are uncommitted changes in the working tree. Once the tree is clean for a release, Dzil will commit the changes to the changelog, tag the release, and push the changes and the new tag to the remote origin.
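Roughly speaking, one release corresponds to these manual git steps (the changelog file name, version number, and tag name here are illustrative; the plugin bundle handles the real details):

  $ git commit Changes -m "v0.109"
  $ git tag 0.109
  $ git push origin master
  $ git push origin --tags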

Like CriticTests, the Dzil Git plugins aren't bundled with Dist::Zilla (thank Jerome Quelin one more time). The at sign in the plugin name indicates that it's a bundle of Dzil plugins, which you can load and install all at once. To install it, install Dist::Zilla::PluginBundle::Git.

Total Savings?

Switching this little dist to Dist::Zilla entirely eliminated seven files from the repository. It cleaned out a lot of garbage Pod that was a drag to maintain. It improved the chances that every dist will have consistent data throughout, and it made cutting a new release as easy as running dzil release. That release command will do absolutely everything needed to make a pristine, installable CPAN distribution, apart from the actual programming.

All told, it takes under half an hour to upgrade a dist to Dist::Zilla, depending on the number of files from which you have to delete cruft. Once you've converted a few, explore some Dzil plugins. When you see how easy it is to write one, you'll probably want to make a few of your own. Pretty soon you may find your dist.ini files contain exactly as much configuration as mine:

  [@RJBS]

That's the best kind of lazy.
