July 2005 Archives

This Week in Perl 6, July 20-26, 2005


All--

Welcome to another Perl 6 summary, brought to you by microwaved Chinese food and air conditioning. I love the modern era. Without further ado, I bring you:

Perl 6 Compilers

PxPerl

Grégoire Péan announced the release of PxPerl 5.8.7-3, allowing easy access for people who want to play with Pugs and Parrot on Windows.

Test Report for Windows

Ronald Hill reported some failing tests for Pugs on Windows. Fortunately, given the pace of Pugs development, there is a reasonable chance of these problems being fixed quickly. Unfortunately, given the pace of Pugs development, no word of a fix made it to the list.

Parsing Perl 6 Rules

Nathan Gray wondered how Jeff Pinyan's parsing-Perl-6-rules project was going. Jeff said that it did not get very far, but he posted what he did have to his feather site.

Pugs Problems

Vadim Konovalov found two problems while playing with slurp. Adriano Ferreira showed him how to work around slurp not accepting a :raw option. Nobody commented on the peculiar $*ARGS[0] value when the argument is -foobarfluffy.

Official Perl 6 Rules Grammar

Patrick announced an "official Perl 6 grammar," which he will maintain closely with PGE in Parrot. It is incomplete at this point, but patches are most welcome.

PIL Nodes Descriptions

Allison Randal posted a request for clue batting, listing various types of nodes in PIL and explaining her guesses at their descriptions. Stuart Cook and Patrick both provided a little help, although they did not address everything on her list.

Perl 6 FAQ Patch

Autrijus provided a patch for the Perl 6 FAQ to remove an outdated question. Robert Spier applied the patch (modulo some confusion about staged versus live copies).

Parrot

Opcode Optimizability

Curtis Rawls noted that it is often simpler from an optimizer writer's standpoint to do constant folding and optimization on a smaller set of opcodes (just one variant of add instead of five--seven, if you count inc and dec). Leo explained that removing these opcodes isn't an option, but suggested adding a recommendation to the FAQ that compiler writers emit only the more verbose forms.

Refcounting Hash

Nicholas Clark wants to use a hash to hold reference counts for Ponie (something like dod_register_pmc in pmc.c), but he doesn't want to duplicate code. Leo suggested that he move some of the code into a PMC and then switch the real registry to use that PMC.

New PGE Test

Mitchell N. Charity submitted a test for a "large" Pugs grammar. It currently fails. Patrick noted that the test likely came from rx_grammar.pl in the Pugs distribution. This probably led to his above addition of an "Official Perl 6 Rules Grammar."

JIT Emit Help

Adam Preble decided that he would play with an x86_64 code generator. Unfortunately, he hit some stumbling blocks. Leo offered to help him and provided pointers on #parrot.

Call Opcode Cleanup

Leo wants to clean up some of the various invoke opcodes. He posted a request for comment, but Warnock applies. It seems that Leo's requests for comments like this get Warnocked a lot.

spawnw Return Value

Prompted by Jonathan Worthington submitting a patch to make the spawnw tests pass on Windows (applied), Jerry Gay opened a TODO ticket for switching spawnw to return something object-like to wrap platform-specific oddities.

Bugs in ops2vim.pl

Amir Karger noticed a bug in ops2vim.pl and suggested a fix. Jerry Gay fixed it.

Leo's Ctx Branch Tests

Jerry Gay and Leo worked together to get Leo's context branch passing a few more tests on Windows. Nick Glencross wondered if the Python dynclasses tests were running, too. Jonathan Worthington explained that they were being skipped for the moment.

Raised by the Aliens

Matt Diephouse was surprised to discover that you cannot use addparent with a PMC for either argument. He suggested that it either should work or that the limitation should be officially documented.

Patches Accumulating

Leo requested that people with commit bits pick up some of the patches that were building up, as he was running a little low on tuits.

Dump CFG

Curtis Rawls moved the dump_cfg call from reg_alloc.c to cfg.c. Leo applied the patch.

string_to_cstring Leaks

Jonathan Worthington plugged a few leaks caused by string_to_cstring. Leo applied the patch.

Deleting Globals/Lexicals

Matt Diephouse noted that there was no way to delete globals or lexicals. Leo posted one (untested) way to do it.

Generating Win32 Executables

Jonathan Worthington laid some groundwork for generating executables on Windows. Leo applied the patch.

Library Loading on Win32

Jonathan Worthington beefed up the library searching logic in Parrot to be a little more Windowsy. Leo applied the patch.

PBC Merge Utility

Leo posted a request for a utility that could merge several PBC files into one.

Calling Super Methods

Matt Diephouse noticed that there was no way to call a superclass's version of a method. Leo pointed out a way to do it by accessing the slots of the parent class directly.

cmd Buffer Reallocation

Greg Bacon fixed a bug in the reallocation of the cmd buffer on Win32. Jonathan Worthington applied the patch.

Data::Dump (PGE)

Will Coleda added a TODO for making PGE's match objects compatible with Data::Dumper.

does Hash

Will Coleda wants Data::Dumper to check if an object does Hash or Array and dump it thusly if it has no default dump.

rx.ops's Future

Will Coleda wondered about the future of the rx ops. Brent "Dax" Royal-Gordon, who wrote them, reckons they are not long for this world. He mentioned, though, that the intstacks and the bitmap-handling code might be worth saving.

Debugger-List Breakpoints

Will Coleda noticed that the debugger was not quite compatible with Perl's. Leo replied that the debugger's whole command loop was a mess that required a turn of the crank.

\u Escape Issues

Will Coleda brought up an old ticket for some Unicode escape issues. Leo asked for a test case.

string -> int Conversions

Matt Diephouse noticed that there are no opcodes for octal and hex string-to-integer conversions. Leo suggested adding one of the form set Ix, Sy, Ibase (where Ibase ranges from 2 to 36).

Make make languages Failures Non-Fatal

Bernhard Schmalhofer suggested that make languages should not give up after the first failure, but should instead build the remaining languages.

Dynclasses on Windows

Nick Glencross and Jonathan Worthington discussed how to make dynclasses build on Windows.

Resizable*Array Allocation

Matt Fowles submitted a patch making all the various Resizable*Array PMCs share their allocation strategy. Bernhard Schmalhofer applied the patch.

MMD Roundup: Take 2

Nicholas Clark attempted to de-Warnock a suggested change by Leo. Unfortunately, his thoughts on the matter were, "This is really a call for the designer to make, isn't it?" Leo suggested starting a WARNOCKED file for these things. Will countered that adding it to the DESIGN section in docs/ROADMAP would mean that Chip need only look in one place.

Parrot Failures on Mac OS X

Nicholas Clark forwarded some failures on Mac OS X to the Parrot list (from the Ponie one).

Parrot Needs STDERR

Nicholas Clark noticed that running Parrot with a closed STDERR makes Parrot unhappy.

GMC

Alexandre Buisse and many others have been talking about his Generational Mark and Compact garbage collector. Plans are rapidly taking shape.

Perl 6 Language

User-Defined Context Behavior

Ingo Blechschmidt wanted to know how to make his own custom class that would act specially in list context. Thomas Sandlaß suggested overloading &infix:<=>. Sadly, his answer doesn't seem to have made it to Google Groups.

Hash Creation with Duplicate Keys

Ingo Blechschmidt noticed that in Perl 5 hash definitions, the rightmost duplicate wins, whereas the leftmost wins in Perl 6. He wondered if this was a bug or not. Luke explained that it was that way for named variable bindings. Larry figured it should be that way only for named variable binding. If Pugs has not done it yet, some brave soul could probably add tests and find it implemented before they had finished committing.
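
In other words, an illustrative sketch of the two semantics (hypothetical snippet, Pugs-era syntax):

my %h = ( a => 1, a => 2 );
# Perl 5: $h{a} ends up 2 (rightmost duplicate wins)
# Perl 6, as discussed here: %h<a> would be 1 (leftmost wins)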

Tail Calls, Methods, and Currying

Brent "Dax" Royal-Gordon wondered about tail calls, noting that the current method ($obj.can('meth').goto($obj, *@args);) is kinda ugly. Larry mused that return g() should go ahead and tail call. If the code does not want a tailcall there, then it should avoid it manually.

Pairs and Binding Play Poorly Together

Autrijus noted that pairs and bindings (such as in a for loop) play badly together. Larry supposed that the Bare code object could have parameters of type Pair|Item (note no Junction) by default to solve this problem. Damian supported the exclusion of Junction.

Method Introspection and Meta meta.classes

Chromatic wondered about subroutine and method introspection. Sam Vilain thought he might want to look at Stevan Little's Perl 6 MetaModel. He also talked about closing the loop on meta-meta-meta headaches. Apparently Smalltalk has done this somewhere.

Big Object Rethink

Larry posted a fairly major rethink of member variables and methods. Honestly I did not quite follow what he described, and there is a lot to summarize--Hey! Look over there! ::PUNT:: Nothing to see here, move along.

Garbage Collection API

David Formosa (after being lightly chastised by an unknown summarizer) started a new thread expanding on his desire for a GC API. I thought there were replies to this, but they don't seem to have made it to Google.

Exposing the GC

Piers Cawley thought that it might be useful to expose the GC to get an array of all objects of a particular class. Brent "Dax" Royal-Gordon thought that the ability to get such an array would be useful, but that it should merely be an implementation detail of whether an array of weak refs or the GC or Harry Potter was invoked.

The Usual Footer

To post to any of these mailing lists, please subscribe by sending email to perl6-internals-subscribe@perl.org, perl6-language-subscribe@perl.org, or perl6-compiler-subscribe@perl.org. If you find these summaries useful or enjoyable, please consider contributing to the Perl Foundation to help support the development of Perl. You might also like to send feedback to

Porting Test::Builder to Perl 6


Perl 6 development now proceeds in two directions. The first is from the bottom up, with the creation and evolution of Parrot and underlying code, including the Parrot Grammar Engine. The goal there is to build the structure Perl 6 will need. The second direction is from the top down, with the Pugs project implementing Perl 6 initially separate from Parrot, though recent additions allow an embedded Parrot to run the parsed code and to emit valid Parrot PIR code.

Both projects are important and both help the design of Perl 6 and its implementation. Parrot is valuable in that it demonstrates a solid foundation for Perl 6 (and other similar languages); a far better foundation than the internals of Perl 5 have become. Pugs is important because it allows people to use Perl 6 productively now, with more features every day.

Motivation and Design

Perl culture values testing very highly. Several years ago, at the suggestion of Michael Schwern, I extracted the code that would become Test::Builder from Test::More and unified Test::Simple and Test::More to share that back end. Now dozens of other testing modules, built upon Test::Builder, work together seamlessly.

Pugs culture also values testing. However, there was no corresponding Test::Builder for Perl 6 yet--there was only a single Test.pm module that did most of what the early version of Test::More did in Perl 5.

Schwern and I have discussed updates and refactorings of Test::Builder for the past couple of years. We made some mistakes in the initial design. As Perl 6 offers the chance to clean up Perl 5, so does a port of Test::Builder to Perl 6 offer the chance to clean up some of the design decisions we would make differently now.

Internally, Test::Builder provides a few testing and reporting functions and keeps track of some test information. Most importantly, it contains a plan consisting of the number of tests expected to run. It also holds a list of details of every test it has seen. The testing and reporting functions add information to this list of test details. Finally, the module contains functions to report the test details in the standard TAP format, so that tools such as Test::Harness can interpret the results correctly.
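
For example, a small Test::More-style script and the TAP it emits (illustrative):

  use Test::More tests => 2;             # the plan: two tests expected

  ok( 1,         'module loaded' );
  is( 6 * 9, 42, 'answer checks out' );  # deliberately failing test

produces TAP along these lines (plus failure diagnostics), which tools such as Test::Harness interpret:

  1..2
  ok 1 - module loaded
  not ok 2 - answer checks out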

Test::Builder needs to do all of these things, but there are several ways to design the module's internals. Some ways are better than others.

The original Perl 5 version mashed all of this behavior together into one object-oriented module. To allow the use of multiple testing modules without confusing the count or the test details, Test::Builder::new() always returns a singleton. All test modules call the constructor to receive the singleton object and call the test reporting methods to add details of the tests they handle.

This works, but it's a little inelegant. In particular, modules that test test modules have to go to a lot of trouble to work around the design. A more flexible design would make things like Test::Builder::Tester much easier to write.

The biggest change that Schwern and I have discussed is to separate the varying responsibilities into separate modules. The new Test::Builder object in Perl 6 itself contains a Test::Builder::TestPlan object that represents the plan (the number of tests to run), a Test::Builder::Output object that contains the filehandles to which to write TAP and diagnostic output, and an array of tests' results (all Test::Builder::Test instances).

The default constructor, new(), still returns a singleton by default. However, modules that use Test::Builder can create their own objects, which perform the Test::Builder::TestPlan or Test::Builder::Output roles and pass them to the constructor to override the default objects created internally for the singleton. If a test module really needs a separate Test::Builder object, the alternate create() method creates a new object that no other module will share.

This strategy allows the Perl 6 version of Test::Builder::Tester to create its own Test::Builder object that reports tests as normal and then creates the shared singleton with output going to filehandles it can read instead of STDOUT and STDERR. The design appears to be sound; it took less than two hours to go from the idea of T::B::T to a fully working implementation--counting a break to eat ice cream.
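
In rough outline, that setup might look something like this (a hypothetical sketch; the real module's names and arguments may differ):

  # The tester's own private reporting object, never shared:
  my $reporter = Test::Builder.create();

  # An Output object whose handles the tester can read back later:
  my $out = Test::Builder::Output.new(
      output     => $tap_handle,     # instead of STDOUT
      diagnostic => $diag_handle,    # instead of STDERR
  );

  # The shared singleton that the module under test will pick up:
  my $singleton = Test::Builder.new( output => $out );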

First Attempts

Translating Perl 5 OO code into Perl 6 OO code was mostly straightforward, despite my never having written any runnable Perl 6 OO code. (Also, Pugs was not far enough along that objects worked.)

What Went Right

One nice revelation is that opaque objects are actually easier to work with than blessed references. Even better, Perl 6's improved function signatures reduce the necessity to write lots of boring boilerplate code.

Breaking Test::Builder into separate pieces gave the opportunity for several other refactorings. One of my favorites is "Replace Conditional with Polymorphism". There are four different types of tests that have different reporting styles: pass, fail, SKIP, and TODO. It made sense to create separate classes for each of those, giving each the responsibility and knowledge to produce the correct TAP output. Thus I wrote Test::Builder::Test, a façade factory class with a very smart constructor that creates and returns the correct test object based on the given arguments. When Test::Builder receives one of these test objects, it asks the object for its TAP string, passes that to its contained Test::Builder::Output object, and stores the test object in the list of run tests.

What Went Wrong

Writing the base for all (or at least many) possible test modules is tricky. In this case, it was trebly so. Not only was this the first bit of practical OO Perl 6 code I'd written, but I had no way to test it, either by hand (how I tested the Perl 5 version, before Schwern and I worked out a way to write automated tests for it), or with automated tests. Pugs didn't even have object support when I wrote this, though checking in this code pushed OO support higher on the schedule.

Infinite Loops in Construction

Originally, I thought all test classes would inherit from Test::Builder::Test. As Damian Conway pointed out, my technique created an infinite loop. (He suggested that "Don't make a façade factory class an ancestor of the instantiable classes" is a design mistake akin to "Don't get involved in a land war in Asia" and mumbled something else about battles of wits and Sicilians.) The code looked something like:

  class Test::Builder::Test
  {
      my Test::Builder::Test $:singleton is rw;

      has Bool $.passed;
      has Int  $.number;
      has Str  $.diagnostic;
      has Str  $.description;

      method new (Test::Builder::Test $class, *@args)
      {
          return $:singleton if $:singleton;
          $:singleton = $class.create( @args );
          return $:singleton;
      }

      method create(
          $number, 
          $passed       =  1,
          ?$skip        =  0,
          ?$todo        =  0,
          ?$reason      = '',
          ?$description = '',
      )
      {
          return Test::Builder::Test::TODO.new(
              description => $description, reason => $reason, passed => $passed,
          ) if $todo;

          return Test::Builder::Test::Skip.new(
              description => $description, reason => $reason, passed => 1,
          ) if $skip;

          return Test::Builder::Test::Pass.new(
              description => $description, passed => 1,
          ) if $passed;

          return Test::Builder::Test::Fail.new(
              description => $description, passed => 0,
          );
      }
  }

  class Test::Builder::Test::Pass is Test::Builder::Test {}
  class Test::Builder::Test::Fail is Test::Builder::Test {}
  class Test::Builder::Test::Skip is Test::Builder::Test { ... }
  class Test::Builder::Test::TODO is Test::Builder::Test { ... }

  # ...

Why is this a singleton? I have no idea; I typed that code into the wrong module and continued writing code a few minutes later, thinking that I knew what I was doing. The infinite loop stands out in my mind very clearly now. Because all of the concrete test classes inherit from Test::Builder::Test, they inherit its new() method; none of them override it. Thus, they'll all call create() again (and none of them override that either).

Confusing Initialization

I also struggled with the various bits and pieces of creating and building objects in Perl 6. There are a lot of hooks and overrides available, making the object system very flexible. However, without any experience or examples or guidance, choosing between new(), BUILD(), and BUILDALL() is difficult.

I realized I had no idea how to handle the singleton in Test::Builder. Even after deciding that (for now) Test::Builder could remain a singleton, I didn't know how or where to create it.

I finally settled on putting it in new(), with code much like that in the broken version of Test::Builder::Test previously. new() eventually allocates space for, creates, and returns an opaque object. BUILD() initializes it. This led me to write code something like:

  class Test::Builder;

  # ...

  has Test::Builder::Output   $.output;
  has Test::Builder::TestPlan $.plan;

  has @:results;

  submethod BUILD ( Test::Builder::Output ?$output, ?$TestPlan )
  {
      $.plan   = $TestPlan if $TestPlan;
      $.output = $output ?? $output :: Test::Builder::Output.new();
  }

There's a difference here because most uses of Test::Builder set the test plan explicitly later, after receiving the Test::Builder object. I added a plan() method, too:

  method plan ( $self:, Str ?$explanation, Int ?$num )
  {
      die "Plan already set!" if $self.plan;

      if ($num)
      {
          $self.plan = Test::Builder::TestPlan.new( expect => $num );
      }
      elsif $explanation ~~ 'no_plan'
      {
          $self.plan = Test::Builder::NullPlan.new();
      }
      else
      {
          die "Unknown plan";
      }

      $self.output.write( $self.plan.header() );
  }

There are some stylistic errors in the previous code. First, when declaring an invocant, there's a colon but no comma. Second, fail is much better than die (an assertion Damian made that I take on faith, having researched more serious issues instead). Third, the parenthesization of the cases in the if statement is inconsistent.

Final (Ha!) Version

Shortly after I checked in the example code, Stevan Little began work on a test suite (using Test.pm). I knew that Pugs didn't support many of the necessary language constructs, but this allowed Pugs hackers to identify necessary features and me to identify legitimate bugs and mistakes in the code. (It's tricky to bootstrap test-driven development.)

After filling out the test suite, fixing all of the known bugs in my code, talking other Pugs hackers into adding features I needed, and implementing those I couldn't pawn off on others, Test::Builder works completely in Pugs right now. One nice feature remains unsupported: splatty args in method calls. But I'm ready to port Test.pm to the new back end and then write many, many more useful testing modules--starting with a port of Mark Fowler's Test::Builder::Tester, written the night before this article went public!

The singleton creation in Test::Builder now looks like:

  class Test::Builder-0.2.0;
  
  use Test::Builder::Test;
  use Test::Builder::Output;
  use Test::Builder::TestPlan;
  
  my  Test::Builder           $:singleton;
  has Test::Builder::Output   $.output handles 'diag';
  has Test::Builder::TestPlan $.testplan;
  has                         @:results;
  
  method new ( Test::Builder $Class: ?$plan, ?$output )
  {
      return $:singleton //= $Class.SUPER::new(
          testplan => $plan, output => $output
      );
  }
  
  method create ( Test::Builder $Class: ?$plan, ?$output )
  {
      return $Class.new( testplan => $plan, output => $output );
  }
  
  submethod BUILD
  (
      Test::Builder::TestPlan ?$.testplan,
      Test::Builder::Output   ?$.output = Test::Builder::Output.new()
  )
  {}

Those test modules that want to use the default $Test object directly can call Test::Builder::new() to return the singleton, creating it if necessary. Test modules that need different output or plan objects should call Test::Builder::create(). (The test suite actually does this.)
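
In use, the two constructors look something like this (illustrative):

  # An ordinary test module shares the singleton, created on demand:
  my $Test = Test::Builder.new();

  # A module needing isolation builds its own, never-shared object:
  my $private = Test::Builder.create(
      plan   => Test::Builder::TestPlan.new( expect => 3 ),
      output => Test::Builder::Output.new(),
  );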

Having removed the Test::Builder code from Test::Builder::Test, I revised the latter, as well:

  class Test::Builder::Test-0.2.0
  {
      method new (
          $number,     
          ?$passed      = 1,
          ?$skip        = 0,
          ?$todo        = 0,
          ?$reason      = '', 
          ?$description = '',
      )
      {
          return ::Test::Builder::Test::TODO.new(
              description => $description, passed => $passed, reason => $reason
          ) if $todo;

          return ::Test::Builder::Test::Skip.new(
              description => $description, passed =>       1, reason => $reason
          ) if $skip;

          return ::Test::Builder::Test::Pass.new(
              description => $description, passed =>       1,
          ) if $passed;

          return ::Test::Builder::Test::Fail.new(
              description => $description, passed =>       0,
          );
      }
  }

That's it. I moved the object attributes into roles. Test::Builder::Test::Base is the basis for all tests, encapsulating all of the attributes that tests share and providing the important methods:

  role Test::Builder::Test::Base
  {
      has Bool $.passed;
      has Int  $.number;
      has Str  $.diagnostic;
      has Str  $.description;

      submethod BUILD (
          $.description,
          $.passed,
          ?$.number     =     0,
          ?$.diagnostic = '???',
      ) {}

      method status returns Hash
      {
          return
          {
              passed      => $.passed,
              description => $.description,
          };
      }

      method report returns Str
      {
          my $ok          = $.passed ?? 'ok' :: 'not ok';
          my $description = "- $.description";
          return join( ' ', $ok, $.number, $description );
      }

  }

  class Test::Builder::Test::Pass does Test::Builder::Test::Base {}
  class Test::Builder::Test::Fail does Test::Builder::Test::Base {}

Test::Builder::Test::WithReason forms the basis for TODO and SKIP tests, adding the reason the developer marked the test as one or the other:

  role Test::Builder::Test::WithReason does Test::Builder::Test::Base
  {
      has Str $.reason;

      submethod BUILD ( $.reason ) {}

      method status returns Hash ( $self: )
      {
          my $status        = $self.SUPER::status();
          $status{"reason"} = $.reason;
          return $status;
      }
  }

  class Test::Builder::Test::Skip does Test::Builder::Test::WithReason { ... }
  class Test::Builder::Test::TODO does Test::Builder::Test::WithReason { ... }

What's Hard

The two greatest difficulties I encountered in this porting effort were in mapping my design to the new Perl 6 way of thinking and in working around Pugs bugs and unsupported features. The former is interesting; it may suggest places where other people will run into difficulties.

One of the trickiest parts of Perl 6's OO model to understand is the interaction of the new(), BUILD(), and BUILDALL() methods. Perl 5 provides very little in the way of object support beyond bless. Though having finer-grained control over object creation, initialization, and initializer dispatch will be very useful, remembering the purposes of each method is very important, lest you override the wrong one and end up with an infinite loop or partially initialized object.

From rereading the design documents, experimenting, picking the brains of other @Larry members, and thinking hard, I arrived at the following rules (a short sketch follows the list):

  • Leave new() alone.

    This method creates the opaque object. Override it when you don't want to return a new object of this class every time. Don't do initialization here. Don't forget to call SUPER::new() if you actually want an object.

  • Override BUILD() to initialize attributes for objects of this class.

    Think of this as an initializer, not a constructor.

  • Override BUILDALL() when you want to change the order of initialization.

    I haven't needed this yet and don't expect to.
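
Put together, those rules yield something like this minimal sketch (a hypothetical class in Pugs-era syntax):

  class Counter
  {
      has Int $.count;

      # new() is untouched: the default creates the opaque object.

      # BUILD() only initializes this class's attributes.
      submethod BUILD ( ?$.count = 0 ) {}

      # BUILDALL() is also left alone; the default order is fine here.
  }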

Pugs-wise, find a good Haskell tutorial, find a really fast machine that can run GHC 6.4, and look for lambdacamel mentors on #pugs. (My productivity increased when Autrijus told me about Haskell's trace function. He called it a refreshing desert in the oasis of referential transparency.)

What's Easy

Was this exercise valuable? Absolutely! It reinforced my belief that Perl 6 is not only Perlish, but that it's a fantastic revolution of Perl 5 in several ways:

  • The object system is much better. Attributes and accessors require almost no syntax, and that only in their declarations. Using attributes feels Perlish, even if it's not manipulating hash keys.
  • Function signatures eliminate a lot of code. My initializers do a lot of work, but they don't take much code. Some even have empty method bodies. This is a big win, except for the poor souls who had to implement the underlying binding code in Pugs. (That took a while.)
  • Roles are fantastic. Sure, I believed in them already, but being able to use them without the hacks required in Perl 5 was even better.

Final Thoughts

Schwern and I did put a lot of thought into the Perl 5 redesign we never really did, and my code here really benefits from the lessons I learned from the previous version. Still, even though I wrote code to a moving project that didn't yet support all of the features I wanted, it was a great exercise. Test::Builder is simpler, shorter, cleaner, and more flexible; it's ready for everything the Perl 6 QA group can throw at it.

Test::Builder isn't the only Perl 5 module being ported to Perl 6. Other modules include ports of HTTP::Server::Simple, Net::IRC, LWP, and CGI. There are even ports underway for Catalyst and Maypole.

Perl 6 isn't ready yet, but it's closer every day. Now's a great time to port some of your code to see how Perl 6 is still Perlish, but a revolutionary step in refreshing new directions.

chromatic is the author of Modern Perl. In his spare time, he has been working on helping novices understand stocks and investing.

This Week in Perl 6, July 13-19, 2005


Welcome to another Perl 6 summary, brought to you by the words "displacement" and "activity." So far today, I've caught up with everything unread in NetNewsWire, my Flickr groups, every other mailing list I'm subscribed to, and completed about five Sudoku. Now I'm dragging out this introduction and I don't know why; I enjoy writing these things.

This Week in perl6-compiler

This was another quiet week on the list. However, you only have to watch the SVN commit log and the other stuff on PlanetSix to know that things are still proceeding apace. Last time I looked, it seemed that Stevan Little was working on bootstrapping the Perl 5 implementation of the Perl 6 MetaModel to implement it in terms of itself.

Rather mind-bogglingly, Pugs is now targeting JavaScript as well.

The current Pugs release is 6.2.8.

Creating Threads in BEGIN

Nicholas Clark posted what he described as a "note to collective self" wondering about how Perl 6 will cope with people creating threads inside of BEGIN blocks. According to Luke, "it won't." Larry thought that it might be okay to create threads at CHECK time, so long as any spawned threads didn't do any real work later than CHECK time.

Perl 6 Modules

Gav... (I presume the ellipsis is important to someone) wondered what he needed to do to write Perl 6 modules. Nathan Gray pointed him at the porting how-to in the Pugs distribution.

Is Namespace Qualification Really Required?

Phil Crow came across some weirdness with namespace resolution. It seems that you have to qualify function names explicitly in signatures. Autrijus agreed that it was a bug and asked for Phil to write a TODO test. Discussion ensued--I think the fix is in SVN now.

Parsing Perl 6 Grammars

Nathan Gray wondered about the state of Jeff "Japhy" Pinyan's effort to implement a Perl 6 rules parser. Japhy said that it's been on hold for a while, but that he'd started to work on it again, basing it on his earlier Regexp::Parser module.

Meanwhile, in perl6-internals

PMC Changes?

Nicholas Clark wondered if the PMC layout is likely to remain stable, or if there might be changes in relation to the generational garbage collector. In particular, he wanted to know if the API would remain stable. Leo thought that there might be changes in the layout, but the API shouldn't change.

ParTcl Accelerator

Will Coleda showed some timings for ParTcl, the Parrot implementation of Tcl, and made a few suggestions about how to get things going faster. Patrick and Leo mused on the issues involved.

Partitioning PMCs

Nicholas Clark had some questions about making PMCs and Ponie play well together, with particular reference to using SvFLAGS().

Embedding/Extending Interface

Nicholas Clark wondered if Chromatic was still waiting for confirmation that his automated embedding tools were the Right Thing. Apparently, Chromatic is waiting for confirmation, but offered to send his existing patch, if only to force the discussion.

Ponie Questions

Nicholas Clark had a bunch of questions about various subsystems, mostly in relation to Ponie. Leo came good with answers.

Parrot Project Management

I'm not sure if Will Coleda's suffering culture shock about the way Parrot project management happens, or if we're really not doing it right. The first rule of Parrot/Perl 6 development is that if you really want something, then the only way to guarantee that it gets done is to do it yourself. It's certainly worked for me over the years.

Tcl GC Issues--Solved

Matt Diephouse announced that as of r8617 in SVN, the longstanding GC bug that ParTcl occasionally tickled has been fixed. There was no rejoicing on the list, but at least one summarizer was really pleased to hear it.

GMC for Dummies

Summer of Code intern Alexandre Buisse, who is working on a new GC system for Parrot, pointed us all at an introduction to the Generational Mark and Compact scheme that he's working to implement. He and Leo had a discussion about implications, assumptions, and other stuff.

Bob Rogers asked some tricky questions relating to circular structures and timely destruction. Discussion of this continues.

Register Allocation Fun

There was a flurry of patches from Curtis Rawls, who appears to be working on refactoring and (one hopes) fixing the IMCC register allocator. Way to go, Curtis.

Meanwhile in perl6-language

MML Dispatch

The ongoing discussion of the right way to dispatch multimethods is still, um, going on. Damian and Luke appear to have a fundamental disagreement about what the Right Thing is. "Manhattan!" "Pure!" "Manhattan!"--it's not quite that bad, but they seem to have entrenched positions. Elsewhere in the thread, Larry mused on which was more general, classes or roles. Thomas Sandlaß wondered how they stood in relation to types.

Your summarizer wondered how he was ever going to explain all this and punted.

Method Calls on $self

My eyes, they burn! At this rate, I'm simply going to use $?SELF in all my Perl 6 classes. Larry's latest suggestion seems to please even fewer people than ./method, which is really saying something. As someone who's not a fan of ./, I found myself slightly surprised to agree with Autrijus, who reckons you get used to it really quickly.

The Perl 6 Library System

In response to a question from Autrijus about coderefs in @INC, or whatever Perl 6 is going to call it, Larry mused on the possible eventual design of Perl 6's library system. It seemed to me that he was dropping a rather heavy hint to any interested readers who might like to come up with a first cut of Perl 6's library system.

Later, he did some thinking aloud about treating strings as arrays, or vice versa.

Method Resolution Order

Stevan "MetaModel" Little cheered Larry's statement that methods, subs, submethods, and "anything in between" all live in the same namespace. If you want to give two code-like things the same name, then you must explicitly declare them as multi.

Stevan went on to ask a bunch of questions about the semantics of method resolution, so as to get the Perl 6 MetaModel working right. Discussion ensued.

Type::Class::Haskell Does Role

I haven't the faintest idea what Yuval Kogman is talking about. Dammit, I need to learn Haskell now. Luckily, Autrijus, Luke, David Formosa, and Damian did seem to understand it. There was ASCII art and everything. Sadly, there's no Unicode art, but it's only a matter of time.

Optimization Pipeline

Yuval Kogman posted an outline of the optimization pipeline idea that he'd brought up in a Hackathon. I confess that it looks rather like something discussed a few months (years?) ago that Chip shot down rather convincingly. (I remember this because I took pretty much the same position as Yuval, and I really didn't want to be convinced.)

STM Semantics, the Transactional Role

Yuval Kogman discussed some issues with Software Transactional Memory (STM). A short discussion ensued.

More Method Resolution Order Questions

Returning from reading up on method resolution orders and class precedence lists, Stevan Little had a pile of questions and suggestions about Perl 6's method resolution semantics. He pushed for using "C3" as Perl's algorithm of choice and is implementing it in Perl6::MetaModel until and unless @Larry decides differently. He's off to a flying start in that the One True Larry thinks it's a good idea.

Accessor-Only Virtual Attributes

Sam Vilain wondered what would happen if he made an "accessor" for an attribute that didn't really exist. He wanted to be able to disguise accessor methods as attributes within class and subclass scope (at least, I think that's what he wants). Larry seemed to think he was barking up the wrong tree--class attributes are only likely to be accessible using the $.whatever form within their declaring class and not any subclasses. Larry's "got some driving to do" so expect some more thoughts about this in the next summary.

Strange Interaction Between Pairs and Named Binding

Autrijus noted that, although

for [1..10].pairs -> Pair $x { say $x.value }

works,

for [1..10].pairs ->      $x { say $x.value }

doesn't, which is somewhat counter-intuitive. The problem is that the second case treats the pair as a named argument specifier. After discussion, Autrijus suggested that the best thing might be to specify that the Bare code object (which includes pointy and non-pointy blocks) have Any as a default parameter type--essentially turning off the special behavior of pairs when calling named blocks. I'm all for this myself, but Larry has yet to speak.

How Do Subroutines Check Types?

Ingo Blechschmidt had some questions about specifying types in subroutine definitions. Specifically, he wanted a sub that accepts only instances of class Foo (or a subclass), not the class object Foo itself (or a subclass of it). Thomas Sandlaß thought that what Ingo wanted is the default behavior, and that you actually have to do some work to get it to behave any other way.

Referring to Package Variables in the Default Namespace

Matthew Hodgson asked for some clarification of how the default package namespace works. Apparently, Pugs and Synopsis 10 are slightly at odds. Larry had some answers. Matthew probably has some more questions.

Crikey! That Went Quickly

Or, for the traditionalists among you:

Acknowledgements, Adverts, Apologies and Alliteration

Hunting the Perfect Archive

I'm still on the lookout for a replacement for Google groups for my message links. I need an archive that's up to date with the lists, and has URLs that are easy to derive from Message-IDs. Bonus points for good thread handling.

Help Chip

Tell all your friends, this cannot stand.

The Usual Coda

If you find these summaries useful or enjoyable, please consider contributing to the Perl Foundation to help support the development of Perl.

Or, you can check out my website. Maybe now I'm back writing stuff I'll start updating it. There are also vaguely pretty photos by me.

An Introduction to Test::MockDBI

Prelude

How do you test DBI programs:

  • Without having to modify your current program code or environment settings?
  • Without having to set up multiple test databases?
  • Without separating your test data from your test code?
  • With tests for every bizarre value your program will ever have to face?
  • With complete control over all database return values, along with all DBI method return values?
  • With an easy, regex-based rules interface?

You test with Test::MockDBI, that's how. Test::MockDBI provides all of this by using Test::MockObject::Extends to mock up the entire DBI API. Without a solution like Test::MockDBI--a solution that enables direct manipulation of the DBI--you'll have to trace DBI methods through a series of test databases.

You can make test databases work, but:

  • You'll need multiple (perhaps many) databases when you need multiple sets of mutually inconsistent values for complete test coverage.
  • Some DBI failure modes are impossible to generate through any test database.
  • Depending on the database toolset available, it may be difficult to insert all necessary test values--for example, Unicode values in ASCII applications, or bizarre file types in a document-manager application.
  • Test databases, by definition, are separate from their corresponding test code. This increases the chance that the test code and the test data will fall out of sync with each other.

Using Test::MockDBI avoids these problems. Read on to learn how Test::MockDBI eases the job of testing DBI applications.

A Mock Up of the Entire DBI

Test::MockDBI mocks up the entire DBI API by using Test::MockObject::Extends to substitute a Test::MockObject::Extends object in place of the DBI. A feature of this approach is that if the DBI API changes (and you use that change), you will notice during testing if you haven't upgraded Test::MockDBI, as your program will complain about missing DBI API method(s).

Mocking up the entire DBI means that you can add the DBI testing code into an existing application without changing the initial application code--using Test::MockDBI is entirely transparent to the rest of your application, as it neither knows nor cares that it's using Test::MockDBI in place of the DBI. This property of transparency is what drove me to develop Test::MockDBI, as it meant I could add the Test::MockDBI DBI testing code to existing client applications without modifying the existing code (handy, for us consultants).

Further enhancing Test::MockDBI's transparency is its class-wide DBI testing type value. Testing is enabled only when the DBI testing type is non-zero, so you can safely leave the DBI testing code additions in your production code--users will not even know about your DBI testing code unless you tell them.

Mocking up the entire DBI also means that you have complete control of the DBI's behavior during testing. Often, you can simulate a SELECT DBI transaction with a simple state machine that returns just a few rows from the (mocked up) database. Test::MockDBI lets you use a CODEREF to supply database return values, so you can easily put a simple state machine into the CODEREF to supply the necessary database values for testing. You could even put a delay loop into the CODEREF when you need to perform speed tests on your code.

Rules-Based DBI Testing

You control the mocked-up DBI of Test::MockDBI with one or more rules that you insert as Test::MockDBI method calls into your program. The default DBI method values provided by Test::MockDBI make the database appear to have a hole in the bottom of it--all method calls return OK, but you can't get any data out of the database. Rules for DBI methods that return database values (the fetch*() and select*() methods) can use either a value that they return directly for matching method calls, or a CODEREF called to provide a value each time that rule fires. A rule matches when its DBI testing type is the current testing type and the current SQL matches the rule's regular expression. Rules fire in the order in which you declare them, so usually you want to order your rules from most-specific to least-specific.
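
For example, here is an illustrative pair of rules (using the set_retval_scalar() method covered below), ordered so that the specific lookup wins over the catch-all:

# Most-specific rule first: it fires for the Noblesville lookup...
$tmd->set_retval_scalar( 1, "zip5.*'NOBLESVILLE'", [ 46062 ] );

# ...and the catch-all zip5 rule matches everything else.
$tmd->set_retval_scalar( 1, "zip5", [ 99999 ] );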

The DBI testing type is an unsigned integer matching /^\d+$/. When the DBI testing type is zero, there will be no DBI testing (or at least, no mocked-up DBI testing) performed, and the program will use the DBI normally. A zero DBI testing type value in a rule means the rule can fire for any non-zero DBI testing type value--that is, zero is the wildcard DBI testing type value for rules. Set the DBI testing type either by a first command-line argument of the form:

--dbitest[=DTT]

where the optional DTT is the DBI testing type (defaulting to one), or through Test::MockDBI's set_dbi_test_type() method. Setting the DBI testing type through a first command-line argument has the advantage of requiring no modifications to the code under test, as this command-line processing is done so early (during BEGIN time for Test::MockDBI) that the code under test should be ignorant of whether this processing ever happened.
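
For instance, to run under DBI testing type 2, either pass the flag on the command line (no changes to the code under test) or set it in code (a short sketch):

# From the shell:
#   perl myprogram.pl --dbitest=2

# Or programmatically, via the singleton:
use Test::MockDBI;
my $tmd = Test::MockDBI::get_instance();
$tmd->set_dbi_test_type(2);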

DBI Return Values

Test::MockDBI defaults to returning a success (true) value for all DBI method calls. This fits well with the usual techniques of DBI programming, where the first DBI error causes the program to stop what it is doing. Test::MockDBI's bad_method() method creates a rule that forces a failure return value on the specified DBI method when the current DBI testing type and SQL match those of the rule. Arbitrary DBI method return value failures like these are difficult (at best) to generate with a test database.

Test::MockDBI's set_retval_scalar() and set_retval_array() methods create rules for what database values to return. Set rules for scalar return values (arrayrefs and hashrefs) with set_retval_scalar() and for array return value rules with set_retval_array(). You can supply a value to be returned every time the rule matches, which is good when extracting single rows out of the database, such as configuration parameters. Alternatively, pass a CODEREF that will be called each time the rule fires to return a new value. Commonly, with SELECT statements, the DBI returns one or more rows, then returns an empty row to signify the end of the data. A CODEREF can incorporate a state machine that implements this "return 1+ rows, then a terminator" behavior quite easily. Having individual state machines for each rule is much easier to develop with than having one master state machine embedded into Test::MockDBI's core. (An early alpha of Test::MockDBI used the master state machine approach, so I have empirical evidence of this result--I am not emptily theorizing here.)

Depending on what tools you have for creating your test databases, it may be difficult to populate the test database with all of the values you need to test against. Although it is probably not so much the case today, only a few years ago populating a database with Unicode was difficult, given the national-charset-based tools of the day. Even today, a document management system might be difficult to populate with weird file types. Test::MockDBI makes these kinds of tests much easier to carry out, as you directly specify the data for the mock database to return rather than using a separate test database.

This ease of database value testing also applies when you need to test against combinations of database values that are unlikely to occur in practice (the old "comparing apples to battleships" problem). If you need to handle database value corruption--as in network problems causing the return of partial values from a Chinese database when the program is in the U.S.--this ability to completely specify the database return values could be invaluable in testing. Test::MockDBI lets you take complete control of your database return values without separating test code and test data.

Simplicity: Test::MockDBI's Standard-Output-Based Interface

This modern incarnation of the age-old stubbed-functions technique also uses the old technique of "printf() and scratch head" as its output interface. This being Perl we are working with, and not FORTRAN IV (thank goodness), we have multiple options beyond the use of unvarnished standard output.

One option that I think integrates well with DBI-using module testing is to redirect standard output into a string using IO::String. You can then match the string against the regex you are looking for. As you have already guessed, use of pure standard output integrates well with command-line program testing.

What you will look for, irrespective of where your code actually looks, is the output of each DBI method as it executes--the method name and arguments--along with anything else your code writes to standard output.
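
A minimal sketch of the IO::String approach, where run_queries() stands in for whatever routine exercises your DBI-using code:

use IO::String;
use Test::More tests => 1;

my $captured = '';
my $io       = IO::String->new($captured);

my $old_fh = select($io);   # route default print() output into $captured
run_queries();              # the code under test prints via the mocked DBI
select($old_fh);            # restore the real standard output

like( $captured, qr/execute/, 'execute() was called' );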

Bind Test Data to Test Code

Because DBI and database return values are bound to your test programs when using Test::MockDBI, there is less risk of test data getting out of sync with the test code. A separate test database introduces another point of failure in your testing process. Multiple test databases add yet another point of failure for each database. Whatever you use to generate the test databases also introduces another point of failure for each database. I can imagine cases where special-purpose programs for generating test databases might create multiple points of failure, especially if the programs have to integrate data from multiple sources to generate the test data (such as a VMS Bill of Materials database and a Solaris PCB CAD file for a test database generation program running on Linux).

One of the major advances in software engineering is the increasing ability to gather and control related information together--the 1990s advance of object-oriented programming in common languages is a testimony to this, from which we Perl programmers reap the benefits in our use of CPAN. For many testing purposes, there is no need for separate test databases. Without that need for a separate test database, separating test data from test code only complicates the testing process. Test::MockDBI lets you bind together your test code and test data into one nice, neat package. Binding is even closer than code and comments, as comments can get out of sync with their code, while the test code and test data for Test::MockDBI cannot get out of sync too far without causing their tests to fail unexpectedly.

When to Use Test::MockDBI

DBI's trace(), DBD::Mock, and Test::MockDBI are complementary solutions to the problem of testing DBI software. DBI's trace() is a pure tracing mechanism, as it does not change the data returned from the database or the DBI method return values. DBD::Mock works at the level of a database driver, so you have to look at your DBI testing from the driver's point of view, rather than the DBI caller's point of view. DBD::Mock also requires that your code supports configurable DBI DSNs, which may not be the case in all circumstances, especially when you must maintain or enhance legacy DBI software.

Test::MockDBI works at the DBI caller's level, which is (IMHO) more natural for testing DBI-using software (possibly a matter of taste: TMTOWTDI). Test::MockDBI's interface with your DBI software is a set of easy-to-program, regex-based rules, which incorporate a lot of power into one or a few lines of code, thereby using Perl's built-in regex support to best advantage. This binds test data and test code tightly together, reducing the chance of synchronization problems between the test data and the test code. Using Test::MockDBI does not require modifying the current code of the DBI software being tested, as you only need additional code to enable Test::MockDBI-driven DBI testing.

Test::MockDBI takes additional coding effort when you need to test DBI program performance. It may be that for performance testing, you want to use test databases rather than Test::MockDBI. If you were in any danger of your copy of DBI.pm becoming corrupted, I don't know whether you could adequately test that condition with Test::MockDBI, depending on the corruption. You would probably have to create a special mock DBI to test corrupted DBI code handling, though you could start building the special mock DBI by inheriting from Test::MockDBI without any problems from Test::MockDBI's design, as it should be inheritance-friendly.

Some Examples

To make:

$dbh = DBI->connect("dbi:AZ:universe", "mortal", "(none)");

fail, add the rule:

$tmd->bad_method("connect", 1,
    "CONNECT TO dbi:AZ:universe AS mortal WITH \\(none\\)");

(where $tmd is the only Test::MockDBI object, which you obtain through Test::MockDBI's get_instance() method).

To make a SQL SELECT fail when executed via DBI::execute(), use the rule:

$tmd->bad_method("execute", 1,
    "SELECT zip_plus_4 from zipcodes where state='IN'");

This rule implies that:

  • The DBI::connect() succeeded.
  • The DBI::prepare() succeeded().
  • But the DBI::execute() failed as it should.

A common use of direct scalar return values is returning configuration data, such as a U.S. zip code for an address:

$tmd->set_retval_scalar(1,
 "zip5.*'IN'.*'NOBLESVILLE'.*'170 WESTFIELD RD'",
 [ 46062 ]);

This demonstrates using a regular expression, as matching SQL could then look like this:

SELECT
  zip5
FROM
  zipcodes
WHERE
  state='IN' AND
  city='NOBLESVILLE' AND
  street_address='170 WESTFIELD RD'

and the rule would match.

SELECTs that return one or more rows from the database are the common case:

my $counter = 0;                    # name counter
sub possibly_evil_names {
    $counter++;
    if ($counter == 1) {
        return ('Adolf', 'Germany');
    } elsif ($counter == 2) {
        return ('Josef', 'U.S.S.R.');
    } else {
        return ();
    }
}
$tmd->set_retval_array(1,
   "SELECT\\s+name,\\s+country.*possibly_evil_names",
   \&possibly_evil_names);

Using a CODEREF (\&possibly_evil_names) lets you easily add the state machine for implementing a return of two names followed by an empty array (because the code uses fetchrow_array() to retrieve each row). SQL for this query could look like:

SELECT
  name,
  country
FROM
  possibly_evil_names
WHERE
  year < 2000

Summary

Albert Einstein once said, "Everything should be made as simple as possible, but no simpler." This is what I have striven for while developing Test::MockDBI--the simplest possible useful module for testing DBI programs by mocking up the entire DBI.

Test::MockDBI gives you:

  • Complete control of DBI return values and database-returned data.
  • Returned database values from either direct value specifications or CODEREF-generated values.
  • Easy, regex-based rules that govern the DBI's behavior, along with intelligent defaults for the common cases.
  • Complete transparency to other code, so the code under test neither knows nor cares that you are testing it with Test::MockDBI.
  • Test data tightly bound to test code, which promotes cohesiveness in your testing environment, thereby reducing the chance that your tests might silently fail due to loss of synchronization between your test data and your test code.

Test::MockDBI is a valuable addition to the arsenal of DBI testing techniques.

This Week in Perl 6, July 5-12, 2005


All--

Welcome to another summary from the frog house, a house so green you can see it from outer space (according to Google Earth).

Perl 6 Compiler

Building Pugs Workaround

Sam Vilain posted a useful workaround for the build error "error: field `_crypt_struct' has incomplete type," which occurs on some systems. Fortunately, Salvador Ortiz Garcia found a fix.

Pugs, Pirate. Pirate, Pugs.

Autrijus began plotting with the Pirate folks. Thoughts include unifying PIL and PAST, or possibly retargeting PIL to PAST. Perhaps the result should be a more nautical dog. Maybe schipperke.

Implicit Invocants and Pain

Larry (as the summary will later explain) ruled that ./method was gone. He further ruled that .method would pitch fits at either compile or run time if $_ =:= $?SELF were false. Autrijus found this quite difficult to implement. Talk continues, and my instincts tell me that this too will pass, although Larry assures us that it is absolutely permanent for at least a week.

Parrot

Key Question

Klaas-Jan Stol found that assigning a floating-point value to a key and then using that key makes Parrot segfault. Warnock applies.

Parrot Copyrights

Allison Randal hinted that the Perl Foundation has almost finished hammering out some legal stuff and there will soon be sweeping changes throughout the repository addressing copyright issues.

Character Classes in Globs

Will Coleda noted that Tcl would pass more tests if PGE supported character classes in globs. Patrick, unable to resist the siren call of passing tests, implemented it.

Amber for Parrot

Roger Browne announced that he had succeeded in extracting viable DNA from a Parrot encased in amber since the Jurassic age. Either that, or he released Amber version 0.2.2--I'm not sure which.

Leo's Branch

Leo has created a branch in SVN (branches/leo-ctx5) of his work implementing the new calling conventions. This led to some discussion of how to deal with optional arguments.

Leo's Branch Meets mod_parrot

Jeff Horwitz posted some observations and troubles he was having with Leo's branch of new calling conventions. Leo warned that the branch was still young, but would gladly take test cases.

Leo's Branch Meets PGE

After the initial discussion of optional parameters, Patrick updated the leo_ctx5 branch of PGE to the new calling conventions. All tests pass.

Get Onto the Bus

Matt Diephouse found a Bus Error when running languages/tcl/examples/bench.tcl. Warnock applies.

MinGW Patch Resurrection

François Perrad resurrected a patch from mid-June with a set of action items. Warnock applies.

Scared Parrots Like Scheme

John Lenz posted an announcement that he had an alpha version of Chicken (a Scheme-to-C compiler) targeting Parrot. Leo provided answers to some of his questions.

Bytecode Vs. PMCs

Matt Diephouse posted a list of questions about the place of PMCs. Some of the core tradeoffs include maintainability, portability, optimization, duplicate implementations, and security.

make svnclean

Leo pointed out that someone removed make svnclean, but that he found it useful. Chip suggested renaming it make svnclobber, as it does more than just clean.

pmc2c.pl Bug

Nicholas Clark found a bug in the shortcut to avoid writing a PMC dump file. Warnock applies.

Define cache

Nicholas Clark suggested that it was probably not wise to #define cache. It was duly removed.

Parrots Getting Smarter

Leo pointed out that at least one parrot understood the concept of zero, putting it some distance ahead of Romans when it comes to math. Once the Parrots start to grow opposable thumbs, I will welcome our new Parrot overlords.

Leo's Branch Meets Exceptions

Leo posted two suggestions for how the new calling conventions could interact with exceptions. Autrijus liked the idea of unifying exception handlers with the rest of calls and returns.

Control Flow Graph Bugs

Curtis Rawls noted what he thought might be a bug in the compute_dominators function. Leo confirmed that it was likely a bug. Later he posted a note saying he was working on a new implementation for some of the CFG algorithms. He asked for a hand, but Warnock applied. Actually, I think I have looked at that code before. I would be happy to take a look, Curtis.

TODO: Steal Good Ideas from Dan

Will Coleda opened a ticket suggesting that we open tickets based on some of Dan's latest posts to Squawks of the Parrot. Remember: "talent imitates, but genius steals."

Punie

Allison Randal wants to add Punie (a Perl 1 compiler) to SVN. Response was positive.

Mobilizing PM groups

Will Coleda wondered if there had been any work mobilizing Perl Monger groups for the good of Parrot. Maybe I should finally look up the Cambridge or Boston PM group.

Perl 6 Language

As usual in p6l land, there are a couple of really long threads. As usual in p6summarizer land, they will get short summaries. Odd how that happens.

Conflicting Autogenerated Accessors

Last week, Stevan Little wondered what would happen with conflicting autogenerated accessor names. Larry said they would carp as soon as they were discovered.

DBI v2

The first really long thread has to do with the next version of DBI. I am not really a database person, but apparently those who are have strong opinions.

Time::Local

The next really long thread has to do with the next version of Time::Local. I am not really a Time person, but apparently those who are have strong opinions.

Submethods

Stevan Little and Larry Wall talked about submethods, their purpose, and their interaction with the metamodel. I must say that I have only partially wrapped my head around metamodels at all.

SMD Considered Harmful?

Last week, Yuval Kogman conjectured that MMD should be the one true MD, as it allowed nifty extensibility. This week, Stuart Cook offered a sort of compromise. I rather like Stuart's compromise.

Dependency Injection

Piers wants to be able to have classes that inject themselves in correctly at use time, based on what is actually used. Larry commented, but one quote really caught my attention: "Globals are bad only if you use them to hold non-global values." <off-topic>There is an important lesson embedded in that quote. We really should learn rules not to follow them blindly, but so that we understand the spirit behind them and respect that instead. Not that I have had screaming matches with any programmers who blindly eschew globals and gotos without understanding why. </off-topic> That was more of a rant than just off-topic. Oh well.

File.seek Interface

Wolverian wondered what the seek interface would look like for handles. Larry likes the idea of it working entirely through opaque position objects using ` to specify units.

Perl 6/Perl 5 ~ 82%

Michael Hendricks noticed that, according to Text::TypingEffort, Perl 6 requires 18 percent less effort than Perl 5. He suggests that this is a bad thing for the community's waistline. I conjecture that Perl developers will use the extra time they save for activities such as running and canoeing, and as a result will paradoxically lose weight from expending less effort at work.

Creating Value Types

Ingo Blechschmidt wondered how to create a value type. Luke Palmer suggested using an is value trait. He then went on to speculate about mutating value traits and COW semantics. Larry thought that perhaps an is copy trait was called for. Oooh, a preposition at the end of a sentence; makes me want to occasionally split infinitives.

OO .isa

Ingo Blechschmidt viciously lied when claiming to post a "quick" isa question. This quickly went the way of the meta object. I think I mentioned my take on those earlier (powerful, but ow).

Method Call on Invocant

The last really long thread has to do with the next chapter in the "method call on self" saga. I am a bit of a "method call on self" person, and apparently those who are have strong opinions. It's worth noting this time that Larry updated the current state of the world. Now ./method is gone and .method only works when $_ =:= $?SELF.

use and require Question

Ingo Blechschmidt wondered what use and require actually do. Gaal Yahas suggested they return the last thing in the used/required file. Larry agreed, and held that repeated invocations would return the same thing. He also warned that %INC would probably work differently in Perl 6.

User-Defined infix

Autrijus wondered if a method infix:<===> would need marking as export for a script that uses it to get the method. Larry explained that the method infix:<===> would be available by name, but would have to be marked as export for the syntactic sugar of A === B to work.

Hackathon Notes

Autrijus posted a link to his Hackathon notes. This spawned several threads. David Storrs tried to convince people to change subject lines, with limited success. Much of the discussion focused on MMD and how confusing it was. Damian Conway posted his set of three rules that would prevent AIs from harming humans and his set of eight rules that would prevent MMDs from harming programmers, proving that MMDs are almost three times more dangerous than AIs.

Raw Binary Data

David Formosa wanted to play with raw binary data in Perl 6. I try and avoid raw things except sushi. Larry suggested that this would just be a string with its maximum abstraction level set to bytes.

The Usual Footer

To post to any of these mailing lists please subscribe by sending email to perl6-internals-subscribe@perl.org, perl6-language-subscribe@perl.org, or perl6-compiler-subscribe@perl.org. If you find these summaries useful or enjoyable, please consider contributing to the Perl Foundation to help support the development of Perl. You might also like to send feedback to

Ten Essential Development Practices


The following ten tips come from Perl Best Practices, a new book of Perl coding and development guidelines by Damian Conway.

1. Design the Module's Interface First

The most important aspect of any module is not how it implements the facilities it provides, but the way in which it provides those facilities in the first place. If the module's API is too awkward, or too complex, or too extensive, or too fragmented, or even just poorly named, developers will avoid using it. They'll write their own code instead. In that way, a poorly designed module can actually reduce the overall maintainability of a system.

Designing module interfaces requires both experience and creativity. Perhaps the easiest way to work out how an interface should work is to "play test" it: to write examples of code that will use the module before implementing the module itself. These examples will not be wasted when the design is complete. You can usually recycle them into demos, documentation examples, or the core of a test suite.


The key, however, is to write that code as if the module were already available, and write it the way you'd most like the module to work.

Once you have some idea of the interface you want to create, convert your "play tests" into actual tests (see Tip #2). Then it's just a Simple Matter Of Programming to make the module work the way that the code examples and the tests want it to.

Of course, it may not be possible for the module to work the way you'd most like, in which case attempting to implement it that way will help you determine what aspects of your API are not practical, and allow you to work out what might be an acceptable alternative.
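
For example, a "play test" for a hypothetical Text::Shorten module (a name and interface invented purely for illustration; no such design exists yet) might look like this:

use Text::Shorten qw( shorten );   # Hypothetical module, not yet implemented...

my $long_line = 'A line that rambles on far beyond the width of the terminal';

# The common case should need no options at all...
print shorten($long_line), "\n";

# ...and the unusual cases should read like English:
print shorten($long_line, max_cols => 20, marker => '...'), "\n";

Code like this is a design sketch: it won't run until the module exists, but it pins down the interface you would then implement and test against.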

2. Write the Test Cases Before the Code

Probably the single best practice in all of software development is writing your test suite first.

A test suite is an executable, self-verifying specification of the behavior of a piece of software. If you have a test suite, you can--at any point in the development process--verify that the code works as expected. If you have a test suite, you can--after any changes during the maintenance cycle--verify that the code still works as expected.

Write the tests first. Write them as soon as you know what your interface will be (see #1). Write them before you start coding your application or module. Unless you have tests, you have no unequivocal specification of what the software should do, and no way of knowing whether it does it.

Writing tests always seems like a chore, and an unproductive chore at that: you don't have anything to test yet, so why write tests? Yet most developers will--almost automatically--write driver software to test their new module in an ad hoc way:

> cat try_inflections.pl

# Test my shiny new English inflections module...

use Lingua::EN::Inflect qw( inflect );

# Try some plurals (both standard and unusual inflections)...

my %plural_of = (
   'house'         => 'houses',
   'mouse'         => 'mice',
   'box'           => 'boxes',
   'ox'            => 'oxen',
   'goose'         => 'geese',
   'mongoose'      => 'mongooses', 
   'law'           => 'laws',
   'mother-in-law' => 'mothers-in-law',
);
 
# For each of them, print both the expected result and the actual inflection...

for my $word ( keys %plural_of ) {
   my $expected = $plural_of{$word};
   my $computed = inflect( "PL_N($word)" );
 
   print "For $word:\n", 
         "\tExpected: $expected\n",
         "\tComputed: $computed\n";
}

A driver like that is actually harder to write than a test suite, because you have to worry about formatting the output in a way that is easy to read. It's also much harder to use the driver than it would be to use a test suite, because every time you run it you have to wade through that formatted output and verify "by eye" that everything is as it should be. That's also error-prone; eyes are not optimized for picking out small differences in the middle of large amounts of nearly identical text.

Instead of hacking together a driver program, it's easier to write a test program using the standard Test::Simple module. Instead of print statements showing what's being tested, you just write calls to the ok() subroutine, specifying as its first argument the condition under which things are okay, and as its second argument a description of what you're actually testing:

> cat inflections.t

use Lingua::EN::Inflect qw( inflect );

use Test::Simple qw( no_plan );

my %plural_of = (
   'mouse'         => 'mice',
   'house'         => 'houses',
   'ox'            => 'oxen',
   'box'           => 'boxes',
   'goose'         => 'geese',
   'mongoose'      => 'mongooses', 
   'law'           => 'laws',
   'mother-in-law' => 'mothers-in-law',
);

for my $word ( keys %plural_of ) {
   my $expected = $plural_of{$word};
   my $computed = inflect( "PL_N($word)" );

   ok( $computed eq $expected, "$word -> $expected" );
}

Note that this code loads Test::Simple with the argument qw( no_plan ). Normally that argument would be tests => count, indicating how many tests to expect, but here the tests are generated from the %plural_of table at run time, so the final count will depend on how many entries are in that table. Specifying a fixed number of tests when loading the module is useful if you happen to know that number at compile time, because then the module can also "meta-test": verify that you carried out all the tests you expected to.
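
For instance, if the table had a known, fixed set of entries, you could declare the plan up front:

# When the test count is known at compile time, declare it so that
# the module can verify that exactly eight tests actually ran:
use Test::Simple tests => 8;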

The Test::Simple program is slightly more concise and readable than the original driver code, and the output is much more compact and informative:

> perl inflections.t

ok 1 - house -> houses
ok 2 - law -> laws
not ok 3 - mongoose -> mongooses
#     Failed test (inflections.t at line 21)
ok 4 - goose -> geese
ok 5 - ox -> oxen
not ok 6 - mother-in-law -> mothers-in-law
#     Failed test (inflections.t at line 21)
ok 7 - mouse -> mice
ok 8 - box -> boxes
1..8
# Looks like you failed 2 tests of 8. 

More importantly, this version requires far less effort to verify the correctness of each test. You just scan down the left margin looking for a not and a comment line.

You might prefer to use the Test::More module instead of Test::Simple. Then you can specify the actual and expected values separately, by using the is() subroutine, rather than ok():

use Lingua::EN::Inflect qw( inflect );
use Test::More qw( no_plan ); # Now using more advanced testing tools

my %plural_of = (
   'mouse'         => 'mice',
   'house'         => 'houses',
   'ox'            => 'oxen',
   'box'           => 'boxes',
   'goose'         => 'geese',
   'mongoose'      => 'mongooses', 
   'law'           => 'laws',
   'mother-in-law' => 'mothers-in-law',
);

for my $word ( keys %plural_of ) {
   my $expected = $plural_of{$word};
   my $computed = inflect( "PL_N($word)" );

   # Test expected and computed inflections for string equality...
   is( $computed, $expected, "$word -> $expected" );
}

Apart from no longer having to type the eq yourself, this version also produces more detailed error messages:

> perl inflections.t

ok 1 - house -> houses
ok 2 - law -> laws
not ok 3 - mongoose -> mongooses
#     Failed test (inflections.t at line 20)
#          got: 'mongeese'
#     expected: 'mongooses'
ok 4 - goose -> geese
ok 5 - ox -> oxen
not ok 6 - mother-in-law -> mothers-in-law
#     Failed test (inflections.t at line 20)
#          got: 'mothers-in-laws'
#     expected: 'mothers-in-law'
ok 7 - mouse -> mice
ok 8 - box -> boxes
1..8
# Looks like you failed 2 tests of 8.

The Test::Tutorial documentation that comes with Perl 5.8 provides a gentle introduction to both Test::Simple and Test::More.

3. Create Standard POD Templates for Modules and Applications

One of the main reasons documentation can often seem so unpleasant is the "blank page effect." Many programmers simply don't know how to get started or what to say.

Perhaps the easiest way to make writing documentation less forbidding (and hence, more likely to actually occur) is to circumvent that initial empty screen by providing a template that developers can cut and paste into their code.

For a module, that documentation template might look something like this:

=head1 NAME

<Module::Name> - <One-line description of module's purpose>

=head1 VERSION

The initial template usually just has:

This documentation refers to <Module::Name> version 0.0.1.

=head1 SYNOPSIS

   use <Module::Name>;

   # Brief but working code example(s) here showing the most common usage(s)
   # This section will be as far as many users bother reading, so make it as
   # educational and exemplary as possible.

=head1 DESCRIPTION

A full description of the module and its features.

May include numerous subsections (i.e., =head2, =head3, etc.).

=head1 SUBROUTINES/METHODS

A separate section listing the public components of the module's interface.

These normally consist of either subroutines that may be exported, or methods
that may be called on objects belonging to the classes that the module
provides.

Name the section accordingly.

In an object-oriented module, this section should begin with a sentence (of the
form "An object of this class represents ...") to give the reader a high-level
context to help them understand the methods that are subsequently described.

=head1 DIAGNOSTICS

A list of every error and warning message that the module can generate (even
the ones that will "never happen"), with a full explanation of each problem,
one or more likely causes, and any suggested remedies.

=head1 CONFIGURATION AND ENVIRONMENT

A full explanation of any configuration system(s) used by the module, including
the names and locations of any configuration files, and the meaning of any
environment variables or properties that can be set. These descriptions must
also include details of any configuration language used.

=head1 DEPENDENCIES

A list of all of the other modules that this module relies upon, including any
restrictions on versions, and an indication of whether these required modules
are part of the standard Perl distribution, part of the module's distribution,
or must be installed separately.

=head1 INCOMPATIBILITIES

A list of any modules that this module cannot be used in conjunction with.
This may be due to name conflicts in the interface, or competition for system
or program resources, or due to internal limitations of Perl (for example, many
modules that use source code filters are mutually incompatible).

=head1 BUGS AND LIMITATIONS

A list of known problems with the module, together with some indication of
whether they are likely to be fixed in an upcoming release.

Also, a list of restrictions on the features the module does provide: data types
that cannot be handled, performance issues and the circumstances in which they
may arise, practical limitations on the size of data sets, special cases that
are not (yet) handled, etc.

The initial template usually just has:

There are no known bugs in this module.

Please report problems to <Maintainer name(s)> (<contact address>)

Patches are welcome.

=head1 AUTHOR

<Author name(s)>  (<contact address>)

=head1 LICENSE AND COPYRIGHT

Copyright (c) <year> <copyright holder> (<contact address>).
All rights reserved.

followed by whatever license you wish to release it under.

For Perl code that is often just:

This module is free software; you can redistribute it and/or modify it under
the same terms as Perl itself. See L<perlartistic>.  This program is
distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

Of course, the specific details that your templates provide may vary from those shown here, according to your other coding practices. The most likely variation will be in the license and copyright, but you may also have specific in-house conventions regarding version numbering, the grammar of diagnostic messages, or the attribution of authorship.

4. Use a Revision Control System

Maintaining control over the creation and modification of your source code is utterly essential for robust team-based development. And not just over source code: you should be revision controlling your documentation, and data files, and document templates, and makefiles, and style sheets, and change logs, and any other resources your system requires.

Just as you wouldn't use an editor without an Undo command or a word processor that can't merge documents, so too you shouldn't use a file system you can't rewind, or a development environment that can't integrate the work of many contributors.

Programmers make mistakes, and occasionally those mistakes will be catastrophic. They will reformat the disk containing the most recent version of the code. Or they'll mistype an editor macro and write zeros all through the source of a critical core module. Or two developers will unwittingly edit the same file at the same time and half their changes will be lost. Revision control systems can prevent those kinds of problems.

Moreover, occasionally the very best debugging technique is to just give up, stop trying to get yesterday's modifications to work correctly, roll the code back to a known stable state, and start over again. Less drastically, comparing the current condition of your code with the most recent stable version from your repository (even just a line-by-line diff) can often help you isolate your recent "improvements" and work out which of them is the problem.
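
With Subversion, for example, that fallback is only a command or two away (the filename here is purely illustrative):

> svn status                 # Which files have I modified?
M      lib/Inflect.pm

> svn diff lib/Inflect.pm    # What exactly did my recent "improvements" change?

> svn revert lib/Inflect.pm  # Give up; restore the last committed version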

Revision control systems such as RCS, CVS, Subversion, Monotone, darcs, Perforce, GNU arch, or BitKeeper can protect against calamities, and ensure that you always have a working fallback position if maintenance goes horribly wrong. The various systems have different strengths and limitations, many of which stem from fundamentally different views on what exactly revision control is. It's a good idea to audition the various revision control systems, and find the one that works best for you. Pragmatic Version Control Using Subversion, by Mike Mason (Pragmatic Bookshelf, 2005) and Essential CVS, by Jennifer Vesperman (O'Reilly, 2003) are useful starting points.

5. Create Consistent Command-Line Interfaces

Command-line interfaces have a strong tendency to grow over time, accreting new options as you add features to the application. Unfortunately, the evolution of such interfaces is rarely designed, managed, or controlled, so the set of flags, options, and arguments that a given application accepts are likely to be ad hoc and unique.

This also means they're likely to be inconsistent with the unique ad hoc sets of flags, options, and arguments that other related applications provide. The result is inevitably a suite of programs, each of which is driven in a distinct and idiosyncratic way. For example:

> orchestrate source.txt -to interim.orc

> remonstrate +interim.rem -interim.orc 

> fenestrate  --src=interim.rem --dest=final.wdw
Invalid input format

> fenestrate --help
Unknown option: --help.
Type 'fenestrate -hmo' for help

Here, the orchestrate utility expects its input file as its first argument, while the -to flag specifies its output file. The related remonstrate tool uses -infile and +outfile options instead, with the output file coming first. The fenestrate program seems to require GNU-style "long options": --src=infile and --dest=outfile, except, apparently, for its oddly named help flag. All in all, it's a mess.

When you're providing a suite of programs, all of them should appear to work the same way, using the same flags and options for the same features across all applications. This enables your users to take advantage of existing knowledge--instead of continually asking you.

Those three programs should work like this:

> orchestrate -i source.txt -o dest.orc

> remonstrate -i source.orc -o dest.rem

> fenestrate  -i source.rem -o dest.wdw
Input file ('source.rem') not a valid Remora file
(type "fenestrate --help" for help)

> fenestrate --help
fenestrate - convert Remora .rem files to Windows .wdw format
Usage: fenestrate [-i <infile>] [-o <outfile>] [-cstq] [-h|-v]
Options:
   -i <infile> Specify input source [default: STDIN]
   -o <outfile> Specify output destination [default: STDOUT]
   -c Attempt to produce a more compact representation
   -h Use horizontal (landscape) layout
   -v Use vertical (portrait) layout
   -s Be strict regarding input
   -t Be extra tolerant regarding input
   -q Run silent
   --version Print version information
   --usage Print the usage line of this summary
   --help Print this summary
   --man Print the complete manpage

Here, every application that takes input and output files uses the same two flags to do so. A user who wants to use the substrate utility (to convert that final .wdw file to a subroutine) is likely to be able to guess correctly the required syntax:

> substrate  -i dest.wdw -o dest.sub

Anyone who can't guess that probably can guess that:

> substrate --help

is likely to render aid and comfort.

A large part of making interfaces consistent is being consistent in specifying the individual components of those interfaces. Some conventions that may help to design consistent and predictable interfaces include the following (a Getopt::Long sketch implementing them appears after the list):

  • Require a flag preceding every piece of command-line data, except filenames.

    Users don't want to have to remember that your application requires "input file, output file, block size, operation, fallback strategy," and requires them in that precise order:

    > lustrate sample_data proc_data 1000 normalize log

    They want to be able to say explicitly what they mean, in any order that suits them:

    > lustrate sample_data proc_data -op=normalize -b1000 --fallback=log
  • Provide a flag for each filename, too, especially when a program can be given files for different purposes.

    Users might also not want to remember the order of the two positional filenames, so let them label those arguments as well, and specify them in whatever order they prefer:

    > lustrate -i sample_data -op normalize -b1000 --fallback log -o proc_data
  • Use a single - prefix for short-form flags, up to three letters (-v, -i, -rw, -in, -out).

    Experienced users appreciate short-form flags as a way of reducing typing and limiting command-line clutter. Don't make them type two dashes in these shortcuts.

  • Use a double -- prefix for longer flags (--verbose, --interactive, --readwrite, --input, --output).

    Flags that are complete words improve the readability of a command line (in a shell script, for example). The double dash also helps to distinguish between the longer flag name and any nearby file names.

  • If a flag expects an associated value, allow an optional = between the flag and the value.

    Some people prefer to visually associate a value with its preceding flag:

    > lustrate -i=sample_data -op=normalize -b=1000 --fallback=log -o=proc_data

    Others don't:

    > lustrate -i sample_data -op normalize -b1000 --fallback log -o proc_data

    Still others want a bit each way:

    > lustrate -i sample_data -o proc_data -op=normalize -b=1000 --fallback=log

    Let the user choose.

  • Allow single-letter options to be "bundled" after a single dash.

    It's irritating to have to type repeated dashes for a series of flags:

    > lustrate -i sample_data -v -l -x

    Allow experienced users to also write:

    > lustrate -i sample_data -vlx
  • Provide a multi-letter version of every single-letter flag.

    Short-form flags may be nice for experienced users, but they can be troublesome for new users: hard to remember and even harder to recognize. Don't force people to do either. Give them a verbose alternative to every concise flag; full words that are easier to remember, and also more self-documenting in shell scripts.

  • Always allow - as a special filename.

    A widely used convention is that a dash (-) where an input file is expected means "read from standard input," and a dash where an output file is expected means "write to standard output."

  • Always allow -- as a file list marker.

    Another widely used convention is that the appearance of a double dash (--) on the command line marks the end of any flagged options, and indicates that the remaining arguments are a list of filenames, even if some of them look like flags.
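
Modules like the standard Getopt::Long make it straightforward to honor all of these conventions at once. The following sketch is illustrative only--the flag names are invented, not taken from any real utility:

use strict;
use warnings;
use Getopt::Long qw( GetOptions :config bundling );

# Sensible defaults: a dash means standard input or output...
my %opt = ( input => '-', output => '-' );

GetOptions( \%opt,
    'input|i=s',     # -i <infile>  or --input=<infile>
    'output|o=s',    # -o <outfile> or --output=<outfile>
    'verbose|v',     # single-letter flags can be bundled: -vlx
    'list|l',
    'expand|x',
    'help',          # longer flags take the double dash: --help
) or die "Invalid options (try --help)\n";

# Getopt::Long honors '--' as an end-of-options marker, so anything
# remaining in @ARGV is a filename, even if it looks like a flag.
my @filenames = @ARGV;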

6. Agree Upon a Coherent Layout Style and Automate It with perltidy

Formatting. Indentation. Style. Code layout. Whatever you choose to call it, it's one of the most contentious aspects of programming discipline. More and bloodier wars have been fought over code layout than over just about any other aspect of coding.

What is the best practice here? Should you use classic Kernighan and Ritchie style? Or go with BSD code formatting? Or adopt the layout scheme specified by the GNU project? Or conform to the Slashcode coding guidelines?

Of course not! Everyone knows that <insert your personal coding style here> is the One True Layout Style, the only sane choice, as ordained by <insert your favorite Programming Deity here> since Time Immemorial! Any other choice is manifestly absurd, willfully heretical, and self-evidently a Work of Darkness!

That's precisely the problem. When deciding on a layout style, it's hard to decide where rational choices end and rationalized habits begin.

Adopting a coherently designed approach to code layout, and then applying that approach consistently across all your coding, is fundamental to best-practice programming. Good layout can improve the readability of a program, help detect errors within it, and make the structure of your code much easier to comprehend. Layout matters.

However, most coding styles--including the four mentioned earlier--confer those benefits almost equally well. While it's true that having a consistent code layout scheme matters very much indeed, the particular code layout scheme you ultimately decide upon does not matter at all! All that matters is that you adopt a single, coherent style; one that works for your entire programming team, and, having agreed upon that style, that you then apply it consistently across all your development.

In the long term, it's best to train yourself and your team to code in a consistent, rational, and readable style. However, the time and commitment necessary to accomplish that isn't always available. In such cases, a reasonable compromise is to prescribe a standard code-formatting tool that must be applied to all code before it's committed, reviewed, or otherwise displayed in public.

There is now an excellent code formatter available for Perl: perltidy. It provides an extensive range of user-configurable options for indenting, block delimiter positioning, column-like alignment, and comment positioning.
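
Those options are usually collected into a .perltidyrc file, so that every developer formats with exactly the same configuration. A small illustrative example (the particular values are a matter of team taste):

# .perltidyrc -- project-wide perltidy configuration (example values)
-l=78    # maximum line length
-i=4     # indent blocks by four columns
-ci=4    # indent continuation lines by four columns as well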

Using perltidy, you can convert code like this:

if($sigil eq '$'){
   if($subsigil eq '?'){ 
       $sym_table{substr($var_name,2)}=delete $sym_table{locate_orig_var($var)};
       $internal_count++;$has_internal{$var_name}++
   } else {
       ${$var_ref} =
           q{$sym_table{$var_name}}; $external_count++; $has_external{$var_name}++;
}} elsif ($sigil eq '@'&&$subsigil eq '?') {
   @{$sym_table{$var_name}} = grep
       {defined $_} @{$sym_table{$var_name}};
} elsif ($sigil eq '%' && $subsigil eq '?') {
delete $sym_table{$var_name}{$EMPTY_STR}; } else
{
${$var_ref}
=
q{$sym_table{$var_name}}
}

into something readable:

if ( $sigil eq '$' ) {
   if ( $subsigil eq '?' ) {
       $sym_table{ substr( $var_name, 2 ) }
           = delete $sym_table{ locate_orig_var($var) };
       $internal_count++;
       $has_internal{$var_name}++;
   }
   else {
       ${$var_ref} = q{$sym_table{$var_name}};
       $external_count++;
       $has_external{$var_name}++;
   }
}
elsif ( $sigil eq '@' && $subsigil eq '?' ) {
   @{ $sym_table{$var_name} }
       = grep {defined $_} @{ $sym_table{$var_name} };
}
elsif ( $sigil eq '%' && $subsigil eq '?' ) {
   delete $sym_table{$var_name}{$EMPTY_STR};
}
else {
   ${$var_ref} = q{$sym_table{$var_name}};
}

Mandating that everyone use a common tool to format their code can also be a simple way of sidestepping the endless objections, acrimony, and dogma that always surround any discussion on code layout. If perltidy does all the work for them, then it will cost developers almost no effort to adopt the new guidelines. They can simply set up an editor macro that will "straighten" their code whenever they need to.

7. Code in Commented Paragraphs

A paragraph is a collection of statements that accomplish a single task: in literature, it's a series of sentences conveying a single idea; in programming, a series of instructions implementing a single step of an algorithm.

Break each piece of code into sequences that achieve a single task, placing a single empty line between each sequence. To further improve the maintainability of the code, place a one-line comment at the start of each such paragraph, describing what the sequence of statements does. Like so:

# Process an array that has been recognized...
sub addarray_internal {
   my ($var_name, $needs_quotemeta) = @_;

   # Cache the original...
   $raw .= $var_name;

   # Build meta-quoting code, if requested...
   my $quotemeta = $needs_quotemeta ?  q{map {quotemeta $_} } : $EMPTY_STR;

   # Expand elements of variable, conjoin with ORs...
   my $perl5pat = qq{(??{join q{|}, $quotemeta \@{$var_name}})};

   # Insert debugging code if requested...
   my $type = $quotemeta ? 'literal' : 'pattern';
   debug_now("Adding $var_name (as $type)");
   add_debug_mesg("Trying $var_name (as $type)");

   return $perl5pat;
}

Paragraphs are useful because humans can focus on only a few pieces of information at once. Paragraphs are one way of aggregating small amounts of related information, so that the resulting "chunk" can fit into a single slot of the reader's limited short-term memory. Paragraphs enable the physical structure of a piece of writing to reflect and emphasize its logical structure.

Adding comments at the start of each paragraph further enhances the chunking by explicitly summarizing the purpose of each chunk (note: the purpose, not the behavior). Paragraph comments need to explain why the code is there and what it achieves, not merely paraphrase the precise computational steps it's performing.

Note, however, that the contents of paragraphs are only of secondary importance here. It is the vertical gaps separating each paragraph that are critical. Without them, the readability of the code declines dramatically, even if the comments are retained:

sub addarray_internal {
   my ($var_name, $needs_quotemeta) = @_;
   # Cache the original...
   $raw .= $var_name;
   # Build meta-quoting code, if required...
   my $quotemeta = $needs_quotemeta ?  q{map {quotemeta $_} } : $EMPTY_STR;
   # Expand elements of variable, conjoin with ORs...
   my $perl5pat = qq{(??{join q{|}, $quotemeta \@{$var_name}})};
   # Insert debugging code if requested...
   my $type = $quotemeta ? 'literal' : 'pattern';
   debug_now("Adding $var_name (as $type)");
   add_debug_mesg("Trying $var_name (as $type)");
   return $perl5pat;
}

8. Throw Exceptions Instead of Returning Special Values or Setting Flags

Returning a special error value on failure, or setting a special error flag, is a very common error-handling technique. Collectively, they're the basis for virtually all error notification from Perl's own built-in functions. For example, the built-ins eval, exec, flock, open, print, stat, and system all return special values on error. Unfortunately, they don't all use the same special value. Some of them also set a flag on failure. Sadly, it's not always the same flag. See the perlfunc manpage for the gory details.

Apart from the obvious consistency problems, error notification via flags and return values has another serious flaw: developers can silently ignore flags and return values, and ignoring them requires absolutely no effort on the part of the programmer. In fact, in a void context, ignoring return values is Perl's default behavior. Ignoring an error flag that has suddenly appeared in a special variable is just as easy: you simply don't bother to check the variable.

Moreover, because ignoring a return value is the void-context default, there's no syntactic marker for it. There's no way to look at a program and immediately see where a return value is deliberately being ignored, which means there's also no way to be sure that it's not being ignored accidentally.

The bottom line: regardless of the programmer's (lack of) intention, an error indicator is being ignored. That's not good programming.

Ignoring error indicators frequently causes programs to propagate errors in entirely the wrong direction. For example:

# Find and open a file by name, returning the filehandle
# or undef on failure...
sub locate_and_open {
   my ($filename) = @_;

   # Check acceptable directories in order...
   for my $dir (@DATA_DIRS) {
       my $path = "$dir/$filename";

       # If file exists in an acceptable directory, open and return it...
       if (-r $path) {
           open my $fh, '<', $path;
           return $fh;
       }
   }

   # Fail if all possible locations tried without success...
   return;
}

# Load file contents up to the first <DATA/> marker...
sub load_header_from {
   my ($fh) = @_;

   # Use DATA tag as end-of-"line"...
   local $/ = '<DATA/>';

   # Read to end-of-"line"...
   return <$fh>;
}

# and later...
for my $filename (@source_files) {
   my $fh = locate_and_open($filename);
   my $head = load_header_from($fh);
   print $head;
}

The locate_and_open() subroutine simply assumes that the call to open works, immediately returning the filehandle ($fh), whatever the actual outcome of the open. Presumably, the expectation is that whoever calls locate_and_open() will check whether the return value is a valid filehandle.

Except, of course, "whoever" doesn't check. Instead of testing for failure, the main for loop takes the failure value and immediately propagates it "across" the block, to the rest of the statements in the loop. That causes the call to load_header_from() to propagate the error value "downwards." It's in that subroutine that the attempt to treat the failure value as a filehandle eventually kills the program:

readline() on unopened filehandle at demo.pl line 28.

Code like that--where an error is reported in an entirely different part of the program from where it actually occurred--is particularly onerous to debug.

Of course, you could argue that the fault lies squarely with whoever wrote the loop, for using locate_and_open() without checking its return value. In the narrowest sense, that's entirely correct--but the deeper fault lies with whoever actually wrote locate_and_open() in the first place, or at least, whoever assumed that the caller would always check its return value.

Humans simply aren't like that. Rocks almost never fall out of the sky, so humans soon conclude that they never do, and stop looking up for them. Fires rarely break out in their homes, so humans soon forget that they might, and stop testing their smoke detectors every month. In the same way, programmers inevitably abbreviate "almost never fails" to "never fails," and then simply stop checking.

That's why so very few people bother to verify their print statements:

if (!print 'Enter your name: ') {
   print {*STDLOG} warning => 'Terminal went missing!'
}

It's human nature to "trust but not verify."

Human nature is why returning an error indicator is not best practice. Errors are (supposed to be) unusual occurrences, so error markers will almost never be returned. Those tedious and ungainly checks for them will almost never do anything useful, so eventually they'll be quietly omitted. After all, leaving the tests off almost always works just fine. It's so much easier not to bother. Especially when not bothering is the default!

Don't return special error values when something goes wrong; throw an exception instead. The great advantage of exceptions is that they reverse the usual default behaviors, bringing untrapped errors to immediate and urgent attention. On the other hand, ignoring an exception requires a deliberate and conspicuous effort: you have to provide an explicit eval block to neutralize it.

The locate_and_open() subroutine would be much cleaner and more robust if the errors within it threw exceptions:

# Find and open a file by name, returning the filehandle
# or throwing an exception on failure...
use Carp;    # for croak()

sub locate_and_open {
   my ($filename) = @_;

   # Check acceptable directories in order...
   for my $dir (@DATA_DIRS) {
       my $path = "$dir/$filename";

       # If file exists in acceptable directory, open and return it...
       if (-r $path) {
           open my $fh, '<', $path
               or croak( "Located $filename at $path, but could not open");
           return $fh;
       }
   }

   # Fail if all possible locations tried without success...
   croak( "Could not locate $filename" );
}

# and later...
for my $filename (@source_files) {
   my $fh = locate_and_open($filename);
   my $head = load_header_from($fh);
   print $head;
}

Notice that the main for loop didn't change at all. The developer using locate_and_open() still assumes that nothing can go wrong. Now there's some justification for that expectation, because if anything does go wrong, the thrown exception will automatically terminate the loop.

Exceptions are a better choice even if you are the careful type who religiously checks every return value for failure:

SOURCE_FILE:
for my $filename (@source_files) {
   my $fh = locate_and_open($filename);
   next SOURCE_FILE if !defined $fh;
   my $head = load_header_from($fh);
   next SOURCE_FILE if !defined $head;
   print $head;
}

Constantly checking return values for failure clutters your code with validation statements, often greatly decreasing its readability. In contrast, exceptions allow an algorithm to be implemented without having to intersperse any error-handling infrastructure at all. You can factor the error-handling out of the code and either relegate it to after the surrounding eval, or else dispense with it entirely:

for my $filename (@directory_path) {

   # Just ignore any source files that don't load...
   eval {
       my $fh = locate_and_open($filename);
       my $head = load_header_from($fh);
       print $head;
   }
}
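
A sketch of the other option--relegating the error handling to after the eval. Perl places the exception text in the $@ variable, which is empty if the eval succeeded:

for my $filename (@directory_path) {

   # Attempt to load each source file, trapping any exception...
   eval {
       my $fh = locate_and_open($filename);
       my $head = load_header_from($fh);
       print $head;
   };

   # ...then handle any failure in one place, after the fact...
   warn "Skipping $filename: $@" if $@;
}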

9. Add New Test Cases Before You Start Debugging

The first step in any debugging process is to isolate the incorrect behavior of the system, by producing the shortest demonstration of it that you reasonably can. If you're lucky, this may even have been done for you:

To: DCONWAY@cpan.org
From: sascha@perlmonks.org
Subject: Bug in inflect module

Zdravstvuite,

I have been using your Lingua::EN::Inflect module to normalize terms in a
data-mining application I am developing, but there seems to be a bug in it,
as the following example demonstrates:

   use Lingua::EN::Inflect qw( PL_N );
   print PL_N('man'), "\n";       # Prints "men", as expected
   print PL_N('woman'), "\n";     # Incorrectly prints "womans"

Once you have distilled a short working example of the bug, convert it to a series of tests, such as:

use Lingua::EN::Inflect qw( PL_N );
use Test::More qw( no_plan );
is( PL_N('man'),   'men',   'man -> men'     );
is( PL_N('woman'), 'women', 'woman -> women' );

Don't try to fix the problem straight away, though. Instead, immediately add those tests to your test suite. If that testing has been well set up, that can often be as simple as adding a couple of entries to a table:

my %plural_of = (
   'mouse'         => 'mice',
   'house'         => 'houses',
   'ox'            => 'oxen',
   'box'           => 'boxes',
   'goose'         => 'geese',
   'mongoose'      => 'mongooses', 
   'law'           => 'laws',
   'mother-in-law' => 'mothers-in-law', 

   # Sascha's bug, reported 27 August 2004...
   'man'           => 'men',
   'woman'         => 'women',
);

The point is: if the original test suite didn't report this bug, then that test suite was broken. It simply didn't do its job (finding bugs) adequately. Fix the test suite first by adding tests that cause it to fail:

> perl inflections.t
ok 1 - house -> houses
ok 2 - law -> laws
ok 3 - man -> men
ok 4 - mongoose -> mongooses
ok 5 - goose -> geese
ok 6 - ox -> oxen
not ok 7 - woman -> women
#     Failed test (inflections.t at line 20)
#          got: 'womans'
#     expected: 'women'
ok 8 - mother-in-law -> mothers-in-law
ok 9 - mouse -> mice
ok 10 - box -> boxes
1..10
# Looks like you failed 1 tests of 10.

Once the test suite is detecting the problem correctly, then you'll be able to tell when you've correctly fixed the actual bug, because the tests will once again fall silent.

This approach to debugging is most effective when the test suite covers the full range of manifestations of the problem. When adding test cases for a bug, don't just add a single test for the simplest case. Make sure you include the obvious variations as well:

my %plural_of = (
   'mouse'         => 'mice',
   'house'         => 'houses',
   'ox'            => 'oxen',
   'box'           => 'boxes',
   'goose'         => 'geese',
   'mongoose'      => 'mongooses', 
   'law'           => 'laws',
   'mother-in-law' => 'mothers-in-law', 

   # Sascha's bug, reported 27 August 2004...
   'man'           => 'men',
   'woman'         => 'women',
   'human'         => 'humans',
   'man-at-arms'   => 'men-at-arms', 
   'lan'           => 'lans',
   'mane'          => 'manes',
   'moan'          => 'moans',
);

The more thoroughly you test the bug, the more completely you will fix it.

10. Don't Optimize Code--Benchmark It

If you need a function to remove duplicate elements of an array, it's natural to think that a "one-liner" like this:

sub uniq { return keys %{ { map {$_=>1} @_ } } }

will be more efficient than two statements:

sub uniq {
   my %seen;
   return grep {!$seen{$_}++} @_;
}

Unless you are deeply familiar with the internals of the Perl interpreter (in which case you already have far more serious personal issues to deal with), intuitions about the relative performance of two constructs are exactly that: unconscious guesses.

The only way to know for sure which of two--or more--alternatives will perform better is to actually time each of them. The standard Benchmark module makes that easy:

# A short list of not-quite-unique values...
our @data = qw( do re me fa so la ti do );

# Various candidates...
sub unique_via_anon {
   return keys %{ { map {$_=>1} @_ } };
}

sub unique_via_grep {
   my %seen;
   return grep { !$seen{$_}++ } @_;
}

sub unique_via_slice {
   my %uniq;
   @uniq{@_} = ();
   return keys %uniq;
}

# Compare the current set of data in @data
sub compare {
   my ($title) = @_;
   print "\n[$title]\n";

   # Create a comparison table of the various timings, making sure that
   # each test runs at least 10 CPU seconds...
   use Benchmark qw( cmpthese );
   cmpthese -10, {
       anon  => 'my @uniq = unique_via_anon(@data)',
       grep  => 'my @uniq = unique_via_grep(@data)',
       slice => 'my @uniq = unique_via_slice(@data)',
   };

   return;
}

compare('8 items, 10% repetition');

# Two copies of the original data...
@data = (@data) x 2;
compare('16 items, 56% repetition');

# One hundred copies of the original data...
@data = (@data) x 50;
compare('800 items, 99% repetition');

The cmpthese() subroutine takes a number, followed by a reference to a hash of tests. The number specifies either the exact number of times to run each test (if the number is positive), or the absolute number of CPU seconds to run the test for (if the number is negative). Typical values are around 10,000 repetitions or ten CPU seconds, but the module will warn you if the test is too short to produce an accurate benchmark.

The keys of the test hash are the names of your tests, and the corresponding values specify the code to be tested. Those values can be either strings (which are eval'd to produce executable code) or subroutine references (which are called directly).
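
For example, the comparison above could equally have been written with subroutine references (a minimal variation on the earlier code, producing comparable output):

use Benchmark qw( cmpthese );

cmpthese -10, {
    anon  => sub { my @uniq = unique_via_anon(@data)  },
    grep  => sub { my @uniq = unique_via_grep(@data)  },
    slice => sub { my @uniq = unique_via_slice(@data) },
};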

The benchmarking code shown above would print out something like the following:

[8 items, 10% repetition]
         Rate  anon  grep slice
anon  28234/s    --  -24%  -47%
grep  37294/s   32%    --  -30%
slice 53013/s   88%   42%    --

[16 items, 56% repetition]
         Rate  anon  grep slice
anon  21283/s    --  -28%  -51%
grep  29500/s   39%    --  -32%
slice 43535/s  105%   48%    --

[800 items, 99% repetition]
        Rate  anon  grep slice
anon   536/s    --  -65%  -89%
grep  1516/s  183%    --  -69%
slice 4855/s  806%  220%    --

Each of the tables printed has a separate row for each named test. The first column lists the absolute speed of each candidate in repetitions per second, while the remaining columns allow you to compare the relative performance of any two tests. For example, in the final test, tracing across the grep row to the anon column reveals that the grepped solution was 1.83 times (183 percent) faster than using an anonymous hash. Tracing further across the same row also indicates that grepping was 69 percent slower (-69 percent faster) than slicing.

Overall, the indication from the three tests is that the slicing-based solution is consistently the fastest for this particular set of data on this particular machine. It also appears that as the data set increases in size, slicing also scales much better than either of the other two approaches.

However, those two conclusions are effectively drawn from only three data points (namely, the three benchmarking runs). To get a more definitive comparison of the three methods, you'd also need to test other possibilities, such as a long list of non-repeating items, or a short list with nothing but repetitions.

Better still, test on the real data that you'll actually be "unique-ing."

For example, if that data is a sorted list of a quarter of a million words, with only minimal repetitions, and which has to remain sorted, then test exactly that:

use Perl6::Slurp;    # assuming slurp() from the CPAN module Perl6::Slurp

our @data = slurp '/usr/share/biglongwordlist.txt';

use Benchmark qw( cmpthese );

cmpthese 10, {
    # Note: the non-grepped solutions need a post-uniqification re-sort
    anon  => 'my @uniq = sort(unique_via_anon(@data))',
    grep  => 'my @uniq = unique_via_grep(@data)',
    slice => 'my @uniq = sort(unique_via_slice(@data))',
};

Not surprisingly, this benchmark indicates that the grepped solution is markedly superior on a large sorted data set:

       s/iter  anon slice  grep
anon     4.28    --   -3%  -46%
slice    4.15    3%    --  -44%
grep     2.30   86%   80%    --

Perhaps more interestingly, the grepped solution still benchmarks as being marginally faster when the two hash-based approaches aren't re-sorted. This suggests that the better scalability of the sliced solution as seen in the earlier benchmark is a localized phenomenon, and is eventually undermined by the growing costs of allocation, hashing, and bucket-overflows as the sliced hash grows very large.

Above all, that last example demonstrates that benchmarks only benchmark the cases you actually benchmark, and that you can only draw useful conclusions about performance from benchmarking real data.

This Week in Perl 6, June 29-July 5, 2005


My, doesn't time fly? Another fortnight gone and another summary to write. It's a hard life, I tell you!

This Week in perl6-compiler

Where's Everyone Gone?

It seems that most of the Perl 6 compiler development discussions occur at Hackathons and on IRC, with summaries appearing in developers' weblogs. What's a summarizer to do? For now, I'll point you at Planet Perl 6, which aggregates a bunch of relevant blogs.

PGE Now Supports Grammars, Built-In Rules

Allison Randal raved about the "totally awesome" PGE grammar support. I doubt she's alone in her enthusiasm.

Multiple Implementations Are Good, M'kay?

Patrick discussed the idea of a "final" Perl 6 compiler, pointing out that it isn't clear that there needs to be one, as long as the multiple implementations remain compatible.

Meanwhile, in perl6-internals

New Calling Conventions

Klaas-Jan Stol asked a bunch of questions about the new calling conventions and Leo answered them.

Parrot Segfaults

What's a tester to do? You find a bug that makes Parrot dump core, so you write a test to document the bug and make sure it gets fixed. But the test leaves core files lying about. It goes without saying that Parrot should never dump core without the active assistance of an NCI call or some other unsafe call blowing up in its face.

This makes it a little embarrassing that PIR code generated by Pugs can cause a Parrot segfault, though the cause appears to be a mixed-up calling-convention style in the generated call.

Brian Wheeler's segfaulting Pugs script

Python PMCs

Leo pointed out that the various py*.pmc support PMCs in Parrot's dynclasses directory don't yet support all the semantics that Python needs. He outlined some outstanding issues and announced that, as calling conventions and context handling were changing, he'd be turning off compiling py*.pmc for the time being.

PGE Bug

It appears that the upcoming changes in Parrot's context handling triggered a bug in PGE. The discussion moved on to PGE's implementation strategy; Nicholas Clark was keen to make sure it didn't repeat some of the Perl 5 regex engine's infelicities. While this discussion continued, Leo spent half a day with gdb and tracked down the problem, which turned out to be that a register wasn't getting initialized in the right place.

Left-Recursive Grammars Are Bad, M'kay?

While experimenting with PGE grammars, Will Coleda managed to write a left-recursive grammar that blew Parrot's call stack with impressive ease. Luke apologized for things blowing up so spectacularly, but pointed out that PGE didn't support left-recursive grammars and showed a rewritten grammar that didn't have the same problem (but which doesn't appear to match the same expressions).

Coroutines

Leo pointed to a summary of coroutines (PDF), and noted that we still hadn't defined the syntax of Parrot coroutines, especially with respect to argument passing. He discussed it with Matt Fowles and solicited a set of tests that expressed the semantics they came up with.

ParTcl, Perl 6 Grammars

Will Coleda announced that, thanks to Matt Diephouse's work, ParTcl (Tcl on Parrot) is now able to run part of Tcl's cvs-latest test suite. The tests aren't fully native yet, being currently driven through a Perl test harness and only passing ten percent of the tests, but hopefully the situation will improve and ParTcl will end up able to run the tests completely natively (while passing far more of them). Congratulations on the work so far, though.

Python and Parrot

Kevin Tew popped up to say that he too is working on a Python compiler targeting Parrot and wondered how to handle things like Python's self parameter. Michal Wallace and Leo chipped in with suggestions.

Another Month, Another Release

Has it really been a month? Seems so. Parrot walked through the traditional feature freeze and code freeze before being released on Sunday. The latest release is Geeksunite, referencing the website that discusses Chip's problems with his former employer. You should definitely visit the Geeksunite site--Chip needs our help.

lower in Default find_name Scope

Patrick posted a code fragment whose output surprised him--it turned out that looking up lower as a name in the default scope returns an NCI object. Leo explained why this was so, prompting Patrick to suggest that it would be useful if, somewhere in the Parrot documentation, there were some descriptions of Parrot's built-in namespace. Leo encouraged others to comment on namespace issues, and hoped for some decisions as well.

Copyrights

If you're like me, discussion of copyrights and licenses is the sort of thing that either really winds you up or induces serious drowsiness, depending on your mood as you read the thread. It's one of those "too important not to think about, but too tedious to think about any more than is absolutely necessary" topics. That said, Will Coleda said that he had thought that all of Parrot's code should have its copyright assigned to the Perl Foundation. However, on inspection, he noticed a multiplicity of copyright notices in the actual code, including one file in the repository with a Microsoft copyright.

PGE: Code Blocks

Matt Diephouse wondered about the plan for integrating code blocks into PGE. He thought it'd be nice to be able to specify a compiler to use along with the code block (or, for the time being, just to be able to use PIR code). Patrick said that there is a plan (or several) for handling this, but getting blocks to work well needs coordination between PGE and the compiler language. In essence, when PGE encounters a code block, it needs to hand off to the target language's compiler to parse to the end of the code block, and get back from the compiler the length of the block thus parsed.

Possible Bug Calculating Denominators

Curtis Rawls posted a fragment of code that seems to break IMCC's computed_denominators algorithm. Leo wasn't surprised that there were probably bugs in that part of IMCC, which was contributed by Angel Faus, who no longer seems to be participating in Parrot development; it hasn't been maintained for a while for lack of tuits. Anyone with an appropriate supply of tuits is welcome (nay, encouraged) to take it on.

Meanwhile, in perl6-language

Type Variables Vs. Type Literals

Autrijus had a question about the difference between

sub (::T $x, ::T $y) {...}

and

sub (T $x, T $y) {...}

Larry answered about four times, mulling over various options. It's times like that that remind me why it's worth following the list in detail rather than reading the summaries--it's good to see Larry thinking aloud, considering all sorts of (seemingly) wacky options and getting feedback.

Mr. Clean Vs. Perl 6

Yuval Kogman had some comments about fascism, strong (but I think he meant static) typing, cleaning products, Perl 6, and type inferencing. Stephane Payrard hoped that "Perl6 could become the playground of type theory searchers." (To which I can only respond with a highly personal "Ick!")

Documentation Trait/Docstring Equivalent

The Platypus (AKA David Formosa) wondered if documentation traits on subs would be useful. Chromatic was the first to hope so, commenting that it's a shame for Perl 6 to throw away potentially useful data recklessly. Larry commented that he always cringes when he hears "the documentation" as if it's the only thing. Again, Larry's thinking aloud on this subject is well worth your time.

SMD Is for Weenies

So says Yuval Kogman, and who are we to doubt him? Yuval wanted to make multimethods the default. Sam Vilain disagreed, pointing up the usefulness of warnings like "method foo redefined at ...."

DBI v2: The Plan and How You Can Help

Tim Bunce outlined his current thinking on how DBI v2 will work (DBI v2's going to be Perl-6-only) and a local roadmap for the first things that need doing. He then opened the floor for detailed proposals for what a Perl 6 DBI API should look like. (I wonder if DBI v2's going to be an important enough tool that it'll want an RFC type process.)

I'm glossing over the ensuing discussion--it's at the stage where, if you're interested, you're better off joining in directly.

Should .assuming Always Be Non-Mutating?

Ingo Blechschmidt had some suggestions about the behavior of the currying method .assuming, arguing that it should always return a new thing and not alter the state of the underlying object. Larry agreed.

return() in Pointy Blocks

Coo. The pointy block thread returns. The question is, where to?

Time::Local

Gaal Yahas announced that he'd added localtime to Pugs in order to address Dave Rolsky's needs when porting the very lovely and worthwhile DateTime family of modules. He noted that Perl 6's final time-related interfaces were rather underspecified and had a bunch of questions. The one thing that's absolutely certain is that the default Perl time API will use subsecond resolution.

I've noticed that, every time you start to discuss how computers handle "human" things such as time, dates, or writing systems, people often seem to have very strong and deeply held ideas of the Right Way of doing things, and those Right Ways are almost all different. Larry's job is probably going to be to work out the Least Wrong Way. (If you've not heard Dave Rolsky's talks about the underlying reasons for writing DateTime (MP3) and the headaches it gave him, then I suggest you seek them out.)

Autogenerated Attribute Accessor Names

MetaModel maker Stevan Little wondered what to do when attribute names clashed, as in:

class Foo { has @.bar; has $.bar; }

No answers yet.

Acknowledgements, Adverts, Apologies, and Alliteration

Summarizing a week is definitely way easier than summarizing a fortnight.

I'm apologizing in advance for the fact that, for those of you who read this via the mailing list, some of the links probably don't work yet. The thing is, the thread links are generated directly from the message-ids because that's the information I have access to and, so far as I know, Google Groups is the only archive that has a RESTful search interface that lets me use message-ids as my key. If you know of an archive site that does this, but is more timely in its archiving of perl6-language in particular, then please let me know and I'll start using that instead. Ideally, it should allow me to directly address a message complete with its thread context.

If you haven't already done so, you really should pay a visit to Geeksunite. For the life of me, I can't see what I would have done differently in Chip's situation, and I'm staggered by what's happened to him.

Right, back to our standard coda:

If you find these summaries useful or enjoyable, please consider contributing to the Perl Foundation to help support the development of Perl.

Building Navigation Menus


Navigation menus are a group of links, usually placed at one side of the page, that allow users to navigate to different parts of a website. They allow site visitors to explore other pages of the site and to find what they want more easily. For example, Paul Graham's home page contains a simple navigation menu made out of images. It doesn't change as the site visitor moves to different pages of the site. The KDE desktop environment home page contains a more sophisticated menu. Click on the link to the screenshots to see a submenu with links to screenshots from multiple versions of KDE. Other menu items have similar expansions.

Common Patterns in Navigation Menus and Site Flow

There are several patterns in maintaining navigation menus and general site flow.

A Tree of Items

Usually, the items in a navigation menu form either a tree structure (as is the case for the KDE site) or a flat list. Sometimes, branches of the tree expand or collapse depending on the current page, which prevents having to display the entire tree at once.

Next/Previous/Up Links to Traverse a Site

Many sites provide links to traverse the pages of the site in order: a Next link to go to the next page, a Previous link to go to the previous page, an Up link to go to the section containing the current page, a Contents link to go to the main page, and so on.

HTML can represent these links with <link> elements, such as <link rel="next" href="..." />, inside the document's <head> tag. Mozilla, Firefox, and Opera all support these elements. They can also be visible in the HTML as normal links, as is the case with GNU Documentation.
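
For example, a page in the middle of a document might carry hints like these in its <head> (the file names here are hypothetical):

<head>
<title>Chapter 2</title>
<!-- Sequential navigation hints for browsers that support them. -->
<link rel="prev" href="chapter-1.html" />
<link rel="next" href="chapter-3.html" />
<link rel="up" href="index.html" />
<link rel="contents" href="toc.html" />
</head>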

Site Maps and Breadcrumb Trails

Other navigation aids provided by sites include a site map (like the one on Eric S. Raymond's home page) and a breadcrumb trail. A breadcrumb trail is a path of the components of the navigation menu that leads to the current page. The documentation for Module::Build::Cookbook on search.cpan.org provides an example ("Ken Williams > Module-Build > Module::Build::Cookbook," in this case).
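
In HTML, a breadcrumb trail is often nothing more than a chain of links; a minimal sketch (with hypothetical URLs) might look like this:

<div class="breadcrumb">
<a href="/~kwilliams/">Ken Williams</a> &gt;
<a href="/dist/Module-Build/">Module-Build</a> &gt;
<b>Module::Build::Cookbook</b>
</div>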

Hidden Pages and Skipped Pages

Hidden pages are part of the site flow (the next/previous scheme) but don't appear in the navigation menu. Skipped pages are the opposite: they appear in the navigation menu, but are not part of the site flow.

Introducing HTML::Widgets::NavMenu

How do you create menus? HTML::Widgets::NavMenu is a CPAN module for maintaining navigation menus and site flow in general. It supports all of the above-mentioned patterns and some others, has a comprehensive test suite, and is under active maintenance. I have successfully used this module to maintain the site flow logic for such sites as my personal home page and the Perl Beginners' Site. Other people use it for their own sites.

The module makes it easy to generate and maintain such navigation menus in Perl. It is generic enough to generate either static HTML or dynamic HTML on the fly for use within server-side scripts (CGI, mod_perl, etc.).

To install it, use a CPAN front end by issuing a command such as perl -MCPANPLUS -e "install HTML::Widgets::NavMenu" or perl -MCPAN -e "install HTML::Widgets::NavMenu".

A Simple Example

Here's a simple example: a navigation tree that contains a home page and two other pages.

You can see the complete code for this example:

#!/usr/bin/perl

use strict;
use warnings;

use HTML::Widgets::NavMenu;
use File::Path;

This is the standard way to begin a Perl script. It imports the HTML::Widgets::NavMenu module and the File::Path module, both of which the script uses later.

my $css_style = <<"EOF";
a:hover { background-color : palegreen; }
.body {
.
.
.
EOF

This code defines a CSS stylesheet to make things nicer visually.

my $nav_menu_tree =
{
    'host'  => "default",
    'text'  => "Top 1",
    'title' => "T1 Title",
    'subs'  =>
    [
        {
            'text' => "Home",
            'url'  => "",
        },
        {
            'text'  => "About Me",
            'title' => "About Myself",
            'url'   => "me/",
        },
        {
            'text'  => "Links",
            'title' => "Hyperlinks to other Pages",
            'url'   => "links/",
        },
    ],
};

Now this is important. This is the tree that describes the navigation menu. It is a standard nested Perl 5 data structure, with well-specified keys. These keys are:

  • host: A specification of the host on which the sub-tree starting from that node resides. HTML::Widgets::NavMenu menus can span several hosts on several domains. In this case, the menu uses just one host, so default here is fine.
  • text: What to place inside of the <a>...</a> tag (or alternatively, the <b> tag, if it's the current page).
  • title: Text to place as a title attribute to a hyperlink (usually displayed as a tooltip). It can display more detailed information, helping to keep the link text itself short.
  • url: The path within the host where this item resides. Note that all URLs are relative to the top of the host, not the URL of their supernode. If the supernode has a path of software/ and you wish the subnode to have a path of software/gimp/, specify url => 'software/gimp/'.
  • subs: An array reference that contains the node's sub-items. Normally, this will render them in a submenu.

One final note: HTML::Widgets::NavMenu does not render the top item. The rendering starts from its sub-items.

my %hosts =
(
    'default' =>
    {
        'base_url' => ("http://web-cpan.berlios.de/modules/" .
            "HTML-Widgets-NavMenu/article/examples/simple/dest/"),
    },
);

This is the hosts map, which maps each host ID to its properties. Here there is only one host, called default.
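
Since each node's host key can name a different host, a site that spans two domains might use a map like this (a sketch with hypothetical host IDs and URLs):

my %hosts =
(
    'default' =>
    {
        'base_url' => "http://www.example.org/",
    },
    'downloads' =>
    {
        'base_url' => "http://downloads.example.org/",
    },
);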

my @pages =
(
    {
        'path'    => "",
        'title'   => "John Doe's Homepage",
        'content' => <<'EOF',
<p>
Hi! This is the homepage of John Doe. I hope you enjoy your stay here.
</p>
EOF
    },
    .
    .
    .
);

The purpose of this array is to enumerate the pages, giving each one the <title> tag, the <h1> title, and the content that it contains. It's not part of HTML::Widgets::NavMenu, but rather something that this script uses to render meaningful pages.

foreach my $page (@pages)
{
    my $path     = $page->{'path'};
    my $title    = $page->{'title'};
    my $content  = $page->{'content'};
    my $nav_menu =
        HTML::Widgets::NavMenu->new(
            path_info     => "/$path",
            current_host  => "default",
            hosts         => \%hosts,
            tree_contents => $nav_menu_tree,
        );

    my $nav_menu_results = $nav_menu->render();
    my $nav_menu_text    = join("\n", @{$nav_menu_results->{'html'}});
    
    my $file_path = $path;
    if (($file_path =~ m{/$}) || ($file_path eq ""))
    {
        $file_path .= "index.html";
    }
    my $full_path = "dest/$file_path";
    $full_path =~ m{^(.*)/[^/]+$};

    # mkpath() throws an exception if it isn't successful, which will
    # cause this program to terminate. This is what we want.
    mkpath($1, 0, 0755);
    open my $out, ">", $full_path or
        die "Could not open \"$full_path\" for writing: $!\n";
    
    print {$out} <<"EOF";
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>$title</title>
<style type="text/css">
$css_style
</style>
</head>
<body>
<div class="navbar">
$nav_menu_text
</div>
<div class="body">
<h1>$title</h1>
$content
</div>
</body>
</html>
EOF

    close($out);
}

This loop iterates over all the pages and renders each one in turn. If the directory up to the file does not exist, the program creates it by using the mkpath() function. The most important lines are:

    my $nav_menu =
        HTML::Widgets::NavMenu->new(
            path_info     => "/$path",
            current_host  => "default",
            hosts         => \%hosts,
            tree_contents => $nav_menu_tree,
        );

    my $nav_menu_results = $nav_menu->render();
    my $nav_menu_text    = join("\n", @{$nav_menu_results->{'html'}});

This code initializes a new navigation menu, giving it four named parameters. path_info is the path within the host. Note that, as opposed to the paths in the navigation menu, it starts with a slash; this allows some CGI-related redirections. current_host is the current host (again, default). Finally, hosts and tree_contents point to the hosts map and the tree of contents, respectively.

The object's render() method returns its results in a hash reference, with the rendered navigation menu as an array of tags pointed to by the html key. The code then joins them into a single string.

The program produces this result: three entries placed in a <ul>. When a user visits a page, the corresponding menu entry displays in bold and has its link removed.
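
On the "About Me" page, for example, the rendered menu might look roughly like this (a sketch; the exact markup is up to the module):

<ul>
<li>
<a href="../">Home</a>
</li>
<li>
<b>About Me</b>
</li>
<li>
<a href="../links/">Links</a>
</li>
</ul>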

A More Complex Example

Now consider a more complex example. This time, the tree is considerably larger and contains nested items: some pages now have subs of their own.

The final site has a menu. When a visitor accesses a page (for example, the "About Myself" page), its menu entry expands so that visitors can see its sub-items.

Adding More Navigation Aids

The next step is to add a breadcrumb trail, navigation links, and a site map to the site. You can inspect the new code to see if you understand it and view the final site.

The breadcrumb trail appears right at the top of the site. Below it is a toolbar with navigation links like "next," "previous," and "up." Finally, there's a site map. Here are the salient points of the code's modifications:

  1. The code loads the Template Toolkit to render the page: it defines the template, fills in its variables, and processes it into the output file.
  2. The CSS stylesheet has several new styles, to make the modified page look nicer.
  3. A new template portion transforms a breadcrumb-trail object, as returned by HTML::Widgets::NavMenu, into HTML. It should be easy to understand.
  4. The bottom of the navigation menu tree now has an entry with a link to the site map page.
  5. The site map is now part of the @pages array. Generating it initializes an HTML::Widgets::NavMenu with the appropriate path and then uses its gen_site_map() method.
  6. There is new code to generate the navigation links. These links come as a hash reference, with each key being the relevance of the link (such as "next," "prev," or "up") and each value being an object supplying information about the link (such as direct_url() or title()). Two loops render each link into both the <link> elements in the HTML <head> and the toolbar presented on the page; see the sketch after this list.
  7. The text of the breadcrumb trail is a join of the HTML representations of its components.
  8. The generated HTML template includes the new page elements.
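
Here is a minimal sketch of what the navigation-links code of point 6 might look like. It assumes that the results hash from render() exposes the links under a nav_links key; that key name, and the surrounding fragment, are my guesses for illustration--check the example code and the module's documentation for the real interface:

my $results   = $nav_menu->render();
my $nav_links = $results->{'nav_links'};

my (@head_links, @toolbar_links);
foreach my $rel (sort keys %{$nav_links})
{
    my $link  = $nav_links->{$rel};
    my $url   = $link->direct_url();
    my $title = $link->title();
    # One rendering for the <link> elements in the <head> ...
    push @head_links,
        qq{<link rel="$rel" href="$url" title="$title" />};
    # ... and one for the visible toolbar on the page.
    push @toolbar_links, qq{<a href="$url" title="$title">$rel</a>};
}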

Fine-Grained Site Flow

The final example modifies the site to have a more sophisticated site flow. Looking at the changes shows several more additions (a sketch of tree entries using these attributes follows the list). Their implications are:

  1. Both English resumés have a 'skip' => 1 pair. This causes these pages to appear in the navigation menu, but not to be part of the traversal flow. Clicking "next" on the page before them skips them both, and pressing "prev" on the page that follows them leads to the page that precedes them.
  2. The Humour section has its 'show_always' attribute set, causing it to expand on all pages of the site.
  3. The Software section has an 'expand' attribute containing a regular expression. As a result, accessing a page that does not appear in the navigation menu but whose path matches the regular expression causes the Software section to expand.
  4. The software tools page entry has the attribute 'hide' => 1. This removes it from the navigation menu but allows it to appear in the site flow. Clicking on "next" on the preceding page will reach it.
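
Here is a sketch of how such attributes sit in the tree; the entries and URLs below are hypothetical, so consult the real example code for the exact details:

{
    'text' => "Resume (English)",
    'url'  => "resume-eng/",
    # In the menu, but skipped by the next/prev traversal.
    'skip' => 1,
},
{
    'text'        => "Humour",
    'url'         => "humour/",
    # Stays expanded on every page of the site.
    'show_always' => 1,
},
{
    'text' => "Software Tools",
    'url'  => "software/tools/",
    # Part of the site flow, but absent from the menu.
    'hide' => 1,
},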

A CGI Script

Until now, the examples have demonstrated generating a set of static HTML pages. The code can also run dynamically on a server. One approach is to use the ubiquitous CGI.pm, which comes bundled with Perl.

Converting to a CGI script required few changes. Inside the page loop, the code checks whether the page matches the CGI path info (the path appended after the CGI script name). If so, the code calls the render_page() function.

render_page() is similar to the rest of the loop except that it prints the output to STDOUT after the CGI header. Finally, after the loop ends, the code checks that it has found a page. If not, it displays an error page.

Note that the way this script looks for a suitable page is suboptimal. A better-engineered script might keep the page paths in a persistent hash or other data structure from which to look up the path info.
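
A minimal sketch of that better-engineered approach, assuming the same @pages array as before (render_error_page() is a hypothetical helper standing in for the error-page code):

# Build the lookup table once, at startup.
my %page_by_path = map { $_->{'path'} => $_ } @pages;

# On each request, strip the leading slash from the path info
# and look the page up directly.
my $path_info = $ENV{'PATH_INFO'} || "/";
$path_info =~ s{^/}{};

if (my $page = $page_by_path{$path_info})
{
    render_page($page);
}
else
{
    # render_error_page() is a hypothetical stand-in.
    render_error_page();
}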

Conclusion

This article demonstrated how to use HTML::Widgets::NavMenu to maintain navigation menus and organize site flow. Reading its documentation may reveal other useful features. Now you no longer have an excuse for lacking the niceties demonstrated here on your site. Happy hacking!

Acknowledgments

Thanks to Diego Iastrubni, Aankehn, and chromatic (my editor) for giving some useful commentary on early drafts of this document.
