September 2002 Archives

This week on Perl 6 (9/23 - 9/29, 2002)

Okay, this is my last summary before I take a couple of weeks' holiday away from any form of connectivity. Will I cope? Can my system stand going cold turkey? Can you live without my summaries?

Luckily, Leon Brocard has been volunteered to step into the breach and produce summaries for the next couple of weeks.

Oh yes, due to my being a lazy swine and not reading release notes, combined with a new version of Spamassassin no longer delivering mail by default (now it silently drops mail on the floor in cases where it had previously just delivered the mail), I may be missing some messages from this week. Sorry.

We'll kick off, as usual, with happenings on the internals list:

Of Variables, Values and Vtables

Dan stopped travelling (for a while at least), and listed the current short term goals for Parrot. They are:

and promised the variable/vtable stuff in the 'next day or so', with the calling convention stuff a little earlier or later. Leo Toetsch offered some of his thoughts on vtable methods for _keyed opcodes.

http://groups.google.com/groups

http://groups.google.com/groups

IMCC 0.0.9.2

Leopold Toetsch provided a patch which 'fixes all currently known problems [with respect to] IMCC/Perl6'. Andy Dougherty had some problems with the patch dumping core, possibly because of platform specific issues, and Steve Fink realised that there was an overlap between this patch and one he'd been working on. The patch has not yet been applied, but work continued.

http://rt.perl.org/rt2/Ticket/Display.html

Fun with intlists

Leopold Toetsch showed some benchmarks of intlist against PerlArrays; the difference is stunning. The intlist based test is some ten times faster than PerlArray, with most of PerlArray's time being spent allocating memory. Leo suggests using intlist as the PerlArray base class.

Having got bragging rights for one speed up, Leo sent in a second patch which gave another tenfold performance boost. Sean O'Rourke had a few questions about performance in typical usage and wondered if we shouldn't look at borrowing from SGI's STL implementation of a deque (double-ended queue). Leo was ahead of him there; his second patch was already using the trick Sean had suggested.

http://groups.google.com/groups

http://groups.google.com/groups

Functions in Scheme

Jürgen Bömmels sent a pre patch which gets Scheme functions working. It's built on top of an early version of Sean O'Rourke's scratchpad.pmc, so be careful applying the initial patch. Sean hoped that it would be easy to reconcile Jürgen's changes to the scratchpad pmc with the changes he'd made since he sent Jürgen his early code. Jonathan Sillito asked why the scheme interpreter maintained its own environment stack rather than use the pad_stack. Apparently the current pad_stack is very closely tied to Sub.pmc, which doesn't quite offer the semantics needed for scheme functions. Also, the pad_stack makes it tricky to implement set! and define correctly.

Dan chimed in asking everyone to hash out what they needed from scratchpads and lexical variables; once we have that nailed down it should be easy to get everything designed and implemented reasonably quickly, so Jürgen and Sean came up with a list between them.

http://groups.google.com/groups -- The patch

http://groups.google.com/groups -- Its description

Perl6 on HP-UX 11.00

H Merijn Brand was having trouble getting Perl 6 to work on HP-UX. It was initially thought that this was a problem with the version of perl he was using, but was eventually tracked down to a problem with make test; the tests passed when Merijn did perl6 --test. However the thread also covered making sure that the Perl6 build process rebuilt the Grammar if appropriate. There's also a theory that there's a problem with IMCC generating .pasm files.

Leopold Toetsch put his hand up for causing the problem, and submitted a patch to fix things. Applied.

http://groups.google.com/groups

http://groups.google.com/groups

The status of Leopold Toetsch's patches

Leo wondered what's happening with the pile of patches he's submitted this week. At the time he made the post, he had 15 patches outstanding (or is that 'outstanding patches'?) and, as a result, several of the patches were applied. Steve Fink voted that Leo should be given commit access to CVS and Leo was grateful for the vote of confidence.

Leo later sent in yet another patch for intlist, which after a short quibble from Tom Hughes, and a correction from Leo, was applied.

http://groups.google.com/groups

http://rt.perl.org/rt2/Ticket/Display.html

Of PMCs, Buffers and memory management

Worker of the week, Leo Toetsch, posted a bunch of questions about PMCs, Buffers and their associated memory management. Firstly, he wondered why there was a separation between the two. He commented that 'If PMCs and Buffers are unified, it should be possible to mark [during a GC run] in one recursive process'. And there's the rub; we don't like recursion. PMCs are structured in such a way that a PMC tree can be walked in iterative fashion, which means that GC can be done in pretty much constant memory. Leo had a bunch of other questions, which were mostly answered by Mike Lambert, and the answers drew supplementary questions from Leo. Both Mike and Leo agreed that the changes needed to Parrot for unification would lead to massive patches; but that's not a reason for not doing the work.

http://groups.google.com/groups

http://groups.google.com/groups

Add Stone Age Exception Handling

This should possibly really go in the 'In Brief' section because there was only one post in the 'thread'. But it looks like an important post. Brent Dax sent in a patch which 'adds a very, very rudimentary form of C-level exception handling' to parrot. Brent reckons that brings parrot up to slightly better than 'homo erectus' quality exception handling.

http://rt.perl.org/rt2/Ticket/Display.html

Meanwhile, in Perl 6

I can't remember who it was that christened this week's monster thread 'Paren Madness', but they weren't wrong. The 'Here, we can build a list like this...' thread continued on its merry way. I'm afraid I pretty much stopped reading once it became apparent that the only thing that was going to stop the madness was Larry making a pronouncement. Eventually Dan stepped up and asked if someone could summarize the discussion, maybe with a few possible conclusions, and then leave it for a while 'til Larry got back. Luke Palmer wrote it up and offered a suggestion which looks at first glance to be sane, and which seemed to be well liked.

http://groups.google.com/groups

http://groups.google.com/groups

For loop and streams

In pretty much the only other substantial thread of the week, 'dulfer@widd.de' had some problems with the new for loop using multiple counters, and wondered if this was because of problems with the current Perl6 implementation, or because of problems with his understanding. It turned out that it was a problem with Sean O'Rourke's understanding when he implemented the Perl 6 grammar; he'd missed something in the appropriate Apocalypse. There also seems to be a problem in that the current behaviour is mostly defined with hand waves, which is great when you're doing the broad brush design, but less great when you're trying to implement the language.

http://groups.google.com/groups

In Brief

Leopold Toetsch patched packfile.c to stop monkeying with the internals of key structures.

Parrot T Shirts, based on Andy Wardley's parrot logo design, are now available from Cafe Press at http://www.cafeshops.com/cp/store.aspx; any proceeds go to YAS/TPF.

Simon Cozens found and patched a problem with IMCC's 'ostat' structure, which clashed with a structure in Darwin's stat.h.

Leopold Toetsch has been playing with using Doug Lea's memory allocator (see http://gee.cs.oswego.edu/dl/html/malloc.html) in Parrot. Apparently it makes 'life' run faster, but appears to double the memory footprint.

Steve Fink sent in some patches for IMCC; Leopold Toetsch did some cherry picking and released an integrated patch.

Erik Lechak wondered if there was a getting started guide to parrot, and if there wasn't, how should he go about writing one? My tip: Do it, use the tools you prefer to make the kind of guide you would have welcomed finding when you first came to parrot. Just don't use proprietary formats. Heck, it's how I started writing these summaries.

Who's Who in Perl 6

Who are you?
Piers Cawley
What do you do for/with Perl 6?
I write the summaries every week, and try and contribute to perl6-language and perl6-internals when they're discussing things I know about.
Where are you coming from?
I've been a happy Perl user since around 4.036, initially using it as a shell and awk replacement for system administration tasks, then moving over into a programming rôle where I got heavily into OO Perl. As so many others have said, Perl 5 fits my brain better than anything else I've been paid to do, but Perl 6 offers the chance to make that fit much closer.
When do you think Perl 6 will be released?
Sooner than we all think. Later than I want.
Why are you doing this?
Someone had to. I missed Bryan's summaries and decided that, if nobody else was going to volunteer it might as well be me.
You have 5 words. Describe yourself.
Just another opinionated Perl hack.
Do you have anything to declare?
I've run out of answer sets to this questionnaire. C'mon people, your summarizer needs you.

Acknowledgements

Thanks to Piers Cawley, for taking time out of his massively busy schedule to answer the questionnaire; to Leon Brocard, for not squawking too loudly when he got volunteered to do the next two summaries; to Leo Toetsch, for a fantastic number of patches this week; to Simon Cozens, for coming back to Perl 6; to the lovely Gill, for continuing to put up with me, day in, day out...

Hmm... check out the Oscar speech.

I'm trying an experiment this week, community proofreading. I'll run the speelchucker over this summary and release it to the ravening masses. Who knows, maybe it'll make sense. It does at least have the right date at the top of the page.

Once more, if you think this summary has value, send money to the Perl Foundation http://donate.perl-foundation.org and feedback and/or T?iBooks to me, mailto:pdcawley@bofh.org.uk. As usual, the fee paid for publication of this summary on perl.com has been donated directly to the Perl Foundation.

An AxKit Image Gallery

AxKit is not limited to working with pure XML data. Starting with this article, we'll work with and around non-XML data by developing an image browser that works with two types of non-XML data: a directory listing built from operating system calls (file names and statistics) and image files. Furthermore, it will be built from small modules that you can adapt to your needs or use elsewhere, like the thumbnail generator or the HTML table wrapper.

By the time we're done, several articles from now, we'd like an application that:

  • provides navigation around a tree of directories containing images,
  • displays image galleries with thumbnails,
  • ignores nonimage files,
  • allows you to define and present a custom set of information ("meta data") about each image,
  • allows you to view the complete images with and without metadata,
  • uses a non-AxKit mod_perl handler to generate thumbnail images on the fly, and
  • allows you to edit the metadata information in-browser

That feature list should allow us to build a "real world" application (rather than the weather examples we've discussed so far), and hopefully a useful one as well. Here's a screenshot of the page created by this article and the next:

Example page.

That page has four sections:

  1. Heading: Tells you where you are and offers navigation up the directory tree.
  2. Folders: links to the parent directory and any sub folders (Jim and Mary).
  3. Images: offers a thumbnail and caption area for each image. Clicking on an image or image title takes you to the full-size variant.
  4. Footer: A breadcrumbs display for getting back up the directory tree after scrolling down through a large page of images.

We'll implement the (most challenging) third section in this article and the other sections in the next article.

If you want to review the basics of AxKit and Apache configuration, then here are the previous articles in this series:

Working with non-XML data as XML

The easiest way to actually work with non-XML data in AxKit is often to turn it in to XML and feed it to AxKit. AxKit itself takes this approach in its new directory handling feature -- thanks to Matt Sergeant and Jörg Walters, AxKit can now scan the directory and build an XML document with all of the data. This is a lot like what native Apache does when it serves up an HTML directory listing, but it allows you to filter it. The main part of this article is about filtering this directory listing in order to create a gallery, or proofsheet, of thumbnail images.

In This Series

Introducing AxKit
The first in a series of articles by Barrie Slaymaker on setting up and running AxKit. AxKit is a mod_perl application for dynamically transforming XML. In this first article, we focus on getting started with AxKit.

XSP, Taglibs and Pipelines
Barrie explains what a "taglib" is, and how to use them to create dynamic pages inside of AxKit.

Taglib TMTOWTDI
Continuing our look at AxKit tag libraries, Barrie explains the use of SimpleTaglib and LogicSheets.

In this case, we'll be using a relatively recent addition to AxKit's standard toolkit, SAX Machines, integrated in to AxKit thanks to Kip Hampton. (disclaimer: XML::SAX::Machines is a module I wrote.) The SAX machine we'll create will be a straight pipeline with a few filters, a lot like the pipelines that AxKit uses. This pipeline will dissect directory listings and generate a list of images segmented into rows for easy display purposes. We don't get in to the details of SAX or SAX machines except to bolt together three building blocks; all of the gory details are handled for us by other modules. If you are interested in the gory details, then see Part One and Part Two of Kip's article "Introducing XML::SAX::Machines" on XML.com.

After the SAX machine builds our list of images, XSLT will be used to merge in metadata (like image titles and comments) from independent XML files and format the result for the browser. The resulting pages look like:

Managing non-XML data (the images)

On the other hand, it doesn't make sense to XMLify raw image data (though things like SVG--covered in XML.com's Sacre SVG articles--and dia files are a natural fit), so we'll take advantage of AxKit's integration with Apache and mod_perl to delegate image handling to these more suitable tools.

This is done by using a distinctive URL for thumbnail image files and a custom mod_perl handler, My::Thumbnailer, to convert full-size images to thumbnails. Neither AxKit nor mod_perl code will be used to serve the images; that will be left to Apache.

Thumbnails will be autogenerated in files with the same name as the main image file with a leading period (".") stuck on the front. In Unix land, this indicates a hidden file, and we don't want thumbnails (or other dotfiles) showing up in our gallery pages.
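The naming convention is simple enough to sketch in a few lines of Perl. This is only an illustration of the rule described above, not the code My::Thumbnailer uses; the helper names thumbnail_path() and is_hidden() are made up for this example.

```perl
use strict;
use warnings;
use File::Basename qw( fileparse );

# Hypothetical helper: map an image path to the path of its thumbnail
# by sticking a period on the front of the file name.
sub thumbnail_path {
    my ( $image_path ) = @_;
    my ( $name, $dir ) = fileparse( $image_path );
    return $dir . "." . $name;
}

# Hypothetical helper: true for dotfiles, which the gallery pages
# should not list.
sub is_hidden {
    my ( $path ) = @_;
    my ( $name ) = fileparse( $path );
    return $name =~ /^\./ ? 1 : 0;
}

print thumbnail_path( "/home/me/htdocs/04/a-look.jpeg" ), "\n";
```

The same leading-period test is what lets a single pattern pick out thumbnail requests on the Apache side.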

My::Thumbnailer uses the relatively new Imager module by Arnar M. Hrafnkelsson and Tony Cook. This is a best-of-breed module that competes with the likes of the venerable GD, the juggernaut Image::Magick, and Graphics::Libplot. Imager is gaining a reputation for speed, quality and a full-featured API.

The .meta file

Before we delve in to the implementation, let's look at one of the more subtle points of this design. Our previous examples have all been of straight pipelines that successively process a source document into an HTML page. In this application, however, we'll be funneling data from the source document and a collection of related files we'll call meta files.

This subtlety is not apparent from the screenshot, but if you look closely you can see that the caption for the first image ("A baby picture") contains more information than the captions for the other eight. This is because the first image has a meta file that contains a title and a comment to be displayed while the others don't (though they could).

The first image ("A baby picture") is from a file named a-look.jpeg, for which there is a meta file named a-look.meta in the same directory that looks like (bold shows the data that ends up getting sent to the browser):

    <meta>
      <title>A baby picture</title>
      <comment>
        <b>ME!</b>.  Well, not really.  Actually, it's some
        random image from the 'net.
      </comment>
    </meta>

An important feature of this file is that its contents and how they are presented within the caption area are completely unspecified by the core image gallery code. This makes our image gallery highly customizable: the site designer can determine what meta information needs to be associated with each image and how that information gets presented. Data can be presented in the thumbnail caption, in the expanded view, or used for nondisplay purposes.

Here's what's in each caption area:

  1. The title. If a .meta file is found for an image and it has a nonempty <title> element, then it is used as the name, otherwise the image's filename is stripped of extensions and used.
  2. The last modified time of the image file (in server-local time, unfortunately).
  3. A comment (optional): if a .meta file has a <comment> element, including XHTML markup, it is displayed.
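The title rule in item 1 can be sketched in Perl as follows. In the real application this choice is made in the stylesheets, so treat this as an illustration of the fallback logic only; caption_title() is a hypothetical name, and its second argument stands for the text of the <title> element (or undef when there is no .meta file).

```perl
use strict;
use warnings;

# Hypothetical helper: choose the caption title for an image.
# $meta_title is the text of <title> from the .meta file, or undef.
sub caption_title {
    my ( $filename, $meta_title ) = @_;

    # A nonempty <title> wins.
    if ( defined $meta_title && $meta_title =~ /\S/ ) {
        return $meta_title;
    }

    # Otherwise strip everything from the first period onward.
    ( my $title = $filename ) =~ s/\..*\z//s;
    return $title;
}

print caption_title( "a-look.jpeg",  "A baby picture" ), "\n";
print caption_title( "a-lucky.jpeg", undef ),            "\n";
```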

Why a .meta file per image instead of one huge file? It will hopefully allow admins to manage images and meta files together and allow us to access an image's meta information in a single file, a natural thing to do in AxKit. By having a pair of files for each image, you can use simple filesystem manipulations to move them around, or use filesystem links to make an image appear in multiple directories, perhaps with the same meta file, perhaps with different ones. This way we don't need to develop a lot of complex features to get a lot of mileage out of our image gallery (though we could if need be).

The Pipeline

No AxKit implementation documentation would be complete without detailing the pipeline. Here is the pipeline for the image proofsheet page shown above:

The AxKit pipeline for the image gallery application, take 1

  • The <filelist> document generated by AxKit
  • My::Filelist2Data. Converts the <filelist> to a Perl data structure
  • My::ProofSheet. Takes the Perl data structure and generates a list of images with some metadata
  • XML::Filter::TableWrapper. Segments the list of images in to rows suitable for use in an HTML <table>
  • My::ProofSheetMachine. A SAX machine containing 3 SAX filters
  • rowsplitter.xsl. Converts each row of thumbnail metadata in to two rows, one for images, the other for captions
  • metamerger.xsl. Adds in the external .meta files, if they exist
  • .meta files for the images
  • captionstyler.xsl. Converts the captions to XHTML
  • pagestyler.xsl. Converts the main part of the page to XHTML
  • the final output

The blue documents are content: the directory listing, the meta files and the generated HTML. This does not show the image processing; see My::Thumbnailer for that.

In this case, unlike our previous pipelines, data does not flow in a purely linear fashion: The directory listing from AxKit (<filelist>) feeds the pipeline and is massaged by three SAX filters and then by four XSLT filters. There are so many filters because this application is built to be customizable by tweaking specific filters or by adding other filters to the pipeline. It also uses several SAX filters available on CPAN to make life much easier for us.

In actual use, you may want to add more filters for things like branding, distinguishing groups of images by giving directory hierarchies different backgrounds or titles, adding ad banners, etc.

Here's a brief description of what each filter does, and why each is an independent filter:

  • My::ProofSheetMachine is a short module that builds a SAX Machine Pipeline. SAX filters are used in this application to handle tasks that are more suited to Perl than to XSLT or XSP:
    • My::Filelist2Data is another short module that uses the XML::Simple module from CPAN to convert the <filelist> in to a Perl data structure that is passed on. This is its own filter because we want to customize XML::Simple and the resulting data structure a bit before passing it on.
    • My::ProofSheet is the heart of the gallery page generation. It builds a list of images from the filelist data structure and adds information about the thumbnail images and meta files.
    • XML::Filter::TableWrapper is a module from CPAN that is used to wrap a possibly lengthy list of images into rows of no more than five images each.
  • rowsplitter.xsl takes each row of images and makes it into two table rows: one for the images and one for the captions. This is easier to do in XSLT than in SAX, so here is where we shift from SAX processing to XSLT processing.
  • metamerger.xsl examines each caption to see if My::ProofSheet put the URL for a meta file in it. If so, it opens the meta file and inserts it in the caption. This is a separate filter because the site admin may prefer to write a custom filter here to integrate meta information from some other source, like a single master file or a centralized database.
  • captionstyler.xsl looks at each caption and rewrites it to be XHTML. This is a separate filter for two reasons: it allows the look and feel of the captions to be altered without having to mess with the other filters and, because it is the only filter that cares about the contents of the meta file, the site admin can alter the schema of the meta files and then alter this filter to match.
  • pagestyler.xsl converts everything outside of the caption elements in to HTML. It is separate so that the page look and feel can be altered per-site or per-directory without affecting the caption content, etc.

There are several key things to note about this design. The first is that the separation of the process into multiple filters offers the administrator the ability to modify the site's content and styling. Second, because AxKit is built on Apache's configuration engine, which filters are used for a particular directory request can be selected based on URL, directory path, query string parameters, browser types, etc. The third point to note is the use of SAX processors to handle tasks that are easier (far easier in some cases) to implement in Perl, while XSLT is used when it is more (programmer and/or processor) efficient.

The Configuration

Here's how we configure AxKit to do all of this:

    ##
    ## Init the httpd to use our "private install" libraries
    ##
    PerlRequire startup.pl

    ##
    ## AxKit Configuration
    ##
    PerlModule AxKit

    <Directory "/home/me/htdocs">
        Options -All +Indexes +FollowSymLinks

        # Tell mod_dir to translate / to /index.xml or /index.xsp
        DirectoryIndex index.xml index.xsp
        AddHandler axkit .xml .xsp

        AxDebugLevel 10

        AxTraceIntermediate /home/me/axtrace

        AxGzipOutput Off

        AxAddXSPTaglib AxKit::XSP::Util
        AxAddXSPTaglib AxKit::XSP::Param

        AxAddStyleMap text/xsl \
                      Apache::AxKit::Language::LibXSLT

        AxAddStyleMap application/x-saxmachines \
                      Apache::AxKit::Language::SAXMachines

    </Directory>

    
    <Directory "/home/me/htdocs/04">
        # Enable XML directory listings (see Generating File Lists)
        AxHandleDirs On

        #######################
        # Begin pipeline config
        AxAddRootProcessor application/x-saxmachines . \
            {http://axkit.org/2002/filelist}filelist
        PerlSetVar AxSAXMachineClass "My::ProofSheetMachine"

        # The absolute stylesheet URLs are because
        # I prefer to keep stylesheets out of the
        # htdocs for security reasons.
        AxAddRootProcessor text/xsl file:///home/me/04/rowsplitter.xsl \
            {http://axkit.org/2002/filelist}filelist

        AxAddRootProcessor text/xsl file:///home/me/04/metamerger.xsl \
            {http://axkit.org/2002/filelist}filelist

        AxAddRootProcessor text/xsl file:///home/me/04/captionstyler.xsl \
            {http://axkit.org/2002/filelist}filelist

        AxAddRootProcessor text/xsl file:///home/me/04/pagestyler.xsl \
            {http://axkit.org/2002/filelist}filelist
        # End pipeline config
        #####################

        # This is read by My::ProofSheetMachine
        PerlSetVar MyColumns 5

        # This is read by My::ProofSheet
        PerlSetVar MyMaxX 100

        # Send thumbnail image requests to our
        # thumbnail generator
        <FilesMatch "^\.">
            SetHandler  perl-script
            PerlHandler My::Thumbnailer
            PerlSetVar  MyMaxX 100
            PerlSetVar  MyMaxY 100
        </FilesMatch>
        
    </Directory>

The first <Directory> section contains the AxKit directives we introduced in article 1 and a new stylesheet mapping for application/x-saxmachines that allows us to use a SAX machine in the pipeline. Otherwise, all of the configuration directives key to this example are in the <Directory "/home/me/htdocs/04"> section.

We saw basic examples of how AxKit works with the Apache configuration engine in article 1 and article 2 in this series. We'll use this photo gallery application to demonstrate many of the more powerful mechanisms in a future article.

By setting AxHandleDirs On, we tell AxKit to generate the <filelist> document (described in the section Generating File Lists) in the 04 directory and below.

Then it's off to configure the pipeline for the 04 directory hierarchy. To do this, we take advantage of the fact that AxKit places all elements in the filelist document in to the namespace http://axkit.org/2002/filelist. The AxAddRootProcessor's third parameter causes AxKit to look at all documents it serves from the 04 directory tree and check to see whether the root element matches the namespace and element name.

This is specified in the notation used by James Clark in his introduction to XML namespaces.
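For readers who haven't met it, that notation puts the namespace URI in curly braces in front of the local element name. The sketch below shows how such a name could be taken apart in Perl; it is not AxKit's own parsing code, and parse_clark_name() is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: split "{namespace-uri}local-name" into its parts.
# A name with no leading {...} is treated as having an empty namespace.
sub parse_clark_name {
    my ( $name ) = @_;
    if ( $name =~ /\A\{([^}]*)\}(.+)\z/s ) {
        return ( $1, $2 );
    }
    return ( "", $name );
}

my ( $ns, $local ) =
    parse_clark_name( "{http://axkit.org/2002/filelist}filelist" );
print "namespace: $ns\n";
print "element:   $local\n";
```

An empty pair of braces, as in "{}images", names an element that is in no namespace at all.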

If the document matches, and all AxKit-generated filelists will, then the MIME type and the stylesheet specified in the first two parameters are added to the pipeline. The five AxAddRootProcessor directives add the SAX machine and the four XSLT filters we described in the section "The Pipeline".

When loading a SAX machine into the pipeline, you can give it a simple list of SAX filters (there are many available on CPAN) and it will build a pipeline of them. This is done with a (not shown) PerlSetVar AxSAXMachineFilters "..." directive. The limitation with this directive is that you cannot pass in any initialization values to the filters and we want to.

So, instead, we use the PerlSetVar AxSAXMachineClass "My::ProofSheetMachine" to tell the Apache::AxKit::Language::SAXMachines module to load the class My::ProofSheetMachine and let that class construct the SAX machine.

The final part of the configuration uses a <FilesMatch> section to forward all requests for thumbnail images to the mod_perl handler in My::Thumbnailer.

Walking the Pipeline

Now that we have our filters in place, let's walk the pipeline and take a look at each filter and what it emits.

Generating File Lists

<filelist> document's position in the processing pipeline

First, here's a look at the <filelist> document that feeds the chain. This is created by AxKit when it serves a directory request in much the same way that Apache creates HTML directory listings. AxKit only generates these pages when the AxHandleDirs On directive is in effect. This causes AxKit to scan the directory for the above screenshot and emit XML like (whitespace added, repetitive stuff elided):

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE filelist PUBLIC
      "-//AXKIT/FileList XML V1.0//EN"
      "file:///dev/null"
    >
    <filelist xmlns="http://axkit.org/2002/filelist">
      <directory
        atime="1032276941"
        mtime="1032276939"
        ctime="1032276939"
        readable="1"
        writable="1"
        executable="1"
        size="4096" >.</directory>
      <directory ...>..</directory>
      <directory ...>Mary</directory>
      <directory ...>Jim</directory>
      <file mtime="1031160766" ...>a-look.jpeg</file>
      <file mtime="1031160787" ...>a-lotery.jpeg</file>
      <file mtime="1031160771" ...>a-lucky.jpeg</file>
      <file mtime="1032197214" ...>a-look.meta</file>
      <file mtime="1035239142" ...>foo.html</file>
      ...
    </filelist>

The emboldened bits are the pieces of data we want to display: some filenames and their modification times. Some things to notice:

  • All of the elements -- most importantly the root element as we'll see in a bit -- are in a special namespace, http://axkit.org/2002/filelist, using the xmlns= attribute (see James Clark's introduction for details).
  • The entries are in unsorted order. We might want to allow the user to sort by different attributes someday, but this means that we at least need to sort the results somehow.
  • They contain the complete output from the stat() system call as attributes, so we can use the mtime attribute to derive a modification time.
  • There are files in there (a-look.meta and foo.html) that clearly should not be displayed as images.
  • The filename for a-look.jpeg is not emboldened: We'll use the <title> element from the a-look.meta file instead.
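To make the points in that list concrete, here is a minimal Perl sketch of the munging they imply: drop the non-image entries, sort what is left, and turn each mtime into something readable. The data is copied from the listing above; the is_image() rule (a fixed list of extensions) is an assumption made for this example and is not necessarily how My::ProofSheet decides.

```perl
use strict;
use warnings;
use POSIX qw( strftime );

# Sample entries, shaped like the <file> elements in the listing.
my @files = (
    { name => "a-look.jpeg",   mtime => 1031160766 },
    { name => "a-lotery.jpeg", mtime => 1031160787 },
    { name => "a-lucky.jpeg",  mtime => 1031160771 },
    { name => "a-look.meta",   mtime => 1032197214 },
    { name => "foo.html",      mtime => 1035239142 },
);

# Hypothetical rule: skip dotfiles and accept a few image extensions.
sub is_image {
    my ( $name ) = @_;
    return 0 if $name =~ /^\./;
    return $name =~ /\.(?:jpe?g|png|gif)\z/i ? 1 : 0;
}

# Keep the images and sort them by file name.
my @images = sort { $a->{name} cmp $b->{name} }
             grep { is_image( $_->{name} ) } @files;

for my $image ( @images ) {
    # localtime() yields server-local time.
    my $stamp = strftime( "%Y-%m-%d %H:%M", localtime( $image->{mtime} ) );
    print "$image->{name}  $stamp\n";
}
```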

My::ProofSheetMachine

My::ProofSheetMachine's position in the processing pipeline.

The processing pipeline is kicked off with a set of three SAX filters built by the My::ProofSheetMachine module:

    package My::ProofSheetMachine;
    
    use strict;
    
    use XML::SAX::Machines qw( Pipeline );
    use My::ProofSheet;
    use XML::Filter::TableWrapper;
    
    sub new {
        my $proto = shift;
        return bless {}, ref $proto || $proto;
    }
    
    sub get_machine {
        my $self = shift;
        my ( $r ) = @_;
    
        my $m = Pipeline(
            My::Filelist2Data
            => My::ProofSheet->new( Request => $r ),
            => XML::Filter::TableWrapper->new(
                ListTags => "{}images",
                Columns  => $r->dir_config( "MyColumns" ) || 3,
            ),
        );
    
        return $m;
    }
    
    1;

This module provides a minimal constructor, new() so it can be instantiated (this is an Apache::AxKit::Language::SAXMachines requirement, we don't need that for our sake). AxKit will call the get_machine() method once each request to obtain the SAX machine is used. SAX machines are not reused from request to request.

$r is a reference to the Apache request object (well, actually, to an AxKit subclass of it). This is passed into My::ProofSheet, which uses to interact query some httpd.conf settings, to control AxKit's cache, and to probe the filesystem through Apache.

$r is also queried in this module to see whether there is a MyColumns setting for this request, with a default of 3 in case there is not. The ListTags setting tells XML::Filter::TableWrapper to segment the image list produced by the first two filters into rows of images (preparing it to be an HTML table, in other words).

The need to pass parameters like this to the SAX filters is the sole reason we're using a SAX machine factory class like this. This class is specified by using PerlSetVar AxSAXMachineClass; if we didn't need to initialize the filters like this, then we could have listed them in a PerlSetVar AxSAXMachineFilters directive. For more details on how SAX machines are integrated with AxKit, see the man page.

Currently, only one SAX machine is allowed in an AxKit pipeline at a time (though different pipelines can have different machines in them). This is a limitation of the configuration system more than anything and may well change if need be. However, if we need to add SAX processors to the end of the machine, then the PerlSetVar AxSAXMachineFilters can be used to insert site-specific filters after the main machine (and before the XSLT processors).

My::Filelist2Data

My::Filelist2Data's position in the processing pipeline.

Converting the <filelist> into a proofsheet takes a bit of detailed data munging. This is quite easy in Perl, so the first step in our pipeline is to convert the XML file listing into data. XML::Simple provides this functionality for us, and we subclass it so we can grab the resulting data structure and pass it on:

    package My::Filelist2Data;
    
    use XML::Simple;
    @ISA = qw( XML::Simple );
    
    use strict;
    
    sub new {
        my $proto = shift;
        my %opts = @_;
    
        # The Handler value is passed in by the Pipeline()
        # call in My::ProofSheetMachine.
        my $h = delete $opts{Handler};
    
        # Even if there's only one file element present,
        # make XML::Simple put it in an ARRAY so that
        # the downstream filter can depend on finding an
        # array of elements and not a single element.
        # This is an XML::Simple option that is almost
        # always set in practice.
        $opts{forcearray} = [qw( file )];
    
        # Each <file> and <directory> element contains
        # the file name as simple text content.  This
        # option tells XML::Simple to store it in the
        # data member "filename".
        $opts{contentkey} = "filename";
    
        # This subroutine gets called when XML::Simple
        # has converted the entire document with the
        # $data from the document.
        $opts{DataHandler} = sub {
            shift;
            my ( $data ) = @_;
    
            # If no files are found, place an array
            # reference in the right spot.  This is
            # to simplify downstream filter code.
            $data->{file}      ||= [];
    
            # Pass the data structure to the next filter.
            $h->generate( $data );
        } if $h;
    
        # Call XML::Simple's constructor.
        return $proto->SUPER::new( %opts );
    }
    
    1;

Sending a data structure like this between SAX machines using a non-SAX event is known as "cheating." But this is Perl, and allowing you to cheat responsibly and judiciously is one of Perl's great strengths. This works and should work for the foreseeable future. If you're planning on doing something like this for a general purpose filter, then it behooves you to also provide set_handler and get_handler methods so your filter can be repositioned after instantiation (something XML::SAX::Machines do if need be), but we don't need to clutter this single-purpose example.
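As a rough sketch of what such set_handler and get_handler methods might look like, consider the following. The package names My::RepositionableFilter and My::FakeDownstream and the send_data() method are hypothetical and exist only for this illustration. The one point it tries to make is that the handler is looked up at call time rather than captured in a closure, so repositioning the filter after instantiation takes effect:

```perl
package My::RepositionableFilter;    # hypothetical name, illustration only

use strict;
use warnings;

sub new {
    my $proto = shift;
    my %opts  = @_;
    return bless { Handler => $opts{Handler} }, ref $proto || $proto;
}

# Let whoever builds the pipeline swap the downstream handler later.
sub set_handler { $_[0]->{Handler} = $_[1]; return }
sub get_handler { return $_[0]->{Handler} }

# Look the handler up at call time instead of capturing it in a
# closure, so a later set_handler() call takes effect.
sub send_data {
    my ( $self, $data ) = @_;
    my $h = $self->get_handler or return;
    return $h->generate( $data );
}

package My::FakeDownstream;          # stand-in for My::ProofSheet

sub new { return bless { seen => [] }, shift }

sub generate {
    my ( $self, $data ) = @_;
    push @{ $self->{seen} }, $data;
    return scalar @{ $self->{seen} };
}

package main;

my $filter     = My::RepositionableFilter->new;
my $downstream = My::FakeDownstream->new;

$filter->set_handler( $downstream );
$filter->send_data( { file => [] } );
```

My::Filelist2Data as shown above captures $h in a closure instead, which is fine for a single-purpose filter that is never repositioned.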

The <filelist> document gets converted to a Perl data structure where each element is a data member in a HASH or an array, like (data elided and rearranged to relate well to the source XML):

    {
      xmlns => 'http://axkit.org/2002/filelist',
      directory => [
        {
          atime      => '1032276941',
          mtime      => '1032276939',
          ctime      => '1032276939',
          readable   => '1',
          writable   => '1',
          executable => '1',
          size       => '4096',
          filename   => '.',
        },
        {
          ...
          filename   => '..',
        },
        {
          ...
          filename   => 'Mary',
        },
        {
          ...
          filename   => 'Jim',
        }
      ],
      file => [
        {
          mtime      => '1031160766',
          ...
          filename   => 'a-look.jpeg',
        },
        {
          mtime      => '1031160787',
          ...
          filename   => 'a-lotery.jpeg',
        },
        {
          mtime      => '1031160771',
          ...
          filename   => 'a-lucky.jpeg',
        },
        {
          mtime      => '1035239142',
          ...
          filename   => 'foo.html',
        },
        ...
      ],
    }

My::ProofSheet

My::ProofSheet's position in the processing pipeline.

Once the data is in a Perl data structure, it's easy to tweak it (making mtime fields into something readable, for instance) and extend it (adding information about thumbnail images and .meta files, for instance). This is what My::ProofSheet does:

    package My::ProofSheet;
    
    use XML::SAX::Base;
    @ISA = qw( XML::SAX::Base );
    
    # We need to access the Apache request object to
    # get the URI of the directory we're presenting,
    # its physical location on disk, and to probe
    # the files in it to see if they are images.
    use Apache;
    
    # My::Thumbnailer is an Apache/mod_perl module that
    # creates thumbnail images on the fly.  See below.
    use My::Thumbnailer qw( image_size thumb_limits );
    
    # XML::Generator::PerlData lets us take a Perl data
    # structure and emit it to the next filter serialized
    # as XML.
    use XML::Generator::PerlData;
    
    use strict;
    
    sub generate {
        my $self = shift;
        my ( $data ) = @_;
    
        # Get the AxKit request object so we can
        # ask it for the URI and use it to test
        # whether files are images or not.
        my $r = $self->{Request};
    
        my $dirname = $r->uri;      # "/04/Baby_Pictures/Others/"
        my $dirpath = $r->filename; # "/home/me/htdocs/...Others/"
    
    
        my @images = map $self->file2image( $_, $dirpath ),
            sort {
                $a->{filename} cmp $b->{filename}
            } @{$data->{file}};
    
        # Use a handy SAX module to generate XML from our Perl
        # data structures.  The XML will look basically like:
        #
        # <proofsheet>
        #   <images>
        #     <image>...</image>
        #     <image>...</image>
        #     ...
        #   </images>
        #   <title>/04/Baby_Pictures/Others</title>
        # </proofsheet>
        #
        XML::Generator::PerlData->new(
            rootname => "proofsheet",
            Handler => $self,
        )->parse( {
            title       => $dirname,
            images      => { image => \@images },
        } );
    }
    
    
    sub file2image {
        my $self = shift;
        my ( $file, $dirpath ) = @_;
    
        # Grab the filename; the meta and thumbnail names
        # below are derived from it.
        my $fn = $file->{filename};
    
        # Ignore hidden files (first char is a ".").
        # Thumbnail images are cached as hidden files.
        return () if 0 == index $fn, ".";
    
        # Ignore files Apache knows aren't images
        my $type = $self->{Request}->lookup_file( $fn )->content_type;
        return () unless
            defined $type
            && substr( $type, 0, 6 ) eq "image/";
    
        # Strip the extension(s) off.
        ( my $name = $fn ) =~ s/\..*//;
    
        # A meta filename is the image filename with a ".meta"
        # extension instead of whatever extension it has.
        my $meta_fn   = "$name.meta";
        my $meta_path = "$dirpath/$meta_fn";
    
        # The thumbnail file is stored as a hidden file
        # named after the image file, but with a leading
        # '.' to hide it.
        my $thumb_fn   = ".$fn";
        my $thumb_path = "$dirpath/$thumb_fn";
    
        my $last_modified = localtime $file->{mtime};
    
        my $image = {
            %$file,                  # Copy all fields
            type           => $type, # and add a few
            name           => $name,
            thumb_uri      => $thumb_fn,
            path           => "$dirpath/$fn",
            last_modified  => $last_modified,
        };
    
        if ( -e $meta_path ) {
            # Only add a URI to the meta info, metamerger.xsl will
            # slurp it up if and only if <meta_uri> is present.
            $image->{meta_filename} = $meta_fn;
            $image->{meta_uri}      = "file://$meta_path";
        }
    
        # If the thumbnail exists, grab its width and height
        # so later stages can populate the <img> tag with them.
        # The eval {} is in case the image doesn't exist or
        # the library can't cope with the image format.
        # Disable caching AxKit's output if a failure occurs.
        eval {
            ( $image->{thumb_width}, $image->{thumb_height} )
                = image_size $thumb_path;
        } or $self->{Request}->no_cache( 1 );
    
        return $image;
    }
    
    
    1;

When My::Filelist2Data calls generate(), generate() sorts the list of files by filename, converts each file to an image, and sends a page title and the resulting list of images to the next filter (XML::Filter::TableWrapper). Kip Hampton's XML::Generator::PerlData is a Perl data -> XML serialization module. It's not meant for generating generic XML; it focuses purely on building an XML representation of a Perl data structure. In this case, that's ideal, because we will be generating the output document with XSLT templates and we don't care about the exact order of the elements in each <image> element; each <image> element is just a hash of key/value pairs. We do control the order of the <image> elements, however, by passing an ordered list of them in to XML::Generator::PerlData as an array.
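The sort-then-convert step relies on a small Perl idiom: a subroutine called inside map can return the empty list to drop an item. Here is a minimal, self-contained sketch of that idiom. The sample data and the to_image() helper are made up for illustration; to_image() is only a simplified stand-in for file2image():

```perl
use strict;
use warnings;

# Hypothetical sample data shaped like My::Filelist2Data's output.
my $data = {
    file => [
        { filename => 'b.jpeg' },
        { filename => '.b.jpeg' },    # a cached thumbnail (hidden file)
        { filename => 'a.jpeg' },
    ],
};

# Simplified stand-in for file2image(): returns the empty list for
# files that should be skipped and a hash reference otherwise.
sub to_image {
    my ( $file ) = @_;
    return () if 0 == index $file->{filename}, ".";
    return { %$file, thumb_uri => ".$file->{filename}" };
}

# Sort first, then convert; skipped files simply vanish from the list.
my @images = map to_image( $_ ),
    sort { $a->{filename} cmp $b->{filename} } @{ $data->{file} };

print join( ", ", map $_->{filename}, @images ), "\n";
```

Because the list is sorted before it is converted, the surviving images keep their sorted order.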

Sorting by filename may not be the preferred thing to do for all applications, because users may prefer to sort by the caption title for the image, but then again they may not, and this allows the site administrator to control sort order by naming the files appropriately. We can always add sorting later.

Another peculiarity of this code is that it doesn't guarantee that there will be thumb_width and thumb_height values available. If you just drop the source images in a directory, then the first time the server generates this page, there will be no thumbnails available. In this case, the call to no_cache(1) prevents AxKit from caching the output page so that suboptimal HTML does not get stuck in the cache. This will give the server another chance at generating it with proper tags, hoping of course that by the next time this page is requested, the requisite thumbnails will be available to measure.
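The "measure the thumbnail or else disable caching" logic hinges on how eval {} interacts with or. The following sketch shows the idiom in isolation. fake_image_size() and the $cacheable flag are hypothetical stand-ins for image_size() and the no_cache(1) call, and the path is assumed not to exist:

```perl
use strict;
use warnings;

# Hypothetical stand-in for My::Thumbnailer::image_size(): it dies
# when the file is missing instead of returning a size.
sub fake_image_size {
    my ( $path ) = @_;
    die "no such file: $path\n" unless -e $path;
    return ( 72, 100 );
}

my %image;
my $cacheable = 1;

# A list assignment in scalar context yields the number of values on
# its right-hand side, so the eval block is true on success.  If
# fake_image_size() dies, eval returns undef and the "or" branch runs.
eval {
    ( $image{thumb_width}, $image{thumb_height} )
        = fake_image_size( "/no/such/dir/.a-look.jpeg" );
} or $cacheable = 0;

print $cacheable ? "cache the page\n" : "do not cache the page\n";
```

When the measurement fails, no thumb_width or thumb_height keys are added, which is why those elements are optional in the output.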

This approach gets the HTML to the browser fast, so the user's browser window will clear quickly and start filling with the top of this page, so the user will see some activity and be less likely to get impatient. The thumbnails will be generated when the browser sees all the <img> tags. The alternatives would be to thumbnail the images inline, which would result in a significant delay on large listings before the first HTML hits the browser, or to pregenerate the thumbnails.

One thing to note about this approach is that many browsers will request images several at a time, which will cause several server processes to be thumbnailing several different images at once. This should result in lower lag on low-load servers because processes can interleave CPU time and disk I/O waits, and can take advantage of multiple processors, if present. On heavily loaded servers, of course, this might be a bad thing; pregenerating thumbnails there would be a good idea.

The output from this filter looks like:

    <?xml version="1.0"?>
    <proofsheet>
      <images>
        <image>
          <path>
            /home/barries/src/mball/AxKit/www/htdocs/04/Baby_Pictures/Others/a-look.jpeg
          </path>
          <writable>1</writable>
          <filename>a-look.jpeg</filename>
          <thumb_uri>.a-look.jpeg</thumb_uri>
          <meta_filename>a-look.meta</meta_filename>
          <name>a-look</name>
          <last_modified>Wed Sep  4 13:32:46 2002</last_modified>
          <ctime>1032552249</ctime>
          <meta_uri>
            file:///home/barries/src/mball/AxKit/www/htdocs/04/Baby_Pictures/Others/a-look.meta
          </meta_uri>
          <mtime>1031160766</mtime>
          <size>8522</size>
          <readable>1</readable>
          <type>image/jpeg</type>
          <atime>1032553327</atime>
        </image>
        <image>
          <path>
            /home/barries/src/mball/AxKit/www/htdocs/04/Baby_Pictures/Others/a-lotery.jpeg
          </path>
          <writable>1</writable>
          <filename>a-lotery.jpeg</filename>
          <thumb_uri>.a-lotery.jpeg</thumb_uri>
          <name>a-lotery</name>
          <last_modified>Wed Sep  4 13:33:07 2002</last_modified>
          <ctime>1032552249</ctime>
          <mtime>1031160787</mtime>
          <size>10113</size>
          <readable>1</readable>
          <type>image/jpeg</type>
          <atime>1032553327</atime>
        </image>
      </images>
      ...
      <title>/04/Baby_Pictures/Others</title>
    </proofsheet>

All the data from the original <file> elements are in each <image> element along with the new fields. Note that the first <image> contains the <meta_uri> (pointing to a-look.meta) while the second doesn't because there is no a-lotery.meta. As expected, both have <thumb_uri> tags. Our presentation happens to want only a few of these bits (such as <filename>, <thumb_uri>, <name> and <last_modified>); yours might want more or different bits.

While there is a lot of extra information in this structure, it's really just the output from one system call (stat()) and some possibly useful byproducts of the My::ProofSheet machinations, so it's very cheap information that some front end somewhere might want. It's also easier to leave it all in than to emit just what our example frontend might want, and it will enable any future upstream filters or extensions to AxKit's directory scanning to shine through.

No <thumb_width> or <thumb_height> tags are present because I copied this file from the axtrace directory (see the AxTraceIntermediate directive in our httpd.conf file) after viewing a newly added directory. Here's what the first <image> element looks like when viewing after my browser had requested all thumbnails:

    <?xml version="1.0"?>
    <proofsheet>
      <images>
        <image>
          <thumb_width>72</thumb_width>
          <path>
            /home/barries/src/mball/AxKit/www/htdocs/04/Baby_Pictures/Others/a-look.jpeg
          </path>
          <writable>1</writable>
          <filename>a-look.jpeg</filename>
          <thumb_height>100</thumb_height>
          <thumb_uri>.a-look.jpeg</thumb_uri>
          <meta_filename>a-look.meta</meta_filename>
          <name>a-look</name>
          <last_modified>Wed Sep  4 13:32:46 2002</last_modified>
          <ctime>1032552249</ctime>
          <meta_uri>
            file:///home/barries/src/mball/AxKit/www/htdocs/04/Baby_Pictures/Others/a-look.meta
          </meta_uri>
          <mtime>1031160766</mtime>
          <size>8522</size>
          <readable>1</readable>
          <type>image/jpeg</type>
          <atime>1032784360</atime>
        </image>
        ...
      </images>
      <title>/04/Baby_Pictures/Others</title>
    </proofsheet>

XML::Filter::TableWrapper

XML::Filter::TableWrapper's position in the processing pipeline

XML::Filter::TableWrapper is a CPAN module used to take the <images> list and segment it by inserting <tr>...</tr> tags around every few (the number is configurable) <image> elements. This configuration is done by the My::ProofSheetMachine module we showed earlier:

    XML::Filter::TableWrapper->new(
        ListTags => "{}images",
        Columns  => $r->dir_config( "MyColumns" ) || 3,
    ),

The output, for our list of 9 images, looks like:

    <?xml version="1.0"?>
    <proofsheet>
      <images>
        <tr>
          <image>
            ...
          </image>
          ... 4 more image elements...
        </tr>
        <tr>
          <image>
            ...
          </image>
          ... 3 more image elements...
        </tr>
      </images>
      <title>/04/Baby_Pictures/Others</title>
    </proofsheet>

Now the presentation stylesheet (pagestyler.xsl) can key off the <tr> tags to build an HTML <table> or ignore them (and not pass them through) if it wants to display in a list format.
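To make the segmentation concrete, here is a plain-Perl sketch of the same idea applied to an ordinary list rather than a SAX stream. This is not how XML::Filter::TableWrapper is implemented; rows_of() is a hypothetical helper, and the column count of 5 is chosen only to match the sample output above:

```perl
use strict;
use warnings;

# Split a list into rows of at most $columns items each.
sub rows_of {
    my ( $columns, @items ) = @_;
    die "columns must be at least 1\n" unless $columns >= 1;

    my @rows;
    push @rows, [ splice @items, 0, $columns ] while @items;
    return @rows;
}

my @images = map "image$_", 1 .. 9;    # nine placeholder names
my @rows   = rows_of( 5, @images );

print scalar( @$_ ), " images in this row\n" for @rows;
```

Nine items at five per row gives one full row and a shorter final row, just as in the <tr> listing above.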

While I'm sure this is possible in XSLT, I have no idea how to do it easily.

rowsplitter.xsl

rowsplitter.xsl's position in the processing pipeline.

Experimentation with an early version of this application showed that presenting captions in the same table cell as the thumbnails when the thumbnails are of differing heights caused the captions to be shown at varying heights. This made it hard to scan the captions and added a lot of visual clutter to the page.

One solution is to add an XSLT filter that splits each table row of image data into two rows, one for the thumbnail and another for the caption:

    <xsl:stylesheet 
      version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    >
    
    <xsl:template match="image" mode="caption">
      <caption>
        <xsl:copy-of select="@*|*|node()" />
      </caption>
    </xsl:template>
    
    <xsl:template match="images/tr">
      <xsl:copy-of select="." />
      <tr><xsl:apply-templates select="image" mode="caption" /></tr>
    </xsl:template>
    
    <xsl:template match="@*|node()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>
    
    </xsl:stylesheet>

The second template in this stylesheet matches each row (<tr> element) in the <images> element and copies it verbatim and then emits a second <tr> element right after it with a list of <caption> elements with copies of the content of each of the <image> tags in the original row. The first template is applied only to the <image> tags when creating this second row due to the mode="caption" attributes.

The third template is a standard piece of XSLT boilerplate that passes through all the XML that is not matched by the first two templates. This XML would otherwise be mangled (stripped of elements, to be specific) by the wacky default XSLT rules.

Now, I know several ways to do this in Perl in the AxKit environment and none are so easy for me as using XSLT. YMMV.

The output from that stage looks like:

    <?xml version="1.0"?>
    <proofsheet>
      <images>

        <tr><image>...  </image>   ...total of 5... </tr>
        <tr><caption>...</caption> ...total of 5... </tr>

        <tr><image>...  </image>   ...total of 4... </tr>
        <tr><caption>...</caption> ...total of 4... </tr>

      </images>
      <title>/04/Baby_Pictures/Others</title>
    </proofsheet>

The content of each <image> tag and each <caption> tag is identical. It's easier to do the transform this way and allows the frontend stylesheets the flexibility of doing things like putting the image filename or modification time in the same cell as the thumbnail.

metamerger.xsl

metamerger.xsl's position in the processing pipeline

As with the row splitter, expressing the metamerger in XSLT is an expedient way of merging in external XML documents, for several reasons. The first is for efficiency's sake: We're already using XSLT before and after this filter, and AxKit optimizes XSLT->XSLT handoffs to avoid reparsing. Another is that the underlying implementation of AxKit's XSLT engine is the speedy C of libxslt. A third is that we're not altering the incoming file at all in this stage, so the XSLT does not get out of hand (I do not consider XSLT to be a very readable programming language; its XML syntax makes for very opaque source code).

Another approach would be to go back and tweak My::ProofSheet to inherit from XML::Filter::Merger and insert it using a SAX parser. That would be a bit slower, I suspect, because SAX parsing in general tends to be slower than XSLT's internal parsing. It would also rob the application of the configurability that having merging as a separate step engenders. By factoring this functionality into the metamerger.xsl stylesheet, we offer the site designer the ability to pull data from other sources, or even to fly without any metadata at all.

Here's what metamerger.xsl looks like:

    <xsl:stylesheet 
      version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    >
    
    <xsl:template match="caption">
      <caption>
        <xsl:copy-of select="*|@*|node()" />
        <xsl:copy-of select="document( meta_uri )" />
      </caption>
    </xsl:template>
    
    <xsl:template match="*|@*">
      <xsl:copy>
        <xsl:apply-templates select="*|@*|node()" />
      </xsl:copy>
    </xsl:template>
    
    </xsl:stylesheet>

The first template does all the work of matching each <caption> element and copying its content, then parsing and inserting the document indicated by the <meta_uri> element, if present. The document() function turns into a noop if <meta_uri> is not present. The second template is that same piece of boilerplate we saw in rowsplitter.xsl to copy through everything we don't explicitly match.

And here's what the <caption> for a-look.jpeg now looks like (all the other <caption> elements were left untouched because there are no other .meta files in this directory):

    <caption>
      <thumb_width>72</thumb_width>
      <path>/home/barries/src/mball/AxKit/www/htdocs/04/Baby_Pictures/Others/a-look.jpeg</path>
      <writable>1</writable>
      <filename>a-look.jpeg</filename>
      <thumb_height>100</thumb_height>
      <thumb_uri>.a-look.jpeg</thumb_uri>
      <meta_filename>a-look.meta</meta_filename>
      <name>a-look</name>
      <last_modified>Wed Sep  4 13:32:46 2002</last_modified>
      <ctime>1032552249</ctime>
      <meta_uri>file:///home/barries/src/mball/AxKit/www/htdocs/04/Baby_Pictures/Others/a-look.meta</meta_uri>
      <mtime>1031160766</mtime>
      <size>8522</size>
      <readable>1</readable>
      <type>image/jpeg</type>
      <atime>1032784360</atime>
      <meta>
        <title>A baby picture</title>
        <comment><b>ME!</b>.  Well, not really.  Actually, it's some random image from the 'net.
</comment>
      </meta>
    </caption>

As mentioned before, this stylesheet does not care what you put in the meta file, it just inserts anything in that file from the root element on down. So you are free to put any meta information your application requires in the meta file and adjust the presentation filters to style it as you will.

The .meta information is not inserted into the <image> tags because we know that our presentation will not need any of it there.

captionstyler.xsl

captionstyler.xsl's position in the processing pipeline

The last two stages of our pipeline turn the data assembled so far into HTML. This is done in two stages in order to separate general layout and presentation from the presentation of the caption, because these portions of the presentation might need to vary independently between one collection of images and another.

The caption stylesheet for this example is:

    <xsl:stylesheet 
      version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    >
    
    <xsl:template match="caption">
      <caption width="100" align="left" valign="top">
    
        <a href="{filename}">
          <xsl:choose>
            <xsl:when test="meta/title and string-length( meta/title )">
              <xsl:copy-of select="meta/title/node()" />
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="name" />
            </xsl:otherwise>
          </xsl:choose>
        </a><br />
    
        <font size="-1" color="#808080">
          <xsl:copy-of select="last_modified/node()" />
          <br />
        </font>
    
        <xsl:copy-of select="meta/comment/node()" />
    
      </caption>
    </xsl:template>
    
    <xsl:template match="*|@*|node()">
      <xsl:copy>
        <xsl:apply-templates />
      </xsl:copy>
    </xsl:template>
    
    </xsl:stylesheet>

The first template replaces all <caption> elements with new <caption> cells with a default width and alignment, and then fills these with the name of the image (which is also a link to the underlying image file), the <last_modified> time string formatted by My::ProofSheet, and any <comment> that might be present in the meta file.

The <xsl:choose> element is what selects the title to display for the image. The first <xsl:when> looks to see if there is a non-empty <title> element in the meta file and uses it if present. The <xsl:otherwise> defaults the title to the <name> set by My::ProofSheet.
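For readers more at home in Perl than XSLT, the same fallback can be sketched on a plain hash. The caption_title() helper and the sample hashes are hypothetical; they merely mirror the shape of a merged <caption> element:

```perl
use strict;
use warnings;

# Prefer a non-empty meta title; otherwise fall back to the name.
sub caption_title {
    my ( $caption ) = @_;
    my $meta  = $caption->{meta} || {};
    my $title = $meta->{title};
    return ( defined( $title ) && length( $title ) )
        ? $title
        : $caption->{name};
}

my $with_meta    = { name => 'a-look',   meta => { title => 'A baby picture' } };
my $without_meta = { name => 'a-lotery' };

print caption_title( $with_meta ),    "\n";
print caption_title( $without_meta ), "\n";
```

An empty <title> is treated the same as a missing one, which is what the string-length() test in the stylesheet is for.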

The captions output by this stage look like:

    <caption width="100" align="left" valign="top">
      <a href="a-look.jpeg">A baby picture</a>
      <br/>
      <font size="-1" color="#808080">Wed Sep
        4 13:32:46 2002<br/>
      </font>
      <b>ME!</b>.  Well, not really.  Actually, it's
        some random image from the 'net.
    </caption>
    <caption width="100" align="left" valign="top">
      <a href="a-lotery.jpeg">a-lotery</a>
      <br/>
      <font size="-1" color="#808080">Wed Sep
        4 13:33:07 2002<br/></font>
    </caption>

The former is what comes out when a .meta file is found, the latter when it is not.

pagestyler.xsl

And now, the final stage. If you've made it this far, congratulations; this is the start of a real application and not just a toy, so it's taken quite some time to get here.

pagestyler.xsl's position in the processing pipeline

The final stage of the processing pipeline generates an HTML page from the raw data, except for the attributes and content of <caption> tags, which it passes through as-is:

    <xsl:stylesheet 
      version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    >
    
    <xsl:template match="/*">
      <html>
        <head>
          <title>Images in <xsl:value-of select="title" /></title>
        </head>
        <body bgcolor="#ffffff">
          <xsl:apply-templates select="images" />
        </body>
      </html>
    </xsl:template>
    
    
    <xsl:template match="images">
      <table>
        <xsl:apply-templates />
      </table>
    </xsl:template>
    
    <xsl:template match="tr">
      <xsl:copy>
        <xsl:apply-templates select="*" />
      </xsl:copy>
    </xsl:template>
    
    <xsl:template match="image">
      <td align="left" valign="top">
        <a href="{filename}">
          <img border="0" src="{thumb_uri}">
            <xsl:if test="thumb_width">
              <xsl:attribute name="width">
                <xsl:value-of select="thumb_width" />
              </xsl:attribute>
            </xsl:if>
            <xsl:if test="thumb_height">
              <xsl:attribute name="height">
                <xsl:value-of select="thumb_height" />
              </xsl:attribute>
            </xsl:if>
          </img>
        </a>
      </td>
    </xsl:template>
    
    <xsl:template match="@*|node()" mode="caption">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()" mode="caption" />
      </xsl:copy>
    </xsl:template>
    
    <xsl:template match="caption">
      <td>
        <xsl:apply-templates select="@*|node()" mode="caption" />
      </td>
    </xsl:template>
    
    </xsl:stylesheet>

The first template generates the skeleton of the HTML page. The second grabs the <images> list from the source document and emits a <table>. The third copies the <tr> tags. The fourth replaces all <image> tags with <td> tags containing the thumbnail image as a link to the underlying image (similar to what captionstyler.xsl did with the picture name). The only subtlety here is that the optional <thumb_width> and <thumb_height> elements are used, if present, to inform the browser of the size of the thumbnail in order to speed up the layout process (as mentioned before, pages that don't contain this information are not cached so that when the thumbnails are generated, new HTML will be generated with it).

The last two templates convert the <caption> elements to <td> elements and copy all their content through, since captionstyler.xsl already did the presentation for them.

Tweaking this stylesheet or replacing it controls the entire page layout other than thumbnail sizing (which is set by the optional MyMaxX and MyMaxY PerlSetVar settings in httpd.conf). A different stylesheet at this point in the chain could choose to ignore the <tr> tags and present a list style output. A later stylesheet could be added to add branding or advertising to the site, etc., etc.

My::Thumbnailer

Here's the Apache module that generates thumbnails. The key thing to remember is that, unlike all the other code and XML shown in this article, this is called once per thumbnail image, not once per directory. When a browser requests a directory listing, it gets HTML from the pipeline above with lots of URIs for thumbnail images. It will then usually request each of those in turn. The httpd.conf file directs all requests for dotfiles to this module:
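Two small pieces of the module are easier to follow in isolation: the dotfile naming convention and the proportional scaling arithmetic. The sketch below uses hypothetical helper names (original_for() and scaled_size()) and a made-up path; the regex is the one the handler uses, while the arithmetic only approximates what Imager does when asked to scale along one axis (Imager's exact rounding is not modelled here):

```perl
use strict;
use warnings;

# Map a thumbnail path to the original image path by dropping the
# leading "." from the last path component.
sub original_for {
    my ( $thumb_path ) = @_;
    ( my $orig = $thumb_path ) =~ s{/\.([^/]+)\z}{/$1}
        or die "Can't parse $thumb_path\n";
    return $orig;
}

# Shrink the larger dimension to its limit and scale the other
# dimension by the same factor, preserving the aspect ratio.
sub scaled_size {
    my ( $w, $h, $max_w, $max_h ) = @_;
    my $factor = $w > $h ? $max_w / $w : $max_h / $h;
    return ( $w * $factor, $h * $factor );
}

my $orig          = original_for( "/htdocs/Others/.a-look.jpeg" );
my ( $tw, $th )   = scaled_size( 640, 480, 100, 100 );

print "$orig\n";
print "$tw x $th\n";
```

A 640 x 480 landscape image is limited by its width, so it comes out at 100 x 75 under the default 100 x 100 limits.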

    package My::Thumbnailer;
    
    # Allow other modules like My::ProofSheet to use some
    # of our utility routines.
    use Exporter;
    @ISA = qw( Exporter );
    @EXPORT_OK = qw( image_size thumb_limits );
    
    use strict;
    
    use Apache::Constants qw( DECLINED );
    use Apache::Request;
    use File::Copy;
    use Imager;
    
    
    sub image_size {
        my $img = shift;
    
        if ( ! ref $img ) {
            my $fn = $img;
            $img = Imager->new;
            $img->open( file => $fn )
                or die $img->errstr(), ": $fn";
        }
    
        ( $img->getwidth, $img->getheight );
    }
    
    
    sub thumb_limits {
        my $r = shift;
    
        # See if the site admin has placed MyMaxX and/or
        # MyMaxY in the httpd.conf.
        my ( $max_w, $max_h ) = map
            $r->dir_config( $_ ),
            qw( MyMaxX MyMaxY );
    
        return ( $max_w, $max_h )
            if $max_w || $max_h;
    
        # Default to scaling down to fit in a 100 x 100
        # pixel area (aspect ratio will be maintained).
        return ( 100, 100 );
    }
    
    
    # Apache/mod_perl is configured to call
    # this handler for every dotfile
    # requested.  All thumbnail images are dotfiles,
    # some dotfiles may not be thumbnails.
    sub handler {
        my $r = Apache::Request->new( shift );
    
        # We only want to handle images.
        # Let Apache handle non-images.
        goto EXIT
            unless substr( $r->content_type, 0, 6 ) eq "image/";
    
        # The actual image filename is the thumbnail
        # filename without the leading ".".
        ( my $orig_fn = $r->filename ) =~ s{/\.([^/]+)\z}{/$1}
            or die "Can't parse ", $r->filename;
    
        # Let Apache serve the thumbnail if it already
        # exists and is newer than the original file.
        {
            my $thumb_age = -M $r->finfo;
            my $orig_age  = -M $orig_fn;
            goto EXIT
                if $thumb_age && $thumb_age <= $orig_age;
        }
    
        # Read in the original file
        my $orig = Imager->new;
        unless ( $orig->open( file => $orig_fn ) ) {
            # Imager can't hack the format, fall back
            # to the original image.  This can happen
            # if you forget to install libgif
            # (as I have done).
            goto FALLBACK
                if $orig->errstr =~ /format not supported/;
    
            # Other errors are probably more serious.
            die $orig->errstr, ": $orig_fn\n";
        }
    
        my ( $w, $h ) = image_size( $orig );
    
        die "!\$w for ", $r->filename, "\n" unless $w;
        die "!\$h for ", $r->filename, "\n" unless $h;
    
        my ( $max_w, $max_h ) = thumb_limits( $r );
    
        # Scale down only.  If the image is smaller than
        # the thumbnail limits, let Apache serve it as-is.
        # thumb_limits() guarantees that either $max_w
        # or $max_h will be true.
        goto FALLBACK
            if ( ! $max_w || $w < $max_w )
            && ( ! $max_h || $h < $max_h );
    
        # Scale the larger dimension down to the
        # requested size.  This can mess up for images
        # that are meant to be scaled on each axis
        # independently, like graphic bars for HTML
        # page separators, but that's a very small
        # demographic.
        my $thumb = $orig->scale(
            $w > $h
                ? ( xpixels => $max_w )
                : ( ypixels => $max_h )
        );
        $thumb->write( file => $r->filename,)
            or die $thumb->errstr, ": ", $r->filename;
    
        goto BONK;
    
    FALLBACK:
        # If we can't or don't want to build the thumbnail,
        # just copy the original and let Apache figure it out.
        warn "Falling back to ", $orig_fn, "\n";
        copy( $orig_fn, $r->filename );
    
    BONK:
        # Bump apache on the head just hard enough to make it
        # forget the thumbnail file's old stat() and
        # mime type since we've most likely changed all
        # that now.  This is important for the headers
        # that control downstream caching, for instance,
        # or in case Imager changed mime types on us
        # (unlikely, but hey...)
        $r->filename( $r->filename );
    
    EXIT:
        # We never serve the image data, Apache is perfectly
        # good at doing this without our help.  Returning
        # DECLINED causes Apache to use the next handler in
        # its list of handlers.  Normally this is the default
        # Apache file handler.
        return DECLINED;
    }
    
    1;

There should be enough inline commentary to explain that lot. The only thing I'll say is that, to head off the gotophobes, I think the use of goto makes this routine a lot clearer than the alternatives; the early versions did not use it and were less readable/maintainable. This is because the three normal exit routes happen to stack nicely up from the bottom so the fallthrough from one labeled chunk to the next happens nicely.

The most glaring mistake here is that there is no file locking. We'll add that in next time.

Summary

The final result of the code in this article is to build the image proofsheet section of the page we showed at the beginning of the article. The next article will complete that page, and then we'll build the image presentation page and a metadata editor in future articles.

Help and thanks

In case of trouble, have a look at some of the helpful resources we listed in the first article.

This week on Perl 6 (9/16 - 9/22, 2002)

So, another week, another Perl 6 summary. Let's see if I can get through this one without calling Tim Bunce 'Tim Bunch' shall we? Or maybe I should leave a couple of deliberate errors in as a less than cunning ploy to get more feedback. Hmmm.

So, kicking off with the internals list as always:

The Compound Key discussions continue

Dan Sugalski, Graham Barr and Leopold Toetsch (who, incidentally, turned 44 on the 16th, so not only does he contribute really useful code, he makes Dan and me feel younger. Can this man do no wrong?) all thought hard about Ken Fox's Hard Question from last week. The Hard Question was: `If %h{"a"}[0][1] is a PASM P2["a";0;1], then what is %h{"a"}{0}[1]?'. Leo thought that things would work because an integer isn't a valid key type for a hash, so the second case would throw a `Not a string' error. Dan thought that this might not be enough, so we probably need an extra flag to designate whether a key element should be taken as an array or hash lookup. Graham Barr agreed, citing the `hash with objects as keys' example that seems to crop up everywhere, and suggesting the rather lovely looking my @array is indexed('a'..'b'); as another possibility. Graham also wondered how the flag should be used, suggesting that it should get passed into a vtable's lookup method, thus allowing the writing of PMCs that don't care how they're looked up, or other PMCs that did cunning things depending on how they were accessed. Dan agreed.

http://groups.google.com/groups

Return undef for negative indices

Sean O'Rourke supplied a patch that arranged for @ary[-1000], say, to give undef when @ary has fewer than 1000 elements. Also included was a patch which changed array's get_string() method to return the array's get_integer() value converted to a string. Leo Toetsch wasn't keen on this idea, wondering if it shouldn't return something like "PerlArray(0xaddr)" by analogy with the behaviour of PerlHash PMCs. Sean disagreed, pointing out that in Perl5 one could say print "You scored $score out of" . @questions . "questions.", and the array would stringify to the number of elements it contained. Brent Dax pointed out that in Perl 6 one would have to write print '\@foo has ' _ +@foo _ ' elements'; because Perl 6 arrays stringify to their elements, separated by .prop{sep} //= ' '. Sean didn't like this, but appeared to take the point. Uri Guttman quibbled about style, and proposed print "\@foo has $(@foo.length) elements";, which certainly does a good job of making its intention explicit.

http://rt.perl.org/rt2/Ticket/Display.html

A Lexicals pre-patch

Sean O'Rourke was unhappy with the current lexical implementation, as it doesn't seem to support different levels of static nesting. Apparently this makes nested scopes hard to implement, especially in the presence of Perl 6's %MY magic. So he sent a patch for people to play with.

Jonathan Sillito liked the patch, and pointed to a different approach that would make taking closures easier, but which would possibly make lookup slightly less efficient. Sean wondered how Jon's scheme would handle recursion. Jon thought about that, and answered by outlining how you would implement closures using Sean's scheme, and proposing that Sean make a 'real' patch.

Jürgen Bömmels had a pile of questions too, related to using Sean's patch to implement proper Scheme functions, and he proposed a set of ops for manipulating pads. Sean agreed that this looked useful.

http://groups.google.com/groups

http://groups.google.com/groups

http://groups.google.com/groups

default.pmc patches

Leopold Toetsch patched default.pmc to make almost all methods throw meaningful exceptions. Sean O'Rourke reckoned that the patch went a bit far, proposing a few places where having a slightly richer default behaviour would be the Right Thing to do, and some others where doing nothing was the right default behaviour -- the example given was init. Leo countered that one should really have a default_scalar.pmc for the first types, and that, for the second type, the PMC should have an explicitly empty method. The thread resembled the Monty Python Argument skit for a few messages (`Look, this isn't a proper argument!', `Yes it is!', `No it isn't it's just contradiction!'). After a couple of rounds of this, Sean showed his (substantial) list of default behaviours that he thinks should be in default.pmc, and Leo showed us his planned PMC hierarchy.

Dan came down on Leo's side.

http://rt.perl.org/rt2/Ticket/Display.html

http://groups.google.com/groups -- Sean's list

http://groups.google.com/groups -- Leo's hierarchy

Keyed ops

Has there been a week since I started doing these summaries that hasn't seen a discussion of keys, keyed ops or key structures? This week's second keys thread was kicked off by Leopold Toetsch wondering about the legality of, for example:


    add P0[P1], P2, P3[P4]

If it is legal, what PASM ops should be generated? The problem is that the naive approach of using an op based on the argument list would lead to a horrible explosion of specific opcodes to deal with the possible combinations of keyed and unkeyed arguments. Leo wondered if the instruction should get turned into:


    add P0[P1], P2[<The Null Key>], P3[P4]

Tom Hughes and Leo batted this back and forth for a bit. Tom noted that it wouldn't be hard to create a null key: just create a key with 0 elements for every PMC that didn't otherwise have a key structure, but he still worried that we were looking at 64 different op codes for each 3 argument op.

Sean O'Rourke pointed out that if scratchpads do become `proper' PMCs, then the various 3 argument keyed ops would become remarkably useful. For instance @a[0] = %b{1} + $c could become


    add P0["@a0";0], P0["%b";1], P0["$c"]

But Tom wasn't sure that this quite fit in with Leo's plan. Leo meanwhile produced an RFC with his proposals for how keyed opcodes should look.

http://groups.google.com/groups

http://groups.google.com/groups

Meanwhile, in perl6-language...

Last week's discussion of argument passing continued on its merry way. As well as a certain amount of debate as to the meaning of `topicalize', Larry clarified a few points. For instance, current thinking seems to be that if you want to capture the caller's topic in a named variable you'd do:


   sub (...) is given ($x) { ... }

which would set the sub's $x variable to the same value as its caller's $_. He also offered some comments on good style when using $_ and made Angel Faus's day when he told us that it's looking like parameter defaults will be specified by = rather than //=. Sean O'Rourke wasn't entirely sure about the is given syntax, but Larry pointed out that Sean's proposed syntax wouldn't allow for prototyping `CORE::print, among other things.' It also looks like we're going to be using exists to see whether a parameter got passed or not.

http://groups.google.com/groups

http://groups.google.com/groups

http://groups.google.com/groups

http://groups.google.com/groups

Hotplug regexes, other misc regex questions

Steve Fink asked a few questions, mostly relating to pattern closures having side effects, supplying a few pathological (psychotic?) patterns as examples. My particular `favourite' was


   my $x = "the letter x";
   print "yes" if $x =~ /the { $x _= "!" } .* !/;

Damian reckoned that if the above were allowed at all, then the match should succeed, and offered answers (opinions) on the rest of Steve's menagerie. One of his suggestions involved a superposition, but I think he might have been joking. Larry also gave his somewhat more authoritative answers.

http://groups.google.com/groups

http://groups.google.com/groups

http://groups.google.com/groups

Hyperoperators and dimensional extension

Brent Dax wondered about how hyperoperators would work with multiple dimensions. Dan's answer can be summarized as `properly', which wasn't quite specific enough for Brent, but Dan stuck to his guns. Dan also demonstrated that whilst he may be an American, he knows how to spell `behaviour'.

http://groups.google.com/groups

Regex query

Simon Cozens had a few questions about grammars and rules. He's trying to write a grammar to parse a Linux /etc/raidtab file, and has a few questions about `drilling down' into the match object. The thread is tricky to summarize. However, in one post Larry said that `any list forced into scalar context will make a ref in Perl 6' and gave $arrayref = (1,2,3) as an example. This caused a thread explosion, which has boiled over into the current week as people followed the implications of that through (some even going so far as to wonder if that meant we wouldn't need [...] any more). Frankly, things got ugly (at least syntactically). I'm tempted to draw a veil over some of the ugliness; if you want to, read the thread. I'm waiting for Larry to get back home and make everything better with another of his shockingly wise postings. No pressure Larry.

http://groups.google.com/groups

http://groups.google.com/groups -- Pigeons, meet a cat.

Backtracking syntax

'Ralph' doesn't like the backtracking syntax, and proposed replacing :, ::, :::, <commit> and <cut> with :, :], :>, :/ and :// respectively. Simon Cozens and Markus Laire spoke up against the proposal.

http://groups.google.com/groups

In Brief

Leopold Toetsch has sent Dan a patch to implement a proposed hierarchy of PMC classes. Leo was this week's official patchmonster, submitting patches to make core_ops*.c more readable, improve predereferencing in interpreter.c, add a test case for restarting the interpreter and squeezing out a 10% increase in life performance. (This last one brought some questions from Mike Lambert.)

Simon `Unix Guru' Cozens popped in with some bug spots. First, he pointed out that the magic number in a .pbc file wasn't being taken into account in time. Then he found that queens.pasm was solving the somewhat trivial `one queen problem', rather than the more impressive `8 queens' problem. Finally, he pointed out that the hanoi program seemed to be slightly broken too. And then the week ran out.

Garret Goebbel pointed us all at a survey of native interfaces for several languages, which can be found at: http://xarch.tu-graz.ac.at/autocad/lisp/ffis.html

Who's Who in Perl 6

Who are you?
Leon Brocard, acme@astray.com
What do you do for/with Perl 6?
I'm currently more interested in the Parrot side of Perl 6. I mostly tinker with Parrot assembly (PASM), and try to keep the http://parrotcode.org/examples/ page up to date with current Parrot thinking. Sometimes I convert C code to PASM by hand. Sometimes I think evil things about converting other bytecode formats (say, Java's .class files) to Parrot. I presented a talk at the O'Reilly Open Source Conference called "Targeting Parrot", which is something that we should make terribly easy to actually do.
Where are you coming from?
680x0 programming, mostly. Programming Parrot is like how you remember programming assembly was, only higher level and more fun.
When do you think Perl 6 will be released?
Sooner than most people think.
Why are you doing this?
For fun, of course. And because it's interesting to see the development process behind a fast, portable virtual machine. Actually implementing Perl 6 on top of Parrot is just a simple matter of programming...
You have 5 words. Describe yourself.
Orange Perl/Parrot Euro-hacker
Do you have anything to declare?
I enjoy optimising computer-generated SQL statements.

Acknowledgements

It looks like Wednesday is becoming summary mail out day now. Surprisingly time consuming so it is...

Thanks to Leon Brocard for answering the questionnaire, making it embarrassingly easy for me to mention his name this week. I'm now running really low on sets of answers. Come on people, mail your answers to 5Ws@bofh.org.uk and fame and... well fame anyway will be yours.

Once more, thanks to the crack proof readers on rhizomatic.net and elsewhere. This week's primary proof readers were: Kate Pugh, Paul Makepeace and Simon Bisson. Thanks people.

If you think this summary has value, then please send your money to the Perl Foundation http://donate.perl-foundation.org and help support the ongoing development of Perl 6. As usual, the fee paid by the O'Reilly Network for their publication of this summary has been donated directly to the Perl Foundation.

Embedding Web Servers

As with most of my previous articles, this one grew out of a project at my $DAY_JOB. The project du-jour involves large dependency graphs, often containing thousands of nodes and edges. Some of the relationships are automatically generated and can be quite complicated. When something goes wrong, it's useful to be able to visualize the graph.

[Figure: A Simple Graph]

We use GraphViz for rendering the graph, but it falls down on huge graphs. They turn into an unreadable mess of thick lines -- less than useful. To work around this, we trim down the graph to just a segment, centered around one node, and display only n inputs or outputs.

This works great, except that the startup time to create the graph data can be quite long, because of all the graph processing that is necessary to make sure the information is up to date. (The actual graph rendering is quite fast, for small graphs.)

The solution? Process the data once, and render it multiple times, using, yes, you guessed it, a Web interface!

Mechanics of HTTP

The Hypertext Transfer Protocol (HTTP) is the protocol on which most of the Web thrives. It is a simple client/server protocol that runs over TCP/IP sockets.

Extremely oversimplified, it looks like this:

  • Client sends request to server: "Send me document named X"
  • Server responds to client: "Here's the data you asked for" (or "Sorry! I don't know what you mean.")

In practice, it's not much more complicated:

We will use wget to examine a sample HTTP request:

wget -dq http://www.perl.org/index.shtml


 ---request begin---
 GET /index.shtml HTTP/1.0
 User-Agent: Wget/1.8.1
 Host: www.perl.org
 Accept: */*
 Connection: Keep-Alive
 
 ---request end---
 HTTP/1.1 200 OK
 Date: Tue, 13 Aug 2002 18:12:23 GMT
 Server: Apache/2.0.40 (Unix)
 Accept-Ranges: bytes
 Content-Length: 10494
 Keep-Alive: timeout=15, max=100
 Connection: Keep-Alive
 Content-Type: text/html; charset=ISO-8859-1
 
 <... data downloaded to a file by wget...>

There are a lot of things we don't care about in a simple server, so let's boil it down to the guts.

Request:


 GET /index.shtml HTTP/1.0

GET is the type of HTTP action. There are others, but they're beyond the scope of this article.

/index.shtml is the name of the page to retrieve.

HTTP/1.0 is the protocol version supported by your client.

Response:


 HTTP/1.1 200 OK
 Content-Type: text/html;
 
 <data>

The first line is the status response. It includes the HTTP protocol version supported by the server, followed by the status code and a short text string defining the status.

For this article, we'll just care about status code 200 (everything is OK, here's the data) and status code 404 (not found).

The next line is the MIME content type. This is required so that the Web browser knows how to display the data.

Common Content-Types:


        text/html       a HTML document
        text/plain      a plain text document
        image/jpeg      a JPEG image
        image/gif       a GIF image

After the above "header" section, there must be a blank line, and then the bytes containing the data. There's a lot more information that can go into the header block, but for the simple applications we will be developing, they are not needed.
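The layout described above (status line, headers, blank line, data) can be sketched as a small helper. This is only an illustration: `http_response` is a hypothetical name, not part of any module. It also joins the lines with CRLF, which is what the HTTP specification asks for; the examples in this article use a bare "\n", which most clients tolerate.

```perl
use strict;
use warnings;

# Hypothetical helper: assemble a status line, a Content-Type
# header, the blank separator line and the body into one string.
sub http_response {
    my ($code, $reason, $type, $body) = @_;
    return join "\r\n",
        "HTTP/1.0 $code $reason",
        "Content-Type: $type",
        "",              # the blank line that ends the header block
        $body;
}

print http_response( 200, "OK", "text/plain", "hello\n" );
```

Keeping the header assembly in one place makes it harder to forget the blank line, which is the most common way to produce a response that a browser refuses to display.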

You can use a telnet client to retrieve data from any Web server. You need to be careful though - many modern Web servers are virtual hosted, which means they require the Host: header in the request to retrieve the appropriate data.

Writing A Simple Web Server

The Basics

With the above information, it isn't hard to write your own simple Web server. There are several ways to do this and a few already written on CPAN. We're going to start from first principles though, and pretend, for the moment, we don't know about CPAN.

A good place to start looking for client/server information is in the perlipc document. About 2/3 of the way through is a section on "Internet TCP Clients and Servers". This section shows how to use simple socket commands to set up a simple server. A little further down is the section we're interested in - it demonstrates using the IO::Socket module to write a simple TCP server. I'll replicate that here.


 #!/usr/bin/perl -w
 use IO::Socket;
 use Net::hostent;              # for OO version of gethostbyaddr
 
 $PORT = 9000;                  # pick something not in use
 
 $server = IO::Socket::INET->new( Proto     => 'tcp',
                                  LocalPort => $PORT,
                                  Listen    => SOMAXCONN,
                                  Reuse     => 1);
								  
 die "can't setup server" unless $server;
 print "[Server $0 accepting clients at http://localhost:$PORT/]\n";
 
 while ($client = $server->accept()) {
   $client->autoflush(1);
   print $client "Welcome to $0; type help for command list.\n";
   $hostinfo = gethostbyaddr($client->peeraddr);
   printf "[Connect from %s]\n", $hostinfo->name || $client->peerhost;
   print $client "Command? ";
   while ( <$client>) {
     next unless /\S/;       # blank line
     if    (/quit|exit/i)    { last; }
     elsif (/date|time/i)    { printf $client "%s\n", scalar localtime; }
     elsif (/who/i )         { print  $client `who 2>&1`; }
     elsif (/cookie/i )      { print  $client `/usr/games/fortune 2>&1`; }
     elsif (/motd/i )        { print  $client `cat /etc/motd 2>&1`; }
     else {
       print $client "Commands: quit date who cookie motd\n";
     }
   } continue {
      print $client "Command? ";
   }
   close $client;
 }

HTTPify It

That's not a HTTP server by any stretch of the imagination, but with a different inner loop it could easily become one:


 while ($client = $server->accept()) {
   $client->autoflush(1);
   
   my $request = <$client>;
   if ($request =~ m|^GET /(.+) HTTP/1.[01]|) {
      if (-e $1) {
       print $client "HTTP/1.0 200 OK\nContent-Type: text/html\n\n";
       open(my $f,"<$1");
       while(<$f>) { print $client $_ }; 
      } else {
       print $client "HTTP/1.0 404 FILE NOT FOUND\n";
       print $client "Content-Type: text/plain\n\n";
       print $client "file $1 not found\n";
      }      
   } else {
     print $client "HTTP/1.0 400 BAD REQUEST\n";
     print $client "Content-Type: text/plain\n\n";
     print $client "BAD REQUEST\n";
   }
   close $client;
 }

Let's look at the changes piece by piece:


   my $request = <$client>;

Retrieve one line from the socket connected to the client. For this to be a valid HTTP request, it must match the following:


   if ($request =~ m|^GET /(.+) HTTP/1.[01]|) {

That checks that it's a HTTP GET request, and is of a protocol version we know about.


      if (-e $1) {
       print $client "HTTP/1.0 200 OK\nContent-Type: text/html\n\n";
       open(my $f,"<$1");
       while(<$f>) { print $client $_ };

If the requested file exists, then send back a HTTP header that says so, along with a content type, and then the data. (We are assuming the content type is HTML here. Most HTTP servers figure out the content type from the extension of the file.)
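Picking the content type from the file extension could look something like the sketch below. The helper name `content_type_for` and the lookup table are assumptions made for illustration; the table only covers the types listed earlier in this article, and anything else falls back to a generic binary type.

```perl
use strict;
use warnings;

# Assumed mapping, limited to the types listed earlier in the article.
my %TYPE_FOR = (
    html => 'text/html',
    htm  => 'text/html',
    txt  => 'text/plain',
    jpg  => 'image/jpeg',
    jpeg => 'image/jpeg',
    gif  => 'image/gif',
);

# Hypothetical helper: guess a Content-Type from a file's extension.
sub content_type_for {
    my ($filename) = @_;
    my ($ext) = $filename =~ /\.([^.\/]+)\z/;
    return 'application/octet-stream' unless defined $ext;
    return $TYPE_FOR{ lc $ext } || 'application/octet-stream';
}
```

The result would replace the hardcoded text/html in the Content-Type line of the 200 response.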


      } else {
       print $client "HTTP/1.0 404 FILE NOT FOUND\n";
       print $client "Content-Type: text/plain\n\n";
       print $client "file $1 not found\n";
      }

If the file doesn't exist, then send back a 404 error. The content of the error is a description of what went wrong.


   } else {
     print $client "HTTP/1.0 400 BAD REQUEST\n";
     print $client "Content-Type: text/plain\n\n";
     print $client "BAD REQUEST\n";
   }

A similar error handler, in case we can't parse the request.

Almost 50 percent of the code is for error handling, and that doesn't even take into account the error handling we didn't do for I/O issues. But that's the core of a Web server, all in about 15 lines of Perl.

If you use the above code without modification, it will allow every file on your system to be read. Generally, this is a bad thing. An explanation of proper security is outside the scope of this article, but generally you want to limit access to a subset of files, located under some directory prefix. File::Spec::canonpath and Cwd::realpath are useful functions for testing this.
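As a rough sketch of that idea (not a complete security solution), the helper below resolves the requested name with Cwd::realpath and only accepts it if the result is a plain file below a chosen document root. The name `safe_path` is hypothetical, the code assumes Unix-style "/" separators, and it assumes the document root is not the filesystem root itself.

```perl
use strict;
use warnings;
use Cwd qw(realpath);
use File::Spec;

# Hypothetical helper: map a requested name to a real file below
# $docroot, or return nothing if it would escape that directory.
sub safe_path {
    my ($docroot, $requested) = @_;

    my $root = realpath($docroot);
    return unless defined $root;

    # realpath() resolves "..", "." and symlinks; the eval guards
    # against implementations that die on paths that do not exist.
    my $full = eval { realpath( File::Spec->catfile( $root, $requested ) ) };
    return unless defined $full && -f $full;

    # Only accept files strictly below the document root.
    return $full if index( $full, "$root/" ) == 0;
    return;
}
```

In the server loop, the idea would be to call this on the captured path and send the 404 response whenever it returns nothing, instead of testing -e on the raw request.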

Single Threaded, Nonforking, Blocking

The Web server presented above is very simple. It only deals with one request at a time. If a second request is received while the first is being processed, then it will be blocked until the first completes.

There are two schemes used to take advantage of modern computers' ability to multiprocess (run more than one thing at once.) The simplest way is to fork off a Web server process for each incoming request. Because of forking overhead, many servers pre-fork. The second method is to create multiple threads. (Threads are lighter weight than processes.)

For a simple embedded server, it isn't much more difficult to build a forking server, but the extra work is unnecessary if it's only going to be used by one person or with a low hit-rate. The only advantage to the forking method is that it can serve multiple pages at once. (Taking advantage of modern operating systems' ability to multiprocess.)

More information on forking servers can be found in the perlipc documentation.

With a simple modification to our loop, we can turn our Web server into a forking server:


 while ($client = $server->accept()) {
   my $pid = fork();
   die "Cannot fork" unless defined $pid;
   do { close $client; next; } if $pid; # parent
   # fall through in child (which must exit once it has closed $client)
   $client->autoflush(1);

Structure

The example server above is useful for simple reporting of generated data. Because the accept loop is closed, all processing by the main part of the program needs to be complete before the Web server is run. (Of course, actions from the Web server can trigger other pieces of the program to run.)

There are other ways to integrate a simple Web server depending on the structure of your program, but for this article, we'll stick with the design above.

Graph Walker

Above we mentioned using GraphViz to create an embedded graph viewer. To do that we'll use a Graph class that has some methods that will make our life easier. (There isn't actually a class that does all this, but you can do it with a combination of Graph and GraphViz available on CPAN.)

It is outside the scope of this article to cover graph operations, but I've named the methods so that they should be easy to figure out. I am also going to gloss over some of the GraphViz details. They can be picked up from a tutorial.

Three Easy Steps

  1. Define the Goal

    This is the easy part.

    To develop a graph browser that allows the user to click on a node to recenter the graph.

  2. Define the URL scheme

    How is the Web browser going to communicate back to the Web server? The only way it can is by requesting pages (via URLs.) For a graph browser we need two different kinds.

    First, an image representing the graph. Second, a HTML page containing the IMG tag for the graph and the imagemap.

    Since every node has a unique name we can use that to represent which node, and then use an extension to determine whether it is the HTML page or the graphic.

    
        node1.html - HTML page for graph centered on node1
        node1.gif  - GIF image for graph centered on node1
  3. Implement!

    Now that you know what you're building, you can put it all together and implement it.


 my $graph = do_something_and_build_a_graph();
 
 while ($client = $server->accept()) {
   $client->autoflush(1);
   
   my $request = <$client>;
   if ($request =~ m{^GET /(.+)\.(html|gif) HTTP/1\.[01]}) {
      if ($graph->has_node($1)) {
       if ($2 eq "gif") {
         send_gif( $client, $1 );
       } else { # $2 must be 'html'
         send_html( $client, $1 );
       }
      } else {
       print $client "HTTP/1.0 404 NODE NOT FOUND\n";
       print $client "Content-Type: text/plain\n\n";
       print $client "node $1 not found\n";
      }      
   } else {
     print $client "HTTP/1.0 400 BAD REQUEST\n";
     print $client "Content-Type: text/plain\n\n";
     print $client "BAD REQUEST\n";
   }
   close $client;
 }
 
 sub send_html {
    my ($client, $node) = @_;
	
    my $subgraph = $graph->subgraph( $node, 2, 2 );
    
    my $csimap = $subgraph->as_csimap( "graphmap" );
    my $time = scalar localtime;
	
    print $client "HTTP/1.0 200 OK\nContent-Type: text/html\n\n";
	
    print $client<<"EOF";
	
    <HTML>
     <HEAD>
      <TITLE>Graph centered on $node</TITLE>
      $csimap
     </HEAD>
     <BODY>
     <H1>Graph centered on $node</H1>
     <IMG SRC="/$node.gif" USEMAP="#graphmap" BORDER=0>
     <HR>
     <SMALL>Page generated at $time</SMALL>
     </BODY>
    </HTML>
  
 EOF
 }
  
 sub send_gif {
    my ($client, $node) = @_;
	
    my $subgraph = $graph->subgraph( $node, 2, 2 );
    
    my $gif = $subgraph->as_gif();
	
    print $client "HTTP/1.0 200 OK\nContent-Type: image/gif\n\n";
	
    print $client $gif;
	
  }     
    
And that's it!  We have created a dynamic graph browser.

I will admit that we glossed over some of the HTML and Client Side Imagemap details -- because they're tangential to the issue of embedding a Web server into a tool. An embedded Web server is like merging the Web server, CGI script and source of the data into one program -- sometimes the best way to build one is to start with a standard CGI script and use that.

More Details

Query Strings

Because our embedded Web server isn't serving actual files off of your hard drive you have lots of flexibility as to how to parse the requested URL. In normal Web servers the common way to pass extra arguments to requested pages/scripts is by using a query string.

A URL with a query string looks like this:


    http://foo.bar/thepage.html?querystring

There is a convention for passing key/value data in query strings (from HTML FORMs for example):


    http://foo.bar/page.html?keyone=dataone&keytwo=datatwo&keythree=3

It's easy to modify our embedded webserver template to accept query strings. Just add it to the regular expression that parses the request:


   if ($request =~ m{^GET /([^?\s]+)(?:\?(\S*))? HTTP/1\.[01]}) {

$2 will then contain the query string. You can parse it by hand or pass it to CGI.pm or another CGI Module to parse it.

URI Escaping

Some characters have special meaning in URIs. (We've already seen ? and &. Others are " " (space), %, and #. See the RFC for the full list.) In order to allow them to be passed in requests they need to be escaped. Escaping a URI changes the special characters into their hex representation with a prepended %. For example, " " becomes %20.
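The substitution itself is small enough to sketch by hand. The helper below is a hand-rolled stand-in written for illustration, not the code of any module; `percent_encode` is a made-up name, and the sketch assumes the string holds bytes (plain ASCII, say) rather than wide characters.

```perl
use strict;
use warnings;

# Hand-rolled illustration of percent-encoding.  Assumes $str
# holds bytes (for example plain ASCII text).
sub percent_encode {
    my ($str) = @_;
    # Replace everything except letters, digits and - . _ ~
    # with "%" followed by the two-digit hex value of the byte.
    $str =~ s/([^A-Za-z0-9\-._~])/sprintf "%%%02X", ord $1/ge;
    return $str;
}

print percent_encode("a b#c"), "\n";   # prints "a%20b%23c"
```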

The easiest way to perform this encoding is to use the URI::Escape module.


 use URI::Escape;
 $safe = uri_escape("10% is enough\n");
  # $safe is "10%25%20is%20enough%0A";
 $str  = uri_unescape($safe);
  # $str is 10% is enough\n

We will want to unescape any data received from the client:


    if ($request =~ m{^GET /([^?\s]+)(?:\?(\S*))? HTTP/1\.[01]}) {
      my $page = uri_unescape($1);
      my $querystring = defined $2 ? uri_unescape($2) : "";
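One wrinkle when parsing key/value pairs by hand: the snippet above unescapes the whole query string in one go, so an escaped & or = inside a value (%26 or %3D) would afterwards be mistaken for a separator. Splitting first and unescaping each piece avoids that. The following is a minimal sketch under stated assumptions: `parse_query` and `decode_component` are hypothetical names, the decoder is a hand-rolled stand-in for uri_unescape, and repeated keys simply keep their last value.

```perl
use strict;
use warnings;

# Decode one percent-encoded component; "+" means space in form data.
# Hand-rolled stand-in for uri_unescape(), for illustration only.
sub decode_component {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

# Hypothetical helper: turn "k1=v1&k2=v2" into a hash reference.
# Repeated keys keep only the last value in this sketch.
sub parse_query {
    my ($query) = @_;
    my %param;
    return \%param unless defined $query;
    for my $pair ( split /[&;]/, $query ) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $param{ decode_component($key) } =
            defined $value ? decode_component($value) : '';
    }
    return \%param;
}

my $p = parse_query("keyone=dataone&keytwo=a%26b&keythree=3");
print "$_ = $p->{$_}\n" for sort keys %$p;
```

For anything beyond a quick internal tool, handing the raw query string to CGI.pm or a similar module is still the safer route.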

More Ideas

You might want to embed a Web server into a tool to display the status of a task in a complicated way. Sure, you could just write the file to disk, but that's less fun!

If a task has multiple possible outputs, then you could use the Web server to allow the user to choose between them visually, or pop up a browser at various states of a project to let the user confirm that things are going according to plan.

Speaking of popping up, you don't need to make the user do anything to see the results of the page. You can force the browser to do it for you.

On UNIX systems you can use Mozilla/Netscape's X-Remote protocol:


    system(q[netscape -remote 'openURL("http://localhost:9000/")' &]);

Our example code has a port number hardcoded into it. If that port is already being used on your system, then the IO::Socket::INET->new() call will fail. An easy improvement is to loop over a range of ports, or random port numbers, until an available port is found.
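Looping over a range of ports might be sketched like this. The helper name `open_server` is an assumption for illustration; the constructor arguments are the same ones used in the server at the start of this article.

```perl
use strict;
use warnings;
use IO::Socket;

# Hypothetical helper: try each port in a range and return the first
# listening socket that could be created, plus the port it got.
sub open_server {
    my ($first, $last) = @_;
    for my $port ( $first .. $last ) {
        my $server = IO::Socket::INET->new(
            Proto     => 'tcp',
            LocalPort => $port,
            Listen    => SOMAXCONN,
            Reuse     => 1,
        );
        return ( $server, $port ) if $server;
    }
    return;    # nothing free in the range
}
```

A caller would ask for, say, open_server( 9000, 9010 ), die if the returned list is empty, and print the port that was actually chosen in the startup message so the user knows which URL to visit.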

Reusable Code

In some cases, I've avoided the use of modules in this article. There are many things that could be done with modules including argument handling (CGI.pm), URI/URL parsing (the URI family of modules), and even the HTTP server itself (HTTP::Daemon).

The code we've presented here tries to go through the behind the scenes process so you know what's going on.

For quick and dirty servers, HTTP::Daemon is probably easier to use. Here's an example:


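 use HTTP::Daemon;
 use HTTP::Response;
 use HTTP::Status;
 use Pod::Simple::HTML;
 
 my $file = shift;
 die "Usage: $0 file.pod\n" unless defined $file;
 die "File $file not found\n" unless -e $file;
 
 my $d = HTTP::Daemon->new || die;
 print "Please contact me at: <URL:", $d->url, ">\n";
 while (my $c = $d->accept) {
   while (my $r = $c->get_request) {
     if ($r->method eq 'GET') {
       # Re-render the POD on every request so edits show up on reload.
       # filter() prints to STDOUT, so collect the HTML in a string instead.
       my $html;
       my $parser = Pod::Simple::HTML->new;
       $parser->output_string(\$html);
       $parser->parse_file($file);
 
       my $rs = HTTP::Response->new(RC_OK);
       $rs->header('Content-Type' => 'text/html');
       $rs->content($html);
       $c->send_response($rs);
     } else {
       $c->send_error(RC_FORBIDDEN);
     }
   }
   $c->close;
   undef($c);
 }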

The above HTTP::Daemon based server is a simple, single purpose, POD->HTML converter. I provide it with the name of a pod file to parse, and every time I reload the page, it will pass it through Pod::Simple and display the HTML to the browser.

You'll note it has the same structure as our hand-made examples, but it handles some of the nitty-gritty work for you by encapsulating it in classes.

(If you think that HTTP::Daemon is much simpler than the original, then you should see the first version of this article which used the low-level socket calls.)

In Conclusion

Embedded hardware is all the rage these days - but embedded software can be quite useful, too. By embedding a Web server into your software you gain lots of possible output options that were difficult to have before. Tables, graphics and color! (Not to mention, output can be viewed by multiple computers.) Embedded Web servers open a world of opportunities to you.

This week on Perl 6 (9/9 - 9/15, 2002)

Happy birthday to me! // Happy birthday to me! // Happy birthday, dear meeeee! // Happy birthday to me!

And, with a single breach of copyright, Piers was free. The production of this summary was delayed by my turning 35 on the 15th and then spending the Monday train journey reading one of my birthday presents (Dead Air by Iain Banks, it's jolly good) instead of writing a summary. So this morning, I left the book at home.

So, what's been going on with Perl 6? We'll start, as usual, with perl6-internals.

Goal Call for 0.0.9

The week before, Dan had asked for some suggestions as to what should be the priorities for the 0.0.9 release of Parrot. One of Nicholas Clark's goals from last week was the 'Careful elimination of all compiler warnings, particularly on non x86 platforms, and for builds with nondefault INTVAL size,' and discussions of how to go about doing this (and indeed some doing) carried on into this week. There was also some discussion about whether IMCC and the Perl 6 compiler should be built by default. On the one hand, it would mean that the tinderboxes were testing those important subsystems, on the other hand, it was thought that there were some people who wouldn't be interested in testing those things. Consensus seemed to be that we should just build and test them anyway.

http://groups.google.com/groups

Scheme Implementation Details

Jürgen Bömmels and Piers Cawley continued their discussion of how to go about implementing a Scheme interpreter, and lambda in particular. Piers made noises about a proof of concept implementation of Scheme that he'd made using Perl objects, but didn't show code. (And, I can exclusively reveal, will not be showing the original code, owing to badness with backups and lackadaisical use of CVS.) Jürgen, who had actually made the effort of writing some code, listened politely and agreed that Piers' suggestions looked like they might be a way forward. Jürgen went away and implemented a first cut at quote, define and set!.

http://groups.google.com/groups

http://rt.perl.org/rt2/Ticket/Display.html

chr, ord etc.

Clinton A Pierce restarted this thread and discussed what he'd like to see (apart from a pony) from Parrot's equivalent of Perl 5's pack. Clint wondered whether Parrot pack should use a template, or if it should be implemented as a horde of smaller ops, each handling a different conversion, so that a single, Perl level call to pack would become lots of op calls at the parrot level. Clint also drooled at the thought of doing sprintf at the parrot level. Aaron Sherman agreed with most (all?) of Clint's proposals, and also wants a pony. (Who doesn't?) Peter Gibbs went so far as to offer a patch which implemented a subset of pack functionality, and was applauded. Graham Barr wondered if pack should also allow for packing to 'native' types, which wouldn't have to worry about endian issues. Peter thought that would be a good idea. Nicholas Clark pointed out that extending the code to cope with unsigned integers would be a good idea, too.

http://groups.google.com/groups

Lexicals

Jürgen Bömmels asked a pile of questions about the implementation of lexical variables and how one could use them to make a closure. Jonathan Sillito provided a mixture of answers and guesses. It seems that we're waiting on Dan to firm up some things about lexicals.

http://groups.google.com/groups

IMCC 0.0.9 Runs 100% Perl 6 Tests + Various Results

Leopold Toetsch has been working on getting IMCC to generate parrot bytecode directly rather than going through a stage of generating an intermediate .pasm file, and had been having some problems with 'writing out the .pbc, especially Const_Table, type PFC_KEY / PARROT_ARG_SC'. Two hours later, he announced that he had all the perl6 tests running successfully within IMCC, but only if GC was turned off (there are problems with the longer running tests when GC is turned on). Things get progressively worse as first JIT, and then Predereferencing are turned on.

Dan wondered what the GC bug could be. Leo wasn't sure but posted some possible pointers. Peter Gibbs thought that at least one of the bugs was in continuation.pmc and posted a patch that fixed one of the problems when running under parrot. Meanwhile, Leo tracked down the bug to a bit of code that he'd appropriated from debug.c, so he fixed his IMCC and sent in a patch to fix debug.c as well. Applying both patches meant that the tests all passed under both IMCC and parrot.

Dan applied both patches.

Leo later fixed his problem with writing out a .pbc file directly from IMCC, and offered a patch to packout.c which he described as ugly, but working.

I think Mr Toetsch is going to get my 'kudos' award for this week as he later patched things to make the 'predereferenced' run mode work again (all perl6 tests pass when run with -P). By the way, there appears to be no reference to 'predereferenced' in the glossary.

http://groups.google.com/groups -- Problem

http://groups.google.com/groups -- Qualified success

http://rt.perl.org/rt2/Ticket/Display.html

Problem Parsing Builtins

Aaron Sherman and others have been torture testing the Perl 6 compiler. The most comprehensive test is the Builtins.p6m file (now split into several smaller files) that provides prototypes (and a smaller number of implementations) for Perl 5's built-in functions (we don't know what Perl 6's builtins will be, so Perl 5 is a good start). Sadly, right now, the Perl6 compiler can't cope with all the builtins, so there's been a game of working out which is broken, the parser or the code. Aaron has posted many short scripts highlighting problems he's found. My particular 'favorite' is my $x = 1; $x = +x, sending the compiler off into an infinite recursion. Sean O'Rourke has added these issues to his queue.

[RFC] building core.ops op_hash at runtime

Leopold Toetsch posted a proposal for altering the build system to get rid of some rather over the top duplication of (generated) code. Nicholas Clark liked the idea, but I don't believe the patch got applied -- yet.

Leo also suggested moving the op_info_table out into a separate file which could be shared by the various core_ops*.c files.

http://groups.google.com/groups

http://groups.google.com/groups

IMCC / Mac OS X problem

Leon Brocard (yay! Still batting 100 percent on this one ...) has been having problems building IMCC under Mac OS X. The individual .c files all compile, but bad things happen at link time. Leo, Kevin Falcone and Andy Dougherty all pitched in and, after a flurry of patches, IMCC is now building and working correctly under Mac OS X.

http://rt.perl.org/rt2/Ticket/Display.html

Problems with 64-bit integer builds

There have been problems building Parrot on some of the tinderbox systems, and many boxes are not green. Andy Dougherty had some thoughts on this, and on how to improve things. Andy's view is that so many of the tinderboxes are broken, it's hard to tell whether your new patch is making things better or worse, especially when the rebuilds can take several hours in some cases. Andy hopes that, once the majority of boxes are green most of the time, people will take more notice when one or another turns orange or red. In another thread, Andy offered a patch for a problem that had been a showstopper for some architectures, which would dump core during config's alignment detection tests.

http://groups.google.com/groups

http://rt.perl.org/rt2/Ticket/Display.html

RFC: How are compound keys with a PerlHash intended to work?

Leopold Toetsch wondered about the handling of compound keys in PerlHash objects. Dan confirmed that Leo's intuition about this was right, which was good, because Leo had a patch ready, but he still wondered about some additional vtable methods. So he made some more proposals about how to deal with that case. Dan again agreed with Leo's analysis, and Leo came up with another patch. Steve Fink apologized for not having done this already, but his 'tuit shipment was confiscated due to heightened airport security.' Steve also neatly summarized the conclusions reached last time this came up.

Meanwhile, Graham Barr wondered where any type checking would happen. Leo thought it was implicit on lookup and showed code. So did Dan, but Ken Fox is still unsure.

http://groups.google.com/groups

http://rt.perl.org/rt2/Ticket/Display.html -- Leo's patch

http://groups.google.com/groups

Meanwhile, in perl6-language

The week before, Erik Steven Harrison had wondered what counted as a runtime property, apart from true and false. This week, Damian popped up with a list of 10 off the top of his head. return 255 but undef;, or $name = "Damian" but We_better_call_ya("Bruce") anyone?

http://groups.google.com/groups

Second Try: Builtins

Aaron Sherman's efforts at producing an initial builtins list for Perl 6 got discussed on the language list as well. Chuck Kulchar had wondered how well, if at all, they worked with the current perl6 compiler (they don't... yet), and why they were written in Perl. Aaron Sherman posted his reasons (maintainability, maintainability and maintainability). Nicholas Clark argued that Parrot code wasn't necessarily hard to maintain, and also made the case for implementing some functionality in C. Aaron thought that, eventually, there'd be a mix of different implementation languages, with many of the 'munge args, call equivalent library function' type functions moved out into libraries anyway.

http://groups.google.com/groups

More A5/E5 Questions

Discussions of the Perl 6 rules system rolled on. David Helgason had worries that hypothetical variables were not really variables but keys in a hash, and should not therefore have sigils in their names at all. Damian pointed out that all Perl 6 variables were just keys in a hash. David wondered about the difference between binding a value to a variable in a containing scope and just binding to an entry in the match object (Damian and Allison apparently have a really neat idea for this, but it's not yet had the Larry stamp of approval). David's last worry was that $0 was a rather cryptic name for the match object and shouldn't it have a meaningful name like $MATCH? (Damian thought that squashing a cryptic name in favor of an arbitrary one wasn't necessarily a win.)

Jonathan Scott Duff had wondered (in off list mail to Damian, but Damian answered in public) how he could tell ^^ and $$ only to match just after and before his platform specific newline sequence. Damian thought that suggested rolling one's own <sol> and <eol> rules. Jonathan had also wondered about some of the binding semantics of nested rules. Damian's answer gave him an appropriate 'ah! yes!' moment.

Aaron Sherman had another question about rules and kicked off the 'Throwing lexicals' thread (weren't they a band?) by wondering 'How do rules create hypotheticals?' Everyone passed up the chance to do a 'Well, a mummy rule, and a daddy rule, who are very much in love...' joke, leaving it to the summary writer.

I confess, I'm not sure I understand Aaron's concern (about what to do when you assign to a hypothetical that doesn't exist in a containing scope. I thought you just bound to an appropriately named key in the current match object), which makes things a tad tricky, but Luke Palmer seemed to understand and wondered if there would be some way of declaring that a given hypothetical wouldn't infect its containing scope(s). Damian popped up again, promising that, once Larry had made a decision, he would be unveiling one of the solutions that he and Allison have cooked up.

http://groups.google.com/groups

http://groups.google.com/groups

http://groups.google.com/groups

Blocks and Semicolons

Piers Cawley wondered about blocks, statements and when they need terminating semicolons, and kicked off a long thread. To be honest, I'm not sure it really went anywhere, but we covered a lot of ground. The confusion arises, I think, because the design of Perl 6 has moved (rather substantially in places) from the design described in some of the earlier apocalypses and exegeses. Questions like 'is when a statement, or just a clever function?', 'has Larry changed his mind about no special cases for blocks?' and others would appear to be standing in need of some definitive answers. These will, of course, be forthcoming, if only in Perl 6's final grammar, but we're an impatient lot.

http://groups.google.com/groups

XS in Perl 6

Aaron Sherman had a few thoughts about XS (well, whatever is going to replace it in Perl 6 and Parrot anyway) which he shared with the list. Essentially his proposals covered ways in which modules that are partially implemented in other languages could be cleanly declared and prototyped using Perl syntax rather than the current method involving XS, which looks like no other language on ghod's green earth. Brent Dax proposed a slightly different syntax, using a returns property (sub foo is returns(...)? Don't you love grammar?). The thread got rather long as Brent and Aaron discussed things back and forth, with Nicholas Clark interjecting at one point to draw the participants' attention to the fact that they seemed to be on the verge of reinventing Inline::C. Tim Bunce suggested that we 'should be thinking about the forward declaration syntax and semantics for using existing libraries at this stage. [He suspects] that it'll then become clear how to add extra code in a simple and natural way.'

Tim also pitched in with a long quote from Larry about his goals for the Perl 6 extension mechanism.

David Whipp wondered whether we shouldn't actually be thinking about Parrot's XS replacement rather than Perl 6's. Aaron thought not, because even when parrot's extension model was fully specced out, we would still need to worry about how that interacts with Perl at a language level. Dan Sugalski disagreed with Aaron, pointing out that Perl 6 XS isn't due to be dealt with until Apocalypse 21. Aaron wondered whether this really meant we'd be waiting for '16 more Apocalypses before we write code that allows chdir() to call the C library function?' Dan thought Aaron was worrying unduly and pointed out that chdir is, or will be, a Parrot opcode. Aaron responded to this by stepping back and defining some useful terms and stated his current position in those terms. And then the week ended.

This one could run and run. Tune in next week for the exciting continuation. (Ooh, we haven't done continuations recently have we?)

http://groups.google.com/groups

http://groups.google.com/groups

http://groups.google.com/groups

http://groups.google.com/groups

Passing Arguments

In a subthread of the 'blocks and semicolons' thread, Aaron Sherman wondered about passing arguments. Aaron listed five different forms and wondered about how one would mix up the different styles. Luke Palmer and Brent Dax both wondered what made one of his special cases a special case. Again, it looks like the whole area of prototypes could use some cleaning up (but then we're currently working on clues from other design documents; hopefully the upcoming Apocalypse 6 will clear up many of these issues).

http://groups.google.com/groups

In Brief

Steve Fink committed his IntList patch, and Josef Höök queried the creation of an intlist.c file in the Parrot core, as his matrix patch had been rejected for doing something similar. Nobody has responded to this yet.

Ramesh Ananthakrishnan wondered whether compiling C down to Parrot would be a useful thing to do, as a way of magically porting useful stuff that was already written in C. Aaron Sherman thought that it wouldn't be useful as a 'magic porting' tool, as that would be better done by linking to existing C libraries, and that for small fragments, a manual conversion would probably be better anyway.

Ramesh (or should that be Ananthakrishnan?) also wondered whether it is possible to write networking code in Parrot. Answer: Not yet, but a Sockets extension will probably get written at some point.

Andy Dougherty patched the build's link order to take traditional, order dependent, linkers into account.

Jerome Quelin fixed Befunge (though, with a language like that, how anyone could tell it's broken is a mystery) to use the new chr opcode.

Dan Sugalski turned 35 on the 12th. I turned 35 on the 15th. Did I miss anyone else's birthday?

Leopold Toetsch patched Parrot_vsprintf_s and after prompting supplied a test that failed without the patch in place. (Remember boys and girls, if you're offering a patch that fixes a bug, make sure you also supply a test that shows up the bug.)

Jeff Forr wanted to declare next (this) week to be a week of bug hunting, but Nicholas Clark pointed out that this clashed with YAPC::Europe and maybe it was better to make the week after a bug hunt. Jeff agreed.

The Perl 6 Mini Conference in Zurich

Also going on last week was a Perl 6 mini conference, held in Zurich. Larry, Damian, Dan, Allison and Hugo all gathered to sit around a table and thrash out some more of the Perl 6 design. I assume that whiteboards were also available. As well as the design work, there was a mini conference, complete with talks from all of the above. Due to time and money commitments, I couldn't make it. However, Paul Johnson could, and he wrote me a report, which I present here pretty much unedited.

A Report, by Paul Johnson

Last week, Perl 6 moved to Zürich. The bulk of the Perl 6 design team was here as guests of ETH, and spent the week, er, designing Perl 6 I suppose. But maybe they were out exploring Zürich and its environs. If they were, who can blame them? If they weren't, well they'll just have to come back another time :-)

On Thursday and Friday they also managed to fit in a Mini::Conference on Perl 6. In attendance were Larry Wall, with his wife, Gloria, Damian Conway, Dan Sugalski, Allison Randall and Hugo van der Sanden.

We were treated to two days of talks and discussion about Perl 6. Larry Wall gave the keynote speech to start the conference. As always, Larry's talk was interesting and entertaining. The scheduled topic was "Studies in the Ballistic Arts", however, Larry said that this title was prepared before the talk itself, and the talk, along with the title, morphed into one about the Science of Perl. This will be heard again at YAPC::E in a few days, and so I won't spoil it by attempting to summarise it here.

Next up was Damian Conway, who gave his presentation entitled "Introduction to Perl 6", covering the first five Apocalypses. Damian has managed to acquire quite a reputation within the Perl community, and Larry promised that Damian would be more entertaining than he. That was quite a promise, but I don't think anyone was disappointed. Damian in turn promised that Dan, speaking next, would be more entertaining than he. I think Dan was probably too busy to notice, checking in some patches or redoing the GC or something.

Damian noted that the audience was probably more sympathetic than most he gave the presentation to, given that they had come to a two day conference devoted to Perl 6. There were nonetheless a number of people who were worried about the move to Perl 6, and one who was still worried about moving to Perl 5! I think that most of Damian's jokes flew high over the heads of most people, but I appreciated them at least. I suppose that XXXX (4x) hasn't made it to Switzerland. And maybe Crocodile Dundee wasn't such a big hit. Even Switzerland's joining of the UN two days earlier seemed to go unnoticed, although it is UNO here. I think there were a few Java programmers in the audience too, since when Damian mentioned about Java having a HelloWorld library about half the audience laughed and half seemed a little concerned that they hadn't heard of it before. And the suggestion that Archbishop Tutu might not like being interpolated was entirely missed. (Should we interpolate $to too?) Still, had the jokes been in German, they'd have flown right past me instead. And I trust Damian's German accent will stay in place should he have occasion to talk about B&D languages in Munich.

Unsurprisingly, Dan spoke about "The Parrot Virtual Machine". Dan actually gave two presentations back to back. The first was an overview of Parrot, and the second was a more detailed look at parts of it. This was a very interesting look at the fast moving world of perl6-internals and seemed to be well received by a knowledgeable audience.

The second day started with a presentation by Allison Randall entitled "Linguistic Basis of Perl 6". Every so often, in perl6-language in particular, some discussion about linguistics crops up, often referring to tagmemics. Allison explained to us what a tagmeme is, and how it relates to the design of Perl 6. I won't pretend to understand it all, but apparently tagmemics is the Swiss Army Knife of linguistics, a tagmeme is a unit in context, tagmemes are fractal, and both "etic" and "emic" are real words, protestations of my spell checker to the contrary notwithstanding. I understand that Allison gave this talk at TPC and will also give it at YAPC::E, so soon we'll all understand tagmemic matrices and be perfectly happy to get dropped off into some uncharted jungle.

Following Allison's presentation there were questions about some minor syntactic issues such as why the switch statement used "given" and "when" instead of "switch" and "case". The explanation of how nicely it read in English was countered with arguments that that wouldn't benefit the German speakers so much and that "switch" and "case" were probably already familiar to programmers. Damian suggested that maybe a German Perl grammar would be in order, to which the inevitable response was that a Swiss German grammar was really required, but which dialect would it be in? Damian showed how easy it would be to derive your own Perl grammar and change keywords if you didn't like them. This was also useful to the chap who wanted elsif spelt correctly.

Damian's second presentation was "Programming in Perl 6" in which he took a number of real Perl 5 programs written and regularly used by prominent members of the Perl community, and he changed them into Perl 6. He did this twice, first to produce a minimal delta change and second to produce idiomatic Perl 6, at least insofar as Perl 6 has managed to acquire idioms. Both of Damian's presentations were punctuated with questions to Larry, asking if what Damian had just presented was true this week too. In some cases the language design seemed to be taking place before our eyes.

This presentation seemed to allay a lot of fears and everyone seemed quite happy with Perl 6 to the extent that Damian finished half an hour early. The options were a long lunch or Damian giving his Lingua::Romana::Perligata presentation. I don't think there was ever any doubt, but when Larry mentioned that he had never heard that talk it was decided. The talk is normally two hours long, but Damian managed to squeeze it into 45 minutes. I think this was probably aided in part by most people simultaneously missing the jokes and being comfortable with a language which requires the matching of number, case and gender. I suspect this is the opposite from most native English speaking audiences.

Finally Hugo was here representing the face of sanity. He told us of his plans for Perl 5.10. These included making perl clean, small and fast. To this end he intends to rewrite parts of the regular expression engine, to oversee the creation of a scheme whereby there are multiple blessed perl installations, and to claw back some of the speed that has been lost since version 5 was released. In short, to ensure that making Perl 6 better, faster and stronger than Perl 5 is as difficult as possible.

Last on the agenda was a question and answer session with the entire team. This was especially interesting, in part I think because there was not an enormous number of questions. This allowed the answers to be complete, to the point of verging on rambling. That's not a bad thing, because it let us get past the superficial answers and into more philosophical areas. Dan told us why Parrot was called Parrot. Larry told us why Perl was called Perl, what it stood for and when, and why it was perfect for search engines even before there were search engines. Dan told us not to get worried about everything, after all, it's only ones and zeros. Damian and Dan alluded to interesting things they could tell us, but then they would have to shoot us. Larry speculated on whether placing a time bomb in the perl interpreter would help us find out who is using Perl and for what. Larry and Damian told us some scary things that people do with Perl and Larry told us he flew over here in one of them. Larry also told us the secret of leadership (which is at least 2000 years old), and talked about how well his goal for the community's rewrite of the community is working. And there was a bunch of other stuff that I was far too busy enjoying to make notes about.

All in all, it was a thoroughly enjoyable and informative couple of days. Many thanks to ETH Zürich and in particular to David Schweikert for organising the event. Attendance was about 90, and profits, which look to be around CHF 4000 or so, go to the Perl Foundation. Next stop: Munich.

Who's Who in Perl 6

You lucky people, last week you got Dan, this week it's Damian. Next week, the World! Bwah hah hah ha! Ahem. Without further ado:

Who are you?
Damian Conway
What do you do for Perl 6?
Where are you coming from?
Two years of electrical engineering degree, four years of computer science degree, six years of Ph.D research, eight years of designing programming languages, two decades of teaching programming, an abiding interest in human-computer interaction, a deep scepticism of formal/theoretical solutions to practical problems, an abiding belief that computers and languages were meant to serve humans not vice-versa, and the overriding axiom that simpler is better (or, at least, simpler).
When do you think Perl 6 will be released?
By Christmas.
Why are you doing this?
I'd been doing language design for the better part of a decade before I started using Perl. So when the opportunity arose to work on my favourite language and collaborate with such an extraordinarily talented team of people, how could I possibly resist?
You have 17 syllables. Describe yourself.
       Out of the torrent
    an excited voice describes
       the passing wonders.
Do you have anything to declare?
You're kidding, right? How many hours do you have?

Acknowledgements etc.

You may have noticed that I'm a little late mailing out the summary this week (though if you read this at www.perl.com you're probably wondering what I'm on about). Things have been hectic, and I really can't type or think fast enough. Normal service will hopefully be resumed this week.

Thanks are due to Damian for making the time to answer the questionnaire, even if he did cheat on the 'five words' question. Thanks are also due to everyone who has taken the time to send me answers over the weeks, apologies for not thanking you immediately. As usual, if you're involved on either of the main Perl 6 development lists, please consider answering the questions and sending your answers to mailto:5Ws@bofh.org.uk. I'm running low on answers, and I'd really like to see responses from (among others) Leopold Toetsch, Steve Fink, Brent Dax, and Jeff Goff. I don't care if you've already answered Bryan Warnock's questions, it's a different summary now.

Thanks too to the crack team of proofreaders from the rhizomatic.net irc server who will hopefully have whipped my grammar into shape by the time I think 'I really should get my finger out and post this.'

As usual, if you think that this summary has value, please consider sending money to the Perl Foundation http://donate.perl-foundation.org and help to support the ongoing development of Perl. The O'Reilly Network will, as usual, be paying my publication fee for this article directly to the Perl Foundation. If you didn't like the summary, then write your own; different viewpoints are always welcome.

If you want to reward me directly, well, iBooks are always nice (but I'd be so embarrassed if I received one), but so is feedback. Let me know what you think.

Retire your debugger, log smartly with Log::Log4perl!

You've rolled out an application and it produces mysterious, sporadic errors? That's pretty common, even when fairly well-tested applications are exposed to real-world data. How can you track down when and where exactly your problem occurs? What kind of user data is it caused by? A debugger won't help you there.

And you don't want to keep track of only bad cases. It's helpful to log all types of meaningful incidents while your system is running in production, in order to extract statistical data from your logs later. Or, what if a problem only happens after a certain sequence of 'good' cases? Especially in dynamic environments like the Web, anything can happen at any time and you want a footprint of every event later, when you're counting the corpses.

What you need is well-architected logging: Log statements in your code and a logging package like Log::Log4perl providing a "remote-control," which allows you to turn on previously inactive logging statements, increase or decrease their verbosity independently in different parts of the system, or turn them back off entirely. Certainly without touching your system's code -- and even without restarting it.

However, with traditional logging systems, the amount of data written to the logs can be overwhelming. In fact, turning on low-level-logging on a system under heavy load can cause it to slow down to a crawl or even crash.

Log::Log4perl is different. It is a pure Perl port of the widely popular Apache/Jakarta log4j library [3] for Java, a project made public in 1999, which has been actively supported and enhanced over the years by a team around head honcho Ceki Gülcü.

The comforting facts about log4j are that it's really well thought out, it's the alternative logging standard for Java and it's been in use for years with numerous projects. If you don't like Java, then don't worry, you're not alone -- the Log::Log4perl authors (yours truly among them) are all Perl hardliners who made sure Log::Log4perl is real Perl.

In the spirit of log4j, Log::Log4perl addresses the shortcomings of typical ad-hoc or homegrown logging systems by providing three mechanisms to control the amount of data being logged and where it ends up:

  • Levels allow you to specify the priority of log messages. Low-priority messages are suppressed when the system's setting allows for only higher-priority messages.
  • Categories define which parts of the system you want to enable logging in. Category inheritance allows you to elegantly reuse and override previously defined settings of different parts in the category hierarchy.
  • Appenders allow you to choose which output devices the log data is being written to, once it clears the previously listed hurdles.

In combination, these three control mechanisms turn out to be very powerful. They allow you to control the logging behavior of even the most complex applications at a granular level. However, it takes time to get used to the concept, so let's start the easy way:

Getting Your Feet Wet With Log4perl

If you've used logging before, then you're probably familiar with logging priorities or levels. Each log incident is assigned a level. If this incident level is equal to or higher than the system's logging level setting (typically initialized at system startup), then the message is logged, otherwise it is suppressed.

Log::Log4perl defines five logging levels, listed here from low to high:


    DEBUG
    INFO
    WARN
    ERROR
    FATAL

Let's assume that you decide at system startup that only messages of level WARN and higher are supposed to make it through. If your code then contains a log statement with priority DEBUG, then it won't ever be executed. However, if you choose at some point to bump up the amount of detail, then you can just set your system's logging priority to DEBUG and you will see these DEBUG messages starting to show up in your logs, too.
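The comparison rule can be sketched in a few lines of plain Perl. This is a toy model and not Log::Log4perl's own code: the %RANK numbers and the passes() helper are made up for illustration only.

```perl
use strict;
use warnings;

# Toy model only: numeric ranks for the five levels, low to high.
# These numbers are invented for this sketch and are not
# Log::Log4perl's internal values.
my %RANK = (DEBUG => 1, INFO => 2, WARN => 3, ERROR => 4, FATAL => 5);

# A message passes if its level is at least as high as the system setting.
sub passes {
    my ($message_level, $system_level) = @_;
    return $RANK{$message_level} >= $RANK{$system_level} ? 1 : 0;
}

# With the system set to WARN, DEBUG and INFO are suppressed.
for my $level (qw(DEBUG INFO WARN ERROR FATAL)) {
    printf "%-5s %s\n", $level,
        passes($level, "WARN") ? "logged" : "suppressed";
}
```

Changing the second argument of passes() from "WARN" to "DEBUG" lets all five levels through, which is the "bump up the amount of detail" case.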

Listing drink.pl shows an example. Log::Log4perl is called with the qw(:easy) target to provide a beginner's interface for us. We initialize the logging system with easy_init($ERROR), telling it to suppress all messages except those marked ERROR and higher (ERROR and FATAL that is). In easy mode, Log::Log4perl exports the scalars $DEBUG, $INFO etc. to allow the user to easily specify the desired priority.

Listing 1: drink.pl


    01 use Log::Log4perl qw(:easy);
    02
    03 Log::Log4perl->easy_init($ERROR);
    04
    05 drink();
    06 drink("Soda");
    07
    08 sub drink {
    09     my($what) = @_;
    10
    11     my $logger = get_logger();
    12
    13     if(defined $what) {
    14         $logger->info("Drinking ", $what);
    15     } else {
    16         $logger->error("No drink defined");
    17     }
    18 }

drink.pl defines a function, drink(), which takes a beverage as an argument and complains if it didn't get one. In the Log::Log4perl world, logger objects do the work. They can be obtained by the get_logger() function, returning a reference to them.

There's no need to pass around logger references between your system's functions. This effectively avoids cluttering up your beautifully crafted functions/methods with parameters unrelated to your implementation. get_logger() can be called by every function/method directly with little overhead in order to obtain a logger. get_logger makes sure that no new object is created unnecessarily. In most cases, it will just cheaply return a reference to an already existing object (singleton mechanism).

The logger obtained by get_logger() (also exported by Log::Log4perl in :easy mode) can then be used to trigger logging incidents using the following methods, each taking one or more messages, which they just concatenate when it comes to printing them:

    $logger->debug($message, ...);
    $logger->info($message, ...);
    $logger->warn($message, ...);
    $logger->error($message, ...);
    $logger->fatal($message, ...);

The method names correspond to the message priorities: debug() logs with level DEBUG, info() with INFO and so forth. You might think that five levels are not enough to effectively block the clutter and let through what you actually need. But before screaming for more, read on. Log::Log4perl has different, more powerful mechanisms to control the amount of output you're generating.

drink.pl uses $logger->error() to log an error if a parameter is missing and $logger->info() to tell what it's doing in case everything's OK. In :easy mode, log messages are just written to STDERR, so the output we'll see from drink.pl will be:

    2002/08/04 11:43:09 ERROR> drink.pl:16 main::drink - No drink defined

Along with the current date and time, this informs us that in line 16 of drink.pl, inside the function main::drink(), a message of priority ERROR was submitted to the log system. Why isn't there another message for the second call to drink(), which provides the beverage as required? Right, we've set the system's logging priority to ERROR, so INFO messages are being suppressed. Let's correct that and change line 3 in drink.pl to:

    Log::Log4perl->easy_init($INFO);

This time, both messages make it through:

    2002/08/04 11:44:59 ERROR> drink.pl:16 main::drink - No drink defined
    2002/08/04 11:44:59 INFO> drink.pl:14 main::drink - Drinking Soda

Also, please note that the info() function was called with two arguments but just concatenated them to form a single message string.

Moving On to the Big Leagues

The :easy target brings beginners up to speed with Log::Log4perl quickly. But what if you don't want to log your messages solely to STDERR, but to a logfile, to a database or simply STDOUT instead? Or, if you'd like to enable or disable logging in certain parts of your system independently? Let's talk about categories and appenders for a second.

Logger Categories

In Log::Log4perl, every logger has a category assigned to it. Logger Categories are a way of identifying loggers in different parts of the system in order to change their behavior from a central point, typically in the system startup section or a configuration file.

Every logger has its place in the logger hierarchy. Typically, this hierarchy resembles the class hierarchy of the system. So if your system defines a class hierarchy Groceries, Groceries::Food and Groceries::Drinks, then chances are that your loggers follow the same scheme.

To obtain a logger that belongs to a certain part of the hierarchy, just call get_logger with a string specifying the category:

    ######### System initialization section ###
    use Log::Log4perl qw(get_logger :levels);

    my $food_logger = get_logger("Groceries::Food");
    $food_logger->level($INFO);

This snippet is from the initialization section of the system. It defines the logger for the category Groceries::Food and sets its priority to INFO with the level() method.

Without the :easy target, we have to pass the arguments get_logger and :levels to use Log::Log4perl in order to get the get_logger function and the level scalars ($DEBUG, $INFO, etc.) imported to our program.

Later, most likely inside functions or methods in a package called Groceries::Food, you'll want to obtain the logger instance and send messages to it. Here are two methods, new() and consume(), that both grab the (yes, one) instance of the Groceries::Food logger in order to let the user know what's going on:

    ######### Application section #############
    package Groceries::Food;

    use Log::Log4perl qw(get_logger);

    sub new {
        my($class, $what) = @_;

        my $logger = get_logger("Groceries::Food");

        if(defined $what) {
            $logger->debug("New food: $what");
            return bless { what => $what }, $class;
        }

        $logger->error("No food defined");
        return undef;
    }

    sub consume {
        my($self) = @_;

        my $logger = get_logger("Groceries::Food");
        $logger->info("Eating $self->{what}");
    }

Since we've defined the Groceries::Food logger earlier to carry priority $INFO, all messages of priority INFO and higher are going to be logged, but DEBUG messages won't make it through -- at least not in the Groceries::Food part of the system.

So do you have to initialize loggers for all possible classes of your system? Fortunately, Log::Log4perl uses inheritance to make it easy to specify the behavior of entire armies of loggers. In the above case, we could have just said:

    ######### System initialization section ###
    use Log::Log4perl qw(get_logger :levels);

    my $food_logger = get_logger("Groceries");
    $food_logger->level($INFO);

and not only the logger defined with category Groceries would carry the priority INFO, but also all of its descendants -- loggers defined with categories Groceries::Food, Groceries::Drinks::Beer and all of their subloggers will inherit the level setting from the Groceries parent logger (see figure 1).

Figure 1
Figure 1: Explicitly set vs. inherited priorities

Of course, any child logger can choose to override the parent's level() setting -- in this case the child's setting takes priority. We'll talk about typical use cases shortly.

At the top of the logger hierarchy sits the so-called root logger, which doesn't have a name. This is what we've used earlier with the :easy target: It initializes the root logger that we will retrieve later via get_logger() (without arguments). By the way, nobody forces you to name your logger categories after your system's class hierarchy. But if you're developing a system in object-oriented style, then using the class hierarchy is usually the best choice. Think about the people taking over your code one day: The class hierarchy is probably what they know up front, so it's easy for them to tune the logging to their needs.

Let's summarize: Every logger belongs to a category, which is either the root category or one of its direct or indirect descendants. A category can have several children but only one parent, except the root category, which doesn't have a parent. In the system's initialization section, loggers can define their priority using the level() method and one of the scalars $DEBUG, $INFO, etc. which can be imported from Log::Log4perl using the :levels target.

While loggers must be assigned to a category, they may choose not to set a level. If their actual level isn't set, then they inherit the level of the first parent or ancestor with a defined level. This will be their effective priority. At the top of the category hierarchy resides the root logger, which always carries a default priority of DEBUG. If no one else defines a priority, then all unprioritized loggers inherit their priority from the root logger.
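To make the lookup rule concrete, here is a small sketch of how an effective priority could be resolved by walking up the category hierarchy. It is a plain-Perl illustration under our own assumptions, not Log::Log4perl's implementation: the %explicit table, the use of "" for the root logger and the effective_level() helper are all hypothetical.

```perl
use strict;
use warnings;

# Explicitly configured levels; "" stands for the root logger here.
my %explicit = (
    ""                => "DEBUG",   # root default, as described above
    "Groceries"       => "INFO",
    "Groceries::Food" => "ERROR",   # child overrides its parent
);

# Return the level of the nearest category (self, parent, ancestor ...)
# that has a level set explicitly.
sub effective_level {
    my ($category) = @_;
    while (1) {
        return $explicit{$category} if defined $explicit{$category};
        # Strip the right-most component to move to the parent category;
        # when nothing is left to strip, fall back to the root logger.
        $category =~ s/(?:^|::)[^:]+$// or return $explicit{""};
    }
}

for my $category ("Groceries::Food", "Groceries::Drinks::Beer", "Kitchen") {
    printf "%-24s %s\n", $category, effective_level($category);
}
```

Groceries::Drinks::Beer has no setting of its own and neither has Groceries::Drinks, so it ends up with the INFO of Groceries. Kitchen has no configured ancestor at all and falls through to the root logger's DEBUG.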

Categories allow you to modify the effective priorities of all your loggers in the system from a central location. With a few commands in the system initialization section (or, as we will see soon, in a Log::Log4perl configuration file), you can remote-control low-level debugging in a small system component without changing any code. Category inheritance enables you to modify larger parts of the system with just a few keystrokes.

Appenders

But just a logger with a priority assigned to it won't log your message anywhere. This is what appenders are for. Every logger (including the root logger) can have one or more appenders attached to it: objects that take care of sending messages without further ado to output devices like the screen, files or the syslog daemon. Once a logger has decided to fire off a message because the incident's priority is higher than or equal to the logger's effective level, all appenders attached to this logger will receive the message -- in order to forward it to each appender's area of expertise.

Moreover, and this is very important, Log::Log4perl will walk up the hierarchy and forward the message to every appender attached to one of the logger's parents or ancestors.

Log::Log4perl makes use of all appenders defined in the Log::Dispatch namespace, a separate set of modules, created by Dave Rolsky and others, all freely available on CPAN. There are appenders to write to the screen (Log::Dispatch::Screen), to a file (Log::Dispatch::File), to a database (Log::Dispatch::DBI), to send messages via e-mail (Log::Dispatch::Email), and many more.

New appenders are defined using the Log::Log4perl::Appender class. The exact number and types of parameters required depend on the type of appender used. Here's the syntax for one of the most common ones, the logfile appender, which appends its messages to a log file:

        # Appenders
    my $appender = Log::Log4perl::Appender->new(
        "Log::Dispatch::File",
        filename => "test.log",
        mode     => "append",
    );

    $food_logger->add_appender($appender);

This will create a new appender of the class Log::Dispatch::File, which will append messages to the file test.log. If we had left out the mode => "append" pair, then it would just overwrite the file each time at system startup.

The wrapper class Log::Log4perl::Appender provides the necessary glue around Log::Dispatch modules to make them usable by Log::Log4perl as appenders. This tutorial shows only the most common ones: Log::Dispatch::Screen to write messages to STDOUT/STDERR and Log::Dispatch::File to print to a log file. However, you can use any Log::Dispatch module with Log::Log4perl. To find out what's available and what their respective parameter settings are, please refer to the detailed Log::Dispatch documentation [4]. Using add_appender(), you can attach as many appenders to any logger as you like.

After passing the newly created appender to the logger's add_appender() method like in

    $food_logger->add_appender($appender);

it is attached to the logger and will handle its messages if the logger decides to fire. Also, it will handle messages percolating up the hierarchy if a logger further down decides to fire.

With this in place, calling consume() on a Groceries::Food object created with "Sushi" will cause our Log::Dispatch::File appender to add the following line

    INFO - Eating Sushi

to the logfile test.log. But wait -- where did the nice formatting with date, time, source file name, line number and function go we saw earlier on in :easy mode? By simply specifying an appender without defining its layout, Log::Log4perl just assumed we wanted the no-frills log message layout SimpleLayout, which just logs the incident priority and the message, separated by a dash.

Layouts

If we want to get fancier (the previously shown :easy target did this behind our back), then we need to use the more flexible PatternLayout instead. It takes a format string as an argument, in which it will -- similar to printf() -- replace a number of placeholders by their actual values when it comes to logging the message. Here's how to attach a layout to our appender:

        # Layouts
    my $layout =
      Log::Log4perl::Layout::PatternLayout->new(
                     "%d %p> %F{1}:%L %M - %m%n");
    $appender->layout($layout);

Since %d stands for date and time, %p for priority, %F for the source file name, %L for the line number, %M for the method executed, %m for the log message and %n for a newline, this layout will cause the appender to write the message like this:

    2002/08/06 08:26:23 INFO> eat:56 Groceries::Food::consume - Eating Sushi

The %F{1} is special in that it takes the right-most component of the file, which usually consists of the full path -- just like the basename() function does.
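The comparison with basename() is easy to try out with the core File::Basename module. The path below is a hypothetical install location, chosen only for illustration:

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical full path of the script, as %F would report it.
my $path = "/opt/groceries/bin/eat.pl";

# basename() keeps only the right-most path component,
# which is what %F{1} does to the source file name.
print basename($path), "\n";
```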

That's it -- we've got Log::Log4perl ready for the big league. Listing eat.pl shows the entire "system": Startup code, the main program and the application wrapped into the Groceries::Food class.

Listing 2: eat.pl


    ######### System initialization section ###
    use Log::Log4perl qw(get_logger :levels);

    my $food_logger = get_logger("Groceries::Food");
    $food_logger->level($INFO);

    # Appenders
    my $appender = Log::Log4perl::Appender->new(
        "Log::Dispatch::File",
        filename => "test.log",
        mode     => "append",
    );

    $food_logger->add_appender($appender);

    # Layouts
    my $layout =
      Log::Log4perl::Layout::PatternLayout->new(
                     "%d %p> %F{1}:%L %M - %m%n");
    $appender->layout($layout);

    ######### Run it ##########################
    my $food = Groceries::Food->new("Sushi");
    $food->consume();

    ######### Application section #############
    package Groceries::Food;

    use Log::Log4perl qw(get_logger);

    sub new {
        my($class, $what) = @_;

        my $logger = get_logger("Groceries::Food");

        if(defined $what) {
            $logger->debug("New food: $what");
            return bless { what => $what }, $class;
        }

        $logger->error("No food defined");
        return undef;
    }

    sub consume {
        my($self) = @_;

        my $logger = get_logger("Groceries::Food");
        $logger->info("Eating $self->{what}");
    }

Beginner's Pitfalls

Remember when we said that if a logger decides to fire, then it forwards the message to all of its appenders and also has it bubble up the hierarchy to hit all other appenders it meets on the way up?

Don't underestimate the ramifications of this statement. It usually puzzles Log::Log4perl beginners. Imagine the following logging requirements for a new system:

  • Messages of level FATAL are supposed to be written to STDERR, no matter which subsystem has issued them.
  • Messages issued by the Groceries category, prioritized DEBUG and higher, need to be appended to a log file for debugging purposes.

Easy enough: Let's set the root logger to FATAL and attach a Log::Dispatch::Screen appender to it. Then, let's set the Groceries logger to DEBUG and attach a Log::Dispatch::File appender to it.

Figure 2
Figure 2: A Groceries::Food and a root appender

Now, if any logger anywhere in the system issues a FATAL message and decides to 'fire,' the message will bubble up to the top of the logger hierarchy, be caught by every appender on the way and ultimately end up at the root logger's appender, which will write it to STDERR as required. Nice.

But what happens to DEBUG messages originating within Groceries? Not only will the Groceries logger 'fire' and forward the message to its appender, but it will also percolate up the hierarchy and end up at the appender attached to the root logger. And, it's going to fill up STDERR with DEBUG messages from Groceries, whoa!

This kind of unwanted appender chain reaction causes duplicated logs. Here are two mechanisms to keep it in check:

  • Each appender carries an additivity flag. If this is set to a false value, like in

        $appender->additivity(0);

    then the message won't bubble up further in the hierarchy after the appender is finished.

  • Each appender can define a so-called appender threshold, a minimum level required for an oncoming message to be honored by the appender:

        $appender->threshold($ERROR);

    If the level doesn't meet the appender's threshold, then it is simply ignored by this appender.

In the case above, setting the additivity flag of the Groceries appender to a false value won't have the desired effect, because it will stop FATAL messages of the Groceries category from being forwarded to the root appender. However, setting the threshold of the root logger's appender to FATAL will do the trick: DEBUG messages bubbling up from Groceries will simply be ignored by it.
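The interplay of firing, bubbling up and appender thresholds is easier to see in a simulation. The following is a toy model in plain Perl of the two-logger setup described above, not Log::Log4perl code: the data structures, the lineage() and log_message() helpers and the sample messages are all invented for this sketch.

```perl
use strict;
use warnings;

# Toy model only; the rank numbers are invented for illustration.
my %RANK = (DEBUG => 1, INFO => 2, WARN => 3, ERROR => 4, FATAL => 5);

# Logger setup from the example; "" stands for the root logger.
my %logger = (
    ""          => { level => "FATAL", appenders => ["screen"] },
    "Groceries" => { level => "DEBUG", appenders => ["file"]   },
);

# Appender thresholds and what each appender has "written" so far.
my %threshold = (screen => "FATAL", file => "DEBUG");
my %output    = (screen => [], file => []);

# Return the category and all of its ancestors, ending with the root ("").
sub lineage {
    my ($category) = @_;
    my @chain = ($category);
    while ($category ne "") {
        $category =~ s/(?:^|::)[^:]+$// or $category = "";
        push @chain, $category;
    }
    return @chain;
}

sub log_message {
    my ($category, $level, $message) = @_;
    my @configured = grep { exists $logger{$_} } lineage($category);

    # The effective level comes from the nearest logger with a level set.
    return if $RANK{$level} < $RANK{ $logger{ $configured[0] }{level} };

    # The logger fires: the message visits every appender on the way up,
    # and each appender applies its own threshold.
    for my $name (map { @{ $logger{$_}{appenders} } } @configured) {
        push @{ $output{$name} }, "$level - $message"
            if $RANK{$level} >= $RANK{ $threshold{$name} };
    }
    return;
}

log_message("Groceries::Food", "DEBUG", "New food: Sushi");
log_message("Groceries::Food", "FATAL", "Out of rice");
log_message("Kitchen",         "ERROR", "Dull knife");

print "screen: $_\n" for @{ $output{screen} };
print "file:   $_\n" for @{ $output{file} };
```

In this model the DEBUG message reaches only the file, the FATAL message reaches both appenders, and the ERROR from Kitchen never fires because the root logger's level is FATAL. Lowering the screen threshold to "DEBUG" reproduces the duplicated DEBUG output on the screen.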

Compact Logger Setups With Configuration Files

Configuring Log::Log4perl can be accomplished outside of your program in a configuration file. In fact, this is the most compact and the most common way of specifying the behavior of your loggers. Because Log::Log4perl originated out of the Java-based log4j system, it understands log4j configuration files:

    log4perl.logger.Groceries=DEBUG, A1
    log4perl.appender.A1=Log::Dispatch::File
    log4perl.appender.A1.filename=test.log
    log4perl.appender.A1.mode=append
    log4perl.appender.A1.layout=Log::Log4perl::Layout::PatternLayout
    log4perl.appender.A1.layout.ConversionPattern=%d %p> %F{1}:%L %M - %m%n

This defines a logger of the category Groceries, whose priority is set to DEBUG. It has the appender A1 attached to it, which is later resolved to be a new Log::Dispatch::File appender with various settings and a PatternLayout with a user-defined format (ConversionPattern).

If you store this in eat.conf and initialize your system with

    Log::Log4perl->init("eat.conf");

then you're done. The system's compact logging setup is now separated from the application and can be easily modified by people who don't need to be familiar with the code, let alone Perl.

Or, if you store the configuration description in $string, then you can initialize it with

    Log::Log4perl->init(\$string);
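Here is a minimal sketch of string-based initialization. To keep it self-contained, it assumes a recent Log::Log4perl release and uses Log::Log4perl::Appender::TestBuffer, an in-memory appender shipped with the distribution for testing purposes; that appender is our choice for this example and is not covered in this article. The appender name Buf is arbitrary.

```perl
use strict;
use warnings;
use Log::Log4perl qw(get_logger);

# Configuration held in a string instead of a file. "Buf" is an
# arbitrary appender name; TestBuffer collects messages in memory.
my $conf = <<'EOT';
log4perl.logger.Groceries    = INFO, Buf
log4perl.appender.Buf        = Log::Log4perl::Appender::TestBuffer
log4perl.appender.Buf.layout = Log::Log4perl::Layout::SimpleLayout
EOT

Log::Log4perl->init(\$conf);

# Groceries::Food has no setting of its own and inherits INFO.
my $logger = get_logger("Groceries::Food");
$logger->debug("New food: Sushi");   # below INFO, suppressed
$logger->info("Eating Sushi");       # logged

# Fetch what the in-memory appender has collected.
my $buffer = Log::Log4perl::Appender::TestBuffer->by_name("Buf")->buffer();
print $buffer;
```

Swapping the TestBuffer appender for Log::Dispatch::File plus a filename, as in the configuration shown earlier, sends the same messages to a log file instead.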

You can even have your application check the configuration file at regular intervals (this obviously works only with files, not with strings):

    Log::Log4perl->init_and_watch("eat.conf", 60);

checks eat.conf every 60 seconds upon log requests and reloads everything and re-initializes itself if it detects a change in the configuration file. With this, it's possible to tune your logger settings while the system is running without restarting it!

The compatibility of Log::Log4perl with log4j goes so far that Log::Log4perl even understands log4j Java classes as appenders and maps them, if possible, to the corresponding ones in the Log::Dispatch namespace. Log::Log4perl will happily process the following Java-fied version of the configuration shown at the beginning of this section:

    log4j.logger.Groceries=DEBUG, A1
    log4j.appender.A1=org.apache.log4j.FileAppender
    log4j.appender.A1.File=test.log
    log4j.appender.A1.layout=org.apache.log4j.PatternLayout
    log4j.appender.A1.layout.ConversionPattern=%F %L %p %t %c - %m%n

The Java-specific FileAppender class will be mapped by Log::Log4perl to Log::Dispatch::File behind the scenes and the parameters adjusted (The Java-specific File will become filename and an additional parameter mode will be set to "append" for the Log::Dispatch world).

Typical Use Cases

The configuration file format is more compact than the Perl code, so let's use it to illustrate some real-world cases (although you could do the same things in Perl, of course!):

We've seen before that a configuration line like:

    log4perl.logger.Groceries=DEBUG, A1

will turn on logging in Groceries::Drink and Groceries::Food (and all of their descendants if they exist) with priority DEBUG via inheritance. What if Groceries::Drink gets a bit too noisy and you want to raise its priority to at least INFO while keeping the DEBUG setting for Groceries::Food? That's easy, no need to change your code, just modify the configuration file:

    log4perl.logger.Groceries.Drink=INFO, A1
    log4perl.logger.Groceries.Food=DEBUG, A1

or, you could use inheritance to accomplish the same thing. You define INFO as the priority for Groceries and override Groceries.Food with a less restrictive setting:

    log4perl.logger.Groceries=INFO, A1
    log4perl.logger.Groceries.Food=DEBUG, A1

Groceries::Food will still be on DEBUG after that, while Groceries and Groceries::Drink will be on INFO.

Or, you could choose to turn on detailed DEBUG logging all over the system and just bump up the minimum level for the noisy Groceries.Drink:

    log4perl.logger=DEBUG, A1
    log4perl.logger.Groceries.Drink=INFO, A1

This sets the root logger to DEBUG, which all other loggers in the system will inherit. Except Groceries.Drink and its descendants, of course, which will carry the INFO priority.

Or, similarly to what we've talked about in the Beginner's Pitfalls section, let's say you wanted to print FATAL messages system-wide to STDOUT, while turning on detailed logging under Groceries::Food and writing the messages to a log file? Use this:

    log4perl.logger=FATAL, Screen
    log4perl.logger.Groceries.Food=DEBUG, Log

    log4perl.appender.Screen=Log::Dispatch::Screen
    log4perl.appender.Screen.stderr=0
    log4perl.appender.Screen.Threshold=FATAL
    log4perl.appender.Screen.layout=Log::Log4perl::Layout::SimpleLayout

    log4perl.appender.Log=Log::Dispatch::File
    log4perl.appender.Log.filename=test.log
    log4perl.appender.Log.mode=append
    log4perl.appender.Log.layout=Log::Log4perl::Layout::SimpleLayout

As mentioned in Beginner's Pitfalls, setting the appender threshold of the screen appender to FATAL keeps DEBUG messages out of the root appender and so effectively prevents message duplication.

According to the Log::Dispatch::Screen documentation, setting its stderr attribute to a false value causes it to log to STDOUT instead of STDERR. log4perl.appender.XXX.layout is the configuration file way to specify the no-frills layout seen earlier.

You could also have multiple appenders attached to one category, like in

    log4perl.logger.Groceries=DEBUG, Log, Database, Emailer

if you had Log::Dispatch-type appenders defined for Log, Database and Emailer.

Performance Penalties and How to Minimize Them

Logging comes with a (small) price tag: We figure out at runtime if a message is going to be logged or not. Log::Log4perl's primary design directive has been to run this check at maximum speed in order to avoid slowing down the application. Internally, it has been highly optimized so that even if you're using large category hierarchies, the impact of a call to e.g. $logger->debug() in non-DEBUG mode is negligible.

While Log::Log4perl tries hard not to impose a runtime penalty on your application, it has no control over the code leading to Log::Log4perl calls and needs your cooperation with that. For example, take a look at this:

   use Data::Dumper;
   $log->debug("Dump: ", Dumper($resp));

Passing arguments to the logging functions can impose a severe runtime penalty, because there are often expensive operations going on before the arguments are actually passed on to Log::Log4perl's logging functions. The snippet above will have Data::Dumper completely unravel the structure of the object behind $resp and pass the whole slew on to debug(), which might then very well decide to throw it away. If the effective level for the current category isn't low enough to actually forward the message to the appropriate appender(s), then we should have never called Dumper() in the first place.

With this in mind, the logging functions accept not only strings as arguments, but also subroutine references. If the logger is actually firing, then it will call the subroutine behind the reference and take its return value as a message:

   $log->debug("Dump: ", sub { Dumper($resp) } );

The snippet above won't call Dumper() right away, but will pass the subroutine reference on to the logger's debug() method instead. Perl's closure mechanism makes sure that the value of $resp is preserved, even when the subroutine is handed over to Log::Log4perl's lower-level functions. Once Log::Log4perl decides that the message is indeed going to be logged, it will execute the subroutine, take its return value as a string and log it.
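The effect of deferring the expensive call can be demonstrated without any logging library at all. In the sketch below, debug() is a hypothetical stand-in for a logger method, not Log::Log4perl's implementation, and expensive_dump() merely wraps Dumper() with a call counter so we can see how often it runs. The $resp data is made up.

```perl
use strict;
use warnings;
use Data::Dumper;

# Stand-in for a logger's debug() method: it resolves code
# references only when DEBUG output is enabled.
my $debug_enabled = 0;
my @logged;

sub debug {
    return unless $debug_enabled;
    push @logged, join "", map { ref($_) eq "CODE" ? $_->() : $_ } @_;
    return;
}

# Wrapper around Dumper() that counts how often it is called.
my $dumper_calls = 0;
sub expensive_dump {
    my ($data) = @_;
    $dumper_calls++;
    return Dumper($data);
}

my $resp = { status => 200, body => "ok" };   # hypothetical response data

debug("Dump: ", expensive_dump($resp));           # dumps eagerly, result discarded
debug("Dump: ", sub { expensive_dump($resp) });   # closure is never called

print "Dumper calls with debugging off: $dumper_calls\n";

$debug_enabled = 1;
debug("Dump: ", sub { expensive_dump($resp) });   # now the closure runs
print "Dumper calls after enabling:     $dumper_calls\n";
```

With debugging switched off, only the eager variant pays for the dump; the subroutine reference costs next to nothing until the message is actually going to be logged.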

Also, your application can help out and check if it's necessary to pass any parameters at all:

   if($log->is_debug()) {
       $log->debug("Interpolation: @long_array");
   }

At the cost of a little code duplication, we avoid interpolating a huge array into the log string in case the effective log level prevents the message from being logged anyway.

Installation

Log::Log4perl is freely available from CPAN. It also requires the presence of two other modules, Log::Dispatch (2.00 or better, which is a bundle itself) and Time::HiRes (1.20 or better). If you're using the CPAN shell to install Log::Log4perl, then it will resolve these and other recursive dependencies for you automatically and download the required modules one by one from CPAN.

At the time this article went to print, 0.22 was the stable release of Log::Log4perl, available from [1] and CPAN. Also on [1], the CVS source tree is publicly available for those who want the (sometimes shaky) bleeding development edge. The CPAN releases, on the other hand, are guaranteed to be stable.

If you have questions, requests for new features, or if you want to contribute a patch to Log::Log4perl, then please send them to our mailing list at log4perl-devel@lists.sourceforge.net on SourceForge.

Project Status and Similar Modules

Log::Log4perl has been inspired by Tatsuhiko Miyagawa's clever Log::Dispatch::Config module, which provides a wrapper around the Log::Dispatch bundle and understands a subset of the log4j configuration file syntax. However, Log::Dispatch::Config does not provide a full Perl API to log4j -- and this is a key issue which Log::Log4perl has been designed to address. Log::Log4perl is a log4j port, not just a subset.

The Log::Log4perl project is still under development, but its API has reached a fairly mature state, where we will change things only for (very) good reasons. There are still a few items on the to-do list, but these are mainly esoteric features of log4j that still need to find their way into Log::Log4perl, since the overall goal is to keep it compatible. Also, Log::Log4perl isn't thread safe yet -- but we're working on it.

Thanks

Special thanks go to fellow Log4perl founder Kevin Goess (cpan@goess.org), who wrote half of the code, helped generously to correct the manuscript for this article and invented these crazy performance improvements, making log4j jealous!

Mission

Scatter plenty of debug statements all over your code -- and put them to sleep via the Log::Log4perl configuration. Let the INFO, ERROR and FATAL statements print to a log file. If you run into trouble, then lower the level in selected parts of the system, and redirect the additional messages to a different file. The dormant DEBUG statements won't cost you anything -- but if you run into trouble, then they might save the day, because your system will have an embedded debugger on demand. Have fun!

Infos

[1] The log4perl project page on SourceForge: http://log4perl.sourceforge.net

[2] The Log::Log4perl documentation: http://log4perl.sourceforge.net/releases/Log-Log4perl/docs/html/Log/Log4perl.html

[3] The log4j project page on the Apache site: http://jakarta.apache.org/log4j

[4] Documentation to Log::Dispatch modules: http://search.cpan.org/author/DROLSKY/Log-Dispatch-2.01/Dispatch.pm

Writing CGI Applications with Perl

It seems every month or so, there's a new Perl and CGI book out; huge thick volumes promising to teach you all you need to know about programming for the Web in 24 hours. They all start with "Hello world" and they invariably finish with a shopping cart example. All this I find a little tedious.

Writing CGI Applications with Perl is not like this. Although, I have to admit, it has the obligatory shopping cart example. It starts on a firm footing - security, trustworthiness of user input and the environment, and tainting. Indeed, security is a recurrent - and a welcome - theme throughout the book.

Brent and Kevin are both well-known members of the Perl community, and the style and idiomatic nature of their code is second to none. If you learn Web programming from this book, then you'll be learning quality code, guaranteed.

If I had to find the biggest criticism of this book, then it would be that its audience is quite unclear; you'll need a reasonably firm basic knowledge of Perl to get the most out of it, and if your Perl is reasonably strong, then you may find the patient, line-by-line explanation of the code segments a little tedious. On the other hand, the beginner may enjoy this style of exposition, but become lost as the book progresses to more advanced subjects, such as the Perl DBI, graphics manipulation and mod_perl.

Another issue I have is that the organization of material isn't particularly great - applications developed in the middle of the book use a backend database, but the DBI is only explained in a later chapter. However, I have to admit that if you stick to a strict order of introducing material, then the examples for the first half of the book end up being horribly contrived. Kevin and Brent have sacrificed a little linearity to end up with much more interesting, real-life applications.

What struck me most of all about this book was the clarity of presentation, both in the explanation of the code and in the physical layout of the book. A large left margin is perfect for scribbling notes, and the code stands out beautifully. Pulling together all of the code for a recap at the end of the chapter helps, too, except in the case of longer examples where you end up with pages of uncommented code.

However, some of the longer examples, particularly the full-chapter document management example (something that's come in particularly handy when we've been developing a similar application ourselves...), could do with many more screenshots so the reader can tell what the result is going to be.

I was impressed to find that the book covers the more practical areas of CGI programming - uploading your files to the server, debugging, testing off-line, dealing with Web caches and so on. There's even a welcome section on how to read the Perl documentation. In fact, it was a bit of a shame to look up the dreaded "premature end of script headers" error message in the index and not find an entry; but in real life, the entry for "debugging" pointed to a solution that would work.

On the whole, the book covers the complete range of things you're likely to be doing with CGI: from basic uses of the protocol, through file upload forms, using mod_perl, and the ever-popular Web page hit counter, right up to full-size production applications. In short, I'd consider this the book for those wishing to convert a little Perl experience into solid Web developer knowledge.

Writing CGI Applications with Perl is published by Addison-Wesley.

This week on Perl 6 (9/1 - 9/8, 2002)

Well, what a week it's been, eh, people? Larry's been telling the Slashdot crowd about quantum God and big knobs, there's been a call for Perl 6 programmers on Perlmonks (http://makeashorterlink.com/), and the Octarine parrot took flight.

So, let's start with perl6-internals, as is traditional.

The Octarine Parrot flies!

Jeff Goff (who deserves to suffer for his parody of Eight Days a Week) announced Parrot 0.0.8, available from the usual places. This release includes:

Working Perl 6 rules/patterns
Multidimensional keyed access
JIT for ARM processors
Lexical scope operators
Many bugfixes and smaller subsystems

Mere moments after announcing the new baby, Jeff realised he'd made a mistake with the tarball's MANIFEST, and announced the release of Parrot 0.0.8.1, which has all the same features as 0.0.8, but a MANIFEST error has been excised. MANIFEST seems to be a source of problems, as Tanton Gibbs also had problems with it later in the week.

http://makeashorterlink.com/ -- Announcement

http://www.cpan.org/src/parrot-0.0.8.1.tgz -- Tarball

Goal call for 0.0.9

Soon after the 0.0.8 announcement Dan asked us what we wanted to see in 0.0.9 and offered his list. The list became rather long as Sean O'Rourke, Steve Fink and Nicholas Clark all chipped in suggestions. The 'long' list ended up as:

Exceptions
Initial PMC freeze/thaw API
Sub indicators in bytecode
On-the-fly bytecode section generation
Methods (in PASM and C)
Implementation of some methods for the Perl classes
Stashes
Resolve the prompt finalization problem
PMC name, implementation cleanup(*)
Filename, line number tracking
Careful elimination of compiler warnings
Rationalization of bytecode generation

(*) The PMC issue was brought up by Steve. He pointed out that right now we have a situation where theoretically language independent PMCs often refer to, and create, Perl specific PMCs on the fly, which seems slightly wrong.

http://makeashorterlink.com/

http://makeashorterlink.com/ -- Steve's list

Approximate string matching in regexes

Sean O'Rourke discoursed on edit distances and approximate matching, and offered a patch implementing an rx_amatch opcode.

http://makeashorterlink.com/

More regex stack manipulations

Sean also implemented a couple of opcodes for manipulating the regex engine's stack: rx_stackdepth Ix and rx_stackchop Ix. Dan thought that the semantics of rx_stackchop were slightly off, and that the instructions were actually intstack specific and should therefore be called intstackdepth and intstackchop respectively. Dan is also 'really, really close to tossing the last pretense of being a stack-based system and [moving] all the way to routine-based frames instead, which would definitely make some things easier.' (Whatever that means). Sean and Steve Fink both seemed to think that that may be a step too far, at least for something as backtrack heavy as the regex core. Steve thought he would rather not create a few thousand Exception or Continuation objects and also made noises about encapsulating intstack.c's functionality in an appropriately named PMC.

Later in the week, Steve delivered his patch to provide an integer array PMC, but wondered if it shouldn't be called an integer dequeue (a double ended queue). John Porter voted for dequeue. Leopold Toetsch wondered why the patch needed a pair of new ops, and questioned the entire premise of the patch, so Steve mounted a sterling defence of his patch.

http://makeashorterlink.com/

http://makeashorterlink.com/

core.ops ARGDIR

Leopold Toetsch kicked off a discussion of 'the intended meaning of ARGDIR_IN/OUT'. I'm afraid to say that while I understand the individual words in his message, I don't really understand the post as a whole. Which is my own fault, and makes life rather hard for a summarizer. However, Nicholas Clark and Angel Faus seemed to understand him, and discussion ensued.

http://makeashorterlink.com/ -- Thread starts here.

Parrot: maximizing the audience

Jerome Quelin put the cat among the pigeons when he made a few observations about the perlcentricity of Parrot, and wondered what the benefits of making Parrot more explicitly vendor neutral would be. Jerome wondered 'what could we do in order to open parrot to non-Perl centric people?'

Markus Lair suggested renaming perl6-internals to 'something better like "parrot internals"'. Sean O'Rourke wondered what effect this would have, apart from breaking his procmail setup. Dan thinks we'll probably shift to a more neutral list name eventually. John Porter claimed that 'some folks seem to think that sufficient reason [to change] exists now.', and Dan pointed out that John didn't have to convince 'some folks', he had to convince Dan. John then attempted to invoke Larry.

John Porter reckoned that changing the list name would have "a huge psychological effect, at least outside our tight little club. But if that's only as far as you can see..."; Dan responded with admirable restraint.

Steve Fink is surprised at how little Parrot is tied to Perl 6, and noted that Perl 6 'mostly provides evidence that this isn't just an exercise in intellectual masturbation', and came down in favour of renaming the mailing list.

Ask Bjoern Hansen popped his head up to say that it would soon be relatively easy to change the list name to, say, 'dev@parrotcode.org'.

Andrew Kuchling, one of the python-dev crew, popped in to talk about parrot-gen.py, and why it's not being heavily developed. (From his point of view, the mailing list name is an irrelevancy BTW). Andrew made some good points about the fun and benefits of Parrot from the point of view of a Pythoneer (it isn't much fun, mostly because of the culture shock and that there don't appear to be any real benefits apart from the possibly chimerical cross-language stuff), and worried about the amount of optimization that was going on before we'd got any real languages implemented. Andrew also suggested making Scheme one of the driving languages for Parrot, based on its simple syntax and fully specified semantics.

As a result of all this, Dan posted his list of 'Parrot long-term goals/prospects', a 9 point list, with footnotes about where he sees things going. I'm not going to summarize it because it's already its own summary. Read it. There was some discussion about what the eventual 'pure parrot' build environment will look like, including some optimistic copyright lines...

http://makeashorterlink.com/

http://makeashorterlink.com/

http://makeashorterlink.com/ -- A Pythoneer speaks

http://makeashorterlink.com/ -- Dan's list

Implementation of Lists for languages/scheme

Jürgen Bömmels offered a patch implementing Scheme pairs, using simple Arrays. Dan was impressed, and wondered how far we were from 'real' scheme. Jürgen thinks we're quite some way away; we still need symbols, strings, lexicals, functions, macros, continuations... Piers Cawley outlined an OO way forward using (initially) hashes, and proposed a 'SchemeObject' PMC, which would hide a lot of the common structural code needed for dispatching methods implemented in any of C/Parrot/Scheme.

http://makeashorterlink.com/

Teasing notes

Dan announced that he was about to go dark for a while as a deadline of his had just got a lot closer. However, he dropped a list of hints about what was forthcoming. Bryan asked for some clarification of a few hints (but they're hints, they're supposed to be a bit vague), and Dan went and spoiled the mystery by giving him some.

    http://archive.develooper.com/perl6-internals@perl.org/msg12475.html

Tinderbox turning green

Andy Dougherty noted that the recent work on portability and general lint gathering meant that we were well on the way to a green tinderbox (ie: automated builds are mostly working, yay!). Dan thought that gave him an ideal opportunity to break things again by adding exceptions to the mix. Andy then went a bit red, for assorted and faintly embarrassing reasons: the patch that was meant to turn things green had been applied and then inadvertently backed out. There's a moral here somewhere.

Actually there was a spate of inadvertent unpatching last week. People are embarrassed and hopefully putting procedures in place to avoid another spate.

http://makeashorterlink.com/

Perl 6 miscellany

Steve Fink offered a portmanteau patch for the Perl 6 compiler, including such delights as a '-e' flag to perl6 so the one line script folks could play. (First person to do the RSA one-liner in Perl 6 gets much kudos from me). Sean wondered about a few of the other fixes, and between them he and Steve found and squashed a bug with here docs, and discussed ways of getting working Perl 6 patterns up and running.

http://makeashorterlink.com/


Meanwhile, in perl6-language

Garrett Goebel reopened the 'auto deserialization' thread from last week (which ended up with the concept of my Date $date .= new('June 25, 2002') as the Damian approved idea). Garrett's confusion seemed to hang on whether my Date $date and my Date $date = Date->new() were equivalent. If they were, then a class which implemented its own operator:= method could arrange things so that the originally proposed my Date $date = 'June 25, 2002' would work.

Miko O'Sullivan proposed that, in the absence of a method name, .= should automagically call a new_from_[class of object being passed], noting that golfers would love it. The consensus seems to be against this particular bit of DWIMmery.

A subthread of this mammoth thread concerned multimethod dispatch. David Wheeler wanted to know what it was, so Dan explained, and Damian pointed at a Perl Journal article on the subject. This subthread ended up spawning its own thread, where Miko wondered if multi dispatch/overloading implied anything about runtime/compiletime. Dan thought not, and said it was dependent on the language.

Quote of the thread: "I can still remember the precise moment I first fell in love with polymorphism." -- Damian, in the TPJ article referred to.

http://makeashorterlink.com/

http://makeashorterlink.com/

http://makeashorterlink.com/ -- multimethods?

http://makeashorterlink.com/ -- TPJ article

http://makeashorterlink.com/

Class aliasing

The 'auto deserialization' thread also spawned a discussion of Class aliasing, but declined to change its subject line, making the life of a summarizer so much easier...

Last week, Damian had proposed doing class Date is Really::Long::Package::Name::Ugh, ie. subclassing the long name into a handy short name. Everyone else seemed to think he was proposing aliasing a class name (which Damian later denied).

Uri Guttman certainly thought that Damian was talking about lexical aliasing of classes (my mailer picks his post as this week's base for this particular subthread). Trey Harris thought that Damian was talking about subclassing, and proposed my class Date is Really::Long::Package::Name::Ugh, which introduces a lexically scoped subclass. Nicholas Clark wondered if class Date := Really::Long::Package::Name::Ugh would express aliasing of classes. Damian thought it probably would, but noted that Larry hasn't said a definite 'yes' to class name aliasing.

Brent Dax wondered if classes weren't in danger of becoming 'the "new filehandles" -- relatively limited entities with no sigils that confuse the syntax horribly.' and pointed out an ambiguous case.

David Wheeler apologised to everyone for confusing inheritance and aliasing in the first place, and the thread wound down to a close.

http://makeashorterlink.com/

@array = %hash

Another thread from last week that rumbled on into this one.

The discussion of when hashes could have pairs as keys wouldn't go away. Damian says that the %hash = (...) case is syntactic sugar and that having a pair in an even slot of such a list would probably be an error. I assume that this also applies in the case of %hash = @array.

In a subthread, Piers Cawley told David Whipp that he hadn't actually been joking when he proposed implementing the entire Smalltalk Collection hierarchy and making it available in core Perl 6. Luke Palmer liked the idea too. Dan saw no reason why such a thing should be impossible either...

http://makeashorterlink.com/

http://makeashorterlink.com/

Atomicness and \n

Last week Damian reminded us of $roundor7 = rx/<<roundascii>+[17]>/, and Brent Dax wondered how one could be sure that <roundascii> is a character class. Luke wondered if it mattered. The thread eventually led Aaron Sherman to make a few proposals about user defined character classes in regular expressions. Deborah Ariel Pickett wondered why we still used ? for non-greedy quantifiers, citing teachability reasons for why we shouldn't.

http://makeashorterlink.com/

http://makeashorterlink.com/

http://makeashorterlink.com/

Hypothetical variables and scope

Aaron Sherman announced that he was working on a library of rules and subroutines for dealing with UNIX system files, mostly as mental exercise to help with his understanding of Perl's new pattern features. He wondered, in particular, about the scoping of hypothetical variables in the presence of lexicals of the same name. For some reason this turned into a minor argument about whether my $x; / (\S*) { let $x = .pos } \s* foo /, from Apocalypse 5, was a typo for my $x; / (\S*) { let $x := .pos } /. I'm not going to try to summarize the argument.

http://makeashorterlink.com/

Argument aliasing for subs

Pete Behroozi wondered about syntax for 'allowing subroutines to have many different names for the same argument', citing CGI.pm as an example of such code in Perl 6. Damian thought that if it were allowed, it would be done with properties sub hidden (..., int $force is aka($override) ) {...}, (well, he did after he realised that his first post on this issue was the result of posting after travelling 14,000 miles and giving 93 hours of talks in the space of 18 days, and was somewhat flawed).

All this cunning use of is led Erik Steven Harrison (who should fix his mailer so it does References or In-Reply-To properly) to wonder if properties weren't being a little overused, and wondered if the is/but distinction was still around. Damian thought not, and reiterated the difference between is and but, and when you would use each of them. Erik also wondered what was wrong with sub hidden ( int $force := $override ) {...}. Damian pointed out that it didn't play well with the defaulting operator, //=, in parameter declarations.

Peter Behroozi got confused about the difference between $foo is named('baz') and $foo is aka($baz). (named changes the externally visible name of a parameter, aka adds another external name for the parameter).

The thread then morphed into a discussion of runtime and compiletime properties. If you've not seen such a discussion before, the whole thread is worth reading.

http://makeashorterlink.com/

Perl 6 parser, built in rules, etc.

Erik Steven Harrison wondered about backward compatibility issues with changing Perl 6's grammar when the grammar rules are so exposed to the user. Sean O'Rourke didn't think it was an issue yet. Dan told us that the eventual Perl 6 grammar would be maintained in a backward compatible way, with documented places for adding changes, and that this would be maintained for as long as Larry said so.

http://makeashorterlink.com/

regex args and interpolation

David Whipp confused the heck out of me when he asked about 'regex args and interpolation'. I confess I can't for the life of me see what the issue is that he's trying to get at. Ken Fox seemed to understand him though, but wanted to know what the rule was that he was using in his example code, and proposed a couple of implementations of it. Nicholas Clark wondered about a section of Ken's post:

    { use grammar Perl::AbstractSyntax;
      $0 := (expr (invoke 'new (class 'Date) $1 $2 $3))) }

Specifically, what was that S-expression doing there? Piers Cawley pointed out that S-expressions were a concise way of writing out an AST's structure. Nick agreed, but pointed out that it was still in the middle of a stream of Perl, but worked out that the use grammar Perl::AbstractSyntax part of Ken's code meant that all bets were off. At this point Nick's head threatened to explode at the wonderfulness of it all.

http://makeashorterlink.com/

Defaulting params (reprise)

Miko O'Sullivan doesn't like sub foo ( $arg //= 1 ) {...} for specifying default values for function arguments. He would rather have sub foo ( $arg is default(1) ) {...}. Damian pointed out that is default(...) would be a compiletime only thing, which didn't necessarily make sense.

http://makeashorterlink.com/

Hypotheticals again

Jonathan Scott Duff wondered some more about hypotheticals and let. He wanted to know whether hypothetically binding to a lexically scoped variable would also introduce that name into the match object. A strict reading of Apocalypse 5 suggests that that isn't the case, which, Jonathan points out, causes the programmer a few headaches. Damian agreed that he'd like to see $0 contain keys for all the hypotheticals used in a match, whether they came from the lexical scope or not. Damian would also like them to turn up in $0{'$name_with_dollar'} as well. So, it seems that everyone's assumptions are aligned and we can carry on.

http://makeashorterlink.com/

First crack at Builtins.p6m

Aaron Sherman decided that what the world needs now is, at least, a set of function signatures for everything that's in perl5's perlfunc listing. He's even had a crack at implementing those functions where possible. Aaron points out that he thinks of this file as documentation, not code. People were impressed, and there's talk of using it to compile down to the IMCC input as a base for hand optimizing. Aaron also released a second version with more functions implemented and a slightly different organization.

http://makeashorterlink.com/

http://makeashorterlink.com/

More A5/E5 questions

Jonathan Scott Duff asked a bunch of questions about the pattern engine, and got a bunch of answers.

http://makeashorterlink.com/


In brief

Leon Brocard wondered if the concatenation bug was fixed yet...

Leon also offered a patch implementing chr, reckoning that if we have ord, then symmetry demands chr. Dan applied it, and Jerome Quelin found a bug in his Befunge-93 interpreter which he thought was a bug in the chr implementation, but turned out not to be.

The Perl 6 compiler now does interpolated strings. Kudos to Joseph Ryan.

Josef Höök had a problem with key_next not behaving as expected within multiarray.pmc. Tom Hughes pointed out where expectation and reality differed, and harmony was restored.

Kevin Falcone patched the glossary to include a definition of ICU.

Mike Lambert is having problems with the Perl 6 compiler under Windows. Sean O'Rourke can't duplicate the problem. Anyone else tried this? More information is welcome.

Ken Fox and Damian continued their discussion of how one would munge the current Perl 6 grammar.

John Williams wondered about doing reduce with ^+=. Damian can't remember what side he argued when it came up last October, but is now of the opinion that John's suggestion is a good idea.

Mr Nobody suggested some changes to the Apocalypse 5 pattern syntax for reasons of length. Consensus seems to be that these changes aren't a good idea.


Who's who in Perl 6

Who are you?
I'm Dan and ... I design virtual machines.
What do you do for/with Perl 6?
I'm designing the virtual machine to compile and run it.
Where are you coming from?
A place about 70 miles east of here.

I started fiddling with computers with an Atari 800 (fully decked out with a massive *48*K of RAM!) a long, long time ago. That led me to Atari BASIC, then to 6502 assembly, then to Forth. Then (briefly) to college and Pascal and PL/I running under VM/CMS on an OS/370 system.

From there I bounced to COBOL (on both RSTS/E and OS/370) and OS/370 assembly, and then to C under RSTS/E. I disliked C on first sight (so of course that's what I spend most of my life writing now), and it didn't get any better on my Amiga, though AREXX there was rather nice. Wrote an article on AREXX for one of the now-dead Amiga magazines, too.

My first real job in the industry was maintaining a horrid app written in BASIC on RSTS/E (with the new bits written in BASIC/PLUS 2.6, which was a nice dialect of BASIC, and something I still like better than C) and helping write a new system in Progress on Unix boxen. (DG AViiON systems with the 88K series processor. Now there was a sweet, and alas, now dead, RISC processor.)

That led to writing C and SQL code on a VMS system, at which point my fate was sealed, and I've been doing VMS admin and programming ever since.

I first encountered Perl when looking for a guestbook for a webserver we were running on one of the VMS boxes. It was, of course, written in Perl, which meant grabbing a copy of perl. It was one of the 5.003_9x releases, and since Dec C was significantly pickier than any other C compiler at the time, I started shooting patches for it to Chip.

That led to my first XS module, and then to the second, and third (and eighth, and tenth, and...). Oddly enough, I wrote my first piece of Perl code about six months after my first patch to the perl core (Charles Bailey wrote the Perl pieces for the first module I did, and chunks of the XS). I ended up with the VMS maintenance hat for a while, which has since been passed on, and I got snared by the fun that was the first threading attempt for 5.005.

I ended up volunteering to coordinate the Perl 6 development at TPC 4, since everyone more competent had the good sense to run screaming away from the job, and I've had it ever since.

When do you think Perl 6 will be released?
When it's done.
Why are you doing this?
Beats the heck out of me. Someone's got to.
You have 5 words. Describe yourself.
Tall enough, under the circumstances.
Do you have anything to declare?
Yes, absolutely.

Acknowledgments

This summary is dedicated to the memory of Gizmo, a cat of great character, who we had to have put down on Saturday at the age of 17.

Chris Ball, Mark Fowler and Pete Sergeant helped with proofreading this week. Thanks chaps.

Once again, if you like the summary, please consider giving money to the Perl Foundation at http://donate.perl-foundation.org/ -- your money will go to help fund the ongoing development of Perl 6.

This week's summary was, once again, sponsored by the O'Reilly Network who are paying the publication fee for the article directly to the Perl Foundation.

Going Up?

Perl 5.8.0 is the first version of Perl with a stable threading implementation. Threading has the potential to change the way we program in Perl, and even the way we think about programming. This article explores Perl's new threading support through a simple toy application - an elevator simulator.

Until now, Perl programmers have had a single mechanism for parallel processing - the venerable fork(). When a program forks, an entirely new process is created. It runs the same code as the parent process, but exists in its own memory space with no access to the parent process' memory. Communication between forked processes is possible but it's not at all convenient, requiring pipes, sockets, shared memory or other clumsy mechanisms.
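To make that inconvenience concrete, here is a minimal sketch of a forked child reporting a single number back to its parent through a pipe. The variable names and the value passed are illustrative only and are not taken from the simulator.

```perl
use strict;
use warnings;

# A pipe is needed because the child works on its own copy of memory.
pipe(my $reader, my $writer) or die "pipe failed: $!";

my $counter = 0;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: this assignment changes the child's copy only.
    close $reader;
    $counter = 42;
    print {$writer} "$counter\n";
    close $writer;
    exit 0;
}

# Parent: the only way to learn the child's result is to read the pipe.
close $writer;
chomp(my $from_child = <$reader>);
close $reader;
waitpid($pid, 0);

# prints "parent's counter: 0, child reported: 42"
print "parent's counter: $counter, child reported: $from_child\n";
```

Even for one integer the parent has to set up a channel, serialize the value, and parse it again on the other side.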

In contrast, multiple threads exist inside a single process, in the same memory space as the creating thread. This allows threads to communicate much more easily than separate processes. The potential exists for threads to work together in ways that are virtually impossible for normal processes.

figure1

Additionally, threads are faster to create and use less memory than full processes (to what degree depends on your operating system). Perl's current threading implementation doesn't do a good job of realizing these gains, but improvements are expected. If you learn to thread now, then you'll be ready to take advantage of the extra speed when it arrives. But even if it never gets here, thread programming is still a lot of fun!

Building a Threading Perl

To get started with threads you'll need to compile Perl 5.8.0 (http://cpan.org/src/stable.tar.gz) with threads enabled. You can do that with this command in the unpacked source directory:


  sh Configure -Dusethreads

Also, it's a good idea to install your thread-capable Perl someplace different than your default install as enabling threading will slow down even nonthreaded programs. To do that, use the -Dprefix argument to Configure. You'll also need to tell Configure not to link this new Perl as /usr/bin/perl with -Uinstallusrbinperl. Thus, a good Configure line for configuring a threaded Perl might be:


  sh Configure -Dusethreads -Dprefix=~/myperl -Uinstallusrbinperl

Now you can make and make install. The resulting Perl binary will be ready to run the simulator in Listing 1, so go ahead and give it a try. When you get back, I'll explain how it works.
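If you are not sure whether a given Perl binary was built with thread support, a short check along the following lines may help. It is only a sketch: it reads the useithreads entry from the standard Config module, and the script name used in the usage note is made up.

```perl
use strict;
use warnings;
use Config;

# useithreads is set when Perl was configured with -Dusethreads
if ($Config{useithreads}) {
    print "This perl ($^X) supports threads.\n";
}
else {
    print "This perl ($^X) was built without thread support.\n";
}
```

Assuming the prefix shown above, you would run it with the new binary, for example ~/myperl/bin/perl check_threads.pl.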

An Elevator Simulator

The elevator simulator's design was inspired by an assignment from Professor Robert Dewar's class in programming languages at New York University. The objective of that assignment was to learn how to use the threading features of Ada. The requirements are simple:

  • Each elevator and each person must be implemented as a separate thread.
  • People choose a random floor and ride up to it from the ground floor. They wait there for a set period of time and then ride back down to the ground floor.
  • At the end of the simulation, the user receives a report showing the efficiency of the elevator algorithm based on the waiting and riding time of the passengers.
  • Basic laws of physics must be respected. No teleporting people allowed!

The class assignment also required students to code a choice of elevator algorithms, but I've left that part as an exercise for the reader. (See how lazy I can get without a grade hanging over my head?)

When you run the simulator you'll see output like:


  $ ./elevator.pl 
  Elevator 0 stopped at floor 0.
  Elevator 1 stopped at floor 0.
  Elevator 2 stopped at floor 0.
  Person 0 waiting on floor 0 for elevator to floor 11.
  Person 0 riding elevator 0 to floor 11.
  Elevator 0 going up to floor 11.
  Person 1 waiting on floor 0 for elevator to floor 1.
  Person 2 waiting on floor 0 for elevator to floor 14.
  Person 2 riding elevator 1 to floor 14.
  Person 1 riding elevator 1 to floor 1.
  Elevator 1 going up to floor 1.

And when the simulation finishes, you'll get some statistics:


  Average Wait Time:   1.62s
  Average Ride Time:   4.43s

  Longest Wait Time:   3.95s
  Longest Ride Time:  10.09s

Perl's Threading Flavor

Before jumping headlong into the simulator code I would like to introduce you to Perl's particular threading flavor. There is a wide variety of threading models living in the world today - POSIX threads, Java threads, Linux threads, Windows threads, and many more. Perl's threads are none of these; they are of an entirely new variety. This means that you may have to set aside some of your assumptions about how threads work before you can truly grok Perl threads.

Note that Perl's threads are not 5.005 threads. In Perl 5.005 an experimental threading model was created. Now known as 5.005 threads, this system is deprecated and should not be used by new code.

In Perl's threading model, variables are not shared unless explicitly marked as shared. This is important, and also different from most other threading models, so allow me to repeat myself. Unless you mark a variable as shared it will be treated as a private thread-local variable. The downside of this approach is that Perl has to clone all of the nonshared variables each time a new thread is created. This takes memory and time. The upside is that most nonthreaded Perl code will "just work" with threads. Since nonthreaded code doesn't declare any shared variables there's no need for locking and little possibility for problems.
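The difference between private and shared variables can be seen in a few lines. This is a minimal sketch that needs a thread-enabled Perl 5.8 or later; the variable names are invented for the illustration.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $private = 0;            # every thread gets its own clone
my $shared : shared = 0;    # one value visible to all threads

my $thread = threads->new(sub {
    $private = 1;           # changes the new thread's clone only
    lock($shared);
    $shared = 1;            # changes the value every thread sees
});
$thread->join;

# prints "private: 0, shared: 1"
print "private: $private, shared: $shared\n";
```

The main thread never sees the assignment to $private, because the new thread was working on its own copy.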

Perl's threading model can be described as low-level, particularly compared to the threading models of Java and Ada. Perl offers you the ability to create threads, join them and yield processor time to other threads. For communication between threads you can mark variables as shared, lock shared variables, wait for signals on shared variables, and send signals on shared variables. That's it!
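As a rough illustration of that whole toolkit in one place, the sketch below creates a thread, yields to it, signals it through a shared variable, and joins it. It assumes a thread-enabled Perl; $ready and the returned string are made-up names for this example and are not part of the simulator.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $ready : shared = 0;

# Create a thread that waits for a signal on $ready.
my $waiter = threads->new(sub {
    lock($ready);
    # cond_wait releases the lock while it sleeps and reacquires it
    # before returning; the loop guards against waking up too early.
    cond_wait($ready) until $ready;
    return "got the signal";
});

threads->yield;             # give the other thread a chance to run

{
    lock($ready);
    $ready = 1;
    cond_signal($ready);    # wake one thread waiting on $ready
}

my $result = $waiter->join; # wait for the thread, collect its return value
print "$result\n";
```

Checking $ready before waiting matters: if the main thread signals before the waiter reaches cond_wait, the waiter sees that the flag is already set and skips the wait instead of blocking forever.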

Most higher-level features, like Ada's entries or Java's synchronized methods, can be built on top of these basic features. I expect to see plenty of development happening on CPAN in this direction as more Perl programmers get into threads.

Preamble

Enough abstraction, let's see this stuff work! The elevator simulator in Listing 1 starts with a section of POD documentation describing how to use the program. After that comes a block of use declarations:


  use 5.008;             # 5.8 required for stable threading
  use strict;            # Amen
  use warnings;          # Hallelujah
  use threads;           # pull in threading routines
  use threads::shared;   # and variable sharing routines

The first line makes sure that Perl version 5.8.0 or later is used to run the script. It isn't written use 5.8.0 because that's a syntax error with older Perls and the whole point is to produce a friendly message telling the user to upgrade. The next lines are the obligatory strict and warnings lines that will catch many of the errors to which my fingers are prone.

Next comes the use threads call that tells Perl I'll be using multiple threads. This must come as early as possible in your programs and always before the next line, use threads::shared. The threads::shared module allows variables to be shared between threads, making communication between threads possible.

Finally, Getopt::Long is used to load parameters from the command line. Once extracted, the parameter values are stored in global variables with names in all caps ($NUM_ELEVATORS, $PEOPLE_FREQ, and so on).
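The option handling is not reproduced here, but it might look something like the following sketch. The option names and default values are assumptions made for this illustration; only the global variable names come from the text.

```perl
use strict;
use warnings;
use Getopt::Long;

# Defaults are hypothetical, as are the option names below.
our $NUM_ELEVATORS = 3;
our $NUM_FLOORS    = 20;
our $PEOPLE_FREQ   = 2;

GetOptions(
    'elevators=i'   => \$NUM_ELEVATORS,
    'floors=i'      => \$NUM_FLOORS,
    'people-freq=i' => \$PEOPLE_FREQ,
) or die "Usage: $0 [--elevators N] [--floors N] [--people-freq N]\n";

print "Simulating $NUM_ELEVATORS elevators and $NUM_FLOORS floors.\n";
```

GetOptions returns false when it meets an option it cannot parse, which is why the usage message hangs off an or die.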

Building State

The building is represented in the simulation with three shared variables, %DOOR, @BUTTON and %PANEL. These variables are declared as shared using the shared attribute:


  # Building State
  our %DOOR   : shared; # a door for each elevator on each floor
  our @BUTTON : shared; # a button for each floor to call the elevators
  our %PANEL  : shared; # a panel of buttons in each elevator for each floor

When a variable is marked as shared its state will be synchronized between threads. If one thread makes a change to a shared variable then all the other threads will see that change. This means that threads will need to lock the variable in order to access it safely, as I'll demonstrate below.
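The difference between shared and unshared variables can be seen in a small standalone sketch. The variable names here are illustrative only and do not appear in the simulator.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $shared_count : shared = 0;   # one copy, visible to every thread
my $private_count         = 0;   # each thread gets its own copy

threads->new(sub {
    { lock $shared_count; $shared_count++; }
    $private_count++;            # changes only this thread's copy
})->join;

print "shared: $shared_count, private: $private_count\n";
```

After the join, the main thread sees the increment to the shared variable but not the one made to the unshared variable, so this prints "shared: 1, private: 0".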

The building state is initialized in the init_building() function.


  # initialize building state
  sub init_building {
      # set all indicators to 0 to start the simulation
      for my $floor (0 .. $NUM_FLOORS - 1) {       
          $BUTTON[$floor] = 0;
          for my $elevator (0 .. $NUM_ELEVATORS - 1) {
              $PANEL{"$elevator.$floor"} = 0;
              $DOOR{"$elevator.$floor"}  = 0;
          }
      }   
  }

The buttons on each floor are set to 0 to indicate that they are "off." When a person wants to summon the elevator to a floor they will set the button for that floor to 1 ($BUTTON[$floor] = 1). For each elevator there is a set of panel buttons and a set of doors, one for each floor. These are all cleared to 0 at the start of the simulation. When an elevator reaches a floor it will open the door by setting the appropriate item in %DOOR to 1 ($DOOR{"$elevator.$floor"} = 1). Similarly, people tell the elevators where to go by setting entries in %PANEL to 1 ($PANEL{"$elevator.$floor"} = 1).

Figure 2 shows a single-elevator building with four floors and three people. Don't worry if this doesn't make much sense yet; you'll see it in action later.

[Figure 2: a single-elevator building with four floors and three people]

Thread Creation

After calling init_building() to initialize the shared building state variables, the program creates the elevator threads inside init_elevator():


  # create elevator threads
  sub init_elevator {
      our @elevators;
      for (0 .. $NUM_ELEVATORS - 1) {
          # pass each elevator thread a unique elevator id
          push @elevators, threads->new(\&Elevator::run, 
                                        id => $_);
      }
  }

Threads are created by calling threads->new(). The first argument to threads->new() is a subroutine reference where the new thread will begin execution. In this case, it is the Elevator::run() subroutine declared later in the program. Anything after the first argument is passed as an argument to this subroutine. In this case each elevator is given a unique ID starting at 0.

The return value from threads->new() is an object representing the created thread. This is saved in a global variable, @elevators, for use later in shutting down the simulation.
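To make the argument passing concrete, here is a minimal sketch in which a hypothetical run() routine (a stand-in, not Elevator::run()) receives its key/value arguments and hands a value back through join().

```perl
use strict;
use warnings;
use threads;

# Entry point: the key/value pairs passed after the code reference
# arrive in @_ and can be loaded straight into a hash.
sub run {
    my %args = @_;
    return "elevator $args{id} ready";
}

# Keep the thread objects so the threads can be joined later.
my @elevators = map { threads->new(\&run, id => $_) } 0 .. 2;
my @reports   = map { $_->join } @elevators;
print "$_\n" for @reports;
```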

After the elevators are created the simulation is ready to send in people with the init_people() routine:


  # create people threads
  sub init_people {
      our @people;
      for (0 .. $NUM_PEOPLE - 1) {
          # pass each person thread a unique person id and a random
          # destination
          push @people, threads->new(\&Person::run,
                                     id   => $_, 
                                     dest => int(rand($NUM_FLOORS - 2)) + 1);

          # pause if we've launched enough people this second
          sleep 1 unless $_ % $PEOPLE_FREQ;
      }
  }

This routine creates $PEOPLE_FREQ people and then sleeps for one second before continuing. If this weren't done, all the people would arrive at the building at the same time and the simulation would be rather boring. Notice that while the main thread sleeps the simulation is proceeding in the elevator and people threads.

The people threads start at Person::run(), which will be described later. Person::run() receives two parameters - a unique ID and a randomly chosen destination floor. Each person will board an elevator at the ground floor, ride to this floor, wait there for a set period of time and then ride an elevator back down.

The Elevator Class

Each elevator thread contains an object of the Elevator class. The Elevator::run() routine creates this object as its first activity:


  # run an Elevator thread, takes a numeric id as an argument and
  # creates a new Elevator object
  sub run {
      my $self = Elevator->new(@_);

Notice that since $self is not marked shared it is a thread-local variable. Thus, each elevator has its own private $self object. The new() method just sets up a hash with some useful state variables and returns a blessed object:


  # create a new Elevator object
  sub new {
      my $pkg = shift;
      my $self = { state => STARTING,
                   floor => 0,
                   dest  => 0,
                   @_,
                 };
      return bless($self, $pkg);
  }

All elevators start at the ground floor (floor 0) with no destination. The state attribute is set to STARTING which comes from this set of constants used to represent the state of the elevator:


  # state enumeration
  use constant STARTING   => 0;
  use constant STOPPED    => 1;
  use constant GOING_UP   => 2;
  use constant GOING_DOWN => 3;

After setting up the object, the elevator thread enters an infinite loop looking for button presses that will cause it to travel to a floor. At the top of the loop $self->next_dest() is called to determine where to go:


    # run until simulation is finished
    while (1) {
        # get next destination
        $self->{dest} = $self->next_dest;

The next_dest() method examines the shared array @BUTTON to determine if any people are waiting for an elevator. It also looks at %PANEL to see if there are people inside the elevator heading to a particular floor. Since next_dest() accesses shared variables it starts with a call to lock() for each shared variable:


  # choose the next destination floor by looking at BUTTONs and PANELs
  sub next_dest {
      my $self = shift;
      my ($id, $state, $floor) = @{$self}{('id', 'state', 'floor')};
      lock @BUTTON;
      lock %PANEL;

Perl's lock() is an advisory locking mechanism, much like flock(). When a thread locks a variable it will wait for any other threads to release their locks before proceeding. The lock obtained by lock() is lexical - that is, it lasts until the enclosing scope is exited. There is no unlock() call, so it's important to carefully scope your calls to lock(). In this case the locks on @BUTTON and %PANEL last until next_dest() returns.
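Because there is no unlock(), a bare block is the usual way to keep a lock short-lived. The following is a minimal sketch using a made-up shared counter rather than the simulator's variables.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $counter : shared = 0;

sub work {
    for (1 .. 1000) {
        {   # bare block: the lock is held only inside these braces
            lock $counter;
            $counter++;
        }   # lock released here
        # ... work that does not touch $counter would go here, unlocked
    }
}

my @workers = map { threads->new(\&work) } 1 .. 4;
$_->join for @workers;
print "counter: $counter\n";
```

With the lock in place every increment is counted, so four workers doing 1000 increments each leave the counter at 4000.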

next_dest()'s logic is simple, and largely uninteresting for the purpose of learning about thread programming. It does a simple scan across @BUTTON and %PANEL looking for 1s and takes the first one it finds.
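The scan itself is not shown in the article. A hypothetical version, written as a plain function over copies of the state so that it needs no locking, might look like the sketch below. The name pick_dest(), its argument list, the scan order, and the choice to stay put when nothing is lit are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for next_dest(): returns the first floor with a
# call button lit or with one of this elevator's panel buttons lit.
sub pick_dest {
    my ($id, $floor, $button, $panel) = @_;
    for my $f (0 .. $#$button) {
        return $f if $button->[$f] || $panel->{"$id.$f"};
    }
    return $floor;    # nothing lit: stay where we are
}
```

For example, pick_dest(0, 1, [0, 0, 1, 0], {}) returns 2, because the call button on floor 2 is the first one found lit.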

Once next_dest() returns the elevator has its marching orders. By comparing the current floor ($self->{floor}) to the destination the elevator now knows whether it should stop, or travel up or down. First, let's look at what happens when the elevator decides to stop:


    # stopped?
    if ($self->{dest} == $self->{floor}) {
        # state transition to STOPPED?
        if ($self->{state} != STOPPED) {
            print "Elevator $id stopped at floor $self->{dest}.\n";
            $self->{state} = STOPPED;
        }

        # wait for passengers
        $self->open_door;
        sleep $ELEVATOR_WAIT;

The code starts by printing a message and changing the state attribute if the elevator was previously moving. Then it calls the open_door() method and sleeps for $ELEVATOR_WAIT seconds.

The open_door() method opens the elevator door. This allows waiting people to board the elevator.


  # open the elevator doors
  sub open_door {
      my $self = shift;
      lock %DOOR;
      $DOOR{"$self->{id}.$self->{floor}"} = 1;
      cond_broadcast(%DOOR);
  }

Like next_dest(), open_door() manipulates a shared variable so it starts with a call to lock(). It then sets the elevator door for the elevator on this floor to open by assigning 1 to the appropriate entry in %DOOR. Then it wakes up all waiting person threads by calling cond_broadcast() on %DOOR. I'll go into more detail about cond_broadcast() when I show you the Person class later on. For now suffice it to say that the people threads wait on the %DOOR variable and will be woken up by this call.

The other states, for going up and going down, are handled similarly:


  } elsif ($self->{dest} > $self->{floor}) {
      # state transition to GOING UP?
      if ($self->{state} != GOING_UP) {
          print "Elevator $id going up to floor $self->{dest}.\n";
          $self->{state} = GOING_UP;
          $self->close_door; 
      }

      # travel to next floor up
      sleep $ELEVATOR_SPEED;
      $self->{floor}++;

  } else {
      # state transition to GOING DOWN?
      if ($self->{state} != GOING_DOWN) {
          print "Elevator $id going down to floor $self->{dest}.\n";
          $self->{state} = GOING_DOWN;
          $self->close_door; 
      }

      # travel to next floor down
      sleep $ELEVATOR_SPEED;
      $self->{floor}--;
  }

The elevator looks at the last value for $self->{state} to determine whether it was already heading up or down. If not, then it prints a message and calls $self->close_door(). Then it sleeps for $ELEVATOR_SPEED seconds as it travels between floors and adjusts its current floor accordingly.

The close_door() method simply does the inverse of open_door(), but without the call to cond_broadcast() since there's no point waking people up if they can't get on the elevator:


  # close the elevator doors
  sub close_door {
      my $self = shift;
      lock %DOOR;
      $DOOR{"$self->{id}.$self->{floor}"} = 0;
  }

Finally, at the bottom of the elevator loop there is a check on the shared variable $FINISHED:


  # simulation over?
  { lock $FINISHED; return if $FINISHED; }

Since the elevator threads are in an infinite loop the main thread needs a way to tell them when the simulation is over. It uses the shared variable $FINISHED for this purpose. I'll go into more detail about why this is necessary later.

That's all there is to the Elevator class code. Elevators simply travel from floor to floor opening and closing doors in response to buttons being pushed by people.

The Person Class

Now that we've looked at the machinery, let's turn our attention to the inhabitants of this building, the people. Each person thread is created with a goal - ride an elevator up to the assigned floor, wait a bit and then ride an elevator back down. Person threads are also responsible for keeping track of how long they wait for the elevator and how long they ride. When they finish they report this information back to the main thread where it is output for your edification.

Person::run() starts the same way as Elevator::run(), by creating a new object:


  # run a Person thread, takes an id and a destination floor as
  # arguments.  Creates a Person object.
  sub run {
      my $self = Person->new(@_);

Inside Person::new() two attributes are set up to keep track of the person's progress, floor and elevator:


  # create a new Person object
  sub new {
      my $pkg = shift;
      my $self = { @_,
                   floor    => 0,
                   elevator => 0 };
      return bless($self, $pkg);
  }

Back in Person::run() the person thread begins waiting for the elevator by calling $self->wait(). The calls to time() will be used later to report on how long the person waited.


  # wait for elevator going up
  my $wait_start1 = time;
  $self->wait;
  my $wait1 = time - $wait_start1;

The wait() method is responsible for waiting until an elevator arrives and opens its doors on this floor:


  # wait for an elevator
  sub wait {
      my $self = shift;

      print "Person $self->{id} waiting on floor $self->{floor} for elevator ",
        "to floor $self->{dest}.\n";

      while(1) {
          $self->press_button();
          lock(%DOOR);
          cond_wait(%DOOR);
          for (0 .. $NUM_ELEVATORS - 1) {
              if ($DOOR{"$_.$self->{floor}"}) {
                  $self->{elevator} = $_;
                  return;
              }
          }
      }
  }

  # signal an elevator to come to this floor
  sub press_button {
      my $self = shift;
      lock @BUTTON;
      $BUTTON[$self->{floor}] = 1;
  }

After printing out a message, the code enters an infinite loop waiting for the elevator. At the top of the loop, the press_button() method is called. press_button() locks @BUTTON and sets $BUTTON[$self->{floor}] to 1. This tells the elevators that a person is waiting on that floor (the ground floor for the trip up, the destination floor for the trip back down).

The code then locks %DOOR and calls cond_wait(%DOOR). This has the effect of releasing the lock on %DOOR and putting the thread to sleep until another thread does a cond_broadcast(%DOOR) (or cond_signal(%DOOR), a variant of cond_broadcast() that just wakes a single thread). When the thread wakes up again it re-acquires the lock on %DOOR and then checks to see if the door that just opened is on this floor. If it is the person notes the elevator and returns from wait().

If there's no elevator on the floor where the person is waiting, the loop is run again. The person presses the button again and then goes back to sleep waiting for the elevator. You might be wondering why the call to press_button() is inside the loop instead of outside. The reason is that it is possible for the person to wake up from cond_wait() but have to wait so long to re-acquire the lock on %DOOR that the elevator is already gone.
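The wait-and-recheck pattern can be sketched in isolation. This is a simplified illustration with a single door key ('0.0') rather than the simulator's code. It uses the statement-modifier form also seen in ride(), which tests the condition before sleeping and again after every wake-up.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %DOOR : shared;
$DOOR{'0.0'} = 0;    # elevator 0, floor 0: door closed

# Rider: sleeps until the door of elevator 0 on floor 0 is open.
my $rider = threads->new(sub {
    lock %DOOR;
    # cond_wait releases the lock while asleep and re-acquires it on
    # wake-up; re-testing the condition guards against wake-ups that
    # were meant for other doors.
    cond_wait(%DOOR) until $DOOR{'0.0'};
    return 'boarded';
});

# Elevator: open the door, then wake every thread waiting on %DOOR.
{
    lock %DOOR;
    $DOOR{'0.0'} = 1;
    cond_broadcast(%DOOR);
}

my ($status) = $rider->join;
print "rider $status\n";
```

Because the condition is checked before the first cond_wait(), the rider does not sleep at all if the door was opened before it got the lock.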

Once the elevator arrives, control returns to run() and the person boards the elevator:


    # board the elevator, wait for arrival at destination floor and get off
    my $ride_start1 = time;
    $self->board;
    $self->ride;
    $self->disembark;
    my $ride1 = time - $ride_start1;

The board() method is simple enough. It just turns off the @BUTTON entry used to summon the elevator and presses the appropriate button inside the elevator's %PANEL:


  # get on an elevator
  sub board {
      my $self = shift;
      lock @BUTTON;
      lock %PANEL;
      $BUTTON[$self->{floor}] = 0;
      $PANEL{"$self->{elevator}.$self->{dest}"} = 1;
  }

Next, the run() code calls ride() which does another cond_wait() on %DOOR, this time waiting for the door in the elevator to open on the destination floor:


  # ride to the destination
  sub ride {
      my $self = shift;

      print "Person $self->{id} riding elevator $self->{elevator} ",
        "to floor $self->{dest}.\n";

      lock %DOOR;
      cond_wait(%DOOR) until $DOOR{"$self->{elevator}.$self->{dest}"};
  }

When the elevator arrives, ride() will return and the person thread calls disembark(), which clears the entry in %PANEL for this floor and sets the current floor in $self->{floor}.


  # get off the elevator
  sub disembark {
      my $self = shift;

      print "Person $self->{id} getting off elevator $self->{elevator} ",
        "at floor $self->{dest}.\n";

      lock %PANEL;
      $PANEL{"$self->{elevator}.$self->{dest}"} = 0;
      $self->{floor} = $self->{dest};
  }

After reaching the destination floor, the person thread waits for $PEOPLE_WAIT seconds and then heads back down by repeating the steps again with $self->{dest} set to 0:


    # spend some time on the destination floor and then head back
    sleep $PEOPLE_WAIT;
    $self->{dest} = 0;

When this is complete the person has arrived at the ground floor. The thread ends by returning the recorded timing data with return:


    # return wait and ride times
    return ($wait1, $wait2, $ride1, $ride2);

The Grand Finale

While the simulation is running the main thread is sitting in init_people() creating person threads periodically. Once this task is complete the finish() routine is called.

The first task of finish() is to collect statistics from the people threads as they complete:


  # finish the simulation - join all threads and collect statistics
  sub finish {
      our (@people, @elevators);

      # join the people threads and collect statistics
      my ($total_wait, $total_ride, $max_wait, $max_ride) = (0,0,0,0);
      foreach my $person (@people) {
          my ($wait1, $wait2, $ride1, $ride2) = $person->join;
          $total_wait += $wait1 + $wait2;
          $total_ride += $ride1 + $ride2;
          $max_wait    = $wait1 if $wait1 > $max_wait;
          $max_wait    = $wait2 if $wait2 > $max_wait;
          $max_ride    = $ride1 if $ride1 > $max_ride;
          $max_ride    = $ride2 if $ride2 > $max_ride;
      }

To extract return values from a finished thread the join() method must be called on the thread object. This method will wait for the thread to end, which means that this loop won't finish until the last person reaches the ground floor.
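One detail worth knowing here: in current versions of the threads module, whether a thread can return a list is decided by the context in which it was created, not the context of the join() call. The sketch below uses a hypothetical fake_person() routine with made-up timings to show list results being collected and aggregated.

```perl
use strict;
use warnings;
use threads;

# Stand-in for Person::run(): returns made-up timings
# (wait1, wait2, ride1, ride2) derived from the id.
sub fake_person {
    my %args = @_;
    return ($args{id}, $args{id} + 1, 2 * $args{id}, 3);
}

# push supplies list context, so each thread can return a list.
my @people;
push @people, threads->new(\&fake_person, id => $_) for 1 .. 3;

my ($total_wait, $max_wait) = (0, 0);
for my $person (@people) {
    my ($wait1, $wait2) = $person->join;   # blocks until the thread ends
    $total_wait += $wait1 + $wait2;
    for ($wait1, $wait2) { $max_wait = $_ if $_ > $max_wait }
}
printf "total wait %d, longest wait %d\n", $total_wait, $max_wait;
```

The simulator's init_people() also creates its threads inside a push, which is why finish() can unpack four values from each join().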

Once all the people are processed, the simulation is over. To tell the elevators to shut down, the shared variable $FINISHED is set to 1 and the elevators are joined:


  # tell the elevators to shut down
  { lock $FINISHED; $FINISHED = 1; }
  $_->join for @elevators;

If this code were omitted the simulation would still end but Perl would print a warning because the main thread exited with other threads still running.
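The shutdown handshake can be tried on its own with a minimal sketch. The worker() routine and its pass counter are illustrative stand-ins for the elevator loop.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $FINISHED : shared = 0;

# Worker loops until the main thread raises the flag, then reports
# how many passes it made.
sub worker {
    my $passes = 0;
    while (1) {
        $passes++;
        threads->yield;    # stand-in for moving between floors
        { lock $FINISHED; return $passes if $FINISHED; }
    }
}

my @workers = map { threads->new(\&worker) } 1 .. 2;
sleep 1;                                   # let the workers run a bit
{ lock $FINISHED; $FINISHED = 1; }         # ask them to stop
my @passes = map { $_->join } @workers;    # wait for clean exits
print "worker made $_ passes\n" for @passes;
```

Each worker checks the flag once per pass, so it exits shortly after the flag is set and the join() calls return.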

Finally, finish() prints out the statistics collected from the person threads:


  # print out statistics
  print "\n", "-" x 72, "\n\nSimulation Complete\n\n", "-" x 72, "\n\n";
  printf "Average Wait Time: %6.2fs\n",   ($total_wait / ($NUM_PEOPLE * 2));
  printf "Average Ride Time: %6.2fs\n\n", ($total_ride / ($NUM_PEOPLE * 2));
  printf "Longest Wait Time: %6.2fs\n",   $max_wait;
  printf "Longest Ride Time: %6.2fs\n\n", $max_ride;

The end!

A Few Wrinkles

Overall, the simulator was a fun project with few major stumbling blocks. However, there were a few problems or near problems that you would do well to avoid.

Deadlock

All parallel programs are susceptible to deadlock, but, by virtue of higher levels of inter-activity, threads suffer it more frequently. Deadlock occurs when independent threads (or processes) each need a resource the other has.

In the elevator simulator I avoided deadlock by always performing multiple locks in the same order. For example, Elevator::next_dest() begins with:


  lock @BUTTON;
  lock %PANEL;

And in Person::board() the same sequence is repeated:


  lock @BUTTON;
  lock %PANEL;

If the lock calls in Person::board() were reversed then the following could occur:

  1. Elevator 2 locks @BUTTON.
  2. Person 3 locks %PANEL.
  3. Elevator 2 tries to lock %PANEL and blocks waiting for Person 3's lock.
  4. Person 3 tries to lock @BUTTON and blocks waiting for Elevator 2's lock.
  5. Deadlock! Neither thread can proceed and the simulation will never end.
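The safe ordering can be exercised with a small sketch in which two threads repeatedly take both locks in the same order. The routine names and counters are made up for illustration.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my @BUTTON : shared;
my %PANEL  : shared;

# Both routines take their locks in the same order: @BUTTON, then %PANEL.
sub elevator_scan {
    for (1 .. 500) {
        lock @BUTTON;
        lock %PANEL;
        $PANEL{scans}++;
    }   # both locks are released at the end of each pass
}

sub person_board {
    for (1 .. 500) {
        lock @BUTTON;     # swapping these two lines risks deadlock
        lock %PANEL;
        $BUTTON[0]++;
    }
}

my @threads = (threads->new(\&elevator_scan), threads->new(\&person_board));
$_->join for @threads;
print "scans: $PANEL{scans}, boardings: $BUTTON[0]\n";
```

Since neither thread can hold %PANEL while waiting for @BUTTON, both loops always run to completion.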

Modules

In general, unless a module has been specifically vetted as thread safe it cannot be used in a threaded program. Most pure Perl modules should be thread safe but most XS modules are not. This goes for core modules too!

An earlier version of the elevator simulator used Time::HiRes to allow for fractional sleep() times. This really helped speed up the simulation since it meant that elevators could traverse more than one floor per second. However, on further investigation (and advice from Nick Ing-Simmons) I realized that Time::HiRes is not necessarily thread safe. Although it seemed to work fine on my machine there's no reason to believe that would be the case elsewhere, or even that it wouldn't blow up at some random point in the future. The problem with thread safety is that it's virtually impossible to test for; either you can prove you have it or you must assume you don't!

Synchronized rand()

The first version of the simulator I wrote had the people threads calling rand() inside Person::run() to choose the destination floor. I also had a call to srand() in the main thread, not realizing that Perl now calls srand() with a good seed automatically. The combination resulted in every person choosing the same destination floor. Yikes!

The reason for this is that by calling srand() in the main thread I set the random seed. Then when the threads were created that seed was copied into each thread. The call to rand() then generated the same first value in each thread.
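The underlying effect is easy to reproduce without threads: two generators started from the same seed produce the same first value. The seed 42 below is arbitrary.

```perl
use strict;
use warnings;

# Reseeding with the same value replays the same sequence, which is
# what each person thread inherited from the main thread.
srand(42);
my $first_a = rand(10);
srand(42);
my $first_b = rand(10);

print "same first value\n" if $first_a == $first_b;
```

Two ways around it are to choose the random values in the main thread, as the init_people() routine shown earlier does when it passes dest to each person, or to call srand() inside each thread.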

Resources

Perl comes with copious threading documentation. You can read these docs with the perldoc program that comes with Perl; perlthrtut, threads, and threads::shared are good places to start.

The Fusion of Perl and Oracle

Andy Duncan is the co-author of Perl for Oracle DBAs.

My coauthor, Jared Still, and I had the task of writing a book, Perl for Oracle DBAs, about two of our favorite subjects, Perl and Oracle. Our goal was to link Perl and ready-canned Perl applications to the job of making an Oracle DBA's life both easier and better. Besides covering the breadth of Perl, and in particular how it relates to examining and controlling Oracle databases, we also created a wide-ranging, open source Perl Toolkit for Oracle DBAs. The toolkit contains all the Perl scripts we've used as DBAs for the past decade, wrapped up into a single, object-oriented project for both Unix and Win32, and it forms the heart of our book. We've also included a comprehensive guide to all of the Perl Oracle DBA tools already out there, including Orac, Oracletool, DDL::Oracle, StatsView, Senora, Apache::OWA, and many more.

I thought that in this article, we'd go beyond our passionate and committed belief that Perl is simply the finest scripting language ever invented for helping Oracle DBAs in their daily working lives, and question just why Perl possesses such a good symbiosis with the Oracle database. We'll also try to explain how the answers to this question helped us construct our Perl DBA Toolkit.

Objectivism

At first glance, Perl and Oracle seem like strange bedfellows, so how can they be linked so well to each other? We think there is perhaps even a philosophical link stretching between Perl and Oracle, all the way back to Aristotle and Athens, the first state in the world to champion the rights of the individual. This link, we believe, is "Objectivism." This most modern of philosophies was created by Ayn Rand, an escapee of 1920s Leningrad, who is more famous as the author of Atlas Shrugged. It remains perhaps one of the most influential books of the past 100 years, in the same league as Brave New World, Animal Farm, and 1984; it is also the fictional representation of Objectivism.

Ayn Rand's ideas form a philosophy in defense of the individual and his or her rights, which spring principally from an individual's right to life, and the pursuit of his or her own happiness, within the rule of law. To demonstrate this, in a dramatic sense, Ayn Rand's books portrayed what she later called "Ideal" men. In one of her earlier books, The Fountainhead, Ayn Rand's hero was Howard Roark, a maverick architect, and the best of his generation, who insists on clean, simple, and strong design. Virtually everyone else is stuck in ornament and the nineteenth century, whereas Roark creates great soaring towers of glass and steel that take one's breath away. Although Roark often takes payment and credit, he is so dedicated to his craft that he often misses out on both, handing them on to other, lesser architects just so he can see his bold designs completed.

Roll the clock forward 50 years, and if Ayn Rand were to write The Fountainhead now, it would be surprising if she didn't replace her building architect with a computer language architect, perhaps someone like Larry Wall. Howard Roark's architecture brings joy to millions of ordinary, hard-working people. In the same way, Perl has increased the productivity and creative expression of Oracle DBAs and system administrators everywhere because it is so deliberately tailored toward the individual. It is all things to all people, and you can blend it with any architecture to create exactly the software you want. You are limited only by your system administrator's disk quota for DBA users. Similarly, with our toolkit, we have tried to keep it free from adornment and left it highly configurable for your own individual needs. We've also tried to aid the construction of further solutions, and keep this as simple as possible by providing ready-built modules that you can plug right into your own designs and interfaces. Our main architectural modules are summarized below, in Table 1.

Table 1. Main PDBA Toolkit Supporting Modules

Module            Description
PDBA::CM          Connection manager that simplifies Perl-to-Oracle connectivity.
PDBA::ConfigFile  Finds and opens configuration files.
PDBA::ConfigLoad  Finds, opens, parses, and loads configuration files into memory.
PDBA::DBA         Designed for DBA-specific tasks; many methods are data-dictionary related.
PDBA::Daemon      Runs Perl script daemons on Unix.
Win32::Daemon     This module, by Dave Roth, is included (with permission) because it is so important to toolkit daemon services on Win32 systems.
PDBA::GQ          Generic Query module that simplifies single-table queries.
PDBA::LogFile     Creates and locks log files; used by many scripts in the toolkit to perform logging actions.
PDBA::OPT         Processes command line arguments unhandled by calling scripts.
PDBA::PWC         Password client module.
PDBA::PWD         Password server module.
PDBA::PWDNT       Password server modules for Win32.
PDBA::PidFile     Used to control script execution.
PDBA              Modular collection of widely used methods.

With architecture, as with Perl, what you see is what you get, and it is impossible to hide steel and glass constructions from rival architects. However, your inspiration and artistic ability are your own, and it is your signature at the bottom of the blueprints, just as it is Larry Wall's copyright name on the Perl Artistic License. So if Larry Wall is an "Ideal" man in Ayn Rand's Objectivist mold, what about the other Larry in our story, Mr. Ellison of Redwood Shores?

Proprietary Technology

In Atlas Shrugged one of the main "Ideal" men is Hank Rearden. Using his own capital, determination, and research ability, Hank Rearden creates a proprietary super-strength metal, of which the whole world is unable to get enough. It is used in fast trains, speeding bullets, and tall buildings. This entirely new metal is cheaper, lighter, and stronger than any other known type of steel or alloy; Rearden Metal is a product that transforms the world. Once again, rolling forward a couple of generations and modernizing the names, if Ayn Rand were to replace the key productive technology of 1950s America with that of early twenty-first century America, she would probably choose to write about databases. Her hero, Larry Rearden, would be a rugged individualist running a database corporation that creates cutting-edge products used throughout the world in a thousand different industries. Remind you of anyone?

We think Oracle and Perl work together so well because they were both created by incredible individuals for other individual businesses, individual Oracle DBAs, and individual Perl programmers. With both there are no imposed limits, and just as much help as you might need, from either Oracle support, or the worldwide Perl community. And of course, Perl for Oracle DBAs.

The Perl DBA Toolkit

To take advantage of this synergy between Perl and Oracle in our toolkit, we've blended the two streams together into four key DBA areas. These are password serving, the performance of routine DBA tasks, the monitoring of the database, and the building of a database repository for informational time-traveling.

The password-serving scripts described in Table 2 allow the toolkit to work securely across a network without cleartext passwords being passed around, giving you a single toolkit point of control for all of your databases, no matter where they're located.

Table 2. Password serving

Script          Description
pwd.pl          Password server daemon that encrypts passwords via a TCP socket; works remotely with the other Perl scripts via the toolkit module set.
pwc.pl          Client that remotely retrieves encrypted passwords from the password server, easing the secure database access overhead imposed by other scripts.
pwd_service.pl  Installs the password server as a daemon (on Unix) or service (on Win32).

The scripts in Table 3 perform a wide variety of DBA tasks, including the creation of new users from the command line, the creation of new users via duplicated accounts, and the creation of multiple accounts with automatically mailed passwords. They also cover the maintenance of indexes, the killing of sniped database sessions, the management of extent usage, and the extraction of DDL and data for SQL*Loader transfer.

Table 3. Routine database administration

Script          Description
ddl_oracle.pl   Generates the DDL necessary to recreate schemas, tables, indexes, views, PL/SQL, materialized views, and other objects.
sqlunldr.pl     Dumps entire schemas to comma-delimited files and generates the SQL*Loader scripts necessary to reload them. Also dumps LONG RAW and BLOB objects, converting them to hex format and using the Oracle HEXTORAW function in the SQL*Loader control file to convert the data back into binary format.
create_user.pl  Creates Oracle users from the command line. You can create a user and assign passwords, tablespaces, and privileges, all with one easy command-line call. Best of all, you can use this script to pre-configure different groups of runtime privileges.
drop_user.pl    Drops a database user by first dropping all of their tables and indexes before dropping the account. Doing so avoids most of the resource-intensive SQL recursion incurred when dropping an account containing many tables and indexes.
dup_user.pl     Duplicates an account, with the source user's system privileges, object privileges, roles, and quotas assigned directly to the target user.
mucr8.pl        When creating a large number of users, this utility creates them all with a single operation. Configurable permissions are granted, and the passwords automatically generated get emailed back to the new account owners.
kss.pl          Kills sniped sessions (which are lapsed sessions on busy databases consuming unnecessary memory resources).
kss_NT.pl       Win32 version of kss.pl.
kss_service.pl  Used to create an appropriate snipe killing service on Win32.
idxr.pl         Determines if an index should be rebuilt and, if so, rebuilds it. Checks on a per-schema basis, and is configured to check indexes based on days since the index was last analyzed. A configurable time limit is imposed, which allows index rebuilds to fit within a predefined time schedule.
maxext.pl       Monitors the size and number of extents in tables and indexes. If they're nearing a maximum allowed or if the object will be unable to extend because of limited free space, it notifies the DBA. This script is most useful for databases that use dictionary-managed extents.

Table 4 lists our remote monitoring scripts, which help to maximize the availability of your databases by alerting you to both error conditions reported in the Oracle alert log and to problems with database connectivity. Some of them can even phone you up.

Table 4. Database monitoring

Script               Description
chkalert.pl          Daemon that monitors Oracle alert logs for error conditions and notifies the DBA via either email messages or pager calls. Oracle's alert.log files contain important error messages as well as a log of database startup and shutdown messages.
chkalert_NT.pl       Win32 version of chkalert.pl.
chkalert_service.pl  Utility script that creates a Win32 service for chkalert_NT.pl.
dbup.pl              Working alongside chkalert.pl, a highly configurable database connectivity monitor that checks to see if databases are up and available.
dbup_NT.pl           Win32 version of dbup.pl.
dbup_service.pl      Creates the Win32 service for dbup_NT.pl.
dbignore.pl          Utility script used with dbup.pl to temporarily disable connectivity checks on an individual database (e.g., while maintenance is being performed).

Table 5 summarizes what we've called repository scripts. These compare different database schema versions over time, detecting database changes (official or otherwise). They also store SQL execution plans within a library cache to allow comparison between current execution plans and plans previously collected; this way, the scripts can report on changed execution plans and the reasons behind the changes.

Table 5. Repository and DDL "time travel"

Script Description
baseline.pl Creates the baseline for the PDBA repository, establishes "time travel" control of DDL (Data Definition Language), and stores the entire database structural change record across time boundaries.
spdrvr.pl Perl driver for SQL*Plus that reports on information created by baseline.pl.
sxp.pl Collects and stores SQL statements from the data dictionary and generates accompanying execution plans for later comparison with other plans.
sxpcmp.pl Examines the current SQL statements, generating execution plans.
sxprpt.pl Generates reports based on the stored SQL and execution plans.

Perl Philosophy

There is something else that makes Perl different from other computer languages, which may move it closer towards the rugged individualists of American business; it has three major philosophical virtues. And so does Objectivism. Coincidence? I'll leave it to you to decide whether the two sets of virtues below are in any way related:

The three great virtues of the Perl programmer, as originally defined by Larry Wall, are: Laziness, the quality that makes you write labor-saving programs to increase productivity; Impatience, the injustice you feel when applications are inefficient, which makes you write clever programs to anticipate your needs; and Hubris, the pride that makes you create great solutions, which others will say only good things about.

The three cardinal virtues of Objectivist ethics are: Purpose, the recognition that productive work is how man's mind sustains his life; Reason, the use of rationality as the only guide to considered action and wealth creation; and Self-Esteem, the recognition that as man is a being of self-made wealth, it is this route through which he can acquire the pride of the self-made soul.

Ayn Rand's life works were about wealth creation, and the individual. However, they were not just about money. They were also about any form of wealth or ideas creation, where wealth is the product of one person's mind. And with the Perl Artistic License copyright, it is clear the wealth of Perl was created and is owned by Larry Wall. Although he decides to give it away, this is entirely his right. Perl even possesses its own culture of freedom, reflected in the Perl catch phrase TMTOWTDI (There's More Than One Way To Do It), and its own free-trade area, the Comprehensive Perl Archive Network, or CPAN. This is where thousands of worldwide Perl developers swap modules, in exchange for respect from the rest of Perl society. Indeed, the respect that Larry Wall has duly earned is priceless. He may remain unable to buy MiG jets with it, but it still makes him a major keynote speaker at technology conferences, just like our other shrugging Atlas in this article, Larry Ellison.

In following this established Perl philosophy, we've created our toolkit as an entirely open source project. We'll wait and see how it develops, but we're looking forward to your comments and suggestions on how it can be further improved to meet your own specialized requirements; we're hoping it will match the runaway success of Steven Feuerstein's utPLSQL project.

The Sign of the Dollar

The Two Larrys are both free men of the mind who live on two sides of the same coin. They have created between them two of the world's great American productive inventions, Perl and Oracle, which work well together because they arise from the same intellectual substrate. Without the pioneering work of both Wall and Ellison, the world would be both spiritually and materially poorer. And here's a final thought. For those who've read Atlas Shrugged, you'll know the basic value symbol of the free men of the mind; it was a golden dollar symbol. And by a bizarre twist in our tale, every Perl script ever written is full of basic value variables, each preceded by a dollar symbol. Another coincidence? Possibly. But more bizarrely, if you take a vinyl copy of John Lennon's 1970s anthem, "Imagine," and play it backward, it says "Perl for Oracle DBAs." No kidding.

(If you'd like to learn more about Ayn Rand's work, check out this Web site.)

This week on Perl 6 (8/26 - 9/1, 2002)

Well, it has been a week. Damian came to London and made our heads spin; perl6-language erupted in a flurry of interesting, high signal/noise threads; Parrot reached its 0.0.8 release; Larry made many of his wonderfully unexpected but obviously right interjections and the world kept on turning.

So, we'll kick off with perl6-internals as usual.

DOD etc

The 'elimination of garbage collection hand waving' thread continued as Nicholas Clark asked a hard question about garbage collection and dead object detection (DOD). As far as Nick could tell, it seems that 'if we have unrefcounted "deterministic destruction" objects somewhere freely in the GC system, then we'll be needing a DOD run after every statement' and he noted that 'all ways of doing deterministic destruction seem to have considerable overhead.' Sean O'Rourke wondered whether we could use a hybrid 'full GC + refcounts where needed' scheme, but Juergen Boemmels pointed out that refcounting would be contagious. Anything that contained a reference to a refcounted object would need to be refcounted in its turn.
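
To make the terminology concrete: a dead object detection run is, in essence, the mark phase of a tracing collector. The following is a toy model in Python, not Parrot's implementation; the `Obj` class, the `find_dead` function, and the object names are all invented for illustration.

```python
class Obj:
    """A toy heap object that may hold references to other objects."""
    def __init__(self, name):
        self.name = name
        self.refs = []

def find_dead(heap, roots):
    """Mark everything reachable from `roots`; return the names of the rest."""
    live = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in live:
            continue
        live.add(id(obj))
        stack.extend(obj.refs)
    return [obj.name for obj in heap if id(obj) not in live]

fh = Obj("filehandle")
holder = Obj("array holding filehandle")
holder.refs.append(fh)
orphan = Obj("orphan")
heap = [fh, holder, orphan]

print(find_dead(heap, roots=[holder]))  # only the orphan is unreachable
print(find_dead(heap, roots=[]))        # nothing is rooted, so all are dead
```

In this model the filehandle is only known to be dead after a full traversal from the roots, which is why destroying it promptly without reference counts appears to need a run after every statement. It also shows the contagion Juergen describes: the filehandle's lifetime depends on the array that holds it, so counting references to the filehandle alone is not enough.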

Meanwhile, Mike Lambert wondered why we needed to promise deterministic destruction in the first place and proposed a couple of schemes to deal with the canonical 'filehandle' case. Sean O'Rourke and Steve Fink both came forward with cases where deterministic destruction proved useful, and where Mike's scheme didn't really work. And that's where the thread came to rest. I have the feeling that it, or a thread like it, will be back.

http://makeashorterlink.com/

Dynamic Keys

Tom Hughes, who has been doing good work on keyed access, wondered about dealing with dynamic keys, and proposed a way forward. Dan asked whether Tom had looked at the proposed ops in PDD06, and pointed out that dynamic keys didn't necessarily need to go the whole PMC hog. 'They're our internal structures -- we can screw with them as we need :)'. Tom pointed out a few issues with the PDD06 op set, and proposed a few more ops with a (hopefully) consistent naming scheme. So far he's had no answer to the questions he raised in that post.

http://makeashorterlink.com/

Perl6 Test Failures

Steve Fink wondered about all the test failures he keeps seeing for Perl 6; he doesn't want to go trying to make a language neutral regex engine play nicely with the Perl6 engine when that engine is in such a state of flux. Sean O'Rourke suggested nailing down some calling conventions and then both teams could code to those conventions. Steve pointed out that, so far, he knows of at least five attempts at a regex engine in parrot. Leopold Toetsch suggested that Steve try the tests again, forcing a grammar rebuild, and the test failures got all better.

http://makeashorterlink.com/

Regex status

On Wednesday, Dan wondered where we were with Apocalypse/Exegesis 5 compatible patterns/rules/regexes. Sean O'Rourke told him. (Answer: Still some way to go, but making good speed.)

http://makeashorterlink.com/

Counting down to 0.0.8

On Thursday, Jeff Goff posted his timetable for the 0.0.8 'Octarine' release of Parrot, complete with a 25-hour code freeze. Markus wondered whether using a GMT timetable might be more friendly for everyone who wasn't on the East Coast of the United States. Parrot actually saw release on Monday, Sept. 2, which is slightly outside the scope of this summary, but I'll let it sneak in anyway.

http://makeashorterlink.com/

http://makeashorterlink.com/

An 'Oops' Moment

Leopold Toetsch found an interesting bug with the GC system interacting with initialization. examples/life.ar.p6 is a Perl6 implementation of Conway's Life, which has a rather lengthy initialization phase, after which it checks the @ARGS array, which is conventionally placed in P0 at startup. But there's a catch. By the time it comes to make the check, @ARGS has been garbage collected. Peter Gibbs posted a quick fix patch, and Mike Lambert stuck his hand up to being a 'lazy bum,' but reckoned that Steve Fink's fixes should solve the problem.

http://makeashorterlink.com/

Various changes to IMCC

Whilst 'idly toying' with IMCC, Steve Fink made a bunch of speculative changes, bundled 'em up in a patch and offered them to the list. I'm not sure what people thought of the changes, but the thread morphed into a discussion of generating conditional makefiles and making sure that IMCC and the other tools needed to get the Perl6 compiler working were as portable as possible. Mike Lambert pointed out that it may make sense to have the files generated by bison/flex checked directly into the repository, since then those tools wouldn't be needed except by people who go messing with the grammar.

http://makeashorterlink.com/

Concatenation Failing

Leon Brocard (phew, I was worried I was going to have to run his questionnaire this week) found a bug where concatenation fails occasionally, leaving no clues as to why. He attached some sample code that illustrates the problem. Peter Sinnott noted that Parrot seems to be getting confused about the length of the strings involved. Meanwhile, Peter Gibbs offered a patch and Mike Lambert reckoned it fixed a bug in his code, but couldn't for the life of him work out why. Peter reckons it has to do with unmake_COW resizing the allocation and causing confusion elsewhere. I get the feeling that what we have now is a 'symptomatic' fix in search of a fix for an underlying issue. But I'm just a summarizer.

Markus Laire found what he thinks might be another bug, but I've no idea if it's fixed by Peter Gibbs' patch.

http://makeashorterlink.com/

http://makeashorterlink.com/

IRIX64 alignment problem

Steven McDougall chased down a bug causing t/pmc/perlhash.t to throw a bus error, but wasn't at all sure how to go about fixing it, and asked for advice. Bryan C. Warnock offered a few pointers, as did Peter Gibbs, but we don't have a fix yet.

http://makeashorterlink.com/

Meanwhile, in perl6-language

Prototypes, grammars and subs, oh my!

Thom Boyer wondered what while's signature would be. He'd considered sub while (bool $test, &body); and sub while (&test, &body); but neither really fit. Larry agreed and offered


    sub while (&test is expr, &body);

and then, reaching deeper into his bag of tricks, he pulled out the wonderful/scary


    sub while (&test is rx/<expr>/, &body);

(Think about that for a moment. What is proposed is that you'll be able to specify a grammar for your function's argument list, which is definitely something that made me sit up and take notice.) Damian sat up and took notice, too, offering some refinements and doing some thinking aloud. Damian suggested that maybe the prototype should look like sub while ( &test is specified(/<Perl.expr>/), &body);. Damian also suggested blurring the line still further between statements and expressions by having the likes of for return a value, and had some thoughts on multimethods. Trey Harris also offered some more comments on multimethods.

All of which leaves me looking forward with bated breath for Apocalypse 6.

Quote of the thread: "The whole point of making Perl 6 parse itself with regexes is to make this sort of stuff easy." -- Larry

http://makeashorterlink.com/

http://makeashorterlink.com/

Rule, rx and sub

Deborah Ariel Pickett summarized the state of her understanding of the difference between rule and rx and wondered if there was any case where


    ... rule ...

and

    ... rx ...

(given the same ...s in both cases) led to valid but different semantics. Uri Guttman thinks not. Damian thinks so, and provided an example. (It was joked on the London.pm mailing list, by Damian himself, that Damian is currently our only real, live Perl6 interpreter.) Luke Palmer raised a red flag about Damian's example; Damian thinks it wasn't a red flag, but left it to Larry to adjudicate. This also provoked a certain amount of discussion about the philosophy behind some of the design decisions so far.

Glenn Linderman wondered whether rx shouldn't be respelled, as the term 'regex' is being deprecated. Damian suggested that rx actually stood for 'Rule eXpedient', but I'm not sure he convinced anyone (himself included). Ever the linguist, Larry observed that 'we can tweak what people mean by "regular expression", but there's no way on earth we can stop them from using the term.' and that, no matter how many editions it goes through, Friedl's book is always going to be called Mastering Regular Expressions. So, Larry is 'encouraging use of the technical term "regex" as a way to not precisely mean "regular expression".'

Piers Cawley raised a question about when } terminates a statement and got it wrong. This subthread led to a short discussion on good Perl 6 style. Damian told us that 'Any subroutine/function like if that has a signature that ends in a &sub argument can be parsed without the trailing semicolon', which I don't remember seeing in any Apocalypse. This led to a discussion about what was legal in a prototype specifier, ending when Larry told us that it'd be possible to specify a grammar as a function's prototype.

http://makeashorterlink.com/

http://makeashorterlink.com/

http://makeashorterlink.com/

http://makeashorterlink.com/

Auto deserialization

At the root of what turned into a large thread, Steve Canfield asked a deceptively simple question: '[Will] code like this Do What I Mean: my Date $bday = 'June 24, 2002''? We weren't entirely sure what he meant by that...

The thread was long, and pretty much unsummarizable, but we ended up with the rather pleasant looking my Date $date .= new('Jun 24, 2002'), the idea being that, because $date is known to be a Date, even if it's undefined, it's possible to make a static method call on it. The response to this suggestion spilt over into the next week, but 'favourable' would be a good description of it.

http://makeashorterlink.com/

http://makeashorterlink.com/

Hypothetical synonyms

Aaron Sherman wondered if he would be able to write


    $stuff = $field if m{^\s*[
        "(.*?)"     {let $field = $1} |
         (\S+)      {let $field = $2}]};

Larry thought


    my $stuff;
    
    m{^\s*[
        "$stuff:=(.*?)" |
         $stuff:=(\S+)
    ]};

was a better way of doing it, saying that he saw no 'particular reason why a top-level regex can't refer to variables in the surrounding scope, either by default, or via a :modifier of some sort.'
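
For readers who want to try the underlying idea (two alternatives that both feed one result) in a present-day engine, here is a rough Python analogue. It is only an analogy, not Perl 6 semantics: Python's re module does not allow the same group name to appear twice in a pattern, so each alternative gets its own group and the code picks whichever one took part in the match. The names `FIELD`, `first_field`, `quoted`, and `bare` are made up for this sketch.

```python
import re

# Either a double-quoted string or a run of non-whitespace characters.
FIELD = re.compile(r'^\s*(?:"(?P<quoted>.*?)"|(?P<bare>\S+))')

def first_field(text):
    """Return the first field of `text`, unquoted, or None if there is none."""
    match = FIELD.match(text)
    if match is None:
        return None
    # Only one of the two groups takes part in a successful match.
    quoted = match.group("quoted")
    return quoted if quoted is not None else match.group("bare")

print(first_field('  "hello world" rest'))  # hello world
print(first_field("  plain rest"))          # plain
```

The extra selection step at the end is exactly what binding both alternatives to the same variable inside the pattern would make unnecessary.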

Uri Guttman, in possibly the first ever Perl 6 golf post (he denies it's really golf), suggested a way of shortening the pattern further, and Larry trumped him by shortening it to my $field = /<shellword>/. That led Nicholas Clark to wonder about one-liners along the lines of my $data = /<xml>/, and whether the Perl regex engine would be faster than using expat. Nick also wondered if Perl 6 would give shorter golf solutions than Perl 5.

There was quite a bit more in this thread, but my summarizing skills are failing.

http://makeashorterlink.com/ -- Thread starts here, it's jolly good.

Does ::: constrain the pattern engine implementation

Deven T. Corzine wondered if the presence of ::: and friends in the pattern language meant we'd constrained the possible implementation of the pattern engine before we'd started, and if we could implement something that didn't do backtracking. General opinion seemed to be that we couldn't avoid backtracking, but Deven wondered if it wouldn't be possible to use a non backtracking implementation for some special cases. The consensus appears to be 'If you build it, and it's faster, we will come'.

http://makeashorterlink.com/

Backtracking into { code }

Ken Fox wondered if


  rule expr1 { <term> { /@operators/ or fail } <term> };

and


  rule expr2 { <term> @operators <term> }

were equivalent. Damian thought not, and added that expr1 should probably be rewritten as rule expr1 { <term> { m:cont/@operators/ or fail } <term> }. Larry says that we will backtrack into subrules.

Again, the whole thread is worth reading if you're interested in the rules/patterns/regex engine.

http://makeashorterlink.com/

Prebinding questions

Philip Hellyer asked a bunch of questions sparked by Damian's talk about Perl 6 to London.pm (at the Conway Hall no less, London.pm knows how to find appropriate venues). Damian answered them.

http://makeashorterlink.com/

@array = %hash

Nicholas Clark noted that @array = %hash, for a hash of n elements, would return an array of n pairs. The Perl5 style, returning a list of 2n elements with keys and values interleaved, would be @array = %hash.kv. All this led Nick to wonder what happened in the other direction. Obviously %hash = @list_of_pairs was going to do the right thing, but what about %hash = @kv_array? And, more worryingly, what about


   %hash = ("Something", "mixing", pairs => "and", "scalars");

It turns out that the @kv_array case will Just Work, and the last case will cause discussion to break out. Damian thought that the example above would throw an error because there are 5 elements in the list. Another school thought that, because PAIRs are first class objects in Perl 6, the code should work, with one of the keys of the hash being the pair (pairs => 'and'). Damian thought not, and discussion ensued. I'm afraid I'm not entirely well qualified to summarize this thread as I'm one of those who thinks Damian is wrong, or at least, not yet sufficiently correct. However, for now, the state of the design is such that pairs are 'special' and it takes an effort of compile time will to use them as keys (or values come to that) in a hash.
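
The conversions under discussion can be modelled loosely in Python, with tuples standing in for pairs and dicts for hashes. This is an analogy under those assumptions rather than a statement of Perl 6 behaviour, and the function names are invented for the sketch.

```python
def hash_to_pairs(h):
    """Like `@array = %hash`: one (key, value) pair per entry."""
    return list(h.items())

def hash_to_kv(h):
    """Like `@array = %hash.kv`: keys and values interleaved."""
    flat = []
    for key, value in h.items():
        flat.extend((key, value))
    return flat

def kv_to_hash(flat):
    """Like `%hash = @kv_array`: consume the list two elements at a time."""
    if len(flat) % 2:
        raise ValueError("odd number of elements in key/value list")
    it = iter(flat)
    return dict(zip(it, it))

h = {"a": 1, "b": 2}
print(hash_to_pairs(h))           # [('a', 1), ('b', 2)]
print(hash_to_kv(h))              # ['a', 1, 'b', 2]
print(kv_to_hash(hash_to_kv(h)))  # {'a': 1, 'b': 2}
```

In this model the two schools of thought correspond to two inputs. If the pair is flattened first, the list has five elements and `kv_to_hash` raises an error, which matches Damian's reading. If the pair is kept as a single object, `kv_to_hash(["Something", "mixing", ("pairs", "and"), "scalars"])` succeeds and the tuple becomes a key, which matches the other reading.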

http://makeashorterlink.com/

Regex stuff

Choosing a deliberately vague subject line in an effort to give the summarizer a headache, Piers Cawley asked a question about binding to numeric hypotheticals. It turns out that binding to a numeric hypothetical variable in a regular expression is special cased (resetting the numeric 'counter') and even mentioned in the appropriate apocalypse, and the problem that Piers thought he saw doesn't actually exist.

http://makeashorterlink.com/

Atomicness and \n

Aaron Sherman wondered what \n would be translated to in a Perl 6 pattern. Aaron proposed <[\x0a\x0d...]+>. Damian thought it was <[\x0a\x0d]>, and Ken Fox thought it would be something like \x0d \x0a | \x0d | \x0a. Personally I think it'll be [ \x0d \x0a | <[\x0a\x0d...]> ]. (I also believe that whoever came up with the idea of a two character end of line marker should be taken out and shot, but that's another story entirely.)
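
The practical difference between a character class and an alternation that tries the two-character sequence first is easy to demonstrate in an existing engine. The sketch below uses Python's re module as a stand-in; `NEWLINE` and `split_lines` are names chosen for this example only.

```python
import re

# Try the two-character CRLF sequence before either single character,
# otherwise "\r\n" would be counted as two separate line endings.
NEWLINE = re.compile(r"\r\n|\r|\n")

def split_lines(text):
    """Split on CRLF, lone CR, or lone LF."""
    return NEWLINE.split(text)

print(split_lines("unix\nmac\rdos\r\nend"))  # ['unix', 'mac', 'dos', 'end']

# A plain character class sees CRLF as two line endings, leaving an
# empty "line" between them.
print(re.split(r"[\r\n]", "dos\r\nend"))     # ['dos', '', 'end']
```

Here the class-only form yields a spurious empty line for CRLF input, while the alternation treats CRLF as one line ending.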

The news from London

On Thursday Damian managed to deliver his Perl 6 prospectus talk in about 3.5 hours. Okay, it sounds like a long time, but Damian told us that, on average, the talk runs to 5 hours. The 'one question' rule introduced by London.pm seems to have worked well. As predicted, a whole load of lights went on over people's heads as they started to 'get' how the whole thing hung together.

Anyway, the one question rule led to a bunch of questions about Perl 6 cropping up on the London.pm mailing list that haven't (yet) been cross posted to perl6-language. One interesting question concerned what happens when bare/if/while/do/when/etc blocks have return in them. Damian answered that they throw a 'return' control exception, which the control structure catches and re-throws. Piers wondered how you'd go about writing your own looping construct and made a couple of proposals about the treatment of control exceptions.
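
The control exception idea can be imitated in Python, which gives a feel for how a user-defined looping construct might coexist with 'return'. This is a sketch under assumptions of my own, not a description of Perl 6: the names `ReturnControl`, `do_return`, `my_while`, and `call_sub` are invented, and the loop here simply lets the exception pass through rather than catching and re-throwing it, which has the same visible effect.

```python
class ReturnControl(Exception):
    """Control exception carrying the value of a 'return'."""
    def __init__(self, value):
        super().__init__(value)
        self.value = value

def do_return(value):
    raise ReturnControl(value)

def my_while(test, body):
    """A user-defined loop: it does not catch ReturnControl, so a
    'return' raised inside `body` passes straight through the loop."""
    while test():
        body()

def call_sub(sub, *args):
    """The subroutine boundary is where the control exception stops."""
    try:
        sub(*args)
    except ReturnControl as control:
        return control.value
    return None

def first_even(numbers):
    items = iter(numbers)
    state = {"current": None}

    def test():
        state["current"] = next(items, None)
        return state["current"] is not None

    def body():
        if state["current"] % 2 == 0:
            do_return(state["current"])

    my_while(test, body)

print(call_sub(first_even, [3, 5, 8, 9]))  # 8
print(call_sub(first_even, [3, 5]))        # None
```

The point of the exercise is that `my_while` needs no special knowledge of 'return' at all, as long as it leaves the control exception alone.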

Quote of the London.pm threads: 'yImoj Perl javDIch!' which, as everyone knows, is the Klingon for 'Be Perl6. Now!'.

http://makeashorterlink.com/

http://makeashorterlink.com/

Call for assistance

Next week is the Zurich Perl 6 mini conference, and I won't be there. Soon after that is YAPC::Europe 2002, and I won't be there either. This week, Damian is in Belfast and will be talking to Belfast.pm about stuff and, guess what, I'm not there either. If anyone would like to send me some Perl 6 related reports for the summary from any of these events I would be enormously grateful. Thanks in advance.

Squashing a myth

You may have come across the 'Damian Conway is looking for graduate students' meme. I know I have, and I repeated it at one of Damian's talks in London. Guess what, it's not true (as I should have realised). Damian is no longer associated with any institute of higher learning and is not looking for graduate students. Kindly readjust your memeplexes.

In Brief

Pete Sergeant pointed us all at some work he'd done toward building an 'operations dictionary' for Parrot. Available at http://grou.ch/parrot/index.cgi.

Over the course of a surprisingly long thread, Steve Lambert, Markus Laire, Leopold Toetsch and a few of the usual suspects got IMCC and Perl6 compiling properly on Windows.

Bryan C. Warnock offered a large patch to parrot's glossary.pod; a document worth reading. Bryan also PODified byteorder.dev so it would show up correctly on http://www.parrotcode.org/docs/. Bryan also added POD title blocks to a pile of PODs. Kudos to the docmonster.

Daniel Grunblatt added conditional breakpoints and watchpoints to the Parrot Debugger. Steve Fink also waved a magic wand over the debugger and fixed up a bunch of problems. Well done chaps.

Andy Dougherty wondered about how up to date MANIFEST is. There were 497 files listed in MANIFEST, but a fresh CVS checkout runs to 2215 files. Daniel Grunblatt thinks he's added all the important files to MANIFEST.

Sean O'Rourke made some big changes to IMCC; newlines within statements are no longer allowed, and there have been some 'significant changes to register allocation and spilling.'

Bryan C. Warnock wondered about the file permissioning inconsistency in the parrot source tree. Andy Dougherty pointed out that it wasn't (quite) as inconsistent as it looked; build scripts should be left without execute bits so that makefile authors would use the more portable $(PERL) script.pl, which makes no assumptions about the whereabouts of perl.

Peter Gibbs offered a huge patch, merging his 'African Grey' parrot tweaks with the CVS parrot. It's not been applied in CVS, but http://makeashorterlink.com/ has the details.

Jason Gloudon wondered about temporary PMCs used in opcodes like ADD Px, Py, Pz, and wondered what set_pmc should do in the simple case. Sean O'Rourke pointed out that often one wouldn't need to create transient PMCs because one could use the specialist string and number registers to hold the temporaries. Sean voted for 'morph' in the simple set_pmc case.

Jonathan Sillito contributed a patch allowing for hierarchical lookup of lexical variables in scratchpads, and which makes subroutines into real closures (by virtue of the hierarchical lookup of lexical variables...). Warnock's Dilemma applies...
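
For anyone unfamiliar with why hierarchical lookup yields closures, here is a small Python model. It is not Jonathan's patch or Parrot's data structure; the `Scratchpad` class and `make_counter` function are hypothetical names used only to show the mechanism.

```python
class Scratchpad:
    """A lexical scope that falls back to its enclosing scope on lookup."""
    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def declare(self, name, value):
        self.vars[name] = value

    def _find(self, name):
        # Walk outwards until a pad that declares `name` is found.
        pad = self
        while pad is not None:
            if name in pad.vars:
                return pad
            pad = pad.parent
        raise NameError(name)

    def lookup(self, name):
        return self._find(name).vars[name]

    def assign(self, name, value):
        self._find(name).vars[name] = value

def make_counter(outer):
    """Return a 'sub' that captures a fresh pad chained to `outer`."""
    pad = Scratchpad(parent=outer)
    pad.declare("count", 0)

    def counter():
        pad.assign("count", pad.lookup("count") + pad.lookup("step"))
        return pad.lookup("count")
    return counter

top = Scratchpad()
top.declare("step", 2)
tick = make_counter(top)
print(tick(), tick())  # 2 4
```

Each counter keeps its own `count` in its own pad, while `step` is found by walking up to the enclosing pad; that combination is what makes the subroutine a closure.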

Andy Dougherty did some cleaning up of the build process, removing lint and cruft from the Makefiles and Configure.pl. Hmmm... I wonder how far we are from our goal of 'compile a miniparrot, use that to execute the more advanced config script, and then compile the full parrot', and removing the dependency on Perl to build parrot...

Now that we have ICU in the repository Angel Faus wondered how we should deal with differently encoded strings. No answers yet.

Andy Dougherty wondered if the time had come to make IMCC build with plain old yacc and lex instead of depending on bison and flex. Leopold Toetsch reckoned that that would be a good idea and asked Andy for his patches.

Leopold found a bug in mul, div, mod, sub, concat when operating on PMCs. So he fixed it. I'm not sure if the patch has been applied.

Jürgen Bömmels got fed up with MANIFEST not being accurate. So he wrote an automated test which compares the MANIFEST with CVS/Entries and complains if they don't match. I don't know if it's been checked in yet, but when it is, let's hope the committer remembers to update the manifest.
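
A check of this kind boils down to a set comparison. The sketch below is not Jürgen's test; it assumes a MANIFEST with one path per line, optionally followed by whitespace and a comment, and it takes the list of tracked files as a plain argument instead of parsing CVS/Entries. The function names and sample data are hypothetical.

```python
def read_manifest(lines):
    """Return the set of paths named in MANIFEST-style lines.

    Assumes one path per line, optionally followed by whitespace and a
    comment; blank lines and lines starting with '#' are skipped.
    """
    paths = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            paths.add(line.split()[0])
    return paths

def compare_manifest(manifest_lines, tracked_files):
    """Return (missing_from_manifest, missing_from_tree), each sorted."""
    listed = read_manifest(manifest_lines)
    tracked = set(tracked_files)
    return sorted(tracked - listed), sorted(listed - tracked)

# Hypothetical sample data.
manifest = ["MANIFEST", "Configure.pl  the configure script", "", "core.ops"]
tracked = ["MANIFEST", "Configure.pl", "debug.ops"]
print(compare_manifest(manifest, tracked))  # (['debug.ops'], ['core.ops'])
```

A test built on this would fail whenever either list in the result is non-empty.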

Steve Fink has a patch which stops Configure.pl touching files unless they've actually changed, and wondered if it would be of any use to people not actually working on Configure. Nicholas Clark took the opportunity to point us at http://ccache.samba.org/.
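
The reason this matters is that make decides what to rebuild from file timestamps, so rewriting a generated file with identical content forces needless recompilation. A minimal sketch of the write-only-if-changed technique follows, in Python and not taken from Steve's patch; `write_if_changed` and the sample file name are assumptions of this example.

```python
import os
import tempfile

def write_if_changed(path, content):
    """Write `content` to `path` only if it differs; return True if written."""
    try:
        with open(path, "r", encoding="utf-8", newline="") as handle:
            if handle.read() == content:
                return False  # identical: leave the timestamp alone
    except FileNotFoundError:
        pass
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(content)
    return True

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "config.h")
    print(write_if_changed(target, "#define X 1\n"))  # True: file created
    print(write_if_changed(target, "#define X 1\n"))  # False: unchanged
    print(write_if_changed(target, "#define X 2\n"))  # True: content differs
```

Comparing content costs one extra read per generated file, which is usually far cheaper than the rebuild it avoids.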

Steve Fink also submitted a flurry of clean up patches, many of which look like they'll get applied once 0.0.8 has been released.

Dan noted that Hashes are an order of magnitude slower to GC than, say, PerlStrings, and two orders of magnitude slower than PerlInts. Which isn't good. Dan wondered if there might be a way to get less GC overhead. Steve Fink offered a few suggestions.

':/::/:::/<commit> makes backtrack fail current atom/group/rule/match.' -- Markus Laire summarizes the various backtracking assertions.

Who's who in Perl 6

Who are you?
Miko (pronounced "Mike-Oh") O'Sullivan, Father of Melody, Husband of Starflower, Follower of Jesus, Author of The Idocs Guide to HTML
What do you do for/with Perl 6?
Participate in perl6-language, generally by suggesting small features that I think would make life a lot easier for programmers.
Where are you coming from?
I come from the perspective that a) I want things to just work without a lot of startup effort on my part b) I believe that things can in fact actually do that c) Perl does d) Perl can do so even more e) I'm pretty normal in these feelings. Oh, also Blacksburg, VA, USA.
When do you think Perl 6 will be released?
Put me down for June 6, 2003, 1:37:03 am EST.
Why are you doing this?
I'm sort of like a Lab (i.e. the dog) that instinctively jumps into a lake: I just must do it.
You have 5 words. Describe yourself.
Impatient.
Do you have anything to declare?
I declare a lot of public static constants.

Acknowledgements

Thanks to Gill for putting up with me last night while I sat and pretty much ignored everything as I worked on this summary. For some reason, perl6-language threads are much harder to summarize well; I always take longer to write a summary when that list is busy.

As usual, if you liked this summary, please send money to the Perl Foundation at http://donate.perl-foundation.org/ to support the ongoing development of Perl.

This week's summary was funded by the O'Reilly Network, who now pay the publication fee for the summaries directly to the Perl Foundation. So, a big thank you to them.
