January 2004 Archives

How We Wrote the Template Toolkit Book ...

There are a number of tools available for writing books. Many people would immediately reach for their favorite word processor, but having written one book using Microsoft Word I'm very unlikely to repeat the experience. Darren Chamberlain, Andy Wardley, and I are all Perl hackers, so when we got together to write Perl Template Toolkit, it didn't take us long to agree that we wanted to write it using POD (Plain Old Documentation).

Of course, any chosen format has its pros and cons. With POD we had all the advantages of working with plain text files and all of the existing POD tools were available to convert our text into various other formats, but there were also some disadvantages. These largely stem from the way that books (especially technical books) are written. Authors rarely write the chapters in the order in which they are published in the finished book. In fact, it's very common for the chapters to be rearranged a few times before the book is published.

Now this poses a problem with internal references. It's all very well saying "see Chapter Six for further details", but when the book is rearranged and Chapter Six becomes Chapter Four, all of these references are broken. Most word processors will allow you to insert these references as "tags" that get expanded (correctly) as the document is printed. POD and emacs don't support this functionality.

Another common problem with technical books is the discrepancy between the code listings in the book and the code that actually got run to produce the output shown. It's easily done. You create an example program and cut-and-paste the code into the document. You then find a subtle bug in the code and fix it in the version that you're running but forget to fix it in the book. What would be really useful would be if you could just use tags saying "insert this program file here" and even "insert the output of running the program here". That's functionality that no word processor offers.

Of course, these shortcomings would be simple to solve if you had a powerful templating system at the ready. Luckily Andy, Darren, and I had the Template Toolkit (TT) handy.

The Book Templates

We produced a series of templates that controlled the book's structure and a Perl program that pulled together each chapter into a single POD file. This program was very similar to the tpage program that comes with TT, but was specialized for our requirements.

Separating Code from Code

There was one problem we had to address very early on with our book templates. This was the problem of listing TT code within a TT template. We needed a way to distinguish the template directives we were using to produce the book from the template directives we were demonstrating in the book.

Of course TT provides a simple way to achieve this. You can define the characters that TT uses to recognize template directives. By default it looks for [% ... %], but there are a number of predefined groups of tags that you can turn on using the TAGS directive. All of our book templates started with the line:

  [% TAGS star %]

When it sees this directive, the TT parser starts to look for template directives that are delimited with [* ... *]. The default delimiters ([% ... %]) are treated as plain text and passed through unaltered. Therefore, by using this directive we can use [% ... %] in our example code and [* ... *] for the template directives that we wanted TT to process.
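The effect of switching delimiters can be sketched in plain Perl. This is a toy stand-in, not the TT parser: the expand_star_tags function is hypothetical, and it only substitutes simple variable names written between [* and *], passing everything else (including [% ... %]) through untouched.

```perl
use strict;
use warnings;

# Toy stand-in for a template engine (an assumption for illustration only):
# expands [* name *] from %$vars and leaves every [% ... %] sequence alone.
sub expand_star_tags {
    my ($text, $vars) = @_;
    $text =~ s{\[\*\s*(\w+)\s*\*\]}{
        exists $vars->{$1} ? $vars->{$1} : die "undefined variable: $1\n"
    }ge;
    return $text;
}

my $source = "The [* TT *] prints a variable with [% GET foo %].\n";
print expand_star_tags($source, { TT => 'Template Toolkit' });
```

The star-tagged directive is expanded while the percent-tagged one survives as literal text, which is what lets a book about TT show TT code.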

Of course, the page where we introduced the TAGS directive and gave examples of its usage was still a little complex.

In the rest of this article, I'll go back to using the [% ... %] style of tags.

Useful Blocks and Macros

We defined a number of useful blocks and macros that expanded to useful phrases that would be used throughout the book. For example:

  [% TT = 'Template Toolkit';

     versions = {
       stable = '2.10'
       developer = '2.10a'
     } %]

The first of these must have saved each of us many hours of typing time and the second gave us an easy way to keep the text up-to-date if Andy released a new version of TT while we were writing the book. A template using these variables might look like this:

  The current stable version of the [% TT %] is [% versions.stable %]

Keeping Track of Chapters

We used a slightly more complex set of variables and macros to solve the problem of keeping chapter references consistent. First we defined an array that contained details of the chapters (in the current order):

  Chapters = [
    {  name  = 'intro'
       title = "Introduction to the Template Toolkit"
    }
    {  name  = 'web'
       title = "A Simple Web Site"
    }
    {  name  = 'language'
       title = "The Template Language"
    }
    {  name  = 'directives'
       title = "Template Directives"
    }
    {  name  = 'filters'
       title = "Filters"
    }
    {  name  = 'plugins'
       title = "Plugins"
    }
    ... etc ...
   ]

Each entry in this array is a hash with two keys. The name is the name of the directory in our source tree that contains that chapter's files and the title is the human-readable name of the chapter.

The next step is to convert this into a hash so that we can look up the details of a chapter when given its symbolic name.

    FOREACH c = Chapters;
      c.number = loop.count;
      Chapter.${c.name} = c;
    END;

Notice that we are adding a new key to the hash that describes a chapter. We use the loop.count variable to set the chapter number. This means that we can reorder our original Chapters array and the chapter numbers in the Chapter hash will always remain accurate.

Using this hash, it's now simple to create a macro that lets us reference chapters. It looks like this:

  MACRO chref(id) BLOCK;
    THROW chapter "invalid chapter id: $id"
      UNLESS (c = Chapter.$id);
    seen = global.chapter.$id;
    global.chapter.$id = 1;
    seen ? "Chapter $c.number"
         : "Chapter $c.number, I<$c.title>";
  END;

The macro takes one argument, which is the id of the chapter (this is the unique name from the original array). If this chapter doesn't exist in the Chapter hash then the macro throws an error. If the chapter exists in the hash then the macro displays a reference to the chapter. Notice that we remember when we have seen a particular chapter (using global.chapter.$id) -- this is because O'Reilly's style guide says that a chapter is referenced differently the first time it is mentioned in another chapter. The first time, it is referenced as "Chapter 2, A Simple Web Site", and on subsequent references it is simply called "Chapter 2".

So with this mechanism in place, we can have templates that say things like this:

  Plugins are covered in more detail in [% chref('plugins') %].

And TT will convert that to:

  Plugins are covered in more detail in Chapter 6, I<Plugins>.

And if we subsequently reorder the book again, the chapter number will be replaced with the new correct number.
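For readers who think more easily in Perl than in TT, the same bookkeeping can be sketched as follows. This is a rough plain-Perl equivalent rather than the code we used: the shortened chapter list, the %seen hash (standing in for global.chapter), and the Perl chref function are assumptions made for illustration.

```perl
use strict;
use warnings;

# Chapters in their current order; reordering this list renumbers everything.
my @chapters = (
    { name => 'intro',   title => 'Introduction to the Template Toolkit' },
    { name => 'web',     title => 'A Simple Web Site' },
    { name => 'plugins', title => 'Plugins' },
);

# Index by symbolic name, deriving each number from the list position.
my %chapter;
for my $i (0 .. $#chapters) {
    my $c = $chapters[$i];
    $c->{number} = $i + 1;
    $chapter{ $c->{name} } = $c;
}

my %seen;    # plays the role of global.chapter.$id

sub chref {
    my ($id) = @_;
    my $c = $chapter{$id} or die "invalid chapter id: $id\n";
    # Full form on first mention, short form afterwards.
    return $seen{$id}++
        ? "Chapter $c->{number}"
        : "Chapter $c->{number}, I<$c->{title}>";
}

print chref('web'), "\n";    # first mention: number and title
print chref('web'), "\n";    # later mention: number only
```

Because the numbers are derived from positions in the list, moving an entry is the only change needed when the book is reordered.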

Running Example Code

The other problem I mentioned above is that of ensuring that sample code and its output remain in step. The solution to this problem is a great example of the power of TT.

The macro that inserts an example piece of code looks like this:

  MACRO example(file, title) BLOCK;
    global.example = global.example + 1;
    INCLUDE example
      title = title or "F<$file>"
      id    = "$chapter.id/example/$file"
      file  = "example/$file"
      n     = global.example;
    global.exref.$file = global.example;
  END;

The macro takes two arguments, the name of the file containing the example code and (optionally) a title for the example. If the title is omitted then the filename is used in its place. All of the examples in a particular chapter are numbered sequentially and the global.example variable holds the last used value, which we increment. The macro then works out the path of the example file (the structure of our directory tree is very strict) and INCLUDEs a template called example, passing it various information about the example file. After processing the example, we store the number that is associated with this example by storing it in the hash global.exref.$file.

The example template looks like this:

  [% IF publishing -%]
  =begin example [% title %]

      Z<[% id %]>[% INSERT $file FILTER indent(4) +%]

  =end
  [% ELSE -%]
  B<Example [% n %]: [% title %]>

  [% INSERT $file FILTER indent(4) +%]

  [% END -%]

This template looks at a global flag called publishing, which determines if we are processing this file for submission to O'Reilly or just for our own internal use. The Z< ... > POD escape is an O'Reilly extension used to identify the destination of a link anchor (we'll see the link itself later on). Having worked out how to label the example, the template simply inserts it and indents it by four spaces.

We use this within a chapter by adding code like [% example('xpath', 'Processing XML with XPath') %] to the document. That will be expanded to something like "Example 2: Processing XML with XPath", followed by the source of the example file, xpath.

All of that gets the example code into the document. We now have to do two other things. We need to be able to reference the code from the text of the chapter ('As example 3 demonstrates...'), and we also need to include the results of running the code.

For the first of these there is a macro called exref, which is shown below:

  MACRO exref(file) BLOCK;
    # may be a forward reference to next example
    SET n = global.example + 1
      UNLESS (n = global.exref.$file);
    INCLUDE exref
      id    = "$chapter.id/example/$file";
  END;

This works in conjunction with another template, also called exref.

  [% IF publishing -%]
  A<[% id %]>
  [%- ELSE -%]
  example [% n %]
  [%- END -%]

The clever thing about this is that you can use it before you have included the example code. So you can do things like:

  This is demonstrated in [% exref('xpath') %].

  [% example('xpath', 'Processing XML with XPath') %]

As long as you only look at a maximum of one example ahead, it still works. Notice that the A< ... > POD escape is another O'Reilly extension that marks a link anchor. So within the O'Reilly publishing system it's the A<foo> and the associated Z<foo> that make the link between the reference and the actual example code.
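The numbering logic behind example and exref, including the one-step look-ahead, can be sketched in plain Perl. This is an illustrative approximation, not our TT macros: $example_count and %exref stand in for global.example and global.exref, and the functions return strings instead of including templates.

```perl
use strict;
use warnings;

my $example_count = 0;   # stands in for global.example
my %exref;               # stands in for global.exref

# Number the next example and remember which number the file received.
sub example {
    my ($file, $title) = @_;
    $title = "F<$file>" unless defined $title;   # the title is optional
    $exref{$file} = ++$example_count;
    return "Example $example_count: $title";
}

# Refer to an example by file name. An unknown file is assumed to be the
# very next example, which is why only one step of look-ahead is safe.
sub exref {
    my ($file) = @_;
    my $n = exists $exref{$file} ? $exref{$file} : $example_count + 1;
    return "example $n";
}

print "This is demonstrated in ", exref('xpath'), ".\n";
print example('xpath', 'Processing XML with XPath'), "\n";
```

A reference made two or more examples ahead would get the wrong number, because the guess is always "the current count plus one".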

The final thing we need is to be able to run the example code and insert the output into the document. For this we defined a macro called output.

  MACRO output(file) BLOCK;
    n = global.example;
    "B<Output of example $n:>\n\n";
    INCLUDE "example/$file" FILTER indent(4);
  END;

This is pretty simple. The macro is passed the name of the example file. It assumes that this is the most recent example included in the document so it gets the example number from global.example. It then displays a header and INCLUDEs the file. Notice that the major difference between example and output is that example uses INSERT to just insert the file's contents, whereas output uses INCLUDE, which loads the file and processes it.

With all of these macros and templates, we can now have example code in our document and be sure that the output we show really reflects the output that you would get by running that code. So we can put something like this in the document:

  The use of GET and SET is demonstrated in [% exref('get_set') %].

  [% example('get_set', 'GET and SET') %]

  [% output('get_set') %]

And that will be expanded to the following.

  The use of GET and SET is demonstrated in example 1.

  B<Example 1: GET and SET>

      [% SET foo = 'bar' -%]
      The variable foo is set to "[% GET foo %]".

  B<Output of example 1:>

      The variable foo is set to "bar".

As another bonus, all of the example code is neatly packaged away in individual files that can easily be made into a tarball for distribution from the book's web site.

Other Templates, Blocks, and Macros

Once we started creating these timesaving templates, we found a huge number of areas where we could make our lives easier. We had macros that inserted references to other books in a standard manner, macros for inserting figures and screenshots, as well as templates that ensured that all our chapters had the same standard structure and warned us if any of the necessary sections were missing. I'm convinced that the TT templates we wrote for the project saved us all a tremendous amount of time that would have otherwise been spent organizing and reorganizing the work of the three authors. I would really recommend a similar approach to other authors.

The Template Toolkit is often seen as a tool for building web sites, but we have successfully demonstrated one more non-Web area where the Template Toolkit excels.

This week on Perl 6, week ending 2004-01-25

Welcome to the first summary from my new home in Gateshead. The same old wibble, with a different view from its window and fewer trips to London. Right, time to see what's been going on in perl6-internals this week.

Global labels in IMCC

The cryptically named TOGoS wondered how to get the address of a label in a different IMCC compilation unit. According to Dan there's no way to do that, and you didn't want to do that anyway.

http://groups.google.com/groups

Dan's threads proposal

After a few weeks of everyone else's proposals, Dan started to outline the design of Parrot's threading capabilities. He started by defining his terms (a useful thing to do in a field where there seem to be multiple competing definitions of various terms) and guaranteeing that user code wouldn't crash the interpreter (subject to the assumption that system level memory allocation was thread safe) before getting into the meat of his proposal. Which you're probably best reading for yourself; it's a long document but there's very little flab and any attempt of mine to summarize it would probably end up being at least as long as and a good deal less lucid than the original.

Of course, this sparked a ton of discussion, generally positive, as people asked for various clarifications and made suggestions. Gordon Henriksen pointed out a nasty race condition that means that the garbage collector can't be made as thread safe as Dan had hoped.

Summarizer Barbie says "Threads are hard!"

On Thursday, Dan posted a last call for comments and objections before he went on to the detailed design. This time there were some objections, but I don't think any of 'em are going to stop Dan.

http://groups.google.com/groups

http://groups.google.com/groups

Vtables organization

Last week Dan had outlined an approach to organizing PMC vtables using a chaining approach; this week saw the discussion of that proposal with Benjamin K. Stuhl asking the hard questions.

http://groups.google.com/groups

Benchmark suite

Matt Fowles suggested that it might make sense to create a canonical suite of benchmarks to exercise Parrot well. His idea being that, if we have a standard suite of Parrot benchmarks, then potential performance affecting changes could be tested against that, rather than having custom benchmarks rolled each time. Luke Palmer pointed to examples/benchmarks and noted that it's hard to create benchmarks that test everything. However, he hoped that any good benchmark that gets posted to the list would get added to this suite, along with some documentation describing what is being tested.

http://groups.google.com/groups

Number formatting

Dan did some more designing, this time mandating that Parrot will eventually adopt ICU's formatting template for numeric templates but, to start with, we'll be rolling our own. The new op will be format Sx, [INP]y, [SP]z.

http://groups.google.com/groups

Base string transforms

Dan announced that he would be adding upcase, downcase, titlecase and to_chartype to the various chartype vtables. He also noted that he'd like to get some alternative chartypes and encodings into Parrot as soon as possible to make sure we can actually handle things without having to use Unicode all the time.

http://groups.google.com/groups

Calling conventions in IMCC

Will Coleda had some problems with IMCC's handling of the Parrot calling conventions when he found that code that worked a couple of months ago had stopped working in the current Parrot. (A month is a *very* long time in Parrot development though.) The problem took a fair bit of tracking down and I'm not sure it's entirely fixed yet; Will had reached the point where the code would compile, but it still wouldn't actually run.

http://groups.google.com/groups

Steve Fink's obnoxious test case

Steve Fink posted an obnoxious test case that generated memory corruption. The test case is obnoxious because it's 56KB of IMCC source code, and Steve had been unable to reduce it. This didn't discourage Leo Tötsch though, who set about tracking the bug to its lair. It's not fixed yet, but with the Patchmonster on the case it can only be a matter of time.

There were several other GC related issues that cropped up over the week; I wonder if they're all aspects of a single lurking bug.

http://groups.google.com/groups

IMCC returning ints

Steve Fink also found a problem with IMCC failing to properly return integers from unprototyped routines and posted an appropriate patch to the test suite. It turns out that the problem is that IMCC doesn't quite implement the full Parrot Calling Conventions, especially the return convention, but it's getting there.

http://groups.google.com/groups

The costs of sharing

Leo Tötsch posted a test program and some results for timing the difference between using shared and unshared PMCs. The shared versions are (not surprisingly) slower than the unshared ones; the question is whether the difference between the two can be improved. Hopefully the benchmark will get checked into examples/benchmarks as suggested by Luke earlier.

http://groups.google.com/groups

An array of array types

Dan noted that we have "a pile of different array classes with fuzzy requirements and specified behaviours which sort of inherit from each other except when they don't." He suggested that the time had come to work out what we actually want in the way of array classes, compare our requirements with what we have, and then to do something about making what was available match what was required. I'm not sure that the resulting discussion has finalized the set of array types needed, but it's getting there. (Does anyone else think 'FixedMixedArray' is awfully clumsy as names go?).

http://groups.google.com/groups

Remember to nag Robert Spier

Robert Spier announced that repairing the web accessible TODO list was on his personal TODO list and asked to be nagged about it periodically.

Robert, remember you need to fix the web accessible TODO list.

http://groups.google.com/groups

Churchill's parrot still swearing

Effortlessly Godwinning himself, Uri Guttman pointed to a press release which stated that Winston Churchill's parrot, Charlie, is now 104 years old and can still be coaxed into squawking certain inflammatory remarks which had apparently made it rather unsuitable for keeping at its owner's pet shop due to its habit of swearing at children.

http://groups.google.com/groups

Updated documentation in Perl scripts

Michael Scott continued his sterling work of updating and generally improving Parrot's documentation. This week his attention fell upon the Perl scripts found in build_tools, classes and tools/dev. Top man that he is, he's currently working on the documentation embedded in C code.

http://groups.google.com/groups

http://groups.google.com/groups

Open issue review

Robert Spier (don't forget the web accessible todo list Robert) posted a list of the 177 currently outstanding Parrot issues in the RT system and asked for volunteers to go through them to help weed out those issues that were no longer current. So people did. Which is nice.

http://groups.google.com/groups

How to subclass dynamic PMCs

Michal Wallace is trying to make a dynamically loaded PMC that subclasses another dynamically loaded PMC and he can't work out how to do it. Leo Tötsch had the answer.

http://groups.google.com/groups

How does Parrot handle High Level Language eval

Nigel Sandever wondered how Parrot would handle eval opcodes for multiple different languages. Leo pointed him at the compile op, which (while it isn't fully implemented yet) will address this issue. Dan noted that it's currently working for PIR and PASM code but that it should be able to work eventually with any compiler that generates standard bytecode.

http://groups.google.com/groups

Signals and events

Leo is working on turning OS level signals into Parrot level events, and he's not having an easy time of it. He posted a summary of the issues and asked for comments. Discussion continues.

http://groups.google.com/groups

Threading again

Gordon Henriksen worries that Parrot's current architecture is actively thread hostile. He also accepted that trying to change it now wasn't really possible. So he outlined various ways in which the need for locking could be reduced, which should help speed things up. The big problem, as Gordon sees it, is that so many Parrot data structures are mutable, and mutable data structures require locks. And having PMCs that can morph from one type to another is... well, Gordon claims that morph must die, though he later modified this claim. He and Leo batted this back and forth for a while; I'm not sure either side is convinced.

http://groups.google.com/groups

Embedding vs. extending interface types

Mattia Barbon noted that the embedding and extending interfaces were still using different names: Parrot_Interp and Parrot_INTERP. He wondered which was correct. It turns out that nobody's quite sure, but the person who can make the decision -- Dan -- was en route to Copenhagen when this came up, so there's no answer yet.

http://groups.google.com/groups

Meanwhile in perl6-language

Semantics of Vector operations

Determined to test everyone's Unicode readiness, Luke Palmer kicked off a discussion of the semantics of [1,2,3] »+« [4,5,6]. At first glance it looks like the result should be [5,7,9], but Luke argued that actually, the code was trying to add two lists, each containing a single scalar, that just happened to be listrefs. Larry pointed out that "Doing what you expect at first glance is also called 'not violating the principle of least surprise'", before going on to surprise us all with 'lopsided' vector ops, which would allow the programmer to specify when a value was expected to be treated as a scalar:

    $a   »+« $b     # Treat $a and $b as lists
    $x    +« $y     # Treat $x as a scalar and $y as a list
          -« @bar   # Return a list of every element of @bar, negated
    @foo »+  @bar   # Add the length of @bar to every element of @foo

Then he scared me with @foo »+= @foo. He noted that it might take some getting used to, but that it helped if you pronounce » and « as 'each'. Austin Hastings didn't like it (from a syntax highlighting point of view), but he appeared to be outvoted. Larry pointed out that «» etc were the least of a syntax highlighter's worries given that any use, eval or operator declaration had the potential to morph any subsequent syntax. Piers Cawley thought that truly accurate syntax highlighting would have to be done in an image based IDE implemented in Perl because such an editor would always know what rules were in scope for a given chunk of code. A. Pagaltzis thought that this would definitely increment the Smalltalkometer for Perl 6.

As discussion and exploration of this idea continued it became apparent that people seem to like this particular weirding of the language, and it certainly allows the programmer to disambiguate things rather neatly. Luke even pointed out that this new approach allows for calling a method on a list of values: @list ».method, and to call a list of methods on a value: $value.« @methods.
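For anyone trying to picture what these forms would compute, here is a rough Perl 5 emulation using map. It is only a sketch of the semantics as described in the thread, which was still a proposal under discussion; the sample values and variable names are made up for illustration.

```perl
use strict;
use warnings;

my @foo = (1, 2, 3);
my @bar = (4, 5, 6);

# @foo »+« @bar : pair the elements up and add them
my @pairwise = map { $foo[$_] + $bar[$_] } 0 .. $#foo;

# 10 +« @bar : the left operand is a single scalar applied to each element
my @scalar_left = map { 10 + $_ } @bar;

# -« @bar : a unary operator applied to each element
my @negated = map { -$_ } @bar;

# @foo »+ @bar : the right side is taken as a scalar (the length of @bar)
my @plus_length = map { $_ + scalar @bar } @foo;

print "@pairwise\n";
print "@scalar_left\n";
print "@negated\n";
print "@plus_length\n";
```

The pointy end of each marker faces the operand that is treated as a list, which is what the four map expressions spell out by hand.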

Then the fun began. The issue is that » and « can also be written >> and << (but your POD processor hates you for it). This leads to ambiguities like >>+<<=<< (which are even harder to type in a POD escape) which can be parsed as »+<<=« or »+«=«. Larry wondered if the problem arose because of trying to make the << and >> alternatives look too similar to the Unicode glyphs.

You know, looking at that last paragraph I can see why people think Perl 6 is horribly scary. The thing is, you're not expected to use constructions like that in real world programs all the time; but when you're working out what a grammar should be you have to think of all the nasty edge cases to see where things break.

Anyway, such nastiness led to the possibility of introducing a 'whitespace eating' macro which would allow for the introduction of disambiguating whitespace. The front runner for this macro is _.

http://groups.google.com/groups

Comma operator

Remember a few months ago when there was some discussion of replacing the C style comma with some other glyph? If that were done, one of the consequences would be that

   @foo = 1,2,3;

would fill @foo with three elements instead of just the one as it does in Perl 5. Joe Gottman had a few questions about the implications of that, and wondered if Larry had actually ruled on it. Larry ruled that list construction would continue to require brackets (or, if you're American, parentheses) and went on to discuss some further implications of that.
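The Perl 5 behaviour being discussed is easy to check. This small sketch shows that assignment binds more tightly than the comma, so the unparenthesized form stores only the first value; the "void" warnings are switched off purely to keep the demonstration quiet.

```perl
use strict;
use warnings;
no warnings 'void';   # silence "Useless use of a constant" for the demo

my @foo;
@foo = 1, 2, 3;       # parsed as (@foo = 1), 2, 3
my @bar = (1, 2, 3);  # the parentheses make a three-element list

print scalar(@foo), "\n";   # prints 1
print scalar(@bar), "\n";   # prints 3
```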

http://groups.google.com/groups

Acknowledgements, Apologies, Announcements

Thankfully, this section's normal service is resumed this week. The only catch is, I can't think of anything to say.

However, if you find these summaries useful or enjoyable, please consider contributing to the Perl Foundation to help support the development of Perl. You might also like to send me feedback at p6summarizer@bofh.org.uk, or drop by my website (New! Improved! listening on port 80! Still sadly lacking in desperately new content!)

http://donate.perl-foundation.org/ -- The Perl Foundation

http://dev.perl.org/perl6/ -- Perl 6 Development site

http://www.bofh.org.uk/ -- My website, "Just a Summary"

Introducing Mac::Glue

Thanks to the popularity of Mac OS X, the new iBook, and the PowerBook G4, it's no longer uncool to talk about owning an Apple. Longtime Mac devotees have now been joined by longtime Unix devotees and pretty much anyone who wants computers to be shiny, and speakers at conferences such as the Open Source Convention are beginning to get used to looking down over a sea of Apple laptops.

One of the great features about Apple's Mac OS is its support for flexible inter-process communication (IPC), which Apple calls inter-application communication (IAC). One of the components of IAC is called Apple events, and allows applications to command each other to perform various tasks. On top of the raw Apple events layer, Apple has developed the Open Scripting Architecture, an architecture for scripting languages such as Apple's own AppleScript.

But this is perl.com, and we don't need inferior scripting languages! The Mac::Glue module provides OSA compatibility and allows us to talk to Mac applications with Perl code. Let's take a look at how to script Mac tools at a high level in Perl.

The Pre-History of Mac::Glue

In the beginning, there was Mac::AppleEvents. This module wrapped the raw Apple events API, with its cryptic four-character codes to describe applications and their capabilities, and its collection of awkward constants. You had to find out the four-character identifiers yourself, you had to manage and dispose of memory yourself, but at least it got you talking Apple events. Here's some Mac::AppleEvents code to open your System Folder in the Finder:


use Mac::AppleEvents;

my $evt = AEBuildAppleEvent('aevt', 'odoc', typeApplSignature, 
             'MACS', kAutoGenerateReturnID, kAnyTransactionID,
             "'----': obj{want:type(prop), from:'null'()," .
                "form:prop, seld:type(macs)}"
          );
my $rep = AESend($evt, kAEWaitReply);

AEDisposeDesc($evt);
AEDisposeDesc($rep);

Obviously this isn't putting the computer to its full use; in a high-level language like Perl, we shouldn't have to concern ourselves with clearing up descriptors when they're no longer in use, or providing low-level flags. We just want to send the message to the Finder. So along came Mac::AppleEvents::Simple, which does more of the work:


use Mac::AppleEvents::Simple;
do_event(qw(aevt odoc MACS),
     "'----': obj{want:type(prop), from:'null'()," .
     "form:prop, seld:type(macs)}"
);

This is a bit better; at least we're just talking the IAC language now, instead of having to emulate the raw API. But we're still stuck with those troublesome identifiers: "aevt" and "odoc" for the event that opens a document, "MACS" for the Finder, and "macs" for the System Folder.

Maybe we'd be better off in AppleScript after all -- the AppleScript code for the same operation looks like this:


tell application "Finder" to open folder "System Folder"

And before Mac::Glue was ported to Mac OS X, this is exactly what we had to do:


use Mac::AppleScript qw(RunAppleScript);
RunAppleScript('tell application "Finder" to open folder "System Folder"');

This is considerably easier to understand, but it's just not Perl. Mac::Glue uses the same magic that allows AppleScript to use names instead of identifiers, but wraps it in Perl syntax:


use Mac::Glue;
my $finder = Mac::Glue->new('Finder');
$finder->open( $finder->prop('System Folder') );

Setting Up and Creating Glues

On Mac OS 9, MacPerl comes with Mac::Glue. However, OS X users will need to install it themselves. Mac::Glue requires several other CPAN modules to be installed, including the Mac-Carbon distribution.

Because this in turn requires the Carbon headers to be available, you need to install the correct Apple developer kits; if you don't have the Developer Tools installed already, you can download them from the ADC site.

Once you have the correct headers installed, the best way to get Mac::Glue up and running is through the CPAN or CPANPLUS modules:


% perl -MCPAN -e 'install "Mac::Glue"'

This should download and install all the prerequisites and then the Mac::Glue module itself.

When it installs itself, Mac::Glue also creates "glue" files for the core applications -- Finder, the System Events library, and so on. A glue file is used to describe the resources available to an application and what can be done to the properties that it has.

If you try to use Mac::Glue to control an application for which it doesn't currently have a glue file, it will say something like this:


No application glue for 'JEDict' found in 
'/Library/Perl/5.8.1/Mac/Glue/glues' at -e line 1

To create glues for additional applications that are not installed by default, you can drop them onto the Mac OS 9 droplet "macglue." On Mac OS X, run the gluemac command.

What's a Property?

Once you have all your glues set up, you can start scripting Mac applications in Perl. It helps if you already have some knowledge of how AppleScript works before doing this, because sometimes Mac::Glue doesn't behave the way you expect it to.

For instance, we want to dump all the active to-do items from iCal. To-dos are associated with calendars, so first we need a list of all the calendars:


use Mac::Glue;
my $ical = new Mac::Glue("iCal");

my @cals = $ical->prop("calendars");

The problem we face immediately is that $ical->prop("calendars") doesn't give us the calendars. Instead, it gives us a way to talk about the calendars property: it's an object that describes the property rather than its value. To get the value of that property, we call its get method:


my @cals = $ical->prop("calendars")->get;

This returns a list of objects that allow us to talk about individual calendars. We can get their titles like so:


for my $cal (@cals) {
    my $name = $cal->prop("title")->get;

And now we want to get the to-dos in each calendar that haven't yet been completed, that is, those with no completion date:


    my @todos = grep { !$_->prop("completion_date")->get }
                       $cal->prop("todos")->get;

If we then store the summary for each of the to-do items in a hash keyed by the calendar name:


    $todo{$name} = [ map { $_->prop("summary")->get } @todos ]
        if @todos;
}

Then we can print out the summary of all the outstanding to-do items in each calendar:


for my $cal(keys %todo) {
    print "$cal:\n";
    print "\t$_\n" for @{$todo{$cal}};
}

Putting it all together, the code looks like:


use Mac::Glue;
my $ical = new Mac::Glue("iCal");

my @cals = $ical->prop("calendars")->get;
for my $cal (@cals) {
    my $name = $cal->prop("title")->get;
    my @todos = map  { $_->prop("summary")->get }
                grep { !$_->prop("completion_date")->get }
                       $cal->prop("todos")->get;
    $todo{$name} = \@todos if @todos;
}

for my $cal(keys %todo) {
    print "$cal:\n";
    print "\t$_\n" for @{$todo{$cal}};
}

The question is, where did we get the property names like summary and completion_date from? How did we know that the calendars had titles but the to-do items had summaries, and so on?

There are two answers to this: the first is to use the documentation created when the glue is installed. Typing gluedoc iCal on Mac OS X or using Shuck on Mac OS 9, you will find the verbs, properties, and objects that the application supports. For instance, under the calendar class, you should see:

This class represents a calendar

Properties:

    description (wr12/utxt): This is the calendar description. (read-only)
    inheritance (c@#^/item): All of the properties of the superclass. (read-only)
    key (wr03/utxt): An unique calendar key (read-only)
    tint (wr04/utxt): The calendar color (read-only)
    title (wr02/utxt): This is the calendar title.
    writable (wr05/bool): If this calendar is writable (read-only)

Elements:

    event, todo

This tells us that we can ask a calendar for its title property, and also for the events or todos contained within it.

Similarly, when we get the events back, we can look up the "event" class in the documentation and see what properties are available on it.

The second, and perhaps easier, way to find out what you can do with an application is to open the AppleScript Script Editor application, select Open Dictionary from the File menu, and choose the application you want to script. Now you can browse a list of the classes and commands associated with the application.

When you need to know how to translate those back into Perl, you can then consult the glue documentation. It takes a few attempts to get used to the way Mac::Glue works, but once you've done that, you'll find that you can translate between the AppleScript documentation and a Mac::Glue equivalent in your head.

Some Examples

In a couple of weeks, we'll be presenting a "Mac::Glue Hacks" article in the spirit of the O'Reilly hacks books series, with several simple Mac::Glue-based application scripting tricks to whet your appetite and explore what Mac::Glue can do. But to get you started, here are a couple we found particularly useful.

First, iTunes allows you to give a rating to your favorite songs, on the scale of zero to five stars. Actually, internally, this is stored in the iTunes database as a number between 0 and 100. Simon keeps iTunes playing randomly over his extensive music collection, and every time an interesting track comes up, he runs this script:


use Mac::Glue;
my $itunes = Mac::Glue->new("iTunes");
exit unless $itunes->prop("player state")->get eq "playing";

my $rating = $itunes->prop("current track")->prop("rating");
$rating->set(to => ($rating->get + 20))
  if $rating->get < 81;

As well as getting properties from Mac::Glue, we can also set them back with the set method.
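The arithmetic behind that guard can be tried without iTunes. This is a minimal sketch, assuming each star is worth 20 points on the 0 to 100 scale; bump_rating is a hypothetical helper name and is not part of Mac::Glue.

```perl
use strict;
use warnings;

# Hypothetical helper: return the rating one star higher, never above 100.
# It mirrors the "add 20 if below 81" guard used in the script above.
sub bump_rating {
    my ($rating) = @_;
    return $rating < 81 ? $rating + 20 : $rating;
}

# Prints 20, 80, 100 and 100, one per line.
print bump_rating($_), "\n" for 0, 60, 80, 100;
```

The guard means a track already above four stars is left alone rather than pushed past the top of the scale.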

One more complex example is the happening script Chris uses to publish details of what's going on at his computer. As well as simply reporting the current foremost application, it dispatches based on that application to report more information. For instance, if Safari has the focus, it reports what web page is being looked at; if it's the Terminal, what program is currently being run. It also contacts iTunes to see what song is playing, and if there's nothing playing on a local iTunes, asks likely other computers on the network if they're playing anything.

Once happening has discovered what's going on, it checks to see if the iChat status is set to "Available," and if so, resets the status message to report what it found. Let's break down happening and see how it accomplishes each of these tasks.

First, to work out the name of the currently focused application:


my $system = get_app('System Events') or return;
$app    ||= $system->prop(name => item => 1,
    application_process => whose(frontmost => equals => 1)
);

$app->get;

get_app is just a utility function that memoizes the process of calling Mac::Glue->new($app_name); since loading up the glue file is quite expensive, keeping application glue objects around saves a lot of time.
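The source of get_app isn't shown here, so the following is only a sketch of the caching idea. To keep it runnable on any machine it assumes a stand-in constructor, expensive_new, in place of Mac::Glue->new; the cache hash and the counter are likewise illustrative names.

```perl
use strict;
use warnings;

my %glue_for;          # cache: application name => glue object
my $constructed = 0;   # counts how often the expensive step runs

# Stand-in for Mac::Glue->new($name), so the sketch runs anywhere.
sub expensive_new {
    my ($name) = @_;
    $constructed++;
    return { APPNAME => $name };
}

# Build the object on first use, then hand back the cached copy.
sub get_app {
    my ($name) = @_;
    return $glue_for{$name} ||= expensive_new($name);
}

get_app('iTunes') for 1 .. 3;
get_app('iChat');
print "constructed $constructed objects\n";   # constructed 2 objects
```

Because ||= only assigns when the slot is empty, a failed construction (a false return value) would be retried on the next call, which fits the "or return" check used by the callers.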

The next incantation shows you how natural Mac::Glue programming can look, but also how much you need to know about how the Apple environment works. We're asking the System Events library to tell us about the application process that matches a certain condition. Mac::Glue exports the whose function to create conditions.

The important thing about this is the fact that we use $app ||= .... The construction that we saved in $app does not give us "the name of the front-most application at this moment," but it represents the whole concept of "the name of the front-most application." At any time in the future, we can call get on it, and it will find out and return the name of the front-most application at that time, even if it has changed since the last time you called get.
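The difference between a handle and a value can be modelled in a few lines of plain Perl. This is a toy model only, not how Mac::Glue is implemented: ToyProp and the $frontmost variable are made up for the illustration.

```perl
use strict;
use warnings;

package ToyProp;

# A handle stores how to look a value up, not the value itself.
sub new {
    my ($class, $lookup) = @_;
    return bless { lookup => $lookup }, $class;
}

# Each call to get() performs the lookup afresh.
sub get {
    my ($self) = @_;
    return $self->{lookup}->();
}

package main;

my $frontmost = 'Safari';                         # stand-in for system state
my $handle    = ToyProp->new(sub { $frontmost });

my $first  = $handle->get;    # 'Safari'
$frontmost = 'Terminal';      # the "front-most application" changes
my $second = $handle->get;    # 'Terminal', from the same handle

print "$first then $second\n";   # Safari then Terminal
```

This is why caching the handle with ||= is safe: the cached object holds the question, and every get asks it again.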

Now that we know what the front-most application is, we can look it up in a hash that contains subroutines returning information specific to that application. For instance, here's the entry for Safari:


Safari => sub { my ($glue) = @_;
                my $obj = $glue->prop(url => document => 1 => window => 1);
                my $url = $obj->get;
                return URI->new($url)->host if $url;
              },

This returns the host part of the URL in the first document in the first window. For ircle, an IRC client, this code will get the channel and server name for the current connection:


ircle       => sub { sprintf("%s:%s",
               $_[0]->prop('currentchannel')->get,
               $_[0]->prop(servername => connection =>
                   $_[0]->prop('currentconnection')->get
               )->get
              )
            },

A decent default action is to return the window title:


default     => sub { my($glue) = @_;
                     my $obj = $objs{$glue->{APPNAME}} ||=
                               $glue->prop(name => window => 1);
                     $obj->get;
                   },

As before, we cache the concept of "the name of the current window" and only create it when we don't have one already.

Now let's look at the "Now playing in iTunes" part:


$state  ||= $itunes->prop('player state');
return unless $state->get eq "playing";

$track  ||= $itunes->prop('current track');
%props    = map { $_ => $track->prop($_) } qw(name artist)
            unless keys %props;

my %info;
for my $prop (keys %props) {
    $info{$prop} = $props{$prop}->get;
}

This first checks to see if iTunes is playing, and returns unless it is. Next, we look for the current track, and get handles to the name and artist properties of that track, as in our previous iTunes example.

Finally, when we've set up all the handles we need, we call get to turn them into real data. This populates %info with the name and artist of the currently playing track.

Now that we have the current application name, the extra information, and the current track, we can publish them as the iChat status, with this subroutine:


use Mac::Apps::Launch qw(IsRunning);

sub ichat {
    my($output) = @_;

    my $ichat = get_app('iChat') or return;
    return unless IsRunning($ichat->{ID});

    $status  ||= $ichat->prop('status');
    return unless $status->get eq 'available';

    $message ||= $ichat->prop('status message');
    $message->set(to => $output);
}

First, we have the IsRunning subroutine from Mac::Apps::Launch, which takes the old-style four-character ID of the application we want to ask about. The ID slot of the glue object will tell us this ID, and so we can immediately give up setting the iChat status if iChat isn't even running. Then we use set as before to change the status to whatever we want.

Finally, we mentioned that happening can also ask other hosts what's playing on their iTunes. This works because, if "Remote Apple Events" is turned on in the Sharing preferences, Macs support passing these Apple events between machines. Of course, this often requires authentication, so when it first contacts a host to send an event, happening will pop up a login box to ask for credentials -- this is all handled internally by the operating system. Here's the code that happening actually uses:


my $found = 0;
if (IsRunning($itunes->{ID})) {
    $itunes->ADDRESS;
    $found = 1 if $state->get eq 'playing';
}

unless ($found) {
    for my $host (@hosts) {
        next unless $hosts{$host} + 60 < time();
        $itunes->ADDRESS(eppc => iTunes => $host);
        $found = 1, last if $state->get eq 'playing';
        $hosts{$host} = time();
    }
}

The first block checks whether iTunes is running locally and, if so, whether it is playing. If it is, we're done. If not, we're going to have to ask the hosts specified in the @hosts array about it. The first and last lines inside the for loop simply ensure that hosts are only tried every minute at most. The second line in there is the interesting one, though:


$itunes->ADDRESS(eppc => iTunes => $host);

This changes the iTunes glue handle from being a local one to being one that contacts the "iTunes" application on host $host over EPPC, the remote Apple events transport.

Because $state is the player status of $itunes, it will now return the correct status even though $itunes now refers to an application on a different computer! Similarly, all the handles we have to the artist and name of the current track will correctly refer to $itunes, no matter which iTunes instance that means.
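The once-a-minute throttle in that loop is independent of Apple events and can be sketched on its own. The helper name due_for_retry, the %last_tried hash, and the host names below are all hypothetical.

```perl
use strict;
use warnings;

my %last_tried;   # host name => epoch seconds of the last attempt

# True if $host has not been tried within the last $interval seconds.
# A host that was never tried counts as last tried at time 0.
sub due_for_retry {
    my ($host, $now, $interval) = @_;
    return ($last_tried{$host} // 0) + $interval < $now;
}

my $now = time();
my @results;
for my $host (qw(alpha.local beta.local alpha.local)) {   # hypothetical hosts
    if (due_for_retry($host, $now, 60)) {
        $last_tried{$host} = $now;
        push @results, "$host: asked";
    }
    else {
        push @results, "$host: skipped";
    }
}
print "$_\n" for @results;
```

The second visit to alpha.local is skipped because its timestamp was recorded less than 60 seconds earlier. The defined-or operator (//) needs Perl 5.10 or later.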

We hope you'll join us next time for more Mac::Glue tips and tricks, as we look at real-life applications of scripting Mac applications in Perl.

Maintaining Regular Expressions

For some, regular expressions provide the chainsaw functionality of the much-touted Perl "Swiss Army knife" metaphor. They are powerful, fast, and very sharp, but like real chainsaws, can be dangerous when used without appropriate safety measures.

In this article I'll discuss the issues associated with using heavy-duty, contractor-grade regular expressions, and demonstrate a few maintenance techniques to keep these chainsaws in proper condition for safe and effective long-term use.

Readability: Whitespace and Comments

Before getting into any deep issues, I want to cover the number one rule of shop safety: use whitespace to format your regular expressions. Most of us already honor this wisdom in our various coding styles (though perhaps not with the zeal of Python developers). But more of us could make better, judicious use of whitespace in our regular expressions, via the /x modifier. Not only does it improve readability, but it also allows us to add meaningful, explanatory comments. For example, this simple regular expression:

# matching "foobar" is critical here ...
$_ =~ m/foobar/;

Could be rewritten, using a trailing /x modifier, as:

$_ =~ m/ foobar    # matching "foobar" is critical here ...
         /x;

Now, in this example you might argue that readability wasn't improved at all; I guess that's the problem with triviality. Here's another, slightly less trivial example that also illustrates the need to escape literal whitespace and comment characters when using the /x modifier:

$_ =~ m/^                         # anchor at beginning of line
          The\ quick\ (\w+)\ fox    # fox adjective
          \ (\w+)\ over             # fox action verb
          \ the\ (\w+)\ dog         # dog adjective
          (?:                       # whitespace-trimmed comment:
            \s* \# \s*              #   whitespace and comment token
            (.*?)                   #   captured comment text; non-greedy!
            \s*                     #   any trailing whitespace
          )?                        # this is all optional
          $                         # end of line anchor
         /x;                        # allow whitespace

This regular expression successfully matches the following lines of input:

The quick brown fox jumped over the lazy dog
The quick red fox bounded over the sleeping dog
The quick black fox slavered over the dead dog   # a bit macabre, no?
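A short script makes this easy to check. It is a sketch using the same pattern with every literal space escaped, including the one before "dog"; under /x an unescaped space is dropped from the pattern, so "(\w+) dog" would look for the adjective run straight into "dog".

```perl
use strict;
use warnings;

my $re = qr/^                     # anchor at beginning of line
    The\ quick\ (\w+)\ fox        # fox adjective
    \ (\w+)\ over                 # fox action verb
    \ the\ (\w+)\ dog             # dog adjective
    (?: \s* \# \s* (.*?) \s* )?   # optional, trimmed comment
    $                             # end of line anchor
/x;

my @lines = (
    'The quick brown fox jumped over the lazy dog',
    'The quick red fox bounded over the sleeping dog',
    'The quick black fox slavered over the dead dog   # a bit macabre, no?',
);

my @matched = grep { $_ =~ $re } @lines;
printf "%d of %d lines matched\n", scalar @matched, scalar @lines;   # 3 of 3
```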

While embedding meaningful explanatory comments in your regular expressions can only help readability and maintenance, many of us don't like the plethora of backslashed spaces made necessary by the "global" /x modifier. Enter the "locally" acting (?#) and (?x:) embedded modifiers:

$_ =~ m/^(?#                      # anchor at beginning of line

          )The quick (\w+) fox (?#  # fox adjective
          )(\w+) over (?#           # fox action verb
          )the (\w+) dog(?x:        # dog adjective
                                    # optional, trimmed comment:
            \s*                     #   leading whitespace
            \# \s* (.*?)            #   comment text
            \s*                     #   trailing whitespace

          )?$(?#                    # end of line anchor
          )/;

In this case, the (?#) embedded modifier was used to introduce our commentary between each set of whitespace-sensitive textual components; the non-capturing parentheses construct (?:) used for the optional comment text was also altered to include a locally-acting x modifier. No backslashing was necessary, but it's a bit harder to quickly distinguish relevant whitespace. To each their own, YMMV, TIMTOWTDI, etc.; the fact is, both commented examples are probably easier to maintain than:

# match the fox adjective and action verb, then the dog adjective,
# and any optional, whitespace-trimmed commentary:
$_ =~ m/^The quick (\w+) fox (\w+) over the (\w+) dog(?:\s*#\s*(.*?)\s*)?$/;

This example, while well-commented and clear at first, quickly deteriorates into the nearly unreadable "line noise" that gives Perl programmers a bad name and makes later maintenance difficult.

So, as in other programming languages, use whitespace formatting and commenting as appropriate, or maybe even when it seems like overkill; it can't hurt. And like the choice between alternative code indentation and bracing styles, Perl regular expressions allow a few different options (global /x modifier, local (?#) and (?x:) embedded modifiers) to suit your particular aesthetics.

Capturing Parentheses: Taming the Jungle

Most of us use regular expressions to actually do something with the parsed text (although the condition that the input matches the expressions is also important). Assigning the captured text from the previous example is relatively easy: the first three capturing parentheses are visually distinct and can be clearly numbered $1, $2 and $3; however, the extra set of non-capturing parentheses, which provide optional commentary, themselves have another set of embedded, capturing parentheses; here's another rewriting of the example, with slightly less whitespace formatting:

my ($fox, $verb, $dog, $comment);
  if ( $_ =~ m/^                         # anchor at beginning of line
               The\ quick\ (\w+)\ fox    # fox adjective
               \ (\w+)\ over             # fox action verb
               \ the\ (\w+)\ dog         # dog adjective
               (?:\s* \# \s* (.*?) \s*)? # an optional, trimmed comment
               $                         # end of line anchor
              /x
     ) {
      ($fox, $verb, $dog, $comment) = ($1, $2, $3, $4);
  }

From a quick glance at this code, can you immediately tell whether the $comment variable will come from $4 or $5? Will it include the leading # comment character? If you are a practiced regular expression programmer, you probably can answer these questions without difficulty, at least for this fairly trivial example. But if we could make this example even clearer, you will hopefully agree that similarly clarifying some of your more gnarly regular expressions would be beneficial in the long run.

When regular expressions grow very large, or include more than three pairs of parentheses (capturing or otherwise), a useful clarifying technique is to embed the capturing assignments directly within the regular expression, via the code-executing pattern (?{}). In the embedded code, the special $^N variable, which holds the contents of the last parenthetical capture, is used to "inline" any variable assignments; our previous example turns into this:

my ($fox, $verb, $dog, $comment);
  $_ =~ m/^                               # anchor at beginning of line
          The\ quick\  (\w+)              # fox adjective
                       (?{ $fox  = $^N }) 
          \ fox\       (\w+)              # fox action verb
                       (?{ $verb = $^N })
          \ over\ the\ (\w+)              # dog adjective
                       (?{ $dog  = $^N })
          \ dog
                                          # optional trimmed comment
            (?:\s* \# \s*                 #   leading whitespace
            (.*?)                         #   comment text
            (?{ $comment = $^N })
            \s*)?                         #   trailing whitespace
          $                               # end of line anchor
         /x;                              # allow whitespace

Now it should be explicitly clear that the $comment variable will only contain the whitespace-trimmed commentary following (but not including) the # character. We also don't have to worry about numbered variables $1, $2, $3, etc. anymore, since we don't make use of them. This regular expression can be easily extended to capture other text without rearranging variable assignments.
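To see the result of the inlined assignments, the technique can be run against one of the sample lines. This is a sketch that assumes a reasonably recent Perl (5.18 or later handles lexical variables inside code blocks reliably); the layout is condensed but the pattern is the same idea.

```perl
use strict;
use warnings;

my ($fox, $verb, $dog, $comment);
my $line = 'The quick black fox slavered over the dead dog # a bit macabre, no?';

my $ok = $line =~ m/^
    The\ quick\  (\w+)  (?{ $fox  = $^N })       # fox adjective
    \ fox\       (\w+)  (?{ $verb = $^N })       # fox action verb
    \ over\ the\ (\w+)  (?{ $dog  = $^N })       # dog adjective
    \ dog
    (?: \s* \# \s*
        (.*?)           (?{ $comment = $^N })    # comment text
    \s* )?
    $
/x;

# Prints: black / slavered / dead / a bit macabre, no?
print "$fox / $verb / $dog / $comment\n" if $ok;
```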

Repeated Execution

There are a few caveats to using this technique, however; note that code within (?{}) constructs is executed immediately as the regular expression engine incorporates it into a match. That is, if the engine backtracks off a parenthetical capture to generate a successful match that does not include that capture, the associated (?{}) code will have already been executed. To illustrate, let's again look at just the capturing pattern for the comment text (.*?) and let's also add a debugging warn ">>$comment<<\n" statement:

# optional trimmed comment
            (?:\s* \# \s*               #   leading whitespace
            (.*?) (?{ $comment = $^N;   #   comment text
                      warn ">>$comment<<\n"
                        if $debug;
                    })
            \s*)?                       #   trailing whitespace
          $                             # end of line anchor

The capturing (.*?) pattern is a non-greedy extension that will cause the regular expression matching engine to constantly try to finish the match (looking for any trailing whitespace and the end of string, $) without extending the .*? pattern any further. The upshot of all this is that with debugging turned on, this input text:

The quick black fox slavered over the dead dog # a bit macabre, no?

Will lead to these debugging statements:

>><<
>>a<<
>>a <<
>>a b<<
>>a bi<<
>>a bit<<
>>a bit <<
>>a bit m<<
[ ... ]
>>a bit macabre, n<<
>>a bit macabre, no<<
>>a bit macabre, no?<<

In other words, the adjacent embedded (?{}) code gets executed every time the matching engine "uses" it while trying to complete the match; because the matching engine may "backtrack" to try many alternatives, the embedded code will also be executed as many times.

This multiple execution behavior does raise a few concerns. If the embedded code is only performing assignments, via $^N, there doesn't seem at first to be much of a problem, because each successive execution overrides any previous assignments, and only the final, successful execution matters, right? However, what if the input text had instead been:

The quick black fox slavered over the dead doggie # a bit macabre, no?

This text should fail to match the regular expression overall (since "doggie" won't match "dog"), and it does. But, because the embedded (?{}) code chunks are executed as the match is evaluated, the $fox, $verb and $dog variables are successfully assigned; the match doesn't fail until "doggie" is seen. Our program might now be more readable and maintainable, but we've also subtly altered the behavior of the program.
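Both effects can be observed with a counter instead of a warn. The sketch below uses a shortened pattern with only the first and last code blocks; $runs is an illustrative variable. It deliberately reports only that $fox was assigned, because backtracking may run the block again with a shorter capture before the match finally fails.

```perl
use strict;
use warnings;

my ($fox, $comment);
my $runs = 0;     # how many times the comment block executes

my $re = qr/^
    The\ quick\ (\w+)  (?{ $fox = $^N })
    \ fox\ \w+\ over\ the\ \w+\ dog
    (?: \s* \# \s*
        (.*?)  (?{ $comment = $^N; $runs++ })
    \s* )?
    $
/x;

my $good = 'The quick black fox slavered over the dead dog # a bit macabre, no?';
my $bad  = 'The quick black fox slavered over the dead doggie # a bit macabre, no?';

my $good_matched = $good =~ $re;
print "matched, comment block ran $runs times\n" if $good_matched;

undef $fox;
my $bad_matched = $bad =~ $re;
print "no match, but \$fox was still assigned\n"
    if !$bad_matched && defined $fox;
```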

The second problem is one of performance; what if our assignment code hadn't simply copied $^N into a variable, but had instead executed a remote database update? Repeatedly hitting the database with meaningless updates may be crippling and inefficient. However, the behavioral aspects of the database example are even more frightening: what if the match failed overall, but our updates had already been executed? Imagine that instead of an update operation, our code triggered a new row insert for the comment, inserting multiple, incorrect comment rows!

Deferred Execution

Luckily, Perl's ability to introduce "locally scoped" variables provides a mechanism to "defer" code execution until an overall successful match is accomplished. As the regular expression matching engine tries alternative matches, it introduces a new, nested scope for each (?{}) block, and, more importantly, it exits a local scope if a particular match is abandoned for another. If we were to write out the code executed by the matching engine as it moved (and backtracked) through our input, it might look like this:

{ # introduce new scope
  $fox = $^N;
  { # introduce new scope
    $verb = $^N;
    { # introduce new scope
      $dog = $^N;
      { # introduce new scope
        $comment = $^N;
      } # close scope: failed overall match
      { # introduce new scope
        $comment = $^N;
      } # close scope: failed overall match
      { # introduce new scope
        $comment = $^N;
      } # close scope: failed overall match

      # ...

      { # introduce new scope
        $comment = $^N;
      } # close scope: successful overall match
    } # close scope: successful overall match
  } # close scope: successful overall match
} # close scope: successful overall match

We can use this block-scoping behavior to solve both our altered behavior and performance issues. Instead of executing code immediately within each block, we'll cleverly "bundle" the code up, save it away on a locally scoped "stack," and only process the code if and when we get to the end of a successful match:

my ($fox, $verb, $dog, $comment);
  $_ =~ m/(?{
              local @c = ();            # provide storage "stack"
          })
          ^                             # anchor at beginning of line
          The\ quick\  (\w+)            # fox adjective
                       (?{
                           local @c;
                           push @c, sub {
                               $fox = $^N;
                           };
                       })
          \ fox\       (\w+)            # fox action verb
                       (?{
                           local @c = @c;
                           push @c, sub {
                               $verb = $^N;
                           };
                       })
          \ over\ the\ (\w+)            # dog adjective
                       (?{
                           local @c = @c;
                           push @c, sub {
                               $dog = $^N;
                           };
                       })
          \ dog
                                        # optional trimmed comment
            (?:\s* \# \s*               #   leading whitespace
            (.*?)                       #   comment text
            (?{
                local @c = @c;
                push @c, sub {
                    $comment = $^N;
                    warn ">>$comment<<\n"
                      if $debug;
                };
            })
            \s*)?                       #   trailing whitespace
          $                             # end of line anchor
          (?{
              for (@c) { &$_; }         # execute the deferred code
          })
         /x;                            # allow whitespace

Using subroutine "closures" to package up our code and save them on a locally defined stack, @c, allows us to defer any processing until the very end of a successful match. Here's the matching engine code execution "path":

{ # introduce new scope

  local @c = (); # provide storage "stack"

  { # introduce new scope

    local @c;
    push @c, sub { $fox = $^N; };

    { # introduce new scope

      local @c = @c;
      push @c, sub { $verb = $^N; };

      { # introduce new scope

        local @c = @c;
        push @c, sub { $dog = $^N; };

        { # introduce new scope

          local @c = @c;
          push @c, sub { $comment = $^N; };

        } # close scope; lose changes to @c

        { # introduce new scope

          local @c = @c;
          push @c, sub { $comment = $^N; };

        } # close scope; lose changes to @c

        # ...

        { # introduce new scope

          local @c = @c;
          push @c, sub { $comment = $^N; };

          { # introduce new scope

            for (@c) { &$_; }

          } # close scope

        } # close scope; lose changes to @c
      } # close scope; lose changes to @c
    } # close scope; lose changes to @c
  } # close scope; lose changes to @c
} # close scope; no more @c at all

This last technique is especially wordy; however, given judicious use of whitespace and well-aligned formatting, this idiom could ease the maintenance of long, complicated regular expressions.

But, more importantly, it doesn't work as written. What!?! Why? Well, it turns out that Perl's support for code blocks inside (?{}) constructs doesn't support subroutine closures (even attempting to compile one causes a core dump). But don't worry, all is not lost! Since this is Perl, we can always take things a step further, and make the hard things easy ...

Making it Actually Work: use Regexp::DeferredExecution

Though we cannot (yet) compile subroutines within (?{}) constructs, we can manipulate all the other types of Perl variables: scalars, arrays, and hashes. So instead of using closures:

m/
    (?{ local @c = (); })
    # ...
    (?{ local @c; push @c, sub { $comment = $^N; } })
    # ...
    (?{ for (@c) { &$_; } })
   /x

We can instead just package up our $comment = $^N code into a string, to be executed by an eval statement later:

m/
    (?{ local @c = (); })
    # ...
    (?{ local @c; push @c, [ $^N, q{ $comment = $^N; } ] })
    # ...
    (?{ for (@c) { $^N = $$_[0]; eval $$_[1]; } })
   /x

Note that we also had to store away the version of $^N that was active at the time of the (?{}) pattern, because it very likely will have changed by the end of the match. We didn't need to do this previously, as we were storing closures that efficiently captured all the local context of the code to be executed.

Well, now this is getting really wordy, and downright ugly to be honest. However, through the magic of Perl's overloading mechanism, we can avoid having to see any of that ugliness, by simply using the Regexp::DeferredExecution module from CPAN:

use Regexp::DeferredExecution;

  my ($fox, $verb, $dog, $comment);
  $_ =~ m/^                               # anchor at beginning of line
          The\ quick\  (\w+)              # fox adjective
                       (?{ $fox  = $^N }) 
          \ fox\       (\w+)              # fox action verb
                       (?{ $verb = $^N })
          \ over\ the\ (\w+)              # dog adjective
                       (?{ $dog  = $^N })
          \ dog
                                          # optional trimmed comment
            (?:\s* \# \s*                 #   leading whitespace
            (.*?)
            (?{ $comment = $^N })         #   comment text
            \s*)?                         #   trailing whitespace
          $                               # end of line anchor
         /x;                              # allow whitespace

How does the Regexp::DeferredExecution module perform its magic? Carefully, of course, but also simply; it just makes the same alterations to regular expressions that we made manually. 1) An initiating embedded code pattern is prepended to declare local "stack" storage. 2) Another embedded code pattern is added at the end of the expression to execute any code found in the stack (the stack itself is stored in @Regexp::DeferredExecution::c, so you shouldn't need to worry about variable name collisions with your own code). 3) Finally, any (?{}) constructs seen in your regular expressions are saved away onto a local copy of the stack for later execution. It looks a little like this:

package Regexp::DeferredExecution;

use Text::Balanced qw(extract_multiple extract_codeblock);

use overload;

sub import { overload::constant 'qr' => \&convert; }
sub unimport { overload::remove_constant 'qr'; }

sub convert {

  my $re = shift; 

  # no need to convert regexp's without (?{ <code> }):
  return $re unless $re =~ m/\(\?\{/;

  my @chunks = extract_multiple($re,
                                [ qr/\(\?  # '(?' (escaped)
                                     (?={) # followed by '{' (lookahead)
                                    /x,
                                  \&extract_codeblock
                                ]
                               );

  for (my $i = 1 ; $i < @chunks ; $i++) {
    if ($chunks[$i-1] eq "(?") {
      # wrap all code into a closure and push onto the stack:
      $chunks[$i] =~
        s/\A{ (.*) }\Z/{
          local \@Regexp::DeferredExecution::c;
          push \@Regexp::DeferredExecution::c, [\$^N, q{$1}];
        }/msx;
    }
  }

  $re = join("", @chunks);

  # install the stack storage and execution code:
  $re = "(?{
            local \@Regexp::DeferredExecution::c = (); # the stack
         })$re(?{
            for (\@Regexp::DeferredExecution::c) {
              \$^N = \$\$_[0];  # reinstate \$^N
              eval \$\$_[1];    # execute the code
            }
         })";

  return $re;
}

1;

One caveat of Regexp::DeferredExecution use is that while execution will occur only once per compiled regular expression, the ability to embed regular expressions inside of other regular expressions will circumvent this behavior:

use Regexp::DeferredExecution;

  # the quintessential foobar/foobaz parser:
  $re = qr/foo
           (?:
              bar (?{ warn "saw bar!\n"; })
              |
              baz (?{ warn "saw baz!\n"; })
           )?/x;

  # someone's getting silly now:
  $re2 = qr/ $re
             baroo!
             (?{ warn "saw foobarbaroo! (or, foobazbaroo!)\n"; })
           /x;

  "foobar" =~ /$re2/;

  __END__
  "saw bar!"

Even though the input text to $re2 failed to match, the deferred code from $re was executed because its pattern did match successfully. Therefore, Regexp::DeferredExecution should only be used with "constant" regular expressions; there is currently no way to overload dynamic, "interpolated" regular expressions.
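When the deferred action is only ever an assignment, the same local-stack idea can be sketched by hand without closures or string eval: each code block pushes a name and value pair onto a localized array, and a final block commits the pairs once the whole pattern has matched. This is a variation for illustration, not the module's implementation; parse_line, @c, and %got are names chosen here, and the sketch assumes a recent Perl.

```perl
use strict;
use warnings;

our @c;      # package variable: local() cannot be applied to a "my" array
our %got;    # filled in only when the whole pattern has matched

sub parse_line {
    my ($line) = @_;
    %got = ();
    my $ok = $line =~ m/
        (?{ local @c = () })
        ^ The\ quick\  (\w+)  (?{ local @c = (@c, [ fox     => $^N ]) })
        \ fox\         (\w+)  (?{ local @c = (@c, [ verb    => $^N ]) })
        \ over\ the\   (\w+)  (?{ local @c = (@c, [ dog     => $^N ]) })
        \ dog
        (?: \s* \# \s* (.*?)  (?{ local @c = (@c, [ comment => $^N ]) })
        \s* )?
        $
        (?{ for my $pair (@c) { $got{ $pair->[0] } = $pair->[1] } })
    /x;
    return $ok ? +{ %got } : undef;
}

my $good = parse_line('The quick black fox slavered over the dead dog # a bit macabre, no?');
my $bad  = parse_line('The quick black fox slavered over the dead doggie # a bit macabre, no?');

print "comment: $good->{comment}\n";
print "failed match assigned nothing\n" unless defined $bad;
```

Because every push is made through local, backtracking undoes it automatically, so the final block only ever sees the pairs that belong to the successful path.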

See Also

The Regexp::Fields module provides a much more compact shorthand for embedded named variable assignments, (?<varname> pattern), such that our example becomes:

use Regexp::Fields qw(my);

  my $rx =
    qr/^                             # anchor at beginning of line
       The\ quick\ (?<fox> \w+)\ fox # fox adjective
       \ (?<verb> \w+)\ over         # fox action verb
       \ the\ (?<dog> \w+)\ dog      # dog adjective
       (?:\s* \# \s*
          (?<comment> .*?)
       \s*)? # an optional, trimmed comment
       $                             # end of line anchor
      /x;

Note that in this particular example, the my $rx compilation stanza actually implicitly declared $fox, $verb etc. If variable assignment is all you're ever doing, Regexp::Fields is all you'll need. If you want to embed more generic code fragments in your regular expressions, Regexp::DeferredExecution may be your ticket.
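For comparison, here is a rough sketch that needs no CPAN module. It is not Regexp::Fields itself: it assumes Perl 5.10 or later, where the same (?<name> pattern) syntax is built in and the captures are read from the %+ hash instead of being declared as lexical variables. The sample sentence and the %field hash are made up for illustration.

```perl
use strict;
use warnings;

my $rx =
  qr/^                              # anchor at beginning of line
     The\ quick\ (?<fox> \w+)\ fox  # fox adjective
     \ (?<verb> \w+)\ over          # fox action verb
     \ the\ (?<dog> \w+)\ dog       # dog adjective
     (?:\s* \# \s*
        (?<comment> .*?)
     \s*)?                          # an optional, trimmed comment
     $                              # end of line anchor
    /x;

# Illustrative input, not taken from the article.
my $line = 'The quick brown fox jumps over the lazy dog # a pangram';

my %field;
if ($line =~ $rx) {
    %field = %+;    # copy the named captures out of the match
}

print "$field{fox} / $field{verb} / $field{dog} / $field{comment}\n";
```

The trade-off is that nothing is declared for you: the names live as keys in a hash, so a misspelled name yields undef at run time rather than a compile-time error.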

And finally, because in Perl there is always One More Way To Do It, I'll also demonstrate Regexp::English, a module that allows you to use regular expressions without actually writing any regular expressions:

  use Regexp::English;

  my ($fox, $verb, $dog, $comment);

  my $rx = Regexp::English->new
               -> start_of_line
               -> literal('The quick ')

               -> remember(\$fox)
                   -> word_chars
               -> end

               -> literal(' fox ')

               -> remember(\$verb)
                   -> word_chars
               -> end

               -> literal(' over the ')

               -> remember(\$dog)
                   -> word_chars
               -> end

               -> literal(' dog')

               -> optional
                   -> zero_or_more -> whitespace_char -> end
                   -> literal('#')
                   -> zero_or_more -> whitespace_char -> end

                   -> remember(\$comment)
                       -> minimal
                           -> multiple
                               -> word_char
                               -> or
                               -> whitespace_char
                           -> end
                       -> end
                   -> end
                   -> zero_or_more -> whitespace_char -> end
               ->end

               -> end_of_line;

  $rx->match($_);

I must admit that this last example appeals to my inner-Lispish self.

Hopefully you've gleaned a few tips and tricks from this little workshop of mine that you can take back to your own shop.

This week on Perl 6, week ending 2004-01-11

It's Monday. People have been talking about Perl 6, Parrot and the European Union Constitution. Let's find out what they've been saying about Parrot first shall we?

Threads

Threads were discussed some more. Dan's deadline is coming up soon; hopefully, soon after that, discussion will move on from Holy Skirmishes about architecture to meaningful discussions of a real implementation.

Hmm... that came out rather more dismissive than I intended.

Continuation problems

Luke Palmer found a problem with Parrot's continuations. A continuation is supposed to encapsulate the interpreter's control state, in other words the position of the program counter and the state of the register stacks, and a pointer to the previous continuation. However, it turns out that a Parrot continuation just contains the program counter and a pointer to the previous continuation. There was some discussion of why this was so (Melvin Smith seemed to claim that it was both his fault and not his fault).

Everyone agreed that this needed to be fixed pretty promptly and it wasn't long before Luke posted a patch.

http://groups.google.com

http://groups.google.com
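To see why the missing register state matters, here is a toy model in Perl. It is not Parrot code: the hash fields and the three helper subs are hypothetical names, and the "interpreter" is just a hash holding a program counter and a list of registers.

```perl
use strict;
use warnings;

# Toy interpreter state; every field name here is hypothetical.
my %interp = (pc => 10, registers => [1, 2, 3], current_cont => undef);

# Captures only the program counter and the previous continuation,
# which is what the summary says Parrot's continuations held.
sub capture_pc_only {
    my ($i) = @_;
    return { pc => $i->{pc}, prev => $i->{current_cont} };
}

# Also snapshots the registers, as a continuation is supposed to.
sub capture_full {
    my ($i) = @_;
    return {
        pc        => $i->{pc},
        prev      => $i->{current_cont},
        registers => [ @{ $i->{registers} } ],
    };
}

# Invoking a continuation restores whatever it captured, and nothing more.
sub invoke {
    my ($i, $cont) = @_;
    $i->{pc}           = $cont->{pc};
    $i->{current_cont} = $cont->{prev};
    $i->{registers}    = [ @{ $cont->{registers} } ] if $cont->{registers};
}

my $shallow = capture_pc_only(\%interp);
my $full    = capture_full(\%interp);

# The program runs on and overwrites its registers.
$interp{pc}        = 42;
$interp{registers} = [9, 9, 9];

invoke(\%interp, $shallow);
my @after_shallow = @{ $interp{registers} };    # registers are still clobbered

invoke(\%interp, $full);
my @after_full = @{ $interp{registers} };       # registers are restored

print "pc-only: @after_shallow\nfull:    @after_full\n";
```

In both cases the program counter jumps back to where the continuation was taken, but only the full capture resumes with the values the code at that point expects.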

A problem with threads

In a change from the discussions of thread philosophy, Jeff Clites posted about a problem he was having with Parrot's current threads implementation, which was misbehaving when he tried to join multiple threads. Between them, Jeff and Leo Tötsch tracked down a possible cause and Jeff offered up a patch, which Leo applied.

http://groups.google.com

The PPC JIT gets fixed

Jeff Clites also posted a patch which cleans up the last problems with the JIT on PPC. Leo applied it. Us Apple users cheered.

http://groups.google.com

Luke Palmer gets his act together

Luke Palmer decided to get his act together (given the level of his contribution to the Perl 6 lists so far, I'm almost scared to find out what he's going to be like now...) and finish up his 'Priority DOD' rethink of the Garbage Collector. I confess I'm not really qualified to discuss what's different about it, beyond the claim of a 10,000% speed up when there were no 'eager' PMCs about (things that need immediate cleanup on scope exit; the canonical example being a Perlish file handle) and only a 5% slowdown when there were.

Luke and Leo discussed the patch a bit before Leo applied it.

http://groups.google.com

http://groups.google.com -- Luke explains the patch

IMCC speed issues

Dan posted some timings he'd made of IMCC compiling some large subs, which were not the most wonderful timings I've ever seen. A 41 minute compile isn't generally what one wishes to see. Melvin Smith had a few ideas about what was causing it, as did Leo (it seems that IMCC's register allocation is very slow in the presence of spilling, and the cost of live analysis increases with the product of the number of lines and variables in a segment). Leo recommended redoing the sub to reduce the number of (and avoid long-lived) PIR variables (i.e. registers) by using lexicals or globals instead.

http://groups.google.com/

References to hash elements

Arthur "Ponie" Bergman had some questions about how references to hash elements would be done. Consider the following Perl code:

   my %hash;
   $foo = \$hash{key};

   $$foo = "bar";

   print $hash{key}; # Prints "bar"

Arthur wondered how this could be made to work with the current vtable setup, specifically in the presence of tying. Simon Cozens thought that there should be a special HashElement PMC which would handle fetching the actual value from the hash (or writing it back to the hash) as appropriate. Dan agreed with him, so it looks like this might be the way forward.

http://groups.google.com
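The proxy idea can be approximated in Perl 5 with a tied scalar. This is only an analogy to the proposed HashElement PMC, not Parrot code, and the HashElementProxy class name is hypothetical. The proxy remembers the hash and the key, and goes back to the hash on every read and write, so a tied hash would still see each access.

```perl
use strict;
use warnings;

# Hypothetical proxy class standing in for "a reference to one hash element".
package HashElementProxy;

sub TIESCALAR {
    my ($class, $hash, $key) = @_;
    return bless { hash => $hash, key => $key }, $class;
}

# Reads fetch the current value from the hash each time.
sub FETCH {
    my ($self) = @_;
    return $self->{hash}{ $self->{key} };
}

# Writes are passed straight through to the hash.
sub STORE {
    my ($self, $value) = @_;
    $self->{hash}{ $self->{key} } = $value;
}

package main;

my %hash;
tie my $element, 'HashElementProxy', \%hash, 'key';
my $foo = \$element;    # plays the role of \$hash{key}

$$foo = "bar";
print $hash{key}, "\n";
```

The point of the indirection is that the proxy never caches the value, which is what lets it cooperate with tying.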

Instantiation?

Michal Wallace asked how to instantiate objects from Parrot. Luke Palmer supplied the answer, but pointed out that, at present, classes can only have integer attributes. It turns out that, for Michal's purposes, he can probably get by with using properties instead, so that's all right.

Stéphane Payrard did the decent thing and implemented the other attribute types. He even wrote tests.

http://groups.google.com

http://groups.google.com

Creating 'proper' interpreters in Parrot

Simon Cozens wondered what was left to do to allow parrot to be embedded in an interpreter and have PIR fed directly to it. Leo pointed him at his own YAL.

http://groups.google.com

http://toetsch.at/yal/ -- Yet Another Language

yield op?

Michal Wallace was uncomfortable with the workings of Parrot Coroutines and posted a sample of parrot code which demonstrated why. Leo promised to fix it once he'd applied Luke's Continuations patch.

http://groups.google.com

Congratulations Dan

Melvin Smith offered his congratulations to Dan for the first commercial use of Parrot. I think I can safely say we all echo those congratulations.

http://www.sidhe.org/~dan/blog/archives/000288.html

Meanwhile in perl6-language

Roles and Mix-ins

Discussion of roles as mix-ins kicked off again after the Christmas break. The canonical Dog::bark vs. Tree::bark problem was discussed.

http://groups.google.com
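For anyone who hasn't met the bark problem before, here is a minimal Perl 5 sketch that uses plain multiple inheritance as the mix-in mechanism. The DogTree class and the return strings are invented for illustration, and nothing here is Perl 6 syntax.

```perl
use strict;
use warnings;

package Dog;
sub bark { return "Woof" }                  # a noise

package Tree;
sub bark { return "rough outer layer" }     # a covering

package DogTree;
our @ISA = ('Dog', 'Tree');                 # mix both in
sub new { return bless {}, shift }

package main;

my $thing = DogTree->new;

# Method lookup takes the first bark it finds in @ISA order.
# Tree::bark is silently hidden, and no warning is issued.
print $thing->bark, "\n";
```

Swapping the order of @ISA silently changes which bark you get, which is the sort of accident the roles discussion is trying to turn into a detectable conflict.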

The European Union Constitution

For reasons that still escape me, various Americans paraded their ignorance about the putative constitution of a loose union of sovereign states.

Perl 6 Story Cards

In last week's summary I mentioned the Perl 6 Stories Kwiki that Allison and chromatic set up some months ago and suggested that people wanting to write tests and stories for the new language take a look at it. It seems they did, and the Wiki's seen a good deal of activity. Check it out if you're interested in helping with the project.

http://p6stories.kwiki.org/

A modest question

Austin Hastings asked the design team why they were fascinated with Traits (which will be called Roles in Perl 6). He'd read the original paper and was unimpressed with the gains that were made by using them.

The awkwardly cased chromatic opened the case for Roles by pointing out that Roles allow for finer grained naming of chunks of functionality and code reuse. I must say I agree; I'm always keen on opportunities to name something.

http://groups.google.com

Announcement

Iain "Spoon" Truskett was not a prolific contributor to the Perl 6 mailing lists. He was, however, an important contributor to these summaries every week; he was the maintainer of WWW::Shorten, the module that I use to shorten the URLs on the version of the summary that goes out to perl6-announce.

He died from a sudden cardiac arrest on the 29th of December. He was 24. He will be missed. This summary is dedicated to his memory.

http://iain.truskett.id.au/ -- Iain's website

The State of Perl

A colleague of mine recently asked me about Perl's future. Specifically, he wondered if we have any tricks up our sleeves to compete against today's two most popular platforms: .NET and Java. Without a second's hesitation, I repeated the same answer I've used for years when people ask me if Perl has a future:

        Perl certainly is alive and well.  The Perl 6 development team is
        working very hard to define the next version of the Perl language.
        Another team of developers is working hard on Parrot, the next-
        generation runtime engine for Perl 6.  Parrot is being designed to
        support dynamic languages like Perl 6, but also Python, Ruby and
        others.  Perl 6 will also support a transparent migration of
        existing Perl 5 code.

Then I cheerfully continued with this addendum:

        Fotango is sponsoring one of their developers, Arthur Bergman, to
        work on Ponie, the reimplementation of Perl 5.10 on top of Parrot.

That is often a sufficient answer to the question, "Does Perl have a future?"

However, my colleague already knew about Perl 6 and Parrot. Perl 6 was announced with a great deal of fanfare about three and a half years ago. The Parrot project, announced as an April Fool's joke in 2001, is now over two years old as a real open source project. While Parrot has made some amazing progress, it is not yet ready for production usage, and will not be for some time to come. The big near-term goal for Parrot is to execute Python bytecode faster than the standard CPython implementation, and to do so by the Open Source Convention in July 2004. There's a fair amount of work to do between now and then, and even more work necessary to take Parrot from that milestone to something you can use as a replacement for something like, say, Perl.

So, aside from the grand plans of Perl 6 and Parrot, does Perl really have a future?

The State of Perl Development

Perl 6 and Parrot do not represent our future, but rather our long-term insurance policy. When Perl 6 was announced, the Perl 5 implementation was already about seven years old. Core developers were leaving perl5-porters and not being replaced. (We didn't know it at the time, but this turned out to be a temporary lull. Thankfully.) The source code is quite complex, and very daunting to new developers. It was and remains unclear whether Perl can sustain itself as an open source project for another ten or twenty years if virtually no one can hack on the core interpreter.

In 2000, Larry Wall saw Perl 6 as a means to keep Perl relevant, and to keep the ideas flowing within the Perl world. The fear at the time was quite palpable: if enough alpha hackers develop in Java or Python and not Perl, the skills we have spent years acquiring and honing will soon become useless and literally worthless. Furthermore, backwards compatibility with thirteen years (now sixteen years) of working Perl code was starting to limit the ease with which Perl can adapt to new demands. Taken to a logical extreme, all of these factors could work against Perl, rendering it yesterday's language, incapable of effectively solving tomorrow's problems.

The plan for Perl 6 was to provide not only a new implementation of the language, but also a new language design that could be extended by mere mortals. This could increase the number of people who would be both capable and interested in maintaining and extending Perl, both as a language and as a compiler/interpreter. A fresh start would help Perl developers take Perl into bold new directions that were simply not practical with the then-current Perl 5 implementation.

Today, over three years later, the Perl development community is quite active writing innovative software that solves the problems real people and businesses face today. However, the innovation and inspiration are not entirely where we thought they would be. Instead of seeing the new language and implementation driving a new wave of creativity, we are seeing innovation in the libraries and modules available on CPAN -- code you can use right now with Perl 5, a language we all know and love.

In a very real sense, the Perl 6 project has already achieved its true goals: to keep Perl relevant and interesting, and to keep the creativity flowing within the Perl community.

What does this mean for Perl's future? First of all, Perl 5 development continues alongside Perl 6 and Parrot. There are currently five active development branches for Perl 5. The main branch, Perl 5.8.x, is alive and well. Jarkko Hietaniemi released Perl 5.8.1 earlier this year as a maintenance upgrade to Perl 5.8.0, and turned over the patch pumpkin to Nick Clark, who is presently working on building Perl 5.8.3. In October, Hugo van der Sanden released the initial snapshot of Perl 5.9.0, the development branch that will lead to Perl 5.10. And this summer, Fotango announced that Arthur Bergman is working on Ponie, a port of Perl 5.10 to run on top of Parrot, instead of the current Perl 5 engine. Perl 5.12 may be the first production release to run on top of the new implementation.

For developers who are using older versions of Perl for compatibility reasons, Rafael Garcia-Suarez is working on Perl 5.6.2, an update to Perl 5.6.1 that adds support for recent operating-system and compiler releases. Leon Brocard is working on making the same kinds of updates for Perl 5.005_04.

Where is Perl going? Perl is moving forward, and in a number of parallel directions. For workaday developers, three releases of Perl will help you get your job done: 5.8.x, 5.6.x and, when absolutely necessary, 5.005_0x. For the perl5-porters who develop Perl itself, fixes are being accepted in 5.8.x and 5.9.x. For bleeding-edge developers, there's plenty of work to do on Parrot. For the truly bleeding edge, Larry and his lieutenants are hashing out the finer points of the design of the Perl 6 language.

That describes where development of Perl as a language and as a platform is going. But the truly interesting things about Perl aren't language issues, but how Perl is used.

The State of Perl Usage

One way to get a glimpse of how Perl is used in the wild is to look at CPAN. I recently took a look at the modules list (www.cpan.org/modules/01modules.index.html) and counted module distributions by the year of their most recent release. These statistics are not perfect, but they do give a reasonable first approximation of the age of CPAN distributions currently available.

        1995:   30 ( 0.51%)
        1996:   35 ( 0.59%)
        1997:   68 ( 1.16%)
        1998:  189 ( 3.21%)
        1999:  287 ( 4.88%)
        2000:  387 ( 6.58%)
        2001:  708 (12.03%)
        2002: 1268 (21.55%)
        2003: 2907 (49.40%)
        cpan: 5885 (100.00%)

Interestingly, about half of the distributions on CPAN were created or updated in 2003. A little further analysis shows that nearly 85% of these distributions were created or updated since the Perl 6 announcement in July 2000. Clearly, interest in developing in Perl is not on the wane. If anything, Perl development, as measured by CPAN activity, is quite healthy.
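The percentages are easy to recompute from the yearly counts. The sketch below is an illustration rather than the script used for the original tally: the counts are copied from the table, and the share_since helper is a hypothetical name.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Distribution counts by year of most recent release, copied from the table.
my %by_year = (
    1995 => 30,   1996 => 35,   1997 => 68,
    1998 => 189,  1999 => 287,  2000 => 387,
    2001 => 708,  2002 => 1268, 2003 => 2907,
);
my $total = 5885;    # the "cpan" row of the table

# Percentage of CPAN whose latest release falls in or after the given year.
sub share_since {
    my ($first_year) = @_;
    my $count = sum(map { $by_year{$_} } grep { $_ >= $first_year } keys %by_year);
    return 100 * $count / $total;
}

printf "2003 only:  %.1f%%\n", share_since(2003);
printf "since 2001: %.1f%%\n", share_since(2001);
printf "since 2000: %.1f%%\n", share_since(2000);
```

Because the table only has yearly resolution, it cannot isolate releases made after July 2000; the "since 2001" and "since 2000" figures bracket that number from below and above.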

Looking at the "freshness" of CPAN doesn't tell the whole story about Perl. It merely indicates that Perl developers are actively releasing code on CPAN. Many of these uploads are new and interesting modules, or updates that add new features or fix bugs in modules that we use every day. Some modules are quite stable and very useful, even though they have not been updated in years. But many modules are old, outdated, abandoned, or simply jokes.

A pessimist looks at CPAN and sees abandoned distributions, buggy software, joke modules and packages in the early stage of development (certainly not ready for "prime time" use). An optimist looks at CPAN and sees some amazingly useful modules (DBI, LWP, Apache::*, and so on), and ignores the less useful modules lurking in the far corners of CPAN.

Which view is correct? Looking over the module list, only a very small number of modules are jokes registered in the Acme namespace: about 85 of over 5800 distributions, or less than 2% of the modules on CPAN. Of course, there are joke modules that are not in the Acme namespace, like Lingua::Perligata::Romana and Lingua::Atinlay::Igpay. Yet the number of jokes released as CPAN modules remains quite small when compared to CPAN as a whole.

But how much of CPAN is actually useful? It depends on what kind of problems you're solving. Let's assume that only the code released within the last three years, or roughly 82% of CPAN, is worth investigating. Let's further assume that everything in the Acme namespace can be safely ignored, and that the total number of joke modules is no more than twice the number of Acme modules. Ignoring a further 3-4% of CPAN leaves us with about 78%, or over 4,000 distributions, to examine.

How much of this code is production-quality? It's quite difficult to say, actually. These modules cover a stunningly diverse range of problem domains, including, but not limited to:

  • Application servers
  • Artificial intelligence algorithms
  • Astronomy
  • Audio
  • Bioinformatics
  • Compression and encryption
  • Content management systems (for both small and large scale web sites)
  • Database interfaces
  • Date/Time Processing
  • eCommerce
  • Email processing
  • GUI development
  • Generic algorithms from computer science
  • Graphing and charting
  • Image processing
  • Mathematical and statistical programming
  • Natural language processing (in English, Chinese, Japanese, and Finnish, among others)
  • Network programming
  • Operating-system integration with Windows, Solaris, Linux, Mac OS, etc.
  • Perl development support
  • Perl/Apache integration
  • Spam identification
  • Software testing
  • Templating systems
  • Text processing
  • Web services, web clients, and web servers
  • XML/HTML processing

...and that's a very incomplete sample of the kinds of distributions available on CPAN today. Suffice it to say that hundreds, if not thousands, of CPAN modules are actively used on a daily basis to solve the kinds of problems that we regularly face.

And isn't that the real definition of production quality, anyway?

The Other State of Perl Usage

As Larry mentioned in his second keynote address to the Perl Conference in 1998 (www.perl.com/pub/a/1998/08/show/onion.html), the Perl community is like an onion. The important part isn't the small core, but rather the larger outer layers where most of the mass and all of the growth are found. Therefore, the true state of Perl isn't about interpreter development or CPAN growth, but in how we all use Perl every day.

Why do we use Perl every day? Because Perl scales to solve both small and large problems. Unlike languages like C, C++, and Java, Perl allows us to write small, trivial programs quickly and easily, without sacrificing the ability to build large applications and systems. The skills and tools we use on large projects are also available when we write small programs.

Programming in the Small

Here's a common example. Suppose I want to look at the O'Reilly Perl resource page and find all links off of that page. My program starts out by loading two modules, LWP::Simple to fetch the page, and HTML::LinkExtor to extract all of the links:

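        #!/usr/bin/perl -w

        use strict;
        use LWP::Simple;
        use HTML::LinkExtor;

        # Fetch the page; get() returns undef on failure.
        my $html = get("http://perl.oreilly.com/");
        die "Could not fetch the page\n" unless defined $html;

        # Parse the HTML and collect every link found in it.
        my $ext = HTML::LinkExtor->new;
        $ext->parse($html);
        $ext->eof;
        my @links = $ext->links();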

At this point, I have the beginnings of a web spider or possibly a screen scraper. With a few regular expressions and a couple of list operations like grep, map, or foreach, I can whittle this list of links down to a list of links to Safari, the O'Reilly book catalog, or new articles on Perl.com. A couple of lines more, and I could store these links in a database (using DBI, DB_File, GDBM, or some other persistent store).
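As a sketch of that whittling step, the snippet below filters a list of links down to Perl.com articles with map and grep. To keep it runnable without a network connection, it assumes stand-in data in the shape HTML::LinkExtor's links() method returns, an array reference of the form [$tag, $attr => $url, ...] per link. The URLs and the article path pattern are made up for illustration.

```perl
use strict;
use warnings;

# Stand-in for the @links list; these URLs are hypothetical examples.
my @links = (
    [ a   => href => 'http://www.perl.com/pub/a/example-article-1.html' ],
    [ img => src  => 'http://perl.oreilly.com/images/example-logo.gif' ],
    [ a   => href => 'http://safari.oreilly.com/' ],
    [ a   => href => 'http://www.perl.com/pub/a/example-article-2.html' ],
);

my @perl_com_articles =
    grep { m{^http://www\.perl\.com/pub/a/} }    # keep only Perl.com articles
    map  {
        my ($tag, %attr) = @$_;
        # Keep the href of anchor tags; drop images and anything else.
        ($tag eq 'a' && defined $attr{href}) ? $attr{href} : ();
    }
    @links;

print "$_\n" for @perl_com_articles;
```

Changing the pattern in the grep is all it takes to pick out the Safari or catalog links instead.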

I've written (and thrown away) many programs like this over the years. They are consistently easy to write, and typically less than one page of code. That says a lot about the capabilities Perl and CPAN provide. It also says a lot about how much a single programmer can accomplish in a few minutes with a small amount of effort.

Yet the most important lesson is this: Perl allows us to use the same tools we use to write applications and large systems to write small scripts and little hacks. Not only are we able to solve mundane problems quickly and easily, but we can use one set of tools and one set of skills to solve a wide range of problems. Furthermore, because we use the same tools, our quick hacks can work alongside larger systems.

Programming in the Large

Of course, it's one thing to assert that Perl programs can scale up beyond the quick hack. It's another thing to actually build large systems with Perl. The Perl Success Stories Archive (perl.oreilly.com/news/success_stories.html) details many such efforts, including many large systems, high-volume systems, and critical applications.

Then there are the high-profile systems that get a lot of attention at Perl conferences and on various Perl-related mailing lists. For example, Amazon.com, the Internet's largest retailer, uses HTML::Mason for portions of their web site. Another fifty-odd Mason sites are profiled (www.masonhq.com/about/sites.html) at www.masonhq.org, including Salon.com, AvantGo, and DynDNS.

Morgan Stanley is another big user of Perl. As far back as 2001, W. Phillip Moore talked about where Perl and Linux fit into the technology infrastructure at Morgan Stanley. More recently, Merijn Broeren detailed (conferences.oreillynet.com/cs/os2003/view/e_sess/4293) how Morgan Stanley relies on Perl to keep 9,000 of its computers up and running non-stop, and how Perl is used for a wide variety of applications used worldwide.

ValueClick, a provider of high-performance Internet advertising, pushes Perl in a different direction. Each day, ValueClick serves up over 100 million targeted banner ads on publisher web sites. The process of choosing which ad to send where is very precise, and handled by some sophisticated Perl code. Analyzing how effective these ads are requires munging through huge amounts of logging data. Unsurprisingly, ValueClick uses Perl here, too.

Ticketmaster sells tickets to sporting and entertainment events in at least twenty countries around the world. In a year, Ticketmaster sells over 80 million tickets worldwide. Recently, Ticketmaster sold one million tickets in a single day, and about half of those tickets were sold over the Web. And the Ticketmaster web site is almost entirely written in Perl.

These are only some of the companies that use Perl for large, important products. Ask around and you'll hear many, many more stories like these. Over the years, I've worked with more than a few companies who created some web-based product or service that was built entirely with Perl. Some of these products were responsible for bringing in tens of millions of dollars in annual revenue.

Clearly, Perl is for more than just simple hacks.

The New State of Perl Usage

Many companies use Perl to build proprietary products and Internet-based services they can sell to their customers. Still more companies use Perl to keep internal systems running, and save money through automating mundane processes.

A new way people are using Perl today is the open source business. Companies like Best Practical and Kineticode are building products with Perl, and earning money from training, support contracts, and custom development. Their products are open source, freely available, and easy to extend. Yet there is enough demand for add-on services that these companies are profitable and sustain development of these open source products.

Best Practical Solutions (www.bestpractical.com) develops Request Tracker, more commonly known as RT (www.bestpractical.com/rt). RT is an issue-tracking system that allows teams to coordinate their activities to manage user requests, fix bugs, and track actions taken on each task. As an open source project, RT has been under development since 1996, and has thousands of corporate users, including those listed on the testimonials page (www.bestpractical.com/rt/praise.html). Today, RT powers bug tracking for Perl development (rt.perl.org/perlbug), and for CPAN module development (rt.cpan.org). Many organizations rely on the information they keep in RT, sometimes upwards of 1000 issues per day, or 300,000 issues that must be tracked and resolved each year.

Kineticode (www.kineticode.com) is another successful open source business built around a Perl product, the Bricolage content management system (www.bricolage.cc). Bricolage is used by some rather large web sites, including ETOnline (www.etonline.com) and the World Health Organization (www.who.int). Recently, the Howard Dean campaign (www.deanforamerica.com) adopted Bricolage as its content management system to handle the site's frequent updates in the presence of millions of pageviews per day, with peak demand more than ten times that rate.

A somewhat related business is SixApart (www.sixapart.com), makers of the ever-popular MovableType (www.movabletype.org). SixApart offers MovableType with a free license for personal and non-commercial use, but charges a licensing fee for corporate and commercial use. Make no mistake, MovableType is proprietary software, even though it is implemented in Perl. Nevertheless, SixApart has managed to build a profitable business around their Perl-based product.

Surely these are the early days for businesses selling or supporting software written in Perl. These three companies are not the only ones forging this path, although they are certainly three of the most visible.

Conclusion

I started looking into the state of Perl today when my colleague asked me if Perl has a future. He challenged me to look past my knee-jerk answers, "Of course Perl has a future!" and "Perl's future is in Perl 6 and Parrot!" I'm glad he did.

There's a lot of activity in the Perl world today, and much of it quite easily overlooked. Core development is moving along at a respectable pace; CPAN activity is quite healthy; and Perl remains a capable environment for solving problems, whether they need a quick hack, a large system, or a Perl-based product. Even if we don't see Perl 6 in 2004, there's a lot of work to be done in Perl 5, and a lot of work Perl 5 is still quite capable of doing.

Then there's the original question that started this investigation rolling: "Can Perl compete with Java and .NET?" Clearly, when it comes to solving problems, Perl is at least as capable a tool as Java and .NET today. When it comes to evangelizing one platform to the exclusion of all others, then perhaps Perl can't compete with .NET or Java. Then again, when did evangelism ever solve a problem that involved sitting down and writing code?

Of course, if Java or .NET is more your speed, by all means use those environments. Perl's success is not predicated on some other language's failure. Perl's success hinges upon helping you get your job done.

This week on Perl 6, week ending 2004-01-04

What a surprise, a scant week after the last Perl 6 Summary of 2003, it's the first Perl 6 Summary of 2004. Will wonders never cease? Without further ado, we'll start with perl6-internals as usual.

Garbage Collection Tasks

Dan noted that a copying garbage collector doesn't play that well with threads so he proposed a twofold task to fix things.

  1. The GC and memory allocation APIs need to be formalized and abstracted in order to allow for changing the GC mechanism when threads come into play.
  2. Someone needs to implement a more traditional, non-moving GC as an alternative to the copying collector.

Plugging Parrot into Mozilla

Stephane Peiry posted a set of patches to allow a parrot plugin for Mozilla. Not satisfied with this (but pretty darned impressed all the same), Sam Vilain noted that it would be nice if someone wrote an ECMAscript front-end to Parrot. Patches welcome, Sam.

http://groups.google.com/groups?selm=a06010201bc16014403dd@[10.0.1.2]

Problems with make test

Harry Jackson couldn't get his build of parrot to finish running make test. After a certain amount of thrashing about by the team, Dan narrowed it down to issues with the mutant '2.96' version of GCC that some versions of Red Hat used for a while. This is currently the list's best guess as to the root of the problem, but it's not absolutely certain. If it does turn out to be the compiler, the config suite will have to be rejigged to detect and warn.

http://groups.google.com/groups?selm=20031229221255.12911.qmail@onion.perl.org

Threading threads

They're everywhere! And I despair of summarizing them. So I won't. Here are the root messages for anyone interested enough. Once things have died down and we know how threading is going to work in Parrot I'll summarize that.

Dan opened the floodgates and asked anyone who was serious about their particular Right Way To Do Threading to write things up as a proper proposal. He outlined the constraints that Parrot's threading will be working under and encouraged everyone to speak now or forever hold their peace.

http://groups.google.com/groups?selm=1BAB64E0-3AE5-11D8-9E96-000502994722@mac.com

http://groups.google.com/groups?selm=a06010202bc176658b1c7@[172.24.18.98]

http://groups.google.com/groups?selm=a06010205bc17976efc33@[172.24.18.98] -- Dan says put up or shut up

http://groups.google.com/groups?selm=3FF2D35D.2090707@toetsch.at

http://groups.google.com/groups?selm=1106_1072984463@nntp.perl.org

http://groups.google.com/groups?selm=9227ACBD-3CB1-11D8-9E96-000502994722@mac.com

http://groups.google.com/groups?selm=a06010200bc1cfc809cde@[10.0.1.2]

http://groups.google.com/groups?selm=a0601020abc1e255347d5@[10.0.1.2] -- Dan offers up common terminology

Don't use IMCC macros

Bernhard Schmalhofer found what looked like a bug in IMCC's macro support. This prompted Melvin Smith to expedite the removal of IMCC macro support as threatened some weeks ago. However, it turned out that that wasn't actually the seat of the bug. But if you like IMCC macros, now is the time to make a very good case to Melvin. I doubt you'll convince him though; macros belong in a preprocessor.

http://groups.google.com/groups?selm=1072802718.3ff1ab9e10ebc@maxplanck.biomax.de

wxWindows support?

David Cuny wondered if Parrot's objects were now at the point where it's possible to interface wxWindows to Parrot. So far he's been Warnocked.

Just try implementing it, David.

http://groups.google.com/groups?selm=200312310021.20878.dcuny@lanset.com

Win32 Core Dumps

Dan's a fan of core dumps (when appropriate) and wondered if there was a way of getting Windows to either produce a core dump or attach a debugger to a crashed process. Vladimir Lipsky and Nigel Sandever gave good answers.

http://groups.google.com/groups?selm=a06010206bc1895ed462f@[10.0.1.2]

The Piemark is ready

Dan forwarded the announcement that the Pie-thon Parrot Benchmark (which I've unilaterally decided to call the Piemark) code is ready. Let's make sure it's Guido not Dan who gets Pie eyed in Portland this year.

http://groups.google.com/groups?selm=a06010200bc18c1883507@[172.24.18.98]

ftp://ftp.python.org/pub/python/parrotbench/parrotbench.tgz

Object System?

Luke Palmer wondered what work was needed to finish up Parrot's object system. Judging by Leo's response there are name mangling issues that need deciding on, and we're not quite sure who's supposed to be making the decision. Dan?

http://groups.google.com/groups?selm=20031231210619.GA27582@babylonia.flatirons.org

Enhancements for the debugger

Whilst wearing his 'employee implementing a large project targeting Parrot' hat, Dan has been using IMCC's debugging facilities. This led to a bunch of suggestions/decisions about how these could be improved.

http://groups.google.com/groups?selm=a06010203bc1b7e26d061@[10.0.1.2]

NCI callback functions

Leo isn't enamoured of the current PDD16 design of callbacks in NCI, so he proposed a new design. Dan seemed to think that this proposal smacked of getting a little too sophisticated too early, arguing that the best thing to do was to flesh out what's there (and get it working) before using it as a base on which to build. This means that, once his work deadline is out of the way, we should be expecting some better examples in PDD16. And we'll be reminding Dan of this in a couple of weeks' time.

http://groups.google.com/groups?selm=3FF86732.6090405@toetsch.at

Meanwhile in perl6-language

Nobody said anything. But that's boring, so on Friday I sent an email out to various denizens of the Perl 6 mailing lists asking them for their thoughts on where Perl 6 stands today and where they think it's going in the next 12 months. I'm pleased to say that, despite the ludicrously short notice, a decent number of people responded.

Everyone was remarkably consistent about where they think Perl 6 will be in the next year: they all expect to see a 'useful' alpha released and running on Parrot by the end of next year. Nat Torkington said that he didn't expect "any more unexpected delays -- I believe the doctors have run out of things to remove from Larry." I'm sure we all hope he's right, especially about the second part.

Leo Tötsch said that when he answered the "Who's Who in Perl 6" questionnaire back in 2002, he'd said he thought Perl 6 would be out on 16 September 2004. He asked to increment the year of that prediction by at least one. Austin Hastings reckoned that we'd have a usable early version of Perl 6 sometime in Q2 or Q3, and expects the object apocalypse some time in Q1. However, he expected that there'd be fairly substantial exegesis drift from the original apocalypse to the 'real' design. Austin thinks that Perl 6's main 'cultural' impact will be grammars, arguing that in 10 years time 'getting coders to stop parsing characters, getting them instead to think, code, and word in terms of "sentences" or "paragraphs" will be considered a turning point.' Don't tell Austin this, but I remember Ward Cunningham saying something similar (but less emphatic) to me after Damian's Perl6::Rules presentation at OSCON 2003.

Allison Randal's in the slightly less bullish camp, arguing that it should be possible to produce a reasonably solid Perl 6 alpha in about three Apocalypses' time. She reckoned that we may see Apocalypses 12, 17 and 9 finished this year, and maybe a working prototype Perl 6 compiler. Allison's housemate, chromatic, reckoned that we were about 80% done now (I'm not sure if he was deliberately invoking the old saw that the first 80% takes 80% of the time, and the last 20% takes the other 80% of the time...). He predicted that:

  1. Dan will win his bet with Guido, and that the Python.Net people will be so embarrassed by the piemark that they won't publish numbers.
  2. Perl 6 won't quite be self-hosting, but it'll be usable for small apps.
  3. NCI will continue to be much nicer than XS.
  4. Apocalypse 12 will convince everyone that Roles are what object orientation should have had from the beginning.

Asked for pithy comments, chromatic gave good pith, noting that if he 'had a test case from everyone who asked "When'll it be done" and code to pass a test case from everyone who said "I'd like to help, but I don't know where to start"...' then he'd happily check them into the repository. He also said that anyone who 'wants to revive the Perl 6 Documentation project, start turning Apocalypses and Exegeses into story cards, and story cards into tests' would be his hero. And mine too. He didn't mention http://p6stories.kwiki.org/ so I'll do that instead.

Adam Turoff sounded a note of caution; he worries that Perl 6 'is Larry's Modula 2' but he doesn't think that matters because the real boon is Parrot (and Ponie) which has the potential to open the existing work on CPAN up to any language that targets Parrot (potentially making good work in other languages available to us Perlers too). He didn't think that Perl 6 will offer enough of an incentive for people to move to the new language from Perl 5. Indeed, he argued that the changes in syntax will put people off making the shift. We discussed this on AIM. Personally, I think Adam's wrong, and that Perl 6 will have enough good new stuff in it that people will bite the bullet of the new syntax (and the changes are reasonably simple after all) in order to get access to Perl 6's goodies. Adam certainly sees the change from Perl 5 to 6 as qualitatively different from the change from 4 to 5. He thinks people aren't going to switch quickly (especially if Ponie fulfils their needs) and he points out that it's going to be a few years before we've worked out the best practices for using all this new stuff.

Sadly, I didn't get any feedback from Larry before my deadline (if I do get something, rest assured it'll get space in next week's summary). I did get a lot from everyone's favourite evil genius though. Damian is alive, well and living in Australia. It seems that his recent silence on p6l may have something to do with his hopes 'to see Exegesis 7 published (along with a full Perl 5 implementation of the new formatting module) by late January.' In other Conway related news, he's attending linux.conf.au and will be attempting to describe the top 100 features of Perl 6 in 30 minutes. After that, he's 'set aside February through April to complete Perl6::Rules and a large test suite for Perl 6 regular expressions, under a small grant generously provided by The Perl Foundation'. Which can only be good news.

Me? I think Perl 6's design 'in the large' will be pretty much done once Apocalypse 12 and its corresponding Exegesis are finished. Of course, the devil is in the details, but I don't doubt that the hoped-for existence of a working Perl6::Rules by the end of April is going to provide us with a great deal of the leverage we need to get a working Perl 6 alpha ready for OSCON, with something rather more solid ready by the end of the year. Parrot continues to amaze and delight with its progress; Dan tells me that he's about ready to roll out a large Parrot-based application for his employers, so it's approaching the point where people's salaries will depend on Parrot. I confess I wouldn't be surprised if, by the end of the year, we have seen the full implementation of at least one of the big non-Perl scripting languages on top of Parrot.

Acknowledgements, Apologies, Announcements.

Many thanks to those of you who took the time to answer my mail about Perl 6 in the coming year. Apologies to anyone I may have offended by failing to ask them. If you've got strong opinions about where you think Perl 6 is going, let me know; I'll either make space for them next week or make some space for discussion on my website.

If you find these summaries useful or enjoyable, show your appreciation by contributing to the Perl Foundation to help support the ongoing development of Perl. Money and time are both good. Also, I'm always pleased to get feedback, and traffic at my website.

http://donate.perl-foundation.org/ -- The Perl Foundation

http://www.bofh.org.uk:8080/ -- My website, Just a Summary
