April 2003 Archives

This week on Perl 6, week ending 2003-04-27

Welcome back to (I hope) a normal length summary of Perl 6 shenanigans after last week's bumper double length installment. (Thankfully traffic has been much lighter this week; I'm not sure I could cope with writing another epic.)

Perl6-internals was quiet again this week, but traffic does appear to be picking up a little.

IMC and Variable Number of Arguments

Following on from questions on this subject last week, Klaas-Jan Stol asked for some clarification of Parrot calling conventions as documented in Parrot Design Document (PDD) 03. In a subsequent post he noted that it seems strange that the current perl6 implementation (such as it is) doesn't follow the Parrot calling conventions. Sean O'Rourke shuffled his feet slightly and admitted that he'd chosen that method because ``It was easiest'', though he'd later rationalized that to ``It doesn't matter for internal calls''. Dan Sugalski pointed out that it was only a temporary aberration. Unless I missed something, nobody actually answered Klaas-Jan's initial questions.


Short-lived memory allocation

Last week, in a discussion of the Garbage Collection (GC) system, Dan mentioned that Parrot's GC would walk the system stack in order to build the root set (the list of memory 'nodes' that are initially known to be alive). This makes life a good deal easier for anyone allocating memory, because they don't have to worry about explicitly attaching their shiny new buffer to the root set. Of course, it doesn't lessen the pain of whoever writes the GC system because, as Benjamin Goldberg pointed out, any stack-walking code is inherently unportable.

Benjamin was concerned that there may be systems where not only could you not use the same stack-walking code as was used everywhere else, but it would be impossible to walk the system stack at all. Dan admitted that there 'used to be' such systems, but asserted that, for Parrot's purposes, ``Either we get access to the C auto variable chain, or we can't run there.'' Kurt Stephens pointed at some deeply scary-sounding (and gloriously non-portable) stack-walking methods, which involve co-opting the C stack pointer and collecting garbage from unlikely places...

If you're not sure what is meant by 'Walking the system stack' then you're not alone. Tim Howell asked for clarification and received it from Matt Fowles and Brent Dax.
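For readers following along at home, the idea is easy to model with a toy mark phase: the root set is where marking starts, and walking the system stack is simply how Parrot discovers that root set (any pointer-sized value on the C stack that might point into the heap is conservatively treated as a root). A purely illustrative sketch in Python — Parrot's real collector is C, and its stack scan is platform-specific:

```python
def mark(roots, edges):
    """Toy mark phase: everything reachable from the root set survives.

    edges maps each heap node to the list of nodes it references.
    """
    live, pending = set(), list(roots)
    while pending:
        node = pending.pop()
        if node not in live:
            live.add(node)
            pending.extend(edges.get(node, []))
    return live

# "A" is on the (simulated) stack, so it and everything it references
# survive; "D" is unreachable from any root and is garbage.
heap_edges = {"A": ["B"], "B": ["C"], "D": []}
print(sorted(mark({"A"}, heap_edges)))  # ['A', 'B', 'C']
```

The point of scanning the stack is precisely that "A" gets into the root set automatically, without the allocating code having to register it anywhere.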



PMC Documentation

Alberto Simões continued on his mission to document Parrot's various PMC classes. This week he offered up some PerlString documentation.


The Native Call Interface (NCI) system

Michael Collins has been playing around with calling C from Parrot and found the NCI system rather cumbersome. He wondered if the whole thing couldn't simply be implemented with a callcfunc opcode and showed a simple implementation of what he was driving at. Dan replied to this, explaining that he'd considered it but that it turns out not to be the Right Thing, and gave his reasons to do with encapsulation (when you call a PMC's method you shouldn't have to know whether it's written in C, PASM or Befunge) and issues with dynamic generation of stub functions. Michael wasn't entirely convinced by Dan's argument.

On the subject of NCI, Clinton A Pierce wondered if it is supposed to be Win32 capable and, if it was, what he needed to do to get it working as his first attempt had failed. Currently on the horns of Warnock's Dilemma.



use for p6c

Joseph F. Ryan resent his patch to implement use in Parrot's Perl 6 implementation, this time as a straight CVS diff on Steve Fink's request.


Building on Win32

Mattia Barbon continued his sterling work on getting Parrot to play well with Win32, offering a couple of patches. Steve Fink applied them both.



External Data Interfaces draft PDD

Mattia Barbon redrafted Brent Dax's External Data Interfaces PDD based on the discussion of the original document. This elicited a few comments and I presume we can expect another, clarified draft at some point.


PPC JIT questions

Peter Montagner has started to work on the Just In Time (JIT) system for the PowerPC architecture (a chap can get tired of making sure that every initialism is expanded at least once in a document you know. Still, it's been a few summaries since I last did anything like this) and had a couple of questions about its architecture and what he was allowed to do in PPC assembly. Dan answered, Peter asked for clarification; the usual give and take.


A question about encoding

Speed junkie Luke Palmer (Hmm... that may not mean quite what I want it to mean) is working on the next iteration of string_str_search, which apparently isn't fast enough (it's still slower than Perl 5). He had a few questions about encodings. Benjamin Goldberg offered answers and an entire function. Luke was impressed, but he still muttered about how much faster single byte searching could be (he's written a singlebyte/singlebyte search routine that goes slightly more than 2.5 times faster than Perl 5 apparently).

Later, Luke posted a new implementation of string_str_search which used his screamingly fast algorithm for single byte searching and a 'stupid, slow algorithm' for multibyte searches. Steve Fink applied the patch and asked for a unified diff next time as they are easier to read.
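For the curious, the classic way to make single-byte substring searching scream is a bad-character shift table, as in the Boyer-Moore-Horspool algorithm; the thread doesn't say whether Luke's routine works this way, so treat this Python sketch as illustrative only:

```python
def horspool(haystack: bytes, needle: bytes) -> int:
    """Boyer-Moore-Horspool search over single-byte data.

    Returns the index of the first match, or -1 if there is none.
    """
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    # For each byte in the needle (except the last), record how far we
    # may safely skip when that byte lines up under the needle's end.
    shift = {needle[i]: m - 1 - i for i in range(m - 1)}
    i = 0
    while i + m <= n:
        if haystack[i:i + m] == needle:
            return i
        # Skip based on the byte under the needle's last position;
        # bytes not in the needle allow a full-length jump.
        i += shift.get(haystack[i + m - 1], m)
    return -1

print(horspool(b"there is a tavern in the town", b"tavern"))  # 11
```

The big jumps on mismatching bytes are what make this kind of routine so much faster than a naive scan for single-byte encodings; the same trick is much fiddlier once characters span multiple bytes, which is presumably why the multibyte case got the 'stupid, slow algorithm'.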



Leo Tötsch's iterator proposal

Remember Leo Tötsch's iterator proposal that received so little feedback a few weeks back? Steve Fink finally commented.


uniq.pasm and Parrot IO

About a month ago, Jürgen Bömmels was doing some work on moving Parrot's IO operators to use Parrot's own PIO asynchronous IO system, which broke Leon Brocard's uniq.pasm. Jürgen estimated that he might have it working 'next weekend'. This week Leon asked if any progress had been made yet.


Meanwhile, over in perl6-language

The Type discussions continued. (Though I note, with some satisfaction that most of the discussions died down after I posted the summary. Maybe I got it right. Or not.)


Michael Lazzaro's excellent summary of what we knew about Types served as the seed for this particular discussion, which was essentially a continuation by other title of the thread that started last week. Discussion centred on ways of specifying what sorts of type coercions should be automatic and which disallowed. Paul Hodges suggested use strict allow => { Int => Str, Foo => Bar } as a possibility, which doesn't quite work where you want to allow a type to convert to more than one possible target type, but it does look like a reasonable start.

There was also discussion of autocoercion to different types via a 'path' of coercions, which looks like it has enormous potential for complexity-based nuisance. Michael Lazzaro reckoned that a lot of the chaining problems would go away if you drew a distinction between a 'lossless' coercion to a string and a serialization of something to a string for the purposes of debugging or whatever. (Coercing *from* a string to almost any other type but a numeric one seems to be the hard problem...)

Damian offered some thoughts about user defined coercions when he pointed out that a coercion is really an interaction between two classes and should therefore be implemented using a multimethod. He argued that things like Coercions were a strong argument for allowing return type to be incorporated into the multidispatch resolution mechanism. Others agreed that this would be a good thing, and we await a decision from Larry.

http://groups.google.com/groups -- Seed

http://groups.google.com/groups -- This week's 'root'


Change in 'if' syntax

The Subject of the thread still thinks the discussion is about a change in the syntax of if, but it's actually about a change in how a block is recognized, which has knock-on effects on the syntax of various control structures, of which if is just one example. Various people balked (again) at the fact that the new rules for detecting a block mean that code like %foo {bar} no longer means ``find the value associated with 'bar' in the hash %foo'' but instead evaluates to a hash variable followed by a block, and probably a syntax error soon afterwards. Michael Lazzaro and Damian showed workarounds for this (%foo .{bar} and %foo{ bar} respectively) and Arcadi Shehter reminded everyone of the space eating _ that seems to have been completely forgotten since the last time this issue arose.


anon-sub()'s execute in a for??

Paul Hodges took a crack at implementing for as a subroutine and came up with something that didn't look too insane. Luke Palmer added a refinement allowing for n at a time looping. However, for reasons that I can't quite put my finger on, I'm not quite sure that either method has got the sub signature quite right, and I'm not entirely sure how you would express for's signature as a Perl 6 signature anyway. Answers on a mailing list please.


* vs **

Paul Hodges wondered about the * and ** flattening operators. He wondered if they were actually distinct operators or if the double variant was simply the result of applying * to the results of *@whatever. According to Larry, the two are distinct; the only difference between them is one of timing. I'm not sure I understood the distinction, to be honest, and I don't think I'm alone in this. Austin Hastings asked for an illustrative example, but we didn't see any before the end of the week.


P6Objects: Hint's allegations and things unsaid

Austin Hastings' response to last week's mammoth Types discussion was a mammoth document of his own on what is currently known about Perl 6's Object Orientation features which was the result of trawling back through about two years of perl6-language postings. And jolly good it is too. It triggered a modest amount of discussion, but my theory is that everyone was too impressed to say much. I know I was.


Storing program state for restarting

Ed Peschko wondered if it would be possible to serialize the state of a running program so that it could pick up where it left off after it died. Dave Storrs suggested that, if one could just serialize the current continuation, the problem would become trivial (I'm not entirely sure he's right about that though; the current continuation seems to me to be in slightly the wrong place) and asked how one could do such a thing. The answer appears to be 'to the best of current research's knowledge, you can't generally do that, but we're working on it...'.


C-struct style data reads

Paul Hodges doesn't like pack. What he wants to be able to do is something akin to C's:

    struct {
        int   someInt;
        float someFloat;
        char  strData[42];
        float otherFloat;
        char  moreStr[123]
    } buf;
    fread(buf, sizeof buf, 1, fp);
    printf("%.0f %s", buf.someFloat, buf.moreStr);

instead of unpacking a string with unpack "ifa43fa123" <$file> or whatever. This led to some discussion of appropriate syntax and the like, but it was generally felt that, even if something like this wasn't in the core, it wouldn't be too hard to implement it.
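For comparison, Python's struct module sits exactly in this design space: a compact template string describing a fixed C-style layout, read back in one shot. A sketch using the field names from the C example above:

```python
import struct

# Native-endian, C-aligned layout matching the struct in the text:
#   int someInt; float someFloat; char strData[42];
#   float otherFloat; char moreStr[123];
layout = struct.Struct("@if42sf123s")

# Build a sample record the way fread() would see it on disk...
raw = layout.pack(7, 3.5, b"forty-two", 2.5, b"more")

# ...and read it back in one call, like fread() into the struct.
some_int, some_float, str_data, other_float, more_str = layout.unpack(raw)
print(some_int, some_float, more_str.rstrip(b"\0"))  # 7 3.5 b'more'
```

The template string has the same terseness problem Paul dislikes about pack, of course; the point of his proposal is to get the named-field readability of the C struct declaration as well.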


Return types vs. Generic Programming

Michael Lazzaro wants the moon on a stick. In particular he wants to be able to declare:

   class Object { method CLONE returns MAGIC_TYPE {...} }

where MAGIC_TYPE is some magical token that means not having to declare

   class Whatever { method CLONE returns Whatever { .NEXT::CLONE } }
   class Whoever  { method CLONE returns Whoever  { .NEXT::CLONE } }

Quite what's wrong with

   class Object { method CLONE returns Object {...} }

is a mystery to me. Luke Palmer wants more than that. The insanity continued for several messages.

Do you get the feeling that I'm not a fan of this particular proposal? I confess I'm finding it very hard to summarize it any more than the above without exclamations of shock and dismay. So I'll stop now. Read the thread if you're interested.


Type Conversion Matrix

Michael Lazzaro posted a matrix of some of the known built-in types and of the possibilities of conversion between them, noting that what was shown were the conversions that it was possible to make automatic, not necessarily those which should be automatic. For some reason this led to a discussion of boolean types. Nope, I don't know why either.

Another subthread discussed which conversions were lossy and whether or not to make them automatic in the case of going from, say 'Int' to 'int'. (Int being a Perl Integer which could be undef and have properties and traits, and 'int' being, essentially, at least 32 bits of signed integer).

Michael posted a second take on the matrix with more information and with an additional 'Scalar' type added to it.



Fun with Junctions

Austin Hastings got faintly confused by the difference between 'all' and 'any' junctions. The rule of thumb appears to be 'if you want to use a junction to represent a set, use an ``any'' junction.'
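The rule of thumb is easy to model: a junction is a single value that behaves like several values at once under comparison. A toy Python model (real Perl 6 junctions thread through every operator, not just equality, so this is only a sketch):

```python
class Junction:
    """A toy 'any'/'all' junction: one value standing for several."""

    def __init__(self, kind, values):
        assert kind in ("any", "all")
        self.kind = kind
        self.values = list(values)

    def __eq__(self, other):
        results = (v == other for v in self.values)
        return any(results) if self.kind == "any" else all(results)

def any_j(*vs): return Junction("any", vs)
def all_j(*vs): return Junction("all", vs)

# 'any' is the set-membership junction: true if ANY element matches...
print(any_j(1, 2, 3) == 2)                    # True
# ...while 'all' only matches when EVERY element does.
print(all_j(2, 2) == 2, all_j(1, 2) == 2)     # True False
```

Which is exactly why the rule of thumb says to reach for an ``any'' junction when what you really mean is a set.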


Fun with multi-dimensional arrays

Dave Whipp pointed everyone at a discussion of Perl 6 on Perlmonks, where people were implementing J. H. Conway's Game of Life in Perl. Dave had some questions about tools for iterating over multidimensional arrays, suggesting some syntax as he went, then pulling it all together in a rather neat recalculation engine for the game of life.

Things got scary pretty quickly after that; at one point Dave even did a local redefinition of infix:= which certainly had me worried. Some clever stuff going on there and no mistake.
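For anyone who wants to play along without the Perl 6 syntax, the recalculation at the heart of the game is tiny; a sparse-set sketch in Python:

```python
from collections import Counter
from itertools import product

def life_step(live):
    """One generation of Conway's Life over a sparse set of (x, y) cells."""
    # Count how many live neighbours every candidate cell has.
    counts = Counter((x + dx, y + dy)
                     for x, y in live
                     for dx, dy in product((-1, 0, 1), repeat=2)
                     if (dx, dy) != (0, 0))
    # A cell is alive next generation with exactly 3 neighbours,
    # or with 2 neighbours if it was already alive.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

blinker = {(0, 1), (1, 1), (2, 1)}
print(sorted(life_step(blinker)))  # [(1, 0), (1, 1), (1, 2)]
```

The Perlmonks thread was chasing the same recalculation, only expressed through Perl 6's proposed multidimensional array iterators rather than a sparse set.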



Acknowledgements, Announcements and Apologies

That about wraps it up for this week.

If you've appreciated this summary, please consider one or more of the following options:


In this article, we're going to look at POOL, a handy "little language" I recently created for templating object-oriented modules. Now you may not write many object-oriented modules, so this may not sound too interesting to you. Don't worry; I also plan to discuss, among other things, Ruby, how to use the Template Toolkit, profiling, computational linguistic trie structures, Ruby again, and the oil paintings of the Great Masters. Hopefully, something in here will be enough to keep your interest.

Splashing Around

One of the reasons that I always feel I never get anything substantial done in Perl is that I'm always distracted unduly by subtasks, particularly metaprogramming. I write so many "labor-saving" modules that I never get around to doing the original labor in the first place.

For instance, I wanted to write something to handle my accounts; I needed something to handle command-line options but I couldn't be bothered with the Getopt::Long rigmarole, so I wrote Getopt::Auto. Next, I needed to parse a simple configuration file and I couldn't face writing yet another colon-separated file parser, so I wrote Config::Auto. Finally, I wrote something that examined a database schema and wrote Class::DBI packages for each table -- all helpful tasks, and they meant I would never have to worry about configuration files or command-line options again. But, of course, I forgot about my accounts-handling application.

Something like this happened again recently. I started writing a module I'm calling Devel::DProfPP, which parses the data output by Devel::DProf in a neat object-oriented way. I was about five minutes into it when I found myself writing:

    =head1 CONSTRUCTOR

        $object = Devel::DProfPP->new( %options )

    Creates a new C<Devel::DProfPP> instance.


    sub new {
        my ($class, %opts) = @_;
        bless { %opts }, $class;
    }

And then I wrote my test: (I'm a bad boy, I write my tests after I write the code)

    use Test::More tests => 1;
    my $x = Devel::DProfPP->new();
    isa_ok($x, "Devel::DProfPP");

Nothing strange about that, you might think. It's something I've written a dozen times before, and something I will probably write a dozen times again. And that's when it hit me. I don't want to spend my time pounding out constructors and accessors, documentation that's practically identical from class to class, tests that are eminently predictable, and I never, ever wanted to have to write my $self = shift; again.

There are two ways to solve this. Now I'm not going to last long as editor of perl.com if I suggest everyone should do the first, and switching to Ruby wouldn't help with the documentation and tests part of the problem anyway.

The second is, of course, to make the computer do the hard work. I sketched out a short description of what I wanted in my Devel::DProfPP module, wrote a little parser to read in that description, and then had some code generate the module. Then, I made a critical decision. I had just been doing some work with Template Toolkit, and wanted some more opportunity to play with it. So instead of hard-coding what the output module ought to look like, I simply passed the parsed data structure to a bunch of templates and let them do the work. This gave me an amazing amount of flexibility in terms of styling the output, and that's where I think the power of this system, the Perl Object Oriented Language (POOL), lies.

In order to investigate that, let's look at some POOL files and the output they generate, and then we'll examine how the templates work and fit together.

Diving In

The POOL language is very, very ad hoc. It's bizarre and inconsistent, but that's because it was created by rationalizing a module description I scribbled down late one night. But it does the job. It's essentially a brain dump, and if your brain happens not to work like mine, then you might not like it; if yours does work like mine, my commiserations.

The POOL distribution, available from CPAN, ships with a handy reference manual, and that very description; these are the original notes I made when I was mocking up Devel::DProfPP, and they look like this:

    Devel::DProfPP - Parse C<Devel::DProf> output

    This module takes the output file from L<Devel::DProf> (typically
    F<tmon.out>) and parses it into a Perl data structure. Additionally, it
    can be used in an event-driven style to produce reports more

        ->@enter    || sub {}
        ->@leave    || sub {}
        ro->@stack  []
        @syms       []
        top_frame   ->stack->[0]

The first line should be familiar to anyone who writes or uses Perl modules: it's the description line that goes at the top of the documentation. It's enough to identify the class name and provide some documentation about it.

The next thing is the module description, again for the documentation, which begins with DESCRIPTION and ends with EOD.

When you look at the things starting with @, don't think Perl, think Ruby - they're not arrays, they're instance variables. Our Devel::DProfPP object will contain a filehandle to read the profiling data from, a subroutine reference for what to do when we enter a subroutine and when we leave one, the current Perl stack, and an array of symbols to hold the subroutine names.

These instance variables come in three types. The first are variables that the user doesn't set, but come with every new instance. I call these "set" variables, because they come set to a particular value. Then there are "defaulting" instance variables, which the user may specify but otherwise default to a particular value. And then there are just ordinary ones, which are not initialized at all. Thankfully for me, the Devel::DProfPP brain dump contained all three types.

The symbol table and the stack are "set" variables. They come set to an empty array reference; we signify this by simply putting an empty array reference on the same line.

        @syms       []

The enter subroutine, on the other hand, is a defaulting variable. (Don't worry about the arrow for now.) If the user doesn't specify an action to be performed when the profiler says that a subroutine has been entered, then we want to default to a coderef that does nothing.

In Perl, when we want to default to a value, we say something like:

    $object->{enter} = $args{enter} || sub {};

So the POOL encoding of that is:

        @enter      || sub {}

Finally, there's the filehandle, which the user supplies. Nothing special has to occur for this instance variable, so we just name it:

        @fh
From this, we know enough to create a constructor like so:

    sub new {
        my $class = shift;
        my %args = @_;
        my $self = bless {
            fh => $args{fh},
            enter => $args{enter} || sub {},
            leave => $args{leave} || sub {},
            syms => [],
            stack => [],
        }, $class;
        return $self;
    }

And that's precisely what POOL does. After the constructor come the accessors; we want to be able to say $obj->enter to retrieve the on-enter action, for instance. This thought led naturally to the syntax

    ->@enter || sub {}

When POOL sees an arrow attached to an instance variable, it creates an accessor for it:

    sub enter {
        my $self = shift;
        $self->{enter} = shift if @_;

        return $self->{enter};
    }

The stack accessor is an interesting one. First, we only want this to be an accessor and not a mutator -- we really don't want people modifying the profiler's idea of the stack behind its back. This is signified by the letters ro (read-only) before the accessor arrow.

    ro->@stack []

A further twist comes from the fact that POOL is still trying to DWIM around my brain and my brain expects POOL to be very clever indeed. Because we have declared stack to be set to an array reference, we know that the stack accessor deals with arrays. Hence, when $obj->stack is called, it should know to dereference the reference and return a list. This means the code ends up looking like this:

    sub stack {
        my $self = shift;
        return @{$self->{stack}};
    }

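Stepping back, the whole constructor-and-accessor story reduces to a small table-driven generator. Here's a hypothetical Python analogue of what POOL emits; the spec format of (name, kind, default) triples is invented purely for illustration, since POOL itself generates Perl source via templates:

```python
def make_init(spec):
    """Build an __init__ from (name, kind, default) triples.

    kind is "set" (always starts at the given value), "defaulting"
    (user-supplied, else the default), or "plain" (whatever was passed).
    Defaults are callables so each instance gets a fresh value.
    """
    def __init__(self, **args):
        for name, kind, default in spec:
            if kind == "set":
                setattr(self, name, default())
            elif kind == "defaulting":
                setattr(self, name, args.get(name) or default())
            else:
                setattr(self, name, args.get(name))
    return __init__

class DProfPP:
    __init__ = make_init([
        ("fh",    "plain",      None),
        ("enter", "defaulting", lambda: (lambda *a: None)),
        ("leave", "defaulting", lambda: (lambda *a: None)),
        ("syms",  "set",        list),
        ("stack", "set",        list),
    ])

p = DProfPP(fh="tmon.out")
print(p.syms, p.fh)  # [] tmon.out
```

Passing the defaults as callables rather than values is the same trap the Perl version avoids by constructing `[]` afresh inside `new`: a shared mutable default would leak state between instances.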
Aside from constructors and accessors, POOL knows about two other styles of method. (For now; there are more coming.) There are ordinary methods, which are simply named:

        parse
Sadly we can't DWIM the entire code for this method, so we generate the best we can:

    sub parse {
        my $self = shift;
    }

And the final style is the delegate. The sad thing about the delegate given in the DProfPP example is that it doesn't actually work, but we'll pretend that it does. Delegates are useful when you have an object that contains another object; you can provide a method in your class that simply diverts off to the contained object. For instance, if we have a class representing a mail message, then we may wish to store the head and body as separate objects inside our main object. (Mail::Internet does something like this, containing a Mail::Header object.) Now we can provide a method called get_header, which simply locates the header object and passes on its arguments to that:

    sub get_header {
        my $self = shift;
        return $self->header->get(@_);
    }

In POOL lingo, this is a delegate via the header method, and get tells us how to do the delegation. It would be specified like this:

    get_header  ->header->get

Notice that this is precisely what appears in the middle of the Perl code for this method. An additional feature is that the "how" part of the delegation is optional. If we were happy for our top-level method to be called get instead of get_header, then we could say:

    get         ->header->

To me, this symbolizes going "through" the header method in order to call the get method.
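The delegation pattern itself is easy to sketch outside Perl; an illustrative Python version (the mail-message classes and field names here are made up for the example, not part of POOL):

```python
def delegate(via, how):
    """Generate a method that forwards to `how` on the `via` attribute."""
    def method(self, *args, **kwargs):
        return getattr(getattr(self, via), how)(*args, **kwargs)
    return method

class Header:
    def __init__(self):
        self._fields = {"Subject": "hello"}
    def get(self, name):
        return self._fields.get(name)

class Mail:
    def __init__(self):
        self.header = Header()
    # Delegate *via* the header attribute *to* its get method --
    # the Python spelling of POOL's  get_header  ->header->get
    get_header = delegate("header", "get")

print(Mail().get_header("Subject"))  # hello
```

As in POOL, the generated method is a pure pass-through: arguments go straight on to the contained object, which is also why POOL's broken `->stack->[0]` delegate ends up appending `(@_)` to something that isn't callable.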

These are the basics of the POOL language, and we've seen a little of the code it generates. It also generates a full set of documentation and tests, as well as a MANIFEST file and Makefile.PL or Build.PL file, but we'll look at those a little later.

In case you're interested, the reason why the delegation in the example doesn't work is because I was being too clever. I thought I could say:

    top_frame   ->stack->[0]

and have a top_frame method which "calls" [0] on the stack array reference, returning its first entry. This doesn't work for two reasons. First, I was too clever about ->stack and now it returns a list instead of an array reference. Second, delegates need to pass arguments, so POOL ends up generating code that looks like:

    return $self->stack->[0](@_);

(The third reason, of course, is that the top of the stack when represented as an array is element -1, not element 0. Oops.)

I thought about fixing this to do what I really, really mean, but decided that would be too nasty.

Another Example

Now that I had this neat tool for generating modules, I set it to work on the next module I wrote; this was a variant of Tree::Trie, a class to represent the trie data structure. Tries are a handy way of representing prefix and suffix relationships between words. They're conceptually simple; each letter in a word is inserted into a tree as the child of the previous letter. If we wanted a trie to count the prefixes in the phrase THERE IS A TAVERN IN THE TOWN, then we would first insert the chain T-H-E-R-E-#, then I-S-#, then A-#, and so on, where # represents "end-of-word". We'd end up with a trie looking like this:

An example of a trie
Figure 1: An Example of a Trie
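The insertion rule described above is only a few lines in any language; a Python sketch using nested dicts as trie nodes:

```python
END = "#"  # end-of-word marker, as in the figure

def insert(trie, word):
    """Insert each letter as the child of the previous letter."""
    node = trie
    for ch in word:
        node = node.setdefault(ch, {})
    node[END] = {}  # terminate the chain with the end-of-word node

trie = {}
for word in "THERE IS A TAVERN IN THE TOWN".split():
    insert(trie, word)

# All four T-words share the single 'T' node at the top of the trie...
print(sorted(trie))                 # ['A', 'I', 'T']
# ...and THE is a prefix of THERE, so both live down the T-H-E chain:
# '#' ends THE, while 'R' continues towards THERE.
print(sorted(trie["T"]["H"]["E"]))  # ['#', 'R']
```

The shared prefixes are the whole point: counting children at any node tells you how many words pass through that prefix.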

Tree::Trie is good at this sort of thing, but it didn't do a few extra things I needed, so I wrote an internal-use module called Kasei::Trie; this was the POOL file I used to generate it:

    Kasei::Trie - Implement the trie data structure

    "Trie"s are compact tree-like representations of multiple words, where
    each successive letter is introduced as the child of the previous.

        @children {}


The main class, Kasei::Trie, has a constructor with one instance variable that is initialized to be an empty hash reference, and one method, insert. There's also a secondary class representing each node in the trie, which has its own children, and has a data variable with its own accessor.

After generating this with POOL, all I needed to do was to fill in the code for the insert method, and modify some tests. A manifest, Makefile.PL, test suite with nine tests, and 161 lines of code and documentation were automatically created for me. I suspect that POOL saved me one to two hours.

The High Dive

Let's now take a look at the templates that make this all happen. The main template is called module, and it looks like this:

    package [% module.package %];
    [% INCLUDE module_header %]
    =head1 SYNOPSIS

    [% INCLUDE synopsis %]
    =head1 DESCRIPTION

    [% module.description  %]

    =head1 METHODS
    [% FOREACH method = module.methods;
       INCLUDE do_method;
       END; %]

    [% INCLUDE module_footer %]


As you can probably guess, in the Template Toolkit language, interesting things happen between [% and %]. Everything else is just output wholesale, but all kinds of things can happen inside the brackets. The first thing that happens is that we look at the module's package name. All the data we've collated from the parsing phase is stuffed into a hash reference, which we have passed into the template as module. The dot operator in Template Toolkit is a general do-the-right-thing operator that can be a method call, a hash reference look-up or an array reference look-up. In this case, it's a hash reference look-up, and we perform the equivalent of $module->{package} to extract the name.

Template Toolkit's [% INCLUDE template %] directive looks for the file template in its template path, processes it passing in all the relevant variables, and includes its output. So after the initial package ...; line, we include another template that contains everything that goes at the top of the module. As we'll see later, part of the beauty of templating things this way is that you can override templates by placing your own idea of what should go at the top of a module into your private version of module_header earlier in the template path, in a sense "inheriting" from the base set of templates.
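The "first directory on the path wins" rule is the entire inheritance mechanism; a minimal Python sketch of the lookup (directory names are illustrative; Template Toolkit performs this search internally via its INCLUDE_PATH):

```python
def resolve(name, path, provides):
    """Return the first directory on the path that supplies `name`.

    `provides` maps directory -> set of template names it contains,
    standing in for a filesystem lookup.
    """
    for directory in path:
        if name in provides.get(directory, set()):
            return directory
    raise LookupError(name)

# A private ~/.pool directory overriding just one template.
provides = {
    "~/.pool":         {"license"},
    "stock-templates": {"license", "module_header", "do_method"},
}
path = ["~/.pool", "stock-templates"]

print(resolve("license", path, provides))        # ~/.pool
print(resolve("module_header", path, provides))  # stock-templates
```

Because the private directory comes earlier on the path, its license template shadows the stock one, while everything it doesn't define falls through unchanged — "inheriting" from the base set of templates.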

Similarly, we include a file that will output the synopsis, and output the description that we collected between the DESCRIPTION and EOD lines of our POOL definition file.

Next, we want to document the various methods and output the code for them. POOL will have placed all the metadata for the methods we've defined, plus a constructor, in the appropriate order in the methods hash entry of module. As this is an array reference, we want to use a foreach-style loop to look at each method in turn. Not surprisingly, Template Toolkit's foreach-style loop is called FOREACH.

So this code:

        [% FOREACH method = module.methods;
           INCLUDE do_method;
           END; %]

will set a variable called method to each method in the array, and then call the do_method template. This simply dispatches to appropriate templates for each type of method. For instance, there's the set of templates for the "delegate" style; delegate_code looks like this:

    sub [% method.name %] {
        my $self = shift;
        return $self->[% method.via %]->[% method.how %](@_);
    }

Whereas the documentation template contains some generic commentary:

    =head2 [% method.name %]

    [% INCLUDE delegate_synopsis -%]
    Delegates to the [%method.how%] method of this object's [%method.via%].


The synopsis that appears in the documentation here and in the synopsis at the top of the file simply explains how the delegation is done:

    $self->[% method.name %](...); # $self->[% method.via %]->[%method.how%]

Of course, there are some templates that are a little more complex, particularly those that generate the tests, but the main thing is that you can override any or none of those. If you don't like the standard same-terms-as-Perl-itself licensing block that appears at the end of the module, then create a file called ~/.pool/license containing:

    =head1 LICENSE

    This module is licensed under the Crowley Public License: do what thou
    wilt shall be the whole of the license.

POOL will pick up this template and use it instead of the standard one.

There's No P in Our POOL

When I started planning this article in the bath this morning, I realized that POOL is actually fantastically badly named; there's nothing actually Perl-specific about the language itself, and it's a handy definition language for any object-oriented system. Hence, I hereby retroactively name the project "the POOL Object Oriented Language", which also satisfies the recursive acronym freaks. But can we, using the same parser and templating system, turn POOL files into other languages? Of course we can; this is all part of the flexibility of the Template Toolkit system. What's more, we don't even have to override all of the templates in order to do so, just some of them. For instance, here's a Ruby equivalent of accessor_code:

    [% IF method.ro == "ro"; %]
        attr_reader :[% method.name %]
    [% ELSE; %]
        attr_accessor :[% method.name %]
    [% END; %]

do_method and module_footer, however, never need to change, since all they do is include other methods. With a complete set of toolkits, the same POOL description can be used to output a Perl, Ruby, Python, Java and C++ implementation of a given class.

Going Deeper

When Frans Hals' famous painting "The Laughing Cavalier" was being examined in a museum's labs, someone had the bright idea of putting it through an X-ray machine. When they did this, they were amazed to find underneath the famous painting a completely different work -- a painting of a young girl. They then adjusted the settings on the X-ray machine and tried again, and underneath the young girl, they found another painting. Since then, it's been common practice to X-ray pictures, and art historians have found many layers of paint underneath some of the most-famous pictures.

What's this got to do with POOL? Well, very little, but I wanted to throw that in. Since I realised that POOL's templates can be inherited so easily, I've had the idea of POOL "flavors"; coherent sets of templates that can be layered like oil paintings to impart certain properties to the output.

For instance, at the moment, POOL outputs unit tests in separate files in the t/ directory, one for each class. Some people, however, prefer to have their tests in the module right alongside the documentation and implementation, using the method described in Test::Inline. Well, there's no reason why POOL shouldn't be able to support this. All you'd need to do is create a new directory, let's say testinline/, and put a modified version of do_method in there which says something like:

    [% INCLUDE method_pod %]
    =begin testing
    [% INCLUDE method_test %]
    =end testing
    [% INCLUDE method_code %]

Next, arrange for testinline/ to appear in the Template Toolkit template path, and magically your tests will appear in the right place.

It's not inconceivable that multiple "flavours" could combine in order to theme a module; for instance, you might want a module which uses Test::Class for its tests, and Module::Build for its build file, with a BSD license flavor and Class::Accessor for its accessors instead of having them explicitly coded. Conceptually, you'd then say:

    pool --flavours=testclass,modulebuild,bsdlicense,classaccessor mymodule.pool

and the module would come out just as you want. This hasn't happened yet for two reasons: First, although it's only a two- or three-line change to the pool parser to support pushing these directories onto the template path, I haven't needed it yet so I haven't done it, and second, because I haven't written any flavors yet. But it's easy enough to do.

Other future directions for POOL include a syntax for class methods and class variables, support for other languages as mentioned above (which basically means ripping out the hard-coding of MANIFEST, Makefile.PL and so on and replacing it with a more flexible method), and other minor modifications. For instance, I'd like some syntax to specify dependencies; other Perl modules which will then be use'd in the main modules and which would be named at the appropriate place in the Makefile.PL. And, of course, there's building up a library of flavors, including "total conversion" flavors like Ruby and Python.

The one thing that's becoming really, really important is the need for nondestructive editing -- the ability to fill in some additional code for a method, then regenerate the class from a slight change to the POOL file without losing the new method's code. I'm going to need to add that soon to allow for iterative redesigning of modules.

But the main thing about POOL is what it does now -- it saves me time, and it takes away the drudgery of writing OO classes in Perl.

And I will finish Devel::DProfPP soon. I promise.

This week on Perl 6, week ending 2003-04-20

You know how it is, you go away for a lovely weekend folk festival in Wales, you have a really good, relaxed time, singing yourself hoarse and generally forgetting all about technology before coming home to email from the perl.com editor asking if he could have the summary about half an hour ago, and then you skim through the lists and find nearly 300 messages unread? You do? I thought it was just me. So, having utterly failed (by virtue of being elsewhere) to get a summary written by Monday, I'm currently shooting for 'getting it written'. Welcome to this week's Perl 6 summary; all the fun of the Perl 6 lists with none of the tedious 'reading every message'.

Let's see if I can't ease myself back into the Perl 6 vibe by summarizing the still rather quiet perl6-internals list first...

Building Parrot on Win32

Steve Fink has been busy committing (in the CVS rather than the culpability sense) Mattia Barbon's patches to get Parrot building happily in a Win32 environment. If you have such an environment, now would probably be a good time to grab the latest Parrot from CVS and see if it builds for you. I'm sure the list would be grateful to hear of your experience, good or bad.

PMC documentation

After what seems like weeks in the wilderness with very little feedback, Alberto Simões finally got some comments on (and thanks for) his latest PMC doc patches from Steve Fink and Brent Dax. The docs haven't made it into the distribution yet, but it can only be a matter of time.


Is PMC size fixed?

Mattia Barbon wanted to know if it would eventually become possible to create PMCs with additional data members. Dan says not; PMCs are allocated from arenas which apparently means they need to be the same size (variable sized PMCs would mean adding complexity to the garbage collector, which is already complicated enough thanks very much...)


Dan Does Design Decisions

Dan announced a few design decisions:

  • It's time to start assigning permanent opcode numbers to some of the opcodes.
  • There's some new stack ops, halfpop[insp].
  • We now have can and does ops.
  • Dan explained that can and does were there to support fast interface polymorphism.

http://groups.google.com/groups -- permanent opcode numbers

http://groups.google.com/groups -- halfpop

http://groups.google.com/groups -- can/does

http://groups.google.com/groups -- Interfaces/Classes

Short-lived memory allocation

Luke Palmer wondered what the Right Way was to allocate dynamic memory that wouldn't be needed beyond a function invocation. The answer, of course, was 'use Parrot memory management and let Garbage Collection work its shiny magic'. Toward the end of the thread Dan let on that Parrot's Garbage Collector is 'always going to be walking the system stack', so there was no need to worry about anchoring the newly allocated buffer to the root set for the duration of the function invocation, which seems to be a new commitment. Both Dan and Steve Fink observed that the memory documentation could use updating to clarify best practice for everyone. Volunteers?


How deep is clone?

Alberto Simões asked how deeply the clone operator worked. According to Leopold Tötsch it's a deep, recursive clone, which he noted makes for interesting times when dealing with self-referencing structures (Dan reckoned that it shouldn't be too bad if you take advantage of the GC system's graph traversal smarts...). Luke Palmer wondered why the default was a deep copy as, he claimed, deep copies were seldom needed. He wondered how to make a shallow copy. Leo suggested extending clone with an extra parameter to specify deep or shallow copying. Dan said that it is the way it is because he said so, and that one would make a shallow copy with assign.


Shared memory

David Robins wondered whether Parrot's memory allocation system would cope with sharing memory between processes and found some messages in the archive that seemed to imply that 'it will cope eventually'. He wondered how it would cope. Warnock's Dilemma applies...


A New GC approach?

Kurt Stephens announced that he had a partially written 'conservative, non-copying "treadmill"' GC system that could work in real time without stopping the world. He wondered if it could be useful for Parrot. No comments so far...



IMC and variable number of arguments

K Stol wondered how to handle a variable number of function arguments in IMC code. Dan remarked that it was covered by the Parrot calling conventions (presumably IMC code doesn't do the full Parrot calling conventions, though). Leo Tötsch suggested making sure that the last thing pushed onto the argument stack was the number of arguments, and Will Coleda suggested passing a single PMC like a PerlArray...


Meanwhile, over in perl6-language

If I were asked to summarize this week's traffic on perl6-language with one word, that word would be 'Types'. It turns out that thinking about types, and how they should behave in Perl 6, is hard. I don't envy Damian the writing of the next Exegesis, that's for sure.

Instead of presenting the threads in roughly chronological order this week, I'm going to deal with the non-type-related threads first and then attempt to sketch the current issues with types without quite so much reference to individual threads. Cover me, I'm going in...

Currying questions

Last week, Ralph Mellor had asked whether currying assumptions could be overridden when the curried function was called, and Luke Palmer had said he didn't think so. This week Damian answered with a rather more authoritative "No, they can't be overridden; just make a call to the original function." Ralph had also wondered if there would be a way to specify whether currying assumptions were made by binding or by copying a value (currently, they get bound, just as when you call a function normally; I wonder what happens when the function prototype specifies is copy). Damian said that if you wanted to make an assumption based on a copy, you needed to make that copy explicitly.

http://groups.google.com/groups -- Ralph's original questions

Are all list constructors iterators?

'Marek Ph.' admired the shiny goodness that is lazy evaluation and wondered if all list constructors were actually iterators. He wanted to know if that meant that

    @a = 1 .. Inf;
    splice @a, 5, 2;

would yield

    @a == (1 .. 4, 7 .. Inf)

He also asked if the x operator would generate an iterator too. Luke Palmer thought the answer to both questions was "Yes".
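Perl 5 can't splice an infinite list, but the iterator idea underneath lazy lists is easy to sketch with a closure. This is my own illustration, not anything from the design documents:

```perl
# An iterator standing in for the infinite list 1 .. Inf: it only
# computes the values you actually ask for.
sub counter_from {
    my $n = shift;
    return sub { $n++ };    # each call yields the next integer
}

my $naturals = counter_from(1);
my @first_four = map { $naturals->() } 1 .. 4;
print "@first_four\n";    # 1 2 3 4
```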


... but foo('bar')

Stéphane Payrard spotted a possible ambiguity in Perl 6's grammar. He wanted to know if

    ... but foo('bar')

set the property 'foo' to the value 'bar', or did it create a property whose name is the value returned by the function call foo('bar'). He wondered what the syntax would be to get the 'other' meaning. Luke Palmer thought the former: the property 'foo' would get set to 'bar' (so do I, unless the thing implementing the property has some special semantics). He suggested that to force the call to the function to yield a property name one would do one of:

    ... but $(foo('bar'))


    ... but &foo.('bar')

I prefer the second of those two.


Perl 6 parser questions

Right at the end of last week, Austin Hastings asked a bunch of questions about the behaviour of the Perl 6 parser. He wondered, for instance, if, in the future, he'd be able to (usefully) say:

    use Perl6::Grammar v6.0.0.2;

Larry answered this question ("I don't see why not") and all of Austin's other questions on this topic. Apparently the Perl 6 Parser will be documented 'whenever Apocalypse 18 comes out'.


Initializations outside of control flow

Mark J. Reed asked about elegant ways of initializing shared variables. He wanted something a little neater than the blunderbuss of a BEGIN block. Larry obliged with one of his 'thinking aloud' posts which, while not giving us a final answer does give us a few signposts. It's looking like we'll have traits along the lines of:

    state $where is begin($value);
    state $where is check($value);
    state $where is  init($value);
    state $where is first($value);

Where the traits work analogously to BEGIN, CHECK, INIT and FIRST.
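For comparison, here is the Perl 5 blunderbuss such traits would replace; this is ordinary working Perl 5, not speculative syntax:

```perl
# Initializing a shared variable before normal runtime by wrapping
# the setup in a BEGIN block, which runs as soon as it is compiled.
our $where;
BEGIN { $where = 'set before runtime' }

print "$where\n";    # set before runtime
```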



The new returns keyword

David Storrs was a little worried about the possible clash between the new returns keyword -- introduced in Apocalypse 6 -- and return. Michael Lazzaro pointed out that the 'possible clash' was almost certainly deliberate, after all:

    sub foo returns Bar {...}

reads rather well. David had used the example my $spot returns Dog, which does look rather ugly, but Michael pointed out, in the case of a variable declaration, it made more sense to use my $spot of Dog or even my Dog $spot. Michael commented that this choice of syntax meant the programmer was able to pick the most readable phrase for a given situation.


A17 early discussion: Perl 6 threading proposal

Austin Hastings posted what would once have been called an RFC about Perl 6's threading model. No comments so far.


wrap from Synopsis 6

David Storrs wondered if the new .wrap method, which returns a unique id identifying the particular 'wrapper' could have an associated warning if the resulting id wasn't stored somewhere. Adam D. Lopresto and Austin Hastings weren't keen...


The difference between -> $arg {...} and sub ($arg) {...}

David Storrs asked for a 'micro-Exegesis' on the difference between -> $foo {...} and sub ($foo) {...}, since they both seemed to generate anonymous subroutines. There were an awful lot of responses to this. Essentially the difference is that a 'pointy block' (my coinage, I think) is just a block that has a signature. The main difference is what happens to a return.

In a block or a pointy block, a return returns from the subroutine that lexically contains that block, not simply from the block itself. If you want to leave a block prematurely without returning from its enclosing subroutine, you would use the leave keyword.

This distinction between a Block and a Sub allows for some rather neat (Smalltalkish) idioms

    multi iterate_over_file( String $path: Block &block ) {
        my $fh = open File: '<', $path or
            fail "Couldn't open $path: $!";
        while <$fh> {
            block($_);
        }
    }

    sub find_user ($user_name) {
        iterate_over_file "/etc/passwd" -> $line {
            return $line but true if /^$user_name/;
        }
        return undef;
    }

This is a somewhat contrived example, but I think it's useful as an illustration. If a Block were exactly the same as a sub, then the return in find_user would return to the middle of the while loop in iterate_over_file, and iterate_over_file would only return after it had gone through every line in the password file, which would mean that find_user would always return undef. However, a return from inside a block returns from the subroutine containing the block, so find_user behaves as expected and we get to write powerful control structures without having to resort to macros. I do wonder if it would be possible for a function like iterate_over_file to CATCH the block's Return exception though...
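The contrast is easier to see against today's Perl 5, where every anonymous sub behaves like Perl 6's sub rather than its Block: a return only leaves the anonymous sub itself. A small sketch of my own (the names are invented):

```perl
# iterate_over calls the block for each line; the block's return
# leaves only the block, so the early exit never reaches find_user.
sub iterate_over {
    my ($lines, $block) = @_;
    $block->($_) for @$lines;
    return 'not found';
}

sub find_user {
    my ($name, @lines) = @_;
    return iterate_over(\@lines, sub {
        return $_[0] if $_[0] =~ /^\Q$name\E:/;   # exits the anon sub only
    });
}

print find_user('hals', 'root:x:0', 'hals:x:1000'), "\n";    # not found
```

This is exactly the behaviour the pointy block is designed to avoid.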


Compulsorily named parameters?

The debate over declaring non-optional named parameters continued, with Damian joining in. The current consensus appears to be that the various optional/named/slurpy shorthands introduced in Apocalypse 6 should stay pretty much as they are, but that it should be possible to declare more complex parameter requirements using sensibly named traits. John Siracusa still wants a more 'powerful' shorthand, but there doesn't seem to be anyone taking his side on that.


Multimethod invocants

Multimethods still appear to be causing some confusion, mostly to do with how they are called and dispatched, and which method parameters participate in the dispatch. There's a largish contingent (and I should probably count myself a member of that contingent, spot the bias) who would like to be able to write:

    multi infix:@ (Number $x, Number $y) { new Point: $x, $y }

    class point {
        multi make_rectangle ( Point $p ) {
            new Rectangle: $_, $p;
        }
        multi make_rectangle ( Number $x, Number $y ) {
            .make_rectangle( $x @ $y );
        }
    }

Which isn't allowed. Instead you would have to write:

    multi make_rectangle ( Point $p, Point $q ) {
        new Rectangle: $p, $q;
    }

    multi make_rectangle ( Point $p, Number $x, Number $y ) {
        make_rectangle ($p, $x @ $y);
    }

And you also have to be wary of

    my Point $p;


    $p.make_rectangle($x @ $y);

which would first try to dispatch to Point's make_rectangle 'unimethod', only attempting to dispatch via a multimethod if there is no such method. Personally, I think there's room for a spoonful or two of syntactic sugar to allow for the 'method variant' style of declaration as well as the full on generic multimethod style (which would, of course, underpin the more restricted method variant style). However, if it doesn't exist out of the box I expect someone (me?) will write a set of macros to make things work.
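To see what multi is buying us, here is a hand-rolled Perl 5 sketch of dispatch on argument types; the class names and variants are invented for illustration:

```perl
# Toy classes so that ref() gives us a type name to dispatch on.
{ package Point;  sub new { bless {}, shift } }
{ package Number; sub new { bless {}, shift } }

# A dispatch table keyed on the argument type signature.
my %variants = (
    'Point,Point'         => sub { 'rectangle from two points' },
    'Point,Number,Number' => sub { 'rectangle from point and coords' },
);

sub make_rectangle {
    my $sig  = join ',', map { ref } @_;
    my $impl = $variants{$sig} or die "no variant matches ($sig)";
    return $impl->(@_);
}

print make_rectangle(Point->new, Point->new), "\n";
# rectangle from two points
```

A real multimethod system would also handle inheritance and ambiguity, which is where the interesting design questions live.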

In the message referenced, Damian explains the current state of the multimethod art...



Well, that's 100 or so messages accounted for. Which leaves another 173 messages remaining all of which concern types. The problem as I see it is that different people seem to understand different things from the word type, and there's a lot of people talking at cross purposes as well as a fair amount of axe grinding going on.

Now, I could just punt and write something like "Everyone except Leon Brocard talked for ages about types. Here are the links to those threads", which would at least have the virtue of getting the god-awful running joke out of the way, but that would smack of cheating. So, what I'm going to do is to cheat only slightly. At the bottom of this section you'll find links to all the threads that discussed types this week. However, before that I'll try to give you a (biased) overview of the issues involved and the areas of confusion.

An illustrative quotation from Lewis Carroll

"The name of the song is called 'Haddocks' Eyes.'" [said the White Knight.]

"Oh, that's the name of the song, is it?" Alice said, trying to feel interested.

"No, you don't understand," the Knight said, looking a little vexed. "That's what the name is called. The name really is 'The Aged, Aged Man.'"

"Then I ought to have said 'That's what the song is called'?" Alice corrected herself.

"No you oughtn't: that's another thing. The song is called 'Ways and Means' but that's only what it's called, you know!"

"Well, what is the song then?" said Alice, who was by this time completely bewildered.

"I was coming to that," the Knight said. "The song really is 'A-sitting On a Gate': and the tune's my own invention."

-- From Alice Through The Looking Glass, by Lewis Carroll

Two types of type

Perl 6 draws an important distinction between 'variable type' and 'value type'. A variable is a binding between a name and a container. The variable type is the type of the container associated with the variable's name. A variable's 'value type' is the expected type of the value stored in the variable's container. As far as I can tell, Perl is weirder than the average programming language in this respect in that it allows the programmer to specify both sorts of type. In C for instance, a value doesn't know its own type, it's just an area of memory that is interpreted according to the type of the variable that it is accessed via (or according to the type it is cast into). Meanwhile, in lisp like languages, 'variables' are simply keys in a symbol table, and the values in that symbol table are untyped pointers to values which know their own type.

Perl 6's symbol tables are rather more like Lisp symbol tables than C's, with the added wrinkle that the symbol table values are rather more sophisticated containers than simple generic pointers. This complexity arises for a couple of reasons:

Tied variables.
Instead of storing a variable's value in one of the core container types (Array, Hash, Scalar), it can be useful to use a custom container type to allow for 'magical' behaviour:
    my $FTSE is ShareIndex('FTSE');
    print "$FTSE";
    # FTSE 100 Index: 3916.70 (+27.50/+0.7%) at 2003042216:40
Context is really important to Perl. If you look at an array variable in a numeric context, then you get the number of items in the array; in list context, a list of all the items in the array; in a scalar context, a pointer to the array. This context dependent behaviour is best handled by the container object, possibly with the assistance of the contained object, but not always.
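Perl 5 already behaves this way for arrays, though the 'pointer' has to be taken explicitly:

```perl
# One array, three views of it, depending on context.
my @items = ('a', 'b', 'c');

my $count = @items;     # numeric (scalar) context: how many items
my @all   = @items;     # list context: the items themselves
my $ref   = \@items;    # an explicit reference ("pointer") to the array

print "$count items: @all\n";    # 3 items: a b c
```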

Scalars turn out to be one of the more remarkable types of Perl containers. At their simplest they can be thought of as a container which can hold at most one 'atomic' thing. Perl 5 scalars have three(?) slots for Number, String and Reference values (On IRC, Dan tells me that Perl 6 scalars will probably have slots for String, Float, Integer, Boolean and Reference values). These different 'scalar value types' can, with certain restrictions, be treated without regard to their 'actual' type: a Number in a string context will give a sensible string representation, and a String in a number context will give an appropriate numeric value, but not every possible scalar value type can be sensibly viewed as any other type; if you try to use a number in a reference context, for instance, you're going to get an error. For added fun, it's perfectly possible for a scalar variable to contain both a Number value and a String value (In Perl 5, Scalar::Util provides a nice interface to this preexisting capability. On IRC, Dan suggests that the Perl 6ish way of doing this will probably be my $i = 4 but "Bibble!";).
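The Perl 5 interface mentioned above is Scalar::Util's dualvar, which builds exactly this kind of two-faced scalar:

```perl
use Scalar::Util qw(dualvar);

# One scalar, two values: 4 when used as a number,
# 'Bibble!' when used as a string.
my $i = dualvar(4, 'Bibble!');

printf "%d %s\n", $i + 0, "$i";    # 4 Bibble!
```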

What are value type declarations for?

Some people see value type declarations as being important for programmer safety. They want to see a situation where:

    my Number $foo = some_function_returning_a_string();


    sub a_func (Number $param) { ... }

    a_func("A string");

will throw exceptions, preferably at compile time.

Others want to see those same code fragments coerce any values assigned to them into the appropriate types (possibly with a warning) and see value type declarations simply as a way of letting the compiler do automatic optimization of code (if you have declared that a given variable will only contain, say, a number, you can (at least) get rid of a layer of indirection in accessing that value).

Others don't really care one way or the other about whether or not to coerce, they just want to use value types in setting up multimethods.

Still others don't really like the idea of declaring types at all, but do see value in ML like type inference for programmer safety reasons...

Others want to let the programmer choose, and worry about how to implement something which will let that happen. I'm a 'let the programmer choose, and the compiler optimize what it can' kind of guy.

Are types the same as objects?

If types are the same as objects, do they all inherit from a common base class? If they do, what does the hierarchy look like? What about interfaces? Do they need to be explicitly declared, or can they be inferred? If they can be inferred, what about the problem of:

    class Tree {
        method feed {...}
        method grow {...}
        method bark {...}
    }

    interface Canine {
        method feed {...}
        method grow {...}
        method bark {...}
    }

    class Borzoi {
        method feed {...}
        method grow {...}
        method bark {...}
    }

    multi treat($vet, Canine $critter ) {...}
    treat($some_vet, Tree.new); # Should this fail?

Arrghh!!! Make the hurting stop!
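Perl 5's duck typing has the same blind spot today: a structural check can't tell the vet's patient from the plant. A toy demonstration of my own (classes invented here):

```perl
# Both classes answer ->can('bark'), so a structural test passes
# for the tree as happily as for the dog.
{ package Tree;   sub new { bless {}, shift } sub bark { 'rough!' } }
{ package Borzoi; sub new { bless {}, shift } sub bark { 'woof!'  } }

for my $critter (Tree->new, Borzoi->new) {
    printf "%s can bark? %s\n", ref $critter,
        $critter->can('bark') ? 'yes' : 'no';
}
```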

However, I don't care what Dan says, I want every type to have an associated class, and I want them all to inherit from some sort of common base class (at least conceptually, and, if I'm prepared to take the performance hit and jump through the hoops, actually. Sometimes you need to override Scalar's behaviour (or whatever)) but I don't think the inheritance trees that have been bandied about so far even come close to expressing the semantics we need. Expect a longish post to Perl 6 language on this at some point. Probably with (more or less) pseudo code.

Another distinction to think about

OO theory talks about value objects and reference objects. (I'm using 'object' here to try to get some conceptual distance from 'value type'.) Here's an abstract example of what I mean:

    my $a = new ValueObject: value => 10;
    my $b = $a;
    $b.set_value(20);

    print "$a $b"; # 10 20

    my $c = new ReferenceObject: value => 10;
    my $d = $c;
    $d.set_value(20);

    print "$c $d"; # 20 20
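Today's Perl 5 draws the same line between plain scalars and references, which makes the distinction concrete:

```perl
# Plain scalars copy the value; assigning to the copy leaves
# the original alone.
my $a = 10;
my $b = $a;
$b = 20;
print "$a $b\n";    # 10 20

# References copy the pointer; mutating through one name shows
# through the other.
my $c = { value => 10 };
my $d = $c;
$d->{value} = 20;
print "$c->{value} $d->{value}\n";    # 20 20
```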

Just when you thought you understood value types...

Along come compound value types to mess with your head. Assuming a strict interpretation of value type declarations (assigning the 'wrong' type to a variable throws an error), consider the following:

    my @a of Int = (1, 2, 3);
    my @b = @a;
    my @c of Str;

What happens to each of the following? If it's an error does it happen at runtime or compile time?

    @c = @a;
    @c = @b;

    push @a, "String";
    push @b, "String";

Are you sure about those? Now, what happens if you start with:

    my @a of Int = (1 .. Inf);
    my @b = @a;
    my @c of Str;

And there are more thorny problems where those came from.

Those thread links

http://groups.google.com/groups -- Types of literals

http://groups.google.com/groups -- Do we really need the dual type system

http://groups.google.com/groups -- User defined hierarchical types

http://groups.google.com/groups -- Mind the difference between value types and reference types

http://groups.google.com/groups -- Static typing with Interfaces

http://groups.google.com/groups -- Michael Lazzaro's superb summary of how containers and values interact. Not sure it's the whole story though...

Acknowledgements, Announcements and Apologies

Sorry it's late. I blame the perl6-language people. It has nothing whatsoever to do with weekend spent in Wales and a Bank Holiday Monday spent at an Easter egg hunt and barbecue at my aunt's.

This has been one of the harder Perl 6 summaries to write, mostly because the language list has been dealing with a complicated subject and finding lots of interesting corners and ambiguities. Many thanks to Michael Lazzaro for his careful summation of his understanding of how things work which certainly clarified my thinking, to Stéphane Payrard for his sanity check of the types summary and to Dan Sugalski for a few answers on IRC about Scalar behaviour.


Filters in Apache 2.0

Not too long ago, despite a relative dearth of free tuits, I decided that I had put off my investigation of mod_perl 2.0 for too long - it was time to really start kicking the tires and tinkering with all the new stuff I had been hearing about. What I found was that the new mod_perl API is full of interesting features, yet discovering and using them was tedious, frustrating, enlightening, and fun all at the same time. Hopefully, I can help ease some of the growing pains you are likely to encounter by experiencing the pain myself first, then sharing some of the lessons through a series of articles. Consider this and future articles to be our voyage together into the rocky but exciting new mod_perl frontier.

One of the more interesting and practical features to come out of the Apache 2.0 redesign effort is output filters. While in Apache 2.0 there are all kinds of filters, including input and connection filters, it's output filters that are most interesting to me - mostly because 2.0 discussions make a point of saying that it's impossible (well, really, really hard) to filter output content in Apache 1.3, despite the fact that mod_perl users have been able to filter content (to some degree) for years. Thus, when I began to play around with mod_perl 2.0 it seemed only logical that my first task would be to port the instructional yet useful Apache::Clean, a content filter for mod_perl 1.0, over to the new architecture.

What we will be examining here is a preliminary implementation of Apache::Clean using the mod_perl 2.0 API. Because mod_perl 2.0 is still being tweaked daily, if you want to follow along on your own box, then you would need the current version of mod_perl from CVS, or a recent snapshot - the latest versions shipped with Linux distributions like RedHat, or even the latest version on CPAN (1.99_08), are far too out of date for what we will be doing. The most current version of Apache 2.0, as well as Perl 5.8.0, will also be helpful. Keep in mind that many of the more interesting features in mod_perl 2.0 are not entirely stable yet, so do not be surprised if things work just a bit differently six months from now.

What Are Output Filters Anyway?

Go ahead, admit it. At some point, you wrote a CGI script that generated HTML with embedded Server Side Include tags. The impetus behind the idea was a simple one: You had hopes that the embedded SSI tags would save you from the extra work of, say, adding a canned footer to the bottom of your otherwise dynamic page. Sounds reasonable, right? Seeing those SSI tags left unprocessed in the resulting page must have been shocking.

As it turns out, whether you knew it or not, in Apache-speak you were trying to filter your content, or pass the output of one process (the CGI script) into another (Apache's SSI engine) for subsequent processing. Content filtering is a simple idea, and one that feels natural to us as programmers. After all, Apache is supposed to be modular, and piping modular components together - cat yachts.txt | wc -l - is something we do on the Unix command line all the time. Wanting the same functionality in our Web server of choice seems not only logical, but almost required in the interests of efficient application programming.

While the idea is certainly sound, the above experiment exposes a limitation of the Apache 1.3 server itself, namely that by design you cannot have more than one content handler for a given request - you can use either mod_cgi to process a CGI script, or mod_include to parse an SSI document, but not both.

With Apache 2.0, output filters were introduced, providing an official way to intercept and manipulate data on its way from the content handler to the browser. In the case of our SSI example, mod_include has been implemented as an output filter in Apache 2.0, giving it the ability to post-process either static files (served by the default Apache content handler) or dynamically generated scripts (such as those generated by mod_cgi, mod_perl, or mod_php). True to its goal of exposing the entire Apache API to Perl, mod_perl allows you to plug into the Apache filter API and create your own output filters in Perl, which is what we will be doing with Apache::Clean.

HTML::Clean and Apache::Clean

Let's take a moment to look at HTML::Clean before delving into Apache::Clean, which is basically just a mod_perl wrapper that takes HTML::Clean and turns it into an output filter. HTML::Clean is a nifty little module that reduces the size of an HTML page using a number of different but simple techniques, such as removing unnecessary white space, replacing longer HTML tags with shorter equivalents, and so on. The end result is a page that, while still valid HTML and easily rendered by a browser, is relatively compact. If reducing bandwidth is important in your environment, then using HTML::Clean to tidy up static pages offline is a quick and easy way to save some bytes.

Here is a simple example of HTML::Clean in action.

    use strict;
    use HTML::Clean ();

    my $dirty = q!<strong>&quot;helm's alee&quot;</strong>!;

    my $h = HTML::Clean->new(\$dirty);
    $h->strip({ shortertags => 1, entities => 1 });

    print ${$h->data};

As you can see, the interface for HTML::Clean is object-oriented and fairly straightforward. Things begin by calling the new() constructor to create an HTML::Clean object. new() accepts either a filename to clean or a reference to a string containing some HTML. Deciding exactly which aspects of the HTML to tidy is determined in one of two ways: either using the level() method to set an optimization level, or by passing the strip() method any number of options from a rich set. In either case, strip() is used to actually clean the HTML. After that, calling the data() method returns a reference to a string containing the HTML, polished to a Perly white. In our sample code, the original HTML has been changed to

    <b>"helm's alee"</b>

which is half the size of our original string yet displayed the same way by browsers.
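The two transformations at work here can be mimicked with a couple of substitutions. This toy version is mine; the real HTML::Clean handles many more tags, entities, and edge cases:

```perl
my $html = q!<strong>&quot;helm's alee&quot;</strong>!;

$html =~ s{<(/?)strong>}{<$1b>}g;    # shortertags: <strong> -> <b>
$html =~ s/&quot;/"/g;               # entities:    &quot;   -> "

print "$html\n";    # <b>"helm's alee"</b>
```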

Depending on the size of your site, using HTML::Clean can lead to a significant reduction in the number of bytes sent over the wire - for instance, the front page of the current mod_perl project homepage becomes 70% of its original size when scrubbed with $h->level(9). However, while spending the time to tidy static HTML might make sense, the number of static pages on any given site seems to be diminishing daily. What about dynamically generated HTML?

One way to handle dynamic HTML would be to add HTML::Clean routines to each dynamic component of your application, a process that really is neither scalable nor maintainable. A better solution would be to have Apache inject HTML::Clean processing directly into the server response wherever we wanted it, to create a pluggable module that we could configure to post-process requests to any given URI. Enter Apache::Clean.

Apache::Clean provides a basic interface into HTML::Clean but it works as an output filter. As briefly mentioned, Apache::Clean already exists for mod_perl 1.0, but over in Apache 1.3 land it was limited in that it could only post-process responses generated by mod_perl, and that only after sufficient magic. We are not going to get into how that all worked in mod_perl 1.0 - for a detailed explanation see Recipe 15.4 in the mod_perl Developer's Cookbook or the original Apache::Clean manpage. With Apache 2.0 and the advent of output filters, we can now code Apache::Clean as a genuine part of Apache's request processing, allowing us to clean responses on their way to the browser entirely independent of who generates the content.

New Directives

Here is a look at a possible configuration for Apache 2.0, one that takes the output of a CGI script, post-processes it for SSI tags, then cleans it with our Apache::Clean output filter.

Alias /cgi-bin /usr/local/apache2/cgi-bin
<Location /cgi-bin>
  SetHandler cgi-script

  SetOutputFilter INCLUDES
  PerlOutputFilterHandler Apache::Clean

  PerlSetVar CleanOption shortertags
  PerlAddVar CleanOption whitespace

  Options +ExecCGI +Includes
</Location>

As with Apache 1.3, mod_cgi is still enabled the same way - in our case via the SetHandler cgi-script directive, although this is not the only way and the familiar ScriptAlias directive is still supported. What is different in this httpd.conf snippet is the configuration of the SSI engine, mod_include. As already mentioned, mod_include was implemented as an output filter in Apache 2.0, and output filters bring with them a new directive. The SetOutputFilter directive activates the SSI engine - the INCLUDES filter - within our container. This means that requests to cgi-bin/, no matter who handles the actual generation of content, will be parsed by mod_include. See the mod_include documentation for other possible SSI configurations and options.

With the generic Apache bits out of the way, we can move on to the mod_perl part, which isn't all that complex. While the PerlSetVar and PerlAddVar directives are exactly the same as they were in mod_perl 1.0, mod_perl 2.0 introduces a new directive - PerlOutputFilterHandler - which specifies the Perl output filter for the request. In our sample httpd.conf, the Apache::Clean output filter will be added after mod_include, which inserts SSI processing after mod_cgi. The really cool part about filters is that everything happens without any tricks or magic - getting all these independent modules to work in harmony in creating the server response is all perfectly normal, which is a huge improvement over Apache 1.3.

In the interests of safety, one thing that you should note about our sample configuration is that it does not include the entities option. Because we're cleaning dynamic content, reducing entity tags (such as changing &quot; to ") would inadvertently remove any protection against Cross Site Scripting introduced by the generating script. For more information about Cross Site Scripting and how to protect against it, a good overview is provided in this perl.com article.

Introducing mod_perl 2.0

mod_perl actually offers two different APIs for coding the Perl output filter. We are going to be using the simpler, streaming API, which hides the raw Apache API a bit. Of course, if you are feeling bold and want to manipulate the Apache bucket brigades directly, you are more than welcome, but it is a more complex process so we are not going to talk about it here. Instead, here is our new Apache::Clean handler, ported to mod_perl 2.0 using the streaming filter API.

package Apache::Clean;

use 5.008;

use Apache::Filter ();      # $f
use Apache::RequestRec ();  # $r
use Apache::RequestUtil (); # $r->dir_config()
use Apache::Log ();         # $log->info()
use APR::Table ();          # dir_config->get() and headers_out->unset()

use Apache::Const -compile => qw(OK DECLINED);

use HTML::Clean ();

use strict;

sub handler {

  my $f   = shift;

  my $r   = $f->r;

  my $log = $r->server->log;

  # we only process HTML documents
  unless ($r->content_type =~ m!text/html!i) {
    $log->info('skipping request to ', $r->uri, ' (not an HTML document)');

    return Apache::DECLINED;
  }

  my $context;

  unless ($f->ctx) {
    # these are things we only want to do once no matter how
    # many times our filter is invoked per request

    # parse the configuration options
    my $level = $r->dir_config->get('CleanLevel') || 1;

    my %options = map { $_ => 1 } $r->dir_config->get('CleanOption');

    # store the configuration
    $context = { level   => $level,
                 options => \%options,
                 extra   => undef };

    # output filters that alter content are responsible for removing
    # the Content-Length header, but we only need to do this once.
    $r->headers_out->unset('Content-Length');
  }

  # retrieve the filter context, which was set up on the first invocation
  $context ||= $f->ctx;

  # now, filter the content
  while ($f->read(my $buffer, 1024)) {

    # prepend any tags leftover from the last buffer or invocation
    $buffer = $context->{extra} . $buffer if $context->{extra};

    # if our buffer ends in a split tag ('<strong' for example)
    # save processing the tag for later
    if (($context->{extra}) = $buffer =~ m/(<[^>]*)$/) {
      $buffer = substr($buffer, 0, - length($context->{extra}));
    }

    # clean the buffer and send the results down the filter chain
    my $h = HTML::Clean->new(\$buffer);

    $h->level($context->{level});

    $h->strip($context->{options});

    $f->print(${ $h->data });
  }

  if ($f->seen_eos) {
    # we've seen the end of the data stream

    # print any leftover data
    $f->print($context->{extra}) if $context->{extra};
  }
  else {
    # there's more data to come

    # store the filter context, including any leftover data
    # in the 'extra' key
    $f->ctx($context);
  }

  return Apache::OK;
}

1;


If you can dismiss the mod_perl-specific bits for a moment, you will see the HTML::Clean logic embedded in the middle of the handler, which is not very different from the isolated code we used to illustrate HTML::Clean by itself. One of the things we need to do differently, however, is determine which options to pass to the level() and options() methods of HTML::Clean. Here, we use $r->dir_config() to gather whatever httpd.conf options we specified through our PerlSetVar and PerlAddVar configurations.

my $level = $r->dir_config->get('CleanLevel') || 1;

my %options = map { $_ => 1 } $r->dir_config->get('CleanOption');

This use of dir_config() is in fact no different from how we would have coded it in mod_perl 1.0. Similarly, later methods like $r->content_type(), $r->server->log->info(), and $r->uri() also behave the same as they did in mod_perl 1.0, which should offer some degree of comfort. For instance, the block

unless ($r->content_type =~ m!text/html!i) {
  $log->info('skipping request to ', $r->uri, ' (not an HTML document)');

  return Apache::DECLINED;
}

looks almost exactly the same as it would have in mod_perl 1.0, save the use of Apache::DECLINED. The new Apache::Const class provides access to all the constants you will need in your handlers, albeit through a slightly different interface than before - when using the -compile option, constants are imported into the Apache:: namespace. If you want the constants in your own namespace, mimicking the OK of yore, you can just use Apache::Const by itself without the -compile option.
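To make the two interfaces concrete, here is an illustrative fragment (not a complete handler) showing both styles side by side:

```perl
# with -compile, constants are accessed via the Apache:: prefix
use Apache::Const -compile => qw(OK DECLINED);
# ... and inside a handler:
return Apache::DECLINED;

# without -compile, constants are imported into your own
# namespace, mod_perl 1.0 style
use Apache::Const qw(OK DECLINED);
# ... and inside a handler:
return DECLINED;
```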

Another minor difference you will notice is the addition of a bunch of use statements at the top of the handler. Whereas with mod_perl 1.0 just about every class was magically present when you needed it, with mod_perl 2.0 you need to be very specific about the classes you will be using in your handler, and almost nothing is available by default.

In general, the most important class is Apache::RequestRec, which provides access to all the elements in the Apache C request_rec structure. Methods originating from the request object, $r, but not operating on the actual request_rec slots, such as $r->dir_config(), are defined in Apache::RequestUtil. This is a nice separation, and can help you think about mod_perl more in terms of access to the underlying Apache guts than just a box of black magic.

If you recall from 1.0, $r->dir_config() returned an Apache::Table object, which corresponded to an Apache table and allowed things like headers to be stored in a case-insensitive, multi-keyed manner. In 2.0, Apache tables are accessed through the APR (Apache Portable Runtime) layer, so any API that accesses tables needs to use APR::Table. This includes the get() and set() methods used on tables like headers_out and dir_config.

Besides Apache::RequestRec, Apache::RequestUtil, and APR::Table, our handler also needs access to the Apache::Log and Apache::Filter classes. Apache::Log works no differently than it did under mod_perl 1.0, while Apache::Filter is entirely new and will be discussed shortly.

From experience, I can tell you that determining which module you need to use in order to access the functionality you require is maddening. In the old days (just weeks before this article was written) developers needed to plow through code examples in the mod_perl test suite in order to discern which modules they needed. But no more. Recently introduced was the ModPerl::MethodLookup package, which contains the lookup_method() function - just pass it the name of the method you are looking for and you will get back a list of modules likely to suit your needs. See the MethodLookup documentation for more details.
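A lookup goes something along these lines (assuming a mod_perl 2.0 installation; check the exact return values against the MethodLookup documentation):

```perl
use ModPerl::MethodLookup ();

# ask which modules are likely to provide the dir_config() method;
# lookup_method() returns a human-readable hint naming them
my ($hint) = ModPerl::MethodLookup::lookup_method('dir_config');
print $hint;
```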

With basic housekeeping out of the way, we can focus on the guts of our output filter and the Apache::Filter streaming API. You will notice that the first argument passed to the handler() subroutine is an Apache::Filter object, not the typical $r you might have been expecting. mod_perl 2.0 has stepped up its DWIM factor a bit in an attempt to make writing filters more intuitive - in fact, it is possible to write an output filter without ever needing to access $r, so mod_perl gives you what you will primarily need. In order to access $r we call the (aptly named) r() method on our $f, then use $r as the gateway to per-request attributes, such as the MIME type of the response. Note that our filter genuinely declines to process the request if the content is not HTML. No fancy footwork, just processing like it should be.

Beyond the typical handler initializations is where things really start to get unfamiliar, starting with the notion of filter context, or $f->ctx(). Unlike with mod_perl 1.0, where every handler is called only once per request, output filters can be (and generally are) called multiple times per request. There can be several reasons for this, but for our purposes it is sufficient to understand that we need to adjust our normal handler logic and compensate for some of the subtle behaviors that can arise by being called several times.

So, the first thing we do is isolate the parts of the request that only need to happen once. $f->ctx(), which stores the filter context, will return undef on the first invocation, so we can use it as an indicator of the initial invocation of our filter. Since we only really need to parse our httpd.conf configuration once, we use the initial invocation to get our PerlSetVar options and store them in a hash for later - because $f->ctx() can store a scalar for us, we store our hash as a reference in $context. We also set aside space in the hash for the extra element, which will become important later.

Another thing we need to do only once per request is to remove the Content-Length header from the response. Apache 2.0 has taken great steps to make sure that all requests are as RFC compliant as possible, while at the same time trying to make the developer's life easier. As it turns out, part of this was the addition of the Content-Length filter, which calculates the length of the response and adds the Content-Length header if Apache deems it appropriate. If you are writing a handler that alters content to the point where the length of the response is different (which is probably true in most cases) you are responsible for removing the Content-Length header from the outgoing headers table. A simple call to $r->headers_out->unset() is all we need to accomplish this, which is again the same as it would have been in mod_perl 1.0. And don't worry, if the Content-Length is missing Apache takes other steps, such as using a chunked transfer encoding, to ensure that the request is HTTP compliant.

That about wraps up all the processing which should only happen once per request. If you do not like seeing the informational "skipping..." message on every non-HTML filter invocation, feel free to add logic that tests against $f->ctx() there as well.

Once we have taken care of one-time-only processing, we can move on to the heart of our output filter. The actual Apache::Filter streaming API is fairly straightforward. For the most part, we simply call $f->read() to read incoming data in chunks, in our case 1K at a time. Sending the processed data down the line requires only that we call $f->print(). All in all, the basis of the streaming API couldn't be any simpler. Where it begins to get complex stems from the nature of our particular filter.

The idea behind HTML::Clean is that it can, in part, make HTML more compact. However, since HTML is tag based, and those tags often come in pairs, we need to take special steps to make sure that our tags remain balanced after Apache::Clean has run. Because we are reading and processing data in chunks, there is the possibility that a tag might be stranded between chunks. For instance, if the HTML looked like

[1019 bytes of stuff] <strong>Bold!</strong> [more stuff]

the first chunk of data that Apache::Clean would see is

[1019 bytes of stuff] <str

Because <str is not a valid HTML tag, HTML::Clean leaves it unaltered. When the next chunk of data is read from the filter, it comes across as

ong>Bold!</strong> [more stuff]

and HTML::Clean again leaves the unrecognized ong> unprocessed. However, it does catch the closing </strong> tag. The end result, as you can probably see now, would be

[1019 bytes of stuff] <strong>Bold!</b> [more stuff]

which is definitely undesirable. Our matching regex and extra manipulations make certain that any dangling tags are prepended to the front of the next buffer, safeguarding against this particular problem. Of course, this kind of logic is not required of all filters. Just remember to keep in mind the complexity that operating on data in pieces adds when you implement your own filter.
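The carry-over logic can be demonstrated outside Apache with plain Perl. The chunk contents here are made up; in the real filter the chunks come from $f->read() and the cleaned buffers go to $f->print().

```perl
use strict;
use warnings;

# two hypothetical 'reads', with a <strong> tag split across them
my @chunks = ('some stuff <str', 'ong>Bold!</strong> more stuff');

my $extra = '';    # plays the role of $context->{extra}
my @processed;     # what would be cleaned and printed

for my $buffer (@chunks) {
    # prepend any tag fragment left over from the previous chunk
    $buffer = $extra . $buffer;
    $extra  = '';

    # an unterminated '<...' at the end of the buffer is held back
    if (my ($frag) = $buffer =~ m/(<[^>]*)$/) {
        $extra  = $frag;
        $buffer = substr($buffer, 0, -length($frag));
    }

    push @processed, $buffer;
}
push @processed, $extra if length $extra;

# every tag the cleaner sees is now complete
print join('', @processed), "\n";
# prints: some stuff <strong>Bold!</strong> more stuff
```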

Once we are finished processing all the data from the current filter invocation we come to a critical junction - determining whether we have seen all of the data from the actual response. If our filter will not be called again for this request, $f->seen_eos() will return true, meaning that we have reached the end of the data stream. Because we may have leftover tag fragments stored in $context->{extra}, we need to send those along before exiting our filter. On the other hand, if there is more data to come, then we would need to store the current filter context so it can be used on the next invocation.


So there you have it, output filtering made easy with mod_perl 2.0. All in all, it is a bit different than what you might be used to with mod_perl 1.0, but it's not that difficult once you get your head around it. And it does allow for some pretty amazing things. For instance, not only can we now use Perl to code interesting handlers like Apache::Clean, but the overall filter mechanism makes it possible to use Perl to manipulate any and all content originating from the server - just a simple line like

PerlOutputFilterHandler My::Cool::Stuff

on the server level of your httpd.conf (such as next to the DocumentRoot directive) will allow you to post-process every request. Cool.

Of course, that's not the end of the story, and what's left over has both positive and negative sides. What we've seen here is really just scratching the surface of filters: There are still input and connection filters to talk about, as well as raw access to the Apache filtering API via bucket brigades. The downside is that constructing a filter that handles conditional GET requests properly isn't really as good as it could be due to the (current) incompleteness of the mod_perl 2.0 filtering API. However, these things aside, what we've accomplished here is already pretty impressive, and I hope it gets your creativity flowing and you start tinkering with the very cool new world of mod_perl 2.0.

If you want to try the code from this article, then it is available as a distribution from my Web site. Other good sources of information are the (growing) mod_perl 2.0 docs, particularly the section on filtering, which has been (very) recently updated with the latest information.

Stay Tuned ...

How did I know that the code here actually worked and did what I expected? I wrote a test suite for Clean.pm using the new Apache::Test framework and ran my code against a live Apache server under every situation I could think of. Apache::Test is probably the single best thing to come out of the mod_perl 2.0 redesign effort, and it is what I will share with you next time.


Many thanks to Stas Bekman, who reviewed this article, even though it meant having to deal with my questions, suggestions, and API gripes.

This week on Perl 6, week ending 2003-04-13

Another week passes, another summary gets written. How does he do it?

We start, as per usual with the still very quiet perl6-internals list.

hey sexy where you been?

The internals list's normally impeccable spam filters failed slightly this week and we actually saw a couple of spam messages sent via the RT system. The problem is being looked into.

Support for true and false properties

Stéphane Payrard posted a 'micropatch' to support the true and false properties that Perl 6 needs.

PMC elements() inaccessible from the assembler?

Stéphane also wondered how he should access a PMC's elements method via Parrot Assembly, as there didn't seem to be an appropriate operator defined in core.ops. Leo Tötsch said that it was known that there were inconsistencies between vtable definitions and the ops files and that a major cleanup was needed, which would probably happen once it became clear which vtable methods would remain and how core.ops needed to change.


Parrot on Win32

Clinton A Pierce announced that he'd built a Win32 Parrot distribution and had put a zip file up for downloading (and hopefully mirroring). He warned that it was very much a work in progress and the downloader should beware. But hey! Parrot on Win32! Bravo Clint!

In the same vein, Mattia Barbon posted a couple of patches to improve Win32 support.


http://geeksalad.org/tmp/Parrot_Dist_20030410.zip -- Parrot on Win32!



Dan's Blog

Dan Sugalski continues to post interesting stuff on his weblog, including a discussion of the challenges he faces in writing Parrot's Object spec.


Meanwhile over in perl6-language

The internals list is still very quiet, seeing all of 28 messages. Meanwhile perl6-language saw over 200 messages. Maybe it had something to do with Allison and Damian's Synopsis being posted on perl.com.


Incorporating WhatIf

David Storrs had wondered about the CPAN module WhatIf, which provides rollback functionality for arbitrary code. He wanted to know if support for this rollback capability could be added to the Perl 6 core. Luke Palmer liked the idea, and pointed to Larry's discussion of let subcall() in one of the appendices to Apocalypse 6. Simon Cozens thought that 'that way lies madness' and wondered if we weren't already a long way down that road. The thread got a little bit meta at this point, discussing what was and wasn't a valid rationale for adding something to the core rather than just implementing it in a module. Austin Hastings' rule of thumb (which seems to be a good one) is that something belongs in the core if it 'enables a paradigm', citing rules, continuations, multidispatch, thread support and the new object system as examples of paradigm-enabling stuff that belongs in the core. Dan Sugalski's rule of thumb is 'does it provide extra utility at marginal cost?' Spot the implementer.

It turns out that Larry's rule of thumb is pretty much the same as Austin's and he thinks we should add let funcall() to the core because 'management of hypotheticality is the essence of logic programming'.


http://search.cpan.org/author/SIMONW/Whatif-1.01/ -- The WhatIf module


== vs eq

The discussion of the meaning of

    \@a == \@b

continued. In particular, does it return true if @a and @b are bound to the same array, or does it return true simply if @a and @b have the same number of elements? John Williams noted that he was looking forward to Apocalypse 8, on references, which would hopefully clarify a lot of these issues.

The Over overloading of is

Piers Cawley pointed out that the three different behaviours of 'is' that Luke Palmer had worried about the week before were all just examples of setting traits on containers and provided some code in an attempt to show what he meant. Luke Palmer wasn't convinced (and I'm not entirely convinced myself) but thought it was probably best to reserve judgement until the Objects and Classes Apocalypse has been done.


Properties and Methods and Namespaces, oh my!

Piers Cawley tried to put Luke Palmer's worries about clashes between method and property names to rest, and failed to do so. Luke wanted to have lexically and dynamically scoped properties, but that gave Dan Sugalski the screaming abdabs. Luke Palmer came up with anonymous properties, an idea which I particularly like:

    my $visited = Property.new;

    sub breadth_first_traversal($root can LinksInterface, &code) {
        my @queue is Queue = $root;
        for @queue {
            next if .$visited;
            $_ = $_ but $visited;
            push @queue: .links;
        }
    }
I don't think any of the design team have commented on this particular idea but I think it's lovely and I hope it gets rolled into the language.



Paul showed a lump of Perl 5 code that depended on globs to do bulk setting of attributes in hash based objects and wanted to know how one could do the same thing in Perl 6. Responses were along two lines, the first being that you wouldn't be able to do that sort of thing at all with Perl 6 objects, and others pointing out that you didn't really need a glob to get the example given to work. Paul was enlightened.


alphabet-blind pattern matching

The discussion of how to match streams of objects as well as simply strings of characters using the grammar engine continued with a new subject line. And it continued to make my head hurt. I particularly liked Larry's observation that in 'treating patterns as too powerful for just test [...] we run the risk of making patterns too powerful for text'. Larry seemed to think that the idea of matching streams of objects with the grammar engine had merit though, which led to one of his wonderful 'thinking aloud' posts where he went through the various possibilities.


http://groups.google.com/groups -- Larry thinks out loud 'til his brain hurts

Synopsis questions

Angel Faus incurred the potential wrath of the summarizer by bundling up a whole host of questions relating to Synopsis 6 in a single post. Larry earned my undying gratitude by answering each question in a separate post with a new subject. Yay Larry.

As it happened the answers provoked very little in the way of discussion so I don't really need to handle each question separately in this summary. Larry answered all the questions.


Variable/value type syntax

Ralph Mellor wasn't keen on the syntax for declaring the types of values and variables and proposed a new syntax. People weren't keen. I have the really strong feeling that whoever writes the Perl 6 introductory texts will have an interesting time explaining the difference between variable and value types. At least Luke Palmer had a moment of clarity when he realised why using is for tying made sense...


Currying questions

Ralph also wondered if currying assumptions could be overridden by passing the relevant argument by name when the curried function is called. Luke Palmer didn't think it should be allowed. No call from the core design team yet.


Multimethod Invocants

Paul got confused by multimethods with multiple invocants and asked for clarification. Luke Palmer clarified things for him. The rule for multimethods is that multidispatch is only done on the invocants. Therefore a pair of declarations like:

    multi foo($me : Int $i)    {...}
    multi foo($me : String $s) {...}

would probably throw a compilation exception or at the very least a warning because of the collision between invocant types. If you really wanted to dispatch on the type of the second argument you'd have to write the method declarations as:

    multi foo($me, Int $i:)    {...}
    multi foo($me, String $s:) {...}

Larry pointed out that one generally calls multimethods as if they were subroutines; calling $obj.foo(1) would first try to find a method called foo, only falling back to multiple dispatch if there were no such method to find. All of this engendered confusion in Michael Lazzaro (and in me, if I'm honest, but now that it's been made clear, it makes sense).


Change in if syntax

Buddha Buck was caught out by the fact that, in Perl 6, you don't need parentheses 'round the test in an if statement, and wondered when things had changed. Luke Palmer pointed out that they were now optional, a convenience bought by prohibiting whitespace between a variable and its subscript (so %foo {bar} does not look up the value associated with 'bar' in the %foo hash). Brent Dax hoped that this no-whitespace rule was a simple tiebreaker, but Damian dashed that hope. The whitespace rule means that an opening brace after whitespace *always* denotes the beginning of a block or an anonymous hash. Some people don't like this.
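The rule can be illustrated with a few lines in the Perl 6 syntax as it was designed at the time (hypothetical examples, not taken from the discussion itself):

```
    if $x > 5 { ... }    # parentheses around the test are now optional

    %foo{'bar'}          # a subscript: no whitespace before the brace
    %foo {'bar'}         # NOT a subscript: a brace after whitespace
                         # always starts a block or an anonymous hash
```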


Named parameters are always optional?

John Siracusa wanted a way of creating a subroutine that would only accept named parameters, some or all of which would be required. With Perl 6 as it stands, this appeared to be impossible without adding extra code to the body of the sub. Luke Palmer thought that what John wanted was silly, and suggested that if John really wanted the ability he should just write a module to do it. After a few posts back and forth on this, Austin Hastings pointed out that this discussion was analogous to the earlier discussion about WhatIf and what should and shouldn't be in the core. He argued - by the 'enables a paradigm' rule of thumb - that there was no need for core support for requiring named arguments because it could be added easily with a module. John thought that adding this with a module meant that there would be no way to check parameters successfully at compilation time (actually, I think you could, with a suitably clever macro...).

By the end of the week John still wanted compulsory named arguments, but nobody else (who participated in the thread) seemed to be convinced that they were important enough to be implemented as part of the language.


Aliasing operations in Perl 6

Mark Jason Dominus asked for some clarification of the binding operator := and how it interacted with autodereferencing. Given code like:

    my $zero = 0;
    my $zeroref = \$zero;

    my $bound := $zeroref;

He wanted to know whether the value of $bound was 0 or \$zero. Damian answered that references only autodereference themselves in an array or hash context, so $bound would be \$zero.


Acknowledgements, Announcements and Apologies

Leon Brocard does not make his traditional appearance in this summary as he is on holiday. Normal service will hopefully be resumed soon.

If you've appreciated this summary, please consider one or more of the following options:

Synopsis 6

Editor's note: this document is out of date and remains here for historic interest. See Synopsis 6 for the current design information.

This document summarizes Apocalypse 6, which covers subroutines and the new type system.

Subroutines and Other Code Objects

Subroutines (keyword: sub) are noninheritable routines with parameter lists.

Methods (keyword: method) are inheritable routines that always have an associated object (known as their invocant) and belong to a particular class.

Submethods (keyword: submethod) are noninheritable methods, or subroutines masquerading as methods. They have an invocant and belong to a particular class.

Multimethods (keyword: multi) are routines that do not belong to a particular class, but which have one or more invocants.

Rules (keyword: rule) are methods (of a grammar) that perform pattern matching. Their associated block has a special syntax (see Synopsis 5).

Macros (keyword: macro) are routines whose calls execute as soon as they are parsed (i.e. at compile-time). Macros may return another source code string or a parse-tree.

Standard Subroutines

The general syntax for named subroutines is any of:

     my RETTYPE sub NAME ( PARAMS ) TRAITS {...}
    our RETTYPE sub NAME ( PARAMS ) TRAITS {...}
                sub NAME ( PARAMS ) TRAITS {...}

The general syntax for anonymous subroutines is:

    sub ( PARAMS ) TRAITS {...}

"Trait" is the new name for a compile-time (is) property. See Traits and Properties

Perl5ish Subroutine Declarations

You can still declare a sub without parameter list, as in Perl 5:

    sub foo {...}

Arguments still come in via the @_ array, but they are constant aliases to actual arguments:

    sub say { print qq{"@_"\n}; }   # args appear in @_

    sub cap { $_ = uc $_ for @_ }   # Error: elements of @_ are constant

If you need to modify the elements of @_, then declare it with the is rw trait:

    sub swap (*@_ is rw) { @_[0,1] = @_[1,0] }


Raw blocks are also executable code structures in Perl 6.

Every block defines a subroutine, which may either be executed immediately or passed in as a Code reference argument to some other subroutine.

"Pointy subs"

The arrow operator -> is almost a synonym for the anonymous sub keyword. The parameter list of a pointy sub does not require parentheses and a pointy sub may not be given traits.

    $sq = -> $val { $val**2 };  # Same as: $sq = sub ($val) { $val**2 };

    for @list -> $elem {        # Same as: for @list, sub ($elem) {
        print "$elem\n";        #              print "$elem\n";
    }                           #          }

Stub Declarations

To predeclare a subroutine without actually defining it, use a "stub block":

    sub foo {...};     # Yes, those three dots are part of the actual syntax

The old Perl 5 form:

    sub foo;

is a compile-time error in Perl 6 (for reasons explained in Apocalypse 6).

Globally Scoped Subroutines

Subroutines and variables can be declared in the global namespace, and are thereafter visible everywhere in a program.

Global subroutines and variables are normally referred to by prefixing their identifier with *, but it may be omitted if the reference is unambiguous:

    $*next_id = 0;
    sub *saith($text)  { print "Yea verily, $text" }

    module A {
        my $next_id = 2;    # hides any global or package $next_id
        saith($next_id);    # print the lexical $next_id;
        saith($*next_id);   # print the global $next_id;
    }

    module B {
        saith($next_id);    # Unambiguously the global $next_id
    }

Lvalue Subroutines

Lvalue subroutines return a "proxy" object that can be assigned to. It's known as a proxy because the object usually represents the purpose or outcome of the subroutine call.

Subroutines are specified as being lvalue using the is rw trait.

An lvalue subroutine may return a variable:

    my $lastval;
    sub lastval () is rw { return $lastval }

or the result of some nested call to an lvalue subroutine:

    sub prevval () is rw { return lastval() }

or a specially tied proxy object, with suitably programmed FETCH and STORE methods:

    sub checklastval ($passwd) is rw {
        my $proxy is Proxy(
                FETCH => sub ($self) {
                            return lastval();
                         },
                STORE => sub ($self, $val) {
                            die unless check($passwd);
                            lastval() = $val;
                         },
            );
        return $proxy;
    }

Operator Overloading

Operators are just subroutines with special names.

Unary operators are defined as prefix or postfix:

    sub prefix:OPNAME  ($operand) {...}
    sub postfix:OPNAME ($operand) {...}

Binary operators are defined as infix:

    sub infix:OPNAME ($leftop, $rightop) {...}

Bracketing operators are defined as circumfix. The leading and trailing delimiters together are the name of the operator.

    sub circumfix:LEFTDELIM...RIGHTDELIM ($contents) {...}
    sub circumfix:DELIMITERS ($contents) {...}

If the left and right delimiters aren't separated by "...", then the DELIMITERS string must have an even number of characters. The first half is treated as the opening delimiter and the second half as the closing.

Operator names can be any sequence of Unicode characters. For example:

    sub infix:(c)        ($text, $owner) { return $text but Copyright($owner) }
    method prefix:± (Num $x) returns Num { return +$x | -$x }
    multi postfix:!             (Int $n) { $n<2 ?? 1 :: $n*($n-1)! }
    macro circumfix:<!--...-->   ($text) { "" }

    my $document = $text (c) $me;

    my $tolerance = ±7!;

    <!-- This is now a comment -->

Parameters and Arguments

Perl 6 subroutines may be declared with parameter lists.

By default, all parameters are constant aliases to their corresponding arguments -- the parameter is just another name for the original argument, but the argument can't be modified through it. To allow modification, use the is rw trait. To pass by copy, use the is copy trait.

Parameters may be required or optional. They may be passed by position, or by name. Individual parameters may confer a scalar or list context on their corresponding arguments.

Arguments destined for required parameters must come before those bound to optional parameters. Arguments destined for positional parameters must come before those bound to named parameters.

Invocant Parameters

A method invocant is specified as the first parameter in the parameter list, with a colon (rather than a comma) immediately after it:

    method get_name ($self:) {...}
    method set_name ($me: $newname) {...}

The corresponding argument (the invocant) is evaluated in scalar context and is passed as the left operand of the method call operator:

    print $obj.get_name();

Multimethod invocants are specified at the start of the parameter list, with a colon terminating the list of invocants:

    multi handle_event ($window, $event: $mode) {...}    # two invocants

Multimethod invocant arguments are passed positionally, though the first invocant can be passed via the method call syntax:

    # Multimethod calls...
    handle_event($w, $e, $m);
    $w.handle_event($e, $m);

Invocants may also be passed using the indirect object syntax, with a colon after them. The colon is just a special form of the comma, and has the same precedence:

    # Indirect method call...
    set_name $obj: "Sam";

    # Indirect multimethod call...
    handle_event $w, $e: $m;

Passing too many or too few invocants is a fatal error.

The first invocant is always the topic of the corresponding method or multimethod.

Editor's note: this document is out of date and remains here for historic interest. See Synopsis 6 for the current design information.

Required Parameters

Required parameters are specified at the start of a subroutine's parameter list:

    sub numcmp ($x, $y) { return $x <=> $y }

The corresponding arguments are evaluated in scalar context and may be passed positionally or by name. To pass an argument by name, specify it as a pair: parameter_name => argument_value.

    $comparison = numcmp(2,7);
    $comparison = numcmp(x=>2, y=>7);
    $comparison = numcmp(y=>7, x=>2);

Passing the wrong number of required arguments is a fatal error.

The number of required parameters a subroutine has can be determined by calling its .arity method:

    $args_required = &foo.arity;

Optional Parameters

Optional positional parameters are specified after all the required parameters and each is marked with a ? before the parameter:

    sub my_substr ($str, ?$from, ?$len) {...}

The = sign introduces a default value:

    sub my_substr ($str, ?$from = 0, ?$len = Inf) {...}

Default values can be calculated at run-time. They can even use the values of preceding parameters:

    sub xml_tag ($tag, ?$endtag = matching_tag($tag) ) {...}

Arguments that correspond to optional parameters are evaluated in scalar context. They can be omitted, passed positionally, or passed by name:

    my_substr("foobar");            # $from is 0, $len is infinite
    my_substr("foobar",1);          # $from is 1, $len is infinite
    my_substr("foobar",1,3);        # $from is 1, $len is 3
    my_substr("foobar",len=>3);     # $from is 0, $len is 3

Missing optional arguments default to their default value, or to undef if they have no default.

Named Parameters

Named parameters follow any required or optional parameters in the signature. They are marked by a + before the parameter.

    sub formalize($text, +$case, +$justify) {...}

Arguments that correspond to named parameters are evaluated in scalar context. They can only be passed by name, so it doesn't matter what order you pass them in, so long as they follow any positional arguments:

    $formal = formalize($title, case=>'upper');
    $formal = formalize($title, justify=>'left');
    $formal = formalize($title, justify=>'right', case=>'title');

Named parameters are always optional. Default values for named parameters are defined in the same way as for optional parameters. Named parameters default to undef if they have no default.

List Parameters

List parameters capture a variable length list of data. They're used in subroutines like print, where the number of arguments needs to be flexible. They're also called "variadic parameters," because they take a variable number of arguments.

Variadic parameters follow any required or optional parameters. They are marked by a * before the parameter:

    sub duplicate($n, *@data, *%flag) {...}

Named variadic arguments are bound to the variadic hash (*%flag in the above example). Such arguments are evaluated in scalar context. Any remaining variadic arguments at the end of the argument list are bound to the variadic array (*@data above) and are evaluated in list context.

For example:

    duplicate(3, reverse=>1, collate=>0, 2, 3, 5, 7, 11, 14);

    # The @data parameter receives [2, 3, 5, 7, 11, 14]
    # The %flag parameter receives { reverse=>1, collate=>0 }
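The same binding can be sketched in Python, where `*data` and `**flag` play the roles of the variadic array and hash. Note one difference from the design above: Python requires named arguments to follow all positional ones, so the call is reordered.

```python
def duplicate(n, *data, **flag):
    # *data collects the remaining positional arguments;
    # **flag collects the named arguments
    return n, list(data), flag

n, data, flag = duplicate(3, 2, 3, 5, 7, 11, 14, reverse=1, collate=0)
```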

Variadic scalar parameters capture what would otherwise be the first elements of the variadic array:

    sub head(*$head, *@tail)         { return $head }
    sub neck(*$head, *$neck, *@tail) { return $neck }
    sub tail(*$head, *@tail)         { return @tail }

    head(1, 2, 3, 4, 5);        # $head parameter receives 1
                                # @tail parameter receives [2, 3, 4, 5]

    neck(1, 2, 3, 4, 5);        # $head parameter receives 1
                                # $neck parameter receives 2
                                # @tail parameter receives [3, 4, 5]

Variadic scalars still impose list context on their arguments.

Variadic parameters are treated lazily -- the list is only flattened into an array when individual elements are actually accessed:

        @fromtwo = tail(1..Inf);        # @fromtwo contains a lazy [2..Inf]
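The laziness can be sketched with Python iterators, where `itertools.count(1)` plays the role of `1..Inf` and the (illustrative) `tail` function never flattens its input into a concrete list:

```python
import itertools

def tail(args):
    # drop the first element lazily: the (possibly infinite) input
    # is never flattened into a concrete list
    it = iter(args)
    next(it)              # consume the "head"
    return it

from_two = tail(itertools.count(1))               # conceptually 1..Inf
first_three = list(itertools.islice(from_two, 3))
```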

Flattening Argument Lists

The unary prefix operator * flattens its operand (which allows the elements of an array to be used as an argument list). The * operator also causes its operand -- and any subsequent arguments in the argument list -- to be evaluated in list context.

    sub foo($x, $y, $z) {...}    # expects three scalars
    @onetothree = 1..3;          # array stores three scalars

    foo(1,2,3);                  # okay:  three args found
    foo(@onetothree);            # error: only one arg
    foo(*@onetothree);           # okay:  @onetothree flattened to three args

The * operator flattens lazily -- the array is only flattened if flattening is actually required within the subroutine. To flatten before the list is even passed into the subroutine, use the unary prefix ** operator:

    foo(**@onetothree);          # array flattened before &foo called
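Python's unary `*` in a call gives a rough (eager) analogue of the flattening behaviour, though it has no lazy variant:

```python
def foo(x, y, z):
    return x + y + z

one_to_three = [1, 2, 3]

# foo(one_to_three) would raise TypeError: only one argument supplied
result = foo(*one_to_three)   # unary * spreads the list into three arguments
```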

Pipe Operators

The variadic array of a subroutine call can be passed in separately from the normal argument list, by using either of the "pipe" operators: <== or ==>.

Each operator expects to find a call to a variadic subroutine on its "sharp" end, and a list of values on its "blunt" end:

    grep { $_ % 2 } <== @data;

    @data ==> grep { $_ % 2 };

First, it flattens the list of values on the blunt side. Then, it binds that flattened list to the variadic parameter(s) of the subroutine on the sharp side. So both of the calls above are equivalent to:

    grep { $_ % 2 } *@data;

Leftward pipes are a convenient way of explicitly indicating the typical right-to-left flow of data through a chain of operations:

    @oddsquares = map { $_**2 } sort grep { $_ % 2 } @nums;

    # more clearly written as...

    @oddsquares = map { $_**2 } <== sort <== grep { $_ % 2 } <== @nums;

Rightward pipes are a convenient way of reversing the normal data flow in a chain of operations, to make it read left-to-right:

    @oddsquares =
            @nums ==> grep { $_ % 2 } ==> sort ==> map { $_**2 };

If the operand on the sharp end of a pipe is not a call to a variadic operation, then it must be a variable, in which case the list operand is assigned to the variable. This special case allows for "pure" processing chains:

    @oddsquares <== map { $_**2 } <== sort <== grep { $_ % 2 } <== @nums;

    @nums ==> grep { $_ % 2 } ==> sort ==> map { $_**2 } ==> @oddsquares;

Closure Parameters

Parameters declared with the & sigil take blocks, closures, or subroutines as their arguments. Closure parameters can be required, optional, or named.

    sub limited_grep (Int $count, &block, *@list) {...}

    # and later...

    @first_three = limited_grep 3 {$_<10} @data;

Within the subroutine, the closure parameter can be used like any other lexically scoped subroutine:

    sub limited_grep (Int $count, &block, *@list) {
        ...
        if block($nextelem) {...}
        ...
    }

The closure parameter can have its own signature (from which the parameter names may be omitted):

    sub limited_Dog_grep ($count, &block(Dog), Dog *@list) {...}

and even a return type:

    sub limited_Dog_grep ($count, &block(Dog) returns Bool, Dog *@list) {...}

When an argument is passed to a closure parameter that has this kind of signature, the argument must be a Code object with a compatible parameter list and return type.

Unpacking Array Parameters

Instead of specifying an array parameter as an array:

    sub quicksort (@data, ?$reverse, ?$inplace) {
        my $pivot := shift @data;
        ...
    }

it may be broken up into components in the signature, by specifying the parameter as if it were an anonymous array of parameters:

    sub quicksort ([$pivot, *@data], ?$reverse, ?$inplace) {
        ...
    }

This subroutine still expects an array as its first argument, just like the first version.


Attributive parameters

If a method's parameter is declared with a . after the sigil (like an attribute):

    method initialize($.name, $.age) {}

then the argument is assigned directly to the object's attribute of the same name. This avoids the frequent need to write code like:

    method initialize($name, $age) {
        $.name = $name;
        $.age  = $age;
    }

Placeholder Variables

Even though every bare block is a closure, bare blocks can't have explicit parameter lists. Instead, they use "placeholder" variables, marked by a caret (^) after their sigils.

Using placeholders in a block defines an implicit parameter list. The signature is the list of distinct placeholder names, sorted in Unicode order. So:

    { $^y < $^z && $^x != 2 }

is a shorthand for:

    sub ($x,$y,$z) { $y < $z && $x != 2 }
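The placeholder-to-signature rule can be sketched in Python by treating the block as a source string; the `placeholder_signature` function and its regex are illustrative, not part of any implementation:

```python
import re

def placeholder_signature(block_source):
    # collect the distinct $^name placeholders,
    # sorted in Unicode (code point) order
    names = set(re.findall(r"\$\^(\w+)", block_source))
    return sorted(names)

sig = placeholder_signature("{ $^y < $^z && $^x != 2 }")
```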


Standard Type Names

These are the standard type names in Perl 6 (at least this week):

    bit         single native bit
    int         native integer
    str         native string
    num         native floating point
    ref         native pointer 
    bool        native boolean
    Bit         Perl single bit (allows traits, aliasing, etc.)
    Int         Perl integer (allows traits, aliasing, etc.)
    Str         Perl string
    Num         Perl number
    Ref         Perl reference
    Bool        Perl boolean
    Array       Perl array
    Hash        Perl hash
    IO          Perl filehandle
    Code        Base class for all executable objects
    Routine     Base class for all nameable executable objects
    Sub         Perl subroutine
    Method      Perl method
    Submethod   Perl subroutine acting like a method
    Macro       Perl compile-time subroutine
    Rule        Perl pattern
    Block       Base class for all unnameable executable objects
    Bare        Basic Perl block
    Parametric  Basic Perl block with placeholder parameters
    Package     Perl 5 compatible namespace
    Module      Perl 6 standard namespace
    Class       Perl 6 standard class namespace
    Object      Perl 6 object
    Grammar     Perl 6 pattern matching namespace
    List        Perl list
    Lazy        Lazily evaluated Perl list
    Eager       Non-lazily evaluated Perl list

Value Types

Explicit types are optional. Perl variables have two associated types: their "value type" and their "variable type".

The value type specifies what kinds of values may be stored in the variable. A value type is given as a prefix or with the returns or of keywords:

    my Dog $spot;
    my $spot returns Dog;
    my $spot of Dog;

    our Animal sub get_pet() {...}
    sub get_pet() returns Animal {...}
    sub get_pet() of Animal {...}

A value type on an array or hash specifies the type stored by each element:

    my Dog @pound;  # each element of the array stores a Dog

    my Rat %ship;   # the value of each entry stores a Rat

Variable Types

The variable type specifies how the variable itself is implemented. It is given as a trait of the variable:

    my $spot is Scalar;             # this is the default
    my $spot is PersistentScalar;
    my $spot is DataBase;

Defining a variable type is the Perl 6 equivalent to tying a variable in Perl 5.

Hierarchical Types

A nonscalar type may be qualified, in order to specify what type of value each of its elements stores:

    my Egg $cup;                       # the value is an Egg
    my Egg @carton;                    # each elem is an Egg
    my Array of Egg @box;              # each elem is an array of Eggs
    my Array of Array of Egg @crate;   # each elem is an array of arrays of Eggs
    my Hash of Array of Recipe %book;  # each value is a hash of arrays of Recipes

Each successive of makes the type on its right a parameter of the type on its left. So:

    my Hash of Array of Recipe %book;

is equivalent to:
    my Hash(returns=>Array(returns=>Recipe)) %book;

Because the actual variable can be hard to find when complex types are specified, there is a postfix form as well:

    my Hash of Array of Recipe %book;           # HoHoAoRecipe
    my %book of Hash of Array of Recipe;        # same thing
    my %book returns Hash of Array of Recipe;   # same thing

The returns form is more commonly seen in subroutines:

    my Hash of Array of Recipe sub get_book () {...}
    my sub get_book () of Hash of Array of Recipe {...}
    my sub get_book returns Hash of Array of Recipe {...}

Junctive Types

Anywhere you can use a single type you can use a junction of types:

    my Int|Str $error = $val;              # can assign if $val~~Int or $val~~Str

    if $shimmer.isa(Wax & Topping) {...}   # $shimmer must inherit from both

Parameter Types

Parameters may be given types, just like any other variable:

    sub max (int @array is rw) {...}
    sub max (@array of int is rw) {...}

Return Types

On a scoped subroutine, a return type can be specified before or after the name:

    our Egg sub lay {...}
    our sub lay returns Egg {...}

    my Rabbit sub hat {...}
    my sub hat returns Rabbit {...}

If a subroutine is not explicitly scoped, then it belongs to the current namespace (module, class, grammar, or package). Any return type must go after the name:

    sub lay returns Egg {...}

On an anonymous subroutine, any return type can only go after the name:

    $lay = sub returns Egg {...};

unless you use the "anonymous declarator" (a/an):

    $lay = an Egg sub {...};
    $hat = a Rabbit sub {...};

Properties and Traits

Compile-time properties are now called "traits." The is NAME (DATA) syntax defines traits on containers and subroutines, as part of their declaration:

    my $pi is constant = 3;

    my $key is Persistent(file=>".key");

    sub fib is cached {...}

The will NAME BLOCK syntax is a synonym for is NAME (BLOCK):

    my $fh will undo { close $fh };    # Same as: my $fh is undo({ close $fh });

The but NAME (DATA) syntax specifies run-time properties on values:

    my $pi = 3 but Approximate("legislated");

    sub system {
        return $error but false if $error;
        return 0 but true;
    }

Subroutine Traits

These traits may be declared on the subroutine as a whole (not on individual parameters).

is signature
The signature of a subroutine -- normally declared implicitly, by providing a parameter list and/or return type.
returns/is returns
The type returned by a subroutine.
will do
The block of code executed when the subroutine is called -- normally declared implicitly, by providing a block after the subroutine's signature definition.
is rw
Marks a subroutine as returning an lvalue.
is parsed
Specifies the rule by which a macro call is parsed.
is cached
Marks a subroutine as being memoized
is inline
Suggests to the compiler that the subroutine is a candidate for optimization via inlining.
is tighter/is looser/is equiv
Specifies the precedence of an operator relative to an existing operator.
is assoc
Specifies the associativity of an operator.
Mark blocks that are to be unconditionally executed before/after the subroutine's do block. These blocks must return a true value, otherwise an exception is thrown.
Mark blocks that are to be conditionally executed before or after the subroutine's do block. The return values of these blocks are ignored.


Parameter Traits

The following traits can be applied to many types of parameters.

is constant
Specifies that the parameter cannot be modified (e.g. assigned to, incremented). It is the default for parameters.
is rw
Specifies that the parameter can be modified (assigned to, incremented, etc). Requires that the corresponding argument is an lvalue or can be converted to one.

When applied to a variadic parameter, the rw trait applies to each element of the list:

    sub incr (*@vars is rw) { $_++ for @vars }

is ref
Specifies that the parameter is passed by reference. Unlike is rw, the corresponding argument must already be a suitable lvalue. No attempt at coercion or autovivification is made.
is copy
Specifies that the parameter receives a distinct, read-writeable copy of the original argument. This is commonly known as "pass-by-value."

    sub reprint ($text, $count is copy) {
        print $text while $count-- > 0;
    }

is context(TYPE)
Specifies the context that the parameter imposes on its argument. Typically used to cause a final list parameter to apply a series of scalar contexts:

    # &format may have as many arguments as it likes,
    # each of which is evaluated in scalar context

    sub format(*@data is context(Scalar)) {...}

Advanced Subroutine Features

The &_ routine

&_ is always an alias for the current subroutine, much like the $_ alias for the current topic:

    my $anonfactorial = sub (Int $n) {
                            return 1 if $n<2;
                            return $n * &_($n-1);
                        };

The caller Function

The caller function returns an object that describes a particular "higher" dynamic scope, from which the current scope was called.

    print "In ",           caller.sub,
          " called from ", caller.file,
          " line ",        caller.line, "\n";

caller may be given arguments telling it what kind of higher scope to look for, and how many such scopes to skip over when looking:

    $caller = caller;                      # immediate caller
    $caller = caller Method;               # nearest caller that is method
    $caller = caller Bare;                 # nearest caller that is bare block
    $caller = caller Sub, skip=>2;         # caller three levels up
    $caller = caller Block, label=>'Foo';  # caller whose label is 'Foo'

The want Function

The want function returns an object that contains information about the context in which the current block, closure, or subroutine was called.

The returned context object is typically tested with a smart match (~~) or a when:

    given want {
        when Scalar {...}           # called in scalar context
        when List   {...}           # called in list context
        when Lvalue {...}           # expected to return an lvalue
        when 2      {...}           # expected to return two values
    }

or has the corresponding methods called on it:

       if (want.Scalar)    {...}    # called in scalar context
    elsif (want.List)      {...}    # called in list context
    elsif (want.rw)        {...}    # expected to return an lvalue
    elsif (want.count > 2) {...}    # expected to return more than two values

The leave Function

A return statement causes the innermost surrounding subroutine, method, rule, macro or multimethod to return.

To return from other types of code structures, the leave function is used:

    leave;                      # return from innermost block of any kind
    leave Method;               # return from innermost calling method
    leave &_ <== 1,2,3;         # Return from current sub. Same as: return 1,2,3
    leave &foo <== 1,2,3;       # Return from innermost surrounding call to &foo
    leave Loop, label=>'COUNT'; # Same as: last COUNT;


The temp Function

The temp function temporarily replaces a variable, subroutine or other object in a given scope:

    {
        temp $*foo = 'foo';      # Temporarily replace global $foo
        temp &bar = sub {...};   # Temporarily replace sub &bar
    } # Old values of $*foo and &bar reinstated at this point

temp invokes its argument's .TEMP method. The method is expected to return a reference to a subroutine that can later restore the current value of the object. At the end of the lexical scope in which the temp was applied, the subroutine returned by the .TEMP method is executed.

The default .TEMP method for variables simply creates a closure that assigns the variable's pre-temp value back to the variable.
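That default save-and-restore behaviour can be sketched as a Python context manager; the `temp()` helper and the `state` dict standing in for a variable's storage are illustrative:

```python
import contextlib

@contextlib.contextmanager
def temp(ns, name, new_value):
    # the default behaviour: save the current value, install the new one,
    # and run the "restorer" when the scope ends
    saved = ns[name]
    ns[name] = new_value
    try:
        yield
    finally:
        ns[name] = saved

state = {"foo": "old"}
with temp(state, "foo", "new"):
    inside = state["foo"]    # the replacement value
after = state["foo"]         # restored on scope exit
```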

New kinds of temporization can be created by writing storage classes with their own .TEMP methods:

    class LoudArray is Array {
        method TEMP {
            print "Replacing $_.id() at $(caller.location)\n";
            my $restorer = .SUPER::TEMP();
            return {
                print "Restoring $_.id() at $(caller.location)\n";
                $restorer();
            };
        }
    }
You can also modify the behaviour of temporized code structures, by giving them a TEMP block. As with .TEMP methods, this block is expected to return a closure, which will be executed at the end of the temporizing scope to restore the subroutine to its pre-temp state:

    my $next = 0;
    sub next {
        my $curr = $next++;
        TEMP {{ $next = $curr }}  # TEMP block returns the closure { $next = $curr }
        return $curr;
    }

The .wrap Method

Every subroutine has a .wrap method. This method expects a single argument consisting of a block, closure or subroutine. That argument must contain a call to the special call function:

    sub thermo ($t) {...}   # set temperature in Celsius, returns old temp

    # Add a wrapper to convert from Fahrenheit...

    $id = &thermo.wrap( { call( ($^t-32)/1.8 ) } );

The call to .wrap replaces the original subroutine with the closure argument, and arranges that the closure's call to call invokes the original (unwrapped) version of the subroutine. In other words, the call to .wrap has more or less the same effect as:

    &old_thermo := &thermo;
    &thermo := sub ($t) { old_thermo( ($t-32)/1.8 ) }
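The wrap/unwrap mechanics can be sketched in Python, using a dict as a stand-in for the symbol table; the `wrap`, `unwrap`, and `funcs` names are illustrative, and the original function is reused as the "identifier":

```python
def wrap(funcs, name, wrapper):
    # replace funcs[name]; the wrapper receives `call`, the unwrapped original
    original = funcs[name]
    funcs[name] = lambda *args: wrapper(original, *args)
    return original          # serves as the identifier for unwrapping

def unwrap(funcs, name, identifier):
    funcs[name] = identifier

funcs = {"thermo": lambda t: t}   # set temperature in Celsius

ident = wrap(funcs, "thermo", lambda call, t: call((t - 32) / 1.8))
boiling = funcs["thermo"](212)    # Fahrenheit in, converted via `call`
unwrap(funcs, "thermo", ident)
celsius = funcs["thermo"](100)    # back to the unwrapped behaviour
```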

The call to .wrap returns a unique identifier that can later be passed to the .unwrap method, to undo the wrapping:

    &thermo.unwrap($id);
A wrapping can also be restricted to a particular dynamic scope with temporization:

    # Add a wrapper to convert from Kelvin
    # wrapper self-unwraps at end of current scope

    temp &thermo.wrap( { call($^t + 273.16) } );

The .assuming Method

Every subroutine has an .assuming method. This method takes a series of named arguments, whose names must match parameters of the subroutine itself:

    &textfrom := &substr.assuming(str=>$text, len=>Inf);

It returns a reference to a subroutine that implements the same behavior as the original subroutine, but has the values passed to .assuming already bound to the corresponding parameters:

    $all  = $textfrom(0);   # same as: $all  = substr($text,0,Inf);
    $some = $textfrom(50);  # same as: $some = substr($text,50,Inf);
    $last = $textfrom(-1);  # same as: $last = substr($text,-1,Inf);
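.assuming is essentially partial application. In Python terms, with `functools.partial` and an illustrative `substr` stand-in:

```python
from functools import partial

def substr(s, start=0, length=None):
    # illustrative stand-in for substr(str, from, len)
    return s[start:] if length is None else s[start:start + length]

text = "hello world"
textfrom = partial(substr, text, length=None)   # str and len pre-bound

all_text = textfrom(0)    # like substr(text, 0, Inf)
some     = textfrom(6)    # like substr(text, 6, Inf)
```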

The result of a use statement is a (compile-time) object that also has an .assuming method, allowing the user to bind parameters in all the module's subroutines/methods/etc. simultaneously:

    (use IO::Logging).assuming(logfile => ".log");

Other Matters

Anonymous Hashes vs. Blocks

{...} is always a block/closure unless it consists of a single list, the first element of which is either a hash or a pair.

The standard pair LIST function is equivalent to:

    sub pair (*@LIST) {
        my @pairs;
        for @LIST -> $key, $val {
            push @pairs, $key=>$val;
        }
        return @pairs;
    }
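The same grouping is easy to sketch in Python; the `pair` function below is an illustrative analogue, pairing consecutive elements of a flat list:

```python
def pair(*items):
    # group a flat argument list into (key, value) pairs,
    # like the pair builtin described above
    it = iter(items)
    return [(key, val) for key, val in zip(it, it)]

pairs = pair("a", 1, "b", 2)
```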

The standard hash function takes a block, evaluates it in list context, and constructs an anonymous hash from the resulting key/value list:

    $ref = hash { 1, 2, 3, 4, 5, 6 };   # Anonymous hash
    $ref = sub  { 1, 2, 3, 4, 5, 6 };   # Anonymous sub returning list
    $ref =      { 1, 2, 3, 4, 5, 6 };   # Anonymous sub returning list
    $ref =      { 1=>2, 3=>4, 5=>6 };   # Anonymous hash
    $ref =      { 1=>2, 3, 4, 5, 6 };   # Anonymous hash

Pairs as lvalues

Pairs can be used as lvalues. The value of the pair is the recipient of the assignment:

    (key => $var) = "value";

When binding pairs, names can be used to "match up" lvalues and rvalues:

    (who => $name, why => $reason) := (why => $because, who => "me");

Out-of-Scope Names

$CALLER::varname specifies the $varname visible in the dynamic scope from which the current block/closure/subroutine was called.

$MY::varname specifies the lexical $varname declared in the current lexical scope.

$OUTER::varname specifies the $varname declared in the lexical scope surrounding the current lexical scope (i.e. the scope in which the current block was defined).

This week on Perl 6, week ending 2003-04-06

Welcome my friends to the show that never ends. Yes, it's another of Piers Cawley's risible attempts to summarize the week's happenings in the Perl 6 development community. We start, as usual, with events in the perl6-internals world (not the perk6-internals world, obviously, that would be the sort of foolish typo that would never make it into any mail sent to the Perl 6 lists) where things have been quiet... too quiet. I think they're planning something.

Documentation for PMCs

Alberto Simoes posted some documentation for the Array PMC and suggested somewhere for it to be included in the Parrot distribution.


Extension PMC vs built-in

K. Stol asked how to go about creating extension PMCs which he could use without having to recompile the entire Parrot VM. No answers yet.


Documentation problem with set_global

K. Stol reported a problem with parrot_assembly.pod: apparently the docs and the code disagreed about set_global. Leopold Tötsch agreed that this was a documentation bug, but pointed out that it was far from being the only one. He suggested that parrot_assembly.pod should contain only a general description of Parrot assembly, with details of 'real' ops in generated docs like core_ops.pod and information on as-yet-unimplemented ops in the appropriate PDDs.


Building IMCC in Win32

Clinton A Pierce's work last week to get IMCC building in a Win32 environment was admired by Leo Tötsch, who promised to check the needed patches into the distribution.


Typos in rx.ops

Cal Henderson submitted a patch to fix some typos in rx.ops, which were applied. Anton Berezin sent in a further patch to fix a few more errors.


Parrot regex engine

Benjamin Goldberg praised rx.ops, commenting that he was 'amazed at how cool it is' and wondered if the work was based on something else. Steve Fink thought it was original to Brent Dax. Steve also pointed to an unfinished draft of his detailed description of how at least one of Parrot's regex implementations works.


http://0xdeadbeef.net/~sfink/uploads/regex.pod -- Steve Fink's description

Dan Explains Everything

Away from the internals list, Dan Sugalski has been doing some fascinating brain dumps about the 'whys' of Parrot in his weblog 'Squawks of the Parrot'. Specifically, he's been addressing why JVM and .NET weren't chosen. In the course of this discussion he's come up with a rather neat one sentence description of continuations: ``A continuation is essentially a closure that, in addition to closing over the lexical environment, also closes over the control chain.''

Anyway, I commend the whole blog to you.


Meanwhile over in perl6-language

The perl6-language list was where all the action was this week with the list attracting far more traffic than the internals list at the moment.

How shall threads work in P6?

Austin Hastings has 'been thinking about closures, continuations, and coroutines, and one of the interfering points has been threads.' Austin wants to know what the Perl 6 threading model will look like, and threw out a few suggestions. Matthijs van Duin thought that preemptive threading in the core would be unnecessary, arguing that cooperative threading would solve everyone's problems. Almost nobody agreed with him. Most tellingly, Dan disagreed with him. Parrot will be getting preemptive threads and that's not a point for negotiation.

Larry popped up to point out that, at the language level, the underlying thread implementation (whether preemptive or cooperative) really didn't matter; what mattered more to the user was how much information is shared between threads. Larry said he thought that both the pthreads (all globals shared by default) and ithreads (all globals unshared by default) approaches to threading were 'wrong to some extent'.



Conditional returns?

Michael Lazzaro was concerned about the inefficiency of

    return result_of_big_long_calculation(...) 
        if result_of_big_long_calculation(...);

and the ugliness of:

    my $x = result_of_big_long_calculation(...);
    return $x if $x;

He wondered if there was a less ugly way to do it in Perl 6. There is:

    given result_of_big_long_calculation(...) { return $_ when true }

was the most popular solution, but Dave Whipp pointed out that it would also be possible to write:

    sub return_if_true ($value) {
        if $value {
            leave where => caller(1), value => $value;
        }
    }

and then just do:

    return_if_true result_of_big_long_calculation(...);
You know, I like Perl 6 more and more.


== vs. eq

Luke Palmer wondered if there was anything to be gained by getting rid of the distinction between == and eq and falling back to a polymorphic ==. Brent Dax pointed out that the ~~ 'smartmatch' operator did exactly that. Luke said that ~~ didn't meet his needs (its behaviour with hashes, for instance, isn't what you want when trying to find out if two hashes are equal). Smylers pointed out that Larry had already rejected the idea (and had given good reasons why). Luke still wanted a 'generic equality' operator though. Marco Baringer threw a spanner in the works by pointing out that no such thing as 'generic equality' exists. Luke also came up with a symbol for an identity operator, =:=, defined as:

    sub operator:=:= ( $a, $b ) { $a.id == $b.id }
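For readers more comfortable in Perl 5, here's a rough analogue of the proposed identity test using Scalar::Util's refaddr. The `identical` helper is my own invention for illustration, not anything from the thread:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Identity: do the two references point at the very same thing?
sub identical { refaddr( $_[0] ) == refaddr( $_[1] ) }

my @foo = ( 1, 2, 3 );
my $x = \@foo;          # same underlying array...
my $y = \@foo;          # ...as this one
my $z = [ 1, 2, 3 ];    # equal contents, but a different identity

# identical($x, $y) is true; identical($x, $z) is false
```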

Michael Lazzaro isn't convinced of the need for a universal .id method, but he does like the idea of an identity test operator. There was then some discussion of

    \@a == \@b;

which, Michael claims, would compare the lengths of @a and @b rather than their addresses. He backed this up with a quote from Larry, who said ``But it's probably fair to say that $foo and @foo always behave identically in a scalar context''. I think Michael's wrong because \@a is not the same as $a. By his argument it seems that even

    my $array_ref = \@foo;
    my $ary_ref2  = [1, 2, 3];

won't work, which seems wrong somehow.

Discussion continued with various different possible implementations of 'generic' equality operators thrown up for consideration. One subthread even saw the return of Tom ``I thought he'd got a life and left the Perl community'' Christiansen in a thread that ended up talking about transfinite numbers and the cardinality of infinite sets (which was interesting, but rather beside the point). Andy Wardley got bonus points for making me laugh: ``Gödel seems to think there is an answer, but he doesn't know what it is. Turing is trying to work it out on a piece of paper, but he can't say when (or if) he'll be finished.'' Okay, it may not make you laugh, but my degree is in maths.


http://groups.google.com/groups -- Why generic equality doesn't exist

is is overoverloaded?

Luke Palmer was concerned that is has too many meanings. It gets used for setting variable traits, defining base classes and setting up tied variables. Luke is concerned that these appear to be too varied in meaning, and proposed at least adding a new isa keyword for defining base classes. There was some discussion, but I'm not sure that either side is convinced of the other's rectitude.


Short-circuiting user-defined operators

Joe Gottman wondered if there would be a way to write an operator which would short circuit in the same way as && and ||. He wondered if this would be a function trait (suggesting is short_circuit), or something cunning to do with parameters. Larry thought Joe was trying too hard, proposing:

    sub infix:!! ($lhs, &rhs) is equiv(&infix:||) {...}

He noted that if that couldn't be made to work you might have to declare the operator as a macro, but that it still shouldn't be too hard to implement. Dave Whipp wasn't too sure about Larry's assertion that declaring a parameter as &foo should force the RHS of that operator to be a closure. What, he wondered, would happen with 10 !! &closure?
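A Perl 5 sketch of why taking the right-hand side as a closure gives you short-circuiting for free; `orelse` is a made-up name for illustration, not Larry's proposal verbatim:

```perl
use strict;
use warnings;

# The RHS arrives as a code ref, so it is only evaluated on demand --
# exactly the ||-ish laziness wanted from a user-defined operator.
sub orelse {
    my ( $lhs, $rhs ) = @_;
    return $lhs if $lhs;    # true LHS: the closure never runs
    return $rhs->();
}

my $rhs_runs = 0;
my $v = orelse( 42, sub { $rhs_runs++; 'fallback' } );
# $v is 42 and $rhs_runs is still 0 -- the closure was skipped
my $w = orelse( 0, sub { $rhs_runs++; 'fallback' } );
# $w is 'fallback' and $rhs_runs is now 1
```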



Ruminating RFC 93 - alphabet-blind pattern matching

Yary Hluchan has been reading Perl 6 RFC 93, which Larry discussed in Apocalypse 5, and mused about doing regular-expression-type matches on streams of 'things', where the things weren't necessarily Unicode characters. At one point, though, Larry popped up to say that the discussion seemed to run contrary to what he'd said about RFC 93 in Apocalypse 5.

Further down the thread, Ed Peschko wondered about using the Regex engine 'in reverse' to generate a list of all the possible strings that a regular expression could match. Everyone politely assumed that when Ed referred to the rule rx/a*/ he really meant rx/^a*$/ and got stuck into the problem. Matthijs van Duin pointed out that to get a 'useful' list of productions you'd have to go about producing all possible matches of length 'n' before giving any possibilities of length n+1 (where 'length' counts atoms I think...). Once Luke Palmer had that key insight he showed off an implementation that could generate such a list. Piers Cawley used Luke's cunning implementation to show a lazy generate_all method that would iterate through all the possible productions of a rule. Luke said that he hoped it wouldn't actually need such a wrapper to be lazy. Personally I think that might be a hope too far...
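This isn't Luke's code, but for a pattern as simple as rx/^a*$/ the shortest-first ordering can be sketched as a closure-based iterator:

```perl
use strict;
use warnings;

# Yields successive productions of /^a*$/, shortest first, one per
# call -- a crude stand-in for a lazy generate_all.
sub make_generator {
    my $n = 0;
    return sub { return 'a' x $n++ };
}

my $next = make_generator();
my @first_four = map { $next->() } 1 .. 4;
# '', 'a', 'aa', 'aaa' -- and every one matches /^a*$/
```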


http://groups.google.com/groups -- Luke's code, with explication further down the thread

Named vs. Variadic Parameters

This thread continued from last week. The problem is that, if you want to define a function that you would call in the following fashion:

    foo $pos, opt_nam => 'value', 'slurpy', 'list', ...;

then your signature needs to look like:

    sub foo ( $pos, *@slurpy, +$opt_nam ) {...}

which seems rather counterintuitive (though mixing positional and non-positional parameters could be seen as counterintuitive whatever happens...). Various people argued for longer forms to denote the differences between positional, named, optional, etc. arguments and proposed various solutions. Larry didn't sound all that convinced, pointing out that if you wanted to implement such parameter specification styles you could always use a macro, and declared that ``Perl 6 is designed to allow cultural experimentation. You may reclassify any part of Perl 6 as a bike shed, and try to persuade other people to accept your taste in color schemes. Survival of the prettiest, and all that...''



Implicit threading

The discussion of how to write short-circuiting operators spawned a subthread (which, rather annoyingly, shared the same subject...) triggered by a suggestion of Dave Whipp's to do lazy initialization of some variables by spawning subthreads. The idea being that one could do

   my $a is lazy := expensive_fn(...);

or something, and Perl would immediately spawn a subthread to calculate the value of expensive_fn(...) then, at the point you come to use the value of $a, the main thread would do a thread join and block if the subthread hadn't finished. Austin Hastings liked the idea and went on to suggest other places in which implicit threading could be used. My gut feeling is that this is the sort of thing that will get implemented in a module on top of whatever explicit threading system we get in Perl 6.
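Ignoring the threading half of the proposal, the lazy-binding half can be sketched in Perl 5 as a memoized thunk; `lazily` is an invented helper for illustration (Dave's version would additionally compute the value in a background thread):

```perl
use strict;
use warnings;

# Returns a closure that runs $code once, on first demand, and
# caches the result for later calls.
sub lazily {
    my ($code) = @_;
    my ( $done, $value );
    return sub { $value = $code->() unless $done++; return $value };
}

my $calls = 0;
my $a = lazily( sub { $calls++; 6 * 7 } );
# nothing has run yet: $calls is 0
my $v = $a->();    # forces the computation: $v is 42, $calls is 1
$a->();            # cached: $calls stays at 1
```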

http://groups.google.com/groups -- Dave Whipp's cunning idea

http://groups.google.com/groups -- Further discussion here

Properties and Methods

Luke Palmer was concerned about potential naming clashes between properties and methods. He thought that things would be safer if properties were accessed with some syntax other than method calls. He was also worried about method naming clashes between the methods of things in a junction, and of the Junction class itself. Nobody had commented on this by the end of the week.


Patterns and Junctions

Adam D. Lopresto had a suggestion for using two different kinds of alternation in regular expressions. In Adam's proposal, alternation with | could ignore the order of the alternatives when attempting to make a match and alternation with || would do the kind of order respecting alternation we're used to. I confess that I wondered what on earth Adam was on about, but Luke Palmer liked the idea, pointing out a way something like it could be used for globally optimizing large grammars. However, he was unconvinced by the syntax and suggested that it would be better if we could mark rules as being 'side effect free' and thus capable of being run in arbitrary order.


Acknowledgements, Announcements and Apologies

The schedules for both OSCON and YAPC::USA have been announced and there's a fair amount of Perl 6 and Parrot related content at both of them. I shall be attending both conferences (but only speaking at YAPC -- about refactoring tools and about Pixie, the object persistence tool I work on, neither of which are Perl 6 related) and driving from YAPC to OSCON with probable stops in Washington DC, Philadelphia, Boston, Burlington VT, Ottawa and Chicago (where we cheat and hop on a train). Hopefully I'll get to meet up with some of my happy readers at some or all of these stops and at the conferences.

http://www.yapc.org/America/ -- YAPC

http://conferences.oreillynet.com/os2003/ -- OSCON

I would like to apologize to Leon Brocard for failing to mention his name in this week's summary. Leon is offering a 'Little Languages in Parrot' talk at OSCON though.

If you've appreciated this summary, please consider one or more of the following options:

This week's summary was again sponsored by Darren Duncan. Thanks Darren. If you'd like to become a summary sponsor, drop me a line at p6summarizer@bofh.org.uk

Apache::VMonitor - The Visual System and Apache Server Monitor

Stas Bekman is a coauthor of O'Reilly's upcoming Practical mod_perl.

It's important to be able to monitor your production system's health. You want to monitor the memory and file system utilization, the system load, how much memory the processes use, whether you are running out of swap space, and so on. All these tasks are feasible when one has interactive (telnet/ssh/other) access to the box the Web server is running on, but it's quite a mess, since different Unix tools report on different parts of the system. This means that you cannot watch the whole system at once; it requires lots of typing, since one has to switch from one utility to another, unless many connections are open and each terminal is dedicated to reporting on something specific.

But if you are running a mod_perl-enabled Apache server, then you are in good company, since it allows you to run a special module called Apache::VMonitor that provides most of the desired reports at once.


The Apache::VMonitor module provides even better monitoring functionality than top(1). It gives all the relevant information top(1) does, plus all the Apache specific information provided by Apache's mod_status module, such as request processing time, last request's URI, number of requests served by each child, etc. In addition, it emulates the reporting functions of the top(1), mount(1), df(1) utilities. There is a special mode for mod_perl processes. It has visual alerting capabilities and a configurable automatic refresh mode. It provides a Web interface, which can be used to show or hide all sections dynamically.

The module provides two main viewing modes:

  1. Multi-processes and system overall status reporting mode
  2. A single process extensive reporting system

Prerequisites and Configuration

You need to have Apache::Scoreboard installed and configured in httpd.conf, which in turn requires mod_status to be installed. You also have to enable the extended status for mod_status, for this module to work properly. In httpd.conf add:

  ExtendedStatus On

You also need Time::HiRes and GTop to be installed. GTop relies in turn on the libgtop library, which is not available for all platforms.

And, of course, you need a running mod_perl-enabled Apache server.

To enable this module, add the following configuration to httpd.conf:

  <Location /system/vmonitor>
    SetHandler perl-script
    PerlHandler Apache::VMonitor
  </Location>

The monitor will be displayed when you request http://localhost/system/vmonitor.

You probably want to protect this location from unwanted visitors. If you always access this location from the same IP address, then you can use a simple host-based authentication:

  <Location /system/vmonitor>
    SetHandler perl-script
    PerlHandler Apache::VMonitor
    order deny,allow
    deny  from all
    allow from
  </Location>

Alternatively you may use the Basic or other authentication schemes provided by Apache and various extensions.

You can control the behavior of this module by configuring the following variables in the startup file or inside the <Perl> section.

You should load the module in httpd.conf:

  PerlModule Apache::VMonitor

or from the startup file:

  use Apache::VMonitor();

You can alter the monitor reporting behavior by tweaking the following configuration arguments from within the startup file:

  $Apache::VMonitor::Config{BLINKING} = 1;
  $Apache::VMonitor::Config{REFRESH}  = 0;
  $Apache::VMonitor::Config{VERBOSE}  = 0;

You can control what sections are to be displayed when the tool is first accessed via:

  $Apache::VMonitor::Config{SYSTEM}   = 1;
  $Apache::VMonitor::Config{APACHE}   = 1;
  $Apache::VMonitor::Config{PROCS}    = 1;
  $Apache::VMonitor::Config{MOUNT}    = 1;
  $Apache::VMonitor::Config{FS_USAGE} = 1;

You can control the sorting of the mod_perl process reports. These reports can be sorted by one of the following columns: ``pid'', ``mode'', ``elapsed'', ``lastreq'', ``served'', ``size'', ``share'', ``vsize'', ``rss'', ``client'', ``request''. For example, to sort by process size, try the following setting:

  $Apache::VMonitor::Config{SORT_BY}  = "size";

As the application provides an option to monitor processes other than mod_perl ones, you may set a regular expression to match the wanted processes. For example, to match processes whose names include any of the strings httpd_docs, mysql, and squid, use the following regular expression:

  $Apache::VMonitor::PROC_REGEX = join "\|", qw(httpd_docs mysql squid);

We will discuss all these configuration options and their influence on the application shortly.

Multi-processes and System Overall Status Reporting Mode

The first mode is the one that is mainly used, since it allows you to monitor almost all the important system resources from one location. For your convenience, you can turn the different sections of the report on and off, to make it possible for the report to fit into one screen.

This mode comes with the following features.

Automatic Refreshing Mode
You can tell the application to refresh the report every few seconds. You can preset this value at the server startup. For example, to set the refresh to 60 seconds you should add the following configuration setting:
  $Apache::VMonitor::Config{REFRESH} = 60;

A 0 (zero) value turns automatic refreshing off.

When the server is started you can always adjust the refresh rate using the application user interface.

top(1) Emulation: System Health Report
Just like top(1), it shows current date/time, machine up-time, average load, all the system CPU and memory usage: CPU load, real memory and swap partition usage.

The top(1) section includes a swap space usage visual alert capability. As we know, swapping is undesirable on production systems. The system is said to be swapping when it has used all of its RAM and starts to page out unused memory pages to the slow swap partition, which slows down the whole system and may eventually lead to a machine crash.

Therefore, the tool helps to detect abnormal situations by changing the swap report row's color according to the following rules:

         swap usage               report color
   5Mb < swap < 10 MB             light red
   20% < swap (swapping is bad!)  red
   70% < swap (almost all used!)  red + blinking (if enabled)

Note that you can turn on the blinking mode with:

  $Apache::VMonitor::Config{BLINKING} = 1;

The module doesn't alert when swap is being used just a little (<5Mb), since it happens most of the time on many Unix systems, even when there is plenty of free RAM.

If you don't want the system section to be displayed set:

  $Apache::VMonitor::Config{SYSTEM} = 0;

The default is to display this section.

top(1) Emulation: Apache/mod_perl Processes Status
Then, just like the real top(1), there is a report of the processes, but it shows all the relevant information about mod_perl processes only!

The report includes the status of the process (Starting, Reading, Sending, Waiting, etc.), the process ID, the time since the current request was started, the last request's processing time, and the size, shared, virtual, and resident memory sizes. It shows the last client's IP and request URI (only 64 chars, as this is the maximum length stored by the underlying Apache core library).

This report can be sorted by any column while the application is running, by clicking on the name of the column, or the sort order can be preset with the following setting:

  $Apache::VMonitor::Config{SORT_BY}  = "size";

The valid choices are: ``pid'', ``mode'', ``elapsed'', ``lastreq'', ``served'', ``size'', ``share'', ``vsize'', ``rss'', ``client'', ``request''.

The section concludes with a report of the total memory used by all mod_perl processes as reported by the kernel, plus an extra number which attempts to approximate the real memory usage when memory sharing is taking place. The approximation uses the following logic:

  1. For each process, sum up the difference between its total and shared memory.
  2. Add to that the largest shared size among the processes; the result is approximately the memory actually used by all mod_perl processes apart from the parent process.
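The arithmetic is simple enough to sketch; the per-process figures below are invented for illustration and have nothing to do with any real server:

```perl
use strict;
use warnings;
use List::Util qw( sum max );

# Invented per-process figures (megabytes): total size and shared part.
my @procs = (
    { size => 10, share => 8 },
    { size => 12, share => 8 },
    { size => 11, share => 8 },
);

# 1. sum the unshared part (size - share) of every process
my $unshared = sum map { $_->{size} - $_->{share} } @procs;

# 2. add the largest shared segment once, since it is shared by all
my $approx_real = $unshared + max map { $_->{share} } @procs;

# $unshared is 9 MB and $approx_real is 17 MB -- far less than the
# naive 33 MB you would get by summing the total sizes
```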

Please note that this might be incorrect for your system, so use this number at your own risk. We have verified this number on Linux by taking the number reported by Apache::VMonitor, stopping mod_perl, and looking at the system memory usage: the system memory went down by approximately the number reported by the tool. Again, use this number wisely!

If you don't want the mod_perl processes section to be displayed set:

  $Apache::VMonitor::Config{APACHE} = 0;

The default is to display this section.

top(1) Emulation: Any Processes
This section, just like the mod_perl processes section, displays the information in a top(1) fashion. To enable this section you have to set:
  $Apache::VMonitor::Config{PROCS} = 1;

The default is not to display this section.

Now you need to specify which processes are to be monitored. A regular expression matching the desired processes is required for this section to work. For example, if you want to see all the processes whose names include any of the strings httpd, mysql, and squid, use the following regular expression:

  $Apache::VMonitor::PROC_REGEX = join "\|", qw(httpd mysql squid);

The following snapshot visualizes the sections that have been discussed so far.

Figure 1.1: Emulation of top(1), Centralized Information About mod_perl and Selected Processes


As you can see the swap memory is heavily used and therefore the swap memory report is colored in red.

mount(1) Emulation
This section reports on mounted filesystems, just as if you had called mount(1) with no parameters.

If you want the mount(1) section to be displayed set:

  $Apache::VMonitor::Config{MOUNT} = 1;

The default is NOT to display this section.

df(1) Emulation
This section completely reproduces the df(1) utility. For each mounted filesystem, it reports the total and available blocks (for both superuser and user), and the usage as a percentage.

In addition, it reports available and used file inodes, both as counts and as percentages.

This section has a visual alert capability, triggered when a filesystem becomes more than 90 percent full or when less than 10 percent of its file inodes remain free. When this happens, the report row for that filesystem is displayed in bold red, and the mount point directory will blink if blinking is turned on. You can turn the blinking on with:

  $Apache::VMonitor::Config{BLINKING} = 1;

If you don't want the df(1) section to be displayed set:

  $Apache::VMonitor::Config{FS_USAGE} = 0;

The default is to display this section.

The following snapshot presents an example of a report consisting of the last two sections discussed (the df(1) and mount(1) emulations), plus the ever-important mod_perl processes report.

Figure 1.2: Emulation of df(1) both Inodes and Blocks Utilization. Emulation of mount(1).


You can see that the /mnt/cdrom and /usr filesystems are more than 90 percent utilized and are therefore colored in red. (That's normal for /mnt/cdrom, which is a mounted CD-ROM, but critical for the /usr filesystem, which should be cleaned up or enlarged.)

Abbreviations and Hints
The report uses many abbreviations, which might be new to you. If you enable the VERBOSE mode with:
  $Apache::VMonitor::Config{VERBOSE} = 1;

this section will reveal all the full names of the abbreviations at the bottom of the report.

The default is NOT to display this section.

A Single Process Extensive Reporting System

If you need in-depth information about a single process, just click on its PID.

If the chosen process is a mod_perl process, then the following info will be displayed:

  • Process type (child or parent), status of the process (Starting, Reading, Sending, Waiting, etc.), and how long the current request has been processed, or how long the last one took if the process is inactive at the moment the report is taken.
  • How many bytes have been transferred so far, and how many requests have been served per child and per slot.
  • CPU times used by process: total, utime, stime, cutime, cstime.

For all (mod_perl and non-mod_perl) processes the following information is reported:

  • General process info: UID, GID, State, TTY, Command line arguments
  • Memory Usage: Size, Share, VSize, RSS
  • Memory Segments Usage: text, shared lib, data and stack.
  • Memory Maps: start-end, offset, device_major:device_minor, inode, perm, library path.
  • Loaded libraries sizes.

Just like the multi-process mode, this mode allows you to automatically refresh the page at the desired interval.

The following snapshots show an example of the report about one mod_perl process:

Figure 1.3: Extended information about processes: General Process Information


Figure 1.4: Extended information about processes: Memory Maps


Figure 1.5: Extended information about processes: Loaded Libraries


