October 2001 Archives

The Lighter Side of CPAN

Programming can be a stressful game. We sit at our machines, day in, day out, endlessly staring at a monitor fathoming where that last devious bug has buried itself. It's not surprising then that sometimes it gets to be too much, and someone, somewhere, snaps. The strangest things become hilarious and valuable hours will be wasted developing a strained play on words into a fully tested and highly featured module.

Portions of this creative outpouring of humor often find their way onto the CPAN. There is even the special Acme::* name space set aside for these bizarre freaks so they do not interfere with the normal smooth running of that ever so special service. It may seem a little pointless for someone to release these into the wild, but you may be surprised what you could learn from them. A good joke is usually marked out by being both funny and simple, and the same applies to jokes told in Perl. All unnecessary detail is stripped away, leaving a short piece of code that makes a perfect example of how to use, abuse or create a language feature.

There now follows a brief tour of some (but not all) of the more amusing extensions to Perl, along with some hints on how you might improve your programs by taking inspiration from the tricks they use to implement their punch lines. (Ba-boom tching!)

Getting It Wrong the Wright Way

Perl is a lazy language. It appeals to the discerning man of leisure who wants nothing more than a chance to get things done quickly. There are some who would have Perl be more lazy but, until rescued by Dave Cross and his band of visionaries, they were forced to work hard making sure that every single keystroke they made was correct. Dismayed by errors like:

 Undefined subroutine &main::find_nexT called at frob.pl line 39.

many turned away from Perl. Now they return in droves: with Symbol::Approx::Sub their lives are made an order of magnitude easier, leaving them more time to sip cocktails as they lounge in deck chairs. This module can even save you money; observe the simple calculation of your fiscal dues after the careful application of a couple of typos:

 #!/usr/bin/perl
 use Symbol::Approx::Sub;
 sub milage  { 40_000   };
 sub taxipay {     10   };
 sub tax2pay {$_[0]*0.4 };
 sub myage   {     25   };

 # Sell car
 print "For Sale: Very Good Car, only @{[ miage()]} on the clock\n";

 # Cheque for tax man
 my $income = 40_000;
 print "Must pay taxes, cheque for: @{[ taxpay($income) ]}\n";

A calculation which could not be faulted by any government, but which will leave you with a brand new car and, half the time, a whopping rebate:

 For Sale: Very Good Car, only 25 miles on the clock
 Must pay taxes, cheque for: 10

How does this all work? There are two major bits of magic going on: installing a handler that intercepts calls to undefined subroutines and a tool for divining the names of those routines that it can call instead.

The handler is implemented by creating an AUTOLOAD function for the module that uses Symbol::Approx::Sub. When Perl is asked to run a subroutine that it cannot locate, it will invoke AUTOLOAD in the package where it was first told to look for the subroutine. This is handed the same arguments as the original function call and is told which subroutine Perl was looking for in the global $AUTOLOAD variable. AUTOLOAD is mainly used to write lazy accessors for object data. This example:

 sub AUTOLOAD {
    my ($self, $value) = @_;
    my ($field) = $AUTOLOAD =~ /.*::(.*)$/;
    return if $field eq 'DESTROY';
    # the parentheses matter: ?: binds tighter than =, so without
    # them a simple get would assign undef to the field
    return @_ == 1 ? $self->{$field} : ($self->{$field} = $value);
 }

provides simple get/set methods. So that:

 $object->name('Harry');
 print $object->name, "\n";

prints Harry even though you haven't had to explicitly write a name method for your object.

Perl stores all information about non-lexical variables, filehandles and subroutine names in a massive hash. You can inspect this yourself but doing so requires code that is so close to spaghetti you could plate it up and serve it to an Italian. The easy way out, wisely taken by the Symbol::Approx::Sub module, is to use the Devel::Symdump module that provides a friendly and clean interface to Perl's symbol table. Devel::Symdump provides various useful tools: If you are scratching your head trying to resolve an inheritance tree, then the isa_tree method will help; if you want to find exactly what a module exports into your namespace, then you'll find the diff method a constant friend.
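
Putting the two pieces together is surprisingly little work. Here is a toy version of the whole trick (a sketch only: the real module is pluggable about how it fuzzy-matches, while this one simply compares Soundex codes):

 use Text::Soundex;
 use Devel::Symdump;

 sub AUTOLOAD {
     our $AUTOLOAD;
     my ($pkg, $name) = $AUTOLOAD =~ /^(.*)::(.*)$/;
     return if $name eq 'DESTROY';

     # Compare the sound of the missing name against every sub the
     # package really defines, and jump to the first plausible match.
     for my $defined (Devel::Symdump->new($pkg)->functions) {
         (my $candidate = $defined) =~ s/.*:://;
         next if $candidate eq 'AUTOLOAD';
         if (soundex($candidate) eq soundex($name)) {
             no strict 'refs';
             goto &{"${pkg}::$candidate"};
         }
     }
     die "Undefined subroutine &${pkg}::$name called";
 }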

Presently Living in the Past

Ever since the ancients erected the huge lumps of stone that paved the way for the digital watches that we hold so dear, mankind has needed to know when he is. Perl is no different and now has many -- although some might say too many -- date- and time-related modules, around 80 of them in fact. Simple statistics tell us that at least a few of those should be so useless we couldn't possibly resist trying to find something to do with them.

First stop on the tour is Date::Discordian, although it is sadly lacking in chicken references. Gobble.

This could, in a limited set of circumstances, be helpful though. Imagine the scene: a trusted client is on the phone demanding a completion date for the project doomed to persist. You reach for the keyboard and, in a moment of divine inspiration, type:

 perl -MDate::Discordian -le'print discordian(time+rand(3e7))'

You then soberly relate the result to your tormentor, ``Prickle Prickle, Discord 16 YOLD 3168'' and, suddenly, everything is alright. Well, they've put the phone down and left you in peace. If you prefer to provide a useful service, then you might be better off investigating the Time::Human module by Simon Cozens. This creates person-friendly descriptions of a time, transforming the excessively precise 00:23:12.00 into a positively laid-back ``coming up to 25 past midnight.'' The module is internationalized and could be used in conjunction with a text-to-speech system, such as festival, to build an aural interface to something like a ticket-booking system.

Moving swiftly on, we come to Date::Tolkien::Shire, a king amongst date modules. Most newspapers carry an ``on this day in history'' column -- where you find, for instance, that you were born on the same day as the man who invented chili-paste -- but no broadsheet will tell you what happened to Frodo and his valiant companions as they fought to free Middle Earth from the scourge of the Dark Lord. The undeceptively simple:

 use Date::Tolkien::Shire;
 print Date::Tolkien::Shire->new(time)->on_date, "\n";

outputs (well, output a few days ago):

 Highday Winterfilth 30 7465
 The four Hobbits arrive at the Brandywine Bridge in the dark, 1419.

What better task could there be for crontab than to run this in the wee hours and update /etc/motd for our later enjoyment? Implementing this is, as ever, left as an exercise for the interested reader.

There is a more useful side to Date::Tolkien::Shire or, at the very least, it does light the way for other modules. As well as the on_date() method it provides an overloaded interface to the dates it returns. This allows you to compare dates and times as if they were normal numbers, so that:

 $date1 = Date::Tolkien::Shire->new(time);
 $date2 = Date::Tolkien::Shire->new(time - 1e6);

 print 'time is ' . ( $date1 > $date2 ? 'later' : 'earlier' ) .
     " than time -1e6\n";

prints time is later than time -1e6. The more prosaic Date::Simple module provides a similar interface for real dates and ensures they stringify with ISO formatting.
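
A quick sketch of what that buys you (the dates are invented):

 use Date::Simple ('date', 'today');

 my $halloween = date('2001-10-31');
 my $deadline  = $halloween + 7;            # date arithmetic, in days
 print "$deadline\n";                       # 2001-11-07, ISO formatted
 print "Panic!\n" if today() > $deadline;   # compares like a number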

From One Date to Another

It is often said that computers and relationships don't mix well but this isn't entirely true. If you feel alone in the world and need to find that special person, then Perl is there to help you. Your first task is to meet someone. Perhaps by putting an advertisement on a dating service. Of course, you want to find the very best match and, being fond of concise notation, decide you will search for your companion with the help of the geek code. But how is your prospective mate to know what all those funny sigils mean? With the help of the Convert::GeekCode module of course:

 use Convert::GeekCode;

 print join("\n",geek_decode(<<'ENDCODE'),"\n");
 -----BEGIN GEEK CODE BLOCK-----
 GS d+ s:- a--a? C++> UB++ P++++ !L E+ W+++ N++ K? w--() !M PS++ PE+
 Y PGP+ t+(-) 5++ !X R+ !tv b+++  DI++ D+++ G e* h y?
 ------END GEEK CODE BLOCK------
 ENDCODE

This will tell you, amongst other things, that ``I don't write Perl, I speak it. Perl has superseded all other programming languages. I firmly believe that all programs can be reduced to a Perl one-liner.''

So, you've got a reply and someone wants to meet you. This is a worrying prospect though as you feel you'll need to brush up on your conversation skills a little before meeting your date. Again, Perl comes to your aid with Chatbot::Eliza, which is especially useful if you want to meet a simple-minded psychologist. Fire her up with:

 perl -MChatbot::Eliza -e'Chatbot::Eliza->new->command_interface'

and enjoy hours of elegant conversation:

 you:    I like pie
 Eliza:  That's quite interesting.

If your wit and repartee fail to impress, then you may want to convince your partner that you have a deep and lasting interest in some obscure branch of mystical poetry. Doing this requires some mastery of ZenPAN combined with a careful study of Lee Goddard's Poetry::Aum. More than any other module, this teaches you that true understanding comes from within: by inspecting the source of all your powers. The source code, that is.

If none of this works or you find you've arranged a date with a total bore, don't despair. There are ways to move the encounter toward an interesting conclusion. Simply let Michael Schwern's Bone::Easy take the pain out of dumping your burden.

 perl -MBone::Easy -le'print pickup()'

 When are you going to drain that?

How could all this be useful though? Convert::GeekCode hints at Perl's greatest strength: data transformation. The remaining 20 or so Convert::* modules can sometimes be a Godsend. If you are having trouble with EBCDIC-encoded text, or need to make your product catalog acceptable to people who use whichever of metric and Imperial units you haven't provided, then you'll find something to furnish the answer.

Chatbot::Eliza on the other hand is a shining example of code whose behavior you can change easily. Because it was written using Perl's OO features and a bit of thought was applied while deconstructing the problem it addresses, it is full of hooks from which you can dangle your own bits of code, perhaps to use a different interface or a text-to-speech system. Can Bone::Easy teach you anything? Who knows ...
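
To see what those Eliza hooks buy you, here is a sketch that skips command_interface entirely and drives the bot from a loop of its own; the same loop could just as easily feed a speech synthesizer or a Web page:

 use Chatbot::Eliza;

 my $bot = Chatbot::Eliza->new;
 while (defined(my $line = <STDIN>)) {
     chomp $line;
     last if $line =~ /^(?:bye|quit)$/i;
     # transform() turns one line of your woes into one line of therapy
     print $bot->transform($line), "\n";
 }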

A Day at the Races

Having foolishly followed my dating advice above you will have a great deal of time to yourself, but do not fear: you can still keep yourself amused. If you have a sporting bent, then Jos Boumans's Acme::POE::Knee uses the wonderful POE framework to race ponies across your screen; you could even make bets with yourself, sure in the knowledge that you'll end the day even. One day POE may fulfill its original purpose and morph into a multi-user dungeon (MUD), although at the moment, alas, it is far too busy being useful.

If you get tired of watching Acme::POE::Knee, then you can instead follow Sean M. Burke's Games::Worms, in combination with the cross-platform Tk Perl bindings, as it draws pretty patterns on your screen. Tk is only one of many graphical toolkits for Perl that can be used to quickly prototype an interface design or glue together a range of command-line applications with a common frontend.

When Bugs Attack

Every now and then, despite all your best efforts to program extremely and extract the maximum of laziness from Perl, you will come across a deeply buried, complicated and fatal bug in your code. Your spirits sink when you discover one: the next two days of your precious time will be filled with cryptic error messages flashing all over your terminal:

 Something is very wrong at Desiato.pl line 22.

It needn't be like this though -- there is a better way. You'll still have to fight this bug for days, but you can keep your blood pressure down with a little application of Damian Conway's Coy. Simply add:

 PERL5OPT=-MCoy

to your environment (ideally somewhere global like /etc/.cshrc) so that any time Perl explodes all over your hard disk you'll be greeted by a soothing Haiku to take the edge off your pain:

        -----
        A woodpecker nesting 
        in a lemon tree. Ten 
        trout swim in a stream.
        -----
                Tor Kin Tun's commentary...

                Something is very wrong
                        (Analects of Desiato.pl: line 22.)

Setting PERL5OPT can help you in normal circumstances too. Should you be developing a new version of an existing library, you will often want to switch between the old and new copies, and saying export PERL5OPT=-Ipath/to/new is less hassle than fiddling with use lib 'path/to/new' within your code.

These, along with a much larger host of useful modules, are available from the CPAN.

Perl 6: Not Just For Damians

London.pm technical meetings are always inspiring events with top notch speakers. At our most recent gathering Richard Clamp and Mark Fowler gave us "Wax::On, Wax::Off -- how to become a Perl sensei"; Paul Mison showed us how to make an infobot read the BBC news; Richard Clamp and Michael Stevens explained "Pod::Coverage", their contribution to Kwalitee assurance; and, for the first time ever, Leon Brocard didn't talk about Graphing Perl.

However, the highlight of the evening was Simon Cozens's first public demonstration of Parrot, the new virtual computer that will one day run Perl 6.

After he'd finished the talk we expected, he pulled a crumpled piece of paper from a secret pocket. This, he whispered, was an early draft of Apocalypse 3 which he'd smuggled out at great personal risk from under the very noses of the Design Team. An expectant hush fell as he proceeded to reveal the highlights.

The reception his heroic effort received was... low key. Everyone was pleased to get an early peek at what Larry was thinking, but there were widespread mutterings about "all this needless complexity" and "mere syntactic sugar". Almost everyone grumbled about the use of '.' for dereferencing. Almost everyone groused about using '_' for concatenation. And the reassurance that "It's only syntax" didn't seem to appease the doubters.

And then we all went for beer and/or Chinese food.

Fast forward to last weekend. The Apocalypse was up on the website and Damian had just published his Exegesis when Simon Wistow (London.pm's official scapegoat) warned the mailing list that he had:

... an impending sense of doom about Perl 6. The latest Apocalypse/Exegesis fill me with fear rather than wonder. I've got a horrible feeling that Perl 6 is trying to do too much at once.

This provoked a firestorm of agreement. The general consensus was that the latest Apocalypse was:

reinventing wheels which we already have in abundance. And those new wheels have syntax that is only going to confuse those who are already experienced perl5 users

It seems that what we have here is a failure to communicate.

Yes, Damian's efforts have been superb in providing examples of code based on the Apocalypses, and I don't think anyone denies the sterling work that Dan and his team are doing with Parrot. But people do seem to be worried about Perl 6 being a rewrite for the Damians of this world, not for the ordinary Joe.

Well, I'm here to tell you that this ordinary Piers doesn't have a problem with Perl 6. In fact I'm excited and inspired by most of the work that's been done so far, and I hope to convince you too.

"Perl 6 doesn't look like Perl"

Well, up to a point. The thing that you have to remember when reading the sample code that Damian provides in his Exegeses is that he is deliberately exercising all the new features in a condensed example. The most recent code sample is initially scary because there's so much stuff in Apocalypse 3. Admittedly, $self.method() looks weird now, but then, $self->method() looked weird when Perl 5 was introduced. And, on rereading Damian's example with an eye to what hasn't changed, the whole thing still looks like Perl.

"Perl 6 just gives us syntax for stuff we can already do"

That's a mighty big 'just' there, partner. Consider the currying syntax. Before this came along, currying was possible, but required an unreasonable amount of manual work to implement. Just consider the following, 'simple' example:

In perl 6 we have:

    $^a + $^b

In perl 5, if you didn't worry about currying you'd write:

    sub { $_[0] + $_[1] }

If you do worry about currying, you'll have to write:

    do { 
        my $self;
        $self = sub {
            my ($arg1, $arg2) = @_;
            return $arg1 + $arg2             if @_==2;
            return sub { $self->($arg1,@_) } if @_==1;
            return $self;
        }
    }

And I don't want to think about the hoops I'd have to jump through in the case of a curried function with three arguments, or if I wanted named arguments. Now, you could very well argue that you don't use anonymous functions much anyway, so you're certainly not going to be doing tricks with currying, and you may be right. But then, of course, if you don't want to use them you don't have to.

However, I'm betting that before long it will be just another spanner in your toolbox along with all the other gadgets and goodies that Larry's shown us so far. Tools that you use without a second thought.

If you've ever done any academic computer science, you might have come across Functional Programming, in which case an awful lot of what's new in Perl 6 so far will be looking surprisingly familiar. The thing is, until now, Functional Programming has been seen as only of concern to academics and the kind of weirdoes who are daft enough to write Emacs extensions and, dammit, Perl doesn't need it. There is even an RFC to this effect.

I remember saying almost exactly the same thing about another language feature of 'purely academic interest' that got introduced with perl 5: the closure.

I don't know about you, but closures are old friends now; another tool that gets pulled out and used where appropriate, with hardly a second thought.
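
In case the term is still misty, here's the canonical example: a factory whose products each remember their own private count.

    sub make_counter {
        my $count = 0;             # private to each counter
        return sub { ++$count };   # the closure remembers $count
    }

    my $c1 = make_counter();
    my $c2 = make_counter();
    print $c1->(), $c1->(), $c2->();   # 121 -- independent counts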

"Perl 6 doesn't give us anything that Perl 5 doesn't."

Yeah, and Perl 5 doesn't give us anything that a Universal Turing Machine, Intercal, or Python don't. We use it because it 'fits our brains'. The Perl 6 redesign is all about improving that fit.

"Apocalypse 3 is mostly mere syntactic sugar"

You say that like it's a bad thing. Perl's creed has always been to make the easy things easy and the hard things possible. In many ways, Perl 6 is going further than that: making hard things easy. And Apocalypse 3 continues this trend.

Well chosen syntactic sugar is good voodoo. It's Laziness with a capital L, and Laziness, as we all know, is a virtue.

Let's look at what we get in Apocalypse 3.

The hyper operator.

Well, this is just a foreach loop isn't it? Yes, but as Damian subsequently pointed out, would you rather write and maintain:

    my @relative;
    my $end = @max > @min ? @max > @mean ? $#max : $#mean
                          : @min > @mean ? $#min : $#mean;
    foreach my $i ( 0 .. $end ) {
        $relative[$i] = $mean[$i] / ($max[$i] - $min[$i]);
    }

or

    my @relative = @mean ^/ (@max ^- @min);

In the second case, the intent of the code is clear. In the first it's obfuscated by the loop structure and set up code.

Chainable file test ops

Why hasn't it always worked like this? Ah yes, because the internals of perl 5 wouldn't allow for it. This is an example of the far reaching effects of some of the earlier Apocalypses giving us cool stuff.

Let's do the comparison again.

Perl 5:

    my @writable_files = grep {-f $_ && -r _ && -w _ && -x _} @files;

Perl 6:

    my @writable_files = grep { -r -w -x -f $_ } @files;

Shorter and clearer. Huzzah.

Binary ';'

This one's a bit odd. We've not yet seen half of what's going to be done with it, but I have the feeling that the multidimensional array mongers are going to have a field day.

=> is a pair builder

Mmmm... pairs. Lisp flashbacks! Well, yes. But if hashes are to become 'bags' of pairs, then it seems that hash keys won't be restricted to being simple strings. Which is brilliant. On more than several occasions I've found myself wanting to do something along the lines of

    $hash{$object}++;

and then later do

    @results = map  { $_.some_method }
               grep { $hash{$_} > 1 }
                   keys %hash;

Try doing that in Perl 5.
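
For the record, the nearest Perl 5 gets is stashing the real object alongside its count, because hash keys flatten objects down to strings:

    my %hash;
    ($hash{$object} ||= [ $object, 0 ])->[1]++;

    # and later...
    my @results = map  { $_->[0]->some_method }
                  grep { $_->[1] > 1 }
                      values %hash;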

The use of pairs for named arguments to subroutines looks neat too, and should avoid the tedious hash setup that goes at the top of any subroutine that's going to accept named parameters in Perl 5.
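
For anyone who has been spared it so far, that tedious setup looks something like this at the top of every such subroutine (the sub is invented):

    sub draw_box {
        my %args = ( width => 10, height => 5, @_ );   # defaults, then overrides
        print "box: $args{width} x $args{height}\n";
    }

    draw_box( width => 42 );   # height quietly falls back to the default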

Lazy lists

Lazy lists are cool, though I note that Damian couldn't squeeze a compelling example of their usage into Exegesis 3. For some applications they are a better mousetrap, and if you don't actually need them, they're not going to get in your way. I'm not sure if Larry has confirmed it yet, but I do like the idea of being able to do:

    my($first,$second,$third) = grep is_prime($_), 2 .. Inf;

and have things stop when the first three primes have been found.

And having

    my($has_matches) = grep /.../, @big_long_list;

stop at the first match would be even better.

Logical operators propagating context

Where's the downside? This can only be a good thing. Multiway comparison 0 < $var <= 10 is another example of unalloyed goodness.

Backtracking operators

I'm not entirely sure I understand this yet. But it looks like it has the potential to be a remarkably powerful way of programming (just like regular expressions are, which do loads of backtracking). I have the feeling that parser writers are going to love this, and equation solvers, and...

But again, if you don't need the functionality, don't use it. It'll stay out of your way.

All operators get 'proper' function names.

This one almost had me punching the air. It's brilliant. Especially if, like me, you're the kind of person who goes slinging function references around. (One of the things that I really like about Ruby is its heady mix of functional style higher order functions and hard core object orientation. It looks like Perl's getting this too.)

Again, time to make with the examples. Consider the following code from an Assertion package (this is in Perl 6; it's too hard to write clearly in Perl 5).

    &assert_with_comparator := {
        unless ($^comparator.($^a, $^b)) {
            throw Exception::FailedComparison :
                comparator => $^comparator,
                result     => $^a,
                target     => $^b
        }
    }
    &assert_string_equals := assert_with_comparator(&operator:eq);
    &assert_num_equals    := assert_with_comparator(&operator:==);
    &assert_greater_than  := assert_with_comparator(&operator:>);

That's full strength Perl 6 that is, complete with currying, operators as functions, := binding, : used to disambiguate indirect object syntax, the whole nine yards. And it is still obviously a Perl program. The intent of the code is clear, even without comments, and it took very little time to write. Of course, I am assuming an Exception class, but we've already got that in Perl 5; take a look at the lovely Error.pm.

I'm not going to rewrite assert_with_comparator, but just look at the Perl 5 version of the last line of that example:

    *Assert::assert_greater_than =
        $assert_with_comparator->(sub { $_[0] > $_[1] });

Don't try and tell me that the intent is clearer in Perl 5 than in Perl 6, because I'll be forced to laugh at you.

binary and unary '.'

I confess that I'm still not sure I see where binary '.' is a win over '->', especially given that Larry has mandated that most of the time you won't even need it.

Unary '.' is looking really cool. If I read the Apocalypse right, this means that, instead of writing object methods like:

    sub method {
        my $self = shift;
        ...
        $self->{attribute} = $self->other_method(...);
        ...
    }

We can write:

    sub method {
        ...
        $.attribute = .other_method(...);
        ...
    }

Which is, once more, clean, clear and Perl-like. This is the kind of notation I want Right Now. And, frankly, it'd just look silly if you replaced those '.'s with '->' (and should one parse $->attribute as an instance variable accessor, or as $- > attribute?). Okay, I'm convinced. Replace '->' with '.' already.

Explicit stringification and numification operators

Again, these have got to be good magic, especially with the NaN stuff (though that's been the cause of some serious debate on perl6-language and may not be the eventual name). In at least one of the modules I'm involved in writing and maintaining, this would have been so useful:

    # geq: Generic equals
    sub operator:geq is prec(\&operator:eq($$)) ($expected, $got)
    {
        # Use numericness of $expected to determine which test to use
        if ( +$expected eq 'NaN') { return $expected eq $got }
        else                      { return $expected == $got }
    }
    sub assert_equals ($expected, $got; $comment)
    {
        $comment //= "Expected $expected, got $got";
        $expected geq $got or die $comment;
    }

Hey! That looks just like Perl! (Except that, to do the same thing in Perl 5, you have to jump through some splendidly non-obvious hoops. Trust me, I've done that.)

:=

This one had me scratching my head as I read the Apocalypse. On reading the Exegesis, things become a good deal clearer. := looks like it's going to be an easy way to export symbols from a module, now that typeglobs have gone away:

    package Foo;
    sub import ($class, @args) {
        # An example only -- ignore the args; the point is to bind
        # our foo() into whichever package said 'use Foo'
        &{caller() _ "::foo"} := &Foo::foo;
    }

Of course, this isn't the only place where := will be used. Thankfully we'll be able to use it almost everywhere without having to remember all the caveats that used to surround assigning to typeglobs. Here's another example in Perl 6 of something that would be impossible in Perl 5:

    $Sunnydale{ScoobyGang}{Willow}{Traits} = [qw/cute geeky/];
    # Oooh Seasons 4 and 5 happened and I want
    # to use a trait object now
    $traits := $Sunnydale{ScoobyGang}{Willow}{Traits};
    $traits = new TraitCollection: qw/sexy witch lesbian geek/;

Nothing special there you say. Well, yes, but let's take a look at

    print $Sunnydale{ScoobyGang}{Willow}{Traits};
    # sexy witch lesbian geek
    # Or however a TraitCollection stringifies.

You can almost do this in Perl 5, but only if you continue to use an array:

    local *traits = $Sunnydale{ScoobyGang}{Willow}{Traits};
    @traits = qw/sexy witch lesbian geek/;

If you want to switch to using a TraitCollection, you'll have to go back and use the full specifier.

I think this is another of those bits of syntax that I'd like now, please.

Binary :

This is going to make life so much easier for the parser if nothing else. Right now, indirect object syntax can be very useful. However, if you've ever tried to use it in anger, well, you've ended up using it in anger because there are some subtle gotchas that will catch you out. Binary : lets us disambiguate many of these cases and helps to reclaim indirect object syntax as a useful way of working.

And so on... What's not to like? The sugar is sweet, the consistency is just right, and the old annoyances are going away.

"Perl 6 inspires fear."

Well, maybe. But it also just flat out inspires. If you don't believe me, take a look at the response to Perl 6 on CPAN. Damian's Attribute::Handlers successfully attempts to graft some of Perl 6's ease of manipulation of attributes back into Perl 5, and does a remarkably good job of it. Just look at all the really cool new modules that have sprung up around it. And that's just a small part of what we're going to get with Perl 6.
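
If you haven't played with it, here's a sketch of the kind of thing Attribute::Handlers makes easy: a universal :Logged attribute that wraps any subroutine tagged with it. The attribute name and its behaviour are invented for the example.

    use Attribute::Handlers;

    sub UNIVERSAL::Logged : ATTR(CODE) {
        my ($pkg, $glob, $code) = @_;
        no warnings 'redefine';
        my $name = *{$glob}{NAME};
        # replace the tagged sub with a logging wrapper around itself
        *{$glob} = sub { warn "-> ${pkg}::$name(@_)\n"; $code->(@_) };
    }

    sub greet : Logged { print "hello, $_[0]\n" }
    greet('world');   # logs the call, then greets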

There are many new modules that exist only to 'mutate' perl5 behaviour -- NEXT, Hook::LexWrap, Aspect, Switch, Coro etc. I would argue that many of these have arisen in response to discussions about making Perl 6 a far more mutable language than Perl 5. And, if nothing else, these modules have gone some way to demonstrating that even now, Perl is more flexible than we ever realised.

Perl 6 is taking too long

I'm not quoting anyone else here; that's me complaining. I want it all and I want it now! But I also want a well thought out and coherent design. The choice between doing it Right and doing it Now is not a choice. Doing it Right is imperative.

The changes that Larry is making to the language will have far reaching and probably unforeseen consequences. But that's no reason for shying away from them. I've been programming in Perl for long enough to remember the transition from Perl 4 to Perl 5, and I remember delaying my own move to perl 5 for an embarrassingly long time. I didn't understand the new stuff in 5 and I hadn't a clue why anyone would want it, so I put off the move.

Eventually, I held my nose and jumped in. References were so cool. The new, 'real', data structures meant an end to contortions like:

    $hash{key} = join "\0", @list;
    # and later...
    @list = split /\0/, $hash{key};

Within a remarkably short space of time almost everything that had confused and scared me became almost second nature. Stuff that had been a complete pain in Perl 4 was a breeze in Perl 5 (who remembers Oraperl now?). It seemed that all you had to remember was to change "pdcawley@bofh.org.uk" to "pdcawley\@bofh.org.uk".

The same thing is going to happen with Perl 6. Even if it doesn't, all those perl 5 binaries aren't going to disappear from the face of the earth.

What I'm most looking forward to are the gains we're going to see from Perl becoming easier to parse. Over in the Smalltalk world they have this almost magical thing called the 'Refactoring Browser', which is a very smart tool for messing with your source code. Imagine being able to highlight a section of your code, then telling the browser to 'extract method'.

The browser then goes away, works out what parameters the new method will need, creates a brand new method which does the same thing as the selected code and replaces the selected section with a method call.

This is Deep Magic. Right now it's an almost impossible trick to pull off in Perl, because the only thing that knows how to parse Perl is perl. It is my fond hope that, once we get Perl 6, it's going to be possible to implement a Perl refactoring browser, and my work with ugly old code bases will become far, far easier.

But even if that particular magic wand never appears, Perl 6 is still going to give us new and powerful ways to do things, some of which we'd never even have tried to do before. Internally it's going to be fast and clean, and we're going to get real Garbage Collection at last. If Parrot fulfils its early promise, we may well see Perl users taking advantage of extensions written in Ruby, or Python, or Smalltalk. World peace will ensue! The lion will lie down with the lamb, and the camel shall abide with the serpent! Cats and dogs living together! Ahem.

Related Articles

Apocalypse 3

Exegesis 3

It's been a long strange trip from perl 1.0 to where we are today. Decisions have been taken that made sense at the time, which today see us lost in a twisty little maze of backward compatibility (or should that be a little twisty maze...). Anyone who looks at the source code for Perl 5 will tell you it's scary, overly complex, and a complete nightmare to maintain. And that's if they understand it.

Perl 6 is our chance to learn from Perl 5, but Perl 6 is also going to be Perl remade. If everything goes to plan (and I see no reason why it won't) we will arrive at Perl 6 with the crud jettisoned and the good stuff improved. We'll be driven by a gleaming, modern engine unfettered by the limitations of the old one. We'll have a shiny new syntax that builds on the best traditions of the old to give us something that is both brand new and comfortingly familiar.

And there, in the Captain's chair, you'll still find Larry, smiling his quiet smile, comfortable in the knowledge that, even if he doesn't know exactly where we're going, it'll be a lot of fun finding out. Over there, at the science officer's station, Damian is doing strange things with source filters, haikus and Quantum. A calm voice comes up from engineering; it's Dan, telling us that the new engines can take it. And at the helm Nat Torkington gently steers Perl 6 on her continuing mission towards new code and new implementations.

And Ensign Cawley? Well... there's a strange alien device called a refactoring browser. I'm going to be replicating one for Perl.

This Week on p5p 2001/10/21

This fortnight on perl5-porters (08-21 October 2001)

Notes

This Week on P5P

5.8.0 TODO

POD

Testing

${^TAINT}

AUTOLOAD and packages

B::Parrot

Miscellaneous

Please send corrections and additions to perl-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year and month. Changes and additions to the perl5-porters biographies are particularly welcome.

5.8.0 TODO

Jarkko posted a list of things to do before the 5.8.0 release (later, he posted another, shorter TODO list.) The main remaining issues are:

  • PerlIO, Threading, and Multiplicity
  • Attributes are Broken
  • Basic Unicode Support
  • FAQ Updates

Nick Ing-Simmons, who was unable to work on Perl for a while because of intellectual property restrictions at his new job, is back, and working very hard on fixing the PerlIO issues. Arthur Bergman has quietly but steadily been making threads::shared work well enough to be shipped with 5.8.0. Damian has suitably evil plans for attributes, which Arthur and I plan to assimilate into the core.

Jarkko, apparently working all alone, has made considerable progress towards making Perl comply with "Level 1 - Basic Unicode Support", as defined by the Unicode standard, TR#18 "Unicode Regular Expression Guidelines". Some features remain unimplemented, though, including character class subtraction and end-of-line matching.

The Perl FAQ needs extensive updates, to remove obsolete material and add new material where appropriate.

Any volunteers?

POD

In August, as he embarked on a rewrite of the POD documentation, Sean M. Burke said:

A markup language without a clear specification simply invites everyone, including implementors, to have their own shaky idea of what the language means. And that, along with the general tendency for markup language parsing to produce write-only code, explains much of the upsetting current state of Pod::* modules.

After several revisions, Sean has now released the final draft of his changes, which include a complete rewrite of the perlpod manpage (a POD reference suitable for module authors), and the new perlpodspec manpage (a more formal POD specification). The documents attempt to formalise best current practice in parsing and rendering POD, without introducing any radically new features. They were greeted with sighs of relief, and have been incorporated into perl-current.

This is an excellent step towards alleviating the upsetting state of the Pod::* modules, but there is still a lot of work to be done. Patches towards making the core documentation comply with the new specification are particularly welcome.

Brad Appleton (the author of Pod::Parser and Pod::Checker) may not be able to hack on his POD modules for a while, and welcomes help. I am sure many of the other module authors could use a hand with testing and updating their modules in light of perlpodspec.

If you want to help, or are interested in seeing what's going on, Sean recommends that you join the pod-people mailing list.

Testing

Perl has been doing better and better in tests, and H. Merijn Brand sums up the situation nicely:

    *Lot's* of 'O's :))

We got a full set of 'O's (OK) for various AIX and HP-UX systems with their native compilers (and later, with gcc 3 under HP-UX). Hal Morris reports that perl passes all tests under Linux/390, while a few failures remain under UTS and VMS.

More test reports from weird platforms are most welcome.

${^TAINT}

When Perl is running in taint mode (because it was started with the -T switch), a global variable (in the core) named PL_tainting is set to 1 (it is usually 0). Michael G. Schwern posted a patch to make this value accessible to Perl programs as ${^TAINT}. Jarkko said:

<troll>Why ${^TAINT} is read-only? It would be *so* convenient to ...</troll>

Of course, my HTML-hating email program stripped the <troll> tags, so I had to post a patch to make ${^TAINT} writable (thereby making local ${^TAINT} = 0 work as expected). The patch made Jarkko go temporarily blind (but not blind enough to accidentally apply it).

The magic_get()/set() functions in mg.c retrieve and set the values of special variables like ${^TAINT} and $^S (where "specialness" is defined by the is_gv_magical() function in gv.c). Both functions are big switch statements which try to recognise the name of a variable, and then take some appropriate action. Schwern's patch made magic_get() assign the value of PL_tainting to ${^TAINT} and make it read-only; mine removed the read-only markings, and taught magic_set() to update PL_tainting when ${^TAINT} is assigned to.

It might be an interesting exercise to pick a special variable (say, $^W) and figure out what Perl does to get or set its value. Rusty grep skills can be further polished by figuring out when and where the get and set functions are called. If you're still bored, figure out how the value of PL_tainting is computed and used.

AUTOLOAD and packages

Paul Johnson posted a message about some weird AUTOLOAD behaviour. What should the following code do?

    package P1;

    *P2::AUTOLOAD = sub {
        print "|$::AUTOLOAD|$P1::AUTOLOAD|$P2::AUTOLOAD|\n";
    };

    P2->foo();

Nobody was very sure, but perl-current behaves differently from 5.6.1, so something must be broken.

B::Parrot

Will somebody please write a B::Parrot module and post it to P5P, so that I can write about it in the summary? (Rather than just make things up, as certain unscrupulous individuals suggest I should do.)

Miscellaneous

Michael G. Schwern released Test::Simple 0.32. In the process, he introduced $ENV{PERL_CORE}, to enable modules distributed both in the core and on the CPAN to use the same tests everywhere, thereby simplifying life greatly.

Johan Vromans released Getopt::Long 2.26_02, a prerelease of version 2.27. Its guts have been redesigned to make room for expansion; testers and bug reports are particularly welcome.

Russ Allbery released podlators 1.11, with several bug-fixes and some new features. (Russ still has a pile of mail to dig through, and he hopes to release another version soon.)

Jarkko promptly assimilated all three, and found the time to release no less than seven snapshots.

Kirrily Robert also submitted final drafts of some new documentation: perlintro (a Perl introduction for beginners), and perlmodstyle (a discussion of the current "best practices" in writing modules).

Gisle Aas made a brief appearance to post an ExtUtils::MakeMaker patch: when $ENV{PERL_MM_USE_DEFAULT} is set, ExtUtils::MakeMaker::prompt returns the default answer to every question without waiting for input.

Randolph Werner ported perl to 64-bit Windows (!).

Perl 5.6.1 is now installed by default with HP-UX 11.00.

The perl5-changes mailing list records each patch to Perl when it is committed into the repository. It is highly recommended (Pumpking approved! 100% signal!) for people who want to keep in touch with perl-current. (Speaking of lists, a passing remark that P5P mail should have a munged Reply-To fortunately did not elicit any, er... strong opinions.)

Charles Lane fixed some niggling problems under VMS, and Craig A. Berry found some new ones. VMSperl now builds with PerlIO as the default, and it is possible to configure it for 64-bitness (although this causes a colourful test failure, evidently due to bugs in the system's gcvt() function, used to stringify numbers).

Mikhail Zabaluev patched perldoc to use File::Temp instead of creating temporary files in an ad-hoc fashion. Tim Jenness tried to convince him that he should fix various other things in perldoc, but Mikhail didn't fall for it. Does anyone else want to try? (One of the changes, to use Pod::Man rather than the pod2man program, is similar to Schwern's recent change to installhtml.)

Perlbug posted an intriguing, but somewhat hard-to-decipher overview of the bug database, before dying again. Several "Hello, is this thing on?" messages and a flood of pent-up bug reports later, it is reported to be working again.

Perl 5 had its seventh birthday on October 17th, 2001.


Abhijit Menon-Sen

Building a Large-scale E-commerce Site with Apache and mod_perl

Table of Contents

Common Myths

Roll Your Own Application Server

Case Study: eToys.com

Apache::PerlRun to the Rescue

Planning the New Architecture

Surviving Christmas 2000

The Architecture

Proxy Servers

Application Servers

Search Servers

Load Balancing and Failover

Code Structure

Caching

Session Tracking

Security

Exception Handling

Templates

Controller Example

Performance Tuning

Trap: Nested Exceptions

Berkeley DB

Valuable Tools

An Open-Source Success Story

When it comes to building a large e-commerce Web site, everyone is full of advice. Developers will tell you that only a site built in C++ or Java (depending on which they prefer) can scale up to handle heavy traffic. Application server vendors will insist that you need a packaged all-in-one solution for the software. Hardware vendors will tell you that you need the top-of-the-line mega-machines to run a large site. This is a story about how we built a large e-commerce site using mainly open-source software and commodity hardware. We did it, and you can do it, too.

Perl Saves

Perl has long been the preferred language for developing CGI scripts. It combines supreme flexibility with rapid development. Programming Perl is still O'Reilly's top-selling technical book, and community support abounds. Lately though, Perl has come under attack from certain quarters. Detractors claim that it's too slow for serious development work and that code written in Perl is too hard to maintain.

The mod_perl Apache module changes the whole performance picture for Perl. Embedding a Perl interpreter inside of Apache provides performance equivalent to Java servlets, and makes it an excellent choice for building large sites. Through the use of Perl's object-oriented features and some basic coding rules, you can build a set of code that is a pleasure to maintain, or at least no worse than other languages.

Roll Your Own Application Server

When you combine Apache, mod_perl and open-source code available from CPAN (the Comprehensive Perl Archive Network), you get a set of features equivalent to a commercial application server:

  • Session handling
  • Load balancing
  • Persistent database connections
  • Advanced HTML templating
  • Security

You also get some things you won't get from a commercial product, such as a direct line to the core development team through the appropriate mailing list, and the ability to fix problems yourself instead of waiting for a patch. Moreover, each part of the system is under your control, leaving you limited only by your team's abilities.

Case Study: eToys.com

When we first arrived at eToys in 1999, we found a situation that is probably familiar to many who have joined a growing startup Internet company. The system was based on CGI scripts talking to a MySQL database. Static file serving and dynamic content generation were sharing resources on the same machines. The CGI code was largely written in a Perl4-ish style and not as modular as it could be, which was not surprising since most of it was built as quickly as possible by a small team.

Our major task was to figure out how to get this system to scale large enough to handle the expected Christmas traffic. The toy business is all about seasonality, and the difference between the peak selling season and the rest of the year is enormous. The site had barely survived the previous Christmas, and the MySQL database didn't look like it could scale much further.

The call had already been made to switch to Oracle, and a DBA team was in place. We didn't have enough time to do a redesign of the software, so we had to scramble to put in place whatever performance improvements we could finish by Christmas.

Apache::PerlRun to the Rescue

Apache::PerlRun is a module that exists to smooth the transition between basic CGI and mod_perl. It emulates a CGI environment, and provides some (but not all) of the performance benefits associated with code written for mod_perl. Using this module and the persistent database connections provided by Apache::DBI, we were able to do a basic port to mod_perl and Oracle in time for Christmas, and combined with some new hardware we were ready to face the Christmas rush.

The peak traffic lasted for eight weeks, most of which were spent frantically fixing things or nervously waiting for something else to break. Nevertheless, we made it through. During that time, we collected the following statistics:

  • 60,000 - 70,000 sessions/hour
  • 800,000 page views/hour
  • 7,000 orders/hour

According to Media Metrix, we were the third-most-heavily trafficked e-commerce site, behind eBay and Amazon.

Planning the New Architecture

It was clear that we would need to do a redesign for 2000. We had reached the limits of the current system and needed to tackle some of the harder problems that we had been holding off on.

Goals for the new system included moving away from offline page generation. The old system had been building HTML pages for each product and product category on the site in a batch job and dumping them out as static files. This was effective when we had a small database of products since the static files gave such good performance, but we had recently added a children's bookstore to the site that increased the size of our product database by an order of magnitude and made the time required to generate each page prohibitive. We needed a strategy that would only require us to build pages that customers were actually interested in and would still provide solid performance.

We also wanted to re-do the database schema for more flexibility, and structure the code in a more modular way that would make it easier for a team to share the development work without stepping on one another. We knew that the new codebase would have to be flexible enough to support a continuously evolving set of features.

Not all of the team had significant experience with object-oriented Perl, so we brought in Randal Schwartz and Damian Conway to do training sessions with us. We created a set of coding standards, drafted a design and built our system.

Surviving Christmas 2000

Our capacity planning was for three times the traffic of the previous peak. That's what we tested to, and that's about what we got:

  • 200,000+ sessions/hour
  • 2.5 million+ page views/hour
  • 20,000+ orders/hour

The software survived, although one of the routers went up in smoke. Once again, we were rated the third-most-highly trafficked e-commerce site for the season.

The Architecture

The machine strategy for the system is a fairly common one: low-cost Intel-based servers with a load-balancer in front of them, and big iron for the database.

Like many commercial packages, we have separate systems for the front-end Web servers (which we call proxy servers) and the application servers that generate the dynamic content. Both the proxy servers and the application servers are load-balanced using dedicated hardware from f5 Networks. More details about each of these systems are provided below.

We chose to run Linux on our proxy and application servers, a common platform for mod_perl sites. The ease of remote administration under Linux made the clustered approach possible. Linux also provided solid security features and automated build capabilities to help with adding new servers.

The database servers are IBM NUMA-Q machines, which run DYNIX/ptx.

Proxy Servers

The proxy servers run a slim build of Apache, without mod_perl. They have several standard Apache modules installed, in addition to our own customized version of mod_session, which assigns session cookies. Because the processes are so small, we can run as many as 400 Apache children per machine. These servers handle all image requests themselves, and pass page requests on to the application servers. They communicate with the app servers using standard HTTP requests, and cache the page results when appropriate headers are sent from the app servers. The cached pages are stored on a shared NFS partition of a Network Appliance filer. Serving pages from the cache is nearly as fast as serving static files, i.e. very fast.

This kind of reverse-proxy setup is a commonly recommended approach when working with mod_perl, since it uses the lightweight proxy processes to send the content to clients (who may be on slow connections) and frees the resource-intensive mod_perl processes to move to the next request. For more information on why this configuration is helpful, see the mod_perl developer's guide at http://perl.apache.org/guide/.

Application Servers

The application servers run mod_perl, and little else. They have a local cache for Perl objects, using Berkeley DB. The Web applications run here, and shared resources like HTML templates are mounted over NFS from the NetApp filer. Because they do the heavy lifting in this setup, these machines are somewhat beefy, with dual CPUs and 1GB of RAM each.

Search Servers

There is a third set of machines dedicated to handling searches. Since searching was such a large percentage of overall traffic, it was worthwhile to dedicate resources to it and take the load off the application servers and database.

The software on these boxes is a multi-threaded daemon that we developed in-house using C++. The application servers talk to the search servers using a Perl module. The search daemon accepts a set of search conditions and returns a sorted list of object IDs of the products whose data fits those conditions. Then the application servers look up the data to display these products from the database. The search servers know nothing about HTML or the Web interface.

This approach of finding the IDs with the search server and then retrieving the object data may sound like a performance hit, but in practice the object data usually comes from the application server's cache rather than the database. This design allows us to minimize the duplicated data between the database and the search servers, making it easier and faster to refresh the index. It also allows us to reuse the same Perl code for retrieving product objects from the database, regardless of how they were found.

The daemon uses a standard inverted word list approach to searching. The index is periodically built from the relevant data in Oracle. If you prefer an all-Perl solution, then there are modules on CPAN that implement this approach, including Search::InvertedIndex and DBIx::FullTextSearch. We chose to write our own because of the tight performance requirements on this part of the system, and because we had an unusually complex set of sorting rules for the returned IDs.
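
The inverted word list idea itself fits in a few lines of Perl. A toy version, assuming %product_text maps product IDs to their descriptions:

    # Build: map every word to the set of product IDs containing it.
    my %index;
    while (my ($id, $text) = each %product_text) {
        $index{lc $1}{$id} = 1 while $text =~ /(\w+)/g;
    }

    # Query: IDs whose text contains every search term (an AND search).
    sub search {
        my ($first, @rest) = map lc, @_;
        return grep {
            my $id = $_;
            !grep { !$index{$_}{$id} } @rest;
        } keys %{ $index{$first} || {} };
    }

    my @ids = search('wooden', 'train');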

Load Balancing and Failover

We took pains to make sure that we would be able to provide load-balancing among nodes of the cluster and fault-tolerance in case one or more nodes failed. The proxy servers are balanced using a random selection algorithm. A user might end up on a different one on each request. These servers don't hold any state information, so the goal is just to distribute the load evenly.

The application servers use "sticky" load balancing. That means that once a user goes to a particular app server, all of her subsequent requests during that session will also be passed to the same app server. The f5 hardware accomplishes this using browser cookies.

The load balancers run a periodic service check on each server and remove any servers that fail the check from rotation. When a server fails, all users that were "stuck" to that machine are moved to another one.

In order to ensure that no data is lost if an app server dies, all updates are written to the database. As a result, user data like the contents of a shopping cart is preserved even in cases of catastrophic hardware failure on an app server. This is essential for a large e-commerce site.

The database has a separate failover system, which we will not go into here. It follows standard practices recommended by our vendors.

Code Structure

The code is structured around the classic Model-View-Controller pattern, originally from Smalltalk and now often applied to Web applications. The MVC pattern is a way of splitting an application's responsibilities into three distinct layers.

Classes in the Model layer represent business concepts and data, like products or users. These have an API but no end-user interface. They know nothing about HTTP or HTML and can be used in non-Web applications like cron jobs. They talk to the database and other data sources, and manage their own persistence.

The Controller layer translates Web requests into appropriate actions on the Model layer. It handles parsing parameters, checking input, fetching the appropriate Model objects, and calling methods on them. Then it determines the appropriate View to use and sends the resulting HTML to the user.

View objects are really HTML templates. The Controller passes data from the Model objects to them and they generate a Web page. These are implemented with the Template Toolkit, a powerful templating system written in Perl. The templates have some basic conditional statements and looping in them, but only enough to express the formatting logic. No application control flow is embedded in the templates.
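
The hand-off from Controller to View is then a single call. A minimal sketch using the Template module (the template name, variables and fetch_products() are invented):

    use Template;

    # Model data the Controller has already gathered
    my @products = fetch_products();
    my $user     = 'alice';

    my $tt = Template->new({ INCLUDE_PATH => '/www/templates' });

    # The View (product_list.html) owns the formatting; we just pass data.
    $tt->process('product_list.html',
                 { products => \@products, user => $user },
                 \my $output)
        or die $tt->error;

    print $output;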

Caching

The core of the performance strategy is a multi-tier caching system. On the application servers, data objects are cached in shared memory with a backing store on local disk. Applications specify how long a data object can be out of sync with the database, and all future accesses during that time are served from the high-speed cache. This type of cache control is known as ``time-to-live.'' The local cache is implemented using a Berkeley DB database. Objects are serialized with the standard Storable module from CPAN.
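
A minimal sketch of the time-to-live idea, with the simple DB_File tie standing in for the real Berkeley DB setup and an invented load_inventory():

    use DB_File;
    use Storable qw(freeze thaw);

    tie my %cache, 'DB_File', '/var/cache/objects.db';

    sub cached_fetch {
        my ($key, $ttl, $loader) = @_;   # $loader knows how to hit the database
        if (defined(my $packed = $cache{$key})) {
            my $entry = thaw($packed);
            return $entry->{object}
                if time() - $entry->{stored} < $ttl;   # still fresh enough
        }
        my $object = $loader->();        # miss or stale: reload and restamp
        $cache{$key} = freeze({ stored => time(), object => $object });
        return $object;
    }

    # Inventory tolerates a minute of staleness; descriptions, an hour.
    my $stock = cached_fetch('inv:1234', 60, sub { load_inventory(1234) });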

Data objects are divided into pieces when necessary to provide finer granularity for expiration times. For example, product inventory is updated more frequently than other product data. By splitting up the product data, we can use a short expiration for inventory that keeps it in tighter sync with the database, while still using a longer expiration for the less volatile parts of the product data.

The application servers' object caches share product data between them using the IP Multicast protocol and custom daemons written in C. When a product is placed in the cache on one server, the data is replicated to the cache on all other servers. This technique is successful because of the high locality of access in product data. During the 2000 Christmas season this cache achieved a 99 percent hit ratio, thus taking a large amount of work off the database.

In addition to caching the data objects, entire pages that are not user-specific, like product detail pages, can be cached. The application takes the shortest expiration time of the data objects used in the pages and specifies that to the proxy servers as a page expiration time, using standard Expires headers. The proxy servers cache the generated page on a shared NFS partition. Pages served from this cache have performance close to that of static pages.
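
Telling the proxies how long they may keep a page comes down to one header. A sketch under the Apache 1.x mod_perl API, with build_page() and the TTL invented:

    use Apache::Constants qw(OK);
    use HTTP::Date qw(time2str);

    sub handler {
        my $r = shift;
        my $html = build_page($r);   # invented page builder
        my $ttl  = 15 * 60;          # shortest TTL of the page's data objects

        $r->content_type('text/html');
        $r->header_out(Expires => time2str(time + $ttl));
        $r->send_http_header;
        $r->print($html);
        return OK;
    }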

To allow for emergency fixes, we added a hook to mod_proxy that deletes the cached copy of a specified URL. This was used when a page needed to be changed immediately to fix incorrect information.

An extra advantage of this mod_proxy cache is the automatic handling of If-Modified-Since requests. We did not need to implement this ourselves since mod_proxy already provides it.

Session Tracking

Users are assigned session IDs using HTTP cookies. This is done at the proxy servers by our customized version of mod_session. Doing it at the proxy ensures that users accessing cached pages will still get a session ID assigned. The session ID is simply a key into data stored on the server-side. User sessions are assigned to an application server and continue to use that server unless it becomes unavailable. This is called "sticky" load balancing. Session data and other data modified by the user -- such as shopping cart contents -- is written to both the object cache and the database. The double write carries a slight performance penalty, but it allows for fast read access on subsequent requests without going back to the database. If a server failure causes a user to be moved to a different application server, then the data is simply fetched from the database again.

Security

A large e-commerce site is a popular target for all types of attacks. When designing such a system, you have to assume that you will be attacked and build with security in mind, at the application level as well as the machine level.

The main rule of thumb is "don't trust the client!" User-specific data sent to the client is protected using multiple levels of encryption. SSL keeps sensitive data exchanges private from anyone snooping on network traffic. To prevent "session hijacking" (when someone tampers with their session ID in order to gain access to another user's session), we include a Message Authentication Code (MAC) as part of the session cookie. This is generated using the standard Digest::SHA1 module from CPAN, with a seed phrase known only to our servers. By running the ID from the session cookie through this MAC algorithm, we can verify that the data being presented was generated by us and not tampered with.
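
A sketch of the idea follows; the cookie format and helper names are ours, not the production code:

    use Digest::SHA1 qw(sha1_hex);

    my $secret = 'seed phrase known only to our servers';   # illustrative

    sub make_session_cookie {
        my $session_id = shift;
        # Append a MAC so the ID can't be altered without detection.
        return join ':', $session_id, sha1_hex($secret . $session_id);
    }

    sub verify_session_cookie {
        my ($session_id, $mac) = split /:/, shift;
        return sha1_hex($secret . $session_id) eq $mac ? $session_id : undef;
    }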

In situations where we need to include some state information in an HTML form or URL and don't want it to be obvious to the user, we use the CPAN Crypt:: modules to encrypt and decrypt it. The Crypt::CBC module is a good place to start.
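
For example, hiding a bit of state in a form field might look like this (the key and state values are illustrative):

    use Crypt::CBC;

    my $cipher = Crypt::CBC->new(
        -key    => 'a key known only to our servers',
        -cipher => 'Blowfish',
    );

    my $state  = 'category=books;page=2';
    my $hidden = $cipher->encrypt_hex($state);   # safe to embed in HTML
    # ... later, when the form comes back ...
    my $plain  = $cipher->decrypt_hex($hidden);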

To protect against simple overload attacks, when someone uses a program to send high volumes of requests at our servers hoping to make them unavailable to customers, access to the application servers is controlled by a throttling program. The code is based on some work by Randal Schwartz in his Stonehenge::Throttle module. Accesses for each user are tracked in compact logs written to an NFS partition. The program enforces limits on how many requests a user can make within a certain period of time.

For more information on Web security concerns including the use of MAC, encryption and overload prevention, we recommend looking at the books CGI Programming with Perl, 2nd Edition and Writing Apache Modules with Perl and C, both from O'Reilly.

Exception Handling

When planning this system, we considered using Java as the implementation language. We decided to go with Perl, but we really missed Java's nice exception-handling features. Luckily, Graham Barr's Error module from CPAN supplies similar capabilities in Perl.

Perl already has support for trapping runtime errors and passing exception objects, but the Error module adds some nice syntactic sugar. The following code sample is typical of how we used the module:

    try {
        do_some_stuff();
    } catch My::Exception with {
        my $E = shift;
        handle_exception($E);
    };

The module allows you to create your own exception classes and trap for specific types of exceptions.

One nice benefit of this is the way it works with DBI. If you turn on DBI's RaiseError flag and use try blocks in places where you want to trap exceptions, the Error module can turn DBI errors into simple Error objects.

    try {
        $sth->execute();
    } catch Error with {
        # roll back and recover
        $dbh->rollback();
        # etc.
    };

This code shows a condition where an error would indicate that we should roll back a database transaction. In practice, most DBI errors indicate something unexpected happened with the database and the current action can't continue. Those exceptions are allowed to propagate up to a top-level try{} block that encloses the whole request. When errors are caught there, we log a stacktrace and send a friendly error page back to the user.

Templates

Both the HTML and the formatting logic for merging application data into it are stored in the templates. They use a CPAN module called Template Toolkit, which provides a simple but powerful syntax for accessing the Perl data structures passed to them by the application. In addition to basics such as looping and conditional statements, it provides extensive support for modularization, allowing the use of includes and macros to simplify template maintenance and avoid redundancy.

We found Template Toolkit to be an invaluable tool on this project. Our HTML coders picked it up quickly and were able to do nearly all of the templating work without help from the Perl coders. We supplied them with documentation of what data would be passed to each template and they did the rest. If you have never experienced the joy of telling a project manager that the HTML team can handle his requested changes without any help from you, then you are seriously missing out!

Template Toolkit compiles templates into Perl bytecode and caches them in memory to improve efficiency. When template files change on disk they are picked up and re-compiled. This is similar to how other mod_perl systems like Mason and Apache::Registry work.

By varying the template search path, we made it possible to assign templates to particular sections of the site, allowing a customized look and feel for specific areas. For example, the page header template in the bookstore section of the site can be different from the one in the video game store section. It is even possible to serve the same data with a different appearance in different parts of the site, allowing for co-branding of content.
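
Configuring that search path is a one-line affair when the Template object is created; a sketch, with invented paths:

    use Template;

    my $section = 'bookstore';   # e.g. derived from the request URL
    my $tt = Template->new({
        # Section-specific templates win; fall back to the defaults.
        INCLUDE_PATH => [ "/web/templates/$section",
                          '/web/templates/default' ],
    }) or die Template->error();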

This is a sample of what a basic loop looks like when coded in Template Toolkit:

    [% FOREACH item = cart.items %]
    name: [% item.name %]
    price: [% item.price %]
    [% END %]

Controller Example

Let's walk through a simple Hello World example that illustrates how the Model-View-Controller pattern is used in our code. We'll start with the controller code.

    package ESF::Control::Hello;
    use strict;
    use ESF::Control;
    @ESF::Control::Hello::ISA = qw(ESF::Control);
    use ESF::Util;
    sub handler {
        ### do some setup work
        my $class = shift;
        my $apr = ESF::Util->get_request();

        ### instantiate the model
        my $name = $apr->param('name');

        # we create a new Model::Hello object.
        my $hello = ESF::Model::Hello->new(NAME => $name);

        ### send out the view
        my %view_data;
        $view_data{'hello'} = $hello->view();

        # the process_template() method is inherited
        # from the ESF::Control base class
        $class->process_template(
                TEMPLATE => 'hello.html',
                DATA     => \%view_data);
    }

In addition to the things you see here, there are a few interesting details about the ESF::Control base class. All requests are dispatched to the ESF::Control->run() method first, wrapping them in a try{} block before calling the appropriate handler() method. It also provides the process_template() method, which runs Template Toolkit and then sends the results with appropriate HTTP headers. If the Controller specifies it, then the headers can include Last-Modified and Expires for control of page caching by the proxy servers.
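
The article doesn't show ESF::Control itself, but the dispatch-and-trap idea might be sketched like this (our guess at the shape, with hypothetical helper functions):

    package ESF::Control;
    use strict;
    use Error qw(:try);

    sub run {
        my ($class, $control_class) = @_;
        # Wrap every request in a top-level try{} block, so any
        # uncaught exception becomes a logged, friendly error page.
        try {
            $control_class->handler();
        } catch Error with {
            my $E = shift;
            log_stacktrace($E);    # hypothetical helpers
            send_error_page();
        };
    }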

Now let's look at the corresponding Model code.

    package ESF::Model::Hello;
    use strict;
    sub new {
        my $class = shift;
        my %args = @_;
        my $self = bless {}, $class;
        $self->{'name'} = $args{'NAME'} || 'World';
        return $self;
    }

    sub view {
        # the object itself will work for the view
        return shift;
    }

This is a simple Model object. Most Model objects would have some database and cache interaction. They would include a load() method that accepts an ID and loads the appropriate object state from the database. Model objects that can be modified by the application would also include a save() method.

Note that because of Perl's flexible OO style, it is not necessary to call new() when loading an object from the database. The load() and new() methods can both be constructors for use in different circumstances, both returning a blessed reference.

The load() method typically handles cache management as well as database access. Here's some pseudo-code showing a typical load() method:

    sub load {
        my $class = shift;
        my %args = @_;
        my $id = $args{'ID'};
        my $self = _fetch_from_cache($id) ||
                   _fetch_from_database($id);
        return $self;
    }

The save method would use the same approach in reverse, saving first to the cache and then to the database.
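
In the same pseudo-code style, that might look like:

    sub save {
        my $self = shift;
        # Write-through: cache first, then the database.
        _store_in_cache($self);
        _store_in_database($self);
    }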

One final thing to notice about our Model class is the view() method. This method exists to give the object an opportunity to shuffle its data around or create a separate data structure that is easier for use with a template. This can be used to hide a complex implementation from the template coders. For example, remember the partitioning of the product inventory data that we did to allow for separate cache expiration times? The product Model object is really a façade for several underlying implementation objects, but the view() method on that class consolidates the data for use by the templates.

To finish our Hello World example, we need a template to render the view. This one will do the job:

    <HTML>
    <TITLE>Hello, My Oyster</TITLE>
    <BODY>
        [% PROCESS header.html %]
        Hello [% hello.name %]!
        [% PROCESS footer.html %]
    </BODY>
    </HTML>

Performance Tuning

Since Perl code executes so quickly under mod_perl, the performance bottleneck is usually at the database. We applied all the documented tricks for improving DBD::Oracle performance. We used bind variables, prepare_cached(), Apache::DBI, and adjustments to the RowCache buffer size.
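
For example, combining placeholders with prepare_cached() -- an illustrative query, not one from the site:

    use DBI;

    my ($user, $pass, $product_id) = ('scott', 'tiger', 42);  # illustrative

    my $dbh = DBI->connect('dbi:Oracle:orcl', $user, $pass,
                           { RaiseError => 1 });
    $dbh->{RowCacheSize} = 64;    # tune DBD::Oracle's row cache

    # prepare_cached() hands back the same parsed statement on every
    # request, and the placeholder saves Oracle from re-parsing the
    # SQL for each new id.
    my $sth = $dbh->prepare_cached(
        'SELECT name, price FROM products WHERE id = ?');
    $sth->execute($product_id);
    my ($name, $price) = $sth->fetchrow_array;
    $sth->finish;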

The big win of course is avoiding going to the database in the first place. The caching work we did had a huge impact on performance. Fetching product data from the Berkeley DB cache was about 10 times faster than fetching it from the database. Serving a product page from the proxy cache was about 10 times faster than generating it on the application server from cached data. Clearly, the site would never have survived under heavy load without the caching.

Partitioning the data objects was also a big win. We identified several different subsets of product data that could be loaded and cached independently. When an application needed product data, it could specify which subset was required and skip loading the unnecessary data from the database.

Another standard performance technique we followed was avoiding unnecessary object creation. The Template object is created the first time it's used and then cached for the life of the Apache process. Socket connections to search servers are cached in a way similar to what Apache::DBI does for database connections. Resources that are used frequently within the scope of a request, such as database handles and session objects, were cached in mod_perl's $r->pnotes() until the end of the request.
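
A request-scoped cache with pnotes() takes only a few lines; for example (the session loader is hypothetical):

    my $r = Apache->request;
    my $session = $r->pnotes('session');
    unless ($session) {
        $session = load_session_for($r);    # hypothetical loader
        # pnotes() entries are discarded automatically at request end.
        $r->pnotes('session' => $session);
    }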

Trap: Nested Exceptions

When trying out a new technology like the Error module, there are bound to be some things to watch out for. We found a certain code structure that causes a memory leak every time it is executed. It involves nested try{} blocks, and looks like this:

    my $foo;
    try {
        # some stuff...
        try {
            $foo++;
            # more stuff...
        } catch Error with {
            # handle error
        };

    } catch Error with {
        # handle other error
    };

It's not Graham Barr's fault that this leaks; it is simply a byproduct of the fact that the try and catch keywords are implemented using anonymous subroutines. This code is equivalent to the following:

    my $foo;
    $subref1 = sub {
        $subref2 = sub {
            $foo++;
        };
    };

This nested subroutine creates a closure for $foo and will make a new copy of the variable each time it is executed. The situation is easy to avoid once you know to watch out for it.
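
One workaround (ours, not from the Error documentation) is to keep try{} blocks un-nested by moving the inner block into a named subroutine:

    my $foo;
    try {
        # some stuff...
        inner_step($foo);
    } catch Error with {
        # handle other error
    };

    sub inner_step {
        try {
            # more stuff...
        } catch Error with {
            # handle error
        };
    }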

Berkeley DB

One of the big wins in our architecture was the use of Berkeley DB. Since most people are not familiar with its more advanced features, we'll give a brief overview here.

The DB_File module is part of the standard Perl distribution. However, it only supports the interface of Berkeley DB version 1.85, and doesn't include the interesting features of later releases. To get those, you'll need the BerkeleyDB.pm module, available from CPAN. This module can be tricky to build, but comprehensive instructions are included.

Newer versions of Berkeley DB offer many features that help performance in a mod_perl environment. To begin with, database files can be opened once at the start of the program and kept open, rather than opened and closed on each request. Berkeley DB will use a shared memory buffer to improve data access speed for all processes using the database. Concurrent access is directly supported with locking handled for you by the database. This is a huge win over DB_File, which requires you to do your own locking. Locks can be at a database level, or at a memory page level to allow multiple simultaneous writers. Transactions with rollback capability are also supported.
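
Opening a shared environment at server startup might look like this sketch (paths invented; DB_INIT_CDB selects the simpler database-level locking we recommend below):

    use BerkeleyDB;

    my $env = BerkeleyDB::Env->new(
        -Home  => '/var/cache/bdb',
        -Flags => DB_CREATE | DB_INIT_MPOOL | DB_INIT_CDB,
    ) or die "cannot open environment: $BerkeleyDB::Error";

    my $db = BerkeleyDB::Hash->new(
        -Filename => 'objects.db',
        -Env      => $env,
        -Flags    => DB_CREATE,
    ) or die "cannot open database: $BerkeleyDB::Error";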

This all sounds too good to be true, but there are some downsides. The documentation is somewhat sparse, and you will probably have to refer to the C API to work out how to do anything complicated.

A more serious problem is database corruption. When an Apache process using Berkeley DB dies from a hard kill or a segfault, it can corrupt the database. A corrupted database will sometimes cause subsequent opening attempts to hang. According to the people we talked to at Sleepycat Software (which provides commercial support for Berkeley DB), this can happen even with the transactional mode of operation. They are working on a way to fix the problem. In our case, none of the data stored in the cache was essential for operation so we were able to simply clear it out when restarting an application server.

Another thing to watch out for is deadlocks. If you use the page-level locking option, then you have to handle deadlocks. There is a daemon included in the distribution that will watch for deadlocks and fix them, or you can handle them yourself using the C API.

After trying a few different things, we recommend that you use database-level locking. It's much simpler, and cured our problems. We didn't see any significant performance hit from switching to this mode of locking. The one thing you need to watch out for when using exclusive database-level write locks is long operations with cursors that tie up the database. We split up some of our operations into multiple writes in order to avoid this problem.

If you have a good C coder on your team, you may want to try the alternate approach that we finally ended up with. You can write your own daemon around Berkeley DB and use it in a client/server style over Unix sockets. This allows you to catch signals and ensure a safe shutdown. You can also write your own deadlock handling code this way.

Valuable Tools

If you plan to do any serious Perl development, then you should really take the time to become familiar with some of the available development tools. The debugger in particular is a lifesaver, and it works with mod_perl. There is a profiler called Devel::DProf, which also works with mod_perl. It's definitely the place to start when performance tuning your application.
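
Getting a profile takes just two commands:

    perl -d:DProf your_script.pl    # writes tmon.out
    dprofpp                         # formats the report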

We found the ability to run our complete system on individual workstations to be extremely useful. Everyone could develop on his own machine and coordinate changes using CVS source control.

For object modeling and design, we used the open-source Dia program and Rational Rose. Both support working with UML and are great for generating pretty class diagrams for your cubicle walls.

Do Try This at Home

Since we started this project, a number of development frameworks that offer support for this kind of architecture have come out. We don't have direct experience using these, but they have a similar design and may prove useful to you if you want to take an MVC approach with your system.

Apache::PageKit is a mod_perl module available from CPAN that provides a basic MVC structure for Web applications. It uses the HTML::Template module for building views.

OpenInteract is a recently released Web application framework in Perl, which works together with the persistence layer SPOPS. Both are available from CPAN.

The Application Toolkit from Extropia is a comprehensive set of Perl classes for building Web apps. It has excellent documentation and takes good advantage of existing CPAN modules. You can find it at http://www.extropia.com/.

If you want a ready-to-use cache module, take a look at the Perl-cache project at http://sourceforge.net/. This is the next generation of the popular File::Cache module.

The Java world has many options as well. The Struts framework, part of the Jakarta project, is a good open-source choice. There are also commercial products from several vendors that follow this sort of design. Top contenders include ATG Dynamo, BEA WebLogic, and IBM WebSphere.

An Open-Source Success Story

By building on open-source software and the community around it, we were able to create a top-tier Web site with a minimum of cost and effort. The system we ended up with is scalable to huge amounts of traffic. It runs on mostly commodity hardware, making it easy to grow when the need arises. Perhaps best of all, it provided tremendous learning opportunities for our developers, and made us a part of the larger development community.

We've contributed patches from our work back to various open-source projects, and provided help on mailing lists. We'd like to take this opportunity to officially thank the open-source developers who contributed to projects mentioned here. Without them, this would not have been possible. We also have to thank the hardworking Web developers at eToys. The store may be closed, but the talent that built it lives on.

If you have questions about this material, you can contact us at the following e-mail addresses:

Bill Hilf - bill@hilfworks.com

Perrin Harkins - perrin@elem.com

This Week on Perl 6 (7 - 13 October 2001)

Notes

You can subscribe to an email version of this summary by sending an empty message to perl6-digest-subscribe@netthink.co.uk.

Please send corrections and additions to bwarnock@capita.com.

There were 419 messages across 92 threads, with 74 authors contributing.

NaN is Not a Number, and Not Not a Number

(61 posts) Tim Conrow pointed out an inconsistency between Perl 6's proposed semantics of NaN and the IEEE semantics found in other languages. (The subsequent debate consisted mostly of each side trying to convince the other how Wrong it was.)

The non-identity of NaN makes sense in strongly typed languages where a numerical entity can only hold a number. There, the IEEE and mathematical notion of NaN isn't so much that it isn't a number, but that it is a number that just can't be represented (as an integer, a floating-point number, or +/- infinity). In this context, there's an infinitely small probability that some non-representable number is numerically equivalent to some other non-representable number, so the non-identity makes sense.

However, for languages (such as Perl) where numerics aren't a strongly typed unique entity, you really have two NaN properties to consider. You've got, as above, NaNs-That-Really-Are-Numbers-But-Can't-Be-Represented-As-Such, but then you've also got NaNs that aren't numbers because they simply aren't numbers.

In the latter case, it's less the mathematical notion of NaN, with its associated non-identity, than a generic notion of NaN for non-numeric strings. (That's "Not A Number", as in, "That's Not a Number, It's a String.") Given that notion, it makes sense for NaNs to have an identity. Two non-numeric strings are equally Not A Number.

The confusion, as you can see, is how to merge these two rather orthogonal definitions of What is Not a Number into one language, and discussion continues.

Numerical Strings

(14 posts) A spin-off of the above thread mused about extending some of Perl's numerical string recognition capabilities (such as 1_000_000 for one million) inside strings themselves, and perhaps extending them to handle standard metric suffixes. Several syntaxes and ramifications were discussed, and a couple of apparently clean solutions were proposed.

Reduction

(18 posts) John Williams asked whether the hyperoperator ( ^ ) would handle reduction:

I just read Apocalypse and Exegesis 3, and something stuck out at me because of its omission, namely using hyper operators for reduction.

$a ^+= @list; # should sum the elements of @list

Larry says @a ^+ 1 will replicate the scalar value for all a's, and Damian talks about doing summation with the reduce function without mentioning this compact syntax.

So am I overlooking some obvious reason why this won't work?

Damian Conway replied that no, it should indeed work, but would require the lvalue to force reduction, which reduce does not. (Without the lvalue, the scalar value would be extended to match the list.)

Parroty Bits

(16 posts) Dan Sugalski gave a quick rundown on Parrot Magic Cookies (PMCs) and how they work.

(18 posts) There was some debate about jump versus branch, and what each instruction is supposed to accomplish. Dan Sugalski:

Absolute addresses are, well, absolute addresses. Relative addresses are offsets from the current location.

I think the confusion's because the jump opcode's broken. When you say

jump 12

It should jump to absolute address 12, not 12 bytes/words/opcodes from the current position.

Subsequent discussion attempted to discern which absolute address was meant -- the absolute "virtual" address, similar to what the branch operator uses, or the real machine address of an operator. The current inclination is toward the latter, as that allows Parrot to jump between code segments.

Code Changes

Last Words

Work on Parrot 0.0.3 continues.


Bryan C. Warnock

This Week on p5p 2001/10/07

Notes

Please send corrections and additions to perl-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year and month. Changes and additions to the perl5-porters biographies are particularly welcome.

chromatic

P5P is in awe of the patience and test writing ability of chromatic, who has, in a very short time, given us exhaustive new tests for B::Terse, Dumpvalue, CGI::Fast, CGI::Push, CPAN::Nox, Term::Cap, Tie::Scalar, Term::Complete, ExtUtils::Command, less, open, filetest, sigtrap, warnings::register, and a bunch of patches to older tests. Thank you!

Call for PerlIO bugs

Nicholas Clark found some people at a London.pm meeting who knew of unfixed PerlIO bugs, and suggested the following call-for-bugs:

If you are aware of a PerlIO bug that perl5-porters isn't aware of, please would you tell perl5-porters about it, preferably in the form of a regression test that will print "ok\n" once it is fixed. Else don't be surprised if 5.8.0 ships with the bug still present.

(While you're at it, report any other lurking bugs you know about.)

Attributes are broken

Arthur Bergman explained again why attributes are broken. The problem is that the MODIFY_*_ATTRIBUTES subs (used to set attributes for a variable) are called at compile time. Thus, in the following code:

    sub MODIFY_SCALAR_ATTRIBUTES {
        ...
        tie $obj, $class;
    }

    for (1..10) {
        my $foo : bar = "BAZ";

        ...
    }

The variable $foo would be tied only on the first iteration of the loop, after which the scope cleanup mechanism would remove its magic, and the variable would remain untied for the remaining iterations.

Multiple FETCHes

Jeff 'japhy' Pinyan attacked some instances of FETCH being called multiple times on tied variables (for $tied++, $tied || $untied), thus causing potentially undesirable side-effects (Read an analysis of the problem and the proposed solution). Several people pointed out the problems with this approach, but no better solution has been found yet.

Jeff also wanted to optimise grep when called in boolean context ( if (grep EXPR, LIST)) to avoid iterating over the entire LIST, and just return the first time EXPR was true, but the potentially desirable side-effects of EXPR make this optimisation infeasible.

Code cleanups

Casey West embarked on a pod/* cleanup, to make the example code use strict where appropriate, be generally pleasing to the eye, and easy to understand. Despite this noble goal, his patches were met with some suspicion.

Jeff Pinyan said the example shouldn't use $a, the sort variable, but Jarkko didn't want to encourage cargo-cult programming by punishing the poor innocent $a for the vagaries of sort. Abigail didn't think the patches accomplished anything useful, but Jarkko said that consistency and following our own suggestions (in perlstyle) are obviously good things.

Casey posted an explanation:

I feel that my style is a generally acceptable one that everyone can understand. In that light, I don't think it's a bad thing if I make some minor changes (single versus double quotes) in the process of making a whole document better.

Nobody said anything after that.

Various

Andy Dougherty suggested an optional make torture_test target to run tests which were not portable. (Schwern hasn't noticed yet.)

Arthur Bergman is running the LXR cross referencing tools on the perl source. The result: PXR, the Perl Cross Reference. It will eventually contain cross references of perl-current, the latest stable release, and the Parrot source.

Brian Ingerson (of Inline fame) posted his first P5P patch, implementing the oft-requested Data::Dumper feature to allow hash keys to be sorted. He even got the indentation right the first time!
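
That feature lives on in today's Data::Dumper as the Sortkeys option:

    use Data::Dumper;
    $Data::Dumper::Sortkeys = 1;    # emit hash keys in sorted order
    print Dumper(\%ENV);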

Craig Berry provided a patch to fix %ENV tainting on VMS. (Only five tests fail on VMS now, and a couple of them are bogus failures.)

Jarkko released two snapshots: 12307 and 12340. (The latest snapshot can be retrieved via rsync from "ftp.funet.fi::perlsnap".)

Michael Schwern sought a philosophical solution to finding the path to the running perl interpreter. Instead, he found that the contents of $^X are system-dependent when perl is invoked via a #!-line.

Nicholas Clark continued to submit lots of little patches, one of which became change #12345. This is entirely fitting, because Nick is a hoopy frood who knows exactly where his towel is.

Paul Marquess and Robin Barker are investigating a DB_File bug involving filtered keys.

Stas Bekman is hunting down interesting MakeMaker (and related) bugs. He also found a scoping problem in the warnings pragma.

Finally, P5P was specially selected to receive a free mini-vacation to any of fifteen exciting destinations, along with several other people whose email addresses contained "per."


Abhijit Menon-Sen

Filtering Mail with PerlMx

Introduction

PerlMx is a utility by ActiveState that allows Perl programs to interface with Sendmail connections. It's quite a powerful tool, and once installed, it's very easy to use. This article will detail how to install and set up PerlMx, and provide an overview both of what you can do with PerlMx, and how to do it. This overview will be based on spamNX, the anti-spam code I developed, available at https://sourceforge.net/projects/spamnx. My next PerlMx article will go through spamNX in depth, to demonstrate how to harness the power of PerlMx.

Prerequisites

PerlMx is made possible by the excellent Milter code provided in Sendmail versions 8.10.0 and higher. This code, when compiled into Sendmail, allows external programs to hook into the Sendmail connection process via C callbacks. PerlMx passes these C hooks to the Perl interpreter, where you can access the information with a simple shift.

Versions 8.12 and higher of Sendmail enable Milter by default. In prior versions, you must first enable the code in Sendmail. To do so, go to the devtools/site directory of the Sendmail source tree and add the following lines to your site.config.m4 file:

dnl Milter
APPENDDEF(`conf_sendmail_ENVDEF', `-D_FFR_MILTER=1')
APPENDDEF(`conf_libmilter_ENVDEF', `-D_FFR_MILTER=1')

Now compile and install Sendmail. Once installed, add the following lines to your config.mc file (again, for Sendmail below version 8.12):

define(`_FFR_MILTER', `1')dnl
INPUT_MAIL_FILTER(`<filter_name>', `S=inet:3366@localhost, F=T')

Be warned that if you enable Milter in your configuration file, all Sendmail connections will fail unless PerlMx is running. So wait until your code is ready to go before you change your config file.

Your Sendmail installation is now ready to go. PerlMx also needs Perl 5.6.0 or higher, with ithreads enabled, and cannot have big integer (nor really big) support enabled. With Perl versions prior to 5.6.1, PerlMx will also need the File::Temp module installed. With Perl properly configured, run the installation program provided by ActiveState.

Once PerlMx is installed and your code is ready to go, run:

pmx <package> &

to launch PerlMx. At this point, you can safely turn on the Milter code in your Sendmail configuration. There are a few command-line options available for PerlMx. Most are unnecessary, but I've found I need to allow more than the default five threads. To do so, run:

pmx -r <# of threads> <package> &

You can read about all the available options by running pmx -h.

ActiveState has an FAQ available if you run into any trouble or have questions that aren't covered here.

Sendmail, Perl, and PerlMx should now be installed and ready to go. The following section provides an overview of how to write PerlMx code.

Programming Overview

PerlMx allows you to hook into many stages of the SMTP connection; at each stage, PerlMx calls the corresponding callback in your code. I'll now go through an outline of spamNX, to show how to start using PerlMx.

package spamNX;
use strict;
use PerlMx;

The basic building blocks: give your package a name, and tell it to use PerlMx. Code executed here is only run once, so you can open a database connection or perform similar operations here.

sub new {
  return bless {
    NAME    => "spamNX",
    CONNECT => \&cb_connect,
    HELO    => \&cb_helo,
    ENVFROM => \&cb_from,
    ENVRCPT => \&cb_rcpt,
    HEADER  => \&cb_header,
    BODY    => \&cb_body,
    EOM     => \&cb_eom,
    ABORT   => \&cb_abort,
    CLOSE   => \&cb_close,
  }, shift;
}

This sub returns a blessed instance of your package's subroutines. Each key in the hash relates to a specific point in the SMTP connection, and the associated function is called at that point in the connection. Your program may define any or all of the above functions; for any stage that isn't defined in your blessed return, PerlMx simply won't run any code.

Upon each function call, an object reference and relevant information are passed to the function via the argument list. The object reference allows calls to various methods and is stateful, allowing you to store information for use later in the connection. The information provided ranges from the name of the connecting host to the body of the message. These functions, and their associated arguments, are discussed below, in order of when they're called. The cb_helo function, which is relatively simple, will be discussed in detail to familiarize you with how a PerlMx function is written.

The first function called is cb_connect. This function is called for every Sendmail connection, and is passed the standard object reference, the name that the connecting host claims to be, and the real IP address of the host.

sub cb_connect {
  # Standard object reference
  my $ctx = shift;
  # (Possibly faked) name of the connecting host
  my $host = shift;
  # IP address of the connecting host
  my $ip = shift;
}

This is followed by cb_helo:

sub cb_helo {
  # Standard object reference
  my $ctx = shift;
  # Connecting host's HELO
  my $helo = shift;
  # Get the name of the host from the $ctx object reference
  my $host = $ctx->{host};
  # Store the HELO in the ctx object for any future reference
  $ctx->{helo} = $helo;

  # Compare the host's real name to what they provide in the HELO
  if ($host !~ /$helo/) {
    # Since the RFC specifies that the HELO should be the connecting
    # host's domain name, reject the mail if the HELO doesn't at
    # all match the hostname
    return SMFIS_REJECT;
  }

  # Continue with the connection, including any further function calls
  return SMFIS_CONTINUE;
}

The preceding code looks at the value the connecting host provides for a HELO. If that value does not appear in the host's actual name, then the mail is rejected. Otherwise, the connection continues. Note that there are other possible return values. They are:

SMFIS_ACCEPT -- accept the message without performing any further PerlMx checks.
SMFIS_TEMPFAIL -- returns a temporary failure, for cases such as hostname lookup failure.
SMFIS_DISCARD -- accepts the message, but silently drops it in the bit bucket. Not usually a good idea.

Related to the SMFIS returns, the $ctx->setreply($code, $xcode, $message) method is available to each callback; it sets the response code to $code, the extended response code to $xcode, and the response message to $message.
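
For example, the HELO check above could send a more descriptive rejection (the response values here are illustrative):

$ctx->setreply(550, '5.7.1', 'HELO does not match your hostname');
return SMFIS_REJECT;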

The next function called is cb_from, which processes the sender's MAIL FROM.

sub cb_from {
  # Object ref
  my $ctx = shift;
  # MAIL FROM value
  my $from = shift;
}

Next is cb_rcpt, which is called once per recipient. Each call is passed the value of one RCPT TO call. To keep track of every recipient, create an array in the ctx object reference with each recipient's address.

sub cb_rcpt {
  # Object ref
  my $ctx = shift;
  # Recipient, a la RCPT TO: line
  my $rcpt = shift;

  # Store the recipient's name
  push(@{$ctx->{rcpts}}, $rcpt);
}

The next code block, cb_header, is the first piece of code that doesn't deal directly with the connection or an SMTP protocol field. This code is called during the DATA portion of the SMTP connection, prior to the blank line that separates headers from body. This function is called once per header. As headers such as Received often span multiple lines, the value that you are passed may be folded. You can use Mail::Header, by Graham Barr, to unfold multi-line headers into one line.
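
One approach to that unfolding (the raw_headers entry is our own convention for accumulating lines in the ctx object, not part of PerlMx):

use Mail::Header;

# In cb_header: collect each raw "Name: value" line.
push(@{$ctx->{raw_headers}}, "$name: $value");

# Later (say, in cb_eom): unfold the collected headers.
my $head = Mail::Header->new($ctx->{raw_headers});
$head->unfold;
my @received = $head->get('Received');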

sub cb_header {
  # Object ref
  my $ctx = shift;
  # The header name (e.g. "Cc" or "Received")
  my $name = shift;
  # The header value (i.e. the latter half of XYZ: abc)
  my $value = shift;
}

We next come to cb_body, which is also called during the DATA block, after the headers have finished. The code here processes the actual text of the email message -- very useful for virus scanning or natural language searches. cb_body may be called repeatedly for large bodies of text. A caveat to keep in mind for the future: the built-in replacebody function, which replaces the text of a message, behaves similarly. On the first call, the message body is replaced. On all following calls, the text is appended to the new message. More on this potentially frustrating feature in my second article.

sub cb_body {
  # Object ref
  my $ctx = shift;
  # Message body
  my $body = shift;
}

After the text of the message is sent, the SMTP connection is complete, and cb_eom is called. This block of code is not passed any special variables outside of the standard object reference; however, there are a few functions available here that are not available at any other point in the code -- the replacebody function, for example. The special functions are:

$ctx->replacebody($body) -- Replaces the body of a message with given text. As noted above, can be called multiple times for large chunks of data.
$ctx->addrcpt($recipient) -- Adds a message recipient.
$ctx->delrcpt($recipient) -- Removes a message recipient.
$ctx->addheader($name, $value) -- Adds a header to the message.
$ctx->chgheader($name, $index, $value) -- Changes the (1-based) Nth occurrence of the named header to the given value.

sub cb_eom {
  # Object ref
  my $ctx = shift;
}

After the connection is completed and cb_eom finishes, the cb_close function is called. This call is not passed anything aside from the object ref, and simply provides a means to clean up the ctx object reference and anything else you may have set up during the connection. cb_close is virtually identical to cb_abort; the only difference is that cb_abort is called when a connection is closed prematurely.

sub cb_close {
  # Object ref
  my $ctx = shift;

  # Don't keep a reference to previous recipients
  undef $ctx->{rcpts};
}

Conclusion

We've now walked through an outline of a complete PerlMx program, and seen how to configure Sendmail and Perl to use PerlMx. In my next article, I'll walk through the interesting bits of my spamNX code, to demonstrate the power of PerlMx and some tricks that are available to programmers.

Exegesis 3

Editor's note: this document is out of date and remains here for historic interest. See Synopsis 3 for the current design information.

Diamond lives (context-aware);
Underscore space means concatenate; fat comma means pair;
A pre-star will flatten; colon-equals will bind;
And binary slash-slash yields left-most defined.
-- Sade, "Smooth operator" (Perl 6 remix)

In Apocalypse 3, Larry describes the changes that Perl 6 will make to operators and their operations. As with all the Apocalypses, only the new and different are presented -- just remember that the vast majority of operator-related syntax and semantics will stay precisely as they are in Perl 5.

For example...

To better understand those new and different aspects of Perl 6 operators, let's consider the following program. Suppose we wanted to locate a particular data file in one or more directories, read the first four lines of each such file, report and update their information, and write them back to disk.

We could do that with this:

    sub load_data ($filename ; $version, *@dirpath) {
        $version //= 1;
        @dirpath //= @last_dirpath // @std_dirpath // '.';
        @dirpath ^=~ s{([^/])$}{$1/};
        my %data;
        foreach my $prefix (@dirpath) {
            my $filepath = $prefix _ $filename;
            if (-w -r -e $filepath  and  100 < -s $filepath <= 1e6) {
                my $fh = open $filepath : mode=>'rw' 
                    or die "Something screwy with $filepath: $!";
                my ($name, $vers, $status, $costs) = <$fh>;
                next if $vers < $version;
                $costs = [split /\s+/, $costs];
                %data{$filepath}{qw(fh name vers stat costs)} =
                                ($fh, $name, $vers, $status, $costs);
            }
        }
        return %data;
    }
    my @StartOfFile is const = (0,0);
    sub save_data (*%data) {
        foreach my $data (values %data) {
            my $rest = <$data.{fh}.irs(undef)>;
            seek $data.{fh}: *@StartOfFile;
            truncate $data.{fh}: 0;
            $data.{fh}.ofs("\n");
            print $data.{fh}: $data.{qw(name vers stat)}, _@{$data.{costs}}, $rest;
         }
    }
    my %data = load_data(filename=>'weblog', version=>1);
    my $is_active_bit is const = 0x0080;
    foreach my $file (keys %data) {
        print "$file contains data on %data{$file}{name}\n";
        %data{$file}{stat} = %data{$file}{stat} ~ $is_active_bit;
        my @costs := @%data{$file}{costs};
        my $inflation;
        print "Inflation rate: " and $inflation = +<>
            until $inflation != NaN;
        @costs = map  { $_.value }
                 sort { $a.key <=> $b.key }
                 map  { amortize($_) => $_ }
                        @costs ^* $inflation;
        my sub operator:∑ is prec(\&operator:+($)) (*@list : $filter //= undef) {
               reduce {$^a+$^b}  ($filter ?? grep &$filter, @list :: @list);
        }
        print "Total expenditure: $( ∑ @costs )\n";
        print "Major expenditure: $( ∑ @costs : {$^_ >= 1000} )\n";
        print "Minor expenditure: $( ∑ @costs : {$^_ < 1000} )\n";
        print "Odd expenditures: @costs[1..Inf:2]\n"; 
    }
    save_data(%data, log => {name=>'metalog', vers=>1, costs=>[], stat=>0});

I was bound under a flattening star

The first subroutine takes a filename and (optionally) a version number and a list of directories to search:

    sub load_data ($filename ; $version, *@dirpath) {

Note that the directory path parameter is declared as *@dirpath, not @dirpath. In Perl 6, declaring a parameter as an array (i.e. @dirpath) causes Perl to expect that the corresponding argument will be an actual array (or an array reference), not just any old list of values. In other words, a @ parameter in Perl 6 is like a \@ context specifier in Perl 5.

To allow @dirpath to accept a list of arguments, we have to use the list context specifier -- unary * -- to tell Perl to "slurp up" any remaining arguments into the @dirpath parameter.

This slurping-up process consists of flattening any arguments that are arrays or hashes, and then assigning the resulting list of values, together with any other scalar arguments, to the array (i.e. to @dirpath in this example). In other words, a *@ parameter in Perl 6 is like a @ context specifier in Perl 5.

It's a setup

In Perl 5, it's not uncommon to see people using the ||= operator to set up default values for subroutine parameters or input data:

    $offset ||= 1;
    $suffix ||= $last_suffix || $default_suffix || '.txt';
    # etc.

Of course, unless you're sure of your range of values, this can go horribly wrong -- specifically, if the variable being initialized already has a valid value that Perl happens to consider false (i.e. if $suffix or $last_suffix or $default_suffix contained an empty string, or the offset really was meant to be zero).

So people have been forced to write default initializers like this:

    $offset = 1 unless defined $offset;

which is OK for a single alternative, but quickly becomes unwieldy when there are several alternatives:

    $suffix = $last_suffix    unless defined $suffix;
    $suffix = $default_suffix unless defined $suffix;
    $suffix = '.txt'          unless defined $suffix;

Perl 6 introduces a binary 'default' operator -- // -- that solves this problem. The default operator evaluates to its left operand if that operand is defined, otherwise it evaluates to its right operand. When chained together, a sequence of // operators evaluates to the first operand in the sequence that is defined. And, of course, the assignment variant -- //= -- only assigns to its lvalue if that lvalue is currently undefined.

The symbol for the operator was chosen to be reminiscent of a ||, but one that's taking a slightly different angle on things.

So &load_data ensures that its parameters have sensible defaults like this:

    $version //= 1;
    @dirpath //= @last_dirpath // @std_dirpath // '.';

Note that it will also be possible to provide default values directly in the specification of optional parameters, probably like this:

    sub load_data ($filename ; $version //= 1, *@dirpath //= @std_dirpath) {...}

...and context for all

As if it weren't broken enough already, there's another nasty problem with using || to build default initializers in Perl 5. Namely, that it doesn't work quite as one might expect for arrays or hashes either.

If you write:

    @last_mailing_list = ('me', 'my@shadow');
    # and later...
    @mailing_list = @last_mailing_list || @std_mailing_list;

then you get a nasty surprise: in Perl 5, || (and &&, for that matter) always evaluates its left argument in scalar context. And in a scalar context an array evaluates to the number of elements it contains, so @last_mailing_list evaluates to 2. And that's what's assigned to @mailing_list instead of the actual two elements.

Perl 6 fixes that problem, too. In Perl 6, both sides of an || (or a && or a //) are evaluated in the same context as the complete expression. That means, in the example above, @last_mailing_list is evaluated in list context, so its two elements are assigned to @mailing_list, as expected.

Substitute our vector, Victor!

The next step in &load_data is to ensure that each path in @dirpath ends in a directory separator. In Perl 5, we might do that with:

    s{([^/])$}{$1/} foreach @dirpath;

but Perl 6 gives us another alternative: hyper-operators.

Normally, when an array is an operand of a unary or binary operator, it is evaluated in the scalar context imposed by the operator and yields a single result. For example, if we execute:

    $account_balance   = @credits + @debits;
    $biblical_metaphor = @sheep - @goats;

then $account_balance gets the total number of credits plus the number of debits, and $biblical_metaphor gets the numerical difference between the number of @sheep and @goats.

That's fine, but this scalar coercion also happens when the operation is in a list context:

    @account_balances   = @credits + @debits;
    @biblical_metaphors = @sheep - @goats;

Many people find it counter-intuitive that these statements each produce the same scalar result as before and then assign it as the single element of the respective lvalue arrays.

It would be more reasonable to expect these to act like:

    # Perl 5 code...
    @account_balances   =
            map { $credits[$_] + $debits[$_] } 0..max($#credits,$#debits);
    @biblical_metaphors = 
            map { $sheep[$_] - $goats[$_] } 0..max($#sheep,$#goats);

That is, to apply the operation element-by-element, pairwise along the two arrays.

Perl 6 makes that possible, though not by changing the list context behavior of the existing operators. Instead, Perl 6 provides a "vector" version of each binary operator. Each uses the same symbol as the corresponding scalar operator, but with a caret (^) dangled in front of it. Hence to get the one-to-one addition of corresponding credits and debits, and the list of differences between pairs of sheep and goats, we can write:

    @account_balances   = @credits ^+ @debits;
    @biblical_metaphors = @sheep ^- @goats;

This works for all unary and binary operators, including those that are user-defined. If the two arguments are of different lengths, the operator Does What You Mean (which, depending on the operator, might involve padding with ones, zeroes or undef's, or throwing an exception).

If one of the arguments is a scalar, that operand is replicated as many times as is necessary. For example:

    @interest = @account_balances ^* $interest_rate;

Which brings us back to the problem of appending those directory separators. The "pattern association" operator (=~) can also be vectorized by prepending a caret, so we can apply the necessary substitution to each element in the @dirpath array like this:

    @dirpath ^=~ s{([^/])$}{$1/};

(Pre)fixing those filenames

Having ensured everything is set up correctly, &load_data then processes each candidate file in turn, accumulating data as it goes:

    my %data;
    foreach my $prefix (@dirpath) {

The first step is to create the full file path, by prefixing the current directory path to the basic filename:

        my $filepath = $prefix _ $filename;

And here we see the new Perl 6 string concatenation operator: underscore. And yes, we realize it's going to take time to get used to. It may help to think of it as the old dot operator under extreme acceleration.

Underscore is still a valid identifier character, so you need to be careful about spacing it from a preceding or following identifier (just as you always have with the x or eq operators):

    # Perl 6 code                   # Meaning
    $name = getTitle _ getName;     # getTitle() . getName()
    $name = getTitle_ getName;      # getTitle_(getName())
    $name = getTitle _getName;      # getTitle(_getName())
    $name = getTitle_getName;       # getTitle_getName()

In Perl 6, there's also a unary form of _. We'll get to that a little later.

Don't break the chain

Of course, we only want to load the file's data if the file exists, is readable and writable, and isn't too big or too small (say, no less than 100 bytes and no more than a million). In Perl 5 that would be:

    if (-e $filepath  &&  -r $filepath  &&  -w $filepath  and
        100 < -s $filepath  &&  -s $filepath <= 1e6) {...

which has far too many &&'s and $filepath's for its own good.

In Perl 6, the same set of tests can be considerably abbreviated by taking advantage of two new types of operator chaining:

    if (-w -r -e $filepath  and  100 < -s $filepath <= 1e6) {...

First, the -X file test operators now all return a special object that evaluates true or false in a boolean context but is really an encapsulated stat buffer, to which subsequent file tests can be applied. So now you can put as many file tests as you like in front of a single filename or filehandle, and they must all be true for the whole expression to be true. Note that because these are really nested calls to the various file tests (i.e. -w(-r(-e($filepath)))), the series of tests is effectively evaluated in right-to-left order.

The test of the file size uses another new form of chaining that Perl 6 supports: multiway comparisons. An expression like 100 < -s $filepath <= 1e6 isn't even legal Perl 5, but it Does The Right Thing in Perl 6. More importantly, it short-circuits if the first comparison fails and will evaluate each operand only once.

Open for business

Having verified the file's suitability, we open it for reading and writing:

    my $fh = open $filepath : mode=>'rw'
        or die "Something screwy with $filepath: $!";

The : mode=>'rw' is an adverbial modifier on the open. We'll see more adverbs shortly.

The $! variable is exactly what you think it is: a container for the last system error message. It's also considerably more than you think it is, since it's also taken over the roles of $? and $@, to become the One True Error Variable.

Applied laziness 101

Contrary to earlier rumors, the "diamond" input operator is alive and well and living in Perl 6 (yes, the Perl Ministry of Truth is even now rewriting Apocalypse 2 to correct the ... err ... "printing error" ... that announced <> would be purged from the language).

So we can happily proceed to read in four lines of data:

    my ($name, $vers, $status, $costs) = <$fh>;

Now, writing something like this is a common Perl 5 mistake -- the list context imposed by the list of lvalues induces <$fh> to read the entire file, create a list of (possibly hundreds of thousands of) lines, assign the first four to the specified variables, and throw the rest away. That's rarely the desired effect.

In Perl 6, this statement works as it should. That is, it works out how many values the lvalue list is actually expecting and then reads only that many lines from the file.

Of course, if we'd written:

    my ($name, $vers, $status, $costs, @and_the_rest) = <$fh>;

then the entire file would have been read.

And now for something completely the same (well, almost)

Apart from the new sigil syntax (i.e. hashes now keep their % signs no matter what they're doing), the remainder of &load_data is exactly as it would have been if we'd written it in Perl 5.

We skip to the next file if the current file's version is wrong. Otherwise, we split the costs line into an array of whitespace-delimited values, and then save everything (including the still-open filehandle) in a nested hash within %data:

            next if $vers < $version;
            $costs = [split /\s+/, $costs];
            %data{$filepath}{qw(fh name vers stat costs)} =
                          ($fh, $name, $vers, $status, $costs);
            }
        }

Then, once we've iterated over all the directories in @dirpath, we return the accumulated data:

        return %data;
    }

The virtue of constancy

Perl 6 variables can be used as constants:

    my @StartOfFile is const = (0,0);

which is a great way to give logical names to literal values, while ensuring that those named values aren't accidentally changed in some other part of the code.

Writing it back

When the data is eventually saved, we'll be passing it to the &save_data subroutine in a hash. If we expected the hash to be a real hash variable (or a reference to one), we'd write:

    sub save_data (%data) {...

But since we want to allow for the possibility that the hash is created on the fly (e.g. from a hash-like list of values), we need to use the slurp-it-all-up list context asterisk again:

    sub save_data (*%data) {...

From each according to its ability ...

We then grab each datum for each file with the usual foreach ... values ... construct:

        foreach my $data (values %data) {

and go about saving the data to file.

Your all-in-one input supplier

Because the Perl 6 "diamond" operator can take an arbitrary expression as its argument, it's possible to set a filehandle to read an entire file and do the actual reading, all in a single statement:

    my $rest = <$data.{fh}.irs(undef)>;

The variable $data stores a reference to a hash, so to dereference it and access the 'fh' entry, we use the Perl 6 dereferencing operator (dot) and write: $data.{fh}. In practice, we could leave out the operator and just write $data{fh}, since Perl can infer from the $ sigil that we're accessing the hash through a reference held in a scalar. In fact, in Perl 6 the only place you must use an explicit . dereferencer is in a method call. But it never hurts to say exactly what you mean, and there's certainly no difference in performance if you do choose to use the dot.

The .irs(undef) method call then sets the input record separator of the filehandle (i.e. the Perl 6 equivalent of $/) to undef, causing the next read operation to return the remaining contents of the file. And because the filehandle's irs method returns its own invocant -- i.e. the filehandle reference -- the entire expression can be used within the angle brackets of the read.

A variation on this technique allows a Perl program to do a shell-like read-from-filename just as easily:

    my $next_line = <open $filename or die>;

or, indeed, to read the whole file:

    my $all_lines = < open $filename : irs=>undef >;

Seek and ye shall flatten

Having grabbed the entire file, we now rewind and truncate it, in preparation for writing it back:

    seek $data.{fh}: *@StartOfFile;
    truncate $data.{fh}: 0;

You're probably wondering what's with the asterisk ... unless you've ever tried to write:

    seek $filehandle, @where_and_whence;

in Perl 5 and gotten back the annoying "Not enough arguments for seek" exception. The problem is that seek expects three distinct scalars as arguments (as if it had a Perl 5 prototype of seek($$$)), and it's too fastidious to flatten the proffered array in order to get them.

It's handy to wrap the magical 0,0 arguments of the seek in a single array (so we no longer have to remember this particular incantation), but to use such an array in Perl 5 we would then have to write:

    seek $data->{fh}, $StartOfFile[0], $StartOfFile[1];    # Perl 5

In Perl 6 that's not a problem, because we have * -- the list context specifier. When used in an argument list, it takes whatever you give it (typically an array or hash) and flattens it. So:

    seek $data.{fh}: *@StartOfFile;                        # Perl 6

massages the single array into a list of two scalars, as seek requires.

Oh, and yes, that is the adverbial colon again. In Perl 6, seek and truncate are both methods of filehandle objects. So we can either call them as:

    $data.{fh}.seek(*@StartOfFile);
    $data.{fh}.truncate(0);
Or use the "indirect object" syntax:
    seek $data.{fh}: *@StartOfFile;
    truncate $data.{fh}: 0;
And that's where the colon comes in. Another of its many uses in Perl 6 is to separate "indirect object" arguments (e.g. filehandles) from the rest of the argument list. The main place you'll see colons guarding indirect objects is in print statements (as described in the next section).

It is written...

Finally, &save_data has everything ready and can write the four fields and the rest of the file back to disk. First, it sets the output field separator for the filehandle (i.e. the equivalent of Perl 5's $, variable) to inject newlines between elements:

    $data.{fh}.ofs("\n");
Then it prints the fields to the filehandle:
    print $data.{fh}: $data.{qw(name vers stat)}, _@{$data.{costs}}, $rest;

Note the use of the adverbial colon after $data.{fh} to separate the filehandle argument from the items to be printed. The colon is required because it's how Perl 6 eliminates the nasty ambiguity inherent in the "indirect object" syntax. In Perl 5, something like:

    print foo bar;

could conceivably mean:

    print {foo} (bar);    # Perl 5: print result of bar() to filehandle foo

or

    print ( foo(bar) );   # Perl 5: print foo() of bar() to default filehandle

or even:

    print ( bar->foo );   # Perl 5: call method foo() on object returned by
                          #         bar() and print result to default filehandle

In Perl 6, there is no confusion, because each indirect object must be followed by a colon. So in Perl 6:

    print foo bar;

can only mean:

    print ( foo(bar) );   # Perl 6: print foo() of bar() to default filehandle

and to get the other two meanings we'd have to write:

    print foo: bar;       # Perl 6: print result of bar() to filehandle foo()
                          #         (foo() not foo, since there are no
                          #          bareword filehandles in Perl 6)

and:

    print foo bar: ;      # Perl 6: call method foo() on object returned by
                          #         bar() and print result to default filehandle

In fact, the colon has an even wider range of use, as a general-purpose "adverb marker"; a notion we will explore more fully below.

String 'em up together

The printed arguments are: a hash slice:

    $data.{qw(name vers stat)},
a stringified dereferenced nested array:
     _@{$data.{costs}},
and a scalar:
    $rest;
The new hash slice syntax was explained in the previous Apocalypse/Exegesis, and the scalar is just a scalar, but what was the middle thing again?

Well, $data.{costs} is just a regular Perl 6 access to the 'costs' entry of the hash referred to by $data. That entry contains the array reference that was the result of splitting $cost in &load_data.

So to get the actual array itself, we can prefix the array reference with a @ sigil (though, technically, we don't have to: in Perl 6 arrays and array references are interchangeable in scalar context).

That gives us @{$data.{costs}}. The only remaining difficulty is that when we print the list of items produced by @{$data.{costs}}, they are subject to the output field separator. Which we just set to newline.

But what we want is for them to appear on the same line, with a space between each.

Well ... evaluating a list in a string context does precisely that, so we could just write:

    "@{$data.{costs}}"    # evaluate array in string context
But Perl 6 has another alternative to offer us -- the unary underscore operator. Binary underscore is string concatenation, so it shouldn't be too surprising that unary underscore is the stringification operator (think: concatenation with a null string). Prefixing any expression with an underscore forces it to be evaluated in string context:
    _@{$data.{costs}}    # evaluate array in string context
Which, in this case, conveniently inserts the required spaces between the elements of the costs array.
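For reference, Perl 5 gets the same spacing either from interpolation (via the $" list separator, a space by default) or from an explicit join:

    # Perl 5: equivalent stringifications of the costs array
    my $costs = "@{$data->{costs}}";          # interpolation uses $"
    my $same  = join ' ', @{$data->{costs}};  # or spell it out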

A parameter by any other name

Now that the I/O is organized, we can get down to the actual processing. First, we load the data:
    my %data = load_data(filename=>'weblog', version=>1);
Note that we're using named arguments here. This attempt would blow up badly in Perl 5, because we didn't set &load_data up to expect a hash-like list of arguments. But it works fine in Perl 6 for two reasons:
  1. Because we did set up &load_data with named parameters; and
  2. Because the => operator isn't in Kansas anymore.
In Perl 5, => is just an up-market comma with a single minor talent: It stringifies its left operand if that operand is a bareword.

In Perl 6, => is a fully-fledged anonymous object constructor -- like [...] and {...}. The objects it constructs are called "pairs" and they consist of a key (the left operand of the =>), and a value (the right operand). The key is still stringified if it's a valid identifier, but both the key and the value can be any kind of Perl data structure. They are accessed via the pair object's key and value methods:

    my $pair_ref = [1..9] => "digit";
    print $pair_ref.value;      # prints "digit"
    print $pair_ref.key.[3];    # prints 4
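(Nothing stops the impatient from mocking up a pair in Perl 5 today -- a minimal sketch, with the class name Pair being purely our invention:)

    # Perl 5 sketch: a pair as a blessed two-element array
    package Pair;
    sub new   { my ($class, $key, $value) = @_; bless [$key, $value], $class }
    sub key   { $_[0][0] }
    sub value { $_[0][1] }
    package main;

    my $pair_ref = Pair->new([1..9], "digit");
    print $pair_ref->value;       # prints "digit"
    print $pair_ref->key->[3];    # prints 4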
So, rather than getting four arguments:
    load_data('filename', 'weblog', 'version', 1);    # Perl 5 semantics
&load_data gets just two arguments, each of which is a reference to a pair:
    load_data( $pair_ref1, $pair_ref2);               # Perl 6 semantics
When the subroutine dispatch mechanism detects one or more pairs as arguments to a subroutine with named parameters, it examines the keys of the pairs and binds their values to the correspondingly named parameters -- no matter what order the paired arguments originally appeared in. Any remaining non-pair arguments are then bound to the remaining parameters in left-to-right order.

So we could call &load_data in any of the following ways:

    load_data(filename=>'weblog', version=>1);  # named
    load_data(version=>1, filename=>'weblog');  # named (order doesn't matter)
    load_data('weblog', 1);                     # positional (order matters)
There are numerous other uses for pairs, one of which we'll see shortly.
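Perl 5 programmers, meanwhile, will keep faking named arguments by flattening @_ into a hash -- workable, but with no help from the compiler if a key is misspelled:

    # Perl 5: emulating named arguments with a hash
    sub load_data {
        my %args = (version => 1, @_);   # default version, caller overrides
        my $file = $args{filename};
        # ... open and read $file here ...
    }
    load_data(filename => 'weblog', version => 1);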

Please queue for processing

Having loaded the data, we go into a loop and iterate over each file's information. First, we announce the file and its internal name:
    foreach my $file (keys %data) {
        print "$file contains data on %data{$file}{name}\n";

The Xor-twist

Then we toggle the "is active" status bit (the eighth bit) for each file. To flip that single bit without changing any of the other status bits, we bitwise-xor the status bitset against the bitstring 0000000010000000. Each bit xor'd against a zero stays as it is (0 xor 0 --> 0; 1 xor 0 --> 1), while xor'ing the eighth bit against 1 complements it (0 xor 1 --> 1; 1 xor 1 --> 0).
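In present-day Perl 5, of course, that toggle is written with the caret (assuming here that the status is held as a number):

    # Perl 5: flip only the eighth bit of a numeric status value
    my $status        = 0b00000001;      # some existing status bits
    my $is_active_bit = 0b10000000;      # 0b literals: Perl 5.6 and later
    $status ^= $is_active_bit;           # every other bit passes through unchanged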

But because the caret has been appropriated as the Perl 6 hyper-operator prefix, it will no longer be used as bitwise xor. Instead, binary tilde will be used:

    %data{$file}{stat} = %data{$file}{stat} ~ $is_active_bit;
This is actually an improvement in syntactic consistency, since bitwise xor (now binary ~) and bitwise complement (still unary ~) are mathematically related: unary ~x is just -1 ~ x (xor'ing against a value whose bits are all 1 flips every bit).

Note that we could have used the assignment variant of binary ~:

    %data{$file}{stat} ~= $is_active_bit;     # flip only bit 8 of status bitset
but that's probably best avoided due to its confusability with the much commoner "pattern association" operator:
    %data{$file}{stat} =~ $is_active_bit;     # match if status bitset is "128"
By the way, there is also a high precedence logical xor operator in Perl 6. You guessed it: ~~. This finally fills the strange gap in Perl's logical operator set:
        Binary (low) | Binary (high) |    Bitwise
       ______________|_______________|_____________
                     |               |
            or       |      ||       |      |
                     |               |
            and      |      &&       |      &
                     |               |
            xor      |      ~~       |      ~
                     |               |

And it will also help to reduce programmer stress by allowing us to write:

    $make = $money ~~ $fast;
instead of (the clearly over-excited):
    $make = !$money != !$fast;

Bound for glory

In both Perl 5 and 6, it's possible to create an alias for a variable. For example, the subroutine:
    sub increment { $_[0]++ }           # Perl 5
    sub increment { @_[0]++ }           # Perl 6
works because the elements of @_ become aliases for whatever variable is passed as their corresponding argument. Similarly, one can use a for to implement a Pascal-ish with:
    for my $age ( $person[$n]{data}{personal}{time_dependent}{age} ) {
        if    ($age < 12) { print "Child" }
        elsif ($age < 18) { print "Adolescent" }
        elsif ($age < 25) { print "Junior citizen" }
        elsif ($age < 65) { print "Citizen" }
        else              { print "Senior citizen" }
    }
Perl 6 provides a more direct mechanism for aliasing one variable to another in this way: the := (or "binding") operator. For example, we could rewrite the previous example like so in Perl 6:
    my $age := $person[$n]{data}{personal}{time_dependent}{age};
    if    ($age < 12) { print "Child" }
    elsif ($age < 18) { print "Adolescent" }
    elsif ($age < 25) { print "Junior citizen" }
    elsif ($age < 65) { print "Citizen" }
    else              { print "Senior citizen" }
Bound aliases are particularly useful for temporarily giving a conveniently short identifier to a variable with a long or complex name. Scalars, arrays, hashes and even subroutines may all be given less sesquipedalian names in this way:
        my   @list := @They::never::would::be::missed::No_never_would_be_missed;
        our  %plan := %{$planning.[$planner].{planned}.[$planet]};
        temp &rule := &FulfilMyGrandMegalomanicalDestinyBwahHaHaHaaaa;
In our example program, we use aliasing to avoid having to write @%data{$file}{costs} everywhere:
    my @costs := @%data{$file}{costs};
An important feature of the binding operator is that the lvalue (or lvalues) on the left side form a context specification for the rvalue (or rvalues) on the right side. It's as if the lvalues were the parameters of an invisible subroutine, and the rvalues were the corresponding arguments being passed to it. So, for example, we could also have written:
    my @costs := %data{$file}{costs};
(i.e. without the @ dereferencer) because the lvalue expects an array as the corresponding rvalue, so Perl 6 automatically dereferences the array reference in %data{$file}{costs} to provide that.

More interestingly, if we have both lvalue and rvalue lists, then each of the rvalues is evaluated in the context specified by its corresponding lvalue. For example:

    (@x, @y) := (@a, @b);
aliases @x to @a, and @y to @b, because @'s on the left act like @ parameters, which require -- and bind to -- an unflattened array as their corresponding argument. Likewise:
    ($x, %y, @z) := (1, {b=>2}, %c{list});
binds $x to the value 1 (i.e. $x becomes a constant), %y to the anonymous hash constructed by {b=>2}, and @z to the array referred to by %c{list}. In other words, it's the same set of bindings we'd see if we wrote:
    sub foo($x, %y, @z) {...}
    foo(1, {b=>2}, %c{list});
except that the := binding takes effect in the current scope.

And because := works that way, we can also use the flattening operator (unary *) on either side of such bindings. For example:

    (@x, *@y) := (@a, $b, @c, %d);
aliases @x to @a, and causes @y to bind to the remainder of the lvalues -- by flattening out $b, @c, and %d into a list and then slurping up all their components together.

Note that @y is still an alias for those various slurped components. So @y[0] is an alias for $b, @y[1..@c.length] are aliases for the elements of @c, and the remaining elements of @y are aliases for the interlaced keys and values of %d.

When the star is on the other side of the binding, as in:

    ($x, $y) := (*@a);
then @a is flattened before it is bound, so $x becomes an alias for @a[0] and $y becomes an alias for @a[1].

The binding operator will have many uses in Perl 6 (most of which we probably haven't even thought of yet), but one of the commonest will almost certainly be as an easy way to swap two arrays efficiently:

    (@x, @y) := (@y, @x);
Yet another way to think about the binding operator is to consider it as a sanitized version of those dreaded Perl 5 typeglob assignments. That is:
    $age := $person[$n]{data}{personal}{time_dependent}{age};
is the same as Perl 5's:
    *age = \$person->[$n]{data}{personal}{time_dependent}{age};
except that it also works if $age is declared as a lexical.

Oh, and binding is much safer than typeglobbing was, because it explicitly requires that $person[$n]{data}{personal}{time_dependent}{age} evaluate to a scalar, whereas the Perl 5 typeglob version would happily (and silently!) replace @age, %age, or even &age if the rvalue happened to produce a reference to an array, hash, or subroutine instead of a scalar.

Better living through sigils

We should also note that the binding of the @costs array:
    my @costs := @%data{$file}{costs};
shows yet another case where Perl 6's sigil semantics are much DWIM-mier than those of Perl 5.

In Perl 5 we would probably have written that as:

        local *costs = \ @$data{$file}{costs};
and then spent some considerable time puzzling out why it wasn't working, before realising that we'd actually meant:
        local *costs = \ @{$data{$file}{costs}};
instead.

That's because, in Perl 5, the precedence of a hash key is relatively low, so:

    @$data{$file}{costs}    # means: @{$data}{$file}{costs}
                            # i.e. (invalid attempt to) access the 'costs'
                            # key of a one-element slice of the hash
                            # referred to by $data
                            # problem is: slices don't have hash keys
whereas:
    @{$data{$file}{costs}}  # means: @{ $data{$file}{costs} }
                            # i.e. dereference of array referred to by
                            # $data{$file}{costs}
The problem simply doesn't arise in Perl 6, where the two would be written quite distinctly, as:
    %data{@($file)}{costs}  # means: (%data{@($file)}).{costs}
                            # (still an error in Perl 6)
and:
    @%data{$file}{costs}    # means: @{ %data{$file}{costs} }
                            # i.e. dereference of array referred to by
                            # %data{$file}{costs}
respectively.

That's not a number...now that's a number!

One of the perennial problems with Perl 5 is how to read in a number. Or rather, how to read in a string...and then be sure that it contains a valid number. Currently, most people read in the string and then either just assume it's a number (optimism) or use the regexes found in perlfaq4 or Regexp::Common to make sure (cynicism).
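The cynical Perl 5 version looks something like this (a simplified cousin of the perlfaq4 pattern, accepting plain decimal numbers only):

    # Perl 5: verify that the input string looks like a decimal number
    chomp(my $input = <STDIN>);
    if ($input =~ /^\s*[-+]?(?:\d+\.?\d*|\.\d+)\s*$/) {
        my $number = $input + 0;         # safe to numerify now
    }
    else {
        warn "Not a number: '$input'\n";
    }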

Perl 6 offers a simpler, built-in mechanism.

Just as the unary version of binary underscore (_) is Perl 6's explicit stringification specifier, so too the unary version of binary plus is Perl 6's explicit numerifier. That is, prefixing an expression with unary + evaluates that expression in a numeric context. Furthermore, if the expression has to be coerced from a string and the string does not begin with a valid number, the numerification returns NaN, the not-a-number value.

That makes it particularly easy to read in numeric data reliably:

    my $inflation;
    print "Inflation rate: " and $inflation = +<>
        until $inflation != NaN;
The unary + takes the string returned by <> and converts it to a number. Or, if the string can't be interpreted as a number, + returns NaN. Then we just go back and try again until we do get a valid number.

Note that these new semantics for unary + are a little different from its role in Perl 5, where it is just the identity operator. In Perl 5 it's occasionally used to disambiguate constructs like:

    print  ($x + $y) * $z;        # in Perl 5 means: ( print($x+$y) ) * $z;
    print +($x + $y) * $z;        # in Perl 5 means: print( ($x+$y) * $z );
To get the same effect in Perl 6, we'd use the adverbial colon instead:
    print   ($x + $y) * $z;        # in Perl 6 means: ( print($x+$y) ) * $z;
    print : ($x + $y) * $z;        # in Perl 6 means: print( ($x+$y) * $z );

Schwartzian pairs

Another handy use for pairs is as a natural data structure for implementing the Schwartzian Transform. This caching technique is used when sorting a large list of values according to some expensive function on those values. Rather than writing:
    my @sorted = sort { expensive($a) <=> expensive($b) } @unsorted;
and recomputing the same expensive function every time each value is compared during the sort, we can precompute the function on each value once. We then pass both the original value and its computed value to sort, use the computed value as the key on which to sort the list, but then return the original value as the result. Like this:
    my @sorted =                        # step 4: store sorted originals
        map  { $_.[0] }                 # step 3: extract original
        sort { $a.[1] <=> $b.[1] }      # step 2: sort on computed
        map  { [$_, expensive($_) ] }   # step 1: cache original and computed
            @unsorted;                  # step 0: take originals
The use of arrays can make such transforms hard to read (and to maintain), so people sometimes use hashes instead:
    my @sorted =                        
        map  { $_.{original} }             
        sort { $a.{computed} <=> $b.{computed} } 
        map  { {original=>$_, computed=>expensive($_)} }   
            @unsorted;
That improves the readability, but at the expense of performance. Pairs are an ideal way to get the readability of hashes but with (probably) even better performance than arrays:
    my @sorted =                        
        map  { $_.value }             
        sort { $a.key <=> $b.key }  
        map  { expensive($_) => $_ }     
            @unsorted;
Or in the case of our example program:
    @costs = map  { $_.value }
             sort { $a.key <=> $b.key }
             map  { amortize($_) => $_ }
                 @costs ^* $inflation;
Note that we also used a hyper-multiplication (^*) to multiply each cost individually by the rate of inflation before sorting them. That's equivalent to writing:
    @costs = map  { $_.value }
             sort { $a.key <=> $b.key }
             map  { amortize($_) => $_ }
             map  { $_ * $inflation }
                 @costs;
but spares us from the burden of yet another map.

More importantly, because @costs is an alias for @%data{$file}{costs}, when we assign the sorted list back to @costs, we're actually assigning it back into the appropriate sub-entry of %data.

The ∑ of all our fears

Perl 6 will probably have a built-in sum operator, but we might still prefer to build our own for a couple of reasons. Firstly, sum is obviously far too long a name for so fundamental an operation; it really should be ∑. Secondly, we may want to extend the basic summation functionality somehow -- for instance, by allowing the user to specify a filter and only summing those arguments that the filter lets through.

Perl 6 allows us to create our own operators. Their names can be any combination of characters from the Unicode set. So it's relatively easy to build ourselves a ∑ operator:

    my sub operator:∑ is prec(\&operator:+($)) (*@list) {
        reduce {$^a+$^b} @list;
    }
We declare the operator as a lexically scoped subroutine. The lexical scoping eases the syntactic burden on the parser, the semantic burden on other unrelated parts of the code, and the cognitive burden on the programmer.

The operator subroutine's name is always operator:whatever_symbols_we_want. In this case, that's operator:∑, but it can be any sequence of Unicode characters, including alphanumerics:

        my sub operator:*#@& is prec(\&operator:\)  (STR $x) {
                return "darn $x";
        }
        my sub operator:† is prec(\&CORE::kill)  (*@tIHoH) {
                kill(9, @tIHoH) == @tIHoH or die "batlhHa'";
                return "Qapla!";
        }
        my sub operator:EQ is prec(\&operator:eq)  ($a, $b) {
                return $a eq $b                 # stringishly equal strings
                    || $a == $b != NaN;         # numerically equal numbers
        }
        # and then:
        warn *#@& "QeH!" unless † $puq EQ "Qapla!";

Did you notice that cunning $a == $b != NaN test in operator:EQ? This lovely Perl 6 idiom solves the problem of numerical comparisons between non-numeric strings.

In Perl 5, a comparison like:

        $a = "a string";
        $b = "another string";
        print "huh?" if $a == $b;

will unexpectedly succeed (and silently too, if you run without warnings), because the non-numeric values of both the scalars are converted to zero in the numeric context of the ==.

But in Perl 6, non-numeric strings numerify to NaN. So, using Perl 6's multiway comparison feature, we can add an extra != NaN to the equality test to ensure that we compared genuine numbers.

Meanwhile, we also have to specify a precedence for each new operator we define. We do that with the is prec trait of the subroutine. The precedence is specified in terms of the precedence of some existing operator; in this case, in terms of Perl's built-in unary +:

    my sub operator:∑ is prec( \&operator:+($) )
To do this, we give the is prec trait a reference to the existing operator. Note that, because there are two overloaded forms of operator:+ (unary and binary) of different precedences, to get the reference to the correct one we need to specify its complete signature (its name and parameter types) as part of the enreferencing operation. The ability to take references to signatures is a standard feature in Perl 6, since ordinary subroutines can also be overloaded, and may need the same kind of disambiguation when enreferenced.

If the operator had been binary, we might also have had to specify its associativity (left, right, or non), using the is assoc trait.

Note too that we specified the parameter of operator:∑ with a flattening asterisk, since we want @list to slurp up any series of values passed to it, rather than being restricted to accepting only actual array variables as arguments.

The implementation of operator:∑ is very simple: we just apply the built-in reduce function to the list, reducing each successive pair of elements by adding them.
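Perl 5 users need not feel left out here: the List::Util module on CPAN already supplies a reduce that sums a list the same way:

    # Perl 5: summation by pairwise reduction
    use List::Util qw(reduce);
    my $sum = reduce { $a + $b } @list;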

Note that we used a higher-order function to specify the addition operation. Larry has decided that the syntax for higher-order functions requires that implicit parameters be specified with a $^ sigil (or @^ or %^, as appropriate) and that the whole expression be enclosed in braces.

So now we have a ∑ operator:

    $result = ∑ $wins, $losses, $ties;
but it doesn't yet provide a way to filter its values. Normally, that would present a difficulty with an operator like ∑, whose *@list argument will gobble up every argument we give it, leaving no way -- except convention -- to distinguish the filter from the data.

But Perl 6 allows any subroutine -- not just built-ins like print -- to take one or more "adverbs" in addition to its normal arguments. This provides a second channel by which to transmit information to a subroutine. Typically that information will be used to modify the behaviour of the subroutine (hence the name "adverb"). And that's exactly what we need in order to pass a filter to ∑.

A subroutine's adverbs are specified as part of its normal parameter list, but separated from its regular parameters by a colon:

    my sub operator:∑ is prec(\&operator:+($)) ( *@list : $filter //= undef) {...
This specifies that operator:∑ can take a single scalar adverb, which is bound to the parameter $filter. When there is no adverb specified in the call, $filter is default-assigned the value undef.

We then modify the body of the subroutine to pre-filter the list through a grep, but only if a filter is provided:

        reduce {$^a+$^b}  ($filter ?? grep &$filter, @list :: @list);
    }
The ?? and :: are the new way we write the old ?: ternary operator in Perl 6. Larry had to change the spelling because he needed the single colon for marking adverbs. But it's a change for the better anyway -- it was rather odd that all the other short-circuiting logical operators (&& and || and //) used doubled symbols, but the conditional operator didn't. Well, now it does. The doubling also helps it stand out better in code, in part because it forces you to put space around the :: so that it's not confused with a package name separator.

You might also be wondering about the ambiguity of ??, which in Perl 5 already represents an empty regular expression with question-mark delimiters. Fortunately, Perl 6 won't be riddled with the nasty ?...? regex construct, so there's no ambiguity at all.

Adverbial semantics can be defined for any Perl 6 subroutine. For example:

    sub mean (*@values : $type //= 'arithmetic') {
        given ($type) {
            when 'arithmetic': { return sum(@values) / @values; }
            when 'geometric':  { return product(@values) ** (1/@values) }
            when 'harmonic':   { return @values / sum( @values ^** -1 ) }
            when 'quadratic':  { return (sum(@values ^** 2) / @values) ** 0.5 }
        }
        croak "Unknown type of mean: '$type'";
    }
Adverbs will probably become widely used for passing this type of "out-of-band" behavioural modifier to subroutines that take an unspecified number of data arguments.
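Until then, Perl 5 code typically smuggles such modifiers in-band, often as a leading hashref of options -- a sketch only, with the type key being merely a convention of ours:

    # Perl 5 sketch: an options hashref standing in for an adverb
    sub mean {
        my %opt  = ref $_[0] eq 'HASH' ? %{ shift @_ } : ();
        my $type = $opt{type} || 'arithmetic';
        my $sum  = 0;
        $sum += $_ for @_;
        return $sum / @_ if $type eq 'arithmetic';
        # (geometric, harmonic, quadratic elided)
        die "Unknown type of mean: '$type'";
    }

    print mean({ type => 'arithmetic' }, 1, 2, 3);    # prints 2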

Would you like an adverb with that?

OK, so now our operator can take a modifying filter. How exactly do we pass that filter to it?

As described earlier, the colon is used to introduce adverbial arguments into the argument list of a subroutine or operator. So to do a normal summation we write:

    $sum = ∑ @costs;
whilst to do a filtered summation we place the filter after a colon at the end of the regular argument list:
    $sum = ∑ @costs : sub {$_ >= 1000};
or, more elegantly, using a higher-order function:
    $sum = ∑ @costs : {$^_ >= 1000};
Any arguments after the colon are bound to the parameters specified by the subroutine's adverbial parameter list.

Note that the example also demonstrates that we can interpolate the results of the various summations directly into output strings. We do this using Perl 6's scalar interpolation mechanism ($(...)), like so:

    print "Total expenditure: $( ∑ @costs )\n";
    print "Major expenditure: $( ∑ @costs : {$^_ >= 1000} )\n";
    print "Minor expenditure: $( ∑ @costs : {$^_ < 1000} )\n";

The odd lazy step

Finally (and only because we can), we print out a list of every second element of @costs. There are numerous ways to do that in Perl 6, but the cutest is to use a lazy, infinite, stepped list of indices in a regular slicing operation.

In Perl 6, any list of values created with the .. operator is created lazily. That is, the .. operator doesn't actually build a list of all the values in the specified range, it creates an array object that knows the boundaries of the range and can interpolate (and then cache) any given value when it's actually needed. That's useful, because it greatly speeds up the creation of a list like (1..Inf).

Inf is Perl 6's standard numerical infinity value, so a list that runs to Inf takes ... well ... forever to actually build. But writing 1..Inf is OK in Perl 6, since the elements of the resulting list are only ever computed on demand. Of course, if you were to print(1..Inf), you'd have plenty of time to go and get a cup of coffee. And even then (given the comparatively imminent heat death of the universe) that coffee would be really cold before the output was complete. So there will probably be a warning when you try to do that.

But to get an infinite list of odd indices, we don't want every number between 1 and infinity; we want every second number. Fortunately, Perl 6's .. operator can take an adverb that specifies a "step-size" between the elements in the resulting list. So if we write (1..Inf : 2), we get (1,3,5,7,...). Using that list, we can extract the oddly indexed elements of an array of any size (e.g. @costs) with an ordinary array slice:

    print @costs[1..Inf:2]
You might have expected another one of those "maximal-entropy coffee" delays whilst print patiently outputs the infinite number of undef's that theoretically exist after @costs' last element, but slices involving infinite lists avoid that problem by returning only those elements that actually exist in the list being sliced. That is, instead of iterating the requested indices in a manner analogous to:
    sub slice is lvalue (@array, *@wanted_indices) {
        my @slice;
        foreach $wanted_index ( @wanted_indices ) {
            @slice[+@slice] := @array[$wanted_index];
        }
        return @slice;
    }
infinite slices iterate the available indices:
    sub slice is lvalue (@array, *@wanted_indices) {
        my @slice;
        foreach $actual_index ( 0..@array.last ) {
            @slice[+@slice] := @array[$actual_index]
                if any(@wanted_indices) == $actual_index;
        }
        return @slice;
    }
(Obviously, it's actually far more complicated -- and lazy -- than that. It has to preserve the original ordering of the wanted indexes, as well as cope with complex cases like infinite slices of infinite lists. But from the programmer's point of view, it all just DWYMs).

By the way, binding selected array elements to the elements of another array (as in: @slice[+@slice] := @array[$actual_index]), and then returning the bound array as an lvalue, is a neat Perl 6 idiom for recreating any kind of slice-like semantics with user-defined subroutines.
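For the record, the Perl 5 rendering of that odd-index slice is finite but perfectly serviceable:

    # Perl 5: every second element, via a grep-built index list
    print @costs[ grep { $_ % 2 } 0 .. $#costs ];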

Take that! And that!

And so, lastly, we save the data back to disk:
    save_data(%data, log => {name=>'metalog', vers=>1, costs=>[], stat=>0});
Note that we're passing in both a hash and a pair, but that these still get correctly folded into &save_data's single hash parameter, courtesy of the flattening asterisk on the parameter definition:
    sub save_data (*%data) {...

In a nutshell...

It's okay if your head is spinning at this point.

We just crammed a huge number of syntactic and semantic changes into a comparatively small piece of example code. The changes may seem overwhelming, but that's because we've been concentrating on only the changes. Most of the syntax and semantics of Perl's operators don't change at all in Perl 6.

So, to conclude, here's a summary of what's new, what's different, and (most of all) what stays the same.

Unchanged operators

  • prefix and postfix ++ and --
  • unary !, ~, \, and -
  • binary **
  • binary =~ and !~
  • binary *, /, and %
  • binary + and -
  • binary << and >>
  • binary & and |
  • binary =, +=, -=, *=, etc.
  • binary ,
  • unary not
  • binary and, or, and xor

Changes to existing operators

  • binary -> (dereference) becomes .
  • binary . (concatenate) becomes _
  • unary + (identity) now enforces numeric context on its argument
  • binary ^ (bitwise xor) becomes ~
  • binary => becomes the "pair" constructor
  • ternary ? : bbeeccoommeess ?? ::

Enhancements to existing operators

  • binary .. becomes even lazier
  • binary <, >, lt, gt, ==, !=, etc. become chainable
  • Unary -r, -w, -x, etc. are nestable
  • The <> input operator becomes more context-aware
  • The logical && and || operators propagate their context to both their operands
  • The x repetition operator no longer requires listifying parentheses on its left argument in a list context.

New operators

  • unary _ is the explicit string context enforcer
  • binary ~~ is high-precedence logical xor
  • unary * is a list context specifier for parameters and an array-flattening operator for arguments
  • unary ^ is a meta-operator for specifying vector operations
  • binary := is used to create aliased variables (a.k.a. binding)
  • binary // is the logical 'default' operator

Apocalypse 3

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 03 for the latest information.

Table of Contents

RFC 025: Operators: Multiway comparisons

RFC 320: Allow grouping of -X file tests and add filetest builtin

RFC 290: Better english names for -X

RFC 283: tr/// in array context should return a histogram

RFC 084: Replace => (stringifying comma) with => (pair constructor)

RFC 081: Lazily evaluated list generation functions

RFC 285: Lazy Input / Context-sensitive Input

RFC 082: Arrays: Apply operators element-wise in a list context

RFC 045: || and && should propagate result context to both sides

RFC 054: Operators: Polymorphic comparisons

RFC 104: Backtracking

RFC 143: Case ignoring eq and cmp operators

RFC 170: Generalize =~ to a special ``apply-to'' assignment operator

Non-RFC considerations

Binary . (dot)
Unary . (dot)
Binary _
Unary _
Unary +
Binary :=
Unary *
List context
Binary :
Trinary ??::
Binary //
Binary ;
Unary ^
Unary ?
Binary ?
Binary ~
Binary ~~
User defined operators
Unicode operators
Precedence

To me, one of the most agonizing aspects of language design is coming up with a useful system of operators. To other language designers, this may seem like a silly thing to agonize over. After all, you can view all operators as mere syntactic sugar -- operators are just funny looking function calls. Some languages make a feature of leveling all function calls into one syntax. As a result, the so-called functional languages tend to wear out your parenthesis keys, while OO languages tend to wear out your dot key.

But while your computer really likes it when everything looks the same, most people don't think like computers. People prefer different things to look different. They also prefer to have shortcuts for common tasks. (Even the mathematicians don't go for complete orthogonality. Many of the shortcuts we typically use for operators were, in fact, invented by mathematicians in the first place.)

So let me enumerate some of the principles that I weigh against each other when designing a system of operators.

  • Different classes of operators should look different. That's why filetest operators look different from string or numeric operators.
  • Similar classes of operators should look similar. That's why the filetest operators look like each other.
  • Common operations should be ``Huffman coded.'' That is, frequently used operators should be shorter than infrequently used ones. For how often it's used, the scalar operator of Perl 5 is too long, in my estimation.
  • Preserving your culture is important. So Perl borrowed many of its operators from other familiar languages. For instance, we used Fortran's ** operator for exponentiation. As we go on to Perl 6, most of the operators will be ``borrowed'' directly from Perl 5.
  • Breaking out of your culture is also important, because that is how we understand other cultures. As an explicitly multicultural language, Perl has generally done OK in this area, though we can always do better. Examples of cross-cultural exchange among computer cultures include XML and Unicode. (Not surprisingly, these features also enable better cross-cultural exchange among human cultures -- we sincerely hope.)
  • Sometimes operators should respond to their context. Perl has many operators that do different but related things in scalar versus list context.
  • Sometimes operators should propagate context to their arguments. The x operator currently does this for its left argument, while the short-circuit operators do this for their right argument.
  • Sometimes operators should force context on their arguments. Historically, the scalar mathematical operators of Perl have forced scalar context on their arguments. One of the RFCs discussed below proposes to revise this.
  • Sometimes operators should respond polymorphically to the types of their arguments. Method calls and overloading work this way.
  • Operator precedence should be designed to minimize the need for parentheses. You can think of the precedence of operators as a partial ordering of the operators such that it minimizes the number of ``unnatural'' pairings that require parentheses in typical code.
  • Operator precedence should be as simple as possible. Perl's precedence table currently has 24 levels in it. This might or might not be too many. We could probably reduce it to about 18 levels, if we abandon strict C compatibility of the C-like operators.
  • People don't actually want to think about precedence much, so precedence should be designed to match expectations. Unfortunately, the expectations of someone who knows the precedence table won't match the expectations of someone who doesn't. And Perl has always catered to the expectations of C programmers, at least up till now. There's not much one can do up front about differing cultural expectations.

It would be easy to drive any one of these principles into the ground, at the expense of other principles. In fact, various languages have done precisely that.

My overriding design principle has always been that the complexity of the solution space should map well onto the complexity of the problem space. Simplification good! Oversimplification bad! Placing artificial constraints on the solution space produces an impedance mismatch with the problem space, with the result that using a language that is artificially simple induces artificial complexity in all solutions written in that language.

One artificial constraint that all computer languages must deal with is the number of symbols available on the keyboard, corresponding roughly to the number of symbols in ASCII. Most computer languages have compensated by defining systems of operators that include digraphs, trigraphs, and worse. This works pretty well, up to a point. But it means that certain common unary operators cannot be used as the end of a digraph operator. Early versions of C had assignment operators in the wrong order. For instance, there used to be a =- operator. Nowadays that's spelled -=, to avoid conflict with unary minus.

By the same token (no pun intended), you can't easily define a unary = operator without requiring a space before it most of the time, since so many binary operators end with the = character.

Perl gets around some of these problems by keeping track of whether it is expecting an operator or a term. As it happens, a unary operator is simply one that occurs when Perl is expecting a term. So Perl could keep track of a unary = operator, even if the human programmer might be confused. So I'd place a unary = operator in the category of ``OK, but don't use it for anything that will cause widespread confusion.'' Mind you, I'm not proposing a specific use for a unary = at this point. I'm just telling you how I think. If we ever do get a unary = operator, we will hopefully have taken these issues into account.

While we can disambiguate operators based on whether an operator or a term is expected, this implies some syntactic constraints as well. For instance, you can't use the same symbol for both a postfix operator and a binary operator. So you'll never see a binary ++ operator in Perl, because Perl wouldn't know whether to expect a term or operator after that. It also implies that we can't use the ``juxtaposition'' operator. That is, you can't just put two terms next to each other, and expect something to happen (such as string concatenation, as in awk). What if the second term started with something that looked like an operator? It would be misconstrued as a binary operator.

Well, enough of these vague generalities. On to the vague specifics.

The RFCs for this apocalypse are (as usual) all over the map, but don't cover the map. I'll talk first about what the RFCs do cover, and then about what they don't. Here are the RFCs that happened to get themselves classified into chapter 3:

    RFC   PSA    Title
    ---   ---    -----
    024   rr     Data types: Semi-finite (lazy) lists
    025   dba    Operators: Multiway comparisons
    039   rr     Perl should have a print operator 
    045   bbb    C<||> and C<&&> should propagate result context to both sides
    054   cdr    Operators: Polymorphic comparisons
    081   abc    Lazily evaluated list generation functions
    082   abc    Arrays: Apply operators element-wise in a list context
    084   abb    Replace => (stringifying comma) with => (pair constructor)
    104   ccr    Backtracking
    138   rr     Eliminate =~ operator.
    143   dcr    Case ignoring eq and cmp operators
    170   ccr    Generalize =~ to a special "apply-to" assignment operator
    283   ccc    C<tr///> in array context should return a histogram
    285   acb    Lazy Input / Context-sensitive Input
    290   bbc    Better english names for -X
    320   ccc    Allow grouping of -X file tests and add C<filetest> builtin

Note that you can click on the following RFC titles to view a copy of the RFC in question. The discussion sometimes assumes that you've read the RFC.

RFC 025: Operators: Multiway comparisons

This RFC proposes that expressions involving multiple chained comparisons should act the way a mathematician would expect. That is, if you say this:

    0 <= $x < 10

it really means something like:

    0 <= $x && $x < 10

The $x would only be evaluated once, however. (This is very much like the rewrite rule we use to explain assignment operators such as $x += 3.)

I started with this RFC simply because it's not of any earthshaking importance whether I accept it or not. The tradeoff is whether to put some slight complexity into the grammar in order to save some slight complexity in some Perl programs. The complexity in the grammar is not much of a problem here, since it's amortized over all possible uses of it, and it already matches the known psychology of a great number of people.

There is a potential interaction with precedence levels, however. If we choose to allow an expression like:

    0 <= $x == $y < 20

then we'll have to unify the precedence levels of the comparison operators with the equality operators. I don't see a great problem with this, since the main reason for having them different was (I believe) so that you could write an exclusive of two comparisons, like this:

    $x < 10 != $y < 10

However, Perl has a built-in xor operator, so this isn't really much of an issue. And there's a lot to be said for forcing parentheses in that last expression anyway, just for clarity. So unless anyone comes up with a large objection that I'm not seeing, this RFC is accepted.

RFC 320: Allow grouping of -X file tests and add filetest builtin

This RFC proposes to allow clustering of file test operators much like some Unix utilities allow bundling of single character switches. That is, if you say:

    -drwx $file

it really means something like:

    -d $file && -r $file && -w $file && -x $file

Unfortunately, as proposed, this syntax will simply be too confusing. We have to be able to negate named operators and subroutines. The proposed workaround of putting a space after a unary minus is much too onerous and counterintuitive, or at least countercultural.

The only way to rescue the proposal would be to say that such operators are autoloaded in some fashion; any negated but unrecognized operator would then be assumed to be a clustered filetest. This would be risky in that it would prevent Perl from catching misspelled subroutine names at compile time when negated, and the error might well not get caught at run time either, if all the characters in the name are valid filetests, and if the argument can be interpreted as a filename or filehandle (which it usually can be). Perhaps it would be naturally disallowed under use strict, since we'd basically be treating -xyz as a bareword. On the other hand, in Perl 5, all method names are essentially in the unrecognized category until run time, so it would be impossible to tell whether to parse the minus sign as a real negation. Optional type declarations in Perl 6 would only help the compiler with variables that are actually declared to have a type. Fortunately, a negated 1 is still true, so even if we parsed the negation as a real negation, it might still end up doing the right thing. But it's all very tacky.

So I'm thinking of a different tack. Instead of bundling the letters:

    -drwx $file

let's think about the trick of returning the value of $file for a true value. Then we'd write nested unary operators like this:

    -d -r -w -x $file

One tricky thing about that is that the operators are applied right to left. And they don't really short circuit the way stacked && would (though the optimizer could probably fix that). So I expect we could do this for the default, and if you want the -drwx as an autoloaded backstop, you can explicitly declare that.

In any event, the proposed filetest built-in need not be built in. It can just be a universal method. (Or maybe just common to strings and filehandles?)

My one hesitation in making cascading operators work like that is that people might be tempted to get cute with the returned filename:

    $handle = open -r -w -x $file or die;

That might be terribly confusing to a lot of people. The solution to this conundrum is presented at the end of the next section.

RFC 290: Better english names for -X

This RFC proposes long names as aliases for the various filetest operators, so that instead of saying:

    -r $file

you might say something like:

    use english;
    freadable($file)

Actually, there's no need for the use english, I expect. These names could merely be universal (or nearly universal) methods. In any case, we should start getting used to the idea that mumble($foo) is equivalent to $foo.mumble(), at least in the absence of a local subroutine definition to the contrary. So I expect that we'll see both:

    is_readable($file)

and:

    $file.is_readable

Similar to the cascaded filetest ops in the previous section, one approach might be that the boolean methods return the object in question for success so that method calls could be stacked without repeating the object:

    if ($file.is_dir
             .is_readable
             .is_writable
             .is_executable) {

But -drwx $file could still be construed as more readable, for some definition of readability. And cascading methods aren't really short-circuited. Plus, the value returned would have to be something like ``$file is true,'' to prevent confusion over filename ``0.''

There is also the question of whether this really saves us anything other than a little notational convenience. If each of those methods has to do a stat on the filename, it will be rather slow. To fix that, what we'd actually have to return would be not the filename, but some object containing the stat buffer (represented in Perl 5 by the _ character). If we did that, we wouldn't have to play $file is true games, because a valid stat buffer object would (presumably) always be true (at least until it's false).

The same argument would apply to cascaded filetest operators we talked about earlier. An autoloaded -drwx handler would presumably be smart enough to do a single stat. But we'd likely lose the speed gain by invoking the autoload mechanism. So cascaded operators (either -X style or .is_XXX style) are the way to go. They just return objects that know how to be either boolean or stat buffer objects in context. This implies you could even say

    $statbuf = -f $file or die "Not a regular file: $file";
    if (-r -w $statbuf) { ... }

This allows us to simplify the special case in Perl 5 represented by the _ token, which was always rather difficult to explain. And returning a stat buffer instead of $file prevents the confusing:

    $handle = open -r -w -x $file or die;

Unless, of course, we decide to make a stat buffer object return the filename in a string context. :-)

RFC 283: tr/// in array context should return a histogram

Yes, but ...

While it's true that I put that item into the Todo list ages ago, I think that histograms should probably have their own interface, since the histogram should probably be returned as a complete hash in scalar context, but we can't guess that they'll want a histogram for an ordinary scalar tr///. On the other hand, it could just be a /h modifier. But we've already done violence to tr/// to make it do character counting without transliterating, so maybe this isn't so far fetched.

One problem with this RFC is that it does the histogram over the input rather than the output string. The original Todo entry did not specify this, but it was what I had intended. But it's more useful to do it on the resulting characters because then you can use the tr/// itself to categorize characters into, say, vowels and consonants, and then count the resulting V's and C's.
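That vowel/consonant trick works in Perl 5 today, provided you do the counting by hand afterwards:

    # Perl 5: categorize letters into V/C, then histogram by hand
    my $word = "exegesis";
    (my $pattern = lc $word) =~ tr/a-z/VCCCVCCCVCCCCCVCCCCCVCCCCC/;
    my %histogram;
    $histogram{$_}++ for split //, $pattern;
    print "$histogram{V} vowels, $histogram{C} consonants\n";   # 4 vowels, 4 consonants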

On the other hand, I'm thinking that the tr/// interface is really rather lousy, and getting lousier every day. The whole tr/// interface is kind of sucky for any sort of dynamically generated data. But even without dynamic data, there are serious problems. It was bad enough when the character set was just ASCII. The basic problem is that the notation is inside out from what it should be, in the sense that it doesn't actually show which characters correspond, so you have to count characters. We made some progress on that in Perl 5 when, instead of:

    tr/abcdefghijklmnopqrstuvwxyz/VCCCVCCCVCCCCCVCCCCCVCCCCC/

we allowed you to say:

    tr[abcdefghijklmnopqrstuvwxyz]
      [VCCCVCCCVCCCCCVCCCCCVCCCCC]

There are also shenanigans you can play if you know that duplicates on the left side prefer the first mention to subsequent mentions:

    tr/aeioua-z/VVVVVC/

But you're still working against the notation. We need a more explicit way to put character classes into correspondence.

More problems show up when we extend the character set beyond ASCII. The use of tr/// for case translations has long been semi-deprecated, because a range like tr/a-z/A-Z/ leaves out characters with diacritics. And now with Unicode, the whole notion of what is a character is becoming more susceptible to interpretation, and the tr/// interface doesn't tell Perl whether to treat character modifiers as part of the base character. For some of the double-wide characters it's even hard to just look at the character and tell if it's one character or two. Counted character lists are about as modern as Hollerith strings in Fortran.

So I suspect the tr/// syntax will be relegated to being just one quote-like interface to the actual transliteration module, whose main interface will be specified in terms of translation pairs, the left side of which will give a pattern to match (typically a character class), and the right side will say what to translate anything matching to. Think of it as a series of coordinated parallel s/// operations. Syntax is still open for negotiation till Apocalypse 5.

But there can certainly be a histogram option in there somewhere.

RFC 084: Replace => (stringifying comma) with => (pair constructor)

I like the basic idea of pairs because it generalizes to more than just hash values. Named parameters will almost certainly be implemented using pairs as well.

I do have some quibbles with the RFC. The proposed key and value built-ins should simply be lvalue methods on pair objects. And if we use pair objects to implement entries in hashes, the key must be immutable, or there must be some way of re-hashing the key if it changes.

The stuff about using pairs for mumble-but-false is bogus. We'll use properties for that sort of chicanery. (And multiway comparisons won't rely on such chicanery in any event. See above.)

RFC 081: Lazily evaluated list generation functions

Sorry, you can't have the colon--at least, not without sharing it. Colon will be a kind of ``supercomma'' that supplies an adverbial list to some previous operator, which in this case would be the prior colon or dotdot.

(We can't quite implement ?: as a : modifier on ?, because the precedence would be screwy, unless we limit : to a single argument, which would preclude its being used to disambiguate indirect objects. More on that later.)

The RFC's proposal concerning the attributes::get(@a) stuff is superseded by value properties. So, @a.method() should just pull out the variable's properties directly, if the variable is of a type that supports the methods in question. A lazy list object should certainly have such methods.

Assignment of a lazy list to a tied array is a problem unless the tie implementation handles laziness. By default a tied array is likely to enforce immediate list evaluation. Immediate list evaluation doesn't work on infinite lists. That means it's gonna fill up your disk drive if you try to say something like:

    @my_tied_file = 1..Inf;

Laziness should be possible, but not necessarily the norm. It's all very well to delay the evaluation of ``pure'' functions in the realm of math, since presumably you get the same result no matter when you evaluate. But a lot of Perl programming is done with real world data that changes over time. Saying somefunc($a .. $b) can get terribly fouled up if $b can change, and the lazy function still refers to the variable rather than its instantaneous value. On the other hand, there is overhead in taking snapshots of the current state.

On the gripping hand, if the lazy list object is itself the snapshot of the values, then that's not a problem in this case. Forget I mentioned it.

The tricky thing about lazy lists is not the lazy lists themselves, but how they interact with the rest of the language. For instance, what happens if you say:

    @lazy = 1..Inf;
    @lazy[5] = 42;

Is @lazy still lazy after it is modified? Do we remember that @lazy[5] is an ``exception'', and continue to generate the rest of the values by the original rule? What if @lazy is going to be generated by a recursive function? Does it matter whether we've already generated @lazy[5]?

And how do we explain this simply to people so that they can understand? We will have to be very clear about the distinction between the abstraction and the concrete value. I'm of the opinion that a lazy list is a definition of the default values of an array, and that the actual values of the array override any default values. Assigning to a previously memoized element overrides the memoized value.

It would help the optimizer to have a way to declare ``pure'' array definitions that can't be overridden.

Also consider this:

    @array = (1..100, 100..10000:100);

A single flat array can have multiple lazy lists as part of its default definition. We'll have to keep track of that, which could get especially tricky if the definitions start overlapping via slice definitions.

In practice, people will treat the default values as real values. If you pass a lazy list into a function as an array argument, the function will probably not know or care whether the values it's getting from the array are being generated on the fly or were there in the first place.

I can think of other cans of worms this opens, and I'm quite certain I'm too stupid to think of them all. Nevertheless, my gut feeling is that we can make things work more like people expect rather than less. And I was always a little bit jealous that REXX could have arrays with default values. :-)

RFC 285: Lazy Input / Context-sensitive Input

Solving this with want() is the wrong approach, but I think the basic idea is sound because it's what people expect. And the want() should in fact be unnecessary. Essentially, if the right side of a list assignment produces a lazy list, and the left side requests a finite number of elements, the list generator will only produce enough to satisfy the demand. It doesn't need to know how many in advance. It just produces another scalar value when requested. The generator doesn't have to be smart about its context. The motto of a lazy list generator should be, ``Ours is not to question why, ours is but to do (the next one) or die.''
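A Perl 5 closure makes a fair working model of such an unquestioning generator -- a sketch:

    # Perl 5 sketch: a demand-driven generator for 1..Inf
    sub counter_from {
        my $n = shift;
        return sub { $n++ };            # hand over the next value on request
    }

    my $next = counter_from(1);
    my ($first, $second) = ($next->(), $next->());   # 1, 2 -- no list built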

It will be tricky to make this one work right:

    ($first, @rest) = 1 .. Inf;

RFC 082: Arrays: Apply operators element-wise in a list context

APL, here we come... :-)

This is by far the most difficult of these RFCs to decide, so I'm going to be doing a lot of thinking out loud here. This is research--or at least, a search. Please bear with me.

I expect that there are two classes of Perl programmers--those that would find these ``hyper'' operators natural, and those that wouldn't. Turning this feature on by default would cause a lot of heartburn for people who (from Perl 5 experience) expect arrays to always return their length under scalar operators even in list context. It can reasonably be argued that we need to make the scalar operators default, but make it easy to turn on hyper operators within a lexical scope. In any event, both sets of operators need to be visible from anywhere--we're just arguing over who gets the short, traditional names. All operators will presumably have longer names for use as function calls anyway. Instead of just naming an operator with long names like:

    operator:+
    operator:/

the longer names could distinguish ``hyperness'' like this:

    @a scalar:+ @b
    @a list:/ @b

That implies they could also be called like this:

    scalar:+(@a, @b)
    list:/(@a, @b)

We might find some short prefix character stands in for ``list'' or ``scalar''. The obvious candidates are @ and $:

    @a $+ @b
    @a @/ @b

Unfortunately, in this case, ``obvious'' is synonymous with ``wrong''. These operators would be completely confusing from a visual point of view. If the main psychological point of putting noun markers on the nouns is so that they stand out from the verbs, then you don't want to put the same markers on the verbs. It would be like the Germans starting to capitalize all their words instead of just their nouns.

Instead, we could borrow a singular/plural memelet from shell globbing, where * means multiple characters, and ? means one character:

    @a ?+ @b
    @a */ @b

But that has a bad ambiguity. How do you tell whether ** is an exponentiation or a list multiplication? So if we went that route, we'd probably have to say:

    @a ?:+ @b
    @a *:/ @b

Or some such. But if we're going that far in the direction of gobbledygook, perhaps there are prefix characters that wouldn't be so ambiguous. The colon and the dot also have a visual singular/plural value:

    @a .+ @b
    @a :/ @b

We're already changing the old meaning of dot (and I'm planning to rescue colon from the ?: operator), so perhaps that could be made to work. You could almost think of dot and colon as complementary method calls, where you could say:

    $len = @a.length;   # length as a scalar operator
    @len = @a:length;   # length as a list operator

But that would interfere with other desirable uses of colon. Plus, it's actually going to be confusing to think of these as singular and plural operators because, while we're specifying that we want a ``plural'' operator, we're not specifying how to treat the plurality. Consider this:

    @len = list:length(@a);

Anyone would naively think that returns the length of the list, not the length of each element of the list. To make it work in English, we'd actually have to say something like this:

    @len = each:length(@a);
    $len = the:length(@a);

That would be equivalent to the method calls:

    @len = @a.each:length;
    $len = @a.the:length;

But does this really mean that there are two array methods with those weird names? I don't think so. We've reached a result here that is spectacularly close to a reductio ad absurdum. It seems to me that the whole point of this RFC is that the ``eachness'' is most simply specified by the list context, together with the knowledge that length() is a function/method that maps one scalar value to another. The distribution of that function over an array value is not something the scalar function should be concerned with, except insofar as it must make sure its type signature is correct.

And there's the rub. We're really talking about enforced strong typing for this to work right. When we say:

    @foo = @bar.mumble

How do we know whether mumble has the type signature that magically enables iteration over @bar? That definition is off in some other file that we may not have memorized quite yet. We need some more explicit syntax that says that auto-iteration is expected, regardless of whether the definition of the operator is well specified. Magical auto-iteration is not going to work well in a language with optional typing.

So the resolution of this is that the unmarked forms of operators will force scalar context as they do in Perl 5, and we'll need a special marker that says an operator is to be auto-iterated. That special marker turns out to be an uparrow, with a tip o' the hat to higher-order functions. That is, the hyper-operator:

    @a ^* @b

is equivalent to this:

    parallel { $^a * $^b } @a, @b

(where parallel is a hypothetical function that iterates through multiple arrays in parallel.)
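In Perl 5 terms, that hypothetical parallel() might be sketched like so (none of this is the real Perl 6 implementation):

    # Apply a binary operation across two arrays, element by element
    sub parallel {
        my ($code, $x, $y) = @_;      # a code ref and two array refs
        my $n = @$x > @$y ? scalar @$x : scalar @$y;
        return map { $code->($x->[$_], $y->[$_]) } 0 .. $n - 1;
    }
    my @product = parallel(sub { $_[0] * $_[1] }, [1, 2, 3], [4, 5, 6]);
    # @product is (4, 10, 18), just as @a ^* @b would be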

Hyper operators will also intuit where a dimension is missing from one of its arguments, and replicate a scalar value to a list value in that dimension. That means you can say:

    @a ^+ 1

to get a value with one added to each element of @a. (@a is unchanged.)
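
In today's Perl 5 you'd spell that replication out by hand:

    my @a = (1, 2, 3);
    my @plus1 = map { $_ + 1 } @a;    # what @a ^+ 1 computes; @a itself is untouched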

I don't believe there are any insurmountable ambiguities with the uparrow notation. There is currently an uparrow operator meaning exclusive-or, but it is rarely used in practice, and is not typically followed by other operators when it is used. We can represent exclusive-or with ~ instead. (I like that idea anyway, because the unary ~ is a 1's complement, and the binary ~ would then simply be doing a 1's complement, within its second argument, of the bits that are set in its first argument. On the other hand, there's destructive interference with other cultural meanings of tilde, so it's not completely obvious that it's the right thing to do. Nevertheless, that's what we're doing.)

Anyway, in essence, I'm rejecting the underlying premise of this RFC, that we'll have strong enough typing to intuit the right behavior without confusing people. Nevertheless, we'll still have easy-to-use (and more importantly, easy-to-recognize) hyper-operators.

This RFC also asks about how return values for functions like abs() might be specified. I expect sub declarations to (optionally) include a return type, so this would be sufficient to figure out which functions would know how to map a scalar to a scalar. And we should point out again that even though the base language will not try to intuit which operators should be hyperoperators, there's no reason in principle that someone couldn't invent a dialect that does. All is fair if you predeclare.

RFC 045: || and && should propagate result context to both sides

Yes. The thing that makes this work in Perl 6, where it was almost impossible in Perl 5, is that in Perl 6, list context doesn't imply immediate list flattening. More precisely, it specifies immediate list flattening in a notional sense, but the implementation is free to delay that flattening until it's actually required. Internally, a flattened list is still an object. So when @a || @b evaluates the arrays, they're evaluated as objects that can return either a boolean value or a list, depending on the context. And it will be possible to apply both contexts to the first argument simultaneously. (Of course, the computer actually looks at it in the boolean context first.)

There is no conflict with RFC 81 because the hyper versions of these operators will be spelled:

    @a ^|| @b
    @a ^&& @b

RFC 054: Operators: Polymorphic comparisons

I'm not sure of the performance hit of backstopping numeric equality with string equality. Maybe vtables help with this. But I think this RFC is proposing something that is too specific. The more general problem is how you allow variants of built-ins, not just for ==, but for other operators like <=> and cmp, not to mention all the other operators that have scalar and list variants.

A generic equality operator could potentially be supplied by operator definition. I expect that a similar mechanism would let us define how abstract a comparison cmp performs, so we could sort and collate according to the various defined levels of Unicode.

The argument that you can't do generic programming is somewhat specious. The problem in Perl 5 is that you can't name operators, so you couldn't pass in a generic operator in place of a specific one even if you wanted to. I think it's more important to make sure all operators have real function names in Perl 6:

    operator:+($a, $b);     # $a + $b
    operator:^+(@a, @b);    # @a ^+ @b
    my sub operator:<?> ($a, $b) { ... }
    if ($a <?> $b) { ... }
    @sorted = collate \&operator:<?>, @unicode;
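
For comparison, the closest Perl 5 approximation is passing a comparison around as a code ref (collate here is just an invented helper):

    my $numeric_cmp = sub { $_[0] <=> $_[1] };
    sub collate {
        my ($cmp, @list) = @_;
        return sort { $cmp->($a, $b) } @list;
    }
    my @sorted = collate($numeric_cmp, 3, 1, 2);   # (1, 2, 3)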

RFC 104: Backtracking

As proposed, this can easily be done with an operator definition to call a sequence of closures. I wonder whether the proposal is complete, however. There should probably be more make-it-didn't-happen semantics to a backtracking engine. If Prolog unification is emulated with an assignment, how do you later unassign a variable if you backtrack past it?

Ordinarily, temporary values are scoped to a block, but we're using blocks differently here, much like parens are used in a regex. Later parens don't undo the ``unifications'' of earlier parens.

In normal imperative programming these temporary determinations are remembered in ordinary scoped variables and the current hypothesis is extended via recursion. An andthen operator would need to have a way of keeping BLOCK1's scope around until BLOCK2 succeeds or fails. That is, in terms of lexical scoping:

    {BLOCK1} andthen {BLOCK2}

needs to work more like

    {BLOCK1 andthen {BLOCK2}}

This might be difficult to arrange as a mere module. However, with rewriting rules it might be possible to install the requisite scoping semantics within BLOCK1 to make it work like that. So I don't think this is a primitive in the same sense that continuations would be. For now let's assume we can build backtracking operators from continuations. Those will be covered in a future apocalypse.

RFC 143: Case ignoring eq and cmp operators

This is another RFC that proposes a specific feature that can be handled by a more generic feature, in this case, an operator definition:

    my sub operator:EQ { lc($^a) eq lc($^b) }
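
In Perl 5 that's simply a named predicate, no new machinery needed:

    sub eq_nocase { lc($_[0]) eq lc($_[1]) }
    print "same\n" if eq_nocase('Perl', 'pERL');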

Incidentally, I notice that the RFC normalizes to uppercase. I suspect it's better these days to normalize to lowercase, because Unicode distinguishes titlecase from uppercase, and provides mappings for both to lowercase.

RFC 170: Generalize =~ to a special ``apply-to'' assignment operator

I don't think the argument should come in on the right. I think it would be more natural to treat it as an object, since all Perl variables will essentially be objects anyway, if you scratch them right. Er, left.

I do wonder whether we could generalize =~ to a list operator that calls a given method on multiple objects, so that

    ($a, $b) =~ s/foo/bar/;

would be equivalent to

    for ($a, $b) { s/foo/bar/ }

But then maybe it's redundant, except that you could say

    @foo =~ s/foo/bar/

in the middle of an expression. But by and large, I think I'd rather see:

    @foo.grep {!m/\s/}

instead of using =~ for what is essentially a method call. In line with what we discussed before, the list version could be a hyperoperator:

    @foo . ^s/foo/bar/;

or possibly:

    @foo ^. s/foo/bar/;

Note that in the general case this all implies that there is some interplay between how you declare method calls and how you declare quote-like operators. It seems as though it would be dangerous to let a quote-like declaration out of a lexical scope, but then it's also not clear how a method call declaration could be lexically scoped. So we probably can't do away with =~ as an explicit marker that the thing on the left is a string, and the thing on the right is a quoted construct. That means that a hypersubstitution is really spelled:

    @foo ^=~ s/foo/bar/;

Admittedly, that's not the prettiest thing in the world.

Non-RFC considerations

The RFCs propose various specific features, but don't give a systematic view of the operators as a whole. In this section I'll try to give a more cohesive picture of where I see things going.

Binary . (dot)

This is now the method call operator, in line with industry-wide practice. It also has ramifications for how we declare object attribute variables. I'm anticipating that, within a class module, saying

    my int $.counter;

would declare both a $.counter instance variable and a counter accessor method for use within the class. (If marked as public, it would also declare a counter accessor method for use outside the class.)
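
In Perl 5 terms, that one declaration would stand in for the usual hand-written boilerplate, something like:

    package Counter;
    sub new { bless { counter => 0 }, shift }
    sub counter {                     # the accessor $.counter would generate
        my $self = shift;
        $self->{counter} = shift if @_;
        return $self->{counter};
    }

    package main;
    my $c = Counter->new;
    $c->counter(5);
    print $c->counter, "\n";          # 5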

Unary . (dot)

It's possible that a unary . would call a method on the current object within a class. That is, it would be the same as a binary . with $self (or equivalent) on the left:

    method foowrapper ($a, $b, $c) {
        .reallyfoo($a, $b, $c)
    }

On the other hand, it might be considered better style to be explicit:

    method foowrapper ($self: $a, $b, $c) {
        $self.reallyfoo($a, $b, $c)
    }

(Don't take that declaration syntax as final just yet, however.)

Binary _

Since . is taken for method calls, we need a new way to concatenate strings. We'll use a solitary underscore for that. So, instead of:

    $a . $b . $c

you'll say:

    $a _ $b _ $c

The only downside is that a space between the variable name and the operator is required. This is to be construed as a feature.

Unary _

Since the _ token indicating the stat buffer is going away, a unary underscore operator will force stringification, just as interpolation does, only without the quotes.

Unary +

Similarly, a unary + will force numification in Perl 6, unlike in Perl 5. If that fails, NaN (not a number) is returned.

Binary :=

We need to distinguish two different forms of assignment. The standard assignment operator, =, works just as it does in Perl 5, as much as possible. That is, it tries to make it look like a value assignment. This is our cultural heritage.

But we also need an operator that works like assignment but is more definitional. If you're familiar with Prolog, you can think of it as a sort of unification operator (though without the implicit backtracking semantics). In human terms, it treats the left side as a set of formal arguments exactly as if they were in the declaration of a function, and binds a set of arguments on the right hand side as though they were being passed to a function. This is what the new := operator does. More below.

Unary *

Unary * is the list flattening operator. (See Ruby for prior art.) When used on an rvalue, it turns off function signature matching for the rest of the arguments, so that, for instance:

    @args = (\@foo, @bar);
    push *@args;

would be equivalent to:

    push @foo, @bar;

In this respect, it serves as a replacement for the prototype-disabling &foo(@bar) syntax of Perl 5. That would be translated to:

    foo(*@bar)

In an lvalue, the unary * indicates that subsequent array names slurp all the rest of the values. So this would swap two arrays:

    (@a, @b) := (@b, @a);

whereas this would assign all the array elements of @c and @d to @a.

    (*@a, @b) := (@c, @d);

An ordinary flattening list assignment:

    @a = (@b, @c);

is equivalent to:

    *@a := (@b, @c);

That's not the same as

    @a := *(@b, @c);

which would take the first element of @b as the new definition of @a, and throw away the rest, exactly as if you passed too many arguments to a function. It could optionally be made to blow up at run time. (It can't be made to blow up at compile time, since we don't know how many elements are in @b and @c combined. There could be exactly one element, which is what the left side wants.)

List context

The whole notion of list context is somewhat modified in Perl 6. Since lists can be lazy, the interpretation of list flattening is also by necessity lazy. This means that, in the absence of the * list flattening operator (or an equivalent old-fashioned list assignment), lists in Perl 6 are object lists. That is to say, they are parsed as if they were a list of objects in scalar context. When you see a function call like:

    foo @a, @b, @c;

you should generally assume that three discrete arrays are being passed to the function, unless you happen to know that the signature of foo includes a list flattening *. (If a subroutine doesn't have a signature, it is assumed to have a signature of (*@_) for old times' sake.) Note that this is really nothing new to Perl, which has always made this distinction for builtins, and extended it to user-defined functions in Perl 5 via prototypes like \@ and \%. We're just changing the syntax in Perl 6 so that the unmarked form of formal argument expects a scalar value, and you optionally declare the final formal argument to expect a list. It's a matter of Huffman coding again, not to mention saving wear and tear on the backslash key.
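
For reference, here's the Perl 5 prior art with \@ prototypes, which already pass arrays as discrete arguments:

    sub foo (\@\@\@) {
        my ($x, $y, $z) = @_;         # three array refs, not one flat list
        printf "%d %d %d\n", scalar @$x, scalar @$y, scalar @$z;
    }
    my @a = (1, 2);
    my @b = (3);
    my @c = (4, 5, 6);
    foo(@a, @b, @c);                  # prints "2 1 3"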

Binary :

As I pointed out in an earlier apocalypse, the first rule of computer language design is that everybody wants the colon. I think that means that we should do our best to give the colon to as many features as possible.

Hence, this operator modifies a preceding operator adverbially. That is, it can turn any operator into a trinary operator (provided a suitable definition is declared). It can be used to supply a ``step'' to a range operator, for instance. It can also be used as a kind of super-comma separating an indirect object from the subsequent argument list:

    print $handle[2]: @args;

Of course, this conflicts with the old definition of the ?: operator. See below.

In a method type signature, this operator indicates that a previous argument (or arguments) is to be considered the ``self'' of a method call. (Putting it after multiple arguments could indicate a desire for multimethod dispatch!)

Trinary ??::

The old ?: operator is now spelled ??::. That is to say, since it's really a kind of short-circuit operator, we just double both characters, as with the && and || operators. This makes it easy to remember for C programmers. Just change:

    $a ? $b : $c

to

    $a ?? $b :: $c

The basic problem is that the old ?: operator wastes two very useful single characters on an operator that is not used often enough to justify them. It's bad Huffman coding, in other words. Every proposed use of colon in the RFCs conflicted with the ?: operator. I think that says something.

I can't list here all the possible spellings of ?: that I considered. I just think ??:: is the most visually appealing and mnemonic of the lot of them.

Binary //

A binary // operator is the defaulting operator. That is:

    $a // $b

is short for:

    defined($a) ?? $a :: $b

except that the left side is evaluated only once. It will work on arrays and hashes as well as scalars. It also has a corresponding assignment operator, which only does the assignment if the left side is undefined:

    $pi //= 3;
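
Spelled out in the Perl 5 of today, those two idioms look like this:

    my ($pi, $maybe, $fallback) = (undef, undef, 'default');
    $pi = 3 unless defined $pi;                      # what $pi //= 3 does
    my $val = defined $maybe ? $maybe : $fallback;   # what $maybe // $fallback does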

Binary ;

The binary ; operator separates two expressions in a list, much like the expressions within a C-style for loop. Obviously the expressions need to be in some kind of bracketing structure to avoid ambiguity with the end of the statement. Depending on the context, these expressions may be interpreted as arguments to a for loop, or slices of a multi-dimensional array, or whatever. In the absence of other context, the default is simply to make a list of lists. That is,

    [1,2,3;4,5,6]

is a shorthand for:

    [[1,2,3],[4,5,6]]

But usually there will be other context, such as a multidimensional array that wants to be sliced, or a syntactic construct that wants to emulate some kind of control structure. A construct emulating a 3-argument for loop might force all the expressions to be closures, for instance, so that they can be evaluated each time through the loop. User-defined syntax will be discussed in Apocalypse 18, if not sooner.

Unary ^

Unary ^ is now reserved for hyper operators. Note that it works on assignment operators as well:

    @a ^+= 1;    # increment all elements of @a
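
The Perl 5 equivalent is an explicit loop:

    my @a = (1, 2, 3);
    $_ += 1 for @a;                   # @a is now (2, 3, 4)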

Unary ?

Reserved for future use.

Binary ?

Reserved for future use.

Binary ~

This is now the bitwise XOR operator. Recall that unary ~ (1's complement) is simply an XOR with a value containing all 1 bits.

Binary ~~

This is a logical XOR operator. It's a high precedence version of the low precedence xor operator.

User defined operators

The declaration syntax of user-defined operators is still up for grabs, but we can say a few things about it. First, we can differentiate unary from binary declarations simply by the number of arguments. (Declaration of a return type may also be useful for disambiguating subsequent parsing. One place it won't be needed is for operators wanting to know whether they should behave as hyperoperators. The pressure to do that is relieved by the explicit ^ hypermarker.)

We also need to think how these operator definitions relate to overloading. We can treat an operator as a method on the first object, but sometimes it's the second object that should control the action. (Or with multimethod dispatch, both objects.) These will have to be thrashed out under ordinary method dispatch policy. The important thing is to realize that an operator is just a funny looking method call. When you say:

    $man bites $dog

The infrastructure will need to untangle whether the man is biting the dog or the dog is getting bitten by the man. The actual biting could be implemented in either the Man class or the Dog class, or even somewhere else, in the case of multimethods.

Unicode operators

Rather than using longer and longer strings of ASCII characters to represent user-defined operators, it will be much more readable to allow the (judicious) use of Unicode operators.

In the short term, we won't see much of this. As screen resolutions increase over the next 20 years, we'll all become much more comfortable with the richer symbol set. I see no reason (other than fear of obfuscation (and fear of fear of obfuscation)) why Unicode operators should not be allowed.

Note that, unlike APL, we won't be hardware dependent, in the sense that any Perl implementation will always be able to parse Unicode, even if you can't display it very well. (But note that Vim 6.0 just came out with Unicode support.)

Precedence

We will at least unify the precedence levels of the equality and relational operators. Other unifications are possible. For instance, the not logical operator could be combined with list operators in precedence. There's only so much simplification that you can do, however, since you can't mix right association with left association. By and large, the precedence table will be what you expect, if you expect it to remain largely the same.

And that still goes for Perl 6 in general. We talk a lot here about what we're changing, but there's a lot more that we're not changing. Perl 5 does a lot of things right, and we're not terribly interested in ``fixing'' that.
