April 2004 Archives

This Week on Perl 6, Week Ending 2004-04-25

And we're back on a weekly schedule again (unless the Mayday bank holiday knocks me for six next week). As I expected, the Apocalypse has brought out a rash of prophets and prognosticators in perl6-language, but perl6-internals is still ahead on number of messages per week.

Constant Strings

I confess I'm not sure I quite understand the constant strings patches that Leo Tötsch and Jeff Clites were discussing. I understand the bottom line though -- they make parrot a lot quicker when comparing constant strings. Huzzah!

Then it turned into a discussion of Unicode (or at least, Parrot string handling which is almost, but not quite, the same thing).

http://groups.google.com/

Parrot m4 0.0.4

Bernhard Schmalhofer posted a patch to bring his parrot implementation of m4 up to what he's calling version 0.0.4

http://groups.google.com/

SDL Parrot status

Portland Parrot stalwart, chromatic, posted a link to a SDL Parrot site he's set up with current status, downloadable snapshots and other good SDL Parrot related things.

http://wgz.org/chromatic/parrot/sdl/

Passing arrays of structs to NCI subs

That man chromatic again, this time he asked if there was a secret syntax for passing and retrieving arrays of things to and from NCI. Leo noted that, according to the docs, there is no secret syntax, it's all documented but unimplemented.

http://groups.google.com

PMC Constants

Last week, Leo asked for comments on his proposal for PMC constants. This week, Dan replied. Essentially he's all for the idea, but wasn't sure which implementation strategy was the best choice.

http://groups.google.com

Assign and set

Leo announced some changes he'd made to the behaviour of the set and assign opcodes. Jens Rieks pointed out a case that he'd missed.

http://groups.google.com

hyper op - proof of concept

Leo also implemented what he described as a rather hackish and incomplete new opcode called hyper. Dan liked it enough to suggest that we should go the whole hog and add a hyper vtable to PMCs, with hyper versions of all the standard vtable entries. He and Dan had a long discussion of this, with contributions from various luminaries including Larry. There was some debate as to whether we really needed overridable hyper ops, but Dan's adamant that whatever happens they'll be implemented in a vtable to allow for potential optimizations in some cases.

http://groups.google.com

Separating allocation and initialization of objects

Last week, Leo posted the latest object benchmarks, and things were fast. But there was one test where Python was faster. Analysis of the results and the code seemed to point the finger at Parrot's combined allocation and initialization. This week Dan confessed that he was leaning towards separating the two because that would allow for a standard PCC call into the initialization code. He pointed out that there were still a few issues, but that appears to be the way he's thinking.

http://groups.google.com

Another config task

Dan pointed out that the current config scheme relies rather heavily on flags set in the current perl install, which isn't ideal. He suggested that people might like to look into making Parrot's config step rather less Perl dependent and pointed at things like 'metaconfig'.

http://groups.google.com

Problems with the Perl 6 Compiler

Allison Randal noted that languages/perl6/ was failing all its tests. The issue arose because the Perl 6 test module inherits from Parrot::Test, and Parrot::Test's behaviour got changed recently. She wondered why the changes had been made. After some discussion, Allison provided a patch to make things work with the new Test module.

http://groups.google.com

IMCC temp optimizations...

Dan is possibly the only person using Parrot in a commercial (mission critical?) environment, using a compiler from a legacy language to Parrot. He's currently experiencing problems with IMCC taking a long time to compile the PIR generated by his compiler. Apparently it's because he's using a lot of .locals and temps. He and Leo discussed it back and forth to try and find optimizations (both of IMCC and of Dan's compiler) for the issue. (Dramatic) progress was made.

http://groups.google.com

Korean character set information

Last week, Dan had wished he had access to anyone who knew anything about Korean. This week kj provided some information. The ensuing discussion of Unicode (again, maybe I'm going to have to extend my "I don't cover the Unicode arguments" policy from perl6-language to the internals list too) led Jarkko Hietaniemi to propose that Parrot's standard character set be cuneiform, with Phaistos disc symbols for variable names.

I think he was joking.

http://groups.google.com

Containers and Values

The difference between containers and values cropped up again (for the first time in a while actually). It got kicked off by a discussion of what

    my Joe $foo;
    $foo = 12; 

meant. It turns out that the my Joe $foo; part sets up a lexical variable (called $foo), which points to a container (in this case a Perl Scalar), which is constrained to contain either undef or a Joe objects. I believe it's somewhat open as to whether $foo = 12 is legal and, if it isn't, when the decision about the legality of the assignment would be made. Simon Cozens asked if the type declaration information went with the variable or the value and the answer was 'neither', it goes with the container that the variable points to. (Actually, that's not quite true, variables also know what type the containers they point to should be constrained to holding, otherwise

    my Dog $spot;
    my $feline = RSPCA.get_stray(Cat);
    $spot := $feline

wouldn't throw an exception.) Dan noted that, right now, there is a 'muddled split' between values and containers in Parrot, which may well come back to bite us in the future.

http://groups.google.com

Dan Does Unicode

After struggling manfully against the slings and arrows of outrageous text representations, Dan finally choked down the Unicode bolus and declared Unicode the "One True Character String Encoding Set Stuff Thingie". Not long after this he posted the shiny new, all Unicode, all the time (ahem) strings design document.

http://groups.google.com

http://groups.google.com

Meanwhile, in perl6-language

Backticks

Last week's thread rumbled on. Larry eventually made a ruling in favour of not using ` as an operator. He mentioned that the current backticks behaviour 'probably needs to be completely rethought anyway', and promised to cover it in a future Apocalypse. This didn't stop Austin Hastings speculating, but a p6l without Austin speculating wouldn't be the p6l we all know and love.

http://groups.google.com

Spaces in method calls

Late last week, Abhijit A Mahabal had wondered about one of the examples of using method calls without brackets given in Apocalypse 12. In particular, he wondered why

    $obj.method ($x + $y) + $z

was equivalent to

    $obj.method(($x + $y) + $z) 

rather than

    $obj.method($x + $y) + $z

Larry was forthcoming with an explanation. He hoped it would be possible to unambiguously define what is ambiguous.

http://groups.google.com

Returning from Rules

Luke Palmer noticed a common pattern in his grammar writing, where he did a great deal of assigning a value to $0. He proposed that <{ some_thing() }> be redefined to assign the result from the block to $0. Warnock applies.

http://groups.google.com

Hyper mutating methods

Matthew Walton wondered if the new @things».method() hyperized method syntax also worked with mutating methods (@things».=method()). Answer: Yes.

http://groups.google.com

Placeholder attachment

Trey Harris asked for an explanation of placeholder attachment. Being the gentleman he so obviously is, he even came up with a few thorny examples. Larry answered that the rule of placeholders is simple: Placeholders bind to the most closely surrounding closure. That is all. If you want anything more complicated then arrow blocks (-> $a, $b, $c { ... }) are what you should be using.

http://groups.google.com

Lvalue methods

John Siracusa is after a neat way of overriding the assignment side of an is rw attribute. Various proposals were batted about, including using a different metaclass. Larry insisted that the $obj.foo($value) was the wrong way of doing what $obj.foo = $value does in A12 and repeated his reasoning for this. It looks like John was reasonably happy with the 'different metaclass' approach though.

http://groups.google.com/

Required Named Parameters Strike Back

Way back in the mists of time, John Siracusa had argued forcefully for required named parameters; required arguments to a function (or method) that must be supplied as pairs rather than positionally. Apocalypse 12 gave him some grist for this particular mill and he reopened the discussion with a long recap and extension of his argument.

Larry's response is a masterpiece of conciseness:

   Well, actually, we saved you last summer when we decided to make +
   mean that the parameter must be named.

Discussion continued after this of course, but it was mostly concerned with making sure things worked as expected.

http://groups.google.com

Conflicting Attributes in Roles

Jonathan Lang appeared to be shooting for the most cryptic question of the week award when he simply posted a chunk of code. Austin thought he understood it and essayed an answer. The resulting thread generated all sorts of interesting stuff. Which is odd, because on rereading Jonathan's original post, I don't think the code he wrote does what he thinks it does.

http://groups.google.com

Syntax to call attributes.

Jonathan Lang also wondered what the syntax was for accessing a list attribute from a scalar object. Answer: The same was as you would access any other attribute of that object; by calling a sigil free method. (Which seems to imply that you either can't have all of $.foo, @.foo and %.foo as attributes in the same class. Which in turn looks like plain common sense to me.)

http://groups.google.com

Lower case magic methods?

Aldo Calpini worried that 'implicit' methods like meta, dispatch, etc seem to break the rule that implicit things get UPPER CASE (and some of the implicit methods have names that a programmer might want for something else -- especially dispatch). Larry confessed that the names were chosen for the very strong reason that he "hadn't thought about the issue yet". He went on to discuss things that go into the Apocalypses as 'placeholders' -- he knows that more design is needed, and there wants to be something there, but he hasn't necessarily got the name right yet. He went on to make like a disabled octopus, offering six hands worth of reasons for choosing different rules for naming the various structural methods. No decision yet, but now we know that Larry's thinking about it...

http://groups.google.com

Subtypes that lack methods or roles

Jonathan Lang wanted to know how to declare a "subtype of a class which messes with the dispatching mechanism to exclude certain methods and/or roles from it". I want to know whether the class he's subtyping is the one that messes with the dispatching, of if the new class is the one that does the messing. I appear not to have been the only one confused by the question: Larry asked Jonathan for some sample code, which he did. Only to have the underlying design criticised by Dov Wasserman and chromatic, who argued that what Jonathan was asking how to do was exactly the thing that Roles were designed to avoid in the first place by pulling orthogonal behaviour out into roles and then composing them in classes.

http://groups.google.com

Delegation using arrays

Austin Hastings voiced his misgivings about the A12 magic for delegating to an array attribute. After a short discussion with Larry, a light went on over Austin's head.

http://groups.google.com

Typed undef

Austin Hastings had a few things to say about typed undefs (the things that get made when you do my Dog $spot), he liked the idea of being able to call class methods on a typed undef for instance. The thread went on to discuss other tricks with flavoured undefs, like undefs that contain unthrown exceptions so that, when someone finally checks their return value, they can examine the exception to find out what it was that failed in the first place.

http://groups.google.com

Universal Exports

Aaron Sherman had questions about the new export syntax discussed at the end of A12. In particular, he wondered how he'd go about reexporting a symbol that a module imports from somewhere else. Discussion and sketchy design occurred in the ensuing thread.

http://groups.google.com

MethodMaker techniques in Perl 6

John Siracusa wondered how he'd go about writing a Class::MethodMaker equivalent in Perl6. (Personally, I'd roll my own subclass of Class and MetaClass). Discussion followed, but I don't think they came up with any implementation techniques yet.

http://groups.google.com

Announcements, Acknowledgements, Apologies

I'm sorry, but there's no announcements this week.

If you find these summaries useful or enjoyable, please consider contributing to the Perl Foundation to help support the development of Perl. You might also like to send me feedback at mailto:pdcawley@bofh.org.uk

http://donate.perl-foundation.org/ -- The Perl Foundation

http://dev.perl.org/perl6/ -- Perl 6 Development site

Rapid Web Application Deployment with Maypole : Part 2

When we last left our intrepid web developer, he had successfully set up an online sales catalogue in 11 lines of code. Now, however, he has to move on to turning this into a sales site with a shopping cart and all the usual trimmings. It's time to see some of that flexibility we talked about last week; unfortunately this means we're going to have to write some more code, but we can't have everything.

Who Am I?

In order to add the shopping cart to the site, we need to introduce the concept of a current user. This will allow viewers of the site to log in and have their own cart. We will be adding two new tables to the database, a table to store details about the user, and one to represent the cart. Our tables will look like so:

  CREATE TABLE user (
    id int not null auto_increment primary key,
    first_name varchar(64),
    last_name varchar(64),
    email varchar(255),
    password varchar(64),
    address1 varchar(255),
    address2 varchar(255),
    state varchar(255),
    postal_code varchar(64),
    country varchar(64)
  );
  
  CREATE TABLE cart_item (
    id int not null auto_increment primary key,
    user int,
    item int
  );

As before, Maypole automatically creates classes for the tables. We use Class::DBI relationships to tell Maypole what's going on with these tables:

  ISellIt::User->has_many( "cart_items" => "ISellIt::BasketItem");
  ISellIt::BasketItem->has_a( "user" => "ISellit::User" );
  ISellIt::BasketItem->has_a( "item" => "ISellit::Product" );

We now need a way to tell our application about the current user. There's a long explanation of Maypole's authentication system in the Maypole documentation, but one of the easiest ways to do add the concept of the current user is with the Maypole::Authentication::UserSessionCookie module.

As its name implies, this module takes care of associating a user with a session, and issuing a cookie to the user's browser. It also manages validating the user's login credentials, by default by looking up the user name and password in a database table; precisely what we need!

Maypole provides an authentication method for us to override, and it's here that we're going to intercept any request that requires a user -- viewing the shopping cart, adding items to an order, and so on:

  sub authenticate {
    my ($self, $r) = @_;
    unless ($r->{table} eq "cart" or $r->{action} eq "buy") {
      return OK;
    }

    # Else we need a user
    $r->get_user;
    if (!$r->{user}) {
      $r->template("login");
    }
    return OK;
   }

The get_user method, which does all the work of setting the cookie and setting the credentials, is provided by the UserSessionCookie module. The only thing we need to tell it is that we're going to use the user's email address and password as login credentials, rather than some arbitrary user name. We can do this in the configuration for our application, as described in the UserSessionCookie documentation:

  ISellIt->{config}->{auth}->{user_field} = "email";

Next, we set up a login template, which will present the users with a form to enter their credentials; there's one in the Maypole manual, in the Request chapter, which we can modify to suit our needs:

  [% INCLUDE header %]

    <h2> You need to log in before buying anything </h2>

  <DIV class="login">
  [% IF login_error %]
     <FONT COLOR="#FF0000"> [% login_error %] </FONT>
  [% END %]
    <FORM ACTION="/[% request.path%]" METHOD="post">
  Email Address:
    <INPUT TYPE="text" NAME="email"> <BR>
  Password: <INPUT TYPE="password" NAME="password"> <BR>
  <INPUT TYPE="submit">
  </FORM>
  </DIV>

And now logging in is sorted out; if a user presents the correct credentials, get_user will put the user's ISellIt::User object in the Maypole request object as $r->{user}, and the user's request will continue to where it was going.

Now, of course, since we have a user object we can play with, we can use the user's information in other contexts:

  [% IF request.user %]
    <DIV class="messages">
    Welcome back, [% request.user.first_name %]!
    </DIV>
  [% END %]

Since we're going to be referring to the user a lot, we pass it to the template as an additional argument, my. Maypole has an open-ended "hook" method, additional_data, which is perfect for doing just this.

  sub additional_data {
    my $r = shift;
    $r->{template_args}{my} = $r->{user};
  }

We call it my so that we can say, for instance:

    <DIV class="messages">
    Welcome back, [% my.first_name %]!
    </DIV>

So now we have a user. We can add a new action, order, to add an item to the user's shopping cart:

  package ISellIt::Product;

  sub order :Exported {
    my ($self, $r, $product) = @_;
    $r->{user}->add_to_cart_items({ item => $product });
    $r->{template} = "view";
  }

This adds an entry in the cart_item table associating the item with the user, and then sends us back to viewing the item.

We've sent our user back shopping without an indication that we actually did add an item to his shopping cart; we can give such an indication by passing information into the template:

  sub order :Exported {
    my ($self, $r, $product) = @_;
    $r->{user}->add_to_cart_items({ item => $product });
    $r->{template} = "view";
    $r->{template_args}{bought} = 1;
  }

And then displaying it:

  [% IF bought %]
  <DIV class="messages">
    We've just added this item to your shopping cart. To complete
    your transaction, please <A HREF="/user/view_cart">view your
    cart</A> and check out.
  </DIV>
  [% END %]

So now we need to allow the user to view a cart.

Displaying the Cart

This also turns out to be relatively easy -- most things in Maypole are -- involving an action on the user class. We need to fill our Maypole request object with the items in the user's cart:

  package ISellIt::User;

  sub view_cart :Exported {
    my ($self, $r) = @_;
    $r->{objects} = [ $r->{user}->cart_items ];
  }

And then we need to produce a user/view_cart template that displays them:

  [% PROCESS header %]

  <h2> Your Shopping Cart </h2>

  <TABLE>
  <TR> <TH> Product </TH> <TH> Price </TH> </TR>
  [% SET count = 0;
  FOR item = objects;
    SET count = count + 1;
    "<tr";
    ' class="alternate"' IF count % 2;
    ">";
  %]
    <TD> [% item.product.name %] </TD>
    <TD> [% item.product.price %] </TD>
    <TD> 
      <FORM ACTION="/cart_item/delete/[% item.id %]">
      <INPUT TYPE="submit" VALUE="Remove from cart">
      </FORM>
    </TD>
  </tr>
  [% END %]
  </TABLE>

  <A HREF="/user/checkout"> Check out! </A>

Once again, the HTML isn't great, but it gives us something we can pass to the design people to style up nicely. Now on to checking out the cart...

Check Out

The hardest part about building an e-commerce application is interacting with the payment and credit-card fulfillment service. We'll use the Business::OnlinePayment module to handle that side of things, and handle the order fulfillment by simply sending an email.

The actual check-out page needs to collect credit card and delivery information, and so it doesn't actually need any objects; the only object we actually need is the ISellIt::User, and that was stashed away in the request object by the authentication routine. However, we do want to display the total cost. So to make things easier we'll add an action and compute this in Perl. We make the total cost a method on the user, so we can use this later:

  package ISellIt::User;
  use List::Util qw(sum);
  sub basket_cost {
    my $self = shift;
    sum map { $_->item->price }
    $self->basket_items
  }

And define checkout to add this total to our template:

  sub checkout :Exported {
    my ($self, $r) = @_;
    $r->{template_args}{total_cost} = $r->{user}->basket_cost;
  }

Now we write our user/checkout template:

  [% PROCESS header %]
  <h2> Check out </h2>

  <p> Please enter your credit card and delivery details. </p>

  <form method="post" action="https://www.isellit.com/user/do_checkout">
    <P>
    First name: <input name="first_name" value="[% my.first_name %]"><BR>
    Last name: <input name="last_name" value="[% my.last_name %]"></P>
    <P>
    Street address: <input name="address" value="[% my.address1 %]"><BR>
    City: <input name="city" value="[% my.address2 %]"><BR>
    State: <input name="state" value="[% my.state %]">
    Zip: <input name="zip" value="[% my.postal_code %]">
    </P>

    <P>
    Card type: <select name="type">
      <option>Visa</option>
      <option>Mastercard</option>
      ...
    </select>

    Card number: <input name="card_number"> 
    Expiration: <input name="expiration"> <BR>
    Total: $ [% total_price %]
    </P>
    <P>
    Please click <B>once</B> and wait for the payment to be
    authorised.... <input type="submit" value="order">
  </form>

What happens when this data is sent to the do_checkout action? (Over SSL, you'll notice.) First of all, we'll check if the user has entered address details for the first time, and if so, store them in the database. Perhaps unnecessary in this day of browsers that auto-fill forms, but it's still a convenience. Maypole stores the POST'ed in parameters in params:

  sub do_checkout :Exported {
    my ($self, $r) = @_;
    my %params = %{$r->{params}};
    my $user = $r->{user};

    $user->address1($params{address}) unless $user->address1;
    $user->address2($params{city})  unless $user->address2;
    $user->state($params{state})    unless $user->state;
    $user->postal_code($params{zip})  unless $user->postal_code;

We need to construct a request to go out via Business::OnlinePayment; thankfully, the form parameters we've received are going to be precisely in the format that OnlinePayment wants, thanks to careful form design. All we need to do is to insert our account details and the total:

    my $tx = new Business::OnlinePayment("TCLink");
    $tx->content(%params,
      type   => "cc",
      login  => VENDOR_LOGIN,
      password => VENDOR_PASSWORD,
      action   => 'Normal Authorization'
      amount   => $r->{user}->basket_total
    );

Now we can submit the payment and see what happens. If there's a problem, we add a message to the template and send the user back again:

    $tx->submit;
    if (!$tx->is_success) {
      $r->{template_args}{message} = 
        "There was a problem authorizing your transaction: ".
        $tx->error_message;
      $r->{template} = "checkout";
      return;
    }

Otherwise, we have our money; we probably want to tell the box-shifters about it, or we lose customers fast:

    fulfill_order(
      address_details => $r->{params},
      order_details   => [ map { $_->item } $r->{user}->cart_items ],
      cc_auth     => $tx->authorization
    );

And now we empty the shopping cart, and send the user on his way:

    $_->delete for $r->{user}->cart_items;
    $r->{template} = "frontpage";
  }

Done! We've taken a user from logging in, adding goods to the cart, credit card validation, and checkout. But... wait. How did we get our user in the first place?

Registering a User

We have to find a way to sign a user up. This is actually not that hard, particularly since we can use the example of Flox in the Maypole manual. First, we'll add a "register" link to our login template:

  <P>New user? <A HREF="/user/register">Sign up!</A></P>

This page doesn't require any objects to be loaded up, since it's just going to display a registration form; we can just add our template in /user/register:

  [% INCLUDE header %]
  <P>Welcome to buying with iSellIt!</P>

  <P>To set up your account, we only need a few details from you:
  </P>

  <FORM METHOD="POST" ACTION="/user/do_register">
    <P>Your name:
    <input name="first_name"> 
    <input name="last_name"> </P>
    <P>Your email address: <input name="email"> </P>
    <P>Please choose a password: <input name="password"> </P>
    <input type="submit" name="Register" value="Register">
  </FORM>

As before, we need to explain to Class::DBI::FromCGI how these fields are to be edited:

  ISellIt::User->untaint_columns(
    printable => [qw/first_name last_name password/],
    email   => [qw/email/],
  );

And now we can write our do_register event, using the FromCGI style:

  sub do_register :Exported {
    my ($self, $r) = @_;
    my $h = CGI::Untaint->new(%{$r->{params}});
    my $user = $self->create_from_cgi($h);

If there were any problems, we send them back to the register form again:

    if (my %errors = $obj->cgi_update_errors) {
      $r->{template_args}{cgi_params} = $r->{params};
      $r->{template_args}{errors} = \%errors;
      $r->{template} = "register";
      return;
    }

Otherwise, we now have a user; we need to issue the cookie as if the user had logged in normally. Again, this is something that UserSessionCookie looks after for us:

    $r->{user} = $user;
    $r->login_user($user->id);

And finally we send the user on his or her way again:

    $r->{template} = "frontpage";
  }

There we go: now we can create new users; provision of a password reminder function is an exercise for the interested reader.

Maypole Summary

We've done it -- we've created an e-commerce store in a very short space of time and with a minimal amount of code. One of the things that I like about Maypole is the extent to which you only need to code your business logic; all of the display templates can be mocked up and then shipped off to professionals, and the rest of the work is just handled magically behind the scenes by Maypole.

Thanks to the TPF funding of Maypole, we now have an extensive user manual with several case studies (this one included), and a lively user and developer community. I hope you too will be joining it soon!

This Fortnight on Perl 6, Weeks Ending 2004-04-18

The only problem with summarizing two week's worth of Perl 6 happenings is that there's twice as much stuff to summarize. Still, there's no way I could have made the time to write a summary last week so I'll take my lumps. I am exceedingly grateful that Apocalypse 12 (Objects) wasn't released the Thursday before Easter though, as now I can clear the decks for the expected perl6-language explosion next week.

We'll start with perl6-internals as usual.

Initializers, Finalizers and Fallbacks

There was some discussion of the various functions that get called by object initialization/destruction, and so on. Dan wondered what he'd been thinking when he declared that there would be distinct FINALIZE, DELETE, and CLEANUP properties. (Instead of declaring that a function must be called, say, FINALIZE, you can mark any function with a FINALIZE property, and Parrot will recognize that as the function to call at finalization time). Andy Wardley quibbled about American/British spelling, but Tim Bunce pointed out that the 'ize' form is preferred by the Oxford English Dictionary (and your humble summarizer).

Leo, meanwhile, made a (Warnocked) proposal for a new init scheme.

http://groups.google.com/groups?selm=a06010207bc988256b491@[172.24.18.98]

http://groups.google.com/groups?selm=200404060833.i368Xou16545@thu8.leo.home

New SDL Parrot Bindings

Taking advantage of Parrot's new, improved object system, chromatic updated us all on his efforts to provide a shiny OO interface to the SDL library. Jens Rieks wondered about chromatic's implementation choices, and posted a few suggestions and questions. Dan reckoned he'd prefer it if the SDL bindings used Parrots internal events systems. The catch being that Parrot doesn't actually have an internal events system yet...

Later in the fortnight, chromatic posted an initial release and promised to post his notes and slides from the Portland Perl Mongers meeting where he was showing it off. There's no sign of 'em yet though.

http://groups.google.com/groups?selm=1081225995.12079.27.camel@localhost

http://groups.google.com/groups?selm=1081317984.19190.2.camel@localhost

new Method

Jens Rieks and chromatic were unsure of the best name for a constructor method. They'd like to be able to write a method called new, but IMCC wouldn't allow it. Leo Tötsch pointed out that there's already a default constructor method: __init. However, chromatic wasn't too keen on it because he wanted to be able to pass arguments to the constructor. Leo pointed out that, although it wasn't officially supported, you could pass arguments in the same was as if you were making a normal parrot function call.

Dan pointed out that our current initialization system is some way from being the final one (which will use properties to mark constructor methods). What we have now is more like an allocator than a real constructor.

http://groups.google.com/groups?selm=200404041830.09293.parrot@jensbeimsurfen.de

Overriding __find_method in PASM?

Noting that, according to PDD15, defining a method called __find_method should allow Perl 5 AUTOLOAD-like behavior, chromatic wondered if it was workable yet and, if it wasn't, how he should go about adding it. Leo confessed that what was in the docs was a cut and paste error and AUTOLOAD behavior hadn't yet been defined. He suggested a workaround using exceptions (which turned out to be overkill for what chromatic needed, but it looks interesting.)

http://groups.google.com/groups?selm=1081138355.7995.26.camel@localhost

Language Interoperability

Randy W. Sims popped over from the Module-Build mailing list, where they've been discussing plugin architectures to allow for the modification and extension of Module::Build's capabilities. One of the desiderata is that, once the transition to Parrot is underway, it should be possible to write plugins in any language supported by Parrot. (Mmm... a build system written in Befunge, it's what the world's been crying out for I tell you). There are currently two competing schemes, one involving manipulating the class hierarchy at runtime and the other involving plugins held in particular directories and a formal API. Randy wondered if there were any technical reasons to choose one scheme or another.

Dan reckoned that there were no particular technical reasons, but the inheritance based scheme rather offended his sensibilities, going so far as to say "No way in hell it'll ever make it in as parrot's standard module building system if you do it [the inheritance munging solution]".

http://groups.google.com/groups?selm=406FDFAB.3050008@thepierianspring.org

Save the Return Continuation!

The ongoing discussion about continuations seems to be fizzling slightly. Piers Cawley had proposed moving the return continuation out of P1 and held 'somewhere else' (possibly the control stack) and then, when a routine needed to do cunning things with the current continuation it could access it with some op (say get_current_cont). Dan reckoned he was okay with the idea, and Luke Palmer voted in favor of a special RC register. Dan asked for further discussion and the whole thing got tossed onto the horns of Warnock's Dilemma.

http://groups.google.com/groups?selm=a06010205bc987b530fde@[172.24.18.98]

Rounding Up Pending Objects

Dan asked for a list of pending issues with objects so he could get things nailed down and finished so we could move on. Jarkko Hietaniemi, chromatic and Leo all chimed in with suggestions.

http://groups.google.com/groups?selm=a06010206bc9880273177@[172.24.18.98]

Release the Streams!

Jens Rieks posted a "working version" of his new Stream library, promising more documentation later in the month. Leo and Dan looked on it, saw that it was good, and granted Jens commit privileges. Congratulations are in order.

Later, Leo found some bugs around continuation and context handling and set about tracking it down. (Jens had thought it was an issue with string handling.)

http://groups.google.com/groups?selm=200404062141.12274.parrot@jensbeimsurfen.de

Parrot Libraries

Leo pointed out that we currently have two different places to find Parrot runtime stuff: runtime/parrot/* and library/. He proposed we pick one, add some support for handling library search paths, work out a scheme to organize library paths and add some library tests. Dan decided everything should go in runtime/parrot and mused about handling library metadata.

http://groups.google.com/groups?selm=4073A4CC.7010006@toetsch.at

Splitting interpreter.c

Apparently interpreter.c runs to rather more than 2500 lines. Leo proposed splitting it into multiple files. Dan told him to go for it.

http://groups.google.com/groups?selm=4073BAAC.5030806@toetsch.at

Incorporating ICU

I have a rule of thumb about Unicode: Nobody likes it. Nor does anyone dislike it enough to come up with something better.

Jeff Clites dropped the list a line to let everyone know that he's still working on integrating the ICU Unicode library into Parrot. (A thankless task if ever there was one.) With some encouragement from Dan he posted his (huge) patch. After some debate, Dan checked it in giving a baseline to start dealing with any issues.

Jeff explained the rationale of his approach (which I have to confess I skimmed, I don't care how strings work, so long as they work). Jarkko liked it, noting that other approaches lead "into combinatorial explosion and instant insanity". Jarkko went on to share his Unicode pain and generally back Jeff up in discussions with Leo. If you're interested in the gory details of Unicode implementation, I commend this thread to you. Or you can just trust Jeff, Jarkko, Leo, Larry and Dan to get it right (which is what I'm doing).

http://groups.google.com/groups?selm=CFB7F3BE-88B8-11D8-ADF3-000393A6B9DA@mac.com

http://groups.google.com/groups?selm=612F0D93-8A6A-11D8-ADF3-000393A6B9DA@mac.com -- Jeff's explanations

Tracking JIT Down

In another of his ongoing series of simple Perl tasks for the interested, Dan asked for a script to generate a list of all the ops that aren't JITted (along with a few extra goodies that would be nice). Stefan Lidman was the man with the script, which was rapidly checked in.

http://groups.google.com/groups?selm=a0610050ebc9a03e47738@[10.0.1.2]

Diamond Inheritance Is Broken

If you've ever sung bass in choir you'll be aware that sometimes a bass line is sung on one note for rather a long time. For the past two weeks perl6-internals' repetitive bass note has been the failure of test 17 in t/pmc/object-meths.t. Should you find yourself building a CVS parrot and get caught by this, please be aware that we know about the problem it's just Dan's suffering from a tuit shortage and there are other important strings he's concentrating on.

New Libraries

Jens "The librarian" Rieks released another set of libraries, "Data::Sort", "Data::Replace" and "Data::Escape". Tim Bunce wasn't that keen on his choice of names (and indeed functionality). The current front runners for new names for these are "PMC::DeepReplace", "PMC::Printable", "PMC::Sort" and "PMC::Dumper".

http://groups.google.com/groups?selm=200404082028.49211.parrot@jensbeimsurfen.de

Attribute Questions

Mark Sparshatt wondered how to handle class attributes, with particular reference to implementing Ruby. Dan reckons we'll get proper class attributes once he's sorted out metaclasses. Mark muddied the water somewhat by pointing out that Ruby has two kinds of class attributes; ones that are hung off metaclasses and those that are (I think) held in the class namespace. Annoyingly they have two distinct behaviors.

http://groups.google.com/groups?selm=4075919F.30905@yahoo.co.uk

Tcl PMCs

Will Coleda posted his first cut at a set of PMCs to support TCL semantics. He apparently had problems with the Array PMC's assumption that 'empty' slots contained PerlUndefs, which meant he had to implement a custom TclArray PMC. For the rest of the thread Will and Leo worked out how to re-jig the patch so the PMCs could be dynamically loaded before Leo checked it into the repository.

http://groups.google.com/groups?selm=rt-3.0.8-28393-84164.14.5997263059787@perl.org

Parrot Everywhere

I've not really mentioned his work in recent summaries, but Marcus Thiesen has been doing sterling work helping to get Parrot up and running on a bewildering variety of systems. Thanks for the sterling work Marcus.

http://www.luusa.org/~marcus/parrottest -- Marcus's Smoker

Warnocked

Bryan C. Warnock posted a patch to Parrot's CREDITS, correcting a long defunct email address. You might enjoy the patch:

    N: Bryan C. Warnock
   +D: Little things here and there in pre-Parrot days.
   +D: And, yes, {sigh}, *that* Warnock.
   +E: bwarnock@raba.com

He's too modest of course, Bryan started off writing the Perl 6 Summaries. When he stopped doing them due (I presume) to a lack of time, I missed them so much I started writing my own. So don't blame me for these, blame Bryan.

Rather appropriately, nobody commented on the patch.

http://groups.google.com/groups?selm=rt-3.0.8-28383-84147.3.00438339593441@perl.org

Unicode Step By Step

Leo Tötsch posted a quick overview of steps to get Unicode support into Parrot. Right now, if you turn Unicode on, your (at least) first build is going to take a looong time.

Debate centered on whether or not the Parrot distribution should include the full ICU distribution. (It's looking like a qualified yes, but we will attempt to use an existing installation of ICU if we can find it.)

http://groups.google.com/groups?selm=4077F315.6060200@toetsch.at

Disappearing PASM Files in the Test Directory

Leo wondered what had happened to the generated .pasm files in t/*/ (he wasn't alone in this, but he was the person who posted). Will Coleda confessed that he'd doctored Parrot::Test so that they ended up in /tmp (probably). He didn't say why.

http://groups.google.com/groups?selm=4077EC4D.3040806@toetsch.at

ICU Build Pains

I don't normally discuss issues people have with building Parrot on various different machines either (the threads usually die out quite quickly: "Did you do this?" "Oh! Thanks, that worked.") but the ICU check in seems to have caused no small amount of pain on Linux systems for some reason.

Alberto Manuel Brandao Simoes posted an error log for a failing build. Jeff Clites, our Guru of ICU set about helping him to track the problem down with incomplete success. Dan pointed everyone at Debian's patches to get ICU to build, and suggested that people wait for his patch to allow the use of an existing ICU installation.

Various other threads continued the discussion, at the end of which Dan had checked in a patch that seemed to solve the problems.

http://groups.google.com/groups?selm=40784BF0.20608@alfarrabio.di.uminho.pt

http://groups.google.com/groups?selm=a06100503bca05b379e6f@[10.0.1.2]

http://groups.google.com/groups?selm=a06100505bca05f409099@[10.0.1.2]

Tangled Strings

Dan posted the beginnings of his plan for how strings are going to work in Parrot. On the face of it, not a contentious issue. However, strings are text, and text is a human cultural artifact, which means there's politics and really, really, really ugly complexities to deal with if you want to Do It Right (assuming you can decide what Right is). There was much discussion. And then there was some more. The trouble is, this stuff is Important (and it's very important that we get it right *before* we start implementing the matching engine, otherwise some of the assumptions it might make about how fast various string manipulations are might turn out to be very wrong indeed...) and Hard. Because it's Hard it's rather tricky to summarize, so I'm going to punt and just give you the root message.

http://groups.google.com/groups?selm=a06100500bca0384ce81c@[10.0.1.2]

Basic Library Paths

Dan finally got 'round to designing how Parrot was going to handle searching for libraries and such. Oh, and he and Jarkko engaged in some unseemly bragging about VMS which has had all this stuff fixed for ages. There was a fair bit of discussion, but the response was generally positive.

http://groups.google.com/groups?selm=a06100500bca449931036@[10.0.1.2]

Alternative Object Initializer Calling Scheme

Leo announced that he'd added a new, property based scheme for object initialization. Instead of initializing an object automagically with the __init method, you mark any method with the BUILD property and Parrot handles calling it for you. You do have to set the CALL__BUILD environment variable before starting Parrot to make use of it though.

http://groups.google.com/groups?selm=40768F70.1010106@toetsch.at

Joseph Höök Is Back

Long time no see Joseph.

http://groups.google.com/groups?selm=381-220044011141848685@kth.se

Version Bump Time?

Dan suggested that, once the ICU patch is properly nailed down, it could be time to start the push to a 0.1.1 (or even 0.2.0) release.

http://groups.google.com/groups?selm=a06100502bc9cbc80e445@[172.24.18.98]

Lies, Damned Lies, and Benchmarks

Leo posted a set of benchmark timings for the OO examples when run with all current optimizations. The numbers are looking rather good: Parrot's faster than everything on all but one test, where it's outperformed by Python. Of course, these aren't the benchmarks that'll determine whether Dan gets a Pie at OSCON...

http://groups.google.com/groups?selm=408009CE.2070804@toetsch.at

PMC Constants

Leo asked for comments on a proposal for dealing with PMC constants. No comments so far.

http://groups.google.com/groups?selm=407FF01A.2080102@toetsch.at

Meanwhile, in perl6-language...

Backticks

A proposal for a new use of backticks was made (because the proposer didn't think the current semantics deserved their privileged place in the language's Huffman table). Some people disagreed. Some people disagreed rather strongly. Toys were thrown out of prams. People called each other narrow minded. It wasn't pretty. With any luck people are going to calm down, apologize to each other for getting so aerated over something so trivial, and the list can settle down to the more rewarding task of dealing with the implications of Apocalypse 12.

http://groups.google.com/groups?selm=20040414121848.GJ3645@c4.convolution.nl

Compatibility with Perl 5

Dave Cantrell wondered how Perl 6 would spot legacy code. Everyone forgot to refer him to the appropriate section of Apocalypse 1 in which Larry lays down the two rules:

  • Files that are pulled in with require etc will be deemed to be Perl 6 unless they contain a package declaration.
  • Files that are run as scripts (perl some_script.pl) are treated as Perl 5 unless it's obviously Perl 6. The proposed way of making this obvious would be to begin the script with module Main.

Easy eh? It didn't stop the thread running and running though (not helped by someone getting the rules of thumb rather badly wrong in the early stages).

http://groups.google.com/groups?selm=20040413121602.GA5213@bytemark.barnyard.co.uk

Oooh, Look, It's an Apocalypse

Apocalypse 12 finally stepped out of the drafty shadows into the glare of publicity. It's very long. I expect next week will be rather busy on p6l.

http://www.perl.com/pub/a/2004/04/16/a12.html

And We're Done

A reminder to everyone on perl6-language: Play nice.

If you find these summaries useful or enjoyable, please consider contributing to the Perl Foundation to help support the development of Perl. You might also like to send me feedback at mailto:p6summarizer@bofh.org.uk.

http://donate.perl-foundation.org/ -- The Perl Foundation

http://dev.perl.org/perl6/ -- Perl 6 Development site

Rapid Web Application Deployment with Maypole

You have a database. You have a web server. You have a deadline.

Whether it's bringing up an e-commerce storefront for a new venture, implementing a new front-end to HR's employee database, or even providing a neat way to track citations for U.S. English slang terms, it's always the same story -- and the deadline is always yesterday.

For this month of April, I'm working on a Perl Foundation sponsorship to develop a project of mine called Maypole, which enables Perl programmers to get web front-ends to databases, as well as complex web-based applications, up and running quickly.

Extremely quickly, and with very little Perl coding required. I've used Maypole to set up an Intranet portal, a database and display system for choosing menus and recipes, song lyric and chord sheet projection software, an open-source social network site, and a web database of beer-tasting notes; and that just was in the past two weeks.

Maypole's flexibility stems from three fundamentals:

To demonstrate these three principles, we're going to look at a bread-and-butter web application -- an online shop's product catalogue -- and see how quickly we can put it together with Maypole.

Separation of Concerns

Maypole was originally called Apache::MVC, reflecting its basis in the Model-View-Controller design pattern. (I had to change it firstly because Maypole isn't tied to Apache, and secondly because Apache::MVC is a really dull name.) It's the same design pattern that forms the foundation of similar projects in other languages, such as Java's Struts framework.

This design pattern is found primarily in graphical applications; the idea is that you have a Model class that represents and manipulates your data, a View class that is responsible for displaying that data to the user, and a Controller class that controls the other classes in response to events triggered by the user. This analogy doesn't correspond precisely to a web-based application, but we can take an important principle from it. As Andy Wardley explains:

What the MVC-for-the-web crowd is really trying to achieve is a clear separation of concerns. Put your database code in one place, your application code in another, your presentation code in a third place. That way, you can chop and change different elements at will, hopefully without affecting the other parts (depending on how well your concerns are separated, of course). This is common sense and good practice. MVC achieves this separation of concerns as a byproduct of clearly separating inputs (controls) and outputs (views).

This is what Maypole does. It has a number of database drivers, a number of front-end drivers, and a number of templating presentation drivers. In common cases, Maypole provides precisely what you need for all of these areas, and you get to concentrate on writing just the business logic of your application. This is one of the reasons why Maypole lets you develop so rapidly -- because most of the time, you don't need to do any development at all.

Let's begin, then, by choosing what elements are going to make up our product database. We will actually be using what is by far the most common configuration of model, view, and controller classes: Maypole provides a model class based on Class::DBI, a view class based on Template::Toolkit, and a controller class based on Apache mod_perl. We'll come to what all of this means in a second, but because this configuration is so common, it is the default; no code is required to set that up.

We will, however, need a database. Our client is going to be iSellIt, a fictitious supplier of computer components and software. We will have database tables for products, manufacturers, and categories of stuff, and subcategories of categories. Here's what that database might look like.

    CREATE TABLE product (
        id int NOT NULL auto_increment primary key,
        category int,
        subcategory int,
        manufacturer int,
        part_number varchar(50),
        name varchar(50),
        cost decimal(6,2),
        description text
    );

    CREATE TABLE manufacturer (
        id int NOT NULL auto_increment primary key,
        name varchar(50),
        url varchar(255),
        notes text
    );

    CREATE TABLE category (
        id int NOT NULL auto_increment primary key,
        name varchar(50)
    );

    CREATE TABLE subcategory (
        id int NOT NULL auto_increment primary key,
        name varchar(50),
        category integer
    );

We're going to assume that we've loaded some data into this database already, but we're going to want the sales people to update it themselves over a web interface.

In order to use Maypole, we need what's called a driver module. This is a very short Perl module that defines the application we're working with. I say it's a Perl module, and that may make you think this is about writing code, but to be honest, most of it is actually configuration in disguise. Here's the driver module for our ISellIt application. (The client may be called iSellIt, but many years exposure to Perl module names makes me allergic to starting one with a lowercase letter.)

    package ISellIt;
    use base 'Apache::MVC';
    use Class::DBI::Loader::Relationship;

    ISellIt->setup("dbi:mysql:isellit");
    ISellIt->config->{uri_base} = "http://localhost/isellit";
    ISellIt->config->{rows_per_page} = 10;
    ISellIt->config->{loader}->relationship($_) for 
        ("a manufacturer has products", "a category has products",
         "a subcategory has products", "a category has subcategories");

    1;

Ten lines of code; that's the sort of size you should expect a Maypole application to be. Let's take it apart, a line at a time:

    package ISellIt;

This is the name of our application, and it's what we're going to tell Apache to use as the Perl handler for our web site.

    use base 'Apache::MVC';

This says that we're using the Apache front-end to Maypole, and so we're writing a mod_perl application.

    use Class::DBI::Loader::Relationship;

Now we use a Perl module that I wrote to help put together Maypole driver classes. It allows us to declare the relationships between our database tables in a straightforward way.

    ISellIt->setup("dbi:mysql:isellit");

We tell ISellIt to go connect to the database and work out the tables and columns in our application. In addition, because we haven't changed any class defaults, it's assumed that we're going to use Class::DBI and Template Toolkit. We could have said that we want to use Apache::MVC with DBIx::SearchBuilder and HTML::Mason, but we don't.

Maypole's Class::DBI-based class uses Class::DBI::Loader to investigate the structure of the database, and then map the product table onto a ISellIt::Product class, and so on. You can read more about how Class::DBI's table-class mapping works in Tony's article about it.

    ISellIt->config->{uri_base} = "http://localhost/isellit";

ISellIt sometimes needs to know where it lives, so that it can properly produce links to other pages inside the application.

    ISellIt->config->{rows_per_page} = 10;

This says that we don't want to display the whole product list on one page; there'll be a maximum of 10 items on a page, before we get a page-view of the list.

    ISellIt->config->{loader}->relationship($_) for 
        ("a manufacturer has products", "a category has products",
         "a subcategory has products", "a category has subcategories");

Now we define our relationship constraints, in reasonably natural syntax: a manufacturer has a number of products, and a category will delimit a collection of products, and so on.

Ten lines of code. What has it got us?

Sensible Defaults

The second foundation of Maypole is its use of sensible defaults. It has a system of generic templates that "do the right thing" for viewing and editing data in a database. In many cases, web application programmers won't need to change the default behavior at all; in the majority of cases, they only need to change a few of the templates, and in the best cases, they can declare that the templating is the web design group's problem and not need to do any work at all.

So, if we install the application and the default templates, and go to our site, http://localhost/isellit; we should see this:

Which is only fair for 10 lines of code. But it gets better, because if we click on, say, the product listing, we get a screen like so:

Now that's something we could probably give to the sales team with no further alterations needed, and they could happily add, edit, and delete products.

Similarly, if we then click on a manufacturer in that products table, we see a handy page about the manufacturer, their products, and so on:

Now I think we are getting some worth from our 10 lines. Next, we give the templates to the web designers. Maypole searches for templates in three different places: first, it looks for a template specific to a class; then it looks for a custom template for the whole application; finally, it looks in the factory directory to use the totally generic, do-the-right-thing template.

So, to make a better manufacturer view, we tell them to copy the factory/view template into manufacturer/view and customize it. We copy factory/list into product/list and customize it as a listing of products; we copy factory/header and factory/footer into the custom/ directory, and turn them into the boilerplate HTML surrounding every page, and so on.

Now, I am not very good at HTML design, which is why I like Maypole -- it makes it someone else's problem -- but this means I'm not very good at showing you what sort of thing you can do with the templates. But here's a mock-up; I created product/view with the following template:

    [% INCLUDE header %]
    [% PROCESS macros %]

    <DIV class="nav"> You are in: [% maybe_link_view(product.category) %] > 
    [% maybe_link_view(product.subcategory) %] </DIV>

    <h2> [% product.name %]</h2>
    <DIV class="manufacturer"> By [% maybe_link_view(product.manufacturer) %] 
    </DIV>
    <DIV class="description"> [% product.description %] </DIV>

    <TABLE class="view">
    <TR>
        <TD class="field"> Price (ex. VAT) </TD> 
        <TD> &pound; [% product.cost %] </TD>
    </TR>
    <TR>
        <TD class="field"> Part number  </TD> 
        <TD> [% product.part_number %] </TD>
    </TR>
    </TABLE>

    [% button(product, "order") %]

Producing the following screenshot. It may not look better, but at least it proves things can be made to look different.

We've written a Template Toolkit template; the parts surrounded in [% ... %] are templating directives. If you're not too familiar with the Template Toolkit, the Maypole manual's view documentation has a good introduction to TT in the Maypole context.

Maypole provides a number of default Template macros, such as maybe_link_view, which links an object to a page viewing that object, although all of these can be overridden. It also passes in the object product, which it knows to be the one we're talking about.

In fact, that's what Maypole is really about: we've described it in terms of putting a web front-end onto a database, but fundamentally, it's responsible for using the URL /product/view/210 to load up the product object with ID 210, call the view method on its class, and pass it to the view template. Similarly, /product/list calls the list method on the product class, which populates the template with a page full of products.

The interesting thing about this template is that very last line:

    [% button(product, "order") %]

This produces a button which will produce a POST to the URL /product/order/210, which does the same as view except this time calls the order method. But Maypole doesn't yet know how to order a product. This is OK, because we can tell it.

Ease of Extensibility

Maypole's third principle is ease of extensibility. That is to say, Maypole makes it very easy to go from a simple database front-end to a full-fledged web application. Which is just as well; as has been simulated above, once the templates come back from the web designers, you find that what you thought was just going to be a product database has become an online shop. And you've still got a deadline.

But before we start extending our catalogue application to take on the new specifications (which we'll do in the second article about this), let's take a look at what we've achieved so far and what we need immediately.

We've got a way to list all the products, manufacturers, categories, and subcategories in our database; we have a way to add, edit and delete all of these things; we can search for products by manufacturer, price, and so on. What's to stop us deploying this as a customer-facing web site, as well as for Intranet updates to the product catalogue?

The immediate problem is security. We can add, edit, and delete products -- but so can anyone else. We want to allow those coming from the outside world only to view, list and search; for everything else, we require the user to be coming from an IP address in our internal range. (For now; we'll add the concept of a user when we're adding the shopping cart, and the idea of privileged user won't be far off that.)

Unfortunately, now we want some user-defined behavior, we have to start writing code. Thankfully, we don't have to write much of it. We add a few lines to our driver class, first to define our private IP address space as a NetAddr::IP object, since that provides a handy way of determining if an address is in a network:

    use constant INTERNAL_RANGE => "10.0.0.0/8";
    use NetAddr::IP;
    my $range = NetAddr::IP->new(INTERNAL_RANGE);

Now we write our authentication method; Maypole's default authenticate allows everyone access to everything, so we need to override this.

    use Maypole::Constants;
    sub authenticate {
        my ($self, $r) = @_;

        # Everyone can view, search, list
        return OK if $r->action =~ /^(view|search|list)$/;

        # Else they have to be in the internal network
        my $ip = NetAddr::IP->new($r->{ar}->connection->remote_ip);
        return OK if $ip->within($range);
        return DECLINED;
    }

The authenticate class method gets passed a Maypole request object; this is like an Apache request object, but at a much, much higher level -- it contains information about the web request, the class that's going to be used to fulfill the request, the method we need to call on the class, the template that's going to be processed, any objects, form parameters, and query parameters, and so on.

At this point, Maypole has already parsed the URI into its component database table, action, and additional arguments, so we first check to see if the action is one of the universally permitted ones.

If not, we extract the Apache::Request object stashed inside the Maypole object, and ask it for the remote IP address. If it's in the private range, we can do everything. If not, we can do nothing. Simple enough.

It's almost ready to go live, when the design guys tell you that they'd really love to put a picture alongside the description of a product. No problem.

There are two ways to do this; the way that seems really easy uses the file system to store the pictures, and has you put something like this in the template:

    <IMG SRC="/static/product_pictures/[% product.id %].png">

But while that's very simple for viewing pictures, and makes a great mockup, it's not that easy to upload pictures. So you decide to put the pictures in the database. You add a "picture" binary column to the product table, and then you consult the Maypole manual.

One of the great things about this Perl Foundation sponsorship is that it's allowing me to put together a really great manual, which contains all sorts of tricks for dealing with Maypole; the Request chapter contains a couple of recipes for uploading and displaying photos.

What we need to do is create some new actions -- one to upload a picture, and one to display it again. We'll only show the one to display a picture, since you can get them both from the manual, and because looking at this turns out to be a handy way to understand how to extend Maypole more generally.

It's useful to visualize what we're going to end up with, and work backwards. We'll have a URL like /product/view_picture/210 producing an image/png or similar page with the product's image. This allows us to put in our templates:

    <IMG SRC="/product/view_picture/[% product.id %]/">

And have the image displayed on our product view page. In fact, we're more likely to want to say:

    [% IF product.picture %]
    <IMG SRC="/product/view_picture/[% product.id %]/">
    [% ELSE %]
    <IMG SRC="/static/no_picture.png">
    [% END %]

Now, we've explained that Maypole turns URLs into method calls, so we're going to be putting a view_picture method in the product's class; this class is ISellIt::Product, so we begin like this:

    package ISellIt::Product;
    sub view_picture {
        my ($self, $r) = @_;
        # ...
    }

This has a big problem. We don't actually want people to be able to call any method on our class over the web; that would be unwise. Maypole will refuse to do this. So in order to tell Maypole that we're allowed to call this method remotely, we decorate it with an attribute:

    sub view_picture :Exported {
        my ($self, $r) = @_;
    }

At this point, we can call view_picture over the Web; we now need to make it populate the Maypole request with the appropriate data:

    sub view_picture :Exported {
        my ($self, $r, $product) = @_;
        if ($product) {
            $r->{content_type} = "image/png";
            $r->{content} = $product->picture;
        }
    }

This is a slightly unusual Maypole method, because we're bypassing the whole view class processing and templating stages, and generating content manually, but it serves to illustrate one thing: Maypole arranges for the appropriate object to be passed into the method; we've gone from URL to object without requiring any code of our own.

When we come to implementing ordering, in our next article, we'll be adding more actions like this to place the product in a user's shopping cart, check out, validate his credit card and so on. But this should be good enough for now: a templated, web-editable product database, with pictures, without stress, without too much code, and within the deadline. Well, almost.

Summary

Maypole is evolving rapidly, thanks primarily to the Perl Foundation, who have enabled me to work on it for this month; their sponsorship has allowed me to write many thousands of words of articles, sample applications, and Maypole-related code, and this has helped Maypole become an extremely useful framework for developing web applications.

Apocalypse 12

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 12 for the latest information.

The official, unofficial slogan of Perl 6 is "Second System Syndrome Done Right!". After you read this Apocalypse you will at least be certain that we got the "Second System" part down pat. But we've also put in a little bit of work on the "Done Right" part, which we hope you'll recognize. The management of complexity is complex, but only if you think about it. The goal of Perl 6 is to discourage you from thinking about it unnecessarily.

Speaking of thinking unnecessarily, please don't think that everything we write here is absolutely true. We expect some things to change as people point out various difficulties. That's the way all the other Apocalypses have worked, so why should this one be different?

When I say "we", I don't just mean "me". I mean everyone who has participated in the design, including the Perl 6 cabal, er, design team, the readers (and writers) of the perl6-language mailing list, and all the participants who wrote or commented on the original RFCs. For this Apocalypse we've directly considered the following RFCs:


    RFC  PSA  Title
    ===  ===  =====
    032  abb  A Method of Allowing Foreign Objects in Perl
    067  abb  Deep Copying, a.k.a., Cloning Around
    092  abb  Extensible Meta-Object Protocol
    095  acc  Object Classes
    101  bcc  Apache-Like Event and Dispatch Handlers
    126  aaa  Ensuring Perl's Object-Oriented Future
    137  bdd  Overview: Perl OO Should Not Be Fundamentally Changed
    147  rr   Split Scalars and Objects/References into Two Types
    152  bdd  Replace Invocant in @_ with self() builtin
    163  bdd  Objects: Autoaccessors for Object Data Structures
    171  rr   My Dog $spot Should Call a Constructor Implicitly
    174  bdd  Improved Parsing and Flexibility of Indirect Object Syntax
    187  abb  Objects : Mandatory and Enhanced Second Argument to bless
    188  acc  Objects : Private Keys and Methods
    189  abb  Objects : Hierarchical Calls to Initializers and Destructors
    190  acc  Objects : NEXT Pseudoclass for Method Redispatch
    193  acc  Objects : Core Support for Method Delegation
    223  bdd  Objects: use invocant Pragma
    224  bdd  Objects : Rationalizing ref, attribute::reftype, and
                builtin:blessed
    244  cdr  Method Calls Should Not Suffer from the Action on a Distance
    254  abb  Class Collections: Provide the Ability to Overload Classes
    256  abb  Objects : Native Support for Multimethods
    265  abc  Interface Polymorphism Considered Lovely
    277  bbb  Method Calls Should Suffer from Ambiguity by Default
    307  rr   PRAYER: What Gets Said When You bless Something
    335  acc  Class Methods Introspection: What Methods Does this Object
                Support?
    336  bbb  Use Strict 'objects': A New Pragma for Using Java-Like
                Objects in Perl

These RFCs contain many interesting ideas, and many more "cries for help". Usually in these Apocalypses, I discuss the design with respect to each of the RFCs. However, in this case I won't, because most of these RFCs fail in exactly the same way--they assume the Perl 6 object model to be a set of extensions to the Perl 5 object model. But as it turns out, that would have been a great way to end up with Second System Syndrome Done Wrong. Perl 5's OO system is a great workbench, but it has some issues that have to be dealt with systematically rather than piecemeal.

Some of the Problems with Perl 5 OO

A little too orthogonal

It has often been claimed that Perl 5 OO was "bolted on", but that's inaccurate. It was "bolted through", at right angles to all other reference types, such that any reference could be blessed into being an object. That's way cool, but it's often a little too cool.

Not quite orthogonal enough

It's too hard to treat built-in types as objects when you want to. Perl 5's tie interface helps, but is suboptimal in several ways, not the least of which is that it only works on variables, not values.

Forced non-encapsulation

Because of the ability to turn (almost) anything into an object, a derived class had to be aware of the internal data type of its base class. Even after convention settled on hashes as the appropriate default data structure, one had to be careful not to stomp on the attributes of one's base class.
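
For instance, in Perl 5 (a made-up but familiar pattern), a subclass that picks a hash key the base class already uses silently breaks it:

    package Counter;
    sub new       { my $class = shift; bless { count => 0 }, $class }
    sub increment { return ++$_[0]->{count} }

    package SheepCounter;
    our @ISA = ('Counter');
    sub new {
        my $class = shift;
        my $self  = $class->SUPER::new(@_);
        $self->{count} = [];    # oops: we wanted our own list of sheep seen,
        return $self;           # but we've just trashed Counter's integer slot
    }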

A little too minimal

Some people will be surprised to hear it, but Perl is a minimalist language at heart. It's just minimalistic about weird things compared to your average language. Just as the binding of parameters to @_ was a minimalistic approach, so too the entire Perl 5 object system was an attempt to see how far you could drive a few features. But many of the following difficulties stem from that.

Too much keyword reuse

In Perl 5, a class is just a package, a method is just a subroutine, and an object is just a blessed referent. That's all well and good, and it is still fundamentally true in Perl 6. However, Perl 5 made the mistake of reusing the same keywords to express similar ideas. That's not how natural languages work--we often use different words to express similar ideas, the better to make subtle distinctions.

Too difficult to capture metadata

Because Perl 5 reused keywords and treated parameter binding as something you do via a list assignment at runtime, it was next to impossible for the compiler to tell which subroutines were methods and which ones were really just subroutines. Because hashes are mutable, it was difficult to tell at compile time what the attribute names were going to be.

Inside-out interfaces

The Perl 5 solution to the previous problem was to declare more things at compile time. Unfortunately, since the main way to do things at compile time was to invoke use, all the compile-time interfaces were shoehorned into use's syntax, which, powerful though it may be, is often completely inside-out from a reasonable interface. For instance, overloading is done by passing a list of pairs to use, when it would be much more natural to simply declare appropriate methods with appropriate names and traits. The base and fields pragmas are also kludges.

Not enough convention

Because of the flexibility of the Perl 5 approach, there was never any "obvious" way to do it. So best practices had to be developed by each group, and of course, everyone came up with a slightly different solution. Now, we're not going to be like some folks and confuse "obvious" with "the only way to do it". This is still Perl, after all, and the flexibility will still be there if you need it. But by convention, there needs to be a standard look to objects and classes so that they can interoperate. There's more than one way to do it, but one of those is the standard way.

Wrong conventions

The use of arrow where most of the rest of the world uses dot was confusing.

Everything possible, but difficult

The upshot of the previous problems was that, while Perl 5 made it easy to use objects and classes, it was difficult to try to define classes or derive from them.

Perl 5 Non-Problems

While there are plenty of problems with Perl 5's OO system, there are some things it did right.

Generating class by running code at compile time

One of the big advances in Perl 5 was that a program could be in charge of its own compilation via use statements and BEGIN blocks. A Perl program isn't a passive thing that a compiler has its way with, willy-nilly. It's an active thing that negotiates with the compiler for a set of semantics. In Perl 6 we're not shying away from that, but taking it further, and at the same time hiding it in a more declarative style. So you need to be aware that, although many of the things we'll be talking about here look like declarations, they trigger Perl code that runs during compilation. Of such methods are metaclasses made. (While these methods are often triggered by grammar rule reductions, remember from Apocalypse 5 that all these grammar rules are also running under the user's control. You can tweak the language without the crude ax of source filtering.)

There are many roads to polymorphism

In looking for an "obvious" way to conventionalize Perl's object system, we shouldn't overlook the fact that there's more than one obvious way, and different approaches work better in different circumstances. Inheritance is one way (and typically the most overused), but we also need good support for composition, delegation, and parametric types. Cutting across those techniques are issues of interface, implementation, and mixtures of interface and implementation. There are multiple strategies for ambiguity resolution as well, and no single strategy is always right. (Unless the boss says so.)

People using a class shouldn't have to think hard

In making it easier to define and derive classes, we must be careful not to make it harder to use classes.

Trust in Convention, But Keep Your Powder Dry

So to summarize this summary, what we're proposing to develop is a set of conventions for how object orientation ought to work in Perl 6--by default. But there should also be enough hooks to customize things to your heart's content, hopefully without undue impact on the sensibilities of others.

And in particular, there's enough flexibility in the new approach that, if you want to, you can still program in a way much like the old Perl 5 approach. There's still a bless method, and you can still pretend that an object is a hash--though it isn't anymore.

However, as with all the rest of the design of Perl 6, the overriding concern has been that the language scale well. That means Perl has to scale down as well as up. Perl has to work well both as a first language and as a last language. We believe our design fulfills this goal--though, of course, only time will tell.

One other note: if you haven't read the previous Apocalypses and Exegeses, a lot of this is going to be complete gobbledygook to you. (Of course, even if you have read them, this might still be gobbledygook. You take your chances in life...)

An Easy Example

Before we start talking about all the hard things that should be possible, let's look at an example of some of the easy things that should be easy. Suppose we define a Point object that (for some strange reason) allows you to adjust the y-axis but not the x-axis.


    class Point {
        has $.x;
        has $.y is rw;

        method clear () { $.x = 0; $.y = 0; }
    }

    my $point = Point.new(x => 2, y => 3);

    $a = $point.y;      # okay
    $point.y = 42;      # okay

    $b = $point.x;      # okay
    $point.x = -1;      # illegal, default is read-only

    $point.clear;       # reset to 0,0

If you compare that to how it would have to be written in Perl 5, you'll note a number of differences:

  • It uses the keywords class and method rather than package and sub.
  • The attributes are named in explicit declarations rather than implicit hash keys.
  • It is impossible to confuse the attribute variables with ordinary variables because of the extra dot (which also associates the attributes visually with method calls).
  • Perhaps most importantly, we did not have to commit to using a hash (or any other external data structure) for the object's values.
  • We didn't have to write a constructor.
  • The implicit constructor automatically knows how to map named arguments to the attribute names.
  • We didn't have to write the accessor methods.
  • The accessors are by default read-only outside the class, and you can't get at the attributes from outside the class without an accessor. (Inside the class you can use the attributes directly.)
  • The invocant of the clear method is implicit.
  • And perhaps most obviously, Perl 6 uses . instead of -> to dereference an object.

Now suppose we want to derive from Point, and add a z-axis. That's just:


    class Point3d is Point {
        has $:z = 123;
        method clear () { $:z = 0; next; }
    }
    my $point3d = Point3d.new(x => 2, y => 3, z => 4);
    $c = $point3d.z;    # illegal, $:z is invisible

The implicit constructor automatically sorts out the named arguments to the correct initializers for you. If you omit the z value, it will default to 123. And the new clear method calls the old clear method merely by invoking next, without the dodgy "super" semantics that break down under MI. We also declared the $:z attribute to be completely private by using a colon instead of a dot. No accessor for it is visible outside the class. (And yes, OO purists, our other attributes should probably have been private in the first place...that's why we're making it just as easy to write a private attribute as a public one.)

If any of that makes your head spin, I'm sure the following will clear it right up. :-)


Classes

A class is what provides a name and a place for the abstract behavior of a set of objects said to belong to the class.

As in Perl 5, a class is still "just a funny package", structurally speaking. Syntactically, however, a class is now distinct from a package or a module. And the body of a class definition now runs in the context of a metaclass, which is just a way of saying that it has a metaclass instance as its (undeclared) invocant. (An "invocant" is what we call the object or class on behalf of which a method is being called.) Hence class definitions, though apparently declarative, are also executing code to build the class definition, and the various declarations within the class are also running bits of code. By convention classes will use a standard metaclass, but that's just convention. (A very strong convention, we hope.)

The primary role of a class is to manage instances, that is, objects. So a class must worry about object creation and destruction, and everything that happens in between. Classes have a secondary role as units of software reuse, in that they can be inherited from or delegated to. However, because this is a secondary role, and because of weaknesses in models of inheritance, composition, and delegation, Perl 6 will split out the notion of software reuse into a separate class-like entity called a "role". Roles are an abstraction mechanism for use by classes that don't care about the secondary aspects of software reuse, or that (looking at it the other way) care so much about it that they want to encapsulate any decisions about implementation, composition, delegation, and maybe even inheritance. Sounds fancy, but just think of them as includes of partial classes, with some safety checks. Roles don't manage objects. They manage interfaces and other abstract behavior (like default implementations), and they help classes manage objects. As such, a role may only be composed into a class or into another role, never inherited from or delegated to. That's what classes are for.

Classes are arranged in an inheritance hierarchy by their "isa" relationships. Perl 6 supports multiple inheritance, but makes it easy to program in a single-inheritance style, insofar as roles make it easy to mix in (or delegate, or parameterize) private implementation details that don't belong in the public inheritance tree.

In those cases where MI is used, there can be ambiguities in the pecking order of classes in different branches. Perl 6 will have a canonical way to disambiguate these, but by design the dispatch policy is separable from inheritance, so that you can change the rules for a given set of classes. (Certainly the rules can change when we call into another language's class hierarchy, for instance.)

Where possible, class names are treated polymorphically, just as method names are. This powerful feature makes it possible to inherit systems of classes in parallel. (These classes might be inner classes, or they might be inner aliases to outer classes.) By making the class names "virtual", the base classes can refer to the appropriate derived classes without knowing their full name. That sounds complicated, but it just means that if you do the normal thing, Perl will call the right class instead of the one you thought it was going to call. :-)

(As in C++ culture, we use the term "virtual" to denote a method that is dispatched based on the actual runtime type of the object rather than the declared type of the variable. C++ classes have to declare their methods to be virtual explicitly. All of Perl's public methods are virtual implicitly.)

You may derive from any built-in class. For high-level object classes such as Int or Num there are no restrictions on how you derive. For low-level representational classes like int or num, you may not change the representation of the value; you may only add behaviors. (If you want to change the representation, you should probably be using composition instead of inheritance. Or define your own low-level type.) Apart from this, you don't need to worry about the difference between int and Int, or num and Num, since Perl 6 will do autoboxing.
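
As a rough sketch in the notation used in this Apocalypse (the class and method are invented purely for illustration), adding behavior to a high-level numeric class might look like this:

    class Temperature is Num {
        # $_ is the invocant, i.e. the number itself
        method to_fahrenheit () { return $_ * 9/5 + 32 }
    }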

Declaration of Classes

Class declarations may be either file scoped or block scoped. A file-scoped declaration must be the first thing in the file, and looks like this:


    class Dog is Mammal;
    has Limb @.paws;
    method walk () { .paws».move() }

That has the advantage of avoiding the use of one set of braces, letting you put everything up against left margin. It is otherwise identical to a block-scoped class, which looks like this:


    class Dog is Mammal {
        has Limb @.paws;
        method walk () { .paws».move() }
    }

An incomplete class definition makes use of the ... ("yada, yada, yada") operator:


    class Dog is Mammal {...}

The declaration of a class name introduces the name as a valid bare identifier or name. In the absence of such a declaration, the name of a class in an expression must be introduced with the :: class sigil, or it will be considered a bareword and rejected, since Perl 6 doesn't allow barewords. Once the name is declared however, it may be used as an ordinary term in an expression. Unlike in Perl 5, you should not view it as a bareword string. Rather, you should view it as a parameterless subroutine that returns a class object, which conveniently stringifies to the name of the class for Perl 5 compatibility. But when you say


    Dog.new()

the invocant of new is an object of type Class, not a string as in Perl 5.

Unmodified, a class declaration always declares a global name. But if you prefix it with our, you're defining an inner class:


    class Cell {
        our class Golgi {...}
        ...
    }

The full name of the inner class is Cell::Golgi, and that name can be used outside of Cell, since Golgi is declared in the Cell package. (Classes may be declared private, however. More later.)

Class traits

A class declaration may apply various traits to the class. (A trait is a property applied at compile time.) When you apply a trait, you're accepting whatever it is that that trait does to your class, which could be pretty much anything. Traits do things to classes. Do not confuse traits with roles, which are sworn to play a subservient role to the class. Traits can do whatever they jolly well please to your class's metadata.

Now, the usual thing to do to a class's metadata is to insert another class into its ISA metadata. So we use trait notation to install a superclass:


    class Dog is Mammal {...}

To specify multiple inheritance, just add another trait:


    class Dog is Mammal is Pet {...}

But often you'll want a role instead, specified with does:


    class Dog is Mammal does Pet {...}

More on that later. But remember that traits are evil. You can have traits like:


    class Moose is Mammal is stuffed is really(Hatrack) is spy(Russian) {...}

So what if you actually want to derive from stuffed? That's a good question, which we will answer later. (The short answer is, you don't.)

Now as it happens, you can also use is from within the class. You can also put the does inside to include various roles:


    class Dog {
        is Mammal;
        does Pet;
        does Servant;
        does Best::Friend[Man];
        does Drool;
        ...
    }

In fact, there's no particular reason to put any of these outside the braces except to make them more obvious to the casual reader. If we take the view that inheritance is just one form of implementation, then a simple


    class Dog {...}

is sufficient to establish that there's a Dog class defined out there somewhere. We shouldn't really care about the implementation of Dog, only its interface--which is usually pretty slobbery.

That being said, you can know more about the interface at compile time once you know the inheritance, so it's good to have pulled in a definition of the class as well as a declaration. Since this is typically done with use, the inheritance tree is generally available even if you don't mark your class declaration externally with the inheritance. (But in any event, the actual inheritance tree doesn't have to be available till runtime, since that's when methods are dispatched. (Though as is often the case, certain optimizations work better when you give them more data earlier...))

Use of Classes

A class is used directly by calling class methods, and indirectly by calling methods of an object of that class (or of a derived class that doesn't override the methods in question).

Classes may also be used as objects in their own right, as instances of a metaclass, the class MetaClass by default. When you declare class Dog, you're actually calling a metaclass class method that constructs a metaclass instance (i.e., the Dog class) and then calls the associated closure (i.e., the body of the class) as a method on the instance. (With a little grammatical magic thrown in so that Dog isn't considered a bareword.)

The class Dog is an instance of the class MetaClass, but it's also an instance of the type Class when you're thinking of it as a dispatcher. That is, a class object is really allomorphic. If you treat one as an instance of Class, it behaves as if it were the user's view of the class, and the user thinks the class is there only to dispatch to the user's own class and instance methods. If, however, you treat the object as an instance of MetaClass, you get access to all its metaclass methods rather than the user-defined methods. Another way to look at it is that the metaclass object is a separate object that manages the class object. In any event, you can get from the ordinary class object to its corresponding metaclass object via the .meta method, which every object supports.

By the way, a Class is a Module, which in turn is a Package, which in turn is an Object. Or something like that. So a class can always be used as if it were a mere module or package. But modules and packages don't have a .dispatch method...

By default, classes in Perl are left open. That is, you can add more methods later. (However, an application may close them.) For discussion of this, see the section on "Open vs. Closed Classes".

Class Name Semantics

Class names (and module names) are just package names.

Unlike in Perl 5, when you mention a package name in Perl 6 it doesn't always mean a global name, since Perl 6 knows about inner classes and lexically scoped packages and such. As with other entities in Perl such as variables and methods, a scan is made for who thinks they have the best definition of the name, going out from lexical scopes to package scope to global scope in the case of static class names, and via method inheritance rules in the case of virtual class names.

Note that ::MyClass and MyClass mean the same thing. In Perl 6, an initial :: is merely an optional sigil for when the name of the package would be misconstrued as something else. It specifically does not mean (as it does in Perl 5) that it is a top-level package. To refer to the top-level package, you would need to say something like ::*MyClass (or just *MyClass in places where the * unary operator would not be expected.) But also note that the * package in Perl is not the "main" package in the Perl 5 sense.

Likewise, the presence of :: within a package name like Fish::Carp does not make it a global package name necessarily. Again, it scans out through various scopes, and only if no local scopes define package Fish::Carp do you get the global definition. And again, you can force it by saying ::*Fish::Carp. (Or just *Fish::Carp in places where the * unary operator is not expected.)

You can interpolate a parenthesized expression within a package name after any ::. So, these are all legal package names (or module names, or class names):


    ::($alice)
    ::($alice)::($bob)
    ::($alice::($bob))
    ::*::($alice)::Bob
    ::('*')::($alice ~ '_misc')::Bob
    ::(get_my_dir())
    ::(@multilevel)

And any of those package names could be part of a variable or sub name:


    $::($alice)::name
    @::($alice)::($bob)::elems[1,2,3]
    %::*::($alice)::Bob::map{'xyz'}
    &::('*')::($alice ~ '_misc')::Bob::doit(1,2,3)
    $::(get_my_dir())::x
    $::(@multilevel)

Note in the last example that the final element of @multilevel is taken to be the variable name. This may be illegal under use strict refs, since it amounts to a symbolic reference. (Not that the others aren't symbolic, but the rules may be looser for package names than for variable names, depending on how strict our strictures get.)


Private Classes

A class named with a single initial colon is a private class name:


    class :MyPrivateClass {...}

It is completely ignored outside of the current class. Since the name is useful only in the current package, it makes no sense to try to qualify it with a package name. While it's an inner class of sorts, it does not override any class name from any other class because it lives in its own namespace (a subnamespace of the current package), and there's no way to tell if the class you're deriving from declares its own private class of the same name (apart from digging through the reflection interfaces).

The colon is orthogonal to the scoping. What's actually going on in this example is that the name is stored in the package with the leading colon, because the colon is part of the name. But if you declared "my class :Golgi" the private name would go into the lexical namespace with the colon. The colon functions a bit like a "private" trait, but isn't really a trait. Wherever you might use a private name, the colon in the name effectively creates a private subspace of names, just as if you'd prefixed it with "_" in the good old days.

But if it were only that, it would just be encapsulation by convention. We're trying to do a little better than that. So the language needs to actively prevent people from accessing that private subspace from outside the class. You might think that that's going to slow down all the dispatchers, but probably not. The ordinary dispatch of Class.method and $obj.method don't have to worry about it, because they use bare identifiers. It's only when people start doing ::($class) or $obj.$method that we have to trap illegal references to colonic names.

Even though the initial colon isn't really a trait, if you interrogate the ".private" property of the class, it will return true. You don't have to parse the name to get that info.

We'll make more of this when we talk about private methods and attributes. Speaking of methods...

Methods

Methods are the actions that a class knows how to invoke on behalf of an object of that type (or on behalf of itself, as a class object). But you knew that already.

As in Perl 5, a method is still "just a funny subroutine", but in Perl 6 we use a different keyword to declare it, both because it's better documentation, and because it captures the metadata for the class at compile time. Ordinary methods may be declared only within the scope of a class definition. (Multimethods are exempt from this restriction, however.)

Declaration of Methods

To declare a method, use the method keyword just as you would use sub for an ordinary subroutine. The declaration is otherwise almost identical:


    method doit ($a, $b, $c) { ... }

The one other difference is that a method has an invocant on behalf of which the method is called. In the declaration above, that invocant is implicit. (It is implicitly typed to be the same as the current surrounding class definition.) You may, however, explicitly declare the invocant as the first argument. The declaration knows you're doing that because you put a colon between the invocant and the rest of the arguments:


    method doit ($self: $a, $b, $c) { ... }

In this case, we didn't specify the type of $self, so it's an untyped variable. To make the exact equivalent of the implicit declaration, put the current class:


    method doit (MyClass $self: $a, $b, $c) { ... }

or more generically using the ::_ "current class" pronoun:


    method doit (::_ $self: $a, $b, $c) { ... }

In any case, the method sets the current invocant as the topic, which is also known as the $_ variable. However, the topic can change depending on the code inside the method. So you might want to declare an explicit invocant when the meaning of $_ might change. (For further discussion of topics see Apocalypse 4. For a small writeup on sub signatures see Apocalypse 6.)
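
For example (with an invented attribute and method), a method that loops over a list will find $_ re-topicalized inside the loop, so an explicit invocant keeps things straight:

    method fetch_all ($self:) {
        for @.toys {
            # in here $_ is the current toy, not the invocant,
            # so we use the explicit invocant instead
            $self.fetch($_);
        }
    }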

A private method is declared with a colon on the front:


    method :think (Brain $self: $thought) {...}

Private methods are callable only by the class itself, and by trusted "friends". More about that when we talk about attributes.

Use of Methods

As in Perl 5, there are two notations for calling ordinary methods. They are called the "dot" notation and the "indirect object" notation.

The dot notation

Perl 6's "dot" notation is just the industry-standard way to call a method these days. (This used to be -> in Perl 5.)


    $object.doit("a", "b", "c");

If the object in question is the current topic, $_, then you can use the unary form of the dot operator:


    for @objects {
        .doit("a", "b", "c");
    }

A simple variable may be used for an indirectly named method:


    my $dosomething = "doit";
    $object.$dosomething("a", "b", "c");

As in Perl 5, if you want to do anything fancier, use a temporary variable.

The parentheses may also be omitted when the following code is unambiguously a term or operator, so you can write things like this:


    @thumbs.each { .twiddle }   # same as @thumbs.each({.twiddle})
    $thumb.twiddle + 1          # same as $thumb.twiddle() + 1
    .mode 1                     # same as $_.mode(1)

(Parens are always required around the argument list when a method call with arguments is interpolated into a string.)

The parser will make use of whitespace at this point to decide some things. For instance


    $obj.method + 1

is obviously a method with no arguments, while


    $obj.method +1

is obviously a method with an argument. However, the dwimmery only goes as far as the typical person's visual intuition. Any construct too ambiguous is simply rejected. So


    $obj.method+1

produces a parse error.

In particular, curlies, brackets, or parens would be interpreted as postfix subscripts or argument lists if you leave out the space. In other words, Perl 6 distinguishes:


    $obj.method ($x + $y) + $z  # means $obj.method(($x + $y) + $z)

from


    $obj.method($x + $y) + $z   # means ($obj.method($x + $y)) + $z

Yes, this is different from Perl 5. And yes, I know certain people hate it. They can write their own grammar.

While it's always possible to disambiguate with parentheses, sometimes that is just too unsightly. Many methods want to be parsed as if they were list operators. So as an alternative to parenthesizing the entire argument list, you can disambiguate by putting a colon between the method call and the argument list:


    @thumbs.each: { .twiddle }  # same as @thumbs.each({.twiddle})
    $thumb.twiddle: + 1         # same as $thumb.twiddle(+ 1)
    .mode: 1                    # same as $_.mode(1)
    $obj.for: 1,2,3 -> $i { ... }

If a method is declared with the trait "is rw", it's an lvalue method, and you can assign to it just as if it were an ordinary variable:


    method mystate is rw () { return $:secretstate }

    $object.mystate = 42;
    print $object.mystate;      # prints 42

In fact, it's a general rule that you can use an argumentless "rw" method call anywhere you might use a variable:


    temp $state.pi = 3;
    $tailref = \$fido.tail;

(Though occasionally you might need to supply parentheses to disambiguate, since the compiler can't always know at compile time whether the method has any arguments.)

Method calls on container objects are obviously directed to the container object itself, not to the contents of the container:


    $elems = @array.elems;
    @keys  = %hash.keys;
    $sig   = &sub.signature;

However, with scalar variables, methods are always directed to the object pointed to by the reference contained in the scalar:


    $scalar = @array;           # (implied \ in scalar context)
    $elems = $scalar.elems;     # returns @array.elems

or for value types, the appropriate class is called as if the value were a reference to a "real" object.


    $scalar = "foo";
    $chars = $scalar.chars;     # calls Str::chars or some such

In order to talk to the scalar container itself, use the tied() pseudo-function as you would in Perl 5:


    if tied($scalar).constant {...}

(You may recall, however, that in Perl 6 it's illegal to tie any variable without first declaring it as tie-able, or (preferably) tying it directly in the variable's declaration. Otherwise the optimizer would have to assume that every variable has semantics that are unknowable in advance, and we would have to call it a pessimizer rather than an optimizer.)


The "indirect object" notation

The other form of method call is known as the "indirect object" syntax, although it differs from Perl 5's syntax in that a colon is required between the indirect object (the invocant) and its arguments:


    doit $object: "a", "b", "c"

The colon may be omitted if there are no arguments (besides the invocant):


    twiddle $thumb;
    $x = new X;

Note that indirect object calls may not be directly interpolated into a string, since they don't start with a sigil. You can always use the $() expression interpolater though:


    say "$(greet $lang), world!";

As in Perl 5, the indirect object syntax is valid only if you haven't declared a subroutine locally that overrides the method lookup. That was a bit of a problem in Perl 5 since, if there happened to be a new constructor in your class, it would call that instead of dispatching to the class you wanted it to. That's much less of a problem in Perl 6, however, because Perl 6 cannot confuse a method declaration with a subroutine declaration. (Which is yet another reason for giving methods their own keyword.)

Another factor that makes indirect objects work better in Perl 6 is that the class name in "new X" is a predeclared object, not a bare identifier. (Perl 5 just had to guess when it saw two bare identifiers in a row that you were trying to call a class method.)

The indirect object syntax may not be used with a variable for the methodname. You must use dot notation for that.
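
That is (assuming a $dog object with a bark method):

    my $method = "bark";

    bark $dog;          # fine: indirect object with a literal method name
    $method $dog;       # illegal: no variable method names in indirect object syntax
    $dog.$method();     # fine: dot notation handles the variable method name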

Because of precedence, the indirect object notation may not be used as an lvalue unless you parenthesize it:


    (mystate $object) = 42;
    (findtail Dog: "fido") = Wagging::on;

You may parenthesize an argumentless indirect object method to make it look like a function:


    mystate($object) = 42;
    twiddle($thumb);

The dispatch rules for methods and global multi subs conspire to keep these unambiguous, so the user really doesn't have to worry about whether


    close($handle);

is implemented as a global multi sub or a method on a $handle object. In essence, the multimethod dispatching rules degenerate to ordinary method dispatch when there are no extra arguments to consider (and sometimes even when there are arguments). This is particularly important because Perl uses these rules to tell the difference between


    print "Howdy, world!\n";    # global multi sub

and


    print $*OUT;                # ordinary filehandle method

However, you must still put the colon after the invocant if there are other arguments. The colon tells the parser whether to look for the arguments inside:


    doit($object: "a", "b", "c")

or outside:


    doit($object): "a", "b", "c"

If you do say


    doit($object, "a", "b", "c")

the first comma forces it to be interpreted as a sub call rather than a method call.

(We could have decided to say that whenever Perl can't find a doit() sub definition at runtime, it should assume you meant the entire parenthesized list to be the indirect object, which, since it's in scalar context would automatically generate a list reference and call [$object,"a","b","c"].doit(), which is unlikely to be what you mean, or even work. (Unless, of course, that's how you really meant it to work.) But I think it's much more straightforward to simply disallow comma lists at the top level of an indirect object. The old "if it looks like a function" rule applies here. Oddly, though, function syntax is how you call multisubs in Perl 6. And as it happens, the way the multisub/multimethod dispatch rules are defined, it could still end up calling $object.doit("a", "b", "c") if that is deemed to be the best choice among all the candidates. But syntactically, it's not an indirect object. More on dispatch rule interactions later.)

The comma still doesn't work if you go the other way and leave out the parens entirely, since


    doit $object, "a", "b", "c";

would always (in the absence of a prior sub declaration) be parsed as


    (doit $object:), "a", "b", "c";

So a print with both an indirect object and arguments has to look like one of these:


    print $*OUT: "Howdy, world!\n";
    print($*OUT: "Howdy, world!\n");
    print($*OUT): "Howdy, world!\n";

Note that the old Perl 5 form using curlies:


    print {some_hairy_expression()} "Howdy, world!\n";

should instead now be written with parentheses:


    print (some_hairy_expression()): "Howdy, world!\n";

though, in fact, in this case the parens are unnecessary:


    print some_hairy_expression(): "Howdy, world!\n";

You'd only need the parens if the invocant expression contained operators lower in precedence than comma (comma itself not being allowed). Basically, if it looks confusing to you, you can expect it to look confusing to the compiler, and to make the compiler look confused. But it's a feature for the compiler to look confused when it actually is confused. (In Perl 5 this was not always so.)

Note that the disambiguating colon associates with the closest method call, whether direct or indirect. So


    print $obj.meth: "Howdy, world!\n";

passes "Howdy, world!\n" to $obj.meth rather than to print. That's a case where you ought to have parenthesized the indirect object for clarity anyway:


    print ($obj.meth): "Howdy, world!\n";

Calling private methods

A private method does not participate in normal method dispatch. It is not listed in the class's public methods. The .can method does not see it. Calling it via normal dispatch raises a "no such method" exception. It is, in essence, invisible to the outside world. It does not hide a base class's method of the same name--even in the current class! It's fair to ask for warnings about name collisions, of course. But we're not following the C++ approach of making private methods visible but uncallable, because that would violate encapsulation, and in particular, Liskov substitutability. Instead, we separate the namespaces completely by distinguishing the public dot operator from the private dot-colon operator. That is:


    $mouth.say("Yes!")          # always calls public .say method
          .say("Yes!")          # unary form
    $brain.:think("No!")        # always calls private :think method
          .:think("No!")        # unary form

The inclusion of the colon prevents any kind of "virtual" behavior. Calling a private method is illegal except under two very specific conditions. You can call a private method :think on an object $brain only if:

  1. The class of $brain is explicitly declared, and the declared class is either the class definition that we are in or a class that has explicitly granted trust to our current class, and the declared class contains a private :think method. Or...
  2. The class of the $brain is not declared, and the current class contains a private :think method.

The upshot of these rules is that a private method call is essentially a subroutine call with a method-like syntax. But the private method we're going to call can be determined at compile time, just like a subroutine.
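
A small sketch of how that plays out (the class and methods are invented; compare the dot-colon calls above):

    class Brain {
        method :think ($thought) {...}                     # private
        method ponder ($thought) { .:think($thought) }     # fine: we're inside class Brain
    }

    my $b = Brain.new;
    $b.ponder("lunch");     # fine: ordinary public dispatch
    $b.:think("lunch");     # illegal: we're outside Brain, and it hasn't granted us trust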

Class Methods

Class methods are called on the class as a whole rather than on any particular instance object of the class. They are distinguished from ordinary methods only by the declared type of the invocant. Since an implicit invocant would be typed as an object of the class and not as the class itself, the invocant declaration is not optional in a class method declaration if you wish to specify the type of the invocant. (Untyped explicit invocants are allowed to "squint", however.)

Class Invocant

To declare an ordinary class method, such as a constructor, you say something like:


    method new (Class $class: *@args) { ... }

Such a method may only be called with an invocant that "isa" Class, that is, an object of type Class, or derived from type Class.

Class|Object Invocant

It is possible to write a method that can be called with an invocant that is either a Class or an object of that current class. You can declare the method with a type junction:


    method new (Class|Dog $classorobj: *@args) { ... }

Or to be completely non-specific, you can leave out the type entirely:


    method new ($something: *@args) { ... }

That's not as dangerous as it looks, since almost by definition the dispatcher only calls methods that are consistent with the inheritance tree. You just can't say:


    method new (*@args) { ... }

which would be the equivalent of:


    method new (Dog $_: *@args) { ... }

Well, actually, you could say that, but it would require that you have an existing Dog-compatible object in order to create a new one. And that could present a little bootstrapping problem...

(Though it could certainly cure the boot chewing problem...)

But in fact, you'll rarely need to declare a new method at all, because Perl supplies a default constructor to go with your class.

Submethods

Some methods are intended to be inherited by derived classes. Others are intended to be reimplemented in every class, or in every class that doesn't want the default method. We call these "submethods", because they work a little like subs, and a little like methods. (You can also read the "sub" with the meaning it has in words like "subhuman".)

Typically these are (sub)methods related to the details of construction and destruction of the object. So when you call a constructor, for instance, it ends up calling the BUILDALL initialization routine for the class, which ends up calling the BUILD submethod:


    submethod BUILD ($a, $b, $c) {
        $.a = $a;
        $.b = $b;
        $.c = $c;
    }

Since the submethod is doing things that make sense only in the context of the current class (such as initializing attributes), it makes no sense for BUILD to be inherited. Likewise DESTROY is also a submethod.

Why not just make them ordinary subs, then? Ordinary subs can't be called by method invocation, and we want to call these routines that way. Furthermore, if your base class does define an ordinary method named BUILD or DESTROY, it can serve as the default BUILD or DESTROY for all derived classes that don't declare their own submethods. (All public methods are virtual in Perl, but some are more virtual than others.)

You might be saying to yourself, "Wait, private methods aren't virtual. Why not just use a private method for this?" It's true that private methods aren't virtual, because they aren't in fact methods at all. They're just ordinary subroutines in disguise. They have nothing to do with inheritance. By contrast, submethods are all about presenting a unified inherited interface with the option of either inheriting or not inheriting the implementation of that interface, at the discretion of the class doing the implementing.

So the bottom line is that submethods allow you to override an inherited implementation for the current class without overriding the default implementation for other classes. But in any case, it's still using a public interface, called as an ordinary method call, from anywhere in your program that has an object of your type.

Or a class of your type. The default new constructor is an ordinary class method in class Object, so it's inherited by all classes that don't define their own new. But when you write your own new, you need to decide whether your constructor should be inherited or not. If so, that's good, and you should declare it as a method. But if not, you should declare it as a submethod so that derived classes don't try to use it erroneously instead of the default Object.new().
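
In other words (sketching with made-up classes), the choice looks like this:

    class Dog {
        # meant to be inherited: Puppy.new() will find and use this constructor
        method new (Class $class: *@args) { ... }
    }

    class Socket {
        # not meant to be inherited: a class derived from Socket
        # falls back to the default Object.new() instead
        submethod new (Class $class: *@args) { ... }
    }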


Attributes

In Perl 6, "attributes" are what we call the instance variables of an object. (We used that word to mean something else in Perl 5--we're now calling those things "traits" or "properties".)

As with classes and methods, attribute declarations are apparently declarative. Underneath they actually call a method in the metaclass to install the new definition. The Perl 6 implementation of attributes is not based on a hash, but on something more like a symbol table. Attributes are stored in an opaque datatype rather like a struct in C, or an array in Perl 5--but you don't know that. The datatype is opaque in the sense that you shouldn't care how it's laid out in memory (unless you have to interface with an outside data structure--like a C struct). Do not confuse opacity with encapsulation. Encapsulation only hides the object's implementation from the outside world. But the object's structure is opaque even to the class that defines it.

One of the large benefits of this is that you can actually take a C or C++ data structure and wrap accessor methods around it without having to copy anything into a different data structure. This should speed up things like XML parsing.

Declaration of Attributes

In order to provide this opaque abstraction layer, attributes are not declared as a part of any other data structure. Instead, they are modeled on real variables, whose storage details are implicitly delegated to the scope in which they are declared. So attributes are declared as if they were normal variables, but with a strange scope and lifetime that is neither my nor our. (That scope is, of course, the current object, and the variable lives as long as the object lasts.) The class will implicitly store those attributes in a location distinct from any other class's attributes of the same name, including any base or derived class. To declare an attribute variable, declare it within the class definition as you would a my variable, but use the has declarator instead of my:


    class Dog is Mammal {
        has $.tail;
        has @.legs;
        ...
    }

The has declarator was chosen to remind people that attributes are in a "HASA" relationship to the object rather than an "ISA" relationship.

The other difference from normal variables is that attributes have a secondary sigil that indicates that they are associated with methods. When you declare an attribute like $.tail, you're also implicitly declaring an accessor method of the same name, only without the $ on the front. The dot is there to remind you that it's also a method call.

As with other declarations, you may add various traits to an attribute:


    has $.dogtag is rw;

If you want all your attributes to default to "rw", you can put the attribute on the class itself:


    class Coordinates is rw {
        has int $.x;
        has int $.y;
        has int $.z;
    }

Essentially, it's now a C-style struct, without having to introduce an ugly word like "struct" into the language. Take that, C++. :-)

You can also assign to a declaration:


    has $.master = "TheDamian";

Well, actually, this looks like an assignment, but it isn't. The effect of this is to establish a default; it is not executed at runtime. (Or more precisely, it runs when the class closure is executed by the metaclass, so it gets evaluated only once and the value is stored for later use by real instances. More below.)

Use of Attributes

The attribute behaves just like an ordinary variable within the class's instance methods. You can read and write the attributes just like ordinary variables. (It is, however, illegal to refer to an instance attribute variable (that is, a "has" variable) from within a class method. Class methods may only access class attributes, not instance attributes. See below.)

Bare attributes are automatically hidden from the outside world because their sigiled names cannot be seen outside the class's package. This is how Perl 6 enforces encapsulation. Outside the class the only way to talk about an attribute is through accessor methods. Since public methods are always virtual in Perl, this makes attribute access virtual outside the class. Always. (Unless you give the optimizer enough hints to optimize the class to "final". More on that later.)

In other words, only the class itself is allowed to know whether this attribute is, in fact, implemented by this class. The class may also choose to ignore that fact, and call the abstract interface, that is, the accessor method, in which case it might actually end up calling some derived class's overriding method, which might in turn call back to this class's accessor as a super method. (So in general, an accessor method should always refer to its actual variable name rather than the accessor method name to avoid infinite recursion.)

You may write your own accessor methods around the bare attributes, but if you don't, Perl will generate them for you based on the declaration of the attribute variable. The traits of the generated method correspond directly to the traits on the variable.

By default, a generated accessor is read-only (because by default any method is read-only). If you mark an attribute with the trait "is rw" though, the corresponding generated accessor will also be marked "is rw", meaning that it can be used as an lvalue.

In any event, even without "is rw" the attribute variable is always writable within the class itself (unless you apply the trait is constant to it).

As with private classes and methods, attributes are declared private using a colon on the front of their names. As with any private method, a private accessor is completely ignored outside its class (or, by extension, the classes trusted by this class).

To carry the separate namespace idea through, we incorporate the colon as the secondary sigil in declarations of private attributes:


    has $:x;

Then we can get rid of the verbose is private altogether. Well, it's still there as a trait, but the colon implies it, and is required anyway.) And we basically force people to document the private/public distinction every place they reference $:x instead of $.x, or $obj.:meth instead of $obj.meth.

We've seen secondary sigils before in earlier Apocalypses. In each case they're associated with a bizarre usage of some sort. So far we have:


    $*foo       # a truly global global (in every package)
    $?foo       # a regex-scoped variable
    $^foo       # an autodeclared parameter variable
    $.foo       # a public attribute
    $:foo       # a private attribute

As a form of the dreaded "Hungarian notation", secondary sigils are not introduced lightly. We define secondary sigils only where we deem instant recognizability to be crucial for readability. Just as you should never have to look at a variable and guess whether it's a true global, you should never have to look at a method and guess which variables are attributes and which ones are variables you just happen to be in the lexical scope of. Or which attributes are public and which are private. In Perl 6 it's always obvious--at the cost of a secondary sigil.

We do hereby solemnly swear to never, never, ever add tertiary sigils. You have been warned.

Default Values

You can set default values on attributes by pseudo-assignment to the attribute declaration:


    has Answer $.ans = 42;

These default values are associated as "build" traits of the attribute declaration object. When the BUILD submethod is initializing a new object, these prototype values are used for uninitialized attributes. The expression on the right is evaluated immediately at the point of declaration, but you can defer evaluation by passing a closure, which will automatically be evaluated at the actual initialization time. (Therefore, to initialize to a closure value, you have to put a closure in a closure.)

Here's the difference between those three approaches. Suppose you say:


    class Hitchhiker {
        my $defaultanswer = 0;
        has $.ans1 = $defaultanswer;
        has $.ans2 = { $defaultanswer };
        has $.ans3 = { { $defaultanswer } };
        $defaultanswer = 42;
        ...
    }

When the object is eventually constructed, $.ans1 will be initialized to 0, while $.ans2 will be initialized to 42. (That's because the closure binds $defaultanswer to the current variable, which still presumably has the value 42 by the time the BUILD routine initializes the new object, even though the lexical variable "$defaultanswer" has supposedly gone out of scope by the time the object is being constructed. That's just how closures work.)

And $.ans3 will be initialized not to 42, but to a closure that, if you ever call it, will also return 42. So since the accessor $obj.ans3() returns that closure, $obj.ans3().() will return 42.

The default value is actually stored under the "build" trait, so this:


    has $.x = calc($y);

is equivalent to this:


    has $.x is build( calc($y) );

and this:


    has $.x = { calc($y) };

is equivalent to either of these:


    has $.x is build( { calc($y) } );
    has $.x will build { calc($y) };

As with all closure-valued container traits, the container being declared (the $.x variable in this case) is passed as the topic to the closure (in addition to being the target that will be initialized with the result of the closure, because that's what build does). In addition to the magical topic, these build traits are also magically passed the same named arguments that are passed to the BUILD routine. So you could say


    has $.x = { calc($^y) };

to do a calculation based on the :y(582) parameter originally passed to the constructor. Or rather, that will be passed to the constructor someday when the object is eventually constructed. Remember we're really still at class construction time here.

As with other initializers, you can be more specific about the time at which the default value is constructed, as long as that time is earlier than class construction time:


    has $.x = BEGIN { calc() }
    has $.x = CHECK { calc() }
    has $.x = INIT  { calc() }
    has $.x = FIRST { calc() }
    has $.x = ENTER { calc() }

which are really just short for:


    has $.x is build( BEGIN { calc() } )
    has $.x is build( CHECK { calc() } )
    has $.x is build( INIT  { calc() } )
    has $.x is build( FIRST { calc() } )
    has $.x is build( ENTER { calc() } )

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 12 for the latest information.

Class Attributes

In general, class attributes are just package or lexical variables. If you define a package variable with a dot or colon, it autogenerates an accessor for you just as it does for an ordinary attribute:


    our $.count;        # generates a public read-only .count accessor
    our %:cache is rw;  # generates a private read-write .:cache accessor

The implicit invocant of these implicit accessors has a "squinting" type--it can either be the class or an object of the class. (Declare your own accessors if you have a philosophical reason for forcing the type one way or the other.)

The disadvantage of using "our" above is that both of these are accessible from outside the class via their package name (though the private one is Officially Ignored, and cannot be named simply by saying %MyClass:::cache because that syntax is specifically disallowed).

If on the other hand you declare your class variables lexically:


    my $.count;         # generates a read-only .count accessor
    my %:cache is rw;   # generates a read-write .:cache accessor

then the same pair of accessors are generated, but the variables themselves are visible only within the class block. If you reopen the class in another block, you can only see the accessors, not the bare variables. This is probably a feature.

Generally speaking, though, unless you want to provide public accessors for your class attributes, it's best to just declare them as ordinary variables (either my or state variables) to prevent confusion with instance attributes. It's a good policy not to declare any public accessors until you know you need them. They are, after all, part of your contract with the outside world, and the outside world has a way of holding you to your contracts.

Object Construction

The basic idea here is to remove the drudgery of creating objects. In addition we want object creation and cleanup to work right by default. In Perl 5 it's possible to make recursive construction and destruction work, but it's not the default, and it's not easy.

Perl 5 also confused the notions of constructor and initializer. A constructor should create a new object once, then call all the appropriate initializers in the inheritance tree without recreating the object. The initializer for a base class should be called before the initializer for any class derived from it.

The initializer for a class is always named BUILD. It's in uppercase because it's usually called automatically for you at construction time.

As with Perl 5, a constructor is only named "new" by convention, and you can write a constructor with any name you like. However, in Perl 6, if you do not supply a "new" method, a generic one will be provided (by inheritance from Object, as it happens).

The Default Constructor

The default new constructor looks like this:


    multi method new (Class $class: *%_) {
        return $class.bless(0, *%_);
    }

The arguments for the default constructor are always named arguments, hence the *%_ declaration to collect all those pairs and pass them on to bless.

You'll note also that bless is no longer a subroutine but a method call, so it's now impossible to omit the class specification. This makes it easier to inherit constructors. You can still bless any reference you could bless in Perl 5, but where you previously used a function to do that:



    # Perl 5 code...
    return bless( {attr => "hi"}, $class );


in Perl 6 you use a method call:


    # Perl 6 code...
    return $class.bless( {attr => "hi"} );

However, if what you pass as the first argument isn't a reference, bless is going to construct an opaque object and initialize it. In a sense, bless is the only real constructor in Perl 6. It first makes sure the data structure is created. If you don't supply a reference to bless, it calls CREATE to create the object. Then it calls BUILDALL to call all the initializers.

The signature of bless is something like:


    method bless ($class: $candidate, *%_)

The 0 candidate indicates the built-in opaque type. If you're really strange in the head, you can think of the "0" as standing for "0paque". Or it's the "zero" object, about which we know zip. Whatever tilts your windmill...

In any event, strings are reserved for other object layouts. We could conceivably have things like:


    return $class.bless("Cstruct", *%_);

So as it happens, 0 is short for the layout "P6opaque".

Any additional arguments to .bless are automatically passed on to CREATE and BUILDALL. But note that these must be named arguments. It could be argued that the only real purpose for writing a .new constructor in Perl 6 is to translate different positional argument signatures into a unified set of named arguments. Any other initialization common to all constructors should be done within BUILD.

Oh, the invocant of .bless is either a class or an object of the class, but if you use an object of the class, the contents of that object are not automatically used to prototype the new object. If you wish to do that, you have to do it explicitly by copying the attributes:


    $obj.bless(0, *%$obj)

(That is just a specific application of the general principle that if you treat any object like a hash, it will behave like one, to the extent that it can. That is, %$obj turns the attributes into key/value pairs, and passes those as arguments to initialize the new object. Note that %$obj includes the private attributes when used inside the class, but not outside.)

Just because .bless allows an object to be used for a class doesn't mean your new constructor has to do the same. Some folks have philosophical issues with mixing up classes and objects, and it's fine to disallow that on the constructor level. In fact, you'll note that the default .new above requires a Class as its invocant. Unless you override it, it doesn't allow an object for the constructor invocant. Go thou and don't likewise.

The default cloner

Another good reason not to overload .new to do cloning is that Perl will also supply a default .clone routine that works something like this:


    multi method clone ($proto: *%_) {
        return $proto.bless(0, *%_, *%$proto);
    }

Note the order of the two hash arguments to bless. This gives the supplied attribute values precedence over the copied attribute values, so that you can change some of the attributes en passant, if you like. That's because we're passing the two flattened hashes as arguments to .bless and Perl 6's named argument binding mechanism always picks the first argument that matches, not the last. This is opposite of what happens when you use the Perl 5 idiom:


    %newvals = (%_, %$proto);

In that case, the last value (the one in %$proto) would "win".

CREATE


    submethod CREATE ($self: *%args) {...}

CREATE is called when you don't want to use an existing data structure as the candidate for your object. In general you won't define CREATE because the default CREATE does all the heavy magic to bring an opaque object into existence. But if you don't want an opaque object, and you don't care to write all your constructors to create the data structure before calling .bless, you can define your own CREATE submethod, and it will override the standard one for all constructors in the class.

BUILDALL


    submethod BUILDALL ($self: *%args) {...}

After the data structure is created, it must be populated by each of the participating classes (and roles) in the proper order. The BUILDALL method is called upon to do this. The default BUILDALL is usually correct, so you don't generally have to override it. In essence, it delegates the initialization of parent classes to the BUILDALL of the parent classes, and then it calls BUILD on the current class. In this way the pieces of the object are assembled in the correct order, from least derived to most derived.

For each class BUILDALL calls on, if the arguments contain a pair whose key is that class name, it passes the value of the pair as its argument to that class's BUILDALL. Otherwise it passes the entire list. (There's not much ambiguity there--most classes and roles will start with upper case, while most attribute names start with lower case.)

BUILD


    submethod BUILD ($self: *%args) {...}

That is the generic signature of BUILD from the viewpoint of the caller, but the typical BUILD routine declares explicit parameters named after the attributes:


    submethod BUILD (+$tail, +@legs, *%extraargs) {
        $.tail = $tail;
        @:legs = @legs;
        ...
    }

That occurs so frequently that there's a shorthand available in the signature declaration. You can put the attributes (distinguished by those secondary sigils, you'll recall) right into the signature. The following means essentially the same thing, without repeating the names:


    submethod BUILD (+$.tail, +@:legs, *%extraargs) {...}

It's actually unnecessary to declare the *%extraargs parameter. If you leave it out, it will default to *%_ (but only on methods and submethods--see the section on Interface Consistency later).

You may use this special syntax only for instance attributes, not class attributes. Class attributes should generally not be reinitialized every time you make a new object, after all.

If you do not declare a BUILD routine, a default routine will be supplied that initializes any attributes whose names correspond to the keys of the argument pairs passed to it, and leaves the other attributes to default to whatever the class supplied as the default, or undef otherwise.

In any event, the assignment of default attribute values happens automatically. For any attribute that is not otherwise initialized, the attribute declaration's "build" property is evaluated and the resulting value copied in to the newly created attribute slot. This happens logically at the end of the BUILD block, so we avoid running initialization closures unnecessarily. This implicit initialization is based not on whether the attribute is undefined, but on whether it was initialized earlier in BUILD. (Otherwise we could never explicitly create an attribute with an undefined value.)

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 12 for the latest information.

Eliminating Redundancy in Constructor Calls

If you say:


    my Dog $spot = Dog.new(...)

you have to repeat the type. That's not a big deal for a small typename, but sometime typenames are a lot longer. Plus you'd like to get rid of the redundancy, just because it's, like, redundant. So there's a variant on the dot operator that looks a lot like a dot assignment operator:


    my Dog $spot .= new(...)

It doesn't really quite fit the assignment operator rule though. If it did, it'd have to mean:


    my Dog $spot = $spot.new(...)

which doesn't quite work, because $spot is undefined. What probably happens is that the my cheats and puts a version of undef in there that knows it should dispatch to the Dog class if you call .self:new() on it. Anyway, we'll make it work one way or another, so that it becomes the equivalent of:


    my Dog $spot = Dog.new(...)

The alternative is to go the C++ route and make new a reserved word. We're just not going to do that.

Note that an attribute declaration of the form


    has Tail $wagger .= new(...)

might not do what you want done when you want it done, if what you want done is to create a new Dog object each time an object is built. For that you'd have to say:


    has Tail $wagger = { .new(...) }

or equivalently,


    has Tail $wagger will build { .new(...) }

But leaving aside such timing issues, you should generally think of the .= operator more as a variant on . than a variant on +=. It can, for instance, turn any non-mutating method call into a mutating method:


    @array.=sort;       # sort @array in place
    .=lc;               # lowercase $_ in place

This presumes, of course, that the method's invocant and return value are of compatible types. Some classes will wish to define special in-place mutators. The syntax for that is:


    method self:sort (Array @a is rw) {...}

It is illegal to use return from such a routine, since the invocant is automatically returned. If you do not declare the invocant, the default invocant is automatically considered "rw". If you do not supply a mutating version, one is autogenerated for you based on the corresponding copy operator.

Object Deconstruction

Object destruction is no longer guaranteed to be "timely" in Perl 6. It happens when the garbage collector gets around to it. (Though there will be ways to emulate Perl 5 end-of-scope cleanup.)

As with object creation, object destruction is recursive. Unlike creation, it must proceed in the opposite order.

DESTROYALL

The DESTROYALL routine is the counterpart to the BUILDALL routine. Similarly, the default definition is normally sufficient for the needs of most classes. DESTROYALL first calls DESTROY on the current class, and then delegates to the DESTROYALL of any parent classes. In this way the pieces of the object are disassembled in the correct order, from most derived to least derived.

DESTROY

As with Perl 5, all the memory deallocation is done for you, so you really only need to define DESTROY if you have to release external resources such as files.

Since DESTROY is the opposite of BUILD, if any attribute declaration has a "destroy" property, that property (presumably a closure) is evaluated before the main block of DESTROY. This happens even if you don't declare a DESTROY.

(The "build" and "destroy" traits are the only way for roles to let their preferences be made known at BUILD and DESTROY time. It follows that any role that does not define an attribute cannot participate in building and destroying except by defining a method that BUILD or DESTROY might call. In other words, stateless roles aren't allowed to muck around with the object's state. This is construed as a feature.)

Dispatch Mechanisms

Perl 6 supports both single dispatch (traditional OO) and multiple dispatch (also known as "multimethod dispatch", but we try to avoid that term).

Single Dispatch

Single dispatch looks up which method to run solely on the basis of the type of the first argument, the invocant. A single-dispatch call distinguishes the invocant syntactically (unlike a multiple-dispatch call, which looks like a subroutine call, or even an operator.)

Basically, anything can be an invocant as long as it fills the Dispatch role, which provides a .dispatcher method. This includes ordinary objects, class objects, and (in some cases) even varieties of undef that happen to know what class of thing they aren't (yet).

Simple single dispatch is specified with the dot operator, or its indirect object equivalent:


    $object.meth(@args)   # always calls public .meth
           .meth(@args)   # unary form
    meth $object: @args   # indirect object form

There are variants on the dot form indicated by the character after the dot. (None of these variants allows indirect object syntax.) The private dispatcher only ever dispatches to the current class or its proxies, so it's really more like a subroutine call in disguise:


    $object.:meth(@args)  # always calls private :meth
           .:meth(@args)  # unary form

It is an error to use .: unless there is a correspondingly named "colon" method in the appropriate class, just as it is an error to use . when no method can be found of that name. Unlike the .: operator, which can have only one candidate method, the . operator potentially generates a list of candidates, and allows methods in that candidate list to defer to subsequent methods in other classes until a candidate has been found that is willing to handle the dispatch.

In addition to the .: and .= operators, there are three other dot variants that can be used if it's not known how many methods are willing to handle the dispatch:


    $object.?meth(@args)  # calls method if there is one
           .?meth(@args)  # unary form
    $object.*meth(@args)  # calls all methods (0 or more)
           .*meth(@args)  # unary form
    $object.+meth(@args)  # calls all methods (1 or more)
           .+meth(@args)  # unary form

The .* and .+ versions are generally only useful for calling submethods, or methods that are otherwise expected to work like submethods. They return a list of all the successful return values. The .? operator either returns the one successful result, or undef if no appropriate method is found. Like the corresponding regex modifiers, ? means "0 or 1", while * means "0 or more", and + means "1 or more". Ordinary . means "exactly one". Here are some sample implementations, though of course these are probably implemented in C for maximum efficiency:


    # Implements . (or .? if :maybe is set).
    sub CALLONE ($obj, $methname, +$maybe, *%opt, *@args) {
        my $startclass = $obj.dispatcher() // fail "No dispatcher: $obj";
      METHOD:
        for WALKMETH($startclass, :method($methname), %opt) -> &meth {
            return meth($obj, @args);
        }
        fail qq(Can't locate method "$methname" via class "$startclass")
            unless $maybe;
        return;
    }

With this dispatcher you can continue by saying "next METHOD". This allows methods to "failover" to other methods if they choose not to handle the request themselves.


    # Implements .+ (or .* if :maybe is set).
    #   Add :force to redispatch in every class
    sub CALLALL ($obj, $methname, +$maybe, +$force, *%opt, *@args) {
        my $startclass = $obj.dispatcher() // fail "No dispatcher: $obj";
        my @results = gather {
            if $force {
              METHOD:
                for WALKCLASS($startclass, %opt) -> $class {
                    take $obj.::($class)::$methname(*@args) # redispatch
                }
            }
            else {
              METHOD:
                for WALKMETH($startclass, :method($methname), %opt) -> &meth {
                    take meth($obj,*@args);
                }
            }
        }
        return @results if @results or $maybe;
        fail qq(Can't locate method "$methname" via class "$startclass");
    }

This one you can quit early by saying "last METHOD". Notice that both of these dispatchers cheat by calling a method as if it were a sub. You may only do that by taking a reference to the method, and calling it as a subroutine, passing the object as the first argument. This is the only way to call a virtual method non-virtually in Perl. If you try to call a method directly as a subroutine, Perl will ignore the method, look for a subroutine of that name elsewhere, probably not find it, and complain bitterly. (Or find the wrong subroutine, and execute it, after which you will complain bitterly.)

We snuck in an example the new gather/take construct. It is still somewhat conjectural.

Calling Superclasses, and Not-So-Superclasses

Perl 5 supplies a pseudoclass, SUPER::, that redirects dispatch to a parent class's method. That's often the wrong thing to do, though, in part because under MI you may have more than one parent class, and also because you might have sibling classes that also need to have the given method triggered. Even if SUPER is smart enough to visit multiple parent classes, and even if all your classes cooperate and call SUPER at the right time, the depth first order of visitation might be the wrong order, especially under diamond inheritance. Still, if you know that your parent classes use SUPER, or you're calling into a language with SUPER semantics (such as Perl 5) then you should probably use SUPER semantics too, or you'll end up calling your parent's parents in duplicate. However, since use of SUPER is slightly discouraged, we Huffman code it a bit longer in Perl 6. Remember the *%opt parameters to the dispatchers above? That comes in as a parameterized pseudoclass called WALK.


    $obj.*WALK[:super]::method(@args)

That limits the call to only those immediate super classes that define the method. Note the star in the example. If you really want the Perl 5 semantics, leave the star out, and you'll only get the first existing parent method of that name. (Why you'd want that is beyond me.)

Actually, we'll probably still allow SUPER:: as a shorthand for WALK[:super]::, since people will just hack it in anyway if we don't provide it...

If you think about it, every ordinary dispatch has an implicit WALK modifier on the front that just happens to default to WALK[:canonical]. That is, the dispatcher looks for methods in the canonical order. But you could say WALK[:depth] to get Perl 5's order, or you could say WALK[:descendant] to get an order approximating the order of construction, or WALK[:ascendant] to get an order approximating the order of destruction. You could say WALK[:omit(SomeClass)] to call all classes not equivalent to or derived from SomeClass. For instance, to call all super classes, and not just your immediate parents, you could say WALK[:omit(::_)] to skip the current lexical class or anything derived from it.

But again, that's not usually the right thing to do. If your base classes are all willing to cooperate, it's much better to simply call


    $obj.method(@args)

and then let each of the implementations of the method defer to the next one when they're done with their part of it. If any method says "next METHOD", it automatically iterates the loop of the dispatcher and finds the next method to dispatch to, even if that method comes from a sibling class rather than a parent class. The next method is called with the same arguments as originally supplied.

That presupposes that the entire set of methods knows to call "next" appropriately. This is not always the case. In fact, if they don't all call next, it's likely that none of them does. And maybe just knowing whether or not they do is considered a violation of encapsulation. In any case, if you still want to call all the methods without their active cooperation, then use the star form:


    $obj.*method(@args)

Then the various methods don't have to do anything to call the next method--it happens automatically by default. In this case a method has to do something special if it wants to stop the dispatch. Naturally, that something is to call "last METHOD", which terminates the dispatch loop early.

Now, sometimes you want to call the next method, but you want to change the arguments so that the next method doesn't get the original argument list. This is done with deep magic. If you use the call keyword in an ordinary (nonwrapper) method, it steals the rest of the dispatch list from the outer loop and redispatches to the next method with the new arguments:


    @retvals = call(@newargs)
    return @retvals;

And unlike with "next METHOD", control returns to this method following the call. It returns the results of the subsequent method calls, which you should return so that your outer dispatcher can add them to the return values it already gathered.

Note that "next METHOD" and "last METHOD" can typically be spelt "next" and "last" unless they are in an inner loop.

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 12 for the latest information.

Parallel Dispatch

By default the various dot operators call a method on a single object, even if it ends up calling multiple methods for that object. Since a method call is essentially a unary postfix operator, however, you can use it as a hyper operator on a list of objects:


    @object».meth(@args)        # Call one for each or fail
    @object».?meth(@args)       # Call one for each if available
    @object».*meth(@args)       # Call all available for each
    @object».+meth(@args)       # Call one or more for each

Note that with the last two, if a method uses "last METHOD", it doesn't bomb out of the "hyper" loop, but just goes on to the next entry. One can always bomb out of the hyperloop with a real exception, of course. And maybe with "last HYPER", depending on how hyper's implicit iteration is implemented.

If you want to use an array for serial rather than parallel method calling, see Delegation, which lets you set up cascading handlers.

WALKCLASS and WALKMETH Caching

WALKCLASS generates a list of matching classes. WALKMETH generates a list of method references from matching classes.

The WALKCLASS and WALKMETH routines used in the sample dispatch code need to cache their results so that every dispatch doesn't have to traverse the inheritance tree again, but just consult the preconstructed list in order. However, if there are changes to any of the classes involved, then someone needs to call the appropriate cache clear method to make sure that the inheritance is recalculated.

WALKCLASS/WALKMETH options include some that specify ordering:


    :canonical      # canonical dispatch order
    :ascendant      # most-derived first, like destruction order
    :descendant     # least-derived first, like construction order
    :preorder       # like Perl 5 dispatch
    :breadth        # like multimethod dispatch

and some that specify selection criteria:


    :super              # only immediate parent classes
    :method(Str)        # only classes containing method declaration
    :omit(Selector)     # only classes that don't match selector
    :include(Selector)  # only classes that match selector

Note that :method(Str) selects classes that merely have methods declared, not necessarily defined. A declaration without a definition probably implies that they intend to autoload a definition, so we should call the stub anyway. In fact, Perl 6 differentiates an AUTOMETHDEF from AUTOLOAD. AUTOLOAD works as it does in Perl 5. AUTOMETHDEF is never called unless there is already a declaration of the stub (or equivalently, AUTOMETH faked a stub.)

It would be possible to just define everything in terms of WALKCLASS, but that would imply looking up each method name twice, once inside WALKCLASS to see if the method exists in the current class, and once again outside in order to call it. Even if WALKCLASS caches the cache list, it wouldn't cache the derived method list, so it's better to have a separate cache for that, controlled by WALKMETH, since that's the common case and has to be fast.

(Again, this is all abstract, and is probably implemented in gloriously grungy C code. Nevertheless, you can probably call WALKCLASS and WALKMETH yourself if you feel like writing your own dispatcher.)

Multiple Dispatch

Multiple dispatch is based on the notion that methods often mediate the relationships of multiple objects of diverse types, and therefore the first object in the argument list should not be privileged over other objects in the argument list when it comes to selecting which method to run. In this view, methods aren't subservient to a particular class, but are independent agents. A set of independent-minded, identically named methods use the class hierarchy to do pattern matching on the argument list and decide among themselves which method can best handle the given set of arguments.

The Perl approach is, of course, that sometimes you want to distinguish the first invocant, and sometimes you don't. The interaction of these two approaches gets, um, interesting. But the basic notion is to let the caller specify which approach is expected, and then, where it makes sense, fall back on the other approach when the first one fails. Underlying all this is the Principle of Least Surprise. Do not confuse this with the Principle of Zero Surprise, which usually means you've just swept the real surprises under some else's carpet. (There's a certain amount of surprise you can't go below--the Heisenberg Uncertainty Principle applies to software too.)

With traditional multimethods, all methods live in the same global namespace. Perl 6 takes a different approach--we still keep all the traditional Perl namespaces (lexical, package, global) and we still search for names the same way (outward through the lexical scopes, then the current package, then the global * namespace; or upward in the class hierarchy). Then we simply claim that, under multiple dispatch, the "long name" of any multi routine includes its signature, and that visibility is based on the long name. So an inner or derived multi only hides an outer or base multi of the same name and the same signature. (Routines not declared "multi" still hide everything in the traditional fashion.)

To put it another way, the multiple dispatch always works when both the caller and the callee agree that that's how it should work. (And in some cases it also works when it ought to work, even if they don't agree--sort of a "common law" multimethod, as it were...)

Declaration of Multiple Dispatch Routines

A callee agrees to the multiple dispatch "contract" by including the word "multi" in the declaration of the routine in question. It essentially says, "Ordinarily this would be a unique name, but it's okay to have duplicates of this name (the short name) that are differentiated by signatures (the long name)."

Looking at it from the other end, leaving the "multi" out says "I am a perfect match for any signature--don't bother looking any further outward or upward." In other words, the standard non-multi semantics.

You may not declare a multi in the same scope as a non-multi. However, as long as they are in different scopes, you can have a single non-multi inside a set of multis, or a set of multis inside a single non-multi. You can even have a set of multis inside a non-multi inside a set of multis. Indeed, this is how you hide all the outer multis so that only the inner multi's long names are considered. (And if no long name matches, you get the intermediate non-multi as a kind of backstop.) The same policy applies to both nested lexical scopes and derived subclasses.

Actually, up till now we've been oversimplifying the concept of "long name" slightly. The long name includes only that part of the signature up to the first colon. If there is no colon, then the entire signature is part of the long name. (You can have more colons, in which case the additional arguments function as tie breakers if the original set of long names is insufficient to prevent a tie.)

So sometimes we'll probably slip and say "signature" when we mean "long name". We pray your indulgence.

multi sub

A multi sub in any scope hides any multi sub with the same "long name" in any outer scope. It does not hide subs with the same short name but a different signature. Er, long name, I mean...

multi sub * (tradition multimethods)

If you want a multi that is visible in all namespaces (that don't hide the long name), then declare the name in the global name space, indicated in Perl 6 with a *. Most of the so-called "built-ins" are declared this way:


    multi sub *push (Array $array, *@args) {...}
    multi sub *infix:+ (Num $x, Num $y) returns Num {...}
    multi sub *infix:.. (Int $x, Int $y: Int ?$by) returns Ranger {...}

Note the use of colon in the last example to exclude $by as part of the long name. The range operator is dispatched only on the types of its two main arguments.

multi method

If you declare a method with multi, then that method hides any base class method with the same long name. It does not hide methods with the same short name but a different signature when called as a multimethod. (It does hide methods when called under single dispatch, in which case the first invocant is treated as the only invocant regardless of where you put the colon. Just because a method is declared with multi doesn't make it invisible to single dispatch.)

Unlike a regular method declaration, there is no implied invocant in the syntax of a multi method. A method declared as multi must declare all its invocants so that there's no ambiguity as to the meaning of the first colon. With a multi method, it always means the end of the long name. (With a non-multi, it always means that the optional invocant declaration is present.)

multi submethod

Submethods may be declared with multi, in which case visibility works the same as for ordinary methods. However, a submethod has the additional constraint that the first invocant must be an exact class match. Which effectively means that a submethod is first single dispatched to the class, and then the appropriate submethod within that class is selected, ignoring any other class's submethods of the same name.

multi rule

Since rules are just methods in disguise, you can have multi rules as well. (Of course, that doesn't do you a lot of good unless you have rules with different signatures, which is unusual.)

multi submethod BUILD

It is not likely that Perl 6.0.0 will support multiple dispatch on named arguments, but only on positional arguments. Since all the extra arguments to a BUILD routine come in as named arguments, you probably can't usefully multi a BUILD (yet). However, we should not do anything that precludes multiple BUILD submethods in the future. Which means we should probably enforce the presence of a colon before the first named argument declaration in any multi signature, so that the semantics don't suddenly change if and when we start supporting multiple dispatch that includes named arguments as part of the long name.

multi method constructors

To the extent that you declare constructors (such as .new) with positional arguments, you can use multi on them in 6.0.0.

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 12 for the latest information.

Calling Via Multiple Dispatch

As we mentioned, multiple dispatch is enabled by agreement of both caller and callee. From the caller's point of view, you invoke multiple dispatch simply by calling with subroutine call syntax instead of method call syntax. It's then up to the dispatcher to figure out which of the arguments are invocants and which ones are just options. (In the case where the innermost visible subroutine is declared non-multi, this degenerates to the Perl 5 semantics of subroutine calls.) This approach lets you refactor a simple subroutine into a more nuanced set of subroutines without changing how the subroutines are called at all. That makes this sort of refactoring drop-dead simple. (Or at least as simple as refactoring ever gets...)

It's a little harder to refactor between single dispatch and multiple dispatch, but a good argument could be made that it should be harder to do that, because you're going to have to think through a lot more things in that case anyway.

Anyway, here's the basic relationship between single dispatch and multiple dispatch. Single dispatch is more familiar, so we'll discuss multiple dispatch first.

Multiple dispatch semantics

Whenever you make a call using subroutine call syntax, it's a candidate for multiple dispatch. A search is made for an appropriate subroutine declaration. As in Perl 5, this search goes outward through the lexical scopes, then through the current package and on to the global namespace (represented in Perl 6 with an initial * for the "wildcard" package name). If the name found is not a multi, then it's a good old-fashioned sub call, and no multiple dispatch is done. End of story.

However, if the first declaration we come to is a multi, then lots of interesting stuff happens. (Fortunately for our performance, most of this interesting stuff can happen at compile time, or upon first use.) The basic idea is that we will collect a complete list of candidates before we decide which one to call.

So the search continues outward, collecting all sub declarations with the same short name but different long names. (We can ignore outer declarations that are hidden by an inner declaration with the same long name.) If we run into a scope with a non-multi declaration, then we're done generating our candidate list, and we can skip the next paragraph.

After going all the way out to the global scope, we then examine the type of the first argument as if we were about to do single dispatch on it. We then visit any classes that would have been single dispatched, in most-derived to least-derived order, and for each of those classes we add into our candidate list any methods declared multi, plus all the single invocant methods, whether or not they were declared multi! In other words, we just add in all the methods declared in the class as a subset of the candidates. (There are reasons for this that we'll discuss below.) Anyway, just as with nested lexical scopes, if two methods have the same long name, the more derived one hides the less derived one. And if there's a class in which the method of the same short name is not declared multi, it serves as a "stopper", just as a non-multi sub does in a lexical scope. (Though that "stopper" method can of course redispatch further up the inheritance tree, just as a "stopper" lexical sub can always call further outward if it wants to.)

Now we have our list of candidates, which may or may not include every sub and method with the same short name, depending on whether we hit a "stopper". Anyway, once we know the candidate list, it is sorted into order of distance from the actual argument types. Any exact match on a parameter type is distance 0. Any miss by a single level of derivation counts as a distance of 1. Any violation of a hard constraint (such as having too many arguments for the number of parameters, or violating a subtype check on a type that does constraint checking, or missing the exact type on a submethod) is effectively an infinite distance, and disqualifies the candidate completely.

Once we have our list of candidates sorted, we simply call the first one on the list, unless there's more than one "first one" on the list, in which case we look to see if one of them is declared to be the default. If so, we call it. If not, we die.

So if there's a tie, the default routine is in charge of subsequent behavior:


    # Pick next best at random...
    multi sub foo (BaseA $a, BaseB $b) is default {
        next METHOD;
    }

    # Give up at first ambiguity...
    multi sub bar (BaseA $a, BaseB $b) is default {
        last METHOD;
    }

    # Invoke my least-derived ancestor
    multi sub baz (BaseA $a, BaseB $b) is default {
        my @ambiguities = WALKMETH($startclass, :method('baz'))
            or last METHOD;
        pop(@ambiguities).($a, $b);
    }

    # Invoke most generic candidate (often a good fall-back)...
    multi sub baz (BaseA $a, BaseB $b) is default {
        my @ambiguities = @CALLER::methods or last METHOD;
        pop(@ambiguities).value.($a, $b);
    }

In many cases, of course, the default routine won't redispatch, but simply do something generically appropriate.

Single dispatch semantics

If you use the dot notation, you are explicitly calling single dispatch. By default, if single dispatch doesn't find a suitable method, it does a "failsoft" to multiple dispatch, pretending that you called a subroutine with the invocant passed as the first argument. (Multiple dispatch doesn't need to failsoft to single dispatch since all single dispatch methods are included as a subset of the multiple dispatch candidates anyway.)

This failsoft behavior can be modified by lexically scoped pragma. If you say


    use dispatch :failhard

then single dispatch will be totally unforgiving as it is in Perl 5. Or you can tell single dispatch to go away:


    use dispatch :multi

in which case all your dot notation is treated as a sub call. That is, any


    $obj.method(1,2,3)

in the lexical scope acts like you'd said:


    method($obj,1,2,3)

If single dispatch locates a class that defines the method, but the method in question turns out to be a set of one or more multi methods, then, the single dispatch fails immediately and a multiple dispatch is done, with the additional constraint that only multis within that class are considered. (If you wanted the first argument to do loose matching as well, you should have called it as a multimethod in the first place.)

Indirect objects

If you use indirect object syntax with an explicit colon, it is exactly equivalent to dot notation in its semantics.

However, one-argument subs are inherently ambiguous, because Perl 6 does not require the colon on indirect objects without arguments. That is, if you say:


    print $fh

it's not clear whether you mean


    $fh.print

or


    print($fh)

As it happens, we've defined the semantics so that it doesn't matter. Since all single invocant methods are included automatically in multimethod dispatch, and since multiple dispatch degenerates to single dispatch when there's only one invocant, it doesn't matter which way your write it. The effect is the same either way. (Unless you've defined your own non-multi print routine in a surrounding lexical scope. But then, if you've done that, you probably did it on purpose precisely because you wanted to disable the default dispatch semantics.)

Meaning of "next METHOD"

Within the context of a multimethod dispatch, "next METHOD" means to try the next best match, if unambiguous, or else the marked default method. From within the default method it means just pick the next in the list even if it's ambiguous. The dispatch list is actually kept in @CALLER::methods, which is a list of pairs, the key of each indicating the "distance" rating, and the value of each containing a reference to the method to call (as a sub ref).

Making Fiends, er, Friends.

If you want to directly access the attributes of a class, your multi must be declared within the scope of that class. Attributes are never directly visible outside a class. This makes it difficult to write an efficient multimethod that knows about the internals of two different classes. However, it's possible for private accessors to be visible outside your class under one condition. If your class declares that another class is trusted, that other class can see the private accessors of your class. If the other class declares that you are trusted, then you can see its private accessor methods. The trust relationship is not necessarily symmetrical. This lets you have an architecture where classes by and large don't trust each other, but they all trust a single well-guarded "multi-plexor" class that keeps everyone else in line.

The syntax for trusting another class is simply:


    class MyClass {
        trusts Yourclass;
        ...
    }

It's not clear whether roles should be allowed to grant trust. In the absence of evidence to the contrary, I'm inclined to say not. We can always relax that later if, after many large, longitudinal, double-blind studies, it turns out to be both safe and effective.

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 12 for the latest information.

Overloading

In Perl 5 overloading was this big special deal that had to have special hooks inserted all over the C code to catch various operations on overloaded types and do something special with them. In Perl 6, that just all falls out naturally from multiple dispatch. The only other part of the trick is to consider operators to be function calls in disguise. So in Perl 6 the real name of an operator is composed of a grammatical context identifier, a colon, and then the name of the operator as you usually see it. The common context identifiers are "prefix", "infix", "postfix", "circumfix", and "term", but there are others.

So when you say something like


    $x = <$a++ * -@b.[...]>;

you're really saying something like this:


    $x = circumfix:<>(
        infix:*(
            postfix:++($a),
            prefix:-(
                infix:.(
                    @b,
                    circumfix:[](
                        term:...();
                    )
                )
            )
        )
    )

Perl 5 had special key names representing stringification and numification. In Perl 6 these naturally fall out if you define:


    method prefix:+ () {...}    # what we do in numeric context
    method prefix:~ () {...}    # what we do in string context

Likewise you can define what to return in boolean context:


    method prefix:? () {...}    # what we do in boolean context

Integer context is, of course, just an ordinary method:


    method int () {...}         # what we do in integer context

These can be defined as normal methods since single-invocant multi subs degenerate to standard methods anyway. C++ programmers will tend to feel comfy defining these as methods. But others may prefer to declare them as multi subs for consistency with binary operators. In which case they'd look more like this:


    multi sub *prefix:+ (Us $us) {...}   # what we do in numeric context
    multi sub *prefix:~ (Us $us) {...}   # what we do in string context
    multi sub *prefix:? (Us $us) {...}   # what we do in string context
    multi sub *prefix:int (Us $us) {...} # what we do in integer context

Coercions to other classes can also be defined:


    multi sub *coerce:as (Us $us, Them ::to) { to.transmogrify($us) }

Such coercions allow both explicit conversion:


    $them = $us as Them;

as well as implicit conversions:


    my Them $them = $us;

Binary Ops

Binary operators should generally be defined as multi subs:


    multi sub infix:+ (Us $us, Us $ustoo) {...}
    multi sub infix:+ (Us $us, Them $them) is commutative {...}

The "is commutative" trait installs an additional autogenerated sub with the invocant arguments reversed, but with the same semantics otherwise. So the declaration above effectively autogenerates this:


    multi sub infix:+ (Them $them, Us $us) {...}

Of course, there's no need for that if the two arguments have the same type. And there might not actually be an autogenerated other subroutine in any case, if the implementation can be smart enough to simply swap the two arguments when it needs to. However it gets implemented, note that there's no need for Perl 5's "reversed arguments flag" kludge, since we reverse the parameter name bindings along with the types. Perl 5 couldn't do that because it had no control of the signature from the compiler's point of view.

See Apocalypse 6 for much more on the definition of user-defined operators, their precedence, and their associativity. Some of it might even still be accurate.

Class Composition with Roles

Objects have many kinds of relationships with other objects. One of the pitfalls of the early OO movement was to encourage people to model many relationships with inheritance that weren't really "isa" relationships. Various languages have sought to redress this deficiency in various ways, with varying degrees of success. With Perl 6 we'd like to back off a step and allow the user to define abstract relationships between classes without committing to a particular implementation.

More specifically, we buy the argument of the Traits paper that classes should not be used both to manage objects and to manage code reuse. It needs to be possible to separate those concerns. Since a lot of the code that people want to reuse is that which manages non-isa object relationships, that's what we should abstract out from classes.

That abstraction we are calling a role. Roles can encompass both interface and implementation of object relationships. A role without implementation degenerates to an interface. A role without interface degenerates to privately instantiated generics. But the typical role will provide both interface and at least a default implementation.

Unlike the Traits paper, we will allow state as part of our implementation. This is necessary if we are to abstract out the delegation decision. We feel that the decision to delegate rather than compose a sub-object is a matter of implementation, and therefore that decision should be encapsulated (or at least be allowed to be encapsulated) in a role. This allows you to refactor a problem by redefining one or more roles without having to doctor all the classes that make use of those roles. This is a great way to turn your huge, glorious "god object" into a cooperating set of objects that know how to delegate to each other.

As in the Traits paper, roles are composed at class construction time, and the class composer does some work to make sure the composed class is not unintentionally ambiguous. If two methods of the same name are composed into the same class, the ambiguity will be caught. The author of the class has various remedies for dealing with this situation, which we'll go into below.

From the standpoint of the typical user, a role just looks like a "smart" include of a "partial class". They're smart in that roles have to be well behaved in certain respects, but most of the time the naive user can ignore the power of the abstraction.

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 12 for the latest information.

Declaration of Roles

A role is declared much like a class, but with a role keyword instead:


    role Pet {
        method feed ($food) {
            $food.open_can();
            $food.put_in_bowl();
            .call();
        }
    }

A role may not inherit from a class. It may be composed of other roles, however. In essence, a role doesn't know its own type yet, because it will be composed into another type. So if you happen to make any mention of its main type (available as ::_), that mention is in fact generic. Therefore the type of $self is generic. Likewise if you refer to SUPER, the role doesn't know what the parent classes are yet, so that's also generic. The actual types are instantiated from the generic types when the role is composed into the class. (You can use the role name ("Pet") directly, but only in places where a role name is allowed as a type constraint, not in places that declare the type of an actual object.)

Just as the body of a class declaration is actually a method call on an instance of the MetaClass class, so too the body of a role declaration is actually a method call on an instance of the MetaRole class, which is like the MetaClass class, with some tweaks to manage Role objects instead of Class objects. For instance, a Role object doesn't actually support a dispatcher like a Class object.

MetaRole and MetaClass do not inherit from each other. More likely they both inherit from MetaModule or some such.

Parametric types

A role's main type is generic by default, but you can also parameterize other types explicitly:


    role Pet[Type $petfood = TableScraps] {
        method feed (::($petfood) $food) {...}
    }

Unlike certain other languages you may be altogether too familiar with, Perl uses square brackets for parametric types rather than angles. Within those square brackets it uses standard signature notation, so you can also use the arguments to pass initial values, for instance. Just bear in mind that by default any parameters to a role or class are considered part of the name of the class when instantiated. Inasmuch as instantiated type names are reminiscent of multimethod "long names", you may use a colon to separate those arguments that are to be considered part of the name from those that are just options.

Please note that these types can be as latent (or as non-latent) as you like. Remember that what looks like compile time to you is actually runtime to the compiler, so it's free to bind types as early or late as you tell it to, including not at all.

Interfaces

If a role merely declares methods without defining them, it degenerates to an interface:


    role Pet {
        method feed ($food) {...}
        method groom () {...}
        method scratch (+$where) {...}
    }

When such a role is included in a class, the methods then have to be defined by the class that uses the role. Actually, each method is on its own--a role is free to define default implementations for any subset of the methods it declares.
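
Just to make that concrete, here's a sketch of a class doing the Pet interface above (the method bodies are made up for illustration--they're whatever Dog needs them to be):


    class Dog does Pet {
        method feed ($food)      { $food.put_in_bowl() }
        method groom ()          { say "brush brush" }
        method scratch (+$where) { say "scratching $where" }
    }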

Private interfaces

If a role declares private accessors, those accessors are private to the class, not the role. The class must define any private implementations that are not supplied by the role, just as with public methods. But private method names are never visible outside the class (except to its trusted proxy classes).

Encapsulated attributes

Unlike in the Traits paper, we allow roles to have state. Which is a fancy way of saying that the role can define attributes, and methods that act on those attributes, not just methods that act only on other methods.


    role Pet {
        has $.collar = { Collar.new(Tag.new) };
        method id () { return $.collar.tag }
        method lose_collar () { undef $.collar }
    }

By the way, I think that when $.collar is undefined, calling .tag on it should merely return undef rather than throwing an exception (in the same way that @foo[$x][$y][$z] returns undef when @foo[$x] is undefined, and for the same reason). The undef object returned should, of course, contain an unthrown exception documenting the problem, so that if the undef is ever asked to provide a defined value, it can explain why it can't do so. Or if the returned value is tested by //, it can participate in the resulting error message.
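
As a sketch of what that would mean for the Pet role above (assuming $fido belongs to a class that does Pet):


    $fido.lose_collar();
    my $id = $fido.id();            # undef, carrying an unthrown exception
    say $id // "sorry, no tag";     # and // gets to explain what went wrong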

If you want to parameterize the initial value of a role attribute, be sure to put a colon if you don't want the parameter to be considered part of the long name:


    role Pet[IDholder $id: $tag] {
        has IDholder $.collar .= new($tag);
    }
    class Dog does Pet[Collar, DogLicense("fido")] {...}
    class Pigeon does Pet[LegBand, RacerId()] {...}
    my $dog = new Dog;
    my $pigeon = new Pigeon;

In which case the long names of the roles in question are Pet[Collar] and Pet[LegBand], and all of these are true:


    $dog.does(Dog)
    $dog.does(Pet)
    $dog.does(Pet[Collar])

but this is false:


    $dog.does(Pet[LegBand])

Anyway, where were we? Ah, yes, encapsulated attributes, which leads us to...

Encapsulated private attributes

We can also have private attributes:


    has Nose $:sniffer .= new();

And encapsulated private attributes lead us to...

Encapsulated delegation

A role can abstract the decision to delegate:


    role Pet {
        has $:groomer handles «bathe groom trim» = hire_groomer();
    }

Now when the Dog or Cat class incorporates the Pet role, it doesn't even have to know that the .groom method is delegated to a professional groomer. (See section on Delegation below.)
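
As a sketch (assuming some suitable hire_groomer() is in scope), the class composing the role never mentions the groomer at all:


    class Dog does Pet {...}

    my $dog = new Dog;
    $dog.groom();       # quietly forwarded to $:groomer.groom()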

Encapsulated inheritance

It gets worse. Since you can specify inheritance with an "is" declaration within a class, you can do the same with a role:


    role Pet {
        is Friend;
    }

Note carefully that this is not claiming that a Pet ISA Friend (though that might be true enough). Roles never inherit. So this is only saying that whatever animal takes on the role of Pet gets some methods from Friend that just happen to be implemented by inheritance rather than by composition. Probably Friend should have been written as a role, but it wasn't (perhaps because it was written in Some Other Language that runs on Parrot), and now you want to pretend that it was written as a role to get your project out the door. You don't want to use delegation because there's only one animal involved, and inheritance will work well enough till you can rewrite Friend in a language that supports role playing.

Of course, the really funny thing is that if you go across a language barrier like that, Perl might just decide to emulate the inheritance with delegation anyway. But that should be transparent to you. And if two languages manage to unify their object models within the Parrot engine, you don't want to suddenly have to rewrite your roles and classes.

And the really, really funny thing is that Parrot implements roles internally with a funny form of multiple inheritance anyway...

Ain't abstraction wonderful.

Use of Roles at Compile Time

Roles are most useful at compile time, or more precisely, at class composition time, the moment in which the MetaClass class is figuring out how to put together your Class object. Essentially, that's while the closure associated with your class is being executed, with a little extra happening before and after.

A class incorporates a role with the verb "does", like this:


    class Dog is Mammal does Pet does Sentry {...}

or equivalently, within the body of the class closure:


    class Dog {
        is Mammal;
        does Pet;
        does Sentry;
        ...
    }

There is no ordering dependency among the roles, so it doesn't matter above if Sentry comes before Pet. That is because the class just remembers all the roles and then meshes them after the closure is done executing.

Each role's methods are incorporated into the class unless there is already a method of that name defined in the class itself. A class's method definition hides any role definition of the same name, so role methods are second-class citizens. On the other hand, role methods are still part of the class itself, so they hide any methods inherited from other classes, which makes ordinary inherited methods third-class citizens, as it were.
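
Here's a sketch of that pecking order, with made-up methods:


    class Mammal { method speak () { say "generic mammal noise" } }
    role  Pet    { method speak () { say "hello, human" } }

    class Dog is Mammal does Pet {
        method speak () { say "woof" }  # the class method hides the role's,
    }                                   # which in turn hides Mammal's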

If there are no method name conflicts between roles (or with the class), then each role's methods can be installed in the class, and we're done. (Unless we wish to do further analysis of role interrelationships to make sure that each role can find the methods it depends on, in which case we can do that. But for 6.0.0 I'll be happy if non-existent methods just fail at runtime as they do now in Perl 5.)

If, however, two roles try to introduce a method of the same name (for some definition of name), then the composition of the class fails, and the compilation of the program blows sky high--we sincerely hope. It's much better to catch this kind of error at compile time if you can. And in this case, you can.
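
For instance, this sketch blows up at class composition time, because both roles claim the name .shake (it also sets up the example in the next section):


    role Pet    { method shake ($arg) { say "shaking hands" } }
    role Sentry { method shake ($arg) { say "shaking down a culprit" } }

    class Dog does Pet does Sentry {...}    # BOOM: two methods named .shake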

Conflict resolution

There are several ways to solve conflicts. The first is simply to write a class method that overrides the conflicting role methods, perhaps figuring out which role method to call. It is allowed to use the role name to select one of the hidden role methods:


    method shake ($self: $arg) {
        given $arg {
            when Culprit { $self.Sentry::shake($arg) }
            when Paw     { $self.Pet::shake($arg) }
        }
    }

So even though the methods were not officially composed into the class, they're still there--they're not thrown away.

That last example looks an awful lot like multiple dispatch, and in fact, if you declare the roles' methods with multi, they would be treated as methods with different "long names", provided their signatures were sufficiently different.

An interesting question, though, is whether the class can force two role methods that weren't declared "multi" to behave as if they were. Perhaps this can be forced if the class declares a signatureless multi stub without defining it later in the class:


    multi shake {...}

The Traits paper recommends providing ways of renaming or excluding one or the other of the conflicting methods. We don't recommend that, because it's better if you can keep both contracts through multiple dispatch to the role methods. However, you can force renaming or exclusion by pretending the role is a delegation:


    does Pet handles [ :myshake«shake», Any ];
    does Pet handles { $^name !~ "shake" };

Or something like that. (See the section on Delegation below.) If we can't get that to work right, you can always say something like:


    method shake { .Sentry::shake(@_) }    # exclude Pet::shake
    method handshake { .Pet::shake(@_) }   # rename Pet::shake

In many ways that's clearer than trying to attach a selection syntax to "does".

Use of Roles at Runtime (mixins)

While roles are at their most powerful at compile time, they can also function as mixin classes at runtime. The "does" binary operator performs the feat of deriving a new class and binding the object to it:


    $fido does Sentry

Actually, it only does this if $fido doesn't already do the Sentry role. If it does already, this is basically a no-op. The does operator works on the object in place. It would be illegal to say, for instance,


    0 does true

The does operator returns the object so you can nest mixins:


    $fido does Sentry does Tricks does TailChasing does Scratch;

Unlike the compile-time role composition, each of these layers on a new mixin with a new level of inheritance, creating a new anonymous class for dear old Fido, so that a .chase method from TailChasing hides a .chase method from Sentry.

(Do not confuse the binary does with the unary does that you use inside a class definition to pull in a role.)

In contrast to does, the but operator works on a copy. So you can say:


    0 but true

and you get a mixin based on a copy of 0, not the original 0, which everyone shares. One other wrinkle is that "true" isn't, in fact, a class name. It's an enumerated value of the bit-based bool type. So what we said was a shorthand for something like:


    0 but bool::true

In earlier Apocalypses we talked about applying properties with but. This has now been unified with mixins, so any time you say:


    $value but prop($x)

you're really doing something more like


    $tmp = $value;      # make a copy
    $tmp does SomeRole; # guarantee there's a rw .prop method
    $tmp.prop = $x;     # set the prop method

And therefore a property is defined by a role like this:


    role SomeRole {
        has SomeType $.prop is rw = 1;
    }

This means that when you mention "prop" in your program, something has to know how to map that to the SomeRole role. That would often be something like an enum declaration. It's illegal to use an undeclared property. But sometimes you just want a random old property for which the role has the same name as the property. You can declare one with


    my property answer;

and that essentially declares a role that looks something like


    my role answer {
        has $.answer is rw = 1;
    }

Then you can say


    $a = 0 but answer(42)

and you have an object of an anonymous type that "does" answer, and that includes a .answer accessor of the same name, so that if you call $a.answer, you'll get back 42. But $a itself has the value 0. Since the accessor is "rw", you can also say


    $a.answer = 43;

There's a corresponding assignment operator:


    $a but= tainted;

That avoids copying $a before tainting it. It basically means the same thing as


    $a does taint::tainted

For more on enumerated types, see Enums below.

Traits

Here we're talking about Perl's traits (as in compile-time properties), not Traits (as in the Traits paper).

Traits can be thought of as roles gone wrong. Like roles, they can function as straightforward mixins on container objects at compile time, but they can also cheat, and frequently do. Unlike roles, traits are not constrained to play fair with each other. With traits, it's both "first come, first served", and "he who laughs last laughs best". Traits are applied one at a time to their container victim, er, object, and an earlier trait can throw away information required by a later trait. Contrariwise, a later trait can overrule anything done by an earlier trait--except of course that it can't undestroy information that has been totally forgotten by the earlier trait.

You might say that "role" is short for "role model", while "trait" is short for "traitor". In a nutshell, roles are symbiotes, while traits are parasites. Nevertheless, some parasites are symbiotic, and some symbiotes are parasitic. Go figure...

All that being said, well-behaved traits are really just roles applied to declared items like containers or classes. It's the declaration of the item itself that makes traits seem more permanent than ordinary properties. The only reason we call them "traits" rather than "properties" is to continually remind people that they are, in fact, applied at compile time. (Well, and so that we can make bad puns on "traitor".)

Even ill-behaved traits should add an appropriately named role to the container, however, in case someone wants to look at the metadata properties of the container.

Traits are generally inflicted upon the "traitee" with the "is" keyword, though other modalities are possible. When the compiler sees words like "is" or "will" or "returns" or "handles", or special constructs like signatures and body closures, it calls into an associated trait handler, which applies the role to the item as a mixin, and also does any other traitorous magic that needs doing.

To define a trait handler for an "is xxx" trait, define one or more multisubs into a property role like this:


    role xxx {
        has Int $.xxx;
        multi sub trait_auxiliary:is(xxx $trait, Class $container: ?$arg) {...}
        multi sub trait_auxiliary:is(xxx $trait, Any $container: ?$arg) {...}
    }

Then it can function as a trait. A well-behaved trait handler will say


    $container does xxx($arg);

somewhere inside to set the metadata on the container correctly. Then not only can you say


    class MyClass is xxx(123) {...}

but you'll also be able to say



    if MyClass.meta.xxx == 123 {...}

Since a class can function as a role when it comes to parameter type matching, you can also say:


    class MyBase {
        multi sub trait_auxiliary:is(MyBase $base, Class $class: ?$arg) {...}
        multi sub trait_auxiliary:is(MyBase $tied, Any $container: ?$arg) {...}
    }

These let MyBase capture control of how it gets used by any class or container. But usually you can just let it call the generic defaults:


    multi sub *trait_auxiliary:is(Class $base, Class $class: ?$arg) {...}

which adds $base to the "isa" list of $class, or


    multi sub *trait_auxiliary:is(Class $tied, Any $container: ?$arg) {...}

which sets the "tie" type of the container to the implementation type in $tied.

In any event, if the trait supplies the optional argument, that comes in as $arg. (It's probably something unimportant, like the function body...) Note that unlike "pair options" such as ":wag", traits do not necessarily default to the value 1 if you don't supply the argument. This is consistent with the notion that traits don't generally do something passive like setting a value somewhere, but something active like totally screwing up the structure of your container.

Most traits are introduced by use of a "helping verb", which could be something like "is", or "will", or "can", or "might", or "should", or "does". We call these helping verbs "trait auxiliaries". Here's "will", which (being syntactic sugar) merely delegates back to "is":


    multi sub *trait_auxiliary:will($trait, $container: &arg) {
        trait_auxiliary:is($trait, $container, &arg);
    }

Note the declaration of the argument as a non-optional reference to a closure. This is what allows us to say:


    my $dog will eat { anything() };

rather than having to use parens:


    my $dog is eat({ anything() });

Other traits are applied with a single word, and we call one of those a "trait verb". For instance, the "returns" trait described in Apocalypse 6 is defined something like this:


    role returns {
        has ReturnType $.returns;
        multi sub trait_verb:returns($container: ReturnType $arg) {
            $container does returns($arg);
        }
        ...
    }

Note that the argument is not optional on "returns".

Earlier we defined the xxx trait using multi sub definitions:


    role xxx {
        has Int $.xxx;
        multi sub trait_auxiliary:is(xxx $trait, Class $container: ?$arg) {...}
        multi sub trait_auxiliary:is(xxx $trait, Any $container: ?$arg) {...}
    }

This is one of those situations in which you may really want single-dispatch methods:


    role xxx {
        has Int $.xxx;
        method trait_auxiliary:is(xxx $trait: Class $container, ?$arg) {...}
        method trait_auxiliary:is(xxx $trait: Any $container, ?$arg) {...}
    }

Some traits are control freaks, so they want to make sure that anything mentioning them comes through their control. They don't want something dispatching to another trait's trait_auxiliary:is method just because someone introduced a cute new container type they don't know about. That other trait would just mess things up.

Of course, if a trait is feeling magnanimous, it should just go ahead and use multi subs. Since the multi-dispatcher takes into account single-dispatch methods, and the distance of an exact match on the first argument is 0, the dispatcher will generally respect the wishes of both the paranoid and the carefree.

Note that we included "does" in our list of "helping verbs". Roles actually implement themselves using the trait interface, but the generic version of trait_auxiliary:does defaults to doing proper roley things rather than proper classy things or improper traitorous things. So yes, you could define your own trait_auxiliary:does and turn your nice role traitorous. That would be...naughty.

But apart from how you typically invoke them, traits and roles are really the same thing. Just like the roles on which they're based, you may neither instantiate nor inherit from a trait. You may, however, use their names as type constraints on multimethod signatures and such. As with well-behaved roles, they should define attributes or methods that show up as metadata properties where that's appropriate. Unlike compile-time roles, which all flatten out in the same class, compile-time traits are applied one at a time, like mixin roles. You can, in fact, apply a trait to a container at runtime, but if you do, it's just an ordinary mixin role. You have to call the appropriate trait_auxiliary:is() routine yourself if you want it to do any extra shenanigans. The compiler won't call it for you at runtime like it would at compile time.

When you define a helping verb such as "is" or "does", that not only makes it a postfix operator for declarations, but also a unary operator within class and role closures. Likewise, declarative closure blocks like BEGIN and INIT are actually trait verbs, albeit ones that can add multiple closures to a queue rather than adding a single property. This implies that something like


    sub foo {
        LEAVE {...}
        ...
    }

could (except for scoping issues) equivalently be written:


    sub foo LEAVE {...} {
        ...
    }

Though why you'd want to do that, I don't know. Hmm, if we really generalize trait verbs like that, then you could also write things like:


    sub foo {
        is signature ('int $x');
        is cached;
        returns Int;
        ...
    }

That's getting a little out there. Maybe we won't generalize it quite that far...

Delegation

Delegation is the art of letting someone else do your work for you. The fact that you consider it "your" work implies that delegation is actually a means of taking credit in advance for what someone else is going to do. In terms of objects, it means pretending that some other object's methods are your own. Now, as it happens, you can always do that by hand simply by writing your own methods that call out to another object's methods of the same name. So any shorthand for doing that is pure syntactic sugar. That's what we're talking about here.

Delegation in this sugary sense always requires there to be an attribute to keep a reference to the object we're delegating to. So our syntactic relief will come in the form of annotations on a "has" declaration. We could have decided to instead attach annotations to each method declaration associated with the attribute, but by the time you do this, you've repeated so much information that you almost might as well have written the non-sugary version yourself. I know that for a fact, because that's how I originally proposed it. :-)

Delegation is specified by a "handles" trait verb with an argument specifying one or more method names that the current object and the delegated object will have in common:


    has $:tail handles 'wag';

Since the method name (but nothing else) is known at class construction time, the following .wag method is autogenerated for you:


    method wag (*@args is context(Lazy)) { $:tail.wag(*@args) }

(It's necessary to specify a Lazy context for the arguments to such a delegator method because the actual signature is supplied by the tail's .wag method, not your method.) So as you can see, the delegation syntax already cuts our typing in half, not to mention the reading. The win is even greater when you specify multiple methods to delegate:


    has $:legs handles «walk run lope shake pee»;

Or equivalently:


    has $:legs handles ['walk', 'run', 'lope', 'shake', 'pee'];

You can also say things like


    my @legmethods := «walk run lope shake pee»;
    has $:legs handles (@legmethods);

since the "has" declaration is evaluated at class construction time.

Of course, it's illegal to call the outer method unless the attribute has been initialized to an object of a type supporting the method. So a declaration that makes a new delegatee at object build time might be specified like this:


    has $:tail handles 'wag' will build { Tail.new(*%_) };

or, equivalently,


    has $:tail handles 'wag' = { Tail.new(*%_) };

This automatically performs


    $:tail = Tail.new(*%_);

when BUILD is called on a new object of the current class (unless BUILD initializes $:tail to some other value). Or, since you might want to declare the type of the attribute without duplicating it in the default value, you can also say


    has Tail $:tail handles 'wag' = { .new(*%_) };

or


    has Tail $:tail handles 'wag' will build { .new(*%_) };

Note that putting a Tail type on the attribute does not necessarily mean that the method is always delegated to the Tail class. The dispatch is still based on the runtime type of the object, not the declared type. So


    has Tail $:tail handles 'wag' = { LongTail.new(*%_) };

delegates to the LongTail class, not the Tail class. Of course, you'll get an exception at build time if you try to say:


    has Tail $:tail handles 'wag' = { Dog.new(*%_) };

since Dog is not derived from Tail (whether or not the tail can wag the dog).

We declare $:tail as a private attribute here, but $.tail would have worked just as well. A Dog's tail does seem to be a public interface, after all. Kind of a read-only accessor.

Wildcard Delegation

We've seen that the argument to "handles" can be a string or a list of strings. But any argument or subargument that is not a string is considered to be a smartmatch selector for methods. So you can say:


    has $:fur handles /^get_/;

and then you can do the .get_wet or .get_fleas methods (presuming there are such), but you can't call the .shake or .roll_in_the_dirt methods. (Obviously you don't want to delegate the .shake method since that means something else when applied to the Dog as a whole.)

If you say


    has $:fur handles Groomable;

then you get only those methods available via the Groomable role or class.

Wildcard matches are evaluated only after it has been determined that there's no exact match to the method name. They therefore function as a kind of autoloading in the overall pecking order. If the class also has an AUTOLOAD, it is called only if none of the wildcard delegations match. (An AUTOMETHDEF is called much earlier, since it knows from the stub declarations whether there is supposed to be a method of that name. So you can think of explicit delegation as a kind of autodefine, and wildcard delegation as a kind of autoload.)

When you have multiple wildcard delegations to different objects, it's possible to have a conflict of method names. Wildcard method matches are evaluated in order, so the earliest one wins. (Non-wildcard method conflicts can be caught at class composition time.)
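
A sketch of that ordering (with a second, made-up furbearing attribute):


    has $:fur  handles /^get_/;     # wildcard tried first
    has $:skin handles /^get_/;     # only sees what $:fur didn't handle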

Renaming Delegated Methods

If, where you would ordinarily specify a string, you put a pair, then the pair maps the method name in this class to the method name in the other class. If you put a hash, each key/value pair is treated as such a mapping. Such mappings are not considered wildcards.


    has $:fur handles { :shakefur«shake» :scratch«get_fleas» };

Perhaps that reads better with the old pair notation:


    has $:fur handles { shakefur => 'shake', scratch => 'get_fleas' };

You can do a wildcard renaming, but not with pairs. Instead, do a smartmatch with a substitution:


    has $:fur handles (s/^furget_/get_/);

As always, the left-to-right mapping is from this class to the other one. The pattern matching is working on the method name passed to us, and the substituted method name is used on the class we delegate to.

Delegation Without an Attribute

Ordinarily delegation is based on an attribute holding an object reference, but there's no reason in principle why you have to use an attribute. Suppose you had a Dog with two tails. You can delegate based on a method call:


    method select_tail handles «wag hang» {...}

The arguments are sent to both the delegator and delegatee method. So when you call


    $dog.wag(:fast)

you're actually calling


    $dog.select_tail(:fast).wag(:fast)

If you use a wildcard delegation based on a method, you should be aware that it has to call the method before it can even decide whether there's a valid method call to the delegatee or not. So it behooves you not to get too fancy with select_tail(), since it might just have to throw all that work away and go on to the next wildcard specification.

Delegation of Handlers

If your delegation object happens to be an array:


    has @:handlers handles 'foo';

then something cool happens. <cool rays> In this case Perl 6 assumes that your array contains a list of potential handlers, and you just want to call the first one that succeeds. This is not considered a wildcard match unless the "handles" argument forces it to be.
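
So, given the declaration above, a call to .foo works its way down the array (a sketch of the intended behavior):


    $dog.foo(@args);    # tries @:handlers[0].foo(@args), then
                        # @:handlers[1].foo(@args), and so on,
                        # stopping at the first one that succeeds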

Note that this is different from the semantics of a hyper method such as @objects».foo(), which will try to call the method on every object in @objects. If you want to do that, you'll just have to write your own method:


    has @:ears;
    method twitchears () { @:ears».twitch() }

Life is hard.

Hash-Based Redispatch

If your delegation object happens to be a hash:


    has %:objects handles 'foo';

then the hash provides a mapping from the string value of "self" to the object that should be delegated to:


    has %:barkers handles "bark" =
                (Chihuahua => $yip,
                    Beagle => $yap,
                   Terrier => $arf,
                 StBernard => $woof,
                );
    method prefix:~ () { return "$.breed" }

If the string is not found in the hash, a "next METHOD" is automatically performed.

Again, this construct is not necessarily considered a wildcard. In the example above we know for a fact that there's supposed to be a .bark method somewhere, therefore a specific method can be autogenerated in the current class.

Relationship to Roles

Delegation is a means of including a set of methods into your class. Roles can also include a set of methods in your class, but the difference is that what a role includes happens at class composition time, while delegation is much more dynamic, depending on the current state of the delegating attribute (or method).

But there's no reason you can't have your cake and eat it too, because roles are specifically designed to allow you to pull in delegations without the class even being aware of the fact that it's delegating. When you include a role, you're just signing up for a set of methods, with maybe a little state thrown in. You don't care whether those methods are defined directly, or indirectly. The role manages that.

In fact, this is one of the primary motivators for including roles in the design of Perl 6. As a named abstraction, a role lets you refactor all the classes using that role without changing any of the classes involved. You can turn your single "god" object into a set of nicely cooperating objects transparently. Well, you have to do the composition using roles first, and that's not transparent.

Note that all statically named methods are dispatched before any wildcard methods, regardless of whether the methods came from a role or the class itself. (Inherited methods also come before wildcard methods because we order all the cachable method dispatches before all the non-cachable ones. But see below.) So the lookup order is:

  1. This class's declared methods (including autodefs and delegations)
  2. An included role's declared methods (including autodefs and delegations)
  3. Normal inherited methods (including autodefs and delegations of the parent class)
  4. Wildcard delegated methods in this class (or failing that, from any inherited class that does wildcard delegations)
  5. Methods autoloaded by an autoloader defined in this class (or failing that, an autoloader from any inherited class)

Note that any method that is stubbed (declared but not yet defined) in steps 1 or 2 skips straight to step 4, because it means this class thinks it "owns" a method of that name. (At this point Perl 5 would skip straight to step 5, but Perl 6 still wants to do wildcard delegation before falling back on inherited autoloading.)

Anonymous Delegation for ISA Emulation

When you inherit from a class with a different layout policy, Perl has to emulate inheritance via anonymous delegation. In this case it installs a wildcard delegation for you. According to the list above, this gives precedence to all methods with the same layout policy over all methods with a different layout policy. This might be a feature, especially when calling cross-language. Then again, maybe it isn't.

There is no "has" variable for such an anonymous delegation. Its delegated object is stored as a property on the class's entry in the ISA list, probably. (Or we could autogenerate an attribute whose name is related to the class name, I suppose.)

Since one of the primary motivations for allowing this is to make it possible to call back and forth between Perl 5 and Perl 6 objects, we need to make that as transparent as possible. When a Perl 6 object inherits from a Perl 5 object, it is emulated with delegation. The invocant passed into the Perl 5 (Ponie) object looks like a Perl 5 object to Perl 5. However, if the Perl 5 object passes that as an invocant back into Perl 6, it has to go back to looking like a Perl 6 object to Perl 6, or our emulation of inheritance is suboptimal. When a Ponie object accesses its attributes through what it thinks is a hash reference, it really has to call the appropriate Perl 6 accessor function if the object comes from Perl 6. Likewise, when Perl 6 calls an accessor on a Perl 5 object, it has to translate that method call into a hash lookup--presuming that the Perl 5 object is implemented as a blessed hash.

Other language boundaries may or may not do similar tricks. Python's attributes suffer from the same misdesign as Perl 5's attributes. (My fault for copying Python's object model. :-) So that'd be a good place for a similar policy.

So we can almost certainly emulate inheritance with delegation, albeit with some possible misordering of classes if there are duplicate method names. However, the hard part is constructing objects. Perl 5 doesn't enforce a policy of named arguments for its constructors, so it is difficult for a Perl 6 BUILDALL routine to have any automatic way to call a Perl 5 constructor. It's tempting to install glue code into the Perl 6 class that will do the translation, but that's really not a good idea, because someday the Perl 5 class may eventually get translated to a Perl 6 class, and your glue code will be useless, or worse.

So the right place to put the glue is actually back into the Perl 5 class. If a Perl 5 class defines a BUILD subroutine, it will be assumed that it properly handles named pairs in Perl 5's even/odd list format. That will be used in lieu of any predefined constructor named "new" or anything else.

If there is no BUILD routine in the Perl 5 package, but there is a "use fields" declaration, then we can autogenerate a rudimentary BUILD routine that should suffice for most scalar attributes.

Types and Subtypes

I've always really liked the Ada distinction between types and subtypes. A type is something that adds capabilities, while a subtype is something that takes away capabilities. Classes and roles generally function as types in Perl 6. In general you don't want to make a subclass that, say, restricts your integers to only even numbers, because then you've violated Liskov substitutability. In the same way that we force role composition to be "before" classes, we will force subtyping constraints to be "after" classes. In both cases we force it by a declarator change so that you are unlikely to confuse a role with a class, or a class with a subtype. And just as you aren't allowed to derive a role from a class, you aren't allowed to derive a class from a constrained type.

On the other hand, a bit confusingly, it looks like subtyping will be done with the "type" keyword, since we aren't using that word yet.

To remind people that a subtype of a class is just a constrained alias for the class, we avoid the "is" word and declare a type using a ::= compile-time alias, like this:


    type Str_not2b ::= Str where /^[isnt|arent|amnot|aint]$/;

The ::= doesn't create the type, nor in fact does the type keyword. It's actually the where that creates the type. The type keyword just marks the name as "not really a classname" so that you don't accidentally try to derive from it.

Since a type is "post-class-ical", there's really no such thing as an object blessed into a type. If you try it, you'll just end up with an object blessed into whatever the underlying unconstrained class is, as far as inheritance is concerned. A type is not a subclass. A type is primarily a handy way of sneaking smartmatching into multiple dispatch. Just as a role allows you to specify something more general than a class, a type allows you to specify something more specific than a class.

While types are primarily intended for restricting parameter types for multiple dispatch, they also let you impose preconditions on assignment. Basically, if you declare any container with a subtype, Perl will check the constraint against any value you might try to bind or assign to the container.


    type Str_not2b ::= Str where /^[isnt|arent|amnot|aint]$/;
    type EvenNum   ::= Num where { $^n % 2 == 0 }
                                                                            
    my Str_not2b $hamlet;
    $hamlet = 'isnt';   # Okay because 'isnt' ~~ /^[isnt|arent|amnot|aint]$/
    $hamlet = 'amnt';   # Bzzzzzzzt!   'amnt' !~ /^[isnt|arent|amnot|aint]$/
                                                                            
    my EvenNum $n;
    $n = 2;             # Okay
    $n = -2;            # Okay
    $n = 0;             # Okay
    $n = 3;             # Bzzzzzzzt

It's perfectly legal to base one subtype on another. It merely adds an additional constraint.
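
For example, building on the EvenNum type above (a sketch):


    type SmallEvenNum ::= EvenNum where { $^n < 100 };

    my SmallEvenNum $m;
    $m = 42;            # Okay, even and small
    $m = 102;           # Bzzzzzzzt, even but fails the added constraint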

It's possible to use an anonymous subtype in a signature:


    use Rules::Common :profanity;

    multi sub mesg (Str where /<profanity>/ $mesg is copy) {
        $mesg ~~ s:g/<profanity>/[expletive deleted]/;
        print $MESG_LOG: $mesg;
    }
                                                                            
    multi sub mesg (Str $mesg) {
        print $MESG_LOG: $mesg;
    }

Given a set of multimethods that would "tie" on the actual classes of the arguments, a multimethod with a matching constraint will be preferred over an equivalent one with no constraint. So the first mesg above is preferred if the constraint matches, and otherwise the second is preferred. However, if two multis with constraints match (and are otherwise equivalent), it's just as if you'd called any other set of ambiguous multimethods, and one of them had better be marked as the default, or you die.

We say that types are "post-class-ical", but since you can base them off of any class including Any, they are actually rather orthogonal to the class system.

Enums

An enum functions as a subtype that is constrained to a single value. (When a subtype is constrained to a single value, it can be used for that value.) But rather than declaring it as:


    type DayOfWeek ::= Int where 0..6;
    type DayOfWeek::Sunday    ::= DayOfWeek where 0;
    type DayOfWeek::Monday    ::= DayOfWeek where 1;
    type DayOfWeek::Tuesday   ::= DayOfWeek where 2;
    type DayOfWeek::Wednesday ::= DayOfWeek where 3;
    type DayOfWeek::Thursday  ::= DayOfWeek where 4;
    type DayOfWeek::Friday    ::= DayOfWeek where 5;
    type DayOfWeek::Saturday  ::= DayOfWeek where 6;

we allow a shorthand:


    type DayOfWeek ::= int enum
        «Sunday Monday Tuesday Wednesday Thursday Friday Saturday»;

Type int is the default enum type, so that can be:


    type DayOfWeek ::= enum
        «Sunday Monday Tuesday Wednesday Thursday Friday Saturday»;

The enum installer inspects the strings you give it for things that look like pairs, so to number your days from 1 to 7, you can say:


    type DayOfWeek ::= enum
        «:Sunday(1) Monday Tuesday Wednesday Thursday Friday Saturday»;

You can import individual enums into your scope where they will function like argumentless constant subs. However, if there is a name collision with a sub or other enum, you'll have to disambiguate. Unambiguous enums may be used as a property on the right side of a "but", and the enum type can be intuited from it to make sure the object in question has the right semantics mixed in. Two builtin enums are:


    type bool ::= bit enum «false true»;
    type taint ::= bit enum «untainted tainted»;
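
As a sketch of the "enum as a property after but" idea above (assuming the DayOfWeek values have been imported, and $date is some hypothetical object):


    my $meeting = $date but Tuesday;    # works like "0 but true",
                                        # with DayOfWeek intuited from Tuesday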

Open vs. Closed Classes

By default, classes in Perl are left open. That is, you can add more methods to them, though you have to be explicit that that is what you're doing:


    class Object is extended {
        method wow () { say "Wow, I'm an object." }
    }

Otherwise you'll get a class redefinition error.

A closed class, by contrast, is one that you know will never be extended or modified at runtime. Likewise, a "final" class (to use the Java term) is one that you know will never be derived from, let alone mucked with internally.

Now, it so happens that leaving all your classes open is not terribly conducive to certain kinds of optimization (let alone encapsulation). From the standpoint of the compiler, you'd like to be able to say, "I know this class will never be derived from or modified, so I can do things like access my attributes directly without going through virtual accessors." We were, in fact, tempted to make closed classes the default. But this breaks in frameworks like mod_perl where you cannot predict in advance which classes will want to be extended or derived from.

Some languages solve this (or think they solve it) by letting classes declare themselves to be closed and/or final. But that's actually a bad violation of OO principles. It should be the users of a class that decide such things--and decide it for themselves, not for others. As such, there has to be a consensus among all users of a class to close or finalize it. And as we all know, consensus is difficult to achieve.

Nevertheless, the Perl 6 approach is to give the top-level application the right to close (and finalize) classes. But we don't do this by simply listing the classes we want to close. Instead, we use the sneaky strategy of switching the default to closed and then list the classes we want to stay open.

The benefit of this is that modules other than the top level can simply list all the classes that they know should stay open. In an open framework, these are, at worst, no-ops, and they don't cause classes to close that other modules might want to remain open. If any module requests a class to stay open, it stays open. If any module requests that a class remain available as a base class, it remains available.

It has been speculated that optimizer technology in Parrot will develop such that a class can conjecturally be compiled as closed, and then recompiled as open should the need arise. (This is just a specific case of the more general problem of what you do whenever the assumptions of the optimizer are violated.) If we get such an on-the-fly optimizer/pessimizer, then our open class declarations are still not wasted--they will tell the optimizer which classes not to bother trying to close or finalize in the first place. Setting the default the other way wouldn't have the same benefit.

Syntax? You want syntax? Hmm.


    use classes :closed :open«Mammal Insect»;

Or some such. Maybe certain kinds of class reference automatically request the class to be open without a special pragma. A module could request open classes without attempting to close everything with just:


    use classes :open«Mammal Insect»;

On the other hand, maybe that's another one of those inside-out interfaces, and it should just be options on the classes whose declarations you have to include anyway:


    use classes :closed;
    class Mammal is open {...}
    class Insect is open {...}

Similarly, we can finalize classes by default and then "take it back" for certain classes:


    use classes :final;
    class Mammal is base {...}
    class Insect is base {...}

In any event, even though the default is expressed at the top of the main application, the final decision on each class is not made by the compiler until CHECK time, when all the compiled code has had a chance to stake its claims. (A JIT compiler might well wait even longer, in case runtime evaluated code wishes to express an opinion.)

Interface Consistency

In theory, a subclass should always act as a more specialized version of a superclass. In terms of the design-by-contract theory, a subclass should OR in its preconditions and AND in its postconditions. In terms of Liskov substitutability, you should always be able to substitute a derived class object in where a base class object is expected, and not have it blow up. In terms of Internet policy, a derived class (compared to its base class) should be at least as lenient in what it accepts, and at least as strict in what it emits.

So, while it would be lovely in a way to require that derived methods of the same name as a base method must use the same signature, in practice that doesn't work out. A derived class often has to be able to add arguments to the signature of a method so that it can "be more lenient" in what it accepts as input.

But this poses a problem, insofar as the user of the derived object does not know whether all the methods of a given name support the same interface. Under SUPER semantics, one can at least assume that the derived class will "weed out" any arguments that would be detrimental to its superclass. But as we have already pointed out, there isn't a single superclass under MI, and each superclass might need to have different "detrimental arguments" weeded out. One could say that in that case, you don't call SUPER but rather call out to each superclass explicitly. But then you're back to the problem that SUPER was designed to solve. And you haven't solved SUPER's problem either.

Under NEXT semantics, we assume that we are dispatching to a set of methods with the same name, but potentially different signatures. (Perl 6's SUPER implementation is really a limited form of NEXT, insofar as SUPER indicates a set of parent methods, unlike in Perl 5 where it picks one.) We need a way of satisfying different signatures with the same set of arguments.

There are, in fact, two ways to approach this. One way is to say, okay everything is a multimethod, and we just won't call anything whose signature is irreconcilably inconsistent with the arguments presented. Plus there are varying degrees of consistency within the set of "consistent" interfaces, so we try them in decreasing order of consistency. A more consistent multi is allowed to fall back to a less consistent multi with "next METHOD".

But as a variant of the "pick one" mentality, that still doesn't help the situation where you want to send a message to all your ancestor classes (like "Please Mr. Base Class, help me initialize this object."), but you want to be more specific with some classes than others ("Please Miss Derived Class, set your $.prim attribute to 1."). So the other approach is to use named arguments that can be ignored by any classes that don't grok the argument.

So what this essentially comes down to is the fact that all methods and submethods of classes that might be derived from (which is essentially all classes, but see the previous section) must have a *% parameter, either explicitly or implicitly, to collect up and render harmless any unrecognized option pairs in the argument list. So the ruling is that all methods and submethods that do not declare an explicit *% parameter will get an implicit *%_ parameter declared for them whether they like it or not. (Subroutines are not granted this "favor".)
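
In other words (a sketch):


    method bark (:$volume) {...}
    # is treated, for argument binding purposes, as if you had written
    method bark (:$volume, *%_) {...}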

It might be objected that this will slow down the parameter binding algorithm for all methods favored with an implicit *%_, but I would argue that the binding code doesn't have to do anything till it sees a named parameter it doesn't recognize, and then it can figure out whether the method even references %_, and if not, simply throw the unrecognized argument away instead of constructing a %_ that won't be used. And most of this "figuring out" can be done at compile time.

Another counterargument is that this prevents a class from recognizing typos in argument names. That's true. It might be possible to ask for a warning that checks globally (at class-finalization time in the optimizer?) to see if there is any method of that name anywhere that is interested in a parameter of that name. But any class that gets its parameters out of a *% hash at runtime would cause false positives, unless we assume that any *% hash makes any argument name legal, in which case we're pretty much back to where we started, unless we do analysis of the usage of all *% hashes in those methods, and count things like %_«prim» as proper parameter declarations. And that can still be spoofed in any number of ways. Plus it's not a trivial warning to calculate, so it probably wouldn't be the default in a load-and-go interpreter.

So I think we basically have to live with possible typos to get proper polymorphic dispatch. If something is frequently misspelled, then you could always put in an explicit test against %_ for that argument:


    warn "Didn't you mean :the(%_«teh»)?" if %_«teh»;

And perhaps we could have a pragma:


    use signatures :exact;

But it's possible that the correct solution is to differentiate two kinds of "isa", one that derives from "nextish" classes, and one that derives from "superish" classes. A "next METHOD" traversal would assume that any delegation to a super class would be handled explicitly by the current class's methods. That is, a "superish" inheritance hides the base class from .* and .+, as well as "next METHOD".

On the other hand, if we marked the super class itself, we could refrain from generating *% parameters for its methods. Any "next" dispatcher would then have to "look ahead" to see if the next class was a "superish" class, and bypass it. I haven't a clue what the syntax should be though. We could mark the class with a "superish" trait, which wouldn't be inheritable. Or we could mark it with a Superish role, which would be inheritable, and a base class would have to override it to impose a Nextish role instead. (But then what if one parent class is Superish and one is Nextish?) Or we could even have two different metaclasses, if we decide the two kinds of classes are fundamentally different beasts. In that case we'd declare them differently using "class" and some other keyword. Of course, people will want to use "class" for the type they prefer, and the other keyword for the type they don't prefer. :-)

But since we're attempting to bias things in favor of nextish semantics, that would be a "class", and the superish semantics might be a "guthlophikralique" or some such. :-)

Seriously, if we mark the class, "is hidden" can hide the current class from "next METHOD" semantics. The problem with that is, how do you apply the trait to a class in a different language? That argues for marking the "isa" instead. So as usual when we can't make up our minds, we'll just have it both ways. To mark the class itself, use "is hidden". To mark the "isa", use "hides Base" instead of "is Base". In neither case will "next METHOD" traverse to such a class. (And no *%_ will be autogenerated.)

For example, here are two base classes that know about "next METHOD":


    class Nextish1 { method dostuff() {...; next;} }
    class Nextish2 { method dostuff() {...; next;} }

    class MyClass is Nextish1 is Nextish2 {
        method dostuff () {...; next;}
    }

Since all the base classes are "next-aware", MyClass knows it can just defer to "next" and both parent classes' dostuff methods will be called. However, suppose one of our base classes is old-fashioned and thinks it should call things with SUPER:: instead. (Or it's a class off in Python or Ruby.) Then we have to write our classes more like this:


    class Superish { method dostuff() {...; .*SUPER::dostuff();} }
    class Nextish  { method dostuff() {...; next;} }

    class MyClass hides Superish is Nextish {
        method dostuff () {
            .Superish::dostuff();       # do Superish::dostuff()
            next;                       # do Nextish::dostuff()
        }
    }

Here, MyClass knows that it has two very different base classes. Nextish knows about "next", and Superish doesn't. So it delegates to Superish::dostuff() differently than it delegates to Nextish::dostuff(). The fact that it declared "hides Superish" prevents next from visiting the Superish class.

Collections of Classes

In Classes

We'd like to be able to support virtual inner classes. You can't have virtual inner classes unless you have a way to dispatch to the actual class of the invocant. That says to me that the solution is bound up intimately with the method dispatcher, and the syntax of naming an inner class has to know about the invocant in whose context we have to start searching for the inner class. So we could have an explicit syntax like:


    class Base {
        our class Inner { ... }
        has Inner $inner;
        submethod BUILD { .makeinner; }
        method makeinner ($self:) {
            my Inner $thing = $self.Inner.new();
            return $thing;
        }
    }

    class Middle is Base {
        our class Inner is Base::Inner { ... }
    }

    class Derived is Middle {
    }

When you say Derived.new(), it creates a Derived object, calls Derived::BUILDALL, which eventually calls Base::BUILD, which makes a Middle::Inner object (because that's what the virtual method $self.Inner returns) and puts it in a variable of the Base::Inner type (which is fine, since Middle::Inner ISA Base::Inner). Whew!

The only extra magic here is that an inner class would have to autogenerate an accessor method (of the same name) that returns the class. A class could then choose to access an inner class name directly, in which case it would get its own inner class of that name, much like $.foo always gets you your own attribute. But if you called the inner class name as a method, it would automatically virtualize the name, and you'd get the most derived existing version of the class.

This would give us most of what RFC 254 is asking for, at the expense of one more autogenerated method. Use of such inner classes would take the connivance of a base class that doesn't mind if derived classes redefine its inner class. Unfortunately, it would have to express that approval by calling $self.Inner explicitly. So this solution does not go as far as letting you change classes that didn't expect to be changed.

It would be possible to take it further, and I think we should. If we say that whenever you use any global class, it makes an inner class on your behalf that is merely an alias to the global class, creating the accessor method as if it were an inner class, then it's possible to virtualize the name of any class, as long as you're in a context that has an appropriate invocant. Then we'd make any class name lookup assume $self. on the front, basically.

This may seem like a wild idea, but interestingly, we're already proposing to do a similar aliasing in order to have multiple versions of a module running simultaneously. In the case of classes, it seems perfectly natural that a new version might derive from an older version rather than redefining everything.

The one fly in the ointment that I can see is that we might not always have an appropriate invocant--for instance, outside any method body, when we're declaring attributes. I guess when there's no dynamic context indicating what an "inner" classname should mean, it should default to the ordinary meaning in the current lexical and/or package context. Within a class definition, for instance, the invocant is the metaclass, which is unhelpful. So generally that means that a declared attribute type will turn out to be a superclass of the actual attribute type at runtime. But that's fine, ain't it? You can always store a Beagle in a Dog attribute.

So in essence, it boils down to this. Within a method, the invocant is allowed to have opinions about the meanings of any class names, and when there are multiple possible meanings, pick the most appropriate one, where that amounts to the name you'd find if the class name were a virtual method name.

Here's the example from RFC 254, translated to Perl 6 (with Frog made into an explicit inner class for clarity (though it should work with any class by the aliasing rule above)):


    class Forest {
        our class Frog {
            method speak () { say "ribbit ribbit"; }
            method jump  () {...}
            method croak () {...} # ;-)
        }

        has Frog $.frog;

        method new ($class) {
            my Frog $frog .= new;       # MAGIC
            return $class.bless( frog => $frog );
        }
     
        method make_noise () {
            .frog.speak;        # prints "ribbit ribbit"
        }
    }

Now we derive from Forest, producing Forest::Japanese, with its own kind of frogs:


    class Forest::Japanese is Forest {
        our class Frog is Forest::Frog {
            method speak () { say "kerokero"; }
        }
    }

And finally, we make a forest of that type, and tell it to make a noise:


    $forest = new Forest::Japanese;
    $forest.make_noise();               # prints "kerokero"

In the Perl 5 equivalent, that would have printed "ribbit ribbit" instead. How did it do the right thing in Perl 6?

The difference is on the line marked "MAGIC". Because Frog was mentioned in a method, and the invocant was of type Forest::Japanese rather than of type Forest, the word "Frog" figured out that it was supposed to mean a Forest::Japanese::Frog rather than a Forest::Frog. The name was "virtual". So we ended up creating a forest with a frog of the appropriate type, even though it might not have occurred to the writer of Forest that a subclass would override the meaning of Frog.

So one object can think that its Frog is Japanese, while another thinks it's Russian, or Mexican, or even Antarctican (if you can find any forests there). Base methods that talk about Frog will automatically find the Frog appropriate to the current invocant. This works even if Frog is an outer class rather than an inner class, because any outer class referenced by a base class is automatically aliased into the class as a fake inner class. And the derived class doesn't have to redefine its Frog by declaring an inner class either. It can just alias (or use) a different outer Frog class in as its fake inner class. Or even a different version of the same Frog class, if there are multiple versions of it in the library.

And it just works.

In Modules

It's also possible to put a collection of classes into a module, but that doesn't buy you much except the ability to pull them all in with one use, and manage them all with one version number. Which has a lot to be said for it--in the next section.

Versioning

Way back at the beginning, we claimed that a file-scoped class declaration:


    class Dog;
    ...

is equivalent to the corresponding block-scoped declaration:


    class Dog {...}

While that's true, it isn't the whole truth. A file-scoped class (or module, or package) is the carrier of more metadata than a block-scoped declaration. Perl 6 supports a notion of versions that is file based. But even a class name plus a version is not sufficient to name a module--there also has to be a naming authority, which could be a URI or a CPAN id. This will be discussed more fully in Apocalypse 11, but for now we can make some predictions.

The extra metadata has to be associated with the file somehow. It may be implicit in the filename, or in the directory path leading to the file. If so, then Perl 6 has to collect up this information as modules are loaded and associate it with the top level class or module as a set of properties.

It's also possible that a module could declare properties explicitly to define these and other bits of metadata:


    author        http://www.some.com/~jrandom
    version       1.2.1
    creator       Joe Random
    description   This class implements camera obscura.
    subject       optics, boxes
    language      ja_JP
    licensed      Artistic|GPL

Modules posted to CPAN or entered into any standard Perl 6 library are required to declare some set of these properties so that installations can know where to keep them, such that multiple versions by different authors can coexist, all of them available to any installed version of Perl. (This is a requirement for any Perl 6 installation. We're tired of having to reinstall half of CPAN every time we patch Perl. We also want to be able to run different versions of the Frog module simultaneously when the Frog requirements of the modules we use are contradictory.)

It's possible that the metadata is supplied by both the declarations and by the file's name or location in the library, but if so, it's a fatal error to use a module for which those two sources contradict each other as to author or version. (In theory, it could also be a fatal error to use modules with incompatible licensing, but a kind warning might be more appreciated.) Likely there will also be some kind of automatic checksumming going on as well to prevent fraudulent distributions of code.

It might simplify things if we make an identifier metadatum that incorporates all of naming authority, package name, and version. But the individual parts still have to be accessible, if only as components of identifier. However we structure it, we should make the identifier the actual declared full name of the class, yet another one of those "long names" that include extra parameters.

Version Declarations

The syntax of a versioned class declaration looks like this:


    class Dog-1.2.1-cpan:JRANDOM;
    class Dog-1.2.1-http://www.some.com/~jrandom;
    class Dog-1.2.1-mailto:jrandom@some.com;

Perhaps those could also have short forms, presuming we can distinguish CPAN ids, web pages, and email addresses by their internal forms.


    class Dog-1.2.1-JRANDOM;
    class Dog-1.2.1-www.some.com/~jrandom;
    class Dog-1.2.1-jrandom@some.com;

Or maybe using email addresses is a bad idea now in the modern Spam Age. Or maybe Spam Ages should be plural, like the Dark Ages...

In any event, such a declaration automatically aliases the full name of the class (or module) to the short name. So for the rest of the scope, Dog refers to the longer name.

(Though if you refer to Dog within a method, it's considered a virtual class name, so Perl will search any derived classes for a redefined inner Dog class (or alias) before it settles on the least-derived aliased Dog class.)

We lied slightly when we said earlier that only the file-scoped class carries extra metadata. In fact, all of the classes (or modules, or packages) defined within your file carry metadata, but it so happens that the version and author of all your extra classes (or modules, or packages) are forced to be the same as the file's version and author. This happens automatically, and you may not override the generation of these long names, because if you did, different file versions could and would have version collisions of their interior components, and that would be catastrophic. In general you can ignore this, however, since the long names of your extra classes are always automatically aliased back down to the short names you thought you gave them in the first place. The extra bookkeeping is in there only so that Perl can keep your classes straight when multiple versions are running at the same time. Just don't be surprised when you ask for the name of the class and it tells you more than you expected.

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 12 for the latest information.

Use of Version and Author Wildcards

Since these long names are the actual names of the classes, when you say:


    use Dog;

you're really asking for something like:


    use Dog-(Any)-(Any);

And when you say:


    use Dog-1.2.1;

you're really asking for:


    use Dog-1.2.1-(Any);

Note that the 1.2.1 specifies an exact match on the version number. You might think that it should specify a minimum version. However, people who want stable software will specify an exact version and stick with it. They don't want 1.2.1 to mean a minimum version. They know 1.2.1 works, so they want that version nailed down forever--at least for now.

To match more than one version, put a range operator in parens:


    use Dog-(1.2.1..1.2.3);
    use Dog-(1.2.1..^1.3);
    use Dog-(1.2.1...);

What goes inside the parens is in fact any valid smartmatch selector:


    use Dog-(1.2.1 | 1.3.4)-(/:i jrandom/);
    use Dog-(Any)-(/^cpan\:/)

And in fact they could be closures too. These mean the same thing:


    use Dog-{$^ver ~~ 1.2.1 | 1.3.4}-{$^auth ~~ /:i jrandom/};
    use Dog-{$^ver ~~ Any}-{$^auth ~~ /^cpan\:/}

In any event, however you select the module, its full name is automatically aliased to the short name for the rest of your lexical scope. So you can just say


    my Dog $spot .= new("woof");

and it knows (even if you don't) that you mean


    my Dog-1.3.4-cpan:JRANDOM $spot .= new("woof");

(Again, if you refer to Dog within a method, it's a virtual class name, so Perl will search any derived classes for a redefined Dog class before it settles on the outermost aliased Dog class.)

Introspection

It's easy to specify what Perl 6 will provide for introspection: the union of what Perl 6 needs and whatever Parrot provides for other languages. ;-)

In the particular case of class metadata, the interface should generally be via the class's metaclass instance--the object of type MetaClass that was in charge of building the class in the first place. The metamethods are in the metaobject, not in the class object. (Well, actually, those are the same object, but a class object ignores the fact that it's also a metaobject, and dispatches by default to its own methods, not the ones defined by the metaclass.) To get to the metamethods of an ordinary class object you have to use the .meta method:


    MyClass.getmethods()        # call MyClass's .getmethods method
    MyClass.meta.getmethods()   # get the method list of MyClass

Unless MyClass has defined or inherited a .getmethods method, the first call is an error. The second is guaranteed to work for Perl 6's standard MetaClass objects. You can also call .meta on any ordinary object:


    $obj.meta.getmethods();

That's equivalent to


    $obj.dispatcher.meta.getmethods();

As for which parts of a class are considered metadata--they all are, if you scratch hard enough. Everything that is not stored directly as a trait or property really ought to have some kind of trait-like method to access it. Even the method body closures have to be accessible as traits, since the .wrap method needs to have something to put its wrapper around.

Minimally, we'll have user-specified class traits that look like this:


    identifier    Dog-1.2.1-http://www.some.com/~jrandom
        name      Dog
        version   1.2.1
        authority http://www.some.com/~jrandom
    author        Joe Random
    description   This class implements camera obscura.
    subject       optics, boxes
    language      ja_JP
    licensed      Artistic|GPL

And there may be internal traits like these:


    isa           list of parent classes
    roles         list of roles
    disambig      how to deal with ambiguous method names from roles
    layout        P6opaque, P6hash, P5hash, P5array, PyDict, Cstruct, etc.

The layout determines whether one class can actually derive from another or has to fake it. Any P6opaque class can compatibly inherit from any other P6opaque class, but if it inherits from any P5 class, it must use some form of delegation to another invocant. (Hopefully with a smart enough invocant reference that, if the delegated object unknowingly calls back into our layout system, we can recover the original object reference and maintain some kind of compositional integrity.)

The metaclass's .getmethods method returns method-descriptor objects with at least the following properties:


    name                the name of the method
    signature           the parameters of the method
    returns             the return type of the method
    multi               whether duplicate names are allowed
    do                  the method body

The .getmethods method has a selector parameter that lets you specify whether you want to see a flattened or hierarchical view, whether you're interested in private methods, and so forth. If you want a hierarchical view, you only get the methods actually defined in the class proper. To get at the others, you follow the "isa" trait to find your parent classes' methods, and you follow the "roles" trait to get to role methods, and from parents or roles you may also find links to further parents or roles.

The .getattributes method returns a list of attribute descriptors that have traits like these:


    name
    type
    scope
    rw
    private
    accessor
    build

Additionally they can have any other variable traits that can reasonably be applied to object attributes, such as constant.

Strictly speaking, metamethods like .isa(), .does(), and .can() should be called through the meta object:


    $obj.meta.can("bark")
    $obj.meta.does(Dog)
    $obj.meta.isa(Mammal)

And they can always be called that way. For convenience you can often omit the .meta call because the base Object type translates any unrecognized .foo() into .meta.foo() if the meta class has a method of that name. But if a derived class overrides such a metamethod, you have to go through the .meta call explicitly to get the original call.

In previous Apocalypses we said that:


    $obj ~~ Dog

calls:


    $obj.isa(Dog)

That is no longer the case--you're actually calling:


    $obj.meta.does(Dog)

which is true if $obj either "does" or "isa" Dog (or "isa" something that "does" Dog). That is, it asks if $obj is likely to satisfy the interface that comes from the Dog role or class. The .isa method, by contrast, is strictly asking if $obj inherits from the Dog class. It's erroneous to call it on a role. Well, okay, it's not strictly erroneous. It will just never return true. The optimizer will love you, and remove half your code.

Note that either of .does or .isa can lie, insofar as you might include an interface that you later override parts of. When in doubt, rely on .can instead. Better yet, rely on your dispatcher to pick the right method without trying to second guess it. (And then be prepared to catch the exception if the dispatcher throws up its hands in disgust...)

By the way, unlike in Perl 5 where .can returns a single routine reference, Perl 6's version of .meta.can returns a "WALK" iterator for a set of routines that match the name. When dereferenced, the iterator gets fed to a dispatcher as if the method had been called in the first place. Note that any wildcard methods (via delegation or AUTOLOAD) are included by default in this list of potential handlers, so there is no reason for subclasses to have to redefine .can to reflect the new names. This does potentially weaken the meaning of .can from "definitely has a method of this name" to "definitely has one or more methods in one or more classes that will try to handle this." But that's probably closer to what you want, and the best we can do when people start fooling around with wildcard methods under MI.

However, that being said, many classes may wish to dynamically specify at the last moment which methods they can or cannot handle. That is, they want a hook to allow a class to declare names even while the .can candidate list is being built. By default .meta.can includes all wildcard delegations and autoloads at the end of the list. However, it will exclude from the list of candidates any class that defines its own AUTOMETH method, on the assumption that each such AUTOMETH method has already had its chance to add any callable names to the list. If the class's AUTOMETH wishes to supply a method, it should return a reference to that method.

Do not confuse AUTOMETH with AUTOMETHDEF. The former is equivalent to declaring a stub declaration. The latter is equivalent to supplying a body for an existing stub. Whether AUTOMETH actually creates a stub, or AUTOMETHDEF actually creates a body, is entirely up to those routines. If they wish to cache their results, of course, then they should create the stub or body.

There are corresponding AUTOSUB and AUTOSUBDEF hooks. And AUTOVAR and AUTOVARDEF hooks. These all pretty much make AUTOLOAD obsolete. But AUTOLOAD is still there for old times' sake.

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 12 for the latest information.

Other Non-OO Decisions

A lot of time went by while I was in the hospital last year, so we ended up polishing up the design of Perl 6 in a number of areas not directly related to OO. Since I've already got your attention (and we're already 90% of the way through this Apocalypse), I might as well list these decisions here.

Exportation

The trait we'll use for exportation (typically from modules but also from classes pretending to be modules) is export:


                                               # Tagset...
    sub foo is export(:DEFAULT)         {...}  #  :DEFAULT, :ALL
    sub bar is export(:DEFAULT :others) {...}  #  :DEFAULT, :ALL, :others
    sub baz is export(:MANDATORY)       {...}  #  (always exported)
    sub bop is export                   {...}  #  :ALL
    sub qux is export(:others)          {...}  #  :ALL, :others

Compared to Perl 5, we've basically made it easier to mark something as exportable, but more difficult to export something by default. You no longer have to declare your tagsets separately, since :foo parameters are self-declaring, and the module will automatically build the tagsets for you from the export trait arguments.

The Gather/Take Construct

We used one example of the conjectural gather/take construct. A gather executes a closure, returning a list of all the values returned by take within its lexical scope. In a lazy context it might run as a coroutine. There probably ought to be a dynamically scoped variant. Unless it should be dynamic by default, in which case there probably ought to be a lexically scoped variant...
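
Here's a conjectural sketch of how that might look, with none of the syntax settled yet:


    my @squares = gather {
        for 1..5 { take $_ * $_ }
    };
    # @squares would be (1, 4, 9, 16, 25)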

:foo() Adverbs

There's a new pair syntax that is more conducive to use as option arguments. This syntax is reminiscent of both the Unix command line syntax and the I/O layers syntax of Perl 5. But unlike Unix command-line options, we use colon to introduce the option rather than the overly negative minus sign. And unlike Perl 5's layers options, you can use these outside of a string.

We haven't discarded the old pair syntax. It's still more readable for certain uses, and it allows the key to be a non-identifier. Plus we can define the new syntax in terms of it:


    Old                         New
    ---                         ---
    foo => $bar                 :foo($bar)
    foo => [1,2,3,@many]        :foo[1,2,3,@many]
    foo => «alice bob charles»  :foo«alice bob charles»
    foo => 'alice'              :foo«alice»
    foo => { a => 1, b => 2 }   :foo{ a => 1, b => 2 }
    foo => { dostuff() }        :foo{ dostuff() }
    foo => 0                    :foo(0)
    foo => 1                    :foo

It's that last one that's the real winner for passing boolean options. One other nice thing is that if you have several options in a row you don't have to put commas between:


    $handle = open $file, :chomp :encoding«guess» :ungzip or die "Oops";

It might be argued that this conflicts with the :foo notation for private methods. I don't think it's a problem, because method names never occur in isolation.

Oh, one other feature of option pairs is that certain operations can use them as adverbs. For instance, you often want to tell the range operator how much to skip on each iteration. That looks like this:


    1..100 :by(3)

Note that this only works where an operator is expected rather than a term. So there's no confusion between:


    randomlistop 1..100 :by(3)

and


    randomlistop 1..100, :by(3)

In the latter case, the option is being passed to randomlistop() rather than the infix:.. operator.

Special Quoting of Identifiers Inside Curlies Going Away!

Novice Perl 5 programmers are continually getting trapped by subscripts that autoquote unexpectedly. So in Perl 6, we'll remove that special case. %hash{shift} now always calls the shift function, because the inside of curlies is always an expression. Instead, if you want to subscript a hash with a constant string, or a slice of constant strings, use the new French qw//-ish brackets like this:


    %hash«alice»                # same as %hash{'alice'}
    %hash«alice bob charlie»    # same as %hash{'alice','bob','charlie'}

Note in particular that, since slices in Perl 6 are determined by the subscript only, not the sigil, this:


    %hash«alice» = @x;

evaluates the right side in scalar context, while


    %hash«alice bob charlie» = @x;

evaluates the right side in list context. As with all other uses of the French quotes in Perl 6, you can always use:


    %hash<<alice>> = @x;

if you can't figure out how to type ^K<< or ^K>> in vim.

On the other hand, if you've got a fully Unicode aware editor, you could probably write some macros to use the big double angles from Asian languages:


    %hash《alice》 = @x;

But by default we only provide the Latin-1 compatible versions. It would be easy to overuse Unicode in Perl 6, so we're trying to underuse Unicode for small values of 6. (Not to be confused with ⁶, or ⅵ.)

Vector Operators Renamed Back to "hyper" Operators

The mathematicians got confused when we started talking about "vector" operators, so these dimensionally dwimming versions of scalar operators are now called hyper operators (again). Some folks see operations like


    @a »*« @b

as totally useless, and maybe they are--to a mathematician. But to someone simply trying to calculate a bunch of things in parallel (think cellular automata, or aerodynamic simulations, for instance), they make a lot of sense. And don't restrict your thinking to math operators. How about appending a newline to every string before printing it out:


    print @strings »~« "\n";

Of course,


    for @strings {say}

is a shorter way to do the same thing. ("say" is just Perl 6's version of a printline function.)

Unary Hyper Operators Now Use One Quote Rather Than Two

Unary operators read better if they only "hyper" on the side where there's an actual argument:


    @neg = -« @pos;
    @indexes = @x »++;

And in particular, I consider a method spec like .bletch(1,2,3) to be a unary postfix operator, and it would be really ugly to say:


    @objects».bletch(1,2,3)«

So that's just:


    @objects».bletch(1,2,3)

In general, binary operators still take "hypers" on both sides, indicating that both sides participate in the dwimmery.


    @a »+« @a

To indicate that one side or the other should be evaluated as a scalar before participating in the hyperoperator, you can always put in a context specifier:


    @a »+« +@a

$thumb.twiddle No Longer Requires Parens When Interpolated

In Apocalypse 2 we said that any method interpolated into a double-quoted string has to have parentheses. We're throwing out that special rule in the interests of consistency. Now if you want to interpolate a variable followed by an "accidental" dot, use one of these:


    $($var).twiddle
    $var\.twiddle

Yes, that will make it a little harder to translate Perl 5 to Perl 6.

(Parentheses are still required if there are any arguments, however.)

The =:= Identity Operator

There is a new =:= identity operator, which tests to see if two objects are the same object. The association with the := binding operator should be obvious. (Some classes such as integers may consider all objects of the same value to be a single object, in a Platonic sense.)

Hmm? No, there is no associated assignment operator. And if there were, I wouldn't tell you about it. Sheesh, some people...

But there is, of course, a hyper version:


    @a »=:=« @b

New Grammatical Categories

The current set of grammatical categories for operator names is:


    Category                            Example of use
    --------                            --------------
    coerce:as                           123 as BigInt, BigInt(123)
    self:sort                           @array.=sort
    term:...                            $x = {...}
    prefix:+                            +$x
    infix:+                             $x + $y
    postfix:++                          $x++
    circumfix:[]                        [ @x ]
    postcircumfix:[]                    $x[$y] or $x .[$y]
    rule_modifier:p5                    m:p5//
    trait_verb:handles                  has $.tail handles «wag»
    trait_auxiliary:shall               my $x shall conform«TR123»
    scope_declarator:has                has $.x;
    statement_control:if                if $condition {...} else {...}
    infix_postfix_meta_operator:=       $x += 2;
    postfix_prefix_meta_operator:»      @array »++
    prefix_postfix_meta_operator:«      -« @magnitudes
    infix_circumfix_meta_operator:»«    @a »+« @b

Now, you may be thinking that some of these have long, unwieldy names. You'd be right. The longer the name, the longer you should think before adding a new operator of that category. (And the length of time you should think probably scales exponentially with the length of the name.)

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 12 for the latest information.

Assignment to state Variable Declaration Now Does "First" Semantics

As we talked about earlier, assignment to a "has" variable is really pseudo-assignment representing a call to the "build" trait. In the same way, assignment to state variables (Perl's version of lexically scoped "static" variables) is taken as pseudo-assignment representing a call to the "first" trait. The first time through a piece of code is when state variables typically like to be initialized. So saying:


    state $pc = $startpc;

is equivalent to


    state $pc is first( $startpc );

which means that it will pay attention to the $startpc variable only the first time this block is ever executed. Note that any side effects within the expression will only happen the first time through. If you say


    state $x = $y++;

then that statement will only ever increment $y once. If that's not what you want, then use a real assignment as a separate statement:


    state $x;
    $x = $y++;

The := and .= operators also attempt to do what you mean, which in the case of:


    state $x := $y++;

still probably doesn't do what you want. :-)

In general, any "preset" trait is smart about when to apply its value to the container it's being applied to, such that the value is set statically if that's possible, and if that's not possible, it is set dynamically at the "correct" moment.

For ordinary assignment to a "my" variable, that correct moment just happens to be every time it is executed, so = represents ordinary assignment. If you want to force an initial value at execution time that was calculated earlier, however, then just use ordinary assignment to assign the results of a precalculated block:


    my @canines = INIT { split slurp "%ENV«HOME»/.canines" };

It's only the has and state declarators that redefine assignment to set defaults with traits. (For has, that's because the actual attribute variable won't exist until the object is created. For state, that's because we want the default to be "first time through".) But you can use any of the traits on any variable for which it makes sense. For instance, just because we invented the "first" initializer for state variables:


    state $lexstate is first(0);

doesn't mean you can't use it to initialize any variable only the first time through a block of code:


    my $foo is first(0);

However, it probably doesn't make a lot of sense on a "my" variable, unless you really want it to be undefined the second time through. It does make a little more sense on an "our" variable that will hang onto its value like a state variable:


    our $counter is first(0);

An assignment would often be wrong in this case. But generally, the naive user can simply use assignment, and it will usually do what they want (if occasionally more often than they want). But it does exactly what they want on has and state variables--presuming they are savvy enough to want what it actually does... :-)

So as with has variables, state variables can be initialized with precomputed values:


    state $x = BEGIN { calc() }
    state $x = CHECK { calc() }
    state $x = INIT  { calc() }
    state $x = FIRST { calc() }
    state $x = ENTER { calc() }

which mean something like:


    state $x is first( BEGIN { calc() } )
    state $x is first( CHECK { calc() } )
    state $x is first( INIT  { calc() } )
    state $x is first( FIRST { calc() } )
    state $x is first( ENTER { calc() } )

Note, however, that the last one doesn't in fact make much sense, since ENTER happens more frequently than FIRST. Come to think of it, doing FIRST inside a first doesn't buy you much either...

The length() Function Is Gone

In Perl 6 you're not going to see


    my $sizeofstring = length($string);

That's because "length" has been deemed to be an insufficiently specified concept, because it doesn't specify the units. Instead, if you want the length of something in characters you use


    my $sizeinchars = chars($string);

and if you want the size in elements, you use


    my $sizeinelems = elems(@array);

This is more orthogonal in some ways, insofar as you can now ask for the size in chars of an array, and it will add up all the lengths of the strings in it for you:


    my $sizeinchars = chars(@array);

And if you ask for the number of elems of a scalar, it knows to dereference it:


    my $ref = [1,2,3];
    my $sizeinelems = elems($ref);

These are, in fact, just generic object methods:


    @array.elems
    $string.chars
    @array.chars
    $ref.elems

And the functional forms are just multimethod calls. (Unless they're indirect object calls...who knows?)

You can also use %hash.elems, which returns the number of pairs in the hash. I don't think %hash.chars is terribly useful, but it will tell you how many characters total there are in the values. (The key lengths are ignored, just like the integer "keys" of an ordinary array.)

Actually, the meaning of .chars varies depending on your current level of Unicode support. To be more specific, there's also:


    $string.bytes
    $string.codepoints
    $string.graphemes
    $string.letters

...none of which should be confused with:


    $string.columns

or its evil twin:


    $string.pixels

Those last two require knowledge of the current font and rendering engine, in fact. Though .columns is likely to be pretty much the same for most Unicode fonts that restrict themselves to single and double-wide characters.

String Positions

A corollary to the preceding is that string positions are not numbers. If you say either


    $pos = index($string, "foo");

or


    $string ~~ /foo/; $pos = $string.pos;

then $pos points to that location in that string. If you ask for the numeric value of $pos, you'll get a number, but which number you get can vary depending on whether you're currently treating characters as bytes, codepoints, graphemes, or letters. When you pass a $pos to substr($string, $pos, 3), you'll get back "foo", but not because it counted over some number of characters. If you use $pos on some other string, then it has to interpret the value numerically in the current view of what "character" means. In a boolean context, a position is true if the position is defined, even if that position would evaluate to 0 numerically. (index and rindex return undef when they "run out".)

And, in fact, when you say $len = .chars, you're really getting back the position of the end of the string, which just happens to numerify to the number of characters in the string in the current view. A consequence of the preceding rules is that "".chars is true, but +"".chars is false. So Perl 5 code that says length($string) needs to be translated to +chars($string) if used in a boolean context.

Routines like substr and index take either positions or integers for arguments. Integers will automatically be turned into positions in the current view. This may involve traversing the string for variable-width representations, especially when working with combining characters as parts of graphemes. Once you're working with abstract positions, however, they are efficient. So


    while $pos = index($string, "fido", $pos + 1) {...}

never has to rescan the string.

The other point of all this is that you can pass $pos or $len to another module, and it doesn't matter if you're doing offsets in graphemes and they are doing offsets in codepoints. They get the correct position by their lights, even though the number of characters looks different. The main constraint on this is that if you pass a position from a lower Unicode support level to a higher Unicode support level, you can end up with a position that is inside what you think of as a unitary character, whether that's a byte within a codepoint, or a codepoint within a grapheme or letter. If you deref such a position, an exception is thrown. But generally high-level routines call into low-level routines, so the issue shouldn't arise all that often in practice. However, low-level routines that want to be called from high-level routines should strive not to return positions inside high-level characters--the fly in the ointment being that the low-level routine doesn't necessarily know the Unicode level expected by the calling routine. But we have a solution for that...

High-level routines that suspect they may have a "partial position" can call $pos.snap (or $pos.=snap) to round up to the next integral position in the current view, or (much less commonly) $pos.snapback (or $pos.=snapback) to round down to the next integral position in the current view. This only biases the position rightward or leftward. It doesn't actually do any repositioning unless we're about to throw an exception. So this allows the low-level routine to return $pos.snap without knowing at the time how far forward to snap. The actual snapping is done later when the high-level routine tries to use the position, and at that point we know which semantics to snap forward under.

By the way, if you bind to a position rather than assign, it tracks the string in question:


    my $string = "xyz";
    my $endpos := $string.chars;        # $endpos == 3
    substr($string,0,0,"abc");          # $endpos == 6, $string = "abcxyz"

Deletions of string around a position cause the position to be reduced to the beginning of the deletion. Insertions at a position are assumed to be after that position. That is, the position stays pointing to the beginning of the newly inserted string, like this:


    my $string = "xyz";
    my $endpos := $string.chars;        # $endpos == 3
    substr($string,2,1,"abc");          # $endpos == 2, $string = "xyabc"

Hence concatenation never updates any positions. Which means that sometimes you just have to call .chars again... (Perhaps we'll provide a way to optionally insert before any matching position.)

Note that positions try very hard not to get demoted to integers. In particular, position objects overload addition and subtraction such that


    $string.chars - 1
    index($string, "foo") + 2

are still position objects with an implicit reference into the string. (Subtracting one position object from another results in an integer, however.)

The New "&" Separator in Regexen

Analogous to the disjunctional | separator, we're also putting in a conjunctional & separator into our regex syntax:


    "DOG" ~~ /D [ <vowel>+ & <upper>+ ] G/

The semantics of it are pretty straightforward, as long as you realize that all of the AND'ed assertions have to match with the same length. More precisely, they have to start and stop matching at the same location. So the following is always going to be false:


    / . & .. /

It would be possible to have the other semantics, where the conjoined patterns merely have to start at the same position and each one can match whatever length it likes. But then tell me whether $1 should return "O" or "G" after this:


    "DOG" ~~ /^[. & ..] (.)/

Besides, it's easy enough to get the other semantics with lookahead assertions. Autoanchoring all the legs of a conjunction to the same spot adds much more value to it by differentiating it from lookahead. You have to work pretty hard to make separate lookaheads match the same length. Plus doing that turns what should be a symmetric operator into a non-symmetrical one, where the final lookahead can't be a lookahead because someone has to "eat" the characters that all the assertions have agreed on are the right number to eat. So for all these reasons it's better to have a conjunction operator with complicated enough start/stop semantics to be useful.

Actually, this operator was originally suggested to me by a biologist. Which leads us to our...

Optional Mandatory Cross-Disciplinary Joke for People Tired of Dogs


    Biologist: What's worse than being chased by a Velociraptor?
    Physicist: Obviously, being chased by an Acceloraptor.

Future Directions

Away from Acceloraptors, obviously.

References...er, Reference...

 
Nathanael Schärli, Stéphane Ducasse, Oscar Nierstrasz, and Andrew Black. Traits: Composable Units of Behavior. European Conference on Object-Oriented Programming (ECOOP), July 2003. Springer LNCS 2743, Ed. Luca Cardelli.

Using Bloom Filters

Anyone who has used Perl for any length of time is familiar with the lookup hash, a handy idiom for doing existence tests:

my %lookup;                      # the lookup set, built once up front
foreach my $e ( @things ) { $lookup{$e}++ }

sub check {
	my ( $key ) = @_;
	print "Found $key!" if exists( $lookup{ $key } );
}

As useful as the lookup hash is, it can become unwieldy for very large lists or in cases where the keys themselves are large. When a lookup hash grows too big, the usual recourse is to move it to a database or flat file, perhaps keeping a local cache of the most frequently used keys to improve performance.

Many people don't realize that there is an elegant alternative to the lookup hash, in the form of a venerable algorithm called a Bloom filter. Bloom filters allow you to perform membership tests in just a fraction of the memory you'd need to store a full list of keys, so you can avoid the performance hit of having to use a disk or database to do your lookups. As you might suspect, the savings in space comes at a price: you run an adjustable risk of false positives, and you can't remove a key from a filter once you've added it in. But in the many cases where those constraints are acceptable, a Bloom filter can make a useful tool.

For example, imagine you run a high-traffic online music store along the lines of iTunes, and you want to minimize the stress on your database by only fetching song information when you know the song exists in your collection. You can build a Bloom filter at startup, and then use it as a quick existence check before trying to perform an expensive fetching operation:

use Bloom::Filter;

my $filter = Bloom::Filter->new( error_rate => 0.01, capacity => $SONG_COUNT );
open my $fh, "<", "enormous_list_of_titles.txt" or die "Failed to open: $!";

while (<$fh>) {
	chomp;
	$filter->add( $_ );
}

sub lookup_song {
	my ( $title ) = @_;
	return unless $filter->check( $title );
	return expensive_db_query( $title ) || undef;
}

In this example, there's a 1% chance that the test will give a false positive, which means the program will perform the expensive fetch operation and eventually return a null result. Still, you've managed to avoid the expensive query 99% of the time, using only a fraction of the memory you would have needed for a lookup hash. As we'll see further on, a filter with a 1% error rate requires just under 2 bytes of storage per key. That's far less memory than you would need for a lookup hash.

Bloom filters are named after Burton Bloom, who first described them in a 1970 paper entitled Space/time trade-offs in hash coding with allowable errors. In those days of limited memory, Bloom filters were prized primarily for their compactness; in fact, one of their earliest applications was in spell checkers. However, there are less obvious features of the algorithm that make it especially well-suited to applications in social software.

Because Bloom filters use one-way hashing to store their data, it is impossible to reconstruct the list of keys in a filter without doing an exhaustive search of the keyspace. Even that is unlikely to be of much help, since the false positives from an exhaustive search will swamp the list of real keys. Bloom filters therefore make it possible to share information about what you have without broadcasting a complete list of it to the world. For that reason, they may be especially valuable in peer-to-peer applications, where both size and privacy are important constraints.

How Bloom Filters Work

A Bloom filter consists of two components: a set of k hash functions and a bit vector of a given length. We choose the length of the bit vector and the number of hash functions depending on how many keys we want to add to the set and how high an error rate we are willing to put up with -- more on that a little bit further on.

All of the hash functions in a Bloom filter are configured so that their range matches the length of the bit vector. For example, if a vector is 200 bits long, the hash functions return a value between 1 and 200. It's important to use high-quality hash functions in the filter to guarantee that output is equally distributed over all possible values -- "hot spots" in a hash function would increase our false-positive rate.

To enter a key into a Bloom filter, we run it through each one of the k hash functions and treat the result as an offset into the bit vector, turning on whatever bit we find at that position. If the bit is already set, we leave it on. There's no mechanism for turning bits off in a Bloom filter.

As an example, let's take a look at a Bloom filter with three hash functions and a bit vector of length 14. We'll use spaces and asterisks to represent the bit vector, to make it easier to follow along. As you might expect, an empty Bloom filter starts out with all the bits turned off, as seen in Figure 1.

Figure 1. An empty Bloom filter.

Let's now add the string apples into our filter. To do so, we hash apples through each of our three hash functions and collect the output:

hash1("apples") = 3
hash2("apples") = 12
hash3("apples") = 11

Then we turn on the bits at the corresponding positions in the vector -- in this case bits 3, 11, and 12, as shown in Figure 2.

Figure 2. A Bloom filter with three bits enabled.
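
Since the original figures are just illustrations, here's a tiny stand-in that builds the same 14-bit vector with bits 3, 11, and 12 turned on (counting positions from zero, and using the vec and pack operators we'll meet again below):

my $vector = pack( "b*", '0' x 14 );
vec( $vector, $_, 1 ) = 1 for ( 3, 11, 12 );
print unpack( "b14", $vector ), "\n";    # prints 00010000000110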

To add another key, such as plums, we repeat the hashing procedure:

hash1("plums") = 11
hash2("plums") = 1
hash3("plums") = 8

And again turn on the appropriate bits in the vector, as shown with highlights in Figure 3.

Figure 3. The Bloom filter after adding a second key.

Notice that the bit at position 11 was already turned on -- we had set it when we added apples in the previous step. Bit 11 now does double duty, storing information for both apples and plums. As we add more keys, it may store information for some of them as well. This overlap is what makes Bloom filters so compact -- any one bit may be encoding multiple keys simultaneously. This overlap also means that you can never take a key out of a filter, because you have no guarantee that the bits you turn off don't carry information for other keys. If we tried to remove apples from the filter by reversing the procedure we used to add it in, we would inadvertently turn off one of the bits that encodes plums. The only way to strip a key out of a Bloom filter is to rebuild the filter from scratch, leaving out the offending key.

Checking to see whether a key already exists in a filter is exactly analogous to adding a new key. We run the key through our set of hash functions, and then check to see whether the bits at those offsets are all turned on. If any of the bits is off, we know for certain the key is not in the filter. If all of the bits are on, we know the key is probably there.

I say "probably" because there's a certain chance our key might be a false positive. For example, let's see what happens when we test our filter for the string mango. We run mango through the set of hash functions:

hash1("mango") = 8
hash2("mango") = 3
hash3("mango") = 12

And then examine the bits at those offsets, as shown in Figure 4.

Figure 4. A false positive in the Bloom filter.

All of the bits at positions 3, 8, and 12 are on, so our filter will report that mango is a valid key.

Of course, mango is not a valid key -- the filter we built contains only apples and plums. The fact that the offsets for mango point to enabled bits is just coincidence. We have found a false positive -- a key that seems to be in the filter, but isn't really there.

As you might expect, the false-positive rate depends on the bit vector length and the number of keys stored in the filter. The roomier the bit vector, the smaller the probability that all k bits we check will be on, unless the key actually exists in the filter. The relationship between the number of hash functions and the false-positive rate is more subtle. If you use too few hash functions, there won't be enough discrimination between keys; but if you use too many, the filter will be very dense, increasing the probability of collisions. You can calculate the false-positive rate for any filter using the formula:

c = ( 1 - e^(-kn/m) )^k

Where c is the false positive rate, k is the number of hash functions, n is the number of keys in the filter, and m is the length of the filter in bits.
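
Translated directly into Perl, that's a tiny helper you can use to experiment (a quick sketch, not part of Bloom::Filter):

sub false_positive_rate {
	my ( $k, $n, $m ) = @_;
	return ( 1 - exp( -$k * $n / $m ) ) ** $k;
}

printf "%.4f\n", false_positive_rate( 4, 1_000, 10_000 );   # prints 0.0118 -- about a 1.2% error rate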

When using Bloom filters, we very frequently have a desired false-positive rate in mind, and we are also likely to have a rough idea of how many keys we want to add to the filter. We need some way of finding out how large the bit vector has to be to make sure the false-positive rate never exceeds our limit. The following equation gives us the vector length from the error rate and the number of keys:

m = -kn / ( ln( 1 - c^(1/k) ) )

You'll notice another free variable here: k, the number of hash functions. It's possible to use calculus to find a minimum for k, but there's a lazier way to do it:

sub calculate_shortest_filter_length {
	my ( $num_keys, $error_rate ) = @_;
	my $lowest_m;
	my $best_k = 1;

	foreach my $k ( 1..100 ) {
		my $m = (-1 * $k * $num_keys) / 
			( log( 1 - ($error_rate ** (1/$k))));

		if ( !defined $lowest_m or ($m < $lowest_m) ) {
			$lowest_m = $m;
			$best_k   = $k;
		}
	}
	return ( $lowest_m, $best_k );
}
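
A quick usage sketch (the exact figures will vary a little with rounding):

my ( $m, $k ) = calculate_shortest_filter_length( 5_000, 0.001 );
printf "%d bits, %d hash functions\n", $m, $k;
# roughly 72,000 bits and 10 hash functions for 5,000 keys at a 0.1% error rate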

To give you a sense of how error rate and number of keys affect the storage size of Bloom filters, Table 1 lists some sample vector sizes for a variety of capacity/error rate combinations.

    Error Rate    Keys    Required Size    Bytes/Key
    ----------    ----    -------------    ---------
    1%            1K      1.87 K           1.9
    0.1%          1K      2.80 K           2.9
    0.01%         1K      3.74 K           3.7
    0.01%         10K     37.4 K           3.7
    0.01%         100K    374 K            3.7
    0.01%         1M      3.74 M           3.7
    0.001%        1M      4.68 M           4.7
    0.0001%       1M      5.61 M           5.7

You can find further lookup tables for various combinations of error rate, filter size, and number of hash functions at Bloom Filters -- the math.

Building a Bloom Filter in Perl

To make a working Bloom filter, we need a good set of hash functions. These are easy to come by -- there are several excellent hashing algorithms available on CPAN. For our purposes, a good choice is Digest::SHA1, a cryptographically strong hash with a fast C implementation. We can use the module to create as many hash functions as we like by salting the input with a list of distinct values. Here's a subroutine that builds a list of unique hash functions:

use Digest::SHA1 qw/sha1/;

sub make_hashing_functions {
	my ( $count ) = @_;
	my @functions;

	for my $salt (1..$count ) {
		push @functions, sub { sha1( $salt, $_[0] ) };
	}

	return @functions;
}

To be able to use these hash functions, we have to find a way to control their range. Digest::SHA1 returns an embarrassingly lavish 160 bits of hashed output, useful only in the unlikely case that our vector is 2^160 bits long. We'll use a combination of bit chopping and division to scale the output down to a more usable size.

Here's a subroutine that takes a key, runs it through a list of hash functions, and returns a bitmask of length $FILTER_LENGTH:

sub make_bitmask {
	my ( $key ) = @_;
	my $mask    = pack( "b*", '0' x $FILTER_LENGTH);

	foreach my $hash_function ( @functions ){ 

		my $hash       = $hash_function->($key);
		my $chopped    = unpack("N", $hash );
		my $bit_offset = $chopped % $FILTER_LENGTH;

		vec( $mask, $bit_offset, 1 ) = 1;       
	}
	return $mask;
}

That's a dense stretch of code, so let's look at it line by line:

my $mask = pack( "b*", '0' x $FILTER_LENGTH);

We start by using Perl's pack operator to create a zeroed bit vector that is $FILTER_LENGTH bits long. pack takes two arguments, a template and a value. The b in our template tells pack that we want it to interpret the value as bits, and the * indicates "repeat as often as necessary," just like in a regular expression. Perl will actually pad our bit vector to make its length a multiple of eight, but we'll ignore those superfluous bits.

With a blank bit vector in hand, we're ready to start running our key through the hash functions.

my $hash = $hash_function->($key);
my $chopped = unpack("N", $hash );

We're keeping the first 32 bits of the output and discarding the rest. This prevents us from having to require BigInt support further along. The second line does the actual bit chopping. The N in the template tells unpack to extract a 32-bit integer in network byte order. Because we don't provide any quantifier in the template, unpack will extract just one integer and then stop.

If you are extra, super paranoid about bit chopping, you could split the hash into five 32-bit pieces and XOR them together, preserving all the information in the original hash:

my @pieces  = unpack( "N*", $hash );
my $chopped = 0;
$chopped   ^= $_ foreach @pieces;

But this is probably overkill.

Now that we have a list of 32-bit integer outputs from our hash functions, all we have to do is scale them down with the modulo operator so they fall in the range (0 .. $FILTER_LENGTH - 1).

my $bit_offset = $chopped % $FILTER_LENGTH;

Now we've turned our key into a list of bit offsets, which is exactly what we were after.

The only thing left to do is to set the bits using vec, which takes three arguments: the vector itself, an offset into it, and the width in bits of the element we're addressing. We can assign a value to vec like we would to a variable:

vec( $mask, $bit_offset, 1 ) = 1;

After we've set all the bits, we wind up with a bitmask that is the same length as our Bloom filter. We can use this mask to add the key into the filter:

sub add {
	my ( $key, $filter ) = @_;

	my $mask = make_bitmask( $key );
	return $filter | $mask;     # hand the updated filter back to the caller
}

Or we can use it to check whether the key is already present:

sub check {
	my ( $key, $filter ) = @_;
	my $mask  = make_bitmask( $key );
	my $found = ( ( $filter & $mask ) eq $mask );
	return $found;
}

Note that those are the bitwise OR (|) and AND (&) operators, not the more commonly used logical OR (||) and AND ( && ) operators. Getting the two mixed up can lead to hours of interesting debugging. The first example ORs the mask against the bit vector, turning on any bits that aren't already set, and returns the updated filter. The second example compares the mask to the corresponding positions in the filter -- if all of the on bits in the mask are also on in the filter, we know we've found a match.
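
To tie the pieces together, here's a quick round trip with the subroutines above (assuming $FILTER_LENGTH and @functions have been set up as shown earlier, and remembering that add returns the updated filter):

my $filter = pack( "b*", '0' x $FILTER_LENGTH );

$filter = add( "apples", $filter );
$filter = add( "plums",  $filter );

print check( "apples", $filter ) ? "apples: probably there\n" : "apples: not there\n";
print check( "durian", $filter ) ? "durian: probably there\n" : "durian: not there\n";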

Once you get over the intimidation factor of using vec, pack, and the bitwise operators, Bloom filters are actually quite straightforward. Listing 1 shows a complete object-oriented implementation called Bloom::Filter.

Bloom Filters in Distributed Social Networks

One drawback of existing social network schemes is that they require participants to either divulge their list of contacts to a central server (Orkut, Friendster) or publish it to the public Internet (FOAF), in both cases sacrificing a great deal of privacy. By exchanging Bloom filters instead of explicit lists of contacts, users can participate in social networking experiments without having to admit to the world who their friends are. A Bloom filter encoding someone's contact information can be checked to see whether it contains a given name or email address, but it can't be coerced into revealing the full list of keys that were used to build it. It's even possible to turn the false-positive rate, which may not sound like a feature, into a powerful tool.

Suppose that I am very concerned about people trying to reverse-engineer my social network by running a dictionary attack against my Bloom filter. I can build my filter with a prohibitively high false-positive rate (50%, for example) and then arrange to send multiple copies of my Bloom filter to friends, varying the hash functions I use to build each filter. The more filters my friends collect, the lower the false-positive rate they will see. For example, with five filters the false-positive rate will be (0.5)^5, or about 3% -- and I can reduce the rate further by sending out more filters.

If any one of the filters is intercepted, it will register the full 50% false-positive rate. So I am able to hedge my privacy risk across several interactions, and have some control over how accurately other people can see my network. My friends can be sure with a high degree of certainty whether someone is on my contact list, but someone who manages to snag just one or two of my filters will learn almost nothing about me.

Here's a Perl function that checks a key against a set of noisy filters:

use Bloom::Filter;
        
sub check_noisy_filters {
	my ( $key, @filters ) = @_;
	foreach my $filter ( @filters ) {
		return 0 unless $filter->check( $key );
	}
	return 1;
}

If you and your friends agree to use the same filter length and set of hash functions, you can also use bitwise comparisons to estimate the degree of overlap between your social networks. The number of shared on bits in two Bloom filters will give a usable measure of the distance between them.

sub shared_on_bits {
	my ( $filter_1, $filter_2 ) = @_;
	return unpack( "%32b*",  $filter_1 & $filter_2 )
}

Additionally, you can combine two Bloom filters that have the same length and hash functions with the bitwise OR operator to create a composite filter. For example, if you participate in a small mailing list and want to create a whitelist from the address books of everyone in the group, you can have each participant create a Bloom filter individually and then OR the filters together into a Voltron-like master list. None of the members of the group will know who the other members' contacts are, and yet the filter will exhibit the correct behavior.
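
That merging step is nothing more than ORing the vectors together. Here's a minimal sketch, assuming every filter was built with the same length and hash functions:

sub merge_filters {
	my ( @filters ) = @_;
	my $combined = shift @filters;      # start from the first filter
	$combined |= $_ foreach @filters;   # fold the rest in with bitwise OR
	return $combined;
}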

There are sure to be other neat Bloom filter tricks with potential applications to social networking and distributed applications. The references below list a few good places to start mining.

References

This week on Perl 6, week ending 2004-04-04

Wednesday? Why did I leave it 'til Wednesday to write the summary? I must have some reason. Or maybe not. I'll give fair warning that I won't be doing a summary for next week though, what with Easter and everything, but you'll get a fortnight's summary the week after, because I'm good to you like that.

We'll start this week's summary with perl6-internals.

MMD vtable functions in bytecode

Dan had announced that he was working on adding parrot bytecode support for multimethod dispatch, and outlined how they'd be used and got semi-Warnocked.

The discussion got going this week; Leo Tötsch was unsure about some of Dan's implementation choices. In particular, he wondered if MMD subs should use PMCs rather than the simple function pointer that Dan had used. Dan thought not.

http://groups.google.com/

Behaviour of PMCs on assignment

The discussion of what to do when assigning to PMCs continued. The issue is complicated because we are trying to be friendly to multiple languages (though, as far as I can tell, the really problematic issue is Perl Scalars; most of the other languages that spring to mind have variables that are 'simple' pointers to objects; Perl Scalars can hold (seemingly) a million and one different things, potentially all at once). TOGoS argued that, as things stand there's a disjunction between the way (say) integer registers work and the way PMC registers work. With Integer registers, if you do

    $I1 = $I2 + $I3

then $I1 gets a 'new' integer; there doesn't need to be a preexisting integer. However, if you were to do:

    $P1 = $P2 + $P3

what actually happens (assuming we're using straightforward PMCs here...) is more like:

    $P1.value = $P2 + $P3

In other words, you need a preexisting $P1. Leo agreed with TOGoS's argument, but worried that implementing it would blow core size up to an insane value. Dan didn't agree with TOGoS though, but I'm afraid I didn't quite follow his reasoning (probably because I'm being dumb this morning).

http://groups.google.com/

In which your Summarizer asks dumb questions

In an extended moment of stupidity, Piers Cawley asked why we had distinct register and user stacks. Leo explained it to him, very politely I thought.

http://groups.google.com/

Stalking the wily Garbage Collector bug

Jens Rieks's projet du jour -- an EBNF parser in Parrot -- tweaked a garbage collection bug so he posted appropriate debug traces and Leo set to work on it. He didn't get it working fully, but it takes longer to crash now (but it crashes in the same bit of C code). Jens thinks it's a problem with Parrot's handling of strings.

http://groups.google.com/

New SDL Parrot bindings underway

That stalwart of Portland.pm, chromatic, announced that he's in the process of porting the existing SDL Parrot bindings to use our shiny new Object system. Jens Rieks wondered why he was prefixing his method names with underscores (you only need underscores for globally visible functions, methods can have straightforward names). Tim Bunce wondered why chromatic wasn't using nested namespaces. Leo pointed out that nested namespaces haven't been implemented just yet.

http://groups.google.com/

Some new classes

Dan checked in some stub code for PMCArray and StringArray. Eventually they'll be auto-resizable, PMC or String only arrays, but right now they're simple wrappers for PerlArray. He suggested that rewriting them so they were real, efficient arrays would be a Good Thing (and, I suggest, a relatively gentle introduction to Parrot hacking if anyone reading this is interested.)

Jens Rieks offered up a patch for his data dumper so it could take them into account, which Dan applied.

http://groups.google.com/

Points of focus

Dan went all Managerial on our collective donkey and posted a nice bulleted list of things that need sorting out for a 0.1.1 release. The general thrust of the message is bug fixing and documenting, which is good.

http://groups.google.com/

Fun with non-deterministic searches

One of the canonical illustrations of what you can do with continuations is non-deterministic search. Imagine that you could write

    $x = choose(1,3,5,9)
    $y = choose(1,5,9,13)

    assert $x * $y == 15
    
    print "$x * $y == ", $x * $y, "\n"

and have "3 * 5 == 15" printed out. (Okay, so in Perl 6 you're going to be able to do that with junctions, but this is about an underlying implementation). Piers Cawley translated a simple non deterministic search algorithm from scheme to Parrot and posted the (initially failing) code to the list and pointed out that, even if he tweaked IMCC to generate full continuations instead of RetContinuations and turned of garbage collection, Parrot fell over with a bus error.

Once he'd explained how it worked (in a post made on April Fools' Day, no less) and Leo had wrapped his head round it, the work of making it behave began. It turns out that Parrot had a few too many assumptions about how call stacks would work (starting with the assumption that you could simply reuse a stack frame once you'd returned through it; in the presence of a full continuation you have to let stack frames be garbage collected). Leo fixed things so that you can now make a 'full' continuation simply by cloning the current continuation in P1, and there should only be a performance hit for the call chain that leads to the continuation (and that hit should be a one-time cost paid when cloning the continuation). Way to go Leo.

Oh yes, and $P0(...) doesn't throw a syntax error in IMCC any more.

http://groups.google.com/

http://groups.google.com/ -- Continuations made simple

Collision of running jokes

Once upon a time, I endeavoured always to mention Leon Brocard in these summaries, which got increasingly difficult (not to mention tortured) as his posts to the mailing lists became more and more infrequent. However, on the first of April (aka the oldest running joke in Christendom) he posted a couple of patches. Sadly, we didn't manage to get a triple running joke collision, for it was Leo Tötsch and not chromatic who applied the patches.

http://groups.google.com/

Stream library

Okay, if Leo Tötsch is the Patchmonster, then Jens Rieks shows every indication of becoming the Libmonster. Not content with implementing Data::Dumper in pure Parrot, he's working on an EBNF Parser and, on Friday he released his first working development version of a Stream library which wraps all sorts of sources of strings behind a simple interface (suitable for parsers, for instance). Leo had a few issues with some of the implementation choices that potentially make it a little tricky to subclass streams (and then the week ended, but a little bird tells me that Jens took these comments on board and redid the library).

http://groups.google.com/

Subroutine calls

Leo announced that he's added a pmc_const opcode to Parrot. The idea is that, in general, subroutines don't vary much, so instead of having to call newsub every time you make a function call (which is what IMCC usually does), you would fetch a preexisting Subroutine PMC from the PMC constant pool.

http://groups.google.com/groups?selm=406D85DA.6090003@toetsch.at

Named attribute access

In a very short (but useful) post, Leo announced that you could now do

    getattribute $P0, anObject, "attribute"
    setattribute anObject, "attribute", $P0

For which I personally thank him profusely.

http://groups.google.com/

Meanwhile, over in perl6-language

Things were pretty quiet. But not utterly quiet.

Default Program

Extrapolating from the general Perl principle that, in the absence of any indication otherwise, Perl should use a sensible default, Brent Royal-Gordon proposed that Perl 6 should extend this principle to entire programs. He proposed that, when the whole program was an empty string, Perl 6 should substitute a sensible default program. Based on extensive research on the Internet and printed Perl documentation, he proposed that the default program should be:

    print "Hello world!\n"

Apart from those who quibbled with his punctuation, the general response was positive. However, always one to take a good idea that one step further, Austin Hastings suggested that a more sensible default would be to have a naked invocation of perl launch an editor (or other script development environment). He proposed that, to this end, the Parrot team should be focusing on implementing elisp in Parrot rather than worrying about winning the Piethon.

Richard Nuttall thought that Austin hadn't gone far enough; he proposed that Perl 6 should load the DWIM::AI module and provide as output the script you were intending to write.

A quick glance at the calendar was in order at about this time.

http://groups.google.com/

Can colons control backtracking in logical expressions?

Gleefully ignoring Larry's stricture that "I [Larry] get the colon.", Austin Hastings wondered about using :: to mean something special in conditional statements. Quite what his proposal offered over and above

    if specific_condition() ?? detail() :: general_condition()
      { ... }

perplexed Damian somewhat (though he did forget that ... ? ... : ... has become ... ?? ... :: ... in Perl 6). Elsewhere in the thread, Larry reminded everyone that Perl 6 will not be confusing statements and expressions.

http://groups.google.com/

Announcements, Acknowledgements, Apologies

No announcements this week, apart from the one earlier about the next summary being due in two weeks because of Easter.

If you find these summaries useful or enjoyable, please consider contributing to the Perl Foundation to help support the development of Perl. You might also like to send me feedback at mailto:pdcawley@bofh.org.uk or, if you find yourself at Miskin Folk Festival this Easter weekend, you could always buy me a drink.

http://donate.perl-foundation.org/ -- The Perl Foundation

http://dev.perl.org/perl6/ -- Perl 6 Development site

http://www.miskinfolk.cjb.net/ -- Miskin folk festival

Photo Galleries with Mason and Imager

Creating a photo gallery is usually considered a daunting task. Lots of people have tried it; not many have succeeded. One of the reasons there are so many similar projects is that they rarely integrate well into an existing web site. In this article we're going to build a photo gallery using two important components, Mason and Imager. Writing our gallery in Mason will make it much easier to integrate into an existing web site.

Mason, also known as HTML::Mason, is a web application framework written in Perl. Mason can run in any environment, but is tuned to work best with mod_perl. We will be using a number of Mason features in this article. If you're not familiar with Mason I suggest you get the book or browse before you buy. This article is not meant to be an introduction to Mason, so some experience will definitely help when reading this. Mason idioms will be briefly reviewed when they come up.

Imager is a Perl module for dealing with images. It has mechanisms to manipulate an image, and read and write various formats. It's rather lightweight and has a clean interface in comparison to the alternative, Image::Magick.

Combining these two Perl modules, and adding a few others, allows us to write a feature-full photo gallery in just 200 lines. Let's get started.

Apache Configuration

We're going to use Mason from mod_perl for our gallery. This requires an Apache built with mod_perl, and a bit of web server configuration.

First, Mason's Apache handler must be pre-loaded.

  PerlModule HTML::Mason::ApacheHandler

Next, we need to tell Apache to let Mason handle any requests that it gets for resources within our gallery.

  <Location /gallery>
    SetHandler perl-script
    PerlHandler HTML::Mason::ApacheHandler
  </Location>

We want to keep special Mason files secret from the general public. If they're requested, Apache should always return a 404 HTTP status code, for Not Found.

  <LocationMatch "(dhandler|autohandler)$">
    SetHandler perl-script
    PerlInitHandler Apache::Constants::NOT_FOUND
  </LocationMatch>

At this point, every file inside the gallery will be considered a Mason component. If you enjoy paying for lots of bandwidth and want the full-size images to be viewable by the public, one last configuration step is needed. The raw images live under the pictures directory and are not Mason components, so Apache should handle them in the default way.

  <Location /gallery/pictures>
    SetHandler default-handler
  </Location>

Directory Structure

For this article we'll use the following directory structure, in a directory called gallery, inside our site DocumentRoot.

  .
  |-- autohandler
  |-- dhandler
  |-- images
  |   `-- dhandler
  |-- index.html
  `-- pictures
      `-- [lots of images and sub-directories]

As you can see, all the actual photos will be uploaded to the gallery/pictures directory. Our code will recognize sub-galleries and allow for infinite nesting. We can keep our photos very neatly organized this way.

As for the rest, it's all code. autohandler and dhandler are special Mason files, and index.html is just a wrapper around the top level dhandler.

The autohandler

For this example, our autohandler is extremely simple. I'm going to assume that you already have a Mason site running with your own autohandler wrappers in place. If you don't, you can use this one.

  <%method .title>My Website</%method>

  <html>
    <head>
      <title><& SELF:.title &></title>
    </head>
    <body>
      <% $m->call_next %>
    </body>
  </html>

The first thing our autohandler does is define a subcomponent called .title. Mason subcomponents are wrapped in <%method> blocks. They are components just like files; the only difference is that they live inside a file. This is analogous to the relationship between Perl files and the subroutines defined in them.

Next we define the skeleton of the web page. The <title> tag's content is dynamically generated by the output of the SELF:.title subcomponent. Any time you want to call a subcomponent, the call is wrapped in <& &> delimiters.

The body, or content, of our web page will be provided by whatever component is next in the call stack. Using $m, the global variable that holds the Mason request object, we call the call_next() method to do just that.

In our gallery the next component in the call stack will be one of two files. If we're at the topmost level, http://example.com/gallery/, for example, index.html will be called. Everywhere else dhandler will be called. This is because no files exist for Mason to map to, and when that happens, Mason looks for a dhandler to execute.

The Invisible Index

dhandlers are default handlers for files inside a directory, not for the directory itself. Because of this we are required to provide an index.html file, or Apache will attempt to display a directory listing (or possibly return a Forbidden status code, if directory listings are not allowed). In reality, our index.html doesn't do anything at all.

In its entirety, index.html simply states that it inherits from dhandler. Now dhandler will be executed for all non-image access to our photo gallery.

  <%flags>
    inherit => 'dhandler'
  </%flags>

This uncovers a portion of Mason's object-like component inheritance. By default, all components inherit from autohandler. For index.html we've changed that. dhandler still inherits from autohandler, so any time a request is sent to index.html, the chain runs autohandler, then dhandler, then index.html. autohandler does its thing and moves down the call stack to dhandler. dhandler, as we'll see, is not configured to call down the stack to index.html, because it doesn't need to. Thus ends the very high-level overview of Mason inheritance.

Displaying Gallery Pages

Moving on to the meat of our application: the top-level dhandler. This file has the bulk of our code, roughly 150 lines. The code is neatly organized into subcomponents, so we'll start by discussing the high-level code and work from there in order of execution.

Each page in our photo gallery has just one optional argument, a page number. By default we always start on page one (1).

  <%args>
    $page => 1
  </%args>

Next, the <%shared> block is executed. It does a lot, so we'll look at it in great detail. We're using a <%shared> block instead of an <%init> block because some of the variables defined here need to be used within multiple subcomponents. As the name suggests, <%shared> blocks allow just that.

  <%shared>
    use List::Group qw[group];
    use HTML::Table;
    use File::Spec::Functions qw[:ALL];

The first step is to load the Perl modules this component will be using. List::Group turns a flat list into a List-of-Lists (LoL) based on specific grouping options, HTML::Table turns such an LoL into an HTML table structure, and File::Spec::Functions provides a number of portable file and directory operations.
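
To make the grouping step concrete, here's a tiny sketch of the kind of thing group() is used for later in this component. The sample data is invented, and the exact shape of the short final row is an assumption rather than something taken from the List::Group documentation.

  use List::Group qw[group];

  my @flat = qw[a b c d e f g];
  my @rows = group \@flat, cols => 3;
  # @rows should now be a list of array references, roughly:
  #   ['a', 'b', 'c'], ['d', 'e', 'f'], ['g']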

    my $GALLERY_ROOT = $r->document_root . "/gallery/pictures";

Next, we define the first shared variable. $GALLERY_ROOT is the absolute path to the location of the gallery pictures on the file system.

    (my $path_from_uri = $m->dhandler_arg) =~ s!(?:(?:/index)?\.html|/)$!!;

It's time to determine the relative path to the resource being requested. Because we're inside a dhandler, Mason provides the dhandler_arg() method, which is similar in purpose to Apache's uri() method. It returns the portion of a URI that is relative to the directory containing the dhandler. If we request /gallery/Family/IMG_0001.JPG.html, $m->dhandler_arg() will return /Family/IMG_0001.JPG.html.

Because we're looking for the path to an actual photo or gallery directory, there is some information to be removed from the end of our relative path. So our regex strips useless trailing information: an index file name, an .html extension, or a trailing slash.

    my $file = catdir $GALLERY_ROOT, $path_from_uri;
    $m->clear_buffer and $m->abort(404) unless -e $file;

From these two variables we can construct the absolute path to the file we're interested in using catdir(), from File::Spec::Functions. If this file doesn't exist, we don't want to go any further, so Mason's output buffer is cleared and the request is aborted immediately with a 404 HTTP status code, meaning Not Found.

If a gallery is being requested, not a specific photo, we must get the contents of that gallery. If a photo is being requested, we must get the contents of the gallery that photo belongs to.

    my $dir = -d $file ? $file : (splitpath $file)[1];
    opendir DIR, $dir or die $!;
    my $dir_list = [ map "$dir/$_", grep { ! /^\./ } readdir DIR ];
    closedir DIR;

Using a file test operator, we can determine whether the current request is for a file or a directory. If it's a directory, we simply assign $file to $dir. If it's a file, we use splitpath() from File::Spec::Functions. splitpath() returns three elements: the volume, the directory tree, and the filename. We're after the directory tree, the second element.

The $dir_list array reference is populated with a list of absolute paths to each file in $dir, excluding files that begin with a dot (.).

Now it's time to move on to building the breadcrumbs for navigation. This method of navigating "up" the photo gallery is important because we can have infinite levels of sub-galleries.

    my @bread_crumb = ('Gallery', splitdir $path_from_uri);

First we define our plain-text list of crumbs in @bread_crumb. The first element is the name of our photo gallery, which I imaginatively named Gallery. The rest of our breadcrumbs come from $path_from_uri, by calling splitdir() to break it into its directory components.

Our @bread_crumb list is great for the title of the page, but it doesn't contain any links for use inside the page for navigation. A new list of breadcrumbs will be created with correct linking.

    my @bread_crumb_href;
    push @bread_crumb_href, sprintf '<a href="/gallery/%s">%s</a>',
      join('/',@bread_crumb[1..$_]), $bread_crumb[$_]
        for 0 .. $#bread_crumb - 1;
    push @bread_crumb_href, $bread_crumb[-1];

For each breadcrumb except the very last, we create an HTML link. The reference location for each link, from left to right, needs to cumulatively add directories from the links before it. That's what join('/',@bread_crumb[1..$_]) does. Finally we tack on the last element of the breadcrumb, unlinked, because it is the currently requested resource.

To illustrate, if a request is made to /gallery/Backgrounds/Nature%20Backgrounds/ICmiddleFalls1280x1024.jpg.html, the following list is in @bread_crumb_href.

  (
   '<a href="/gallery/">Gallery</a>',
   '<a href="/gallery/Backgrounds">Backgrounds</a>',
   '<a href="/gallery/Backgrounds/Nature Backgrounds">Nature Backgrounds</a>',
   'ICmiddleFalls1280x1024.jpg'
  )

Finally, we construct two scalars to hold the contents of our breadcrumbs.

    my $bread_crumb      = join ' &middot; ', @bread_crumb;
    my $bread_crumb_href = join ' &middot; ', @bread_crumb_href;
  </%shared>

At this point we can define the .title subcomponent, using the $bread_crumb shared variable.

  <%method .title><& PARENT:.title &> &middot; <% $bread_crumb %></%method>

Notice that there is a subcomponent call to PARENT:.title. This is another illustration of Mason's inheritance model. Because dhandler inherits from autohandler, the .title subcomponent in dhandler is overriding the .title method in autohandler. That is to say, dhandler is subclassing autohandler. For this reason, if we don't want to clobber the .title subcomponent declared in autohandler we must be sure to call our parent. This is very similar to invoking a SUPER:: method in Perl.
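
For readers more comfortable with plain Perl objects, here's a rough, hypothetical analogue of what the PARENT:.title call is doing; the package and method names are invented purely for illustration.

  package My::AutoHandler;
  sub title { 'My Website' }

  package My::DHandler;
  our @ISA = ('My::AutoHandler');
  sub title {
      my $self = shift;
      # extend, rather than clobber, the parent's title
      return $self->SUPER::title() . ' &middot; Gallery';
  }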

Now we can move on to the actual gallery display.

  <h1>Photo Gallery</h1>
  <h2><% $bread_crumb_href %></h2>

Using another shared variable, $bread_crumb_href, we construct our backward navigation.

  <table>
    <tr>
      <td valign="top" width="15%">
        <& SELF:.sub_gal_list, dir_list => $dir_list &>
      </td>
      <td valign="top" width="35%">
        <& SELF:.photo_list, dir_list => $dir_list, page => $page &>
      </td>
      <td valign="top" width="50%">
  % if ( -f $file ) {
        <& SELF:.photo_view, file => $file &>
  % }
      </td>
    </tr>
  </table>

We have three columns of information to display at any one time -- an HTML table is a good way to do that. (Some standards purists and XHTML masochists will disagree with me on this point. I'm interested in keeping the examples simple, not pure.) Each of the table cells calls a subcomponent with the appropriate arguments. Those subcomponents are discussed in detail later in this article. Notice that before we call SELF:.photo_view we check to see if the request is currently for a file. This can save us from calling that subcomponent if we currently don't want to look at a photo.

The first subcomponent called is SELF:.sub_gal_list. As the name suggests, it will list sub-galleries.

  <%method .sub_gal_list>
    <%args>
      @dir_list
      $wrap => 1
    </%args>

    <h3>Sub <% @dir_list == 1 ? "Gallery" : "Galleries" %></h3>
    <% $table %>
  
    <%init>
      @dir_list = grep { -d $_ } @dir_list;
      return unless @dir_list;
      $_ = $m->scomp('SELF:.sub_gal_view',dir => $_) for @dir_list;
      my $table = HTML::Table->new(-data => [ group \@dir_list, cols => $wrap ]);
    </%init>
  </%method>

.sub_gal_list accepts a directory listing argument. It also optionally accepts an argument specifying how many entries from the list should appear in each row.

Jumping to the <%init> block (remember the order of execution?), we filter the directory listing to exclude any entries that are not directories themselves. If that produces an empty list, there's no need to continue processing this subcomponent, so we just return. Next, each of the entries is reformatted by passing it to the SELF:.sub_gal_view method. This is where it gets fun.

Calling a subcomponent with the <& &> syntax is really just syntactic sugar for calling $m->comp().

  <& SELF:.sub_gal_view, dir => $_ &>

The previous statement is exactly equivalent to the following:

  % $m->comp( 'SELF:.sub_gal_view', dir => $_ );

Mason also defines the scomp() method, which runs a component but returns its output as a string instead of sending it to the browser, much as Perl's sprintf relates to printf.
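
In other words, the two calls below should behave the same except for where the output ends up. This is only a sketch, reusing the .sub_gal_view subcomponent from this article; $dir stands in for any directory path.

  % # render straight into the page:
  % $m->comp('SELF:.sub_gal_view', dir => $dir);

  % # render to a string we can stash, reformat, or post-process:
  % my $html = $m->scomp('SELF:.sub_gal_view', dir => $dir);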

After reformatting the entries, we group the flat list into a List-of-Lists containing just one column. That list is used as the value of the -data parameter to HTML::Table->new(), which returns a table object.

Now it's time to process the template portion. First a heading is created; it's only pluralized if we have more than one sub-gallery. After the heading, the sub-gallery table is displayed. Because an HTML::Table object overloads stringification, there's no need to call a method on it to get the HTML output.
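
Concretely, and assuming the usual HTML::Table interface, the following two print statements should produce the same markup; the table data here is made up.

  use HTML::Table;

  my $table = HTML::Table->new(-data => [ ['one', 'two'], ['three', 'four'] ]);
  print $table;            # stringification kicks in here
  print $table->getTable;  # the explicit version of the same thing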

Let's quickly look at the .sub_gal_view subcomponent used to reformat each directory listing.

  <%method .sub_gal_view>
    <%args>
      $dir
    </%args>
    <a href="/gallery/<% $rel_dir %>"><% $label %></a>
    <%init>
      my $rel_dir = abs2rel $dir, $GALLERY_ROOT;
      my $label   = (splitpath $rel_dir)[-1];
    </%init>
  </%method>

This subcomponent is extremely straightforward. It accepts a directory. Inside <%init>, $rel_dir is set to the directory path relative to $GALLERY_ROOT, which gives us a proper URL for the link. Finding the label for the link is simple: it is the real directory name, the last element of the list returned by splitpath(), from File::Spec::Functions.

This subcomponent finally generates the proper link for navigating to sub-galleries.

The next subcomponent called by our top-level component is .photo_list, which generates the thumbnail view of our images.

  <%method .photo_list>
    <%args>
      @dir_list
      $wrap => 5
      $rows => 7
      $page => 1
    </%args>

    <h3><% @dir_list == 1 ? "Photo" : "Photos" %>
        <& SELF:.photo_pager, page => $page, pages => $pages &></h3>
    <% $table %>

    <%init>
      @dir_list = grep { -f $_ } @dir_list;
      return unless @dir_list;
      $_ = $m->scomp('SELF:.thumb_view',file => $_, page => $page)
        for @dir_list;
      my @files = group \@dir_list, cols => $wrap;
      
      my $pages  = int( @files / $rows );
         $pages += 1 if $pages < ( @files / $rows );
      @files = splice @files, $rows * ($page - 1), $rows;
      
      my $table = HTML::Table->new(-data => \@files);
    </%init>
  </%method>

Just like .sub_gal_list, the only required argument to this component is the directory listing. The other optional arguments correspond to how many images should be in each row ($wrap), how many rows to show on a page ($rows), and what page we're currently on ($page).

Once again we jump to the <%init> block where the directory listing is filtered to only include files. If there are no files, there's no reason to go any further, so just return from this subcomponent. Just as we did with sub-gallery listings, we reformat the remaining list of files by calling a subcomponent and storing its output. Next, we group the list of files into a List-of-Lists (LoL), each row containing $wrap entries.

Photo galleries may contain any number of photos, so it's essential to support paging for thumbnails. First we need to determine how many pages this gallery will have in total. To do that we divide the total number of rows by the number of rows we want on each page. That division may produce a fractional value, which int truncates to a whole number; if anything was lost in the truncation, we bump the page count up by one. Finally we can extract the rows for the current page from all the rows in @files using a splice.
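
As a quick worked example with invented numbers: 23 rows of thumbnails, 7 rows per page, and a request for page 2 would play out like this.

  my @files = (1 .. 23);     # pretend each element is one row of thumbnails
  my ($rows, $page) = (7, 2);

  my $pages  = int( @files / $rows );            # int(23/7) == 3
     $pages += 1 if $pages < ( @files / $rows ); # 3 < 3.28..., so 4 pages in all
  my @this_page = splice @files, $rows * ($page - 1), $rows;  # rows 8 .. 14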

Finally, a new HTML::Table object is created, and populated with @files.

In the template portion a header is output, again only using the plural if we have more than one photo. Our header also contains paging information, provided by the .photo_pager subcomponent. Lastly, the HTML table full of thumbnails is displayed.

Speaking of thumbnails, it's time to look at the code in the .thumb_view subcomponent.

  <%method .thumb_view>
    <%args>
      $file
      $page
    </%args>
      <a href="/gallery/<% $rel_img %>.html?page=<% $page %>">
        <img src="/gallery/images/<% $rel_img %>?xsize=50;ysize=40" border="0" />
      </a>
    <%init>
      my $rel_img = abs2rel $file, $GALLERY_ROOT;
    </%init>
  </%method>

This component takes two arguments. $file is the image to be turned into a thumbnail, and $page is the current page of this gallery. The <%init> block just finds the relative path of this image from the $GALLERY_ROOT. In the template the thumbnail is linked to the HTML file that this image would be displayed on, and includes the current page information as a means of saving that state.

The source of the image points to a file under /gallery/images, and includes query parameters for maximum width (xsize) and height (ysize). This is interesting because the pictures don't live there at all. If you recall, the only thing inside the images directory was a dhandler. More on that later.

The other subcomponent that .photo_list called was .photo_pager.

  <%method .photo_pager>
    <%args>
      $page
      $pages
    </%args>
    (
  % for ( 1 .. $pages ) {
  %   if ( $_ == $page ) {
        <strong><% $page %></strong>
  %   } else {
        <a href="?page=<% $_ %>"><% $_ %></a>
  %   }
      <% $_ != $pages ? "&middot;" : "" %>
  % }
    )
    <%init>
      return if $pages == 1;
    </%init>
  </%method>

This subcomponent takes two arguments, the current page and the total number of pages. Before anything is output, the <%init> block checks to make sure we have more than one page. If not, no sense in going on. Looping through all the page numbers, we link all the numbers except our current page. After every number except the last one, we output a stylish separator. This subcomponent is very simple, but big enough that it's worth abstracting from the .photo_list subcomponent.

The final subcomponent in the top-level dhandler is .photo_view.

  <%method .photo_view>
    <%args>
      $file
    </%args>
    <h3>Photo</h3>
    <img src="/gallery/images/<% $rel_image %>?xsize=400x;ysize=300" />
    <%init>
      my $rel_image = abs2rel $file, $GALLERY_ROOT;
    </%init>
  </%method>

This component does things that we've already seen done in .thumb_view, so there's no need to expound upon it here.

The Images dhandler

You've probably guessed by now that we intend to use Mason to process images. Mason is well suited to outputting many forms of data, not just text, and we'll be exploiting that fact for our image gallery.

  <%args>
    $xsize => undef
    $ysize => undef
  </%args>

This component accepts two parameters that we've already described. $xsize is the maximum width an image can be, and $ysize is the maximum height an image can be.

  <%flags>
    inherit => undef
  </%flags>

This is the important part. Because components have inheritance, the dhandler would normally inherit from the autohandler. That's bad news when the autohandler is tuned to sending out HTML and our dhandler is trying to send binary image data. Setting the inherit flag to undef tells Mason that the dhandler doesn't inherit anything, that it's responsible for its own output.

The only code remaining in this template resides in the <%init> block, so let's step through that now.

  <%init>
    $m->clear_buffer;

The very first thing we do is clear Mason's output buffer. This throws away any output that has already accumulated, so nothing but the image data we're about to send will go out.

    use Imager;
    use File::Type;

Next we use the modules that will help scale the images, Imager and File::Type. Imager has already been discussed. File::Type uses magic numbers to discover the type of a file, and does so in a memory-friendly way.

    my $send_img = sub {
      $r->content_type( "image/$_[0]" );
      $r->send_http_header;
      $m->print($_[1]);
      $m->abort(200);
    };

This anonymous subroutine just encapsulates code that is executed twice, as a means of removing duplication. It sets the HTTP Content-Type header to the image type passed as the first argument. Next it sends the HTTP headers out. Then it sends the image data, the second argument passed to the subroutine. Finally, it aborts execution with an HTTP 200 status code -- everything is OK.

    ( my $file = $r->document_root . $r->uri ) =~ s/images/pictures/;

Discovering the proper file name for the image takes just a little work. After concatenating the document_root() with the uri(), we replace the images portion of the file path with pictures. Remember, none of the images are actually in the images directory.

    my ($image, $type) = split /\//, File::Type->checktype_filename($file);
    $type = 'png' if $type eq 'x-png';

With the knowledge of the proper file name, File::Type can figure out what type of file we have. This is more foolproof than attempting a guess based on filename extensions. As a minor oddity, File::Type returns a non-HTTP friendly $type for PNG images, so we need to fix that problem if it exists.

    my $key = "$file|$xsize|$ysize";
    if ( my $data = $m->cache->get( $key ) ) {
      $send_img->($type, $data);
    }

Generating scaled images from huge photos is a time-consuming operation. It also has the potential to eat a great deal of memory. As a result, it's imperative that we take advantage of Mason's built-in caching functionality. The key for each entry in our cache must be unique for the combination of file and the dimensions we're scaling it to; those three pieces of data make up our $key. If data is returned from the cache for $key, the image data is sent and the request is immediately aborted. This is a quick short-circuit that lets us grab an image from the cache and return it at the earliest possible moment. Later in the article you'll see how the data gets set into the cache.

    $m->abort(500) if $image ne 'image' || ! exists $Imager::formats{$type};

It's possible that the file being requested isn't an image. It's also possible that our installation of Imager doesn't support this type of image. If either of these conditions are true, we should abort immediately with a 500 HTTP status code, Internal Server Error.

    my $img  = Imager->new;
    if ( $img->open(file => $file, type => $type) ) {
        if ( $xsize ) {
          $img = $img->scale( xpixels => $xsize )
            unless $img->getwidth < $xsize;
        }
        if ( $ysize ) {
          $img = $img->scale( ypixels => $ysize )
            unless $img->getheight < $ysize;
        }

        my $img_data;
        $img->write(data => \$img_data, type => $type);
        $m->cache->set( $key => $img_data );
        $send_img->($type, $img_data);
    }

Now the heart and soul of image manipulation. The first step is to create a new Imager object. Next we try to open the image $file. If that succeeds, we can proceed to scaling the image.

When scaling, it's more important (to me) that the height of the image is exactly how I want it, so width is scaled first. Before the image is scaled its size is tested against the size of the image to be created. No scaling should occur if the image is smaller than the preferred size.

Once scaling has finished the image data can be extracted from the Imager object. When calling write() on the object we can pass a data option to let Imager write to a scalar reference. After the image data has been retrieved it is placed in the cache using the same $key that we first used when attempting to get information out of the cache. Finally, the image is sent out and the request is aborted.

    warn "[$file] [$image/$type] " . $img->errstr;
    $m->abort(500);
  </%init>

In the event that Imager wasn't able to open the $file, the request should be aborted with a 500 HTTP status code, Internal Server Error. Before aborting, however, it's useful to get some information into the error_log. The requested $file, its type information, and the error produced by Imager are all sent to STDERR (and thus the error_log) via warn.

What It Looks Like

For the less adventurous, yet overly curious members of the audience, a screenshot of our photo gallery follows.

Photo Gallery Screenshot

As an aside, that image was originally much larger, but I really wanted it to be just 450 pixels wide. I don't have any image manipulation tools to do that job, but I do have Imager. Thanks to Imager, it took me 30 seconds to whip up the following command line snippet.

  perl -MImager -le'Imager->new->open(file=>shift,type=>"jpeg")
    ->scale(xpixels=>450)
    ->write(file=>shift,type=>"jpeg")' figure_0.jpg figure_0_0.jpg

Conclusion

We've just created a photo gallery that takes all the hard work out of maintaining photo galleries. There's no need to pre-generate HTML or thumbnails. There's no web application interface so you don't have to change ownership of your gallery directory to the same user that Apache runs as. Using Mason's built-in caching, photo galleries are nearly as fast as accessing the data directly from the file system. Well, at least on the second request. Our galleries have paging and infinite sub-galleries. Most importantly, using Mason to its full potential has given us a fully customizable, very tiny web application that can be dropped into any existing web site or framework.

In fact, this code is the majority of the faceplant project. The source code can be downloaded from http://search.cpan.org/dist/faceplant. faceplant implements a few more features and is a bit more customizable. As such, its code is an excellent follow-up to this article. Go forth, now, and plant thy face on the Internet!
