March 2003 Archives

This week on Perl 6, week ending 2003-03-30

Welcome once again to the gallimaufry that is a Perl 6 summary. Unfettered this week by the presence of feline distraction, we plunge straight into the crystal-clear waters of perl6-internals.

Iterator proof of concept

People must really like Leo Tötsch's Iterator proposal and patch. There was only one comment about it this week (from Leo, so I'm not sure if it counts) asking for thoughts about a possible Ref or Reference class. Anyone?

http://groups.google.com/groups

Bonsai

Zach Lipton announced that he has Bonsai up and running on the Parrot CVS repository. This gives us a shiny new way of looking at the history of the distribution. Thanks Zach.

http://tinderbox.perl.org/bonsai

BASIC, IMCC and Windows

Clinton A Pierce had problems getting IMCC to build correctly under Win32, which he needs in order to generate a binary distribution for that platform. He finally brought up a working IMCC on Friday and asked for comments about what a binary distribution should look like.

http://groups.google.com/groups

http://groups.google.com/groups

getopt.macro

Inspired by the argument processing in Leon Brocard's uniq(1) implementation, Jonathan Scott Duff presented a getopt macro for Parrot programmers. Benjamin Goldberg had a few suggestions about how to make things a little more flexible.

http://groups.google.com/groups

Patch to examples/assembly/cat.pasm

Paul Duncan offered a patch to examples/assembly/cat.pasm which would cause it to terminate when it received a ctrl-D.

http://groups.google.com/groups

IMCC and scientific notation

Having got IMCC working in a win32 environment, Clinton A Pierce discovered that IMCC didn't understand scientific notation, though the parrot assembler does. Leo Tötsch pointed out that it sort of does, but instead of 1e20 you have to write 1.e20. Joseph Ryan wondered if handling scientific notation wasn't really a job for individual compilers, but Clint and Mark Biggar explained why this wasn't a good idea. As Mark said, in a world with BigFloats you don't want to have to expand 1e2048 into a 2049-character string if you can possibly help it.

http://groups.google.com/groups

RT spam

Someone claiming to be Marla Hurley crawled out from under a stone and took it upon themselves to offer credit to the perl6 RT installation. Thanks. We needed that.

Meanwhile, over in perl6-language

The language list was again busier than the internals list this week, but volume has fallen on both lists. (If you don't count an off-topic thread on the internals list, which I haven't, there were only 22 messages there this week. And very few patches from Leopold Tötsch -- I hope he's all right.)

is static?

Discussion of static/state variables continued. Arcadi Shehter wondered if it made sense to attach 'but' properties to closures. I confess I didn't really understand what he was driving at. Austin Hastings and Larry saw something in it, and the question shifted to ways of doing state variable initialization, which in turn led to possible changes in the various flow-control keywords. As Larry pointed out, if you have a static variable:

   state $variable

Then, assuming that you need 'undef' to be a possible value for your variable, you need some way of doing once-and-only-once initialization of that variable.

   state $variable;
   ONCE_AND_ONLY_ONCE { $variable = $initial_value };

The problem is that INIT and CHECK blocks happen too early; in the code above, $initial_value may well not be set, and if your state variable is set up inside a closure this becomes even more likely. Larry reckons that the most useful initialization semantics appear to be 'just in time'. In other words, you want initialization to happen on the first actual call to a closure. But FIRST {...} is already taken, so Larry is considering renaming FIRST and LAST to ENTER and LEAVE, freeing up FIRST to mean "my very first time". Freudian analysis was discouraged.
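
To make that concrete, here is a sketch of the semantics Larry was mooting, in (still highly speculative) Perl 6 syntax; the counter sub is made up:

    sub counter {
        state $count;
        FIRST { $count = $initial_value }   # just-in-time, on the first call only
        return $count++;
    }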

http://groups.google.com/groups

Argument initializations

Michael Lazzaro summarized the various different and proposed assignment operators available in Perl 6, including a proposed ::= for 'only assign to uninitialized variables'. Michael wondered how these could be used in method signatures and proposed some changes to the signature system as set out in Apocalypse 6. People were dubious about this, with Damian saying "I don't think that allowing 20 different types of assignment in the parameter list of a subroutine actually helps at all." I'm not sure Michael is convinced yet.

http://groups.google.com/groups

P6ML?

Michael Lazzaro asked if anyone was actually working on P6ML (a project name that popped up last week in the 'XML is too hard for Programmers' thread) and if there was any idea of what such a project would entail. Robin Berjon was unsure of the wisdom of such a project, arguing that supporting a tool that wasn't strict about the XML it parsed would be a retrograde step and giving reasons as to why. However, he did think that creating a toolset for 'recuperating data from a quasi-XML [document] (aka tag soup)' would be interesting and useful, and he proposed a couple of approaches.

It's apparent, from reading this thread, that people who don't like the current generation of XML tools really don't like them at all: adjectives such as 'craptacular' and phrases like 'festering pile of steaming camel turds' were bandied about. Then there's the 'Perl has lots of ways of doing XML, which is great because you can pick the one that suits you' camp, and the 'Perl has lots of ways of doing XML, which is terrible because you have to pick the one that suits you and that takes time' camp.

Leon Brocard pointed out that, whilst Perl 6 might have nicer syntax and a faster parsing engine, there was nothing to stop people working out, and implementing, the required semantics in Perl 5 right now. There was a fair amount of muttering that, however desirable or otherwise P6ML may be, there wasn't really much need to be discussing it on a language design list (as if that could stop anything).

Dan Sugalski caught everyone out by raving about the idea: "the thought of altering Perl 6's grammar to make it a functional language is sheer genius [...] I say we start right away!". The only catch is that Dan was talking about ML, the functional programming language, and in the process he bent the needle on Austin Hastings' Sarcasmeter. But he promised to fix any such devices at OSCON if their owners would bring them to him there. So that's all right then.

http://groups.google.com/groups

Summary writer goes all red and glowy.

Matthijs van Duin pointed out that Piers had misspelled him as 'Mattijs' 4 times in last week's summary.

I am really sorry and I shall endeavour never to do it again.

Support for attributed DAGs

In a thread deceptively named 'Perl and *ML', Dan opined that XML would be so much easier to support if Perl had good support for attributed DAGs (that's Directed Acyclic Graphs -- tree-like structures in which a node may be shared by more than one parent), and noted that having such support would be good for things like Abstract Syntax Trees too. Robin Berjon wasn't so sure, pointing out that, whilst fast and efficient graph support would be really useful, acyclic graphs weren't a great fit for XML, since practical XML representations usually have back links to parent and sibling nodes (and that's before you take linking into account). I have the feeling that further discussion of graph support probably belongs on the internals list for the time being, but I could well be wrong.

http://groups.google.com/groups

Acknowledgements, Announcements and Apologies

I've already apologized to Matthijs "Mattijs" van Duin for misspelling his name, but I'll do it again. I'm really, really sorry.

If you've appreciated this summary, please consider one or more of the following options:

This week's summary was again sponsored by Darren Duncan. Thanks Darren. If you'd like to become a summary sponsor, drop me a line at p6summarizer@bofh.org.uk.

Five Tips for .NET Programming in Perl

Randy Ray is coauthor of Programming Web Services with Perl

Abstract: One of the most common categories of questions on the SOAP::Lite mailing list is how to get Perl SOAP applications to work with .NET services. It's not that Perl and SOAP::Lite are not suited to the job, but rather that there are easy traps to fall into. Add to that the fact that .NET has its own distinct philosophy toward applications, and the confusion is understandable. This article will cover some of the most common traps and considerations that trip up Perl developers.

The .NET Attraction

When Microsoft first announced its .NET initiative and the variety of technologies that would be created to support it, it was met with some skepticism. Reactions ranged from "they're at it again" to "this could potentially be really powerful." Right now, the reality is sitting somewhere in between, but it is gradually moving from the realm of "just another Microsoft gimmick" to widespread acceptance. Whatever else the .NET concept accomplishes, it is already bringing Web services to the general desktop arena.

One of the limiting factors to the larger acceptance of .NET has been the limited set of fully supported languages. Microsoft promotes its C# language, while also providing .NET-enabled development tools and environments for the other languages that its Visual Studio product supports--Java, C++, and Visual Basic. Because .NET is based on several published standards, other tools that are not generally .NET-centric are still useful, and provide alternatives to the Microsoft tools for some languages, primarily Java.

The main XML concepts to keep in mind when dealing with a .NET service are XML Schema and WSDL, the Web Services Description Language. A .NET service automatically generates a WSDL description of itself as a part of the tool environment that Microsoft provides. This is a powerful feature, and the key to interoperability with other languages. WSDL itself defers the definition of complex datatypes to the XML Schema application, which describes both document structure and the nature of the content itself.

Unfortunately, Perl has been largely overlooked in the tools arena where .NET is concerned. Even though the purpose of basing .NET on open standards was to enable wider integration with other tools, systems, and languages, the only .NET product for Perl currently available is a Visual Studio plug-in for Perl that runs on only Microsoft-based platforms. As the number of .NET services grows, so will the desire to access these services from Perl clients, often on platforms that don't support Visual Studio. The key to doing this lies in the fact that .NET services natively support SOAP as an interface, through the WSDL descriptions.

The Tips & Tricks

The rest of this article offers five tips to help with writing these clients in Perl, using the SOAP::Lite toolkit.

  1. Identify and Qualify

    The first, and most common, mistake that Perl clients make is to forget that .NET interfaces are strongly typed and use named arguments. Perl does neither of these things by default. In fact, when dealing with a SOAP service that is written in Perl, most of the time a client doesn't have to worry about things such as parameter name or type. A .NET service is strict about the names and types (and namespace URIs) of the arguments.

    For example, imagine a service that provides the current time, possibly adjusted for time zone. The call, getCurrentTime, may be called with no arguments for the time in UTC (Coordinated Universal Time), or it may be passed a single argument, zone, the specified time zone for which the time should be adjusted. It expects this argument to be of type string (using the definition of that basic type from the XML Schema specification). But simply passing the argument to a call won't get the name right. Instead, SOAP::Lite will create a generic name for the parameter when it creates the actual XML, and .NET will reject it.

    To get around this, use the SOAP::Data class that is a part of SOAP::Lite (it is in the same module, so you don't have to load a second library):

    
     $arg = SOAP::Data->new(name => 'zone', value => 'PST');
        

    SOAP::Lite will properly identify values like "PST" (Pacific Standard Time) or "CDT" (Central Daylight Time) as being strings. But what if the interface also accepts numerical offsets like "-0800" or "+1100"? Without explicit casting as strings, those values would be encoded as int values. And the service would reject them.

    The SOAP::Data class covers this as well:

    
     $arg = SOAP::Data->new(name => 'zone', value => 'PST', type => 'xsi:string');
        

    (The xsi: prefix on the type refers to the XML Schema Instance namespace, which SOAP::Lite always defines and associates with that specific identifier.)

    The SOAP::Data class provides methods for all aspects of the data item: name, value, type, and so on. It also provides uri to specify the namespace URI (when needed), and encodingStyle to specify the URI that identifies an alternative encoding from that being used in the rest of the request.
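
    Putting tip one together, here is a minimal client sketch; the endpoint and namespace URLs are hypothetical, following the getCurrentTime example above:

     use SOAP::Lite;

     my $soap = SOAP::Lite
         ->uri('http://www.example.com/TimeService')          # service namespace
         ->proxy('http://www.example.com/TimeService.asmx');  # endpoint URL

     # A named, explicitly typed argument, as the .NET service demands:
     my $zone = SOAP::Data->new(name  => 'zone',
                                value => '-0800',
                                type  => 'xsi:string');

     # .NET services can also be picky about the SOAPAction header; see
     # on_action() in the SOAP::Lite documentation if calls are rejected.
     my $time = $soap->getCurrentTime($zone)->result;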

  2. Be Careful About Namespaces

    The XML namespaces used on elements are just as important in a .NET service as the name and type are. SOAP relies on namespaces to distinguish parts from each other, but again the relaxed nature of Perl can mean that Perl-based services lure you into a false sense of ease that .NET doesn't share.

    Unfortunately, SOAP::Lite makes this harder for .NET applications by defaulting to no namespace for elements when none is explicitly given. Luckily, the SOAP::Data class includes a method for defining the namespace as well:

    
     $arg->uri('http://use.perl.org/Slash/Journal/SOAP');
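
    The namespace can also be supplied when the object is constructed, since SOAP::Data->new accepts uri alongside name, value, and type (reusing the example namespace above):

     $arg = SOAP::Data->new(name => 'zone', value => 'PST',
                            uri  => 'http://use.perl.org/Slash/Journal/SOAP');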
        

    Explicitly providing the namespace gets even more important when encoding complex types such as hashes and arrays.

  3. Use Classes and Objects to Control Encoding

    The default methods that SOAP::Lite uses to encode arrays and hash tables will not produce the style of XML that a .NET service is expecting. SOAP::Lite will base things on the plainest, most vanilla-type descriptions in the SOAP specification itself, while .NET uses complex types as defined in XML Schema for elements that are not basic data.

    Suppose a .NET interface defines a complex type called CodeModuleDef. This type has (for now) three simple elements: the name of the module, the language it is in, and the lines of code. Now imagine the remote method registerModule expects one such object as an argument. This won't work:

    
     $soap->registerModule(SOAP::Data->name('module')->uri('http://...')->
                           value({ name => 'SOAP::Lite', language => 'perl',
                                   lines => 4973 }));
        

    The receiving server will get an element typed as "SOAPStruct" in the default namespace for unknown types (http://xml.apache.org/xml-soap). While setting the name and URI was OK, the type will still stop things dead in their tracks. Instead, do this:

    
     $soap->registerModule(SOAP::Data->name('module')->uri('http://...')->
                           value(bless { name => 'SOAP::Lite',
                                         language => 'perl',
                                         lines => 4973 }, 'CodeModuleDef'));
        

    The resulting structure will have the correct type attribute set. The same approach can be used for array references and scalars that are types other than the basic types.

  4. Oops ... Watch Out for Element Order and Namespaces

    This actually won't quite bridge the gap all the way. There are two problems with it. One is a general Perl feature; the other may be a bug in SOAP::Lite.

    The first problem is that XML Schema generally requires the elements of a complex type to be in a well-defined order (there's one type of compound structure that doesn't require this, but it isn't commonly used). Perl, by nature, doesn't preserve a specific order to hash keys. You can use the Tie::IxHash module, if you know you will always insert the keys in the correct order. Or, you can provide your own serialization code for SOAP::Lite to use. As it happens, this will allow you to fix the second problem, too.

    The second problem stems from the fact that SOAP::Lite assigns the correct namespace to the outer element of the structure, but not to the inner elements. In a schema-based type, all the elements must be in the same namespace. This may be a bug in SOAP::Lite, but that hasn't been determined for certain (it seems like an unusual feature). The inner elements are in fact given no namespace at all.
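
    For the ordering problem alone, the Tie::IxHash route looks like this (a sketch; it assumes you insert the keys in the schema-defined order):

     use Tie::IxHash;

     # An IxHash remembers insertion order, so the serializer will see
     # the keys in exactly the order they were assigned here.
     tie my %module, 'Tie::IxHash';
     %module = (name => 'SOAP::Lite', language => 'perl', lines => 4973);

    That still leaves the namespace problem, which is what the custom serialization routine below addresses.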

    What is needed is a function that can be given to the serializer to use when it encounters an object of any of the special classes. This routine won't have to actually produce XML; SOAP::Lite will still take care of that. It will only have to produce a structure that the serializer will understand from the object it is given. A filter, in other words. To do that, two things are needed: an explicit declaration of the field order, and an understanding of how the serializer expects to call this filter and what it expects to be returned by it.

    The explicit ordering is simple; it can be an array declared in the class namespace, or a "private" key on the hash itself. Unlocking the second piece took some digging around into the internals of the SOAP::Lite source code. Connecting the two took even more digging.

    For the example approach below, assume that any class that will use this generic serialization filter defines an array called @FIELDS as a package-global value. Warning: This approach may be a little hard to wrap one's brain around at first. We'll explain it after the code:

    
     sub serialize_complex_type {
       my ($soap, $obj, $name, $type, $attr) = @_;

       # Fetch the schema-defined element order from the class's
       # package-global @FIELDS array, via a symbolic reference.
       my @fields = do { no strict 'refs'; @{ref($obj) . '::FIELDS'} };

       # If the name carries a namespace label, strip the label and
       # promote its URI to a plain xmlns attribute, so that it applies
       # to the child elements as well.
       if ($name =~ /^(\w+):(\w+)$/) {
         $name = $2;
         $attr->{xmlns} = $attr->{"xmlns:$1"};
         delete $attr->{"xmlns:$1"};
       } else {
         $attr->{xmlns} = $attr->{uri};
       }

       # Return [ name, attributes, content ]: each defined field is
       # encoded recursively, in @FIELDS order, as nested content.
       [ $name, $attr,
         [ map { defined($obj->{$_}) ?
                 $soap->encode_object($obj->{$_}, $_) : () }
           @fields ] ];
     }
        

    Additionally, every package that plans to use it to serialize its objects will have to make it visible to the SOAP::Serializer package:

    
     *SOAP::Serializer::as_Some__Class = \&main::serialize_complex_type;
        

    Here, we assumed a class name of Some::Class, and that the earlier code was declared in the main namespace. So what does it all do?

    The serializing routine is called by an object derived from SOAP::Serializer (or a subclass of it), as a method with the object to serialize, its name, its type, and the hash table of attributes as arguments. It expects to get back an array reference with three or four values. The first is the tag name to use. The second is the hash reference of attributes to add to the opening tag, and the third is either an array reference of nested content, or a simple scalar value to be put inside the opening and closing tag pair. The fourth element, should you include it, is a value for an "ID" attribute. This is used to uniquely identify an element, for the sake of multiple references and such. We don't worry about it here.

    So we take the arguments as they are given, and the first thing the routine does is check for a name that includes a namespace. Since the style of SOAP::Lite will leave any child elements unqualified, the namespace label is stripped and the URI itself is assigned to the simple xmlns attribute. This will make it apply to any child elements of this object. If the object contains other objects as children, then they too will be run through this serializer, so they will have the chance to declare a namespace in the same way. If there was no label, then we take the value of the uri attribute, which would have been set by calling the SOAP::Data method of the same name. Finally, we build the array reference using the (possibly modified) $name, the modified attribute hash reference and an array reference of the object's elements, in field order.

    We got the field order by using a symbolic reference to the @FIELDS array for the object's class. Much the same sort of trick is needed to get the SOAP::Serializer class to use this code. When handed a blessed object, the serializer takes the class name, changes all "::" to "__", and prepends "as_" to the result. It then looks for a method by that name. It searches using the serialization object itself, so the method has to be visible to that class. Right now, there isn't a way in SOAP::Lite to do this more directly. To hook this serialization into place, we directly alias "as_Some__Class" into the needed package, as shown above.
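
    Putting the pieces together, a class prepared for this filter might look like the following sketch (the field names are illustrative):

     package Some::Class;

     # Schema-defined element order, consumed by serialize_complex_type.
     our @FIELDS = qw(name language lines);

     sub new {
         my ($class, %args) = @_;
         return bless { %args }, $class;
     }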

    Note that the recursive encoding of the object's contents (which may be objects themselves) is handled by a SOAP::Serializer method called "encode_object." This is an undocumented part of the serializer, and examining it provided much of the logic for the code snippet above. The above code (in a slightly different form) was used in writing a complex client for the MapPoint.NET service, one of Microsoft's commercial .NET services.

  5. Play with stubmaker.pl and SOAPsh.pl

    For the last tip, a simpler piece of advice. The SOAP::Lite package comes with a number of utility scripts. Among these are a "shell" for testing SOAP services, called SOAPsh.pl, and a code-generation tool for generating classes from WSDL description files, called stubmaker.pl.

    As was mentioned earlier, WSDL plays a major role in the definition and the documentation of .NET services. The stubmaker.pl tool tries to create Perl classes based on a WSDL document. It does a fairly good job, but its support for XML Schema is lacking. If a service uses any nonbasic types, the template that stubmaker.pl generates will not handle them as well.

    This should not prevent you from using the tool. It does a lot of the heavy lifting by extracting the proper namespace URIs, actual server URLs, and remote method names. Even if the code template itself cannot be used directly, it can still save you work.
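
    For example (the WSDL and endpoint URLs here are made up, and the exact arguments may vary between SOAP::Lite versions):

     # Generate a stub class from a service's WSDL description:
     perl stubmaker.pl http://www.example.com/TimeService.asmx?WSDL

     # Poke at a service interactively; SOAPsh.pl takes the endpoint
     # and the service namespace URI:
     perl SOAPsh.pl http://www.example.com/TimeService.asmx \
                    http://www.example.com/TimeService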

Summary

There is much more to properly dealing with .NET services than can be addressed in a single article. The goal here is to head off some of the more frequently encountered problems, and let you as the developer focus on more important issues.

One of the biggest drawbacks to using Perl for .NET is that there is only limited support at present for WSDL and XML Schema. Some new CPAN modules are working to fill these gaps, but they are in early stages of functionality. For now, it is still necessary to do some of the core steps manually. This situation should improve over time.

As a last "bonus" tip, remember this: Much of the advice here is just as important for writing .NET services as it is for writing .NET clients. For the next Web service you write, consider the additional client base you would have access to if you wrote it to be compatible with .NET.


O'Reilly & Associates recently released (December 2002) Programming Web Services with Perl.

This week on Perl 6, week ending 2003-03-23

Assuming I can tear myself away from stroking the cat who has just magically appeared on my chest and is even now trying to wipe his dags on my nose, welcome one and all to another Perl 6 summary, which should go a lot quicker now that Sully has moved off and let me see the screen. He'll be back though. But before that happens we'll start with perl6-internals.

Speed question: save/restore vs push/pop

Benjamin Goldberg wondered how many saves it takes to be slower than one pushx. Steve Fink pointed out that save and pushx are not interchangeable as they push things onto totally different stacks (save pushes onto the generic user stack; pushx pushes onto the type-specific register frame stacks). Leo Tötsch ignored this difference (which Ben was apparently aware of) and did some tests which showed that, on his local architecture and core, 2 saves were slower than one push. On Dan's machine, it took 3 saves to be slower than one push. Dan also pointed out that, on some architectures, there's a subtle win with saves because a push would dirty the L1 and L2 caches while the save wouldn't. (On SPARC at least, apparently).

http://groups.google.com/groups

Baby Steps in porting IO to the PIO architecture

Last week, Leon Brocard posted his implementation of the uniq(1) Unix utility in PASM. This week, Jürgen Bömmels' ongoing project of moving Parrot's IO operators from the STDIO library to Parrot's own PIO library (which is asynchronous and, by all reports, a combination of lovely and scary) broke this, because PIO didn't have any read-buffering capabilities, which meant that the readline op didn't work.

On Monday, Jürgen had write buffering working, which Dan applied with alacrity while making noises about needing to finish up the documentation for Parrot's asynchronous IO system. There was some debate on whether a PIO flush should force a write to the disk, or simply ensure that all the buffers were flushed to the OS. Nicholas Clark discussed some issues he'd had with Perl 5.8's PerlIO system, and the need for two different kinds of flush, or more succinctly, the need to distinguish 'flush' from 'sync'. Dan was convinced by this argument and Steve Fink seconded the decision.

Then, on Sunday, Jürgen released a patch which got read buffering working. Dan had a few problems with it and the patch hadn't been applied by the end of the week. I'm sure it'll go in like a greased weasel once that's been fixed though.

http://groups.google.com/groups -- Jürgen adds write buffering

http://groups.google.com/groups -- Nicholas discusses different types of flush

http://groups.google.com/groups -- Jürgen adds read buffering

PBC Object Files

Michael Collins wondered if it might be a good idea to come up with a parrot 'object file' specification. I liked Michael's description of an object file as 'a file that contains compiled instructions plus a section with a table that maps subroutine/constant labels to their corresponding byte offsets within the file.' Leo Tötsch's answer was 'not yet'. Apparently the basic functionality is there but there are some cleanups and a couple of opcodes needed.

http://groups.google.com/groups

Parrot binaries

Ask Bjørn Hansen posted to say that he'd been trying to make an RPM .spec file for parrot and had noted the lack of a make install target. He also wondered about the permissions of various files in the distribution, which seem somewhat cockeyed. Steve Fink posted a patch which covered some of this (or at least the make install and RPM issues) with tools which would generate RPMs, and asked for comments on his work. Leo Tötsch liked it and offered a couple of suggestions.

http://groups.google.com/groups

http://groups.google.com/groups -- Steve's working patch

Some comments on PDD 14 Big Numbers

Mark Biggar, who implemented the original Big* packages for Perl, had some suggestions about Parrot's big number handling based on PDD 14. Benjamin Goldberg wondered if it would be a good idea to design the BigInt type so that the maths engine could be swapped for a different one, letting users pick whichever engine they prefer as a configuration option and letting new maths engines be added without wholesale changes.

http://groups.google.com/groups

99 bottles of beer on the wall

Douglas Hunter appears to have noticed that, up until now, Parrot had lacked a vital program -- one which would output the lyrics to that well known song, 99 Bottles of Beer on the Wall. So, being a community spirited kind of chap, he implemented just that and posted it to the list. A few people quibbled about his style (unnecessary labels, that sort of thing); Leo Tötsch pointed out, with the aid of some IMCC output, that the optimizer caught those issues anyway, and added the code to the Parrot distribution's examples/ directory. Well done Douglas, it's good to see Parrot being used for essential applications like this. I'm just surprised nobody has yet benchmarked the Parrot implementation against Perl and Python. This may just be the test application we need for the Parrot/Python speed challenge.

http://groups.google.com/groups

BASIC, IMCC and Windows

Clinton A Pierce has been having a hard time making good on his offer to produce a milestone Windows binary for distribution. He's having a hard time building IMCC because it seems to want headers that MSVC++ isn't happy with. Dan noted that this needs to be fixed before the next release of Parrot (which will be either 0.1.0 or 0.0.11) because he wants to get rid of the IMCC/parrot difference and just have a single binary. Leo Tötsch and Nicholas Clark offered a few suggestions about workarounds, but it looks like there's still no Windows distribution for Parrot.

http://groups.google.com/groups

Iterator Proof of Concept

Following the deafening response to his Iterator proposal of last week, Leo Tötsch agreed with me that it must mean that everyone liked the idea, posted a proof of concept patch, and invited comments. Nobody has yet done so.

http://groups.google.com/groups

vtable_1 - accessor macros

Leo Tötsch has started work on hiding vtable functions and posted his first patch. He commented that the new accessor macros seem to make for more readable source files. He asked if people were okay with the macro, as the next step would be to start using the macros all over the parrot source. Nicholas Clark liked the basic idea, but worried about documentation, and about the possibility of scary macro layering a la Perl 5. He also wondered if it would be possible to conditionally write them as inline functions, which would make debugging easier. Leo didn't see much point in making this particular set of macros into inline functions as they were simple translations with no complex functionality.

http://groups.google.com/groups

Parrot 0.0.10 released

Parrot version 0.0.10, codename 'Juice' was released on Wednesday. Steve Fink rounded up the usual list of improvements and generally did a good job of letting the world know about the new Parrot release. If you want the details, follow the link.

http://groups.google.com/groups

Meanwhile, over in perl6-language

perl6-language was again the busier list this week. The fallout from Apocalypse 6 continues both with clarifications of what was in that document and some suggestions inspired by it.

is static?

Towards the end of last week, Uri Guttman had asked about a possible 'is static' variable trait, and the discussion carried on into this week. Arcadi Shehter suggested using has to introduce a static variable:

    sub foo () {
        has $s //= 0;
        $s++;
    }

Larry described this as a very interesting idea, but was concerned that this might be akin to overloading the meaning of has in the same way that C overloaded the meaning of static. Damian also worried that using 'has' in this way would mean stretching 'the concept of objects a little too far'. He also thought that using //= to initialize such a variable felt wrong, and that maybe we needed a third assignment operator -- 'assign unless exists' -- to go with 'assign unless true' (||=) and 'assign unless defined' (//=); he suggested ??= as a possibility (though he didn't profess to actually like it). Damian also thought that such functionality should really be introduced with a trait and suggested a list of possible trait names. Uri was unconvinced by any of Damian's alternatives and proposed deciding the name with a duel on the beach at Boca Raton.

Larry decided that using a trait to denote a 'static' variable would be the wrong way to go, because such a variable has a different kind of scope from a my variable, and said that whatever declaration was eventually settled on would introduce a lexically scoped alias to a property of the current block. Larry later suggested state $s = 0 because 'a static var makes a subroutine stateful'. Uri wasn't sure, but Damian liked it.

Arcadi Shehter wondered about scoping of lexically scoped subs and if they could really be thought of as lexically scoped at all, whereupon he and Matthijs van Duin got a bit tangled up with the various kinds of scoping available, but there's been no definitive comment from Damian or Larry about the correct behaviour.

http://groups.google.com/groups

On the place of Scalar in any class hierarchy

During the discussion last week about strictness and type coercion on function calls, some confusion arose about whether Int isa Scalar, or Scalar isa Int|String|Float|.... David Whipp proved himself a hero by posting on this subject, pointing out that Scalar is the type of the variable, not of the value (I'd probably replace variable with 'container', but I'm picky like that) and explained how this solved some of the problems people had been having. There was no comment from Larry or Damian about this post, but I thought it was wonderful.

http://groups.google.com/groups

A6 questions

Dave Storrs posted a grab bag of questions about Apocalypse 6 and got a grab bag of answers.

http://groups.google.com/groups

Apoc 5 - some issues

Matthijs van Duin had a pile of questions and issues with Apocalypse 5. Some of the questions were Hard. Only Luke Palmer dared answer them. He and Matthijs batted things back and forth a few times, but it looks like Matthijs still had some issues with backtracking into and over closures.

http://groups.google.com/groups

``XML is Too Hard for Programmers''

Rich Morin pointed everyone at Tim Bray's article about XML being hard to deal with in most programming languages. Robin Berjon chipped in with a pointer to an xml-dev thread on the same subject. There was a certain amount of muttering about companies that use formats that seem to walk like XML but fail to quack like XML (the initials M & S were used, and I don't think they were referring to Marks and Spencer). Michael Lazzaro made an impassioned plea for ensuring that Perl 6 allows easy, fast parsing of XML-like things out of the box. Austin Hastings suggested that Michael should take command of P6ML. Dan pointed to a basic Parrot XML parser in the Parrot examples directory that was at least four times faster than the equivalent Perl 5 code that it's a line-for-line translation of, and noted that the performance numbers were old.

http://groups.google.com/groups

http://www.tbray.org/ongoing/When/200x/2003/03/16/XML-Prog -- Tim Bray's article

http://lists.xml.org/archives/xml-dev/200303/msg00536.html -- xml-dev thread

Rules and hypotheticals: continuations versus callbacks

Remember what I said about Matthijs van Duin still having some issues with backtracking? Well, he kicked off a whole new thread just to discuss it and two possible methods for implementing the grammar system (labelled the 'backtracking continuation' and 'callback' methods). Matthijs would like to see the backtracking continuation method blessed as the right way. (He discusses all this here because the choice of implementation could well have language level implications). Luke Palmer was concerned that both of Matthijs's proposed implementations would be very slow and had a couple of other possible implementation approaches (the 'success continuation' and 'backtrack exception' methods). Matthijs reckoned that the success continuation approach was pretty much the same as his 'callback' method, and that the 'backtrack exception' method seemed to have problems with rules like <foo> <bar>. And then it all got very complicated as Dan and Matthijs went back and forth at the issue with occasional interjections from others where appropriate. Then someone mentioned threads...

Dan and Matthijs seem to have very different sets of assumptions which leads one to suspect that they're arguing past each other on occasion. I certainly found myself wishing there was somewhere convenient with an ample supply of chalkboards, chalk, index cards and Sharpies and other high bandwidth communication aids where they could go and come to some kind of understanding.

At one point Austin Hastings observed that 'when anyone says ``I don't see why anyone would ...'', Damian immediately posts an example of why. Unless it's Damian, in which case Simon, Larry, or Dan usually counterpost.' Hey, it made me smile.

I think Larry's only contribution to this thread bears repeating in its entirety:

I would like to express my sincere gratitude to all of you for working through these issues. I bent my brain on the Perl 5 regex engine, and that was just a ``simple'' recurse-on-success engine--and I'm not the only person it drove mad. I deeply appreciate that Perl 6's regex engine may drive you even madder. But such sacrifices are at the heart of why people love Perl. Thanks!

http://groups.google.com/groups

Questions to Help Program Perl6::Parameters

Brent Dax is working on a Perl6::Parameters source filter for Perl 5, a task he describes as 'damn hard'. He had a couple of questions about argument behaviour. Damian had the answers, and made noises about his forthcoming module, Perl6::Rules. Simon muttered something about Parse::FastDescent (but everyone passed over that) and pointed out Matthijs van Duin's work in progress.

Larry thanked Brent for taking on the task and 'suffering vicarious pain for the user' and mentioned that he was interested in feedback on spots where the design impacts performance unduly. Larry went on to discuss possible changes to the scoping of $_ to help with the transition from Perl 5 to Perl 6. He and Damian proceeded to discuss other entertaining syntax possibilities to do with want, where and when. Nobody has yet proposed meaningful semantics for what, who and why, but I'm sure somebody will get 'round to it.

http://groups.google.com/groups

http://www.liacs.nl/~mavduin/P6P5_0.00_01.tar.gz -- Matthijs' work in progress

.req or .arity?

Austin Hastings asked that Routine's .req method be renamed .arity. Damian rather liked the idea. Larry wasn't so sure because .arity is somewhat opaque to non-mathematicians, but he accepted that one could argue that anyone who doesn't know what arity means shouldn't be writing code that depends on it. Steffen Müller thought that either .req or .reqargs would be a better name for the method.

For those who don't know, 'arity' is the number of arguments a function or operator takes.

http://groups.google.com/groups

Acknowledgements, Announcements and Apologies

And another summary rolls towards its close. I'd like to echo Larry's thanks to everyone who's working on scary magic like the Perl 6 rules engine and Perl6::Parameters. The way people like Brent, Matthijs, Leo and Luke have stepped up to the plate for this undeniably Hard Stuff is, frankly, inspirational. It's people like you who keep me bullish about Perl 6.

Thanks too to everyone else involved in either list; Michael Lazzaro made the point that often the most productive way for perl6-language to proceed is for an initial suggestion to be thrown out onto the list, then everyone has a good long wibble about it going off in about 3000 different directions at once before someone (usually one of Damian, Larry or Allison but not necessarily) pulls it all together into something that makes sense and we move on to the next item. My gut tells me that without all the wibbling the end result wouldn't be quite so satisfying.

For instance, much as I hated trying to summarize the everlasting pipeline thread, the end results of that discussion are the rather lovely ==> and <== operators that appeared in the latest Apocalypse. (Of course, Larry or Damian will probably respond to this summary by telling me that actually, they'd been planning something like that all along. I just won't necessarily believe them)

If you appreciated this summary, please consider one or more of the following options:

This week's summary was again sponsored by Darren Duncan. Thanks Darren. If you'd like to become a summary sponsor, drop me a line at p6summarizer@bofh.org.uk.

For Perl Programmers: only

Have you ever wished that you could have more than one version of a Perl module installed on your system, and that you could easily tell Perl exactly which one you wanted to load? Perhaps you have some legacy programs that only run with older versions of certain modules, while the rest of your system is coded to the more-modern versions of the same module. Or maybe you administer a multiuser system, where your various users request different module versions.

Perl's built-in mechanism for module version control is very rudimentary. Basically, you can say:


    use XML::Parser 2.27;

The use command will load XML::Parser and then proceed to die unless $XML::Parser::VERSION is greater than or equal to 2.27. There is no way to ask for one specific version or a range of versions. Most of the time this is OK. It's OK if you assume that successive versions of modules always improve, never drop functionality and are fully backward compatible. Unfortunately, in real life, this isn't always true.
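
The minimum-version check is just the VERSION class method in disguise; the use statement above behaves roughly like this:


    BEGIN {
        require XML::Parser;
        XML::Parser->VERSION(2.27);  # dies unless $XML::Parser::VERSION >= 2.27
        XML::Parser->import;
    }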

Take Lincoln Stein's excellent GD module, for example. GD.pm is a toolset for creating graphic images from Perl programs. Up to version 1.19, GD supported the GIF format. Due to GIF's licensing restrictions, the author was forced to retract that support. GIF support was replaced by the PNG format, and later JPEG support was added. If you are required to use GD for both GIF and PNG format types, then you need to do some monkeying around with library paths to get it to work.

With big projects in restrictive environments, dealing with module versioning issues can quickly become a tangled briar patch. When your needs for using specific modules become too thorny, you need a sharper chainsaw. This article describes such a saw.

The Only Way

Introducing only.pm! only is a full-featured system for installing and loading multiple versions of Perl modules.

To make sure that your program loads version 1.19 of GD, simply say:


    use only GD => 1.19;

If you know that any of the GD versions from 2.01 up to 2.06 are OK, then say:


    use only GD => '2.01-2.06';

If you also want to import GD::foo and GD::bar, then:


    use only GD => '2.01-2.06', qw(foo bar);

only acts as an extension to Perl's use command. It intercepts the parameters that you would normally pass to use along with an acceptable version range. It takes all of that information and does a "heap of bit-wrangling" to make sure that you get the version of the module you wanted. In every other respect, it tries to act like a regular use statement.

Easy Installations

How do you go about installing several versions of Perl modules? Perl is really only designed to support the installation of a single module version. Whenever you install a new version of a module, it simply overwrites the older version. This is known as upgrading. Usually that's what you want, but in the context of only, you actually want to have multiple versions installed simultaneously.

Advanced Perl programmers know that it is possible to install modules into different directories. You can supply a PREFIX= parameter to the make install process. But then you have to remember where you installed the module, and manually adjust @INC to have the right sequence of paths. That's just no fun.
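
Here is roughly what that manual dance looks like (the paths are hypothetical):


    # Install an old GD somewhere private:
    perl Makefile.PL PREFIX=/home/ingy/gd-1.19
    make && make install

    # ...then, in every program that needs that version:
    use lib '/home/ingy/gd-1.19/lib/site_perl';
    use GD;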

Fortunately, only makes it extremely simple to install multiple versions. To start with, just go about building your module in the usual way:


    tar xvzf Cat-Fancy-0.09.tar.gz
    cd Cat-Fancy-0.09
    perl Makefile.PL
    make test

Normally, the next step is make install. But if you want to store version 0.09 separate from any other version of Cat::Fancy, do this:


    perl -Monly=install

This actually performs the same procedure as make install would, except that it automatically determines a safe place to stick your module -- one that only.pm can find when you say:


    use only Cat::Fancy => 0.09;

Note: If you need authorized access to install modules on your system, then you can do:


    sudo perl -Monly=install

But be careful. The perl used by sudo may not be the same perl as the one used by make. If this is the case, then you'll need to specify the full path to perl.

Where Is My Only Module?

You may wonder, "Where exactly do these modules get installed and will they conflict with my existing modules?". The short answer is, "They get installed exactly where you told them to!".

When you install the only module, you are asked to select a base directory where only::install can install modules. The default value is a modification of your Perl's sitelib directory. For instance, if your sitelib was:


    /usr/local/perl580/lib/sitelib/5.8.0

then only.pm would default to:


    /usr/local/perl580/lib/version/5.8.0

Even though this is the default, you are prompted to select any directory you want. Your choice is saved permanently in only::config. The constant only::config::versionlib becomes the base directory where only will install and search for versioned modules.

If you really need to, then you can override this directory as well. For installation, you would say:


    perl -Monly=install - versionlib=/home/ingy/modules

The versionlib is just the base directory. only separates various module versions by sticking them into a subdirectory named after the version. So Cat::Fancy would be installed as:


    /usr/local/perl580/lib/version/5.8.0/0.09/Cat/Fancy.pm

Module::Build

Many new Perl modules are using Module::Build for their build process, instead of the age-old Makefile.PL and ExtUtils::MakeMaker. Module::Build is a wonderfully organized and extensible replacement for its stalwart yet terribly crufty predecessor.

One of the side benefits of module distributions that use Module::Build is that you can do version-specific installations natively by saying:


    perl Build.PL
    ./Build versioninstall

Although this is just a pass-through call to only::install, it does not suffer from the aforementioned "sudo" problem.

only The::Facts

Back to only.pm. In this section, I'll discuss all the gory details of the use only syntax. It really isn't that bad. The basic form is:


    use only MODULE => CONDITION;

where MODULE is a module name, and CONDITION is either a simple version number or a more complex version specification. More on version conditions in a moment. If you have arguments to pass to the module, then simply tack them on to the end:


    use only MODULE => CONDITION, ARGUMENTS;

only even accounts for situations where different versions need different arguments. You match up the conditions and arguments as a set of anonymous arrays:


    use only MODULE =>
        [ CONDITION1, ARGUMENTS1 ],
        ...,
        [ CONDITION7, ARGUMENTS7 ];

Finally, there are some special options that you can pass to only to tell it how to behave. This is accomplished syntactically by passing an anonymous hash of option/value pairs. Put the options hash directly after use only:


    use only { OPTION => VALUE },
        MODULE => CONDITION;

If you want to set the options globally (for all subsequent only interaction), then just specify the options without any other arguments. For example, to override the versionlib option for all use only ... statements, say:


    use only { versionlib => '/home/ingy/modules' };

Hazardous Conditions

Even though a "version condition" can be as simple as a single version number like '0.42', only offers flexible syntax for expressing exactly which versions you are (and aren't) interested in.

If you want to specify a list of versions, then just use a space-separated enumeration:


    use only Bird::Talk => '0.42 0.44 0.47 0.50';

If your version requirements fall into a range, then you can specify those, too. Just use two versions, separated by a dash:


    use only Bird::Talk => '0.42-0.50';

Of course, you can list multiple ranges as well. And if you leave one of the versions off the range, then that means the range is open-ended.


    use only Bird::Talk => '0.42-0.50 0.55-0.62 0.67-';

Sometimes it's easier to just specify the versions you don't want. Using a '!' in front of either a range or a single version negates its meaning. To avoid all versions of Bird::Talk below 0.42, and also the extremely buggy version 0.53, say:


    use only Bird::Talk => '!-0.41 !0.53';

When more than one eligible version of Bird::Talk is installed on your system, only always chooses the highest version. If you don't specify any version, then that is an indication to choose the highest-numbered version. This is different than saying:


    use Bird::Talk;

because only will search your versionlib first.

For Argument's Sake

There's not much to say about passing arguments. Just pass them in the same way you would on a normal use statement. This should even work for modules like Inline.pm where the arguments are not import lists:


    use only Inline => 0.44, 'Java';

There is one exception. In Perl, when you say something like:


    use Dog::Walk ();

That's a cue to not call the module's import method at all. In other words, it ensures that no functions will be exported into your namespace. But if you say:


    use only Dog::Walk => 1.00, ();

then you will not get the same effect. Unfortunately, there is no way for only.pm to detect that you called it that way. As a workaround, only lets you say:


    use only Dog::Walk => 1.00, [];

This has a similar visual appearance and is meant as a mnemonic. (Hopefully, there aren't a whole lot of modules in the world where it is important to pass in a single empty array ref :)

The Only Options

The only (no pun intended) option currently implemented for only is versionlib. This option allows you to override the system versionlib stored in only::config.


    use only { versionlib => '/home/ingy/modules' },
        Corn::Stalk => 0.99, [];

Friends and Family

One important duty of only is to ensure that when you load a specific version of some module, all of that module's related modules are also loaded from the same version level. This is tricky, because in Perl, you never know when a module is going to be loaded. It could be loaded by your original module or not. It might happen at compile time (use) or run time (require). It could be loaded hours later in a long-running process (or a very, very, very slow computer :) There might also be autoloaded functions involved.

Most importantly, some of the sub-modules might be loaded using use only, while others are loaded with standard use and require statements. To make all this happen the way you'd expect it to, only plays some tricks with @INC. More on that shortly, my preciouses.

only knows which modules are related because it saves the information as metadata for every module it installs. For example, if I install a module like so:


    $ cd YAML-0.35
    $ perl Makefile.PL
    ... lines deleted
    $ make test
    ... lines deleted
    $ perl -Monly=install
    Installing /usr/local/perl580/lib/version/5.8.0/0.35/YAML.pm
    Installing /usr/local/perl580/lib/version/5.8.0/0.35/YAML.pod
    Installing /usr/local/perl580/lib/version/5.8.0/0.35/YAML/Error.pm
    Installing /usr/local/perl580/lib/version/5.8.0/0.35/YAML/Family.pm
    Installing /usr/local/perl580/lib/version/5.8.0/0.35/YAML/Node.pm
    Installing /usr/local/perl580/lib/version/5.8.0/0.35/YAML/Transfer.pm
    Installing /usr/local/perl580/lib/version/5.8.0/0.35/YAML/Error.yaml
    Installing /usr/local/perl580/lib/version/5.8.0/0.35/YAML/Family.yaml
    Installing /usr/local/perl580/lib/version/5.8.0/0.35/YAML/Node.yaml
    Installing /usr/local/perl580/lib/version/5.8.0/0.35/YAML/Transfer.yaml
    Installing /usr/local/perl580/lib/version/5.8.0/0.35/YAML.yaml

then I get a YAML metadata file for each module. The metadata file YAML.yaml looks like this:


    # This meta file created by/for only.pm
    meta_version: 0.25
    install_version: 0.35
    distribution_name: YAML
    distribution_version: 0.35
    distribution_modules:
      - YAML.pm
      - YAML/Error.pm
      - YAML/Family.pm
      - YAML/Node.pm
      - YAML/Transfer.pm

This way, no matter which module is loaded first, only knows about every other module that was installed with that module.

Only on the Inside

The internals of only.pm are not incredibly complicated, but there is a little black magic going on. Most of it boils down to a relatively new and under-publicized feature of Perl 5: putting objects onto the @INC array.

As you probably know, @INC is a special global array of file-system paths. When a program tries to load a Perl module with the use or require commands, it searches each of these paths in order until the module is found. The default paths in @INC are compiled into Perl. You can alter the array with the PERL5LIB environment variable, the lib.pm module, or even by simply changing it with regular Perl array commands. It's just an array, after all.

As of Perl 5.6.1, you can actually put Perl objects onto @INC and have use and require interact with them. When require encounters an object in @INC it attempts to call that object's INC method. The INC method can do anything it wants to load the module. It could actually go out on the internet and locate the module, download it, install it and load it!

The INC method should either return a filehandle or nothing. If a filehandle is returned, then require considers the operation a success. It reads the contents of that filehandle and eval-s the module into existence. If nothing is returned, then the operation is considered unsuccessful, and require continues its merry way down @INC.
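
To make that concrete, here is a minimal sketch of such a hook (the class name and library path are made up; note that, because an unqualified INC is forced into package main, the method is defined with its fully qualified name):


    package My::VersionHook;

    sub new { bless { libdir => '/home/ingy/modules/0.35' }, shift }

    # require() calls this with the hook object and the module's
    # relative path, e.g. "YAML/Node.pm".
    sub My::VersionHook::INC {
        my ($self, $filename) = @_;
        my $candidate = "$self->{libdir}/$filename";
        if (open my $fh, '<', $candidate) {
            return $fh;   # success: require reads the module from this handle
        }
        return;           # nothing: require carries on down @INC
    }

    package main;
    unshift @INC, My::VersionHook->new;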

The heart of the only module's magic lies in the fact that it puts an object onto @INC that is responsible for loading an appropriate version of your module. Not only that, it is also responsible for loading the matching version of any related modules that were installed at the same time as your module.

O-O-o - Object-Oriented Only

Since only is an object-oriented module on the inside, it is no surprise that it offers an OO API to those of you on the outside. (I assume that you don't live inside a Perl module :)

Using the OO interface can give you more understanding and control of the version specific loading process, at the cost of a slightly more verbose syntax specification. As an example, if you would normally do this:


    use only Good::Stuff => '1.20-1.55';

then you can say the same thing by doing this:


    use only;
    my $only;
    BEGIN {
        $only = only->new;
        $only->module('Good::Stuff');
        $only->condition('1.20-1.55');
        $only->include;
    }
    use Good::Stuff;

Note that the use statement is absolutely normal. No only involved. But it still does what we want! That's because the preceding code sticks one of those magic only objects onto @INC.

The methods should be fairly self-explanatory. The key method call is only->include. It tells the object to attach itself to the front of @INC.

One nice thing is that you can actually do stuff with the object later on in the program:


    use only;
    my $only;
    BEGIN {    # methods can be stacked
        $only = only->new->module('Good::Stuff')->condition('1.20-1.55')->include;
    }
    use Good::Stuff;
    ...
    print "Using Good::Stuff version: " . $only->distribution_version . "\n";
    ...
    $only->remove;    # Remove object from @INC;

Conclusion

I always believed that if my old Triumph motorcycle ever broke down out on the open road, I could somehow figure out a way to repair it by walking down the road for a half mile in either direction, and finding some odds and ends lying around that I could use for tools. That's because the roads are usually littered with all sorts of weird things, and I see everything as a tool to somehow suit my needs.

Perl is much like that roadside. There are all kinds of weird things lying around that can help it solve its own problems. Even when Perl is playing the part of a busted Triumph bike, its roadside qualities always seem to be able to kickstart it right back into action.

only.pm is a great example of this. Even though Perl was inadequate with regard to module versioning yesterday, today it's packing a brand new chainsaw. Rev it up!

About the Author

Brian Ingerson has been programming for more than 20 years, and hacking Perl for five of those. He is dedicated to improving the overall quality of scripting languages including Perl, Python, PHP and Ruby. He currently hails from Portland, Ore. -- the location of this year's O'Reilly Open Source Convention. How convenient!

This week on Perl 6, week ending 2003-03-16

Spring is in the air, the Apocalypse is here (or imminent, depending on which sense of the word 'Apocalypse' you are using). We'll start with perl6-internals as usual, before bracing ourselves for the increased volume and ploughing on into perl6-language.

Object Spec

Dan's 3rd try at the Objects and Classes spec received a very small amount of further discussion this week. What there was mostly touched on where the line should be drawn between Parrot's object system and a particular language's object system. The inference I draw from all this is that the next Object spec will be the final(ish) Parrot Object spec.

Parrot 0.0.10

Steve Fink instituted a Parrot feature freeze in the run up to 0.0.10 at the end of last week, aiming for a release on Saturday the 15th of March and noted that he was leaning toward "Juice" as a code name (punning on Leo Tötsch's work on the imcc -Oj optimizations). David Cuny pointed out that there was already a virtual machine called Juice and suggested a whole list of possible code names. Leo Tötsch reckoned that calling it 'Juice' would be 'too much honour' and suggested a list of anagrams of 'Parrot ten'. ("Partner to", "Par Rotten" or "Tarte porn" anyone?).

It looks like we missed the release on the 15th, but Steve announced a release candidate on the 16th, in expectation of a release on the 17th.

http://groups.google.com/groups -- Feature freeze

http://groups.google.com/groups -- Release candidate announced

languages/BASIC reorg

Clinton A Pierce announced that he'd reorganized the languages/BASIC subtree into 'compiled' and 'interpreted' subtrees and noted that he was very impressed with the improvements in Parrot's speed and memory management. Leo Tötsch pointed out a few issues with MANIFEST and the need for a Makefile, and wondered if Clinton had run things through IMCC.

http://groups.google.com/groups

The Judy algorithm

Tim Bunce pointed everyone at the Judy dynamic array code on Sourceforge and wondered if it would be useful for Parrot. (Judy is a high speed dynamic array implementation optimized for modern processor architectures apparently). Leo Tötsch thought it looked interesting and suggested that someone try wrapping Judy up in a PMC and running some performance tests. Elizabeth Mattijsen went and took a look and reported some issues with memory leakage and worried that the project looked 'silent'. Tim mailed her concerns to Judy's author who addressed them in his reply and admitted that he wasn't that good at keeping the website up to date. He said that Judy had been 'tested carefully not to have leakage' and wondered if it might have been an issue with the tool Liz used to do the testing.

I await further developments with interest. If Judy can be made to work, it looks jolly quick.

http://groups.google.com/groups

http://judy.sourceforge.net/application/10minutes.htm

Yet another iterator proposal

Leo Tötsch posted a request for comments on what he called 'yet another iterator proposal'. Nobody commented. I wonder if this means everyone liked it.

http://groups.google.com/groups

Parrot for Windows

Benjamin Goldberg wondered if there were any precompiled parrot binaries for Win32 available, as he wanted to be able to test parrot code without the weird rites he currently has to go through (see his post for details). Clinton A Pierce put a snapshot build up on his site temporarily for Benjamin to download. Robert Spier offered space on www.parrotcode.org for a windows build when the next release comes out, which Dan thought would be really cool. Dan also wondered about making an automated build farm, but I think he may have a tuit shortage when it comes time to actually implement it. Joshua Hoblitt also offered to host binaries on his CPAN mirror. Clint said he'd be happy to make milestone binaries and wondered if there was a standard way such a distribution should be put together.

http://groups.google.com/groups

Moving to PIO

Jürgen Bömmels continued to make Dan happy by moving more file related opcodes from STDIO to Parrot's PIO libraries. The latest ops to get his attention were open and close. Dan did a happy dance and applied the patch before wondering if we were subject to a code freeze (I don't think so; it was feature freeze time).

http://rt.perl.org/rt2/Ticket/Display.html

IMCC and PDD03 (Calling conventions)

Leo has been thinking some more about the parrot calling conventions described in Parrot Design Document 3 and worried that they can't actually be done. He proposed reducing the number of parameters that can be passed in registers in order to take pressure off the register allocator in IMCC. Dan agreed and quickly changed the PDD to take this into account. Leo then asked for some clarification on a couple of other issues that he was having a hard time understanding. Dan said that it was probably best to come back to these issues after he'd done PDD15 (The object spec). Leo agreed that objects will probably shed more light on the calling conventions and we all sat back to hang on Dan's every object oriented word. Not that there's any pressure at all.

http://groups.google.com/groups

Parrot extensions, or PMC versatility

Darren Duncan had some questions about PMCs and parrot extensions and what he should consider as he wrote a general persistence library. Dan and Benjamin Goldberg had some answers.

http://groups.google.com/groups

sun4 vtable JIT support

Jason Gloudon posted a patch adding support for vtable calls in the sun4/JIT and added support for some more ops. There appeared to be a few problems with the patch so it hasn't been applied yet.

http://groups.google.com/groups

A fish, a barrel and a running joke

Leon Brocard made maintaining this particular running joke almost trivial this week by actually posting something. He's implemented uniq(1) in parrot assembly, though he notes that it's not very fast compared to GNU uniq(1) yet. Dan added it to the distribution in examples/assembly/.

http://groups.google.com/groups

Parrot reaches another milestone

Dan has decided that Parrot has reached the point where it should have a working install make target. He asked for someone to make it so. No takers so far, but he posted late on Sunday so maybe there will be news of this in the next summary.

http://groups.google.com/groups

Meanwhile, over in perl6-language

perl6-language saw 210 messages this week. Which I think is more than it's seen in the last 3 or 4 weeks put together. Maybe it had something to do with the return of Damian Conway and the release of Apocalypse 6 (and the spectacularly short Apocalypse 7).

Statement modifiers

Matthijs van Duin wondered if the issue of multiple statement modifiers had been settled. The thread is long, and the answer is essentially (and authoritatively) "Yes, it's settled. No, you can't do it." So, unless Larry changes his mind, the point is moot. However, Matthijs does put his case very well; if you're interested in this area, I can recommend reading the thread.

http://groups.google.com/groups

http://archive.develooper.com/perl6-language%40perl.org/msg09331.html

Just when you get fed up of waiting for the Apocalypse...

Two come along at once.

Apocalypses 6 and 7 appeared online on Monday, a mere nine months since the last one (Apocalypse 7 is all of two sentences long and is contained within Apocalypse 6; we'll be ignoring it from now on). This Apocalypse covered closures, subs, functions, methods, types, signatures and a pile of other good stuff. All the syntax introduced was spelled out using Perl 6 rule notation, neatly showing off the power of the syntax introduced in Apocalypse 5. I would attempt to summarize it, but it's already pretty dense, so I suggest you all read it:

http://www.perl.com/pub/a/2003/03/07/apocalypse6.html

Now you've digested that, the rest of the summary should be easier to follow.

Damian returned to the list, though he's only reading posts related to Apocalypse 6 and he's pretty busy writing Exegesis 6 too so we're not expecting vast amounts of posts from him.

Austin Hastings kicked off with the first question, wondering if the 's' in <psiglet> was silent. "Of course not," says Larry.

Paul applauded the new macro feature (but Damian corrected his syntax).

S&M vs. B&D

Uri Guttman displayed entirely too much knowledge about the difference between B&D and S&M, asking if Larry shouldn't have used B&D where he used S&M in the Apocalypse. Austin Hastings knows too much too, it appears, commenting that it depends on whether you consider strongly typed compile-time semantics as being restrictive or painful.

http://groups.google.com/groups

Can "is" properties be specified indirectly?

Austin Hastings wondered if there would be some way of differentiating between an array of constants and an array of variables. In other words, how would one specify an array which may be appended to or pushed onto, but whose values cannot change, or a hash to which you could add keys but not change existing entries. Damian thought that you'd have to subclass Array or Hash as appropriate. Luke Palmer wasn't keen on this, because then it would be easier to do something in C++ than in Perl, which isn't the usual way of things.

http://groups.google.com/groups

is constant eq pass by value?

Uri Guttman was confused by the default parameter passing style for Perl 6 functions, is constant. A parameter variable declared with is constant is 'locked': you can't use the same variable to hold a different value, which makes code like the following an error:


   sub some_func ($some_arg is constant) {
       $some_arg += 10;
       ...
       return $some_arg;
   }

To modify a parameter, you need to declare it with is copy instead. Uri noted that he really should keep his finger off the send button until he's read the whole 'megilla', whatever one of those is.
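
For the record, the working version would presumably be declared like this (a sketch in Apocalypse 6 syntax):


    sub some_func ($some_arg is copy) {
        $some_arg += 10;    # fine: we're modifying a private copy
        ...
        return $some_arg;
    }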

http://groups.google.com/groups

Pipes

Michael Lazzaro declared that he thought the Apocalypse was great and that the 'sig stuff is very, very nice.' Then he asked about 'pipes' (the new <== and ==> operators which are almost, but not quite, entirely unlike the hypothetical ~> and <~ operators that were discussed so interminably a few summaries back). He wanted to know what was decided about some of the edge cases discussed in the appendix to the apocalypse (and had a comment to make about style). Damian pointed out that Michael's edge cases all collapsed to two:


    @var <== LIST;
    LIST ==> @var;

Damian said that Larry was still unsure about these but that he (Damian) thought they would be allowed in, if only because


    @in ==> map  {...}
        ==> sort {...}
        ==> map  {...}
        ==> @out;

is a lot less ugly than


    @in ==> map  {...}
        ==> sort {...}
        ==> map  {...}
        ==> push @out;

or (John Williams's suggestion)


    LIST ==> @var.STORE(*);

I have visions of the Perl 6 naysayers reading this section and muttering dark imprecations about the end of Perl as we know it...

http://groups.google.com/groups

Complex Parameter Types

Michael Lazzaro wanted to know more about 'complex' type declarations. He wanted to know how these interacted with signatures, could one use arbitrarily complex types in subroutine signatures? Could one do multidispatch on them? He also had a question about when/whether types got coerced (that wasn't actually his question, but it's what it boiled down to) when functions were called.

The answer to the first two questions was straightforward: Yes, you can use arbitrarily complex types in function and multimethod signatures.

The third one proved a little trickier. Damian said that values would not be coerced, but Larry wasn't so sure; he thought that coercion would be the right thing to do. Probably. Damian noted that the 'Right Thing' could well vary depending on whether the parameter in question was declared is constant or is rw (in which case passing a parameter of the wrong type would be an error) or is copy (in which case coercion could be the thing to do). Austin Hastings wondered if this could be handled with a use strict 'params' type pragma, but Damian wasn't convinced, arguing that the use of typed params automatically implied type stricture.

This thread gets horribly hard to summarize so, for now, I'll just pick out some representative highlights.

Larry says that, in the absence of a pragma to the contrary, type checking will be done at runtime if the type is unknown or unknowable at compile time, but he expects the B&D types to come up with a pragma that enforces compile-time checking on pain of failed compilation.

Brent Dax came up with the concept of 'strict' and 'loose' argument typing. Strict arg typing doesn't coerce except that, say, a Child can satisfy a requirement for a Parent. Loose arg typing will coerce wherever possible. He argued that the mechanism for choosing between strict and loose arg typing should be under the control of the caller rather than the callee, as that seemed to fit best with The Perl Way. Larry agreed with him. He added that the real question is whether the default use strict should include strict arg typing and noted that he was inclining toward the strict side. This week.

A subthread sprang up dealing with type inference. David Whipp noted that a smart compiler might well be able to infer the type of some variables at compile time; in the presence of good type inference, some variables wouldn't need a declared type to avoid triggering compile-time errors under strict arg typing. Angel Faus pointed out that 'user visible' type inference wasn't too good an idea because (among other reasons) it might lead to cases where a program only works as written on more recent versions of the compiler. (Which doesn't mean that type inference is a bad idea; if the compiler knows the type of a variable, it can potentially use that information to improve code performance.)

My head started spinning when Larry seemed to imply that, not only is Int a Scalar, but a Scalar is also an Int. Or maybe it plays one on TV.

This thread is still very much underway as I write this summary, so attempting to summarize it is akin to summarizing a book when you're halfway through it; you're never quite sure what are the important bits. Hopefully things will be a little nearer resolution next week and you'll get a more coherent summary of the conclusions.

http://groups.google.com/groups

http://groups.google.com/groups -- the inference subthread

http://groups.google.com/groups -- My head hurts

Signature zones and line noise

Brent Dax wasn't keen on the new syntax for declaring whether parameters are optional, positional, named or variadic (greedy), and wondered if we couldn't use something prettier (wordier) with named traits ($x is optional, etc.). Damian thought that would be 'uncomfortably prolix', with default values ending up a long way from their owners. Paul and Brent proposed allowing


    multi substr(Str $str, $from = $CALLER::_ is optional,
                           $len  = Inf is optional,
                           $new is optional) {
        ...
    }

but Damian pointed out that this could lead to badness because, in a signature, the is would bind to the variable, whereas in a normal variable declaration and assignment the is would bind to the value, which would be confusing. There was some discussion of appropriate names for the other 'line noise' in signatures, specifically the choice of name for the * type. The three front runners appear to be 'variadic', 'greedy' and 'slurpy/slurpificatious'. I like 'greedy', but then I'm close to the person who proposed it.

http://groups.google.com/groups

In which Luke Palmer gets ignored

Luke Palmer had things to say about Appendix C of the apocalypse, specifically to do with temp/let and with caller and Continuations. Maybe what he said on these subjects was completely uncontroversial, but nobody has replied to either of them yet.

http://groups.google.com/groups -- temp/let

http://groups.google.com/groups -- Continuations

Multi promotion

Richard Proctor wondered what would happen if you declared a multimethod with the same name as a previously declared normal sub/method. Specifically, he wanted to know if the new declaration would automagically turn the old one into a multimethod. Michael Lazzaro thought not. As did Damian and Larry. Damian provided a summary of the rules for subroutine/method dispatch, which look wonderfully subvertable. Piers Cawley wondered if it would be possible to get at a 'dispatcher' object and/or override it in a lexical scope, noting that sometimes he scares himself. Nicholas Clark seemed to think Piers wasn't completely insane before going on to talk about wrapping and unwrapping in the case where the wrapper has itself been wrapped. Larry seemed to think that unwrapping should still do the right thing even if the wrapper being removed is itself wrapped.

http://groups.google.com/groups

http://groups.google.com/groups -- Dispatch rules (OK?)

Wrappers vs. Efficiency

In the Apocalypse, Larry talked about the performance implications of generalized function wrappers and brought up the idea of subroutine traits that would mark a sub as unwrappable, giving is inline as an example. John Siracusa asked that whatever 'specific definition for speed' forbids runtime wrapping, it shouldn't be spelled 'inline' (though inline may imply "don't wrap me please"). A few people agreed, but things mostly got left vague.

http://groups.google.com/groups

Macro invocants

Dave Whipp wondered about using macros to do things like:


    my Database $db = MySqlDatabase.connect(...);
    $db.select * FROM Foo WHERE Foo.bar LIKE a%b;

by implementing select as a macro. Larry thought this could be made to work, but you would have to export the macro into the calling context. He added that you would probably implement that particular functionality by treating ".select" as an infix macro with a specially parsed argument on its right hand side (the SQL statement). David thought this was a little clunky and proposed macromethod as a way to avoid the export requirement. Luke Palmer was unconvinced, and it went back and forth a few times.

Larry wasn't convinced by David's suggestion either, commenting that 'mixing such compile-time semantics with a notation that supposedly mandates run-time dispatch is a recipe for a certain amount of confusion no matter how well we do it.'

http://groups.google.com/groups

Overloading multis on constness of parameters

Joe Gottman wondered if it would be possible to overload multimethods based on the const-ness of a parameter in the same way C++ does. The answer appears to be 'yes'. Piers Cawley passed on a question from Greg MacCarroll about overloading on the value of a parameter. Damian thought probably not, but then showed a cunning trick involving overloading isa and a junction to allow something pretty similar. Luke Palmer was impressed.

I'm not entirely sure what it had to do with this particular thread, but Austin Hastings posted a chart of his understanding of the various dispatch rules. Worth reading.

http://groups.google.com/groups

http://groups.google.com/groups -- Austin's chart

Operators and context

Deborah Ariel Picket had some questions about Michael Lazzaro's complete list of known Perl 6 operators, and wondered if this list should be extended to take context into account. She also wondered what the complete list of possible contexts was now. Michael reckoned that the list she pointed to was still current, but he expected to add the new <== and ==> ops. He also agreed that adding information about contextual behaviour would be a good (if daunting) thing to do. He noted too that Apocalypse 6 seems to imply that there would be a context for every type. Nothing definitive from Larry (or even Damian) yet.

http://groups.google.com/groups

http://archive.develooper.com/perl6-language@perl.org/msg12130.html -- the current operators list

Assignment Overloading

Luke Palmer wondered if it would be possible to overload the assignment operator, and whether assignment was a value or a reference copy. Damian pointed out that infix:= was really an operator on containers, not values, and that you'd probably implement it by overloading STORE in the container's class. He also commented that STORE might end up being spelled &infix:=, and that copying is a shallow value copy.

Larry popped up to discuss the .copy method on SCALAR but I wasn't entirely sure whether that gets called on the target container or the thing being copied. But that might just be me being confused.

http://groups.google.com/groups

P6FC

Aldo Calpini put up a tentative class hierarchy for Perl 6's standard classes (he called them the Perl 6 Foundation Classes, narrowly avoiding a knee jerk response from at least one summarizer of this parish) and asked for comments. Murat Ünalan suggested that Aldo was posting to the wrong audience before suggesting that we adopt the .Net/Java object hierarchy, which confused Aldo somewhat, so Murat clarified what he meant. Simon Cozens made a few suggestions to Aldo about naming and missing classes. Larry added some clarifications on his thinking about the difference between types and classes.

http://groups.google.com/groups

http://dada.perl.it/p6fc.html -- Aldo's class diagram

http://groups.google.com/groups -- Larry on types and classes

Thoughts about multiple properties

Chris Dutton wondered about using junctions for multiple properties so method bar is public is rw {...} could be written as method bar is public & rw {...} instead. Jonathan Scott Duff didn't think that made much sense, and pointed out that the is is optional after the first, so you could write method bar is public rw {...} if you wanted. Larry noted that 'that feature is still in Schrödinger's little box.'

This led Mark Biggar to suggest some syntax for neatly defining groups of traits and complex types without having to go to the length of defining a class. Michael Lazzaro thought that Perl 6 classes wouldn't be as heavyweight as they are in Perl 5 and did some WAGging (Wild Ass Guessing) about making subs that inherit from Sub which looked pretty good to me even if it did lead Austin Hastings to call Michael a 'bad, evil man'.

http://groups.google.com/groups

Named vs. Variadic Parameters

Michael Lazzaro pointed up some issues with subroutines that have both variadic arguments and named arguments. Luke Palmer thought that there were two, mutually exclusive, ways of dealing with this: Michael's proposed behaviour and the behaviour described in the Apocalypse. Luke was happy with the Apocalypse behaviour. Michael said that he hoped that whatever was decided, 'broken' function signatures would give rise to a compiler error.

http://groups.google.com/groups

is static?

Uri Guttman asked about static variables in Perl 6. In Perl 5 you can get a static variable by doing:


    {
        my $count = 0;

        sub sequence { return ++$count }
    }

Uri wanted to be able to do something like:


    sub sequence {
        my $count is static is default(0);
        return ++$count;
    }

Dave Whipp wondered if our did the job (it doesn't). Larry mentioned that he didn't particularly like 'static' as a name for this sort of variable and reckoned that, if there were an 'is static' declaration, the compiler would probably translate:


    my $pi is static = 3;

to something like


    our $foo__Xdeadbeef will init {.set(3)}

Larry's preferred syntax would be our $foo is unique, but he noted that 'It's not like someone isn't going to implement "is static" the moment our back is turned anyway'.

http://groups.google.com/groups

No mandatory named arguments?

Nicholas Clark wondered if his reading of Apocalypse 6 was correct, and that there really is no way to specify mandatory named parameters. Larry agreed that this was so, but noted that one could finesse things at runtime with tricks like:


    sub foo(+$req = die 'You must supply a "req =>" argument') {...}

http://groups.google.com/groups

Acknowledgements, Announcements and Apologies

That about wraps it up for this week. Many thanks to Larry for giving us all something to talk about on perl6-language this week; I was starting to think about making the language list summary a fortnightly thing.

This week's summary was again sponsored by Darren Duncan. Thanks Darren. If you'd like to become a summary sponsor, drop me a line at p6summarizer@bofh.org.uk.

The Many Dates and Times of Perl

Some Basic Concepts

In order to understand what you might want to do with dates and times, it's good to have a handle on some basic concepts. Here are some terms that I'll be using in this article:

  • datetime

    This is not a word, but I will use it as a convenient shorthand to refer to a date and time together, because they are basically the same thing. Adding time to a date simply increases its granularity.
  • UTC (also GMT and Zulu)
    UTC stands for "Coordinated Universal Time". It is an international standard kept using atomic clocks, and is held to within 0.9 seconds of the rotation of the earth on its axis in order to work well with traditional standards of time-keeping. UTC time is measured at the prime meridian (0 degrees longitude).

    Time zones around the world are specified as an offset from UTC. The widely used term GMT (Greenwich Mean Time) refers to a time zone that is equivalent to UTC. In other words, it has no offset.

    The US military has a set of terms used for time zones around the world based on the alphabet (A-Z). In this system UTC is Z, sometimes called "Zulu".

    UTC is a good standard for the internal representation of dates and times, as it makes comparing datetimes or doing datetime math much easier.

  • Time zones and Daylight Saving Time

    Time zones, as mentioned above, are defined as an offset from UTC. Most, but not all, time zones are at offsets of whole hours: Adelaide, Australia is nine and a half hours ahead of UTC, and Nepal is five hours and forty-five minutes ahead of UTC.

    Time zones are complicated by the use of Daylight Saving Time, which causes the actual offset of a given location to vary over the course of the year. The eastern US has a base offset of -0500 from UTC, five hours behind UTC, meaning that 12:00 (noon) UTC becomes 07:00 AM locally. However, when Daylight Saving Time is in effect, the offset becomes -0400, four hours behind UTC. Because time zones are determined by governments, both the use of Daylight Saving Time and the base offsets have changed over time, and may change again in the future.

    This greatly complicates math when dealing with non-UTC datetimes. If I have a local datetime for the Eastern US of 2002-10-25 14:15:00 and I add six months to that datetime, then I will have crossed a DST change.

    The upshot of all this is that any code that represents time zones as fixed offsets will probably start producing errors once date math gets involved.

    The definitive source of time zone offsets and rules is the Olson time zone database. It defines zones according to names like "America/New_York", as opposed to "EST". The latter shorthand is commonly used, but it should probably be avoided because these short names are not unique or definitive. For example, there is an "EST" at -0500 and +1000.

  • Local time

    The local time is UTC plus the local time zone offset. While UTC is great for internal use, most people want to see datetimes in terms of their location. In a sense, local time is the display format, while UTC is the storage format.

  • The epoch

    Epoch is a generic term referring to the "start" of any particular system. For example, the Gregorian calendar's epoch is January 1, 1 CE.

    The epoch system, as used by most operating systems, represents a datetime as the number of seconds after a specific date and time. For Unix systems, the epoch began on January 1, 1970 at midnight GMT (UTC). Other systems have different epochs. Because of this, you cannot assume that an epoch time of 2,003,131 means the same thing from system to system, because different systems have a different "second 0" (there's a one-line demonstration of this after the list). Storing a datetime as its epoch is not portable.

    Even worse, on most current systems, epochs are represented by a 32 bit signed integer, which only lets you represent datetimes with a range of about 136 years. On most UNIX systems currently in use, this means that the latest date you can represent right now is sometime in the year 2038, and the earliest is around 1902. This doesn't work very well if you're trying to represent the birth date of your great-great-grandmother.

    The upshot of all this is I would strongly recommend not using epochs except when you have no other choice. Of course, you'll often have no choice, so it's important to know how this system works.

  • The Gregorian Calendar

    There have been many different calendar systems in use throughout history. The Gregorian calendar is the current generally agreed upon international standard for representing dates, and is what you are using when you say "August 8, 1999". Other calendars that are still in use include the Hebrew calendar, the Islamic calendar, and the Chinese calendar.

    Even though the Gregorian calendar wasn't created until 1582, and it wasn't adopted worldwide until the 20th century, we can still extrapolate backwards in time using the Gregorian calendar.
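
As promised above, here's the one-line demonstration of "second 0". Asking gmtime() for epoch second 0 shows where your system's epoch begins:


    # On a Unix system, this prints: Thu Jan  1 00:00:00 1970
    print scalar gmtime(0), "\n";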

What Needs Doing with Dates and Times?

There are a lot of things you can do with dates and times, and different modules/distributions provide different types of functionality. Broadly speaking, we can consider the following areas of functionality:

  • Parsing/Formatting

    There are more datetime formats in use in the computing world than you can shake a stick at. You'll often have to parse a datetime in one format in order to turn it into something you can work with internally, like a bunch of integers or an epoch time.

    On the flip side, you'll often have to take some standard representation, like an epoch, and convert it to some other format.

  • Basic representation

    A nice datetime object can be quite handy. These range from lightweight wrappers around Perl's localtime() to objects that try to provide methods for all possible datetime operations.

  • Math and calculations

    You'll often want to answer questions like "what day is seven days after today" or "how much time is there between now and midnight?" This is closely related to the task of figuring out the date Easter falls on in a given year, or what day of the week Martin Luther King's birthday falls on.

There are plenty of other things we can do with datetimes, but these are largely elaborations of the above areas of functionality.

Perl's Built-in Date/Time Functionality

Perl has some built-in functionality for handling dates and times. This functionality includes:

  • time()

    A function that returns the current epoch time.

  • localtime() and gmtime()

    These functions convert an epoch time into a set of components. They both return arrays containing things like the hour, minute, month, etc., though some of the values returned are awkward to use: the year is the actual year minus 1900, and the month is numbered from 0 (see the sketch after this list).

    The localtime() function returns the datetime of your current location, based on your system's time zone setting, while the gmtime function returns the current UTC datetime.

    The Time::localtime and Time::gmtime modules provide a thin object layer around gmtime() and localtime() respectively, so you can do things like print gmtime()->year. Of course, that still prints the year minus 1900.

  • Time::Local

    This core module provides functions that translate the array of components returned by localtime() or gmtime() back into an epoch time.

  • POSIX.pm

    The POSIX module included with Perl provides interfaces to several common C library functions for datetimes, such as strftime(). I consider this the last refuge for the desperate, because the POSIX.pm module is a memory hog, and the C library interface is rather un-Perlish.
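
Here's the sketch promised above, showing localtime()'s quirky components and the trip back to an epoch time via Time::Local:


    use Time::Local;

    # The year is years since 1900, and the month runs from 0 to 11.
    my ($sec, $min, $hour, $mday, $mon, $year) = localtime(time);
    printf "%04d-%02d-%02d %02d:%02d:%02d\n",
           $year + 1900, $mon + 1, $mday, $hour, $min, $sec;

    # timelocal() expects the same quirky ranges on the way back in.
    my $epoch = timelocal($sec, $min, $hour, $mday, $mon, $year);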

The Big Stars

These are the modules that have the longest history, and are the most widely used.

  • TimeDate distribution

    This distribution, maintained and mostly written by Graham Barr, includes three modules. The Date::Format module provides a few functions for formatting datetime output, including a strftime() similar to the one in the standard C library. It can work with either epoch times or the array of components returned by Perl's localtime() function.

    Date::Parse parses a limited set of common datetime formats, returning either an epoch time or an array of components.

    The distribution also includes a number of language modules which can be used to localize both parsing and formatting.

    Finally, Time::Zone provides an interface to time zone offsets, based on short time zone names like "EST" or "GMT". As was mentioned before, these names are not official or standardized, so they are of limited usefulness.

    All of these modules are limited by their use of epoch time internally, but they are fairly quick and lightweight. For complex datetime problems, these modules probably don't automate enough of the dirty work.

  • Time::Piece

    Written and maintained by Matt Sergeant, this module is based on an interface designed by Larry Wall. It provides a convenient object API for datetimes, though the API is a bit confusing. For example, $time->mon returns the month number (1-12) while $time->month returns the abbreviated name of the month.

    It also implements basic parsing and formatting via the use of the C-level strptime() and strftime() functions. The included Time::Seconds module allows for basic date math, such as $tomorrow = $time + ONE_DAY; both the quirks and the math appear in a short sketch after this list.

    The implementation is fairly lightweight, but is limited by its use of epoch time as the internal representation. It is certainly useful, but like the TimeDate modules, it doesn't go far enough for more complex uses.

    As of this writing, Matt Sergeant has released an experimental version of Time::Piece based on my DateTime.pm module. This leaves the Time::Piece API unchanged, but allows it to handle dates that cannot be represented by a given system's epoch.

  • Date::Manip

    Sullivan Beck's Date::Manip is really huge, weighing in at about 3,000 lines of code. As might be expected of that much code, there is something for everyone here. It handles parsing, formatting, and date math, as well as more esoteric things like recurring datetimes and business day calculations. It should be noted that its time zone support is pretty much the same as that provided by Time::Zone.

    This module's most distinctive feature is its very flexible parser, which is capable of handling things like "3 weeks ago Monday" or "next Sunday". It also provides some parsing for specifying recurrences, like "every 3rd Monday in 2003".

    Unlike everything else we've covered so far, this module is not limited to epoch times. It has an entirely functional interface, and in my opinion the API could be cleaner. I dislike the fact that some functions do many different things, with the output depending either on the argument type(s), explicit flags, or both.

    But the biggest problem with this module, which is acknowledged by its author, is its size. It uses lots of memory (about 3MB on my system), and is fairly slow to load. The former makes it problematic for mod_perl, and the latter causes problems with CGI scripts. You can find most of its features elsewhere, in slimmer modules, but if size and speed are not an issue, this module almost certainly does everything you want.

  • Date::Calc

    Steffen Beyer's Date::Calc distribution is where you go when you need functionality combined with speed. This module offers much of the functionality provided by Date::Manip, but the core Date::Calc module has a much smaller memory footprint than Date::Manip (about 1MB on my box) and much greater speed, because its core implementation is in C.

    This module provides functions for calculating all sorts of date-related information, as well as some minimal parsing and formatting operations. The interface requires some hand-holding to use: every function returns one or more elements, never a data structure such as a hash, so you have to constantly deal with passing and receiving arrays of values.

    The distribution also includes a class called Date::Calc::Object, which can represent either a datetime or a "delta", the difference between two datetimes. This dual nature is odd, since many of the methods applicable to one will not work for the other. The class supports overloading for date math and comparison, so you can do things like $date + [1, 2, 3], which adds one year, two months, and three days to the given date.

    Finally, there is a Date::Calendar object, which can be used to set up a calendar "profile" defining holidays, work days, and partial work days in a variety of ways. This is quite useful if you need to calculate a day X business days in the future, while taking account of a particular organization's holidays.

    None of the modules in this distribution rely on epoch time, though they only support positive-numbered years. Time zone support is extremely minimal, and is done only as offsets, without support for daylight saving rules. Localization for a variety of languages is implemented for parsing and formatting.
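
And here's the Time::Piece sketch promised above, showing both the mon/month quirk and Time::Seconds date math:


    use Time::Piece;
    use Time::Seconds;

    my $t = localtime;             # Time::Piece overrides localtime()
    print $t->mon,   "\n";         # month number, 1-12
    print $t->month, "\n";         # abbreviated month name, e.g. "Mar"

    my $tomorrow = $t + ONE_DAY;   # ONE_DAY comes from Time::Seconds
    print $tomorrow->ymd, "\n";    # e.g. 2003-03-29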

And a Cast of Thousands ...

It wouldn't be Perl if there weren't at least a dozen other modules with overlapping functionality, right? In this case, there are more than two dozen! For sanity's sake, I've excluded more than a few modules, in particular those that either appeared to be unmaintained or lacked enough comprehensible documentation for me to figure out what the heck they do. In alphabetical order, those remaining are:

  • Date::PeriodParser - Simon Cozens

    Parses English descriptions of time periods like "this morning" or "around two weeks ago" and turns them into pairs of epoch times representing the beginning and end of the period.

  • Date::Range - Tony Bowden

    An object for representing a range of dates. Lets you figure out whether two ranges overlap, whether a date is included in a given range, and a few other useful things.

  • Date::Roman - Leo Cacciari

    An object for representing dates according to the Roman calendar.

  • Date::Set - Flávio Soibelmann Glock

    An object that represents sets of datetimes. A set can be either a datetime span, a recurring set of dates, or a fixed set of specific datetimes. It provides set math operations for all of these, as well as allowing you to iterate across the members of the set. Also see Date::Set::Timezone.

  • Date::Simple - John Tobey

    A date object that represents dates only, without a time component. It provides date math. It is not epoch-limited.

  • Date::Tie - Flávio Soibelmann Glock

    A basic date object with date math. Its functionality is implemented via a tied hash interface. It supports fixed offset time zones, and it is epoch-limited.

  • HTTP::Date - Gisle Aas

    This module is part of the LWP distribution. It parses many common datetime formats, including all of those that are used by the HTTP protocol. If Date::Parse doesn't understand all the formats you need to deal with, this module provides a good alternative.

  • Time::Duration - Sean Burke

    Given a number of seconds, this module returns a human-language description of the duration. For example, 3660 seconds would be "1 hour and 1 minute".

  • Time::Human - Simon Cozens

    Given an epoch time, this module can return a string describing that time in colloquial terms, such as "half past nine". It has hooks for localization.

  • Time::Unix - Nathan Wiger

    Loading this module forces Perl's time() function to use the Unix epoch system, regardless of the OS on which the code is running.

  • Time modules - David Muir Sharnoff

    This distribution includes Time::CTime, Time::ParseDate, and Time::Timezone, which are slightly more powerful versions of Graham Barr's Date::Format, Date::Parse, and Time::Zone.

But Wait, There's More!

Not content to leave well enough alone, I've recently started a project to fix what I see as the fundamental problem with the state of Perl datetime modules. That fundamental problem is that despite the fact that almost all the possible functionality you could want exists, it is scattered over a large number of incompatible modules.

For example, Date::Calc provides good functionality for various datetime calculations and date math, but the values it returns are not well suited to being passed to Date::Format. And while Date::Manip has powerful parsing, the return value from its parsing routine cannot be passed to any other module without further massaging. And so on.

For example, if I wanted to parse a date with Date::Parse and then calculate the date one week later with Date::Calc, and then format it with Date::Format, I'd have to do the following:

    use Date::Parse  qw(str2time);
    use Date::Calc   qw(Time_to_Date Add_Delta_Days Date_to_Time);
    use Date::Format qw(time2str);

    my $time1 = str2time($date_string);
    # Time_to_Date() from Date::Calc turns an epoch
    # time back into its date components
    my ($y1, $m1, $d1) = Time_to_Date($time1);
    my ($y2, $m2, $d2) = Add_Delta_Days($y1, $m1, $d1, 7);
    my $time2 = Date_to_Time($y2, $m2, $d2, 0, 0, 0);
    print time2str('%B %d, %Y', $time2);

Of course, I didn't have to use Date::Format for the final formatting. I could have done it with just Date::Calc as:

    print sprintf('%s %02d %04d',
                  Month_to_Text($m2), $d2, $y2);

But I want convenience. If I'm dealing with many datetimes and I need to parse various inputs, generate different formats, and do lots of calculations, then a convenient and uniform API can go a long way towards code maintainability. The extra glue code needed to make different modules cooperate can quickly obscure the actual intent of the program.

Efforts in the past to herd all the existing module authors towards a common API have failed, so rather than try that again, I decided to just write even more datetime code. As we all know, the best way to put out a fire is to pour copious amounts of gasoline on it. In order to make my project sound cool, I'm calling it the "Perl DateTime Suite", which sounds much better than "more date and time modules".

The goal for this project is to produce a suite of datetime modules that do everything you'd ever need related to dates and times. The modules in this suite will cooperate with each other, which means that a module that parses datetimes will return a standard object, and a module for formatting datetimes will accept that standard object.

So far, this project has attracted interest from a number of people, and discussions on the datetime@perl.org list have gone well. Recently, I released alpha versions of the core object, DateTime.pm, as well as DateTime::TimeZone, which provides complete time zone support based on the Olson database. There is also an alpha of DateTime::Format::ICal on CPAN, a module for parsing and formatting iCal datetimes and durations. In the future, look for more modules in the suite, all of which will begin with "DateTime::".
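
To give a taste of the intended convenience, here's roughly how the parse-calculate-format dance from earlier might collapse under DateTime.pm. This is only a sketch based on the alpha API, so the details may change, and I've elided the parsing step since the DateTime::Format::* modules are still appearing:


    use DateTime;

    my $dt = DateTime->new( year => 2003, month => 3, day => 21 );
    $dt->add( days => 7 );                    # date math on the object
    print $dt->strftime('%B %d, %Y'), "\n";   # formatting on the object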

Further Resources and Reading

Some excellent online resources include Claus Tondering's Calendar FAQ at http://www.tondering.dk/claus/calendar.html, as well as the US Naval Observatory Time Service site at http://tycho.usno.navy.mil/. For more information on time zones and the Olson database, see http://www.twinsun.com/tz/tz-link.htm.

If you're interested in discussing anything related to Perl and datetimes, check out the datetime@perl.org list. You can subscribe by sending a message to datetime-subscribe@perl.org.

Thanks

Thanks to Jonathan Scott Duff, Nick Tonkin, and Iain Truskett for reviewing this article before publication.

This week on Perl 6, week ending 2003-03-09

Ooh look, it's another of those Perl 6 Summaries where Piers tries to work a gratuitous reference to Leon Brocard into a summary of what's been happening to the Perl 6 development process this week.

As tradition dictates, we'll start with all the action from perl6-internals.

Object Specs

With his great good taste and impeccable timing, Dan Sugalski managed to release his second draft of an Object spec at 22.27 GMT on Sunday March the second. Which meant that, strictly, the root message of this thread doesn't belong in this summary. To which I say 'Tosh!'

Brent Dax wondered how this spec coped with classless OO languages like JavaScript, and whether such languages would have to fake things with anonymous classes. It seems that faking it is going to be the 'right' way.

Erik Bagfors asked for a clear example of the difference between attributes and properties, and between class and object variables. Dan didn't actually offer an example, but explained that properties are 'runtime assigned name/value pairs that are stuck on a variable', while attributes 'correspond to instance variables or slots in other object systems'.

Dave Whipp wondered about the semantics of creation and destruction of objects with a long list of questions. Dan replied that there was a 'Class Spec' forthcoming at some point too. The Object spec concerns itself with 'just the behaviour of objects that already exist'.

Some further discussion of this happened in perl6-language, but Dan dragged it back into perl6-internals. I shall follow his example by summarizing those subthreads here, whilst muttering rude words under my breath about people who make a poor summarizer's life so hard.

Garrett Goebel (not Göbel, as I mistakenly spelt it last week) asked about serialization and persistence, and wondered whether a 'pure Parrot' serialization scheme would be possible, allowing one to serialize Perl, Python, Ruby etc. objects at the Parrot level. Dan pointed out that there were freeze and thaw vtable entries for PMCs, and said that he had some 'sketches of necessary core functionality to make it feasible'. It looks like simple 'data objects' will be simple to serialize at the parrot level.

Sam Vilain suggested a whole bunch of extra things to be associated with Objects. Dan noted that all of Sam's suggestions were actually things that belonged with Classes rather than objects, with the exception of what Sam referred to as 'associations'. Garrett asked for some clarification of what Sam was talking about, tying it to something Sam had said in perl6-language about 'exporting object relationships in a sensible and consistent manner', so Sam did a brain dump which outlined a scheme for persisting almost anything in a language-neutral manner (requiring more or less complicity on the part of some languages) with loads of neat ideas in it. Dan liked the ideas, but isn't going to go the whole way (probably). Dan also implied that Parrot's serialize/deserialize methods would make use of the GC system to make sure that everything gets dumped properly, once and only once, solving the problem of cyclic references (YAY! An OO Persistence tool writer writes).

Dan released the next iteration of his tentative object spec on Sunday (but a little earlier this time). He and Uri Guttman spent the rest of the day thrashing out some issues and clarifying some terminology. Dan is threatening a glossary with the next try at an object spec.

(Dan just suggested on IRC that I summarize these threads as 'Dan muttered incoherently about objects again. Folks generally humored him.' Which I would never do of course; I'm English and I know how to spell 'humour'.)

http://groups.google.com/groups -- Object spec (try 2)

http://groups.google.com/groups -- Garrett's first question

http://groups.google.com/groups -- Sam Vilain's suggestions

http://groups.google.com/groups -- Garrett asks for clarity

http://groups.google.com/groups -- Sam's brain dump.

http://groups.google.com/groups -- Object Spec (try 3)

IMCC and multiple source files

K Stol wondered whether it was possible to write multi-file IMCC programs. Leopold Tötsch answered that, whilst there's no Parrot linker as yet, by the time we read his post we would be able to use the .include macro to glue source files together.

http://groups.google.com/groups

Patch Roundup

Jason Gloudon added some more ops to the sun4 JIT core. His patch was applied, along with most of Steve Peters' patches from last week. Jürgen Bömmels patched the print ops to use pure PIO (Parrot IO) rather than a scary stdio/PIO hybrid. Dan applied this one with alacrity (I don't think he likes stdio).

Coroutines end and DFG

Leo Tötsch wondered if coroutines ended and, if they did, how? He noted that coroutines caused IMCC some big headaches with register allocation. Jonathan Sillito replied that coroutines were never-ending, and pointed out that each coroutine has its own user stack, which means that saveall and restoreall do the right thing. That isn't quite what gives IMCC headaches, though: you can't pass arguments on the stack, so you have to rely on lexicals or globals, which apparently means that IMCC will need some new hints to work out what's happening.

Nobody explained what DFG stands for.

http://groups.google.com/groups

SableVM

Tupshin pointed everyone at SableVM, 'an interesting Java VM done as a doctoral thesis'. Leo found some of the optimization techniques in the thesis interesting. Which scares me.

http://groups.google.com/groups

http://www.sablevm.org/ -- SableVM

http://www.info.uqam.ca/%7Eegagnon/gagnon-phd.pdf -- SableVM Thesis

Parrot 0.0.10 freeze

Steve Fink announced the feature freeze for release 0.0.10: Parrot is now frozen prior to the release on March 15th.

http://groups.google.com/groups

Meanwhile, over in perl6-language

The object discussion leaked over from perl6-internals; I'll cover the language-relevant parts here. Dave Whipp had some interesting things to say about Associations in response to one of Sam Vilain's posts that I covered in perl6-internals, and Andy Wardley offered a counter argument. There was a fair amount of discussion about the advisability of Multiple Inheritance (which can be summed up as "we don't have to like it, but we probably have to do it"). Andy Wardley proposed stealing Ruby's Mixins, and Simon Cozens came up with a nifty Perl6-o-meter, which Larry pointed out was rather more general than Simon intended.

http://groups.google.com/groups

http://groups.google.com/groups

Signal/slot like mechanism

Yannick Le Saint wondered if there was a notification mechanism planned for Perl 6. Luke Palmer offered a simple implementation, which wasn't quite what Yannick was after, and which looks like it might need modifications to the 'Class' class. Dan noted that there would almost certainly be a mechanism for registering watcher subs with the internals ("Yay!" writes an insane summarizer with a penchant for digging around in Perl's runtime internals...).

http://groups.google.com/groups

Acknowledgements, Announcements and Apologies

Another quiet week. However, as I write this Apocalypse 6 is available to read on perl.com and Dan is making noises on IRC about the next draft of the object spec being released some time before Sunday. I have the feeling that next week's summary will be covering a lot more traffic.

Still haven't produced an American Odyssey web page. I've been too busy taking photographs and relearning how to use my large format camera.

This week's summary was again sponsored by Darren Duncan. Thanks Darren. If you'd like to become a summary sponsor, drop me a line at p6summarizer@bofh.org.uk.

Apocalypse 6

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 06 for the latest information.

This is the Apocalypse on Subroutines. In Perl culture the term "subroutine" conveys the general notion of calling something that returns control automatically when it's done. This "something" that you're calling may go by a more specialized name such as "procedure", "function", "closure", or "method". In Perl 5, all such subroutines were declared using the keyword sub regardless of their specialty. For readability, Perl 6 will use alternate keywords to declare special subroutines, but they're still essentially the same thing underneath. Insofar as they all behave similarly, this Apocalypse will have something to say about them. (And if we also leak a few secrets about how method calls work, that will make Apocalypse 12 all the easier--presuming we don't have to un-invent anything between now and then...)

Here are the RFCs covered in this Apocalypse. PSA stands for "problem, solution, acceptance", my private rating of how this RFC will fit into Perl 6. I note that none of the RFCs achieved unreserved acceptance this time around. Maybe I'm getting picky in my old age. Or maybe I just can't incorporate anything into Perl without "marking" it...


    RFC   PSA   Title
    ---   ---   -----
     21   abc   Subroutines: Replace C<wantarray> with a generic C<want>
                   function
     23   bcc   Higher order functions
     57   abb   Subroutine prototypes and parameters
     59   bcr   Proposal to utilize C<*> as the prefix to magic subroutines
     75   dcr   structures and interface definitions
    107   adr   lvalue subs should receive the rvalue as an argument
    118   rrr   lvalue subs: parameters, explicit assignment, and wantarray()
                   changes
    128   acc   Subroutines: Extend subroutine contexts to include name
                   parameters and lazy arguments
    132   acr   Subroutines should be able to return an lvalue
    149   adr   Lvalue subroutines: implicit and explicit assignment
    154   bdr   Simple assignment lvalue subs should be on by default
    160   acc   Function-call named parameters (with compiler optimizations)
    168   abb   Built-in functions should be functions
    176   bbb   subroutine / generic entity documentation
    194   acc   Standardise Function Pre- and Post-Handling
    271   abc   Subroutines : Pre- and post- handlers for subroutines
    298   cbc   Make subroutines' prototypes accessible from Perl
    334   abb   Perl should allow specially attributed subs to be called as C
                   functions
    344   acb   Elements of @_ should be read-only by default

In Apocalypses 1 through 4, I used the RFCs as a springboard for discussion. In Apocalypse 5 I was forced by the complexity of the redesign to switch strategies and present the RFCs after a discussion of all the issues involved. That was so well received that I'll try to follow the same approach with this and subsequent Apocalypses.

But this Apocalypse is not trying to be as radical as the one on regexes. Well, okay, it is, and it isn't. Alright, it is radical, but you'll like it anyway (we hope). At least the old way of calling subroutines still works. Unlike regexes, Perl subroutines don't have a lot of historical cruft to get rid of. In fact, the basic problem with Perl 5's subroutines is that they're not crufty enough, so the cruft leaks out into user-defined code instead, by the Conservation of Cruft Principle. Perl 6 will let you migrate the cruft out of the user-defined code and back into the declarations where it belongs. Then you will think it to be very beautiful cruft indeed (we hope).

Perl 5's subroutines have a number of issues that need to be dealt with. First of all, they're just awfully slow, for various reasons:

  • Construction of the @_ array
  • Needless prepping of potential lvalues
  • General model that forces lots of run-time processing
  • Difficulty of optimization
  • Storage of unneeded context
  • Lack of tail recursion optimization
  • Named params that aren't really
  • Object model that forces double dispatch in some cases

Quite apart from performance, however, there are a number of problems with usability:

  • Not easy to detect type errors at compile time
  • Not possible to specify the signatures of certain built-in functions
  • Not possible to define control structures as subroutines
  • Not possible to type-check any variadic args other than as a list
  • Not possible to have a variadic list providing scalar context to its elements
  • Not possible to have lazy parameters
  • Not possible to define immediate subroutines (macros)
  • Not possible to define subroutines with special syntax
  • Not enough contextual information available at run time.
  • Not enough contextual information available at compile time.

In general, the consensus is that Perl 5's simple subroutine syntax is just a little too simple. Well, okay, it's a lot too simple. While it's extremely orthogonal to always pass all arguments as a single variadic array, that mechanism does not always map well onto the problem space. So in Perl 6, subroutine syntax has blossomed in several directions.

But the most important thing to note is that we haven't actually added a lot of syntax. We've added some, but most of the new capabilities come in through the generalized trait/property system, and the new type system. But in those cases where specialized syntax buys us clarity, we have not hesitated to add it. (Er, actually, we hesitated quite a lot. Months, in fact.)

One obvious difference is that the sub on closures is now optional, since every brace-delimited block is now essentially a closure. You can still put the sub if you like. But it is only required if the block would otherwise be construed as a hash value; that is, if it appears to contain a list of pairs. You can force any block to be considered a subroutine with the sub keyword; likewise you can force any block to be considered a hash value with the hash keyword. But in general Perl just dwims based on whether the top-level is a list that happens to have a first argument that is a pair or hash:


    Block               Meaning
    -----               -------
    { 1 => 2 }          hash { 1 => 2 }
    { 1 => 2, 3 => 4 }  hash { 1 => 2, 3 => 4 }
    { 1 => 2, 3, 4 }    hash { 1 => 2, 3 => 4 }
    { %foo, 1 => 2 }    hash { %foo.pairs, 1 => 2 }

Anything else that is not a list, or does not start with a pair or hash, indicates a subroutine:


    { 1 }               sub { return 1 }
    { 1, 2 }            sub { return 1, 2 }
    { 1, 2, 3 }         sub { return 1, 2, 3 }
    { 1, 2, 3 => 4 }    sub { return 1, 2, 3 => 4 }
    { pair 1,2,3,4 }    sub { return 1 => 2, 3 => 4 }
    { gethash() }       sub { return gethash() }

This is a syntactic distinction, not a semantic one. The last two examples are taken to be subs despite containing functions returning pairs or hashes. Note that it would save no typing to recognize pair specially, since hash automatically does pairing of non-pairs. So we distinguish these:


    { pair 1,2,3,4 }    sub { return 1 => 2, 3 => 4 }
    hash { 1,2,3,4 }    hash { 1 => 2, 3 => 4 }

If you're worried about the compiler making bad choices before deciding whether it's a subroutine or hash, you shouldn't. The two constructs really aren't all that far apart. The hash keyword could in fact be considered a function that takes as its first argument a closure returning a hash value list. So the compiler might just compile the block as a closure in either case, then do the obvious optimization.

Although we say the sub keyword is now optional on a closure, the return keyword only works with an explicit sub. (There are other ways to return values from a block.)
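
A quick illustration of that rule (the variable names here are mine, not part of the design):


    my $answer = sub { return 42 };    # explicit sub: return is allowed
    my $block  = { 42 };               # bare closure: no return keyword here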

Subroutine Declarations

You may still declare a sub just as you did in Perl 5, in which case it behaves much like it did in Perl 5. To wit, the arguments still come in via the @_ array. When you say:


    sub foo { print @_ }

that is just syntactic sugar for this:


    sub foo (*@_) { print @_ }

That is, Perl 6 will supply a default parameter signature (the precise meaning of which will be explained below) that makes the subroutine behave much as a Perl 5 programmer would expect, with all the arguments in @_. It is not exactly the same, however. You may not modify the arguments via @_ without declaring explicitly that you want to do so. So in the rare cases that you want to do that, you'll have to supply the rw trait (meaning the arguments should be considered "read-write"):


    sub swap (*@_ is rw) { @_[0,1] = @_[1,0] };

The Perl5-to-Perl6 translator will try to catch those cases and add the parameter signature for you when you want to modify the arguments. (Note: we will try to be consistent about using "arguments" to mean the actual values you pass to the function when you call it, and "parameters" to mean the list of lexical variables declared as part of the subroutine signature, through which you access the values that were passed to the subroutine.)

Perl 5 has rudimentary prototypes, but Perl 6 type signatures can be much more expressive if you want them to be. The entire declaration is much more flexible. Not only can you declare types and names of individual parameters, you can add various traits to the parameters, such as rw above. You can add traits to the subroutine itself, and declare the return type. In fact, at some level or other, the subroutine's signature and return type are also just traits. You might even consider the body of the subroutine to be a trait.

For those of you who have been following Perl 6 development, you'll wonder why we're now calling these "traits" rather than "properties". They're all really still properties under the hood, but we're trying to distinguish those properties that are expected to be set on containers at compile time from those that are expected to be set on values at run time. So compile-time properties are now called "traits". Basically, if you declare it with is, it's a trait, and if you add it onto a value with but, it's a property. The main reason for making the distinction is to keep the concepts straight in people's minds, but it also has the nice benefit of telling the optimizer which properties are subject to change, and which ones aren't.

A given trait may or may not be implemented as a method on the underlying container object. You're not supposed to care.
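
To recap the distinction in code (the dim trait is borrowed from an earlier Apocalypse, purely for illustration):


    my int @array is dim(4,4);    # "is" sets a compile-time trait on the container
    return 0 but true;            # "but" attaches a run-time property to the value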

There are actually several syntactic forms of trait:


    rule trait :w {
          is <ident>[\( <traitparam> \)]?
        | will <ident> <closure>
        | of <type>
        | returns <type>
    }

(We're specifying the syntax here using Perl 6 regexes. If you don't know about those, go back and read Apocalypse 5.)

A <type> is actually allowed to be a junction of types:


    sub foo returns Int|Str {...}

The will syntax specifically introduces a closure trait without requiring the extra parens that is would need. Saying:


    will flapdoodle { flap() and doodle() }

is exactly equivalent to:


    is flapdoodle({ flap() and doodle() })

but reads a little better. More typically you'll see traits like:


    will first { setup() }
    will last { teardown() }

The final block of a subroutine declaration is the "do" trait. Saying:


    sub foo { ... }

is like saying:


    sub foo will do { ... }

Note however that the closure eventually stored under the do trait may in fact be modified in various ways to reflect argument processing, exception handling, and such.

We'll discuss the of and returns traits later when we discuss types. Back to syntax.

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 06 for the latest information.

The sub form

A subroutine can be declared as lexically scoped, package scoped, or unscoped:


    rule lexicalsub :w {
        <lexscope> <type>?
        <subintro> <subname> <psignature>?
        <trait>*
        <block>
    }

    rule packagesub :w {
        <subintro> <subname> <psignature>?
        <trait>*
        <block>
    }

    rule anonsub :w {
        <subintro> <psignature>?
        <trait>*
        <block>
    }

The non-lexically scoped declaration cannot specify a return type in front. The return type can only be specified as a trait in that case.

As in Perl 5, the difference between a package sub and an anonymous sub depends on whether you specify the <subname>. If omitted, the declaration (which is not really a declaration in that case) generates and returns a closure. (Which may not really be a closure if it doesn't access any external lexicals, but we call them all closures anyway just in case...)

A lexical subroutine is declared using either my or our:


    rule lexscope { my | our }

This list doesn't include temp or let because those are not declarators of lexical scope but rather operators that initiate dynamic scoping. See the section below on Lvalue subroutines for more about temp and let.

In both lexical and package declarations, the name of the subroutine is introduced by the keyword sub, or one of its variants:


    rule subintro { sub | method | submethod | multi | rule | macro }

A method participates in inheritance and always has an invocant (object or class). A submethod has an invocant but does not participate in inheritance. It's a sub pretending to be a method for the current class only. A multi is a multimethod, that is, a method that is called like a subroutine or operator, but is dispatched based on the types of one or more of its arguments.
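
Laid side by side, the variants might look like this (the names are invented for illustration):


    method    speak ($self:)         {...}   # has an invocant; participates in inheritance
    submethod BUILD ($self: *%_)     {...}   # has an invocant; no inheritance
    multi     max   (int $a, int $b) {...}   # dispatched on the types of its invocants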

Another variant is the regex rule, which is really a special kind of method; but in actuality rules probably get their own set of parse rules, since the body of a rule is a regex. I just put "rule" into <subintro> as a placeholder of sorts, because I'm lazy.

A macro is a subroutine that is called immediately upon completion of parsing. It has a default means of parsing arguments, or it may be bound to an alternate grammar rule to parse its arguments however you like.

These syntactic forms correspond to the various Routine types in the Code type hierarchy:


                                   Code
                        ____________|________________
                       |                             |
                    Routine                        Block
       ________________|_______________            __|___
      |     |       |       |    |     |          |      |
     Sub Method Submethod Multi Rule Macro      Bare Parametric

The Routine/Block distinction is fairly important, since you always return out of the current Routine, that is, the current Sub, Method, Submethod, Multi, Rule, or Macro. Also, the &_ variable refers to your current Routine. A Block, whether Bare or Parametric, is invisible to both of those notions.

(It's not yet clear whether the Bare vs Parametric distinction is useful. Some apparently Bare blocks are actually Parametric if they refer to $_ internally, even implicitly. And a Bare block is just a Parametric block with a signature of (). More later.)
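
One practical consequence of the Routine/Block distinction (the sub name here is made up): a return inside a loop body exits the enclosing Routine, because the loop body is merely a Block:


    sub find ($wanted, *@list) {
        for @list -> $item {                   # the loop body is a Block
            return $item if $item eq $wanted;  # so this returns from find itself
        }
        return undef;
    }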

A <psignature> is a parenthesized signature:


    rule psignature :w { \( <signature> \) }

And there is a variant that doesn't declare names:


    rule psiglet :w { \( <siglet> \) }

(We'll discuss "siglets" later in their own section.)

It's possible to declare a subroutine in an lvalue or a signature as if it were an ordinary variable, in anticipation of binding the symbol to an actual subroutine later. Note this only works with an explicit name, since the whole point of declaring it in the first place is to have a name for it. On the other hand, the formal subroutine's parameters aren't named, hence they are specified by a <psiglet> rather than a <psignature>:


    rule scopedsubvar :w {
        <lexscope> <type>? &<subname> <psiglet>? <trait>*
    }

    rule unscopedsubvar :w {
        &<subname> <psiglet>? <trait>*
    }

If no <psiglet> is supplied for such a declaration, it just uses whatever the signature of the bound routine is. So instead of:


    my sub foo (*@_) { print @_ }

you could equivalently say:


    my &foo ::= sub (*@_) { print @_ };

(You may recall that ::= does binding at compile time. Then again, you may not.)

If there is a <psiglet>, however, it must be compatible with the signature of the routine that is bound to it:


    my &moo(Cow) ::= sub (Horse $x) { $x.neigh };     # ERROR

"Pointy subs"

"Pointy subs" declare a closure with an unparenthesized signature:


    rule pointysub :w {
        -\> <signature> <block>
    }

They may not take traits.

Bare subs

A bare block generates a closure:


    rule baresub :w {
        <block> { .find_placeholders() }
    }

A bare block declaration does not take traits (externally, anyway), and if there are any parameters, they must be specified with placeholder variables. If no placeholders are used, $_ may be treated as a placeholder variable, provided the surrounding control structure passes an argument to the closure. Otherwise, $_ is bound as an ordinary lexical variable to the outer $_. ($_ is also an ordinary lexical variable when explicit placeholders are used.)
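
For instance, using the placeholder variables of Apocalypse 4:


    @sorted = sort { $^b <=> $^a } @unsorted;   # $^a and $^b imply the signature
    for @lines { print $_ }                     # $_ serves as the placeholder here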

More on parameters below. But before we talk about parameters, we need to talk about types.

Digression on types

Well, what are types, anyway? Though known as a "typeless" language, Perl actually supports several built-in container types such as scalar, array, and hash, as well as user-defined, dynamically typed objects via bless.

Perl 6 will certainly support more types. These include some low-level storage types:


    bit int str num ref bool

as well as some high-level object types:


    Bit Int Str Num Ref Bool
    Array Hash Code IO
    Routine Sub Method Submethod Macro Rule
    Block Bare Parametric
    Package Module Class Object Grammar
    List Lazy Eager

(These lists should not be construed as exhaustive.) We'll also need some way of at least hinting at representations to the compiler, so we may also end up with types like these:


    int8 int16 int32 int64
    uint8 uint16 uint32 uint64

Or maybe those are just extra size traits on a declaration somewhere. That's not important at this point.

The important thing is that we're adding a generalized type system to Perl. Let us begin by admitting that it is the height of madness to add a type system to a language that is well-loved for being typeless.

But mad or not, there are some good reasons to do just that. First, it makes it possible to write interfaces to other languages in Perl. Second, it gives the optimizer more information to think about. Third, it allows the S&M folks to inflict strongly typed compile-time semantics on each other. (Which is fine, as long as they don't inflict those semantics on the rest of us.) Fourth, a type system can be viewed as a pattern matching system for multi-method dispatch.

Which basically boils down to the notion that it's fine for Perl to have a type system as long as it's optional. It's just another area where Perl 6 will try to have its cake and eat it too.

This should not actually come as a surprise to anyone who has been following the development of Perl 5, since the grammatical slot for declaring a variable's effective type has been defined for some time now. In Perl 5 you can say:


    my Cat $felix;

to declare a variable intended to hold a Cat object. That's nice, as far as it goes. Perl 6 will support the same syntax, but we'll have to push it much further than that if we're to have a type system that is good enough to specify interfaces to languages like C++ or Java. In particular, we have to be able to specify the types of composite objects such as arrays and hashes without resorting to class definitions, which are rather heavyweight--not to mention opaque. We need to be able to specify the types of individual function and method parameters and return values. Taken collectively, these parameter types can form the signature of a subroutine, which is one of the traits of the subroutine.

And of course, all this has to be intuitively obvious to the naive user.

Yeah, sure, you say.

Well, let's see how far we can get with it. If the type system is too klunky for some particular use, people will simply avoid using it. Which is fine--that's why it's optional.

First, let's clarify one thing that seems to confuse people frequently. Unlike some languages, Perl makes a distinction between the type of the variable, and the type of the value. In Perl 5, this shows up as the difference between overloading and tying. You overload the value, but you tie the variable. When you say:


    my Cat $felix;

you are specifying the type of the value being stored, not the type of the variable doing the storing. That is, $felix must contain a reference to a Cat value, or something that "isa" Cat. The variable type in this case is just a simple scalar, though that can be changed by tying the variable to some class implementing the scalar variable operations.

In Perl 6, the type of the variable is just one of the traits of the variable, so if you want to do the equivalent of a tie to the Box class, you say something like:


    my Cat $felix is Box;

That declares your intent to store a Cat value into a Box variable. (Whether the cat will then be dead or alive (or dead|alive) depends on the definition of the Box class, and whether the Box object's side effects extend to the Cat value stored in it.)

But by default:


    my Cat $felix;

just means something like:


    my Cat $felix is Scalar;

Likewise, if you say:


    my Cat @litter;

it's like saying:


    my Cat @litter is Array;

That is, @litter is an ordinary array of scalar values that happen to be references to Cats. In the abstract, @litter is a function that maps integers to cats.

Likewise,


    my Cat %pet;

is like:


    my Cat %pet is Hash;

You can think of the %pet hash as a function that maps cat names (strings) to cats. Of course, that's an oversimplification--for both arrays and hashes, subscripting is not the only operation. But it's the fundamental operation, so the declared type of the returned value reflects the return value of such a subscripted call.

Actually, it's not necessarily the return type. It's merely a type that is consistent with the returned type. It would be better to declare:


    my Animal %pet;

and then you could return a Cat or a Dog or a Sponge, presuming all those are derived from Animal. You'd have to generalize it a bit further if you want to store your pet Rock. In the limit, you can just leave the type out. When you say:


    my %pet;

you're really just saying:


    my Object %pet is Hash;

...except that you're not. We have to push it further than that, because we have to handle more complicated structures as well. When you say:


    my Cat @litter is Array;

it's really shorthand for:


    my @litter is Array of Cat;

That is, "Cat" is really a funny parameter that says what kind of Array you have. If you like, you could even write it like this:


    my @litter is Array(returns => Cat)

Likewise you might write:


    my %pet is Hash(keytype => Str, returns => Cat)

and specify the key type of the hash. The "of" keyword is just syntactic sugar for specifying the return type of the previous storage class. So we could have


    my %pet is Hash of Array of Array of Hash of Array of Cat;

which might really mean:


    my %pet is Hash(keytype => Str,
                    returns => Array(
                        returns => Array(
                            returns => Hash(
                                keytype => Str,
                                returns => Array(
                                    returns => Cat)))))

or some such.

I suppose you could also write that as:


    my Array of Array of Hash of Array of Cat %pet;

but for linguistic reasons it's probably better to keep the variable name near the left and put the long, heavy phrases to the right. (People tend to prefer to say the short parts of their sentences before the long parts--linguists call this the "end-weight" problem.) The Hash is implied by the %pet, so you could leave out the "is" part and just say:


    my %pet of Array of Array of Hash of Array of Cat;

Another possibility is:


    my Cat %pet is Hash of Array of Array of Hash of Array;

That one reads kinda funny if you leave out the "is Hash", though. Nevertheless, it says that we have this funny data structure that has multiple parameters that you can view as a funny function returning Cat. In fact, "returns" is a synonym for "of". This is also legal:


    my @litter returns Cat;

But the "returns" keyword is mostly for use by functions:


    my Cat sub find_cat($name) {...}

is the same as:


    my sub find_cat($name) returns Cat {...}

This is more important for things like closures that have no "my" on the front:


    $closure = sub ($name) returns Cat {...}

Though for the closure case, it's possible we could define some kind of non-my article to introduce a type unambiguously:


    $closure = a Camel sub ($name) {...}
    $closure = an Aardvark sub () {...}

Presumably "a" or "an" is short for "anonymous". Which is more or less what the indefinite article means in English.

However, we need returns anyway in cases where the return value is complicated, so that you'd rather list it later (for end-weight reasons):


    my sub next_prisoner() returns (Nationality, Name, Rank, SerialNo) {...}

Note that the return type is a signature much like the parameter types, though of course there are no formal parameter names on a return value. (Though there could be, I suppose.) We're calling such nameless signatures "siglets".

Stub declarations

When you declare a subroutine, it can change how the rest of the current file (or string) is compiled. So there is some pressure to put subroutine declarations early. On the other hand, there are good reasons for putting subroutine definitions later in the file too, particularly when you have mutually recursive subroutines. Beyond that, the definition might not even be supplied until run time if you use some kind of autoloading mechanism. (We'll discuss autoloading in Apocalypse 10, Packages.) Perl 5 has long supported the notion of "forward" declarations or "stubs" via a syntax that looks like this:


    sub optimal;

Perl 6 also supports stubbing, but instead you write it like this:


    sub optimal {...}

That is, the stub is distinguished not by leaving the body of the function out, but by supplying a body that explicitly calls the "..." operator (known affectionately as the "yada, yada, yada" operator). This operator emits a warning if you actually try to execute it. (It can also be made to pitch an exception.) There is no warning for redefining a {...} body.

We're moving away from the semicolon syntax in order to be consistent with the distinction made by other declarations:


    package Fee;        # scope extends to end of file
    package Fee { ... } # scope extends over block

    module Fie;         # scope extends to end of file
    module Fie { ... }  # scope extends over block

    class Foe;          # scope extends to end of file
    class Foe { ... }   # scope extends over block

To be consistent, a declaration like:


    sub foo;

would therefore extend to the end of the file. But that would be confusing for historical reasons, so we disallow it instead, and you have to say:


    sub foo {...}

Scope of subroutine names

Perl 5 gives subroutine names two scopes. Perl 6 gives them four.

Package scoped subs

All named subs in Perl 5 have package scope. (The body provides a lexical scope, but we're not talking about that. We're talking about where the name of the subroutine is visible from.) Perl 6 provides by default a package-scoped name for "unscoped" declarations such as these:


          sub fee {...}
       method fie {...}
    submethod foe {...}
        multi foo {...}
        macro sic {...}

Methods and submethods are ordinarily package scoped, because (just as in Perl 5) a class's namespace is kept in a package.

Anonymous subs

It's sort of cheating to call this a subroutine scope, because it's really more of a non-scope. Scope is a property of the name of a subroutine. Since closures and anonymous subs have no name, they naturally have no intrinsic scope of their own. Instead, they rely on the scope of whatever variable contains a reference to them. The only way to get a lexically scoped subroutine name in Perl 5 was by indirection:


    my $subref = sub { dostuff(@_) }
    &$subref(...)

But that doesn't actually give you a lexically scoped name that is equivalent to an ordinary subroutine's name. Hence, Perl 6 also provides...

Lexically scoped subs

You can declare "scoped" subroutines by explicitly putting a my or our on the front of the declaration:


    my sub privatestuff { ... }
    our sub semiprivatestuff { ... }

Both of these introduce a name into the current lexical scope, though in the case of our this is just an alias for a package subroutine of the same name. (As with other uses of our, you might want to introduce a lexical alias if your strictness level prohibits unqualified access to package subroutines.)

You can also declare lexically scoped macros:


    my macro sic { ... }

Global scoped subs

Perl 6 also introduces the notion of completely global variables that are visible from everywhere they aren't overridden by the current package or lexical scope. Such variables are named with a leading * on the identifier, indicating that the package prefix is a wildcard, if you will. Since subroutines are just a funny kind of variable, you can also have global subs:


    sub *print (*@list) { $*DEFOUT.print(@list) }

In fact, that's more-or-less how some built-in functions like print could be implemented in Perl 6. (Methods like $*DEFOUT.print() are a different story, of course. They're defined off in a class somewhere. (Unless they're multimethods, in which case they could be defined almost anywhere, because multimethods are always globally scoped. (In fact, most built-ins including print will be multimethods, not subs. (But we're getting ahead of ourselves...))))

Signatures

One of Perl's strong points has always been the blending of positional parameters with variadic parameters.

"Variadic" parameters are the ones that vary. They're the "...And The Rest" list of values that many functions--like print, map, and chomp--have at the end of their call. Whereas positional parameters generally tell a function how to do its job, variadic parameters are most often used to pass the arbitrary sequences of data the function is supposed to do its job on/with/to.

In Perl 5, when you unpack the arguments to a sub like so:


    my ($a, $b, $c, @rest) = @_;

you are defining three positional parameters, followed by a variadic list. And if you give the sub a prototype of ($$$@) it will force the first three parameters to be evaluated in scalar context, while the remaining arguments are evaluated in list context.

The big problem with the Perl 5 solution is that the parameter binding is done at run time, which has run-time costs. It also means the metadata is not readily available outside the function body. We could just as easily have written it in some other form like:


    my $a = shift;
    my $b = shift;
    my $c = shift;

and left the rest of the arguments in @_. Not only is this difficult for a compiler to analyze, but it's impossible to get the metadata from a stub declaration; you have to have the body defined already.

The old approach is very flexible, but the cost to the user is rather high.

Perl 6 still allows you to access the arguments via @_ if you like, but in general you'll want to hoist the metadata up into the declaration. Perl 6 still fully supports the distinction between positional and variadic data--you just have to declare them differently. In general, variadic items must follow positional items both in declaration and in invocation.

In turn, there are at least three kinds of positional parameters, and three kinds of variadic parameters. A declaration for all six kinds of parameter won't win a beauty contest, but might look like this:


    method x ($me: $req, ?$opt, +$namedopt, *%named, *@list) {...}

Of course, you'd rarely write all of those in one declaration. Most declarations only use one or two of them. Or three or four... Or five or six...

There is some flexibility in how you pass some of these parameters, but the ordering of both formal parameters and actual arguments is constrained in several ways. For instance, positional parameters must precede non-positional, and required parameters must precede optional. Variadic lists must be attached either to the end of the positional list or the end of the named parameter list. These constraints serve a number of purposes:

  • They avoid user confusion.
  • They enable the system to implement calls efficiently.
  • Perhaps most importantly, they allow interfaces to evolve without breaking old code.

Since there are constraints on the ordering of parameters, similar parameters tend to clump together into "zones". So we'll call the ?, +, and * symbols you see above "zone markers". The underlying metaphor really is very much like zoning regulations--you know, the ones where your city tells you what you may or may not do on a chunk of land you think you own. Each zone has a set of possible uses, and similar zones often have overlapping uses. But you're still in trouble if you put a factory in the middle of a housing division, just as you're in trouble if you pass a positional argument to a formal parameter that has no position.

I was originally going to go with a semicolon to separate required from optional parameters (as Perl 5 uses in its prototypes), but I realized that it would get lost in the traffic, visually speaking. It's better to have the zone markers line up, especially if you decide to repeat them in the vertical style:


    method action ($self:
                int  $x,
                int ?$y,
                int ?$z,
             Adverb +$how,
        Beneficiary +$for,
           Location +$at is copy,
           Location +$toward is copy,
           Location +$from is copy,
             Reason +$why,
                    *%named,
                    *@list
                ) {...}

So optional parameters are all marked with zone markers.

In this section we'll be concentrating on the declaration's syntax rather than the call's syntax, though the two cannot be completely disintertwingled. The declaration syntax is actually the more complicated of the two for various good reasons, so don't get too discouraged just yet.

Positional parameters

The three positional parameter types are the invocant, the required parameters, and the optional positional parameters. (Note that in general, positional parameters may also be called using named parameter notation, but they must be declared as positional parameters if you wish to have the option of calling them as positional parameters.) All positional parameters regardless of their type are considered scalars, and imply scalar context for the actual arguments. If you pass an array or hash to such a parameter, it will actually pass a reference to the array or hash, just as if you'd backslashed the actual argument.

The invocant

The first argument to any method (or submethod) is its invocant, that is, the object or class upon which the method is acting. The invocant parameter, if present, is always declared with a colon following it. The invocant is optional in the sense that, if there's no colon, there's no explicit invocant declared. It's still there, and it must be passed by the caller, but it has no name, and merely sets the outer topic of the method. That is, the invocant's name is $_, at least until something overrides the current topic. (You can always get at the invocant with the self built-in, however. If you don't like "self", you can change it with a macro. See below.)
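
To illustrate both spellings (the method names are invented):


    method feed ($self: $food) { $self.swallow($food) }   # explicit invocant
    method nap  ($minutes)     { .close_eyes($minutes) }  # unnamed invocant sets the topic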

Ordinary subs never have an invocant. If you want to declare a non-method subroutine that behaves as a method, you should declare a submethod instead.

Multimethods can have multiple invocants. A colon terminates the list of invocants, so if there is no colon, all parameters are considered invocants. Only invocants participate in multimethod dispatch. Only the first invocant is bound to $_.

Macros are considered methods on the current parse state object, so they have an invocant.

Required parameters

Next (or first in the case of subs) come the required positional parameters. If, for instance, the routine declares three of these, you have to pass at least three arguments in the same order. The list of required parameters is terminated at the first optional parameter, that is, the first parameter having any kind of zone marker. If none of those are found, all the parameters are required, and if you pass either too many or too few arguments, Perl will throw an exception as soon as it notices. (That might be at either compile time or run time.) If there are optional or variadic parameters, the required list merely serves as the minimum number of arguments you're allowed to pass.

Optional parameters

Next come the optional positional parameters. (They have to come next because they're positional.) In the declaration, optional positional parameters are distinguished from required parameters by marking the optional parameters with a question mark. (The parameters are not distinguished in the call--you just use commas. We'll discuss call syntax later.) All optional positional parameters are marked with ?, not just the first one. Once you've made the transition to the optional parameter zone, all parameters are considered optional from there to the end of the signature, even after you switch zones to + or *. But once you leave the positional zone (at the end of the ? zone), you can't switch back to the positional zone, because positionals may not follow variadics.

If there are no variadic parameters following the optional parameters, the declaration establishes both a minimum and a maximum number of allowed arguments. And again, Perl will complain when it notices you violating either constraint. So the declaration:


    sub *substr ($string, ?$offset, ?$length, ?$repl) {...}

says that substr can be called with anywhere from 1 to 4 scalar parameters.

Variadic parameters

Following the positional parameters, three kinds of variadic parameters may be declared. Variadic arguments may be slurped into a hash or an array depending on whether they look like named arguments or not. "Slurpy" parameters are denoted by a unary * before the variable name, which indicates that an arbitrary number of values is expected for that variable.

Additional named parameters may be placed at the end of the declaration, or marked with a unary + (because they're "extra" parameters). Since they are--by definition--in the variadic region, they may only be passed as named arguments, never positionally. It is illegal to mark a parameter with ? after the first + or *, because you can't reenter a positional zone from a variadic zone.

Unlike the positional parameters, the variadic parameters are not necessarily declared in the same order as they will be passed in the call. They may be declared in any order (though the exact behavior of a slurpy array depends slightly on whether you declare it first or last).

Named-only parameters

Parameters marked with a + zone marker are named-only parameters. Such a parameter may never be passed positionally, but only by name.

The slurpy hash

A hash declaration like *%named indicates that the %named hash should slurp up all the remaining named arguments (that is, those that aren't bound explicitly to a specific formal parameter).

The slurpy array

An array declaration like *@rest indicates that the @rest array should slurp up all the remaining items after the named parameters. (Later we'll discuss how to disambiguate the situation when the beginning of your list looks like named parameters.) If you shift or pop without an argument, it shifts or pops whatever slurpy array is in scope. (So in a sense, your main program has an implicit slurpy array of *@*ARGS because that's what shift shifts there.)
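
A sketch of that defaulting behavior (dispatch is a hypothetical name):


    sub dispatch (*@args) {
        my $verb = shift;    # same as shift @args--the slurpy array in scope
        ...
    }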

Formal parameter syntax

Formal parameters have lexical scope, as if they were declared with a my. (That is reflected in the pseudocode in Appendix B.) Their scope extends only to the end of the associated block. Formal parameters are the only lexically scoped variables that are allowed to be declared outside their blocks. (Ordinary my and our declarations are always scoped to their surrounding block.)

Any subroutine can have a method signature syntactically, but subsequent semantic analysis will reject mistakes like invocants on subroutines. This is not just motivated by laziness. I think that "You can't have an invocant on a subroutine" is a better error message than "Syntax error".


    rule signature :w {
        [<parameter> [<[,:]> <parameter> ]* ]?
    }

In fact, we just treat colon as a funny comma here, so any use of extra colons is detected in semantic analysis. Similarly, zone markers are semantically restricted, not syntactically. Again, "Syntax error" doesn't tell you much. It's much more informative to see "You can't declare an optional positional parameter like ?$flag after a slurpy parameter like *@list", or "You can't use a zone marker on an invocant".

Here's what an individual parameter looks like:


    rule parameter :w {
        [ <type>? <zone>? <variable> <trait>* <defval>?
        | \[ <signature> \]     # treat single array ref as an arg list
        ]
    }

    rule zone {
        [ \?            # optional positional
        | \*            # slurpy array or hash
        | \+            # optional named-only
        ]
    }

    rule variable { <sigil> <name> [ \( <siglet> \) ]? }
    rule sigil { <[$@%&]> <[*.?^]>? }   # "What is that, swearing?"

Likewise, we parse any sigil here, but semantically reject things like $*x or $?x. We also reject package-qualified names and indirect names. We could have a <simplevar> rule that only admits <ident>, but again, "Syntax error" is a lot less user-friendly than "You can't use a package variable as a parameter, dimwit!"

Similarly, the optional <siglet> in <variable> is allowed only on & parameters, to say what you expect the signature of the referenced subroutine to look like. We should talk about siglets.

Siglets

The <siglet> in the <variable> rule is an example of a nameless signature, that is, a "small signature", or "siglet". Signatures without names are also used for return types and context traits (explained later). A siglet is a sequential list of paramlets. The paramlets do not refer to actual variable names, nor do they take defaults:


    rule siglet :w {
        [<paramlet> [<[,:]> <paramlet> ]* ]?
    }

    rule paramlet :w {
        [ <type> <zone>? <varlet>? <trait>*     # require type
        | <zone> <varlet>? <trait>*             # or zone
        | <varlet> <trait>*                     # or varlet
        | \[ <siglet> \]        # treat single array ref as an arg list
        ]
    }

In place of a <variable>, there's a kind of stub we'll call a "varlet":


    rule varlet :w {
        <sigil> [ \( <siglet> \) ]?
    }

As with the <variable> rule, a <varlet>'s optional siglet is allowed only on & parameters.

Here's a fancy example with one signature and several siglets.


    sub (int *@) imap ((int *@) &block(int $),
                        int *@vector is context(int)) {...}

You're not expected to understand all of that yet. What you should notice, however, is that a paramlet is allowed to be reduced to a type (such as int), or a zone (such as ?), or a varlet (such as $), or some sequence of those (such as int *@). But it's not allowed to be reduced to a null string. A signature of () indicates zero arguments, not one argument that could be anything. Use ($) for that. Nor can you specify four arguments by saying (,,,). You have to put something there.

Perl 6 siglets can boil down to something very much like Perl 5's "prototype pills". However, you can't leave out the comma between parameters in Perl 6. So you have to say ($,$) rather than ($$), when you want to indicate a list of two scalars.

If you use a <siglet> instead of a <signature> in declaring a subroutine, it will be taken as a Perl 5 style prototype, and all args still come in via @_. This is a sop to the Perl5-to-Perl6 translator, which may not be able to figure out how to translate a prototype to a signature if you've done something strange with @_. You should not use this feature in new code. If you use a siglet on a stub declaration, you must use the same siglet on the corresponding definition as well, and vice versa. You can't mix siglets and signatures that way. (This is not a special rule, but a natural consequence of the signature matching rules.)
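
In other words, something like this deliberately Perl 5-ish sketch:


    sub oldsum ($,$) {            # a siglet, not a signature...
        return @_[0] + @_[1];     # ...so the arguments still arrive in @_
    }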

Siglets and multimethods

For closure parameters like &block(int $), the associated siglet is considered part of its name. This is true not just for parameters, but anywhere you use the & form in your program, because with multimethods there may be several routines sharing the same identifier, distinguishable only by their type signature:


    multi factorial(int $a) { $a<=1 ?? 1 :: $a*factorial($a-1) }
    multi factorial(num $a) { gamma(1+$a) }

    $ref = &factorial;          # illegal--too ambiguous
    $ref = &factorial($);       # illegal--too ambiguous
    $ref = &factorial(int);     # good, means first one.
    $ref = &factorial(num);     # good, means second one.
    $ref = &factorial(complex); # bad, no such multimethod.

Note that when following a name like "&factorial", parentheses do not automatically mean to make a call to the subroutine. (This Apocalypse contradicts earlier Apocalypses. Guess which one is right...)


    $val = &factorial($x);      # illegal, must use either
    $val = factorial($x);       #   this or
    $val = &factorial.($x);     #   maybe this.

In general, don't use the & form when you really want to call something.

Formal parameter traits

Other than type, zone, and variable name, all other information about parameters is specified by the standard trait syntax, generally introduced by is. Internally even the type and zone are just traits, but syntactically they're out in front for psychological reasons. Whose psychological reasons we won't discuss.

is constant (default)

Every formal parameter is constant by default, meaning primarily that the compiler won't feel obligated to construct an lvalue out of the actual argument unless you specifically tell it to. It also means that you may not modify the parameter variable in any way. If the parameter is a reference, you may use it to modify the referenced object (if the object lets you), but you can't assign to it and change the original variable passed to the routine.

is rw

The rw trait is how you tell the compiler to ask for an lvalue when evaluating the actual argument for this parameter. Do not confuse this with the rw trait on the subroutine as a whole, which says that the entire subroutine knows how to function as an lvalue. If you set this trait, then you may modify the variable that was passed as the actual argument. A swap routine would be:


    sub swap ($a is rw, $b is rw) { ($a,$b) = ($b,$a) }

If applied to a slurpy parameter, the rw trait distributes to each element of the list that is bound to the parameter. In the case of a slurpy hash, this implies that the named pairs are in an lvalue context, which actually puts the right side of each named pair into lvalue context.

Since normal lvalues assume "is rw", I suppose that also implies that you can assign to a pair:


    (key => $var) = "value";

or even do named parameter binding:


    (who => $name, why => $reason) := (why => $because, who => "me");

which is the same as:


    $name   := "me";
    $reason := $because;

And since a slurpy hash soaks up the rest of the named parameters, this also seems to imply that binding a slurpy rw hash actually makes the hash values into rw aliases:


    $a = "a"; $b = "b";
    *%hash := (a => $a, b => $b);
    %hash{a} = 'x';
    print $a;   # prints "x"

That's kinda scary powerful. I'm not sure I want to document that... ["Too late!" whispers Evil Damian.]

is copy

This trait requests copy-in semantics. The variable is modifiable by you, but you're only modifying your own private copy. It has the same effects as assigning the argument to your own my variable. It does not do copy-out.
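
For example (bump is a made-up name):


    sub bump ($n is copy) { $n += 1; return $n }

    my $x = 1;
    my $y = bump($x);    # $y is 2, but $x is still 1--no copy-out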

If you want both copy-in and copy-out semantics, declare it rw and do your own copying back and forth, preferably with something that works even if you exit by exception (if that's what you want):


    sub cico ($x is rw) {
        my $copy = $x;
        LAST { $x = $copy }
        ...
    }

Though if you're using a copy you probably only want to copy-out on success, so you'd use a KEEP block instead. Or more succinctly, using the new will syntax:


    sub cicomaybe ($x is rw) {
        my $copy will keep { $x = $copy } = $x;
        ...
    }

is ref

This trait explicitly requests call-by-reference semantics. It lets you read and write an existing argument but doesn't attempt to coerce that argument to an lvalue (or autovivify it) on the caller end, as rw would. This trait is distinguished from a parameter of type Ref, which merely asserts that the return type of the parameter is a reference without necessarily saying anything about calling convention. You can without contradiction say:


    sub copyref (Ref $ref is copy) {...}

meaning you can modify $ref, but that doesn't change whatever was passed as the argument for that parameter.

Defaults

Default values are also traits, but are written as assignments and must come at the end of the formal parameter for psychological reasons.


    rule defval :w { \= <item> }

That is:


    sub trim ( Str $_ is rw, Rule ?$remove = /\s+/ ) {
            s:each/^ <$remove> | <$remove> $//;
    }

lets you call trim as either:


    trim($input);

or:


    trim($input, /\n+/);

It's very important to understand that the expression denoted by <item> is evaluated in the lexical scope of the subroutine definition, not of the caller. If you want to get at the lexical scope of the caller, you have to do it explicitly (see CALLER:: below). Note also that an item may not contain unbracketed commas, or the parser wouldn't be able to reliably locate the next parameter declaration.

Although the default looks like an assignment, it isn't one. Nor is it exactly equivalent to //=, because the default is set only if the parameter doesn't exist, not if it exists but is undefined. That is, it's used only if no argument is bound to the parameter.
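
In other words (greet is hypothetical):


    sub greet (?$name = "stranger") { print "Hello, $name\n" }

    greet();          # no argument is bound, so the default kicks in
    greet(undef);     # an argument *is* bound, so $name stays undef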

An rw parameter may only default to a valid lvalue. If you find yourself wanting it to default to an ordinary value because it's undefined, perhaps you really want //= instead:


    sub multprob ($x is rw, $y) {
        $x //= 1.0;     # assume undef means "is certain"
        $x *= $y;
    }

Syntactically, you can put a default on a required parameter, but it would never be used because the argument always exists. So semantic analysis will complain about it. (And I'd rather not say that adding a default implies it's optional without the ? zone marker.)

Formal parameter types

Formal parameters may have any type that any other variable may have, though particular parameters may have particular restrictions. An invocant needs to be an object of an appropriate class or subclass, for instance. As with ordinary variable declarations the type in front is actually the return type, and you can put it afterwards if you like:


    sub foo (int @array is rw) {...}
    sub foo (@array of int is rw) {...}
    sub foo (@array is Array of int is rw) {...}

The type of the actual argument passed must be compatible with (but not necessarily identical to) the formal type. In particular, for methods the formal type will often indicate a base class of the actual's derived class. People coming from C++ must remember that all methods are "virtual" in Perl.

Closure parameters are typically declared with &:


    sub mygrep (&block, *@list is rw) {...}

Within that subroutine, you can then call block() as an ordinary subroutine with a lexically scoped name. If such a parameter is declared without its own parameter signature, the code makes no assumptions about the actual signature of the closure supplied as the actual argument. (You can always inspect the actual signature at run time, of course.)

You may, however, supply a signature if you like:


    sub mygrep (&block($foo), *@list is rw) {
        block(foo => $bar);
    }

With an explicit signature, it would be an error to bind a block to &block that is not compatible. We're leaving "compatible" undefined for the moment, other than to point out that the signature doesn't have to be identical to be compatible. If the actual subroutine accepted one required parameter and one optional, it would work perfectly fine, for instance. The signature in mygrep is merely specifying what it requires of the subroutine, namely one positional argument named "$foo". (Conceivably it could even be named something different in the actual routine, provided the compiler turns that call into a positional one because it thinks it already knows the signature.)

Calling subroutines

The typical subroutine or method is called a lot more often than it is declared. So while the declaration syntax is rather ornate, we strive for a call syntax that is rather simple. Typically it just looks like a comma-separated list. Parentheses are optional on predeclared subroutine calls, but mandatory otherwise. Parentheses are mandatory on method calls with arguments, but may be omitted for argumentless calls to methods such as attribute accessors. Parentheses are optional on multimethod and macro calls because they always parse like list operators. A rule may be called like a method but is normally invoked within a regex via the <rule> syntax.
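
Summarizing those rules in code (the names are placeholders):


    sub doit ($a, $b) {...}

    doit(1, 2);           # parentheses always work
    doit 1, 2;            # legal, because doit was predeclared
    $cat.feed($mouse);    # method call with arguments: parentheses required
    $cat.name;            # argumentless accessor: parentheses may be omitted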

As in Perl 5, within the list there may be an implicit transition from scalar to list context. For example, the declaration of the standard push built-in in Perl 6 probably looks like this:


    multi *push (@array, *@list) {...}

but you still generally call it as you would in Perl 5:


    push(@foo, 1, 2, 3);

This call has two of the three kinds of call arguments. It has one positional argument, followed by a variadic list. We could imagine adding options to push sometime in the future. We could define it like this:


    multi *push (@array, ?$how, *@list) {...}

That's just an optional positional parameter, so you'd call it like this:


    push(@foo, "rapidly", 1,2,3)

But that won't do, actually, since we used to allow the list to start at the end of the positional parameters, and any pre-existing push(@foo,1,2,3) call to the new declaration would end up mapping the "1" onto the new optional parameter. Oops...

If instead we force new parameters to be in named notation, like this:


    multi *push (@array, *@list, +$how) {...}

then we can say:


    push(@foo, how => "rapidly", 1,2,3)

and it's no longer ambiguous. Since $how is in the named-only zone, it can never be set positionally, and the old calls to:


    push(@foo, 1,2,3);

still work fine, because *@list is still at the end of the positional parameter zone. If we instead declare that:


    multi *push (@array, +$how, *@list) {...}

we could still say:


    push(@foo, how => "rapidly", 1,2,3)

but this becomes illegal:


    push(@foo, 1,2,3);

because the slurpy array is in the named-only zone. We'll need an explicit way to indicate the start of the list in this case. I can think of lots of (mostly bad) ways. You probably can too. We'll come back to this...

Actual arguments

So the actual arguments to a Perl function are of three kinds: positional, named, and list. Any or all of these parts may be omitted, but whenever they are there, they must occur in that order. It's more efficient for the compiler (and less confusing to the programmer) if all the positional arguments come before all the non-positional arguments in the list. Likewise, the named arguments are constrained to occur before the list arguments for efficiency--otherwise the implementation would have to scan the entire list for named arguments, and some lists are monstrous huge.
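
Annotating the earlier push call by part:


    #     positional  named              list
    push( @foo,       how => "rapidly",  1, 2, 3 );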

We'd call these three parts "zones" as well, but then people would get them confused with our six declarative zones. In fact, extending the zoning metaphor a bit, our three parts are more like houses, stores, and factories (real ones, not OO ones, sheesh). These are the kinds of things you actually find in residential, commercial, and industrial zones. Similarly, you can think of the three different kinds of argument as the things you're allowed to bind in the different parameter zones.

A house is generally a scalar item that is known for its position; after all, "there's no place like home". Um, yeah. Anyway, we usually number our houses. In the US, we don't usually name our houses, though in the UK they don't seem to mind it.

A store may have a position (a street number), but usually we refer to stores by name. "I'm going out to Fry's" does not refer to a particular location, at least not here in Silicon Valley. "I'm going out to McDonald's" doesn't mean a particular location anywhere in the world, with the possible exception of "not Antarctica".

You don't really care exactly where a factory is--as long as it's not in your back yard--you care what it produces. The typical factory is for mass producing a series of similar things. In programming terms, that's like a generator, or a pipe...or a list. And you mostly worry about how you get vast quantities of stuff into and out of the factory without keeping the neighbors awake at night.

So our three kinds of arguments map onto the various parameter zones in a similar fashion.

The positional arguments

Obviously, actual positional arguments are mapped onto the formal parameters in the order in which the formal positional parameters are declared. Invocant parameters (if any) must match invocant arguments, the required parameters match positional arguments, and then any additional non-named arguments are mapped onto the optional positional parameters. However, as soon as the first named argument is seen (that cannot be mapped to an explicitly typed Pair or Hash parameter) this mapping stops, and any subsequent positional parameters may only be bound by name.

The named arguments

After the positional argument part, you may pass as many named pairs as you like. These may bind to any formal parameter named in the declaration, whether declared as positional or named. However, it is erroneous to simultaneously bind a parameter both by position and by name. Perl may (but is not required to) give you a warning or error about this. If the problem is ignored, the positional parameter takes precedence, since the name collision might have come in by accident as a result of passing extra arguments intended for a different routine. Problems like this can arise when passing optional arguments to all the base classes of the current class, for instance. It's not yet clear how fail-soft we should be here.

Named arguments can come in either as Pair or Hash references. When the parameter mapper sees an argument that is neither a Pair nor a Hash, it assumes it's the end of the named part and the beginning of the list part.

All unbound named arguments are bound to elements of the slurpy hash, if one was declared. If no slurpy hash is declared, an exception is thrown (although some standard methods, like BUILD, will provide an implicitly declared slurpy hash--known as %_ by analogy to @_--to handle surplus named arguments).

At the end of named argument processing, any unmapped optional parameter ends up with the value undef unless a default value is declared for it. Any unmapped required parameter throws an exception.

The slurpy array

All remaining arguments are bound to the slurpy array, if any. If no slurpy array is specified, any remaining arguments cause an exception to be thrown. (You only get an implicit *@_ slurpy array when the signature is omitted entirely. Otherwise we could never validly give the error "Too many arguments".)

No argument processing is done on this list. If you go back to using named pairs at the end of the list, for instance, you'll have to pop those off yourself. But since the list is potentially very long, Perl isn't going to look for those on your behalf.

Indeed, the list could be infinitely long, and maybe even a little longer than that. Perl 5 always flattens lists before calling the subroutine. In Perl 6, list flattening is done lazily, so a list could contain several infinite entries:


    print(1..Inf, 1..Inf);

That might eventually give the print function heartburn, of course...

Variadic transitions

There are, then, two basic transitions in argument processing. First is the transition from positional to named arguments. The second is from named arguments to the variadic list. It's also possible to transition directly from positional arguments to the variadic list if optional positional arguments have been completely specified. That is, the slurpy array could just be considered the next optional positional parameter in that case, as it is in push.

But what if you don't want to fill out all the optional parameters, and you aren't planning to use named notation to skip the rest of them? How can you make both transitions simultaneously? There are two workarounds. First, suppose we have a push-like signature such as this:


    sub stuff (@array, ?$how, *@list) {...}

The declarative workaround is to move the optional parameters after the slurp array, so that they are required to be specified as named parameters:


    sub stuff (@array, *@list, +$how) {...}

Then you can treat the slurp array as a positional parameter. That's the solution we used to add an extra argument to push earlier, where the list always starts at the second argument.
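
With that rearranged signature, a single call can make both transitions (sketch):


    stuff(@foo, 1, 2, 3);                    # *@list is effectively positional now
    stuff(@foo, how => 'gently', 1, 2, 3);   # $how must be named in any event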

On the calling end, you don't have any control of the declaration, but you can always specify one of the arguments as named, either the final positional one, or the list itself:


    stuff(@foo, how => undef, 1,2,3)
    stuff(@foo, list => (1,2,3))

The latter is clearer and arguably more correct, but it has a couple of minor problems. For one thing, you have to know what the parameter name is. It's all very well if you have to know the names of optional parameters, but every list operator has a list that you really ought to be able to feed without knowing its name.

So we'll just say that the actual name of the slurpy list parameter is "*@". You can always say this:


    stuff(@foo, '*@' => (1,2,3))

That's still a lot of extra unnecessary cruft--but we can do better. List operators are like commands in Unix, where there's a command line containing a program name and some options, and streams of data coming in and going out via pipes. The command in this case is stuff, and the option is @foo, which says what it is we're stuffing. But what about the streams of stuff going in and out? Perl 6 has lazy lists, so they are in fact more like streams than they used to be.

There will be two new operators, called pipe operators, that allow us to hook list generators together with list consumers in either order. So either of these works:


    stuff @foo <== 1,2,3
    1,2,3 ==> stuff @foo

The (ir)rationale for this is provided in Appendix A.

To be sure, these newfangled pipe operators do still pass the list as a "*@"-named argument, because that allows indirection in the entire argument list. Instead of:


    1,2,3 ==> stuff @foo

you can pull everything out in front, including the positional and named parameters, and build a list that gets passed as "splat" arguments (described in the next section) to stuff:


    list(@foo, how => 'scrambled' <== 1,2,3)
        ==> stuff *;

In other words:


    list(@foo, how => 'scrambled' <== 1,2,3) ==> stuff *;

is equivalent to:


    list(@foo, how => 'scrambled' <== 1,2,3) ==> stuff *();

which is equivalent to:


    stuff *(list(@foo, how => 'scrambled' <== 1,2,3));

The "splat" and the list counteract each other, producing:


    stuff(@foo, how => 'scrambled' <== 1,2,3);

So what stuff actually sees is exactly as if you called it like this:


    stuff(@foo, how => 'scrambled', '*@' => (1,2,3));

which is equivalent to:


    stuff @foo, how => 'scrambled', 1, 2, 3;

And yes, the ==> and <== operators are big, fat, and obnoxiously noticeable. I like them that way. I think the pipes are important and should stand out. In postmodern architecture the ducts are just part of the deconstructed decor. (Just don't anyone suggest a ==>= operator. Just...don't.)

The ==> and <== operators have the additional side effect of forcing their blunt end into list context and their pointy end into scalar context. (More precisely, it's not the expression on the pointy end that is in scalar context, but rather the positional arguments of whatever list function is pointed to by the pointy end.) See Appendix A for details.

Context

As with Perl 5, the scalar arguments are evaluated in scalar context, while the list arguments are evaluated in list context. However, there are a few wrinkles.

Overriding signature with *

Perl 5 has a syntax for calling a function without paying any attention to its prototype, but in Perl 6 that syntax has been stolen for a higher purpose (referential purity). Also, sometimes you'd like to be able to ignore part of a signature rather than the whole signature. So Perl 6 has a different notation, unary *, for disabling signature checking, which we've mentioned in earlier Apocalypses, and which you've already seen in the form of the stuff * above. (Our splat in the stuff * above is in fact unary, but the optional argument is missing, because the list is supplied via pipe.)

The first splatted term in an argument list causes all prior terms to be evaluated in scalar context, and all subsequent terms to be evaluated in list context. (Splat is a no-op in list context, so it doesn't matter if there are more splatted terms.) If the function wants more positional arguments, they are assumed to come from the generated list, as if the list had been specified literally in the program at that point as comma-separated values.

With splat lists, some of the argument processing may have to be deferred from compile time to runtime, so in general such a call may run slower than the ordinary form.

Context unknown at compile time

If Perl can't figure out the signature of a function at compile time (because, for instance, it's a method and not a function), then it may not be known which arguments are in scalar or list context at the time they are evaluated. This doesn't matter for Perl variables, because in Perl 6, they always return a reference in either scalar or list context. But if you call a function in such an indeterminate context, and the function doesn't have a return value declared that clarifies whether the function behaves differently in scalar or list context, then one of two things must happen. The function must either run in an indeterminate context, or the actual call to the function must be delayed until the context is known. It is not yet clear which of these approaches is the lesser evil. It may well depend on whether the function pays more attention to its dynamic context or to global values. A function with no side effects and no global or dynamic dependencies can be called whenever we like, but we're not here to enforce the functional paradigm. Interesting functions may pay attention to their context, and they may have side effects such as reading from an input stream in a particular order.

A variant of running in indeterminate context is to simply assume the function is running in list context. (That is, after all, what Perl 5 does on methods and on not-yet-declared subroutines.) In Perl 6, we may see most such ambiguities resolved by explicit use of the <== operator to force preceding args into scalar context, and the following args into list context. Individual arguments may also be forced into scalar or list context, of course.

By the way, if you mix unary splat with <==, only the args to the left of the splat are forced into scalar context. (The <== can do this because it governs everything back to the list operator, its precedence being slightly looser than comma.) So, given something like:


    @moreargs = (1,2,3);
    mumble $a, @b, c(), *@moreargs <== @list;

we can tell just by looking that $a, @b, and c() are all evaluated in scalar context, while @moreargs and @list are both in list context. It is parsed like this:


    mumble( ($a, @b, c(), (*@moreargs)) <== (@list) );

You might also write that like this:


    @moreargs = list(1,2,3 <== @list);
    mumble $a, @b, c(), *@moreargs;

In this case, we can still assume that $a, @b, c() are in scalar context, because as we mentioned in the previous section, the * forces it. (That's because there's no reason to put the splat if you're already in list context.)

Before we continue, you probably need a break. Here, have a break:


    *******************************************************
    ******************** Intermission *********************
    *******************************************************

Variations on a theme

Welcome back.

We've covered the basics up till now, but there are a number of miscellaneous variations we left out in the interests of exposition. We'll now go back to visit some of those issues.

Typed slurps

Sometimes you want to specify that the variadic list has a particular recurring type, or types. This falls out naturally from the slurp array syntax:


    sub list_of_ints ($a, $b, Int *@ints) { ... }
    sub list_of_scalars (Scalar *@scalars) { ... }

These still evaluate the list in list context. But if you declare them as:


    sub intlist ($a, $b, Int *@ints is context(Int)) { ... }
    sub scalarlist (Scalar *@scalars is context(Scalar)) { ... }

then these provide a list of Int or Scalar contexts to the caller. If you call:


    scalarlist(@foo, %bar, baz())

you get two scalar references and the scalar result of baz(), not a flattened list. You can have lists without list context in Perl 6!

If you want to have alternating types in your list, you can. Just specify a tuple type on your context:


    sub strintlist( *@strints is context(Str,Int)) { ... }
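
A call might then look like this (a sketch), with the Str and Int contexts recurring down the list:


    strintlist("monday", 1, "tuesday", 2, "wednesday", 3);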

Perl 5's list context did not do lazy evaluation, but always flattened immediately. In Perl 6 the default list context "is context(Lazy)". But you can specify "is context(Eager)" to get back to Perl 5 semantics of immediate flattening.
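
Declaratively, the difference might look like this (a sketch; the names are invented):


    sub lazylist  (*@list) {...}                     # default: is context(Lazy)
    sub eagerlist (*@list is context(Eager)) {...}   # Perl 5-style immediate flattening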

As a sop to the Perl5-to-Perl6 translator (and to people who have to read translated programs), the Eager context can also be specified by doubling the slurpy * on the list to make it look like a pair of rollers that will squish anything flat:


    sub p5func ($arg, **@list) { ... }

The "eager splat" is also available as a unary operator to attempt eager flattening on the rvalue side:


    @foo = **1..Inf;  # Test our "out of memory" handler...

Sublist formals

It's often the case that you'd like to treat a single array argument as if it were an argument list of its own. Well, you can. Just put a sublist signature in square brackets. This is particularly good for declaring multimethods in a functional programming mindset:


    multi apply (&func, []) { }
    multi apply (&func, [$head, *@tail]) {
        return func($head), apply(&func, @tail);
    }

    @squares := apply { $_ * $_ } [1...];

Of course, in this case, the first multimethod is never called because the infinite list is never null no matter how many elements we pull off the front. But that merely means that @squares is bound to an infinite list generator. No big deal, as long as you don't try to flatten the list...

Note that, unlike the example in the previous section which alternated strings and integers, this:


    sub strintlist( [Str, Int] *@strints ) { ... }

implies single array references coming in, each containing a string and an integer.
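
So a call would pass discrete array references, something like:


    strintlist( ["monday", 1], ["tuesday", 2] );   # one [Str, Int] sublist per argument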

Of course, apply may be a bad example insofar as we could just write:


    multi apply (&func) { }
    multi apply (&func, $head, *@tail) {
        return func($head), apply(&func, *@tail);
    }

    @squares := apply { $_ * $_ } *1...;

It'd be nice to lose the * on the calls, though. Maybe what we really want is a slurpy scalar in front of the slurpy array, where presumably the <== maps to the first slurpy scalar or hash (or it could be passed positionally):


    multi apply (&func) { }
    multi apply (&func, *$head, *@tail) {
        return func($head), apply(&func <== @tail);
    }

    @squares := apply { $_ * $_ } 1...;

Yow, I think I could like that if I tried.

So let's say for now that a slurpy scalar parameter just pulls the first (or next) value off the slurpy list. The [] notation is still useful, though, for when you really do have a single array ref coming in as a parameter.

Attributive parameters

It is typical in many languages to see object initializers that look like this (give or take a keyword):


    function init (a_arg, b_arg, c_arg) {
        a = a_arg;
        b = b_arg;
        c = c_arg;
    }

Other languages try to improve the situation without actually succeeding. In a language resembling C++, it might look more like this:


    method init (int a_arg, int b_arg, int c_arg)
        : a(a_arg), b(b_arg), c(c_arg) {}

But there's still an awful lot of redundancy there, not to mention inconsistent special syntax.

Since (as proven by Perl 5) signatures are all about syntactic sugar anyway, and since Perl 6 intentionally makes attribute variables visually distinct from ordinary variables, we can simply write this in Perl 6 as:


    submethod BUILD ($.a, $.b, $.c) {}

Any parameter that appears to be an attribute is immediately copied directly into the corresponding object attribute, and no lexical parameter is generated. You can mix these with ordinary parameters--the general rule of thumb for an initializer is that you should see each dotted attribute at least once:


    submethod BUILD ($.a, $.b, $c) {
        $.c = mung($c);
    }
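
Hypothetically, construction then initializes those attributes straight from named arguments (a sketch--the class name is invented, and the exact semantics of new are Apocalypse 12's problem):


    my $obj = MyClass.new(a => 1, b => 2, c => 'raw');   # BUILD receives named args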

This feature is primarily intended for use in constructors and initializers, but Perl does not try to guess which subroutines fall into that category (other than the fact that Perl 6 will implicitly call certain conventional names like CREATE and BUILD.)

However, submethods such as BUILD are assumed to have an extra *%_ parameter to soak up any extra unrecognized named arguments. Ordinarily you must declare a slurp-hash explicitly to get that behavior. But BUILD submethods are always called with named arguments (except for the invocant), and often have to ignore arguments intended for other classes participating in the current construction. It's likely that this implicit *%_ feature extends to other routines declared in all-caps as well, and perhaps all submethods.

As in Perl 5, subroutines declared in all-caps are expected to be called automatically most of the time--but not necessarily all the time. The BUILD routine is a good example, because it's only called automatically when you rely on the default class initialization rules. But you can override those rules, in which case you may have to call BUILD yourself. More on that in Apocalypse 12. Or go to one of Damian's Perl 6 talks...

Other kinds of subroutines

Closures

All blocks are considered closures in Perl 6, even the blocks that declare modules or classes (presuming you use the block form). A closure is just an anonymous subroutine that has access to its lexical context. The fact that some closures are immediately associated with names or have other kinds of parameter declarations does not change the fact that an anonymous bare block without parameters is also a kind of subroutine. Of course, if the compiler can determine that the block is only executed inline, it's free to optimize away all the subroutine linkage--but not the lexical linkage. It can only optimize away the lexical linkage if no external lexicals are accessed (or potentially accessed, in the case of eval).

Pointy subs

As introduced in Apocalypse 4, loops and topicalizers are often written with a special form of closure declaration known these days as "pointy subs". A pointy sub is exactly equivalent to a standard anonymous sub declaration having the same parameters. It's almost pure syntactic sugar--except that we embrace syntactic sugar in Perl when it serves a psychological purpose (not to be confused with a logical psycho purpose, which we also have).

Anyway, when you say:


    -> $a, $b, $c { ... }

it's almost exactly the same as if you'd said:


    sub ($a, $b, $c) { ... }

only without the parentheses, and with the cute arrow that indicates the direction of data flow to that part of your brain that consumes syntactic glucose at a prodigious rate.

Since the parentheses around the signature are missing, you can't specify anything that would ordinarily go outside the parentheses, such as the return type or other subroutine traits. But you may still put traits or zone markers on each individual formal parameter.

Also, as a "sub-less" declaration, you can't return from it using return, because despite being a closure, it's supposed to look like a bare Block embedded in a larger Routine, and users will expect return to exit from the "real" subroutine. All of which just means that, if you need those fancy extras, use a real sub sub, not a pointy one.

Placeholders

Also as discussed in Apocalypse 4, a bare block functioning as a closure can have its parameters declared internally. Such parameters are of the form:


    rule placeholder { <sigil> \^ <ident> }

Placeholder parameters are equivalent to required positional parameters declared in alphabetical order. (Er, Unicodical order, really.) For example, the closure:


    { $^fred <=> $^barney }

has the same signature as the pointy sub:


    -> $barney, $fred { $fred <=> $barney }

or the standard anonymous sub:


    sub ($barney, $fred) { $fred <=> $barney }

On first hearing about the alphabetical sorting policy, some otherwise level-headed folks immediately panic, imagining all sorts of ways to abuse the mechanism for the purposes of obfuscation. And surely there are many ways to abuse many of the features in Perl, more so in Perl 6. The point of this mechanism, however, is to make it drop-dead easy to write small, self-contained closures with a small number of parameters that you'd probably give single-character alphabetical names to in any event. If you want to get fancier than that, you should probably be using a fancier kind of declaration. I define "small number" as approximately e ± π. But as is generally the case in Perl, you get to pick your own definition of "small number". (Or at the very least, you get to pick whether to work with a company that has already defined "small number" for you.)

As bare rvalue variables embedded in the code, you may not put any traits or zone markers on the placeholders. Again, the desire to do so indicates you should be using a fancier form of declaration.

Methods

Perl 5 just used subroutines for methods. This is okay as long as you don't want to declare any utility subroutines in your class. But as soon as you do, they're inherited in Perl 5, which is not what you want. In Perl 6, methods and subroutines still share the same namespace, but a method must be declared using the method keyword. This is good documentation in any event, and further allows us to intuit an invocant where none is declared. (And we know that none is declared if there's no colon after the first argument, at least in the case of an ordinary method.)
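
For instance (a sketch of the colon rule just described):


    method speak ($self: $sound) {...}   # invocant explicitly declared
    method speak ($sound) {...}          # no colon, so an invocant is intuited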

Submethods

There are certain implementation methods that want to be inherited in general so that you can specify a default implementation, but that you want the class to be able to override without letting derived classes inherit the overridden method from this class. That is, they are scoped like utility subroutines, but can be called as if they are methods, without being visible outside the class. We call these hybrids "submethods", and so there's a submethod keyword to declare them. Submethods are simultaneously subs and methods. You can also think of them as something less than a method, as the "sub" works in the word "subhuman". Or you can think of them as underneath in the infrastructural sense, as in "subterranean".

Routines that create, initialize, or destroy the current object tend to fall into this category. Hence, the BUILD routine we mentioned earlier is ordinarily declared as a submethod, if you don't want to inherit the standard BUILD method defined in the Object class. But if you override it, your children still inherit BUILD from Object.

Contrariwise, if you don't like Object's default BUILD method, you can define an entire new class of classes that all default to your own BUILD method, as long as those classes derive from your new base object with superior characteristics. Each of those derived classes could then define a submethod to override your method only for that class, while classes derived from those classes could still inherit your default.

And so on, ad OOium.

Multimethods

Some kinds of programming map easily onto the standard model in which a method has a single invocant. Other kinds of programming don't. Perl 6 supplies support for the latter kind of programming, where the relationships between classes are just as interesting as the classes themselves. In some languages, all methods are multimethods. Perl 6 doesn't go quite that far--you must declare your multimethods explicitly. To do so, use the multi keyword in place of method, and optionally place a colon after the list of invocants in the declaration, unless you want them all to be invocants. Then your multimethod will be registered globally as being of interest to all the types of its invocants, and will participate in multimethod dispatch.
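
A hypothetical pair of declarations (the names are invented):


    multi collide (Asteroid $a, Ship $b) {...}            # no colon: both are invocants
    multi collide (Asteroid $a, Ship $b: $damage) {...}   # the colon ends the invocant list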

It is beyond the scope of this Apocalypse to specify exactly how multimethod dispatch works (see Apocalypse 12, someday), but we can tell you that, in general, you call a multimethod as if it were an ordinary subroutine, and the dispatcher figures out on your behalf how many of the arguments are invocants. This may sound fancy to you, but many of the functions that are built into Perl 5 are not built into Perl 6, at least, not as keywords. Instead they are either defined as global subroutines or as multimethods, single invocant multimethods in many cases. When you call a function like close($handle), it'll first look to see if there's a close subroutine defined in your scope, and if not, it will dispatch it as a multimethod. Likewise, for something like sysread, you can call it either as a method:


    sysread $handle: $buffer, $length

or as a function:


    sysread $handle, $buffer, $length

In the first case, it's explicitly dispatching on the handle, because a colon in place of the first comma indicates an invocant. (That's our new indirect object syntax, in fact. Perl 6 does not support the Perl 5 syntax of just leaving whitespace between the indirect object and the subsequent arguments.)

In the second case, it looks for a sysread subroutine, doesn't find it (we hope), and calls multimethod dispatch on it. And it happens that the multimethod dispatch is smart enough to find the ordinary single-invocant sysread method, even though it may not have been explicitly declared a multimethod. Multimethod dispatch happens to map directly onto ordinary method dispatch when there's only one invocant.

At least, that's how it works this week...

Rules

Rules were discussed in Apocalypse 5. They are essentially methods with an implicit invocant, consisting of the object containing the current pattern matching context. To match the internals of regex syntax, traits attached to rules are typically written as ":w" rather than "is w", but they're essentially the same thing underneath.

It's possible to call a rule as if it were a method, as long as you give it the right arguments. And a method defined in a grammar can be called as if it were a rule. They share the same namespace, and a rule really is just a method with a funny syntax.

Macros

A macro is a function that is called immediately upon completion of the parsing of its arguments. Macros must be defined before they are used--there are no forward declarations of macros, and while a macro's name may be installed in either a package or a lexical scope, its syntactic effect can only be lexical, from the point of declaration (or importation) to the end of the current lexical scope.

Every macro is associated (implicitly or explicitly) with a particular grammar rule that parses and reduces the arguments to the macro. The formal parameters of a macro are special in that they must be derived somehow from the results of that associated grammar rule. We treat macros as if they were methods on the parse object returned by the grammar rule, so the first argument is passed as if it were an invocant, and it is always bound to the current parse tree object, known as $0 in Apocalypse 5. (A macro is not a true method of that class, however, because its name is in your scope, not the class's.)

Since the first parameter is treated as an invocant, you may either declare it or leave it implicit in the actual declaration. In either case, the parse tree becomes the current topic for the macro. Hence you may refer to it as either $_ or $0, even if you don't give it a name.

Subsequent parameters may be specified, in which case they bind to internal values of $0 in whatever way makes sense. Positional parameters bind to $1, $2, etc. Named parameters bind to named elements of $0. A slurpy hash is really the same as $0, since $0 already behaves as a hash. A slurpy array gets $1, $2, etc., even if already bound to a positional parameter.

A macro can do anything it likes with the parse tree, but the return value is treated specially by the parser. You can return one of several kinds of values:

  • A parse tree (the same one, a modified one, or a synthetic one) to be passed up to the outer grammar rule that was doing pattern matching when we hit the macro.
  • A closure functioning as a generic routine that is to be immediately inlined, treating the closure as a template. Within the template, any variable referring back to one of the macro's parse parameters will interpolate that parameter's value at that point in the template. (It will be interpolated as a parse tree, a string, or a number depending on the declaration of the parameter.) Any variable not referring back to a parameter is left alone, so that your template can declare its own lexical variables, or refer to a package variable.
  • A string, to be shoved into the input stream and reparsed at the point the macro was found, starting in exactly the same grammar state we were before the macro. This is slightly different from returning the same string parsed into a parse tree, because a parse tree must represent a complete construct at some level, while the string could introduce a construct without terminating it. This is the most dangerous kind of return value, and the least likely to produce coherent error messages with decent line numbers for the end user. But it's also very powerful. Hee, hee.
  • An undef, indicating that the macro is only used for its side effects. Such a macro would be one way of introducing an alternate commenting mechanism, for instance. I suppose returning "" has the same effect, though.

A macro by default parses any subsequent text using whatever macro rule is currently in effect. Generally this will be the standard Perl::macro rule, which parses subsequent arguments as a list operator would--that is, as a comma-separated list with the same policy on using or omitting parentheses as any other list operator. This default may be overridden with the "is parsed" trait.

If there is no signature at all, the macro defaults to using the null rule, meaning it looks for no argument at all. You can use it for simple word substitutions where no argument processing is needed. Instead of the long-winded:


    my macro this () is parsed(/<null>/) { "self" }

you can just quietly turn your program into C++:


    my macro this { "self" }

A lot of Perl is fun, and macros are fun, but in general, you should never use a macro just for the fun of it. It's far too easy to poke someone's eye out with a macro.

Out-of-band parameters

Certain kinds of routines want extra parameters in addition to the ordinary parameter list. Autoloading routines for instance would like to know what function the caller was trying to call. Routines sensitive to topicalizers may wish to know what the topic is in their caller's lexical scope.

There are several possible approaches. The Perl 5 autoloader actually pokes a package variable into the package with the AUTOLOAD subroutine. It could be argued that something that's in your dynamic scope should be accessed via dynamically scoped variables, and indeed we may end up with a $*AUTOLOAD variable in Perl 6 that works somewhat like Perl 5's, only better, because AUTOLOAD kinda sucks. We'll address that in Apocalypse 10, for some definition of "we".

Another approach is to give access to the caller's lexical scope in some fashion. The magical caller() function could return a handle by which you can access the caller's my variables. And in general, there will be such a facility under the hood, because we have to be able to construct the caller's lexical scope while it's being compiled.

In the particular case of grabbing the topic from the caller's lexical scope (and it has to be in the caller's lexical scope because $_ is now lexically scoped in Perl 6), we think it'll happen often enough that there should be a shorthand for it. Or maybe it's more like a "midhand". We don't want it too short, or people will unthinkingly abuse it. Something on the order of a CALLER:: prefix, which we'll discuss below.

Lexical context

Works just like in Perl 5. Why change something that works?

Well, okay, we are tweaking a few things related to lexical scopes. $_ (also known as the current topic) is always a lexically scoped variable now. In general, each subroutine will implicitly declare its own $_. Methods, submethods, macros, rules, and pointy subs all bind their first argument to $_; ordinary subs declare a lexical $_ but leave it undefined. Every sub definition declares its own $_ and hides any outer $_. The only exception is bare closures that are pretending to be ordinary blocks and don't commandeer $_ for a placeholder. These continue to see the outer scope's $_, just as they would any other lexically scoped variable declared in the outer scope.

Dynamic context

On the flipside, $_ is no longer visible in the dynamic context. You can still temporize (localize) it, but you'll be temporizing the current subroutine's lexical $_, not the global $_. Routines which used to use dynamic scoping to view the $_ of a calling subroutine will need some tweaking. See CALLER:: below.

The caller function

As in Perl 5, the caller function will return information about the dynamic context of the current subroutine. Rather than always returning a list, it will return an object that represents the selected caller's context. (In a list context, the object can still return the old list as Perl 5-ers are used to.) Since contexts are polymorphic, different context objects might in fact supply different methods. The caller function doesn't have to know anything about that, though.

What caller does know in Perl 6 is that it takes an optional argument. That argument says where to stop when scanning up the call stack, and so can be used to tell caller which kinds of context you're interested in. By default, it'll skip any "wrapper" functions (see "The .wrap method" below) and return the outermost context that thought it was calling your routine directly. Here's a possible declaration:


    multi *caller (?$where = &CALLER::_, Int +$skip = 0, Str +$label)
        returns CallerContext {...}

The $where argument can be anything that matches a particular context, including a subroutine reference or any of these Code types:


    Code Routine Block Sub Method Submethod Multi Macro Bare Parametric

&_ produces a reference to your current Routine, though in the signature above we have to use &CALLER::_ to get at the caller's &_.
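
Given that declaration, calls might look like this (a sketch; the bare type names are the Code types listed above):


    $ctx = caller;                            # your nearest non-wrapper caller
    $ctx = caller(Method);                    # nearest dynamic context that is a Method
    $ctx = caller(Sub, skip => 2);            # skip the first two matching Subs
    $ctx = caller(Block, label => 'LINE');    # nearest Block labeled LINE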

Note that use of caller can prevent certain kinds of optimizations, such as tail recursion elimination.

The want function

The want function is really just the caller function in disguise. It also takes an argument telling it which context to pay attention to, which defaults to the one you think it should default to. It's declared like this:


    multi *want (?$where = &CALLER::_, Int +$skip = 0, Str +$label)
        returns WantContext {...}

Note that, as a variant of caller, use of want can prevent certain kinds of optimizations.

When want is called in a scalar context:


        $primary_context = want;

it returns a synthetic object whose type behaves as the junction of all the valid contexts currently in effect, whose numeric overloading returns the count of arguments expected, and whose string overloading produces the primary context as one of 'Void', 'Scalar', or 'List'. The boolean overloading produces true unless in a void context.

When want is called in a list context like this:


        ($primary, $count, @secondary) = want;

it returns a list of at least two values, indicating the contexts in which the current subroutine was called. The first two values in the list are the primary context (i.e., the scalar return value) and the expectation count (see Expectation counts below). Any extra contexts that want may detect (see Valid contexts below) are appended to these two items.

When want is used as an object, it has methods corresponding to its valid contexts:


        if want.rw { ... }
        unless want.count < 2 { ... }
        when want.List { ... }

The want function can be used with smart matching:


        if want ~~ List & 2 & Lvalue { ... }

Which means it can also be used in a switch:


    given want {
        when List & 2 & Lvalue { ... }
        when .count > 2 {...}
    }

The numeric value of the want object is the "expectation count". This is an integer indicating the number of return values expected by the subroutine's caller. For void contexts, the expectation count is always zero; for scalar contexts, it is always zero or one; for list contexts it may be any non-negative number. The want value can simply be used as a number:


    if want >= 2 { return ($x, $y) }         # context wants >= 2 values
    else         { return ($x); }            # context wants < 2 values

Note that Inf >= 2 is true. (Inf is not the same as undef.) If the context is expecting an unspecified number of return values (typically because the result is being assigned to an array variable), the expectation count is Inf. You shouldn't actually return an infinite list, however, unless want ~~ Lazy. The opposite of Lazy context is Eager context (the Perl 5 list context, which always flattened immediately). Eager and Lazy are subclasses of List.

The valid contexts are pretty much as listed in RFC 21, though to the extent that the various contexts can be considered types, they can be specified without quotes in smart matches. Also, types are not all-caps any more. We know we have a Scalar type--hopefully we also get types or pseudo-types like Void, List, etc. The List type in particular is an internal type for the temporary lists that are passed around in Perl. Preflattened lists are Eager, while those lists that are not preflattened are Lazy. When you call @array.specs, for instance, you actually get back an object of type Lazy. Lists (Lazy or otherwise) are internal generator objects, and in general you shouldn't be doing operations on them, but on the arrays to which they are bound. The bound array manages its hidden generators on your behalf to "harden" the abstract list into concrete array values on demand.

The CALLER:: pseudopackage

Just as the SUPER:: pseudopackage lets you name a method somewhere in your set of superclasses, the CALLER:: pseudopackage lets you name a variable that is in the lexical scope of your (dynamically scoped) caller. It may not be used to create a variable that does not already exist in that lexical scope. As such, it is primarily intended for a particular variable that is known to exist in every caller's lexical scope, namely $_. Your caller's current topic is named $CALLER::_. Your caller's current Routine reference is named &CALLER::_.

Note again that, as a form of caller, use of CALLER:: can prevent certain kinds of optimizations. However, if your signature uses $CALLER::_ as a default value, the optimizer may be able to deal with that as a special case. If you say, for instance:


    sub myprint (IO $handle, *@list = ($CALLER::_)) {
        print $handle: *@list;
    }

then the compiler can just turn the call:


    myprint($*OUT);

into:


    myprint($*OUT, $_);

Our earlier example of trim might want to default the first argument to the caller's $_. In which case you can declare it as:


    sub trim ( Str ?$_ is rw = $CALLER::_, Rule ?$remove = /\s+/ ) {
            s:each/^ <$remove> | <$remove> $//;
    }

which lets you call it like this:


    trim;   # trims $_

or even this:


    trim remove => /\n+/;

Do not confuse the caller's lexical scope with the callee's lexical scope. In particular, when you put a bare block into your program that uses $_ like this:


    for @array {
        mumble { s/foo/bar/ };
    }

the compiler may not know whether or not the mumble routine is intending to pass $_ as the first argument of the closure, which mumble needs to do if it's some kind of looping construct, and doesn't need to do if it's a one-shot. So such a bare block actually compiles down to something like this:


    for @array {
        mumble(sub ($_ is rw = $OUTER::_) { s/foo/bar/ });
    }

(If you put $CALLER::_ there instead, it would be wrong, because that would be referring to mumble's $_.)

With $OUTER::_, if mumble passes an argument to the block, that argument becomes $_ each time mumble calls the block. Otherwise, it's just the same outer $_, as if ordinary lexical scoping were in effect. And, indeed, if the compiler knows that mumble takes a sub argument with a signature of (), it may optimize it down to ordinary lexical scoping, and if it has a signature of ($), it can assume it doesn't need the default. A signature of (?$) means all bets are off again.

Where return/leave returns to

A return statement needs to return to where the user thinks it ought to return to. Since any block is a closure, any block is really a subroutine in disguise. But the user doesn't generally want return to return from the innermost block, but from the innermost block that was actually defined using an explicit sub-ish keyword. So that's what Perl 6 does. If it can, it will implement the return internally as a simple jump to the end of the subroutine. If it can't, it implements return by throwing a control exception that is caught by the proper context frame.

There will be a leave function that can return from other scopes. By default it exits from the innermost block (anything matching base class Code), but, as with caller and want, you can optionally select the scope you want to return from. It's declared like this:


    multi leave (?$where = Code, *@value, Int +$skip, Str +$label) {...}

which lets you say things like:


    leave;
    leave Block;
    leave &_ <== 1,2,3; # same as "return 1,2,3"
    leave where => Parametric, value => (1,2,3);
    leave Loop, label => 'LINE', $retval
    leave { $_ ~~ Block and $_ !~ Sub } 1,2,3;
    leave () <== 1,2,3;

As it currently stands, the parens aren't optional on that last one, because <== is a binary operator. You could always define yourself a "small" return, ret, that leaves the innermost block:


    my macro ret { "leave Code <== " }
    # and later...
    { ret 1,2,3 }

Note that unlike a return, leave always evaluates any return value in list context. Another thing to iron out is that the context we choose to leave must have set up an exception handler that can handle the control exception that leave must in some cases throw. This seems to imply that any context must minimally catch a control exception that is bound to its own identity, since leave is doing the picking, not the exception handlers.

Subroutine object methods

The .wrap method

You may ask a subroutine to wrap itself up in another subroutine in place, so that calls to the original are intercepted and interpreted by the wrapper, even if access is only through the reference:


    $id = $subref.wrap({
        # preprocessing here
        call;
        # postprocessing here
    })

The call built-in knows how to call the inner function that this function is wrapped around. In a void context, call arranges for the return value of the wrapped routine to be returned implicitly. Alternately, you can fetch the return value yourself from call and return it explicitly:


    $id = $subref.wrap({
        my @retval = call;
        push(@retval, "...and your little dog, too!");
        return @retval;
    })

The arguments arrive in whatever form you request them, independently of how the parameters look to the wrapped routine. If you wish to modify the parameters, supply a new argument list to call:


    $id = $subref.wrap(sub (*@args) {
        call(*@args,1,2,3);
    })

You need to be careful not to preflatten those generators, though.

The $id is useful for removing a particular wrapper:


    $subref.unwrap($id);

We might also at some point allow a built-in sub-like keyword wrap. If we don't, someone will write it anyway.

There is also likely a .wrappers method that represents the list of all the current wrappers of the subroutine. The ordering and manipulation of this list is beyond the scope of this document, but such activity will be necessary for anyone implementing Aspect-Oriented Programming in Perl 6.

The .assuming method

Currying is done with the .assuming method. It works a bit like the .wrap method, except that instead of wrapping in place, it returns a new function to you with a different signature, one in which some of the parameters are assumed to be certain values:


    my &say ::= &*print.assuming(handle => $*TERM);

You can even curry built-in operators:


    my &prefix:½ ::= &infix:/ .assuming(y => 2);

(assuming here that built-in infix operators always use $x and $y).

The .req method

The .req method returns the number of required args requested by the sub in question. It's just a shortcut for digging down into the signature trait and counting up how many required parameters there are. The count includes any invocant (or invocants, for multimethods).

If you want to know how many optional arguments there are, you can do your own digging. This call is primarily for use by madmen who wish to write variants of map and reduce that are sensitive to the number of parameters declared for the supplied block. (Certainly the implementation of for will make heavy use of this information.)
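
A sketch of such madness (names invented):


    sub mymap (&block, *@list) {
        my $arity = &block.req;   # how many values to feed the block per call
        # ... pull $arity values off @list for each call to block() ...
    }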

Subroutine traits

These are traits that are declared on the subroutine as a whole, not on any individual parameter.

Internal traits

The signature, returns, and do traits are internal traits containing, respectively, the type signature of the parameters, the type signature of the return value, and the body of the function. Saying:


    sub Num foo (int $one, Str *@many) { return +@many[$one] }

is short for saying something like:


    sub foo is signature( sig(int $one, Str *@many) )
            is returns( sig(Num) )
            will do { return +@many[$one] }

In fact, it's likely that the "do" trait handler has to set up all the linkage to pass parameters in and to trap "return" exceptions.

Many of these pre-defined traits just map straight onto the container object's attribute methods of the same name. Underneath they're just accessors, but we use the trait notation in declarations for several reasons. For one thing, you can string a bunch of them together without repeating the original object, which might be anonymous in any event. It also gives us liberty behind the scenes to promote or demote various traits from mere properties to attributes of every object of a class. It's one of those levels of indirection computer scientists keep talking about...

Going the other direction, it allows us to pretend that accessors are just another form of metadata when accessed as a trait. By the same token it allows us to transparently make our metadata active rather than passive, without rewriting our declarations. This seems useful.

The basic rule of thumb is that you can use any of a container's rw methods as if it were a trait. For subroutine containers, the example above really turns into something like this:


    BEGIN {
        &foo.signature = sig(int $one, Str *@many);
        &foo.returns = sig(Num);
        &foo.do = { return +@many[$one] }
    }

is rw

This trait identifies lvalue subs or methods. See the section on lvalue subs below.

is parsed(<rule>)

This trait binds a macro to a grammar rule for parsing it. The grammar rule is invoked as soon as the initial keyword is seen and before anything else is parsed, so you can completely change the grammar on the fly. For example, the sig() function above might well invoke special parsing rules on its arguments, since what is inside is not an ordinary expression.

In the absence of an explicit is parsed trait, a macro's arguments are parsed with whatever macro rule is in effect, by default the standard Perl::macro.

is cloned("BEGIN")

Perhaps this is an alternate way of specifying the parsing and semantics of a macro or function. Or perhaps not. Just an idea for now...

is cached

This is the English translation of what some otherwise sane folks call "memoization". This trait asserts that Perl can do automatic caching of return values based on the assumption that, for any particular set of arguments, the return value is always the same. It can dramatically speed up certain kinds of recursive functions that shouldn't have been written recursively in the first place. ;-)
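
The canonical victim is a naively recursive Fibonacci function, sketched here with the same ternary style as the factorial example below:


    sub fib (Int $n) is cached { $n <= 1 ?? $n :: fib($n-1) + fib($n-2) }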

is inline

This says you think performance would be enhanced if the code were inlined into the calling code. Of course, it also constitutes a promise that you're not intending to redefine it or wrap it or do almost anything else fancy with it, such as expecting it to get called by a method dispatcher. In early versions of Perl 6, it's likely to be completely ignored, I suspect. (If not, it's likely to be completely broken...)

PRE/POST/FIRST/LAST/etc.

These all-caps traits are generally set from the inside of a subroutine as special blocks. FIRST and LAST are expected to have side effects. PRE and POST are expected to not have side effects, but return a boolean value indicating whether pre/post conditions have been met. If you declare any PRE or POST conditions, your routine will automatically be wrapped in a wrapper that evaluates them according to Design-by-Contract principles (ORing preconditions, ANDing postconditions).

Note that the actual "first" or "last" property attached to a subroutine may well be a list of FIRST or LAST blocks, since there can be more than one of them.

Overriding built-ins

All built-in functions that can be overridden are either multimethods or global subroutines. To override one of these, just declare your own subroutine of that name in your current package or lexical scope. For instance, the standard non-filehandle print function may well be declared as:


    multi *print (*@list) {...}

Just declare your own sub:


    sub print (*@list) {...}

to override all print multimethods in the current package, or:


    my sub print (*@list) {...}

to override in the current lexical scope.

To override or wrap a built-in function for everyone (dangerous), you have to play with the globally named version, but we're not going to tell you how to do that. If you can't figure it out, you shouldn't be doing it.

Subs with special parsing

Any macro can have special parsing rules if you use the is parsed trait. But some subs are automatically treated specially.

Operator subroutines

In Perl 6, operators are just subroutines with special names. When you say:


    -$a + $b

you're really doing this internally:


    infix:+( prefix:-($a), $b)

Operator names start with one of four names followed by a colon:


    prefix:     a unary prefix operator
    infix:      a binary infix operator
    postfix:    a unary postfix operator
    circumfix:  a bracketing operator

Everything after the colon and up to the next whitespace or left parenthesis will be taken as the spelling of the actual operator. Unicode is specifically allowed. The null operator is not allowed, so if the first thing after the colon is a left parenthesis, it is part of the operator, and if the first thing is whitespace, it's an illegal name. Boom!

You can make your own lexically scoped operators like this:


    my sub postfix:! (Int $x) { return factorial($x) }
    print 5!, "\n";     # print 120

You can use a newly declared operator recursively as soon as its name is introduced, including in its own definition:


    my sub postfix:! (Int $x) { $x<=1 ?? 1 :: $x*($x-1)! }

You can declare multimethods that create new syntax like this:


    multi postfix:! (Int $x) { $x<=1 ?? 1 :: $x*($x-1)! }

However, regardless of the scope of the name, the new syntax is considered to be a lexically scoped declaration, and is only valid after the name is declared (or imported) and after any precedence traits have been parsed.

If you want to specify a precedence, you always do it relative to some existing operator:


    multi infix:coddle   (PDL $a, PDL $b) is equiv(&infix:+) { ... }
    multi infix:poach    (PDL $a, PDL $b) is looser(&infix:+) { ... }
    multi infix:scramble (PDL $a, PDL $b) is tighter(&infix:+) { ... }

If you base a tighter operator on a looser one, or a looser one on a tighter one, you don't get back to where you were. It always goes into the cracks no matter how many times you derive.

Just a note on implementation: if you've played with numerically oriented precedence tables in the past, you may be thinking, "but he'll run out of bits in his number eventually." The answer to that is that we don't use precedence numbers. The actual precedence level can be represented internally by an arbitrarily long string of bytes that are compared byte by byte. When you make a tighter or looser operator, the string just gets one byte longer. A looser looser looser looser infix:* is still tighter than a tighter tighter tighter tighter infix:+, because the string comparison bails out on the first byte. The first byte compares the built-in multiplication operator against the built-in addition operator, and those are already different, so we don't have to compare any more.

However, two operators derived by the same path have the same precedence. All binary operators of a given precedence level are assumed to be left associative unless declared otherwise with an assoc('right') or assoc('non') trait. (Unaries pay no attention to associativity--they always go from the outside in.)

This may sound complicated, and it is, if you're implementing it internally. But from the user's point of view, it's much less complicated than trying to keep track of numeric precedence levels yourself. By making the precedence levels relative to existing operators, we keep the user from having to think about how to keep those cracks open. And most user-defined operators will have exactly the same precedence as something built-in anyway. Not to mention the fact that it's just plain better documentation to say that an operator works like a familiar operator such as "+". Who the heck can remember what precedence level 17 is, anyway?

If you don't specify a precedence on an operator, it will default to something reasonable. A named unary operator, whether prefix or postfix, will default to the same precedence as other named unary operators like abs(). Symbolic unaries default to the same precedence as unary + or - (hence the ! in our factorial example is tighter than the * of multiplication.) Binaries default to the same precedence as binary + or -. So in our coddle example above, the is equiv(&infix:+) is completely redundant.

Unless it's completely wrong. For multimethods, it's an error to specify two different precedences for the same name. Multimethods that overload an existing name will be assumed to have the same precedence as the existing name.

You'll note that the rules for the scope of syntax warping are similar to those for macros. In essence, these definitions are macros, but specialized ones. If you declare one as a macro, the body is executed at compile time, and returns a string, a parse tree, or a closure just as a macro would:


    # define Pascal comments:
    macro circumfix:(**) () is parsed(/.*?/) { "" }
                                # "Comment? What comment?"

A circumfix operator is assumed to be split symmetrically between prefix and postfix. In this case the circumfix of four characters is split exactly in two, but if you don't want it split in the middle (which is particularly gruesome when there's an odd number of characters) you may specify exactly where the parse rule is interpolated with a special ... marker, which is considered part of the name:


    macro circumfix:(*...*) () is parsed(/.*?/) { "" }

The default parse rule for a circumfix is an ordinary Perl expression of lowest precedence, the same one Perl uses inside ordinary parentheses. The defaults for other kinds of operators depend on the precedence of the operator, which may or may not be reflected in the actual name of the grammatical rule.

Note that the ternary operator ??:: has to be parsed as an infix ?? operator with a special parsing rule to find the associated :: part. I'm not gonna explain that here, partly because user-defined ternary operators are discouraged, and partly because I haven't actually bothered to figure out the details yet. This Apocalypse is already late enough.

Also please note that it's perfectly permissible (but not extremely expeditious) to rapidly reduce the Perl grammar to a steaming pile of gopher guts by redefining built-in operators such as commas or parentheses.

Named unaries

As in Perl 5, a named unary operator by default parses with the same precedence as all other named unary operators like sleep and rand. Any sub declared with a single scalar argument counts as a named unary, not just explicit operator definitions. So it doesn't really matter whether you say:


    sub plaster ($x) {...}

or:


    sub prefix:plaster ($x) {...}

Argumentless subs

As in Perl 5, a 0-ary subroutine (one with a () signature) parses without looking for any argument at all, much like the time built-in. (An optional pair of empty parens is allowed on the call, as in time().) Constant subs with a null signature will likely be inlined as they are in Perl 5, though the preferred way to declare constants will be as standard variables with the is constant trait.
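
For example (a sketch):


    sub now () { time }    # 0-ary: parsed without looking for an argument
    print now, "\n";       # bare call, no parens needed
    print now(), "\n";     # the optional empty parens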

Matching of forward declarations

If you define a subroutine for which you earlier had a stub declaration, its signature and traits must match the stub's subroutine signature and traits, or it will be considered to be declaring a different subroutine of the same name, which may be any of illegal, immoral, or fattening. In the case of standard subs, it would be illegal, but in the case of multimethods, it would merely be fattening. (Well, you'd also get a warning if you called the stub instead of the "real" definition.)

The declaration and the definition should have the same defaults. That does not just mean that they should merely look the same. If you say:


    our $x = 1;
    sub foo ($y = $x) {...}             # default to package var

    {
        my $x = 2;
        sub foo ($y = $x) { print $y }  # default to lexical var
        foo();
    }

then what you've said is an error if the compiler can catch it, and is erroneous if it can't. In any event, the program may correctly print any of these values:


    1
    2
    1.5
    12
    1|2
    (1,2)
    Thbthbthbthth...
    1|2|1.5|12|(1|2)|(1,2)|Thbthbthbthth...

Lvalue subroutines

The purpose of an lvalue subroutine is to return a "proxy"--that is, to return an object that represents a "single evaluation" of the subroutine while actually allowing multiple accesses within a single transaction. An lvalue subroutine has to pretend to be a storage location, with all the rights, privileges, and responsibilities pertaining thereto. But it has to do this without repeatedly calculating the identity of whatever it is you're actually modifying underneath--especially if that calculation entails side effects. (Or is expensive--meaning that it has the side-effect of chewing up computer resources...)

An lvalue subroutine is declared with the is rw trait. The compiler will take whatever steps necessary to ensure that the returned value references a storage location that can be treated as an lvalue. If you merely return a variable (such as an object attribute), that variable can act as its own proxy. You can also return the result of a call to another lvalue subroutine or method. If you need to do pre- or post-processing on the "public" value, however, you'll need to return a tied proxy variable.

But if you know how hard it is to tie variables in Perl 5, you'll be pleasantly surprised that we're providing some syntactic relief for the common cases. In particular, you can say something like:


    my sub thingie is rw {
        return my $var
            is Proxy( for => $hidden_var,
                      FETCH => { ... },
                      STORE => { ... },
                      TEMP  => { ... },
                      ...
            );
    }

in order to generate a tie class on the fly, and only override the standard proxy methods you need to, while letting others default to doing the standard behavior. This is particularly important when proxying things like arrays and hashes that have oodles of potential service routines.

But in particular, note that we want to be able to temporize object attributes, which is why there's a TEMP method in our proxy. In Perl 5 you could only temporize (localize) variables. But we want accessors to be usable exactly as if they were variables, which implies that temporization is part of the interface. When you use a temp or let context specifier:


    temp $obj.foo = 42;
    let $obj.bar = 43;

the proxy attribute returned by the lvalue method needs to know how to temporize the value. More precisely, it needs to know how to restore the old value at the end of the dynamic scope. So what the .TEMP method returns is a closure that knows how to restore the old value. As a closure, it can simply keep the old value in a lexical created by .TEMP. The same method is called for both temp and let. The only difference is that temp executes the returned closure unconditionally at end of scope, while let executes the closure conditionally only upon failure (where failure is defined as throwing a non-control exception or returning undef in scalar context or () in list context).

After the .TEMP method returns the closure, you never have to worry about it again. The temp or let will squirrel away the closure and execute it later when appropriate. That's where the real power of temp and let comes from--they're fire-and-forget operators.
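
For foreign state that can't take care of itself, a hand-rolled .TEMP might be sketched like this (purely illustrative):


    method TEMP {
        my $saved = $.value;            # squirrel the old value into a lexical
        return { $.value = $saved };    # closure that restores it when invoked
    }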

The standard Scalar, Array, and Hash classes also have a .TEMP method (or equivalent). So any such variable can be temporized, even lexicals:


    my $identity = 'Clark Kent';

    for @emergencies {
         temp $identity = 'SUPERMAN';   # still the lexical $identity
         ...
    }

    print $identity;    # prints 'Clark Kent'

We'll talk more about lvalues below in reference to various RFCs that espouse lvalue subs--all of which were rejected. :-)

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 06 for the latest information.

Temporizing any subroutine call

Lvalue subroutines have a special way to return a proxy that can be temporized, but sometimes that's overkill. Maybe you don't want an lvalue; you just want a subroutine that can do something temporarily in an rvalue context. To do that, you can declare a subroutine with a TEMP block that works just like the .TEMP method described earlier. The TEMP block returns a closure that will be called when the call to this function goes out of scope.

So if you declare a function with a TEMP property:


    sub setdefout ($x) {
        my $oldout = $*DEFOUT;
        $*DEFOUT = $x;
        TEMP {{ $*DEFOUT = $oldout }}
    }

then you can call it like this:


    temp setdefout($MYFILE);

and it will automatically undo itself on scope exit. One place where this might be useful is for wrappers:


    temp &foo.wrap({...})

The routine will automatically unwrap itself at the end of the current dynamic scope. A let would similarly put a hypothetical wrapper in place, but keep it wrapped on success.

The TEMP block is called only if you invoke the subroutine or method with temp or let. Otherwise the TEMP block is ignored. So if you just call:


    setdefout($MYFILE);

then the side-effects are permanent.

That being said...

I don't think we'll actually be using explicit TEMP closures all over the place, because I'd like to extend the semantics of temp and let such that they automatically save state of anything within their dynamic scopes. In essence, Perl writes most of the TEMP methods for you, and you don't have to worry about them unless you're interfacing to external code or data that doesn't know how to save its own state. (Though there's certainly plenty of all that out in the wide world.)

See appendix C for more about this line of thought.

The RFCs

Let me reiterate that there's little difference between an RFC accepted with major caveats and a rejected RFC from which some ideas may have been stolen. Please don't take any of this personally--I ignore author names when evaluating RFCs.

Rejected RFCs

RFC 59: Proposal to utilize * as the prefix to magic subroutines

There are several problems with doing this.

  • The * prefix is already taken for two other meanings. (It indicates a completely global symbol or a splatlist.) We could come up with something else, but we're running out of keyboard. And I don't think it's important enough to inflict a Unicode character on people.
  • It would be extra clutter that conveys little extra information over what is already conveyed by all-caps.
  • All-caps routines are a fuzzy set. Some of these routines are always called implicitly, while others are only usually called implicitly. We'd have to be continually making arbitrary decisions on where to cut it off.
  • Some routines are in the process of migrating into (or out of) the core. We don't want to force people to rewrite their programs when that happens.
  • People are already used to the all-caps convention.
  • Most importantly, I have an irrational dislike for anything that resembles Python's __foo__ convention. :-)

So we'll continue to half-heartedly reserve the all-caps space for Perl magic.

RFC 75: structures and interface definitions

In essence, this proposal turns every subroutine call into a constructor of a parameter list object. That's an interesting way to look at it, but the proposed notation for class declaration suffers from some problems. It's run-time rather than compile-time, and it's based on a value list rather than a statement list. In other words, it's not what we're gonna do, because we'll have a more standard-looking way of declaring classes. (On the other hand, I think the proposed functionality can probably be modeled by suitable use of constructors.)

The proposal also runs afoul of the rule that a lexically scoped variable ought generally to be declared explicitly at the beginning of its lexical scope. The parameters to subroutines will be lexically scoped in Perl 6, so there needs to be something equivalent to a my declaration at the beginning.

Unifying parameter passing with pack/unpack syntax is, I think, a false economy. pack and unpack are serialization operators, while parameter lists are about providing useful aliases to caller-provided data without any implied operation. The fact that both deal with lists of values on some level doesn't mean we should strain to make them the same on every level. That will merely make it impossible to implement subroutine calls efficiently, particularly since the Parrot engine is register-based, not stack-based as this RFC assumes. Register-based machines don't access parameters by offsets from the stack pointer.

RFC 107: lvalue subs should receive the rvalue as an argument

This would make it hard to dynamically scope an attribute. You'd have to call the method twice--once to get the old value, and once to set the new value.

The essence of the lvalue problem is that you'd like to separate the identification of the object from its manipulation. Forcing the new value into the same argument list as arguments meant to identify the object is going to mess up all sorts of things like assignment operators and temporization.

RFC 118: lvalue subs: parameters, explicit assignment, and wantarray() changes

This proposal has a similar problem in that it doesn't separate the identity from the operation.

RFC 132: Subroutines should be able to return an lvalue

This RFC proposes a keyword lreturn to return an lvalue.

I'd rather the lvalue hint be available to the compiler, I think, even if the body has not been compiled yet. So it needs to be declared in the signature somehow. The compiler would like to know whether it's even legal to assign to the subroutine. Plus it might have to deal with the returned value as a different sort of object.

At least this proposal doesn't confuse identification with modification. The lvalue is presumably an object with a STORE method that works independently of the original arguments. But this proposal also doesn't provide any mechanism to do postprocessing on the stored value.

RFC 149: Lvalue subroutines: implicit and explicit assignment

This is sort of the don't-have-your-cake-and-don't-eat-it-too approach. The implicit assignment doesn't allow for virtual attributes. The explicit assignment doesn't allow for delayed modification.

RFC 154: Simple assignment lvalue subs should be on by default

Differentiating "simple" lvalue subs is a problem. A user ought to just be able to say something fancy like


    temp $obj.attr += 3;

and have it behave right, provided .attr allows that.

Even with:


    $obj.attr = 3;

we have a real problem with knowing what can be done at compile time, since we might not know the exact type of $obj. Even if $obj is declared with a type, it's only an "isa" assertion. We could enforce things based on the declared type with the assumption that a derived type won't violate the contract, but I'm a little worried about large semantic changes happening just because one adds an optional type declaration. It seems safer that the untyped method behave just like the typed method, only with run-time resolution rather than compile-time resolution. Anything else would violate the principle of least surprise. So if it is not known whether $obj.attr can be an lvalue, it must be assumed that it can, and compiled with a mechanism that will work consistently, or throw a run-time exception if it can't.

The same goes for argument lists, actually. $obj.meth(@foo) can't assume that @foo is either scalar or list until it knows the signature of the .meth method. And it probably doesn't know that until dispatch time, unless it can analyze the entire set of available methods in advance. In general, modification of an invalid lvalue (an object without a write method, essentially) has to be handled by throwing an exception. This may well mean that it is illegal for a method to have an rw parameter!

Despite the fact that there are similar constraints on the arguments and on the lvalue, we cannot combine them, because the values are needed at different times. The arguments are needed when identifying the object to modify, since lvalue objects often act as proxies for other objects elsewhere. Think of subscripting an array, for instance, where the subscripts function as arguments, so you can say:


    $elem := @a[0][1][2];
    $elem = 3;

Likewise we should be able to say:


    $ref := a(0,1,2);
    $ref = 3;

and have $ref be the lvalue returned by a(). It's the implied "is rw" on the left that causes a() to return an lvalue, just as a subroutine parameter that is "rw" causes lvaluehood to be passed to its actual argument.

Since we can't in general know at compile time whether a method is "simple" or not, we don't know whether it's appropriate to treat an assignment as an extra argument or as a parameter to an internal STORE method. We have to compile the call assuming there's a separate STORE method on the lvalue object. Which means there's no such thing as a "simple" lvalue from the viewpoint of the caller.

Accepted RFCs

RFC 168: Built-in functions should be functions

This all seems fine to me in principle. All built-in functions and multimethods exist in the "*" space, so system() is really &*system() in Perl 6.

We do need to consider whether "sub system" changes the meaning of calls to system() earlier in the lexical scope. Or are built-ins imported as third-class keywords like lock() is in Perl 5? It's probably best if we detect the ambiguous situation and complain. A "late" definition of system() could be considered a redefinition; in fact, any definition of system() could be considered a redefinition. We could require "is redefined" or some such on all such redefinitions.

The "lock" situation arises when we add a new built-in, however. Do we want to force people to add in an "is redefined" where they didn't have to before? Worse, if their definition of "lock" is retroactive to the front of the file, merely adding "sub lock is redefined" is not necessarily good enough to become retroactive.

This is not a problem with my subs, since they have to be declared in advance. If we defer committing compilation of package-named subs to the end of the compilation unit, then we can just say that the current package overrides the "*" package. All built-ins become "third class" keywords in that case. But does that mean that a built-in can't override ordinary function-call syntax? Built-ins should at least be able to be used as list operators, but in Perl 5 you couldn't use your own sub as a list operator unless it was predeclared. Maybe we could relax that.

Since there are no longer any barewords, we can assume that any unrecognized word is a subroutine or method call of some sort even in the absence of parens. We could assume all such words are list operators. That works okay for overriding built-ins that actually *are* list operators--but not all of them are. If you say:


    print rand 1, 2;
    sub rand (*@x) { ... }

then it cannot be determined whether rand should be parsed as a unary operator ($) or as a list operator (*@).

Perl has to be able to parse its unary operators. So that code must be interpreted as:


    print rand(1), 2;

At that point in the parse, we've essentially committed to a signature of ($), which makes the subsequent sub declaration a redefinition with a different signature, which is illegal. But when someone says:


    print foo 1, 2;
    sub foo (*@x) { ... }

it's legal until someone defines &*foo($). We can protect ourselves from the backward compatibility problem by use of parens. When there are parens, we can probably defer the decision about the binding of the function's arguments to the end of the compilation. So either of:


    print foo(1), 2;
    sub foo (*@x) { ... }

or:


    print foo(1, 2);
    sub foo (*@x) { ... }

remain legal even if we later add a unary &*foo operator, as long as no other syntactic monkey business is going on with the function's args. So I think we keep the rule that says post-declared subs have to be called using parens, even though we could theoretically relax it.

On the other hand, this means that any unrecognized word followed by a list may unambiguously be taken to be a multimethod being called as a list operator. After all, we don't know when someone will be adding more multimethods. I currently think this is a feature, but I could be sadly mistaken. It has happened once or twice in the past.


RFC 57: Subroutine prototypes and parameters

We ended up with something like this proposal, though with some differences. Instead of =, we're using => to specify names because it's a pair constructor in Perl 6, so there's little ambiguity with positional parameters. Unless a positional parameter is explicitly declared with a Pair or Hash type, it's assumed not to be interested in named arguments.

Also, as the RFC points out, use of = would be incompatible with lvalue subs, which we're supporting.

The RFC allows for mixing of positional and named parameters, both in declaration and in invocation. I think such a feature would provide far more confusion than functionality, so we won't allow it. You can always process your own argument list if you want to. You could even install your own signature handler in place of Perl's.

The RFC suggests treating the first parameter with a default as the first optional parameter. I think I'd rather mark optional parameters explicitly, and then disallow defaults on required parameters as a semantic constraint.

It's also suggested that something like:


    sub waz ($one, $two, 
             $three = add($one, $two), 
             $four  = add($three, 1)) {
        ...
    }

be allowed, where defaults can refer back to previous parameters. It seems as though we could allow that, if we assume that symbols are introduced in signatures as soon as they are seen. That would be consistent with how we've said my variables are introduced. It does mean that a prototype that defaults to the prior $_ would have to be written like this:


    $myclosure = sub ($_ = $OUTER::_) { ... }

On the other hand, that's exactly what:


    $myclosure = { ... }

means in the absence of placeholder variables, so the situation will likely not arise all that often. So I'd say yes, defaults should be able to refer back to previous parameters in the same signature, unless someone thinks of a good reason not to.

As explained in Apocalypse 4, $OUTER:: is for getting at an outer lexical scope. This ruling about formal parameters means that, effectively, the lexical scope of a subroutine "starts to begin" where the formal parameters are declared, and "finishes beginning" at the opening brace. Whether a given symbol in the signature actually belongs to the inner scope or the outer scope depends on whether it's already been introduced by the inner scope. Our sub above needed $OUTER::_ because $_ had already been introduced as the name of the first argument. Had some other name been introduced, $_ might still be taken to refer to the outer $_:


    $myclosure = sub ($arg = $_) { ... }

If so, use of $OUTER::_ would be erroneous, because the subroutine's implicit $_ declaration wouldn't happen till the opening curly, and instead of getting $OUTER::_, the user would unexpectedly be getting $OUTER::OUTER::_, as it were. So instead, we'll say that the implicit introduction of the new sub's $_ variable always happens after the <subintro> and before the <signature>, so any use of $_ as a default in a signature or as an argument to a property can only refer to the subroutine's own topic, if any. To refer to any external $_ you must say either $CALLER::_ or $OUTER::_. This approach seems much cleaner.

RFC 160: Function-call named parameters (with compiler optimizations)

For efficiency, we have to be able to hoist the semantics from the signature into the calling module when that's practical, and that has to happen at compile time. That means the information has to be in the signature, not embedded in a fields() function within the body of the subroutine. In fact, my biggest complaint about this RFC is that it arbitrarily separates the prototype characters, the parameter names, and the variable names. That's a recipe for things getting out of sync.

Basically, this RFC has a lot of the right ideas, but just doesn't go far enough in the signature direction, based on the (at the time) laudable notion that we were interested in keeping Perl 6 as close to Perl 5 as possible. Which turned out not to be quite the case. :-) Our new signatures look more hardwired than the attribute syntax proposed here, but it's all still very hookable underneath via the sub and parameter traits. And everything is together that should be together.

Although the signature is really just a trait underneath, I thought it important to have special syntax for it, just as there's special syntax for the body of the function. Signatures are very special traits, and people like special things to look special. It's just more of those darn psychological reasons that keep popping up in the design of Perl.

Still and all, the current design is optimized for many of the same sensitivities described in this RFC.

RFC 128: Subroutines: Extend subroutine contexts to include name parameters and lazy arguments

This RFC also has lots of good ideas, but tends to stay a little too close to Perl 5 in various areas where I've decided to swap the defaults around. For instance, marking reference parameters in prototypes rather than slurpy parameters in signatures, identifying lazy parameters rather than flattening, and defaulting to rw (autovivifying lvalue args) rather than constant (rvalue args).

Context classes are handled by the automatic coercion to references within scalar context, and by type junctions.

Again, I don't buy into two-pass, fill-in-the-blanks argument processing.

Placeholders are now just for argument declaration, and imply no currying. Currying on the other hand is done with an explicit .assuming method, which requires named args that will be bound to the corresponding named parameters in the function being curried.
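
For instance (a sketch, assuming a log function that takes a named base parameter):


    my &log10 := &log.assuming(base => 10);    # pre-bind the (assumed) named parameter
    print log10(100);                          # as if you'd said log(100, base => 10)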

Or should I say functions? When module and class writers write systems of subroutines or methods, they usually go to great pains to make sure all the parameter names are consistent. Why not take advantage of that?

So currying might even be extended to classes or modules, where all methods or subs with a given argument name are curried simultaneously:


    my module MyIO ::= (use IO::Module).assuming(ioflags => ":crlf");
    my class UltAnswer ::= (use Answer a,b,c).assuming(answer => 42);

If you curry a class's invocant, it would turn the class into a module instead of another class, since there are no longer any methods if there are no invocants:


    my module UltAnswer ::=
        (use Answer a,b,c).assuming(self => new Answer: 42);

Or something like that. If you think this implies that there are class and module objects that can be sufficiently introspected to do this sort of chicanery, you'd be right. On the other hand, given that we'll have module name aliasing anyway to support running multiple versions of the same module, why not support multiple curried versions without explicit renaming of the module:


    (use IO::Module).assuming(ioflags => ":crlf");

Then for the rest of this scope, IO::Module really points to your aliased idea of IO::Module, without explicitly binding it to a different name. Well, that's for Apocalypse 11, really...

One suggestion from this RFC I've taken to heart, which is to banish the term "prototype". You'll note we call them signatures now. (You may still call Perl 5's prototypes "prototypes", of course, because Perl 5's prototypes really were a prototype of signatures.)

RFC 344: Elements of @_ should be read-only by default

I admit it, I waffled on this one. Up until the last moment, I was going to reject it, because I wanted @_ to work exactly like it does in Perl 5 in subs without a signature. It seemed like a nice sop towards backward compatibility.

But when I started writing about why I was rejecting it, I started thinking about whether a sig-less sub is merely a throwback to Perl 5, or whether we'll see it continue as a viable Perl 6 syntax. And if the latter, perhaps it should be designed to work right rather than merely to work the same. The vast majority of subroutines in Perl 5 refrain from modifying their arguments via @_, and it somehow seems wrong to punish such good deeds.

So I changed my mind, and the default signature on a sub without a signature is simply (*@_), meaning that @_ is considered an array of constants by default. This will probably have good effects on performance, in general. If you really want to write through the @_ parameter back into the actual arguments, you'll have to declare an explicit signature of (*@_ is rw).
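
In other words, these two declarations differ only in whether @_ writes back (a quick sketch):


    sub show             { print @_ }       # implicit (*@_): elements are constants
    sub bump (*@_ is rw) { @_[0] += 1 }     # explicit rw: modifies the actual argument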

The Perl5-to-Perl6 translator will therefore need to translate:


    sub {...}

to:


    sub (*@_ is rw) {...}

unless it can be determined that elements of @_ are not modified within the sub. (It's okay to shift a constant @_ though, since that doesn't change the elements passed to the call; remember that for slurpy arrays the implied "is constant" or explicit "is rw" distributes to the individual elements.)

RFC 194: Standardise Function Pre- and Post-Handling

Yes, this needs to be standardized, but we'll be generalizing to the notion of wrappers, which can automatically keep their pre and post routines in sync, and, more importantly, keep a single lexical scope across the related pre and post processing. A wrapper is installed with the .wrap method, which can have optional parameters to tell it how to wrap, and which can return an identifier by which the particular wrapper can be named when unwrapping or otherwise rearranging the wrappings. A wrapper automatically knows what function it's wrapped around, and invoking the call builtin automatically invokes the next level routine, whether that's the actual routine or another layer of wrapper. That does matter, because with that implicit knowledge call doesn't need to be given the name of the routine to invoke.
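
A sketch of that run-time interface (the unwrap method is my guess at a name; only .wrap and call are specified above):


    my $id = &foo.wrap({ print "before\n"; call; print "after\n" });
    &foo.unwrap($id);       # hypothetical: use the identifier to remove that wrapper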

The implementation depends on what happens to typeglobs in Perl 6: how does one inspect and modify the moral equivalent of the symbol table?

This is not really a problem, since we've merely split the typeglob up into separate entries.

Also: what will become of prototypes? Will it become possible to declare return types of functions?

Yes. Note that if you do introspection on a sub ref, by default you're going to get the signature and return type of the actual routine, not of any wrappers. There needs to be some method for introspecting the wrappers as well, but it's not the default.

As pointed out in [JP:HWS], certain intricacies are involved: what are the semantics of caller()? Should it see the prehooks? If yes, how?

It seems to me that sometimes you want to see the wrappers, and sometimes you don't. I think caller needs some kind of argument that says which levels to recognize and which levels to ignore. It's not necessarily a simple priority either. One invocation may want to find the innermost enclosing loop, while another might want the innermost enclosing try block. A general matching term will be supplied on such calls, defaulting to ignore the wrappers.

How does this relate to the proposed generalized want() [DC:RFC21]?

The want() function can be viewed as based on caller(), but with a different interface to the information available at the particular call level.

I worry that generalized wrappers will make it impossible to compile fast subroutine calls, if we always have to allow for run-time insertion of handlers. Of course, that's no slower than Perl 5, but we'd like to do better than Perl 5. Perhaps we can have the default be to have wrappable subs, and then turn that off with specific declarations for speed, such as "is inline".

RFC 271: Subroutines : Pre- and post- handlers for subroutines

I find it odd to propose using PRE for something with side effects like flock. Of course, this RFC was written before FIRST blocks existed...

On the other hand, it's possible that a system of PRE and POST blocks would need to keep "dossiers" of its own internal state independent of the "real" data. So I'm not exactly sure what the effective difference is between PRE and FIRST. But we can always put a PRE into a lexical wrapper if we need to keep info around till the POST. So we can keep PRE and POST with the semantics of simply returning boolean expressions, while FIRST and LAST are evaluated primarily for side effects.

You might think that you wouldn't need a signature on any pre or post handler, since it's gonna be the same as the primary. However, we have to worry about multimethods of the same name, if the handlers are defined outside of the subroutine. Again, embedding PRE and POST blocks either in the routine itself or inside a wrapper around the routine should handle that. (And turning the problem into one of being able to generate a reference to a multimethod with a particular signature, in essence, doing method dispatch without actually dispatching at the end.)

My gut feeling is that $_[-1] is a bad place to keep the return value. With the call interface we're proposing, you just harvest the return value of call if you're interested in it. Or perhaps this is a good place for a return signature to actually have formal variables bound to the return values.

Also, defining pre and post conditions in terms of exceptions is probably a mistake. If they're just boolean expressions, they can be ANDed and ORed together more easily in the approved DBC fashion.

We haven't specified a declarative form of wrapper, merely a .wrap method that you can call at run time. However, as with most of Perl, anything you can do at run time, you can also do at compile time, so it'd be fairly trivial to come up with a syntax that used a wrap keyword in place of a sub:


    wrap split(Regex ?$re, ?$src = $CALLER::_, ?$limit = Inf) {
        print "Entering split\n";
        call;
        print "Leaving split\n";
    }

I keep mistyping "wrap" as "warp". I suppose that's not so far off, actually...

RFC 21: Subroutines: Replace wantarray with a generic want function

Overall, I like it, except that it's reinventing several wheels. It seems that this has evolved into a powerful method for each sub to do its own overloading based on return type. How does this play with a more declarative approach to return types? I dunno. For now we're assuming multimethod dispatch only pays attention to argument types. We might get rid of a lot of calls to want if we could dispatch on return type as well. Perhaps we could do primary dispatch on the arguments and then do tie-breaking on return type when more than one multimethod has the same parameter profile.

I also worry a bit that we're assuming an interpreter here that can keep track of all the context information in a way suitable for searching by the called subroutine. When running on top of a JVM or CLR, this info might not be convenient to provide, and I'd hate to have to keep a descriptor of every call, or do some kind of double dispatch, just because the called routine might want to use want(), or might want to call another routine that might want to use want(), and so on. Maybe the situation is not that bad.

I sometimes wonder if want should be a method on the context object:


    given caller.want {...}

or perhaps the two could be coalesced into a single call:


    given context { ... }

But for the moment let's assume for readability that there's a want function distinct from caller, though with a similar signature:


    multi *want (?$where = &CALLER::_, Int +$skip = 0, Str +$label)
        returns WantContext {...}

As with caller, calling want with no arguments looks for the context of the currently executing subroutine or method. Like return, it specifically ignores bare blocks and routines interpreting bare blocks, and finds the context for the lexically enclosing explicit sub or method declaration, named by &_.

You'll note that unlike in the proposal, we don't pass a list to want, so we don't support the implicit && that is proposed for the arguments to want. But that's one of the re-invented wheels, anyway, so I'm not too concerned about that. What we really want is a want that works well with smart matching and switch statements.
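
So one might sketch it like this (illustrative only; the context type names are assumptions):


    given want {
        when List   { return 1..10 }    # caller wants a list (type name assumed)
        when Scalar { return 10 }       # caller wants a single value
    }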

RFC 23: Higher order functions

In general, this RFC proposes some interesting semantic sugar, but the rules are too complicated. There's really no need for special numbered placeholders. And the special ^_ placeholder is too confusing. Plus we really need regular sigils on our placeholder variables so we can distinguish $^x from @^x from %^x.

But the main issue is that the RFC is confusing two separate concepts (though that can be blamed on the languages this idea was borrowed from). Anyway, it turns out we'll have an explicit pre-binding method called .assuming for actual currying.

We'll make the self-declaring parameters a separate concept, called placeholder variables. They don't curry. Some of the examples of placeholders in the RFC are actually replaced by topics and junctions in our smart matching mode, but there are still lots of great uses for placeholder variables.
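
A quick sketch of the distinction:


    my &add := { $^a + $^b };   # placeholders self-declare two parameters
    print add(2, 3);            # prints 5 -- no currying involved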

RFC 176: subroutine / generic entity documentation

This would be trivial to do with declared traits and here docs. But it might be better to use a POD directive that is accessible to the program. An entity might even have implicit traits that bind to nearby chunks of the right sort. Maybe we could get Don Knuth to come up with something literate...

RFC 298: Make subroutines' prototypes accessible from Perl

While I'm all in favor of a sub's signature being available for inspection, this RFC goes beyond that to make indirection in the signature the norm. This seems to be a solution in search of a problem. I'm not sure the confusion of the indirection is worth the ability to factor out common parameter lists. Certainly parameter lists must have introspection, but using it to set the prototype seems potentially confusing. That being said, the signatures are just traits, so this may be one of those things that is permitted, but not advised, like shooting your horse in the middle of the desert, or chewing out your SO for burning dinner. Implicit declaration of lexically scoped variables will undoubtedly be considered harmful by somebody someday. [Damian says, "Me. Today."]

RFC 334: Perl should allow specially attributed subs to be called as C functions

Fine, Dan, you implement it. ;-)

Did I claim I ignore the names of RFC authors? Hmm.

The syntax for the suggested:


    sub foo : C_visible("i", "iii") {#sub body}

is probably a bit more verbose in real life:


    my int sub foo (int $a, int $b, int $c)
         is callable("C","Python","COBOL") { ... }

If we can't figure out the "i" and "iii" bits from introspection of the signature and returns traits, we haven't done introspection right. And if we're gonna have an optional type system, I can't think of a better place to use it than for interfaces to optional languages.

Acknowledgements

This work was made possible by a grant from the Perl Foundation. I would like to thank everyone who made this dissertation possible by their generous support. So, I will...

Thank you all very, very, very, very much!!!

I should also point out that I would have been stuck forever on some of these design issues without the repeated prodding (as in cattle) of the Perl 6 design team. So I would also like to publicly thank Allison, chromatic, Damian, Dan, Hugo, Jarkko, Gnat, and Steve. Thanks, you guys! Many of the places we said "I" above, I should have said "we".

I'd like to publicly thank O'Reilly & Associates for facilitating the design process in many ways.

I would also like to thank my wife Gloria, but not publicly.

Future Plans

From here on out, the Apocalypses are probably going to be coming out in priority order rather than sequential order. The next major one will probably be Apocalypse 12, Objects, though it may take a while since (like a lot of people in Silicon Valley) I'm in negative cash flow at the moment, and need to figure out how to feed my family. But we'll get it done eventually. Some Apocalypses might be written by other people, and some of them hardly need to be written at all. In fact, let's write Apocalypse 7 right now...

Apocalypse 7: Formats

Gone from the core. See Damian.


Appendix A: Rationale for pipe operators

As we pointed out in the text, the named form of passing a list has the disadvantage that you have to know what the formal parameter's name is. We could get around that by saying that a null name maps to the slurp array. In other words, we could define a => unary operator that creates a null key:


    stuff(@foo, =>(1,2,3))

We can at least lose the outer parens in this case:


    stuff @foo, =>(1,2,3)

But darn it, we can't get rid of those pesky inner parens because of the precedence of => with respect to comma. So perhaps it's time for a new operator with looser precedence than comma:


    stuff @foo *: 1,2,3         # * to match * zone marker
    stuff @foo +* 1,2,3         # put the * on the list side
    stuff @foo *=> 1,2,3        # or combine with => above
    stuff @foo ==> 1,2,3        # maybe just lengthen =>
    stuff @foo <== 1,2,3        # except the dataflow is to the left
    stuff @foo with 1,2,3       # could use a word

Whichever one we pick, it'd still probably want to construct a special pair internally, because we have to be able to use it indirectly:


    @args = (\@foo, '*@' => (1,2,3));
    stuff *@args;

But if we're going to have a special operator to switch explicitly to the list part, it really needs to earn its keep, and do more work. A special operator could also force scalar context on the left and list context on the right. So with implied scalar context we could omit the backslash above:


    @args = (@foo with 1,2,3);
    stuff *@args;

That's all well and good, and some language designers would stop right there, if not sooner. But if we think about this in relation to cascaded list operators, we'll see a different pattern emerging. Here's a left-to-right variant on the Schwartzian Transform:


    my @x := map {...} @input;
    my @y := sort {...} with @x;
    my @z := map {...} with @y;

When we think of data flowing left-to-right, it's more like a pipe operator from a shell, except that we're naming our pipes @x and @y. But it'd be nice not to have to name the temporary array values. If we do have a pipe operator in Perl, it's not going to be |, for two reasons. First, | is taken for junctions. Second, piping is a big, low-precedence operation, and I want a big fat operator that will show up to the eye. Of our candidate list above, I think the big, fat arrows really stand out, and look like directed pipes. So assuming we have the ==> operator to go with the <==, we could write our ST like this:


    @input     ==>
    map {...}  ==>
    sort {...} ==>
    map {...}  ==>
    push my @z;

That argues that the scalar-to-list transition operator should be <==:


    my @x := map {...} @input;
    my @y := sort {...} <== @x;
    my @z := map {...} <== @y;

And that means this should maybe dwim:


    @args = (@foo <== 1,2,3);
    stuff *@args;

Hmm.

That does imply that <== is (at least in this case) a data composition operator, unlike the ==> operator which merely sends the output of one function to the next. Maybe that's not a problem. But people might see:


    @x <== 1,2,3

and expect it to do assignment when in fact it doesn't. Internally it would really do something more like appending a named argument:


    @x, '*@' => (1,2,3)

or however we decide to mark the beginning of the "real" list within a larger list.

But I do rather like the looks of:


    push @foo <== 1,2,3;

not to mention the symmetrical:


    1,2,3 ==>
    push @foo;

Note however that the pointy end of ==> must be bound to a function that takes a list. You can't say:


    1,2,3 ==>
    my @foo;

because you can't say:


    my @foo <== 1,2,3;

Or rather, you can, if we allow:


    (@foo <== 1,2,3)

but it would mean the Wrong Thing. Ouch. So maybe that should not be legal. The asymmetry was bugging me anyway.

So let's say that <== and ==> must always be bound on their pointy end to a slurpy function, and if you want to build an indirect argument list, you have to use some kind of explicit list function such as args:


    @args = args @foo <== 1,2,3;
    stuff *@args;

The args function would really be a no-op, much like other context enforcers such as scalar and list. In fact, I'd be tempted to just use list like this:


    @args = list @foo <== 1,2,3;

But unless we can get people to see <== as a strange kind of comma, that will likely be misread as:


    @args = list(@foo) <== 1,2,3;

when it's really this:


    @args = list(@foo <== 1,2,3);

On the other hand, using list would cut out the need for yet another built-in, for which there is much to be said... I'd say, let's go with list on the assumption that people will learn to read <== as a pipe comma. If someone wants to use args for clarity, they can always just alias list:


    my &args ::= &*list;

More likely, they'll just use the parenthesized form:


    @args = list(@foo <== 1,2,3);

I suppose there could also be a prefix unary form, in case they want to use it without scalar arguments:


    @args = list(<== 1,2,3);

or in case they want to put a comma after the scalar arguments:


    @args = list(@foo, <== 1,2,3);

In fact, it could be argued that we should only have the unary form, since in this:


    stan @array, ollie <== 1,2,3

it's visually ambiguous whether the pointy pipe belongs to stan or ollie. It could be ambiguous to the compiler as well. With a unary operator, it unambiguously belongs to ollie. You'd have to say:


    stan @array, ollie, <== 1,2,3

to make it belong to stan. And yet, it'd be really strange for a unary <== to force the arguments to its left into scalar context if the operator doesn't govern those arguments syntactically. And I still think I want <== to do that. And it's probably better to disambiguate with parentheses anyway. So we keep it a binary operator. There's no unary variant, either prefix or postfix. You can always say:


    list( () <== 1,2,3 )
    list( @foo <== () )

Similarly, ==> is also always a binary operator. As the reverse of <==, it forces its left side into list context, and it also forces all the arguments of the list operator on the right into scalar context. Just as:


    mumble @foo <== @bar

tells you that @foo is in scalar context and @bar is in list context regardless of the signature of mumble, so too:


    @bar ==>
    mumble @foo

tells you exactly the same thing. This is particularly useful when you have a method with an unknown signature that you have to dispatch on:


    @bar ==>
    $objects[$x].mumble(@foo)

The ==> unambiguously indicates that all the other arguments to mumble are in scalar context. It also allows mumble's signature to check to see if the number of scalar arguments is within the correct range, counting only required and optional parameters, since we don't have to allow for extra arguments to slop into the slurp array.

If we do want extra list arguments, we could conceivably allow both kinds of pipe at once:


    @bar ==>
    $objects[$x].mumble(@foo <== 1,2,3)

If we did that, it could be equivalent to either:


    $objects[$x].mumble(@foo <== 1,2,3,@bar)

or:


    $objects[$x].mumble(@foo <== @bar,1,2,3)

Since I can argue it both ways, we'll have to disallow it entirely. :-)

Seriously, the conservative thing to do is to disallow it until we know what we want it to mean, if anything.

On the perl6-language list, an operator was discussed that would do argument rearrangement, but this is a little different in that it is constrained (by default) to operate only with the slurpy list part of the input to a function. This is as it should be, if you think about it. When you pipe things around in Unix, you don't expect the command line switches to come in via the pipe, but from the command line. The scalar arguments of a list operator function as the command line, and the list argument functions as the pipe.

That being said, if you want to pull the scalar arguments from the front of the pipe, we already have a mechanism for that:


    @args = list(@foo <== 1,2,3);
    stuff *@args;

By extension, we also have this:


    list(@foo <== 1,2,3) ==>
      stuff *();

So there's no need for a special syntax to put the invocant after all the arguments. It's just this:


    list(@foo <== 1,2,3) ==>
     $object.stuff *();

Possibly the *() could be inferred in some cases, but it may be better not to if we can't do it consistently. If stuff's signature started with optional positional parameters, we wouldn't know whether the pipe starts with positional arguments or list elements. I think that passing positionals at the front of the pipe is rare enough that it ought to be specially marked with *(). Maybe we can reduce it to a *, like a unary that has an optional argument:


    list(@foo <== 1,2,3) ==>
     $object.stuff *;

By the way, you may think that we're being silly calling these pipes, since we're just passing lists around. But remember that these can potentially be lazy lists produced by a generator. Indeed, a common idiom might be something like:


    <$*IN> ==> process() ==> print;

which arguably reads better than:


    print process <$*IN>;

Another possibility is that we extend the argumentless * to mark where the list goes in constructs that take lists but aren't officially list operators:


    1,2,3 ==>
    my @foo = (*)

But maybe we should just make:


    1,2,3 ==> my @foo;

do what people will expect it to. Since we require the list operator for the other usage, it's easy enough to recognize that this is not a list operator, and that we should therefore assign it. It seems to have a kind of inevitability about it.

Damian: "Certainly, if we don't support it, someone (*ahem*) will immediately write:


    multi infix:==> (Lazy $list, @array is rw) { @array = $list }
    multi infix:<== (@array is rw, Lazy $list) { @array = $list }

"So we might as well make it standard."

On the other hand...

I'm suddenly wondering if assignment and binding can change precedence on the right like list operators do if it's known we're assigning to a list. I, despite my credentials as TheLarry, keep finding myself writing list assignments like this:


    my @foo := 0..9,'a'..'z';

Oops. But what if it wasn't an oops. What if that parsed like a list operator, and slurped up all the commas to the right? Parens would still be required around a list on the left though. And it might break weird things like:


    (@a = (1,2), @b = (3,4))

But how often do you do a list assignment inside a list? On the other hand, making list assignment a different precedence than scalar is weird. But it'd have to be that way if we still wanted:


    ($a = 1, $b = 2)

to work as a C programmer expects. Still, I think I like it. In particular, it'd let us write what we mean explicitly:


    1,2,3 ==>
    my @foo = *;

So let's go ahead and do that, and then maybe someone (*ahem*) might just forget to overload the pipe operators on arrays.*

* The words "fat", "slim", and "none" come to mind.


Appendix B: How to bind strange arguments to weird parameters

It may seem like the declaration syntax has too many options (and it does), but it's actually saving you a good deal of complexity. When you say something excruciatingly fancy like this:


    method x ($me: $req, ?$opt, *%named, *@list, +$namedopt) {...}

you're actually getting a whole pile of semantics resembling this pseudocode:


    method x (*@list) {
        my %named;
        my $argnum = 0;

        # find beginning of named arguments
        while exists @list[$argnum] and @list[$argnum] !~ Pair|Hash {
            $argnum++;
            last if $argnum == 3;               # force transition to named or list
        }
        my $posmax = $argnum;

        # pull out named actuals (pairs or hashes)
        while exists @list[$argnum] and @list[$argnum] ~~ Pair|Hash {
            %named.add(@list[$argnum++].pairs);
        }

        # the invocant comes in like this
        my $_   := $posmax > 0 ?? @list[0] :: delete %named{me}  // die "Not enough args";
        my $me  := $_;                          # only if invocant declared

        # required parameters are bound like this
        my $req := $posmax > 1 ?? @list[1] :: delete %named{req} // die "Not enough args";

        # optional positional parameters are bound like this
        my $opt := $posmax > 2 ?? @list[2] :: delete %named{opt};

        # optional named parameters are bound like this
        my $namedopt := delete %named{namedopt};

        # trim @list down to just the remaining list
        splice(@list, 0, $argnum, ());

        if defined @list {
            die "Can't have two lists" if exists %named{'*@'};
        }
        else {
            @list := delete %named{'*@'} // [];
        }

        ...     # Your ad here.
    }

Only hopefully it runs a lot faster. Regardless, I know which version I'd rather write...

Or maintain...

You can get even more semantics than that if we need to process default values or do run-time type checking. It also gets hairier if you have any positional parameters declared as Pair or Hash. On the other hand, the compiler can probably optimize away lots of the linkage code in general, particularly when it can compare the actual arguments against the signature at compile time and, for instance, turn named arguments into positional arguments internally. Or prebuild a hash of the named args. Even if it can't do that, it could generate specially marked lists that already know where the named arguments start and stop so we don't have to scan for those boundaries. This gets easier if the caller marks the list part with <== or ==>. Though it gets harder again if they use splat to pass indirect positional arguments.

Note also that we don't necessarily have to build a real %named slurphash. The %named hash can just be a proxy for a function that scans those args known to contain named arguments, whether pairs or hashes. In general, although there may be quite a few optional parameters, most of them aren't set in the average call, so the brute-force approach of scanning the call list linearly for each possible parameter may well be faster than trying to build a real hash (particularly if any or all of the named parameters already come in as a hash).

It might be tricky to make bound named arguments disappear from the proxy hash, however. In the code above, you'll note that we actually delete named arguments from %named as we bind them to positional parameters. A proxy hash might have to figure out how to hide "used" values somehow. Or maybe we just leave them visible as aliases to bound parameters. I don't profess to know which is better. Could be a pragma for it...seems the usual cure for festering bogons these days...

In our pseudocode above, we don't ever actually evaluate the arguments of the entire list, because it could be a generated list like 1..Inf, and flattening that kind of list would chew up just a wee bit too much memory. If @list were an ordinary array, its boolean value would tell us if it will produce any values, but that's not really what we want. What we really want to know is whether the caller specified anything, not whether what they specified is going to produce any values. If you say:


    push @foo, 1..0;

the range doesn't generate any values, but you shouldn't look anywhere else for the list either. That is,


    1,2,3 ==>
    push @foo, 1..0;

should probably be an error. It's equivalent to saying:


    push @foo, '*@'=>(1,2,3), 1..0;

or some such. We try to catch that in our pseudocode above.

When you bind a lazy list to an array name such as @_ or @list, by default it's going to try to give the appearance that the array is all there, even if behind the scenes it is having to generate values for you. In this case, we don't want to flatten the list, so instead of trying to access any of the values of the variadic list, we just ask if it is defined. In Perl 6, an ordinary array is considered defined if it either has some flattened arguments in it already, or it has an associated list generator definition of how to produce more elements. We can figure this out without changing the state of the array.

Contrast this with the array's boolean value, which is true only if it is known that there are actual elements in the array. If an array has no remaining flattened elements but has a definition for how to produce more, the boolean evaluation must evaluate the definition sufficiently to determine whether there will be at least one more value. In the case of a range object, it can ask the range object without actually flattening another element, but in the limiting case of a random generator subroutine, it would have to go ahead and call the wretched generator to get the next flattened element, so that it can know to return false if there were no next element.

Note that even the flat view of the array doesn't necessarily flatten until you actually access the array, in which case it flattens as much as it needs to in order to produce the value you requested, and no more.
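
So the two tests might be sketched like this:


    my @lazy := 1..Inf;
    print defined @lazy;    # true: there's a generator, flattened or not
    print ?@lazy;           # true: at least one value can be produced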

We need a name for the list of internal generators bound to the array. Since they're behaving as specifications for the array, we'll get at them using the predefined .specs method that arrays support.

So, for instance, if you say:


    my @foo := (0..9,'a'..'z');

then:


    @foo.length

would return 36, but:


    @foo.specs.length

would return 2, one for each range object. (That's presuming you didn't already ask for the length of the array, since in general asking for the length of an array flattens it completely and blows away the specs--though perhaps in this case the range specs can calculate their lengths non-destructively.)

Anyway, in the absence of such a flattening event, both @foo and @foo.specs are true. However, if instead you'd given it a null range:


    my @foo := 1..0;

then @foo.specs would be true at least temporarily, but @foo would be false, because the flattened list contains no values.

Now here's where it gets interesting. As you process a flat array view, the corresponding specs mutate:


    my @flat = 1..10;
    shift @flat;
    print @flat.specs;   # prints 2..10

The specs aren't just a queue, but also a stack:


    my @flat = 1..10;
    pop @flat;
    print @flat.specs;   # prints 1..9

Note that you can pop an array without committing to flattening the entire list:


    my @flat = (1..Inf, 1..10);
    pop @flat;
    print @flat.specs;   # prints 1..Inf, 1..9

If you pop the array 9 more times, the resulting null spec pops itself from the specs list, and you get a single spec of 1..Inf out of @flat.specs. (Continuing to pop @flat returns Inf forever, of course, with no change to the spec.)

However, if you access the last element using the length of the array, it may try to flatten, and fail:


    my @flat = (1..Inf, 1..10);
    $last = @flat[@flat - 1];   # Kaboom!

Still, we should be able to detect the attempt to flatten an infinite list and give a better diagnostic than Perl 5's "Out of memory". Either that, or someone should just up and figure out how to subscript arrays using transfinite numbers.


Appendix C: Hypotheticality and Flight Recorders

[This is a portion of a letter I sent to the design team. This stuff is still in discussion with the internals folks, so please take this as informative rather than definitive. But I just thought you might like to see how sausage is made. :-) --Larry]


    : Seems like you're going to have to explain the C<TEMP>/C<RESTORE> 
    : relationship in A6, Larry, since C<RESTORE> isn't even mentioned
    : there at present.

I'd like to explain it primarily by making both of them unnecessary most of the time.


    : But maybe after a good nights sleep, eh? ;-)

Okay, I've had a night's sleep. Whether it was a good one remains to be seen. But here's what I'm after. Forget implementation for a moment. What's the interface that people want, in the abstract? You were starting to think this way yourself with "suppose {...}". So let's do some supposin'.

I'll talk about caller and callee here, but I'm really talking about the user's abstract view vs. the user's implementation view, so it applies to variables, lvalue routines, and rvalue routines alike.

On the caller side, people want to be able to make a temporary assumption or a hypothesis. There is some scope over which the hypothesis is stated, and then some scope over which the hypothesis is assumed. At the end of that scope, the hypothesis may or may not be retracted. (I'm trying not to state this in terms of temp and let, just to keep our current ideas out of it.)

Historically, the scope of the hypothesis statement is a single variable/value, because local only knew how to temporize that kind of thing. The scope of the hypothesis assumption has always extended to the end of the current dynamic scope.

In the very abstract view, supposing is a transactional function with two arguments, the first one of which establishes a scope in which any state change is labelled as provisional. The second argument establishes a scope in which we work out the ramifications of that supposing, which may include other supposings. In classical terms, they're the protasis and apodosis:


    suppose { <pro> } { <apo> }

At the end of the second scope we decide whether to succeed or fail. On failure, we unsuppose everything that was supposed from the beginning, and upon success, we allow certain new "facts" to leak out into a larger reality (which may itself be a hypothesis, but leave that aside for the moment). It's basically commit/rollback.

It could also be written:


    suppose <pro> {
        <apo>
    }

to make it look more like an if. But up till now we've written it:


    {
        temp <pro>;
        <apo>
    }

which actually works out fine as a syntax, since every statement is in a sense conditional on preceding statements. If we want to allow a hypothetical result to leak out, we use "let" instead of "temp". Whatever. I'm not caring about the syntax yet, just the abstract interface.

And the abstract interface wants both <pro> and <apo> to be as general as possible. We already have a completely general <apo>, but we've severely restricted the <pro> so far to be (in Perl 5) a storage location, or in Perl 6 (Seb), anything with a .TEMP method. You'd like to be able to turn anything involving state changes into a <pro>, but we can't. We can only do it to values that cooperate.

So the real question is what does cooperation look like from the "callee" end of things? What's the best interface for cooperating? I submit that the best interface for that does not look like TEMP => {}, or RESTORE {}. It looks like nothing at all!


    sub foo { $x = 1234; }
    $x = 0;
    {
        temp foo();
        print $x;       # prints 1234
    }
    print $x;           # prints 0

How might this work in practice? If Perl (as a language) is aware of when it is making a state change, and if it is also aware of when it is doing so in a hypothetical context (*any* hypothetical context in the dynamic scope), then Perl (as a language) can save its own record of that state change, filing it with the proper hypothetical context management authorities, to be undone (or committed) at the appropriate moment.

That's fine as long as we're running in Perl. Where an explicit TEMP method is useful is in the interface to foreign code or data that doesn't support dynamically scoped hypotheticality. If a Proxy is proxying for a Perl variable or attribute, however, then the STORE already knows its dynamic context, and handles temp and let implicitly just as any other Perl code running in hypothetical context would.

As for a hypothesis within a hypothesis, I think it just means that when you refrain from UNDOing the let state changes, you actually KEEP them into a higher undo list, if there is one. (In practice, this may mean there aren't separate LAST and UNDO lists. Just a LAST list, in which some entries do a KEEP or UNDO at the last moment. Otherwise a let within a let has to poke something onto both a keep list and an undo list. But maybe it comes out to the same thing.)

(In any event, we do probably need a name for the current innermost supposition we're in the dynamic scope of. I have my doubts that $?_ is that name, however. $0 is closer to it. Can thrash that out later.)

That's all very powerful. But here's where it borders on disruptive technology. I mentioned a while back the talk by Todd A. Proebsting on Disruptive Language Technologies. In it he projects which new disruptive language technologies will take over the world someday. The one that stuck in my head was the flight data recorder, where every state change for the last N instructions was recorded for analysis in case of failure. Sound familiar?

Taken together with my hypotheticality hypothesis, I think this likely indicates a two-birds-with-one-stone situation that we must design for. If state changes are automatically stored in a type-appropriate manner, we don't necessarily have to generate tons of artificial closures merely to create artificial lexical variables just so we have them around later at the right moment. I don't mind writing double closures for things like macros, where they're not in hot code. But let and friends need to be blazing fast if we're ever going to use Perl for logic programming, or even recursive descent parsing. And if we want a flight data recorder, it had better not hang on the outside of the airplane where it'll induce drag.

And that's what I think is wrong with our Sebastopolian formulation of .TEMP. Am I making any sense?

Larry

Improving mod_perl Sites' Performance: Part 8

In this article, we continue talking about how to optimize your site for performance without touching code or buying new hardware. A few simple httpd.conf configuration changes can improve performance tremendously.

Choosing MinSpareServers, MaxSpareServers and StartServers

With mod_perl enabled, it might take as much as 20 seconds from the time you start the server until it is ready to serve incoming requests. This delay depends on the OS, the number of preloaded modules and the process load of the machine. It's best to set StartServers and MinSpareServers to high numbers, so that if you get a high load just after the server has been restarted, the fresh servers will be ready to serve requests immediately. With mod_perl, it's usually a good idea to raise all three variables higher than normal.

In order to maximize the benefits of mod_perl, you don't want to kill servers when they are idle; rather, you want them to stay up and available to handle new requests immediately. I think an ideal configuration is to set MinSpareServers and MaxSpareServers to similar values, maybe even the same. Having MaxSpareServers close to MaxClients will completely use all of your resources (if MaxClients has been chosen to take full advantage of the resources), but it'll make sure that at any given moment your system will be capable of responding to requests with the maximum speed (assuming that the number of concurrent requests is not higher than MaxClients).

Let's try some numbers. For a heavily loaded Web site and a dedicated machine, I would think of (note 400Mb is just for example):


  Available to webserver RAM:   400Mb
  Child's memory size bounded:  10Mb
  MaxClients:                   400/10 = 40 (larger with mem sharing)
  StartServers:                 20
  MinSpareServers:              20
  MaxSpareServers:              35

However, if I want to use the server for many other tasks, but make it capable of handling a high load, I'd try:


  Available to webserver RAM:   400Mb
  Child's memory size bounded:  10Mb
  MaxClients:                   400/10 = 40
  StartServers:                 5
  MinSpareServers:              5
  MaxSpareServers:              10

These numbers are taken off the top of my head, and shouldn't be used as a rule, but rather as examples to show you some possible scenarios. Use this information with caution.
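
Expressed as httpd.conf directives, the first (dedicated machine) scenario might look like the following sketch; as above, the numbers are illustrative assumptions, not recommendations:


  # dedicated machine, ~400Mb available to the webserver, ~10Mb per child
  MaxClients       40
  StartServers     20
  MinSpareServers  20
  MaxSpareServers  35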

Summary of Benchmarking to Tune All 5 Parameters

OK, we've run various benchmarks -- let's summarize the conclusions:

  • MaxRequestsPerChild
    If your scripts are clean and don't leak memory, then set this variable to a number as large as possible (10000?). If you use Apache::SizeLimit, then you can set this parameter to 0 (treated as infinity); see the configuration sketch after this list. You will want this parameter to be smaller if your code becomes gradually more unshared over the process' life. Apache::GTopLimit can also help, with its shared memory limitation feature.
  • StartServers
    If you keep a small number of servers active most of the time, then keep this number low. Keep it low especially if MaxSpareServers is also low, since if there is no load Apache will kill its children before they have been utilized at all. If your service is heavily loaded, then make this number close to MaxClients, and keep MaxSpareServers equal to MaxClients.
  • MinSpareServers
    If your server performs other work besides Web serving, then make this low so the memory of unused children will be freed when the load is light. If your server's load varies (you get loads in bursts) and you want fast response for all clients at any time, then you will want to make it high, so that new children will be respawned in advance and are waiting to handle bursts of requests.
  • MaxSpareServers
    The logic is the same as for MinSpareServers - low if you need the machine for other tasks, high if it's a dedicated Web host and you want a minimal delay between the request and the response.
  • MaxClients
    Not too low, so you don't get into a situation where clients are waiting for the server to start serving them (they might wait, but not for very long). However, do not set it too high. With a high MaxClients, if you get a high load, then the server will try to serve all requests immediately. Your CPU will have a hard time keeping up, and if the child size * number of running children is larger than the total available RAM, then your server will start swapping. This will slow down everything, which in turn will make things even slower, until eventually your machine will die. It's important that you take pains to ensure that swapping does not normally happen. Swap space is an emergency pool, not a resource to be used routinely. If you are low on memory and you badly need it, then buy it. Memory is cheap.

    But based on the test I conducted above, even if you have plenty of memory, as I do (1Gb), increasing MaxClients will sometimes give you no performance improvement. The more clients that are running, the more CPU time is required and the fewer CPU time slices each process receives. The response latency (the time to respond to a request) grows, so you won't see the expected improvement. The best approach is to find the minimum requirement for your kind of service and the maximum capability of your machine, then start at the minimum and test as I did, successively raising this parameter until you find the region on the curve of latency and/or throughput against MaxClients where the improvement starts to diminish. Stop there and use that value. When you make the measurements on a production server, you will be able to tune the parameters more precisely, since you will see the real numbers.

    Don't forget that if you add more scripts, or even just modify the existing ones, then the processes will grow in size as you compile in more code. When you do this, your parameters probably will need to be recalculated.
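
The Apache::SizeLimit arrangement mentioned under MaxRequestsPerChild above might be configured roughly as follows (a sketch assuming mod_perl 1.x; the 12000KB limit is an arbitrary illustrative value):


  # let Apache::SizeLimit retire oversized children instead of
  # blindly recycling them with MaxRequestsPerChild
  MaxRequestsPerChild 0

  <Perl>
    use Apache::SizeLimit;
    # kill a child once its total size exceeds ~12Mb
    $Apache::SizeLimit::MAX_PROCESS_SIZE = 12000;  # in KB
  </Perl>
  PerlFixupHandler Apache::SizeLimit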

KeepAlive

If your mod_perl server's httpd.conf includes the following directives:


  KeepAlive On
  MaxKeepAliveRequests 100
  KeepAliveTimeout 15

you have a real performance penalty, since after completing the processing for each request, the process will wait for KeepAliveTimeout seconds before closing the connection and will therefore not be serving other requests during this time. With this configuration, you will need many more concurrent processes on a server with high traffic.

If you use some server status reporting tools, then you will see the process in K status when it's in KeepAlive status.

The chances are that you don't want this feature enabled. Set it Off with:


  KeepAlive Off

The other two directives don't matter if KeepAlive is Off.

You might want to consider enabling this option if the client's browser needs to request more than one object from your server for a single HTML page. If this is the situation, then by setting KeepAlive On you will save the HTTP connection overhead for all requests but the first one for each page.

For example: If you have a page with 10 ad banners, which is not uncommon today, then your server will work more effectively if a single process serves them all during a single connection. However, your client will see a slightly slower response, since banners will be brought one at a time and not concurrently as is the case if each IMG tag opens a separate connection.

Since keepalive connections will not incur the additional three-way TCP handshake, turning it on will be kinder to the network.

SSL connections benefit the most from KeepAlive in cases where you haven't configured the server to cache session ids.

You have probably followed the usual advice to send all the requests for static objects to a plain Apache server. Since most pages include more than one unique static image, you should keep the default KeepAlive setting of the non-mod_perl server, i.e. keep it On. It will probably be a good idea also to reduce the timeout a little.

One option would be for the proxy/accelerator to keep the connection open to the client but make individual connections to the server, read the response, buffer it for sending to the client and close the server connection. Obviously, you would make new connections to the server as required by the client's requests.

Also, you should know that KeepAlive requests only work with responses that contain a Content-Length header. To send this header do:


  $r->header_out('Content-Length', $length);
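
In context, a complete handler response might look like this sketch (using the mod_perl 1.x Apache API; $body is just an illustrative variable):


  my $body = "Hello, world!\n";
  $r->content_type('text/plain');
  # KeepAlive can only be honoured if the client knows where the body ends
  $r->header_out('Content-Length', length $body);
  $r->send_http_header;
  $r->print($body);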

PerlSetupEnv Off

PerlSetupEnv Off is another optimization you might consider. This directive requires mod_perl 1.25 or later.

mod_perl fiddles with the environment to make it appear as if the script were being called under the CGI protocol. For example, the $ENV{QUERY_STRING} environment variable is initialized with the contents of Apache::args(), and the value returned by Apache::server_hostname() is put into $ENV{SERVER_NAME}.

But %ENV population is expensive. Those who have moved to the Perl Apache API no longer need this extra %ENV population, and can gain by turning it Off. Scripts using the CGI.pm module require PerlSetupEnv On because that module relies on a properly populated CGI environment table.

By default it is "On."

Note that you can still set environment variables. For example, when you use the following configuration:


  PerlSetupEnv Off
  PerlModule Apache::RegistryNG
  <Location /perl>
    PerlSetupEnv On
    PerlSetEnv TEST hi
    SetHandler perl-script
    PerlHandler Apache::RegistryNG
    Options +ExecCGI
  </Location>

and issue a request (for example http://localhost/perl/setupenvoff.pl) for this script:


  setupenvoff.pl
  --------------
  use Data::Dumper;
  my $r = Apache->request();
  $r->send_http_header('text/plain');
  print Dumper(\%ENV);

you should see something like this:


  $VAR1 = {
            'GATEWAY_INTERFACE' => 'CGI-Perl/1.1',
            'MOD_PERL' => 'mod_perl/1.25',
            'PATH' => '/usr/lib/perl5/5.00503:... snipped ...',
            'TEST' => 'hi'
          };

Notice that we have got the value of the environment variable TEST.

Reducing the Number of stat() Calls Made by Apache

If you watch the system calls that your server makes (using truss or strace) while processing a request, then you will notice that a few stat() calls are made. For example, when I fetch http://localhost/perl-status and I have my DocRoot set to /home/httpd/docs I see:


  [snip]
  stat("/home/httpd/docs/perl-status", 0xbffff8cc) = -1 
                      ENOENT (No such file or directory)
  stat("/home/httpd/docs", {st_mode=S_IFDIR|0755, 
                                 st_size=1024, ...}) = 0
  [snip]

If you have some dynamic content and your virtual relative URI is something like /news/perl/mod_perl/summary (i.e., there is no such directory on the web server, the path components are only used for requesting a specific report), then this will generate five(!) stat() calls, before the DocumentRoot is found. You will see something like this:


  stat("/home/httpd/docs/news/perl/mod_perl/summary", 0xbffff744) = -1 
                      ENOENT (No such file or directory)
  stat("/home/httpd/docs/news/perl/mod_perl",         0xbffff744) = -1
                      ENOENT (No such file or directory)
  stat("/home/httpd/docs/news/perl",                  0xbffff744) = -1
                      ENOENT (No such file or directory)
  stat("/home/httpd/docs/news",                       0xbffff744) = -1
                      ENOENT (No such file or directory)
  stat("/home/httpd/docs", 
                      {st_mode=S_IFDIR|0755, st_size=1024, ...})  =  0

How expensive are those calls? Let's use the Time::HiRes module to find out.


  stat_call_sample.pl
  -------------------
  use Time::HiRes qw(gettimeofday tv_interval);
  my $calls = 1_000_000;
  
  my $start_time = [ gettimeofday ];
  
  stat "/app" for 1..$calls;
  
  my $end_time = [ gettimeofday ];
  
  my $elapsed = tv_interval($start_time,$end_time) / $calls;
  
  print "The average execution time: $elapsed seconds\n";

This script takes a time sample at the beginning, then does 1,000,000 stat() calls to a nonexistent file, samples the time at the end and prints the average time it took to make a single stat() call. I sample a million stat()s so that the average result is reliable.

Before we actually run the script, we should distinguish between two different situations. When the server is idle, the time between the first and the last system call will be much shorter than the same time measured on a loaded system. On an idle system a process gets the CPU almost as soon as it wants it, while on a loaded system many processes compete for the CPU and each one has to wait longer for the same amount of CPU time.

So first we run the above code on the unloaded system:


  % perl stat_call_sample.pl
  The average execution time: 4.209645e-06 seconds

So it takes about 4 microseconds to execute a stat() call. Now let's start a CPU-intensive process in one console. The following code keeps the CPU busy all the time.


  % perl -e '1**1 while 1'

And now run the stat_call_sample.pl script in the other console.


  % perl stat_call_sample.pl
  The average execution time: 8.777301e-06 seconds

You can see that the average time has more than doubled (to about 8 microseconds). This is to be expected, since there were two processes competing for the CPU. Now let's run four instances of the above code:


  % perl -e '1**1 while 1' &
  % perl -e '1**1 while 1' &
  % perl -e '1**1 while 1' &
  % perl -e '1**1 while 1' &

And when running our script in parallel with these processes, we get:


  % perl stat_call_sample.pl
  2.0853558e-05 seconds

about 20 microseconds. So the average stat() system call now takes five times longer. Now, if you have 50 mod_perl processes that keep the CPU busy all the time, then the stat() call will be 50 times slower and a single call will take 0.2 milliseconds to complete. If you have five redundant calls, as in the strace example above, they add up to 1 millisecond. If you have more processes constantly consuming CPU, then this time adds up further. Now multiply this time by the number of requests your server handles and you lose a few seconds. As usual, this loss is insignificant for some services, and very significant for others.
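
To spell out the arithmetic behind those figures, using the ~4 microsecond baseline measured above:


  4 microseconds/call * 50         =  200 microseconds  =  0.2 ms per stat() call
  0.2 ms/call * 5 redundant calls  =  1 ms of stat() overhead per request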

So why does Apache make all these redundant stat() calls? You can blame the default installed TransHandler for this inefficiency. Of course, you could supply your own, which will be smart enough not to look for this virtual path and immediately return OK. But in cases where you have a virtual host that serves only dynamically generated documents, you can override the default PerlTransHandler with this one:


  <VirtualHost 10.10.10.10:80>
    ...
    PerlTransHandler  Apache::OK
    ...
  </VirtualHost>

As you can see, this affects only that specific virtual host.

This has the effect of short-circuiting the normal TransHandler processing of trying to find a filesystem component that matches the given URI -- no more 'stat's!

Watching your server under strace/truss can often reveal more performance hits than trying to optimize the code itself!

For example, unless configured correctly, Apache might look for the .htaccess file in many places, even if you don't have one, and make many unnecessary open() calls.

Let's start with this simple configuration. We will try to reduce the number of irrelevant system calls.


  DocumentRoot "/home/httpd/docs"
  <Location /app/test>
    SetHandler perl-script
    PerlHandler Apache::MyApp
  </Location>

The above configuration allows us to make a request to /app/test, and the Perl handler() defined in Apache::MyApp will be executed. Notice that in the test setup there is no file to be executed (as there would be with Apache::Registry), and there is no .htaccess file either.

This is a typical generated trace.


  stat("/home/httpd/docs/app/test", 0xbffff8fc) = -1 ENOENT 
        (No such file or directory)
  stat("/home/httpd/docs/app",      0xbffff8fc) = -1 ENOENT 
        (No such file or directory)
  stat("/home/httpd/docs", 
        {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
  open("/.htaccess", O_RDONLY)                 = -1 ENOENT 
        (No such file or directory)
  open("/home/.htaccess", O_RDONLY)            = -1 ENOENT 
        (No such file or directory)
  open("/home/httpd/.htaccess", O_RDONLY)      = -1 ENOENT 
        (No such file or directory)
  open("/home/httpd/docs/.htaccess", O_RDONLY) = -1 ENOENT 
        (No such file or directory)
  stat("/home/httpd/docs/test", 0xbffff774)    = -1 ENOENT 
        (No such file or directory)
  stat("/home/httpd/docs", 
        {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0

Now we modify httpd.conf, adding AllowOverride None to the <Directory> entry. This, among other things, disables .htaccess files, so Apache will not try to open them.


  <Directory />
    AllowOverride None
  </Directory>

We see that the four open() calls for .htaccess have gone:


  stat("/home/httpd/docs/app/test", 0xbffff8fc) = -1 ENOENT 
        (No such file or directory)
  stat("/home/httpd/docs/app",      0xbffff8fc) = -1 ENOENT 
        (No such file or directory)
  stat("/home/httpd/docs", 
        {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
  stat("/home/httpd/docs/test", 0xbffff774)    = -1 ENOENT 
        (No such file or directory)
  stat("/home/httpd/docs", 
        {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0

Let's try to shortcut the app location with:


  Alias /app /

This makes Apache look for the file in the / directory and not under /home/httpd/docs/app. Let's run it:


  stat("//test", 0xbffff8fc) = -1 ENOENT (No such file or directory)

Wow, we've got only one stat call left!

Let's remove the last Alias setting and use:


    PerlTransHandler  Apache::OK

as explained above. When we issue the request, we see no stat() calls. But this is possible only if you serve only dynamically generated documents, i.e. no CGI scripts. Otherwise, you will have to write your own PerlTransHandler to handle requests as desired.

For example, this PerlTransHandler will not look up the file on the filesystem if the URI starts with /app, but will use the default PerlTransHandler otherwise:


  PerlTransHandler 'sub { return shift->uri() =~ m|^/app| \
                        ? Apache::OK : Apache::DECLINED;}'

Let's see the same configuration using the <Perl> section and a dedicated package:


  <Perl>  
    package My::Trans;
    use Apache::Constants qw(:common);
    sub handler{
       my $r = shift;
       return OK if $r->uri() =~ m|^/app|;
       return DECLINED;
    }

    package Apache::ReadConfig;  
    $PerlTransHandler = "My::Trans";
  </Perl>

As you can see, we have defined the My::Trans package and implemented the handler() function, then assigned this handler to the PerlTransHandler.

Of course, you can move the module's code into an external file (e.g., My/Trans.pm) and configure the PerlTransHandler with


  PerlTransHandler My::Trans

in the normal way (no <Perl> section required).



This week on Perl 6, week ending 2003-03-02

Welcome back to another episode in the ongoing saga that is the Perl 6 development process (or at least my attempt to describe it).

We kick off with perl6-internals.

IMCC calling conventions

Piers Cawley attempted to describe tail call optimizations, why they were a good thing and why a caller saves calling convention made such optimizations easier (possible?). He wondered if he hadn't just succeeded in muddying the waters still further. Jerome Vouillon appeared to understand what Piers was going on about and pointed out that a caller saves scheme also helps with exception handling. Benjamin Goldberg wondered about Perl 5's goto &func semantics which can be looked at as a 'sort of' tail call (except you don't actually get much of an optimization with Perl 5 as it stands) and proposed a callee saves scheme for doing tail call optimization which didn't optimize away an unnecessary pair of save/restores. Dan pointed out that, while goto &func (which is sort of like tail call optimization in Perl 5) would have to be supported, tail call optimization made more sense if you didn't have to use any special syntax to make use of it.

http://groups.google.com/groups
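
For reference, Perl 5's goto &func looks like this (a minimal runnable sketch; the subroutine names are made up for illustration):


   # goto &func replaces the current call frame with a call to &log_args,
   # reusing the current @_ -- a 'sort of' tail call, though perl 5
   # performs no real stack optimization for it.
   sub log_args { print "args: @_\n" }

   sub dispatch {
       @_ = ('hello', 'world');
       goto &log_args;        # the caller sees log_args as if called directly
   }

   dispatch();                # prints "args: hello world"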

A couple of easy questions...

David (Cuny?) wondered how he could determine the data type of an arbitrary PMC and whether there were any pre-built Windows binaries of Parrot available. Leon Brocard pointed him at the typeof operator in answer to the first question but punted on the second. Leo Tötsch also pointed at typeof. David noted that it didn't seem to be available in his 0.0.9 installation and was recommended to use the CVS version, and discussion drifted toward wondering when Parrot 0.1.0 would see the light of day. (Not before at least one of either objects or exceptions is implemented apparently).

Nobody answered the 'pre built Windows binary' part of David's question.

http://groups.google.com/groups

More on optimizing the JIT with IMCC

In last week's summary I mentioned that Sean O'Rourke had suggested getting IMCC to store a control flow graph in bytecode, which the JIT could use to optimize things more effectively. Sean read this and pointed out that it wasn't his idea but was actually an area of active research and gave a pointer to some information. He also pointed to a USENIX paper which discussed adding a full data-flow compiler into a JVM which could then generate code that ran faster than a lightweight JIT, especially for long-running programs. Sean's original link was to a subscription only site but Jason 'Research wants to be free' Gloudon found a public version of the paper. Dan was fascinated, but was worried about availability of engineering time, not wishing to presume on Leo, Daniel and everyone who's done JIT work.

Dan said that he'd 'rather have a lower-overhead JIT with a win for shorter programs than a high-overhead one with a win for long-running programs'. Leo pointed out that, at the moment we already have a high overhead JIT with most of the cost paid at load time and showed some of these costs. Dan and Leo then discussed what kind of metadata would be needed from IMCC (or some external tool) in order to improve matters.

Meanwhile, the original 'Using IMCC as JIT optimizer' thread continued as Leo committed some more changes both to code and to the documentation in jit.pod. The new version of the JIT optimizing IMCC should be platform independent and apparently runs 95% of Parrot's tests on an i386.

Phil Hassey wondered why we even had a set number of registers in the VM in the first place. He wondered if it would be possible to have each block of code declare 'I need 12 registers for this block' and let the JIT system do the appropriate register spilling magic with the system registers. Leo said that this is approximately what the JIT optimizer does at the moment and outlined some of the problems associated with it.

Angel Faus had some questions and suggestions about the optimization approach that Leo was taking, with particular reference to the number of copies to and from memory, and proposed an efficient way forward. Nicholas Clark wondered if some of Angel's suggestions meant that imc (IMCC source code) had now usurped the role of parrot bytecode and muttered something about premature optimization and the lack of objects, exceptions, IO or a Z-code interpreter. Leo bridled slightly at 'premature optimization' and wondered what was important about a Z-code interpreter ('Z-code interpreter' is, according to Nicholas, 'obfuscated shorthand for ``dynamic opcode libraries'' and ``reading foreign bytecode''.')

Toward the end of the week, Leo checked in a final patch related to this experiment and commented that, to do things right, JIT optimization should move in the direction that Angel Faus had outlined.

http://groups.google.com/groups

http://citeseer.nj.nec.com/krintz01using.html -- Sean's first link

http://www-hydra.stanford.edu/publications/JVM02.pdf -- 'Free' paper

http://groups.google.com/groups -- IMCC as JIT Optimizer thread

http://groups.google.com/groups -- Leo's Last (-Oj) patch

Parrot 0.0.10 freeze

Steve Fink announced that it had been brought to his attention that we were overdue for another release and announced that he'd like to have a Parrot feature freeze on March 8, with a Parrot 0.0.10 release a week after that (or a Parrot 0.1.0 release if someone sneaks objects or exceptions in under the wire...).

Jerome Quelin wondered about a codename and Benjamin Goldberg commented that 'we don't have any of objects, exceptions, or a real IO system' and suggested that we use 'Kakapo', which is a large, flightless parrot. Garrett Goebel suggested the rather wonderful 'Orange Juice' in homage to Leo's recent work on the -Oj JIT optimization switch.

http://groups.google.com/groups

Dan's plans

Dan outlined his plan for the string rework (discussed last week). Nobody said anything, maybe they liked it.

Dan also outlined some requirements for the final Parrot build system; again, nobody commented.

And, right at the end of the week, he released his second attempt at an object spec. This one definitely got comments, but I'll cover them in the next summary.

http://groups.google.com/groups -- Strings

http://groups.google.com/groups -- Builds

http://groups.google.com/groups -- Objects

Steve Peters' Patches Prevent Parrot Peeves

Steve Peters released a flurry of patches to eliminate compilation warnings during Parrot compilation. Leo wasn't sure about one of them, but I assume that most of them will be applied at some point.

Meanwhile, in perl6-language

Things were even quieter than last week with all of 10 messages, one of which was last week's summary.

Arrays, lists, referencing

Paul Johnson had observed that changing the order of evaluation (of terms in a list) -- which is currently undefined in theory whilst being left to right in practice -- would almost certainly break a great deal. He suggested that it would be sensible for Perl 6 to define such an order.

Larry agreed, commenting that 'The fact that Perl 5 doesn't define it is merely an oversight, brought on no doubt by a lack of oversight. But as you point out it can be deduced by observation of all the various implementations of Perl.' Which made at least one person smile.

Predefined properties/traits/etc.

Last week, Simon Cozens asked that someone 'please compile a list of all the ``is foo'' properties that have been suggested/accepted as being pre-defined by the language.' as he couldn't keep track of them all.

This week someone (who also happened to be Simon Cozens) did just that. Allison Randal went through Simon's list commenting on what he'd got right and wrong, and explaining the general rule (properties and traits are lowercase, class names are capitalized; so is Foo means that something is a 'Foo', while is bar relates to a 'bar' property). The rest of the thread was taken up with confusion between Munich and Zurich.

http://groups.google.com/groups

Acknowledgements, Announcements and Apologies

I'm getting the feeling of someone sitting in the calm before the storm. perl6-language has been very quiet these last few weeks; I'm guessing that people don't want to distract Larry from the important business of producing an Apocalypse. Rumours abound of a draft in circulation among the Perl 6 core design team... Maybe this week will see its publication, and the level of discussion on the language list will rise once more. Until then, I'm enjoying the calm.

Still no American Odyssey web page. One day, I promise.

If you appreciated this summary, please consider one or more of the following options:

This week's summary was again sponsored by Darren Duncan. Thanks Darren. If you'd like to become a summary sponsor, drop me a line at p6summarizer@bofh.org.uk.
