Apocalypse 6

Subroutines

by Larry Wall
March 07, 2003

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 06 for the latest information.

This is the Apocalypse on Subroutines. In Perl culture the term "subroutine" conveys the general notion of calling something that returns control automatically when it's done. This "something" that you're calling may go by a more specialized name such as "procedure", "function", "closure", or "method". In Perl 5, all such subroutines were declared using the keyword sub regardless of their specialty. For readability, Perl 6 will use alternate keywords to declare special subroutines, but they're still essentially the same thing underneath. Insofar as they all behave similarly, this Apocalypse will have something to say about them. (And if we also leak a few secrets about how method calls work, that will make Apocalypse 12 all the easier--presuming we don't have to un-invent anything between now and then...)

Here are the RFCs covered in this Apocalypse. PSA stands for "problem, solution, acceptance", my private rating of how this RFC will fit into Perl 6. I note that none of the RFCs achieved unreserved acceptance this time around. Maybe I'm getting picky in my old age. Or maybe I just can't incorporate anything into Perl without "marking" it...


    RFC   PSA   Title
    ---   ---   -----
     21   abc   Subroutines: Replace C<wantarray> with a generic C<want>
                   function
     23   bcc   Higher order functions
     57   abb   Subroutine prototypes and parameters
     59   bcr   Proposal to utilize C<*> as the prefix to magic subroutines
     75   dcr   structures and interface definitions
    107   adr   lvalue subs should receive the rvalue as an argument
    118   rrr   lvalue subs: parameters, explicit assignment, and wantarray()
                   changes
    128   acc   Subroutines: Extend subroutine contexts to include name
                   parameters and lazy arguments
    132   acr   Subroutines should be able to return an lvalue
    149   adr   Lvalue subroutines: implicit and explicit assignment
    154   bdr   Simple assignment lvalue subs should be on by default
    160   acc   Function-call named parameters (with compiler optimizations)
    168   abb   Built-in functions should be functions
    176   bbb   subroutine / generic entity documentation
    194   acc   Standardise Function Pre- and Post-Handling
    271   abc   Subroutines : Pre- and post- handlers for subroutines
    298   cbc   Make subroutines' prototypes accessible from Perl
    334   abb   Perl should allow specially attributed subs to be called as C
                   functions
    344   acb   Elements of @_ should be read-only by default

In Apocalypses 1 through 4, I used the RFCs as a springboard for discussion. In Apocalypse 5 I was forced by the complexity of the redesign to switch strategies and present the RFCs after a discussion of all the issues involved. That was so well received that I'll try to follow the same approach with this and subsequent Apocalypses.

But this Apocalypse is not trying to be as radical as the one on regexes. Well, okay, it is, and it isn't. Alright, it is radical, but you'll like it anyway (we hope). At least the old way of calling subroutines still works. Unlike regexes, Perl subroutines don't have a lot of historical cruft to get rid of. In fact, the basic problem with Perl 5's subroutines is that they're not crufty enough, so the cruft leaks out into user-defined code instead, by the Conservation of Cruft Principle. Perl 6 will let you migrate the cruft out of the user-defined code and back into the declarations where it belongs. Then you will think it to be very beautiful cruft indeed (we hope).

Perl 5's subroutines have a number of issues that need to be dealt with. First of all, they're just awfully slow, for various reasons:

  • Construction of the @_ array
  • Needless prepping of potential lvalues
  • General model that forces lots of run-time processing
  • Difficulty of optimization
  • Storage of unneeded context
  • Lack of tail recursion optimization
  • Named params that aren't really
  • Object model that forces double dispatch in some cases

Quite apart from performance, however, there are a number of problems with usability:

  • Not easy to detect type errors at compile time
  • Not possible to specify the signatures of certain built-in functions
  • Not possible to define control structures as subroutines
  • Not possible to type-check any variadic args other than as a list
  • Not possible to have a variadic list providing scalar context to its elements
  • Not possible to have lazy parameters
  • Not possible to define immediate subroutines (macros)
  • Not possible to define subroutines with special syntax
  • Not enough contextual information available at run time
  • Not enough contextual information available at compile time

In general, the consensus is that Perl 5's simple subroutine syntax is just a little too simple. Well, okay, it's a lot too simple. While it's extremely orthogonal to always pass all arguments as a single variadic array, that mechanism does not always map well onto the problem space. So in Perl 6, subroutine syntax has blossomed in several directions.

But the most important thing to note is that we haven't actually added a lot of syntax. We've added some, but most of the new capabilities come in through the generalized trait/property system, and the new type system. But in those cases where specialized syntax buys us clarity, we have not hesitated to add it. (Er, actually, we hesitated quite a lot. Months, in fact.)

One obvious difference is that the sub on closures is now optional, since every brace-delimited block is now essentially a closure. You can still put the sub if you like. But it is only required if the block would otherwise be construed as a hash value; that is, if it appears to contain a list of pairs. You can force any block to be considered a subroutine with the sub keyword; likewise you can force any block to be considered a hash value with the hash keyword. But in general Perl just dwims based on whether the top-level is a list that happens to have a first argument that is a pair or hash:


    Block               Meaning
    -----               -------
    { 1 => 2 }          hash { 1 => 2 }
    { 1 => 2, 3 => 4 }  hash { 1 => 2, 3 => 4 }
    { 1 => 2, 3, 4 }    hash { 1 => 2, 3 => 4 }
    { %foo, 1 => 2 }    hash { %foo.pairs, 1 => 2 }

Anything else that is not a list, or does not start with a pair or hash, indicates a subroutine:


    { 1 }               sub { return 1 }
    { 1, 2 }            sub { return 1, 2 }
    { 1, 2, 3 }         sub { return 1, 2, 3 }
    { 1, 2, 3 => 4 }    sub { return 1, 2, 3 => 4 }
    { pair 1,2,3,4 }    sub { return 1 => 2, 3 => 4 }
    { gethash() }       sub { return gethash() }

This is a syntactic distinction, not a semantic one. The last two examples are taken to be subs despite containing functions returning pairs or hashes. Note that it would save no typing to recognize the pair method specially, since hash automatically does pairing of non-pairs. So we distinguish these:


    { pair 1,2,3,4 }    sub { return 1 => 2, 3 => 4 }
    hash { 1,2,3,4 }    hash { 1 => 2, 3 => 4 }

If you're worried about the compiler making bad choices before deciding whether it's a subroutine or hash, you shouldn't. The two constructs really aren't all that far apart. The hash keyword could in fact be considered a function that takes as its first argument a closure returning a hash value list. So the compiler might just compile the block as a closure in either case, then do the obvious optimization.
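The block-versus-hash rule above is mechanical enough to model in a few lines. What follows is a rough sketch in Python, not Perl 6 and not how any compiler actually does it. The names Pair, HashVar, Call, classify_block and pair_up are invented for illustration, and a block is represented simply as the list of its top-level items. The treatment of an empty block and of an odd leftover value are assumptions, since the text does not cover them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """Stands in for a pair such as `1 => 2` (hypothetical model type)."""
    key: object
    value: object


@dataclass(frozen=True)
class HashVar:
    """Stands in for a hash variable such as `%foo` (hypothetical model type)."""
    name: str


@dataclass(frozen=True)
class Call:
    """Stands in for a call such as `gethash()`; opaque to the syntactic check."""
    name: str


def classify_block(items):
    """Return "hash" if the first top-level item is a pair or hash, else "sub".

    Assumption: an empty block is not covered by the text; this model
    defaults it to "sub".
    """
    if items and isinstance(items[0], (Pair, HashVar)):
        return "hash"
    return "sub"


def pair_up(items):
    """Model `hash` pairing up non-pairs: existing pairs pass through,
    and plain values are joined two at a time."""
    pairs, pending = [], []
    for item in items:
        if isinstance(item, Pair):
            pairs.append(item)
        else:
            pending.append(item)
            if len(pending) == 2:
                pairs.append(Pair(*pending))
                pending = []
    if pending:
        # Assumption: the text does not say what an odd leftover value means.
        raise ValueError("odd number of non-pair values")
    return pairs


print(classify_block([Pair(1, 2), 3, 4]))   # models { 1 => 2, 3, 4 }
print(classify_block([1, 2, Pair(3, 4)]))   # models { 1, 2, 3 => 4 }
print(classify_block([Call("gethash")]))    # models { gethash() }
print(pair_up([1, 2, 3, 4]))                # models hash { 1,2,3,4 }
```

Note that classify_block never looks inside a Call. That is the point of calling the distinction syntactic: what the function returns at run time plays no part in the decision.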

Although we say the sub keyword is now optional on a closure, the return keyword only works with an explicit sub. (There are other ways to return values from a block.)

Subroutine Declarations

You may still declare a sub just as you did in Perl 5, in which case it behaves much like it did in Perl 5. To wit, the arguments still come in via the @_ array. When you say:


    sub foo { print @_ }

that is just syntactic sugar for this:


    sub foo (*@_) { print @_ }

That is, Perl 6 will supply a default parameter signature (the precise meaning of which will be explained below) that makes the subroutine behave much as a Perl 5 programmer would expect, with all the arguments in @_. It is not exactly the same, however. You may not modify the arguments via @_ without declaring explicitly that you want to do so. So in the rare cases that you want to do that, you'll have to supply the rw trait (meaning the arguments should be considered "read-write"):


    sub swap (*@_ is rw) { @_[0,1] = @_[1,0] };

The Perl5-to-Perl6 translator will try to catch those cases and add the parameter signature for you when you want to modify the arguments. (Note: we will try to be consistent about using "arguments" to mean the actual values you pass to the function when you call it, and "parameters" to mean the list of lexical variables declared as part of the subroutine signature, through which you access the values that were passed to the subroutine.)

Perl 5 has rudimentary prototypes, but Perl 6 type signatures can be much more expressive if you want them to be. The entire declaration is much more flexible. Not only can you declare types and names of individual parameters, you can add various traits to the parameters, such as rw above. You can add traits to the subroutine itself, and declare the return type. In fact, at some level or other, the subroutine's signature and return type are also just traits. You might even consider the body of the subroutine to be a trait.

Those of you who have been following Perl 6 development will wonder why we're now calling these "traits" rather than "properties". They're all really still properties under the hood, but we're trying to distinguish those properties that are expected to be set on containers at compile time from those that are expected to be set on values at run time. So compile-time properties are now called "traits". Basically, if you declare it with is, it's a trait, and if you add it onto a value with but, it's a property. The main reason for making the distinction is to keep the concepts straight in people's minds, but it also has the nice benefit of telling the optimizer which properties are subject to change, and which ones aren't.

A given trait may or may not be implemented as a method on the underlying container object. You're not supposed to care.

There are actually several syntactic forms of trait:


    rule trait :w {
          is <ident>[\( <traitparam> \)]?
        | will <ident> <closure>
        | of <type>
        | returns <type>
    }

(We're specifying the syntax here using Perl 6 regexes. If you don't know about those, go back and read Apocalypse 5.)

A <type> is actually allowed to be a junction of types:


    sub foo returns Int|Str {...}

The will syntax specifically introduces a closure trait without requiring the extra parens that is would. Saying:


    will flapdoodle { flap() and doodle() }

is exactly equivalent to:


    is flapdoodle({ flap() and doodle() })

but reads a little better. More typically you'll see traits like:


    will first { setup() }
    will last { teardown() }
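To make the four trait forms and the will/is equivalence concrete, here is a small recognizer sketched in Python rather than Perl 6. It is a deliberate simplification, not the real grammar: <ident> is assumed to be a plain word, <type> a word or a junction of words joined by |, and <traitparam> and <closure> are taken as whatever sits between the outermost parens or braces. The names IDENT, TYPE, TRAIT and parse_trait are invented for this sketch.

```python
import re

IDENT = r"[A-Za-z_]\w*"
TYPE = rf"{IDENT}(?:\|{IDENT})*"  # a junction of types, e.g. Int|Str

# One alternative per syntactic form of trait, in the order the rule lists them.
TRAIT = re.compile(
    rf"""
    \s*
    (?:
        is \s+ (?P<is_name>{IDENT}) (?: \s* \( (?P<is_param>.*) \) )?
      | will \s+ (?P<will_name>{IDENT}) \s* (?P<closure>\{{.*\}})
      | of \s+ (?P<of_type>{TYPE})
      | returns \s+ (?P<returns_type>{TYPE})
    )
    \s*$
    """,
    re.VERBOSE | re.DOTALL,
)


def parse_trait(text):
    """Return (kind, name_or_types, argument), or None if text is not a trait."""
    m = TRAIT.match(text)
    if m is None:
        return None
    if m.group("is_name"):
        return ("is", m.group("is_name"), m.group("is_param"))
    if m.group("will_name"):
        # `will NAME {...}` is normalized to the same result as `is NAME({...})`.
        return ("is", m.group("will_name"), m.group("closure"))
    kind = "of" if m.group("of_type") else "returns"
    types = m.group("of_type") or m.group("returns_type")
    return (kind, tuple(types.split("|")), None)


print(parse_trait("will flapdoodle { flap() and doodle() }"))
print(parse_trait("is flapdoodle({ flap() and doodle() })"))
print(parse_trait("returns Int|Str"))
```

The first two calls produce the same tuple, which is all that "exactly equivalent" means in this model.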

The final block of a subroutine declaration is the "do" trait. Saying:


    sub foo { ... }

is like saying:


    sub foo will do { ... }

Note however that the closure eventually stored under the do trait may in fact be modified in various ways to reflect argument processing, exception handling, and such.

We'll discuss the of and returns traits later when we discuss types. Back to syntax.
