Apocalypse 4
by Larry Wall

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 04 for the latest information.

Accepted RFCs

Note that, although these RFCs are in the "accepted" category, most are accepted with major caveats (a "c" acceptance rating), or at least some "buts" (a "b" rating). I'll try to list all those caveats here, but where there are systematic changes, I may indicate these generally in this document without attempting to rewrite the RFC in every detail. Those who implement these features must be sensitive to these systematic changes and not just uncritically implement everything the RFC says.

I'd like to talk about exceptions first, but before that I have to deal with the switch statement, because I think it's silly not to unify exception handlers with switch statements.

RFC 022: Control flow: Builtin switch statement

Some OO purists say that any time you want to use a switch statement, you ought to make the discriminant of the switch statement into a type, and use method dispatch instead. Fortunately, we are not OO purists here, so forget that argument.

Another argument against having a switch statement in Perl 6 is that we never had it in the first five versions of Perl. But it would be incorrect to say that we didn't miss it. What actually happened was that every time we started discussing how to add a switch statement, it wasn't obvious how far to go. A switch statement in Perl ought to do more than a switch statement in C (or in most any other language, for that matter). So the fact that we haven't added a switch statement so far says more about how hard it is to design a good one than about how much we wanted a lousy one. Eventually the ever inventive Damian Conway came up with his famous design, with a Perl 5 module as proof of concept, and pretty much everyone agreed that he was on the right track, for some definition of "right" (and "track"). This RFC is essentially that design (not surprisingly, since Damian wrote it), so it will be accepted, albeit with several tweaks.

In the first place, as a quasi-linguist, I loathe the keywords switch and case. I would prefer keywords that read better in English. Much as I love verbing nouns, they don't work as well as real verbs or real prepositions when topicalizers are called for. After thrashing over several options with Damian and other folks, we've settled on using given instead of switch, and when instead of case:

    given EXPR {
        when EXPR { ... }
        when EXPR { ... }
        ...
    }

The other great advantage of using different words is that people won't expect it to work exactly like any other switch statement they may be familiar with.

That being said, I should point out that it is still called "the switch statement", and the individual components are still "cases". But you don't have to put "switch" or "case" into constant-width font, because they're not keywords.

Because curlies are so extremely overloaded in Perl 5, I was at first convinced that we would need a separator of some sort between the expression and the block, maybe a : or => or some such. Otherwise it would be too ambiguous to come upon a left curly when expecting an operator--it would be interpreted as a hash subscript instead. Damian's RFC proposes to require parentheses in certain situations to disambiguate the expression.

But I've come to the conclusion that I'd rather screw around (a little) with the "insignificant whitespace" rule than to require an extra unnatural delimiter. If we observe current practice, we note that 99% of the time, when people write a hash subscript they do so without any whitespace before it. And 99% of the time, when they write a block, they do put some whitespace in front of it. So we'll just dwim it using the whitespace. (No, we're not going all the way to whole-hog whitespace dwimmery--Python will remain the best/worst example of that approach.)

Subscripts are the only valid use of curlies when an operator is expected. (That is, subscripts are essentially postfix operators.) In contrast, hash composers and blocks are terms, not operators. Therefore, we will make the rule that a left curly that has whitespace in front of it will never be interpreted as a subscript in Perl 6. (If you think this is a totally bizarre thing to do, consider that this new approach is actually consistent with how Perl 5 already parses variables within interpolated strings.) If there is any space before the curly, we force it to start a term, not an operator, which means that the curlies in question must delimit either a hash composer or a block. And it's a hash composer only if it contains a => pair constructor at the top level (or an explicit hash keyword on the front). Therefore it's possible to unambiguously terminate an expression by following it with a block, as in the constructs above.
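The whitespace rule can be sketched as a small classifier. This is only an illustration in Python, not part of any Perl 6 parser: the function name classify_left_curly is hypothetical, the sketch ignores string literals, comments, and the explicit hash keyword, and it assumes the caller has already established that an operator could be expected at an unspaced curly.

```python
def classify_left_curly(source, pos):
    """Classify the '{' at source[pos] as 'subscript', 'hash composer', or 'block'."""
    if source[pos] != "{":
        raise ValueError("pos must point at a left curly")
    # No whitespace in front of the curly: treat it as a postfix subscript.
    if pos > 0 and not source[pos - 1].isspace():
        return "subscript"
    # Whitespace (or start of input) in front: the curly starts a term.
    # It is a hash composer only if '=>' appears at the top nesting level.
    depth = 0
    i = pos
    while i < len(source):
        ch = source[i]
        if ch in "{([":
            depth += 1
        elif ch in "})]":
            depth -= 1
            if depth == 0:
                break
        elif depth == 1 and source.startswith("=>", i):
            return "hash composer"
        i += 1
    return "block"
```

For instance, the curly in `$foo{bar}` is classified as a subscript, the one in `if $foo { ... }` as a block, and the one in `$h = { a => 1 }` as a hash composer.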

Interestingly, this one tweak to the whitespace rule also means that we'll be able to simplify the parentheses out of other similar built-in constructs:

    if $foo { ... }
    elsif $bar { ... }
    else { ... }
    while $more { ... }
    for 1..10 { ... }

I think throwing out two required punctuation characters for one required whitespace is an excellent trade in terms of readability, particularly when it already matches common practice. (You can still put in the parens if you want them, of course, just for old times' sake.) This tweak also allows greater flexibility in how user-defined constructs are parsed. If you want to define your own constructs, they should be able to follow the same syntax rules as built-ins.

By a similar chain of logic (or illogic), I also want to tweak the whitespace rules for the trailing curly. There are severe problems in any C-derived language that allows user-defined constructs containing curlies (as Perl does). Even C doesn't entirely escape the head-scratching puzzle of "When do I put a semicolon after a curly?" A struct definition requires a terminating semicolon, for instance, while an if or a while doesn't.

In Perl, this problem comes up most often when people say "Why do I have to put a semicolon after do {} or eval {} when it looks like a complete statement?"

Well, in Perl 6, you don't, if the final curly is on a line by itself. That is, if you use an expression block as if it were a statement block, it behaves as one. The win is that these rules are consistent across all expression blocks, whether user-defined or built-in. Any expression block construct can be treated as either a statement or a component of an expression. Here's a block that is being treated as a term in an expression:

    $x = do {
        ...
    } + 1;

However, if you write

    $x = do {
        ...
    }
    + 1;

then the + will be taken erroneously as the start of a new statement. (So don't do that.)

Note that this special rule only applies to constructs that take a block (that is, a closure) as their last (or only) argument. Operators like sort and map are unaffected. However, certain constructs that used to be in the statement class may become expression constructs in Perl 6. For instance, if we change BEGIN to an expression construct we can now use a BEGIN block inside an expression to force compile-time evaluation of a non-static expression:

    $value = BEGIN { call_me_once() } + call_me_again();

On the other hand, a one-line BEGIN would then have to have a semicolon.

Anyway, back to switch statements. Damian's RFC proposes various specific kinds of dwimmery, and while some of those dwims are spot on, others may need adjustment. In particular, there is an assumption that the programmer will know when they're dealing with an object reference and when they're not. But everything will be an object reference in Perl 6, at some level or other. The underlying characteristics of any object are most generally determined by the answer to the question, "What methods does this object respond to?"

Unfortunately, that's a run-time question in general. But in specific, we'd like to be able to optimize many of these switch statements at compile time. So it may be necessary to supply typological hints in some cases to do the dwimmery efficiently. Fortunately, most cases are still fairly straightforward. A 1 is obviously a number, and a "foo" is obviously a string. But unary + can force anything to a number, and unary _ can force anything to a string. Unary ? can force a boolean, and unary . can force a method call. More complicated thoughts can be represented with closure blocks.

Another thing that needs adjustment is that the concept of "isa" matching seems to be missing, or at least difficult to express. We need good "isa" matching to implement good exception handling in terms of the switch mechanism. This means that we need to be able to say something like:

    given $! {
        when Error::Overflow { ... }
        when Error::Type { ... }
        when Error::ENOTTY { ... }
        when /divide by 0/ { ... }
        ...
    }

and expect it to check $!.isa(Error::Overflow) and such, along with more normal pattern matching. In the case of the actual exception mechanism, we won't use the keyword given, but rather CATCH:

    CATCH {
        when Error::Overflow { ... }
        when Error::Type { ... }
        when Error::ENOTTY { ... }
        when /divide by 0/ { ... }
        ...
    }

CATCH is a BEGIN-like block that can turn any block into a "try" block from the inside out. But the insides of the CATCH are an ordinary switch statement, where the discriminant is simply the current exception object, $!. More on that later--see RFC 88 below.

Some of you may recall that I've stated that Perl 6 will have no barewords. That's still the case. A token like Error::Overflow is not a bareword because it's a declared class. Perl 6 recognizes package names as symbolic tokens. So when you call a class method as Class::Name.method(), the Class::Name is actually a class object (that just happens to stringify to "Class::Name"). But the class method can be called without a symbolic lookup on the package name at run time, unlike in Perl 5.

Since Error::Overflow is just such a class object, it can be distinguished from other kinds of objects in a switch statement, and an "isa" can be inferred. It would be nice if we could go as far as to say that any object can be called with any class name as a method name to determine whether it "isa" member of that class, but that could interfere with use of class name methods to implement casting or construction. So instead, since switch statements are into heavy dwimmery anyway, I think the switch statement will have to recognize any Class::Name known at compile time, and force it to call $!.isa(Class::Name).
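As a rough analogy only, here is how the mix of "isa" cases and pattern cases might look in Python. The class names OverflowProblem and TypeProblem and the function catch are hypothetical stand-ins for Error::Overflow, Error::Type, and the CATCH block; the sketch assumes that a pattern case is matched against the exception's message.

```python
import re

# Hypothetical exception classes standing in for Error::Overflow and Error::Type.
class Error(Exception):
    pass

class OverflowProblem(Error):
    pass

class TypeProblem(Error):
    pass

def catch(exc):
    """Return the label of the first case that matches the exception."""
    cases = [
        (OverflowProblem, "overflow"),
        (TypeProblem, "type"),
        (re.compile(r"divide by 0"), "division"),
    ]
    for case, label in cases:
        if isinstance(case, type):
            # A class known up front is turned into an "isa" test.
            matched = isinstance(exc, case)
        else:
            # Anything else here is a pattern, matched against the message.
            matched = case.search(str(exc)) is not None
        if matched:
            return label
    return "unhandled"
```

Because the test is "isa" rather than exact type equality, an instance of a subclass of OverflowProblem also lands in the first case.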

Another possible adjustment will involve the use of switch statements as a means of parallelizing regular expression evaluation. Specifically, we want to be able to write parsers easily in Perl, which means that we need some way of matching a token stream against something like a set of regular expressions. You can think of a token stream as a funny kind of string. So if the "given" of a switch statement is a token stream, the regular expressions matched against it may have special abilities relating to the current parse's data structure. All the regular expressions of such a switch statement will likely be implicitly anchored to the current parse location, for instance. There may be special tokens referring to terminals and non-terminals. Basically, think of something like a yacc grammar, where alternative pattern/action grammar rules are most naturally expressed via switch statement cases. More on that in the next Apocalypse.

Another possible adjustment is that the proposed else block could be considered unnecessary. The code following the final when is automatically an "else". Here's a duodecimal digit converter:

    $result = given $digit {
        when "T" { 10 }
        when "E" { 11 }
        $digit;
    }

Nevertheless, it's probably good documentation to line up all the blocks, which means it would be good to have a keyword. However, for reasons that will become clearer when we talk about exception handlers, I don't want to use else. Also, because of the identification of when and if, it would not be clear whether an else should automatically supply a break at the end of its block as the ordinary when case does.

So instead of else, I'd like to borrow a bit more from C and use default:

    $result = given $digit {
        when "T" { 10 }
        when "E" { 11 }
        default  { $digit }
    }
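For comparison, the same duodecimal converter can be written as a minimal Python sketch, with each when becoming an early return and default becoming the final return. The function name duodecimal_digit is hypothetical; like the original, the default case hands the digit back unchanged.

```python
def duodecimal_digit(digit):
    """Map the duodecimal digits "T" and "E" to 10 and 11; pass others through."""
    if digit == "T":      # when "T" { 10 }
        return 10
    if digit == "E":      # when "E" { 11 }
        return 11
    return digit          # default { $digit }
```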

Unlike in C, the default case must come last, since Perl's cases are evaluated (or at least pretend to be evaluated) in order. The optimizer can often determine which cases can be jumped to directly, but in cases where that can't be determined, the cases are evaluated in order much like cascaded if/elsif/else conditions. Also, it's allowed to intersperse ordinary code between the cases, in which case the code must be executed only if the cases above it fail to match. For example, this should work as indicated by the print statements:

    given $given {
        print "about to check $first";
        when $first { ... }
        print "didn't match $first; let's try $next";
        when $next { ... }
        print "giving up";
        default { ... }
        die "panic: shouldn't see this";
    }

We can still define when as a variant of if, which makes it possible to intermix the two constructs when (or if) that is desirable. So we'll leave that identity in--it always helps people think about it when you can define a less familiar construct in terms of a more familiar one. However, the default isn't quite the same as an else, since else can't stand on its own. A default is more like an if that's always true. So the above code is equivalent to:

    given $given {
        print "about to check $first";
        if $given =~ $first { ...; break }
        print "didn't match $first; let's try $next";
        if $given =~ $next { ...; break }
        print "giving up";
        if 1 { ...; break; }
        die "panic: shouldn't see this";
    }
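The control flow of that expansion can be traced with a small Python sketch. It assumes the two cases are regular expressions (so re.search stands in for =~), and it records a trace instead of printing; run_switch and its parameter names are hypothetical. Each return plays the role of the implied break.

```python
import re

def run_switch(given, first, nxt):
    """Return the list of steps executed, in order, for one pass through the switch."""
    trace = ["about to check first"]
    if re.search(first, given):        # when $first { ...; break }
        trace.append("matched first")
        return trace                   # the implied break leaves the switch
    trace.append("didn't match first; let's try next")
    if re.search(nxt, given):          # when $next { ...; break }
        trace.append("matched next")
        return trace
    trace.append("giving up")
    if True:                           # default { ...; break }
        trace.append("default")
        return trace
    trace.append("panic: shouldn't see this")   # never reached
    return trace
```

Code between the cases runs only when every case above it has failed, which is why a successful first match yields a two-entry trace.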

We do need to rewrite the relationship table in the RFC to handle some of the tweaks and simplifications we've mentioned. The comparison of bare refs goes away. It wasn't terribly useful in the first place, since it only worked for scalar refs. (To match identities we'll need an explicit .id method in any event. We won't be relying on the default numify or stringify methods to produce unique representations.)

I've rearranged the table to be applied in order, so that default interpretations come later. Also, the "Matching Code" column in the RFC gave alternatives that aren't resolved. In these cases I've chosen the "true" definition rather than the "exists" or "defined" definition. (Except for certain set manipulations with hashes, people really shouldn't be using the defined/undefined distinction to represent true and false, since both true and false are considered defined concepts in Perl.)

Some of the table entries distinguish an array from a list. Arrays look like this:

    when [1, 3, 5, 7, 9] { "odd digit intersection" }
    when @array          { "array intersection" }

while a list looks like this:

    when 1, 3, 5, 7, 9    { "odd digit" }
    when @foo, @bar, @baz { "intersection with at least one array" }

Ordinarily lists and arrays would mean the same thing in scalar context, but when is special in differentiating explicit arrays from lists. Within a when, a list is a recursive disjunction. That is, the comma-separated values are treated as individual cases OR-ed together. We could use some other explicit notation for disjunction such as:

    when any(1, 3, 5, 7, 9) { "odd" }

But that seems a lot of trouble for a very common case of case, as it were. We could use vertical bars as some languages do, but I think the comma reads better.

Anyway, here's another simplification. The following table will also define how the Perl 6 =~ operator works! That allows us to use a recursive definition to handle matching against a disjunctive list of cases. (See the first entry in the table below.) Of course, for precedence reasons, to match a list of things using =~ you'll have to use parens:

    $digit =~ (1, 3, 5, 7, 9) and print "That's odd!";

Alternatively, you can look at this table as the definition of the =~ operator, and then say that the switch statement is defined in terms of =~. That is, for any switch statement of the form

    given EXPR1 {
        when EXPR2 { ... }
    }

it's equivalent to saying this:

    for (scalar(EXPR1)) {
        if ($_ =~ (EXPR2)) { ... }
    }

Table 1: Matching a switch value against a case value

    $a      $b        Type of Match Implied    Matching Code
    ======  =====     =====================    =============
    expr    list      recursive disjunction    match if $a =~ any($b)
    list    list      recursive disjunction*   match if any($a) =~ any($b)
    hash    sub(%)    hash sub truth           match if $b(%$a)
    array   sub(@)    array sub truth          match if $b(@$a)
    expr    sub($)    scalar sub truth         match if $b($a)
    expr    sub()     simple closure truth*    match if $b()
    hash    hash      hash key intersection*   match if grep exists $a{$_}, $b.keys
    hash    array     hash value slice truth   match if grep {$a{$_}} @$b
    hash    regex     hash key grep            match if grep /$b/, keys %$a
    hash    scalar    hash entry truth         match if $a{$b}
    array   array     array intersection*      match if any(@$a) =~ any(@$b)
    array   regex     array grep               match if grep /$b/, @$a
    array   number    array entry truth        match if $a[$b]
    array   expr      array as list            match if any($a) =~ $b
    object  class     class membership         match if $a.isa($b)
    object  method    method truth             match if $a.$b()
    expr    regex     pattern match            match if $a =~ /$b/
    expr    subst     substitution match       match if $a =~ subst
    expr    number    numeric equality         match if $a == $b
    expr    string    string equality          match if $a eq $b
    expr    boolean   simple expression truth* match if $b
    expr    undef     undefined                match unless defined $a
    expr    expr      run-time guessing        match if ($a =~ $b) at runtime

In order to facilitate optimizations, these distinctions are made syntactically at compile time whenever possible. For each comparison, the reverse comparison is also implied, so $a/$b can be thought of as either given/when or when/given. (We don't reverse the matches marked with * because it doesn't make sense in those cases.)

If the type of match cannot be determined at compile time, the default is to try to apply the very same rules in the very same order at run time, using the actual types of the arguments, not their compile-time type appearance. Note that there are no run-time types corresponding to "method" or "boolean". Either of those notions can be expressed at runtime as a closure, of course.
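To make the run-time fallback concrete, here is a hedged Python sketch of a subset of Table 1, applied in order on the actual types of the arguments. Several things are assumptions of the sketch rather than anything the table specifies: a Python tuple stands in for a list, a Python list for an array, a dict for a hash; only the $a/$b direction is implemented (no reversed comparisons); every sub is called with $a as its one argument; and the class test has to come before the sub test because Python classes are themselves callable.

```python
import re

Regex = type(re.compile(""))   # the compiled-pattern type

def smart_match(a, b):
    """Apply a subset of Table 1 in order, dispatching on run-time types."""
    if isinstance(b, tuple):                       # list: recursive disjunction
        return any(smart_match(a, item) for item in b)
    if isinstance(b, type):                        # class membership
        return isinstance(a, b)
    if callable(b):                                # sub truth
        return bool(b(a))
    if isinstance(a, dict):
        if isinstance(b, dict):                    # hash key intersection
            return any(key in a for key in b)
        if isinstance(b, list):                    # hash value slice truth
            return any(a.get(key) for key in b)
        if isinstance(b, Regex):                   # hash key grep
            return any(b.search(str(key)) for key in a)
        return bool(a.get(b))                      # hash entry truth
    if isinstance(a, list):
        if isinstance(b, list):                    # array intersection
            return any(smart_match(x, y) for x in a for y in b)
        if isinstance(b, Regex):                   # array grep
            return any(b.search(str(x)) for x in a)
        if isinstance(b, int) and not isinstance(b, bool):
            return -len(a) <= b < len(a) and bool(a[b])   # array entry truth
        return any(smart_match(x, b) for x in a)   # array as list
    if isinstance(b, Regex):                       # pattern match
        return b.search(str(a)) is not None
    if isinstance(b, bool):                        # simple expression truth
        return b
    if isinstance(b, (int, float)):                # numeric equality
        try:
            return float(a) == b
        except (TypeError, ValueError):
            return False
    if isinstance(b, str):                         # string equality
        return str(a) == b
    if b is None:                                  # undefined
        return a is None
    return a == b                                  # fallback for anything else
```

With this in place, `smart_match(7, (1, 3, 5, 7, 9))` plays the part of `$digit =~ (1, 3, 5, 7, 9)`: the tuple is treated as a disjunction and each element is tried in turn.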

In fact, whenever the default behavior is not what you intend, there are ways to force the arguments to be treated as you intend:

    Intent      Natural           Forced
    ======      =======           ======
    array       @foo              [list] or @{expr}
    hash        %bar              {pairlist} or %{expr}
    sub(%)      { %^foo.aaa }     sub (%foo) { ... }
    sub(@)      { @^bar.bbb }     sub (@bar) { ... }
    sub($)      { $^baz.ccc }     sub ($baz) { ... }
    number      numeric literal   +expr int(expr) num(expr)
    string      string literal    _expr str(expr)
    regex       //, m//, qr//     /$(expr)/
    method      .foo(args)        { $_.$method(args) }
    boolean     $a == $b          ?expr or true expr or { expr }

A method must be written with a unary dot to distinguish it from other forms. The method may have arguments. In essence, when you write

    .foo(1,2,3)

it is treated as if you wrote

    { $_.foo(1,2,3) }

and then the closure is evaluated for its truth.
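Python's standard library happens to have a close analogue of this rewriting: operator.methodcaller builds a callable that invokes a named method on whatever object it is later given. The helper name when_method below is hypothetical, and the sketch is only meant to illustrate "wrap the method call in a closure, then test the result for truth".

```python
from operator import methodcaller

def when_method(topic, name, *args):
    """Call topic.name(*args) through a closure and report its truth."""
    closure = methodcaller(name, *args)   # plays the role of { $_.name(args) }
    return bool(closure(topic))
```

So `when_method("foobar", "startswith", "foo")` corresponds to a case written as `.startswith("foo")` with "foobar" as the current topic.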

A class match works only if the class name is known at compile time. Use .isa("Class") for more complicated situations.

Boolean expressions are recognized at compile time by the presence of a top-level operator that is a comparison or logical operator. As the table shows, an argumentless closure (a sub (), that is) also functions as a boolean. However, it's probably better documentation to use the true function, which does the opposite of not. (Or the unary ? operator, which does the opposite of unary !.)

It might be argued that boolean expressions have no place here at all, and that you should use if if that's what you mean. (Or use a sub() closure to force it to ignore the given.) However, the "comb" structure of a switch is an extremely readable way to write even ordinary boolean expressions, and rather than forcing people to write:

    anyblock {
        when { $a == 1 } { ... }
        when { $b == 2 } { ... }
        when { $c == 3 } { ... }
        default          { ... }
    }

I'd rather they be able to write:

    anyblock {
        when $a == 1 { ... }
        when $b == 2 { ... }
        when $c == 3 { ... }
        default      { ... }
    }

This also fits better into the use of "when" within CATCH blocks:

    CATCH {
        when $!.tag eq "foo" { ... }
        when $!.tag eq "bar" { ... }
        default              { die }
    }

To force all the when clauses to be interpreted as booleans without using a boolean operator on every case, simply provide an empty given, to be read as "given nothing...":

    given () {
        when $a.isa(Ant) { ... }
        when $b.isa(Bat) { ... }
        when $c.isa(Cat) { ... }
        default          { ... }
    }

A when can be used by other topicalizers than just given. Just as CATCH will imply a given of $!, a for loop (the foreach variety) will also imply a given of the loop variable:

    for @foo {
        when 1   { ... }
        when 2   { ... }
        when "x" { ... }
        default  { ... }
    }

By symmetry, a given will by default alias $_ to the "given". Basically, the only difference between a given and a for is that a given takes a scalar expression, while a for takes a pre-flattened list and iterates over it.

Suppose you want to preserve $_ and alias $g to the value instead. You can say that like this:

    given $value -> $g {
        when 1 { /foo/ }
        when 2 { /bar/ }
        when 3 { /baz/ }
    }

In the same way, a loop's values can be aliased to one or more loop variables.

    for @foo -> $a, $b {  # two at a time
        ...
    }

That works a lot like the definition of a subroutine call with two formal parameters, $a and $b. (In fact, that's precisely what it is.) You can use modifiers on the formal parameters just as you would in a subroutine type signature. This implies that the aliases are automatically declared as my variables. It also implies that you can modify the formal parameter with an rw property, which allows you to modify the original elements of the array through the variable. The default loop:

    for @foo { ... }

is really compiled down to this:

    for @foo -> $_ is rw { ... }
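The "two at a time" form can be approximated in Python by handing the loop body its arguments in groups. This is a sketch under assumptions: n_at_a_time is a hypothetical helper, it does not model rw aliasing, and it silently drops an incomplete final group, a case the text above does not address.

```python
def n_at_a_time(values, n):
    """Return an iterator of n-tuples drawn from values, in order."""
    iterator = iter(values)
    # Zipping n references to one shared iterator pulls n items per tuple.
    return zip(*[iterator] * n)

pairs = []
for a, b in n_at_a_time([1, 2, 3, 4], 2):   # like: for @foo -> $a, $b { ... }
    pairs.append((a, b))
```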

Since for and given work by passing arguments to a closure, it's a small step to generalize that in the other direction. Any method definition is a topicalizer within the body of the method, and will assume a "given" of its $self object (or whatever you have named it). Bare closures topicalize their first argument, implicitly aliasing it to $_ unless $^a or some such is used. That is, if you say this:

    grep { $_ eq 3 } @list

it's equivalent to this more explicit use of a curried function:

    grep { $^a eq 3 } @list

But even a grep can use the aliasing syntax above:

    grep -> $x { $x eq 3 } @list

Outside the scope of any topicalizer, a when will assume that its given was stored in $_ and will test implicitly against that variable. This allows you to use when in your main loop, for instance, even if that main loop was supplied by Perl's -n or -p switch. Whenever a loop is functioning as a switch, the break implied by finishing a case functions as a next, not a last. Use last if that's what you mean.
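In Python terms, a loop functioning as a switch looks like a chain of tests that each end in continue, which is the counterpart of next. The sketch below is illustrative only; classify_all and its labels are hypothetical.

```python
def classify_all(items):
    """Label each item, moving to the next item as soon as one case matches."""
    labels = []
    for item in items:
        if item == 1:             # when 1 { ... }
            labels.append("one")
            continue              # the implied break acts as next, not last
        if item == 2:             # when 2 { ... }
            labels.append("two")
            continue
        if item == "x":           # when "x" { ... }
            labels.append("letter x")
            continue
        labels.append("other")    # default { ... }
    return labels
```

Had the implied break acted as last instead, the loop would stop at the first matching item rather than going on to label the rest.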

A when is the only defaulting construct that pays attention to the current topicalizer regardless of which variable it is associated with. All other defaulting constructs pay attention to a fixed variable, typically $_. So be careful what you're matching against if the given is aliased to something other than $_:

    $_ = "foo";
    given "bar" -> $f {
        if /foo/   { ... } # true, matches against $_
        when /bar/ { ... } # true, matches against $f
    }

Oh, one other tweak. The RFC proposes to overload next to mean "fall through to the next case". I don't think this is wise, since we'll often want to use loop controls within a switch statement. Instead, I think we should use skip to do that. (To be read as "Skip to the next statement.")

Similarly, if we pick a word that means "explicitly break out of a topicalizer", it should not be last. I'd suggest break! It will, of course, be unnecessary to break out of the end of a when case because the break is implied. However, there are times when you might want to break out of a when block early. Also, since we're allowing when modifiers that do not implicitly break, we could use an explicit break for that situation. You might see cases like this:

    given $x {
        warn("Odd value")        when !/xxx/;
        warn("No value"), break  when undef;
        when /aaa/ { break when 1; ... }
        when /bbb/ { break when 2; ... }
        when /ccc/ { break when 3; ... }
    }

So it looks to me like we need a break.
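The difference between the implied break and skip can be modeled with a short Python sketch. Everything named here is hypothetical: run_given takes a list of (test, body) pairs, and a body signals skip by returning the SKIP sentinel. The sketch assumes that after a skip the remaining cases are tested in order, which is one reading of "skip to the next statement".

```python
SKIP = object()   # sentinel a case body returns to request a skip

def run_given(topic, cases):
    """Run (test, body) cases in order and return the trace the bodies build."""
    trace = []
    for test, body in cases:
        if test(topic):
            if body(topic, trace) is SKIP:
                continue   # skip: carry on with whatever follows this case
            break          # implied break: leave the switch
    return trace

def positive(topic, trace):
    trace.append("positive")
    return SKIP            # keep going after noting the sign

def even(topic, trace):
    trace.append("even")

def fallback(topic, trace):
    trace.append("default")

cases = [
    (lambda t: t > 0, positive),
    (lambda t: t % 2 == 0, even),
    (lambda t: True, fallback),
]
```

With these cases, a topic of 4 visits "positive" and then "even" before the implied break ends the switch, while a topic of 3 visits "positive" and then falls to the default.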
