Apocalypse 4
by Larry Wall
Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 04 for the latest information.
Accepted RFCs
Note that, although these RFCs are in the "accepted" category, most
are accepted with major caveats (a "c" acceptance rating), or at
least some "buts" (a "b" rating). I'll try to list all those caveats
here, but where there are systematic changes, I may indicate these
generally in this document without attempting to rewrite the RFC in
every detail. Those who implement these features must be sensitive
to these systematic changes and not just uncritically implement
everything the RFC says.
I'd like to talk about exceptions first, but before that I have to deal with the switch statement, because I think it's silly not to unify exception handlers with switch statements.
RFC 022: Control flow: Builtin switch statement
Some OO purists say that any time you want to use a switch statement, you ought to make the discriminant of the switch statement into a type, and use method dispatch instead. Fortunately, we are not OO purists here, so forget that argument.
Another argument against having a switch statement in Perl 6 is that we never had it in the first five versions of Perl. But it would be incorrect to say that we didn't miss it. What actually happened was that every time we started discussing how to add a switch statement, it wasn't obvious how far to go. A switch statement in Perl ought to do more than a switch statement in C (or in most any other language, for that matter). So the fact that we haven't added a switch statement so far says more about how hard it is to design a good one than about how much we wanted a lousy one. Eventually the ever inventive Damian Conway came up with his famous design, with a Perl 5 module as proof of concept, and pretty much everyone agreed that he was on the right track, for some definition of "right" (and "track"). This RFC is essentially that design (not surprisingly, since Damian wrote it), so it will be accepted, albeit with several tweaks.
In the first place, as a quasi-linguist, I loathe the keywords switch
and case. I would prefer keywords that read better in English.
Much as I love verbing nouns, they don't work as well as real verbs or
real prepositions when topicalizers are called for. After thrashing
over several options with Damian and other folks, we've settled on
using given instead of switch, and when instead of case:
given EXPR {
    when EXPR { ... }
    when EXPR { ... }
    ...
}
The other great advantage of using different words is that people won't expect it to work exactly like any other switch statement they may be familiar with.
That being said, I should point out that it is still called "the switch statement", and the individual components are still "cases". But you don't have to put "switch" or "case" into constant-width font, because they're not keywords.
Because curlies are so extremely overloaded in Perl 5, I was at
first convinced that we would need a separator of some sort between
the expression and the block, maybe a : or => or some such.
Otherwise it would be too ambiguous to come upon a left curly when
expecting an operator--it would be interpreted as a hash subscript
instead. Damian's RFC proposes to require parentheses in certain
situations to disambiguate the expression.
But I've come to the conclusion that I'd rather screw around (a little) with the "insignificant whitespace" rule than to require an extra unnatural delimiter. If we observe current practice, we note that 99% of the time, when people write a hash subscript they do so without any whitespace before it. And 99% of the time, when they write a block, they do put some whitespace in front of it. So we'll just dwim it using the whitespace. (No, we're not going all the way to whole-hog whitespace dwimmery--Python will remain the best/worst example of that approach.)
Subscripts are the only valid use of curlies when an operator is
expected. (That is, subscripts are essentially postfix operators.) In
contrast, hash composers and blocks are terms, not operators.
Therefore, we will make the rule that a left curly that has whitespace
in front of it will never be interpreted as a subscript in Perl 6.
(If you think this is a totally bizarre thing to do, consider that this
new approach is actually consistent with how Perl 5 already parses
variables within interpolated strings.) If there is any space before
the curly, we force it to start a term, not an operator, which means
that the curlies in question must delimit either a hash composer or
a block. And it's a hash composer only if it contains a =>
pair constructor at the top level (or an explicit hash keyword on
the front). Therefore it's possible to unambiguously terminate an
expression by following it with a block, as in the constructs above.
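The whitespace rule can be modelled in a few lines. What follows is only a sketch in Python rather than Perl 6: the function names are made up for illustration, the code assumes the parser has just finished a term and is looking at a left curly, and it ignores string literals and comments entirely.

```python
import re


def _curly_body(source: str, pos: int) -> str:
    """Return the text between the '{' at pos and its matching '}'."""
    depth = 0
    for i in range(pos, len(source)):
        if source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth == 0:
                return source[pos + 1:i]
    raise ValueError("unbalanced curlies")


def _has_top_level_pair(body: str) -> bool:
    """True if '=>' appears outside any nested brackets in body."""
    depth = 0
    for i, ch in enumerate(body):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif depth == 0 and body.startswith("=>", i):
            return True
    return False


def classify_left_curly(source: str, pos: int) -> str:
    """Classify the '{' at source[pos] as subscript, hash composer, or block."""
    if source[pos] != "{":
        raise ValueError("pos must point at a left curly")
    # No whitespace in front: the curly is a postfix subscript.
    if pos > 0 and not source[pos - 1].isspace():
        return "subscript"
    # Whitespace in front: the curly starts a term.
    explicit_hash = re.search(r"\bhash\s+$", source[:pos]) is not None
    if explicit_hash or _has_top_level_pair(_curly_body(source, pos)):
        return "hash composer"
    return "block"
```

Under these assumptions, `$foo{bar}` is classified as a subscript, `if $foo { print }` as a block, and `$x = { a => 1 }` as a hash composer.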
Interestingly, this one tweak to the whitespace rule also means that we'll be able to simplify the parentheses out of other similar built-in constructs:
if $foo { ... }
elsif $bar { ... }
else { ... }
while $more { ... }
for 1..10 { ... }
I think throwing out two required punctuation characters for one required whitespace is an excellent trade in terms of readability, particularly when it already matches common practice. (You can still put in the parens if you want them, of course, just for old times' sake.) This tweak also allows greater flexibility in how user-defined constructs are parsed. If you want to define your own constructs, they should be able to follow the same syntax rules as built-ins.
By a similar chain of logic (or illogic), I also want to tweak
the whitespace rules for the trailing curly. There are severe
problems in any C-derived language that allows user-defined constructs
containing curlies (as Perl does). Even C doesn't entirely escape the
head-scratching puzzle of "When do I put a semicolon after a curly?"
A struct definition requires a terminating semicolon, for instance,
while an if or a while doesn't.
In Perl, this problem comes up most often when people say "Why do I
have to put a semicolon after do {} or eval {} when it looks
like a complete statement?"
Well, in Perl 6, you don't, if the final curly is on a line by itself. That is, if you use an expression block as if it were a statement block, it behaves as one. The win is that these rules are consistent across all expression blocks, whether user-defined or built-in. Any expression block construct can be treated as either a statement or a component of an expression. Here's a block that is being treated as a term in an expression:
$x = do {
    ...
} + 1;
However, if you write
$x = do {
    ...
}
+ 1;
then the + will be taken erroneously as the start of a new statement. (So don't do that.)
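The "line by itself" test is simple enough to state as code. This is a Python sketch of the rule only, with a hypothetical function name, and it assumes the caller already knows where the closing curly of the expression block is.

```python
def curly_ends_statement(source: str, close_pos: int) -> bool:
    """Return True if the '}' at close_pos sits on a line by itself."""
    if source[close_pos] != "}":
        raise ValueError("close_pos must point at a right curly")
    # Find the boundaries of the line that contains the curly.
    line_start = source.rfind("\n", 0, close_pos) + 1
    line_end = source.find("\n", close_pos)
    if line_end == -1:
        line_end = len(source)
    # Only whitespace may share the line with the curly.
    return source[line_start:line_end].strip() == "}"
```

For the first example above the curly shares its line with `+ 1;`, so the block stays a term in the expression; for the second the curly is alone on its line, so the statement ends there.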
Note that this special rule only applies to constructs that take a
block (that is, a closure) as their last (or only) argument. Operators
like sort and map are unaffected. However, certain constructs
that used to be in the statement class may become expression constructs in
Perl 6. For instance, if we change BEGIN to an expression construct
we can now use a BEGIN block inside an expression to force
compile-time evaluation of a non-static expression:
$value = BEGIN { call_me_once() } + call_me_again();
On the other hand, a one-line BEGIN would then have to have a semicolon.
Anyway, back to switch statements. Damian's RFC proposes various specific kinds of dwimmery, and while some of those dwims are spot on, others may need adjustment. In particular, there is an assumption that the programmer will know when they're dealing with an object reference and when they're not. But everything will be an object reference in Perl 6, at some level or other. The underlying characteristics of any object are most generally determined by the answer to the question, "What methods does this object respond to?"
Unfortunately, that's a run-time question in general. But in specific,
we'd like to be able to optimize many of these switch statements at
compile time. So it may be necessary to supply typological hints in
some cases to do the dwimmery efficiently. Fortunately, most cases
are still fairly straightforward. A 1 is obviously a number,
and a "foo" is obviously a string. But unary + can force
anything to a number, and unary _ can force anything to a string.
Unary ? can force a boolean, and unary . can force a method call.
More complicated thoughts can be represented with closure blocks.
Another thing that needs adjustment is that the concept of "isa" matching seems to be missing, or at least difficult to express. We need good "isa" matching to implement good exception handling in terms of the switch mechanism. This means that we need to be able to say something like:
given $! {
    when Error::Overflow { ... }
    when Error::Type { ... }
    when Error::ENOTTY { ... }
    when /divide by 0/ { ... }
    ...
}
and expect it to check $!.isa(Error::Overflow) and such, along with
more normal pattern matching. In the case of the actual exception
mechanism, we won't use the keyword given, but rather CATCH:
CATCH {
    when Error::Overflow { ... }
    when Error::Type { ... }
    when Error::ENOTTY { ... }
    when /divide by 0/ { ... }
    ...
}
CATCH is a BEGIN-like block that can turn any block into a
"try" block from the inside out. But the insides of the CATCH are
an ordinary switch statement, where the discriminant is simply the
current exception object, $!. More on that later--see RFC 88
below.
Some of you may recall that I've stated that Perl 6 will have no
barewords. That's still the case. A token like Error::Overflow
is not a bareword because it's a declared class. Perl 6 recognizes
package names as symbolic tokens. So when you call a class method as
Class::Name.method(), the Class::Name is actually a class object
(that just happens to stringify to "Class::Name"). But the class
method can be called without a symbolic lookup on the package name
at run time, unlike in Perl 5.
Since Error::Overflow is just such a class object, it can be
distinguished from other kinds of objects in a switch statement,
and an "isa" can be inferred. It would be nice if we could go as
far as to say that any object can be called with any class name as
a method name to determine whether it "isa" member of that class,
but that could interfere with use of class name methods to implement
casting or construction. So instead, since switch statements are
into heavy dwimmery anyway, I think the switch statement will have
to recognize any Class::Name known at compile time, and force it
to call $!.isa(Class::Name).
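To make the idea concrete, here is a small Python model of a handler list that mixes "isa" cases with pattern cases. It is a sketch, not the Perl 6 mechanism: the exception classes, `case_matches`, and `catch` are hypothetical names, and returning `None` when nothing matches is an assumption of the sketch rather than anything decided here.

```python
import re


class Overflow(ArithmeticError):
    """Hypothetical stand-in for Error::Overflow."""


class TypeMismatch(Exception):
    """Hypothetical stand-in for Error::Type."""


def case_matches(exc, case):
    """A class case means 'isa'; a compiled regex matches the message text."""
    if isinstance(case, type):
        return isinstance(exc, case)
    if isinstance(case, re.Pattern):
        return case.search(str(exc)) is not None
    raise TypeError(f"unsupported case: {case!r}")


def catch(exc, cases):
    """Run the handler of the first matching case, in order."""
    for case, handler in cases:
        if case_matches(exc, case):
            return handler(exc)
    return None  # assumption: no case matched, nothing handled


CASES = [
    (Overflow, lambda e: "overflow"),
    (TypeMismatch, lambda e: "type"),
    (re.compile(r"divide by 0"), lambda e: "division"),
]
```

With these cases, `catch(Overflow("too big"), CASES)` picks the class case, while an unrelated exception whose message contains "divide by 0" falls through to the pattern case.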
Another possible adjustment will involve the use of switch statements as a means of parallelizing regular expression evaluation. Specifically, we want to be able to write parsers easily in Perl, which means that we need some way of matching a token stream against something like a set of regular expressions. You can think of a token stream as a funny kind of string. So if the "given" of a switch statement is a token stream, the regular expressions matched against it may have special abilities relating to the current parse's data structure. All the regular expressions of such a switch statement will likely be implicitly anchored to the current parse location, for instance. There may be special tokens referring to terminals and non-terminals. Basically, think of something like a yacc grammar, where alternative pattern/action grammar rules are most naturally expressed via switch statement cases. More on that in the next Apocalypse.
Another possible adjustment is that the proposed else block could be
considered unnecessary. The code following the final when is
automatically an "else". Here's a duodecimal digit converter:
$result = given $digit {
    when "T" { 10 }
    when "E" { 11 }
    $digit;
}
Nevertheless, it's probably good documentation to line up all the
blocks, which means it would be good to have a keyword. However, for
reasons that will become clearer when we talk about exception handlers,
I don't want to use else. Also, because of the identification of
when and if, it would not be clear whether an else should
automatically supply a break at the end of its block as the ordinary
when case does.
So instead of else, I'd like to borrow a bit more from C and
use default:
$result = given $digit {
    when "T" { 10 }
    when "E" { 11 }
    default { $digit }
}
Unlike in C, the default case must come last, since Perl's cases
are evaluated (or at least pretend to be evaluated) in order. The
optimizer can often determine which cases can be jumped to directly,
but in cases where that can't be determined, the cases are evaluated in
order much like cascaded if/elsif/else conditions. Also, it's
allowed to intersperse ordinary code between the cases, in which case
the code must be executed only if the cases above it fail to match.
For example, this should work as indicated by the print statements:
given $given {
    print "about to check $first";
    when $first { ... }
    print "didn't match $first; let's try $next";
    when $next { ... }
    print "giving up";
    default { ... }
    die "panic: shouldn't see this";
}
We can still define when as a variant of if, which makes it
possible to intermix the two constructs when (or if) that is
desirable. So we'll leave that identity in--it always helps people
think about it when you can define a less familiar construct in terms
of a more familiar one. However, the default isn't quite the same
as an else, since else can't stand on its own. A default is
more like an if that's always true. So the above code is equivalent
to:
given $given {
    print "about to check $first";
    if $given =~ $first { ...; break }
    print "didn't match $first; let's try $next";
    if $given =~ $next { ...; break }
    print "giving up";
    if 1 { ...; break; }
    die "panic: shouldn't see this";
}
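The same control flow can be traced in Python, where an early `return` plays the part of the implied break. This is a sketch under loud assumptions: `run_switch` is a made-up name, plain `==` stands in for the `=~` match, and a list records what the print statements would have said.

```python
def run_switch(given, first, second):
    """Return (trace, result) for a switch with code between the cases."""
    trace = []

    def body():
        trace.append(f"about to check {first}")
        if given == first:        # when $first { ... } with implied break
            return "first"
        trace.append(f"didn't match {first}; let's try {second}")
        if given == second:       # when $next { ... } with implied break
            return "second"
        trace.append("giving up")
        if True:                  # default { ... } is an always-true case
            return "default"
        raise AssertionError("panic: shouldn't see this")

    result = body()
    return trace, result
```

Calling `run_switch("b", "a", "b")` records the first two messages and stops at the second case; the final `raise` is never reached, just like the `die` above.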
We do need to rewrite the relationship table in the RFC to handle some
of the tweaks and simplifications we've mentioned. The comparison of
bare refs goes away. It wasn't terribly useful in the first place,
since it only worked for scalar refs. (To match identities we'll
need an explicit .id method in any event. We won't be relying on the
default numify or stringify methods to produce unique representations.)
I've rearranged the table to be applied in order, so that default interpretations come later. Also, the "Matching Code" column in the RFC gave alternatives that aren't resolved. In these cases I've chosen the "true" definition rather than the "exists" or "defined" definition. (Except for certain set manipulations with hashes, people really shouldn't be using the defined/undefined distinction to represent true and false, since both true and false are considered defined concepts in Perl.)
Some of the table entries distinguish an array from a list. Arrays look like this:
when [1, 3, 5, 7, 9] { "odd digit intersection" }
when @array { "array intersection" }
while a list looks like this:
when 1, 3, 5, 7, 9 { "odd digit" }
when @foo, @bar, @baz { "intersection with at least one array" }
Ordinarily lists and arrays would mean the same thing in scalar
context, but when is special in differentiating explicit arrays
from lists. Within a when, a list is a recursive disjunction.
That is, the comma-separated values are treated as individual cases
OR-ed together. We could use some other explicit notation for
disjunction such as:
when any(1, 3, 5, 7, 9) { "odd" }
But that seems a lot of trouble for a very common case of case, as it were. We could use vertical bars as some languages do, but I think the comma reads better.
Anyway, here's another simplification. The following table will
also define how the Perl 6 =~ operator works! That allows us
to use a recursive definition to handle matching against a disjunctive
list of cases. (See the first entry in the table below.) Of course,
for precedence reasons, to match a list of things using =~ you'll
have to use parens:
$digit =~ (1, 3, 5, 7, 9) and print "That's odd!";
Alternatively, you can look at this table as the definition of the
=~ operator, and then say that the switch statement is defined in
terms of =~. That is, for any switch statement of the form
given EXPR1 {
    when EXPR2 { ... }
}
it's equivalent to saying this:
for (scalar(EXPR1)) {
    if ($_ =~ (EXPR2)) { ... }
}
Table 1: Matching a switch value against a case value
$a      $b       Type of Match Implied     Matching Code
======  =====    =====================     =============
expr    list     recursive disjunction     match if $a =~ any($b)
list    list     recursive disjunction*    match if any($a) =~ any($b)
hash    sub(%)   hash sub truth            match if $b(%$a)
array   sub(@)   array sub truth           match if $b(@$a)
expr    sub($)   scalar sub truth          match if $b($a)
expr    sub()    simple closure truth*     match if $b()
hash    hash     hash key intersection*    match if grep exists $a{$_}, $b.keys
hash    array    hash value slice truth    match if grep {$a{$_}} @$b
hash    regex    hash key grep             match if grep /$b/, keys %$a
hash    scalar   hash entry truth          match if $a{$b}
array   array    array intersection*       match if any(@$a) =~ any(@$b)
array   regex    array grep                match if grep /$b/, @$a
array   number   array entry truth         match if $a[$b]
array   expr     array as list             match if any($a) =~ $b
object  class    class membership          match if $a.isa($b)
object  method   method truth              match if $a.$b()
expr    regex    pattern match             match if $a =~ /$b/
expr    subst    substitution match        match if $a =~ subst
expr    number   numeric equality          match if $a == $b
expr    string   string equality           match if $a eq $b
expr    boolean  simple expression truth*  match if $b
expr    undef    undefined                 match unless defined $a
expr    expr     run-time guessing         match if ($a =~ $b) at runtime
In order to facilitate optimizations, these distinctions are made
syntactically at compile time whenever possible. For each comparison,
the reverse comparison is also implied, so $a/$b can be thought
of as either given/when or when/given. (We don't reverse the matches
marked with * because it doesn't make sense in those cases.)
If the type of match cannot be determined at compile time, the default is to try to apply the very same rules in the very same order at run time, using the actual types of the arguments, not their compile-time type appearance. Note that there are no run-time types corresponding to "method" or "boolean". Either of those notions can be expressed at runtime as a closure, of course.
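Applying the table in order at run time might look roughly like the following Python sketch. It covers only a subset of the rows and makes several assumptions of its own: tuples stand in for lists, Python lists for arrays, dicts for hashes, and compiled regexes for patterns; closures are always called with the given; Python's `==` replaces Perl's numeric comparison; and the reversed comparisons, method cases, and substitutions are left out. The name `smart_match` is hypothetical.

```python
import re
from numbers import Number


def smart_match(a, b):
    """Apply a subset of the matching table, top to bottom, to run-time values."""
    # anything against a list: recursive disjunction over the items.
    if isinstance(b, tuple):
        return any(smart_match(a, item) for item in b)
    # anything against a sub: truth of calling the closure on the given.
    if callable(b) and not isinstance(b, type):
        return bool(b(a))
    if isinstance(a, dict):
        if isinstance(b, dict):                  # hash key intersection
            return any(key in a for key in b)
        if isinstance(b, list):                  # hash value slice truth
            return any(a.get(key) for key in b)
        if isinstance(b, re.Pattern):            # hash key grep
            return any(b.search(str(key)) for key in a)
        return bool(a.get(b))                    # hash entry truth
    if isinstance(a, list):
        if isinstance(b, list):                  # array intersection
            return any(smart_match(x, y) for x in a for y in b)
        if isinstance(b, re.Pattern):            # array grep
            return any(b.search(str(x)) for x in a)
        if isinstance(b, int):                   # array entry truth
            return -len(a) <= b < len(a) and bool(a[b])
        return any(smart_match(x, b) for x in a)  # array as list
    if isinstance(b, type):                      # class membership
        return isinstance(a, b)
    if isinstance(b, re.Pattern):                # pattern match
        return b.search(str(a)) is not None
    if isinstance(b, Number):                    # numeric equality
        return a == b
    if isinstance(b, str):                       # string equality
        return str(a) == b
    if b is None:                                # undefined
        return a is None
    return False
```

The ordering matters in the same way it does in the table: the list and closure rows are tried before anything looks at the type of the given, and the equality rows act as the fallback.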
In fact, whenever the default behavior is not what you intend, there are ways to force the arguments to be treated as you intend:
Intent   Natural           Forced
======   =======           ======
array    @foo              [list] or @{expr}
hash     %bar              {pairlist} or %{expr}
sub(%)   { %^foo.aaa }     sub (%foo) { ... }
sub(@)   { @^bar.bbb }     sub (@bar) { ... }
sub($)   { $^baz.ccc }     sub ($baz) { ... }
number   numeric literal   +expr int(expr) num(expr)
string   string literal    _expr str(expr)
regex    //, m//, qr//     /$(expr)/
method   .foo(args)        { $_.$method(args) }
boolean  $a == $b          ?expr or true expr or { expr }
A method must be written with a unary dot to distinguish it from other forms. The method may have arguments. In essence, when you write
.foo(1,2,3)
it is treated as if you wrote
{ $_.foo(1,2,3) }
and then the closure is evaluated for its truth.
A class match works only if the class name is known at compile time.
Use .isa("Class") for more complicated situations.
Boolean expressions are recognized at compile time by the presence of a
top-level operator that is a comparison or logical operator. As the
table shows, an argumentless closure (a sub (), that is) also functions
as a boolean. However, it's probably better documentation to use
the true function, which does the opposite of not. (Or the unary ?
operator, which does the opposite of unary !.)
It might be argued that boolean expressions have no place here at all,
and that you should use if if that's what you mean. (Or use a
sub() closure to force it to ignore the given.) However, the "comb"
structure of a switch is an extremely readable way to write even
ordinary boolean expressions, and rather than forcing people to write:
anyblock {
    when { $a == 1 } { ... }
    when { $b == 2 } { ... }
    when { $c == 3 } { ... }
    default { ... }
}
I'd rather they be able to write:
anyblock {
    when $a == 1 { ... }
    when $b == 2 { ... }
    when $c == 3 { ... }
    default { ... }
}
This also fits better into the use of "when" within CATCH blocks:
CATCH {
    when $!.tag eq "foo" { ... }
    when $!.tag eq "bar" { ... }
    default { die }
}
To force all the when clauses to be interpreted as booleans without
using a boolean operator on every case, simply provide an empty given,
to be read as "given nothing...":
given () {
    when $a.isa(Ant) { ... }
    when $b.isa(Bat) { ... }
    when $c.isa(Cat) { ... }
    default { ... }
}
A when can be used by other topicalizers than just given.
Just as CATCH will imply a given of $!, a for loop (the foreach
variety) will also imply a given of the loop variable:
for @foo {
    when 1 { ... }
    when 2 { ... }
    when "x" { ... }
    default { ... }
}
By symmetry, a given will by default alias $_ to the "given".
Basically, the only difference between a given and a for is
that a given takes a scalar expression, while a for takes a
pre-flattened list and iterates over it.
Suppose you want to preserve $_ and alias $g to the value
instead. You can say that like this:
given $value -> $g {
    when 1 { /foo/ }
    when 2 { /bar/ }
    when 3 { /baz/ }
}
In the same way, a loop's values can be aliased to one or more loop variables.
for @foo -> $a, $b { # two at a time
    ...
}
That works a lot like the definition of a subroutine call with two
formal parameters, $a and $b. (In fact, that's precisely what
it is.) You can use modifiers on the formal parameters just as you
would in a subroutine type signature. This implies that the aliases
are automatically declared as my variables. It also implies
that you can modify the formal parameter with an rw property,
which allows you to modify the original elements of the array through
the variable. The default loop:
for @foo { ... }
is really compiled down to this:
for @foo -> $_ is rw { ... }
Since for and given work by passing arguments to a closure,
it's a small step to generalize that in the other direction.
Any method definition is a topicalizer within the body of the method,
and will assume a "given" of its $self object (or whatever you
have named it). Bare closures topicalize their first argument,
implicitly aliasing it to $_ unless $^a or some such is used.
That is, if you say this:
grep { $_ eq 3 } @list
it's equivalent to this more explicit use of a curried function:
grep { $^a eq 3 } @list
But even a grep can use the aliasing syntax above:
grep -> $x { $x eq 3 } @list
Outside the scope of any topicalizer, a when will assume that
its given was stored in $_ and will test implicitly against
that variable. This allows you to use when in your main loop,
for instance, even if that main loop was supplied by Perl's -n
or -p switch. Whenever a loop is functioning as a switch,
the break implied by finishing a case functions as a next, not
a last. Use last if that's what you mean.
A when is the only defaulting construct that pays attention to the
current topicalizer regardless of which variable it is associated
with. All other defaulting constructs pay attention to a fixed
variable, typically $_. So be careful what you're matching against
if the given is aliased to something other than $_:
$_ = "foo";
given "bar" -> $f {
    if /foo/ { ... } # true, matches against $_
    when /bar/ { ... } # true, matches against $f
}
Oh, one other tweak. The RFC proposes to overload next to mean
"fall through to the next case". I don't think this is wise, since
we'll often want to use loop controls within a switch statement.
Instead, I think we should use skip to do that. (To be read as
"Skip to the next statement.")
Similarly, if we make a word to mean to explicitly break out of a
topicalizer, it should not be last. I'd suggest break! It will,
of course, be unnecessary to break out of the end of a when case
because the break is implied. However, there are times when you
might want to break out of a when block early. Also, since we're
allowing when modifiers that do not implicitly break, we could use
an explicit break for that situation. You might see cases like this:
given $x {
    warn("Odd value") when !/xxx/;
    warn("No value"), break when undef;
    when /aaa/ { break when 1; ... }
    when /bbb/ { break when 2; ... }
    when /ccc/ { break when 3; ... }
}
So it looks to me like we need a break.
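The difference between the implied break and an explicit skip can be modelled with a tiny case runner. This is a Python sketch only; the `Skip` exception, the `given` function, and the shape of the case list are all assumptions made for illustration.

```python
class Skip(Exception):
    """Raised in a case body to carry on with the next statement of the switch."""


def given(value, cases):
    """Run cases in order; a finished case breaks, a skipped case does not."""
    trace = []
    for test, body in cases:
        if test(value):
            try:
                body(value, trace)
            except Skip:
                continue  # explicit skip: keep testing the later cases
            break         # implied break at the end of a matching case
    return trace


def positive(value, trace):
    trace.append("positive")
    raise Skip  # fall through on purpose


CASES = [
    (lambda v: v > 0, positive),
    (lambda v: v % 2 == 0, lambda v, t: t.append("even")),
    (lambda v: True, lambda v, t: t.append("other")),  # plays the part of default
]
```

Here `given(4, CASES)` visits both the "positive" and "even" cases because the first one skips, whereas `given(-2, CASES)` stops after "even" thanks to the implied break.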

