Apocalypse 6
by Larry Wall
Pages: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 06 for the latest information.
Subs with special parsing
Any macro can have special parsing rules if you use the is parsed
trait. But some subs are automatically treated specially.
Operator subroutines
In Perl 6, operators are just subroutines with special names. When you say:
-$a + $b
you're really doing this internally:
infix:+( prefix:-($a), $b)
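To make the "operators are just subroutines" idea concrete, here is a minimal sketch in Python rather than Perl 6. The `OPERATORS` table and the `call` helper are hypothetical names invented for this illustration; they only mimic the naming scheme and are not how any Perl 6 implementation stores its operators.

```python
import operator

# Hypothetical operator table: each operator is an ordinary function stored
# under a "category:spelling" name, mirroring names like infix:+ and prefix:-.
OPERATORS = {
    "prefix:-": operator.neg,
    "infix:+": operator.add,
}


def call(name, *args):
    """Look up an operator by its full name and call it like any other sub."""
    return OPERATORS[name](*args)


a, b = 3, 10
# "-a + b" spelled out as explicit calls: infix:+( prefix:-(a), b )
result = call("infix:+", call("prefix:-", a), b)
```

Here `result` is the same value that the expression `-a + b` would produce; the only difference is that the operator lookup is written out by hand.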
Operator names start with one of four names followed by a colon:
prefix: a unary prefix operator
infix: a binary infix operator
postfix: a unary suffix operator
circumfix: a bracketing operator
Everything after the colon and up to the next whitespace or left parenthesis will be taken as the spelling of the actual operator. Unicode is specifically allowed. The null operator is not allowed, so if the first thing after the colon is a left parenthesis, it is part of the operator, and if the first thing is whitespace, it's an illegal name. Boom!
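The naming rule above can be sketched as a small function. This is a Python model, not Perl 6, and `split_operator_name` is a hypothetical helper; it assumes the declaration text starts directly with the operator name and treats an empty spelling the same as a whitespace one.

```python
CATEGORIES = ("prefix", "infix", "postfix", "circumfix")


def split_operator_name(text):
    """Split text such as 'postfix:! (Int $x)' into (category, spelling)."""
    category, colon, rest = text.partition(":")
    if not colon or category not in CATEGORIES:
        raise ValueError("not an operator name: %r" % text)
    # The null operator is not allowed: whitespace right after the colon
    # (or nothing at all) is an illegal name.
    if not rest or rest[0].isspace():
        raise ValueError("illegal operator name: %r" % text)
    # The first character always belongs to the spelling, even if it is "(".
    # After that, the spelling stops at the next whitespace or "(".
    end = 1
    while end < len(rest) and not rest[end].isspace() and rest[end] != "(":
        end += 1
    return category, rest[:end]
```

For example, `split_operator_name("circumfix:(**) ()")` keeps the leading parenthesis as part of the spelling, while `split_operator_name("infix: +")` raises an error. Boom, as promised.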
You can make your own lexically scoped operators like this:
my sub postfix:! (Int $x) { return factorial($x) }
print 5!, "\n"; # prints 120
You can use a newly declared operator recursively as soon as its name is introduced, including in its own definition:
my sub postfix:! (Int $x) { $x<=1 ?? 1 :: $x*($x-1)! }
You can declare multimethods that create new syntax like this:
multi postfix:! (Int $x) { $x<=1 ?? 1 :: $x*($x-1)! }
However, regardless of the scope of the name, the new syntax is considered to be a lexically scoped declaration, and is only valid after the name is declared (or imported) and after any precedence traits have been parsed.
If you want to specify a precedence, you always do it relative to some existing operator:
multi infix:coddle (PDL $a, PDL $b) is equiv(&infix:+) { ... }
multi infix:poach (PDL $a, PDL $b) is looser(&infix:+) { ... }
multi infix:scramble (PDL $a, PDL $b) is tighter(&infix:+) { ... }
If you base a tighter operator on a looser one, or a looser one on a tighter one, you don't get back to where you were. It always goes into the cracks no matter how many times you derive.
Just a note on implementation: if you've played with numerically
oriented precedence tables in the past, you may be thinking, "But he'll
run out of bits in his number eventually." The answer to that is that
we don't use precedence numbers. The actual precedence level can be
represented internally by an arbitrarily long string of bytes that are
compared byte by byte. When you make a tighter or looser operator,
the string just gets one byte longer. A looser looser looser looser
infix:* is still tighter than a tighter tighter tighter tighter
infix:+, because the string comparison bails out on the first byte.
The first byte compares the built-in multiplication operator against
the built-in addition operator, and those are already different,
so we don't have to compare any more.
However, two operators derived by the same path have the same
precedence. All binary operators of a given precedence level are
assumed to be left associative unless declared otherwise with an
assoc('right') or assoc('non') trait. (Unaries pay no attention
to associativity--they always go from the outside in.)
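As a rough illustration of what the three associativity settings mean for a chain of operators at one precedence level, here is a Python sketch. The `group` function is hypothetical and only produces a parenthesized string; a real parser would build a tree, and how a non-associative chain is reported is an assumption here.

```python
def group(operands, op, assoc="left"):
    """Fully parenthesize 'a op b op c ...' according to the associativity."""
    if len(operands) == 1:
        return operands[0]
    if assoc == "left":
        tree = operands[0]
        for right in operands[1:]:
            tree = "(%s %s %s)" % (tree, op, right)
        return tree
    if assoc == "right":
        tree = operands[-1]
        for left in reversed(operands[:-1]):
            tree = "(%s %s %s)" % (left, op, tree)
        return tree
    if assoc == "non":
        # A non-associative operator cannot be chained at all.
        if len(operands) > 2:
            raise SyntaxError("operator %r is non-associative" % op)
        return "(%s %s %s)" % (operands[0], op, operands[1])
    raise ValueError("unknown associativity: %r" % assoc)
```

So `group(["a", "b", "c"], "-")` groups from the left by default, while passing `assoc="right"` groups the same chain from the right.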
This may sound complicated, and it is, if you're implementing
it internally. But from the user's point of view, it's much less
complicated than trying to keep track of numeric precedence levels
yourself. By making the precedence levels relative to existing
operators, we keep the user from having to think about how to keep
those cracks open. And most user-defined operators will have exactly
the same precedence as something built-in anyway. Not to mention the
fact that it's just plain better documentation to say that an operator
works like a familiar operator such as "+". Who the heck can
remember what precedence level 17 is, anyway?
If you don't specify a precedence on an operator, it will default
to something reasonable. A named unary operator, whether prefix or
postfix, will default to the same precedence as other named unary
operators like abs(). Symbolic unaries default to the same
precedence as unary + or - (hence the ! in our factorial
example is tighter than the * of multiplication). Binaries default
to the same precedence as binary + or -. So in our coddle
example above, the is equiv(&infix:+) is completely redundant.
Unless it's completely wrong. For multimethods, it's an error to specify two different precedences for the same name. Multimethods that overload an existing name will be assumed to have the same precedence as the existing name.
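The defaulting and conflict rules can be modeled with a small registry. This is a Python sketch with hypothetical names (`default_level`, `OperatorRegistry`); the level labels are plain strings, and treating a spelling as "named" when it starts with a letter or underscore is an assumption, since the text does not define the test.

```python
def default_level(category, spelling):
    """Pick a default precedence label when none is declared."""
    if category == "infix":
        return "additive"            # same as binary + or -
    if category in ("prefix", "postfix"):
        named = spelling[:1].isalpha() or spelling[:1] == "_"
        return "named unary" if named else "symbolic unary"
    raise ValueError("no default modeled for category %r" % category)


class OperatorRegistry:
    """Tracks one precedence per operator name."""

    def __init__(self):
        self._levels = {}

    def declare(self, name, precedence=None):
        category, _, spelling = name.partition(":")
        known = self._levels.get(name)
        if precedence is None:
            # An overload of an existing name inherits its precedence;
            # a brand-new name gets the default for its category.
            precedence = known if known is not None else default_level(category, spelling)
        if known is not None and known != precedence:
            raise ValueError("conflicting precedence for %s" % name)
        self._levels[name] = precedence
        return precedence
```

Declaring `infix:coddle` with no precedence and then again with an explicit additive level is accepted as redundant, whereas a second declaration with a different level is rejected.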
You'll note that the rules for the scope of syntax warping are similar to those for macros. In essence, these definitions are macros, but specialized ones. If you declare one as a macro, the body is executed at compile time, and returns a string, a parse tree, or a closure just as a macro would:
# define Pascal comments:
macro circumfix:(**) () is parsed(/.*?/) { "" }
# "Comment? What comment?"
A circumfix operator is assumed to be split symmetrically between
prefix and postfix. In this case the circumfix of four characters
is split exactly in two, but if you don't want it split in the
middle (which is particularly gruesome when there's an odd number
of characters) you may specify exactly where the parse rule is
interpolated with a special ... marker, which is considered part
of the name:
macro circumfix:(*...*) () is parsed(/.*?/) { "" }
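The splitting rule for circumfix names can be sketched as follows. This is a Python model with a hypothetical `split_circumfix` helper; rejecting an odd-length name that has no marker is a choice made for this sketch, since the text only calls that case gruesome without saying what happens.

```python
def split_circumfix(spelling):
    """Split a circumfix spelling into its (opener, closer) halves."""
    marker = "..."
    if marker in spelling:
        # An explicit marker says exactly where the parse rule goes.
        opener, _, closer = spelling.partition(marker)
        if not opener or not closer:
            raise ValueError("marker must sit between two delimiters")
        return opener, closer
    if len(spelling) < 2 or len(spelling) % 2:
        # Sketch-only policy: refuse to guess the middle of an odd-length name.
        raise ValueError("cannot split %r symmetrically" % spelling)
    half = len(spelling) // 2
    return spelling[:half], spelling[half:]
```

Both `split_circumfix("(**)")` and `split_circumfix("(*...*)")` yield the same pair of delimiters, which matches the two macro declarations above being equivalent.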
The default parse rule for a circumfix is an ordinary Perl expression of lowest precedence, the same one Perl uses inside ordinary parentheses. The defaults for other kinds of operators depend on the precedence of the operator, which may or may not be reflected in the actual name of the grammatical rule.
Note that the ternary operator ??:: has to be parsed as an infix
?? operator with a special parsing rule to find the associated
:: part. I'm not gonna explain that here, partly because
user-defined ternary operators are discouraged, and partly because
I haven't actually bothered to figure out the details yet. This
Apocalypse is already late enough.
Also please note that it's perfectly permissible (but not extremely expeditious) to rapidly reduce the Perl grammar to a steaming pile of gopher guts by redefining built-in operators such as commas or parentheses.
Named unaries
As in Perl 5, a named unary operator by default parses with the
same precedence as all other named unary operators like sleep
and rand. Any sub declared with a single scalar argument counts
as a named unary, not just explicit operator definitions. So it
doesn't really matter whether you say:
sub plaster ($x) {...}
or:
sub prefix:plaster ($x) {...}
Argumentless subs
As in Perl 5, a 0-ary subroutine (one with a () signature) parses
without looking for any argument at all, much like the time
built-in. (An optional pair of empty parens is allowed on the call,
as in time().) Constant subs with a null signature will likely
be inlined as they are in Perl 5, though the preferred way to declare
constants will be as standard variables with the is constant trait.
Matching of forward declarations
If you define a subroutine for which you earlier had a stub declaration, its signature and traits must match the stub's subroutine signature and traits, or it will be considered to be declaring a different subroutine of the same name, which may be any of illegal, immoral, or fattening. In the case of standard subs, it would be illegal, but in the case of multimethods, it would merely be fattening. (Well, you'd also get a warning if you called the stub instead of the "real" definition.)
The declaration and the definition should have the same defaults. That does not just mean that they should merely look the same. If you say:
our $x = 1;
sub foo ($y = $x) {...} # default to package var
{
    my $x = 2;
    sub foo ($y = $x) { print $y } # default to lexical var
    foo();
}
then what you've said is an error if the compiler can catch it, and is erroneous if it can't. In any event, the program may correctly print any of these values:
1
2
1.5
12
1|2
(1,2)
Thbthbthbthth...
1|2|1.5|12|(1|2)|(1,2)|Thbthbthbthth...
Lvalue subroutines
The purpose of an lvalue subroutine is to return a "proxy"--that is, to return an object that represents a "single evaluation" of the subroutine while actually allowing multiple accesses within a single transaction. An lvalue subroutine has to pretend to be a storage location, with all the rights, privileges, and responsibilities pertaining thereto. But it has to do this without repeatedly calculating the identity of whatever it is you're actually modifying underneath--especially if that calculation entails side effects. (Or is expensive--meaning that it has the side-effect of chewing up computer resources...)
An lvalue subroutine is declared with the is rw trait. The compiler
will take whatever steps are necessary to ensure that the returned value
references a storage location that can be treated as an lvalue.
If you merely return a variable (such as an object attribute), that
variable can act as its own proxy. You can also return the result
of a call to another lvalue subroutine or method. If you need to do
pre- or post-processing on the "public" value, however, you'll need
to return a tied proxy variable.
But if you know how hard it is to tie variables in Perl 5, you'll be pleasantly surprised that we're providing some syntactic relief for the common cases. In particular, you can say something like:
my sub thingie is rw {
    return my $var
        is Proxy( for => $hidden_var,
                  FETCH => { ... },
                  STORE => { ... },
                  TEMP => { ... },
                  ...
        );
}
in order to generate a tie class on the fly, and only override the standard proxy methods you need to, while letting others default to doing the standard behavior. This is particularly important when proxying things like arrays and hashes that have oodles of potential service routines.
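The "override only what you need" idea can be modeled in Python like this. The `Proxy` class below is a hypothetical stand-in, not the Perl 6 trait: it assumes the hidden variable is a dictionary slot, that `FETCH` post-processes the stored value on the way out, and that `STORE` pre-processes the public value on the way in. Hooks that are not supplied default to passing the value through unchanged.

```python
class Proxy:
    """Stands in for a storage location, with optional FETCH/STORE hooks."""

    def __init__(self, storage, key, FETCH=None, STORE=None):
        self._storage = storage   # the hidden variable lives in this dict
        self._key = key
        # Any hook left out falls back to the standard behavior.
        self._fetch = FETCH if FETCH is not None else (lambda value: value)
        self._store = STORE if STORE is not None else (lambda value: value)

    def get(self):
        """Read the public value, post-processed by FETCH."""
        return self._fetch(self._storage[self._key])

    def set(self, value):
        """Write a public value, pre-processed by STORE."""
        self._storage[self._key] = self._store(value)


storage = {"celsius": 100.0}
fahrenheit = Proxy(
    storage,
    "celsius",
    FETCH=lambda c: c * 9 / 5 + 32,
    STORE=lambda f: (f - 32) * 5 / 9,
)
plain = Proxy(storage, "celsius")  # no overrides: acts as the variable itself
```

Both proxies refer to the same hidden slot, so a value written through `fahrenheit` is visible, converted, through `plain`.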
But in particular, note that we want to be able to temporize object
attributes, which is why there's a TEMP method in our proxy. In Perl 5
you could only temporize (localize) variables. But we want accessors
to be usable exactly as if they were variables, which implies that
temporization is part of the interface. When you use a temp
or let context specifier:
temp $obj.foo = 42;
let $obj.bar = 43;
the proxy attribute returned by the lvalue method needs to know how to
temporize the value. More precisely, it needs to know how to restore
the old value at the end of the dynamic scope. So what the .TEMP
method returns is a closure that knows how to restore the old value.
As a closure, it can simply keep the old value in a lexical created
by .TEMP. The same method is called for both temp and let.
The only difference is that temp executes the returned closure
unconditionally at end of scope, while let executes the closure
conditionally only upon failure (where failure is defined as throwing
a non-control exception or returning undef in scalar context or ()
in list context).
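The division of labor between .TEMP and the temp and let specifiers can be sketched with Python context managers. All names here (`Slot`, `temp`, `let`) are hypothetical models of the behavior described above, and "failure" is narrowed to "an ordinary exception escapes the block"; the undef and empty-list cases are not modeled.

```python
from contextlib import contextmanager


class Slot:
    """A storage location that knows how to temporize itself."""

    def __init__(self, value):
        self.value = value

    def TEMP(self):
        old = self.value          # the old value lives in the closure

        def restore():
            self.value = old

        return restore


@contextmanager
def temp(slot, new_value):
    """Restore the old value unconditionally at end of scope."""
    restore = slot.TEMP()
    slot.value = new_value
    try:
        yield slot
    finally:
        restore()


@contextmanager
def let(slot, new_value):
    """Restore the old value only if the scope fails."""
    restore = slot.TEMP()
    slot.value = new_value
    try:
        yield slot
    except Exception:
        restore()
        raise
```

A short usage example: a `with temp(slot, 42):` block sees 42 inside and the original value afterwards, while a `with let(slot, 43):` block that finishes normally leaves 43 in place. In both cases the caller never touches the closure returned by `TEMP`.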
After the .TEMP method returns the closure, you never have to worry
about it again. The temp or let will squirrel away the closure
and execute it later when appropriate. That's where the real power of
temp and let comes from--they're fire-and-forget operators.
The standard Scalar, Array, and Hash classes also have
a .TEMP method (or equivalent). So any such variable can be
temporized, even lexicals:
my $identity = 'Clark Kent';
for @emergencies {
    temp $identity = 'SUPERMAN'; # still the lexical $identity
    ...
}
print $identity; # prints 'Clark Kent'
We'll talk more about lvalues below in reference to various RFCs that
espouse lvalue subs--all of which were rejected. :-)

