Apocalypse 6
by Larry Wall

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 06 for the latest information.

Subs with special parsing

Any macro can have special parsing rules if you use the is parsed trait. But some subs are automatically treated specially.

Operator subroutines

In Perl 6, operators are just subroutines with special names. When you say:


    -$a + $b

you're really doing this internally:


    infix:+( prefix:-($a), $b)
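
To make that desugaring concrete, here is a minimal sketch in Python (not Perl 6) that treats operators as ordinary functions stored under their long names. The `OPERATORS` table and the `call` helper are hypothetical names invented for this illustration; nothing here reflects how a real Perl 6 compiler stores its operators.

```python
import operator

# Hypothetical table: operator "long names" mapped to plain functions.
OPERATORS = {
    "prefix:-": operator.neg,
    "infix:+": operator.add,
}

def call(name, *args):
    """Look up an operator by its long name and call it like any other sub."""
    return OPERATORS[name](*args)

a, b = 3, 10
# -$a + $b  is treated as  infix:+( prefix:-($a), $b )
result = call("infix:+", call("prefix:-", a), b)
print(result)
```

The point of the sketch is only that the surface syntax and the nested call are the same computation.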

Operator names start with one of four names followed by a colon:


    prefix:     a unary prefix operator
    infix:      a binary infix operator
    postfix:    a unary suffix operator
    circumfix:  a bracketing operator

Everything after the colon and up to the next whitespace or left parenthesis will be taken as the spelling of the actual operator. Unicode is specifically allowed. The null operator is not allowed, so if the first thing after the colon is a left parenthesis, it is part of the operator, and if the first thing is whitespace, it's an illegal name. Boom!
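
The naming rule above can be sketched as a small scanner. This is a Python model, not Perl 6; the function name `split_operator_name` and the `CATEGORIES` tuple are assumptions made for the example.

```python
CATEGORIES = ("prefix", "infix", "postfix", "circumfix")

def split_operator_name(text):
    """Split 'category:spelling...' into (category, spelling).

    The spelling runs from the colon up to the next whitespace or left
    parenthesis, except that the first character always belongs to the
    operator (so a leading '(' is part of the spelling).
    """
    category, sep, rest = text.partition(":")
    if not sep or category not in CATEGORIES:
        raise ValueError(f"not an operator name: {text!r}")
    if not rest or rest[0].isspace():
        raise ValueError("the null operator is not allowed")
    end = 1  # the first character is unconditionally part of the spelling
    while end < len(rest) and not rest[end].isspace() and rest[end] != "(":
        end += 1
    return category, rest[:end]

print(split_operator_name("infix:+( prefix:-($a), $b)"))
print(split_operator_name("circumfix:(**) ()"))
```

Whitespace directly after the colon raises an error, which is this sketch's version of "Boom!".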

You can make your own lexically scoped operators like this:


    my sub postfix:! (Int $x) { return factorial($x) }
    print 5!, "\n";     # prints 120

You can use a newly declared operator recursively as soon as its name is introduced, including in its own definition:


    my sub postfix:! (Int $x) { $x<=1 ?? 1 :: $x*($x-1)! }

You can declare multimethods that create new syntax like this:


    multi postfix:! (Int $x) { $x<=1 ?? 1 :: $x*($x-1)! }

However, regardless of the scope of the name, the new syntax is considered to be a lexically scoped declaration, and is only valid after the name is declared (or imported) and after any precedence traits have been parsed.

If you want to specify a precedence, you always do it relative to some existing operator:


    multi infix:coddle   (PDL $a, PDL $b) is equiv(&infix:+) { ... }
    multi infix:poach    (PDL $a, PDL $b) is looser(&infix:+) { ... }
    multi infix:scramble (PDL $a, PDL $b) is tighter(&infix:+) { ... }

If you base a tighter operator on a looser one, or a looser one on a tighter one, you don't get back to where you were. It always goes into the cracks no matter how many times you derive.

Just a note on implementation: if you've played with numerically oriented precedence tables in the past, you may be thinking, "But he'll run out of bits in his number eventually." The answer to that is that we don't use precedence numbers. The actual precedence level can be represented internally by an arbitrarily long string of bytes that are compared byte by byte. When you make a tighter or looser operator, the string just gets one byte longer. A looser looser looser looser infix:* is still tighter than a tighter tighter tighter tighter infix:+, because the string comparison bails out on the first byte. The first byte compares the built-in multiplication operator against the built-in addition operator, and those are already different, so we don't have to compare any more.
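
Here is one way the byte-string scheme could work, sketched in Python. The text only says that levels are byte strings compared byte by byte and that each derivation adds one byte; the particular byte values (`LOOSER`, `MID`, `TIGHTER`), the trailing `MID` pad used for comparison, and the stand-in levels `ADD` and `MUL` are all assumptions of this sketch.

```python
# Assumed byte values: LOOSER < MID < TIGHTER.
LOOSER, MID, TIGHTER = 0x40, 0x80, 0xC0

def tighter(level: bytes) -> bytes:
    """Derive a level just tighter than `level` (one byte longer)."""
    return level + bytes([TIGHTER])

def looser(level: bytes) -> bytes:
    """Derive a level just looser than `level` (one byte longer)."""
    return level + bytes([LOOSER])

def key(level: bytes) -> bytes:
    """Comparison key: pad with MID so that a level sorts between
    its own looser and tighter derivatives."""
    return level + bytes([MID])

def derive(level: bytes, *steps) -> bytes:
    """Apply a sequence of tighter/looser steps to a level."""
    for step in steps:
        level = step(level)
    return level

ADD = bytes([0x30])  # stand-in for the built-in infix:+
MUL = bytes([0x50])  # stand-in for the built-in infix:*

loose_mul = derive(MUL, looser, looser, looser, looser)
tight_add = derive(ADD, tighter, tighter, tighter, tighter)
print(key(loose_mul) > key(tight_add))   # decided on the very first byte

# A looser operator based on a tighter one lands in the crack between
# the original and its tighter derivative, never back on the original.
crack = derive(ADD, tighter, looser)
print(key(ADD) < key(crack) < key(tighter(ADD)))
```

Python compares `bytes` lexicographically, so the ordinary `<` and `>` operators do the byte-by-byte comparison for us.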

However, two operators derived by the same path have the same precedence. All binary operators of a given precedence level are assumed to be left associative unless declared otherwise with an assoc('right') or assoc('non') trait. (Unaries pay no attention to associativity--they always go from the outside in.)
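
As a rough illustration of what the three associativities mean for a chain of same-precedence operators, here is a Python sketch that nests the operands into a tuple tree. The `group` function and its tuple representation are inventions of this example, not anything the Perl 6 parser is known to do.

```python
def group(operands, op, assoc="left"):
    """Nest the chain `a op b op c ...` into (op, left, right) tuples."""
    if assoc not in ("left", "right", "non"):
        raise ValueError(f"unknown associativity: {assoc!r}")
    if assoc == "non" and len(operands) > 2:
        raise SyntaxError(f"{op} is non-associative and cannot be chained")
    if assoc == "right":
        tree = operands[-1]
        for left in reversed(operands[:-1]):
            tree = (op, left, tree)   # a op (b op c)
        return tree
    tree = operands[0]
    for right in operands[1:]:
        tree = (op, tree, right)      # (a op b) op c
    return tree

print(group(["a", "b", "c"], "-"))                 # left is the default
print(group(["a", "b", "c"], "**", assoc="right"))
```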

This may sound complicated, and it is, if you're implementing it internally. But from the user's point of view, it's much less complicated than trying to keep track of numeric precedence levels yourself. By making the precedence levels relative to existing operators, we keep the user from having to think about how to keep those cracks open. And most user-defined operators will have exactly the same precedence as something built-in anyway. Not to mention the fact that it's just plain better documentation to say that an operator works like a familiar operator such as "+". Who the heck can remember what precedence level 17 is, anyway?

If you don't specify a precedence on an operator, it will default to something reasonable. A named unary operator, whether prefix or postfix, will default to the same precedence as other named unary operators like abs(). Symbolic unaries default to the same precedence as unary + or - (hence the ! in our factorial example is tighter than the * of multiplication). Binaries default to the same precedence as binary + or -. So in our coddle example above, the is equiv(&infix:+) is completely redundant.
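
Those defaulting rules fit in a few lines. The Python sketch below returns the name of the built-in whose precedence a new operator would borrow. The function name is hypothetical, and the test for a "named" operator (spelling starts with a letter or underscore) is an assumption; the text does not say what circumfix operators default to, so the sketch refuses them.

```python
def default_precedence(category, spelling):
    """Return the built-in operator whose precedence is borrowed by default."""
    if category == "infix":
        return "infix:+"                      # same as binary + or -
    if category in ("prefix", "postfix"):
        # Assumption: "named" means the spelling looks like an identifier.
        named = spelling[0].isalpha() or spelling[0] == "_"
        return "abs" if named else "prefix:+"  # named unary vs. symbolic unary
    raise ValueError(f"no default described for category {category!r}")

print(default_precedence("postfix", "!"))
print(default_precedence("prefix", "plaster"))
print(default_precedence("infix", "coddle"))
```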

Unless it's completely wrong. For multimethods, it's an error to specify two different precedences for the same name. Multimethods that overload an existing name will be assumed to have the same precedence as the existing name.

You'll note that the rules for the scope of syntax warping are similar to those for macros. In essence, these definitions are macros, but specialized ones. If you declare one as a macro, the body is executed at compile time, and returns a string, a parse tree, or a closure just as a macro would:


    # define Pascal comments:
    macro circumfix:(**) () is parsed(/.*?/) { "" }
                                # "Comment? What comment?"

A circumfix operator is assumed to be split symmetrically between prefix and postfix. In this case the circumfix of four characters is split exactly in two, but if you don't want it split in the middle (which is particularly gruesome when there's an odd number of characters) you may specify exactly where the parse rule is interpolated with a special ... marker, which is considered part of the name:


    macro circumfix:(*...*) () is parsed(/.*?/) { "" }
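
The splitting rule can be sketched in Python as follows. The function name `split_circumfix` is made up for this example, and rejecting an odd-length spelling that has no ... marker is a choice of the sketch; the text only calls that case gruesome.

```python
def split_circumfix(spelling):
    """Split a circumfix spelling into its (opener, closer) halves."""
    marker = "..."
    if marker in spelling:
        # An explicit marker says exactly where the parse rule goes.
        opener, closer = spelling.split(marker, 1)
    else:
        if len(spelling) % 2:
            # Sketch's choice: refuse rather than guess the middle.
            raise ValueError("odd-length circumfix needs a ... marker")
        half = len(spelling) // 2
        opener, closer = spelling[:half], spelling[half:]
    if not opener or not closer:
        raise ValueError("both halves of a circumfix must be non-empty")
    return opener, closer

print(split_circumfix("(**)"))
print(split_circumfix("(*...*)"))
print(split_circumfix("<!--...-->"))
```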

The default parse rule for a circumfix is an ordinary Perl expression of lowest precedence, the same one Perl uses inside ordinary parentheses. The defaults for other kinds of operators depend on the precedence of the operator, which may or may not be reflected in the actual name of the grammatical rule.

Note that the ternary operator ??:: has to be parsed as an infix ?? operator with a special parsing rule to find the associated :: part. I'm not gonna explain that here, partly because user-defined ternary operators are discouraged, and partly because I haven't actually bothered to figure out the details yet. This Apocalypse is already late enough.

Also please note that it's perfectly permissible (but not extremely expeditious) to rapidly reduce the Perl grammar to a steaming pile of gopher guts by redefining built-in operators such as commas or parentheses.

Named unaries

As in Perl 5, a named unary operator by default parses with the same precedence as all other named unary operators like sleep and rand. Any sub declared with a single scalar argument counts as a named unary, not just explicit operator definitions. So it doesn't really matter whether you say:


    sub plaster ($x) {...}

or:


    sub prefix:plaster ($x) {...}

Argumentless subs

As in Perl 5, a 0-ary subroutine (one with a () signature) parses without looking for any argument at all, much like the time built-in. (An optional pair of empty parens is allowed on the call, as in time().) Constant subs with a null signature will likely be inlined as they are in Perl 5, though the preferred way to declare constants will be as standard variables with the is constant trait.

Matching of forward declarations

If you define a subroutine for which you earlier had a stub declaration, its signature and traits must match the stub's subroutine signature and traits, or it will be considered to be declaring a different subroutine of the same name, which may be any of illegal, immoral, or fattening. In the case of standard subs, it would be illegal, but in the case of multimethods, it would merely be fattening. (Well, you'd also get a warning if you called the stub instead of the "real" definition.)

The declaration and the definition should have the same defaults. That does not just mean that they should merely look the same. If you say:


    our $x = 1;
    sub foo ($y = $x) {...}             # default to package var

    {
        my $x = 2;
        sub foo ($y = $x) { print $y }  # default to lexical var
        foo();
    }

then what you've said is an error if the compiler can catch it, and is erroneous if it can't. In any event, the program may correctly print any of these values:


    1
    2
    1.5
    12
    1|2
    (1,2)
    Thbthbthbthth...
    1|2|1.5|12|(1|2)|(1,2)|Thbthbthbthth...

Lvalue subroutines

The purpose of an lvalue subroutine is to return a "proxy"--that is, to return an object that represents a "single evaluation" of the subroutine while actually allowing multiple accesses within a single transaction. An lvalue subroutine has to pretend to be a storage location, with all the rights, privileges, and responsibilities pertaining thereto. But it has to do this without repeatedly calculating the identity of whatever it is you're actually modifying underneath--especially if that calculation entails side effects. (Or is expensive--meaning that it has the side-effect of chewing up computer resources...)

An lvalue subroutine is declared with the is rw trait. The compiler will take whatever steps necessary to ensure that the returned value references a storage location that can be treated as an lvalue. If you merely return a variable (such as an object attribute), that variable can act as its own proxy. You can also return the result of a call to another lvalue subroutine or method. If you need to do pre- or post-processing on the "public" value, however, you'll need to return a tied proxy variable.

But if you know how hard it is to tie variables in Perl 5, you'll be pleasantly surprised that we're providing some syntactic relief for the common cases. In particular, you can say something like:


    my sub thingie is rw {
        return my $var
            is Proxy( for => $hidden_var,
                      FETCH => { ... },
                      STORE => { ... },
                      TEMP  => { ... },
                      ...
            );
    }

in order to generate a tie class on the fly, and only override the standard proxy methods you need to, while letting others default to doing the standard behavior. This is particularly important when proxying things like arrays and hashes that have oodles of potential service routines.
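
For readers who think better in running code, here is a loose Python analogue of such a proxy: a stand-in for a storage location whose read and write hooks default to plain access unless you override them. The `Proxy` class below, its constructor arguments, and the Celsius/Fahrenheit example are all assumptions of this sketch; in particular it models FETCH and STORE as simple value filters, which the text does not specify.

```python
class Proxy:
    """Stand-in for a storage location with optional FETCH/STORE hooks."""

    def __init__(self, target, key, FETCH=None, STORE=None):
        self._target = target                        # dict holding the hidden value
        self._key = key
        self._fetch = FETCH or (lambda raw: raw)     # default: read as-is
        self._store = STORE or (lambda new: new)     # default: write as-is

    @property
    def value(self):
        return self._fetch(self._target[self._key])

    @value.setter
    def value(self, new):
        self._target[self._key] = self._store(new)

hidden = {"celsius": 20.0}

def thingie():
    """An 'lvalue sub': returns a proxy that post-processes reads and
    pre-processes writes (here, a Fahrenheit view of a Celsius value)."""
    return Proxy(hidden, "celsius",
                 FETCH=lambda c: c * 9 / 5 + 32,
                 STORE=lambda f: (f - 32) * 5 / 9)

print(thingie().value)      # the public (Fahrenheit) view
thingie().value = 212       # assignment goes through STORE
print(hidden["celsius"])    # the hidden variable was updated
```

A proxy created without hooks, as in `Proxy(hidden, "celsius")`, behaves like the variable itself, which corresponds to letting the standard behavior default.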

But in particular, note that we want to be able to temporize object attributes, which is why there's a TEMP method in our proxy. In Perl 5 you could only temporize (localize) variables. But we want accessors to be usable exactly as if they were variables, which implies that temporization is part of the interface. When you use a temp or let context specifier:


    temp $obj.foo = 42;
    let $obj.bar = 43;

the proxy attribute returned by the lvalue method needs to know how to temporize the value. More precisely, it needs to know how to restore the old value at the end of the dynamic scope. So what the .TEMP method returns is a closure that knows how to restore the old value. As a closure, it can simply keep the old value in a lexical created by .TEMP. The same method is called for both temp and let. The only difference is that temp executes the returned closure unconditionally at end of scope, while let executes the closure conditionally only upon failure (where failure is defined as throwing a non-control exception or returning undef in scalar context or () in list context).

After the .TEMP method returns the closure, you never have to worry about it again. The temp or let will squirrel away the closure and execute it later when appropriate. That's where the real power of temp and let comes from--they're fire-and-forget operators.
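
The division of labor between .TEMP and the temp and let specifiers can be modeled in Python with context managers. Everything named here (`Cell`, `temp`, `let`) is hypothetical, and the sketch simplifies "failure" to mean only that an exception escaped the block; the undef and () cases described above are not modeled.

```python
from contextlib import contextmanager

class Cell:
    """A temporizable storage location."""

    def __init__(self, value):
        self.value = value

    def TEMP(self):
        old = self.value            # the old value lives in the closure

        def restore():
            self.value = old
        return restore

@contextmanager
def temp(cell, new_value):
    """Restore unconditionally at end of scope."""
    restore = cell.TEMP()
    cell.value = new_value
    try:
        yield cell
    finally:
        restore()

@contextmanager
def let(cell, new_value):
    """Restore only if the scope fails (here: an exception escapes)."""
    restore = cell.TEMP()
    cell.value = new_value
    try:
        yield cell
    except Exception:
        restore()
        raise

identity = Cell("Clark Kent")
with temp(identity, "SUPERMAN"):
    print(identity.value)           # inside the scope
print(identity.value)               # restored afterwards
```

Both specifiers call the same TEMP method and differ only in when they run the closure it returns, which is the fire-and-forget property described above.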

The standard Scalar, Array, and Hash classes also have a .TEMP method (or equivalent). So any such variable can be temporized, even lexicals:


    my $identity = 'Clark Kent';

    for @emergencies {
         temp $identity = 'SUPERMAN';   # still the lexical $identity
         ...
    }

    print $identity;    # prints 'Clark Kent'

We'll talk more about lvalues below in reference to various RFCs that espouse lvalue subs--all of which were rejected. :-)
