Apocalypse 6
by Larry Wall

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 06 for the latest information.

Rules

Rules were discussed in Apocalypse 5. They are essentially methods with an implicit invocant: the object containing the current pattern-matching context. To match the internals of regex syntax, traits attached to rules are typically written as ":w" rather than "is w", but they're essentially the same thing underneath.

It's possible to call a rule as if it were a method, as long as you give it the right arguments. And a method defined in a grammar can be called as if it were a rule. They share the same namespace, and a rule really is just a method with a funny syntax.

Macros

A macro is a function that is called immediately upon completion of the parsing of its arguments. Macros must be defined before they are used--there are no forward declarations of macros, and while a macro's name may be installed in either a package or a lexical scope, its syntactic effect can only be lexical, from the point of declaration (or importation) to the end of the current lexical scope.

Every macro is associated (implicitly or explicitly) with a particular grammar rule that parses and reduces the arguments to the macro. The formal parameters of a macro are special in that they must be derived somehow from the results of that associated grammar rule. We treat macros as if they were methods on the parse object returned by the grammar rule, so the first argument is passed as if it were an invocant, and it is always bound to the current parse tree object, known as $0 in Apocalypse 5. (A macro is not a true method of that class, however, because its name is in your scope, not the class's.)

Since the first parameter is treated as an invocant, you may either declare it or leave it implicit in the actual declaration. In either case, the parse tree becomes the current topic for the macro. Hence you may refer to it as either $_ or $0, even if you don't give it a name.

Subsequent parameters may be specified, in which case they bind to internal values of $0 in whatever way makes sense. Positional parameters bind to $1, $2, etc. Named parameters bind to named elements of $0. A slurpy hash is really the same as $0, since $0 already behaves as a hash. A slurpy array gets $1, $2, etc., even if already bound to a positional parameter.
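As a rough illustration of these binding rules, here is a minimal Python sketch (an analogy, not Perl 6). The names `ParseTree` and `bind_macro_params` are hypothetical, and the parse tree is modeled simply as a list of numbered captures plus a dictionary of named captures.

```python
from dataclasses import dataclass, field


@dataclass
class ParseTree:
    """Hypothetical stand-in for $0: numbered and named captures."""
    positional: list = field(default_factory=list)  # plays the role of $1, $2, ...
    named: dict = field(default_factory=dict)       # named elements of $0


def bind_macro_params(tree, positional_names=(), named_names=(),
                      slurpy_array=None, slurpy_hash=None):
    """Return a dict mapping a macro's formal parameters to parts of the tree."""
    bound = {"_": tree}  # the topic is always the parse tree itself
    # Positional parameters bind to $1, $2, ... in order.
    for name, value in zip(positional_names, tree.positional):
        bound[name] = value
    # Named parameters bind to named elements of the tree.
    for name in named_names:
        bound[name] = tree.named.get(name)
    # A slurpy array gets every numbered capture, even ones already bound.
    if slurpy_array is not None:
        bound[slurpy_array] = list(tree.positional)
    # A slurpy hash is really the same thing as the tree.
    if slurpy_hash is not None:
        bound[slurpy_hash] = tree
    return bound
```

Note that the slurpy array repeats the first capture even though a positional parameter already claimed it, which is the behavior the paragraph above describes.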

A macro can do anything it likes with the parse tree, but the return value is treated specially by the parser. You can return one of several kinds of values:

  • A parse tree (the same one, a modified one, or a synthetic one) to be passed up to the outer grammar rule that was doing pattern matching when we hit the macro.
  • A closure functioning as a generic routine that is to be immediately inlined, treating the closure as a template. Within the template, any variable referring back to one of the macro's parse parameters will interpolate that parameter's value at that point in the template. (It will be interpolated as a parse tree, a string, or a number depending on the declaration of the parameter.) Any variable not referring back to a parameter is left alone, so that your template can declare its own lexical variables, or refer to a package variable.
  • A string, to be shoved into the input stream and reparsed at the point the macro was found, starting in exactly the same grammar state we were before the macro. This is slightly different from returning the same string parsed into a parse tree, because a parse tree must represent a complete construct at some level, while the string could introduce a construct without terminating it. This is the most dangerous kind of return value, and the least likely to produce coherent error messages with decent line numbers for the end user. But it's also very powerful. Hee, hee.
  • An undef, indicating that the macro is only used for its side effects. Such a macro would be one way of introducing an alternate commenting mechanism, for instance. I suppose returning "" has the same effect, though.
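The four kinds of return value can be pictured as a dispatch on the type of the result. The following Python sketch is only an analogy under stated assumptions: `Node` is a hypothetical parse tree class, and the action names returned by `classify_macro_result` are invented labels, not anything a real Perl 6 parser exposes.

```python
class Node:
    """Hypothetical parse tree node."""

    def __init__(self, kind, children=()):
        self.kind = kind
        self.children = list(children)


def classify_macro_result(result):
    """Name the action a parser would take for a macro's return value."""
    if result is None or result == "":
        return "discard"           # macro was used only for its side effects
    if isinstance(result, Node):
        return "splice-tree"       # hand the tree up to the outer grammar rule
    if isinstance(result, str):
        return "reparse-text"      # push the text back into the input stream
    if callable(result):
        return "inline-template"   # treat the closure as a template to inline
    raise TypeError(f"unsupported macro result: {result!r}")
```

Under this model, a macro that returns the string "self" falls into the "reparse-text" case, while one that returns nothing is simply discarded.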

A macro by default parses any subsequent text using whatever macro rule is currently in effect. Generally this will be the standard Perl::macro rule, which parses subsequent arguments as a list operator would--that is, as a comma-separated list with the same policy on using or omitting parentheses as any other list operator. This default may be overridden with the "is parsed" trait.

If there is no signature at all, macro defaults to using the null rule, meaning it looks for no argument at all. You can use it for simple word substitutions where no argument processing is needed. Instead of the long-winded:


    my macro this () is parsed(/<null>/) { "self" }

you can just quietly turn your program into C++:


    my macro this { "self" }

A lot of Perl is fun, and macros are fun, but in general, you should never use a macro just for the fun of it. It's far too easy to poke someone's eye out with a macro.

Out-of-band parameters

Certain kinds of routines want extra parameters in addition to the ordinary parameter list. Autoloading routines for instance would like to know what function the caller was trying to call. Routines sensitive to topicalizers may wish to know what the topic is in their caller's lexical scope.

There are several possible approaches. The Perl 5 autoloader actually pokes a package variable into the package with the AUTOLOAD subroutine. It could be argued that something that's in your dynamic scope should be accessed via dynamically scoped variables, and indeed we may end up with a $*AUTOLOAD variable in Perl 6 that works somewhat like Perl 5's, only better, because AUTOLOAD kinda sucks. We'll address that in Apocalypse 10, for some definition of "we".

Another approach is to give access to the caller's lexical scope in some fashion. The magical caller() function could return a handle by which you can access the caller's my variables. And in general, there will be such a facility under the hood, because we have to be able to construct the caller's lexical scope while it's being compiled.

In the particular case of grabbing the topic from the caller's lexical scope (and it has to be in the caller's lexical scope because $_ is now lexically scoped in Perl 6), we think it'll happen often enough that there should be a shorthand for it. Or maybe it's more like a "midhand". We don't want it too short, or people will unthinkingly abuse it. Something on the order of a CALLER:: prefix, which we'll discuss below.

Lexical context

Works just like in Perl 5. Why change something that works?

Well, okay, we are tweaking a few things related to lexical scopes. $_ (also known as the current topic) is always a lexically scoped variable now. In general, each subroutine will implicitly declare its own $_. Methods, submethods, macros, rules, and pointy subs all bind their first argument to $_; ordinary subs declare a lexical $_ but leave it undefined. Every sub definition declares its own $_ and hides any outer $_. The only exception is bare closures that are pretending to be ordinary blocks and don't commandeer $_ for a placeholder. These continue to see the outer scope's $_, just as they would any other lexically scoped variable declared in the outer scope.

Dynamic context

On the flipside, $_ is no longer visible in the dynamic context. You can still temporize (localize) it, but you'll be temporizing the current subroutine's lexical $_, not the global $_. Routines which used to use dynamic scoping to view the $_ of a calling subroutine will need some tweaking. See CALLER:: below.

The caller function

As in Perl 5, the caller function will return information about the dynamic context of the current subroutine. Rather than always returning a list, it will return an object that represents the selected caller's context. (In a list context, the object can still return the old list as Perl 5-ers are used to.) Since contexts are polymorphic, different context objects might in fact supply different methods. The caller function doesn't have to know anything about that, though.

What caller does know in Perl 6 is that it takes an optional argument. That argument says where to stop when scanning up the call stack, and so can be used to tell caller which kinds of context you're interested in. By default, it'll skip any "wrapper" functions (see "The .wrap method" below) and return the outermost context that thought it was calling your routine directly. Here's a possible declaration:


    multi *caller (?$where = &CALLER::_, Int +$skip = 0, Str +$label)
        returns CallerContext {...}

The $where argument can be anything that matches a particular context, including a subroutine reference or any of these Code types:


    Code Routine Block Sub Method Submethod Multi Macro Bare Parametric

&_ produces a reference to your current Routine, though in the signature above we have to use &CALLER::_ to get at the caller's &_.

Note that use of caller can prevent certain kinds of optimizations, such as tail recursion elimination.
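To make the stack scan concrete, here is a small Python model (an analogy, not the Perl 6 implementation). The call stack is represented explicitly as a list of hypothetical `Frame` records, innermost caller first. The treatment of `skip` as "pass over this many matching frames" is an assumption, since the text above does not define it, and the `label` argument is left out for the same reason.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical record for one entry on the call stack."""
    name: str
    kind: str                 # e.g. "Sub", "Method", "Block"
    is_wrapper: bool = False  # True for wrapper functions


def caller(stack, where=None, skip=0):
    """Return the first acceptable frame, scanning outward from the callee.

    `stack` lists the callers of the current routine, innermost first.
    `where` is an optional predicate selecting the kind of context wanted.
    """
    for frame in stack:
        if frame.is_wrapper:
            continue              # wrappers are skipped by default
        if where is not None and not where(frame):
            continue              # not the kind of context asked for
        if skip > 0:
            skip -= 1             # assumption: skip counts matching frames
            continue
        return frame
    return None                   # nothing on the stack matched
```

For example, with a stack of a wrapper, a block, a method, and a sub, `caller(stack)` yields the block, while `caller(stack, where=lambda f: f.kind == "Method")` yields the method.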

The want function

The want function is really just the caller function in disguise. It also takes an argument telling it which context to pay attention to, which defaults to the one you think it should default to. It's declared like this:


    multi *want (?$where = &CALLER::_, Int +$skip = 0, Str +$label)
        returns WantContext {...}

Note that, as a variant of caller, use of want can prevent certain kinds of optimizations.

When want is called in a scalar context:


        $primary_context = want;

it returns a synthetic object whose type behaves as the junction of all the valid contexts currently in effect, whose numeric overloading returns the count of arguments expected, and whose string overloading produces the primary context as one of 'Void', 'Scalar', or 'List'. The boolean overloading produces true unless in a void context.

When want is called in a list context like this:


        ($primary, $count, @secondary) = want;

it returns a list of at least two values, indicating the contexts in which the current subroutine was called. The first two values in the list are the primary context (i.e., the scalar return value) and the expectation count (see Expectation counts below). Any extra contexts that want may detect (see Valid contexts below) are appended to these two items.
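The scalar and list forms can be modeled with an object that overloads its string, numeric, and boolean conversions. This Python sketch is an analogy only: the class name `WantContext` is borrowed from the declaration above, but its constructor and the `as_list` method are hypothetical.

```python
class WantContext:
    """Toy model of the object returned by want."""

    def __init__(self, primary, count, secondary=()):
        self.primary = primary            # 'Void', 'Scalar', or 'List'
        self.count = count                # expectation count
        self.secondary = tuple(secondary)

    def __str__(self):
        return self.primary               # string value: the primary context

    def __int__(self):
        return self.count                 # numeric value: the expectation count

    def __bool__(self):
        return self.primary != "Void"     # true unless in a void context

    def as_list(self):
        """The list form: primary context, count, then any extra contexts."""
        return [self.primary, self.count, *self.secondary]
```

A caller in this model might write `primary, count, *secondary = w.as_list()` to mirror the list assignment shown above.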

When want is used as an object, it has methods corresponding to its valid contexts:


        if want.rw { ... }
        unless want.count < 2 { ... }
        when want.List { ... }

The want function can be used with smart matching:


        if want ~~ List & 2 & Lvalue { ... }

Which means it can also be used in a switch:


    given want {
        when List & 2 & Lvalue { ... }
        when .count > 2 {...}
    }
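The junction in these matches requires every operand to hold at once. A rough Python analogy follows, with hypothetical names throughout (`Want`, `matches_all`, `describe`). It assumes that matching against a number compares it with the expectation count and that matching against a context name tests membership in the set of valid contexts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Want:
    """Toy want object: the valid contexts plus the expectation count."""
    contexts: frozenset
    count: int


def matches_all(want, *conditions):
    """True only if every condition holds, like an 'all' junction."""
    for cond in conditions:
        if isinstance(cond, int):
            if want.count != cond:         # numbers match the count
                return False
        elif cond not in want.contexts:    # names match a valid context
            return False
    return True


def describe(want):
    """Mimic the given/when switch: the first matching branch wins."""
    if matches_all(want, "List", 2, "Lvalue"):
        return "two-element lvalue list"
    if want.count > 2:
        return "more than two values"
    return "something else"
```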

The numeric value of the want object is the "expectation count". This is an integer indicating the number of return values expected by the subroutine's caller. For void contexts, the expectation count is always zero; for scalar contexts, it is always zero or one; for list contexts it may be any non-negative number. The want value can simply be used as a number:


    if want >= 2 { return ($x, $y) }         # context wants >= 2 values
    else         { return ($x); }            # context wants < 2 values

If the context is expecting an unspecified number of return values (typically because the result is being assigned to an array variable), the expectation count is Inf. Note that Inf >= 2 is true, so the first branch above is taken. (Inf is not the same as undef.) You shouldn't actually return an infinite list, however, unless want ~~ Lazy. The opposite of Lazy context is Eager context (the Perl 5 list context, which always flattened immediately). Eager and Lazy are subclasses of List.
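The comparison against an infinite expectation count behaves the same way with floating-point infinity in Python. In this small sketch, `results_for` is a hypothetical function that takes the count as an ordinary argument, since Python has no equivalent of want.

```python
import math


def results_for(count, x, y):
    """Return two values if the caller expects at least two, else one."""
    if count >= 2:
        return (x, y)
    return (x,)


# An unbounded expectation count still satisfies the >= 2 test.
unbounded = results_for(math.inf, "a", "b")
single = results_for(1, "a", "b")
```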

The valid contexts are pretty much as listed in RFC 21, though to the extent that the various contexts can be considered types, they can be specified without quotes in smart matches. Also, types are not all-caps any more. We know we have a Scalar type--hopefully we also get types or pseudo-types like Void, List, etc. The List type in particular is an internal type for the temporary lists that are passed around in Perl. Preflattened lists are Eager, while those lists that are not preflattened are Lazy. When you call @array.specs, for instance, you actually get back an object of type Lazy. Lists (Lazy or otherwise) are internal generator objects, and in general you shouldn't be doing operations on them, but on the arrays to which they are bound. The bound array manages its hidden generators on your behalf to "harden" the abstract list into concrete array values on demand.
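The idea of an array hardening a hidden generator on demand can be sketched in Python with an iterator and a cache. This is an analogy with hypothetical names (`Array`, `reified`), and for brevity it supports only non-negative indices.

```python
import itertools


class Array:
    """Toy array that reifies values from a hidden generator on demand."""

    def __init__(self, generator):
        self._gen = iter(generator)  # the hidden lazy list
        self._hard = []              # concrete values produced so far

    def __getitem__(self, index):
        if index < 0:
            raise IndexError("this sketch supports only non-negative indices")
        # Pull just enough values from the generator to reach the index.
        while len(self._hard) <= index:
            try:
                self._hard.append(next(self._gen))
            except StopIteration:
                raise IndexError(index) from None
        return self._hard[index]

    def reified(self):
        """How many elements have been hardened so far."""
        return len(self._hard)


# Binding an infinite generator is safe: nothing is computed until asked for.
squares = Array(n * n for n in itertools.count(1))
```

Asking for `squares[3]` hardens only the first four elements; the rest of the infinite list stays inside the generator.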

The CALLER:: pseudopackage

Just as the SUPER:: pseudopackage lets you name a method somewhere in your set of superclasses, the CALLER:: pseudoclass lets you name a variable that is in the lexical scope of your (dynamically scoped) caller. It may not be used to create a variable that does not already exist in that lexical scope. As such, it is primarily intended for a particular variable that is known to exist in every caller's lexical scope, namely $_. Your caller's current topic is named $CALLER::_. Your caller's current Routine reference is named &CALLER::_.

Note again that, as a form of caller, use of CALLER:: can prevent certain kinds of optimizations. However, if your signature uses $CALLER::_ as a default value, the optimizer may be able to deal with that as a special case. If you say, for instance:


    sub myprint (IO $handle, *@list = ($CALLER::_)) {
        print $handle: *@list;
    }

then the compiler can just turn the call:


    myprint($*OUT);

into:


    myprint($*OUT, $_);
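Python has no CALLER:: prefix, but CPython's frame introspection gives a loose runtime analogy to defaulting an argument from the caller's topic. In this sketch the caller's topic is assumed to live in a local variable named `topic`, and the output handle is modeled as a plain list; both are illustrative choices. Unlike the compile-time rewrite described above, the lookup here happens at run time, which also illustrates why such access can defeat optimizations.

```python
import inspect


def myprint(out, *items):
    """Append items to `out`; with no items, use the caller's `topic`."""
    if not items:
        # Look at the local variables of whoever called us (CPython-specific).
        caller_locals = inspect.currentframe().f_back.f_locals
        if "topic" not in caller_locals:
            # Like CALLER::, this may not create a variable that doesn't exist.
            raise NameError("caller has no variable named 'topic'")
        items = (caller_locals["topic"],)
    out.extend(str(item) for item in items)


def demo():
    topic = "hello"           # plays the role of the caller's $_
    out = []
    myprint(out)              # no items: falls back to the caller's topic
    myprint(out, "a", "b")    # explicit items are used as given
    return out
```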

Our earlier example of trim might want to default the first argument to the caller's $_. In which case you can declare it as:


    sub trim ( Str ?$_ is rw = $CALLER::_, Rule ?$remove = /\s+/ ) {
            s:each/^ <$remove> | <$remove> $//;
    }

which lets you call it like this:


    trim;   # trims $_

or even this:


    trim remove => /\n+/;
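For comparison, the same trimming logic can be written in Python with the standard `re` module. This is a sketch rather than a translation: Python strings are immutable, so this hypothetical `trim` returns a new string instead of modifying its argument in place, and it takes the text explicitly rather than defaulting to the caller's topic.

```python
import re


def trim(text, remove=r"\s+"):
    """Strip leading and trailing matches of `remove` from `text`."""
    # \A and \Z anchor at the very start and very end of the string.
    pattern = rf"\A(?:{remove})|(?:{remove})\Z"
    return re.sub(pattern, "", text)


trimmed = trim("  hi there \n")                        # default: whitespace
no_newlines = trim("\n\nline one\n", remove=r"\n+")    # only newlines
```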

Do not confuse the caller's lexical scope with the callee's lexical scope. In particular, when you put a bare block into your program that uses $_ like this:


    for @array {
        mumble { s/foo/bar/ };
    }

the compiler may not know whether or not the mumble routine is intending to pass $_ as the first argument of the closure, which mumble needs to do if it's some kind of looping construct, and doesn't need to do if it's a one-shot. So such a bare block actually compiles down to something like this:


    for @array {
        mumble(sub ($_ is rw = $OUTER::_) { s/foo/bar/ });
    }

(If you put $CALLER::_ there instead, it would be wrong, because that would be referring to mumble's $_.)

With $OUTER::_, if mumble passes an argument to the block, that argument becomes $_ each time mumble calls the block. Otherwise, it's just the same outer $_, as if ordinary lexical scoping were in effect. And, indeed, if the compiler knows that mumble takes a sub argument with a signature of (), it may optimize it down to ordinary lexical scoping, and if it has a signature of ($), it can assume it doesn't need the default. A signature of (?$) means all bets are off again.
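The behavior of the $OUTER::_ default can be imitated in Python with a closure whose optional argument falls back to a variable from the enclosing scope. All names here (`run_loop`, `one_shot`, `looping`) are hypothetical, and the substitution is modeled with `str.replace` returning a new string.

```python
_NO_ARG = object()  # sentinel meaning "the block was called with no argument"


def run_loop(items, mumble):
    """Call `mumble` with a bare block once per item, collecting results."""
    results = []
    for topic in items:
        def block(arg=_NO_ARG):
            # Use the argument if one was passed, else the outer topic.
            current = topic if arg is _NO_ARG else arg
            return current.replace("foo", "bar")
        results.append(mumble(block))
    return results


def one_shot(block):
    """A routine that calls the block once, passing nothing."""
    return block()


def looping(block):
    """A routine that acts as a loop, passing each element as the topic."""
    return [block(x) for x in ("foo1", "foo2")]
```

With `one_shot`, the block sees the loop's own topic, as under ordinary lexical scoping. With `looping`, each value passed in becomes the topic for that call.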
