Apocalypse 6
by Larry Wall
Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 06 for the latest information.
Rules
Rules were discussed in Apocalypse 5. They are essentially methods
with an implicit invocant, consisting of the object containing the
current pattern matching context. To match the internals of regex
syntax, traits attached to rules are typically written as ":w"
rather than "is w", but they're essentially the same thing
underneath.
It's possible to call a rule as if it were a method, as long as you give it the right arguments. And a method defined in a grammar can be called as if it were a rule. They share the same namespace, and a rule really is just a method with a funny syntax.
Macros
A macro is a function that is called immediately upon completion of the parsing of its arguments. Macros must be defined before they are used--there are no forward declarations of macros, and while a macro's name may be installed in either a package or a lexical scope, its syntactic effect can only be lexical, from the point of declaration (or importation) to the end of the current lexical scope.
Every macro is associated (implicitly or explicitly) with a particular
grammar rule that parses and reduces the arguments to the macro.
The formal parameters of a macro are special in that they must be
derived somehow from the results of that associated grammar rule.
We treat macros as if they were methods on the parse object returned
by the grammar rule, so the first argument is passed as if it were
an invocant, and it is always bound to the current parse tree object,
known as $0 in Apocalypse 5. (A macro is not a true method of that
class, however, because its name is in your scope, not the class's.)
Since the first parameter is treated as an invocant, you may either
declare it or leave it implicit in the actual declaration. In either
case, the parse tree becomes the current topic for the macro.
Hence you may refer to it as either $_ or $0, even if you don't
give it a name.
Subsequent parameters may be specified, in which case they bind to
internal values of $0 in whatever way makes sense. Positional
parameters bind to $1, $2, etc. Named parameters bind to named
elements of $0. A slurpy hash is really the same as $0, since
$0 already behaves as a hash. A slurpy array gets $1, $2,
etc., even if already bound to a positional parameter.
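The binding rules above can be approximated outside Perl 6. The following is only a rough Python analogy, not Perl 6 macro machinery: a regex match object stands in for $0, ordinary positional parameters take the numbered captures, keyword-only parameters take named captures, and a `*captures` parameter plays the slurpy array. The helper `call_macro` and the function `describe` are hypothetical names invented for this sketch.

```python
import inspect
import re


def call_macro(macro, match):
    """Call `macro`, binding its parameters to parts of a regex match."""
    args = [match]        # first parameter: the whole match, like $0
    kwargs = {}
    position = 1          # next numbered capture, like $1, $2, ...
    params = list(inspect.signature(macro).parameters.values())
    for param in params[1:]:
        if param.kind is param.POSITIONAL_OR_KEYWORD:
            args.append(match.group(position))
            position += 1
        elif param.kind is param.VAR_POSITIONAL:
            # Slurpy array: every capture, even those already bound above.
            args.extend(match.groups())
        elif param.kind is param.KEYWORD_ONLY:
            kwargs[param.name] = match.group(param.name)
        elif param.kind is param.VAR_KEYWORD:
            # Slurpy hash: all named captures not already bound by name.
            for name, value in match.groupdict().items():
                kwargs.setdefault(name, value)
    return macro(*args, **kwargs)


def describe(tree, verb, count, *captures, unit):
    return (tree.group(0), verb, count, captures, unit)


match = re.match(r"(\w+) (\d+) (?P<unit>\w+)", "add 3 apples")
result = call_macro(describe, match)
```

Note one Python-specific wrinkle: named groups are also numbered, so the named capture shows up in `captures` as well.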
A macro can do anything it likes with the parse tree, but the return value is treated specially by the parser. You can return one of several kinds of values:
- A parse tree (the same one, a modified one, or a synthetic one) to be passed up to the outer grammar rule that was doing pattern matching when we hit the macro.
- A closure functioning as a generic routine that is to be immediately inlined, treating the closure as a template. Within the template, any variable referring back to one of the macro's parse parameters will interpolate that parameter's value at that point in the template. (It will be interpolated as a parse tree, a string, or a number depending on the declaration of the parameter.) Any variable not referring back to a parameter is left alone, so that your template can declare its own lexical variables, or refer to a package variable.
- A string, to be shoved into the input stream and reparsed at the point the macro was found, starting in exactly the same grammar state we were before the macro. This is slightly different from returning the same string parsed into a parse tree, because a parse tree must represent a complete construct at some level, while the string could introduce a construct without terminating it. This is the most dangerous kind of return value, and the least likely to produce coherent error messages with decent line numbers for the end user. But it's also very powerful. Hee, hee.
- An undef, indicating that the macro is only used for its side effects. Such a macro would be one way of introducing an alternate commenting mechanism, for instance. I suppose returning "" has the same effect, though.
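To make the difference between these return values concrete, here is a toy sketch in Python (not Perl 6, and nothing like a real parser). It assumes whitespace-separated words as the "input stream" and argument-less macros. A returned string is pushed back and rescanned, a returned list stands in for an already-parsed tree and is not rescanned, and `None` stands in for undef. The closure-as-template case is not modeled. All names here (`expand`, the macro table) are made up for illustration.

```python
from collections import deque


def expand(source, macros):
    """Expand word-level macros in `source` and return the resulting text."""
    pending = deque(source.split())   # the unparsed input stream
    output = []                       # the "parsed" result
    while pending:
        word = pending.popleft()
        macro = macros.get(word)
        if macro is None:
            output.append(word)       # ordinary word, no macro involved
            continue
        result = macro()
        if result is None:
            continue                  # used only for side effects
        if isinstance(result, str):
            # String: shove it back into the input and reparse it.
            pending.extendleft(reversed(result.split()))
        else:
            # Already-parsed result: pass it up without rescanning.
            output.extend(result)
    return " ".join(output)


macros = {
    "this": lambda: "self",
    "me": lambda: "this",       # returns text that names another macro
    "NOTE": lambda: None,       # a comment-like macro
    "raw": lambda: ["this"],    # a "parse tree" that is not reparsed
}
expanded = expand("me . name NOTE raw", macros)
```

Here `me` expands to `this`, which is reparsed and expands again to `self`, while the `this` produced by `raw` is left alone.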
A macro by default parses any subsequent text using whatever
macro rule is currently in effect. Generally this will be the
standard Perl::macro rule, which parses subsequent arguments as a
list operator would--that is, as a comma-separated list with the same
policy on using or omitting parentheses as any other list operator.
This default may be overridden with the "is parsed" trait.
If there is no signature at all, macro defaults to using the null
rule, meaning it looks for no argument at all. You can use it for
simple word substitutions where no argument processing is needed.
Instead of the long-winded:
my macro this () is parsed(/<null>/) { "self" }
you can just quietly turn your program into C++:
my macro this { "self" }
A lot of Perl is fun, and macros are fun, but in general, you should never use a macro just for the fun of it. It's far too easy to poke someone's eye out with a macro.
Out-of-band parameters
Certain kinds of routines want extra parameters in addition to the ordinary parameter list. Autoloading routines for instance would like to know what function the caller was trying to call. Routines sensitive to topicalizers may wish to know what the topic is in their caller's lexical scope.
There are several possible approaches. The Perl 5 autoloader actually
pokes a package variable into the package with the AUTOLOAD
subroutine. It could be argued that something that's in your dynamic
scope should be accessed via dynamically scoped variables, and indeed
we may end up with a $*AUTOLOAD variable in Perl 6 that works
somewhat like Perl 5's, only better, because AUTOLOAD kinda sucks.
We'll address that in Apocalypse 10, for some definition of "we".
Another approach is to give access to the caller's lexical scope in
some fashion. The magical caller() function could return a handle
by which you can access the caller's my variables. And in general,
there will be such a facility under the hood, because we have to be
able to construct the caller's lexical scope while it's being compiled.
In the particular case of grabbing the topic from the caller's lexical
scope (and it has to be in the caller's lexical scope because $_
is now lexically scoped in Perl 6), we think it'll happen often enough
that there should be a shorthand for it. Or maybe it's more like a
"midhand". We don't want it too short, or people will unthinkingly
abuse it. Something on the order of a CALLER:: prefix, which
we'll discuss below.
Lexical context
Works just like in Perl 5. Why change something that works?
Well, okay, we are tweaking a few things related to lexical scopes.
$_ (also known as the current topic) is always a lexically scoped
variable now. In general, each subroutine will implicitly declare its
own $_. Methods, submethods, macros, rules, and pointy subs all
bind their first argument to $_; ordinary subs declare a lexical
$_ but leave it undefined. Every sub definition declares its own
$_ and hides any outer $_. The only exception is bare closures
that are pretending to be ordinary blocks and don't commandeer $_
for a placeholder. These continue to see the outer scope's $_,
just as they would any other lexically scoped variable declared in
the outer scope.
Dynamic context
On the flip side, $_ is no longer visible in the dynamic context.
You can still temporize (localize) it, but you'll be temporizing
the current subroutine's lexical $_, not the global $_.
Routines which used to use dynamic scoping to view the $_ of a calling
subroutine will need some tweaking. See CALLER:: below.
The caller function
As in Perl 5, the caller function will return information about
the dynamic context of the current subroutine. Rather than always
returning a list, it will return an object that represents the selected
caller's context. (In a list context, the object can still return the
old list as Perl 5-ers are used to.) Since contexts are polymorphic,
different context objects might in fact supply different methods.
The caller function doesn't have to know anything about that,
though.
What caller does know in Perl 6 is that it takes an optional
argument. That argument says where to stop when scanning up the call
stack, and so can be used to tell caller which kinds of context
you're interested in. By default, it'll skip any "wrapper" functions
(see "The .wrap method" below) and return the outermost context
that thought it was calling your routine directly. Here's a possible
declaration:
multi *caller (?$where = &CALLER::_, Int +$skip = 0, Str +$label)
    returns CallerContext {...}
The $where argument can be anything that matches a particular
context, including a subroutine reference or any of these Code types:
Code Routine Block Sub Method Submethod Multi Macro Bare Parametric
&_ produces a reference to your current Routine, though in the
signature above we have to use &CALLER::_ to get at the caller's
&_.
Note that use of caller can prevent certain kinds of optimizations,
such as tail recursion elimination.
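For readers who want something to poke at, the general shape of such a function can be sketched in Python using the standard `inspect` module. This is an analogy only. It models the $skip argument but not $where or $label, and the `CallerContext` class here is a hypothetical stand-in rather than the Perl 6 type. A named tuple is used because it behaves as an object with attributes and also unpacks like the old list.

```python
import inspect
from typing import NamedTuple


class CallerContext(NamedTuple):
    function: str
    filename: str
    line: int


def caller(skip=0):
    """Describe the caller of the routine that called caller()."""
    # f_back once: the routine asking; f_back twice: that routine's caller.
    frame = inspect.currentframe().f_back.f_back
    for _ in range(skip):
        if frame is None:
            break
        frame = frame.f_back
    if frame is None:
        raise ValueError("call stack is not that deep")
    code = frame.f_code
    return CallerContext(code.co_name, code.co_filename, frame.f_lineno)


def inner():
    return caller(), caller(skip=1)


def outer():
    return inner()


def outermost():
    return outer()


direct, indirect = outermost()
function, filename, line = direct   # list-style unpacking still works
```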
The want function
The want function is really just the caller function in disguise.
It also takes an argument telling it which context to pay attention
to, which defaults to the one you think it should default to. It's
declared like this:
multi *want (?$where = &CALLER::_, Int +$skip = 0, Str +$label)
    returns WantContext {...}
Note that, as a variant of caller, use of want can prevent
certain kinds of optimizations.
When want is called in a scalar context:
$primary_context = want;
it returns a synthetic object whose type behaves as the junction of all the valid contexts currently in effect, whose numeric overloading returns the count of arguments expected, and whose string overloading produces the primary context as one of 'Void', 'Scalar', or 'List'. The boolean overloading produces true unless in a void context.
When want is called in a list context like this:
($primary, $count, @secondary) = want;
it returns a list of at least two values, indicating the contexts
in which the current subroutine was called. The first two values
in the list are the primary context (i.e. the scalar return value)
and the expectation count (see Expectation counts below). Any
extra contexts that want may detect (see Valid contexts below)
are appended to these two items.
When want is used as an object, it has methods corresponding to
its valid contexts:
if want.rw { ... }
unless want.count < 2 { ... }
when want.List { ... }
The want function can be used with smart matching:
if want ~~ List & 2 & Lvalue { ... }
Which means it can also be used in a switch:
given want {
    when List & 2 & Lvalue { ... }
    when .count > 2 {...}
}
The numeric value of the want object is the "expectation
count". This is an integer indicating the number of return values
expected by the subroutine's caller. For void contexts, the expectation
count is always zero; for scalar contexts, it is always zero or one;
for list contexts it may be any non-negative number. The want
value can simply be used as a number:
if want >= 2 { return ($x, $y) } # context wants >= 2 values
else { return ($x); } # context wants < 2 values
Note that Inf >= 2 is true. (Inf is not the same as
undef.) If the context is expecting an unspecified number of
return values (typically because the result is being assigned to
an array variable), the expectation count is Inf. You shouldn't
actually return an infinite list, however, unless want ~~ Lazy.
The opposite of Lazy context is Eager context (the Perl 5 list
context, which always flattened immediately). Eager and Lazy
are subclasses of List.
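The overloading behavior of the want object can be mimicked in Python, with one big caveat: Python has no way to detect the calling context, so this sketch builds the object by hand and passes it in explicitly. The `WantContext` class and `pair_or_single` function are illustrative assumptions, not a port of anything in Perl 6.

```python
import math


class WantContext:
    """Hand-built stand-in for the object returned by want."""

    def __init__(self, primary, count, *secondary):
        self.primary = primary        # 'Void', 'Scalar', or 'List'
        self.count = count            # expectation count, may be math.inf
        self.secondary = secondary    # any extra contexts

    def __str__(self):                # string overloading: primary context
        return self.primary

    def __bool__(self):               # true unless in a void context
        return self.primary != "Void"

    def __float__(self):              # numeric overloading: the count
        return float(self.count)

    def __ge__(self, other):
        return self.count >= other

    def __lt__(self, other):
        return self.count < other

    def __iter__(self):               # list form: primary, count, extras
        yield self.primary
        yield self.count
        yield from self.secondary


def pair_or_single(want, x, y):
    # Mirrors: if want >= 2 { return ($x, $y) } else { return ($x) }
    return (x, y) if want >= 2 else (x,)


list_want = WantContext("List", math.inf, "Lvalue")
scalar_want = WantContext("Scalar", 1)
void_want = WantContext("Void", 0)
primary, count, *secondary = list_want
```

Because the count for an open-ended list context is infinite, the comparison `list_want >= 2` is true, matching the note above about Inf.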
The valid contexts are pretty much as listed in RFC 21, though to the
extent that the various contexts can be considered types, they can
be specified without quotes in smart matches. Also, types are not
all-caps any more. We know we have a Scalar type--hopefully we also
get types or pseudo-types like Void, List, etc. The List
type in particular is an internal type for the temporary lists that
are passed around in Perl. Preflattened lists are Eager, while
those lists that are not preflattened are Lazy. When you call
@array.specs, for instance, you actually get back an object of
type Lazy. Lists (Lazy or otherwise) are internal
generator objects, and in general you shouldn't be doing operations
on them, but on the arrays to which they are bound. The bound array manages its
hidden generators on your behalf to "harden" the abstract list into concrete
array values on demand.
The CALLER:: pseudopackage
Just as the SUPER:: pseudopackage lets you name a method somewhere
in your set of superclasses, the CALLER:: pseudopackage lets you
name a variable that is in the lexical scope of your (dynamically
scoped) caller. It may not be used to create a variable that does
not already exist in that lexical scope. As such, it is primarily
intended for a particular variable that is known to exist in every
caller's lexical scope, namely $_. Your caller's current topic
is named $CALLER::_. Your caller's current Routine reference
is named &CALLER::_.
Note again that, as a form of caller, use of CALLER:: can
prevent certain kinds of optimizations. However, if your signature
uses $CALLER::_ as a default value, the optimizer may be able to
deal with that as a special case. If you say, for instance:
sub myprint (IO $handle, *@list = ($CALLER::_)) {
    print $handle: *@list;
}
then the compiler can just turn the call:
myprint($*OUT);
into:
myprint($*OUT, $_);
Our earlier example of trim might want to default the first argument
to the caller's $_. In which case you can declare it as:
sub trim ( Str ?$_ is rw = $CALLER::_, Rule ?$remove = /\s+/ ) {
    s:each/^ <$remove> | <$remove> $//;
}
which lets you call it like this:
trim; # trims $_
or even this:
trim remove => /\n+/;
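A loose Python approximation of this defaulting trick peeks into the caller's frame for a local variable. Several things here are assumptions made for the sketch: the caller's topic is taken to be a local named `topic` (Python has no $_), the lookup goes through `inspect`, and the trimmed string is returned rather than modified in place, since nothing in Python corresponds to "is rw" on a caller's variable.

```python
import inspect
import re

_MISSING = object()   # sentinel meaning "no argument was passed"


def trim(text=_MISSING, remove=r"\s+"):
    """Strip `remove` from both ends of text, defaulting to the caller's topic."""
    if text is _MISSING:
        # Rough analogue of $CALLER::_ : read the caller's local `topic`.
        caller_locals = inspect.currentframe().f_back.f_locals
        text = caller_locals["topic"]   # KeyError if the caller has none
    return re.sub(rf"^(?:{remove})|(?:{remove})$", "", text)


def process():
    topic = "  hello world \n"
    return trim()                       # trims the caller's topic


trimmed_topic = process()
trimmed_lines = trim("\n\nline\n", remove=r"\n+")
```

As with CALLER:: itself, reaching into the caller's frame like this defeats some optimizations and is best kept to rare cases.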
Do not confuse the caller's lexical scope with the callee's
lexical scope. In particular, when you put a bare block into your
program that uses $_ like this:
for @array {
    mumble { s/foo/bar/ };
}
the compiler may not know whether or not the mumble routine
is intending to pass $_ as the first argument of the closure,
which mumble needs to do if it's some kind of looping construct,
and doesn't need to do if it's a one-shot. So such a bare block
actually compiles down to something like this:
for @array {
    mumble(sub ($_ is rw = $OUTER::_) { s/foo/bar/ });
}
(If you put $CALLER::_ there instead, it would be wrong, because
that would be referring to mumble's $_.)
With $OUTER::_, if mumble passes an argument to the block, that
argument becomes $_ each time mumble calls the block. Otherwise,
it's just the same outer $_, as if ordinary lexical scoping were
in effect. And, indeed, if the compiler knows that mumble takes
a sub argument with a signature of (), it may optimize it down
to ordinary lexical scoping, and if it has a signature of ($),
it can assume it doesn't need the default. A signature of (?$)
means all bets are off again.
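The same idea, a block whose topic defaults to the enclosing scope's topic unless the routine supplies one, can be sketched in Python with a default argument. This is an analogy under stated assumptions: `topic` stands in for $_, `run_once` and `run_each` are hypothetical stand-ins for a one-shot and a looping mumble, and the block returns its result instead of modifying the topic in place. Python evaluates the default when the block is defined rather than when it is called, which comes to the same thing here because the block is redefined on every iteration.

```python
def run_once(block):
    """A one-shot routine: calls the block without passing a topic."""
    return block()


def run_each(block, items):
    """A looping routine: passes each item to the block as its topic."""
    return [block(item) for item in items]


once, each = [], []
for topic in ["food", "fool"]:
    # The default plays the role of `= $OUTER::_`: the outer topic is used
    # only when the caller of the block supplies no argument.
    def block(topic=topic):
        return topic.replace("foo", "bar")

    once.append(run_once(block))
    each.append(run_each(block, ["foo!", "tofoo"]))
```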

