Apocalypse 6
by Larry Wall
Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 06 for the latest information.
RFC 57: Subroutine prototypes and parameters
We ended up with something like this proposal, though with some
differences. Instead of =, we're using => to specify
names because it's a pair constructor in Perl 6, so there's little
ambiguity with positional parameters. Unless a positional parameter
is explicitly declared with a Pair or Hash type, it's assumed
not to be interested in named arguments.
Also, as the RFC points out, use of = would be incompatible with
lvalue subs, which we're supporting.
The RFC allows for mixing of positional and named parameters, both in declaration and in invocation. I think such a feature would provide far more confusion than functionality, so we won't allow it. You can always process your own argument list if you want to. You could even install your own signature handler in place of Perl's.
The RFC suggests treating the first parameter with a default as the first optional parameter. I think I'd rather mark optional parameters explicitly, and then disallow defaults on required parameters as a semantic constraint.
It's also suggested that something like:
sub waz ($one, $two,
         $three = add($one, $two),
         $four = add($three, 1)) {
    ...
}
be allowed, where defaults can refer back to previous parameters.
It seems as though we could allow that, if we assume that symbols
are introduced in signatures as soon as they are seen. That would
be consistent with how we've said my variables are introduced.
It does mean that a prototype that defaults to the prior $_ would
have to be written like this:
$myclosure = sub ($_ = $OUTER::_) { ... }
On the other hand, that's exactly what:
$myclosure = { ... }
means in the absence of placeholder variables, so the situation will likely not arise all that often. So I'd say yes, defaults should be able to refer back to previous parameters in the same signature, unless someone thinks of a good reason not to.
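The left-to-right binding rule is easy to model outside Perl. The sketch below is Python, used only as a neutral illustration and not as Perl 6 syntax. The names `bind` and `Param`, and the choice to represent each default as a function of the already-bound parameters, are assumptions made for the example. It binds positional arguments in order and evaluates a missing argument's default only after every earlier parameter has been introduced, which is what lets `$four` see `$three` in the `waz` signature.

```python
from typing import Any, Callable, Optional

# Hypothetical representation: a parameter is a name plus an optional
# default, where a default is a function of the parameters bound so far.
Param = tuple[str, Optional[Callable[[dict], Any]]]


def bind(params: list[Param], args: tuple) -> dict:
    """Bind positional arguments left to right; each default is evaluated
    only after every earlier parameter has been introduced."""
    if len(args) > len(params):
        raise TypeError("too many arguments")
    bound: dict = {}
    for i, (name, default) in enumerate(params):
        if i < len(args):
            bound[name] = args[i]          # explicitly passed
        elif default is not None:
            bound[name] = default(bound)   # may refer to earlier names
        else:
            raise TypeError(f"missing required argument {name!r}")
    return bound


def add(x, y):
    return x + y


# The signature of waz from the RFC, in this toy representation.
waz_signature: list[Param] = [
    ("one", None),
    ("two", None),
    ("three", lambda env: add(env["one"], env["two"])),
    ("four", lambda env: add(env["three"], 1)),
]

print(bind(waz_signature, (1, 2)))
# {'one': 1, 'two': 2, 'three': 3, 'four': 4}
print(bind(waz_signature, (1, 2, 10)))
# {'one': 1, 'two': 2, 'three': 10, 'four': 11}
```

Because defaults are evaluated lazily and in order, a caller can override `three` while `four` still follows whatever value `three` ended up with.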
As explained in Apocalypse 4, $OUTER:: is for getting at an outer
lexical scope. This ruling about formal parameters means that,
effectively, the lexical scope of a subroutine "starts to begin"
where the formal parameters are declared, and "finishes beginning" at
the opening brace. Whether a given symbol in the signature actually
belongs to the inner scope or the outer scope depends on whether it's
already been introduced by the inner scope. Our sub above needed
$OUTER::_ because $_ had already been introduced as the name
of the first argument. Had some other name been introduced, $_
might still be taken to refer to the outer $_:
$myclosure = sub ($arg = $_) { ... }
If so, use of $OUTER::_ would be erroneous in that case, because
the subroutine's implicit $_ declaration wouldn't happen till
the opening curly, and instead of getting $OUTER::_, the user would
unexpectedly be getting $OUTER::OUTER::_, as it were. So instead, we'll
say that the implicit introduction of the new sub's $_ variable
always happens after the <subintro> and before the <signature>,
so any use of $_ as a default in a signature or
as an argument to a property can only refer to the subroutine's own
topic, if any. To refer to any external $_ you must say either
$CALLER::_ or $OUTER::_. This approach seems much cleaner.
RFC 160: Function-call named parameters (with compiler optimizations)
For efficiency, we have to be able to hoist the semantics from the
signature into the calling module when that's practical, and that
has to happen at compile time. That means the information has to be
in the signature, not embedded in a fields() function within the
body of the subroutine. In fact, my biggest complaint about
this RFC is that it arbitrarily separates the prototype characters,
the parameter names, and the variable names. That's a recipe for
things getting out of sync.
Basically, this RFC has a lot of the right ideas, but just doesn't
go far enough in the signature direction, based on the (at the
time) laudable notion that we were interested in keeping Perl 6 as
close to Perl 5 as possible. Which turned out not to be quite
the case. :-) Our new signatures look more hardwired than the
attribute syntax proposed here, but it's all still very hookable
underneath via the sub and parameter traits. And everything is
together that should be together.
Although the signature is really just a trait underneath, I thought it important to have special syntax for it, just as there's special syntax for the body of the function. Signatures are very special traits, and people like special things to look special. It's just more of those darn psychological reasons that keep popping up in the design of Perl.
Still and all, the current design is optimized for many of the same sensitivities described in this RFC.
RFC 128: Subroutines: Extend subroutine contexts to include name parameters and lazy arguments
This RFC also has lots of good ideas, but tends to stay a little
too close to Perl 5 in various areas where I've decided to swap
the defaults around. For instance, marking reference parameters in
prototypes rather than slurpy parameters in signatures, identifying
lazy parameters rather than flattening, and defaulting to rw
(autovivifying lvalue args) rather than constant (rvalue args).
Context classes are handled by the automatic coercion to references within scalar context, and by type junctions.
Again, I don't buy into two-pass, fill-in-the-blanks argument processing.
Placeholders are now just for argument declaration, and imply
no currying. Currying on the other hand is done with an explicit
.assuming method, which requires named args that will be bound to
the corresponding named parameters in the function being curried.
Or should I say functions? When module and class writers write systems of subroutines or methods, they usually go to great pains to make sure all the parameter names are consistent. Why not take advantage of that?
So currying might even be extended to classes or modules, where all methods or subs with a given argument name are curried simultaneously:
my module MyIO ::= (use IO::Module).assuming(ioflags => ":crlf");
my class UltAnswer ::= (use Answer a,b,c).assuming(answer => 42);
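Here is a rough model of currying a whole module by parameter name, written in Python rather than Perl 6; `functools.partial` stands in for `.assuming`, which it only approximates. The function `assuming`, the stand-in `io_module`, and its routines are all hypothetical. Every function that declares a parameter with one of the given names gets that parameter pre-bound by name; everything else passes through unchanged.

```python
import functools
import inspect
import types


def assuming(namespace, **fixed):
    """Return a copy of `namespace` in which every function that declares a
    parameter named in `fixed` has that parameter pre-bound by name.
    Functions that mention none of the names are left untouched."""
    curried = {}
    for name, obj in vars(namespace).items():
        if inspect.isfunction(obj):
            params = inspect.signature(obj).parameters
            to_bind = {k: v for k, v in fixed.items() if k in params}
            curried[name] = functools.partial(obj, **to_bind) if to_bind else obj
        else:
            curried[name] = obj
    return types.SimpleNamespace(**curried)


# A stand-in for a hypothetical IO module whose routines share the
# parameter name `ioflags`.
def open_file(path, ioflags=""):
    return ("open", path, ioflags)


def write_line(fh, text, ioflags=""):
    return ("write", fh, text, ioflags)


def close_file(fh):
    return ("close", fh)


io_module = types.SimpleNamespace(
    open_file=open_file, write_line=write_line, close_file=close_file
)

my_io = assuming(io_module, ioflags=":crlf")
print(my_io.open_file("log.txt"))     # ('open', 'log.txt', ':crlf')
print(my_io.write_line("fh", "hi"))   # ('write', 'fh', 'hi', ':crlf')
print(my_io.close_file("fh"))         # ('close', 'fh')
```

Routines that never mention `ioflags`, like `close_file`, come through as the very same objects; the whole trick relies on the module author having used consistent parameter names.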
If you curry a class's invocant, it would turn the class into a module instead of another class, since there are no longer any methods if there are no invocants:
my module UltAnswer ::=
    (use Answer a,b,c).assuming(self => new Answer: 42);
Or something like that. If you think this implies that there are class and module objects that can be sufficiently introspected to do this sort of chicanery, you'd be right. On the other hand, given that we'll have module name aliasing anyway to support running multiple versions of the same module, why not also support multiple curried versions without explicitly renaming the module? For example:
(use IO::Module).assuming(ioflags => ":crlf");
Then for the rest of this scope, IO::Module really points to your aliased idea of IO::Module, without explicitly binding it to a different name. Well, that's for Apocalypse 11, really...
One suggestion from this RFC I've taken to heart is to banish the term "prototype". You'll note we call them signatures now. (You may still call Perl 5's prototypes "prototypes", of course, because Perl 5's prototypes really were a prototype of signatures.)
RFC 344: Elements of @_ should be read-only by default
I admit it, I waffled on this one. Up until the last moment, I was
going to reject it, because I wanted @_ to work exactly like it
does in Perl 5 in subs without a signature. It seemed like a nice
sop towards backward compatibility.
But when I started writing about why I was rejecting it, I started
thinking about whether a sig-less sub is merely a throwback to Perl 5,
or whether we'll see it continue as a viable Perl 6 syntax. And if
the latter, perhaps it should be designed to work right rather than
merely to work the same. The vast majority of subroutines in Perl
5 refrain from modifying their arguments via @_, and it somehow
seems wrong to punish such good deeds.
So I changed my mind, and the default signature on a sub without
a signature is simply (*@_), meaning that @_ is considered an
array of constants by default. This will probably have good effects
on performance, in general. If you really want to write through
the @_ parameter back into the actual arguments, you'll have to
declare an explicit signature of (*@_ is rw).
The Perl5-to-Perl6 translator will therefore need to translate:
sub {...}
to:
sub (*@_ is rw) {...}
unless it can be determined that elements of @_ are not modified
within the sub. (It's okay to shift a constant @_ though, since
that doesn't change the elements passed to the call; remember that
for slurpy arrays the implied "is constant" or explicit "is rw"
distributes to the individual elements.)
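The difference between shifting a constant `@_` and writing through it can be sketched as a read-only view over the caller's arguments. This is Python and purely illustrative: `ConstArgs` and `RwArgs` are hypothetical classes, and since Python has no real aliasing of a caller's variables, the "caller's arguments" here are just a shared list.

```python
class ConstArgs:
    """A read-only view of a caller's argument list, in the spirit of a
    constant slurpy @_: elements can't be assigned through it, but shift()
    is allowed because it only moves this view's own starting point."""

    def __init__(self, args):
        self._args = args   # the caller's sequence; never mutated here
        self._start = 0

    def __len__(self):
        return len(self._args) - self._start

    def _index(self, i):
        # Translate a view index (negative counts from the end) into an
        # index into the caller's sequence.
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("argument index out of range")
        return self._start + i

    def __getitem__(self, i):
        return self._args[self._index(i)]

    def __setitem__(self, i, value):
        raise TypeError("cannot modify a constant argument")

    def shift(self):
        if len(self) == 0:
            return None     # like Perl, shifting an empty array gives undef
        value = self._args[self._start]
        self._start += 1
        return value


class RwArgs(ConstArgs):
    """The explicit opt-in, roughly (*@_ is rw): assignment writes through."""

    def __setitem__(self, i, value):
        self._args[self._index(i)] = value


caller_data = [1, 2, 3]
args = ConstArgs(caller_data)
print(args.shift(), len(args), args[0])   # 1 2 2
try:
    args[0] = 99
except TypeError as err:
    print(err)                            # cannot modify a constant argument
print(caller_data)                        # [1, 2, 3]

rw = RwArgs(caller_data)
rw[-1] = 30
print(caller_data)                        # [1, 2, 30]
```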
RFC 194: Standardise Function Pre- and Post-Handling
Yes, this needs to be standardized, but we'll be generalizing to the
notion of wrappers, which can automatically keep their pre and post
routines in sync, and, more importantly, keep a single lexical scope
across the related pre and post processing. A wrapper is installed
with the .wrap method, which can have optional parameters to tell it
how to wrap, and which can return an identifier by which the particular
wrapper can be named when unwrapping or otherwise rearranging the
wrappings. A wrapper automatically knows what function it's wrapped
around, and invoking the call builtin automatically invokes the
next level routine, whether that's the actual routine or another layer
of wrapper. That does matter, because with that implicit knowledge
call doesn't need to be given the name of the routine to invoke.
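A minimal model of `.wrap` and `call` might look like the following Python sketch. It is not Perl 6: `Routine`, `wrap`, `unwrap`, and passing `call` in as the wrapper's first argument are all assumptions of the sketch, since Perl's `call` is a builtin that knows its context implicitly. Each wrapper gets a continuation that runs the next level in, whether that is another wrapper or the real routine, and `wrap` hands back an id for later removal.

```python
import itertools


class Routine:
    """A callable that can be layered with wrappers. wrap() returns an id
    for later unwrap(); each wrapper receives a `call` continuation that
    invokes the next level in, without needing to know its name."""

    _ids = itertools.count(1)

    def __init__(self, body):
        self._body = body
        self._wrappers = []          # (id, handler) pairs, innermost first

    def wrap(self, handler):
        wrapper_id = next(Routine._ids)
        self._wrappers.append((wrapper_id, handler))
        return wrapper_id

    def unwrap(self, wrapper_id):
        self._wrappers = [(i, h) for i, h in self._wrappers if i != wrapper_id]

    def __call__(self, *args, **kwargs):
        return self._invoke(len(self._wrappers) - 1, args, kwargs)

    def _invoke(self, level, args, kwargs):
        if level < 0:
            return self._body(*args, **kwargs)   # the actual routine
        _, handler = self._wrappers[level]

        def call():
            # Invoke the next level down with the same arguments.
            return self._invoke(level - 1, args, kwargs)

        return handler(call, *args, **kwargs)


log = []
split_words = Routine(lambda text: text.split())


def trace(call, text):
    log.append("Entering split")
    result = call()                # one scope spans the pre and post work
    log.append("Leaving split")
    return result


wid = split_words.wrap(trace)
print(split_words("a b c"))        # ['a', 'b', 'c']
print(log)                         # ['Entering split', 'Leaving split']
split_words.unwrap(wid)
split_words("d e")
print(len(log))                    # 2 (the wrapper is gone)
```

Because `trace` is one ordinary function, anything it computes before `call()` is still in scope afterward, which is the advantage of a wrapper over a separate pair of pre and post handlers.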
- The implementation is dependent on what happens to typeglobs in Perl 6: how does one inspect and modify the moral equivalent of the symbol table?
This is not really a problem, since we've merely split the typeglob up into separate entries.
- Also: what will become of prototypes? Will it become possible to declare return types of functions?
Yes. Note that if you do introspection on a sub ref, by default you're going to get the signature and return type of the actual routine, not of any wrappers. There needs to be some method for introspecting the wrappers as well, but it's not the default.
- As pointed out in [JP:HWS] certain intricacies are involved: what are the semantics of caller()? Should it see the prehooks? If yes, how?
It seems to me that sometimes you want to see the wrappers, and
sometimes you don't. I think caller needs some kind of argument that
says which levels to recognize and which levels to ignore. It's not
necessarily a simple priority either. One invocation may want to find
the innermost enclosing loop, while another might want the innermost
enclosing try block. A general matching term will be supplied on
such calls, defaulting to ignore the wrappers.
- How does this relate to the proposed generalized want() [DC:RFC21]?
The want() function can be viewed as based on caller(), but
with a different interface to the information available at the
particular call level.
I worry that generalized wrappers will make it impossible to compile
fast subroutine calls, if we always have to allow for run-time
insertion of handlers. Of course, that's no slower than Perl 5, but
we'd like to do better than Perl 5. Perhaps subs can be wrappable
by default, with specific declarations such as "is inline" to turn
that off for speed.
RFC 271: Subroutines : Pre- and post- handlers for subroutines
I find it odd to propose using PRE for something with side effects
like flock. Of course, this RFC was written before FIRST blocks
existed...
On the other hand, it's possible that a system of PRE and POST
blocks would need to keep "dossiers" of its own internal state
independent of the "real" data. So I'm not exactly sure what the
effective difference is between PRE and FIRST. But we can
always put a PRE into a lexical wrapper if we need to keep info
around till the POST. So we can keep PRE and POST with the
semantics of simply returning boolean expressions, while FIRST
and LAST are evaluated primarily for side effects.
You might think that you wouldn't need a signature on any pre or post handler, since it's gonna be the same as the primary. However, we have to worry about multimethods of the same name, if the handlers are defined outside of the subroutine. Again, embedding PRE and POST blocks either in the routine itself or inside a wrapper around the routine should handle that. (That turns the problem into one of being able to generate a reference to a multimethod with a particular signature: in essence, doing method dispatch without actually dispatching at the end.)
My gut feeling is that $_[-1] is a bad place to keep the return
value. With the call interface we're proposing, you just harvest
the return value of call if you're interested in the return value.
Or perhaps this is a good place for a return signature to actually
have formal variables bound to the return values.
Also, defining pre and post conditions in terms of exceptions is probably a mistake. If they're just boolean expressions, they can be ANDed and ORed together more easily in the approved DBC fashion.
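To see why plain booleans compose more easily than exceptions, here is a small Python sketch (not Perl 6; `contract`, `pre_levels`, and `post_levels` are invented names). Conditions are ordinary predicates returning true or false, and only the checker decides what to do on failure. It follows the usual design-by-contract rule for inherited conditions: preconditions are ORed across levels, so a derived routine may only weaken them, while postconditions are ANDed, so it may only strengthen them.

```python
import functools
from typing import Callable, Sequence

Pred = Callable[..., bool]


def contract(pre_levels: Sequence[Sequence[Pred]] = (),
             post_levels: Sequence[Sequence[Pred]] = ()):
    """Attach PRE and POST conditions written as plain boolean predicates.
    Each inner sequence is one level (say, a class and its parent).
    PREs are combined as an OR of ANDs; POSTs are all ANDed together."""
    def decorate(fn):
        @functools.wraps(fn)
        def checked(*args):
            # The call is acceptable if every PRE at SOME level holds.
            if pre_levels and not any(all(p(*args) for p in level)
                                      for level in pre_levels):
                raise AssertionError(f"precondition failed for {fn.__name__}")
            result = fn(*args)
            # Every POST at EVERY level must hold.
            if not all(q(result, *args) for level in post_levels for q in level):
                raise AssertionError(f"postcondition failed for {fn.__name__}")
            return result
        return checked
    return decorate


@contract(
    pre_levels=[[lambda s: isinstance(s, str)],      # base level
                [lambda s: isinstance(s, bytes)]],   # a level that widens it
    post_levels=[[lambda r, s: r >= 0],
                 [lambda r, s: r == len(s)]],
)
def length(s):
    return len(s)


print(length("abc"))    # 3
print(length(b"ab"))    # 2
try:
    length(5)
except AssertionError as err:
    print(err)          # precondition failed for length
```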
We haven't specified a declarative form of wrapper, merely a .wrap
method that you can call at run time. However, as with most of Perl,
anything you can do at run time, you can also do at compile time, so
it'd be fairly trivial to come up with a syntax that used a wrap
keyword in place of a sub:
wrap split(Regex ?$re, ?$src = $CALLER::_, ?$limit = Inf) {
    print "Entering split\n";
    call;
    print "Leaving split\n";
}
I keep mistyping "wrap" as "warp". I suppose that's not so far off, actually...
RFC 21: Subroutines: Replace wantarray with a generic want function
Overall, I like it, except that it's reinventing several wheels.
It seems that this has evolved into a powerful method for each sub to
do its own overloading based on return type. How does this play with
a more declarative approach to return types? I dunno. For now we're
assuming multimethod dispatch only pays attention to argument types.
We might get rid of a lot of calls to want if we could dispatch
on return type as well. Perhaps we could do primary dispatch on
the arguments and then do tie-breaking on return type when more
than one multimethod has the same parameter profile.
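The tie-breaking idea might be modeled like this. The Python below is a hypothetical sketch, not a description of Perl's dispatcher: `Multi`, `candidate`, and the keyword `want` (standing in for the calling context) are invented for illustration, and it makes no attempt to rank candidates by type specificity. Argument types do the primary dispatch; the wanted return type is consulted only when several candidates share exactly the same parameter profile.

```python
class Multi:
    """Multiple dispatch on argument types, using the caller's wanted return
    type only to break ties among candidates whose parameter profiles are
    identical. (No specificity ranking: differing profiles that both match
    are simply reported as ambiguous.)"""

    def __init__(self):
        self._candidates = []   # (param_types, return_type, fn)

    def candidate(self, param_types, return_type):
        def register(fn):
            self._candidates.append((tuple(param_types), return_type, fn))
            return fn
        return register

    def __call__(self, *args, want=None):
        # Primary dispatch: keep candidates whose parameters accept the args.
        matching = [c for c in self._candidates
                    if len(c[0]) == len(args)
                    and all(isinstance(a, t) for a, t in zip(args, c[0]))]
        if not matching:
            raise TypeError("no candidate accepts these arguments")
        if len({c[0] for c in matching}) > 1:
            raise TypeError("ambiguous call: several parameter profiles match")
        if len(matching) > 1:
            # Tie-break: same parameter profile, so consult the wanted type.
            matching = [c for c in matching if c[1] is want]
            if len(matching) != 1:
                raise TypeError("cannot break tie on return type")
        return matching[0][2](*args)


parse = Multi()


@parse.candidate([str], int)
def parse_int(s):
    return int(s)


@parse.candidate([str], list)
def parse_list(s):
    return s.split(",")


@parse.candidate([bytes], str)
def parse_bytes(b):
    return b.decode("ascii")


print(parse("42", want=int))     # 42
print(parse("1,2", want=list))   # ['1', '2']
print(parse(b"hi"))              # hi
```

Without a `want`, `parse("42")` raises a `TypeError`, since two candidates tie and nothing breaks the tie; `parse(b"hi")` needs no tie-breaking because only one candidate accepts bytes.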
I also worry a bit that we're assuming an interpreter here that
can keep track of all the context information in a way suitable
for searching by the called subroutine. When running on top of a
JVM or CLR, this info might not be convenient to provide, and I'd
hate to have to keep a descriptor of every call, or do some kind of
double dispatch, just because the called routine might want to
use want(), or might want to call another routine that might want
to use want, or so on. Maybe the situation is not that bad.
I sometimes wonder if want should be a method on the context object:
given caller.want {...}
or perhaps the two could be coalesced into a single call:
given context { ... }
But for the moment let's assume for readability that there's a want
function distinct from caller, though with a similar signature:
multi *want (?$where = &CALLER::_, Int +$skip = 0, Str +$label)
    returns WantContext {...}
As with caller, calling want with no arguments looks for
the context of the currently executing subroutine or method.
Like return, it specifically ignores bare blocks and routines
interpreting bare blocks, and finds the context for the lexically
enclosing explicit sub or method declaration, named by &_.
You'll note that unlike in the proposal, we don't pass a list to
want, so we don't support the implicit && that is proposed for
the arguments to want. But that's one of the re-invented wheels,
anyway, so I'm not too concerned about that. What we really want is
a want that works well with smart matching and switch statements.
RFC 23: Higher order functions
In general, this RFC proposes some interesting semantic sugar,
but the rules are too complicated. There's really no need for
special numbered placeholders. And the special ^_ placeholder is
too confusing. Plus we really need regular sigils on our placeholder
variables so we can distinguish $^x from @^x from %^x.
But the main issue is that the RFC is confusing two separate concepts
(though that can be blamed on the languages this idea was borrowed
from). Anyway, it turns out we'll have an explicit pre-binding method
called .assuming for actual currying.
We'll make the self-declaring parameters a separate concept, called placeholder variables. They don't curry. Some of the examples of placeholders in the RFC are actually replaced by topics and junctions in our smart matching mode, but there are still lots of great uses for placeholder variables.
RFC 176: subroutine / generic entity documentation
This would be trivial to do with declared traits and here docs. But it might be better to use a POD directive that is accessible to the program. An entity might even have implicit traits that bind to nearby chunks of the right sort. Maybe we could get Don Knuth to come up with something literate...
RFC 298: Make subroutines' prototypes accessible from Perl
While I'm all in favor of a sub's signature being available for inspection, this RFC goes beyond that to make indirection in the signature the norm. This seems to be a solution in search of a problem. I'm not sure the confusion of the indirection is worth the ability to factor out common parameter lists. Certainly parameter lists must have introspection, but using it to set the prototype seems potentially confusing. That being said, the signatures are just traits, so this may be one of those things that is permitted, but not advised, like shooting your horse in the middle of the desert, or chewing out your SO for burning dinner. Implicit declaration of lexically scoped variables will undoubtedly be considered harmful by somebody someday. [Damian says, "Me. Today."]
RFC 334: Perl should allow specially attributed subs to be called as C functions
Fine, Dan, you implement it. ;-)
Did I claim I ignore the names of RFC authors? Hmm.
The syntax for the suggested:
sub foo : C_visible("i", "iii") {#sub body}
is probably a bit more verbose in real life:
my int sub foo (int $a, int $b, int $c)
    is callable("C","Python","COBOL") { ... }
If we can't figure out the "i" and "iii" bits from introspection of
the signature and returns traits, we haven't done introspection
right. And if we're gonna have an optional type system, I can't think
of a better place to use it than for interfaces to optional languages.
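As a rough illustration of recovering the "i" and "iii" strings from a signature, here is a Python sketch that reads a function's type annotations through `inspect.signature`. The type-code table and `c_type_codes` are hypothetical, chosen only to mirror the RFC's notation; a real foreign-function layer would need a far richer mapping.

```python
import inspect

# Hypothetical table from Python annotations to one-letter C type codes,
# in the style of the RFC's "i" / "iii" strings.
TYPE_CODES = {int: "i", float: "d", str: "s"}


def _code_for(annotation, what):
    try:
        return TYPE_CODES[annotation]
    except KeyError:
        raise TypeError(f"no C type code for {what}: {annotation!r}") from None


def c_type_codes(fn):
    """Derive (return_code, parameter_codes) from a function's annotations."""
    sig = inspect.signature(fn)
    ret = _code_for(sig.return_annotation, "return value")
    params = "".join(_code_for(p.annotation, f"parameter {p.name!r}")
                     for p in sig.parameters.values())
    return ret, params


def foo(a: int, b: int, c: int) -> int:
    return a + b + c


print(c_type_codes(foo))   # ('i', 'iii')
```

An unannotated parameter has no entry in the table, so the sketch refuses it with a `TypeError` rather than guessing, which is the "optional type system" doing its job at the language boundary.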
Acknowledgements
This work was made possible by a grant from the Perl Foundation. I would like to thank everyone who made this dissertation possible by their generous support. So, I will...
Thank you all very, very, very, very much!!!
I should also point out that I would have been stuck forever on some of these design issues without the repeated prodding (as in cattle) of the Perl 6 design team. So I would also like to publicly thank Allison, chromatic, Damian, Dan, Hugo, Jarkko, Gnat, and Steve. Thanks, you guys! Many of the places we said "I" above, I should have said "we".
I'd like to publicly thank O'Reilly & Associates for facilitating the design process in many ways.
I would also like to thank my wife Gloria, but not publicly.
Future Plans
From here on out, the Apocalypses are probably going to be coming out in priority order rather than sequential order. The next major one will probably be Apocalypse 12, Objects, though it may take a while since (like a lot of people in Silicon Valley) I'm in negative cash flow at the moment, and need to figure out how to feed my family. But we'll get it done eventually. Some Apocalypses might be written by other people, and some of them hardly need to be written at all. In fact, let's write Apocalypse 7 right now...
Apocalypse 7: Formats
Gone from the core. See Damian.