Apocalypse 6
by Larry Wall

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 06 for the latest information.

Temporizing any subroutine call

Lvalue subroutines have a special way to return a proxy that can be temporized, but sometimes that's overkill. Maybe you don't want an lvalue; you just want a subroutine that can do something temporarily in an rvalue context. To do that, you can declare a subroutine with a TEMP block that works just like the .TEMP method described earlier. The TEMP block returns a closure that will be called when the call to this function goes out of scope.

So if you declare a function with a TEMP block:


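    sub setdefout ($x) {
        my $oldout = $*DEFOUT;           # remember the current default output
        $*DEFOUT = $x;                   # install the new one
        TEMP {{ $*DEFOUT = $oldout }}    # inner closure restores the old value
    }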

then you can call it like this:


    temp setdefout($MYFILE);

and it will automatically undo itself on scope exit. One place where this might be useful is for wrappers:


    temp &foo.wrap({...})

The routine will automatically unwrap itself at the end of the current dynamic scope. A let would similarly put a hypothetical wrapper in place, but keep it wrapped on success.

The TEMP block is called only if you invoke the subroutine or method with temp or let. Otherwise the TEMP block is ignored. So if you just call:


    setdefout($MYFILE);

then the side-effects are permanent.
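
The mechanism can be modeled outside Perl. What follows is only a rough analogy in Python, not Perl 6 semantics: the dictionary `state` stands in for `$*DEFOUT`, and the `temp` helper is a hypothetical name standing in for the `temp` keyword. One difference is worth noting: here the undo closure is always built and simply discarded on a plain call, whereas the TEMP block described above is ignored entirely unless the call is made with temp or let.

```python
from contextlib import contextmanager

# Hypothetical stand-in for the global $*DEFOUT.
state = {"defout": "STDOUT"}

def setdefout(x):
    """Set the default output and return a closure that undoes the change."""
    old = state["defout"]
    state["defout"] = x
    def undo():
        state["defout"] = old
    return undo  # plays the role of the closure returned by the TEMP block

@contextmanager
def temp(func, *args):
    """Call func, then run its undo closure when the block is left."""
    undo = func(*args)
    try:
        yield
    finally:
        undo()

# Called through temp: the change lasts only for the block.
with temp(setdefout, "MYFILE"):
    inside = state["defout"]
after_temp = state["defout"]

# Called directly: the undo closure is dropped and the change is permanent.
setdefout("MYFILE")
after_plain = state["defout"]
```

Inside the `with` block the default output is `"MYFILE"`, it reverts to `"STDOUT"` on exit, and the plain call at the end leaves it changed for good.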

That being said...

I don't think we'll actually be using explicit TEMP closures all over the place, because I'd like to extend the semantics of temp and let such that they automatically save state of anything within their dynamic scopes. In essence, Perl writes most of the TEMP methods for you, and you don't have to worry about them unless you're interfacing to external code or data that doesn't know how to save its own state. (Though there's certainly plenty of all that out in the wide world.)

See appendix C for more about this line of thought.

The RFCs

Let me reiterate that there's little difference between an RFC accepted with major caveats and a rejected RFC from which some ideas may have been stolen. Please don't take any of this personally--I ignore author names when evaluating RFCs.

Rejected RFCs

RFC 59: Proposal to utilize * as the prefix to magic subroutines

There are several problems with doing this.

  • The * prefix is already taken for two other meanings. (It indicates a completely global symbol or a splatlist.) We could come up with something else, but we're running out of keyboard. And I don't think it's important enough to inflict a Unicode character on people.
  • It would be extra clutter that conveys little extra information over what is already conveyed by all-caps.
  • All-caps routines are a fuzzy set. Some of these routines are always called implicitly, while others are only usually called implicitly. We'd have to be continually making arbitrary decisions on where to cut it off.
  • Some routines are in the process of migrating into (or out of) the core. We don't want to force people to rewrite their programs when that happens.
  • People are already used to the all-caps convention.
  • Most importantly, I have an irrational dislike for anything that resembles Python's __foo__ convention. :-)

So we'll continue to half-heartedly reserve the all-caps space for Perl magic.

RFC 75: structures and interface definitions

In essence, this proposal turns every subroutine call into a constructor of a parameter list object. That's an interesting way to look at it, but the proposed notation for class declaration suffers from some problems. It's run-time rather than compile-time, and it's based on a value list rather than a statement list. In other words, it's not what we're gonna do, because we'll have a more standard-looking way of declaring classes. (On the other hand, I think the proposed functionality can probably be modeled by suitable use of constructors.)

The proposal also runs afoul of the rule that a lexically scoped variable ought generally to be declared explicitly at the beginning of its lexical scope. The parameters to subroutines will be lexically scoped in Perl 6, so there needs to be something equivalent to a my declaration at the beginning.

Unifying parameter passing with pack/unpack syntax is, I think, a false economy. pack and unpack are serialization operators, while parameter lists are about providing useful aliases to caller-provided data without any implied operation. The fact that both deal with lists of values on some level doesn't mean we should strain to make them the same on every level. That will merely make it impossible to implement subroutine calls efficiently, particularly since the Parrot engine is register-based, not stack-based as this RFC assumes. Register-based machines don't access parameters by offsets from the stack pointer.

RFC 107: lvalue subs should receive the rvalue as an argument

This would make it hard to dynamically scope an attribute. You'd have to call the method twice--once to get the old value, and once to set the new value.

The essence of the lvalue problem is that you'd like to separate the identification of the object from its manipulation. Forcing the new value into the same argument list as arguments meant to identify the object is going to mess up all sorts of things like assignment operators and temporization.

RFC 118: lvalue subs: parameters, explicit assignment, and wantarray() changes

This proposal has a similar problem in that it doesn't separate the identity from the operation.

RFC 132: Subroutines should be able to return an lvalue

This RFC proposes a keyword lreturn to return an lvalue.

I'd rather the lvalue hint be available to the compiler, I think, even if the body has not been compiled yet. So it needs to be declared in the signature somehow. The compiler would like to know whether it's even legal to assign to the subroutine. Plus it might have to deal with the returned value as a different sort of object.

At least this proposal doesn't confuse identification with modification. The lvalue is presumably an object with a STORE method that works independently of the original arguments. But this proposal also doesn't provide any mechanism to do postprocessing on the stored value.

RFC 149: Lvalue subroutines: implicit and explicit assignment

This is sort of the don't-have-your-cake-and-don't-eat-it-too approach. The implicit assignment doesn't allow for virtual attributes. The explicit assignment doesn't allow for delayed modification.

RFC 154: Simple assignment lvalue subs should be on by default

Differentiating "simple" lvalue subs is a problem. A user ought to just be able to say something fancy like


    temp $obj.attr += 3;

and have it behave right, provided .attr allows that.

Even with:


    $obj.attr = 3;

we have a real problem with knowing what can be done at compile time, since we might not know the exact type of $obj. Even if $obj is declared with a type, it's only an "isa" assertion. We could enforce things based on the declared type with the assumption that a derived type won't violate the contract, but I'm a little worried about large semantic changes happening just because one adds an optional type declaration. It seems safer that the untyped method behave just like the typed method, only with run-time resolution rather than compile-time resolution. Anything else would violate the principle of least surprise. So if it is not known whether $obj.attr can be an lvalue, it must be assumed that it can, and compiled with a mechanism that will work consistently, or throw a run-time exception if it can't.

The same goes for argument lists, actually. $obj.meth(@foo) can't assume that @foo is either scalar or list until it knows the signature of the .meth method. And it probably doesn't know that until dispatch time, unless it can analyze the entire set of available methods in advance. In general, modification of an invalid lvalue (an object without a write method, essentially) has to be handled by throwing an exception. This may well mean that it is illegal for a method to have an rw parameter!

Despite the fact that there are similar constraints on the arguments and on the lvalue, we cannot combine them, because the values are needed at different times. The arguments are needed when identifying the object to modify, since lvalue objects often act as proxies for other objects elsewhere. Think of subscripting an array, for instance, where the subscripts function as arguments, so you can say:


    $elem := @a[0][1][2];
    $elem = 3;

Likewise we should be able to say:


    $ref := a(0,1,2);
    $ref = 3;

and have $ref be the lvalue returned by a(). It's the implied "is rw" on the left that causes a() to return an lvalue, just as a subroutine parameter that is "rw" causes lvaluehood to be passed to its actual argument.

Since we can't in general know at compile time whether a method is "simple" or not, we don't know whether it's appropriate to treat an assignment as an extra argument or as a parameter to an internal STORE method. We have to compile the call assuming there's a separate STORE method on the lvalue object. Which means there's no such thing as a "simple" lvalue from the viewpoint of the caller.
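
The separation between identifying an lvalue and storing into it can be sketched as a proxy object. This is a loose Python analogy under stated assumptions, not how Perl 6 implements it: the class names `Lvalue` and `ReadOnly`, the methods `fetch` and `store` (standing in for the STORE method), and the nested list `data` are all invented for illustration.

```python
class Lvalue:
    """Hypothetical proxy: remembers *where* to store, not *what* to store."""
    def __init__(self, container, key):
        self._container = container
        self._key = key

    def fetch(self):
        return self._container[self._key]

    def store(self, value):  # stands in for the STORE method
        self._container[self._key] = value


class ReadOnly:
    """A result with no usable write method; storing fails at run time."""
    def __init__(self, value):
        self._value = value

    def fetch(self):
        return self._value

    def store(self, value):
        raise TypeError("cannot modify a read-only value")


data = [[[0, 0, 0], [0, 0, 0]]]

def a(i, j, k):
    """Identify an element; the arguments are used up here, before any store."""
    return Lvalue(data[i][j], k)

ref = a(0, 1, 2)            # like  $ref := a(0,1,2);
ref.store(3)                # like  $ref = 3;
ref.store(ref.fetch() + 4)  # like  $ref += 4;
```

Because the caller only ever talks to `store`, it does not need to know whether the routine behind the proxy is "simple". A compound assignment is just a fetch followed by a store on the same proxy, and an attempt to store through `ReadOnly` surfaces as a run-time exception rather than a compile-time decision.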

Accepted RFCs

RFC 168: Built-in functions should be functions

This all seems fine to me in principle. All built-in functions and multimethods exist in the "*" space, so system() is really &*system() in Perl 6.

We do need to consider whether "sub system" changes the meaning of calls to system() earlier in the lexical scope. Or are built-ins imported as third-class keywords, as lock() is in Perl 5? It's probably best if we detect the ambiguous situation and complain. A "late" definition of system() could be considered a redefinition. In fact, any definition of system() could be considered a redefinition. We could require "is redefined" or some such on all such redefinitions.

The "lock" situation arises when we add a new built-in, however. Do we want to force people to add in an "is redefined" where they didn't have to before? Worse, if their definition of "lock" is retroactive to the front of the file, merely adding "sub lock is redefined" is not necessarily good enough to become retroactive.

This is not a problem with my subs, since they have to be declared in advance. If we defer committing compilation of package-named subs to the end of the compilation unit, then we can just say that the current package overrides the "*" package. All built-ins become "third class" keywords in that case. But does that mean that a built-in can't override ordinary function-call syntax? Built-ins should at least be able to be used as list operators, but in Perl 5 you couldn't use your own sub as a list operator unless it was predeclared. Maybe we could relax that.

Since there are no longer any barewords, we can assume that any unrecognized word is a subroutine or method call of some sort even in the absence of parens. We could assume all such words are list operators. That works okay for overriding built-ins that actually *are* list operators--but not all of them are. If you say:


    print rand 1, 2;
    sub rand (*@x) { ... }

then it cannot be determined whether rand should be parsed as a unary operator ($) or as a list operator (*@).

Perl has to be able to parse its unary operators. So that code must be interpreted as:


    print rand(1), 2;

At that point in the parse, we've essentially committed to a signature of ($), which makes the subsequent sub declaration a redefinition with a different signature, which is illegal. But when someone says:


    print foo 1, 2;
    sub foo (*@x) { ... }

it's legal until someone defines &*foo($). We can protect ourselves from the backward compatibility problem by use of parens. When there are parens, we can probably defer the decision about the binding of its arguments to the end of the compilation. So either of:


    print foo(1), 2;
    sub foo (*@x) { ... }

or:


    print foo(1, 2);
    sub foo (*@x) { ... }

remain legal even if we later add a unary &*foo operator, as long as no other syntactic monkey business is going on with the function's args. So I think we keep the rule that says post-declared subs have to be called using parens, even though we could theoretically relax it.
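
The backward-compatibility hazard with paren-less calls can be made concrete with a toy model. This is a deliberately simplified Python sketch, not a description of the real parser: `parse_args` and the `known_unary` set are hypothetical, and the "tokens" are just a name followed by its comma-separated arguments.

```python
def parse_args(tokens, known_unary):
    """Group a flat `name arg, arg, ...` call the way the text describes.

    tokens      -- e.g. ["rand", 1, 2] for `rand 1, 2`
    known_unary -- names already known to have a ($) signature
    Returns (call, leftover); leftover goes back to the enclosing list.
    """
    name, args = tokens[0], tokens[1:]
    if name in known_unary:
        # Unary operator: bind exactly one argument, hand the rest back.
        return (name, args[:1]), args[1:]
    # Unrecognized word: assume a list operator that slurps everything.
    return (name, args), []


builtins_today = {"rand"}

rand_parse = parse_args(["rand", 1, 2], builtins_today)   # rand(1), 2
foo_parse = parse_args(["foo", 1, 2], builtins_today)     # foo(1, 2)

# If a unary &*foo($) is added later, the same source text parses differently.
builtins_later = builtins_today | {"foo"}
foo_later = parse_args(["foo", 1, 2], builtins_later)     # foo(1), 2
```

The last two results differ for identical input, which is the silent change in meaning that explicit parens guard against.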

On the other hand, this means that any unrecognized word followed by a list may unambiguously be taken to be a multimethod being called as a list operator. After all, we don't know when someone will be adding more multimethods. I currently think this is a feature, but I could be sadly mistaken. It has happened once or twice in the past.
