Apocalypse 5
by Larry Wall | Pages: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 05 for the latest information.

RFC 274: Generalized Additions to Regexs

This proposal has significant early/late binding issues. A definition that forces run-time overhead is not as useful as it might be. On the other hand, a pure compile-time mechanism is not as general as it might be--but a compile-time mechanism can always compile in a run-time mechanism if it chooses to defer evaluation.

So it seems like this is a good place for syntactic warpage of some sort or other. That would make it possible to do both compile-time and run-time bindings. We'll be using the <...> notation for our extensible syntax, and the grammar rules for parsing that particular part of Perl syntax will be just as easy to tweak as any other Perl grammar rule.

That being said, the very fact that we can associate a grammar with the regex means that it's easy to define any instance of <word> to mean whatever you want it to. (In a sense, these subrules are the very callbacks that the RFC proposes.) These subrules can be bound either at Perl compile time or at Perl run time. They can be defined to take a string, regex, or Perl expression as an argument. The latter two cases are efficient because they come in as a regex or code reference respectively.
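To make the early/late binding distinction concrete, here is a minimal sketch in Python rather than Perl 6, since the syntax above is still a proposal. It assumes a plain dictionary of named subrules and a toy `<name>` expansion; the helpers `expand`, `bind_early`, and `match_late` are hypothetical names invented for this illustration, not part of any real regex engine.

```python
import re

def expand(pattern, rules):
    """Replace each <name> in pattern with the regex text currently stored in rules."""
    return re.sub(r"<(\w+)>", lambda m: "(?:" + rules[m.group(1)] + ")", pattern)

def bind_early(pattern, rules):
    """Compile-time binding: subrules are looked up once, when the regex is built."""
    return re.compile(expand(pattern, rules))

def match_late(pattern, rules, text):
    """Run-time binding: subrules are looked up again on every match attempt."""
    return re.fullmatch(expand(pattern, rules), text) is not None

rules = {"word": r"[a-z]+", "sep": r"-"}
early = bind_early(r"<word><sep><word>", rules)

# Redefining a subrule afterwards affects only the late-bound form.
rules["sep"] = r":"
still_old = early.fullmatch("foo-bar") is not None
sees_new = match_late(r"<word><sep><word>", rules, "foo:bar")
```

The early-bound regex pays the lookup cost once and then ignores later redefinitions, while the late-bound form pays on every match but always sees the current rule. That is roughly the trade-off described above, and it also shows why a compile-time mechanism can choose to compile in a run-time lookup.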

Following on, if (?{...}) etc code is evaluated in forward match, it would be a good idea to likewise support some code block that is ignored on a forward match but is executed when the code is unwound due to backtracking.

Yes, though hypothetical values take some of the pressure off for this. But if a closure contained a BACK block, it could be automatically fired off on backtracking. As with LAST et al., I suppose there's a corresponding back property on variables. In a sense, saying

    let $var = $newval

is much like saying

    our $var is back { .set($oldval) } = $newval

except that $var may well be stored in the regex state object rather than in a global symbol table.
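The relationship between a hypothetical binding and a BACK block can be sketched outside of Perl. The Python below is only an analogy under stated assumptions: backtracking is modeled as a `Backtrack` exception, the regex state object is a plain dictionary, and `let` and `try_alternatives` are hypothetical helpers made up for this example.

```python
from contextlib import contextmanager

class Backtrack(Exception):
    """Signals that the current alternative failed and must be unwound."""

@contextmanager
def let(state, name, new_value):
    """Hypothetically bind state[name]; restore the old value if we backtrack."""
    missing = object()
    old = state.get(name, missing)
    state[name] = new_value
    try:
        yield
    except Backtrack:
        # This is the BACK step: undo the binding, then keep unwinding.
        if old is missing:
            del state[name]
        else:
            state[name] = old
        raise

def try_alternatives(state, alternatives):
    """Run alternatives in order; the first one that does not backtrack wins."""
    for alternative in alternatives:
        try:
            alternative(state)
            return True
        except Backtrack:
            continue
    return False

def failing(state):
    with let(state, "var", "new"):
        raise Backtrack()

def succeeding(state):
    with let(state, "var", "kept"):
        pass

state = {"var": "old"}
failed = try_alternatives(state, [failing])
after_failure = state["var"]
succeeded = try_alternatives(state, [failing, succeeding])
after_success = state["var"]
```

A binding made on a path that fails is rolled back, while a binding made on the path that finally succeeds survives, which is the behavior the `let` form above is meant to have.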

RFC 276: Localising Paren Counts in qr()s.

I agree totally. As for the problem of pulling captures out of a subrule, it's up to the subrule to determine what it "returns". We could make some intelligent defaults, though different kinds of rules might want different defaults. One approach might be to say that if there is a single capture, that is returned as the result. If there is no capture, it's as if the entire subpattern were captured. If there are multiple captures, they are returned as an anonymous list. So $1 from such a subrule might come through like this:

    / $sub:=<subrule> { print $sub[1] } /

or just:

    / <subrule> { print $subrule[1] } /

But named captures and named rules intrude on this idyllic picture. You'd also like a default anonymous hash value returned that is keyed by all the named captures or rules. The question is whether that forces numbered captures to come through the hash interface. Or maybe that's just always the case, so to get at $1 of a subrule, you'd say:

    / $sub:=<subrule> { print $sub{'1'} } /

But there are reasons for wanting to treat the result object as an array, so that

    / $sub:=<subrule> { process(@$sub) } /

processes all the numbered captures from the subrule. So I think the return object behaves either like a hash or an array as appropriate. (Note that such an array might be declared to have an origin at 1 rather than 0.)
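As a rough illustration of a result object that answers to both interfaces, here is a sketch in Python. It is an analogy, not the Perl 6 design: `SubruleResult` is a hypothetical wrapper around Python's own `re.Match`, with numbered captures starting at 1 and string keys reaching either named captures or, as in the `$sub{'1'}` form, numbered ones.

```python
import re

class SubruleResult:
    """Wraps a match so captures can be read like an array or like a hash."""

    def __init__(self, match):
        self._match = match

    def __getitem__(self, key):
        # A digit string such as '1' goes through the hash interface
        # but still reaches the numbered capture.
        if isinstance(key, str) and key.isdigit():
            key = int(key)
        if isinstance(key, int):
            if key < 1:
                raise IndexError("numbered captures start at 1")
            return self._match.group(key)
        return self._match.groupdict()[key]

    def __iter__(self):
        # Iterating yields all numbered captures in order.
        return iter(self._match.groups())

def process(*captures):
    """Stand-in for whatever consumes the numbered captures."""
    return "/".join(captures)

sub = SubruleResult(re.match(r"(?P<year>\d{4})-(\d{2})", "2002-06"))
by_number = sub[1]
by_string_number = sub["1"]
by_name = sub["year"]
flattened = process(*sub)
```

Here `process(*sub)` plays the part of `process(@$sub)`. Note that in Python a named group also counts as a numbered one, so `sub[1]` and `sub["year"]` reach the same capture; whether Perl 6 should work that way is exactly the open question above.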

RFC 317: Access to optimisation information for regular expressions

Seems like a no-brainer. All such information is likely to be available to Perl anyway, given that we'd like to do the parser, optimizers, and code generators in Perl if at all possible.

RFC 331: Consolidate the $1 and \1 notations

I like the title of this RFC. It fits in with the new my policy of immediate introduction. However, there are certain difficulties with the proposed implementation. The statement-by-statement setting of the @/ array looks pretty ugly to me. I'd rather have a consistent view of hypothetical variables that can live on outside the regex in question without regard to statement boundaries. In the rare event that someone needs to refer to $1 (or anything else) from a prior regex, a temporary variable should be used.

RFC 332: Regex: Make /$/ equivalent to /\z/ under the '/s' modifier

Another RFC that is accepted in principle, but that doesn't go far enough. The /s modifier is going away, along with /m. A $ will always mean end-of-string, and $$ will match at the end of any line. (The current process id is now $*PID, by the way, so there's no conflict there. But how often do you want to write a pattern to match the current process id anyway?)
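The two anchors can be approximated in an existing engine. The sketch below uses Python's `re` module as a stand-in: `\Z` there matches only at the very end of the string, which is how the proposed `$` behaves, and `$` under `re.MULTILINE` matches at the end of every line, which is roughly the proposed `$$`. The helper names `ends_string` and `line_end_matches` are hypothetical.

```python
import re

def ends_string(pattern, text):
    """True if pattern matches with its end pinned to the very end of text."""
    return re.search("(?:" + pattern + r")\Z", text) is not None

def line_end_matches(pattern, text):
    """Every match of pattern whose end is pinned to the end of some line."""
    anchored = re.compile("(?:" + pattern + ")$", re.MULTILINE)
    return [m.group(0) for m in anchored.finditer(text)]

text = "one\ntwo\n"

# A trailing newline counts: "two" is not at the true end of the string.
strict_end = ends_string(r"two", text)
with_newline = ends_string(r"two\n", text)

# Line-end anchoring finds a match on each line.
per_line = line_end_matches(r"\w+", text)
```

With no modifier involved, the choice between "end of string" and "end of line" is visible in the pattern itself, which is the point of retiring /s and /m.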
