Apocalypse 5
by Larry Wall
Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 05 for the latest information.
RFC 274: Generalized Additions to Regexs
This proposal has significant early/late binding issues. A definition that forces run-time overhead is not as useful as it might be. On the other hand, a pure compile-time mechanism is not as general as it might be--but a compile-time mechanism can always compile in a run-time mechanism if it chooses to defer evaluation.
So it seems like this is a good place for syntactic warpage of some
sort or other. That would make it possible to do both compile-time
and run-time bindings. We'll be using the <...> notation
for our extensible syntax, and the grammar rules for parsing that
particular part of Perl syntax will be just as easy to tweak as any
other Perl grammar rule.
That being said, the very fact that we can associate a grammar with
the regex means that it's easy to define any instance of <word>
to mean whatever you want it to. (In a sense, these subrules are
the very callbacks that the RFC proposes.) These subrules can be
bound either at Perl compile time or at Perl run time. They can
be defined to take a string, regex, or Perl expression as an argument. The latter
two cases are efficient because they come in as a regex or code reference
respectively.
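The early/late binding distinction can be made concrete with a small sketch. It is written in Python rather than Perl 6, because the Perl 6 syntax here was still a proposal. The `grammar` dictionary and the `expand`, `bind_early`, and `bind_late` helpers are hypothetical names invented for this illustration. They only mimic the idea that `<word>` means whatever the associated grammar says at the moment the binding is resolved.

```python
import re

# Hypothetical grammar: maps subrule names to regex source text.
grammar = {"word": r"[a-z]+"}

_SUBRULE = re.compile(r"<(\w+)>")

def expand(pattern, rules):
    """Replace each <name> with a non-capturing group holding that rule's source."""
    return _SUBRULE.sub(lambda m: "(?:" + rules[m.group(1)] + ")", pattern)

def bind_early(pattern, rules):
    """Resolve subrules once, when the matcher is built (compile-time binding)."""
    compiled = re.compile(expand(pattern, rules))
    return lambda text: compiled.fullmatch(text) is not None

def bind_late(pattern, rules):
    """Resolve subrules on every call (run-time binding), so redefinitions are seen."""
    return lambda text: re.fullmatch(expand(pattern, rules), text) is not None

early = bind_early(r"<word>-<word>", grammar)
late = bind_late(r"<word>-<word>", grammar)

# Redefine <word> after both matchers exist.
grammar["word"] = r"[0-9]+"
```

After the redefinition, `early` still matches only lowercase words, while `late` follows the new definition and matches only digits. That is the trade-off described above: the early-bound matcher avoids the per-call lookup, and the late-bound one stays general.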
- Following on, if (?{...}) etc code is evaluated in forward match, it would be a good idea to likewise support some code block that is ignored on a forward match but is executed when the code is unwound due to backtracking.
Yes, though hypothetical values take some of the pressure off for this.
But if a closure contained a BACK block, it could be automatically fired
off on backtracking. As with LAST et al., I suppose there's a corresponding back property
on variables. In a sense, saying
let $var = $newval
is much like saying
our $var is back { .set($oldval) } = $newval
except that $var may well be stored in the regex state object rather than
in a global symbol table.
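To see why `let` resembles a variable with a `back` handler, here is a rough Python sketch of a trail of undo actions. The `RegexState` class, its `let`, `mark`, and `backtrack` methods, and the `try_alternatives` helper are all assumptions made up for this example. Nothing here describes how a Perl 6 engine would actually store hypothetical values.

```python
class RegexState:
    """Toy stand-in for the regex state object mentioned above."""

    def __init__(self):
        self.vars = {}
        self.trail = []  # stack of undo callbacks, playing the role of BACK blocks

    def let(self, name, value):
        """Set a variable hypothetically, remembering how to restore the old value."""
        missing = object()
        old = self.vars.get(name, missing)

        def back():
            if old is missing:
                self.vars.pop(name, None)
            else:
                self.vars[name] = old

        self.trail.append(back)
        self.vars[name] = value

    def mark(self):
        """Return a choice point that backtrack() can unwind to."""
        return len(self.trail)

    def backtrack(self, mark):
        """Fire undo callbacks newest-first until the trail is back at the mark."""
        while len(self.trail) > mark:
            self.trail.pop()()


def try_alternatives(state, text, alternatives):
    """Try each literal alternative; bindings made by failed attempts are undone."""
    for alt in alternatives:
        mark = state.mark()
        state.let("tried", alt)
        if text == alt:
            return True
        state.backtrack(mark)
    return False
```

A successful alternative leaves its binding in `state.vars`. A failed one leaves no trace, which is the behavior the `back` property is meant to give.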
RFC 276: Localising Paren Counts in qr()s.
I agree totally. As for the problem of pulling captures out of
a subrule, it's up to the subrule to determine what it "returns".
We could make some intelligent defaults, though different kinds of
rules might want different defaults. One approach might be to say
that if there is a single capture, that is returned as the result.
If there is no capture, it's as if the entire subpattern were captured.
If there are multiple captures, they are returned as an anonymous list.
So $1 from such a subrule might come through like this:
/ $sub:=<subrule> { print $sub[1] } /
or just:
/ <subrule> { print $subrule[1] } /
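The "intelligent defaults" floated above can be written down as a tiny Python function, to check that the three cases are unambiguous. The name `subrule_result` is hypothetical, and Python's `re` module stands in for the subrule. This is one reading of the proposal, not a settled design.

```python
import re

def subrule_result(pattern, text):
    """Apply the proposed defaults to the first match of pattern in text."""
    m = re.search(pattern, text)
    if m is None:
        return None
    groups = m.groups()
    if len(groups) == 0:
        # No capture: behave as if the entire subpattern were captured.
        return m.group(0)
    if len(groups) == 1:
        # Single capture: that capture is the result.
        return groups[0]
    # Multiple captures: return them as an anonymous list.
    return list(groups)
```

With these rules, `subrule_result(r"\d+", "ab 42")` gives back the whole match, `subrule_result(r"(\d+)px", "12px")` gives back the one capture, and a pattern with two groups gives back a two-element list.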
But named captures and named rules intrude on this idyllic picture.
You'd also like a default anonymous hash value returned that is keyed
by all the named captures or rules. The question is whether that forces
numbered captures to come through the hash interface. Or maybe that's
just always the case, so to get at $1 of a subrule, you'd say:
/ $sub:=<subrule> { print $sub{'1'} } /
But there are reasons for wanting to treat the result object as an array, so that
/ $sub:=<subrule> { process(@$sub) } /
processes all the numbered captures from the subrule. So I think the return object behaves either like a hash or an array as appropriate. (Note that such an array might be declared to have an origin at 1 rather than 0.)
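A result object that answers to both interfaces is easy to mock up. The Python sketch below wraps a match from the standard `re` module. `SubruleResult` is a hypothetical name, and the choice to accept `'1'` as a string key simply mirrors the `$sub{'1'}` example above.

```python
import re

class SubruleResult:
    """Wraps a match so captures can be reached by number, by name, or by iteration."""

    def __init__(self, match):
        self._match = match

    def __getitem__(self, key):
        # Hash-style access: '1' and 1 both mean the first numbered capture,
        # while any other string is treated as a capture name.
        if isinstance(key, str) and key.isdecimal():
            key = int(key)
        return self._match.group(key)

    def __iter__(self):
        # Array-style access: iterate over the numbered captures, starting at capture 1.
        return iter(self._match.groups())

    def __len__(self):
        return len(self._match.groups())


m = re.search(r"(?P<key>\w+)=(\w+)", "colour=red")
sub = SubruleResult(m)
```

Here `sub[1]`, `sub['1']`, and `sub['key']` all name the same capture, and `list(sub)` hands over every numbered capture at once, much as `process(@$sub)` would. Index 0 is the whole match in this sketch, so the captures themselves are effectively 1-origin.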
RFC 317: Access to optimisation information for regular expressions
Seems like a no-brainer. All such information is likely to be available to Perl anyway, given that we'd like to do the parser, optimizers, and code generators in Perl if at all possible.
RFC 331: Consolidate the $1 and \1 notations
I like the title of this RFC. It fits in with the new my policy of
immediate introduction. However, there are certain difficulties with
the proposed implementation. The statement-by-statement setting of
the @/ array looks pretty ugly to me. I'd rather have a consistent
view of hypothetical variables that can live on outside the regex in
question without regard to statement boundaries. In the rare event
that someone needs to refer to $1 (or anything else) from a prior
regex, a temporary variable should be used.
RFC 332: Regex: Make /$/ equivalent to /\z/ under the '/s' modifier
Another RFC that is accepted in principle, but that doesn't go far enough.
The /s modifier is going away, along with /m.
A $ will always mean end-of-string, and $$ will match
at the end of any line. (The current process id is now $*PID, by the way,
so there's no conflict there. But how often do you want to write a
pattern to match the current process id anyway?)
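For readers who know other regex dialects, the split between the two anchors has a close analogue in Python's `re` module. There, `\Z` matches only at the very end of the string, and `$` under `re.MULTILINE` matches at the end of every line. The mapping to the proposed `$` and `$$` is an analogy offered here, not something the proposal itself states.

```python
import re

text = "first\nsecond\n"

# Roughly the proposed $ : end of string only.
END_OF_STRING = re.compile(r"\Z")

# Roughly the proposed $$ : end of any line, and also the end of the string.
END_OF_LINE = re.compile(r"$", re.MULTILINE)

string_ends = [m.start() for m in END_OF_STRING.finditer(text)]
line_ends = [m.start() for m in END_OF_LINE.finditer(text)]
```

On this input `string_ends` holds the single offset 13, while `line_ends` holds 5, 12, and 13: just before each newline, plus the end of the string.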

