Apocalypse 5
by Larry Wall
|
Pages: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 05 for the latest information.
The new : operator replaces the (?>...) construct. It modifies
whatever comes before it, much like * does, so it's naturally scoped
if the preceding atom (or quantified atom) is a bracketed construct.
Parsers can use this every time they commit to the parsing of a token
or phrase to tell the regex engine that there's no point in backtracking
through the atom in question, so backtracking will skip backwards over
the atom and continue with some earlier branch point. The following takes
a long time to fail if it has to look at every sequence of "a" to see if
there is a "b" after it:
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaac" =~ /^ a* b /
But we already know that the only possible match is the longest one.
So if you put in the colon, it fails in one pass because the * grabs everything
and gives nothing back on backtracking.
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaac" =~ /^ a*: b /
You can use colon on a longer sequence too. The following might match a list of expressions separated by comma:
/ <expr> [ , <expr> ]*: /
It is an error to use : on any atom that does no backtracking. This will
help to catch errors where you've forgotten to backslash a literal colon in
things like:
/^From: (.*)/
Perl 6 has no need for a special conditional construct like Perl 5's (?(cond)yes|no).
That's because with a slight tweak, ordinary alternation can do the same thing.
That tweak is our next backtracking modifier, the :: operator. If you backtrack
across it, it fails all the way out of the current list of alternatives. Consider
an ordinary list of alternatives:
[ <A> <X> | <B> <Y> | <C> <Z> ]
The way the rules of backtracking work, if either <A> or <X>
fail, it backtracks to the next alternative. Likewise for
<B> and <Y>. In the case of <C> or <Z>,
there is no next alternative, so it naturally fails out of the
entire construct. That's not how a conditional is supposed to work,
because in the conditional, only the condition determines which case
is executed. Once you've committed to a particular case, it has to
stand or fall as if the conditional hadn't been there. So all we
need for our purposes is to have is something that separates the
assertions that matter from those that don't. That's what ::
does, and it reads rather well as a "then", or as a "corresponds to". If you write
[ <A> :: <X>
| <B> :: <Y>
| <C> :: <Z>
]
then the failure of <A>, <B>, or <C> proceeds
to the next case (if any), while any failure in <X>, <Y>,
or <Z> is guaranteed to backtrack out of the front of the
alternative list and revise a former choice (just as the success of
<X>, <Y>, or <Z> is guaranteed to "forward track"
out of the end of the alternative list and try to match more). It's a
natural mapping to existing regex semantics. Here's a more realistic
example from the Perl 6 grammar. It parses statement modifiers.
(The <ws> rule parses optional whitespace.)
rule modifier { if <ws> :: <expr> { .new_cond(0,$expr) }
| unless <ws> :: <expr> { .new_cond(1,$expr) }
| while <ws> :: <expr> { .new_loop(0,$expr) }
| until <ws> :: <expr> { .new_loop(1,$expr) }
| for <ws> :: <expr> { .new_for($expr) }
| <@other_modifiers> # user defined
| <null> # no modifier
}
In each case, once we recognize a keyword (and its following
whitespace), we need to look for an expression, and then call a closure
that builds the syntax tree. If either of those fails, the entire
modifier rule fails. We only get to the last two alternatives on
failure of assertions before the ::.
Note that the :: only says that we can't backtrack from the "then"
into the "if". It says nothing about backtracking into the alternative
list as a whole. The alternatives are still choice points, so the
regex engine is allowed to backtrack into the alternative list and
try another alternative. (To disable that, simply put a : after
the closing bracket of the alternative list.)
There is nothing in Perl 5 corresponding to the ::: operator, but
it works just like ::, only more so. If you backtrack across it,
it fails all the way out of the current rule definition (though not
out of any rule invoking this definition). That is, it fails all the
way out of the innermost lexically enclosing /.../, m/.../, s/...//,
rx/.../, or rule {...}, skipping out through any enclosing
nestings of <...>, [...], or (...). (A pattern nested
within a closure is classified as its own rule, however, so it never gets
the chance to pass out of a {...} closure.)
Since the alternatives in our last example are at the top level of the
regex, we could have used the ::: operator to get the same effect
as ::, because terminating the rule and terminating the alternation
amount to the same thing in that case. You can think of all of these
as variants on Prolog's "cut" operator.
If you backtrack over the :::: operator, it will delete your program
from the disk. ;-)
Actually, the real name of the real :::: operator is <commit>.
It fails the entire match if you backtrack over it, not just the
current rule. That is, it fails all the way out of the outermost
dynamically enclosing /.../, m/.../, s/...//, rx/.../,
or rule {...} that is executing on the current string.
There is one "cut" operator that is beyond <commit>; it is
appropriately named <cut>, for two reasons. First of all, it's
a real cut operator in that, if you backtrack over it, the current
match fails completely, just like <commit>. But that's
just a side effect of the other reason, which is that <cut>
cuts off the front of the string that you're currently matching on,
turning the current position into the new beginning of the string.
When you're matching on a potentially infinite string, it's important
that you have a way of discarding that part of the match that you've
already committed to. In Perl 5, the only way to do that was with
a coordinated system of s/^pat// operations. With the <cut>
assertion, however, you can just match normally, and cut at one
spot in your top-level rule when you reach an "accept" state.
In the realm of idle speculation, we could go as far as to define a
variant of <cut> that would render s/// slightly redundant:
s/foo/bar/;
m/foo <replace("bar")> /
Note that we don't need any special forms for controlling the scope of a "fail" in a closure. Just prefix the closure with the appropriate backtracking operator:
/ pattern ::: { code() or fail } / # fails entire rule
Pages: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 |

