Sign In/My Account | View Cart  
advertisement


Listen Print

Apocalypse 5
by Larry Wall | Pages: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 05 for the latest information.

The new : operator replaces the (?>...) construct. It modifies whatever comes before it, much like * does, so it's naturally scoped if the preceding atom (or quantified atom) is a bracketed construct. Parsers can use this every time they commit to the parsing of a token or phrase to tell the regex engine that there's no point in backtracking through the atom in question, so backtracking will skip backwards over the atom and continue with some earlier branch point. The following takes a long time to fail if it has to look at every sequence of "a" to see if there is a "b" after it:

    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaac" =~ /^ a* b /

But we already know that the only possible match is the longest one. So if you put in the colon, it fails in one pass because the * grabs everything and gives nothing back on backtracking.

    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaac" =~ /^ a*: b /

You can use colon on a longer sequence too. The following might match a list of expressions separated by comma:

    / <expr> [ , <expr> ]*: /

It is an error to use : on any atom that does no backtracking. This will help to catch errors where you've forgotten to backslash a literal colon in things like:

    /^From: (.*)/

Perl 6 has no need for a special conditional construct like Perl 5's (?(cond)yes|no). That's because with a slight tweak, ordinary alternation can do the same thing. That tweak is our next backtracking modifier, the :: operator. If you backtrack across it, it fails all the way out of the current list of alternatives. Consider an ordinary list of alternatives:

    [ <A> <X> | <B> <Y> | <C> <Z> ]

The way the rules of backtracking work, if either <A> or <X> fail, it backtracks to the next alternative. Likewise for <B> and <Y>. In the case of <C> or <Z>, there is no next alternative, so it naturally fails out of the entire construct. That's not how a conditional is supposed to work, because in the conditional, only the condition determines which case is executed. Once you've committed to a particular case, it has to stand or fall as if the conditional hadn't been there. So all we need for our purposes is to have is something that separates the assertions that matter from those that don't. That's what :: does, and it reads rather well as a "then", or as a "corresponds to". If you write

    [ <A> :: <X>
    | <B> :: <Y>
    | <C> :: <Z>
    ]

then the failure of <A>, <B>, or <C> proceeds to the next case (if any), while any failure in <X>, <Y>, or <Z> is guaranteed to backtrack out of the front of the alternative list and revise a former choice (just as the success of <X>, <Y>, or <Z> is guaranteed to "forward track" out of the end of the alternative list and try to match more). It's a natural mapping to existing regex semantics. Here's a more realistic example from the Perl 6 grammar. It parses statement modifiers. (The <ws> rule parses optional whitespace.)

    rule modifier { if     <ws> :: <expr> { .new_cond(0,$expr) }
                  | unless <ws> :: <expr> { .new_cond(1,$expr) }
                  | while  <ws> :: <expr> { .new_loop(0,$expr) }
                  | until  <ws> :: <expr> { .new_loop(1,$expr) }
                  | for    <ws> :: <expr> { .new_for($expr)   }
                  | <@other_modifiers>  # user defined
                  | <null>              # no modifier
                  }

In each case, once we recognize a keyword (and its following whitespace), we need to look for an expression, and then call a closure that builds the syntax tree. If either of those fails, the entire modifier rule fails. We only get to the last two alternatives on failure of assertions before the ::.

Note that the :: only says that we can't backtrack from the "then" into the "if". It says nothing about backtracking into the alternative list as a whole. The alternatives are still choice points, so the regex engine is allowed to backtrack into the alternative list and try another alternative. (To disable that, simply put a : after the closing bracket of the alternative list.)

There is nothing in Perl 5 corresponding to the ::: operator, but it works just like ::, only more so. If you backtrack across it, it fails all the way out of the current rule definition (though not out of any rule invoking this definition). That is, it fails all the way out of the innermost lexically enclosing /.../, m/.../, s/...//, rx/.../, or rule {...}, skipping out through any enclosing nestings of <...>, [...], or (...). (A pattern nested within a closure is classified as its own rule, however, so it never gets the chance to pass out of a {...} closure.)

Since the alternatives in our last example are at the top level of the regex, we could have used the ::: operator to get the same effect as ::, because terminating the rule and terminating the alternation amount to the same thing in that case. You can think of all of these as variants on Prolog's "cut" operator.

If you backtrack over the :::: operator, it will delete your program from the disk. ;-)

Actually, the real name of the real :::: operator is <commit>. It fails the entire match if you backtrack over it, not just the current rule. That is, it fails all the way out of the outermost dynamically enclosing /.../, m/.../, s/...//, rx/.../, or rule {...} that is executing on the current string.

There is one "cut" operator that is beyond <commit>; it is appropriately named <cut>, for two reasons. First of all, it's a real cut operator in that, if you backtrack over it, the current match fails completely, just like <commit>. But that's just a side effect of the other reason, which is that <cut> cuts off the front of the string that you're currently matching on, turning the current position into the new beginning of the string. When you're matching on a potentially infinite string, it's important that you have a way of discarding that part of the match that you've already committed to. In Perl 5, the only way to do that was with a coordinated system of s/^pat// operations. With the <cut> assertion, however, you can just match normally, and cut at one spot in your top-level rule when you reach an "accept" state.

In the realm of idle speculation, we could go as far as to define a variant of <cut> that would render s/// slightly redundant:

    s/foo/bar/;
    m/foo <replace("bar")> /

Note that we don't need any special forms for controlling the scope of a "fail" in a closure. Just prefix the closure with the appropriate backtracking operator:

    / pattern ::: { code() or fail } /  # fails entire rule

Pages: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24

Next Pagearrow