Apocalypse 5
by Larry Wall

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 05 for the latest information.

Null String Reform

The null pattern is now illegal. To match whatever you used to match with a null pattern, use one of these:

    Old                 New
    ---                 ---
    //                  /<prior>/       # match what prior match did
    //                  /<null>/        # match the null string between chars
    (a|b|)              (a|b|<null>)    # match a null alternative

Note that, as an assertion, <null> always succeeds. You never want to say:

    / <null> | single | double | triple | home run /

because you'll never get to first base.
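The same trap can be seen with an empty alternative in any engine that tries alternatives left to right. The following is a small sketch in Python rather than Perl 6, on the assumption that Python's `re` module orders alternation the way the text describes; the pattern names are made up for illustration.

```python
import re

# Ordered alternation: the first alternative that can match wins.
# An empty alternative listed first always succeeds with a zero-length match,
# so the later alternatives are never tried.
null_first = re.compile(r"(?:|single|double|triple|home run)")

# Listing the empty alternative last makes it a fallback instead.
null_last = re.compile(r"(?:single|double|triple|home run|)")

first = null_first.match("double").group(0)
last = null_last.match("double").group(0)

print(repr(first))
print(repr(last))
```

Putting the null alternative last, as in `(a|b|<null>)` above, keeps it as a fallback rather than a guaranteed first hit.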

Extension Syntax Reform

There are no longer any (?...) sequences, because parens now always capture. Some of the replacement sequences take their intrinsic scoping from <...>, while others are associated with other bracketing characters, or with any arbitrary atom that could be a bracketed construct. Looking at the metasyntax problem from the perspective of a Perl5-to-Perl6 translator, here's what the various Perl 5 extension constructs translate to:

    Old                 New
    ---                 ---
    (??{$rule})         <$rule>         # call regex in variable
    (?{ code })         { code }        # call Perl code, ignore result
    (?#...)             <('...')>       # in-line comment, rarely needed
    (?:...)             [...]           # non-capturing brackets
    (?=...)             <before ...>    # positive lookahead
    (?!...)             <!before ...>   # negative lookahead
    (?<=...)            <after ...>     # positive lookbehind
    (?<!...)            <!after ...>    # negative lookbehind
    (?>...)             [...]:          # grab (any atom)
    (?(cond)yes|no)     [ cond :: yes | no ]
    (?(1)yes|no)        [ <(defined $1)> :: yes | no ]
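For readers who want to check what the "Old" column actually does before translating it, here is a hedged sketch in Python, whose `re` module accepts several of the same Perl 5 extension sequences. It exercises only the old spellings; the Perl 6 forms in the "New" column are not modeled, and the sample strings are invented for illustration.

```python
import re

# (?:...) groups without capturing, so only (c) gets a group number.
groups = re.match(r"(?:ab)+(c)", "ababc").groups()

text = "one, two three, four"
# (?=...) positive lookahead: words that are followed by a comma.
before_comma = re.findall(r"\w+(?=,)", text)
# (?!...) negative lookahead: whole words that are not followed by a comma.
not_before_comma = re.findall(r"\b\w+\b(?!,)", text)

prices = "cost $40, qty 3"
# (?<=...) positive lookbehind: digits preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", prices)
# (?<!...) negative lookbehind: whole numbers not preceded by a dollar sign.
not_after_dollar = re.findall(r"(?<!\$)\b\d+", prices)

# (?(1)yes|no) conditional: require ">" only if the optional "<" matched.
cond = re.compile(r"(<)?\w+(?(1)>|$)")
balanced = cond.match("<tag>") is not None
bare = cond.match("tag") is not None
unbalanced = cond.match("<tag") is not None

print(groups, before_comma, not_before_comma)
print(after_dollar, not_after_dollar, balanced, bare, unbalanced)
```

None of the lookaround forms consume any text, which is why the table maps them to assertions rather than to bracketing constructs.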

The <$rule> construct does a "delayed" call of another regular expression stored in the $rule variable. If it is a regex object, it's just called as if it were a subroutine, so there's no performance problem. If it's a string, it is compiled as a regex and executed. The compiled form is cached as a property of the string, so it doesn't have to be recompiled unless the string changes. (This implies that we can have properties that invalidate themselves when their base object is modified.) In either case, the evaluated regex is treated as a subrule, and any captures it does are invisible to the outer regex unless the outer regex takes steps to retrieve them. In any event, subrule parens never change the paren count of the outer rule.

The {code} form doesn't return anything meaningful--it is used for its side effects. Any such closure may behave as an assertion. It merely has to throw an exception in order to fail. To throw such an exception (on purpose) one may use fail:

    $_ = "666";
    / (\d+) { $1 < 582 or fail }/

As with any assertion, the failing closure starts backtracking at the location of the closure. In this case, it backtracks into the \d+ and ends up matching "66" rather than "666". If you didn't want that, use \d+: instead.
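The backtracking behavior can be traced by hand with a small simulation. The sketch below is Python, not Perl 6, and makes two simplifying assumptions: the match is anchored at the start of the string (a real unanchored match might also retry from later start positions), and the function name `match_digits_with_assertion` is invented for illustration. The `backtrack=False` case stands in for `\d+:`.

```python
def match_digits_with_assertion(text, assertion, backtrack=True):
    """Simulate / (\\d+) { assertion($1) or fail } / anchored at position 0."""
    # Greedy \d+ first grabs the longest run of leading digits.
    end = 0
    while end < len(text) and text[end].isdigit():
        end += 1
    if end == 0:
        return None  # no digits at all, so \d+ itself fails

    # With backtracking, give back one digit at a time on failure;
    # without it, only the longest run is ever tried.
    shortest = 1 if backtrack else end
    for stop in range(end, shortest - 1, -1):
        candidate = text[:stop]
        if assertion(int(candidate)):
            return candidate
    return None

with_backtracking = match_digits_with_assertion("666", lambda n: n < 582)
without_backtracking = match_digits_with_assertion(
    "666", lambda n: n < 582, backtrack=False
)
print(with_backtracking, without_backtracking)
```

With backtracking the closure is asked about 666 first and then about 66, which is where the shorter match comes from; without it, the single failure ends the attempt at this position.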

It's more succinct, however, to use the code assertion syntax. Just put angles around a parenthesized Perl expression:

    / (\d+) <( $1 < 582 )> /

I find the parens to be vaguely reminiscent of the parentheses you have to put around conditionals in C (but not Perl (anymore)). Also, the parentheses are meant to remind you that you only want to put an expression there, not a full statement.

Don't use a bare closure to try to interpolate a calculated regex, since the result will be ignored. Instead, use the <{expr}> form to do that. As with <&rule()>, the result will be interpreted as a subrule, not as if it were interpolated.

Since a string is usually true, you can just assert it to get the effect of an inline comment: <("this is a comment")>. But I've never used one except to show it as an example. Line-ending comments are usually much clearer. (Just bear in mind you can't put the final regex delimiter on the same line, because it won't be seen in the comment.) You could also use the {'...'} construct for comments, but then you risk warnings about "useless use of a string in void context".

The [...] is the new non-capturing bracket notation. It seems to work very well for this purpose--I tried the other brackets and they tend to "disappear" faster than square brackets. So we reserve (...) and <...> for constructs where the visual distance between opening and closing is typically shorter than for square brackets or curlies. The square brackets also work nicely when lined up vertically with vertical bars. Here's a declaration of a named rule from the class Perl6Grammar. It parses Perl 6 statements. (Think of it as a funny looking method declaration.)

    rule state  { <label>
                    [ <control>          {.control}
                    | <sideff> <eostate> {.sideff}
                    | <@other_statements>
                    ]
                };

Huffman coding says that rarer forms should be longer, and that's the case with the lookahead and lookbehind assertions, <before ...> and <after ...>. (The negations are formed via the general <!...> rule.) Note that these prepositions are interpreted as assertions, not operations. For example, <before X> is to be read "Assert that we are before X" rather than "Look before where we are for X".
