Sign In/My Account | View Cart  
advertisement


Listen Print

Synopsis 5
by Allison Randal, Damian Conway | Pages: 1, 2, 3, 4, 5

Editor's note: this document is out of date and remains here for historic interest. See Synopsis 5 for the current design information.

Backslash reform

  • The \p and \P properties become intrinsic grammar rules (<prop ...> and <!prop ...>).

  • The \L...\E, \U...\E, and \Q...\E sequences become \L[...], \U[...], and \Q[...] (\E is gone).

  • Note that \Q[...] will rarely be needed since raw variables interpolate as eq matches, rather than regexes.

  • Backreferences (e.g. \1) are gone; $1 can be used instead, because it's no longer interpolated.

  • New backslash sequences, \h and \v, match horizontal and vertical whitespace respectively, including Unicode.

  • \s now matches any Unicode whitespace character.

  • The new backslash sequence \N matches anything except a logical newline; it is the negation of \n.

  • A series of other new capital backslash sequences are also the negation of their lower-case counterparts:
    • \H matches anything but horizontal whitespace.

    • \V matches anything but vertical whitespace.

    • \T matches anything but a tab.

    • \R matches anything but a return.

    • \F matches anything but a formfeed.

    • \E matches anything but an escape.

    • \X... matches anything but the specified hex character.


Regexes are rules

  • The Perl 5 qr/pattern/ regex constructor is gone.

  • The Perl 6 equivalents are:
        rule { pattern }    # always takes {...} as delimiters
        rx/ pattern /       # can take (almost any) chars as delimiters

  • Related Reading

    Perl & XML

    Perl & XML
    By Erik T. Ray, Jason McIntosh

  • If either needs modifiers, they go before the opening delimiter:
        $regex = rule :ewi { my name is (.*) };
        $regex = rx:ewi/ my name is (.*) /;

  • The name of the constructor was changed from qr because it's no longer an interpolating quote-like operator.

  • As the syntax indicates, it is now more closely analogous to a sub {...} constructor.

  • In fact, that analogy will run very deep in Perl 6.

  • Just as a raw {...} is now always a closure (which may still execute immediately in certain contexts and be passed as a reference in others)...

  • ...so too a raw /.../ is now always a regex (which may still match immediately in certain contexts and be passed as a reference in others).

  • Specifically, a /.../ matches immediately in a void or Boolean context, or when it is an explicit argument of a =~.

  • Otherwise it's a regex constructor.

  • So this:
        $var = /pattern/;

    no longer does the match and sets $var to the result.


  • Instead it assigns a regex reference to $var.

  • The two cases can always be distinguished using m{...} or rx{...}:
        $var = m{pattern};    # Match regex, assign result
        $var = rx{pattern};   # Assign regex itself

  • Note that this means that former magically lazy usages like:
        @list = split /pattern/, $str;

    are now just consequences of the normal semantics.


  • It's now also possible to set up a user-defined subroutine that acts like grep:
        sub my_grep($selector, *@list) {
            given $selector {
                when RULE  { ... }
                when CODE  { ... }
                when HASH  { ... }
                # etc.
            }
        }

  • Using {...} or /.../ in the scalar context of the first argument causes it to produce a CODE or RULE reference, which the switch statement then selects upon.


Backtracking control

  • Backtracking over a single colon causes the regex engine not to retry the preceding atom:
        m:w/ \( <expr> [ , <expr> ]* : \) /

    (i.e. there's no point trying fewer <expr> matches, if there's no closing parenthesis on the horizon)


  • Backtracking over a double colon causes the surrounding group to immediately fail:
        m:w/ [ if :: <expr> <block>
             | for :: <list> <block>
             | loop :: <loop_controls>? <block>
             ]
        /

    (i.e. there's no point trying to match a different keyword if one was already found but failed)


  • Backtracking over a triple colon causes the current rule to fail outright (no matter where in the rule it occurs):
        rule ident {
              ( [<alpha>|_] \w* ) ::: { fail if %reserved{$1} }
            | " [<alpha>|_] \w* "
        }
        m:w/ get <ident>? /

    (i.e. using an unquoted reserved word as an identifier is not permitted)


  • Backtracking over a <commit> assertion causes the entire match to fail outright, no matter how many subrules down it happens:
        rule subname {
            ([<alpha>|_] \w*) <commit> { fail if %reserved{$1} }
        }
        m:w/ sub <subname>? <block> /

    (i.e. using a reserved word as a subroutine name is instantly fatal to the ``surrounding'' match as well)


  • A <cut> assertion always matches successfully, and has the side effect of deleting the parts of the string already matched.

  • Attempting to backtrack past a <cut> causes the complete match to fail (like backtracking past a <commit>. This is because there's now no preceding text to backtrack into.

  • This is useful for throwing away successfully processed input when matching from an input stream or an iterator of arbitrary length.

Pages: 1, 2, 3, 4, 5

Next Pagearrow