Synopsis 5
by Allison Randal, Damian Conway
Editor's note: this document is out of date and remains here for historic interest. See Synopsis 5 for the current design information.
Backslash reform
- The \p and \P properties become intrinsic grammar rules (<prop ...> and <!prop ...>).
- The \L...\E, \U...\E, and \Q...\E sequences become \L[...], \U[...], and \Q[...] (\E is gone).
- Note that \Q[...] will rarely be needed since raw variables interpolate as eq matches, rather than regexes.
- Backreferences (e.g. \1) are gone; $1 can be used instead, because it's no longer interpolated.
- New backslash sequences, \h and \v, match horizontal and vertical whitespace respectively, including Unicode.
- \s now matches any Unicode whitespace character.
- The new backslash sequence \N matches anything except a logical newline; it is the negation of \n.
- A series of other new capital backslash sequences are also the negation of their lower-case counterparts:
  - \H matches anything but horizontal whitespace.
  - \V matches anything but vertical whitespace.
  - \T matches anything but a tab.
  - \R matches anything but a return.
  - \F matches anything but a formfeed.
  - \E matches anything but an escape.
  - \X... matches anything but the specified hex character.
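The negated escapes above can be approximated in other regex dialects with negated character classes. The following Python sketch is only an analogy, not Perl 6: Python's re module has no \h, \H, or \N, so the names HSPACE, NOT_HSPACE, and NOT_NEWLINE are hypothetical stand-ins, and the horizontal whitespace characters listed are a small illustrative subset chosen for this example.

```python
import re

# Hypothetical stand-ins (assumptions for illustration only).
# Only a small subset of horizontal whitespace is listed here.
HSPACE = r"[ \t\u00A0\u2003]"        # rough analogue of \h
NOT_HSPACE = r"[^ \t\u00A0\u2003]"   # rough analogue of \H
NOT_NEWLINE = r"[^\n]"               # rough analogue of \N (only "\n" counts as a newline)


def split_on_hspace(text):
    """Return the runs of characters that are not horizontal whitespace."""
    return re.findall(NOT_HSPACE + "+", text)


def lines_without_newlines(text):
    """Return each maximal run of non-newline characters."""
    return re.findall(NOT_NEWLINE + "+", text)


def is_unicode_space(ch):
    """For str patterns, Python's \\s already matches Unicode whitespace."""
    return re.fullmatch(r"\s", ch) is not None
```

Each capital form is simply the complement of its lower-case class, which is why a negated bracket expression is enough to imitate it here.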
Regexes are rules
- The Perl 5 qr/pattern/ regex constructor is gone.
- The Perl 6 equivalents are:

      rule { pattern }    # always takes {...} as delimiters
      rx/ pattern /       # can take (almost any) chars as delimiters

- If either needs modifiers, they go before the opening delimiter:

      $regex = rule :ewi { my name is (.*) };
      $regex = rx:ewi/ my name is (.*) /;

- The name of the constructor was changed from qr because it's no longer an interpolating quote-like operator.
- As the syntax indicates, it is now more closely analogous to a sub {...} constructor.
- In fact, that analogy will run very deep in Perl 6.
- Just as a raw {...} is now always a closure (which may still execute immediately in certain contexts and be passed as a reference in others)...
- ...so too a raw /.../ is now always a regex (which may still match immediately in certain contexts and be passed as a reference in others).
- Specifically, a /.../ matches immediately in a void or Boolean context, or when it is an explicit argument of a =~.
- Otherwise it's a regex constructor.
- So this:

      $var = /pattern/;

  no longer does the match and sets $var to the result.
- Instead it assigns a regex reference to $var.
- The two cases can always be distinguished using m{...} or rx{...}:

      $var = m{pattern};    # Match regex, assign result
      $var = rx{pattern};   # Assign regex itself

- Note that this means that former magically lazy usages like:

      @list = split /pattern/, $str;

  are now just consequences of the normal semantics.
- It's now also possible to set up a user-defined subroutine that acts like grep:

      sub my_grep($selector, *@list) {
          given $selector {
              when RULE { ... }
              when CODE { ... }
              when HASH { ... }
              # etc.
          }
      }

- Using {...} or /.../ in the scalar context of the first argument causes it to produce a CODE or RULE reference, which the switch statement then selects upon.
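The distinction between building a regex and matching with it, and the type-based dispatch in my_grep, can be sketched in Python as a loose analogy. This is not Perl 6, and the behaviour chosen for each selector type is an assumption made for illustration, since the original leaves the bodies of the when clauses unspecified.

```python
import re

# Analogue of rx{pattern}: build the regex object without matching anything.
regex = re.compile(r"my name is (.*)")

# Analogue of m{pattern}: perform the match now and keep the result.
result = regex.search("my name is Inigo")


def my_grep(selector, *items):
    """Filter items; the selector's type picks the test (behaviours are assumed)."""
    if isinstance(selector, re.Pattern):   # roughly `when RULE`
        return [x for x in items if selector.search(x)]
    if isinstance(selector, dict):         # roughly `when HASH`
        return [x for x in items if x in selector]
    if callable(selector):                 # roughly `when CODE`
        return [x for x in items if selector(x)]
    raise TypeError("unsupported selector type")
```

A call such as my_grep(re.compile("a"), "cat", "dog", "bat") passes the regex object itself, in the same spirit as a /.../ in argument position producing a RULE reference rather than a match result.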
Backtracking control
- Backtracking over a single colon causes the regex engine not to retry the preceding atom:

      m:w/ \( <expr> [ , <expr> ]* : \) /

  (i.e. there's no point trying fewer <expr> matches, if there's no closing parenthesis on the horizon)
- Backtracking over a double colon causes the surrounding group to immediately fail:

      m:w/ [ if :: <expr> <block>
           | for :: <list> <block>
           | loop :: <loop_controls>? <block>
           ]
      /

  (i.e. there's no point trying to match a different keyword if one was already found but failed)
- Backtracking over a triple colon causes the current rule to fail outright (no matter where in the rule it occurs):

      rule ident {
            ( [<alpha>|_] \w* ) ::: { fail if %reserved{$1} }
          | " [<alpha>|_] \w* "
      }

      m:w/ get <ident>? /

  (i.e. using an unquoted reserved word as an identifier is not permitted)
- Backtracking over a <commit> assertion causes the entire match to fail outright, no matter how many subrules down it happens:

      rule subname {
          ([<alpha>|_] \w*) <commit> { fail if %reserved{$1} }
      }

      m:w/ sub <subname>? <block> /

  (i.e. using a reserved word as a subroutine name is instantly fatal to the "surrounding" match as well)
- A <cut> assertion always matches successfully, and has the side effect of deleting the parts of the string already matched.
- Attempting to backtrack past a <cut> causes the complete match to fail (like backtracking past a <commit>). This is because there's now no preceding text to backtrack into.
- This is useful for throwing away successfully processed input when matching from an input stream or an iterator of arbitrary length.
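Two of the ideas above can be imitated in Python as a rough analogy, not as Perl 6. The first pattern pair shows the effect of refusing to retry a preceding atom, using the common lookahead-plus-backreference emulation of an atomic group. The second shows the spirit of <cut>: text that has been processed is dropped from the buffer, so nothing remains to backtrack into. The names BACKTRACKING, NO_RETRY, and drain_numbers are hypothetical and exist only for this sketch.

```python
import re

# Ordinary greedy match: a+ gives back one "a" so that "ab" can still match.
BACKTRACKING = re.compile(r"a+ab")

# Atomic emulation: the lookahead captures the longest run of "a" once, and
# \1 consumes exactly that text, so the engine cannot retry with fewer "a"s.
NO_RETRY = re.compile(r"(?=(a+))\1ab")

# A complete token for this sketch: optional spaces, digits, then whitespace.
TOKEN = re.compile(r"\s*(\d+)\s+")


def drain_numbers(chunks):
    """Yield integers from an iterable of text chunks, discarding matched text."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            m = TOKEN.match(buffer)
            if m is None:
                break                    # wait for more input
            yield int(m.group(1))
            buffer = buffer[m.end():]    # the "cut": processed text is gone
```

With the input "aaab", BACKTRACKING matches while NO_RETRY does not, which mirrors how a colon stops the engine from trying shorter alternatives for the atom before it. The buffer in drain_numbers never grows beyond the unprocessed tail, which is the property that makes this style suitable for streams of arbitrary length.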


