Synopsis 5
by Allison Randal, Damian Conway
Editor's note: this document is out of date and remains here for historic interest. See Synopsis 5 for the current design information.
Changed metacharacters
- A dot . now matches any character including newline. (The /s modifier is gone.)
- ^ and $ now always match the start/end of a string, like the old \A and \z. (The /m modifier is gone.)
- A $ no longer matches an optional preceding \n so it's necessary to say \n?$ if that's what you mean.
- \n now matches a logical (platform independent) newline not just \012.
- The \A, \Z, and \z metacharacters are gone.
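The anchor change can be sketched by analogy in Python's re module, whose $ still behaves like the Perl 5 one. This is not Perl 6 syntax: the function names are hypothetical, and Python's \Z (strict end of string) is used here as a stand-in for the new meaning of $.

```python
import re

def old_dollar_matches(text: str) -> bool:
    # Perl 5-style $: end of string, or just before a final newline.
    return re.search(r"end$", text) is not None

def new_dollar_matches(text: str) -> bool:
    # Strict end of string (Python's \Z), standing in for the new $.
    return re.search(r"end\Z", text) is not None

def newline_then_end_matches(text: str) -> bool:
    # Spelling out the optional newline, analogous to \n?$ in the new syntax.
    return re.search(r"end\n?\Z", text) is not None
```

With the input "the end\n", the first and third functions report a match and the second does not, which is the difference the list above describes.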
New metacharacters
- Because /x is default:
  - # now always introduces a comment.
  - Whitespace is now always metasyntactic, i.e. used only for layout and not matched literally (but see the :w modifier described above).
- ^^ and $$ match line beginnings and endings. (The /m modifier is gone.)
- . matches an "anything", while \N matches an "anything except newline". (The /s modifier is gone.)
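As a rough analogy only, the same distinctions can be expressed in Python's re module, where they are selected by flags rather than by separate metacharacters. The function names are hypothetical, and the mapping (MULTILINE ^ and $ for ^^ and $$, DOTALL . for the new dot, [^\n] for \N) is an assumption made for illustration.

```python
import re

def line_starts(text: str) -> list:
    # MULTILINE ^ anchors at every line beginning, like the new ^^.
    return re.findall(r"(?m)^\w", text)

def line_ends(text: str) -> list:
    # MULTILINE $ anchors at every line ending, like the new $$.
    return re.findall(r"(?m)\w$", text)

def any_run(text: str) -> str:
    # DOTALL . crosses newlines, like the new dot.
    return re.match(r"(?s).+", text).group(0)

def non_newline_run(text: str) -> str:
    # [^\n] stops at the first newline, like the new \N.
    return re.match(r"[^\n]+", text).group(0)
```

For the three-line string "one\ntwo\nsix", the first two functions return one character per line, any_run returns the whole string, and non_newline_run returns only "one".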
Bracket rationalization
- (...) still delimits a capturing group.
- [...] is no longer a character class.
  - It now delimits a non-capturing group.
- {...} is no longer a repetition quantifier.
  - It now delimits an embedded closure.
  - You can call Perl code as part of a regex match.
  - Embedded code does not usually affect the match - it is only used for side-effects:
        / (\S+) { print "string not blank\n"; $text = $1; }
          \s+   { print "but does contain whitespace\n" }
        /
  - It can affect the match if it calls fail:
        / (\d+) {$1<256 or fail} /
- <...> are now extensible metasyntax delimiters or "assertions" (i.e. they replace (?...)).
Variable (non-)interpolation
- In Perl 6 regexes, variables don't interpolate.
- Instead they're passed ``raw'' to the regex engine, which can then decide how to handle them (more on that below).
- The default way in which the engine handles a scalar is to match it as a \Q[...] literal (i.e. it does not treat the interpolated string as a subpattern).
- In other words, a Perl 6:
      / $var /
  is like a Perl 5:
      / \Q$var\E /
- (To get regex interpolation use an assertion - see below)
- An interpolated array:
      / @cmds /
  is matched as if it were an alternation of its elements:
      / [ @cmds[0] | @cmds[1] | @cmds[2] | ... ] /
- And, of course, each one is matched as a literal.
- An interpolated hash matches a /\w+/ and then requires that sequence to be a valid key of the hash.
- So:
      / %cmds /
  is like:
      / (\w+) { fail unless exists %cmds{$1} } /
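The three interpolation rules above can be approximated in Python, purely as an illustration of the semantics rather than of Perl 6 itself. The helper names are hypothetical, re.escape stands in for \Q...\E, and the loop in match_key assumes that a failed key lookup makes the engine backtrack to a shorter \w+ match.

```python
import re
from typing import Optional

def literal(var: str) -> str:
    # A scalar is matched as a literal, as Perl 5's \Q$var\E would do.
    return re.escape(var)

def alternation(items: list) -> str:
    # An array behaves like a non-capturing alternation of literal elements.
    return "(?:" + "|".join(re.escape(item) for item in items) + ")"

def match_key(table: dict, text: str) -> Optional[str]:
    # A hash matches \w+ and then fails unless that word is a key.
    m = re.match(r"\w+", text)
    if m is None:
        return None
    word = m.group(0)
    # Try the longest candidate first, then shorter ones (backtracking).
    for end in range(len(word), 0, -1):
        if word[:end] in table:
            return word[:end]
    return None
```

For example, literal("a.c") yields a pattern that matches "a.c" but not "abc", because the dot is no longer treated as a metacharacter.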
Extensible metasyntax (<...>)
- The first character after < determines the behaviour of the assertion.
- A leading alphabetic character means it's a grammatical assertion (i.e. a subpattern or a named character class - see below):
      / <sign>? <mantissa> <exponent>? /
- The special named assertions include:
      / <before pattern> /    # was /(?=pattern)/
      / <after pattern> /     # was /(?<=pattern)/
                              # but now a real pattern!
      / <ws> /                # match any whitespace
      / <sp> /                # match a space char
- A leading number, pair of numbers, or pair of scalars means it's a repetition specifier:
      / value was (\d<1,6>) with (\w<$m,$n>) /
- A leading $, @, %, or & interpolates a variable or subroutine return value as a regex rather than as a literal:
      / <$key_pat> = <@value_alternatives> /
- A leading ( indicates a code assertion:
      / (\d<1,3>) <( $1 < 256 )> /
- Same as:
      / (\d<1,3>) {$1<256 or fail} /
- A leading { indicates code that produces a regex to be interpolated into the pattern at that point:
      / (<ident>) <{ %cache{$1} //= get_body($1) }> /
- A leading [ indicates an enumerated character class:
      / <[a-z_]>* /
- A leading - indicates a complemented character class:
      / <-[a-z_]> <-<alpha>> /
- A leading ' indicates an interpolated literal match (including whitespace):
      / <'match this exactly (whitespace matters)'> /
- The special assertion <.> matches any logical grapheme (including Unicode combining character sequences):
      / seekto = <.> /    # Maybe a combined char
- A leading ! indicates a negated meaning (a zero-width assertion except for repetition specifiers):
      / <!before _>    # We aren't before an _
        \w<!1,3>       # We match 0 or >3 word chars
      /
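The code assertion example, / (\d<1,3>) <( $1 < 256 )> /, can be mimicked in Python as a minimal sketch. The function name match_octet is hypothetical, \d{1,3} plays the role of the \d<1,3> repetition specifier, and the loop assumes the engine backtracks into the quantifier when the assertion fails.

```python
import re
from typing import Optional

def match_octet(text: str) -> Optional[str]:
    # \d{1,3} stands in for \d<1,3>: one to three digits, greedy.
    m = re.match(r"\d{1,3}", text)
    if m is None:
        return None
    digits = m.group(0)
    # Emulate backtracking: retry with shorter captures until the
    # condition from <( $1 < 256 )> holds.
    for end in range(len(digits), 0, -1):
        if int(digits[:end]) < 256:
            return digits[:end]
    return None
```

Under these assumptions "255.0" yields "255", while "999" falls back to the shorter capture "99", since 999 fails the condition but 99 passes it.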


