Synopsis 5
by Allison Randal, Damian Conway

Editor's note: this document is out of date and remains here for historic interest. See Synopsis 5 for the current design information.

Changed metacharacters

  • A dot . now matches any character including newline. (The /s modifier is gone.)

  • ^ and $ now always match the start/end of a string, like the old \A and \z. (The /m modifier is gone.)

  • A $ no longer matches before an optional trailing \n, so it's necessary to say \n?$ if that's what you mean.

  • \n now matches a logical (platform-independent) newline, not just \012.

  • The \A, \Z, and \z metacharacters are gone.
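For readers who know other engines, the changes above have rough counterparts elsewhere. The following is a minimal sketch in Python's re module, offered only as an analogy and not as Perl 6 code: re.DOTALL stands in for the new dot, and Python's \Z stands in for the new $. The sample string and variable names are assumptions made for illustration.

```python
import re

# Illustrative input (an assumption, not from the synopsis).
text = "first\nsecond\n"

# New-style dot matches newline too. Python needs re.DOTALL for that.
dot_spans_lines = re.search(r"first.second", text, re.DOTALL) is not None

# New-style $ means "end of string only". Python spells that \Z.
strict_end = re.search(r"second\Z", text) is not None

# Writing the newline explicitly restores the old tolerance (the \n?$ form).
explicit_newline = re.search(r"second\n?\Z", text) is not None

# Old-style $ (still Python's default $) accepts one trailing newline.
old_style_end = re.search(r"second$", text) is not None

print(dot_spans_lines, strict_end, explicit_newline, old_style_end)
```

Here strict_end is false because the final newline sits between "second" and the end of the string, which is the case the \n?$ idiom is meant to cover.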


New metacharacters

  • Because /x is default:
    • # now always introduces a comment.

    • Whitespace is now always metasyntactic, i.e. used only for layout and not matched literally (but see the :w modifier described above).

  • ^^ and $$ match line beginnings and endings. (The /m modifier is gone.)

  • . matches "anything", while \N matches "anything except newline". (The /s modifier is gone.)
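As a loose analogy for the metacharacters above, the sketch below uses Python's re module rather than Perl 6. It assumes re.MULTILINE ^ and $ as stand-ins for ^^ and $$, the class [^\n] as a stand-in for \N, and re.VERBOSE as a stand-in for the always-on /x behaviour. The sample text and names are illustrative only.

```python
import re

# Illustrative input (an assumption, not from the synopsis).
text = "alpha 1\nbeta 2\n"

# ^^ and $$ (line anchors) correspond roughly to ^ and $ under re.MULTILINE.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
line_ends = re.findall(r"\d+$", text, re.MULTILINE)

# \N ("anything except newline") corresponds roughly to [^\n].
first_line = re.match(r"[^\n]*", text).group()

# With /x as the default, whitespace is layout and # starts a comment.
# Python only behaves this way when re.VERBOSE is requested.
word_then_number = re.compile(
    r"""
    (\w+)   # a word
    \s+     # whitespace has to be asked for explicitly
    (\d+)   # a number
    """,
    re.VERBOSE,
)
pair = word_then_number.search(text).groups()

print(line_starts, line_ends, first_line, pair)
```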


Bracket rationalization

  • (...) still delimits a capturing group.

  • [...] is no longer a character class.
    • It now delimits a non-capturing group.

  • {...} is no longer a repetition quantifier.
    • It now delimits an embedded closure.

    • You can call Perl code as part of a regex match.

    • Embedded code does not usually affect the match - it is only used for side-effects:
            / (\S+) { print "string not blank\n"; $text = $1; }
               \s+  { print "but does contain whitespace\n" }
            /

    • It can affect the match if it calls fail:
            / (\d+) {$1<256 or fail} /

  • <...> are now extensible metasyntax delimiters or "assertions" (i.e. they replace (?...)).
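Python's re module has no embedded closures, so the effect of a closure that calls fail can only be imitated from outside the pattern. The sketch below is such an imitation, written on the assumption that fail causes ordinary backtracking (the greedy \d+ gives back one digit at a time until the check passes). The function name first_number_below and its limit parameter are hypothetical. The (?:...) line shows Python's spelling of the non-capturing group that [...] now provides.

```python
import re

# [...] as a non-capturing group corresponds to Python's (?:...).
only_capture = re.match(r"(?:ab)+(c)", "ababc").groups()

def first_number_below(text, limit=256):
    # Imitates / (\d+) {$1 < limit or fail} / by backtracking by hand.
    for start in range(len(text)):
        run = re.match(r"\d+", text[start:])
        if run is None:
            continue  # no digits here, try the next start position
        digits = run.group()
        # Try the greedy match first, then give back one digit at a time.
        for end in range(len(digits), 0, -1):
            candidate = digits[:end]
            if int(candidate) < limit:
                return candidate
    return None

print(only_capture)
print(first_number_below("x 42 y"))
print(first_number_below("id 300"))
print(first_number_below("no digits"))
```

Under the backtracking assumption, "id 300" yields "30" rather than no match, because 300 fails the check and the shorter prefix 30 passes it.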


Variable (non-)interpolation

  • In Perl 6 regexes, variables don't interpolate.

  • Instead they're passed "raw" to the regex engine, which can then decide how to handle them (more on that below).

  • The default way in which the engine handles a scalar is to match it as a \Q[...] literal (i.e. it does not treat the interpolated string as a subpattern).

  • In other words, a Perl 6: / $var /

    is like a Perl 5: / \Q$var\E /


  • (To get regex interpolation, use an assertion; see below.)

  • An interpolated array:
        / @cmds /

    is matched as if it were an alternation of its elements:

        / [ @cmds[0] | @cmds[1] | @cmds[2] | ... ] /

  • And, of course, each one is matched as a literal.

  • An interpolated hash matches a /\w+/ and then requires that sequence to be a valid key of the hash.

  • So:
        / %cmds /

    is like:

        / (\w+) { fail unless exists %cmds{$1} } /
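The three default behaviours above (scalar as literal, array as alternation of literals, hash as key lookup) can be approximated in Python's re module. This is a sketch by analogy, not Perl 6: the helper names scalar_pattern, array_pattern, and match_hash_key are hypothetical, and the hash helper is a simplification that checks whole \w+ runs only and does not imitate backtracking into shorter prefixes.

```python
import re

def scalar_pattern(value):
    # A scalar is matched as a literal, much like Perl 5's \Q$var\E.
    return re.escape(value)

def array_pattern(items):
    # An array is matched as an alternation of its elements, each a literal.
    return "(?:" + "|".join(re.escape(item) for item in items) + ")"

def match_hash_key(table, text):
    # A hash matches a \w+ run and requires it to be a key of the hash.
    # Simplification: only whole runs are tested, with no backtracking.
    for run in re.finditer(r"\w+", text):
        if run.group() in table:
            return run.group()
    return None

# Illustrative data (assumptions, not from the synopsis).
literal_dot = re.search(scalar_pattern("a.b"), "axb")
cmds = ["ls", "rm -r"]
alternation = re.fullmatch(array_pattern(cmds), "rm -r")
key = match_hash_key({"start": 1, "stop": 2}, "please stop now")

print(literal_dot, alternation is not None, key)
```

Because the scalar is escaped, the dot in "a.b" matches only a literal dot, so "axb" does not match.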


Extensible metasyntax (<...>)

  • The first character after < determines the behaviour of the assertion.

  • A leading alphabetic character means it's a grammatical assertion (i.e. a subpattern or a named character class - see below):
        / <sign>? <mantissa> <exponent>? /

  • The special named assertions include:
        / <before pattern> /    # was /(?=pattern)/
        / <after pattern> /     # was /(?<=pattern)/
                                # but now a real pattern!
        / <ws> /                # match any whitespace
        / <sp> /                # match a space char

  • A leading number, pair of numbers, or pair of scalars means it's a repetition specifier:
        / value was (\d<1,6>) with (\w<$m,$n>) /

  • A leading $, @, %, or & interpolates a variable or subroutine return value as a regex rather than as a literal:
        / <$key_pat> = <@value_alternatives> /

  • A leading ( indicates a code assertion:
        / (\d<1,3>) <( $1 < 256 )> /

  • Same as:
        / (\d<1,3>) {$1<256 or fail} /

  • A leading { indicates code that produces a regex to be interpolated into the pattern at that point:
        / (<ident>)  <{ %cache{$1} //= get_body($1) }> /

  • A leading [ indicates an enumerated character class:
        / <[a-z_]>* /

  • A leading - indicates a complemented character class:
        / <-[a-z_]> <-<alpha>> /

  • A leading ' indicates an interpolated literal match (including whitespace):
        / <'match this exactly (whitespace matters)'> /

  • The special assertion <.> matches any logical grapheme (including a Unicode combining character sequence):
        / seekto = <.> /  # Maybe a combined char

  • A leading ! indicates a negated meaning (a zero-width assertion except for repetition specifiers):
        / <!before _>    # We aren't before an _
          \w<!1,3>       # We match 0 or >3 word chars
        /
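Several of the assertions listed above have close relatives in older regex dialects. The sketch below shows them in Python's re module as an analogy only. The pieces SIGN, MANTISSA, and EXPONENT are hypothetical stand-ins for the <sign>, <mantissa>, and <exponent> rules, whose real definitions are not given here. Unlike <after pattern>, Python's lookbehind accepts only fixed-width patterns.

```python
import re

# Named subpatterns: hypothetical stand-ins composed by string interpolation.
SIGN = r"[+-]"
MANTISSA = r"\d+(?:\.\d+)?"
EXPONENT = r"[eE][+-]?\d+"
NUMBER = re.compile(f"(?:{SIGN})?{MANTISSA}(?:{EXPONENT})?")
whole_number = NUMBER.fullmatch("-1.5e10") is not None

# <before pattern> corresponds to (?=...), <!before pattern> to (?!...).
word_before_colon = re.search(r"\w+(?=:)", "key: value").group()
not_before_underscore = re.search(r"(?!_)\w", "_a").group()

# <after pattern> corresponds to (?<=...), fixed-width only in Python.
digits_after_dollar = re.search(r"(?<=\$)\d+", "cost $42").group()

# \d<1,6> corresponds to \d{1,6}.
up_to_six = re.match(r"\d{1,6}", "12345678").group()

# <[a-z_]> corresponds to [a-z_], and <-[a-z_]> to [^a-z_].
identifier = re.match(r"[a-z_]*", "snake_case!").group()
first_other = re.search(r"[^a-z_]", "snake_case!").group()

print(whole_number, word_before_colon, not_before_underscore)
print(digits_after_dollar, up_to_six, identifier, first_other)
```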
