Synopsis 5
Editor's note: this document is out of date and remains here for historic interest. See Synopsis 5 for the current design information.
/x) is no longer required...it's the default.
/s or /m modifiers (changes to the meta-characters replace them - see below).
/e evaluation modifier on substitutions;
use s/pattern/$( code() )/ instead.
/g modifier has been renamed to e (for each).
@matches = m:ei/\s* (\w*) \s* ,?/;
:i :ignorecase
:e :each
:c (or :cont) modifier causes the match to continue from the string's current .pos:
m:c/ pattern / # start at end of
# previous match on $_
:o (:once) modifier replaces the Perl 5 ?...? syntax:
m:once/ pattern / # only matches first time
:w (:word) modifier causes whitespace sequences to be replaced by a \s* or \s+ subpattern:
m:w/ next cmd = <condition>/
m/ \s* next \s+ cmd \s* = \s* <condition>/
:uN modifier specifies Unicode level:
m:u0/ .<2> / # match two bytes
m:u1/ .<2> / # match two codepoints
m:u2/ .<2> / # match two graphemes
m:u3/ .<2> / # match language dependently
:p5 modifier allows Perl 5 regex syntax to be used instead:
m:p5/(?mi)^[a-z]{1,2}(?=\s)/
x, it means repetition:
s:4x{ (<ident>) = (\N+) $$}{$1 => $2};
# same as:
s{ (<ident>) = (\N+) $$}{$1 => $2} for 1..4;
st, nd, rd, or th, it means find the Nth occurrence:
s:3rd/(\d+)/@data[$1]/;
# same as:
m/(\d+)/ && m:c/(\d+)/ && s:c/(\d+)/@data[$1]/;
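As a rough analogue of the counted modifiers above, here is a minimal Python sketch (not the Perl 6 mechanism itself). The helper name sub_nth and the sample strings are assumptions made for illustration; re.sub with a count argument stands in for the Nx form, and a counting callback stands in for the Nth form.

```python
import re

def sub_nth(pattern, repl, text, n):
    """Replace only the n-th (1-based) non-overlapping match of pattern."""
    seen = 0

    def choose(match):
        nonlocal seen
        seen += 1
        # Expand the replacement template for the n-th match only;
        # every other match is put back unchanged.
        return match.expand(repl) if seen == n else match.group(0)

    return re.sub(pattern, choose, text)

# Analogue of s:3rd/.../.../ : only the third number is rewritten.
third_only = sub_nth(r"(\d+)", r"<\1>", "a 1 b 2 c 3 d", 3)

# Analogue of the Nx form: stop after at most `count` substitutions.
first_only = re.sub(r"(\w+) = (\d+)", r"\1 => \2", "a = 1, b = 2", count=1)

print(third_only)
print(first_only)
```

If the text holds fewer than n matches, sub_nth returns it unchanged.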
:any modifier, the regex will match every possible way (including overlapping) and return all matches.
$str = "abracadabra";
@substrings = $str =~ m:any/ a (.*) a /;
# br brac bracad bracadabr c cad cadabr d dabr br
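The list in the comment above can be reproduced with a brute-force Python sketch that tries the pattern against every substring. This is only an illustration of what "every possible way" means for this example; the function name match_any is an assumption, and a real engine would not enumerate substrings like this.

```python
import re

def match_any(pattern, text):
    """Collect group 1 for every substring of text that the pattern matches in full."""
    compiled = re.compile(pattern)
    found = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            m = compiled.fullmatch(text[start:end])
            if m:
                found.append(m.group(1))
    return found

substrings = match_any(r"a(.*)a", "abracadabra")
print(" ".join(substrings))
```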
:i, :w, :c, :uN, and :p5 modifiers can be placed inside the regex (and are lexically scoped):
m/:c alignment = [:i left|right|cent[er|re]] /
m:fuzzy/pattern/;
s:ewi/cat/feline/
m:fuzzy('bare')/pattern/;
by Allison Randal, Damian Conway
. now matches any character including newline. (The /s modifier is gone.)
^ and $ now always match the start/end of a string, like the old \A and \z. (The /m modifier is gone.)
$ no longer matches an optional preceding \n so it's necessary to say \n?$ if that's what you mean.
\n now matches a logical (platform independent) newline not just \012.
\A, \Z, and \z metacharacters are gone.
/x is default:
# now always introduces a comment.
^^ and $$ match line beginnings and endings. (The /m modifier is gone.)
. matches an "anything", while \N matches an "anything except newline". (The /s modifier is gone.)
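For readers who know another regex dialect, the anchor and dot changes above map roughly onto existing Python features. The sketch below is an analogy only: \A and \Z play the part of the new ^ and $, re.MULTILINE gives ^ and $ the behaviour of the new ^^ and $$, re.DOTALL gives . its new meaning, and [^\n] stands in for \N. The sample text is made up.

```python
import re

text = "first line\nsecond line\n"

# String anchors: match only at the very start or very end.
starts_with_second = re.search(r"\Asecond", text) is not None
ends_with_line = re.search(r"line\Z", text) is not None
# The \n?$ idiom: allow one optional newline before the end explicitly.
ends_with_line_nl = re.search(r"line\n?\Z", text) is not None

# Line anchors: match at the beginning and end of every line.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

# A dot that also matches newlines, and an "anything except newline" class.
spans_lines = re.search(r"first.*second", text, flags=re.DOTALL) is not None
no_newlines = re.findall(r"[^\n]+", text)

print(starts_with_second, ends_with_line, ends_with_line_nl)
print(line_starts, line_ends, spans_lines, no_newlines)
```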
(...) still delimits a capturing group.
[...] is no longer a character class.
{...} is no longer a repetition quantifier.
/ (\S+) { print "string not blank\n"; $text = $1; }
\s+ { print "but does contain whitespace\n" }
/
fail:
/ (\d+) {$1<256 or fail} /
<...> are now extensible metasyntax delimiters or "assertions" (i.e. they replace (?...)).
is like a Perl 5: / \Q$var\E /
/ @cmds /
is matched as if it were an alternation of its elements:
/ [ @cmds[0] | @cmds[1] | @cmds[2] | ... ] /
/\w+/ and then requires that sequence to be a valid key of the hash.
/ %cmds /
/ (\w+) { fail unless exists %cmds{$1} } /
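The array and hash behaviour described above can be approximated in Python as follows. This is a sketch under assumptions: the names cmds, cmds_re and match_hash_key are invented for illustration, array elements are treated as literals (hence re.escape), and the backtracking of the hash form is imitated by trying shorter and shorter word prefixes.

```python
import re

cmds = ["get", "set", "del.ete"]   # hypothetical command list

# Array form: an alternation of the elements, each taken literally.
alternation = "|".join(re.escape(c) for c in cmds)
cmds_re = re.compile(f"(?:{alternation})")

def match_hash_key(text, table):
    """Find the first run of word characters that is a key of table.

    Mirrors / (\\w+) { fail unless exists %cmds{$1} } /: on failure the
    run is shortened, then the next start position is tried.
    """
    word_run = re.compile(r"\w+")
    for start in range(len(text)):
        run = word_run.match(text, start)
        if run is None:
            continue
        for end in range(run.end(), start, -1):
            if text[start:end] in table:
                return text[start:end]
    return None

print(bool(cmds_re.fullmatch("del.ete")), bool(cmds_re.fullmatch("delxete")))
print(match_hash_key("please getx now", {"get": 1, "set": 2}))
```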
<...>)< determines the behaviour of the assertion.
/ <sign>? <mantissa> <exponent>? /
/ <before pattern> / # was /(?=pattern)/
/ <after pattern> / # was /(?<=pattern)/
# but now a real pattern!
/ <ws> / # match any whitespace
/ <sp> / # match a space char
/ value was (\d<1,6>) with (\w<$m,$n>) /
$, @, %, or & interpolates a variable or subroutine return value as a regex rather than as a literal:
/ <$key_pat> = <@value_alternatives> /
( indicates a code assertion:
/ (\d<1,3>) <( $1 < 256 )> /
/ (\d<1,3>) {$1<256 or fail} /
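To make the effect of a code assertion concrete, here is a small Python sketch of the two equivalent patterns above. Python's re has no embedded code assertions, so the test is applied by hand; the function name first_small_number is an assumption, and the loop over lengths 3, 2, 1 imitates the engine backtracking from the greedy match to shorter ones.

```python
def first_small_number(text):
    """Return the first run of 1 to 3 ASCII digits whose value is below 256."""
    for start in range(len(text)):
        for length in (3, 2, 1):            # greedy first, then backtrack
            chunk = text[start:start + length]
            if len(chunk) != length:
                continue                     # ran off the end of the text
            if chunk.isascii() and chunk.isdigit() and int(chunk) < 256:
                return chunk                 # the assertion held
    return None

print(first_small_number("id 999"))
print(first_small_number("port 80"))
```

Note that a failed assertion does not abandon the match: in "id 999" the three-digit capture fails the test, so the two-digit capture "99" is tried and accepted.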
{ indicates code that produces a regex to be interpolated into the pattern at that point:
/ (<ident>) <{ cache{$1} //= get_body($1) }> /
[ indicates an enumerated character class:
/ <[a-z_]>* /
- indicates a complemented character class:
/ <-[a-z_]> <-<alpha>> /
' indicates an interpolated literal match (including whitespace):
/ <'match this exactly (whitespace matters)'> /
<.> matches any logical grapheme (including Unicode combining character sequences):
/ seekto = <.> / # Maybe a combined char
! indicates a negated meaning (a zero-width assertion except for repetition specifiers):
/ <!before _> # We aren't before an _
\w<!1,3> # We match 0 or >3 word chars
/
\p and \P properties become intrinsic grammar rules (<prop ...> and <!prop ...>).
\L...\E, \U...\E, and \Q...\E sequences become \L[...], \U[...], and \Q[...] (\E is gone).
\Q[...] will rarely be needed since raw variables interpolate as eq matches, rather than regexes.
\1) are gone; $1 can be used instead, because it's no longer interpolated.
\h and \v match horizontal and vertical whitespace respectively, including Unicode.
\s now matches any Unicode whitespace character.
\N matches anything except a logical newline; it is the negation of \n.
\H matches anything but horizontal whitespace.
\V matches anything but vertical whitespace.
\T matches anything but a tab.
\R matches anything but a return.
\F matches anything but a formfeed.
\E matches anything but an escape.
\X... matches anything but the specified hex character.
qr/pattern/ regex constructor is gone.
rule { pattern } # always takes {...} as delimiters
rx/ pattern / # can take (almost any) chars as delimiters
$regex = rule :ewi { my name is (.*) };
$regex = rx:ewi/ my name is (.*) /;
qr because it's no longer an interpolating quote-like operator.
sub {...} constructor.
{...} is now always a closure (which may still execute immediately in certain contexts and be passed as a reference in others)...
/.../ is now always a regex (which may still match immediately in certain contexts and be passed as a reference in others).
/.../ matches immediately in a void or Boolean context, or when it is an explicit argument of a =~.
$var = /pattern/;
no longer does the match and sets $var to the result.
$var.
m{...} or rx{...}:
$var = m{pattern}; # Match regex, assign result
$var = rx{pattern}; # Assign regex itself
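The distinction between assigning a match result and assigning the regex itself has a close counterpart in Python, sketched here as an analogy. The variable names and sample text are assumptions.

```python
import re

text = "pattern matching"

rx = re.compile(r"pat+ern")   # like rx{...}: a regex object, nothing matched yet
result = rx.search(text)      # like m{...}: the match is performed now

print(isinstance(rx, re.Pattern))
print(result.group(0))
```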
@list = split /pattern/, $str;
are now just consequences of the normal semantics.
sub my_grep($selector, *@list) {
    given $selector {
        when RULE { ... }
        when CODE { ... }
        when HASH { ... }
        # etc.
    }
}
{...} or /.../ in the scalar context of the first argument causes it to produce a CODE or RULE reference, which the switch statement then selects upon.
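A loose Python counterpart of my_grep is sketched below, assuming that a compiled regex, a dictionary and a callable stand in for RULE, HASH and CODE. The behaviour chosen for each branch is an illustration, since the original leaves the bodies elided.

```python
import re

def my_grep(selector, *items):
    """Filter items according to the kind of selector that was passed."""
    if isinstance(selector, re.Pattern):      # RULE: keep items the regex matches
        return [x for x in items if selector.search(x)]
    if isinstance(selector, dict):            # HASH: keep items that are keys
        return [x for x in items if x in selector]
    if callable(selector):                    # CODE: keep items the predicate accepts
        return [x for x in items if selector(x)]
    raise TypeError(f"unsupported selector: {type(selector).__name__}")

print(my_grep(re.compile(r"^a"), "apple", "bob", "avocado"))
print(my_grep({"bob": 1}, "apple", "bob"))
print(my_grep(str.isupper, "A", "b"))
```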
m:w/ \( <expr> [ , <expr> ]* : \) /
(i.e. there's no point trying fewer <expr> matches, if there's no closing parenthesis on the horizon)
m:w/ [ if :: <expr> <block>
| for :: <list> <block>
| loop :: <loop_controls>? <block>
]
/
(i.e. there's no point trying to match a different keyword if one was already found but failed)
rule ident {
( [<alpha>|_] \w* ) ::: { fail if %reserved{$1} }
| " [<alpha>|_] \w* "
}
m:w/ get <ident>? /
(i.e. using an unquoted reserved word as an identifier is not permitted)
<commit> assertion causes the entire match to fail outright, no matter how many subrules down it happens:
rule subname {
([<alpha>|_] \w*) <commit> { fail if %reserved{$1} }
}
m:w/ sub <subname>? <block> /
(i.e. using a reserved word as a subroutine name is instantly fatal to the "surrounding" match as well)
sub and rule extends much further.
rule ident { [<alpha>|_] \w* }
# and later...
@ids = grep /<ident>/, @strings;
rule serial_number { <[A-Z]> \d<8> }
rule type { alpha | beta | production | deprecated | legacy }
in other regexes as named assertions:
rule identification { [soft|hard]ware <type> <serial_number> }
/<prior>/
/<null>/
let and then bound to the desired value:
/ (\d+) {let $num := $1} (<alpha>+)/
$num will only be bound if the digits are actually found.
/ [ (\d+) { let $num := $1 }
| (<alpha>+) { let $alpha := $2 }
| (.) { let $other := $3 }
]
/
/ [ $num := (\d+)
| $alpha:= (<alpha>+)
| $other:=(.)
]
/
$1, $2, etc.) are also "hypothetical".
my ($key, $val) = m:w{ $1:=(\w+) =\> $2:=(.*?)
| $2:=(.*?) \<= $1:=(\w+)
};
/ @values:=[ (.*?) , ]* /
/ %options:=[ (<ident>) = (\N+) ]* /
/ %options:=[ (<ident>) = \N+ ]* /
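Capturing repeated groups straight into an array or a hash has a rough Python equivalent in re.findall, sketched here. The sample inputs are made up, \w+ stands in for the ident rule, and [^\n]+ stands in for \N+.

```python
import re

# Analogue of / @values:=[ (.*?) , ]* / : one list element per repetition.
values = re.findall(r"(.*?),", "a,b,c,")

# Analogue of / %options:=[ (<ident>) = (\N+) ]* / : first capture is the key,
# second capture is the value.
config = "name = Ann\ncolor = deep blue\n"
options = dict(re.findall(r"(\w+) *= *([^\n]+)", config))

print(values)
print(options)
```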
<rule>) also capture their result in a hypothetical variable of the same name as the rule:
/ <key> =\> <value> { %hash{$key} = $value } /
$0
if /pattern/ {...}
# or:
/pattern/; if $0 {...}
$match_count += m:e/pattern/;
print %hash{$text =~ /,? (<ident>)/};
# or:
$text =~ /,? (<ident>)/ && print %hash{$0};
$0 acts like a hypothetical variable.
$$ does in yacc)
$0:= to override the default return behaviour described above:
rule string1 { (<["'`]>) ([ \\. | <-[\\]> ]*?) $1 }
$match = m/<string1>/; # default: $match includes
# opening and closing quotes
rule string2 { (<["'`]>) $0:=([ \\. | <-[\\]> ]*?) $1 }
$match = m/<string2>/; # $match now excludes quotes
# because $0 explicitly bound
# to second capture only
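The difference between string1 and string2 can be imitated in Python by choosing which group to return. In this sketch the named group body plays the part of the capture bound to $0; the pattern name STRING and the sample text are assumptions.

```python
import re

# Opening quote, lazily matched body (escapes allowed), same closing quote.
STRING = re.compile(r"""(["'`])(?P<body>(?:\\.|[^\\])*?)\1""")

m = STRING.search(r'say "hi \"there\"" now')

with_quotes = m.group(0)          # like string1: the whole match
without_quotes = m.group("body")  # like string2: only the bound capture

print(with_quotes)
print(without_quotes)
```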
my @array := <$fh>; # lazy when aliased
my $array is from(\@array); # tie scalar
# and later...
$array =~ m/pattern/; # match from stream
ident regex shouldn't clobber someone else's ident regex.
class Identity {
    method name { "Name = $.name" }
    method age { "Age = $.age" }
    method addr { "Addr = $.addr" }
    method desc {
        print .name(), "\n",
              .age(), "\n",
              .addr(), "\n";
    }
    # etc.
}
grammar Identity {
    rule name :w { Name = (\N+) }
    rule age :w { Age = (\d+) }
    rule addr :w { Addr = (\N+) }
    rule desc {
        <name> \n
        <age> \n
        <addr> \n
    }
    # etc.
}
grammar Letter {
    rule text { <greet> <body> <close> }
    rule greet :w { [Hi|Hey|Yo] $to:=(\S+?) , $$}
    rule body { <line>+ }
    rule close :w { Later dude, $from:=(.+) }
    # etc.
}
grammar FormalLetter is Letter {
    rule greet :w { Dear $to:=(\S+?) , $$}
    rule close :w { Yours sincerely, $from:=(.+) }
}
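Grammar inheritance can be pictured with ordinary class inheritance. The Python sketch below is an analogy, not a grammar engine: rules are plain regex strings held as class attributes, the parse method and the group name sender are assumptions, and the body of the letter is simply skipped.

```python
import re

class Letter:
    greet = r"(?:Hi|Hey|Yo)\s+(?P<to>\S+?)\s*,\s*$"
    close = r"Later\s+dude\s*,\s*(?P<sender>.+)"

    @classmethod
    def parse(cls, text):
        """Match the first line against greet and the last line against close."""
        first, *_, last = text.strip().splitlines()   # needs at least two lines
        g = re.match(cls.greet, first)
        c = re.match(cls.close, last)
        if g and c:
            return {"to": g["to"], "from": c["sender"]}
        return None

class FormalLetter(Letter):
    # Only the overridden rules change; parse is inherited as-is.
    greet = r"Dear\s+(?P<to>\S+?)\s*,\s*$"
    close = r"Yours\s+sincerely\s*,\s*(?P<sender>.+)"

casual = "Hi Bob,\nlunch?\nLater dude, Al"
formal = "Dear Bob,\nPlease find attached.\nYours sincerely, Al"

print(Letter.parse(casual))
print(FormalLetter.parse(formal))
print(Letter.parse(formal))
```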
grammar Perl { # Perl's own grammar
    rule prog { <line>* }
    rule line { <decl>
              | <loop>
              | <label> [<cond>|<sideff>|;]
              }
    rule decl { <sub> | <class> | <use> }
    # etc. etc.
}
given $source_code {
    $parsetree = m/<Perl::prog>/;
}
PAIR:
$str =~ tr( 'A-C' => 'a-c' );
$str =~ tr( {'A'=>'a', 'B'=>'b', 'C'=>'c'} );
$str =~ tr( {'A-Z'=>'a-z', '0-9'=>'A-F'} );
$str =~ tr( %mapping );
$str =~ tr( ['A'..'C'], ['a'..'c'] );
$str =~ tr( @UPPER, @lower );
$str =~ tr( [' ', '<', '>', '&' ],
['&nbsp;', '&lt;', '&gt;', '&amp;' ]);
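The pair, hash and array forms of tr above resemble Python's str.maketrans, which accepts either two equal-length strings or a dictionary. The sketch below is an analogy with made-up inputs; the HTML escape table is an illustrative assumption showing that replacements may be longer than one character.

```python
# Pair of character lists, like tr( ['A'..'C'], ['a'..'c'] ).
lower_abc = str.maketrans("ABC", "abc")
lowered = "CABBAGE".translate(lower_abc)

# Mapping form, like tr( %mapping ); values may be whole strings.
html_escape = str.maketrans({" ": "&nbsp;", "<": "&lt;", ">": "&gt;", "&": "&amp;"})
escaped = "a < b".translate(html_escape)

print(lowered)
print(escaped)
```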
Dr. Damian Conway is a Senior Lecturer in Computer Science and Software Engineering at Monash University (Melbourne, Australia), where he teaches object-oriented software engineering. He is an effective teacher, an accomplished writer, and the author of several popular Perl modules. He is also a semi-regular contributor to the Perl Journal. In 1998 he was the winner of the Larry Wall Award for Practical Utility for two modules (Getopt::Declare and Lingua::EN::Inflect) and in 1999 he won his second "Larry" for his Coy.pm haiku-generation module. He has just published "Object-Oriented Perl" (Manning, 1999).
Allison Randal's first geek career was as a research linguist in eastern Africa. Working with minority languages led to a series of academic papers delivered in obscure places like the Czech Republic. But eventually her love of coding seduced her away from natural languages to artificial ones. In particular, to Perl. After serving several tours of duty in the dot.com trenches, she has recently returned to Darkest Academia, at the University of Portland. In her spare time she enjoys extreme sports: teaching Perl to Java programmers, Perl Monger wrangling, and debating linguistics with Larry and Damian.
Perl.com Compilation Copyright © 1998-2006 O'Reilly Media, Inc.