Synopsis 5
by Allison Randal, Damian Conway
Editor's note: this document is out of date and remains here for historic interest. See Synopsis 5 for the current design information.
Named Regexes
- The analogy between sub and rule extends much further.
- Just as you can have anonymous subs and named subs...
- ...so too you can have anonymous regexes and named regexes:

      rule ident { [<alpha>|_] \w* }

      # and later...

      @ids = grep /<ident>/, @strings;

- As the above example indicates, it's possible to refer to named regexes, such as:

      rule serial_number { <[A-Z]> \d<8> }
      rule type { alpha | beta | production | deprecated | legacy }

  in other regexes as named assertions:

      rule identification { [soft|hard]ware <type> <serial_number> }
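For readers more familiar with other regex engines, the composition of named regexes can be approximated in Python. This is only a rough analogue, not Perl 6: Python's re module has no named-assertion syntax, so the fragments are spliced in as strings. The constant names and the identify helper are hypothetical, and the sketch assumes that \d<8> means exactly eight digits and that whitespace inside a rule is insignificant.

```python
import re

# Hypothetical stand-ins for the serial_number and type rules above.
SERIAL_NUMBER = r"[A-Z]\d{8}"
TYPE = r"alpha|beta|production|deprecated|legacy"

# re.VERBOSE makes the literal spaces in the pattern insignificant,
# mirroring the assumed whitespace handling of the original rule.
IDENTIFICATION = re.compile(
    rf"(?:soft|hard)ware (?P<type>{TYPE}) (?P<serial_number>{SERIAL_NUMBER})",
    re.VERBOSE,
)


def identify(text):
    """Return the named captures as a dict, or None if nothing matches."""
    m = IDENTIFICATION.search(text)
    return m.groupdict() if m else None
```

Here each named group plays the role of a named assertion: the captured text is retrievable under the same name as the fragment that matched it.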
Nothing is illegal
- The null pattern is now illegal.
- To match whatever the prior successful regex matched, use:

      /<prior>/

- To match the zero-width string, use:

      /<null>/
Hypothetical variables
- In embedded closures it's possible to bind a variable to a value that only "sticks" if the surrounding pattern successfully matches.
- A variable is declared with the keyword let and then bound to the desired value:

      / (\d+) {let $num := $1} (<alpha>+)/

- Now $num will only be bound if the digits are actually found.
- If the match ever backtracks past the closure (i.e. if there are no alphabetics following), the binding is "undone".
- This is even more interesting in alternations:

      / [ (\d+)      { let $num   := $1 }
        | (<alpha>+) { let $alpha := $2 }
        | (.)        { let $other := $3 }
        ]
      /

- There is also a shorthand for assignment to hypothetical variables:

      / [ $num  := (\d+)
        | $alpha:= (<alpha>+)
        | $other:= (.)
        ]
      /

- The numeric variables ($1, $2, etc.) are also "hypothetical".
- Numeric variables can be assigned to, and even re-ordered:

      my ($key, $val) = m:w{ $1:=(\w+) =\> $2:=(.*?)
                           | $2:=(.*?) \<= $1:=(\w+)
                           };

- Repeated captures can be bound to arrays:

      / @values:=[ (.*?) , ]* /

- Pairs of repeated captures can be bound to hashes:

      / %options:=[ (<ident>) = (\N+) ]* /

- Or just capture the keys (and leave the values undef):

      / %options:=[ (<ident>) = \N+ ]* /

- Subrules (e.g. <rule>) also capture their result in a hypothetical variable of the same name as the rule:

      / <key> =\> <value> { %hash{$key} = $value } /
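The "binding only sticks on success" behaviour of the first example in this section can be imitated by hand in Python. This is a minimal sketch and not how Perl 6 implements hypothetical variables: the function name is hypothetical, the match is anchored at the start of the string for simplicity, and <alpha> is assumed to mean ASCII letters.

```python
import re


def match_digits_then_letters(text):
    """Return {'num': ...} only if leading digits are followed by a letter."""
    digits = re.match(r"\d+", text)
    if digits is None:
        return None
    # Tentative ("hypothetical") binding: held locally until the rest succeeds.
    tentative = {"num": digits.group()}
    letters = re.match(r"[A-Za-z]+", text[digits.end():])
    if letters is None:
        # The overall match failed, so the tentative binding is discarded.
        return None
    return tentative
```

The caller never sees a half-made binding: either the whole pattern matched and num is available, or nothing is bound at all.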
Return values from matches
- A match always returns a "match object", which is also available as (lexical) $0.
- The match object evaluates differently in different contexts:
  - in boolean context it evaluates as true or false (i.e. did the match succeed?):

        if /pattern/ {...}
        # or:
        /pattern/; if $0 {...}

  - in numeric context it evaluates to the number of matches:

        $match_count += m:e/pattern/;

  - in string context it evaluates to the captured substring (if there was exactly one capture in the pattern) or to the entire text that was matched (if the pattern does not capture, or captures multiple elements):

        print %hash{$text =~ /,? (<ident>)/};
        # or:
        $text =~ /,? (<ident>)/  &&  print %hash{$0};
- Within a regex, $0 acts like a hypothetical variable.
- It controls what a regex match returns (like $$ does in yacc).
- Use $0:= to override the default return behaviour described above:

      rule string1 { (<["'`]>) ([ \\. | <-[\\]> ]*?) $1 }

      $match = m/<string1>/;  # default: $match includes
                              # opening and closing quotes

      rule string2 { (<["'`]>) $0:=([ \\. | <-[\\]> ]*?) $1 }

      $match = m/<string2>/;  # $match now excludes quotes
                              # because $0 explicitly bound
                              # to second capture only
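The default string-context rule stated earlier in this section (one capture yields that capture, otherwise the whole matched text) can be sketched in Python for comparison. This is an analogue under stated assumptions, not Perl 6 behaviour: match_as_string is a hypothetical helper, and Python's own match objects have no context-sensitive evaluation of this kind.

```python
import re


def match_as_string(pattern, text):
    """Return the single capture if the pattern has exactly one capturing
    group, otherwise the entire matched text; None if there is no match."""
    compiled = re.compile(pattern)
    m = compiled.search(text)
    if m is None:
        return None
    if compiled.groups == 1:
        return m.group(1)
    return m.group(0)
```

With this helper, a pattern such as `,?\s*(\w+)` yields only the identifier, while a pattern with zero or several captures yields everything that matched.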

