Apocalypse 5
by Larry Wall
Editor's Note: this Apocalypse is out of date and remains here for historical reasons. See Synopsis 05 for the latest information.
Keyword and Context Reform
Deferred regex rules are now defined with rx// rather than qr//,
because a regular expression is no longer a kind of quoted string.
Actually, just as you can define closures without an explicit sub,
any // or rx// declares a deferred regex if it's not in a context
that executes it immediately. A regex is executed automatically if
it's in a boolean, numeric, or string context. But assignment to
an untyped variable is not such a context, nor is passing the regex
as an untyped parameter to a function. (Of course, an explicitly
declared RULE parameter doesn't provide an evaluating context either.)
So these are equivalent:
my $foo = /.../; # create regex object
my $foo = rx[...]; # create regex object
my $foo = rule {...}; # create regex object
Likewise, these are equivalent:
@x = split /.../;
@x = split rx[...];
@x = split rule {...};
The "rule" syntax is just a way of declaring a deferred regex as if it were a subroutine or method. More on that later.
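For readers more used to a language with explicit regex objects, the deferred/immediate distinction can be sketched by analogy in Python. This is only an analogy and not Perl 6: `re.compile` stands in for a deferred rx//, and the names `word_sep` and `text` are invented for the illustration.

```python
import re

# Building the pattern does not run it; this plays the role of a deferred regex.
word_sep = re.compile(r"\s*,\s*")

text = "alpha , beta,gamma"

# Handing the pattern object to split is what finally executes it.
fields = word_sep.split(text)
print(fields)
```

The point of the sketch is simply that the assignment stores a pattern object, and nothing is matched until some consumer such as split decides to run it.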
To force an immediate evaluation of a deferred regex where it wouldn't ordinarily be, you can use the appropriate unary operator:
my $foo = ?/.../; # boolean context, return whether matched
my $foo = +/.../; # numeric context, return count of matches
my $foo = _/.../; # string context, return captured/matched string
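As a rough Python analogy (not Perl 6), the three forced contexts correspond to three different questions you can ask of the same pattern object. The sketch assumes that "count of matches" means non-overlapping matches and that the string result is the text of the first match; the text above does not spell out either detail. The names `digits` and `topic` are hypothetical.

```python
import re

digits = re.compile(r"\d+")   # the deferred pattern
topic = "a1b22c333"           # stands in for the topic string

# Boolean question: did the pattern match anywhere?
matched = digits.search(topic) is not None

# Numeric question: how many matches (assumed non-overlapping)?
count = len(digits.findall(topic))

# String question: what text matched (assumed to be the first match)?
first = digits.search(topic).group(0)

print(matched, count, first)
```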
The standard match and substitution forms also force immediate evaluation regardless of context:
$result = m/.../; # do match on topic string
$result = s/.../.../; # do substitution on topic string
These forms also force the regex to start matching at the beginning
of the string in question and scan forward through the string for
the match, as if there were an implicit .*? in front of every
iteration. (Both of these behaviors are suppressed if you use
the :c/:cont modifier.) In contrast, the meaning of the
deferred forms is dependent on context. In particular, a deferred
regex naturally assumes :c when used as a subrule. That is,
it continues where the last match left off, and the next thing has
to match right there at the head of the string.
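The difference between scanning forward and continuing at a fixed position can be sketched in Python, again purely as an analogy. Here `search` plays the role of the scanning m// form, and `match` with an explicit position plays the role of a pattern under :c. The names `digits` and `topic` are made up for the example.

```python
import re

digits = re.compile(r"\d+")
topic = "foo123bar"

# Scanning: start at the beginning and move forward until something matches.
scanned = digits.search(topic)

# Continuing: the match has to begin exactly at the given position.
at_start = digits.match(topic, 0)   # "f" is not a digit, so no match here
at_three = digits.match(topic, 3)   # the digits begin at offset 3

print(scanned.span(), at_start, at_three.group(0))
```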
In any other context, including list context, a deferred regex is not immediately evaluated, but produces a reference to the regex object:
my $rx = /.../; # not evaluated
my @foo = $rx; # ERROR: type mismatch.
my @foo = ($rx); # One element, a regex object.
my @foo = (/.../); # Same thing.
my @foo := $rx; # Set autogrow rule for @foo.
To evaluate repeatedly in list context, treat the regex object as you would any other iterator:
my @foo = <$rx>;
You can also use the more explicit form:
my @foo = m/<$rx>/;
Those aren't identical, since the former assumes :c and starts up at the current
position of the unmentioned topic, while the latter explicitly resets
the position to the beginning before scanning. Also, since the deferred
regex assumes a :c modifier, <$rx> won't scan through the string
like m//. It can return multiple values to the list, but they
have to be contiguous. You can get the scanning effect of m//
by prepending the pattern with .*?.
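The contiguity requirement, and the way a leading .*? relaxes it, can be sketched with a small Python helper. This is an analogy under stated assumptions, not the Perl 6 mechanism: `contiguous_matches` is a hypothetical function written for this illustration, and it reports the last capture group when the pattern has one.

```python
import re

def contiguous_matches(rx, text, pos=0):
    """Collect matches that each begin exactly where the previous one ended."""
    found = []
    while pos <= len(text):
        m = rx.match(text, pos)           # anchored at pos, no forward scan
        if m is None or m.end() == pos:   # stop on failure or an empty match
            break
        found.append(m.group(m.lastindex or 0))
        pos = m.end()
    return found

topic = "1,2,3,x4,5,"
item = re.compile(r"\d,")
scanning_item = re.compile(r".*?(\d,)")   # same item with a lazy skip in front

anchored = contiguous_matches(item, topic)
scanned = contiguous_matches(scanning_item, topic)

print(anchored)
print(scanned)
```

The anchored run stops at the stray "x" because the next item is not contiguous, while the version with the lazy skip steps over it and picks up the remaining items.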
But it's vitally important to understand this fundamental change, that
// is no longer a short form of m//, but rather a short form of rx//.
If you want to add modifiers to a //, you have to turn it into an rx//, not
an m//. It's now wrong to call split like this:
split m/.../
(That is, it's wrong unless you actually want the return value of the pattern match to be used as the literal split delimiter.)
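To see why the distinction matters, here is a Python analogy of the two readings of that split call. It assumes, only for the sake of illustration, that the result of the immediate match is the matched text; `sep` and `topic` are hypothetical names.

```python
import re

sep = re.compile(r"[,;]")
topic = "a,b;c"

# Passing the pattern itself: split wherever the pattern matches.
by_pattern = sep.split(topic)

# Running the match first and passing its result: split on that literal text.
matched_text = sep.search(topic).group(0)
by_result = topic.split(matched_text)

print(by_pattern)
print(by_result)
```

The second call splits only on the literal comma that happened to match first, which is rarely what the writer of such a call intended.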
The old ?...? syntax is gone. Indeed, it has to go for us to get
the unary ? operator.
Old       New
---       ---
?pat?     m:once/pat/
qr//      rx//
          rule { }