Apocalypse 5
by Larry Wall
Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 05 for the latest information.
RFC 348: Regex assertions in plain Perl code
This RFC makes some good points, though the code assertion syntax will be:
<( code )>
The RFC also makes a case for getting rid of the special behavior of
local in Perl 5, which treated local differently within a regex.
However, something very like the local behavior will still be
needed for making hypotheses, though the RFC is correct that it's
not needed in the typical code assertion. In Perl 6, localization is
done with temp, and it will not do the hypothetical variable hack
that Perl 5 did. Instead there will be an explicit lvalue modifier,
let, which specifically requests a variable's value to be scoped
to the success of the current point in the regex. These hypothetical
variables actually have much broader use than this RFC suggests.
Perl 5's hardwired use of $^R just translates to an appropriately
named hypothetical variable in Perl 6.
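For readers who have not met the Perl 5 behavior under discussion, here is a minimal sketch adapted from the example in the perlre documentation. It shows only the Perl 5 mechanism (local inside an embedded code block being unwound on backtracking), not the Perl 6 let modifier; the variable names $cnt and $res are illustrative.

```perl
use strict;
use warnings;

# Package variables, because local() only applies to those.
our ($cnt, $res);

my $text = 'a' x 8;
$text =~ m<
    (?{ $cnt = 0 })                    # reset the counter at the start
    (?:
        a
        (?{ local $cnt = $cnt + 1 })   # tentatively count this letter
    )*
    aaaa                               # makes the loop give back four letters
    (?{ $res = $cnt })                 # copy out the count that survived
>x;

# The loop first runs eight times, then backtracks to four iterations,
# and each abandoned iteration has its local increment undone.
print "$res\n";    # prints 4
```

The copy into $res is needed because the localized value of $cnt is restored once the match finishes.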
RFC 360: Allow multiply matched groups in regexes to return a listref of all matches
I think that parens that can potentially match multiple times will automatically produce a list rather than matching the final one. It's not as if we can't tell whether something's inside a quantifier...
Here's the RFC's proposed solution:
while ($text =~ /name:\s*(.*?)\n\s*
children:\s*(?:(?@\S+)[, ]*)*\n\s*
favorite\ colors:\s*(?:(?@\S+)[, ]*)*\n/sigx) {
# now we have:
# $1 = "John Abajace";
# $2 = ["Tom", "Dick", "Harry"]
# $3 = ["red", "green", "blue"]
}
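The problem the RFC is trying to solve can be seen in a small Perl 5 sketch: a capture group inside a quantifier keeps only its last iteration, so collecting every item takes a separate global match. The sample string and variable names here are assumptions made for illustration.

```perl
use strict;
use warnings;

my $children = "Tom, Dick, Harry";

# One capture group inside a quantifier: only the final iteration survives.
my ($last) = $children =~ /^(?:(\w+)[, ]*)+$/;

# The usual Perl 5 workaround: a second, global match in list context.
my @all = $children =~ /(\w+)/g;

print "$last\n";              # prints Harry
print join("|", @all), "\n";  # prints Tom|Dick|Harry
```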
Apart from the change in behavior of (...) within a quantifier, I have
the urge to rewrite this example for several reasons:
- The /x and /s flags no longer exist.
- The /i and /g flags must be pulled out to the front for visibility. (And the /g flag is renamed :e).
- There's now a \h for horizontal whitespace, and \H for the negation of that. (Not that the RFC is incorrect to use \s.)
- The negation of \n is now \N.
- The : character is now a metacharacter, and so must be backslashed.
- Character classes are now represented with <[...]>.
- Grouping is now represented with [...].
With these changes, and making better use of whitespace, the sample regex ends up looking like this:
for ($text =~ m:ie[
name \: \h* (\N*?) \n
\h* children \: \h* [ (\S+) <[,\h]>* ]* \n
\h* favorite\ colors \: \h* [ (\S+) <[,\h]>* ]* \n
]
)
{
# now we have:
# $1 = "John Abajace";
# $2 = ["Tom", "Dick", "Harry"]
# $3 = ["red", "green", "blue"]
}
I think in the long run people will find this more readable once they're used to it. Certainly tabularizing the parallelisms will make any typing errors stand out.
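The Perl 6 syntax above is historic, but \h and \N also exist in Perl 5 (5.12 or later), so the same record can be parsed there as a rough sketch. This is a workaround that captures each list as one string and splits it afterwards; it does not give the list-returning captures described above. The sample record is assumed from the comments in the example.

```perl
use strict;
use warnings;
use 5.012;    # \N (anything but a newline) needs Perl 5.12 or later

# Hypothetical input, reconstructed from the comments in the example.
my $text = "name: John Abajace\n"
         . "  children: Tom, Dick, Harry\n"
         . "  favorite colors: red, green, blue\n";

my ($name, @children, @colors);

if ($text =~ /name:\h*(\N*?)\n\h*children:\h*(\N*)\n\h*favorite\ colors:\h*(\N*)\n/i) {
    # Copy the captures before running any further pattern matches.
    my ($n, $kids, $hues) = ($1, $2, $3);
    $name     = $n;
    @children = split /[,\h]+/, $kids;    # split on commas and horizontal space
    @colors   = split /[,\h]+/, $hues;
}

print "$name\n";        # prints John Abajace
print "@children\n";    # prints Tom Dick Harry
print "@colors\n";      # prints red green blue
```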

