Apocalypse 5
by Larry Wall
Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 05 for the latest information.
Null String Reform
The null pattern is now illegal. To match whatever you used to match with a null pattern, use one of these:
    Old                 New
    ---                 ---
    //                  /<prior>/           # match what prior match did
    //                  /<null>/            # match the null string between chars
    (a|b|)              (a|b|<null>)        # match a null alternative
Note that, as an assertion, <null> always succeeds. You never want to say:
/ <null> | single | double | triple | home run /
because you'll never get to first base.
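The same trap can be reproduced in a regex engine that still allows empty alternatives. The following is only a sketch in Python, not Perl 6: it assumes Python's `re` module (which tries alternatives left to right, as Perl 5 does) is a fair stand-in, uses an empty alternative to play the role of <null>, and the function name `first_base` is invented for illustration.

```python
import re

def first_base(text, null_first=True):
    """Return what the alternation matches at the start of text."""
    names = "single|double|triple|home run"
    # An empty alternative always succeeds, like <null> in the text above.
    if null_first:
        pattern = f"(?:|{names})"   # null alternative tried first
    else:
        pattern = f"(?:{names}|)"   # null alternative tried last
    return re.match(pattern, text).group(0)
```

With the null alternative first, `first_base("single")` matches the empty string; with it last, the same call matches "single", and the empty match is only a fallback.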
Extension Syntax Reform
There are no longer any (?...) sequences, because parens now always
capture. Some of the replacement sequences take their intrinsic
scoping from <...>, while others are associated with other
bracketing characters, or with any arbitrary atom that could be a
bracketed construct. Looking at the metasyntax problem from the
perspective of a Perl5-to-Perl6 translator, here's what the various
Perl 5 extension constructs translate to:
    Old                 New
    ---                 ---
    (??{$rule})         <$rule>                         # call regex in variable
    (?{ code })         { code }                        # call Perl code, ignore result
    (?#...)             <('...')>                       # in-line comment, rarely needed
    (?:...)             [...]                           # non-capturing brackets
    (?=...)             <before ...>                    # positive lookahead
    (?!...)             <!before ...>                   # negative lookahead
    (?<=...)            <after ...>                     # positive lookbehind
    (?<!...)            <!after ...>                    # negative lookbehind
    (?>...)             [...]:                          # grab (any atom)
    (?(cond)yes|no)     [ cond :: yes | no ]
    (?(1)yes|no)        [ <(defined $1)> :: yes | no ]
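For readers who have not met the last row of the table, here is a small sketch of the old (?(1)yes|no) conditional. It is written in Python on the assumption that its `re` module implements this Perl 5 style construct the same way. The pattern and the name `is_balanced_number` are illustrative only and do not come from the Apocalypse.

```python
import re

# Group 1 optionally captures an opening paren. The conditional (?(1)\))
# then demands a closing paren only if group 1 actually matched.
BALANCED = re.compile(r"(\()?\d+(?(1)\))$")

def is_balanced_number(text):
    """True for '42' or '(42)', false for '(42' or '42)'."""
    return BALANCED.match(text) is not None
```

Going by the table, the new notation would spell the same test as an explicit assertion on $1 followed by `::` and the two alternatives.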
The <$rule> construct does a "delayed" call of another
regular expression stored in the $rule variable. If it is
a regex object, it's just called as if it were a subroutine,
so there's no performance problem. If it's a string, it is
compiled as a regex and executed. The compiled form is cached
as a property of the string, so it doesn't have to be recompiled
unless the string changes. (This implies that we can have properties
that invalidate themselves when their base object is modified.)
In either case, the evaluated regex is treated as a subrule, and
any captures it does are invisible to the outer regex unless the
outer regex takes steps to retrieve them. In any event, subrule
parens never change the paren count of the outer rule.
The {code} form doesn't return anything meaningful--it is used for
its side effects. Any such closure may behave as an assertion. It merely
has to throw an exception in order to fail. To throw such an exception
(on purpose) one may use fail:
$_ = "666";
/ (\d+) { $1 < 582 or fail }/
As with any assertion, the failing closure starts backtracking at the
location of the closure. In this case, it backtracks into the \d+
and ends up matching "66" rather than "666". If you didn't
want that, use \d+: instead.
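The backtracking described here can be traced by hand. The sketch below is Python, not Perl 6, and simulates just this one case: a greedy run of digits followed by a code assertion. The helper name `match_digits` and its `backtrack` flag (which imitates the non-backtracking \d+: form) are assumptions made for illustration.

```python
import re

def match_digits(text, predicate, backtrack=True):
    """Greedy digits followed by an assertion; give back digits on failure."""
    m = re.match(r"\d+", text)
    if m is None:
        return None
    longest = m.group(0)
    # With backtracking, retry with one digit fewer each time the assertion
    # fails; without it, only the longest match is ever tried.
    lengths = range(len(longest), 0, -1) if backtrack else [len(longest)]
    for n in lengths:
        candidate = longest[:n]
        if predicate(int(candidate)):
            return candidate
    return None
```

Run against "666" with the predicate `lambda n: n < 582`, this returns "66" when backtracking is allowed and fails outright when it is not.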
It's more succinct, however, to use the code assertion syntax. Just put angles around a parenthesized Perl expression:
/ (\d+) <( $1 < 582 )> /
I find the parens to be vaguely reminiscent of the parentheses you have to put around conditionals in C (but not Perl (anymore)). Also, the parentheses are meant to remind you that you only want to put an expression there, not a full statement.
Don't use a bare closure to try to interpolate a calculated regex, since the
result will be ignored. Instead, use the <{expr}> form to do that.
As with <&rule()>, the result will be interpreted as a subrule, not as
if it were interpolated.
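The difference between "subrule" and "interpolated" shows up as soon as a quantifier follows the calculated pattern. Here is a rough Python analogy, which assumes that wrapping the computed text in (?:...) is a reasonable approximation of treating it as a single subrule. Both function names are made up for the example.

```python
import re

def interpolated(expr):
    # Textual interpolation: the + binds to whatever it happens to land next to.
    return re.compile(expr + "+")

def as_subrule(expr):
    # Subrule-like treatment: the whole computed pattern behaves as one atom.
    return re.compile("(?:" + expr + ")+")
```

With the computed pattern "a|b", the interpolated form becomes `a|b+` and cannot match all of "abab", while the subrule-like form repeats the whole alternation and can.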
Since a string is usually true, you can just assert it to get the effect of
an inline comment: <("this is a comment")>. But I've never used one
except to show it as an example. Line-ending comments are usually much
clearer. (Just bear in mind you can't put the final regex delimiter on the
same line, because it would be swallowed by the comment.) You could also use
the {'...'} construct for comments, but then you risk warnings about
"useless use of a string in void context".
The [...] is the new non-capturing bracket notation. It seems to work
very well for this purpose--I tried the other brackets and they tend
to "disappear" faster than square brackets. So we reserve (...) and
<...> for constructs where the visual distance between opening and closing is
typically shorter than for square brackets or curlies. The square brackets also work
nicely when lined up vertically with vertical bars. Here's a declaration
of a named rule from the class Perl6Grammar. It parses Perl 6 statements.
(Think of it as a funny looking method declaration.)
    rule state { <label>
                 [ <control>                {.control}
                 | <sideff> <eostate>       {.sideff}
                 | <@other_statements>
                 ]
               };
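What "non-capturing" buys you is easiest to see by counting groups. The sketch below uses Python's `re`, where the old (?:...) spelling still applies; it is meant only as an analogy for what [...] does in the new syntax. The helper name `captured` is hypothetical.

```python
import re

capturing = re.compile(r"(a|b)(c)")        # parens capture: two groups
non_capturing = re.compile(r"(?:a|b)(c)")  # old spelling of [ a | b ] (c)

def captured(pattern, text):
    """Return the tuple of captures, or None if there is no match."""
    m = pattern.match(text)
    return m.groups() if m else None
```

Both patterns match "ac", but the first reports two captures and the second only one, so grouping for alternation does not renumber the captures that follow.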
Huffman coding says that rarer forms should be longer, and that's the
case with the lookahead and lookbehind assertions, <before ...>
and <after ...>. (The negations are formed via the general
<!...> rule.) Note that these prepositions are interpreted
as assertions, not operations. For example, <before X> is
to be read "Assert that we are before X" rather than "Look before
where we are for X".
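To make the "assertion, not operation" reading concrete, here is a short sketch using the old Perl 5 spellings, which Python's `re` module also accepts. Each comment gives the corresponding new form from the table above. The function name and the sample text are illustrative assumptions.

```python
import re

def assertions_demo(text="foobar"):
    """Show that lookaround asserts a position without consuming text."""
    before = re.search(r"foo(?=bar)", text)      # foo <before bar>
    not_before = re.search(r"foo(?!bar)", text)  # foo <!before bar>
    after = re.search(r"(?<=foo)bar", text)      # <after foo> bar
    not_after = re.search(r"(?<!foo)bar", text)  # <!after foo> bar
    results = {"before": before, "not_before": not_before,
               "after": after, "not_after": not_after}
    # Report only the text actually consumed by each match, or None.
    return {name: (m.group(0) if m else None) for name, m in results.items()}
```

On "foobar" the positive forms match "foo" and "bar" respectively without swallowing the neighbouring text, and both negated forms fail.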

