Apocalypse 5
by Larry Wall
Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 05 for the latest information.
Rejected RFCs
RFC 135: Require explicit m on matches, even with ?? and // as delimiters.
Squish that gnat... :-)
A decent Perl parser is still going to have to keep track of whether
a term or an operator is expected. And while we're simplifying
the grammar in many ways, it's also the case that we're letting
users install their own grammar rules to perform syntactic warpage.
Besides, people like to write patterns with /.../. So rather
than impoverishing Perl's syntax artificially, let's make the standard
parser more accessible by writing it all in Perl 6 regexes.
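To make the term/operator bookkeeping concrete, here is a minimal Python sketch of a hypothetical mini-lexer. It decides whether a / opens a pattern or means division purely from whether a term is expected at that point. The names classify_slashes, TERM, OPERATOR and SPACE are illustrative and do not come from any real Perl parser. The toy grammar knows only numbers, scalar variables, a few operators, and patterns without escaped slashes.

```python
import re

# Hypothetical token classes for a toy expression language.
TERM = re.compile(r"\d+|\$\w+")      # numbers and scalar variables
OPERATOR = re.compile(r"=~|[-+*=]")  # after these, a term is expected
SPACE = re.compile(r"\s+")


def classify_slashes(src):
    """Label each '/' token in src as 'regex' or 'divide'."""
    kinds = []
    expect_term = True  # an expression starts by expecting a term
    pos = 0
    while pos < len(src):
        if m := SPACE.match(src, pos):
            pos = m.end()
        elif src[pos] == "/":
            if expect_term:
                # A pattern literal; naive scan, no escaped slashes.
                end = src.index("/", pos + 1)
                kinds.append("regex")
                pos = end + 1
                expect_term = False  # the pattern is itself a term
            else:
                kinds.append("divide")
                pos += 1
                expect_term = True   # division needs a right-hand operand
        elif m := TERM.match(src, pos):
            pos = m.end()
            expect_term = False
        elif m := OPERATOR.match(src, pos):
            pos = m.end()
            expect_term = True
        else:
            raise SyntaxError(f"unexpected {src[pos]!r} at offset {pos}")
    return kinds
```

With this sketch, classify_slashes("$x =~ /foo/") returns ["regex"] and classify_slashes("$x / 2") returns ["divide"]. The same character is disambiguated by the parser's state, not by a mandatory m.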
RFC 145: Brace-matching for Perl Regular Expressions
Good problem, not-so-good solution from a complexity point of view. I'd like to leverage the existing character class and backref notations, maybe. If there were simply some way to tell a backref to invert any bracketing characters it matches, that might do it. Or maybe reverse them when you remember them, and leave the backref ignorant? (The downside is that nested brackets would probably need recursive patterns.)
Recursion might be advisable anyway--you can't really pick up the arguments to a
function, for instance, without also handling things like quoted strings, which may
have different bracketing rules than outside of strings. Certainly matching \"
would be dependent on whether you're inside or outside of a string. Given that
recursion is often necessary, I'm not sure making this construct recurse itself
is all that useful.
Along the lines of how tr/// works (or ought to work), I think it'd
be more generally useful to have a character-remapping facility within
a backref generator:
( <[ \( \[ \{ \< ] => [ \) \] \} \> ]> )
That might match a left bracket of some sort but return the corresponding right bracket as $1.
But maybe we should just use an "existing" mechanism to translate strings:
my %closing = {
'[' => ']',
'(' => ')',
'{' => '}',
'<' => '>',
};
rule balanced {
    <![\[\(\{\<\]\)\}\>]>*   # any non-brackets
    [                        # followed by either
        $b := <[\[\(\{\<]>   # an opening bracket
        <self>               # containing a balanced expr
        %closing{$b}         # followed by corresponding close bracket
        <self>               # followed by a balanced expr
    |                        # or
        <null>               # nothing more (end of string, or a close
                             #   bracket belonging to an enclosing level)
    ]
}
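For comparison, the same recursion can be sketched in Python. CLOSING, match_balanced and is_balanced are hypothetical names, and the dict plays the role of %closing. One detail matters: the nested call has to be able to stop in front of a closing bracket. So the sketch tries the opening-bracket case first, falls back to "nothing more" when that fails, and leaves the end-of-string check to the caller.

```python
# The dict plays the role of %closing.
CLOSING = {"[": "]", "(": ")", "{": "}", "<": ">"}
BRACKETS = set(CLOSING) | set(CLOSING.values())


def match_balanced(s, i=0):
    """Return the end of the balanced run that starts at index i."""
    # Any non-brackets.
    while i < len(s) and s[i] not in BRACKETS:
        i += 1
    # First choice: an opening bracket...
    if i < len(s) and s[i] in CLOSING:
        close = CLOSING[s[i]]
        inner = match_balanced(s, i + 1)  # ...containing a balanced expr...
        if inner < len(s) and s[inner] == close:
            # ...then the corresponding close bracket and a balanced expr.
            return match_balanced(s, inner + 1)
    # Otherwise nothing more: stop here, possibly in front of a close
    # bracket that belongs to an enclosing level.
    return i


def is_balanced(s):
    # The whole string must be consumed, like anchoring the rule at both ends.
    return match_balanced(s) == len(s)
```

Here is_balanced("f(a[b]{c})") is true and is_balanced("(]") is false. Very deep nesting would hit Python's recursion limit; a production version would more likely use an explicit stack.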
RFC 164: Replace =~, !~, m//, s///, and tr// with match(), subst(), and trade()
All operators will have a way to name them, which means it's
possible to alias them to any other name. Rearranging the formal
order of parameters would be a little harder, however. We need
inlining to do that efficiently. Still, now that // doesn't
evaluate in a typeless context, it's relatively straightforward
to define a subroutine or method that does
subst $string, /foo/, {"bar"}
in whatever order you like.
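As a rough illustration of reordering parameters, a thin Python wrapper can present re.sub, which takes pattern, replacement, string, in the subject-first order shown above. The name subst is hypothetical here. Also, Python's replacement callable receives the match object, unlike the parameterless {"bar"} closure.

```python
import re


def subst(string, pattern, replacement):
    # Hypothetical subject-first wrapper around re.sub(pattern, repl, string).
    # `replacement` is a callable run once per match, much like the {"bar"}
    # closure; Python passes it the match object, which it may ignore.
    return re.sub(pattern, replacement, string)
```

For example, subst("foo food", r"foo", lambda m: "bar") returns "bar bard".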
RFC 197: Numeric Value Ranges In Regular Expressions
If we go down this road, eventually we reinvent all of Perl syntax in regular expressions. Not that I'm against TMTOWTDI, but I'd rather have a better way to run Perl code from within a regex and have it "succeed" or "fail", and maybe better ways to test ranges from Perl code. Anything beyond that could be done with syntactic warpage.
In any event, overloading () and [] for this would be mentally treacherous,
not to mention completely opaque to non-mathematicians. We'll stick
with the standard boolean assertion:
/ (\d+) <( $1 =~ 1..10 )> /
Interestingly, that can also be written:
/ <( _/\d+/ =~ 1..10 )> /
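Python's re module has no embedded code assertions, so the following sketch emulates the first pattern by hand, assuming ordinary backtracking semantics. It tries each start position from the left. At each position it lets the greedy \d+ give back one digit whenever the range test fails. The name match_in_range is hypothetical.

```python
import re

DIGITS = re.compile(r"\d+")


def match_in_range(text, lo, hi):
    # Emulates / (\d+) <( $1 =~ lo..hi )> / on a backtracking engine:
    # try each start position from the left, and at each one let the greedy
    # \d+ give back one digit at a time while the range assertion fails.
    for start in range(len(text)):
        m = DIGITS.match(text, start)
        if m is None:
            continue
        run = m.group()
        for length in range(len(run), 0, -1):
            candidate = run[:length]
            if lo <= int(candidate) <= hi:  # the assertion, as plain code
                return start, candidate
    return None
```

Here match_in_range("page 7 of 12", 1, 10) gives (5, "7"). The less obvious case is match_in_range("x 42", 1, 10), which gives (2, "4"). Because the assertion follows a greedy \d+, backtracking finds a shorter number instead of failing. Bracketing the number with word boundaries would prevent that.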
RFC 198: Boolean Regexes
Again, I'm not much in favor of inventing new regex syntax that duplicates ordinary Perl syntax. I think we need richer ways of interconnecting related regexes via ordinary Perl syntax. Certainly it helps to have an easy way to specify a Perl assertion:
/ (\w+) <( %count{$1} > 3 )> /
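Again using Python only as a sketch, the count assertion becomes an ordinary test applied to each capture. The name first_frequent_word is hypothetical. Unlike the regex above, this version considers only whole words, because finditer yields maximal runs and never retries shorter prefixes of a word.

```python
import re


def first_frequent_word(text, count, threshold=3):
    # Rough analogue of / (\w+) <( %count{$1} > 3 )> /: capture each word,
    # then run the assertion as ordinary code on the captured string.
    # finditer yields whole words, so shorter prefixes are never retried.
    for m in re.finditer(r"\w+", text):
        if count.get(m.group(), 0) > threshold:
            return m.group()
    return None
```

With counts {"the": 4, "cat": 2}, first_frequent_word("a cat and the end", counts) returns "the".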
But there's something to be said for forcing submatch assertions to be defined externally to the current regex, much like we discourage inline code where subroutine calls are in order.
So anyway, I think most of the submatches like onion rings should
be handled simply by searching on captured strings within a closure.
Booleans can be put into closures as well, but the new :: operator
makes it pretty easy to AND and OR assertions together in a more
regexly fashion without reinventing the wheel.
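The next Python sketch captures a chunk and then searches within it from ordinary code. The names sentences_matching and mentions_regexes are hypothetical, the sentence pattern is deliberately crude, and the search words are arbitrary. The AND and OR are written as a plain boolean expression, so no new regex syntax is needed.

```python
import re


def sentences_matching(text, predicate):
    # Capture each sentence (crudely: a run ending in . ! or ?), then hand
    # the captured string to ordinary code for any further searching.
    return [m.group(1) for m in re.finditer(r"\s*([^.!?]+[.!?])", text)
            if predicate(m.group(1))]


def mentions_regexes(sentence):
    def has(word):
        return re.search(rf"\b{word}\b", sentence, re.IGNORECASE) is not None
    # (perl AND regex) OR grammar, as a plain boolean expression.
    return (has("perl") and has("regex")) or has("grammar")
```

Applied to "Perl has a regex engine. Perl has hashes. A grammar is a set of rules!", this keeps the first and last sentences.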
As proposed, there will be a "fail" token, but it's spelled <fail>, not \F.
And the "true" token is spelled <null>. :-)
RFC 261: Pattern matching on perl values
This reminds me a bit of unification in Prolog. It's not explained very well here, and I'm wondering if it will be too hard to explain in general. I think this is probably too powerful a concept for the typical Perl programmer, who is lucky to understand simple lvalues that always do what they're told.
This sort of matching can probably be provided as syntactic warpage, though I'm not sure if that prevents useful optimizations. Anyway, this sort of thing is unlikely to make it into the Perl 6 core unless it generalizes usefully to function argument lists, and it may be too powerful even there. For that purpose it would resemble a form of overloading, but with the "types" specified by keys. I suspect real types are more useful.
RFC 308: Ban Perl hooks into regexes
We must be able to call back into Perl code if we want to write parsers conveniently in Perl. Think of how yacc works. Certainly the way that Perl 5 does it is ugly, I'll admit. We can beautify that.
But the whole point of Perl is to have all the most useful "Krakken
tentacles". And I don't really care if it makes it hard to put the
Perl regex engine into some other language. :-)
RFC 316: Regex modifier for support of chunk processing and prefix matching
Infinite strings (via infinite arrays) seem like a more useful concept. It would be easy for the extension subroutine to fail and produce the results desired in this RFC, but without the necessity of the extra syntax specified by the RFC. A match naturally fails when it gets to the end of its string without finishing the pattern. Incremental matching can also easily be done via infinite strings, and the user interface can be as simple as we like, as long as the extension rule is somehow associated with the string in question.
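To make "infinite strings" concrete, here is a Python sketch in which a hypothetical LazyString pulls more text from a chunk iterator only when the matcher looks past what it has already read. A candidate match that runs off the real end of input simply fails, with no extra modifier. LazyString and find_literal are illustrative names, and the matcher is a naive literal search, not a regex engine.

```python
from itertools import chain, repeat


class LazyString:
    """Hypothetical 'infinite string' that reads chunks only on demand."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buf = ""
        self.chunks_read = 0

    def char_at(self, i):
        # Pull chunks until index i exists; None means the input really ended.
        while i >= len(self._buf):
            chunk = next(self._chunks, None)
            if chunk is None:
                return None
            self._buf += chunk
            self.chunks_read += 1
        return self._buf[i]


def find_literal(s, literal):
    """Naive literal search; a candidate that runs off the end just fails."""
    start = 0
    while s.char_at(start) is not None:
        if all(s.char_at(start + k) == ch for k, ch in enumerate(literal)):
            return start
        start += 1
    return -1


# Usage: an endless source, of which only the first two chunks get read.
endless = LazyString(chain(["ab", "cde"], repeat("x")))
position = find_literal(endless, "cd")  # 2
```

On a truly endless source, a search for text that never appears would of course never return. A real design would let the extension routine decide when to give up.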
I think pos() is rather too low-level a concept for general use.
Certainly it needs to be there, but I think we need some way of
implying that one regex is a continuation of a previous one, but within
some higher-level syntactic construct, so that it's easy to write
parsers without invoking pos() or \g or /c all over the place.
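One way to get that effect, sketched in Python with a hypothetical Scanner class, is to keep the cursor inside an object. Each scan() is then anchored where the previous one stopped, using the pos argument of a compiled pattern's match(). That is Python's rough counterpart to Perl 5's \G.

```python
import re


class Scanner:
    """Hypothetical helper: each scan() resumes where the last one ended."""

    def __init__(self, text):
        self.text = text
        self._pos = 0  # the hidden cursor, so callers never see pos()

    def scan(self, pattern):
        # Pattern.match(text, pos) is anchored at pos, much like Perl 5's \G.
        m = re.compile(pattern).match(self.text, self._pos)
        if m is None:
            return None  # a failed scan keeps the cursor, as m//gc does
        self._pos = m.end()
        return m.group()

    def at_end(self):
        return self._pos == len(self.text)
```

For instance, on Scanner("x = 42") the calls scan(r"\w+"), scan(r"\s*=\s*") and scan(r"\d+") return "x", " = " and "42" in turn. A failed scan in between would leave the cursor where it was.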
<cut>
Well, I could say a lot more, but that's it for this time. I hope you're excited by all this, in a positive sort of way. But if your jaw lost all of its bounce when it hit the table, I expect Damian's upcoming Exegesis 5 will do a better job of showing how this all fits together into a pretty picture.

