Apocalypse 5
by Larry Wall

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 05 for the latest information.

Rejected RFCs

RFC 135: Require explicit m on matches, even with ?? and // as delimiters.

Squish that gnat... :-)

A decent Perl parser is still going to have to keep track of whether a term or an operator is expected. And while we're simplifying the grammar in many ways, it's also the case that we're letting users install their own grammar rules to perform syntactic warpage. Besides, people like to write patterns with /.../. So rather than impoverishing Perl's syntax artificially, let's make the standard parser more accessible by writing it all in Perl 6 regexes.

RFC 145: Brace-matching for Perl Regular Expressions

Good problem, not-so-good solution from a complexity point of view. I'd like to leverage existing character class and backref notations maybe. If there were simply some way to tell a backref to invert any match characters, that might do it. Or maybe reverse them when you remember them, and leave the backref ignorant? (Downside is nested brackets would probably need recursive patterns.)

Recursion might be advisable anyway--you can't really pick up the arguments to a function, for instance, without also handling things like quoted strings, which may have different bracketing rules than outside of strings. Certainly matching \" would be dependent on whether you're inside or outside of a string. Given that recursion is often necessary, I'm not sure making this construct recurse itself is all that useful.

Along the lines of how tr/// works (or ought to work), I think it'd be more generally useful to have a character remapping facility within a backref generator:

    (
     <[ \( \[ \{ \< ] =>
      [ \) \] \} \> ]> )

That might match a left bracket of some sort but return the corresponding right bracket as $1. But maybe we should just use an "existing" mechanism to translate strings:

    my %closing = {
        '[' => ']',
        '(' => ')',
        '{' => '}',
        '<' => '>',
    };
    rule balanced {
        <-[\[\(\{\<\]\)\}\>]>*  # any non-brackets
        [                       # followed by either
            $                   #   end of string
        |                       # or
            $b := <[[({<]>      #   an opening bracket
            <self>              #   containing a balanced expr
            %closing{$b}        #   followed by corresponding close bracket
            <self>              #   followed by a balanced expr
        ]
    }
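
For readers who would rather poke at the idea in a language that runs today, here is a rough Python sketch of the same approach: a table of closing brackets plus a recursive matcher. The names `CLOSING`, `match_balanced`, and `is_balanced` are made up for illustration. Unlike the rule above, the nested call is allowed to stop at a closing bracket rather than only at end of string, so treat it as an analogue under that assumption, not a translation.

```python
# Hypothetical names throughout; a sketch of table-driven bracket matching.
CLOSING = {"[": "]", "(": ")", "{": "}", "<": ">"}
BRACKETS = set(CLOSING) | set(CLOSING.values())


def match_balanced(text: str, pos: int = 0) -> int:
    """Return the index just past the balanced run that starts at pos.

    Raises ValueError when an opening bracket is not followed by its
    corresponding closing bracket.
    """
    while True:
        # any non-brackets
        while pos < len(text) and text[pos] not in BRACKETS:
            pos += 1
        # end of string, or a closing bracket that the caller must check
        if pos == len(text) or text[pos] not in CLOSING:
            return pos
        opener = text[pos]
        # an opening bracket containing a balanced expr
        inner_end = match_balanced(text, pos + 1)
        # followed by the corresponding close bracket
        if inner_end >= len(text) or text[inner_end] != CLOSING[opener]:
            raise ValueError(f"unbalanced {opener!r} at offset {pos}")
        # followed by a balanced expr (handled by looping, not recursing)
        pos = inner_end + 1


def is_balanced(text: str) -> bool:
    """True when the whole string is one balanced expression."""
    try:
        return match_balanced(text) == len(text)
    except ValueError:
        return False
```

The lookup table does the job that the remapping backref would have done: the opening bracket is remembered, and the table says which closer to demand.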

RFC 164: Replace =~, !~, m//, s///, and tr// with match(), subst(), and trade()

All operators will have a way to name them, which means it's possible to alias them to any other name. Rearranging the formal order of parameters would be a little harder, however. We need inlining to do that efficiently. Still, now that // doesn't evaluate in a list context, it's relatively straightforward to define a subroutine or method that does

    subst $string, /foo/, {"bar"}

in whatever order you like.
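
As a loose analogue outside Perl, the following Python sketch wraps the standard `re` module so that substitution becomes an ordinary function, and a one-line wrapper rearranges the parameters. The names `subst` and `subst_pattern_first` and the `count` keyword are assumptions for illustration; none of this is proposed Perl 6 API.

```python
import re
from typing import Callable


def subst(string: str, pattern: re.Pattern,
          replacement: Callable[[re.Match], str], *, count: int = 1) -> str:
    """Replace up to `count` matches of `pattern` in `string`.

    The pattern arrives as an already-compiled object, so passing it as an
    argument does not run it. count=0 means "replace every match".
    """
    return pattern.sub(replacement, string, count=count)


def subst_pattern_first(pattern: re.Pattern, string: str,
                        replacement: Callable[[re.Match], str]) -> str:
    """Same operation with the parameters in a different order."""
    return subst(string, pattern, replacement)
```

Calling `subst("foo food", re.compile("foo"), lambda m: "bar")` replaces only the first match, which is the assumption baked into the default `count`.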

RFC 197: Numeric Value Ranges In Regular Expressions

If we go down this road, eventually we reinvent all of Perl syntax in regular expressions. Not that I'm against TMTOWTDI, but I'd rather have a better way to run Perl code from within a regex and have it "succeed" or "fail", and maybe better ways to test ranges from Perl code. Anything beyond that could be done with syntactic warpage.

In any event, overloading () and [] for this would be mentally treacherous, not to mention completely opaque to non-mathematicians. We'll stick with the standard boolean assertion:

    / (\d+) <( $1 =~ 1..10 )> /

Interestingly, that can also be written:

    / <( _/\d+/ =~ 1..10 )> /
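
The same division of labor (the pattern finds the digits, ordinary code decides whether the number is acceptable) can be sketched in Python. The function name `match_in_range` and its `low`/`high` parameters are hypothetical. Note one simplification: this version tests whole runs of digits, whereas a backtracking regex engine could retry the assertion on a shorter run.

```python
import re
from typing import Optional


def match_in_range(text: str, low: int = 1, high: int = 10) -> Optional[re.Match]:
    """Return the first run of digits whose value lies in low..high."""
    for m in re.finditer(r"\d+", text):
        # The "assertion": plain code that accepts or rejects the candidate.
        if low <= int(m.group()) <= high:
            return m
    return None
```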

RFC 198: Boolean Regexes

Again, I'm not much in favor of inventing new regex syntax that duplicates ordinary Perl syntax. I think we need richer ways of interconnecting related regexes via ordinary Perl syntax. Certainly it helps to have an easy way to specify a Perl assertion:

    / (\w+) <( %count{$1} > 3 )> /
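
In Python terms, an assertion like this amounts to filtering candidate matches through a lookup. The sketch below assumes a mapping from words to counts; `first_frequent_word` and `threshold` are names invented here, and as before only whole words are tested, with no backtracking into shorter ones.

```python
import re
from typing import Mapping, Optional


def first_frequent_word(text: str, count: Mapping[str, int],
                        threshold: int = 3) -> Optional[str]:
    """Return the first word in text whose count exceeds threshold."""
    for m in re.finditer(r"\w+", text):
        word = m.group()
        # Words missing from the mapping are treated as having count 0.
        if count.get(word, 0) > threshold:
            return word
    return None
```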

But there's something to be said for forcing submatch assertions to be defined externally to the current regex, much like we discourage inline code where subroutine calls are in order.

So anyway, I think most of the submatches like onion rings should be handled simply by searching on captured strings within a closure. Booleans can be put into closures as well, but the new :: operator makes it pretty easy to AND and OR assertions together in a more regexly fashion without reinventing the wheel.

As proposed, there will be a "fail" token, but it's spelled <fail>, not \F. And the "true" token is spelled <null>. :-)

RFC 261: Pattern matching on perl values

This reminds me a bit of unification in Prolog. It's not explained very well here, and I'm wondering if it will be too hard to explain in general. I think this is probably too powerful a concept for the typical Perl programmer, who is lucky to understand simple lvalues that always do what they're told.

This sort of matching can probably be provided as syntactic warpage, though I'm not sure if that prevents useful optimizations. Anyway, this sort of thing is unlikely to make it into the Perl 6 core unless it generalizes usefully to function argument lists, and it may be too powerful for there too. For that purpose it would resemble a form of overloading, but with the "types" specified by keys. I suspect real types are more useful.

RFC 308: Ban Perl hooks into regexes

We must be able to call back into Perl code if we want to write parsers conveniently in Perl. Think of how yacc works. Certainly the way that Perl 5 does it is ugly, I'll admit. We can beautify that.

But the whole point of Perl is to have all the most useful "Krakken tentacles". And I don't really care if it makes it hard to put the Perl regex engine into some other language. :-)

RFC 316: Regex modifier for support of chunk processing and prefix matching

Infinite strings (via infinite arrays) seem like a more useful concept. It would be easy for the extension subroutine to fail and produce the results desired in this RFC, but without the necessity of the extra syntax specified by the RFC. A match naturally fails when it gets to the end of its string without finishing the pattern. Incremental matching can also easily be done via infinite strings, and the user interface can be as simple as we like, as long as the extension rule is somehow associated with the string in question.
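
One way to picture a string that extends itself on demand is the Python sketch below. It is only an approximation under stated assumptions: Python's `re` module cannot report a partial match, so the hypothetical helper `match_extending` simply pulls another chunk whenever the match fails or runs right up to the end of the buffer. Patterns with lookahead, or ones that can never match, are not handled gracefully (the latter would read the whole source).

```python
import re
from typing import Iterable, Optional


def match_extending(pattern: re.Pattern,
                    chunks: Iterable[str]) -> Optional[re.Match]:
    """Match pattern at the start of a lazily growing buffer.

    `chunks` plays the role of the extension subroutine: each item it
    yields is appended to the buffer, and running dry means the string
    has ended.
    """
    source = iter(chunks)
    buffer = ""
    exhausted = False
    while True:
        m = pattern.match(buffer)
        # Accept a match only if it stopped short of the buffer's end,
        # or if there is no more input that could lengthen it.
        if m is not None and (exhausted or m.end() < len(buffer)):
            return m
        if exhausted:
            return None  # end of string reached without finishing the pattern
        try:
            buffer += next(source)
        except StopIteration:
            exhausted = True
```

Because chunks are pulled only when needed, the source may be unbounded as long as the match eventually stops short of the buffer's end.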

I think pos() is rather too low-level a concept for general use. Certainly it needs to be there, but I think we need some way of implying that one regex is a continuation of a previous one, but within some higher-level syntactic construct, so that it's easy to write parsers without invoking pos() or \G or /c all over the place.
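
To make the complaint concrete, here is a small Python sketch in which the position bookkeeping is hidden inside a generator, so that each match continues where the previous one stopped and the caller never sees an offset. The token grammar and the names `TOKEN` and `tokens` are invented for this example; it illustrates the idea and is not a design for the higher-level construct.

```python
import re
from typing import Iterator, Tuple

# A toy token grammar: optional whitespace, then exactly one named alternative.
TOKEN = re.compile(
    r"\s*(?:(?P<number>\d+)|(?P<name>[A-Za-z_]\w*)|(?P<op>[-+*/=()]))"
)


def tokens(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, value) pairs, resuming each match at the previous end."""
    pos = 0
    while True:
        m = TOKEN.match(text, pos)  # continue from where the last match stopped
        if m is None:
            break
        yield m.lastgroup, m.group(m.lastgroup)
        pos = m.end()
    if text[pos:].strip():
        raise ValueError(f"unexpected input at offset {pos}")
```

A caller just writes `for kind, value in tokens("x = 42 + y")` and gets the pieces in order.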


<cut>

Well, I could say a lot more, but that's it for this time. I hope you're excited by all this, in a positive sort of way. But if your jaw lost all of its bounce when it hit the table, I expect Damian's upcoming Exegesis 5 will do a better job of showing how this all fits together into a pretty picture.