Apocalypse 5
by Larry Wall

Editor's Note: This Apocalypse is out of date and remains here for historical reasons. See Synopsis 05 for the latest information.


Regex culture has gone wrong in a variety of ways, but it's not my intent to assign blame--there's plenty of blame to go around, and plenty of things that have gone wrong that are nobody's fault in particular. For example, it's nobody's fault that you can't realistically complement a character set anymore. It's just an accident of the way Unicode defines combining characters. The whole notion of character classes is mutating, and that will have some bearing on the future of regular expression syntax.
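
One reading of the combining-character problem can be sketched as follows. This is a minimal illustration in Python's `re` module rather than Perl, and the sample character and variable names are assumptions chosen for the example. A negated class is applied per code point, so once an accented letter is written as a base letter plus a combining mark, "anything but e" matches the mark on its own and cuts the visible character in half.

```python
import re
import unicodedata

# The same visible character in two encodings (illustrative sample).
composed = "\u00e9"                                  # e-acute as one code point
decomposed = unicodedata.normalize("NFD", composed)  # 'e' followed by U+0301

# A complemented character class: "any character except e".
negated = re.compile(r"[^e]")

# Composed form: the single code point is not 'e', so it matches whole.
match_composed = negated.findall(composed)

# Decomposed form: the base 'e' is skipped, and the class matches the
# combining acute accent by itself, separated from its base letter.
match_decomposed = negated.findall(decomposed)

print(len(composed), len(decomposed))
print([hex(ord(c)) for c in match_decomposed])
```

The two strings look identical to a reader, yet the complemented class gives different answers for them, which is the kind of surprise the paragraph above is pointing at.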

Given all this, I need to warn you that this Apocalypse is going to be somewhat radical. We'll be proposing changes to certain "sacred" features of regex culture, and this is guaranteed to result in future shock for some of our more conservative citizens. Do not be alarmed. We will provide ways for you to continue programming in old-fashioned regular expressions if you desire. But I hope that once you've thought about it a little and worked through some examples, you'll like most of the changes we're proposing here.

So although the RFCs did contribute greatly to my thinking for this Apocalypse, I'm going to present my own vision first for where regex culture should go, and then analyze the RFCs with respect to that vision.

First, let me enumerate some of the things that are wrong with current regex culture.

  • Too much history

  • Too compact and "cute"

  • Poor Huffman coding

  • Too much reliance on too few metacharacters

  • Different things look too similar

  • Poor end-weight design

  • Too much reliance on modifiers

  • Too many special rules and boobytraps

  • Backreferences not useful enough

  • Too hard to match a literal string

  • Two-level interpretation is problematic

  • Too little abstraction

  • Little support for named captures

  • Difficult to use nested patterns

  • Little support for grammars

  • Inability to define variants

  • Poor integration with "real" language

  • Missing backtracking controls

  • Difficult to define assertions

I'm sure there are other problems, but that'll do for starters. Let's look at each of these in more detail.
