
Apocalypse 5
by Larry Wall

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 05 for the latest information.

RFC 348: Regex assertions in plain Perl code

This RFC makes some good points, though the code assertion syntax will be:

    <( code )>

The RFC also makes a case for getting rid of the special behavior of local in Perl 5, which is treated differently within a regex. However, something very like the local behavior will still be needed for making hypotheses, though the RFC is correct that it's not needed in the typical code assertion. In Perl 6, localization is done with temp, and it will not do the hypothetical variable hack that Perl 5 did. Instead there will be an explicit lvalue modifier, let, which specifically requests that a variable's value be scoped to the success of the current point in the regex. These hypothetical variables actually have much broader use than this RFC suggests.

Perl 5's hardwired use of $^R just translates to an appropriately named hypothetical variable in Perl 6.
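For readers who have not met the Perl 5 behavior being discussed, here is a minimal sketch adapted from the counting idiom described in perlre. It uses Perl 5's (?{ ... }) code blocks, not the Perl 6 assertion syntax, and the sub name kept_iterations is purely illustrative. Inside a regex, local is undone whenever the engine backtracks past it, so the count reflects only the iterations that survived into the final match.

```perl
use strict;
use warnings;

our ($cnt, $res);   # package variables, because local() cannot act on lexicals

# Hypothetical helper: reports how many iterations of the starred group
# were still in effect once the literal "aaaa" tail had matched.
sub kept_iterations {
    my ($text) = @_;
    ($cnt, $res) = (0, undef);
    $text =~ m<
        (?{ $cnt = 0 })                      # initialize the counter
        (
            a
            (?{ local $cnt = $cnt + 1 })     # rolled back if the engine backtracks
        )*
        aaaa
        (?{ $res = $cnt })                   # copy the surviving value out on success
    >x;
    return $res;
}

print kept_iterations('a' x 8), "\n";
```

On a string of eight a's the group first runs eight times, then gives back iterations until the tail can match, which per perlre leaves the count at 4. A let variable in Perl 6 is meant to make this kind of rollback an explicit request rather than a side effect of local.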

RFC 360: Allow multiply matched groups in regexes to return a listref of all matches

I think that parens that can potentially match multiple times will automatically produce a list rather than keeping only the final match. It's not as if we can't tell whether something's inside a quantifier...
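As a point of comparison, the following small Perl 5 sketch shows the behavior this RFC wants to change: a capture group inside a quantifier is overwritten on each iteration, so only the last repetition is left in the capture. The sample line and the sub name last_child are made up for illustration, and the character class [^,\s] is used instead of \S so that the commas are not captured along with the names.

```perl
use strict;
use warnings;

# Hypothetical helper: returns what Perl 5 leaves in $1 after a
# quantified capture group has matched several times.
sub last_child {
    my ($line) = @_;
    my ($last) = $line =~ /children:\s*(?:([^,\s]+)[, ]*)*\n/;
    return $last;
}

# Only the final repetition of the group survives.
print last_child("children: Tom, Dick, Harry\n"), "\n";   # prints "Harry"
```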

Here's the RFC's proposed solution:

    while ($text =~ /name:\s*(.*?)\n\s*
                    children:\s*(?:(?@\S+)[, ]*)*\n\s*
                    favorite\ colors:\s*(?:(?@\S+)[, ]*)*\n/sigx) {
        # now we have:
        #  $1 = "John Abajace";
        #  $2 = ["Tom", "Dick", "Harry"]
        #  $3 = ["red", "green", "blue"]
    }

Apart from the change in behavior of (...) within a quantifier, I have the urge to rewrite this example for several reasons:

  •     The /x and /s flags no longer exist.
  •     The /i and /g flags must be pulled out to the front for visibility.
        (And the /g flag is renamed :e.)
  •     There's now a \h for horizontal whitespace, and \H for the negation
        of that.  (Not that the RFC is incorrect to use \s.)
  •     The negation of \n is now \N.
  •     The : character is now a metacharacter, and so must be backslashed.
  •     Character classes are now represented with <[...]>.
  •     Grouping is now represented with [...].

With these changes, and making better use of whitespace, the sample regex ends up looking like this:

    for ($text =~ m:ie[
                            name             \: \h*   (\N*?)            \n
                        \h* children         \: \h* [ (\S+) <[,\h]>* ]* \n
                        \h* favorite\ colors \: \h* [ (\S+) <[,\h]>* ]* \n
                      ]
          )
    {
             # now we have:
             #  $1 = "John Abajace";
             #  $2 = ["Tom", "Dick", "Harry"]
             #  $3 = ["red", "green", "blue"]
    }

I think in the long run people will find this more readable once they're used to it. Certainly tabularizing the parallelisms will make any typing errors stand out.
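The same tabular layout can be approximated in Perl 5, which may help when comparing the two notations. This is only a sketch under several assumptions: it needs a Perl 5 recent enough to support \h and \N (5.12 or later), the sub name parse_record and the sample record are invented, and because Perl 5 keeps only the last repetition of a quantified capture, each list is captured as one run and split afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper: parses one record and returns
# (name, \@children, \@colors), or an empty list if nothing matches.
sub parse_record {
    my ($text) = @_;
    my ($name, $children, $colors) = $text =~ m{
             name             : \h* (\N*?)                      \n
        \h*  children         : \h* ( (?: [^,\s]+ [,\h]* )* )   \n
        \h*  favorite\ colors : \h* ( (?: [^,\s]+ [,\h]* )* )   \n
    }xi or return;

    # Split each captured run on commas and horizontal whitespace.
    return ($name,
            [ split /[,\h]+/, $children ],
            [ split /[,\h]+/, $colors   ]);
}

my $record = "name: John Abajace\n"
           . "  children: Tom, Dick, Harry\n"
           . "  favorite colors: red, green, blue\n";

my ($name, $kids, $colors) = parse_record($record);
print "$name: @$kids / @$colors\n";
```

The extra split step is exactly the bookkeeping that list-returning parens would make unnecessary.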
