Sign In/My Account | View Cart  
advertisement


Listen Print

Cooking with Perl

by Tom Christiansen, Nathan Torkington
August 21, 2003

Editor's note: The new edition of Perl Cookbook is about to hit store shelves, so to trumpet its release, we offer some recipes--new to the second edition--for your sampling pleasure. This week's excerpts include recipes from Chapter 6 ("Pattern Matching") and Chapter 8 ("File Contents"). And be sure to check back here in the coming weeks for more new recipes on topics such as using SQL without a database server, extracting table data, templating with HTML::Mason, and more.

Sample Recipe: Matching Nested Patterns

Problem

You want to match a nested set of enclosing delimiters, such as the arguments to a function call.

Solution

Use match-time pattern interpolation, recursively:

my $np;
$np = qr{
           \(
           (?:
              (?> [^(  )]+ )    # Non-capture group w/o backtracking
            |
              (??{ $np })     # Group with matching parens
           )*
           \)
        }x;

Or use the Text::Balanced module's extract_bracketed function.

Discussion

The $(??{ CODE }) construct runs the code and interpolates the string that the code returns right back into the pattern. A simple, non-recursive example that matches palindromes demonstrates this:

if ($word =~ /^(\w+)\w?(??{reverse $1})$/ ) {
    print "$word is a palindrome.\n";
}

Consider a word like "reviver", which this pattern correctly reports as a palindrome. The $1 variable contains "rev" partway through the match. The optional word character following catches the "i". Then the code reverse $1 runs and produces "ver", and that result is interpolated into the pattern.

For matching something balanced, you need to recurse, which is a bit tricker. A compiled pattern that uses (??{ CODE }) can refer to itself. The pattern given in the Solution matches a set of nested parentheses, however deep they may go. Given the value of $np in that pattern, you could use it like this to match a function call:

$text = "myfunfun(1,(2*(3+4)),5)";
$funpat = qr/\w+$np/;   # $np as above
$text =~ /^$funpat$/;   # Matches!

You'll find many CPAN modules that help with matching (parsing) nested strings. The Regexp::Common module supplies canned patterns that match many of the tricker strings. For example:

use Regexp::Common;
$text = "myfunfun(1,(2*(3+4)),5)";
if ($text =~ /(\w+\s*$RE{balanced}{-parens=>'(  )'})/o) {
  print "Got function call: $1\n";
}

Other patterns provided by that module match numbers in various notations and quote-delimited strings:

$RE{num}{int}
$RE{num}{real}
$RE{num}{real}{'-base=2'}{'-sep=,'}{'-group=3'}
$RE{quoted}
$RE{delimited}{-delim=>'/'}

The standard (as of v5.8) Text::Balanced module provides a general solution to this problem.

use Text::Balanced qw/extract_bracketed/;
$text = "myfunfun(1,(2*(3+4)),5)";
if (($before, $found, $after)  = extract_bracketed($text, "(")) {
    print "answer is $found\n";
} else {
    print "FAILED\n";
}

See Also

The section on "Match-Time Pattern Interpolation" in Chapter 5, "Pattern Matching," of Programming Perl, 3rd Edition; the documentation for the Regexp::Common CPAN module and the standard Text::Balanced module.

Pages: 1, 2

Next Pagearrow