Apocalypse 5
by Larry Wall

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 05 for the latest information.

Variable Interpretation

As we mentioned earlier, bare scalars match their contents literally. (Use <$var> instead to match a regex defined in $var.) Subscripted arrays and hashes behave just like a scalar as long as the subscripts aren't slices.
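
As a rough analogy in Python (not Perl 6, whose syntax here is historical), the difference between $var and <$var> is whether the variable's contents are escaped before being used as a pattern. The two helper names below are hypothetical, chosen only for this illustration.

```python
import re

def match_literal(var: str, text: str, pos: int = 0) -> bool:
    """Models a bare $var: the contents are escaped, so they match literally."""
    return re.compile(re.escape(var)).match(text, pos) is not None

def match_as_regex(var: str, text: str, pos: int = 0) -> bool:
    """Models <$var>: the contents are compiled as a regex."""
    return re.compile(var).match(text, pos) is not None

var = "a.c"
print(match_literal(var, "abc"))   # False: '.' is a literal dot here
print(match_as_regex(var, "abc"))  # True: '.' matches any character here
```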

If you use a bare array (unsubscripted), it will match if any element of the array matches literally at that point. (A slice of an array or hash also behaves this way.) If you say

    @array = ("^", "$", ".");
    / @array /

it's as if you said

    / \^ | \$ | \. /

But if you slice it like this:

    / @array[0..1] /

it won't match the dot.
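
A small Python sketch of the same idea: escape each element and join the results with |. The helper name literal_alternation is hypothetical, and the Python slice array[0:2] stands in for the inclusive Perl 6 range @array[0..1].

```python
import re

def literal_alternation(items):
    """Hypothetical helper modeling a bare @array: each element is
    escaped so it matches literally, then the elements are OR'ed."""
    return "|".join(re.escape(item) for item in items)

array = ["^", "$", "."]
full = re.compile(literal_alternation(array))          # like / @array /
sliced = re.compile(literal_alternation(array[0:2]))   # like / @array[0..1] /

print(full.pattern)                 # \^|\$|\.
print(bool(full.fullmatch(".")))    # True
print(bool(sliced.fullmatch(".")))  # False: the dot was sliced away
```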

If you want the array to be treated as a set of regex alternatives, enclose it in angle brackets:

    @array = ("^foo$", "^bar$", "^baz$");
    / <@array> /
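
By contrast with the literal case, a rough Python analogue of <@array> joins the elements without escaping them, so each one acts as a pattern. Wrapping each element in a non-capturing group keeps it a self-contained alternative. One assumption worth flagging: Python's $ also matches just before a trailing newline, which may not match Perl 6's end-of-string semantics exactly.

```python
import re

patterns = ["^foo$", "^bar$", "^baz$"]
# Each element is used as a regex (not escaped); the non-capturing
# group keeps every element a self-contained alternative.
alternatives = re.compile("|".join(f"(?:{p})" for p in patterns))

for word in ["foo", "baz", "food"]:
    print(word, bool(alternatives.search(word)))
# foo True
# baz True
# food False
```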

Bare hashes in a regex provide a sophisticated match-via-lookup mechanism. Bare hashes are matched as follows:

  1. Match a key at the current point in the string.
    1a. If the hash has its keymatch property set to some regex, use that regex to match the key.

    1b. Otherwise, use /\w+:/ to match the key.

  2. If a key isn't found at the current position in the string, the match fails.

  3. Otherwise, get the value in the hash corresponding to the matched key.

  4. If there is no entry for that key, the match fails.

  5. If the hash doesn't have a valuematch property, the match succeeds immediately.

  6. Otherwise use the hash's valuematch property (typically itself a regex) to extract the value at the current point in the string.

  7. If no value can be extracted, matching of the hash fails.

  8. If the extracted value is string-equal (eq) to the value stored under the key, matching of the original hash immediately succeeds.

  9. Otherwise, matching of the original hash fails.

So matching a bare hash is equivalent to:

    rule {
        $key := <{ %hash.prop{keymatch} // /\w+:/ }>    # find key
        <( exists %hash{$key} )>                        # if exists
        [ <( not defined %hash.prop{valuematch} )> ::   # done?
            <null>                                      # succeed
        |                                               # else
            $val := <%hash.prop{valuematch}>            # find value
                <( $val eq %hash{$key} )>               # assert eq
        ]
    }
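
The nine steps can also be modeled in Python. This is only a sketch: the keymatch and valuematch properties become ordinary keyword arguments, and the function assumes (a convention of this sketch, not of the Apocalypse) that a valuematch pattern captures the extracted value in a group named value. It returns the position where the match ends, or None on failure.

```python
import re
from typing import Optional

# Default key pattern, standing in for /\w+:/. The ':' there forbids
# backtracking into \w+; taking the single greedy match below has the
# same effect, because a shorter key is never retried.
DEFAULT_KEYMATCH = re.compile(r"\w+")

def match_bare_hash(table: dict, text: str, pos: int = 0,
                    keymatch: Optional[re.Pattern] = None,
                    valuematch: Optional[re.Pattern] = None) -> Optional[int]:
    """Return the end position of a successful match, or None on failure."""
    # Steps 1-2: match a key at the current position, or fail.
    pattern = keymatch if keymatch is not None else DEFAULT_KEYMATCH
    k = pattern.match(text, pos)
    if k is None:
        return None
    key = k.group(0)
    # Steps 3-4: look up the key; a missing entry means failure.
    if key not in table:
        return None
    # Step 5: without a valuematch, matching the key is enough.
    if valuematch is None:
        return k.end()
    # Steps 6-7: extract a value right after the key, or fail.
    v = valuematch.match(text, k.end())
    if v is None:
        return None
    # Steps 8-9: succeed only if the extracted value is string-equal
    # to the value stored under the key.
    return v.end() if v.group("value") == str(table[key]) else None
```

With no valuematch the function only checks that the key exists; supplying one turns it into a key-and-value check, mirroring the role of the valuematch property.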

A typical valuematch might look like:

    rule {
        \s* =\> \s*             # match => 
        $q:=(<["']>)            # match initial quote 
        $0:=( [ \\. | . ]*? )   # return matched value
        $q                      # match trailing quote 
    }
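
For comparison, here is one way to express that valuematch as a Python regular expression. It is an approximation: the group names q and value are this sketch's own, the backreference (?P=q) plays the part of matching $q again, and re.DOTALL is set so that . also matches newlines, as it does in Perl 6.

```python
import re

# Translation of the valuematch rule, piece by piece:
#   \s* =\> \s*           ->  \s*=>\s*
#   $q:=(<["']>)          ->  (?P<q>["'])
#   $0:=( [ \\. | . ]*? ) ->  (?P<value>(?:\\.|.)*?)
#   $q                    ->  (?P=q)
VALUEMATCH = re.compile(r"""\s*=>\s*(?P<q>["'])(?P<value>(?:\\.|.)*?)(?P=q)""",
                        re.DOTALL)

m = VALUEMATCH.match(" => 'it\\'s'")
print(m.group("value"))  # it\'s  (the escaped quote does not end the value)
```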

In essence, the presence or absence of the valuematch property controls whether the hash tries to match only keys, or both keys and values.

A hash may be used inside angles as well. In that case, it finds the key by the same method (steps 1 and 2 above), but always treats the corresponding hash value as a regex (regardless of any properties the hash might have). The parse then continues according to the rule found in the hash. For example, we could parse a set of control structures with:

    rule { <%controls> }

The %controls hash can have keys like "if" and "while" in it. The corresponding entry says how to parse the rest of an if or a while statement. For example:

    %controls = (
        if     => / <condition>      <closure> /,
        unless => / <condition>      <closure> /,
        while  => / <condition>      <closure> /,
        until  => / <condition>      <closure> /,
        for    => / <list_expr>      <closure> /,
        loop   => / <loop_controls>? <closure> /,
    );

So saying:

    <%controls>

is really much as if we'd said:

    [ if     \b <%controls{if}>
    | unless \b <%controls{unless}> 
    | while  \b <%controls{while}>
    | until  \b <%controls{until}> 
    | for    \b <%controls{for}> 
    | loop   \b <%controls{loop}>
    ]

Except that it actually works more like this:

    / $k:=<{ %controls.prop{keymatch} // /\w+:/ }> <%controls{$k}> /

Note that in Perl 6 it's perfectly valid to use // inside an expression embedded in a regex delimited by slashes. That's because a regex is no longer considered a string, so we don't have to find the end of it before we parse it. Since we can parse it in one pass, the expression parser can handle the // when it gets to it without worrying about the outer slash, and the final slash is recognized as the terminator by the regex parser without having to worry about anything the expression parser saw.
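
The dispatch behind <%controls> can be sketched in Python as a table mapping each keyword to a follow-on rule. The rules here are deliberately crude, hypothetical stand-ins for <condition> and <closure> (a single word, then a brace block with no nested braces), and the for entry is left out; the point is only the control flow: match a key with \w+, look it up, then keep parsing with whatever rule the table returns.

```python
import re
from typing import Callable, Dict, Optional

Rule = Callable[[str, int], Optional[int]]  # (text, pos) -> end pos or None

def rule_from(pattern: str) -> Rule:
    """Wrap a regex as a rule that must match at the given position."""
    compiled = re.compile(pattern)
    def rule(text: str, pos: int) -> Optional[int]:
        m = compiled.match(text, pos)
        return m.end() if m else None
    return rule

# Hypothetical toy grammar: <condition> is one word, <closure> is {...}.
COND_CLOSURE = rule_from(r"\s+\w+\s*\{[^}]*\}")
CONTROLS: Dict[str, Rule] = {
    "if": COND_CLOSURE,
    "unless": COND_CLOSURE,
    "while": COND_CLOSURE,
    "until": COND_CLOSURE,
    "loop": rule_from(r"\s*\{[^}]*\}"),
}

def match_controls(text: str, pos: int = 0) -> Optional[int]:
    # Find the key the same way a bare hash does (steps 1 and 2).
    k = re.compile(r"\w+").match(text, pos)
    if k is None or k.group(0) not in CONTROLS:
        return None
    # Treat the stored value as a rule and continue from the key's end.
    return CONTROLS[k.group(0)](text, k.end())

print(match_controls("if x { y }"))    # 10
print(match_controls("iffy x { y }"))  # None: \w+ grabs the whole word
```

Because \w+ consumes the entire word before the lookup, "iffy" never matches the if entry, which gives the same effect as the \b in the expanded alternation.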

A bare subroutine call may be used in a regex, provided it starts with & and uses parentheses around the arguments. The return value of the subroutine is matched literally. The subroutine may have side effects, and may throw an exception to make the match fail.
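
A Python model of this behavior might look like the following. MatchFailure and match_call are hypothetical names invented for the sketch: it calls the routine, matches the string it returns literally at the current position, and treats the exception as an ordinary match failure.

```python
from typing import Callable, Optional

class MatchFailure(Exception):
    """Hypothetical exception a called routine raises to make the match fail."""

def match_call(func: Callable[[], str], text: str, pos: int = 0) -> Optional[int]:
    """Call func and match its return value literally at pos."""
    try:
        literal = func()  # the routine may have side effects...
    except MatchFailure:
        return None       # ...or throw to fail the match
    if text.startswith(literal, pos):
        return pos + len(literal)
    return None

calls = []
def greeting() -> str:
    calls.append("greeting")  # a visible side effect
    return "hello"

print(match_call(greeting, "say hello", 4))  # 9
print(calls)                                 # ['greeting']
```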
