Exegesis 7
by Damian Conway

Editor's note: this document is out of date and remains here for historic interest. See Synopsis 7 for the current design information.

Great floods have flown from simple sources...

When it comes to specifying the data source for each field in a format, form offers several alternatives as to where that data is placed, several alternatives as to the order in which that data is extracted, and an option that lets us control how the data is fitted into each field.

A man may break a word with you, sir...

Whenever a field is passed more data than it can accommodate in a single line, form is forced to "break" that data somewhere.

If the field in question is W columns wide, form first squeezes any whitespace (as specified by the user's :ws option) and then looks at the next W columns of the string. (Of course, that might actually correspond to fewer than W characters if the string contains wide characters. However, for the sake of exposition we'll pretend that all characters are one column wide here.)

form's breaking algorithm then searches for a newline, a carriage return, any other whitespace character, or a hyphen. If it finds a newline or carriage return within the first W columns, it immediately breaks the data string at that point. Otherwise it locates the last whitespace or hyphen in the first W columns and breaks the string immediately after that space or hyphen. If it can't find anywhere suitable to break the string, it breaks it at the (W-1)th column and appends a hyphen.

So, for example:


    $data = "You can play no part but Pyramus;\nfor Pyramus is a sweet-faced man";

    print form "|{[[[[[}|",
                 $data;

prints:


    |You can|
    |play no|
    |part   |
    |but    |
    |Pyramu-|
    |s;     |
    |for    |
    |Pyramus|
    |is a   |
    |sweet- |
    |faced  |
    |man    |

Note the line-breaks after can (at a whitespace), part (after a whitespace), sweet- (after a hyphen), and s; (at a newline). Note too that Pyramus; doesn't fit in the field, so it has to be chopped in two and a hyphen inserted.
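The default breaking rules described above can be approximated in a few lines of Python. This is only a sketch, not form's actual implementation: the names `break_default` and `wrap` are hypothetical, every character is assumed to be one column wide, the :ws squeeze is replaced by a simple collapse of spaces and tabs, and the rule that a window ending exactly at a word boundary is used whole is an assumption inferred from the sample output.

```python
import re

def break_default(text, pos, width):
    """Return (line, new_pos): the next piece of text that fits in width columns."""
    if width < 2:
        raise ValueError("width must be at least 2 so a hyphen can be appended")
    # Skip blanks left over from the previous break.
    while pos < len(text) and text[pos] in " \t":
        pos += 1
    window = text[pos:pos + width]
    end = pos + width

    # A newline or carriage return inside the window forces a break there.
    hard = re.search(r"[\r\n]", window)
    if hard:
        return window[:hard.start()].rstrip(), pos + hard.start() + 1

    # Assumption: if the rest of the text fits, or the window ends exactly
    # at a word boundary, the whole window is used.
    if end >= len(text):
        return window.rstrip(), len(text)
    if text[end].isspace():
        return window.rstrip(), end + 1

    # Otherwise break just after the last blank or hyphen in the window.
    cut = max(window.rfind(" "), window.rfind("\t"), window.rfind("-"))
    if cut >= 0:
        return window[:cut + 1].rstrip(), pos + cut + 1

    # No break point at all: chop at column W-1 and append a hyphen.
    return window[:width - 1] + "-", pos + width - 1


def wrap(data, width):
    """Break data into a list of lines, each no wider than width."""
    text = re.sub(r"[ \t]+", " ", data)   # stand-in for the :ws squeeze
    pos, lines = 0, []
    while pos < len(text):
        line, pos = break_default(text, pos, width)
        lines.append(line)
    return lines


data = "You can play no part but Pyramus;\nfor Pyramus is a sweet-faced man"
for line in wrap(data, 7):
    print("|" + line.ljust(7) + "|")
```

Run on the Pyramus string with a width of 7, the sketch yields the same twelve lines as the form call above, including the clumsy Pyramu- / s; split.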

Of course, this particular style of line-breaking may not be suitable to all applications, and we might prefer that form use some other algorithm. For example, if form used the TeX breaking algorithm it would have broken Pyramus; less clumsily, yielding:


    |You can|
    |play no|
    |part   |
    |but    |
    |Pyra-  |
    |mus;   |
    |for    |
    |Pyramus|
    |is a   |
    |sweet- |
    |faced  |
    |man    |

To support different line-breaking strategies, form provides the :break option. The :break option's value must be a closure/subroutine, which will then be called whenever a data string needs to be broken to fit a particular field width.

That subroutine is passed three arguments: the data string itself, an integer specifying how wide the field is, and a regex indicating which (if any) characters are to be squeezed. It is expected to return a list of two values: a string which is taken as the "broken" text for the field, and a boolean value indicating whether or not any data remains after the break (so form knows when to stop breaking the data string). The subroutine is also expected to update the .pos of the data string to point immediately after the break it has imposed.

For example, if we always wanted to break at the exact width of the field (with no hyphens), we could do that with:


    sub break_width ($data is rw, $width, $ws) {
        given $data {
            # Treat any squeezed or vertical whitespace as a single character
            # (since they'll subsequently be squeezed to a single space)
            my rule single_char { <$ws> | \v+ | . }

            # Give up if there are no more characters to grab...
            return ("", 0) unless m:cont/ (<single_char><1,$width>) /;

            # Squeeze the resultant substring...
            (my $result = $1) ~~ s:each/ <$ws> | \v+ /\c[SPACE]/;

            # Check for any more data still to come...
            my bool $more = m:cont/ <before: .* \S> /;

            # Return the squeezed substring and the "more" indicator...
            return ($result, $more);
        }
    }
    
    print form
        :break(&break_width),
        "|{[[[[[}|",
          $data;

producing:


    |You can|
    |play no|
    |part bu|
    |t Pyram|
    |us; for|
    |Pyramus|
    |is a sw|
    |eet-fac|
    |ed man |

Or we might prefer to break on every single whitespace-separated word:


    sub break_word ($data is rw, $width, $ws) {
        given $data {
            # Locate the next word (no longer than $width cols)
            my $found = m:cont/ \s* $?word:=(\S<1,$width>) /;

            # Fail if no more words...
            return ("", 0) unless $found{word};

            # Check for any more data still to come...
            my bool $more = m:cont/ <before: .* \S> /;

            # Otherwise, return broken text and "more" flag... 
            return ($found{word}, $more);
        }
    }
    
    print form
        :break(&break_word),
        "|{[[[[[}|",
          $data;

producing:


    |You    |
    |can    |
    |play   |
    |no     |
    |part   |
    |but    |
    |Pyramus|
    |;      |
    |for    |
    |Pyramus|
    |is     |
    |a      |
    |sweet-f|
    |aced   |
    |man    |
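For readers without a Perl 6 implementation to hand, the :break contract (data string, field width and squeeze pattern in; broken text and a "more" flag out; position advanced past the break) can be mimicked in Python. The sketch below is illustrative only. `Cursor`, `fill_field`, and the default `ws` pattern are hypothetical stand-ins, not part of form, and the driver's habit of skipping leading whitespace before each call is an assumption inferred from the sample outputs above.

```python
import re

VSPACE = r"[\n\r\x0b\x0c]+"   # vertical whitespace, treated as one character


class Cursor:
    """A data string plus a current position, standing in for Perl's .pos."""
    def __init__(self, text):
        self.text = text
        self.pos = 0


def more_to_come(cur):
    """True if any non-whitespace data remains after the current position."""
    return re.search(r"\S", cur.text[cur.pos:]) is not None


def break_width(cur, width, ws):
    """Break at exactly the field width, with no hyphenation."""
    unit = rf"(?:{ws}|{VSPACE}|.)"            # one squeezed run or one character
    m = re.compile(rf"{unit}{{1,{width}}}", re.DOTALL).match(cur.text, cur.pos)
    if not m:
        return "", False
    cur.pos = m.end()
    return re.sub(rf"{ws}|{VSPACE}", " ", m.group()), more_to_come(cur)


def break_word(cur, width, ws):
    """Break after every whitespace-separated word (at most width columns)."""
    m = re.compile(rf"\s*(\S{{1,{width}}})").match(cur.text, cur.pos)
    if not m:
        return "", False
    cur.pos = m.end()
    return m.group(1), more_to_come(cur)


def fill_field(data, width, breaker, ws=r"[ \t]+"):
    """Call the breaker repeatedly until it reports that no data remains."""
    cur, lines, more = Cursor(data), [], True
    while more:
        # Assumption: leading whitespace is skipped before each break.
        cur.pos = re.compile(r"\s*").match(cur.text, cur.pos).end()
        text, more = breaker(cur, width, ws)
        lines.append("|" + text.ljust(width) + "|")
    return lines


data = "You can play no part but Pyramus;\nfor Pyramus is a sweet-faced man"
print("\n".join(fill_field(data, 7, break_width)))
print("\n".join(fill_field(data, 7, break_word)))
```

Passing the position around in an explicit object is just one way to model the "update .pos" requirement; returning the new position alongside the text would work equally well.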

We'll see yet another application of user-defined breaking when we discuss user-defined fields.

He, being in the vaward, placed behind...

There are (at least) three schools of thought when it comes to setting out a call to form that uses more than one format. The "traditional" way (i.e. the way Perl 5 formats do it) is to interleave each format string with a line containing the data it is to interpolate, with each datum aligned directly under the field into which it is to be fitted. Like so:


    print form
        "Name:                                                  ",
        "  {[[[[[[[[[[[[}                                       ",
           $name,
        "                  Biography:                           ",
        "Status:             {<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<}",
                             $bio,
        "  {[[[[[[[[[[[[}    {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",
           $status,
        "                    {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}", 
        "Comments:           {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",
        "  {[[[[[[[[[[[}     {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",
           $comments;

This approach has the advantage that it self-documents: to know what a particular field is supposed to contain, we merely need to look down one line.

It does, however, break up the "abstract picture" that the formats portray, which can make it more difficult to envisage what the final formatted text will look like. So some people prefer to put all the data to the right of the formats:


    print form
        "Name:                                                  ",
        "  {[[[[[[[[[[[[}                                       ", $name,
        "                  Biography:                           ",
        "Status:             {<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<}", $bio,
        "  {[[[[[[[[[[[[}    {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}", $status,
        "                    {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}", 
        "Comments:           {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",
        "  {[[[[[[[[[[[}     {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}", $comments;

And that's perfectly acceptable too.

Sometimes, however, the data to be interpolated doesn't come neatly pre-packaged in separate variables that are easy to intersperse between the formats. For example, the data might be a list returned by a subroutine call (get_info) or might be stored in a hash ( %person{« name biog stat comm »} ). In such cases it's a nuisance to have to tease that data out into separate variables (or hash accesses) and then sprinkle them through the formats:


    print form
        "Name:                                                  ",
        "  {[[[[[[[[[[[[}                                       ",%person{name},
        "                  Biography:                           ",
        "Status:             {<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<}",%person{biog},
        "  {[[[[[[[[[[[[}    {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",%person{stat},
        "                    {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}", 
        "Comments:           {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",
        "  {[[[[[[[[[[[}     {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",%person{comm};

So form has an option that lets us put a single, multi-line format at the start of the argument list, place all the data together after it, and have that data automatically interleaved as necessary. Not surprisingly, that option is: :interleave. It's normally used in conjunction with a heredoc, since that's the easiest way to specify a multi-line string in Perl:


    print form :interleave, <<'EOFORMAT',
           Name:                                                 
             {[[[[[[[[[[[[}                                      
                             Biography:                          
           Status:             {<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<}
             {[[[[[[[[[[[[}    {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}
                               {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}
           Comments:           {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}
             {[[[[[[[[[[[}     {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}
           EOFORMAT
         %person{« name biog stat comm »};

When :interleave is in effect, form grabs the first string argument it's passed and breaks that argument up into individual lines. It treats those individual lines as a series of distinct formats and grabs as many of the remaining arguments as are required to provide data for each format.

Of course, in this example we're also taking advantage of the new indenting behaviour of heredocs. The "Name:", "Status:", and "Comments:" titles are actually at the very beginning of their respective lines, because the start of a Perl 6 heredoc terminator marks the left margin of the entire heredoc string.
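The bookkeeping that :interleave performs (split the format into lines, then hand each line as many data arguments as it has fields to fill) can be sketched in Python as follows. This is a simplified model rather than form's real parser: the `interleave` function and the `FIELD` pattern are hypothetical, a field is taken to be any brace-delimited run, and fields made up entirely of V characters are assumed to be follow-on fields that consume no data of their own, as the biography layouts above suggest.

```python
import re

FIELD = re.compile(r"\{[^{}]*\}")   # simplified: any {...} run is a field


def needs_data(field):
    """Follow-on fields such as {VVVV} continue earlier data and take no argument."""
    return set(field[1:-1]) != {"V"}


def interleave(format_text, data):
    """Pair each line of a multi-line format with the data arguments it needs."""
    remaining = list(data)
    pairs = []
    for line in format_text.splitlines():
        count = sum(1 for field in FIELD.findall(line) if needs_data(field))
        if count > len(remaining):
            raise ValueError("not enough data for format line: " + line)
        pairs.append((line, remaining[:count]))
        remaining = remaining[count:]
    return pairs


FORMAT = "\n".join([
    "Name:",
    "  {[[[[[[}",
    "Status:     {<<<<<<<<<<}",
    "  {[[[[[[}  {VVVVVVVVVV}",
    "            {VVVVVVVVVV}",
])

for line, args in interleave(FORMAT, ["name", "bio", "status"]):
    print(repr(line), args)
```

Each resulting pair corresponds to one format string followed by its data, which is exactly the hand-interleaved layout shown in the earlier examples.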

Would they were multitudes...

It's important to point out that, even when we're using form's default non-interleaving behaviour, it's still okay to use a format that spans multiple lines. There is, however, a significant (and useful) difference in behaviour between the two alternatives.

The normal behaviour of form is to take each format string, fill in each field in the format with a substring from the corresponding data source, and then repeat that process until all the data sources have been exhausted. Which means that a multi-line format like this:


    print form
         <<'EOFORMAT',
            Name:    {[[[[[[[[[[[[[[[}   Role: {[[[[[[[[[[}
            Address: {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}
            _______________________________________________
            EOFORMAT
         @names, @roles, @addresses;

would normally produce this:


    Name:    King Lear           Role: Protagonist 
    Address: The Cliffs, Dover                     
    _______________________________________________
    Name:    The Three Witches   Role: Plot devices
    Address: Dismal Forest, Scotland               
    _______________________________________________
    Name:    Iago                Role: Villain     
    Address: Casa d'Otello, Venezia               
    _______________________________________________

because the entire three-line format is repeatedly filled in as a single unit, line-by-line and datum-by-datum.

On the other hand, if we tell form that it's supposed to automatically interleave the data coming after the format, like so:


    print form :interleave,
         <<'EOFORMAT',
            Name:    {[[[[[[[[[[[[[[[}   Role: {[[[[[[[[[[}
            Address: {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}
            _______________________________________________
            EOFORMAT
         @names, @roles, @addresses;

then the call produces:


    Name:    King Lear           Role: Protagonist 
    Name:    The Three Witches   Role: Plot devices
    Name:    Iago                Role: Villain     
    Address: The Cliffs, Dover                     
    Address: Dismal Forest, Scotland               
    Address: Casa d'Otello, Venezia               
    _______________________________________________

because that second version is really equivalent to:


    print form
         "Name:    {[[[[[[[[[[[[[[[}   Role: {[[[[[[[[[[}",
                   @names,                   @roles,
         "Address: {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}",
                   @addresses,
         "_______________________________________________";

That's not much use in this particular example, but it was exactly what was needed for the biography example earlier. It's just a matter of choosing the right type of data placement to achieve the particular effect we want.
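The difference in fill order between the two modes can be made concrete with a toy model in Python. The sketch below is not form: `fill_line`, `form_block`, and `form_interleaved` are hypothetical names, every field is simply left-justified and truncated to its own width, no line-breaking is attempted, and repetition stops at the shortest data source.

```python
import re

FIELD = re.compile(r"\{[^{}]*\}")   # simplified: any {...} run is a field


def fill_line(line, values):
    """Replace each field, left to right, with a value padded to the field's width."""
    it = iter(values)
    def pad(match):
        width = len(match.group())
        return str(next(it)).ljust(width)[:width]
    return FIELD.sub(pad, line)


def form_block(lines, sources):
    """Default mode: the whole format is refilled as a unit for each datum."""
    out = []
    for row in zip(*sources):
        values = iter(row)
        for line in lines:
            count = len(FIELD.findall(line))
            out.append(fill_line(line, [next(values) for _ in range(count)]))
    return out


def form_interleaved(lines, sources):
    """Interleaved mode: each line is a separate format with its own data."""
    out, remaining = [], list(sources)
    for line in lines:
        count = len(FIELD.findall(line))
        mine, remaining = remaining[:count], remaining[count:]
        if count == 0:
            out.append(line)            # a literal line is emitted once
        else:
            out.extend(fill_line(line, row) for row in zip(*mine))
    return out


LINES = [
    "Name:    {[[[[[[[[[[[[[[[}   Role: {[[[[[[[[[[}",
    "Address: {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}",
    "_" * 47,
]
NAMES = ["King Lear", "The Three Witches", "Iago"]
ROLES = ["Protagonist", "Plot devices", "Villain"]
ADDRESSES = ["The Cliffs, Dover", "Dismal Forest, Scotland", "Casa d'Otello, Venezia"]

print("\n".join(form_block(LINES, [NAMES, ROLES, ADDRESSES])))
print()
print("\n".join(form_interleaved(LINES, [NAMES, ROLES, ADDRESSES])))
```

The first call repeats the Name/Address/rule group once per person, while the second emits all the Name lines, then all the Address lines, then a single rule, mirroring the two outputs shown above.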
