Exegesis 7
by Damian Conway
Editor's note: this document is out of date and remains here for historic interest. See Synopsis 7 for the current design information.
Great floods have flown from simple sources...
When it comes to specifying the data source for each field in a format,
form offers several alternatives as to where that data is placed,
several alternatives as to the order in which that data is extracted, and
an option that lets us control how the data is fitted into each field.
A man may break a word with you, sir...
Whenever a field is passed more data than it can
accommodate in a single line, form is forced to "break" that data somewhere.
If the field in question is W
columns wide, form first squeezes any whitespace (as specified by
the user's :ws option) and then looks at the next W columns of the string.
(Of course, that might actually correspond to fewer than W characters
if the string contains wide characters. However, for the sake of exposition
we'll pretend that all characters are one column wide here.)
form's breaking algorithm then searches for a newline, a carriage
return, any other whitespace character, or a hyphen. If it
finds a newline or carriage return within the first W columns, it
immediately breaks the data string at that point. Otherwise it locates
the last whitespace or hyphen in the first W columns and breaks
the string immediately after that space or hyphen. If it can't find
anywhere suitable to break the string, it breaks it at the (W-1)th
column and appends a hyphen.
So, for example:
$data = "You can play no part but Pyramus;\nfor Pyramus is a sweet-faced man";
print form "|{[[[[[}|",
$data;
prints:
|You can|
|play no|
|part   |
|but    |
|Pyramu-|
|s;     |
|for    |
|Pyramus|
|is a   |
|sweet- |
|faced  |
|man    |
Note the line-breaks after can (at a whitespace), part (after a whitespace), sweet- (after a hyphen), and s; (at a newline). Note too that Pyramus; doesn't fit in the field, so it has to be chopped in two and a hyphen inserted.
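To make those rules concrete, here is a minimal sketch of the default breaking behaviour, written in Python rather than Perl 6 so that it can be run today. It is not form's actual implementation: the names wrap and break_line are invented for illustration, :ws squeezing and wide characters are ignored, and it assumes (as the "You can" line above implies) that a break is also allowed when the character just past the field is whitespace.

```python
def break_line(text, pos, width):
    """Return (chunk, next_pos) for one field line, or None when no data is left."""
    # Drop blanks left over from the previous break.
    while pos < len(text) and text[pos] in " \t":
        pos += 1
    # Look one column past the field so a trailing blank can serve as the break.
    window = text[pos:pos + width + 1]
    if not window:
        return None
    # Rule 1: a newline forces a break at that point.
    newline = window.find("\n")
    if newline != -1:
        return window[:newline].rstrip(), pos + newline + 1
    # Rule 2: whatever remains fits in the field.
    if len(window) <= width:
        return window.rstrip(), pos + len(window)
    # Rule 3: break at the last blank, or just after the last hyphen.
    blank = max(window.rfind(" "), window.rfind("\t"))
    hyphen = window.rfind("-", 0, width)
    if hyphen != -1 and hyphen + 1 > blank:
        return window[:hyphen + 1], pos + hyphen + 1
    if blank != -1:
        return window[:blank].rstrip(), pos + blank + 1
    # Rule 4: nowhere suitable, so chop at column W-1 and append a hyphen.
    return window[:width - 1] + "-", pos + width - 1


def wrap(text, width):
    """Break text into left-justified lines of exactly `width` columns."""
    if width < 2:
        raise ValueError("width must be at least 2")
    # Treat carriage returns like newlines (an assumption of this sketch).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines, pos = [], 0
    while True:
        piece = break_line(text, pos, width)
        if piece is None:
            return lines
        chunk, pos = piece
        lines.append(chunk.ljust(width))


data = "You can play no part but Pyramus;\nfor Pyramus is a sweet-faced man"
for line in wrap(data, 7):
    print("|" + line + "|")
```

Run against the same data and a seven-column field, this sketch reproduces the twelve lines shown above, including the clumsy "Pyramu-" split.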
Of course, this particular style of line-breaking may not be suitable to all
applications, and we might prefer that form use some other algorithm. For
example, if form used the TeX breaking algorithm it would have broken
Pyramus; less clumsily, yielding:
|You can|
|play no|
|part   |
|but    |
|Pyra-  |
|mus;   |
|for    |
|Pyramus|
|is a   |
|sweet- |
|faced  |
|man    |
To support different line-breaking strategies, form provides
the :break option. The :break option's value must be
a closure/subroutine, which will then be called whenever a data string
needs to be broken to fit a particular field width.
That subroutine is passed three arguments: the data
string itself, an integer specifying how wide the field is, and a regex
indicating which (if any) characters are to be squeezed.
It is expected to return a list of two values: a string which is taken
as the "broken" text for the field, and a boolean value indicating
whether or not any data remains after the break (so form knows when
to stop breaking the data string). The subroutine is also expected to
update the .pos of the data string to point immediately after the
break it has imposed.
For example, if we always wanted to break at the exact width of the field (with no hyphens), we could do that with:
sub break_width ($data is rw, $width, $ws) {
    given $data {
        # Treat any squeezed or vertical whitespace as a single character
        # (since they'll subsequently be squeezed to a single space)
        my rule single_char { <$ws> | \v+ | . }

        # Give up if there are no more characters to grab...
        return ("", 0) unless m:cont/ (<single_char><1,$width>) /;

        # Squeeze the resultant substring...
        (my $result = $1) ~~ s:each/ <$ws> | \v+ /\c[SPACE]/;

        # Check for any more data still to come...
        my bool $more = m:cont/ <before: .* \S> /;

        # Return the squeezed substring and the "more" indicator...
        return ($result, $more);
    }
}

print form
    :break(&break_width),
    "|{[[[[[}|",
    $data;
producing:
|You can|
|play no|
|part bu|
|t Pyram|
|us; for|
|Pyramus|
|is a sw|
|eet-fac|
|ed man |
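The calling convention can be modelled outside Perl 6 as well. The following Python sketch mirrors break_width under a few stated assumptions: Python strings have no .pos, so the position is passed in and returned explicitly; the default ws pattern is a guess at a typical :ws setting; and fill is a hypothetical stand-in for form that we assume skips leading whitespace before cutting each line.

```python
import re


def break_width(data, pos, width, ws=r"[ \t]+"):
    """Model of a :break callback: take exactly `width` columns, no hyphens.

    Returns (text, more, new_pos); new_pos stands in for Perl 6's .pos.
    """
    # A squeezable run or a run of line breaks counts as a single column.
    unit = re.compile(r"(?P<gap>%s|[\r\n]+)|." % ws, re.DOTALL)
    columns = []
    while len(columns) < width:
        match = unit.match(data, pos)
        if match is None:  # ran out of data
            break
        # Squeeze each gap to one space; keep ordinary characters as they are.
        columns.append(" " if match.group("gap") else match.group())
        pos = match.end()
    # Is there any non-whitespace data still to come?
    more = re.search(r"\S", data[pos:]) is not None
    return "".join(columns), more, pos


def fill(data, width, breaker):
    """Hypothetical stand-in for form filling one left-justified field."""
    lines, pos, more = [], 0, True
    while more:
        # Assumption: leading whitespace is dropped before each line is cut.
        rest = data[pos:]
        pos += len(rest) - len(rest.lstrip())
        text, more, pos = breaker(data, pos, width)
        lines.append("|" + text.ljust(width) + "|")
    return lines


data = "You can play no part but Pyramus;\nfor Pyramus is a sweet-faced man"
print("\n".join(fill(data, 7, break_width)))
```

Passing the breaker in as an argument is the point of the design: the loop that fills the field never needs to know how the text was cut, only what came back and whether more remains.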
Or we might prefer to break on every single whitespace-separated word:
sub break_word ($data is rw, $width, $ws) {
    given $data {
        # Locate the next word (no longer than $width cols)
        my $found = m:cont/ \s* $?word:=(\S<1,$width>) /;

        # Fail if no more words...
        return ("", 0) unless $found{word};

        # Check for any more data still to come...
        my bool $more = m:cont/ <before: .* \S> /;

        # Otherwise, return broken text and "more" flag...
        return ($found{word}, $more);
    }
}

print form
    :break(&break_word),
    "|{[[[[[}|",
    $data;
producing:
|You    |
|can    |
|play   |
|no     |
|part   |
|but    |
|Pyramus|
|;      |
|for    |
|Pyramus|
|is     |
|a      |
|sweet-f|
|aced   |
|man    |
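The word-at-a-time strategy is just as easy to model. This Python sketch is an illustration rather than a port: as before, the position is threaded through explicitly because Python strings carry no .pos, and the one_word_per_line driver is a hypothetical substitute for form.

```python
import re


def break_word(data, pos, width):
    """Return (word, more, new_pos): the next word, at most `width` columns."""
    match = re.compile(r"\s*(\S{1,%d})" % width).match(data, pos)
    if match is None:  # no more words
        return "", False, pos
    more = re.search(r"\S", data[match.end():]) is not None
    return match.group(1), more, match.end()


def one_word_per_line(data, width):
    """Hypothetical driver: keep calling the breaker until it reports no more data."""
    lines, pos, more = [], 0, True
    while more:
        word, more, pos = break_word(data, pos, width)
        if word:
            lines.append("|" + word.ljust(width) + "|")
    return lines


data = "You can play no part but Pyramus;\nfor Pyramus is a sweet-faced man"
print("\n".join(one_word_per_line(data, 7)))
```

Because the pattern accepts at most seven non-space characters, "Pyramus;" and "sweet-faced" are each split across two lines, just as in the output above.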
We'll see yet another application of user-defined breaking when we discuss user-defined fields.
He, being in the vaward, placed behind...
There are (at least) three schools of thought when it comes to setting
out a call to form that uses more than one format. The
"traditional" way (i.e. the way Perl 5 formats do it) is to interleave
each format string with a line containing the data it is to
interpolate, with each datum aligned directly under the field into
which it is to be fitted. Like so:
print form
    "Name: ",
    "  {[[[[[[[[[[[[} ",
       $name,
    "                     Biography: ",
    "Status:              {<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<}",
                          $bio,
    "  {[[[[[[[[[[[[}     {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",
       $status,
    "                     {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",
    "Comments:            {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",
    "  {[[[[[[[[[[[}      {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",
       $comments;
This approach has the advantage that it self-documents: to know what a particular field is supposed to contain, we merely need to look down one line.
It does, however, break up the "abstract picture" that the formats portray, which can make it more difficult to envisage what the final formatted text will look like. So some people prefer to put all the data to the right of the formats:
print form
    "Name: ",
    "  {[[[[[[[[[[[[} ", $name,
    "                     Biography: ",
    "Status:              {<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<}", $bio,
    "  {[[[[[[[[[[[[}     {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}", $status,
    "                     {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",
    "Comments:            {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",
    "  {[[[[[[[[[[[}      {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}", $comments;
And that's perfectly acceptable too.
Sometimes, however, the data to be interpolated doesn't come neatly
pre-packaged in separate variables that are easy to intersperse between the
formats. For example, the data might be a list returned by a
subroutine call (get_info()) or might be stored in a hash
(%person{« name biog stat comm »}). In such
cases it's a nuisance to have to tease that data out into separate
variables (or hash accesses) and then sprinkle them through the formats:
print form
    "Name: ",
    "  {[[[[[[[[[[[[} ", %person{name},
    "                     Biography: ",
    "Status:              {<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<}", %person{biog},
    "  {[[[[[[[[[[[[}     {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}", %person{stat},
    "                     {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",
    "Comments:            {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",
    "  {[[[[[[[[[[[}      {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}", %person{comm};
So form has an option that lets us put a single, multi-line format
at the start of the argument list, place all the data together
after it, and have that data automatically interleaved as necessary.
Not surprisingly, that option is: :interleave. It's normally used in
conjunction with a heredoc, since that's the easiest way to specify a
multi-line string in Perl:
print form :interleave, <<'EOFORMAT',
    Name:
      {[[[[[[[[[[[[}
                         Biography:
    Status:              {<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<}
      {[[[[[[[[[[[[}     {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}
                         {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}
    Comments:            {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}
      {[[[[[[[[[[[}      {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}
    EOFORMAT
    %person{« name biog stat comm »}
When :interleave is in effect, form grabs the first string
argument it's passed and breaks that argument up into individual lines.
It treats those individual lines as a series of distinct formats
and grabs as many of the remaining arguments as are required to
provide data for each format.
Of course, in this example we're also taking advantage of the new indenting behaviour of heredocs. The "Name:", "Status:", and "Comments:" titles are actually at the very beginning of their respective lines, because the start of a Perl 6 heredoc terminator marks the left margin of the entire heredoc string.
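The redistribution that :interleave performs can be sketched in a few lines. The Python below is an illustrative model only, not form's code: interleave is a made-up helper, and its field pattern (anything between a pair of braces) is a simplifying assumption about what counts as a field. It splits the format into lines, counts the fields on each, and deals out that many data arguments after each line.

```python
import re

# Assumption: a field is any run of non-brace characters between braces.
FIELD = re.compile(r"\{[^{}]+\}")


def interleave(fmt, data):
    """Return the flat argument list that an interleaved call stands for."""
    data = list(data)
    args = []
    for line in fmt.splitlines():
        count = len(FIELD.findall(line))  # how many data items this line needs
        args.append(line)
        args.extend(data[:count])
        del data[:count]
    return args


fmt = "Name:\n  {[[[[[[}\nStatus:   {<<<<<<<<}\n  {[[[[[[}  {VVVVVVVV}"
for arg in interleave(fmt, ["$name", "$bio", "$status"]):
    print(repr(arg))
```

Lines without fields (such as the "Name:" title) consume no data, which is why titles and rules can be mixed freely into an interleaved format.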
Would they were multitudes...
It's important to point out that, even when we're using form's
default non-interleaving behaviour, it's still okay to use a format
that spans multiple lines. There is, however, a significant (and useful)
difference in behaviour between the two alternatives.
The normal behaviour of form is to take each format string,
fill in each field in the format with a substring from the
corresponding data source, and then repeat that process until all the
data sources have been exhausted. Which means that a multi-line format
like this:
print form
    <<'EOFORMAT',
    Name:    {[[[[[[[[[[[[[[[}   Role: {[[[[[[[[[[}
    Address: {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}
    _______________________________________________
    EOFORMAT
    @names, @roles, @addresses;
would normally produce this:
Name:    King Lear           Role: Protagonist
Address: The Cliffs, Dover
_______________________________________________
Name:    The Three Witches   Role: Plot devices
Address: Dismal Forest, Scotland
_______________________________________________
Name:    Iago                Role: Villain
Address: Casa d'Otello, Venezia
_______________________________________________
because the entire three-line format is repeatedly filled in as a single unit, line-by-line and datum-by-datum.
On the other hand, if we tell form that it's supposed to automatically
interleave the data coming after the format, like so:
print form :interleave,
    <<'EOFORMAT',
    Name:    {[[[[[[[[[[[[[[[}   Role: {[[[[[[[[[[}
    Address: {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}
    _______________________________________________
    EOFORMAT
    @names, @roles, @addresses;
then the call produces:
Name:    King Lear           Role: Protagonist
Name:    The Three Witches   Role: Plot devices
Name:    Iago                Role: Villain
Address: The Cliffs, Dover
Address: Dismal Forest, Scotland
Address: Casa d'Otello, Venezia
_______________________________________________
because that second version is really equivalent to:
print form
    "Name:    {[[[[[[[[[[[[[[[}   Role: {[[[[[[[[[[}",
    @names, @roles,
    "Address: {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}",
    @addresses,
    "_______________________________________________";
That's not much use in this particular example, but it was exactly what was needed for the biography example earlier. It's just a matter of choosing the right type of data placement to achieve the particular effect we want.
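The difference between the two behaviours comes down to what gets repeated: the whole format, or each line on its own. The following Python sketch simulates both under heavy simplification. The fill and fill_interleaved helpers are hypothetical, only left-justified {[[[} fields are recognised, each field is assumed to have exactly one data source, and every datum is assumed to fit its field.

```python
import re

FIELD = re.compile(r"\{\[+\}")  # only left-justified block fields, e.g. {[[[[}


def fill(fmt, sources):
    """Fill fmt repeatedly, one datum per field per pass, until sources run dry."""
    queues = [list(source) for source in sources]
    lines = []
    while True:
        feed = iter(queues)  # the n-th field reads from the n-th source

        def put(match):
            queue = next(feed)
            text = queue.pop(0) if queue else ""
            return text.ljust(len(match.group()))

        filled = FIELD.sub(put, fmt)
        lines.extend(line.rstrip() for line in filled.splitlines())
        if not any(queues):
            return lines


def fill_interleaved(fmt, sources):
    """Treat each line of fmt as its own format, with its own share of the data."""
    sources = list(sources)
    lines = []
    for line in fmt.splitlines():
        count = len(FIELD.findall(line))
        lines.extend(fill(line, sources[:count]))
        del sources[:count]
    return lines


names = ["King Lear", "The Three Witches", "Iago"]
roles = ["Protagonist", "Plot devices", "Villain"]
addresses = ["The Cliffs, Dover", "Dismal Forest, Scotland", "Casa d'Otello, Venezia"]
fmt = ("Name:    {" + "[" * 15 + "}   Role: {" + "[" * 10 + "}\n"
       "Address: {" + "[" * 36 + "}\n"
       + "_" * 47)

print("\n".join(fill(fmt, [names, roles, addresses])))
print()
print("\n".join(fill_interleaved(fmt, [names, roles, addresses])))
```

The first call repeats the three-line unit once per person; the second exhausts each line's data before moving on, so the rule is printed only once.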

