Exegesis 7
by Damian Conway
Editor's note: this document is out of date and remains here for historic interest. See Synopsis 7 for the current design information.
And mark what way I make...
Obviously, as a call to form builds up each line of its output
– extracting data from one or more data arguments and
formatting it into the corresponding fields – it needs to keep
track of where it's up to in each datum. It does this by progressively
updating the .pos of each datum, in exactly the same way as a
pattern match does.
And as with a pattern match, by default that updated .pos is only
used internally and not preserved after the call to form is
finished. So passing a string to form doesn't interfere with any
other pattern matching or text formatting that we might
subsequently do with that data.
However, sometimes we do want to know how much of our data a call to form
managed to extract and format. Or we may want to split a formatting task
into several stages, with separate calls to form for each stage.
So we need a way of telling form to preserve the .pos information
in our data.
But if we want to apply a series of form calls to the same data, we also
need to be able to tell form to respect the .pos information
of that data – to start extracting from the previously preserved
.pos position, rather than from the start of the string.
To achieve both those goals, we use a follow-on field. That is, we use
an ordinary field but mark it as .pos-sensitive with a special
notation: Unicode ellipses or ASCII colons at either end. So instead of
{<<<<>>>>}, we'd write
{…<<<>>>…}
or {:<<<>>>:}.
Note that each ellipsis is a single, one-column wide Unicode HORIZONTAL
ELLIPSIS character (\c[2026]), not three separate dots. The
connotation of the ellipses is "...then keep on formatting from where
you previously left off, remembering there's probably still more to
come...". And the colons are the ASCII symbol most like a single
character ellipsis (try tilting your head and squinting).
Follow-on fields are most useful when we want to split a formatting task into distinct stages – or iterations – but still allow the contents of the follow-on field to flow uninterrupted from line to line. For example:
print "The best Shakespearean roles are:\n\n";
for @roles -> $role {
    print form " * {<<<<<<<<<<<<<<<<<<<<<<<<<<<<} *{…<<<<<<<>>>>>>>…}*",
               $role, $disclaimer;
}
which produces:
The best Shakespearean roles are:
* Macbeth *WARNING: *
* King Lear *This list of roles*
* Juliet *constitutes a*
* Othello *personal opinion*
* Hippolyta *only and is in no*
* Don John *way endorsed by*
* Katerina *Shakespeare'R'Us. *
* Richard *It may contain*
* Malvolio *nuts. *
* Bottom * *
The multiple calls to form manage to produce a coherent disclaimer
because the ellipses in the second field tell each call to start
extracting data from $disclaimer at the offset indicated by
$disclaimer.pos, and then to update $disclaimer.pos with
the final position at which the field extracted data. So the next time
form is called, the follow-on field starts extracting from
where it left off in the previous call.
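The bookkeeping described above can be modelled in a few lines. What follows is only a rough Python sketch, not Perl 6 and not how form is actually implemented: the Datum class, the follow_on function and the rule for cutting an over-long word are all assumptions made for illustration. It shows a data source that remembers its own position, so that successive calls resume where the previous one stopped.

```python
import re

class Datum:
    """A string plus a remembered offset, loosely modelling a .pos marker."""
    def __init__(self, text):
        self.text = text
        self.pos = 0

def follow_on(datum, width):
    """Take the next run of whole words that fits in `width` columns,
    starting at datum.pos, then advance datum.pos past what was taken."""
    line, end = "", datum.pos
    for m in re.finditer(r"\S+", datum.text[datum.pos:]):
        candidate = m.group() if not line else line + " " + m.group()
        if len(candidate) > width:
            if not line:  # a single over-long word is simply cut (assumption)
                line = m.group()[:width]
                end = datum.pos + m.start() + width
            break
        line = candidate
        end = datum.pos + m.end()
    datum.pos = end
    return line.ljust(width)

disclaimer = Datum("This list of roles constitutes a personal opinion only")
for role in ["Macbeth", "King Lear", "Juliet"]:
    # Each call resumes where the previous one stopped.
    print(f" * {role:<12} *{follow_on(disclaimer, 18)}*")
```

Because the position lives on the datum rather than inside the formatting call, the loop body needs no extra state of its own.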
Follow-on fields are similar to ^<<<<< fields in a Perl 5 format,
except they don't destroy the contents of a data source; they merely change that
data source's .pos marker.
Therefore, put you in your best array...
Data, especially numeric data, is often stored in arrays.
So form also accepts arrays as data arguments. It can
do so because its parameter list is defined as:
sub form (Str|Array|Pair *@args is context(Scalar)) {...}
which means that although its arguments may include one or more arrays, each such array argument is nevertheless evaluated in a scalar context, which in Perl 6 produces an array reference.
In other words, array arguments don't get flattened automatically, so
form doesn't lose track of where in
the argument list one array finishes and the next begins.
Once inside form, each array that was specified as the data source
for a field is internally converted to a single string by joining it
together with a newline between each element.
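As a rough illustration of that conversion, here is a small Python sketch. It is an analogy only: the normalise_args name is hypothetical, and Python lists stand in for unflattened Perl 6 array references. Each argument stays a separate item, and each list becomes one string with an element per line.

```python
def normalise_args(*args):
    """Turn every data argument into a single string.
    A list or tuple is joined with newlines, one element per line;
    anything else is simply stringified."""
    result = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            result.append("\n".join(str(item) for item in arg))
        else:
            result.append(str(arg))
    return result

roles = ["Macbeth", "King Lear", "Juliet"]
sources = normalise_args(roles, "WARNING: personal opinion only")
print(len(sources))  # still two data sources: the list was not flattened
```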
The upshot is that, instead of:
print "The best Shakespearean roles are:\n\n";
for @roles -> $role {
    print form " * {<<<<<<<<<<<<<<<<<<<<<<<<<<<<} *{…<<<<<<<>>>>>>>…}*",
               $role, $disclaimer;
}
we could just write:
print "The best Shakespearean roles are:\n\n";
print form " * {[[[[[[[[[[[[[[[[[[[[[[[[[[[[} *{[[[[[[[[]]]]]]]]}*",
           @roles, $disclaimer;
And the array of roles would be internally converted to a single string, with one role per line. Note that we also changed the disclaimer field to a regular block field, so that the entire disclaimer would be formatted. And there was no longer any need for the disclaimer field to be a follow-on field, since the block field would extract and format the entire disclaimer anyway.
Note, however, that this block-based approach wouldn't work so well if
one of the elements of @roles was too big to fit on a single line. In
that case we might end up with something like the following:
The best Shakespearean roles are:
* Either of the 'two foolish *WARNING: *
* officers': Dogberry and Verges *This list of roles*
* That dour Scot, the Laird *constitutes a*
* Macbeth *personal opinion*
* The tragic Moor of Venice, *only and is in no*
* Othello *way endorsed by*
* Rosencrantz's good buddy *Shakespeare'R'Us. *
* Guildenstern *It may contain*
* The hideous and malevolent *nuts. *
* Richard III * *
rather than:
The best Shakespearean roles are:
* Either of the 'two foolish *WARNING: *
officers': Dogberry and Verges *This list of roles*
* That dour Scot, the Laird *constitutes a*
Macbeth *personal opinion*
* The tragic Moor of Venice, *only and is in no*
Othello *way endorsed by*
* Rosencrantz's good buddy *Shakespeare'R'Us. *
Guildenstern *It may contain*
* The hideous and malevolent *nuts. *
Richard III * *
That's because the "*" that's being used as a bullet for the first
column is a literal (i.e. mere decoration),
and so it will be repeated on every line that
is formatted, regardless of whether that line is the start of a new
element of @roles or merely the broken-and-wrapped remains of the
previous element. Happily, as we shall see later, this particular
problem has a simple solution.
Despite these minor complications, array data sources are particularly useful when formatting, especially if the data is known to fit within the specified width. For example:
print form
    '-------------------------------------------',
    'Name             Score   Time  | Normalized',
    '-------------------------------------------',
    '{[[[[[[[[[[[[}   {III}   {II}  |  {]]].[[} ',
    @name, @score, @time, [@score »/« @time];
is a very easy way to produce the table:
-------------------------------------------
Name             Score   Time  | Normalized
-------------------------------------------
Thomas Mowbray    88      15   |     5.867
Richard Scroop    54      13   |     4.154
Harry Percy       99      18   |     5.5
Note the use of the Perl6-ish listwise division (»/«)
to produce the array of data for the "Normalized" column.
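For readers unfamiliar with the hyperoperator, the element-by-element division can be pictured with this Python sketch. It is an analogy rather than Perl 6 semantics; the score and time values are taken from the table above, and rounding to three decimal places is an assumption made to mirror the displayed column.

```python
score = [88, 54, 99]
time = [15, 13, 18]

# Divide corresponding elements pairwise, as @score »/« @time does.
normalized = [s / t for s, t in zip(score, time)]

# Round only for display purposes.
rounded = [round(x, 3) for x in normalized]
print(rounded)
```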
More particulars must justify my knowledge...
The most commonly used fields are those that justify their contents: to the left, to the right, to the left and right, or towards the centre.
Left-justified and right-justified fields extract from their data source the largest substring that will fit inside them, push that string to the left or right as appropriate, and then pad the string out to the required field width with spaces (or the nominated fill character).
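The extract-then-pad behaviour can be sketched as follows. This is a simplified Python model, not form itself: the function names are hypothetical, and both breaking only at whitespace and cutting a lone over-long word are assumptions.

```python
def fit(text, width):
    """Largest run of whole words from the start of `text` that fits in `width`."""
    line = ""
    for word in text.split():
        candidate = word if not line else line + " " + word
        if len(candidate) > width:
            break
        line = candidate
    return line or text.strip()[:width]  # assumption: cut a lone over-long word

def left_field(text, width, fill=" "):
    """Push the extracted text to the left and pad on the right."""
    return fit(text, width).ljust(width, fill)

def right_field(text, width, fill=" "):
    """Push the extracted text to the right and pad on the left."""
    return fit(text, width).rjust(width, fill)

print("[" + left_field("A fellow of infinite jest", 10) + "]")
print("[" + right_field("A fellow of infinite jest", 10, ".") + "]")
```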
Centred fields ({>>>><<<<} and {]]]][[[[}) likewise
extract as much data as possible, and then pad both sides of it with
(near) equal numbers of spaces. If the amount of padding required is not
evenly divisible by 2, the one extra space is added after the data.
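The padding rule for an odd amount of space is easy to get backwards, so here is a minimal Python sketch of it. The centre function is a hypothetical helper that assumes the text has already been cut to fit; the split is written out by hand rather than relying on a library centring routine, whose treatment of the odd column may differ.

```python
def centre(text, width, fill=" "):
    """Pad `text` on both sides; if the padding is odd,
    the one extra column goes after the data."""
    padding = max(width - len(text), 0)
    left = padding // 2
    right = padding - left
    return fill * left + text + fill * right

print("[" + centre("88", 5) + "]")
print("[" + centre("15", 4) + "]")
```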
There is a second syntax for centred fields – a tip-o'-the-hat to
Perl 5 formats: {|||||||||} and {IIIIIIII}. This variant also
makes it easier to specify centering fields that are only three columns
wide: {|} and {I}.
Note, however, that the behaviour of centering fields specified this way is exactly the same in every respect as the bracket-based versions, so we're free to use whichever we prefer.
Fully justified fields ({<<<<>>>>} and {[[[[]]]]})
extract a maximal substring and then distribute any padding as evenly as
possible into the existing whitespace gaps in that data. For example:
print form '({<<<<<<<<<>>>>>>>>>>>})',
"A fellow of infinite jest, of most excellent fancy";
would print:
(A fellow of infinite)
A fully-justified block field ({[[[[]]]]}) does the same across
multiple lines, except that the very last line is always left-justified.
Hence, this:
print form '({[[[[[[[[]]]]]]]})',
"All the world's a stage, And all the men and women merely players."
would print:
(All the world's a)
(stage, And all)
(the men and women)
(merely players. )
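A fully justified block can be modelled in Python roughly as below. This is a sketch under stated assumptions, not form's algorithm: the function names are hypothetical, and the text above only says the padding is distributed "as evenly as possible", so sending leftover spaces to the rightmost gaps is a guess.

```python
def wrap_words(text, width):
    """Greedy wrap: return a list of lines, each a list of words."""
    lines, current = [], []
    for word in text.split():
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = []
        current.append(word)
    if current:
        lines.append(current)
    return lines

def justify_line(words, width):
    """Spread the spare columns across the gaps between words."""
    if len(words) == 1:
        return words[0].ljust(width)
    gaps = len(words) - 1
    spaces = width - sum(len(w) for w in words)
    base, extra = divmod(spaces, gaps)
    out = ""
    for i, word in enumerate(words[:-1]):
        # Assumption: any leftover spaces go to the rightmost gaps.
        out += word + " " * (base + (1 if i >= gaps - extra else 0))
    return out + words[-1]

def justify_block(text, width):
    """Fully justify every line except the last, which is left-justified."""
    lines = wrap_words(text, width)
    if not lines:
        return []
    body = [justify_line(words, width) for words in lines[:-1]]
    return body + [" ".join(lines[-1]).ljust(width)]

quote = "All the world's a stage, And all the men and women merely players."
for line in justify_block(quote, 17):
    print("(" + line + ")")
```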
By the way, with both centred fields ({>>>><<<}) and fully
justified fields ({<<<>>>>}), the actual number of
left vs right arrows is irrelevant, so long as there is at least
one of each.
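The rule that only the order of the arrows matters, not their counts, can be expressed as a small classifier. The Python sketch below is illustrative only: classify is a hypothetical helper, it recognises just the field shapes discussed in this section, and treating single-line and block brackets alike is a simplification.

```python
import re

def classify(field):
    """Name the justification of a field such as '{>>><<<}'."""
    body = field[1:-1]                      # drop the surrounding braces
    if body and set(body) <= set("|I"):
        return "centred"                    # the Perl 5 style variant
    arrows = body.replace("[", "<").replace("]", ">")
    if re.fullmatch(r"<+", arrows):
        return "left"
    if re.fullmatch(r">+", arrows):
        return "right"
    if re.fullmatch(r">+<+", arrows):
        return "centred"                    # arrows point inwards
    if re.fullmatch(r"<+>+", arrows):
        return "justified"                  # arrows point outwards
    return "other"

for field in ["{>>>><<<}", "{><<<<<<}", "{<<<>>>>}", "{|}", "{[[[[]]]]}"]:
    print(field, classify(field))
```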
What, is't too short?
One special case we need to consider is an empty set of field delimiters:
form 'ID number: {}'
This specification is treated as a two-column-wide, left-justified block field (since that seems to be the type of two-column-wide field most often required).
Other kinds of two-column (and single-column) fields can also be created using imperative field widths and user-defined fields.

