Exegesis 7
by Damian Conway
Editor's note: this document is out of date and remains here for historic interest. See Synopsis 7 for the current design information.
Define, define, well-educated infant.
Laziness is, of course, a major virtue. And one of the Laziest approaches to programming is never to repeat oneself. Which is why Perl 6 has subroutines and macros and classes and constants and dozens of other ways for us to factor out commonalities.
Occasionally, the same need for factoring arises in formatting. For example, suppose we want a field that masks its data in some way. Perhaps a field that blanks out certain words by replacing them with the corresponding number of X's.
We could always do that by writing a subroutine that generates the appropriate filter:
sub expurgate (Str *@hidewords) {
    return sub (Str $data is rw) {
        $data ~~ s:ei/(@hidewords)/$( 'X' x length $1 )/;
        return $data;
    }
}
We could then apply that subroutine to the data of any field that needed bowdlerization:
my &censor := expurgate «villain plot libel treacherous murderer false deadly 'G'»;
print form
"[Ye following tranfcript hath been cenfored by Order of ye King]\n\n",
" {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}",
censor($speech);
to produce:
[Ye following tranfcript hath been cenfored by Order of ye King]
And therefore, since I cannot prove a lover,
To entertain these fair well-spoken days,
I am determined to prove a XXXXXXX
And hate the idle pleasures of these days.
XXXXs have I laid, inductions dangerous,
By drunken prophecies, XXXXXs and dreams,
To set my brother Clarence and the king
In XXXXXX hate the one against the other:
And if King Edward be as true and just
As I am subtle, XXXXX and XXXXXXXXXXX,
This day should Clarence closely be mew'd up,
About a prophecy, which says that XXX
Of Edward's heirs the XXXXXXXX shall be.
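The filter-generating closure used above can also be sketched outside Perl 6. The following Python version is only an illustration of the idea (a factory that returns a censoring function); it is not part of the Form module, and the longest-first ordering of the alternatives is an assumption added here so that a short word never shadows a longer one:

```python
import re

def expurgate(*hidewords):
    """Return a filter that replaces each hidden word with X's of equal length."""
    # Assumption: try longer words first so a short word never shadows a longer one.
    alternatives = sorted(map(re.escape, hidewords), key=len, reverse=True)
    pattern = re.compile("|".join(alternatives), re.IGNORECASE)

    def censor(data):
        # Each match is replaced by as many X's as it has characters.
        return pattern.sub(lambda m: "X" * len(m.group(0)), data)

    return censor

censor = expurgate("villain", "plot", "libel", "treacherous",
                   "murderer", "false", "deadly", "'G'")
print(censor("I am determined to prove a villain"))
```

As in the Perl 6 original, the word list is captured once by the closure, so the same filter can be reused for any number of data sources.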
Of course, if this were Puritanism and not Perl, we might have a long
list of proscribed words that we needed to excise from every formatted text.
In that case, rather than explicitly running every data
source through the same censorious subroutine, it would be handy if form
had a built-in field that did that for us automatically.
Naturally, form doesn't have such a field built-in...but we
can certainly give it one.
User-defined field specifiers can be declared using the :field option,
which takes as its value an array of pairs. The key of each pair
is a string or a rule (i.e. regex) that specifies the syntax of the
user-defined field. The value of each pair is a closure/subroutine that
constructs a standard field specifier to replace the user-defined
specifier. Alternatively, the value of a pair may be a string, which is
taken as the (static) field specifier to be used instead of the
user-defined field.
In other words, each pair is a macro that maps a user-defined field
(specified by the pair's key) onto a standard form field (specified by
the pair's value). For example:
:field[ /\{ X+ \}/ => &censor_field ]
This tells form that whenever it finds a brace-delimited field consisting
of one or more X's, it should call a subroutine named censor_field and
use the return value of that call instead of the all-X field.
When the key of a :field pair matches some part of a format,
its corresponding subroutine is called. That subroutine is passed
the result (i.e. $0) of the rule
match, as well as the hash of active options for that field. Changes
to the options hash will affect the subsequent formatting behaviour of
that field.
So censor_field could be implemented like so:
# Constructor subroutine for user-defined censor fields...
sub censor_field ($field_spec, %opts) {
    # Set up the field's 'break' option with a censorious break...
    %opts{break} = break_and_censor(%opts{break});
    # Construct a left-justified field with the appropriate width
    # specified imperatively...
    return '{[[{' ~ length($field_spec) ~ '}[[}';
}
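The macro-like expansion that :field performs can be pictured with a small Python sketch. This is not how the Form module is implemented; expand_fields and the "censored" option are hypothetical names invented for illustration. It assumes that each key is either a literal string or a compiled pattern, and that each value is either a replacement string or a callable that receives the matched text and the options:

```python
import re

def expand_fields(fmt, field_macros, opts):
    """Replace each user-defined field in fmt with a standard field specifier."""
    # Keys may be literal strings or compiled patterns.
    compiled = [
        (key if hasattr(key, "match") else re.compile(re.escape(key)), value)
        for key, value in field_macros
    ]
    out, pos = [], 0
    while pos < len(fmt):
        for regex, value in compiled:
            match = regex.match(fmt, pos)
            if match and match.end() > pos:
                # A callable builds the specifier; a string is used as-is.
                out.append(value(match.group(0), opts) if callable(value) else value)
                pos = match.end()
                break
        else:
            # No user-defined field starts here: copy the character through.
            out.append(fmt[pos])
            pos += 1
    return "".join(out)

def censor_field(field_spec, opts):
    # Hypothetical stand-in for replacing the field's 'break' option.
    opts["censored"] = True
    # A left-justified block field with an imperative width.
    return "{[[{" + str(len(field_spec)) + "}[[}"

opts = {}
print(expand_fields("  {XXXXXXXX}", [(re.compile(r"\{X+\}"), censor_field)], opts))
# prints "  {[[{10}[[}"
```

The format is scanned in a single pass, so text produced by one replacement is never re-expanded by another key.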
The censor_field subroutine has to change the field's :break
option, creating a new line breaker that also expurgates unsuitable
words. To do this it calls break_and_censor, which returns a new line
breaker subroutine:
# Create a new 'break' sub...
sub break_and_censor (&original_breaker) {
    return sub (*@args) {
        # Call the field's original 'break' sub...
        my ($nextline, $more) = original_breaker(*@args);
        # X out any doubleplus ungood words
        $nextline ~~ s:ei/(@proscribed)/$( 'X' x length $1 )/;
        # Return the "corrected" version...
        return ($nextline, $more);
    }
}
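The wrapping technique used by break_and_censor (call the original breaker, then post-process the line it returns) is easy to see in isolation. The Python sketch below assumes a much simpler breaker interface than form really uses: simple_breaker is a hypothetical function that takes the remaining text and a width and returns the next line together with whatever text is left over.

```python
import re

PROSCRIBED = ["villain", "plot", "libel", "treacherous",
              "murderer", "false", "deadly"]

def simple_breaker(text, width):
    """Hypothetical breaker: return (next_line, remaining_text)."""
    if len(text) <= width:
        return text, ""
    # Break at the last space that still fits, else mid-word.
    cut = text.rfind(" ", 0, width + 1)
    if cut <= 0:
        cut = width
    return text[:cut], text[cut:].lstrip()

def break_and_censor(original_breaker):
    """Wrap a breaker so that every line it yields is expurgated."""
    pattern = re.compile("|".join(map(re.escape, PROSCRIBED)), re.IGNORECASE)

    def breaker(*args):
        # Call the original breaker, then X out any proscribed words.
        next_line, rest = original_breaker(*args)
        return pattern.sub(lambda m: "X" * len(m.group(0)), next_line), rest

    return breaker

censoring_breaker = break_and_censor(simple_breaker)
print(censoring_breaker("prove a villain and hate", 15))
```

Because only the returned line is rewritten, the wrapped breaker still breaks lines in exactly the same places as the original one.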
Having created a subroutine to translate censor fields and another to break-and-expurgate the data placed in them, we are now in a position to create a module that encapsulates the new formatting functionality:
module Ministry::Of::Truth {
    # Internal mechanism (as above)...
    my @proscribed = «villain plot libel treacherous murderer false deadly 'G'»;
    sub break_and_censor (&original_breaker) {...}
    sub censor_field ($field_spec, %opts) {...}
    # Make the new field type standard by default in this scope...
    use Form :field[ /\{ X+ \}/ => &censor_field ];
    # Re-export the specialized &form that was imported above...
    sub form is exported {...}
}
Okay, admittedly that's quite a lot of work. But the pay-off is huge: we can now trample on free speech much more easily:
use Ministry::Of::Truth;
print form
"[Ye following tranfcript hath been cenfored by Order of ye King]\n\n",
" {XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX}",
$speech;
And we'd get the same carefully XXXX'ed output as before.
Put thyself into the trick of singularity...
User-defined fields are also a handy way to create single-character markers for single-column fields (in order to preserve the one-to-one spacing of a format). For example:
print form
:field{ '^' => '{<III{1}III}', # 1-char-wide, top-justified block
'=' => '{<=II{1}II=}', # 1-char-wide, middle-justified block
'_' => '{<_II{1}II_}', # 1-char-wide, bottom-justified block
},
'~~~~~~~~~',
'^ _ = _ ^', *«like round and orient perls»,
'~~~~~~~~~';
prints:
~~~~~~~~~
l     o p
i r a r e
k o n i r
e u d e l
  n   n s
  d   t
~~~~~~~~~
Note that we needed to use a unary * to flatten the «like round and orient perls»
data list. That's because every argument of form is evaluated in scalar
context, and an unflattened «...» list in scalar context
becomes an array reference, rather than the five separate strings we needed
to fill our five single-character fields.
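To make the top, middle, and bottom justification of those single-column blocks concrete, here is a small Python sketch that lays out one word per marker. It is an illustration only, not the Form module's algorithm; single_column_blocks is a hypothetical name, and the choice to put any odd leftover row below a middle-justified word is an assumption inferred from the output shown above.

```python
def single_column_blocks(fmt, words):
    """Lay out one word per marker as a vertical, one-character-wide block."""
    markers = {"^": "top", "=": "middle", "_": "bottom"}
    height = max(len(word) for word in words)
    remaining = iter(words)
    columns = []
    for ch in fmt:
        if ch in markers:
            word = next(remaining)
            pad = height - len(word)
            # Number of blank rows above the word for each justification.
            above = {"top": 0, "middle": pad // 2, "bottom": pad}[markers[ch]]
            columns.append(" " * above + word + " " * (pad - above))
        else:
            # Literal characters in the format repeat on every row.
            columns.append(ch * height)
    # Transpose the columns into rows, trimming trailing spaces.
    return ["".join(row).rstrip() for row in zip(*columns)]

for row in single_column_blocks("^ _ = _ ^", ["like", "round", "and", "orient", "perls"]):
    print(row)
```

Note that the words are passed here as five separate strings, which is what the unary * achieves in the form call.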
Single-character fields are particularly useful for labelling the vertical axes of a graph:
use Form :field[ '=' => '{<=II{1}II=}' ];
@vert_label = «Villain's fortunes»;
$hor_label = "Time";
print form
' ^ ',
' = = | {""""""""""""""""""""""""""""""""""""} ', *@vert_label, @data,
' +--------------------------------------->',
' {|||||||||||||||||||||||||||||||||||||} ', $hor_label;
which produces:
^
|
V | *
i f | * *
l o | * *
l r |
a t | * *
i u |
n n | * *
' e |
s s |
|
| * *
+--------------------------------------->
Time
Specifying these kinds of single-character block markers is perhaps the commonest use of user-defined fields. But the:
:field[ '=' => '{<=II{1}II=}' ]
syntax is uncomfortably verbose for that purpose. So calls to
form can also accept a short-hand notation to define a
single-character field:
:single('=')
or to define several at once:
:single['#', '*', '+']
The :single option does exactly the same thing as the :field options
shown above. It takes a single-character string, or a reference
to an array of such strings, as its value. It then turns each of those
strings into a single-column field marker. If the character is '='
then the field is vertically "middled" within its block. If the
character is '_' then the field is "bottomed" within its block. If
the single character is anything else, the resulting block is top-justified.
So our previous example could also have been written:
print form
:single("="),
' ^ ',
' = = | {""""""""""""""""""""""""""""""""""""} ', *@vert_label, @data,
' +--------------------------------------->',
' {|||||||||||||||||||||||||||||||||||||} ', $hor_label;
These paper bullets of the brain...
Bulleted lists of items are a very common feature of reports, but as we saw earlier they're surprisingly hard to get right.
Suppose, for example, we want a list of items bulleted by "diamonds":
<> A rubber sword (laminated with mylar to
   look suitably shiny).
<> Cotton tights (summer performances).
<> Woolen tights (winter performances or
   those actors who are willing to admit
   to being over 65 years of age).
<> Talcum powder.
<> Codpieces (assorted sizes).
<> Singlet.
<> Double.
<> Triplet (Kings and Emperors only).
<> Supercilious attitude (optional).
Something like this works well enough:
for @items -> $item {
    print form
        '<> {<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<}', $item,
        '   {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}';
}
The first format produces the bullet plus the first line of text for the item, then the second format handles any overflow of the item data.
Alternatively, we could achieve the same result with a single format string by interpolating the bullet as well:
my $bullet = "<>";
for @items -> $item {
    print form
        "{''{*}''} {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}",
        $bullet, $item;
}
Here we use a single-line starred verbatim field ({''{*}''}),
so that the bullet is interpolated "as-is" and the field
is only as wide as the bullet itself.
Then for the item itself we use a block field, which will format the item
data over as many lines as necessary. Meanwhile, because the bullet's
field is single-line, after the first line the bullet field will be
filled with spaces (instead of a "diamond"), leaving a bullet only on
the first line.
This second approach also has the advantage that we could change the bullet string at run-time and the format would adapt automatically.
However, it's still a little irritating that we have to set up a loop and
call form separately for each element of @items. After all, if we
didn't need to bullet our list we could just write:
print form
"{[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}",
@items;
and form would take care of iterating over the @items for us. It
seems that things ought to be that easy for bulleted lists as well.
And, of course, things are that easy.
All we need to do is tell form that whenever the string "<>"
appears in a format, it should be treated as a bullet. That is, it should
appear only beside the first line of text produced when formatting each
element of the adjacent field's data.
To tell form all that we use the :bullet option:
print form
:bullet("<>"),
"<> {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}",
@items;
or, more permanently:
use Form :bullet("<>");
# and later...
print form
"<> {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}",
@items;
The presence of this :bullet option causes form to treat the sequence
"<>" as a special field. That special field interpolates the
string "<>" when the field immediately to its right begins to
format a new data element, but thereafter interpolates only spaces until
the adjacent field finishes formatting that data element.
Or, more simply, if we tell form that "<>" is a bullet,
form treats it like a bullet that's attached to the very next field.
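The behaviour described above (bullet beside the first line of each item, spaces beside every later line) can be sketched in a few lines of Python. This is only an approximation built on the standard textwrap module; the bulleted function is hypothetical, the 39-column width is an assumption, and form's own line-breaking rules may differ from textwrap's.

```python
import textwrap

def bulleted(items, bullet="<>", width=39):
    """Wrap each item, showing the bullet only beside its first line."""
    lines = []
    # Later lines of an item get spaces where the bullet would be.
    indent = " " * (len(bullet) + 1)
    for item in items:
        wrapped = textwrap.wrap(item, width) or [""]
        lines.append(f"{bullet} {wrapped[0]}")
        lines.extend(indent + line for line in wrapped[1:])
    return lines

items = [
    "Cotton tights (summer performances).",
    "Woolen tights (winter performances or those actors who are "
    "willing to admit to being over 65 years of age).",
    "Talcum powder.",
]
print("\n".join(bulleted(items)))
```

As with :bullet, the caller passes the whole list once and the iteration over items happens inside the formatting routine.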
So we could finally fix our Shakespearean roles example, like so:
print "The best Shakespearean roles are:\n\n";
print form
:bullet("* "),
" * {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[} *{[[[[[[[[]]]]]]]]}*",
@roles, $disclaimer;
This would then produce:
The best Shakespearean roles are:
* Either of the 'two foolish *WARNING: *
officers': Dogberry and Verges *This list of roles*
* That dour Scot, the Laird *constitutes a*
Macbeth *personal opinion*
* The tragic Moor of Venice, *only and is in no*
Othello *way endorsed by*
* Rosencrantz's good buddy *Shakespeare'R'Us. *
Guildenstern *It may contain*
* The hideous and malevolent *nuts. *
Richard III * *
Notice too that the asterisks on either side of the disclaimer aren't
treated as bullets. That's because we defined a bullet to be "* ", and
neither of the disclaimer asterisks has a space after it.
Bullets can be any string we like, and there can be more than one of them in a single format. For example:
print form
:bullet('+'),
"+ {[[[[[[[[[[[[[[[[[[[…} + {…[[[[[[[[[[[[[[[[[[[}",
@items, @items;
would print:
+ A rubber sword, 65 years of age).
laminated with mylar + Talcum powder.
to look suitably + Codpieces (assorted
shiny. sizes).
+ Cotton tights (summer + Singlet.
performances). + Double.
+ Woolen tights (winter + Triplet (Kings and
performances or those Emperors only).
actors who are willing + Supercilious attitude
to admit to being over (optional).
We can even change bullets in mid-form, which is useful for
multi-level bulleting. Of course, in that case we're going
to need a loop again, since form itself has only
one level of intrinsic looping:
%categories = (
Animal => ["The mighty destrider, ship of the knight",
"The patient cat, warden of the granary",
"Our beloved king, whom we shall soon have to kill"],
Vegetable => ["The lovely peony, garland of Eddore",
"The mighty oak, from which tiny acorns grow",
"The humble cabbage, both food and metaphor for the fool"],
Mineral => ["Gold, for which men thirst",
"Salt, by which men thirst",
"Sand, on which men thirst"],
);
for %categories.kv -> $category, @examples {
    print form
        :bullet('*'), "* {<<<<<<<<<<<<<<<<<<<<<<<<<<<<}", $category,
        :bullet('-'), " - {[[[[[[[[[[[[[[[[[[[[[[[[}", @examples;
}
This would produce:
* Mineral
- Gold, for which men thirst
- Salt, by which men thirst
- Sand, on which men thirst
* Animal
- The mighty destrider, ship
of the knight
- The patient cat, warden of
the granary
- Our beloved king, whom we
shall soon have to kill
* Vegetable
- The lovely peony, garland
of Eddore
- The mighty oak, from which
tiny acorns grow
- The humble cabbage, both
food and metaphor for the
fool
All's well that ends...
Report generation was one of Perl's original raisons d'être. Over the
years we've found out what format does well, and where its
limitations lurk. The new Perl 6 form function aims to preserve
format's simple approach to report generation and build on its strengths
by adding:
- independence from the I/O system;
- run-time specifiable format strings;
- a wider range of useful field types, including fully justified, verbatim, and overflow fields;
- the ability to define new field types;
- sophisticated formatting of numeric/currency data;
- declarative, imperative, distributive, and extensible field widths;
- more flexible control of headers, footers, and page layout;
- control over line-breaking, whitespace squeezing, and filling of empty fields; and
- support for creating plaintext lists, tables, and graphs.
And because it's now part of a module, rather than a core component,
form will be able to evolve more easily to meet the needs of its
community. For example, we are currently investigating how we might
add facilities for specifying numerical bullets, for formatting
text using variable-width fonts, and for outputting HTML instead
of plaintext.
If you're a regular user of Perl 5's format you might like to try the
form function instead. It's available right now in the Perl6::Form
module, which waits upon thy pleasure at the CPAN.