Exegesis 7
by Damian Conway
February 27, 2004
What a piece of work is Perl 6!
How noble in reason!
How infinite in faculty!
In form how express and admirable!
– W. Shakespeare, "Hamlet" (Perl 6 revision)
Formats are Perl 5's mechanism for creating text templates with fixed-width fields. Those fields are then filled in using values from prespecified package variables. They're a useful tool for generating many types of plaintext reports – the "r" in Perl, if you will.
Unlike Perl 5, Perl 6 doesn't have a format keyword. Or the
associated built-in formatting mechanism. Instead it has a Form.pm
module. And a form function.
Like a Perl 5 format statement, the form function takes a series
of format (or "picture") strings, each of which is immediately
followed by a suitable set of replacement values. It interpolates
those values into the placeholders specified within each picture string,
and returns the result.
The general idea is the same as for Perl's two other built-in string
formatting functions: sprintf and pack. The first argument
represents a template with N placeholders to be filled in, and the
next N arguments are the data that is to be formatted and
interpolated into those placeholders:
$text = sprintf $format_s, $datum1, $datum2, $datum3;
$text = pack $format_p, $datum1, $datum2, $datum3;
$text = form $format_f, $datum1, $datum2, $datum3;
Of course, these three functions use quite different mini-languages to specify the templates they fill in, and all three fill in those templates in quite distinct ways.
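To make the "template plus N data values" pattern concrete, here is a small Perl 5 sketch using the two functions that already exist there. The variable names and sample values are purely illustrative:

```perl
use strict;
use warnings;

my @data = (65, 66, 67);

# sprintf: a template with three %d placeholders, then three data values.
my $via_sprintf = sprintf '%d-%d-%d', @data;

# pack: a template with three "C" (unsigned char) placeholders, same data.
my $via_pack = pack 'C C C', @data;

print "$via_sprintf\n";   # 65-66-67
print "$via_pack\n";      # ABC
```

Same calling shape, same data, entirely different results: each function reads its own mini-language in the first argument.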
Apart from those differences in semantics, form has a syntactic
difference too. With form, after the first N data arguments we're
allowed to put a second format string and its corresponding data, then a
third format and data, and so on:
$text = form $format_f1, $datum1, $datum2, $datum3, $format_f2, $datum4, $format_f3, $datum5;
And if we prettify that function call a little, it becomes obvious that it has
the same basic structure as a Perl 5 format:
form
    $format_f1,
        $datum1, $datum2, $datum3,
    $format_f2,
        $datum4,
    $format_f3,
        $datum5;
But the Perl 6 version is implemented as a vanilla Perl 6 subroutine,
rather than hard-coded into the language with a special keyword and
declaration syntax. In this respect it's rather like Perl 5's
little-known formline function – only much, much better.
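For anyone who hasn't met formline, here is a minimal Perl 5 sketch of it in action. It follows the usual idiom of clearing the global accumulator `$^A`, calling formline, and then reading the accumulator back. The wrapper name `format_row` is made up for this example:

```perl
use strict;
use warnings;

# Hypothetical wrapper: formline appends to the global accumulator $^A,
# so we clear it first, read the result back, and clear it again.
sub format_row {
    my ($name, $age, $id) = @_;
    $^A = '';
    formline('| @<<<<<<< | @||||||||| | @>>>>>>>> |' . "\n", $name, $age, $id);
    my $text = $^A;
    $^A = '';
    return $text;
}

print format_row('Richard', 33, '000003');
# | Richard  |     33     |    000003 |
```

Note that the formatted text only comes back indirectly, via a global variable, which is one of the things a function that simply returns its result avoids.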
So, whereas in Perl 5 we might write:
# Perl 5 code...
our ($name, $age, $ID, $comments);
format STDOUT =
 ===================================
| NAME     |    AGE     | ID NUMBER |
|----------+------------+-----------|
| @<<<<<<< | @||||||||| | @>>>>>>>> |
  $name,     $age,        $ID,
|===================================|
| COMMENTS                          |
|-----------------------------------|
| ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< |~~
  $comments,
 ===================================
.
write STDOUT;
in Perl 6 we could write:
print form
    " =================================== ",
    "| NAME     |    AGE     | ID NUMBER |",
    "|----------+------------+-----------|",
    "| {<<<<<<} | {||||||||} | {>>>>>>>} |",
       $name,     $age,        $ID,
    "|===================================|",
    "| COMMENTS                          |",
    "|-----------------------------------|",
    "| {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[} |",
       $comments,
    " =================================== ";
And both of them would print something like:
 ===================================
| NAME     |    AGE     | ID NUMBER |
|----------+------------+-----------|
| Richard  |     33     |    000003 |
|===================================|
| COMMENTS                          |
|-----------------------------------|
| Talks to self. Seems to be        |
| overcompensating for inferiority  |
| complex rooted in post-natal      |
| materal rejection due to physical |
| handicap (congenital or perhaps   |
| the result of premature birth).   |
| Shows numerous indications of     |
| psychotic (esp. nepocidal)        |
| tendencies. Naturally, subject    |
| gravitated to career in politics. |
 ===================================
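The "format string, then one datum per field, then the next format string" calling convention is easy to mimic in a few lines of Perl 5. The sketch below is a toy, not the real Form.pm: `toy_form` and `justify` are hypothetical names, and it understands only the three single-line fields used above (left-justified, centred, and right-justified). It does no line-wrapping, no block fields, and no options.

```perl
use strict;
use warnings;

# Hypothetical helper: pad $datum to $width according to the field type.
sub justify {
    my ($type, $width, $datum) = @_;
    $datum = substr($datum, 0, $width);          # single-line fields truncate
    my $pad = $width - length $datum;
    return $datum . ' ' x $pad if $type eq '<';  # left-justify
    return ' ' x $pad . $datum if $type eq '>';  # right-justify
    my $left = int($pad / 2);                    # anything else: centre
    return ' ' x $left . $datum . ' ' x ($pad - $left);
}

# Hypothetical toy_form: walk the argument list, treating each format
# string as a template that consumes one following argument per {...} field.
sub toy_form {
    my @args = @_;
    my $text = '';
    while (@args) {
        my $format = shift @args;
        $format =~ s{\{([<>|]+)\}}{
            my $datum = @args ? shift @args : '';
            justify(substr($1, 0, 1), length($1) + 2, $datum);
        }ge;
        $text .= "$format\n";
    }
    return $text;
}

print toy_form(
    '| {<<<<<<} | {||||||||} | {>>>>>>>} |',
        'Richard', 33, '000003',
    '| {<<<<<<} |',
        'Henry',
);
```

The braces count towards the field width, so each field occupies exactly as many columns in the output as it does in the picture string.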
At first glance the Perl 6 version may seem like something of a backwards step – all those extra quotation marks and commas that the Perl 5 format didn't require. But the new formatting interface does have several distinct advantages:
- it uses the standard Perl 6 subroutine call syntax, so we can use the full power of Perl data structures and control flow when setting up formats;
- it delimits every field specification by braces, which allows for a much wider range of field types;
- it removes the special meanings of '@', '^', '~', and '.' in formats, leaving only '{' as special;
- it provides an extension mechanism for creating new field types;
- it greatly simplifies the common task of formatting data into a string (rather than requiring the format data to be written to an output stream);
- it doesn't destroy the contents of data variables when formatting them across multiple lines;
- it's easy to create new formats on-the-fly, rather than being forced to statically declare them at compile-time (or in a run-time string eval);
- it allows calls to form to be nested;
- it supports dynamically computed page headers and footers, which may themselves make use of nested calls to form;
- it doesn't rely on package variables, typeglobs, or a global accumulator;
- it doesn't require a (frequently cryptic) call to the mysterious write function – and hence frees up write to be used as the true opposite of read, should Larry so desire.
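The point about destroying data variables is worth seeing first-hand. In Perl 5, a `^` field chops the text it prints off the front of the variable it was given. Here is a small sketch using formline (the variable names and sample text are only for illustration):

```perl
use strict;
use warnings;

my $comments = 'Talks to self. Seems to be overcompensating for inferiority complex.';
my $original = $comments;          # keep a copy, because...

# ...a ^ field removes whatever it manages to print from its argument.
$^A = '';
formline('| ^<<<<<<<<<<<<<<<<<<< |' . "\n", $comments);
my $first_line = $^A;
$^A = '';

print $first_line;
print "left over: $comments\n";    # shorter than $original
```

After the call, `$comments` holds only the text that didn't fit in the field, which is why Perl 5 code so often has to format a copy rather than the real data.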
Of course, this is Perl, not Puritanism. So those folks who happen to like package variables, global accumulators, and mysterious writes, can still have them. And, if they're particularly nostalgic, they can also get rid of all the quotation marks and commas, and even retain the dot as a format terminator. For example:
sub myster_rite {
    our ($name, $age, $ID, $comments);
    print form :interleave, <<'.'
 ===================================
| NAME     |    AGE     | ID NUMBER |
|----------+------------+-----------|
| {<<<<<<} | {||||||||} | {>>>>>>>} |
|===================================|
| COMMENTS                          |
|-----------------------------------|
| {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[} |
 ===================================
.
        $name, $age, $ID,
        $comments;
}
# and elsewhere in the same package...
($name, $age, $ID, $comments) = get_data();
myster_rite();
($name, $age, $ID, $comments) = get_more_data();
myster_rite();
Let's take a look...
What's in a name?
But before we do, here's a quick run-down of some of the highly arcane technical jargon we'll be using as we talk about formatting:
- Format
- A string that is used as a template for the creation of text. It will contain zero or more fields, usually with some literal characters and whitespace between them.
- Text
- A string that is created by replacing the fields of a format with specific data values. For example, the string that a call to form returns.
- Field
- A fixed-width slot within a format string, into which data will be formatted.
- Data
- A string or numeric value (or an array of such values) that is interpolated into a format, in order to fill in a particular field.
- Single-line field
- A field that interpolates only as much of its corresponding data value as will fit inside it within a single line of text.
- Block field
- A field that interpolates all of its corresponding data value, over a series of text lines – as many as necessary – producing a text block.
- Text block
- A column of newline-separated text lines. A text block is produced when data is formatted into a block field that is too small to contain the data in a single line.
- Column
- The amount of space on an output device required to display one single-width character. One character will occupy one column in most cases, the most obvious exceptions being CJK double-width characters.
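The distinction between characters and columns can be sketched in Perl 5 using the Unicode East_Asian_Width property. This is only a rough estimate under stated assumptions: `display_columns` is a hypothetical helper, it counts Wide and Fullwidth characters as two columns and everything else as one, and it ignores zero-width and combining characters entirely.

```perl
use strict;
use warnings;

# Hypothetical helper: one column per character, plus one extra column
# for each character that Unicode classifies as Wide or Fullwidth.
sub display_columns {
    my ($string) = @_;
    my $wide = () = $string =~
        /[\p{East_Asian_Width=Wide}\p{East_Asian_Width=Fullwidth}]/g;
    return length($string) + $wide;
}

print display_columns('abc'), "\n";                       # three narrow characters
print display_columns("\x{65E5}\x{672C}\x{8A9E}"), "\n";  # three CJK ideographs
```

A fixed-width field has to be measured in columns rather than characters if text like the second string is to line up on the page.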

