
Exegesis 7
by Damian Conway

Editor's note: this document is out of date and remains here for historic interest. See Synopsis 7 for the current design information.

And mark what way I make...

Obviously, as a call to form builds up each line of its output – extracting data from one or more data arguments and formatting it into the corresponding fields – it needs to keep track of where it's up to in each datum. It does this by progressively updating the .pos of each datum, in exactly the same way as a pattern match does.

And as with a pattern match, by default that updated .pos is only used internally and not preserved after the call to form is finished. So passing a string to form doesn't interfere with any other pattern matching or text formatting that we might subsequently do with that data.

However, sometimes we do want to know how much of our data a call to form managed to extract and format. Or we may want to split a formatting task into several stages, with separate calls to form for each stage. So we need a way of telling form to preserve the .pos information in our data.

But if we want to apply a series of form calls to the same data, we also need to be able to tell form to respect the .pos information of that data – to start extracting from the previously preserved .pos position, rather than from the start of the string.

To achieve both those goals, we use a follow-on field. That is, we use an ordinary field, but mark it as .pos-sensitive with a special notation: Unicode ellipses or ASCII colons at either end. So instead of {<<<<>>>>}, we'd write {…<<<>>>…} or {:<<<>>>:}.

Note that each ellipsis is a single, one-column-wide Unicode HORIZONTAL ELLIPSIS character (\c[2026]), not three separate dots. The connotation of the ellipses is "...then keep on formatting from where you previously left off, remembering there's probably still more to come...". And the colons are the ASCII symbol most like a single-character ellipsis (try tilting your head and squinting).

Follow-on fields are most useful when we want to split a formatting task into distinct stages – or iterations – but still allow the contents of the follow-on field to flow uninterrupted from line to line. For example:


    print "The best Shakespearean roles are:\n\n";

    for @roles -> $role {
        print form "   * {<<<<<<<<<<<<<<<<<<<<<<<<<<<<}   *{…<<<<<<<>>>>>>>…}*",
                         $role,                            $disclaimer;
    }

which produces:


    The best Shakespearean roles are:

       * Macbeth                          *WARNING:          *
       * King Lear                        *This list of roles*
       * Juliet                           *constitutes      a*
       * Othello                          *personal   opinion*
       * Hippolyta                        *only and is in  no*
       * Don John                         *way  endorsed   by*
       * Katerina                         *Shakespeare'R'Us. *
       * Richard                          *It   may   contain*
       * Malvolio                         *nuts.             *
       * Bottom                           *                  *

The multiple calls to form manage to produce a coherent disclaimer because the ellipses in the second field tell each call to start extracting data from $disclaimer at the offset indicated by $disclaimer.pos, and then to update $disclaimer.pos with the final position at which the field extracted data. So the next time form is called, the follow-on field starts extracting from where it left off in the previous call.

Follow-on fields are similar to ^<<<<< fields in a Perl 5 format, except they don't destroy the contents of a data source; they merely change that data source's .pos marker.
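
To make the .pos bookkeeping concrete, here is a minimal sketch in Python rather than Perl 6. It is not how form is implemented: the Datum class and the follow_on function are hypothetical names invented for illustration, and the sketch left-justifies each chunk instead of fully justifying it. It only shows the idea of starting each extraction at the remembered position and then updating that position.

```python
class Datum:
    """Hypothetical stand-in for a string that carries a .pos marker."""
    def __init__(self, text):
        self.text = text
        self.pos = 0  # offset at which the next extraction starts


def follow_on(datum, width):
    """Extract the next run of whole words that fits in `width` columns,
    starting at datum.pos, then move datum.pos past what was extracted."""
    text, pos = datum.text, datum.pos
    words, used = [], 0
    while True:
        # Find the next word, skipping any whitespace in front of it.
        start = pos
        while start < len(text) and text[start].isspace():
            start += 1
        end = start
        while end < len(text) and not text[end].isspace():
            end += 1
        if start == end:          # nothing left to extract
            pos = start
            break
        needed = (end - start) + (1 if words else 0)
        if used + needed > width:
            if not words:         # over-long word: break it to make progress
                end = start + width
                words.append(text[start:end])
                pos = end
            break                 # otherwise leave the word for the next call
        words.append(text[start:end])
        used += needed
        pos = end
    datum.pos = pos               # preserve the position for the next call
    return " ".join(words).ljust(width)


roles = ["Macbeth", "King Lear", "Juliet", "Othello"]
disclaimer = Datum("This list of roles constitutes a personal opinion only.")
for role in roles:
    print("   * " + role.ljust(12) + " *" + follow_on(disclaimer, 18) + "*")
```

Because the position lives in the data object rather than in the formatting call, each pass through the loop picks up the disclaimer where the previous pass stopped, and the text itself is never modified.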


Therefore, put you in your best array...

Data, especially numeric data, is often stored in arrays. So form also accepts arrays as data arguments. It can do so because its parameter list is defined as:


        sub form (Str|Array|Pair *@args is context(Scalar)) {...}

which means that although its arguments may include one or more arrays, each such array argument is nevertheless evaluated in a scalar context, which in Perl 6 produces an array reference.

In other words, array arguments don't get flattened automatically, so form doesn't lose track of where in the argument list one array finishes and the next begins.

Once inside form, each array that was specified as the data source for a field is internally converted to a single string by joining it together with a newline between each element.

The upshot is that, instead of:


    print "The best Shakespearean roles are:\n\n";

    for @roles -> $role {
        print form "   * {<<<<<<<<<<<<<<<<<<<<<<<<<<<<}   *{…<<<<<<<>>>>>>>…}*",
                         $role,                            $disclaimer;
    }

we could just write:


    print "The best Shakespearean roles are:\n\n";

    print form "   * {[[[[[[[[[[[[[[[[[[[[[[[[[[[[}   *{[[[[[[[[]]]]]]]]}*",
                     @roles,                           $disclaimer;

And the array of roles would be internally converted to a single string, with one role per line. Note that we also changed the disclaimer field to a regular block field, so that the entire disclaimer would be formatted. And there was no longer any need for the disclaimer field to be a follow-on field, since the block field would extract and format the entire disclaimer anyway.
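
The array-to-string conversion can be sketched in a few lines of Python. This is an illustration under assumptions, not form's actual code: block_lines and two_columns are hypothetical helpers, the fields are simply left-justified, and the standard textwrap module stands in for form's line-breaking.

```python
import textwrap
from itertools import zip_longest


def block_lines(data, width):
    """Return the lines a left-justified block field might produce."""
    if isinstance(data, (list, tuple)):
        # An array data source becomes one string, one element per line.
        data = "\n".join(str(item) for item in data)
    lines = []
    for paragraph in data.split("\n"):
        # An empty paragraph still occupies one (blank) line.
        lines.extend(textwrap.wrap(paragraph, width) or [""])
    return [line.ljust(width) for line in lines]


def two_columns(left, left_width, right, right_width):
    """Lay two block fields side by side, padding the shorter with blanks."""
    rows = zip_longest(block_lines(left, left_width),
                       block_lines(right, right_width))
    return ["* "
            + (l if l is not None else " " * left_width)
            + " *"
            + (r if r is not None else " " * right_width)
            + "*"
            for l, r in rows]


roles = ["Macbeth", "King Lear", "Juliet"]
for row in two_columns(roles, 10, "It may contain nuts.", 8):
    print(row)
```

Once the array has been joined, the two data sources are treated identically: each is just a string that happens to contain line breaks.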

Note, however, that this block-based approach wouldn't work so well if one of the elements of @roles was too big to fit on a single line. In that case we might end up with something like the following:


   The best Shakespearean roles are:

      * Either of the 'two foolish             *WARNING:          *
      * officers': Dogberry and Verges         *This list of roles*
      * That dour Scot, the Laird              *constitutes      a*
      * Macbeth                                *personal   opinion*
      * The tragic Moor of Venice,             *only and is in  no*
      * Othello                                *way  endorsed   by*
      * Rosencrantz's good buddy               *Shakespeare'R'Us. *
      * Guildenstern                           *It   may   contain*
      * The hideous and malevolent             *nuts.             *
      * Richard III                            *                  *

rather than:


   The best Shakespearean roles are:

      * Either of the 'two foolish             *WARNING:          *
        officers': Dogberry and Verges         *This list of roles*
      * That dour Scot, the Laird              *constitutes      a*
        Macbeth                                *personal   opinion*
      * The tragic Moor of Venice,             *only and is in  no*
        Othello                                *way  endorsed   by*
      * Rosencrantz's good buddy               *Shakespeare'R'Us. *
        Guildenstern                           *It   may   contain*
      * The hideous and malevolent             *nuts.             *
        Richard III                            *                  *

That's because the "*" that's being used as a bullet for the first column is a literal (i.e. mere decoration), and so it will be repeated on every line that is formatted, regardless of whether that line is the start of a new element of @roles or merely the broken-and-wrapped remains of the previous element. Happily, as we shall see later, this particular problem has a simple solution.

Despite these minor complications, array data sources are particularly useful when formatting, especially if the data is known to fit within the specified width. For example:


    print form
        '-------------------------------------------',   
        'Name             Score   Time  | Normalized',   
        '-------------------------------------------',   
        '{[[[[[[[[[[[[}   {III}   {II}  |  {]]].[[} ',
         @name,           @score, @time,   [@score »/« @time];

is a very easy way to produce the table:


    -------------------------------------------
    Name             Score   Time  | Normalized
    -------------------------------------------
    Thomas Mowbray    88      15   |     5.867
    Richard Scroop    54      13   |     4.154
    Harry Percy       99      18   |     5.5

Note the use of the Perl6-ish listwise division (»/«) to produce the array of data for the "Normalized" column.
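
For readers who haven't met the hyper-operators yet, the listwise division simply pairs up corresponding elements of the two arrays and divides each pair. A rough Python equivalent, using the values from the table above (listwise_div is a hypothetical name), would be:

```python
def listwise_div(numerators, denominators):
    """Divide corresponding elements, like Perl 6's listwise division."""
    return [n / d for n, d in zip(numerators, denominators)]


score = [88, 54, 99]
time = [15, 13, 18]
normalized = listwise_div(score, time)
print([round(x, 3) for x in normalized])
```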


More particulars must justify my knowledge...

The most commonly used fields are those that justify their contents: to the left, to the right, to the left and right, or towards the centre.

Left-justified and right-justified fields extract from their data source the largest substring that will fit inside them, push that string to the left or right as appropriate, and then pad the string out to the required field width with spaces (or the nominated fill character).

Centred fields ({>>>><<<<} and {]]]][[[[}) likewise extract as much data as possible, and then pad both sides of it with (near) equal numbers of spaces. If the amount of padding required is not evenly divisible by 2, the one extra space is added after the data.

There is a second syntax for centred fields – a tip-o'-the-hat to Perl 5 formats: {|||||||||} and {IIIIIIII}. This variant also makes it easier to specify centering fields that are only three columns wide: {|} and {I}.

Note, however, that the behaviour of centering fields specified this way is exactly the same in every respect as the bracket-based versions, so we're free to use whichever we prefer.
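
The padding rules for these three kinds of field can be sketched as follows. This is a simplified Python illustration, not form itself: the function names are hypothetical, and the data is merely truncated to the field width, with no attempt to break at word boundaries.

```python
def left(text, width, fill=" "):
    """Push the data to the left and pad on the right."""
    return text[:width].ljust(width, fill)


def right(text, width, fill=" "):
    """Push the data to the right and pad on the left."""
    return text[:width].rjust(width, fill)


def centre(text, width, fill=" "):
    """Pad both sides (nearly) equally; an odd extra column goes after."""
    text = text[:width]
    padding = width - len(text)
    before = padding // 2
    return fill * before + text + fill * (padding - before)


print("[" + left("Puck", 9) + "]")
print("[" + right("Puck", 9) + "]")
print("[" + centre("Puck", 9) + "]")
```

Note that the sketch computes the centring by hand, rather than relying on a library routine, so that the one leftover space always lands after the data, as described above.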

Fully justified fields ({<<<<>>>>} and {[[[[]]]]}) extract a maximal substring and then distribute any padding as evenly as possible into the existing whitespace gaps in that data. For example:


    print form '({<<<<<<<<<>>>>>>>>>>>})',
               "A fellow of infinite jest, of most excellent fancy";

would print:


    (A fellow  of  infinite)

A fully-justified block field ({[[[[]]]]}) does the same across multiple lines, except that the very last line is always left-justified. Hence, this:


    print form '({[[[[[[[[]]]]]]]})',
               "All the world's a stage, And all the men and women merely players.";

would print:


    (All the world's a)
    (stage,  And   all)
    (the men and women)
    (merely players.  )

By the way, with both centred fields ({>>>><<<}) and fully justified fields ({<<<>>>>}), the actual number of left vs right arrows is irrelevant, so long as there is at least one of each.
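
Full justification can be sketched as below. The text only promises that padding is distributed "as evenly as possible"; the rule used here, where any leftover spaces go to the rightmost gaps, is an assumption inferred from the sample outputs shown in this section. The function names are hypothetical, and Python's textwrap module stands in for form's own line-breaking.

```python
import textwrap


def justify_line(line, width):
    """Spread the words of `line` across exactly `width` columns."""
    words = line.split()
    gaps = len(words) - 1
    if gaps < 1:
        return line.ljust(width)   # a single word cannot be stretched
    spaces = width - sum(len(word) for word in words)
    base, extra = divmod(spaces, gaps)
    pieces = []
    for i, word in enumerate(words[:-1]):
        # Assumption: the last `extra` gaps each get one additional space.
        pad = base + (1 if i >= gaps - extra else 0)
        pieces.append(word + " " * pad)
    pieces.append(words[-1])
    return "".join(pieces)


def justify_block(text, width):
    """Justify every line except the last, which is left-justified."""
    lines = textwrap.wrap(text, width)
    if not lines:
        return []
    return ([justify_line(line, width) for line in lines[:-1]]
            + [lines[-1].ljust(width)])


speech = "All the world's a stage, And all the men and women merely players."
for line in justify_block(speech, 17):
    print("(" + line + ")")
```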


What, is't too short?

One special case we need to consider is an empty set of field delimiters:


    form 'ID number: {}'

This specification is treated as a two-column-wide, left-justified block field (since that seems to be the type of two-column-wide field most often required).

Other kinds of two-column (and single-column) fields can also be created using imperative field widths and user-defined fields.
