
Exegesis 7
by Damian Conway

Editor's note: this document is out of date and remains here for historic interest. See Synopsis 7 for the current design information.

Define, define, well-educated infant.

Laziness is, of course, a major virtue. And one of the Laziest approaches to programming is never to repeat oneself. Which is why Perl 6 has subroutines and macros and classes and constants and dozens of other ways for us to factor out commonalities.

Occasionally, the same need for factoring arises in formatting. For example, suppose we want a field that masks its data in some way. Perhaps a field that blanks out certain words by replacing them with the corresponding number of X's.

We could always do that by writing a subroutine that generates the appropriate filter:


    sub expurgate (Str *@hidewords) {
        return sub (Str $data is rw) {
            $data ~~ s:ei/(@hidewords)/$( 'X' x length $1 )/;
            return $data;
        }
    }

We could then apply that subroutine to the data of any field that needed bowdlerization:


    my &censor := expurgate «villain plot libel treacherous murderer false deadly 'G'»;

    print form
        "[Ye following tranfcript hath been cenfored by Order of ye King]\n\n",
        "         {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}",
                  censor($speech);

to produce:


    [Ye following tranfcript hath been cenfored by Order of ye King]

             And therefore, since I cannot prove a lover,  
             To entertain these fair well-spoken days,     
             I am determined to prove a XXXXXXX            
             And hate the idle pleasures of these days.    
             XXXXs have I laid, inductions dangerous,      
             By drunken prophecies, XXXXXs and dreams,     
             To set my brother Clarence and the king       
             In XXXXXX hate the one against the other:     
             And if King Edward be as true and just        
             As I am subtle, XXXXX and XXXXXXXXXXX,        
             This day should Clarence closely be mew'd up, 
             About a prophecy, which says that XXX         
             Of Edward's heirs the XXXXXXXX shall be.
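
For readers who would like to try the closure-generating idea in a language they can run today, here is a minimal sketch in Python. It is not part of form; the names expurgate and censor simply mirror the Perl 6 example above, and the word list is shortened for illustration.

```python
import re

def expurgate(*hidewords):
    """Return a filter that X's out every occurrence of the given words."""
    # Build one case-insensitive alternation. Longer words go first so that
    # a short word never pre-empts a longer one sharing its prefix.
    words = sorted(hidewords, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)

    def censor(data):
        # Replace each match with as many X's as it has characters.
        return pattern.sub(lambda m: "X" * len(m.group(0)), data)

    return censor

censor = expurgate("villain", "plot", "false")
print(censor("I am determined to prove a villain"))
```

As in the original, the word list is captured once by the closure, so the same censor can be applied to any number of data sources.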

Of course, if this were Puritanism and not Perl, we might have a long list of proscribed words that we needed to excise from every formatted text. In that case, rather than explicitly running every data source through the same censorious subroutine, it would be handy if form had a built-in field that did that for us automatically.

Naturally, form doesn't have such a field built-in...but we can certainly give it one.

User-defined field specifiers can be declared using the :field option, which takes as its value an array of pairs. The key of each pair is a string or a rule (i.e. regex) that specifies the syntax of the user-defined field. The value of each pair is a closure/subroutine that constructs a standard field specifier to replace the user-defined specifier. Alternatively, the value of a pair may be a string, which is taken as the (static) field specifier to be used instead of the user-defined field.

In other words, each pair is a macro that maps a user-defined field (specified by the pair's key) onto a standard form field (specified by the pair's value). For example:


    :field[ /\{ X+ \}/ => &censor_field ]

This tells form that whenever it finds a brace-delimited field consisting of one or more X's, it should call a subroutine named censor_field and use the return value of that call instead of the all-X field.

When the key of a :field pair matches some part of a format, its corresponding subroutine is called. That subroutine is passed the result (i.e. $0) of the rule match, as well as the hash of active options for that field. Changes to the options hash will affect the subsequent formatting behaviour of that field.

So censor_field could be implemented like so:


        # Constructor subroutine for user-defined censor fields...
        sub censor_field ($field_spec, %opts) {

            # Set up the field's 'break' option with a censorious break...
            %opts{break} = break_and_censor(%opts{break});

            # Construct a left-justified field with the appropriate width
            # specified imperatively...
            return '{[[{' ~ length($field_spec) ~ '}[[}';
        }
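
The macro-like behaviour of :field pairs can be approximated outside Perl too. The following Python sketch is only an illustration of the idea, not how form is implemented: expand_fields and the "censored" option key are hypothetical names, and applying the pairs one after another is a simplification that could misfire if one pair's output happened to match a later pair's pattern.

```python
import re

def censor_field(field_spec, opts):
    # Hypothetical constructor: record the request in the options and
    # return a left-justified block field with an imperative width equal
    # to the width of the user-defined field it replaces.
    opts["censored"] = True
    return "{[[{" + str(len(field_spec)) + "}[[}"

def expand_fields(fmt, macros, opts):
    """Rewrite each user-defined field in fmt via its (pattern, value) pair."""
    for pattern, replacement in macros:
        def substitute(match, replacement=replacement):
            if callable(replacement):
                # A constructor receives the matched text and the options.
                return replacement(match.group(0), opts)
            return replacement  # a static field specifier
        fmt = re.sub(pattern, substitute, fmt)
    return fmt

macros = [(r"\{X+\}", censor_field)]
opts = {}
print(expand_fields("   {XXXXXXXX}", macros, opts))
```

Passing the options as a mutable dictionary mirrors the description above: whatever the constructor changes is visible to the caller afterwards.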

The censor_field subroutine has to change the field's :break option, creating a new line breaker that also expurgates unsuitable words. To do this it calls break_and_censor, which returns a new line breaker subroutine:


        # Create a new 'break' sub...
        sub break_and_censor (&original_breaker) {
            return sub (*@args) {

                # Call the field's original 'break' sub...
                my ($nextline, $more) = original_breaker(*@args);

                # X out any doubleplus ungood words
                $nextline ~~ s:ei/(@proscribed)/$( 'X' x length $1 )/;

                # Return the "corrected" version...
                return ($nextline, $more);
            }
        }
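
The same wrap-the-breaker pattern can be sketched in Python. Everything here is an assumption made for illustration: simple_breaker is a stand-in with a made-up signature (text and width in, next line and remaining text out), and PROSCRIBED is a shortened word list.

```python
import re

PROSCRIBED = ["villain", "plot", "false"]  # illustrative word list

def simple_breaker(text, width):
    """Hypothetical line breaker: return (next_line, remaining_text)."""
    if len(text) <= width:
        return text, ""
    cut = text.rfind(" ", 0, width + 1)
    if cut <= 0:
        cut = width  # no space to break on, so chop mid-word
    return text[:cut], text[cut:].lstrip()

def break_and_censor(original_breaker):
    """Wrap a breaker so that every line it produces is also expurgated."""
    pattern = re.compile("|".join(map(re.escape, PROSCRIBED)), re.IGNORECASE)

    def breaker(*args):
        # Call the original breaker, then X out any proscribed words
        # in the line it returned. The remainder is left untouched.
        next_line, more = original_breaker(*args)
        next_line = pattern.sub(lambda m: "X" * len(m.group(0)), next_line)
        return next_line, more

    return breaker

breaker = break_and_censor(simple_breaker)
print(breaker("a false plot", 7))
```

Only the line just broken off is censored; the remainder is handled when it, in turn, passes through the wrapped breaker.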

Having created a subroutine to translate censor fields and another to break-and-expurgate the data placed in them, we are now in a position to create a module that encapsulates the new formatting functionality:


    module Ministry::Of::Truth {

        # Internal mechanism (as above)...
        my @proscribed = «villain plot libel treacherous murderer false deadly 'G'»;
        sub break_and_censor (&original_breaker) {...}
        sub censor_field ($field_spec, %opts) {...}

        # Make the new field type standard by default in this scope...
        use Form :field[ /\{ X+ \}/ => &censor_field ];

        # Re-export the specialized &form that was imported above...  
        sub form is exported {...}

    }

Okay, admittedly that's quite a lot of work. But the pay-off is huge: we can now trample on free speech much more easily:


    use Ministry::Of::Truth;

    print form 
        "[Ye following tranfcript hath been cenfored by Order of ye King]\n\n",
        "        {XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX}",
                  $speech;

And we'd get the same carefully XXXX'ed output as before.

Put thyself into the trick of singularity...

User-defined fields are also a handy way to create single-character markers for single-column fields (in order to preserve the one-to-one spacing of a format). For example:


    print form
        :field{ '^' => '{<III{1}III}',   # 1-char-wide, top-justified block
                '=' => '{<=II{1}II=}',   # 1-char-wide, middle-justified block
                '_' => '{<_II{1}II_}',   # 1-char-wide, bottom-justified block
              },
        '~~~~~~~~~',
        '^ _ = _ ^',   *«like round and orient perls»,
        '~~~~~~~~~';

prints:


    ~~~~~~~~~
    l     o p
    i r a r e
    k o n i r
    e u d e l
      n   n s
      d   t  
    ~~~~~~~~~

Note that we needed to use a unary * to flatten the «like round and orient perls» data list. That's because every argument of form is evaluated in scalar context, and an unflattened «...» list in scalar context becomes an array reference, rather than the five separate strings we needed to fill our five single-character fields.
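
To see what the three single-column blocks are doing, here is a small Python sketch that reproduces the layout above by hand. It is not form itself: the column and vertical functions are hypothetical, and rounding the "middle" padding downwards is an assumption that happens to agree with the example.

```python
def column(word, height, align):
    """Pad word with blanks to the given height: top, middle, or bottom."""
    spare = height - len(word)
    above = {"^": 0, "=": spare // 2, "_": spare}[align]
    return " " * above + word + " " * (spare - above)

def vertical(markers, words):
    """Lay each word out as a one-character-wide column."""
    height = max(len(w) for w in words)
    cols = [column(w, height, m) for m, w in zip(markers, words)]
    # Read the padded columns row by row, with one space between columns.
    return [" ".join(col[row] for col in cols) for row in range(height)]

for line in vertical("^_=_^", "like round and orient perls".split()):
    print(line)
```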

Single-column fields are particularly useful for labelling the vertical axis of a graph:


    use Form :field[ '=' => '{<=II{1}II=}' ];

    @vert_label = «Villain's fortunes»;
    $hor_label  = "Time";

    print form 
       '     ^                                        ',
       ' = = | {""""""""""""""""""""""""""""""""""""} ', *@vert_label, @data,
       '     +--------------------------------------->',
       '      {|||||||||||||||||||||||||||||||||||||} ', $hor_label;

which produces:


         ^                                        
         |                                        
     V   |       *                                
     i f |     *   *                              
     l o |    *     *                             
     l r |                                        
     a t |   *       *                            
     i u |                                        
     n n |  *         *                           
     ' e |                                        
     s s |                                        
         |                                        
         | *           *                          
         +--------------------------------------->
                           Time

Specifying these kinds of single-character block markers is perhaps the commonest use of user-defined fields. But the:


    :field[ '=' => '{<=II{1}II=}' ]

syntax is uncomfortably verbose for that purpose. So calls to form can also accept a short-hand notation to define a single-character field:


    :single('=')

or to define several at once:


    :single['#', '*', '+']

The :single option does exactly the same thing as the :field options shown above. It takes a single-character string, or a reference to an array of such strings, as its value. It then turns each of those strings into a single-column field marker. If the character is '=' then the field is vertically "middled" within its block. If the character is '_' then the field is "bottomed" within its block. If the single character is anything else, the resulting block is top-justified. So our previous example could also have been written:


    print form
        :single("="),
        '     ^                                        ',
        ' = = | {""""""""""""""""""""""""""""""""""""} ', *@vert_label, @data,
        '     +--------------------------------------->',
        '      {|||||||||||||||||||||||||||||||||||||} ', $hor_label;
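
The rule that :single applies can be written down as a tiny lookup. The Python sketch below is only a restatement of the description above, using the field specifiers from the earlier :field example; the function names are hypothetical.

```python
def single_to_field(ch):
    """Map a one-character marker to the equivalent single-column field."""
    if len(ch) != 1:
        raise ValueError("a :single marker must be exactly one character")
    if ch == "=":
        return "{<=II{1}II=}"   # vertically "middled" within its block
    if ch == "_":
        return "{<_II{1}II_}"   # "bottomed" within its block
    return "{<III{1}III}"       # anything else: top-justified

def single(chars):
    """Accept one marker or a list of them; return (marker, field) pairs."""
    if isinstance(chars, str):
        chars = [chars]
    return [(ch, single_to_field(ch)) for ch in chars]

print(single("="))
print(single(["#", "*", "+"]))
```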

These paper bullets of the brain...

Bulleted lists of items are a very common feature of reports, but as we saw earlier they're surprisingly hard to get right.

Suppose, for example, we want a list of items bulleted by "diamonds":


    <> A rubber sword (laminated with mylar to
       look suitably shiny).                   
    <> Cotton tights (summer performances).   
    <> Woolen tights (winter performances or  
       those actors who are willing to admit
       to being over 65 years of age).                 
    <> Talcum powder.                         
    <> Codpieces (assorted sizes).            
    <> Singlet.                               
    <> Double.                                
    <> Triplet (Kings and Emperors only).     
    <> Supercilious attitude (optional).

Something like this works well enough:


    for @items -> $item {
        print form
            '<> {<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<}', $item;
            '   {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}';
    }

The first format produces the bullet plus the first line of text for the item, then the second format handles any overflow of the item data.

Alternatively, we could achieve the same result with a single format string by interpolating the bullet as well:


    my $bullet = "<>";

    for @items -> $item {
        print form
            "{''{*}''} {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}",
             $bullet,  $item;
    }

Here we use a single-line starred verbatim field ({''{*}''}), so that the bullet is interpolated "as-is" and the field is only as wide as the bullet itself. Then for the item itself we use a block field, which will format the item data over as many lines as necessary. Meanwhile, because the bullet's field is single-line, after the first line the bullet field will be filled with spaces (instead of a "diamond"), leaving a bullet only on the first line.

This second approach also has the advantage that we could change the bullet string at run-time and the format would adapt automatically.

However, it's still a little irritating that we have to set up a loop and call form separately for each element of @items. After all, if we didn't need to bullet our list we could just write:


    print form
        "{[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}",
        @items;

and form would take care of iterating over the @items for us. It seems that things ought to be that easy for bulleted lists as well.

And, of course, things are that easy.

All we need to do is tell form that whenever the string "<>" appears in a format, it should be treated as a bullet. That is, it should appear only beside the first line of text produced when formatting each element of the adjacent field's data.

To tell form all that we use the :bullet option:


    print form
        :bullet("<>"),
        "<> {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}",
            @items;

or, more permanently:


    use Form :bullet("<>");

    # and later...

    print form
        "<> {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}",
            @items;

The presence of this :bullet option causes form to treat the sequence "<>" as a special field. That special field interpolates the string "<>" when the field immediately to its right begins to format a new data element, but thereafter interpolates only spaces until the adjacent field finishes formatting that data element.

Or, more simply, if we tell form that "<>" is a bullet, form treats it like a bullet that's attached to the very next field.
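
The bullet rule is easy to mimic by hand, which may help make it concrete. The Python sketch below is an approximation under stated assumptions: it uses the standard textwrap module for line breaking (which will not always break where form does), and bulleted is a hypothetical helper, not part of any module described here.

```python
import textwrap

def bulleted(items, bullet="<>", width=38):
    """Show the bullet beside the first line of each item, blanks after."""
    lines = []
    blank = " " * len(bullet)
    for item in items:
        # An empty item still gets a (lonely) bullet line.
        wrapped = textwrap.wrap(item, width) or [""]
        for i, text in enumerate(wrapped):
            lead = bullet if i == 0 else blank
            lines.append(f"{lead} {text}".rstrip())
    return lines

items = ["Cotton tights (summer performances).", "Singlet."]
for line in bulleted(items, "<>", 20):
    print(line)
```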

So we could finally fix our Shakespearean roles example, like so:


    print "The best Shakespearean roles are:\n\n";

    print form
        :bullet("* "),
        "   * {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}   *{[[[[[[[[]]]]]]]]}*",
              @roles,                                 $disclaimer;

This would then produce:


   The best Shakespearean roles are:

      * Either of the 'two foolish             *WARNING:          *
        officers': Dogberry and Verges         *This list of roles*
      * That dour Scot, the Laird              *constitutes      a*
        Macbeth                                *personal   opinion*
      * The tragic Moor of Venice,             *only and is in  no*
        Othello                                *way  endorsed   by*
      * Rosencrantz's good buddy               *Shakespeare'R'Us. *
        Guildenstern                           *It   may   contain*
      * The hideous and malevolent             *nuts.             *
        Richard III                            *                  *

Notice too that the asterisks on either side of the disclaimer aren't treated as bullets. That's because we defined a bullet to be "* ", and neither of the disclaimer asterisks has a space after it.

Bullets can be any string we like, and there can be more than one of them in a single format. For example:


    print form
        :bullet('+'),
        "+ {[[[[[[[[[[[[[[[[[[[…}       + {…[[[[[[[[[[[[[[[[[[[}",
            @items,                         @items;

would print:


    + A rubber sword,                65 years of age).     
      laminated with mylar         + Talcum powder.        
      to look suitably             + Codpieces (assorted   
      shiny.                         sizes).               
    + Cotton tights (summer        + Singlet.              
      performances).               + Double.               
    + Woolen tights (winter        + Triplet (Kings and    
      performances or those          Emperors only).       
      actors who are willing       + Supercilious attitude 
      to admit to being over         (optional).

We can even change bullets in mid-form, which is useful for multi-level bulleting. Of course, in that case we're going to need a loop again, since form itself has only one level of intrinsic looping:


    %categories = (
       Animal    => ["The mighty destrider, ship of the knight",
                     "The patient cat, warden of the granary",
                     "Our beloved king, whom we shall soon have to kill"],
       Vegetable => ["The lovely peony, garland of Eddore",
                     "The mighty oak, from which tiny acorns grow",
                     "The humble cabbage, both food and metaphor for the fool"],
       Mineral   => ["Gold, for which men thirst",
                     "Salt, by which men thirst",
                     "Sand, on which men thirst"],
    );

    for %categories.kv -> $category, @examples {
        print form
            :bullet('*'),  "* {<<<<<<<<<<<<<<<<<<<<<<<<<<<<}",  $category,
            :bullet('-'),  "    - {[[[[[[[[[[[[[[[[[[[[[[[[}",  @examples;
    }

This would produce:


    * Mineral                       
        - Gold, for which men thirst
        - Salt, by which men thirst 
        - Sand, on which men thirst 
    * Animal                        
        - The mighty destrider, ship
          of the knight             
        - The patient cat, warden of
          the granary               
        - Our beloved king, whom we 
          shall soon have to kill   
    * Vegetable                     
        - The lovely peony, garland 
          of Eddore                 
        - The mighty oak, from which
          tiny acorns grow          
        - The humble cabbage, both  
          food and metaphor for the 
          fool
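
The two-level structure (an explicit outer loop, with the inner iteration left to the formatter) can be sketched in Python as well. This is an illustration only: outline is a hypothetical helper, line breaking is delegated to textwrap, and, unlike the Perl hash above, a Python dictionary keeps its insertion order, so the categories come out in the order they were written.

```python
import textwrap

def outline(categories, width=26):
    """Format a two-level bulleted list: '*' for keys, '-' for their items."""
    lines = []
    for category, examples in categories.items():
        lines.append(f"* {category}")
        for example in examples:
            wrapped = textwrap.wrap(example, width) or [""]
            # The dash appears only beside the first line of each example.
            lines.append(f"    - {wrapped[0]}")
            lines.extend(f"      {more}" for more in wrapped[1:])
    return lines

categories = {
    "Mineral": ["Gold, for which men thirst"],
    "Animal": ["The patient cat, warden of the granary"],
}
for line in outline(categories):
    print(line)
```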

All's well that ends...

Report generation was one of Perl's original raisons d'être. Over the years we've found out what format does well, and where its limitations lurk. The new Perl 6 form function aims to preserve format's simple approach to report generation and build on its strengths by adding:

  • independence from the I/O system;
  • run-time specifiable format strings;
  • a wider range of useful field types, including fully justified, verbatim, and overflow fields;
  • the ability to define new field types;
  • sophisticated formatting of numeric/currency data;
  • declarative, imperative, distributive, and extensible field widths;
  • more flexible control of headers, footers, and page layout;
  • control over line-breaking, whitespace squeezing, and filling of empty fields; and
  • support for creating plaintext lists, tables, and graphs.

And because it's now part of a module, rather than a core component, form will be able to evolve more easily to meet the needs of its community. For example, we are currently investigating how we might add facilities for specifying numerical bullets, for formatting text using variable-width fonts, and for outputting HTML instead of plaintext.

If you're a regular user of Perl 5's format you might like to try the form function instead. It's available right now in the Perl6::Form module, which waits upon thy pleasure at the CPAN.