Reverse Callback Templating

Programmers have long recognized that separating code logic from presentation is good. The Perl community has produced many fine systems for doing just this. While there are many systems, they largely fall within two execution models, pipeline and callback (as noted by Perrin Harkins in Choosing a Templating System). HTML::Template and Template Toolkit are in the pipeline category. Their templates consist of simple presentation logic in the form of loops and conditionals and template variables. The Perl program does its work, then loads and renders the appropriate template, as if data were flowing through a pipeline. Mason and Embperl fall into the callback category. They mix code in with the template markup, and the template "calls back" to Perl when it encounters program logic.

A third execution model exists: the reverse callback model. Template and code files are separate, just as in the pipeline approach. Instead of using a mini-language to handle display logic, however, the template consists of named sections. At the appropriate time, the Perl code calls a specific section of the template and renders it. Effectively, this is the opposite of the callback method, which wraps Perl logic around portions (or sections) of a template in a single file. Reverse callback uses Perl statements to load, or call, specific portions of the template. This approach has a few distinct advantages.

A Reverse Callback Example

Suppose that you have a simple data structure you are dying to output as pretty HTML.

my @goods = (
    "oxfords,Brown leather,\$85,0",
    "hiking,All sizes,\$55,7",
    "tennis shoes,Women's sizes,\$35,15",
    "flip flops,Colors of the rainbow,\$7,90"
    );

First, you need an HTML template with the appropriate sections defined. Sections are of vital importance; they enable Template::Recall to keep the logic squarely in the code. Template::Recall uses the default pattern /\[\s*=+\s*(\w+)\s*=+\s*\]/ (to match, for example, [==== section_name ====]) to determine sections in a single file. The start of one section denotes the end of another. This is because Template::Recall uses a split() operation based on the above regex, saving the captured \w+ as the section key in an internal data structure.

[ =================== header ===================]

<html>
<head>
    <title>my site - [' title ']</title>
</head>
<body>

<h4>The date is [' date ']</h4>



<table border="1">

    <tr>
        <th>Shoe</th>
        <th>Details</th>
        <th>Price</th>
        <th>Quantity</th>
    </tr>

[ =================== product_row =================== ]
    <tr>
        <td>[' shoe ']</td>
        <td>[' details ']</td>
        <td>[' price ']</td>
        <td>[' quantity ']</td>
    </tr>


[= footer =]
</table>

</body>
</html>

This template is quite simple. It has three sections, a "header," "product_row," and "footer." The sections essentially give away how the program logic is going to work. A driver program would call header and footer only once during program execution (start and end, respectively). product_row will be called multiple times during iteration over an array.

Names contained within the delimiters [' and '] are template variables for replacement during rendering. For example, [' date '] will be replaced by the current date when the program executes.
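
Under the hood, the section parsing described above amounts to little more than a split. Here is a minimal, standalone sketch (not Template::Recall's actual implementation) of how a template file could be carved into named sections using that default pattern:

open my $fh, '<', 'template1.html' or die "Cannot open template: $!";
my $text = do { local $/; <$fh> };   # slurp the whole file

# A capturing split returns: leading text, name1, body1, name2, body2, ...
my ($preamble, %section) = split /\[\s*=+\s*(\w+)\s*=+\s*\]/, $text;

print "Found sections: ", join(', ', keys %section), "\n";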

The driver code must first instantiate a new Template::Recall object, $tr, and pass it the path of the template, which I've saved as the file template1.html.

use Template::Recall;

my $tr = Template::Recall->new( template_path => 'template1.html');

With $tr created, the template sections are loaded and ready for use. The obvious first step is to render the header section with the render() method. render() takes the name of the section to process, and optionally, a hash of names and values to replace in that section. There are two template variables in the header section, [' title '] and [' date '], so the call looks like:

print $tr->render( 'header', { title => 'MyStore', date => scalar(localtime) } );

The names used in the hash must match the names of the template variables in the section you intend to render. For example, date => scalar(localtime) means that [' date '] in the header section will be dynamically replaced by the value produced by scalar(localtime).

You probably noticed from the template that the header section created the start of an HTML table. This is a fine time to render @goods as the table's rows.

for my $good (@goods)
{
    my @attr     = split(/,/, $good);
    my $quantity = $attr[3] eq '0' ? 'Out of stock' : $attr[3];

    my %row      = (
        shoe     => $attr[0],
        details  => $attr[1],
        price    => $attr[2],
        quantity => $quantity,
    );

    print $tr->render('product_row', \%row);
}

In actual code, this array would likely come from a database. For each row, the driver makes necessary logical decisions (such as displaying "Out of stock" if the quantity equals "0"), then calls $tr->render() to replace the placeholders in the template section with the values from %row.
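
For instance, if the inventory lived in a relational database, the loop would look much the same. Here is a rough sketch using DBI; the connection string, table, and column names are invented for illustration:

use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=store.db', '', '', { RaiseError => 1 });
my $sth = $dbh->prepare('SELECT shoe, details, price, quantity FROM inventory');
$sth->execute();

# Render each database row against the same product_row section.
while ( my $row = $sth->fetchrow_hashref ) {
    $row->{quantity} = 'Out of stock' if $row->{quantity} == 0;
    print $tr->render('product_row', $row);
}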

Finally, the driver renders the footer of the HTML output. There are no template variables to replace, so there's no need to pass in a hash.

print $tr->render('footer');

The result is this nice little output of footwear inventory:

The date is Fri Aug 10 14:22:30 2007

Shoe Details Price Quantity
oxfords Brown leather $85 Out of stock
hiking All sizes $55 7
tennis shoes Women's sizes $35 15
flip flops Colors of the rainbow $7 90

The Logic Is in the Code

What happens if you extend your shoe data slightly, to add categories? For instance, what if @goods looks like:

my @goods = (
    "dress,oxfords,Brown leather,\$85,0",
    "sports,hiking,All sizes,\$55,7",
    "sports,tennis shoes,Women's sizes,\$35,15",
    "recreation,flip flops,Colors of the rainbow,\$7,90"
    );

The output now needs grouping, which implies the use of nested loops. One loop can output the category header -- sports, dress, or recreation shoes -- and another will output the details of each shoe in that category.

To handle this in HTML::Template, you would generally build a nested data structure of anonymous arrays and hashes, and then process it against nested <TMPL_LOOP> directives in the template. With Template::Recall, the logic remains in the code: you build a nested loop structure in Perl that calls the appropriate sections. You can also use a hash, rendering the category sections as keys and the detail sections as values in a single pass, and then output them together using join.

The template needs some modification:

[====== table_start ====]
<table border="1">
[====== category =======]
<tr><td colspan="4"><b>['category']</b></td></tr>
[====== detail ======]
<tr><td>['shoe']</td><td>['detail']</td><td>['price']</td><td>['quantity']</td></tr>
[======= table_end ====]
</table>

This template now has a section called "category," a single table row that spans all columns. The "detail" section is much the same as in the previous example.

my %inventory;

for my $good (@goods) {
    my @attr = split(/,/, $good);
    my $q    = $attr[4] == 0 ? 'Out of stock' : $attr[4];

    $inventory{ $tr->render('category', { category => $attr[0] } ) } .=
        $tr->render('detail',
            {
                shoe     => $attr[1],
                detail   => $attr[2],
                price    => $attr[3],
                quantity => $q,
            } );
}

print $tr->render('table_start') .
    join('', %inventory) .
    $tr->render('table_end');

This loop looks surprisingly similar to the first example, doesn't it? That's because it is. Instead of printing each row, however, this code renders the first column of @goods against the category template section and stores the output as a key in %inventory. In the same iteration, it renders the remaining columns against the detail section and appends the result to the value of that key.

After storing the rendered sections in %inventory this way, the code prints everything with a single statement, using join to concatenate the keys and values of %inventory (each category header followed by its detail rows, in whatever order the hash stores them). The output is:

recreation
flip flops Colors of the rainbow $7 90
sports
hiking All sizes $55 7
tennis shoes Women's sizes $35 15
dress
oxfords Brown leather $85 Out of stock

The code also handles conditional output. Suppose that at your growing online shoe emporium you provide special deals to customers who have bought over a certain dollar amount. As they browse your shoe inventory, these deals appear.

if ( $customer->is_elite ) {
    print $tr->render('special_deals', get_deals('elite') );
}
else {
    print $tr->render('standard_deals', get_deals() );
}

What about producing XML output? That usually requires a separate template. You can conditionally load a .xml or .html template:

my $tr;
if ( $q->param('fmt') eq 'xml' ) {
    $tr = Template::Recall->new( template_path => 'inventory.xml' );
}
else {
    $tr = Template::Recall->new( template_path => 'inventory.html' );
}

Perl provides everything you need to handle model, controller, and view logic. Template::Recall capitalizes on this and helps to make projects code driven.

Template Model Comparison

It's important to note a few things that occurred in these examples -- or failed to occur, rather. First, there's no mixture of code and template markup. All template access occurs through the method call $tr->render(). This is strong separation of concerns (SOC), just like the pipeline model, and unlike the callback model, which mixes template markup and code in the same file. Not only does strong SOC provide good code organization, it also keeps designers from having to sift through code to change markup. Consider using Mason to output the rows of @goods.

% for my $good (@goods) {
%  my @attr     = split(/,/, $good);
%  my $quantity = $attr[3] eq '0' ? 'Out of stock' : $attr[3];
<tr>
<td><% $attr[0] %></td>
<td><% $attr[1] %></td>
<td><% $attr[2] %></td>
<td><% $quantity %></td>
</tr>
% }

This is an efficient approach, and easy enough for a programmer to walk through. It becomes difficult to maintain, though, when designers are involved, if for no other reason than that a designer and a programmer need to access the same file to do their respective work. Design changes and code changes will not always share the same schedule, because they belong to different domains. It also means that in order to switch templates, say to output XML or text (or both), you have to add more and more conditionals and templates to the code, making it increasingly difficult to read.

The other thing that did not occur in this example is the leaking of any kind of logic (presentation or otherwise) into the template. Consider that HTML::Template would have to insert the <TMPL_LOOP> statement in the template in order to output the rows of @goods.

    <TMPL_LOOP NAME="PRODUCT">
    <tr>
    <td><TMPL_VAR NAME=SHOE></td>
    <td><TMPL_VAR NAME=DETAILS></td>
    <td><TMPL_VAR NAME=PRICE></td>
    <td><TMPL_VAR NAME=QUANTITY></td>
    </tr>
    </TMPL_LOOP>

That's not a big deal, really. If you care about line count, this only requires one extra line over the Template::Recall version, and that's the closing tag </TMPL_LOOP>. Nonetheless, the template now states some of the logic for the application. Sure, it's only presentation logic, but it's logic nonetheless. HTML::Template also provides <TMPL_IF> for displaying items conditionally, and <TMPL_INCLUDE> for including other templates. Again, this is logic contained in the template files.

Template::Recall keeps as much logic as possible in the code. If you need to display something conditionally, use Perl's if statement. If you need to include other templates, load them using a Template::Recall object. Whereas the pipeline models likely work better for projects with a fairly sophisticated design team, Template::Recall tries to be the programmer's friend and let him or her steer from the most comfortable place, the code.
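
For instance, a shared fragment such as a site-wide navigation block can live in its own template file and be rendered wherever it is needed. A minimal sketch (the file and section names here are hypothetical):

my $sidebar = Template::Recall->new( template_path => 'sidebar.html' );

print $tr->render( 'header', { title => 'MyStore', date => scalar(localtime) } );
print $sidebar->render( 'nav' );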

There is also a subtle cost to using the pipeline model for a simple loop like that above. Consider this HTML::Template footwear data code:

my $template = HTML::Template->new( filename => 'template1.tmpl' );

my @output;

for my $good (@goods)
{
    my @attr = split(/,/, $good);
    my %row  = (
        SHOE     => $attr[0],
        DETAILS  => $attr[1],
        PRICE    => $attr[2],
        QUANTITY => $attr[3],
    );
    push( @output, \%row );
}

$template->param(PRODUCT => \@output);

print $template->output();

The code iterates over @goods and builds a second array, @output, with the rows as hash references. Then the template iterates over @output within <TMPL_LOOP>. That's walking over the same data twice. Template sections do not suffer this cost, because you can output the data immediately, as you get it:

print $tr->render('product_row', \%row);

This is essentially what happens with Mason (or JSP/PHP/ASP for that matter). The main difference is that Template::Recall renders the section through a method call rather than mixing code and template.

Template::Recall, by using sectioned templates, combines the efficiency of the callback model with the strong, clean separation of concerns inherent in the pipeline model, and perhaps gets the best of both worlds.

FMTYEWTK About Mass Edits In Perl

For those not used to the terminology, FMTYEWTK stands for Far More Than You Ever Wanted To Know. This one is fairly light as FMTYEWTKs usually go. In any case, the question before us is, "How do you apply an edit against a list of files using Perl?" Well, that depends on what you want to do....

The Beginning

If you only want to read in one or more files, apply a regex to the contents, and spit out the altered text as one big stream, the best approach is probably a one-liner such as the following:

perl -p -e "s/Foo/Bar/g" <FileList>

This command calls perl with the options -p and -e "s/Foo/Bar/g" against the files listed in FileList. The first argument, -p, tells Perl to print each line it reads after applying the alteration. The second option, -e, tells Perl to evaluate the provided substitution regex rather than reading a script from a file. The Perl interpreter then evaluates this regex against every line of all (space separated) files listed on the command line and spits out one huge stream of the concatenated fixed lines.

In standard fashion, Perl allows you to bundle options that take no argument with the option that follows them, for brevity and convenience. Therefore, you'll more often see the previous example written as:

perl -pe "s/Foo/Bar/g" <FileList>

In-place Editing

If you want to edit the files in place, editing each file before going on to the next, that's pretty easy, too:

perl -pi.bak -e "s/Foo/Bar/g" <FileList>

The only change from the last command is the new option -i.bak, which tells Perl to operate on files in-place, rather than concatenating them together into one big output stream. Like the -e option, -i takes one argument, an extension to add to the original file names when making backup copies; for this example I chose .bak. Warning: If you execute the command twice, you've most likely just overwritten your backups with the changed versions from the first run. You probably didn't want to do that.

Because -i takes an argument, I had to separate out the -e option, which Perl otherwise would interpret as the argument to -i, leaving us with a backup extension of .bake, unlikely to be correct unless you happen to be a pastry chef. In addition, Perl would have thought that "s/Foo/Bar/" was the filename of the script to run, and would complain when it could not find a script by that name.

Running Multiple Regexes

Of course, you may want to make more extensive changes than just one regex. To make several changes all at once, add more code to the evaluated script. Remember to separate each additional statement with a semicolon (technically, you should place a semicolon at the end of each statement, but the very last one in any code block is optional). For example, you could make a series of changes:

perl -pi.bak -e "s/Bill Gates/Microsoft CEO/g;
 	s/CEO/Overlord/g" <FileList>

"Bill Gates" would then become "Microsoft Overlord" throughout the files. (Here, as in all examples, we ignore such finicky things as making sure we don't change "HERBACEOUS" to "HERBAOverlordUS"; for that kind of information, refer to a good treatise on regular expressions, such as Jeffrey Friedl's impressive book Mastering Regular Expressions, 2nd Edition. Also, I've wrapped the command to fit, but you should type it in as just one line.)

Doing Your Own Printing

You may wish to override the behavior created by -p, which prints every line read in, after any changes made by your script. In this case, change to the -n option. -p -e "s/Foo/Bar/" is roughly equivalent to -n -e "s/Foo/Bar/; print". This allows you to write interesting commands, such as removing lines beginning with hash marks (Perl comments, C-style preprocessor directives, etc.):

perl -ni.bak -e "print unless /^\s*#/;" <FileList>

Fields and Scripts

Of course, there are far more powerful things you can do with this. For example, imagine a flat-file database, with one row per line of the file, and fields separated by colons, like so:

Bill:Hennig:Male:43:62000
Mary:Myrtle:Female:28:56000
Jim:Smith:Male:24:50700
Mike:Jones:Male:29:35200
...

Suppose you want to find everyone who was over 25, but paid less than $40,000. At the same time, you'd like to document the number and percentage of women and men found. This time, instead of providing a mini-script on the command line, we'll create a file, glass.pl, which contains the script. Here's how to run the query:

perl -naF':' glass.pl <FileList>

glass.pl contains the following:

BEGIN { $men = $women = $lowmen = $lowwomen = 0; }

next unless /:/;
/Female/ ? $women++ : $men++;
if ($F[3] > 25 and $F[4] < 40000)
    { print; /Female/ ? $lowwomen++ : $lowmen++; }

END {
print "\n\n$lowwomen of $women women (",
      int($lowwomen / $women * 100),
      "%) and $lowmen of $men men (",
      int($lowmen / $men * 100),
      "%) seem to be underpaid.\n";
}

Don't worry too much about the syntax, other than to note some of the awk and C similarities. The important thing here and in later sections is to see how Perl makes these problems easily solvable.

Several new features appear in this example; first, if there is no -e option to evaluate, Perl assumes the first filename listed, in this case glass.pl, refers to a Perl script for it to execute. Secondly, two new options make it easy to deal with field-based data. -a (autosplit mode) takes each line and splits its fields into the array @F, based on the field delimiter given by the -F (Field delimiter) option, which can be a string or a regex. If no -F option exists, the field delimiter defaults to ' ' (one single-quoted space). By default, arrays in Perl are zero-based, so $F[3] and $F[4] refer to the age and pay fields, respectively. Finally, the BEGIN and END blocks allow the programmer to perform actions before file reading begins and after it finishes, respectively.
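
Just to illustrate autosplit on its own, a throwaway one-liner (separate from glass.pl, and using single quotes here so the shell leaves $F alone) could print each person's last name and salary from the same flat file:

perl -naF':' -e 'print "$F[1]: $F[4]"' <FileList>

Because the last field still carries its line ending, each record prints on its own line.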

File Handling

All of these little tidbits have made use only of data from within the files being operated on. What if you want to be able to read in data from elsewhere? For example, imagine that you had some sort of file that allows includes; in this case, we'll assume that you somehow specify these files by relative pathname, rather than looking them up in an include path. Perhaps the includes look like the following:

...
#include foo.bar, baz.bar, boo.bar
...

If you want to see what the file looks like with the includes placed into the master file, you might try something like this:

perl -ni.bak -e "if (s/#include\s+//) {foreach $file
 (split /,\s*/) {open FILE, '<', $file; print <FILE>}}
 else {print}" <FileList>

To make it easier to see what's going on here, this is what it looks like with a full set of line breaks added for clarity:

perl -ni.bak -e "
        if (s/#include\s+//) {
            foreach $file (split /,\s*/) {
                open FILE, '<', $file;
                print <FILE>
            }
        } else {
            print
        }
    " <FileList>

Of course, this only expands one level of include, but then we haven't provided any way for the script to know when to stop if there's an include loop. In this little example, we take advantage of the fact that the substitution operator returns the number of changes made, so if it manages to chop off the #include at the beginning of the line, it returns a non-zero (true) value, and the rest of the code splits apart the list of includes, opens each one in turn, and prints its entire contents.

There are some handy shortcuts as well: if you open a new file using the name of an old file handle (FILE in this case), Perl automatically closes the old file first. In addition, if you read from a file using the <> operator into a list (which the print function expects), it happily reads in the entire file at once, one line per list entry. The print call then prints the entire list, inserting it into the current file, as expected. Finally, the else clause handles printing non-include lines from the source, because we are using -n rather than -p.

Better File Lists

The fact that it is relatively easy to handle filenames listed within other files indicates that it ought to be fairly easy to deal entirely with files read from some other source than a list on the end of the command line. The simplest case is to read all of the file contents from standard input as a single stream, which is common when building up pipes. As a matter of fact, this is so common that Perl automatically switches to this mode if there are no files listed on the command line:

<Source> | perl -pe "s/Foo/Bar/g" | <Sink>

Here Source and Sink are the commands that generate the raw data and handle the altered output from Perl, respectively. Incidentally, the filename consisting of a single hyphen (-) is an explicit alias for standard input; this allows the Perl programmer to merge input from files and pipes, like so:

<Source> | perl -pe "s/Foo/Bar/g" header.bar - footer.bar
 | <Sink>

This example first reads a header file, then the input from the pipe source, and then a footer file. The program modifies the whole mess and sends it on through the output pipe.

As I mentioned earlier, when dealing with multiple files it is usually better to keep the files separate, by using in-place editing or by explicitly handling each file separately. On the other hand, it can be a pain to list all of the files on the command line, especially if there are a lot of files, or when dealing with files generated programmatically.

The simplest method is to read the files from standard input, pushing them onto @ARGV in a BEGIN block; this has the effect of tricking Perl into thinking it received all of the filenames on the command line! Assuming the common case of one filename per input line, the following will do the trick:

<FilenamesSource> | perl -pi.bak -e "BEGIN {push @ARGV,
 <STDIN>; chomp @ARGV} s/Foo/Bar/g"

Here we once again use the shortcut that reading in a file in a list context (which push provides) will read in the entire file. This adds the entire contents, one filename per entry, to the @ARGV array, which normally contains the list of arguments to the script. To complete the trick, we chomp the line endings from the filenames, because Perl normally returns the line ending characters (a carriage return and/or a line feed) when reading lines from a file. We don't want to consider these to be part of the filenames. (On some platforms, you could actually have filenames containing line ending characters, but then you'd have to make the Perl code a little more complex, and you deserve to figure that out for yourself for trying it in the first place.)

Response Files

Another common design is to provide filenames on the command line as usual, treating filenames starting with an @ specially. The program should consider their contents to be lists of filenames to insert directly into the command line. For example, if the contents of the file names.baz (often called a response file) are:

two
three
four

then this command:

perl -pi.bak -e "s/Foo/Bar/g" one @names.baz five

should work equivalently to:

perl -pi.bak -e "s/Foo/Bar/g" one two three four five

To make this work, we once again need to do a little magic in a BEGIN block. Essentially, we want to parse through the @ARGV array, looking for filenames that begin with @. We pass through any unmarked filenames, but for each response file found, we read in the contents of the response file and insert the new list of filenames into @ARGV. Finally, we chomp the line endings, just as in the previous section. This produces a canonical file list in @ARGV, just as if we'd specified all of the files on the command line. Here's what it looks like in action:

perl -pi.bak -e "BEGIN {@ARGV = map {s/^@// ? @{open RESP,
 '<', $_; [<RESP>]} : $_} @ARGV; chomp @ARGV} s/Foo/Bar/g"
 <ResponseFileList>

Here's the same code with line breaks added so you can see what's going on:

perl -pi.bak -e "
        BEGIN {
            @ARGV = map {
                        s/^@// ? @{open RESP, '<', $_;
                                   [<RESP>]}
                               : $_
                    } @ARGV;
            chomp @ARGV
        }
        
        s/Foo/Bar/g
    " <ResponseFileList>

The only tricky part is the map block. map applies a piece of code to every element of a list, returning a list of the return values of the code; the current element is in the $_ special variable. The block here checks to see if it could remove a @ from the beginning of each filename. If so, it opens the file, reads the whole thing into an anonymous temporary array (that's what the square brackets are there for), and then inserts that array instead of the response file's name (that's the odd @{...} construct). If there is no @ at the beginning of the filename to remove, the filename goes directly into the map results. Once we've performed this expansion and chomped any line endings, we can then proceed with the main work, in this case our usual substitution, s/Foo/Bar/g.

Recursing Directories

For our final example, let's deal with a major weakness in the way we've been doing things so far — we're not recursing into directories, instead expecting all of the files we need to read to appear explicitly on the command line. To perform the recursion, we need to pull out the big guns: File::Find. This Perl module provides very powerful recursion methods. It also comes standard with any recent version of the Perl interpreter. The command line is deceptively simple, because all of the brains are in the script:

perl cleanup.pl <DirectoryList>

This script will perform some basic housecleaning, marking all files readable and writeable, removing those with the extensions .bak, .$$$, and .tmp, and cleaning up .log files. For the log files, we will create a master log file (for archiving or perusal) containing the contents of all of the other logs, and then delete the logs so that they remain short over time. Here's the script:

use File::Find;

die "All arguments must be directories!"
    if grep {!-d} @ARGV;
open MASTER, '>', 'master.lgm';
finddepth(\&filehandler, @ARGV);
close MASTER;
rename 'master.lgm', 'master.log';

sub filehandler
{
    chmod( ((stat(_))[2] & 07777) | 0666, $_ ) unless (-r and -w);
    unlink if (/\.bak$/ or /\.tmp$/ or /\.\$\$\$$/);
    if (/\.log$/) {
        open LOG, '<', $_;
        print MASTER "\n\n****\n$File::Find::name\n****\n";
        print MASTER <LOG>;
        close LOG;
        unlink;
    }
}

This example shows just how powerful Perl and Perl modules can be, and at the same time just how obtuse Perl can appear to the inexperienced. In this case, the short explanation is that the finddepth() function iterates through all of the program arguments (@ARGV), recursing into each directory and calling the filehandler() subroutine for each file. That subroutine then can examine the file and decide what to do with it. The example checks for readability and writability with -r and -w, fixing the file's security settings if needed with chmod. It then unlinks (deletes) any file with a name ending in any of the three unwanted extensions. Finally, if the extension is .log, it opens the file, writes a few header lines to the master log, copies the file into the master log, closes it, and deletes it.

Instead of using finddepth(), which does a depth-first search of the directories and visits them from the bottom up, we could have used find(), which does the same depth-first search from the top down. As a side note, the program writes the master log file with the extension .lgm, then renames it at the end to have the extension .log, so as to avoid the possibility of writing the master log into itself if the program is searching the current directory.

Conclusion

That's it. Sure, there's a lot more that you could do with these examples, including adding error checking, generating additional statistics, producing help text, etc. To learn how to do this, find a copy of Programming Perl, 3rd Edition, by Larry Wall, Tom Christiansen, and Jon Orwant. This is the bible (or the Camel, rather) of the Perl community, and well worth the read. Good luck!

How We Wrote the Template Toolkit Book ...

There are a number of tools available for writing books. Many people would immediately reach for their favorite word processor, but having written one book using Microsoft Word I'm very unlikely to repeat the experience. Darren Chamberlain, Andy Wardley, and I are all Perl hackers, so when we got together to write Perl Template Toolkit, it didn't take us long to agree that we wanted to write it using POD (Plain Old Documentation).

Of course, any chosen format has its pros and cons. With POD we had all the advantages of working with plain text files, and all of the existing POD tools were available to convert our text into various other formats, but there were also some disadvantages. These largely stem from the way that books (especially technical books) are written. Authors rarely write the chapters in the order in which they appear in the finished book. In fact, it's very common for the chapters to be rearranged a few times before the book is published.

Now this poses a problem with internal references. It's all very well saying "see Chapter Six for further details," but when the book is rearranged and Chapter Six becomes Chapter Four, all of these references are broken. Most word processors will allow you to insert these references as "tags" that are expanded (correctly) as the document is printed. POD and emacs don't support this functionality.

Another common problem with technical books is the discrepancy between the code listings in the book and the code that actually got run to produce the output shown. It's easily done. You create an example program and cut-and-paste the code into the document. You then find a subtle bug in the code and fix it in the version that you're running but forget to fix it in the book. It would be really useful if you could just use tags saying "insert this program file here" and even "insert the output of running the program here." That's functionality that no word processor offers.

Of course, these shortcomings would be simple to solve if you had a powerful templating system at the ready. Luckily Andy, Darren, and I had the Template Toolkit (TT) handy.

The Book Templates

We produced a series of templates that controlled the book's structure and a Perl program that pulled together each chapter into a single POD file. This program was very similar to the tpage program that comes with TT, but was specialized for our requirements.
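
A stripped-down driver along those lines, assuming nothing beyond the Template module itself (the directory name, flag, and usage message below are invented for illustration), might look something like this:

  use Template;

  # A minimal tpage-like driver: process one chapter file to POD on STDOUT.
  my $file = shift @ARGV or die "usage: build_chapter <chapter-template>\n";

  my $tt = Template->new({ INCLUDE_PATH => 'templates' })
      or die Template->error();

  $tt->process($file, { publishing => 0 })
      or die $tt->error();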

Separating Code from Code

There was one problem we had to address very early on with our book templates. This was the problem of listing TT code within a TT template. We needed a way to distinguish the template directives we were using to produce the book from the template directives we were demonstrating in the book.

Of course TT provides a simple way to achieve this. You can define the characters that TT uses to recognize template directives. By default it looks for [% ... %], but there are a number of predefined groups of tags that you can turn on using the TAGS directive. All of our book templates started with the line:

  [% TAGS star %]

When it sees this directive, the TT parser starts to look for template directives that are delimited with [* ... *]. The default delimiters ([% ... %]) are treated as plain text and passed through unaltered. Therefore, by using this directive we can use [% ... %] in our example code and [* ... *] for the template directives that we wanted TT to process.

Of course, the page where we introduced the TAGS directive and gave examples of its usage was still a little complex.

In the rest of this article, I'll go back to using the [% ... %] style of tags.

Useful Blocks and Macros

We defined a number of blocks and macros that expanded to phrases used throughout the book. For example:

  [% TT = 'Template Toolkit';

     versions = {
       stable = '2.10'
       developer = '2.10a'
     } %]

The first of these must have saved each of us many hours of typing time and the second gave us an easy way to keep the text up-to-date if Andy released a new version of TT while we were writing the book. A template using these variables might look like this:

  The current stable version of the [% TT %] is [% versions.stable %]

Keeping Track of Chapters

We used a slightly more complex set of variables and macros to solve the problem of keeping chapter references consistent. First we defined an array that contained details of the chapters (in the current order):

  Chapters = [
    {  name  = 'intro'
       title = "Introduction to the Template Toolkit"
    }
    {  name  = 'web'
       title = "A Simple Web Site"
    }
    {  name  = 'language'
       title = "The Template Language"
    }
    {  name  = 'directives'
       title = "Template Directives"
    }
    {  name  = 'filters'
       title = "Filters"
    }
    {  name  = 'plugins'
       title = "Plugins"
    }
    ... etc ...
   ]

Each entry in this array is a hash with two keys. The name is the name of the directory in our source tree that contains that chapter's files and the title is the human-readable name of the chapter.

The next step is to convert this into a hash so that we can look up the details of a chapter when given its symbolic name.

    FOREACH c = Chapters;
      c.number = loop.count;
      Chapter.${c.name} = c;
    END;

Notice that we are adding a new key to the hash that describes a chapter. We use the loop.count variable to set the chapter number. This means that we can reorder our original Chapters array and the chapter numbers in the Chapter hash will always remain accurate.

Using this hash, it's now simple to create a macro that lets us reference chapters. It looks like this:

  MACRO chref(id) BLOCK;
    THROW chapter "invalid chapter id: $id"
      UNLESS (c = Chapter.$id);
    seen = global.chapter.$id;
    global.chapter.$id = 1;
    seen ? "Chapter $c.number"
         : "Chapter $c.number, I<$c.title>";
  END;

The macro takes one argument, which is the id of the chapter (this is the unique name from the original array). If this chapter doesn't exist in the Chapter hash then the macro throws an error. If the chapter exists in the hash then the macro displays a reference to the chapter. Notice that we remember when we have seen a particular chapter (using global.chapter.$id) -- this is because O'Reilly's style guide says that a chapter is referenced differently the first time it is mentioned in another chapter. The first time, it is referenced as "Chapter 2, A Simple Web Site", and on subsequent references it is simply called "Chapter 2."

So with this mechanism in place, we can have templates that say things like this:

  Plugins are covered in more detail in [% chref(plugins) %].

And TT will convert that to:

  Plugins are covered in more detail in Chapter 6, I<Plugins>.

And if we subsequently reorder the book again, the chapter number will be replaced with the new correct number.

Running Example Code

The other problem I mentioned above is that of ensuring that sample code and its output remain in step. The solution to this problem is a great example of the power of TT.

The macro that inserts an example piece of code looks like this:

  MACRO example(file, title) BLOCK;
    global.example = global.example + 1;
    INCLUDE example
      title = title or "F<$file>"
      id    = "$chapter.id/example/$file"
      file  = "example/$file"
      n     = global.example;
    global.exref.$file = global.example;
  END;

The macro takes two arguments, the name of the file containing the example code and (optionally) a title for the example. If the title is omitted then the filename is used in its place. All of the examples in a particular chapter are numbered sequentially and the global.example variable holds the last used value, which we increment. The macro then works out the path of the example file (the structure of our directory tree is very strict) and INCLUDEs a template called example, passing it various information about the example file. After processing the example, we store the number that is associated with this example by storing it in the hash global.exref.$file.

The example template looks like this:

  [% IF publishing -%]
  =begin example [% title %]

  Z<[% id %]>[% INSERT $file FILTER indent(4) +%]

  =end

  [% ELSE -%]
  B<Example [% n %]: [% title %]>

  [% INSERT $file FILTER indent(4) +%]

  [% END -%]

This template looks at a global flag called publishing, which determines if we are processing this file for submission to O'Reilly or just for our own internal use. The Z< ... > POD escape is an O'Reilly extension used to identify the destination of a link anchor (we'll see the link itself later on). Having worked out how to label the example, the template simply inserts it and indents it by four spaces.

This template is used within our chapter template by adding code like [% example('xpath', 'Processing XML with XPath') %] to your document. That will be expanded to something like "Example 2: Processing XML with XPath," followed by the source of the example file, xpath.

All of that gets the example code into that document. We now have to do two other things. We need to be able to reference the code from the text of the chapter ('As example 3 demonstrates...'), and we also need to include the results of running the code.

For the first of these there is a macro called exref, which is shown below:

  MACRO exref(file) BLOCK;
    # may be a forward reference to next example
    SET n = global.example + 1
      UNLESS (n = global.exref.$file);
    INCLUDE exref
      id    = "$chapter.id/example/$file";
  END;

This works in conjunction with another template, also called exref.

  [% IF publishing -%]
  A<[% id %]>
  [%- ELSE -%]
  example [% n %]
  [%- END -%]

The clever thing about this is that you can use it before you have included the example code. So you can do things like:

  This is demonstrated in [% exref('xpath') %].

  [% example('xpath', 'Processing XML with XPath') %]

As long as you only look at a maximum of one example ahead, it still works. Notice that the A< ... > POD escape is another O'Reilly extension that marks a link anchor. So within the O'Reilly publishing system it's the A<foo> and the associated Z<foo> that make the link between the reference and the actual example code.

The final thing we need is to be able to run the example code and insert the output into the document. For this we defined a macro called output.

  MACRO output(file) BLOCK;
    n = global.example;
    "B<Output of example $n:>\n\n";
    INCLUDE "example/$file" FILTER indent(4);
  END;

This is pretty simple. The macro is passed the name of the example file. It assumes that this is the most recent example included in the document so it gets the example number from global.example. It then displays a header and INCLUDEs the file. Notice that the major difference between example and output is that example uses INSERT to just insert the file's contents, whereas output uses INCLUDE, which loads the file and processes it.

With all of these macros and templates, we can now have example code in our document and be sure that the output we show really reflects the output that you would get by running that code. So we can put something like this in the document:

  The use of GET and SET is demonstrated in [% exref('get_set') %].

  [% example('get_set', 'GET and SET') %]

  [% output('get_set') %]

And that will be expanded to the following.

  The use of GET and SET is demonstrated in example 1.

  B<Example 1: GET and SET>

      [% SET foo = 'bar' -%]
      The variable foo is set to "[% GET foo %]".

  B<Output of example 1:

      The variable foo is set to "bar".

As another bonus, all of the example code is neatly packaged away in individual files that can easily be made into a tarball for distribution from the book's web site.

Other Templates, Blocks, and Macros

Once we started creating these timesaving templates, we found a huge number of areas where we could make our lives easier. We had macros that inserted references to other books in a standard manner, macros for inserting figures and screenshots, as well as templates that ensured that all our chapters had the same standard structure and warned us if any of the necessary sections were missing. I'm convinced that the TT templates we wrote for the project saved us all a tremendous amount of time that would otherwise have been spent organizing and reorganizing the work of the three authors. I would really recommend a similar approach to other authors.

The Template Toolkit is often seen as a tool for building web sites, but we have successfully demonstrated one more non-Web area where the Template Toolkit excels.
