November 2006 Archives

Advanced HTML::Template: Filters

The CPAN module HTML::Template is a very simple, yet extremely useful module to achieve true separation of presentation and logic when programming CGI scripts. The basic idea is that, rather than having print statements scattered all through your code (the "classic" CGI approach), or mixing logic in with HTML (as in JSP, ASP, and Perl Mason), you maintain two files. One is the actual Perl script containing the business logic and the other one is a template file, containing exclusively presentation layer statements.

The template is straight-up HTML, augmented by a small set of special tags. The templating system replaces these tags by dynamic content. The dynamic content comes from a Perl data structure that you have built in your code.

Here is a minimal template and the corresponding script:

<html>
  <head></head>
  <body>
    <h1><TMPL_VAR NAME=title></h1>
    
    <ul>
      <TMPL_LOOP NAME=rows>
        <li><TMPL_VAR NAME=item>
      </TMPL_LOOP>
    </ul>
  </body>
</html>
#!/usr/bin/perl -w
use HTML::Template;

# Create the template object
my $tpl = HTML::Template->new( filename => 'tmpl1.tpl' );

# Set the parameter values
$tpl->param( title => "Useful Books" );
$tpl->param( rows => [ { item => "Learning Perl" },
                       { item => "Programming Perl" },
                       { item => "Perl Cookbook" } ] );

# Print
print "Content-Type: text/html\n\n", $tpl->output;

That's really all there is to it.

As Simple as Possible, but Not Simpler?

One of the nice things about HTML::Template is that doesn't try to do too much. It is a templating system for HTML, period. That keeps it simple and avoids complementary features, which could get in the way.

In this spirit, the module only offers a very, very limited set of tags, and no obvious way to extend this selection. Here is the complete list:

  • <TMPL_VAR>
  • <TMPL_LOOP>
  • <TMPL_IF>
  • <TMPL_ELSE>
  • <TMPL_UNLESS>
  • <TMPL_INLCUDE>

All of these tags do pretty much what you might expect: <TMPL_VAR> expands to dynamic text; <TMPL_LOOP> loops over an array; <TMPL_IF>, <TMPL_ELSE>, and <TMPL_UNLESS> provide a mechanism for conditional display; and <TMPL_INLCUDE> allows you to include pieces of other documents (such as a shared page header).

Most of the time, this is precisely what you want. All the facilities are there to control the display, but there is no danger in mixing general-purpose code in with the HTML template.

Yet, sometimes things can become unnecessarily clumsy--in particular, concerning the conditional display of information. A typical example concerns optional information. Suppose that you want to display a list of library books with their due dates, but should display the due date only if the book is currently checked out. (This is a surprisingly common pattern. Think of items with an optional sales price, accounts with an optional bad account status, a flight schedule with optional delay information, etc.)

Sometimes you can simply leave the corresponding variable blank:

<ul>
  <TMPL_LOOP NAME=books>
    <li><TMPL_VAR NAME=title>   <TMPL_VAR NAME=duedate>
  </TMPL_LOOP>
</ul>

and use the code:

$tpl->param( books => [ { title => 'Learning Perl', duedate => '' },
                        { title => 'Programming Perl', duedate => '29. Feb. 2008' } ] );

This will work. There will be no output for the due date if there is no due date. However, what if you want to have some additional text, in addition to the actual due date? That template might resemble:

<ul>
  <TMPL_LOOP NAME=books>
    <li><TMPL_VAR NAME=title>   
        <TMPL_IF NAME=duedate>
          Date due: <TMPL_VAR NAME=duedate>
        </TMPL_IF>
  </TMPL_LOOP>
</ul>

Do this a lot, and you will start looking for a better solution! (The main problem here is not actually the additional typing, but the fact that the template itself is becoming increasingly unwieldy and its structure harder to follow.)

One solution is to put the additional text in code:

$tpl->param( books => [ { title => 'Learning Perl', duedate => '' },
                        { title => 'Programming Perl', duedate => 'Date due: 29. Feb. 2008' } ] );

However, this is clumsy: it breaks the separation of presentation and behavior, as there is now strictly presentational material such as "Date due: " in the Perl code, and it makes the parameter less generally useful. If you want the plain date somewhere else in the template, it now comes bundled with a string that you may not want.

Wouldn't it be nice if you could bundle the entire optional part of each list entry ("encapsulate it," if you will) and simply call it by name? To put it in other words, what if you had the ability to define a custom tag?

Enabling Custom Tags Using Filters

This is where filters come in. A filter is a subroutine called on the template text before tag substitution takes place. A filter can do anything. In particular, it can modify the template programmatically.

A filter is precisely the hook into the template processing that is necessary to enable custom tags. Write your template in terms of standard template tags and your own custom tags. Then provide a filter (or filters) to replace these custom tags with the appropriate combinations of standard tags. The HTML::Template "engine" then does the final substitutions and renders the ultimate output.

For the library example, suppose that you have chosen to use a custom tag <CSTM_DUEDATE> to display the optional due date. This makes the template look very clean and simple:

<ul>
  <TMPL_LOOP NAME=books>
    <li><TMPL_VAR NAME=title>   <CSTM_DUEDATE>
  </TMPL_LOOP>
</ul>

The code to set the template parameter values does not change, but you now need to define the filter (and note that you need to escape the slash in the </TMPL_IF>):

sub duedate_filter {
  my $text_ref = shift;
  $$text_ref =~ s/<CSTM_DUEDATE>/<TMPL_IF NAME=duedate>Date due: <TMPL_VAR NAME=duedate><\/TMPL_IF>/g;
}

Finally, you need to register this filter, so that the module will call it before tag substitution takes place. Filters are options to the template object, so registration takes place when you construct the template:

$tpl = HTML::Template->new( filename => 'books.tpl', 
                            filter => \&duedate_filter );

That's the basic idea. If you wanted to, you could register a sequential collection of filters (one filter per custom tag?) or you can put all required substitutions into a single routine. I can certainly perceive of the possibility of developing reusable "tag libraries" as Perl modules with filter definitions. Interestingly (and in contrast to JSP), the template files themselves do not need to know about the definitions and behavior of the "custom tags." I think this is exactly right: the templates are plain ("dumb") text files, which define the presentation layer. All the behavior is in the Perl code, where it belongs.

Data-Sensitive Displays and Data Filters

Consider another example application. Rather than dealing with optional information, this one uses conditional formatting.

Assume that you have a list of accounts, and each account has a various state: "Good," "Bad," and "Canceled." You would like to display the account number in a different color, depending on the state: green for "Good," yellow for "Bad," and red for "Canceled."

If you tried to do this using <TMPL_IF> tags, this would be a real mess, for two reasons: You have more than two choices, so that we can't use a <TMPL_IF>...<TMPL_ELSE>...</TMPL_IF> construct; instead you would have to use a sequence of individual conditionals: <TMPL_IF>...</TMPL_IF><TMPL_IF>...</TMPL_IF><TMPL_IF>...</TMPL_IF>. Moreover, <TMPL_IF> does not allow for arbitrary conditional expressions (such as $status eq 'good'). All it does is test a Boolean variable. The classic approach is to introduce three Boolean dummy variables: $isGood, $isBad, and $isCancelled to use in these tests. There's certainly a better way!

Instead, introduce a custom tag that allows for configurable formatting:

<CSTM_SPAN NAME=...>

Here is the associated filter:

sub span_filter {
  my $text_ref = shift;
  $$text_ref =~ s/<CSTM_SPAN\s+NAME=(.*?)\s*>
                 /<span class='<TMPL_VAR NAME=$1_class>'>
                    <TMPL_VAR NAME=$1>
                  <\/span>
                 /gx;
}

If you use this custom tag in your template as <CSTM_SPAN NAME=account>, then the template in the intermediate state, after filter application, but before template parameter substitution, will look like:

<span class='<TMPL_VAR NAME=account_class>'>
  <TMPL_VAR NAME=account>
</span>

The original single custom tag has expanded into an HTML <span> tag wrapping the actual dynamic content. The <span> defines a CSS class to provide formatting to the dynamic content. The name of the CSS class itself is dynamic, too--so that you can select the classes .bad or .good depending on the value of the account status. The name of the CSS class comes from the name of the displayed parameter. If this class is defined in a stylesheet or within the HTML document itself, a CSS-capable client will apply it to the dynamic text as expected.

In other words, the innocuous <CSTM_SPAN> tag actually requires two parameters: one containing the text to display and the other specifying the CSS class to apply. All you need to do is to set the account_class parameter to the appropriate value in Perl code (and, of course, define the CSS classes somewhere in the stylesheet.)

Here is a slick way to stay in control doing this. You can call the param() function either with individual name/value-pairs (as I have shown before), or with a hash-ref: $tpl->param( { key1 => value1, key2 => value2 } ). In other words, rather than calling param() repeatedly throughout your code, setting individual parameters, you can build up one large hash, containing all parameters, and then pass it to the template in a single call.

Now you can apply the filter trick that worked so well for the template itself, on the parameters as well! In other words, before calling $tpl->param( \%parameter_hash), pass the hash to a subroutine, which performs data filtering operations on the parameters. In this case, it adds the CSS class appropriate for the account status for each account in the data structure.

This approach centralizes all data-dependent display decisions in a single subroutine. If you add further status codes, or if you want to change the display class, there is only a single location to edit.

Here is a Perl program demonstrating these concepts:

#!/usr/bin/perl -w
use HTML::Template;

# Create the template object
my $tpl = HTML::Template->new( filename => 'tmpl4.tpl',
                               filter => \&span_filter );

# Build a data structure containing all the parameters
my %params = ( title => "Accounts",
               accounts => [ { account => 'Good' }, { account => 'Bad' } ] );

apply_display_logic( \%params );

# Set all parameters for the template at once
$tpl->param( \%params );

# Print
print "Content-Type: text/html\n\n", $tpl->output;

# ---

sub span_filter {
  my $text_ref = shift;
  $$text_ref =~ s/<CSTM_SPAN\s+NAME=(.*?)\s*>
                 /<span class='<TMPL_VAR NAME=$1_class>'>
                    <TMPL_VAR NAME=$1>
                  <\/span>
                 /gx;
}

sub apply_display_logic {
  my $hash_ref = shift;

  my %account_classes = ( Good => 'good', Bad => 'bad' );

  foreach my $acc ( @{ $hash_ref->{ accounts } } ) {
    $acc->{account_class} = $account_classes{ $acc->{account} };
  }
}

#%@$! Happens

When I first explained this idea to a colleague, his reaction was: "Great. You go down that path and you are going to end up with JSP. Is that what you want?" To which my response was: "Well, fertilizer happens."

GUI development is a pain. Always. You cannot avoid all issues regarding the separation of presentation and logic in GUI development. The challenge in my opinion is therefore not to find the ideal web development framework, but the optimal one for the task at hand.

HTML::Template is very, very good for rather simple, minimum fuss, straightforward websites (of which there are significantly more than anyone might want to admit). It is particularly well-suited for reports and stats. In this application domain, the ability to the define custom tags to take care of formatting and special presentation issues is a distinct advantage.

Conclusion

In the second part of this series, I will take these ideas a step further and explore how to use HTML::Template to create GUI-like widgets in a web/CGI situation.

Hash Crash Course

When I teach about hashes, I do what most Perl tutors and tutorials do: I introduce the hash as a "dictionary": a mapping between one thing and another. The classic example, for instance, is to have a set of English words mapped to French words:

    %french = (
        apple  => "pomme",
        pear   => "poivre",
        orange => "Leon Brocard"
    );

Yet the more I look at my code--and more often, the more I look at how to tidy up other people's code--I realize that this is perhaps the least common use of a hash. Much more often, I use hashes in particular idioms which have very little in common with this concept of a mapping. It's interesting to consider the ways that programmers actually use hashes in Perl code.

Counting

Many of the uses of hashes are to "answer questions about lists." When you have an array or list of values and you need to ask about its properties, you will often find yourself using a hash. Start simply by counting the number of particular elements in a list. Here's the naïve approach:

    my $count = 0;
    for (@list) {
        $count++ if $_ eq "apple";
    }

You can smarten this up with the use of the grep function:

    $count = grep $_ eq "apple", @list;

... but when you need the number of pears in the list, then you have to do the same again:

    $count_apples = grep $_ eq "apple", @list;
    $count_pears  = grep $_ eq "pear",  @list;

Now there are two passes over the list, and the situation isn't going to get any prettier from here. What you want is basically a histogram of the data, and you can get that with a hash:

    my %histogram;
    $histogram{$_}++ for @list;

This hash associates each individual item with its count, and it only traverses the list once.

In a recent case, I was looking at a list of tags associated with various photographs. To lay out the data for display, it was useful to know how many different tags there are in the list. I could get that simply from the number of keys in the histogram:

    $unique = keys %histogram;

I could also delete the duplicates and end up with a list of the unique tags:

    @unique = keys %histogram;

Finally, I could show the five most popular tags from the list:

    @popular = 
        (sort { $histogram{$b} <=> $histogram{$b} } @unique)[0..4];

This sorts the list of unique tags based on their popularity in the original list, and pulls out the top five.

Uniqueness

Another idiom to get the unique elements from a list is a variation on the "counting things" idiom; but this time you don't care how many of each item there is, just that there's at least one. This allows you to say:

    for (@list) { $unique{$_} = 1 }
    @unique = keys %unique;

which reduces to:

    %unique = map { $_ => 1 } @list;
    @unique = keys %unique;

You can take this a stage further with a slightly non-standard idiom to do the whole thing in one operation:

    @unique = keys %{{ map { $_ => 1 } @list }};

keys requires a hash, so this funny-looking line creates an anonymous hash ({ map ... @list }) and then turns it into a real hash (%{ $hash_ref }) to feed it to keys.

The advantage of this trick is that you can use a variation of it to work with lists which include objects as well as ordinary scalars. Here's an example of the problem:

    my @tags;
    push @tags, Memories::Tag->retrieve_random for 1..10;

This should get ten random Memories::Tag objects. However, some of them might be the same, and, assuming that you want to see the unique ones:

    %unique = map { $_ => 1 } @tags;
    @unique = keys %unique;

Unfortunately, if you then try to do anything with the tags, it all goes horribly wrong:

    print $unique[0]->name;

    # Can't locate object method "test" via package "Memories::Tag(0x1801380)"

What's happened is that hash keys can only be strings; once you put the object into a hash as the key, then it's not an object any more. Perl turns it into a string. There's no (easy) way to get the strings back into an object. What you need is some kind of data structure to map these broken strings into the original objects. Thankfully, there is one in the form of the hash itself! So the code becomes:

    %unique = map { $_ => $_ } @tags;
    @unique = values %unique;

First, map the object--which Perl will smash into a string--with another copy of the object, which will retain its object-ness. Then instead of getting the strings out of the hash, which are useless to us, you get the objects.

Of course, this doesn't quite work, because even if retrieve_random retrieves the same tag name twice, it will return two different objects with the same data. (That's true unless the underlying database abstraction layer uses caching, which is another good use for a hash.)

The solution is to make the list unique based on a property of the tag, such as its name. This time, the code is:

    %unique = map { $_->name => $_ } @tags;
    @unique = values %unique;

Okay, now you have a list of objects which you've retrieved randomly, but which have unique names. The only problem is that you retrieved ten of them to start with, and then whittled the list down to get rid of duplicates. What if you actually wanted 10?

The solution is to keep track of ones that you've already seen, so that you don't create any duplicates in the first place. The question "Have I seen this before?" is easy to answer using another hash-based idiom:

    my @tags;
    my %seen;
    while (@tags < 10) {
        my $candidate = Memories::Tag->retrieve_random;
        next if $seen{$candidate->name}++;
        push @tags, $candidate;
    }

Once again you're collecting unique values by putting them into a hash. Suppose that the first candidate tag that comes along is japan. At this point the hash is empty, and so $seen{japan} is zero. Because it's zero, the code continues through the loop with next, but still increments the hash value. Now $seen{japan} stands at 1, and the tag goes into the list. If japan comes past again, $seen{japan} will be positive so the next code will activate, and the tag will not go into the list again. (Don't try this if you have fewer than 10 distinct tags, of course!)

Caching

A special case of "have I seen this before?" comes when you want to create a cache: if I have seen this before, what was the answer I saw last time? Here's how to do that.

I mentioned that the underlying database abstraction layer in the tags example might cache look-ups and return the same object if you requested the same tag multiple times:

    my %cache;

    sub retrieve {
        my ($self, $id) = @_;
        return $cache{$id} if exists $cache{$id};
        return $cache{$id} = $self->_hard_retrieve($id);
    }

This checks to see if it has seen the $id before; if so have, return the value from the hash. If not, work out what the value should be, then store it in the hash for next time, then return the whole lot.

Of course, for heavy-duty applications like a database abstraction layer, you need to do a little more work, such as pruning the cache to make sure it doesn't get out of date or get so full of data that it eats all your memory. The Cache::Cache suite of modules from CPAN takes care of all of this for you, and the core Memoize module adds this kind of caching to a function without you specifically having to write the cache-getting and -setting code.

Searching

Finally, you can use hashes for searching.

If you've done any computer science, you'll probably know about a couple of common searching algorithms. (Even if you haven't, they're so common that you've probably reinvented them without knowing it.) One is linear search, where you start at the beginning of a list and work your way toward the end, looking for the target item:

    my $index;
    for $index (0..@chambers) {
        last if $chambers[$index] == $bullet;
    }
    print "Found at index $index" if $index < @chambers;

Another is binary search, where you start in the middle of a sorted list, and work out whether you should go higher or lower than the current element, like players in some demented version of Card Sharks. This is basically the way to look up names in a phone book--it's faster, but a bit more complicated to implement:

    my @names = qw(Able Baker Charlie Dog ...);
    my $index;

    my $target = "Roger";

    sub search {
        my ($lower, $upper) = (0, $#names);
        while ($lower <= $upper) {
            my $index = ($lower + $upper) / 2;
            if ($names[$index] lt $target) {
                $lower = $index + 1;
            } elsif ($names[$index] gt $target) {
                $upper = $index--1;
            } else { return $index }
        }
        # Not found!
    }

This starts in the middle, at "Mike". "Go higher!" shouts the crowd, so it goes half-way between there and the end, to "Tare". Then it needs to go lower, so looks half-way between "Mike" and "Tare" and gets to "Peter". Now it needs to go higher again, between "Peter" and "Tare", and finds "Roger".

That takes four comparisons, which is not bad. Can you do it any better? How about... oh, none?

    my %search = map { $names[$_] => $_ } 0..$#names;
    print $names{"Roger"};

Now I'm cheating somewhat here because setting up the hash requires iterating over the whole array, but once you've done that, the searches are basically free.

Of course, if you need to look up an array index by its contents, then maybe you're doing something wrong in the first place, and should have used a hash to start with. Consider, for instance, in a configuration file:

    force-v3-sigs
    escape-from-lines
    lock-once
    load-extension rndlinux
    keyserver wwwkeys.eu.pgp.net
    keyserver the.earth.li

(Yes, that's from GnuPG.) Each line has a key and optionally a value, but you can have multiple values for each key. At the end of the day, you might want to look through and say "tell me all the keyservers"--but not just the keyservers, you want to read the whole config in and be able to say that about any key. That's a search problem, and a tricky one. Here's one solution:

    while (<>) {
        chomp;
        my ($key, $value) = split /\s+/, $_, 2;
        push @config, [$key, $value];
    }

Now you have to say:

    @keyservers = map { $_->[1] } grep { $_->[0] eq "keyserver" } @config;

This is actually a linear search in disguise. (grep does a linear search for you.)

With hashes, the issue is much simpler:

    while (<>) {
        chomp;
        my ($key, $value) = split /\s+/, $_, 2;
        push @{$config{$key}}, $value;
    }

This treats every key as a separate array reference, and pushes values into that array. Now to retrieve the list of keyservers, just look up the array inside the hash:

    @keyservers = @{$config{"keyservers"}};

This process is very remniscient of the final pattern--using a hash as a portable symbol table.

Dispatch Tables

Instead of having the configuration reader create a bunch of arrays--@keyservers, @load_extension and so on--I created a hash which held the arrays so as to look them up indirectly but more efficiently. In effect, instead of using the Perl symbol table, you can use a hash as a portable symbol table.

Suppose you have a script that does several related things: it manages your to-do list by adding, editing, listing, and deleting to-do items:

    % todo add "Email Samuel about photos"
    Todo item 129 created
    % todo done 129
    Item 129 marked as done

You might expect the script to look like:

    my $command = shift @ARGV;
    if    ($command eq "add")  { add(@ARGV)  }
    elsif ($command eq "list") { list(@ARGV) }
    elsif ($command eq "done") { done(@ARGV) }
    elsif ($command eq "edit") { edit(@ARGV) }
    ...
    else { die "Unknown command: $command" }

That is quite tedious; you need to edit the program in several places every time you add a new command. You could use symbolic references--that is, tell Perl to call a function named $command:

    my $command = shift @ARGV;
    sub AUTOLOAD { die "Unknown command: $AUTOLOAD" }
    no strict 'refs';
    &{$command}(@ARGV);

But that's somewhat crazy. It allows the user to get at any subroutine in the main package, which you may not want, and to keep any error checking you have to assume that any undefined subroutine call comes from the command line.

The middle way is to copy the commands into a hash, mapped to a function reference:

    %commands = (
        add  => \&add,
        list => \&list,
        edit => \&edit,
        done => \&done,
    );
    my $command = shift @ARGV;
    if (!exists $commands{$command}) { die "Unknown command: $command" }
    $commands{$command}->(@ARGV);

This keeps strict happy, it's safe in the way it restricts what subroutines users can call, and it allows for error checking that doesn't mess everything else up. Mark Jason Dominus' Higher Order Perl shows how you can define commands at runtime if you use dispatch tables, something you can't do if you hard-code your dispatch.

Conclusion

I've explored some of the most common hash-based patterns: using hashes for counting, uniqueness, searching, and dispatch--rather a lot more than just mapping from one thing to another. Of course, that is what a hash does at one level, but the uses of such a data structure are a lot more diverse than just that.

That's how you improve your Perl programming--you take elements of the language which ostensibly do one thing, and you find that they're great for more complicated uses as well. Maybe after these ideas you'll be able to find a few more hash idioms of your own!

Visit the home of the Perl programming language: Perl.org

Sponsored by

Monthly Archives

Powered by Movable Type 5.13-en