
How to Avoid Writing Code
by Kake Pugh

What Does This Get Me?

The immediate benefits of all this are obvious:

  • You don't have to mess about with HTML, since the very simple use of the Template Toolkit means that templates are comprehensible to competent web designers.
  • You don't have to maintain classes full of copy-and-paste code, since the repetitive programming tasks like creating constructors and simple accessors are done for you.

A large hidden benefit is testing. Since the actual CGI scripts--which can be a pain to test--are so simple, you can concentrate most of your energy on testing the underlying modules.

It's probably worth writing a couple of simple tests to make sure that you've set up your classes the way you intended to, particularly in your first couple of forays into Class::DBI.

  use Test::More tests => 5;
  use strict;

  use_ok( "Bookworms::Author" );
  use_ok( "Bookworms::Book" );
  my $author = Bookworms::Author->create({ name => "Isaac Asimov" });
  isa_ok( $author, "Bookworms::Author" );
  my $book = Bookworms::Book->create({ title  => "Foundation",
                                       author => $author });
  isa_ok( $book, "Bookworms::Book" );
  is( $book->author->name, "Isaac Asimov", "right author" );

However, the big testing win from moving the heavy lifting out of the CGI scripts and into modules comes when you'd like to add something more complicated: fuzzy matching, for example. It's well known that people can't spell, and you'd like someone typing in "Isaac Assimov" to find the author they're looking for. So, let's process the author names as we create the author objects, and store some kind of canonicalized form in the database.

Class::DBI allows you to define "triggers"--methods that are called at given points during the lifetime of an object. We'll want to use an after_create trigger, which is called after an object has been created and stored in the database. We use this in preference to a before_create trigger, since we want to know the uid of the object, and this is only created (via the auto_increment primary key) once the object has been written to the database.

We use Search::InvertedIndex to store the canonicalized names, for quick access. We start with a very simple canonicalization--stripping out vowels and collapsing repeated letters. (I've found that this can pick up about half of name misspellings found in the wild, which is pretty impressive.)

We'll write a couple of tests before we move on to code. Here are some that check that our class is doing what we told it to--removing vowels and collapsing repeated consonants.

  use Test::More tests => 2;
  use strict;

  use Bookworms::Author;

  my $author = Bookworms::Author->create({ name => "Isaac Asimov" });
  my @matches = Bookworms::Author->fuzzy_match( name => "asemov" );
  is_deeply( \@matches, [ $author ], 
    "fuzzy matching catches wrong vowels" );
  @matches = Bookworms::Author->fuzzy_match( 
    name => "assimov" );
  is_deeply( \@matches, [ $author ], 
    "fuzzy matching catches repeated letters" );

We should also write some other tests to run our algorithms over various misspellings that we've captured from actual users, to give an idea of whether "what we told our class to do" is the right thing.
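
A table-driven test is one way to approach this. The sketch below is only an illustration: the misspellings are invented examples rather than captured user data, and canonicalise is a standalone copy of the vowel-stripping rule so that the test runs without a database or the Bookworms classes.

```perl
use strict;
use warnings;
use Test::More;

# Standalone copy of the canonicalisation rule, so that this test
# needs no database: lowercase, strip vowels, collapse repeats.
sub canonicalise {
    my ($word) = @_;
    return "" unless defined $word && length $word;
    $word = lc $word;
    $word =~ s/[aeiou]//g;     # remove vowels
    $word =~ s/(\w)\1+/$1/g;   # collapse repeated letters
    return $word;
}

# Invented examples (NOT real captured data):
# [ correct spelling, what the user typed, should the rule catch it? ]
my @cases = (
    [ "Asimov",    "Asemov",   1 ],
    [ "Asimov",    "Assimov",  1 ],
    [ "Tolkien",   "Tolkein",  1 ],
    [ "Pratchett", "Pratchet", 1 ],
    [ "Asimov",    "Azimof",   0 ],   # a known miss for this rule
);

for my $case (@cases) {
    my ($correct, $typed, $should_match) = @$case;
    my $matched = canonicalise($correct) eq canonicalise($typed) ? 1 : 0;
    is( $matched, $should_match,
        "'$typed' vs '$correct': " . ( $should_match ? "caught" : "missed" ) );
}

done_testing();
```

Recording the known misses alongside the hits means that a later change of algorithm shows up as a changed test result rather than a silent difference in behaviour.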

Here's the first addition to the Bookworms::Author class, to store the indexed data:

  use Search::InvertedIndex;

  my $database = Search::InvertedIndex::DB::Mysql->new(
                     -db_name    => "bookworms",
                     -username   => "username",
                     -password   => "password",
                     -hostname   => "",
                     -table_name => "sii_author",
                     -lock_mode  => "EX"
    ) or die "Couldn't set up db";

  my $map = Search::InvertedIndex->new( -database => $database )
    or die "Couldn't set up map";
  $map->add_group( -group => "author_name" );

  __PACKAGE__->add_trigger( after_create => sub {
      my $self = shift;
      # Index each word of the name under its canonicalised form.
      my $update = Search::InvertedIndex::Update->new(
          -group => "author_name",
          -index => $self->uid,
          -data  => $self->name,
          -keys  => { map { $self->_canonicalise($_) => 1 }
                      split(/\s+/, $self->name)
                    }
      );
      $map->update( -update => $update );
  } );

  sub _canonicalise {
      my ($class, $word) = @_;
      return "" unless $word;
      $word = lc($word);
      $word =~ s/[aeiou]//g;    # remove vowels
      $word =~ s/(\w)\1+/$1/g;  # collapse doubled 
                                # (or tripled, etc) letters
      return $word;
  }

(We'll also want similar triggers for after_update and after_delete, in order that our indexing is kept up to date with our data.)

Then we can write the fuzzy_match method:

  sub fuzzy_match {
      my ($class, %args) = @_;
      return () unless $args{name};
      # One canonicalised search term per word of the name
      my @terms = map { $class->_canonicalise($_) }
                        split(/\s+/, $args{name});
      my @leaves;
      foreach my $term (@terms) {
          push @leaves, Search::InvertedIndex::Query::Leaf->new(
              -key   => $term,
              -group => "author_name" );
      }

      my $query = Search::InvertedIndex::Query->new( -logic => 'and',
                                                     -leafs => \@leaves );
      my $result = $map->search( -query => $query );

      my @matches;
      my $num_results = $result->number_of_index_entries || 0;
      if ( $num_results ) {
          for my $i ( 1 .. $num_results ) {
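              my ($index, $data) = $result->entry( -number => $i - 1 );
              # $index is the author's uid; return the object itself,
              # as the tests expect, rather than the stored name.
              push @matches, $class->retrieve( $index );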
          }
      }

      return @matches;
  }
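
To experiment with the matching logic without MySQL or Search::InvertedIndex, the same idea can be sketched with plain hashes. This is a toy stand-in under stated assumptions, not the module's implementation: add_author and toy_fuzzy_match are hypothetical names, and the 'and' query is done by filtering author ids in memory.

```perl
use strict;
use warnings;

# Same rule as _canonicalise: lowercase, strip vowels, collapse repeats.
sub canonicalise {
    my ($word) = @_;
    return "" unless defined $word && length $word;
    $word = lc $word;
    $word =~ s/[aeiou]//g;
    $word =~ s/(\w)\1+/$1/g;
    return $word;
}

# Split a name into words and return the non-empty canonical keys.
sub keys_for {
    my ($name) = @_;
    return grep { length } map { canonicalise($_) } split /\s+/, $name;
}

my %index;   # canonical key => { author id => 1 }
my %names;   # author id     => name as entered

# Hypothetical helper: index an author under every canonical key.
sub add_author {
    my ($id, $name) = @_;
    $names{$id} = $name;
    $index{$_}{$id} = 1 for keys_for($name);
}

# Hypothetical helper: return names of authors matching ALL typed words.
sub toy_fuzzy_match {
    my ($typed) = @_;
    my @keys = keys_for($typed);
    return () unless @keys;
    my $first = shift @keys;
    my @ids = keys %{ $index{$first} || {} };
    for my $key (@keys) {
        @ids = grep { exists $index{$key} && $index{$key}{$_} } @ids;
    }
    return map { $names{$_} } sort { $a <=> $b } @ids;
}

add_author( 1, "Isaac Asimov" );
add_author( 2, "Arthur C. Clarke" );

print join( ", ", toy_fuzzy_match("Isaac Assimov") ), "\n";  # Isaac Asimov
print join( ", ", toy_fuzzy_match("asemov") ), "\n";         # Isaac Asimov
print join( ", ", toy_fuzzy_match("Clark") ), "\n";          # Arthur C. Clarke
```

Because every typed word must match, "Isaac Assimov" finds only authors indexed under both "sc" and "smv", which mirrors the 'and' logic passed to the query above.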

(The matching method can be improved. I've found that neither Text::Soundex nor Text::Metaphone is much of an improvement over the simple approach already detailed, but Text::DoubleMetaphone is definitely worth plugging in, to catch misspellings such as Nicolas/Nicholas and Asimov/Azimof.)
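
To see why those two pairs need something phonetic, it is enough to run them through the vowel-stripping rule. The short sketch below uses a standalone copy of the rule and only shows where the simple approach stops working; it does not call Text::DoubleMetaphone.

```perl
use strict;
use warnings;

# Standalone copy of the simple rule: lowercase, strip vowels,
# collapse repeated letters.
sub canonicalise {
    my ($word) = @_;
    return "" unless defined $word && length $word;
    $word = lc $word;
    $word =~ s/[aeiou]//g;
    $word =~ s/(\w)\1+/$1/g;
    return $word;
}

for my $pair ( [ "Asimov",  "Assimov"  ],
               [ "Nicolas", "Nicholas" ],
               [ "Asimov",  "Azimof"   ] ) {
    my ($right, $typed) = @$pair;
    my ($key_right, $key_typed) = map { canonicalise($_) } $right, $typed;
    printf "%-8s => %-6s %-9s => %-6s %s\n",
        $right, $key_right, $typed, $key_typed,
        $key_right eq $key_typed ? "same key" : "different keys";
}
```

The first pair collapses to the same key ("smv"), but "ncls" and "nchls" differ by the silent "h", and "smv" and "zmf" differ in consonants that merely sound alike.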

There are plenty of other features that our little web application would benefit from, but I shall leave those as an exercise for the reader. I hope I've given you some insight into my current preferred web development techniques--and I'd love to see a finished Bookworms application if it does scratch anyone's itch.
