Hash Crash Course
by Simon Cozens
|
Pages: 1, 2
Caching
A special case of "have I seen this before?" comes when you want to create a cache: if I have seen this before, what was the answer I saw last time? Here's how to do that.
I mentioned that the underlying database abstraction layer in the tags example might cache look-ups and return the same object if you requested the same tag multiple times:
my %cache;
sub retrieve {
my ($self, $id) = @_;
return $cache{$id} if exists $cache{$id};
return $cache{$id} = $self->_hard_retrieve($id);
}
This checks to see if it has seen the $id before; if so have, return the value from the hash. If not, work out what the value should be, then store it in the hash for next time, then return the whole lot.
Of course, for heavy-duty applications like a database abstraction layer, you need to do a little more work, such as pruning the cache to make sure it doesn't get out of date or get so full of data that it eats all your memory. The Cache::Cache suite of modules from CPAN takes care of all of this for you, and the core Memoize module adds this kind of caching to a function without you specifically having to write the cache-getting and -setting code.
Searching
Finally, you can use hashes for searching.
If you've done any computer science, you'll probably know about a couple of common searching algorithms. (Even if you haven't, they're so common that you've probably reinvented them without knowing it.) One is linear search, where you start at the beginning of a list and work your way toward the end, looking for the target item:
my $index;
for $index (0..@chambers) {
last if $chambers[$index] == $bullet;
}
print "Found at index $index" if $index < @chambers;
Another is binary search, where you start in the middle of a sorted list, and work out whether you should go higher or lower than the current element, like players in some demented version of Card Sharks. This is basically the way to look up names in a phone book--it's faster, but a bit more complicated to implement:
my @names = qw(Able Baker Charlie Dog ...);
my $index;
my $target = "Roger";
sub search {
my ($lower, $upper) = (0, $#names);
while ($lower <= $upper) {
my $index = ($lower + $upper) / 2;
if ($names[$index] lt $target) {
$lower = $index + 1;
} elsif ($names[$index] gt $target) {
$upper = $index--1;
} else { return $index }
}
# Not found!
}
This starts in the middle, at "Mike". "Go higher!" shouts the crowd, so it goes half-way between there and the end, to "Tare". Then it needs to go lower, so looks half-way between "Mike" and "Tare" and gets to "Peter". Now it needs to go higher again, between "Peter" and "Tare", and finds "Roger".
That takes four comparisons, which is not bad. Can you do it any better? How about... oh, none?
my %search = map { $names[$_] => $_ } 0..$#names;
print $names{"Roger"};
Now I'm cheating somewhat here because setting up the hash requires iterating over the whole array, but once you've done that, the searches are basically free.
Of course, if you need to look up an array index by its contents, then maybe you're doing something wrong in the first place, and should have used a hash to start with. Consider, for instance, in a configuration file:
force-v3-sigs
escape-from-lines
lock-once
load-extension rndlinux
keyserver wwwkeys.eu.pgp.net
keyserver the.earth.li
(Yes, that's from GnuPG.) Each line has a key and optionally a value, but you can have multiple values for each key. At the end of the day, you might want to look through and say "tell me all the keyservers"--but not just the keyservers, you want to read the whole config in and be able to say that about any key. That's a search problem, and a tricky one. Here's one solution:
while (<>) {
chomp;
my ($key, $value) = split /\s+/, $_, 2;
push @config, [$key, $value];
}
Now you have to say:
@keyservers = map { $_->[1] } grep { $_->[0] eq "keyserver" } @config;
This is actually a linear search in disguise. (grep does a linear search for you.)
With hashes, the issue is much simpler:
while (<>) {
chomp;
my ($key, $value) = split /\s+/, $_, 2;
push @{$config{$key}}, $value;
}
This treats every key as a separate array reference, and pushes values into that array. Now to retrieve the list of keyservers, just look up the array inside the hash:
@keyservers = @{$config{"keyservers"}};
This process is very remniscient of the final pattern--using a hash as a portable symbol table.
Dispatch Tables
Instead of having the configuration reader create a bunch of arrays--@keyservers, @load_extension and so on--I created a hash which held the arrays so as to look them up indirectly but more efficiently. In effect, instead of using the Perl symbol table, you can use a hash as a portable symbol table.
Suppose you have a script that does several related things: it manages your to-do list by adding, editing, listing, and deleting to-do items:
% todo add "Email Samuel about photos"
Todo item 129 created
% todo done 129
Item 129 marked as done
You might expect the script to look like:
my $command = shift @ARGV;
if ($command eq "add") { add(@ARGV) }
elsif ($command eq "list") { list(@ARGV) }
elsif ($command eq "done") { done(@ARGV) }
elsif ($command eq "edit") { edit(@ARGV) }
...
else { die "Unknown command: $command" }
That is quite tedious; you need to edit the program in several places every time you add a new command. You could use symbolic references--that is, tell Perl to call a function named $command:
my $command = shift @ARGV;
sub AUTOLOAD { die "Unknown command: $AUTOLOAD" }
no strict 'refs';
&{$command}(@ARGV);
But that's somewhat crazy. It allows the user to get at any subroutine in the main package, which you may not want, and to keep any error checking you have to assume that any undefined subroutine call comes from the command line.
The middle way is to copy the commands into a hash, mapped to a function reference:
%commands = (
add => \&add,
list => \&list,
edit => \&edit,
done => \&done,
);
my $command = shift @ARGV;
if (!exists $commands{$command}) { die "Unknown command: $command" }
$commands{$command}->(@ARGV);
This keeps strict happy, it's safe in the way it restricts what subroutines users can call, and it allows for error checking that doesn't mess everything else up. Mark Jason Dominus' Higher Order Perl shows how you can define commands at runtime if you use dispatch tables, something you can't do if you hard-code your dispatch.
Conclusion
I've explored some of the most common hash-based patterns: using hashes for counting, uniqueness, searching, and dispatch--rather a lot more than just mapping from one thing to another. Of course, that is what a hash does at one level, but the uses of such a data structure are a lot more diverse than just that.
That's how you improve your Perl programming--you take elements of the language which ostensibly do one thing, and you find that they're great for more complicated uses as well. Maybe after these ideas you'll be able to find a few more hash idioms of your own!
You must be logged in to the O'Reilly Network to post a talkback.
Showing messages 1 through 8 of 8.
2006-11-23 04:59:56 techcode [Reply]
Hashes are great for getting data from web form into a database - and the other way around.
If you fetch your data as a hash(ref) as described in previous posts - you can simply give that hash to the template engine of your choice (HTML::Template, TemplateToolkit ...) or to say HTML::FillInForm in case of an edit form for existing record.
Simples way is to have same names for fields in the table as in the web form - then you can just pass that hash(ref) around.
You can also get hashref with form fields from CGI.pm (or CGI::Simple) by calling Vars() method.
Of all the things Perl has - hashes are what I mostly use.
- Duplicate...
2006-11-07 07:30:31 MarcChapaux [Reply]
Ooops. I've just seen that Kjetil had already posted a comment on the same subject (hashes as records).
- Hashes and the DBI
2006-11-07 07:28:33 MarcChapaux [Reply]
Hashes can be used when retrieving SQL data. The DBI fetchrow_hashref() method does this automatically.
Thus one can write (assuming for instance that $customer is a statement handle for a SELECT from the CUSTOMER table): $customer->{customer_id}, $customer->{customer_name}, etc.
Using an array is much less readable ($customer->[0], $customer->[5]) and it makes the Perl code dependent on the ordering of columns retrieved by the SELECT statement.
- Typo ?
2006-11-07 07:15:57 MarcChapaux [Reply]
This code
@popular = (sort { $histogram{$b} <=> $histogram{$b} } @unique)[0..4];
should probably be
@popular = (sort { $histogram{$a} <=> $histogram{$b} } @unique)[0..4];
- Hashes as records
2006-11-06 17:07:37 Kjetil [Reply]
In perl, record handling is ofte done with hashes. Espescially in handling data to or from databases. Rows can be read out as hashes, tables as arrays of hashrefs. $person{FirstName}, $person{BirthDate}, $person{Gender} and so on.
- Thanks, and a couple of typos
2006-11-05 15:16:20 pmccann [Reply]
While I don't have a new hash usage to add (sadly), I did want to say thanks for a really well written and useful article.
(A couple of typos seem to have crept in: apologies if formatting's skew here, there's no preview button and no indication of formatting conventions, so I'm guessing plain text!)
$upper = $index--1; # kill one of the minus signs
print $names{"Roger"};
# should be
print $search{"roger"};- Thanks, and a couple of typos
2006-11-06 15:17:12 pmccann [Reply]
Err, "Roger", in the second typo. Sigh...
- Thanks, and a couple of typos



