A Refactoring Example
by Michael Schwern
|
Pages: 1, 2, 3, 4, 5
Improve Overly Generic Names
Continuing with our variable name improvements, we're left with the
last few unimproved names. Let's start with $line.
Since we can see clearly that $line = <INPUT>, calling the variable
'line' tells us nothing new. A better name might be what each line
contains. Looking at how the line is used we see $pages[$page_num]
= $line and how that is then used in the SQL. It's a page id.
But it doesn't make much sense to put a page id into an array called
@pages. It doesn't contain pages, it contains @page_ids.
What about $page_num? It doesn't contain a page number, it contains
the line number of the file we're reading in. Or more conventionally,
an $index or $idx.
my $hits_upto_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data >= ?
SQL
my $hits_daily_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data = ?
SQL
my $hits_range_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data <= ? AND
data >= ?
SQL
open (INPUT, "< $filepageid") || &file_open_error("$filepageid");
while ($page_id=<INPUT>){
$idx++;
chop($page_id);
$page_ids[$idx] = $page_id;
$hits_upto_sth->execute($page_ids[$idx], $start_date);
$hits_upto[$idx] = $hits_upto_sth->fetchrow_array();
$hits_daily_sth->execute($page_ids[$idx], $today);
$hits_today[$idx] = $hits_daily_sth->fetchrow_array();
$hits_daily_sth->execute($page_ids[$idx], $yesterday);
$hits_yesterday[$idx] = $hits_daily_sth->fetchrow_array();
$hits_range_sth->execute($page_ids[$idx], $end_of_30_days,
$start_of_30_days);
$hits_monthly[$idx] = $hits_range_sth->fetchrow_array();
}
close (INPUT);
Test. Log. Check in.
Fixing Odd Interfaces
What's wrong with this picture?
$hits_range_sth->execute($page_ids[$idx], $end_of_30_days,
$start_of_30_days);
Isn't it a little odd to specify a date range with the end first? Sure is. It also guarantees someone is going to get it backwards. Reverse it. Don't forget the SQL, too.
my $hits_upto_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data >= ?
SQL
my $hits_daily_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data = ?
SQL
my $hits_range_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data >= ? AND
data <= ?
SQL
open (INPUT, "< $filepageid") || &file_open_error("$filepageid");
while ($page_id=<INPUT>){
$idx++;
chop($page_id);
$page_ids[$idx] = $page_id;
$hits_upto_sth->execute($page_ids[$idx], $start_date);
$hits_upto[$idx] = $hits_upto_sth->fetchrow_array();
$hits_daily_sth->execute($page_ids[$idx], $today);
$hits_today[$idx] = $hits_daily_sth->fetchrow_array();
$hits_daily_sth->execute($page_ids[$idx], $yesterday);
$hits_yesterday[$idx] = $hits_daily_sth->fetchrow_array();
$hits_range_sth->execute($page_ids[$idx], $start_of_30_days,
$end_of_30_days);
$hits_monthly[$idx] = $hits_range_sth->fetchrow_array();
}
close (INPUT);
Test. Log. Check in.
s/chop/chomp/
Now that we've stared at the code for a while, you might have noticed
the use of chop(). Using chop() to strip a newline is asking
for portability problems, so let's fix it by using chomp().
Technically this isn't a refactoring since we altered the behavior
of the code by fixing the bug. But using chop() where you meant
chomp() is such a common mistake we'll make it an honorary
refactoring.
my $hits_upto_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data >= ?
SQL
my $hits_daily_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data = ?
SQL
my $hits_range_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data >= ? AND
data <= ?
SQL
open (INPUT, "< $filepageid") || &file_open_error("$filepageid");
while ($page_id=<INPUT>){
$idx++;
chomp($page_id);
$page_ids[$idx] = $page_id;
$hits_upto_sth->execute($page_ids[$idx], $start_date);
$hits_upto[$idx] = $hits_upto_sth->fetchrow_array();
$hits_daily_sth->execute($page_ids[$idx], $today);
$hits_today[$idx] = $hits_daily_sth->fetchrow_array();
$hits_daily_sth->execute($page_ids[$idx], $yesterday);
$hits_yesterday[$idx] = $hits_daily_sth->fetchrow_array();
$hits_range_sth->execute($page_ids[$idx], $start_of_30_days,
$end_of_30_days,);
$hits_monthly[$idx] = $hits_range_sth->fetchrow_array();
}
close (INPUT);
Test. Log. Check in.
Collect Related Variables into Hashes
The common prefix hits_ is a dead giveaway that much of the data in
this code is related. Related variables should be grouped together
into a single structure, probably a hash to make the relation obvious
and allow them to be passed around to subroutines more easily. Its
easier to move around one hash than four arrays.
I've decided to collect together all the @hit_ arrays into a single
hash %hits since they'll probably be used together parts of the
program. If this code snippet represents a function it means I can
return one hash reference rather than four array refs. It also makes
future expansion easier, rather than returning an additional array it
simply becomes another key in the hash.
Before.
$hits{upto}[$idx] = $hits_upto_sth->fetchrow_array();
After.
$hits_upto[$idx] = $hits_upto_sth->fetchrow_array();
It's interesting to note what a small, natural change this is. Circumstantial evidence that this is a good refactoring.
As before, since these arrays are global data, we must be sure to change them everywhere. This includes the tests.
my $hits_upto_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data >= ?
SQL
my $hits_daily_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data = ?
SQL
my $hits_range_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data >= ? AND
data <= ?
SQL
open (INPUT, "< $filepageid") || &file_open_error("$filepageid");
while ($page_id=<INPUT>){
$idx++;
chomp($page_id);
$page_ids[$idx] = $page_id;
$hits_upto_sth->execute($page_ids[$idx], $start_date);
$hits{upto}[$idx] = $hits_upto_sth->fetchrow_array();
$hits_daily_sth->execute($page_ids[$idx], $today);
$hits{today}[$idx] = $hits_daily_sth->fetchrow_array();
$hits_daily_sth->execute($page_ids[$idx], $yesterday);
$hits{yesterday}[$idx] = $hits_daily_sth->fetchrow_array();
$hits_range_sth->execute($page_ids[$idx], $start_of_30_days,
$end_of_30_days,);
$hits{monthly}[$idx] = $hits_range_sth->fetchrow_array();
}
close (INPUT);
Test. Log. Check in.
When Not to Refactor
The statement handles are also related, but I'm not going to collect them together into a hash. The statement handles are short-lived lexicals, they're never likely to be passed around. Their short scope and grouping within the code makes their relationship obvious. The design would not be improved by the refactoring.
Refactoring is not a set of rules to be slavishly followed, it's a collection of tools. And like any other tool you must carefully consider when and when not to use it. Since collecting the statement handles together doesn't improve the design, I won't do it.

