A Refactoring Example
by Michael Schwern
|
Pages: 1, 2, 3, 4, 5
Fix Conflicting Styles
Now the only difference between the statements is the choice of
data ranges.
Using the variables are passed into each statement we can make some more deductions. Let's have a look...
$startdate$dataoggi$dataieri$fine30gg, $inizio30gg
One of these things is not like the other. What's $startdate
doing there? Everything else is talking about 'data'. What's 'ieri'?
'oggi'? Remember, the programmer who submitted this code is Italian.
Maybe the names are in Italian. Grabbing an Italian-English dictionary we find
out that 'data' is Italian for 'date'! Now it makes sense, this code
was probably originally written in English, then worked on by an
Italian (or vice-versa).
This code has committed a cardinal stylistic sin. It uses two
different languages for naming variables. Not just different
languages, languages which have different meanings for the same words.
Taken out of context, we can't know if $data represents a hunk of
facts or "Thursday."
Since the styles conflict, one of them has to go. Since I don't speak Italian, I'm going to translate it into English.
Pulling out our Italian-to-English dictionary...
- "riga" is "line"
- "pagina" is "page"
- "nump" is probably short for "numero pagina" which is "page number"
- "data" is "date"
- "oggi" is "today"
- "ieri" is "yesterday"
- "inizio" is "start"
- "fine" is "end"
- "gg" is probably short for "giorni" which is "days"
- "fine30gg" would then be "the end of 30 days"
- "inizio30gg" would be "the beginning of 30 days"
It would be a straightforward matter of a bunch of search-and-replaces in any good editor but for one snag, the SQL column 'data.' We'd like to change this to its English 'date', but databases are very global with possibly lots of other programs using it. So we can't change the column name without breaking other code. While in a well-organized programming shop you might have the ability to find all the code which uses your database, we won't assume we have that luxury here. For the moment then, we'll leave that be and deal with it in a separate refactoring.
my $totalvisit_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data >= ?
SQL
my $totalvisittoday_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data = ?
SQL
my $totalmvisit_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data <= ? AND
data >= ?
SQL
open (INPUT, "< $filepageid") || &file_open_error("$filepageid");
while ($line=<INPUT>){
$page_num++;
chop($line);
$pages[$page_num] = $line;
$totalvisit_sth->execute($pages[$page_num], $start_date);
$totalvisit[$page_num] = $totalvisit_sth->fetchrow_array();
$totalvisittoday_sth->execute($pages[$page_num], $today);
$totalvisittoday[$page_num] = $totalvisittoday_sth->fetchrow_array();
$totalvisittoday_sth->execute($pages[$page_num], $yesterday);
$totalyvisit[$page_num] = $totalvisittoday_sth->fetchrow_array();
$totalmvisit_sth->execute($pages[$page_num], $end_of_30_days,
$start_of_30_days);
$totalmvisit[$page_num] = $totalmvisit_sth->fetchrow_array();
}
close (INPUT);
Test. Log. Check in.
Better Names
With decent variable names in place, the purpose of the program becomes much clearer. This is a program to calculate the number of visits to a page for various date ranges. Based on this new information we can give the statement handles and the arrays they put data into better names.
Looking at the SQL we see we've got:
- One to get all the visits up to a single day.
- One to get the visits for a certain date.
- One to get the visits for a range of dates.
A good set of new names would be:
- daily
- up to
- range
Also, Total Visits is too long. We could shorten that to just Visits, or even shorter to Hits.
my $hits_upto_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data >= ?
SQL
my $hits_daily_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data = ?
SQL
my $hits_range_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data <= ? AND
data >= ?
SQL
open (INPUT, "< $filepageid") || &file_open_error("$filepageid");
while ($line=<INPUT>){
$page_num++;
chop($line);
$pages[$page_num] = $line;
$hits_upto_sth->execute($pages[$page_num], $start_date);
$totalvisit[$page_num] = $hits_upto_sth->fetchrow_array();
$hits_daily_sth->execute($pages[$page_num], $today);
$totalvisittoday[$page_num] = $hits_daily_sth->fetchrow_array();
$hits_daily_sth->execute($pages[$page_num], $yesterday);
$totalyvisit[$page_num] = $hits_daily_sth->fetchrow_array();
$hits_range_sth->execute($pages[$page_num], $end_of_30_days,
$start_of_30_days);
$totalmvisit[$page_num] = $hits_range_sth->fetchrow_array();
}
close (INPUT);
Test. Log. Check in.
Changing Global Variable Names
The array names need work, too. Currently, they're rather ambiguous.
@totalyvisit, what does the y mean? Looking at each variable
name and the variables that got passed to execute() to produce it...
@totalvisitcomes up to a$start_date. So that can be@hits_upto@totalvisittodaycomes from$todayand is pretty obvious.@hits_today@totalyvisitcomes from$yesterdayso 'y' must be for 'yesterday'.@hits_yesterday@totalmvisitcomes from the range produced by the $start_of_30_days and the $end_of_30_days. So 'm' must be 'month'.@hits_monthly
my $hits_upto_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data >= ?
SQL
my $hits_daily_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data = ?
SQL
my $hits_range_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM lognew
WHERE pageid = ? AND
data <= ? AND
data >= ?
SQL
open (INPUT, "< $filepageid") || &file_open_error("$filepageid");
while ($line=<INPUT>){
$page_num++;
chop($line);
$pages[$page_num] = $line;
$hits_upto_sth->execute($pages[$page_num], $start_date);
$hits_upto[$page_num] = $hits_upto_sth->fetchrow_array();
$hits_daily_sth->execute($pages[$page_num], $today);
$hits_today[$page_num] = $hits_daily_sth->fetchrow_array();
$hits_daily_sth->execute($pages[$page_num], $yesterday);
$hits_yesterday[$page_num] = $hits_daily_sth->fetchrow_array();
$hits_range_sth->execute($pages[$page_num], $end_of_30_days,
$start_of_30_days);
$hits_monthly[$page_num] = $hits_range_sth->fetchrow_array();
}
close (INPUT);
Test... uh-oh, test failed!
There's something very different about this change compared to the others. The variables we changed were not declared in our little code block. Likely they're used in other parts of the code, such as our test which caused it to break.
In the Real World, we would be sure to replace all occurrences of the variable. The simplest way to do this is to use your editor to perform a search and replace rather than doing it by your all too fallible hands. If it could be used over a set of files, grepping through those files for all occurrences of it and changing those as well would be necessary.
# If you don't have rgrep, grep -r does the same thing.
rgrep '[@$]totalvisit' /path/to/your/project
I do this so often that I've taken to calling grep -r, 'Refactoring Grep'. Other languages who's syntax is -- ummm -- not as inspired as Perl's, such as Java, C++ and Python, have tools for doing this sort of thing automatically. Because of the complexity of Perl's syntax, we still have to do it mostly by hand, though there are some efforts underway to rectify this.
Changing the array names in our test as well we get them to pass.
Log. Check in.

