Sign In/My Account | View Cart  
advertisement


Listen Print

A Refactoring Example
by Michael Schwern | Pages: 1, 2, 3, 4, 5

Improve Overly Generic Names

Continuing with our variable name improvements, we're left with the last few unimproved names. Let's start with $line.

Since we can see clearly that $line = <INPUT>, calling the variable 'line' tells us nothing new. A better name might be what each line contains. Looking at how the line is used we see $pages[$page_num] = $line and how that is then used in the SQL. It's a page id.

But it doesn't make much sense to put a page id into an array called @pages. It doesn't contain pages, it contains @page_ids.

What about $page_num? It doesn't contain a page number, it contains the line number of the file we're reading in. Or more conventionally, an $index or $idx.

my $hits_upto_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM   lognew 
WHERE  pageid =  ? AND 
       data   >= ?
SQL

my $hits_daily_sth = $dbh->prepare(<<'SQL');
SELECT count(*) 
FROM   lognew 
WHERE  pageid = ? AND 
       data   = ?
SQL

my $hits_range_sth = $dbh->prepare(<<'SQL');
SELECT count(*) 
FROM   lognew 
WHERE  pageid =  ? AND
       data   <= ? AND 
       data   >= ?
SQL

open (INPUT, "< $filepageid") || &file_open_error("$filepageid");

while ($page_id=<INPUT>){
    $idx++;
    chop($page_id);
    $page_ids[$idx] = $page_id;

    $hits_upto_sth->execute($page_ids[$idx], $start_date);
    $hits_upto[$idx] = $hits_upto_sth->fetchrow_array();

    $hits_daily_sth->execute($page_ids[$idx], $today);
    $hits_today[$idx] = $hits_daily_sth->fetchrow_array();

    $hits_daily_sth->execute($page_ids[$idx], $yesterday);
    $hits_yesterday[$idx] = $hits_daily_sth->fetchrow_array();

    $hits_range_sth->execute($page_ids[$idx], $end_of_30_days,
                                                   $start_of_30_days);
    $hits_monthly[$idx] = $hits_range_sth->fetchrow_array();
}

close (INPUT);

Test. Log. Check in.

Fixing Odd Interfaces

What's wrong with this picture?

$hits_range_sth->execute($page_ids[$idx], $end_of_30_days,
                                               $start_of_30_days);

Isn't it a little odd to specify a date range with the end first? Sure is. It also guarantees someone is going to get it backwards. Reverse it. Don't forget the SQL, too.

my $hits_upto_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM   lognew 
WHERE  pageid =  ? AND 
       data   >= ?
SQL

my $hits_daily_sth = $dbh->prepare(<<'SQL');
SELECT count(*) 
FROM   lognew 
WHERE  pageid = ? AND 
       data   = ?
SQL

my $hits_range_sth = $dbh->prepare(<<'SQL');
SELECT count(*) 
FROM   lognew 
WHERE  pageid =  ? AND
       data   >= ? AND
       data   <= ? 
SQL

open (INPUT, "< $filepageid") || &file_open_error("$filepageid");

while ($page_id=<INPUT>){
    $idx++;
    chop($page_id);
    $page_ids[$idx] = $page_id;

    $hits_upto_sth->execute($page_ids[$idx], $start_date);
    $hits_upto[$idx] = $hits_upto_sth->fetchrow_array();

    $hits_daily_sth->execute($page_ids[$idx], $today);
    $hits_today[$idx] = $hits_daily_sth->fetchrow_array();

    $hits_daily_sth->execute($page_ids[$idx], $yesterday);
    $hits_yesterday[$idx] = $hits_daily_sth->fetchrow_array();

    $hits_range_sth->execute($page_ids[$idx], $start_of_30_days,
                                                   $end_of_30_days);
    $hits_monthly[$idx] = $hits_range_sth->fetchrow_array();
}

close (INPUT);

Test. Log. Check in.

s/chop/chomp/

Now that we've stared at the code for a while, you might have noticed the use of chop(). Using chop() to strip a newline is asking for portability problems, so let's fix it by using chomp().

Technically this isn't a refactoring since we altered the behavior of the code by fixing the bug. But using chop() where you meant chomp() is such a common mistake we'll make it an honorary refactoring.

my $hits_upto_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM   lognew 
WHERE  pageid =  ? AND 
       data   >= ?
SQL

my $hits_daily_sth = $dbh->prepare(<<'SQL');
SELECT count(*) 
FROM   lognew 
WHERE  pageid = ? AND 
       data   = ?
SQL

my $hits_range_sth = $dbh->prepare(<<'SQL');
SELECT count(*) 
FROM   lognew 
WHERE  pageid =  ? AND
       data   >= ? AND
       data   <= ? 
SQL

open (INPUT, "< $filepageid") || &file_open_error("$filepageid");

while ($page_id=<INPUT>){
    $idx++;
    chomp($page_id);
    $page_ids[$idx] = $page_id;

    $hits_upto_sth->execute($page_ids[$idx], $start_date);
    $hits_upto[$idx] = $hits_upto_sth->fetchrow_array();

    $hits_daily_sth->execute($page_ids[$idx], $today);
    $hits_today[$idx] = $hits_daily_sth->fetchrow_array();

    $hits_daily_sth->execute($page_ids[$idx], $yesterday);
    $hits_yesterday[$idx] = $hits_daily_sth->fetchrow_array();

    $hits_range_sth->execute($page_ids[$idx], $start_of_30_days,
                                                   $end_of_30_days,);
    $hits_monthly[$idx] = $hits_range_sth->fetchrow_array();
}

close (INPUT);

Test. Log. Check in.

Collect Related Variables into Hashes

The common prefix hits_ is a dead giveaway that much of the data in this code is related. Related variables should be grouped together into a single structure, probably a hash to make the relation obvious and allow them to be passed around to subroutines more easily. Its easier to move around one hash than four arrays.

I've decided to collect together all the @hit_ arrays into a single hash %hits since they'll probably be used together parts of the program. If this code snippet represents a function it means I can return one hash reference rather than four array refs. It also makes future expansion easier, rather than returning an additional array it simply becomes another key in the hash.

Before.

$hits{upto}[$idx] = $hits_upto_sth->fetchrow_array();

After.

$hits_upto[$idx]  = $hits_upto_sth->fetchrow_array();

It's interesting to note what a small, natural change this is. Circumstantial evidence that this is a good refactoring.

As before, since these arrays are global data, we must be sure to change them everywhere. This includes the tests.

my $hits_upto_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM   lognew 
WHERE  pageid =  ? AND 
       data   >= ?
SQL

my $hits_daily_sth = $dbh->prepare(<<'SQL');
SELECT count(*) 
FROM   lognew 
WHERE  pageid = ? AND 
       data   = ?
SQL

my $hits_range_sth = $dbh->prepare(<<'SQL');
SELECT count(*) 
FROM   lognew 
WHERE  pageid =  ? AND
       data   >= ? AND
       data   <= ? 
SQL

open (INPUT, "< $filepageid") || &file_open_error("$filepageid");

while ($page_id=<INPUT>){
    $idx++;
    chomp($page_id);
    $page_ids[$idx] = $page_id;

    $hits_upto_sth->execute($page_ids[$idx], $start_date);
    $hits{upto}[$idx] = $hits_upto_sth->fetchrow_array();

    $hits_daily_sth->execute($page_ids[$idx], $today);
    $hits{today}[$idx] = $hits_daily_sth->fetchrow_array();

    $hits_daily_sth->execute($page_ids[$idx], $yesterday);
    $hits{yesterday}[$idx] = $hits_daily_sth->fetchrow_array();

    $hits_range_sth->execute($page_ids[$idx], $start_of_30_days,
                                                   $end_of_30_days,);
    $hits{monthly}[$idx] = $hits_range_sth->fetchrow_array();
}

close (INPUT);

Test. Log. Check in.

When Not to Refactor

The statement handles are also related, but I'm not going to collect them together into a hash. The statement handles are short-lived lexicals, they're never likely to be passed around. Their short scope and grouping within the code makes their relationship obvious. The design would not be improved by the refactoring.

Refactoring is not a set of rules to be slavishly followed, it's a collection of tools. And like any other tool you must carefully consider when and when not to use it. Since collecting the statement handles together doesn't improve the design, I won't do it.

Pages: 1, 2, 3, 4, 5

Next Pagearrow