Listen Print

A Refactoring Example

by Michael Schwern
October 09, 2003

About a year ago, a person asked the Fun With Perl mailing list about some code they had written to do database queries. It's important to note that this person was posting from an .it address; why will become apparent later. The code was reading records in from a text file and then doing a series of queries based on that information. They wanted to make it faster.

Here's his code:

open (INPUT, "< $filepageid") || &file_open_error("$filepageid");

while ($riga=<INPUT>){
$nump++;
chop($riga);
$pagina[$nump] = $riga;

$sth= $dbh->prepare("SELECT count(*) FROM lognew WHERE
pageid='$pagina[$nump]' and data>='$startdate'");
$sth->execute;
$totalvisit[$nump] = $sth->fetchrow_array();

$sth = $dbh->prepare("SELECT count(*) FROM lognew WHERE
(pageid='$pagina[$nump]' and data='$dataoggi')");
$sth->execute;
$totalvisittoday[$nump] = $sth->fetchrow_array();

 $sth = $dbh->prepare("SELECT count(*) FROM lognew WHERE
(pageid='$pagina[$nump]' and data='$dataieri')");
$sth->execute;
$totalyvisit[$nump] = $sth->fetchrow_array();

 $sth= $dbh->prepare("SELECT count(*) FROM lognew WHERE
(pageid='$pagina[$nump]' and data<='$fine30gg' and data>='$inizio30gg')");
$sth->execute;
$totalmvisit[$nump] = $sth->fetchrow_array();

 }

I decided that rather than try to read through this code and figure out what it's doing and how to make it faster, I'd clean it up first. Clean it up before you figure out how it works? Yes, using a technique called Refactoring.

Refactoring?

In his book, Martin Fowler defines Refactoring as "the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure." In other words, you clean up your code but don't change what it does.

Refactoring can be as simple as changing this code:

$exclamation = 'I like '.$pastry.'!';

To this:

$exclamation = "I like $pastry!";

Still does the same thing, but it's easier to read.

It's important to note that I don't need to know anything about the contents of $pastry or how $exclamation is used. The change is completely self-contained and does not affect surrounding code or change what it does. This is Refactoring.

On the principle of "show me don't tell me," rather than talk about it, we'll dive right into refactoring our bit of code.

Fix the Indentation

Your first impulse when faced with a hunk of code slammed against the left margin is to indent it. This is our first refactoring.

open (INPUT, "< $filepageid") || &file_open_error("$filepageid");

while ($riga=<INPUT>){
    $nump++;
    chop($riga);
    $pagina[$nump] = $riga;

    $sth= $dbh->prepare("SELECT count(*) FROM lognew WHERE
                         pageid='$pagina[$nump]' and data>='$startdate'");
    $sth->execute;
    $totalvisit[$nump] = $sth->fetchrow_array();

    $sth = $dbh->prepare("SELECT count(*) FROM lognew WHERE
                          (pageid='$pagina[$nump]' and data='$dataoggi')");
    $sth->execute;
    $totalvisittoday[$nump] = $sth->fetchrow_array();

    $sth = $dbh->prepare("SELECT count(*) FROM lognew WHERE
                          (pageid='$pagina[$nump]' and data='$dataieri')");
    $sth->execute;
    $totalyvisit[$nump] = $sth->fetchrow_array();

    $sth= $dbh->prepare("SELECT count(*) FROM lognew WHERE
                         (pageid='$pagina[$nump]' and data<='$fine30gg' 
						 and data>='$inizio30gg')");
    $sth->execute;
    $totalmvisit[$nump] = $sth->fetchrow_array();
}

close (INPUT);

Already it looks better. We can see that we're iterating over a file, performing some SELECTs on each line and shoving the results into a bunch of arrays.

A Single, Simple Change

One of the most important principles of Refactoring is that you work in small steps. This re-indentation is a single step. And part of this single step includes running the test suite, logging the change, and checking it into CVS.

Checking into CVS after something this simple? Yes. Many programmers ask the question, "When should I check in?" When you're refactoring it's simple: check in when you've done one refactoring and have tested that it works. Our re-indentation is one thing; we test that it works and check it in.

This may seem excessive, but it prevents us from entangling two unrelated changes together. By doing one change at a time we know that any new bugs were introduced by that one change. Also, you will often decide in the middle of a refactoring that it's not such a good idea. When you've checked in at every one you can simply rollback to the last version rather than having to undo it by hand. Convenient, and you're sure no stray bits of your aborted change are hanging around.

So our procedure for doing a proper refactoring is:

  • Make one logical change to the code.
  • Make sure it passes tests.
  • Log and check in.

Big Refactorings from Small

The goal of this refactoring is to make the code go faster. One of the simplest ways to do achieve that is to pull necessary code out of the loop. Preparing four new statements in every iteration of the loop seems really unnecessary. We'd like to pull those prepare() statements out of the loop. This is a refactoring. To achieve this larger refactoring, a series of smaller refactorings must be done.

Use Bind Variables

Each time through the loop, a new set of SQL statements is created based on the line read in. But they're all basically the same, just the data is changing. If we could pull that data out of the statement we'd be closer to our goal of pulling the prepare()s out of the loop.

So my next refactoring pulls variables out of the SQL statements and replaces them with placeholders. Then the data is bound to the statement using bind variables. This means we're now prepare()ing the same statements every time through the loop.

Before refactoring:

$sth= $dbh->prepare("SELECT count(*) FROM lognew WHERE
                     pageid='$pagina[$nump]' and data>='$startdate'");
$sth->execute;

After refactoring:

$sth= $dbh->prepare('SELECT count(*) FROM lognew WHERE 
                     pageid=? and data>=?');
$sth->execute($pagina[$nump], $startdate);

Bind variables also protect against a naughty user from trying to slip some extra SQL into your program via the data you read in. As a side-effect of our code cleanup, we've closed a potential security hole.

open (INPUT, "< $filepageid") || &file_open_error("$filepageid");

while ($riga=<INPUT>){
    $nump++;
    chop($riga);
    $pagina[$nump] = $riga;

    $sth= $dbh->prepare('SELECT count(*) FROM lognew WHERE 
                         pageid=? and data>=?');
    $sth->execute($pagina[$nump], $startdate);
    $totalvisit[$nump] = $sth->fetchrow_array();

    $sth = $dbh->prepare('SELECT count(*) FROM lognew WHERE
                          (pageid=? and data=?)');
    $sth->execute($pagina[$nump], $dataoggi);
    $totalvisittoday[$nump] = $sth->fetchrow_array();

    $sth = $dbh->prepare('SELECT count(*) FROM lognew WHERE
                          (pageid=? and data=?)');
    $sth->execute($pagina[$nump], $dataieri);
    $totalyvisit[$nump] = $sth->fetchrow_array();

    $sth= $dbh->prepare('SELECT count(*) FROM lognew WHERE
                         (pageid=? and data<=? and data>=?)');
    $sth->execute($pagina[$nump], $fine30gg, $inizio30gg);
    $totalmvisit[$nump] = $sth->fetchrow_array();
}

close (INPUT);

Test. Log. Check in.

Split a Poorly Reused Variable

The next stumbling block to pulling the prepare() statements out of the loop is that they all use the same variable, $sth. We'll have to change it so they all use different variables. While we're at it, we'll name those statement handles something more descriptive of what the statement does. Since at this point we haven't figured out what the statements do, we can base the name on the array it gets assigned to.

While we're at it, throw in some my() declarations to limit the scope of these variables to just the loop.

open (INPUT, "< $filepageid") || &file_open_error("$filepageid");

while ($riga=<INPUT>){
    $nump++;
    chop($riga);
    $pagina[$nump] = $riga;

    my $totalvisit_sth = $dbh->prepare('SELECT count(*) FROM lognew WHERE 
                                        pageid=? and data>=?');
    $totalvisit_sth->execute($pagina[$nump], $startdate);
    $totalvisit[$nump] = $totalvisit_sth->fetchrow_array();

    my $totalvisittoday_sth = $dbh->prepare('SELECT count(*) FROM lognew WHERE
                                             (pageid=? and data=?)');
    $totalvisittoday_sth->execute($pagina[$nump], $dataoggi);
    $totalvisittoday[$nump] = $totalvisittoday_sth->fetchrow_array();

    my $totalyvisit_sth = $dbh->prepare('SELECT count(*) FROM lognew WHERE
                                         (pageid=? and data=?)');
    $totalyvisit_sth->execute($pagina[$nump], $dataieri);
    $totalyvisit[$nump] = $totalyvisit_sth->fetchrow_array();

    my $totalmvisit_sth= $dbh->prepare('SELECT count(*) FROM lognew WHERE
                                        (pageid=? and data<=? and data>=?)');
    $totalmvisit_sth->execute($pagina[$nump], $fine30gg, $inizio30gg);
    $totalmvisit[$nump] = $totalmvisit_sth->fetchrow_array();
}

close (INPUT);

Test. Log. Check in.

Pages: 1, 2, 3, 4, 5

Next Pagearrow





Contact Us | Advertise with Us | Privacy Policy | Press Center | Jobs | Submissions Guidelines

Copyright © 2000-2008 O’Reilly Media, Inc. All Rights Reserved. | (707) 827-7000 / (800) 998-9938
All trademarks and registered trademarks appearing on the O'Reilly Network are the property of their respective owners.

For problems or assistance with this site, email