Sign In/My Account | View Cart  
advertisement


Listen Print

A Refactoring Example
by Michael Schwern | Pages: 1, 2, 3, 4, 5

Fix Conflicting Styles

Now the only difference between the statements is the choice of data ranges.

Using the variables are passed into each statement we can make some more deductions. Let's have a look...

  • $startdate
  • $dataoggi
  • $dataieri
  • $fine30gg, $inizio30gg

One of these things is not like the other. What's $startdate doing there? Everything else is talking about 'data'. What's 'ieri'? 'oggi'? Remember, the programmer who submitted this code is Italian. Maybe the names are in Italian. Grabbing an Italian-English dictionary we find out that 'data' is Italian for 'date'! Now it makes sense, this code was probably originally written in English, then worked on by an Italian (or vice-versa).

This code has committed a cardinal stylistic sin. It uses two different languages for naming variables. Not just different languages, languages which have different meanings for the same words. Taken out of context, we can't know if $data represents a hunk of facts or "Thursday."

Since the styles conflict, one of them has to go. Since I don't speak Italian, I'm going to translate it into English.

Pulling out our Italian-to-English dictionary...

  • "riga" is "line"
  • "pagina" is "page"
  • "nump" is probably short for "numero pagina" which is "page number"
  • "data" is "date"
  • "oggi" is "today"
  • "ieri" is "yesterday"
  • "inizio" is "start"
  • "fine" is "end"
  • "gg" is probably short for "giorni" which is "days"
    • "fine30gg" would then be "the end of 30 days"
    • "inizio30gg" would be "the beginning of 30 days"

It would be a straightforward matter of a bunch of search-and-replaces in any good editor but for one snag, the SQL column 'data.' We'd like to change this to its English 'date', but databases are very global with possibly lots of other programs using it. So we can't change the column name without breaking other code. While in a well-organized programming shop you might have the ability to find all the code which uses your database, we won't assume we have that luxury here. For the moment then, we'll leave that be and deal with it in a separate refactoring.

my $totalvisit_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM   lognew 
WHERE  pageid =  ? AND 
       data   >= ?
SQL

my $totalvisittoday_sth = $dbh->prepare(<<'SQL');
SELECT count(*) 
FROM   lognew 
WHERE  pageid = ? AND 
       data   = ?
SQL

my $totalmvisit_sth = $dbh->prepare(<<'SQL');
SELECT count(*) 
FROM   lognew 
WHERE  pageid =  ? AND
       data   <= ? AND 
       data   >= ?
SQL

open (INPUT, "< $filepageid") || &file_open_error("$filepageid");

while ($line=<INPUT>){
    $page_num++;
    chop($line);
    $pages[$page_num] = $line;

    $totalvisit_sth->execute($pages[$page_num], $start_date);
    $totalvisit[$page_num] = $totalvisit_sth->fetchrow_array();

    $totalvisittoday_sth->execute($pages[$page_num], $today);
    $totalvisittoday[$page_num] = $totalvisittoday_sth->fetchrow_array();

    $totalvisittoday_sth->execute($pages[$page_num], $yesterday);
    $totalyvisit[$page_num] = $totalvisittoday_sth->fetchrow_array();

    $totalmvisit_sth->execute($pages[$page_num], $end_of_30_days,
                                                 $start_of_30_days);
    $totalmvisit[$page_num] = $totalmvisit_sth->fetchrow_array();
}

close (INPUT);

Test. Log. Check in.

Better Names

With decent variable names in place, the purpose of the program becomes much clearer. This is a program to calculate the number of visits to a page for various date ranges. Based on this new information we can give the statement handles and the arrays they put data into better names.

Looking at the SQL we see we've got:

  • One to get all the visits up to a single day.
  • One to get the visits for a certain date.
  • One to get the visits for a range of dates.

A good set of new names would be:

  • daily
  • up to
  • range

Also, Total Visits is too long. We could shorten that to just Visits, or even shorter to Hits.

my $hits_upto_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM   lognew 
WHERE  pageid =  ? AND 
       data   >= ?
SQL

my $hits_daily_sth = $dbh->prepare(<<'SQL');
SELECT count(*) 
FROM   lognew 
WHERE  pageid = ? AND 
       data   = ?
SQL

my $hits_range_sth = $dbh->prepare(<<'SQL');
SELECT count(*) 
FROM   lognew 
WHERE  pageid =  ? AND
       data   <= ? AND 
       data   >= ?
SQL

open (INPUT, "< $filepageid") || &file_open_error("$filepageid");

while ($line=<INPUT>){
    $page_num++;
    chop($line);
    $pages[$page_num] = $line;

    $hits_upto_sth->execute($pages[$page_num], $start_date);
    $totalvisit[$page_num] = $hits_upto_sth->fetchrow_array();

    $hits_daily_sth->execute($pages[$page_num], $today);
    $totalvisittoday[$page_num] = $hits_daily_sth->fetchrow_array();

    $hits_daily_sth->execute($pages[$page_num], $yesterday);
    $totalyvisit[$page_num] = $hits_daily_sth->fetchrow_array();

    $hits_range_sth->execute($pages[$page_num], $end_of_30_days,
                                                $start_of_30_days);
    $totalmvisit[$page_num] = $hits_range_sth->fetchrow_array();
}

close (INPUT);

Test. Log. Check in.

Changing Global Variable Names

The array names need work, too. Currently, they're rather ambiguous. @totalyvisit, what does the y mean? Looking at each variable name and the variables that got passed to execute() to produce it...

  • @totalvisit comes up to a $start_date. So that can be @hits_upto
  • @totalvisittoday comes from $today and is pretty obvious. @hits_today
  • @totalyvisit comes from $yesterday so 'y' must be for 'yesterday'. @hits_yesterday
  • @totalmvisit comes from the range produced by the $start_of_30_days and the $end_of_30_days. So 'm' must be 'month'. @hits_monthly
my $hits_upto_sth = $dbh->prepare(<<'SQL');
SELECT count(*)
FROM   lognew 
WHERE  pageid =  ? AND 
       data   >= ?
SQL

my $hits_daily_sth = $dbh->prepare(<<'SQL');
SELECT count(*) 
FROM   lognew 
WHERE  pageid = ? AND 
       data   = ?
SQL

my $hits_range_sth = $dbh->prepare(<<'SQL');
SELECT count(*) 
FROM   lognew 
WHERE  pageid =  ? AND
       data   <= ? AND 
       data   >= ?
SQL

open (INPUT, "< $filepageid") || &file_open_error("$filepageid");

while ($line=<INPUT>){
    $page_num++;
    chop($line);
    $pages[$page_num] = $line;

    $hits_upto_sth->execute($pages[$page_num], $start_date);
    $hits_upto[$page_num] = $hits_upto_sth->fetchrow_array();

    $hits_daily_sth->execute($pages[$page_num], $today);
    $hits_today[$page_num] = $hits_daily_sth->fetchrow_array();

    $hits_daily_sth->execute($pages[$page_num], $yesterday);
    $hits_yesterday[$page_num] = $hits_daily_sth->fetchrow_array();

    $hits_range_sth->execute($pages[$page_num], $end_of_30_days,
                                                $start_of_30_days);
    $hits_monthly[$page_num] = $hits_range_sth->fetchrow_array();
}

close (INPUT);

Test... uh-oh, test failed!

There's something very different about this change compared to the others. The variables we changed were not declared in our little code block. Likely they're used in other parts of the code, such as our test which caused it to break.

In the Real World, we would be sure to replace all occurrences of the variable. The simplest way to do this is to use your editor to perform a search and replace rather than doing it by your all too fallible hands. If it could be used over a set of files, grepping through those files for all occurrences of it and changing those as well would be necessary.

# If you don't have rgrep, grep -r does the same thing.
rgrep '[@$]totalvisit' /path/to/your/project

I do this so often that I've taken to calling grep -r, 'Refactoring Grep'. Other languages who's syntax is -- ummm -- not as inspired as Perl's, such as Java, C++ and Python, have tools for doing this sort of thing automatically. Because of the complexity of Perl's syntax, we still have to do it mostly by hand, though there are some efforts underway to rectify this.

Changing the array names in our test as well we get them to pass.

Log. Check in.

Pages: 1, 2, 3, 4, 5

Next Pagearrow