Sign In/My Account | View Cart  
advertisement


Listen Print Discuss

Ten Essential Development Practices
by Damian Conway | Pages: 1, 2, 3, 4, 5, 6, 7, 8

Don't try to fix the problem straight away, though. Instead, immediately add those tests to your test suite. If that testing has been well set up, that can often be as simple as adding a couple of entries to a table:

my %plural_of = (
   'mouse'         => 'mice',
   'house'         => 'houses',
   'ox'            => 'oxen',
   'box'           => 'boxes',
   'goose'         => 'geese',
   'mongoose'      => 'mongooses', 
   'law'           => 'laws',
   'mother-in-law' => 'mothers-in-law', 

   # Sascha's bug, reported 27 August 2004...
   'man'           => 'men',
   'woman'         => 'women',
);

The point is: if the original test suite didn't report this bug, then that test suite was broken. It simply didn't do its job (finding bugs) adequately. Fix the test suite first by adding tests that cause it to fail:

> perl inflections.t
ok 1 - house -> houses
ok 2 - law -> laws
ok 3 - man -> men
ok 4 - mongoose -> mongooses
ok 5 - goose -> geese
ok 6 - ox -> oxen
not ok 7 - woman -> women
#     Failed test (inflections.t at line 20)
#          got: 'womans'
#     expected: 'women'
ok 8 - mother-in-law -> mothers-in-law
ok 9 - mouse -> mice
ok 10 - box -> boxes
1..10
# Looks like you failed 1 tests of 10.

Once the test suite is detecting the problem correctly, then you'll be able to tell when you've correctly fixed the actual bug, because the tests will once again fall silent.

This approach to debugging is most effective when the test suite covers the full range of manifestations of the problem. When adding test cases for a bug, don't just add a single test for the simplest case. Make sure you include the obvious variations as well:

my %plural_of = (
   'mouse'         => 'mice',
   'house'         => 'houses',
   'ox'            => 'oxen',
   'box'           => 'boxes',
   'goose'         => 'geese',
   'mongoose'      => 'mongooses', 
   'law'           => 'laws',
   'mother-in-law' => 'mothers-in-law', 

   # Sascha's bug, reported 27 August 2004...
   'man'           => 'men',
   'woman'         => 'women',
   'human'         => 'humans',
   'man-at-arms'   => 'men-at-arms', 
   'lan'           => 'lans',
   'mane'          => 'manes',
   'moan'          => 'moans',
);

The more thoroughly you test the bug, the more completely you will fix it.

10. Don't Optimize Code--Benchmark It

If you need a function to remove duplicate elements of an array, it's natural to think that a "one-liner" like this:

sub uniq { return keys %{ { map {$_=>1} @_ } } }

will be more efficient than two statements:

sub uniq {
   my %seen;
   return grep {!$seen{$_}++} @_;
}

Unless you are deeply familiar with the internals of the Perl interpreter (in which case you already have far more serious personal issues to deal with), intuitions about the relative performance of two constructs are exactly that: unconscious guesses.

The only way to know for sure which of two--or more--alternatives will perform better is to actually time each of them. The standard Benchmark module makes that easy:

# A short list of not-quite-unique values...
our @data = qw( do re me fa so la ti do );

# Various candidates...
sub unique_via_anon {
   return keys %{ { map {$_=>1} @_ } };
}

sub unique_via_grep {
   my %seen;
   return grep { !$seen{$_}++ } @_;
}

sub unique_via_slice {
   my %uniq;
   @uniq{@_} = ();
   return keys %uniq;
}

# Compare the current set of data in @data
sub compare {
   my ($title) = @_;
   print "\n[$title]\n";

   # Create a comparison table of the various timings, making sure that
   # each test runs at least 10 CPU seconds...
   use Benchmark qw( cmpthese );
   cmpthese -10, {
       anon  => 'my @uniq = unique_via_anon(@data)',
       grep  => 'my @uniq = unique_via_grep(@data)',
       slice => 'my @uniq = unique_via_slice(@data)',
   };

   return;
}

compare('8 items, 10% repetition');

# Two copies of the original data...
@data = (@data) x 2;
compare('16 items, 56% repetition');

# One hundred copies of the original data...
@data = (@data) x 50;
compare('800 items, 99% repetition');

The cmpthese() subroutine takes a number, followed by a reference to a hash of tests. The number specifies either the exact number of times to run each test (if the number is positive), or the absolute number of CPU seconds to run the test for (if the number is negative). Typical values are around 10,000 repetitions or ten CPU seconds, but the module will warn you if the test is too short to produce an accurate benchmark.

The keys of the test hash are the names of your tests, and the corresponding values specify the code to be tested. Those values can be either strings (which are eval'd to produce executable code) or subroutine references (which are called directly).

The benchmarking code shown above would print out something like the following:

[8 items, 10% repetitions]
        Rate anon  grep slice
anon  28234/s --  -24%  -47%
grep  37294/s   32% --  -30%
slice 53013/s   88% 42%    --

[16 items, 50% repetitions]
        Rate anon  grep slice
anon  21283/s --  -28%  -51%
grep  29500/s   39% --  -32%
slice 43535/s  105% 48%    --

[800 items, 99% repetitions]
       Rate  anon grep slice
anon   536/s --  -65%  -89%
grep  1516/s  183% --  -69%
slice 4855/s  806%  220% --

Each of the tables printed has a separate row for each named test. The first column lists the absolute speed of each candidate in repetitions per second, while the remaining columns allow you to compare the relative performance of any two tests. For example, in the final test tracing across the grep row to the anon column reveals that the grepped solution was 1.83 times (183 percent) faster than using an anonymous hash. Tracing further across the same row also indicates that grepping was 69 percent slower (-69 percent faster) than slicing.

Overall, the indication from the three tests is that the slicing-based solution is consistently the fastest for this particular set of data on this particular machine. It also appears that as the data set increases in size, slicing also scales much better than either of the other two approaches.

However, those two conclusions are effectively drawn from only three data points (namely, the three benchmarking runs). To get a more definitive comparison of the three methods, you'd also need to test other possibilities, such as a long list of non-repeating items, or a short list with nothing but repetitions.

Better still, test on the real data that you'll actually be "unique-ing."

For example, if that data is a sorted list of a quarter of a million words, with only minimal repetitions, and which has to remain sorted, then test exactly that:

our @data = slurp '/usr/share/biglongwordlist.txt';

use Benchmark qw( cmpthese );

cmpthese 10, {
    # Note: the non-grepped solutions need a post-uniqification re-sort
    anon  => 'my @uniq = sort(unique_via_anon(@data))',
    grep  => 'my @uniq = unique_via_grep(@data)',
    slice => 'my @uniq = sort(unique_via_slice(@data))',
};

Not surprisingly, this benchmark indicates that the grepped solution is markedly superior on a large sorted data set:

s/iter anon slice  grep
anon    4.28 --   -3%  -46%
slice   4.15 3%    --  -44%
grep    2.30 86%   80%    --

Perhaps more interestingly, the grepped solution still benchmarks as being marginally faster when the two hash-based approaches aren't re-sorted. This suggests that the better scalability of the sliced solution as seen in the earlier benchmark is a localized phenomenon, and is eventually undermined by the growing costs of allocation, hashing, and bucket-overflows as the sliced hash grows very large.

Above all, that last example demonstrates that benchmarks only benchmark the cases you actually benchmark, and that you can only draw useful conclusions about performance from benchmarking real data.