Ten Essential Development Practices
by Damian Conway
Don't try to fix the problem straight away, though. Instead, immediately add those tests to your test suite. If that testing has been well set up, that can often be as simple as adding a couple of entries to a table:
my %plural_of = (
    'mouse'         => 'mice',
    'house'         => 'houses',
    'ox'            => 'oxen',
    'box'           => 'boxes',
    'goose'         => 'geese',
    'mongoose'      => 'mongooses',
    'law'           => 'laws',
    'mother-in-law' => 'mothers-in-law',
    # Sascha's bug, reported 27 August 2004...
    'man'           => 'men',
    'woman'         => 'women',
);
The point is: if the original test suite didn't report this bug, then that test suite was broken. It simply didn't do its job (finding bugs) adequately. Fix the test suite first by adding tests that cause it to fail:
> perl inflections.t
ok 1 - house -> houses
ok 2 - law -> laws
ok 3 - man -> men
ok 4 - mongoose -> mongooses
ok 5 - goose -> geese
ok 6 - ox -> oxen
not ok 7 - woman -> women
# Failed test (inflections.t at line 20)
# got: 'womans'
# expected: 'women'
ok 8 - mother-in-law -> mothers-in-law
ok 9 - mouse -> mice
ok 10 - box -> boxes
1..10
# Looks like you failed 1 tests of 10.
Once the test suite is detecting the problem correctly, then you'll be able to tell when you've correctly fixed the actual bug, because the tests will once again fall silent.
This approach to debugging is most effective when the test suite covers the full range of manifestations of the problem. When adding test cases for a bug, don't just add a single test for the simplest case. Make sure you include the obvious variations as well:
my %plural_of = (
    'mouse'         => 'mice',
    'house'         => 'houses',
    'ox'            => 'oxen',
    'box'           => 'boxes',
    'goose'         => 'geese',
    'mongoose'      => 'mongooses',
    'law'           => 'laws',
    'mother-in-law' => 'mothers-in-law',
    # Sascha's bug, reported 27 August 2004...
    'man'           => 'men',
    'woman'         => 'women',
    'human'         => 'humans',
    'man-at-arms'   => 'men-at-arms',
    'lan'           => 'lans',
    'mane'          => 'manes',
    'moan'          => 'moans',
);
The more thoroughly you test the bug, the more completely you will fix it.
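The table-driven test described above can be sketched as follows. This is only a minimal illustration, not the article's actual inflections.t: the plural() subroutine and its %irregular table are hypothetical stand-ins for whatever code is really under test, written just so the sketch runs on its own. It assumes the standard Test::More module.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for the code under test (an assumption, not the
# real implementation): a few irregular nouns plus two simple rules.
my %irregular = (
    mouse => 'mice',
    ox    => 'oxen',
    goose => 'geese',
    man   => 'men',
    woman => 'women',
);

sub plural {
    my ($word) = @_;

    # Hyphenated compounds: inflect only the head word (mother-in-law)
    if ( my ($head, $rest) = $word =~ m{\A ([^-]+) (-.+) \z}xms ) {
        return plural($head) . $rest;
    }

    return $irregular{$word} if exists $irregular{$word};
    return $word . 'es'      if $word =~ m{ (?: x | s | ch | sh ) \z}xms;
    return $word . 's';
}

# The test table: covering a new bug report means adding entries here
my %plural_of = (
    'mouse'         => 'mice',
    'house'         => 'houses',
    'ox'            => 'oxen',
    'box'           => 'boxes',
    'goose'         => 'geese',
    'mongoose'      => 'mongooses',
    'law'           => 'laws',
    'mother-in-law' => 'mothers-in-law',
    'man'           => 'men',
    'woman'         => 'women',
    'human'         => 'humans',
    'man-at-arms'   => 'men-at-arms',
    'lan'           => 'lans',
    'mane'          => 'manes',
    'moan'          => 'moans',
);

# One test per table entry, in a stable (sorted) order
for my $word (sort keys %plural_of) {
    is( plural($word), $plural_of{$word}, "$word -> $plural_of{$word}" );
}

done_testing();
```

Because the loop is driven entirely by the table, a new bug report costs one or more new hash entries and no new test logic.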
10. Don't Optimize Code--Benchmark It
If you need a function to remove duplicate elements of an array, it's natural to think that a "one-liner" like this:
sub uniq { return keys %{ { map {$_=>1} @_ } } }
will be more efficient than two statements:
sub uniq {
    my %seen;
    return grep {!$seen{$_}++} @_;
}
Unless you are deeply familiar with the internals of the Perl interpreter (in which case you already have far more serious personal issues to deal with), intuitions about the relative performance of two constructs are exactly that: unconscious guesses.
The only way to know for sure which of two--or more--alternatives will perform better is to actually time each of them. The standard Benchmark module makes that easy:
# A short list of not-quite-unique values...
our @data = qw( do re me fa so la ti do );
# Various candidates...
sub unique_via_anon {
    return keys %{ { map {$_=>1} @_ } };
}
sub unique_via_grep {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}
sub unique_via_slice {
    my %uniq;
    @uniq{@_} = ();
    return keys %uniq;
}
# Compare the current set of data in @data
sub compare {
    my ($title) = @_;
    print "\n[$title]\n";
    # Create a comparison table of the various timings, making sure that
    # each test runs at least 10 CPU seconds...
    use Benchmark qw( cmpthese );
    cmpthese -10, {
        anon  => 'my @uniq = unique_via_anon(@data)',
        grep  => 'my @uniq = unique_via_grep(@data)',
        slice => 'my @uniq = unique_via_slice(@data)',
    };
    return;
}
compare('8 items, 10% repetition');
# Two copies of the original data...
@data = (@data) x 2;
compare('16 items, 56% repetition');
# One hundred copies of the original data...
@data = (@data) x 50;
compare('800 items, 99% repetition');
The cmpthese() subroutine takes a number, followed by a reference to a hash of tests. The number specifies either the exact number of times to run each test (if the number is positive), or the minimum number of CPU seconds to run each test for (if the number is negative, in which case its absolute value is used). Typical values are around 10,000 repetitions or ten CPU seconds, but the module will warn you if the test is too short to produce an accurate benchmark.
The keys of the test hash are the names of your tests, and the corresponding
values specify the code to be tested. Those values can be either strings (which
are eval'd to produce executable code) or subroutine references
(which are called directly).
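As a hedged sketch of the two calling conventions just described, the following passes subroutine references rather than strings and uses a positive count. One practical difference: a string is compiled by an eval inside the Benchmark module, so it cannot see the caller's lexical (my) variables, which is why the listing above declares @data with our. A closure has no such restriction. The count of 200_000 is an arbitrary choice for illustration.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# A lexical this time: closures can see it, eval'd strings could not
my @data = qw( do re me fa so la ti do );

sub unique_via_grep {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

sub unique_via_slice {
    my %uniq;
    @uniq{@_} = ();
    return keys %uniq;
}

# Positive first argument: run each test exactly that many times.
# Subroutine references are called directly by Benchmark.
cmpthese 200_000, {
    grep  => sub { my @uniq = unique_via_grep(@data);  return },
    slice => sub { my @uniq = unique_via_slice(@data); return },
};
```

On a fast machine Benchmark may still warn that there were too few iterations for a reliable count; raising the count, or switching to a negative number of CPU seconds, addresses that.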
The benchmarking code shown above would print out something like the following:
[8 items, 10% repetition]
         Rate  anon  grep slice
anon  28234/s    --  -24%  -47%
grep  37294/s   32%    --  -30%
slice 53013/s   88%   42%    --
[16 items, 56% repetition]
         Rate  anon  grep slice
anon  21283/s    --  -28%  -51%
grep  29500/s   39%    --  -32%
slice 43535/s  105%   48%    --
[800 items, 99% repetition]
         Rate  anon  grep slice
anon    536/s    --  -65%  -89%
grep   1516/s  183%    --  -69%
slice  4855/s  806%  220%    --
Each of the tables printed has a separate row for each named test. The first column lists the absolute speed of each candidate in repetitions per second, while the remaining columns allow you to compare the relative performance of any two tests. For example, in the final test, tracing across the grep row to the anon column reveals that the grepped solution was 183 percent faster than using an anonymous hash (that is, it ran at roughly 2.8 times the rate: 1516 versus 536 repetitions per second). Tracing further across the same row also indicates that grepping was 69 percent slower (-69 percent faster) than slicing.
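The percentages can be checked by hand from the Rate column. The sketch below assumes each cell is computed as 100 * (row rate / column rate - 1), which reproduces the figures in the final table. relative_percent() is a hypothetical helper written for this illustration, not part of Benchmark.

```perl
use strict;
use warnings;

# Rates (repetitions per second) copied from the 800-item table
my %rate = ( anon => 536, grep => 1516, slice => 4855 );

# How much faster (positive) or slower (negative) the row test is
# than the column test, as a rounded percentage
sub relative_percent {
    my ($row, $col) = @_;
    return sprintf '%.0f%%', 100 * ( $rate{$row} / $rate{$col} - 1 );
}

print 'grep vs anon:  ', relative_percent('grep',  'anon'),  "\n";   # 183%
print 'grep vs slice: ', relative_percent('grep',  'slice'), "\n";   # -69%
print 'slice vs anon: ', relative_percent('slice', 'anon'),  "\n";   # 806%
```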
Overall, the indication from the three tests is that the slicing-based solution is consistently the fastest for this particular set of data on this particular machine. It also appears that as the data set increases in size, slicing also scales much better than either of the other two approaches.
However, those two conclusions are effectively drawn from only three data points (namely, the three benchmarking runs). To get a more definitive comparison of the three methods, you'd also need to test other possibilities, such as a long list of non-repeating items, or a short list with nothing but repetitions.
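A rough sketch of those two extra cases follows. The sizes (10,000 distinct integers, and eight copies of a single word) and the one-CPU-second time limit are arbitrary assumptions chosen only to keep the run short. The candidate subroutines are repeated from the earlier listing so that the sketch stands alone.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

sub unique_via_anon {
    return keys %{ { map {$_=>1} @_ } };
}

sub unique_via_grep {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

sub unique_via_slice {
    my %uniq;
    @uniq{@_} = ();
    return keys %uniq;
}

# Two further data shapes (sizes are illustrative assumptions)
my %data_for = (
    'long list, no repetition'   => [ 1 .. 10_000 ],
    'short list, all repetition' => [ ('do') x 8 ],
);

for my $title (sort keys %data_for) {
    my @data = @{ $data_for{$title} };
    print "\n[$title]\n";

    # Negative count: run each test for at least one CPU second
    cmpthese -1, {
        anon  => sub { my @uniq = unique_via_anon(@data);  return },
        grep  => sub { my @uniq = unique_via_grep(@data);  return },
        slice => sub { my @uniq = unique_via_slice(@data); return },
    };
}
```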
Better still, test on the real data that you'll actually be "unique-ing."
For example, if that data is a sorted list of a quarter of a million words, with only minimal repetitions, and which has to remain sorted, then test exactly that:
use Perl6::Slurp;    # assumption: slurp() is not a built-in; Perl6::Slurp is one CPAN source of it
our @data = slurp '/usr/share/biglongwordlist.txt';
use Benchmark qw( cmpthese );
cmpthese 10, {
    # Note: the non-grepped solutions need a post-uniqification re-sort
    anon  => 'my @uniq = sort(unique_via_anon(@data))',
    grep  => 'my @uniq = unique_via_grep(@data)',
    slice => 'my @uniq = sort(unique_via_slice(@data))',
};
Not surprisingly, this benchmark indicates that the grepped solution is
markedly superior on a large sorted data set:
      s/iter  anon slice  grep
anon    4.28    --   -3%  -46%
slice   4.15    3%    --  -44%
grep    2.30   86%   80%    --
Perhaps more interestingly, the grepped solution still benchmarks as being
marginally faster when the two hash-based approaches aren't re-sorted. This
suggests that the better scalability of the sliced solution as seen in the
earlier benchmark is a localized phenomenon, and is eventually undermined by
the growing costs of allocation, hashing, and bucket-overflows as the sliced
hash grows very large.
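The re-sort in the benchmark above is needed because hash keys come back in no particular order, whereas grep returns elements in the order it first saw them. A small sketch of that difference, reusing two of the candidates on the original eight-item list:

```perl
use strict;
use warnings;

sub unique_via_grep {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

sub unique_via_slice {
    my %uniq;
    @uniq{@_} = ();
    return keys %uniq;
}

my @data = qw( do re me fa so la ti do );

# grep keeps the first occurrence of each value, in input order
my @via_grep = unique_via_grep(@data);
print "grep:           @via_grep\n";      # do re me fa so la ti

# keys() order is unspecified, so sort before relying on it
my @via_slice = sort( unique_via_slice(@data) );
print "slice (sorted): @via_slice\n";     # do fa la me re so ti
```

Both candidates find the same seven values. Only the grepped version can skip the sort when the input is already in the required order.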
Above all, that last example demonstrates that benchmarks only benchmark the cases you actually benchmark, and that you can only draw useful conclusions about performance from benchmarking real data.
- Re: Rethinking bracing as it is done now...
2005-07-18 09:17:41 MaxDemian
Although I do agree with the first poster as to brace placement, that's just my personal preference.
But I have read that the more compact bracing style, which is often referred to as K&R style, is actually not the method preferred by K&R. They both prefer newlines before opening braces (but, like me, think it's a minor issue). When publishing a book, the space saved adds up to a considerable number of pages and a considerable amount of money, so they, like most authors, had no choice in the matter.
I can't find a link to that, and even if I did it may be apocryphal, but since it suits my personal preference I like to believe it.
- Rethinking bracing as it is done now...
2005-07-15 09:55:36 KLEstes
<soapbox>
One thing that really bugs me is modern formatting practice. Now I know my way isn't The_One_True_Way (indeed, that is one of the great things about Perl culture, there being more than one way to do it - that might even make a very interesting t-shirt from thinkgeek, but I digress) but hear me out.
if ( $sigil eq '$' ) {
if ( $subsigil eq '?' ) {
$sym_table{ substr( $var_name, 2 ) }
= delete $sym_table{ locate_orig_var($var) };
$internal_count++;
$has_internal{$var_name}++;
}
else {
${$var_ref} = q{$sym_table{$var_name}};
$external_count++;
$has_external{$var_name}++;
}
}
elsif ( $sigil eq '@' && $subsigil eq '?' ) {
@{ $sym_table{$var_name} }
= grep {defined $_} @{ $sym_table{$var_name} };
}
elsif ( $sigil eq '%' && $subsigil eq '?' ) {
delete $sym_table{$var_name}{$EMPTY_STR};
}
else {
${$var_ref} = q{$sym_table{$var_name}};
}
Admittedly, this is better code than the spaghetti that preceded it. Yet ask yourself, if one brace was missing - how easy could it be found ? Then ask yourself "if a brace is but a placeholder, why should it have its own place _before_ the block of code it demarcates ?" In doing so, it breaks the flow of visual emphasis - is there enough reason to warrant that ? To be fair, braces can imply much implicit meaning (such as closures, or local context) but most of the time, do they really warrant uglifying the program ?
Now consider this:
if ( $sigil eq '$' )
{
if ( $subsigil eq '?' )
{
$sym_table{ substr( $var_name, 2 ) }
= delete $sym_table{locate_orig_var($var) };
$internal_count++;
$has_internal{$var_name}++;
}
else
{
${$var_ref} = q{$sym_table{$var_name}};
$external_count++;
$has_external{$var_name}++;
}
}
elsif ( $sigil eq '@' && $subsigil eq '?' )
{
@{ $sym_table{$var_name} }
= grep {defined $_} @{ $sym_table{$var_name} };
}
elsif ( $sigil eq '%' && $subsigil eq '?' )
{
delete $sym_table{$var_name}{$EMPTY_STR};
}
else
{
${$var_ref} = q{$sym_table{$var_name}};
}
Notice how the braces fall in line with the code blocks. True, there are more lines this way, and that is one of the reasons why the bracing style is used so much in books - it saves precious (and costly) publishing real estate. In a modern source code editor, this should not be a concern.
If a brace was missing, would it be easier to find ? Of course it would. Braces are not given exaggerated importance either. The code may even "feel" different to you - the change in the visual impact makes for a "smoother" experience.
In short, there is no one way that is best for everyone. Consider why you do things, and what your style is buying you, and you will arrive at the way that is best to you. Then you can use perltidy and conform with the rest of the neanderthals :)
</soapbox>
Also one thing Damian might have mentioned that is mentioned in "Pragmatic Version Control" is that using perltidy on a file under version control might be a Baaaad Thing to Do. The book is so big and chock full of wisdom, he probably just had to draw the line somewhere :)
From what little I've seen of the book, even a contentious old-fart-in-training like me will learn scads - Thanks Damian !!!
- Rethinking bracing as it is done now...
2005-07-15 16:26:21 DamianConway
<<Yet ask yourself, if one brace was missing - how easy could it be found?>>
I'm not sure I'm convinced this is a major issue. Any decent text editor should quickly be able to find an unmatched brace.
More relevantly, any reasonable layout scheme should make identifying missing braces easy. For example, in the K&R style I suggest, every opening brace is at the end of the line on which its controlling construct appears. So a braced-construct keyword without a brace at the EOL is a missing opening brace. And under K&R every closing brace is on its own line at the same indentation as the controlling keyword. So if you scan down from a keyword and don't find a brace at the same indentation, that's where the missing closing brace is.
<<...there are more lines this way, and that is one of the reasons why the bracing style is used so much in books - it saves precious (and costly) publishing real estate. In a modern source code editor, this should not be a concern.>>
Actually, I think it still *is* a concern. Screen real estate (80x24) is typically even more limited than textbook real estate (90x50)...at least, it is at the screen font sizes my failing eyes can cope with! ;-)
<<In short, there is no one way that is best for everyone. Consider why you do things, and what your style is buying you, and you will arrive at the way that is best to you.>>
You just summed up my Chapter 1 in two sentences. Well done! :-)
<<From what little I've seen of the book, even a contentious old-fart-in-training like me will learn scads>>
I sincerely hope so. I certainly learned scads as I was writing it.
- Rethinking bracing as it is done now...
2006-09-20 14:15:44 JadeNB
<<I'm not sure I'm convinced this is a major issue. Any decent text editor should quickly be able to find an unmatched brace.>>
I'm not sure about Emacs, but vim (the One True Editor) chokes on
while ( <> ) {
/}/
}
Of course this snippet is unlikely on its own, but I had a similar problem just today when writing a primitive TeX checker.
- GRRRRRR ! ARRRRGH !
2005-07-15 09:57:12 KLEstes
I am so sorry - I had no idea perl.com would UTTERLY DESTROY my formatting, making my dissertation UTTERLY MOOT.
<sigh>
- GRRRRRR ! ARRRRGH !
2005-07-15 17:33:13 emdee1
You didn't know that html collapses white space?
- GRRRRRR ! ARRRRGH !
2005-07-18 12:39:51 KLEstes
yeah, I guess I kinda thought perl.com would be advanced enough to preserve formatting - silly me !
- Getopt::Long et al
2005-07-15 05:45:56 JohanLindström
The section about command line options really should mention some Getopt::* modules.
/J
- Getopt::Long et al
2005-07-15 16:29:28 DamianConway
Good point. Chapter 14 of "Perl Best Practices", from which that excerpt was taken, certainly does mention the numerous Getopt:: modules.