Recently in Books and Magazines Category

Genomic Perl

This is a book I have been looking forward to for a long time. Back when James Tisdall had just finished his Beginning Perl for Bioinformatics, I asked him to write an article about how to get into bioinf from a Perl programmer's perspective. With bioinformatics being a recently booming sphere and many scientists finding themselves in need of computer programmers to help them process their data, I thought it would be good if there was more information for programmers about the genomic side of things.

Rex Dwyer has produced a book, Genomic Perl, which bridges the gap. As well as teaching basic Perl programming techniques to biologists, it introduces many useful genetic concepts to those already familiar with programming. Of course, as a programmer and not a biologist, I'm by no means qualified to assess the quality of that side of the book, but I certainly learned a lot from it.

The first chapter, for instance, taught the basics of DNA transcription and translation, the basics of Perl string handling, and tied the two concepts together with an example of transcription in Perl. This is typical of the format of the book - each chapter introduces a genetic principle and a related problem, a Perl principle which can be used to solve the problem, and a program which illustrates both. It's a well thought-out approach which deftly caters for both sides of the audience.
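To give a flavor of how the two ideas meet, here is a minimal sketch of transcription as a string operation. It is my own illustration rather than the book's code: the transcribe() helper and the sample sequence are assumptions, and it treats the input as the coding strand, so transcription reduces to swapping thymine for uracil.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from the book: transcribe a DNA coding
# strand into mRNA by replacing every thymine (T) with uracil (U).
sub transcribe {
    my ($dna) = @_;
    (my $rna = $dna) =~ tr/Tt/Uu/;   # copy first, then transliterate the copy
    return $rna;
}

print transcribe('ATGGCGTTTTAA'), "\n";   # prints AUGGCGUUUUAA
```

The tr/// operator works character by character, which is a natural fit for a one-to-one base substitution.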

However, it should be stressed that the book is a substitute neither for a good introductory book on Perl nor a good textbook on genetics; and indeed, I think it will turn out to be better for programmers who need an over-arching idea of some of the problems involved with bioinformatics than for biologists who need to turn out working code. For instance, when it states that a hash is the most convenient data structure for looking up amino acids by their codons, it doesn't say why, or even what a hash is. On the other hand, amino acids and codons are both explained in detail.
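For readers in the same position, a rough sketch of why a hash suits the job: it maps a string key (the codon) directly to a value (the amino acid), so each lookup is a single step. The table below is a four-entry excerpt of the standard genetic code, and the names %amino_acid_for and translate() are my own assumptions, not the book's.

```perl
use strict;
use warnings;

# Illustrative excerpt only: the full table has 64 codons.
my %amino_acid_for = (
    AUG => 'Met',
    GCG => 'Ala',
    UUU => 'Phe',
    UAA => 'Stop',
);

# Hypothetical helper: split an mRNA string into three-letter codons,
# look each one up in the hash, and stop at the first stop codon.
sub translate {
    my ($rna) = @_;
    my @residues;
    for my $codon ($rna =~ /(...)/g) {
        my $residue = $amino_acid_for{$codon} // '???';   # unknown codon
        last if $residue eq 'Stop';
        push @residues, $residue;
    }
    return join '-', @residues;
}

print translate('AUGGCGUUUUAA'), "\n";   # prints Met-Ala-Phe
```

An array would need the codon converted to a numeric index first; the hash lets the codon string itself be the key.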

The book covers a wide range of biological areas - from the structure of DNA to building predictive models of species, exploring the available databases of genetic sequences (including readers of the GenBank database and an implementation of the BLAST algorithm), phylogeny, protein databases, DNA sequence assembly and prediction, restriction mapping, and a lot more besides. In all, it's a good overview of the common areas in which biologists need computer programs.

There's a significant but non-threatening amount of math in there, particularly in dealing with probabilities of mutation and determining whether or not events are significant, but I was particularly encouraged to see discussion of algorithmic running time; as the author is primarily a computer science professor and secondarily a bioinformaticist, this should not be too surprising. However, a significant number of bioinformaticists tend to produce code which works... eventually. Stopping to say "well, this is order n-to-the-6 and we can probably do better than that" is most welcome.

Onto the code itself. The first thing any reader will notice about the book is that the code isn't monospaced. Instead, the code is "ground", pretty-printed, as in days of old. This means you'll see code snippets like:

next unless $succReadSite; ## dummy sinks have no successor
my $succContigSite = $classNames->find($succReadSite);

Now, I have to admit I really like this, but others may find it difficult to read, and those who know slightly less Perl may find it confusing - the distinction between '' and " (that's two single quotes and a double quote) can be quite subtle, and if you're going to grind Perl code, regular expressions really, really ought to be monospaced. "$elem =~ /^([^\[(]*)(\(.*\))?$/;" is just plain awkward.

The code is more idiomatic than much bioinformatic code that I've seen, but still feels a little unPerlish; good use is made of references, callbacks and object-oriented programming, but three-argument for is used more widely than the fluent Perl programmer would have it, and things like


    main();
    sub main {
        ...
    }

worry me somewhat. But it works, it's efficient, and it's certainly enough to get the point across.
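To make the point about three-argument for concrete, here is a small comparison of my own (the @bases data and variable names are invented for illustration, not taken from the book). All three loops build the same list; the second and third are what most fluent Perl programmers would reach for.

```perl
use strict;
use warnings;

my @bases = qw(A C G T);

# C-style, three-argument for: the index is managed by hand.
my @c_style;
for (my $i = 0; $i < @bases; $i++) {
    push @c_style, lc $bases[$i];
}

# foreach-style: iterate over the elements directly, no index needed.
my @perlish;
for my $base (@bases) {
    push @perlish, lc $base;
}

# map: transform the whole list in one expression.
my @mapped = map { lc $_ } @bases;

print "@c_style | @perlish | @mapped\n";   # prints a c g t | a c g t | a c g t
```

The three-argument form still earns its place when the index itself matters, for example when comparing neighboring positions in a sequence.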

The appendices were enlightening and well thought-out: the first turns an earlier example, RNA structure folding, into a practical problem of drawing diagrams of folded RNA for publication; the other two tackle matters of how to make some of the algorithms in the text more efficient.

All in all, I came away from this book not just with more knowledge about genetics and biology - indeed, some of what I learned has been directly applicable to some work I have - but also with an understanding of some of the complexity of the problems geneticists face. It fully satisfies its goals, expressed in the preface: teaching computer scientists the biological underpinnings of bioinformatics, providing real, working code for biologists without boring the programmers, and providing an elementary handling of the statistical considerations of the subject matter. While it will end up being more used by programmers getting into the field, it's still useful for the biologists already there, particularly when combined with something like James Tisdall's book or Learning Perl. But for the programmer like me, interested in what biologists do and how we can help them do it, it's by far the clearest introduction available, and I would heartily recommend it.

Embedding Perl in HTML with Mason

Disclaimer: As you know, each month I try to review a recently published Perl book, and I aim to cover all the majors as they come out. The book that's fallen onto my desk for review this month is Dave Rolsky and Ken Williams' Embedding Perl in HTML with Mason. "What is this," you're thinking, "an O'Reilly site doing a review of an O'Reilly book? Scandalous!" Well, I hope that you've taken a look at my other reviews and have satisfied yourself that I try to be as impartial as I can when reviewing. As far as I'm concerned, this is a Perl site first and an O'Reilly site second.

With that disclaimer out of the way, onto the book! There are plenty of ways of achieving the Perl-in-HTML goal, as this book correctly points out: the Template Toolkit, the venerable embperl, and so on. My personal favorite, however, is HTML::Mason, and so I've been looking forward to this book for a long time. That also means, however, that I've had high expectations of it; Rolsky and Williams' effort has lived up to many of them but let me down in a few areas.

The first chapter does a good job of setting the scene - it describes a little of what Mason looks like, talks about some of the alternatives to Mason, and shows how to install the module and test it. While the first description is precisely what is needed, and the test process is well-documented, I feel more could have been made of the comparison to other techniques - tools such as PHP and Template Toolkit are described, but they're not compared to Mason, so it's hard to see their relative strengths and weaknesses. Similarly, the installation process for Mason is quite detailed, and brushing it off with "perl Makefile.PL; make; make install" doesn't do it justice - an example of the output would make readers feel more comfortable.

The book then goes on to introduce the main syntax of Mason components. This is a useful section and I learned a lot about various ways Mason tags are interpreted, but I felt it would have been better structured with more examples building on top of one another; that said, the chapter did declare itself to be an introduction to Mason syntax and semantics, and as that, it succeeded - however, I think it would have been a better tutorial if both semantics and syntax were covered.

The short chapter on autohandlers and dhandlers was a sermon from the clouds. I've long known that these things existed and were powerful, but until reading chapter 3, I didn't quite know how they should be used. The section on the Mason API again unfortunately suffers from a lack of examples; but on the other hand, the book builds up to a full example in chapter 8. The "advanced features" chapter was a complete goldmine of information. Like many of these early chapters, it contains a lot of concentrated goodness in not many pages, and it will take me several more readings to pick out and understand more of the ideas.

Chapter 6, another short (10-page) chapter on the "lexer, compiler, resolver and interpreter objects", seems to be included for completeness or for hard-core Mason hackers - it could be skipped or moved to an appendix with no loss of flow or coverage.

As I've mentioned, chapter 8 is where it all comes together: a full, real-life application (the Perl Apprenticeship site - an interesting resource in its own right!) is put together before your very eyes. As this is a real live site, areas such as creating user accounts and handling access control have to be covered. If you're doing anything at all with Mason, then I'd urge you to buy this book if only for this chapter - once you've worked through it, you'll have a much clearer idea of how a Mason site fits together.

Later chapters cover mixing Mason and CGI, another brief (8-page) chapter on design, and another gold mine chapter of recipes. Chapters like this and the advanced topics chapter are the reason you buy books on open source projects: sure, there may be free documentation out there - and the Mason documentation is pretty thorough - but it doesn't directly tell you how to do what you want to do. The documentation doesn't often cover the situations you find yourself in when you're actually developing with the tool in question; I'm happy to say that this book does.

Appendices cover the Mason API (which is odd, given chapter 4 covers that ...), the Mason object model, using Mason with your favorite text editor (a surprisingly useful set of information!) and information about Bricolage, a Mason-based content management system; useful as yet another set of ideas for what Mason can do.

On an aesthetic note, kudos to O'Reilly for restoring the spine coloring - now my bookshelf can be color-coded again; now bring back Garamond!

If I've sounded at all negative in this review, then it's probably because I've been expecting this book for a while and have had high hopes for it. That said, my overall impression of this book is that it's a little thin - short chapters and few worked examples leave one wanting more. On the other hand, the full example in chapter 8 is worth its weight in gold, and when combined with the advanced concept and cookbook chapters, I'd give this book a qualified thumbs-up for anyone doing any Mason work.

Writing Perl Modules for CPAN

As Andy Lester points out in this month's Perl Review, one big advantage of there being so many Perl books around is that the publishers can now get around to putting out books on some of the more "niche" areas of Perl. Finally, we can explore many areas that haven't really been written down and codified before.

Sam Tregar's "Writing Perl Modules For CPAN" certainly does this, and in a good way; most hints and strategies for designing, creating and maintaining Perl modules, including the sometimes arduous community interaction that this entails, have been handed down wordlessly through observation of the old hands. I tried my best in the core perlnewmod documentation to explain some of the issues involved, but Sam - a bit of an old hand himself, with the HTML::Template and Devel::Profiler modules amongst his output - takes the time to cover all the bases: from how to create a module distribution and how to get your PAUSE ID and so on to how to handle feature creep, setting up mailing lists and CVS servers, and with "bonus" chapters on XS programming and CGI applications.

Don't be put off by the book's title; even if you're not planning on making your modules public, we've found that the techniques of good distribution management and source control explained in this book are assets even when you're developing modules internal to a company or specific application.

Frequent readers of my reviews will know that one of my favorite subjects is introductory filler, and this book doesn't get off lightly either; chapter 1, the history and motivation of the CPAN, really ought to go in the foreword, and chapter 2, Perl module basics, should be required knowledge.

The writing style is friendly but direct, with a good mix of practical and philosophical discussion appropriate for this topic, although I occasionally feel that the author overuses footnotes a little (Chapter 7 has 25 footnotes in 10 pages). The code style is perfectly fine, although I would like to have seen use strict a little more prominently, and some discussion of error handling and checking would not have gone amiss.
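For what it's worth, this is the kind of thing I have in mind: a minimal sketch of my own, not code from the book. The count_lines() function and the file path are invented for illustration; the point is simply that strict and warnings are switched on, and that failures are reported to the caller rather than silently ignored.

```perl
use strict;
use warnings;

# Hypothetical module function: count the lines in a file, dying with a
# useful message (including the system error in $!) if it cannot be opened.
sub count_lines {
    my ($path) = @_;
    open my $fh, '<', $path
        or die "Cannot open '$path': $!\n";
    my $count = 0;
    $count++ while <$fh>;
    close $fh
        or die "Cannot close '$path': $!\n";
    return $count;
}

# The caller decides how to handle the failure.
my $count = eval { count_lines('/no/such/directory/file.txt') };
print defined $count ? "$count lines\n" : "Error: $@";
```

Dying inside the module and trapping with eval in the caller keeps the policy decision (abort, retry, log) out of the library code.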

On the down side: typography. For some reason, the book's designer is enamoured of bold and italic typewriter faces for headings, which frankly looks horrible. Sadly, typos are rife, and there are some surprising omissions, too: The wonderful cpan-upload script is not mentioned, nor is the CPAN bug-tracking system, and coverage of testing, one of the recent obsessions of the Perl module community, is quite thin.

On the other hand, I was surprised by the XS chapters; they work. Sam somehow manages to pack just enough information into two relatively slim chapters at the end of the book to allow some pretty complex XS modules to be created by the adventurous reader, but then follows it with quite a bit of repetition in the following chapter, on Inline::C.

Similarly, the chapter on Great CPAN Modules was an unexpectedly good read - I'd never before sat down and thought about why one particular XML parsing module (say) should be more popular than another, and this is a good summary of the issues involved.

If you're thinking about writing a Perl module, whether or not it's for public consumption, then I'd certainly recommend getting a copy of this book; I learned a few things about module maintenance from it, and I'm sure you will too.


Writing Perl Modules for CPAN is published by Apress.
