Beginning Bioinformatics
by James D. Tisdall
|
Pages: 1, 2
There are a few other good books out, and several more coming during the next year, in the bioinformatics field. Waterman; Mount; Grant and Ewens; Baxevanis et al, and Pevzner are a few of the most popular books (some more theoretical than others). My book, although for beginning programmers, may be helpful in the later chapters to get an idea of basic biological data and programs. Gibas and Jambeck's book Developing Bioinformatics Computer Skills gives a good overview of much of the software and the general computational approach that's used in bioinformatics, although it also includes some beginning topics unsuitable for the experienced programmer.
|
Related Reading
Programming Perl, 3rd Edition |
Of all the bioinformatics programs that one might want to learn about, the Perl programmer will naturally gravitate toward the Bioperl project. This is an open-source, international collaborative effort to write useful Perl bioinformatics modules, and it has reached a point during the past few years where it is quite useful stuff. The 1.0 release may be available by the time you read this. Exploring this software, available at http://www.bioperl.org, is highly recommended, with one caveat: It does not include much tutorial material, certainly not for teaching basic biology concepts. Still, you'll find lots of great stuff to explore and use in Bioperl. It's a must for the Perl bioinformatician.
Apart from self-study, you may also want to try to get into some seminars or reading groups at the local university or biotech firm, or generally meet people. If you're job hunting, then you may want to go introduce yourself to the head of the biology department at the U, and let her (yes, there are a lot of women working in biology research, a much better situation than in programming) -- know that you want a bioinformatics job and that you are a wizard at 1) programming in general, 2) Web programming, and 3) getting a lot out of computers for minimal money. But be prepared for them to have sticker shock when it comes to salaries. Maybe it's getting a little better now, but I've often found that biologists want to pay you about half of what you're worth on the market. Their pay level is just lower than that in computer programming. When you get to that point, you might have to be a bit hardnosed during salary negotiations to maintain your children's nutritional requirements.
I don't know of a book or training program that's specifically targeted at programmers interested in learning about biology. However, many universities have started offering bioinformatics courses, training programs, and even degrees, and some of their course offerings are designed for the experienced programmer. You might consider attending one of the major bioinformatics conferences. However, there will be a tutorial aimed at you in the upcoming O'Reilly bioinformatics conference -- indeed, the main focus of that conference is from the programming side more than the biology side.
Apart from the upcoming O'Reilly conference already mentioned, there is the ISMB conference, the largest in the bioinformatics field, which is in Calgary this coming summer; a good place to meet people and learn. It will also play host to the Bioperl yearly meeting, which is directly on target. Actually, if you check out the presenters at the ISMB, RECOMB or O'Reilly conferences, then you will find computer science people who are specializing in biology-related problems, as well as biologists specializing in infomatics, and many of these will be many of these will be lab heads or managers who maintain staffs of programmers.The thing about biology is that it's a very large area. Most researchers stake out a claim to some particular system -- say, the regulation of nervous system development in the fly -- and work there. So it's hard to really prepare yourself for the particular biology you might find on the job. The "Recombinant DNA" book will give you an overview of some of the more important techniques that are commonly used in most labs.
A Taste of Molecular Biology
Now that I've given you my general take on how a Perl programmer could move into biology research, I'll turn my attention to two basic molecular biology techniques that are fundamental in biology research, as for instance in the Human Genome Project: restriction enzymes and cloning with PCR.
First, we have to point out that the two most important biological molecules, DNA and proteins, are both polymers, which are chains of smaller building block molecules. DNA is made out of four building blocks, the nucleotides or "bases"; proteins are made from 20 amino acids. DNA has a regular structure, usually the famous double helix of two complementary strands intertwined; whereas proteins fold up in a wide variety of ways that have an important effect on what the proteins are able to do inside the cell. DNA is the primary genetic material that transmits traits to succeeding generations. Finally, DNA contains the coded templates from which proteins are made; and proteins accomplish much of the work of the cell.
One important class of proteins are enzymes, which promote certain specific chemical reactions in the cell. In 1978 the Nobel Prize was awarded to Werner Arber, Daniel Nathans, and Hamilton Smith for their discovery and work on restriction enzymes in the 1960s and early 1970s. Restriction enzymes are a large group of enzymes that have the useful property of cutting DNA at specific locations called restriction sites. This has been exploited in several important ways. It has been an important technique in fingerprinting DNA, as is used in forensic science to identify individuals. It has been instrumental in developing physical maps, which are known positions along DNA and are used to zero in on the location of genes, and also serve as a reference point for the lengthy process of the determination of the entire sequence of bases in DNA.
Restriction enzymes are fundamental to modern biological research. To learn more about them, you could go to the REBASE restriction enzyme database where detailed information about all known restriction enzymes is collected. Many of them are easily ordered from supply houses for use in the lab.
One of the most common restriction enzymes is called EcoRI. When it finds the six bases GAATTC along a strand of DNA, it cleaves the DNA.
The other main technique I want to introduce is one already mentioned: PCR, or the polymerase chain reaction. This is the most important way that DNA samples are cloned, that is, have copies made. PCR is very powerful at this; in a short time many millions of copies of a stretch of DNA can be created, at which point there is enough of a "sample" of the DNA to perform other molecular biology experiments, such as determining what exactly is the sequence of bases in the DNA (as has been accomplished for humans in the Human Genome Project.)
PCR also won a Nobel prize for its invention, by Kary Mullis in 1983. The basic idea is quite simple. We've mentioned that the two intertwined strands of the double helix of DNA are complementary. They are different, but given one strand we know what the other strand is, as they always pair in a specific way. PCR exploits this property.
Motivation
It's clear that a short article is not going to get very far in introducing a major science such as biology. But I hope I've given you enough pointers to enable you to make a good start at learning about this explosive science, and about how a Perl programmer might be able to bring needed skills to the great challenge of understanding life and curing disease.
In the 10 years I've been working in biology, I've found it to be a really exciting field, very stimulating intellectually; and I've found that going to work to try to help to cure cancer, Alzheimer's disease, and others, has been very satisfying emotionally.
I wish you the very best of luck. If you make it to the O'Reilly conference, please look me up!


