Building a Large-scale E-commerce Site with Apache and mod_perl
by Perrin Harkins
Trap: Nested Exceptions
When trying out a new technology like the Error module, there are
bound to be some things to watch out for. We found a certain code
structure that causes a memory leak every time it is executed. It
involves nested try{} blocks, and looks like this:
use Error qw(:try);

my $foo;
try {
    # some stuff...
    try {
        $foo++;
        # more stuff...
    } catch Error with {
        # handle error
    };
} catch Error with {
    # handle other error
};
It's not the fault of Graham Barr, the Error module's author, that this
leaks; it is simply a byproduct of the try and catch keywords being
implemented as anonymous subroutines. In terms of closures, this code
is roughly equivalent to the following:
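my $foo;
my $subref1 = sub {
    my $subref2 = sub {
        $foo++;
    };
    $subref2->();   # try() runs the inner block immediately
};
$subref1->();       # ...just as it runs the outer one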
Because the inner anonymous subroutine is defined inside another one,
it creates a closure for $foo and makes a new copy of the variable each
time it is executed. The problem is easy to avoid once you know to
watch out for it.
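A minimal sketch of one way around the leak, assuming (as described above) that it comes from defining one anonymous closure inside another: move the inner block into a named subroutine and pass the shared variable in by reference. It mirrors the plain-subroutine version above so it runs without Error.pm; inner_step() is a hypothetical name, not part of any library.

```perl
use strict;
use warnings;

# Hypothetical helper holding what used to be the inner block.
# It receives $foo by reference, so no anonymous sub is defined
# inside another anonymous sub.
sub inner_step {
    my ($foo_ref) = @_;
    my $inner = sub { $$foo_ref++ };   # closes over inner_step's own lexical only
    $inner->();
}

my $foo = 0;

# Stands in for the outer try block: it only calls the named sub.
my $outer = sub { inner_step(\$foo) };

$outer->() for 1 .. 3;
print "$foo\n";
```

In the Error.pm version, the inner try/catch would live in the body of inner_step(), and the outer try block would simply call it.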
Berkeley DB
One of the big wins in our architecture was the use of Berkeley DB.
Since most people are not familiar with its more advanced features,
we'll give a brief overview here.
The DB_File module is part of the standard Perl distribution.
However, it only supports the interface of Berkeley DB version 1.85,
and doesn't include the interesting features of later releases. To get
those, you'll need the BerkeleyDB.pm module, available from CPAN.
This module can be tricky to build, but comprehensive instructions are
included.
Newer versions of Berkeley DB offer many features that help
performance in a mod_perl environment. To begin with, database files
can be opened once at the start of the program and kept open, rather
than opened and closed on each request. Berkeley DB will use a
shared memory buffer to improve data access speed for all processes
using the database. Concurrent access is directly supported with
locking handled for you by the database. This is a huge win over
DB_File, which requires you to do your own locking. Locks can be at
a database level, or at a memory page level to allow multiple
simultaneous writers. Transactions with rollback capability are also
supported.
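As a sketch of the open-once pattern, assuming BerkeleyDB.pm is installed: the environment is created with DB_INIT_MPOOL, which sets up the shared memory buffer pool, and DB_INIT_CDB, Berkeley DB's Concurrent Data Store mode, which provides database-level locking (many readers or a single writer) handled by the library. The helper cache_db(), the file name cache.db, and the temporary home directory are hypothetical; in production the home would be a fixed directory shared by every Apache child.

```perl
use strict;
use warnings;
use BerkeleyDB;
use File::Temp qw(tempdir);

# Hypothetical environment directory. A temp dir is used only so the
# sketch runs standalone; all Apache children would share one fixed path.
my $DB_HOME = tempdir(CLEANUP => 1);

my ($env, $db);

# Return this process's database handle, opening it on first use and
# keeping it open for the life of the process instead of opening and
# closing it on every request.
sub cache_db {
    return $db if $db;

    $env = BerkeleyDB::Env->new(
        -Home  => $DB_HOME,
        # DB_INIT_MPOOL: shared memory buffer pool used by all processes.
        # DB_INIT_CDB:   database-level locking done by Berkeley DB itself.
        -Flags => DB_CREATE | DB_INIT_MPOOL | DB_INIT_CDB,
    ) or die "cannot open environment: $BerkeleyDB::Error\n";

    $db = BerkeleyDB::Hash->new(
        -Filename => 'cache.db',    # relative to the environment home
        -Flags    => DB_CREATE,
        -Env      => $env,
    ) or die "cannot open cache.db: $BerkeleyDB::Error\n";

    return $db;
}

# Per-request usage: no open/close and no hand-rolled flock().
my $status = cache_db()->db_put('product:42', 'Toy robot');
$status == 0 or die "db_put failed: $status\n";

my $name;
$status = cache_db()->db_get('product:42', $name);
$status == 0 or die "db_get failed: $status\n";
print "$name\n";
```

The handles are opened lazily on first use rather than in the parent before Apache forks, because Berkeley DB handles are not meant to be shared across fork(); each child opens its own and keeps it.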
This all sounds too good to be true, but there are some downsides. The documentation is somewhat sparse, and you will probably need to consult the C API documentation to do anything complicated.
A more serious problem is database corruption. When an Apache process using Berkeley DB dies from a hard kill or a segfault, it can corrupt the database. A corrupted database will sometimes cause subsequent attempts to open it to hang. According to the people we talked to at Sleepycat Software (which provides commercial support for Berkeley DB), this can happen even in transactional mode. They are working on a way to fix the problem. In our case, none of the data stored in the cache was essential for operation, so we could simply clear it out when restarting an application server.
Deadlocks are another thing to watch out for. If you use page-level locking, you have to handle them. The distribution includes a daemon, db_deadlock, that watches for deadlocks and breaks them by aborting one of the conflicting lock requests; alternatively, you can handle them yourself using the C API.
After trying a few different things, we recommend that you use database-level locking. It's much simpler, and it cured our problems. We didn't see any significant performance hit from switching to this mode of locking. The one thing to watch out for with exclusive database-level write locks is long-running cursor operations that tie up the database. We split some of our operations into multiple writes to avoid this problem.
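One way to split a long cursor operation into short writes, sketched under the same assumptions (BerkeleyDB.pm, Concurrent Data Store mode): an expiry pass first collects keys with a read cursor, closes it, and then deletes each key with its own db_del() call, rather than keeping a DB_WRITECURSOR open, and every other writer waiting, for the whole traversal. The function purge_expired() and the "expires_at|payload" value layout are hypothetical.

```perl
use strict;
use warnings;
use BerkeleyDB;
use File::Temp qw(tempdir);

# Hypothetical expiry pass over values stored as "expires_at|payload".
sub purge_expired {
    my ($db, $now) = @_;

    # Phase 1: collect expired keys with an ordinary (read) cursor.
    my @expired;
    my ($key, $value) = ('', '');
    my $cursor = $db->db_cursor()
        or die "db_cursor failed: $BerkeleyDB::Error\n";
    while ($cursor->c_get($key, $value, DB_NEXT) == 0) {
        my ($expires_at) = split /\|/, $value, 2;
        push @expired, $key if $expires_at < $now;
    }
    $cursor->c_close();    # the traversal and the writes never overlap

    # Phase 2: one short write per key. Another process may have removed
    # the key in the meantime, so DB_NOTFOUND is not treated as an error.
    for my $k (@expired) {
        my $status = $db->db_del($k);
        die "db_del($k) failed: $status\n"
            unless $status == 0 || $status == DB_NOTFOUND;
    }
    return scalar @expired;
}

my $home = tempdir(CLEANUP => 1);
my $env = BerkeleyDB::Env->new(
    -Home  => $home,
    -Flags => DB_CREATE | DB_INIT_MPOOL | DB_INIT_CDB,
) or die "cannot open environment: $BerkeleyDB::Error\n";
my $db = BerkeleyDB::Hash->new(
    -Filename => 'cache.db',
    -Flags    => DB_CREATE,
    -Env      => $env,
) or die "cannot open cache.db: $BerkeleyDB::Error\n";

$db->db_put('a', '100|stale');
$db->db_put('b', '300|fresh');
$db->db_put('c', '150|stale');

my $removed = purge_expired($db, 200);
print "removed $removed expired entries\n";
```

The trade-off is that the key list is held in memory; for very large databases the same idea can be applied in batches.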
If you have a good C coder on your team, you may want to try the alternative approach we finally ended up with: write your own daemon around Berkeley DB and have your Apache processes use it client/server style over Unix sockets. Because the daemon is then the only process touching the database, it can catch signals and ensure a safe shutdown. You can also write your own deadlock-handling code this way.
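The daemon described above was written in C. As a rough Perl sketch of the same shape, using only core modules: a process listens on a Unix-domain socket, answers simple PUT and GET requests, and turns TERM and INT into a flag so it can leave its loop and close its store cleanly. The script forks such a daemon and talks to it the way an Apache process would. The line protocol, run_daemon(), and the %store hash (a stand-in for the Berkeley DB handle only the daemon would hold) are all hypothetical.

```perl
use strict;
use warnings;
use IO::Socket::UNIX;
use IO::Select;
use Socket qw(SOCK_STREAM);
use File::Temp qw(tempdir);
use POSIX ();

# Hypothetical daemon answering "PUT key value" and "GET key" lines.
sub run_daemon {
    my ($path) = @_;
    my %store;          # stand-in for the Berkeley DB handle
    my $stop = 0;

    # Turn TERM/INT into a flag so the loop can finish cleanly.
    local $SIG{TERM} = sub { $stop = 1 };
    local $SIG{INT}  = sub { $stop = 1 };

    my $server = IO::Socket::UNIX->new(
        Type   => SOCK_STREAM,
        Local  => $path,
        Listen => 5,
    ) or die "cannot listen on $path: $!\n";
    my $select = IO::Select->new($server);

    until ($stop) {
        # Poll with a timeout so a shutdown request is noticed within 0.5s.
        next unless $select->can_read(0.5);
        my $client = $server->accept or next;
        $client->autoflush(1);
        while (defined(my $line = <$client>)) {
            chomp $line;
            my ($cmd, $key, $value) = split / /, $line, 3;
            $cmd //= '';
            if ($cmd eq 'PUT' && defined $value) {
                $store{$key} = $value;
                print {$client} "OK\n";
            } elsif ($cmd eq 'GET' && defined $key) {
                my $found = $store{$key} // '';
                print {$client} "$found\n";
            } else {
                print {$client} "ERR\n";
            }
        }
        close $client;
    }

    # Shutdown path: a real daemon would close its Berkeley DB handles here.
    close $server;
    unlink $path;
    return 0;
}

my $path = tempdir(CLEANUP => 1) . '/cache.sock';
my $pid  = fork() // die "fork failed: $!\n";
if ($pid == 0) {
    # Child: run the daemon; _exit skips the parent's END blocks.
    POSIX::_exit(eval { run_daemon($path) } // 1);
}

# Client side: retry briefly until the daemon is listening.
my $conn;
for (1 .. 50) {
    last if $conn = IO::Socket::UNIX->new(Type => SOCK_STREAM, Peer => $path);
    select(undef, undef, undef, 0.1);
}
die "cannot connect to $path: $!\n" unless $conn;
$conn->autoflush(1);

print {$conn} "PUT sku42 Toy robot\n";
chomp(my $put_reply = <$conn>);
print {$conn} "GET sku42\n";
chomp(my $get_reply = <$conn>);
close $conn;

kill 'TERM', $pid;    # ask the daemon to shut down cleanly
waitpid($pid, 0);
my $daemon_status = $? >> 8;
print "PUT -> $put_reply, GET -> $get_reply, daemon exit $daemon_status\n";
```

The short can_read() timeout is a deliberate choice: a signal that arrives just before the daemon blocks is still noticed on the next pass through the loop.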
Valuable Tools
If you plan to do any serious Perl development, then you should really take
the time to become familiar with some of the available development
tools. The debugger in particular is a lifesaver, and it works with
mod_perl. There is a profiler called Devel::DProf, which also
works with mod_perl. It's definitely the place to start when
performance tuning your application.
We found the ability to run our complete system on individual workstations extremely useful. Everyone could develop on his own machine and coordinate changes using CVS source control.
For object modeling and design, we used the open-source Dia program
and Rational Rose. Both support working with UML and are great for
generating pretty class diagrams for your cubicle walls.
Do Try This at Home
Since we started this project, a number of development frameworks that offer support for this kind of architecture have come out. We don't have direct experience using these, but they have a similar design and may prove useful to you if you want to take an MVC approach with your system.
Apache::PageKit is a mod_perl module available from CPAN that
provides a basic MVC structure for Web applications. It uses the
HTML::Template module for building views.
OpenInteract is a recently released Web application framework in Perl, which works together with the persistence layer SPOPS. Both are available from CPAN.
The Application Toolkit from Extropia is a comprehensive set of Perl classes for building Web apps. It has excellent documentation and takes good advantage of existing CPAN modules. You can find it on http://www.extropia.com/.
If you want a ready-to-use cache module, take a look at the Perl-cache
project on http://sourceforge.net/. This is the next generation of
the popular File::Cache module.
The Java world has many options as well. The Struts framework, part of the Jakarta project, is a good open-source choice. There are also commercial products from several vendors that follow this sort of design. Top contenders include ATG Dynamo, BEA WebLogic, and IBM WebSphere.
An Open-Source Success Story
By building on open-source software and the open-source community, we were able to create a top-tier Web site with a minimum of cost and effort. The system we ended up with scales to huge amounts of traffic. It runs mostly on commodity hardware, making it easy to grow when the need arises. Perhaps best of all, it provided tremendous learning opportunities for our developers and made us part of the larger development community.
We've contributed patches from our work back to various open-source projects, and provided help on mailing lists. We'd like to take this opportunity to officially thank the open-source developers who contributed to projects mentioned here. Without them, this would not have been possible. We also have to thank the hardworking Web developers at eToys. The store may be closed, but the talent that built it lives on.
If you have questions about this material, you can contact us at the following e-mail addresses:
Bill Hilf - bill@hilfworks.com
Perrin Harkins - perrin@elem.com


