
Building a Large-scale E-commerce Site with Apache and mod_perl
by Perrin Harkins

Code Structure

The code is structured around the classic Model-View-Controller pattern, originally from Smalltalk and now often applied to Web applications. The MVC pattern is a way of splitting an application's responsibilities into three distinct layers.

Classes in the Model layer represent business concepts and data, like products or users. These have an API but no end-user interface. They know nothing about HTTP or HTML and can be used in non-Web applications like cron jobs. They talk to the database and other data sources, and manage their own persistence.

The Controller layer translates Web requests into appropriate actions on the Model layer. It handles parsing parameters, checking input, fetching the appropriate Model objects, and calling methods on them. Then it determines the appropriate View to use and sends the resulting HTML to the user.

View objects are really HTML templates. The Controller passes data from the Model objects to them and they generate a Web page. These are implemented with the Template Toolkit, a powerful templating system written in Perl. The templates have some basic conditional statements and looping in them, but only enough to express the formatting logic. No application control flow is embedded in the templates.
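The division of labor between the three layers can be illustrated with a minimal sketch. Everything in it is hypothetical: the package names, the in-memory `%PRODUCTS` hash standing in for the database, and the plain subroutines standing in for Template Toolkit templates. It is meant only to show which layer owns which responsibility, not how the real site was written.

```perl
use strict;
use warnings;

# Model: a business object with no knowledge of HTTP or HTML.
package My::Model::Product;

# Hypothetical stand-in for the database.
my %PRODUCTS = (42 => { name => 'Toy robot', price => '19.95' });

sub fetch {
    my ($class, $id) = @_;
    my $row = $PRODUCTS{$id} or return undef;
    return bless { %$row }, $class;
}
sub name  { $_[0]{name} }
sub price { $_[0]{price} }

# View: formatting only. Plain subs stand in for templates here.
package My::View;

sub product_page {
    my ($data) = @_;
    return "<h1>$data->{name}</h1><p>Price: \$$data->{price}</p>";
}
sub not_found_page { return '<h1>Product not found</h1>' }

# Controller: checks input, calls the Model, then picks a View.
package My::Controller;

sub show_product {
    my ($params) = @_;
    my $id = $params->{id};
    return My::View::not_found_page()
        unless defined $id && $id =~ /\A\d+\z/;
    my $product = My::Model::Product->fetch($id)
        or return My::View::not_found_page();
    return My::View::product_page(
        { name => $product->name, price => $product->price });
}

package main;
```

Note that the Model class could be used unchanged from a cron job, since nothing in it refers to requests or HTML.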

Caching

The core of the performance strategy is a multi-tier caching system. On the application servers, data objects are cached in shared memory with a backing store on local disk. Applications specify how long a data object can be out of sync with the database, and all future accesses during that time are served from the high-speed cache. This type of cache control is known as "time-to-live." The local cache is implemented using a Berkeley DB database. Objects are serialized with the standard Storable module from CPAN.

Data objects are divided into pieces when necessary to provide finer granularity for expiration times. For example, product inventory is updated more frequently than other product data. By splitting up the product data, we can use a short expiration for inventory that keeps it in tighter sync with the database, while still using a longer expiration for the less volatile parts of the product data.
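The time-to-live scheme and the split expiration times described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the production cache: a plain hash (`%store`) stands in for the Berkeley DB file, and the function names, key names, and expiration values are made up. Only the `freeze` and `thaw` calls from Storable reflect what the text describes.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical in-memory stand-in for the Berkeley DB backing store.
my %store;

# Serialize a data object and record when it stops being valid.
sub cache_set {
    my ($key, $object, $ttl_seconds, $now) = @_;
    $now = time unless defined $now;
    $store{$key} = {
        expires => $now + $ttl_seconds,
        frozen  => freeze($object),
    };
    return;
}

# Return the object if it is still within its time-to-live, else undef.
sub cache_get {
    my ($key, $now) = @_;
    $now = time unless defined $now;
    my $entry = $store{$key} or return undef;
    if ($now >= $entry->{expires}) {
        delete $store{$key};    # expired: caller reloads from the database
        return undef;
    }
    return thaw($entry->{frozen});
}

# Product data split into two pieces with different expiration times
# (the times are passed explicitly here to keep the example deterministic).
cache_set('product:42:details',   { name     => 'Toy robot' }, 3600, 1000);
cache_set('product:42:inventory', { in_stock => 7 },             60, 1000);
```

With these values, a read at time 1061 would still find the details entry but would miss on the inventory entry, forcing a fresh inventory lookup while the rest of the product data stays cached.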

The application servers' object caches share product data between them using the IP Multicast protocol and custom daemons written in C. When a product is placed in the cache on one server, the data is replicated to the cache on all other servers. This technique is successful because of the high locality of access in product data. During the 2000 Christmas season this cache achieved a 99 percent hit ratio, thus taking a large amount of work off the database.

In addition to caching the data objects, entire pages that are not user-specific, like product detail pages, can be cached. The application takes the shortest expiration time of the data objects used in the pages and specifies that to the proxy servers as a page expiration time, using standard Expires headers. The proxy servers cache the generated page on a shared NFS partition. Pages served from this cache have performance close to that of static pages.
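As an illustration of the page expiration rule, the sketch below picks the earliest expiration among the data objects on a page and formats it as an Expires header. The function names are hypothetical, and the date is formatted by hand rather than with a CPAN helper so the example depends only on core Perl.

```perl
use strict;
use warnings;
use List::Util qw(min);

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format an epoch time as an HTTP date in GMT.
sub http_date {
    my ($epoch) = @_;
    my ($sec, $minute, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf('%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900,
        $hour, $minute, $sec);
}

# Given the expiration times (epoch seconds) of every data object used
# to build a page, the page expires when the first of them does.
sub expires_header {
    my (@object_expirations) = @_;
    return 'Expires: ' . http_date(min(@object_expirations));
}
```

A page built from objects expiring at different times is therefore never cached longer than its most volatile piece.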

To allow for emergency fixes, we added a hook to mod_proxy that deletes the cached copy of a specified URL. This was used when a page needed to be changed immediately to fix incorrect information.

An extra advantage of this mod_proxy cache is the automatic handling of If-Modified-Since requests. We did not need to implement this ourselves since mod_proxy already provides it.

Session Tracking

Users are assigned session IDs using HTTP cookies. This is done at the proxy servers by our customized version of mod_session. Doing it at the proxy ensures that users accessing cached pages will still get a session ID assigned. The session ID is simply a key into data stored on the server side.

User sessions are assigned to an application server and continue to use that server unless it becomes unavailable. This is called "sticky" load balancing. Session data and other data modified by the user, such as shopping cart contents, are written to both the object cache and the database. The double write carries a slight performance penalty, but it allows for fast read access on subsequent requests without going back to the database. If a server failure causes a user to be moved to a different application server, then the data is simply fetched from the database again.
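The double write and the failover behavior can be sketched as below. This is only a schematic: two hashes stand in for the local object cache and the shared database, and all function names are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: %object_cache for the local object cache on one
# application server, %database for the shared database.
my (%object_cache, %database);

# Write user-modified data to both places (the "double write").
sub save_session_data {
    my ($session_id, $data) = @_;
    $database{$session_id}     = { %$data };
    $object_cache{$session_id} = { %$data };
    return;
}

# Read from the cache first; fall back to the database if the cache
# has no entry, as happens after a move to a different server.
sub load_session_data {
    my ($session_id) = @_;
    return $object_cache{$session_id} if exists $object_cache{$session_id};
    my $data = $database{$session_id} or return undef;
    $object_cache{$session_id} = { %$data };    # repopulate the local cache
    return $object_cache{$session_id};
}

# Simulate moving the user to a server whose cache is empty.
sub simulate_failover { %object_cache = (); return }
```

After `simulate_failover()`, the next `load_session_data` call pays for one database read and then serves later requests from the cache again.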

Security

A large e-commerce site is a popular target for all types of attacks. When designing such a system, you have to assume that you will be attacked and build with security in mind, at the application level as well as the machine level.

The main rule of thumb is "don't trust the client!" User-specific data sent to the client is protected using multiple levels of encryption. SSL keeps sensitive data exchanges private from anyone snooping on network traffic. To prevent "session hijacking" (when someone tampers with their session ID in order to gain access to another user's session), we include a Message Authentication Code (MAC) as part of the session cookie. This is generated using the standard Digest::SHA1 module from CPAN, with a seed phrase known only to our servers. By running the ID from the session cookie through this MAC algorithm, we can verify that the data being presented was generated by us and not tampered with.
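A minimal sketch of the cookie check follows. It swaps in the HMAC-SHA1 function from the core Digest::SHA module rather than Digest::SHA1, so that the example runs without extra CPAN modules; the cookie layout (`id:mac`), the function names, and the placeholder secret are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha1_hex);

# Assumption: a secret known only to the servers. This is a placeholder.
my $SECRET = 'replace-with-a-long-random-secret';

# Build a cookie value of the form "<session id>:<mac>".
# Assumes the session ID itself never contains a colon.
sub make_session_cookie {
    my ($session_id) = @_;
    return join ':', $session_id, hmac_sha1_hex($session_id, $SECRET);
}

# Return the session ID if the MAC matches, or undef if it does not.
sub verify_session_cookie {
    my ($cookie) = @_;
    my ($session_id, $mac) = split /:/, $cookie, 2;
    return undef unless defined $session_id && defined $mac;
    my $expected = hmac_sha1_hex($session_id, $SECRET);
    return $mac eq $expected ? $session_id : undef;
}
```

Someone who edits the session ID in the cookie cannot produce a matching MAC without the secret, so the altered cookie is rejected. A hardened implementation would likely also want a constant-time comparison in place of `eq`.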

In situations where we need to include some state information in an HTML form or URL and don't want it to be obvious to the user, we use the CPAN Crypt:: modules to encrypt and decrypt it. The Crypt::CBC module is a good place to start.

To protect against simple overload attacks, in which someone uses a program to send a high volume of requests to our servers in the hope of making them unavailable to customers, access to the application servers is controlled by a throttling program. The code is based on some work by Randal Schwartz in his Stonehenge::Throttle module. Accesses for each user are tracked in compact logs written to an NFS partition. The program enforces limits on how many requests a user can make within a certain period of time.
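The general idea of limiting requests per user per time period can be sketched as follows. This is not the Stonehenge::Throttle algorithm and not the production code: it keeps the access log in a hash in memory rather than on an NFS partition, and the limit values and function name are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical limits: at most $MAX_REQUESTS per user in any $WINDOW seconds.
my $MAX_REQUESTS = 3;
my $WINDOW       = 10;

# In-memory stand-in for the per-user access logs.
my %access_log;

# Record a request and return true if it is within the limit.
sub allow_request {
    my ($user, $now) = @_;
    $now = time unless defined $now;
    my $log = $access_log{$user} ||= [];

    # Forget requests that have aged out of the window.
    shift @$log while @$log && $log->[0] <= $now - $WINDOW;

    return 0 if @$log >= $MAX_REQUESTS;    # over the limit: refuse
    push @$log, $now;
    return 1;
}
```

Each user is tracked separately, so one client hammering the site does not affect the limits applied to anyone else.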

For more information on Web security concerns including the use of MAC, encryption and overload prevention, we recommend looking at the books CGI Programming with Perl, 2nd Edition and Writing Apache Modules with Perl and C, both from O'Reilly.

Exception Handling

When planning this system, we considered using Java as the implementation language. We decided to go with Perl, but we really missed Java's nice exception-handling features. Luckily, Graham Barr's Error module from CPAN supplies similar capabilities in Perl.

Perl already has support for trapping runtime errors and passing exception objects, but the Error module adds some nice syntactic sugar. The following code sample is typical of how we used the module:

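    use Error qw(:try);

    try {
        do_some_stuff();
    } catch My::Exception with {
        # the caught exception object is passed as the first argument
        my $E = shift;
        handle_exception($E);
    };    # the trailing semicolon is required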

The module allows you to create your own exception classes and trap for specific types of exceptions.

One nice benefit of this is the way it works with DBI. If you turn on DBI's RaiseError flag and use try blocks in places where you want to trap exceptions, the Error module can turn DBI errors into simple Error objects.

    # assumes $dbh was connected with RaiseError => 1 and AutoCommit => 0
    try {
        $sth->execute();
    } catch Error with {
        my $E = shift;
        # roll back and recover
        $dbh->rollback();
        # etc.
    };

This code shows a condition where an error would indicate that we should roll back a database transaction. In practice, most DBI errors indicate something unexpected happened with the database and the current action can't continue. Those exceptions are allowed to propagate up to a top-level try{} block that encloses the whole request. When errors are caught there, we log a stack trace and send a friendly error page back to the user.
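The shape of such a top-level handler can be sketched with core Perl alone. The example below uses `eval` and the Carp module in place of the Error module's try blocks so that it runs without CPAN dependencies; the function name, the log array, and the wording of the error page are hypothetical.

```perl
use strict;
use warnings;
use Carp qw(longmess);

# Hypothetical top-level wrapper: runs a request handler and converts any
# uncaught exception into a logged stack trace plus a friendly error page.
sub run_request {
    my ($handler, $log) = @_;

    # Attach a stack trace to string errors at the point they are thrown;
    # exception objects are passed through unchanged.
    local $SIG{__DIE__} = sub { die ref $_[0] ? $_[0] : longmess($_[0]) };

    my $page;
    my $ok = eval { $page = $handler->(); 1 };
    unless ($ok) {
        my $error = $@;
        push @$log, "$error";    # log the error with its stack trace
        return 'Sorry, something went wrong. Please try again later.';
    }
    return $page;
}
```

Code deeper in the request can still trap the specific exceptions it knows how to recover from; only the ones it lets through reach this outer handler.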
