Building a Large-scale E-commerce Site with Apache and mod_perl
by Perrin Harkins

Surviving Christmas 2000

Our capacity planning was for three times the traffic of the previous peak. That's what we tested to, and that's about what we got:

  • 200,000+ sessions/hour
  • 2.5 million+ page views/hour
  • 20,000+ orders/hour
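To put those hourly peaks in per-second terms, the arithmetic can be sketched as below. The figures are the lower bounds quoted above, so the real averages were somewhat higher, and the pages-per-session ratio is only a rough indicator because sessions can span more than one hour.

```python
# Peak hourly figures quoted above, converted to average per-second rates.
PEAK_PER_HOUR = {
    "sessions": 200_000,
    "page_views": 2_500_000,
    "orders": 20_000,
}


def per_second(per_hour: int) -> float:
    """Average rate per second for a given hourly total."""
    return per_hour / 3600


rates = {name: per_second(count) for name, count in PEAK_PER_HOUR.items()}

# Rough ratio only: hourly page views divided by hourly sessions.
pages_per_session = PEAK_PER_HOUR["page_views"] / PEAK_PER_HOUR["sessions"]

for name, rate in rates.items():
    print(f"{name}: about {rate:.0f} per second")
print(f"page views per session: about {pages_per_session:.1f}")
```

These are averages over the hour; short bursts within the hour would be higher still.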

The software survived, although one of the routers went up in smoke. Once again, we were rated the third-most-highly trafficked e-commerce site for the season.

The Architecture

The machine strategy for the system is a fairly common one: low-cost Intel-based servers with a load-balancer in front of them, and big iron for the database.

Like many commercial packages, we have separate systems for the front-end Web servers (which we call proxy servers) and the application servers that generate the dynamic content. Both the proxy servers and the application servers are load-balanced using dedicated hardware from f5 Networks. More detail about each of these systems is provided below.

We chose to run Linux on our proxy and application servers, a common platform for mod_perl sites. The ease of remote administration under Linux made the clustered approach possible. Linux also provided solid security features and automated build capabilities to help with adding new servers.

The database servers are IBM NUMA-Q machines, which run DYNIX/ptx.

Proxy Servers

The proxy servers run a slim build of Apache, without mod_perl. They have several standard Apache modules installed, in addition to our own customized version of mod_session, which assigns session cookies. Because the processes are so small, we can run as many as 400 Apache children per machine. These servers handle all image requests themselves, and pass page requests on to the application servers. They communicate with the app servers using standard HTTP requests, and cache the page results when appropriate headers are sent from the app servers. The cached pages are stored on a shared NFS partition of a Network Appliance filer. Serving pages from the cache is nearly as fast as serving static files, i.e. very fast.

This kind of reverse-proxy setup is a commonly recommended approach when working with mod_perl, since it uses the lightweight proxy processes to send the content to clients (who may be on slow connections) and frees the resource-intensive mod_perl processes to move to the next request. For more information on why this configuration is helpful, see the mod_perl developer's guide at http://perl.apache.org/guide/.
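The caching behaviour of the proxy tier can be illustrated with a small sketch. This is not the Apache configuration used on the site: it is a Python illustration that assumes the "appropriate header" is `Cache-Control: max-age=N`, keeps pages in an in-memory dictionary rather than on a shared NFS partition, and uses hypothetical names (`PageCache`, `backend`).

```python
import time
from typing import Callable, Dict, Optional, Tuple

# (body, headers) as returned by a hypothetical application server.
Response = Tuple[str, Dict[str, str]]


def max_age(headers: Dict[str, str]) -> Optional[int]:
    """Return max-age in seconds from a Cache-Control header, or None."""
    for part in headers.get("Cache-Control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


class PageCache:
    """Pass requests to the backend, caching only pages it marks cacheable."""

    def __init__(self, fetch: Callable[[str], Response],
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # url -> (expiry, body)

    def get(self, url: str) -> str:
        entry = self._store.get(url)
        if entry and entry[0] > self._clock():
            return entry[1]  # served from cache, backend not contacted
        body, headers = self._fetch(url)
        ttl = max_age(headers)
        if ttl:  # no header (or max-age=0) means "do not cache"
            self._store[url] = (self._clock() + ttl, body)
        return body


# Hypothetical backend: product pages are cacheable, cart pages are not.
calls = []


def backend(url: str) -> Response:
    calls.append(url)
    if url.startswith("/product/"):
        return "product page", {"Cache-Control": "max-age=60"}
    return "cart page", {}


cache = PageCache(backend)
for url in ["/product/42", "/product/42", "/cart", "/cart"]:
    cache.get(url)
print(calls)
```

In this sketch the second request for the product page never reaches the backend, while both cart requests do, which is the division of labour the proxy tier relies on.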

Application Servers

The application servers run mod_perl, and little else. They have a local cache for Perl objects, using Berkeley DB. The Web applications run here, and shared resources like HTML templates are mounted over NFS from the NetApp filer. Because they do the heavy lifting in this setup, these machines are somewhat beefy, with dual CPUs and 1GB of RAM each.

Search Servers

There is a third set of machines dedicated to handling searches. Since searching was such a large percentage of overall traffic, it was worthwhile to dedicate resources to it and take the load off the application servers and database.

The software on these boxes is a multi-threaded daemon that we developed in-house using C++. The application servers talk to the search servers using a Perl module. The search daemon accepts a set of search conditions and returns a sorted list of object IDs of the products whose data fits those conditions. Then the application servers look up the data to display these products from the database. The search servers know nothing about HTML or the Web interface.

This approach of finding the IDs with the search server and then retrieving the object data may sound like a performance hit, but in practice the object data usually comes from the application server's cache rather than the database. This design allows us to minimize the duplicated data between the database and the search servers, making it easier and faster to refresh the index. It also allows us to reuse the same Perl code for retrieving product objects from the database, regardless of how they were found.
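The two-step lookup can be sketched roughly as follows. This is a Python illustration, not the site's Perl code: `ProductLoader`, `render_results`, and the fake search and database functions are hypothetical, and a plain dictionary stands in for the Berkeley DB object cache.

```python
from typing import Callable, Dict, List

Product = Dict[str, str]


class ProductLoader:
    """Fetch product records by ID, trying a local cache before the database."""

    def __init__(self, load_from_db: Callable[[int], Product]) -> None:
        self._load_from_db = load_from_db
        self._cache: Dict[int, Product] = {}  # stand-in for a Berkeley DB cache
        self.db_reads = 0

    def get(self, product_id: int) -> Product:
        if product_id not in self._cache:
            self.db_reads += 1
            self._cache[product_id] = self._load_from_db(product_id)
        return self._cache[product_id]


def render_results(search: Callable[[str], List[int]],
                   loader: ProductLoader, conditions: str) -> List[str]:
    """Ask the search tier for sorted IDs, then load each product by ID."""
    return [loader.get(pid)["name"] for pid in search(conditions)]


# Hypothetical data and search function for the demonstration.
DATABASE = {1: {"name": "Train set"}, 2: {"name": "Toy train"}}


def fake_search(conditions: str) -> List[int]:
    return [2, 1]  # the search tier returns IDs only, already sorted


loader = ProductLoader(lambda pid: DATABASE[pid])
first = render_results(fake_search, loader, "train")
second = render_results(fake_search, loader, "train")
print(first, loader.db_reads)
```

The same `get` call serves products found by search or by any other route, and repeated searches stop touching the database once the cache is warm.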

The daemon uses a standard inverted word list approach to searching. The index is periodically built from the relevant data in Oracle. If you prefer an all-Perl solution, then there are modules on CPAN that implement this approach, including Search::InvertedIndex and DBIx::FullTextSearch. We chose to write our own because of the tight performance requirements on this part of the system, and because we had an unusually complex set of sorting rules for the returned IDs.
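For readers unfamiliar with the technique, a minimal inverted word list can be sketched as below. This is a Python illustration of the general approach only, not the C++ daemon: the function names and sample products are hypothetical, and results are simply sorted by ID rather than by the complex sorting rules mentioned above.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(products: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each word to the set of product IDs whose text contains it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for product_id, description in products.items():
        for word in tokenize(description):
            index[word].add(product_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return sorted IDs of products containing every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Hypothetical product data for the demonstration.
products = {
    1: "Red wooden train set",
    2: "Blue plastic train",
    3: "Red rubber ball",
}
index = build_index(products)
print(search(index, "red train"))
print(search(index, "train"))
```

Because the index holds only words and IDs, rebuilding it from the database is cheap compared with copying full product records.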

Load Balancing and Failover

We took pains to make sure that we would be able to provide load-balancing among nodes of the cluster and fault-tolerance in case one or more nodes failed. The proxy servers are balanced using a random selection algorithm. A user might end up on a different one on each request. These servers don't hold any state information, so the goal is just to distribute the load evenly.

The application servers use "sticky" load balancing. That means that once a user goes to a particular app server, all of her subsequent requests during that session will also be passed to the same app server. The f5 hardware accomplishes this using browser cookies.

The load balancers run a periodic service check on each server and remove from rotation any server that fails the check. When a server fails, all users who were "stuck" to that machine are moved to another one.
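The sticky routing and failover behaviour can be sketched as follows. This is a Python illustration under stated assumptions, not a description of how the f5 hardware works internally: the `StickyBalancer` class is hypothetical, the cookie is assumed to carry the server name directly, and new or displaced users are assigned at random.

```python
import random
from typing import Iterable, Optional, Set


class StickyBalancer:
    """Route a session to the server named in its cookie while it is healthy."""

    def __init__(self, servers: Iterable[str],
                 rng: Optional[random.Random] = None) -> None:
        self.servers = list(servers)
        self.healthy: Set[str] = set(self.servers)
        self._rng = rng or random.Random()

    def mark_down(self, server: str) -> None:
        """Called when a periodic service check fails."""
        self.healthy.discard(server)

    def mark_up(self, server: str) -> None:
        """Called when a known server passes its check again."""
        if server in self.servers:
            self.healthy.add(server)

    def route(self, cookie: Optional[str]) -> str:
        """Return the server to use; the caller stores it back in the cookie."""
        if cookie in self.healthy:
            return cookie  # sticky: same server as before
        if not self.healthy:
            raise RuntimeError("no healthy application servers")
        # New session, or the previous server failed: pick another one.
        return self._rng.choice(sorted(self.healthy))


balancer = StickyBalancer(["app1", "app2", "app3"])
assigned = balancer.route(None)       # first request, no cookie yet
repeat = balancer.route(assigned)     # later request carries the cookie
balancer.mark_down(assigned)          # the server fails its service check
moved = balancer.route(assigned)      # user is moved to another server
print(assigned, repeat, moved)
```

A user who is moved this way arrives at a server with no in-memory knowledge of the session, so any state the user needs has to be recoverable from somewhere else.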

In order to ensure that no data is lost if an app server dies, all updates are written to the database. As a result, user data like the contents of a shopping cart is preserved even in cases of catastrophic hardware failure on an app server. This is essential for a large e-commerce site.
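The idea of writing every update through to the database can be sketched as below. This is a Python illustration, not the site's code: it uses an in-memory SQLite database as a stand-in for the real database tier, and the table layout, `AppServer` class, and SKU are hypothetical.

```python
import sqlite3
from typing import Dict


def open_db() -> sqlite3.Connection:
    """Create a stand-in shared database with a simple cart table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE cart_items ("
        " session_id TEXT, sku TEXT, qty INTEGER,"
        " PRIMARY KEY (session_id, sku))"
    )
    return db


class AppServer:
    """Holds no cart state of its own; every update goes to the database."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self.db = db

    def add_to_cart(self, session_id: str, sku: str, qty: int) -> None:
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO cart_items VALUES (?, ?, ?)",
                (session_id, sku, qty),
            )

    def cart(self, session_id: str) -> Dict[str, int]:
        rows = self.db.execute(
            "SELECT sku, qty FROM cart_items WHERE session_id = ? ORDER BY sku",
            (session_id,),
        )
        return dict(rows.fetchall())


db = open_db()
app1 = AppServer(db)
app1.add_to_cart("session-1", "TRAIN-01", 2)
del app1                      # simulate the first app server dying

app2 = AppServer(db)          # the user is moved to another server
recovered = app2.cart("session-1")
print(recovered)
```

The second server rebuilds the cart purely from the database, which is what allows the load balancer to move users between machines without losing data.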

The database has a separate failover system, which we will not go into here. It follows standard practices recommended by our vendors.
