Improving mod_perl Sites' Performance: Part 1
by Stas Bekman
Getting a Fast Internet Connection
You have the best hardware you can get, but the service is still crawling. Make sure you have a fast Internet connection: not as fast as your ISP claims it to be, but as fast as it should be. The ISP might have a good connection to the Internet but put many clients on the same line. If these are heavy clients, then your traffic will have to share the line with theirs and your throughput will suffer. Think about a dedicated connection and make sure it is truly dedicated. Don't trust the ISP, check it!
The idea of having a connection to the Internet is a little misleading. Many Web hosting and co-location companies have large amounts of bandwidth, but still have poor connectivity. The public exchanges, such as MAE-East and MAE-West, frequently become overloaded, yet many ISPs depend on these exchanges.
Private peering, where providers exchange traffic directly rather than through a public exchange, means that they can exchange traffic much more quickly.
Also, if your Web site is of global interest, check that the ISP has good global connectivity. If the Web site is going to be visited mostly by people in a certain country or region, then your server should probably be located there.
Bad connectivity can directly influence your machine's performance. Here is a story one of the developers told on the mod_perl mailing list:
What relationship has 10 percent packet loss on one upstream provider got to do with machine memory?
Yes ... a lot. For a nightmare week, the box was located downstream of a provider who was struggling with some serious bandwidth problems of his own ... people were connecting to the site via this link, and packet loss was such that retransmits and TCP stalls were keeping httpd heavies around for much longer than normal ... instead of blasting out the data at high or even modem speeds, they would be stuck at 1k/sec or stalled out ... people would press stop and refresh, httpds would take 300 seconds to timeout on writes to no-one ... it was a nightmare. Those problems didn't go away till I moved the box to a place closer to some decent backbones.
Note that with a proxy, this only keeps a lightweight httpd tied up, assuming the page is small enough to fit in the buffers. If you are a busy Internet site, then you always have some slow clients. This is a difficult thing to simulate in benchmark testing, though.
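The effect described in the story can be estimated with simple arithmetic. The following is only a rough sketch: the 30 KB page size, the request rate and the client speeds are made-up numbers, and the function names are hypothetical. It assumes that a server process stays busy for as long as it takes the client to receive the whole response.

```python
def seconds_to_send(response_bytes, client_bytes_per_sec):
    """Time one server process stays busy writing one response to one client."""
    return response_bytes / client_bytes_per_sec


def busy_processes(requests_per_sec, response_bytes, client_bytes_per_sec):
    """Average number of processes occupied at once: arrival rate x hold time."""
    return requests_per_sec * seconds_to_send(response_bytes, client_bytes_per_sec)


if __name__ == "__main__":
    page = 30 * 1024          # hypothetical 30 KB page
    rate = 10                 # hypothetical 10 requests per second
    links = [
        ("stalled link, 1 KB/sec", 1024),
        ("modem, about 5 KB/sec", 5 * 1024),
        ("fast link, 1 MB/sec", 1024 * 1024),
    ]
    for label, speed in links:
        hold = seconds_to_send(page, speed)
        busy = busy_processes(rate, page, speed)
        print(f"{label}: {hold:.2f} sec per request, about {busy:.1f} processes busy")
```

With a buffering proxy in front, the same arithmetic applies to the lightweight proxy processes instead of the heavy mod_perl ones, which is why the memory cost of slow clients drops so sharply.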
Tuning I/O Performance
If your service is I/O bound (it does a lot of read/write operations to
disk), then you need a very fast disk, especially if you use a
relational database, since databases are the main creators of I/O
streams. So do not spend the money on a video card and monitor! A cheap
card and a 14-inch monochrome monitor are perfectly adequate for a Web
server; you will probably access it by telnet or ssh most of the time.
Look for disks with the best price/performance ratio. Of course, ask
around and avoid disks that have a reputation for head-crashes and
other disasters.
You should consider RAID or a similar system if you have an enormous data set to serve (what is an enormous data set nowadays? Gigabytes, terabytes?) or if you expect really heavy Web traffic.
OK, you have a fast disk, what's next? You need a fast disk controller. There may be one embedded on your computer's motherboard. If the controller is not fast enough, then you should buy a faster one. Don't forget that it may be necessary to disable the original controller.
How Much Memory Is Enough?
How much RAM do you need? Nowadays, chances are that you will hear: "Memory is cheap, the more you buy the better." But how much is enough? The answer is pretty straightforward: you do not want your machine to swap. When a process needs to write something into memory, but memory is already full, the kernel takes the memory pages that have not been used recently and swaps them out to disk. This means you have to bear the time penalty of writing the data to disk. If another process then references some of the data that happens to be on one of the pages that has just been swapped out, then the kernel swaps it back in again, probably swapping out some other data that will be needed very shortly by some other process. Carried to the extreme, the CPU and disk start to thrash hopelessly in circles, without getting any real work done. The less RAM there is, the more often this scenario arises. Worse, you can exhaust swap space as well, and then your troubles really start.
How do you make a decision? You know the highest rate at which your server expects to serve pages and how long it takes on average to serve one. Now you can calculate how many server processes you need. If you know the maximum size your servers can grow to, then you know how much memory you need. If your OS supports memory sharing, then you can make best use of this feature by preloading the modules and scripts at server startup, and so you will need less memory than you have calculated.
Do not forget that other essential system processes need memory as well, so you should plan not only for the Web server, but also take into account the other players. Remember that requests can be queued, so you can afford to let your client wait for a few moments until a server is available to serve it. Most of the time your server will not have the maximum load, but you should be ready to bear the peaks. You need to reserve at least 20 percent of free memory for peak situations. Many sites have crashed a few moments after a big scoop about them was posted and an unexpected number of requests suddenly came in (the so-called Slashdot effect). If you are about to announce something cool, then be aware of the possible consequences.
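The calculation described above can be sketched in a few lines. This is an illustration under stated assumptions, not a sizing tool: every number and every function name is hypothetical, the "20 percent" rule is read here as "keep 20 percent of total RAM free", and shared memory is modeled simply as a block that is counted once no matter how many processes use it.

```python
import math


def processes_needed(peak_requests_per_sec, avg_seconds_per_request):
    """Processes required to keep up: peak request rate x average service time."""
    return math.ceil(peak_requests_per_sec * avg_seconds_per_request)


def memory_needed_mb(n_processes, max_process_mb, shared_mb=0.0,
                     other_system_mb=0.0, reserve_fraction=0.20):
    """Estimate total RAM so that reserve_fraction of it stays free at peak."""
    # Shared pages are counted once; each process adds only its unshared part.
    unshared_mb = max_process_mb - shared_mb
    web_server_mb = shared_mb + n_processes * unshared_mb
    in_use_mb = web_server_mb + other_system_mb
    # in_use = (1 - reserve) * total, so total = in_use / (1 - reserve)
    return in_use_mb / (1.0 - reserve_fraction)


if __name__ == "__main__":
    # Hypothetical site: 40 requests/sec at peak, 0.5 sec per request.
    n = processes_needed(40, 0.5)
    no_sharing = memory_needed_mb(n, max_process_mb=12, other_system_mb=64)
    with_sharing = memory_needed_mb(n, max_process_mb=12, shared_mb=4,
                                    other_system_mb=64)
    print(f"processes needed: {n}")
    print(f"RAM without sharing: {no_sharing:.0f} MB")
    print(f"RAM with 4 MB shared per process: {with_sharing:.0f} MB")
```

Comparing the two results shows why preloading modules at server startup pays off: the more of each process that is shared, the less RAM the same number of processes requires.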
Getting a Fault-Tolerant CPU
Make sure that the CPU is operating within its specifications. Many boxes are shipped with incorrect settings for CPU clock speed, power supply voltage, etc. Sometimes a cooling fan is not fitted, or it is ineffective because a cable assembly fouls the fan blades. Like faulty RAM, an overheating processor can cause all kinds of strange and unpredictable things to happen. Some CPUs are known to have bugs that can be serious in certain circumstances. Try not to get one of them.
Detecting and Avoiding Bottlenecks
You might use the most expensive components, but still get bad performance. Why? Let me introduce an annoying word: bottleneck.
A machine is an aggregate of many components. Almost any one of them may become a bottleneck.
If you have a fast processor but a small amount of RAM, then the RAM will probably be the bottleneck. The processor will be under-utilized: most of the time it will be waiting for the kernel to swap memory pages in and out, because memory is too small to hold the busiest pages.
If you have a lot of memory, a fast processor, a fast disk, but a slow disk controller, then the disk controller will be the bottleneck. The performance will still be bad, and you will have wasted money.
A slow NIC (network interface card) can cause a bottleneck as well and make the whole service run slowly. This is a very important component, since Web servers are much more often network-bound than disk-bound (i.e., they generate more network traffic than disk activity).
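One way to reason about bottlenecks is to express what each component can sustain in the same unit, such as requests per second, and look for the smallest figure. The sketch below does this with made-up capacities. The component figures and function names are assumptions for illustration only, and the NIC estimate ignores protocol overhead.

```python
def nic_capacity_rps(nic_megabits_per_sec, avg_response_bytes):
    """Requests/sec a NIC can carry, ignoring protocol overhead."""
    return nic_megabits_per_sec * 1_000_000 / (avg_response_bytes * 8)


def find_bottleneck(capacity_rps):
    """Return (name, capacity) of the component with the lowest capacity."""
    name = min(capacity_rps, key=capacity_rps.get)
    return name, capacity_rps[name]


if __name__ == "__main__":
    # Hypothetical per-component limits, all in requests per second.
    capacities = {
        "cpu": 400.0,
        "disk controller": 250.0,
        "nic": nic_capacity_rps(10, 25_000),  # 10 Mbit/sec card, 25 KB responses
    }
    name, limit = find_bottleneck(capacities)
    print(f"bottleneck: {name}, at about {limit:.0f} requests/sec")
```

Whatever the other components can do, the service as a whole cannot go faster than the slowest one, so money spent raising the other figures is wasted until that one is fixed.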
Solving Hardware Requirement Conflicts
The combination of software components that you use may give rise to conflicting requirements when you tune the machine. If you can separate the components onto different machines, then you may find that this approach (a kind of clustering) solves the problem at much less cost than buying faster hardware, because you can tune each machine individually to suit the task it performs.
For example, if you need to run a relational database engine and a mod_perl server, then it can be wise to put the two on different machines, since an RDBMS needs a very fast disk, while mod_perl processes need lots of memory. By placing the two on different machines, it is easy to optimize each machine separately and satisfy each software component's requirements in the best way.
References
- The mod_perl site's URL: http://perl.apache.org
- For more information about RAID, see the Disk-HOWTO, Module-HOWTO and Parallel-Processing-HOWTO, available from the Linux Documentation Project and its mirrors (http://www.linuxdoc.org/docs.html#howto)
- For more information about clusters and high availability setups, see:
  - High-Availability Linux Project -- the definitive guide to load balancing techniques
  - Linux Virtual Server Project (http://www.linuxvirtualserver.org/)
  - mod_backhand -- Load Balancing for Apache (http://www.backhand.org/mod_backhand/)
  - mod_redundancy -- Redundancy/Failover solution (http://www.ask-the-guru.com)
  - lbnamed -- a Load Balancing Name Server Written in Perl (http://www.stanford.edu/~riepel/lbnamed/, http://www.stanford.edu/~riepel/lbnamed/bof.talk/, http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html)

