Improving mod_perl Sites' Performance: Part 1
by Stas Bekman | Pages: 1, 2, 3

Ouch ... Discontinued Products

The OSs in this hazard group tend to be developed by a single company or organization.

You might find yourself in a position where you have invested a lot of time and money in developing proprietary software that is tied to the OS you chose (say, a mod_perl handler that takes advantage of some proprietary features of the OS and will not run on any other OS). Things are under control, the performance is great and you sing with happiness on your way to work. Then, one day, the company that supplies your beloved OS goes bankrupt (not unlikely nowadays), or it produces a newer, incompatible version and stops supporting the old one (happens all the time). You are stuck with their early masterpiece, no support and no source code! What are you going to do? Invest more money into porting the software to another OS ...

Free and open-source OSs are probably less susceptible to this kind of problem. Development is usually distributed among many companies and developers, so if a person who developed an important part of the kernel loses interest in continuing, someone else will pick up the fallen flag and carry on. Of course, if some better project shows up tomorrow, developers might migrate there and eventually drop development of the old one. But in practice, people are often given support on older versions and helped to migrate to current versions. Development tends to be more incremental than revolutionary, so upgrades are less traumatic, and there is usually plenty of notice of forthcoming changes so that you have time to plan for them.

Of course, with open-source OSs you can have the source code! So you can always have a go yourself, but do not underestimate the amount of work involved. There are many, many man-years of work in an OS.

Keeping Up with OS Releases

Actively developed OSs generally try to keep pace with the latest technology developments, and continually optimize the kernel and other parts of the OS to become better and faster. Nowadays, the Internet and networking in general are the hottest topics for system developers. Sometimes a simple OS upgrade to the latest stable version can save you an expensive hardware upgrade. Also, remember that when you buy new hardware, chances are that the latest software will make the most of it.

If a new product supports an old one only by virtue of backward compatibility with previous products of the same family, then you might not reap all the benefits of the new product's features. In that case you could perhaps get almost the same functionality for much less money by buying an older model of the same product.

Choosing the Right Hardware

Sometimes the most expensive machine is not the one that provides the best performance. Your demands on the platform hardware are based on many aspects and affect many components. Let's discuss some of them.

In the discussion I use terms that may be unfamiliar to you:

  • Cluster: a group of machines connected together to perform one big or many small computational tasks in a reasonable time. Clustering can also be used to provide 'fail-over,' where if one machine fails, then its processes are transferred to another without interruption of service. And you may be able to take one of the machines down for maintenance (or an upgrade) and keep your service running -- the main server will simply not dispatch the requests to the machine that was taken down.
  • Load balancing: users are given the name of one of your machines but perhaps it cannot stand the heavy load. You can use a clustering approach to distribute the load over a number of machines. The central server, which users access initially when they type the name of your service, works as a dispatcher. It just redirects requests to other machines. Sometimes the central server also collects the results and returns them to the users. You can get the advantages of clustering, too.
  • Network Interface Card (NIC): a hardware component that connects your machine to the network. It sends and receives packets; newer cards can also encrypt and decrypt packets and digitally sign and verify them. NICs come in different speed categories, from 10Mbps to 10Gbps and faster. The most common type of NIC is the one that implements the Ethernet networking protocol.
  • Random Access Memory (RAM): the working memory in your computer. (Comes in units of 8MB, 16MB, 64MB, 256MB, etc.)
  • Redundant Array of Inexpensive Disks (RAID): an array of physical disks, usually treated by the operating system as one single disk, and often forced to appear that way by the hardware. The reason for using RAID is often simply to achieve a high data transfer rate, but it may also be to get adequate disk capacity or high reliability. Redundancy means that the system is capable of continued operation even if a disk fails. There are various types of RAID arrays and several different approaches to implementing them. Some systems provide protection against failure of more than one drive and some ('hot-swappable') systems allow a drive to be replaced without even stopping the OS.
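To make the cluster and load balancing terms a bit more concrete, here is a minimal Python sketch of a dispatcher that hands requests to machines in turn and skips any machine that has been taken down. The `Dispatcher` class and the machine names `web1` to `web3` are hypothetical, invented for illustration; a real dispatcher would also forward the request over the network and check machine health.

```python
from itertools import cycle


class Dispatcher:
    """Toy central server: picks the next available machine for each request."""

    def __init__(self, machines):
        self.machines = list(machines)
        self.down = set()                  # machines failed or under maintenance
        self._ring = cycle(self.machines)  # endless round-robin over the pool

    def take_down(self, machine):
        self.down.add(machine)

    def bring_up(self, machine):
        self.down.discard(machine)

    def dispatch(self):
        # Look at each machine at most once per request, skipping the ones
        # that are down, so the service keeps running during maintenance.
        for _ in range(len(self.machines)):
            machine = next(self._ring)
            if machine not in self.down:
                return machine
        raise RuntimeError("no machines available")


pool = Dispatcher(["web1", "web2", "web3"])   # hypothetical machine names
first_round = [pool.dispatch() for _ in range(3)]

pool.take_down("web2")                        # e.g. for an upgrade
after_maintenance = [pool.dispatch() for _ in range(4)]

print(first_round)        # ['web1', 'web2', 'web3']
print(after_maintenance)  # ['web1', 'web3', 'web1', 'web3']
```

Round-robin is only the simplest possible policy, chosen here to keep the sketch short; nothing in the definitions above requires it.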

Machine Strength Demands According to Expected Site Traffic

If you are building a fan site and you want to amaze your friends with a mod_perl guest book, then any old 486 machine could do it. If you are in a serious business, then it is important to build a scalable server. If your service is successful and becomes popular, then the traffic could double every few days, and you should be ready to add more resources to meet demand. While Web server scalability can be defined more precisely, the important thing is to make sure that you can add more power to your Web server(s) without investing much additional money in software development (you will need a little software effort to connect your servers, if you add more of them). This means that you should choose hardware and OSs that can talk to other machines and become a part of a cluster.
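To get a feel for how quickly "doubling every few days" eats up capacity, here is a small Python sketch. All the numbers in it (50 requests per second to start with, a three-day doubling period, 100 requests per second per machine) are assumptions made up for the example, not measurements.

```python
import math


def traffic_after(initial_rps, days, doubling_period_days):
    """Requests per second after `days`, if traffic doubles every period."""
    return initial_rps * 2 ** (days / doubling_period_days)


def machines_needed(requests_per_sec, capacity_per_machine):
    """Smallest number of identical machines that can carry the load."""
    return math.ceil(requests_per_sec / capacity_per_machine)


# Hypothetical figures, for illustration only.
INITIAL_RPS = 50
DOUBLING_PERIOD_DAYS = 3
CAPACITY_PER_MACHINE = 100

plan = {
    day: machines_needed(
        traffic_after(INITIAL_RPS, day, DOUBLING_PERIOD_DAYS),
        CAPACITY_PER_MACHINE,
    )
    for day in (0, 3, 6, 9, 12)
}
print(plan)  # {0: 1, 3: 1, 6: 2, 9: 4, 12: 8}
```

Under these assumptions one machine is enough for the first three days, but eight are needed by day 12, which is why it pays to choose hardware and OSs that let you add machines without rewriting your software.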

On the other hand, if you prepare for a lot of traffic and buy a monster to do the work for you, then what happens if your service doesn't prove to be as successful as you thought? Then you've spent too much money, and meanwhile faster processors and other hardware components have been released; so you lose.

Wisdom and prophecy, that's all it takes :)

Single Strong Machine vs. Many Weaker Machines

Let's start with a claim that a 4-year-old processor is still powerful and can be put to good use. Now let's say that for a given amount of money you can probably buy either one new very strong machine or about 10 older but very cheap machines. I claim that with 10 old machines connected into a cluster and by deploying load balancing you will be able to serve about five times more requests than with one single new machine.

Why is that? Because generally the performance improvement on a new machine is marginal while the price is much higher. Ten machines will do faster disk I/O than one single machine, even if the new disk is quite a bit faster. Yes, you have more administration overhead, but there is a chance you will have it anyway, for in a short time the new machine you have just bought might not stand the load. Then you will have to purchase more equipment and think about how to implement load balancing and Web server file system distribution anyway.
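The arithmetic behind the "about five times" claim can be sketched in a few lines of Python. The throughput figures are hypothetical: I simply assume that each old machine serves half as many requests per second as the new one, which is the ratio at which 10 old machines come out five times ahead. The `efficiency` parameter is also an assumption, standing in for whatever the load balancing itself costs you.

```python
def cluster_throughput(machines, per_machine_rps, efficiency=1.0):
    """Total requests per second served by a cluster of identical machines.

    `efficiency` (0.0 to 1.0) is an assumed factor for load-balancing overhead.
    """
    return machines * per_machine_rps * efficiency


# Hypothetical figures, for illustration only.
NEW_MACHINE_RPS = 200   # one new, very strong machine
OLD_MACHINE_RPS = 100   # assumed: an old machine is half as fast

ideal_ratio = cluster_throughput(10, OLD_MACHINE_RPS) / NEW_MACHINE_RPS
lossy_ratio = cluster_throughput(10, OLD_MACHINE_RPS, efficiency=0.9) / NEW_MACHINE_RPS

print(ideal_ratio)            # 5.0
print(round(lossy_ratio, 2))  # 4.5
```

Even if the cluster loses 10 percent to overhead in this sketch, it still serves four and a half times what the single machine does. The conclusion depends entirely on the assumed per-machine ratio, so measure your own hardware before you buy.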

Why am I so convinced? Look at the busiest services on the Internet: search engines, Web/e-mail servers and the like -- most of them use a clustering approach. You may not always notice it, because they hide the real implementation details behind proxy servers.
