Sign In/My Account | View Cart  
advertisement


Listen Print

Improving mod_perl Sites' Performance: Part 7
by Stas Bekman | Pages: 1, 2

Now we will benchmark the same script without using the mysql (code limited by perl only): (http://www.example.com/perl/access/access.cgi), it's the same script but it just returns the HTML form, without making SQL queries.


  MinSpareServers        8
  MaxSpareServers       16
  StartServers          10
  MaxClients            50
  MaxRequestsPerChild 5000

     NR   NC      RPS   comment
  ------------------------------------------------
     10   10    26.95   # not a reliable figure
    100   10    30.88   
   1000   10    29.31
   1000   50    28.01
   1000  100    29.74
  10000  200    24.92
 100000  400    24.95

Conclusions: This time the script we executed was pure perl (not limited by I/O or mysql), so we see that the server serves the requests much faster. You can see the number of requests per second is almost the same for any load, but goes lower when the number of concurrent clients goes beyond MaxClients. With 25 RPS, the machine simulating a load of 400 concurrent clients will be served in 16 seconds. To be more realistic, assuming a maximum of 100 concurrent clients and 30 requests per second, the client will be served in 3.5 seconds. Pretty good for a highly loaded server.

Now we will use the server to its full capacity, by keeping all MaxClients clients alive all the time and having a big MaxRequestsPerChild, so that no child will be killed during the benchmarking.


  MinSpareServers       50
  MaxSpareServers       50
  StartServers          50
  MaxClients            50
  MaxRequestsPerChild 5000
  
     NR   NC      RPS   comment
  ------------------------------------------------
    100   10    32.05
   1000   10    33.14
   1000   50    33.17
   1000  100    31.72
  10000  200    31.60

Conclusion: In this scenario, there is no overhead involving the parent server loading new children, all the servers are available, and the only bottleneck is contention for the CPU.

Now we will change MaxClients and watch the results: Let's reduce MaxClients to 10.


  MinSpareServers        8
  MaxSpareServers       10
  StartServers          10
  MaxClients            10
  MaxRequestsPerChild 5000
  
     NR   NC      RPS   comment
  ------------------------------------------------
     10   10    23.87   # not a reliable figure
    100   10    32.64 
   1000   10    32.82
   1000   50    30.43
   1000  100    25.68
   1000  500    26.95
   2000  500    32.53

Conclusions: Very little difference! Ten servers were able to serve almost with the same throughput as 50. Why? My guess is because of CPU throttling. It seems that 10 servers were serving requests five times faster than when we worked with 50 servers. In that case, each child received its CPU time slice five times less frequently. So having a big value for MaxClients, doesn't mean that the performance will be better. You have just seen the numbers!

Now we will start drastically to reduce MaxRequestsPerChild:


  MinSpareServers        8
  MaxSpareServers       16
  StartServers          10
  MaxClients            50

     NR   NC    MRPC     RPS    comment
  ------------------------------------------------
    100   10      10    5.77 
    100   10       5    3.32
   1000   50      20    8.92
   1000   50      10    5.47
   1000   50       5    2.83
   1000  100      10    6.51

Conclusions: When we drastically reduce MaxRequestsPerChild, the performance starts to become closer to plain mod_cgi.

Here are the numbers of this run with mod_cgi, for comparison:


  MinSpareServers        8
  MaxSpareServers       16
  StartServers          10
  MaxClients            50
  
     NR   NC    RPS     comment
  ------------------------------------------------
    100   10    1.12
   1000   50    1.14
   1000  100    1.13

Conclusion: mod_cgi is much slower. :) In the first test, when NR/NC was 100/10, mod_cgi was capable of 1.12 requests per second. In the same circumstances, mod_perl was capable of 32 requests per second, nearly 30 times faster! In the first test, each client waited about 100 seconds to be served. In the second and third tests, they waited 1,000 seconds!

Choosing MaxClients

The MaxClients directive sets the limit on the number of simultaneous requests that can be supported. No more than this number of child server processes will be created. To configure more than 256 clients, you must edit the HARD_SERVER_LIMIT entry in httpd.h and recompile. In our case, we want this variable to be as small as possible, so we can limit the resources used by the server children. Since we can restrict each child's process size with Apache::SizeLimit or Apache::GTopLimit, the calculation of MaxClients is pretty straightforward:


               Total RAM Dedicated to the Webserver
  MaxClients = ------------------------------------
                     MAX child's process size

So if I have 400Mb left for the Web server to run with, then I can set MaxClients to be of 40 if I know that each child is limited to 10Mb of memory (e.g. with Apache::SizeLimit).

You will be wondering what will happen to your server if there are more concurrent users than MaxClients at any time. This situation is signified by the following warning message in the error_log:


  [Sun Jan 24 12:05:32 1999] [error] server reached MaxClients setting,
  consider raising the MaxClients setting

There is no problem -- any connection attempts over the MaxClients limit will normally be queued, up to a number based on the ListenBacklog directive. When a child process is freed at the end of a different request, the connection will be served.

It is an error because clients are being put in the queue rather than getting served immediately, despite the fact that they do not get an error response. The error can be allowed to persist to balance available system resources and response time, but sooner or later you will need to get more RAM so you can start more child processes. The best approach is to try not to have this condition reached at all, and if you reach it often you should start to worry about it.

It's important to understand how much real memory a child occupies. Your children can share memory between them when the OS supports that. You must take action to allow the sharing to happen. We have disscussed this in one of the previous article whose main topic was shared memory. If you do this, then chances are that your MaxClients can be even higher. But it seems that it's not so simple to calculate the absolute number. If you come up with a solution, then please let us know! If the shared memory was of the same size throughout the child's life, then we could derive a much better formula:


               Total_RAM + Shared_RAM_per_Child * (MaxClients - 1)
  MaxClients = ---------------------------------------------------
                              Max_Process_Size

which is:


                    Total_RAM - Shared_RAM_per_Child
  MaxClients = ---------------------------------------
               Max_Process_Size - Shared_RAM_per_Child

Let's roll some calculations:


  Total_RAM            = 500Mb
  Max_Process_Size     =  10Mb
  Shared_RAM_per_Child =   4Mb

              500 - 4
 MaxClients = --------- = 82
               10 - 4

With no sharing in place


                 500
 MaxClients = --------- = 50
                 10

With sharing in place you can have 64 percent more servers without buying more RAM.

If you improve sharing and keep the sharing level, let's say:


  Total_RAM            = 500Mb
  Max_Process_Size     =  10Mb
  Shared_RAM_per_Child =   8Mb

              500 - 8
 MaxClients = --------- = 246
               10 - 8

392 percent more servers! Now you can feel the importance of having as much shared memory as possible.

Choosing MaxRequestsPerChild

The MaxRequestsPerChild directive sets the limit on the number of requests that an individual child server process will handle. After MaxRequestsPerChild requests, the child process will die. If MaxRequestsPerChild is 0, then the process will live forever.

Setting MaxRequestsPerChild to a non-zero limit solves some memory leakage problems caused by sloppy programming practices, whereas a child process consumes more memory after each request.

If left unbounded, then after a certain number of requests the children will use up all the available memory and leave the server to die from memory starvation. Note that sometimes standard system libraries leak memory too, especially on OSes with bad memory management (e.g. Solaris 2.5 on x86 arch).

If this is your case, then you can set MaxRequestsPerChild to a small number. This will allow the system to reclaim the memory that a greedy child process consumed, when it exits after MaxRequestsPerChild requests.

But beware -- if you set this number too low, you will lose some of the speed bonus you get from mod_perl. Consider using Apache::PerlRun if this is the case.

Another approach is to use the Apache::SizeLimit or the Apache::GTopLimit modules. By using either of these modules you should be able to discontinue using the MaxRequestPerChild, although for some developers, using both in combination does the job. In addition the latter module allows you to kill any servers whose shared memory size drops below a specified limit.

References