Improving mod_perl Sites' Performance: Part 6
by Stas Bekman
A Complete Fork Example
Now let's put all the bits of code together and show well-written fork code that solves all the problems discussed so far. I will use an Apache::Registry script for this purpose:
proper_fork1.pl
---------------
use strict;
use POSIX 'setsid';
use Apache::SubProcess;
my $r = shift;
$r->send_http_header("text/plain");
$SIG{CHLD} = 'IGNORE';
defined (my $kid = fork) or die "Cannot fork: $!\n";
if ($kid) {
    print "Parent $$ has finished, kid's PID: $kid\n";
} else {
    $r->cleanup_for_exec(); # untie the socket
    chdir '/' or die "Can't chdir to /: $!";
    open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
    open STDOUT, '>/dev/null'
        or die "Can't write to /dev/null: $!";
    open STDERR, '>/tmp/log' or die "Can't write to /tmp/log: $!";
    setsid or die "Can't start a new session: $!";
    select STDERR;
    local $| = 1;
    warn "started\n";
    # do something time-consuming
    sleep 1, warn "$_\n" for 1..20;
    warn "completed\n";
    CORE::exit(0); # terminate the process
}
The script starts with the usual declaration of the strict mode, loading the
POSIX and Apache::SubProcess modules and importing of
the setsid() symbol from the POSIX package.
The HTTP header is sent next, with the Content-type of text/plain. The script then gets ready to ignore the child, to avoid zombies, and fork is called.
The program gets its personality split after fork and the
if conditional evaluates to a true value for the parent process,
and to a false value for the child process; the first block is executed by the
parent and the second by the child.
The parent process announces its PID and the PID of the spawned process and finishes its block. If there is any code after the if/else block, the parent will execute it as well.
The child process starts its code by disconnecting from the socket, changing
its current directory to /, and opening the STDIN and STDOUT streams to
/dev/null, which in effect closes them both before opening. In fact, in
this example we need neither of these, so I could just
close() both. The child process completes its disengagement from
the parent process by opening the STDERR stream to /tmp/log, so it
can write there, and by creating a new session with the help of setsid(). Now the
child process has nothing to do with the parent process and can do the actual
processing that it has to do. In our example, it performs a simple series of
warnings, which are logged into /tmp/log:
select STDERR;
local $|=1;
warn "started\n";
# do something time-consuming
sleep 1, warn "$_\n" for 1..20;
warn "completed\n";
The localized setting of $|=1 is there, so we can see the output
generated by the program immediately. In fact, it's not required when the output
is generated by warn().
Finally, the child process terminates by calling:
CORE::exit(0);
which makes sure that it won't get out of the block and run some code that it's not supposed to run.
This code example allows you to verify that the spawned child process indeed has its own life, and that its parent is free as well. Simply issue a request that runs this script, watch the warnings start to appear in the /tmp/log file, and then issue a complete server stop and start. If everything is correct, the server will restart successfully and the long-running process will still be running. You will know that it's still running if warnings continue to be written to the /tmp/log file. You may need to raise the number of warnings above 20 to make sure that you don't miss the end of the run.
If there are only five warnings to be printed, then you should see the following output in this file:
started
1
2
3
4
5
completed
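To experiment with the same detachment steps outside of mod_perl, the child-side code can be wrapped in a small function. This is only a sketch under stated assumptions: spawn_detached() is a hypothetical name that does not come from the script above, it takes the work to do as a code reference and the log file as a path, and it omits cleanup_for_exec() because there is no Apache request object in a plain Perl script.

```perl
use strict;
use warnings;
use POSIX 'setsid';

# Hypothetical helper (not part of the article's script): run $code in a
# detached child whose STDERR goes to $log. Returns the child's PID.
sub spawn_detached {
    my ($code, $log) = @_;

    $SIG{CHLD} = 'IGNORE';    # avoid zombies, as in the script above
    defined(my $kid = fork) or die "Cannot fork: $!\n";
    return $kid if $kid;      # parent: carry on immediately

    # child: disengage from the parent
    chdir '/'                      or die "Can't chdir to /: $!";
    open STDIN,  '<', '/dev/null'  or die "Can't read /dev/null: $!";
    open STDOUT, '>', '/dev/null'  or die "Can't write to /dev/null: $!";
    open STDERR, '>', $log         or die "Can't write to $log: $!";
    setsid()                       or die "Can't start a new session: $!";

    $code->();                # the time-consuming work
    CORE::exit(0);            # never fall back into the caller's code
}
```

A call such as spawn_detached(sub { warn "started\n"; sleep 1, warn "$_\n" for 1..5; warn "completed\n" }, '/tmp/log') returns right away in the parent, while the child keeps writing to the log file.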
Starting a Long-Running External Program
But what happens if we cannot just run Perl code from the spawned process
because we have a compiled utility, e.g. a program written in C? Or we have a Perl
program that cannot easily be converted into a module and thus called as a
function? Of course, in this case we have to use system(), exec(),
qx() or `` (back ticks) to start it.
When using any of these methods and when the Taint mode is enabled, we must at least add the following code to untaint the PATH environment variable and delete a few other insecure environment variables. This information can be found in the perlsec manpage.
$ENV{'PATH'} = '/bin:/usr/bin';
delete @ENV{'IFS', 'CDPATH', 'ENV', 'BASH_ENV'};
Now all we have to do is to reuse the code from the previous section.
First, we move the core program into the external.pl file, add the shebang first line so the program will be executed by Perl, tell the program to run under Taint mode (-T) and possibly enable the warnings mode (-w) and make it executable:
external.pl
-----------
#!/usr/bin/perl -Tw
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null'
or die "Can't write to /dev/null: $!";
open STDERR, '>/tmp/log' or die "Can't write to /tmp/log: $!";
select STDERR;
local $|=1;
warn "started\n";
# do something time-consuming
sleep 1, warn "$_\n" for 1..20;
warn "completed\n";
Now we replace the code that moved into the external program with
exec() to call it:
proper_fork_exec.pl
-------------------
use strict;
use POSIX 'setsid';
use Apache::SubProcess;
$ENV{'PATH'} = '/bin:/usr/bin';
delete @ENV{'IFS', 'CDPATH', 'ENV', 'BASH_ENV'};
my $r = shift;
$r->send_http_header("text/html");
$SIG{CHLD} = 'IGNORE';
defined (my $kid = fork) or die "Cannot fork: $!\n";
if ($kid) {
    print "Parent has finished, kid's PID: $kid\n";
} else {
    $r->cleanup_for_exec(); # untie the socket
    chdir '/' or die "Can't chdir to /: $!";
    open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
    open STDOUT, '>/dev/null'
        or die "Can't write to /dev/null: $!";
    open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
    setsid or die "Can't start a new session: $!";
    exec "/home/httpd/perl/external.pl" or die "Cannot execute exec: $!";
}
Notice that exec() never returns unless it fails to start the
process. Therefore, you shouldn't put any code after exec(): it will not be
executed in the case of success. Use system() or back-ticks instead
if you want to continue doing other things in the process. But then you will
probably want to terminate the process after the program has finished. So you will
have to write:
# system() returns 0 on success, so test for that explicitly
system("/home/httpd/perl/external.pl") == 0 or die "Cannot execute system: $?";
CORE::exit(0);
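Perl's system() returns the wait status of the program rather than a true value on success, so the result needs decoding before it can be reported. The following is a minimal sketch of that decoding. The run_checked() helper is a hypothetical name introduced only for illustration, and $^X (the path of the running perl) stands in for a real external program.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command and describe how it ended.
sub run_checked {
    my @cmd = @_;
    my $rc = system @cmd;
    return "failed to start: $!"                    if $rc == -1;
    return sprintf("died with signal %d", $? & 127) if $? & 127;
    return sprintf("exited with status %d", $? >> 8);
}

print run_checked($^X, '-e', 'exit 3'), "\n";   # exited with status 3
```

The exit status lives in the high byte of $?, which is why the value is shifted right by eight bits before it is printed.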
Another important nuance is that we have to close all STD*
streams in the forked process, even if the called program does that.
If the external program is written in Perl, then you may pass complicated data
structures to it by serializing the Perl data and restoring it on the other
side. The Storable and FreezeThaw modules come in
handy. Let's say that we have program master.pl calling program
slave.pl:
master.pl
---------
# we are within the mod_perl code
use Storable ();
my @params = (foo => 1, bar => 2);
my $params = Storable::freeze(\@params);
exec "./slave.pl", $params or die "Cannot execute exec: $!";
slave.pl
--------
#!/usr/bin/perl -w
use Storable ();
my @params = @ARGV ? @{ Storable::thaw(shift)||[] } : ();
# do something
As you can see, master.pl serializes the @params data
structure with Storable::freeze and passes it to slave.pl
as a single argument. slave.pl restores it with
Storable::thaw, by shifting the first value of the
ARGV array if available. The FreezeThaw module does a
similar thing.
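One caveat is that Storable::freeze returns a binary string, which may contain bytes (such as NUL) that cannot travel inside a command-line argument. A possible workaround, shown here as an assumption rather than as part of the example above, is to hex-encode the frozen data before passing it and to decode it on the other side. The encode_params() and decode_params() names are hypothetical.

```perl
use strict;
use warnings;
use Storable ();

# Hypothetical helpers: make a frozen structure safe to pass in @ARGV.
sub encode_params { return unpack 'H*', Storable::freeze($_[0]) }   # master side
sub decode_params { return Storable::thaw(pack 'H*', $_[0]) }       # slave side

my $arg  = encode_params([foo => 1, bar => 2]);   # plain hex digits
my @back = @{ decode_params($arg) };
print "@back\n";   # foo 1 bar 2
```

In master.pl the encoded string would take the place of $params in the exec() call, and slave.pl would call the decoding helper on its first argument.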
Starting a Short-Running External Program
Sometimes you need to call an external program and you cannot continue before this program completes its run and optionally returns some result. In this case, the fork solution doesn't help. But we have a few ways to execute this program. First using system():
system "perl -e 'print 5+5'"
We believe that you will never call the Perl interpreter for doing this simple calculation, but for the sake of a simple example it's good enough.
The problem with this approach is that we cannot get the results printed to
STDOUT, and that's where back-ticks or qx() help. If you use either:
my $result = `perl -e 'print 5+5'`;
or:
my $result = qx{perl -e 'print 5+5'};
the whole output of the external program will be stored in the
$result variable.
Of course, you can use other solutions, such as opening a pipe (| to
the program) if you need to submit many arguments, and more elaborate solutions
provided by other Perl modules like IPC::Open2, which allows you to open
a process for both reading and writing.
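As an illustration of the IPC::Open2 approach, here is a minimal sketch that writes to a child process and reads its reply. The shout() helper is a hypothetical name, and the child is simply another perl (found through $^X) that upper-cases its input. A real external program would take its place.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper: send $text to a child perl that upper-cases its
# input, and return everything the child wrote back.
sub shout {
    my ($text) = @_;
    # list form of the command, so no shell is involved
    my $pid = open2(my $from_kid, my $to_kid, $^X, '-pe', '$_ = uc');
    print {$to_kid} $text;
    close $to_kid;                 # tell the child there is no more input
    my $reply = do { local $/; <$from_kid> };
    close $from_kid;
    waitpid $pid, 0;               # reap the child
    return $reply;
}

print shout("hello\n");   # HELLO
```

Writing everything first and reading afterwards is fine for small inputs like this one. With large amounts of data both sides can block on full pipe buffers, so a more careful read/write loop would be needed.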
Executing system() or exec() in the Right Way
The exec() and system() functions behave
identically in the way they spawn a program. For example, let's use
system(). Consider the following code:
system("echo","Hi");
Perl will use the first argument as a program to execute, find
/bin/echo along the search path, invoke it directly and pass the
Hi string as an argument.
Perl's system() is not the
system(3) call from the C library. Its arguments are interpreted as
follows. When there is a single argument to
system(), Perl first checks it for shell metacharacters (like
*, ?); if there are any, the Perl interpreter invokes a
real shell program (/bin/sh -c on Unix platforms). If there are none, the
argument is split into words and passed directly to the C-level execvp()
call, which is more efficient. If you pass a list of arguments to system(),
they are not checked for metacharacters and not split; they are passed to
execvp() as they are. That's a very nice optimization.
In other words, only if you do:
system "sh -c 'echo *'"
will the operating system actually exec() a copy of
/bin/sh to parse your command. But even then, since sh is
almost certainly already running somewhere, the system will notice that (via the
disk inode reference) and replace your virtual memory page table with one
pointing to the existing program code plus your data space, thus avoiding the
overhead of loading another copy of the shell.
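The difference between the string form and the list form described above can be observed by capturing the output of echo in both ways. This is a sketch only: capture() is a hypothetical helper built on Perl's pipe open, which decides whether to involve the shell in the same way as system() and exec() do.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command and return its output.
# One argument containing metacharacters goes through the shell;
# several arguments are executed directly, without a shell.
sub capture {
    my @cmd = @_;
    open my $fh, '-|', @cmd or die "Cannot run @cmd: $!";
    local $/;                  # slurp mode
    my $out = <$fh>;
    close $fh;
    return $out;
}

print capture('echo', '*');    # list form: echo receives a literal *
print capture('echo *');       # string form: the shell expands * first
```

In a directory that contains files, the second call prints their names, while the first call always prints a single asterisk.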
References
- The mod_perl site's URL: http://perl.apache.org/
- Apache-SubProcess: http://search.cpan.org/search?dist=Apache-SubProcess
- Storable: http://search.cpan.org/search?dist=Storable

