Sign In/My Account | View Cart  
advertisement


Listen Print

Improving mod_perl Sites' Performance: Part 6
by Stas Bekman | Pages: 1, 2

A Complete Fork Example

Now let's put all the bits of code together and show a well-written fork code that solves all the problems discussed so far. I will use an <Apache::Registry> script for this purpose:

  proper_fork1.pl
  ---------------
  use strict;
  use POSIX 'setsid';
  use Apache::SubProcess;
  
  my $r = shift;
  $r->send_http_header("text/plain");
  
  $SIG{CHLD} = 'IGNORE';
  defined (my $kid = fork) or die "Cannot fork: $!\n";
  if ($kid) {
    print "Parent $$ has finished, kid's PID: $kid\n";
  } else {
      $r->cleanup_for_exec(); # untie the socket
      chdir '/'                or die "Can't chdir to /: $!";
      open STDIN, '/dev/null'  or die "Can't read /dev/null: $!";
      open STDOUT, '>/dev/null'
          or die "Can't write to /dev/null: $!";
      open STDERR, '>/tmp/log' or die "Can't write to /tmp/log: $!";
      setsid or die "Can't start a new session: $!";
  
      select STDERR;
      local $| = 1;
      warn "started\n";
      # do something time-consuming
      sleep 1, warn "$_\n" for 1..20;
      warn "completed\n";
  
      CORE::exit(0); # terminate the process
  }

The script starts with the usual declaration of the strict mode, loading the POSIX and Apache::SubProcess modules and importing of the setsid() symbol from the POSIX package.

The HTTP header is sent next, with the Content-type of text/plain. The gets ready to ignore the child, to avoid zombies and the fork is called.

The program gets its personality split after fork and the if conditional evaluates to a true value for the parent process, and to a false value for the child process; the first block is executed by the parent and the second by the child.

The parent process announces his PID and the PID of the spawned process and finishes its block. If there will be any code outside, then it will be executed by the parent as well.

The child process starts its code by disconnecting from the socket, changing its current directory to /, opening the STDIN and STDOUT streams to /dev/null, which in effect closes them both before opening. In fact, in this example we don't need neither of these, so I could just close() both. The child process completes its disengagement from the parent process by opening the STDERR stream to /tmp/log, so it could write there, and creating a new session with help of setsid(). Now the child process has nothing to do with the parent process and can do the actual processing that it has to do. In our example, it performs a simple series of warnings, which are logged into /tmp/log:

      select STDERR;
      local $|=1;
      warn "started\n";
      # do something time-consuming
      sleep 1, warn "$_\n" for 1..20;
      warn "completed\n";

The localized setting of $|=1 is there, so we can see the output generated by the program immediately. In fact, it's not required when the output is generated by warn().

Finally, the child process terminates by calling:

      CORE::exit(0);

which makes sure that it won't get out of the block and run some code that it's not supposed to run.

This code example will allow you to verify that indeed the spawned child process has its own life, and its parent is free as well. Simply issue a request that will run this script, watch that the warnings are started to be written into the /tmp/log file and issue a complete server stop and start. If everything is correct, then the server will successfully restart and the long-term process will still be running. You will know that it's still running if the warnings will still be printed into the /tmp/log file. You may need to raise the number of warnings to do above 20, to make sure that you don't miss the end of the run.

If there are only five warnings to be printed, then you should see the following output in this file:

  started
  1
  2
  3
  4
  5
  completed

Starting a Long-Running External Program

But what happens if we cannot just run a Perl code from the spawned process and we have a compiled utility, i.e. a program written in C. Or we have a Perl program that cannot be easily converted into a module, and thus called as a function. Of course, in this case, we have to use system(), exec(), qx() or `` (back ticks) to start it.

When using any of these methods and when the Taint mode is enabled, we must at least add the following code to untaint the PATH environment variable and delete a few other insecure environment variables. This information can be found in the perlsec manpage.

  $ENV{'PATH'} = '/bin:/usr/bin';
  delete @ENV{'IFS', 'CDPATH', 'ENV', 'BASH_ENV'};

Now all we have to do is to reuse the code from the previous section.

First, we move the core program into the external.pl file, add the shebang first line so the program will be executed by Perl, tell the program to run under Taint mode (-T) and possibly enable the warnings mode (-w) and make it executable:

  external.pl
  -----------
  #!/usr/bin/perl -Tw
  
  open STDIN, '/dev/null'  or die "Can't read /dev/null: $!";
  open STDOUT, '>/dev/null'
      or die "Can't write to /dev/null: $!";
  open STDERR, '>/tmp/log' or die "Can't write to /tmp/log: $!";
  
  select STDERR;
  local $|=1;
  warn "started\n";
  # do something time-consuming
  sleep 1, warn "$_\n" for 1..20;
  warn "completed\n";

Now we replace the code that moved into the external program with exec() to call it:

  proper_fork_exec.pl
  -------------------
  use strict;
  use POSIX 'setsid';
  use Apache::SubProcess;
  
  $ENV{'PATH'} = '/bin:/usr/bin';
  delete @ENV{'IFS', 'CDPATH', 'ENV', 'BASH_ENV'};
  
  my $r = shift;
  $r->send_http_header("text/html");
  
  $SIG{CHLD} = 'IGNORE';
  
  defined (my $kid = fork) or die "Cannot fork: $!\n";
  if ($kid) {
    print "Parent has finished, kid's PID: $kid\n";
  } else {
      $r->cleanup_for_exec(); # untie the socket
      chdir '/'                or die "Can't chdir to /: $!";
      open STDIN, '/dev/null'  or die "Can't read /dev/null: $!";
      open STDOUT, '>/dev/null'
          or die "Can't write to /dev/null: $!";
      open STDERR, '>&STDOUT'  or die "Can't dup stdout: $!";
      setsid or die "Can't start a new session: $!";
  
      exec "/home/httpd/perl/external.pl" or die "Cannot execute exec: $!";
  }

Notice that exec() never returns unless it fails to start the process. Therefore, you shouldn't put any code after exec()--it will be not executed in the case of success. Use system() or back-ticks instead if you want to continue doing other things in the process. But then you probably will want to terminate the process after the program has finished. So you will have to write:

      system "/home/httpd/perl/external.pl" or die "Cannot execute system: $!";
      CORE::exit(0);

Another important nuance is that we have to close all STD* streams in the forked process, even if the called program does that.

If the external program is written in Perl, then you may pass complicated data structures to it using one of the methods to serialize Perl data and then to restore it. The Storable and FreezeThaw modules come handy. Let's say that we have program master.pl calling program slave.pl:

  master.pl
  ---------
  # we are within the C<mod_perl> code
  use Storable ();
  my @params = (foo => 1, bar => 2);
  my $params = Storable::freeze(\@params);
  exec "./slave.pl", $params or die "Cannot execute exec: $!";
  slave.pl
  --------
  #!/usr/bin/perl -w
  use Storable ();
  my @params = @ARGV ? @{ Storable::thaw(shift)||[] } : ();
  # do something

As you can see, master.pl serializes the @params data structure with Storable::freeze and passes it to slave.pl as a single argument. slave.pl restores it with Storable::thaw, by shifting the first value of the ARGV array if available. The FreezeThaw module does a similar thing.

Starting a Short-Running External Program

Sometimes you need to call an external program and you cannot continue before this program completes its run and optionally returns some result. In this case, the fork solution doesn't help. But we have a few ways to execute this program. First using system():

  system "perl -e 'print 5+5'"

We believe that you will never call the Perl interperter for doing this simple calculation, but for the sake of a simple example it's good enough.

The problem with this approach is that we cannot get the results printed to STDOUT, and that's where back-ticks or qx() help. If you use either:

  my $result = `perl -e 'print 5+5'`;

or:

  my $result = qx{perl -e 'print 5+5'};

the whole output of the external program will be stored in the $result variable.

Of course, you can use other solutions, such as opening a pipe (| to the program) if you need to submit many arguments and more evolved solutions provided by other Perl modules like IPC::Open2, which allows to open a process for both reading and writing.

Executing system() or exec() in the Right Way

The exec() and system() system calls behave identically in the way they spawn a program. For example, let's use system(). Consider the following code:

  system("echo","Hi");

Perl will use the first argument as a program to execute, find /bin/echo along the search path, invoke it directly and pass the Hi string as an argument.

Perl's system() is not the system(3) call (from the C-library). This is how the arguments to system() get interpreted. When there is a single argument to system(), it'll be checked for having shell metacharacters first (like *,?), and if there are any--Perl interpreter invokes a real shell program (/bin/sh -c on Unix platforms). If you pass a list of arguments to system(), then they will be not checked for metacharacters, but split into words if required and passed directly to the C-level execvp() system call, which is more efficient. That's a very nice optimization. In other words, only if you do:

  system "sh -c 'echo *'"

will the operating system actually exec() a copy of /bin/sh to parse your command. But even then, since sh is almost certainly already running somewhere, the system will notice that (via the disk inode reference) and replace your virtual memory page table with one pointing to the existing program code plus your data space, thus will not create this overhead.

References