Debugging and Profiling mod_perl Applications

Because of the added complexity of being inside of the Apache web server, debugging mod_perl applications is often not as straightforward as it is with regular Perl programs or CGIs. Is the problem with your code, Apache, a CPAN module you are using, or within mod_perl itself? How do you tell? Sometimes traditional debugging techniques will not give you enough information to find your problem.

Perhaps, instead, you're baffled as to why some code you just wrote is running so slow. You're probably asking yourself, "Isn't this mod_perl stuff supposed to improve my code's performance?" Don't worry, slow code happens even to the best of us. How do you profile your code to find the problem?

This article shows how to use the available CPAN modules to debug and profile your mod_perl applications.

Traditional Debugging Methods

The tried-and-true print statement is the debugger's best friend. Used wisely, this can be the easiest and fastest way of figuring out what is amiss in your program. Can't figure out why your sales tax subroutine is always off by 14 cents? Add several print statements just before, just after, and all around inside of that particular subroutine. Use them to show the value of key variables at each step in the process. You can direct the output straight onto the page in your browser, or if you prefer, into hidden HTML comments. Typically this is all that you need to spot your problems. It's flexible and easy to implement and understand.

Another common approach is to place die() and/or warn() statements as you trace through your code, isolating the problem. die() is especially useful if you do not want your program to continue executing, possibly because the errors will corrupt your otherwise valid testing data. The main benefit of using warn over a simple print statement is that the output goes instead to the appropriate Apache error_log. This keeps your debugging information out of the user interface and gives you the ability to log and spot errors long after they occurred for the user. Simply tail your error_log in another window and you can watch it all day long. If you're into that sort of thing.

For example, if you had some code like:

sub handler {
    my $r   =   shift;

    # Set content type
    $r->content_type( 'text/html' );

    my $req = Apache2::Request->new($r);
   
    # Compute sales tax if we are told to do so
    my $tax = 0;
    if( $req->param('compute_sales_tax') ) {
        my $tax = compute_sales_tax($r, $req->param('total_amount'));
    }

    # Code to display results to the browser....
}

... you might find a problem during testing. Your initial search leads you to believe that either the code never calls the compute_sales_tax() function or the function always returns zero. You can add some simple debugging statements:

sub handler {
    my $r   =   shift;
   
    # Set content type
    $r->content_type( 'text/html' );

    my $req = Apache2::Request->new($r);
   
    # Compute sales tax if we are told to do so
    my $tax = 0;

    # Debugging statements
    warn("Tax at start '$tax'");
    warn('compute_sales_tax ' . $req->param('compute_sales_tax') );

    if( $req->param('compute_sales_tax') ) {

        # Debugging
        warn("Tax before sub '$tax'");
        my $tax = compute_sales_tax($r, $req->param('total_amount'));
        warn("Tax after sub '$tax'");
    }

    warn("Tax after if '$tax'");

    # Code to display results to the browser....
}

Assuming that the page that directs the user to this code has set compute_sales_tax to a true value, you will see something similar to:

Tax at start '0' at line 5
compute_sales_tax 1 at line 6
Tax before sub '0' at line 12
Tax after sub '1.36' at line 14
Tax after if '0' at line 17

If you read through this, you see that compute_sales_tax() is indeed being called, otherwise you would not see the "Tax before/after" warn outputs. Directly after the subroutine call you can see that $tax holds a suitable value. However, after the if block, $tax reverts back to zero. Upon closer examination, you might find that the bug is the my before the call to compute_sales_tax(). This creates a locally scoped variable named $tax and does not assign it to the $tax variable in the outer block, which causes it to stay zero and makes it seem that compute_sales_tax() was never called.
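
The scoping mistake is easy to reproduce outside Apache. The following standalone sketch uses a made-up compute_sales_tax() (a hypothetical stand-in with an assumed 8 percent rate, not the real function from the handler above) to contrast the shadowing my with a plain assignment:

```perl
use strict;
use warnings;

# Hypothetical stand-in for compute_sales_tax(); the 8% rate is an
# assumption made only for this demonstration.
sub compute_sales_tax {
    my ($amount) = @_;
    return sprintf( '%.2f', $amount * 0.08 );
}

# Buggy version: "my" inside the block declares a second $tax
# that disappears when the block ends.
sub tax_with_shadowing {
    my ($amount) = @_;
    my $tax = 0;
    if ($amount) {
        my $tax = compute_sales_tax($amount);    # shadows the outer $tax
    }
    return $tax;
}

# Fixed version: assign to the $tax declared in the outer scope.
sub tax_without_shadowing {
    my ($amount) = @_;
    my $tax = 0;
    if ($amount) {
        $tax = compute_sales_tax($amount);
    }
    return $tax;
}

print 'shadowed: ', tax_with_shadowing(17),    "\n";
print 'fixed:    ', tax_without_shadowing(17), "\n";
```

Note that use warnings stays silent here: Perl only warns about a my that masks an earlier declaration in the same scope, and the inner block is a new scope.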

When to Use Apache::DB

Using print, die, and warn statements in your code will help you find and fix 99 percent of the bugs you may run across when building mod_perl applications. Too bad there is still that pesky remaining 1 percent that will make you tear your hair out in clumps and wish you had gone into selling insurance instead of programming. Luckily there is Apache::DB to help keep the glare off our collective heads at next year's Perl conference to a minimum.

Sometimes, despite all of your attempts to see what is going wrong, you will find yourself in a situation where:

  • Your code causes Apache to segfault and you can't for the life of you figure out why.
  • It appears that your code segfaults inside of a subroutine or method you are calling in a CPAN module you are using.
  • You have more debugging statements than actual code.

You could spend time hacking up your other installed modules, such as those from CPAN, with debugging statements--but this only means you will have to return later and remove all of it. You could take an easier route and debug your mod_perl application with a real source debugger.

Using the Perl debugger allows you to see directly into what is happening to your code and data. You can step through your code line by line, as Perl executes it. Because you are following the same flow, there is no chance that you are making any bad assumptions. You might even consider it WYSIWYG, albeit without a GUI.

Using Apache::DB

While Apache::DB works with both mod_perl 1.x and mod_perl 2.x, all of the examples in this article use mod_perl 2.0. Once you have installed Apache::DB from CPAN, using it is fairly simple. It does, however, require that you make a few Apache configuration changes. Assuming you have a mod_perl handler installed at /modperl/ on your system, your configuration needs to resemble this:

<Location /modperl>
  SetHandler perl-script
  PerlResponseHandler My::Modperl::Handler
  PerlFixupHandler +Apache::DB
</Location>

You also need to modify either the appropriate <Perl></Perl> section or your startup.pl file to include:

use APR::Pool ();
use Apache::DB ();
Apache::DB->init();

If you are working in a mod_perl 1.0 environment, the only change is that you should not include the use APR::Pool (); line.

Note that you must call Apache::DB->init(); prior to whatever code you are attempting to debug. To be safe, I always just put it as the very first thing in my startup.pl.

Once you have modified your configuration, the last step is to launch your Apache server with the -X command-line option. This option tells Apache to launch only one back-end process and to not fork into the background. If you don't use this option, you can't guarantee that your debugger has connected to the same Apache child as your browser.

With this Apache daemon tying up your command prompt, simply browse to your application. As you will see, the shell running httpd has been replaced with a Perl debugging session. This debugging session is tied directly to your application and browser. If you look at your browser it will appear to hang waiting for a response; this is because your Apache server is waiting for you to work with the debugger.

Perl's debugger is very similar to other debuggers you may have used. You can step through your code line by line, skip entire subroutines, set break points, and display and/or change the value of variables with it.

It might be useful to read through man perldebtut, an introductory tutorial on using the debugger. For a more complete reference to all of the available commands, see man perldebug. This list should be just enough to get you started:

Command       Description
p expression  Prints the value of an expression or variable, just like the print function in Perl.
x expression  Evaluates an expression and pretty-prints the result. Use it to make complex data structures readable.
s             Takes a single step, where a step is a single statement. If the next statement is a subroutine call, the debugger descends into the subroutine so you can step through each of its lines.
n             Goes to the next statement. If the next statement is a subroutine call, the debugger treats it as a single statement: the subroutine runs to completion without you stepping through it.
l line        Displays a particular line of source code.
M             Displays all loaded modules.
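
To try these commands without involving Apache, a small throwaway script is enough. The sketch below is hypothetical (the file name practice.pl and the order data are made up); running it as perl -d practice.pl gives you a subroutine call on which to compare s against n, and a nested structure worth dumping with x:

```perl
use strict;
use warnings;

# Made-up nested data, useful for comparing "p $order" with "x $order".
my $order = {
    customer => 'example',
    items    => [
        { name => 'widget', price => 4.50, qty => 2 },
        { name => 'gadget', price => 8.00, qty => 1 },
    ],
};

# At the call to order_total() below, "s" steps into this subroutine
# while "n" runs it to completion and stops on the following line.
sub order_total {
    my ($o) = @_;
    my $total = 0;
    $total += $_->{price} * $_->{qty} for @{ $o->{items} };
    return $total;
}

my $total = order_total($order);
print "Total: $total\n";
```

The same keystrokes work unchanged in the session that Apache::DB opens, so this is a cheap way to build the habit before debugging a live handler.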

Code Profiling with Apache::DProf

Apache::DProf provides the necessary hooks for you to get some coarse profiling information about your code. By coarse, I mean only information on a subroutine level. It will show you the number of times a subroutine is called along with duration information.

Essentially, Apache::DProf wraps Devel::DProf for you, making your life much easier. It is possible to use Devel::DProf by itself, but it assumes that you are running a normal Perl program from the command line and not in a persistent mod_perl environment. This isn't optimal, because while you can shoehorn Devel::DProf into working, you'll end up profiling all of the code used at server startup when you really only care about the runtime code.

Using Apache::DProf is relatively straightforward. All you need to do is include PerlModule Apache::DProf in your httpd.conf and restart your server.

As an example, here's a small application to profile. This code, while not all that useful, will help illustrate the major differences between these two profiling modules:

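package PerlTest;

use strict;
use warnings;

# Modules that provide content_type(), print(), and the OK constant
use Apache2::RequestRec ();
use Apache2::RequestIO ();
use Apache2::Const -compile => qw(OK);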

sub handler {
    my $r = shift;
  
    $r->content_type( 'text/plain' );
  
    handle_request($r);

    return( Apache2::Const::OK );
}

sub handle_request {
    my $r = shift;

    $r->print( "Handling request....\n" );

    cleanup_request($r);

}

sub cleanup_request {
    my $r = shift;

    $r->print( "Cleaning up request....\n" );

    sleep(5);     # Take some time in this section
}

1;

When you profile a module with Apache::DProf, it will create a directory named dprof/ in your server's logs/ directory. Under this directory will be subdirectories named after the PID of each Apache child your server has. This allows you to profile code over a long period of time on a production system to see where your real bottlenecks are. Often, faking a typical user session does not truly represent how your users interact with your application and having the real data is beneficial.

After your server has run for a while, you need to stop it and revert your configuration, removing the PerlModule Apache::DProf you just inserted. This is due to the fact that Apache::DProf does not write its data to disk until the server child ends.

Viewing the profiling data is exactly the same as with Devel::DProf. Choose a particular Apache child directory in $SERVER_ROOT/logs/dprof/ and run dprofpp on the corresponding tmon.out file.
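
If a server has run long enough to accumulate many child directories, a few lines of Perl can list the profiles to feed to dprofpp. This is only a sketch: it assumes the logs/dprof/<PID>/tmon.out layout described above, and the find_profiles() helper and the default path are names invented for this example.

```perl
use strict;
use warnings;
use File::Find ();

# Hypothetical helper: return every tmon.out found below $dprof_dir,
# largest file first, on the guess that bigger files hold more data.
sub find_profiles {
    my ($dprof_dir) = @_;
    my @found;
    File::Find::find(
        sub { push @found, $File::Find::name if $_ eq 'tmon.out' && -f $_ },
        $dprof_dir,
    );
    return sort { ( -s $b ) <=> ( -s $a ) } @found;
}

# Assumed default location; pass your own logs/dprof path as an argument.
my $dir = @ARGV ? $ARGV[0] : 'logs/dprof';
if ( -d $dir ) {
    print "$_\n" for find_profiles($dir);
}
```

Each path it prints can then be handed to dprofpp as described above.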

After beating on the PerlTest handler above for a while with ab, here are the results Apache::DProf gave me:

Total Elapsed Time = 1082.402 Seconds
  User+System Time =        0 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 0.00   0.004  0.001    687   0.0000 0.0000  RevSys::PerlTest::cleanup_request
 0.00       - -0.000      1        -      -  warnings::import
 0.00       - -0.000      1        -      -  APR::Pool::DESTROY
 0.00       - -0.000      1        -      -  strict::import
 0.00       - -0.000      1        -      -  Apache2::XSLoader::load
 0.00       - -0.000      3        -      -  Apache2::RequestIO::BEGIN
 0.00       - -0.000      2        -      -  RevSys::PerlTest::BEGIN
 0.00       - -0.003    687        -      -  Apache2::RequestRec::content_type
 0.00       - -0.006   1374        -      -  Apache2::RequestRec::print
 0.00       - -0.012    687        -      -  RevSys::PerlTest::handle_request
 0.00       - -0.024    687        -      -  RevSys::PerlTest::handler

As expected, cleanup_request() shows the most time used per call. The report also shows stats for the other function calls you would expect as well as the ones that happen behind the scenes.

Code Profiling with Apache::SmallProf

While Apache::DProf will show you which subroutines use the most system resources, sometimes that is not enough information. Apache::SmallProf gives you fine-grained details in a line-by-line profile of your code.

Setup is similar to that of the two previous modules. Add the following code to a <Perl> section or your startup.pl file:

use APR::Pool ();
use Apache::DB ();
Apache::DB->init();

You also need to add PerlFixupHandler Apache::SmallProf into the <Directory> or <Location> block that refers to your mod_perl code.

Like Apache::DProf, Apache::SmallProf writes all of the profiling data into $SERVER_ROOT/logs/smallprof/. One interesting difference between Apache::DProf and Apache::SmallProf is that the latter writes a profile for each module in use. This is helpful because you already know which subroutines are slow and which packages they are in, from your first round of profiling with Apache::DProf. By focusing on those modules you can find your troubled code much faster.

Viewing Apache::SmallProf data is, however, a little different from Apache::DProf. A module profile looks like this:

<number> <wall time> <cpu time> <line number> <source line>

<number> is the number of times this particular line was executed, <wall time> is the actual time passed, and <cpu time> is the amount of time the CPU spent working on that line. The remaining two pieces of data are the line number in the file and the actual source on that line.

You can just open up the profiles generated by Apache::SmallProf and look at the results. However, this doesn't get to the heart of the matter very quickly. Sorting the profile by the amount of time spent on each line gets you where you want to go:

$ sort -nrk 2 logs/smallprof/MyHandler.pm.prof | more

This command sorts the profile for MyHandler.pm by the wall time of each line. If you use this same sort on the output from Apache::SmallProf on the example code, you will see something similar to this:

# sort -nrk 2 PerlTest.pm.prof | more
    1 5.000785 0.000000         29:    sleep( 5 );
    1 0.008177 0.000000         13:    return( Apache2::Const::OK );
    1 0.007431 0.010000         21:    cleanup_request( $r );
    3 0.001343 0.000000          4:use Apache2::RequestIO;
    1 0.000176 0.000000         33:1;
    3 0.000164 0.000000          3:use Apache2::RequestRec;
    1 0.000093 0.000000         19:    $r->print( "Handling request......\n" );
    1 0.000067 0.000000         11:    handle_request( $r );
    1 0.000058 0.000000          9:    $r->content_type( 'text/plain' );
    1 0.000058 0.000000         28:    $r->print( "Cleaning up request......\n" );

As you can see, Apache::SmallProf has zeroed right in on our sleep() call as the source of our performance problems.
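
The same ranking can be done in Perl when you want to filter as well as sort. The sketch below assumes each profile line follows the <number> <wall time> <cpu time> <line number>:<source> layout shown above; the top_lines() helper is hypothetical, and the sample rows are copied from the output above rather than read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper: parse profile lines and return the $limit
# slowest ones (by wall time) as hash references.
sub top_lines {
    my ( $limit, @lines ) = @_;
    my @rows;
    for my $line (@lines) {
        # count, wall time, cpu time, line number, then the source text
        next
            unless $line =~ /^\s*(\d+)\s+([\d.]+)\s+([\d.]+)\s+(\d+):(.*)$/;
        push @rows,
            { count => $1, wall => $2, cpu => $3, line => $4, source => $5 };
    }
    my @sorted = sort { $b->{wall} <=> $a->{wall} } @rows;
    $limit = @sorted if $limit > @sorted;
    return @sorted[ 0 .. $limit - 1 ];
}

# Three rows taken from the sample profile, deliberately out of order.
my @sample = (
    '    1 0.000093 0.000000         19:    $r->print( "Handling request......\n" );',
    '    1 5.000785 0.000000         29:    sleep( 5 );',
    '    1 0.007431 0.010000         21:    cleanup_request( $r );',
);

for my $row ( top_lines( 2, @sample ) ) {
    printf "%10.6f s  line %-3d %s\n",
        $row->{wall}, $row->{line}, $row->{source};
}
```

Keeping the parsed fields in a hash makes it simple to rank by CPU time or execution count instead, by changing the key used in the sort block.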

Conclusion

Hopefully, this article has given you enough of an introduction to these modules that you can begin using them in your development efforts. The next time you face a seemingly unsolvable bug or performance issue, you have a few more weapons in your arsenal.

If you have trouble getting any of these three modules to work, please don't hesitate to contact me directly. If you need mod_perl help in general, I strongly suggest you join the mod_perl mailing list. You can often get an answer to your mod_perl question in a few hours, if not minutes.

Integrating mod_perl with Apache 2.1 Authentication

Scratching Your Own Itch

Some time ago I became intrigued with Digest authentication, which uses the same general mechanism as the familiar Basic authentication scheme but offers significantly more password security without requiring an SSL connection. At the time it was really just an academic interest—while some browsers supported Digest authentication, many of the more popular ones did not. Furthermore, even though the standard Apache distribution came with modules to support both Basic and Digest authentication, Apache (and thus mod_perl) only offered an API for interacting with Basic authentication. If you wanted to use Digest authentication, flat files were the only password storage medium available. With both of these restrictions, it seemed impractical to deploy Digest authentication in all but the most limited circumstances.

Fast forward two years. Practically all mainstream browsers now support Digest authentication, and my interest spawned what is now Apache::AuthDigest, a module that gives mod_perl 1.0 developers an API for Digest authentication that is very similar to the Basic API that mod_perl natively supports. The one lingering problem is probably not surprising—Microsoft Internet Explorer. As it turns out, using the Digest scheme with MSIE requires a fully RFC-compliant Digest implementation, and Apache::AuthDigest was patterned after Apache 1.3's mod_digest.c, which is sufficient for most browsers but not MSIE.

In my mind, opening up Digest authentication through mod_perl still needed work to be truly useful, namely full RFC compliance to support MSIE. Wading through RFCs is not how I like to spend my spare time, so I started searching for a shortcut. Because Apache 2.0 did away with mod_digest.c and replaced it with the fully compliant mod_auth_digest.c, I was convinced that there was something in Apache 2.0 I could use to make my life easier. In Apache 2.1, the development version of the next generation Apache server, I found what I was looking for.

In this article, we're going to examine a mod_perl module that provides Perl support for the new authentication provider hooks in Apache 2.1. These authentication providers make writing Basic authentication handlers easier than it has been in the past. At the same time, the new provider mechanism opens up Digest authentication to the masses, making the Digest scheme a real possibility for filling your dynamic authentication needs. While the material is somewhat dense, the techniques we will be looking at are some of the most interesting and powerful in the mod_perl arsenal. Buckle up.

To follow along with the code in this article, you will need at least mod_perl version 1.99_10, which is currently only available from CVS. You will also need Apache 2.1, which is also only available from CVS. Instructions for obtaining the sources for both can be found here. When compiling Apache, keep in mind that the code presented here only works under the prefork MPM — making it thread-safe is the next step in the adventure.

Authentication Basics

Because there is lots of material to cover, we'll skip over the requisite introductory discussion of HTTP authentication, the Apache request cycle, and other materials that are probably already familiar, and go right to the mod_perl authentication API. In both mod_perl 1.0 and mod_perl 2.0, the PerlAuthenHandler represents Perl access to the Apache authentication phase, where incoming user credentials are traditionally matched to those stored within the application. A simple PerlAuthenHandler in mod_perl 2.0 might look like the following.

package My::BasicHandler;

use Apache::RequestRec ();
use Apache::Access ();

use Apache::Const -compile => qw(OK DECLINED HTTP_UNAUTHORIZED);

use strict;

sub handler {
  my $r = shift;

  # get the client-supplied credentials
  my ($status, $password) = $r->get_basic_auth_pw;

  # only continue if Apache says everything is OK
  return $status unless $status == Apache::OK;

  # user1/basic1 is ok
  if ($r->user eq 'user1' && $password eq 'basic1') {
    return Apache::OK;
  }

  # user2 is denied outright
  if ($r->user eq 'user2') {
    $r->note_basic_auth_failure;
    return Apache::HTTP_UNAUTHORIZED;
  }

  # all others are passed along to the Apache default
  # handler, which reads from the AuthUserFile
  return Apache::DECLINED;
}

1;

Although simple and impractical, this handler illustrates the API nicely. The process begins with a call to get_basic_auth_pw(), which does a few things behind the scenes. If a suitable Basic Authorization header is found, get_basic_auth_pw() will parse and decode the header, populate the user slot of the request record, and return OK along with the user-supplied password in clear text. Any value other than OK should be immediately propagated back to Apache, which effectively terminates the current request.
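
For a feel of what "parse and decode" means here, the following standalone sketch decodes a Basic Authorization header value by hand. It is not mod_perl's implementation; parse_basic_header() is a hypothetical helper, and it relies only on the header format from RFC 2617 (the word Basic followed by the Base64 encoding of user:password).

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64 encode_base64);

# Hypothetical helper: return (user, password) from an Authorization
# header value, or an empty list if it is not a usable Basic header.
sub parse_basic_header {
    my ($header) = @_;
    return unless defined $header && $header =~ /^Basic\s+(\S+)\s*$/i;
    my $decoded = decode_base64($1);

    # Split on the first colon only; the password may contain colons.
    my ( $user, $password ) = split /:/, $decoded, 2;
    return unless defined $user && length $user && defined $password;
    return ( $user, $password );
}

# Build the header a browser would send for user1/basic1.
my $header = 'Basic ' . encode_base64( 'user1:basic1', '' );
my ( $user, $password ) = parse_basic_header($header);
print "user=$user password=$password\n";
```

Because Base64 is an encoding rather than encryption, anyone who can see the header can recover the password, which is why the handler receives it in clear text.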

The next step in the process is where the real authentication logic resides. Our handler is responsible for digging out the username from $r->user() and applying some criteria for determining whether the user-supplied credentials are acceptable. If they are, the handler simply returns OK and the request is allowed to proceed. If they are not, the handler has a decision to make: either call note_basic_auth_failure() and return HTTP_UNAUTHORIZED (which is the same as the old AUTH_REQUIRED) to indicate failure, or return DECLINED to pass authentication control to the next authentication handler.

For the most part, the mod_perl API is identical to the API Apache offers to C module developers. The benefit that mod_perl adds is the ability to easily extend authentication beyond Apache's default flat-file mechanism to the areas where Perl support is strong, such as relational databases or LDAP. However, despite the versatility and strength that programming the authentication phase offers, I never liked the look and feel of the API. While in some respects the process is dictated by the nuances of RFC 2617 and the HTTP protocol itself, the interface always struck me as somewhat inconsistent and difficult for new users to grasp. Additionally, as already mentioned, the API covers only Basic authentication, which is a real drawback as more and more browsers support the Digest scheme.

Apparently I wasn't alone in some of these feelings. Apache 2.1 has taken steps to improve the overall process for module developers. The result is a new, streamlined API that focuses on a new concept: authentication providers.

Authentication Providers in Apache 2.1

While in Apache 2.0 module writers were responsible for a large portion of the authentication logic—calling routines to parse and set authentication headers, digging out the user from the request record, and so on — the new authentication mechanism in Apache 2.1 delegates all HTTP and RFC logic out to two standard modules. mod_auth_basic handles Basic authentication and is enabled in the default Apache build. The standard mod_auth_digest, not enabled by default, handles the very complex world of Digest authentication. Regardless of the authentication scheme you choose to support, these modules are responsible for the details of parsing and interpreting the incoming request headers, as well as generating properly formatted response headers.

Of course, managing authentication on an HTTP level is only part of the story. What mod_auth_basic and mod_auth_digest leave behind is the job of digging out the server-side credentials and matching them to their incoming counterpart. Enter authentication providers.

Authentication providers are modules that supply server-side credential services to mod_auth_basic or mod_auth_digest. For instance, the default mod_authn_file digs the username and password out of the flat file specified by the AuthUserFile directive, similar to the default mechanism in Apache 1.3 and 2.0. An Apache 2.1 configuration that explicitly provides the same flat file behavior as Apache 2.0 would look similar to the following.

<Location /protected>
  Require valid-user
  AuthType Basic
  AuthName realm1

  AuthBasicProvider file

  AuthUserFile realm1
</Location>

The new part of this configuration is the AuthBasicProvider directive, which is implemented by mod_auth_basic and used to specify the provider responsible for managing server-side credentials. There is also a corresponding AuthDigestProvider directive if you have mod_auth_digest installed.

While it could seem as though Apache 2.1 is merely adding another directive to achieve essentially the same results, the shift to authentication providers adds significant value for module developers: a new API that is far simpler than before. Skipping ahead to the punch line, programming with the new Perl API for Basic authentication, which follows the Apache API almost exactly, would look similar to the following.

package My::BasicProvider;

use Apache::Const -compile => qw(OK DECLINED HTTP_UNAUTHORIZED);

use strict;

sub handler {
  my ($r, $user, $password) = @_;

  # user1/basic1 is ok
  if ($user eq 'user1' && $password eq 'basic1') {
    return Apache::OK;
  }

  # user2 is denied outright
  if ($user eq 'user2') {
    return Apache::HTTP_UNAUTHORIZED;
  }

  # all others are passed along to the next provider
  return Apache::DECLINED;
}

1;

As you can see, not only are the incoming username and password supplied in the argument list, removing the need for get_basic_auth_pw() and its associated checks, but gone is the need to call note_basic_auth_failure() before returning HTTP_UNAUTHORIZED. In essence, all that module writers need to be concerned with is validating the user credentials against whatever back-end datastore they choose. All in all, the API is a definite improvement. To add even more excitement, the API for Digest authentication looks almost exactly the same (but more on that later).
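
To show how little a provider has to do, here is a sketch that swaps the hard-coded checks for a lookup table. It is written to run outside Apache, so the three constants are defined locally with the numeric values Apache uses (0, -1, and 401) instead of being imported from Apache::Const; the %PASSWORDS table and the package name are hypothetical.

```perl
package My::TableProvider;    # hypothetical name for this sketch

use strict;
use warnings;

# Local stand-ins so the sketch runs without mod_perl loaded.
use constant {
    OK                => 0,
    DECLINED          => -1,
    HTTP_UNAUTHORIZED => 401,
};

# Made-up credential store; a real provider would query a database or
# directory, and would not keep clear-text passwords in the source.
my %PASSWORDS = ( user1 => 'basic1' );

sub handler {
    my ( $r, $user, $password ) = @_;

    # Unknown users are passed along to the next provider.
    return DECLINED unless exists $PASSWORDS{$user};

    # Known users either match or are rejected outright.
    return $password eq $PASSWORDS{$user} ? OK : HTTP_UNAUTHORIZED;
}

# $r is unused in this sketch, so undef stands in for the request.
my $status = handler( undef, 'user1', 'basic1' );
print "status for user1: $status\n";

1;
```

The shape of handler() is the same as in My::BasicProvider; only the source of the credentials changes.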

Because the new authentication provider approach represents a significant change in the way Apache handles authentication internally, it is not part of the stable Apache 2.0 tree and is instead being tested in the development tree. Unfortunately, until the provider mechanism is backported to Apache 2.0, or until there is an official Apache 2.2 release, it is unlikely that authentication providers will be supported by core mod_perl 2.0. However, this does not mean that mod_perl developers are out of luck. By coupling mod_perl's native directive handler API with a bit of XS, we can open up the new Apache provider API to Perl with ease. The Apache::AuthenHook module does exactly that.

Introducing Apache::AuthenHook

Over in the Apache C API, authentication providers have a few jobs to do: they must register themselves by name as a provider while supplying a callback interface for the schemes they wish to support (Basic, Digest, or both). In order to open up the provider API to Perl modules our gateway module Apache::AuthenHook will need to accomplish these tasks as well. Both of these are accomplished at the same time through a call to the official Apache API function ap_register_provider.

Usually, mod_perl provides direct access to the Apache C API for us. For instance, a Perl call to $r->get_basic_auth_pw() is proxied off to ap_get_basic_auth_pw. In this case, however, ap_register_provider only exists in Apache 2.1 and, thus, is not supported by mod_perl 2.0. Therefore, part of what Apache::AuthenHook needs to do is open up this API to Perl. One of the great things about mod_perl is the ease with which it allows itself to be extended even beyond its own core functionality. Opening up the Apache API past what mod_perl allows is relatively easy with a dash of XS.

Our module opens with AuthenHook.xs, which is used to expose ap_register_provider through the Perl function Apache::AuthenHook::register_provider().

#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"

#include "mod_perl.h"
#include "ap_provider.h"
#include "mod_auth.h"

...

static const authn_provider authn_AAH_provider =
{
  &check_password,
  &get_realm_hash,
};

MODULE = Apache::AuthenHook    PACKAGE = Apache::AuthenHook

PROTOTYPES: DISABLE

void
register_provider(provider)
  SV *provider

  CODE:

    ap_register_provider(modperl_global_get_pconf(),
                         AUTHN_PROVIDER_GROUP,
                         SvPV_nolen(newSVsv(provider)), "0",
                         &authn_AAH_provider);

Let's start at the top. Any XS module you write will include the first three header files, while any mod_perl XS extension will require at least #include "mod_perl.h". The remaining two included header files are specific to what we are trying to accomplish—ap_provider.h defines the ap_register_provider function, while mod_auth.h defines the AUTHN_PROVIDER_GROUP constant we will be using, as well as the authn_provider struct that holds our callbacks.

Skipping down a bit, we can see our implementation of Apache::AuthenHook::register_provider(). The MODULE and PACKAGE declarations place register_provider() into the Apache::AuthenHook package. Following that is the definition of the register_provider() function itself.

As you can see, register_provider() accepts a single argument, a Perl scalar representing the name of the provider to register, making its usage something akin to the following.

Apache::AuthenHook::register_provider('My::BasicProvider');

The user-supplied name is then used in the call to ap_register_provider from the Apache C API to register My::BasicProvider as an authentication provider.

The twist in the process is that our call to ap_register_provider registers Apache::AuthenHook's callbacks (the check_password and get_realm_hash routines not shown here) for each Perl provider. In essence, this means that Apache::AuthenHook will be acting as a go-between for Perl providers. Much in the same way that mod_perl proper is called by Apache for each phase and dispatches to different Perl*Handlers, Apache::AuthenHook will be called by Apache's authentication modules and dispatch to the appropriate Perl provider at runtime.

If this boggles your mind a bit, not to worry, it is only being presented to give you a feel for the bigger picture and show how easy it is to open up closed parts of the Apache C API with mod_perl and just a few lines of XS. However, the fun part of Apache::AuthenHook (and the part that you are more likely to use in your own mod_perl modules) is handled over in Perl space.
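
The get_realm_hash callback hints at the main difference on the Digest side: instead of comparing passwords, a Digest provider supplies a stored hash. As a sketch of what that value is, the code below computes the MD5 of user:realm:password, the quantity RFC 2617 calls H(A1) and the htdigest utility writes to its files. The realm_hash() helper and the sample credentials are made up for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: the hex MD5 of "user:realm:password".
sub realm_hash {
    my ( $user, $realm, $password ) = @_;
    return md5_hex( join ':', $user, $realm, $password );
}

# Made-up credentials; the output has the layout of an htdigest file line.
my $hash = realm_hash( 'user1', 'realm1', 'digest1' );
print "user1:realm1:$hash\n";
```

Because the hash covers the realm as well as the password, the same password yields a different stored value in every realm.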

Setting the Stage

Now that we have the ability to call ap_register_provider, we need to link that into the Apache configuration process somehow. What we do not want to do is replace current PerlAuthenHandler functionality, since that directive is for inserting authentication handler logic in place of Apache's defaults. In our case, we need the default modules to run so they can call our Perl providers. Instead, we want to make it possible for Perl modules to register themselves as authentication providers. While we could have Perl providers call our new register_provider() function directly, Apache::AuthenHook chose to make the process transparent, using mod_perl's directive handler API to call register_provider() silently as httpd.conf is parsed.

Apache::AuthenHook makes sneaky use of directive handlers to extend the default Apache AuthBasicProvider and AuthDigestProvider directives so they register Perl providers on-the-fly. The net result is that Perl providers will be fully registered and configured via standard Apache directives, similar to the following.

AuthBasicProvider My::BasicProvider file

At configuration time, Apache::AuthenHook will intercept AuthBasicProvider and register My::BasicProvider. At request time, mod_auth_basic will attempt to authenticate the user, first using My::BasicProvider, followed by the default file provider if My::BasicProvider declines the request.
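
The request-time fallthrough can be pictured with a small simulation. This is not how mod_auth_basic is implemented; it is a plain-Perl sketch in which each provider is a code reference, the constants are local stand-ins for the Apache ones, and authenticate() is a hypothetical dispatcher that tries providers in the order listed until one stops declining.

```perl
use strict;
use warnings;

use constant { OK => 0, DECLINED => -1, HTTP_UNAUTHORIZED => 401 };

# Stand-in for My::BasicProvider: accepts user1, rejects user2,
# and declines everyone else.
my $perl_provider = sub {
    my ( $user, $password ) = @_;
    return OK if $user eq 'user1' && $password eq 'basic1';
    return HTTP_UNAUTHORIZED if $user eq 'user2';
    return DECLINED;
};

# Stand-in for the file provider, with a made-up user list.
my %file_users = ( user3 => 'file3' );
my $file_provider = sub {
    my ( $user, $password ) = @_;
    return DECLINED unless exists $file_users{$user};
    return $password eq $file_users{$user} ? OK : HTTP_UNAUTHORIZED;
};

# Hypothetical dispatcher: the first answer other than DECLINED wins;
# if every provider declines, the request is not authenticated.
sub authenticate {
    my ( $user, $password, @providers ) = @_;
    for my $provider (@providers) {
        my $status = $provider->( $user, $password );
        return $status unless $status == DECLINED;
    }
    return HTTP_UNAUTHORIZED;
}

my @chain = ( $perl_provider, $file_provider );
printf "%s => %d\n", $_->[0], authenticate( @$_, @chain )
    for [ user1 => 'basic1' ], [ user2 => 'anything' ], [ user3 => 'file3' ];
```

In this model user3 only succeeds because the first provider declines, which mirrors the ordering expressed by the AuthBasicProvider line above.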

A nice side effect to this is that through our implementation we will be giving mod_perl developers a feature they have never had before—the ability to interlace Perl handlers and C handlers within the same phase.

AuthDigestProvider My::DigestProvider file My::OtherDigestProvider

Exciting, no? Let's take a look at AuthenHook.pm and see how the directive handler API works in mod_perl 2.0.

Directive Handlers with mod_perl

Directive handlers are a very powerful but little used feature of mod_perl. For the most part, their lack of use probably stems from the complex and intimidating API in mod_perl 1.0. However, in mod_perl 2.0, the API is much simpler and should lend itself to adoption by a larger audience.

The directive handler API allows mod_perl modules to define their own custom configuration directives that are understood by Apache. For example, enabling modules to make use of configuration variables like:

Foo "bar"

in httpd.conf requires only a few relatively simple settings you can code directly in Perl.

While using directive handlers simply to replace PerlSetVar behavior might seem a bit flashy, the techniques used by Apache::AuthenHook are some of the most powerful mod_perl has to offer.

As previously mentioned, we will be extending the new AuthBasicProvider and AuthDigestProvider directives to apply to Perl providers as well, silently registering each provider as the directive itself is parsed. To do this, we redefine these core directives, manipulate their configuration data, then disappear and allow Apache to handle the directives as if we were never there.

The code responsible for this is in AuthenHook.pm.

package Apache::AuthenHook;

use 5.008;

use DynaLoader ();

use mod_perl 1.99_10;     # DECLINE_CMD and $parms->info support
use Apache::CmdParms ();  # $parms->info

use Apache::Const -compile => qw(OK DECLINE_CMD OR_AUTHCFG RAW_ARGS);

use strict;

our @ISA     = qw(DynaLoader);
our $VERSION = '2.00_01';

__PACKAGE__->bootstrap($VERSION);

our @APACHE_MODULE_COMMANDS = (
  { name         => 'AuthDigestProvider',
    errmsg       => 'specify the auth providers for a directory or location',
    args_how     => Apache::RAW_ARGS,
    req_override => Apache::OR_AUTHCFG,
    cmd_data     => 'digest' },

  { name         => 'AuthBasicProvider',
    errmsg       => 'specify the auth providers for a directory or location',
    args_how     => Apache::RAW_ARGS,
    req_override => Apache::OR_AUTHCFG,
    func         => 'AuthDigestProvider',
    cmd_data     => 'basic' },
);

At the top of our module we import a few required items, some that are new and some that should already be familiar. DynaLoader and the bootstrap() method are required to pull in the register_provider() function from our XS implementation and, unlike with mod_perl 1.0, have nothing to do with the actual directive handler implementation. The Apache::CmdParms class provides the info() method we will be illustrating shortly, while Apache::Const gives us access to the constants we will need throughout the process.

The @APACHE_MODULE_COMMANDS array is where the real interface for directive handlers begins. @APACHE_MODULE_COMMANDS holds an array of hashes, each of which defines the behavior of an Apache directive. Let's focus on the first directive our handler implements, AuthDigestProvider, forgetting for the moment that mod_auth_digest also defines this directive.

While it should be obvious that the name key specifies the name of the directive, it is not so obvious that it also specifies the default Perl subroutine to call when Apache encounters AuthDigestProvider while parsing httpd.conf. Later on, we will need to implement the AuthDigestProvider() subroutine, which will contain the logic for all the activities we want to perform when Apache sees the AuthDigestProvider directive.

The args_how and req_override fields tell Apache specifically how the directive is supposed to behave in the configuration. req_override defines how our directive will interact with the core AllowOverride directive, in our case allowing AuthDigestProvider in .htaccess files only on directories governed by AllowOverride AuthConfig. Similarly, args_how defines how Apache should interact with our AuthDigestProvider() subroutine when it sees our directive in httpd.conf. In the case of RAW_ARGS, it means that Apache will pass our callback whatever follows the directive as a single string. Other possible values for both of these keys can be found in the documentation pointed to at the end of this article.

The final important key in our first hash is the cmd_data key, in which we can store a string of our choosing. This will become important in a moment.

The second hash in @APACHE_MODULE_COMMANDS defines the behavior of the AuthBasicProvider directive, which for the most part is identical to AuthDigestProvider. The differences are important, however, and begin with the addition of the func key. Although the default Perl subroutine callback for handling directives is the same as the name of the directive, the func key allows us to point to a different subroutine instead. Here we will be reusing AuthDigestProvider() to process both directives. How will we know which directive is actually being parsed? The cmd_data slot will contain digest when processing AuthDigestProvider and basic when processing AuthBasicProvider.
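To make the relationship between name, func, and cmd_data concrete, here is a small pure-Perl simulation of the dispatch just described. It is only a sketch and does not touch the real Apache or mod_perl APIs: the dispatch() helper, the %callbacks table, and the plain string standing in for $parms->info are all hypothetical.

```perl
use strict;
use warnings;

# Same shape as @APACHE_MODULE_COMMANDS, trimmed to the keys that matter here.
my @commands = (
  { name => 'AuthDigestProvider', cmd_data => 'digest' },
  { name => 'AuthBasicProvider',  cmd_data => 'basic',
    func => 'AuthDigestProvider' },
);

# One callback serves both directives; $info plays the role of $parms->info.
my %callbacks = (
  AuthDigestProvider => sub {
    my ($info, $args) = @_;
    return "$info: " . join(',', split ' ', $args);
  },
);

# Hypothetical stand-in for Apache's directive lookup:
# use 'func' when present, otherwise fall back to 'name'.
sub dispatch {
  my ($directive, $args) = @_;

  my ($cmd) = grep { $_->{name} eq $directive } @commands;
  return unless $cmd;   # not one of our directives

  my $callback = $callbacks{ $cmd->{func} || $cmd->{name} };
  return $callback->($cmd->{cmd_data}, $args);
}

print dispatch('AuthBasicProvider',  'My::BasicProvider file'), "\n";
print dispatch('AuthDigestProvider', 'My::DigestProvider file'), "\n";
```

Both directives reach the same callback, and only the cmd_data string tells them apart.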

At this point, we have defined what our directives will look like and how they will interact with Apache in httpd.conf. What we have not shown is the logic that sits behind our directives. As we mentioned, both of our directives will be calling the Perl subroutine AuthDigestProvider, defined in AuthenHook.pm.

sub AuthDigestProvider {
  my ($cfg, $parms, $args) = @_;

  my @providers = split ' ', $args;

  foreach my $provider (@providers) {

    # if the provider looks like a Perl handler...
    if ($provider =~ m/::/) {

      # save the config for later
      push @{$cfg->{$parms->info}}, $provider;

      # and register the handler as an authentication provider
      register_provider($provider);
    }
  }

  # pass the directive back to Apache "unprocessed"
  return Apache::DECLINE_CMD;
}

The first argument passed to our directive handler callback, $cfg, represents the configuration object for our module, which we can populate with whatever data we choose and access again at request time. The second argument is an Apache::CmdParms object, which we will use to dig out the string we specified in the cmd_data slot of our configuration hash using the info() method.

While the first two arguments are standard and will be there for any directive handler you write, the third argument can vary somewhat. Because we specified RAW_ARGS as our args_how setting in the configuration hash, $args contains everything on the httpd.conf line following our directive. The standard Auth*Provider directives we are overriding can take more than one argument, so we split on whitespace and break apart the configuration into an array of providers, each of which we then process separately.

Each provider is examined using a cursory check to see whether the specified provider is a Perl provider. If the provider meets our criteria, we call the register_provider() function defined in AuthenHook.xs and keep track of the provider by storing it in our $cfg configuration object.
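The split-and-check step can be tried outside of Apache. The following sketch isolates it in a hypothetical perl_providers() helper (not part of Apache::AuthenHook) that takes the raw directive string and returns only the entries that look like Perl packages.

```perl
use strict;
use warnings;

# Return only the entries of a RAW_ARGS string that look like Perl packages.
sub perl_providers {
  my ($args) = @_;
  return grep { m/::/ } split ' ', $args;
}

my @found = perl_providers('My::DigestProvider file My::OtherDigestProvider');
print "@found\n";   # prints "My::DigestProvider My::OtherDigestProvider"
```

Note that the check really is cursory: a provider living in a top-level package with no :: in its name would not be recognized as a Perl provider by this test.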

The final part of our callback brings the entire process together. The constant DECLINE_CMD has special meaning to Apache. Just as you might return DECLINED from a PerlTransHandler to trick Apache into thinking no translation took place, returning DECLINE_CMD from a directive handler tricks Apache into thinking that the directive was unprocessed. So, after our AuthDigestProvider() subroutine runs, Apache will continue along until it finds mod_auth_digest, which will then process the directive as though we were never there.

The one final piece of AuthenHook.pm that we need to discuss is directive merging. In order to deal properly with situations when directives meet, such as when AuthBasicProvider is specified in both an .htaccess file as well as the <Location> that governs the URI, we need to define DIR_CREATE() and DIR_MERGE() subroutines.

DIR_CREATE() is called at various times in the configuration process, including when <Location> and related directives are parsed at configuration time, as well as whenever an .htaccess file enters the request cycle. This is where we create the $cfg object our callback uses to store configuration data. While it is not required, DIR_CREATE() is a good place to initialize fields in the object as well, which prevents accidentally dereferencing nonexistent references.

sub DIR_CREATE {
  return bless { digest => [],
                 basic  => [], }, shift;
}
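To see why initializing the fields helps, consider this standalone sketch. The providers_for() helper is hypothetical and simply reads back a list stored in a configuration hash, the way request-time code might.

```perl
use strict;
use warnings;

# Hypothetical request-time helper: list the providers stored for a scheme.
sub providers_for {
  my ($cfg, $scheme) = @_;
  my @list = @{ $cfg->{$scheme} };   # dies under strict if the field is undef
  return @list;
}

my $initialized = { digest => [], basic => [] };   # same shape DIR_CREATE returns
my $empty       = {};                              # no fields set up

my @ok = providers_for($initialized, 'basic');     # empty list, no error

my @bad = eval { providers_for($empty, 'basic') };
print "uninitialized field failed: $@" if $@;
```

With the empty array references in place, code that reads the configuration never has to check whether a field exists first.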

DIR_MERGE, generally called at request time, defines how we handle places where directives collide. The following code is standard for allowing the current configuration (%$base) to inherit only missing parameters from higher configurations (%$add), which is the behavior you are most likely to want.

sub DIR_MERGE {
  my ($base, $add) = @_;

  my %new = (%$add, %$base);

  return bless \%new, ref($base);
}

1;

Thus ends AuthenHook.pm.
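If the merge expression in DIR_MERGE looks too terse, this sketch runs the same expression on plain hashes. The merge() name and the sample provider names are made up for illustration.

```perl
use strict;
use warnings;

# Same merge expression as DIR_MERGE above, on plain hashes.
sub merge {
  my ($base, $add) = @_;
  my %new = (%$add, %$base);   # keys from %$base are assigned last, so they win
  return \%new;
}

my $base = { basic  => ['My::BasicProvider'] };
my $add  = { basic  => ['My::OtherProvider'],
             digest => ['My::DigestProvider'] };

my $merged = merge($base, $add);
print "basic:  @{ $merged->{basic} }\n";    # My::BasicProvider (kept from %$base)
print "digest: @{ $merged->{digest} }\n";   # My::DigestProvider (filled in from %$add)
```

When a hash is built from a list, the last value seen for a duplicate key wins, so %$add only supplies keys that %$base lacks.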

The final result is pretty amazing. By secretly intercepting the AuthDigestProvider directive before mod_auth_digest has the chance to process it, we have provided an interface that makes the presence of Apache::AuthenHook all but undetectable. To enable the new provider mechanism for mod_perl developers, all that is required is to load Apache::AuthenHook using the new PerlLoadModule directive

PerlLoadModule Apache::AuthenHook

and their Perl providers will be magically inserted into the authentication phase at the appropriate time.

Taking a Step Back

Let's recap what we have accomplished so far. AuthenHook.pm redefines AuthDigestProvider and AuthBasicProvider so that any Perl providers listed in the configuration are automagically registered and inserted into the authentication process. At request time, one of the default Apache authentication handlers will call on the configured providers to supply server-side credentials. All registered Perl providers really point to the callbacks in AuthenHook.xs which have the arduous task of proxying the request for server-side credentials to the proper Perl provider. All in all, Apache::AuthenHook covers lots of ground, even if the gory details of what happens over in XS land have been left out.

As we mentioned earlier, Apache::AuthenHook not only provides the ability to write authentication providers in Perl, but it also follows the Apache API very closely. While diving deep into the XS code that Apache::AuthenHook uses to implement the check_password and get_realm_hash callbacks is far beyond the scope of this article, you may find it interesting that the callback signature for check_password

static authn_status check_password(request_rec *r, const char *user,
                                   const char *password)
{
  ...
}

is practically identical to what Apache::AuthenHook passes on to Perl providers supporting the Basic authentication scheme.

sub handler {
  my ($r, $user, $password) = @_;

  ...
}

If you recall, we started investigating the Apache 2.1 provider mechanism as a way to combine the security of the Digest authentication scheme with the strength of Perl. The signature for the Digest authentication callback, get_realm_hash, is only slightly different from check_password.

static authn_status get_realm_hash(request_rec *r, const char *user,
                                   const char *realm, char **rethash)
{
  ...
}

How does this translate into a Perl API? It is surprisingly simple. As it turns out, the name check_password is significant—for Basic authentication, the provider is expected to take steps to see if the incoming username and password match the username and password stored on the server back-end. For Digest authentication, as the name get_realm_hash might suggest, all a provider is responsible for is retrieving the hash for a user at a given realm. mod_auth_digest does all the heavy lifting.

Digest Authentication for the People

While we didn't take the time to explain how Basic authentication over HTTP actually works, briefly explaining Digest authentication is probably worth the time, if only to allow you to appreciate the elegance of the new provider mechanism.

When a request comes in for a resource protected by Digest authentication, the server begins the process by returning a WWW-Authenticate header that contains the authentication scheme, realm, a server generated nonce, and various other bits of information. A fully RFC-compliant WWW-Authenticate header might look like the following.

WWW-Authenticate: Digest realm="realm1", 
nonce="Q9equ9C+AwA=195acc80cf91ce99828b8437707cafce78b11621", 
algorithm=MD5, qop="auth"

On the client side, the username and password are entered by the end user based on the authentication realm sent from the server. Unlike Basic authentication, in which the client transmits the user's password practically in the clear, Digest authentication never exposes the password over the wire. Instead, both the client and server handle the user's credentials with care. For the client, this means rolling up the user credentials, along with other parts of the request such as the server-generated nonce and request URI, into a single MD5 hash, which is then sent back to the server via the Authorization header.

Authorization: Digest username="user1", realm="realm1", 
qop="auth", algorithm="MD5", uri="/index.html",
nonce="Q9equ9C+AwA=195acc80cf91ce99828b8437707cafce78b11621", 
nc=00000001, cnonce="3e4b161902b931710ae04262c31d9307", 
response="49fac556a5b13f35a4c5f05c97723b32"

The server, of course, needs to have its own copy of the user credentials around for comparison. Now, because the client and server have had (at various points in time) access to the same dataset—the user-supplied username and password, as well as the request URI, authentication realm, and other information shared in the HTTP headers—both ought to be able to generate the same MD5 hash. If the hash generated by the server does not match the one sent by the client in the Authorization header, the difference can be attributed to the one piece of information not mutually agreed upon through the HTTP request: the password.
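For the curious, the hash that both sides compute can be sketched in a few lines of Perl. This follows the qop="auth" formula from RFC 2617 and is only an illustration of the arithmetic, not the code mod_auth_digest uses. The digest_response() helper and the password 'secret' are made up, so the result is not expected to match the response value in the header shown earlier.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Compute the "response" value for qop="auth" as described in RFC 2617.
sub digest_response {
  my (%p) = @_;

  my $ha1 = md5_hex(join ':', @p{qw(username realm password)});
  my $ha2 = md5_hex(join ':', @p{qw(method uri)});

  return md5_hex(join ':', $ha1, @p{qw(nonce nc cnonce qop)}, $ha2);
}

my %request = (
  username => 'user1',
  realm    => 'realm1',
  password => 'secret',          # hypothetical password
  method   => 'GET',
  uri      => '/index.html',
  nonce    => 'Q9equ9C+AwA=195acc80cf91ce99828b8437707cafce78b11621',
  nc       => '00000001',
  cnonce   => '3e4b161902b931710ae04262c31d9307',
  qop      => 'auth',
);

my $client = digest_response(%request);                       # what the browser sends
my $server = digest_response(%request);                       # what the server recomputes
my $wrong  = digest_response(%request, password => 'guess');  # a bad password

print "match\n"    if $client eq $server;
print "mismatch\n" if $client ne $wrong;
```

Notice that the first inner hash depends only on the username, realm, and password. That is the piece a Digest provider stores and returns, which is why the server never needs the cleartext password.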

As you can see from the headers involved, there is quite a lot of information to process and interpret with the Digest authentication scheme. However, if you recall, one of the benefits of the new provider mechanism is that mod_auth_digest takes care of all the intimate details of the scheme internally, relieving you from the burden of understanding it at all.

All a Digest provider is required to do is match the incoming user and realm to a suitable digest, stored in the medium of its choosing, and return it. With the hash in hand, mod_auth_digest will do all the subsequent manipulations and decide whether the hash the provider supplied is indeed sufficient to allow the user to continue on its journey to the resource it is after.

With that background behind us, we can proceed with a sample Perl Digest provider.

package My::DigestProvider;

use Apache::Log;

use Apache::Const -compile => qw(OK DECLINED HTTP_UNAUTHORIZED);

use strict;

sub handler {
  my ($r, $user, $realm, $hash) = @_;

  # user1 at realm1 is found - pass to mod_auth_digest
  if ($user eq 'user1' && $realm eq 'realm1') {
    $$hash = 'eee52b97527306e9e8c4613b7fa800eb';
    return Apache::OK;
  }

  # user2 is denied outright
  if ($user eq 'user2' && $realm eq 'realm1') {
    return Apache::HTTP_UNAUTHORIZED;
  }

  # all others are passed along to the next provider
  return Apache::DECLINED;
}

1;

Note the only slight difference between the interface for Digest authentication as compared to Basic authentication. Because the authentication realm is an essential part of the Digest scheme, it is passed to our handler() subroutine in addition to the request record, $r, and username we received with the Basic scheme.

Knowing the username and authentication realm, our provider can choose whatever method it desires to retrieve the MD5 hash associated with the user. Returning the hash for comparison by mod_auth_digest is simply a matter of populating the scalar referenced by $hash and returning OK. While using references in this way may feel a bit strange, it follows the same pattern as the official Apache C API, so I guess that makes it ok.

If the user cannot be found, the provider can choose to return HTTP_UNAUTHORIZED and deny access to the user, or return DECLINED to pass authority for the user to the next provider. Remember, unlike with the Perl handlers for all the other phases of the request, you can intermix Perl providers with C providers, sandwiching the default file provider with Perl providers of your own choosing.
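The way a request falls through a list of providers can be simulated in plain Perl. The sketch below is a simplification under several assumptions: the constants are plain strings rather than the real Apache::Const values, the providers omit the $r argument, and the lookup() loop is a hypothetical stand-in for what mod_auth_digest does internally.

```perl
use strict;
use warnings;

# Stand-ins for the Apache::Const values, as plain strings.
use constant { OK                => 'OK',
               DECLINED          => 'DECLINED',
               HTTP_UNAUTHORIZED => 'HTTP_UNAUTHORIZED' };

# Two hypothetical providers with the Digest handler signature, minus $r.
my @providers = (
  sub {
    my ($user, $realm, $hash) = @_;
    return HTTP_UNAUTHORIZED if $user eq 'user2';   # deny outright
    return DECLINED;                                # let the next provider try
  },
  sub {
    my ($user, $realm, $hash) = @_;
    return DECLINED unless $user eq 'user1' && $realm eq 'realm1';
    $$hash = 'eee52b97527306e9e8c4613b7fa800eb';    # hash from the example above
    return OK;
  },
);

# Ask each provider in turn until one gives a definite answer.
sub lookup {
  my ($user, $realm) = @_;
  my $hash;
  foreach my $provider (@providers) {
    my $status = $provider->($user, $realm, \$hash);
    return ($status, $hash) unless $status eq DECLINED;
  }
  return (DECLINED, undef);
}

foreach my $user (qw(user1 user2 user3)) {
  my ($status) = lookup($user, 'realm1');
  print "$user: $status\n";
}
```

Here user1 is declined by the first provider and answered by the second, user2 is stopped by the first provider, and user3 is declined by everyone.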

The one question that remains is how to generate a suitable MD5 digest to pass back to mod_auth_digest. For the default file provider, the return digest is typically generated using the htdigest binary that comes with the Apache installation. However, a Perl one-liner that can be used to generate a suitable MD5 digest for Perl providers would look similar to the following.

$ perl -MDigest::MD5 -e'print Digest::MD5::md5_hex("user:realm:password")'

That is all there is to it. No hash checking, no header manipulations, no back flips or somersaults. Simply dig out the user credentials and pass them along. At last, Digest authentication for the (Perl) people.
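If you need to generate digests for more than a handful of users, the one-liner grows naturally into a small helper. This sketch wraps the same computation in a hypothetical realm_hash() function and also formats the user:realm:hash line used in htdigest password files. The password 'secret' is made up.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Same computation as the one-liner: MD5 of "user:realm:password".
sub realm_hash {
  my ($user, $realm, $password) = @_;
  return md5_hex(join ':', $user, $realm, $password);
}

# The line format used by htdigest password files: user:realm:hash.
sub htdigest_line {
  my ($user, $realm, $password) = @_;
  return join ':', $user, $realm, realm_hash($user, $realm, $password);
}

print htdigest_line('user1', 'realm1', 'secret'), "\n";   # 'secret' is a made-up password
```

Because the realm is part of the hashed string, a digest generated for one realm will not work for another, even with the same username and password.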

Don't Forget the Tests!

Of course, no module would be complete without a test suite, and the Apache-Test framework introduced last time gives us all the tools we need to write a complete set of tests.

For the most part, the tests for Apache::AuthenHook are not that different from those presented before. LWP supports Digest authentication natively, so all our test scripts really need to do is make a request to a protected URI and let LWP do all the work. Here is a snippet from one of the tests.

plan tests => 10, (have_lwp &&
                   have_module('mod_auth_digest'));

my $url = '/digest/index.html';

$response = GET $url, username => 'user1', password => 'digest1';
ok $response->code == 200;

When we plan the tests, we first check for the existence of mod_auth_digest—both mod_auth_basic and mod_auth_digest can be enabled or disabled for any given installation, so we need to check for them where appropriate. Passing the username and password credentials is pretty straightforward, using the username and password keys after the URL when formatting the request.

Actually, while the username and password keys have special meaning, you can use the same technique to send any arbitrary headers in the request.

# fetch the Last-Modified header from one response...
my $last_modified = $response->header('Last-Modified');

# and use it in the next request
$response = GET $uri, 'If-Modified-Since' => $last_modified;
ok ($response->code == HTTP_NOT_MODIFIED);

That's something to note just in case you need that functionality sometime later in your testing life.

One final note about our tests will apply to anyone writing a mod_perl XS extension. Instead of using extra.conf.in to configure Apache, we used extra.last.conf.in. The difference between the two is that extra.last.conf.in is guaranteed to be loaded last in the configuration order. If our PerlLoadModule directive is processed before mod_perl gets the chance to add the proper blib entries, nothing will work, so ensuring our configuration is loaded after everything else is in place is important.

Whew

mod_perl is truly exciting. With surprisingly little work, we have managed to open an entire new world within Apache 2.1 to the Perl masses. I know of no other blend of technologies that allow for such remarkable flexibility beyond what each individually brings to the table. Hopefully, this article has not only introduced you to new Apache and authentication concepts, but has also brought to light ways in which you can leverage mod_perl that you never thought of before.

More Information

I apologize if this article is a little on the heavy side, teasing you with only cursory introductions to cool concepts while leaving out the finer details. So, if you want to explore these concepts in more detail, I leave you with the following required reading.

A nice overall introduction to the new provider mechanism can be found in Safer Apache Driving with AAA. The mechanics of Basic authentication can be found in lots of places, but decent explanations of Digest authentication are harder to find. Both are covered to some level of detail in Chapter 13 of the mod_perl Developer's Cookbook, which is freely available online. Recipe 13.8 in particular includes the code that became the splinter in my mind and eventually this article.

A more detailed explanation of directive handlers in mod_perl 2.0 can be found in the mod_perl 2.0 documentation. Although covering only mod_perl 1.0 directive handlers, whose implementation is very different, Chapter 8 in Writing Apache Modules with Perl and C and Recipes 7.8 through 7.11 in the mod_perl Developer's Cookbook provide excellent explanations of concepts that are universal to both platforms, and are essential reading if you plan on using directive handlers yourself. If you are curious about the intricate details of directive merging, Chapter 21 in Apache: the Definitive Guide presents probably the most comprehensive explanation available.

Finally, if you are interested in the gory details of the XS that really drives Apache::AuthenHook, there is no better single point of reference than Extending and Embedding Perl, which was my best friend while writing this module and absolutely deserves a place on your bookshelf.

Thanks

Many thanks to Stas Bekman and Philippe Chiasson for their feedback and review of the several patches to mod_perl core that were required for the code in this article, as well as to Jörg Walter, who was kind enough to take the time to review this article and give valuable feedback.

Testing mod_perl 2.0

Last time, we looked at writing a simple Apache output filter - Apache::Clean - using the mod_perl 2.0 API. How did I know that the filter I presented really worked? I wrote a test suite for it, one that exercised the code against a live Apache server using the Apache-Test testing framework.

Writing a series of tests that executes against a live Apache server has become much simpler since the advent of Apache-Test. Although Apache-Test, as part of the Apache HTTP Test Project, is generic enough to be used with virtually any version of Apache (with or without mod_perl enabled), it comes bundled with mod_perl 2.0, making it the tool of choice for writing tests for your mod_perl 2.0 modules.

Testing, Testing, 1, 2, 3

There are many advantages to writing tests. For instance, maintaining the test suite as I coded Apache::Clean allowed me to test each functional unit as I implemented it, which made development easier. The individual tests also allowed me to be fairly certain that the module would behave as expected once distributed. As an added bonus, tests offer additional end-user documentation in the form of test scripts, supporting libraries and configuration files, available to anyone who wants to snoop around the distribution a bit. All in all, having a test suite increases the value of your code exponentially, while at the same time making your life easier.

Of course, these benefits come from having any testing environment, and are not limited to just Apache-Test. The particular advantage that Apache-Test brings to the table is the ease with which it puts a whole, pristine, and isolated Apache server at your disposal, allowing you to test and exercise your code in a live environment with a minimum of effort. No more Apache::FakeRequest, no more httpd.conf configurations strewn across development environments or corrupted with proof-of-concept handlers that keep you busy following non-bugs for half a day. No more mess, no more tears.

If you have ever used tools like Test.pm or Test::More as the basis for testing your modules, then you already know most of what using Apache-Test is going to look like. In fact, Apache-Test uses Test.pm under the hood, so the layout and syntax are similar. If you have never written a test before (and shame on you), then An Introduction to Testing provides a nice overview of testing with Perl. For the most part, though, Apache-Test is really simple enough that you should be able to follow along here without any trouble or previous knowledge.

Leveraging the Apache-Test framework requires only a few steps - generating the test harness, configuring Apache to your specific needs, and writing the tests - each of which is relatively straightforward.

Generating the Test Harness

The first step to using Apache-Test is to tweak the Makefile.PL for your module. If you don't yet have a Makefile.PL, or are not familiar with how to generate one, then don't worry - all that is required is a simple call to h2xs, which provides us with a standard platform both for distributing our module and deploying the Apache-Test infrastructure.

  
$ h2xs -AXPn Apache::Clean
Defaulting to backward compatibility with perl 5.9.0
If you intend this module to be compatible with earlier perl versions, then please
specify a minimum perl version with the -b option.

Writing Apache/Clean/Clean.pm
Writing Apache/Clean/Makefile.PL
Writing Apache/Clean/README
Writing Apache/Clean/t/1.t
Writing Apache/Clean/Changes
Writing Apache/Clean/MANIFEST
  

h2xs generates the necessary structure for our module, namely the Clean.pm template and the Makefile.PL, as well as the t/ subdirectory where our tests and supporting files will eventually live. You can take some extra steps and shuffle the distribution around a bit (such as removing t/1.t and putting everything into Apache-Clean/ instead of Apache/Clean/) but it is not required. Once you have the module layout sorted out and have replaced the generated Clean.pm stub with the actual Clean.pm filter from before, it's time to start preparing the basic test harness.

To begin, we need to modify the Makefile.PL significantly. The end result should look something like:

  
#!perl

use 5.008;

use Apache2 ();
use ModPerl::MM ();
use Apache::TestMM qw(test clean);
use Apache::TestRunPerl ();

# configure tests based on incoming arguments
Apache::TestMM::filter_args();

# provide the test harness
Apache::TestRunPerl->generate_script();

# now, write out the Makefile
ModPerl::MM::WriteMakefile(
  NAME      => 'Apache::Clean',
  VERSION   => '2.0',
  PREREQ_PM => { HTML::Clean      => 0.8,
                 mod_perl         => 1.9909, },
);
  

Let's take a moment to analyze our nonstandard Makefile.PL. We begin by importing a few new mod_perl 2.0 libraries. The first is Apache2.pm. In order to peacefully co-exist with mod_perl 1.0 installations, mod_perl 2.0 gives you the option of installing mod_perl relative to Apache2/ in your @INC, so as to avoid collisions with 1.0 modules of the same name. For instance, the mod_perl 2.0 Apache::Filter we used to write our output filter interface would be installed as Apache2/Apache/Filter.pm. Of course, ordinary calls that require() or use() Apache::Filter in mod_perl 2.0 code would fail to find the correct version (if one was found at all), since it was installed in a nonstandard place. Apache2.pm extends @INC to include any (existing) Apache2/ directories so that use() and related statements work as intended. In our case, we need to use() Apache2 in order to ensure that, no matter how the end-user configured his mod_perl 2.0 installation, we can find the rest of the libraries we need.
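The idea behind extending @INC can be sketched in a few lines. This is not the actual source of Apache2.pm, just an illustration of the behavior described above. The apache2_dirs() helper is hypothetical, and placing the new entries at the front of @INC is an assumption.

```perl
use strict;
use warnings;
use File::Spec;

# Return the Apache2/ subdirectories that actually exist under the given paths.
sub apache2_dirs {
  my @paths = grep { !ref } @_;   # skip any code references living in @INC
  return grep { -d $_ } map { File::Spec->catdir($_, 'Apache2') } @paths;
}

# Put them ahead of the existing entries so their modules are found first.
unshift @INC, apache2_dirs(@INC);
```

On a system with no Apache2/ directories the list is empty and @INC is left unchanged.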

Secure in the knowledge that our Makefile.PL will be able to find all our other mod_perl 2.0 packages (wherever they live), we can proceed. ModPerl::MM provides the WriteMakefile() function, which is similar to the ExtUtils::MakeMaker function of the same name and takes the same options. The reason that you will want to use the WriteMakefile() from ModPerl::MM is that, through means highly magical, all of your mod_perl-specific needs are satisfied. For instance, your module will be installed relative to Apache/ or Apache2/, depending on how mod_perl itself is installed. Other nice features are automatic inclusion of mod_perl's typemap and the header files required for XS-based modules, as well as magical cross-platform compatibility for Win32 compilation, which has been troublesome in the past.

Keep in mind that neither Apache2.pm nor ModPerl::MM is required in order to use Apache-Test - both are packages specific to mod_perl 2.0 and any handlers you may write for this version (as will be touched on later, Apache-Test can be used for mod_perl 1.0 based modules as well, or even Apache 1.3 or 2.0 modules independent of mod_perl, for that matter). The next package, Apache::TestMM, is where the real interface for Apache-Test begins.

Apache::TestMM contains the functions we will need to configure the test harness. The first thing we do is import the test() and clean() functions, which generate their respective Makefile targets so that we can run (and re-run) our tests. After that, we call the filter_args() function. This allows us to configure various parts of our tests on the command line using different options, which will be discussed later.

The final part of our configuration uses the generate_script() method from the Apache::TestRunPerl class, which writes out the script responsible for running our tests, t/TEST. It is t/TEST that will be invoked when a user issues make test, although the script can be called directly as well. While t/TEST can end up containing lots of information, if you crack it open, you will see that the engine that really drives the test suite is rather simple.

  
use Apache::TestRunPerl ();
Apache::TestRunPerl->new->run(@ARGV);
  

Believe it or not, the single call to run() does all the intricate work of starting, configuring, and stopping Apache, as well as running the individual tests we (still) have yet to define.

Despite the long explanations, the net result of our activity thus far has been a few modifications to a typical Makefile.PL so that it reflects the needs of both our mod_perl 2.0 module and our forthcoming use of the Apache-Test infrastructure. Next, we need to configure Apache for the tests specific to the functionality in our handler.

Configuring Apache

Ordinarily, there are many things you need to stuff into httpd.conf in order to get the server responding to requests, only some of which are related to the content the server will provide. The Apache-Test framework provides a minimal Apache configuration, such as default DocumentRoot, ErrorLog, Listen, and other settings required for normal operation of the server. In fact, with no intervention on your part, Apache-Test provides a configuration that enables you to successfully request /index.html from the server. Chances are, though, that you will need something above a basic configuration in order to test your module appropriately.

To add additional settings to the defaults, we create the file t/conf/extra.conf.in, adding any required directories along the way. If Apache-Test sees extra.conf.in, it pulls the file into its default configuration using an Include directive (after some manipulations we will discuss shortly). This provides a nice way of adding only the configuration data you require for your tests, and saves you from the need to worry about the mundane aspects of running the server.

One of the first aspects of Apache::Clean we should test is whether it can clean up a simple, static HTML file. So, we begin our extra.conf.in with the following:

  
PerlSwitches -w

Alias /level @DocumentRoot@
<Location /level>
  PerlOutputFilterHandler Apache::Clean
  PerlSetVar CleanLevel 2
</Location>
  

This activates our output filter for requests to /level. Note the introduction of a new directive, PerlSwitches, which allows you to pass command line switches to the embedded perl interpreter. Here, we use it to enable warnings, similar to the way that PerlWarn worked in mod_perl 1.0. PerlSwitches can actually take any perl command line switch, which makes it a fairly useful and flexible tool. For example, we could use the -I switch to extend @INC in place of adding use lib statements to a startup.pl, or use -T to enable taint mode in place of the former PerlTaintMode directive, which is not part of mod_perl 2.0.

Next, we come to the familiar Alias directive, albeit with a twist. As previously mentioned, Apache-Test configures several defaults, including DocumentRoot and ServerRoot. One of the nice features of Apache-Test is that it keeps track of its defaults for you and provides some helpful variable expansions. In my particular case, the @DocumentRoot@ variable in the Alias directive is replaced with the value of the default DocumentRoot that Apache-Test calculated for my build. The real configuration ends up looking like

  
Alias /level /src/perl.com/Apache-Clean-2.0/t/htdocs
  

when the tests are run. This is handy, especially when you take into consideration that your tests may run on different platforms.
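
To make the expansion step concrete, here is a minimal sketch of @Name@ substitution in plain Perl. It is only an illustration of the idea, not the code Apache-Test itself uses; the expand_vars() function and the hard-coded values in %vars are assumptions made for the example, whereas Apache-Test calculates the real values for each build.

```perl
use strict;
use warnings;

# Hypothetical values for illustration; Apache-Test calculates the real
# ones for each build.
my %vars = (
    DocumentRoot => '/src/perl.com/Apache-Clean-2.0/t/htdocs',
    ServerRoot   => '/src/perl.com/Apache-Clean-2.0/t',
);

# Replace each @Name@ token with its value, leaving unknown tokens alone.
# expand_vars() is a made-up name, not part of Apache-Test.
sub expand_vars {
    my ($line, $vars) = @_;
    $line =~ s/\@(\w+)\@/exists $vars->{$1} ? $vars->{$1} : "\@$1\@"/ge;
    return $line;
}

print expand_vars('Alias /level @DocumentRoot@', \%vars), "\n";
```

Because the template only ever mentions @DocumentRoot@, the same extra.conf.in works unchanged wherever the distribution is unpacked.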

The rest of the configuration closely resembles our example from last time - using the PerlOutputFilterHandler to specify Apache::Clean as our output filter, and PerlSetVar to specify the specific HTML::Clean level. The only thing missing before we have prepared our module enough to run our first test is some testable content in DocumentRoot.

As you can see from the @DocumentRoot@ expansion in the previous example, DocumentRoot resolves to the t/htdocs/ directory of the distribution (that is, ServerRoot/htdocs/, since Apache-Test uses t/ as the ServerRoot), so that is one place where we can put any documents we are interested in retrieving for our tests. So, we create t/htdocs/index.html and place some useful content in it.

  
<i    ><strong>&quot;This is a test&quot;</strong></i   >
  

Our index.html contains a number of different elements that HTML::Clean can tidy, making it useful for testing various configurations of Apache::Clean.

Now we have all the Apache configuration that is required: some custom configuration directives in t/conf/extra.conf.in and some useful content in t/htdocs/index.html. All that is left to do is write the tests.

Writing the Tests

The Apache configuration we have created thus far provides a way to test Apache::Clean through /level/index.html. The result of this request should be that the default Apache content handler serves up index.html, applying our PerlOutputFilterHandler to the file before it is sent over the wire. Given the configured PerlSetVar CleanLevel 2 we would expect the end results of the request to be

  
<i><b>&quot;This is a test&quot;</b></i>
  

where tags are shortened and whitespace removed but the &quot; entity is left untouched. Well, maybe this is not what you would have expected, but cracking open the code for HTML::Clean reveals that level(2) includes the whitespace and shortertags options, but not the entities option. This brings us to the larger issue of test design and the possibility that flawed expectations can mask true bugs - when a test fails, is the bug in the test or in the code? - but that is a discussion for another time.

Given our configuration and expected results, we can craft a test that requests /level/index.html, isolates the content from the server response, then tests the content against our expectations. The file t/01level.t shown here does exactly that.

  
use strict;
use warnings FATAL => 'all';

use Apache::Test qw(plan ok have_lwp);
use Apache::TestRequest qw(GET);

plan tests => 1, have_lwp;

my $response = GET '/level/index.html';
chomp(my $content = $response->content);

ok ($content eq q!<i><b>&quot;This is a test&quot;</b></i>!);
  

t/01level.t illustrates a few of the things that will be common to most of the tests you will write. First, we do some bookkeeping and plan the number of tests that will be attempted using the plan() function from Apache::Test - in our case just one. The final, optional argument to plan() uses the have_lwp() function to check for the availability of the modules from the libwww-perl distribution. If have_lwp() returns true, then we know we can take advantage of the LWP shortcuts Apache::TestRequest provides. If have_lwp() returns false, then no tests are planned and the entire test is skipped at runtime.

After planning our test, we use the shortcut function GET() from Apache::TestRequest to issue a request to /level/index.html. GET() returns an HTTP::Response object, so if you are familiar with the LWP suite of modules you should feel right at home with what follows. Using the object in $response we isolate the content of the server response using the content() method and compare it against our expected string. The comparison uses a call to ok(), which will report success if the two strings are equivalent.

Keep in mind that even though this example explicitly imported the plan(), ok(), have_lwp(), and GET() functions into our test script, that was just to illustrate the origins of the different parts of the test - each of these functions, along with just about all the others you may want, is exported by default. So, the typical test script will usually just call

  
use Apache::Test;
use Apache::TestRequest;
  

and go from there.

That is all there is to writing the test. In its simplest form, using Apache-Test involves pretty much the same steps as when writing tests using other Perl testing tools: plan() the number of tests in the script, do some stuff, and call ok() for each test you plan(). Apache-Test and its utility classes merely offer shortcuts that make writing tests against a running Apache server idiomatic.
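
For comparison, the same three steps look like this with the standard Test::More module. This is only a sketch of the parallel: no server is involved, and a canned string stands in for the value that $response->content would return in the real test.

```perl
use strict;
use warnings;
use Test::More tests => 1;    # step 1: plan the number of tests

# Step 2: do some stuff. A canned string stands in for
# $response->content here, since this sketch does not contact a server.
chomp(my $content = qq!<i><b>&quot;This is a test&quot;</b></i>\n!);

# Step 3: call one test function for each planned test.
is($content, q!<i><b>&quot;This is a test&quot;</b></i>!, 'level 2 output');
```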

Running the Tests

With all the preparation behind us - generating and customizing the Makefile.PL, configuring Apache with extra.conf.in, writing index.html and 01level.t - we have all the pieces in place and can (finally) run our test.

There are a few different ways we can run the tests in a distribution, but all require that we go through the standard build steps first.

  
$ perl Makefile.PL -apxs /usr/local/apache2/bin/apxs
Checking if your kit is complete ...
Looks good
Writing Makefile for Apache::Clean

$ make
cp Clean.pm blib/lib/Apache2/Apache/Clean.pm
Manifying blib/man3/Apache::Clean.3
  

Makefile.PL starts the process by generating the t/TEST script via the call to Apache::TestRunPerl->generate_script(). The additional argument we pass, -apxs, is trapped by Apache::TestMM::filter_args() and is used to specify the Apache installation we want to test our code against. Here, I use -apxs to specify the location of the apxs binary in my local Apache DSO installation - for static builds you will want to use -httpd to point to the httpd binary instead. By the time Makefile.PL exits, we have our test harness and know where our server lives.

Running make creates our build directory, blib/, and installs Clean.pm locally so we can use it in our tests. Note that ModPerl::MM installed Clean.pm relative to Apache2, magically following the path of my current mod_perl 2.0 installation.

At this point, we can run our tests. Issuing make test will run all the tests in t/, as you might expect. However, we can run our tests individually as well, which is particularly useful when debugging. To run a specific test we call t/TEST directly and give it the name of the test we are interested in.

  
$ t/TEST t/01level.t
*** setting ulimit to allow core files
ulimit -c unlimited; t/TEST 't/01level.t'
/usr/local/apache2/bin/httpd  -d /src/perl.com/Apache-Clean-2.0/t 
    -f /src/perl.com/Apache-Clean-2.0/t/conf/httpd.conf 
    -DAPACHE2 -DPERL_USEITHREADS
using Apache/2.1.0-dev (prefork MPM)

waiting for server to start: ..
waiting for server to start: ok (waited 1 secs)
server localhost:8529 started
01level....ok                                                                
All tests successful.
Files=1, Tests=1,  4 wallclock secs ( 3.15 cusr +  0.13 csys =  3.28 CPU)
*** server localhost:8529 shutdown
  

As you can see, the server was started, our test was run, the server was shut down, and a report was generated - all with what is really minimal work on our part. Major kudos to the Apache-Test developers for making the development of live tests as easy as it is.

Beyond the Basics

What we have talked about so far is just the basics, and the framework is full of a number of different options designed to make writing and debugging tests easier. One of these is the Apache::TestUtil package, which provides a number of utility functions you can use in your tests. Probably the most helpful of these is t_cmp(), a simple equality testing function that also provides additional information when you run tests in verbose mode. For instance, after adding use Apache::TestUtil; to our 01level.t test, we can alter the call to ok() to look like

  
ok t_cmp(q!<i><b>&quot;This is a test&quot;</b></i>!, $content);
  

and the result would include expected and received notices (in addition to standard verbose output)

  
$ t/TEST t/01level.t -v
[lines snipped]
01level....1..1
# Running under perl version 5.009 for linux
# Current time local: Mon May  5 11:04:09 2003
# Current time GMT:   Mon May  5 15:04:09 2003
# Using Test.pm version 1.24
# expected: <i><b>&quot;This is a test&quot;</b></i>
# received: <i><b>&quot;This is a test&quot;</b></i>
ok 1
ok
All tests successful.
  

which is particularly helpful when debugging problems reported by end users of your code. See the Apache::TestUtil manpage for a long list of helper functions, as well as the README in the Apache-Test distribution for additional command line options over and above -v.
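
If you are curious what a helper of this kind does, the idea can be sketched in a few lines of plain Perl. The my_cmp() function below is hypothetical and is not the Apache::TestUtil implementation: it always prints its diagnostics, whereas t_cmp() only does so in verbose mode.

```perl
use strict;
use warnings;

# Hypothetical helper, not the Apache::TestUtil implementation: print the
# expected and received values as diagnostics, then return the result of
# a plain string comparison.
sub my_cmp {
    my ($expected, $received) = @_;
    print "# expected: $expected\n";
    print "# received: $received\n";
    return $expected eq $received ? 1 : 0;
}

my $same = my_cmp('<b>test</b>', '<b>test</b>');
my $diff = my_cmp('<b>test</b>', '<strong>test</strong>');
print "same: $same, diff: $diff\n";
```

The return value is what you would hand to ok(), while the diagnostics show at a glance how a failing comparison went wrong.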

Of course, 01level.t only tests one aspect of our Clean.pm output filter, and there is much more functionality in the filter that we might want to verify. So, let's take a quick look at some of the other tests that accompany the Apache::Clean distribution.

One of the features of Apache::Clean is that it automatically declines processing non-HTML documents. The logic for this was defined in just a few lines at the start of our filter.

  
# we only process HTML documents
unless ($r->content_type =~ m!text/html!i) {
  $log->info('skipping request to ', $r->uri, ' (not an HTML document)');

  return Apache::DECLINED;
}
  

A good test for this code would be verifying that content from a plain-text document does indeed pass through our filter unaltered, even if it has HTML tags that HTML::Clean would ordinarily manipulate. Our test suite includes a file t/htdocs/index.txt whose content is identical to the index.html file we created earlier. Remembering that we already have an Apache configuration for /level that inserts our filter into the request cycle, we can use a request for /level/index.txt to test the decline logic.

  
use Apache::Test;
use Apache::TestRequest;

plan tests => 1, have_lwp;

my $response = GET '/level/index.txt';
chomp(my $content = $response->content);

ok ($content eq q!<i    ><strong>&quot;This is a test&quot;</strong></i   >!);
  

It may be obvious, but what we are really testing here is not that the content is unaltered - that is just what we use to measure the success of our test. The real test is against the criterion that determines whether the filter acts on the content. If we wanted to be really thorough, then we could add

  
AddDefaultCharset On
  

to our extra.conf.in to test the Content-Type logic against headers that look like text/html; charset=iso-8859-1 instead of just text/html. I actually have had more than one person comment that using a regular expression for testing the Content-Type is excessive - adding the AddDefaultCharset On directive shows that the regex logic can handle more runtime environments than a simple $r->content_type eq 'text/html' check. Oh, the bugs you will find, fix, and defend when you start writing tests.
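
The difference between the two checks is easy to see outside of the server. The following sketch runs both against the header values mentioned above; the helper names matches_regex() and matches_eq() are made up for the example.

```perl
use strict;
use warnings;

# The filter's test: a case-insensitive regular expression.
sub matches_regex {
    my ($type) = @_;
    return $type =~ m!text/html!i ? 1 : 0;
}

# The "simpler" test: strict string equality.
sub matches_eq {
    my ($type) = @_;
    return $type eq 'text/html' ? 1 : 0;
}

for my $type ('text/html', 'text/html; charset=iso-8859-1', 'text/plain') {
    printf "%-32s regex=%d eq=%d\n",
        $type, matches_regex($type), matches_eq($type);
}
```

Only the regular expression recognizes the header once a charset has been appended, which is exactly the situation AddDefaultCharset On creates.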

More and More Tests

What other aspects of the filter can we put to the test? If you recall from our discussion of output filters last time, one of the responsibilities of filters that alter content is to remove the generated Content-Length header from the server response. The relevant code for this in our filter was as follows.

  
# output filters that alter content are responsible for removing
# the Content-Length header, but we only need to do this once.
$r->headers_out->unset('Content-Length');
  

Here is the test for this bit of logic, which checks that the Content-Length header is indeed present for plain documents, but removed by our filter for HTML documents. Again, we will be using the existing /level URI to request both index.txt and index.html.

  
use Apache::Test;
use Apache::TestRequest;

plan tests => 2, have_lwp;

my $response = GET '/level/index.txt';
ok ($response->content_length == 58);

$response = GET '/level/index.html';
ok (! $response->content_length);
  

Note the use of the content_length() method on our HTTP::Response object to retrieve the Content-Length of the server response. Remember that you have all the methods from that class to choose from in your tests.
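
In case the magic number 58 looks arbitrary, it is simply the size of index.txt in bytes. The sketch below reproduces the count, assuming the file holds the single line of markup shown earlier followed by one newline.

```perl
use strict;
use warnings;

# The markup from index.html and index.txt, plus the single trailing
# newline the file on disk is assumed to end with.
my $body = q!<i    ><strong>&quot;This is a test&quot;</strong></i   >! . "\n";

print length($body), "\n";
```

If you change the test document, remember to update the expected length in the test as well.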

The final test we will take a look at is the example we used last time to illustrate that our filter does indeed co-exist with both mod_include and mod_cgi. As it turns out, the example was taken right from the test suite (always a good place from which to draw examples). Here is the extra.conf.in snippet.

  
Alias /cgi-bin @ServerRoot@/cgi-bin
<Location /cgi-bin>
  SetHandler cgi-script

  SetOutputFilter INCLUDES
  PerlOutputFilterHandler Apache::Clean

  PerlSetVar CleanOption shortertags
  PerlAddVar CleanOption whitespace
  Options +ExecCGI +Includes
</Location>
  

The nature of our test requires that both mod_include and a suitable CGI platform (either mod_cgi or mod_cgid) be available to Apache - without both of these, our tests are doomed to failure, so we need a way to test whether these modules are available to the server before planning the individual tests. Also required are some CGI scripts, the location of which is specified by expanding @ServerRoot@. To include these scripts, we could just create a t/cgi-bin/ directory and place the relevant files in it. However, any CGI scripts we create would probably include a platform-specific shebang line like #!/usr/bin/perl. A better solution is to generate the scripts on-the-fly, specifying a shebang line that matches the version of Perl we are using to build and test the module.

Despite the extra work required, the test script used for this test is only a bit more complex than others we have seen so far.

  
use Apache::Test;
use Apache::TestRequest;
use Apache::TestUtil qw(t_write_perl_script);

use File::Spec::Functions qw(catfile);

plan tests => 4, (have_lwp && 
                  have_cgi &&
                  have_module('include'));

my @lines = <DATA>;
t_write_perl_script(catfile(qw(cgi-bin plain.cgi)), @lines[0,2]);
t_write_perl_script(catfile(qw(cgi-bin include.cgi)), @lines[1,2]);

my $response = GET '/cgi-bin/plain.cgi';
chomp(my $content = $response->content);

ok ($content eq q!<strong>/cgi-bin/plain.cgi</strong>!);
ok ($response->content_type =~ m!text/plain!);

$response = GET '/cgi-bin/include.cgi';
chomp($content = $response->content);

ok ($content eq q!<b>/cgi-bin/include.cgi</b>!);
ok ($response->content_type =~ m!text/html!);

__END__
print "Content-Type: text/plain\n\n";
print "Content-Type: text/html\n\n";
print '<strong><!--#echo var="DOCUMENT_URI" --></strong>';
  

The first thing to note is that we have joined the familiar call to have_lwp() with additional calls to have_cgi() and have_module(). The Apache::Test package comes with a number of handy shortcuts for querying the server for information. have_cgi() returns true if either mod_cgi or mod_cgid is installed. have_module() is more generic and can be used to test for either Apache C modules or Perl modules - for instance, have_module('Template') could be used to check whether the Template Toolkit is installed.

For generation of the CGI scripts, we use the t_write_perl_script() function from the Apache::TestUtil package. t_write_perl_script() takes two arguments, the first of which is the name of the file to generate, relative to the t/ directory in the distribution. If the file includes a path, any necessary directories are automatically created. In the interests of portability, we use catfile() from the File::Spec::Functions package to join the file with the directory. In general, you will want to keep File::Spec and its associated classes in mind when writing your tests - you never know when somebody is going to try and run them on Win32 or VMS. The second argument to t_write_perl_script() is a list of lines to append to the file after the (calculated) shebang line.
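
The way the test carves two scripts out of three lines of data may not be obvious at first glance, so here is a small sketch of just that part. To keep it self-contained, the lines live in an ordinary array rather than being read from the DATA filehandle, and nothing is written to disk.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catfile);

# The three lines that follow __END__ in the test script, held in an
# array here so the sketch does not depend on the DATA filehandle.
my @lines = (
    qq{print "Content-Type: text/plain\\n\\n";\n},
    qq{print "Content-Type: text/html\\n\\n";\n},
    qq{print '<strong><!--#echo var="DOCUMENT_URI" --></strong>';\n},
);

# An array slice picks one header line plus the shared body line.
my @plain   = @lines[0, 2];
my @include = @lines[1, 2];

# catfile() joins the pieces with the separator of the current platform.
print catfile(qw(cgi-bin plain.cgi)), "\n", @plain;
print catfile(qw(cgi-bin include.cgi)), "\n", @include;
```

Both scripts share the same body and differ only in the Content-Type they announce, which is what lets one test exercise both the decline path and the filtering path.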

Although t_write_perl_script() cleans up any generated files and directories when the test completes, if we were to intercept include.cgi before removal it would look similar to something we would have written ourselves.

  
#!/src/bleedperl/bin/perl
# WARNING: this file is generated, do not edit
# 01: /src/bleedperl/lib/site_perl/5.9.0/i686-linux-thread-multi/
      Apache/TestUtil.pm:129
# 02: 06mod_cgi.t:18
print "Content-Type: text/html\n\n";
print '<strong><!--#echo var="DOCUMENT_URI" --></strong>';
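
The core of the idea, a shebang line calculated at runtime, can be sketched with nothing but core modules. The write_perl_script() function below is a hypothetical stand-in and not the Apache::TestUtil code; it uses $^X (the perl running the sketch) for the shebang line and writes into a temporary directory instead of t/.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec::Functions qw(catfile);

# Hypothetical stand-in for t_write_perl_script(): write a script whose
# shebang line names the perl that is running this code ($^X).
sub write_perl_script {
    my ($file, @lines) = @_;
    open my $fh, '>', $file or die "cannot write $file: $!";
    print $fh '#!' . $^X . "\n",
              "# WARNING: this file is generated, do not edit\n",
              @lines;
    close $fh or die "cannot close $file: $!";
    chmod 0755, $file;
    return $file;
}

my $dir    = tempdir(CLEANUP => 1);    # removed automatically at exit
my $script = write_perl_script(
    catfile($dir, 'include.cgi'),
    qq{print "Content-Type: text/html\\n\\n";\n},
);

# Read the first line back to see the calculated shebang.
open my $in, '<', $script or die "cannot read $script: $!";
my $shebang = <$in>;
close $in;
print $shebang;
```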
  

As you probably have guessed by now, just as we ran tests against scripts in the (generated) t/cgi-bin/ directory, we can add other directories to t/ for other kinds of tests. For instance, we can create t/perl-bin/ to hold standard ModPerl::Registry scripts (remember, you don't need to generate a shebang line for those). We can even create t/My/ to hold a custom My::ContentGenerator handler, which can be used just like any other Perl module during Apache's runtime. All in all, you can simulate practically any production environment imaginable.

But Wait, There's More!

The tests presented here should be enough to get you started writing tests for your own modules, but they are only part of the story. If you are interested in seeing some of the other tests written to support this article, the Apache::Clean distribution is full of all kinds of different tests and test approaches, including some that integrate custom handlers as well as one that tests the POD syntax for the module. In fact, you will find 26 different tests in 12 test files there, free for the taking.

Stuck using mod_perl 1.0? One of the best things about Apache-Test is that it is flexible and intelligent enough to be used for mod_perl 1.0 handlers as well. In fact, the recent release of Apache-Test as a CPAN module outside of the mod_perl 2.0 distribution makes it even easier for all mod_perl developers to take advantage of the framework. For the most part, the instructions in this article should be enough to get you going writing tests for 1.0-based modules - the only changes specific to 1.0 modules rest in the Makefile.PL. I took the time to whip up a version of Apache::Clean for mod_perl 1.0 that parallels the functionality in these articles, which you can find next to the 2.0 version. The 1.0 distribution runs against the exact same *.t files (where applicable) and includes a sample 1.0 Makefile.PL.

Personally, I don't know how I ever got along without Apache-Test, and I'm sure that once you start using it you will feel the same. Secretly, I'm hoping that Apache-Test becomes so popular that end-users start wrapping their bug reports up in little, self-contained, Apache-Test-based tarballs so anyone can reproduce the problem.

More Information

This article was derived from Recipe 7.7 in the mod_perl Developer's Cookbook, adjusted to accommodate both mod_perl 2.0 and changes in the overall Apache-Test interface that have happened since publication. Despite these differences, the recipe is useful for its additional descriptions and coverage of features not discussed here. You can read Recipe 7.7, as well as the rest of Chapter 7, from the book's website. In addition to the Apache-Test manpage and README, there is also the Apache-Test tutorial on the mod_perl Project website, all of which are valuable sources of information.

Thanks

The Apache-Test project is the result of the tireless efforts of many, many developers - far too many to name individually here. However, there has been a recent surge of activity as Apache-Test made its way to CPAN, especially in making it more platform aware and solving a few back compatibility problems with the old Apache::test that ships with mod_perl 1.0. Special thanks are due to Stas Bekman, David Wheeler, and Randy Kobes for helping to polish Apache-Test on Win32 and Mac OS X without requiring major changes to the API.
