December 2002 Archives

How Perl Powers Christmas

You know, it's not easy having the happiness of billions of children around the world resting with your organization, and it's even harder on the IT department. The incorporated elves and pixiefolk of the North Pole, under the direction of their jolly old leader, have to deal with massive quantities of data, huge manufacturing flows and what is possibly the strictest delivery timetable in the world. Despite these challenges, Santa and his reindeer have been able to meet their tight deadline and achieve one of the highest customer satisfaction ratings in industry.

For many centuries, the elves needed to work for only a couple of months of the year to manufacture every gift for every child, but recent advances in technology and the increasing global population have, in the past two decades, left them working day and night all year round, with only a few days of holiday before work had to begin again in early January. During the early '90s, some workflow improvements were made and some time savings were gained by using a mainframe to coordinate the route Santa would take on the night before Christmas, ensuring he could still visit every client during the 24 hours available. By 1995, these savings had made only a small difference to the performance of the operation, with representatives of the Amalgamated Present Production and Sleigh Mechanics' Union threatening to leak to the media predictions that Christmas might need to be cancelled in 1998 and held biennially from then on if every child was to receive presents on the same day. The elves even considered going on strike in 1996, but they reconsidered after seeing the reaction of a young boy to his older brother's proof that Santa Claus could not physically exist. (Santa is, of course, entirely real -- he just doesn't pay too much attention to natural laws.)

Thankfully, history took a different route and the North Pole escaped its first industrial action since records began. A searching review of the whole production process included sending parts of the short-statured delivery facilitation team on outplacements in industry. One of the elves in this division was lucky enough to be sent along to an emerging dot-com, where he discovered -- and instantly fell in love with -- Perl. He returned to the Pole brimming with enthusiasm and soon convinced Santa (well, his wife, who then convinced Mr. Claus) that it would be possible to prepare for Christmas in a matter of months, or maybe even weeks, if this emerging tool fulfilled its early promise.

That was seven years ago, and now Perl powers Christmas. Its diverse realm of application, not to mention its slightly idiosyncratic nature, fit the mindset of an elf perfectly and -- more importantly -- help them to get everything done in November and December, leaving them the rest of the year to enjoy themselves.

The first application of Perl was in the Health and Safety department. For many years, the Association for the Prevention of Cruelty to Avionic Reindeer was worried about the risks presented during landings on roofs and increasingly from the potential for midair collisions with aircraft. An effort was undertaken to carefully map international air corridors, catalog hazardous or unstable landing patches and (as Santa's diet was underperforming) record chimney widths for every dwelling on the planet. Needless to say, maintaining this was a nightmare -- until, that is, a custom set of tools was written in Perl to allow Santa's scouts to take reports from the field using a primitive Web interface. The major benefit of Perl here was the speed with which new types of reindeer strip could be added to the database, as the time needed to program the necessary logic to handle them was reduced.

Perl continued to make inroads on the databases of Santa's grottos. Next to fall to its roguish charms were the global child distribution and route planning systems. These had been implemented on two mainframes under two different packages, but global population growth scaled faster than the earlier architecture, requiring a shift to an entirely new distributed design. The database itself uses a commercial package, but the data migration was handled using Perl. Its flexible dynamic typing, object system and, above all, the set of DBI drivers allowed Perl to talk to every database in its own language without the programmers needing to learn each and every one of them. This project was completed ahead of schedule, leaving the team of crack data migration experts with little to do, so they were tasked to set their tools onto the internally infamous naughtiness database.



The naughtiness database was, for many years, simply a set of paper files kept in a filing cabinet in a dungeon by a troll. Every year, the troll would carefully collate every good deed and every black act of every boy and girl the world over. Before distributing presents on Christmas day, Father Christmas would ask the troll if there were any children that deserved coal instead. Of course, due to his meticulous record keeping the troll could honestly, if gruffly, reply "no, not a single child has been that bad." The mounting volumes of data near the end of the last century left the troll unable to keep up with developments. Soon he began to confuse one child with another, sometimes he couldn't enter every good thought of every child and eventually the system failed. A child was assigned not one but two sacks of coal. Thankfully an internal investigation revealed the problems faced by the troll and corrected the error, but it was decided that the troll must be retired in favor of a system based on the latest developments in artificial intelligence technologies. The review also concluded that the system should be modified so that especially good children would get better presents delivered, and coal was retired in favor of a good talking to from the troll, who now relishes his new line in community work.

Getting back to the technical details, the system needed to run quickly (emulating a troll is not an easy undertaking; they may appear dimwitted but are, in fact, deeply pondering the games of postal Go they love so much). At the same time, it had to allow for operators to script the system and tune its operation using a high-level language. It was decided that the crucial parts of the program would be implemented in C, with wrappers being written using the Inline::C module, allowing Perl to form the high-level director of the system. This proved to be a great success, with roughly 5 percent of children now qualifying for bonus presents. Elves are also queuing up to work in the new department, in part because of the rewarding work, but also because the system allows them so much room to tinker with the criteria for awards as each processing run is performed so quickly.

The recent explosion of the Internet has been both good and bad for the elves in the mail room. Santa receives many millions of letters each year from all around the world, and now also gets about 10 times as many e-mails. For a time, these were processed in the same way as the letters, but soon a new solution was required. In the end, rather than develop an in-house tool, the elves adopted RT -- a trouble ticketing and bug tracking system written in Perl -- to handle the assignment of requests to manufacturing areas. This allowed a much closer match between the wishes of the children and the presents they unwrapped come Dec. 25. The elves also encountered a growing problem from e-mail spam. For a while, they naively assumed that Santa should be sending Viagra and his bank details to relatives of the President of Nigeria, but eventually they twigged that odd things were afoot. A bit of research, and some help from the Perl community, led them quickly to Mail::Audit and SpamAssassin as an optimal filter.

In line with many companies these days, the North Pole has started to outsource its production of presents to commercial concerns. Perl has again been able to help with this effort by acting as a mediator between the requirements database produced from letters to Santa and the production systems of the outsourced manufacturers. A combination of freely available XML, SOAP and CORBA tools allows rapid creation of interfaces to the systems of new partners and allows aggregation of many external representations of an invoice into a single standard form suitable for input to the internal accounting systems.

These accounting systems are written entirely in Perl, mostly because commercial packages are not available to deal in the currency (sherry, biscuits and carrots) of Santa's environs. This highly available, mission-critical application ensures that elves are well supplied with the rewards for their work and keeps the workforce motivated the whole year round. Using a core engine written using Perl's framework for multi-user dungeons (and any other multitasking or massively networked environments), POE, the system has scaled to process over one billion transactions in a single day. (The elves and reindeer are paid yearly, on Boxing Day, out of the titbits left out in houses and collected by Santa as he passes through.)

Fifi Longstockings, the chief software engineer and head of blancmange parties, says of Perl: "Without Perl, I don't know what would happen to our operation. It's now critical to every area of our business and contributes directly to the magic of Christmas. It's amazing just how much hard work and effort it has helped us avoid in the past few years. Here in the grotto we dare to be lazy and Perl is the ideal tool for the inspired slacker who'd rather sing and dance than spend longer than they need to at work." He also looks forward to the benefits that Perl 6 could bring, and is happy that Perl will continue to be supported by the community for the foreseeable future: "Of course, Perl has come a long way since we started using it. There were some tasks it just couldn't cope with before it gained an object-oriented framework and there are some things now we'd love to get done using it but cannot. Some of these should be possible with Perl 6 though, so we're investing a couple of our elves' time and some fairy dust in Parrot development at the moment. It's certainly an exciting time for us, and for Perl!"

Programming with Mason

Dave Rolsky and Ken Williams are the authors of Embedding Perl in HTML with Mason.

Mason is a powerful framework for generating dynamic text, and is especially useful when creating complex, featureful Web sites. For those (hopefully few) folks who haven't yet heard of Mason, it is a Perl-based templating framework comparable to frameworks such as Apache::ASP, Embperl, and Template Toolkit. Like the first two, and unlike the last, Mason operates by embedding Perl in text.

Mason is based around the idea of a component. A component is roughly equivalent to a Perl subroutine, and can contain text and/or code. Here is a very simple, but complete component that has both text and code:


 % my $planet = "World";
 Hello, <% $planet %>!

When Mason runs this code, the output is:


 Hello, World!

The rest of this article assumes at least a minimal familiarity with Mason, though if you're at all familiar with other templating systems, you'll probably be able to grok the code we show. For more details, I would of course recommend Embedding Perl in HTML with Mason, written by Ken Williams and myself. Mason also comes with its own documentation, which can be seen online at www.masonhq.com.

As with any powerful and flexible system, Mason is applicable to a lot of problems, and there is always more than one way to do it. It is a Perl-based system, after all!

Below you'll find some cookbook recipes for solving a few typical Web application problems. All the recipes assume that you are using the latest version of Mason, which at the time of this writing is 1.15, though most of them will work untouched with older versions.

Putting a Session ID in All URLs

If you've ever written a dynamic Web application, then it's likely that you've used sessions to store data as the user moves through the application. Typically, sessions are identified by session IDs that are stored in a cookie.

If you cannot use cookies, then you can store the session ID in the URL. There are security and application problems with this approach (as well as with the use of cookies), but those are outside the scope of this article. The mod_perl user list archives at marc.theaimsgroup.com/?l=apache-modperl contain a number of discussions related to this topic.

Putting the session ID in the URL can be a hassle, because it means that you have to somehow process all the URLs you generate. Using Mason, this isn't as difficult as it would be otherwise. There are at least two ways to do this.

The first would be to put a filter in your top level autohandler component:


  <%filter>
   s/href="([^"]+)"/'href="' . add_session_id($1) . '"'/eg;
   s/action="([^"]+)"/'action="' . add_session_id($1) . '"'/eg;
  </%filter>

The add_session_id() subroutine, which should be defined in a module, might look something like this:

  sub add_session_id {
      my $url = shift;

      return $url if $url =~ m{^\w+://}; # Don't alter external URLs

      if ($url =~ /\?/) {
          $url =~ s/\?/?session_id=$MasonBook::Session{_session_id}&/;
      } else {
          $url .= "?session_id=$MasonBook::Session{_session_id}";
      }

      return $url;
  }

This routine accounts for external links as well as links with or without an existing query string.
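As a rough standalone check of this approach, the sketch below runs the same kind of substitution over a small HTML fragment outside of Mason. The %Session hash, its ID value, and the filter_html() helper are stand-ins invented for this example; in a real application the ID would come from your session system and the substitution would live in the <%filter> block.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %MasonBook::Session; the ID is made up.
my %Session = ( _session_id => 'abc123' );

sub add_session_id {
    my $url = shift;

    return $url if $url =~ m{^\w+://};    # Don't alter external URLs

    if ( $url =~ /\?/ ) {
        $url =~ s/\?/?session_id=$Session{_session_id}&/;
    }
    else {
        $url .= "?session_id=$Session{_session_id}";
    }

    return $url;
}

# Same idea as the filter section, applied to a plain string.
sub filter_html {
    my $html = shift;
    $html =~ s/(href|action)="([^"]+)"/$1 . '="' . add_session_id($2) . '"'/eg;
    return $html;
}

my $html = filter_html(
    '<a href="books.html">A</a> <a href="s.html?q=1">B</a> '
  . '<a href="http://www.oreilly.com/">C</a>'
);
print "$html\n";
```

The first two links gain a session_id parameter, while the external link is left alone.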

The drawback to putting this in a <%filter> section is that it only filters URLs in the content generated by components, and misses any URLs that might be in headers, such as in a redirect. Therefore, you'd need to handle those cases separately with this solution.

Another solution would be to create all URLs (including those intended for redirects) via a dedicated component or subroutine that adds the session id. This latter solution is probably a better idea, as it handles redirects properly. The drawback with this strategy is that you'll have a Mason component call for every link, instead of just regular HTML.

Here is just such a component:


  <%args>
   $scheme   => 'http'
   $username => undef
   $password => ''
   $host     => undef
   $port     => undef
   $path
   %query    => ()
   $fragment => undef
  </%args>
  <%init>
   my $uri = URI->new;

   if ($host) {
       $uri->scheme($scheme);

       if (defined $username) {
           # userinfo() sets only the "user:password" part of the authority
           $uri->userinfo( "$username:$password" );
       }

       $uri->host($host);
       $uri->port($port) if $port;
   }

   # Sometimes we may want to include a query string as part of the
   # path, but the URI module will escape the question mark.
   my $q;

   if ( $path =~ s/\?(.*)$// ) {
       $q = $1;
   }

   $uri->path($path);

   # If there was a query string, we integrate it into the query
   # parameter.
   if ($q) {
       %query = ( %query, split /[&=]/, $q );
   }

   $query{session_id} = $UserSession{session_id};

   # $uri->query_form doesn't handle hash ref values properly
   while ( my ( $key, $value ) = each %query ) {
       $query{$key} = ref $value eq 'HASH' ? [ %$value ] : $value;
   }

   $uri->query_form(%query) if %query;

   $uri->fragment($fragment) if $fragment;
  </%init>
  <% $uri->canonical | n %>\

If you didn't want to put the session ID in the query string, then you might instead make it part of the URL path. The application could retrieve the session id from incoming requests by using a mod_perl handler during the URL translation stage of request handling.
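A minimal sketch of the path-based variant follows. The helper strips a leading session segment from a request path and returns both pieces. The /s-<id>/ prefix convention and the function name are assumptions made purely for illustration; a real translation-stage handler would apply the same logic to the incoming request URI.

```perl
use strict;
use warnings;

# Hypothetical convention: URLs look like /s-<session id>/real/path.
sub split_session_from_path {
    my $path = shift;

    if ( $path =~ m{^/s-([A-Za-z0-9]+)(/.*)?$} ) {
        my $id   = $1;
        my $rest = defined $2 ? $2 : '/';
        return ( $id, $rest );
    }

    return ( undef, $path );    # no session segment present
}

my ( $id, $rest ) = split_session_from_path('/s-abc123/catalog/books.html');
print "session=$id path=$rest\n";
```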

This component provides a programmatic interface to URL generation. Here is an example of how to use it, assuming that you've saved it as a component called /url:


   ... some HTML ...
   Look at <a href="<& /url, path => "books.html" &>">our books</a>
   or <a href="<& /url, host => "www.oreilly.com",
                        path => "/catalog" &>">O'Reilly's</a>.
   ... some HTML ...

Making Use of Autoflush

Every once in a while, you may have to output a very large component or a file to the client. If you simply let this accumulate in the output buffer, you could use up a lot of memory. Furthermore, the slow response time may make the user think that the site has stalled.

Here is an example that sends out the contents of a potentially large file without sucking up lots of memory.


  <%args>
   $filename
  </%args>
  <%init>
   local *FILE;
   open FILE, "< $filename" or die "Cannot open $filename: $!";
   $m->autoflush(1);
   while (<FILE>) {
       $m->print($_);
   }
   $m->autoflush(0);
  </%init>

If each line wasn't too huge, then you might just flush the buffer every once in a while:


  <%args>
   $filename
  </%args>
  <%init>
   local *FILE;
   open FILE, "< $filename" or die "Cannot open $filename: $!";
   while (<FILE>) {
       $m->print($_);
       $m->flush_buffer unless $. % 10;
   }
   $m->flush_buffer;
  </%init>

The unless $. % 10 bit makes use of the special Perl variable $., which is the current line number of the file being read. If this number modulo 10 is equal to zero, then we flush the buffer. This means that we flush the buffer every 10 lines. (Replace the number 10 with any desired value.)
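To see the modulo test in isolation, here is a small sketch that reads from an in-memory filehandle and counts how often the "flush every N lines" condition fires. The count_flushes() name is made up for this example, and a counter stands in for the call to $m->flush_buffer.

```perl
use strict;
use warnings;

# Count how often a "flush every N lines" rule would fire.
sub count_flushes {
    my ( $text, $every ) = @_;

    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

    my $flushes = 0;
    while (<$fh>) {
        $flushes++ unless $. % $every;    # true on lines N, 2N, 3N, ...
    }
    close $fh;

    return $flushes;
}

my $text = join '', map { "line $_\n" } 1 .. 25;
print count_flushes( $text, 10 ), "\n";    # lines 10 and 20 trigger a flush
```

With 25 lines and an interval of 10, the last five lines never hit the condition, which is why the component above ends with one more unconditional flush.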

User Authentication and Authorization

One problem that Web sites have to solve over and over again is user authentication and authorization. These two topics are related but not the same, as some might think. Authentication is the process of figuring out if someone is who they say they are, and usually involves checking passwords or keys. Authorization comes after this, when we want to determine whether a particular person is allowed to perform a certain action.

There are a number of modules on CPAN that are intended to help do these things under mod_perl. In fact, Apache has separate request-handling phases for both authentication and authorization that mod_perl can handle. It is certainly possible to use these modules with Mason.

You can also do authentication and authorization using Mason components. Authentication will usually involve some sort of request for a login and a password, after which you give the user some sort of token (either in a cookie or a session) that indicates that they have been authenticated. You can then check the validity of this token for each request.

If you have such a token, then authorization simply consists of checking that the user to whom the token belongs is allowed to perform a given action.

Using Apache::AuthCookie

The Apache::AuthCookie module, available from CPAN, is a module that handles both authentication and authorization via mod_perl and can be easily hooked into Mason. Rather than go through all the details of configuring Apache::AuthCookie, which requires various settings in your server config file, let's just skip all that and show you how you'd make the interface to Mason.

Apache::AuthCookie requires that you create a "login script" that will be executed the first time a browser tries to access a protected area. Calling this a script is actually somewhat misleading since it is really a page rather than a script (though it could be a script that generates a page). Regardless, using a Mason component for your "login script" merely requires that you specify the path to your Mason component for the login script parameter.

We'll call this "script" AuthCookieLoginForm.comp:


  <html>
  <head>
  <title>Mason Book AuthCookie Login Form</title>
  </head>
  <body>
  <p>
  Your attempt to access this document was denied
  (<% $r->prev->subprocess_env("AuthCookieReason") %>).  Please enter
  your username and password.
  </p>

  <form action="/AuthCookieLoginSubmit">
  <input type="hidden" name="destination" value="<% $r->prev->uri %>">
  <table align="left">
   <tr>
    <td align="right"><b>Username:</b></td>
    <td><input type="text" name="credential_0" size="10" maxlength="10"></td>
   </tr>
   <tr>
    <td align="right"><b>Password:</b></td>
    <td><input type="password" name="credential_1" size="8" maxlength="8"></td>
   </tr>
   <tr>
    <td colspan="2" align="center"><input type="submit" value="Continue"></td>
   </tr>
  </table>
  </form>

  </body>
  </html>

This component is a modified version of the example login script included with the Apache::AuthCookie distribution.

The action used for this form, /AuthCookieLoginSubmit, is configured as part of your AuthCookie configuration in your httpd.conf file.

That's about all it takes to glue Apache::AuthCookie and Mason together. The rest of authentication and authorization is handled by configuring mod_perl to use Apache::AuthCookie to protect anything on your site that needs authorization. A very simple configuration might include the following directives:


  PerlSetVar MasonBookLoginScript /AuthCookieLoginForm.comp

  <Location /AuthCookieLoginSubmit>
    AuthType MasonBook::AuthCookieHandler
    AuthName MasonBook
    SetHandler  perl-script
    PerlHandler MasonBook::AuthCookieHandler->login
  </Location>

  <Location /protected>
    AuthType MasonBook::AuthCookieHandler
    AuthName MasonBook
    PerlAuthenHandler MasonBook::AuthCookieHandler->authenticate
    PerlAuthzHandler  MasonBook::AuthCookieHandler->authorize
    require valid-user
  </Location>

The MasonBook::AuthCookieHandler module would look like this:

  package MasonBook::AuthCookieHandler;

  use strict;

  use base qw(Apache::AuthCookie);

  use Digest::SHA1;

  my $secret = "You think I'd tell you?  Hah!";

  sub authen_cred {
      my $self = shift;
      my $r = shift;
      my ($username, $password) = @_;

      # implementing _is_valid_user() is out of the scope of this chapter
      if ( _is_valid_user($username, $password) ) {
          my $session_key =
            $username . '::' . Digest::SHA1::sha1_hex( $username, $secret );
          return $session_key;
      }
  }

  sub authen_ses_key {
      my $self = shift;
      my $r = shift;
      my $session_key = shift;

      my ($username, $mac) = split /::/, $session_key;

      if ( Digest::SHA1::sha1_hex( $username, $secret ) eq $mac ) {
          return $session_key;
      }
  }

This provides the minimal interface an Apache::AuthCookie subclass needs to provide to get authentication working.
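The session key format used here (username, "::", then a digest of the username and a secret) can be exercised on its own. The sketch below is an illustration rather than part of the handler: it uses the core Digest::SHA module in place of Digest::SHA1, the secret is a placeholder, and the two function names are invented for the example.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);    # core module, used here in place of Digest::SHA1

my $secret = 'example secret';   # placeholder value for the sketch

sub make_session_key {
    my $username = shift;
    return $username . '::' . sha1_hex( $username, $secret );
}

# Returns the username if the key is intact, undef otherwise.
sub check_session_key {
    my $session_key = shift;
    my ( $username, $mac ) = split /::/, $session_key, 2;

    return undef unless defined $username && defined $mac;
    return sha1_hex( $username, $secret ) eq $mac ? $username : undef;
}

my $key = make_session_key('fifi');
print defined check_session_key($key) ? "valid\n" : "invalid\n";

# Swapping the username without knowing the secret breaks the digest.
( my $forged = $key ) =~ s/^fifi/santa/;
print defined check_session_key($forged) ? "valid\n" : "invalid\n";
```

The first check succeeds and the second fails, because the digest no longer matches the altered username.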

Doing It My Way (Thanks Frank)

But what if you don't want to use Apache::AuthCookie? For example, your site may need to work without using cookies. No doubt this was exactly what Frank Sinatra was thinking about when he sang "My Way," so let's do it our way.

First, we will show an example authentication system that only uses Mason and passes the authentication token around via the URL (actually, via a session).

This example assumes that we already have some sort of session system that passes the session id around as part of the URL, as discussed previously.

We start with a quick login form. We will call this component login_form.html:


  <%args>
   $username => ''
   $password => ''
   $redirect_to => ''
   @errors => ()
  </%args>
  <html>
  <head>
  <title>Mason Book Login</title>
  </head>

  <body>

  % if (@errors) {
  <h2>Errors</h2>
  %   foreach (@errors) {
  <b><% $_ | h %></b><br>
  %   }
  % }

  <form action="login_submit.html">
  <input type="hidden" name="redirect_to" value="<% $redirect_to %>">
  <table align="left">
   <tr>
    <td align="right"><b>Login:</b></td>
    <td><input type="text" name="username" value="<% $username %>"></td>
   </tr>
   <tr>
    <td align="right"><b>Password:</b></td>
    <td><input type="password" name="password" value="<% $password %>"></td>
   </tr>
   <tr>
    <td colspan="2" align="center"><input type="submit" value="Login"></td>
   </tr>
  </table>
  </form>

  </body>
  </html>

This form uses some of the same techniques we show in Chapter 8 ("Building a Mason Site") to pre-populate the form and to handle errors.

Now let's make the component that handles the form submission. This component, called login_submit.html, will check the username and password and, if they are valid, place an authentication token into the user's session:


  <%args>
   $username
   $password
   $redirect_to
  </%args>
  <%init>
   if (my @errors = check_login($username, $password)) {
       $m->comp( 'redirect.mas',
                  path => 'login_form.html',
                  query => { errors => \@errors,
                             username => $username,
                             password => $password,
                             redirect_to => $redirect_to } );
   }
 
   $MasonBook::Session{username} = $username;
   $MasonBook::Session{token} =
       Digest::SHA1::sha1_hex( 'My secret phrase', $username );
 
   $m->comp( 'redirect.mas',
             path => $redirect_to );
  </%init>

This component simply checks (via magic hand waving) that the username and password are valid and if they are, it generates an authentication token, which is added to the user's session. To generate this token, we take the username, which is also in the session, and combine it with a secret phrase. We then generate a MAC from those two things.

The authentication and authorization check looks like this:


  if ( $MasonBook::Session{token} ) {
      if ( $MasonBook::Session{token} eq
           Digest::SHA1::sha1_hex( 'My secret phrase',
                                   $MasonBook::Session{username} ) ) {

          # ... valid login, do something here
      } else {
          # ... someone is trying to be sneaky!
      }
  } else { # no token
       my $wanted_page = $r->uri;
       
       # Append query string if we have one.
       $wanted_page .= '?' . $r->args if $r->args;

       $m->comp( 'redirect.mas',
                  path => '/login/login_form.html',
                  query => { redirect_to => $wanted_page } );
  }

We could put all the pages that require authorization in a single directory tree and have a top-level autohandler in that tree do the check. If there is no token to check, then we redirect the browser to the login page, and after a successful login they'll return, assuming that they submit valid login credentials.

Access Controls With Attributes

The components we saw previously assumed that there are only two access levels, unauthenticated and authenticated. A more complicated version of this code might involve checking that the user has a certain access level or role.

In that case, we'd first check that we had a valid authentication token and then go on to check that the user actually had the appropriate access rights. This is simply an extra step in the authorization process.

Using attributes, we can easily define access controls for different portions of our site. Let's assume that we have four access levels, "Guest," "User," "Editor" and "Admin." Most of the site is public, and viewable by anyone. Some parts of the site require a valid login, while some require a higher level of privilege.

We implement our access check in our top-level autohandler, /autohandler, from which all other components must inherit in order for the access control code to be effective.



  <%init>
   my $user = get_user();  # again, hand waving
 
   my $required_access = $m->base_comp->attr('required_access');
 
   unless ( $user->has_access_level($required_access) ) {
      # ... do something like send them to another page
   }
 
   $m->call_next;
  </%init>
  <%attr>
   required_access => 'Guest'
  </%attr>

It is crucial that we set a default access level in this autohandler. By doing this, we are saying that by default, all components are accessible by all people, since every visitor will have at least "Guest" access.

We can override this default elsewhere. For example, in a component called /admin/autohandler, we might have:


  <%attr>
   required_access => 'Admin'
  </%attr>

As long as all the components in the /admin directory inherit from the /admin/autohandler component and don't override the required_access attribute, we have effectively limited that directory (and its subdirectories) to administration users only. If we, for some reason, had an individual component in the directory that we wanted editors to be able to see, we could simply set the "required_access" attribute for that component to "Editor."
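The user object's has_access_level() method is left to hand waving above. One plausible way to implement the comparison, assuming the four levels are strictly ordered from "Guest" up to "Admin", is a simple rank lookup. This sketch uses a plain function rather than a method, and the %rank table is an assumption for illustration.

```perl
use strict;
use warnings;

# Assumed ordering, lowest to highest privilege.
my %rank = ( Guest => 0, User => 1, Editor => 2, Admin => 3 );

sub has_access_level {
    my ( $user_level, $required ) = @_;

    die "Unknown access level\n"
        unless exists $rank{$user_level} && exists $rank{$required};

    return $rank{$user_level} >= $rank{$required};
}

for my $required (qw( Guest Editor Admin )) {
    my $ok = has_access_level( 'Editor', $required ) ? 'allowed' : 'denied';
    print "Editor requesting $required page: $ok\n";
}
```

An editor is allowed onto "Guest" and "Editor" pages and denied on "Admin" pages. Sites whose roles do not form a strict hierarchy would need a different check.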

Managing DBI Connections

Not infrequently, we see people on the Mason users list asking questions about how to handle caching DBI connections.

Our recipe for this is really simple:


  use Apache::DBI;

Rather than reinventing the wheel, use Apache::DBI, which provides the following features:

  • It is completely transparent to use. Once you've loaded it, you simply call DBI->connect() as always and Apache::DBI gives you an existing handle if one is available.

  • It makes sure that the handle is live, so that if your RDBMS goes down and then back up, your connections still work just fine.

  • It does not cache handles made before Apache forks, as many DBI drivers do not support using a handle after a fork.
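To make the first two points concrete, here is a toy sketch of the general idea: cache a handle per set of connection arguments and replace it when a liveness check fails. This is not Apache::DBI's actual code, and it does not touch DBI at all. The ToyHandle class and cached_connect() function are invented for illustration.

```perl
use strict;
use warnings;

# Toy handle class standing in for a real database handle.
package ToyHandle;
sub new  { my ( $class, %args ) = @_; return bless { alive => 1, %args }, $class }
sub ping { return $_[0]{alive} }

package main;

my %cache;

# Return a cached handle for these arguments, reconnecting if it has died.
sub cached_connect {
    my @args = @_;
    my $key  = join "\x1f", map { defined $_ ? $_ : '' } @args;

    my $h = $cache{$key};
    return $h if $h && $h->ping;

    return $cache{$key} = ToyHandle->new( dsn => $args[0] );
}

my $first  = cached_connect( 'dbi:Toy:test', 'user', 'pw' );
my $second = cached_connect( 'dbi:Toy:test', 'user', 'pw' );
print $first == $second ? "reused\n" : "new handle\n";

$first->{alive} = 0;    # simulate the database going away
my $third = cached_connect( 'dbi:Toy:test', 'user', 'pw' );
print $first == $third ? "reused\n" : "new handle\n";
```

The second call reuses the first handle, and the third call gets a fresh one because the cached handle no longer answers the ping.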

Generating Config Files

Config files are a good candidate for generation by Mason. For example, your production and staging Web server config files might differ in only a few areas. Changes to one usually will need to be propagated to another. This is especially true with mod_perl, where Web server configuration can basically be part of a Web-based application.

On top of this, you may decide to set up a per-developer environment, either by having each developer run the necessary software on their own machine, or by starting Web servers on many different ports on a single development server. In this scenario, a template-driven config file generator becomes even more appealing.

Here's a simple script to drive this generation. This script assumes that all the processes are running on one shared development machine.


  #!/usr/bin/perl -w

  use strict;

  use Cwd;
  use File::Spec;
  use HTML::Mason;
  use User::pwent;

  my $comp_root =
      File::Spec->rel2abs( File::Spec->catfile( cwd(), 'config' ) );

  my $output;
  my $interp =
      HTML::Mason::Interp->new( comp_root  => $comp_root,
                                out_method => \$output,
                              );

  my $user = getpwuid($<);

  $interp->exec( '/httpd.conf.mas', user => $user );

  my $file =  File::Spec->catfile( $user->dir, 'etc', 'httpd.conf' );
  open FILE, ">$file" or die "Cannot open $file: $!";
  print FILE $output;
  close FILE;

An httpd.conf.mas component might look like this:


  ServerRoot <% $user->dir %>

  PidFile <% File::Spec->catfile( $user->dir, 'logs', 'httpd.pid' ) %>

  LockFile <% File::Spec->catfile( $user->dir, 'logs', 'httpd.lock' ) %>

  Port <% $user->uid + 5000 %>

  # loads Apache modules, defines content type handling, etc.
  <& standard_apache_config.mas &>

  <Perl>
   use lib '<% File::Spec->catfile( $user->dir, 'project', 'lib' ) %>';
  </Perl>

  DocumentRoot <% File::Spec->catfile( $user->dir, 'project', 'htdocs' ) %>

  PerlSetVar MasonCompRoot <% File::Spec->catfile( $user->dir, 'project', 'htdocs' ) %>
  PerlSetVar MasonDataDir <% File::Spec->catfile( $user->dir, 'mason' ) %>

  PerlModule HTML::Mason::ApacheHandler

  <FilesMatch "\.html$">
   SetHandler perl-script
   PerlHandler HTML::Mason::ApacheHandler
  </FilesMatch>

  <%args>
  $user
  </%args>

This points the server's document root to the developer's working directory. Similarly, it adds the project/lib directory to Perl's @INC via use lib so that the user's working copy of the project's modules is seen first. The server will listen on a port equal to the user's user ID, plus 5,000.
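The per-developer values the template derives can be computed and inspected without Mason. The following sketch assumes a made-up helper, dev_server_settings(), and takes the home directory and user ID as plain arguments instead of a User::pwent object.

```perl
use strict;
use warnings;
use File::Spec;

# Compute the per-developer values the template fills in.
sub dev_server_settings {
    my ( $home, $uid ) = @_;

    return (
        port          => $uid + 5000,
        pid_file      => File::Spec->catfile( $home, 'logs', 'httpd.pid' ),
        document_root => File::Spec->catdir( $home, 'project', 'htdocs' ),
        lib_dir       => File::Spec->catdir( $home, 'project', 'lib' ),
    );
}

# Example values only; the home directory and user ID are invented.
my %s = dev_server_settings( '/home/fifi', 1042 );
print "$_ = $s{$_}\n" for sort keys %s;
```

One limit of the port scheme is worth keeping in mind: TCP ports stop at 65535, so a user ID above 60535 would produce an invalid port number.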

Obviously, this is an incomplete example. It doesn't specify where logs will go, or other necessary config items. It also doesn't handle generating the config file for a server intended to be run by the root user on a standard port.

If You Want More ...

These recipes were adapted from Chapter 11, "Recipes," of Embedding Perl in HTML With Mason. And, of course, the book contains a lot more than just recipes. If you're interested in learning more about Mason, the book is a great place to start.

Also, don't forget to check out the Mason HQ site at www.masonhq.com/, which contains online documentation, user-contributed code and docs, and links to the Mason users mailing list, which is another great resource for developers using Mason.


O'Reilly & Associates recently released (October 2002) Embedding Perl in HTML with Mason.

This week on Perl 6 (12/02-08, 2002)

Another Monday evening. Another summary to write.

Starting, as is becoming tediously predictable, with perl6-internals.

Another JIT Discussion

Toward the end of the previous week, Leopold Tötsch posted something about the latest round of changes to the JIT core. Daniel Grunblatt was concerned that the current JIT wasn't doing the correct thing when it came to hardware register allocation and wanted to remove some conditional logic. Leo didn't agree at first, but became convinced and Daniel's requested change was applied.

http://groups.google.com/groups

Fun With IMCC

Many things happened with IMCC this week:

  • David Robins posted a list of minor niggles (For instance, it turns out you can't ret early from a .sub) and suggested some remedies. Leo Tötsch mentioned that the IMCC Cabal (which would consist of Melvin Smith, Sean O'Rourke, Angel Faus and Leo if there were a Cabal. But, as everyone knows, There Is No Cabal) has been discussing several of these issues.

    http://groups.google.com/groups

    http://groups.google.com/groups

  • Art Haas had problems building IMCC; apparently bison didn't like the imcc.y file. Leo tracked down the problem (when asked a second time; I think he might be slipping) and checked in a working fix.
  • Leo Tötsch made a pile of changes to IMCC to eliminate clashes between Parrot's PASM language and IMCC's PIR syntax, which had made it hard to mix the two. Full details of the changes are in his post.

    Gopal V wondered whether there was any way of feeding code to IMCC beyond simply writing to a file and running IMCC. He'd had to make a bunch of changes to the IMCC files that he used, and wondered whether there was a better way. Actually, he didn't so much wonder as propose the aforementioned better way, lifting ideas from DotGNU's treecc. He and Leo discussed things, worked out an interface and Gopal went off to implement something. (Yay Gopal!)

    http://groups.google.com/groups

    http://groups.google.com/groups

  • Steve Fink posted a patch implementing a first cut at namespace support in IMCC. He wasn't at all sure that what he'd implemented was the right thing, but it supplied what he needed for the time being (if that makes sense) in the regex engine. Leo reckoned that it looked OK, and promised to apply it if nobody hollered. He also pointed out some problems with the current regex implementation to do with re-entrancy and memory leakage. It turns out that Steve was working on languages/regex rather than the rx_* ops, which are the ones that have the problems.

    http://groups.google.com/groups

  • Gregor N. Purdy had some problems with IMCC's syntax; a fragment of code that he thought should work fell over in a heap. Both Mr. Nobody and Leo pointed out that IMCC expects subroutines, and you should wrap your code in a .sub/.end pair.

    Once Gregor had that straight, he posted a Jako program and the IMCC code he thought the Jako compiler should generate from it and asked for any feedback before he went to change the compiler. Leo Tötsch provided some (I assume) useful feedback.

    A little later Gregor posted again, and he was still having problems with IMCC not quite behaving as he wanted for the Jako compiler. He and Leo thrashed it out over a few messages and, to cut a long story short, IMCC looks like it won't be changing. I'm not sure whether Gregor is happy about this ... .

    http://groups.google.com/groups

    http://groups.google.com/groups

  • Mr. Nobody posted a patch to get IMCC to compile under Windows. Apparently, the OUT label clashes with something in the Windows header files. The patch was applied.
  • Gregor N. Purdy got a little confused by how IMCC generates PASM code, and posted some sample code, interspersed with questions, which Leo answered. It's worth looking at this; it shows off the kind of optimization that IMCC gets up to.

    http://groups.google.com/groups

    http://groups.google.com/groups

PMCs Are the Thing

Dan announced that he's finally stopped waffling and frozen the PMC structures 'modulo the odd twiddling to it.' He's added a pmc.ops file, and has started adding in ops to manipulate PMC internals. Leo asked for some clarifications, got some, and then wondered what the final 'Parrot Object' would look like.

http://groups.google.com/groups

logical_not Issue

David Robins is having fun with logical_not and Ruby. The issue is that all integers in Ruby are true, whether or not they are zero, but that conflicts with some of the assumptions in other PMCs. David offered three suggestions for how to fix it. Dan noted that it's an issue for Perl 6, too, since the truth or otherwise of a value can be modified by a property of that value; coming up with a fix is on his to-do list. David wondered whether this had been discussed before and offered another possible way forward. Dan half liked the idea but noted that the approach didn't work for and, or and xor, at least where Perl is concerned.

http://groups.google.com/groups
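For readers who don't have the truth rules at their fingertips, here is a small Perl 5 sketch of the behaviour that Ruby's integers clash with. It says nothing about how the PMCs implement logical_not; it only shows which sample values Perl 5 itself treats as false.

```perl
use strict;
use warnings;

# Perl 5 treats 0, '0' and the empty string as false (undef too).
# Strings such as '0.0' and '00' are true. In Ruby, by contrast, 0 is true.
for my $value (0, 1, '0', '', '0.0', '00') {
    printf "%-6s is %s\n", "'$value'", ($value ? 'true' : 'false');
}
```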

The Language That Dare Not Speak Its Name

After a fortnight, during which nobody made any comment on Leon Brocard's patch adding a brainfuck subdirectory to the languages directory, Nicholas Clark committed it in CVS.

At which point, Andy Dougherty spoke up to say he wasn't happy about it, saying that he didn't wish to be associated with needlessly crude and offensive language. After some further debate, the subdirectory was renamed to bf in such a way that, if you ask CVS it will tell you that the brainfuck subdirectory does not exist now and never has existed. Which seems strangely appropriate somehow. Fnord.

http://groups.google.com/groups

http://groups.google.com/groups

Parrot Organization

Michael Collins asked about the structure of the Parrot development organization, and Dan provided some answers. My favorite Q&A:

Q. Is there any formal structure to this organization?

A. I [Dan] delude myself into thinking I'm more or less in charge ... .

http://groups.google.com/groups

http://groups.google.com/groups

Just When You Thought It Was Safe ...

to start using long file names with impunity, Mr. Nobody pointed out that a bunch of the files in the Parrot repository didn't play well with the 8.3 naming rules of the MS-DOS 'filesystem'. "So what?" asked Aldo Calpini. Mr. Nobody asked if DOS was an intended compilation target. Answer: "No". The consensus appears to be "Ha! We laugh at your crappy filename restrictions and will not be jumping through any hoops to deal with a faintly silly hypothetical target." Or maybe that's just my opinion dressed up as consensus. Ah well, if I'm wrong, then I'm sure someone will tell me.

http://groups.google.com/groups

Meanwhile, Over in perl6-language

The perl6-documentation team have been discussing string literals, and their discussion spilled over into perl6-language, as there are several things about them that are undefined and needed discussing by the language crowd. It's all to do with how octal numbers and octal string escapes are specified. Essentially, people don't like the current Perl 5/C style 0101 (octal number) and \101 (octal string escape), so James supplied a list of the other possibilities. (The current numeric literals doc says that 0c101 designates an octal numeric literal, but then the consistent extension to string literals (\c101) clashes with the current method of specifying a control character.) After some debate, Larry pulled one of his gloriously clear posts out of the bag, sketching the issues and coming up with a straightforward and obvious (but only with hindsight) way forward. It's good to be reminded why we trust Larry. Anyway, it turns out that an octal number will be specified using 0o101 and an octal character escape will probably be one of \0o[101] or \c[0o101] (I like the second better...)

http://groups.google.com/groups

http://groups.google.com/groups
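As a reminder of what the Perl 5/C style looks like in practice, here is a short Perl 5 sketch of the two notations under discussion. It shows only existing Perl 5 behaviour, not any of the proposed Perl 6 forms.

```perl
use strict;
use warnings;

my $number = 0101;         # leading zero makes this an octal literal: decimal 65
my $char   = "\101";       # octal string escape: character 65, "A" in ASCII
my $parsed = oct("101");   # oct() reads a string of octal digits at run time

print "$number $char $parsed\n";   # prints "65 A 65"
```

The leading zero is easy to miss when reading code, which is a large part of why people wanted something more visible.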

Thinking about \c[...]

Once Larry had pulled \c[0o101] out of the bag, it fell to David Whipp to wonder what you could get up to with it. For instance, could you do: print "\c[72, 101, 108, 108, 111]" and have that print "Hello"? Damian pointed out that Larry had already discussed some of this in Apocalypse 5, but that the separator character would probably be the semicolon. Then Nicholas Clark got evil, and wondered about "\c[$(call_a_func())]", but Damian seemed to think that wouldn't be such a good idea.

http://groups.google.com/groups
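For comparison, this is roughly how you would get the same effect in Perl 5 today, where there is no list form of the escape. It is a sketch of the existing chr and pack built-ins, not of the proposed Perl 6 syntax; note that "H" is character code 72.

```perl
use strict;
use warnings;

my @codes = (72, 101, 108, 108, 111);

my $with_chr  = join '', map { chr } @codes;   # chr defaults to $_
my $with_pack = pack 'C*', @codes;             # same result for codes below 256

print "$with_chr $with_pack\n";   # prints "Hello Hello"
```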

Purge: Opposite of Grep

Miko O'Sullivan suggested a purge command that would be to grep as unless is to if. Nobody seemed to like the name that much, though most seemed to think the idea was sound. Michael Lazzaro suggested divvy could be used to break a list into multiple lists (he initially proposed just breaking the list into two lists, but others extended the idea to more). Damian didn't like the name, and initially proposed classify, which would return a list of array references. Discussion continued for a while until Ralph Mellor suggested part as the name for this putative function, which Damian leapt on with a glad cry.

This went on for a while, with extra features being proposed and other explorations of the possibilities including some rather nifty proposed shorthand/DWIMmery.

Meanwhile, Ken Fox wondered why we couldn't just implement classify/part/divvy as a normal sub and why everything had to be built into the first version of Perl 6. So Damian implemented it, but commented that "then the thousands of people who are apparently clamoring for this functionality and who would have no hope of getting the above correct, would have to pull in some module every time they wanted to partition an array." Ken was impressed, and asked for some commentary on how it all worked, which Damian provided. BTW, this code is really worth looking at for an example of the kind of power that Perl 6 will provide.

David Wheeler wasn't over keen on calling the function 'part' because part has so many different possible interpretations. It turns out that that's why Damian likes the name so much.

http://groups.google.com/groups

http://groups.google.com/groups -- classify

http://groups.google.com/groups

http://groups.google.com/groups -- Damian implements part

http://groups.google.com/groups -- Damian implements part with sane formatting

http://groups.google.com/groups -- Damian explains it all
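To give a flavour of the idea without any Perl 6, here is a minimal Perl 5 sketch of a part-like function. It is my own stand-in, not Damian's implementation, and it assumes the simplest reading of the proposal: the block returns an index, and each item is pushed onto the sublist with that index.

```perl
use strict;
use warnings;

# Hypothetical Perl 5 stand-in for the proposed function. The block is
# called once per item (with the item in $_) and returns the index of the
# sublist that the item belongs to. The result is a list of array refs.
sub part(&@) {
    my ($selector, @items) = @_;
    my @parts;
    for (@items) {
        my $index = $selector->();
        push @{ $parts[$index] }, $_;
    }
    return @parts;
}

# $_ % 2 is 0 for even numbers and 1 for odd ones.
my ($even, $odd) = part { $_ % 2 } 1 .. 10;
print "even: @$even\n";   # even: 2 4 6 8 10
print "odd:  @$odd\n";    # odd:  1 3 5 7 9
```

Which rather supports Ken's point that it can live in a module, and Damian's that not everyone would enjoy writing it from scratch.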

In Defense of Zero-Indexed Arrays

Michael G. Schwern asked people to 'explain how having indexes in Perl 6 start at zero will benefit most users. Do not invoke legacy.' Answers ranged from the silly to the sincere. The best answer was "Because I [Larry] like it" which, I think, trumps everyone.

http://groups.google.com/groups

http://groups.google.com/groups

Stringification of References

Joseph F. Ryan kicked off a discussion of the stringification of objects and references and offered his suggestions. Joseph leans toward having the default stringifications of objects and references provide information useful to the programmer. I agree with him (so, if you spot any bias in the upcoming summary that'd be because I'm biased). Michael Lazzaro explicitly brought up the distinction between "stringification for output" and "stringification for debugging", and came down in favor of stringification for output (heck, he even wanted references to be invisible to stringification). Piers Cawley told him he was wrong and appealed to the authority of Kent Beck (a Smalltalk and Java programmer, possibly not the best authority to choose). Michael then proposed a scheme involving subclasses of String, to provide cues for different stringifications, which John Siracusa thought was going rather a long way too far, coming down in favour of the "stringify for debugging" position. I'm not sure anything has actually been decided yet though. Tune in next week.

http://groups.google.com/groups

Outline of Class Definitions in Perl 6

Simon Cozens asked for a translation of some Perl 5 style OO code into Perl 6, and Luke Palmer had a go at the task, then Larry came through with something a little more definitive (but not actually definitive just yet, I get the feeling that a few things are still in flux ... .)

http://groups.google.com/groups

http://groups.google.com/groups

Perl 6 and Set Theory

Luke Palmer posted a fascinating document presenting a "new way of thinking about some constructs" and proposed some changes to help with consistency. The document covered junctions and classes, recasting them as representations of finite and infinite sets. Only Damian responded, with a few corrections and clarifications, noting that one of Luke's proposed changes was rather fundamental, and that he wasn't sure he wanted to make that change without some deep reflection (from someone) on how that would affect the junction types that Luke hadn't considered. Discussion continues.

http://groups.google.com/groups

In Brief

Steve Fink is toying with adding OpenGL ops to Parrot.

Leon Brocard has used the native call interface features to add curses support to Parrot and offered a version of life.pasm that makes use of it.

The perl6-language crowd are currently working on string literals and stringification.

Who's Who in Perl 6?

I'm bumping this one up the questionnaire queue slightly. I felt the need for some controversy.

Who are you?
Abigail
What do you do for/with Perl 6?
Nothing, except for disliking languages that are white space sensitive.
Where are you coming from?
I've been coding Perl since 1995. Joined p5p in 1996 or so.
When do you think Perl 6 will be released?
A usable release? Given the current rate in which apocalypses are produced, I'd say 2008. Give or take a few years.
Why are you doing this?
I adore Perl. Perl5 that is. Programming in Perl5 is like exploring a large medieval castle, surrounded by a dark, mysterious forest, with something new and unexpected around each corner. There are dragons to be conquered, maidens to be rescued, and holy grails to be quested for. Lots of fun. Perl6 looks like a Louis-XVI castle and garden to me. Straight, symmetric, and bright. There are wigs to be powdered, minuets to be danced, all quite boring. I haven't been impressed by new features yet, but I'm disappointed by what will be lost.
You have 5 words. Describe yourself.
My mind is twisted. Backwards.
Do you have anything to declare?
One of the great things about Perl5 is that I don't have to declare anything I don't want to.

Acknowledgements

Another week of writing on the train and, for a change of scenery, at my parents' house, fuelled, as usual by large amounts of tea.

Proofreading was once again down to Aspell and me. Any errors this week are probably my fault, it's about time I started accepting my responsibilities.

Thanks to everyone who has sent me questionnaire answers, I've got a queue of about four left so, if you work with Perl 6 (or, like Abigail, hate it) please answer the same set of questions Abigail just answered and send them to me at mailto:5Ws@bofh.org.uk. Thanks.

I got some mail last week from someone praising me for the summaries (thanks), but wanting to know how he could contribute his time and energy, so this week the chorus has a few extra lines in it:

If you didn't like the summary, what are you doing still reading it? If you did like it, please consider one or more of the following options:

The fee paid for publication of these summaries on perl.com is paid directly to the Perl Foundation.

Improving mod_perl Sites' Performance: Part 5

Sharing Memory

As we learned in the previous article, mod_perl gives us a huge speed increase, but we pay the price with a big memory footprint; sharing memory between the processes helps us reduce that cost. I presented a few techniques to save memory by trying to share more of it. In this article, we will see other techniques allowing you to save even more memory.

Preloading Registry Scripts at Server Startup

What happens if you find yourself stuck with Perl CGI scripts and you cannot or don't want to move most of the stuff into modules to benefit from module preloading, so that the code is shared by the children? Luckily, you can preload scripts as well. This time the Apache::RegistryLoader module comes to your aid. Apache::RegistryLoader compiles Apache::Registry scripts at server startup.

For example, to preload the script /perl/test.pl, which is in fact the file /home/httpd/perl/test.pl, you would do the following:


  use Apache::RegistryLoader ();
  Apache::RegistryLoader->new->handler("/perl/test.pl",
                            "/home/httpd/perl/test.pl");

You should put this code either into a <Perl> section or into a startup script.

But what if you have a bunch of scripts located under the same directory and you don't want to list them one by one? Put Perl modules to good use: the File::Find module will do most of the work for you.

The following code walks the directory tree under which all Apache::Registry scripts are located. For each file it encounters with the extension .pl, it calls the Apache::RegistryLoader::handler() method to preload the script in the parent server, before pre-forking the child processes:


  use File::Find qw(finddepth);
  use Apache::RegistryLoader ();
  {
    my $scripts_root_dir = "/home/httpd/perl/";
    my $rl = Apache::RegistryLoader->new;
    finddepth
      (
       sub {
         return unless /\.pl$/;
         my $url = "$File::Find::dir/$_";
         $url =~ s|$scripts_root_dir/?|/|;
         warn "pre-loading $url\n";
         # preload $url
         my $status = $rl->handler($url);
         unless($status == 200) {
           warn "pre-load of `$url' failed, status=$status\n";
         }
       },
       $scripts_root_dir);
  }

Note that I didn't use the second argument to handler() here, as I did in the first example. To make the loader smarter about the URI-to-filename translation, you might need to provide a trans() function to translate the URI to a filename. URI-to-filename translation normally doesn't happen until HTTP request time, so the module is forced to roll its own translation. If the filename is omitted and a trans() function is not defined, the loader will try using the URI relative to ServerRoot.

A simple trans() function can look like this:


  sub mytrans {
    my $uri = shift;
    $uri =~ s|^/perl/|/home/httpd/perl/|;
    return $uri;
  }

You can easily derive the right translation by looking at the Alias directive. The above mytrans() function matches our Alias:


  Alias /perl/ /home/httpd/perl/

After defining the URI to filename translation function, you should pass it during the creation of the Apache::RegistryLoader object:


  my $rl = Apache::RegistryLoader->new(trans => \&mytrans);
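Because the translation is plain string manipulation, you can check it outside the server before wiring it into startup.pl. The following is a minimal sketch: mytrans() is the function shown above, while file_to_uri() and the two sample URIs are hypothetical additions used only to test the round trip.

```perl
use strict;
use warnings;

# URI-to-filename translation matching "Alias /perl/ /home/httpd/perl/".
sub mytrans {
    my $uri = shift;
    $uri =~ s|^/perl/|/home/httpd/perl/|;
    return $uri;
}

# Hypothetical inverse, used here only to verify the round trip.
sub file_to_uri {
    my $file = shift;
    $file =~ s|^/home/httpd/perl/|/perl/|;
    return $file;
}

# Made-up sample URIs; no files need to exist for this check.
for my $uri ('/perl/test.pl', '/perl/admin/report.pl') {
    my $file = mytrans($uri);
    print "$uri -> $file\n";
    die "round trip failed for $uri" unless file_to_uri($file) eq $uri;
}
```

If the Alias ever changes, this is the one place where the two paths have to be kept in step.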

I won't show any benchmarks here, since the effect is absolutely the same as with preloading modules.

Modules Initializing at Server Startup

We have just learned that it's important to preload modules and scripts at server startup. It turns out that this is not enough for some modules: you have to prerun their initialization code to get more memory pages shared. You will usually find information about this in the respective modules' manpages. I will present a few examples of widely used modules whose code can be initialized.

Initializing DBI.pm

The first example is the DBI module. As you know, DBI works with many database drivers in the DBD:: namespace, such as DBD::mysql. It's not enough to preload DBI; you should initialize DBI with the driver(s) that you are going to use (usually a single driver is used) if you want to minimize memory use after forking the child processes. Note that you want to do this only under mod_perl and other environments where shared memory is important. In other circumstances, you shouldn't initialize drivers.

You probably know already that under mod_perl you should use the Apache::DBI module to make the connection persistent, unless you want to open a separate connection for each user -- in which case, you should not use this module. Apache::DBI automatically loads DBI and overrides some of its methods, so you should continue coding just as though you were simply using the DBI module.

Just as with module preloading, our goal is to find the startup environment that leads to the smallest "difference" between the shared and normal memory reported, which means a smaller total memory usage.

And again, in order to make it easy to measure, I will use only one child process. To do this, I will use these settings in httpd.conf:


  MinSpareServers 1
  MaxSpareServers 1
  StartServers 1
  MaxClients 1
  MaxRequestsPerChild 100

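I'm going to run memory benchmarks on five different versions of the startup.pl file: the four options below, plus a fifth that combines options 2 and 4 (install_driver together with connect_on_init). I always preload these modules: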


  use GTop ();
  use Apache::DBI (); # preloads DBI as well

option 1
Leave the file unmodified.

option 2
Install the MySQL driver (I will use the MySQL RDBMS for our test):

  DBI->install_driver("mysql");

It's safe to use this method, since just like with use(), if it can't be installed it'll die().

option 3
Preload the MySQL driver module:

  use DBD::mysql;

option 4
Tell Apache::DBI to connect to the database when the child process starts (ChildInitHandler). No driver is preloaded before the child gets spawned!

  Apache::DBI->connect_on_init('DBI:mysql:test::localhost',
                             "",
                             "",
                             {
                              PrintError => 1, # warn() on errors
                              RaiseError => 0, # don't die on error
                              AutoCommit => 1, # commit executes
                              # immediately
                             }
                            )
  or die "Cannot connect to database: $DBI::errstr";

Here is the Apache::Registry test script that I have used:


  preload_dbi.pl
  --------------
  use strict;
  use GTop ();
  use DBI ();
    
  my $dbh = DBI->connect("DBI:mysql:test::localhost",
                         "",
                         "",
                         {
                          PrintError => 1, # warn() on errors
                          RaiseError => 0, # don't die on error
                          AutoCommit => 1, # commit executes
                                           # immediately
                         }
                        )
    or die "Cannot connect to database: $DBI::errstr";
  
  my $r = shift;
  $r->send_http_header('text/plain');
  
  my $do_sql = "show tables";
  my $sth = $dbh->prepare($do_sql);
  $sth->execute();
  my @data = ();
  while (my @row = $sth->fetchrow_array){
    push @data, @row;
  }
  print "Data: @data\n";
  $dbh->disconnect(); # NOP under Apache::DBI
  
  my $proc_mem = GTop->new->proc_mem($$);
  my $size  = $proc_mem->size;
  my $share = $proc_mem->share;
  my $diff  = $size - $share;
  printf "%8s %8s %8s\n", qw(Size Shared Diff);
  printf "%8d %8d %8d (bytes)\n",$size,$share,$diff;

The script opens a connection to the database 'test' and issues a query to learn what tables the database has. When the data is collected and printed, the connection would normally be closed, but Apache::DBI overrides disconnect() with an empty method. After the data is processed, some code to print the memory usage follows -- this should already be familiar to you.

The server was restarted before each new test.

So here are the results of the five tests that were conducted, sorted by the Diff column:

  1. After the first request:
    
      Version     Size   Shared     Diff        Test type
      --------------------------------------------------------------------
            1  3465216  2621440   843776  install_driver
            2  3461120  2609152   851968  install_driver & connect_on_init
            3  3465216  2605056   860160  preload driver
            4  3461120  2494464   966656  nothing added
            5  3461120  2482176   978944  connect_on_init
  2. After the second request (all the subsequent requests showed the same results):
    
      Version     Size   Shared    Diff         Test type
      --------------------------------------------------------------------
            1  3469312  2609152   860160  install_driver
            2  3481600  2605056   876544  install_driver & connect_on_init
            3  3469312  2588672   880640  preload driver
            4  3477504  2482176   995328  nothing added
            5  3481600  2469888  1011712  connect_on_init

So what do we conclude from these numbers? First, we see that the memory footprint for a given request settles only after the second request (if you pass different arguments, the memory usage will probably be different).

But both tables show the same pattern of memory usage. We can clearly see that the real winner is the version of startup.pl in which the MySQL driver was installed (1). Since we want to have a connection ready for the first request made to the freshly spawned child process, we generally use the second version (2), which uses somewhat more memory, but has almost the same number of shared memory pages. The third version only preloads the driver, resulting in less shared memory. The last two versions have nothing initialized (4) and only the connect_on_init() method used (5). The former is a little better than the latter, but both are significantly worse than the first two versions.

To remind you why we look for the smallest value in the Diff column, recall the real memory usage formula:


  RAM_dedicated_to_mod_perl = diff * number_of_processes
                            + the_processes_with_largest_shared_memory

Notice that the smaller the diff is, the bigger the number of processes you can have using the same amount of RAM. Therefore, every 100K difference counts when you multiply it by the number of processes. If we take the numbers from versions (1) and (5) of the second table and assume that we have 256MB of memory dedicated to mod_perl processes, we get the following numbers using a formula derived from the one above:


               RAM - largest_shared_size
  N_of Procs = -------------------------
                        Diff

                268435456 - 2609152
  (ver 1)  N =  ------------------- = 309
                      860160

                268435456 - 2469888
  (ver 5)  N =  ------------------- = 262
                     1011712

So you can see the difference: about 18 percent more child processes in the first version (309 vs. 262).
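If you want to plug your own measurements into the formula, a few lines of Perl will do the arithmetic. This is only a sketch: the function name max_processes is mine, and the figures are the ones from the second table above.

```perl
use strict;
use warnings;

# Number of child processes that fit into the given amount of RAM:
# (RAM - largest_shared_size) / Diff, rounded down.
sub max_processes {
    my ($ram, $largest_shared, $diff) = @_;
    return int( ($ram - $largest_shared) / $diff );
}

my $ram = 256 * 1024 * 1024;   # 268435456 bytes

my $install_driver  = max_processes($ram, 2609152,  860160);   # version 1
my $connect_on_init = max_processes($ram, 2469888, 1011712);   # version 5

printf "install_driver:  %d processes\n", $install_driver;     # 309
printf "connect_on_init: %d processes\n", $connect_on_init;    # 262
printf "gain: %.1f%%\n",
    100 * ($install_driver - $connect_on_init) / $connect_on_init;
```

Rounding down is deliberate: a process that only partly fits in memory is a process that swaps.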

Initializing CGI.pm

CGI.pm is a big module that by default postpones the compilation of its methods until they are actually needed, thus making it possible to use it under a slow mod_cgi handler without adding a big overhead. That's not what we want under mod_perl, and if you use CGI.pm you should precompile the methods that you are going to use at the server startup in addition to preloading the module. Use the compile method for that:


  use CGI;
  CGI->compile(':all');

where you should replace the tag group :all with the real tags and group tags that you are going to use if you want to optimize the memory usage.

I'm going to compare the shared memory footprint by using a script that is backward compatible with mod_cgi. You will see that you can improve the performance of this kind of script as well, but if you really want fast code, think about porting it to use Apache::Request for the CGI interface, and some other module for HTML generation.

So here is the Apache::Registry script that I'm going to use to make the comparison:


  preload_cgi_pm.pl
  -----------------
  use strict;
  use CGI ();
  use GTop ();

  my $q = CGI->new;
  print $q->header('text/plain');
  print join "\n", map {"$_ => ".$q->param($_) } $q->param;
  print "\n";
  
  my $proc_mem = GTop->new->proc_mem($$);
  my $size  = $proc_mem->size;
  my $share = $proc_mem->share;
  my $diff  = $size - $share;
  printf "%8s %8s %8s\n", qw(Size Shared Diff);
  printf "%8d %8d %8d (bytes)\n",$size,$share,$diff;

The script initializes the CGI object, sends an HTTP header and then prints all the arguments and values that were passed to the script, if there were any. As usual, at the end, I print the memory usage.

As usual, I am going to use a single child process, using the usual settings in httpd.conf:


  MinSpareServers 1
  MaxSpareServers 1
  StartServers 1
  MaxClients 1
  MaxRequestsPerChild 100

I'm going to run memory benchmarks on three different versions of the startup.pl file. I always preload this module:


  use GTop ();

option 1
Leave the file unmodified.

option 2
Preload CGI.pm:

  use CGI ();

option 3
Preload CGI.pm and pre-compile the methods that I'm going to use in the script:

  use CGI ();
  CGI->compile(qw(header param));

The server was restarted before each new test.

So here are the results of the three tests that were conducted, sorted by the Diff column:

  1. After the first request:
    
      Version     Size   Shared     Diff        Test type
      --------------------------------------------------------------------
            1  3321856  2146304  1175552  not preloaded
            2  3321856  2326528   995328  preloaded
            3  3244032  2465792   778240  preloaded & methods+compiled
  2. After the second request (all the subsequent requests showed the same results):
    
      Version     Size   Shared    Diff         Test type
      --------------------------------------------------------------------
            1  3325952  2134016  1191936 not preloaded
            2  3325952  2314240  1011712 preloaded
            3  3248128  2445312   802816 preloaded & methods+compiled

The first version shows the results of the script execution when CGI.pm wasn't preloaded. The second version has the module preloaded. The third is when it's both preloaded and the methods that are going to be used are precompiled at the server startup.

Comparing versions one and two of the second table, we can see that preloading alone adds about 176K (2314240-2134016) to the shared size. As I mentioned at the beginning of this section, CGI.pm postpones compiling its methods in order to reduce the load overhead, so preloading the module on its own captures only part of the possible saving. If we compare the second and third versions, we see a further significant difference of 204K (1011712-802816), even though I have used only a few methods (the header method loads a few more methods transparently for the user). Imagine how much memory I would save by precompiling all the methods used by other scripts that use CGI.pm and do a little more than the script that I used in the test.

But even in our simple case using the same formula, what do we see? (assuming that I have 256MB dedicated for mod_perl)


               RAM - largest_shared_size
  N_of Procs = -------------------------
                        Diff
						
                268435456 - 2134016
  (ver 1)  N =  ------------------- = 223
                      1191936
					  
                268435456 - 2445312
  (ver 3)  N =  ------------------- = 331
                     802816

If I preload CGI.pm and precompile a few methods that I use in the test script, I can have nearly 50 percent more child processes (331 vs. 223) than when I don't preload and precompile the methods that I am going to use.

I've heard that the 3.x generation of CGI.pm will be less bloated, but it's in a beta state as of this writing.

Increasing Shared Memory With mergemem

mergemem is an experimental utility for Linux, which looks very interesting for us mod_perl users: http://www.complang.tuwien.ac.at/ulrich/mergemem/

It looks like it could be run periodically on your server to find and merge duplicate pages. It won't halt your httpds during the merge; this was taken into consideration in the design of mergemem. Merging is not performed with one big system call. Instead, most of the work is done in userspace, making a lot of small system calls.

Therefore, blocking of the system should not happen. And, if it really should turn out to take too much time you can reduce the priority of the process.

The worst case that can happen is this: mergemem merges two pages and immediately afterward, they will be split. The split costs about the same as the time consumed by merging.

This software comes with a utility called memcmp to tell you how much you might save.

This week on Perl 6 (11/24-12/01, 2002)

Oh look, it's only Monday evening and Piers has started writing this week's summary. What is the world coming to?

As usual, we start with the internals list.

C#/Parrot Status

During last week's discussion of C# and Parrot, Nicholas Clark confessed that floating point fills him with fear. Rhys Weatherley attempted to reassure him by saying that C# doesn't require floating point overflow detection. Dan pointed out that, regardless of C#'s needs, we'd need overflow detection for our own purposes.

Gopal V posted a dotgnu.ops file, implementing the conversion ops that the DotGNU project needs. Leopold Tötsch dropped unsubtle hints about the NEED FOR TESTS and did a certain amount of patch polishing before applying it with added tests. Gopal commented that Parrot people make it almost too easy. Several other patches to the ops got added in order to take into account portability to such common hardware as Crays, PPC and ARM.

http://groups.google.com/groups

NCI stuff (mostly) done

Dan offered a few progress reports on the state of the Native Call Interface, which is nearing completion, and produced a document on how to use it (essentially, you edit call_types.txt). James Mastros suggested that a good deal of what was in the post should be included as comments at the top of call_types.txt and that maybe call_types.txt should be renamed to something like nci_types.txt.

Leon Brocard had a go at using this new functionality to call out to libSDL but had some problems with it needing to have libpthread loaded.

http://groups.google.com/groups

Changes to parrot/docs/jit.pod

Leo Tötsch committed a few changes to the JIT documentation, which caused Nicholas Clark to wonder why some of the behaviour was as described. And then my head started hurting. As far as your summarizer is concerned, JIT is Really Cool Scary Magic that makes it go faster, so I will continue with my shameless handwaving whenever this topic comes up. I think this thread had something to do with making sure that the mapping between Parrot registers and hardware registers is efficient.

http://groups.google.com/groups

Befunge-93? No! Befunge-98!

Jerome Quelin wondered if the $PARROT/languages/Befunge-93 directory could be renamed to $PARROT/languages/befunge because, as soon as Parrot supports objects, he intends to implement the Befunge 98 specs, and a Befunge 98 interpreter can be used to interpret Befunge 93 code. Jerome's wish was Robert Spier's Command, and the directory has now been moved. Nicholas Clark muttered something about Subversion and VMS (where Subversion is a CVS replacement which, amongst other things, allows you to rename directories without causing confusion).

http://groups.google.com/groups

This week's patches

For the first time in quite a while, several people gave Leo Tötsch a run for his money this week in the `most patches to Parrot' stakes.

Patches to Parrot this week include:

  • More JIT changes: Register usage array, block allocation

    Leo has been doing more magic with the JIT system, but Daniel Grunblatt raised a query about ALLOCATE_REGISTER_ALWAYS.

  • Config test for i386 fcomip

    Leo added a test to see if the fcomip operation is available to the JIT system.

  • Use IMCC instead of assemble.pl

    Jürgen Bömmels has hacked IMCC to accept normal .pasm files. This leads to dramatic (350%) speed increases in assembly, but causes several of the tests to fail because IMCC has no macro support. Jürgen intends to add macro support to IMCC if this approach proves acceptable. Leo liked it a lot, suggesting that, until IMCC gets native macros, it should be possible to use assemble.pl's preprocessor option to get macro-free .pasm files. The patch was applied.

  • Introducing debug features in Befunge

    Jerome Quelin patched the Befunge interpreter to add the beginnings of real debugger support. He threatens a `fully-functional debugger within the Befunge interpreter with breakpoints, dumping the playfield, status line and more'. Nicholas Clark was scared, but applied the patch anyway.

    Jerome later supplied a patch adding single stepping, playfield dumping and a rudimentary UI. If Nicholas Clark was still scared, he didn't show it, and applied the patch.

  • Replace 'perl' with 'parrot' in the PDDs

    Alin Iacob noticed that, in several places, the PDDs (Parrot Design Documents) use 'Perl' where they should probably use 'Parrot', and submitted a patch to fix things.

  • Long double support in i386/Linux JIT

    Leo supplied a patch fixing the `long double issue' in the i386/Linux JIT core.

On rereading that list I see that Leo is still comfortably in the lead as most prolific patcher. Somehow it didn't seem that way during the week.

Multiarray usage

Jerome Quelin wanted to use multiarrays but couldn't understand how they work, so he asked the list. Leo Tötsch provided a pile of answers and a short discussion ensued. I think Jerome chose to use a PerlArray of PerlArrays in the end. Leo pointed out that multiarrays come into their own for 'big packed arrays'.

http://groups.google.com/groups

Meanwhile, in perl6-language

By gum, but you American chaps take Thanksgiving seriously, don't you? There was a grand total of 32 messages in perl6-language this week. I have the feeling I could just concatenate them and this section would be no longer than the usual perl6-language summary. But that would be the wrong kind of Lazy.

Dynamic Scoping

The discussion of Ralph Mellor's proposal about dynamic scoping and implicit argument passing continued. In response to last week's summary Ralph clarified that he suggested implicit argument passing as a way of eliminating globals; the perceived thread safety thing was beside the point.

Back in the thread (and in a post I discussed last week; my filtering is obviously confused), Larry provided some clarifications and suggested extensions to the currying mechanism which would address some of Ralph's concerns. Ralph responded with a pile of posts and suggested extending the currying idea still further.

http://groups.google.com/groups

Status Summary; next steps

Michael Lazzaro posted a summary of where the perl6-documentation people had got to and a list of `next Big Deals'. Bryan C. Warnock pointed out that almost all of the Big Deals should really be discussed on perl6-language rather than perl6-documentation and wondered again about the scope of perl6-documentation and how it differs from perl6-language. Larry told us that ``p6d and p6l just have different deliverables. p6l is looking at the elephant from above, while p6d is looking at the elephant from below. But it's the same elephant. (Unless it's a camel.)''. Larry also addressed the Big Deals.

Michael also posted his ideas about what the documentation list was for and where was the appropriate place to discuss things. Bryan worried that p6d was in danger of trying to go faster than the design of the language and asked that the documentation list ``refrain from rampant p6l-ish speculation.'' He went on to suggest that people be kind to Piers, which was nice of him. In a later post, he offered his vision of what the difference between the lists should be. In another subthread, Garrett Goebel offered his vision too (obviously a 'vision' week).

http://groups.google.com/groups

http://groups.google.com/groups -- Larry's vision

http://groups.google.com/groups -- Michael's vision

http://groups.google.com/groups -- Bryan's vision

http://groups.google.com/groups -- Garrett's vision

Just wondering...

Piers Cawley pointed out that it's nearly 6 months since the last Apocalypse, and wondered when we could expect the next one. Larry reckons it'd be a lot faster if we'd stop asking interesting questions. Apparently he was planning to have a draft out this week, but made the mistake of buying the extended Fellowship of the Ring DVD set, so that was him out of commission for a few days. Hopefully we'll see something within the next week or so (before Christmas, please...) but the timing is dependent on whether Gloria's family played games Larry likes over Thanksgiving. (So far in the current week, Larry hasn't said whether he played games or did design.) Jonathan Scott Duff asked for the email addresses of Larry's in-laws so he could ask them to play the appropriate games.

And this précis is now almost longer than the original thread...

In Brief

Bryan Hundven wondered about writing an entire Operating System on top of Parrot. At least, I think that's what he wondered about.

Everybody ignored Leon's nabirfuck patch. (You know, I think I may be misspelling the wrong bit of that...)

The Documentation group finished up their work on numeric literals and started work on string literals.

Who's Who in Perl 6?

Who are you?
Zach Lipton, [zach at zachlipton dot com]
What do you do for/with Perl 6?
I'm the maintainer of tinderbox (http://tinderbox.perl.org) and percy (the annoying little ircbot on #parrot who tells people when tinderbox status changes). I also attempt to get bonsai to work on perl.org to make the lives of many others much easier. Occasionally, I tinker with configure and the testing suite, but rarely get anything useful to happen.
Where are you coming from?
I'm mostly coming from the Mozilla project where I do a wide amount of various miscellaneous stuff including working on Bugzilla and Technology Evangelism (no, not televangelism, you may continue your religion). Oh yes, mustn't forget school, the, erm, uh, um, most important place where I learn all sorts of interesting things.
When do you think Perl 6 will be released?
What? You mean that there is more to Perl 6 than Parrot?
Why are you doing this?
It's great to be able to help out with such a great project and give back to the Perl community. Also, it's great experience and just plain fun.
You have 5 words. Describe yourself.
Um, erm, hi, uh, yea!
Do you have anything to declare?
I promise that I will eventually have bonsai completely installed on perl.org and working correctly!

Acknowledgements

It's been a quiet week this week, presumably because of Thanksgiving, and I'm writing this on Tuesday morning, still on the train, which has been very unusual of late.

Proofreading services were once again provided by Aspell and me. Any errors should be laid at the door of Kevin Atkinson, the author of Aspell.

I've been seeing a little more mail about Perl 6 this week, including a set of questionnaire answers from Abigail that I was tempted to push up to the front of the queue this week, but which I think I'll save for later. Speaking of the questionnaire, the answer queue currently contains only three posts, so if you'd like to answer the questions Zach just answered and send them to me at mailto:5Ws@bofh.org.uk I'd be grateful.

Several people have pointed out that my autogenerated URLs for messages in perl6-documentation don't actually work. Apparently Google isn't yet picking that group up, so this week I don't provide any links to messages in the documentation list. It's my understanding that, as they produce final docs, these will be available from some fixed website, yet to be decided on. Once that's set up, I'll start linking to that.

Time for the chorus again:

If you didn't like this summary, what are you doing still reading it? If you did like it, please consider one or both of the following options:

The fee paid for publication of these summaries on perl.com is paid directly to the Perl Foundation.
