
Making Perl Reusable with Modules

Perl software development can occur at several levels. When first developing the idea for an application, a Perl developer may start with a short program to flesh out the necessary algorithms. After that, the next step might be to create a package to support object-oriented development. The final work is often to create a Perl module for the package to make the logic available to all parts of the application. Andy Sylvester explores this topic with a simple mathematical function.

Creating a Perl Subroutine

I am working on ideas for implementing some mathematical concepts for a method of composing music. The ideas come from the work of Joseph Schillinger. At the heart of the method is being able to generate patterns using mathematical operations and using those patterns in music composition. One of the basic operations described by Schillinger is creating a "resultant," or series of numbers, based on two integers (or "generators"). Figure 1 shows a diagram of how to create the resultant of the integers 5 and 3.

Figure 1. Creating the resultant of 5 and 3

Figure 1 shows two line patterns with units of 5 and units of 3. The lines continue until both lines come down (or "close") at the same time. The length of each line corresponds to the product of the two generators (5 x 3 = 15). If you draw dotted lines down from where each of the two generator lines change state, you can create a third line that changes state at each of the dotted line points. The lengths of the segments of the third line make up the resultant of the integers 5 and 3 (3, 2, 1, 3, 1, 2, 3).

Schillinger used graph paper to create resultants in his System of Musical Composition. However, another convenient way of creating a resultant is to calculate the modulus of a counter and then calculate a term in the resultant series based on the state of the counter. An algorithm to create the terms in a resultant might resemble:

Read generators from command line
Determine total number of counts for resultant
   (major_generator * minor_generator)
Initialize resultant counter = 0
For MyCounts from 1 to the total number of counts
   Get the modulus of MyCounts to the major and minor generators
   Increment the resultant counter
   If either modulus = 0
     Save the resultant counter to the resultant array
     Re-initialize resultant counter = 0
   End if
End for

From this design, I wrote a short program using the Perl modulus operator (%):

#!/usr/bin/perl
#*******************************************************
#
# FILENAME: result01.pl
#
# USAGE: perl result01.pl major_generator minor_generator
#
# DESCRIPTION:
#    This Perl script will generate a Schillinger resultant
#    based on two integers for the major generator and minor
#    generator.
#
#    In normal usage, the user will input the two integers
#    via the command line. The sequence of numbers representing
#    the resultant will be sent to standard output (the console
#    window).
#
# INPUTS:
#    major_generator - First generator for the resultant, input
#                      as the first calling argument on the
#                      command line.
#
#    minor_generator - Second generator for the resultant, input
#                      as the second calling argument on the
#                      command line.
#
# OUTPUTS:
#    resultant - Sequence of numbers written to the console window
#
#**************************************************************

   use strict;
   use warnings;

   my $major_generator = $ARGV[0];
   my $minor_generator = $ARGV[1];

   my $total_counts   = $major_generator * $minor_generator;
   my $result_counter = 0;
   my $major_mod      = 0;
   my $minor_mod      = 0;
   my $i              = 0;
   my $j              = 0;
   my @resultant;

   print "Generator Total = $total_counts\n";

   while ($i < $total_counts) {
       $i++;
       $result_counter++;
       $major_mod = $i % $major_generator;
       $minor_mod = $i % $minor_generator;
       if (($major_mod == 0) || ($minor_mod == 0)) {
          push(@resultant, $result_counter);
          $result_counter = 0;
       }
       print "$i \n";
       print "Modulus of $major_generator is $major_mod \n";
       print "Modulus of $minor_generator is $minor_mod \n";
   }

   print "\n";
   print "The resultant is @resultant \n";

Run the program with 5 and 3 as the inputs (perl result01.pl 5 3):

Generator Total = 15
1
Modulus of 5 is 1
Modulus of 3 is 1
2
Modulus of 5 is 2
Modulus of 3 is 2
3
Modulus of 5 is 3
Modulus of 3 is 0
4
Modulus of 5 is 4
Modulus of 3 is 1
5
Modulus of 5 is 0
Modulus of 3 is 2
6
Modulus of 5 is 1
Modulus of 3 is 0
7
Modulus of 5 is 2
Modulus of 3 is 1
8
Modulus of 5 is 3
Modulus of 3 is 2
9
Modulus of 5 is 4
Modulus of 3 is 0
10
Modulus of 5 is 0
Modulus of 3 is 1
11
Modulus of 5 is 1
Modulus of 3 is 2
12
Modulus of 5 is 2
Modulus of 3 is 0
13
Modulus of 5 is 3
Modulus of 3 is 1
14
Modulus of 5 is 4
Modulus of 3 is 2
15
Modulus of 5 is 0
Modulus of 3 is 0

The resultant is 3 2 1 3 1 2 3

This result matches the resultant terms as shown in the graph in Figure 1, so it looks like the program generates the correct output.

Creating a Perl Package from a Program

With a working program, you can create a Perl package as a step toward being able to reuse code in a larger application. The initial program has two pieces of input data (the major generator and the minor generator). The single output is the list of numbers that make up the resultant. These three pieces of data could be combined in an object. The program could easily become a subroutine to generate the terms in the resultant. This could be a method in the class contained in the package. Creating a class implies adding a constructor method to create a new object. Finally, there should be some methods to get the major generator and minor generator from the object to use in generating the resultant (see the perlboot and perltoot tutorials for background on object-oriented programming in Perl).

From these requirements, the resulting package might be:

#!/usr/bin/perl
#*******************************************************
#
# Filename: result01a.pl
#
# Description:
#    This Perl script creates a class for a Schillinger resultant
#    based on two integers for the major generator and the
#    minor generator.
#
# Class Name: Resultant
#
# Synopsis:
#
# use Resultant;
#
# Class Methods:
#
#   $seq1 = Resultant->new(5, 3)
#
#      Creates a new object with a major generator of 5 and
#      a minor generator of 3. These parameters need to be
#      initialized when a new object is created, as there
#      are no methods to set these elements within the object.
#
#   $seq1->generate()
#
#      Generates a resultant and saves it in the ResultList array
#
# Object Data Methods:
#
#   $major_generator = $seq1->get_major()
#
#      Returns the major generator
#
#   $minor_generator = $seq1->get_minor()
#
#      Returns the minor generator
#
#
#**************************************************************

{ package Resultant;
  use strict;
  sub new {
    my $class           = shift;
    my $major_generator = shift;
    my $minor_generator = shift;

    my $self = {Major => $major_generator,
                Minor => $minor_generator,
                ResultList => []};

    bless $self, $class;
    return $self;
  }

  sub get_major {
    my $self = shift;
    return $self->{Major};
  }

  sub get_minor {
    my $self = shift;
    return $self->{Minor};
  }

  sub generate {
    my $self         = shift;
    my $total_counts = $self->get_major * $self->get_minor;
    my $i            = 0;
    my $major_mod;
    my $minor_mod;
    my @result;
    my $result_counter = 0;

   while ($i < $total_counts) {
       $i++;
       $result_counter++;
       $major_mod = $i % $self->get_major;
       $minor_mod = $i % $self->get_minor;

       if (($major_mod == 0) || ($minor_mod == 0)) {
          push(@result, $result_counter);
          $result_counter = 0;
       }
   }

   @{$self->{ResultList}} = @result;
  }
}

#
# Test code to check out class methods
#

# Counter declaration
my $j;

# Create new object and initialize major and minor generators
my $seq1 = Resultant->new(5, 3);

# Print major and minor generators
print "The major generator is ", $seq1->get_major(), "\n";
print "The minor generator is ", $seq1->get_minor(), "\n";

# Generate a resultant
$seq1->generate();

# Print the resultant
print "The resultant is ";
foreach $j (@{$seq1->{ResultList}}) {
  print "$j ";
}
print "\n";

Execute the file (perl result01a.pl):

The major generator is 5
The minor generator is 3
The resultant is 3 2 1 3 1 2 3

This output text shows the same resultant terms as produced by the first program.

Creating a Perl Module

From a package, you can create a Perl module to make the package fully reusable in an application. Also, you can modify our original test code into a series of module tests to show that the module works the same as the standalone package and the original program.

I like to use the Perl module Module::Starter to create a skeleton module for the package code. To start, install the Module::Starter module and its associated modules from CPAN, using the Perl Package Manager, or some other package manager. To see if you already have the Module::Starter module installed, type perldoc Module::Starter in a terminal window. If the man page does not appear, you probably do not have the module installed.

Select a working directory to create the module directory. This can be the same directory that you have been using to develop your Perl program. Type the following command (though with your own name and email address):

$ module-starter --module=Music::Resultant --author="John Doe" \
    --email=john@johndoe.com

Perl should respond with:

Created starter directories and files

In the working directory, you should see a folder or directory called Music-Resultant. Change your current directory to Music-Resultant, then type the commands:

$ perl Makefile.PL
$ make

These commands will create the full directory structure for the module. Now paste the text from the package into the module template at Music-Resultant/lib/Music/Resultant.pm. Open Resultant.pm in a text editor and paste the subroutines from the package after the lines:

=head1 FUNCTIONS

=head2 function1

=cut

When you paste the package source code, remove the opening brace from the package, so that the first lines appear as:

 package Resultant;
  use strict;
  sub new {
    my $class = shift;

and the last lines of the source appear without the final closing brace as:

   @{$self->{ResultList}} = @result;
  }

After making the above changes, save Resultant.pm. This is all that you need to do to create a module for your own use. If you eventually release your module to the Perl community or upload it to CPAN, you should do some more work to prepare the module and its documentation (see the perlmod and perlmodlib documentation for more information).

After modifying Resultant.pm, you need to install the module to make it available for other Perl applications. To avoid configuration issues, install the module in your home directory, separate from your main Perl installation.

  1. In your home directory, create a lib/ directory, then create a perl/ directory within the lib/ directory. The result should resemble:

    /home/myname/lib/perl
  2. Go to your module directory (Music-Resultant) and re-run the build process with a directory path to tell Perl where to install the module:

    $ perl Makefile.PL LIB=/home/myname/lib/perl
    $ make install

    Once this is complete, the module will be installed in the specified directory.

The final step in module development is to add tests to the .t file templates created in the module directory. The Perl distribution includes several built-in test modules, such as Test::Simple and Test::More to help test Perl subroutines and modules.
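Test::More follows a simple declare-a-plan-then-assert pattern. As a minimal, self-contained illustration (unrelated to the Resultant module):

```perl
use strict;
use warnings;
use Test::More tests => 2;   # declare the plan: exactly two tests will run

my $sum = 2 + 2;
ok( $sum == 4, 'addition works' );             # ok() passes on any true value
is( lc('PERL'), 'perl', 'lc() lowercases' );   # is() compares got vs. expected
```

Running this with perl prints TAP output (1..2 followed by two ok lines), which test harnesses can summarize.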

To test the module, open the file Music-Resultant/t/00-load.t. The initial text in this file is:

#!perl -T

use Test::More tests => 1;

BEGIN {
    use_ok( 'Music::Resultant' );
}

diag( "Testing Music::Resultant $Music::Resultant::VERSION, Perl $], $^X" );

You can run this test file from the t/ directory using the command:

perl -I/home/myname/lib/perl -T 00-load.t

The -I switch tells the Perl interpreter to look for the module Resultant.pm in your alternate installation directory. The directory path must immediately follow the -I switch, or Perl may not search your alternate directory for your module. The -T switch is necessary because the first line of the test script enables taint checking, which only works when enabled at Perl startup; perl will exit with an error if you try to enable it later. Your results should resemble the following (your Perl version may be different):

1..1
ok 1 - use Music::Resultant;
# Testing Music::Resultant 0.01, Perl 5.008006, perl

The test code from the second listing is easy to convert to the format used by Test::More. Change the number at the end of the tests line from 1 to 4, as you will be adding three more tests to this file. The template file has an initial test to show that the module exists. Next, add tests after the BEGIN block in the file:

# Test 2:
my $seq1 = Resultant->new(5, 3);  # create an object
isa_ok ($seq1, 'Resultant');      # check object definition

# Test 3: check major generator
my $local_major_generator = $seq1->get_major();
is ($local_major_generator, 5, 'major generator is correct' );

# Test 4: check minor generator
my $local_minor_generator = $seq1->get_minor();
is ($local_minor_generator, 3, 'minor generator is correct' );

To run the tests, retype the earlier command line in the Music-Resultant/ directory:

$ perl -I/home/myname/lib/perl -T t/00-load.t

You should see the results:

1..4
ok 1 - use Music::Resultant;
ok 2 - The object isa Resultant
ok 3 - major generator is correct
ok 4 - minor generator is correct
# Testing Music::Resultant 0.01, Perl 5.008006, perl

These tests create a Resultant object with a major generator of 5 and a minor generator of 3 (Test 2), and check to see that the major generator in the object is correct (Test 3), and that the minor generator is correct (Test 4). They do not cover the resultant terms. One way to check the resultant is to add the test code used in the second listing to the .t file:

# Generate a resultant
$seq1->generate();

# Print the resultant
my $j;
print "The resultant is ";
foreach $j (@{$seq1->{ResultList}}) {
  print "$j ";
}
print "\n";

You should get the following results:

1..4
ok 1 - use Music::Resultant;
ok 2 - The object isa Resultant
ok 3 - major generator is correct
ok 4 - minor generator is correct
The resultant is 3 2 1 3 1 2 3
# Testing Music::Resultant 0.01, Perl 5.008006, perl

That's not valid test output, so it needs a little bit of manipulation. To check the elements of a list using a testing function, install the Test::Differences module and its associated modules from CPAN, using the Perl Package Manager, or some other package manager. To see if you already have the Test::Differences module installed, type perldoc Test::Differences in a terminal window. If the man page does not appear, you probably do not have the module installed.

Once that module is part of your Perl installation, change the number of tests from 4 to 5 on the Test::More statement line and add the following statement after the use Test::More line:

use Test::Differences;

Finally, replace the code that prints the resultant with:

# Test 5: (uses Test::Differences and associated modules)
$seq1->generate();
my @result   = @{$seq1->{ResultList}};
my @expected = (3, 2, 1, 3, 1, 2, 3);
eq_or_diff \@result, \@expected, "resultant terms are correct";

Now when the test file runs, you can confirm that the resultant is correct:

1..5
ok 1 - use Music::Resultant;
ok 2 - The object isa Resultant
ok 3 - major generator is correct
ok 4 - minor generator is correct
ok 5 - resultant terms are correct
# Testing Music::Resultant 0.01, Perl 5.008006, perl

Summary

There are multiple levels of Perl software development. Once you start to create modules to enable reuse of your Perl code, you will be able to leverage your effort into larger applications. By using Perl testing modules, you can ensure that your code works the way you expect and provide a way to ensure that the modules continue to work as you add more features.

Resources

Here are some good resources for using Perl testing modules like Test::Simple and Test::More:

  • Test::Tutorial gives the basics of using Test::Simple and Test::More.
  • An Introduction to Testing presents the benefits of developing tests and code at the same time, and provides a variety of examples.

Option and Configuration Processing Made Easy

When you first fire up your editor and start writing a program, it's tempting to hardcode any settings or configuration so you can focus on the real task of getting the thing working. But as soon as you have users, even if the user is only yourself, you can bet there will be things they want to choose for themselves.

A search on CPAN reveals almost 200 different modules dedicated to option processing and handling configuration files. By anyone's standards that's quite a lot, certainly too many to evaluate each one.

Luckily, you already have a great module right in front of you for handling options given on the command line: Getopt::Long, which is a core module included as standard with Perl. This lets you use the standard double-dash style of option names:

myscript --source-directory "/var/log/httpd" --verbose \
    --username=JJ

Using Getopt::Long

When your program runs, any command-line arguments will be in the @ARGV array. Getopt::Long exports a function, GetOptions(), which processes @ARGV to do something useful with these arguments, such as set variables or run blocks of code. To allow specific option names, pass a list of option specifiers in the call to GetOptions() together with references to the variables in which you want the option values to be stored.

As an example, the following code defines two options, --run and --verbose. The call to GetOptions() will then assign the value 1 to the variables $run and $verbose respectively if the relevant option is present on the command line.

use Getopt::Long;
my ($run,$verbose);
GetOptions( 'run'     => \$run,
            'verbose' => \$verbose );

When Getopt::Long has finished processing options, any remaining arguments will remain in @ARGV for your script to handle (for example, specified filenames). If you use this example code and call your script as:

myscript --run --verbose file1 file2 file3

then after GetOptions() has been called the @ARGV array will contain the values file1, file2, and file3.
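Recent versions of Getopt::Long also export GetOptionsFromArray(), which processes any array you hand it instead of the real @ARGV; that makes the behavior easy to demonstrate in a self-contained sketch:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Simulate 'myscript --run --verbose file1 file2 file3' without
# touching the real command line
my @args = qw(--run --verbose file1 file2 file3);
my ($run, $verbose);
GetOptionsFromArray( \@args, 'run' => \$run, 'verbose' => \$verbose );

# Recognized options are consumed; the filenames are left behind
print "@args\n";   # file1 file2 file3
```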

Types of Command-Line Options

The option specifier provided to GetOptions() controls not only the option name, but also the option type. Getopt::Long gives a lot of flexibility in the types of option you can use. It supports Boolean switches, incremental switches, options with single values, options with multiple values, and even options with hash values.

Some of the most common specifiers are:

name     # Presence of the option will set $name to 1
name!    # Allows negation, e.g. --name will set $name to 1,
         #    --noname will set $name to 0
name+    # Increments the variable each time the option is found, e.g.
         # if $name = 0 then --name --name --name will set $name to 3
name=s   # String value required
         #    --name JJ or --name=JJ will set $name to JJ
         # Spaces need to be quoted
         #    --name="Jon Allen" or --name "Jon Allen"
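The incremental and negatable specifiers are easy to verify with GetOptionsFromArray() (exported by recent versions of Getopt::Long), which parses a supplied array instead of @ARGV:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

my @args    = qw(--verbose --verbose --noname);
my $verbose = 0;
my $name    = 1;
GetOptionsFromArray( \@args,
    'verbose+' => \$verbose,   # incremental: each occurrence adds 1
    'name!'    => \$name );    # negatable: --noname sets $name to 0

print "$verbose $name\n";   # 2 0
```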

So, to create an option that requires a string value, format the call to GetOptions() like this:

my $name;
GetOptions( 'name=s' => \$name );

The value is required. If the user omits it, as in:

myscript --name

then the call to GetOptions() will issue an appropriate error message (via warn()) and return false.
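Because GetOptions() signals problems with warn() and returns false, a robust script checks the return value and prints its own usage message. A sketch, using GetOptionsFromArray() so the failing arguments are built into the example:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# '--name' with no value: option processing fails, warns to STDERR,
# and returns false
my @args = qw(--name);
my $name;
my $ok = GetOptionsFromArray( \@args, 'name=s' => \$name );
print $ok ? "parsed\n" : "usage: myscript --name NAME\n";
```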

Options with Multiple Values

The option specifier consists of four components: the option name; data type (Boolean, string, integer, etc.); whether to expect a single value, a list, or a hash; and the minimum and maximum number of values to accept. To require a list of string values, build up the option specifier:

Option name:   name
Option value:  =s    (string)
Option type:   @     (array)
Value counter: {1,}  (at least 1 value required, no upper limit)

Putting these all together gives:

my $name;
GetOptions('name=s@{1,}' => \$name);

Now invoking the script as:

myscript --name Barbie Brian Steve

will set $name to the array reference ['Barbie','Brian','Steve'].

Giving a hash value to an option is very similar. Replace @ with % and on the command line give arguments as key=value pairs:

my $name;
GetOptions('name=s%{1,}' => \$name);

Running the script as:

myscript --name Barbie=Director JJ=Member

will store the hash reference { Barbie => 'Director', JJ => 'Member' } in $name.
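Both multi-value forms can be sketched with GetOptionsFromArray() so the example carries its own arguments (--role is a made-up option name for illustration):

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# List-valued option: values accumulate in an array reference
my @list_args = qw(--name Barbie Brian Steve);
my $names;
GetOptionsFromArray( \@list_args, 'name=s@{1,}' => \$names );
print "@$names\n";   # Barbie Brian Steve

# Hash-valued option: each occurrence supplies one key=value pair
my @hash_args = qw(--role Barbie=Director --role JJ=Member);
my $roles;
GetOptionsFromArray( \@hash_args, 'role=s%' => \$roles );
print "$roles->{Barbie}\n";   # Director
```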

Storing Options in a Hash

By passing a hash reference as the first argument to GetOptions, you can store the complete set of option values in a hash instead of defining a separate variable for each one.

my %options;
GetOptions( \%options, 'name=s', 'verbose' );

Option names will be hash keys, so you can refer to the name value as $options{name}. If an option is not present on the command line, then the corresponding hash key will not be present.
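One common idiom this enables is supplying defaults: pre-populate the hash before the call, and GetOptions() only overwrites keys for options actually given. A sketch (the defaults shown are arbitrary):

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Defaults first; parsing only replaces keys seen on the command line
my %options = ( name => 'anonymous', verbose => 0 );
my @args    = qw(--name JJ);
GetOptionsFromArray( \@args, \%options, 'name=s', 'verbose' );

print "$options{name} $options{verbose}\n";   # JJ 0
```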

Options that Invoke Subroutines

A nice feature of Getopt::Long is that, as an alternative to simply setting a variable when an option is found, you can tell the module to run any code of your choosing. Instead of giving GetOptions() a variable reference to store the option value, pass either a subroutine reference or an anonymous code reference. This will then be executed if the relevant option is found.

GetOptions( version => sub{ print "This is myscript, version 0.01\n"; exit; },
            help    => \&display_help );

When used in this way, Getopt::Long also passes the option name and value as arguments to the subroutine:

GetOptions( name => sub{ my ($opt,$value) = @_; print "Hello, $value\n"; } );

You can still include code references in the call to GetOptions() even if you use a hash to store the option values:

my %options;
GetOptions( \%options, 'name=s', 'verbose', 'dest=s',
            'version' => sub{ print "This is myscript, version 0.01\n"; exit; } );

Dashes or Underscores?

If you need to have option names that contain multiple words, such as a setting for "Source directory," you have a few different ways to write them:

--source-directory
--source_directory
--sourcedirectory

To give a better user experience, Getopt::Long allows option aliases to allow either format. Define an alias by using the pipe character (|) in the option specifier:

my %options;
GetOptions( \%options, 'source_directory|source-directory|sourcedirectory=s' );

Note that if you're storing the option values in a hash, the first option name (in this case, source_directory) will be the hash key, even if your user gave an alias on the command line.

If you have a lot of options, it might be helpful to generate the aliases using a function:

use strict;
use warnings;
use Data::Dumper;
use Getopt::Long;

my %specifiers = ( 'source-directory' => '=s',
                   'verbose'          => '' );
my %options;
GetOptions( \%options, optionspec(%specifiers) );

print Dumper(\%options);

sub optionspec {
  my %option_specs = @_;
  my @getopt_list;

  while (my ($option_name,$spec) = each %option_specs) {
    (my $variable_name = $option_name) =~ tr/-/_/;
    (my $nospace_name  = $option_name) =~ s/-//g;
    my  $getopt_name   = ($variable_name ne $option_name)
        ? "$variable_name|$option_name|$nospace_name" : $option_name;

    push @getopt_list,"$getopt_name$spec";
  }

  return @getopt_list;
}

Running this script with each format in turn shows that they are all valid:

varos:~/writing/argvfile jj$ ./optionspec.pl --source-directory /var/spool
$VAR1 = {
          'source_directory' => '/var/spool'
        };

varos:~/writing/argvfile jj$ ./optionspec.pl --source_directory /var/spool
$VAR1 = {
          'source_directory' => '/var/spool'
        };

varos:~/writing/argvfile jj$ ./optionspec.pl --sourcedirectory /var/spool
$VAR1 = {
          'source_directory' => '/var/spool'
        };

Additionally, Getopt::Long is case-insensitive by default (for option names, not values), so your users can also use --SourceDirectory, --sourceDirectory, etc., as well:

varos:~/writing/argvfile jj$ ./optionspec.pl --SourceDirectory /var/spool
$VAR1 = {
          'source_directory' => '/var/spool'
        };

Configuration Files

The next stage on from command-line options is to let your users save their settings into config files. After all, if your program expands to have numerous options it's going to be a real pain to type them in every time.

When it comes to the format of a configuration file, there are a lot of choices, such as XML, INI files, and the Apache httpd.conf format. However, all of these formats share a couple of problems. First, your users now have two things to learn: the command-line options and the configuration file syntax. Second, even though many CPAN modules are available to parse the various config file formats, you still must write the code in your program to interact with your chosen module's API to set whatever variables you use internally to store user settings.

Getopt::ArgvFile to the Rescue

Fortunately, someone out there in CPAN-land has the answer (you can always count on the Perl community to come up with innovative solutions). Getopt::ArgvFile tackles both of these problems, simplifying the file format and the programming interface in one fell swoop.

To start with, the file format used by Getopt::ArgvFile is extremely easy for users to understand. Config settings are stored in a plain text file that holds exactly the same directives that a user would type on the command line. Instead of typing:

myscript --source-directory /usr/local/src --verbose --logval=alert

your user can use the config file:

--source-directory /usr/local/src
--verbose
--logval=alert

and then run myscript for instant user gratification with no steep learning curve.

Now to the clever part. Getopt::ArgvFile itself doesn't actually care about the contents of the config file. Instead, it makes it appear to your program that all the settings were actually options typed on the command line--the processing of which you've already covered with Getopt::Long. As well as saving your users time by not making them learn a new syntax, you've also saved yourself time by not needing to code against a different API.

The most straightforward method of using Getopt::ArgvFile involves simply including the module in a use statement:

use Getopt::ArgvFile home=>1;

A program called myscript that contains this code will search the user's home directory (whatever the environment variable HOME is set to) for a config file called .myscript and extract the contents ready for processing by Getopt::Long.

Here's a complete example:

use strict;
use warnings;
use Getopt::ArgvFile home=>1;
use Getopt::Long;

my %config;
GetOptions( \%config, 'name=s' );

if ($config{name}) {
  print "Hello, $config{name}\n";
} else {
  print "Who am I talking to?\n";
}

Save this as hello, then run the script with and without a command-line option:

varos:~/writing/argvfile jj$ ./hello
Who am I talking to?

varos:~/writing/argvfile jj$ ./hello --name JJ
Hello, JJ

Now, create a settings file called .hello in your home directory containing the --name option. Remember to double quote the value if you want to include spaces.

varos:~/writing/argvfile jj$ cat ~/.hello
--name "Jon Allen"

Running the script without any arguments on the command line will show that it loaded the config file, but you can also override the saved settings by giving the option on the command line as normal.

varos:~/writing/argvfile jj$ ./hello
Hello, Jon Allen

varos:~/writing/argvfile jj$ ./hello --name JJ
Hello, JJ

Advanced Usage

In many cases the default behaviour invoked by loading the module will be all you need, but Getopt::ArgvFile can also cater to more specific requirements.

User-Specified Config Files

Suppose your users want to save different sets of options and specify which one to use when they run your program. This is possible using the @ directive on the command line:

varos:~/writing/argvfile jj$ cat jj.conf
--name JJ

varos:~/writing/argvfile jj$ ./hello
Hello, Jon Allen

varos:~/writing/argvfile jj$ ./hello @jj.conf
Hello, JJ

Note that there's no extra programming required to use this feature; handling @ options is native to Getopt::ArgvFile.

Changing the Default Config Filename or Location

Depending on your target audience, the naming convention offered by Getopt::ArgvFile for config files might not be appropriate. Using a dotfile (.myscript) will render your users' config files invisible in their file managers or when listing files at the command prompt, so you may wish to use a name like myscript.conf instead.

Again, it may also be helpful to allow for default configuration files to appear somewhere other than the user's home directory, for example, if you need to allow system-wide configuration.

A further consideration here is PAR, the tool for creating standalone executables from Perl programs. PAR lets you include data files as well as Perl code, so you can bundle a default settings file using a command such as:

pp hello -o hello.exe -a hello.conf

which will be available to your script as $ENV{PAR_TEMP}/inc/hello.conf.

I mentioned earlier that Getopt::ArgvFile can load arbitrary config files if the filename appears with the @ directive on the command line. Essentially, what the module does when loaded with:

use Getopt::ArgvFile home=>1;

is to prepend @$ENV{HOME}/.scriptname to @ARGV and then resolve all @ directives, leaving the contents of the files in @ARGV. This means that running the script as:

myscript --name=JJ

is basically equivalent to writing:

myscript @$ENV{HOME}/.myscript --name=JJ

To load other config files, Getopt::ArgvFile supports disabling the automatic @ARGV processing and triggering it later. With a little manipulation of @ARGV first, you can make:

myscript --name=JJ

equivalent to:

myscript @/path/to/default.conf @/path/to/system.conf @/path/to/user.conf \
    --name=JJ

which will load the set of config files in the correct priority order.

All you need to do to enable this feature is change the use statement to read:

use Getopt::ArgvFile qw/argvFile/;

Loading the module in this way tells Getopt::ArgvFile to export the function argvFile(), which your program needs to call to process the @ directives, and also prevents any automated processing from occurring.

Here's an example that first loads a config file from the application bundle (if packaged by PAR) and then from the directory containing the application binary:

use File::Basename qw/basename/;
use FindBin qw/$Bin/;
use Getopt::ArgvFile qw/argvFile/;

# Define config filename as <application_name>.conf
(my $configfile = basename($0)) =~ s/^(.*?)(?:\..*)?$/$1.conf/;

# Include config file from the same directory as the application binary
if (-e "$Bin/$configfile") {
  unshift @ARGV,'@'."$Bin/$configfile";
}

# If we have been packaged with PAR, include the config file from the
# application bundle
if ($ENV{PAR_TEMP} and -e "$ENV{PAR_TEMP}/inc/$configfile") {
  unshift @ARGV,'@'."$ENV{PAR_TEMP}/inc/$configfile";
}

argvFile();  # Process @ARGV to load specified config files

You can also use this technique together with File::HomeDir to access the user's application data directory in a cross-platform manner, so that the location of the config file conforms to the conventions set by the user's operating system.

Summary

Getopt::Long provides an easy to use, extensible system for processing command-line options. With the addition of Getopt::ArgvFile, you can seamlessly handle configuration files with almost no extra coding. Together, these modules should be first on your list when writing scripts that need any amount of configuration.

Better Code Through Destruction

Larry Wall said that Perl makes easy things easy and hard things possible. Perl is good both for writing a two-line script that saves the world at the last minute (well, at least it saves you and your project) and for robust projects. However, good Perl programming techniques can be quite different between small and complex applications. Consider, for example, Perl's garbage collector. It frees a programmer from memory management issues most of the time...until the programmer creates circular references.

Perl's garbage collector counts references. When the count reaches zero (which means that no one has a reference), Perl reclaims the entity. The approach is simple and effective. However, circular references (when object A has a reference to object B, and object B has a reference to object A) present a problem. Even if nothing else in the program has a reference to either A or B, the reference count can never reach zero. Objects A and B do not get destroyed. If the code creates them again and again (perhaps in a loop), you get a memory leak. The amount of memory allocated by the program increases without a sensible reason and can never decrease. This effect may be acceptable for simple run-and-exit scripts, but it's not acceptable for programs running 24x365, such as in a mod_perl or FastCGI environment or as standalone servers.
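The leak is easy to demonstrate with a toy class that counts its live instances (the class and counter here are illustrative, not taken from any particular module):

```perl
use strict;
use warnings;

package Node;
my $live = 0;                      # count of objects not yet reclaimed
sub new     { $live++; bless {}, shift }
sub DESTROY { $live-- }
sub live    { $live }

package main;
{
    my $obj_a = Node->new;
    my $obj_b = Node->new;
    $obj_a->{peer} = $obj_b;       # A holds a reference to B
    $obj_b->{peer} = $obj_a;       # B holds a reference back to A
}
# Both objects are now unreachable, yet neither DESTROY ran:
print Node::live(), "\n";          # prints 2 -- a leak
```

Put that loop inside a server's request handler and the live count grows forever.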

Circular references are sometimes too useful to avoid. A common example is a tree-like data structure: to navigate in both directions--from root to leaves and vice versa--a parent node holds a list of its children, and each child node holds a reference to its parent. There is the circular reference. Many CPAN modules implement their data models this way, including HTML::Tree, XML::DOM, and Text::PDF::File. All of these modules provide a method to release the memory, which the client application must call when it no longer needs an object. However, the requirement of an explicit call is not very appealing and can result in unsafe code:

    ##
    ## Code with a memory leak
    #
    use HTML::TreeBuilder;

    foreach my $filename (@ARGV) {
        my $tree = HTML::TreeBuilder->new;
        $tree->parse_file($filename);

        next unless $tree->look_down('_tag', 'img');
        ##
        ## Do the actual work (say, extract images) here
        ## ...
        ## and release the memory
        ##
        $tree->delete;
    }

The problem in the code is the next statement: trees built from HTML documents containing no <img> tags will never be released. In fact, any call to next, last, return (inside a subroutine), or die (inside an eval {} block) that bypasses the cleanup is unsafe and will lead to a memory leak. Of course, it is possible to move the release code into a continue block to cover last or next, or to delete the tree before every return or die, but the code quickly becomes messy.

There is a better solution--the paradigm of "resource acquisition is initialization (and destruction is resource relinquishment)." (Ironically, the second half of its name is often omitted, even though it's probably the most important part). The idea is simple. Create a special guard object (of another class) whose sole responsibility is to release the resource. When the guard object gets destroyed, its destructor deletes the tree. The code may look like:

    ##
    ## A special sentry object is employed
    ##
    use HTML::TreeBuilder;

    foreach my $filename (@ARGV) {
        my $tree = HTML::TreeBuilder->new;
        $tree->parse_file($filename);

        my $sentry = Sentry->new($tree);

        next unless $tree->look_down('_tag', 'img');
        ##
        ## next, last or return are safe here.
        ## Tree will be deleted automatically.
        ##
    }

    package Sentry;

    sub new {
        my $class = shift;
        my $tree  = shift;
        return bless {tree => $tree}, $class;
    }

    sub DESTROY {
        my $self = shift;
        $self->{tree}->delete;
    }

Note that now there is no need to call $tree->delete explicitly at the end of the loop. The magic is simple: when program flow leaves the scope, $sentry becomes reclaimable because it participates in no circular references. The DESTROY method of the Sentry package then calls the delete method of the $tree object. One mechanism covers every exit path; memory is released however you leave the block.
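To see concretely that DESTROY fires on every exit path, here is a small self-contained variation: a toy sentry that merely records its ID instead of deleting a tree, so the cleanup is observable:

```perl
use strict;
use warnings;

my @released;                     # records which iterations were cleaned up

package Sentry;
sub new     { my ($class, $id) = @_; bless { id => $id }, $class }
sub DESTROY { push @released, $_[0]{id} }

package main;
foreach my $id (1 .. 3) {
    my $sentry = Sentry->new($id);
    next if $id == 2;             # early exit -- DESTROY still runs
    # normal work for iterations 1 and 3 would go here
}
print "@released\n";              # prints "1 2 3"
```

Iteration 2 leaves the block via next, yet its sentry is released along with the others.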

Finally, there is no need to code your own Sentry class. Use Object::Destroyer, originally written by Adam Kennedy. As you may guess from its name, it is an object whose job is to destroy other objects:

    ##
    ## An off-the-shelf CPAN solution with Object::Destroyer
    ##
    use HTML::TreeBuilder;
    use Object::Destroyer 2.0;

    foreach my $filename (@ARGV) {
        my $tree   = HTML::TreeBuilder->new;
        my $sentry = Object::Destroyer->new($tree, 'delete');
        $tree->parse_file($filename);

        next unless $tree->look_down('_tag', 'img');
        ##
        ## You can safely return, die, next or last here.
        ##
    }

Because the name of the release method varies between modules, you pass it as the second argument to the constructor.

Finally, you can destroy any data structure, not just objects, if you provide code to do so. Pass in a subroutine reference or an anonymous subroutine:

    ##
    ## An unblessed data structure with circular references
    ## that cannot untangle itself.
    ##
    use Object::Destroyer 2.0;
    while (1) {
        my (%a, %b);
        $a{b}      = \%b;
        $b{a}      = \%a;
        my $sentry = Object::Destroyer->new( sub { undef $a{b} } );
    }

Just for fun, comment out the line with the $sentry object and watch the memory consumption of the running script.

Using Object::Destroyer As a Wrapper

Object::Destroyer can make life easier for module authors, too.

If you have written a library with circular references, you may ask your clients to call a disposal method explicitly, or you may use weak references, a feature of Perl that has been stable since 5.8 (see Scalar::Util). Weak references do not increment the reference counts of the objects they refer to, so the Perl garbage collector can still collect the referents. In the tree example, all references from leaves to parents (but not vice versa, or the tree will be lost!) may be weak. When the final reference to the root node goes away, Perl disposes of it, which removes its strong references to its children; their reference counts drop to zero in turn, and Perl reclaims the whole tree down to every leaf.
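A minimal demonstration of this effect with Scalar::Util's weaken (a toy Node class, not taken from any of the modules mentioned):

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $destroyed = 0;

package Node;
sub new     { bless {}, shift }
sub DESTROY { $destroyed++ }

package main;
{
    my $parent = Node->new;
    my $child  = Node->new;
    $parent->{children} = [ $child ];   # strong ref: parent -> child
    $child->{parent}    = $parent;      # this would close the cycle...
    weaken($child->{parent});           # ...but weak refs don't count
}
print "$destroyed\n";                   # prints 2 -- both reclaimed
```

Without the call to weaken, the cycle would keep both objects alive and the script would print 0.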

Indeed, some CPAN modules take this approach (XML::Twig, for example). However, it works only where weak references are available, which is not the case for older Perls. Second, it may require quite a bit of rewriting (there are nine calls to weaken throughout the code of XML::Twig 3.26).

Alternatively, you may use Object::Destroyer internally in your library code. It can work as an almost transparent wrapper around your object:

    ##
    ## Object::Destroyer as a wrapper
    ##
    package My::Tree;
    use Object::Destroyer 2.0;

    sub new {
        my $class = shift;
        my $self  = bless {}, $class;
        $self->populate;

        return Object::Destroyer->new( $self, 'release' );
    }

    sub release {
        ## actual memory release code
    }

    sub say_hello {
        my $self = shift;
        print "Hello, I'm an object of class ", ref($self), "\n";
    }

    package main;
    {
        my $tree = My::Tree->new;
        $tree->say_hello;
        ##
        ## $tree->release will be called by Object::Destroyer;
        ##
    }

The object $tree in the client code is actually an Object::Destroyer object that dispatches all invoked methods to the underlying object of class My::Tree. The method say_hello sees no difference at all--it receives an original $self object. Changes to code are minimal and well localized.
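The dispatching idea can be sketched in a few lines of plain Perl. This is an illustrative toy using AUTOLOAD, not Object::Destroyer's actual implementation, and the Widget class is made up:

```perl
use strict;
use warnings;

# A toy class with an explicit cleanup method
package Widget;
sub new     { bless { released => 0 }, shift }
sub greet   { "hello" }
sub release { $_[0]{released} = 1 }

# A minimal wrapper: forwards every method call to the wrapped object,
# and calls the named release method from its own destructor
package Wrapper;
our $AUTOLOAD;
sub new {
    my ($class, $obj, $method) = @_;
    return bless { obj => $obj, method => $method }, $class;
}
sub AUTOLOAD {
    my $self = shift;
    (my $name = $AUTOLOAD) =~ s/.*:://;   # strip the package prefix
    return $self->{obj}->$name(@_);       # dispatch to the real object
}
sub DESTROY {                              # defined so AUTOLOAD never sees it
    my $self   = shift;
    my $method = $self->{method};
    $self->{obj}->$method;
}

package main;
my $widget = Widget->new;
{
    my $wrapped = Wrapper->new($widget, 'release');
    print $wrapped->greet, "\n";          # prints "hello" -- dispatched
}
print $widget->{released}, "\n";          # prints 1 -- released on scope exit
```

Defining DESTROY explicitly matters: otherwise Perl would route the destructor call through AUTOLOAD as well.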

The approach has a limitation, too: clients must not access attributes of the object directly (such as $tree->{age}). This is a bad practice in client code anyway. Additionally, there is a small time penalty for method calls by client-side code. Calls made from the library code itself are not affected.

Exceptions and Resource Deallocation

Resource acquisition is initialization is a powerful technique to apply to the management of various critical resources, not only memory. It is most useful when using exceptions to handle errors. This combination makes code quite reliable: exceptions separate normal execution logic and error handling, and RAII sentries guarantee the correct release of every sensitive resource.

Consider alarms as an example. Assume that you have to call some potentially long-running (or even never-ending) code. You don't want your script to hang up, and prefer to break its execution. Alarms are just right for the task. However, the first attempt at good code might be awkward:

    ##
    ## Alarm example 1. Naive.
    ##
    eval{
        local $SIG{ALRM} = sub { die "Timed out\n" };
        alarm(5);
        long_running_code();
        ## Cancel the alarm if code returned within 5 sec.
        alarm(0);
    };
    if ($@ && $@ eq "Timed out\n") {
        ## Process the error here
    }

This code will work fine until long_running_code() dies on its own. In that case, the eval block catches the die, but alarm(0) is never reached, so the alarm stays armed; the localized ALRM handler is restored when the eval exits, and when the alarm fires the default handler terminates the process. If this occurred in a program that must run 24 hours a day, the program would end within 5 seconds.

This next example is much better; actually it is real-world code. It is enough for many applications. However, it's not completely bulletproof either:

    ##
    ## Alarm example 2. A standard solution.
    ##
    eval{
        local $SIG{ALRM} = sub { die "Timed out\n" };
        alarm(5);
        long_running_code();
        ## Cancel the alarm if long_running_code() returns within 5 sec.
        alarm(0);
    };
    ## Cancel the alarm if the long_running_code() died.
    alarm(0);

How many times will the alarm be cancelled in the following example?

    ##
    ## Alarm example 3. Malicious code.
    ##
    LOOP:
    foreach my $arg (1..3) {
        eval{
            local $SIG{ALRM} = sub { die "Timed out\n" };
            alarm(5);
            long_running_code($arg);
            alarm(0);
        };
        alarm(0);
    }
    sub long_running_code{ last LOOP; }

Oops, none.

The RAII solution is more reliable:

    ##
    ## Alarm example 4.
    ## Resource is under control of Object::Destroyer
    ##
    eval{
        local $SIG{ALRM} = sub { die "Timed out\n" };
        alarm(5);
        my $sentry = Object::Destroyer->new( sub {alarm(0)} );
        long_running_code();
    };

No matter how the code exits the eval block, Perl will destroy the $sentry object. That destruction will call alarm(0).

You can manage many sensitive resources this way, including file locks, semaphores, and even locks of database tables.

    ##
    ## File lock.
    ##
    use Fcntl ':flock';

    open my $fh, '>', "$filename.lock" or die "Cannot open $filename.lock: $!";
    eval{
        flock($fh, LOCK_EX);
        my $sentry = Object::Destroyer->new( sub {flock($fh, LOCK_UN)} );
        ##
        ## Actual lock-sensitive code is here.
        ## It is safe to die.
        ##
    };

    ##
    ## Semaphore
    ##
    use Thread::Semaphore;
    use Object::Destroyer;

    my $s = Thread::Semaphore->new();
    eval{
        $s->down;
        my $sentry = Object::Destroyer->new( sub { $s->up } );
        ##
        ## Critical code is here, die is safe
        ##
    };

    ##
    ## MySQL database table lock.
    ##
    use DBI;

    my $dbh = DBI->connect("dbi:mysql:...", "", "");
    eval{
        $dbh->do("LOCK TABLE table1 READ");
        my $sentry = Object::Destroyer->new(
            sub { $dbh->do("UNLOCK TABLES"); }
        );
        ##
        ## Again, actual code must be here
        ##
    };

The code is clean, simple, and quite self-explanatory.

Simple Transactions

Everyone who works with relational databases knows how useful transactions are. One of the features of transactions is atomicity: either all modifications of data are committed at once, or all of them are ignored. Your data is always consistent; it's not possible to leave it in an inconsistent state. The same effect is possible in Perl code:

    use Object::Destroyer 2.0;

    my ($account1, $account2) = (15, 15);

    printf("Account1=%d, Account2=%d, Total=%d\n",
        $account1, $account2, $account1+$account2);

    eval {
        my $coderef = create_savepoint(\$account1, \$account2);
        my $sentry  = Object::Destroyer->new($coderef);

        die "before changes" if rand > 0.7;
        $account1 += 3;
        die "after account 1 was modified" if rand > 0.7;
        $account2 -= 3;
        die "after account 2 was modified" if rand > 0.7;

        ##
        ## The transaction is considered to be committed here
        ## and $sentry can be dismissed.
        ## $coderef->() will not be called.
        ##
        $sentry->dismiss;

        die "after transaction is committed" if rand > 0.7;
    };
    print "Died $@" if $@;
    printf("Account1=%d, Account2=%d, Total=%d\n",
        $account1, $account2, $account1+$account2);

    sub create_savepoint {
        ## Save references to variables and their current values
        my @vars;
        foreach my $ref (@_) {
            die "Can remember only scalar values" unless ref($ref) eq 'SCALAR';
            push @vars, { ref => $ref, value => $$ref };
        }

        ## A closure to restore their values
        return sub {
            foreach my $var (@vars) {
                ${ $var->{ref} } = $var->{value};
            }
        };
    }

Run the script several times. Due to rand, it will break on varying lines, but it is not possible to get a Total value other than 30.

See Also

RAII is by no means a new technique. It is very popular in the world of C++ programming. If you are not afraid of C++, you may find the standard auto_ptr smart pointer and its effective usage patterns interesting. The non-standard ScopeGuard class provides lexically scoped resource management in C++.

The Devel::Monitor module has guidelines on how to design data structures with weak and circular references. Its primary goal, by the way, is to trace the memory consumption of a running script.

There are several modules for lexically scoped resource management on CPAN, but Object::Destroyer is my favorite. You may also look at Hook::Scope, Scope::Guard, and Sub::ScopeFinalizer.

Finally, Object Oriented Exception Handling in Perl discusses why exceptions are invaluable for big projects.
