July 2007 Archives

Option and Configuration Processing Made Easy

When you first fire up your editor and start writing a program, it's tempting to hardcode any settings or configuration so you can focus on the real task of getting the thing working. But as soon as you have users, even if the user is only yourself, you can bet there will be things they want to choose for themselves.

A search on CPAN reveals almost 200 different modules dedicated to option processing and handling configuration files. By anyone's standards that's quite a lot, certainly too many to evaluate each one.

Luckily, you already have a great module right in front of you for handling options given on the command line: Getopt::Long, which is a core module included as standard with Perl. This lets you use the standard double-dash style of option names:

myscript --source-directory "/var/log/httpd" --verbose \ --username=JJ

Using Getopt::Long

When your program runs, any command-line arguments will be in the @ARGV array. Getopt::Long exports a function, GetOptions(), which processes @ARGV to do something useful with these arguments, such as set variables or run blocks of code. To allow specific option names, pass a list of option specifiers in the call to GetOptions() together with references to the variables in which you want the option values to be stored.

As an example, the following code defines two options, --run and --verbose. The call to GetOptions() will then assign the value 1 to the variables $run and $verbose respectively if the relevant option is present on the command line.

use Getopt::Long;
my ($run,$verbose);
GetOptions( 'run'     => \$run,
             'verbose' => \$verbose );

When Getopt::Long has finished processing options, any remaining arguments will remain in @ARGV for your script to handle (for example, specified filenames). If you use this example code and call your script as:

myscript --run --verbose file1 file2 file3

then after GetOptions() has been called the @ARGV array will contain the values file1, file2, and file3.

Types of Command-Line Options

The option specifier provided to GetOptions() controls not only the option name, but also the option type. Getopt::Long gives a lot of flexibility in the types of option you can use. It supports Boolean switches, incremental switches, options with single values, options with multiple values, and even options with hash values.

Some of the most common specifiers are:

name     # Presence of the option will set $name to 1
name!    # Allows negation, e.g. --name will set $name to 1,
         #    --noname will set $name to 0
name+    # Increments the variable each time the option is found, e.g.
         # if $name = 0 then --name --name --name will set $name to 3
name=s   # String value required
         #    --name JJ or --name=JJ will set $name to JJ
         # Spaces need to be quoted
         #    --name="Jon Allen" or --name "Jon Allen"

So, to create an option that requires a string value, format the call to GetOptions() like this:

my $name;
GetOptions( 'name=s' => \$name );

The value is required. If the user omits it, as in:

myscript --name

then the call to GetOptions() will die() with an appropriate error message.

Options with Multiple Values

The option specifier consists of four components: the option name; data type (Boolean, string, integer, etc.); whether to expect a single value, a list, or a hash; and the minimum and maximum number of values to accept. To require a list of string values, build up the option specifier:

Option name:   name
Option value:  =s    (string)
Option type:   @     (array)
Value counter: {1,}  (at least 1 value required, no upper limit)

Putting these all together gives:

my $name;
GetOptions('name=s@{1,}' => \$name);

Now invoking the script as:

myscript --name Barbie Brian Steve

will set $name to the array reference ['Barbie','Brian','Steve'].

Giving a hash value to an option is very similar. Replace @ with % and on the command line give arguments as key=value pairs:

my $name;
GetOptions('name=s%{1,}',\$name);

Running the script as:

myscript --name Barbie=Director JJ=Member

will store the hash reference { Barbie => 'Director', JJ => 'Member' } in $name.

Storing Options in a Hash

By passing a hash reference as the first argument to GetOptions, you can store the complete set of option values in a hash instead of defining a separate variable for each one.

my %options;
GetOptions( \%options, 'name=s', 'verbose' );

Option names will be hash keys, so you can refer to the name value as $options{name}. If an option is not present on the command line, then the corresponding hash key will not be present.

Options that Invoke Subroutines

A nice feature of Getopt::Long is that, as an alternative to simply setting a variable when an option is found, you can tell the module to run any code of your choosing. Instead of giving GetOptions() a variable reference to store the option value, pass either a subroutine reference or an anonymous code reference. This will then be executed if the relevant option is found.

GetOptions( version => sub{ print "This is myscript, version 0.01\n"; exit; }
            help    => \&display_help );

When used in this way, Getopt::Long also passes the option name and value as arguments to the subroutine:

GetOptions( name => sub{ my ($opt,$value) = @_; print "Hello, $value\n"; } );

You can still include code references in the call to GetOptions() even if you use a hash to store the option values:

my %options;
GetOptions( \%options, 'name=s', 'verbose', 'dest=s',
            'version' => sub{ print "This is myscript, version 0.01\n"; exit; } );

Dashes or Underscores?

If you need to have option names that contain multiple words, such as a setting for "Source directory," you have a few different ways to write them:

--source-directory
--source_directory
--sourcedirectory

To give a better user experience, Getopt::Long allows option aliases to allow either format. Define an alias by using the pipe character (|) in the option specifier:

my %options;
GetOptions( \%options, 'source_directory|source-directory|sourcedirectory=s' );

Note that if you're storing the option values in a hash, the first option name (in this case, source_directory) will be the hash key, even if your user gave an alias on the command line.

If you have a lot of options, it might be helpful to generate the aliases using a function:

use strict;
use warnings;
use Data::Dumper;
use Getopt::Long;

my %specifiers = ( 'source-directory' => '=s',
                   'verbose'          => '' );
my %options;
GetOptions( \%options, optionspec(%specifiers) );

print Dumper(\%options);

sub optionspec {
  my %option_specs = @_;
  my @getopt_list;

  while (my ($option_name,$spec) = each %option_specs) {
    (my $variable_name = $option_name) =~ tr/-/_/;
    (my $nospace_name  = $option_name) =~ s/-//g;
    my  $getopt_name   = ($variable_name ne $option_name)
        ? "$variable_name|$option_name|$nospace_name" : $option_name;

    push @getopt_list,"$getopt_name$spec";
  }

  return @getopt_list;
}

Running this script with each format in turn shows that they are all valid:

varos:~/writing/argvfile jj$ ./optionspec.pl --source-directory /var/spool
$VAR1 = {
          'source_directory' => '/var/spool'
        };

varos:~/writing/argvfile jj$ ./optionspec.pl --source_directory /var/spool
$VAR1 = {
          'source_directory' => '/var/spool'
        };

varos:~/writing/argvfile jj$ ./optionspec.pl --sourcedirectory /var/spool
$VAR1 = {
          'source_directory' => '/var/spool'
        };

Additionally, Getopt::Long is case-insensitive by default (for option names, not values), so your users can also use --SourceDirectory, --sourceDirectory, etc., as well:

varos:~/writing/argvfile jj$ ./optionspec.pl --SourceDirectory /var/spool
$VAR1 = {
          'source_directory' => '/var/spool'
        };

Configuration Files

The next stage on from command-line options is to let your users save their settings into config files. After all, if your program expands to have numerous options it's going to be a real pain to type them in every time.

When it comes to the format of a configuration file, there are a lot of choices, such as XML, INI files, and the Apache httpd.conf format. However, all of these formats share a couple of problems. First, your users now have two things to learn: the command-line options and the configuration file syntax. Second, even though many CPAN modules are available to parse the various config file formats, you still must write the code in your program to interact with your chosen module's API to set whatever variables you use internally to store user settings.

Getopt::ArgvFile to the Rescue

Fortunately, someone out there in CPAN-land has the answer (you can always count on the Perl community to come up with innovative solutions). Getopt::ArgvFile tackles both of these problems, simplifying the file format and the programming interface in one fell swoop.

To start with, the file format used by Getopt::ArgvFile is extremely easy for users to understand. Config settings are stored in a plain text file that holds exactly the same directives that a user would type on the command line. Instead of typing:

myscript --source-directory /usr/local/src --verbose --logval=alert

your user can use the config file:

--source-directory /usr/local/src
--verbose
--logval=alert

and then run myscript for instant user gratification with no steep learning curve.

Now to the clever part. Getopt::ArgvFile itself doesn't actually care about the contents of the config file. Instead, it makes it appear to your program that all the settings were actually options typed on the command line--the processing of which you've already covered with Getopt::Long. As well as saving your users time by not making them learn a new syntax, you've also saved yourself time by not needing to code against a different API.

The most straightforward method of using Getopt::ArgvFile involves simply including the module in a use statement:

use Getopt::ArgvFile home=>1;

A program called myscript that contains this code will search the user's home directory (whatever the environment variable HOME is set to) for a config file called .myscript and extract the contents ready for processing by Getopt::Long.

Here's a complete example:

use strict;
use warnings;
use Getopt::ArgvFile home=>1;
use Getopt::Long;

my %config;
GetOptions( \%config, 'name=s' );

if ($config{name}) {
  print "Hello, $config{name}\n";
} else {
  print "Who am I talking to?\n";
}

Save this as hello, then run the script with and without a command-line option:

varos:~/writing/argvfile jj$ ./hello
Who am I talking to?

varos:~/writing/argvfile jj$ ./hello --name JJ
Hello, JJ

Now, create a settings file called .hello in your home directory containing the --name option. Remember to double quote the value if you want to include spaces.

varos:~/writing/argvfile jj$ cat ~/.hello
--name "Jon Allen"

Running the script without any arguments on the command line will show that it loaded the config file, but you can also override the saved settings by giving the option on the command line as normal.

varos:~/writing/argvfile jj$ ./hello
Hello, Jon Allen

varos:~/writing/argvfile jj$ ./hello --name JJ
Hello, JJ

Advanced Usage

In many cases the default behaviour invoked by loading the module will be all you need, but Getopt::ArgvFile can also cater to more specific requirements.

User-Specified Config Files

Suppose your users want to save different sets of options and specify which one to use when they run your program. This is possible using the @ directive on the command line:

varos:~/writing/argvfile jj$ cat jj.conf
--name JJ

varos:~/writing/argvfile jj$ ./hello
Hello, Jon Allen

varos:~/writing/argvfile jj$ ./hello @jj.conf
Hello, JJ

Note that there's no extra programming required to use this feature; handling @ options is native to Getopt::ArgvFile.

Changing the Default Config Filename or Location

Depending on your target audience, the naming convention offered by Getopt::ArgvFile for config files might not be appropriate. Using a dotfile (.myscript) will render your user's config file invisible in his file manager or when listing files at the command prompt, so you may wish to use a name like myscript.conf instead.

Again, it may also be helpful to allow for default configuration files to appear somewhere other than the user's home directory, for example, if you need to allow system-wide configuration.

A further consideration here is PAR , the tool for creating standalone executables from Perl programs. PAR lets you include data files as well as Perl code, so you can bundle a default settings file using a command such as:

pp hello -o hello.exe -a hello.conf

which will be available to your script as $ENV{PAR_TEMP}/inc/hello.conf.

I mentioned earlier that Getopt::ArgvFile can load arbitrary config files if the filename appears with the @ directive on the command line. Essentially, what the module does when loaded with:

use Getopt::ArgvFile home=>1;

is to prepend @ARGV with @$ENV{HOME}/.scriptname, then resolve all @ directives, leaving @ARGV with the contents of the files. This means that running the script as:

myscript --name=JJ

is basically equivalent to writing:

myscript @$ENV{HOME}/.myscript --name-JJ

To load other config files, Getopt::ArgvFile supports disabling the automatic @ARGV processing and triggering it later. With a little manipulation of @ARGV first, you can make:

myscript --name=JJ

equivalent to:

myscript @/path/to/default.conf @/path/to/system.conf @/path/to/user.conf \
    --name=JJ

which will load the set of config files in the correct priority order.

All you need to do to enable this feature is change the use statement to read:

use Getopt::ArgvFile qw/argvFile/;

Loading the module in this way tells Getopt::ArgvFile to export the function argvFile(), which your program needs to call to process the @ directives, and also prevents any automated processing from occurring.

Here's an example that first loads a config file from the application bundle (if packaged by PAR) and then from the directory containing the application binary:

use File::Basename qw/basename/;
use FindBin qw/$Bin/;
use Getopt::ArgvFile qw/argvFile/;

# Define config filename as <application_name>.conf
(my $configfile = basename($0)) =~ s/^(.*?)(?:\..*)?$/$1.conf/;

# Include config file from the same directory as the application binary
if (-e "$Bin/$configfile") {
  unshift @ARGV,'@'."$Bin/$configfile";
}

# If we have been packaged with PAR, include the config file from the
# application bundle
if ($ENV{PAR_TEMP} and -e "$ENV{PAR_TEMP}/inc/$configfile") {
  unshift @ARGV,'@'."$ENV{PAR_TEMP}/inc/$configfile";
}

argvFile();  # Process @ARGV to load specified config files

You can also use this technique together with File::HomeDir to access the user's application data directory in a cross-platform manner, so that the location of the config file conforms to the conventions set by the user's operating system.

Summary

Getopt::Long provides an easy to use, extensible system for processing command-line options. With the addition of Getopt::ArgvFile, you can seamlessly handle configuration files with almost no extra coding. Together, these modules should be first on your list when writing scripts that need any amount of configuration.

Visit the home of the Perl programming language: Perl.org

Sponsored by

Monthly Archives

Powered by Movable Type 5.13-en