Recently in CPAN Category

PDF Processing with Perl

Adobe's PDF has become a standard for text documents. Most office products can export their content into PDF. However, this software reaches its limits if you want advanced tasks such as combining different PDF documents into one single document or adding and adjusting the bookmarks panel for better navigation. Imagine that you want to collect all relevant Perl.com articles in one PDF file with an up-to-date bookmarks panel. You could use a tool like HTMLDOC, but adding article number 51 would require you to fetch articles one through 50 from the Web again. In most cases you would not be satisfied by the resulting bookmarks panel, either. This article shows how to use PDF::Reuse, by Lars Lundberg, for combining different PDF documents and adding bookmarks to them.

Example Material

Although its capabilities are limited in this area, you can also use PDF::Reuse to create PDF documents. If you want to create more sophisticated documents you should investigate other PDF-packages like PDF::API2 from Alfred Reibenschuh or Text::PDF from Martin Hosken. However PDF::Reuse is sufficient to create a simple PDF to use in later examples. The following listing should be rather self explanatory.

 # file: examples/create-pdfs.pl
  use strict;
  use PDF::Reuse;

  mkdir "out" if (!-e "out") ;

  foreach my $x (1..4) {
      prFile("out/file-$x.pdf");

      foreach my $y (1..10) {
          prText(35,800,"File: file-$x.pdf");
          prText(510,800,"Page: $y");

          foreach my $z (1..15) {
              prText(35,700-$z*16,"Line $z");
          }

          # add graphics with the prAdd function

          # stroke color
          prAdd("0.1 0.1 0.9 RG\n");

          # fill color
          prAdd("0.9 0.1 0.1 rg\n");

          my $pos = 750 - ($y * 40);

          prAdd("540 $pos 10 40 re\n");
          prAdd("B\n");

          if ($y < 10) {
              prPage();
          }
      }

      prEnd();
  }

Open a new file with with prFile($filename) and close it with prEnd. Between those two calls, add text with the prText command. You can also draw graphics by using the low level command prAdd with plain PDF markup as parameter. Start a new page with prPage. prFile starts the first one automatically, so you need to add a new page only if your document has more than one page. Be aware that for the prText(x,y,Text) command the origin of the coordinate system is on the left bottom of the page.

As an example of adding PDF markup with prAdd, the code creates a red rectangle with blue borders. In case you would like to add more graphics or complex graphics to your PDF, you can study the examples of the PDF reference manual. If so, consider switching to PDF::API2 or Text::PDF instead of using prAdd, as they both provide a comfortable layer of abstraction over the PDF markup language.

Combining PDF Documents

PDF::Reuse's main strength is the modification and reassembling of existing PDF documents. The next example assembles a new file from the example material.

  # file: examples/combine-pdfs.pl

  use strict;
  use PDF::Reuse;

  prFile("out/resultat.pdf");

  prDoc('out/file-1.pdf',1,4);
  prDoc('out/file-2.pdf',2,9);
  prDoc('out/file-3.pdf',8);
  prDoc('out/file-4.pdf');

  prEnd();

Again, prFile($filename) opens the file. Next, the prDoc($filename, $firstPage, $lastPage) calls add to the new file various page ranges from the example file. The arguments $firstPage and $lastPage are optional. Omit both to add the entire document. If only $firstPage is present, the call will add everything from that page to the end. Finally, prEnd closes the file.

Reusing Existing PDF Files

With PDF::Reuse it is possible to use existing PDF files as templates for creating new documents. Suppose that you have a file customer.txt containing a list of customers to whom to send a letter. You've used a tool to create a PDF document, such as OpenOffice.org or Adobe Acrobat, to produce the letter itself. Now you can write a short program to add the date and the names and addresses of your customers to the letter.

  # file: examples/reuse-letter.pl
  use PDF::Reuse;
  use Date::Formatter;
  use strict;

  my $date = Date::Formatter->now();
  $date->createDateFormatter("(DD).(MM). (YYYY)");

  my $n      =  1;
  my $incr   = 14;
  my $infile = 'examples/customer.txt';

  prFile("examples/sample-letters.pdf");

  prCompress(1);
  prFont('Arial');
  prForm("examples/sample-letter.pdf");

  open (my $fh, "<$infile") || die "Couldn't open $infile, $!\n aborts!\n";

  while (my $line = <$fh>)  {
      my $x = 60;
      my $y = 760;

      my ($first, $last, $street, $zipCode, $city, $country) = split(/,/, $line);
      last unless $country;

      prPage() if $n++ > 1 ;
      prText($x, $y, "$first $last");

      $y -= $incr;
      prText($x, $y, $street);

      $y -= $incr;
      prText($x, $y, $zipCode);
      prText(($x + 40), $y, $city);

      $y -= $incr;
      prText($x,   $y, $country);
      prText(60,  600, "Dear $first $last,");
      prText(400, 630, "Berlin, $date");
  }

  prEnd();
  close $fh;

After opening the file with prFile, the call to prCompress(1) enables PDF compression. prFont sets the file's font. The always-available options are Times-Roman, Times-Bold, Times-Italic, Times-BoldItalic, Courier, Courier-Bold, Courier-Oblique, Courier-BoldOblique, Helvetica, Helvetica-Bold, Helvetica-Oblique, and Helvetica-BoldOblique. Set the font size with prFontSize. The default font is Helvetica, with 12 pixel size.

The rest of the code is a simple loop over the file containing the customer data to filling the template with prText.

Adding Page Numbers

Sometimes you need only make a small change to a document, such as adding missing page numbers.

  # file: examples/sample-numbers.pl
  use PDF::Reuse;
  use strict;

  my $n = 1;

  prFile('examples/sample-numbers.pdf');

  while (1) {
     prText(550, 40, $n++);
     last unless prSinglePage('sample-letters.pdf');
  }

  prEnd();

prSinglePage takes one page after the other from an existing PDFdocument and returns the number of remaining pages after each invocation.

Low-Level PDF Commands

If you know low-level PDF instructions, you can add them with with the prAdd(string) subroutine. PDF::Reuse will perform no syntax checks on the instructions, so refer to the PDF reference manual. Here's an example of printing colored rectangles with the prAdd subroutine.

  # file: examples/sample-rectangle.pl
  use PDF::Reuse;
  use strict;

  prFile('examples/sample-rectangle.pdf');

  my $x = 40;
  my $y = 50;
  my @colors;

  foreach my $r (0..5) {
     foreach my $g (0..5) {
         foreach my $b (0..5) {
             push @colors,
                 sprintf("%1.1f %1.1f %1.1f rg\n",
                 $r * 0.2, $g * 0.2, $b * 0.2);
         }
     }
  }

  while (1) {
     if ($x > 500) {
         $x = 40; $y += 40;
         last unless @colors;
     }

     # a rectangle
     my $string = "$x $y 30 30 re\n";
     $string   .= shift @colors;

     # fill and stroke
     $string   .= "b\n";

     prAdd($string);

     $x += 40;
  }

  prEnd();

Adding Bookmarks

Working with PDF files becomes comfortable if the document has bookmarks with a table of contents-like structure. Some applications either can't provide the PDF document with bookmarks or support insufficient or incorrect bookmarks. PDF::Reuse can fill this gap with the prBookmark($reference) subroutine.

A bookmark reference is a hash or a array of hashes that looks like:

   {  text  => 'Document-Text',
             act   => 'this.pageNum = 0; this.scroll(40, 500);',
             kids  => [ { text => 'Chapter 1',
                          act  => '1, 40, 600'
                        },
                        { text => 'Chapter 2',
                          act  => '10, 40, 600'
                        }
                      ]
   }

...where act is a JavaScript action to trigger when someone clicks on the bookmark. Because those JavaScript actions only work in the Acrobat Reader but not in other PDF viewer applications, I will later show a improvement of PDF::Reuse that fixes this issue.

Other examples for using PDF::Reuse, including image embedding, are available in the PDF::Reuse::Tutorial.

A Console Application for Combining PDF Documents

To avoid editing the Perl code for combining PDF documents every time you want to merge documents, I've written a console application that takes the names of the input files and the page ranges for each file as arguments. That's easy to reuse in a graphical application using Perl/Tk, so I've put that code in a separate Perl module called CombinePDFs. The command-line application will interact with this package instead of directly working on PDF::Reuse. The following diagram shows the relationship between the Packages, example, and applications.

  Examples    |     Packages            |     applications
  -------------------------------------------------------------------
  combine.pdfs                           app-combine-console-pdfs.pl
              \                         /
               PDF::Reuse -- CombinePDFs
              /                         \
   create.pdfs                           app-combine-tk-pdfs.pl

The application app-combine-console-pdfs.pl does not deal directly with PDF::Reuse but parses the command line arguments with Getopt::Long written by Johan Vromans. This is the standard package for this task. Here it parses the input filenames and the page ranges into two arrays of same length. The user also has to supply a filename for the output and, optionally, a bookmarks file. The main subroutine that parses the command line arguments and executes CombinePDFs::createPDF is:

  sub main {
      GetOptions("infile=s"    => \@infiles,
                 "outfile=s"   => \$outfile,
                 "pages=s",    => \@pages,
                 'overwrite'   => \$overwrite,
                 'bookmarks:s' => \$bookmarks,
                 'help'        => \&help);
      help unless ((@infiles and $outfile and @pages) and @pages == @infiles);

      checkPages();
      checkFiles();
      checkBookmarks();

      CombinePDFs::createPDF(\@infiles, \@pages, $outfile, $bookmarks);
  }

If the user passes an insufficient number of arguments, invalid filenames, or incorrect page ranges, the code invokes the the usage subroutine. It also gets invoked if the user asks explicitly for -help on the command line. Any good command line application should be written that way. Getopt::Long can distinguish between mandatory arguments, with = as the symbol after the argument name (infile, pages), optional arguments, with : (bookmarks), or flags (overwrite, usage), without a symbol. It can store these arguments as arrays (infile, pages), hashes, or scalars. It also supports type checking.

CombinePDFs Package

The application itself mainly performs error checking. If everything is fine, it calls the CombinePDFs::createPDF subroutine, passing the array of input files, the array of page ranges, and the bookmarks information. The bookmarks scalar is optional.

Page ranges can be comma-separated ranges (1-11,14,17-23), single pages, or the all token. You can include the same page several times in the same document.

The file-checking code looks for read permissions and tests if the file is a PDF document by using the CombinePDFs::isPDF($filename) subroutine. Although PDF, by Antonio Rosella, also provides such a method, this package was not developed with the use strict pragma and gives a lot of warnings. Furthermore, the package is not actively maintained, so there seems to be no chance to fix this in the near future. Implementing the isPDF subroutine is quite simple; it reads the first line of the PDF file and checks for the magic string %PDF-1.[0-9] in the first line of the document.

Please note that PDF::Reuse is not an object oriented package. Therefore the CombinePDFs package is not object oriented, either. A user of this package could create several instances, but all instances work on the same PDF file.

Submitting complex data structures via the command line is a difficult issue, so I decided that bookmarks should come from a text file. This file has a simple markup to reflect a tree structure, where each line resembles:

 <level> "bookmarks text" <page>

The level starts with 0 for root bookmarks. Children of the root bookmarks have a level of 1, their children a level of 2, and so on. Currently, the system supports bookmarks up to three levels of nesting:

 0 "Folder File 1 - Page 1" 1
 1 "File 1 - Page 2" 2
 1 "Subfolder File 1 - Page 3" 3
 2 "File 1 - Page 4"  4
 0 "Folder File 2 - Page 7 " 7
 1 "File 2 - Page 7" 7
 1 "File 2 - Page 9" 9

The parsing subroutine for the bookmarks file CombinePDFs::addBookmarks($filename) should be easy to understand, though that's not necessarily true of the complex data structure created inside this subroutine.

Bookmarks are an array of hashes. addBookmarks uses several attributes. text is the title of the entry in the bookmarks panel. act is the action to trigger when someone clicks the entry. Here it is the page number to open. kids contains a reference to the children of this bookmark entry. During the loop over the file content, the code searches for each level the last entry in a variable and pushes its related children on those last entries. The root bookmarks get collected as an array, and the loop adds the children as a reference to an array, and so on for the grand children. The result is a nested complex data structure which stores all children in the kids attribute of the parent's bookmarks hash—an array of hashes containing other arrays of hashes and so on.

The parsing subroutine for the bookmarks file CombinePDFs::addBookmarks($filename) collects bookmarks in a array of hashes. At the end, it adds the bookmarks to the document with prBookmarks($reference). All of this means that you can use a bookmarks file with the PDF file with a command line like:

   $ perl bin/app-combine-pdfs.pl \
      --infile out/file-1.pdf --pages 1-6 \
      --infile out/file-2.pdf --pages 1-4,7,9-10 \
      --bookmarks out/bookmarks.cnt \
      --outfile file-all.pdf --overwrite

Currently, you must open the document's navigation panel manually because PDF::Reuse does not yet allow you to declare a default view, whether full screen or panel view. This is easy to fix, and the author Lars Lundberg has promised me to do so in a next release of PDF::Reuse. In order to enable this feature until a new release will appear I included a modified version of PDF::Reuse in the examples zip file that accompanies this article.

Furthermore, the bookmarks use JavaScript functions. To use the bookmarks in PDF viewers other than Acrobat Reader, my patched PDF::Reuse package replaces JavaScript bookmarks with PDF specification compliant bookmarks. To do that, replace the act key with a page key using the appropiate page number and scroll options:

 $bookmarks = {  text  => 'Document',
                 page  => '0,40,50;',
                 kids  => [ { text => 'Chapter 1',
                          page  => '1, 40, 600'
                        },
                        { text => 'Chapter 2',
                          page  => '10, 40, 600'
                        }
                      ]
          }

Then print the bookmarks to the PDF document as usual with prBookmark($bookmarks);.

Tk Application to Combine PDF Documents

Console applications are fine for experienced users, but you can't expect that all users belong to this category. Therefore it might be worth it to write a GUI for combining PDF documents. The Perl/Tk toolkit founded on the old Tix widgets for Tcl/Tk is not very modern, although this might change with the Tcl/Tk release 8.5 and the Tile widgets—but it is very portable. That's why I used it for the GUI example. Because I put a layer between the PDF::Reuse package and the command line application with the CombinePDFs package, it was easy to reuse those parts in the Tk-application app-combine-tk-pdfs.pl.

With the Tk application, the user visually selects PDF files, orders the files in a Tk::Tree widget, and changes the page ranges and the bookmarks text in Tk::Entry fields. Furthermore, the application can store the resulting tree structure inside a session file and restored that later on. It's also possible to copy and paste entries inside the tree, which makes it easy to create a bookmarks panel for single files without using bookmark files. The Tk application can be found in the download at the end of this article.

Beside the final PDF file, the application creates a file with the same basename and the .cnt extension. This file contains the bookmarks for the PDF. It's also useful to continue the processing of the combined PDF file instead of reassembling all the source files again. The entry for this feature is File->Load Bookmarks-File.

When loading a bookmarks file, the same extension convention is in place.

Other PDF Packages on CPAN

I like PDF::Reuse, but there are several other options for PDF creation and manipulation on the CPAN.

  • PDF::API2, by Alfred Reibenschuh, is actively maintained. It is the package of choice if creating new PDF documents from scratch.
  • PDF::API2::Simple, by Red Tree Systems, is a wrapper over the PDF::API2 module for users who find the PDF::API2 module to difficult to use.
  • Text::PDF, by Martin Hosken, can work on more than PDF file at the same time and has Truetype font support.
  • CAM::PDF, by Clotho Advanced Media, is like PDF::Reuse more focused on reading and manipulating existing PDF documents. However, it can work on multiple files at the same time. Use it if you need more features than PDF::Reuse actually provides.

Conclusions

PDF::Reuse is a well-written and well-documented package, which makes it easy to create, combine, and change existing PDF documents. The two sample applications show some of its capabilities. Two limitations should be mentioned however, PDF::Reuse can't reuse existing bookmarks, and after combining different PDF documents some of the inner document hyperlinks might stop working properly. The example source code for the applications, packages, and the modified PDF::Reuse is available.

Making Perl Reusable with Modules

Perl software development can occur at several levels. When first developing the idea for an application, a Perl developer may start with a short program to flesh out the necessary algorithms. After that, the next step might be to create a package to support object-oriented development. The final work is often to create a Perl module for the package to make the logic available to all parts of the application. Andy Sylvester explores this topic with a simple mathematical function.

Creating a Perl Subroutine

I am working on ideas for implementing some mathematical concepts for a method of composing music. The ideas come from the work of Joseph Schillinger. At the heart of the method is being able to generate patterns using mathematical operations and using those patterns in music composition. One of the basic operations described by Schillinger is creating a "resultant," or series of numbers, based on two integers (or "generators"). Figure 1 shows a diagram of how to create the resultant of the integers 5 and 3.

creating the resultant of 5 and 3
Figure 1. Creating the resultant of 5 and 3

Figure 1 shows two line patterns with units of 5 and units of 3. The lines continue until both lines come down (or "close") at the same time. The length of each line corresponds to the product of the two generators (5 x 3 = 15). If you draw dotted lines down from where each of the two generator lines change state, you can create a third line that changes state at each of the dotted line points. The lengths of the segments of the third line make up the resultant of the integers 5 and 3 (3, 2, 1, 3, 1, 2, 3).

Schillinger used graph paper to create resultants in his System of Musical Composition. However, another convenient way of creating a resultant is to calculate the modulus of a counter and then calculate a term in the resultant series based on the state of the counter. An algorithm to create the terms in a resultant might resemble:

Read generators from command line
Determine total number of counts for resultant
   (major_generator * minor_generator)
Initialize resultant counter = 0
For MyCounts from 1 to the total number of counts
   Get the modulus of MyCounts to the major and minor generators
   Increment the resultant counter
   If either modulus = 0
     Save the resultant counter to the resultant array
     Re-initialize resultant counter = 0
   End if
End for

From this design, I wrote a short program using the Perl modulus operator (%):

#!/usr/bin/perl
#*******************************************************
#
# FILENAME: result01.pl
#
# USAGE: perl result01.pl major_generator minor_generator
#
# DESCRIPTION:
#    This Perl script will generate a Schillinger resultant
#    based on two integers for the major generator and minor
#    generator.
#
#    In normal usage, the user will input the two integers
#    via the command line. The sequence of numbers representing
#    the resultant will be sent to standard output (the console
#    window).
#
# INPUTS:
#    major_generator - First generator for the resultant, input
#                      as the first calling argument on the
#                      command line.
#
#    minor_generator - Second generator for the resultant, input
#                      as the second calling argument on the
#                      command line.
#
# OUTPUTS:
#    resultant - Sequence of numbers written to the console window
#
#**************************************************************

   use strict;
   use warnings;

   my $major_generator = $ARGV[0];
   my $minor_generator = $ARGV[1];

   my $total_counts   = $major_generator * $minor_generator;
   my $result_counter = 0;
   my $major_mod      = 0;
   my $minor_mod      = 0;
   my $i              = 0;
   my $j              = 0;
   my @resultant;

   print "Generator Total = $total_counts\n";

   while ($i < $total_counts) {
       $i++;
       $result_counter++;
       $major_mod = $i % $major_generator;
       $minor_mod = $i % $minor_generator;
       if (($major_mod == 0) || ($minor_mod == 0)) {
          push(@resultant, $result_counter);
          $result_counter = 0;
       }
       print "$i \n";
       print "Modulus of $major_generator is $major_mod \n";
       print "Modulus of $minor_generator is $minor_mod \n";
   }

   print "\n";
   print "The resultant is @resultant \n";

Run the program with 5 and 3 as the inputs (perl result01.pl 5 3):

Generator Total = 15
1
Modulus of 5 is 1
Modulus of 3 is 1
2
Modulus of 5 is 2
Modulus of 3 is 2
3
Modulus of 5 is 3
Modulus of 3 is 0
4
Modulus of 5 is 4
Modulus of 3 is 1
5
Modulus of 5 is 0
Modulus of 3 is 2
6
Modulus of 5 is 1
Modulus of 3 is 0
7
Modulus of 5 is 2
Modulus of 3 is 1
8
Modulus of 5 is 3
Modulus of 3 is 2
9
Modulus of 5 is 4
Modulus of 3 is 0
10
Modulus of 5 is 0
Modulus of 3 is 1
11
Modulus of 5 is 1
Modulus of 3 is 2
12
Modulus of 5 is 2
Modulus of 3 is 0
13
Modulus of 5 is 3
Modulus of 3 is 1
14
Modulus of 5 is 4
Modulus of 3 is 2
15
Modulus of 5 is 0
Modulus of 3 is 0

The resultant is 3 2 1 3 1 2 3

This result matches the resultant terms as shown in the graph in Figure 1, so it looks like the program generates the correct output.

Creating a Perl Package from a Program

With a working program, you can create a Perl package as a step toward being able to reuse code in a larger application. The initial program has two pieces of input data (the major generator and the minor generator). The single output is the list of numbers that make up the resultant. These three pieces of data could be combined in an object. The program could easily become a subroutine to generate the terms in the resultant. This could be a method in the class contained in the package. Creating a class implies adding a constructor method to create a new object. Finally, there should be some methods to get the major generator and minor generator from the object to use in generating the resultant (see the perlboot and perltoot tutorials for background on object-oriented programming in Perl).

From these requirements, the resulting package might be:

#!/usr/bin/perl
#*******************************************************
#
# Filename: result01a.pl
#
# Description:
#    This Perl script creates a class for a Schillinger resultant
#    based on two integers for the major generator and the
#    minor generator.
#
# Class Name: Resultant
#
# Synopsis:
#
# use Resultant;
#
# Class Methods:
#
#   $seq1 = Resultant ->new(5, 3)
#
#      Creates a new object with a major generator of 5 and
#      a minor generator of 3. These parameters need to be
#      initialized when a new object is created, as there
#      are no methods to set these elements within the object.
#
#   $seq1->generate()
#
#      Generates a resultant and saves it in the ResultList array
#
# Object Data Methods:
#
#   $major_generator = $seq1->get_major()
#
#      Returns the major generator
#
#   $minor_generator = $seq1->get_minor()
#
#      Returns the minor generator
#
#
#**************************************************************

{ package Resultant;
  use strict;
  sub new {
    my $class           = shift;
    my $major_generator = shift;
    my $minor_generator = shift;

    my $self = {Major => $major_generator,
                Minor => $minor_generator,
                ResultantList => []};

    bless $self, $class;
    return $self;
  }

  sub get_major {
    my $self = shift;
    return $self->{Major};
  }

  sub get_minor {
    my $self = shift;
    return $self->{Minor};
  }

  sub generate {
    my $self         = shift;
    my $total_counts = $self->get_major * $self->get_minor;
    my $i            = 0;
    my $major_mod;
    my $minor_mod;
    my @result;
    my $result_counter = 0;

   while ($i < $total_counts) {
       $i++;
       $result_counter++;
       $major_mod = $i % $self->get_major;
       $minor_mod = $i % $self->get_minor;

       if (($major_mod == 0) || ($minor_mod == 0)) {
          push(@result, $result_counter);
          $result_counter = 0;
       }
   }

   @{$self->{ResultList}} = @result;
  }
}

#
# Test code to check out class methods
#

# Counter declaration
my $j;

# Create new object and initialize major and minor generators
my $seq1 = Resultant->new(5, 3);

# Print major and minor generators
print "The major generator is ", $seq1->get_major(), "\n";
print "The minor generator is ", $seq1->get_minor(), "\n";

# Generate a resultant
$seq1->generate();

# Print the resultant
print "The resultant is ";
foreach $j (@{$seq1->{ResultList}}) {
  print "$j ";
}
print "\n";

Execute the file (perl result01a.pl):

The major generator is 5
The minor generator is 3
The resultant is 3 2 1 3 1 2 3

This output text shows the same resultant terms as produced by the first program.

Creating a Perl Module

From a package, you can create a Perl module to make the package fully reusable in an application. Also, you can modify our original test code into a series of module tests to show that the module works the same as the standalone package and the original program.

I like to use the Perl module Module::Starter to create a skeleton module for the package code. To start, install the Module::Starter module and its associated modules from CPAN, using the Perl Package Manager, or some other package manager. To see if you already have the Module::Starter module installed, type perldoc Module::Starter in a terminal window. If the man page does not appear, you probably do not have the module installed.

Select a working directory to create the module directory. This can be the same directory that you have been using to develop your Perl program. Type the following command (though with your own name and email address):

$ module-starter --module=Music::Resultant --author="John Doe" \
    --email=john@johndoe.com

Perl should respond with:

Created starter directories and files

In the working directory, you should see a folder or directory called Music-Resultant. Change your current directory to Music-Resultant, then type the commands:

$ perl Makefile.PL
$ make

These commands will create the full directory structure for the module. Now paste the text from the package into the module template at Music-Resultant/lib/Music/Resultant.pm. Open Resultant.pm in a text editor and paste the subroutines from the package after the lines:

=head1 FUNCTIONS

=head2 function1

=cut

When you paste the package source code, remove the opening brace from the package, so that the first lines appear as:

 package Resultant;
  sub new {
    use strict;
    my $class = shift;

and the last lines of the source appears without the the final closing brace as:

   @{$self->{ResultList}} = @result;
  }

After making the above changes, save Resultant.pm. This is all that you need to do to create a module for your own use. If you eventually release your module to the Perl community or upload it to CPAN, you should do some more work to prepare the module and its documentation (see the perlmod and perlmodlib documentation for more information).

After modifying Resultant.pm, you need to install the module to make it available for other Perl applications. To avoid configuration issues, install the module in your home directory, separate from your main Perl installation.

  1. In your home directory, create a lib/ directory, then create a perl/ directory within the lib/ directory. The result should resemble:

    /home/myname/lib/perl
  2. Go to your module directory (Music-Resultant) and re-run the build process with a directory path to tell Perl where to install the module:

    $ perl Makefile.PL LIB=/home/myname/lib/perl $
    make install

    Once this is complete, the module will be installed in the directory.

The final step in module development is to add tests to the .t file templates created in the module directory. The Perl distribution includes several built-in test modules, such as Test::Simple and Test::More to help test Perl subroutines and modules.

To test the module, open the file Music-Resultant/t/00-load.t. The initial text in this file is:

#!perl -T

use Test::More tests => 1;

BEGIN {
    use_ok( 'Music::Resultant' );
}

diag( "Testing Music::Resultant $Music::Resultant::VERSION, Perl $], $^X" );

You can run this test file from the t/ directory using the command:

perl -I/home/myname/lib/perl -T 00-load.t

The -I switch tells the Perl interpreter to look for the module Resultant.pm in your alternate installation directory. The directory path must immediately follow the -I switch, or Perl may not search your alternate directory for your module. The -T switch is necessary because there is a -T switch in the first line of the test script, which turns on taint checking. (Taint checking only works when enabled at Perl startup; perl will exit with an error if you try to enable it later.) Your results should resemble the following(your Perl version may be different).

1..1
ok 1 - use Music::Resultant;
# Testing Music::Resultant 0.01, Perl 5.008006, perl

The test code from the second listing is easy to convert to the format used by Test::More. Change the number at the end of the tests line from 1 to 4, as you will be adding three more tests to this file. The template file has an initial test to show that the module exists. Next, add tests after the BEGIN block in the file:

# Test 2:
my $seq1 = Resultant->new(5, 3);  # create an object
isa_ok ($seq1, Resultant);        # check object definition

# Test 3: check major generator
my $local_major_generator = $seq1->get_major();
is ($local_major_generator, 5, 'major generator is correct' );

# Test 4: check minor generator
my $local_minor_generator = $seq1->get_minor();
is ($local_minor_generator, 3, 'minor generator is correct' );

To run the tests, retype the earlier command line in the Music-Resultant/ directory:

$ perl -I/home/myname/lib/perl -T t/00-load.t

You should see the results:

1..4
ok 1 - use Music::Resultant;
ok 2 - The object isa Resultant
ok 3 - major generator is correct
ok 4 - minor generator is correct
# Testing Music::Resultant 0.01, Perl 5.008006, perl

These tests create a Resultant object with a major generator of 5 and a minor generator of 3 (Test 2), and check to see that the major generator in the object is correct (Test 3), and that the minor generator is correct (Test 4). They do not cover the resultant terms. One way to check the resultant is to add the test code used in the second listing to the .t file:

# Generate a resultant
$seq1->generate();

# Print the resultant
my $j;
print "The resultant is ";
foreach $j (@{$seq1->{ResultList}}) {
  print "$j ";
}
print "\n";

You should get the following results:

1..4
ok 1 - use Music::Resultant;
ok 2 - The object isa Resultant
ok 3 - major generator is correct
ok 4 - minor generator is correct
The resultant is 3 2 1 3 1 2 3
# Testing Music::Resultant 0.01, Perl 5.008006, perl

That's not valid test output, so it needs a little bit of manipulation. To check the elements of a list using a testing function, install the Test::Differences module and its associated modules from CPAN, using the Perl Package Manager, or some other package manager. To see if you already have the Test::Differences module installed, type perldoc Test::Differences in a terminal window. If the man page does not appear, you probably do not have the module installed.

Once that module is part of your Perl installation, change the number of tests from 4 to 5 on the Test::More statement line and add a following statement after the use Test::More statement:

use Test::Differences;

Finally, replace the code that prints the resultant with:

# Test 5: (uses Test::Differences and associated modules)
$seq1->generate();
my @result   = @{$seq1->{ResultList}};
my @expected = (3, 2, 1, 3, 1, 2, 3);
eq_or_diff \@result, \@expected, "resultant terms are correct";

Now when the test file runs, you can confirm that the resultant is correct:

1..5
ok 1 - use Music::Resultant;
ok 2 - The object isa Resultant
ok 3 - major generator is correct
ok 4 - minor generator is correct
ok 5 - resultant terms are correct
# Testing Music::Resultant 0.01, Perl 5.008006, perl

Summary

There are multiple levels of Perl software development. Once you start to create modules to enable reuse of your Perl code, you will be able to leverage your effort into larger applications. By using Perl testing modules, you can ensure that your code works the way you expect and provide a way to ensure that the modules continue to work as you add more features.

Resources

Here are some other good resources on creating Perl modules:

Here are some good resources for using Perl testing modules like Test::Simple and Test::More:

  • Test::Tutorial gives the basics of using Test:Simple and Test::More.
  • An Introduction to Testing presents the benefits of developing tests and code at the same time, and provides a variety of examples.

Option and Configuration Processing Made Easy

When you first fire up your editor and start writing a program, it's tempting to hardcode any settings or configuration so you can focus on the real task of getting the thing working. But as soon as you have users, even if the user is only yourself, you can bet there will be things they want to choose for themselves.

A search on CPAN reveals almost 200 different modules dedicated to option processing and handling configuration files. By anyone's standards that's quite a lot, certainly too many to evaluate each one.

Luckily, you already have a great module right in front of you for handling options given on the command line: Getopt::Long, which is a core module included as standard with Perl. This lets you use the standard double-dash style of option names:

myscript --source-directory "/var/log/httpd" --verbose \ --username=JJ

Using Getopt::Long

When your program runs, any command-line arguments will be in the @ARGV array. Getopt::Long exports a function, GetOptions(), which processes @ARGV to do something useful with these arguments, such as set variables or run blocks of code. To allow specific option names, pass a list of option specifiers in the call to GetOptions() together with references to the variables in which you want the option values to be stored.

As an example, the following code defines two options, --run and --verbose. The call to GetOptions() will then assign the value 1 to the variables $run and $verbose respectively if the relevant option is present on the command line.

use Getopt::Long;
my ($run,$verbose);
GetOptions( 'run'     => \$run,
             'verbose' => \$verbose );

When Getopt::Long has finished processing options, any remaining arguments will remain in @ARGV for your script to handle (for example, specified filenames). If you use this example code and call your script as:

myscript --run --verbose file1 file2 file3

then after GetOptions() has been called the @ARGV array will contain the values file1, file2, and file3.

Types of Command-Line Options

The option specifier provided to GetOptions() controls not only the option name, but also the option type. Getopt::Long gives a lot of flexibility in the types of option you can use. It supports Boolean switches, incremental switches, options with single values, options with multiple values, and even options with hash values.

Some of the most common specifiers are:

name     # Presence of the option will set $name to 1
name!    # Allows negation, e.g. --name will set $name to 1,
         #    --noname will set $name to 0
name+    # Increments the variable each time the option is found, e.g.
         # if $name = 0 then --name --name --name will set $name to 3
name=s   # String value required
         #    --name JJ or --name=JJ will set $name to JJ
         # Spaces need to be quoted
         #    --name="Jon Allen" or --name "Jon Allen"

So, to create an option that requires a string value, format the call to GetOptions() like this:

my $name;
GetOptions( 'name=s' => \$name );

The value is required. If the user omits it, as in:

myscript --name

then the call to GetOptions() will die() with an appropriate error message.

Options with Multiple Values

The option specifier consists of four components: the option name; data type (Boolean, string, integer, etc.); whether to expect a single value, a list, or a hash; and the minimum and maximum number of values to accept. To require a list of string values, build up the option specifier:

Option name:   name
Option value:  =s    (string)
Option type:   @     (array)
Value counter: {1,}  (at least 1 value required, no upper limit)

Putting these all together gives:

my $name;
GetOptions('name=s@{1,}' => \$name);

Now invoking the script as:

myscript --name Barbie Brian Steve

will set $name to the array reference ['Barbie','Brian','Steve'].

Giving a hash value to an option is very similar. Replace @ with % and on the command line give arguments as key=value pairs:

my $name;
GetOptions('name=s%{1,}',\$name);

Running the script as:

myscript --name Barbie=Director JJ=Member

will store the hash reference { Barbie => 'Director', JJ => 'Member' } in $name.

Storing Options in a Hash

By passing a hash reference as the first argument to GetOptions, you can store the complete set of option values in a hash instead of defining a separate variable for each one.

my %options;
GetOptions( \%options, 'name=s', 'verbose' );

Option names will be hash keys, so you can refer to the name value as $options{name}. If an option is not present on the command line, then the corresponding hash key will not be present.

Options that Invoke Subroutines

A nice feature of Getopt::Long is that, as an alternative to simply setting a variable when an option is found, you can tell the module to run any code of your choosing. Instead of giving GetOptions() a variable reference to store the option value, pass either a subroutine reference or an anonymous code reference. This will then be executed if the relevant option is found.

GetOptions( version => sub{ print "This is myscript, version 0.01\n"; exit; }
            help    => \&display_help );

When used in this way, Getopt::Long also passes the option name and value as arguments to the subroutine:

GetOptions( name => sub{ my ($opt,$value) = @_; print "Hello, $value\n"; } );

You can still include code references in the call to GetOptions() even if you use a hash to store the option values:

my %options;
GetOptions( \%options, 'name=s', 'verbose', 'dest=s',
            'version' => sub{ print "This is myscript, version 0.01\n"; exit; } );

Dashes or Underscores?

If you need to have option names that contain multiple words, such as a setting for "Source directory," you have a few different ways to write them:

--source-directory
--source_directory
--sourcedirectory

To give a better user experience, Getopt::Long allows option aliases to allow either format. Define an alias by using the pipe character (|) in the option specifier:

my %options;
GetOptions( \%options, 'source_directory|source-directory|sourcedirectory=s' );

Note that if you're storing the option values in a hash, the first option name (in this case, source_directory) will be the hash key, even if your user gave an alias on the command line.

If you have a lot of options, it might be helpful to generate the aliases using a function:

use strict;
use warnings;
use Data::Dumper;
use Getopt::Long;

my %specifiers = ( 'source-directory' => '=s',
                   'verbose'          => '' );
my %options;
GetOptions( \%options, optionspec(%specifiers) );

print Dumper(\%options);

sub optionspec {
  my %option_specs = @_;
  my @getopt_list;

  while (my ($option_name,$spec) = each %option_specs) {
    (my $variable_name = $option_name) =~ tr/-/_/;
    (my $nospace_name  = $option_name) =~ s/-//g;
    my  $getopt_name   = ($variable_name ne $option_name)
        ? "$variable_name|$option_name|$nospace_name" : $option_name;

    push @getopt_list,"$getopt_name$spec";
  }

  return @getopt_list;
}

Running this script with each format in turn shows that they are all valid:

varos:~/writing/argvfile jj$ ./optionspec.pl --source-directory /var/spool
$VAR1 = {
          'source_directory' => '/var/spool'
        };

varos:~/writing/argvfile jj$ ./optionspec.pl --source_directory /var/spool
$VAR1 = {
          'source_directory' => '/var/spool'
        };

varos:~/writing/argvfile jj$ ./optionspec.pl --sourcedirectory /var/spool
$VAR1 = {
          'source_directory' => '/var/spool'
        };

Additionally, Getopt::Long is case-insensitive by default (for option names, not values), so your users can also use --SourceDirectory, --sourceDirectory, etc., as well:

varos:~/writing/argvfile jj$ ./optionspec.pl --SourceDirectory /var/spool
$VAR1 = {
          'source_directory' => '/var/spool'
        };

Configuration Files

The next stage on from command-line options is to let your users save their settings into config files. After all, if your program expands to have numerous options it's going to be a real pain to type them in every time.

When it comes to the format of a configuration file, there are a lot of choices, such as XML, INI files, and the Apache httpd.conf format. However, all of these formats share a couple of problems. First, your users now have two things to learn: the command-line options and the configuration file syntax. Second, even though many CPAN modules are available to parse the various config file formats, you still must write the code in your program to interact with your chosen module's API to set whatever variables you use internally to store user settings.

Getopt::ArgvFile to the Rescue

Fortunately, someone out there in CPAN-land has the answer (you can always count on the Perl community to come up with innovative solutions). Getopt::ArgvFile tackles both of these problems, simplifying the file format and the programming interface in one fell swoop.

To start with, the file format used by Getopt::ArgvFile is extremely easy for users to understand. Config settings are stored in a plain text file that holds exactly the same directives that a user would type on the command line. Instead of typing:

myscript --source-directory /usr/local/src --verbose --logval=alert

your user can use the config file:

--source-directory /usr/local/src
--verbose
--logval=alert

and then run myscript for instant user gratification with no steep learning curve.

Now to the clever part. Getopt::ArgvFile itself doesn't actually care about the contents of the config file. Instead, it makes it appear to your program that all the settings were actually options typed on the command line--the processing of which you've already covered with Getopt::Long. As well as saving your users time by not making them learn a new syntax, you've also saved yourself time by not needing to code against a different API.

The most straightforward method of using Getopt::ArgvFile involves simply including the module in a use statement:

use Getopt::ArgvFile home=>1;

A program called myscript that contains this code will search the user's home directory (whatever the environment variable HOME is set to) for a config file called .myscript and extract the contents ready for processing by Getopt::Long.

Here's a complete example:

use strict;
use warnings;
use Getopt::ArgvFile home=>1;
use Getopt::Long;

my %config;
GetOptions( \%config, 'name=s' );

if ($config{name}) {
  print "Hello, $config{name}\n";
} else {
  print "Who am I talking to?\n";
}

Save this as hello, then run the script with and without a command-line option:

varos:~/writing/argvfile jj$ ./hello
Who am I talking to?

varos:~/writing/argvfile jj$ ./hello --name JJ
Hello, JJ

Now, create a settings file called .hello in your home directory containing the --name option. Remember to double quote the value if you want to include spaces.

varos:~/writing/argvfile jj$ cat ~/.hello
--name "Jon Allen"

Running the script without any arguments on the command line will show that it loaded the config file, but you can also override the saved settings by giving the option on the command line as normal.

varos:~/writing/argvfile jj$ ./hello
Hello, Jon Allen

varos:~/writing/argvfile jj$ ./hello --name JJ
Hello, JJ

Advanced Usage

In many cases the default behaviour invoked by loading the module will be all you need, but Getopt::ArgvFile can also cater to more specific requirements.

User-Specified Config Files

Suppose your users want to save different sets of options and specify which one to use when they run your program. This is possible using the @ directive on the command line:

varos:~/writing/argvfile jj$ cat jj.conf
--name JJ

varos:~/writing/argvfile jj$ ./hello
Hello, Jon Allen

varos:~/writing/argvfile jj$ ./hello @jj.conf
Hello, JJ

Note that there's no extra programming required to use this feature; handling @ options is native to Getopt::ArgvFile.

Changing the Default Config Filename or Location

Depending on your target audience, the naming convention offered by Getopt::ArgvFile for config files might not be appropriate. Using a dotfile (.myscript) will render your user's config file invisible in his file manager or when listing files at the command prompt, so you may wish to use a name like myscript.conf instead.

Again, it may also be helpful to allow for default configuration files to appear somewhere other than the user's home directory, for example, if you need to allow system-wide configuration.

A further consideration here is PAR , the tool for creating standalone executables from Perl programs. PAR lets you include data files as well as Perl code, so you can bundle a default settings file using a command such as:

pp hello -o hello.exe -a hello.conf

which will be available to your script as $ENV{PAR_TEMP}/inc/hello.conf.

I mentioned earlier that Getopt::ArgvFile can load arbitrary config files if the filename appears with the @ directive on the command line. Essentially, what the module does when loaded with:

use Getopt::ArgvFile home=>1;

is to prepend @ARGV with @$ENV{HOME}/.scriptname, then resolve all @ directives, leaving @ARGV with the contents of the files. This means that running the script as:

myscript --name=JJ

is basically equivalent to writing:

myscript @$ENV{HOME}/.myscript --name-JJ

To load other config files, Getopt::ArgvFile supports disabling the automatic @ARGV processing and triggering it later. With a little manipulation of @ARGV first, you can make:

myscript --name=JJ

equivalent to:

myscript @/path/to/default.conf @/path/to/system.conf @/path/to/user.conf \
    --name=JJ

which will load the set of config files in the correct priority order.

All you need to do to enable this feature is change the use statement to read:

use Getopt::ArgvFile qw/argvFile/;

Loading the module in this way tells Getopt::ArgvFile to export the function argvFile(), which your program needs to call to process the @ directives, and also prevents any automated processing from occurring.

Here's an example that first loads a config file from the application bundle (if packaged by PAR) and then from the directory containing the application binary:

use File::Basename qw/basename/;
use FindBin qw/$Bin/;
use Getopt::ArgvFile qw/argvFile/;

# Define config filename as <application_name>.conf
(my $configfile = basename($0)) =~ s/^(.*?)(?:\..*)?$/$1.conf/;

# Include config file from the same directory as the application binary
if (-e "$Bin/$configfile") {
  unshift @ARGV,'@'."$Bin/$configfile";
}

# If we have been packaged with PAR, include the config file from the
# application bundle
if ($ENV{PAR_TEMP} and -e "$ENV{PAR_TEMP}/inc/$configfile") {
  unshift @ARGV,'@'."$ENV{PAR_TEMP}/inc/$configfile";
}

argvFile();  # Process @ARGV to load specified config files

You can also use this technique together with File::HomeDir to access the user's application data directory in a cross-platform manner, so that the location of the config file conforms to the conventions set by the user's operating system.

Summary

Getopt::Long provides an easy to use, extensible system for processing command-line options. With the addition of Getopt::ArgvFile, you can seamlessly handle configuration files with almost no extra coding. Together, these modules should be first on your list when writing scripts that need any amount of configuration.

Visit the home of the Perl programming language: Perl.org

Sponsored by

Powered by Movable Type 5.02