PDF Processing with Perl
by Detlef Groth
Adding Page Numbers
Sometimes you need to make only a small change to a document, such as adding missing page numbers.
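# file: examples/sample-numbers.pl
use strict;
use warnings;
use PDF::Reuse;
my $n = 1;
prFile('examples/sample-numbers.pdf');
while (1) {
    # stamp the page number near the lower right corner of the current page
    prText(550, 40, $n++);
    # import the next page; the loop ends when no pages remain
    last unless prSinglePage('sample-letters.pdf');
}
prEnd();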
prSinglePage takes one page after the other from an existing PDF document and returns the number of remaining pages after each invocation, so the loop stops once the last page has been added.
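The control flow is easy to misread: each prText call writes onto the current page, and the following prSinglePage call completes that page and reports how many pages are left. The sketch below simulates that contract in plain Perl so it runs without PDF::Reuse. The subroutine simulate_numbering and its closure are hypothetical stand-ins, not part of the PDF::Reuse API.

```perl
use strict;
use warnings;

# Hypothetical simulation of the numbering loop: instead of calling
# prText and prSinglePage, it records the number that would be stamped
# on each page of a document with $total_pages pages.
sub simulate_numbering {
    my ($total_pages) = @_;
    my $remaining = $total_pages;
    my $n = 1;
    my @stamped;

    # Stand-in for prSinglePage: "imports" one page, returns pages left.
    my $single_page = sub { return --$remaining };

    while (1) {
        push @stamped, $n++;             # like prText(550, 40, $n++)
        last unless $single_page->();    # stop when 0 pages remain
    }
    return @stamped;
}

print join(' ', simulate_numbering(4)), "\n";    # prints "1 2 3 4"
```

Because the test for remaining pages comes after the text is written, every imported page receives exactly one number, including the last one.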
Low-Level PDF Commands
If you know low-level PDF instructions, you can add them with the prAdd(string) subroutine. PDF::Reuse performs no syntax checks on these instructions, so consult the PDF reference manual. Here's an example that prints colored rectangles with prAdd.
# file: examples/sample-rectangle.pl
use strict;
use warnings;
use PDF::Reuse;
prFile('examples/sample-rectangle.pdf');
my $x = 40;
my $y = 50;
my @colors;
# 6 x 6 x 6 = 216 fill colors, each component 0.0, 0.2, ..., 1.0
foreach my $r (0..5) {
    foreach my $g (0..5) {
        foreach my $b (0..5) {
            push @colors,
                sprintf("%1.1f %1.1f %1.1f rg\n",
                        $r * 0.2, $g * 0.2, $b * 0.2);
        }
    }
}
while (my $color = shift @colors) {
    # set the fill color before the path starts;
    # color operators are not allowed inside a path object
    my $string = $color;
    # a 30 x 30 rectangle at ($x, $y)
    $string .= "$x $y 30 30 re\n";
    # close, fill and stroke the path
    $string .= "b\n";
    prAdd($string);
    # 12 squares per row, then start a new row
    $x += 40;
    if ($x > 500) {
        $x = 40; $y += 40;
    }
}
prEnd();
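Since prAdd only receives a string of operators, building that string can be separated from writing the PDF and tested on its own. The sketch below generates the operators for the same 12-per-row grid in one string; the helper color_grid_ops and its option names are hypothetical. It emits rg before re because the PDF reference does not permit color operators between the start of a path and its painting operator. Assuming prAdd simply appends its argument to the page content, passing the result to a single prAdd call should draw the same grid.

```perl
use strict;
use warnings;

# Hypothetical helper: returns PDF content-stream operators for a grid of
# squares, one per RGB combination with components 0.0, 0.2, ..., 1.0.
sub color_grid_ops {
    my %opt = (x0 => 40, y0 => 50, step => 40, size => 30, per_row => 12, @_);
    my @colors;
    for my $red (0 .. 5) {
        for my $green (0 .. 5) {
            for my $blue (0 .. 5) {
                push @colors, sprintf("%.1f %.1f %.1f rg\n",
                                      $red * 0.2, $green * 0.2, $blue * 0.2);
            }
        }
    }
    my $ops = '';
    for my $i (0 .. $#colors) {
        my $x = $opt{x0} + ($i % $opt{per_row}) * $opt{step};
        my $y = $opt{y0} + int($i / $opt{per_row}) * $opt{step};
        $ops .= $colors[$i];                           # fill color first
        $ops .= "$x $y $opt{size} $opt{size} re\n";    # then the path
        $ops .= "b\n";                                 # close, fill, stroke
    }
    return $ops;
}

my $ops = color_grid_ops();
# prints the operators for the first (black) square:
# 0.0 0.0 0.0 rg
# 40 50 30 30 re
# b
print join("\n", (split /\n/, $ops)[0 .. 2]), "\n";
```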
Adding Bookmarks
Navigating a PDF file is much more comfortable if the document has bookmarks arranged like a table of contents. Some applications cannot add bookmarks to the PDF documents they produce, or they create incomplete or incorrect ones. PDF::Reuse can fill this gap with the prBookmark($reference) subroutine.
A bookmark reference is a reference to a hash, or to an array of hashes, that looks like this:
{ text => 'Document-Text',
  act  => 'this.pageNum = 0; this.scroll(40, 500);',
  kids => [ { text => 'Chapter 1',
              act  => '1, 40, 600'
            },
            { text => 'Chapter 2',
              act  => '10, 40, 600'
            }
          ]
}
...where act is a JavaScript action to trigger when someone clicks on the bookmark. Because these JavaScript actions work only in Acrobat Reader and not in other PDF viewers, I will later show an improvement to PDF::Reuse that fixes this issue.
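For longer documents, writing this nested structure by hand gets tedious, and a small helper can generate it from a list of chapters. The sketch below is hypothetical: build_bookmarks is not part of PDF::Reuse. It reuses the short '1, 40, 600' form of act from the example, which the PDF::Reuse documentation describes as a page number followed by x and y scroll coordinates.

```perl
use strict;
use warnings;

# Hypothetical helper: turns a document title and a list of
# [chapter title, page number] pairs into the nested hash structure
# shown above, suitable for passing to prBookmark.
sub build_bookmarks {
    my ($title, @chapters) = @_;
    my @kids = map {
        my ($text, $page) = @$_;
        +{ text => $text, act => "$page, 40, 600" };    # page, x, y
    } @chapters;
    return {
        text => $title,
        act  => 'this.pageNum = 0; this.scroll(40, 500);',
        kids => \@kids,
    };
}

my $bookmarks = build_bookmarks('Document-Text',
                                ['Chapter 1', 1], ['Chapter 2', 10]);
# In a PDF::Reuse script, $bookmarks would be handed to prBookmark.
print scalar @{ $bookmarks->{kids} }, " chapters\n";    # prints "2 chapters"
```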
Other examples of using PDF::Reuse, including image embedding, are available in PDF::Reuse::Tutorial.
A Console Application for Combining PDF Documents
To avoid editing the Perl code every time you want to merge PDF documents, I've written a console application that takes the names of the input files and the page ranges for each file as arguments. Because the same logic should be reusable in a graphical application built with Perl/Tk, I've put it in a separate Perl module called CombinePDFs. The command-line application talks to this module instead of working directly with PDF::Reuse. The following diagram shows the relationship between the packages, the examples, and the applications.
Examples   | Packages                  | Applications
-----------------------------------------------------------------
combine.pdfs                             app-combine-console-pdfs.pl
            \                           /
             PDF::Reuse  --  CombinePDFs
            /                           \
create.pdfs                              app-combine-tk-pdfs.pl
The application app-combine-console-pdfs.pl does not use PDF::Reuse directly. Instead, it parses the command-line arguments with Getopt::Long, written by Johan Vromans, which is the standard module for this task. Here, Getopt::Long collects the input filenames and the page ranges into two arrays that must have the same length. The user also has to supply a filename for the output and, optionally, a bookmarks file. The main subroutine, which parses the command-line arguments and calls CombinePDFs::createPDF, is:
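Because Getopt::Long fills the two arrays independently, the nth page range belongs to the nth input file. The following sketch pairs them up and validates each range. The range syntax (comma-separated page numbers or from-to spans such as 1-3,7) and the helpers expand_range and pair_files_with_pages are assumptions for illustration; the application's own checkPages may accept a different format.

```perl
use strict;
use warnings;

# Hypothetical range syntax: "1-3,7" means pages 1, 2, 3 and 7.
sub expand_range {
    my ($spec) = @_;
    my @pages;
    for my $part (split /,/, $spec) {
        if ($part =~ /^(\d+)-(\d+)$/) {
            my ($from, $to) = ($1 + 0, $2 + 0);
            die "invalid page range '$spec'\n" unless $from >= 1 and $from <= $to;
            push @pages, $from .. $to;
        } elsif ($part =~ /^(\d+)$/ and $1 >= 1) {
            push @pages, $1 + 0;
        } else {
            die "invalid page range '$spec'\n";
        }
    }
    die "empty page range\n" unless @pages;
    return @pages;
}

# Pairs each input file with its expanded page list; dies if the two
# option lists differ in length (the same check main() performs).
sub pair_files_with_pages {
    my ($infiles, $pages) = @_;
    die "need one page range per input file\n" unless @$infiles == @$pages;
    return map { [ $infiles->[$_], [ expand_range($pages->[$_]) ] ] }
               0 .. $#$infiles;
}

my @jobs = pair_files_with_pages(['a.pdf', 'b.pdf'], ['1-3', '2,5']);
for my $job (@jobs) {
    print "$job->[0]: @{ $job->[1] }\n";
}
# prints:
# a.pdf: 1 2 3
# b.pdf: 2 5
```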
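use Getopt::Long;
use CombinePDFs;
# declared at file scope so that checkPages, checkFiles and checkBookmarks can see them
my (@infiles, @pages, $outfile, $overwrite, $bookmarks);
sub main {
    GetOptions('infile=s'    => \@infiles,
               'outfile=s'   => \$outfile,
               'pages=s'     => \@pages,
               'overwrite'   => \$overwrite,
               'bookmarks:s' => \$bookmarks,
               'help'        => \&help) or help();
    # every input file needs exactly one page range
    help() unless @infiles and $outfile and @pages == @infiles;
    checkPages();
    checkFiles();
    checkBookmarks();
    CombinePDFs::createPDF(\@infiles, \@pages, $outfile, $bookmarks);
}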
If the user passes too few arguments, invalid filenames, or incorrect page ranges, the code invokes the help subroutine. The same subroutine runs when the user explicitly asks for -help on the command line. Every good command-line application should behave this way. Getopt::Long distinguishes between options that require a value, marked with = after the option name (infile, outfile, pages), options whose value is optional, marked with : (bookmarks), and flags without any marker (overwrite, help). It can store the values in arrays (infile, pages), hashes, or scalars, and it checks their types: s stands for a string, i for an integer, and f for a floating-point number.
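To see how these specifications behave, the sketch below runs the same option list through GetOptionsFromArray, which Getopt::Long exports on request and which parses an explicit array instead of @ARGV. The wrapper parse_args and the sample arguments are hypothetical, and the help handler is left out so the example has no side effects.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parses an argument list with the same option specification as main(),
# returning the collected values so they can be inspected directly.
sub parse_args {
    my @args = @_;
    my (@infiles, @pages, $outfile, $overwrite, $bookmarks);
    my $ok = GetOptionsFromArray(\@args,
        'infile=s'    => \@infiles,      # value required, may repeat
        'outfile=s'   => \$outfile,      # value required
        'pages=s'     => \@pages,        # value required, may repeat
        'overwrite'   => \$overwrite,    # flag, set to 1 when present
        'bookmarks:s' => \$bookmarks,    # value optional, '' if omitted
    );
    return {
        ok        => $ok,
        infiles   => \@infiles,
        pages     => \@pages,
        outfile   => $outfile,
        overwrite => $overwrite,
        bookmarks => $bookmarks,
    };
}

my $opts = parse_args(qw(--infile a.pdf --pages 1-2 --infile b.pdf
                         --pages 3 --outfile out.pdf --bookmarks));
print "files: @{ $opts->{infiles} }\n";    # prints "files: a.pdf b.pdf"
```

Repeating --infile and --pages appends to the arrays in order, which is why the nth range lines up with the nth file, and a bare --bookmarks leaves an empty string rather than undef.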

