
Perl Needs Better Tools


Perl is in danger of becoming a fading language--new programmers are learning Java and Python in college, and companies like Google hardly use Perl at all. If you worry that Perl may become irrelevant for medium-to-large projects, read on.

The Scary Part

I have discussed the future of Perl with managers at companies that currently use it, and many of them worry about that future. One company I spoke with here in San Francisco is rewriting its core application in Java. Another worries it will not be able to find new Perl programmers down the road. Yet another uses Perl for major projects, but struggles to refactor its extensive code base.

There are many reasons why companies care about the future of Perl. I offer a part of a solution: better tools for Perl can be a major part of keeping Perl relevant and effective as the primary language for medium and large projects.

When measuring the effectiveness of a development environment (people, language, tools, processes, etc.), a key measure is how expensive and painful it is to make changes to existing code. Once a project or system has grown to thousands of lines of code in dozens (or hundreds) of modules, the cost of making changes can escalate to the point where the team is afraid to make any significant change. Excellent tools are one of the ways to avoid this unhappy situation, or at least reduce its impact. Other factors are excellent processes and, of course, excellent people.

21st-Century Integrated Development Environments for Perl

I propose that more, high-quality development tools will help keep Perl relevant and alive in medium and large project environments. My focus in this article is on IDEs, or Integrated Development Environments, and primarily those with a graphical interface.

An IDE is an integrated set of tools for programming, combining a source code editor with a variety of other tools into a single package. Common features of modern IDEs include refactoring support, version control, real-time syntax checking, and auto-completion of code while typing.

I want to make it clear right at the outset that a team of highly skilled Perl programmers, using only tools that have been around for years (such as emacs, vi, cvs, and make) can and do build large, sophisticated, and successful projects. I am not worried about those programmers. I am worried about the larger population of programmers with one to five years of experience, and those who have not yet begun to program: the next generation of Perl programmers.

Great tools will not make a bad programmer into a good programmer, but they will certainly make a good programmer better. Unfortunately, the tools for Perl are years behind what is available for other languages, particularly Java.

One powerful example is the lack of graphical IDEs for Perl with excellent support for refactoring. Several IDEs for Java have extensive refactoring support. Only one for Perl, the EPIC plugin for Eclipse, supports even a single refactoring action.

For an example of how good IDEs have inspired at least one Perl developer, see Adam Kennedy's Perl.com article on his new PPI module and Scott Sotka's Devel::Refactor module (used in EPIC).

I acknowledge that a graphical IDE is not the be-all of good tools. Just as some writers reject word processors in favor of typewriters or hand-written manuscripts, some programmers reject graphical IDEs and would refuse a job that required them to use one. Not everyone has (nor should have) the same tool set, and there are things a pencil can do that vi and emacs will never do. That said, IDEs have wide use in businesses doing larger projects, and for many programmers and teams they provide major increases in productivity.

Another important point is that while this article discusses over a dozen specific tools or features, having all the tools in a single package produces the biggest value. An IDE that provides all of these features in a single package that people can easily install, easily extend, and easily maintain across an entire development team has far more value than the sum of its parts.

There is a big win when the features provided by an IDE immediately upon installation include all or almost all of the tools and features discussed here and where the features "know" about each other. For example, it is good if you enter the name of a non-existent subroutine and the real-time syntax checker catches this. It is much better if the code-assist feature then pops up a context menu offering to create a stub for the subroutine or to correct the name to that of an existing similar subroutine or method from another class that is available to the current file. (This is standard behavior for some Java IDEs.)

What Would a 21st-Century Perl Tool Set Contain?

Perl needs a few great IDEs--not just one, but more than one so that people have a diverse set to choose from. Perl deserves and needs a few great IDEs that lead the pack and set the standard for IDEs in other languages.

I am well aware that the dynamic nature of Perl makes it harder to have a program that can read and understand a Perl program, especially a large and complex one, but the difficulty in comprehending a Perl program makes the value of such a tool all the greater, and I have faith that the Perl community can overcome some of the built-in challenges of Perl. Indeed, it is among the greatest strengths of Perl that Perl users can adapt the language to their needs.

A great Perl IDE will contain at least the following, plus other features I haven't thought of. (And, of course, there must be many of those!)

Most of the screen shot examples in this article use the EPIC Perl IDE. At present, it has the largest number of the features on my list (although it certainly doesn't have all of them).

Syntax-Coloring Text Editor

Most of you have probably seen this. It is available under vim, emacs, BBEdit, and TextPad. Just about every decent text editor will colorize source code so that keywords, operators, variables, etc., each have their own color, making it easier to spot syntax errors such as forgetting to close a quote pair.

Real-Time Syntax Checking

real-time syntax check example
Figure 1. Real-time syntax checking

The IDE in Figure 1 shows that line 4 has an error because of the missing ) and that line 5 has an error because there is no declaration of $naame (and use strict is in effect).

A key point here is that the IDE shows these errors right away, before you save and compile the code. (In this example, the EPIC IDE lets you specify how often to run the syntax check, from 0.01 to 10.00 seconds of idle time, or only on demand.)
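
As a sketch of what drives such a feature: an editor can lean on Perl's own compile-only mode, writing the buffer to a temporary file and running `perl -c` on it, capturing the diagnostics. This is only illustrative (it is not how EPIC actually implements its checker), but it shows how little plumbing the basic version needs:

```perl
# Illustrative sketch: run Perl's compile-only check on editor buffer
# contents, the way an IDE might after some idle time. Not EPIC's
# actual implementation.
use strict;
use warnings;
use File::Temp qw(tempfile);

sub check_syntax {
    my ($source) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} $source;
    close $fh;
    # perl -c compiles without executing; diagnostics go to stderr
    my $diagnostics = `"$^X" -c "$path" 2>&1`;
    my $ok = ($? == 0);
    return ($ok, $diagnostics);
}

# Roughly the buffer from Figure 1: $naame is undeclared under strict
my ($ok, $diagnostics) = check_syntax(
    qq{use strict;\nmy \$name = "Larry";\nprint \$naame;\n}
);
# $ok is false here, and $diagnostics names the offending $naame
```

A real implementation would run this asynchronously on idle, then map the reported line numbers back onto editor annotations.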

As nice as this is, it would be even better if the IDE also offered ways to fix the problem, for example, offering to change $naame to $name. Figure 2 shows an IDE that does exactly that; unfortunately, for Java, not Perl.

syntax help from the IDE
Figure 2. Syntax help from the IDE

It would be great if Perl IDEs offered this kind of help.

Version Control Integration

All non-insane large projects use version control software. The most common version control software systems are probably CVS, Perforce, Subversion, and Visual SourceSafe. Figure 3 shows an IDE comparing the local version of a file to an older version from the CVS repository.

Figure 3
Figure 3. Comparing a local file to an older version in CVS--click image for full-size screen shot

CVS integration is available in many modern code editors, including emacs, vim, and BBEdit, as well as graphical IDEs such as Eclipse and Komodo Pro. Subversion integration is available as a plugin for Eclipse; Komodo Pro supports Perforce and Subversion.

A Code-Assist Editor

Suppose that you have just typed in an object reference and want to call a method on the object, but you are not sure what the method name is. Wouldn't it be nice if the editor popped up a menu listing all of the methods available for that object? It might look something like Figure 4.

automatic code completion
Figure 4. Automatic code completion

In this example, the IDE is able to figure out which class the object $q is an instance of and lists the names of the available methods. If you type a p, then the list shows only the method names beginning with p. If you type pa, then the list shows only the param() and parse_params() methods.
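
Under the hood, a completion engine needs some way to enumerate a class's methods once it has inferred the object's class. One crude approach is to walk the package's symbol table; the `Greeter` class below is made up purely for illustration, and real engines must also handle inheritance via @ISA, AUTOLOAD, and so on:

```perl
# Crude sketch: list the methods of a class by walking its symbol
# table. Real completion engines must also follow @ISA and cope with
# AUTOLOAD; this ignores all of that.
use strict;
use warnings;

sub list_methods {
    my ($class, $prefix) = @_;
    $prefix = '' unless defined $prefix;
    no strict 'refs';
    my @methods = grep { /^\Q$prefix\E/ && defined &{"${class}::$_"} }
                  keys %{"${class}::"};
    return sort @methods;
}

# A made-up class standing in for the object's class in Figure 4
package Greeter;
sub new          { bless {}, shift }
sub param        { }
sub parse_params { }
sub header       { }

package main;
my @all = list_methods('Greeter');        # header, new, param, parse_params
my @pa  = list_methods('Greeter', 'pa');  # param, parse_params
```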

Excellent Refactoring Support

The easier refactoring is, the more often people will do it. The following list contains the most common refactorings; your personal list will probably be a little different. All of these are things you can do "manually," but the idea is to make them one- or two-click operations so that you will do them much more often. (For an extensive list of refactoring operations, see Martin Fowler's alphabetical list of refactorings.)

Extract Subroutine/Method

The IDE should create a new subroutine using the selected code and replace the selected code with a call to the new subroutine, with the proper parameters. Here's an example of using the Extract Subroutine refactoring from Eclipse/EPIC (which uses the Devel::Refactor module).

First, you select a chunk of code to turn into a new subroutine, and then select Extract Subroutine from a context menu. You then get a dialog box asking for the name of the new subroutine (shown in Figure 5).

code before Extract
Subroutine refactoring
Figure 5. Code before Extract Subroutine refactoring

The IDE replaces the selected code with a call to the new subroutine, making reasonable guesses about the parameters and return values (Figure 6). You may need to clean up the result manually.

code after Extract
Subroutine refactoring
Figure 6. Code after Extract Subroutine

Figure 7 shows the new subroutine created by the IDE. In this case, it needs no changes, but sometimes you will need to adjust the parameters and/or return value(s).

the new subroutine
created by Extract Subroutine
Figure 7. The new subroutine created by Extract Subroutine

Ideally, the editor should prompt you to replace similar chunks of code with calls to the new subroutine.

Rename Subroutine/Method

The IDE should find all the calls to the subroutine throughout your project and offer to change them for you. You should be able to see a preview of all of the places a change could occur, and to accept or reject each one on a case-by-case basis. The action should be undoable.

Rename Variable

Like Rename Subroutine, this feature should find all occurrences throughout the project and offer to make the changes for you.

Change Subroutine/Method Signature

The IDE should be able to make reasonable guesses about whether each subroutine or method call is supplying the proper parameters. Partly this is to enable the real-time syntax checking mentioned above, and partly this is to enable you to select a subroutine declaration and tell the IDE you want to refactor it by adding or removing a parameter. The IDE should then prompt you for the change(s) you want to make, do its best to find all of the existing calls to the subroutine, and offer to correct the subroutine calls to supply the new parameters.

Obviously, this is an especially tricky thing to do in Perl, where subroutines fish their parameters out of @_. So the IDE would have to look carefully at how the code uses shift, @_, and $_[] in order to have a reasonable guess about the parameters the subroutine is expecting. In many common cases, though, a Perl IDE could make a reasonable guess about the parameters, such as in the following two examples, so that if you added or removed one, it could immediately prompt you about making corrections throughout the project:

sub doSomething {
    my $gender = shift;
    my $age    = shift;
    # Not too terribly hard to guess that $gender and $age are params
}

sub anotherThing {
    my ($speed,$direction) = @_;
    # No magic needed to guess $speed and $direction are params.
}
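
To make the idea concrete, here is a deliberately naive parameter guesser covering just the two idioms above. A real tool would parse the code properly (with something like PPI) rather than pattern-match it; this only shows that the common cases are tractable:

```perl
# Naive sketch: guess parameter names from the two common idioms
# shown above. Pattern-matching like this breaks on anything fancier;
# it is only meant to show the common cases are easy to recognize.
use strict;
use warnings;

sub guess_params {
    my ($body) = @_;
    my @params;
    # Idiom 1: my ($speed, $direction) = @_;
    if ($body =~ /my\s*\(([^)]*)\)\s*=\s*\@_/) {
        for my $p (split /,/, $1) {
            $p =~ s/^\s+|\s+$//g;
            push @params, $p if $p =~ /^[\$\@%]/;
        }
    }
    # Idiom 2: my $gender = shift;
    while ($body =~ /my\s+(\$\w+)\s*=\s*shift\b/g) {
        push @params, $1;
    }
    return @params;
}

my @a = guess_params('my $gender = shift; my $age = shift;');
# @a is ($gender, $age)
```
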

Move Subroutine/Method

This refactoring operation should give you a list or dialog box to choose the destination file in your project. The IDE should allow you to preview all of the changes that it would make to accomplish the move, which will include updating a call to the subroutine/method to use the proper class. At a minimum, the IDE should show you or list all of the calls to the subroutine so you can make the appropriate changes yourself. Ideally, the IDE should make a guess about possible destinations; for example, if $self is a parameter to the method being moved, then the IDE might try assuming the method is an object (instance) method and initially only list destination classes that inherit from the source class, or from which the source class inherits.

Change a Package Name

As with Rename Subroutine and Rename Variable, when changing a package name, the IDE should offer to update all existing references throughout your project.

Tree View and Navigation of Source Files and Resources

Another useful feature of good IDEs is being able to view all of the code for a project, or multiple projects, in a tree format, where you can "fold" and "unfold" the contents of folders. All of the modern graphical IDEs support this, even with multiple projects in different languages.

Being able to view your project in this manner gives you both a high-level overview and the ability to drill down into specific files, and to mix levels of detail by having some folders show their contents and some not.

For example, Figure 8 shows a partial screen shot from ActiveState's Komodo IDE.

tree view of code in Komodo
Figure 8. Tree view of code in Komodo

Support for Creating and Running Unit Tests

Anyone who has installed Perl modules from CPAN has seen unit tests--these are the various, often copious, tests that run when you execute the make test part of the installation process. The vast majority of CPAN modules include a suite of tests, often using the Test::Harness and/or Test::More modules. A good IDE will make it very easy to both create and run unit tests as you develop your project.

The most basic form of support for unit tests in an IDE is simply to make it easy to execute arbitrary scripts from within the IDE. Create a test.pl for your project and keep adding tests to it or to a t/ subdirectory as you develop, and keep running the script as you make changes. All modern IDEs provide at least this minimal capability.

A more sophisticated level of support for unit tests might resemble the Java IDE feature for tests written in JUnit, where you can select an existing class file (a .pm file in Perl) and ask the IDE to create a set of stub tests for every subroutine in the file. (See JUnit and the Perl module Test::Unit for more on unit tests.) Furthermore, the IDE should support running a set of tests and giving simple visual feedback on what passed/failed. The standard approach in the JUnit world is to show either a "green bar" (all passed) or "red bar" (something failed) and then allow you to see details on failures. Other nice-to-have features include calculating code-coverage, providing statistical summaries of tests, etc.
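
In Perl terms, generating such stubs is not hard. The sketch below emits a Test::More stub for every subroutine a package defines; the `Balloon` package is made up, standing in for a real .pm file, and inherited methods are ignored for brevity:

```perl
# Sketch: emit Test::More stubs for each subroutine in a package,
# analogous to a Java IDE generating JUnit stubs for a class.
# Inherited methods are ignored for brevity.
use strict;
use warnings;

sub test_stubs_for {
    my ($class) = @_;
    no strict 'refs';
    my @subs = sort grep { defined &{"${class}::$_"} } keys %{"${class}::"};
    my $code = "use Test::More tests => " . scalar(@subs) . ";\n\n";
    for my $name (@subs) {
        $code .= "ok( defined &${class}::$name, '$name is defined' );"
               . " # TODO: real assertions\n";
    }
    return $code;
}

# A made-up module standing in for a real .pm file
package Balloon;
sub new     { bless {}, shift }
sub inflate { }

package main;
my $stubs = test_stubs_for('Balloon');
# $stubs is a runnable two-test script asserting inflate and new exist
```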

Figure 9 shows a successful run of a Java test suite with Eclipse.

JUnit test run, success
Figure 9. A successful JUnit test run

Figure 10 shows the same test run, this time with a failure.

JUnit test run, with a failure.
Figure 10. A JUnit test run with a failure

A stack trace of the failure message appears in another part of the window (cropped out here to save space). If you double-click on the test that failed (testInflate), the IDE will open the file (BalloonTest, in this case) and navigate to the test function.

The central idea is that the IDE should make it as painless as possible to add and modify and run tests, so you will do more of it during development.

Language-Specific Help

This is a fairly straightforward idea--the IDE should be able to find and display the appropriate documentation for any keyword in your code, so if you highlight push and ask for help, you should see the push entry from the Perl documentation. If you highlight a method or subroutine or other symbol name from an imported module, the IDE should display the module's documentation for the selected item. Of course, this requires that the documentation be available in a consistent, machine-readable form, which is only sometimes true.
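
For Perl builtins, at least, the plumbing already exists: `perldoc -f` extracts the documentation for a single function, so an editor only has to shell out. A rough sketch, assuming `perldoc` is on the PATH (as it is in standard Perl installations):

```perl
# Sketch: fetch documentation for a highlighted builtin via perldoc,
# which ships with Perl. A real IDE would also consult POD from
# imported modules for non-builtin symbols.
use strict;
use warnings;

sub builtin_help {
    my ($keyword) = @_;
    return undef unless $keyword =~ /^\w+$/;  # refuse shell metachars
    my $text = `perldoc -T -f $keyword 2>&1`; # -T: no pager, -f: function
    return $text;
}

my $help = builtin_help('push');
```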

Debugger with Real-Time Display of Results

All modern IDEs offer support for running your code under a debugger, usually with visual display of what's going on, including the state of variables. The Komodo IDE supports debugging Perl that is running either locally or remotely.

Typical support for debugging in an IDE includes the ability to set breakpoints, monitor the state of variables, etc. Basically, the IDE should provide support for all of the features of the debugger itself. Graphical IDEs should provide a visual display of what is going on.

Automatic Code Reformatting

This means automatically or on-demand re-indenting and other reformatting of code. For example, when you cut and paste a chunk of code, the IDE should support reformatting the chunk to match the indentation of its new location. If you change the number of spaces or tabs for each level of indentation, or your convention for the placement of curly braces, then the IDE should support adjusting an entire file or all files in your project.
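
In practice Perl programmers reach for Perl::Tidy here, but even a toy brace-depth re-indenter shows the shape of the feature. The sketch below is deliberately simplistic: braces inside strings, regexes, and comments will confuse it, which is exactly why a real formatter parses the code properly:

```perl
# Toy sketch of on-demand re-indentation by brace depth. Braces in
# strings, regexes, and comments will confuse it; a real tool such
# as Perl::Tidy parses the code properly.
use strict;
use warnings;

sub reindent {
    my ($code, $indent) = @_;
    $indent = '    ' unless defined $indent;
    my ($depth, @out) = (0);
    for my $line (split /\n/, $code) {
        $line =~ s/^\s+//;                       # discard old indentation
        my $opens  = () = $line =~ /\{/g;
        my $closes = () = $line =~ /\}/g;
        my $this = $line =~ /^\}/ ? $depth - 1 : $depth;
        $this = 0 if $this < 0;
        push @out, ($indent x $this) . $line;
        $depth += $opens - $closes;
        $depth = 0 if $depth < 0;
    }
    return join("\n", @out) . "\n";
}

my $neat = reindent("sub hi {\nif (1) {\nprint 1;\n}\n}\n", '  ');
```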

Seamless Handling of Multiple Languages

Many large software projects involve multiple languages. This is almost universally true in the case of web applications, where the user interface typically uses HTML, CSS, and JavaScript, and the back end uses one or more of Perl, PHP, Java, Python, Ruby, etc. It is very helpful to have development tools that seamlessly integrate work done in all of the languages. This is becoming quite common. For example, both Komodo and Eclipse support multiple languages.

Automated Building and Testing

At its most basic, this feature makes it easy to run an arbitrary script from within the IDE and see its output. That could be as simple as giving the IDE a one-click way of running the traditional Perl module build-and-test commands:

$ perl Makefile.PL
$ make
$ make test

A more advanced version of this feature might involve having the IDE create stub code to test all of the subroutines in an existing file, or to run all of the scripts in a specified directory under Test::Harness, or to run a set of tests using Test::Unit::TestRunner or Test::Unit::TkTestRunner. (The latter provides a GUI testing framework.)
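
With a modern Perl, the heavy lifting is already in the core: TAP::Harness (which underlies make test) can be driven programmatically, so the IDE's "run tests" button reduces to a few lines. In the sketch below, the .t file is generated on the fly only to keep the example self-contained:

```perl
# Sketch: drive TAP::Harness (core since Perl 5.10.1) the way an
# IDE's "run tests" button might. The test file is generated here
# only to keep the example self-contained.
use strict;
use warnings;
use File::Temp qw(tempdir);
use TAP::Harness;

my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/basic.t" or die "open: $!";
print {$fh} qq{print "1..1\\nok 1 - it works\\n";\n};
close $fh;

my $harness   = TAP::Harness->new({ verbosity => -3 });  # -3: silent
my $aggregate = $harness->runtests("$dir/basic.t");
# $aggregate->all_passed is true here: the IDE would show a green bar
```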

Conclusion and Recommendations

While there are many tools to help with Perl development, the current state of the Perl toolbox is still years behind that of other languages--perhaps three to five years behind Java's. Several tools for Java have all of the features described above; virtually none for Perl do. On the other hand, things are looking up: they are better now than a year ago, and it should be possible to close the gap in a year or two.

Two obvious places where improvement could come relatively easily are EPIC and Komodo. EPIC is open source, so there is potentially a wide pool of talent that could contribute; Komodo has a company with money behind it, so people actually get paid to improve it. Hopefully both tools will get better with time.

Another interesting possibility is the development of new IDEs or adding to existing ones by using Adam Kennedy's PPI module, which provides the ability to parse Perl documents into a reasonable abstract syntax tree and to manipulate the elements and re-compose the document. There is a new Perl editor project, VIP, that is in the design stages and is intended to be "pluggable" and to have special features to support pair programming.

Finally, I've gathered a couple of lists of links to related material. The first list below consists of IDEs and graphical editors for Perl, and the second consists of various related articles and websites. I hope this is all inspirational and helpful.

Current IDEs for Perl

The listed IDEs support Perl. The list is undoubtedly incomplete, but should form a good starting point for anyone wishing to look into this further.

  • Affrus

    Perl only, Mac OS X only. Closed source (and hence not extensible by users). Primarily designed for CGI and standalone scripts. Free demo available. $99 to purchase. (See the Perl.com review of Affrus to learn more.)

  • Eclipse/EPIC

    EPIC is a plugin for the Eclipse platform. Eclipse is open-source and cross platform (Windows/Mac/Linux/Solaris, etc.). Once you have Eclipse installed, install the EPIC plugin from within the Eclipse application using the EPIC update URL. Eclipse supports Java, and with plugins, C/C++, COBOL, Perl, PHP, UML2, Python, Ruby, XML, and more. There is a large and active community around Eclipse.

  • Emacs is the mother of all text-editor/development-environment/adventure-game/all-in-one tools. Expert programmers use it widely, and there are numerous enhancements for working with particular languages, including, of course, Perl. Emacs, with CPerlMode, is a richly featured IDE for Perl, albeit a non-GUI IDE (which, for some people, makes it even better). A set of extensions for CPerlMode is available, but you need to join the Yahoo Extreme Perl group to get to them.
  • Komodo

    This runs on Linux, Solaris, and Windows. Free demo; $29.95 for personal and student use, $295 for commercial use. It supports Perl, PHP, Python, Tcl, and XSLT.

  • PAGE

    PAGE runs only on Windows (9x/ME/NT/2000/XP). It is a Rapid Application Development tool for Perl and comes in three versions: Free, Standard ($10), and Enterprise ($50). PAGE provides several "wizards" for creating scripts, modules (packages), web forms, and even database applications.

  • Perl Editor

    This closed source program runs only on Windows (9x/NT/2000/XP). It has a GUI code profiler, and the Pro version has a regular expression tester and built-in web server (for CGI testing, etc.). Perl Editor claims to have the best debugger on the market. It also comes with GUI tools for managing MySQL databases. $69.95 to purchase.

  • vim

    The well-known descendant of vi is a powerful and flexible text editor with many plugins and extensions. Have a look at the vim scripts; for example, vim.sourceforge.net/scripts/script.php?script_id=556 and vim.sourceforge.net/scripts/script.php?script_id=281.

  • visiPerl

    This is a closed source application that runs on Win9x/NT/2000. It handles Perl and HTML and has code templates, being designed for website building. visiPerl includes a built-in web server for testing and an FTP client for code deployment. There is a free demo, or you can purchase it for $59.

Related Topics

Independently Parsing Perl


A few years into my programming career, I found myself involved in a somewhat unusual web project for an enormous global IT company. Due to some odd platform issues, we could write the intranet half of the project only in Perl and the almost-identical public internet half only in Java.

In my efforts to pick up enough Java to help my Perl code interoperate with the code from the Java guys, I stumbled on a relatively new editor with the rather expansive name of JetBrains IntelliJ IDEA.

What a joy! It quite simply made learning Java an absolute pleasure, with comprehensive tab completion, light and simple API docs, easy exploration of the mountain of Java classes, and unobtrusive indicators showing me my mistakes and offering to fix them. In short, it had lots of brains and a fast, clean user interface.

Where Is IntelliPerl?

Although I only needed it heavily for a few months, it's been my gold standard ever since, and my definition of what a "great" editor should be. I install every new Perl editor and IDE I come across in the hope that Perl might one day get an editor half as good as what Java has had for years.

These great editors are spreading. Java is now up to one and a half (Eclipse is nearly great but still seems not quite "effortless" enough about what it does). Dreamweaver gave HTML people their great editor years ago, and I've heard that Python may now have something that qualifies.

Interestingly, these great editors seem to share one major thing in common.

How to Build a Great Editor

Rather than relying on the language's parser to examine code, great editors seem to implement special parsers of their own. These parsers treat a file less like code and more like a generic document (that just also happens to be code).

It's a key distinction, and one that provides two critical capabilities.

First, it creates a "round-trip" capability, parsing a file into an internal model and back out again without moving a single whitespace character out of place. Even if parts of a file are broken or badly formatted, you can still change other parts and save it without altering anything you did not touch.

Second, it makes the parser extremely safe and error-tolerant. Any code open in an editor is there for a reason--generally because it isn't finished yet, is broken, or needs changing. A document parser can hit a problem, flag it, stumble for a character or so until it finds something it recognizes, and then continue on.

Parsing as code is an entirely different task, and one often unsuited to these types of faults.

For example, take the following.

print "Hello World!\n";
  
}
  
MyModule->foobar;

For an editor using Perl itself to understand this code, it's game over once it hits the naked closing brace, because the code is invalid. Without knowledge of what lies below the brace, you lose all of the intelligence that depends on the parser: syntax highlighting, module checking, helpful tips, the lot.

It's simply not a reasonable way to build an editor, where a file can be both unfinished and riddled with bugs.

Building a Document Parser for Perl

Even without an editor to put it in (yet), a document parser for Perl would be extraordinarily useful for all sorts of tasks. At the time, though, all I really wanted was a really accurate HTML syntax highlighter.

Some time in early 2002, I was bored one afternoon and had a first stab at the problem. The result was pretty predictable, given patterns I've seen in others trying the same thing. It was A) based on regular expressions, and B) useless for anything even remotely interesting.

Between then and the start of The Perl Foundation grant in December 2004, I've spent a day or so a month on the problem, rewriting and throwing away code. I've junked two tokenizers, one lexer, an analysis package, three syntax highlighters, an obfuscation package, a quote engine, and half of the classes in the current object tree.

Now, finally, PPI is complete, bar some minor features and testing. It is 100 percent round-trip safe, and it's been stress tested against the 38,000 (non-Acme) modules in CPAN, handling all but 28 of the most broken and bizarre.

What Does It Do?

PPI should be the basis for any task where you need to parse, analyze or manipulate Perl, and it finally provides a platform for doing these tasks to their full potential. This covers a huge range of possible tasks; far too many to cover in any depth here.

For this article, I want to demonstrate how PPI can improve existing tools that currently only do a very basic job, when there is the potential for so much more.

One of these is part of the PAR application-packaging module. When PAR bundles a module into its internal include directory, it tries to reduce the size of the modules by stripping out POD. Of course, it would be even better to strip out everything that is superfluous and cut PAR file sizes further.

This is a form of compression, but given the potential confusion of a name like "Compress::Perl", I'm picking my own term. I hereby anoint the term "Squish". A squished module occupies as little space as possible, having had its redundant characters removed. It will be extremely small, although it might look a little "squished" :)

Perl::Squish

Rather than showing you the final project, I prefer to show the process of squishing a single module.

# Where is File::Spec on our system?
use Class::Inspector;
my $filename = Class::Inspector->resolved_filename( 'File::Spec' );

# Load File::Spec as a document
use PPI;
my $Document = PPI::Document->new( $filename );

Everything you do with PPI starts and finishes with PPI::Document objects. If you find yourself using the lexer directly, you are probably doing something wrong.

Where can I start cutting out the fat? For starters, many core modules have an __END__ section.

# Get the (one and only) __END__ section
my $End = $Document->find_first( 'Statement::End' );
  
# Delete it from the document
$End->delete if $End;

PPI provides a set of search methods that you can use on any element that has children. find_first is a safe guess, because there can only be one __END__ section. The search methods actually take &wanted functions like File::Find, so 'Statement::End' is really syntactic sugar for:

sub wanted {
    my ($Document, $Element) = @_;
    $Element->isa('PPI::Statement::End');
}

Of course, there's a faster way to do the same thing. The prune method finds and immediately deletes all elements that match a particular condition.

# Delete all comments and POD
$Document->prune( 'Token::Pod' );
$Document->prune( 'Token::Comment' );
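
To see prune in action on something tiny, here is a throwaway in-memory document rather than File::Spec (this assumes PPI is installed):

```perl
# Sketch: prune comments from a small in-memory PPI document.
use strict;
use warnings;
use PPI;

my $code = "# setup\nmy \$x = 1; # set x\nprint \$x;\n";
my $Document = PPI::Document->new(\$code);
$Document->prune('Token::Comment');
my $squished = $Document->serialize;
# $squished keeps the statements but no longer contains "setup"
```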

For a more serious example, here's how to strip the non-compulsory braces from ->method():

# Remove useless braces
$Document->prune( sub {
    my $Braces = $_[1];
    $Braces->isa('PPI::Structure::List')      or return '';
    $Braces->children == 0                    or return '';
    my $Method = $Braces->sprevious_sibling   or return '';
    $Method->isa('PPI::Token::Word')          or return '';
    $Method->content !~ /:/                   or return '';
    my $Operator = $Method->sprevious_sibling or return '';
    $Operator->isa('PPI::Token::Operator')    or return '';
    $Operator->content eq '->'                or return '';
    return 1;
    } );

It's a little bit wordy, but is relatively straightforward to write. Just add conditions and discard as you go. You can get other elements, calculate anything or call sub-searches.

When you have finished, be sure to save the file.

# Save the file
$Document->save( "$filename.squish" );

Wrapping It All Up

All you need to do now is wrap it all up in some typical module boilerplate.

package Perl::Squish;
  
use strict;
use PPI;
  
our $VERSION = '0.01';
  
# Squish a file in place
# Perl::Squish->file( $filename )
sub file {
    my ($class, $file) = @_;
    my $Document = PPI::Document->new( $file ) or return undef;
    $class->document( $Document ) or return undef;
    $Document->save( $file );
}
  
# Squish a document object
# Perl::Squish->document( $Document );
sub document {
    my ($squish, $Document) = @_;
      
    # Remove the stuff we did earlier
    $Document->prune('Statement::End');
    $Document->prune('Token::Comment');
    $Document->prune('Token::Pod');
      
    $Document->prune( sub {
        my $Braces = $_[1];
        $Braces->isa('PPI::Structure::List')      or return '';
        $Braces->elements == 0                    or return '';
        my $Method = $Braces->sprevious_sibling   or return '';
        $Method->isa('PPI::Token::Word')          or return '';
        $Method->content !~ /:/                   or return '';
        my $Operator = $Method->sprevious_sibling or return '';
        $Operator->isa('PPI::Token::Operator')    or return '';
        $Operator->content eq '->'                or return '';
        return 1;
        } );

    # Let's also do some whitespace cleanup
    # (find returns an array reference, or false if nothing matched)
    my $whitespace = $Document->find('Token::Whitespace') || [];
    foreach ( @$whitespace ) {
        $_->{content} = $_->{content} =~ /\n/ ? "\n" : " ";
    }
      
    1;
}
  
1;

That's the finished product; you can see it prettied up with PPI's syntax highlighter at CPAN::Squish. I've added a few additional small features to the basic code described above, but you get the idea. See also Perl::Squish for more details.

In 15 minutes, I've knocked together a pretty simple module that dramatically improves on what you could do without something like PPI. Now imagine the hard things it makes possible.

More Lightning Articles

Customizing Emacs with Perl

by Bob DuCharme

Over time, I've accumulated a list of Emacs customizations I wanted to implement when I got the chance. For example, I'd like macros to perform certain global replaces just within a marked block, and I'd like a macro to reformat an Outlook-formatted date to an ISO 8601-formatted date. I'm not overly intimidated by the elisp language used to customize Emacs behavior; I've copied elisp code and modified it to make some tweaks before, I had a healthy dose of Scheme and LISP programming in school, and I've done extensive work with XSLT, a descendant of these grand old languages. Still, as with a lot of postponed editor customization work, I knew I'd have to use these macros many, many times before they earned back the time invested in creating them, because I wasn't that familiar with string manipulation and other basic operations in a LISP-based language. I kept thinking to myself, "This would be so easy if I could just do the string manipulation in Perl!"

Then, I figured out how I could write Emacs functions that called Perl to operate on a marked block (or, in Emacs parlance, a "region"). Many Emacs users are familiar with the Escape+| keystroke, which invokes the shell-command-on-region function. It brings up a prompt in the minibuffer where you enter the command to run on the marked region, and after you press the Enter key Emacs puts the command's output in the minibuffer if it will fit, or into a new "*Shell Command Output*" buffer if not. For example, after you mark part of an HTML file you're editing as the region, pressing Escape+| and entering wc (for "word count") at the minibuffer's "Shell command on region:" prompt will feed the text to this command line utility if you have it in your path, and then display the number of lines, words, and characters in the region at the minibuffer. If you enter sort at the same prompt, Emacs will run that command instead of wc and display the result in a buffer.

Entering perl /some/path/foo.pl at the same prompt will run the named Perl script on the marked region and display the output appropriately. This may seem like a lot of keystrokes if you just want to do a global replace in a few paragraphs, but remember: Escape+| calls Emacs's built-in shell-command-on-region function, and you can call this same function from a new function that you define yourself. My recent great discovery was that along with parameters identifying the region boundaries and the command to run on the region, shell-command-on-region takes an optional parameter that tells it to replace the input region with the command's output. When you're editing a document with Emacs, this lets you pass a marked region to a Perl script, let the Perl script do whatever you like to the text, and then have Emacs replace the original text with the processed version. (If your Perl script mangled the text, Emacs' excellent undo command can come to the rescue.)

Consider an example. When I take notes about a project at work, I might write that Joe R. sent an e-mail telling me that a certain system won't need any revisions to handle the new data. I want to make a note of when he told me this, so I copy and paste the date from the e-mail he sent. We use Microsoft Outlook at work, and the dates have a format following the model "Tue 2/22/2005 6:05 PM". I already have an Emacs macro bound to alt+d to insert the current date and time (also handy when taking notes) and I wanted the date format that refers to e-mails to be the same format as the ones inserted with my alt+d macro: an ISO 8601 format of the form "2005-02-22T18:05".

The .emacs startup file holds customized functions that you want available during your Emacs session. The following shows a bit of code that I put in mine so that I could convert these dates:

(defun OLDate2ISO ()
  (interactive)
  (shell-command-on-region (point)
         (mark) "perl c:/util/OLDate2ISO.pl" nil t))

The (interactive) declaration tells Emacs that the function being defined can be invoked interactively as a command. For example, I can enter "OLDate2ISO" at the Emacs minibuffer command prompt, or I can press a keystroke or select a menu choice bound to this function. The point and mark functions are built into Emacs to identify the boundaries of the currently marked region, so they're handy for the first and second arguments to shell-command-on-region, which tell it which text is the region to act on. The third argument is the actual command to execute on the region; it can be any command available on your operating system that accepts standard input. To define your own Emacs functions that call Perl scripts, just change the function name from OLDate2ISO to anything you like and change this third argument to shell-command-on-region to call your own Perl script.

Leave the last two arguments as nil and t. Don't worry about the fourth parameter, which controls the buffer where the shell output appears. (Setting it to nil means "don't bother.") The fifth parameter is the key to the whole trick: when non-nil, it tells Emacs to replace the marked text in the editing buffer with the output of the command described in the third argument instead of sending the output to a buffer.

If you're familiar with Perl, there's nothing particularly interesting about the OLDate2ISO.pl script. It does some regular expression matching to split up the string, converts the time to a 24 hour clock, and rearranges the pieces:

# Convert an Outlook-format date to an ISO 8601 date
# (e.g. "Wed 2/16/2005 5:27 PM" to "2005-02-16T17:27")
while (<>) {
  if (/\w+ (\d+)\/(\d+)\/(\d{4}) (\d+):(\d+) ([AP])M/) {
     my ($month, $day, $year, $hour, $minutes, $AorP) =
        ($1, $2, $3, $4, $5, $6);
     $hour = 0   if $hour == 12 && $AorP eq 'A';  # midnight is 00
     $hour += 12 if $hour != 12 && $AorP eq 'P';  # afternoon hours
     $_ = sprintf "%04d-%02d-%02dT%02d:%02d",
        $year, $month, $day, $hour, $minutes;
  }
  print;
}

When you start up Emacs with a function definition like the defun OLDate2ISO one shown above in your .emacs file, the function is available to you like any other in Emacs. Press Escape+x to bring up the Emacs minibuffer command line and enter "OLDate2ISO" there to execute it on the currently marked region. Like any other interactive command, you can also assign it to a keystroke or a menu choice.

There might be a more efficient way to do the Perl coding shown above, but I didn't spend too much time on it. That's the beauty of it: with five minutes of Perl coding and one minute of elisp coding, I had a new menu choice to quickly do the transformation I had always wished for.

Another example of something I always wanted is the following txt2htmlp.pl script, which is useful after plugging a few paragraphs of plain text into an HTML document:

# Turn lines of plain text into HTML p elements.
while (<>) {
  chomp;
  # Turn ampersands and < into entity references.
  s/&/&amp;/g;
  s/</&lt;/g;
  # Wrap each non-blank line in a "p" element.
  print "<p>$_</p>\n\n" unless /^\s*$/;
}

Again, it's not a particularly innovative Perl script, but with the following bit of elisp in my .emacs file, I have something that greatly speeds up the addition of hastily written notes into a web page, especially when I create an Emacs menu choice to call this function:

(defun txt2htmlp ()
  (interactive)
  (shell-command-on-region (point) 
         (mark) "perl c:/util/txt2htmlp.pl" nil t))
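To get a feel for what txt2htmlp.pl produces without involving Emacs at all, you can apply the same substitutions to a string in memory. This is just a sketch for experimenting; the sample text is made up:

```perl
use strict;
use warnings;

# Apply the txt2htmlp.pl substitutions to an in-memory string.
my $text = "Fish & chips\nA <b>bold</b> claim\n\nLast line\n";
my $html = '';
for my $line (split /\n/, $text) {
    $line =~ s/&/&amp;/g;    # escape ampersands first
    $line =~ s/</&lt;/g;     # then left angle brackets
    $html .= "<p>$line</p>\n\n" unless $line =~ /^\s*$/;
}
print $html;
```

Note that the ampersand substitution must run before the angle-bracket one, or the `&` in the freshly inserted `&lt;` entities would get escaped a second time.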

Sometimes when I hear about hot new editors, I wonder whether they'll ever take the place of Emacs in my daily routine. Now that I can so easily add the power of Perl to my use of Emacs, it's going to be a lot more difficult for any other editor to compete with Emacs on my computer.

Debug Your Programs with Devel::LineTrace

by Shlomi Fish

Programmers often resort to print statements that output information to the screen, to help analyze what went wrong in running a script. However, including these statements verbatim in the script is not such a good idea. If not promptly removed, they can have all kinds of side effects: slowing down the script, breaking the format of its output (possibly ruining test cases), littering the code, and confusing the user. It would be better not to place them in the code in the first place. How, though, can you debug without debugging?

Enter Devel::LineTrace, a Perl module that can assign portions of code to execute at arbitrary lines within the code. That way, the programmer can add print statements in relevant places in the code without harming the program's integrity.

Verifying That use lib Has Taken Effect

One example I recently encountered: I wanted to use a module I wrote from the specialized directory where I had placed it, even though it was already installed in Perl's global include path. I used a use lib "./MyPath" directive to make sure this was the case, but then had a problem. What if there was a typo in the path of the use lib directive, and as a result, Perl loaded the module from the global path instead? I needed a way to verify it.

To demonstrate how Devel::LineTrace can do just that, consider a similar script that tries to use a module named CGI from the path ./MyModules instead of the global Perl path. (It is a bad idea to name your modules after names of modules from CPAN or from the Perl distribution, but this is just for the sake of the demonstration.)

#!/usr/bin/perl -w

use strict;
use lib "./MyModules";

use CGI;

my $q = CGI->new();

print $q->header();

Name this script good.pl. To test that Perl loaded the CGI module from the ./MyModules directory, direct Devel::LineTrace to print the relevant entry from the %INC internal variable, at the first line after the use CGI one.

To do so, prepare this file and call it test-good.txt:

good.pl:8
    print STDERR "\$INC{CGI.pm} == ", $INC{"CGI.pm"}, "\n";

The first line holds the filename and the line number at which to insert the trace. Then comes the code to evaluate, indented from the start of the line. After the first trace, you can add further traces by starting a new line with a filename and line number, and putting the code on the following (indented) lines. This example is simple enough not to need that, though.
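As an illustration, a traces file carrying two traces might look like this (the second trace, at line 10, is hypothetical and only shows the layout):

```
good.pl:8
    print STDERR "\$INC{CGI.pm} == ", $INC{"CGI.pm"}, "\n";
good.pl:10
    print STDERR "About to send the CGI header.\n";
```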

After you have prepared test-good.txt, run the script through Devel::LineTrace by executing the following command:

$ PERL5DB_LT="test-good.txt" perl -d:LineTrace good.pl

(This assumes a Bourne-shell derivative.) The PERL5DB_LT environment variable contains the path of the file to use for debugging, and the -d:LineTrace directive instructs Perl to run the script under the Devel::LineTrace package.

As a result, you should see the following output on standard error:

$INC{CGI.pm} == MyModules/CGI.pm

meaning that Perl indeed loaded the module from the MyModules sub-directory of the current directory. Otherwise, you'll see something like:

$INC{CGI.pm} == /usr/lib/perl5/vendor_perl/5.8.4/CGI.pm

...which means that it came from the global path and something went wrong.
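The %INC check itself works with any loaded module, not just under Devel::LineTrace. Here is a minimal sketch using a core module, showing the mapping %INC maintains from each module's relative path to the file Perl actually loaded:

```perl
use strict;
use warnings;
use File::Spec;

# %INC maps each loaded module's relative path to the
# file Perl actually loaded it from.
my $loaded_from = $INC{'File/Spec.pm'};
print "File/Spec.pm loaded from: $loaded_from\n";
```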

Limitations of Devel::LineTrace

Devel::LineTrace has two limitations:

  1. Because it uses the Perl debugger interface and stops at every line (to check whether it contains a trace), program execution is considerably slower when the program is being run under it.
  2. It assigns traces to line numbers, and therefore you must update it if the line numbering of the file changes.

Nevertheless, it is a good solution for keeping those pesky print statements out of your programs. Happy LineTracing!

Using Test::MockDBI

by Mark Leighton Fisher

What if you could test your program's use of the DBI just by creating a set of rules to guide the DBI's behavior—without touching a database (unless you want to)? That is the promise of Test::MockDBI, which by mocking-up the entire DBI API gives you unprecedented control over every aspect of the DBI's interface with your program.

Test::MockDBI uses Test::MockObject::Extends to mock all of the DBI transparently. The rest of the program knows nothing about using Test::MockDBI, making Test::MockDBI ideal for testing programs that you are taking over, because you only need to add the Test::MockDBI invocation code; you do not have to modify any of the other program code. (I have found this very handy as a consultant, as I often work on other people's code.)

Rules are invoked when the current SQL matches the rule's SQL pattern. For finer control, there is an optional numeric DBI testing type for each rule, so that a rule only fires when the SQL matches and the current DBI testing type is the specified DBI testing type. You can specify this numeric DBI testing type (a simple integer matching /^\d+$/) from the command line or through Test::MockDBI::set_dbi_test_type(). You can also set up rules to fail a transaction if a specific DBI::bind_param() parameter is a specific value. This means there are three types of conditions for Test::MockDBI rules:

  • The current SQL
  • The current DBI testing type
  • The current bind_param() parameter values

Under Test::MockDBI, fetch*() and select*() methods default to returning nothing (the empty array, the empty hash, or undef for scalars). Test::MockDBI lets you take control of their returned data with the methods set_retval_scalar() and set_retval_array(). You can specify the returned data directly in the set_retval_*() call, or pass a CODEREF that generates a return value to use for each call to the matching fetch*() or select*() method. CODEREFs let you both simulate DBI's interaction with the database more accurately (as you can return a few rows, then stop), and add in any kind of state machine or other processing needed to precisely test your code.

When you need to test that your code handles database or DBI failures, bad_method() is your friend. It can fail any DBI method, with the failures dependent on the current SQL and (optionally) the current DBI testing type. This capability is necessary to test code that handles bad database UPDATEs, INSERTs, or DELETEs, along with being handy for testing failing SELECTs.

Test::MockDBI extends your testing capabilities to testing code that is difficult or impossible to test on a live, working database. Test::MockDBI's mock-up of the entire DBI API lets you add Test::MockDBI to your programs without having to modify their current DBI code. Although it is not finished (not all of the DBI is mocked-up yet), Test::MockDBI is already a powerful tool for testing DBI programs.

Unnecessary Unbuffering

by chromatic

A great joy in a programmer's life is removing useless code, especially when its absence improves the program. Often this happens in old codebases or codebases thrown together hastily. Sometimes it happens in code written by novice programmers who try several different ideas all together and fail to undo their changes.

One such persistent idiom is wholesale, program-wide unbuffering, which can take the form of any of:

local $| = 1;
$|++;
$| = 1;

Sometimes this is valuable. Sometimes it's vital. It's not the default for very good reason, though, and in most programs, including one of these lines is useless code at best.

What's Unbuffering?

By default, modern operating systems don't send information to output devices directly, one byte at a time, nor do they read information from input devices directly, one byte at a time. Compared to processors and memory, IO is so slow (especially over networks) that adding buffers, and trying to fill them before sending and receiving information, can improve performance.

Think of trying to fill a bathtub from a hand pump. You could pump a little water into a bucket and walk back and forth to the bathtub, or you could fill a trough at the pump and fill the bucket from the trough. If the trough is empty, pumping a little bit of water into the bucket will give you a faster start, but it'll take longer in between bucket loads than if you filled the trough at the start and carried water back and forth between the trough and the bathtub.

Information isn't exactly like water, though. Sometimes it's more important to deliver a message immediately even if it doesn't fill up a bucket. "Help, fire!" is a very short message, but waiting to send it when you have a full load of messages might be the wrong thing.

That's why modern operating systems also let you unbuffer specific filehandles. When you print to an unbuffered filehandle, the operating system will handle the message immediately. That doesn't guarantee that whoever's on the other side of the handle will respond immediately; there might be a pump and a trough there.

What's the Damage?

According to Mark-Jason Dominus' Suffering from Buffering?, one sample showed that buffered reading was 40% faster than unbuffered reading, and buffered writing was 60% faster. The gap can widen further for network communications, where the overhead of sending and receiving a single packet of information can overwhelm short messages.
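You can get a rough feel for the cost yourself by timing many small writes to a buffered handle and to an unbuffered one. This is an unscientific sketch (the file names and iteration count are arbitrary), and the exact numbers will vary with your OS and disk:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Time many one-line writes to a buffered or an unbuffered filehandle.
sub write_lines {
    my ($path, $unbuffer) = @_;
    open my $fh, '>', $path or die "Can't open $path: $!";
    if ($unbuffer) {
        my $old = select $fh;   # unbuffer just this handle
        $| = 1;
        select $old;
    }
    my $start = time;
    print {$fh} "x\n" for 1 .. 100_000;
    close $fh or die "Can't close $path: $!";
    return time - $start;
}

my $buffered   = write_lines( 'buffered.tmp',   0 );
my $unbuffered = write_lines( 'unbuffered.tmp', 1 );
printf "buffered: %.3fs  unbuffered: %.3fs\n", $buffered, $unbuffered;
```

On most systems the unbuffered run takes noticeably longer, because every print becomes its own write to the operating system.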

In simple interactive applications, though, there may be no benefit. When attached to a terminal, such as a command line, Perl operates in line-buffered mode. Run the following program and watch the output carefully:

#!/usr/bin/perl

use strict;
use warnings;

# buffer flushed at newline
loop_print( 5, "Line-buffered\n" );

# buffer not flushed until newline
loop_print( 5, "Buffered  " );
print "\n";

# buffer flushed with every print
{
    local $| = 1;
    loop_print( 5, "Unbuffered  " );
}

sub loop_print
{
    my ($times, $message) = @_;

    for (1 .. $times)
    {
        print $message;
        sleep 1;
    }
}

The first five messages appear individually and immediately; Perl flushes the buffer for STDOUT when it sees each newline. The second set appears after five seconds, all at once, when Perl sees the newline after the loop. The third set appears individually and immediately because Perl flushes the buffer after every print statement.

Terminals are different from everything else, though. Consider the case of writing to a file. In one terminal window, create a file named buffer.log and run tail -f buffer.log or its equivalent to watch the growth of the file in real time. Then add the following lines to the previous program and run it again:

open( my $output, '>', 'buffer.log' ) or die "Can't open buffer.log: $!";
select( $output );
loop_print( 5, "Buffered\n" );
{
    local $| = 1;
    loop_print( 5, "Unbuffered\n" );
}

The first five messages appear in the log in a batch, all at once, even though they all end with newlines. Five messages aren't enough to fill the buffer; Perl only flushes it when the assignment to $| unbuffers the filehandle. The second set of messages appears individually, one second apart.

Finally, the STDERR filehandle is hot by default. Add the following lines to the previous program and run it yet again:

select( STDERR );
loop_print( 5, "Unbuffered STDERR " );

Though no code disables the buffer on STDERR, the five messages should print immediately, just as in the other unbuffered cases. (If they don't, your OS is weird.)

What's the Solution?

Buffering exists for a reason; it's almost always the right thing to do. When it's the wrong thing to do, you can disable it. Here are some rules of thumb:

  • Never disable buffering by default.
  • Disable buffering when and while you have multiple sources writing to the same output and their order matters.
  • Never disable buffering for network outputs by default.
  • Disable buffering for network outputs only when the expected time between full buffers exceeds the expected client timeout length.
  • Don't disable buffering on terminal outputs. For STDERR, it's useless, dead code. For STDOUT, you probably don't need it.
  • Disable buffering if it's more important to print messages regularly than efficiently.
  • Don't disable buffering until you know that the buffer is a problem.
  • Disable buffering in the smallest scope possible.
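When you do decide to unbuffer, the core IO::Handle module makes the smallest-scope approach easy: it can flip buffering on a single filehandle without the select/$|/select dance shown earlier and without touching STDOUT. A minimal sketch (the log file name is arbitrary):

```perl
use strict;
use warnings;
use IO::Handle;

open my $log, '>', 'hot.log' or die "Can't open hot.log: $!";
$log->autoflush(1);   # unbuffer only this handle; global $| is untouched
print {$log} "flushed immediately\n";
close $log or die "Can't close hot.log: $!";
```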