
Perl Needs Better Tools


Perl is in danger of becoming a fading language--new programmers are learning Java and Python in college, and companies like Google hardly use Perl at all. If you are afraid that Perl may be in danger of becoming irrelevant for medium-to-large projects, then read on.

The Scary Part

I have discussed the future of Perl with managers from companies that currently use it and find that they worry about the future of Perl. One company I spoke with here in San Francisco is rewriting their core application in Java. Another company worries they will not be able to find new Perl programmers down the road. Yet another uses Perl for major projects, but suffers from difficulty in refactoring their extensive code base.

There are many reasons why companies care about the future of Perl. I offer a part of a solution: better tools for Perl can be a major part of keeping Perl relevant and effective as the primary language for medium and large projects.

When measuring the effectiveness of a development environment (people, language, tools, processes, etc.), a key measure is how expensive and painful it is to make changes to existing code. Once a project or system has grown to thousands of lines of code in dozens (or hundreds) of modules, the cost of making changes can escalate to the point where the team is afraid to make any significant change. Excellent tools are one of the ways to avoid this unhappy situation, or at least reduce its impact. Other factors are excellent processes and, of course, excellent people.

21st-Century Integrated Development Environments for Perl

I propose that more high-quality development tools will help keep Perl relevant and alive in medium and large project environments. My focus in this article is on IDEs, or Integrated Development Environments, and primarily those with a graphical interface.

An IDE is an integrated set of tools for programming, combining a source code editor with a variety of other tools into a single package. Common features of modern IDEs include refactoring support, version control, real-time syntax checking, and auto-completion of code while typing.

I want to make it clear right at the outset that a team of highly skilled Perl programmers, using only tools that have been around for years (such as emacs, vi, cvs, and make) can and do build large, sophisticated, and successful projects. I am not worried about those programmers. I am worried about the larger population of programmers with one to five years of experience, and those who have not yet begun to program: the next generation of Perl programmers.

Great tools will not make a bad programmer into a good programmer, but they will certainly make a good programmer better. Unfortunately, the tools for Perl are years behind what is available for other languages, particularly Java.

One powerful example is the lack of graphical IDEs for Perl with excellent support for refactoring. Several IDEs for Java have extensive refactoring support. Only one for Perl, the EPIC plugin for Eclipse, supports even a single refactoring action.

For an example of how good IDEs have inspired at least one Perl developer, see Adam Kennedy's Perl.com article on his new PPI module and Scott Sotka's Devel::Refactor module (used in EPIC).

I acknowledge that a graphical IDE is not the be-all of good tools. Just as some writers reject word processors in favor of typewriters or hand-written manuscripts, some programmers reject graphical IDEs and would refuse a job that required them to use one. Not everyone has (nor should have) the same tool set, and there are things a pencil can do that vi and emacs will never do. That said, IDEs have wide use in businesses doing larger projects, and for many programmers and teams they provide major increases in productivity.

Another important point is that while this article discusses over a dozen specific tools or features, having all the tools in a single package produces the biggest value. An IDE that provides all of these features in a single package that people can easily install, easily extend, and easily maintain across an entire development team has far more value than the sum of its parts.

There is a big win when the features provided by an IDE immediately upon installation include all or almost all of the tools and features discussed here and where the features "know" about each other. For example, it is good if you enter the name of a non-existent subroutine and the real-time syntax checker catches this. It is much better if the code-assist feature then pops up a context menu offering to create a stub for the subroutine or to correct the name to that of an existing similar subroutine or method from another class that is available to the current file. (This is standard behavior for some Java IDEs.)

What Would a 21st-Century Perl Tool Set Contain?

Perl needs a few great IDEs--not just one, but more than one so that people have a diverse set to choose from. Perl deserves and needs a few great IDEs that lead the pack and set the standard for IDEs in other languages.

I am well aware that the dynamic nature of Perl makes it harder to have a program that can read and understand a Perl program, especially a large and complex one, but the difficulty in comprehending a Perl program makes the value of such a tool all the greater, and I have faith that the Perl community can overcome some of the built-in challenges of Perl. Indeed, it is among the greatest strengths of Perl that Perl users can adapt the language to their needs.

A great Perl IDE will contain at least the following, plus other features I haven't thought of. (And, of course, there must be many of those!)

Most of the screen shot examples in this article use the EPIC Perl IDE. At present, it has the most features from my list (although it certainly doesn't have all of them).

Syntax-Coloring Text Editor

Most of you have probably seen this. It is available under vim, emacs, BBEdit, and TextPad. Just about every decent text editor will colorize source code so that keywords, operators, variables, etc., each have their own color, making it easier to spot syntax errors such as forgetting to close a quote pair.

Real-Time Syntax Checking

Figure 1. Real-time syntax checking

The IDE in Figure 1 shows that line 4 has an error because of the missing ) and that line 5 has an error because there is no declaration of $naame (and use strict is in effect).

A key point here is that the IDE shows these errors right away, before you save and compile the code. (In this example, the EPIC IDE lets you specify how often to run the syntax check, from 0.01 to 10.00 seconds of idle time, or only on demand.)

As nice as this is, it would be even better if the IDE also offered ways to fix the problem, for example, offering to change $naame to $name. Figure 2 shows an IDE that does exactly that; unfortunately, for Java, not Perl.

Figure 2. Syntax help from the IDE

It would be great if Perl IDEs offered this kind of help.

Version Control Integration

All non-insane large projects use version control software. The most common version control software systems are probably CVS, Perforce, Subversion, and Visual SourceSafe. Figure 3 shows an IDE comparing the local version of a file to an older version from the CVS repository.

Figure 3. Comparing a local file to an older version in CVS

CVS integration is available in many modern code editors, including emacs, vim, and BBEdit, as well as graphical IDEs such as Eclipse and Komodo Pro. Subversion integration is available as a plugin for Eclipse; Komodo Pro supports Perforce and Subversion.

A Code-Assist Editor

Suppose that you have just typed in an object reference and want to call a method on the object, but you are not sure what the method name is. Wouldn't it be nice if the editor popped up a menu listing all of the methods available for that object? It might look something like Figure 4.

Figure 4. Automatic code completion

In this example, the IDE is able to figure out which class the object $q is an instance of and lists the names of the available methods. If you type a p, then the list shows only the method names beginning with p. If you type pa, then the list shows only the param() and parse_params() methods.

Excellent Refactoring Support

The easier it is to do refactoring, the more often people will do it. The following list contains the most common refactorings. Your personal list will probably be a little different. All of these are things you can do "manually," but the idea is to make them into one- or two-click operations so that you will do them much more often. (For an extensive list of refactoring operations, see Martin Fowler's alphabetical list of refactorings.)

Extract Subroutine/Method

The IDE should create a new subroutine using the selected code and replace the selected code with a call to the new subroutine, with the proper parameters. Here's an example of using the Extract Subroutine refactoring from Eclipse/EPIC (which uses the Devel::Refactor module).

First, you select a chunk of code to turn into a new subroutine, and then select Extract Subroutine from a context menu. You then get a dialog box asking for the name of the new subroutine (shown in Figure 5).

Figure 5. Code before Extract Subroutine refactoring

The IDE replaces the selected code with a call to the new subroutine, making reasonable guesses about the parameters and return values (Figure 6). You may need to clean up the result manually.

Figure 6. Code after Extract Subroutine

Figure 7 shows the new subroutine created by the IDE. In this case, it needs no changes, but sometimes you will need to adjust the parameters and/or return value(s).

Figure 7. The new subroutine created by Extract Subroutine

Ideally, the editor should prompt you to replace similar chunks of code with calls to the new subroutine.

Rename Subroutine/Method

The IDE should find all the calls to the subroutine throughout your project and offer to change them for you. You should be able to see a preview of all of the places a change could occur, and to accept or reject each one on a case-by-case basis. The action should be undoable.

Rename Variable

Like Rename Subroutine, this feature should find all occurrences throughout the project and offer to make the changes for you.

Change Subroutine/Method Signature

The IDE should be able to make reasonable guesses about whether each subroutine or method call is supplying the proper parameters. Partly this is to enable the real-time syntax checking mentioned above, and partly this is to enable you to select a subroutine declaration and tell the IDE you want to refactor it by adding or removing a parameter. The IDE should then prompt you for the change(s) you want to make, do its best to find all of the existing calls to the subroutine, and offer to correct the subroutine calls to supply the new parameters.

Obviously, this is an especially tricky thing to do in Perl, where subroutines fish their parameters out of @_. So the IDE would have to look carefully at how the code uses shift, @_, and $_[] in order to have a reasonable guess about the parameters the subroutine is expecting. In many common cases, though, a Perl IDE could make a reasonable guess about the parameters, such as in the following two examples, so that if you added or removed one, it could immediately prompt you about making corrections throughout the project:

sub doSomething {
    my $gender = shift;
    my $age    = shift;
    # Not too terribly hard to guess that $gender and $age are params
}

sub anotherThing {
    my ($speed,$direction) = @_;
    # No magic needed to guess $speed and $direction are params.
}

Move Subroutine/Method

This refactoring operation should give you a list or dialog box to choose the destination file in your project. The IDE should allow you to preview all of the changes that it would make to accomplish the move, which will include updating each call to the subroutine/method to use the proper class. At a minimum, the IDE should show you or list all of the calls to the subroutine so you can make the appropriate changes yourself. Ideally, the IDE should make a guess about possible destinations; for example, if $self is a parameter to the method being moved, then the IDE might try assuming the method is an object (instance) method and initially only list destination classes that inherit from the source class, or from which the source class inherits.

Change a Package Name

As with Rename Subroutine and Rename Variable, when changing a package name, the IDE should offer to update all existing references throughout your project.

Tree View and Navigation of Source Files and Resources

Another useful feature of good IDEs is being able to view all of the code for a project, or multiple projects, in a tree format, where you can "fold" and "unfold" the contents of folders. All of the modern graphical IDEs support this, even with multiple projects in different languages.

Being able to view your project in this manner gives you both a high-level overview and the ability to drill down into specific files, and to mix levels of detail by having some folders show their contents and some not.

For example, Figure 8 shows a partial screen shot from ActiveState's Komodo IDE.

Figure 8. Tree view of code in Komodo

Support for Creating and Running Unit Tests

Anyone who has installed Perl modules from CPAN has seen unit tests--these are the various, often copious, tests that run when you execute the make test part of the installation process. The vast majority of CPAN modules include a suite of tests, often using the Test::Harness and/or Test::More modules. A good IDE will make it very easy to both create and run unit tests as you develop your project.

The most basic form of support for unit tests in an IDE is simply to make it easy to execute arbitrary scripts from within the IDE. Create a test.pl for your project and keep adding tests to it or to a t/ subdirectory as you develop, and keep running the script as you make changes. All modern IDEs provide at least this minimal capability.

A more sophisticated level of support for unit tests might resemble the Java IDE feature for tests written in JUnit, where you can select an existing class file (a .pm file in Perl) and ask the IDE to create a set of stub tests for every subroutine in the file. (See JUnit and the Perl module Test::Unit for more on unit tests.) Furthermore, the IDE should support running a set of tests and giving simple visual feedback on what passed/failed. The standard approach in the JUnit world is to show either a "green bar" (all passed) or "red bar" (something failed) and then allow you to see details on failures. Other nice-to-have features include calculating code-coverage, providing statistical summaries of tests, etc.

Figure 9 shows a successful run of a Java test suite with Eclipse.

Figure 9. A successful JUnit test run

Figure 10 shows the same test run, this time with a failure.

Figure 10. A JUnit test run with a failure

A stack trace of the failure message appears in another part of the window (cropped out here to save space). If you double-click on the test that failed (testInflate), the IDE will open the file (BalloonTest, in this case) and navigate to the test function.

The central idea is that the IDE should make it as painless as possible to add and modify and run tests, so you will do more of it during development.

Language-Specific Help

This is a fairly straightforward idea--the IDE should be able to find and display the appropriate documentation for any keyword in your code, so if you highlight push and ask for help, you should see the push entry from the Perl documentation. If you highlight a method or subroutine or other symbol name from an imported module, the IDE should display the module's documentation for the selected item. Of course, this requires that the documentation be available in a consistent, machine-readable form, which is only sometimes true.

Debugger with Real-Time Display of Results

All modern IDEs offer support for running your code under a debugger, usually with visual display of what's going on, including the state of variables. The Komodo IDE supports debugging Perl that is running either locally or remotely.

Typical support for debugging in an IDE includes the ability to set breakpoints, monitor the state of variables, etc. Basically, the IDE should provide support for all of the features of the debugger itself. Graphical IDEs should provide a visual display of what is going on.

Automatic Code Reformatting

This means automatically or on-demand re-indenting and other reformatting of code. For example, when you cut and paste a chunk of code, the IDE should support reformatting the chunk to match the indentation of its new location. If you change the number of spaces or tabs for each level of indentation, or your convention for the placement of curly braces, then the IDE should support adjusting an entire file or all files in your project.

Seamless Handling of Multiple Languages

Many large software projects involve multiple languages. This is almost universally true in the case of web applications, where the user interface typically uses HTML, CSS, and JavaScript, and the back end uses one or more of Perl, PHP, Java, Python, Ruby, etc. It is very helpful to have development tools that seamlessly integrate work done in all of the languages. This is becoming quite common. For example, both Komodo and Eclipse support multiple languages.

Automated Building and Testing

At its most basic, this feature just makes it easy to run an arbitrary script from within the IDE and to see its output. This could be as simple as having the IDE configured to have a one-click way of running the traditional Perl module build-and-test commands:

$ perl Makefile.PL
$ make
$ make test

A more advanced version of this feature might involve having the IDE create stub code to test all of the subroutines in an existing file, or to run all of the scripts in a specified directory under Test::Harness, or to run a set of tests using Test::Unit::TestRunner or Test::Unit::TkTestRunner. (The latter provides a GUI testing framework.)

Conclusion and Recommendations

While there are many tools for helping Perl development, the current state of the Perl toolbox is still years behind those of other languages--perhaps three to five years behind, when compared to Java tools. While there are several tools for Java that have all the features described above, virtually none for Perl have all of them. On the other hand, things are looking up; they are better now than a year ago. It's possible to close that gap in a year or two.

A couple of obvious areas where improvements could be somewhat easy are adding more features to EPIC and Komodo. EPIC is open source, so there is potentially a wider pool of talent that could contribute. On the other hand, Komodo has a company with money behind it, so people actually get paid to improve it. Hopefully both tools will get better with time.

Another interesting possibility is the development of new IDEs or adding to existing ones by using Adam Kennedy's PPI module, which provides the ability to parse Perl documents into a reasonable abstract syntax tree and to manipulate the elements and re-compose the document. There is a new Perl editor project, VIP, that is in the design stages and is intended to be "pluggable" and to have special features to support pair programming.

Finally, I've gathered a couple of lists of links for related material. The first list below consists of IDEs and graphical editors for Perl, and the second list consists of various related articles and websites. I hope this is all inspirational and helpful.

Current IDEs for Perl

The listed IDEs support Perl. The list is undoubtedly incomplete, but should form a good starting point for anyone wishing to look into this further.

  • Affrus

    Perl only, Mac OS X only. Closed source (and hence not extensible by users). Primarily designed for CGI and standalone scripts. Free demo available. $99 to purchase. (See the Perl.com review of Affrus to learn more.)

  • Eclipse/EPIC

    EPIC is a plugin for the Eclipse platform. Eclipse is open-source and cross platform (Windows/Mac/Linux/Solaris, etc.). Once you have Eclipse installed, install the EPIC plugin from within the Eclipse application using the EPIC update URL. Eclipse supports Java, and with plugins, C/C++, COBOL, Perl, PHP, UML2, Python, Ruby, XML, and more. There is a large and active community around Eclipse.

  • Emacs

    Emacs is the mother of all text-editor/development-environment/adventure-game/all-in-one tools. Expert programmers use it widely and there are numerous enhancements for working with particular languages, including, of course, Perl. Emacs, with CPerlMode, is a richly featured IDE for Perl, albeit a non-GUI IDE (which, for some people, makes it even better). A set of extensions for CPerlMode is available, but you need to join the Yahoo Extreme Perl group to get to them.

  • Komodo

    This runs on Linux, Solaris, and Windows. Free demo; $29.95 for personal and student use, $295 for commercial use. It supports Perl, PHP, Python, Tcl, and XSLT.

  • PAGE

    PAGE runs only on Windows (9x/ME/NT/2000/XP). It is a Rapid Application Development tool for Perl and comes in three versions: Free, Standard ($10), and Enterprise ($50). PAGE provides several "wizards" for creating scripts, modules (packages), web forms, and even database applications.

  • Perl Editor

    This closed source program runs only on Windows (9x/NT/2000/XP). It has a GUI code profiler, and the Pro version has a regular expression tester and built-in web server (for CGI testing, etc.). Perl Editor claims to have the best debugger on the market. It also comes with GUI tools for managing MySQL databases. $69.95 to purchase.

  • vim

    The well-known descendant of vi is a powerful and flexible text editor with many plugins and extensions. Have a look at the vim scripts; for example, vim.sourceforge.net/scripts/script.php?script_id=556 and vim.sourceforge.net/scripts/script.php?script_id=281.

  • visiPerl

    This is a closed source application that runs on Win9x/NT/2000. It handles Perl and HTML and has code templates, being designed for website building. visiPerl includes a built-in web server for testing and an FTP client for code deployment. There is a free demo, or you can purchase it for $59.

Related Topics

Perl Command-Line Options

Perl has a large number of command-line options that can help to make your programs more concise and open up many new possibilities for one-off command-line scripts using Perl. In this article we'll look at some of the most useful of these.

Safety Net Options

There are three options I like to think of as a "safety net," as they can stop you from making a fool of yourself when you're doing something particularly clever (or stupid!). And while they aren't ever necessary, it's rare that you'll find an experienced Perl programmer working without them.

The first of these is -c. This option compiles your program without running it. This is a great way to ensure that you haven't introduced any syntax errors while you've been editing a program. When I'm working on a program I never go more than a few minutes without saving the file and running:

  $ perl -c <program>

This makes sure that the program still compiles. It's far easier to fix problems when you've only made a few changes than it is to type in a couple of hundred of lines of code and then try to debug that.
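
Because perl -c exits with a non-zero status when compilation fails, the check is easy to script. The following is a minimal sketch, assuming a Unix-like shell with mktemp available; the temporary file and the deliberately misspelled $naame are made up for the demonstration.

```shell
# Write a deliberately broken program to a temporary file.
prog=$(mktemp "${TMPDIR:-/tmp}/syntax-demo.XXXXXX")
cat > "$prog" <<'EOF'
use strict;
my $name = "World";
print "Hello, $naame\n";   # typo: $naame was never declared
EOF

# perl -c compiles without running; its exit status reports success or failure.
if perl -c "$prog" 2>/dev/null; then
    verdict="compiles"
else
    verdict="broken"
fi
echo "$verdict"
rm -f "$prog"
```

Here the check reports "broken" because use strict turns the undeclared variable into a compile-time error. The diagnostics are thrown away for brevity; in day-to-day use you would want to read them.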

The next safety net is the -w option. This turns on warnings that Perl will then give you if it finds any of a number of problems in your code. Each of these warnings is a potential bug in your program and should be investigated. In modern versions of Perl (since 5.6.0) the -w option has been replaced by the use warnings pragma, which is more flexible than the command-line option so you shouldn't use -w in new code.

The final safety net is the -T option. This option puts Perl into "taint mode." In this mode, Perl inherently distrusts any data that it receives from outside the program's source -- for example, data passed in on the command line, read from a file, or taken from CGI parameters.

Tainted data cannot be used in an expression that interacts with the outside world -- for example, you can't use it in a call to system or as the name of a file to open. The full list of restrictions is given in the perlsec manual page.

In order to use this data in any of these potentially dangerous operations you need to untaint it. You do this by checking it against a regular expression. A detailed discussion of taint mode would fill an article all by itself so I won't go into any more details here, but using taint mode is a very good habit to get into -- particularly if you are writing programs (like CGI programs) that take unknown input from users.

Actually there's one other option that belongs in this set and that's -d. This option puts you into the Perl debugger. This is also a subject that's too big for this article, but I recommend you look at "perldoc perldebug" or Richard Foley's Perl Debugger Pocket Reference.

Command-Line Programs

The next few options I want to look at make it easy to run short Perl programs on the command line. The first one, -e, lets you supply Perl code directly on the command line instead of in a file. For example, it's not necessary to write a "Hello World" program in Perl when you can just type this at the command line.

  $ perl -e 'print "Hello World\n"'

You can have as many -e options as you like and they will be run in the order that they appear on the command line.

  $ perl -e 'print "Hello ";' -e 'print "World\n"'

Notice that, as in a normal Perl program, every statement except the last needs to end with a ; character.

Although it is possible to use a -e option to load a module, Perl gives you the -M option to make that easier.

  $ perl -MLWP::Simple -e'print head "http://www.example.com"'

So -Mmodule is the same as use module. If the module has default imports you don't want imported then you can use -m instead. Using -mmodule is the equivalent of use module(), which turns off any default imports. For example, the following command displays nothing as the head function won't have been imported into your main package:

  $ perl -mLWP::Simple -e'print head "http://www.example.com"'

The -M and -m options implement various nice pieces of syntactic sugar to make using them as easy as possible. Any arguments you would normally pass to the use statement can be listed following an = sign.

  $ perl -MCGI=:standard -e'print header'

This command imports the ":standard" export set from CGI.pm and therefore the header function becomes available to your program. Multiple arguments can be listed using quotes and commas as separators.

  $ perl -MCGI='header,start_html' -e'print header, start_html'

In this example we've just imported the two functions header and start_html as those are the only ones we are using.

Implicit Loops

Two other command-line options, -n and -p, add loops around your -e code. They are both very useful for processing files a line at a time. If you type something like:

  $ perl -n -e 'some code' file1

Then Perl will interpret that as:

  LINE:
    while (<>) {
      # your code goes here
    }

Notice the use of the empty file input operator, which will read all of the files given on the command line a line at a time. Each line of the input files will be put, in turn, into $_ so that you can process it. As an example, try:

  $ perl -n -e 'print "$. - $_"' file

This gets converted to:

  LINE:
    while (<>) {
      print "$. - $_"
    }

This code prints each line of the file together with the current line number.

The -p option makes that even easier. This option always prints the contents of $_ each time around the loop. It creates code like this:

  LINE:
    while (<>) {
      # your code goes here
    } continue {
      print or die "-p destination: $!\n";
    }

This uses the little-used continue block on a while loop to ensure that the print statement is always called.

Using this option, our line number generator becomes:

  $ perl -p -e '$_ = "$. - $_"'

In this case there is no need for the explicit call to print as -p calls print for us.

Notice that the LINE: label is there so that you can easily move to the next input record no matter how deep in embedded loops you are. You do this using next LINE.

  $ perl -n -e 'next LINE unless /pattern/; print $_'

Of course, that example would probably be written as:

  $ perl -n -e 'print if /pattern/'

But in a more complex example, the next LINE construct could potentially make your code easier to understand.

If you need to have processing carried out either before or after the main code loop, you can use a BEGIN or END block. Here's a pretty basic way to count the words in a text file:

  $ perl -ne 'END { print $t } @w = /(\w+)/g; $t += @w' file.txt

Each time round the loop we extract all of the words (defined as contiguous runs of \w characters) into @w and add the number of elements in @w to our total variable $t. The END block runs after the loop has completed and prints out the final value in $t.

Of course, people's definition of what constitutes a valid word can vary. The definition used by the Unix wc (word count) program is a string of characters delimited by whitespace. We can simulate that by changing our program slightly, like this:

  $ perl -ne 'END { print $x } @w = split; $x += @w' file.txt

But there are a couple of command-line options that will make that even simpler. Firstly the -a option turns on autosplit mode. In this mode, each input record is split and the resulting list of elements is stored in an array called @F. This means that we can write our word-count program like this:

  $ perl -ane 'END {print $x} $x += @F' file.txt

The default value used to split the record is one or more whitespace characters. It is, of course, possible that you might want to split the input record on another character and you can control this with the -F option. So if we wanted to change our program to split on runs of non-word characters (so that a comma followed by a space doesn't produce an empty field) we could do something like this:

  $ perl -F'\W+' -ane 'END {print $x} $x += @F' file.txt

For a more powerful example of what we can do with these options, let's look at the Unix password file. This is a simple, colon-delimited text file with one record per user. The seventh column in this file is the path of the login shell for that user. We can therefore produce a report of the most-used shells on a given system with a command-line script like this:

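  $ perl -F':' -lane '$s{$F[6]}++;' \
  > -e 'END { print "$_ : $s{$_}" for keys %s }' /etc/passwd

OK, so it's longer than one line and the output isn't sorted (although it's quite easy to add sorting), but perhaps you can get a sense of the kinds of things that you can do from the command line. (The extra -l option, covered in the next section, strips the newline from each record so that it doesn't end up inside the shell name, and adds a newline to each line of output.)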

Record Separators

In my previous article I talked a lot about $/ and $\ -- the input and output record separators. $/ defines how much data Perl will read every time you ask it for the next record from a filehandle, and $\ contains a value that is appended to the end of any data that your program prints. The default value of $/ is a new line and the default value of $\ is an empty string (which is why you usually explicitly add a new line to your calls to print).

Now in the implicit loops set up by -n and -p it can be useful to define the values of $/ and $\. You could, of course, do this in a BEGIN block, but Perl gives you an easier option with the -0 (that's a zero) and -l (that's an L) command-line options. This can get a little confusing (well, it confuses me) so I'll go slowly.

Using -0 and giving it a hexadecimal or octal number sets $/ to that value. The special value 00 puts Perl in paragraph mode and the special value 0777 puts Perl into file slurp mode. These are the same as setting $/ to an empty string and undef respectively.

Using -l and giving it no value has two effects. Firstly, it automatically chomps the input record, and secondly, it sets $\ equal to $/. If you give -l an octal number (and unlike -0 it doesn't accept hex numbers) it sets $\ to the character represented by that number and also turns on auto-chomping.
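
If the options are hard to keep straight, it may help to see the variables they set. The sketch below sets $/ to an empty string by hand, which is what -00 does, and sets $\ to a new line, which is the output half of what -l does. The read_paragraphs subroutine and the sample text are assumptions made for the example.

```perl
use strict;
use warnings;

# Read paragraphs the way -00 does, by setting $/ to an empty string.
# read_paragraphs is a hypothetical helper for this illustration.
sub read_paragraphs {
    my ($fh) = @_;
    local $/ = '';          # paragraph mode, as with -00
    my @paragraphs = <$fh>;
    chomp @paragraphs;      # in paragraph mode, chomp strips all trailing newlines
    return @paragraphs;
}

my $text = "first line\nsecond line\n\nnext paragraph\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @paragraphs = read_paragraphs($fh);
close $fh;

{
    local $\ = "\n";        # as with -l, print now appends a new line
    print scalar(@paragraphs), " paragraphs";
}
```

Using local keeps both changes confined to the enclosing block, which matters more in a script than it does in a one-liner.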

To be honest, I rarely use the -0 option and I usually use the -l option without an argument just to add a new line to the end of each line of output. For example, I'd usually write my original "Hello World" example as:

  $ perl -le 'print "Hello World"'

If I'm doing something that requires changing the values of the input and output record separators then I'm probably out of the realm of command-line scripts.

In-Place Editing

With the options that we have already seen, it's very easy to build up some powerful command-line programs. It's very common to see command-line programs that use Unix I/O redirection like this:

  $ perl -pe 'some code' < input.txt > output.txt

This takes records from input.txt, carries out some kind of transformation, and writes the transformed record to output.txt. In some cases you don't want to write the changed data to a different file; it's often more convenient if the altered data is written back to the same file.

You can get the appearance of this using the -i option. Actually, Perl renames the input file and reads from this renamed version while writing to a new file with the original name. If -i is given a string argument, then that string is appended to the name of the original version of the file. For example, to change all occurrences of "PHP" to "Perl" in a data file you could write something like this:

  $ perl -i -pe 's/\bPHP\b/Perl/g' file.txt

Perl reads the input file a line at a time, makes the substitution, and then writes the results back to a new file that has the same name as the original file -- effectively overwriting it. If you're not so confident of your Perl abilities you might take a backup of the original file, like this:

  $ perl -i.bak -pe 's/\bPHP\b/Perl/g' file.txt

You'll end up with the transformed data in file.txt and the original file backed up in file.txt.bak. If you're a fan of vi then you might like to use -i~ instead.
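
The same effect is available inside a script through the special variable $^I, which holds the backup extension given to -i. The following is a minimal sketch of that idiom; edit_in_place is a hypothetical helper, and the scratch file is created in a temporary directory only so that the example can be run safely.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Replace PHP with Perl in the named file, keeping a .bak copy,
# much as "perl -i.bak -pe 's/\bPHP\b/Perl/g'" does.
# edit_in_place is a hypothetical helper for this illustration.
sub edit_in_place {
    my ($file) = @_;
    local ($^I, @ARGV) = ('.bak', $file);   # $^I holds the -i extension
    while (<>) {
        s/\bPHP\b/Perl/g;
        print;              # with $^I set, print writes to the new file
    }
    return;
}

# Work on a scratch file in a temporary directory
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/file.txt";
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} "I like PHP\n";
close $out;

edit_in_place($file);
```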

Further Information

Perl has a large number of command-line options. This article has simply listed a few of the most useful. For the full list (and for more information on the ones covered here) see the "perlrun" manual page.

Web Testing with HTTP::Recorder

HTTP::Recorder is a browser-independent recorder that records interactions with web sites and produces scripts for automated playback. Recorder produces WWW::Mechanize scripts by default (see WWW::Mechanize by Andy Lester), but provides functionality to use your own custom logger.

Why Use HTTP::Recorder?

Simply speaking, HTTP::Recorder removes a great deal of the tedium from writing scripts for web automation. If you're like me, you'd rather spend your time writing code that's interesting and challenging, rather than digging through HTML files, looking for the names of forms and fields, so that you can write your automation scripts. HTTP::Recorder records what you do as you do it, so that you can focus on the things you care about.

Automated Testing

We all know that testing our code is good, and that writing automated tests that can be run again and again to check for regressions is even better. However, writing test scripts by hand can be tedious and prone to errors. You're more likely to write tests if it's easy to do so. The biggest obstacle to testing shouldn't be the mechanics of getting the tests written — it should be figuring out what needs to be tested, and how best to test it.

Part of your test suite should be devoted to testing things the way the user uses them, and HTTP::Recorder makes it easy to produce automation to do that, which allows you to put your energy into the parts of your code that need your attention and your expertise.

Automate Repetitive Tasks

When you think about web automation, the first thing you think of may be automated testing, but there are other uses for automation as well:

  • Check your bank balance.
  • Check airline fares.
  • Check movie times.

How to Set It Up

Use It with a Web Proxy

One way to use HTTP::Recorder (as recommended in the POD) is to set it as the user agent of a web proxy (see HTTP::Proxy by Philippe "BooK" Bruhat). Start the proxy running like this:


    #!/usr/bin/perl

    use HTTP::Proxy;
    use HTTP::Recorder;

    my $proxy = HTTP::Proxy->new();

    # create a new HTTP::Recorder object
    my $agent = new HTTP::Recorder;

    # set the log file (optional)
    $agent->file("/tmp/myfile");

    # set HTTP::Recorder as the agent for the proxy
    $proxy->agent( $agent );

    # start the proxy
    $proxy->start();

    1;

Then, instruct your favorite web browser to use your new proxy for HTTP traffic.

Other Ways to Use It

Since HTTP::Recorder is a subclass of LWP::UserAgent, you can use it in any way that you can use its parent class.

How to Use It

Once you've set up HTTP::Recorder, just navigate to web pages, follow links, and fill in forms the way you normally do, with the web browser of your choice. HTTP::Recorder will record your actions and produce a WWW::Mechanize script that you can use to replay those actions.

The script is written to a logfile. By default, this file is /tmp/scriptfile, but you can specify another pathname when you set things up. See Configuration Options for information about configuring the logfile.

HTTP::Recorder Control Panel

The HTTP::Recorder control panel allows you to view and edit scripts as you create them. You can access the control panel by using the HTTP::Recorder UserAgent to access the control URL. By default, the control URL is http://http-recorder/, but this address is configurable. See Configuration Options for more information about setting the control URL.

The control panel won't automatically refresh, but if you create HTTP::Recorder with showwindow => 1, a JavaScript popup window will be opened and refreshed every time something is recorded.

Goto Page. You can enter a URL in the control panel to begin a recording session. For SSL sessions, the initial URL must be entered into this field rather than into the browser.

Current Script. The current script is displayed in a text field, which you can edit as you create it. Changes you make in the control panel won't be saved until you click the Update button.

Update. Saves changes made to the script via the control panel. If you prefer to edit your script as you create it, you can save your changes as you make them.

Clear. Deletes the current script and clears the text field.

Reset. Reverts the text field to the currently saved version of the script. Any changes you've made to the script won't be applied if you haven't clicked Update.

Download. Displays a plain text version of the script, suitable for saving.

Close. Closes the window (using JavaScript).

Updating Scripts as They're Recorded

You can record many things, and then turn the recordings into scripts later, or you can make changes and additions as you go by editing the script in the Control Panel.

For example, if you record filling in a form and clicking the Submit button, HTTP::Recorder produces lines of code like the following:

    $agent->form_name("form1");
    $agent->field("name", "Linda Julien");
    $agent->submit_form(form_name => "form1");

However, if you're writing automated tests, you probably don't want to enter hard-coded values into the form. You may want to re-write these lines of code so that they'll accept a variable for the value of the name field.

You can change the code to look like this:

    my $name = "Linda Julien";

    $agent->form_name("form1");
    $agent->field("name", $name);
    $agent->submit_form(form_name => "form1");

Or even this:

    sub fill_in_name {
      my $name = shift;

      $agent->form_name("form1");
      $agent->field("name", $name);
      $agent->submit_form(form_name => "form1");
    }

    fill_in_name("Linda Julien");

Then click the Update button. HTTP::Recorder will save your changes, and you can continue recording as before.

You may also want to add tests as you go, making sure that the results of submitting the form were what you expected. You can add tests to the script like this:

    sub fill_in_name {
      my $name = shift;

      $agent->form_name("form1");
      $agent->field("name", $name);
      $agent->submit_form(form_name => "form1");
    }

    my $entry = "Linda Julien";
    fill_in_name($entry);

    # capture into a variable so that a failed match can't leave a stale $1
    my ($got) = $agent->content =~ /You entered this name: (.*)/;
    is($got, $entry);

Using HTTP::Recorder with SSL

In order to do what it does, HTTP::Recorder relies on the ability to see and modify the contents of requests and their resulting responses...and the whole point of SSL is to make sure you can't easily do that. HTTP::Recorder works around this, however, by handling the SSL connection to the server itself, and communicating with your browser via plain HTTP.

Caution: Keep in mind that communication between your browser and HTTP::Recorder isn't encrypted, so take care when recording sensitive information, like passwords or credit card numbers. If you're running the Recorder as a proxy on your local machine, you have less to worry about than if you're running it as a proxy on a remote machine. When you replay the resulting script, its traffic to the server will be encrypted as usual.

If you want to record SSL sessions, here's how you do it:

Start at the control panel, and enter the initial URL there rather than in your browser. Then interact with the web site as you normally would. HTTP::Recorder will record form submissions, following links, etc.

Replaying your Scripts

HTTP::Recorder records getting pages, following links, filling in fields, submitting forms, etc., but it doesn't (at this point) generate a complete Perl script. Remember that you'll need to add standard script headers and initialize the WWW::Mechanize agent, with something like this:

    #!/usr/bin/perl

    use strict;
    use warnings;
    use WWW::Mechanize;
    use Test::More qw(no_plan);

    my $agent = WWW::Mechanize->new();

Configuration Options

Output file. You can change the filename for the scripts that HTTP::Recorder generates with the $recorder->file([$value]) method. The default output file is '/tmp/scriptfile'.

Prefix. HTTP::Recorder adds parameters to link URLs and adds fields to forms. By default, its parameters begin with "rec-", but you can change this prefix with the $recorder->prefix([$value]) method.

Logger. The HTTP::Recorder distribution includes a default logging module, which outputs WWW::Mechanize scripts. You can change the logger with the $recorder->logger([$value]) method, replacing it with a logger that:

  • subclasses the standard logger to provide special functionality unique to your site
  • outputs an entirely different type of script

RT (Request Tracker) 3.1 by Best Practical Solutions has a Query Builder that's a good example of a page that benefits from a custom logger.

This page has several Field/Operator/Value groupings. Left to its own devices, the default HTTP::Recorder::Logger will record every field for which a value has been set:

    $agent->form_name("BuildQuery");
    $agent->field("ActorOp", "=");
    $agent->field("AndOr", "AND");
    $agent->field("TimeOp", "<");
    $agent->field("WatcherOp", "LIKE");
    $agent->field("QueueOp", "=");
    $agent->field("PriorityOp", "<");
    $agent->field("LinksOp", "=");
    $agent->field("idOp", "<");
    $agent->field("AttachmentField", "Subject");
    $agent->field("ActorField", "Owner");
    $agent->field("PriorityField", "Priority");
    $agent->field("StatusOp", "=");
    $agent->field("DateField", "Created");
    $agent->field("TimeField", "TimeWorked");
    $agent->field("LinksField", "HasMember");
    $agent->field("WatcherField", "Requestor.EmailAddress");
    $agent->field("AttachmentOp", "LIKE");
    $agent->field("ValueOfAttachment", "foo");
    $agent->field("DateOp", "<");
    $agent->submit_form(form_name => "BuildQuery");

But on this page, there's no need to record setting the values of fields (XField) and operators (XOp) unless a value (ValueOfX) has actually been set. We can do this with a custom logger that checks for the presence of a value, and only records the value of the field and operator fields if the value field has been set:

    package HTTP::Recorder::RTLogger;

    use strict;
    use warnings;
    use HTTP::Recorder::Logger;
    our @ISA = qw( HTTP::Recorder::Logger );

    sub SetFieldsAndSubmit {
        my $self = shift;
        my %args = (
            name          => "",
            number        => undef,
            fields        => {},
            button_name   => {},
            button_value  => {},
            button_number => {},
            @_
        );

        $self->SetForm(name => $args{name}, number => $args{number});
        my %fields = %{$args{fields}};
        foreach my $field (sort keys %fields) {
            if ( $args{name} eq 'BuildQuery' &&
                 ($field =~ /(.*)Op$/ || $field =~ /(.*)Field$/) &&
                 !exists ($fields{'ValueOf' . $1})) {
                next;
            }
            $self->SetField(name  => $field,
                            value => $args{fields}->{$field});
        }
        $self->Submit(name          => $args{name},
                      number        => $args{number},
                      button_name   => $args{button_name},
                      button_value  => $args{button_value},
                      button_number => $args{button_number},
                      );
    }

    1;

Tell HTTP::Recorder to use the custom logger like this:

    my $logger = new HTTP::Recorder::RTLogger;
    $agent->logger($logger);

And it will record a much more reasonable number of things:

    $agent->form_name("BuildQuery");
    $agent->field("AndOr", "AND");
    $agent->field("AttachmentField", "Subject");
    $agent->field("AttachmentOp", "LIKE");
    $agent->field("ValueOfAttachment", "foo");
    $agent->submit_form(form_name => "BuildQuery");
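
If you want to experiment with a filtering rule like this before wiring it into a logger, you can try it on a plain hash first. The sketch below repeats the test used in SetFieldsAndSubmit on a cut-down set of fields; fields_to_record is a hypothetical helper written for this illustration and is not part of HTTP::Recorder.

```perl
use strict;
use warnings;

# Return the names of the fields the custom logger would record:
# XOp and XField are skipped unless a matching ValueOfX was submitted.
# fields_to_record is a hypothetical helper, not part of HTTP::Recorder.
sub fields_to_record {
    my (%fields) = @_;
    return grep {
        !( /^(.*)(?:Op|Field)$/ && !exists $fields{"ValueOf$1"} )
    } sort keys %fields;
}

# A cut-down version of the Query Builder submission shown earlier
my %submitted = (
    ActorOp           => "=",
    ActorField        => "Owner",
    AndOr             => "AND",
    AttachmentField   => "Subject",
    AttachmentOp      => "LIKE",
    ValueOfAttachment => "foo",
);
print "$_\n" for fields_to_record(%submitted);
```

With this input, only AndOr and the three Attachment fields survive, which matches the shorter recording above.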

Control panel. By default, you can access the HTTP::Recorder control panel by using the Recorder to get http://http-recorder. You can change this URL with the $recorder->control([$value]) method.

Logger Options

Agent name. By default, HTTP::Recorder::Logger outputs scripts with the agent name $agent:

     $agent->follow_link(text => "Foo", n => 1);

However, if you prefer a different agent name (in order to drop recorded lines into existing scripts, conform to company conventions, etc.), you can change that with the $logger->agentname([$value]) method:

     $logger->agentname("mech");

will produce the following:

     $mech->follow_link(text => "Foo", n => 1);

How HTTP::Recorder Works

The biggest challenge in writing a web recorder is knowing what the user is doing, so that it can be recorded. A proxy can watch requests and responses go by, but the only thing it will learn is the URL that was requested and its parameters. HTTP::Recorder solves this problem by rewriting HTTP responses as they come through, and adding additional information to the page's links and forms, so that it can extract that information again when the next request comes through.

As an example, a page might contain a link like this:

    <a href="http://www.cpan.org/">CPAN</a>

If the user follows the link, and we want to record it, we need to know all of the relevant information about the action, so that we can produce a line of code that will replay the action. This includes:

  • the fact that a link was followed.
  • the text of the link.
  • the URL of the link.
  • the index (in case there are multiple links on the page of the same name).

HTTP::Recorder overrides LWP::UserAgent's send_request method, so that it can see requests and responses as they come through, and modify them as needed.

HTTP::Recorder rewrites the link so that it looks like this:

    <a href="http://www.cpan.org/?rec-url=http%3A%2F%2Fwww.cpan.org%2F&rec-action=follow&rec-text=CPAN&rec-index=1">CPAN</a>

So, with the rewritten page, if the user follows this link, the request will contain all of the information needed to record the action.

Forms are handled likewise, with additional fields being added to the form so that the information can be extracted later. HTTP::Recorder then removes the added parameters from the resulting request, and forwards the request along in something close to its originally intended state.
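
The round trip for a link can be sketched in a few lines of plain Perl. This is a simplified illustration of the idea and not HTTP::Recorder's actual code: the subroutine names are invented, the encoding is done by hand, and the original URL is assumed to have no query string of its own.

```perl
use strict;
use warnings;

# Percent-encode everything except unreserved URL characters
sub encode_param {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-_.~])/sprintf '%%%02X', ord $1/ge;
    return $value;
}

# Append rec- parameters describing the link to its URL.
# Assumes the original URL has no query string of its own.
sub rewrite_link {
    my ($url, $text, $index) = @_;
    my @pairs = (
        [ 'rec-url'    => $url ],
        [ 'rec-action' => 'follow' ],
        [ 'rec-text'   => $text ],
        [ 'rec-index'  => $index ],
    );
    my $query = join '&',
        map { join '=', $_->[0], encode_param($_->[1]) } @pairs;
    return "$url?$query";
}

# Split a rewritten URL back into the original URL and the recorded details
sub extract_recorded {
    my ($rewritten) = @_;
    my ($base, $query) = split /\?/, $rewritten, 2;
    my %recorded;
    for my $pair (split /&/, $query // '') {
        my ($key, $value) = split /=/, $pair, 2;
        next unless $key =~ s/^rec-//;      # keep only the added parameters
        $value =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
        $recorded{$key} = $value;
    }
    return ($base, \%recorded);
}

my $rewritten = rewrite_link('http://www.cpan.org/', 'CPAN', 1);
print "$rewritten\n";
my ($url, $recorded) = extract_recorded($rewritten);
print "$recorded->{action} $recorded->{text} at $url\n";
```

The first line printed is the rewritten CPAN link from the example above, and the second shows that the action, link text, and original URL can all be recovered from it.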

Looking Ahead

HTTP::Recorder won't record 100% of every script you need to write, and while future versions will undoubtedly have more features, they still won't write your scripts for you. However, it will record the simple things, and it will give you example code that you can cut, paste, and modify to write the scripts that you need.

Some ideas for the future include:

  • Choosing from a list of simple tests based on the fields on the page and their current values.
  • "Threaded" recording, so that multiple sessions won't be recorded in the same file, overlapped with each other.
  • "Add script header" feature.
  • Supporting more configuration options from the control panel.
  • Other loggers.
  • JavaScript support.

Where to Get HTTP::Recorder

The latest released version of HTTP::Recorder is available at CPAN.

Contributions, Requests, and Bugs

Patches, feature requests, and problem reports are welcomed at http://rt.cpan.org.

You can subscribe to the mailing list for users and developers of HTTP::Recorder at http://lists.fsck.com/mailman/listinfo/http-recorder, or by sending email to http-recorder-request@lists.fsck.com with the subject "subscribe".

The mailing list archives can be found at http://lists.fsck.com/piper-mail/http-recorder.

See Also

WWW::Mechanize by Andy Lester.

HTTP::Proxy by Philippe "BooK" Bruhat.
