Beginner's Introduction to Perl 5.10, Part 2

A Beginner's Introduction to Perl 5.10 talked about the core elements of Perl: variables (scalars, arrays, and hashes), math operators and some basic flow control (the for statement). Now it's time to interact with the world. (A Beginner's Introduction to Regular Expressions with Perl 5.10 explores regular expressions, matching, and substitutions. A Beginner's Introduction to Perl Web Programming demonstrates how to write web programs.)

This installment discusses how to slice and dice strings, how to play with files and how to define your own functions. First, you need to understand one more core concept of the Perl language: conditions and comparisons.

Comparison operators

Like all good programming languages, Perl allows you to ask questions such as "Is this number greater than that number?" or "Are these two strings the same?" and do different things depending on the answer.

When you're dealing with numbers, Perl has four important operators: <, >, == and !=. These are the "less than," "greater than," "equal to" and "not equal to" operators. (You can also use <=, "less than or equal to," and >=, "greater than or equal to.")

You can use these operators along with one of Perl's conditional keywords, such as if and unless. Both of these keywords take a condition that Perl will test, and a block of code in curly brackets that Perl will run if the test works. These two words work just like their English equivalents -- an if test succeeds if the condition turns out to be true, and an unless test succeeds if the condition turns out to be false:

use 5.010;

if ($year_according_to_computer == 1900) {
    say "Y2K has doomed us all!  Everyone to the compound.";
}

unless ($bank_account > 0) {
    say "I'm broke!";
}

Be careful of the difference between = and ==! One equals sign means "assignment", two means "comparison for equality". This is a common, evil bug:

use 5.010;

if ($a = 5) {
    say "This works - but doesn't do what you want!";
}

You may be asking what that extra line of code at the start does. Just like the use feature ':5.10'; code from the previous article, this enables new features of Perl 5.10. (Why 5.010 and not 5.10? The version number is not a single decimal; there may eventually be a Perl 5.100, but probably not a Perl 5.1000. Just trust me on this for now.)

Back to the bug: instead of testing whether $a is equal to five, the if statement above makes $a equal to five and clobbers its old value. (A future article will show how to avoid this bug in running code.)
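To watch the clobbering happen, you can print a variable after the faulty test. This is only a sketch: the variable names $answer and $other and the starting value 42 are invented for illustration.

```perl
use 5.010;

my $answer = 42;    # hypothetical starting value

# Deliberate bug: a single = assigns 5 to $answer. The assigned
# value (5) counts as true, so the block always runs.
if ($answer = 5) {
    say "The test passed, but \$answer is now $answer.";
}

# The intended comparison uses == and leaves the variable alone.
my $other = 42;
if ($other == 5) {
    say "You won't see this line.";
}
say "\$other is still $other.";
```

The first test would pass for any nonzero value on the right-hand side, which is what makes this bug so easy to miss.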

Both if and unless can be followed by an else statement and code block, which executes if your test failed. You can also use elsif to chain together a bunch of if statements:

use 5.010;

if ($a == 5) {
    say "It's five!";
} elsif ($a == 6) {
    say "It's six!";
} else {
    say "It's something else.";
}

unless ($pie eq 'apple') {
    say "Ew, I don't like $pie flavored pie.";
} else {
    say "Apple!  My favorite!";
}

You don't always need an else condition, and sometimes the code to execute fits on a single line. In that case, you can use postfix conditional statements. The name may sound daunting, but you already understand them if you can read this sentence.

use 5.010;

say "I'm leaving work early!" if $day eq 'Friday';

say "I'm burning the 7 pm oil" unless $day eq 'Friday';

Sometimes this can make your code clearer.

while and until

Two slightly more complex keywords are while and until. They both take a condition and a block of code, just like if and unless, but they act like loops similar to for. Perl tests the condition, runs the block of code and runs it over and over again for as long as the condition is true (for a while loop) or false (for an until loop).

Try to guess what this code will do:

use 5.010;

my $count = 0;

while ($count != 3) {
   $count++;
   say "Counting up to $count...";
}

until ($count == 0) {
   $count--;
   say "Counting down to $count...";
}

Here's what you see when you run this program:

Counting up to 1...
Counting up to 2...
Counting up to 3...
Counting down to 2...
Counting down to 1...
Counting down to 0...

String comparisons

That's how you compare numbers. What about strings? The most common string comparison operator is eq, which tests for string equality -- that is, whether two strings have the same value.

Remember the pain of mixing up = and ==? You can also mix up == and eq. This is one of the few cases where it does matter whether Perl treats a value as a string or a number. Try this code:

use 5.010;

my $yes_no = 'no';
say "How positive!" if $yes_no == 'yes';

Why does this code think you said yes? Remember that Perl automatically converts strings to numbers whenever it's necessary; the == operator implies that you're using numbers, so Perl converts the value of $yes_no ("no") to the number 0, and "yes" to the number 0 as well. Because this equality test works (0 is equal to 0), the condition is true. Change the condition to $yes_no eq 'yes', and it'll do what it should.
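If you turn on warnings, Perl will usually tell you when == is handed something that isn't a number. The sketch below collects those warnings in an array so you can count them; $SIG{__WARN__} is a standard Perl hook that this series hasn't covered, and the variable names are made up for the example.

```perl
use 5.010;
use strict;
use warnings;

my @complaints;
# Collect warnings in an array instead of printing them to STDERR.
local $SIG{__WARN__} = sub { push @complaints, $_[0] };

my $yes_no = 'no';

my $numeric_match = ($yes_no == 'yes') ? 1 : 0;   # both sides become 0
my $string_match  = ($yes_no eq 'yes') ? 1 : 0;   # compares the text itself

say "== says: $numeric_match";
say "eq says: $string_match";
say "Perl complained ", scalar(@complaints), " time(s).";
```

The == test reports a match and the eq test does not, which is the same surprise described above, only this time Perl leaves a trail of warnings behind it.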

Things can work the other way, too. The number five is numerically equal to the string " 5 ", so comparing them with == works. When you compare five and " 5 " with eq, Perl will convert the number to the string "5" first, and then ask whether the two strings have the same value. Because they don't, the eq comparison fails. This code fragment will print Numeric equality!, but not String equality!:

use 5.010;

my $five = 5;

say "Numeric equality!" if $five == " 5 ";
say "String equality!"  if $five eq " 5 ";

More fun with strings

You'll often want to manipulate strings: Break them into smaller pieces, put them together and change their contents. Perl offers three functions that make string manipulation easy and fun: substr(), split(), and join().

If you want to retrieve part of a string (say, the first four characters or a 10-character chunk from the middle), use the substr() function. It takes either two or three parameters: the string you want to look at, the character position to start at (the first character is position 0) and the number of characters to retrieve. If you leave out the number of characters, you'll retrieve everything up to the end of the string.

my $greeting = "Welcome to Perl!\n";

print substr($greeting, 0, 7);     # "Welcome"
print substr($greeting, 7);        # " to Perl!\n"

A neat and often-overlooked thing about substr() is that you can use a negative character position. This will retrieve a substring that begins that many characters from the end of the string.

my $greeting = "Welcome to Perl!\n";

print substr($greeting, -6, 4);      # "Perl"

(Remember that inside double quotes, \n represents the single new-line character.)

You can also manipulate the string by using substr() to assign a new value to part of it. One useful trick is using a length of zero to insert characters into a string:

my $greeting = "Welcome to Java!\n";

substr($greeting, 11, 4) = 'Perl';    # $greeting is now "Welcome to Perl!\n";
substr($greeting, 7, 3)  = '';        #       ... "Welcome Perl!\n";
substr($greeting, 0, 0)  = 'Hello. '; #       ... "Hello. Welcome Perl!\n";
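Assigning to substr() is one way to change part of a string. Perl's substr() also accepts a fourth argument, the replacement text, which does the same job in a single call and hands back the text it replaced. A minimal sketch, reusing the greeting from above (the variable $old is made up for the example):

```perl
use 5.010;
use strict;
use warnings;

my $greeting = "Welcome to Java!\n";

# Four-argument form: string, position, length, replacement.
# It changes $greeting in place and returns the text it replaced.
my $old = substr($greeting, 11, 4, 'Perl');

print $greeting;         # Welcome to Perl!
say "Replaced: $old";    # Replaced: Java
```

Which form you use is a matter of taste; the four-argument form is handy when you also want to keep the old text.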

split() breaks apart a string and returns a list of the pieces. split() generally takes two parameters: a regular expression to split the string with and the string you want to split. (The next article will discuss regular expressions in more detail; for the moment, all you need to know is that this regular expression represents a single space character: / /.) The characters you split on won't show up in any of the list elements.

my $greeting = "Hello. Welcome Perl!\n";
my @words    = split(/ /, $greeting);   # Three items: "Hello.", "Welcome", "Perl!\n"

You can also specify a third parameter: the maximum number of items to put in your list. The splitting will stop as soon as your list contains that many items:

my $greeting = "Hello. Welcome Perl!\n";
my @words    = split(/ /, $greeting, 2);   # Two items: "Hello.", "Welcome Perl!\n";

Of course, what you can split, you can also join(). The join() function takes a list of strings and attaches them together with a specified string between each element, which may be an empty string:

my @words         = ("Hello.", "Welcome", "Perl!\n");
my $greeting      = join(' ', @words);       # "Hello. Welcome Perl!\n";
my $andy_greeting = join(' and ', @words);   # "Hello. and Welcome and Perl!\n";
my $jam_greeting  = join('', @words);        # "Hello.WelcomePerl!\n";
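split() and join() are natural opposites, so a record taken apart with one can be rebuilt with the other. Here is a small sketch using an invented colon-separated record; the field layout and variable names are assumptions for illustration only.

```perl
use 5.010;
use strict;
use warnings;

# A made-up record: fruit, color, and quantity, separated by colons.
my $record = "apple:red:3";

my @fields         = split(/:/, $record);      # ("apple", "red", "3")
my ($fruit, $rest) = split(/:/, $record, 2);   # "apple" and "red:3"

my $rebuilt = join(':', @fields);              # same text as $record

say "Number of fields: ", scalar(@fields);     # Number of fields: 3
say "Fruit: $fruit, everything else: $rest";   # Fruit: apple, everything else: red:3
say "Round trip worked!" if $rebuilt eq $record;
```

The three-argument split() is useful when only the first field has a fixed meaning and the remainder may itself contain the separator.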

Filehandles

That's enough about strings. It's time to consider files -- after all, what good is string manipulation if you can't do it where it counts?

To read from or write to a file, you have to open it. When you open a file, Perl asks the operating system if the file is accessible -- does the file exist if you're trying to read it (or can it be created if you're trying to create a new file), and do you have the necessary file permissions to do what you want? If you're allowed to use the file, the operating system will prepare it for you, and Perl will give you a filehandle.

Ask Perl to create a filehandle for you by using the open() function, which takes two or three arguments: the filehandle you want to create, the mode of the file, and the file you want to work with. (With only two arguments, you leave out the mode, and Perl opens the file for reading.) First, we'll concentrate on reading files. The following statement opens the file log.txt using the filehandle $logfile:

open my $logfile, 'log.txt';

Opening a file involves several behind-the-scenes tasks that Perl and the operating system undertake together, such as checking that the file you want to open actually exists (or creating it if you're trying to create a new file) and making sure you're allowed to manipulate the file (do you have the necessary file permissions, for instance). Perl will do all of this for you, so in general you don't need to worry about it.

Once you've opened a file to read, you can retrieve lines from it by using the <> construct, also known as readline. Inside the angle brackets, place your filehandle. What you get from this depends on what you want to get: in a scalar context (a more technical way of saying "if you're assigning it to a scalar"), you retrieve the next line from the file, but if you're looking for a list, you get a list of all the remaining lines in the file.

You can, of course, close a filehandle that you've opened. You don't always have to do this, because Perl is clever enough to close a filehandle when your program ends, when you try to reuse an existing filehandle, or when the lexical variable containing the filehandle goes out of scope.
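The difference between scalar and list context is easier to see in action. So that this sketch runs without a real log.txt, it opens a filehandle on a string held in memory (open() accepts a reference to a scalar in place of a filename); the contents are invented.

```perl
use 5.010;
use strict;
use warnings;

# Stand-in for a real file: opening a reference to a string gives a
# filehandle that reads from memory. The text is made up.
my $text = "Report\nfirst entry\nsecond entry\n";
open my $logfile, '<', \$text or die "Can't open in-memory file: $!";

my $title = <$logfile>;    # scalar context: just the next line
my @rest  = <$logfile>;    # list context: every remaining line

close $logfile;

print "Title: $title";                      # Title: Report
say "Remaining lines: ", scalar(@rest);     # Remaining lines: 2
```

Note that the list-context read picks up where the scalar-context read left off; the filehandle remembers its position in the file.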

Here's a simple program that will display the contents of the file log.txt, and assumes that the first line of the file is its title:

open my $logfile, 'log.txt' or die "I couldn't get at log.txt: $!";

my $title = <$logfile>;
print "Report Title: $title";

print while <$logfile>;
close $logfile;

That code may seem pretty dense, but it combines ideas you've seen before. The while operator loops over every line of the file, one line at a time, putting each line into the Perl pronoun $_. (A pronoun? Yes -- think of it as it.) For each line read, Perl prints the line. Now the pronoun should make sense. While you read it from the file, print it.

Why not use say? Each line in the file ends with a newline -- that's how Perl knows that it's a line. There's no need to add an additional newline, so say would double-space the output.
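If the pronoun feels too terse, you can name the line yourself. This sketch reads into a variable called $line, strips the newline with chomp (a built-in this article hasn't introduced) and lets say put it back. It reads from an in-memory string rather than a real file, and the data is made up.

```perl
use 5.010;
use strict;
use warnings;

my $text = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!";

my @seen;
while (my $line = <$fh>) {
    chomp $line;           # remove the trailing newline
    push @seen, $line;     # remember the bare text
    say "Read: $line";     # say supplies the newline again
}
close $fh;
```

Removing the newline first is a common habit when you plan to compare or split the line, because a stray "\n" at the end makes string comparisons fail.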

Writing files

You also use open() when you are writing to a file. There are two ways to open a file for writing: overwrite and append. When you open a file in overwrite mode, you erase whatever it previously contained. In append mode, you attach your new data to the end of the existing file without erasing anything that was already there.

To indicate that you want a filehandle for writing, use a single > character as the mode passed to open. This opens the file in overwrite mode. To open it in append mode, use two > characters.

open my $overwrite, '>', 'overwrite.txt' or die "error trying to overwrite: $!";
# Wave goodbye to the original contents.

open my $append, '>>', 'append.txt' or die "error trying to append: $!";
# Original contents still there; add to the end of the file

Once your filehandle is open, use the humble print or say operator to write to it. Specify the filehandle you want to write to and a list of values you want to write:

use 5.010;

say $overwrite 'This is the new content';
print $append "We're adding to the end here.\n", "And here too.\n";
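Here is a sketch that shows the two modes side by side. To avoid touching any file you care about, it asks the core File::Temp module for a scratch file; the helper count_lines and all the variable names are hypothetical.

```perl
use 5.010;
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() creates a scratch file and removes it when the program ends.
my ($scratch_fh, $filename) = tempfile(UNLINK => 1);
close $scratch_fh;

# Count the lines currently in a file.
sub count_lines {
    my ($file) = @_;
    open my $in, '<', $file or die "Can't read '$file': $!";
    my @lines = <$in>;
    close $in;
    return scalar @lines;
}

open my $first, '>', $filename or die "Can't overwrite '$filename': $!";
say $first 'line one';
close $first;

open my $append, '>>', $filename or die "Can't append to '$filename': $!";
say $append 'line two';
close $append;
my $after_append = count_lines($filename);

open my $overwrite, '>', $filename or die "Can't overwrite '$filename': $!";
say $overwrite 'a fresh start';
close $overwrite;
my $after_overwrite = count_lines($filename);

say "Lines after appending: $after_append";        # Lines after appending: 2
say "Lines after overwriting: $after_overwrite";   # Lines after overwriting: 1
```

Each filehandle is closed before the file is read back, so that everything written has actually reached the file.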

Live free or die!

Most of these open() statements include or die "some sort of message". This is because we live in an imperfect world, where programs don't always behave exactly the way we want them to. It's always possible for an open() call to fail; maybe you're trying to write to a file that you're not allowed to write, or you're trying to read from a file that doesn't exist. In Perl, you can guard against these problems by using or and and.

A series of statements separated by or will continue until you hit one that works, that is, one that returns a true value. This line of code will either succeed at opening $output in overwrite mode, or cause Perl to quit:

open my $output, '>', $outfile or die "Can't write to '$outfile': $!";

The die statement ends your program with an error message. The special variable $! contains Perl's explanation of the error. In this case, you might see something like this if you're not allowed to write to the file. Note that you get both the actual error message ("Permission denied") and the line where it happened:

Can't write to 'a2-die.txt': Permission denied at ./a2-die.pl line 1.

Defensive programming like this is useful for making your programs more error-resistant -- you don't want to write to a file that you haven't successfully opened! (Putting single-quotes around the filename may help you see any unexpected whitespace in the filename. You'll slap your forehead when it happens to you.)
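To see the message that die produces without ending the program, you can trap it. The sketch below wraps the open() in eval { ... }, a Perl construct this article doesn't cover, and tries to read a path that is assumed not to exist on your system.

```perl
use 5.010;
use strict;
use warnings;

my $missing = '/no/such/directory/report.txt';   # hypothetical path

# eval { ... } traps the die so the example can keep running and show
# the message. A real script would usually just let die end the program.
my $ok = eval {
    open my $fh, '<', $missing or die "Can't read '$missing': $!";
    1;
};
my $error = $@;    # the trapped message, if any

say "The open succeeded." if $ok;
print "Trapped: $error"   unless $ok;
```

The trapped text includes your own message, the operating system's explanation from $!, and the file and line number that die adds automatically.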

Here's an example: As part of your job, you write a program that records its results in a file called vitalreport.txt. You use the following code:

open my $vital, '>', 'vitalreport.txt';

If this open() call fails (for instance, vitalreport.txt is owned by another user who hasn't given you write permission), you'll never know it until someone looks at the file afterward and wonders why the vital report wasn't written. (Just imagine the joy if that "someone" is your boss, the day before your annual performance review.) When you use or die, you avoid all this:

open my $vital, '>', 'vitalreport.txt' or die "Can't write vital report: $!";

Instead of wondering whether your program wrote your vital report, you'll immediately have an error message that tells you both what went wrong and on what line of your program the error occurred.

You can use or for more than just testing file operations:

use 5.010;
($pie eq 'apple') or ($pie eq 'cherry') or ($pie eq 'blueberry')
        or say 'But I wanted apple, cherry, or blueberry!';

In this sequence, if you have an appropriate pie, Perl skips the rest of the chain. Once one statement works, the rest are ignored. The and operator does the opposite: It evaluates your chain of statements, but stops when one of them doesn't work.

open my $log, 'log.file' and say 'Logfile is open!';
say 'Logfile is open!' if open my $log, 'log.file';

Either of these statements will show you the words Logfile is open! only if the open() succeeds -- do you see why?

Again, just because there's more than one way to execute code conditionally doesn't mean you have to use every way in a single program or the most clever or creative way. You have plenty of options. Consider using the most readable one for the situation.
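If you want to watch the chains stop early, make each link announce itself. In this sketch, check is a hypothetical sub that records its name and returns whatever result you hand it.

```perl
use 5.010;
use strict;
use warnings;

my @called;

# Record which checks actually ran, then return the given result.
sub check {
    my ($name, $result) = @_;
    push @called, $name;
    return $result;
}

# An or chain stops at the first true value.
check('first', 0) or check('second', 1) or check('third', 1);
my @or_calls = @called;
say "or chain ran: @or_calls";      # or chain ran: first second

# An and chain stops at the first false value.
@called = ();
check('first', 0) and check('second', 1) and check('third', 1);
my @and_calls = @called;
say "and chain ran: @and_calls";    # and chain ran: first
```

In both chains the third check never runs, which is why it is safe to put die at the end of an or chain.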

Subs

So far, the example Perl programs have been a bunch of statements in series. This is okay if you're writing very small programs, but as your needs grow, you'll find it limiting. This is why most modern programming languages allow you to define your own functions; in Perl, we call them subs.

A sub, declared with the sub keyword, adds a new function to your program's capabilities. When you want to use this new function, you call it by name. For instance, here's a short definition of a sub called boo:

use 5.010;

sub boo {
    say 'Boo!';
}

boo();   # Eek!

Subs are useful because they allow you to break your program into small, reusable chunks. If you need to analyze a string in four different places in your program, it's much easier to write one analyze_string sub and call it four times. This way, when you make an improvement to your string-analysis routine, you'll only need to do it in one place, instead of four.

In the same way that Perl's built-in functions can take parameters and can return values, your subs can, too. Whenever you call a sub, any parameters you pass to it appear in the special array @_. You can also return a single value or a list by using the return keyword.

use 5.010;

sub multiply {
    my (@ops) = @_;
    return $ops[0] * $ops[1];
}

for my $i (1 .. 10) {
    say "$i squared is ", multiply($i, $i);
}

There's an interesting benefit from using the my keyword in multiply: it indicates that the variables are private to that sub, so that any existing value for the @ops array used elsewhere in our program won't get overwritten. This means that you'll evade a whole class of hard-to-trace bugs in your programs. You don't have to use my, but you also don't have to avoid smashing your thumb when you're hammering nails into a board. They're both just good ideas.

You can also assign to multiple lexical variables (declared with my) in a single statement. You can change the code within multiply to something like this without having to modify any other code:

sub multiply {
    my ($left, $right) = @_;
    return $left * $right;
}

If you don't expressly use the return statement, the sub returns the result of the last statement. This implicit return value can sometimes be useful, but it does reduce your program's readability. Remember that you'll read your code many more times than you write it!
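The three styles of returning a value look like this side by side. The subs area, perimeter and area_and_perimeter are invented for this sketch.

```perl
use 5.010;
use strict;
use warnings;

sub area {
    my ($width, $height) = @_;
    return $width * $height;     # explicit return of a single value
}

sub perimeter {
    my ($width, $height) = @_;
    2 * ($width + $height);      # implicit return of the last expression
}

sub area_and_perimeter {
    my ($width, $height) = @_;
    # Explicit return of a list of two values.
    return ( area($width, $height), perimeter($width, $height) );
}

my ($area, $perimeter) = area_and_perimeter(3, 5);
say "Area: $area";              # Area: 15
say "Perimeter: $perimeter";    # Perimeter: 16
```

Both area and perimeter work, but only one of them tells the reader at a glance what comes back.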

Putting it all together

The previous article demonstrated a simple interest calculator. You can make it more interesting by writing the interest table to a file instead of to the screen. Another change is to break the code into subs to make it easier to read and maintain.

#! perl

# compound_interest_file.pl - the miracle of compound interest, part 2

use 5.010;

use strict;
use warnings;

# First, we'll set up the variables we want to use.
my $outfile   = 'interest.txt';    # This is the filename of our report.
my $nest_egg  = 10000;             # $nest_egg is our starting amount
my $year      = 2008;              # This is the starting year for our table.
my $duration  = 10;                # How many years are we saving up?
my $apr       = 9.5;               # This is our annual percentage rate.

my $report_fh = open_report( $outfile );
print_headers(   $report_fh );
interest_report( $report_fh, $nest_egg, $year, $duration, $apr );
report_footer(   $report_fh, $nest_egg, $duration, $apr );

sub open_report {
    my ($outfile) = @_;
    open my $report, '>', $outfile or die "Can't open '$outfile': $!";
    return $report;
}

sub print_headers {
    my ($report_fh) = @_;

    # Print the headers for our report.
    say $report_fh "Year\tBalance\tInterest\tNew balance";
}

sub calculate_interest {
    # Given a nest egg and an APR, how much interest do we collect?
    my ( $nest_egg, $apr ) = @_;

    return int( ( $apr / 100 ) * $nest_egg * 100 ) / 100;
}

sub interest_report {
    # Get our parameters.  Note that these variables won't clobber the
    # global variables with the same name.
    my ( $report_fh, $nest_egg, $year, $duration, $apr ) = @_;

    # Calculate interest for each year.
    for my $i ( 1 .. $duration ) {
        my $interest = calculate_interest( $nest_egg, $apr );
        my $line     =
            join "\t", $year + $i, $nest_egg, $interest, $nest_egg + $interest;

        say $report_fh $line;

        $nest_egg += $interest;
    }
}

sub report_footer {
    my ($report_fh, $nest_egg, $duration, $apr) = @_;

    say $report_fh "\n Our original assumptions:";
    say $report_fh "   Nest egg: $nest_egg";
    say $report_fh "   Number of years: $duration";
    say $report_fh "   Interest rate: $apr";
}

Notice how much clearer the program logic becomes when you break it down into subs. One nice quality of a program written as small, well-named subs is that it almost becomes self-documenting. Consider these four lines:

my $report_fh = open_report( $outfile );
print_headers(   $report_fh );
interest_report( $report_fh, $nest_egg, $year, $duration, $apr );
report_footer(   $report_fh, $nest_egg, $duration, $apr );

Code like this is invaluable when you come back to it six months later and need to figure out what it does -- would you rather spend your time reading the entire program trying to figure it out or read four lines that tell you the program 1) opens a report file, 2) prints some headers, 3) generates an interest report, and 4) prints a report footer?

Play around!

This article has explored files (filehandles, open(), close(), and <>), string manipulation (substr(), split() and join()) and subs. Here's a pair of exercises -- again, one simple and one complex:

  • You have a file called dictionary.txt that contains dictionary definitions, one per line, in the format "word space definition". (Here's a sample.) Write a program that will look up a word from the command line. (Hints: @ARGV is a special array that contains your command line arguments and you'll need to use the three-argument form of split().) Try to enhance it so that your dictionary can also contain words with multiple definitions in the format "word space definition:alternate definition:alternate definition, etc...".
  • Write an analyzer for your Apache logs. You can find a brief description of the common log format at http://www.w3.org/Daemon/User/Config/Logging.html. Your analyzer should count the total number of requests for each URL, the total number of results for each status code and the total number of bytes output.

Happy programming!

A Beginner's Introduction to Perl 5.10

First, a Little Sales Pitch

Editor's note: this series is based on Doug Sheppard's Beginner's Introduction to Perl. A Beginner's Introduction to Files and Strings with Perl 5.10 explains how to use files and strings, and A Beginner's Introduction to Regular Expressions with Perl 5.10 explores regular expressions, matching, and substitutions. A Beginner's Introduction to Perl Web Programming demonstrates how to write web programs.

Welcome to Perl.

Perl is the Swiss Army chainsaw of programming languages: powerful and adaptable. It was first developed by Larry Wall, a linguist working as a systems administrator for NASA in the late 1980s, as a way to make report processing easier. Since then, it has moved into several other areas: automating system administration, acting as glue between different computer systems, web programming, bioinformatics, data munging, and even application development.

Why did Perl become so popular when the Web came along? Two reasons: First, most of what is being done on the Web happens with text, and is best done with a language that's designed for text processing. More importantly, Perl was appreciably better than the alternatives at the time when people needed something to use. C is complex and can produce security problems (especially with untrusted data), Tcl can be awkward, and Python didn't really have a foothold.

It also didn't hurt that Perl is a friendly language. It plays well with your personal programming style. The Perl slogan is "There's more than one way to do it," and that lends itself well to large and small problems alike. Even more so, Perl is very portable and widespread -- it's available pre-installed almost everywhere -- and of course there are thousands of freely-distributable libraries available from the CPAN.

In this first part of our series, you'll learn a few basics about Perl and see a small sample program.

A Word About Operating Systems

This series assumes that you're using a Unix or Unix-like operating system (Mac OS X and Cygwin qualify) and that you have the perl binary available at /usr/bin/perl. It's OK if you're running Perl on Windows through ActivePerl or Strawberry Perl; most Perl code is platform-independent.

Your First Perl Program

Save this program as a file called first.pl:

use feature ':5.10';
say "Hi there!";

(The traditional first program says Hello world!, but I'm an iconoclast.)

Run the program. From a command line, go to the directory with this file and type perl first.pl. You should see:

Hi there!

Friendly, isn't it?

I'm sure you can guess what say does. What about the use feature ':5.10'; line? For now, all you need to know is that it allows you to use nice new features found in Perl 5.10. This is a very good thing.

Functions and Statements

Perl has a rich library of built-in functions. They're the verbs of Perl, the commands that the interpreter runs. You can see a list of all the built-in functions in the perlfunc man page (perldoc perlfunc, from the command line). Almost all functions can take a list of comma-separated parameters.

The print function is one of the most frequently used parts of Perl. You use it to display things on the screen or to send information to a file. It takes a list of things to output as its parameters.

print "This is a single statement.";
print "Look, ", "a ", "list!";

A Perl program consists of statements, each of which ends with a semicolon. Statements don't need to be on separate lines; there may be multiple statements on one line. You can also split a single statement across multiple lines.

print "This is "; print "two statements.\n";
print "But this ", "is only one statement.\n";

Wait a minute though. What's the difference between say and print? What's this \n in the print statements?

The say function behaves just like the print function, except that it appends a newline at the end of its arguments. It prints all of its arguments, and then a newline character. Always. No exceptions. print, on the other hand, only prints what you see explicitly in these examples. If you want a newline, you have to add it yourself with the special character escape sequence \n.

use feature ':5.10';

say "This is a single statement.";
say "Look, ", "a ", "list!";

Why do both exist? Why would you use one over the other? Usually, most "display something" statements need the newline. It's common enough that say is a good default choice. Occasionally you need a little bit more control over your output, so print is the option.

Note that say is two characters shorter than print. This is an important design principle for Perl -- common things should be easy and simple.

Numbers, Strings, and Quotes

There are two basic data types in Perl: numbers and strings.

Numbers are easy; we've all dealt with them. The only thing you need to know is that you never insert commas or spaces into numbers in Perl. Always write 10000, not 10,000 or 10 000.

Strings are a bit more complex. A string is a collection of characters in either single or double quotes:

'This is a test.'
"Hi there!\n"

The difference between single quotes and double quotes is that single quotes mean that their contents should be taken literally, while double quotes mean that their contents should be interpreted. Remember the character sequence \n? It represents a newline character when it appears in a string with double quotes, but is literally the two characters backslash and n when it appears in single quotes.

use feature ':5.10';
say "This string\nshows up on two lines.";
say 'This string \n shows up on only one.';

(Two other useful backslash sequences are \t to insert a tab character, and \\ to insert a backslash into a double-quoted string.)
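One way to convince yourself of the difference is to count characters with the built-in length function. A small sketch; the strings are arbitrary.

```perl
use 5.010;
use strict;
use warnings;

my $double = "one\ttwo\\three";   # \t is one tab, \\ is one backslash
my $single = 'one\ttwo';          # \t stays as a backslash plus the letter t

say $double;
say $single;
say length($double);   # 13
say length($single);   # 8
```

In the double-quoted string each escape sequence collapses to a single character, while the single-quoted string keeps the backslash and the t as two separate characters.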

Variables

If functions are Perl's verbs, then variables are its nouns. Perl has three types of variables: scalars, arrays, and hashes. Think of them as things, lists, and dictionaries respectively. In Perl, all variable names consist of a punctuation character, a letter or underscore, and zero or more alphanumeric characters or underscores.

Scalars are single things. This might be a number or a string. The name of a scalar begins with a dollar sign, such as $i or $abacus. Assign a value to a scalar by telling Perl what it equals:

my $i                = 5;
my $pie_flavor       = 'apple';
my $constitution1776 = "We the People, etc.";

You don't need to specify whether a scalar is a number or a string. It doesn't matter, because when Perl needs to treat a scalar as a string, it does; when it needs to treat it as a number, it does. The conversion happens automatically. (This is different from many other languages, where strings and numbers are two separate data types.)

If you use a double-quoted string, Perl will insert the value of any scalar variables you name in the string. This is often useful to fill in strings on the fly:

use feature ':5.10';
my $apple_count  = 5;
my $count_report = "There are $apple_count apples.";
say "The report is: $count_report";

The final output from this code is The report is: There are 5 apples..

You can manipulate numbers in Perl with the usual mathematical operations: addition, multiplication, division, and subtraction. (The multiplication and division operators in Perl use the * and / symbols, by the way.)

my $a = 5;
my $b = $a + 10;       # $b is now equal to 15.
my $c = $b * 10;       # $c is now equal to 150.
$a    = $a - 1;        # $a is now 4, and algebra teachers are cringing.

That's all well and good, but what's this strange my, and why does it appear with some assignments and not others? The my operator tells Perl that you're declaring a new variable. That is, you promise Perl that you deliberately want to use a scalar, array, or hash of a specific name in your program. This is important for two reasons. First, it helps Perl help you protect against typos; it's embarrassing to discover that you've accidentally mistyped a variable name and spent an hour looking for a bug. Second, it helps you write larger programs, where variables used in one part of the code don't accidentally affect variables used elsewhere.

You can also use special operators like ++, --, +=, -=, /= and *=. These manipulate a scalar's value without needing two elements in an equation. Some people like them, some don't. I like the fact that they can make code clearer.

my $a = 5;
$a++;        # $a is now 6; we added 1 to it.
$a += 10;    # Now it's 16; we added 10.
$a /= 2;     # And divided it by 2, so it's 8.

Strings in Perl don't have quite as much flexibility. About the only basic operator that you can use on strings is concatenation, which is a ten dollar way of saying "put together." The concatenation operator is the period. Concatenation and addition are two different things:

my $a = "8";    # Note the quotes.  $a is a string.
my $b = $a + "1";   # "1" is a string too.
my $c = $a . "1";   # But $b and $c have different values!

Remember that Perl converts strings to numbers transparently whenever necessary, so to get the value of $b, the Perl interpreter converted the two strings "8" and "1" to numbers, then added them. The value of $b is the number 9. However, $c used concatenation, so its value is the string "81".

Remember, the plus sign adds numbers and the period puts strings together. If you add things that aren't numbers, Perl will try its best to do what you've told it to do, and will convert those non-numbers to numbers to the best of its ability.
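What does "the best of its ability" mean in practice? Perl uses any number at the start of the string and treats a string with no leading number as zero. The sketch below switches off the warning Perl would otherwise give about this; the variable names and strings are made up.

```perl
use 5.010;
use strict;
use warnings;
no warnings 'numeric';   # this demo converts non-numbers on purpose

my $label = "3 apples";
my $word  = "apples";

my $sum_with_digits = $label + 2;   # the leading "3" is used
my $sum_without     = $word  + 2;   # no leading digits, so 0 + 2
my $joined          = $label . 2;   # the period still concatenates

say $sum_with_digits;   # 5
say $sum_without;       # 2
say $joined;            # 3 apples2
```

In real programs it is usually better to leave the warning on, because adding "apples" to a number is rarely what you meant.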

Arrays are lists of scalars. Array names begin with @. You define arrays by listing their contents in parentheses, separated by commas:

my @lotto_numbers = (1, 2, 3, 4, 5, 6);  # Hey, it could happen.
my @months        = ("July", "August", "September");

You retrieve the contents of an array by an index, sort of like "Hey, give me the first month of the year." Indexes in Perl start from zero. (Why not 1? Because. It's a computer thing.) To retrieve the elements of an array, you replace the @ sign with a $ sign, and follow that with the index position of the element you want. (It begins with a dollar sign because you're getting a scalar value.) You can also modify it in place, just like any other scalar.

use feature ':5.10';

my @months = ("July", "August", "September");
say $months[0];         # This prints "July".
$months[2] = "Smarch";  # We just renamed September!

If an array value doesn't exist, Perl will create it for you when you assign to it.

my @winter_months = ("December", "January");
$winter_months[2] = "February";
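Assigning well past the end of an array works too. In this sketch (the index 4 and the value are arbitrary choices for illustration), Perl extends the array and leaves the skipped positions undefined:

```perl
use strict;
use warnings;
use feature ':5.10';

my @winter_months = ("December", "January");
$winter_months[4] = "April";     # skips indexes 2 and 3

my $count = @winter_months;      # the array now has five slots
say $count;                      # prints 5

# The slots in between exist but hold no value yet.
my $gap = defined( $winter_months[2] ) ? "defined" : "undefined";
say $gap;                        # prints undefined
```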

Arrays always return their contents in the same order; if you go through @months from beginning to end, no matter how many times you do it, you'll get back July, August, and September in that order. If you want to find the number of elements of an array, assign the array to a scalar.

use feature ':5.10';
my @months      = ("July", "August", "September");
my $month_count = @months;
say $month_count;  # This prints 3.

my @autumn_months; # no elements
my $autumn_count = @autumn_months;
say $autumn_count; # this prints 0

Some programming languages call hashes "dictionaries". That's what they are: a term and a definition. More precisely, they contain keys and values. Each key in a hash has one and only one corresponding value. The name of a hash begins with a percent sign, like %parents. You define hashes by comma-separated pairs of key and value, like so:

my %days_in_month = ( "July" => 31, "August" => 31, "September" => 30 );

You can fetch any value from a hash by referring to $hashname{key}, or modify it in place just like any other scalar.

say $days_in_month{September}; # 30, of course.
$days_in_month{February} = 29; # It's a leap year.

To see what keys are in a hash, use the keys function with the name of the hash. This returns a list containing all of the keys in the hash. The list isn't always in the same order, though; while you can count on @months always to return July, August, September in that order, keys %days_in_month might return them in any order whatsoever.

my @month_list = keys %days_in_month;
# @month_list might now be ('July', 'September', 'August')!
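If you need the keys in a predictable order, one common approach is to sort them. This sketch reuses the three-month hash from above; the names @alphabetical and @by_days are just for illustration.

```perl
use strict;
use warnings;
use feature ':5.10';

my %days_in_month = ( "July" => 31, "August" => 31, "September" => 30 );

# Alphabetical order by key.
my @alphabetical = sort keys %days_in_month;
say "@alphabetical";    # prints August July September

# Order by value (number of days), breaking ties by name.
my @by_days = sort {
    $days_in_month{$a} <=> $days_in_month{$b} or $a cmp $b
} keys %days_in_month;
say "@by_days";         # prints September August July
```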

The three types of variables have three separate namespaces. That means that $abacus and @abacus are two different variables, and $abacus[0] (the first element of @abacus) is not the same as $abacus{0} (the value in %abacus that has the key 0).
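A short sketch makes the separation concrete. All three variables below share the name abacus, yet each holds its own data (the values are made up for the example):

```perl
use strict;
use warnings;
use feature ':5.10';

my $abacus = "a counting frame";       # a scalar
my @abacus = ("bead", "rod");          # an array
my %abacus = ( 0 => "zero beads" );    # a hash

say $abacus;       # prints a counting frame
say $abacus[0];    # prints bead        (first element of @abacus)
say $abacus{0};    # prints zero beads  (value for key 0 in %abacus)
```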

Comments

Some of the code samples from the previous section contained code comments. These are useful for explaining what a particular piece of code does, and vital for any piece of code you plan to modify, enhance, fix, or just look at again. (That is to say, comments are important.)

Anything in a line of Perl code that follows a # sign is a comment (unless that # sign appears in a string).

use feature ':5.10';
say "Hello world!";  # That's more like it.
# This entire line is a comment.

Loops

Almost every program ever written uses a loop of some kind. Loops allow you to run a particular piece of code over and over again. This is part of a general concept in programming called flow control.

Perl has several different constructs that are useful for flow control, the most basic of which is for. When you use a for loop, you specify a variable to use as the loop index, and a list of values to loop over. Inside a pair of curly brackets, you put any code you want to run during the loop:

use feature ':5.10';

for my $i (1, 2, 3, 4, 5) {
     say $i;
}

This loop prints the numbers 1 through 5, each on a separate line. (It's not very useful; you might think "Why not just write say 1, 2, 3, 4, 5;?". The difference is that say adds only one newline, at the end of its list of arguments, so that version prints 12345 on a single line.)

A handy shortcut for defining loop values is the range operator .., which specifies a range of numbers. You can write (1, 2, 3, 4, 5) as (1 .. 5) instead. You can also use arrays and scalars in your loop list. Try this code and see what happens:

use feature ':5.10';

my @one_to_ten = (1 .. 10);
my $top_limit  = 25;

for my $i (@one_to_ten, 15, 20 .. $top_limit) {
    say $i;
}

Of course, you could again write say @one_to_ten, 15, 20 .. $top_limit; instead, but that would print all of the values run together on a single line.
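To see the difference between one say with a list and a loop of says, try a sketch like this one. The join function, which glues a list together with a separator, is an extra that this tutorial has not covered; it is shown only as one possible way to get one value per line from a single say.

```perl
use strict;
use warnings;
use feature ':5.10';

say 1 .. 5;                 # one say, one newline: prints 12345

for my $i (1 .. 5) {        # five says, five newlines
    say $i;
}

# join puts "\n" between the items, so a single say prints them
# on separate lines.
my $one_per_line = join "\n", 1 .. 5;
say $one_per_line;
```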

The items in your loop list don't have to be numbers; you can use strings just as easily. If the hash %month_has contains names of months and the number of days in each month, you can use the keys function to step through them.

use feature ':5.10';

for my $i (keys %month_has) {
    say "$i has $month_has{$i} days.";
}

for my $marx ('Groucho', 'Harpo', 'Zeppo', 'Karl') {
    say "$marx is my favorite Marx brother.";
}

The Miracle of Compound Interest

You now know enough about Perl -- variables, print/say, and for() -- to write a small, useful program. Everyone loves money, so the first sample program is a compound-interest calculator. It will print a (somewhat) nicely formatted table showing the value of an investment over a number of years. (You can see the program at compound_interest.pl)

The single most complex line in the program is:

my $interest = int( ( $apr / 100 ) * $nest_egg * 100 ) / 100;

$apr / 100 is the interest rate, and ($apr / 100) * $nest_egg is the amount of interest earned in one year. This line uses the int() function, which returns the integer value of a scalar (its value after stripping off any fractional part). We use int() here because when you multiply, for example, 10925 by 9.25%, the result is 1010.5625, which we must cut down to whole cents: 1010.56. To do this, we multiply by 100, yielding 101056.25, use int() to throw away the leftover fraction, yielding 101056, and then divide by 100 again, so that the final result is 1010.56. Note that int() truncates rather than rounds, so any fraction of a cent is simply dropped. Try stepping through this statement yourself to see just how we end up with the result in whole cents.
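The following sketch walks through that statement with the numbers from the paragraph above. The starting values and the three-year loop are assumptions chosen for illustration; the actual compound_interest.pl may be organized differently.

```perl
use strict;
use warnings;
use feature ':5.10';

# Assumed sample values, taken from the worked example in the text.
my $nest_egg = 10925;
my $apr      = 9.25;

# One year's interest, truncated to whole cents.
my $first_interest = int( ( $apr / 100 ) * $nest_egg * 100 ) / 100;
say $first_interest;    # prints 1010.56

# Repeat the calculation for a few years, adding the interest each time.
for my $year ( 1 .. 3 ) {
    my $interest = int( ( $apr / 100 ) * $nest_egg * 100 ) / 100;
    $nest_egg += $interest;
    printf "%4d  %10.2f  %10.2f\n", $year, $interest, $nest_egg;
}
```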

Play Around!

At this point you have some basic knowledge of Perl syntax and a few simple toys to play with. Try writing some simple programs with them. Here are two suggestions, one simple and the other a little more complex:

  • A word frequency counter. How often does each word show up in an array of words? Print out a report. (Hint: Use a hash to count the number of appearances of each word.)
  • Given a month and the day of the week that's the first of that month, print a calendar for the month.

Advanced Subroutine Techniques

In "Making Sense of Subroutines," I wrote about what subroutines are and why you want to use them. This article expands on that topic, discussing some of the more common techniques for subroutines to make them even more useful.

Several of these techniques are advanced, but you can use each one by itself without understanding the others. Furthermore, not every technique is useful in every situation. As with all techniques, consider these as tools in your toolbox, not things you have to do every time you open your editor.

Named Arguments

Positional Arguments

Subroutines, by default, use "positional arguments." This means that the arguments to the subroutine must occur in a specific order. For subroutines with a small argument list (three or fewer items), this isn't a problem.

sub pretty_print {
    my ($filename, $text, $text_width) = @_;

    # Format $text to $text_width somehow.

    open my $fh, '>', $filename
        or die "Cannot open '$filename' for writing: $!\n";

    print $fh $text;

    close $fh;

    return;
}

pretty_print( 'filename', $long_text, 80 );

The Problem

However, once everyone starts using your subroutine, it starts expanding what it can do. Argument lists tend to expand, making it harder and harder to remember the order of arguments.

sub pretty_print {
    my (
        $filename, $text, $text_width, $justification, $indent,
        $sentence_lead
    ) = @_;

    # Format $text to $text_width somehow. If $justification is set, justify
    # appropriately. If $indent is set, indent the first line by one tab. If
    # $sentence_lead is set, make sure all sentences start with two spaces.

    open my $fh, '>', $filename
        or die "Cannot open '$filename' for writing: $!\n";

    print $fh $text;

    close $fh;

    return;
}

pretty_print( 'filename', $long_text, 80, 'full', undef, 1 );

Quick--what does that 1 at the end of the subroutine mean? If it took you more than five seconds to figure it out, then the subroutine call is unmaintainable. Now, imagine that the subroutine isn't right there, isn't documented or commented, and was written by someone who is quitting next week.

The Solution

The most maintainable solution is to use "named arguments." In Perl 5, the best way to implement this is by using a hash reference. Hashes also work, but they require additional work on the part of the subroutine author to verify that the argument list is even. A hashref makes an unmatched key easy to catch: with warnings enabled, Perl reports an odd number of elements in the anonymous hash when the call runs.

sub pretty_print {
    my ($args) = @_;

    # Format $args->{text} to $args->{text_width} somehow.
    # If $args->{justification} is set, justify appropriately.
    # If $args->{indent} is set, indent the first line by one tab.
    # If $args->{sentence_lead} is set, make sure all sentences start with
    # two spaces.

    open my $fh, '>', $args->{filename}
        or die "Cannot open '$args->{filename}' for writing: $!\n";

    print $fh $args->{text};

    close $fh;

    return;
}

pretty_print({
    filename      => 'filename',
    text          => $long_text,
    text_width    => 80,
    justification => 'full',
    sentence_lead => 1,
});

Now, the reader can immediately see exactly what the call to pretty_print() is doing.

And Optional Arguments

By using named arguments, you gain the benefit that some or all of your arguments can be optional without forcing your users to put undef in all of the positions they don't want to specify.
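One way to handle optional named arguments is to merge the caller's hashref over a set of defaults. This is only a sketch: the subroutine describe_format and its default values are hypothetical and are not part of pretty_print().

```perl
use strict;
use warnings;
use feature ':5.10';

sub describe_format {
    my ($args) = @_;

    # Defaults come first; any key the caller supplies overrides them.
    my %opt = (
        text_width    => 80,
        justification => 'left',
        indent        => 0,
        %{ $args || {} },
    );

    return "width=$opt{text_width} just=$opt{justification} indent=$opt{indent}";
}

my $default = describe_format();
my $custom  = describe_format({ justification => 'full', indent => 1 });

say $default;    # prints width=80 just=left indent=0
say $custom;     # prints width=80 just=full indent=1
```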

Validation

Argument validation is more difficult in Perl than in other languages. In C or Java, for instance, every variable has a type associated with it. This includes subroutine declarations, meaning that trying to pass the wrong type of variable to a subroutine gives a compile-time error. By contrast, because Perl flattens everything to a single list, there is no compile-time checking at all. (Well, there kinda is with prototypes.)

This has been such a problem that there are dozens of modules on CPAN to address it. The most commonly recommended one is Params::Validate.
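To give a feel for what such validation involves, here is a hand-rolled sketch in plain Perl using only the core Carp module. It does not use or imitate the Params::Validate API; the subroutine check_args and its rules (which keys are known, which are required) are assumptions made up for this example.

```perl
use strict;
use warnings;
use feature ':5.10';
use Carp qw(croak);

sub check_args {
    my ($args) = @_;
    croak "Expected a hash reference" unless ref($args) eq 'HASH';

    # Reject misspelled or unexpected keys.
    my %known = map { $_ => 1 } qw(filename text text_width);
    for my $key ( keys %$args ) {
        croak "Unknown argument '$key'" unless $known{$key};
    }

    # Insist on the arguments the subroutine cannot work without.
    for my $required (qw(filename text)) {
        croak "Missing required argument '$required'"
            unless defined $args->{$required};
    }

    # Check the shape of an optional argument when it is present.
    if ( defined $args->{text_width} && $args->{text_width} !~ /^\d+$/ ) {
        croak "text_width must be a whole number";
    }

    return 1;
}

my $ok    = eval { check_args({ filename => 'out.txt', text => 'Hi' }) };
my $bad   = eval { check_args({ filename => 'out.txt', txt => 'Hi' }); 1 };
my $error = $@;

say "valid call accepted" if $ok;
say "typo rejected: $error" unless $bad;
```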

Prototypes

Prototypes in Perl are a way of letting Perl know exactly what to expect for a given subroutine, at compile time. If you've ever tried to pass an array to the vec() built-in and you saw Not enough arguments for vec, you've hit a prototype.

For the most part, prototypes are more trouble than they're worth. For one thing, Perl doesn't check prototypes for methods because that would require the ability to determine, at compile time, which class will handle the method. Because you can alter @ISA at runtime--you see the problem. The main reason, however, is that prototypes aren't very smart. If you specify sub foo ($$$), you cannot pass it an array of three scalars (this is the problem with vec()). Instead, you have to say foo( $x[0], $x[1], $x[2] ), and that's just a pain.
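The following sketch shows that limitation in action. The subroutine sum_three is hypothetical. Because the failing call is a compile-time error, the example compiles it inside a string eval so that the error can be captured and printed.

```perl
use strict;
use warnings;
use feature ':5.10';

sub sum_three ($$$) {
    my ($x, $y, $z) = @_;
    return $x + $y + $z;
}

my @nums = (1, 2, 3);

# Passing the elements one by one satisfies the prototype.
my $total = sum_three( $nums[0], $nums[1], $nums[2] );
say $total;    # prints 6

# Passing the array itself does not compile: the prototype puts @nums in
# scalar context as the first argument and then finds two arguments missing.
my $compiled = eval 'sum_three(@nums); 1';
my $error    = $@;
say "compile error: $error" unless $compiled;
```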

Prototypes can be very useful for one reason--the ability to pass subroutines in as the first argument. Test::Exception uses this to excellent advantage:

sub do_this_to (&;$) {
    my ($action, $name) = @_;

    $action->( $name );
}

do_this_to { print "Hello, $_[0]\n" } 'World';
do_this_to { print "Goodbye, $_[0]\n" } 'cruel world!';

Context Awareness

Using the wantarray built-in, a subroutine can determine its calling context. Context for subroutines, in Perl, is one of three things--list, scalar, or void. List context means that the return value will be used as a list, scalar context means that the return value will be used as a scalar, and void context means that the return value won't be used at all.

sub check_context {
    # True
    if ( wantarray ) {
        print "List context\n";
    }
    # False, but defined
    elsif ( defined wantarray ) {
        print "Scalar context\n";
    }
    # False and undefined
    else {
        print "Void context\n";
    }
}

my @x       = check_context();  # prints 'List context'
my %x       = check_context();  # prints 'List context'
my ($x, $y) = check_context();  # prints 'List context'

my $x       = check_context();  # prints 'Scalar context'

check_context();                # prints 'Void context'

For CPAN modules that implement or augment context awareness, look at Contextual::Return, Sub::Context, and Return::Value.

Note: you can misuse context awareness heavily by having the subroutine do something completely different when called in scalar versus list context. Don't do that. A subroutine should be a single, easily identifiable unit of work. Not everyone understands all of the different permutations of context, including your standard Perl expert.

Instead, I recommend having a standard return value, except in void context. If your return value is expensive to calculate and is calculated only for the purposes of returning it, then knowing if you're in void context may be very helpful. This can be a premature optimization, however, so always measure (benchmarking and profiling) before and after to make sure you're optimizing what needs optimizing.
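As a sketch of that recommendation, the hypothetical subroutine below always returns the same kind of value, but skips building it when called in void context. The counter $summaries_built exists only to show when the "expensive" step runs.

```perl
use strict;
use warnings;
use feature ':5.10';

my $summaries_built = 0;    # counts how often the costly step runs

sub log_and_summarize {
    my (@lines) = @_;

    # ... imagine the lines being written to a log here ...

    # Nobody wants the return value, so skip building it.
    return unless defined wantarray;

    $summaries_built++;
    my $summary = scalar(@lines) . " lines logged";   # stand-in for costly work
    return $summary;
}

log_and_summarize( 'a', 'b' );                  # void context: nothing built
my $summary = log_and_summarize( 'a', 'b' );    # scalar context: summary built

say $summary;            # prints 2 lines logged
say $summaries_built;    # prints 1
```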

Mimicking Perl's Internal Functions

A lot of Perl's internal functions modify their arguments and/or use $_ or @_ as a default if no parameters are provided. A perfect example of this is chomp(). Here's a version of chomp() that illustrates some of these techniques:

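sub my_chomp {
    # This is a special case in the chomp documentation
    return if ref($/);

    # Note: unlike the real chomp(), this simplified pattern removes every
    # occurrence of $/, not just a trailing one.

    # If a return value is expected ...
    if ( defined wantarray ) {
        my $count = 0;
        # A statement modifier (for) cannot appear inside an expression,
        # so handle the two cases as separate statements.
        if (@_) { $count += s!$/!!g for @_ }
        else    { $count += s!$/!!g }
        return $count;
    }
    # Otherwise, don't bother counting
    else {
        @_ ? do { s!$/!!g for @_ } : s!$/!!g;
        return;
    }
}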
  • Use return; instead of return undef; if you want to return nothing. If someone assigns the return value to an array, the latter creates an array of one value (undef), which evaluates to true. The former will correctly handle all contexts.
  • If you want to modify $_ if no parameters are given, you have to check @_ explicitly. You cannot do something like @_ = ($_) unless @_; because $_ will lose its magic.
  • This doesn't calculate $count unless $count is useful (using a check for void context).
  • The key is the aliasing of @_. If you modify the elements of @_ directly (as opposed to assigning the values in @_ to variables and modifying those), then you modify the actual parameters passed in.
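Two of these points are easy to check with a small sketch. The subroutines below (trim_in_place, trim_copy, nothing, and one_undef) are hypothetical helpers written for this illustration; they show the aliasing of @_ and the difference between return; and return undef; in list context.

```perl
use strict;
use warnings;
use feature ':5.10';

# Modifies the caller's variables through the aliases in @_.
sub trim_in_place {
    s/^\s+|\s+$//g for @_;
    return;
}

# Copies the arguments first, so the caller's variables are untouched.
sub trim_copy {
    my @copies = @_;
    s/^\s+|\s+$//g for @copies;
    return @copies;
}

my $name = "  Groucho  ";
my ($trimmed) = trim_copy($name);
say "[$name]";       # prints [  Groucho  ]
say "[$trimmed]";    # prints [Groucho]

trim_in_place($name);
say "[$name]";       # prints [Groucho]

# A bare return yields an empty list; return undef yields a list of one.
sub nothing   { return }
sub one_undef { return undef }

my @empty = nothing();
my @one   = one_undef();
say scalar @empty;   # prints 0
say scalar @one;     # prints 1
```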

Conclusion

I hope I have introduced you to a few more tools in your toolbox. The art of writing a good subroutine is very complex. Each of the techniques I have presented is one tool in the programmer's toolbox. Just as a master woodworker wouldn't use a drill for every project, a master programmer doesn't make every subroutine use named arguments or mimic a built-in. You must evaluate each technique every time to see if it will make the code more maintainable. Overusing these techniques will make your code less maintainable. Using them appropriately will make your life easier.
