
Advanced Subroutine Techniques
by Rob Kinyon

Validation

Argument validation is more difficult in Perl than in other languages. In C or Java, for instance, every variable has a type associated with it. This includes subroutine declarations, so passing the wrong type of variable to a subroutine gives a compile-time error. By contrast, because Perl flattens every argument list into a single list of scalars, there is no compile-time checking at all. (Well, there kinda is with prototypes.)
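To make the flattening concrete, here is a minimal sketch. The count_args helper is hypothetical and exists only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments actually arrived
sub count_args {
    return scalar @_;
}

my @names  = ('alice', 'bob');
my %limits = ( max => 10 );

# The array and the hash are flattened into one list before the call,
# so the subroutine sees four plain scalars and no trace of the containers
my $n = count_args( @names, %limits );
print "received $n arguments\n";    # prints "received 4 arguments"
```

Nothing inside count_args can tell where one container ended and the next began, which is why the compiler has nothing to check.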

This has been such a problem that there are dozens of modules on CPAN to address it. The most commonly recommended one is Params::Validate.
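Because nothing is checked at compile time, validation has to happen at runtime, inside the subroutine. The following is a hand-rolled sketch of the kind of check that such modules automate. It deliberately does not use the Params::Validate API, and make_label and its text and width parameters are hypothetical:

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical subroutine taking named arguments:
# text (required scalar) and width (optional positive integer)
sub make_label {
    croak 'make_label: expected name => value pairs' if @_ % 2;
    my %args = @_;

    croak 'make_label: "text" is required'
        unless defined $args{text} && !ref $args{text};

    my $width = defined $args{width} ? $args{width} : 10;
    croak 'make_label: "width" must be a positive integer'
        unless $width =~ /\A[1-9][0-9]*\z/;

    # Left-justify the text in a field of $width characters
    return sprintf '%-*s|', $width, $args{text};
}

print make_label( text => 'name', width => 6 ), "\n";   # prints "name  |"

# A bad call fails at runtime, not at compile time
my $ok = eval { make_label( width => 6 ); 1 };
print "rejected: $@" unless $ok;
```

Writing this by hand for every subroutine gets tedious quickly, which is the gap the CPAN modules fill.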

Prototypes

Prototypes in Perl are a way of letting Perl know exactly what to expect for a given subroutine, at compile time. If you've ever tried to pass an array to the vec() built-in and you saw Not enough arguments for vec, you've hit a prototype.

For the most part, prototypes are more trouble than they're worth. For one thing, Perl doesn't check prototypes for methods because that would require the ability to determine, at compile time, which class will handle the method. Because you can alter @ISA at runtime, that isn't possible. The main reason, however, is that prototypes aren't very smart. If you specify sub foo ($$$), you cannot pass it an array of three scalars, because the prototype puts the array in scalar context and it counts as a single argument (this is the problem with vec()). Instead, you have to say foo( $x[0], $x[1], $x[2] ), and that's just a pain.
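The array problem is easy to reproduce. In this sketch, add3 is a hypothetical subroutine with a three-scalar prototype; the failing call is wrapped in a string eval only so that the compile-time error can be caught and printed:

```perl
use strict;
use warnings;

# Hypothetical subroutine with a three-scalar prototype
sub add3 ($$$) {
    my ($x, $y, $z) = @_;
    return $x + $y + $z;
}

my @x = (1, 2, 3);

# Spelling out each element satisfies the prototype
print add3( $x[0], $x[1], $x[2] ), "\n";    # prints 6

# add3(@x) does not compile: the array counts as one scalar argument
eval 'add3(@x); 1' or print "compile error: $@";

# Calling with a leading & skips prototype checking entirely
print &add3(@x), "\n";                      # prints 6
```

The &add3(@x) form works, but it also throws away whatever protection the prototype was supposed to provide.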

Prototypes can be very useful for one reason--the ability to pass a bare block as the first argument, without the sub keyword, the way map and grep do. Test::Exception uses this to excellent advantage. Here is the technique in miniature:

sub do_this_to (&;$) {
    my ($action, $name) = @_;

    $action->( $name );
}

do_this_to { print "Hello, $_[0]\n" } 'World';
do_this_to { print "Goodbye, $_[0]\n" } 'cruel world!';
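The same & prototype is what makes block-taking helpers feel like built-in syntax. As a rough sketch of that style, here is a hypothetical error_from helper (not part of Test::Exception or any other module) that runs a block and hands back whatever error it raised:

```perl
use strict;
use warnings;

# Hypothetical helper: run a block and return the error it raised,
# or nothing if the block completed normally
sub error_from (&) {
    my ($code) = @_;
    my $ok = eval { $code->(); 1 };
    return if $ok;
    return $@;
}

my $err = error_from { die "out of cheese\n" };
print "caught: $err";               # prints "caught: out of cheese"

my $none = error_from { my $n = 1 + 1 };
print "no error\n" unless defined $none;
```

Without the prototype, each call would need an explicit sub { ... } and a comma.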

Context Awareness

Using the wantarray built-in, a subroutine can determine its calling context. Context for subroutines, in Perl, is one of three things--list, scalar, or void. List context means that the return value will be used as a list, scalar context means that the return value will be used as a scalar, and void context means that the return value won't be used at all.

sub check_context {
    # True
    if ( wantarray ) {
        print "List context\n";
    }
    # False, but defined
    elsif ( defined wantarray ) {
        print "Scalar context\n";
    }
    # False and undefined
    else {
        print "Void context\n";
    }
}

my @x       = check_context();  # prints 'List context'
my %x       = check_context();  # prints 'List context'
my ($x, $y) = check_context();  # prints 'List context'

my $z       = check_context();  # prints 'Scalar context'

check_context();                # prints 'Void context'

For CPAN modules that implement or augment context awareness, look at Contextual::Return, Sub::Context, and Return::Value.

Note: you can misuse context awareness heavily by having the subroutine do something completely different when called in scalar versus list context. Don't do that. A subroutine should be a single, easily identifiable unit of work. Not everyone understands all of the different permutations of context, including your standard Perl expert.

Instead, I recommend having a standard return value, except in void context. If your return value is expensive to calculate and is calculated only for the purposes of returning it, then knowing if you're in void context may be very helpful. This can be a premature optimization, however, so always measure (benchmarking and profiling) before and after to make sure you're optimizing what needs optimizing.
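A minimal sketch of that recommendation follows. The record_words subroutine and its bookkeeping variables are hypothetical; the point is only that the summary string is built when a caller will actually receive it:

```perl
use strict;
use warnings;

my %seen;
my $summaries_built = 0;

# Hypothetical: count words, and report the totals only if the caller wants them
sub record_words {
    $seen{$_}++ for @_;

    # Void context: nobody will see the summary, so skip building it
    return unless defined wantarray;

    $summaries_built++;
    return join ', ', map { "$_=$seen{$_}" } sort keys %seen;
}

record_words(qw(a b a));            # void context, no summary built
my $report = record_words('c');     # scalar context
print "$report\n";                  # prints "a=2, b=1, c=1"
print "$summaries_built\n";         # prints 1
```

The return value is the same string in both scalar and list context; only void context takes the shortcut.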

Mimicking Perl's Internal Functions

A lot of Perl's internal functions modify their arguments and/or use $_ or @_ as a default if no parameters are provided. A perfect example of this is chomp(). Here's a version of chomp() that illustrates some of these techniques:

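sub my_chomp {
    # chomp() removes nothing in slurp mode (undef $/) or in
    # fixed-length record mode ($/ is a reference)
    return if !defined($/) || ref($/);

    # Match one trailing record separator, taken literally
    # (paragraph mode, $/ = '', is not handled here)
    my $sep = $/;
    my $end = qr/\Q$sep\E\z/;

    # If a return value is expected ...
    if ( defined wantarray ) {
        my $count = 0;
        if (@_) { for (@_) { $count += length $sep if s/$end// } }
        else    { $count += length $sep if s/$end// }
        return $count;
    }
    # Otherwise, don't bother counting
    else {
        if (@_) { s/$end// for @_ }
        else    { s/$end// }
        return;
    }
}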
  • Use return; instead of return undef; if you want to return nothing. If someone assigns the return value to an array, the latter creates an array of one value (undef), which evaluates to true. The former will correctly handle all contexts.
  • If you want to modify $_ when no parameters are given, you have to check @_ explicitly. You cannot do something like @_ = ($_) unless @_; because the assignment copies the value of $_, so later changes never reach the caller's $_.
  • This doesn't calculate $count unless $count is useful (using a check for void context).
  • The key is the aliasing of @_. If you modify the elements of @_ directly (as opposed to copying the values in @_ into your own variables), then you modify the actual variables passed in.
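The points above can be seen in isolation in a smaller sketch. The trim_in_place, nothing, and undef_item subroutines are hypothetical names chosen for this illustration:

```perl
use strict;
use warnings;

# Hypothetical in-place trimmer: works on its arguments, or on $_ if none
sub trim_in_place {
    if (@_) { s/^\s+|\s+$//g for @_ }   # elements of @_ alias the caller's variables
    else    { s/^\s+|\s+$//g }          # no arguments: operate on the caller's $_
    return;
}

my ($first, $second) = ('  left', 'right  ');
trim_in_place( $first, $second );
print "[$first][$second]\n";            # prints "[left][right]"

my @lines = ('  padded  ');
for (@lines) { trim_in_place() }
print "[$lines[0]]\n";                  # prints "[padded]"

# return; yields an empty list, return undef; yields a one-element list
sub nothing    { return }
sub undef_item { return undef }

my @empty = nothing();
my @one   = undef_item();
print scalar(@empty), ' ', scalar(@one), "\n";   # prints "0 1"
```

An array holding a single undef is true in boolean context, which is exactly the trap the first point warns about.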

Conclusion

I hope I have introduced you to a few more tools for your toolbox. The art of writing a good subroutine is very complex, and each of the techniques I have presented is only one tool. Just as a master woodworker wouldn't use a drill for every project, a master programmer doesn't make every subroutine use named arguments or mimic a built-in. You must evaluate each technique every time to see if it will make the code more maintainable. Overusing these techniques will make your code less maintainable. Using them appropriately will make your life easier.