Simple IO Handling with IO::All

One of my favorite things about Perl is how flexible it is. When I don't like something about the language, I don't let it get me down. I just change the language!

The secret to doing this lies in Perl modules. Modules make this easy. Let's say you have a Perl idiom that you use everyday in your programming, but it just seems clumsier than it needs to be. Usually, with a little cleverness, you can simply hide a few hundred lines of code inside a module, thereby turning your 3-line idiom into a 2-line one!

I'm joking here, but at the same time I'm not joking. While it may seem like a recipe for scratching your itch with a backhoe, if you share your module on CPAN, you end up potentially scratching a million itches simultaneously. So perhaps the backhoe is the appropriate solution. Let me give you an example.

Slurp Me Up, Scotty

One of the most common idioms in my day-to-day programming is to read the contents of a file into a single scalar. This is often referred to by Perl geeks as a slurp operation. Usually it looks something like this:

    open my $file_handle, './Scotty'
      or die "Scotty can't be slurped:\n$!";
    local $/;   # Set input to "slurp" mode.
    my $big_string = <$file_handle>;
    close $file_handle;

So it would seem that slurping is a big deal. I mean, it took me 5 lines of code to do. Five lines is a lot in Perl. Surely, there could be an easier way. If I had my druthers, I would be able to just do this:

    my $big_string << './Scotty';

And be done with it. Unfortunately, this doesn't work the way I want it to, even though it is valid (albeit useless) Perl. How do I know it's useless? Perl told me so when I turned on warnings: "Useless use of left bitshift (<<) in void context."

Now even though I could write a source filter to make the above code do what I wanted, it wouldn't be the right approach. Surely there is something just as simple, that uses valid Perl constructs. After thinking about it for a couple hours, I came up with this:

    my $big_string = io('./Scotty')->slurp;

Being quite satisfied with my new idiom, I sat down for a few more weeks, and wrote a few hundred lines of code, and hid it in a module called IO::All and uploaded it to CPAN. Now I can do my 5-line slurp in 1 line. Phew!

Extreme Simplicity

How on earth could a module to perform the slurp idiom be several hundred lines long? Well, IO::All does slurping and a whole lot more. The motivating idea behind this module is to simplify all of the Perl IO idioms as much as possible, and also to create new idioms for common-use cases that weren't really idiomatic to begin with.

In recent years, I've become a fan and student of Extreme Programming (XP). One of the principles of XP is to constantly refactor your code to make it simpler, easier to read, and ultimately more maintainable. In striving to do so I found that as my code became as clean as I could make it, the parts that still looked dirty were constructs imposed on me by the Perl language itself; especially the IO stuff. I didn't let it get me down. I changed the language!

What's Going on Here?

The basic idea of IO::All is that it exports a function called io, which returns a new IO::All object. For example:

    use IO::All;

    my $io = io('file1');
    # Is the same thing as:
    my $io = IO::All->new('file1');

Another principle of IO::All is that it takes as many cues as possible from its context. Consider the following example:

    my @lines = io('stuff')->slurp;
    my @good_lines = grep {not /bad/} @lines;
    io('good-stuff')->print(@good_lines);

Here we are basically censoring a file, removing all the bad lines. The first statement slurps up the file, but since it is called in list context, it returns all the lines instead of one long string. The second statement weeds out any filth, and the third statement writes the good lines to a new file.

But the question arises, "How did IO::All know to open the output file for output?"

The answer is that IO::All delays the open until the first IO operation and uses the operation to determine the context. The opening and closing of files happens automatically and you almost never need to indicate the file mode, although you can do all of this manually if you really want to.

Directory Assistance

I never really liked the directory commands in Perl. You know, opendir, readdir, and closedir. I thought, well why not let an IO::All object act as a directory in addition to a file? How would IO::All know the difference? Context, of course!

    my $dir = io('mydir');
    while (my $io = $dir->read) {
        print $io->name, "\n"
          if $io->is_file;
    }

In this example, IO::All opens a directory for reading and returns one entry at a time, much like readdir. The difference is that instead of a file or subdirectory name, you get back another IO::All object. This is really cool, because you can immediately perform actions on the new objects. In the above code, we print the filename (returned by an IO::All method) if the object represents a file rather than a subdirectory (which is also returned by an IO::All Method).

File::Find

Ask any experienced Perl programmer which core module has the most abysmal interface, and they'd probably say File::Find. Rather than explain how File::Find works (which would take me an hour of research to figure out again), here's an easy way to roll your own search.

    my @wanted_file_names = map {
        $_->name
    } grep {
        $_->name =~ /\.\w{3}/ &&
        $_->slurp =~ /ingy/
    } io('my/directory')->all_files;

This search finds all the file names in a directory that have a three-character extension and contain the string 'ingy'. The all_files method is a shortcut that returns only the files. There are also all_dirs, all_links, and simply all methods.

A Poor Man's tar

This example reads all the files under a directory, and dumps them into one big file, separated by a line containing the file's name and size. This is analogous to what the Unix tar command does.

    use IO::All;
    for my $file (io('mydir')->all_files('-r')) {
        my $output = sprintf("--- %s (%s)\n", $file->name, -s $file->name)
                     . $file->slurp;
        io('tar_like')->append($output);
    }

In this usage, we pass a special flag, -r, that tells all_files to be recursive. That is, to find all files in all subdirectories. Also notice the append method. This is the same as print, but the file is opened for concatenation.

Double STanDards, String Cheese, and Temporary Insanity

IO::All has some handy shortcut names. In the Unix tradition, it uses a dash to mean STDIN, but it also uses it to mean STDOUT. Check out this one liner:

    io('-')->print(io('-')->slurp);

This just prints everything on STDIN to STDOUT. Once again context is used to determine which file handle the dash is actually referring to. A potentially more efficient way to write this is:

    my $stdin = io('-');
    my $stdout = io('-');
    $stdout->buffer($stdin->buffer);
    $stdout->write while $stdin->read;

The read and write methods work from an internal buffer, which defaults to 1k in size. What we've done in this example is to set the two objects to use the same buffer. Since the write method clears the buffer after writing it, the above idiom works nicely.

Another special character is the dollar sign. This means that the IO::All object will read/write to a Perl scalar rather than a file. This can be useful when you have a code base that writes to a file, but you want to fake it out and capture all the output in a string without changing the code base.

Finally, if you pass no arguments at all to the io function it will work as a temporary or nameless file. This is somewhat similar in effect to writing to a string, except that the data is actually going to your disk. The temporary file is opened for both read and write modes.

Here is a somewhat contrived example using all of these special cases.

    my $temp = io;
    $temp->print(io('-')->slurp);
    $temp->seek(0, 0);
    my $str = io('$');
    $str->print($_) for $temp->getline;
    my $data = ${$str->string_ref};
    $data =~ s/hate/love/;
    io('-')->print($data);

OK, listen up and repeat after me:

We slurp up all of STDIN, and slam it in a temp. We seek back to the start of it, and shove it a string. We suck the soul right out of string and save it from its sin. Then ship the lot to STDOUT, and sing it once again.

Socket to Me

If IO::All objects can represent files, directories, streams, and strings then they can surely do the same for sockets. This example prints the header lines from an HTTP GET call:

    my $io = io('www.google.com:80');
    $io->print("GET / HTTP/1.1\n\n");
    print while ($_ = $io->getline) ne "\r\n";

Again, the context comes into play. Since www.google.com:80 looks like a socket address, the IO::All object does the right thing. It is worth noting that if you really wanted to open a file called 'www.google.com:80' or '-' or '$', you can explicitly override the IO::All heuristics like such:

    my $io1 = io(-filename => 'www.google.com:80');
    my $io2 = io(-filename => '-');
    my $io3 = io(-filename => '$');

For Fork's Sake

The one thing I always use O'Reilly's Perl Cookbook for is creating a forking socket server. Not because it's that hard, but it's just something I don't keep in my head. With IO::All I have no problem remembering how to do it because it's been made dead simple:

    use IO::All;
    my $server = io(':12345')->accept('-fork');
    $server->print($_) while <DATA>;
    $server->close;
    __DATA__
    One File, Two File
    Red File, Blue File

This server sits and listens for connections on port 12345. When it gets a connection, it forks off a sub-process and sends two lines to the receiver.

The client code to call this server might look like this:

    use IO::All;
    my $io = io('localhost:12345');
    print while $_ = $io->getline;

IO::All of It to Graham

It may strike you as silly, vain, or even foolish for someone to rewrite all of the Perl IO functions as a new module when older, more mature modules exist. But therein lies the beauty of IO::All: it doesn't rewrite anything. It simply provides a keen new interface to IO::File, IO::Directory, IO::Socket, IO::String, IO::Handle, and others. It ties all of these robust modules together into one cohesive unit. So even though IO::All is relatively new, it hopefully inherits well from this legacy of stability.

As far as I know, almost all of these modules we're written by Perl superhero Graham Barr. I've met Graham personally and I don't think it would be too forward of me to suggest that you send him a beer to thank him for making Perl so great. Unfortunately I don't know his address.

Tie Me Up and Lock Me

I learned a neat trick from Gisle Aas by reading the code of his IO::String module. The trick is that you can tie an object to itself. This is especially handy when the object is IO handle. It means that you can use the object as a regular file handle with all Perl built-in IO functions. And you can also use it as a regular object by calling methods on it. It's TMTOWTDI at its finest.

    use IO::All '-tie';
    my $file = io('myfile');
    my $line1 = $file->getline;
    my $line2 = <$file>;

Pretty nifty, eh? Note that you need to request that IO::All perform the tie with the -tie option. That's because a bug in Perl 5.8.0 caused things tied to themselves to not go out of scope.

Here's another nifty feature: automatic file locking. If you specify the -lock option, IO::All will call flock after every file open. You still have to worry about things like deadlock, but at least the mechanism is simple. Here is a sample where all messages written to a log file then lock the file:

    use IO::All '-lock';
    io('log')->appendln(localtime() . " - I'm still here");

The appendln method is a cousin of the println method. Both print a new line after your output. Note that the above code is the same as the following:

    use IO::All;
    io(-lock => 'log')->appendln(localtime() . " - I'm still here");

That's because any parameters that are passed to IO::All are simply passed along to all invocations of the io function.

The Methods in My Madness

IO::All has over 60 methods that you can call to perform various IO-related actions. Not all methods make sense in all contexts for all flavors of IO. Most of the methods are simply direct proxies for methods found in the core modules that IO::All is built upon.

Some of the methods have been enhanced to be more flexible than their ancestors. Take the getline function, for example. The IO::Handle version simply gets the next line. The IO::All version takes an optional argument that is used as the record separator. To read a paragraph of text you could do this:

    my $paragraph = io('myfile')->getline('');

Lots of the methods have been presented in this article. For complete information on all the available IO::All methods, see the IO::All documentation.

It's Not Just Keen, It's Spiffy

IO::All is a little bit different from most Perl modules in that it exports the io function. As I said before, the io function acts as an object constructor, returning a new IO::All object for each invocation. This property is gained from IO::All's base class, Spiffy.pm.

Spiffy is a new kind of generic base class. Its primary magic trick is that it supports a unique feature that I call inheritable exporting. Normally if you use a module as a base class, it is strictly an object-oriented thing. You don't also export functions. Furthermore, if you were to export some functions, your subclasses would need to manually export those functions to its subclasses.

Spiffy is set up so that all of the @EXPORT arrays in all the modules in the @ISA tree of a class, are combined together to act as one big export list. The magic is then taken one step further. Functions like io that act as an object constructor are smart enough to return an object of a subclass; not just an IO::All object.

This is demonstrated in the next section. See the Spiffy documentation for details on this and other exciting features.

EIE::IO

Let's write an extension of IO::All that adds a new method called pruls, which is slurp backwards. Its purpose will be to return the whole file with its lines in reverse order. Here is the code:

    package EIE::IO;
    use IO::All '-base';
    sub pruls {
        my $self = shift;
        my @lines = reverse $self->slurp;
        wantarray ? @lines : join '', @lines;
    }
    1;

That's all there is to it. This module will act just like IO::All, except with one more method. You would use it just like this:

    use EIE::IO;
    print io('mystuff')->pruls;

Note that the io function is still exported, but it returns a new EIE::IO object. That's Spiffy!

Conclusion

IO::All is still quite a young module. There is room for many, many more idioms. There is also the possibility of including even more types of IO, like shared memory, IPC, and Unix sockets. If you have a use case that you think would make a nice addition to this Swiss Army Light Sabre of Perl IO, please Let's change the Perl language for the better.

Visit the home of the Perl programming language: Perl.org

Sponsored by

Monthly Archives

Powered by Movable Type 5.13-en