More Lightning Articles
by Mark Leighton Fisher, chromatic, Shlomi Fish, Bob DuCharme

Unnecessary Unbuffering

by chromatic

A great joy in a programmer's life is removing useless code, especially when its absence improves the program. Often this happens in old codebases or codebases thrown together hastily. Sometimes it happens in code written by novice programmers who try several different ideas all together and fail to undo their changes.

One such persistent idiom is wholesale, program-wide unbuffering, which can take the form of any of:

local $| = 1;
$|++;
$| = 1;

Sometimes this is valuable. Sometimes it's vital. It's not the default for very good reason, though, and at best, including one of these lines in your program is useless code.

What's Unbuffering?

By default, modern operating systems don't send information to output devices directly, one byte at a time, nor do they read information from input devices directly, one byte at a time. Compared to processors and memory, IO is so slow, especially over networks, that filling a buffer before sending or receiving information can improve performance.

Think of trying to fill a bathtub from a hand pump. You could pump a little water into a bucket and walk back and forth to the bathtub, or you could fill a trough at the pump and fill the bucket from the trough. If the trough is empty, pumping a little bit of water into the bucket will give you a faster start, but it'll take longer in between bucket loads than if you filled the trough at the start and carried water back and forth between the trough and the bathtub.

Information isn't exactly like water, though. Sometimes it's more important to deliver a message immediately even if it doesn't fill up a bucket. "Help, fire!" is a very short message, but waiting to send it when you have a full load of messages might be the wrong thing.

That's why modern operating systems also let you unbuffer specific filehandles. When you print to an unbuffered filehandle, the operating system will handle the message immediately. That doesn't guarantee that whoever's on the other side of the handle will respond immediately; there might be a pump and a trough there.
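As a minimal sketch of unbuffering one specific filehandle, the following uses the autoflush() method from the core IO::Handle module. The alerts.log file name is made up for this illustration; only that handle loses its buffer, while STDOUT keeps its default behavior.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use IO::Handle;   # provides the autoflush() method on filehandles

# 'alerts.log' is a hypothetical file name used only for this example
open( my $alerts, '>>', 'alerts.log' ) or die "Can't open alerts.log: $!";

# unbuffer only this handle; STDOUT keeps its default buffering
$alerts->autoflush( 1 );

print {$alerts} "Help, fire!\n";   # Perl hands this to the OS right away
print "Routine message\n";         # still buffered as usual

close( $alerts ) or die "Can't close alerts.log: $!";
```

Calling autoflush( 1 ) on a handle has the same effect as selecting that handle and assigning 1 to $|, without changing the currently selected filehandle.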

What's the Damage?

According to Mark-Jason Dominus' "Suffering from Buffering?", one sample showed that buffered reading was 40% faster than unbuffered reading, and buffered writing was 60% faster. The advantage for writing is likely to grow when considering network communications, where the overhead of sending and receiving a single packet of information can overwhelm short messages.
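To get a rough feel for the difference on your own machine, here is a small timing sketch. It is not the benchmark from Dominus' article; the time_writes() helper, the message text, and the line count are all assumptions made for illustration, and the numbers will vary with your OS, filesystem, and disk.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use File::Temp 'tempfile';
use Time::HiRes 'time';

# hypothetical helper: write $lines short lines to a temporary file,
# optionally unbuffered, and return the elapsed wall-clock time and
# the resulting file size
sub time_writes
{
    my ($lines, $unbuffered) = @_;

    my ($fh, $filename) = tempfile( UNLINK => 1 );

    # $| applies to the currently selected handle, so select it briefly
    my $old = select( $fh );
    $|      = $unbuffered ? 1 : 0;
    select( $old );

    my $start = time();
    print {$fh} "short message\n" for 1 .. $lines;
    close( $fh ) or die "Can't close $filename: $!";

    return ( time() - $start, -s $filename );
}

my $lines = 100_000;   # arbitrary count chosen for this sketch

my ($buffered_time)   = time_writes( $lines, 0 );
my ($unbuffered_time) = time_writes( $lines, 1 );

printf "buffered:   %.3f seconds\n", $buffered_time;
printf "unbuffered: %.3f seconds\n", $unbuffered_time;
```

Both runs write exactly the same bytes; the only difference is how often Perl hands them to the operating system.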

In simple interactive applications, though, there may be no benefit. When attached to a terminal, such as a command line, Perl operates in line-buffered mode. Run the following program and watch the output carefully:

#!/usr/bin/perl

use strict;
use warnings;

# buffer flushed at newline
loop_print( 5, "Line-buffered\n" );

# buffer not flushed until newline
loop_print( 5, "Buffered  " );
print "\n";

# buffer flushed with every print
{
    local $| = 1;
    loop_print( 5, "Unbuffered  " );
}

sub loop_print
{
    my ($times, $message) = @_;

    for (1 .. $times)
    {
        print $message;
        sleep 1;
    }
}

The first five messages appear individually and immediately. Perl flushes the buffer for STDOUT when it sees the newlines. The second set appears after five seconds, all at once, when Perl sees the newline printed after the loop. The third set appears individually and immediately because Perl flushes the buffer after every print statement.

Terminals are different from everything else, though. Consider the case of writing to a file. In one terminal window, create a file named buffer.log and run tail -f buffer.log or its equivalent to watch the growth of the file in real time. Then add the following lines to the previous program and run it again:

open( my $output, '>', 'buffer.log' ) or die "Can't open buffer.log: $!";
select( $output );
loop_print( 5, "Buffered\n" );
{
    local $| = 1;
    loop_print( 5, "Unbuffered\n" );
}

The first five messages appear in the log in a batch, all at once, even though they all have newlines. Five messages aren't enough to fill the buffer. Perl flushes it only when the assignment to $| unbuffers the filehandle. The second set of messages appears individually, one second after another.

Finally, the STDERR filehandle is hot (that is, unbuffered) by default. Add the following lines to the previous program and run it yet again:

select( STDERR );
loop_print( 5, "Unbuffered STDERR " );

Though no code disables the buffer on STDERR, the five messages should print immediately, just as in the other unbuffered cases. (If they don't, your OS is weird.)

What's the Solution?

Buffering exists for a reason; it's almost always the right thing to do. When it's the wrong thing to do, you can disable it. Here are some rules of thumb:

  • Never disable buffering by default.
  • Disable buffering when and while you have multiple sources writing to the same output and their order matters.
  • Never disable buffering for network outputs by default.
  • Disable buffering for network outputs only when the expected time between full buffers exceeds the expected client timeout length.
  • Don't disable buffering on terminal outputs. For STDERR, it's useless, dead code. For STDOUT, you probably don't need it.
  • Disable buffering if it's more important to print messages regularly than efficiently.
  • Don't disable buffering until you know that the buffer is a problem.
  • Disable buffering in the smallest scope possible.
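One way to keep unbuffering in the smallest scope possible is to wrap it in a helper that turns buffering back on afterward. The following is a sketch, not a standard idiom; with_autoflush() is a hypothetical name. It relies on the autoflush() method from the core IO::Handle module, which returns the handle's previous $| value.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use IO::Handle;

# hypothetical helper: run $code with $fh unbuffered, then restore
# whatever buffering mode $fh had before
sub with_autoflush
{
    my ($fh, $code) = @_;

    # autoflush() returns the handle's previous $| value
    my $previous = $fh->autoflush( 1 );

    # trap any exception so the old mode is restored before rethrowing
    my $ok    = eval { $code->(); 1 };
    my $error = $@;

    $fh->autoflush( $previous );
    die $error unless $ok;

    return;
}

print "Working";

with_autoflush( \*STDOUT, sub {
    for ( 1 .. 3 )
    {
        print '.';    # flushed after every print while unbuffered
        sleep 1;
    }
} );

print " done\n";
```

On a terminal, the dots should appear one per second. Outside the helper, STDOUT goes back to whatever buffering it had, so the rest of the program pays no extra cost.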