June 2000 Archives

This Week on p5p 2000/06/25



Notes

You can subscribe to an email version of this summary by sending an empty message to p5p-digest-subscribe@plover.com.

Please send corrections and additions to mjd-perl-thisweek-YYYYMM@plover.com where YYYYMM is the current year and month.

This week's report is a little late because I went to San Diego Usenix, and then I went to YAPC in Pittsburgh (probably the only person on the continent stupid enough to try to do both) and then I went back to Philadelphia and was driven to Washington DC for a party and came back on the train.

I was going to say it was a quiet week on the list. But it wasn't. It was merely a low-traffic week. It wasn't quiet at all; all sorts of useful and interesting stuff was posted, and there was an unusually high signal-to-noise ratio.

This week has been named 'Doug MacEachern and Simon Cozens' week. Thank you Doug and Simon, and also everyone else who contributed to the unusually high signal-to-noise ratio this week.

Method Lookup Speedup

More discussion of Doug's patch of last week.

Previous summary

Last week, some people pointed out that it would fail in the presence of code that modifies @ISA at runtime; Sarathy suggested a pragma that would promise that this would not happen. Nick suggested that use base could do that.
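
To make the assumption concrete, here is a rough sketch (class names invented) of the difference between a hierarchy fixed at compile time and one fiddled with at run time, which is the case that defeats any method-lookup cache:

        package Dog;
        sub speak { print "woof\n" }

        package Spaniel;
        use base 'Dog';            # @Spaniel::ISA is set here, at compile time;
                                   # a pragma could treat this as a promise

        package Mongrel;
        our @ISA = ('Dog');
        # ... much later, at run time:
        push @Mongrel::ISA, 'Tracker';   # perfectly legal today, but it invalidates
                                         # any lookups cached under the old @ISA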

Doug submitted an updated patch.

Updated patch

For your delectation, Simon Cozens wrote up an extensive explanation of the patch and how it works, including many details about the Perl internals. If you are interested in the Perl internals (and you should be) then this is strongly recommended reading.

The explanation.

I would like very much to run other articles of the same type in the future. This should be construed as a request for people to contribute them. They don't have to be as complete or detailed as Simon's.

Thank you very much, Simon.

tr///CU and tr///UC Removed

Simon, who has been working on the line discipline feature, got rid of the nasty tr///CU feature, which Larry had already decided was a bad idea and should be eliminated.

is_utf8_string

Simon also added a function named is_utf8_string that checks a string to make sure it is valid UTF8. The plan is that if Perl is reading a putatively UTF8 file, it can check the input before setting the UTF8 flag on the resulting scalar.
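
The function itself lives down in the C internals, but the idea is easy to sketch at the Perl level. (This is only an illustration; the helper name is mine, and it leans on the documented return value of utf8::decode on reasonably modern perls.)

        # Hypothetical helper: true if $bytes is a well-formed UTF8 byte sequence.
        sub looks_like_utf8 {
            my ($bytes) = @_;
            my $copy = $bytes;           # utf8::decode works in place, so test a copy
            return utf8::decode($copy);  # false if the octets are not valid UTF8
        }

        for my $candidate ("caf\xC3\xA9", "caf\xE9") {
            printf "%s\n", looks_like_utf8($candidate) ? "valid UTF8" : "not UTF8";
        }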

Byte-Order Marks Return

Simon submitted an improved patch for this. This one just has the lexer use tell() to see if the putative byte-order mark is at the very beginning of the file.

The new patch

Previous summary

pack("U")

A few weeks ago there was discussion of what this should do.

Previous summary

Simon submitted a patch that implemented an idea of Larry's: that a U at the beginning of the pack template indicates that the result of pack will be a UTF8 string; anything else indicates a byte string. This means (for example) that you can put U0 at the beginning of any template to force it to produce UTF8; if you want to start with U but have the result be bytes, add a do-nothing C0 at the beginning instead.

The patch.
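
On a modern perl the behavior Simon describes should look something like this (a sketch, not taken from the patch; utf8::is_utf8 simply reports whether the scalar carries the internal UTF8 flag):

        my $chars = pack("U",   0x263A);   # template starts with U: UTF8 string
        my $bytes = pack("C0U", 0x263A);   # do-nothing C0 first: plain byte string

        printf "chars: %s\n", utf8::is_utf8($chars) ? "UTF8 string" : "byte string";
        printf "bytes: %s\n", utf8::is_utf8($bytes) ? "UTF8 string" : "byte string";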

Lexical variables and eval()

Yitzchak Scott-Thoennes reported on a number of puzzles related to the interaction of these features, including:

        { my $x; sub incx { eval '++$x' } }

Here incx apparently increments the lexical variable; he expected it to increment the global variable. (Rationale: The lexical variable should be optimized away.)
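
To make the puzzle concrete, here is a small script (the values are invented); which $x the eval ends up incrementing is exactly the question under discussion:

        our $x = 100;                    # package global $main::x
        {
            my $x = 0;                   # file-scoped lexical, apparently unused
            sub incx { eval '++$x' }     # not a closure over $x in the usual sense
        }

        incx() for 1 .. 3;
        print "global \$x is now $x\n";  # did the evals touch this, or the lexical?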

Rick Delaney referred to a relevant article by Ilya in clp.misc.

Yitzchak says that code in a subroutine should not be able to alter lexical variables in a more outer scope, unless it is a closure, which incx here is not. Rick presents the following counterexample:

        my $Pseudo_global = 2;

        sub double {
          my ($x) = @_;
          eval '$x * $Pseudo_global';
        }

Discussion seemed inconclusive. No patches were offered.

I said that I had done some research a while back about what Scheme and Common Lisp do in this sort of case, and that I would report back with a summary, but I have not done so.

FILEGV

There was some discussion about the FILEGV macro. When Perl compiles the op tree, the line and file information is stored in a GV. Or rather, it used to be so; now, if you compile with ithreads, it just uses strings. There were some macros, *FILEGV, to access this GV, but according to Sarathy, they were mostly used to get at the filename, and there is a more straightforward macro family, *FILE, which gets the filename directly. Doug MacEachern wanted to use the original macro in B::Graph, although I was not sure why; Sarathy said that probably B::Graph needed to be fixed.

perlhacktut

Simon contributed the first half of a document titled perlhacktut, a tutorial on hacking the Perl core. It talks about how to get started and what to read, provides an overview of Perl's large subsystems, and begins a discussion of Perl's basic data types and op trees.

If you are interested in the Perl internals (and you should be) then this is strongly recommended reading. (Gosh, that sounds familiar.)

First draft.

perlutil.pod

Simon also contributed a document describing the utility programs that come packaged with Perl, such as perldoc, pod2html, roffitall, and a2p.

Quite a busy week for Simon.

perlutil

Missing Methods

Martyn Pearce pointed out that if you have code like this:

        Foo->new('...');

it might fail for two reasons: because the Foo class does not define that method, or because you forgot to put use Foo in your program. In both cases the message is

        Can't locate object method "new" via package "Foo" ...

Martyn suggested that in the second case, it could add a remark like

        (perhaps you forgot to load module "Foo"?)

However, he did not provide a patch.

I also wonder why it says 'object method' when it is clearly a class method. I did not provide a patch either. This would be an excellent first patch for someone who wanted to get started patching. Write to me if you are interested in looking into it but do not know where to begin.
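
Here is a rough sketch of the check the better diagnostic would need to make. The core would do this in C; the helper below is purely illustrative, but the idea is that an empty symbol table for the package is a strong hint that the module was never loaded.

        sub missing_method_message {
            my ($class, $method) = @_;
            no strict 'refs';
            my $loaded = %{"${class}::"} ? 1 : 0;   # any symbols in the package at all?
            my $msg = qq(Can't locate object method "$method" via package "$class");
            $msg .= qq(\n(perhaps you forgot to load module "$class"?)) unless $loaded;
            return $msg;
        }

        my $message = missing_method_message("Foo", "new");
        print "$message\n";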

Suppress prototype mismatch warnings

Doug MacEachern discovered lots and lots of subroutine declarations in Socket.pm that were there only to predeclare a bunch of autoloaded constants like AF_INET. The only purpose for the declarations was to prevent 'prototype mismatch' warnings from occurring when the constants were actually autoloaded at run time. He then put in a patch to suppress the warning, if it appears that the subroutine will be autoloaded later, and removed the 20K of constant sub declarations in Socket.pm.
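
For anyone who has not run into it, the warning appears when a sub is declared twice with different prototypes, which is exactly what happens when a predeclared constant is later defined by the autoloader. A minimal reproduction (names invented):

        use warnings;

        sub AF_INET ();            # predeclaration with an empty prototype

        # Later, something such as an AUTOLOAD defines the real sub without one:
        eval 'sub AF_INET { 2 }';  # warns: Prototype mismatch: sub main::AF_INET () vs none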

Autoloaded Constants not Inlined

Doug also discovered that these autoloaded constants' values are not inlined, because the code that uses them is compiled before the subroutine is loaded. Doug produced a patch to Exporter.pm that lets you specify a name with a leading + sign in the use line to indicate that the subroutine should be invoked once (and hence autoloaded) immediately, when the module is loaded, so that it can be inlined into the following code.

The patch.
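
The underlying issue is ordinary constant folding: a constant sub is only inlined into code that is compiled after the sub already exists. A small illustration of the general mechanism (this is not Doug's patch itself):

        sub ANSWER () { 42 }    # already defined when the next line is compiled,
        my $x = ANSWER + 1;     # so the call can be folded to 43 at compile time

        # An autoloaded constant is not defined until it is first called, which is
        # after the calling code has been compiled -- too late for inlining.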

lib.pm

Doug MacEachern decided that it was a shame that lib.pm has to pull in all of Config.pm, so he recast lib.pm as a script, lib.pm.PL, which generates the real lib.pm at install time, inserting the appropriate values from %Config inline.

(Many other utilities, such as perlcc and pod2html, are generated this way at present. Do ls */*.PL in the source directory to see a list.)
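
For the curious, a *.PL generator is just a Perl script that prints out the module it generates, with configuration values already interpolated. A minimal sketch (file name and variable invented; the real lib.pm.PL is more involved):

        # mylib.pm.PL -- run at build time to write mylib.pm
        use Config;

        open my $out, '>', 'mylib.pm' or die "can't write mylib.pm: $!";
        print $out "package mylib;\n";
        # The value is baked into the generated file at install time,
        # so the module itself never needs to load Config.pm:
        print $out "our \$archname = '$Config{archname}';\n";
        print $out "1;\n";
        close $out;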

use English

Barrie Slaymaker contributed a patch so that you can now say

        use English '-no_match_vars';

and it will import all the usual long names for the punctuation variables, except for $`, $&, and $', which slow down your regexes. If you don't supply this flag, then those variables are separately aliased via an eval statement.

This has been a long time coming---I thought it had been done already.
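
On a modern perl, where the option is spelled -no_match_vars in the standard English module, the usage looks like this:

        use English qw( -no_match_vars );   # long names, but leave $&, $`, $' alone
        print "running as $PROGRAM_NAME, pid $PID\n";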

There was a long sidetrack having to do with some unimportant style issue, which should have been carried out in private email, or not at all.

Numeric opens in IPC::Open3

Frank Tobin submitted a patch that allows the user of IPC::Open3 to request that any of the 'files' to be opened be an already open file descriptor, analogous to the way open FH, "<&=3" works with regular open.
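
For reference, this is the ordinary open syntax being imitated; the '<&=' form adopts an existing file descriptor rather than opening anything new, and the patch lets an Open3 user ask for the analogous thing:

        # FH becomes another name for the already-open file descriptor 3:
        open(FH, "<&=3") or die "can't fdopen descriptor 3: $!";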

Regex Bug

Ian Flanigan found a very upsetting bug in the regex engine.

Read about it.

Foo isa Foo

Johan Vromans complained that

        my $r = "Foo";
        UNIVERSAL::isa($r, "Foo::");

returns true. Johan does not like that $r (which is a string) is reported to be a member of class Foo. It was pointed out that the manual explicitly says that UNIVERSAL::isa may be called as a class method, to determine whether one class is a subclass of another, in which case it could be invoked as

        Foo->isa('Foo')

which is essentially the same as Johan's example, and which returns true because the class Foo is (trivially) a subclass of itself.

Johan said 'Yuck.'
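
For concreteness, the behavior is easy to reproduce, and it is standard UNIVERSAL::isa behavior rather than anything new:

        package Foo;
        sub new { bless {}, shift }

        package main;
        my $as_class_method  = Foo->isa('Foo');                # true: Foo is trivially a Foo
        my $as_function_call = UNIVERSAL::isa('Foo', 'Foo');   # the same call, spelled out
        print "class method:  ", ($as_class_method  ? "true" : "false"), "\n";
        print "function call: ", ($as_function_call ? "true" : "false"), "\n";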

README.hpux

Jeff Okamoto updated it again.

Here it is.

my __PACKAGE__ $obj ...

Doug MacEachern submitted a patch to enable this. The patch came in just barely before the end-of-the-week cutoff, and there has already been a lot of discussion of it in the past two days, so I am going to defer talking about it any more until my next report.

Should you want to look at it before then, here it is.
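
The intent of the syntax is simply that __PACKAGE__ can stand in for the literal class name in a typed lexical declaration. A rough sketch of what the patch is meant to allow (class name invented):

        package Dog;

        sub new   { bless {}, shift }
        sub speak { print "woof\n" }

        sub clone {
            # With the patch, the next line is equivalent to: my Dog $copy = Dog->new;
            my __PACKAGE__ $copy = __PACKAGE__->new;
            return $copy;
        }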

asdgasdfasd

Some anonymous person running as root submitted a bug report (with perlbug) that only said 'asdgasdfasd'. Martyn Pearce replied that it was not a bug, but a feature.

Various

A large collection of bug reports, bug fixes, non-bug reports, questions, answers, and a very small amount of spam. No serious flamage, however.

This is the end of the month, so I will summarize: I filed 97 messages in the junk folder, 311 in the misc folder, and 329 messages in 45 various other folders pertaining to particular topics.

Until next week I remain, your humble and obedient servant,


Mark-Jason Dominus

This Week on p5p 2000/06/18



Notes

You can subscribe to an email version of this summary by sending an empty message to p5p-digest-subscribe@plover.com.

Please send corrections and additions to mjd-perl-thisweek-YYYYMM@plover.com where YYYYMM is the current year and month.

This week's report is a little early, because I am going to San Diego Usenix tomorrow. Next week's report will cover anything I missed this week, and may be late, since I will just have gotten back from YAPC.

A really mixed bag this week. Great work from Doug MacEachern, Nicholas Clark, and Simon Cozens, and a lot of wasted yakkity yak from some other people.

Method Call Speedups

Doug MacEachern wrote a patch to implement a compile-time optimization: Class method calls, and method calls on variables that have a declared class (as with my Dog $spot) have the code for the method call rewritten as if you had requested an ordinary subroutine call. For example, if you have

        my Class $obj = ...;
        $obj->method(...);

and the method that gets called is actually Parent::method, then Perl will pretend that you actually wrote

        my Class $obj = ...;
        Parent::method($obj, ...)

instead. Doug found that the method calls did get much faster---in some cases faster than regular subroutine calls. (I don't understand how this can possibly be the case, however.) One side benefit (or maybe it's a malefit?) of Doug's approach is that you can now enable prototype checking on method calls.

A lot of work remains to be done here. Doug's patch does not actually speed up method calls; it replaces method calls with regular subroutine calls. It would be good to see some work done on actually making method calls faster.

The patch.

More Attempts to Make B::Bytecode Faster

The whole point of B::Bytecode is to speed up the startup time of Perl programs. Two weeks ago Benjamin Stuhl reported that bytecoded files are actually slower than regular source files, probably because the bytecoded files are so big that it takes a lot of time to read them in.

Previous Summary

Nicholas Clark looked into compressing the bytecode files. He fixed ByteLoader.xs so that it was a true filter, and could be installed atop another filter; in this case one that decompressed gzipped data.

It didn't work; the decompression overhead made the compressed bytecode files even slower than the uncompressed bytecode files. Tim Bunce pointed out that this is a bad test, since a lot of the modules that the byte compiler and byte loader will load are things that a larger script would have needed anyway, but that were not present at all in Nicholas' hello.pl test.

Nicholas Clark:
  1. Currently byte code is far too large compared with its script. (I think someone else mailed the list with improvements that reduce the amount of op tree information saved as bytecode, which should help.)
  2. For a simple script bytecode is slower than pure perl.
  3. Using a general-purpose data compression algorithm (zip deflation), bytecode only compresses by a factor of 3, which still leaves it much larger than its script.
  4. Decompression filters written in perl run very slowly. (But they are much easier to write than those in C.)
  5. Although a decompression filter written in C is much faster, it still doesn't quite match the speed of reading and parsing the bytecode, let alone the original script (for this example). However, it's close to uncompressed bytecode.

Nicholas' message contained many other interesting details about bytecodes.

Read about it.

Byte-Order Marks Continue

Simon produced another revision of his patch to make Perl automatically handle source code written in various flavors of Unicode encoding. He went to a lot of trouble to get the lexer to recognize the BOMs only at the very beginning of the file. (One startling trivium here: If you have

        ...some code here...
        #line 1
        #!/usr/bin/perl -wT
        ...more code...

The #line 1 fools the lexer into thinking that what follows is the first line, and then Perl interprets the `command-line' options on the following comment even though they're not really on the first line of the file.)

Simon: Yes, the part in pp_ctl does have to be this complicated and order is important. If you're going to lie to the lexer, you have to be pretty damned convincing.

Apparently Simon later posted a different revision that was simpler and used tell to see if the BOM was really at the beginning of the file, but it didn't appear on p5p.

Slurp Bug

Last month Joey Hess reported a bug in Perl's slurping; it was reading line by line and it shouldn't have been.

Original report and test case.

Nobody has investigated this yet, and Sarathy said that was a pity, which I think should be interpreted as a hint that someone should have a look at it.

EPOC Port

Olaf Flebbe posted some enhancements to his port for EPOC, which is an OS for palmtops and mobile phones. (See README.epoc in the Perl distribution for more details.)

The patch.

README.hpux

Jeff Okamoto contributed a new one.

Here it is.

Paths in MacPerl

Last week Peter Prymmer contributed a large patch that attempts to make the test suite work better on Macintoshes by replacing a lot of Unix-style pathnames like '../lib' with constructions of the form ($^O eq 'MacOS') ? '::lib:' : '../lib'. This sparked a discussion about better ways to approach this problem. Chris Nandor suggested a paths.pl file which the suite could require and that would set up the path strings correctly. He pointed out that if this library were in the same directory as the script that required it, the require would work on any platform. He also said that having native support for path translations was probably a bad idea. (This would mean that require 'foo/bar.pm' would actually load foo:bar.pm, which is the 'right thing' unless what you actually wanted was to require a file named foo/bar.pm.)
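
A sketch of the sort of helper Chris had in mind (the file name and variable are mine, not his):

        # t/test_paths.pl -- each test says:  require 'test_paths.pl';
        # Because this file lives next to the tests, the relative require
        # works the same way on every platform.
        use vars qw($LIB);
        $LIB = ($^O eq 'MacOS') ? '::lib:' : '../lib';
        1;

A test would then refer to $LIB (for example, unshift @INC, $LIB) instead of hard-coding '../lib'.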

Matthias reported on what he actually does in MacPerl.

It appeared that the issue about what to do about the test suite went unresolved. I do not know yet if Peter's big patch went in.

Non-destructed anonymous functions

Last week Rocco Caputo reported that his blessed coderefs were not being DESTROYed, even at interpreter shutdown time. Nick Ing-Simmons produced an explanation. I suppose it could be called a feature.

The explanation.

Extensions required for regression tests

Nicholas Clark pointed out that if you don't build all the Perl standard extension modules, some of the regression tests fail, and that the regression tests shouldn't depend on the extension modules unless they are explicitly testing the extension modules.

For example, the io/openpid.t test file wants to use the Fcntl module; if you decided not to build Fcntl, it barfs. Nick offered to make a patch, and Sarathy agreed it would be a good idea. I have not seen the patch appear yet.

Eudora Problem

The problem with Eudora mangling patch files turns out to be more complicated than I originally reported. If you use Eudora, you should probably read the following discussion.

Eudora discussion.

crypt docs

Ben Tilly made a trivial change to the documentation for the crypt function that sparked a long and irrelevant discussion about password security policy.

Magic Auto-Decrement

The idle and pointless magic decrement discussion continued.

Various

A large collection of bug reports, bug fixes, non-bug reports, questions, answers, and a small amount of spam. I think there was flamage, but it was in the thread I skipped.

Until next week I remain, your humble and obedient servant,


Mark-Jason Dominus

Return of Program Repair Shop and Red Flags



Unprogramming

A few weeks ago I got mail from Bruce, a former student who wanted to take a number like 12345678 that had come out of a database, and to format it with commas for printing in a report, as 12,345,678. I referred him to the solution in the perlfaq5 man page, but the solution there uses a rather bizarre repeated regex, and perhaps that's why he decided to do it himself. Here's the code he showed me:

     1  sub conversion
     2  {
     3     $number = shift;
     4     $size = length($number);
     5     $result = ($size / 3);
     6     @commas = split (/\./, $result);
     7     $remain = ($size - ($commas[0] * 3));
     8     $pos = 0;
     9     $next = 0;
    10     $loop = ($size - $remain);
    11     while ($next < $loop)
    12     {
    13        if ($remain > 0)
    14        {
    15           $section[$pos] = substr($number, 0, $remain);
    16           $next = $remain++;
    17           $remain = 0;
    18           $pos++;
    19        }
    20        $section[$pos] = substr($number, $next, 3);
    21        $next = ($next + 3);
    22        $pos++;
    23     } 
    24     $loop = 0;
    25     @con = ();
    26     foreach (@section) 
    27     {
    28        $loop++;
    29        $cell++;
    30        @tens = split (/:/, $_);
    31        $con[$cell] = $tens[0];
    32        if ($loop == $pos)
    33        {
    34           last;
    35        }
    36        $cell++;
    37        $con[$cell] = ",";
    38     }
    39     return @con;
    40  }

Bruce described this as ``Probably pretty crude and bulky.'' I'd have to agree. 40 lines is pushing the limit for a readable function, and there's no reason why something this simple should have to be so large. Bruce has done a lot of programming work here and produced a lot of code; let's see if we can unprogram some of this and end up with less code than we started with.

    5      $result = ($size / 3);
    6      @commas = split (/\./, $result);
    7      $remain = ($size - ($commas[0] * 3));

Right up front is probably the single weirdest piece of code in the whole program. I know it's weird because the first time I saw it I realized what it did right away, but when I revisited the program a couple of weeks later, I couldn't figure it out at all. Bruce knows that the digits in the original number will be divided into groups of three, with a group of leftover digits at the beginning. He wants to know how many digits, possibly zero, will be in that first group. To do that, he needs to divide by three and find the remainder.

Bruce has done something ingenious here: The code here divides $size by 3, and supposes that the result will be a decimal number. Then it gets the integer part with split, splitting on the decimal point character!

Perl already has a much simpler way to get the remainder: The % operator:

        $remain = $size % 3;   # This gets the remainder after division.

It's also worth remembering that Perl has an int() function which throws away the fractional part of a number and returns the integer part. This is essentially what the split was doing here.
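
Putting both together for this program's purposes:

        my $number = 12345678;
        my $size   = length($number);
        my $groups = int($size / 3);   # number of complete three-digit groups
        my $remain = $size % 3;        # digits left over at the front, possibly zero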

A useful rule of thumb is that it's peculiar to treat a number like a string, and whenever you do pattern matching on a number, you should be suspicious. There's almost always a more natural way to get the same result. For example

        if ($num =~ /^-/) { it is less than zero }

is bizarre and obfuscatory; it should be

        if ($num < 0) { it is less than zero }

     8     $pos = 0;
     9     $next = 0;
    10     $loop = ($size - $remain);
    11     while ($next < $loop)
    12     {
    13        if ($remain > 0)
    14        {
    15           $section[$pos] = substr($number, 0, $remain);
    16           $next = $remain++;
    17           $remain = 0;
    18           $pos++;
    19        }
    20        $section[$pos] = substr($number, $next, 3);
    21        $next = ($next + 3);
    22        $pos++;
    23     }

Now here we have a while loop with an if condition inside it. The if condition is that $remain be positive. Inside the if block, $remain is set to 0, and it doesn't change anywhere else in this section of code. So we can deduce that the `if' block will only be executed on the first trip through the loop, because after that, $remain will be 0.

That suggests that we should do the if part before we start the loop, because then we won't have to test $remain every time. Then the structure is simpler, because we can move the if block out of the while block, and even a little shorter because we don't need the code that manages $remain:

    $next = 0;
    $pos = 0;
    if ($remain > 0)
    {
       $section[$pos] = substr($number, 0, $remain);
       $next = $remain;
       $pos++;
    }
    $loop = ($size - $remain);
    while ($next < $loop)
    {
       $section[$pos] = substr($number, $next, 3);
       $next = ($next + 3);
       $pos++;
    }

In the while loop we see another case of a common beginner error I pointed out in the last article. Whenever you have a variable like $pos that only exists to keep track of where the end of an array is, you should get rid of it. Here, for example, the only use for $pos is to add a new element to the end of @section. But the push function does that already, without needing $pos. Whenever you have code that looks like

        $array[$pos] = SOMETHING;
        $pos++;

you should see if it can be replaced with

        push @array, SOMETHING;

97% of the time, it can be replaced. Here, the result is:

    $next = 0;
    if ($remain > 0)
    {
       push @section, substr($number, 0, $remain);
       $next = $remain;
    }
    $loop = ($size - $remain);
    while ($next < $loop)
    {
       push @section, substr($number, $next, 3);
       $next = ($next + 3);
    }

At this point in the code, I had an inspiration. $pos was just a special case of a more general principle at work. In every program, there are two kinds of code. Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer. This second kind of code is just scaffolding. The goal of all programming is to reduce this accidental, synthetic code so that the natural, essential code is more visible.

$pos is a perfect example of synthetic code. It has nothing to do with adding commas to an input. It's only there because we happened to use an array to hold the chunks of the original input, and arrays are indexed by numbers. Array index variables are almost always synthetic.

Good languages provide many ways to reduce synthetic code. Here's an example. Suppose you have two variables, a and b, and you want to switch their values. In C, you would have to declare a third variable, and then do this:

        c = b;
        b = a;
        a = c;

The extra variable here is totally synthetic. It has nothing at all to do with what you really want to do, which is switch the values of a and b. In Perl, you can say

        ($a, $b) = ($b, $a);

and omit the synthetic variable.

It's funny how sometimes it can be so much easier to think about something once you have a name for it. Once I had this inspiration about synthetic code, I suddenly started seeing it everywhere. I noticed right away that $next and $loop were synthetic, and I started to wonder if I couldn't get rid of them. Not only are they both synthetic, but they're used for the same purpose, namely to control the exiting of the while loop. Two variables to control one loop is excessive; in most cases you only need one variable to control one loop. If there are two, as in this case, it's almost always possible to eliminate one or combine them. Here it turns out that $loop is just useless, and we could have been using $size instead. $size is rather natural, because it's simply the length of the input, and using length($number) is more natural still.

    if ($remain > 0)
    {
       push @section, substr($number, 0, $remain);
    }
    $next = $remain;
    while ($next < length($number))
    {
       push @section, substr($number, $next, 3);
       $next += 3;
    }

Now the condition on the while loop is much easier to understand, because there's no peculiar and meaningless $loop variable: ``While the current position in the string is not past the end, get another section.''

I also changed $next = $next + 3 to $next += 3 which is more concise.

Now we have two variables, $next and $remain, which only overlap at one place, and at that one place (the assignment) they mean the same thing. So let's let one variable do the work of two:

    if ($remain > 0)
    {
       push @section, substr($number, 0, $remain);
    }
    while ($remain < length($number))
    {
       push @section, substr($number, $remain, 3);
       $remain += 3;
    }

The code is not going to get much simpler than this. We have turned twelve lines into five.

Assembling the Result

    24     $loop = 0;
    25     @con = ();
    26     foreach (@section) 
    27     {
    28        $loop++;
    29        $cell++;
    30        @tens = split (/:/, $_);
    31        $con[$cell] = $tens[0];
    32        if ($loop == $pos)
    33        {
    34           last;
    35        }
    36        $cell++;
    37        $con[$cell] = ",";
    38     }
    39     return @con;
    40  }

Here we want to construct the result list, @con, from the list that has the sections in it. I couldn't understand what the @tens array is for, or why the code is looking for : characters, which don't normally appear in numerals. The original program turns 1234:5678 into 123,4,678, which I can't believe was what was wanted. I asked Bruce what he was up to here, but I didn't have enough context to understand his response---I had the impression that it was an incompletely implemented feature. So I took it out and left behind a comment.

$cell is another variable whose only purpose is to track the length of an array, so we can eliminate it by using push the same way we did before:

    foreach (@section) 
    {
       $loop++;
       push @con, $_;
       # Warning: no longer handles ':' characters

       if ($loop == $pos)
       {
          last;
       }
       push @con, ',' ;
    }
    return @con;

Now the only use of the $loop variable is to escape the loop before adding a comma to the last element. Let's simply get rid of it. Then when we leave the loop, there is an extra comma at the end of the array, but it's easier to clean up the extra comma afterwards than it was to keep track of the loop:

    foreach (@section) 
    {
       push @con, $_, ',';
    }
    pop @con;
    return @con;

Again, I don't think this loop is going to get much smaller.

What we have now looks like this:

    # Warning: No longer handles ':' characters
    sub conversion 
    { 
      my ($number) = @_;
      my $remain = length($number) % 3;

      if ($remain > 0)
      {
         @section = (substr($number, 0, $remain));
      }

      while ($remain < length $number)
      {
         push @section, substr($number, $remain, 3);
         $remain += 3;
      }

      foreach (@section) 
      {
         push @con, $_, ',';
      }
      pop @con;
      return @con;
    }

This is a big improvement already, but more improvement is possible. @section is a synthetic variable; it's only there so we can loop over it, and then we throw it away at the end. It would be better to construct @con directly, without having to build @section first. Now that the code is so simple, it's much easier to see how to do this. In the original program, one loop breaks the input into chunks, and the other loop inserts commas. We can replace these two loops with a single loop that does both tasks, so that this:

      while ($remain < $size)
      {
         push @section, substr($number, $remain, 3);
         $remain += 3;
      } 

      foreach (@section) 
      {
         push @con, $_, ',';
      }

Becomes this:

      while ($remain < $size)
      {
         push @con, substr($number, $remain, 3), ',';
         $remain += 3;
      }

Eliminating the second loop means that the special case if block at the beginning must insert its own comma if it is exercised:

      if ($remain > 0)
      {
         push @con, substr($number, 0, $remain), ',';
      }

Eliminating the second loop also has the pleasant side effect of making the function faster, since it no longer has to make two passes over the input. The finished version of the function looks like this:

    # Warning: No longer handles ':' characters
    sub conversion 
    { 
      my ($number) = @_;
      my $size = length($number);
      my $remain = $size % 3;
      my @con = ();

      if ($remain > 0)
      {
         push @con, substr($number, 0, $remain), ',';
      }

      while ($remain < $size)
      {
         push @con, substr($number, $remain, 3), ',';
         $remain += 3;
      }

      pop @con;
      return @con;
    }

This is a big win. A 30-line function has turned into a 12-line function. Formerly, it had nine scalar variables and four arrays; now it has three scalars and one array. If we wanted, we could reduce it more by eliminating $size, which is somewhat synthetic, and using length($number) in the rest of the function instead. The gain seemed small, so I chose not to.

In good code, the structure of the program is in harmony with the structure of the data. Here the structure of the code corresponds directly to the structure of the result we are trying to produce. We wanted to turn an input like 12345678 into an output like 12 , 345 , 678. There is a single if block up front to handle the special case of the initial digit group, which might be different from the other groups, and then there is a single while loop to handle the rest of the groups.

The really funny thing about this code is that I hardly had to use any of Perl's special features at all. The cleanup came entirely from reorganizing the existing code and removing unnecessary items. Of course, Perl features like push made it easy to eliminate synthetic variables and other code that would have been necessary in other languages.

For a more 'Perlish' (and unfortunately obfuscated) solution to this problem see the FAQ.


Red Flags

A red flag is a warning sign that something is wrong. When you see a red flag, you should immediately consider whether you have an opportunity to make the code cleaner. I liked this program because it raised many red flags.

Eliminate synthetic code

Some parts of your program relate directly to the problem you are trying to solve. This is natural code. But some parts of the program relate only to other parts of the program; this is synthetic code. An example is a loop control variable. You can tell from its name that it's synthetic. It's not there to solve your problem; it's there to control a loop, and the loop is there to help solve the problem. You might care about the loop, but the control variable is an inconvenience, only there for bookkeeping.

Beware of special cases in loops

If you have a loop with a special test to do something on the first or last iteration, you may be able to get rid of it. First-iteration code can often be hoisted out of the loop into a separate initialization section. Last-iteration code can often be hoisted down and performed after the loop is finished. If the loop runs a little too much code, undoing the extra is often simpler than trying to escape the loop prematurely.
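
The comma-joining problem earlier in this article is the classic case. As a sketch, letting the loop overshoot and undoing the damage afterwards:

        # Build "a, b, c" without testing for the last iteration inside the loop:
        my @items = qw(a b c);
        my $out   = '';
        $out .= "$_, " for @items;   # the loop adds one separator too many
        $out =~ s/, \z//;            # undo the extra after the loop is finished
        print "$out\n";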

Don't apply string operations to numbers

Treating a number as a string of digits is a bizarre thing to do, because the digits themselves don't really have much to do with the value of the number. Doing so creates a string version of the numeric quantity, which usually means you went down the wrong path, because Perl numbers are stored internally in a numeric form that supports all the numeric operations you could want.

If you used a regex, or split, substr, index, length, or any other string function on a number, that is a red flag. Stop and consider whether there might be a more natural and robust way to do the same thing using only numeric operations. For example, this is bizarre:

        if (length($number) > 3) { large number }

It is more natural to write this instead:

        if ($number >= 1000)      { large number }

An exception to this is when you really are treating a number as a string, such as when you're writing it into a fixed-width field. Examples of both cases occur in the program in this article. In the original, using split to compute the modulus operator was unnatural. In the final version, we do indeed apply length() and substr() to $number, but that's because we really do want to treat the number as a digit string, splitting it up into groups of three digits and inserting commas.

Still, the red flag is there, and so we should see what happens if we heed it, and try to replace length() and substr() with truly numeric operations. The result:

        # Warning: No longer handles ':' characters.
        sub convert {
          my ($number) = shift;
          my @result;
          while ($number) {
            # Zero-pad every group that still has digits to its left,
            # so that 1000005 becomes 1,000,005 rather than 1,0,5:
            push @result, ($number >= 1000 ? sprintf("%03d", $number % 1000)
                                           : $number % 1000), ',';
            $number = int($number/1000);
          }
          pop @result;      # Remove trailing comma
          return reverse @result;
        }

Notice again how it's easier to pop an extra comma off the end of @result afterwards than it is to special-case the first iteration of the loop to avoid adding the comma in the first place. The code now has eight lines, one scalar, and one array. I think this qualifies as a win! The lesson to learn here is: When in doubt, try writing it both ways!

This Week on p5p 2000/06/11



Notes

You can subscribe to an email version of this summary by sending an empty message to p5p-digest-subscribe@plover.com.

Please send corrections and additions to mjd-perl-thisweek-YYYYMM@plover.com where YYYYMM is the current year and month.

Next week's report will be late, since I will be bending space and time to attend both San Diego Usenix and YAPC. If the fabric of the universe survives my ill-advised meddling, the reports should resume the following week.

This was the quietest week I can remember. Very little seemed to happen.

Byte-Order Marks

Unicode files may begin with the special Unicode character U+FEFF. That is so that if the byte order gets reversed somehow (as with a big-endian to little-endian transformation) you can recognize that that has happened because the initial character will be U+FFFE, which is guaranteed to never be assigned.

Tim Burlowski saved a Perl program file with the UTF8 encoding under Windows, and when he tried to run the script, Perl complained about the initial U+FEFF. (Unrecognized character \xEF..., because U+FEFF encodes to "\xEF\xBB\xBF" under UTF-8.) Tim asked if Perl shouldn't know to ignore this. Sarathy agreed, and Simon provided a patch, which also enables Perl to read a UTF-16-encoded source code file.

The patch.

Magic Auto-Decrement

Someone asked why there isn't one. This sparked a long discussion of how it might work. (What is 'a'--? What is 'aAa00'--?)

There was a lot of idle discussion, and no patch, so probably nobody really cares.

Bug Reports

Richard Foley coughed up a lot of bug reports that had gotten lost somehow. So there was a lot of miscellaneous stuff. Some of the bug reports related to configuration errors, and some were genuine. Some attracted patches, others did not. It seemed to me that this batch of bug reports contained more than the usual number of weird oddities. For example:

Weird oddity.

Some of the non-oddities that remain unfixed follow. In an attempt to encourage more people to try to fix bugs, I tried here to select some bugs that seemed not too difficult to solve. So if you have ever wanted to become a Perl core hacker and you wanted a not-too-hard task to start on, the following bugs might be good things to work on.

If you are interested in trying to fix one of these, and you need help, or you don't know how to start, please do send me email and I will try to assist you.

Core Dump I

Here is a bug that makes Perl dump core. Sarathy reduced Wolfgang Laun's small test case to a very small test case.

Test Case.

Another Test Case.

Core Dump II

Here is another core dump, this one on an improper pseudohash reference.

Test Case.

Class::Struct objects misbehave with ->isa()

If $foo is a Class::Struct object, and you call ->isa('UNIVERSAL') on it, you get the correct answer (true) the first time, and the wrong answer (false) on subsequent calls.

Test Case.

Data::Dumper Weirdness

Victor Insogna got weird output from Data::Dumper. The test case is very simple but it's not entirely clear to me whether the bug is in Data::Dumper itself or if Perl is actually constructing a bizarre value.

Test Case.

Blessed coderefs never DESTROYed

Rocco Caputo reported that if you bless a coderef into a package with a destructor function, the destructor is never called, not even at program termination.

Test Case.

Code compiled incorrectly

Barrie Slaymaker reported that in 5.6.0,

        1 while ( $a = ( $b ? 1 : 0 ) )

appears to be compiled as if you had written

        '???' while defined($a = $b ? 1 : 0)

apparently as an incorrect application of the same transformation that makes

        while (readdir D) 

into

        while (defined(readdir D))

MacPerl Test Suite Patches

Peter Prymmer sent a big patch that attempts to make the test suite work better on Macintoshes by replacing a lot of Unix-style pathnames like '../lib' with constructions of the form ($^O eq 'MacOS') ? '::lib:' : '../lib'.

The patch.

Why / is not ignored in comments in /.../x constructions

People are often surprised that

        $string =~ m/a+
                     foo  # some comment here that mentions /
                     w{3}
                    /x;

is a syntax error; the / in the 'comment' terminates the regex prematurely. They expected it to be ignored, since it is in a comment.

The way Perl handles /.../x is that it parses the regex as usual, and locates the terminating slash as usual, and then hands off the regex to the regex engine for parsing, with a flag saying 'by the way, this regex was marked with the /x modifier'. The regex is then parsed accordingly. But the main Perl parser is totally unaware of the meaning of /x and in particular it uses the same old logic to determine where the end of the regex is, and doesn't realize that it is supposed to ignore the 'comment'. In other words, the comment is a comment for the regex compiler, but not for the Perl parser.

This is well-known to many people, and I mention it here because Ben Tilly came up with a really nice example of why this problem can't be 'fixed'. Here it is:

        if ($foo =~ /#/) {
          # Do something
        }
        # Time passes
        print "eg.  In DOS you would use /x instead of -x\n";

Now, where does that regex end?

Various

A large collection of bug reports, and a small collection of bug fixes, non-bug reports, questions, answers, and spam. No flames and little discussion.

Until next week I remain, your humble and obedient servant,


Mark-Jason Dominus

Adventures on Perl Whirl 2000



Can a tech conference on a luxury cruise boat possibly be legitimate? Sure it is, and then some.

This past Memorial Day, I joined about 200 attendees on Perl Whirl 2000, the inaugural Geek Cruise. All in all, we found that not only was there a mind-blowingly good set of tutorials available, but there was more time to meet and chat with fellow Perl hackers than there is at most conferences.

Honestly, I knew I was going on the Perl Whirl, but the idea took a while to set in. When I received my tickets, it still seemed unreal. When I started packing my bags, it seemed like just another conference. When I boarded the plane, it was just another trip. When I arrived at my hotel in Vancouver, reality began to set in: I'm going on a Cruise. To Alaska. Talking about Perl all along the way. Soon, we would board Holland America's m.s. Volendam and kick off Perl Whirl 2000.

From the very beginning, the cruise exuded a palpable air of calm. Gone was the normal harried conference mentality where every minute is either spent agonizing over which great presentation to attend or which great conversation to continue in the hall. From the moment we left the harbor, everyone had a laid-back attitude, as if to say ``I can find you later, we have a week together and plenty of time to talk''. And that made all the difference.

Anyone who thought this wasn't a serious conference was in for a big surprise. Although there were many social events, sometimes starting at 10 or 11 PM, the tutorials started at 8:30 AM and continued until 5 PM. (The schedule was just packed, and I'm not the only one who was hurting around 8 AM, especially when we had to adjust our clocks into and out of Alaska Time.) Three full days of tutorials were offered across three to five tracks, with the middle day being split into two half-day chunks. Ten extra hours of conference programming were available in roughly ninety-minute chunks before dinner on some nights. Then there was the B-Movie marathon. Then there were the cocktail parties. Then there were all of the other events organized by the ship's cruise director. Yes, the schedule was truly packed.

Even with such a dense schedule, there was plenty of time to unwind. We could all eat a nice long, lingering dinner in the main dining room (or take dinner on the Lido deck in shorts and T-shirts, or just order room service), and gradually reconvene in the Crow's Nest after dinner for a few drinks (or not), and lots of after-dinner conversation until the dusk finally gave way to pre-dawn twilight around 3am.

What were the seminars themselves like? Some of the presenters were incredibly popular, like Mark-Jason Dominus, Tom Christiansen, Tim Bray and Lincoln Stein. The bits and pieces I sat through were simply stellar, without exception. Many people I talked to echoed the common conference complaint, ``there's too much good stuff to see, and too many great presentations conflict with each other''. Sadly, every conference organizer has heard this before, and such conflicts are bound to appear as soon as a schedule is produced.

While I didn't sit through most of the seminar program, I did sit through the entire pre-dinner program. Here attendees got to hear some great things people are doing with Perl. The first pre-dinner talk was by Steven Roberts who talked about his Microship project. Some people may remember Steve as the man behind Behemoth, the recumbent bicycle and trailer with more computing power than an average dentist's office. Steve's current project is a pair of boats with similar amounts of computing power that will soon travel the inland and coastal waters of the United States. Steve's team recently realized that projects such as this live and die on volunteer effort, and while it is quite easy to find Perl hackers to write and upgrade Perl CGI programs, it is quite difficult to find NewtonScript programmers willing to donate their time to a project like this. This observation drove Steve and his team to replace a Newton-based management console with a simple web browser running against an on-board linux-based web server.

Steve's talk went over incredibly well, and I was among a small group of people on the boat who couldn't stop talking about it all week. The enthusiasm behind this ninety-minute presentation was so much greater than expected that it made everyone involved with GeekCruises realize that we had never set aside any time for BOFs (``birds of a feather'' sessions). Once again, the unharried, laid-back atmosphere came to our rescue, and we found a block of time that didn't conflict with any other programming and just met beside the pool (conveniently placed near a totally unhealthy amount of dessert).

Two nights later, we were privileged to hear John Clutterbuck discuss how Perl saved the Land Registry System in Scotland, displacing Java, cross-platform toolkits and 4GL environments in the process. The crux of John's talk was about implementing a typical n-tier client-server system. Instead of using a more ``traditional'' tool like Java, Visual Basic or other such environments, John's team used Perl/Tk to create graphical client programs. One of the major reasons why Perl was a good choice for this project was that Perl was already in use internally to munge incoming data feeds. When it was time to implement this project, Perl/Tk was not another new skill requiring downtime and retraining; it was just another library to use alongside the DBI.

There were other great non-seminar presentations. Tim Bray talked about standards, standards organizations, and encouraged everyone to complain loudly when our vendors don't implement standards that matter to us, implement them poorly, or misimplement them intentionally. The next night, Larry Wall guest-hosted Jon Orwant's Internet Quiz show, which was quite fun. The cruise program wound up about a day later, when Larry was available for an open-ended question and answer session. This Q&A was quite refreshing, because the questions were frequently deep, often thought-provoking, and always brought out an honest, uncensored answer from Larry (unless his wife Gloria pre-empted him).

Oh, did I mention that this was a cruise? With all these Perl-ish events all week, it was easy to forget at times that we were in the midst of some of the most beautiful parts of North America. While we were cruising north, we witnessed an endless parade of tree-lined mountains of British Columbia and Alaska. Our itinerary also brought us into Glacier Bay, where we saw the big ice break apart in the morning and watched for whales in the afternoon.

And what's the point of cruising to Alaska if you're not going to get off the boat and actually see Alaska? Our itinerary offered three ports of call. Our first port was Juneau, the largest city in southeast Alaska. Although Juneau may be too small to have a Kinko's, it is plenty large enough to have one of the finest breweries in North America. Around forty Perl Whirl passengers gathered together to go on a pub crawl around Juneau while we were docked. Our first stop was the Alaska Brewing Company, voted best brewery in America three years running at the Great American Beer Festival (they are now enshrined in the GABF's ``Hall of Foam''). After leaving the brewery, we visited some other local watering holes and passed by the wondrous Mendenhall Glacier. Our crawl ended up near the dock at the Mt. Roberts tram. At the top of Mt. Roberts, we found ourselves in bright sunlight at 9 PM on the last day in May, making snowballs behind the observation deck. When life gives you a chance to make snowballs in Alaska when it's warm enough to wear shorts and sandals, it's best to take advantage of the opportunity.

The next day, we found ourselves in Skagway, a small town north of Juneau where the main industry is obviously tourism. The helicopter pad at the end of the dock says so loud and clear. At this point in our travels, I must admit that I welcomed a quiet day wandering through this small town, perhaps reading a bit of the Cryptonomicon on a sunny patch of turf. After a quick stroll, I bumped into Tom Christiansen, who asked me if I was up for a little walk. I briefly forgot that Tom lives in Boulder, Colorado; when he says ``little walk'', most other people interpret that as ``a hike in the woods''. I joined Tom, and we had a wonderful afternoon wandering through a lush green trail that started at the edge of Skagway. When we reached our destination, we were surrounded on all sides by the largest fjord in North America. All in all, it was the most enjoyable day I spent on the cruise.

After our wanderings through Glacier Bay, we wound up in Ketchikan, our last port of call. I decided it was time to wander aimlessly with my friends Monty and Julie Taylor. Julie peeled off early that morning and would later catch a 34-inch salmon on a fishing excursion. Monty and I were up for some local flavor and eventually found Anabelle's, a great little restaurant that serves a stunningly good seafood chowder and beer-battered halibut and chips. Paired with a fresh pint or two of Alaskan Amber, it was easily the best meal I had all week, and the chefs on board the Volendam provided some stiff competition. If I ever return to Ketchikan, you can find me at Anabelle's.

Speaking of excursions, I didn't sign up for any mountain bike rides, helicopter rides, sea-plane tours, hikes or fishing. Lots of people did, but I'm a city boy who likes meandering through town, and the occasional hike in the woods. And I had a great time doing just that.

All in all, I had a wonderful time on Perl Whirl 2000. The Perl was great and Alaska was too (or maybe the other way around). Will I return to Alaska again? Perhaps. Will I return to Alaska on a future Perl Whirl? Just try and stop me.


  • If you haven't seen it already, you might want to check out the 31 things Tim Bray learned on the cruise. It has pictures, including one of Larry in one of his many tuxedos.
  • It turns out to be surprisingly easy to keep up an 8 AM - 11 PM schedule when the sun doesn't start setting until 9 PM and then comes up again at 3 AM.
  • I can attest to the laid-back attitude on the cruise. Everyone I met was kind and friendly, which isn't always the case at shorter conferences in less delightful venues.
  • In Glacier Bay, if you were quiet, you could hear the glacier creaking and popping as it slid into the sea. I had heard stories of glaciers calving, which is when a piece breaks off and falls into the water. (This is where icebergs come from.) I had always imagined that this was a comparatively rare event, that if you watched the glacier every moment for a whole day, you might, if you were lucky, see a piece break off. Wrong! Stuff is breaking off the glacier all the time and you only have to watch for a few minutes before something happens.
  • Downsides of the cruise ship: Sometimes other cruise activities interfered with the conference. For example, Dori Smith's Javascript class was held in a small room just off the main dining room, where a wine-tasting session was in progress, and it was hard to hear Dori over the amplified voice of the wine expert. Similarly, one of my classes was in the room immediately adjacent to the casino---fortunately there was a door I could use to shut out the slot machine racket.
  • My favorite cruise story: I got to interrupt my regex class so that we could rush over to the windows and look at the whales off the side of the ship.
  • Finally: Adam says it was easy to forget that we were in the midst of some of the most beautiful parts of North America. It wasn't easy for me to forget, and I spent most of my free time just staring out of various windows.

--Mark Dominus

ANSI Standard Perl?

An Interview with HP's Larry Rosler



Larry Rosler was both editor of the draft standard and chairman of the Language Subcommittee for X3J11. He helped put 'ANSI' in front of C. He is also just another Perl hacker. Larry recently took time out of his busy schedule to share his thoughts on the value of standards, how Sun ought to handle Java and optimizing Perl code.

What's your background? How did you get interested in computers?

LR: My formal education was in physics; my research field was experimental nuclear physics. However, I soon learned that I was more interested in the equipment used to gather and analyze the data than in the physics. The equipment was essentially a digital computer, but there were no degrees in computer science in those days.

In my first major job, with Bell Labs in New Jersey, I worked on solid-state devices. I found that simulation was faster than building and measuring test devices. When others began to use the programs I wrote, I realized that my greatest professional satisfaction would be as a software 'toolmaker'. So I switched to building tools such as graphical design terminals, and then focused on programming languages and libraries.

I was taught by several "hardware" guys who got into programming somewhat reluctantly. Do you feel that novice programmers lose an important perspective without knowing what goes on under the hood?

LR: I taught the first course on C at Bell Labs, using a draft of K&R, which helped vet the exercises. The students were hardware engineers who were being induced to learn programming. They found C (which is 'portable assembly language') much to their liking. Essentials such as pointers are very clear if you have a machine model in mind.

Perl is at a higher level of abstraction, so the machine model isn't as necessary at first. But when you get to complex data structures, which require references (which are like pointers in C, but much safer), a grounding in addressing becomes useful.

In an ideal world, a student would first learn an abstract assembly language such as MIX (see Knuth, Vol. 1), do some useful exercises, then take on a higher-level language with the machine model in the back of the head.

When did you run into Perl? What did you think of the language the first time you saw it?

LR: I am a relative latecomer to Perl. I was experimenting with CGI programming using shell scripts (!) because they were better for rapid prototyping than C. Soon I discovered that Perl offered advantages similar to the shell, but was much more expressive (particularly in the manipulation of data structures) and much faster to execute.

Because of my familiarity with Unix commands such as 'sed', which made heavy use of regular expressions, and because of my C experience, I was quite comfortable with Perl syntax. The hardest adjustment was to learn to write code with as few Perl operations as possible, because of the costs of dispatching each instruction. The Benchmark module became the most important tool I used to learn how to write efficient (and hence, sometimes elegant) Perl. I also learned a lot from the newsgroup comp.lang.perl.misc, to which I eventually became able to contribute.

Often Perl claims to be efficient in the scarce resource of the programmer's time. It isn't often that people tune scripts for optimum performance. Are there a few tips you can give to new Perl programmers on how to squeeze out a little better runtime performance?

LR: 'Script' ne 'program'. When dealing with data sets that will grow (one hopes and expects), performance becomes important. I proposed a tutorial to Perl Conference 4.0 this year on this subject, but it wasn't accepted. Maybe next year.

  • 0. Don't optimize prematurely!
  • 1. Don't use an external command where Perl can do a task internally.
  • 2. Refine the data structures and algorithms. References: The Practice of Programming (Kernighan and Pike), Programming Pearls (Bentley), Algorithms with Perl (Orwant et al).
  • 3. After these are optimal, identify the remaining hot spots. (Judicious use of the time() and times() functions, or the Benchmark module.)
  • 4. Try to improve the hot spots by using functions such as map() and grep() instead of explicit loops; make regexes more explicit to minimize backtracking; cache intermediate results to avoid unnecessary recalculation (memoization), ... (A short Benchmark sketch illustrating this follows the list.)
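
As a small illustration of tips 3 and 4, here is a Benchmark sketch (the relative numbers will of course vary with the data and the machine):

        use Benchmark qw(cmpthese);

        my @nums = (1 .. 1000);
        cmpthese(-2, {                   # run each version for about two CPU seconds
            explicit_loop => sub { my @sq; push @sq, $_ * $_ for @nums; \@sq },
            map_version   => sub { [ map { $_ * $_ } @nums ] },
        });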

What's your favorite part of programming? Design or implementation (or other)?

LR: Happy users.

I understand you were the Chairman of the ANSI C committee. How did that come about?

LR: Not the chairman -- that was primarily an administrative function, and what I did was more technical. I was the chairman of one of the three major subcommittees (Language; the others were Library and Environment), and -- most important -- I was the editor of the Draft Standard. The hand that holds the pen controls the direction of the work and the results. :-)

This all happened because at Bell Labs I was one of the managers of the effort to turn C from a research vehicle to a commercially useful language, which began with internal standardization. Because of the demands of some major users, such as the US Government, commercialization required the development of a formal standard. My colleague Dennis Ritchie (who created C) didn't want to participate personally, but he was always active behind the scenes in reviewing and improving my efforts. He was quite satisfied with the final results.

A few years later, now working at Hewlett Packard, I found myself in the same position regarding C++. I persuaded Bjarne Stroustrup, the creator of C++, to support standardization, as a prerequisite for successful commercialization. He agreed, and took a very active role.

There's certainly no ANSI Perl. Does Perl need the same kind of official standardization that C got?

LR: I believe that it does, in order to increase its acceptability. Many organizations either cannot or will not endorse the use of unstandardized languages in their business-critical activities.

The current situation with Perl is better than it was with the other two languages I mentioned. Perl has one official open source for its implementation, whereas the others had multiple proprietary implementations, leading to different semantics for many language features. But this single 'official' Perl semantics has never been adequately characterized independent of the implementation, so is subject to arbitrary change.

Building on quicksand is acceptable for 'scripts' of limited longevity and applicability. It is not acceptable for 'programs' of significant commercial value. I think the lack of a firm, stable, well-defined foundation is the major inhibitor for the continuing commercial evolution of Perl. Of the major contributors to Perl, Ilya Zakharevich is most outspoken in his view that Perl is not (yet?) a 'programming' language!

Many of the people who contribute their efforts to the Perl community are interested in adding features, perhaps to overflowing. More people should devote their attention to firming up the semantics and making sure that the implementation conforms to those semantics, rather than the other way around.

Your point is well taken about focusing future Perl development on producing consistent semantics. I'm a bit surprised that you see this as a barrier to greater Perl acceptance. Some might suggest the interpretive nature of the language is a greater barrier ("Why can't I get a hello.pl.exe?"), or point to Perl's endearingly unique syntax, or to Perl's charmingly permissive OO implementation. Are there particular areas of Perl's semantics in which the inconsistencies are glaring?

LR: I am less concerned about individual programmers' decisions for their own projects and more concerned about major corporations or government agencies that reject Perl because of lack of formal support and lack of standardization.

An example of excellent work that is long overdue is Ilya's document 'perlnumber', new in Perl 5.6.0, which specifies for the first time the semantics of Perl's string-to-number conversions.
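
For a flavor of what such a specification has to pin down, here are a few conversions (my examples, not from the interview):

        use warnings;

        my $from_prefix = "3.14 apples" + 0;  # 3.14 -- the leading numeric prefix is
                                              # used, and an "isn't numeric" warning fires
        my $not_hex     = "0x10" + 0;         # 0 -- hex notation is not special in strings
        my $explicit    = hex("0x10");        # 16 -- an explicit conversion is needed
        my $compares    = "10" == 10.0;       # true -- == is numeric, so both sides
                                              # are converted before comparison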

I'm curious how one would standardize Perl when the language changes so quickly and committees move so slowly. Consider that three years ago, Perl was not threaded. Now, threads are standard, but their interface may change in the near future. Mr. Zakharevich continues to pull new regex constructs from the head of Zeus. Even more striking, Perl supports Unicode. Is there some way to stage the standardization so that it isn't painfully out-of-date? Would standardization necessarily slow down perl development?

LR: Sometimes standardization speeds up development, by forcing convergence on a specified way of doing things. Sometimes features are characterized and implemented during standardization (wide-character types for C, for example; the Standard Template Library and many other features for C++).

One way to view it is re Samuel Johnson's famous mot: "When a man knows he is to be hanged in a fortnight, it concentrates his mind wonderfully."

Are you following the debate amongst programmers who favor Sun lessening their control over Java (especially after Sun withdrew it from the independent standardization process)? Do you feel Java might benefit from a more open/chaotic development model (à la Perl), should it follow the C path of getting independent standardization, or is Sun doing the best thing for the language by keeping it "in-house"?

LR: My prejudice should be clear from what I have already written. AT&T handed control over C and C++ to ANSI/ISO technical committees, while keeping an active leadership role to ensure that the goals of the originators of the languages were met. Sun should do the same.

On the other hand, Microsoft's desire to 'embrace and extend' Java should be doomed by standardization, just as they failed to subvert the target independence of C and C++.

My understanding is that Microsoft is 'embracing' ActiveState's Perl. They will be shipping Perl with their "Services for UNIX 2.0". Do you see any chance for Perl to compete on the Windows platform as a replacement for Visual Basic? In particular, as application "glue"?

LR: Why not? Perl is superior to Visual Basic in every way imaginable.

Maybe a push from Microsoft will help overcome the barrier of acceptability that I focused on above.

Back to Java for a moment. Because they are both "web technologies", Perl and Java are often seen as competitors. There have been attempts, like Larry Wall's JPL, to provide better integration between these two beasties. What sort of utility do you see in such a marriage?

LR: Hard for me to say. Java forces OO programming from the beginning, and I have never needed to write an object in any program. This may be for me a conceptual hurdle that I need not overcome.

C++ provides sweetened C syntax and semantics (particularly toward bits and bytes), generic programming (templates), and OO if you want it. This can all be done with almost the efficiency of C.

Perl provides higher-level syntax and semantics (particularly toward strings and complex data structures), and OO if you want it. I know how to write efficient Perl when I have to.

Java, in my opinion, fills a much-needed gap between those two approaches. :-)

Speaking of competitors, Python is making a big splash this year. Although it seems to satisfy many of the same itches Perl does, its proponents point to its cleaner syntax and more traditional OO implementation as making it a "better Perl". What are your thoughts on Python?

LR: Whatever improvements Python may offer are not sufficient to give it a critical mass of programmers and programming support relative to Perl. There is an order-of-magnitude difference in this metric (programmers, modules, books, ...), and I don't think significant inroads are being made. And, as I said, to me OO is a big yawner.

What five people would you like to see learn Perl?

LR: Hah! Some colleagues are now trying to convince me to move on to Python. :-)

Have you had the chance to read Mr. Conway's _Object Oriented Perl_? I found I learned more about general Perl from it than OO techniques (which by no means is to say the book is inadequate in the latter department).

LR: Yes. It is a fine book. But it still didn't convince me about the necessity of objects. All I see is the performance-damaging complexity of the interfaces.

How much longer do you see the "Internet Goldrush" continuing?

LR: Judging by the recent performance of the NASDAQ, it may be over already.

You presented a paper at TPC on efficient sorting with Uri Guttman, a fellow Boston Perl Monger. How did you two meet?

LR: We met in comp.lang.perl.misc in March 1998 (soon after I began to post questions to the newsgroup). Uri educated me on 'hash slices', which he eventually turned into a tutorial. That July, I spotted his and his wife's names in the credits at the end of a movie, which exposed a common non-Perl interest (in Jews and Buddhism, to be specific). We met for the first time at TPC2 in San Jose that August.

The paper on sorting was developed entirely by email. The next time we met in person -- together with Damian Conway and our wives -- was at TPC3 in Monterey. We will meet again there at TPC4 this summer. Each of us is working on the Perl Golf tournament, which Uri organized.

Do you have any tips for new programmers dealing with management?

LR: Agree on useful metrics for progress and completion. Never report that your code is 90% complete, 90 days from completion, because that tends to be a steady-state description.

When estimating, never assume you will spend more than about half your time actually working on the project, because other things will always happen.

Has this rapid web development trampled on software quality control irreparably?

LR: Not only the web -- Microsoft. Who would have imagined that the world would tolerate an environment with the dismal quality of Windows? So now every company, even those such as Hewlett Packard with long-standing reputations for high quality, has to compromise in order to compete in a timely way.

Since you've been in the business for a respectable number of years, what are your five biggest pet peeves about programming?

LR: That's a toughie. How about four, in no particular order:

  • Extracting from the potential user a complete and useful specification of a problem.
  • Coping with buggy or inadequately documented tools.
  • Keeping things functioning as operating environments evolve. (No one wants anything to change except the things that that person wants changed. :-)
  • As I said above, evaporating expectations of quality.
    (1980's paradigm: If it's worth implementing once, it's worth implementing twice.
    1990's paradigm: Ship the prototype!
    2000's paradigm: Ship the idea!)

This Week on p5p 2000/06/04



Notes

You can subscribe to an email version of this summary by sending an empty message to p5p-digest-subscribe@plover.com.

Please send corrections and additions to mjd-perl-thisweek-YYYYMM@plover.com where YYYYMM is the current year and month.

This week's report is late because I got back from Vancouver very early Thursday morning. Fortunately there were only 166 messages last week.

Ilya Quits

At the end of the week, Ilya announced that he was departing p5p. This is a terrible loss to the Perl community.

Thank you, Ilya, for your tremendous contributions over the years and for all your hard work. Good luck in the future.

B::Bytecode is Ineffective

Benjamin Stuhl reported that he had been working on B::Bytecode, but that it turned out to be a net performance loss. He says:

Benjamin: The costs of having to do 3-4 times as much I/O (Math::Complex compiles to 300+ K from 80K) more than outweigh the costs of parsing the code.

To put it bluntly, I have serious doubts about the utility of B::Bytecode.

Nick Ing-Simmons said that he had had similar doubts for some time. But he also pointed out that with B::Bytecode you can compile all your source files into one bytecode file and ship it in one piece.

Nick: I have a low-tech B::Script module which collects all the *.pm files used by a "script" into one file and adds a wrapper which overrides require so that text is read from embedded hash rather than file system.
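
Nick's B::Script itself wraps require(), but the general trick is easy to sketch with the related @INC-hook mechanism (a hedged approximation for a reasonably modern perl, not his actual code):

        # Module source stored in a hash instead of on disk.
        my %EMBEDDED = (
            'My/Greeting.pm' =>
                'package My::Greeting; sub hello { "hello, embedded world" } 1;',
        );

        # require() calls a code ref placed in @INC with the file name it wants;
        # returning a reference to a string of source makes perl compile that
        # string instead of searching the file system.
        unshift @INC, sub {
            my (undef, $filename) = @_;
            return unless exists $EMBEDDED{$filename};
            my $source = $EMBEDDED{$filename};
            return \$source;
        };

        require My::Greeting;
        print My::Greeting::hello(), "\n";    # prints "hello, embedded world"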

Tim Bunce suggested compressing the bytecode. Stephen Zander recalled that Nicholas Clark had posted an almost-complete solution in October 1998. Nicholas suggested that the final problems might be soluble once line disciplines are implemented.

Root of this thread

In other B::Bytecode news, Benjamin Stuhl posted a patch that adds several features and generates smaller bytecodes.

Patch.

Ben's map Patch

Back in April Ben Tilly submitted a patch for map that was intended to make it perform better in the common case where the result was larger than the input. Sarathy said that he thought a better solution was possible, and provided some details about how it might work. The question is when to extend the stack to accommodate the results of each iteration of the map, and when to relocate the new items (which are placed at the top of the stack) to below the remaining arguments (which need to be at the top of the stack at the beginning of each iteration).

Sarathy's message.
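
The "common case" in question is a map whose block returns more items than it consumes; a trivial illustration (mine, not from the thread):

        my @words = qw(foo quux x);

        # Each input element produces two output elements (the word and its
        # length), so the result list is twice as long as the argument list
        # and the stack has to grow mid-flight.
        my @pairs = map { my $w = $_; ($w, length $w) } @words;
        # @pairs is ('foo', 3, 'quux', 4, 'x', 1)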

Ben Tilly disagreed; he said that he had subsequently decided that his patch was doing the best possible thing, and that he had considered Sarathy's approaches and decided that there was no good improvement, because the overhead of keeping track of extra information would be at least as big as the gain from not copying as many stack items.

Ilya said that Sarathy's solution seemed too complicated: The simplest thing to do is to pre-extend the stack at the beginning, leave all the result items on the top, and move them all down at once when the map is finished.

Sarathy ended the discussion by saying:

Sarathy: Anyway, I think the best way to settle the question is by implementing it. Anyone up for it?

If you're interested in trying, but you don't know how to get started, send me a note and I'll try to help.

split Oddities

Yitzchak Scott-Thoennes reported two bugs in split: First, if you use the ?...? delimiters, it is supposed to split to @_ even in list context, but does not. Mike Guy reported that this worked in Perl 4 but was apparently broken in Perl 5.000. He submitted a documentation patch to announce that the feature has been discontinued. Then some discussion followed concerning the use of split in scalar context, which is useful because it delivers the number of fields, but which produces an annoying warning. Ilya pointed out that he had submitted a patch to fix this, but it was ignored.
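
The scalar-context use that provokes the warning looks like this (a hedged sketch; on sufficiently recent perls the implicit split to @_, and the warning, are gone entirely):

        my $record  = 'alpha:beta:gamma:delta';
        my $nfields = split /:/, $record;   # 4 -- scalar context yields the count
                                            # (older perls also clobbered @_ here
                                            # and emitted the deprecation warning)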

Yitzchak's second bug was that the following construction does not deliver the 'Use of implicit split to @_ is deprecated' warning as you would expect:

        eval "split //, '1:2:3'; 1";

Apparently the key here is that the eval is in void context. There was no discussion and no patch.

scalar Operator Doesn't

Yitzchak also pointed out that if you use the scalar operator in void context, it provides void context to its argument, not scalar context. Ilya said this was not a bug, because a void context is a special case of scalar context. Simon Cozens disagreed and provided a patch.
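
A hedged way to see the issue (my sketch, not Yitzchak's test case) is to let wantarray() report the context that scalar's argument actually receives:

        sub which_context {
            warn defined wantarray
                ? ( wantarray ? "list context\n" : "scalar context\n" )
                : "void context\n";
        }

        which_context();            # "void context" -- nothing uses the return value
        scalar( which_context() );  # should say "scalar context", even though the
                                    # scalar() expression itself is in void context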

perlmodlib

Simon Cozens sent in a program to generate the perlmodlib man page automatically.

perlnewmod

Simon also sent in a new perlnewmod manual page, which explains how to write a module and submit it to CPAN.

Read it.

If you have suggestions about perlnewmod, please mail Simon.

Method Lookup Caching

Ben Tilly sent a long note about how to speed up inheritance and method lookups, but Sarathy replied that adding more solution hacks would be premature, since at present nobody knows why method calls are actually slow. Someone should have investigated this a long time ago. If you are interested in investigating this but do not know how to begin, please send me email.

Read about it.
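
If you want a starting point, one hedged first measurement (my sketch, not Ben's proposal) is simply to compare a method call against the equivalent direct subroutine call:

        use Benchmark qw(cmpthese);

        { package Counter;
          sub new  { my ($class) = @_; return bless { n => 0 }, $class }
          sub bump { my ($self)  = @_; return ++$self->{n} }
        }

        my $obj = Counter->new;

        cmpthese(-1, {
            method_call => sub { $obj->bump },           # goes through method resolution
            direct_call => sub { Counter::bump($obj) },  # plain subroutine call
        });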

Perl in Russia

Alexander S. Tereschenko posted on comp.lang.perl.misc that he and his team would translate most of the Perl documentation set into Russian; the article was forwarded to p5p.

Eudora Problem

Apparently Eudora has a clever feature that inserts extra spaces at the beginning of some lines when you compose a message in word-wrap mode. This means if you use Eudora to send in a patch, there is a good chance that the patch will not work.

If you use Eudora to send patches, make sure the word-wrap setting is turned off.

h2xs Backward Compatibility

Robert Spier pointed out that the 5.6 version of h2xs is not usable with any earlier version of Perl, because the files it generates contain our declarations and use warnings lines. It makes sense to use the 5.6 h2xs with an earlier Perl, because the new release of h2xs has been substantially improved. Robert later provided a patch (which he subsequently revised) that adds a -b backward-compatibility flag to h2xs.

The patch.

Various

A medium-sized collection of bug reports, bug fixes, non-bug reports, questions, answers, and a small amount of spam. No flames.

Until next week I remain, your humble and obedient servant,


Mark-Jason Dominus