# November 1999 Archives

## Sins of Perl Revisited

Long ago, back in the Perl 4 days, Tom Christiansen wrote an article called The Seven Deadly Sins of Perl, describing the biggest problems in the Perl language at the time. A few years later, he went over the list again to see how well the problems had been addressed by Perl 5. That was in 1996; I think it's time for another update. Where are we now?

Tom's original Seven Sins:

### 1. Implicit Behaviors and Hidden Context Dependencies

Every week or so someone shows up in comp.lang.perl.misc asking why this didn't work:

        while (<FILE>) {
          my $line = <FILE>;
          # ...
        }

"It seems to be skipping every second line," they say. Hmmm. Of course, it is skipping every second line, because when a <...> operator is the condition of a while loop, it magically turns into defined($_ = <...>). Each iteration therefore reads one line into $_ in the condition and a second line into $line in the body. But then a lot of people get confused in the other direction:

        if (something) {
          <FILE>;
          print "The next line is $_.\n";
        }

<...> only changes $_ when it is the condition of the while loop. I wonder if it would have been better to have <...> always read into $_ unless assigned somewhere? Oh well, it's much too late anyway.

I do see a trend away from these implicit behaviors. Here is an example, which is somewhat obscure, because of the trend. In a regex, ^ and $ normally match at the beginning and end of the string, respectively. The regex /^Hello/ looks for Hello at the beginning of the string. If your string contains many lines, separated by \n characters, you might like to have a regex that looks for Hello at the beginning of any line. The way you did this in Perl 4 (and also Perl 3, 2, and 1) was to set the variable $* to a true value. This changed the meaning of ^ and $ for every regex in the entire program. If you set it and forgot to set it back, you could get a nasty surprise when other regexes matched when they shouldn't have. If you were writing a library to be used by other programs, you'd have to put

        {
          local $* = 0;
          $target =~ /^regex/;
        }


around every one of your regexes that used ^ or $; otherwise you could get taken by surprise by an unexpected setting of $*. Of course, most libraries didn't do this.

Here's another example: Array indices normally begin at 0 because the value of $[ is normally 0; if you set $[ to 1, then arrays start at 1, which makes Fortran programmers happy, and so we see examples like this in the perl3 man page:

        foreach $num ($[ .. $#entry) {
          print "$num\t'",$entry[$num],"'\n";
        }


And of course you could set $[ to 17 to have arrays start at 17 instead of at 0 or 1. This was a great way to sabotage module authors.

Fortunately, sanity prevailed. These features are now recognized to have been mistakes. The perl5-porters mailing list now has a catchphrase for such features: they're called 'action at a distance'. The principle is that a declaration in one part of the program shouldn't drastically and invisibly alter the behavior of some other part of the program. Some of the old action-at-a-distance features have been reworked into safer versions. For example, in Perl 5, you are not supposed to use $*. Instead, you put /m on the end of the match operator to say that the meanings of ^ and $ should be changed just for that one regex. The expected flood of ex-Fortran programmers never materialized, so $[ is highly deprecated, and only affects code in the current file; it can't be used to sabotage modules.
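To make the /m point concrete, here is a minimal sketch. The sample string and variable names are made up for illustration; only the /m modifier itself comes from the discussion above.

```perl
use strict;
use warnings;

# Illustrative two-line string; "Hello" starts the second line, not the first.
my $text = "Goodbye\nHello world\n";

# Without /m, ^ matches only at the very start of the string.
my $plain = ($text =~ /^Hello/) ? 1 : 0;

# With /m, ^ also matches just after an embedded newline.
# The change applies to this one regex and nothing else in the program.
my $multi = ($text =~ /^Hello/m) ? 1 : 0;

print "without /m: $plain, with /m: $multi\n";   # without /m: 0, with /m: 1
```

Because the modifier is attached to the match operator, a library can use it without caring what the rest of the program does.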

Unfortunately, not enough sanity prevailed, since the ubiquitous $/ variable (input record separator string) is still with us and can cause plenty of sabotage all by itself. This may change soon. A hot topic on the perl5-porters mailing list is 'line disciplines', which means that each filehandle would have its own private notion of the line terminator, as well as its own notion of other properties of the input such as the character representation. This would allow it to translate EBCDIC data into ASCII transparently, or (more to the point) Latin-1 to Unicode. Still, the problem is not likely to go away for good; variables like $/, $\, $", and $, will be with us for a long time.

Prognosis: Action at a distance is now recognized as a bad thing, and the old action-at-a-distance features are being converted to safer versions. But the problem will probably continue for a long time, and very few modules presently take precautions against seeing a weird $" or whatever.
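As a sketch of the kind of precaution a module could take against a weird $/ or $", the routines below localize the variables they depend on. The names read_lines and format_list are hypothetical, and the in-memory filehandle (open on a scalar reference) is a stand-in for a real file that needs Perl 5.8 or later, so it postdates this article.

```perl
use strict;
use warnings;

# Hypothetical library routine: read every line from a filehandle,
# regardless of what the caller has done to $/.
sub read_lines {
    my ($fh) = @_;
    local $/ = "\n";          # our own record separator, restored on return
    my @lines = <$fh>;
    chomp @lines;             # chomp also consults $/
    return @lines;
}

# Hypothetical library routine: interpolate a list with single spaces,
# regardless of the caller's $".
sub format_list {
    my @items = @_;
    local $" = ' ';
    return "@items";
}

# A caller that has switched to slurp mode and changed the list separator.
my $data = "a\nb\nc\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my (@lines, $joined);
{
    local $/;                 # undef: slurp mode
    local $" = '-';
    @lines  = read_lines($fh);
    $joined = format_list(@lines);
}
close $fh;

print scalar(@lines), " lines: $joined\n";   # 3 lines: a b c
```

The cost is one local per variable per routine, which is exactly the discipline that most modules skip.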

### 2. To Paren || !To Paren?

By this Tom meant weird context traps having to do with the way parentheses (or lack of parentheses) can drastically alter the meaning of an expression. Hardly any beginning programmers understand context. Here's a nice example: This works:

        $n = sprintf("%d %d", 32, 49);

So does this:

        @n = (32, 49);
        $n = sprintf("%d %d", @n);


But if you try this you get a surprise:

        @args = ("%d %d", 32, 49);
        $n = sprintf(@args);

sprintf evaluates its first argument in scalar context, so @args turns into its element count and $n ends up as the string 3. (If you replace sprintf with printf, the surprise goes away. Ugh.)

People have problems with my $x = ... when they should have written my($x) = .... They have the opposite problem, where they write my ($line) = <FILE> when they should have written my $line = <FILE>. And of course, they wonder how to find the number of elements in an array.

People also want to think that (...) in an expression constructs a list, and they go putting parentheses around things to make them into lists, which never works, because it's what's on the left side of the equals sign that determines whether the right side is a list or not. That's a pretty good rule of thumb, in fact, as long as you're willing to overlook the x operator: 'foo' x 3 constructs the string 'foofoofoo', but ('foo') x 3 constructs the list ('foo', 'foo', 'foo'), equals signs or no equals signs.

Prognosis: You don't have to love it, but you have to learn to live with it.

### 3. Global Variables

The prototypical example of this problem is this code:

        while (<FILE>) {
          print if some_function();
        }

That looks harmless, doesn't it? But it's deceptive, because some_function() calls other_function(), and other_function() calls joes_function() in the module Joe::Database, and joes_function() calls load_database(), and load_database() is 484 lines long and in the middle it says

        while (<DATABASE>) {
          push @records, <DATABASE> if /pattern/;
        }

That while clobbers the value of $_, which would be OK, except that it clobbers the value of $_ way up in the main program, which was going to print out the value of $_ when some_function finally returned. Instead, it prints out the empty string. The program fails and the power plant explodes, poisoning the earth and the sea. Famine and disease sweep the world. All die. Oh, the embarrassment.

Today perl5-porters got mail asking why this code destroys the array:

        my @array=(1,2,3);
        foreach (@array) {
          open FILE, "<test";
          while (<FILE>) {
            ...
          }
          close FILE;
        }


That is a good question. Another good question would be why this code does not output a bunch of 3s:

        my @array=(1,2,3);
        open FILE, "<test";
        while (<FILE>) {
          foreach (@array) {
            ...
          }
          print;
        }
        close FILE;


Answer: Because foreach automatically saves the old value of its index variable and then restores the original value when the loop is over.
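The difference between the two loops can be seen in a small sketch. The in-memory filehandle (open on a scalar reference, Perl 5.8 or later) is an assumption standing in for the test file, and the variable names are illustrative.

```perl
use strict;
use warnings;

my $data = "a\nb\n";

# foreach saves $_ on entry and restores it on exit.
$_ = 'outer';
foreach (1, 2, 3) { }
my $after_foreach = $_;

# while (<...>) assigns straight into the global $_ and never restores it.
# The final read at end of file leaves $_ undefined.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
while (<$fh>) { }
close $fh;
my $after_while = $_;

print "after foreach: $after_foreach\n";                               # outer
print "after while:   ", (defined $after_while ? $after_while : 'undef'), "\n";   # undef
```

This is also why the first example destroys the array: inside foreach, $_ is an alias for each element, so the inner while leaves every element undefined.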

This problem has gotten somewhat better in recent years. Auto-localization in foreach loops was the first step. Encouraging people to use the new for my $x (...) and while (my $x = ...) syntax will help with this problem also; anything that gets people to stop using $_ is a step in the right direction.

Prognosis: Things have gotten better, but have also probably reached the limit of improvement. Module authors must be educated to localize $_ before changing it, and any advance that depends on the education of module authors is probably doomed.
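Here is a minimal sketch of the array-destroying loop rewritten with lexical loop variables. The in-memory filehandle (Perl 5.8 or later) replaces the test file, and the names $item and $line are illustrative.

```perl
use strict;
use warnings;

my $data  = "x\ny\n";
my @array = (1, 2, 3);

foreach my $item (@array) {
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    while (my $line = <$fh>) {
        # work with $line; neither $_ nor $item is touched
    }
    close $fh;
}

print "@array\n";   # 1 2 3
```

Perl still adds the implicit defined() test to while (my $line = <$fh>), so a line containing just 0 does not end the loop early.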

### 4. References vs. Non-references

Tom's complaint seems to be that reference syntax is too complicated. I don't think anyone can argue with that. Reference syntax is awful. It isn't going to get any better, either.

Prognosis: Doom and gloom.

### 5. No Prototypes

A common complaint with Perl 4 was that you couldn't write a function like push:

        my_push(@array, 1, 2, 3);


@array would be expanded into a list of elements, and my_push would never have a chance to operate on the original array. This was fixed with the prototype feature; now you can write my_push like this:

        sub my_push(\@@) {
          my $aref = shift;
          push @$aref, @_;
        }


Prototypes still have some minor but annoying holes. You can't write a function that behaves like printf with its optional filehandle argument, or like sort with its optional code block argument, or like tied with its any-kind-of-variable argument. You can write a function like lc that takes a single optional argument, but it won't be parsed the same way that lc is:

        $fred = 'Flooney';
        sub my_lc (;$) {
          if (@_) { lc $_[0] } else { lc $_ }
        }

        print lc $fred, "\n";
        print my_lc $fred, "\n";

        Too many arguments for main::my_lc at /tmp/lc line 7, near ""\n";"
        Execution of /tmp/lc aborted due to compilation errors.


Probably the worst thing about prototypes is the name. When ANSI standardized the C language in 1989, the big change was to add 'prototypes' to enable compile-time type checking of function arguments. C programmers learned that you should prototype all your functions to enable these checks so that you didn't end up passing a pointer to a function that wanted an integer, or whatever. People have the idea that Perl prototypes are for the same thing, and in fact they're not. They do something totally different, and they don't protect you against this:

        sub foo ($);
        foo(@x);      # whoops, should have been foo($x) instead.


The C programmers would like to think that this will deliver a compile-time error that says 'Hey, dummy! You used an array when you meant to use a scalar!' which would make it easier to debug. No, Perl takes the prototype as an indication that you would like the array automatically converted to a scalar, and passes the number of elements in the array to foo(). This will make it harder to debug, not easier.
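A small sketch makes the scalar conversion visible. The body of foo and the contents of @x are invented for illustration; only the ($) prototype comes from the example above.

```perl
use strict;
use warnings;

# The ($) prototype asks for one argument evaluated in scalar context.
sub foo ($) { return $_[0] }

my @x   = (10, 20, 30);
my $got = foo(@x);     # no error: @x in scalar context is its length

print "foo received: $got\n";   # foo received: 3
```

The call compiles cleanly and foo never sees 10, 20, or 30, which is the opposite of what a C-style prototype would lead you to expect.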

Prognosis: The remaining technical problems with prototypes are pretty small and may get smoothed out eventually. Better type checking of function arguments may arrive eventually also; there's been talk for a long time about function declarations of the form

        sub foo (Dog, Cat) { ... }


which would make sure you were passing objects of the appropriate classes, and support for this has been going in a bit at a time. See perldoc fields for example. This, however, will compound the problems of people trying to get C programmers to stop using prototypes for compile-time type checking. Expect more confusion here, not less.

### 6. No Compiler Support for I/O or Regex objects

This got a lot better between Tom's first and second reports, so much so that he regarded it as fixed. I think the credit for this on the regex side mostly goes to Ilya Zakharevich; I don't know who gets the credit on the I/O side; probably Larry Wall and Graham Barr. Since Tom's last report, it's been fixed even better: You can say

        open my $fh, $filename;


and $fh will be autovivified to be a filehandle open to the specified file; when $fh goes out of scope, the file will be closed automatically. This means that you don't have to worry about using global filehandle names any more. Another useful use of local down the drain, and good riddance.
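The automatic close can be demonstrated with a short sketch. It assumes the core File::Temp module for a scratch file and uses the three-argument form of open, which is a later idiom than the two-argument call shown above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for the demonstration; removed when the program exits.
my (undef, $filename) = tempfile(UNLINK => 1);

{
    open my $out, '>', $filename or die "Cannot write $filename: $!";
    print $out "hello\n";
}   # $out goes out of scope here, so the file is flushed and closed

open my $in, '<', $filename or die "Cannot read $filename: $!";
my $line = <$in>;
close $in;

print $line;   # hello
```

No explicit close was needed for $out; the data is on disk by the time the block ends.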

Prognosis: Essentially fixed, despite a few lingering problems.

### 7. Haphazard Exception Model

Tom says: "There's no standard model or guidelines for exception handling in libraries, modules, or classes, which means you don't know what to trap and what not to trap. Does a library throw an exception or does it just return false?"

This problem persists. Every module does something different. C programmers used to complain that having to explicitly check every system call for an error return made their code four times as big; in Perl the problem is worse because every check looks a little different.

Modules persist in issuing warning messages with no way to get them to shut up. Modules call die and take you by surprise when you thought you were going to get a simple error return. The standard modules have been substantially cleaned up in this regard since 1996, thank goodness.

Here's another problem: Exceptions and die are the same thing in Perl, which sometimes surprises people. Someone wrote into perl5-porters recently about a library function that was going to run a subprocess. The fork() succeeded but the exec() failed, so the child process called die. That was usually the right thing to do. In this case, however, the library function had been called inside of an eval block, which trapped the child's die. The original process was still waiting for the child to complete, but the child was going ahead, thinking it was the parent!
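One way a library can avoid this trap is for the child to leave through _exit instead of die. The following is a sketch under assumptions: run_command is a hypothetical helper, the command path is deliberately bogus, exit status 127 is an arbitrary convention, and a Unix-like fork is assumed.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run a command in a child process and
# return its exit status to the parent.
sub run_command {
    my @cmd = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child. If exec fails we must not die: an eval in the caller
        # would catch it and the child would carry on as if it were
        # the parent. Leave immediately instead.
        no warnings 'exec';
        exec { $cmd[0] } @cmd;
        POSIX::_exit(127);
    }

    # Parent: wait for the child and report how it ended.
    waitpid($pid, 0);
    return $? >> 8;
}

# Even inside eval, only the parent ever returns from run_command.
my $status = eval { run_command('/nonexistent/command') };
print "child exit status: $status\n";
```

POSIX::_exit is used rather than exit so that the child does not run END blocks or flush buffers it inherited from the parent.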

Groundwork for rationalization has been laid here; recent versions of Perl let you throw any sort of object with die, not just a string. Using these objects you could propagate complex kinds of exceptions in your programs. But as far as I know these features are little-used. There are several modules that provide try-catch-cleanup syntax, but as far as I know they're also little-used. And there are no widely accepted guidelines for the behavior of modules.
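As a minimal sketch of throwing an object with die, the code below uses a made-up My::Error class. The class name, its fields, and the risky function are all assumptions for illustration, not an established convention.

```perl
use strict;
use warnings;

# Hypothetical exception class with two fields.
package My::Error;
sub new     { my ($class, %args) = @_; return bless {%args}, $class }
sub code    { return $_[0]{code} }
sub message { return $_[0]{message} }

package main;

# Hypothetical library function that reports failure by throwing an object.
sub risky {
    die My::Error->new(code => 42, message => 'disk full');
}

my $caught;
eval { risky(); 1 } or $caught = $@;

my $report = 'no error';
if (ref $caught && $caught->isa('My::Error')) {
    # The handler can inspect structured data instead of parsing a string.
    $report = sprintf "caught error %d: %s", $caught->code, $caught->message;
}
print "$report\n";   # caught error 42: disk full
```

Without shared guidelines, every module would still invent its own class hierarchy, which is the social problem described here.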

Prognosis: This is a social problem, not a technical one. The only answer is education, possibly headed by a crusader, or many crusaders.

To replace the two problems that have been solved (lack of prototypes and compiler support for IO and Regex objects) I'd like to add two new problems to this list:

### 8. The Documentation is Too Big

The Perl 1 documentation was 2,000 lines long, which is already pretty big. The documentation for the current development release is 72,000 lines long, not counting developer-only documentation like the Changes files. It is difficult for beginners to know where to start, and it's difficult for anyone to know where to find any particular piece of information. The existing documentation includes a bunch of stuff like perlhist and perlapio that should have been buried in a subdirectory somewhere instead of alongside perlfunc.

The manual keeps getting bigger and bigger, because while it's easy to see and appreciate the value of any particular addition, it's much harder to appreciate the negative value of having a 0.01% larger documentation set. So you have a situation where someone will come along and say that they were confused by X, and that the manual should have a more detailed explanation of X, and it will only take a few lines. Nobody is willing to argue against the addition of just a few lines, because there's no obvious harm, but then after you've done that 14,000 times, the manual's usefulness is severely impaired.

Similarly, it's hard to seriously suggest that the manual be made shorter. Making manuals shorter is at least as hard as making programs shorter. You can't just cut stuff out; you have to reorganize and rewrite.

Short of throwing the whole thing away and starting over, some things that might help:

- The trend over the past few years seems to be toward a separation of reference material from tutorial material. It might be good for this to continue.
- The existing documentation needs reorganization; it's not clear what is 'Syntax' rather than 'Operators' or 'Functions'. (If I knew how to do this, I would say so.) Right now the overall structure is flat; I think it might be a step forward if the documentation were simply divided into a few subdirectories called 'Tutorials', 'Internals', 'Reference', 'Social', and so on.
- Perl needs to come with better documentation browsing tools, and maybe more important, there needs to be a better search interface on the web. Better indexing of the existing documentation would help to enable this.

Prognosis: Poor. Very little work is being done on the documentation except more of the same. Everyone wants to write; nobody wants to index.

### 9. The API is Too Complicated.

Writing Perl extensions is too hard. You have to understand XS. Existing documentation of XS is very sketchy.

If what you want to do is just to glue existing C functions into Perl, packages like SWIG and h2xs are a big help here. If you want to do anything the slightest bit offbeat, you're on your own.

What would help here? Better documentation. The example discussed in the existing perlxs manual is an interface to the rpcb_gettime function, whatever the heck that is. If you don't have it on your system, and you probably don't, you can't try out the example. There's too much dependency between the XS man pages and the perlguts man page; someone needs to go over these and reorganize them into a series of documents that can be read in order.

I once asked Larry why XS is so complicated, and he said that it was like that to make it efficient. It would be nice if there were a kind of extension glue that was simpler to write even if it were less efficient.

Prognosis: Mixed. Glue makers like SWIG and Ken Fox's C++ kit seem to be maturing nicely. But the documentation problem is not being addressed, and the real underlying problem, which is that Perl's internals are too complicated and irrational, is probably insoluble. The Topaz (Perl 6) project might fix this.

## This Week on p5p 1999/11/28

### Notes

This report is very late, because I was in London from 26--29 November, and then when I got back I had to finish preparing class materials for a class I was teaching in Chicago from 6--8 December, and when I got back from that I had to prepare class materials for a class I was teaching in New York from 14--16 December. Then I had to recover. Now I'm going to try to catch up on reports. My apologies for the delay.

#### New Meta-Information

The most recent report will always be available at http://www.perl.com/p5pdigest.cgi .

You can subscribe to an email version of this summary by sending an empty message to p5p-digest-subscribe@plover.com.

Please send corrections and additions to mjd-perl-thisweek-YYYYMM@plover.com where YYYYMM is the current year and month.

In the last report, I posted an explanation by Dan Sugalski of the current threading model and the new one that would be used in 5.005_63. (Here it is.)

This reopened debates about the feasibility of the new model, and Dan, Sarathy, and Ilya had a medium-long discussion about it. The debate centered around how much stuff gets cloned when you fork a new thread/process. Here's my probably-too-brief summary: Under the new model, when you start a new thread, the Perl stack is cloned and each thread gets a separate clone. But you can also request fork()-like semantics, in which case all global variables also get cloned. This will allow forkless Windows platforms to emulate fork() with threads. In either case, the op tree (which is read-only) is shared between threads.

The main point of contention was how much data would need to be cloned. Sarathy claims that even in the fork case the op tree outweighs the global data by a factor of 8.

The whole discussion was very interesting and is recommended reading. The top of the discussion is here.

This and the ensuing discussion is worth reading if you are interested in the changes in Perl's threading model.

### Discussion of Line Disciplines Continues

Much interesting discussion about line disciplines continues, this time in the context of having <> and chomp() behave properly, even when you have a remote-mounted NT filesystem where the files have \r\n-terminated lines. Goal: Perl should 'do the right thing' regardless of where the file is located. The subtext here is that Perl should also 'do the right thing' when reading from an ISO-Latin-1 file, an EUC-KR encoded file in Korean, or a SHIFT-JIS encoded file in Japanese.

Larry: I expect people to expect Perl to do the right thing.

Jochen Wiedmann complained about the support for shadow password files. If your vendor supports them transparently via the usual getpw* calls, Perl supports them too; otherwise you're out of luck. In fact this is a frequently asked question: How to get the shadow password?

There are at least two schools of thought on this. School of thought #1 is that when you make a getpw* call, you should get back the usual seven items, except that if you're on a system with shadow passwords and you're not running as root, you get back x or * or some such instead of the real password; thus no new interface is required. The author of the man page (see perlfunc/getpwent) apparently subscribes to this school of thought.

School of thought #2 is that getpw* should always return a password of x, and instead there should be special calls, probably named getsp* or something similar, for reading the shadow password file.

Advantages of school #1: No program needs to be rewritten or even modified when you switch from traditional password style to shadow passwords. Advantages of school #2: No program will get the passwords written into its memory unless it specifically asks for them, regardless of whether or not it happens to be run by root; having programs suddenly become riskier just because they are running as root is a Bad Thing.

Anyway, Sarathy reports that as of 5.005_57, Perl has gone entirely over to school #1 and will emulate that behavior even if you are on a system that belongs to school #2. If you call any of Perl's getpw* functions, and your program is running as root, then Perl will make a getsp* call to fill in the password as if it had been returned by the getpw* call in the first place.

I seem to remember that debate on whether or not this was advisable continued the following week, so I'll follow up in the next report.

### Bugs in NT Perl Sockets?

Phil Pfeiffer posted an interesting analysis of peculiarities of Perl network sockets under NT. I found these interesting, but there was no discussion. Of course, it is probably NT's fault, but it would be nice to see these fixed anyway. As Larry has said, "The Golden Gate wasn't our fault either, but we still put a bridge across it." Read it.

### Run Out of File Descriptors

Yossi Klein was puzzled because Perl limited him to 256 open files, even when he used sysopen to try to open the files without using the Standard I/O library. Of course, even if you use sysopen, it then takes the file descriptor that it gets from the system and attaches it to a filehandle, and that means it uses fdopen to create the stdio stream structure, and there's Standard I/O again.

### Lexical Variable Leak

Barrie Slaymaker reported a bug in 5.005_61 in which a lexical variable in one file leaked into a different file that was being processed by do.

### Control-Backslash

The discussion about semantics of control-backslash in a double-quoted string continued much longer than it should have, with the original correspondent proposing every possible alternative set of rules, and Ilya Zakharevich pointing out what was wrong with all of them. The original querent finally gave in.

### Bitwise Operators

Someone was caught by the changed behavior of the & operator between Perl 4 and Perl 5. Tom Christiansen posted a clear explanation. For some reason this topic was omitted from perltrap.
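For readers who have not met the Perl 5 behavior, here is a small sketch of it. The operand values are arbitrary, and the code assumes the default operators (no bitwise feature enabled).

```perl
use strict;
use warnings;

# Both operands are numbers: ordinary integer AND (0b1100 & 0b0011).
my $numeric = 12 & 3;

# Both operands are strings: the AND is done character by character,
# and the result is as long as the shorter operand.
my $string = "12" & "3";

# Adding 0 forces a string operand to be treated as a number.
my $forced = ("12" + 0) & ("3" + 0);

print "numeric: $numeric\n";   # numeric: 0
print "string:  $string\n";    # string:  1
print "forced:  $forced\n";    # forced:  0
```

The string result is "1" because the characters "1" (0x31) and "3" (0x33) AND together to 0x31.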

The discussion then turned towards how you can tell whether or not a string has ever been used in a numeric context. Larry suggested:

### next Outside a Block

Someone complained that

	map { next if ... } ... ;


yields the error message 'Can't "next" outside a block'. He claimed that the message was bad because clearly the next is in a block. Larry agreed that it should say 'loop' instead of 'block'. I didn't see a patch, however.
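If the intent was to skip some elements, a common idiom is to return the empty list from the map block. The sketch below assumes that intent; the data and the even-number condition are made up.

```perl
use strict;
use warnings;

my @nums = (1 .. 6);

# Returning () from the block contributes nothing to the result,
# which has the effect that "next" was presumably meant to have.
my @even_squares = map { my $n = $_; $n % 2 ? () : $n * $n } @nums;

print "@even_squares\n";   # 4 16 36
```

When the block only filters and does not transform, grep says the same thing more directly.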

### Quiet List

Traffic was low for a while, so Nat Torkington sent out a ping message:

Nat: Just testing whether all of p5p has fallen quiet, or whether the mailing list manager is constipated.
Kurt Starsinic: The perl5-porters are on vacation, and will be until noon on November 25. If your need is urgent, please contact python-help@python.org.
Thank you.

### Various

A large collection of bug reports, bug fixes, non-bug reports, questions, answers, spam, and a small amount of flamage.

Until next time (probably Tuesday or Wednesday) I remain, your humble and obedient servant,

Mark-Jason Dominus

## This week on perl5-porters (15-21 November 1999)

### Notes

I thought it was going to be a quiet week, but then bang, 140 messages arrived on Friday. Fortunately for you, most of them are ignorable. Did I mention that these reports are made possible through the generosity of O'Reilly and Associates, who pay me a salary?

Next week's report will be late, because on Sunday I will be in London with the Perl Mongers. Expect the next report sometime in early December.

You can subscribe to an email version of this summary by sending an empty message to p5p-digest-subscribe@plover.com.

Please send corrections and additions to mjd-perl-thisweek-YYYYMM@plover.com where YYYYMM is the current year and month.

### XSLoader.pm

Ilya contributed a new module, XSLoader, which is a cut-down version of DynaLoader. It uses less memory and has a simpler interface. He patched the standard dynamically-loaded modules to use XSLoader instead of DynaLoader. Read about it.

Dan Sugalski submitted a patch that makes a new locking macro available to XS code, but Sarathy said:

Every time I see people patching USE_THREADS code I wonder if it's all going to be for nothing. I don't see much hope for salvaging the existing model of USE_THREADS where prolific locking is needed.

In case you missed the point of this: 5.005_63 is going to have a drastically different threading model than previous versions of Perl. I asked Dan to explain the two models to me, and he very kindly contributed the following:

#### Shared Interpreter threads (the current model)

This threading model corresponds pretty closely to what most folks think of when they think of threads. There is a single, shared executable (or optree, in perl's case), and a single pool of variables. Any thread can see, and affect, anything that's in its lexical scope. Data sharing between threads is cheap, but the onus is on the programmer to maintain consistency.

As an analogy, consider a thread a hamster, and your program as a giant habitrail. Starting another thread means releasing another hamster into the environment. Threads don't really collide (You ever see two hamsters get stuck in a habitrail tube?) but the only thing that keeps two or more hamsters from running on an exercise wheel at once is careful design of the set.

#### Cloned interpreter threads (upcoming in 5.005_63)

This threading model corresponds more to the traditional unix fork model. There is one copy of the optree per thread. One thread can't see or affect anything in another thread unless that thing is explicitly shared. Data sharing and coordination between threads is mildly expensive and a bit cumbersome, but the onus is on the perl interpreter to keep one thread from messing with another thread's toys.

Continuing the hamster analogy, creating a new thread in this model gets you a whole new, separate habitrail for your new hamster. The two habitrails may share exercise wheels, but only at very specific, explicit spots, and there are little exclusion gates built in so only one hamster can be on a wheel at once.

My personal preference is a bit mixed--I want shared threads because they let me build all sorts of cheap, nifty tools (Threaded objects, and a subclassable Thread::Object, are *trivial*. I know, I wrote a package to do it. One thread per object. Wheee!), but I think most folks (including me, when writing 'normal' code) are better off with the protection cloned threads get you.

The trickier bit is under the hood. Getting perl thread-safe (which is to say, it won't core or segfault if the programmer doesn't sync access) is a PITA. I've already run through a couple of different plans--the first was paranoid and I, the second bloated libperl.so with a lot of shadow routines for XS, and the third I'm not gonna bother with if threads are dead anyway.



not only doesn't do what was wanted, but doesn't appear to do anything sensible at all. (For example, someone suggested that it does the assignment to $/ first, and then localizes, but this appears not to be so.) It appears to be a real bug.

### goto out of scope of local

Ilya complained that using magical goto to exit the scope of a local undoes the localization:

        $a = 5;
        sub a { print $a }
        sub b { local $a = 9; goto &a }
        b();


This prints 5, and he wants it to print 9. (Note that Ilya's original report contains an error; he has local $b instead of local $a.) The reason he wants this is so that he can do:

        local @ISA = (@ISA, 'DynaLoader');


Nick Ing-Simmons suggested that he do this instead:

 {
}


Of course, that leaves the original call on the call stack so that (for example) messages from Carp might be confused about where to report an error from. In the case of Carp there actually is no problem. Discussion ended abruptly.
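The difference between leaving through goto and making an ordinary call can be sketched as follows. The sub and variable names are illustrative and are not taken from the original report.

```perl
use strict;
use warnings;

our $value = 5;

sub show     { return $value }
sub via_goto { local $value = 9; goto &show }      # local is undone before show runs
sub via_call { local $value = 9; return show() }   # local is still in effect

my $from_goto = via_goto();
my $from_call = via_call();

print "goto: $from_goto, call: $from_call\n";   # goto: 5, call: 9
```

The ordinary call keeps the localized value, at the price of the extra stack frame mentioned above.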

### use foo if bar;

Ilya contributed a patch to enable this syntax, and similarly unless. It does this by faking up a BEGIN block, but without the scoping effects of a real block. (The scoping effects are the reason why the programmer cannot simply use BEGIN here in the first place.)

Steve Fink contributed an amendment to Ilya's patch. Tom Christiansen sent a giant message complaining that the feature was yet another weird special case. Some discussion of alternatives ensued, with no conclusion that I could see.
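For comparison, later Perl releases ship a core pragma named if that covers similar ground with a different spelling from the syntax proposed in this patch. The sketch below assumes that pragma is installed; the version test is only a stand-in for a real compile-time condition.

```perl
use strict;
use warnings;

# The condition is evaluated at compile time, so it must be computable then.
# When it is true, this behaves like "use integer;" in the enclosing scope.
use if $] >= 5.006, 'integer';

my $quotient = 7 / 2;    # integer division, because the pragma was loaded

print "$quotient\n";     # 3
```

Because the pragma takes effect in the enclosing scope, it avoids the scoping problem that rules out a plain BEGIN block.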

Tom Christiansen suggested a no scope pragma, which would 'erase' one set of braces. Then you could say something like

        BEGIN {
          no scope;
          if (SOMETHING) {
            no scope;
            use integer;
          }
        }


And the no scope declarations would upscope the effect of the use integer pragma so that its effect continued to the end of the file.

Larry had many interesting things to say about this; the most straightforward was that something like use comppad should just upstack declarations all the way to the top of whatever code was currently being compiled.

Ilya said that that behavior was already available to any module that uses the hints variables $^H or %^H.

### croak confounds eval

David Blumenthal reported a problem in IPC::Open3: It forks a child, which tries to exec your command, and if it can't, the child croaks. If the open3 call was inside an eval block, that means that the child returns from the eval block without exiting and your program gets a big surprise. David suggests that it should use carp instead and then call exit.

I mention this because probably a lot of other modules have similar problems. Modules should never call croak or die.

### Control-backslash

Philip Newton pointed out that there was no way to generate a control-backslash using the \c notation. Neither "\c\" nor "\c\\" works. The first complains about an unmatched quotation mark, and the second generates a control-backslash followed by a regular backslash. (\x1c and the like do work.) The reason for this is explained in detail in perlop, in the section titled 'Gory details of parsing quoted constructs'.

### Empty Conditional in while()

Greg Bacon reported that

        while () { CODE }

is legal, and is an infinite loop. This turns out not to be a bug; it is because you are supposed to get an empty loop if you leave out the condition in a for (;;) block, and for(;;) and while() are the same thing. Larry even said he allowed the empty condition in while() on purpose back in Perl 1.

Ilya asked why not get rid of it, similar to the way that if BLOCK BLOCK was gotten rid of. Larry pointed out that the two cases are not really analogous: if BLOCK BLOCK was removed (or broken) entirely by accident, and the only reason this was never fixed was that nobody at all complained about it.

### Undefined Function Warning

Mike Taylor wrote in to ask for a warning that would announce (at compile time) the presence of calls to undefined functions.

Dan Sugalski enumerated some of the reasons why there isn't already such a warning: AUTOLOADed functions; calls to platform-specific functions guarded by if ($^O eq 'Eniac'); functions loaded by require; people getting funky with the symbol table.

Tom Christiansen pointed out that perl -MO=Lint,-context,-undefined-subs program does something like this already. That seemed to satisfy Mike.
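To make the first of Dan's reasons concrete, here is a minimal sketch of an AUTOLOADed function. The package name `Lazy` and the function name `greet` are made up for illustration. No sub called `greet` exists when the program is compiled, so a compile-time check would flag a call that works perfectly well at run time.

```perl
use strict;
use warnings;

package Lazy;    # hypothetical package, for illustration only

our $AUTOLOAD;

# Catch-all: any call to an undefined Lazy:: function lands here.
sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;               # strip the package qualifier
    return if $name eq 'DESTROY';    # never synthesize a destructor
    return "autoloaded $name(@_)";
}

package main;

# No sub named greet was ever declared, yet the call succeeds at run time.
my $result = Lazy::greet('world');
print "$result\n";    # autoloaded greet(world)

# The symbol table still has no real definition for it.
print defined &Lazy::greet ? "defined\n" : "not defined\n";    # not defined
```

The same difficulty applies to functions pulled in by a later require or installed by symbol-table manipulation: the definition simply is not there yet when the call is compiled.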

### Static Extensions

Margie Levine asked about compiling perl with statically-linked extensions. Andy Dougherty replied that although the documentation may have gotten lost, the best way is to just drop them in the ext/ directory before you run Configure, which is supposed to notice the extensions and build them as if they were part of the standard set.

### POD Hack

Ilya points out that a construction like this:

 C<S<
code with I<..> or L<...> escapes or whatever
>>


will generate an indented code paragraph that can still contain pod escapes for hyperlinks or whatever.

### Perl Art

Greg Bacon reported an entertainment. (The background is Perl code.) There was some discussion about generating pictures of llamas and the like in a similar medium.

### localtime() Contest Continues

Try to guess how many bogus bug reports about the localtime() function will be submitted next year. Visit http://www.plover.com/~mjd/perl/y2k/y2k.cgi.

### Various

A large collection of bug reports, bug fixes, non-bug reports, questions, answers, and a small amount of flamage. (No spam this week.)

Until next week I remain, your humble and obedient servant,

Mark-Jason Dominus

## This Week on p5p 1999/11/14

Traffic was about normal this week, but there didn't seem to be as much content as usual. That is, there were the usual number of messages, but they didn't really seem to say much. This might just be my mistake, so if you think I missed something important, please send me a note about it.

You can subscribe to an email version of this summary by sending an empty message to p5p-digest-subscribe@plover.com.

Please send corrections and additions to mjd-perl-thisweek-YYYYMM@plover.com where YYYYMM is the current year and month.

Ilya forwarded an article that had been posted to comp.lang.perl.misc by Kragen Sitaker. The article pointed out that the following code was surprising:

    $foo = 1;
    $^W = 1;
    sub test1 { my $foo = 2; return sub { eval shift } }
    print test1()->('$foo');


The subroutine captures a lexical scope in which $foo=2; on the other hand, the argument is evaluated in a lexical scope in which $foo=1. Which value of $foo is printed? Answer: Neither.

Ilya wrote to p5p suggesting that this discrepancy generate a warning. But there was no discussion.

I tried to do a little research to find out how other languages handle this problem. (An ounce of prior art is worth a pound of cure.) I looked around in Lisp world. Most Lisp doesn't have lexically scoped variables. The major exception is the Scheme family. Until Revision 5, Scheme did not have eval. Revision 5 does have eval, but you have to explicitly specify the environment in which you want the code evaluated.

Conclusion: We may be on untrodden ground here.

## More About Line Disciplines

Last week Larry said there should be some way of registering an I/O discipline on an I/O stream so that (for example) the I/O operators could convert automatically from a national character set to UTF-8, and vice versa. Sam Tregar appeared to volunteer for the job, but it is a big project and no word has come back yet.

Ilya cautioned everyone to look at notes about the Tcl version of that feature to find out what not to do. (An ounce of prior art is worth a pound of cure.)

Discussion petered out. This is probably something that needs more people working on it.

## link on WinNT

Jan Dubois submitted a patch to enable the Perl link() function to work under WinNT. It is implemented in terms of some equivalent underlying WinNT feature.

## Regex Optimization

Ilya posted a really big patch to the regex engine that contains a pretty major optimization.

Ilya's example was /\bt.{0,10}br/. Normally, Perl can look for a longish fixed string inside the regex, in this case the br. It then looks for br in the target string, because there can't be a match without a br. It knows that if it finds a match, the br will occur at some position 1 .. 11 of the matching string; the t must occur at position 0.
Perl hunts through the target string looking for the br, and when it finds it, it backs up to look for the t. If there is no t, it starts over and looks for another br. Since this process involves simply scanning the target for fixed substrings, it is very fast.

Suppose Perl finds the t. Prior to the optimization, it would then enter the regex engine and start matching normally, looking for the word boundary. With the new optimization, it can notice when there is no word boundary and abandon the match immediately, without starting the main part of the regex engine.

## Threads on Solaris

Marek Rouchal reported that threaded programs do not always work properly under Solaris. It is not clear what the problem is or whether it is the fault of Marek, Perl, the Solaris thread library, or some combination of those. The problem appeared to be unresolved as of this report.

## Regex Engine Reentrancy

Stéphane Payrard reported that Perl dumps core if you try to do a regex match inside of a (?{...}) expression. This did not come as a big surprise seeing as the regex engine is not reentrant. (It uses several global variables.)

Hugo van der Sanden: Making the regexp engine reentrant should fix the problem.

Jarkko Hietaniemi: This is a bit like saying, "Stopping the wars should bring the world peace."

Stéphane suggested making it a fatal error in the meantime, but no patch was offered.

Somewhere in the discussion, Jesus Quiroga mentioned that he was working on a replacement for perlre.pod, including a tutorial and some examples. It's about time!

## Big Files Continue

Last week Jarkko remarked that 5.6 would have better support for large files. (There are 32-bit integer overflow problems associated with those larger than 2GB.) A very long discussion, which appeared to have very little actual content, ensued. If I'm mistaken about this, maybe someone would like to mail me with a summary of the important points?
Here are some typical problems: You have a file larger than 2GB and you try to get the file pointer position with tell(). The result might be too big to fit in a 32-bit integer. Your system's off_t type is probably big enough to hold the offset value; but when this value is stored into Perl's SV structure, it is coerced into 32 bits and information is lost. So, to handle this properly, SVs need to use 64-bit integers.

But if you change the size of the integers in the SV, then Perl is no longer binary-compatible with compiled XS modules that have a different-size SV.

When arithmetic on an integer value overflows the integer, it is converted to a floating-point number; but what happens in a system where an integer is more accurate than a float?

Jarkko said that in 5.005_63 he would try turning on support for large files without enabling 64-bit integers generally, and see what happened. One difficulty: It is hard to test this support.

Sarathy suggested storing the file offsets into a floating-point variable. Floating point numbers are inexact, but only once you overflow the mantissa, and the mantissa of the typical (64-bit) double variable is a 53-bit integer.

## Regexp Objects again

More discussion about how to make Regexp objects more like real objects, but without making them slow. Still no conclusion, but the discussion wandered onto the topic of overloading the assignment operator.

## Unicode Support on EBCDIC Machines

Folks on EBCDIC machines will have an unusual problem with Unicode, because Unicode is designed for compatibility with ASCII; the first 128 Unicode characters are identical with their ASCII counterparts.

For example, in a double-quoted string, \N{EXCLAMATION MARK} is supposed to generate an exclamation point. Actually, it generates Unicode character U+0021, and in UTF-8 encoding this is represented by the single character 0x21, which happens to be an ASCII exclamation mark. Under EBCDIC, however, 0x21 is a capital letter O. (I think.)
All sorts of UTF-8 tests fail on EBCDIC systems because of similar problems. I think Jarkko's suggestion was to fix the charnames pragma to notice when it is on an EBCDIC system, but the issue may not be closed yet.

## Marshalling Modules

David Muir Sharnoff sent an interesting message, copied to the Modules list. Among other things, it calls for a standard interface to marshalling modules. Read about it.

## Got Perl?

Banana Republic has a new advertisement that features a llama wearing a scarf. Here we see the llama wearing Larry's mustache as well.

## localtime() has Another Bug!

The Y2K bug in localtime() was reported. Again.

I invite everyone to guess how many spurious localtime() bugs will be reported during the year 2000. Register your guess, and I will announce the winner in January 2001.

## Various

A large collection of bug reports, bug fixes, non-bug reports, questions, answers, and a small amount of flamage. (No spam this week.)

Until next week, I remain your humble and obedient servant,

Mark-Jason Dominus

## This Week on p5p 1999/11/07

This week's report is a little early because I had to go away to the LISA conference.

You can subscribe to an email version of this summary by sending an empty message to p5p-digest-subscribe@plover.com.

Please send corrections and additions to mjd-perl-thisweek-YYYYMM@plover.com where YYYYMM is the current year and month.

Discussion continued on what meanings to assign to certain patterns on DOSISH systems. For example, with the regular glob(), a backslash is an escape character. But you know that folks on DOSISH systems are going to want to write glob('foo\\bar') to have it look in directory foo for file bar.

Paul's conclusion: \ in a glob pattern on a DOSISH system only behaves like a metacharacter when it precedes another metacharacter; otherwise it's a directory separator.

## D'oh::Year and Y2K warnings

Michael Schwern felt that the Y2K warnings in Perl are too little too late.
(Too little: 5.005_62 warns if you try to concatenate a number with the string "19". Too late: Look! It is November of 1999.) He submitted a module, tentatively named D'oh::Year, which follows this strategy: It overrides the localtime and gmtime functions so that they return a year value that looks like a number but is actually an overloaded object. When this object is concatenated with any of the strings "19", "20", or "200" or has any of several other suspicious operations performed on it, it issues a warning. It does this without any core patches. It is available from Michael's web site and probably also from CPAN. (This idea was originally suggested by Andrew Langmead.) Read about it.

Sarathy said that he would not include it in the standard distribution, except perhaps as part of B::Lint.

## Threading and Regexes

Last week I reported on a problem with Perl regexes under threading but I got the technical details wrong. Please ignore it, because it has no relation to reality.

Actually I understood the problem better than I thought, because it is a very long-standing problem with regexes that shows up in many places, not just under threading. Basically, the problem is that certain properties of regexes are attached to the match node in the op tree, which effectively means that they are associated with the lexical appearance of the match operation in the source code. What does that mean? Here is a simple example:

    sub tryme {
        my $string = shift;
        return unless $string =~ /(.)/;
        print "$1";
        tryme(substr($string, 1));
        print "$1";
    }

    tryme('abc');


tryme is invoked three times, and you would expect each invocation to have a separate pattern match with separate backreference variables. There is no reason to expect the value of $1 to be changed by the function call, so you would expect the two print statements in the function to print the same thing each time the function was invoked, so that you would get abccba. But instead, the backreference variables are attached to the regex match operator, and that operator is shared among all calls to the subroutine. This means that a later call to tryme overwrites the $1 of the earlier call, and the output is abcccc. Ugh. The threading problem is similar: Two threads can trample on one another's backreference variables for the same match operator. I had complained about a related problem almost two years ago, and it was well-known then.
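One way to sidestep the sharing, sketched here rather than taken from the discussion, is to copy $1 into a lexical variable immediately after the match, so that nothing that happens in a deeper call can disturb it. The name `tryme_copy` and the output array are made up for this illustration; the characters are collected instead of printed so the result is easy to inspect.

```perl
use strict;
use warnings;

# Same recursion as tryme, but $1 is copied into a lexical right after the
# match. Each call owns its copy, whatever happens to $1 afterwards.
sub tryme_copy {
    my ($string, $out) = @_;
    return unless $string =~ /(.)/;
    my $first = $1;
    push @$out, $first;
    tryme_copy(substr($string, 1), $out);
    push @$out, $first;
    return;
}

my @seen;
tryme_copy('abc', \@seen);
print join('', @seen), "\n";    # abccba
```

This gives the abccba that the original example was expected to produce, regardless of how the interpreter stores the backreference variables.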

Sarathy expressed sadness that this problem has gone unfixed for so long. The correct fix is for the op node to store an offset into the pad, which is private to each instance of a function invocation and is not shared between calls to the same function or between threads. Then the pad will have the pointer to the backreference variables or whatever.

I wrote to the MIT folks to ask what exactly they were doing, but they did not reply.

## Change to xsubpp

Ilya submitted a patch to xsubpp which will change the value return semantics of XSUBs to be more efficient. It needs wide testing because almost all XSUBs will be affected.

## utf8 Needs a New Pumpking

Nick Ing-Simmons no longer has time to be responsible for utf8. If you want to be the new utf8 king, send mail to Sarathy.

## STOP blocks

Sarathy put them in. Details were in an earlier report.

## map and grep in void context

Simon Cozens submitted a patch that would issue a warning on grep and map in void context. Larry said he didn't want to do that; he thought that it would be better to propagate the void context into the code block so that the usual "Useless use of ... in void context" messages would appear.

Larry: The argument against using an operator for other than its primary purpose strikes me the same as the old argument that you shouldn't have sex for other than procreational purposes. Sometimes side effects are more enjoyable than the originally intended effect.
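For readers who have not met the idiom, this is roughly what map in void context looks like next to the more conventional loop. It is only an illustration; the function names `tally_with_map` and `tally_with_for` are invented here.

```perl
use strict;
use warnings;

# map used purely for its side effect: the list it builds is thrown away.
sub tally_with_map {
    my %count;
    map { $count{$_}++ } @_;
    return \%count;
}

# The same side effect written with a statement modifier.
sub tally_with_for {
    my %count;
    $count{$_}++ for @_;
    return \%count;
}

my @words = qw(llama camel llama);
print tally_with_map(@words)->{llama}, "\n";    # 2
print tally_with_for(@words)->{llama}, "\n";    # 2
```

Both versions produce the same counts; the disagreement in the thread is about whether the first form deserves a warning, not about what it does.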

## Data::Dumper and Regexp objects

Michael Fowler submitted a patch to make Data::Dumper work on Regexp objects. (Those are the ones generated by the qr// operator.) There was some discussion of problems with these kinds of objects: They're hard to recognize; they stringize in a strange way so that, unlike other sorts of references, you can't be sure you can tell them apart by looking at the stringized versions; and so on. Sarathy said he was uncomfortable with the implementation of the Regexp objects, and that they should be more like regular objects so that they would be easier to understand and so that they could be dealt with just like other objects. Larry agreed, and added that exceptions and filehandles should work that way too. However, there was no specific proposal about what should be done.
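A small sketch of the behavior being described, as it can be observed from ordinary Perl code. The helper `is_regexp` is a made-up name and a deliberately naive check, not anything proposed in the thread.

```perl
use strict;
use warnings;

# Naive check: relies on the default class name, so it would be fooled by a
# qr// object that has been reblessed into another package.
sub is_regexp {
    my $thing = shift;
    return ref($thing) eq 'Regexp' ? 1 : 0;
}

my $re = qr/ab+c/i;

print ref($re), "\n";    # Regexp

# Unlike most references, the string form is the pattern text with its flags
# wrapped around it (the exact wrapper differs between Perl versions), not
# something of the form Regexp=SCALAR(0x...).
my $text = "$re";
print "looks like a pattern\n" if $text =~ /ab\+c/;

# The object can be used directly as a pattern or interpolated into another.
print "direct match\n"       if 'xABBCx' =~ $re;
print "interpolated match\n" if 'xABBCx!' =~ /${re}x!/;
```

The stringified form is what makes these objects convenient to interpolate, and also what makes them hard to tell apart from plain strings once they have been stringized.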

## sort improvements

Peter Haworth improved his patch to allow XSUBs to be used as sort comparator functions. If the comparator is prototyped as ($$), then the list elements are passed normally, in @_, instead of as $a and $b. Read about it.
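Here is a minimal sketch of a comparator that receives its arguments in @_, assuming the prototype in question is ($$), which is the form the sort documentation describes in later Perl releases. The name `by_length` is invented for the example.

```perl
use strict;
use warnings;

# With a ($$) prototype, sort passes the two elements in @_ rather than in
# the package variables $a and $b.
sub by_length($$) {
    my ($left, $right) = @_;
    return length($left) <=> length($right) || $left cmp $right;
}

my @sorted = sort by_length qw(pear fig banana kiwi);
print "@sorted\n";    # fig kiwi pear banana
```

Because the comparator no longer depends on $a and $b, it can live in a different package from the code that calls sort.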

## IPv6 and Socket.pm

Warren Matthews has a version of Socket.xs that contains functions for interconverting integers and IPv6 addresses (which are normally represented in the form xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx). He asked if there would be interest in adding these to the standard distribution, but nobody replied, so I guess there wasn't any.

## New quotation characters

Larry suggested that when Perl is Unicode-enabled, it could deduce from the Unicode character database which characters are parentheses, and then from the character names what the corresponding closing parenthesis is for any given open parenthesis. Having done that, it could then understand any sort of parentheses at all as delimiters in the q and qq operators. For example, the TIBETAN LEFT BRACE and TIBETAN RIGHT BRACE characters would form such a pair. Larry said that brackets should do what people expect, and that people expect them to match.

Alternatively, people could declare their parenthesis characters, and then use U+261E and U+261C as parentheses.

Larry: Down that path lies jollity.

## system 1, foo

Apparently on most non-unix Perl systems, if you invoke system 1, foo, it runs foo in the background, and returns the process ID number instead of the exit status of foo. Ilya suggested implementing this on Unix also.

Sarathy said it would be better to have a modular interface to that functionality, and he did not want to propagate this hack to any more systems. "The concept is portable, but the incantation is not. What it needs is some Perl code to smooth it over."

Jenda Krynicky pointed out the enhancement he proposed to Shell.pm last week would be easy to extend to support this cleanly. In case you forgot.

## MacPerl Error Messages

MacPerl had been formatting error messages like this:

    # Syntax Error
    File "script.plx"; Line 46

instead of like this:

    Syntax error in script.plx, line 46


because some Mac programming tool can parse the other form and open the file automatically with the appropriate line highlighted. Matthias Neeracher agreed to withdraw this change; I am not exactly clear on why, and I could not find the original patch.

## pack t Template

More discussion of Ilya's proposed t template for the pack and unpack functions.

Last week Joshua Pritikin asked why not just use Storable; Ilya pointed out that different kinds of marshalling code are useful under different circumstances, and Storable is not useful, for example, if you are trying to marshal data to be passed as an argument to a command. However, you could use his new pack template to marshal data into a string, pass it as a command argument, and have the command unpack it with unpack 't' on the other end. Ilya's explanation of the usefulness of pack 't'.

## New perlthread man page

Dan Sugalski updated his proposed perlthread man page.

## Big Files

Christopher Masto reported that the stat function does not work properly on files larger than 2GB, because the size is stored in the usual signed 32-bit integer. Jarkko said that 5.6 (and current development versions) will have better support for this, but that you will have to enable the support at compile time.
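The arithmetic behind the 2GB limit can be checked with a few lines of Perl. This is only an illustration of what happens to a size when it is squeezed into a signed 32-bit field; the helper name `as_signed32` is invented, and it says nothing about how stat is actually implemented.

```perl
use strict;
use warnings;

# Largest value a signed 32-bit integer can hold: one byte short of 2GB.
my $max_signed32 = 2**31 - 1;
print "$max_signed32\n";    # 2147483647

# Reinterpret the low 32 bits of a non-negative number as a signed integer,
# by packing it as an unsigned 32-bit value and unpacking it as a signed one.
sub as_signed32 {
    my $n = shift;
    return unpack 'l', pack 'L', $n;
}

my $three_gb = 3 * 2**30;                # 3221225472 bytes
print as_signed32($three_gb), "\n";      # -1073741824
print as_signed32($max_signed32), "\n";  # 2147483647
```

A 3GB file therefore shows up with a negative size, which is the kind of symptom the bug report describes.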

## localtime is Broken

Someone submitted another bug report because localtime returns the wrong month.

## Record Separators that Contain NUL

Sarathy put in a patch so that $/ could contain the NUL character, "\0". This would probably have passed without comment, except that Jeff Pinyan followed up with a question about setting $/ to \0 (that is, a reference to the constant integer zero) and said it was on the same topic, even though it was not. Enough people were confused by this that the discussion went on twice as long as it should have, with some people talking about "\0" and others discussing \0. Anyway, the answer is that Perl only goes into fixed-length record mode if $/ is a reference to a positive number.

Tom: [Fixed-length record mode is] yet another special-case exception requiring an "oh by the way" in the documentation.

Dan Sugalski: Yeah, but if we took 'em all out we'd be left with 37 pages of documentation and C with morphing scalars.

Then there was a digression: Nat Torkington suggested that if $/ was set to a code ref, Perl could run the code whenever you did a <...> read, and yield the return value from the code, instead of doing what it would normally do. Nat's example was:

    # all filehandles now autochomp
    $/ = sub { my $x = CORE::readline(shift); chomp $x; return $x };


(He said $\ instead of $/, but that was a mistake.)

Several people got very excited about this, but Sarathy pointed out that it would be more straightforward to just override CORE::GLOBAL::readline(), and that he did not want to provide more than one way to do something that hardly anyone ever wants to do anyway.

But Larry expanded on the general idea, saying that there should be a general, lightweight way to insert various kinds of read and write disciplines into an I/O stream. The most important uses for this would involve having the I/O operators convert from UTF-8 to national character sets transparently, and vice versa. Larry: "This is something we have to make easy in Perl. Not just possible."
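Since the two settings of $/ were so easily confused in the discussion above, here is a small sketch that shows them side by side. It reads from an in-memory filehandle, which assumes a Perl recent enough to support opening a handle on a scalar reference; the helper name `read_records` is made up.

```perl
use strict;
use warnings;

# Read every record from a string, using the given value of $/.
sub read_records {
    my ($data, $separator) = @_;
    open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
    local $/ = $separator;
    my @records = <$fh>;
    close $fh;
    return @records;
}

# "\0" is a one-character string: records end at each NUL byte.
my @nul = read_records("a\0bb\0ccc", "\0");
print scalar(@nul), "\n";    # 3

# \4 is a reference to a positive integer: fixed-length records of 4 bytes.
my @fixed = read_records("abcdefghij", \4);
print join('|', @fixed), "\n";    # abcd|efgh|ij
```

In the first case the separator stays attached to each record, as with ordinary newline-terminated lines; in the second the final record is simply whatever is left over.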

## Perl 5.6 New Feature List

Jeff Okamoto asked if there was one. Nobody said 'yes', so the answer is probably 'no'.

Hmm, it just occurred to me that I'm now the logical person to write up such a list for 5.7 and beyond. But I didn't start doing this job soon enough to be able to do it for 5.6. If people will send me their feature lists, I will collate them and go over perldelta and try to come up with the canonical list.

## Various

A large collection of bug reports, bug fixes, non-bug reports, questions, answers, and a small amount of flamage. (No spam this week.)

Until next week I remain, your humble and obedient servant,

Mark-Jason Dominus

## This Week on p5p 1999/10/31

I'm sorry that this report is late, but I had some serious hardware trouble at home and couldn't work on the report until I fixed my computer. Fortunately traffic was light this week.

### Notes

It is hard to keep track of everything that happens. As before, please let me know if you have any corrections or additions. Send them to mjd-perl-thisweek-YYYYMM@plover.com where YYYYMM is the current year and month.

You can subscribe to an email version of this summary by sending an empty message to p5p-digest-subscribe@plover.com.

### glob case-sensitivity

This discussion continued from last week. Paul Moore said that he would try to resolve some of the issues with the new built-in globber under Windows. ( \ vs. /, what to do when the underlying filesystem is case-insensitive, etc.) Read about it.

The issues seemed to get thornier and thornier. For example, what do you do with glob("C:*")? On Unix systems, you would like it to look in the current directory for files beginning with C:. But on Win32 systems you would like it to look on the disk labeled C. Nevertheless Paul submitted a partial patch.

I looked for a remark from Sarathy, but I did not see one.

### Perl under UNICOS

Jarkko has been making sure that Perl works on UNICOS, which I gather is a version of Unix that runs on Crays. But his Cray is going away, and he needs someone else to take over, or to give him access to a UNICOS machine. If you can do this, please contact him. If you don't know how to contact him, contact me.

### Threading and explicit unlocking

Last week's discussion of the proposed perlthread man page split into two interesting digressions. This is the first one: At present, a lock is released when control leaves the dynamic scope in which it was first obtained.

Usually this is what you want and takes care of releasing locks at the right times. Tuomas Lukka suggested that there also be an explicit unlock function for releasing a lock prematurely.

Sarathy said he'd prefer an interface that lets you store a lock into a variable as if it were an object; then its release semantics would be the same as for any other value. It would be released when the variable was destroyed, whether that was at the end of the block or by an explicit undef. Read about it.
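The release semantics Sarathy describes are those of any Perl object with a destructor. The sketch below is not a thread lock at all; `ScopedLock` is a hypothetical class that merely records which names are "held", to show when the release would happen.

```perl
use strict;
use warnings;

package ScopedLock;    # hypothetical class, not a real module

my %held;    # resource name => 1 while the lock object is alive

sub new {
    my ($class, $name) = @_;
    die "$name is already locked\n" if $held{$name};
    $held{$name} = 1;
    return bless { name => $name }, $class;
}

sub is_held {
    my ($class, $name) = @_;
    return $held{$name} ? 1 : 0;
}

# Runs when the last reference to the object goes away.
sub DESTROY {
    my $self = shift;
    delete $held{ $self->{name} };
}

package main;

{
    my $lock = ScopedLock->new('queue');
    print ScopedLock->is_held('queue'), "\n";    # 1
}    # $lock goes out of scope here, so the lock is released
print ScopedLock->is_held('queue'), "\n";        # 0

my $lock = ScopedLock->new('queue');
undef $lock;                                     # explicit early release
print ScopedLock->is_held('queue'), "\n";        # 0
```

With this shape, the end-of-block release and the explicit release are the same mechanism, which is the point of the suggestion.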

Rob Cunningham reports that he and Brian Mancuso at MIT are working on fixing regexes, which do not always work properly in threaded Perl. This is obviously very important. One issue is the global variables like $1. If two threads try to write into $1 simultaneously, the result is backreference goulash. But there are a huge bunch of other global variables used internally by the regex engine for storing the current state and for getting the /egismosx flags from Perl and so on. All of these present thread hazards.

Rob: Brian reports that perl REGEXP code is nasty stuff, or we'd be done by now.

Ilya said that he was also planning on removing most of the internal global variables when he gets some time.

Mike Guy pointed out that this problem also occurs when you are trying to write a function that behaves like rand: The prototype of rand is supposedly ($), but if you create a function myrand with that prototype, then print myrand, myrand; aborts with a syntax error although print rand, rand; works. Prototypes were added to Perl so that user functions could get the syntax benefits that the built-in functions enjoyed. But some functions still can't be imitated with prototypes. In addition to ref and rand, neither printf nor tie can be so imitated.

### $^O

Andy Dougherty is patching Configure to have it find out what sort of Linux it is running on, if it is running on Linux. This might solve Tom's problem from last week.

### sort improvements

Peter Haworth submitted an improved version of his patch for sort. He says he has benchmarked the new sort with several trivial comparator functions and performance is not bad at all. (If it were slower, you would expect to see the greatest difference with a trivial comparator.) You still cannot use an XSUB as a sort comparator function, but Peter is working on that. Reread what I said last week.

### Shell.pm enhancements.

[Shell.pm] presently lets you write a function call echo("hello", "world!") and if there is no echo function already defined, it will invoke the shell's echo command. It also has a new constructor that returns a reference to a function that invokes a shell command. Jenda wants to be able to give the constructor some extra parameters to tell it to throw away the STDERR and to be able to pre-supply arguments to the function.

### Time Zone Output

Todd Olson complained that there was no easy way to obtain the current time zone in numeric format. (For example, -0400 instead of EDT or -0800 instead of PST.) He points out that it would be wasteful to write a function to compute this value: The value must be inside there somewhere already, because it is used to compute localtime(). Todd wants someone to add another %-escape to the POSIX module's strftime function that will format and display the time zone in numeric format. However, he did not provide a patch.
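For comparison, this is roughly the kind of function Todd would rather not have to write. It is a sketch, not a proposal from the thread: it recovers the offset by feeding the broken-down local time back through timegm from the standard Time::Local module and subtracting the original epoch value. The names `format_offset` and `numeric_zone` are invented.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Turn an offset in seconds east of UTC into the form +hhmm or -hhmm.
sub format_offset {
    my $seconds = shift;
    my $sign = $seconds < 0 ? '-' : '+';
    my $abs  = abs $seconds;
    return sprintf '%s%02d%02d', $sign, int($abs / 3600), int(($abs % 3600) / 60);
}

# Offset of local time from UTC at the given epoch time (default: now).
sub numeric_zone {
    my $when  = @_ ? shift : time;
    my @local = localtime $when;
    # Treat the local wall-clock fields as if they were UTC; the difference
    # from the real epoch value is the zone offset in seconds.
    my $offset = timegm(@local[0 .. 5]) - $when;
    return format_offset($offset);
}

print format_offset(-4 * 3600), "\n";    # -0400
print numeric_zone(), "\n";              # depends on the local time zone
```

The round trip through timegm is exactly the redundant work Todd objects to, since the C library already knows the offset when it computes localtime().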

### Python Consortium Forms

Randal Schwartz reposted an announcement about a new Python Consortium.

### Yikes

Sarathy did not say `yikes' this week.

### Various

A large collection of bug reports, bug fixes, non-bug reports, questions, answers, and a small amount of flamage and spam.

Until next week I remain, your humble and obedient servant,

Mark-Jason Dominus
