This Week on p5p 1999/11/07

This week’s report is a little early because I had to go away to the LISA conference.

You can subscribe to an email version of this summary by sending an empty message to p5p-digest-subscribe@plover.com.

Please send corrections and additions to mjd-perl-thisweek-YYYYMM@plover.com where YYYYMM is the current year and month.

Discussion continued on what meanings to assign to certain patterns on DOSISH systems. For example, with the regular glob(), a backslash is an escape character. But you know that folks on DOSISH systems are going to want to write glob('foo\\bar') to have it look in directory foo for file bar. Paul’s conclusion: \ in a glob pattern on a DOSISH system only behaves like a metacharacter when it precedes another metacharacter; otherwise it’s a directory separator.

D'oh::Year and Y2K warnings

Michael Schwern felt that the Y2K warnings in Perl are too little too late. (Too little: 5.005_62 warns if you try to concatenate a number with the string "19". Too late: Look! It is November of 1999.) He submitted a module, tentatively named D'oh::Year, which follows this strategy: It overrides the localtime and gmtime functions so that they return a year value that looks like a number but is actually an overloaded object. When this object is concatenated with any of the strings "19", "20", or "200" or has any of several other suspicious operations performed on it, it issues a warning. It does this without any core patches. It is available from Michael’s web site and probably also from CPAN. (This idea was originally suggested by Andrew Langmead.) Read about it.
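Here is a minimal sketch of the overloading trick described above. It is not Michael's code: the package name SuspiciousYear, its constructor, and the exact warning text are made up for illustration, and it only checks concatenation, not the other suspicious operations.

```perl
package SuspiciousYear;    # hypothetical name, not the real D'oh::Year
use strict;
use warnings;
use overload
    '0+'     => sub { $_[0]{year} },    # behaves like a plain number
    '""'     => sub { $_[0]{year} },    # ... and like a plain string
    '.'      => \&concat,               # but concatenation is inspected
    fallback => 1;

sub new {
    my ($class, $year) = @_;
    return bless { year => $year }, $class;
}

sub concat {
    my ($self, $other, $swapped) = @_;
    # $swapped is true when the object is on the right: "19" . $year
    if ($swapped && $other =~ /(?:19|20|200)\z/) {
        warn "Possible Y2K bug: year appended to '$other'\n";
    }
    return $swapped ? $other . $self->{year} : $self->{year} . $other;
}

package main;

# Collect warnings in an array so the effect is easy to inspect.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $year  = SuspiciousYear->new((gmtime(0))[5]);   # years since 1900
my $wrong = "19" . $year;      # the classic mistake: triggers the warning
my $right = $year + 1900;      # the correct idiom: stays silent

print "wrong=$wrong right=$right warnings=", scalar(@warnings), "\n";
```

A real module would also have to replace localtime and gmtime themselves so that they hand back such objects, which this sketch leaves out.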

Sarathy said that he would not include it in the standard distribution, except perhaps as part of B::Lint.

Threading and Regexes

Last week I reported on a problem with Perl regexes under threading but I got the technical details wrong. Please ignore it, because it has no relation to reality.

Actually I understood the problem better than I thought, because it is a very long-standing problem with regexes that shows up in many places, not just under threading.

Basically, the problem is that certain properties of regexes are attached to the match node in the op tree, which effectively means that they are associated with the lexical appearance of the match operation in the source code. What does that mean? Here is a simple example:

 sub tryme {
   my $string = shift;
   return unless $string =~ /(.)/;
   print "$1";
   tryme(substr($string, 1));
   print "$1";
 }

 tryme('abc');

tryme is invoked three times, and you would expect each invocation to have a separate pattern match with separate backreference variables. There is no reason to expect the value of $1 to be changed by the function call, so you would expect the two print statements in the function to print the same thing each time the function was invoked, so that you would get abccba. But instead, the backreference variables are attached to the regex match operator, and that operator is shared among all calls to the subroutine. This means that a later call to tryme overwrites the $1 of the earlier call, and the output is abcccc. Ugh. The threading problem is similar: Two threads can trample on one another's backreference variables for the same match operator. I had complained about a related problem almost two years ago, and it was well-known then.

Sarathy expressed sadness that this problem has gone unfixed for so long. The correct fix is for the op node to store an offset into the pad, which is private to each instance of a function invocation and is not shared between calls to the same function or between threads. Then the pad will have the pointer to the backreference variables or whatever.

I wrote to the MIT folks to ask what exactly they were doing, but they did not reply.

Change to xsubpp

Ilya submitted a patch to xsubpp which will change the value return semantics of XSUBs to be more efficient. It needs wide testing because almost all XSUBs will be affected.

utf8 Needs a New Pumpking

Nick Ing-Simmons no longer has time to be responsible for utf8. If you want to be the new utf8 king, send mail to Sarathy.

STOP blocks

Sarathy put them in. Details were in an earlier report.

map and grep in void context

Simon Cozens submitted a patch that would issue a warning on grep and map in void context. Larry said he didn’t want to do that; he thought that it would be better to propagate the void context into the code block so that the usual useless use of ... in void context messages would appear.

Larry: The argument against using an operator for other than its primary purpose strikes me the same as the old argument that you shouldn’t have sex for other than procreational purposes. Sometimes side effects are more enjoyable than the originally intended effect.

Data::Dumper and Regexp objects

Michael Fowler submitted a patch to make Data::Dumper work on Regexp objects. (Those are the ones generated by the qr// operator.) There was some discussion of problems with these kinds of objects: They’re hard to recognize; they stringize in a strange way so that, unlike other sorts of references, you can’t be sure you can tell them apart by looking at the stringized versions; and so on. Sarathy said he was uncomfortable with the implementation of the Regexp objects, and that they should be more like regular objects so that they would be easier to understand and so that they could be dealt with just like other objects. Larry agreed, and added that exceptions and filehandles should work that way too. However, there was no specific proposal about what should be done.

sort improvements

Peter Haworth improved his patch to allow XSUBs to be used as sort comparator functions. If the comparator is prototyped as ($$), then the list elements are passed normally, in @_, instead of as $a and $b. Read about it.
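To show what the ($$) prototype changes, here is a small sketch using an ordinary Perl subroutine rather than an XSUB. The name by_length and the word list are invented for the example; it assumes a Perl in which sort honors the ($$) prototype for named comparators.

```perl
use strict;
use warnings;

# Because of the ($$) prototype, sort passes the two elements being
# compared in @_, as in a normal call, instead of setting $a and $b.
sub by_length($$) {
    my ($left, $right) = @_;
    return length($left) <=> length($right)
        || $left cmp $right;    # break ties alphabetically
}

my @sorted = sort by_length qw(pear fig banana kiwi);
print "@sorted\n";
```

A comparator written this way never touches the package variables $a and $b, so it does not care which package sort was called from.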

IPv6 and Socket.pm

Warren Matthews has a version of Socket.xs that contains functions for interconverting integers and IPv6 addresses (which are normally written as eight colon-separated groups of four hexadecimal digits, in the form xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx). He asked if there would be interest in adding these to the standard distribution, but nobody replied, so I guess there wasn’t any.

New quotation characters

Larry suggested that when Perl is Unicode-enabled, it could deduce from the Unicode character database which characters are parentheses, and then from the character names what the corresponding closing parenthesis is for any given open parenthesis. Having done that, it could then understand any sort of parentheses at all as delimiters in the q and qq operators. For example, the TIBETAN LEFT BRACE and TIBETAN RIGHT BRACE characters would be recognized as a matching pair. Larry said that brackets should do what people expect, and that people expect them to match.

Alternatively, people could declare their parenthesis characters, and then use U+261E and U+261C as parentheses.

Larry: Down that path lies jollity.

system 1, foo

Apparently on most non-unix Perl systems, if you invoke system 1, foo, it runs foo in the background, and returns the process ID number instead of the exit status of foo. Ilya suggested implementing this on Unix also.

Sarathy said it would be better to have a modular interface to that functionality, and he did not want to propagate this hack to any more systems. “The concept is portable, but the incantation is not. What it needs is some Perl code to smooth it over.”

Jenda Krynicky pointed out the enhancement he proposed to Shell.pm last week would be easy to extend to support this cleanly. In case you forgot.
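As a rough idea of what "some Perl code to smooth it over" could look like, here is a sketch of a wrapper that starts a command in the background and returns its process ID. The function name spawn_bg is hypothetical and is not part of Shell.pm or any proposal in the thread; the sketch assumes that only Win32 should take the system 1, ... path and that everything else can fork.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run @command in the background, return its PID.
sub spawn_bg {
    my @command = @_;

    # Assumption: on Win32, system 1, LIST spawns without waiting.
    return system 1, @command if $^O eq 'MSWin32';

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: replace ourselves with the command, bypassing the shell.
        exec { $command[0] } @command
            or POSIX::_exit(127);    # exec failed, leave quietly
    }
    return $pid;                     # parent: do not wait
}

# Usage: start another perl that exits with status 3, then collect it.
my $pid = spawn_bg($^X, '-e', 'exit 3');
waitpid($pid, 0);
my $status = $? >> 8;
print "child $pid exited with status $status\n";
```

The caller gets a process ID back immediately on either kind of system and can decide later whether to wait for the child.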

MacPerl Error Messages

MacPerl had been formatting error messages like this:

 # Syntax Error
 File "script.plx"; Line 46

instead of like this:

 Syntax error in script.plx, line 46

because some Mac programming tool can parse the first form and open the file automatically with the appropriate line highlighted. Matthias Neeracher agreed to withdraw this change; I am not exactly clear on why, and I could not find the original patch.

pack t Template

More discussion of Ilya’s proposed t template for the pack and unpack functions.

Last week Joshua Pritikin asked why not just use Storable; Ilya pointed out that different kinds of marshalling code are useful under different circumstances, and Storable is not useful, for example, if you are trying to marshal data to be passed as an argument to a command. However, you could use his new pack template to marshal data into a string, pass it as a command argument, and have the command unpack it with unpack 't' on the other end. Ilya’s explanation of the usefulness of pack 't'.

New perlthread man page

Dan Sugalski updated his proposed perlthread man page.

Big Files

Christopher Masto reported that the stat function does not work properly on files larger than 2GB, because the size is stored in the usual signed 32-bit integer. Jarkko said that 5.6 (and current development versions) will have better support for this, but that you will have to enable the support at compile time.

localtime is Broken

Someone submitted another bug report because localtime returns the wrong month.
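The report itself is not quoted here, but complaints of this kind presumably come from forgetting that localtime numbers months from 0 and returns the year as an offset from 1900. The sketch below uses gmtime, which returns fields in the same layout, so that the result does not depend on the local time zone.

```perl
use strict;
use warnings;

# Fields 3, 4, 5 are day of month, month (0..11), and year minus 1900.
my ($mday, $mon, $year) = (gmtime(0))[3, 4, 5];    # the Unix epoch

# Adjust both fields before showing them to a human.
my $date = sprintf "%04d-%02d-%02d", $year + 1900, $mon + 1, $mday;
print "raw month=$mon raw year=$year formatted=$date\n";
```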

Record Separators that Contain NUL

Sarathy put in a patch so that $/ could contain the NUL character, "\0". This would probably have passed without comment, except that Jeff Pinyan followed up with a question about setting $/ to \0 (that is, a reference to the constant integer zero) and said it was on the same topic, even though it was not. Enough people were confused by this that the discussion went on twice as long as it should have, with some people talking about "\0" and others discussing \0. Anyway, the answer is that Perl only goes into fixed-length record mode if $/ is a reference to a positive number.
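The difference between the two spellings is easier to see side by side. This sketch reads the same data twice, once with $/ set to the string "\0" and once with $/ set to a reference to a positive number. It assumes a Perl that supports in-memory filehandles, which is newer than the versions discussed here; the sample data is made up.

```perl
use strict;
use warnings;

my $data = "alpha\0beta\0gamma\0";

# "\0" is a one-character string: each record ends at a NUL.
my @nul_records;
{
    local $/ = "\0";
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    @nul_records = <$fh>;
    chomp @nul_records;    # chomp removes the current $/, here the NUL
    close $fh;
}

# \4 is a reference to the number 4: read fixed-length 4-byte records.
my @fixed_records;
{
    local $/ = \4;
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    @fixed_records = <$fh>;
    close $fh;
}

print scalar(@nul_records), " NUL-terminated records, ",
      scalar(@fixed_records), " fixed-length records\n";
```

The last fixed-length record is short because the 17 bytes of data do not divide evenly by four.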

Tom: [Fixed-length record mode is] yet another special-case exception requiring an “oh by the way” in the documentation.

Dan Sugalski: Yeah, but if we took ‘em all out we’d be left with 37 pages of documentation and C with morphing scalars.

Then there was a digression: Nat Torkington suggested that if $/ was set to a code ref, Perl could run the code whenever you did a <...> read, and yield the return value from the code, instead of doing what it would normally do. Nat's example was:

 # all filehandles now autochomp
 $/ = sub { my $x = CORE::readline(shift); chomp $x; return $x };

(He said $\ instead of $/, but that was a mistake.)

Several people got very excited about this, but Sarathy pointed out that it would be more straightforward to just override CORE::GLOBAL::readline(), and that he did not want to provide more than one way to do something that hardly anyone ever wants to do anyway.
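For comparison, here is a sketch of the override Sarathy had in mind, doing the same autochomp as Nat's example. The context handling and the in-memory test data are my own additions, and the in-memory filehandle assumes a newer Perl than the ones discussed here.

```perl
use strict;
use warnings;

BEGIN {
    # Install at compile time so that readline calls compiled after
    # this point use the replacement.
    *CORE::GLOBAL::readline = sub {
        my $fh = shift;
        if (wantarray) {                    # list context: all lines
            my @lines = CORE::readline($fh);
            chomp @lines;
            return @lines;
        }
        my $line = CORE::readline($fh);     # scalar context: one line
        chomp $line if defined $line;
        return $line;
    };
}

my $text = "first\nsecond\nthird\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my $first = readline($fh);
my @rest  = readline($fh);
close $fh;

print "[$first] [@rest]\n";
```

An override like this is global: it affects every readline compiled after it, in every package, which is a good reason to use it sparingly.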

But Larry expanded on the general idea, saying that there should be a general, lightweight way to insert various kinds of read and write disciplines into an I/O stream. The most important uses for this would involve having the I/O operators convert from UTF-8 to national character sets transparently, and vice versa. Larry: “This is something we have to make easy in Perl. Not just possible.”

Perl 5.6 New Feature List

Jeff Okamoto asked if there was one. Nobody said ‘yes’, so the answer is probably ‘no’.

Hmm, it just occurred to me that I’m now the logical person to write up such a list for 5.7 and beyond. But I didn’t start doing this job soon enough to be able to do it for 5.6. If people will send me their feature lists, I will collate them and go over perldelta and try to come up with the canonical list.

Various

A large collection of bug reports, bug fixes, non-bug reports, questions, answers, and a small amount of flamage. (No spam this week.)

Until next week I remain, your humble and obedient servant,


Mark-Jason Dominus
