These Weeks on p5p 2000/10/23

Oct 23, 2000 by Simon Cozens

Notes
What is our Unicode model?
Why not use sfio?
More than 256 Files / sysopen
What if cc changes?
Unicode on EBCDIC
AIX Is Confused
Unicode split fixed!
Fixes to Carp
open() might fail
Perl 5 is 5!
Flip-flop bug
Exiting a subroutine via next strikes again!
New constant sub mechanism
Regex segfaulting
use vars lets you do naughty things
Various

Notes

You can subscribe to an email version of this summary by sending an empty message to p5p-digest-subscribe@plover.com.

Please send corrections and additions to simon@brecon.co.uk.

For family reasons, I wasn’t able to get the summary out last week; please note that this is a special bumper double issue, covering the last two weeks.

What is our Unicode model?

The big debate these past couple of weeks has been what our Unicode model actually means: what use bytes should really do, whether it’s all right to upgrade values to UTF8 without telling anyone, and so on. If you’re at all interested in how Perl’s Unicode strategy works - or, as some would claim, doesn’t - take a look at this thread. There were nearly a hundred messages in the thread, and it went off into discussion of line disciplines, the mechanism by which Unicode data will be read from external sources such as files. The thread also sparked a bit of private email, and the following things were resolved:

It doesn’t matter if data gets upgraded to UTF8 internally; if there is a place where it does matter, that’s a bug.
Line disciplines need to happen before anything else.
use bytes does make sense. Honest.
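
To make the characters-versus-bytes distinction concrete, here is a minimal sketch of what use bytes changes for length. The variable names are mine, not from the thread, and the example assumes a Perl recent enough to have chr return wide characters.

```perl
use strict;
use warnings;

# U+263A is a single character, but takes three bytes when stored as UTF-8.
my $smiley = chr(0x263A);

# Default (character) semantics: length counts characters.
my $chars = length($smiley);

# Under "use bytes", length counts the bytes of the internal representation.
my $bytes;
{
    use bytes;
    $bytes = length($smiley);
}

print "characters: $chars, bytes: $bytes\n";
```

The pragma is lexically scoped, so only the inner block sees byte semantics.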

A lot of the thread was taken up with technical details about how line disciplines should be implemented; having line disciplines will mean we’ll almost certainly need our own version of stdio or something very like it.

Why not use sfio?

The obvious alternative to implementing our own version of stdio is, of course, to steal someone else’s, and so the Nicks (Clark and Ing-Simmons) began looking at AT&T’s sfio library.

sfio claims to do what we want from it, and its developers (David Korn - yes, him - and Phong Vo) posted to p5p to reassure us that it would be suitable, and that the new license allows it to be distributed with Perl; so it looks like that’s a very big possibility.

It will need to be ported to VMS and some other platforms, but between us and the sfio guys, this shouldn’t be a problem, and it would benefit all the other sfio users out there too. Let’s get to it!

More than 256 Files / sysopen

Piotr Piatkowski reported that you can’t open more than 256 files on Solaris. Well, that turns out to be standard Solaris behaviour, but you should be able to get around it with sysopen, since that calls the underlying operating system’s open directly, right? However, it turns out that isn’t the case - sysopen actually calls fopen, which is wrong! A bug report wasn’t opened for this, and there wasn’t a fix.
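
For anyone who hasn't used it, this is roughly what the sysopen route looks like. It is only a sketch of the documented interface, with a hypothetical file name in a temporary directory; it says nothing about which C-level call ends up being made underneath, which is exactly what the report was about.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_TRUNC O_RDONLY);
use File::Temp qw(tempdir);

# Work in a scratch directory; the file name is made up for the example.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/example.txt";

# Open for writing with explicit open(2)-style flags and permissions.
sysopen(my $out, $path, O_WRONLY | O_CREAT | O_TRUNC, 0644)
    or die "sysopen $path for writing failed: $!";
my $written = syswrite($out, "hello\n");
defined $written or die "syswrite failed: $!";
close($out) or die "close failed: $!";

# Read it back with unbuffered sysread.
sysopen(my $in, $path, O_RDONLY)
    or die "sysopen $path for reading failed: $!";
my $got = '';
defined sysread($in, $got, 100) or die "sysread failed: $!";
close($in);

print "wrote $written bytes, read back: $got";
```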

What if cc changes?

Jarkko noted that since people are now shipping Perl with their commercial Unix distributions, the compilers they compile Perl under - say, Sun cc - may well not be the compiler that the user ends up with - typically, something like FSF gcc. This causes problems when adding CPAN modules - Sarathy suggested the answer was to hack the Config.pm, and also to have packaged module distributions (like the module .debs and .rpms). ActiveState’s PPM was suggested as a natural way to do this, and it has recently been released onto CPAN.
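
If you want to see whether you're affected, a small sketch like the following shows the compiler recorded in Config.pm and checks whether it can be found on the current PATH. The PATH lookup is my own rough heuristic, not anything proposed on the list, and it may miss compilers on platforms that need a file extension.

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# The compiler and flags recorded when this perl was built.
my $cc      = $Config{cc};
my $ccflags = defined $Config{ccflags} ? $Config{ccflags} : '';
print "perl was built with cc='$cc'\n";
print "ccflags: $ccflags\n";

# Rough check: is the first word of $Config{cc} an executable we can find?
my ($cc_prog) = split ' ', $cc;
$cc_prog = '' unless defined $cc_prog;

my $found = 0;
if ($cc_prog ne '') {
    if (File::Spec->file_name_is_absolute($cc_prog)) {
        $found = -x $cc_prog ? 1 : 0;
    } else {
        $found = grep { -x File::Spec->catfile($_, $cc_prog) } File::Spec->path;
    }
}
print $found ? "compiler found\n" : "compiler not found on PATH\n";
```

XS modules from CPAN are built with whatever Config.pm says, so a "not found" here is a hint that they won't compile without some adjustment.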

Unicode on EBCDIC

Perl 5.6 on EBCDIC is slowly getting together; Ignasi Roca noticed a bug, and attempted a fix, but Peter Prymmer and I were already on to it. Peter also contributed a lot of miscellaneous patches to places where the test suites implied ASCII.

AIX Is Confused

Guy Hulbert helped us get AIX on its feet after it seemed to be having some difficulties about the contents of struct tm. Merijn then contributed README.aix, which had been mysteriously left unwritten. Guy also identified a few more issues, which I believe have been fixed but I’m not sure.

Unicode split fixed!

Last time, I reported problems with split and Unicode; the good news is, Jarkko has fixed it. I also contributed a fix to the wide-character ~ issue identified last time, which didn’t work on 64-bit operating systems. Jarkko and I simultaneously discovered this was a bug in pp_chr, which was restricted to 32-bit values because it took a U32 off the stack - when you’re dealing with characters these days, anything up to and including a UV is fair game.

In other Unicode news, there was a bug in which doing something like "$1$utf8" caused a read-only variable error; Jarkko fixed it, and Nicholas Clark produced some tests.
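
The interpolation case is easy to try for yourself. This is a minimal sketch of the kind of expression involved, with made-up data; on a fixed Perl it should simply produce a two-character string.

```perl
use strict;
use warnings;

# A string holding one wide character.
my $utf8 = chr(0x263A);

my $joined;
if ("abc" =~ /(b)/) {
    # $1 is a read-only variable; joining it with a UTF8 string
    # is the situation the bug report was about.
    $joined = "$1$utf8";
}

print "length of joined string: ", length($joined), "\n";
```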

Fixes to Carp

Ben Tilly produced a big patch to address some errant and confusing behaviour in Carp; unfortunately, this appeared to cause more of the test suites to fail than before. The patch hasn’t been revisited as far as I can tell, so someone might want to take a look at it.

open() might fail

Jarkko was evidently in “astute” mode this fortnight, asking “what to do when open/fopen/fdopen fail?”. This potentially interesting thread got bogged down by a lot of language-lawyering about errno. This usually happens with errno.
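
From the user's side of the fence, the usual advice still applies: check the return value of open and look at $! straight away, before anything else has a chance to change it. A small sketch, using a path that is deliberately missing:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# A path inside a fresh temporary directory, so it cannot exist yet.
my $dir     = tempdir(CLEANUP => 1);
my $missing = "$dir/no-such-file";

my ($ok, $errno_num, $errno_str);
if (open(my $fh, '<', $missing)) {
    $ok = 1;
    close($fh);
} else {
    # Capture $! immediately: later system calls may overwrite it.
    $errno_num = $! + 0;    # numeric errno
    $errno_str = "$!";      # human-readable message
    $ok        = 0;
}

print $ok ? "opened\n" : "open failed: $errno_str (errno $errno_num)\n";
```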

Perl 5 is 5!

Perl 5 became 5 years old sometime last week. I say “sometime” because there was a little debate as to when birthdays are celebrated with respect to timezone. Yes, I thought it was incredible, too, but here’s the original release message as kept by Jeff Okamoto.

Flip-flop bug

Jeff Pinyan noticed that the flip-flop operator in scalar context (if 1...20 and so on) has interesting problems when there isn’t an open filehandle. I fixed this, badly, but Hugo fixed it tidily; Mark-Jason Dominus called for a documentation patch, which hasn’t appeared yet. If someone wants to read the thread and make the suggested change, that would be lovely!
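
In case the construct is unfamiliar: with constant operands, the scalar-context range operator compares each operand against $., the current input line number. Here is a sketch of the normal, working case, reading from an in-memory filehandle with made-up data; it doesn't attempt to reproduce the no-filehandle problem.

```perl
use strict;
use warnings;

# Five lines of sample input, read through an in-memory filehandle.
my $text = join '', map { "line $_\n" } 1 .. 5;
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";

my @picked;
while (my $line = <$fh>) {
    # Constant operands are compared against $. so this is true
    # from input line 2 through input line 4 inclusive.
    push @picked, $line if 2 .. 4;
}
close($fh);

print @picked;
```

Without a filehandle that has been read from, $. has nothing sensible to hold, which is where the trouble starts.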

Exiting a subroutine via next strikes again!

Alexander Farber found a small File::Find example which caused segfaults. There’s a workaround in place in the current development Perl, but the fact remains that exiting a subroutine via next can, and does, cause segfaults. This would be a good thing for someone who’s feeling bold to investigate.
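
For reference, this is the shape of the construct in question, alongside a tamer alternative. It is my own toy example, not Alexander's File::Find case; the subroutine names are hypothetical, and the "exiting" warning is switched off in the first sub only so the sketch runs quietly.

```perl
use strict;
use warnings;

# The pattern under discussion: "next" inside a sub jumps to the next
# iteration of a loop in the caller.
my @via_next;
for my $n (1 .. 4) {
    skip_odd_with_next($n);
    push @via_next, $n;
}

sub skip_odd_with_next {
    no warnings 'exiting';    # silence "Exiting subroutine via next"
    next if $_[0] % 2;
    return;
}

# A safer shape: the sub returns a verdict and the loop does the "next".
my @via_return;
for my $n (1 .. 4) {
    next if is_odd($n);
    push @via_return, $n;
}

sub is_odd { return $_[0] % 2 }

print "via next:   @via_next\n";
print "via return: @via_return\n";
```

With File::Find, the equivalent of the second form is simply to return from the wanted callback.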

New constant sub mechanism

John Tobey came up with a reimplementation of newCONSTSUB. What’s that, then? Well, when you use constant or create a sub like sub foo () { 42 }, Perl optimises this and creates a special subroutine which is inlined at compile time. His new version seems to save about 70% of space, but of what space and under what conditions I couldn’t tell you. This one, however, also optimises sub () { return 42; } by creating a “constant” flag for a CV. Adding the extra flag meant that things like dump.c and B::* had to be updated, which John duly did. Nice, thorough job.
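
Here are the two spellings of a constant sub side by side, as a minimal sketch. The names ANSWER and answer are mine; B::Deparse is used only to let you look at what the compiler made of the expression, and its exact output varies between Perl versions, so I won't promise what it prints.

```perl
use strict;
use warnings;
use B::Deparse;

# Two ways to get a constant sub: the constant pragma, and an
# empty-prototype sub whose body is a single constant value.
use constant ANSWER => 42;
sub answer () { 42 }

# Both are called without parentheses or arguments.
my $sum = ANSWER + answer;
print "sum: $sum\n";

# Show the compiled form of a sub that uses both constants.
my $deparsed = B::Deparse->new->coderef2text(sub { ANSWER + answer });
print "compiled form: $deparsed\n";
```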

Regex segfaulting

There’s been rather a lot of segfaults this time. Marc Lehmann noticed one with regexps. Andreas did the regression testing, and Hugo fixed it; Ilya came in to help explain the details and the pitfalls.

use vars lets you do naughty things

Robert Spier was doing a bit of bug archaeology, and found that use vars lets you do naughty things; in particular, it lets you create variables like $f; that you can’t easily access. He provided a patch which caused use vars when combined with use strict q/vars/ to croak on unreasonable variable names. This raises interesting philosophical issues - should we be allowed to say

use vars q($x $y;$z);

What do you think?

Various

In other news, the usual collection of bug reports, bug fixes, non-bug reports, questions, answers, and no spam. No flamage; or at least, nothing memorably amusing.

Until next week I remain, your humble and obedient servant,

Simon Cozens
