This Week on p5p 2001/01/14

Jan 15, 2001 by Simon Cozens

Notes
Excising sigsetjmp
Benchmarking
UTF8 Heroism
Cygwin versus Windows
Lvalue lists
Linux large file support
Calls for papers
Various

Notes

You can subscribe to an email version of this summary by sending an empty message to p5p-digest-subscribe@plover.com.

Please send corrections and additions to perl-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year and month.

Excising sigsetjmp

Alan Burlison found that by telling Perl that Solaris didn’t really have sigsetjmp, he could get a noticeable improvement in speed - around 15%. He asked if sigsetjmp could go, or whether there was a reason for it being there. Andy said there was a reason, but he had forgotten what it was. Nick I-S asked if there was anything that sigsetjmp was absolutely required for - the answer, from Alan, was that sigsetjmp restores the signal mask after a jump. In Perl terms, this means that if you die from a signal handler into an eval (something you’d be doing with alarm, for instance) then you can be sure the signal is unblocked again afterwards, so your handler will still fire the next time it arrives. With ordinary setjmp you might get your signal mask restored, but you might not.

There was some discussion as to whether it would be possible to only use sigsetjmp to jump into and out of a signal handler, but Nicholas Clark pointed out that since any Perl subroutine could be a signal handler, it’s more or less impossible to make a distinction. The eventual consensus was that Perl’s signal handling is currently so, uh, sub-optimal, that it probably wouldn’t make that much of a difference if sigsetjmp were removed.

In the end, Nick Ing-Simmons came up with a patch which provided roughly sigsetjmp-like semantics with ordinary setjmp, so it looks like there might be a win there.
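For readers who have not met the distinction before, here is a minimal C sketch of the mask-restoring behaviour Alan described. It is not Perl's code and not Nick's patch; the names on_usr1 and blocked_after_jump are made up for illustration, and SIGUSR1 stands in for the alarm signal. It assumes a single-threaded POSIX program, where raise() runs the handler before returning.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical names throughout; this is an illustration, not Perl source. */
static sigjmp_buf jump_point;

/* "Die" out of the handler by jumping back to the saved context, roughly
 * what happens when a Perl signal handler dies into an enclosing eval. */
static void on_usr1(int signo)
{
    (void)signo;
    siglongjmp(jump_point, 1);
}

/* Jump out of a SIGUSR1 handler and report whether SIGUSR1 is still
 * blocked afterwards: 1 = blocked, 0 = unblocked, -1 = setup error.
 * save_mask is handed straight to sigsetjmp as its second argument. */
int blocked_after_jump(int save_mask)
{
    struct sigaction sa, old_sa;
    sigset_t usr1, old_mask, now;
    int blocked;

    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* SIGUSR1 is blocked while its handler runs */
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0)
        return -1;

    /* Start with SIGUSR1 unblocked, remembering the caller's mask. */
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    if (sigprocmask(SIG_UNBLOCK, &usr1, &old_mask) != 0)
        return -1;

    if (sigsetjmp(jump_point, save_mask) == 0)
        raise(SIGUSR1);         /* handler jumps back; raise never returns here */

    /* Back from the handler: query the current mask without changing it. */
    sigprocmask(SIG_SETMASK, NULL, &now);
    blocked = sigismember(&now, SIGUSR1);
    assert(blocked == 0 || blocked == 1);

    /* Put the caller's mask and handler back. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGUSR1, &old_sa, NULL);
    return blocked;
}
```

Calling blocked_after_jump(1) asks sigsetjmp to save the mask, so the signal comes back unblocked; calling blocked_after_jump(0) skips the save, and the signal stays blocked as the handler left it. That second case is the cheaper one, which is presumably where the speedup comes from.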

Benchmarking

Alan did some more fiddling with optimization and Solaris configuration, and managed to get what he claimed was a 30% overall speedup - 18% due to setjmp and 12% due to optimizer settings. Numbers like that immediately sparked a debate on how you can conceivably benchmark a programming language manually; it’s well known that the test suite exercises Perl in a number of non-standard ways, and really doesn’t represent real world use. Alan said that his tests had been done on a real XS module for dealing with Solaris accounting.

Nicholas Clark asked what a sensible benchmark would be; he suggested Gisle’s perlbench, which was at least designed to try to be a fair test for Perl, but it seemed there was some confusion as to how it was supposed to work. Doug Bagley’s programming language shootout was also mentioned.

Jarkko nailed the question, in the end: “The problem with all artificial benchmarks is that they are artificial.”

UTF8 Heroism

INABA Hiroto’s been at it again. With his latest patches, the Unicode torture test works fine, which is fantastic news - Unicode should now be considered stable and usable. In fact, one of his patches also fixes a couple of regular expression bugs. There was then some disagreement over Unicode semantics (as usual) and whether or not \x{XX} should produce Unicode output; Hiroto came up with an excellent suggestion: the qu// operator would work like a quoted string but would always produce UTF8. And, dammit, he implemented it as well. In the words of Pete Townshend, I’ve gotta hand my Unicode crown to him. Or something.

All that’s really left to do now is to reconcile EBCDIC support and UTF8 support - the suggested way to do this was to put in some conversion tables between the two character encodings, so that anything that created UTF8 data would have its EBCDIC input sent through a filter to turn it into Latin 1, and anything which decoded UTF8 data would have its output sent through a filter to turn it back into EBCDIC. There was some progress on that this week, but a fundamental problem remains: some things, such as version strings, want the UTF8 codepoints qua codepoints. That’s to say, the numbers in v5.7.0 should NOT be transformed into their EBCDIC equivalents. This was manifesting itself in weird errors like

    Perl lib version (v5.7.0) doesn't match executable version (v5.7.0)

But it’s being worked on.

Cygwin versus Windows

Some issues surfaced while Reini Urban was looking at Berkeley DB support in Cygwin - not all of them were Perl-related, but they contained useful information for porters.

Some code in Berkeley DB relied on the maximum path length; Reini wanted to use an #ifdef _WIN32 block to get at MAX_PATH, but Charles Wilson pointed out that Cygwin should NOT define _WIN32, which is a compatibility crutch for bad ports. Cygwin already defines FILENAME_MAX and PATH_MAX as ISO C and POSIX demand, so those should be used instead of MAX_PATH, which is a strange beast from Windows-land.

The more general lesson here for Perl porters is that you should code for Cygwin as if it were a real, POSIX-compliant system, rather than as if it were Windows.
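As a rough illustration of Charles’s advice, the sketch below sizes a path buffer from the POSIX and ISO C limits instead of testing _WIN32. It is not taken from Berkeley DB; DB_PATH_LIMIT and join_path are hypothetical names invented for this example.

```c
#include <assert.h>
#include <limits.h>   /* PATH_MAX, where POSIX provides it */
#include <stdio.h>    /* FILENAME_MAX (ISO C) and snprintf */
#include <string.h>

/* Hypothetical macro: prefer the POSIX limit, fall back to the ISO C one.
 * No _WIN32 test and no MAX_PATH anywhere. */
#ifdef PATH_MAX
#  define DB_PATH_LIMIT PATH_MAX
#else
#  define DB_PATH_LIMIT FILENAME_MAX
#endif

/* Hypothetical helper: write "dir/file" into buf.
 * Returns 0 on success, -1 if the result (plus its NUL) would not fit. */
int join_path(char *buf, size_t buflen, const char *dir, const char *file)
{
    size_t needed;

    assert(buf != NULL && dir != NULL && file != NULL);

    needed = strlen(dir) + 1 + strlen(file) + 1;   /* separator and NUL */
    if (needed > buflen)
        return -1;

    snprintf(buf, buflen, "%s/%s", dir, file);
    return 0;
}
```

A caller would declare `char path[DB_PATH_LIMIT];` and pass `sizeof path` as buflen, and the same source then builds unchanged on Cygwin and on any other POSIX-ish system.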

Oxymoron of the thread award went to Ernie Boyd, who explained MAX_PATH as an “MS Windows standard”.

Lvalue lists

Continuing the lvalue saga, Stephen McCamant produced a full and glorious lvalue subroutine patch, which Jarkko applied. Tim Bunce wondered what would happen if you said

    (sub_returning_lvalue_hash()) = 1;

Stephen explained that the rules for assigning things are exactly the same as you’d expect from scalars, and that, for instance, you should put brackets around the right hand side if you’re doing anything clever:

    sub_returning_lvalue_array() = (1, 2);

Radu Greab fixed a problem where lvalue subs weren’t properly imposing list context on the assignment; this causes all sorts of problems when you have

    (lvalue1(), lvalue2()) = split ' ', '1 2 3 4';

as split doesn’t see the right number of elements to populate. This led to a discussion of the curious and undocumented PL_modcount. This variable tells Perl how many things to fill up - it’s actually only used in the case of split. However, it uses the number 10000 as a signifier for “this is going to be in list context, so just keep filling”. Jarkko, after possibly one too many games of wumpus, raised an objection to this undocumented, unmacroified, bizarre magic number. However, both the magic number and the lvalue split bug got tidied up.

Linux large file support

Richard Soderberg had a valiant crack at getting large file support to work under Linux, and concluded that he had to include the file features.h to make things work; after a little more messing around, he found that -D_GNU_SOURCE should also turn on the required 64-bit types. Russ Allbery piped up saying that -D_GNU_SOURCE ought to be more than enough - if it wasn’t, there was a bug in glibc. (It looked for a fun moment as if features.h was somewhat ironically named.)

Andreas said that his experience had been that upgrading his kernel, making the kernel headers available and then rebuilding glibc had magically given him large file support with no changes to Perl required - just a reconfigure and recompile. Linux users take note!
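If you want to check what your own toolchain gives you, a small C probe along these lines may help. Note that it swaps in a different switch from the one in the thread: rather than -D_GNU_SOURCE, it uses glibc’s _FILE_OFFSET_BITS macro, which asks for a 64-bit off_t directly and has to be defined before any system header is included. The function names are hypothetical.

```c
/* Must come before any system header to have an effect. */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif
#include <assert.h>
#include <limits.h>      /* CHAR_BIT */
#include <sys/types.h>   /* off_t */

/* Hypothetical helper: width of off_t in bits for this build. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}

/* Hypothetical helper: 1 if file offsets can exceed 2GB, else 0. */
int has_large_file_support(void)
{
    return off_t_bits() >= 64;
}

/* Hypothetical helper: abort early if the build got a 32-bit off_t. */
void require_large_file_support(void)
{
    assert(has_large_file_support());
}
```

On a 64-bit Linux box off_t is 64 bits wide whatever you define; the macro only makes a difference on 32-bit platforms, which is where the trouble described above shows up.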

Calls for papers

Nat Torkington reminded us that the Perl Conference call for papers has been published, and gave a few ideas for papers that Perl porters could give. We’re trying to press-gang someone into giving a paper on how the regular expression engine actually works, but the usual suspects have gone very quiet.

Rich Lafferty remarked that the equally worthy Yet Another Perl Conference was also seeking papers.