This Week on p5p 2000/11/20

Nov 20, 2000 by Simon Cozens

Notes
Fixing the Regexp Engine
UTF8 and Charnames
PerlIO (again)
=head3 (again)
New subs.pm
Congratulations
Various

Notes

You can subscribe to an e-mail version of this summary by sending an empty message to p5p-digest-subscribe@plover.com.

Please send corrections and additions to simon@brecon.co.uk

There were more than 250 messages this week.

Fixing the Regexp Engine

There’s been a lot of work on the regular-expression engine this week, from Jarkko and Ilya. If you recall, there have been two major problems with the regular-expression engine. First, there’s re-entrancy. What this means is that if you try to say

    m/something(?{s|foo|bar|})bad/

then by the time you get to bad, the engine will be confused, because it can’t restore context properly after the s|foo|bar|. It can now, thanks to Ilya, who noted:

Why such a trivial patch should wait for me to do it?!

Doing a similar edit of regexec() is left as an exercise to the reader.

Jarkko retaliated, “Most us mortals find the reg*.[hc] rather daunting.” I note that the “trivial patch” was 89k.

The second problem is recursion (see coverage two weeks ago). What this means is that pathological patterns, such as this example that Dan Brumleve produced this week, cause stack overflows:

    '' =~ ($re = qr((??{$i++ < 10e4 ? $re : ''})));

Ilya also started work on “flattening” the regular expression engine to remove this recursion. While his patch doesn’t solve the problem, it mitigates it considerably and also allows stack unwinding on things like alternation.

The famous “polymorphic regular expression” problem also saw some work, thanks to Jarkko. This problem occurs when matching inside UTF8 strings; it goes a little like this: Given a character string which contains character 300, which is represented in UTF8 as character 196 followed by character 172, should character 172 match? Of course, it shouldn’t, but it currently does. What’s needed is to turn each “node” of a regular expression into something that can make sense as a character and also as a series of bytes, and then it can behave appropriately when matched in byte mode and in character mode. (See previous discussion of this.)
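The byte-versus-character distinction above can be checked with a short sketch. It assumes a modern Perl, where the matching problem described here has long since been fixed, and it uses the built-in utf8::encode to get at the encoded bytes; the variable names are only illustrative.

```perl
use strict;
use warnings;

# Character 300 (U+012C) as a one-character string.
my $char_string = chr(300);

# Encode a copy into its UTF8 byte representation.
my $byte_string = $char_string;
utf8::encode($byte_string);

# The encoded form is two bytes: 196 followed by 172.
my @bytes = map { ord } split //, $byte_string;
print "bytes: @bytes\n";

# Character 172 is one of the encoding bytes, but it is not a
# character of the original string, so only the byte string matches.
my $in_chars = ($char_string =~ /\xAC/) ? 1 : 0;
my $in_bytes = ($byte_string =~ /\xAC/) ? 1 : 0;
print "matches character string: $in_chars\n";
print "matches byte string: $in_bytes\n";
```

The first match is the case that should fail, and the second is the byte-mode view in which 172 really is present.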

Jarkko reports that most of it is now working, apart from the special match variables ( $&, $` and $') and the POSIX character classes.

UTF8 and Charnames

Andrew McNaughton found that Charnames didn’t produce UTF8-encoded strings on code points less than 255. He produced some patches to make it do so, but Nick and I didn’t believe that it should be UTF8-encoding if it doesn’t need to be. Perl’s UTF8 encoding is done lazily - strings are upgraded only when Perl can’t avoid upgrading them.
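As a rough illustration of the lazy upgrading, here is a sketch for a modern Perl. It is an assumption-laden stand-in: chr() is used instead of Charnames, and utf8::is_utf8 is used only to peek at the internal representation, which ordinary code should rarely need to do.

```perl
use strict;
use warnings;

# Assumption: chr() stands in for Charnames in this sketch.
my $low  = chr(233);   # fits in a single byte
my $high = chr(300);   # cannot fit in a single byte

# Peek at the internal representation of each string.
my $low_is_utf8  = utf8::is_utf8($low)  ? 1 : 0;
my $high_is_utf8 = utf8::is_utf8($high) ? 1 : 0;

# Joining the two forces the result to be stored as UTF8.
my $joined         = $low . $high;
my $joined_is_utf8 = utf8::is_utf8($joined) ? 1 : 0;

print "low: $low_is_utf8, high: $high_is_utf8, joined: $joined_is_utf8\n";
print "joined length in characters: ", length($joined), "\n";
```

The low string stays in its one-byte form until something forces an upgrade, while the joined string is still two characters long whatever its internal storage.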

PerlIO (again)

Nick’s sterling work on PerlIO continues, and this week he raised the question of where to store the defaults that use open sets; currently, there are four bits set in the CV, two for input and two for output. However, since this only gives you four states, it’s not exactly extensible to user-defined disciplines. The suggestion from Sarathy was to use the same area used by the lexical warnings pragma. The whole thread contains a good discussion about how the semantics of PerlIO disciplines will pan out. Read about it.

=head3 (again)

Casey Tweten’s unstinting drive to make POD support =head3 and further levels continues. He first patched the documentation, and then patched Pod::Checker to make sure it wouldn’t complain about the new levels. Russ Allbery released a new version of the podlators to support them.

New subs.pm

Jeff Pinyan put forward a new version of subs.pm which is supposed to deal with pre-declaring prototypes and attributes. It needed a little smoothing out, but it’s looking quite good now.
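For readers who haven’t met the pragma, this sketch shows what the stock subs.pm already does: it pre-declares a sub name so that it can be called without parentheses before its definition appears. It does not show Jeff’s new version, and the sub name greet is purely hypothetical.

```perl
use strict;
use warnings;

# Pre-declare greet so the parser treats it as a known sub.
use subs qw(greet);

# Thanks to the pre-declaration, this parses as a call to greet
# even though the definition comes later and no parentheses are used.
my $msg = greet "world";
print "$msg\n";

# The actual definition, appearing after the call.
sub greet {
    my ($who) = @_;
    return "hello, $who";
}
```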

Congratulations

Two sets of congratulations are in order this week. First, to the dedicated team of bug squashers; there are now fewer than 1,000 open bugs in Perl, and none of those is considered fatal.

Congratulations are also due to Perl hacker Mark Fisher, who has just become a daddy. Read about it.

Various

Quite a lot of useful minor fixes, and a few uninteresting bug reports. Lots of the usual test results. Only two flames this week, and one each from your illustrious perl5-porters digest authors. You’d think we’d know better.

Until next week I remain, your humble and obedient servant,

Simon Cozens
