You can subscribe to an email version of this summary by sending an empty message to perl5-porters-digest-subscribe@netthink.co.uk.
Please send corrections and additions to
perl-thisweek-YYYYMM@simon-cozens.org where
YYYYMM is the current year and month.
We saw about 325 messages this week, not including test results. Note that the mailing list for this summary has moved over to NetThink, joining the Perl 6 summary which you can subscribe to by sending a message to perl6-digest-subscribe@netthink.co.uk.
You know I can't resist it; there were three fairly big Unicode threads this week.
I kicked off by providing a patch which enables Unicode literals of the
form
U+89AB. Jarkko and Andreas reasonably complained that we have enough magical token
types already; I countered by suggesting
v0x89AB, but haven't produced a patch yet. Philip Newton pointed out that this
was pretty much the same as
"\x{89AB}", so was probably not necessary; however, a few people found it
intuitive, so I'll look at it when the next supply of tuits arrives.
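For comparison, here is a minimal sketch of the spellings of U+89AB that already work without any new token type. The variable names are mine, purely for illustration; the single-component v-string relies on 0x89AB being 35243 in decimal.

```perl
use strict;
use warnings;

# Three existing spellings of the single character U+89AB (decimal 35243).
my $from_escape  = "\x{89AB}";   # hex escape inside a double-quoted string
my $from_chr     = chr(0x89AB);  # built from the code point number
my $from_vstring = v35243;       # v-string with one decimal component

# All three should be the same one-character string.
my $all_equal = ($from_escape eq $from_chr) && ($from_chr eq $from_vstring);

printf "U+%04X, length %d, all equal: %s\n",
    ord($from_escape), length($from_escape), $all_equal ? "yes" : "no";
```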
Jarkko, meanwhile, found a really, really weird Unicode bug related to string sharing in hash keys and certain 64 bit platforms. The code
use utf8;
$FOO{BAR};
produced a warning about cleaning up the string when
perl exits.
I say "certain platforms", but it proved very difficult indeed to reproduce, and Jarkko finally tracked it down with a combination of brute force and finesse.
There were a number of problems of understanding with versions and vstrings this
week; several serious, and one silly. Firstly,
$] generates 5.006, rather than 5.6; Perl still juggles the two versioning
conventions, but this means that
if ($] < 5.6) ...
doesn't do the right thing. You should remember that
$] produces old-style "floating point" versions, and
$^V produces new-style vstring versions, which are strings of characters rather than numbers.
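To see why the numeric test goes wrong, here is a small sketch. It writes the value that $] has on perl 5.6.0 as a literal (the variable names are made up for the example), so it behaves the same on whatever perl you run it with.

```perl
use strict;
use warnings;

# On perl 5.6.0, $] is the number 5.006. It is written as a literal here
# so the comparison below does not depend on the running interpreter.
my $perl_560 = 5.006;

# Naive check: 5.006 is numerically less than 5.6, so 5.6.0 looks "too old".
my $naive_says_old = ($perl_560 < 5.6) ? 1 : 0;

# Old-style check: compare against 5.006, the old-style spelling of 5.6.0.
my $correct_says_old = ($perl_560 < 5.006) ? 1 : 0;

print "naive: $naive_says_old, correct: $correct_says_old\n";

# The same old-style check against the interpreter actually running this:
print "this perl is at least 5.6.0\n" if $] >= 5.006;
```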
Next, there was the misunderstanding about how vstrings construct strings: vstrings are not bare words, but they are another way of generating strings. Hence,
if ($^V lt "v5.6")
won't work; indeed, the
v doesn't show up in
v5.6 at all, and
v5.6 is very different from
[5.6].
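A quick sketch of what a vstring literal actually builds may help here. The variable names are illustrative only; the "%vd" format is the documented way to turn a vstring back into something readable.

```perl
use strict;
use warnings;

# v5.6 is a two-character string: chr(5) followed by chr(6).
my $vstr = v5.6;

my $same_as_chars  = ($vstr eq chr(5) . chr(6)) ? 1 : 0;  # the real contents
my $same_as_quoted = ($vstr eq "v5.6")          ? 1 : 0;  # the "v" is not part of it
my $same_as_plain  = ($vstr eq "5.6")           ? 1 : 0;  # nor are the digits or the dot

# "%vd" joins the ordinals of each character with dots.
my $readable = sprintf("%vd", $vstr);

print "length ", length($vstr), ", readable form $readable\n";
print "chars: $same_as_chars, quoted: $same_as_quoted, plain: $same_as_plain\n";

# The same format prints the version of the running interpreter.
printf "running under perl v%vd\n", $^V;
```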
For the silly one, you'll just have to read the list. :)
Linux recently introduced a temporary file system,
tmpfs, which is like the Solaris
tmpfs but done right. Unfortunately, it wasn't done right enough for us:
File::Find was having problems because the
nlink field of
stat was inaccurate; one suggestion was just to set the
dont_use_nlink configuration variable, but that's horrendously slow. Tels provided a
patch to
File::Find which avoids using
nlink where possible, but Andy pointed out that it isn't a general solution,
because it could get expensive to
stat every directory to make sure you haven't passed a filesystem boundary
and need to change your
nlink usage. Andreas got in touch with the author of Linux's
tmpfs who provided a patch to it. Alan Cox bitched that applications
demanding traditional Unix semantics for
nlink were buggy,
but applied the patch anyway, so
tmpfs and
File::Find will play nice again at some unspecified point in the future.
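In the meantime, the blunt workaround looks something like the sketch below. It sets File::Find's own $File::Find::dont_use_nlink variable at run time, which I am assuming is an acceptable stand-in for the configuration variable mentioned above, and walks a throwaway tree built with File::Temp; the directory layout is invented for the example.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Tell File::Find not to use the nlink count to guess how many
# subdirectories a directory has. Slower, but safe on odd filesystems.
$File::Find::dont_use_nlink = 1;

# Build a small throwaway tree: <top>/sub/file.txt
my $top = tempdir(CLEANUP => 1);
mkdir "$top/sub" or die "mkdir failed: $!";
open my $fh, '>', "$top/sub/file.txt" or die "open failed: $!";
close $fh;

# Sort everything find() visits into directories and non-directories.
my (@dirs, @files);
find(sub {
    if (-d $_) { push @dirs,  $File::Find::name }
    else       { push @files, $File::Find::name }
}, $top);

printf "%d directories, %d files\n", scalar @dirs, scalar @files;
```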
One of the targets for Perl 5.8 is to speed up XML parsing, but nobody
really has any idea how to do that. The current
XML::Parser uses an external library, which means that a lot of speed is lost in
flapping around in XS. The idea was mentioned of a pure-Perl version,
which we would then be able to ship in core. Jarkko says:
Okay, I lied. I do have an opinion: relying on an external library to do XML parsing is weird. expat is nice and is a de facto standard, and reinventing the wheel that has already been extensively invented and debugged is silly in the extreme -- but we are, after all, supposed to be The Text Processing Language.
Matt Sergeant, as ever, had good XML suggestions:
If you do that, I suggest/recommend at least doing it the Python way - by letting XML experts (i.e. a SIG) discuss what would be the best way to do it. Note also that ActivePerl ships with XML::Parser, though in a few months it may not necessarily be the best option any more.
Also, if you do add one, it should definitely conform to the Perl SAX API (either v1 or v2), as that's the way perl-xml is heading.
He also pointed out that:
The speed problem is that expat is basically a callback/event based parser, so you have a storm of events crossing the XS/Perl barrier, meaning that you're constantly building SV's. Orchard can get around this by doing the parsing to a tree structure in C. (Note that Orchard is also based on expat). Or it can also do SAX based event passing, but again that's about as slow as XML::Parser.
Doing it all in Perl is possible, but not entirely trivial to get exactly right. XML has a lot of annoying nuances that were left in from SGML, mostly to do with DTDs. And while I don't use most of the annoying features, I think I'd be upset if the core Perl started shipping an "XML" parser that wasn't fully XML compliant.
I wonder if the
perl-xml mailing list could go chew on that and let us know the best way to
proceed.
There was an overly long and underly helpful discussion between Ilya,
Jarkko, and Nicholas Clark about the merits and the implementation of
the IV preservation patch. (That's the thing that means
$a = $b+$c is done as an integer rather than a floating point operation where
possible.)
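The visible effect is easiest to see just past 2**53, where a typical 64 bit double can no longer count by ones. This is only a sketch, and it assumes a perl built with 64 bit integers; on a build with smaller IVs the literal below is already a floating point number and the sum will not be exact.

```perl
use strict;
use warnings;
use Config;

# Size in bytes of this perl's integer type; 8 means 64 bit IVs.
my $ivsize = $Config{ivsize};

my $x   = 9007199254740992;   # 2**53
my $y   = 1;
my $sum = $x + $y;            # integer addition when both operands are integers

# With integer arithmetic the sum is exactly one more than $x.
my $exact = ($sum - $x == 1) ? 1 : 0;

print "ivsize $ivsize, sum $sum, exact: $exact\n";
```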
For some reason this morphed into an overly long and underly helpful discussion between Ilya, Jarkko and the other Nick about Unicode. I stood back and let it happen.
Nobody had any code in either branch of the discussion, so the current implementation remains, no matter how dirty people think it is. If you really want to wade through it all, start here.
Alan "The Plumber" Burlison has been exceptionally hard at work again this week tracking down memory leaks and situations where SVs are not being freed properly due to reference cycles.
Part of the problem with memory leaks is that because of the arena mechanism, they're pretty hard to spot; the "arena" is a chunk of memory which is allocated in advance and split up and divided out when new SVs are requested. The problem with this is that since everything comes from the same chunk of allocated memory, it's hard to detect where leaks are really happening; according to Alan:
...the current SV arena cleanup carries a very big shovel and bucket, and scrapes up all the camel dung at the end. However, in the process it *does* clean up things that really are *only* still referenced by the arena allocator, and therefore do in fact constitute memory that isn't accessible from anywhere else inside the interpreter (a squeak?). The upshot of this is that the arena allocator has the fortunate (?) side-effect of cleaning up squeaked SVs when the interpreter exits.
However it has the unfortunate side-effect of hiding squeaks in the rest of the interpreter, and it also hides the fact that the current attempt to delete
PL_defstash doesn't work, and that magical things inside a stash squeak when they are deleted, and that refcounts don't get updated correctly when you delete something magical from a stash. Basically a lot of nasty stuff is hiding under the existing arena allocator. My aim is to dig out the cesspit.
He also found a nasty problem with removing XSUBs: because CVs have a pointer back to their GV, the GV has a reference count of 2. When the GV is deleted, that reference count drops to 1, and the CV is left stranded, without being cleanly removed. His suggested fix was to stop the artificial increasing of the GV's reference count. Nick Ing-Simmons said that he thought this was done to stop the function from disappearing while it was being called, meaning Perl would segfault.
Alan also found a really, really horrible problem where entries in a
stash with a circular reference never get freed. This is, of course, one
of the shortcomings of a reference counting garbage collection system,
but it's still possible to delete an SV and remove all its references
at the time you remove it from a stash. The usual "just add mark and
sweep GC" discussion miraculously failed to materialise; perhaps
everyone's tired out after having exactly the same discussion on
perl6-internals. At any rate, changing Perl 5's garbage collection system is not on the
horizon, but Alan's idea is simple and should work. Alan asked for
comment, but everyone was too busy gawping wide-eyed in wonder.
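The underlying shortcoming is easy to reproduce from plain Perl. The sketch below is not Alan's patch and has nothing to do with stashes; it is a user-level illustration, with a made-up Node class, of a cycle that reference counting never frees and of what happens when one link is removed by hand first.

```perl
use strict;
use warnings;

{
    package Node;
    our $destroyed = 0;                       # how many nodes have been freed
    sub new     { my ($class) = @_; return bless { peer => undef }, $class }
    sub DESTROY { $destroyed++ }
}

# Two nodes pointing at each other: each keeps the other's count above zero,
# so leaving the scope frees neither of them.
{
    my $x = Node->new;
    my $y = Node->new;
    $x->{peer} = $y;
    $y->{peer} = $x;
}
my $after_cycle = $Node::destroyed;

# Same shape, but one link is removed before the names go out of scope,
# so the counts can fall to zero and both nodes are freed.
{
    my $x = Node->new;
    my $y = Node->new;
    $x->{peer} = $y;
    $y->{peer} = $x;
    delete $x->{peer};
}
my $after_break = $Node::destroyed;

print "freed after cycle: $after_cycle, freed after breaking it: $after_break\n";
```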
Mr. Schwern has, yet again, been doing some fantastic stuff with Perl
QA: a suggestion to rearrange the test directory to allow multiple tests
per module and to allow tests for core modules to be exactly the same as
the tests in the CPAN version of the module; an idea to rework
t/TEST so that it uses Test::Harness; and support for honouring "todo" and "skip"
notifications in a test suite so that known failures can be safely
identified. Basically, you can now output
not ok 13 # TODO cure cancer
in your test script, and the harness won't call it a bug. Schwern also
extracted
Test::Harness from the core and put it back as a CPAN module so it can be used by
non-bleeding-edge people. It's a very useful module, but it doesn't
seem to have appeared on CPAN yet; check it out when it does!
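As a rough illustration of the TODO convention, here is a sketch of a test script that emits such lines, plus a toy check of which lines count as real failures. Both helper functions are hypothetical and written for this example; this is not Test::Harness's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper: format one result line, optionally marked TODO.
sub result_line {
    my ($number, $passed, $todo) = @_;
    my $line = ($passed ? "ok" : "not ok") . " $number";
    $line .= " # TODO $todo" if defined $todo;
    return $line;
}

# Hypothetical helper: a "not ok" line is a real failure only when it
# carries no TODO marker.
sub is_real_failure {
    my ($line) = @_;
    return 0 unless $line =~ /^not ok\b/;
    return $line =~ /#\s*TODO\b/ ? 0 : 1;
}

my @lines = (
    result_line(1, 1),                   # a plain pass
    result_line(2, 0, "cure cancer"),    # a known, expected failure
    result_line(3, 0),                   # a genuine failure
);

print "1..", scalar(@lines), "\n";
print "$_\n" for @lines;

my @real_failures = grep { is_real_failure($_) } @lines;
print "# real failures: ", scalar(@real_failures), "\n";
```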
Olaf Flebbe asked whether you can
rsync the perl 5.6.x tree as well as the bleadperl; the answer is yes, you
can. Simply change
perl-current to
perl-5.6.x in your
rsync line, and you can follow the maintenance track.
Sarathy asked if anyone was interested in playing with
waitpid on Windows and its
fork emulation, so that calling
waitpid would close the handle to the "child" thread. If that's something that
would interest you,
read Sarathy's mail.
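For anyone who hasn't used the pair before, this is the basic fork and waitpid pattern in question. It is a minimal sketch of the calls themselves, with an arbitrary exit status chosen for the example; it says nothing about how the Windows emulation handles the thread handle internally.

```perl
use strict;
use warnings;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: finish immediately with a recognisable exit status.
    exit 7;
}

# Parent: block until that particular child has finished.
my $reaped = waitpid($pid, 0);

# The child's exit status lives in the high byte of $?.
my $status = $? >> 8;

print "reaped the expected child, exit status $status\n" if $reaped == $pid;
```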
Oh, and we got some spam. The first piece in a long while, thanks to the work of Richard and Ask.
Until next week I remain, your humble and obedient servant,