January 2003 Archives

Embedding Perl in HTML with Mason

Disclaimer: As you know, each month I try to review a recently published Perl book, and I aim to cover all the majors as they come out. The book that's fallen onto my desk for review this month is Dave Rolsky and Ken Williams' Embedding Perl in HTML with Mason. "What is this," you're thinking, "an O'Reilly site doing a review of an O'Reilly book? Scandalous!" Well, I hope that you've taken a look at my other reviews and have satisfied yourself that I try to be as impartial as I can when reviewing. As far as I'm concerned, this is a Perl site first and an O'Reilly site second.

With that disclaimer out of the way, onto the book! There are plenty of ways of achieving the Perl-in-HTML goal, as this book correctly points out: the Template Toolkit, the venerable embperl, and so on. My personal favorite, however, is HTML::Mason, and so I've been looking forward to this book for a long time. That also means, however, that I've had high expectations of it; Rolsky and Williams' effort has lived up to many of them but let me down in a few areas.

The first chapter does a good job of setting the scene - it describes a little of what Mason looks like, talks about some of the alternatives to Mason, and shows how to install the module and test it. While the first description is precisely what is needed, and the test process is well-documented, I feel more could have been made of the comparison to other techniques - tools such as PHP and Template Toolkit are described, but they're not compared to Mason, so it's hard to see their relative strengths and weaknesses. Similarly, the installation process for Mason is quite detailed, and brushing it off with "perl Makefile.PL; make; make install" doesn't do it justice - an example of the output would make readers feel more comfortable.

The book then goes on to introduce the main syntax of Mason components. This is a useful section, and I learned a lot about the various ways Mason tags are interpreted, but I felt it would have been better structured with more examples building on top of one another. That said, the chapter did declare itself to be an introduction to Mason syntax, and as that it succeeded; I think, though, it would have been a better tutorial if the semantics were covered alongside the syntax.
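To give a flavor of what that syntax looks like, here is a minimal Mason component - my own illustrative sketch, not an example from the book. Lines beginning with % are plain Perl, <% ... %> interpolates an expression into the HTML, and the <%args> block declares the component's arguments (the $planet argument and its default are invented for the example):

```mason
<%args>
$planet => "World"
</%args>
% my $greeting = "Hello";
<html>
 <body>
  <h1><% $greeting %>, <% $planet %>!</h1>
 </body>
</html>
```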

The short chapter on autohandlers and dhandlers was a sermon from the clouds. I've long known that these things existed and were powerful, but until reading chapter 3, I didn't quite know how they should be used. The section on the Mason API again unfortunately suffers from a lack of examples; but on the other hand, the book builds up to a full example in chapter 8. The "advanced features" chapter was a complete goldmine of information. Like many of these early chapters, it contains a lot of concentrated goodness in not many pages, and it will take me several more readings to pick out and understand more of the ideas.

Chapter 6, another short (10-page) chapter on the "lexer, compiler, resolver and interpreter objects", seems to be included for completeness or for hard-core Mason hackers - it could be skipped or moved to an appendix with no loss of flow or coverage.

As I've mentioned, chapter 8 is where it all comes together: a full, real-life application (the Perl Apprenticeship site - an interesting resource in its own right!) is put together before your very eyes. As this is a real live site, areas such as creating user accounts and handling access control have to be covered. If you're doing anything at all with Mason, then I'd urge you to buy this book if only for this chapter - once you've worked through it, you'll have a much clearer idea of how a Mason site fits together.

Later chapters cover mixing Mason and CGI, another brief (8 page) chapter on design, and another gold mine of a chapter of recipes. Chapters like this and the advanced topics chapter are the reason you buy books on open source projects: sure, there may be free documentation out there - and the Mason documentation is pretty thorough - but it doesn't directly tell you how to do what you want to do. The documentation doesn't often cover the situations you find yourself in when you're actually developing with the tool in question; I'm happy to say that this book does.

Appendices cover the Mason API (which is odd, given that chapter 4 covers that ...), the Mason object model, using Mason with your favorite text editor (a surprisingly useful set of information!) and Bricolage, a Mason-based content management system - useful as yet another set of ideas for what Mason can do.

On an aesthetic note, kudos to O'Reilly for restoring the spine coloring - now my bookshelf can be color-coded again; now bring back Garamond!

If I've sounded at all negative in this review, then it's probably because I've been expecting this book for a while and have had high hopes for it. That said, my overall impression of this book is that it's a little thin - short chapters and few worked examples leave one wanting more. On the other hand, the full example in chapter 8 is worth its weight in gold, and when combined with the advanced features and cookbook chapters, I'd give this book a qualified thumbs-up for anyone doing any Mason work.

This week on Perl 6, week ending 2003-01-26

Welcome to the first Perl 6 summary of the new 'Copious Free Time enabled' era, which should mean that these summaries will get mailed out on Monday evening from now on.

We start, as usual, with perl6-internals

The eval patch

Leopold Tötsch continued to work on making eval and its associated operators work and asked for some opinions about a problem he was having with jumps between code segments. Jason Gloudon played sounding board and the two of them found a way forward.


The Parrot crashes

Dan announced that Parrot was crashing big time under (at least) OS X, throwing segfaults in the NCI mark routine. Other people contributed reports of what was happening on their platform and Leo Tötsch tried to track down the error, working on the assumption that it was something to do with his eval patch as that was the last patch that had gone in.

Later in the week Dan realised that part of the problem with OS X was a problem with imclexer.c, an autogenerated file being generated with bad code. Leo reckoned it was a problem with something having been multiply defined, which had since been fixed.

Still later in the week, Steve Fink offered some analysis on the NCI mark routine and a patch which attempted to fix the problems he saw with it. Leo reckoned that the patch wasn't quite right, but the analysis was good and used that to fix the problem and discovered in the process that the parrot build process's dependency analysis wasn't quite up to the mark. (I don't think there's a fix for that yet, but at least we know the problem exists...)




Compiling to Parrot

K Stol is looking for a final project for his Bachelor's degree and would like to implement some language targeting Parrot, so he asked for suggestions. Simon Wistow suggested PHP or Lua; Leon Brocard suggested Java (though Gopal V wasn't sure that was such a good idea just yet). Dan suggested that, just because someone else was already working on a Tcl compiler, it didn't mean it wasn't worth trying anyway, but apparently it would have been unlikely to get approval as a Bachelor's project. There was some further discussion of Lua and whether there would be any real point in implementing it, and the thread died down before Mr Stol told anyone what he'd decided to do, which was rather a shame, I thought.


Extending the packfile format

Some months ago, Jürgen Bömmels offered a patch to extend the Parrot packfile format whilst retaining backward compatibility. This week, after a little modification, Leo Tötsch applied it. James Mastros wondered if it didn't make sense (at least before parrot 1.0) to ignore backward compatibility issues and just make it right. Leo and Dan agreed, but Leo pointed out that, at least until the switch from assemble.pl to IMCC was complete, maintaining backward compatibility was the right thing. (So, if you're currently using assemble.pl to assemble your language, consider moving it to IMCC sooner rather than later).


The long running Objects thread

Discussions continued. A good deal of time this week was spent discussing the overrideability (there's got to be a better word than that...) of OO method dispatch for different languages (and at a language level too, I hope -- I like flexibility). Python, for instance, seems to require that every step in the process be subject to override. Dan pointed out that dispatch would end up being implemented as vtable methods on per-language Object PMCs, so that messages from one language's objects to another's would use the target language's dispatch rules. Which seems rather cute (in a good way).

http://groups.google.com/groups -- Christopher Armstrong talks about Python

http://groups.google.com/groups -- Dan on how it works.

Intersegment branching

Leo Tötsch added a new opcode for intersegment branches, called branch_cs, which was his implementation of the fix that he and Jason Gloudon had discussed while talking about problems with eval. Dan didn't think Parrot needed this and that we could just use a plain jump. Leo pointed out that there were several places where this didn't quite work, especially in the presence of JIT optimization. Dan didn't think that meant it couldn't work, but reckoned that the cases Leo pointed out were exactly why he wanted to see intersegment jumps mediated by subroutine calls. Leo followed up to this with an example. I can't for the life of me tell if this was a counter example, a suggestion or what, though.

This reemerged as a new thread with the subject 'Transferring control between code segments'. Dan opened the thread by discussing the design perspective and laying down a few edicts. He went on to discuss the issues that Leo's 'phenomenally cool' compile opcode has exposed in his earlier handwaving and replaced the handwaves with more detail.

People were generally happy to see this, but there were a few small issues with it which weren't explicitly resolved this week.


http://groups.google.com/groups -- Design issues

Bytecode Metadata

Dan responded to the earlier discussion of extending the packfile format by opening a discussion of the type of metadata that might be useful to have in either the on-disk packfile or the in-memory bytecode. He reminded everyone that Parrot may have to ignore (or at least mistrust) the metadata, and opened the floor to suggestions.

James Michael DuPont came up with a proposal involving RDF which Dan thought might well be overkill (albeit interesting overkill) since he had been thinking more about metadata that would be used by the engine itself or provided to programs running on it.

Various other suggestions were made, and it's apparent that one immutable design guideline is that any bytecode format for Parrot must allow for executable code to be mmapped in.

Leo Tötsch had a patch implementing a simplified API for generating packfiles and wondered if he should check it in.

http://groups.google.com/groups -- Opening remarks

http://groups.google.com/groups -- The RDF bazooka

Odd JIT timings

Dan reported that examples/assembly/mops_p.pasm was running slower with JIT optimization than without under OS X, which doesn't seem right. Daniel Grunblatt pointed out that JIT cores that don't optimize many opcodes will always be slower than a Computed Goto (CG) core, and there aren't many optimized opcodes for the PPC architecture. Bruce Gray pointed Dan at an introduction to PPC assembler in case Dan wanted to add to the score of optimizations. Anyone else who is interested could take a look too; kudos awaits the heroic implementer.



Meanwhile, in perl6-language

There was a little more diversity of discussion this week, instead of concentration on a single massive thread, though that thread did, of course, continue on its path.

L2R/R2L syntax

There was some discussion of implementing conditionals (if, else etc) as functions rather than special keywords. Personally I'm not sure that anything that monkeys with Perl's normal order of evaluation can strictly be called a function, but what the hey.

Buddha Buck showed off a rather neat implementation of if, then and else which borrowed some of its implementation from a Smalltalk like 'Bool' class with isTrue and isFalse methods and then used multiple dispatch to get the rest.
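Buddha Buck's code was Perl 6 and used multiple dispatch, so I won't reproduce it here, but the underlying Smalltalk idea can be sketched in present-day Perl 5: treat truth as a pair of classes, and make the conditional an ordinary method that dispatches on them. (The class and method names below are my own invention, not Buddha's.)

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Truth as classes: each class knows how to respond to the conditional.
package Bool::True;
sub if_then_else { my ($class, $then, $else) = @_; return $then->(); }

package Bool::False;
sub if_then_else { my ($class, $then, $else) = @_; return $else->(); }

package main;

# Map an ordinary Perl truth value onto one of the two classes.
sub bool { return $_[0] ? 'Bool::True' : 'Bool::False' }

# 'if' is now just a method call, not a special keyword.
my $answer = bool(1 + 1 == 2)->if_then_else(
    sub { "maths works" },
    sub { "maths is broken" },
);
print "$answer\n";   # maths works
```

Where the Perl 6 version would use multiple dispatch to pick the right variant of the conditional, the Perl 5 sketch fakes it with plain single dispatch on the class of the boolean.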

In another subthread, Michael Lazzaro explained why he still wanted Right to Left pipelining, even though he was much more at ease with Perl 6's rules for when you needed a comma in an argument list. It appears to be mostly about teachability.

Damian popped up to make (at least) Graham Barr and me happy when he said that he didn't think the Perl 5ish functional forms of map and grep etc would be going away. Huzzah!

http://groups.google.com/groups -- Buddha's Smalltalk port of conditionals

http://groups.google.com/groups -- Michael Lazzaro on grep, map and explicit pipelines

http://groups.google.com/groups -- map/grep/etc still functions

A proposal on if and else

Brent Dax proposed a solution to the problem of implementing if and else as functions (the problem being that, unless you cuddle all your elses, perl will intuit a semicolon at the end of if's block and things will get confused): he suggested that all code blocks take an optional 'else', accessed by the $code.else attribute, one of the side effects of which would be that elsif would become else if.

Austin Hastings wondered if supporting 'separable verbs' might be a better way forward. The idea certainly looks interesting, the best summary of the idea I can give you is to point you at Austin's original message though.

http://groups.google.com/groups -- All blocks are elseable

http://groups.google.com/groups -- Separable verbs

Arc: An Unfinished Dialect of Lisp

Rich Morin pointed everyone at Paul Graham's presentation at the first Little Languages conference about Arc, pointing out that Graham had some interesting things to say about language design. The discussion that followed was mostly off topic (in the sense that they weren't talking at all about Perl 6). Graham's paper is interesting though.



Array/Colon question

Michael Lazzaro wondered about the seemingly different meanings of colon in:

    print $FH : $a;    # $FH is an indirect object
    @a = 0 .. 10 : 2;  # @a is (0,2,4,6,8,10)

and wanted to know when : meant 'step' and when it designated an indirect object. Brent Dax helped clear things up. Essentially : is a 'supercomma' and different operators and functions interpret what comes after the supercomma in different ways. .. treats it as specifying a step.


Multiple Dispatch by Context?

Piers Cawley wondered if it would be possible to specify a multimethod by context as well as by parameter types. Dan Sugalski managed to hole the proposal below the waterline with a neat set of multidispatch declarations that led to a horrible ambiguity. Thomas A Boyer pointed out that Ada can handle such ambiguities but that it was a complete pain to implement and voted against using return type for multiple dispatch.


Who's Who in Perl 6

Ah... um... I appear to be lacking an answer set this week.

Announcements, Acknowledgements and Apologies

I'm afraid I have to announce that, as of this week, I can no longer donate the fee for these summaries to the Perl Foundation; it's currently my only source of income. On the positive side, now that I don't have to spend 8 hours a day working for someone else, I should be able to get the Summary mailed out to the Perl 6 lists on a Monday from now on.

Thanks to everyone who sent me mail about my redundancy. No matter how good it feels not to have to get up at 5.50 every morning with the prospect of not getting home again 'til at least 18.40, it's still a terrible shock to lose your job. Thankfully, we look to be in a much better position this year than we were the last time I was out of work; I'm hoping that, if I can get a few more freelance writing gigs and maybe a little consultancy, I won't have to go back to the 4-hour commutes for the foreseeable future.

Thanks too to everyone who's given me answers for the Who's Who section since this summary started. I'm sorry to be retiring it for now, but as new blood filters into the Perl 6 community I hope I'll be able to restart it some time further down the road.

If you appreciated this summary, please consider one or more of the following options:

Screen-scraping with WWW::Mechanize

Screen-scraping is the process of emulating an interaction with a Web site - not just downloading pages, but filling out forms, navigating around the site, and dealing with the HTML received as a result. As well as for traditional lookups of information - like the example we'll be exploring in this article - we can use screen-scraping to enhance a Web service into doing something the designers hadn't given us the power to do in the first place. Here's an example:

I do my banking online, but get quickly bored with having to go to my bank's site, log in, navigate around to my accounts and check the balance on each of them. One quick Perl module (Finance::Bank::HSBC) later, and now I can loop through each of my accounts and print their balances, all from a shell prompt. Some more code, and I can do something the bank's site doesn't ordinarily let me - I can treat my accounts as a whole instead of individual accounts, and find out how much money I have, could possibly spend, and owe, all in total. Another step forward would be to schedule a cron job every day to use the HSBC option to download a copy of my transactions in Quicken's QIF format, and use Simon Cozens' Finance::QIF module to interpret the file and run those transactions against a budget, letting me know whether I'm spending too much lately. This takes a simple Web-based system from being merely useful to being automated and bespoke; if you can think of how to write the code, then you can do it. (It's probably wise for me to add the caveat, though, that you should be extremely careful working with banking information programmatically, and even more careful if you're storing your login details in a Perl script somewhere.)
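As a taste of how simple that last step can be, here's a rough sketch of the budget idea. It parses QIF by hand rather than using Finance::QIF (whose interface I won't guess at here), and the sample transactions and figures are invented:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# QIF is line-based: each field starts with a one-letter code
# ('D' date, 'T' amount, 'P' payee), and '^' ends a record.
# This simplified reader ignores thousands separators in amounts.
sub total_spent {
    my ($qif_text) = @_;
    my $total = 0;
    for my $line (split /\n/, $qif_text) {
        # 'T' lines carry the transaction amount; debits are negative.
        $total += $1 if $line =~ /^T(-?[\d.]+)/;
    }
    return $total;
}

my $sample = <<'QIF';
!Type:Bank
D23/01/2003
T-12.50
PCoffee shop
^
D24/01/2003
T-30.00
PBookshop
^
QIF

my $spent = total_spent($sample);
print "Spent: $spent\n";   # Spent: -42.5
```

From there, comparing $spent against a budget figure and mailing yourself a warning is a one-liner on top.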

Back to screen-scrapers, and introducing WWW::Mechanize, written by Andy Lester and based on Skud's WWW::Automate. Mechanize allows you to go to a URL and explore the site, following links by name, taking cookies, filling in forms and clicking "submit" buttons. We're also going to use HTML::TokeParser to process the HTML we're given back, which is a process I've written about previously.

The site I've chosen to demonstrate on is the BBC's Radio Times site, which allows users to create a "Diary" for their favorite TV programs, and will tell you whenever any of the programs is showing on any channel. Being a London Perl M[ou]nger, I have an obsession with Buffy the Vampire Slayer. If I tell this to the BBC's site, then it'll tell me when the next episode is, and what the episode name is - so I can check whether it's one I've seen before. I'd have to remember to log into their site every few days to check whether there was a new episode coming along, though. Perl to the rescue! Our script will check to see when the next episode is and let us know, along with the name of the episode being shown.

Here's the code:

  #!/usr/bin/perl -w
  use strict;

  use WWW::Mechanize;
  use HTML::TokeParser;

If you're going to run the script yourself, then you should register with the Radio Times site and create a diary, before giving the e-mail address you used to do so below.

  my $email = "";
  die "Must provide an e-mail address" unless $email ne "";

We create a WWW::Mechanize object, and tell it the address of the site we'll be working from. The Radio Times' front page has an image link with an ALT text of "My Diary", so we can use that to get to the right section of the site:

  my $agent = WWW::Mechanize->new();
  # The Radio Times site; the exact URL may have changed since.
  $agent->get("http://www.radiotimes.beeb.com/");
  $agent->follow("My Diary");

The returned page contains two forms - one to allow you to choose from a list box of program types, and then a login form for the diary function. We tell WWW::Mechanize to use the second form for input. (Something to remember here is that WWW::Mechanize's list of forms, unlike an array in Perl, is indexed starting at 1 rather than 0. Our index is, therefore, 2.)

  $agent->form(2);


Now we can fill in our e-mail address for the '<INPUT name="email" type="text">' field, and click the submit button. Nothing too complicated.

  $agent->field("email", $email);
  $agent->click();

WWW::Mechanize moves us to our diary page. This is the page we need to process to find the date details from. Upon looking at the HTML source for this page, we can see that the HTML we need to work through is something like:

  <tr><td></td><td></td><td class="bluetext">Date of episode</td></tr>
  <td class="bluetext"><b>Time of episode</b></td></tr>
  <a href="page_with_episode_info"></a>

This can be modeled with HTML::TokeParser as below. The important methods to note are get_tag - which will move the stream on to the next opening of the given tag - and get_trimmed_text, which returns the text between the current and given tags. For example, for the HTML code "<b>Bold text here</b>", my $text = $stream->get_trimmed_text("/b") would return "Bold text here" to $text.

Also note that we're initializing HTML::TokeParser on a reference to $agent->{content} - an internal variable for WWW::Mechanize, exposing the HTML content of the current page.

  my $stream = HTML::TokeParser->new(\$agent->{content});
  my $date;

  # <tr><td></td></tr><tr>
  $stream->get_tag("tr"); $stream->get_tag("tr");

  # <td></td><td></td>
  $stream->get_tag("td"); $stream->get_tag("td");

  # <td class="bluetext">Date of episode</td></tr>
  my $tag = $stream->get_tag("td");
  if ($tag->[1]{class} and $tag->[1]{class} eq "bluetext") {
      $date = $stream->get_trimmed_text("/td");
      # The date contains '&nbsp;', which we'll translate to a space.
      $date =~ s/\xa0/ /g;
  }

  # <td></td><td></td>
  $stream->get_tag("td"); $stream->get_tag("td");

  # <td class="bluetext"><b>Time of episode</b>
  $tag = $stream->get_tag("td");
  if ($tag->[1]{class} and $tag->[1]{class} eq "bluetext") {
      # This concatenates the time of the showing to the date.
      $date .= ", from " . $stream->get_trimmed_text("/b");
  }

  # </td></tr><a href="page_with_episode_info"></a>
  $tag = $stream->get_tag("a");

  # Match the URL to find the page giving episode information.
  $tag->[1]{href} =~ m!src=(http://.*?)'!;

We have a scalar, $date, containing a string that looks something like "Thursday 23 January, from 6:45pm to 7:30pm.", and we have a URL, in $1, that will tell us more about that episode. We tell WWW::Mechanize to go to the URL:

  $agent->get($1);

The navigation we want to perform on this page is far less complex than on the last page, so we can avoid using a TokeParser for it - a regular expression should suffice. The HTML we want to parse looks something like this:

  <br><b>Episode</b><br>  The Episode Title<br>

We use a regex delimited with '!' in order to avoid having to escape the slashes present in the HTML, and store any number of alphanumeric characters after some whitespace, all between <br> tags after the Episode header:

  $agent->{content} =~ m!<br><b>Episode</b><br>\s+?(\w+?)<br>!;

$1 now contains our episode, and all that's left to do is print out what we've found:

  my $episode = $1;
  print "The next Buffy episode ($episode) is on $date.\n";

And we're all set. We can run our script from the shell:

  $ perl radiotimes.pl
  The next Buffy episode (Gone) is Thursday Jan. 23, from 6:45 to 7:30 p.m.

I hope this gives a light-hearted introduction to the usefulness of the modules involved. As a note for your own experiments, WWW::Mechanize supports cookies - in that the requestor is a normal LWP::UserAgent object - but they aren't enabled by default. If you need to support cookies, then your script should call "use HTTP::Cookies; $agent->cookie_jar(HTTP::Cookies->new);" on your agent object in order to enable session-volatile cookies for your own code.

Happy screen-scraping, and may you never miss a Buffy episode again.

This week on Perl 6, week ending 2003-01-19

Summary time again; damn, but those tuits are hard to round up. Guess what? perl6-internals comes first: 141 messages this week versus the language list's 143.

Objects (again)

Objects were still very much on everyone's mind as the discussions of Dan's initial thoughts about objects in Parrot continued. Jonathan Sillito put up a list of questions about Dan's original message, which Dan answered promptly. Down the thread a little, Dan mentioned that he hoped Parrot's objects would serve, reasonably unmodified, for a bunch of languages (ie, he hoped that there wouldn't be a requirement for PythonRef/Attr/Class/Object etc); Chris Armstrong thought that, given what Dan had outlined so far, that wouldn't be straightforward. Dan thanked him for throwing a spanner in the works, asking for more details, which Chris provided.

Meanwhile Jonathan had some supplementary questions... Hmm... doing this blow by blow will take forever. Suffice to say that details are being thrashed out. At one point Dan's head started to spin as terminology mismatches started to bite, leading Nicholas Clark to suggest an entirely new set of terms involving houses and hotels (but with some serious underpinnings).

http://groups.google.com/groups -- thread root, from last week.

http://groups.google.com/groups -- Jonathan's questions

http://groups.google.com/groups -- Chris Armstrong throws a spanner

http://groups.google.com/groups -- Nicholas Clark tries for a monopoly on silliness

Optimizing and per file flags

Nicholas Clark wrote about requiring the ability to adjust compiler optimization flags on a per file basis (brought up by Dan on IRC apparently) and proposed a scheme. Quote of the thread (and quite possibly the year so far): "When unpack is going into an infinite loop on a Cray 6000 miles away that you don't have any access to, there isn't much more you can do." Thanks for that one Nick.


The draft todo/worklist

Dan posted his current todo/worklist, which he described as "reasonably high level, and a bit terse". I particularly liked the last entry "Working Perl 5 parser". Surprisingly, there was very little discussion, maybe everyone liked it.


Parrot Examples

Joe Yates asked if we could add a helloworld.pasm to parrot/examples/assembly. Joseph Guhlin wondered what was so special about

    print "Hello, world\n"

that it would need a file of its own (though he did forget the end in his post, and segfaults are not really what you want in sample code).


Thoughts on infant mortality (continued)

Jason Gloudon posted a wonderfully clear exposition of the problems facing anyone trying to implement a portable, incremental garbage collector for Parrot which sparked a small amount of discussion and muttering from Dan about the temptation to program down to the metal.


Operators neg and abs in core.ops

Bernhard Schmalhofer posted an enormous patch adding neg and abs operators to core.ops. There were a few issues with the patch so it hasn't gone in yet and an issue with what underlying C functions are available reared its head too.


The eval patch

Leo Tötsch seems to have spent most of the week working on getting eval working and he opened a ticket on rt.perl.org to track what's happening with it. The response to this can be summarized as 'Wow! Fabulous!'.

Once more, for Googlism, Leopold Toetsch is my hero.


Pretty Pictures

Mitchell N Charity posted some pretty pictures that he'd generated with doxygen and graphviz. Most of the responses to this suggested he use different tools. Ah well.


Solaris tinderbox failures

Andy Dougherty created an RT ticket for the Solaris tinderbox, which has been failing with the delightfully useful 'PANIC: Unknown signature type' and wondered if things could be fixed up to be a little more informative. Apparently it was an issue with Leo's recently checked-in eval patch. So Leo fixed it.


Parrot compilers

Cory Spencer wondered about how the current compilers that target Parrot work, noting that they seem to duplicate a good deal of labour, and wondered if anyone had worked on a gcc-like framework with a standardized Abstract Syntax Tree (AST). Everyone pointed him at IMCC. Gopal V also pointed out that, given the variety of implementation languages (C, Perl, Parrot...), sharing effort between the sample languages would be a little tricky, and mentioned his work on TreeCC (an AST manager).


ook.pasm eval

Leon Brocard had problems getting the eval based Ook implementation working. It turned out to be a problem with Ook's make test using parrot instead of IMCC.


Meanwhile, in perl6-language

The language list was a little fractious this week; I get the feeling that we're spinning our wheels slightly at the moment.

Array questions

Piers Cawley thought that my $b is $a would be a compile error, but Michael Lazzaro pointed out that that would mean that my %data is FileBasedHash($path) would also be an error. Damian pointed out that they shouldn't be compile time errors, but there would be no compile time type checking.


L2R/R2L syntax. Again.

Okay, cards on the table here, I'm getting really, really fed up with this thread. This week it was the monster that ate perl6-language. And how.

We revisited the Unicode argument (Larry has said that Perl 6 will have Unicode operators, some people don't like it, others (including me) aren't keen. Nobody came up with any original arguments this week).

Sarcasm was employed (and missed).

Michael Lazzaro brought up Perl 5's special case syntax for functions prototyped with block arguments which sparked some slightly heated discussion. Damian had some words of wisdom on this subject.

http://groups.google.com/groups -- Michael Lazzaro on block syntax

http://groups.google.com/groups -- Damian talks sense

Later in the thread, Damian clarified his explanation of how the proposed ~> and <~ operators would work in response to Buddha Buck's excellent summary of his understanding of them. If you're taking part in this monster thread I strongly suggest rereading both of these messages, they're excellent. The subthread from Damian's clarifications led on to a discussion of multimethods that's worth looking at too.

http://groups.google.com/groups -- Buddha Buck's summary

http://groups.google.com/groups -- Damian's clarifications

Larry's state of health and employment

Damian mentioned that "We should bear in mind that Larry has had some health issues. And that he's currently unemployed with four children to support. Other matters are taking precedence at the moment." Get well soon Larry.

This led to a discussion of whether the Perl Foundation would be continuing its grant to Larry in 2003 (apparently not). (The advocacy@perl.org list is supposedly the right place to discuss this further but I'm not yet a member.)


Who's Who in Perl 6?

Who are you?
Melvin Smith. I work for IBM. Former Linux hacker. My education is Computer Science. I am happily married.
What do you do for/with Perl 6?
Wrote various pieces of Parrot, including continuations (woo woo!), IMCC, and Cola as well as heckle Dan on IRC any chance I get. Sadly, my contributions are fragmented and sparse, until I no longer have to work for a living.
Where are you coming from?
Country boy from Georgia. I have a 4x4 truck and a dog. Perl is my favorite language because it has made possible many days of finishing work early to play golf, and I love WOW-ing Java programmers.
When do you think Perl 6 will be released?
In digestible form, December, 2003. However, if the Raelians can clone Leopold Tötsch by July, then August might be possible.
Why are you doing this?
Solely to learn new things. There is no better way to learn how to write a compiler than to write one badly. If Perl6 never arrives, I'll be satisfied by what I accomplished.
You have 5 words. Describe yourself
Intense, stubborn, dedicated, happy and kind.
Do you have anything to declare?
Yes. Treat people on the Net just like your friends and coworkers. They just might be, one day.


Well, this set of acknowledgements may look slightly different than usual. This morning we had one of those meetings... If you've ever worked for a dot-com, you know the type: the whole company got called into a conference room that was too small at about two minutes' notice, and the boss spent 10 minutes umming and ahhing through a speech about retrenchment and cost cutting and... um... downsizing.

So, it looks like I'm about to become a member of the Copious Free Time club. I would take this opportunity to beg for a job, but if you do have jobs to offer Perl programmers, Larry and Dan may be more useful to you.

Returning to your normally scheduled acknowledgements: many thanks to Melvin Smith for his answers to the Who's Who questionnaire. The answer queue is empty again, so unless someone else sends some answers, Who's Who will be on hiatus for a while. Send your answers to 5Ws@bofh.org.uk.

If you appreciated this summary please consider one or more of the following options:

The fee paid for the publication of these summaries on perl.com is paid directly to the Perl Foundation (but given my current situation I may be reconsidering that.)

What's new in Perl 5.8.0

It's been nearly six months since the release of Perl 5.8.0, but many people still haven't upgraded to it. We'll take a look at some of the new features it provides and describe why you should investigate them yourself.


Perl 5.8 - at last! - properly supports Unicode. Handling Unicode data in Perl should now be much more reliable than in 5.6.0 and 5.6.1. In fact, quoting the most excellent perluniintro.pod, which is suggested reading, '5.8 is the first recommended release for serious Unicode work.'

The Unicode Character Database, which ships with Perl, has been upgraded to Unicode 3.2.0; 5.6.1 shipped with 3.0.1. Most UCD files are included, with some omissions to save space.

Perl's Unicode model is straightforward: strings can be eight-bit native bytes or strings of Unicode characters. The principle is that Perl tries to keep its data as eight-bit bytes for as long as possible. When Unicodeness cannot be avoided, the data is transparently upgraded to Unicode. Native eight-bit bytes are whatever the platform uses (for example, Latin-1); Unicode is typically stored as UTF-8.

Perl will do the right thing with regard to mixing Unicode and non-Unicode strings; all functions and operators will respect the UTF-8 flag. For example, it is now possible to use Unicode strings in hashes, and to use them correctly in regular expressions and transliterations. This is a complete change from 5.6, where you controlled Unicode support with a lexically scoped utf8 pragma.

To fully use Unicode in Perl, we must now compile Perl with PerlIO -- the new IO system written by Nick Ing-Simmons that we will cover later -- together with the new Encode module written by Dan Kogai. Together, these allow individual filehandles to be set to bytes, Unicode, or legacy encodings. Encode also comes with piconv, a Perl implementation of iconv, and with enc2xs, which allows you to add your own encodings to Encode, either from Unicode Character Mapping files or from Tcl Encoding Files. From Sadahiro Tomoyuki come Unicode::Normalize and Unicode::Collate, used, unsurprisingly, for normalization and collation.
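To make the Encode side of this concrete, here is a minimal sketch of round-tripping between a legacy encoding and UTF-8 with Encode's decode and encode functions -- the file-free core of what the new IO layers do for you automatically:

    use strict;
    use warnings;
    use Encode qw(encode decode);

    # Decode legacy Latin-1 bytes into a Perl Unicode string ...
    my $bytes  = "caf\xe9";                    # 'café' as Latin-1 bytes
    my $string = decode('iso-8859-1', $bytes);

    # ... and encode the string back out as UTF-8 bytes.
    my $utf8 = encode('utf-8', $string);
    printf "%d characters, %d UTF-8 bytes\n",
        length($string), length($utf8);        # 4 characters, 5 UTF-8 bytes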

Perl Threads

Just as 5.8 is the first recommended release for Unicode work, it is also the first recommended release for threading work. Starting with 5.6, Perl had two modes of threading: one style called 5005threads, so named because it was introduced with 5.005, and ithreads, which is short for interpreter threads. Gurusamy Sarathy introduced ithreads as a step forward from multiplicity, to support the pseudofork implementation on Win32. However, in 5.6 there was no way to control these threads from Perl; this has now changed with the introduction of two new modules in 5.8.

The basic rule for this thread model is that all data is cloned when a new thread is created, so no data is shared between threads. If you want to share data, there is a threads::shared module and the new :shared variable attribute. Controlling the new threads is done with the threads module. Further reading can be found in the respective modules' documentation and the Perl threads tutorial (perlthrtut) page.
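A minimal sketch of the new interface, assuming a Perl built with ithreads: the :shared attribute makes a variable visible to all threads, and lock serializes access to it.

    use strict;
    use warnings;
    use threads;
    use threads::shared;

    my $counter : shared = 0;

    sub bump {
        lock($counter);        # hold the lock while we update the shared data
        $counter++ for 1 .. 1000;
    }

    my @workers = map { threads->create(\&bump) } 1 .. 5;
    $_->join for @workers;

    print "$counter\n";        # 5000 -- without :shared, each thread
                               # would increment its own private clone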


New IO

Perl can now rely on its own bugs instead of the bugs of your underlying IO implementation! Perl 5.8 uses the PerlIO library, which replaces both stdio and sfio. The new IO system allows filters ('layers') to be pushed onto and popped from a filehandle, enabling all kinds of nifty things. For example, the Encode module, mentioned earlier in the Unicode discussion, uses PerlIO to do its magic character-set conversions at the IO level.
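For instance, pushing an :encoding layer onto a filehandle makes the character-set conversion happen transparently as you read and write. A minimal sketch (the filename is made up):

    use strict;
    use warnings;

    # Write Latin-1 bytes through an encoding layer ...
    open my $out, '>:encoding(iso-8859-1)', 'latin1.txt' or die $!;
    print {$out} "caf\x{e9}\n";
    close $out;

    # ... or push a layer onto an already-open handle with binmode.
    open my $in, '<', 'latin1.txt' or die $!;
    binmode $in, ':encoding(iso-8859-1)';
    my $line = <$in>;          # $line now holds characters, not raw bytes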

Interested parties who want to create their own layers should look at the library API, the IO layer API, the PerlIO module, and the PerlIO::via manpage.

Safe Signals

No more random segfaults caused by signals! We now have a signal handler that just raises a flag; the signal is then dispatched between opcodes, so you are free to do anything you like in a signal handler (since it isn't run asynchronously, it isn't really a signal handler). This has potential for conflicts if you are embedding Perl and relying on signals for some particular behavior, but I suppose if you really like having the chance of a random segfault on receiving a signal, then you can always compile perl with PERL_OLD_SIGNALS. That will, however, not be threadsafe.
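In practice, this means an ordinary Perl subroutine in %SIG can now do real work -- even allocate memory -- without risk. A minimal sketch:

    use strict;
    use warnings;

    my $got_alarm = 0;
    $SIG{ALRM} = sub { $got_alarm = 1 };    # runs safely, between opcodes

    alarm 1;
    sleep 2;                                # interrupted when the alarm fires

    print "alarm handled: $got_alarm\n";    # alarm handled: 1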

New and Improved Modules

Perl 5.8 comes with 54 new modules, many of them pulled in from CPAN for various reasons. One goal has been to make it easy for CPAN.pm to be self-hosting; this has meant including libnet and a couple of other modules.

We have been working hard on testing, so the Test::More family of modules was a natural inclusion. Then there was a push to make Perl more i18n-friendly, so 5.8.0 includes several i18n and l10n modules, as well as the previously covered Unicode modules. There are many modules that provide access to internal functions, such as the PerlIO modules, the threads module, and sort, the new module that provides an interface to the sort implementation you are using. Finally, we also thought it was time to include Storable in the core.

We also have a bunch of updated modules: Cwd is now implemented in XS, giving us a nice speed boost. B::Deparse has been improved to the point that it is actually useful. Maintenance work on ExtUtils::MakeMaker has made it more stable. Storable supports Unicode hash keys and restricted hashes. Math::BigInt and Math::BigFloat have been upgraded and bugfixed quite a lot, and they have been complemented by a new Math::BigRat module and the bigrat, bigint, and bignum pragmata for lexical control of transparent big-number support.

Speed Improvements

Even though this release includes a lot of new features, there are some optimizations in there as well! We have changed sort to use mergesort, which for me is rather surprising, since I have been told since I was a toddler to use quicksort. However, the old behavior can be selected using the sort pragma; we even have a mystery stable quicksort!
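The sort pragma gives you lexical control over which implementation is used; a quick sketch:

    use strict;
    use warnings;

    {
        use sort 'stable';    # guarantee stability in this scope
        my @sorted = sort { length($a) <=> length($b) } qw(bb aa cc a);
        print "@sorted\n";    # a bb aa cc -- ties keep their input order
    }

    {
        use sort '_quicksort';    # leading underscore: implementation
                                  # detail, may change in later releases
        my @sorted = sort { $a <=> $b } (3, 1, 2);
        print "@sorted\n";        # 1 2 3
    }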

Once again, we have changed the hashing algorithm, this time to one called One-At-A-Time. All of you who depend on the order of hashes: this is a good reason to fix your programs now!
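The fix is simply to impose an order yourself rather than relying on whatever the hashing function happens to produce:

    use strict;
    use warnings;

    my %population = (London => 7, Paris => 2, Tokyo => 12);

    # keys %population can come out in any order, and that order changed
    # in 5.8.0; sorting the keys makes the output deterministic.
    for my $city (sort keys %population) {
        print "$city: $population{$city}\n";
    }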

Finally, map has been made faster, as has unshift.


We hope this will be the most stable release of Perl to date, as an extensive QA effort, spearheaded by Michael Schwern, has led to several benefits. We now have six times as many test cases, testing a cleaner codebase with more documentation. The Perl bug database has been switched to Request Tracker; we should thank Richard Foley for his work on perlbugtron, which has now been retired. After several discussions of what counts as a memory leak, several memory leaks and naughty accesses have been fixed. The tools used were Third Degree, Purify, and the most excellent open-source alternative, Valgrind.

More Numbers

Nicholas Clark, Hugo van der Sanden, and Tels have done some magic keeping integers as integers for as long as possible, and when they found bugs in vendors' number-to-string and string-to-number conversions, they coded around them for increased precision. We should all be happy that 42 is now 42 and not 42.000000000000001 - imagine what the aliens would do if they found out!


I have mentioned several documentation pages earlier; they are part of the 13 new POD files included in Perl. In addition, all README.os files have been converted to POD. There are several new tutorials, including a regular expressions tutorial, a tutorial on pack and unpack, a debugging tutorial, a module creation tutorial, and 'a gentle introduction to Perl'. There is also a new POD format specification written by Sean M. Burke.


Several deprecations have occurred in this release of Perl. In future versions, 5005threads will be gone, replaced by ithreads; pseudo-hashes will be killed, though the fields pragma will keep working on top of restricted hashes; suidperl, which, despite everything, isn't safe, is on its way out; and so is the bare package; directive, which had unclear semantics.

A few things have been removed or forbidden: blessing a reference into another reference is one; self-tying of arrays and hashes led to some weird bugs and has been disabled, as it touched some rarely tested codepaths. The [[.c.]] and [[=c=]] character classes are also forbidden because they are reserved for future extensions. Several outdated scripts have been removed, and the uppercase comparison operators have also gotten the ax.

The War of the Platforms

Perl 5.8 works on several new platforms, and the EBCDIC platforms were regained. Sadly, however, we lost the Amiga; any volunteers who want to make the Amiga port work again are very welcome.

Odds and Ends

There is a long list of small new changes in Perl 5.8. The biggest of these small changes is restricted hashes, available via the new Hash::Util module, which allows you to lock down the keys of a specific hash; this will possibly be used as a replacement for pseudohashes in the fields pragma.
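A minimal sketch of a restricted hash, using Hash::Util's lock_keys:

    use strict;
    use warnings;
    use Hash::Util qw(lock_keys);

    my %config = (host => 'localhost', port => 8080);
    lock_keys(%config);              # freeze the set of permitted keys

    $config{port} = 9090;            # fine: the key already exists

    eval { $config{prot} = 'tcp' };  # typo for 'port' -- now a runtime error
    print "caught: $@" if $@;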

For the full and gory details, check out the whole Perl delta documentation.

This week on Perl 6, week ending 2003-01-12

... And we're back. Yup, it's summary time again. We'll dive straight in with perl6-internals (as if you expected anything else).

More Thoughts on DOD

Leopold Tötsch posted a test program showing the effects of PMC size and timing of garbage collection and allocation and suggested ways of improving the GC system based on the conclusions he drew from its results. Leo, Dan and Mitchell N Charity discussed this further and tried a few different approaches to try and improve performance (though Mitchell did worry about premature optimization). Work in this area is ongoing.


The Perl 6 Parser

Dan asked about the current state of the Perl 6 parser, wanting to know what was and wasn't implemented, and wondered about adding the Perl 6 tests to the standard Parrot test suite. Sean O'Rourke and Joseph F. Ryan both gave summaries of where things stood. Joseph also suggested a few refactorings of the parser to deal with the fluidity of the current spec (almost all the operators have changed symbols since the parser was first written, for instance).


LXR - Source code indexing

Last week, I said that Robert Spier had 'started work on getting a browseable, cross-referenced version of the Parrot source up on perl.org.' What actually happened was that Robert asked Zach Lipton to do the work. This week, Zach delivered the goods, which, I must say, look fabulous.

I'm sure that if someone were to extend LXR so it had a better understanding of .pasm, .pmc, .ops and other special Parrot source types, then the community would be very grateful indeed. I know I would.

http://groups.google.com/groups -- Announcement


Thoughts on Infant Mortality

Piers Cawley offered what he thought might be a new approach to dealing with the infant mortality problem that got efficiently shot down by Leo Tötsch. This led to further discussion of possible answers, and it looks like Leo's proposed solution involving a small amount of code reordering and early anchoring will be the one that's tried next. All being well, it won't require walking the C stack and hardware register set, which can only be a good thing.

Later, Leo asked whether it'd be OK to check in his work so far on redoing the GC because he was up to 15 affected files and was starting to worry about integration hell. Steve Fink wasn't sure about one of his changes, so Leo checked in everything else.



Objects, Finally (try 1)

Last week, I mentioned that Leon Brocard's wish list for the next Parrot iteration included Objects. This week, Dan posted his first draft of what Parrot Objects would and wouldn't be able to do. The 11th entry on Dan's list (Objects are all orange) seemed to be custom-made to please Leon. There was a fair amount of discussion (of course), but the consensus was positive.


The Benchmarking Problem

Nicholas Clark crossposted to p5p and perl6-internals to discuss the problems of benchmarking Parrot against Perl 5. One of Parrot's aims is, of course, to go faster than Perl 5. The problem is, how do you measure 'faster'? Nicholas has been working on making Perl 5 go faster and was distressed to find that, according to perlbench, a patch of his went 5 percent faster, 1 percent slower, unchanged, or 1 percent faster, depending on which machine/compiler combination he ran the benchmark on. Leo Tötsch commented that he'd found performance varying by more than 50 percent in a JIT test, depending on the position of a loop in memory. Andreas Koenig commented that he'd come to the conclusion that bugs in glibc meant that there was little point in benchmarking Perl at all if it was built with a glibc older than version 2.3 (apparently malloc/free proved to be gloriously erratic...). I'm afraid not much was actually resolved, though.


Meanwhile, in perl6-language

The discussion of Variable types versus Value types continued from the previous week. Dan opined that Arrays weren't necessarily objects, which brought forth squawks from Piers Cawley who pointed out that being able to do:

   class PersistentList is Array {
       method FETCH ($index) {
           # ... fetch the element from persistent storage ...
       }
   }

would be much nicer than tying a value in the Perl 5ish fashion. Dan reckoned that delegation would probably be enough which, IMHO, seemed to miss the point. Various other people chimed in to, essentially, tell Dan that he was wrong, but I'm not sure Dan agreed with them.

Meanwhile, in a subthread sprouting lower on the thread tree, Damian pointed out that there were two types associated with any Perl variable -- the 'storage type' and the 'implementation type' (see his post for details). Essentially, the storage type is the type associated with the contents of a variable, and the implementation type is the type of the 'thing' that does the storing -- usually one of SCALAR, HASH, or ARRAY -- i.e., related to the variable's sigil. Specifying a different implementation type will probably be how Perl 6 does tying.



Array Questions

In a thread that spilled over from the previous discussion about whether arrays were objects, Michael Lazzaro put up a list of examples that seem to imply rather strongly that arrays are either objects or indistinguishable from them, and there was general muttering that being able to overload tied Perl containers in this way was a neat way of implementing tie semantics. Mr. Nobody attempted to restart the left-to-right versus right-to-left argument. There was also some discussion of the sickness of my Foo @foo is Foo (which, in Perl 5ish parlance, creates a tied array -- using Foo as the tying class -- that can only contain objects of class Foo). Damian agreed that this was definitely sick, and that he for one would be making use of it.


L2R/R2L Syntax

Argh! No! It's back and this time it means business. The dreaded left->right versus right->left thing came back, and this time it was Damian applying the electrodes to the corpse. Of course, it being Damian, he was instantly forgiven as he came up with the very cool, very low precedence ~> and <~ operators, allowing you to write

   @out = @a ~> grep {...} ~> map {...} ~> sort;

It is, to these eyes at least, lovely. See Damian's post for the full details. The general response to this was definitely on the 'lovely' side of the balance, though one detractor did induce a sense of humor failure on Damian's part. There was also a certain amount of discussion about whether this was exactly the right syntax to go with the semantics, but where would perl6-language be without a great deal of syntactic quibbling? (A good deal easier to summarize.) The most popular alternatives were |> and <|. There was also a certain amount of discussion of what I can only describe as 'evil' uses of the syntax, involving arrows going in different directions in one statement. Rafael Garcia-Suarez earned at least one giggle when he suggested that we just needed v~ and ^~ and we'd have our own flavour of Befunge.

There was a fair amount more discussion, mostly thrashing out details and edge cases.


"Disappearing" code

John Siracusa wondered whether there would be a perl6ish way of creating code that was 'the spiritual equivalent of #ifdef, only Perlish.' To which the answer is, of course, 'yes.' Damian showed off a neat trick (modulo a couple of pretty typos) with an immediate function that demonstrated using Perl 6 as its own macro language.


http://groups.google.com/groups -- Damian's neat trick

In Brief

Jim Radford fixed some typos in the rx.ops documentation.

Who's Who in Perl 6

Who are you?
Steve Fink. Some guy who writes code.
What do you do for/with Perl 6?
The only thing I set out to do was implement a regular expression compiler. Along the way, I seem to have implemented hashtables, patched the configuration system, messed with memory management, implemented some stuff for IMCC, beefed up the PMC file parser, fixed some problems with the JIT, and a few other things. Then I got a job and ran out of time to work on Parrot, so they made me pumpking. And I still haven't made it that far with the regex compiler.
Where are you coming from?
Computer games, originally. First language, BASIC; second language, 6502 assembly. Then a failed attempt at C, then more successful encounters with Pascal and 68000 assembly, and then C++. Next, a few more assembly languages together with SML, NESL, Lisp, Scheme, COBOL, Tcl, Prolog, and a few others. And, at last, Perl4 and then Perl5. Oh, and Java, fairly recently. My day job is now in a combination of Perl5 and C++, as well as C when nobody's looking.
When do you think Perl 6 will be released?
Probably smoothing the path for other developers and keeping them motivated. My highest priority is applying (and testing) other people's patches, since the mostly likely reason for someone to lose interest is to not have their hard work make it into the distribution. I would also like to somehow make people's contributions more visible -- anyone who has contributed anything significant should at least be able to point to something and say "Look! I did that! See my name?!"
I said, when do you think Perl 6 will be released?
Obviously, Leopold Tötsch. Anyone paying an iota of attention would know that. Particularly someone who's been writing the summaries, unless you're stupid or something. Leo's amazing; I don't know how he finds the time. To accomplish that much, I'd need to be working full-time about 26 hours a day.
No, really. When do you think Perl 6 will be released?
No, not really. I was originally thinking of Perl6 when I got involved, but since then the Parrot VM itself has become more interesting to me. Although I still wish Perl6 development would pick up -- there's a lot that can be done even with the limited amount of the language that's been defined. Sean O'Rourke did an excellent job in a short amount of time, but it looks like real life has drawn him back into its fold, and nobody seems to have picked up the slack.
Why are you doing this?
Heh. That is the question, isn't it? Making a release is probably the most concrete measure of how I'm doing as a pumpking, and by that standard I'm a dismal failure. As soon as we reclaim the tinderbox (and without dropping any machines off it in order to do so!) Everything else I wanted to get in is already there.
You have five words. Describe yourself.
No, I don't think so. Maybe I'm wrong, but I know that I personally had to put aside a lot of the actual coding I was working on in order to concentrate on making sure everyone else's changes were being given proper consideration. I'd much rather relieve him of that burden, and let him continue to exercise his demonstrated talent at churning out quality code.
Do you have anything to declare?
No. It kind of makes sense, but I remember how I first started by rewriting a bunch of Dan Sugalski's code, and then seeing most of my code get rewritten. I used to be disturbed by that, but now I think of it more as a badge of honor -- it proves that what I wrote was worth rewriting. Much more so in Dan's case, I suppose, since he stated up-front that he was merely doing a reference implementation of a design. Dan's done an amazing job of laying out a design that hasn't needed to change at its core, and so has been a very dependable guide to the implementation of the backbone. But even in my case, I can see a number of ideas that were carried through in the reimplementation, even if no actual code survived. (Interestingly, my tests did. Which kind of makes sense if you think about it.)
Are you finished yet?
Why yes, thank you.

Ahem. Thanks Steve. Really.


Another Monday evening, another summary running over into Tuesday morning. Ah well. Distractions were provided by the usual suspects (Mike and Sully, our ginger kittens), supplemented this week by a horrible cold (the compulsion to find a tissue does tend to derail the train of thought).

Proofreading was once more handled by Aspell and me. This week, we even made sure that the Who's Who section contains the name of the person answering the questions rather than making you wait 'til the acknowledgements section.

Speaking of which, many thanks to Steve Fink for his answers to the questionnaire (well, to the questions he wanted to answer, anyway). The questionnaire queue is now quite empty, so unless a few more folks in the Perl 6 community send me some answers soon, the Who's Who section may be going on hiatus. Send your answers to (or request the 'correct' question list from) 5Ws@bofh.org.uk.

If you didn't like this summary, then how did you get this far? If you did like it, then please consider one or more of the following options:

The fee paid for the publication of these summaries on perl.com is paid directly to the Perl Foundation.

The Perl 6 Summarizer disclaims any and all responsibility for the sanity of his readers; he's having enough trouble hanging onto his own.

This week on Perl 6, weeks ending 2003-01-05

Hello, and welcome to the first summary of 2003; welcome to the future. This summary covers two weeks, but they've been quiet ones, what with Christmas and the New Year.

So, starting as usual with perl6-internals

A Pile of Patches to the Perl 6 Compiler

Joseph F. Ryan submitted several patches to the Perl 6 mini-compiler (found in the languages/perl6 subdirectory of your friendly neighborhood Parrot distribution), mostly implementing the semantics for string and numeric literals discussed on perl6-documentation.

Garbage Collection Headaches

Heads have been put together in an attempt to get Parrot's Garbage Collection system working efficiently and accurately (no destroying stuff before anyone's had a chance to use it, damn it!) It appears that there's still a good deal of head scratching to be done in this area (the chaps over on the LL1 list are wondering why we aren't just using the Boehm GC system ... .)

I freely admit that GC makes my head hurt (especially as, in my current Perl 5 project, I'm busy implementing mark-and-sweep collection for a persistent object store while also making sure that my random assortment of circular data structures has weakened references in just the right places, so that stuff gets destroyed, but only when it should be destroyed ... . Boy, am I looking forward to Perl 6 and not having to worry about this stuff ever again ... .) but I'll have a go at summarizing the issues.

The main problem appears to be that of 'Infant mortality,' an issue that I will now attempt to explain.

All the objects in memory can be represented as nodes in a graph, and the pointers between those objects can be represented as edges in that graph. The process of garbage collection involves taking a subset of those nodes (the rootset) and freeing (or marking as freeable) all those nodes in the graph that are unreachable from the rootset.

Now, consider a function that sets up a new PMC, specifically a PMC that contains another PMC. The first step is to grab the memory for our new PMC. Next, we create the contained PMC, a process that allocates more memory ... and there's the rub. Garbage collection can get triggered at any point where we go to allocate more memory; unless the containing PMC is reachable from the rootset, it will get freed at that point. And that leads to badness. So the infant mortality problem can also be thought of as a problem of rootset maintenance. Which is, in theory, simple: just treat all C variables as members of the rootset. In practice, however, it isn't that simple, mostly because hardware registers complicate the issue.

Steve Fink offered an overview of the issues and some of the possible approaches to dealing with them, which sparked a fair amount of discussion among those who understood the issues.


http://groups.google.com/groups?threadm=20021231000055.GA23896%40foxglove.digital-integrity.com -- Steve's overview

Variable/value vtable split

Leo Tötsch posted a summary of where we stand on doing the variable/value vtable split, suggesting that he wanted to start feeding in patches soon. Mitchell N Charity supplied a handy dandy 'context' post with links to appropriate articles, and he and Leo did a certain amount of thrashing out of issues.



Parrot Gets Another New Language

Ook! Jerome Quelin offered an implementation of the latest silly language, Ook!, which can be thought of as brainf.ck for librarians. Due to insanity, the Ook! compiler is implemented in Parrot assembly and emits Parrot assembly, too, which led Jerome to ask for an 'eval' opcode, which Leo promised to supply -- and which Dan specced out in PDD6. All of this led Leo to comment that, for all these languages are toys, they do seem to be driving the implementation of important bits of Parrot. Nicholas Clark reminded everyone that a Z-code interpreter would be another good thing to have a crack at, because it would require a couple of other really useful bits of Parrot functionality. Ook! is now in the core.


Returning new PMCs

David Robins wondered what the resolution was on creating and returning a new PMC in PMC ops that take a PMC* dest parameter. He and Dan discussed it back and forth, and it became apparent that Dan really needs to get Parrot objects defined ... .


Fun with PerlHash

Jerome Quelin noticed that you couldn't delete an item from a PerlHash. Leo fixed it. Jerome later asked how one could retrieve the keys of a PerlHash in Parrot assembly and wondered whether there was a way to traverse a hash. Sadly, the answer is 'not yet,' but happily Aldo Calpini has something nearly ready for prime time.


GC/DOD feedback & runtime tuning

Dan has been playing with some test programs and has found some major issues with resource allocation; he added his stress-test programs to the distribution so others could see whether they could work out how to fix things. Leo Tötsch (who else?) made some inroads, reporting his progress to the list as he and Dan discussed ways forward.


Object Semantics

Dan posted a sketch of how Parrot was going to deal with language-level objects. And there was much rejoicing. Various people pointed out that Dan's assumption that 'real' languages only had reference-type objects was incorrect; Ruby has value types, and so do Smalltalk and C#, and those are just the ones off the top of people's heads.


Meanwhile in perl6-language

Not much was happening. The language folks seem to have taken their holidays seriously; the last fortnight saw all of 76 posts.

Tree-Frobbing Facilities in Perl 6

Rich Morin wondered what type of facilities Perl 6 would have for monkeying about in trees, offering a discussion of the sort of thing he wanted to do and the problems he saw with doing that in Perl 5. Michael Schwern reckoned that "Doctor, it hurts when I do this." applied ("Well, don't do that then") and suggested other ways to handle Rich's problem. Simon Cozens, who should know better, made a terrible joke about frobbing trees. Dave Whipp pointed out that continuations should make it easy to treat tree traversal just like traversing any other list.


PRE/POST in Loops

Arthur Bergman is this week's hero. He's busy writing Hook::Scope, which will (eventually) implement Perl 6's PRE/POST/FINALLY/CATCH, etc., in Perl 5. Yay, Arthur.

Anyway, Arthur wanted to know what happens with POST and PRE in loop scopes. Do they get called for every iteration, or merely at the beginning and end of the loop?

Luke Palmer reckons that POST gets executed for every iteration (NEXT doesn't get executed on the last time through a loop).


my int ( 1..31 ) $var ?

Murat Ünalan didn't like Damian's proposed my $date is Integer(1..31); (restricts $date to an integer between 1 and 31) and proposed using my int(1..31) $date instead. He didn't like

    my int ($pre, $in, $post) is constant = (0..2);

either, proposing either of:

    my constant int ($pre, $in, $post) = (0..2);
    my int is constant ($pre, $in, $post) = (0..2);

Murat argued that 'type and property' belong together. Damian disagreed, and pointed out that, if you want the specifiers close together, then you could write:

    my ($pre, $in, $post) returns int is constant = (0..2);

I'm caricaturing (but only slightly) the rest of the discussion if I say that the rest of the thread ran along the lines of a pantomime argument ("Oh yes it is!" ... "Oh no it isn't!"); suffice it to say, I don't think either Damian or Murat convinced the other. Personally, I'm on Damian's side -- sorry, Murat.



Variable Types vs. Value Types

Dave Whipp wondered whether the type of a variable could vary independently of its value. I'm not sure I understand what Dave was driving at, which makes summarizing his post a little tricky, but I think there's confusion about the meaning of 'type': a variable has what I will call a 'sigil type' and may also have a more specific 'declared type'. Thus, a declaration of the form my Array $foo; declares a scalar (sigil type) variable that points to (contains) an Array (declared type) object, while one of the form my Array @foo; declares an array (sigil type) variable that contains a number of Array (declared type) objects. By the same pattern, my Array %foo; would declare a hash whose values are Array objects.

This led into a discussion of the 'Everything is an object' principle, but more on that next week.


In Brief

Leo Tötsch kept up his staggering patch generation rate. Does he ever sleep?

Mitchell Charity supplied a script which generates a browseable list of Parrot file names with brief descriptions, which should prove useful to new developers who want to get a feel for the layout of Parrot. Dan agreed, and it's in the Parrot distribution now as tools/dev/extract_file_descriptions.pl.

Jason Gloudon got Garbage collection working on IA-64 Linux, yay Jason.

Bruce Gray sent in a bunch of cleanup and win32 patches.

On a suggestion from Mitchell N Charity, Robert Spier has started work on getting a browseable, cross-referenced version of the Parrot source up on perl.org and asked to be pinged in a week or two if it hasn't happened.

Jerome also improved the debugger.

Leon 'bear of very little brain' Brocard added a couple of wishlist items: Objects, and a 'make install' that does something sensible.

Who's Who in Perl 6?

Who are you?
A twenty-something coder, writer and editor who thinks it's possible to improve the state of software and software development.
What do you do for/with Perl 6?
Random stuff:
  • argue language features with Allison, who lives very nearby
  • proofread documentation when I'm trying to read it
  • extend Parrot as an embedded platform for game scripting
  • oversee the project to turn p6d documentation into executable test cases

Maybe I need a Perl 6 Test Pumpking hat ...

Where are you coming from?
Physically, Portland, Ore., or Sebastopol, Calif.

Otherwise, I find that Perl 5 fits the way I think and expect Perl 6 to do the same, only much more so.

When do you think Perl 6 will be released?
In beta form within the next two years. Within five years, I think it will overtake Perl 5. (I expect a 5.12, though.)
Why are you doing this?
Someone has to do this. I'm blessed to be in a position where I have some ability to give back to the community that's given me so much and where I have financial compensation to spend some time participating in this community.

I also believe that the way to write high-quality software is to take quality seriously. We have the opportunity to test Perl 6 from the ground up, having learned lessons and built tools for Perl 5. If we do our job correctly, then we'll even have tests in place before we have the language features in place. Hooray for test-driven development!

You have five words. Describe yourself.
I am not really Schwern.
Do you have anything to declare?
So this camel and parrot walk into a bar...


Back to writing summaries on the train and in the armchair, where I get distracted by almost anything (current distractions: writing a graphical TestRunner for the ObjcTest framework, Eliza Carthy's utterly wonderful Anglicana CD, and the nsNet puzzle game...). I started writing this on Monday morning, damn it.

Proofreading was once more handled by Aspell and me.

Thanks to chromatic for answering the questionnaire for me. The queue now has one (count it, one) entry left in it, so please consider sending me your answers to mailto:5Ws@bofh.org.uk

If you didn't like this summary, why are you still reading it? If you did like it, please consider one or more of the following options:

The fee paid for publication of these summaries on perl.com is paid directly to the Perl Foundation.

Improving mod_perl Sites' Performance: Part 6

It's desirable to avoid forking under mod_perl, because when you do, you are forking the entire Apache server -- lock, stock and barrel. Not only are your Perl code and Perl interpreter duplicated, but so are mod_ssl, mod_rewrite, mod_log, mod_proxy, mod_speling (it's not a typo!) and whatever other modules you have built into your server, along with all the core routines.

Modern operating systems come with a lightweight version of fork(), which adds little overhead when called, since it is optimized to do the absolute minimum of memory page duplication. The copy-on-write technique is what makes this possible. The gist of this technique is as follows: the parent process' memory pages aren't immediately copied to the child's address space on fork(); that happens only when the child or the parent modifies the data in some memory pages. When a page is about to be modified, it is marked as dirty, and the process doing the writing has no choice but to copy that page, since it can no longer be shared.
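
Copy-on-write itself is invisible from Perl, but its user-visible consequence is easy to demonstrate: after fork(), a write in the child never leaks into the parent's copy of the data. Here is a minimal, self-contained sketch (outside mod_perl) illustrating that:

```perl
use strict;
use warnings;

# After fork(), parent and child share pages copy-on-write: when the child
# writes to a variable, its page is copied, and the parent's data stays intact.
pipe(my $reader, my $writer) or die "pipe: $!";
my $data = "original";

defined(my $pid = fork) or die "Cannot fork: $!";
if ($pid == 0) {
    # Child: modifying the variable only affects the child's (copied) page
    close $reader;
    $data = "modified-by-child";
    print {$writer} $data;
    close $writer;
    exit 0;
}

close $writer;
my $from_child = <$reader>;    # read whatever the child saw
waitpid($pid, 0);              # reap the child

print "child saw: $from_child\n";
print "parent still has: $data\n";
```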

If you need to call a Perl program from your mod_perl code, then it's better to convert the program into a module and call it as a function, without spawning a separate process to do it. Of course, if you cannot do that, or the program is not written in Perl, then you have to call it via system() or an equivalent, which spawns a new process. If the program is written in C, then you can try to write Perl glue code with the help of XS or SWIG, and then the program can be executed as a Perl subroutine.

Also, by trying to spawn a sub-process, you might be trying to do the "wrong thing". If what you really want is to send information to the browser and then do some post-processing, then look into the PerlCleanupHandler directive. Handlers registered for this phase run after the request has been processed and the user has received the response. This doesn't release the mod_perl process to serve other requests, but it does let you send the response to the client faster. If this is your situation and you need to run some cleanup code, then you can register it during the request processing stage like so:

  my $r = shift;
  $r->register_cleanup(\&do_cleanup);
  sub do_cleanup {
      # some clean-up code here
  }

But when a long-term process needs to be spawned, there is not much choice but to use fork(). We cannot just run such a process within the Apache process, because it would keep that Apache process busy instead of letting it do the job it was designed to do. Also, if Apache is stopped, then the long-term process might be terminated as well, unless it is coded properly to detach from Apache's process group.

In the following sections, I'm going to discuss how to properly spawn new processes under mod_perl.

Forking a New Process

This is a typical way to call fork() under mod_perl:

  defined (my $kid = fork) or die "Cannot fork: $!\n";
  if ($kid) {
    # Parent runs this block
  } else {
    # Child runs this block
    # some code comes here
    CORE::exit(0); # terminate the child
  }
  # possibly more code here, usually run by the parent

When using fork(), you should check its return value, because if it returns undef, it means that the call was unsuccessful and no process was spawned; something that can happen when the system is running too many processes and cannot spawn new ones.

When the process is successfully forked, the parent receives the PID of the newly spawned child as a returned value of the fork() call and the child receives 0. Now the program splits into two. In the above example, the code inside the first block after if will be executed by the parent and the code inside the first block after else will be executed by the child process.
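A minimal, self-contained sketch of this split (outside mod_perl): the child reports its own $$ back to the parent through a pipe, so you can see that the value fork() returned to the parent really is the child's PID:

```perl
use strict;
use warnings;

# fork() returns the child's PID in the parent and 0 in the child.
pipe(my $reader, my $writer) or die "pipe: $!";

my $reported;
defined(my $kid = fork) or die "Cannot fork: $!";
if ($kid) {
    # Parent: $kid holds the child's process ID
    close $writer;
    $reported = <$reader>;    # the child tells us its own $$
    chomp $reported;
    waitpid($kid, 0);         # reap the child to avoid a zombie
    print "fork() returned $kid; the child says its PID is $reported\n";
} else {
    # Child: fork() returned 0 here, but $$ holds the real PID
    close $reader;
    print {$writer} "$$\n";
    close $writer;
    exit 0;
}
```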

It's important not to forget to call exit() explicitly at the end of the child code when forking. If you don't, and there is some code outside the if/else block, then the child process will execute it as well. But under mod_perl there is another nuance: you must use CORE::exit() and not exit(), because the latter is automatically overridden by Apache::exit() when used in conjunction with Apache::Registry and similar modules. We do want the spawned process to quit when its work is done; otherwise it will just stay alive, using resources and doing nothing.

The parent process usually completes its execution path and enters the pool of free servers to wait for a new assignment. If the execution path is to be aborted earlier for some reason, then one should use Apache::exit() or die(). In the case of Apache::Registry or Apache::PerlRun handlers, a simple exit() will do the correct thing.

The child shares memory pages with its parent until it has to modify some of them, which triggers a copy-on-write operation that copies those pages to the child's domain before the child is allowed to modify them. But this all happens later. At the moment the fork() call is executed, the only work to be done before the child process goes its separate way is setting up the page tables for the virtual memory, which imposes almost no delay at all.

Freeing the Parent Process

In the child code, you must also close all pipes to the connection socket that were opened by the parent process (i.e., STDIN and STDOUT) and inherited by the child, so that the parent can complete the request and free itself to serve other requests. If you need the STDIN and/or STDOUT streams, you should reopen them. You may need to close or reopen the STDERR filehandle too; it's opened to append to the error_log file, as inherited from the parent, so chances are that you will want to leave it untouched.

Under mod_perl, the spawned process also inherits a file descriptor that's tied to the socket through which all communication between the server and the client occurs. Therefore, we need to free this stream in the forked process. If we don't, then the server cannot be restarted while the spawned process is still running. If an attempt is made to restart the server, then you will get the following error:

  [Mon Dec 11 19:04:13 2000] [crit] 
  (98)Address already in use: make_sock:
    could not bind to address port 8000

Apache::SubProcess comes to our aid and provides a method cleanup_for_exec(), which takes care of closing this file descriptor.

So the simplest way to free the parent process is to close all three STD* streams, if we don't need them, and untie the Apache socket. In addition, you may want to change the process' current directory to / so the forked process won't keep the mounted partition busy, in case it is to be unmounted later. To summarize all these issues, here is an example of a fork that takes care of freeing the parent process:

  use Apache::SubProcess;
  my $r = shift;
  defined (my $kid = fork) or die "Cannot fork: $!\n";
  if ($kid) {
    # Parent runs this block
  } else {
    # Child runs this block
    $r->cleanup_for_exec(); # untie the socket
    chdir '/' or die "Can't chdir to /: $!";
    close STDIN;
    close STDOUT;
    close STDERR;
    # some code comes here
    CORE::exit(0); # terminate the child
  }
  # possibly more code here, usually run by the parent

Of course, the real code should be placed between the freeing-parent code and the child-process termination.

Detaching the Forked Process

Now, what happens if the forked process is running and we decide that we need to restart the Web server? The forked process will be aborted, since when the parent process dies during the restart, it kills its child processes as well. To avoid this, we need to detach the process from its parent session by opening a new session, with the help of the setsid() system call provided by the POSIX module:

  use POSIX 'setsid';
  defined (my $kid = fork) or die "Cannot fork: $!\n";
  if ($kid) {
    # Parent runs this block
  } else {
    # Child runs this block
    setsid or die "Can't start a new session: $!";
    # some code comes here
    CORE::exit(0); # terminate the child
  }

Now the spawned child process has a life of its own, and it doesn't depend on the parent any longer.
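You can observe the new session directly: setsid() returns the ID of the newly created session, which by definition equals the PID of the calling process. A minimal, self-contained sketch (outside mod_perl):

```perl
use strict;
use warnings;
use POSIX qw(setsid);

# setsid() detaches the child into a brand-new session; the new session ID
# equals the caller's PID, so the child no longer belongs to the parent's session.
pipe(my $reader, my $writer) or die "pipe: $!";

defined(my $kid = fork) or die "Cannot fork: $!";
if ($kid == 0) {
    close $reader;
    my $sid = setsid();          # a freshly forked child is never a group leader,
                                 # so this always succeeds
    print {$writer} "$sid\n";
    close $writer;
    POSIX::_exit(0);
}

close $writer;
my $sid = <$reader>;
chomp $sid;
waitpid($kid, 0);
print "child's new session ID: $sid (child PID was $kid)\n";
```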

Avoiding Zombie Processes

Now let's talk about zombie processes.

Normally, every process has a parent. Many processes are children of the init process, whose PID is 1. When you fork a process, you must wait() or waitpid() for it to finish. If you don't, then it becomes a zombie.

A zombie is a process that has terminated but whose exit status has not been collected. When the child quits, it reports its termination to its parent. If no parent wait()s to collect the exit status of the child, then the child is left in the process table as a "defunct" ghost process. It can be seen as a process, but not killed; it goes away only when the parent process that spawned it is stopped, at which point init adopts and reaps it.

Generally, the ps(1) utility displays these processes with a <defunct> tag, and you will see the zombie counter increment in top(1). Zombie processes take up slots in the process table and are generally undesirable.

So the proper way to do a fork is:

  my $r = shift;
  defined (my $kid = fork) or die "Cannot fork: $!";
  if ($kid) {
    waitpid($kid, 0); # wait for the child and collect its exit status
    print "Parent has finished\n";
  } else {
    # do something
    CORE::exit(0);
  }

In most cases, the only reason you would want to fork is when you need to spawn a process that will take a long time to complete. So if the Apache process that spawns this new child process has to wait for it to finish, then you have gained nothing. You can neither wait for its completion (because you don't have the time), nor simply continue, because then you would get yet another zombie process. Waiting like this is called a blocking call, since the process is blocked from doing anything else until the call completes.

The simplest solution is to ignore your dead children. Just add this line before the fork() call:

  $SIG{CHLD} = 'IGNORE';
When you set the CHLD (SIGCHLD in C) signal handler to 'IGNORE', exited child processes are reaped automatically and are therefore prevented from becoming zombies. This doesn't work everywhere, however; it is proven to work at least on Linux.
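
You can verify the Linux behavior with a minimal, self-contained sketch: after setting the handler to 'IGNORE', there is nothing left for waitpid() to collect once the child has exited, so it returns -1:

```perl
use strict;
use warnings;

# On Linux, with $SIG{CHLD} set to 'IGNORE', exited children are reaped
# automatically, so waitpid() finds no child to collect.
$SIG{CHLD} = 'IGNORE';

defined(my $kid = fork) or die "Cannot fork: $!";
exit 0 if $kid == 0;        # child quits immediately

sleep 1;                    # give the child time to exit and be auto-reaped
my $reaped = waitpid($kid, 0);
print "waitpid returned $reaped\n";   # -1: no such child left to reap
```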

Note that you cannot localize this setting with local(). If you do, then it won't have the desired effect.

So now the code would look like this:

  my $r = shift;
  $SIG{CHLD} = 'IGNORE';
  defined (my $kid = fork) or die "Cannot fork: $!\n";
  if ($kid) {
    print "Parent has finished\n";
  } else {
    # do something time-consuming
    CORE::exit(0);
  }

Note that waitpid() call is gone. The $SIG{CHLD} = 'IGNORE'; statement protects us from zombies, as explained above.

Another solution, more portable but slightly more expensive, is to use the double fork approach:

  my $r = shift;
  defined (my $kid = fork) or die "Cannot fork: $!\n";
  if ($kid) {
    waitpid($kid, 0); # this returns almost at once
  } else {
    defined (my $grandkid = fork) or die "Kid cannot fork: $!\n";
    if ($grandkid) {
      CORE::exit(0); # the intermediate child quits immediately
    } else {
      # code here
      # do something long lasting
      CORE::exit(0);
    }
  }
$grandkid becomes a "child of init", i.e. the child of the process whose PID is 1.
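
The re-parenting is observable: once the intermediate process exits, the grandchild's getppid() no longer returns the intermediate PID. A minimal, self-contained sketch (the grandchild reports its new parent through a pipe):

```perl
use strict;
use warnings;
use POSIX ();

# Double fork: the intermediate process exits at once, so the grandchild is
# re-parented (normally to init, PID 1), and the real parent reaps only the
# short-lived intermediate child.
pipe(my $reader, my $writer) or die "pipe: $!";

defined(my $kid = fork) or die "Cannot fork: $!";
if ($kid == 0) {
    defined(my $grandkid = fork) or die "Kid cannot fork: $!";
    if ($grandkid == 0) {
        close $reader;
        sleep 1;                           # let the intermediate process die first
        print {$writer} getppid(), "\n";   # our parent after re-parenting
        close $writer;
        POSIX::_exit(0);
    }
    POSIX::_exit(0);                       # intermediate child exits immediately
}

close $writer;
waitpid($kid, 0);                          # reap only the intermediate child
my $new_parent = <$reader>;
chomp $new_parent;
print "grandchild's parent is now PID $new_parent\n";
```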

Note that the previous two solutions do not let you collect the exit status of the long-running process; but in my example, I didn't care about it.

Another solution is to use a different SIGCHLD handler:

  use POSIX 'WNOHANG';

  $SIG{CHLD} = sub { while ( waitpid(-1, WNOHANG) > 0 ) {} };

This is useful when you fork() more than one process. The handler could call wait() as well, but for a variety of reasons involving the handling of stopped processes and the rare event when two children exit at nearly the same moment, the best technique is to call waitpid() in a tight loop with a first argument of -1 and a second argument of WNOHANG. Together, these arguments tell waitpid() to reap the next child that's available, and prevent the call from blocking if there happens to be no child ready for reaping. The handler will loop until waitpid() returns a negative number or zero, indicating that no additional reapable children remain.
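
Here is a minimal, self-contained sketch of that reaping loop in action: three children exit at roughly the same time, and the handler collects them all without blocking (the counter and timeout are illustrative additions, not part of the canonical idiom):

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Reap every exited child without blocking: loop waitpid(-1, WNOHANG)
# until it stops returning positive PIDs.
my $reaped = 0;
$SIG{CHLD} = sub { $reaped++ while waitpid(-1, WNOHANG) > 0 };

for (1 .. 3) {
    defined(my $kid = fork) or die "Cannot fork: $!";
    exit 0 if $kid == 0;     # each child quits immediately
}

# Wait (with a timeout) until the handler has collected all three children;
# select() may return early when a signal arrives, hence the loop.
my $deadline = time + 10;
select(undef, undef, undef, 0.1) while $reaped < 3 && time < $deadline;
print "reaped $reaped children\n";
```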

While you test and debug code that uses one of the above examples, you might want to write some debug information to the error_log file so that you know what's happening.

Read perlipc manpage for more information about signal handlers.

A Complete Fork Example

Now let's put all the bits of code together and show well-written fork code that solves all the problems discussed so far. I will use an Apache::Registry script for this purpose:

  use strict;
  use POSIX 'setsid';
  use Apache::SubProcess;
  my $r = shift;
  $r->send_http_header("text/plain");
  $SIG{CHLD} = 'IGNORE';
  defined (my $kid = fork) or die "Cannot fork: $!\n";
  if ($kid) {
    print "Parent $$ has finished, kid's PID: $kid\n";
  } else {
      $r->cleanup_for_exec(); # untie the socket
      chdir '/'                or die "Can't chdir to /: $!";
      open STDIN, '/dev/null'  or die "Can't read /dev/null: $!";
      open STDOUT, '>/dev/null'
          or die "Can't write to /dev/null: $!";
      open STDERR, '>/tmp/log' or die "Can't write to /tmp/log: $!";
      setsid or die "Can't start a new session: $!";
      select STDERR;
      local $| = 1;
      warn "started\n";
      # do something time-consuming
      sleep 1, warn "$_\n" for 1..20;
      warn "completed\n";
      CORE::exit(0); # terminate the process
  }
The script starts with the usual declaration of strict mode, loading the POSIX and Apache::SubProcess modules, and importing the setsid() symbol from the POSIX package.

The HTTP header is sent next, with a Content-type of text/plain. The script then sets the CHLD signal handler to 'IGNORE', to avoid zombies, and fork() is called.

The program's personality is split after the fork: the if conditional evaluates to a true value for the parent process and to a false value for the child process, so the first block is executed by the parent and the second by the child.

The parent process announces its own PID and the PID of the spawned process, and finishes its block. Any code outside the blocks would be executed by the parent as well.

The child process starts its code by disconnecting from the socket, changing its current directory to /, and opening the STDIN and STDOUT streams to /dev/null, which in effect closes them both before reopening them. In fact, in this example we don't need either of them, so I could just close() them both. The child process completes its disengagement from the parent by opening the STDERR stream to /tmp/log, so that it can write there, and by creating a new session with the help of setsid(). Now the child process has nothing to do with the parent process and can do the actual work it has to do. In our example, it performs a simple series of warnings, which are logged to /tmp/log:

      select STDERR;
      local $|=1;
      warn "started\n";
      # do something time-consuming
      sleep 1, warn "$_\n" for 1..20;
      warn "completed\n";

The localized setting of $|=1 is there so that we can see the output generated by the program immediately. In fact, it's not required here, since the output is generated by warn(), which is unbuffered.

Finally, the child process terminates by calling:

      CORE::exit(0); # terminate the process

which makes sure that it won't fall out of the block and run code that it's not supposed to run.

This code example will let you verify that the spawned child process really does have a life of its own, and that its parent is free as well. Simply issue a request that runs this script, watch the warnings start to be written to the /tmp/log file, and then issue a complete server stop and start. If everything is correct, then the server will restart successfully and the long-term process will still be running; you will know it's still running because the warnings will still be printed to /tmp/log. You may need to raise the number of warnings above 20, to make sure that you don't miss the end of the run.

If there are only five warnings to be printed, then you should see the following output in this file:

  started
  1
  2
  3
  4
  5
  completed
Starting a Long-Running External Program

But what happens if we cannot just run Perl code from the spawned process, because we have a compiled utility, i.e., a program written in C? Or a Perl program that cannot easily be converted into a module and called as a function? In those cases, we have to use system(), exec(), qx() or `` (backticks) to start it.

When using any of these methods with Taint mode enabled, we must at least untaint the PATH environment variable and delete a few other insecure environment variables, as follows. This information can be found in the perlsec manpage.

  $ENV{'PATH'} = '/bin:/usr/bin';
  delete @ENV{'IFS', 'CDPATH', 'ENV', 'BASH_ENV'};

Now all we have to do is to reuse the code from the previous section.

First, we move the core of the program into the file external.pl, add the shebang line so the program will be executed by Perl, tell it to run under Taint mode (-T), possibly enable warnings (-w), and make it executable:

  #!/usr/bin/perl -Tw
  open STDIN, '/dev/null'  or die "Can't read /dev/null: $!";
  open STDOUT, '>/dev/null'
      or die "Can't write to /dev/null: $!";
  open STDERR, '>/tmp/log' or die "Can't write to /tmp/log: $!";
  select STDERR;
  local $|=1;
  warn "started\n";
  # do something time-consuming
  sleep 1, warn "$_\n" for 1..20;
  warn "completed\n";

Now we replace the code that moved into the external program with exec() to call it:

  use strict;
  use POSIX 'setsid';
  use Apache::SubProcess;
  $ENV{'PATH'} = '/bin:/usr/bin';
  delete @ENV{'IFS', 'CDPATH', 'ENV', 'BASH_ENV'};
  my $r = shift;
  defined (my $kid = fork) or die "Cannot fork: $!\n";
  if ($kid) {
    print "Parent has finished, kid's PID: $kid\n";
  } else {
      $r->cleanup_for_exec(); # untie the socket
      chdir '/'                or die "Can't chdir to /: $!";
      open STDIN, '/dev/null'  or die "Can't read /dev/null: $!";
      open STDOUT, '>/dev/null'
          or die "Can't write to /dev/null: $!";
      open STDERR, '>&STDOUT'  or die "Can't dup stdout: $!";
      setsid or die "Can't start a new session: $!";
      exec "/home/httpd/perl/external.pl" or die "Cannot execute exec: $!";
  }

Notice that exec() never returns unless it fails to start the process. Therefore, you shouldn't put any code after exec(): it will not be executed if the call succeeds. Use system() or backticks instead if you want to continue doing other things in the process. But then you will probably want to terminate the process after the program has finished, so you will have to write:

      system("/home/httpd/perl/external.pl") == 0
          or die "Cannot execute system: $?";
      CORE::exit(0);

(Note that system() returns 0 on success, so checking it with a plain "or die" would die exactly when the call succeeds.)

Another important nuance is that we have to close all STD* streams in the forked process, even if the called program does that.

If the external program is written in Perl, then you can pass complicated data structures to it by serializing the Perl data on one side and restoring it on the other. The Storable and FreezeThaw modules come in handy here. Let's say that we have a program master.pl calling a program slave.pl:

  # master.pl: we are within the mod_perl code
  use Storable ();
  my @params = (foo => 1, bar => 2);
  my $params = Storable::freeze(\@params);
  exec "./slave.pl", $params or die "Cannot execute exec: $!";

And here is slave.pl:

  #!/usr/bin/perl -w
  use Storable ();
  my @params = @ARGV ? @{ Storable::thaw(shift)||[] } : ();
  # do something

As you can see, master.pl serializes the @params data structure with Storable::freeze and passes it to slave.pl as a single argument. slave.pl restores it with Storable::thaw, by shifting the first value of the ARGV array if available. The FreezeThaw module does a similar thing.
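
The freeze/thaw round-trip itself is easy to try in a single process; this minimal sketch shows the mechanism without the exec() step. One caveat worth knowing: the frozen string is binary and may contain NUL bytes, so when passing it on a command line you may prefer to encode it (for instance with MIME::Base64) or send it over a pipe instead.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# freeze() turns a data structure into a byte string; thaw() restores it.
# The same string could be handed to another program over a pipe or file.
my @params = (foo => 1, bar => 2);
my $frozen = freeze(\@params);             # a binary string
my @restored = @{ thaw($frozen) || [] };   # back to the original list

print "restored: @restored\n";
```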

Starting a Short-Running External Program

Sometimes you need to call an external program and cannot continue until it completes its run and, optionally, returns a result. In this case, the fork solution doesn't help, but we have a few other ways to execute the program. The first is system():

  system "perl -e 'print 5+5'";

Of course, you would never call the Perl interpreter just to do this simple calculation, but for the sake of a simple example it's good enough.

The problem with this approach is that we cannot get at the result printed to STDOUT; that's where backticks or qx() help. If you use either:

  my $result = `perl -e 'print 5+5'`;

or:

  my $result = qx{perl -e 'print 5+5'};

then the whole output of the external program will be stored in the $result variable.

Of course, you can use other solutions, such as opening a pipe to the program (open() with the | mode) if you need to submit many arguments, or more advanced solutions provided by other Perl modules, such as IPC::Open2, which allows you to open a process for both reading and writing.
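
A minimal, self-contained sketch of IPC::Open2: the child process here is just a perl one-liner that upper-cases whatever it reads, standing in for any filter you might talk to:

```perl
use strict;
use warnings;
use IPC::Open2;

# open2() gives us both the child's STDOUT (to read) and STDIN (to write).
my $pid = open2(my $from_child, my $to_child,
                'perl', '-pe', '$_ = uc');

print {$to_child} "hello mod_perl\n";
close $to_child;                 # send EOF so the child can finish

my $reply = <$from_child>;
waitpid($pid, 0);                # reap the child
print $reply;
```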

Executing system() or exec() in the Right Way

The exec() and system() calls behave identically in the way they spawn a program. As an example, let's use system(). Consider the following code:

  system("echo", "Hi");

Perl will use the first argument as the program to execute, find /bin/echo along the search path, invoke it directly, and pass the Hi string as an argument.

Perl's system() is not the system(3) call from the C library. This is how the arguments to system() get interpreted: when there is a single argument, it is checked for shell metacharacters first (like * and ?), and if any are found, the Perl interpreter invokes a real shell program (/bin/sh -c on Unix platforms); otherwise the string is split into words and passed to the C-level execvp() system call. If you pass a list of arguments to system(), then they are not checked for metacharacters at all, but passed directly to execvp(), which is more efficient. That's a very nice optimization. In other words, only if you do:

  system "sh -c 'echo *'"

will the operating system actually exec() a copy of /bin/sh to parse your command. Even then, since sh is almost certainly already running somewhere, the system will notice that (via the disk inode reference) and map the already-loaded program text into the new process rather than reading it in again, so even this does not create much overhead.
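
The difference is easy to observe: a single string containing a metacharacter goes through the shell (which expands the glob), while the list form reaches the program directly, metacharacter intact. A minimal, self-contained sketch using a temporary directory with exactly one file in it:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Set up a directory containing exactly one file, so '*' has a known expansion.
my $dir = tempdir(CLEANUP => 1);
chdir $dir or die "chdir: $!";
open my $fh, '>', 'only-file.txt' or die "open: $!";
close $fh;

# Backticks hand the single string to /bin/sh, which globs the '*'.
my $via_shell = `echo *`;
chomp $via_shell;

# The list form of a piped open bypasses the shell: echo sees a literal '*'.
open my $direct, '-|', 'echo', '*' or die "open: $!";
my $literal = <$direct>;
close $direct;
chomp $literal;

print "shell: $via_shell, direct: $literal\n";
```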

