August 2001 Archives

This Week on Perl 6 (19 - 25 August 2001)

Notes

This Week in Perl 6 News

Closures

Method Signatures

Foo::$bar

Perl 6 Internals (Yes, There Are Some...)

More Modules

Last Words

You can subscribe to an email version of this summary by sending an empty message to perl6-digest-subscribe@netthink.co.uk.

Please send corrections and additions to bwarnock@capita.com.

Another quiet week, with only 52 messages. 22 authors contributed to 10 threads.

Closures

(17 posts) Dave Mitchell started a debate on whether closures should be explicitly declared, by way of having lexical variables not implicitly import into nested blocks. Discussion centered on whether named subs are closures, whether closures are ever created accidentally, and whether the current behavior is correct.

Dave further provided some insight into how closures actually work, to help explain his point:

With the $x:

foo() is a closure created at compile time. By the time the main {} block has been executed (but before foo() is called), the $outer:x is undef, and $foo:x is 'bar' (standard closure stuff). When foo() is executed, the anon sub is cloned, and at that time, $anon:x is set from foo's pad, so it gets 'bar'.

Without the $x:

foo is no longer a closure - i.e., it doesn't have a private copy of $x in its pad. At cloning time, sub {$x} picks up its value of $x from $outer:x, since there isn't a $x in foo's pad - thus it picks up the undef from the $outer:x that went out of scope a while ago.
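
Dave's explanation refers to code along these lines. (This is a reconstruction of the sort of example under discussion, not the actual code from the thread, and the precise behavior is version-dependent - which was rather the point.)

    {
        my $x = 'bar';            # "$outer:x" in Dave's notation
        sub foo {
            $x;                   # the "$x" in question; delete this line for case two
            return sub { $x };    # anonymous sub, cloned each time foo() runs
        }
    }
    print foo()->();              # 'bar' with the $x line, undef without it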

Method Signatures

(12 posts) Damian Conway answered last week's question on whether subroutine signatures will apply to methods in Perl 6. There was some subsequent debate on how strict Perl's optional typing would need to be, and how easy it would still be to circumvent through regular Perl magic.

Foo::$bar

(2 posts) Brent Dax asked:

I was thinking about Perl 6 today, and thought of something: if the sigil is now part of a variable's name, does that mean that $Foo::bar should actually be Foo::$bar in Perl 6?

Michael Schwern's thinking was:

Technically 'bar' is shorthand for the complete name, 'Foo::bar'. So '$Foo::bar' would remain.

Besides, Foo::$bar looks funny.

Perl 6 Internals (Yes, There Are Some...)

(5 posts) Dan Sugalski announced that he's got code:

I've got the rudiments of the parrot interpreter and assembler built and running. (I get around 23M ops/sec on a 700MHz Alpha EV6) I'm beating it up enough to get it into a reasonably released state, so while I'm doing that...

Simon Cozens made this suggestion:

On an unrelated note, and seeing Dan [and] Bryan's experiments with different kinds of switch/dispatch, I think it makes sense to separate out ops which correspond to PMC vtable functions (add, subtract, etc.) and those which don't. Those which do can be done with a switch to save a function call, and those which don't can use function pointers. This achieves the same objective as auto-generating op wrappers around vtable functions, (saving one level of indirection) while leveraging the gain from a split-level op despatch loop.

More Modules

(7 posts) John Siracusa continued the discussion on the Perl 6 module plan, calling for more conformity in APIs and a deeper namespace hierarchy.

Kirrily Robert pointed to some work she's been doing on the perl5-porters list - perlmodstyle - in preparation for Perl 6.

Last Words

We should be seeing Apocalypse 3 (Larry Wall) and Exegesis 3 (Damian Conway) sometime this week, if things are on schedule. Nathan Torkington is currently evaluating SourceForge as the Perl 6 code repository.

I'd like to keep the code on Sourceforge from the get-go. I don't have much experience with Sourceforge, though, and would like to talk to someone who has. Which bits work well? Which bits aren't worth the effort? Any tips or tricks to pass on?

If you've got some opinions, pass them on to Nat.


Bryan C. Warnock

This Week on p5p 2001/08/27

Notes

This Week on P5P

$] and v-strings

Callbacks in the core

CvMETHOD and ->can()

Coderefs in @INC (again)

Malloc Madness

Various

Please send corrections and additions to perl-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year and month. Changes and additions to the perl5-porters biographies are particularly welcome.

$] and v-strings

John Peacock sought to make vstrings and ordinary strings that look like version numbers compare equal; this led to some problems, and a lot of hasty and misunderstood explanations. The nub of the problem seemed to be that there is essentially no semantic difference between a vstring and an ordinary string, as was angrily pointed out by... uh, someone or other:


    VSTRINGS ARE REALLY SIMPLE.

    vx.y.z == join '', map ord, x, y, z

    THAT'S ALL.

    NOTHING ELSE.

(Of course, I meant chr instead of ord. Oh, the embarrassment.)

John's self-stated "overriding goal" is laudable: "make sure that all comparisons of versions happen in a consistent fashion". However, I'm far from convinced that this is possible. My beef with it is that v49.46.49, under the above definition, looks rather a lot like "1.1" to Perl, if not to you. (John disagreed with this, but that's OK; he wants to catch it while the tokeniser still sees it in the v... form.)
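
A quick demonstration (assuming a perl with v-string support, 5.6 or later):

    # chr(49) is '1' and chr(46) is '.', so v49.46.49 is the string "1.1"
    print "indistinguishable\n" if v49.46.49 eq "1.1";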

Should v49.46.49 compare equal to "49.46.49", "1.1", or both? Or neither? Which would be consistent? Does it, in the end, matter?

Urgh.

Callbacks in the core

David Lloyd had three goes at getting callback support into core; he's still trying. His Async::Callback module, which was previously featured here and is still really great, plugs in a new runops routine to Perl to allow other routines to be run as a callback after every operation. However, to avoid having to change it every time Perl changed, he wanted the ability to add custom callbacks to runops to be the default in core. A nice idea, but after a couple of iterations of patches, he still had a slowdown of 7% on programs which didn't even use any callbacks. This was obviously unacceptable (especially since a mere three modules - Async::Callback being one of them - use the pluggable runops functionality), but he's going to have another go at speeding it up.

Nicholas Clark suggested an alternative: have several runops functions - a standard one, a signal-aware one, and so on - and have them plugged in when they're called for. Whenever someone touched %SIG, the signal-aware loop would be substituted in, and so on. The only problem is that a runops routine does not return until the end of the program - but I have some ideas to fix that...

CvMETHOD and ->can()

Geoffrey Young asked whether or not we should set CvMETHOD on the coderefs returned by can, so that "XS routines can know whether to use perl_call_method or perl_call_sv on it". Artur pointed out that this was a bit of a waste of time:

When you have a coderef, why not just call perl_call_sv and push the object as the first argument? Since can already found the correct method, we don't need to search again using call_method.

There was some to-and-fro before Nick Ing-Simmons came along and told us what CvMETHOD was really for:

CvMETHOD corresponds to the :method attribute you can give to subs. Its original use was for Malcolm's original thread locking scheme. A sub which had :locked and :method locked the object; one which just had :locked locked the CV itself.

One could make a case that ->can should fail unless the sub had the :method attribute - but I won't 'cos lots of OO perl code (including mine) does not annotate methods.

Oh, and speaking of can, Tony Bowden added some tests and documentation for a confusing property. In his words:

can cannot know whether an object will be able to provide a method through AUTOLOAD, so a return value of undef does not necessarily mean the object will not be able to handle the method call. To get around this some module authors use a forward declaration for methods they will handle via AUTOLOAD. For such 'dummy' subs, can will still return a code reference, which, when called, will fall through to the AUTOLOAD. If no suitable AUTOLOAD is provided, calling the coderef will cause an error.

Coderefs in @INC (again)

Rafael Garcia-Suarez is this week's hero, for documenting this feature which has been long dormant. (Though I've often called for people to document it.)

Here's the original description from the digest; Nicholas Clark points to the original discussion of it here. Rafael's documentation is here.
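
In brief, the feature lets you put a subroutine reference in @INC and have it supply module source itself. A minimal sketch (the module name is invented, and the in-memory filehandle needs a sufficiently recent perl):

    # an @INC hook is called as $hook->($hook, $filename) and may return
    # a filehandle from which perl will read the module's source
    push @INC, sub {
        my ($hook, $filename) = @_;
        return unless $filename eq 'Local/Generated.pm';
        my $source = "package Local::Generated; sub greet { 'hello' } 1;\n";
        open my $fh, '<', \$source or return;
        return $fh;
    };

    require Local::Generated;
    print Local::Generated::greet(), "\n";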

Malloc Madness

Doug Wilson found an absolutely horrific problem in New which can't be adequately resolved without a speed hit, as far as I can see.

You'll like this.

Assume I say something like


    $#var1 = 2_147_483_647;

Now, internally, Perl allocates space for that many SVs in the array by making a call to New, something like this:


    New(0, my_array, 2147483647, SV*)

New is implemented like this:


    #define New(x,v,n,t)    (v = (t*)safemalloc((MEM_SIZE)((n)*sizeof(t))))

Now, what happens if n is very large? You guessed it, the multiplication overflows, and we end up with a small number of bytes being allocated - but the allocation doesn't fail, so we think we've got the lot. The result is spectacular.
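
To put numbers on it, assuming a 32-bit MEM_SIZE and 4-byte pointers:

    # $#var1 = 2_147_483_647 asks for 2**31 SVs once element 0 is counted;
    # 2**31 * 4 bytes is 2**33, which is 0 modulo 2**32
    printf "%d bytes requested\n", (2**31 * 4) % 2**32;    # prints 0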

Nobody said anything about Doug's find or his patch.

Various

Artur copied out this handy discovery from "Programming with POSIX Threads".

Jeff Okamoto, a rather excellent HP hacker, announced that unfortunately he's had to stop working on Perl's IPv6 support because he's been laid off and needs a new job. I wonder if anyone would like to do something about that.

Robin Houston announced his wonderful Want module, which can only be described as wantarray on steroids. Check it out.

Paul Kulchenko dropped in a module which uses regular expressions to parse XML, and I defy anyone to tell him he shouldn't do that. But look, we've now got a pure-Perl XML parser! Life is good!

MJD reported from comp.lang.perl.misc that someone had found a bug in Carp; if overloaded objects were thrown, and the resulting stringification called carp, a rather nasty recursion would ensue. Michael Schwern pointed out that this has already been fixed.

Nicholas Clark asked whether or not BOOT: was threadsafe - would two interpreter threads both try loading a dynamic module? Artur didn't know, but hoped not. Nicholas also provided a massive patch to division to make it preserve IVs if possible, and some tests for binmode, which Michael Schwern subverted into yet another Test::Simple test.

I threw in a patch to provide support for adding your own ops to Perl at runtime. Now I've got to go write the supporting modules for it, so until next week I remain, your humble and obedient servant,


Simon Cozens

Perl Helps The Disabled

As part of Mark-Jason Dominus's Lightning Talks at the 2001 O'Reilly Open Source Convention, Jon Bjornstad gave a talk about a Perl/Tk program he wrote to help a mute quadriplegic friend, Sue Simpson, speak and better use her computer. Jon's talk was warmly received, not only for his clever use of Perl, but for a remarkably unselfish application of his skills.

Q: Who are you, and what do you do?

A: I'm a Perl programmer, currently employed by a company called Sesame Technology, a small consulting firm in Scotts Valley near Santa Cruz. I'm 51 years old, so I've been around. My first language was Fortran IV. I've programmed in C on UNIX, done lots of FoxPro for UNIX, various database applications, as well as several large Perl apps.

Q: What got you into Perl?

A: About five years ago, a community project I was involved with wanted to create a Web store. The product I found to do the job was written in Perl, so I decided I had better learn the language. I went out and bought "Learning Perl," then in its first edition. It was right up my alley. I absorbed that book in two days, and knew that was clearly where I wanted to be. It just made a lot of sense, and was a very easy transition. I haven't written a C program since!

Sue at home

Q: Your lightning talk was about a program you wrote for a friend. Tell us about that friend.

A: I met Sue in 1986 through a request on a mailing list. The sender was seeking help installing some software for a woman described as a mute quadriplegic, paralyzed from the neck down. The request piqued my interest, and it has led to a long-standing friendship. For years I helped with configuring and programming a device called the Express 3, a rectangular array of LEDs that she could point to with a light pen attached to her glasses. That system is now obsolete; newer devices are beyond her family's means at present, and the other choices available didn't really match her needs.

Q: Tell us about the program you wrote for her.

Screenshot of Word Prediction Software
Word Prediction Software in action
 
Screenshot of Keyboard Software
The Keyboard Interface

A: There are commercial programs that allow the user to point to letters on an on-screen keyboard, spell out phrases and so on. At first, for fun, I started writing a program with Perl/Tk for Sue that performed some of the same functions. Over time, the program has evolved to be more of a total environment rather than just a keyboard and mouse.

The base functionality includes an on-screen keyboard that lets her choose letters, words and phrases by pointing to them. Using word prediction, I have the words and phrases she has previously used displayed in a list as she types, so that she can select a word without typing it out completely. She can either type a prefix and then select from a list of matching words, or use abbreviations that get expanded to the full word.

Since she cannot click a mouse button, it was important that the selection of letters and words not require any clicking. I implemented this with a timer that triggers the button or word when the pointer pauses over that element.

I added a text-to-speech synthesizer from Microsoft that is freely downloadable and works well with Perl. From Project Gutenberg we downloaded some complete public-domain books and created a reader interface with which she can select chapters and sections. She can bookmark locations and return to them later.

Sue also wanted to be able to view photos, so I added a photo album. She can load in photos of her family and friends and browse them at her leisure. I also allowed her to redirect the text input to a file (like ``tee'' on Unix). She can even customize all the display colors.
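
The dwell-timer mechanism Jon describes is simple enough to sketch in a few lines of Perl/Tk. (This is an illustration of the idea rather than his actual code; the dwell time is invented.)

  use Tk;

  my $DWELL_MS = 800;    # hypothetical dwell time before a hover counts as a click
  my $mw  = MainWindow->new;
  my $btn = $mw->Button(-text => 'A', -command => sub { print "selected A\n" });
  $btn->pack;

  my $pending;
  $btn->bind('<Enter>' => sub {
      $pending = $btn->after($DWELL_MS, sub { $btn->invoke });
  });
  $btn->bind('<Leave>' => sub {
      $pending->cancel if $pending;    # moving away cancels the pending selection
  });

  MainLoop;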

Q: What was the most difficult part of the program?

Screenshot of Dictionary
Looking up a word using the Dictionary
 
Screenshot of the Reader
The Reader Display

A: I included a dictionary that is referenced with the texts she reads. She can pause over any word in the text and have that word looked up in the dictionary, giving quick access to the definition. The tricky part was allowing the selection to happen by hovering over the word, and determining what that word is in the text. Nancy Walsh probably didn't know people would actually use these types of details from her book, "Learning Perl/Tk." I did! Tk is a wonderful thing.

Q: How long did the base functionality take you?

A: About a week. The dictionary itself took about another week. I had to massage the dictionary into DBM files to allow for quicker access, and the hover selection I described took some time. And then there were the usual endless fiddlings and polishings that took months!

Q: Why did you choose Perl for the program?

A: That was a natural choice. I know Perl best, and Perl's strengths were a perfect fit: extensive text processing, quick creation of interfaces with Tk and so on. The short time it took to create the program is evidence that my decision was a good one.

There's another project under way that I've noticed, called the Hawking Communicator. They are creating a program for Stephen Hawking that includes some similar functionality. They assume, though, that the user has the ability to right- or left-click. Apparently Stephen Hawking has only the ability to click - he can move only two fingers with control. He cannot control the mouse in two dimensions as Sue can, so he uses a technique called "switch scanning".

There are lots of other programs similar to mine. None of them is perfect for everyone, so there is still room for fun projects like the one I started.

Q: Are you still adding features to the program?

A: It's an ongoing project. One thing I want to add is the ability for Sue to control her lights and television through an X10 interface similar to the Misterhouse software. This probably wouldn't be too difficult with the X10 modules.

It would also be nice if she had a way to tidy things up: clean up her files, delete words and phrases from the word prediction lists, and control more aspects of the whole environment. Sue is a very sharp lady, and has high standards with regard to order, punctuation and so on.

Q: Do you plan to make the program available?

A: Definitely. I'm going to have it downloadable from my Web site soon, with some detailed instructions on how to install it. I've hoped all along that other people could benefit from the program.

Q: Many attendees found your talk inspiring. What are some ways that you see for the Perl community and other programmers to use their abilities for the good of others?

A: I've been involved in quite a few volunteer projects. There's a retreat center near me named Mount Madonna Center, for which I've written programs to help them organize their activities, registration and so on. They have a private school there as well, for which I wrote a complete school administration program.

I think people just need to be aware of the needs of others. It is quite rewarding to write a program that directly benefits someone. But volunteers are needed everywhere. There are needs for people with computer expertise in most organizations, if not programming expertise. Just being present as a person with those skills is helpful.

Q: After your five-minute talk, the audience roared in applause. Many people are still talking about it as one of their favorite talks of the conference. Has any of this surprised you?

A: There were lots of talks that took much more effort than mine. This one probably touched a human chord that's present in all of us. Programmers are often considered stereotypical nerds who relate only to their computers. But we're human, too! It's nice to have a project that happens to merge the allure of hacking, which we all understand, with the opportunity to meet a human need.

Q: Finally, an off-topic question. What is the best and worst thing about Perl?

A: One of the other lightning talks was by a manager who had been turned back on to programming. He had lots of good things to say about the Perl community, with its attitude of sharing and helpfulness. It is quite infectious, and I'm proud to be part of that. That's the best thing. That manager fellow said that Perl is more like a home than a hotel. That sounded just right to me!

The worst thing about Perl may be its flexibility. Being so flexible has gained Perl a reputation as a hacker's language. A person needs a certain discipline to write good Perl, since style and strictness are not enforced.

Q: Any final words?

A: That's all folks, but I'll continue to work on making that Web site more easily accessed and complete.

This Week on Perl 6 (12 - 18 August 2001)

Notes

You can subscribe to an email version of this summary by sending an empty message to perl6-digest-subscribe@netthink.co.uk.

Please send corrections and additions to bwarnock@capita.com.

There were 44 messages across 10 threads, with 26 authors contributing.

The Modules Plan

(21 posts) The discussion continued with talk about CPAN, namespaces, and implementations.

Perl 6 Internals

(1 post) Simon Cozens gave an update on what he's up to with Perl 6.

The other front is, of course, code. I have started writing some code which sets up vtables, marshals access to an object's methods through its vtable, and helps you write new object types. I'm also trying to develop an integer object type. All it does at the moment is provide the infrastructure that allows you to create a new integer PMC and get and set its value.

(1 post) Uri Guttman suggested that some perl ops should be written in Perl.

(11 posts) Numerous folks continued the discussion on the Coding Conventions PDD.

Perl 6 Language

(2 posts) Michael Schwern asked that implicit @_ passing be removed. Damian replied that it would be, although as a side-effect to some new behavior.

(1 post) Garrett Goebel asked whether subroutine signatures will apply to methods in Perl 6.

(1 post) John Siracusa asked if properties were tempable.

(1 post) I brought up an inconsistency in the visibility of Perl 5's my, our, and local, implicitly asking if it could be changed for Perl 6.

(3 posts) Raptor requested a way to preserve leading white space in Here Docs. Michael Schwern pointed out that this is possible with the new functionality.

Last Words

Quiet week aside, Perl 6 is alive and well. Dan Sugalski and Simon Cozens are finishing some last-minute work that could be the seed for the Perl 6 internals. Larry and Damian are working on the next Apocalypse and Exegesis, respectively. They should be released in about a week.


Bryan C. Warnock

Choosing a Templating System

Introduction

Go on, admit it: You've written a templating system. It's OK, nearly everyone has at some point. You start with something beautifully simple like $HTML =~ s/\$(\w+)/${$1}/g and end up adding conditionals, loops and includes until you've created your own unmaintainable monster.
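
(That first draft really does run, after a fashion. Here is a self-contained version, symbolic references and all:)

  no strict 'refs';                # ${$1} is a symbolic reference
  our $title = 'My Home Page';
  my $HTML   = '<h1>$title</h1>';
  $HTML =~ s/\$(\w+)/${$1}/g;      # swap each $word for the package variable's value
  print $HTML, "\n";               # <h1>My Home Page</h1>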

Luckily for you, you are not the first to think it might be nice to get the HTML out of your code. Many have come before, and more than a few have put their contributions up on CPAN. At this time, there are so many templating modules on CPAN that it's almost certain you can find one that meets your needs. This document aims to be your guide to those modules, leading you down the path to the templating system of your dreams.

And, if you just went straight to CPAN in the first place and never bothered to write your own, congratulations: You're one step ahead of the rest of us.

On a Personal Note

Nothing can start an argument faster on the mod_perl mailing list than a claim that one approach to templating is better than another. People get attached to the tools they've chosen. Therefore, let me say up front that I am biased. I've been at this for a while and I have opinions about what works best. I've tried to present a balanced appraisal of the features of various systems in this document, but it probably won't take you long to figure out what I like. Besides, attempts to be completely unbiased lead to useless documents that don't contain any real information. So take it all with a pound of salt and if you think I've been unfair to a particular tool through a factual error or omission, let me know.

Why Use Templates?

Why bother using templates at all? Print statements and CGI.pm were good enough for grandpa, so why should you bother learning a new way to do things?

Consistency of Appearance

It doesn't take a genius to see that making one navigation bar template and using it in all of your pages is easier to manage than hard-coding it everywhere. If you build your whole site like this, it's much easier to make site-wide changes in the look and feel.

Reusability

Along the same lines, building a set of commonly used components makes it easier to create new pages.

Better Isolation From Changes

Which one changes more often: the logic of your application or the HTML used to display it? Actually the answer doesn't matter, as long as it's one of them. Templates can be a great abstraction layer between the application logic and the display logic, allowing one to be updated without touching the other.

Division of Labor

Separating your Perl code from your HTML means that when your marketing department decides everything should be green instead of blue, you don't have to lift a finger. Just send them to the HTML coder down the hall. It's a beautiful thing, getting out of the HTML business.

Even if the same people in your organization write the Perl code and the HTML, you at last have the opportunity for more people to be working on the project in parallel.

What Are the Differences?

Before we look at the available options, let's go through an explanation of some of the things that make them different.

Execution Models

Although some try to be flexible about it, most templating systems expect you to use some variation of the two basic execution models, which I will refer to as ``pipeline'' and ``callback.'' In the callback style, you let the template take over and it has the application's control flow coded into it. It uses callbacks to modules or snippets of in-line Perl code to retrieve data for display or perform actions such as user authentication. Some popular examples of systems using this model include Mason, Embperl and Apache::ASP.

The pipeline style does all the work up front in a standard CGI or mod_perl handler, then decides which template to run and passes some data to it. The template has no control flow logic in it, just presentation logic, e.g. show this graphic if this item is on sale. Popular systems supporting this approach include HTML::Template and Template Toolkit.
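
In code, a pipeline-style handler might look something like this (a sketch using HTML::Template, which is covered below; the template file and data are invented):

  use HTML::Template;

  # do all the work up front...
  my %data = (name => 'Llama Book', on_sale => 1);

  # ...then choose a template and hand it the data
  my $template = HTML::Template->new(filename => 'product.tmpl');
  $template->param(%data);
  print "Content-Type: text/html\n\n", $template->output;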

The callback model works well for publishing-oriented sites where the pages are essentially mix and match sets of articles and lists. Ideally, a site can be broken down into visual ``components'' or pieces of pages that are general enough for an HTML coder to recombine them into entirely new kinds of pages without any help from a programmer.

The callback model can get a bit hairy when you have to code logic that can result in totally different content being returned. For example, consider a system that processes some form input and takes the user to different pages depending on the data submitted. In these situations, it's easy to end up coding a spaghetti of includes and redirects, or putting what are really multiple pages in the same file.

On the other hand, a callback approach can result in fewer files (if the Perl code is in the HTML file), and feels easier and more intuitive to many developers. It's a simple step from static files to static files with a few in-line snippets of code in them. This is part of why PHP is so popular with new developers.

The pipeline model is more like a traditional model-view-controller design. Working this way can provide additional performance tuning opportunities over an approach where you don't know what data will be needed at the beginning of the request. You can aggregate database queries, make smarter choices about caching, etc. It can also promote a cleaner separation of application logic and presentation. However, this approach takes longer to get started with since it's a bigger conceptual hurdle and always involves at least two files: one for the Perl code and one for the template.

Keep in mind, many systems offer significant flexibility for customizing their execution models. For example, Mason users could write separate components for application logic and display, letting the logic components choose which display component to run after fetching their data. This allows it to be used in a pipeline style. A Template Toolkit application could be written to use a simple generic handler (like the Apache::Template module included in the distribution) with all the application logic placed in the template using object calls or in-line Perl. This would be using it in a callback style.

HTML::Template and some of the AxKit XML processors are fairly rigid about insisting on a pipeline approach. Neither provides methods for calling back into Perl code during the HTML formatting stage; you have to do the work before running the template. The authors of these tools consider this a feature, since it prevents developers from cheating on the separation of application code and presentation.

Languages

Here's the big issue with templating systems. This is the one that always cranks up the flame on Web development mailing lists.

Some systems use in-line Perl statements. They may provide some extra semantics, like Embperl's operators for specifying whether the code's output should be displayed or Mason's <%init> sections for specifying when the code gets run, but at the end of the day your templates are written in Perl.

Other systems provide a specialized mini-language instead of (or in addition to) in-line Perl. These will typically have just enough syntax to handle variable substitution, conditionals and looping. HTML::Template and Template Toolkit are popular systems using this approach. AxKit straddles the fence, providing both a (not-so-) mini-language - XSLT - and an in-line Perl approach - XPathScript.

Here's how a typical discussion of the merits of these approaches might go:

IN-LINE: Mini-languages are stupid. I already know Perl and it's easy enough. Why would you want to use something different?

MINI-LANG: Because my HTML coder doesn't know Perl, and this is easier for him.

IN-LINE: Maybe he should learn some Perl. He'd get paid more.

MINI-LANG: Whatever. You just want to use in-line Perl so you can handle change requests by putting little hacks in the template instead of changing your modules. That's sloppy coding.

IN-LINE: That's efficient coding. I can knock out data editing screens in half the time it takes you, and then I can go back through, putting all the in-line code into modules and just have the templates call them.

MINI-LANG: You could, but you won't.

IN-LINE: Is it chilly up there in that ivory tower?

MINI-LANG: Go write some VBScript, weenie.

etc.

Most people pick a side in this war and stay there. If you are one of the few who hasn't decided yet, you should take a moment to think about who will be building and maintaining your templates, what skills those people have and what will allow them to work most efficiently.

Here's an example of a simple chunk of template using first an in-line style (Apache::ASP in this case) and then a mini-language style (Template Toolkit). This code fetches an object and displays some properties of it. The data structures used are identical in both examples. First Apache::ASP:

  <% my $product = Product->load('sku' => 'bar1234'); %>

  <% if ($product->isbn) { %>
    It's a book!
  <% } else { %>
    It's NOT a book!
  <% } %>

  <% foreach my $item (@{$product->related}) { %>
    You might also enjoy <% $item->name %>.
  <% } %>

And now Template Toolkit:

  [% USE product(sku='bar1234') %]


  [% IF product.isbn %]
    It's a book!
  [% ELSE %]
    It's NOT a book!
  [% END %]

  [% FOREACH item = product.related %]
    You might also enjoy [% item.name %].
  [% END %]

There is a third approach, based on parsing an HTML document into a DOM tree and then manipulating the contents of the nodes. The only module using this approach is HTML_Tree. The idea is similar to using a mini-language, but it doesn't require any non-standard HTML tags and it doesn't embed any logic about loops or conditionals in the template itself. This is nice because it means your templates are valid HTML documents that can be viewed in a browser and worked with in most standard HTML tools. It also means people working with the templates can put placeholder data in them for testing and it will simply be replaced when the template is used. This preview ability only breaks down when you need an if/else type construct in the template. In that situation, both the ``if'' and ``else'' chunks of HTML would show up when previewing.
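
HTML_Tree's own interface isn't shown here, but the general idea, sketched with the more widely known HTML::TreeBuilder, goes like this:

  use HTML::TreeBuilder;

  # the template is a perfectly valid HTML document with placeholder content
  my $tree = HTML::TreeBuilder->new_from_content(
      '<html><body><span id="user">PLACEHOLDER</span></body></html>');

  # find the placeholder node and swap in the live data
  my $span = $tree->look_down(id => 'user');
  $span->delete_content;
  $span->push_content('Bubbles');

  print $tree->as_HTML;
  $tree->delete;    # HTML::Element trees must be freed explicitly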

Parsers and Caching

The parsers for these templating systems are implemented in one of three ways: They parse the template every time (``repeated parse''), they parse it and cache the resulting parse tree (``cached parse tree''), or they parse it, convert it to Perl code and compile it (``compiled'').

Systems that compile templates to Perl take advantage of Perl's powerful run-time code evaluation capabilities. They examine the template, generate a chunk of Perl code from it and eval the generated code. After that, subsequent requests for the template can be handled by running the compiled bytecode in memory. The complexity of the parsing and code generation steps varies based on the number of bells and whistles the system provides beyond straight in-line Perl statements.

Compiling to Perl and then to Perl bytecode is slow on the first hit but provides excellent performance once the template has been compiled, since the template becomes a Perl subroutine call. This is the same approach used by systems like JSP (Java ServerPages). It is most effective in environments with a long-running Perl interpreter, like mod_perl.

HTML::Template, HTML_Tree, and the 2.0 beta release of Embperl all use a cached parse tree approach. They parse templates into their respective internal data structures and then keep the parsed structure for each processed template in memory. This is similar to the compiled Perl approach in terms of performance and memory requirements, but does not actually involve Perl code generation and thus doesn't require an eval step. Which way is faster: caching the parse tree or compiling? It's hard to objectively measure, but anecdotal evidence seems to support compilation. Template Toolkit used a cached parse tree approach for version 1, but switched to a compilation approach for version 2 after tests showed it to offer a significant speed increase. However, as will be discussed later, either approach is more than fast enough.

In contrast to this, a repeated parse approach may sound slow. However, it can be pretty fast if the tokens being parsed for are simple enough. Systems using this approach generally use simple tokens, which allows them to use fast and simple parsers.

Why would you ever use a system with this approach if compilation has better performance? Well, in an environment without a persistent Perl interpreter like vanilla CGI this can actually be faster than a compiled approach since the startup cost is lower. The caching of Perl bytecode done by compilation systems is useless when the Perl interpreter doesn't stick around for more than one request.

There are other reasons, too. Compiled Perl code takes up a lot of memory. If you have many unique templates, they can add up fast. Imagine how much RAM it would take if every page that used server-side includes (SSI) had to stay in memory after it had been accessed. (Don't worry: the Apache::SSI module doesn't use compilation, so it doesn't have this problem.)

Application Frameworks vs. Just Templates

Some of the templating tools try to offer a comprehensive solution to the problems of Web development. Others offer just a templating solution and assume you will fit this together with other modules to build a complete system.

Some common features offered in the frameworks include:

URL Mapping

All of the frameworks offer a way to map a URL to a template file. In addition to simple mappings similar to the handling of static documents, some offer ways to intercept all requests within a certain directory for pre-processing, or to create an object inheritance scheme out of the directory structure of a site.

Session Tracking

Most interactive sites need to use some kind of session tracking to associate application state data with a user. Some tools make this simple by handling all the cookies or URL-munging for you and allowing you simply to read and write from an object or hash that contains the current user's session data. A common approach is to use the Apache::Session module for storage.
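
The usual Apache::Session idiom is a tied hash (a sketch; the storage locations are invented):

  use Apache::Session::File;

  my $id;    # undef creates a new session; otherwise use the id from the cookie
  tie my %session, 'Apache::Session::File', $id,
      { Directory => '/tmp/sessions', LockDirectory => '/tmp/sessions/locks' };

  $session{views}++;                      # reads and writes persist across requests
  my $cookie_id = $session{_session_id};  # send this back to the client in a cookie
  untie %session;                         # flushes the session to storage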

Output Caching

Caching is the key to good performance in many Web systems, and some of these tools provide user-controlled caching of output. This is one of the major features of both Mason and AxKit. AxKit can cache at the page level, while Mason also offers fine-grained caching of components within the page.

Form Handling

How will you live without CGI.pm to parse incoming form data? Many of these tools will do it for you, making it available in a convenient data structure. Some also validate form input, and even provide ``sticky'' form widgets that keep their selected values when re-displayed or set up default values based on data you provide.

Debugging

Everyone knows how painful it can be to debug a CGI script. Templating systems can make it worse, by screwing up Perl's line numbers with generated code. To help fix the problem they've created, some offer built-in debugging support, including extra logging, or integration with the Perl debugger.

If you want to use a system that just does templates but you need some of these other features and don't feel like implementing them yourself, there are some tools on CPAN that provide a framework you can build upon. The libservlet distribution, which provides an interface similar to the Java servlet API, is independent of any particular templating system. Apache::PageKit and CGI::Application are other options in this vein, but both of these are currently tied to HTML::Template. OpenInteract is another framework, this time tied to Template Toolkit. All of these could be adapted for the ``just templates'' module of your choice with fairly minimal effort.

The Contenders

OK, now that you know something about what separates these tools from one another, let's take a look at the top choices for Perl templating systems. This is not an exhaustive list: I've only included systems that are currently maintained, well-documented and have managed to build up a significant user community. In short, I've left out about a dozen less-popular systems. At the end of this section, I'll mention a few systems that aren't as commonly used but may be worth a look.

SSI

SSI is the granddaddy of templating systems, and the first one that many people used since it comes as a standard part of most Web servers. With mod_perl installed, mod_include gains some additional power. Specifically, it is able to take a new #perl directive that allows for in-line subroutine calls. It can also efficiently include the output of Apache::Registry scripts by using the Apache::Include module.

The Apache::SSI module implements the functionality of mod_include entirely in Perl, including the additional #perl directive. The main reasons to use it are to post-process the output of another handler (with Apache::Filter) or to add your own directives. Adding directives is easy through subclassing. You might be tempted to implement a complete template processor in this way, by adding loops and other constructs, but it's probably not worth the trouble with so many other tools out there.
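
A new directive is just a small subclass (a sketch following the module's ssi_* method convention; the directive itself is invented):

  package My::SSI;
  use base 'Apache::SSI';

  # handles <!--#uppercase text="hello" -->
  sub ssi_uppercase {
      my ($self, $args) = @_;
      return uc $args->{text};
  }

  1;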

SSI follows the callback model and is mostly a mini-language, although you can sneak in bits of Perl code as anonymous subs in #perl directives. Because SSI uses a repeated parse implementation, it is safe to use it on large numbers of files without worrying about memory bloat.

SSI is a great choice for sites with fairly simple templating needs, especially ones that just want to share some standard headers and footers between pages. However, you should consider whether your site will eventually need to grow into something with more flexibility and power before settling on this simple approach.

HTML::Mason

http://www.masonhq.com/

Mason has been around for a few years, and has built a loyal following. It was originally created as a Perl clone of some of the most interesting features from Vignette StoryServer, but has since become its own unique animal. It comes from a publishing background, and includes features oriented toward splitting pages into re-useable chunks, or ``components.''

Mason uses in-line Perl with a compilation approach, but has a feature to help keep the Perl code out of the HTML coder's way. Components (templates) can include a section of Perl at the end of the file that is wrapped inside a special tag indicating that it should be run first, before the rest of the template. This allows programmers to put all the logic for a component down at the bottom away from the HTML, and then use short in-line Perl snippets in the HTML to insert values, loop through lists, etc.

Mason is a site development framework, not just a templating tool. It includes a handy caching feature that can be used for capturing the output of components or simply storing data that is expensive to compute. It is currently the only tool that offers this sort of caching as a built-in. It also implements an argument parsing scheme that allows a component to specify the names, types and default values that it expects to be passed, either from another component or from the values passed in the URI query string.

While the documentation mostly demonstrates a callback execution model, it is possible to use Mason in a pipeline style. This can be accomplished in various ways, including building special components called ``autohandlers,'' which run before anything else for requests within a certain directory tree. An autohandler could do some processing and set up data for a display template that includes only minimal in-line Perl. There is also support for an object-oriented site approach, applying concepts such as inheritance to the site directory structure. For example, the autohandler component at /store/book/ might inherit a standard layout from the autohandler at /store/, but override the background color and navigation bar. Then /store/music/ can do the same, with a different color. This can be a powerful paradigm for developing large sites. Note that this inheritance is only supported at the level of methods defined in autohandler components. You can't override the component /store/foo.html with another one at /store/book/foo.html.

Mason's approach to debugging is to create ``debug files'' that run Mason outside of a Web server environment, providing a fake Web request and activating the debugger. This can be helpful if you're having trouble getting Apache::DB to behave under mod_perl, or using an execution environment that doesn't provide built-in debugger support.

Another unique feature is the ability to leave the static text parts of a large template on disk, and pull them in with a file seek when needed rather than keeping them in RAM. This exchanges some speed for a significant savings in memory when dealing with templates that are mostly static text.

There are many other features in this package, including filtering of HTML output and a page previewing utility. Session support is not built-in, but a simple example showing how to integrate with Apache::Session is included. Mason's feature set can be a bit overwhelming for newbies, but the high-quality documentation and helpful user community go a long way.

HTML::Embperl

http://perl.apache.org/embperl/

Embperl makes its language choice known up front: embedded perl. It is one of the most popular in-line Perl templating tools and has been around longer than most of the others. It has a solid reputation for speed and ease of use.

It is commonly used in a callback style, with Embperl intercepting URIs and processing the requested file. However, it can optionally be invoked through a subroutine call from another program, allowing it to be used in a pipeline style. Templates are compiled to Perl bytecode and cached.

Embperl has been around long enough to build an impressive list of features. It has the ability to run code inside a Safe compartment, support for automatically cleaning globals to make mod_perl coding easier, and extensive debugging tools including the ability to e-mail errors to an administrator.

The main thing that sets Embperl apart from other in-line Perl systems is its tight HTML integration. It can recognize TABLE tags and automatically iterate over them for the length of an array. It automatically provides sticky form widgets. An array or hash reference placed at the end of a query string in an HREF or SRC attribute will be automatically expanded into query string ``name=value'' format. META HTTP-EQUIV tags are turned into true HTTP headers.

Another reason people like Embperl is that it makes some of the common tasks of Web application coding so simple. For example, all form data is always available just by reading the magic variable %fdat. Sessions are supported just as easily, by reading and writing to the magic %udat hash. There is also a hash for storing persistent application state. HTML-escaping is automatic (though it can be toggled on and off).
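
A taste of that convenience (a sketch, assuming Embperl's session support is enabled; [- ... -] runs code silently, [+ ... +] inserts a value):

  [- $name = $fdat{name} || 'stranger' -]
  Hello, [+ $name +]!
  You have been here [+ ++$udat{visits} +] time(s).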

Embperl includes something called EmbperlObject, which allows you to apply OO concepts to your site hierarchy in a similar way to the autohandler and inheritance features of Mason, mentioned above. This is a convenient way to code sites with styles that vary by area, and is worth checking out. EmbperlObject includes the ability to do overrides on a file level. This means that you can have a directory like /store/music that overrides specific templates and inherits the rest from a parent directory.

One drawback of older versions of Embperl was the necessity to use built-in replacements for most of Perl's control structures like ``if'' and ``foreach'' when they are being wrapped around non-Perl sections. For example:

  [$ if ($foo) $]
    Looks like a foo!
  [$ else $]
    Nope, it's a bar.
  [$ endif $]

These may seem out of place in a system based around in-line Perl. As of version 1.2b2, it is possible to use Perl's standard syntax instead:

  [$ if ($foo) { $]
    Looks like a foo!
  [$ } else { $]
    Nope, it's a bar.
  [$ } $]

At the time of this writing, a new 2.x branch of Embperl is in beta testing. This includes some interesting features such as a more flexible parsing scheme that can be modified to users' tastes. It also supports direct use of the Perl debugger on Embperl templates and provides performance improvements.

Apache::AxKit

http://axkit.org/

AxKit is the first mod_perl page generation system to be built from the ground up around XML. Technically, AxKit is not a templating tool but rather a framework for stringing together different modules that generate and transform XML data. In fact, it can optionally use Template Toolkit as an XML transformation language. However, it deserves coverage here since it is also the home of some templating tools that are not represented elsewhere.

In its simplest form, AxKit maps XML files to XSL stylesheets that it can process using commonly available XSLT modules like XML::XSLT or XML::Sablotron. The rules for mapping a stylesheet to a request are flexible, and they can incorporate query strings, cookies and other attributes of the request. The idea is that you can use this feature to handle a wide variety of clients with differing display capabilities by choosing the right stylesheet.

Recognizing that not everyone is a fan of XSL's somewhat obtuse syntax, Matt Sergeant has provided an alternate stylesheet language called XPathScript. XPathScript allows you to write a stylesheet using text with embedded Perl code. This is similar to the other embedded Perl templating tools, but the focus is on using the built-in XPath functions for querying an XML document and manipulating the retrieved data. XPathScript can also be used in a declarative fashion, specifying the formatting of particular elements in the XML input. For example, this snippet will change all <foo> tags in an XML document to BAR in the output:

  <%
    $t->{'foo'}{pre}     = 'BAR';
    $t->{'foo'}{post}    = '';
    $t->{'foo'}{showtag} = 0;
  %>
  <%= apply_templates() %>

By using XPathScript's include function (which looks just like SSI), you can build libraries of useful transformations that use this technique.

This is all well and good if you have a bunch of XML files sitting on a disk somewhere, but what about dynamic content? AxKit handles this by allowing you to substitute a different data source for the default file-based one. This can include running some dynamic code on each request to generate the XML data that will be transformed. The distribution includes a module for doing this called XSP. XSP is a language for building an XML DOM using in-line Perl and tag libraries. The tag libraries are specified as stylesheets that can turn XML tags into Perl code. This is demonstrated through the included SQL tag library, which allows you to write an XSP page using XML tags that will connect to a database, execute queries and generate an XML document with the results.

AxKit has some nice performance boosts built into it. It can cache the full output of a page and serve it as a static file on future requests. It can also compress output to speed up downloads for browsers that understand gzip encoding. These can be done with other systems, but they require you to set up additional software. With AxKit, you just enable them in the configuration file.

If all of these languages, tag libraries and stylesheets sound intimidating to you, AxKit may be overkill for your project. However, AxKit has the advantage of being built on approved W3C standards, and many of the skills used in developing for it carry over to other languages and tools.

Apache::ASP

http://www.apache-asp.org/

Apache::ASP started as a port of Microsoft's Active Server Pages technology, and its basic design still follows that model. It uses in-line Perl with a compilation approach and provides a set of simple objects for accessing the request information and formulating a response. Scripts written for Microsoft's ASP using Perl (via ActiveState's PerlScript) can usually be run on this system without changes. (Pages written in VBScript are not supported.)

Like the original ASP, it has hooks for calling specified code when certain events are triggered, such as the start of a new user session. It also provides the same easy-to-use state and session management. Storing and retrieving state data for a whole application or a specific user is as simple as a single method call. It can even support user sessions without cookies -- a unique feature among these systems.

A significant addition that did not come from Microsoft ASP is the XML and XSLT support. There are two options provided: XMLSubs and XSLT transforms. XMLSubs is a way of adding custom tags to your pages. It maps XML tags to your subroutines, so that you can add something like <site:header page="Page Title" /> to your pages and have it translate into a subroutine call like &site::header({page => "Page Title"}). It can handle processing XML tags with body text as well.
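
The receiving end of that tag is an ordinary subroutine, roughly like this (a sketch of the convention; see the Apache::ASP documentation for the details):

  package site;

  # called for <site:header page="Page Title" />
  sub header {
      my ($args, $html) = @_;              # attribute hash; body text for non-empty tags
      print "<h1>$args->{page}</h1>\n";    # print goes to the response
  }

  1;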

The XSLT support allows the output of ASP scripts to be filtered through XSLT for presentation. This allows your ASP scripts to generate XML data and then format that data with a separate XSL stylesheet. This support is provided through integration with the XML::XSLT module.

Apache::ASP provides sticky widgets for forms through the use of the HTML::FillInForm module. It also has built-in support for removing extra whitespace from generated output, gzip compressing output (for browsers that support it), tracking performance using Time::HiRes, automatically mailing error messages to an administrator and many other conveniences and tuning options. This is a mature package that has evolved to handle real-world problems.

One thing to note about the session and state management in this system is that it currently only supports clusters through the use of network file systems such as NFS or SMB. (Joshua Chamas, the module's author, has reported much better results from Samba file sharing than from NFS.) This may be an issue for large-scale server clusters, which usually rely on a relational database for network storage of sessions. Support for database storage of sessions is planned in a future release. In the meantime, instructions are provided for hooking up to Apache::Session.

Text::Template

http://search.cpan.org/search?dist=Text-Template

This module has become the de facto standard general purpose templating module on CPAN. It has an easy interface and thorough documentation. The examples in the docs show a pipeline execution style, but it's easy to write a mod_perl handler that directly invokes templates, allowing a callback style. The module uses in-line Perl. It has the ability to run the in-line code in a Safe compartment, in case you are concerned about mistakes in the code crashing your server.

The module relies on creative uses of in-line code to provide things that people usually expect from templating tools, like includes. This can be good or bad. For example, to include a file you could just call Text::Template::fill_in_file(filename). However, you'll have to specify the complete file path and nothing will stop you from using /etc/passwd as the file to be included. Most of the fancier templating tools have concepts like include paths, which allow you to specify a list of directories to search for included files. You could write a subroutine that works this way, and make it available in your template's namespace, but it's not built in.
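
Such a subroutine might look like this (a sketch; the search-path variable and directories are invented):

  use Text::Template 'fill_in_file';

  our @TEMPLATE_PATH = ('/var/templates', '/var/templates/shared');

  # a home-grown include() with an include path
  sub include {
      my ($name) = @_;
      for my $dir (@TEMPLATE_PATH) {
          return fill_in_file("$dir/$name") if -e "$dir/$name";
      }
      die "no template '$name' found on the search path";
  }

  # a template filled in with PACKAGE => __PACKAGE__ can then say:
  #   { include('header.tmpl') }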

Each template is loaded as a separate object. Templates are compiled to Perl and only parsed the first time they are used. However, to take full advantage of this caching in a persistent environment like mod_perl, your program will have to keep track of which templates have been used, since Text::Template does not have a way of globally tracking this and returning cached templates when possible.

Text::Template is not tied to HTML, and is just a templating module, not a Web application framework. It is perfectly at home generating e-mails, PDFs, etc.

Template Toolkit

http://template-toolkit.org/

One of the more recent additions to the templating scene, Template Toolkit, is a flexible mini-language system. It has a complete set of directives for working with data, including loops and conditionals, and it can be extended in a number of ways. In-line Perl code can be enabled with a configuration option, but is generally discouraged. It uses compilation, caching the compiled bytecode in memory and optionally caching the generated Perl code for templates on disk. Although it is commonly used in a pipeline style, the included Apache::Template module allows templates to be invoked directly from URLs.

Template Toolkit has a large feature set, so we'll only be able to cover some of the highlights here. The TT distribution sets a gold standard for documentation thoroughness and quality, so it's easy to learn more if you choose to.

One major difference between TT and other systems is that it provides simple access to complex data structures through the concept of a dot operator. This allows people who don't know Perl to access nested lists and hashes or call object methods. For example, we could pass in this Perl data structure:

  $vars = {
           customer => {
                        name    => 'Bubbles',
                        address => {
                                    city => 'Townsville',
                                   }
                       }
          };

Then we can refer to the nested data in the template:

  Hi there, [% customer.name %]!
  How are things in [% customer.address.city %]?

This is simpler and more uniform than the equivalent syntax in Perl. If we pass in an object as part of the data structure, we can use the same notation to call methods within that object. If you've modeled your system's data as a set of objects, this can be convenient.

Templates can define macros and include other templates, and parameters can be passed to either. Included templates can optionally localize their variables so that changes made while the included template is executing do not affect the values of variables in the larger scope.

There is a filter directive, which can be used for post-processing output. Uses for this range from simple HTML entity conversion to automatic truncation (useful for pulldown menus when you want to limit the size of entries) and printing to STDERR.
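
For example, both of these filters ship with the Toolkit:

  [% entry.title FILTER truncate(25) %]   [%# clip long pulldown entries %]

  [% FILTER html %]
     This & that get entity-escaped here.
  [% END %]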

TT supports a plugin API, which can be used to add extra capabilities to your templates. The provided plug-ins can be broadly organized into data access and formatting. Standard data access plugins include modules for accessing XML data or a DBI data source and using that data within your template. There's a plugin for access to CGI.pm as well.

Formatting plug-ins allow you to display things like dates and prices in a localized style. There's also a table plugin for use in displaying lists in a multi-column format. These formatting plug-ins do a good job of covering the final 5 percent of data display problems that often cause people who are using an in-house system to embed a little bit of HTML in their Perl modules.

In a similar vein, TT includes some nice convenience features for template writers, including eliminating white space around tags and the ability to change the tag delimiters -- things that may sound a little esoteric, but can sometimes make templates significantly easier to use.
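For example, putting a '-' just inside the tag delimiters chomps the surrounding newlines, so a loop like this doesn't litter the output with blank lines:

  [%- FOREACH item = list -%]
     [% item.name %]
  [%- END -%]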

The TT distribution also includes a script called ttree, which allows for processing an entire directory tree of templates. This is useful for sites that pre-publish their templated pages and serve them statically. The script checks modification times and only updates pages that require it, providing a make-like functionality. The distribution also includes a sample set of template-driven HTML widgets that can be used to give a consistent look and feel to a collection of documents.

HTML::Template

http://search.cpan.org/search?dist=HTML-Template

HTML::Template is a popular module among those looking to use a mini-language rather than in-line Perl. It uses a simple set of tags that allow looping (even on nested data structures) and conditionals in addition to basic value insertion. The tags are intentionally styled to look like HTML tags, which may be useful for some situations.

As the documentation says, it ``does just one thing and it does it quickly and carefully'' -- there is no attempt to add application features like form-handling or session tracking. The module follows a pipeline execution style. Parsed templates are stored in a Perl data structure that can be cached in any combination of memory, shared memory (using IPC::SharedCache) and disk. The documentation is complete and well-written, with plenty of examples.

You may be wondering how this module is different from Template Toolkit, the other popular mini-language system. Beyond the obvious differences in syntax, HTML::Template is faster and simpler, while Template Toolkit has more advanced features, like plug-ins and dot notation. Here's a simple example comparing the syntax:

HTML::Template:

  <TMPL_LOOP list>
      <a href="<TMPL_VAR url>"><b><TMPL_VAR name></b></A>
  </TMPL_LOOP>

Template Toolkit:

  [% FOREACH list %]
      <a href="[% url %]"><b>[% name %]</a></a>
  [% END %]
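For completeness, here's a minimal sketch of the Perl side of the HTML::Template version (the file name and data are our own); the loop data is passed as an array of hashes:

  use HTML::Template;

  my $template = HTML::Template->new(filename => 'links.tmpl');
  $template->param(list => [
      { url => 'http://www.perl.com/',    name => 'Perl.com' },
      { url => 'http://search.cpan.org/', name => 'CPAN'     },
  ]);
  print $template->output;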

And now, a few honorable mentions:

HTML_Tree

http://homepage.mac.com/pauljlucas/software/html_tree/

As mentioned earlier, HTML Tree takes an unusual approach to templating: it loads an HTML page, parses it to a DOM and then programmatically modifies the contents of nodes. This allows it to use genuine valid HTML documents as templates, something that none of the other modules mentioned here can do. The learning curve is a little steeper than average, but this may be just the thing if you are concerned about keeping things simple for your HTML coders. Note that the name is ``HTML_Tree,'' not ``HTML::Tree.''

Apache::XPP

http://opensource.cnation.com/projects/XPP/

XPP is an in-line Perl system that compiles to bytecode. Although it is a perfectly good implementation, it has little to differentiate it except for an easy mechanism to define new HTML-like tags that can be used to replace in-line code in templates.

ePerl

http://search.cpan.org/search?dist=Apache-ePerl

Possibly the first module to embed Perl code in a text or HTML file, ePerl is still a viable option in the form of Apache::ePerl. It caches compiled bytecode in memory to achieve solid performance, and some people find it refreshingly simple to use.

CGI::FastTemplate

http://search.cpan.org/search?dist=CGI-FastTemplate

This module takes a minimalistic approach to templating, which makes it unusually well-suited to use in CGI programs. It parses templates with a single regular expression and does not support anything in templates beyond simple variable interpolation. Loops are handled by including the output of other templates. Unfortunately, this leads to a Perl coding style that is more confusing than most, and a proliferation of template files. However, some people swear by this dirt-simple approach.
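A minimal sketch of the style it encourages (the template names, variables and data are our own): a row template is filled once per item, and its accumulated output becomes a variable in the outer template.

  use CGI::FastTemplate;

  my @names = ('Alice', 'Bob');

  my $tpl = CGI::FastTemplate->new('/path/to/templates');
  $tpl->define(main => 'main.tpl', row => 'row.tpl');
  foreach my $name (@names) {
      $tpl->assign(NAME => $name);
      $tpl->parse(ROWS => '.row');   # the leading '.' appends to ROWS
  }
  $tpl->parse(MAIN => 'main');       # main.tpl interpolates $ROWS
  $tpl->print('MAIN');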

Performance

People always seem to worry about the performance of templating systems. If you've ever built a large-scale application, you should have enough perspective on the relative costs of different actions to know that your templating system is not the first place to look for performance gains. All of the systems mentioned here have excellent performance characteristics in persistent execution environments like mod_perl. Compared to such glacially slow operations as fetching data from a database or file, the time added by the templating system is almost negligible.

If you think your templating system is slowing you down, get the facts: pull out Devel::DProf and see. If one of the tools mentioned here is at the top of the list for wall clock time used, you should pat yourself on the back -- you've done a great job tuning your system and removing bottlenecks! Personally, I have only seen this happen when I had managed to successfully cache nearly every part of the work to handle a request except running a template.
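If you haven't profiled before, the basic recipe is short (the script name is ours):

  perl -d:DProf myapp.pl     # writes profile data to tmon.out
  dprofpp -r tmon.out        # report sorted by elapsed (wall clock) time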

However, if you really are in a situation where you need to squeeze a few extra microseconds out of your page generation time, there are performance differences between systems. They're pretty much what you would expect: systems that do the least run the fastest. Using in-line print() statements is faster than using templates. Using simple substitution is faster than using in-line Perl code. Using in-line Perl code is faster than using a mini-language.

The only templating benchmark available at this time is one developed by Joshua Chamas, author of Apache::ASP. It includes a ``hello world'' test, which simply checks how fast each system can spit back those famous words, and a ``hello 2000'' test, which exercises the basic functions used in most dynamic pages. It is available from the following URL:

http://www.chamas.com/bench/hello.tar.gz

Results from this benchmark currently show SSI, Apache::ASP and HTML::Embperl having the best performance. Not all of the systems mentioned here are currently included in the test. If your favorite was missed, you might want to download the benchmark code and add it. As you can imagine, benchmarking people's pet projects is largely a thankless task and Joshua deserves some recognition and support for this contribution to the community.

CGI Performance Concerns

If you're running under CGI, you have bigger fish to fry than worrying about the performance of your templating system. Nevertheless, some people are stuck with CGI but still want to use a templating system with reasonable performance. CGI is a tricky situation, since you have to worry about how much time it will take for Perl to compile the code for a large templating system on each request. CGI also breaks the in-memory caching of templates used by most of these systems, although the slower disk-based caching provided by Mason, HTML::Template and Template Toolkit will still work. (HTML::Template does provide a shared memory cache for templates, which may improve performance, although shared memory on my Linux system is usually slower than using the filesystem. Benchmarks and additional information are welcome.)

Your best performance bet with CGI is to use one of the simpler tools, like CGI::FastTemplate or Text::Template. They are small and compile quickly, and CGI::FastTemplate gets an extra boost since it relies on simple regex parsing and doesn't need to eval any in-line Perl code. Almost everything else mentioned here will add tenths of seconds to each page in compilation time alone.


Matrix

To help you choose a system, I'll summarize the basic characteristics of the major systems along the decision points explained at the beginning of the article. Keep in mind that in many cases a system can be used in more than one way; I've simply shown the dominant method as seen in the documentation and real-world use. You should not eliminate options based on this chart without reading the more detailed explanations above.

  System            Application Framework  Pipeline or Callback  Parsing Method                 Language
  HTML::Mason       Framework              Callback              Compiled                       Perl
  Template Toolkit  Just Templates         Pipeline              Compiled                       Mini-Language
  Apache::ASP       Framework              Callback              Compiled                       Perl and XSL
  HTML::Embperl     Framework              Callback              Compiled                       Perl
  SSI               Just Templates         Callback              Repeated Parse                 Mini-Language
  AxKit             Framework              Pipeline              Compiled or Cached Parse Tree  Perl and XSL and Mini-Language(s)
  HTML::Template    Just Templates         Pipeline              Cached Parse Tree              Mini-Language
  Text::Template    Just Templates         Pipeline              Compiled                       Perl

Updates

These modules are moving targets, and a document like this is bound to contain some mistakes. Send your corrections to perrin@elem.com. Future versions of this document will be announced on the mod_perl mailing list, and possibly other popular Perl locations.

This Week on p5p 2001/08/15

Notes

This Week on P5P

POD Specification
Unicode Normalisation
Threading Semantics
perlmodstyle
Shrinking down the Perl install
Various

Please send corrections and additions to perl-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year and month. Changes and additions to the perl5-porters biographies are particularly welcome.

I'm back. This week saw the usual just-over-400 messages.

POD Specification

Sean Burke attempted to save the world:

Over the past several years, I have heard nothing but griping from all quarters about the perpetually underspecified state of perlpod, when considered as a pod specification. A markup language without a clear specification simply invites everyone, including implementors, to have their own shaky idea of what the language means. And that, along with the general tendency for markup language parsing to produce write-only code, explains much of the upsetting current state of Pod::* modules.

And so he did something about it. He wrote two documents, a completely rewritten perlpod and a new perlpodspec, which clarified the sense of POD without adding anything much in the way of new features. The spec in particular is extremely useful for those contemplating writing POD formatters. Jarkko reported that he'd be including them in 5.8.0, but Russ complained that the translators didn't fully match the specification. Jarkko's primary concern was cleaning up the horrendous state of the L<..> tag, which is often abused. Russ picked up primarily on Sean's request that translators assume the input text to be UTF-8, although what that actually means is not specified. Sean clarified what he meant in quite a long message:

First off, my intent is to declare Unicode to be POD's "the reference character set" (ugh, and I thought I could get out of this without using SGML jargon), for purposes of resolving E<number> sequences.

So, for example, if I say E<233>, that is to mean the e-acute character, because in Unicode, code point 233 is e-acute.

Whether "print ord 233" on your terminal prints an e-acute, an ess-tsett, a gimel, a "tsu" katakana, a double-dagger, or whether it lasers a hole thru your monitor's glass, is a whole different problem.

E<233> does NOT mean "simply pass a literal 233 blindly thru to the formatter". E<233> and its exact synonym E<eacute> both merely mean "make a reasonable attempt to make an e-acute".

This wandered off into the usual Unicode semantics debate. Philip Newton threw a brilliantly unexpected spanner in the works:

(Oh, by the way, if someone writes POD on an EBCDIC machine, *all* of the bytes will have code points > 127 AFAIK if the author sticks only to letters and numbers.)

Peter Prymmer also chastised the use of the word "ASCII" in the specification.

Sarathy suggested we mandate POD to be in Unicode, and move towards assuming all Perl code to be in UTF-8, mandating Unicode semantics on things like chr $x when $x is less than 255. There did not appear to be anyone forging his posts.

Unicode Normalisation

And while we're on the subject of Unicode, Sadahiro Tomoyuki rocks my world. Oh my. Not only has he produced an alternative and much cleaner (and more complete) module to handle normalization of Unicode data, he achieved the near-impossible and implemented the Unicode collation algorithm detailed in Unicode Technical Report 10. This is a major bonus, since it allows us to correctly compare and store Unicode strings.

(Incidentally, Nick Ing-Simmons pointed out that "normalize" is incorrect, according to OED and Fowler. Surprisingly for p5p, this generated less comment than the technical content of the thread.)

Threading Semantics

Artur reported on the remaining problems with iThreads:

  • Request for threads->kill, to kill a given thread, is this something we should have? Should we try to do what POSIX does with all these cancelation points and cancelation callbacks? We know what mutexes we own so we can make sure they are all canceled. My belief [is] that this is needed.

    On unix this can be done by a signal(?), I don't know about Win32.

  • Right now, it seems like the parent interpreter must still stick around for its children. I think it will have to stay like this for 5.8 and something that can be fixed for 5.10 with a global SV arena; right now we use the first interpreter as the store.

  • Quitting the main thread; now, this kills all threads (as normal threading). However this usually comes at bad times. And we don't get proper cleanup (and segfaults). I think we should wait for all threads to finish, letting the user kill them.

  • lock, unlock, share, are now implemented in threads::shared. You have to manually unlock(); it is not scope based. I think the implementation of lock() unlock() should perhaps move into pp_lock and pp_unlock; share() and cond_*() need a new reference prototype.

  • Sharing probably needs a new kind of magic, but I think that can wait until we see sharing working as we want.

  • A shared blessed object: should the destructor be called once for each thread, or in the thread where it actually is destroyed? Is there a way to catch blessings to magic/tied variables? If not, I guess we can not rebless shared data structures.

Dan Sugalski suggested that people ought to write shutdown synchronisation code themselves, to ensure that the shared data structures are in a sane state. Benjamin Stuhl asked for a detach method to disown other threads, so that the main interpreter can shutdown while blissfully ignorant of what's going on elsewhere. Dan made the very good point that we can provide surprising behaviour because:

When you come to threaded programming without any experience, everything is surprising, and lots of things don't make sense. Inexperienced thread programmers *will* screw themselves up. Repeatedly. Threads are a very powerful tool, but one with no guards to speak of.

The discussion got heated soon afterwards, with Artur trying to avoid coredumps because coredumps are bad, but Dan maintaining that most things that happen with threads are bad, and coredumps are occasionally unavoidable - if you exit your interpreter without shutting down all your threads, that's an abnormal exit and anything could happen.

Artur did, however, go ahead with the new shared_sv implementation, adding two files to core and core functionality for sharing, locking and correctly refcounting shared SVs. Wow.

perlmodstyle

Kirrily Robert (Skud) stormed onto the scene with a fantastic first contribution: she submitted a document on Perl module style conventions, which was generally pretty well received, with a few minor nits.

The ensuing thread had some very interesting thoughts on various style considerations. Benjamin Frantz did a benchmark of different parameter passing conventions, finding that passing arrays is 200,000ths of a second slower than passing hashes. So if you need a bit more speed from your Perl code, turn your arrays into hashes. This massive efficiency gain didn't impress Michael Schwern, though:

Speed isn't really the issue, clarity of interface is. You don't want to confuse things in first-time module author docs by bringing up efficiency arguments. It just confuses things.

This then turned into Yet Another CPAN Argument. Elaine Ashton made a plea for help on behalf of the CPAN testers; if you want to be a hero, see testers.cpan.org.

Skud also hinted she's working on a perltesttut. She also tried to herd date and time module authors onto the datetime list. Not a bad start. Think you can do better?

Shrinking down the Perl install

Alan Burlison gave out the now-common howl: Perl has doubled in size from 5.005_03 to 5.6.1. However, he was trying to cram it onto the Solaris miniroot CD so that people can write Jumpstart scripts in Perl. Jarkko suggested throwing out pod/ (which Alan had already done), unicode/ (with a caution that attempting to use Unicode stuff would make demons fly out of your nose), the headers from CORE/, plus a bunch of the libraries.

Dan Sugalski suggested stuffing everything into miniperl, statically linking in all the extensions.

Dave Mitchell asked what the big Unicode files were; Jarkko explained that they could be stored compressed or left out completely, since the information in them is stripped out into the various unicode/*.pl files. (By the way, unicode/ has been renamed unicore/ so that we can merge in Unicode::* modules without having to worry about case-folding filesystems.) The only casualty would be UnicodeCD.

Andy Dougherty pointed to the Debian perl-base package, which is a meagre 1.2Mb. This is possible by turning off the autoloader in some of the packages, and chopping down lib/ to the absolute bare essentials. The really serious cut, of course, would be to just ship perl, libperl.so and Config.pm.

Various

There was a reasonably big thread on wild-card expansion on Windows that I just couldn't follow at all.

Artur asked why perl_run and not perl_destruct called the END blocks; Sarathy said it was a bug and wanted a patch, which Artur provided.

Jerrad Pierce asked how to get modules that exist in the core and not on CPAN, such as the fixed-up version of English. Schwern encouraged him to take the module back to CPAN.

I discovered two old, dust-covered opcodes; Abhijit Menon-Sen breathed life into one of them, rcatline, which optimizes $a .= <foo>. Actually, Abhijit did all sorts of things this week, including: fixing the calling of FIRSTKEY on tied hashes, allowing localised tying on filehandles, stopping FETCH being called twice on tainted data, and, to fend off accusations that he only fixes bugs without adding new functionality, adding the ever-useful panic operator.

Tony Bowden asked what would happen to the my Dog $sam declaration now pseudohashes are dead. The responses were: i) pseudohashes aren't dead yet, they just smell that way, and ii) nothing, since the fields pragma would still do the trick.

The old more-than-256-files-open-on-Solaris question came up again: this time, though, we had a sensible answer. PerlIO can be used to get around the Solaris limitation. Alan explained that there was a section in perlsolaris about the limitation, which was due to the extreme backwards compatibility of Solaris harking back to a time before it could conceive of more than 256 files.

Someone discovered that you can tie a variable with an object. The utility of this was debated, and Schwern conceded that there'd be no harm in documenting it. That'd be a nice little small task for someone.

Andy Dougherty dropped in a couple of BSD patches, to the Makefile generation and the hints file. Paul Johnson made B::Concise recognise padops.

James Duncan allowed BEGIN blocks to be more visible from B. Robin Houston asked whether or not we should move PL_minus_c from a bool to a U8 to give us more flexibility; apparently we currently do (PL_minus_c & 0x10), which, as Robin points out, is a "rather wrong thing to do to a bool". I wonder when the Perl core was formally forbidden from doing wrong things.

Until next week I remain, your humble and obedient servant,


Simon Cozens

Yet Another Perl Conference Europe 2001

Last year Europe had its first Yet Another Perl Conference. Leon Brocard, Jon Peterson and Greg McCarroll did a marvellous job organizing this grassroots conference - themed "The Art of Perl" - from scratch within two months. For the first time, Europe had its chance to meet people like Nat Torkington, Tom Christiansen, Kevin Lenzo, brian d foy and Michael Schwern. Even before this conference started, a few people from the Netherlands got together and talked about the possibilities of organizing the next YAPC::Europe in Amsterdam.

Related Articles

The State of the Onion 5

Yet Another YAPC Report: Montreal

And so they did. Only this time, they had a whole year to organize it and they chose "Security" as the theme for the conference. Hardly any of The Big Names from the USA were coming to the conference this time, but a lot of European speakers were. And that should be considered a Good Thing - since of course there is a lot of Perl Talent in Europe!

The first day was filled with tutorials, whereas the second and third days each had one track of security-related talks and two others with more general Perl talks. Three complete simultaneous tracks, all stuffed with lots of very interesting talks covering many facets of Perl - it was very hard to pick the talk you wanted to attend each time! Apart from the conference itself, there were quite a lot of Birds of a Feather meetings too (like CPANTS, PerlMonks, P5P, Concurrency and a Pub Crawl), which also made the evenings very entertaining.

The whole conference went extremely well. That's not just my opinion, which may be coloured because I was one of the organizers - everyone agreed on it. Of course, minor things did not go as planned: for example, the renovation that was going on in the venue disturbed some talks. But everyone agreed that in general, things went extremely smoothly. Thanks to the wireless crew from HAL2001, everyone in the whole building had wireless access to the Internet. Many people said that the wireless access was even better than it was at TPC. Besides that, the computer lab offered 10 and 100Mbit regular Ethernet access for laptops and had 30 desktop PCs ready to run.

And then there was the catering. There were free lunches for everyone, along with free coffee, tea and snacks on Thursday and Friday afternoon. We all got good value for our money.

Thursday was Tutorial Day. I was session chair of the tutorials in $room[2], which got its name because the room was on the third floor and the Dutch call that the second. The tutorials in $room[2] were about GUI programming with Perl. First there was a three-hour tutorial about using Tk for existing applications by Mark Overmeer, and after lunch a three-hour tutorial about Gtk by Redvers Davies. In other rooms, tutorials like Parse::RecDescent (by Abigail) and Object Oriented Programming (by Johan Vromans) were given.

The welcome speech on Friday was given by Kevin Lenzo. He explained a bit about the merger between YAS, PerlMongers and PerlMonks and told us not to worry, because basically nothing would change for us. After that, Daniel Karrenberg - one of the pioneers of the Internet in Europe - gave his keynote speech. He talked about the time from the early days of RIPE NCC until today.

The rest of the day was filled with lots of interesting talks. For example, Artur "Sky" Bergman talked about POE, Robin Houston gave a talk about Mutagenic Modules, Schwern convinced the audience of the need for CPANTS and there were two sessions of lightning talks.

On Friday afternoon, Brian "Ingy" Ingerson explained all about his Inline module. Of course he started with Inline::C, talked about the other possibilities like Java, Python, C++ and the like, and could even announce Inline::Javascript by Claes Jacobsson that had just been finished. Things were really getting funny when he explained how he created a C interpreter, where you can use a kind of Inline::Perl which in turn uses Inline::C.

He also mentioned the hilarious Inline::PERL by John McNamara (which has now been released under the Acme:: namespace on CPAN) and explained why this itself is not a bad idea at all, since you are able to use Inline::Perl with a version number. That would, for example, allow you to use Perl 5 code in Perl 6...

Saturday morning started with a second keynote speech by Hugh Daniel about "The Current Tragedy of Common Free & Open Source Quality". I was too busy preparing my own talk about pVoice later that morning in the Iterative Software room (named after the sponsor) to attend that speech, but everyone agreed that it was a very good one.

That afternoon General M. Schwern ordered us all to be as strict and disciplined as possible in his talk about "Bondage and Discipline, or Strict beyond strict.pm". It was a pity Schwern's voice hadn't had the military training that was needed to keep shouting at us all.

The last talk in the O'Reilly room (again named after the sponsor) was given by Jan-Pieter Cornet and Antony Antony, who showed us all the security bloopers we had made during the conference. It turned out that during the whole conference they had sniffed the network using dsniff and analyzed the logs with a quickly hacked Perl script. The results were astonishing. More than half of the attendees had sent unencrypted passwords over the network, most of them not even good enough to survive cracklib. All kinds of insecure protocols were used: Telnet, POP3, FTP and the like. The end of the story was that people who were going to attend the "Hackers at Large" conference the next week were strongly advised to use secure protocols and better passwords. The Perl community is probably friendly enough not to attack fellow Perl hackers, but don't be too sure that the audience of Hackers At Large will be so friendly...

After Kevin Lenzo and yours faithfully had thanked everyone who helped organize and run the conference, Greg McCarroll led an auction where items generously donated by several sponsors were sold using the Dutch auction system: you start at a high price and let the price fall at a certain speed until someone says "stop". The price at the moment someone says "stop" is the price he or she has to pay. It worked very well, and was supported by a small Gtk application that Redvers Davies had written.

Not only books were auctioned this time: a few London.pm-specific items - signed photos of Buffy and Willow, and the right to decide on what date the London.pm meetings are to be held - went under the hammer as well. Dave Cross, author of Data Munging with Perl, contributed a very special item, best described as:

A module from Damian Conway that he will dedicate to you and which you can influence the purpose/topic of if you can come up with something sufficiently Damian-esque.

That particular item was sold for $200, and the whole auction raised more than 9000 Dutch guilders, some $4000, which will first be used to pay all leftover costs of the conference. The remainder will be spent on a Perl advocacy project in the Netherlands and on the startup costs of next year's YAPC, which will be in either Munich or Paris. Both cities have Perl Monger groups that want to organize it, and YAS will make a decision based on the proposals they submit.

The second edition of Yet Another Perl Conference in Europe was a great success. No complaints were heard and everyone was enthusiastic. That's not due to the merit of the organizing committee, but mainly to the merit of the attendees and speakers. May the next conference be just as successful!


Return to Perl.com

This Week in Perl 6 (5 - 11 August 2001)

Notes

This Week in Perl 6

Notes

Damian's Perl 6 Talk, Take Two

Opcode Dispatching

Modules

given (Properties) { when: Proper }

Eval/o's Vs. Evil Woes

PDDs Released

You can subscribe to an email version of this summary by sending an empty message to perl6-digest-subscribe@netthink.co.uk.

Please send corrections and additions to bwarnock@capita.com.

There were 129 messages across 19 threads, with 31 authors contributing.

    Damian's Perl 6 Talk, Take Two

    Damian Conway has released an updated version of his Perl 6 slides (PDF). Plenty of hints of things to come in this one... or of things that came.

    Opcode Dispatching

    (32 posts) There were some brief peeks into the opcode dispatch engine. Here, Dan Sugalski explains why he wants 32-bit-wide opcode tables...

    >but a pure 32 bit table can be very large (and sparse) if we have those gaps you seem to propose below.

    True, but only if we actually use the whole set. We're 32 bit for a few reasons, none of which are to provide a billion opcodes:

    * So all the pieces of the op stream are the same size

    * It avoids alignment problems we'll see on many processors

    * It means the endian preprocessor we'll need on some platforms can be very fast, as it can just byteswap everything rather than actually need to know which pieces are which size.

    Basically everything gets compacted down at one end, and we'll probably (dynamically) limit the table size to 1K custom opcodes in force at once or something.

    ...and how multiple dispatch tables will be handled.

    Simple. As I said, the opcode function table is lexically scoped. Jump into code from another scope and the 'correct' function table is automagically (well, OK, there's an opcode for this, but...) installed for you. So while opcode 774 used to be "socketpair", in the new lexical scope it might be "LWP::Simple::get", but that's OK because since we swapped in a new table everything's just fine.

    Other brief discussions included Dan on opcode numbering, shared libraries, and event handling; and myself on the latest in testing.

    Modules

    (28 posts) Kirrily Robert posted what she feels the Perl 6 community should do for the next generation of Perl's reusable code base.

    At YAPC, I told Nat I wanted to get involved with modules-related work for Perl 6. To that end, I've put together a bit of a list of what I think needs to be done in that area. The list appears below, in POD format. If you're interested in being involved in this stuff, or just have comments, please follow up to perl6-stdlib@perl.org, which is probably the most suitable place available to us.

    There was lively discussion, mostly giving examples of the points she made in her posting. (Namespace messes, implementation issues, etc.)

    given (Properties) { when: Proper }

    (10 posts) The discussion on properties continued with Damian Conway posting the revised draft he sent to Larry for (re)consideration, and answering a few questions on global variables:

    But, yes, I would fully expect that the global punctuation I/O control variables will become attributes/properties/traits of individual filehandles.

    (2 posts) Damian also explained the given/when construct in detail.

    Eval/o's Vs. Evil Woes

    (5 posts) David Nicol proposed an o switch for string evals - indicating 'compile once', a la the regex o switch - to solve some issues he has with alternate syntaxes.

    Bart Lateur and Marc-Oliver Ihm went one step further, by just having eval compile the code and return the compiled code block.

    PDDs Released

    (2 posts) Dave Mitchell released the final draft for his Conventions and Guidelines for Perl Source Code PDD.

    (20 posts) Dan Sugalski previewed his first draft on Perl's assembly language. The initial discussion pointed out some areas for needed clarification, and coverage of this document will pick up as it matures.


    Bryan C. Warnock

    This Fortnight In Perl 6 (July 22 - Aug. 4, 2001)

    Notes

    Table of Contents

    The Perl Conference 5.0

    The Perl 5 Porters Meeting

    The State Of The Onion

    The P 6 Q & A

    Sarathy Speaks

    The Opcode BOF

    The Mailing Lists

    You can subscribe to an email version of this summary by sending an empty message to perl6-digest-subscribe@netthink.co.uk.

    Please send corrections and additions to bwarnock@capita.com.

    My apologies, again, for the lack of a summary recently. My last one, covering July up to the beginning of The Perl Conference, was apparently lost in the shuffle or the noise that was TPC, because it never made its way to the list or perl.com. And fool that I am, I didn't bug Simon about it, fearing I would be burdening an already overloaded schedule (shades of Warnock's Dilemma). But have no fear. If you've a penchant for ancient history, you can find it here in my personal archive.

    As far as current news, there were 120 messages across 28 threads, with 34 authors contributing, most of which occurred the week following the conference.

    The Perl Conference 5.0

    Despite accounts to the contrary, Perl 6 actually played a very small role in the production that was The Perl Conference.

    This isn't a bad thing, mind you. Perl 6 still exists only in the minds of a few select individuals, and it would make a very uninteresting conference to have tutorials and sessions based on a language that doesn't exist, and is still in flux. Thus, with a few exceptions, TPC was geared around the here-and-now of Perl 5.

    Perl 6..., well, who cares anything about Perl 6?

    But Larry didn't recant last year's proclamation, and Jon Orwant didn't throw mugs asking why we were working on a new version when there was so much interest in the old.

    So Perl 6 moves on. Slowly but surely. In no hurry. And so, a brief overview of what did happen at TPC with Perl 6.

    The Perl 5 Porters Meeting

    Before the conference even started, the attending Perl 5 Porters got together for a business meeting (of sorts) to talk about the past, present, and future of Perl. Nat Torkington was kind enough to allow me to attend, and there was a long enough break in the Perl 5 action for him and Dan Sugalski to summarize what has been done so far for Perl 6.

    The State Of The Onion

    Larry's State Of The Onion was all Perl 6, or at least what he could get out 55 seconds at a time. It was, we hope, a preview of some upcoming Apocalypses. You can read Simon Cozens' review here, and Nat announced an MP3 recording of it on perl5-porters.

    The P 6 Q & A

    Larry, Nat, Dan, Simon, Damian Conway, and Chip Salzenberg led a session for the general public, iterating what has been done (Topaz, RFCs, etc.), and what is left to do.

    'Being Dan Sugalski'

    Folks were later able to get a glimpse at the inside of Perl 6 through the eyes of Dan, who is heading up the internals effort. His slides can be found here.

    Sarathy Speaks

    Perl 5.6 pumpking Gurusamy Sarathy finished the week with his vision of what Perl 6 should be. It was a much different perspective from the ones pitched before, as it was based on what he sees as the problems in Perl 5, untainted by the Perl 6 process (which he has not been following).

    In some cases, there was complete overlap (garbage collection and more pluggable components, for instance); in others, a different perspective (heavier reliance on facilities provided by the underlying architecture - physical or virtual - instead of rolling our own).

    The Opcode BOF

    Dan, Simon, and I attended Uri Guttman's Op Code BOF to try to flesh out more details of the op code and event dispatch loop. Gurusamy and newly appointed pumpking Hugo van der Sanden also sat in.

    The meeting was short and vociferous. About the only major idea added to Dan's current PDD was a prioritization scheme for event handling. No notes are posted anywhere, so the following comes from my own scribblings:

    Events will be priority based, much like a real CPU. System events - signals, etc. - will naturally have a higher priority than user events. The opcode dispatch loop will check for events after each opcode, and if there are queued events at a higher level than our current state, the opcode loop will exhaust them. (That prevents events from interrupting themselves, since the event handlers themselves will consist of opcodes, between which events will be checked for.)

    Obviously, there are still a lot of details to be worked out, but that was the plan at the time.
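    To make the scheme concrete, here is a rough sketch of such a loop in Perl. This is purely our own illustration - not code from any PDD - and the helpers (dequeue_event_above(), run_handler()) and data structures are invented:

     # Purely illustrative: a tiny model of the dispatch loop described above.
     our $RUNNING_AT = 0;                  # priority we are running at now
     my ($pc, @code, @dispatch) = (0);     # opcode stream and handler table
     while (defined(my $op = $code[$pc])) {
         $pc = $dispatch[$op]->($pc);      # execute one opcode
         # Drain any event queued above our current priority level.
         while (my $event = dequeue_event_above($RUNNING_AT)) {
             local $RUNNING_AT = $event->{priority};  # handlers run elevated,
             run_handler($event);          # so equal-priority events cannot
         }                                 # interrupt them
     }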

    The Mailing Lists

    Opcode Dispatch, Continued

    (6 posts) In blatant defiance of Knuth, I decided to test some of the ideas from opcode discussions past and present, and coded a couple of variations of dispatchers to get a feel for performance. Following some suggestions from Nick Ing-Simmons and Dan Sugalski, testing continues.

    Module Versioning

    (12 posts) I also did a brain dump of some ideas on module versioning. There were some brief discussions on what could (and couldn't) and should (and shouldn't) be done. Bart Lateur pointed everyone to an old message from Ilya Zakharevich, where he suggested a potential solution to the conflicting DLL issue.

    Interestingly enough, a similar discussion took place on perl5-porters at about the same time.

    Coding Conventions

    (1 post) Dave Mitchell released an update for his PDD on Perl coding conventions and guidelines.

    Adding Methods To Builtins

    (1 post) Brent Dax suggested a method (no pun intended) for adding methods to the builtin types.

    If Then Else Otherwise

    (22 posts) Raptor started a thread that began with a specialized conditional block construct tuned to the return values of <=> and cmp called if-else-otherwise, and later shortened to ?::. Once the participants worked past the actual syntax of his proposal, it was clear that he was looking for some form of computed goto, a la FORTRAN's arithmetic conditional, IFA. There were plenty of examples of how many different ways you could accomplish the same in Perl 5.

    (7 posts) David Nicol then extrapolated this into a generic macro language proposal for Perl 6.

    Circular References

    (10 posts) Ilya Sterin asked about the difficulty in solving the circular reference problem, which several folks kindly answered.

    Now's a good time to re-advertise that Perl 6 will have a better garbage collection system.

    Properties, Again

    (12 posts) I had to go and revisit some potential problems in the (currently proposed) property mechanism. Dan quietly pointed out that Damian and Larry were working on it.

    Documentation

    (9 posts) Adam Turoff explained a little about the Perl Documentation Project - a project similar to the Linux Documentation Project. He will speak more on this later.

    Last Words

    One of the recurring Perl 6 questions at TPC was documentation. "The Apocalypse and Exegesis series are nice, but is there a complete picture of what Perl 6 is going to look like?"

    Soon, there will be. I'm working on two forms of "snapshots" of what Perl 6 is, in parallel with Larry's pronouncements and various implementations being done. The first is to be a dull, dry, boring, reference document, which will briefly describe the current feature set of Perl 6, and how it ties to the internals. The second will be a Perl 6 version of the existing PODs - or at least all the ones that make sense. As Larry and Damian leak various tidbits of Perl 6, I'll update those documents to add any new features, or rip out dead Perl 5 ones.


    Bryan C. Warnock

    This Week on p5p 2001/08/07

    Notes

    This Week on P5P

    Subroutine Prototypes

    -Wall fixes

    The Great SDK Debate

    New Stuff!

    Various

    Please send corrections and additions to perl-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year and month. Changes and additions to the perl5-porters biographies are particularly welcome.

    For the final time, this is a somewhat abridged summary, since I'm still on the road. This week is Washington and Philadelphia.

    Subroutine Prototypes

    Jeff Pinyan started the ball rolling with a collection of ideas he'd had on how to extend the subroutine prototype system. Strangely enough, these ended up looking more and more like regular expressions. Unfortunately, the discussion collapsed under the weight of its own complexity, as people suggested that the prototype for a subroutine which did something like split would have to be ((&($$?)?)?). Jarkko tried to focus the discussion away from operations such as repetition and grouping, and toward better type coercion - specifically, a generic reference-to-anything operation. (It was more or less agreed that this ought to be \.) Kurt Starsinic came up with a neat set of guidelines to aid thinking about prototypes; Chris Nandor complained that people were adding more incomprehensible line noise to Perl instead of taking it away.

    Jeff did come up with a patch to make \? take a reference, so it would be easy to change that to \. - I don't know if this has been done or integrated yet.

    -Wall fixes

    Well done to Richard Soderburg, who produced a set of neat patches which make Perl compile without warnings under -Wall on his platform, FreeBSD.

    Paul Marquess noted that it would probably be handy to have Perl compile itself with -Wall but compile other extensions more loosely; Jarkko saw to it.

    The Great SDK Debate

    Tim Bunce kicked off the discussion on SDKs by pointing out two flaws in our treatment of SDK and module selection:

    One was the understanding that we should select and include only _one_ module for any given area of functionality. Thereby forcing us into futile debates about which of several similar modules to include.

    I see no problem with 'lowering the bar' of entry into an SDK. The requirements should be simply that the module: is reasonably useful; is reasonably well documented; the interface is reasonably stable. If that means we end up with an SDK with five different date modules, so what?

    The other mistake was that we were trying to define 'the' (one) SDK. I'd rather see multiple SDKs (web centric, database centric, xml, net, etc). Sure, there'll be overlaps, but that's a bonus, not a problem.

    He then asked that we start thinking about defining these multiple SDKs. Matt Sergeant added another mistake: that the creation of SDKs was too big a task for two or three individuals.

    Kurt Starsinic asked why SDKs couldn't be built on the Bundle model. Adam Turoff provided his notes from the P5P meeting at TPC:

    The Bundle approach doesn't capture versions of the included modules, just the module names. An SDK version must be in a Known Good State (tm) and should be versioned itself.

    Bundles don't work because they're a quick hack at a meta distribution. If someone downloads a 2K bundle file and goes offline, they don't have all of the modules they need to install (and can't throw that 2K bundle file around inside the firewall effectively). This says to me that an SDK needs to be a packaged version of a specific set of module distributions (compressed or not) that comprise a specific version of that SDK.

    We probably want to take a page from the [x]emacs folks and keep the SDKs alongside the core distribution on CPAN and other such mirrors. There's the core emacs source + about 3-5 groups of additional packages, and emacs-sumo which contains everything. Perhaps there's room for perl-5.8-web-1.3.tar.gz alongside perl-5.8-win32-2.3.tar.gz, etc., which should be a simple matter of packaging at distribution time (and repackaging at SDK upgrade time).

    There's an unresolved issue with DLL Hell; if two modules include a common module (e.g. Date::Broken), SDK 1 may function with version 3.14, and SDK 2 may function with version 2.78, but both SDKs can't be installed concurrently because of conflicting requirements with that common module.

    John Peacock noted that we could solve some of these problems by either allowing multiple versions of a module to co-exist, or extending the use semantics to ask for precise versions.

    Elaine came out in support of bundles, disagreeing with some of Adam's points (or rather, the points that Adam had summarized from the general discussion). Notably, bundles can require individual versions of individual modules.

    Peter Scott said that we don't have to care whether or not we make the modules somehow "harmonise" - so long as we reduce the set of version numbers that administrators have to worry about from, say, fifty individual modules down to five SDKs. Elaine disagreed that this is a good idea, as it takes overall control away from the administrator. The people who will care about which versions are installed now don't get all the information, and those who don't care still won't care, so they're not helped either.

    Everyone's agreed that we need SDKs, which is the main thing, but opinions differed on how to make one "official" and how to ensure that the user installs what they need. Isn't that exactly where we were last week?

    New Stuff!

    Jarkko announced that we had a new port officially supported in the distribution - Perl now cross-compiles "out of the box" to WinCE. He also (finally) decided to merge Artur's wonderful threads modules into core. Hoorah!

    There's also a cool new Configure feature: you can tell it which additional modules you want downloaded and installed from CPAN - porters following bleadperl will see another question in the interactive run, or may specify -Dextras="..." to tell Configure which modules to get.

    Various

    Ilya's "retirement" is looking more and more dubious as he again piped up with a bunch of patches this week, this time for h2xs. Gerrit Haase spend a while investigating upgrading fcrypt.c in Win32, but found that the version distributed with current versions of libdes already has the Perl patches folded in.

    Doug MacEachern fixed up a slew of weird threads bugs and other leaks, as usual. I made the optimizer pluggable - optimizer.pm is on CPAN, and lets you do Great Things.

    Peter Prymmer produced some test harness cleanups, plus some Win32 build fixes. Thanks, Peter!

    Jarkko asked if Storable should use B::Deparse to serialize code blocks. Casey West had already done that months ago, but nobody was listening. This just goes to show that if you do cool stuff for Perl, people might not listen the first time but they'll come around to your way of thinking in the end.

    Until next week I remain, your humble and obedient servant,


    Simon Cozens

    Quantum::Entanglement

    There Is More Than One World (In Which) To Do It

    With the possible exception of many physicists, quantum mechanics is one of the stranger things to have emerged from science over the last hundred years. It has led the way to new understanding of a diverse range of fundamental physical phenomena and, should recent developments prove fruitful, could also lead to an entirely new mode of computation where previously intractable problems find themselves open to easy solution.

    The Quantum::Entanglement module attempts to port some of the functionality of the universe into Perl. Variables can be prepared in a superposition of states, where they take many values at once, and when observed during the course of a program, will collapse to have a single value. If variables interact then their fates are linked so that when one is observed and forced to collapse the others will also collapse at the moment of observation.

    It is quite hard to provide a complete version of quantum mechanics in Perl, so we need to make some simplifications. Instead of solving thousands of equations each time we want to do something, we will forget entirely about eigen-functions, Hermitian operators and other mathematical hurdles. This still leaves us with plenty of ways to make Perl behave in a thoroughly unpredictable fashion.

    The entangle() function

    The Quantum::Entanglement module adds an entangle() function to Perl. This takes a list of amplitudes and values and returns a scalar in a superposition of values; saying

    
      $die = entangle( 1=>1, 1=>2, 1=>3, 1=>4, 1=>5, 1=>6);

    creates a superposition of the values 1..6. From now on, $die acts as if it has every one of those values at the same time as long as we do not try to find out exactly which one.

    Observation and Collapse in Perl

    We now need to decide what happens when we observe our variable - and what do we mean by observe? Taking a broad definition as ``anything that reveals the values which a variable has'' seems about right. Perl provides us with many ways of doing this: there are the obvious acts of printing out a variable or testing it for truth, but even operators such as eq or <= tell us something.

    How do we decide which way a variable collapses? Well, each possible value has an associated probability amplitude, so all we need to do is build up a list of distinct outcomes, add up the amplitudes for each one, square the result, then use this to bias the value (or values) to which the variable collapses.
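    For instance (a hypothetical superposition of our own), a value that appears more than once has its amplitudes added before squaring:

     # 'heads' has total amplitude 1 + 1 = 2 and 'tails' has 1, so the
     # odds of collapse are 2**2 : 1**2 - heads 4/5 of the time.
     my $coin = entangle(1 => 'heads', 1 => 'heads', 1 => 'tails');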

    As every coefficient of the superposition in $die is equal to 1,

    
     print "You rolled a $die.\n";

    will output You rolled a 1. or You rolled a 2. and so on, each for one sixth of the time.

    Entanglement and Simple Complex Logic

    Whenever superposed variables interact, or are involved in calculations, the results of these as well as the variables themselves become entangled. This means that they will all collapse at the same time, so as to remain consistent with their history. This emulates the entanglement, or ``spooky action at a distance'', which so worried Einstein.
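    For example (our own snippet, relying only on the documented behaviour above):

     my $count  = entangle(1 => 1, 1 => 2);
     my $double = $count * 2;   # $double and $count are now entangled
     print "$double\n";         # collapses both: if this prints 4,
     print "$count\n";          # then this must print 2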

    Complex Amplitudes and Entanglement in Perl

    If we can have plain numbers as the coefficients of our superpositions, it seems sensible that we could also use complex numbers, although instead of just squaring the number when working out our probability, we need to square the size (modulus) of the number (eg. |1+2i|**2 == 5 == |1-2i|**2).

    The Quantum::Entanglement module allows subroutines to create new states (amplitude-value pairs) based on the current set of states by using the function q_logic. This takes as its argument a subroutine which is presented with each state in turn and must return a new set of states constructed from these.

    We start our program with:

     #!/usr/bin/perl -w
     use Quantum::Entanglement qw(:DEFAULT :complex);
     $Quantum::Entanglement::destroy = 0;

    This gives us access to the constants defined by Math::Complex and turns off the memory management performed by the module (which would otherwise discard some information that will be important later). We then define a subroutine to return the value it receives and its logical negation; their coefficients are those of the original state multiplied by i/sqrt(2) and 1/sqrt(2) respectively:

    
     sub root_not {
       my ($prob, $val) = @_;
       return( $prob * i / sqrt(2) , $val,
               $prob     / sqrt(2) , !$val );
     }

    We then create a superposition which we know is equal to 0 and feed it through our root_not() once:

    
     my $var = entangle(1 => 0);
     $var = q_logic(\&root_not, $var);

    The variable is now in a superposition of two possible values, 0 and 1, with coefficients of i/sqrt(2) and 1/sqrt(2) respectively. We now make our variable interact, storing the result in $peek. As $var is in a superposition, every possible value it has participates in the calculation and contributes to the result.

    
     my $peek = 12 * $var;   # $peek and $var become entangled
     $var = q_logic(\&root_not, $var);

    We then feed $var through root_not() one more time and test it for truth. What will happen and what will be the value of $peek?

    
     if ($var) { print "\$var is true!\n"; }
     else      { print "\$var is false\n"; }

     print "\$peek is equal to: $peek.\n";

    The output is always $var is true!, as $var is in a final superposition of (1/2 => 0, i/2 => 1, -1/2 => 0, i/2 => 1). You can convince yourself of this by running through the math. What about $peek? Well, because it interacted with $var before $var collapsed, and both possible values that $var had at that time contributed to its eventual truthfulness, both values of $peek are still present: we get 0 or 12, each for half of the time.

    If we reverse the order in which we examine the variables:

    
     print "\$peek is equal to: $peek.\n";
    
    
    
    
    
     if ($var) { print "\$var is true!\n"; }
    
     else      { print "\$var is false\n"; }

    we still see $peek being 0 or 12, but as we collapsed $peek we must also collapse $var at the same time. This causes $var to be in a superposition of (1/2 => 0, i/2 => 1) or a superposition of (-1/2 => 0, i/2 => 1), both of which will collapse to 0 half of the time and to 1 the other half, so that (on average) we see both phrases printed.

    If we try to find the value that $var had while it was `between' the subroutines, we force it to have a single value, so that after two passes through root_not() we get random noise - even if we test this after the event. If, on the other hand, we leave it alone, it emerges from repeated application of root_not() as the logical negation of its original value; thus the name of our subroutine.

    Beneath the Veil

    Although the module is intended to be used as a black box which does the Right Thing (or some close approximation to it), the internals of the code are interesting and reveal many features of Perl which may be useful elsewhere.

    Writing entangled behaviour into Perl presents an interesting challenge; a means of representing a superposition is required, as is some way of allowing different variables to know about each other without creating a twisty maze of references which would stand in the way of garbage collection and lead to a certain programming headache. We also need a means to cause collapse, as well as a robust mechanism for dealing with both real and complex numbers. Thankfully Perl provides a rich set of ingredients which can more than satisfy these requirements without making the job so hard that it becomes impossible.

    Objective Reality

    We want to represent something which has many values (and store these somewhere) while making it look like there's only one value present. Objects in Perl are nothing more than scalars that know slightly more than usual. When a new entanglement is created, we create a new object, and return that to the calling program. Deep within the module we have a routine which is similar to:

    
     sub entangle {
       my $self = [ ~ data goes in here ~ ];
       return bless $self, 'Quantum::Entanglement';
     }

    Exactly how we store the data is covered below. We then turn this into a 'core' function by exporting it into the namespace which asked for it.

    When Worlds Collide

    We've created a superposition of values and sent it back to our user. What needs to happen when they write something like:

     $talk = entangle( 1=>'Ships',    1=>'Sealing Wax',
                       1=>'Cabbages', 1=>'Kings'        );
     $more = $talk . ' yada yada yada';

    We want to redefine the meaning of concatenation when an entangled object is involved. Perl lets us do this using the overload module. Within the Quantum::Entanglement module we say:

     use overload
            '+'  => sub { binop(@_, sub{$_[0] + $_[1]} ) },
         # more ...
            '.'  => sub { binop(@_, sub{$_[0] . $_[1]} ) },
         # yet more ...

    Whenever someone applies the '.' operator to our object, a subroutine (in this case an anonymous one) is called to handle the operation, and the result of this subroutine is then used as the result of the operation. Because the module provides new behaviours for all of Perl's operations, we write a generic routine to handle Binary Non-observational Operations and pass it the values to operate on, along with another anonymous routine (which it will see as a code-ref) so that it knows which operation to perform. This lets us re-use the code which works out whether both operands are objects and whether they are reversed, and which pieces together the data structures we use. binop is sketched below.
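    Before diving into those structures, here is a simplified sketch of the shape such a handler takes. This is our own toy version, not the module's actual binop: it stores states as a plain list of [amplitude, value] pairs under a hash key, whereas the real module uses the shared-universe structures described below.

     # Toy sketch, NOT the module's actual binop.
     sub binop {
         my ($obj, $other, $swapped, $code) = @_;
         my $left  = $obj->{states};
         # Treat a plain scalar operand as a single state with amplitude 1.
         my $right = ref($other) ? $other->{states} : [ [ 1, $other ] ];
         # overload tells us if the operands arrived reversed (e.g. "x" . $obj).
         ($left, $right) = ($right, $left) if $swapped;
         my @new;
         for my $l (@$left) {          # cross every value with every value;
             for my $r (@$right) {     # amplitudes multiply
                 push @new, [ $l->[0] * $r->[0],
                              $code->( $l->[1], $r->[1] ) ];
             }
         }
         return bless { states => \@new }, ref $obj;
     }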

    Data Structures with Hair

    This module lives and dies on the strength of its data structures. We need to ensure that every variable (or, more correctly, object) knows about all the other superpositions it has been involved with throughout the course of the program without having any direct pointers between them.

    When we create a new variable, we give it the following structure:

     sub entangle {
       my $universe = [ [ @_[0,1] ], # amp1, val1
                        [ @_[2,3] ], ...  ];
       my $offsets  = [];
       my $var = [ \$universe, 1, \$offsets ];
       $offsets->[0] = \ $var->[1];
       return bless $var, 'Quantum::Entanglement';
     }

    There's a lot going on here, so pay attention. $universe is a list of lists (a LoL), essentially a two-dimensional table whose first two columns hold the amplitudes and values of our superposition. $var contains a reference which points at a scalar which in turn points at the universe, rather like this:

     ($var->[0]) ---> (anonymous scalar) ---> $universe

    The second value in $var is a number indicating which column of the universe we need to look at to find the values of our superposition. The last field of $var again points, through an intermediate scalar, at an array. That array holds references which point straight back at the scalars holding the column offsets, something like this:

     $var[ (->X->universe), (number), (->Y->offsets[  ])  ]
                                \------<----<-------/

    Now, when we want this object to interact with another object, all we need to do is make $var->[0] for each object end up referring to the same universe (with each $var->[1] adjusted to point at the right columns). Easy, you might say, given that we have both objects around. But what if one had already interacted with another variable which we can no longer access directly? This is where our extra level of indirection is required. Because each variable contains something which points at something else which then points at its set of values, we merely need to make sure that the 'something else' ends up pointing at the same thing for every variable. So, we delve into each object's universe, choose one which will contain the data for both objects (and thus for all those which have interacted in the past), and move all the data from the other object's universe into it. We then make our middle reference the same for each object.

    Initially,

    
     universe1 = [[a1,av1],      [a2,av2]      ,... ]
     universe2 = [[b1,bv1,c1,cv1],[b2,bv2,c1,cv1],... ]

     $var1[ (->X->universe1), 1,... ] # we have this object
     $var2[ (->Y->universe2), 1,... ] # and this object
     $var3[ (->Y->universe2), 3,... ] # but not this one

    then by pointing Y at universe1 the whole structure of our objects becomes

    
     universe1 = [[a1,av1,b1,bv1,c1,cv1],[a2,av2,b1,bv1,c1,cv1],... ]

     $var1[ (->X->universe1), 1,... ] # we have this object
     $var2[ (->Y->universe1), 3,... ] # and this object
     $var3[ (->Y->universe1), 5,... ] # but not this one

    To allow every possible value of one variable to interact with every possible value of our other variables, we need to follow a crossing rule so that the rows of our merged universe look like this:

    
     universe1       universe2               result

     a1 av1        b1 bv1 c1 cv1        a1 av1  b1 bv1  c1 cv1
     a2 av2      * b1 bv1 c2 cv2   ==>  a1 av1  b1 bv1  c2 cv2
                                        a2 av2  b1 bv1  c1 cv1
                                        a2 av2  b1 bv1  c2 cv2

    so that every row in the first universe is paired with every row of the second. We then need to update the offsets for each variable which has had data moved from one universe to another. As the offsets array contains pointers back to these values, it is easy to increase each one by the correct amount. So, given two entanglements in @_, and a bit of cheating with map, we can say

    
      my $offsets1 = ${$_[0]->[2]}; # middle-man reference
      my $offsets2 = ${$_[1]->[2]};
      # shift by the width of universe1's rows (the columns already in use)
      my $extra = scalar(@{ ${$_[0]->[0]}->[0] });
      push @$offsets1, map { $$_ += $extra; $_ } @$offsets2;
      ${$_[1]->[2]} = $offsets1;

    and you can't get clearer than that.
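    The row-crossing itself isn't shown above; here is a hedged sketch of one way _join might splice the two universes together (the module's real _join does rather more housekeeping):

     my $u1 = ${ $_[0]->[0] };   # universe1, soon to hold everything
     my $u2 = ${ $_[1]->[0] };
     my @crossed;
     foreach my $row1 (@$u1) {
       foreach my $row2 (@$u2) {
         push @crossed, [ @$row1, @$row2 ]; # every row meets every row
       }
     }
     @$u1 = @crossed;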

    So binop is written like so (assuming that we can only be given two entangled variables in the correct order; for the full story, read the source):

    
     sub binop {
        my ($obj1,$obj2,$r,$code) = @_;
        _join($obj1,$obj2);   # ensure universes shared
        my ($os1, $os2) = ($obj1->[1],$obj2->[1]);
        my $new = $obj1->_add(); # new var also shares universe
        foreach my $state (@{${$obj1->[0]}}) {
           push( @$state, $state->[$os1-1]*$state->[$os2-1],
                          &$code( $state->[$os1], $state->[$os2] ) );
        }
        return $new;
     }

    or, in English: make sure each variable is in the same universe, then create a new variable in that universe too. For every row of the universe, add two extra values: the first is the product of the two input amplitudes, the second is the result of our operation on the two input values. Here you see the tremendous value of code reuse: no sane man would write such a routine more than once. Or, more correctly, no one would remain sane if they tried.
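    To see the payoff, recall the concatenation example above; under these overloads it simply works. A hedged sketch of the result:

     my $talk = entangle( 1=>'Ships',    1=>'Sealing Wax',
                          1=>'Cabbages', 1=>'Kings'        );
     my $more = $talk . ' yada yada yada';
     # $more is now a superposition of 'Ships yada yada yada',
     # 'Sealing Wax yada yada yada', and so on - and it remains
     # entangled with $talk: observing one collapses the other.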

    London Bridge is Falling Down

    How do we collapse our superpositions so that every entangled variable is affected, even though we can only access one of them at once? When we perform an observational operation (if ($var){...}, say), we simply need to split our universe (the table of values) into two groups: the rows which lead to our operator returning a true value, and those which do not. We add up the probability amplitudes within each group, square the two totals, and use the resulting numbers to decide which group to keep. To cause our collapse we then delete all the rows of the universe in the other group, which removes every value of every variable in those rows.
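    A hedged sketch of such a collapse (the module's real observation code also copes with complex amplitudes and renormalizes the surviving rows):

     sub observe {
       my ($var, $test) = @_;      # $test judges a single value
       my $universe = ${ $var->[0] };
       my $os       = $var->[1];
       my (@true, @false);
       my ($amp_true, $amp_false) = (0, 0);
       foreach my $row (@$universe) {
         if ( $test->( $row->[$os] ) ) {
           push @true, $row;  $amp_true  += $row->[$os - 1];
         } else {
           push @false, $row; $amp_false += $row->[$os - 1];
         }
       }
       # square the summed amplitudes, then pick a group to keep
       my ($p_true, $p_false) = ($amp_true**2, $amp_false**2);
       my $keep = rand($p_true + $p_false) < $p_true ? \@true : \@false;
       @$universe = @$keep;        # the other group's rows - and every
                                   # entangled value in them - vanish
       return $keep == \@true ? 1 : 0;
     }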


    Getting the Module

    The module distribution, like all good things, is available from the CPAN and includes a few short demonstrations of what the module can do, along with plenty of explanation (including Shor's algorithm and the square root of NOT gate outlined above). The source of this, and any other module on the CPAN, is available for inspection. If you have a burning desire to find out how the mystical wheel was first invented, Perl and its community will gladly show you.

    This Week on p5p 2001/07/30

    Notes

    This Week on P5P

    Hash "clamping"

    P5P Meeting

    Asynchronous Callbacks

    h2ph

    iThreads

    Please send corrections and additions to perl-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year and month. Changes and additions to the perl5-porters biographies are particularly welcome.

    Again, this is a somewhat abridged summary, partly because I'm still on the road, but also because last week was the Perl conference and so there was very little activity.

    Hash "clamping"

    The discussion of Jeffrey Friedl's idea of hash "clamping" - that is, disallowing new hash key creation - rolled on. Nick encouraged Jeffrey to make use of the HV's existing readonly bit to signal "clampedness" instead of using up another flag bit, but Jeffrey disagreed that this makes semantic sense: readonlyness would refer to the values as well as the keys. Abigail took a more holistic view:

    I think this is yet another example of "there's something wrong/missing in Perl, let's fix it with a kludge".

    The real underlying problems here are 1) Perl doesn't have structs, but more importantly, 2) Perl only gives you one instance variable per object. Pseudo-hashes tried to fix the symptoms, not the cause, and they didn't succeed. "clamp" tries to fix the symptoms too - and not the cause.

    The problem isn't really that you can make typos with hash keys; the problem is that you're almost forced to use hashes in cases where you really wanted something else, because the best alternative Perl gives you is hashes.

    While I agree such a "clamp" function feels better than pseudo-hashes, I feel less than thrilled by the idea of explaining the reasoning behind "clamp" to my students - or colleagues, for that matter.

    He also disagreed (as did I) with the idea that something like this, which can be done with a simple tie, should require a change in the language. I proposed Tie::SecureHash, which does what was requested, though without the speed of a core hack.
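    To illustrate the "simple tie" point, here is a minimal sketch of a clamped hash as a tie. (Tie::ClampedHash is a hypothetical name for this example; Tie::SecureHash itself does considerably more.)

     package Tie::ClampedHash;
     use strict;
     use Tie::Hash;
     our @ISA = ('Tie::StdHash');  # inherit FETCH, EXISTS, DELETE, ...

     sub TIEHASH {
       my ($class, @keys) = @_;
       my %self;
       @self{@keys} = ();          # the only keys this hash will allow
       return bless \%self, $class;
     }

     sub STORE {
       my ($self, $key, $value) = @_;
       die "No such key '$key' in clamped hash"
         unless exists $self->{$key};
       $self->{$key} = $value;
     }

     package main;
     tie my %point, 'Tie::ClampedHash', qw(x y);
     $point{x} = 3;                # fine
     $point{z} = 5;                # dies: 'z' was never declared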

    P5P Meeting

    At TPC, a motley crew of developers got together to discuss the state of Perl development and raise issues of interest. Jarkko started the ball rolling by talking about development. He outlined some of the new things and also some of the deprecations in 5.7.x, which you should all probably know if you've been following these summaries; he also outlined things that had been put off for the future so that he wouldn't have to care too much about them: Artur's ithreads work, bignum, Unicode, and Inline. He mentioned that a hopeful timetable for 5.8.0 would be around three months. After that, Hugo would take the helm for 5.9.0, which would include a cleanup of the regexp engine plus a phased shift towards 6.0 features.

    (Jarkko's slides can be found at http://www.iki.fi/jhi/osc2001/)

    At this point, Damian mentioned that we might be producing versions too fast for companies to adequately support - many companies will skip 5.6.0 and go to 5.8 if it's stable. Randy Ray explained how RedHat's product adoption works.

    Sarathy came up and talked about 5.6, saying that he wouldn't do much maintenance other than integrating minor patches from 5.8. He also once again appealed for a new pumpking.

    Nat and Dan talked about Perl 6, and again outlined a lot of things that should be familiar; I'll try and make sure these are covered in the Perl 6 digest.

    Hugo explained the regex changes that were needed, including rewriting the regex engine to use an optree much like (but separate from) Perl's own optree, rather than the current bytecode "oplist". This would allow for more aggressive but at the same time conceptually simpler optimization. Dan suggested that this should be done as a module to give it a trial run, then moved to core if it works. Hugo also said that he and MJD have been trying to remove the recursion in the regular expression engine.

    Jeff Pinyan then talked about some work he'd been doing with Jeff Friedl, particularly new syntax for character class arithmetic ("alphabetic characters minus vowels" and so on) and syntax for named captures. Larry had, of course, been through this: he announced the character class arithmetic syntax in Perl 6, and said that the deeper problem with named captures is that people really want regular expressions to return complex data structures. MJD also pointed out that overloading qr could do a lot of what the Jeffs wanted, including reversible regular expressions.

    I spoke briefly about Unicode and the compiler, but pre-empted myself since both topics were covered in my TPC tutorials. Jeff Okamoto enthused about IPv6; the discussion continued throughout the week and is too verbose to explain here.

    Finally, there was a long discussion about the standard module set and SDKs. Most people agreed that an SDK was required, but opinions differed on how to make one "official" and how to ensure that the user installs what they need. Larry's proposal for Perl 6 was that the standard library was so small as to be almost useless on its own, forcing people to go get what they need from CPAN.

    Huge thanks to Ziggy for taking notes.

    Asynchronous Callbacks

    Remember the discussion about asynchronous callbacks way back when? Well, David Lloyd has come back with a solution: Async::Callback. Nice one, David.

    He notes that it's not re-entrant unless you link Perl with -lpthread, but it's pretty fast.

    h2ph

    Kurt Starsinic rocks! He wrote a fantastic little test for h2ph which compares the constants provided by the Socket and POSIX extensions with their values from the .h files as generated by h2ph. This uncovered a load of bugs in h2ph, as well as one in Scalar::Util::dualvar. Kurt also fixed these bugs. Oh, and made hex constants appear as numbers when used in a numeric context and as strings of bytes when used as strings. What more could you ask for?
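    For those who haven't met it, Scalar::Util::dualvar builds exactly that sort of two-faced value; a small illustration:

     use Scalar::Util qw(dualvar);
     my $v = dualvar(0xFF, "\xFF");
     print $v + 1, "\n";       # 256 - numeric context sees the number
     print length($v), "\n";   # 1   - string context sees a single byte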

    iThreads

    Artur Bergman announced the release of some of his new threading modules, threads, threads::shared and threads::queue to CPAN. Great work, Artur!
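    As a taste of the new API, here is a hedged sketch against the interfaces these modules provide (details were still settling at the time):

     use threads;
     use threads::shared;

     my $counter = 0;
     share($counter);            # nothing else is shared between threads

     my @kids = map {
       threads->create( sub { lock($counter); $counter++ } );
     } 1 .. 3;
     $_->join foreach @kids;
     print "counter is $counter\n";   # 3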

    Various

    Phillip Newton continued his clean-up work on typos and POD errors in the documentation, and Ilya dropped in a bunch of patches, mainly to help tidy up the build process. One of the patches was particularly handy, allowing

        make OPTIMIZE=-g perl
    

    to simply rebuild perl with debugging.

    If you ever thought pseudohashes were useful, Autrijus Tang demonstrated that they were even slower than tying and overloading.

    Will Sanchez produced another clutch of Darwin patches - thanks, Will! I tried a patch to turn off optimization and constant folding but was quickly convinced these are two separate areas. Look for a new one coming next week...

    There was another report about how $a and $b aren't picked up by the strict pragma. I have despatched the hellhounds.

    Until next week I remain, your humble and obedient servant,


    Simon Cozens

    People Behind Perl : Artur Bergman

    We continue our series on the People Behind Perl by interviewing Artur Bergman. Artur is a recent addition to the legion of Perl developers, but he's already making extremely big waves. Artur picked up the gauntlet of the new threading model introduced in Perl 5.6.0 and has set about making it usable for the ordinary Perl programmer. Here's what Artur has to say.

    Who are you, and what is your day job?

    My name is Artur Bergman. I work on a content management system as a system developer/designer at Contiller AB in Sweden. I am usually found on #perl, which is also my only real longstanding contact with the Perl community.

    I used to work with stock exchange systems, and have designed and programmed a number of trade execution engines in Perl for stocks, power, and interest-rate markets.

    How long have you been programming in Perl?

    Since '95. I needed to make a Web page dynamic, so I had to learn a programming language; it ended up being Perl.

    What got you into Perl?

    My online friends recommended that I use Perl. I had tried Basic (of course) and C, but I didn't stick with them.

    Why do you still prefer Perl? What other languages apart from Perl do you program in, and how do they compare?

    Perl does what I want with a minimum of fuss. I can generally write nice clean code, but when I need to do something clever or impossible I can still do it. I feel like I am making steady progress while programming Perl.

    Currently the only other language I use is C; it doesn't compare! I wouldn't dream of using C for most stuff I do in Perl; I do dream of using Inline::C to optimize some stuff - I rarely do but it happens.

    What sort of things do you do with Perl now? What was the last Perl program you wrote?

    Content management, which generally means dealing with text. The last Perl program I wrote was an OODBMS based on BerkeleyDB, complete with transactions, version and release management, scheduled tasks, and monitors. It uses POE, of course.

    What got you into Perl development?

    I was getting bored. I needed a challenge! I also wanted to refresh my C knowledge. Perl is a very interesting project to work on: the great range of platforms it supports, the large source base, and the very intelligent people working on it. I have learned a lot during the past three months.

    Do you remember the first patch you submitted? Was it applied?

    I do not remember it, but yes, it was applied; Richard Soderberg found it for me. It was a tiny documentation patch a couple of years ago.

    What do you do for Perl development now?

    Mainly threading. I have released two modules on CPAN (threads and threads::shared) that give access to Perl's newer threading system. To make these work I have had to add some things to how Perl deals with threads. I am also starting to get involved in attribute stuff, and along my threads path I have stumbled over the regex/Perl interface and have been working on making it work better. More work will surely be done in this area.

    Why do you work particularly with threads?

    Because I have a real need for threads. I have a system that ideally should scale on an SMP system, and it should work on both Win32 and Unix. I am not a great fan of fork(), so I want threads. As Gurusamy Sarathy had already provided a lot of the work on USEITHREADS, I decided to try to finish it.

    I am very interested in concurrency issues and have worked on and with POE quite a lot.

    People often say that threading in Perl is unusable. What do you say to that, and how is it getting better?

    Sure, tell that to all the people who use fork() on Win32 for all kinds of stuff!

    Actually, I agree. 5.6.1 is very unusable for all kinds of complex applications with a lot of external APIs; it is unusable for people on platforms with unsafe C libraries. (From what I have been told, Win32 and Digital Unix have safe C libraries, which might explain why it works better on Win32.)

    The main problem has been the regex interface and match variables. This has now been solved. A lot of bugs have been fixed and I think 5.8 will be safe enough to start developing applications that are multithreaded. Of course, some people have been working on multithreaded applications since 5.5.3!

    What do you envisage Perl 6's threading model to look like?

    (We conducted this interview before the State of the Onion, in which Larry confirmed that the threading model would look like ithreads - Simon.)

    Something very similar to threads.pm/ithreads. I like the idea of only sharing what is explicitly declared to be shared. I think the idea could be taken a step further, and only guarantee the value of shared variables after a synchronization.

    Finally, what's the best thing about Perl and what's the worst thing about Perl?

    Perl is fun, Perl has CPAN! Bad things? I would like an optional strict typing system.
