May 2001 Archives

Turning the Tides on Perl's Attitude Toward Beginners


The Perl community has held tight to a "zero tolerance" policy for beginners. The transition to a more accepting, responsible community has begun. The past is behind us and the future looks brighter.

The Way We Were

As far back as I can remember, asking a question that has been answered before has been one of the many deadly sins of the Perl community. The general attitude was, "If the docs are good enough for me, they're good enough for you. RTFM." One first-time programmer could easily accumulate 10 flames in his inbox after asking why this code didn't print anything out:

  my $input    = <STDIN>;
  my $username = chop( $input );
  print "$username";

After said programmer has been flambe'ed to perfection they have to endure five more messages concerning the use of chop() and its evils, not to mention a handful of warnings about why putting double quotes around $username will cause famine in the land. Granted, these last few messages contain good information, but it's unlikely the beginner will even read these messages. Why would anyone want to subject themselves to more abuse when it's easier to delete the messages and move on to another programming language?
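For the record, the snippet prints nothing visible because chop() returns the character it removed (here, the trailing newline) rather than the shortened string. The following is a minimal sketch of what the beginner probably wanted, assuming the goal is simply to echo the name that was typed. The helper name clean_input is hypothetical, and a literal string stands in for <STDIN> so the example runs unattended.

```perl
use strict;
use warnings;

# Hypothetical helper: strip one trailing newline and return the cleaned string.
sub clean_input {
    my ($line) = @_;
    chomp $line;    # chomp removes a trailing newline in place, if there is one
    return $line;
}

# Stand-in for a line read from <STDIN>.
my $input = "casey\n";

# What the original code did: chop returns the removed character.
my $removed = chop( my $copy = $input );    # $removed is "\n", $copy is "casey"

print clean_input($input), "\n";
```

A two-line answer along these lines, sent without the flames, would have cost the list nothing.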

It seems that the very thing we want to have happen, adding numbers to our ranks, is the first thing we fight against when beginners show their faces. Wielding our swords of "RTFM" and shields of "killfile" we smite the very programmers that will carry this language into the future. I have a co-worker who is known for saying, "It's a good thing Perl is so powerful and cool, it barely makes up for the collective, childish 'elitism' displayed by its community." Collectively, this is a sad truth.

Many programmers end up turning to alternate languages and communities, ones that don't require them to carry fire trucks in their back pocket. For instance, Python has a mailing list designed to be a "Help Desk." It doesn't get much better than that for a beginner.

Another path to travel is the pay software route. If I pay Microsoft for their software, developer resources and customer support, they'll be nice to me. They won't call me a newbie. They won't prepare the clue stick; they'll give me all the answers I need. Believe it or not, folks, this is our competition, too. Forget about a single programmer turned away; this affects whole companies. A corporation has the wallet to turn away from Perl if it feels the support is "lacking."

One would figure that, with Perl's age, the community would have fostered a powerful, useful medium to ease the newcomer into our world. When a child is born, it requires extreme amounts of care, attention and guidance for several years. Most children aren't told to sit in the corner and RTFM until they get it; that's cruel and unusual punishment. Why then, must we be any different? We all have the ability to help programmers grow and mature, to shape their views and opinions. It's time for us to open up and give back to our trade. All of us have received gentle guidance; we are all capable of giving some back.

In short, no good has come of this.

A Step in the Right Direction

Last month, a few folks on the Perl Porters list decided enough was enough. Casey West set out in search of a new mailing list for Perl beginners. Some concerns were raised regarding the validity of the list but, overall, there was acceptance. Out of the blue, Ask Bjoern Hansen came barreling through the crowd announcing that the list had been created -- three days before. ;-)

It was time to find out if the Perl community needed a friendly, fire-free environment to foster growth and knowledge in the masses. It was time to find out if there were any masses. It was also time to find out if the Perl community would be willing to jump in and help. Don't you just love the suspense? On with the show!

beginners@perl.org

It has been one month and the statistics speak for themselves. Thirteen hundred messages were sent last month, sending my mbox to more than 3.5 MB. As far as Perl lists are concerned, this is near the highest-trafficked list. Of course, the Python "Help Desk" list generates a solid 4000+ messages a month, with excellence.

Under normal circumstances, generating that much beginner traffic would cause the list to combust, sending fireballs as far as the eye can see. Not this time. Established folks in the Perl community have made an effort to put the gloves on and play nice. As list baby-sitter, I have only had to slap a handful of wrists, and the people attached to them have responded honorably. Nearly all questions have been answered at least once and every customer seems to be satisfied.

beginners-cgi@perl.org

The beginners list recently split in two. The beginners-cgi list was created as a means to cut back on the amount of traffic. Many people were finding it hard to keep up with the flow of just one list. This is understandable considering we have over 1000 subscribers to the beginners list.

daily-tips@perl.org

Ask and I started the daily-tips mailing list. This list will be sending out daily mailings with simple tips for beginners. The tips will cover a wide range of topics, in the form of "Questions and Answers". We won't be mailing from the standard Perl FAQ; there's already a list for that. Instead, we'll be taking submissions from the community (that means you!) and mailing them. More information on this list can be found at http://learn.perl.org/.

beginners-workers

The beginners-workers mailing list is the place to send your daily-tips submissions, as well as your thoughts on how the Perl community should be helping beginners. Ask, Adam Turoff, Kevin Meltzer and I (and others) will be reading this list and responding if necessary. The Beginners Team is working hard on this list to get a number of initiatives rolling. Thanks to the people mentioned above, and many other folks for your continued help and support.

#perl-help

Another fine member of the Perl community, Kevin Lenzo, followed up the mailing list announcement with one of his own. He created an IRC channel named #perl-help on irc.rhizomatic.net and even attached the lovable purl bot to it. Kevin reports moderate and friendly traffic on this channel.

This is an exciting move, because it provides real-time results. The ability to interact with someone in real time makes this a complete win. Believe me, you won't go away disappointed from this channel: the people there are fast and fun to work with.

Code Review BOF at the Perl Conference

Peter Scott and I have teamed up to make the second TPC Code Review bigger and better than ever. A number of the top Perl people have volunteered to review code, free of charge. The format will be similar to last year's BOF. This is your opportunity to get advice from Perl's best without having to pay their rates.

If you are attending this year's TPC and you have some questions about your code, please stop by. It doesn't matter what your question is; all subjects are open. We are putting an emphasis on answering beginners' questions but, please, don't hesitate to bring your latest regex engine patch for review; we have people who can do that, too.

Onward and Upward

The Perl community has had a taste of success with the beginners list and IRC channels. We will hopefully solidify our newfound stance at the Perl Conference with the Code Review BOF. So what happens next? Are we finished? Not by a long shot.

Ask and I have begun work on http:, a modest site dedicated to helping people learn Perl. At the moment it is slightly bare, hosting only the beginners list FAQ, which is maintained by Kevin Meltzer. This is all about to change.

learn.perl.org will become a beginners documentation project. We are asking for contributions from the Perl community for tutorials. The topics of these tutorials may be broad, covering everything from text editors with Perl syntax highlighting to in-depth research on why one should use while to loop over files. We hope, with time, that learn.perl.org can mature into a central, community-led documentation project: the Perl Documentation Project.

Pioneering the Status Quo Revolution

It has been the general opinion of the open-source community that, if you can't find the answers yourself, you're lost. Read the documentation; if there is no documentation, read the source. If there is no source, you've stepped out of the world as we know it. If you're a greenhorn and you can't figure it out from the docs or the source, goodbye. And not just goodbye, but a plague on your house as well.

This attitude is just not good enough any more. Frankly, it's a terrible way to behave. Any child can play the "I'm not telling you, figure it out yourself" game, complete with fingers wiggling from the ears and a stuck out tongue. It's time for us to collectively grow up. As a community, we need to mature. We need to get up and help each other.

I admit, I have been on the giving end of the flamethrower several times in the past. It seemed like a fun idea at first, but it really isn't. After I sent flames, I felt terrible, often going back to the recipient and asking forgiveness. That's just what it did to me; I hate to think about how it made the other guy feel. What kind of impression did I give him about me, about the Perl community, about open source? What we do in public really does reach that far: We are responsible for the people we represent, like it or not. So let's all grow up and represent them well.

This reaches far beyond the walls of Perl (no pun...). I am beginning to see the awareness all around the open-source community. We are in need of a change and the challenge is here. The Perl community has begun its journey; let's be the ones to carry it through the rest of open source, and beyond.

Closing Thoughts

My dream is for a Perl community that stands together with kindness and open arms. Many may feel this is a goal too lofty for us. I couldn't disagree more. We have shown the world that we can do just about anything; this should be easy. I know I'm not the only one who shares this opinion.

"But, paradoxically, the way in which Perl helps you the most has almost nothing to do with Perl, and everything to do with the people who use Perl. Perl folks are, frankly, some of the most helpful folks on earth. If there's a religious quality to the Perl movement, then this is at the heart of it. Larry wanted the Perl community to function like a little bit of heaven, and he seems to have gotten his wish, so far. Please do your part to keep it that way." -- Preface, Programming Perl 2nd Edition

This Week on p5p 2001/05/27



Notes

You can subscribe to an email version of this summary by sending an empty message to perl5-porters-digest-subscribe@netthink.co.uk.

Please send corrections and additions to perl-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year and month. Changes and additions to the perl5-porters biographies are particularly welcome.

This was a week of many little threads consisting of bug reports and fixes, with 360 messages.

Attribute tying

Artur Bergman, while working on the new iThreads threading model, has reported that variable attributes are not usable for tying.

Variable attributes are an experimental feature of Perl that provide a means of associating "out of band" information with a variable. The problem, as Nick Ing-Simmons pointed out, is that in the current implementation the attribute code is called at compile time (this is also where the tie happens), but the scope cleanup then removes the "magic" from the variable.

Some discussion ensued on the fact that the code was working as designed, but that the design needed to be expanded slightly, by removing the restriction that the attribute callbacks were designed to be called at compile time. Gurusamy Sarathy provided a possible solution by adding another hook, and I documented the brokenness.

When is a bug a bug?

Michael Stevens and Larry Virden submitted bug reports via the perlbug interface for a bug which was only present in perl-current. Perl-current (also known as bleadperl) is the absolute latest development version of Perl and, as perlhack mentions, "is usually in a perpetual state of evolution. You should expect it to be very buggy". Jarkko mentioned that he didn't think submitting perlbugs on bleadperl was a good idea:

Also, I don't think submitting perlbugs on rsynced snapshots is a good plan. If one is playing with the snapshots, one is playing with the bleeding edge, and one should directly send a report to p5p, not as a full perlbug report.

The rationale: the perlbug database is already working "too well" :-) by being too full of bugs that strictly speaking aren't. I don't want the database to clutter up with noise from volatile snapshots. I cannot and will not guarantee that every check-in I make is free from test failures. For the announced snapshots I try harder.

Philip Newton reminded us that the point of development releases is to find and fix bugs. Merijn Brand provided a patch to include the patch snapshot level into perlbug reports. Jarkko releases snapshots of bleadperl a couple of times a week, the latest being called 10210, and including the level will help find the bug being reported.

Test::Harness cleanup

Ilya Zakharevich provided a patch to clean up the output of the test harness, fixing the alignment of the fields and making it easier to read reports with failures. Michael Schwern disagreed about which patches were necessary, leading to a small tug of war. As Jarkko pointed out, "the suggested patches are not converging".

Time::Local

Uros Juvan reported a bug in the way Time::Local handles invalid dates: it would silently return a different date. For example, asking for February 30th would return March 2nd without giving a warning. The module contains tests which are a little too primitive, and a patch was supplied by Stephen Potter.
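To illustrate the kind of check involved, here is a minimal sketch of validating a calendar date before handing it to a conversion routine. It is not Stephen Potter's patch and does not call Time::Local at all; the helper names is_leap_year and is_valid_date are hypothetical, and the month is 1-based here, unlike the 0-based month that Time::Local expects.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $year is a leap year in the Gregorian calendar.
sub is_leap_year {
    my ($year) = @_;
    return 1 if $year % 400 == 0;
    return 0 if $year % 100 == 0;
    return $year % 4 == 0 ? 1 : 0;
}

# Hypothetical helper: true if ($year, $month, $day) names a real date.
# $month runs from 1 to 12.
sub is_valid_date {
    my ( $year, $month, $day ) = @_;
    return 0 if $month < 1 || $month > 12 || $day < 1;
    my @days_in_month = ( 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 );
    my $max = $days_in_month[ $month - 1 ];
    $max = 29 if $month == 2 && is_leap_year($year);
    return $day <= $max ? 1 : 0;
}

print is_valid_date( 2001, 2, 30 )
    ? "looks fine\n"
    : "February 30th is not a date\n";
```

A caller could refuse, or warn, when is_valid_date returns false instead of quietly rolling over into the next month.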

Various

John Peacock submitted a patch to make sure magic is removed at scope exit. Mike Guy supplied a patch to support qualified variables in "use vars", somewhat controversially (a similar patch for "our" by Mark-Jason Dominus last year was rejected). He also supplied a patch to remove the long deprecated uppercase aliases for the string comparison operators EQ, NE, LT, LE, GE, GT etc. This led to the most amusing idea of the week: Jarkko suggested testing the Perl parser with some Markov chain (n-characters) generated Perl-like gibberish: "That way we get a lot of data that constantly begins to look like valid Perl but then switches back to not being Perl". No-one provided such a Markov chain, unfortunately.
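For readers curious what such a generator might look like, here is a minimal sketch of a character-level Markov chain. It is an illustration only, not anything posted to p5p: the function names build_chain and generate, the order of 3 and the tiny training string are all assumptions.

```perl
use strict;
use warnings;

# Map every $order-character window in the training text to the list of
# characters that follow it.
sub build_chain {
    my ( $text, $order ) = @_;
    my %chain;
    for my $i ( 0 .. length($text) - $order - 1 ) {
        my $state = substr( $text, $i, $order );
        my $next  = substr( $text, $i + $order, 1 );
        push @{ $chain{$state} }, $next;
    }
    return \%chain;
}

# Walk the chain from a seed, emitting at most $length characters in total.
# The seed's length is taken as the order of the chain.
sub generate {
    my ( $chain, $seed, $length ) = @_;
    my $order = length $seed;
    my $out   = $seed;
    while ( length($out) < $length ) {
        my $state   = substr( $out, -$order );
        my $choices = $chain->{$state} or last;    # dead end: no known successor
        $out .= $choices->[ rand @$choices ];
    }
    return $out;
}

# Toy training text; a real run would read a large body of Perl source.
my $sample = 'my $x = 1; my @y = ($x, 2); print "@y\n" if $x; my %h = (a => $x);';
my $chain  = build_chain( $sample, 3 );
print generate( $chain, substr( $sample, 0, 3 ), 60 ), "\n";
```

Each generated character is one that really followed the preceding three in the training text, which is why the output keeps starting to look like Perl before wandering off.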

Michael Schwern submitted some more minor patches, including trying to get Perl to compile cleanly under -Wall.

Hugo proposed removing $* (PL_multiline), which has been deprecated since at least as far back as 5.003_97.

Gisle Aas patched Perl to allow overriding of require to be delayed slightly to increase its usefulness.

Colin P McMillen asked if Perl's sort function was intended to be stable, which resulted in a documentation patch by John P. Linderman stating "Perl does not guarantee that sort is stable".
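Code that needs a stable result without relying on the sort implementation can break ties on the original position. The sketch below assumes hypothetical [name, score] records and a hypothetical helper named stable_by_score; it sorts the indices rather than the items so that the original order is available as a tie-breaker.

```perl
use strict;
use warnings;

# Hypothetical records: [name, score]. Two pairs of records share a score.
my @records = ( [ 'ann', 2 ], [ 'bob', 1 ], [ 'cat', 2 ], [ 'dan', 1 ] );

# Sort the indices by score, falling back to the index itself on a tie,
# so the outcome does not depend on whether sort happens to be stable.
sub stable_by_score {
    my @items = @_;
    my @order = sort {
        $items[$a][1] <=> $items[$b][1] or $a <=> $b
    } 0 .. $#items;
    return @items[@order];
}

print join( ' ', map { $_->[0] } stable_by_score(@records) ), "\n";
# prints "bob dan ann cat"
```

Records with equal scores come out in the order they went in, whatever algorithm sort uses underneath.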

Richard Soderberg patched a bug found by Mark-Jason Dominus where a localized glob loses its value when it is assigned to.

There were various other minor patches, but I think most people have been relaxing in the sun this week. Until next week I remain, your temporarily-replaced humble and obedient servant,

Leon Brocard


This Week in Perl 6 (20 May - 26 May 2001)



Notes

You can subscribe to an email version of this summary by sending an empty message to perl6-digest-subscribe@netthink.co.uk.

Please send corrections and additions to perl6-thisweek-YYYYMM@simon-cozens.org, where YYYYMM is the current year and month.

Perl "Assembly Language Standard"

A.C. Yardley submitted a proposal for documenting PDDs with the very-low-level operations of the Perl Virtual Machine itself. (This is a separate idea from the assembly language that will also be written.) A.C. will continue to work on it.

The "Perl Apprenticeship Program" Revisited

A.C. Yardley also revisited an old thread about a Perl Apprenticeship Program.

Nat Torkington:

Not to speak for Dan, but there's no code yet to review or learn from. I'd love to see someone set up a perl *5* apprentice program, and Mark-Jason Dominus has some ideas on how it might work. For perl6, though, we're not yet at a place where I think it makes sense. Right now there's so little defined in the way of implementation, that questions can be asked and answered on the mailing list.

Simon, however, felt that the group was almost at the point where some coding could start, and suggested that folks whisk through the sv.c, av.c, and hv.c Perl 5 code (describing scalar, array, and hash functionalities, respectively) to summarize what actually needs to be implemented.

Dave Mitchell and A.C. Yardley accepted the challenge.

Perl Virtual Registers

Dan Sugalski pushed for a register-based Virtual Machine, rather than a stack-based machine, with little real dissension on that point. Dan, however, wanted typed and linked registers, but opinions were mixed on having typed registers, and generally opposed to having them linked.

Uri Guttman and Nick Ing-Simmons then took over the thread to banter about dealing with the stack/register window that will be necessary with threads and register overflows.

Here's a snippet from Uri's last response:

NI> That makes sense if (and only if) virtual machine registers are real 
NI> machine registers. If virtual machine registers are in memory then 
NI> accessing them "on the stack" is just as efficient (perhaps more so)
NI> than at some other "special" location. And it avoids need for 
NI> memory-to-memory moves to push/pop them when we do "spill".

no, the idea is the VM compiler keeps track of IL register use for the purpose of code generating N-tuple op codes and their register arguments. this is a pure IL design thing and has nothing to do with machine registers. at this level, register windows don't win IMO.

And a snippet from Nick's:

... My point is that UNLESS machine (real) machine registers are involved then all IL "Registers" are in memory. Given that they are in memory they should be grouped with and addressed-via-same-base-as other "memory" that a sub is accessing. (The sub will be accessing the stack (or its PAD if you like), and the op-stream for sure, and possibly a few hot globals.)

The IL is going to be CISC-ish - so treat it like an x86 where you operate on things where-they-are (e.g. "on the stack")

   add 4,BP[4]

rather than RISC where you

   ld BP[4],X
   add 4,X
   ST X,BP[4]

If "registers" are really memory the extra "moves" of a RISC scheme are expensive.

Slices

Raul Miller threw out some ideas about slice syntax. There was some minor discussion centered around whether an array index in the new syntax is in scalar or list context by default.

This led to David Whipp proposing a new index context, to parallel the other contexts that Perl is aware of.

Slice References

Peter Scott bridged a posting from the perl beginners list, asking if it would be possible to allow non-copying slice references in Perl 6.

Damian Conway, of course, said, of course, "Of course":

@A = (1..10);   # array
sub sliceref
 {my($i,$o,$k)=(0,0,pop);$i+$o-$k->[$i]?++$o&&splice@_,$i,1:$i++while$i<@_;\@_}
my $ref = sliceref @A, [3..5,9];        # reference to slice
print "@A\n";
print "@$ref\n";
$ref->[1] = 99;
print "@A\n";
print "@$ref\n";
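The one-liner above is deliberately dense. A more readable sketch of the same idea follows; it is not Damian's code, and it assumes only that Perl 5 aliases the elements of @_ to the caller's arguments, so that a slice of @_ passed to an inner sub is still aliased to the original array.

```perl
use strict;
use warnings;

# Readable variant (an illustration, not the posted code): the last argument
# is an array ref of indices, the rest are aliases to the caller's elements.
sub sliceref {
    my $indices = pop;
    # The inner sub's @_ is aliased to the selected elements, so a reference
    # to it behaves like a reference to the slice itself.
    return sub { \@_ }->( @_[@$indices] );
}

my @A   = ( 1 .. 10 );
my $ref = sliceref( @A, [ 3 .. 5, 9 ] );
$ref->[1] = 99;    # writes through to $A[4]
print "@A\n";      # prints "1 2 3 4 99 6 7 8 9 10"
print "@$ref\n";   # prints "4 99 6 10"
```

No elements are copied: the referenced array and @A share the same four scalars.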

Properties Continued

The head-wrapping continued on the new properties feature of Perl 6. Although Damian gave some encouraging answers, he eventually posted:

I have already been discussing this with Larry and have privately sent him a complete proposal that, I believe, addresses all these issues. Let's wait and see what he makes of that proposal.

The Parrot Squawks

There has long been confusion between Perl (the community), Perl (the language), and perl (the program). This is only exacerbated on the inside with various bits and pieces of Perl's (and perl's) guts. Add the complexity that Perl 6 is promising to add, and you've got a maze of twisty little passages, all named "Perl."

Now, a while back, Larry countered a suggestion to use Parrot as the project name for Perl 6 with a list of codenames for various pieces, but it seems that Dan has been slipping it in anyway. (The name Parrot, of course, is from Simon's tomfoolery earlier this year.) So it seems that at least the Perl 6 internals have a new moniker.

Until next week...


Bryan C. Warnock

Taking Lessons From Traffic Lights

O'Reilly Open Source Convention

Michael Schwern will be speaking at the O'Reilly Open Source Convention in San Diego, CA, July 23-27, 2001.

Excuse me, I'm going to ramble a bit about traffic lights as they relate to language design to see whether something interesting falls out.

I was riding my bike to work today and started thinking about the trouble we were having with string concatenation in Perl 6 [1] and how the basic problem is that we've run out of ASCII characters. I briefly thought about Unicode, but it's not nearly well enough supported.

Then I stopped at a traffic light (the police in Belfast get annoyed by bikes blowing through lights, even at 2 a.m. on an otherwise empty road. And they drive unmarked cars). Traffic lights convey their signals through color. I briefly thought it would be neat to convey Perl grammar via color, but that can't be done for similar reasons to Unicode.

Color Coding vs. Position

The interesting thing about traffic lights is that the color is just for clarification. The real communication is through position. Stop on top, caution in middle, go at the bottom (some communities do this a bit differently, but it's all locally consistent). This is important, because a large section of the population is color blind, but even if you saw a traffic light in black-and-white you could still make out the signals by their position.

If you ask anyone on the street what "stop" and "go" are on a traffic light, they'll probably say 'red' and 'green' without even thinking. But if you asked them to draw a traffic light they'd be sure to put the red on top and green on bottom. It's interesting that although we respond strongly to color cues, we subconsciously remember the positional cues. It's especially interesting given that we're never actually taught "go is on the bottom".

There's a thin analogy to syntax highlighting of code, where the color is just there to highlight and position conveys the actual meaning.

This idea of having redundant syntax that exists to merely make something easier to remember is perhaps one we can explore.

Sequence


Now, color and position aren't the only tools a traffic light has. Order is another. Stop, Go, Caution, Stop, Go, Caution ... that's the way it goes. Again, most people know this subconsciously even if they've never been taught it or ever thought about it. It's a pattern that's picked up on and expected. Go follows Stop. Caution precedes Stop. If a light were to suddenly jump from Go to Stop, drivers would be momentarily confused and indecisive.

The lesson there is simple: be consistent.

So while order doesn't convey any extra information, its consistency can be important. I experienced this when I came to Belfast. The lights here go: Stop, Caution, Get Ready, Go, where "Get Ready" is conveyed by a combination of red and yellow. Very useful (especially if you drive a stick or have toe-clips on your bike and need a moment to get ready) but a bit confusing the first few times.

This directly contradicts the above lesson: eschew consistency if it's going to add a useful feature. People may be taken aback the first few times, but the utility will shine through in the long run. This could be considered a learning curve.

Combinations

Which brings us to another tool: combinations. Although rarely done, you can squeeze more meaning out of a set of lights by combining them. Just like a three-bit number. The red-yellow combination is the only one I can think of, and probably rightly so. While there are still three more combinations available, they would rapidly get confusing if used.
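The three-bit arithmetic can be checked with a short sketch. It treats each light as one bit, skips the all-off pattern, and assumes the signals in use are each light alone plus red-yellow; the variable names and the "+" notation are made up for the illustration.

```perl
use strict;
use warnings;

my @lights = qw(red yellow green);

# Signals assumed to be in use: each light on its own, plus red-yellow.
my %used = map { $_ => 1 } ( 'red', 'yellow', 'green', 'red+yellow' );

my @unused;
for my $bits ( 1 .. 2**scalar(@lights) - 1 ) {    # 1..7; 0 would be all lights off
    my @lit = grep { $bits & ( 1 << $_ ) } 0 .. $#lights;
    my $combo = join '+', map { $lights[$_] } @lit;
    push @unused, $combo unless $used{$combo};
}

print scalar(@unused), " unused: @unused\n";
# prints "3 unused: red+green yellow+green red+yellow+green"
```

Seven lit patterns exist, four are taken, and the three left over all include green alongside another light.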

Perhaps the lesson is: Just because you can wedge more meaning in doesn't mean you should.

The final method of communication is flashing. Flashing red is like a stop sign. Flashing yellow, proceed with caution. I don't think flashing green is ever used, and I don't know what it could mean [2]. Most flashing lights are there to draw attention. Emergency vehicles, gaudy advertisements, navigation lights. Flashing signals are deliberately jarring. They're also rarely used in combination with the normal signals. This is very important. The normal confusion associated with a break in the pattern isn't there since the normal pattern is totally absent. The meaning of the flashing signals is close to their normal solid meaning, which allows most drivers to know what they mean without thinking about it.

Flashing signals are also rather rare. They're used at times when there are few cars on the road (late at night) or on roads that carry little traffic.

The lesson there is that, if you're going to be inconsistent, make sure you don't do it in a way that will mix with the normal pattern of things. Think about how the inconsistent feature will be used and make sure it will be used in spots that are distanced from normal use. Also, the potential uses of the inconsistent feature should be relatively rare.

As an aside, when I was young and on vacation with my family, we visited my uncle in the middle of the Mojave Desert. He worked at a borax mine (yep, mining for soap). Aside from the 40-foot-high dump trucks, the thing I remember most is the speed-limit signs. Has to be 15 years ago and I still remember this. The speed limit was "12 1/2 MPH." Why? Not because you'll get pulled over if you go 13 MPH, but so you'll take notice. Consistency can ease understanding, but it can also encourage complacency.

Back to flashing. Although it will vary from light to light, a single light will only use one frequency. They could use more. In fact, an almost infinite amount of information could be conveyed by various frequencies of flashing. This technique is in active use today by most Navies of the world as a method of secure, short-range ship-to-ship communication. Powerful signal lamps with shutters are used to flash Morse code between ships. More commonly, FM (Frequency Modulation) radio is essentially just a big flashing traffic light.

Traffic lights chose to use only one frequency. Why? Simplicity. There is flashing and there is not flashing. That's it. Easy to remember, but more importantly, easy (and quick) to recognize. Very important when a car is approaching at 65 mph.

There comes a point when you're cutting the syntax too fine. When the distinctions between one meaning and another take too much careful examination to distinguish. A good example being the string concat proposals that wanted to use certain whitespace combinations to distinguish, or special uses of quotes. Perl 6 must be careful of this sort of thing as we strive to shove more and more information into just 95 lights (the set of printable ASCII characters). There's a reason the Navy employs highly trained men on those signal lamps.

Sound as Syntax or Grammar

Finally, there's sound. I lived in Pittsburgh for a while near a school for blind children and a clinic for the blind. The major intersections for a few blocks around all had the normal walk-don't-walk pedestrian signals, but these were a bit different. Rather than the usual Walk with the green, Don't Walk with the red, it would be Don't Walk in all directions. When Walk came on, all lights would go red and pedestrians could cross in any direction.

This was accompanied by a distinct, very loud "koo-koo" sound to let the blind know it was time to cross. Also, there was a speaker at each corner to give them something to walk toward.

Sound as syntax. It would be interesting to use sound as grammar, especially since we already have a grammar to represent sound (i.e. sheet music). However, I don't know about you, but I'm not about to start dragging around a Moog with me to code Perl ... though playing chords to represent code would be neat. Imagine the twisted noises coding a particularly nasty regex might produce.

Rambling along this thread, it has been reported that London.pm recently attempted to encode DeCSS as an interpretive dance. Perhaps DeCSS will surpass the Tarantella as the "Forbidden Dance".

There are also some attempts at translating Perl code to music. I think someone hooked Perl code run through the DNA module into something that generates music from genetic code. But I digress.

The Function of Traffic Lights

Let's think about what traffic lights are used for. Well, they're there to control the flow of traffic. Theoretically, you'd want to use them to maximize use of the available roads and get the most cars to their destinations as quickly and efficiently as possible. You'd do things like time the lights down a stretch of road so someone going the speed limit never hit a red light. You'd want to keep the lights green as long as possible on the major roads at intersections. You'll want sensors to detect when no car is waiting for a red so it can keep the other side green longer. By doing this you'll get everyone coordinated and moving about quickly.

But then traffic lights can be used for the exact opposite. They can be used to deliberately slow and minimize traffic, say around a school or in a shopping district with lots of pedestrians. Lights will be deliberately set to prevent drivers from going continuously down a road, making them stop often and keeping their awareness up (but perhaps their frustration and gasoline consumption as well). All sides of an intersection can be stopped at once to allow pedestrians to pass safely. Flashing yellows can be employed to warn of a school zone. Lights can be placed at dangerous intersections (or ones where children are known to be about) even though drivers should be able to self-regulate.

So the same device and features can be combined and used in different ways to produce contradictory effects. Perl 6 must have this nature, as is clearly evident from the wildly differing and contradictory RFCs presented, often in direct opposition to each other. We should design Perl features to be like traffic lights. The same feature can be used for different and contradictory effects. This will ease the pressure to squeeze more and more grammar out of our limited syntax possibilities.

Oddly enough, varying the number of traffic lights can affect efficiency. By over-regulating you can choke off traffic. Constant fiddling with the setups and timings, trying to control each and every intersection to maximize throughput, leads to gridlock: zero throughput. The exact opposite of what was intended.

We are in danger of doing just that. By wanting to correct, streamline and optimize each bump and snag in Perl we may cross some imaginary line and have syntactical gridlock, where the language design descends into a morass of continual minor adjustment. By backing off we can often find a much more sweeping solution than just putting up lights on each corner. A perfect example is Larry's "module" solution to the Perl 6 vs. Perl 5 interpretation (although it still needs a few extra lights here and there to make it really work).

Life without Traffic Lights

There is an alternative to all this. I've been working in Ireland for the past three months, and like most Americans I have met that peculiar English invention, the roundabout. [3] Three, four, five, even six-way intersections are handled seamlessly by this apparently anarchistic piece of the transportation landscape. All without any traffic lights.

The first few times, I fearfully crept across, pushing my bike along as a pedestrian, too frightened to try and enter the unending flow of traffic. After a while, and asking around a bit, the underlying rule became obvious: yield to traffic in the circle. With this revelation I was able to zip through confidently. I rather like them now and appreciate how they keep the traffic flowing even for the most complicated intersections. The apparent complexity of the details (lots of cars zipping about, merging and leaving from many points) all stems from a single rule.

Contrast this with the typical American four-way intersection. Roads placed at right angles, traffic lights poised in each direction. Cars jerk forward hesitantly and the system rapidly breaks down under heavy traffic. Initially easier to learn, anyone can understand a traffic light, but the devilish complexity of right-of-way and subtleties of making a proper left turn betray that what seems at first simple, might actually be clunky in the long run. And that which seems complex and frightening will yield its underlying simplicity with time and experience.

So the lesson there, aside from "roundabouts are neat" is about learning curves and the wisdom of focusing design on "beginners." While effort must be made to flatten the learning curve, don't short-change the ultimate goal just to make it easier initially. After all, we are only beginners for so long. [4]

Footnotes

[1] It has been decided that . will be used instead of -> for method calls in Perl 6. This leaves the problem of how to concatenate strings. Everyone and their dog seemed to have a proposal on the perl6-language mailing list, all of them a bit contrived as we've run out of characters.

[2] It has been reported by two sources that there is a flashing green light outside of Boston that means "this light will rarely go red."

[3] Some may call them "traffic circles." Most Americans know them best from "National Lampoon's European Vacation" ("Look kids! Big Ben, Parliament!"). They're used in a couple of places in the U.S.: Oregon, Florida, New England. "Modern Roundabouts" gives a nice explanation and visualization from an American point-of-view (i.e. the right side of the road).

[4] Of course, this can get a little out of hand: "Swindon's Magic Roundabout"

This Month on Perl6 (1 May - 20 May 2001)


Notes

You can subscribe to an email version of this summary by sending an empty message to perl6-digest-subscribe@netthink.co.uk.

Please send corrections and additions to perl6-thisweek-YYYYMM@simon-cozens.org, where YYYYMM is the current year and month.

It's been two months since we've seen a Perl 6 Summary, but it's certainly not for lack of activity. Instead, Simon's been very busy, er, being Simon, so I'm going to be hacking the summaries for a while. Since there's been a lot of traffic flowing since we last aired, we're going to skip April, and give a glossy on what's gone on so far in May. The weekly summaries should resume next week.

Perl 6 Internals

All was fairly quiet with the -internals list, so I'm going to dip back into the tail end of April for some significant events. Dan Sugalski pointed everyone to an updated preview of the upcoming Garbage Collection PDD. Dave Storrs also released a first rough cut on a debugger PDD. After feedback from Jarkko and Dan on some scope tweaks to the PDD, Dave went back to work on it. Dave Mitchell proposed "PDD: Conventions and Guidelines for Perl Source Code" to try to establish some coding standards for Perl 6. Reaction was mostly silent assent (we presume), except for a peripheral discussion centered around where macros fall on the scale from Good to Bad.

Perl 6 Meta

It was much louder on the -meta side of the house. Peter Scott wondered aloud whether Perl 6 was enough of a name change to reflect the apparent differences between Perl 5 and the new language. After some half-digested suggestions on version numbers, code names, and even a departure from the Perl name itself, Nathan Torkington dropped a teaser for Damian Conway's soon-to-be Exegesis 2, and Larry expressed that he didn't feel the language was going to look all that different.

Adam Turoff later opined:

I don't think backwards compatibility is the point here.

I picked up Camel 1 recently, and it was quite amazing how different Perl4 *felt*. It's like Perl was being pitched as a good language for writing standalone programs or utilities of standalone programs (the type a sysadmin would use). It didn't feel like it was being offered as the kind of language to write things like Class::Contract, Inline::C, AxKit, SOAP::Lite or the all-singing-all-dancing CGI.pm.

The ensuing discussion attempted to answer the $64,000 question: In the effort to "make the hard things easy," were the easy things becoming harder? Was the entry barrier to Perl being raised too high as a side-effect of the sheer capabilities and complexity of the language? There were plenty of arguments on all five sides of the coin.

Michael Schwern attempted to clarify some of the concerns by pointing out that "you will not have to rewrite your Perl 5 programs", since Larry had apocalyzed Perl 5 syntax handling for Perl 6, as well.

Nathan then posted some sample code to show exactly how unchanged common Perl will be.

Perl 6 Language

The -language list saw the bulk of the activity, with over 500 messages during the three week period.

Apocalypse 2

As expected, Larry's Apocalypse 2 generated a lot of response. Most of it was Q&A and general water cooler talk, but a couple of things did bubble to the top.

Richard Proctor pointed out an ambiguity between the new and old uses of \Q. At the thread's end, it looked as though Larry was in the process of consolidating all quoting constructs into one meta-syntax, although nothing specific was put forth.

Nathan Wiger was the first to express concerns about the new syntax for the I in I/O. (One of the basic issues for several folks was the verbiage needed to accomplish something as common as reading input.) Larry proffered some musings in three separate replies.

Nathan also asked for clarification on context and variables, which both Larry and Damian provided.

Exegesis 2

Damian Conway's Exegesis 2 also spawned a lot of traffic. With few exceptions, however, the discussion was completely focused on properties and the new is keyword. Responses were varied as to what level of confusion the author was in, but the overall trend was fairly static - properties on values versus variables, and how it all comes together. Damian posted one last explanation.

Sandboxing

David Nicol asked whether Perl 6 should support sandboxing, or if it should rely on the underlying OS.

After some minor debate, the general consensus was that Perl should support its own sandboxing, which should be (relatively) trivial, at least for security concerns. There was some expressed worry, however, as to how to handle resource limitations.

Undecorated References

David Nicol also suggested that references be undecorated, in an effort to reclaim $ as a content hint, as well as a contextual one. After a fairly meandering discussion on Perl 5, Perl 6, and Hungarian notation, Larry said:

I happen to like $ and @. They're not going away in standard Perl as long as I have anything to do with it. Nevertheless, my vision for Perl is that it enable people to do what *they* want, not what I want.

Lexing and Pushdown Expressions

Daniel Wilkerson requested better parsing capabilities to be built into Perl. There was general agreement that Perl had some weaknesses in these areas, but some trepidation on solving it beyond how current Perl modules do.

Miscellaneous

Alexander Farber asked for last to work in grep. After a couple demonstrations on how this was possible in Perl 5, the thread shifted to talking about compiler optimizations.

David Grove queried about the similarities between Perl 6 and .NET. After a couple responses on how this makes sense, the thread shifted to talking about data serialization.

Perl 6 Docs Released

Here are some of the major documents released during this period:

Apocalypse Two, Larry Wall.

Exegesis Two, Damian Conway.

PDD Debugger, Dave Storrs (rough draft, v1)

PDD Conventions and Guidelines for Perl Source Code, Dave Mitchell (Proposed, v1)

Until next week, I remain Simon's humble, and (mostly) obedient, servant,


Bryan C. Warnock

Exegesis 2

Editor's note: this document is out of date and remains here for historic interest. See Synopsis 2 for the current design information.


exegesis: n. an interpretation and explanation of a text, esp. Holy Writ

This is the first of a series of articles paralleling Larry's "Apocalypse" encyclicals (it's numbered 2 to keep it in sync with those Revelations). These articles will take each unveiled piece of the design for Perl 6 and demonstrate the new syntax and semantics in an annotated program.

So, without further ado, let's write some Perl 6:

        # bintree - binary tree demo program 
        # adapted from "Perl Cookbook", Recipe 11.15
        use strict;
        use warnings;
        my ($root, $n);
        while ($n++ < 20) { insert($root, int rand 1000) }
        my int ($pre, $in, $post) is constant = (0..2);
        print "Pre order:  "; show($root,$pre);  print "\n";
        print "In order:   "; show($root,$in);   print "\n";
        print "Post order: "; show($root,$post); print "\n";
        $*ARGS is chomped;
        $ARGS prompts("Search? ");
        while (<$ARGS>) {
            if (my $node = search($root, $_)) {
                print "Found $_ at $node: $node{VALUE}\n";
                print "(again!)\n" if $node{VALUE}.Found > 1;
            }
            else {
                print "No $_ in tree\n";
            }
        }
        exit;
        #########################################
        sub insert (HASH $tree is rw, int $val) {
            unless ($tree) {
                my %node;
                %node{LEFT}   = undef;
                %node{RIGHT}  = undef;
                %node{VALUE}  = $val is Found(0);
                $tree = %node;
                return;
            }
            if    ($tree{VALUE} > $val) { insert($tree{LEFT},  $val) }
            elsif ($tree{VALUE} < $val) { insert($tree{RIGHT}, $val) }
            else                        { warn "dup insert of $val\n" }
        }
        sub show {
            return unless @_[0];
            show(@_[0]{LEFT}, @_[1]) unless @_[1] == $post;
            show(@_[0]{RIGHT},@_[1])     if @_[1] == $pre;
            print @_[0]{VALUE};
            show(@_[0]{LEFT}, @_[1])     if @_[1] == $post;
            show(@_[0]{RIGHT},@_[1]) unless @_[1] == $pre;
        }
        sub search (HASH $tree is rw, *@_) {
            return unless $tree;
            return search($tree{@_[0]<$tree{VALUE} && "LEFT" || "RIGHT"}, @_[0])
                unless $tree{VALUE} == @_[0];
            $tree{VALUE} is Found($tree{VALUE}.Found+1);
            return $tree;
        }

It's Perl, Jim, and quite like we know it

The program gets off to a familiar start:

        use strict;
        use warnings;
        my ($root, $n);
        while ($n++ < 20) { insert($root, int rand 1000) }

Nothing new here. And, in fact, despite the many new features it illustrates, overall this program looks and feels a great deal like Perl 5 code.

That shouldn't really be surprising, given that Perl 6 is growing out of the suggestions of hundreds of devoted Perl 5 programmers, filtered through the mind that invented Perl 5.

As RFC 28 suggested, Perl is definitely going to stay Perl.

Any variables to declare?

Variable declarations in Perl 6 can be as simple as those for $root and $n above, but they can also be much more sophisticated:

        my int ($pre, $in, $post) is constant = (0..2);

Here we declare three variables that share a common type (int) and a common property (constant). Typed lexicals are a feature of Perl 5 too, but having names for Perl's built-in types is new.

The type specification tells the compiler that $pre, $in, and $post will only ever be used to store integer values. And because int is in lower-case, the specification also tells the compiler that it's okay to optimize the implementation of the variables, because we promise not to bless them or ascribe any run-time properties to them. Making this promise and then breaking the rules later in the program will get you a compile-time or run-time error (depending on whether the compiler can detect the malfeasance statically).

If we had not been willing to live without the blessing of bless-ing or the useful run-time properties of run-time properties, we would have written:

        my INT ($pre, $in, $post) is constant = (0..2);

in which case we'd get three less-optimized, but fully-functional, Perl scalars.

In this particular case, the int/INT distinction makes very little practical difference. However, there's a significant advantage to writing:

        my int @hit_count is dim(100,366,24);

compared to:

        my INT @hit_count is dim(100,366,24);

and thereby replacing nearly a million chunky scalars with svelte raw integers.

La propriété c'est le vol

The is constant and is dim bits of the above declarations are compile-time property specifications. These particular properties are standard to Perl 6, but you can also roll-your-own. The is dim property tells Perl the (fixed!) dimensions of the array in question. The is constant property specifies that the preceding variables cannot be assigned to, nor have their values otherwise modified, once they're initialized.

Moreover, the constant property is a hint to the compiler that it may be able to optimize the variables right out of existence, by inlining their values directly. Of course, that's only feasible if we don't ever treat them like a real variable (e.g. take a reference to them, or bless them).

The is keyword is optional where its absence is unambiguous, so we could have written:

        my int ($pre, $in, $post) constant = (0..2);

Larry's also still mulling over a suggestion that are be provided as a synonym for is, so you might even be able to write the declaration as:

        my int ($pre, $in, $post) are constant = (0..2);

An important feature of the is operator that we'll make use of shortly, is that it returns its left operand. So:

        $submarine is Colour('yellow')

evaluates to $submarine, not 'yellow'.

More of the same

The three calls to show are also exactly as they were in Perl 5:

        print "Pre order:  "; show($root,$pre);  print "\n";
        print "In order:   "; show($root,$in);   print "\n";
        print "Post order: "; show($root,$post); print "\n";

Happily, we're going to see that a lot throughout this series of articles.

Biting off less so you can chew

Do you ever get tired of writing:

        while (<>) {            # Common Perl 5 idiom
                chomp;
                ...

Wouldn't it be nice if input lines were automatically chomped? In Perl 6, they can be. We just set the chomped property on the input handle referred to by the global variable $*ARGS:

        $*ARGS is chomped;

This causes any normal read on the handle (see Inputs that output) to automatically pre-chomp the string it returns. Of course, like most other global punctuation variables, $/ has been banished from Perl 6, so the trailing character sequence to be chomped is specified by the handle's own insep (input separator) property instead.

The asterisk in $*ARGS indicates that the variable is the one from the special global namespace. If the asterisk is omitted, it's probably still the one from the special global namespace -- unless you declared a lexical or package variable of the same name. You can pronounce * as "standard", if that helps.

By the way, it's called $*ARGS because it lets us access the files passed as a Perl 6 program's arguments (just as the Perl 5 ARGV filehandle provides access to the program's...err...argumentv).

Inputs that output

In the original Cookbook version of this program, the next line was:

        for (print "Search? "; <>; print "Search? ") {

This highlights a common situation for which there is no satisfactory solution in Perl 5. Namely: repeatedly prompting for input and reading it into $_, until EOF. In Perl 6, there's finally a clean way to do this -- with another property:

        $ARGS prompts("Search? ");
        while (<$ARGS>) {

The first thing you'll notice is that reports of the diamond operator's death have been greatly exaggerated. Yes, even though the Second Apocalypse foretold its demise, Rule #2 has since been applied and the angle brackets live!

Of course, they're slightly different in Perl 6, in that they require a handle object inside them (normally stored in a variable), but that's already possible in Perl 5 too.

Meanwhile, what about that prompt? Well, the Perl 6 solution is to allow input handles to have an associated character string that they print out just before they attempt to read in data.

Wait a minute!, I hear you object, Input handles that do output??? Actually, you've been using handles like that for decades. In most languages, every time you do a read from standard input, the first thing the input operation does is flush the standard output buffer. That's why something like:

        print "nuqneH? ";
        $request = <>;

pre-prints the prompt correctly, even though it doesn't end in a newline.

So input and output mechanisms are already carrying on a secret relationship. The only change here is that now you're allowed to have the input handle add a little something to the output before it flushes the buffer. That's done with the prompts property. If an input handle has that property, its value is written to $*OUT just before the input handle reads. So we can replace:

        for (print "Search? "; <>; print "Search? ") {          # Perl 5 (or 6)

with

        $ARGS prompts("Search? ");                              # Perl 6
        while (<$ARGS>) {

Technically, that should of course be:

        $ARGS is prompts("Search? ");

but that grates unbearably. Fortunately the is is optional in contexts -- such as this one -- where it can be inferred.

Note that, because the is operation returns its left operand (even when the is is invisible!), we could also use the rather elegant:

        while (<$ARGS prompts("Search? ")>) {

In fact, this one-line version may often be preferable, since the value of the prompts property might be changed somewhere inside the loop and this resets it on each iteration.

The exact semantics of the prompting mechanism aren't nailed down yet, so it may also be possible to use a subroutine reference as a dynamic prompt (the handle would call the subroutine before each read and pre-print the return value).
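
For comparison, here is a minimal Perl 5 sketch of what the chomped and prompts properties would save us from writing by hand. The prompt_and_read helper is hypothetical (it is not part of the Cookbook program or of any Perl 6 design), and in-memory filehandles stand in for the terminal so that the sketch is self-contained:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the article): print $prompt to $out,
# then read one line from $in and chomp it. Returns undef at end of input.
sub prompt_and_read {
    my ($in, $out, $prompt) = @_;
    print {$out} $prompt;
    my $line = <$in>;
    chomp $line if defined $line;
    return $line;
}

# In-memory handles stand in for the terminal (an assumption of this sketch).
my $typed   = "17\n404\n";
my $printed = '';
open my $in,  '<', \$typed   or die "cannot open input: $!";
open my $out, '>', \$printed or die "cannot open output: $!";

# Prompt, read, chomp, repeat until end of input.
my @searches;
while (defined(my $line = prompt_and_read($in, $out, 'Search? '))) {
    push @searches, $line;
}

close $out or die "cannot close output: $!";
close $in  or die "cannot close input: $!";
```

Note that the prompt is printed once more than the number of lines read, because the final attempt (the one that hits EOF) still prompts first. The original for loop has the same behaviour.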

Haven't we met before? (part 1)

Having requested and read in a value, the search-and-report code is almost entirely familiar:

            if (my $node = search($root, $_)) {
                print "Found $_ at $node: $node{VALUE}\n";
                print "(again!)\n" if $node{VALUE}.Found > 1;
            }
            else {
                print "No $_ in tree\n";
            }
        }

The only lurking Perl6ism is the use of the user-defined Found property to report repeated searches.

The call to $node{VALUE}.Found would normally be a method call (in Perl 6 -> is spelled .). But since $node{VALUE} is just a regular unblessed int, there is no Found method to call. So Perl treats the request as a property query instead and returns (an alias to) the corresponding property.

Take that! And that!

In Perl 6, subroutines can -- optionally -- specify proper parameter lists (as opposed to the not-evil-just-misunderstood argument context prototypes that Perl 5 allows).

For instance the insert subroutine, declares itself to take two parameters:

        sub insert (HASH $tree is rw, int $val) {

The first parameter specifies that the first argument must be a reference to a hash and is to be assigned to the lexical variable $tree. Defining the first parameter to be a hash reference means that any attempt to use it in some other way (e.g. trying to do a subroutine call through it, trying to pass it an explicit array reference, etc.) can be caught and punished -- at compile-time.

It's important to understand that, by default, named parameters are not like the elements of @_. Specifically, even though each argument is passed to its corresponding parameter by reference (for efficiency), the parameter variable itself is automatically declared constant, so any attempt to assign to it results in a compile-time error. This is intended to reduce the incidence of people accidentally shooting themselves in the foot.

Of course, this being Perl, when we really do need to draw a bead on those metatarsals, we can. To allow assignments to a named parameter -- assignments that will propagate back to the original argument -- we need to declare the parameter with the standard rw (read-write) property. It then becomes a fully assignable alias for the original argument, which in this example allows us to autovivify it (see We don't need no stinking backslashes).

The @_ argument array is still available in Perl 6, but only when we declare subroutines in the Perl 5 manner -- without a parameter list (see A good, old-fashioned show).

The second parameter of insert is defined to take an integer value. By using the type int instead of INT, we're once again explicitly promising not to do bizarre things with the referent (at least, not within the body of insert). The compiler might be able to use this information to optimize the subroutine's code in some way.
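
To see what the rw property replaces, here is a small Perl 5 sketch of the two behaviours. The subroutine names insert_copy and insert_alias are made up for this illustration. The first copies its arguments into lexicals, so assignments stay local; the second assigns through $_[0], which aliases the caller's variable:

```perl
use strict;
use warnings;

# Copies its arguments: assigning to $tree changes only the local copy.
sub insert_copy {
    my ($tree, $val) = @_;
    $tree = { LEFT => undef, RIGHT => undef, VALUE => $val };
    return;
}

# Assigns through $_[0], which is an alias for the caller's variable.
sub insert_alias {
    $_[0] = { LEFT => undef, RIGHT => undef, VALUE => $_[1] };
    return;
}

my ($copied_root, $aliased_root);
insert_copy($copied_root, 42);
insert_alias($aliased_root, 42);

print defined($copied_root) ? "copy: set\n" : "copy: still undef\n";
print "alias: VALUE is $aliased_root->{VALUE}\n";
```

A Perl 6 parameter declared is rw is meant to give the second behaviour under a readable name, while an undecorated parameter refuses the assignment outright rather than silently modifying a copy.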

A sigil is for life, not just for value type

Long ago, when the Earth was new and Perl was young and clean, the type of sigil ($, @, or %) that was associated with a variable described what it evaluated to. For example:

        print $x;                       # $x evaluates to scalar
        print $y[1];                    # $y[1] evaluates to scalar
        print $z{a};                    # $z{a} evaluates to scalar
        print $yref->[1];               # $yref->[1] evaluates to scalar
        print $zref->{a};               # $zref->{a} evaluates to scalar
        print @y;                       # @y evaluates to list
        print @y[2,3];                  # @y[2,3] evaluates to list
        print @z{'b','c'};              # @z{'b','c'} evaluates to list
        print @{$yref}[2,3];            # @{$yref}[2,3] evaluates to list
        print @{$zref}{'b','c'};        # @{$zref}{'b','c'} evaluates to list
        print %z;                       # %z evaluates to hash

Regardless of the actual type of the variable being referred to, a leading $ on the access meant the result would be a scalar; a leading @ meant a list; a leading %, a hash.

But then the serpent of OO entered the garden, and offered Perlkind the bitter fruit of subroutine and method calls:

        print $subref->();
        print $objref->method();

Now the leading $ no longer indicated the type of value returned. And in beginners' Perl classes across the land there arose a great wailing and a gnashing of teeth.

Perl 6 returns us to a state of grace -- albeit a state of different grace -- in which each variable type has One True Sigil, from which it never strays.

In Perl 6, a scalar always has a leading $, an array always has a leading @ (even when accessing its elements or slicing it), and a hash always has a leading % (even when accessing its entries or slicing it).

In other words, the sigil no longer (sometimes) indicates the type of the resulting value. Instead, it (always) tells you exactly what kind of variable you're messing about with, regardless of what kind of messing about you're doing.

The insert subroutine has several examples of this new syntax. The most obvious are in the autovivification of an empty subtree at the start of the subroutine:

            unless ($tree) {
                my %node;
                %node{LEFT}   = undef;
                %node{RIGHT}  = undef;
                %node{VALUE}  = $val

Even though we're accessing the %node hash's entries, the variable retains its % sigil and the hash access braces are simply appended to the complete variable name.

Likewise, to access the element of an array, we simply append the array-access square brackets to the variable name: @array[1]. This is a significant departure from Perl 5 syntax. In Perl 5, @array[1] is a one-element slice of the @array array; in Perl 6, it's a direct single element access (no slicing involved).

This means, of course, that Perl 6 will require some revised array-slicing semantics. Larry's planning to take that opportunity to beef up Perl's slicing facilities and provide for arbitrary slicing and dicing of multidimensional arrays. But that's for a future Apocalypse.

For the time being, it's enough to know that, if you put a single scalar in the square brackets, you get a single element look-up; if you put a list in the brackets, you get a slice.
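
As a reminder of what is being left behind, this Perl 5 sketch shows that @array[1] really is a slice there. The observable difference is context: assigning to a one-element slice is a list assignment. The context subroutine is a helper invented for this illustration:

```perl
use strict;
use warnings;

my @array = (10, 20, 30);

my $element = $array[1];       # single element: note the $ sigil
my @slice   = @array[1, 2];    # slice: note the @ sigil

# Reports the context in which it was called.
sub context { return wantarray ? 'list' : 'scalar' }

my @seen;
$seen[0] = context();          # element assignment: scalar context
{
    no warnings 'syntax';      # Perl 5 warns that the next line is better written with $
    @seen[1] = context();      # one-element slice assignment: list context
}
print "@seen\n";
```

Under the Perl 6 rule described above, a single scalar inside the brackets would be an ordinary element look-up regardless of the @ sigil.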

Haven't we met before? (part 2)

The last assignment to a %node entry has one other little twist. The (copy of the) value being assigned is also ascribed a Found property, initialized to the value zero:

                %node{VALUE}  = $val is Found(0);

Once again, that works because, when a property is set using is, the result of the operation is the left operand (in this case $val), not the new value of the property.

Indeed, although we glossed over it at the time, that's the only reason the:

        while (<$ARGS prompts("Search? ")>) {

syntax actually worked. The expression $ARGS prompts("Search? ") set the handle's prompt, and then returned $ARGS, which became the operand of the diamond operator, resulting in a prompt-and-read operation through that handle.

We don't need no stinking backslashes

Once the new %node is initialized, a reference to it needs to be assigned to the variable that was passed as the first argument (if it's not clear why, see section 12.3.2 of Object Oriented Perl for a detailed explanation of this tree-manipulation technique).

In Perl 5, modifying an original argument would require an assignment to $_[0] (i.e. @_[0] in Perl 6), but because we declared $tree to be rw, we can assign directly to it and have the original argument change appropriately:

                $tree = %node;

Oops, (you're probably thinking), he just fell victim to one of the Classic Blunders: In a scalar context, a hash evaluates to the ratio of used buckets to allocated buckets!

In Perl 5 maybe, but in Perl 6 that near-useless behaviour has gone the way of the powdered wigs, buggy whips, and DSL providers. Instead, when evaluated in a scalar context, a hash (or an array) returns a reference to itself. So the above line of code works correctly.

Okay, (you're now wondering), if arrays do that too, how do I get the length of an array??? The answer is that in a numeric context an array reference now evaluates to the length of the array. So the translation of the Perl 5 code:

        while (@queue > 0) {    # scalar eval of @queue yields length

is:

        while (@queue > 0) {    # scalar eval of @queue yields ref to array
                                # ref to array in numeric context yields length

Similarly, in a boolean context, an array evaluates true if it contains any elements, so the translation of the Perl 5 code:

        while (@queue) {    # scalar eval of @queue yields length

is:

        while (@queue) {    # boolean eval of @queue yields true if not empty

Cunning, huh?
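
For contrast, here is the same ground covered in plain Perl 5, where the backslash and the arrow are both required. This is only a sketch; the variable names are invented for the example:

```perl
use strict;
use warnings;

my %node = (LEFT => undef, RIGHT => undef, VALUE => 42);
my $tree  = \%node;                 # Perl 5: explicit backslash to take a reference
my $value = $tree->{VALUE};         # Perl 5: arrow to go through the reference

my @queue     = ('a', 'b', 'c');
my $length    = @queue;             # scalar context yields the element count
my $queue_ref = \@queue;
my $via_ref   = scalar @{$queue_ref};   # counting via a reference needs a dereference

my $drained = 0;
while (@queue) {                    # true while the array is non-empty
    shift @queue;
    $drained++;
}
```

The two while loops in the text read identically in both versions; what changes in Perl 6 is the route by which the array arrives at a number or a truth value.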

You say %node{VALUE}, but I say $tree{VALUE}

When we were loading up the new node, we wrote %node{VALUE} to access its 'VALUE' entry. Now that $tree holds a reference to %node, we need some way of accessing the same entry.

In Perl 5 that would be:

        $tree->{VALUE}        # Perl 5 entry access through hash ref in $tree

And since -> is spelled . in Perl 6, that becomes:

        $tree.{VALUE}         # Perl 6 entry access through hash ref in $tree

However, since the direct hash access syntax now uses a completely different sigil -- %node{VALUE} -- the . isn't needed for disambiguation there and hence can be made optional:

        $tree{VALUE}          # Perl 6 entry access through hash ref in $tree

And that's the usual way accesses to hash references will be written:

            if    ($tree{VALUE} > $val) { insert($tree{LEFT},  $val) }
            elsif ($tree{VALUE} < $val) { insert($tree{RIGHT}, $val) }
            else                        { warn "dup insert of $val\n" }
        }

This is actually far less confusing than it might at first seem. For example, back in Haven't we met before? (part 1), did you notice that:

        if (my $node = search($root, $_)) {
            print "Found $_ at $node: $node{VALUE}\n";

already used this new syntax?

In Perl 5 that would have been a (very common) error -- the second line would print an entry of %node, when we actually wanted an entry of %{$node}. But in Perl 6, it just Does What We Mean.

And, of course, access through other kinds of references will also allow the . to be omitted: $arr_ref[$index] and $sub_ref(@args).

Here's a handy conversion table:

        Access through...       Perl 5          Perl 6
        =================       ======          ======
        Scalar variable         $foo            $foo
        Array variable          $foo[$n]        @foo[$n]
        Hash variable           $foo{$k}        %foo{$k}
        Array reference         $foo->[$n]      $foo[$n] (or $foo.[$n])
        Hash reference          $foo->{$k}      $foo{$k} (or $foo.{$k})
        Code reference          $foo->(@a)      $foo(@a) (or $foo.(@a))
        Array slice             @foo[@ns]       @foo[@ns]
        Hash slice              @foo{@ks}       %foo{@ks}

A good, old-fashioned show

The show subroutine illustrates the optional nature of parameter lists. Here we omit the parameter specifications entirely, and get the good old familiar "take-any-number-of-arguments-and-stick-them-all-in-@_" semantics.

Indeed, apart from its DWIM-ier array access syntax, the show subroutine is vanilla Perl 5:

        sub show {
            return unless @_[0];
            show(@_[0]{LEFT}, @_[1]) unless @_[1] == $post;
            show(@_[0]{RIGHT},@_[1])     if @_[1] == $pre;
            print @_[0]{VALUE};
            show(@_[0]{LEFT}, @_[1])     if @_[1] == $post;
            show(@_[0]{RIGHT},@_[1]) unless @_[1] == $pre;
        }

And that, we believe, will be the normal experience when moving from 5 to 6: Perl will still be Perl...only slightly more so.

Of course, the show subroutine is moderately funky Perl anyway, so if symmetrically guarded repetitions of the left- and right- subtree traversals aren't your maintenance dream, this would also be the ideal place to use Perl's new case statement.

But that won't be unveiled until Apocalypse 4, so if you could just look at this little red light....<FLASH>...Thank-you.

Search me

The parameter list of the search subroutine is interesting because it's a hybrid of the old and new Perl semantics:

        sub search (HASH $tree is rw, *@_) {

Both parameters are explicitly declared, but the second declaration (*@_) causes the remaining parameters to be collected in @_. There's nothing magical about @_ there: if the second declaration had been *@others, the rest of the arguments would have turned up in @others.

The asterisk in the second parameter tells Perl 6 that the corresponding argument position is a plain ol' list context, so that any arguments there (or thereafter) should be treated as a single list and assigned to the corresponding parameter variable. It's the equivalent of the Perl 5 @ prototype.

In contrast, a parameter declaration of @param is the equivalent of Perl 5's \@ prototype -- and explicitly requires an array variable as the corresponding argument.
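For readers who have not met the two Perl 5 prototypes mentioned above, here is a minimal Perl 5 sketch of the difference. The subroutine names count_rest and count_array are made up for illustration; only the ($@) and (\@) prototypes themselves come from Perl 5.

```perl
use strict;
use warnings;

# "@" prototype: everything after the first argument is flattened
# into one list, much like the Perl 6 *@_ parameter described above.
sub count_rest ($@) {
    my ($first, @rest) = @_;
    return scalar @rest;
}

# "\@" prototype: the caller must supply a real array variable,
# and the subroutine receives a reference to it.
sub count_array (\@) {
    my ($array_ref) = @_;
    return scalar @$array_ref;
}

my @xs = (1, 2, 3);
my @ys = (4, 5);

print count_rest('head', @xs, @ys), "\n";   # both arrays are flattened
print count_array(@xs), "\n";               # @xs is passed by reference
```

Calling count_array with a literal list instead of an array variable is rejected at compile time, which is the "explicitly requires an array variable" behaviour.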

Notice that, because we started collecting arguments in @_ from the second parameter, the value we're looking for (i.e. the second argument) is referred to as @_[0], not @_[1]:

            return search($tree{@_[0]<$tree{VALUE} && "LEFT" || "RIGHT"}, @_[0])
                unless $tree{VALUE} == @_[0];

Haven't we met before? (part 3)

The second last line of search is where all the Perl 6 action is. Having worked out that we're already at the desired node, we're going to return it. But we also need to increment its Found property, which we do like so:

            $tree{VALUE} is Found($tree{VALUE}.Found+1);

This highlights two of the three ways of accessing a property: the read-write . syntax, and the write-only is operator.

If a property is accessed as if it were a method, its value can be set by passing the new value as an argument. Whether such a value is passed or not, the result of the operation is an alias (i.e. an lvalue) for the property itself. So we could also increment the value's Found property like this:

        $tree{VALUE}.Found($tree{VALUE}.Found+1);

or, like this:


        $tree{VALUE}.Found++;

The is syntax, on the other hand, can only set a property, because the is operation returns its left operand (the referent that owns the property), not the value of the property itself. This is often highly useful, however, for last-minute property setting in a return statement:

        return $result is Verified;

Another very common usage is expected to be in returning zero-but-true and non-zero-but-false values:

        sub my_system ($shell_command) {
                ...
                return $error is false if $error;
                return 0 is true;
        }
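Perl 5 has no properties, but it does have a narrow special case that covers the zero-but-true half of this idiom: the string "0 but true" is true in boolean context, numifies to 0, and is exempt from the usual non-numeric warning. The sketch below assumes a hypothetical run_step routine that reports failure with a plain false value; it is an approximation of the idea, not a translation of my_system.

```perl
use strict;
use warnings;

# Hypothetical routine: returns a false value on failure, and the
# special string "0 but true" on success with a numeric status of 0.
sub run_step {
    my ($fails) = @_;
    return if $fails;         # undef in scalar context: false
    return "0 but true";      # true, yet numerically zero
}

my $status = run_step(0);
if ($status) {
    # %d forces numeric context; no warning is issued for this string
    printf "step succeeded with status %d\n", $status;
}
```

The non-zero-but-false direction has no such built-in shortcut in Perl 5, which is part of what makes the is false form attractive.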

The third way of accessing a property is via the prop meta-property, which returns a reference to a hash containing all the properties of a referent:

        $tree{VALUE}.prop{Found}++;

You can also use this feature to list all the properties that a referent has been ascribed:

        for (keys %{$tree{VALUE}.prop}) {
            print "$_: $tree{VALUE}.prop{$_}\n";
        }

By the way, in Apocalypse 2, Larry waggishly referred to the prop meta-property as btw, but with the help of modern therapeutic techniques, he's now gotten over the idea.

Coda on an earlier theme

This article has illustrated several important new features that Perl 6 will provide. But don't let all that newness scare you. Perl has always offered the ability to code at your own level and in the style that suits you best. That's not going to change, even if the style that suits you best is Perl 5.

Almost every new feature covered here will be optional, and if you choose not to use them, you can still write the same program in a manner that is very close to Perl 5. Like so:

        use strict;
        use warnings;
        my ($root, $n);
        while ($n++ < 20) { insert($root, int rand 1000) }
        my ($pre, $in, $post) = (0..2);
        print "Pre order:  "; show($root,$pre);  print " \n";
        print "In order:   "; show($root,$in);   print " \n";
        print "Post order: "; show($root,$post); print " \n";
        for (print "Search? "; <$ARGS>; print "Search? ") {
            chomp;
            if (my $node = search($root, $_)) {
                print "Found $_ at $node: $node{VALUE}\n";
                print "(again!)\n" if $node{FOUND} > 1;
            }
            else {
                print "No $_ in tree\n";
            }
        }
        exit;
        #########################################
        sub insert {
            unless (@_[0]) {
                @_[0] = { LEFT  => undef, RIGHT => undef,
                          VALUE => @_[1], FOUND => 0,
                        };
                return;
            }
            if    (@_[0]{VALUE} > @_[1]) { insert(@_[0]{LEFT},  @_[1]) }
            elsif (@_[0]{VALUE} < @_[1]) { insert(@_[0]{RIGHT}, @_[1]) }
            else                         { warn "dup insert of @_[1]\n"  }
        }
        sub show  {
            return unless @_[0];
            show(@_[0]{LEFT}, @_[1]) unless @_[1] == $post;
            show(@_[0]{RIGHT},@_[1])     if @_[1] == $pre;
            print @_[0]{VALUE};
            show(@_[0]{LEFT}, @_[1])     if @_[1] == $post;
            show(@_[0]{RIGHT},@_[1]) unless @_[1] == $pre;
        }
        sub search {
            return unless @_[0];
            return search(@_[0]{@_[1]<@_[0]{VALUE} && "LEFT" || "RIGHT"}, @_[1])
                unless @_[0]{VALUE} == @_[1];
            @_[0]{FOUND}++;
            return @_[0];
        }

In fact, that's only 40 characters (out of 1779) from being pure Perl 5. And almost all of those differences are @'s instead of $'s at the start of array element look-ups.

98% backwards compatibility even without an automatic p52p6 translator...pretty slick!
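To make the size of that gap concrete, here is a sketch of the three subroutines with those last few characters changed so that they run under Perl 5. The changes assumed here are the ones named above ($_[0] for @_[0]) plus an arrow when dereferencing $node in the caller. The fixed list of inserted values and the space printed after each value are additions for the demonstration, not part of the original program.

```perl
use strict;
use warnings;

my ($pre, $in, $post) = (0 .. 2);
my $root;

sub insert {
    unless ($_[0]) {
        # $_[0] aliases the caller's variable, so this creates the node
        $_[0] = { LEFT  => undef, RIGHT => undef,
                  VALUE => $_[1], FOUND => 0,
                };
        return;
    }
    if    ($_[0]{VALUE} > $_[1]) { insert($_[0]{LEFT},  $_[1]) }
    elsif ($_[0]{VALUE} < $_[1]) { insert($_[0]{RIGHT}, $_[1]) }
    else                         { warn "dup insert of $_[1]\n" }
}

sub show {
    return unless $_[0];
    show($_[0]{LEFT}, $_[1]) unless $_[1] == $post;
    show($_[0]{RIGHT},$_[1])     if $_[1] == $pre;
    print $_[0]{VALUE}, " ";
    show($_[0]{LEFT}, $_[1])     if $_[1] == $post;
    show($_[0]{RIGHT},$_[1]) unless $_[1] == $pre;
}

sub search {
    return unless $_[0];
    return search($_[0]{ $_[1] < $_[0]{VALUE} && "LEFT" || "RIGHT" }, $_[1])
        unless $_[0]{VALUE} == $_[1];
    $_[0]{FOUND}++;
    return $_[0];
}

insert($root, $_) for 50, 30, 70, 20, 40;

print "In order: ";
show($root, $in);
print "\n";

my $node = search($root, 40);
print "Found $node->{VALUE}\n" if $node;
```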

This Week on p5p 2001/05/13



Notes

You can subscribe to an email version of this summary by sending an empty message to perl5-porters-digest-subscribe@netthink.co.uk.

Please send corrections and additions to perl-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year and month. Changes and additions to the perl5-porters biographies are particularly welcome.

This week was slightly busier than average, but was mainly little bitty threads that didn't really go anywhere; part of the problem may be that the bug reporting system's been playing up recently, and dumped a bunch of old bugs onto the list toward the end of the week.

Help wanted

Very soon, I will have some rather important exams; in order for me to concentrate on these, I'd appreciate it if someone could help out by taking the P5P summary off my hands for a couple of weeks. If you want to do that, email me at the above address.

Cleaning up the Todo list

I started the week by posting a commentary on the three todo files, perltodo.pod, Todo and Todo-5.6. The ensuing discussion helped to weed out the items that had already been done, or that people didn't actually want anyway, or were no longer appropriate. It also raised the possibility of some kind of regexp engine BOF at TPC: either a tag-team bug-fixing effort, or a talk by people such as Jarkko and Hugo.

The important outcome of this is that we now have a Grand Unified Todo List, which means there are a lot of things that you can do! And, of course, we found out where the phrase "Danger, Will Robinson!" comes from, thanks to Walt Mankowski:

Lost in Space was a campy 1960's American scifi series about a family which was, umm, lost in space. At least once an episode the family's young son Will would get into some sort of trouble (usually due to his friendship with the evil stow-away Dr. Smith) and the family's robot would flail his arms about and yell "Danger, Will Robinson! Danger!"

Perl Power Tools

The discussion on what's left to do and the discussion on Perl website updates (see "Various") both touched on the Perl Power Tools project. This was one of Tom's ideas - a set of standard Unix utilities, rewritten in Perl. The tools are very useful, and I can personally testify to their utility when I'm stuck out in the big, bad non-Unix world. Unfortunately, Tom had a hard disk crash a while ago, wiping out the mailing list for the project and some contributions; with one thing (Camel 3) or another (a rather nasty accident recently), he's not found time to keep the project up to date.

Both Casey West and Sean Dague made offers to take over the project, with Sean also proposing a CPAN-style "Bundle" of the tools. There's been no response from Tom yet.

Safe signals

Nick Ing-Simmons quietly sneaked safe signal handling into Perl. Benjamin Sugars seemed to understand it, and produced the following summary:

1. The new signal model calls out to Perl-level signal handlers only when it's safe: either between opcodes or at certain points within certain opcodes where it's known the callout can be done safely.

2. Delaying the callout has been shown to affect code that relies on the Perl-level handler being called immediately.

3. It was demonstrated by Nick I-S that the best (IMO) way to deal with these side effects is to update those ops most likely to be affected such that they perform callout at a safe time (Nick I-S has already done this for pp_wait and pp_waitpid, and for perlio).

4. Likely candidates for the aforementioned changes are those opcodes that involve syscalls and looping opcodes. The syscall opcodes may need to be further modified to restart the syscall if need be.

5. This fix does not help those people that rely on an immediate callout to a Perl-level signal handler that gets dispatched by C code outside of the perl distribution. An example of this is code that relies on the handler being called from within stdio.

6. The old signal model is always available if compiled with PERL_OLD_SIGNALS.

At this point we can either

A. Go ahead and implement #3 for those opcodes identified in #4. We can also see if anything can be done about #5.

B. Leave the code as is, wait for bug reports, and address on an as needed basis following approach #3.

C. Do nothing more. Tell people who complain "Well, you shouldn't have relied on %SIG in the first place..." and ask them to recompile with PERL_OLD_SIGNALS.

For internals watchers, this means that the long-dormant PERL_ASYNC_CHECK now actually does something: it catches the signals and diverts the flow of execution if needed. It also means that long-running operations, such as the regular expression engine and system calls, may need a few more PERL_ASYNC_CHECKs dotted around them. Identifying the right places to put the checks will be the critical battle in the fight for safer signals.
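Seen from the Perl side, the model looks roughly like the sketch below. It assumes a POSIX-like system that has SIGUSR1 and a perl built with the safe signal model described above; the variable names are made up for the example. The C-level handler only records that the signal arrived, and the Perl-level handler runs at the next safe point between opcodes.

```perl
use strict;
use warnings;

my $got_signal = 0;

# Keep the Perl-level handler tiny: it just notes that the signal arrived.
$SIG{USR1} = sub { $got_signal++ };

# Send ourselves a signal. Under safe signals the Perl handler is
# called once the interpreter reaches a safe point after this op.
kill 'USR1', $$;

# A short loop gives the interpreter plenty of opcode boundaries.
my $spins = 0;
$spins++ until $got_signal || $spins >= 1_000_000;

print "handler ran $got_signal time(s)\n";
```

Code that blocks inside a single long-running opcode never reaches such a boundary, which is why the placement of the checks matters so much.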

Almost immediately, Abigail found a bug in the signal handling, which turned out to be because the new model was waiting until a wait had completed before despatching a signal, and also because system calls are no longer restarted. Benjamin posted an analysis, and started working on a pragma to select how signals should be despatched: immediately, when safe, with restarted system calls, and so on. Abigail objected, saying that would place too much emphasis on the user being familiar with the signal handling implementation. Jarkko agreed, and said that system calls should always be restarted, without any need for a pragma. This caused Nick and Benjamin some consternation:

Safe signals means we cannot call a handler written in perl at arbitrary points, having system restart signals means we don't get a chance to despatch the perl handler unless we do it from the C handler which is not "safe".

We could have dynamically scoped pragma which messed with the function pointer, or the C code could advertise that it was okay to call perl handler or ...

Nick also pointed out that PerlIO did manage to arrange the signal despatch at the right times, because it had PERL_ASYNC_CHECK in the right places. I don't know if system call restarting has been implemented, but I do know that Nick quietly fixed Abigail's bug as well.

Release Numbering

Alan asked what the release numbering policy for Perl was; his intuitions in fact turned out to be correct:

Perl release numbering is composed of a dotted <Language Version> <Major Version> <Minor Version> tuple.

Major releases may break both source and binary compatibility, although they are more likely to preserve source compatibility and break binary compatibility.

Sometimes binary compatibility is possible over a major release boundary, e.g. by use of the -Dbincompat5005 to Configure.

Minor releases should not break either source or binary compatibility, i.e. XSUBs compiled against 5.6.0 should still work on 5.6.1.

Binary compatibility will be broken if you change the 'bitness' of perl, e.g. by switching from 32 to 64 bit integer support or vice-versa.

The transition from 5.005_03 to 5.6.0 was a Major release.

The transition from 5.6.0 to 5.6.1 was a Minor release.

The current development track is the 5.7.X series.

The next Major release will be 5.8.0.
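The tuple rule is easy to express in code. The helper below, release_kind, is a hypothetical name invented for this sketch; it only understands the dotted three-part form (5.6.1), not the older 5.005_03 style.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the step between two dotted version
# strings of the form "language.major.minor", e.g. "5.6.1".
sub release_kind {
    my ($from, $to) = @_;
    my @old = split /\./, $from;
    my @new = split /\./, $to;
    return 'language' if $old[0] != $new[0];
    return 'major'    if $old[1] != $new[1];
    return 'minor'    if $old[2] != $new[2];
    return 'same';
}

print "5.6.0 -> 5.6.1: ", release_kind('5.6.0', '5.6.1'), "\n";
print "5.6.1 -> 5.8.0: ", release_kind('5.6.1', '5.8.0'), "\n";
```

By the policy quoted above, only a result of 'minor' promises that compiled XSUBs keep working.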

Andy Dougherty explained that Perl uses the xs_apiversion config variable to find modules which are "binary compatible" with the current Perl. Alan noted that Perl's "binary compatibility" assumes that everyone has the source and a compiler, whereas Solaris' "binary compatibility" means that SunOS 4 executables run on 64-bit Solaris.

Andy also explained how to distribute non-core modules with Perl without getting fried by upgrades.

Alan also bemoaned the problem that if Sun ships a Perl compiled with its own compilers, users who only have gcc won't be able to build external modules properly; he hacked up a solution whereby "gccperl" would see the right configuration.

Various

I started fiddling with roffitall, and then got sidetracked into making roffitall, installperl and installman all use a separate data file for identifying utilities that ship with Perl; Mike Guy fixed up a bug which caused AutoSplit to generate bogus line numbers.

Benjamin Sugars documented the our $foo :shared attribute. Randal and I cleaned up the book, magazine and website links in the Perl FAQ.

There was a long and drawn-out campaign to make

    1 while unlink $filename

the official way to delete files in the core: this is supposed to help VMS and other versioning filesystems really delete a file instead of just taking a swipe at it. Peter Prymmer chimed in with some EBCDIC fixes. Robin Houston.... oh, you can guess the rest.
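For anyone who hasn't seen the idiom in action: unlink returns the number of files it removed, so the loop keeps going until there is nothing left to remove. Here is a small sketch using a scratch file from File::Temp; the scratch file is only there to give the loop something to delete.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file so there is something to delete.
my ($fh, $filename) = tempfile();
print {$fh} "scratch data\n";
close $fh;

# unlink returns the number of files removed. On a versioning
# filesystem each call removes one version, so loop until it returns 0.
1 while unlink $filename;

print -e $filename ? "still there\n" : "gone\n";
```

On a filesystem without versions the loop body simply runs twice: once to delete the file, and once to find out that it is already gone.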

Until next week, or longer if I can persuade someone to take my place, I remain, your humble and obedient servant,


Simon Cozens

This Week on p5p 2001/05/20



Notes

This was a fairly typical week with some large threads.

Help found

Simon Cozens, the usual summarizer, is currently having rather important exams. Leon Brocard has stepped in to take the P5P summary off his hands for a bit. Good luck, Simon!

Internationalisation

Alan Burlison started off by asking:

In amongst all the Unicode work, has any thought been given to internationalising the perl interpreter itself? At the moment all the error and warning messages are english-only, AFAIK. This also applies to the standard modules.

A small thread discussed the huge amount of work involved in translating every English part of the Perl source into multiple languages, and the fact that the typical method of achieving this was message catalogues which are, as Jarkko understatedly pointed out, "messy". Sean Burke's Locale::Maketext module was mentioned as a solution. The thread went dead, although it looks like at least the perldiag documentation will be worked upon.

Must pseudo-hashes die?

Pseudohashes are a long-lived and yet not much loved experimental feature of Perl which aims to optimise a hash reference to an array reference at compile time, helping speed and efficiency. Michael Schwern asked that pseudo-hashes be removed from Perl 5.8 onwards.

He found that pseudohashes are about 15% faster than hashes, but that the current implementation is layered over the hash code so that:

Tearing out the pseudohash code gives an across the board 10-15% gain in speed in basic benchmarks. That means if we didn't have pseudohashes, normal hashes would be just as fast as fully declared pseudohashes!

Removal of pseudohashes would need a clean implementation of fields. Graham Barr suggested moving the pseudohash code out of the hash code and making a new PSEUDOHASH type.

In summary: it looks like the current user-visible implementation of pseudo-hashes (the weird use of the first array element) will be deprecated starting from Perl 5.8.0 and will be removed in Perl 5.10.0. Jarkko complained: "Instead of too much implementation hand-waving how about some real implementations, followed by some real benchmarks, and discussion on some real tradeoffs?"
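If "the weird use of the first array element" doesn't ring a bell, the sketch below emulates the layout by hand in plain Perl, so it runs whether or not the interpreter supports real pseudo-hashes. The record contents and the ph_fetch helper are invented for illustration. Element 0 is a hash mapping field names to array indices, and the data lives in the remaining elements.

```perl
use strict;
use warnings;

# A pseudo-hash-shaped array: element 0 maps field names to indices.
my $record = [ { name => 1, email => 2 }, 'camel', 'camel@example.org' ];

# Hand-rolled lookup doing at run time what the pseudo-hash code
# could partly do at compile time: name -> index -> value.
sub ph_fetch {
    my ($ph, $key) = @_;
    my $index = $ph->[0]{$key};
    die "No such pseudo-hash field \"$key\"\n" unless defined $index;
    return $ph->[$index];
}

print ph_fetch($record, 'name'),  "\n";
print ph_fetch($record, 'email'), "\n";
```

The speed claim comes from resolving the index once at compile time, so that an access with a known field name costs no more than an array lookup.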

This brought up another point: making experimental features Perl compile time configuration options which default to OFF. Little discussion ensued.

More Magic

Simon Cozens attempted to convince us of the utility of a generalised magic for arrays and hashes:

The idea is simple. When you use a special scalar in Perl, like, say, $/, (but not $_, because that's not only special, it's extra special) Perl looks up the scalar in the symbol table and finds that it has magic attached to it. This magic, called "\0" magic, means that instead of getting the value from the SV structure, Perl calls a function magic_get and returns the value from there. (It doesn't have to be magic_get - there are lots of different types of magic, which fire off different functions. Anyway, "\0" magic fires off magic_get.) magic_get looks at the name of the scalar, and basically does a big case statement to perform code appropriate for that scalar.

The end result is that if I want to add a new special variable - let's say $} - then I can tell gv.c that it should have "\0" magic, and insert the appropriate code to deal with it in magic_get.

For arrays and hashes, it's not that easy. Unfortunately, every single special array and hash has its own magic - and not just its own, but also a different type of magic that should be copied to the elements of the array or hash. So if you want to add more special variables, you have to cook up a new kind of magic and more virtual tables and functions and it's just horrid.

Simon and Sarathy want to make it easier to add magic to aggregates, but suggested different approaches. Simon wants to introduce a new once-and-for-all generic magic for aggregates and their element scalars; Sarathy wants to collapse everything into the "P" magic which handles tied arrays and hashes.
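The user-visible face of the tied-aggregate magic Sarathy mentions is tie. As a reminder of what that machinery does, here is a sketch of a tied hash whose accesses are routed through Perl-level methods. The UpperKeys class is made up for the example; Tie::StdHash is the stock base class from Tie::Hash.

```perl
use strict;
use warnings;

package UpperKeys;
require Tie::Hash;
our @ISA = ('Tie::StdHash');

# Every store and fetch on the tied hash is diverted to these methods.
sub STORE { my ($self, $key, $value) = @_; $self->{ uc $key } = $value }
sub FETCH { my ($self, $key)         = @_; return $self->{ uc $key } }

package main;

tie my %config, 'UpperKeys';
$config{colour} = 'green';

print $config{COLOUR}, "\n";       # the same element, whatever the case
print join(',', keys %config), "\n";
```

A special variable built this way pays for a method call on every access, which is one reason the core uses its own C-level magic instead.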

Legal FAQ

A. C. Yardley proposed to develop a Legal FAQ to hopefully address most of the important legal issues concerning Perl, the Perl community, and Perl's Open Source efforts. This is a wonderful thing to undertake, for as Jarkko pointed out: "it's about time we get something a little bit more than just educated guesses."

Proposed topics include: general discussion about lawyers and the American legal system, copyright and trademark law and international conventions, a review of the so-called Digital Millennium Copyright Act, licensing issues about code on CPAN and names, liability issues, and much more.

He also notes a limitation that is unfortunate, though quite understandable, for a distributed, world-wide development effort such as Perl:

Unfortunately, other than rather general remarks and references, I must limit my review and analysis of the relevant legal issues contained within the proposed FAQ to the domestic laws of the United States. It is unfortunate, but a review of the domestic laws of all of the applicable Nation States would be a significantly, non-trivial undertaking.

Various

Dan Sugalski dropped in a patch to speed up threaded perl on Alpha VMS systems by about 2-3 percent.

Doug MacEachern asked about optimising hash lookups with a constant key by pre-calculating the hash of the key. Turns out this was already in bleadperl, but disabled under ithreads.

Gisle Aas contributed a patch to stop chomp always stringifying its argument (even if it leaves it alone) as well as a patch to fix require $mod where $mod has touched numeric context.

Ilya Zakharevich provided 5 patches, including cleanups, both low- and high-level API additions, and more convenient client-server debugging in the Perl debugger, along with some OS/2 work.

Robin Houston asked about CvFILE() corruption under ithreads. CvFILE() returns the name of the file that the CV (sub, xsub or format) was declared in, and isn't used in core Perl but rather used by compiler backends such as B::C and B::Deparse. Robin supplied a patch but will rework it.

Dave Mitchell supplied a large patch that replaced magic char constants with symbolic values.

Some bugs in pp_concat were noticed by Hugo, and Jarkko took the blame for previously cleaning it up rather too severely.

Until next week I remain, your temporarily-replaced humble and obedient servant,

Leon Brocard
http://www.astray.com/
Iterative Software

... All things are green unless they are not


This Week on p5p 2001/05/06



Notes

Thanks to the gigantic ithreads, uh, thread, this week saw nearly 600 messages, and guess who has to sit and read them all?

iThreads

The first rumblings came when Dan Sugalski, without any warning, dropped in an 87K patch to make threads and multiplicity work on VMS - a veritable feat.

Then the thread started last week when Artur talked about his work on the iThreads.pm module, which we looked at a little two weeks ago. Sarathy said it was great that someone was working on it, and gave a few suggestions; primarily that magic was probably the best way to implement shared data access, and how we would implement :shared. Doug explained that we already have :shared, but it does something else: what Dick Hardt calls solar variables (see below). Artur asked if we could turn solars into fully shared variables by using a mutex when one wants to write to them. Dan explained that it was non-threadsafe to upgrade a readonly scalar (which Perl will do if it needs to stringify or numify it), which could be seen as a bug in Perl's implementation of :shared GVs. Doug said it wasn't a bug - if you're marking data as :shared it shouldn't be upgraded at runtime. This is true from Doug's point of view - solar variables - but not from Artur's desire for true shared variables.

Benjamin Sugars asked if Sarathy expected Perl to do the locking transparently:

    our $sv : shared = "foo";
    # Start some threads here, then...
    print $sv;         # read-only, no lock according to above?
    $sv = "bar";       # locks $sv for read/write

He quite rightly pointed out that this would mean that every op would need to test for sharedness and locking. Jarkko agreed, saying that locking should be implemented manually by the programmer; Sarathy said that should only be the case for locks to avoid deadlock in user space: "Any and all locking needed to avoid coredumps and memory corruption should be the sole province of the perl guts." He also said that he thinks that his idea could work and even be suitable for Perl 5.6.x. (And also took the opportunity to attempt to throw the pumpkin away again, but nobody took the hint.)

Dan brought up the vexed question of non-reentrant libraries, and whether or not XS authors will properly protect their code. Nick Clark suggested that CPAN authors should have to declare whether or not their code is threadsafe. Chris Stith asked if there could be a way for modules to tell perl itself that they were threadsafe. Benjamin Sugars asked if this would actually help: "Simply serializing calls into a non-thread-safe library doesn't make it thread-safe." Alan produced a Sun manpage all about what "thread-safe" means.

This naturally took us on to using reentrant versions of C library calls: unfortunately, this would take a lot of work at the Configure end, and is especially tricky because the interfaces to reentrant functions aren't necessarily standard. Jarkko suggested that we have more important thread problems to look at, specifically the regular expression engine.

Dan said that serializing calls to a non-thread-safe library will work most of the time, but Alan said that only works if they have no stored state. Artur asked why we couldn't use thread local storage - the answer, of course, being that external libraries are black boxes; we don't know what state they're storing or where they're storing it.

Alan pointed at some bits of Java that had locking wrong, even though Java has a well-defined thread support model, and mentioned that we would be better off putting a proper event loop in Perl. Artur mentioned POE, and that he was writing the threads module to make POE multithreaded.

Alan then began his impression of Eeyore:

SETJMP AND LONGJMP ARE USED EXTENSIVELY IN PERL5. SETJMP AND LONGJMP ARE NOT MT-SAFE. YOU ARE WASTING YOUR TIME TRYING TO PUT MULTITHREADING INTO PERL5 AS IT STANDS. EVEN IF IT WORKS MOST OF THE TIME ON YOUR UNIPROCESSOR MACHINE IT WILL EXPLODE IN YOUR FACE ON A MULTIPROCESSOR MACHINE. THREADS IN PERL5 ARE DOOMED TO FAILURE WITHOUT SIGNIFICANT REARCHITECTING.

AND NO, PUTTING A WHACKING GREAT GLOBAL LOCK AROUND EVERYTHING DOES NOT MAKE PERL5 MULTITHREADED.

I hope that is clear.

Dan concurred, but a lot more mildly: "Perl 5 won't ever be properly thread safe, I think. The best we can hope for is for it to be safe except for exceptional cases. Which isn't good enough, but isn't that bad." He also explained that if you do any mallocs with Perl's own malloc you really really need to protect it with a mutex. (Sarathy picked a nit - it actually protects itself.)

Sarathy disagreed, and said that it was a priority to make sure that Perl built-ins always call "safe" library functions under ithreads. He also said that there was no shared state between interpreters in ithreads, so a build which uses multiplicity and libpthread should be perfectly safe, barring external libraries: ithreads are going to be as safe as your system's thread-safe C library. Jarkko came up with two old P5P postings detailing the interfaces to "safe" library functions.

Jarkko called a halt to the discussion as it started getting out of order (which was a bit of a shame as the major players had just agreed to stop mudslinging and discuss how to help Artur; oh well.) He also said that he wasn't too impressed by the idea of adding Configure probes for a maze of twisty syscalls, all different. This, bizarrely, fell into a discussion about how h2xs is broken for constants. But I suspect that's another story for another time.

Here are Artur's conclusions on ithreads:

a) We have ithreads today, they exist in the core,

b) they are used on win32, now, and they will possibly be used on a larger amount of platforms with mod_perl 2.0 (assume non forking mpm)

c) my belief is that the mod_perl usage, is reason enough to work on this

d) there are no changes to the core suggested (if we are not to add configure probes and *_r)

e) modules that want to work under mod_perl 2.0 should be threadsafe, same with pseudo forking on windows

f) call me a bigot but I believe that supporting Win32, Linux and Solaris is good for a starter (mainly because I use those systems, but sounds like Tru64 shouldn't be a problem either, nor *bsd)

g) an event loop is not a replacement for threads, threads are not a replacement for event loops, there are event loops available to perl, Event.pm, POE, Tk, there are no threads available even if the support is there in the C layer

In the meantime, you can read more about what Artur's doing by looking at his use.perl journal. (A few of the Perl porters have journals on use.perl.org - have you?)

Relocatable Perl

Alan has been trying to get Perl relocatable on Solaris. As anyone who's administered a Unix system with Perl on will know, one major downer is that the paths in @INC are hard-coded into the perl binary. This makes it nearly impossible to pick up Perl and move the installation somewhere else.

Alan noticed two new things that will help him in his quest: "The first of these is the ability to refer to a library with a path that is relative to the executable (removing the need for LD_LIBRARY_PATH hackery) and the second is the ability to find out the path of the executable from inside the executable (removing the need for PERL5LIB hackery)." He then asked what would need changing before it would all just work.

Dan emphatically did not want to point out that he'd had a relocatable Perl on VMS for many years now. He explained how he would do it, moving the #defines into global strings and instantiating them at run-time. Nick Ing-Simmons pointed out that ActiveState's Win32 Perl already does this. Surprisingly, everyone agreed about how it should be done. Steve Fink asked if we could generalise it to other Unices with procfs by calling readlink("/proc/self/exe",...). Jarkko also gave us some useful tips about how programs ought to find themselves on Unix.
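Steve's readlink suggestion can be tried out from Perl itself. The sketch below is only an illustration of the idea, not the proposed core change: the two helper names and the prefix/lib/perl5 layout are assumptions made up for the example, and the fallback to $^X is only reliable when perl reports an absolute path for itself.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Find the running executable: ask procfs where it exists,
# otherwise fall back to whatever perl reports in $^X.
sub executable_path {
    my $link   = '/proc/self/exe';
    my $target = -l $link ? readlink($link) : undef;
    return File::Spec->rel2abs(defined $target ? $target : $^X);
}

# Derive a library directory relative to the executable, assuming
# a hypothetical <prefix>/bin/perl and <prefix>/lib/perl5 layout.
sub relocated_lib {
    my ($exe) = @_;
    my $prefix = dirname(dirname($exe));
    return File::Spec->catdir($prefix, 'lib', 'perl5');
}

print relocated_lib(executable_path()), "\n";
```

Because the directory is computed at run time, moving the whole installation tree moves the library path along with it.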

Quote of the week from Dan: "Remember, any design flaw you're sufficiently snide about becomes a feature."

Change 10,000

After four years in Perforce version control, Perl has finally seen its 10,000th registered patch. Jarkko has this to say about it:

The 10000th patch was courtesy of Andreas König. Yes, I made it so by selective patch ordering :-) -- but it doesn't diminish the significance of the choice.

Andreas does deserve all the possible glory for keeping the PAUSE running. The PAUSE is the major part of CPAN and I always feel bad when people talk of me as "the CPAN guy". CPAN wouldn't exist without Andreas. Also the wondrous CPAN.pm is his handiwork. Remember him every time you are using CPAN.pm and a module, including all its prerequisites, installs like magic. I know Andreas wants you to remember him also when the installation doesn't work *quite* that magically :-), send him the bug report and you can be certain that Andreas will fix the problem amazingly fast.

What Jarkko didn't say, however, is that over the past four years, nearly 50% of the patches in the repository have been due to him.

perl < /dev/random

Ilya found, unsurprisingly, that you can break Perl by feeding it random crud:

As shipped: it can survive circa 10000 evals of random input (of length 1000 each), i.e., I see a failure in the rate 1 per minute (athlon 850). The failure mode I see is an infinite loop. The debugging indicates that the actual failure rate may be much larger, but the corruption is not visible for some time, since the code executed in the script is so short, and (random?) memory corruption misses it for some time.

He produced a little patch to fix up an input overflow; Jarkko mentioned a neat idea relating to deliberately "poisoning" various parts of Perl - the allocation arena, for instance - and seeing how it coped. After checking Ilya's methodology, we found that the problem was that Perl got slowly more brain-damaged by more and more failed evals. The input overflow patch was applied, but nobody seems to know how Perl can read in too much input in the first place...

Module License Registration

After a brief and surprisingly sane debate about licensing, Andreas came up with his list of suggestions for the new "License" category of the CPAN module classification system.

His message on the subject explains it far better than I could; however, the categories he's chosen are:

     p   - Standard-Perl: user may choose between GPL and Artistic
     g   - GPL: GNU General Public License
     l   - LGPL: GNU Library or "Lesser" Public License
     b   - BSD: The BSD License
     a   - Artistic license alone
     o   - Any other Open Source licenses listed at http://www.opensource.org/licenses/
     d   - Not approved by www.opensource.org, but distribution allowed without restrictions
     r   - Some restriction on distribution
     n   - No license given

Various

Ask complained that the Perl Builder people were spamming - it turned out that they were actually just incompetent about handling other people's email addresses.

Andreas complained that bleadperl wasn't installing properly any more - the new version of s2p can also be used as a complete implementation of sed in Perl, so we've called it psed. Unfortunately, someone forgot to add code to make the symlink between s2p and psed.

Benjamin Sugars did some work to fix up the new

    open $fh, ">", \$scalar;

semantics, allowing Unix-like appending and seeking. He also asked what stat should do on a scalar-file; Jarkko warned against taking the metaphor too far: "And... link() should do alias-via-typeglob and symlink() should create a weak reference? :-)" Hugo noticed that the output of -Dt was occasionally incorrect, producing the wrong lexical variables; Benjamin took a look into this, with some help from Sarathy.
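For readers who have not met the feature, here is a minimal sketch of scalar-backed filehandles as they later shipped in Perl 5.8. The variable names are illustrative, and the semantics being patched in bleadperl at the time may have differed in detail.

```perl
use strict;
use warnings;

# Write to an in-memory "file" backed by a scalar.
my $buffer = '';
open(my $out, '>', \$buffer) or die "Cannot open in-memory handle: $!";
print {$out} "first line\n";
close($out);

# Append mode keeps the existing contents, as it would for a file on disk.
open(my $app, '>>', \$buffer) or die "Cannot append: $!";
print {$app} "second line\n";
close($app);

# Read everything back, then seek to the start and read the first line again.
open(my $in, '<', \$buffer) or die "Cannot read: $!";
my @lines = <$in>;
seek($in, 0, 0);
my $first_again = <$in>;
close($in);
```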

Benjamin also patched sv_dump to notice the GvSHARED flag, which prompted a short discussion of what GvSHARED means. Doug explained: "for example, our @EXPORT : shared makes the variables shared across interpreters and read-only. This is useful for things that take up lots of space, e.g. *POSIX::*EXPORT*. Without the shared attribute, they are copied. For POSIX.pm thats 132k * say 20 interpreters, [shared] gives savings there of about 2.6Mb. It's useful for any app where perl_clone() would be called, which at the moment includes embedded apps, Win32 fork() and anything that uses iThreads."

Tels found that building Perl was, for some reason, taking up a semi-infinite amount of disk space at the make depend stage; nobody knew why, since that code hadn't been touched for a long time. Jarkko told him to use 5.7.1 and the problem seems to have gone away.

As usual, Robin made B::Deparse sing, dance and do new tricks, and until next week I remain, your humble and obedient servant,


Simon Cozens

Larry Wall: Apocalypse Two


Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 02 for the latest information.

Perl 6 Apocalypse

The rest of the "Apocalypse" series is available on Larry Wall's author page.


Larry Wall will give his annual entertaining talk on the state of the Perl world, covering both Perl 5 and Perl 6, at this year's Open Source Convention. Don't miss this rare opportunity to hear the creator of Perl, patch, and rn share his insights.

Here's Apocalypse 2, meant to be read in conjunction with Chapter 2 of the Camel Book. The basic assumption is that if Chapter 2 talks about something that I don't discuss here, it doesn't change in Perl 6. (Of course, it could always just be an oversight. One might say that people who oversee things have a gift of oversight.)

Before I go further, I would like to thank all the victims, er, participants in the RFC process. (I beg special forgiveness from those whose brains I haven't been able to get inside well enough to incorporate their ideas). I would also like to particularly thank Damian Conway, who will recognize many of his systematic ideas here, including some that have been less than improved by my meddling.

Here are the RFCs covered:

    RFC  PSA  Title
    ---  ---  -----
      Textual
    005  cdr  Multiline Comments for Perl
    102  dcr  Inline Comments for Perl
      Types
    161  adb  Everything in Perl Becomes an Object
    038  bdb  Standardise Handling of Abnormal Numbers Like Infinities and NaNs
    043  bcb  Integrate BigInts (and BigRats) Support Tightly With the Basic Scalars
    192  ddr  Undef Values ne Value
    212  rrb  Make Length(@array) Work
    218  bcc  C<my Dog $spot> Is Just an Assertion
      Variables
    071  aaa  Legacy Perl $pkg'var Should Die
    009  bfr  Highlander Variable Types
    133  bcr  Alternate Syntax for Variable Names
    134  bcc  Alternative Array and Hash Slicing
    196  bcb  More Direct Syntax for Hashes
    201  bcr  Hash Slicing
      Strings
    105  aaa  Remove "In string @ must be \@" Fatal Error
    111  aaa  Here Docs Terminators (Was Whitespace and Here Docs)
    162  abb  Heredoc Contents
    139  cfr  Allow Calling Any Function With a Syntax Like s///
    222  abb  Interpolation of Object Method Calls
    226  acr  Selective Interpolation in Single Quotish Context
    237  adc  Hashes Should Interpolate in Double-Quoted Strings
    251  acr  Interpolation of Class Method Calls
    252  abb  Interpolation of Subroutines
    327  dbr  C<\v> for Vertical Tab
    328  bcr  Single Quotes Don't Interpolate \' and \\
      Files
    034  aaa  Angle Brackets Should Not Be Used for File Globbing
    051  ccr  Angle Brackets Should Accept Filenames and Lists
      Lists
    175  rrb  Add C<list> Keyword to Force List Context (like C<scalar>)
      Retracted
    010  rr  Filehandles Should Use C<*> as a Type Prefix If Typeglobs Are Eliminated
    103  rr  Fix C<$pkg::$var> Precedence Issues With Parsing of C<::>
    109  rr  Less Line Noise - Let's Get Rid of @%
    245  rr  Add New C<empty> Keyword to DWIM for Clearing Values
    263  rr  Add Null() Keyword and Fundamental Data Type

Atoms

Perl 6 programs are notionally written in Unicode, and assume Unicode semantics by default even when they happen to be processing other character sets behind the scenes. Note that when we say that Perl is written in Unicode, we're speaking of an abstract character set, not any particular encoding. (The typical program will likely be written in UTF-8 in the West, and in some 16-bit character set in the East.)


Molecules

RFC 005: Multiline Comments for Perl

I admit to being prejudiced on this one -- I was unduly influenced at a tender age by the rationale for the design of Ada, which made a good case, I thought, for leaving multiline comments out of the language.

But even if I weren't blindly prejudiced, I suspect I'd look at the psychology of the thing, and notice that much of the time, even in languages that have multiline comments, people nevertheless tend to use them like this:

    /*
     *  Natter, natter, natter.
     *  Gromish, gromish, gromish.
     */

The counterargument to that is, of course, that people don't always do that in C, so why should they have to do it in Perl? And if there were no other way to do multiline comments in Perl, they'd have a stronger case. But there already is another way, albeit one rejected by this RFC as ``a workaround.''

But it seems to me that, rather than adding another kind of comment or trying to make something that looks like code behave like a comment, the solution is simply to fix whatever is wrong with POD so that its use for commenting can no longer be considered a workaround. Actual design of POD can be put off till Apocalypse 26, but we can speculate at this point that the rules for switching back and forth between POD and Perl are suboptimal for use in comments. If so, then it's likely that in Perl 6 we'll have a rule like this: If a =begin MUMBLE transitions from Perl to POD mode then the corresponding =end MUMBLE should transition back (without a =cut directive).
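For comparison, this is roughly how the POD "workaround" looks in Perl 5, where the parser keeps skipping text until it sees =cut. It is only a sketch: the "comment" label is an arbitrary choice of MUMBLE, and under the rule floated above the trailing =cut would presumably become unnecessary.

```perl
use strict;
use warnings;

my $total = 1;

=begin comment

Natter, natter, natter.
Gromish, gromish, gromish.
$total = 99;   # never executed: the parser treats this whole region as POD

=end comment

=cut

$total += 1;   # execution resumes here, after =cut
```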

Note that we haven't defined our MUMBLEs yet, but they can be set up to let our program have any sort of programmatic access to the data that we desire. For instance, it is likely that comments of this kind could be tied in with some sort of literate (or at least, semiliterate) programming framework.

RFC 102: Inline Comments for Perl

I have never much liked inline comments -- as commonly practiced they tend to obfuscate the code as much as they clarify it. That being said, ``All is fair if you predeclare.'' So there should be nothing preventing someone from writing a lexer regex that handles them, provided we make the lexer sufficiently mutable. Which we will. (As it happens, the character sequence ``/*'' will be unlikely to occur in standard Perl 6. Which I guess means it is likely to occur in nonstandard Perl 6. :-)

A pragma declaring nonstandard commenting would also allow people to use /* */ for multiline comments, if they like. (But I still think it'd be better to use POD directives for that, just to keep the text accessible to the program.)


Built-In Data Types

The basic change here is that, rather than just supporting scalars, arrays and hashes, Perl 6 supports opaque objects as a fourth fundamental data type. (You might think of them as pseudo-hashes done right.) While a class can access its object attributes any way it likes, all external access to opaque objects occurs through methods, even for attributes. (This guarantees that attribute inheritance works correctly.)

While Perl 6 still defaults to typeless scalars, Perl will be able to give you more performance and safety as you give it more type information to work with. The basic assumption is that homogenous data structures will be in arrays and hashes, so you can declare the type of the scalars held in an array or hash. Heterogenous structures can still be put into typeless arrays and hashes, but in general Perl 6 will encourage you to use classes for such data, much as C encourages you to use structs rather than arrays for such data.

One thing we'll be mentioning before we discuss it in detail is the notion of ``properties.'' (In Perl 5, we called these ``attributes,'' but we're reserving that term for actual object attributes these days, so we'll call these things ``properties.'') Variables and values can have additional data associated with them that is ``out of band'' with respect to the ordinary typology of the variable or value. For now, just think of properties as a way of adding ad hoc attributes to a class that doesn't support them. You could also think of it as a form of class derivation at the granularity of the individual object, without having to declare a complete new class.

RFC 161: Everything in Perl Becomes an Object

This is essentially a philosophical RFC that is rather short on detail. Nonetheless, I agree with the premise that all Perl objects should act like objects if you choose to treat them that way. If you choose not to treat them as objects, then Perl will try to go along with that, too. (You may use hash subscripting and slicing syntax to call attribute accessors, for instance, even if the attributes themselves are not stored in a hash.) Just because Perl 6 is more object-oriented internally, does not mean you'll be forced to think in object-oriented terms when you don't want to. (By and large, there will be a few places where OO-think is more required in Perl 6 than in Perl 5. Filehandles are more object-oriented in Perl 6, for instance, and the special variables that used to be magically associated with the currently selected output handle are better specified by association with a specific filehandle.)

RFC 038: Standardise Handling Of Abnormal Numbers Like Infinities and NaNs

This is likely to slow down numeric processing in some locations. Perhaps it could be turned off when desirable. We need to be careful not to invent something that is guaranteed to run slower than IEEE floating point. We should also try to avoid defining a type system that makes translation of numeric types to Java or C# types problematic.

That being said, standard semantics are a good thing, and should be the default behavior.

RFC 043: Integrate BigInts (and BigRats) Support Tightly With the Basic Scalars

This RFC suggests that a pragma enables the feature, but I think it should probably be tied to the run-time type system, which means it's driven more by how the data is created than by where it happens to be stored or processed. I don't see how we can make it a pragma, except perhaps to influence the meaning of ``int'' and ``num'' in actual declarations further on in the lexical scope:

    use bigint;
    my int $i;

might really mean

    my bigint $i;

or maybe just

    my int $i is bigint;

since representation specifications might just be considered part of the ``fine print.'' But the whole subject of lexically scoped variable properties specifying the nature of the objects they contain is a bit problematic. A variable is a sort of mini-interface, a contract if you will, between the program and the object in question. Properties that merely influence how the program sees the object are not a problem -- when you declare a variable to be constant, you're promising not to modify the object through that variable, rather than saying something intrinsically true about the object. (Not that there aren't objects that are intrinsically constant.)

Other property declarations might need to have some say in how constructors are called in order to guarantee consistency between the variable's view of the object, and the nature of the object itself. In the worst case we could try to enforce consistency at run time, but that's apt to be slow. If every assignment of a Dog object to a Mammal variable has to check to see whether Dog is a Mammal, then the assignment is going to be a dog.

So we'll have to revisit this when we're defining the relationship between variable declarations and constructors. In any event, if we don't make Perl's numeric types automatically promote to big representations, we should at least make it easy to specify it when you want that to happen.
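To make the "driven by how the data is created" point concrete, here is a small Perl 5 sketch using the core Math::BigInt module rather than any Perl 6 syntax. The big representation travels with the value because of how it was constructed, not because of a pragma covering the code that later processes it. Variable names are illustrative.

```perl
use strict;
use warnings;
use Math::BigInt;

# A native number: 2**100 is held as a floating-point approximation.
my $native = 2**100;

# A value created as a big integer keeps exact precision...
my $big = Math::BigInt->new(2)->bpow(100);

# ...and ordinary operators on it yield big integers too, via overloading.
my $sum = $big + 1;

print "native: $native\n";
print "big:    $big\n";
print "sum:    $sum\n";
```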

RFC 192: Undef Values ne Value

I've rejected this one, because I think something that's undefined should be considered just that, undefined. I think the standard semantics are useful for catching many kinds of errors.

That being said, it'll hopefully be easy to modify the standard operators within a particular scope, so I don't think we need to think that our way to think is the only way to think, I think.

RFC 212: Make length(@array) Work

Here's an oddity, an RFC that the author retracted, but that I accept, more or less. I think length(@array) should be equivalent to @array.length(), so if there's a length method available, it should be called.

The question is whether there should be a length method at all, for strings or arrays. It almost makes more sense for arrays than it does for strings these days, because when you talk about the length of a string, you need to know whether you're talking about byte length or character length. So we may split up the traditional length function into two, in which case we might end up with:

    $foo.chars
    $foo.bytes
    @foo.elems

Or some such. Whatever the method names we choose, differentiating them would be more powerful in supplying context. For instance, one could envision calling @foo.bytes to return the byte length of all the strings. That wouldn't fly if we overloaded the method name.

Even chars($foo) might not be sufficiently precise, since, depending on how you're processing Unicode, you might want to know how long the string is in actual characters, not counting combining characters that don't take extra space. But that's a topic for later.
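The three notions of length can already be told apart in Perl 5, if less conveniently. The following is a sketch with the core Encode module, assuming UTF-8 as the byte encoding; the variable names simply mirror the hypothetical method names above.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $foo   = "caf\x{e9} \x{263A}";              # c, a, f, e-acute, space, smiley
my $chars = length($foo);                      # length in characters (code points)
my $bytes = length(encode('UTF-8', $foo));     # length in bytes once encoded as UTF-8

my @foo   = ('a', 'bb', 'ccc');
my $elems = scalar(@foo);                      # number of array elements

# A base letter plus a combining accent: two code points, one visible character.
my $combined   = "e\x{301}";
my $codepoints = length($combined);
my $graphemes  = () = $combined =~ /\X/g;      # \X matches one extended grapheme cluster
```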

RFC 218: my Dog $spot Is Just an Assertion

I expect that a declaration of the form:

    my Dog $spot;

is merely an assertion that you will not use $spot inconsistently with it being a Dog. (But I mean something different by ``assertion'' than this RFC does.) This assertion may or may not be tested at every assignment to $spot, depending on pragmatic context. This bare declaration does not call a constructor; however, there may be forms of declaration that do. This may be necessary so that the variable and the object can pass properties back and forth, and in general, make sure they're consistent with each other. For example, you might declare an array with a multidimensional shape, and this shape property needs to be visible to the constructor, if we don't want to have to specify it redundantly.

On the other hand, we might be able to get assignment sufficiently overloaded to accomplish the same goal, so I'm deferring judgment on that. All I'm deciding here is that a bare declaration without arguments as above does not invoke a constructor, but merely tells the compiler something.
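Perl 5 already accepts this declaration syntax, and there too the bare form calls no constructor. A minimal sketch follows; the Dog class and its $constructed counter are hypothetical, added only to make the absence of a constructor call observable.

```perl
use strict;
use warnings;

package Dog;

my $constructed = 0;    # counts constructor calls (illustrative only)

sub new {
    my ($class) = @_;
    $constructed++;
    return bless {}, $class;
}

package main;

my Dog $spot;                               # typed declaration: no constructor runs
my $before      = $constructed;             # still zero
my $was_defined = defined($spot) ? 1 : 0;   # $spot holds undef so far

$spot = Dog->new;                           # the object exists only after an explicit call
my $after = $constructed;
```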

Other Decisions About Types

Built-in object types will be in all uppercase: INTEGER, NUMBER, STRING, REF, SCALAR, ARRAY, HASH, REGEX and CODE. Corresponding to at least some of these, there will also be lowercase intrinsic types, such as int, num, str and ref. Use of the lowercase typename implies you aren't intending to do anything fancy OO-wise with the values, or store any run-time properties, and thus Perl should feel free to store them compactly. (As a limiting case, objects of type bit can be stored in one bit.) This distinction corresponds roughly to the boxed/unboxed distinction of other computer languages, but it is likely that Perl 6 will attempt to erase the distinction for you to the extent possible. So, for instance, an int may still be used in a string context, and Perl will convert it for you, but it won't cache it, so the next time you use it as a string, it will have to convert again.

The declared type of an array or hash specifies the type of each element, not the type of an array or hash as a whole. This is justified by the notion that an array or hash is really just a strange kind of function that (typically) takes a subscript as an argument and returns a value of a particular type. If you wish to associate a type with the array or hash as a whole, that involves setting a tie property. If you find yourself wishing to declare different types on different elements, it probably means that you should either be using a class for the whole heterogenous thing, or at least declare the type of array or hash that will be a base class of all the objects it will contain.

Of course, untyped arrays and hashes will be just as acceptable as they are currently. But a language can only run so fast when you force it to defer all type checking and method lookup till run time.

The intent is to make use of type information where it's useful, and not require it where it's not. Besides performance and safety, one other place where type information is useful is in writing interfaces to other languages. It is postulated that Perl 6 will provide enough optional type declaration syntax that it will be unnecessary to write XS-style glue in most cases.


Variables

RFC 071: Legacy Perl $pkg'var Should Die

I agree. I was unduly influenced by Ada syntax here, and it was a mistake. And although we're adding a properties feature into Perl 6 that is much like Ada's attribute feature, we won't make the mistake of reintroducing a syntax that drives highlighting editors nuts. We'll try to make different mistakes this time.

RFC 009: Highlander Variable Types

I basically agree with the problem this RFC is trying to solve, but I disagree with the proposed solution. The basic problem is that, while the idiomatic association of $foo[$bar] with @foo rather than $foo worked fine in Perl 4, when we added recursive data structures to Perl 5, it started getting in the way notationally, so that initial funny character was trying to do too much in both introducing the ``root'' of the reference, as well as the context to apply to the final subscript. This necessitated odd looking constructions like:

    $foo->[1][2][3]

This RFC proposes to solve the dilemma by unifying scalar variables with arrays and hashes at the name level. But I think people like to think of $foo, @foo and %foo as separate variables, so I don't want to break that. Plus, the RFC doesn't unify &foo, while it's perfectly possible to have a reference to a function as well as a reference to the more ordinary data structures.

So rather than unifying the names, I believe all we have to do is unify the treatment of variables with respect to references. That is, all variables may be thought of as references, not just scalars. And in that case, subscripts always dereference the reference implicit in the array or hash named on the left.

This has two major implications, however. It means that Perl programmers must learn to write @foo[1] where they used to write $foo[1]. I think most Perl 5 people will be able to get used to this, since many of them found the current syntax a bit weird in the first place.

The second implication is that slicing needs a new notation, because subscripts no longer have their scalar/list context controlled by the initial funny character. Instead, the context of the subscript will need to be controlled by some combination of:

  1. Context of the entire term.

  2. Appearance of known list operators in the subscript, such as comma or range.

  3. Explicit syntax casting the inside of the subscript to list or scalar context.

  4. Explicit declaration of default behavior.

One thing that probably shouldn't enter into it is the run-time type of the array object, because context really needs to be calculated at compile time if at all possible.

In any event, it's likely that some people will want subscripts to default to scalars, and other people will want them to default to lists. There are good arguments for either default, depending on whether you think more like an APL programmer or a mere mortal.
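As a reminder of the Perl 5 behavior being replaced, here is a short sketch in which the leading funny character selects both the variable and the context of the subscript. The data is made up for illustration.

```perl
use strict;
use warnings;

my @foo = (10, 20, 30, 40);
my $foo = [ 0, [ 0, 0, [ 'a', 'b', 'c', 'd' ] ] ];   # a scalar unrelated to @foo

my $element = $foo[1];            # "$" plus [...] means one element of @foo
my @slice   = @foo[1, 2];         # "@" plus [...] means a slice of @foo
my $nested  = $foo->[1][2][3];    # the arrow is needed to start from the scalar $foo
```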

There are other larger implications. If composite variables are thought of as scalar references, then the names @foo and %foo are really scalar variables unless explicitly dereferenced. That means that when you mention them in a scalar context, you get the equivalent of Perl 5's \@foo and \%foo. This simplifies the prototyping system greatly, in that an operator like push no longer needs to specify some kind of special reference context for its first argument -- it can merely specify a scalar context, and that's good enough to assume the reference generation on its first argument. (Of course, the function signature can always be more specific if it wants to. More about that in future installments.)
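In Perl 5 terms, the special reference context that push needs is spelled with the \@ prototype character. A minimal sketch of a user-defined equivalent is below; my_push is a hypothetical name, not a proposed built-in.

```perl
use strict;
use warnings;

# The \@ prototype makes Perl pass a reference to the first array argument
# instead of flattening it into the argument list.
sub my_push (\@@) {
    my ($array_ref, @values) = @_;
    push @{$array_ref}, @values;
    return scalar @{$array_ref};
}

my @queue = (1, 2);
my $count = my_push(@queue, 3, 4);   # called with a plain array, like push itself
```

Under the scheme described above, a plain scalar context on the first parameter would be enough to produce the same reference.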

There are also implications for the assignment operator, in that it has to be possible to assign array references to array variables without accidentally invoking list context and copying the list instead of the reference to the list. We could invent another assignment operator to distinguish the two cases, but at the moment it looks as though bare variables and slices will behave as lvalues just as they do in Perl 5, while lists in parentheses will change to a binding of the right-hand arguments more closely resembling the way Perl 6 will bind formal arguments to actual arguments for function calls. That is to say,

    @foo = (1,2,3);

will supply an unbounded list context to the right side, but

    (@foo, @bar) = (@bar, @foo)

will supply a context to the right side that requests two scalar values that are array references. This will be the default for unmarked variables in an lvalue list, but there will be an easy way to mark formal array and hash parameters to slurp the rest of the arguments with list context, as they do by default in Perl 5.
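For contrast, here is what list assignment to two arrays does under Perl 5 rules, along with the reference-based swap one has to write by hand. This is a sketch only; the variable names are illustrative.

```perl
use strict;
use warnings;

my @foo = (1, 2);
my @bar = (3, 4, 5);

# Perl 5: the right side is flattened and the first array slurps everything.
my (@x, @y);
(@x, @y) = (@bar, @foo);

# Swapping by reference is the closest Perl 5 analogue of the proposed behavior.
my ($foo_ref, $bar_ref) = (\@foo, \@bar);
($foo_ref, $bar_ref) = ($bar_ref, $foo_ref);
```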

(Alternately, we might end up leaving the ordinary list assignment operator with Perl 5 semantics, and define a new assignment operator such as := that does signatured assignment. I can argue that one both ways.)

Just as arrays and hashes are explicitly dereferenced via subscripting (or implicitly dereferenced in list context), so too functions are merely named but not called by &foo, and explicitly dereferenced with parentheses (or by use as a bare name without the ampersand (or both)). The Perl 5 meanings of the ampersand are no longer in effect, in that ampersand will no longer imply that signature matching is suppressed -- there will be a different mechanism for that. And since &foo without parens doesn't do a call, it is no longer possible to use that syntax to automatically pass the @_ array -- you'll have to do that explicitly now with foo(@_).
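The Perl 5 behavior that goes away can be seen in a few lines. This is a sketch with hypothetical sub names; it shows the implicit sharing of @_ alongside the explicit form that remains.

```perl
use strict;
use warnings;

sub count_args { return scalar @_ }

sub implicit   { return &count_args }        # Perl 5: reuses the caller's @_
sub explicit   { return count_args(@_) }     # passes @_ on purpose
sub empty_call { return count_args() }       # passes nothing

my $implicit_count = implicit(1, 2, 3);
my $explicit_count = explicit(1, 2, 3);
my $empty_count    = empty_call(1, 2, 3);

# Naming a sub without calling it already requires a backslash in Perl 5.
my $code_ref  = \&count_args;
my $via_ref   = $code_ref->('a', 'b');
```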

Scalar variables are special, in that they may hold either references or actual ``native'' values, and there is no special dereference syntax as there is for other types. Perl 6 will attempt to hide the distinction as much as possible. That is, if $foo contains a native integer, calling the $foo.bar method will call a method on the built-in type. But if $foo contains a reference to some other object, it will call the method on that object. This is consistent with the way we think about overloading in Perl 5, so you shouldn't find this behavior surprising. It may take special syntax to get at any methods of the reference variable itself in this case, but it's OK if special cases are special.

RFC 133: Alternate Syntax for Variable Names

This RFC has a valid point, but in fact we're going to do just the opposite of what it suggests. That is, we'll consider the funny characters to be part of the name, and use the subscripts for context. This works out better, because there's only one funny character, but many possible forms of dereferencing.

RFC 134: Alternative Array and Hash Slicing

We're definitely killing Perl 5's slice syntax, at least as far as relying on the initial character to determine the context of the subscript. There are many ways we could reintroduce a slicing syntax, some of which are mentioned in this RFC, but we'll defer the decision on that till Apocalypse 9 on Data Structures, since the interesting parts of designing slice syntax will be driven by the need to slice multidimensional arrays.

For now we'll just say that arrays can have subscript signatures much like functions have parameter signatures. Ordinary one-dimensional arrays (and hashes) can then support some kind of simple slicing syntax that can be extended for more complicated arrays, while allowing multidimensional arrays to distinguish between simple slicing and complicated mappings of lists and functions onto subscripts in a manner more conducive to numerical programming.

On the subject of hash slices returning pairs rather than values, we could distinguish this with special slice syntax, or we could establish the notion of a hashlist context that tells the slice to return pairs rather than just values. (We may not need a special slice syntax for that if it's possible to typecast back and forth between pair lists and ordinary lists.)

RFC 196: More Direct Syntax for Hashes

This RFC makes three proposals, which we'll consider separately.

Proposal 1 is ``that a hash in scalar context evaluate to the number of keys in the hash.'' (You can find that out now, but only by using the keys() function in scalar context.) Proposal 1 is OK if we change ``scalar context'' to ``numeric context,'' since in scalar context a hash will produce a reference to the hash, which just happens to numify to the number of entries.

We must also realize that some implementations of hash might have to go through and count all the entries to return the actual number. Fortunately, in boolean context, it suffices to find a single entry to determine whether the hash contains anything. However, on hashes that don't keep track of the number of entries, finding even one entry might reset any active iterator on the hash, since some implementations of hash (in particular, the ones that don't keep track of the number of entries) may only supply a single iterator.

Proposal 2 is ``that the iterator in a hash be reset through an explicit call to the reset() function.'' That's fine, with the proviso that it won't be a function, but rather a method on the HASH class.
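Both proposals have Perl 5 counterparts that may help in reading them. This sketch uses keys in scalar context for the count, and keys in void context as the customary way to reset the iterator shared with each. The hash contents are made up.

```perl
use strict;
use warnings;

my %h = (apple => 1, banana => 2, cherry => 3);

my $count     = keys %h;         # number of keys, via keys() in scalar context
my $non_empty = %h ? 1 : 0;      # boolean test: does the hash contain anything?

my ($first) = each %h;           # advance the hash's single iterator by one entry
keys %h;                         # void context: resets that iterator
my ($again) = each %h;           # iteration starts over from the same entry
```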

Proposal 3 is really about sort recognizing pairs and doing the right thing. Defaulting to sorting on $^a[0] cmp $^b[0] is likely to be reasonable, and that's where a pair's key would be found. However, it's probable that the correct solution is simply to provide a default string method for anonymous lists that happens to produce a decent key to sort on when cmp requests a string representation of either of its arguments. The sort itself should probably just concentrate on memoizing the returned strings so they don't have to be recalculated.
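In Perl 5 notation, where $a and $b play the role of $^a and $^b, sorting a list of pairs on the key looks like the sketch below. The pair representation as two-element anonymous arrays is an assumption made for illustration.

```perl
use strict;
use warnings;

my %age = (spot => 3, rex => 7, fido => 5);

# Represent each pair as a two-element anonymous array: [ key, value ].
my @pairs = map { [ $_, $age{$_} ] } keys %age;

# Compare on element 0, which is where the pair's key lives.
my @sorted = sort { $a->[0] cmp $b->[0] } @pairs;

my @sorted_keys   = map { $_->[0] } @sorted;
my @sorted_values = map { $_->[1] } @sorted;
```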

RFC 201: Hash Slicing

This RFC proposes to use % as a marker for special hash slicing in the subscript. Unfortunately, the % funny character will not be available for this use, since all hash refs will start with %. Concise list comprehensions will require some other syntax within the subscript, which will hopefully generalize to arrays as well.

Other Decisions About Variables

Various special punctuation variables are gone in Perl 6, including all the deprecated ones. (Non-deprecated variables will be replaced by some kind of similar functionality that is likely to be invoked through some kind of method call on the appropriate object. If there is no appropriate object, then a named global variable might provide similar functionality.)

Freeing up the various bracketing characters allows us to use them for other purposes, such as interpolation of expressions:

    "$(expr)"           # interpolate a scalar expression
    "@(expr)"           # interpolate a list expression

$#foo is gone. If you want the final subscript of an array, and [-1] isn't good enough, use @foo.end instead.
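For reference, these are the Perl 5 idioms that the new interpolation forms and @foo.end would replace. A sketch with made-up data:

```perl
use strict;
use warnings;

my @foo = (10, 20, 30);

# Perl 5 workarounds for interpolating expressions into a string:
my $scalar_text = "count: ${\ scalar(@foo) }";             # reference-to-scalar trick
my $list_text   = "doubled: @{[ map { $_ * 2 } @foo ]}";   # anonymous-array trick

# Perl 5 ways of reaching the end of an array:
my $last_index = $#foo;       # final subscript
my $last_value = $foo[-1];    # final element
```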

Other special variables (such as the regex variables) will change from dynamic scoping to lexical scoping. It is likely that even $_ and @_ will be lexically scoped in Perl 6.


Names

In Perl 5, lexical scopes are unnamed and unnameable. In Perl 6, the current lexical scope will have a name that is visible within the lexical scope as the pseudo class MY, so that such a scope can, if it so chooses, delegate management of its lexical scope to some other module at compile time. In normal terms, that means that when you use a module, you can let it import things lexically as well as packagely.

Typeglobs are gone. Instead, you can get at a variable object through the symbol table hashes that are structured much like Perl 5's. The variable object for $MyPackage::foo is stored in:

    %MyPackage::{'$foo'}

Note that the funny character is part of the name. There is no longer any structure in Perl that associates everything with the name ``foo''.
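The Perl 5 arrangement being retired is easy to demonstrate: one symbol table entry, keyed by the bare name, holds a typeglob with a slot for each kind of variable. A minimal sketch, with MyPackage and its variables invented for the example:

```perl
use strict;
use warnings;

package MyPackage;
our $foo = 42;
our @foo = (1, 2, 3);

package main;

# In Perl 5 the stash key has no funny character...
my $has_entry = exists $MyPackage::{foo} ? 1 : 0;

# ...and the typeglob stored there groups everything named "foo".
my $scalar_ref = *{ $MyPackage::{foo} }{SCALAR};
my $array_ref  = *{ $MyPackage::{foo} }{ARRAY};
```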

Perl's special global names are stored in a special package named ``*'' because they're logically in every scope that does not hide them. So the unambiguous name of the standard input filehandle is $*STDIN, but a package may just refer to $STDIN, and it will default to $*STDIN if no package or lexical variable of that name has been declared.

Some of these special variables may actually be cloned for each lexical scope or each thread, so just because a name is in the special global symbol table doesn't mean it always behaves as a global across all modules. In particular, changes to the symbol table that affect how the parser works must be lexically scoped. Just because I install a special rule for my cool new hyperquoting construct doesn't mean everyone else should have to put up with it. In the limiting case, just because I install a Python parser, it shouldn't force other modules into a maze of twisty little whitespace, all alike.

Another way to look at it is that all names in the ``*'' package are automatically exported to every package and/or outer lexical scope.


Literals

Underscores in Numeric Literals

Underscores will be allowed between any two digits within a number.
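Perl 5 already accepts underscores as digit separators in literals such as the following (a brief sketch; the values are arbitrary):

```perl
use strict;
use warnings;

my $million = 1_000_000;        # decimal integer
my $mask    = 0xFF_FF;          # hexadecimal
my $approx  = 3.141_592_653;    # digits after the decimal point
```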

RFC 105: Remove ``In string @ must be \@'' Fatal Error

Fine.

RFC 111: Here Docs Terminators (Was Whitespace and Here Docs)

Fine.

RFC 162: Heredoc Contents

I think I like option (e) the best: remove whitespace equivalent to the terminator.

By default, if it has to dwim, it should dwim assuming that hard tabs are 8 spaces wide. This should not generally pose a problem, since most of the time the tabbing will be consistent throughout anyway, and no dwimming will be necessary. This puts the onus on people using nonstandard tabs to make sure they're consistent so that Perl doesn't have to guess.

Any additional mangling can easily be accomplished by a user-defined operator.
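As an illustration of that kind of mangling in today's Perl 5, here is a sketch of a helper that strips indentation from a here-doc. The name dedent is hypothetical. Because the terminator here sits in column 0, the helper uses the indentation of the first body line rather than that of the terminator as option (e) would, and it does no tab expansion.

```perl
use strict;
use warnings;

# Remove the first line's leading whitespace from every line of the text.
sub dedent {
    my ($text) = @_;
    my ($indent) = $text =~ /\A([ \t]*)/;   # leading whitespace of the first line
    $text =~ s/^\Q$indent\E//mg;            # strip exactly that prefix from each line
    return $text;
}

my $message = dedent(<<'END');
    Natter, natter, natter.
      Gromish, gromish, gromish.
END
```

Relative indentation inside the body survives, so the second line keeps its two extra spaces.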

RFC 139: Allow Calling Any Function With a Syntax Like s///

Creative quoting will be allowed with lexical mutation, but we can't parse foo(bar) two different ways simultaneously, and I'm unwilling to prevent people from using parens as quote characters. I don't see how we can reasonably have new quote operators without explicit declaration. And if the utility of a quote-like operator is sufficient, there should be little relative burden in requiring such a declaration.

The form of such a declaration is left to the reader as an exercise in function property definition. We may revisit the question later in this series. It's also possible that a quote operator such as qx// could have a corresponding function name like quote:qx that could be invoked as a function.

RFC 222: Interpolation of Object Method Calls

I've been hankering for methods to interpolate for a long time, so I'm in favor of this RFC. And it'll become doubly important as we move toward encouraging people to use accessor methods to refer to object attributes outside the class itself.

I have one ``but,'' however. Since we'll switch to using . instead of ->, I think for sanity's sake we may have to require the parentheses, or ``$file.$ext'' is going to give people fits. Not to mention ``$file.ext''.

RFC 226: Selective Interpolation in Single Quotish Context.

This proposal has much going for it, but there are also difficulties, and I've come close to rejecting it outright simply because the single-quoting policy of Perl 5 has been successful. And I think the proposal in this RFC for \I...\E is ugly. (And I'd like to kill \E anyway, and use bracketed scopings.)

However, I think there is a major ``can't get there from here'' that we could solve by treating interpolation into single quotes as something hard, not something easy. The basic problem is that it's too easy to run into a \$ or \@ (or a \I for that matter) that wants to be taken literally. I think we could allow the interpolation of arbitrary expressions into single-quoted strings, but only if we limit it to an unlikely sequence where three or more characters are necessary for recognition. The most efficient mental model would seem to be the idea of embedding one kind of quote in another, so I think this:

    \q{stuff}

will embed single-quoted stuff, while this:

    \qq{stuff}

will embed double-quoted stuff. A variable could then be interpolated into a single-quoted string by saying:

    \qq{$foo}

RFC 237: Hashes Should Interpolate in Double-Quoted Strings

I agree with this RFC in principle, but we can't define the default hash stringifier in terms of variables that are going away in Perl 6, so the RFC's proposal of using $" is right out.

All objects should have a method by which they produce readable output. How this may be overridden by user preference is open to debate. Certainly, dynamic scoping has its problems. But lexical override of an object's preferences is also problematic. Individual object properties appear to give a decent way out of this. More on that below.

On printf formats, I don't see any way to dwim that %d isn't an array, so we'll just have to put formats into single quotes in general. Those format strings that also interpolate variables will be able to use the new \qq{$var} feature.

Note for those who are thinking we should just stick with Perl 5 interpolation rules: We have to allow % to introduce interpolation now because individual hash values are no longer named with $foo{$bar}, but rather %foo{$bar}. So we might as well allow interpolation of complete hashes.

RFC 251: Interpolation of Class Method Calls

Class method calls are relatively rare (except for constructors, which will be rarely interpolated). So rather than scanning for identifiers that might introduce a class, I think we should just depend on expression interpolation instead:

    "There are $(Dog.numdogs) dogs."

RFC 252: Interpolation of Subroutines

I think subroutines should interpolate, provided they're introduced with the funny character. (On the other hand, how hard is $(sunset $date) or @(sunset $date)? On the gripping hand, I like the consistency of & with $, @ and %.)

I think the parens are required, since in Perl 6, scalar &sub will just return a reference, and require parens if you really want to deref the sub ref. (It's true that a subroutine can be called without parens when used as a list operator, but you can't interpolate those without a funny character.)

For those worried about the use of & for signature checking suppression, we should point out that & will no longer be the way to suppress signature checking in Perl 6, so it doesn't matter.

RFC 327: \v for Vertical Tab

I think the opportunity cost of not reserving \v for future use is too high to justify the small utility of retaining compatibility with a feature virtually nobody uses anymore. For instance, I almost used \v and \V for switching into and out of verbatim (single-quote) mode, until I decided to unify that with quoting syntax and use \qq{} and \q{} instead.

RFC 328: Single quotes don't interpolate \' and \\

I think hyperquotes will be possible with a declaration of your quoting rules, so we're not going to change the basic single-quote rules (except for supporting \q).

Other Decisions About Literals

Scoping of \L et al.

I'd like to get rid of the gratuitously ugly \E as an end-of-scope marker. Instead, if any sequence such as \L, \U or \Q wishes to impose a scope, then it must use curlies around that scope: \L{stuff}, \U{stuff} or \Q{stuff}. Any literal curlies contained in stuff must be backslashed. (Curlies as syntax (such as for subscripts) should nest correctly.)

Bareword Policy

There will be no barewords in Perl 6. Any bare name that is a declared package name will be interpreted as a class object that happens to stringify to the package name. All other bare names will be interpreted as subroutine or method calls. For nonstrict applications, undefined subroutines will autodefine themselves to return their own name. Note that in ${name} and friends, the name is considered autoquoted, not a bareword.

Weird brackets

Use of brackets to disambiguate

    "${foo[bar]}"

from

    "${foo}[bar]"

will no longer be supported. Instead, the expression parser will always grab as much as it can, and you can make it quit at a particular point by interpolating a null string, specified by \Q:

    "$foo\Q[bar]"

Special tokens

Special tokens will turn into either POD directives or lexically scoped OO methods under the MY pseudo-package:

    Old                 New
    ---                 ---
    __LINE__            MY.line
    __FILE__            MY.file
    __PACKAGE__         MY.package
    __END__             =begin END      (or remove)
    __DATA__            =begin DATA

Heredoc Syntax

I think heredocs will require quotes around any identifier, and we need to be sure to support << qq(END) style quotes. Space is now allowed before the (required) quoted token. Note that custom quoting is now possible, so if you define a fancy qh operator for your fancy hyperquoting algorithm, then you could say <<qh(END).

It is still the case that you can say <<"" to grab everything up to the next blank line. However, Perl 6 will consider any line containing only spaces, tabs, etc., to be blank, not just the ones that immediately terminate with newline.


Context

In Perl 5, a lot of contextual processing was done at run-time, and even then, a given function could only discover whether it was in void, scalar or list context. In Perl 6, we will extend the notion of context to be more amenable to both compile-time and run-time analysis. In particular, a function or method can know (theoretically even at compile time) when it is being called in:

    Void context
    Scalar context
        Boolean context
        Integer context
        Numeric context
        String context
        Object context
    List context
        Flattening list context (true list context).
        Non-flattening list context (list of scalars/objects)
        Lazy list context (list of closures)
        Hash list context (list of pairs)

(This list isn't necessarily exhaustive.)
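For comparison, here is a small Perl 5 sketch of the three contexts a function can currently detect at run time, using the built-in wantarray. The sub name `context` and the `@seen` log are illustrative only.

```perl
use strict;
use warnings;

my @seen;    # records the context of each call, in order

# Report which of Perl 5's three run-time contexts the caller supplied.
sub context {
    my $context = wantarray          ? 'list'      # true: list context
                : defined(wantarray) ? 'scalar'    # false but defined: scalar context
                :                      'void';     # undef: void context
    push @seen, $context;
    return $context;
}

my @list   = context();    # list assignment supplies list context
my $scalar = context();    # scalar assignment supplies scalar context
context();                 # a bare statement supplies void context

print "@seen\n";           # list scalar void
```

Anything finer-grained than this (boolean, integer, string and so on) is invisible to a Perl 5 function, which is the gap the list above is meant to close.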

Each of these contexts (except maybe void) corresponds to a way in which you might declare the parameters of a function (or the left side of a list assignment) to supply context to the actual argument list (or right side of a list assignment). By default, parameters will supply object context, meaning individual parameters expect to be aliases to the actual parameters, and even arrays and hashes don't do list context unless you explicitly declare them to. These aren't cast in stone yet (or even Jello), but here are some ideas for possible parameter declarations corresponding to those contexts:

    Scalar context
        Boolean context                 bit $arg
        Integer context                 int $arg
        Numeric context                 num $arg
        String context                  str $arg
        Object context                  $scalar, %hash, Dog @canines, &foo
    List context
        Flattening list context         *@args
        Non-flattening list context     $@args
        Lazy list context               &@args
        Hash list context               *%args

(I also expect unary * to force flattening of arrays in rvalue contexts. This is how we defeat the type signature in Perl 6, instead of relying on the initial ampersand. So instead of Perl 5's &push(@list), you could just say push *@list, and it wouldn't matter what push's parameter signature said.)

It's also possible to define properties to modify formal arguments, though that can get clunky pretty quickly, and I'd like to have a concise syntax for the common cases, such as the last parameter slurping a list in the customary fashion. So the signature for the built-in push could be

    sub push (@array, *@pushees);

Actually, the signature might just be (*@pushees), if push is really a method in the ARRAY class, and the object is passed implicitly:

    class ARRAY;
    sub .push (*@pushees);
    sub .pop (;int $numtopop);
    sub .splice (int $offset, int $len, *@repl);

But we're getting ahead of ourselves.

By the way, all function and method parameters (other than the object itself) will be considered read-only unless declared with the rw property. (List assignments will default the other way.) This will prevent a great deal of the wasted motion current Perl implementations have to go through to make sure all function arguments are valid lvalues, when most of them are in fact never modified.
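To see why Perl 5 has to treat every argument as a potential lvalue, recall that the elements of @_ are aliases to the caller's values. The following Perl 5 sketch (sub names are illustrative) contrasts modifying an argument in place with the more common copy-first style that a read-only default would favor.

```perl
use strict;
use warnings;

# In Perl 5 the elements of @_ are aliases to the caller's arguments,
# so assigning to $_[0] changes the caller's variable.
sub double_in_place {
    $_[0] *= 2;
    return;
}

# Copying the argument first is the usual way to get by-value behaviour.
sub doubled {
    my ($n) = @_;
    return $n * 2;
}

my $count = 21;
double_in_place($count);
print "$count\n";             # 42: the caller's variable was modified

my $copy = doubled($count);
print "$count $copy\n";       # 42 84: $count is untouched this time
```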

Hmm, we're still getting ahead of ourselves. Back to contexts.

References are now transparent to boolean context

References are no longer considered to be ``always true'' in Perl 6. Any type can overload its bit() casting operator, and any type that hasn't got a bit() of its own inherits one from somewhere else, if only from class UNIVERSAL. The built-in bit methods have the expected boolean semantics for built-in types, so arrays are still true if they have something in them, strings are true if they aren't "" or "0", etc.
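The closest Perl 5 analogue is overloading the bool operation on a class. The sketch below is only an approximation of the idea, using a made-up Basket class: a plain reference is always true, while an object with a bool overload can decide for itself.

```perl
use strict;
use warnings;

package Basket;    # hypothetical class, for illustration only

# Let a Basket decide its own truth value: true only when it holds something.
use overload
    'bool'   => sub { scalar(@{ $_[0]{items} }) > 0 },
    fallback => 1;

sub new {
    my ($class, @items) = @_;
    return bless { items => [@items] }, $class;
}

package main;

my $plain_ref = [];                    # an ordinary reference to an empty array
my $empty     = Basket->new;
my $full      = Basket->new('apple');

print "plain ref is true\n"     if $plain_ref;    # references are always true in Perl 5
print "empty basket is false\n" unless $empty;
print "full basket is true\n"   if $full;
```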


Lists

RFC 175: Add list keyword to force list context (like scalar)

Another RFC rescued from the compost pile. In Perl 6, type names will identify casting functions in general. (A casting function merely forces context -- it's a no-op unless the actual context is different.) In Perl 6, a list used in a scalar context will automatically turn itself into a reference to the list rather than returning the last element. (A subscript of [-1] can always be used to get the last element explicitly, if that's actually desired. But that's a rarity, in practice.) So it works out that the explicit list composer:

    [1,2,3]

is syntactic sugar for something like:

    scalar(list(1,2,3));

Depending on whether we continue to make a big deal of the list/array distinction, that might actually be spelled:

    scalar(array(1,2,3));
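For reference, this is the Perl 5 behavior being replaced, shown as a short sketch (the sub name `three` is arbitrary). A list returned into scalar context yields its last element, and you need an explicit composer to get a reference to the whole thing.

```perl
use strict;
use warnings;

sub three { return (4, 5, 6) }

my $last  = three();          # scalar context: the comma operator yields the last value
my $count = () = three();     # the usual idiom for counting the returned items
my $ref   = [ three() ];      # explicit list composer: a reference to the whole list

print "$last $count $ref->[-1] ", scalar(@$ref), "\n";    # 6 3 6 3
```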

Other casts might be words like hash (supplying a pairlist context) and objlist (supplying a scalar context to a list of expressions). Maybe even the optional sub keyword could be considered a cast on a following block that might not otherwise be considered a closure in context. Perhaps sub is really spelled lazy. In which case, we might even have a lazylist context to supply a lazy context to a list of expressions.

And of course, you could use standard casts like int(), num(), and str(), when you want to be explicit about such contexts at compile time. (Perl 5 already has these contexts, but only at run time.) Note also that, due to the relationship between unary functions and methods, $foo.int, $foo.num, and $foo.str will be just a different way to write the same casts.

Lest you worry that your code is going to be full of casts, I should point out that you won't need to use these casts terribly often because each of these contexts will typically be implied by the signature of the function or method you're calling. (And Perl will still be autoconverting for you whenever it makes sense.) More on that in Apocalypse 6, Subroutines. If not sooner.

So, while boolean context might be explicitly specified by writing:

    if (bit $foo)

or

    if ($foo.bit)

you'd usually just write it as in Perl 5:

    if ($foo)

Other Decisions about Lists

Based on some of what we've said, you can see that we'll have the ability to define various kinds of lazily generated lists. The specific design of these operators is left for subsequent Apocalypses, however. I will make one observation here, that I think some of the proposals for how array subscripts are generated should be generalized to work outside of subscripts as well. This may place some constraints on the general use of the : character in places where an operator is expected, for instance.

As mentioned above, we'll be having several different kinds of list context. In particular, there will be a hash list context that assumes you're feeding it pairs, and if you don't feed it pairs, it will assume the value you feed it is a key, and supply a default value. There will likely be ways to get hashes to default to interesting values such as 0 or 1.

In order to do this, the => operator has to at least mark its left operand as a key. More likely, it actually constructs a pair object in Perl 6. And the { foo => $bar } list composer will be required to use => (or be in a hashlist context), or it will instead be interpreted as a closure without a sub. (You can always use an explicit sub or hash to cast the brackets to the proper interpretation.)

I've noticed how many programs use qw() all over the place (much more frequently than the input operator, for instance), and I've always thought qw() was kind of ugly, so I'd like to replace it with something prettier. Since the input operator is using up a pair of perfectly good bracketing characters for little syntactic gain, we're going to steal those and make them into a qw-like list composer. In ordinary list context, the following would be identical:

    @list = < foo $bar %baz blurch($x) >;
    @list = qw/ foo $bar %baz blurch($x) /;                     # same as this
    @list = ('foo', '$bar', '%baz', 'blurch($x)');              # same as this

But in hashlist context, it might be equivalent to this:

    %list = < foo $bar %baz blurch($x) >;
    %list = (foo => 1, '$bar' => 1, '%baz' => 1, blurch => $x);  # same as this
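In Perl 5 you get the "key with a default value" effect by hand, typically with map. A minimal sketch, assuming a default value of 1 and ignoring the blurch($x) case:

```perl
use strict;
use warnings;

# qw// does not interpolate, so '$bar' and '%baz' are literal strings here.
my @list = qw/ foo $bar %baz /;

# Turn each word into a key with a default value of 1.
my %set = map { ($_ => 1) } @list;

for my $key (sort keys %set) {
    print "$key => $set{$key}\n";
}
```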


Files

Basically, file handles are just objects that can be used as iterators, and don't belong in this chapter anymore.

RFC 034: Angle Brackets Should Not Be Used for File Globbing

Indeed, they won't be. In fact, angle brackets won't be used for input at all, I suspect. See below. Er, above.

RFC 051: Angle Brackets Should Accept Filenames and Lists

There is likely to be no need for an explicit input operator in Perl 6, and I want the angles for something else. I/O handles are a subclass of iterators, and I think general iterator variables will serve the purpose formerly served by the input operator, particularly since they can be made to do the right Thing in context. For instance, to read from standard input, it will suffice to say

    while ($STDIN) { ... }

and the iterator will know it should assign to $_, because it's in a Boolean context.

I read this RFC more as requesting a generic way to initialize an iterator according to the type of the iterator. The trick in this case is to prevent the re-evaluation of the spec every time -- you don't want to reopen the file every time you read a line from it, for instance. There will be standard ways to suppress evaluation in Perl 6, both from the standpoint of the caller and the callee. In any case, the model is that an anonymous subroutine is passed in, and called only when appropriate. So an iterator syntax might prototype its argument to be an anonymous sub, or the user might explicitly pass an anonymous sub, or both. In any event, the sub keyword will be optional in Perl 6, so things like:

    while (file {LIST}) { ... }

can be made to defer evaluation of LIST to the appropriate moment (or moments, if LIST is in turn generating itself on the fly). For appropriate parameter declarations I suppose even the brackets could be scrapped.
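The deferred-evaluation model can be approximated in Perl 5 with closures. In this sketch, `make_iterator` is a hypothetical helper (not a proposed Perl 6 interface): the caller passes an anonymous sub that produces the list, and the iterator calls it only once, on first use.

```perl
use strict;
use warnings;

# Hypothetical helper: build an iterator from a code block that produces the
# list. The block is not run until the first item is requested, and it runs
# only once.
sub make_iterator {
    my ($generator) = @_;
    my @items;
    my $started = 0;
    return sub {
        unless ($started) {
            @items   = $generator->();
            $started = 1;
        }
        return shift @items;    # undef once the list is exhausted
    };
}

my $evaluations = 0;
my $next = make_iterator(sub { $evaluations++; return qw(alpha beta gamma) });

my @seen;
while (defined(my $item = $next->())) {
    push @seen, $item;
}
print "@seen (generator ran $evaluations time(s))\n";
```

A real file iterator would open the file inside the generator, which is exactly the work you don't want repeated on every read.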


Properties

Variables and values of various types have various kinds of data attributes that are naturally associated with them by virtue of their type. You know a dog comes equipped with a wag, hopefully attached to a tail. That's just part of doghood.

Many times, however, you want the equivalent of a Post-It(r) note, so you can temporarily attach bits of arbitrary information to some unsuspecting appliance that (though it wasn't designed for it) is nevertheless the right place to put the note. Similarly, variables and values in Perl 6 allow you to attach arbitrary pieces of information known as ``properties.'' In essence, any object in Perl can have an associated hash containing these properties, which are named by the hash key.

Some of these properties are known at compile time, and don't actually need to be stored with the object in question, but can actually be stored instead in the symbol table entry for the variable in question. (Perl still makes it appear as though these values are attached to the object.) Compile-time properties can therefore be attached to variables of any type.

Run-time properties really are associated with the object in question, which implies some amount of overhead. For that reason, intrinsic data types like int and num may or may not allow run-time properties. In cases where it is allowed, the intrinsic type must generally be promoted to its corresponding object type (or wrapped in an object that delegates back to the original intrinsic for the actual value). But you really don't want to promote an array of a million bits to an array of a million objects just because you had the hankering to put a sticky note on one of those bits, so in those cases it's likely to be disallowed, or the bit is likely to be cloned instead of referenced, or some such thing.

Properties may also be attached to subroutines.
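One way to picture run-time properties in Perl 5 terms is a side table keyed by reference address. The sketch below uses refaddr from Scalar::Util; `set_property` and `get_property` are hypothetical helpers, not part of any Perl 6 design, and the table is never cleaned up when the annotated thing goes away.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Side table of "sticky notes": reference address => { property => value }.
my %notes;

# Hypothetical helper: attach a named property to whatever $ref points at.
sub set_property {
    my ($ref, $name, $value) = @_;
    $notes{ refaddr($ref) }{$name} = $value;
    return $ref;                      # hand back the thing that was annotated
}

# Hypothetical helper: look a property up again (undef if never set).
sub get_property {
    my ($ref, $name) = @_;
    my $for_ref = $notes{ refaddr($ref) } or return;
    return $for_ref->{$name};
}

my @table   = (1 .. 3);
my $handler = sub { return 'called' };

set_property(\@table,  Purpose => 'demo data');
set_property($handler, rw      => 1);

my $purpose = get_property(\@table, 'Purpose');
my $rw      = get_property($handler, 'rw');
my $result  = $handler->();           # the sub itself is unchanged

print "$purpose / $rw / $result\n";   # demo data / 1 / called
```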

In general, you don't set or clear properties directly -- instead you call an accessor method to do it for you. If there is no method of that name, Perl will assume there was one that just sets or clears a property with the same name as the method. However, using accessor methods to set or clear properties allows us to define synthetic properties. For instance, there might be a real constant property that you could attach to a variable. Certain variables (such as those in a function prototype) might have constant set by default. In that case, setting a synthetic property such as rw might clear the underlying constant property.

A property may be attached to the foregoing expression by means of the ``is'' keyword. Here's a compile-time property set on a variable:

    my int $pi is constant = 3;

Here's a run-time property set on a return value:

    return 0 is true;

Whether a property is applied to a variable at compile time or a value at run-time depends on whether it's in lvalue or rvalue context. (Variable declarations are always in lvalue context even when you don't assign anything to them.)

The ``is'' works just like the ``.'' of a method call, except that the return value is the object on the left, not the return value of the method, which is discarded.
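Perl 5 has one well-known special case that behaves a bit like `0 is true`: the string "0 but true". This sketch (the sub name is made up) shows it acting as a true value that is numerically zero.

```perl
use strict;
use warnings;

# "0 but true" is a documented special case in Perl 5: numerically zero,
# exempt from the usual non-numeric warning, yet true as a boolean.
sub zero_rows_affected { return "0 but true" }

my $result = zero_rows_affected();
my $truth  = $result ? 'true' : 'false';
my $sum    = $result + 5;

print "$truth $sum\n";    # true 5
```

A general property mechanism would let any value carry this kind of annotation, rather than relying on one magic string.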

As it happens, the ``is'' is optional in cases where an operator is already expected. So you might see things like:

    my int $pi constant = 3;
    return 0 true;

In this case, the methods are actually being parsed as postfix operators. (However, we may make it a stricture that you may omit the is only for predeclared property methods.)

Since these actually are method calls, you can pass arguments in addition to the object in question:

    my int @table is dim(366,24,60);

Our examples above are assuming an argument of (1):

    my int $pi is constant(1) = 3;
    return 0 is true(1);

Since the ``is'' is optional in the common cases, you can stack multiple properties without repeating the ``is.''

    my int $pi is shared locked constant optimize($optlevel) = 3;

(Note that these methods are called on the $pi variable at compile time, so it behooves you to make sure everything you call is defined. For instance, $optlevel needs to be known at compile-time.)

Here is a list of property ideas stolen from Damian. (I guess that makes it intellectual property theft.) Some of the names have been changed to protect the (CS) innocent.

    # Subroutine attributes...
    sub name is rw { ... }                      # was lvalue
    my sub rank is same { ... }                 # was memoized
    $snum = sub is optimize(1) { ... };         # "is" required here
    # Variable attributes...
    our $age is constant = 21;                  # was const
    my %stats is private;
    my int @table is dim(366,24,60);
    $arrayref = [1..1000000] is computed Purpose('demo of anon var attrs');
    sub choose_rand (@list is lazy) { return $list[rand @list] }
                                                # &@list notation is likely
    $self = $class.bless( {name=>$name, age=>$age} is Initialized );
    # Reference attributes...
    $circular = \$head is weak;
    # Literal attributes...
    $name = "Damian" is Note("test data only");
    $iohandle = open $filename is dis(qw/para crlf uni/) or die;
    $default = 42 is Meaning(<<OfLife);
                             The Answer
                             OfLife
    package Pet is interface;
    class Dog inherits('Canine') { ... }
    print $data{key is NoteToSelf('gotta get a better name for this key')};

(I don't agree with using properties for all of these things, but it's pretty amazing how far into the ground you can drive it.)

Property names should start with an identifier letter (which includes Unicode letters and ideographs). The parsing of the arguments (if any) is controlled by the signature of the method in question. Property method calls without a ``.'' always modify their underlying property.

If called as an ordinary method (with a ``.''), the property value is returned without being modified. That value could then be modified by a run-time property. For instance, $pi.constant would return 1 rather than the value of $pi, so we get:

    return $pi.constant is false;       # "1 but false" (not possible in Perl 5)

On the other hand, if you omit the dot, something else happens:

    return $pi constant is false;       # 3 but false (and 3 is now very constant)

Here are some more munged Damian examples:

    if (&name.rw) { ... }
    $age++ unless $age.constant;
    $elements = return reduce $^ * $^, *@table.dim;
    last if ${self}.Initialized;
    print "$arrayref.Purpose() is not $default.Meaning()\n";
    print %{$self.X};    # print hash referred to by X attribute of $self
    print %{$self}.X;    # print X attribute of hash referred to by $self
    print %$self.X;      # print X attribute of hash referred to by $self

As with the dotless form, if there is no actual method corresponding to the property, Perl pretends there's a rudimentary one returning the actual property.

Since these methods return the properties (except when overridden by dotless syntax), you can temporize a property just as you can any method, provided the method itself allows writing:

    temp $self.X = 0;

Note that

    temp $self is X = 0;

would assign 0 to $self instead. (Whether it actually makes sense to set the compile-time X property at run time on the $self variable is anybody's guess.)
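The nearest Perl 5 equivalent of temporizing an attribute is local on a hash element, which restores the old value when the enclosing scope exits. A small sketch, assuming $self is a plain hash reference with an X entry:

```perl
use strict;
use warnings;

my $self = { X => 42 };

sub show { return $self->{X} }

sub with_temporary_zero {
    local $self->{X} = 0;    # restored automatically when this sub returns
    return show();
}

my $inside = with_temporary_zero();
my $after  = show();

print "$inside $after\n";    # 0 42
```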

Note that by virtue of their syntax, properties cannot be set by interpolation into a string. So, happily:

    print "My $variable is foobar\n";

does not attempt to set the foobar property on $variable.

The ``is'' keyword binds with the same precedence as ``.'', even when it's not actually there.

Note that when you say $foo.bar, you get $foo's compile-time property if there is one (which is known at compile time, duh). Otherwise it's an ordinary method call on the value (which looks for a run-time property only if a method can't be found, so it shouldn't impact ordinary method call overhead).

To get to the properties directly without going through the method interface, use the special btw method, which returns a hash ref to the properties hash.

    $foo.btw{constant}

Note that synthetic properties won't show up there!

None of the property names in this Apocalypse should be taken as final. We will decide on actual property names as we proceed through the series.

Well, that's it for Apocalypse 2. Doubtless there are some things I should have decided here that I didn't yet, but at least we're making progress. Well, at least we're moving in some direction or other. Now it's time for us to dance the Apocalypso, in honor of Jon Orwant and his new wife.

Reversing Regular Expressions


Regular Expressions are arguably one of Perl's most useful and powerful tools. The ability to match complex strings is one of the things that makes Perl the effective "Practical Extraction and Report Language" that it is. The regular expression engine is highly optimised and thus very fast, although in some situations people fail to use it in the most effective manner.

For example, imagine a case where you have a collection of very long lines from a log file, that each have a four digit year (starting with 19) somewhere near the end.

One would normally say something like:

# Match any occurrence of 19xx that's not followed by another one
if ($string =~ /(19\d\d)(?!.*19\d\d)/) {$date = $1}

or, better:

if ($string =~ /.*(19\d\d)/) {$date = $1}

However, in this situation, the regular expression engine has to go through all the string, remembering any matches it finds and discarding them any time it finds a new match, until it reaches the end of the string. If you have long log lines, this can be highly inefficient, and comparatively slow. But is there another way to do it? (Hint: With Perl, There's Always More Than One Way To Do It)

Unless you're a regular on IRC or perlmonks.com, or attended YAPC::Europe, it's quite possible that you've never heard of (a) Jeff Pinyan (aka Japhy) and (b) his simple and elegant solution to getting around this problem. He dubbed these solutions reversed regular expressions, or sexeger.

Instead of going through the whole string looking for the last match, wouldn't it be a better idea to work from the back of the string and take the first match that is found? At the moment, Perl doesn't provide a built-in method for doing this, but it's surprisingly easy to emulate one. To quote the lightning talk Mr Pinyan wrote for me to give at YAPC::Europe: "Reverse the input! Reverse the regex! Reverse the match!"

And so that's exactly what we'll do:

sub get_date {
    my $reversed = scalar reverse($_[0]);   # Reverse whatever string we were passed
    return unless $reversed =~ /(\d\d91)/;  # Look for the first occurrence of xx91 (19xx reversed)
    return scalar reverse $1;               # Reverse back whatever we found and return it
}

Obviously, this example does two extra pieces of work: the two calls to reverse. However, reverse appears to be very efficient, and in the benchmarks I ran (details below) the extra calls were not a problem. To test whether this really is significantly faster, I used a 10,000-character string and ran each regex 10,000 times on it. Here's the code used to test it:

$la = [our 10000 character string]
...
use Benchmark;
$t = timeit("10000",  sub { $_ = $la; /.*(19\d\d)/; });
print "Greedy took:",timestr($t), "\n";
$t = timeit("10000",  sub { $_ = $la; /(19\d\d)(?!.*19\d\d)/;});
print "Lookahead took:",timestr($t), "\n";
$t = timeit("10000",  sub { $_ = reverse($la); /(\d\d91)/;  });
print "Sexeger took:",timestr($t), "\n";

The first example from the start of this article, with its look ahead assertion, took four hundred seconds, as the look ahead assertion is a major time sink. The second example, the greedy match, took 1.5 seconds. The reversed method, however, managed to shave three quarters of a second off that, coming in at 0.75 seconds. To summarise:

Method                      Time (seconds)
------                      --------------
/(19\d\d)(?!.*19\d\d)/      400
/.*(19\d\d)/                1.5
sexeger                     0.75
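If you want to try the comparison yourself, here is a self-contained sketch. It is not the original test: the log line is a synthetic stand-in, the sub names are made up, and the iteration count is kept small, so expect different timings (and possibly a different ranking) on your own data.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Synthetic stand-in for a long log line: an early year, filler, then the
# year we actually want close to the end.
my $line = '1987 ' . ('x' x 10_000) . ' 1999 end';

# Greedy: let .* run to the end and backtrack to the last 19xx.
sub last_year_greedy {
    return $_[0] =~ /.*(19\d\d)/ ? $1 : undef;
}

# Lookahead: a 19xx that is not followed by another 19xx.
sub last_year_lookahead {
    return $_[0] =~ /(19\d\d)(?!.*19\d\d)/ ? $1 : undef;
}

# Sexeger: reverse the input, reverse the regex, reverse the match.
sub last_year_sexeger {
    my $reversed = scalar reverse $_[0];
    return $reversed =~ /(\d\d91)/ ? scalar reverse($1) : undef;
}

timethese(200, {
    greedy    => sub { last_year_greedy($line) },
    lookahead => sub { last_year_lookahead($line) },
    sexeger   => sub { last_year_sexeger($line) },
});
```

All three subs return the same year for this input, so the benchmark compares like with like.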

However, performance gains are not the only good use for reversed regular expressions, as one can also use them to solve a few other common problems that Perl doesn't handle easily. If you've all read the Frequently Asked Questions included with Perl, which I'm sure you have, then you'll have seen a particularly good example in Section 5 (perldoc perlfaq5) by Andrew Johnson - "How can I output my numbers with commas added?".

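sub commify {
    my $input = shift;
    $input = reverse $input;                        # Work from the least significant digit
    $input =~ s<(\d\d\d)(?=\d)(?!\d*\.)><$1,>g;     # Comma after every three digits, skipping the fraction
    return scalar reverse $input;                   # Reverse back into reading order
}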
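A quick usage check of commify, with the sub repeated so the snippet runs on its own. The sample numbers are arbitrary.

```perl
use strict;
use warnings;

sub commify {
    my $input = shift;
    $input = reverse $input;
    $input =~ s<(\d\d\d)(?=\d)(?!\d*\.)><$1,>g;
    return scalar reverse $input;
}

my @samples = ('999', '1234', '1234567.891');
my @pretty  = map { commify($_) } @samples;

print "$_\n" for @pretty;
# 999
# 1,234
# 1,234,567.891
```

The (?!\d*\.) part is what keeps commas out of the fractional digits: in the reversed string, those are the digits that still have a decimal point ahead of them.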

One of my personal favourite regex tools is zero-width look ahead assertions, as in the very first example (it's worth noting that that's a negative zero-width look ahead assertion). For people who haven't scoured perldoc perlre, this allows you to state that a pattern has to exist in front of your position, without actually matching it. Perhaps an example would help illustrate: /ab(?=.+e)cd/ will match "abcde" but not "abecd" or "abcd". Sadly, it's not possible to do variable length zero-width look behind assertions with the current Regular Expression Engine. However, if we apply sexeger principles to it, as suggested by Anthony Guselnikov, it suddenly becomes easy. If we want to match the string "def", as long as it was preceded by the letter 'a', we can say:

sub nlba {
    my $reversed = scalar reverse $_[0];               # Reverse the input
    print "Success\n" if $reversed =~ /fed(?=.*a)/;    # "def" reversed, with an 'a' somewhere after it
}
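Here is the same idea as a sub that returns a result instead of printing, so it can be tried on a few inputs. The name `def_preceded_by_a` is made up for this sketch.

```perl
use strict;
use warnings;

# Emulate a variable-length look-behind: is "def" preceded, anywhere
# earlier in the string, by the letter 'a'?
sub def_preceded_by_a {
    my $reversed = scalar reverse $_[0];
    return $reversed =~ /fed(?=.*a)/ ? 1 : 0;
}

my $before  = def_preceded_by_a('a---def');    # 'a' comes before "def"
my $after   = def_preceded_by_a('def---a');    # 'a' only comes after "def"
my $missing = def_preceded_by_a('xyzdef');     # no 'a' at all

print "$before $after $missing\n";             # 1 0 0
```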

Reversing Regular Expressions is a powerful and effective tool in any programmer's arsenal. I hope that I've managed to illustrate the utility that this simple and elegant solution offers. Some overhead in programmer time and in processing time is required to use them, so I'd suggest that you evaluate using them on a case-by-case basis.

For more information, visit the homepage at http://www.pobox.com/~japhy/sexeger
