November 2003 Archives

This week on Perl 6, week ending 2003-11-30

Welcome back to the weekly Perl 6 Summary, which I'm hoping to keep on a weekly cycle for the foreseeable future.

It's been a relatively low volume week this week; I'm assuming that Thanksgiving had something to do with it (I hope those of you who celebrate it had a jolly good extended weekend), and for the first time in ages perl6-language saw almost as much traffic as perl6-internals. We're still going to start with perl6-internals though.

Some PIR "How do I?" questions

Last week Dan put up a list of questions about IMCC, intending it as a bit of grit. This week Melvin Smith added a couple of layers of nacre by checking in an initial IMCC FAQ (you'll find it at imcc/docs/imcfaq.pod in your parrot directory if you're following with CVS). After an initial flurry of work on adding to the FAQ, the discussion seems to have drifted off into discussions of bike shed pigmentation. Yes, the FAQ may need to be split eventually, but splitting an FAQ into sub documents is trivial compared to actually answering the questions. Ahem. Editorial ends.

http://groups.google.com/groups?selm=5.1.1.6.2.20031123011917.03503858@pop.mindspring.com

http://www.parrotcode.org/faq/imcc -- The FAQ

PIO_eof

Vladimir Lipsky had a few questions about how the Parrot IO subsystem (PIO) handles error conditions. This sparked a discussion of whether the various PIO functions should make some assertions up front about their arguments. Consensus said "yes", they help with debugging, and the performance 'hit' is minuscule.

http://groups.google.com/groups?selm=006c01c3b267$7e683500$6b9d943e@87w5ovc8gmxcahy

Freezing and Thawing

Discussion of the details of object serialization continued this week. Judging by the amount of clarification traffic that's been going on in this thread (and others), I find myself wondering if the time has come for some generous benefactor to sponsor the principals to get together in a conference room with copious whiteboard acreage. Sponsors welcome.

http://groups.google.com/groups?selm=Pine.LNX.4.58.0311240851430.31299@sprite.sidhe.org

Segfault warning boxes

Consider a system of test machines, all of them happily running in unattended mode, constantly checking out the latest version of parrot, compiling everything, running the tests and reporting the results to some central webserver that displays a status report for all to see. It's not just a good idea, it's the Parrot tinderbox.

Now, consider an operating system that takes it upon itself to throw up a modal dialog box when a program segfaults.

The two don't sit well together, do they? Which is why Dan has committed a patch to disable the "Your program has crashed, want to send a report to Microsoft?" box that pops up on Win32 when Parrot segfaults. Still, at least it can be disabled.

Dan asked for opinions on whether the (non tinderbox) default should be to put up a dialog box or not. And it appears that, if you disable the dialog box, there's no other notification (apart from the failed test, obviously) that there was a segfault. I am sure I'm not alone in boggling at this, but it looks like the default will be to display the dialog.

http://groups.google.com/groups?selm=Pine.LNX.4.58.0311251153350.19018@sprite.sidhe.org

Perl 6 updated

Allison Randal sent in a patch to update the Perl 6 compiler to use the right operators (^+ becomes >>+<<; ~~, which meant xor, becomes ^^, making way for =~ to become ~~). She's still working on other tweaks to get the semantics of what's already been implemented right, and claims to be sufficiently annoyed by the failing regex tests that she might have to go and make them work to remove the annoyance. Which would be good. (I'm afraid that the tantalizing code I mentioned last week has actually been there for ages, as I feared it had.) That capital fellow, chromatic, applied the patch.

http://groups.google.com/groups?selm=rt-24559-67828.1.16647251862517@rt.perl.org

String formatting and transformation

Dan got around to specifying some details about string formatting and transformation. Essentially there will be various case changing transformations and a couple of different formatting approaches, one sprintf like and another more like traditional COBOL style formatting. Dan's not sure yet whether these formatters will get ops or simply be implemented in the standard library.

http://groups.google.com/groups?selm=a06010202bbebfb35bca6@[10.0.1.3]

AUTOLOAD anyone?

Someone calling itself ibotty wondered if there would be a default PMC method that gets automagically called whenever a method call isn't found. To which the answer is "Yes, when it's done".

http://groups.google.com/groups?selm=20031127235646.20556.qmail@onion.perl.org

Determining PMC memory addresses

Cory Spencer wanted to know if there was a PASM way of finding whether two PMCs share the same memory address. "Not yet", said Leo Tötsch, "but there are entries in the vtable to do it". This sparked a discussion of the meaning of 'equal'. It was suggested that Parrot use Lisp's triple equality test operators (one tests if two things are the same thing, the second if two things have the same value, and the third tests that they have the same structure). Jos Visser argued (convincingly I thought) that having three operators with those semantics might be a Good Thing, but using Lisp's names (eq, eql and equal) for them would definitely be a Bad Thing.

http://groups.google.com/groups?selm=Pine.LNX.4.58.0311272124130.31738@okcomputer.antiflux.org

Meanwhile, on perl6-language

'Core' language philosophy

Branching off from a thread about using catch as a statement modifier à la if, unless etc., Michael Lazzaro waxed philosophical about the distinction between core and library. He wondered if putting syntactic sugar into CP6AN and then having almost everyone install it anyway wasn't a false economy. He went on to defend the idea of the catch modifier.

Larry didn't agree. He suggested that a better way would be to look at some kind of pragma which would make err (the low priority form of //, aka 'defined or') put an implicit try around its left hand argument. Michael still wasn't keen, not least because he wants to use undef to stand for a SQLesque NULL.

http://groups.google.com/groups?selm=479C7C28-2048-11D8-A666-000A277AA894@cognitivity.com

The C Comma

Luke Palmer doesn't like the C Comma -- the one that lets you write:

    while $n++, $foo > $bar {...}

He does like the semantics though, so he proposed renaming the C comma and calling it then. People liked the idea of changing the name because, if nothing else, it would mean one could write @foo = 1,2,3; and get a list with three elements rather than a one element list containing the number 3. However, they weren't necessarily keen on then as the new name. Damian wondered if we shouldn't just insist on

   while do { $n++; $foo > $bar } {...}

And then people started coming up with various synonyms for 'and' in Latin, like ac, cum, et cetera. Larry hasn't ruled on any of the options yet.

http://groups.google.com/groups?selm=20031125000038.GA30028@babylonia.flatirons.org

Properties

Luke Palmer once again demonstrated his mastery of the telling question by asking a series of questions about runtime properties. Larry's answers, both to the immediate question and in the thread that followed make for fascinating reading and leave us all waiting with bated breath for Apocalypse 12 on Objects.

I'm afraid I'm going to punt on attempting to summarize the thread though; Larry's always hard to summarize when he's introducing anything new. You really should (at least) read his initial sketch though -- role based object orientation looks (to this reader at least) like a brilliant idea.

http://groups.google.com/groups?selm=20031127070230.GA8488@babylonia.flatirons.org

http://groups.google.com/groups?selm=20031127212123.GA24862@wall.org -- Larry outlines his current thinking

Acknowledgements, Apologies, Announcements

I've still not got 'round to getting the link shortening in place I'm afraid. For some reason the appropriate modules don't seem to want to install on my G5. Maybe next week.

Leon Brocard has taken on the role of Pumpking for Perl 5.004 and will not be appearing in these summaries for the duration of his reign out of respect. After all, being a pumpking requires a kind of dignity that would be incompatible with being a running joke.

If you find these summaries useful or enjoyable, please consider contributing time and/or money to the Perl Foundation to help support the Perl 6 effort. You might also like to send me feedback at p6summarizer@bofh.org.uk, or drop by my website.

http://donate.perl-foundation.org/ -- The Perl Foundation

http://dev.perl.org/perl6/ -- Perl 6 Development site

http://www.bofh.org.uk:8080/ -- My website, "Just a Summary"

This fortnight on Perl 6, week ending 2003-11-23

Right, hopefully things are back to normal(ish) after the disk crashes that rather spoilt the last summary. I've managed to fill in my mail archive too so this summary will cover the events of the last fortnight (that's two weeks for those whose version of English lacks this vital unit of time).

We start, as usual, with all the action in perl6-internals.

Unifying call and return

It's been a long time coming, but Dan returned from LL3 suitably refreshed by his dealings with academics ("You can [...] consider a sub PMC as a frozen semi-stateless continuation") and set about unifying Parrot's call and return conventions. The strange thing is that everyone's been wanting this for months now, but that didn't mean there wasn't discussion.

Meanwhile Melvin Smith set about making this discussion moot (from the point of view of someone targeting Parrot) by hiding the details behind IMCC directives (which already happens, admittedly, but Melvin's in the process of renaming the .pcc_* directives to get rid of the .pcc_ and, eventually, of allowing for pluggable calling conventions).

http://groups.google.com/groups?selm=Pine.LNX.4.58.0311100905190.15737@sprite.sidhe.org -- Dan's heads up

http://groups.google.com/groups?selm=Pine.LNX.4.58.0311121141090.28619@sprite.sidhe.org -- Changes, draft 1

http://groups.google.com/groups?selm=a06010201bbdd728d2b16@[10.0.1.2] -- Dan puts his foot down

http://groups.google.com/groups?selm=5.1.1.6.2.20031115234124.020a7f60@pop.mindspring.com -- Melvin Makes it Moot

Reviewing a book about VMs

Stéphane Payrard bought a copy of "Virtual Machine Design and Implementation in C++" and reviewed it for perl6-internals. He didn't like it. Which is putting it mildly.

This led to the inevitable questions about finding good books on VM design. Stéphane quoted Leon Brocard's preemption of this question from London.pm: "No, there's surprisingly little out there on virtual machine design and development".

http://groups.google.com/groups?selm=20031113173044.GB5924@stefp.dyndns.org

IMCC gets high level sub call syntax

Melvin Smith continues to make PIR less like Assembly language and more like pseudo code. You can now write code like:

   _foo()
   var = _foo()
   var = _foo(a, b)
   (var1, var2) = _foo(a, b)

and IMCC will do the Right Thing. Go Melvin.

http://groups.google.com/groups?selm=5.1.1.6.2.20031116195005.03c2c488@pop.mindspring.com

IMCC problems with library loading

Jeff Clites pointed out some issues with IMCC's behaviour when loading external libraries and wondered about definitive docs on the macro syntax (I think the current definitive docs may well be the source code). Leo Tötsch seemed to accept that there were issues with the current state of IMCC in this context, but implied that fixing it wasn't an immediate priority.

http://groups.google.com/groups?selm=C5EC6699-18A1-11D8-92BE-000393A6B9DA@mac.com

A new list proposal

Melvin Smith proposed creating a parrot-compilers mailing list as a general purpose list for any and all language development targeting Parrot. He even went so far as to volunteer to maintain and distribute the FAQ.

The immediate response was positive, but Leo and others didn't really think the volume of traffic currently warranted it (which, I think, was part of Melvin's point: Parrot is reaching the point where it can just be used without necessarily needing to worry overmuch about its internals, and I think Melvin is concerned that the technical level (and implicit Perl-centricity) of the current internals list is putting potential compiler developers off participating).

So, if you're reading this because you're interested in using Parrot as a target platform, but you don't subscribe to the internals list but would subscribe to a parrot-compilers list, could you let me, Melvin, or the list know? I've already committed to including any such list in my summaries (at least until traffic reaches the point where someone decides to do a dedicated summary). Let us know if you think it's a terrible idea too of course.

http://groups.google.com/groups?selm=5.1.1.6.2.20031117135818.021515f8@pop.mindspring.com

Parrot Documentation Review

Michael Scott posted a review of Parrot documentation with pointers of where to find existing docs and a discussion of where people should be looking. Somewhat bizarrely, he failed to give the URL of the Parrot Wiki, his favoured source of documentation (favoured for the very good reason that it's excellent). However, you'll find the URL below.

http://groups.google.com/groups?selm=47906C10-1959-11D8-AC7B-0050E479991D@mac.com

http://www.vendian.org/parrot/wiki/ -- The very lovely and worthwhile parrot Wiki

Freeze is in

Dan announced that he's checked in Leo's preliminary patch implementing freeze/thaw, commenting that it's likely to change dramatically internally and that the vtable/low-level API wasn't final, but that the op-level interface would be stable. Jeff Clites was initially concerned that the patch used things he was expecting to be able to use for his (in progress) ordered destruction patch. However, after some discussion, he and Leo seem to be confident that the two things can be unified.

http://groups.google.com/groups?selm=Pine.LNX.4.58.0311191201490.7503@sprite.sidhe.org

Warning Patrol

Jürgen Bömmels has been doing a little warning patrol of Parrot and found that only a very few fixes to IMCC were needed to get a warning-free compile, so he posted a patch with them in. Leo applied it immediately, reasoning that, whilst IMCC's under heavy development, there's no excuse for throwing compile warnings, and that every time the perpetrators have to integrate such patches they'll become more likely to avoid generating the warnings in the first place.

Inspired by this, Jonathan Worthington posted a similar patch to eliminate warnings when building for win32, which Leo also applied.

http://groups.google.com/groups?selm=m24qwzgj0y.fsf@helium.physik.uni-kl.de

Finished/stable parts of Parrot?

Robin Redeker wondered which bits of Parrot and IMCC could be considered stable enough to use in implementing a language. Leo opined that existing syntax will not change much, most of the expected changes relate to things like adding support for calling conventions and simplifications like replacing .pcc_sub with .sub. Melvin Smith pitched in with rather more detail about what was expected to change and the kind of changes that could be expected in the future. Reading this it's apparent that we've reached the point where almost any conceivable extension to IMCC will be backward compatible with earlier versions. Famous last words I know, but it is looking that way.

http://groups.google.com/groups?selm=20031120192528.GA6967@x-paste.de

Do Steve Fink's debugging for him

Steve Fink had a problem with some generated code throwing a segfault when it was run and, having hit the debugging wall himself, posted the code to the list and asked for help. Leo tracked down the bug in Parrot and fixed it. However, that's not why this made the summary; what's interesting is the sample of code that the problem code was generated from. Here it is:

  rule thing() {
      <[a-z]>+
  }

  rule parens() {
      { print "entering parens\n"; }
      \( [ <thing> | <parens> | \s ]* \)
      { print "leaving parens\n"; }
  }

  sub main() {
      my $t = "()()(((blah blah () blah))(blah))";
      my $t2 = $t _ ")";
      print "ok 8\n" if $t2 !~ /^ <parens> $/;
  }

Looks tantalizing doesn't it? Someone's now going to tell me that the code in question has been there forever, but I'm still tantalized.

http://groups.google.com/groups?selm=20031121071225.GI26642@foxglove

Another minor task for the interested

Dan discovered that a make -j 4 dies quickly and horribly, which seems to imply that there's a dependency or two missing from the nest of twisty makefiles that make up the Parrot build system. He asked for volunteers to fix things. There's been some work done on this, but no definitive fix so far.

http://groups.google.com/groups?selm=Pine.LNX.4.58.0311211058540.29189@sprite.sidhe.org

Some PIR "How do I?" questions

Dan threw a list of questions about IMCC out to the list as things that would really benefit from being documented. Sterling Hughes added a question or two, and Melvin Smith volunteered to get the answers written up real soon now.

http://groups.google.com/groups?selm=Pine.LNX.4.58.0311211515220.29189@sprite.sidhe.org

Bytecode portability and word/int sizes

Now that Parrot has the beginnings of freeze/thaw support, Melvin Smith restarted discussion of ensuring that Parrot's bytecode format is portable. He pointed at several issues that need resolving. Discussion continues.

http://groups.google.com/groups?selm=5.1.1.6.2.20031122120034.02236230@pop.mindspring.com

Meanwhile, in perl6-language

Fun with 'multi'

Discussion of multi dispatch rolled on into the current summary period. Luke Palmer pointed out that multi is no longer a declarator but a modifier, allowing for fine-grained control of the 'scope' of a method/function/whatever. Larry pointed out that 'multi' is orthogonal to scope, saying simply "I'm putting multiple names into a spot that would ordinarily demand a unique name", and went on to explain precisely what that implied.

http://groups.google.com/groups?selm=20031110094753.GA14948@babylonia.flatirons.org

http://groups.google.com/groups?selm=20031119025313.GA10300@wall.org -- Larry Clarifies

The Perl 6 design process

Rod Adams worried that all of the design of Perl 6 seems to come from Larry and Damian, with the language list being relegated to asking questions about how stuff works in more detail and in sniffing out tricky bits. Taken in combination with the real world's nasty habit of saddling Larry and Damian with health and financial problems, well, Rod worries, and wonders if there's anything that can be done to improve matters.

The redoubtable chromatic (who prefers it if I always render his name in all lower case, even at the beginning of sentences, hence his becoming redoubtable) gave the Standard Answer to such questions, pointing out that the reason things are going through Larry is because, well, Larry's bloody good at it. However, he did suggest that we're now at the point where we can probably start turning existing design docs into story cards and programmer tests for Perl 6, which should make implementation a good deal easier, and pointed to the Perl XP Training Wiki as a good place to look into how such tools are used to develop code.

http://groups.google.com/groups?selm=4.2.0.58.20031114234841.00aa39e0@rodadams.net

http://xptrain.perl-cw.com/ -- The Perl XP Training Wiki

Control flow variables

Luke Palmer doesn't like the repetition and ugliness inherent in code like:

  my $is_ok = 1;
  for 0..6 -> $t
    { if abs(@new[$t] - @new[$t+1]) > 3 { $is_ok = 0; last } }

  if $is_ok { push @moves: [$i, $j] }

and wondered if there was a Perl6ish way of eliminating $is_ok. He suggested something like:

  for 0..6 -> $t
    { if abs(@new[$t] - @new [$t+1]) > 3 { last }}
  FINISH 
    { push @moves: [$i, $j] }

Simon Cozens made a plea to eschew adding syntax to the language when it would be better to find a general mechanism. He wondered if a better approach may be to have for return a status, by analogy with Perl 6's if, which would allow one to write:

  for 0..6 -> $t { 
    last if abs(@new[$t] - @new[$t+1]) > 3;
  } or push @moves: [$i, $j];

Which wasn't really well liked (I'm practicing my understatement here, okay?). However, Simon's point about distinguishing between Perl 6 the language and stuff that belongs in CP6AN (CPAN for Perl 6) was well made and taken.

The thread's almost unsummarizable; it went off in so many different directions, and elicited some good stuff from Larry about the thinking behind Perl 6, along with a horde of solutions to Luke's problem. Some of 'em even work in Perl 5, at the expense of some seriously ugly looking code. Piers Cawley wrapped one solution, based on a guard clause approach, up in a higher order function, but fumbled the ball on working out the macro syntax, which pulled the discussion in another direction. Meanwhile other subthreads led to discussions of generators, coroutines, Mathematica's Sow[] and Reap[] and legitimate uses of goto.

And Larry emphasised that nested modifiers are still quite illegal in Standard Perl 6.

http://groups.google.com/groups?selm=20031118142052.GA5735@babylonia.flatirons.org

http://groups.google.com/groups?selm=20031119181906.GD10300@wall.org -- Larry on language design

s/// in string context should return the string

In another of those deceptively 'pregnant' posts, Stéphane Payrard proposed that s/// in string context should return the string after substitution. Austin Hastings proposed something scary, which led to Larry contemplating making

  $x.s/foo/bar/.s/moo/goo/;

work somehow. (Which is actually less scary than Austin's original proposal). I think he rejected the idea for the time being though.

Piers Cawley suggested that it should return the substituted string, with an appropriate boolean property set depending on whether any substitutions were made. But then there are issues of how you strip such properties off if you want to use the standard boolean interpretations of a string. Luke Palmer suggested the rather lovely but nothing meta property.

I have the feeling that what's actually going to happen is that we'll end up with a supplementary, non-destructive form of s/// which doesn't alter the original string, but returns a new string with the substitutions made.

http://groups.google.com/groups?selm=20031118171859.GH3887@stefp.dyndns.org

Acknowledgements, Apologies, Announcements

Sorry, we're a tad late again. Sorry too to those of you reading this on the mailing lists: I've not got 'round to reimplementing the software to shorten the article URLs.

If you find these summaries useful or enjoyable you might like to show your appreciation by contributing money or time to the Perl Foundation and the Perl 6 effort, or you could give me feedback directly at p6summarizer@bofh.org.uk, or stop by at my website.

http://donate.perl-foundation.org/ -- The Perl Foundation

http://dev.perl.org/perl6/ -- Perl 6 Development site

http://www.bofh.org.uk:8080/ -- My website, "Just a Summary"

Perl Slurp-Eaze

One of the common Perl idioms is processing text files line by line:


while( <FH> ) {
    do something with $_
}

This idiom has several variants, but the key point is that it reads in only one line from the file in each loop iteration. This has several advantages: it limits memory use to one line, it can handle any size of file (including data piped in via STDIN), and it is easily taught to and understood by Perl beginners. Unfortunately, it means they then go on to do things like this:


while( <FH> ) {
    push @lines, $_ ;
}

foreach ( @lines ) {
    do something with $_
}

Line by line processing is fine, but it isn't the only way to deal with reading files. The other common style is reading the entire file into a scalar or array, and that is commonly known as slurping. Now, slurping has somewhat of a poor reputation, and this article is an attempt at rehabilitating it.

Slurping files has advantages and limitations, and is not something you should just do when line by line processing is fine. It is best when you need the entire file in memory for processing all at once. Slurping with in memory processing can be faster and lead to simpler code than line by line if done properly.

The biggest issue to watch for with slurping is file size. Slurping very large files or unknown amounts of data from STDIN can be disastrous to your memory usage and cause swap disk thrashing. You can slurp STDIN if you know that you can handle the maximum size input without detrimentally affecting your memory usage, and so I advocate slurping only disk files and only when you know their size is reasonable and you have a real reason to process the file as a whole.

Note that "reasonable" size these days is larger than it was in the bad old days of limited RAM. Slurping in a megabyte is not an issue on most systems. But most of the files I tend to slurp in are much smaller than that. Typical files that work well with slurping are configuration files, (mini-)language scripts, some data (especially binary) files, and other files of known sizes which need fast processing.

Another major win for slurping over line by line is speed. Perl's IO system (like many others) is slow. Calling <> for each line requires a check for the end of line, checks for EOF, copying a line, munging the internal handle structure, etc. Plenty of work for each line read in. On the other hand, slurping, if done correctly, will usually involve only one I/O call and no extra data copying. The same is true for writing files to disk, and we will cover that as well.

Finally, when you have slurped the entire file into memory, you can do operations on the data that are not possible or easily done with line by line processing. These include global search/replace (without regard for newlines), grabbing all matches with one call of //g, complex parsing (which in many cases must ignore newlines), processing *ML (where line endings are just white space) and performing complex transformations such as template expansion.

Global Operations

Here are some simple global operations that can be done quickly and easily on an entire file that has been slurped in. They could also be done with line by line processing but that would be slower and require more code.

A common problem is reading in a file with key/value pairs. There are modules which do this but who needs them for simple formats? Just slurp in the file and do a single parse to grab all the key/value pairs.


my $text = read_file( $file ) ;
my %config = $text =~ /^(\w+)=(.+)$/mg ;

That matches a key which starts a line (anywhere inside the string because of the /m modifier), the '=' char and the text to the end of the line (again, /m makes that work). In fact the ending $ is not even needed since . will not normally match a newline. Since the key and value are grabbed and the m// is in list context with the /g modifier, it will grab all key/value pairs and return them. The %config hash will be assigned this list and now you have the file fully parsed into a hash.

Various projects I have worked on needed some simple templating and I wasn't in the mood to use a full module (please, no flames about your favorite template module :-). So I rolled my own by slurping in the template file, setting up a template hash and doing this one line:


$text =~ s/<%(.+?)%>/$template{$1}/g ;

That only works if the entire file was slurped in. With a little extra work it can handle chunks of text to be expanded:


$text =~ s/<%(\w+)_START%>(.+)<%\1_END%>/ template($1, $2)/sge ;

Just supply a template sub to expand the text between the markers and you have yourself a simple system with minimal code. Note that this will work and grab text spanning multiple lines, due to the /s modifier. This is something that is much trickier with line by line processing.
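
The template() sub itself is up to you; as a rough sketch (this body is my assumption, reusing the same %template hash as the simple one-liner above), it could be as small as:

# a sketch of the helper assumed by the s///sge line above; it expects
# the same %template hash to be visible in this scope
sub template {
    my( $name, $body ) = @_ ;

    # a fuller version might dispatch on $name; here we just expand
    # any <%key%> placeholders inside the chunk
    $body =~ s/<%(.+?)%>/$template{$1}/g ;

    return $body ;
}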

Note that this is a very simple templating system, and it can't directly handle nested tags and other complex features. But even if you use one of the myriad of template modules on the CPAN, you will gain by having speedier ways to read and write files.

Slurping a file into an array also offers some useful advantages. One simple example is reading in a flat database where each record has fields separated by a character such as ':':


my @pw_fields = map [ split /:/ ], read_file( '/etc/passwd' ) ;

Random access to any line of the slurped file is another advantage. Also a line index could be built to speed up searching the array of lines.
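
For instance (a sketch assuming the read_file() developed later in this article, with an indexing scheme chosen purely for illustration):

my @lines = read_file( $file ) ;        # slurp into an array of lines

print $lines[41] ;                      # random access to line 42

# build a first-word => line-number index to speed up later searches
my %index ;
for my $i ( 0 .. $#lines ) {
    my( $key ) = $lines[$i] =~ /^(\w+)/ ;
    $index{$key} = $i if defined $key ;
}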

Traditional Slurping

Perl has always supported slurping files with minimal code. Slurping a file to a list of lines is trivial: just call the <> operator in a list context:


my @lines = <FH> ;

and slurping to a scalar isn't much more work. Just set the built-in variable $/ (the input record separator) to the undefined value and read in the file with <>:


{
    local( $/, *FH ) ;
    open( FH, $file ) or die "sudden flaming death\n" ;
    $text = <FH> ;
}

Notice the use of local(). It sets $/ to undef for you and when the scope exits it will revert $/ back to its previous value (most likely "\n").

Here is a Perl idiom that allows the $text variable to be declared, and there is no need for a tightly nested block. The do block will execute <FH> in a scalar context and slurp in the file named by $file:


    local( *FH ) ;
    open( FH, $file ) or die "sudden flaming death\n" ;
    my $text = do { local( $/ ) ; <FH> } ;

Both of those slurps used localized filehandles to be compatible with 5.005. Here they are with 5.6.0 lexical autovivified handles:


{
    local( $/ ) ;
    open( my $fh, $file ) or die "sudden flaming death\n" ;
    $text = <$fh> ;
}

    open( my $fh, $file ) or die "sudden flaming death\n" ;
    my $text = do { local( $/ ) ; <$fh> } ;

And this is a variant of that idiom that removes the need for the open call:


my $text = do { local( @ARGV, $/ ) = $file ; <> } ;

The filename in $file is assigned to a localized @ARGV, and the null filehandle <> is used, which reads the data from the files in @ARGV.

Instead of assigning to a scalar, all of the above slurps can assign to an array, which will get the file split into lines (using $/ as the end-of-line marker).
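
For example, here is the @ARGV variant in list context (a sketch; note that $/ is left alone this time, so the file really does come back as separate lines):

my @lines = do { local @ARGV = $file ; <> } ;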

There is one common variant of those slurps which is very slow and not good code. You see it around, and it is almost always cargo cult code:


my $text = join( '', <FH> ) ;

That needlessly splits the input file into lines (join provides a list context to <FH>) and then joins up those lines again. The original coder of this idiom obviously never read perlvar and learned how to use $/ to allow scalar slurping.

Write Slurping

While reading in entire files at one time is common, writing out entire files is also done. We call it "slurping" when we read in files, but there is no commonly accepted term for the write operation. I asked some Perl colleagues and got two interesting nominations: Peter Scott said to call it "burping" (rhymes with "slurping" and suggests movement in the opposite direction); others suggested "spewing", which has a stronger visual image :-). Tell me your favorite, or suggest your own. I will use both in this section so you can see how they work for you.

Spewing a file is a much simpler operation than slurping. You don't have context issues to worry about and there is no efficiency problem with returning a buffer. Here is a simple burp subroutine:


sub burp {
    my( $file_name ) = shift ;
    open( my $fh, ">$file_name" ) || 
        die "can't create $file_name $!" ;
    print $fh @_ ;
}

Note that it doesn't copy the input text but passes @_ directly to print. We will look at faster variations of that later on.

Slurp on the CPAN

As you would expect there are modules in the CPAN that will slurp files for you. The two I found are called Slurp.pm (by Rob Casey - ROBAU on CPAN) and File::Slurp.pm (by David Muir Sharnoff - MUIR on CPAN).

Here is the code from Slurp.pm:


sub slurp { 
    local( $/, @ARGV ) = ( wantarray ? $/ : undef, @_ ); 
    return <ARGV>;
}

sub to_array {
    my @array = slurp( @_ );
    return wantarray ? @array : \@array;
}

sub to_scalar {
    my $scalar = slurp( @_ );
    return $scalar;
}

The subroutine slurp() uses the magic undefined value of $/ and the magic file handle ARGV to support slurping into a scalar or array. It also provides two wrapper subs that allow the caller to control the context of the slurp. And the to_array() subroutine will return the list of slurped lines or an anonymous array of them according to its caller's context, by checking wantarray. It has 'slurp' in @EXPORT and all three subroutines in @EXPORT_OK.
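
So with Slurp.pm it is the caller's context (or the wrapper sub chosen) that decides what comes back; a brief sketch based on the code above:

use Slurp qw( slurp to_array ) ;

my $text  = slurp( $file ) ;       # scalar context: the whole file as one string
my @lines = slurp( $file ) ;       # list context: a list of lines
my $aref  = to_array( $file ) ;    # scalar context: a reference to the array of lines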

File::Slurp.pm has this code:


sub read_file
{
    my ($file) = @_;

    local($/) = wantarray ? $/ : undef;
    local(*F);
    my $r;
    my (@r);

    open(F, "<$file") || croak "open $file: $!";
    @r = <F>;
    close(F) || croak "close $file: $!";

    return $r[0] unless wantarray;
    return @r;
}

This module provides several subroutines including read_file() (more on the others later). read_file() behaves similarly to Slurp::slurp() in that it will slurp a list of lines or a single scalar depending on the caller's context. It also uses the magic undefined value of $/ for scalar slurping, but it uses an explicit open call rather than a localized @ARGV as the other module did. Also it doesn't provide a way to get an anonymous array of the lines, but that can easily be rectified by calling it inside an anonymous array constructor [].

Both of these modules make it easier for Perl coders to slurp in files. They both use the magic $/ to slurp in scalar mode and the natural behavior of <> in list context to slurp as lines. But neither is optimized for speed nor can they handle binmode() to support binary or unicode files. See below for more on slurp features and speedups.

Slurping API Design

The slurp modules on CPAN have a very simple API and don't support binmode(). This section will cover various API design issues such as efficient return by reference, binmode() and calling variations.

Let's start with the call variations. Slurped files can be returned in four formats: as a single scalar, as a reference to a scalar, as a list of lines or as an anonymous array of lines. But the caller can only provide two contexts: scalar or list. So we have to either provide an API with more than one subroutine (as Slurp.pm did) or just provide one subroutine which only returns a scalar or a list (not an anonymous array) as File::Slurp does.

I have used my own read_file() subroutine for years and it has the same API as File::Slurp: a single subroutine that returns a scalar or a list of lines depending on context. But I recognize the interest of those who want an anonymous array for line slurping. For one thing, it is easier to pass around to other subs and, for another, it eliminates the extra copying of the lines via return. So my module will support multiple subroutines, with one that returns the file based on context and another that returns only lines (either as a list or as an anonymous array). So this API is in between the two CPAN modules. There is no need for a specific slurp-in-as-a-scalar subroutine as the general slurp() will do that in scalar context. If you want to slurp a whole file into a single array element, just assign to the desired element; that provides scalar context to the read_file() subroutine.

The next area to cover is what to name these subs. I will go with read_file() and read_file_lines(). They are descriptive, simple and don't use the 'slurp' nickname (though that nickname is in the module name).
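
Putting the naming and context rules together, calls to the planned API would look something like this (a sketch of the intended behaviour, not final code):

my $text     = read_file( $file ) ;         # scalar context: one big string
my @lines    = read_file( $file ) ;         # list context: a list of lines
my $line_ref = read_file_lines( $file ) ;   # scalar context: anonymous array of lines
$slots[3]    = read_file( $file ) ;         # an array element supplies scalar context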

Another critical area when designing APIs is how to pass in arguments. The read_file*() subroutines take one required argument, which is the file name. To support binmode() we need another, optional argument. A third optional argument is needed to support returning a slurped scalar by reference. My first thought was to design the API with three positional arguments - file name, buffer reference and binmode. But if you want to set the binmode and not pass in a buffer reference, you have to fill the second argument with undef and that is ugly. So I decided to make the filename argument positional and the other two named. The subroutine starts off like this:


sub read_file {

    my( $file_name, %args ) = @_ ;

    my $buf ;
    my $buf_ref = $args{'buf_ref'} || \$buf ;

The binmode argument will be handled later (see code below).

The other sub (read_file_lines()) will only take an optional binmode argument (so you can read files with binary delimiters). It doesn't need a buffer reference argument since it can return an anonymous array if called in a scalar context. So this subroutine could use positional arguments, but to keep its API similar to the API of read_file(), it will also use pass by name for the optional arguments. This also means that new optional arguments can be added later without breaking any legacy code. A bonus of keeping the API the same for both subs will be seen in how the two subs are optimized to work together.

Write slurping (or spewing or burping :-)) needs to have its API designed as well. The biggest issue is that it needs to support not only optional arguments but also a list of data to be written. Perl 6 will be able to handle that with optional named arguments and a final slurpy argument. Since this is Perl 5, we have to do it using some cleverness. The first argument is the file name and it will be positional as with the read_file subroutine. But how can we pass in the optional arguments and also a list of data? The solution lies in the fact that the data list should never contain a reference. Burping/spewing works only on plain data. So if the next argument is a hash reference, we can assume it contains the optional arguments and the rest of the arguments are the data list. So the write_file() subroutine will start off like this:


sub write_file {

    my $file_name = shift ;

    my $args = ( ref $_[0] eq 'HASH' ) ? shift : {} ;

Whether or not optional arguments are passed in, we leave the data list in @_ to minimize any more copying. You call write_file() like this:


write_file( 'foo', { binmode => ':raw' }, @data ) ;
write_file( 'junk', { append => 1 }, @more_junk ) ;
write_file( 'bar', @spew ) ;

Fast Slurping

Somewhere along the line, I learned about a way to slurp files faster than by setting $/ to undef. The method is very simple: you do a single read call with the size of the file (which the -s operator provides). This bypasses the I/O loop inside perl that checks for EOF and does all sorts of processing. I then decided to experiment and found that sysread is even faster, as you would expect: sysread bypasses all of Perl's stdio and reads the file from the kernel buffers directly into a Perl scalar. This is why the slurp code in File::Slurp uses sysopen/sysread/syswrite. All the rest of the code is just to support the various options and data passing techniques.
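
To make that concrete, here is a bare-bones sketch of the single-sysread approach (no options and no read loop; the full read_file() later in this article is the real thing):

use Fcntl qw( O_RDONLY ) ;

sub simple_sysread_slurp {

    my $file_name = shift ;

    sysopen( my $fh, $file_name, O_RDONLY )
        or die "can't open $file_name: $!" ;

    my $buf ;
    sysread( $fh, $buf, -s $fh ) ;      # one read call for the whole file

    return $buf ;
}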

Benchmarks

Benchmarks can be enlightening, informative, frustrating and deceiving. It would make no sense to create a new and more complex slurp module unless it also gained significantly in speed. So I created a benchmark script which compares various slurp methods with differing file sizes and calling contexts. This script can be run from the main directory of the tarball like this:


perl -Ilib extras/slurp_bench.pl

If you pass in an argument on the command line, it will be passed to timethese() and it will control the duration. It defaults to -2 which makes each benchmark run to at least 2 seconds of CPU time.

The following numbers are from a run I did on my 300MHz SPARC. You will most likely get much faster counts on your boxes, but the relative speeds shouldn't change by much. If you see major differences on your benchmarks, please send me the results and your Perl and OS versions. Also, you can play with the benchmark script and add more slurp variations or data files.

The rest of this section will be discussing the results of the benchmarks. You can refer to extras/slurp_bench.pl to see the code for the individual benchmarks. If the benchmark name starts with cpan_, it is either from Slurp.pm or File::Slurp.pm. Those starting with new_ are from the new File::Slurp.pm. Those that start with file_contents_ are from a client's code base. The rest are variations I created to highlight certain aspects of the benchmarks.

The short and long file data is made like this:


my @lines = ( 'abc' x 30 . "\n")  x 100 ;
my $text = join( '', @lines ) ;

@lines = ( 'abc' x 40 . "\n")  x 1000 ;
$text = join( '', @lines ) ;

So the short file is 9,100 bytes and the long file is 121,000 bytes.

Scalar Slurp of Short File


file_contents        651/s
file_contents_no_OO  828/s
cpan_read_file      1866/s
cpan_slurp          1934/s
read_file           2079/s
new                 2270/s
new_buf_ref         2403/s
new_scalar_ref      2415/s
sysread_file        2572/s

Scalar Slurp of Long File


file_contents_no_OO 82.9/s
file_contents       85.4/s
cpan_read_file       250/s
cpan_slurp           257/s
read_file            323/s
new                  468/s
sysread_file         489/s
new_scalar_ref       766/s
new_buf_ref          767/s

The primary inference you get from looking at the numbers above is that when slurping a file into a scalar, the longer the file, the more time you save by returning the result via a scalar reference. The time for the extra buffer copy can add up. The new module came out on top overall, except for the very simple sysread_file entry, which was added to highlight how little overhead the more flexible new module adds. The file_contents entries are always the worst since they do a list slurp and then a join, a classic newbie and cargo-culted style which is extremely slow. Also the OO code in file_contents slows it down even more (I added the file_contents_no_OO entry to show this). The two CPAN modules are decent with small files but they are laggards compared to the new module when the file gets much larger.

List Slurp of Short File


cpan_read_file          589/s
cpan_slurp_to_array     620/s
read_file               824/s
new_array_ref           824/s
sysread_file            828/s
new                     829/s
new_in_anon_array       833/s
cpan_slurp_to_array_ref 836/s

List Slurp of Long File


cpan_read_file          62.4/s
cpan_slurp_to_array     62.7/s
read_file               92.9/s
sysread_file            94.8/s
new_array_ref           95.5/s
new                     96.2/s
cpan_slurp_to_array_ref 96.3/s
new_in_anon_array       97.2/s

This is perhaps the most interesting result of this benchmark. Five different entries have effectively tied for the lead. The logical conclusion is that splitting the input into lines is the bounding operation, no matter how the file gets slurped. This is the only benchmark where the new module isn't the clear winner (in the long file entries - it is no worse than a close second in the short file entries).

Note: In the benchmark information for all the spew entries, the extra number at the end of each line is how many wall-clock seconds the whole entry took. The benchmarks were run for at least 2 CPU seconds per entry. The unusually large wall-clock times will be discussed below.

Scalar Spew of Short File


cpan_write_file 1035/s  38
print_file      1055/s  41
syswrite_file   1135/s  44
new             1519/s  2
print_join_file 1766/s  2
new_ref         1900/s  2
syswrite_file2  2138/s  2

Scalar Spew of Long File


cpan_write_file 164/s   20
print_file      211/s   26
syswrite_file   236/s   25
print_join_file 277/s   2
new             295/s   2
syswrite_file2  428/s   2
new_ref         608/s   2

In the scalar spew entries, the new module API wins when it is passed a reference to the scalar buffer. The syswrite_file2 entry beats it with the shorter file due to its simpler code. The old CPAN module is the slowest due to its extra copying of the data and its use of print.

List Spew of Short File


cpan_write_file  794/s  29
syswrite_file   1000/s  38
print_file      1013/s  42
new             1399/s  2
print_join_file 1557/s  2

List Spew of Long File


cpan_write_file 112/s   12
print_file      179/s   21
syswrite_file   181/s   19
print_join_file 205/s   2
new             228/s   2

Again, the simple print_join_file entry beats the new module when spewing a short list of lines to a file. But it loses to the new module when the file gets longer. The old CPAN module lags behind the others since it first makes an extra copy of the lines and then calls print with the output list, which is much slower than passing print a single scalar generated by join. The print_file entry shows the advantage of directly printing @_ and the print_join_file entry adds the join optimization.

Now about those long wall-clock times. If you look carefully at the benchmark code of all the spew entries, you will find that some always write to new files and some overwrite existing files. When I asked David Muir why the old File::Slurp module had an overwrite subroutine, he answered that by overwriting a file, you always guarantee something readable is in the file. If you create a new file, there is a moment when the new file is created but has no data in it. I feel this is not a good enough answer. Even when overwriting, you can write a shorter file than the existing file and then you have to truncate the file to the new size. There is a small race window there where another process can slurp in the file with the new data followed by leftover junk from the previous version of the file. This reinforces the point that the only way to ensure consistent file data is the proper use of file locks.

But what about those long times? Well, it is all about the difference between creating files and overwriting existing ones. The former have to allocate new inodes (or the equivalent on other file systems) and the latter can reuse the existing inode. This means the overwrite will save on disk seeks as well as on CPU time. In fact, when running this benchmark, I could hear my disk going crazy allocating inodes during the spew operations. This speedup in both CPU and wall-clock time is why the new module always does overwriting when spewing files. It also does the proper truncate (and this is checked in the tests by spewing shorter files after longer ones had previously been written). The overwrite subroutine is just a typeglob alias to write_file and is there for backwards compatibility with the old File::Slurp module.

Benchmark Conclusion

Other than a few cases where a simpler entry beat it out, the new File::Slurp module is either the speed leader or among the leaders. Its special APIs for passing buffers by reference prove to be very useful speedups. Also it uses all the other optimizations including using sysread/syswrite and joining output lines. I expect many projects that extensively use slurping will notice the speed improvements, especially if they rewrite their code to take advantage of the new API features. Even if they don't touch their code and use the simple API they will get a significant speedup.

Error Handling

Slurp subroutines are subject to conditions such as not being able to open the file, or I/O errors. How these errors are handled, and what the caller will see, are important aspects of the design of an API. The classic error handling for slurping has been to call die() or, even better, croak(). But sometimes you want the slurp to either warn()/carp() or allow your code to handle the error. Sure, this can be done by wrapping the slurp in an eval block to catch a fatal error, but not everyone wants all that extra code. So I have added another option to all the subroutines which selects the error handling. If the 'err_mode' option is 'croak' (which is also the default), the called subroutine will croak. If set to 'carp' then carp will be called. Set to any other string (use 'quiet' when you want to be explicit) and no error handler is called. Then the caller can use the error status from the call.

write_file() doesn't use the return value for data so it can return a false status value in-band to mark an error. read_file() does use its return value for data, but we can still make it pass back the error status. A successful read in any scalar mode will return either a defined data string or a reference to a scalar or array. So a bare return would work here. But if you slurp in lines by calling it in a list context, a bare return will return an empty list, which is the same value it would get from an existing but empty file. So now, read_file() will do something I normally strongly advocate against, i.e., returning an explicit undef value. In the scalar context this still returns an error, and in list context, the returned first value will be undef, and that is not legal data for the first element. So the list context also gets an error status it can detect:


my @lines = read_file( $file_name, err_mode => 'quiet' ) ;
your_handle_error( "$file_name can't be read\n" ) unless
    @lines && defined $lines[0] ;
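
For write_file(), which doesn't return data, checking the status is simpler (again a sketch of the intended behaviour):

write_file( $file_name, { err_mode => 'quiet' }, @data )
    or your_handle_error( "$file_name can't be written\n" ) ;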

The implementation

Here's the whole code which implements my faster slurp:


use Fcntl ;    # supplies the O_* flag constants used below (O_BINARY is only defined on some platforms)

sub read_file {

    my( $file_name, %args ) = @_ ;

    my $buf ;
    my $buf_ref = $args{'buf_ref'} || \$buf ;

    my $mode = O_RDONLY ;
    $mode |= O_BINARY if $args{'binmode'} ;

    local( *FH ) ;
    sysopen( FH, $file_name, $mode ) or
        carp "Can't open $file_name: $!" ;

    my $size_left = -s FH ;

    while( $size_left > 0 ) {

        my $read_cnt = sysread( FH, ${$buf_ref},
            $size_left, length ${$buf_ref} ) ;

        unless( $read_cnt ) {

            carp "read error in file $file_name: $!" ;
            last ;
        }

        $size_left -= $read_cnt ;
    }

# handle void context (return scalar by buffer reference)
    return unless defined wantarray ;

# handle list context
    return split( m|(?<=$/)|, ${$buf_ref} ) if wantarray ;

# handle scalar context
    return ${$buf_ref} ;
}

sub read_file_lines {

# handle list context
    return &read_file if wantarray ;

# otherwise handle scalar context: return an anonymous array of lines
    return [ &read_file ] ;
}

sub write_file {

    my $file_name = shift ;

    my $args = ( ref $_[0] eq 'HASH' ) ? shift : {} ;
    my $buf = join '', @_ ;

    my $mode = O_WRONLY | O_CREAT ;   # create the file if it doesn't already exist
    $mode |= O_BINARY if $args->{'binmode'} ;
    $mode |= O_APPEND if $args->{'append'} ;

    local( *FH ) ;
    sysopen( FH, $file_name, $mode ) or
        carp "Can't open $file_name: $!" ;

    my $size_left = length( $buf ) ;
    my $offset = 0 ;

    while( $size_left > 0 ) {

        my $write_cnt = syswrite( FH, $buf,
                $size_left, $offset ) ;

        unless( $write_cnt ) {

            carp "write error in file $file_name: $!" ;
            last ;
        }

        $size_left -= $write_cnt ;
        $offset += $write_cnt ;
    }

    return ;
}
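
And here is a quick, hypothetical illustration of the buf_ref option that the benchmarks favour, using the subs above:

my $buf ;
read_file( $file_name, buf_ref => \$buf ) ;   # void context: the data lands in $buf directly
write_file( $out_name, $buf ) ;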

In Summary

We have compared classic line-by-line processing with munging a whole file in memory. Slurping files can speed up your programs and simplify your code, if done properly. You must still be careful not to slurp humongous files (logs, DNA sequences, and so forth), or STDIN, where you don't know how much data you will read in. But slurping megabyte-sized files is not a major issue on today's systems with the typical amount of RAM installed. When Perl was first being used in depth (Perl 4), slurping was limited by the smaller RAM size of ten years ago. Now, you should be able to slurp almost any reasonably sized file, whether it contains configuration, source code, or data.

Solving Puzzles with LM-Solve

Suppose you encounter a (single-player) riddle or a puzzle that you don't know how to solve. Let's also suppose that this puzzle involves moving between several states of the board, with an enumerable number of moves emerging from each state. In this case, LM-Solve (or Games::LMSolve on CPAN) may be of help.

LM-Solve was originally written to tackle various types of the so-called logic mazes that can be found online. Nevertheless, it can be extended to support many other types of single-player puzzles.

In this article, I will demonstrate how to use LM-Solve to solve a type of puzzle that it does not yet know how to solve.

Installation

Use the CPAN.pm module's install Games::LMSolve command to install LM-Solve. For instance, invoke the following command on the command line:

$ perl -MCPAN -e 'install Games::LMSolve'

That's it! (LM-Solve does not require any non-base modules, and should run on all recent versions of Perl.)

The Puzzle in Question

The puzzle in question is called "Jumping Cards" and is taken from the Macalester College Problem of the Week No. 949. In this puzzle, we start with eight cards in a row (labeled 1 to 8). We have to transform this into the 8-to-1 sequence by swapping two cards at a time, as long as the following condition is met: at all times, the values of any two neighboring cards must be within one, two, or three of each other.

Let's experiment with this puzzle a bit. We start with the following formation:

1 2 3 4 5 6 7 8

Let's swap 1 and 3, and see what it gives us:

3 2 1 4 5 6 7 8

Now, we cannot exchange 1 and 4, because then the 1 would be next to the 5, and 5-1 is 4, which is more than 3. So let's exchange 2 and 1:

3 1 2 4 5 6 7 8

Now we can exchange 2 and 4:

3 1 4 2 5 6 7 8

And so on.

Let's Start ... Coding!

The Games::LMSolve::Base class tries to solve a game by iterating through its various positions, recording every one it passes through, and trying to reach the solution. However, it does not know in advance what the game's rules are, or what the positions and moves mean. In order for it to know that, we need to inherit from it and code several methods that are abstract in the base class.

We will code a derived class that will implement the logic specific to the Jumping Cards game. It will implement the following methods, which, together with the methods of the base class, enable the solver to solve the game:

  1. input_board
  2. pack_state
  3. unpack_state
  4. display_state
  5. check_if_final_state
  6. enumerate_moves
  7. perform_move
  8. render_move

Here's the beginning of the file where we put the script:

package Jumping::Cards;

use strict;

use Games::LMSolve::Base;

use vars qw(@ISA);

@ISA=qw(Games::LMSolve::Base);

As can be seen, we declared a new package, Jumping::Cards, loaded the Games::LMSolve::Base module, and inherited from it. Now let's start declaring the methods. First, a method to input the board in question.

Since our board is constant, we just return an array reference that contains the initial sequence.

sub input_board
{
    my $self = shift;

    my $filename = shift;

    return [ 1 .. 8 ];
}

When Games::LMSolve::Base iterates over the states, it stores data about each state in a hash. This means we're going to have to provide a way to convert each state from its expanded form into a uniquely identifying string. The pack_state method does this, and in our case, it will look like this:

# A function that accepts the expanded state (as an array ref)
# and returns an atom that represents it.
sub pack_state
{
    my $self = shift;
    my $state_vector = shift;
    return join(",", @$state_vector);
}

It is a good idea to use functions like pack, join or any other serialization mechanism here. In our case, we simply used join.
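
For instance, a pack()-based packer (not part of this solver, shown purely to illustrate the point) could encode the eight card values as a compact binary string; its unpack counterpart would then use unpack("C8", ...):

# an alternative, pack-based state packer (illustration only)
sub pack_state_packed
{
    my $self = shift;
    my $state_vector = shift;
    return pack("C8", @$state_vector);
}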

It is not very convenient to manipulate a packed state, and so we need another function to expand it. unpack_state does the opposite of pack_state and expands a packed state.

# A function that accepts an atom that represents a state 
# and returns an array ref that represents it.
sub unpack_state
{
    my $self = shift;
    my $state = shift;
    return [ split(/,/, $state) ];
}

display_state() converts a packed state to a user-readable string. This is so that it can be displayed to the user. In our case, the comma-delimited notation is already readable, so we leave it as is.

# Accept an atom that represents a state and output a 
# user-readable string that describes it.
sub display_state
{
    my $self = shift;
    my $state = shift;
    return $state;
}

We need to determine when we have reached our goal and can terminate the search with a success. The check_if_final_state function accepts an expanded state and checks if it qualifies as a final state. In our case, it is final if it's the 8-to-1 sequence.

sub check_if_final_state
{
    my $self = shift;

    my $coords = shift;
    return join(",", @$coords) eq "8,7,6,5,4,3,2,1";
}

Now we need a function that will tell the solver what subsequent states are available from each state. This is done by enumerating a set of moves that can be performed on the state. The enumerate_moves function does exactly that.

# This function enumerates the moves accessible to the state.
sub enumerate_moves
{
    my $self = shift;

    my $state = shift;

    my (@moves);
    for my $i (0 .. 6)
    {
        for my $j (($i+1) .. 7)
        {
            my @new = @$state;
            @new[$i,$j]=@new[$j,$i];
            my $is_ok = 1;
            for my $t (0 .. 6)
            {
                if (abs($new[$t]-$new[$t+1]) > 3)
                {
                    $is_ok = 0;
                    last;
                }
            }
            if ($is_ok)
            {
                push @moves, [$i,$j];
            }
        }
    }
    return @moves;
}

enumerate_moves iterates over every pair of indices, swaps the two cards on a copy of the board, and checks the resulting board for validity. If the board is OK, it pushes the pair of exchanged indices onto the array @moves, which is returned at the end.
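
If you wanted to keep that rule in a single place, the adjacency check could be pulled out into a small helper; this is only a sketch, not part of the article's code:

# Hypothetical helper: true if no two adjacent cards differ by more than 3.
sub _is_valid_board
{
    my $self = shift;
    my $board = shift;

    for my $t (0 .. scalar(@$board) - 2)
    {
        return 0 if abs($board->[$t] - $board->[$t+1]) > 3;
    }
    return 1;
}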

We also need a function that will translate an origin state and a move to a resultant state. The perform_move function performs a move on a state and returns the new state. In our case, it simply swaps the cards in the two indices specified by the move.

# This function accepts a state and a move. It tries to perform the
# move on the state. If it is successful, it returns the new state.
sub perform_move
{
    my $self = shift;

    my $state = shift;
    my $m = shift;

    my @new = @$state;

    my ($i,$j) = @$m;
    @new[$i,$j]=@new[$j,$i];
    return \@new;
}

Finally, we need a function that will render a move into a user-readable string, so it can be displayed to the user.

sub render_move
{
    my $self = shift;

    my $move = shift;

    if (defined($move))
    {
        return join(" <=> ", @$move);
    }
    else
    {
        return "";
    }
}

Invoking the Solver

To make the solver invokable, create an instance of it in the main namespace, and call its main() function. This will turn it into a script that will solve the board. The code is this:

package main;

my $solver = Jumping::Cards->new();
$solver->main();

Now save everything to a file, jumping_cards.pl (or download the complete one), and invoke it like this: perl jumping_cards.pl --norle --output-states. The --norle option means not to run-length encode the moves. In our case, run-length encoding will do no good, because a move can appear only once (or else its effect will be reversed). --output-states causes the states to be displayed in the solution.

The program thinks a little and then outputs:

solved
solved
1,2,3,4,5,6,7,8: Move = 0 <=> 1
2,1,3,4,5,6,7,8: Move = 1 <=> 2
2,3,1,4,5,6,7,8: Move = 1 <=> 3
2,4,1,3,5,6,7,8: Move = 4 <=> 5
2,4,1,3,6,5,7,8: Move = 0 <=> 4
6,4,1,3,2,5,7,8: Move = 2 <=> 3
6,4,3,1,2,5,7,8: Move = 0 <=> 1
4,6,3,1,2,5,7,8: Move = 0 <=> 7
8,6,3,1,2,5,7,4: Move = 6 <=> 7
8,6,3,1,2,5,4,7: Move = 3 <=> 5
8,6,3,5,2,1,4,7: Move = 2 <=> 7
8,6,7,5,2,1,4,3: Move = 1 <=> 2
8,7,6,5,2,1,4,3: Move = 4 <=> 6
8,7,6,5,4,1,2,3: Move = 5 <=> 7
8,7,6,5,4,3,2,1

Which is a correct solution to the problem. If you want to see a run-time display of the solving process, add the --rtd switch.

Conclusion

LM-Solve is a usable and flexible framework for writing your own solvers for various kinds of puzzles such as the one above. Puzzles that are good candidates for solvers have a relatively limited number of states and a small number of states emerging from each origin state.

I found several solitaire games, such as Freecell, to be solvable by methods similar to the above. On the other hand, Klondike and other games with a talon are very hard to solve using such methods, because the talon expands the number of states a great deal.

Still, for most "simple-minded" puzzles, LM-Solve is very attractive as a solver framework. Have fun!

This week on Perl 6, week ending 2003-11-09

Traditionally this paragraph concerns itself with a few words on what I've been up to before finally settling down to get the summary written. But despite the fact that it's nearly four o'clock, it's been one of those days where I seem to have done almost as much as Leon Brocard generally does to warrant a mention each week.

So, here's what's been happening in perl6-internals to make up for the lack of guff about breadmaking or whatever. (If you're interested, the raisin borodinsky I mentioned last week was an unmitigated disaster. The focaccia was fabulous though).

New glossary entries

Gregor N. Purdy has added a few entries to the Parrot glossary, so if you've been bursting to know what PIR, IMCC and other Parrot specific clusters of capitals stand for, check out docs/glossary.pod in the parrot distribution.

http://groups.google.com/groups

String Encodings hurt my head!

Peter Gibbs is attempting to implement the DBCS encoding (whatever that is) and has discovered that he can't implement skip_backward for it because of the mixture of 1-byte and 2-byte characters. He offered seven suggestions for the right thing to do at this impasse.

Michael Scott didn't have any suggestions about the Right Thing, but he did point to a page on his very lovely Parrot Wiki which discussed most things Unicode for parrot, and made a plea for Dan (or whoever) to produce a Strings PDD.

http://groups.google.com/groups

http://www.vendian.org/parrot/wiki/bin/view.cgi/Main/ParrotDistributionUnicodeSupport - Michael's WikiWord

Perl 6 patches

Allison Randal posted a couple of patches to the current (very) mini Perl 6 that comes with Parrot (in languages/perl6). A little later in the week, Joseph F. Ryan contributed a Perl 6 patch. It's good to see this receiving attention again.

http://groups.google.com/groups

Documentation

Nick Kostirya wondered why docs/parrot_assembly.pod appeared to be simply an old version of docs/pdds/pdd06_pasm.pod. He also worried that docs/ops/ appeared to be empty in the 0.0.13 release of Parrot. Dan noted that both of the parrot assembly docs were wrong, and that what would probably happen would be that the PDD would be updated and docs/parrot_assembly.pod would be retired. Jürgen Bömmels said that the empty docs/ops was because during the Great Move, the Makefile that generated those POD files didn't get updated to cope with the new location of the .ops files. Nick wondered which other POD files might be going away so he'd not have to go through the process of translating obsolete docs into Russian.

http://groups.google.com/groups

http://www.parrotcode.ks.ua/docs -- Why can't I type in Cyrillic?

From the "Interesting, but is it useful?" department

Melvin Smith has been playing with an uncommitted version of invoke which allows you to invoke a function by name not address. He outlined the ideas behind it (and the workaround to make it play nice with the GC system), but wondered if it was actually of any use. Dan and Leo both agreed that it wasn't because of issues with threading and the JIT.

http://groups.google.com/groups

Freeze/thaw data format and PBC

Leo Tötsch is working on the data serialization/deserialization (aka Freeze/Thaw) system discussed over the last few weeks. He wondered if there were any plans for the frozen image data format. Leo's plan is to use PBC constant format (with possible extensions) so things integrate neatly into bytecode. Dan had a bunch of comments, but the PBC based format idea seemed to be well received, with the caveat that it should be a 'dense' format.

http://groups.google.com/groups

Opening files on other layers

Jürgen Bömmels asked for comments on a patch for opening files on different layers which had a few issues that he felt needed clarifying. He and Melvin Smith spent some time discussing things.

Apologies for not doing a better job in summarizing this thread, but I'm hamstrung by not quite knowing what 'layer' means in this context.

http://groups.google.com/groups

Parrot Has PHP

Okay, so the subject line's not quite true (yet), but who could resist the recursive acronyminess of it? Anyhow:

Thies C. Arntzen and Sterling Hughes, core PHP hackers, popped up to discuss the work they're doing on porting PHP to Parrot. Specifically, they've hit a performance snag where PHP's typeless nature meant using a PMC where they would rather be using a native type for speed. Thies proposed a new datatype to get 'round the issue.

The general response was "Hey! Fabulous! Someone's making a serious effort to port a real language to Parrot! But that new type suggestion is just reinventing the PMC. Oh, and if you could change your generated code slightly you'd get much faster execution".

It's definitely fabulous though.

http://groups.google.com/groups

http://www.edwardbear.org/pap.pdf -- Thies and Sterling's presentation

newsub and implicit registers

Melvin Smith was concerned about the version of newsub that implicitly sets P0 and P1, which can give IMCC's register tracking code something of a headache. He proposed getting rid of the implicit version and simply using IMCC to hide things. Leo agreed that the implicit form of newsub wasn't really necessary, but pointed out that there were plenty more ops out there that had implicit registers that IMCC needed to track. Leo has a patch in his tree that deals with the issue.

http://groups.google.com/groups

HASH changes

Jürgen Bömmels wasn't entirely happy with some recent changes to HASH in src/hash.c, which made the hash tests fail. Nor was he happy with the asymmetry of hash_put and hash_get, where you hash_put a void *value but hash_get back a HASHBUCKET. Leo apologised for breaking the tests, but defended the asymmetry because it allows for distinguishing between a value of NULL and a nonexistent key. Jürgen wasn't impressed; sometimes the ambiguity is exactly what you want.

Jürgen ended up submitting a patch which implements a new, extended hash querying API:


  HashBucket *hash_get_bucket(...);
  void       *hash_get(...);
  INTVAL      hash_exists(...);

http://groups.google.com/groups

NCI broken on Win32

NCI, the Native Call Interface, tries to help its users by adding the appropriate 'loadable library' extension to any library it's asked to load. This turns out to be the wrong thing to do when you're trying to use the NCI from Win32, where there's more than one possible extension for a loadable library (the same is true of Mac OS X). Jonathan Worthington let slip that he was working on a library to give Parrot access to the entire Win32 API, and that this would involve loading a file with a .drv extension, which isn't currently possible with Parrot on Win32.

The catch is that, in a lot of cases, you need to be able to leave the extension unspecified, because different Unix-like OSes use the same basic library name but a different extension. (For instance, a Linux box uses '.so', and OS X uses '.dylib'.) Leo proposed a workaround of only adding the default extension if the file name didn't already include a '.' in the filepart. Jonathan thought that sounded workable, but suggested instead trying to load the library with the name as given, then with the default extension, and throwing an exception only if both attempts fail. Leo thought that would slow down opening libraries quite substantially.

http://groups.google.com/groups

Regular expressions

It's been a busy old week this week. First we hear of an effort to port PHP to Parrot, and then Stéphane Payrard announces that he's started working on implementing Perl 6 Regular expressions. Which is nice.

http://groups.google.com/groups

Meanwhile in perl6-language

Nested modules

Luke Palmer noted that, as with Perl 5, there was no need for modules named Foo and Foo::Bar to be related. However, he wondered if it would be possible to do


   module Foo;
   module Bar {...}

and refer to the inner module as, say, 'Foo.Bar'. He also wondered about scary things like anonymous modules.

Larry came up with answers; essentially, you could get the semantics Luke was after, but not necessarily with the same syntax.

http://groups.google.com/groups

How to get environment variables

Andrew Shitov wanted to know how to get at environment variables from the Perl 6 mini language that comes with Parrot. Judging by the error report that he pasted on his second post, it looks like that's not supported in the current Perl 6 implementation.

http://groups.google.com/groups

Announcements, Apologies, Acknowledgements

As I said earlier, it's been a busy week. It's fabulous to see things like the PHP porting effort and Stéphane's work on Perl 6 regular expressions getting underway.

I'm afraid that anyone who thought I'd manage two Monday summaries in a row has been sadly disappointed. Sorry.

Despite that, if you found this summary valuable, please consider the following ways of showing your appreciation:

  • Take part. There are new, shiny projects to get involved with now; maybe you have exactly the knowledge and skills they need. http://dev.perl.org/perl6/ and http://www.parrotcode.org/ are good starting points. Hopefully, once the PHP porting effort gets further down the road, I'll be able to point you at a website for that too.
  • Take out your wallet and donate some good hard cash to the Perl Foundation http://donate.perl-foundation.org/ to help support Larry, Dan and Damian.
  • Take the time to drop me a line at pdcawley@bofh.org.uk letting me know what you think. I was enormously pleased last week to get some mail from someone at the Sanger Institute thanking me for the summaries. Personally, I reckon the Sanger Institute deserves far more thanks from all of us for keeping the genome free, but it's always nice to be appreciated.

Bringing Java into Perl

In this article, I will show how to bring Java code into a Perl program with Inline::Java. I won't probe the internals of Inline or Inline::Java, but I will tell you what you need to make a Java class available in a program or module. The program/module distinction is important only in one small piece of syntax, which I will point out.

The article starts with the Java code to be glued into Perl, then shows several approaches for doing so. First, the code is placed directly into a Perl program. Second, the code is placed into a module used by a program. Finally, the code is accessed via a small Perl proxy in the module.

The Java Code

Consider the following Java class:


    public class Hi {

        String greeting;

        public Hi(String greeting) {
            this.greeting = greeting;
        }

        public void setGreeting(String newGreeting) {
            greeting = newGreeting;
        }

        public String getGreeting() {
            return greeting;
        }
    }

This class is for demonstration only. Each of its objects is nothing but a wrapper for the string passed to the constructor. The only operations are accessors for that one string. Yet with this, we will learn most of what we need to know to use Java from Perl. Later, we will add a few features, to show how arrays are handled. That's not as interesting as it sounds, since Inline::Java almost always does all of the work without help.

A Program

Since we're talking about Perl, there is more than one way to incorporate our trivial Java class into a Perl program. (Vocabulary Note: Some people call Perl programs "scripts." I try not to.) Here, I'll show the most direct approach. Subsequent sections move to more and more indirect approaches, which are more often useful in practice.

Not surprisingly, the most direct approach is the simplest to understand. See if you can follow this:


    #!/usr/bin/perl
    use strict; use warnings;

    use Inline Java => <<'EOJ';
    public class Hi {
        // The class body is shown in the Java Code above
    }
    EOJ

    my $greeter = Hi->new("howdy");
    print $greeter->getGreeting(), "\n";

The Java class is the one above, so I have omitted all but the class declaration. The Perl code just wraps it, so it is tiny. To use Inline::Java, say use Inline Java => code where code tells Inline where to look for the code. In this case, the code follows inline (clever naming, huh?). Note that single-quote context is safest here. There are other ways to include the code; we'll see my favorite way later. The overly curious are welcome to consult the perldoc for all of the others.

Once Inline::Java has worked its magic -- and it is highly magical -- we can use the Java Hi class as if it was a Perl package. Inline::Java provides several ways to construct Java objects. I usually use the one shown here; namely, I pretend the Java constructor is called new, just like many Perl constructors are. In honor of Java, you might rather say my $greeter = new Hi("howdy");, but I usually avoid this indirect object form. You can even call the constructor by the class name as in my $greeter = Hi->Hi("howdy"); (or, you could even say the pathological my $greeter = Hi Hi("howdy");). Class methods are accessed just like the constructor, except that their names are the Java method names. Instance methods are called through an object reference, as if the reference were a Perl object.
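
For reference, here are those constructor spellings side by side:

    my $greeter1 = Hi->new("howdy");    # direct method call, as used above
    my $greeter2 = new Hi("howdy");     # indirect object form
    my $greeter3 = Hi->Hi("howdy");     # calling the constructor by the class name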

Note that Inline::Java performs type conversions for us, so we can pass and receive Java primitive types in the appropriate Perl variables. This carries over to arrays, etc. When you think about what must be going on under the hood, you'll realize what a truly magical module this is.

A Module

I often say that most Perl code begins life in a program. As time goes by, the good parts of that code, the ones that can be reused, are factored out into modules. Suppose our greeter is really popular, so many programs want to use it. We don't want to have to include the Java code in each one (and possibly require each program to compile its own copy of the class file). Hence, we want a module. My module looks a lot like my earlier program, except for two features. First, I changed the way Inline looks for the code, which has nothing to do with whether the code is in a program or a module. Second, reaching class methods from any package other than main requires careful -- though not particularly difficult -- qualification of the method name.


    package Hi;
    use strict; use warnings;

    use Inline Java => "DATA";

    sub new {
        my $class    = shift;
        my $greeting = shift;
        return Hi::Hi->new($greeting);
    }

    1;

    __DATA__
    __Java__
    public class Hi {
        // The class body is shown in The Java Code above
    }

The package starts like all good packages, by using strict and warnings. The use Inline statement is almost like the previous one, but the code lives in the __DATA__ segment instead of actually being inline. Note that when you put the code in the __DATA__ segment, you must include a marker for your language so that Inline can find it. There are usually several choices for each language's marker; I chose __Java__. This allows Inline to glue code from multiple languages into one source file.

The constructor is needed so that the caller does not need to know they are interfacing with Inline::Java. They call the constructor with Hi->new("greeting") as they would for a typical package called Hi. Yet, the module's constructor must do a bit of work to get the right object for the caller. It starts by retrieving the arguments, then returns the result of the unusual call Hi::Hi->new(...). The first Hi is for the Perl package and the second is for the Java class; both are required. Just as in the program from the last section, there are multiple ways to call the constructor. I chose the direct method with the name new. You could use the indirect object form and/or call the method by the class name. The returned object can be used as normal, so I just pass it back to the caller. All instance methods are passed directly through Inline::Java without help from Hi.pm. If there were class methods (declared with the static keyword in Java), I would either have to provide a wrapper, or the caller would have to qualify the names. Neither solution is particularly difficult, but I favor the wrapper, to keep the caller's effort to a minimum. This is my typical laziness at work. Since there will likely be several callers, and I will have to write them, I want to push any difficult parts into the module.
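
To make the class-method case concrete, here is a hedged sketch. Suppose the Java class gained a hypothetical static method, public static String defaultGreeting(); Hi.pm could then hide the qualified name behind a wrapper like this:

    # Wrap the (hypothetical) static method so callers never see Hi::Hi.
    sub default_greeting {
        return Hi::Hi->defaultGreeting();
    }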

If you need to adapt the behavior of the Java object for your Perl audience, you may insert routines in Hi.pm to do that. For instance, perhaps you want a more typical Perl accessor, instead of the get/set pair used in the Java code. In this case, you must make your own genuine Perl object and proxy through it to the Java class. That might look something like this:


    package Hi2;
    use strict; use warnings;

    use Inline Java => "DATA";

    sub new {
        my $class    = shift;
        my $greeting = shift;
        bless { OBJECT => Hi2::Hi->new($greeting) }, $class;
    }

    sub greeting {
        my $self      = shift;
        my $new_value = shift;
        if (defined $new_value) {
            $self->{OBJECT}->setGreeting($new_value);
        }
        return $self->{OBJECT}->getGreeting();
    }

    1;

    __DATA__
    __Java__
    public class Hi {
        // Body omitted again
    }

Here, the object returned from Inline::Java, which I'll call the Java object for short, is stored in the OBJECT key of a hash-based Hi2 object that is returned to the caller. The distinction between the Perl package and the Java class is clear in this constructor call. The Perl package comes first, then the Java class, then the class method to call.

The greeting method shifts in $new_value, which the caller supplies only if she wants to change the value. If $new_value is defined, greeting passes the set message on to the Java object. In either case, it returns the current value to the caller, as Perl accessors usually do.
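
A caller of Hi2 then sees an ordinary Perl accessor; for example:

    use Hi2;

    my $hi = Hi2->new("howdy");
    print $hi->greeting(), "\n";      # prints "howdy"
    $hi->greeting("bonjour");         # changes the greeting ...
    print $hi->greeting(), "\n";      # ... and now prints "bonjour"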

A Pure Proxy

In the last section, we saw how to make a Perl module access Java code. We also saw how to make the Perl module adapt between the caller's expectation of Perl objects and the underlying Java objects. Here, we will see how to access Java classes that can't be included in the Perl code.

There are a lot of Java libraries. These are usually distributed in compiled form in so-called .jar (java archive) files. This is good design on the part of the Java community, just as using modules is good design on the part of the Perl community. Just as we wanted to make the Hi Java class available to lots of programs -- and thus placed it in a module -- so the Java people put reusable code in .jars. (Yes, Java people share the bad pun heritage of the Unix people, which brought us names like yacc, bison, more, and less.)

Suppose that our humble greeter is so popular that it has been greatly expanded and .jarred for worldwide use. Unless we provide an adapter like the one shown earlier, the caller must use the .jarred code from Perl in a Java-like way. So I will now show three pieces of code: 1) an expanded greeter, 2) a Perl driver that uses it, and 3) a mildly adapting Perl module the driver can use.

Here's the expanded greeter; the two Perl pieces follow later:


    import java.util.Random;
    public class Higher {
        private static Random myRand = new Random();
        private String[] greetings;

        public Higher(String[] greetings) {
            this.greetings = greetings;
        }

        public void setGreetings(String[] newGreetings) {
            greetings = newGreetings;
        }

        public String[] getGreetings() {
            return greetings;
        }

        public void setGreeting(int index, String newGreeting) {
            greetings[index] = newGreeting;
        }

        public String getGreeting() {
            float randRet = myRand.nextFloat();
            int   index   = (int) (randRet * greetings.length);
            return greetings[index];
        }
    }

Now there are multiple greetings, so the constructor takes an array of Strings. There are get/set pairs for the whole list of greetings and for single greetings. The single get accessor returns one greeting at random. The single set accessor takes the index of the greeting to replace and its new value.

Note that Java arrays are fixed-size; don't let Inline::Java fool you into thinking otherwise. It is very good at making you think Java works just like Perl, even though this is not the case. Calling setGreeting with an out-of-bounds index will be fatal unless trapped. Yes, you can trap Java exceptions with eval and the $@ variable.
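
For example, a minimal sketch of trapping such an exception (assuming $greeter holds only a handful of greetings, so index 42 is out of bounds):

    eval {
        $greeter->setGreeting(42, "Oops");    # index 42 doesn't exist
    };
    if ($@) {
        warn "Caught a Java exception: $@";
    }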

This driver uses the newly expanded greeter through Hi3.pm:


    #!/usr/bin/perl
    use strict; use warnings;

    use Hi3;

    my $greeter = Hi3->new(["Hello", "Bonjour", "Hey Y'all", "G'Day"]);
    print $greeter->getGreeting(), "\n";
          $greeter->setGreeting(0, "Howdy");
    print $greeter->getGreeting(), "\n";

The Hi3 module (directly below) provides access to the Java code. I called the constructor with an anonymous array. An array reference also works, but a simple list does not. The constructor returns a Java object (at least, it looks that way to us); the other calls just provide additional examples. Note, in particular, that setGreeting expects an int and a String. Inline::Java examines the arguments and coerces them into the best types it can. This nearly always works as expected. When it doesn't, you need to look in the documentation for "CASTING."
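
To spell out the argument-passing point (the variable names are mine):

    my @greetings = ("Hello", "Bonjour", "Hey Y'all", "G'Day");

    my $g1 = Hi3->new([ @greetings ]);    # anonymous array: works
    my $g2 = Hi3->new(\@greetings);       # array reference: also works
    # my $g3 = Hi3->new(@greetings);      # flat list: not coerced to a String[]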

Finally, this is Hi3.pm (behold the power of Perl and the work of the Inline developers):


    package Hi3;
    use strict; use warnings;

    BEGIN {
        $ENV{CLASSPATH} .= ":/home/phil/jar_home/higher.jar";
    }
    use Inline Java  => 'STUDY',
               STUDY => ['Higher'];

    sub new {
        my $class = shift;
        return Hi3::Higher->new(@_);
    }

    1;

To use a class hidden in a .jar I need to do three things:

  1. Make sure an absolute path to the .jar file is in the CLASSPATH, before using Inline. A well-placed BEGIN block makes this happen.
  2. Use STUDY instead of providing Java source code.
  3. Add the STUDY directive to the use Inline statement. This tells Inline::Java to look for named classes. In this case, the list has only one element: Higher. Names in this list must be fully qualified if the corresponding class has a Java package.

The constructor just calls the Higher constructor through Inline::Java, as we have seen before.

Yes, this is the whole module, all 15 lines of it.

If you need an adapter between your caller and the Java library, you can put it in either Perl or Java code. I prefer to code such adapters in Perl when possible, following the plan we saw in the previous section. Yet occasionally, that is too painful, and I resort to Java. For example, the glue module Java::Build::JVM uses both a Java and a Perl adapter to ease communication with the genuine javac compiler. Look at the Java::Build distribution from CPAN for details.

Anatomy of Automated Compiling: A Brief Discussion

So what is Inline::Java doing for us? When it finds our Java code, it makes a copy in a .java file with the proper name (javac is adamant that class names and file names match). Then it uses our Java compiler to build a compiled version of the code. It puts that version in a directory, using an MD5 sum to ensure that recompiling happens when, and only when, the code changes.

You can cruise through the directories looking at what it did. If something goes wrong, it will even give you hints about where to look. Here's a tour of some of those directories. First, there is a base directory. If you don't do anything special, it will be called _Inline, under the working directory from which you launched the program. If you have a .Inline directory in your home directory, all Inline modules will use it. If you use the DIRECTORY directive in your use Inline statement, its value will be used instead. For ease of discussion, I'll call the directory _Inline.
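
For instance, a minimal sketch of the DIRECTORY directive (the path here is made up):

    use Inline Java      => 'DATA',
               DIRECTORY => '/home/phil/inline_build';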

Under _Inline is a config file that describes the various Inline languages available to you. More importantly, there are two subdirectories: build and lib. If your code compiles, the build directory will be cleaned. (That's the default behavior; you can include directives in your use Inline statement to control this.) If not, the build directory has a subdirectory for your program, with part of the MD5 sum in its name. That directory will hold the code in its .java file and the error output from javac in cmd.out.

Code that successfully compiles ends up in lib/auto. The actual .class files end up in a subdirectory, which is again named by class and MD5 sum. Typically, there will be three files there. The .class file is as normal. The other files describe the class. The .inl file has an Inline description of the class. It contains the full MD5 sum, so code does not need to be recompiled unless it changes. It also says when the code was compiled, along with a lot of other information about the Inline::Java currently installed. The .jdat file is specific to Inline::Java. It lists the signatures of the methods available in the class. Inline::Java finds these using Java's reflection system (reflection is Java's facility for inspecting classes and their methods at run time).

See Also

For more information on Inline and Inline::Java and the other inline modules, see their perldoc. If you want to join in, sign up for the inline@perl.org mailing list, which is archived at nntp.x.perl.org/group/perl.inline.

Acknowledgements

Thanks to Brian Ingerson for Inline and Patrick LeBoutillier for Inline::Java. These excellent modules have saved me much time and heartache. In fact, I doubt I would have had the courage to use Java in Perl without them. Double thanks to Patrick LeBoutillier, since he took the time to read this article and correct some errors (including my failure to put "Bonjour" in the greetings list).

This week on Perl 6, week ending 2003-11-02

It's Monday morning, the croissants have been baked, the focaccia is glistening with all the extra virgin olive oil I poured on it as it left the oven and, in the airing cupboard, a raisin borodinsky slouches towards full proof (though at the rate it's currently rising it'll probably be Tuesday before I can bake it off). What better time could there be to pause and write a summary?

So, I'll kick off with perl6-internals because, well, it's on the summary checklist (which goes something like: 1. Wibble about the weather or something before; 2. Start with perl6-internals; 3. Continue with perl6-language if it saw any traffic; 4. Make announcements, suggest people give money to the Perl Foundation; 5. Make sure Leon Brocard gets a mention; 6. Aspell; 7. Mail PODs to http://perl.com and a text version to perl6-announce; 8. Profit!)

NULL Px Proposal

Right at the end of the previous week, Melvin Smith suggested having the initial 'empty' PMC registers all point at a global PMCNull which would throw an exception if you tried to invoke any of its methods. Which sounds weird, but it does mean that you get a real exception instead of a segfault, and exceptions are so much more trappable. Dan liked the idea.

Melvin later posted a patch implementing the idea, which Leo Tötsch fixed up slightly and applied.

http://groups.google.com/groups

http://groups.google.com/groups

Parrot Calling Convention Confusion

Steve Fink is having problems using an unprototyped call to a prototyped function, which he thinks is a reasonable thing to do (and I think I agree with him; I can imagine cases where you have a function pointer or something whose exact prototype you don't know, but which you do want to call, so you'd be forced to make an unprototyped call). Melvin Smith disagrees with him. Steve then went on to point out that he's still getting failures when the function is both declared and called in an unprototyped fashion. According to Leo, this is because unprototyped returns are neither defined nor implemented. Which is odd really -- I thought they were exactly the same as an unprototyped call, but with the return continuation (P1) invoked instead of P0 and the other registers set up exactly as if you were making an unprototyped function call.

http://groups.google.com/groups

A clash of symbols

Arthur Bergman, Ponie stablemaster, popped up to point out that definitions like

#define version obj.version

(found in include/parrot/pobj.h if you're interested) did some scary things to Perl_utilize in the Perl 5 core. Steve Fink stuck his hand up to being the person who added the version field (which is apparently rather handy if you're debugging the Garbage Collection (GC) system). Leo fixed things by applying a s/version/pobj_version/g patch.

http://groups.google.com/groups

Storing external data in PMCs

Arthur popped up again asking for help with implementing a Perl5LVALUE PMC. It turns out that the API doesn't quite support what he needs. After a certain amount of discussion of various options, Arthur proposed a Parrot_PMC_attach_data(Parrot, PMC, void *) extension to the API. He didn't *quite* get what he wanted, but he got something very like it later in the week.

I did like Arthur's reasons for starting the serious Ponie effort by working on Perl5LVALUEs though: ``[because] it is so obscure that it's hardly used anywhere and is limited to a few small areas in the core''.

http://groups.google.com/groups

Screaming Pumpkins Ahoy!

On Monday, Leo declared that yes, Melvin Smith's proposed Halloween 'Screaming Pumpkin' Parrot release would be happening. Various people promised extra goodies, and a few problems were sorted out with some platforms.

Features were frozen on Wednesday, and Parrot 0.0.13 ``Screaming Pumpkin'' was released upon a cowering world at 2003-10-31 14:11:46 precisely. For all the astonishing speed of the release cycle, there's a lot of good stuff to be had in the screaming pumpkin; check out Leo's announcement for details.

http://groups.google.com/groups

http://groups.google.com/groups -- Leo's announcement

Header Dependencies

Jürgen Bömmels wasn't entirely happy with the way Parrot's headers are set up. Apparently there are some nasty dependency (and crypto dependency) issues. He proposed fixing up the headers as much as possible to eliminate these issues. Dan and Leo thought it would be nice, but Dan thought there wouldn't be that much point in doing it as things would probably decay, and pretty much everything internal should just be using parrot.h, and everything external should be using extend.h or embed.h.

http://groups.google.com/groups

Tinderboxen

Jürgen Bömmels triggered a quick bout of tinderbox fixing, as various different hardware experts helped to figure out why several of the tinderboxes weren't building and testing successfully. Jonathan Worthington submitted a few patches to clean up the various Win32 warnings. David Robins submitted similar patches to clean up some Solaris warnings.

http://groups.google.com/groups

Wrapping C/C++ libraries

Anuradha Ratnaweera asked a bunch of questions about how Parrot and Perl 6 would interface to external C and C++ libraries. Leo and Dan provided the answers.

http://groups.google.com/groups

Broken Windows

(Sorry about the heading, I couldn't resist). Jonathan Worthington reported some breakage of NCI on Win32. He and Leo worked through the issues in search of a fix.

http://groups.google.com/groups

Garbage-collecting classes

Luke Palmer responded to the comment in last week's summary about the way to instantiate an object in class Foo being:

new P5, .Foo

He worried that, because classes are now simply integers, there was no way to garbage collect class objects if anonymous classes were used heavily. Jeff Clites, Leo and Dan all rushed to reassure him.

http://groups.google.com/groups

Parrot IO Fun

On Thursday, Melvin Smith announced that Parrot had fetched its first web page. Your parrot can fetch webpages too, just update to the latest, check out examples/io/http.imc and the world is yours. Everyone made impressed noises.

http://groups.google.com/groups

Melvin Smith, tease of the week

Melvin announced that, in his personal copy of Parrot he has most of the Class metadata declaration support working, but that he wouldn't release it 'til after the Screaming Pumpkin release. If it's as good as he promised, I guess we can wait.

http://groups.google.com/groups

Character classification functions

Noting the presence of the is_digit function, Peter Gibbs wondered if it would be useful to have a set of is_foo functions, or better to have a single is_ctype function with an enum parameter. He preferred the is_ctype option, and set about implementing it. Ever the speed demon, Dan thought it best to have a reasonable set of is_foo functions for 'common' chartypes and a fallback is_ctype function for the rest. Wrapping them all up in sensible macros would mean that a very small speed hit couldn't get multiplied up in the middle of a tight loop, while the programmer wouldn't have to worry about which types were checked in which fashion.

http://groups.google.com/groups

Moving to a new PMC compiler

Leo Tötsch posted a patch to switch Parrot over to using pmc2c2, the new version of the PMC compiler, but didn't commit it because his Makefile was playing up. He asked for some help to fix that before the patch could be committed.

http://groups.google.com/groups

Summary suggestion

Ron Grunwald suggested that it might be a good idea to include a glossary with each summary explaining what IMCC, PMC and PIR stand for (um... Incredibly Magnificent Compiler Compiler(?), Parrot Magic Cookie and Parrot Intermediate Representation(?) respectively) and maybe giving some explanation of other bits of Parrot jargon. Which is a jolly good idea. But laziness dictates that, instead I point you all at docs/glossary.pod which, following a gentle nudge on the mailing list, covers the above acronyms and more (but sadly some of 'em are missing from the Screaming Pumpkin release).

http://groups.google.com/groups

Meanwhile, in perl6-language

There was some real traffic. With questions. And Answers! What is the world coming to? Admittedly, there were only 10 posts in the week, but it's the quality that counts.

Alternately named arguments

Remember last week, when Luke ``Edgecase finder general'' Palmer asked about named return values and made David Storrs boggle? This week he was answered. By Damian. And Larry. Any question that elicits a ``Please, no!'' from Damian has got to be a good question methinks.

The question revolves around the statement in Apocalypse 6 that a list on the left hand side of a binding operator (:=) is interpreted in the same way as a function signature. Which means that you can use 'named' binding:

(who => $name, why => $reason) := (why => $because, who => "me");
   # ($name == "me") && ($reason == $because)

What elicited the ``Please, no!'' was Luke's logical conclusion that this could also mean that you could do:

(+$name, +$id) := getinfo();

and the unary + would be treated as a 'named only' marker in the same way it would be in a function signature.

Interestingly, Larry was less convinced that this was inherently a bad thing, though he did propose that, under most forms of stricture, it should only work in the case of:

my(+$name, +$id) := getinfo();

because the my helps the reader to realise that + may not be in Kansas any more without having to reach the := and then reinterpret everything that went before. (If you've heard Damian, Allison or possibly even Larry talking about Perl 6, you've almost certainly come up against the concept of ``end weight''; this is an example).

http://groups.google.com/groups

State of the Conway

It's been a while since Damian said anything in perl6-language, so David Wheeler welcomed both him and Larry back. Damian thanked him, and posted a short status report. While he's not been in quite the same state as Larry, ``mild influenza and a little light pneumonia'' don't sound like a barrel of laughs either. Anyway, he's back on the case when he can spare the time from putting food on the table.

http://groups.google.com/groups

Questions about currying

Joe Gottman had some questions about currying. Luke Palmer and Jonathan Scott Duff answered 'em. I'm getting quite good at this summarizing malarkey aren't I?

http://groups.google.com/groups

Acknowledgements, Announcements, Apologies

Whee! Monday is this week's Monday! What's the betting I can keep it up?

Those of you who've been reading this summary since almost the beginning will remember that I copped the basic 'chunks of text followed by a link or two' format/structure(hah!) of these summaries from NTK. Well, this Friday's NTK had some coverage of the state of Larry and Damian's health in their HARD NEWS section (http://www.ntk.net/2003/10/31/).

If you found this summary valuable, please consider one or more of:

  • Join in! The project needs people. http://dev.perl.org/perl6/ and http://www.parrotcode.org/ are good starting points for information.
  • Chip in! Money helps keep the wheels turning, programmers like Larry, Dan and Damian have families to feed and medical bills to pay, finding that money takes time away from getting Perl 6 out of the door. By donating money to The Perl Foundation http://donate.perl-foundation.org/ you can help.
  • Chime in! It's always good to get direct feedback from readers. Drop me a line to let me know what you think at mailto:p6summarizer@bofh.org.uk, I promise I'll reply (assuming it's not spam, in which case I shall simply wish recurrent hemorrhoids on you).