June 2003 Archives

Perl 6 Design Philosophy

Editor's note: Perl 6 Essentials is the first book to offer a peek into the next major version of the Perl language. It covers the development of Perl 6 syntax as well as Parrot, the language-independent interpreter developed as part of the Perl 6 design strategy. In this excerpt from Chapter 3 of the book, the authors take an in-depth look at some of the most important principles of natural language and their impact on the design decisions made in Perl 6.

Introduction

At the heart of every language is a core set of ideals that give the language its direction and purpose. If you really want to understand the choices that language designers make--why they choose one feature over another or one way of expressing a feature over another--the best place to start is with the reasoning behind the choices.

Perl 6 has a unique set of influences. It has deep roots in Unix and the children of Unix, which gives it a strong emphasis on utility and practicality. It's grounded in the academic pursuits of computer science and software engineering, which gives it a desire to solve problems the right way, not just the most expedient way. It's heavily steeped in the traditions of linguistics and anthropology, which gives it the goal of comfortable adaptation to human use. These influences and others like them define the shape of Perl and what it will become.

Linguistic and Cognitive Considerations

Perl is a human language. Now, there are significant differences between Perl and languages like English, French, German, etc. For one, it is artificially constructed, not naturally occurring. Its primary use, providing a set of instructions for a machine to follow, covers a limited range of human existence. Even so, Perl is a language humans use for communicating. Many of the same mental processes that go into speaking or writing are duplicated in writing code. The process of learning to use Perl is much like learning to speak a second language. The mental processes involved in reading are also relevant. Even though the primary audience of Perl code is a machine, as often as not humans have to read the code while they're writing it, reviewing it, or maintaining it.

Many Perl design decisions have been heavily influenced by the principles of natural language. The following are some of the most important principles, the ones we come back to over and over again while working on the design and the ones that have had the greatest impact.

The Waterbed Theory of Complexity

The natural tendency in human languages is to keep overall complexity about equivalent, both from one language to the next, and over time as a language changes. Like a waterbed, if you push down the complexity in one part of the language, it increases complexity elsewhere. A language with a rich system of sounds (phonology) might compensate with a simpler syntax. A language with a limited sound system might have a complex way of building words from smaller pieces (morphology). No language is complex in every way, as that would be unusable. Likewise, no language is completely simple, as too few distinctions would render it useless.

The same is true of computer languages. They require a constant balance between complexity and simplicity. Restricting the possible operators to a small set leads to a proliferation of user-defined methods and subroutines. This is not a bad thing, in itself, but it encourages code that is verbose and difficult to read. On the other hand, a language with too many operators encourages code that is heavy in line noise and difficult to read. Somewhere in the middle lies the perfect balance.

The Principle of Simplicity

In general, a simple solution is preferable to a complex one. A simple syntax is easier to teach, remember, use, and read. But this principle is in constant tension with the waterbed theory. Simplification in the wrong area is one danger to avoid. Another is false simplicity or oversimplification. Some problems are complex and require a complex solution. Perl 6 grammars aren't simple. But they are complex at the language level in a way that allows simpler solutions at the user level.

The Principle of Adaptability

Natural languages grow and change over time. They respond to changes in the environment and to internal pressure. New vocabulary springs up to handle new communication needs. Old idioms die off as people forget them, and newer, more relevant idioms take their place. Complex parts of the system tend to break down and simplify over time. Change is what keeps language active and relevant to the people who use it. Only dead languages stop changing.

The plan for Perl 6 explicitly includes plans for future language changes. No one believes that Perl 6.0.0 will be perfect, but at the same time, no one wants another change process quite as dramatic as Perl 6. So Perl 6 will be flexible and adaptable enough to allow gradual shifts over time. This has influenced a number of design decisions, including making it easy to modify how the language is parsed, lowering the distinctions between core operations and user-defined operations, and making it easy to define new operators.

The Principle of Prominence

In natural languages, certain structures and stylistic devices draw attention to an important element. This could be emphasis, as in "The dog stole my wallet" (the dog, not the man), or extra verbiage, as in "It was the dog who stole my wallet," or a shift to an unusual word order, "My wallet was stolen by the dog" (my wallet, not my shoe, etc.), or any number of other verbal tricks.

Perl is designed with its own set of stylistic devices to mark prominence, some within the language itself, and some that give users flexibility to mark prominence within their code. The NAMED blocks use all capitals to draw attention to the fact that they're outside the normal flow of control. Perl 5 has an alternate syntax for control structures like if and for, which moves them to the end to serve as statement modifiers (because Perl is a left-to-right language, the left side is always a position of prominence). Perl 6 keeps this flexibility, and adds a few new control structures to the list.
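
As a rough illustration of the statement-modifier form in Perl 5 (the variable names below are invented for this sketch), the same action can be written with the condition leading or with the action leading:

```perl
use strict;
use warnings;

my @log;
my $verbose = 1;

# Conventional form: the condition occupies the prominent left side.
if ($verbose) {
    push @log, "starting";
}

# Statement-modifier form: the action comes first, the condition trails.
push @log, "still running" if $verbose;

# The same device works with a loop modifier.
push @log, "item $_" for 1 .. 3;

print "$_\n" for @log;
```

Both forms do the same work; the choice only changes which part the reader sees first.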

The balance for design is to decide which features deserve to be marked as prominent, and where the syntax needs a little flexibility so the language can be more expressive.

The Principle of End Weight

Natural languages place large complex elements at the end of sentences. So, even though "I gave Mary the book" and "I gave the book to Mary" are equally comfortable, "I gave the book about the history of development of peanut-based products in Indonesia to Mary" is definitely less comfortable than the other way around. This is largely a mental parsing problem. It's easier to interpret the major blocks of the sentence all at once than to start with a few, work through a large chunk of minor information, and then go back to fill in the major sentence structure. Human memory is limited.

End weight is one of the reasons regular expression modifiers were moved to the front in Perl 6. It's easier to read a grammar rule when you know things like "this rule is case insensitive" right at the start. (It's also easier for the machine to parse, which is almost as important.)

End weight is also why there has been some desire to reorder the arguments in grep to:

grep @array { potentially long and complex block };

But that change causes enough cultural tension that it may not happen.
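
To make the trade-off concrete, here is a small Perl 5 sketch. The first call uses the built-in order, with the block ahead of the list. The grep_last helper is hypothetical, written only for this comparison, and shows how a list-first order would read:

```perl
use strict;
use warnings;

my @words = qw(perl parrot python pugs ruby);

# Perl 5 order: the (possibly long) block comes before the short list.
my @p_words = grep { /^p/ && length($_) > 4 } @words;

# Hypothetical helper, not a core function: list first, code last.
sub grep_last {
    my ($list, $test) = @_;
    return grep { $test->($_) } @$list;
}

my @same = grep_last(\@words, sub {
    my ($word) = @_;
    return $word =~ /^p/ && length($word) > 4;
});

print "@p_words\n";   # parrot python
print "@same\n";      # parrot python
```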

The Principle of Context

Natural languages use context when interpreting meaning. The meanings of "hot" in "a hot day," "a hot stereo," "a hot idea," and "a hot debate" are all quite different. The implied meaning of "it's wet" changes depending on whether it's a response to "Should I take a coat?" or "Why is the dog running around the kitchen?" The surrounding context allows us to distinguish these meanings. Context appears in other areas as well. A painting of an abstract orange sphere will be interpreted differently depending on whether the other objects in the painting are bananas, clowns, or basketball players. The human mind constantly tries to make sense of the universe, and it uses every available clue.

Perl has always been a context-sensitive language. It makes use of context in a number of different ways. The most obvious use is scalar and list contexts, where a variable may return a different value depending on where and how it's used. These have been extended in Perl 6 to include string context, boolean context, numeric context, and others. Another use of context is the $_ defaults, like print, chomp, matches, and now when.
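
A short Perl 5 sketch of both kinds of context follows. The sub name context_name and the sample data are made up for the example; wantarray and chomp are the standard built-ins:

```perl
use strict;
use warnings;

my @langs = qw(Perl Parrot IMCC);

my @copy  = @langs;   # list context: the array yields its elements
my $count = @langs;   # scalar context: the same array yields its length

# wantarray tells a sub which context it was called in.
sub context_name {
    return wantarray ? "list" : defined(wantarray) ? "scalar" : "void";
}

my @in_list   = context_name();
my $in_scalar = context_name();

# $_ as the default topic: chomp and the pattern match both act on it.
my @lines = ("Perl 5\n", "Parrot\n", "Perl 6\n");
my @perls;
for (@lines) {
    chomp;
    push @perls, $_ if /^Perl/;
}

print "$count elements\n";
print "$in_list[0] / $in_scalar\n";
print join(", ", @perls), "\n";
```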

Context-dependent features are harder to write an interpreter for, but they're easier on the people who use the language daily. They fit in with the way humans naturally think, which is one of Perl's top goals.

The Principle of DWIM

In natural languages there is a notion called "native speaker's intuition." Someone who speaks a language fluently will be able to tell whether a sentence is correct, even if they can't consciously explain the rules. (This has little to do with the difficulty English teachers have getting their students to use "proper" grammar. The rules of formal written English are very different from the rules of spoken English.)

As much as possible, features should do what the user expects. This concept of DWIM, or "Do What I Mean," is largely a matter of intuition. The user's experiences, language exposure, and cultural background all influence their expectations. This means that intuition varies from person to person. An English speaker won't expect the same things as a Dutch speaker, and an Ada programmer won't expect the same things as a COBOL programmer.

The trick in design is to use the programmer's intuitions instead of fighting against them. A clearly defined set of rules will never match the power of a feature that "just seems right."

Perl 6 targets Perl programmers. What seems right to one Perl programmer may not seem right to another, so no feature will please everyone. But it is possible to catch the majority cases.

Perl generally targets English speakers. It uses words like "given," which gives English speakers a head start in understanding its behavior in code. Of course, not all Perl programmers are English speakers. In some cases idiomatic English is toned down for broader appeal. In grammar rules, ordinal modifiers have the form 1st, 2nd, 3rd, 4th, etc., because those are most natural for native English speakers. But they also have an alternate form 1th, 2th, etc., with the general rule Nth, because the English endings for ordinal numbers are chaotic and unfriendly to non-native speakers.

The Principle of Reuse

Human languages tend to have a limited set of structures and reuse them repeatedly in different contexts. Programming languages also employ a set of ordinary syntactic conventions. A language that used { } braces to delimit loops but paired keywords to delimit if statements (like if ... then ... end if) would be incredibly annoying. Too many rules make it hard to find the pattern.

In design, if you have a certain syntax to express one feature, it's often better to use the same syntax for a related feature than to invent something entirely new. It gives the language an overall sense of consistency, and makes the new features easier to remember. This is part of why grammars are structured as classes. Grammars could use any syntax, but classes already express many of the features grammars need, like inheritance and the concept of creating an instance.

The Principle of Distinction

The human mind has an easier time identifying big differences than small ones. The words "cat" and "dog" are easier to tell apart than "snore" and "shore." Usually context provides the necessary clues, but if "cats" were "togs," we would be endlessly correcting people who heard us wrong ("No, I said the Johnsons got a new dog, not tog, dog.").

The design consideration is to build in visual clues to subtle contrasts. The language should avoid making too many different things similar. Excessive overloading reduces readability and increases the chance for confusion. This is part of the motivation for splitting the two meanings of eval into try and eval, the two meanings of for into for and loop, and the two uses of sub into sub and method.
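
The two meanings of eval are easy to see side by side in Perl 5. This is only a sketch with invented values: the block form traps an exception, and the string form compiles and runs source text at run time:

```perl
use strict;
use warnings;

# Meaning 1: eval BLOCK traps exceptions (the job Perl 6 hands to try).
my $ok = eval {
    die "out of cheese\n";
    1;
};
my $error = $@;   # saved at once, because the next eval resets $@

# Meaning 2: eval STRING compiles and runs code held in a string.
my $source = "6 * 7";
my $answer = eval $source;

print "trapped: $error";
print "computed: $answer\n";
```

One keyword, two quite different jobs, and nothing but the punctuation after it tells them apart.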

Distinction and reuse are in constant tension. If too many features are reused and overloaded, the language will begin to blur together. Far too much time will be spent trying to figure out exactly which use is intended. But, if too many features are entirely distinct, the language will lose all sense of consistency and coherence. Again, it's a balance.

Language Cannot Be Separated from Culture

A natural language without a community of speakers is a dead language. It may be studied for academic reasons, but unless someone takes the effort to preserve the language, it will eventually be lost entirely. A language adds to the community's sense of identity, while the community keeps the language relevant and passes it on to future generations. The community's culture shapes the language and gives it a purpose for existence.

Computer languages are equally dependent on the community behind them. You can measure it by corporate backing, lines of code in operation, or user interest, but it all boils down to this: a programming language is dead if it's not used. The final sign of language death is when there are no compilers or interpreters for the language that will run on existing hardware and operating systems.

For design work this means it's not enough to only consider how a feature fits with other features in the language. The community's traditions and expectations also weigh in, and some changes have a cultural price.

The Principle of Freedom

In natural languages there is always more than one way to express an idea. The author or speaker has the freedom, and the responsibility, to pick the best phrasing--to put just the right spin on the idea so it makes sense to their audience.

Perl has always operated on the principle that programmers should have the freedom to choose how to express their code. It provides easy access to powerful features and leaves it to the individuals to use them wisely. It offers customs and conventions rather than enforcing laws.

This principle influences design in several ways. If a feature is beneficial to the language as a whole, it won't be rejected just because someone could use it foolishly. On the other hand, we aren't above making some features difficult to use, if they should be used rarely.

Another part of the design challenge is to build tools that will have many uses. No one wants a cookbook that reads like a Stephen King novel, and no one wants a one-liner with the elaborate structure of a class definition. The language has to be flexible to accommodate freedom.

The Principle of Borrowing

Borrowing is common in natural languages. When a new technology (food, clothing, etc.) is introduced from another culture, it's quite natural to adopt the original name for it. Most of the time borrowed words are adapted to the new language. In English, no one pronounces "tortilla," "lasagna," or "champagne" exactly as in the original languages. They've been altered to fit the English sound system.

Perl has always borrowed features, and Perl 6 will too. There's no shame in acknowledging that another language did an excellent job implementing a particular feature. It's far better to openly borrow a good feature than to pretend it's original. Perl doesn't have to be different just for the sake of being different. Most features won't be adopted without any changes, though. Every language has its own conventions and syntax, and many aren't compatible. So, Perl borrows features, but uses equivalent structures to express them.

Architectural Considerations

The second set of principles governs the overall architecture of Perl 6. These principles are connected to the past, present, and future of Perl, and define the fundamental purpose of Perl 6. No principle stands alone; each is balanced against the others.

Perl Should Stay Perl

Everyone agrees that Perl 6 should still be Perl, but the question is, what exactly does that mean? It doesn't mean Perl 6 will have exactly the same syntax. It doesn't mean Perl 6 will have exactly the same features. If it did, Perl 6 would just be Perl 5. So, the core of the question is what makes Perl "Perl"?

True to the original purpose

Perl will stay true to its designer's original intended purpose. Larry wanted a language that would get the job done without getting in his way. The language had to be powerful enough to accomplish complex tasks, but still lightweight and flexible. As Larry is fond of saying, "Perl makes the easy things easy and the hard things possible." The fundamental design philosophy of Perl hasn't changed. In Perl 6, the easy things are a little easier and the hard things are more possible.

Familiarity

Perl 6 will be familiar to Perl 5 users. The fundamental syntax is still the same. It's just a little cleaner and a little more consistent. The basic feature set is still the same. It adds some powerful features that will probably change the way we code in Perl, but they aren't required.

Learning Perl 6 will be like American English speakers learning Australian English, not English speakers learning Japanese. Sure, there are some vocabulary changes, and the tone is a little different, but it is still--without any doubt--English.

Translatable

Perl 6 will be mechanically translatable from Perl 5. In the long term, this isn't nearly as important as what it will be like to write code in Perl 6. But during the transition phase, automatic translation will be important. It will allow developers to start moving ahead before they understand every subtle nuance of every change. Perl has always been about learning what you need now and learning more as you go.

Important New Features

Perl 6 will add a number of features such as exceptions, delegation, multi-method dispatch, continuations, coroutines, and currying, to name a few. These features have proven useful in other languages and provide a great deal of power for solving certain problems. They improve the stability and flexibility of the language.

Many of these features are traditionally difficult to understand. Perl takes the same approach as always: provide powerful tools, make them easy to use, and leave it up to the user to decide whether and how to use them. Most users probably won't even know they're using currying when they use the assuming method.
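
For readers who haven't met the idea, here is a minimal Perl 5 sketch using closures. The assuming sub below is a hypothetical stand-in that only borrows the name of the Perl 6 method; it fixes leading positional arguments, and the real method may well work differently:

```perl
use strict;
use warnings;

# Hypothetical helper: fix the leading arguments of a code reference
# and return a new code reference that takes the remaining ones.
sub assuming {
    my ($code, @fixed) = @_;
    return sub { $code->(@fixed, @_) };
}

my $power  = sub { my ($base, $exp) = @_; return $base ** $exp };
my $two_to = assuming($power, 2);

print $two_to->(10), "\n";   # 1024
```

The caller of $two_to never needs to know that a two-argument sub sits underneath.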

Features like these are an important part of preparing Perl for the future. Who knows what development paradigms might develop in a language that has this combination of advanced features in a form easily approachable by the average programmer? It may not be a revolution, but it's certainly evolution.

Long-Term Usability

Perl 6 isn't a revision intended to last a couple of years and then be tossed out. It's intended to last 20 years or more. This long-range vision affects the shape of the language and the process of building it. We're not interested in the latest fad or in whipping up a few exciting tricks. We want strong, dependable tools with plenty of room to grow. And we're not afraid to take a little extra time now to get it right. This doesn't mean Perl 6.0 will be perfect, any more than any other release has been perfect. It's just another step of progress.


O'Reilly & Associates recently released (June 2003) Perl 6 Essentials.

This week on Perl 6, week ending 2003-06-29

The Perl 6 Summary for the week ending 20030629

Welcome to the third of my US tour Perl 6 summaries. Once again I'm pleased to report that the denizens of the Perl 6 mailing lists continue to make the life of a touring summarizer an easy one by not posting all that much to the lists. So, I can sit here in my room at the Shaker Inn in Enfield and marvel at the traffic noise outside, wonder about the car next door with the New Hampshire plates reading PERLFAN, and just generally appreciate the loveliness of the room.

But, while I'm doing that, I should start with perl6-internals.

Exceptions

At the end of last week, Dan outlined his thoughts on how exception handling will work in Parrot. This week, people talked about it. Discussion revolved around how much information should be attached to an exception and how/whether we should support resumable exceptions.

http://groups.google.com/groups

More on Continuation Passing

Last week I said that "I get the strong feeling that Leo Tötsch isn't entirely happy with the new Continuation Passing Style". This week Leo corrected me; I hadn't noticed that the speed issues had been addressed by the latest changes to parrot (in fact the current CPS implementation is faster than the old invoke/ret scheme).

Sean O'Rourke addressed Leo's problem with the Perl 6 Compiler tests failing by saying that the compiler should really be ported to use CPS rather than implementing a new variant of the Sub PMC that uses the old scheme. Leo reckoned that such a port wasn't currently doable because IMCC needed to be modified to use the CPS scheme, which would also involve reworking the register allocator. Given Leo's prodigious rate of implementation, this may have already happened.

http://groups.google.com/groups

IMCC/Parrot leak

Clinton A. Pierce had reported a memory leak in Parrot, but tracked it down to a situation where he was doing:

.arg 0
call _foo

and forgetting to take the 0 off the stack. However, even after he'd fixed that, he had segfault issues, and posted a (largish) code fragment that tweaked the bug.

It appears that Parrot wasn't throwing warnings when stacks got too big, just failing silently. Leo added a check for too deeply nested stacks, which at least avoids segfaulting on logic bugs.

Leo and Dan discussed other places where such limit checking should be put in place. Dan also muttered something about turning stack chunks into PMCs, allowing for the garbage collection of stack frames. Leo also muttered about the proliferation of stack implementations in Parrot (there are five) and thinks it should be possible to have one general stack engine.

http://groups.google.com/groups

Making + a unary operator

Bernhard Schmalhofer found a problem with the Perl 6 implementation.

print +42, "\n";

printed 42, but omitted the newline. He fixed this by making + into a unary operator as well as a binary operator and sent the patch to the list, where it was applied. Good catch, Bernhard.

http://groups.google.com/groups

ParrotIO File-Descriptors

Jürgen Bömmels is in the process of porting the IO subsystem from its current mem_sys_alloc/free based implementation to the sunny, garbage-collected uplands of a PMC based implementation. However, he's run into a problem; some of the operations in op.ops use integer File Descriptors, grabbing information from a table in the interpreter structure. This gets in the way of garbage collection, since any integer could be a file descriptor.

Jürgen proposed removing the integer file descriptors and mandating that ParrotIO PMCs be the only way to access IO (including the standard STDIN, STDOUT, and STDERR). He proposed adding get_std[in|out|err] ops to get at the standard streams.

Dan suggested that Jürgen Just Do It; the current IO system being more than slightly hackish, essentially put in place until something better came along.

http://groups.google.com/groups

Small Perl task for the interested

Want to get involved in the Parrot development process? Don't know much about Virtual Machine design and implementation? Do know Perl? Dan has a small but interesting task for you.

At present, Parrot gets built without any compiler level optimizations turned on because files like tsq.c can't have any optimizations turned on (tsq.c is the thread safe queue module, which is "annoyingly execution-order-dependent because it has to operate safely as interrupt code potentially interrupting itself").

Dan would like a version of Configure.pl which can build a Makefile (or whatever build tool we end up using) with per-C-file compiler flags, and it needs to be possible to override those flags, per file, by the platform configuration module.

Interested? David Robins seems to be, and he asked whether the build system had to be Makefile based. Dan says not, but the really important thing is that the resulting build script, or the config system that generates the script, be adequately understandable/maintainable.

http://groups.google.com/groups

Scoping, .local and IMCC

Bugfinder General Clinton A. Pierce is getting a headache trying to understand .local. When he executes the following code

.local int f
.sub _main
    .local int x
    .sub _foo1
        f=1
        x=2
        call _foo2
        end
    .end
    .sub _foo2
        print "f is 1: "
        print f
        print "\n"
        ret
    .end
.end

the output looks like:

f is 1: 2

which isn't quite what one would expect.

Leo explained what's going on; essentially it boils down to issues with register allocation not being aware of .local scopes. He recommended that Clint use either true globals or lexicals instead of .local. Clint isn't so sure that this is a good idea, pointing out that there are occasions when having lexically scoped names at the IMCC level as well as at the level of lexical pads would be very useful.

"In my mind, when I saw: 1. .local, 2. automagical register spillage in IMCC, and 3. nested compilation units I thought I'd found Assembler Manna."
— Clint Pierce

http://groups.google.com/groups — Clint is puzzled

http://groups.google.com/groups — Leo explains it all

Tentative valclone patch

Luke Palmer has been thinking about value and reference objects. He wondered if there was any value in a valclone operator alongside set and clone which would allow the target PMC to decide whether to use set or clone semantics. He also offered a patch implementing the operator if people thought it would be useful. Leo Tötsch wasn't sure the new operator was necessary.

Klaas-Jan Stol noted that he'd encountered problems with reference/value confusion when he'd been working on his Lua compiler, but he wondered if the problem couldn't be solved by having a general, language-independent "argument" PMC class. (I'm not sure I understood what he meant by this, so I'm hoping for an explanation with code fragments.)

http://groups.google.com/groups

Events, exceptions, and threads. Oh my!

There is a story that UK prime minister Harold Macmillan was asked by a student what it was that concerned him most as Prime Minister. Mac replied, "Events dear boy, events."

Leo Tötsch laid out his thoughts and ensuing questions about Exceptions, events, and threads, and how they played together. There has been a small amount of discussion in response to this, but I think everyone's currently thinking hard about the issue....

http://groups.google.com/groups

CPS and the call stack

Luke Palmer wondered if there would be a standard way of inspecting the call stack (for debugging/caller/etc). (I think I'm going to switch to using the phrase 'call chain' rather than call stack, as the presence of continuations makes the call 'stack' look pretty unstacklike....).

Leo and Dan both thought that this would be a high level language issue rather than a Parrot issue, though Dan did note that there might be useful things that Parrot could do to make such introspection easier/possible.

http://groups.google.com/groups

Continuation manipulation

Leo Tötsch has been thinking about occasions when one might need to monkey with the internals of an existing continuation (he was thinking about the warnings state) and proposed several solutions. Dan favoured his new opcode, updatecc and thought it would be good to be able to broaden the scope of what one could update in a continuation/context. This scared Leo somewhat, but Dan came up with some examples of where it might prove to be useful.

http://groups.google.com/groups


Meanwhile in perl6-language

Almost nothing happened. There were all of 15 messages.

Perl 6 Daydreams

Miko O'Sullivan engaged in some summer daydreaming by asking what everyone was looking forward to most from Perl 6. Miko himself is looking forward to more Pure Perl modules. If Perl 6 delivers on its performance promises then there are going to be more and more things where implementing directly in Perl will be fast enough, and Perl is so much easier to implement in than C....

Jonathan Scott Duff incurred Cozeny when he said that he's hoping that by this time next year we'll have an 85% complete Perl 6 that will be usable in production (by brave people). Simon Cozens noted that we already have such a beast and it's called Perl 5. For some reason this led to a new marketing slogan being proposed: Perl 6, the reconstituted cheeseburger of programming languages. Somehow I don't think that one's going to fly. (I just read this bit out loud to my wife and she says that she really doesn't like the thought of a flying reconstituted cheeseburger, so I think we'd best leave it at that.)

http://groups.google.com/groups


Acknowledgements, Announcements, and Apologies

Tcha! I announce the retirement of Leon Brocard from his post as Perl 6 Summary Running Joke and put the right to choose the next joke up for auction at YAPC. And what do you know, the winner of the auction nominates Leon Brocard as the new running joke. So, settle in for another year of desperate rationalizations for mentioning Leon in these summaries. Who knows, maybe Leon's Parrot related workrate will go up to such an extent that it'll be easy, but somehow I doubt it.

Thanks to, in chronological order, Adam Turoff and Lisa Wolfisch; Walt Mankowski, Mark-Jason and Laurie Dominus; Dave Adler; Dan and Karen Sugalski; and Uri and Linda Guttman for being such fine hosts in Washington, Philadelphia, New York, Hartford, and Boston respectively. Next time we do this, we will not be attempting to visit quite so many cities on the Eastern Seaboard in such a short time. At one point all we were seeing was Perl nerds and freeways in rapid succession.


This week on Perl 6, week ending 2003-06-22

Welcome to my first anniversary issue of the Perl 6 Summary. Hopefully there won't be too many more anniversaries to celebrate before we have a real, running Perl 6, but there's bound to be ongoing development after that. My job is secure!

Because I can't think of anything better to do, I'll start with the action on the perl6-internals list.

Converting parrot to continuation passing style

The effort to convert Parrot to use/support continuation passing style (CPS) at the assembler level continues. Jonathan Sillito offered another patch implementing the required support, which Dan liked and applied.

http://groups.google.com/groups

Klaas-Jan Stol wondered what he'd missed; last time he looked Parrot wasn't doing continuation passing. He asked why Dan had chosen to go down that route. Dan answered that he had realized that "we had to save off so much state that we essentially had a continuation anyway". Explicitly going with continuation passing just made things more formal, and wrapped up all the context saving behind a straightforward interface. He promised a more detailed explanation later.

http://groups.google.com/groups

Portable way of finding libc, unbuffered reads

Clinton Pierce noted that the following code:

    loadlib P1, "/lib/libc.so.6"
    dlfunc P0, P1, "system", "it"
    set I0, 1
    set S5, "ls"
    invoke
    end

just works, which simultaneously pleases and scares him silly. He wondered if there was a good way of finding the standard C library on a Unix system without scary hardwiring as in the fragment above. He also wondered if there was an "official" way of getting an unbuffered read via parrot.

Jens Rieks came up with a gloriously evil way of finding libc. The theory goes that Parrot is linked against libc, so you just have to dlopen the running image and you can call libc functions to your heart's content. To dlopen the running image you need to pass a NULL pointer to the underlying loadlib so he offered a patch to core.ops which interpreted an empty string as a pointer to NULL. Leo and Dan were impressed and the patch (or something similar) was applied. I get the feeling that Dan wants to do something a little less hacky to access the current executable though....

Clint noted that the trick of dlopening the running image by passing a null pointer doesn't work on Windows, but outlined a workaround for that too. Jens Rieks suggested a better Windows workaround.

Nobody came up with an approved way of doing getc, but once you have libc loaded you can just use its getc.

http://groups.google.com/groups

OO, Objects

If you look in a fresh from CVS parrot directory you'll now find object.ops, which will be the cause of much rejoicing in many places. Dan's nailed the object spec down enough that he's started implementing a few of the required ops. As he points out, what we have is "hardly sufficient", but everyone's got to start somewhere, the journey of a thousand miles begins with but a single step, etc.

Judging by the number of comments (none), everyone was stunned into silence.

http://groups.google.com/groups

More CPS shenanigans

I get the strong feeling that Leo Tötsch isn't entirely happy with the new Continuation Passing Style regime. He's worried that the P6C tests break, and that CPS subs are some 3 times slower for calling the sub. This led into a discussion of what context really must go into a continuation, whether we can get away with different classes of continuation (holding more or less contextual information), and other ways of possibly speeding things up.

I'm not sure Leo has been entirely convinced, but I'm confident that Dan's not going to change his mind about this.

Leo later submitted a large patch which unifies the various subroutine related PMCs to take into account CPS.

http://groups.google.com/groups

Exceptions

Now that the rules for subs/methods etc are settling down, Dan outlined his thoughts on exception handlers. If I'm understanding him correctly, an exception handler is just a continuation that you invoke with the exception as its only argument. There were no comments by the end of the week.

http://groups.google.com/groups

Meanwhile in perl6-language

The language list was quiet again. Maybe everyone was doing face to face things at YAPC. Or on holiday. Or something.

printf like formatting in interpolated strings

Remember last week I mentioned that Luke Palmer had made a cool suggestion about printf like formatting in string interpolation? (He suggested a syntax like rx/<expression> but formatted(<formatspec>)/, which I for one quite liked).

Edwin Steiner wasn't so keen, noting that Luke's suggestion was actually more verbose than rx/sprintf <formatspec>, <expression>/. He wasn't entirely sure that having a formatting rule attached to a value with a 'but' was really the right thing to do (it does rather violate the whole model/view/controller abstraction for instance). Edwin's favoured interpolation syntax was,

  rule formatted_interpolation {\\F <formatspec> <interpolatable_atom>}
  rule formatspec { # sprintf format without '%' 
  }
  rule interpolatable_atom { <variable> | \$\( <expr> \) }

(or something along those lines). Edwin went on to extend his idea, allowing for all sorts of clever interpolation rules, leading Dave Storrs to comment that the Obfuscated Perl people would certainly thank him if the suggestions went in.

Arcadi Shehter came up with yet another suggested syntax involving : (neglecting the important rule that, whilst one's heart may belong to Daddy, the : belongs to Larry. And I'm really trying not to think about the images that conjures up).

At this point, we ended up in a philosophical discussion about when was the right time to do stuff, generality of solutions and Perl remaining Perl. I remain confident that come the appropriate time, Larry and/or Damian (more likely Damian given some of the stuff he was showing off to do with formatting at YAPC) will nail things down and we'll all go "Of course!" and move onto the next thing.

http://groups.google.com/groups

Dispatching, Multimethods and the like

Adam Turoff noted that, in his YAPC opening talk, Damian had mentioned the catchall DISPATCH sub, which will allow for altering the dispatch behaviour to do any magic you choose. The 'problem' with DISPATCH is defining its interaction with the likes of AUTOLOAD and other built in dynamic dispatch behaviours, which will need to be nailed down.

Dan Sugalski jetted over from perl6-internals to give the lowdown on what would be available at the parrot level (which may or may not be exposed at the Perl 6 language level). Essentially, what we know is that there will be the capability to insert any dispatch method you like, but the details of how you'd do it aren't thrashed out yet. It almost certainly won't be easy, but that's a good thing.

http://groups.google.com/groups

Type Conversion Matrix, Pragmas (Take 4)

Discussion of Mike Lazzaro's type conversion matrix continued as people explored corner cases.

http://groups.google.com/groups

Acknowledgements, Announcements and Apologies

Whee! My first anniversary! I confess that when I started writing these things I didn't expect to keep going for this long. Now I don't expect to ever stop.

After due and careful consideration of a short shortlist, I should like to award an anniversary virtual white parrot award to Leopold "Patchmonster" Tötsch for his astonishing contribution to the Parrot core. Other mental nominees for this award were: Clinton A Pierce, for BASIC and the associated bug finding; Leon Brocard, for humorous reasons and Robert Spier and Ask Bjørn Johansen for invaluable and invisible work on websites, CVS and mailing list maintenance.

I eliminated the core design team from consideration for the above award, but I'd like to formally thank Larry, Damian, Allison and Dan, without whom...

As I said last week, Leon Brocard is no longer the summaries' running joke. However, I auctioned off the right to specify the next running joke at YAPC last week; next week should see the unveiling of the new, improved Perl 6 Summary Running Joke.

If you've appreciated this summary, please consider one or more of the following options:

  • Send money to the Perl Foundation at http://donate.perl-foundation.org/ and help support the ongoing development of Perl.
  • Get involved in the Perl 6 process. The mailing lists are open to all. http://dev.perl.org/perl6/ and http://www.parrotcode.org/ are good starting points with links to the appropriate mailing lists.
  • Send feedback, flames, money, photographic and writing commissions, or a cute little iPod with a huge capacity to satisfy my technolust.

Hidden Treasures of the Perl Core, part II

In the previous hidden treasures article, we looked at some easy-to-use (but not well-known) modules in the Perl Core. In this article, we dig deeper to uncover some of the truly precious and unique gems in the Perl Core.

constant

The constant pragma is neither new nor unknown, but it has gained a nice enhancement. Many people have used constant. Here is a standard example that uses it to define a rough approximation of π.

        use constant PI => 22/7;

When constants are used in programs or modules, they are often used in a set. Older versions of Perl shipped with a constant pragma that required a high level of work to produce a set.

        use constant SUNDAY  => 0;
        use constant MONDAY  => 1;
        use constnat TUESDAY => 2;

Wow, that's a lot of work! I've already given up on my program, not to mention the syntax error in the declaration of TUESDAY. Now let's try this again using the multiple declaration syntax, new to the constant pragma for Perl 5.8.0.

        use constant {
                SUNDAY    => 0,
                MONDAY    => 1,
                TUESDAY   => 2,
                WEDNESDAY => 3,
                THURSDAY  => 4,
                FRIDAY    => 5,
                SATURDAY  => 6,
        };

The only warning here is that this syntax is new to Perl 5.8.0. If you intend to distribute a program that uses multiple constant declarations, remember that it will not compile on older versions of Perl. You may want to state which version of Perl your program requires.

        use 5.8.0;

Perl will throw a fatal error if the version is anything less than 5.8.0.
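Constants declared this way are really subroutines, which affects how you use them. The following is only a small sketch of that point; the WEEKEND list constant, the @names array, and the %label hash are my own inventions for illustration and do not come from the examples above.

```perl
use strict;
use warnings;

use constant {
    SUNDAY   => 0,
    SATURDAY => 6,
};

# A list constant, built from the two constants declared above.
use constant WEEKEND => ( SUNDAY, SATURDAY );

my @names = qw(Sun Mon Tue Wed Thu Fri Sat);

# A constant is a subroutine, so it needs parentheses wherever a bare
# word would otherwise be quoted, such as a hash key or the left of =>.
my %label = ( SUNDAY() => 'day of rest' );

# A list constant can be used directly as the index list of a slice.
my @weekend_names = @names[ WEEKEND ];

print "Weekend: @weekend_names\n";
print "Sunday is a ", $label{ SUNDAY() }, "\n";
```

Without the parentheses, $label{SUNDAY} would look up the literal string "SUNDAY" rather than the value 0.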

Attribute::Handlers

This module allows us to play with Perl's subroutine attribute syntax by defining our attributes. This is a powerful module with a rich feature set. Here I'll give you an example of writing a minimal debugger using subroutine attributes.

First, we need to create an attribute. An attribute is any subroutine that has an attribute of :ATTR. Setting up our debug attribute is easy.

        use Attribute::Handlers;
        sub debug :ATTR {
                my (@args) = @_;
                warn "DEBUG: @args\n";
        }

Now we have a simple debug attribute named :debug. Using our attribute is also easy.

        sub table :debug {
                # ...
        }
        table(%data);
        table(%other_data);

Now, since attributes are compiled just before runtime, in the CHECK phase, our debugging output will only be sent to STDERR once. For the code above, we might get output like this:

        DEBUG: main GLOB(0x523d8) CODE(0x2e758) debug  CHECK
           Casey  Dad
        Chastity  Mom
         Evelina  Kid
        Coffee  Oily
          Cola  Fizzy

That debug string represents some of the information we get in an attribute subroutine. The first argument is the name of the package the attribute was declared in. Next is a reference to the symbol table entry for the subroutine, followed by a reference to the subroutine itself. Next comes the name of the attribute, followed by any data associated with the attribute (none in this case). Finally, the name of the phase that invoked the handler is passed.

At this point, our debugging attribute isn't useful, but the parameters we are given to work with are promising. We can use them to invoke debugging output each time the subroutine is called. Put on your hard hat, this is where things get interesting.

First, let us take a look at how we want to debug our subroutine. I think we'd like different levels of debugging output. At the lowest level (1), the name of the subroutine being invoked should be sent to STDERR. At the next level (2), it would be nice to be notified of entry and exit of the subroutine. Going further (level 3), we might want to see the arguments passed to the subroutine. Even more detail can be done, but we'll save that for later and stop at three debug levels.

In order to do this voodoo, we need to replace our subroutine with one doing the debugging for us. The subroutine doing the debugging must then invoke our original code with the parameters passed to it, and return the proper output from it. Here is the implementation for debug level one (1).

        use Attribute::Handlers;
        use constant {
                PKG    => 0,
                SYMBOL => 1,
                CODE   => 2,
                ATTR   => 3,
                DATA   => 4,
                PHASE  => 5,
        };
        sub debug :ATTR {
                my ($symbol, $code, $level) = @_[SYMBOL, CODE, DATA];
                $level ||= 1;
                
                my $name = join '::', *{$symbol}{PACKAGE}, *{$symbol}{NAME};
                
                no warnings 'redefine';
                *{$symbol} = sub {
                        warn "DEBUG: entering $name\n";
                        return $code->(@_);
                };
        }
        sub table :debug {
                # ...
        }
        table(%data);
        table(%other_data);

There are some sticky bits in the debug subroutine that I need to explain in more detail.

        my $name = join '::', *{$symbol}{PACKAGE}, *{$symbol}{NAME};

This line is used to find the name and package of the subroutine we're debugging. We do the lookups from the symbol table, using the reference to the symbol that our attribute is given.

        no warnings 'redefine';

Here we turn off warnings about redefining a subroutine, because we're going to redefine a subroutine on purpose.

        *{$symbol} = sub { ... };

This construct simply replaces the code section in the symbol table with this anonymous subroutine (which is a code reference).

In this example, we set the default log level to one (1), set up some helper variables, and replace our table() subroutine with a debugging closure. I call the anonymous subroutine a closure because we are reusing some variables that are defined in the debug() subroutine. Closures are explained in greater detail in perlref (perldoc perlref from the command line).

To set the debug level for a subroutine, just pass a number to the :debug attribute.

        sub table :debug(1) {
                # ...
        }

The output looks something like this:

        DEBUG: entering main::table
           Casey  Dad
        Chastity  Mom
         Evelina  Kid
        DEBUG: entering main::table
        Coffee  Oily
          Cola  Fizzy

Creating debug level two (2) is pretty easy from here. Time stamps will also be added to the output, which are useful for calculating how long your subroutine takes to run.

        *{$symbol} = sub {
                warn sprintf "DEBUG[%s]: entering %s\n",
                        scalar(localtime), $name;
                my @output = $code->(@_);
                if ( $level >= 2 ) {
                        warn sprintf "DEBUG[%s]: leaving %s\n",
                                scalar(localtime), $name;
                }
                return @output;
        };

In this example, we use sprintf to make our debugging statements a little more readable as complexity grows. This time, we cannot return directly from the original code reference. Instead, we have to capture the output and return it at the end of the routine. When the table() subroutine defines its debug level as :debug(2), the output looks like this.

        DEBUG[Wed Jun 18 12:18:44 2003]: entering main::table
           Casey  Dad
        Chastity  Mom
         Evelina  Kid
        DEBUG[Wed Jun 18 12:18:44 2003]: leaving main::table
        DEBUG[Wed Jun 18 12:18:44 2003]: entering main::table
        Coffee  Oily
          Cola  Fizzy
        DEBUG[Wed Jun 18 12:18:44 2003]: leaving main::table

Finally, debug level three (3) should also print the arguments passed to the subroutine. This is a simple modification to the first debugging statement.

        warn sprintf "DEBUG[%s]: entering %s(%s)\n",
                scalar(localtime), $name, ($level >= 3 ? "@_" : '' );

The resulting output.

        DEBUG[Wed Jun 18 12:21:06 2003]: entering main::table(Chastity Mom Casey Dad Evelina Kid)
           Casey  Dad
        Chastity  Mom
         Evelina  Kid
        DEBUG[Wed Jun 18 12:21:06 2003]: leaving main::table
        DEBUG[Wed Jun 18 12:21:06 2003]: entering main::table(Coffee Oily Cola Fizzy)
        Coffee  Oily
          Cola  Fizzy
        DEBUG[Wed Jun 18 12:21:06 2003]: leaving main::table

Attribute::Handlers can do quite a lot more than what I've shown you already. If you like what you see, then you may want to add attributes to variables or worse. Please read the thorough documentation provided with the module.

B::Deparse

This module is a well-known Perl debugging module. It generates Perl source code from Perl source code provided to it. This may seem useless to some, but to the aspiring obfuscator, it's useful in understanding odd code.

        perl -snle'$w=($b="bottles of beer")." on the wall";$i>=0?print:last
        LINE for(map "$i $_",$w,$b),"take one down, pass it around",
        do{$i--;"$i $w!"}' -- -i=100

That is an example of an obfuscated program. It could be worse, but it's pretty bad already. Understanding this gem is as simple as adding -MO=Deparse to the command line. This will use B::Deparse to turn that mess into more readable Perl source code.

        LINE: while (defined($_ = <ARGV>)) {
                chomp $_;
                $w = ($b = 'bottles of beer') . ' on the wall';
                foreach $_ (
                         map("$i $_", $w, $b),
                         'take one down, pass it around',
                         do { --$i; "$i $w!" }
                       ) {
                        $i >= 0 ? print($_) : last LINE;
                }
        }

To use B::Deparse on everyday code, just run your program through it on the command line.

        perl -MO=Deparse prog.pl

But if you want to have some real fun, then dig into the object-oriented interface for B::Deparse. There you will find an amazing method called coderef2text(). This method turns any code reference to text, just like the command line trick does for an entire program. Here is a short example.

        use B::Deparse;
        
        my $deparser = B::Deparse->new;
        
        print $deparser->coderef2text(
                sub { print "Hello, world!" }
        );

The output will be the code block, after it's been deparsed.

        {
        print 'Hello, world!';
        }

We can use this to add another debug level to our Attribute::Handlers example. Here, debug level four (4) will print out the source of our subroutine.

Before our debug() subroutine declaration we add the following lines of code.

        use B::Deparse;
        my $deparser = B::Deparse->new;

Next, our debugging closure declaration is updated to print out the full subroutine with the DEBUG: prefix on each line.

        *{$symbol} = sub {
                warn sprintf "DEBUG[%s]: entering %s(%s)\n",
                        scalar(localtime), $name, ($level >= 3 ? "@_" : '' );
                if ( $level >= 4 ) {
                        my $sub = sprintf "sub %s %s",
                                $name, $deparser->coderef2text( $code );
                        $sub =~ s/\n/\nDEBUG: /g;
                        warn "DEBUG: $sub\n";
                }
                my @output = $code->(@_);
                if ( $level >= 2 ) {
                        warn sprintf "DEBUG[%s]: leaving %s\n",
                                scalar(localtime), $name;
                }
                return @output;
        };

The verbose debugging output looks like this.

        DEBUG[Wed Jun 18 12:47:22 2003]: entering main::table(Chastity Mom Casey Dad Evelina Kid)
        DEBUG: sub main::table {
        DEBUG:    BEGIN {${^WARNING_BITS} = "UUUUUUUUUUUU"}
        DEBUG:    use strict 'refs';
        DEBUG:    my(%data) = @_;
        DEBUG:    my $length = 0;
        DEBUG:    foreach $_ (keys %data) {
        DEBUG:        $length = length $_ if length $_ > $length;
        DEBUG:    }
        DEBUG:    my $output = '';
        DEBUG:    while (my($k, $v) = each %data) {
        DEBUG:        $output .= sprintf("%${length}s  %s\n", $k, $v);
        DEBUG:    }
        DEBUG:    print "\n$output";
        DEBUG:}
           Casey  Dad
        Chastity  Mom
         Evelina  Kid
        DEBUG[Wed Jun 18 12:47:22 2003]: leaving main::table

There are more methods in the B::Deparse class that you can use to muck around with the results of coderef2text(). This module is powerful and useful for debugging. I suggest you at least use the simple version if code becomes ambiguous and incomprehensible.

While B::Deparse is good at what it does, it's not complete. Each version of Perl has made it better, and it's good in Perl 5.8.0. Don't trust B::Deparse to get everything right, though. For instance, I wouldn't trust it to serialize code for later use.

Class::Struct

This module, just like the constant pragma, is well-known. The difference is that Class::Struct is not often used. For many programs, setting up a class to represent data would be ideal, but overkill. Class::Struct gives us the opportunity to live in our ideal world without the pain of setting up any classes by hand. Here is an example of creating a class with Class::Struct. In this example, we're going to use compile-time class declarations, a new feature in Perl 5.8.0.

        use Class::Struct Person => {
                name => '$',
                mom  => 'Person',
                dad  => 'Person',
        };

Here we've created a class called Person with three attributes. name can contain a simple scalar value, represented by the dollar sign ($). mom and dad are both objects of type Person. Using our class within the same program is the same as using any other class.

        my $self = Person->new( name => 'Casey West' );
        my $wife = Person->new( name => 'Chastity West' );
        my $baby = Person->new(
                name => 'Evelina West',
                mom  => $wife,
                dad  => $self,
        );
        printf <<__FORMAT__, $baby->name, $baby->mom->name;
        %s, daughter of %s,
        went on to cure cancer and disprove Fermat's Theorem.
        __FORMAT__

Class::Struct classes are simple by design, and can get more complex with further creativity. For instance, to add a method to the Person class you can simply declare it in the Person package. Here is a method named birth() which should be called on a Person object. It takes the name of the baby as an argument, and optionally the father (a Person object). Returned is a new Person object representing the baby.

        sub Person::birth {
                my ($self, $name, $dad) = @_;
                return Person->new(
                        name => $name,
                        mom  => $self,
                        dad  => ( $dad || undef ),
                );
        }
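To see birth() in action, here is a small self-contained sketch. It repeats the Person declaration and the method so that it runs on its own; the names given to the new objects are just sample data of mine.

```perl
use strict;
use warnings;

use Class::Struct Person => {
    name => '$',
    mom  => 'Person',
    dad  => 'Person',
};

# The birth() method from above, repeated so this sketch is runnable.
sub Person::birth {
    my ($self, $name, $dad) = @_;
    return Person->new(
        name => $name,
        mom  => $self,
        dad  => ( $dad || undef ),
    );
}

my $dad  = Person->new( name => 'Casey West' );
my $mom  = Person->new( name => 'Chastity West' );
my $baby = $mom->birth( 'Evelina West', $dad );

printf "%s: mom is %s, dad is %s\n",
    $baby->name, $baby->mom->name, $baby->dad->name;

# The father is optional; without one, the dad accessor returns undef.
my $other = $mom->birth( 'A. N. Other' );
print "no dad recorded\n" unless defined $other->dad;
```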

These objects are not meant to be persistent. If you want persistent objects, then you need to look elsewhere, perhaps Class::DBI or any other implementation, of which there are many.

These in-memory objects can help to clean up your code, but they add a bit of overhead. You have to decide where the balance in your program is. In most cases, using Class::Struct is going to be OK.

Encode

Encode is Perl's interface to Unicode. An explanation of Unicode itself is far beyond the scope of this article. In fact, it's far beyond the scope of most of us. This module is powerful. I'm going to provide some examples and lots of pointers to the appropriate documentation.

The first function of the API to learn is encode(). encode() will convert a string from Perl's internal format to a series of octets in the encoding you choose. Here is an example.

        use Encode;
        my $octets = encode( "utf8", "Hello, world!" );

Here we have turned the string Hello, world! into a utf8 string, which is now in $octets. We can also decode strings using the decode() function.

        my $string = decode( "utf8", $utf8_string );

Now we've decoded a utf8 string into Perl's internal string representation. Since utf8 is a common encoding to deal with, there are two helper functions: encode_utf8() and decode_utf8(). Both of these functions take a string as the argument.
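The difference between characters and octets is easier to see with a string that contains something outside ASCII. This is a minimal round-trip sketch; the sample string and the variable names are mine, chosen only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode encode_utf8 decode_utf8);

# Four characters; the last one is e-acute (code point 0xE9).
my $string = "caf\x{e9}";

# In utf8 the e-acute takes two octets, so we get five octets in total.
my $octets = encode( "utf8", $string );

printf "%d characters, %d octets\n", length $string, length $octets;

# Decoding the octets gives back the original character string.
my $round_trip = decode( "utf8", $octets );
print "round trip ok\n" if $round_trip eq $string;

# The helper functions do the same job without naming the encoding.
print "helpers agree\n"
    if encode_utf8($string) eq $octets
    && decode_utf8($octets) eq $string;
```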

A list of supported encodings can be found in Encode::Supported, or by using the encodings() method.

        my @encodings = Encode->encodings;

For even more Unicode fun, dive into the documentation in Encode (perldoc Encode on the command line).

Filter::Simple

This module gives us an easy way to write source-code filters. These filters may change the behavior of calling Perl code, or implement new features of Perl, or do anything else they want. Some of the more infamous source-filter modules on the CPAN include Acme::Bleach, Semi::Semicolons, and even Switch.

In this article, I'm going to implement a new comment syntax for Perl. Using the following source-filter package will allow you to comment your code using SQL comments. SQL comments begin with two consecutive dashes (--). For our purposes, these dashes cannot be directly followed by a semicolon (;) or be preceded by anything other than whitespace or the beginning of a line.

        package SQLComments;
        use Filter::Simple sub {
                s/(?:^|\s)--(?!;)/#/g;
        };
        1;

In this example, we create an anonymous subroutine that is passed on to Filter::Simple. The entire source of the calling program is in $_, and we use a regular expression to search for our SQL comments and change them to Perl comments.

Using our new source filter works like this.

        use SQLComments;
        -- Here is some code that decrements a variable.
        my $i = 100; -- start at 100.
        while ( $i ) {
                $i--; -- decrement
        }
        -- That's it!.

Using B::Deparse on the command line, we can see what the code looks like after it's filtered. Just remember that B::Deparse doesn't preserve comments.

        use SQLComments;
        
        my $i = 100;
        while ($i) {
            --$i;
        }

The output is exactly as we expect. Filtering source code is a complex art. If your filters are not perfect, then you can break code in unexpected ways. Our SQLComments filter will break the following code.

        print "This is nice -- I mean really nice!\n";

It will turn into this.

        print "This is nice# I mean really nice!\n";

Not exactly the results we want. This particular problem can be avoided, however, by using Filter::Simple in a slightly different way. You can specify filters for different sections of the source code. Here is how we can limit our SQLComments filter to just code and not quote-like constructs.

        package SQLComments;
        
        use Filter::Simple;
        
        FILTER_ONLY code => sub { s/(?:^|\s)--(?!;)/#/g };
        
        1;

If you want to learn more about source filters, then read the documentation provided in Filter::Simple.

Variable Utility Modules

There are some functions that are repeated in hundreds (probably thousands) of programs. Think of all the sorting functions written in C programs. Perl programs have them, too, and the following utility modules try to clean up our code, eliminating duplication in simple routines.

There are a number of useful functions in each of these modules. I'm going to highlight a few, but be sure to read the documentation provided with each of them for a full list.

Scalar::Util

blessed() will return the package name that the variable is blessed into, or undef if the variable isn't blessed.

        use Scalar::Util qw[blessed];

        my $baby  = Person->new;
        my $class = blessed $baby;

$class will hold the string Person.

weaken() is a function that takes a reference and makes it weak. This means that the variable will not hold a reference count on the thing it references. This is useful for objects, where you want to keep a reference to an object but you don't want to stop it from being DESTROY-ed at the right time.
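Weak references matter most when two data structures point at each other. Here is a minimal sketch of that case; the $parent and $child hashes are hypothetical and exist only to build a cycle.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

# Two structures that refer to each other: a reference cycle.
my $parent = { name => 'parent' };
my $child  = { name => 'child', parent => $parent };
$parent->{child} = $child;

# Make the child's link back to its parent weak, so it no longer
# counts towards the parent's reference count.
weaken( $child->{parent} );
print "the back link is weak\n" if isweak( $child->{parent} );

# Drop the only strong reference to the parent. The parent is freed,
# and the weak reference is set to undef automatically.
undef $parent;
print defined $child->{parent}
    ? "parent is still alive\n"
    : "parent was freed\n";
```

Without the call to weaken(), the two hashes would keep each other alive even after every variable pointing at them had gone away.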

List::Util

The first() function returns the first element in the list for which the block returns true.

        use List::Util qw[first shuffle sum];

        my $person = first { $_->age < 18 } @people;

shuffle() will return the elements of the list in random order. Here is an example of breaking a group of people into teams.

        @people = shuffle @people;
        
        my @team1  = splice @people,  0, (@people/2);
        my @team2  = @people;

Finally, sum returns the sum of all the elements in a list.

        my $sum = sum 1 .. 10;

Hash::Util

Hash::Util has a slightly different function than the previously discussed variable utility modules. This module implements restricted hashes, which are the replacement for the undesirable (and now obsolete) pseudo-hashes.

lock_keys() is a function that will restrict the allowed keys of a hash. If a list of keys is given, the hash will be restricted to that set, otherwise the hash is locked down to the currently existing keys.

        use Hash::Util qw[lock_keys unlock_keys lock_value lock_hash];
        
        my %person = (
                name => "Casey West",
                dad  => $dad,
                mom  => $mom,
        );
        
        lock_keys( %person );

The %person hash is now restricted. The values of any keys currently in the hash may be modified, but no keys may be added. The following code will result in a fatal error.

        $person{wife} = $wife;

You can use the unlock_keys() function to release your restricted hash.

You can also lock (or unlock) a value in the hash.

        lock_value( %person, "name" );
        $person{name} = "Bozo"; # Fatal error!

Finally, you can lock and unlock an entire hash, making it read only in the first case.

        lock_hash( %person );

Now our %person hash is really restricted. No keys can be added or deleted, and no values can be changed. I know all those OO folks out there wishing Perl made it easy to keep class and instance data private are smiling.
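Because the errors are fatal, you will usually want to trap them with eval when you experiment. This is a minimal sketch of lock_keys() and unlock_keys() working together; the email key is made up, and I print whatever error message Perl produces rather than promise its exact wording.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys unlock_keys);

my %person = ( name => "Casey West", email => undef );
lock_keys( %person );

# Changing the value of an existing key is still allowed.
$person{name} = "Casey";

# Adding a new key dies, so trap the error with eval.
my $ok = eval { $person{wife} = "Chastity West"; 1 };
print "rejected: $@" unless $ok;

# After unlocking, the hash behaves like any other hash again.
unlock_keys( %person );
$person{wife} = "Chastity West";
print join( ", ", sort keys %person ), "\n";
```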

Locale Modules

I've seen the functionality in these modules reimplemented time and time again. Perl 5.8.0 added them to the core. Each of them implements a set of functions that handle locale issues for you.

Locale::Language

This module will translate language codes to names, and vice-versa.

        my $lang = code2language( 'es' );      # Spanish
        my $code = language2code( 'English' ); # en
        my $code = language2code( 'English' ); # en

You can also get a full list of supported language names and codes.

        my @codes = all_language_codes();
        my @names = all_language_names();

Locale::Country

Convert country names to codes, and vice-versa. By default country codes are represented in two character codes.

        my $code = country2code( 'Finland' ); # fi

You can change the default behavior to get three character codes, or the numeric country codes.

        my $code = country2code( 'Russia', LOCALE_CODE_ALPHA_3 ); # rus
        my $num  = country2code( 'Australia', LOCALE_CODE_NUMERIC ); # 036

You can also go from any code type to country name.

        my $name = code2country( 'jp' ); # Japan

You can specify any type of code, but if it's not the default two character representation you must supply the extra argument to define what type it is.

        my $name = code2country( "120", LOCALE_CODE_NUMERIC ); # Cameroon

Just as before, you can get a full list of codes and countries using the two query functions: all_country_codes(), and all_country_names(). Both of these functions accept an optional argument specifying the code set to use for the resulting list.

Locale::Currency

This module has the same properties as the other locale modules. You can convert currency codes into full names, and vice-versa.

        my $curr = code2currency( 'jpy' ); # Yen
        my $code = currency2code( 'US Dollar' ); # usd

The query functions are: all_currency_codes(), and all_currency_names().

Memoize

Memoize is a module that performs code optimization for you. In a general sense, when you memoize a function, it is replaced by a memoized version of the same function. OK, that was too general. More specifically, every time your memoized function is called, the calling arguments are cached and anything the function returns is cached as well. If the function is called with a set of arguments that has been seen before, then the cached return value is sent back and the actual function is never called. This makes the function faster.

Not all functions can be memoized. For instance, if your function could return different values on two calls with exactly the same arguments (or if it has side effects you rely on), then memoizing will break it: only the value returned by the first call will ever be returned for those arguments. Many functions do not act this way, and that's what makes Memoize so useful.

Here is an example of a memoizeable function.

        sub add {
                my ($x, $y) = @_;
                return $x + $y;
        }

For every time this function is called as add( 2, 2 ), the result will be 4. Rather than compute the value of 4 in every case, we can cache it away the first time and retrieve it from the cache every other time we need to compute 2 + 2.

        use Memoize;
        
        memoize( 'add' );
        
        sub add {
                my ($x, $y) = @_;
                return $x + $y;
        }

We've just made add() faster, without any work. Of course, our addition function isn't slow to begin with. The documentation of Memoize gives a much more detailed look into this algorithm. I highly suggest you invest time in learning about Memoize; it can give you wonderful speed increases if you know how and when to use it.
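Addition is too cheap to show the benefit, so here is a sketch with a function that really does repeat work: a naive recursive Fibonacci. The $calls counter is my own addition, there only to show how often the real function body runs.

```perl
use strict; use warnings;
use Memoize;

my $calls = 0;    # counts how often the real function body runs

sub fib {
    my ($n) = @_;
    $calls++;
    return $n if $n < 2;
    return fib($n - 1) + fib($n - 2);
}

# The recursive calls inside fib() go through the symbol table,
# so they hit the memoized wrapper too.
memoize('fib');

my $value = fib(20);
print "fib(20) = $value after $calls real calls\n";
```

With the memoize line in place the body runs once for each distinct argument from 0 to 20, which is 21 calls. Without it, the same computation makes 21,891 calls.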

Win32

I currently don't have a Microsoft operating system running on any of my networks, but when perusing the Perl core, I happened upon the Win32 module. I wanted to bring it up because if I were using a Microsoft OS, then I would find the functions in this module invaluable. Please, if you are running in that environment, then look at the documentation for Win32 for dozens of helpful functions (perldoc Win32 on the command line).

Conclusion

Just as before, I've still not covered all of the Perl core. There is much more to explore and a full list can be found by reading perlmodlib. The benefit of having these modules in the core is great. Lots of environments require programmers to be bound to using only code that is distributed with Perl. I hope I've been able to lighten the load for anyone who has been put in that position (even by choice).

This week on Perl 6, week ending 2003-06-15

Welcome to the last Perl 6 Summary of my first year of summarizing. If I were a better writer (or if I weren't listening with half an ear to Damian telling YAPC about Perl 6 in case anything's changed), then this summary might well be a summary of the last year in Perl 6. But I'm not, so it won't. Instead, I'm going to try and keep it short (summaries generally take me about eight hours on an average day, and I really don't want to lose eight hours of YAPC, thank you very much).

It's getting predictable I know, but we'll start with the internals list -- again.

Class Instantiation and Creation

Dan continued slouching toward full OO and outlined the issues involved with setting up classes and asked for opinions. People offered them.

http://groups.google.com/groups

Writing Language Debuggers

Clinton Pierce wanted to know how to go about writing language level debuggers in Parrot. (This man is unstoppable, I tell you.) He offered some example code to show what he was trying to do. Benjamin Goldberg had a style suggestion for the code, but nobody had much to say about Clint's particular issue.

http://groups.google.com/groups

Converting Parrot to Continuation-Passing Style

Much of this week's effort was involved in getting support for the continuation-passing style function calling into Parrot. Jonathan Sillito posted a patch. This led to a certain amount of confusion about what needs to be stashed in the continuation and a certain amount of bemusement about the implications of caller saves rather than callee saves (in a nutshell, a calling context only has to save those registers that it cares about; it doesn't have to worry about saving any other registers, because its callers will already have saved them if they cared.)

Dan ended up rewriting the calling conventions PDD to take into account some of the confusion.

I think the upshot of this is that the Parrot core now has everything we need to support the documented continuation-passing calling conventions. But I could be wrong.

http://groups.google.com/groups

http://groups.google.com/groups

Segfaulting IMCC for Fun and Profit

Clint Pierce's BASIC implementation efforts continue to be one of the most-effective bug hunting (in code and/or docs) efforts the Parrot team has. This time, Clint managed to segfault IMCC by trying to declare nested .subs using the wrong sorts of names. Leo Tötsch explained how to fix the problem. It seems that fixing IMCC to stop it from segfaulting on this issue is hard, since the segfault happens at runtime.

http://groups.google.com/groups

Passing the Time

Clint's BASIC can now play chess! Not very well, but we're in 'dogs dancing' territory here. Bravo Clint! There was applause.

http://groups.google.com/groups


Meanwhile in Damian's YAPC Address ...

New DISPATCH method

Last week, Ziggy worried about multimethod dispatch not being good enough. This week at YAPC, Damian announced DISPATCH, a scary magic subroutine that allows you to define your own dispatch rules. Essentially, it gets called before the built-in dispatch rules do; beyond that, I know nothing.

Sorry, no link for this.


Meanwhile in perl6-language

Ziggy's Obsolete Thread

Last week, I mentioned that Adam Turoff had worried a little about multimethod dispatch, and wanted to know whether it would be possible to easily override the dispatch system. This week, he outlines the types of things he might want to do.

See above for the resolution. Details don't exist just yet, but we'll get there.

http://groups.google.com/groups

Type Conversion Matrix, Pragmas (Take 4)

Michael Lazzaro posted the latest version of his Type Conversion Matrix and asked for comments and, hopefully, definitive answers. There was a small amount of discussion ...

http://groups.google.com/groups

Returning from a Nested Call

Whilst idly 'longing for the cleansing joy [of] Perl,' Dave Storrs wondered how/whether he could write a method that would return from its caller. Answer: yes, use leave.

http://groups.google.com/groups

printf like formatting in interpolated strings

Edward Steiner wondered about having some way to do printf-like formatting of numbers in interpolated strings. Luke Palmer (who just told me he's embarrassed about something I wrote about something he said last week, but I'd forgotten it) came up with a cool-looking suggestion in response.

http://groups.google.com/groups


Acknowledgements, Announcements and Apologies

Well, that wraps up my first year of summary writing. Thanks to everyone for reading; it's been fun.

I have one announcement to make: As of next week, there will be no obligatory reference to Leon Brocard -- I'm getting bored of it, you all must have been bored with it for months ... .

If you've appreciated this summary, then please consider one or more of the following options:

  • Send money to the Perl Foundation at http://donate.perl-foundation.org/ and help support the ongoing development of Perl.

  • Get involved in the Perl 6 process. The mailing lists are open to all. http://dev.perl.org/perl6/ and http://www.parrotcode.org/ are good starting points with links to the appropriate mailing lists.

  • Send feedback, flames, money, photographic and writing commissions, or a nice long US power cable to plug into my Mac power-brick to p6summarizer@bofh.org.uk.

Perl Design Patterns

Introduction

In 1995, Design Patterns was published, and during the intervening years, it has had a great influence on how many developers write software. In this series of articles, I present my take on how the Design Patterns book (the so-called Gang of Four book, which I will call GoF) and its philosophy apply to Perl. While Perl is an OO language -- you could code the examples from GoF directly in Perl -- many of the problems the GoF is trying to solve are better solved in Perl-specific ways, using techniques not open to Java developers or those C++ developers who insist on using only objects. Even if developers in other languages are willing to consider procedural approaches, they can't, for instance, use Perl's immensely powerful built-in pattern support.

Though these articles are self-contained, you will get more out of them if you are familiar with the GoF book (or better yet have it open on your desk while you read). If you don't have the book, then try searching the Web - many people talk about these patterns. Since the Web and the book have diagrams of the object versions of the patterns, I will not reproduce those here, but can direct you to this fine site.

I will show you how to implement the highest value patterns in Perl, most often by using Perl's rich core language. I even include some objects.

For the object-oriented implementations, I need you to understand the basics of Perl objects. You can learn that from printed sources like the Perl Cookbook by Tom Christiansen and Nat Torkington or Object Oriented Perl by Damian Conway. But the simplest way to learn the basics is from perldoc perltoot.

As motivation for my approach, let me start with a little object-oriented philosophy. Here are my two principles of objects:

  1. Objects are good when data and methods are tightly bound.
  2. In most other cases, objects are overkill.

Let me elaborate briefly on these principles.

Objects are good when data and methods are tightly bound.

When you are working for a company that rents cars (as I do), an object to represent a rental agreement makes sense. The data on the agreement is tightly bound to the methods you need to perform. To calculate the amount owed, you take the various rates and add them together, etc. This is a good use of an object (or actually several aggregated objects).

In most other cases, objects are overkill.

Consider a few examples from other languages. Java has the java.lang.Math class. It provides things such as sine and cosine. It only provides class methods and a couple of class constants. This should not be forced into an object-oriented framework, since there are no Math objects. Rather the functions should be put in the core, left out completely, or made into non-object-oriented functions. The last option is not even available in Java.

Or think of the C++ standard template library. The whole templating framework is needed to make C++ backward compatible with C and to handle strong static-type checking. This makes for awkward object-oriented constructs for things that should be simple parts of the core language. To be specific, why shouldn't the language just have a better array type at the outset? Then a few well-named built-in operations take care of stacks, queues, deques and many other structures we learned in school.

So, in particular, I take exception to one consistent GoF trick: turning an idea into a full-blown class of objects. I prefer the Perl way of incorporating the most-important concepts into the core of the language. Since I prefer this Perl way, I won't be showing how to objectify things that could more easily be a simple hash with no methods or a simple function with no class. I will invert the GoF trick: implement full-blown pattern classes with simpler Perl concepts.

The patterns in this first article rely primarily on built-in features of Perl. Later articles will address other groups of patterns. Now that I've told you what I'm about to do, let's start.

Iterator

There are many structures that you need to walk one element at a time. These include simple things such as arrays, moderate things such as the keys of a hash, and complex things such as the nodes of a tree.

The Gang of Four suggest solving this problem with the above mentioned trick: turn a concept into an object. Here that means you should make an iterator object. Each class of objects that can reasonably be walked should have a method that returns an iterator object. The object itself always behaves in a uniform way. For example, consider the following code, which uses an iterator to walk the keys of a hash in Java.


    for (Iterator iter = hash.keySet().iterator(); iter.hasNext();) {
        Object key   = iter.next();
        Object value = hash.get(key);
        System.out.println(key + "\t" + value);
    }

The HashMap object has something that can be walked: its keys. You can ask it for this keySet. That Set will give you an Iterator on request to its iterator method. The Iterator responds to hasNext with a true value if there are more things to be walked, and false otherwise. Its next method delivers the next object in whatever sequence the Iterator is managing. With that key, the HashMap delivers the next value in response to get(key). This is neat and tidy in the completely OO framework of a language with limited operators and built-in types. It also perfectly exhibits the GoF iterator pattern.

In Perl any built-in or user defined object which can be walked has a method which returns an ordered list of the items to be walked. To walk the list, simply place it inside the parentheses of a foreach loop. So the Perl version of the above hash key walker is:


    foreach my $key (keys %hash) {
        print "$key\t$hash{$key}\n";
    }

I could implement the pattern exactly as it is diagrammed in GoF, but Perl provides a better way. In Perl 6, it will even be possible to return a list that expands lazily, so the above will be more efficient than it is now. In Perl 5, the keys list is built completely when I call keys. In the future, the keys list will be built on demand, saving memory in most cases, and time in cases where the loop ends early.
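The same idiom carries over to classes you write yourself. The following is a minimal sketch; the Playlist class and its tracks method are hypothetical names invented for the illustration. The object exposes a method that returns its items as an ordinary list, and foreach does the rest.

```perl
use strict; use warnings;

package Playlist;    # hypothetical class, for illustration only

sub new {
    my ($class, @tracks) = @_;
    return bless { tracks => [@tracks] }, $class;
}

# Return the items in walking order as a plain Perl list.
sub tracks {
    my $self = shift;
    return @{ $self->{tracks} };
}

package main;

my $playlist = Playlist->new('Intro', 'Verse', 'Outro');

my @seen;
foreach my $track ($playlist->tracks) {
    push @seen, $track;
    print "$track\n";
}
```

No iterator class, no hasNext, no next: the caller only needs to know which method hands back the list.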

The inclusion of iteration as a core concept represents Perl design at its finest. Instead of providing a clumsy mechanism in non-core code, as Java and C++ (through its standard template library) do, Perl incorporates this pattern into the core of the language. As I alluded to in the introduction, there is a Perl principle here:

If a pattern is really valuable, then it should be part of the core language.

The above example is from the core of the language. To see that foreach fully implements the iterator pattern, even for user-defined modules, consider an example from CPAN: XML::DOM. The DOM for XML was specified by Java programmers. One of the methods you can call on a DOM Document is getElementsByTagName. In the DOM specification this returns a NodeList, which is not even a standard Java collection: it offers only getLength and item, so you must walk it by index.

When Perl people implemented the DOM, they decided that getElementsByTagName would return a proper Perl list. To walk the list one says something like:


    foreach my $element ($doc->getElementsByTagName("tag")) {
        # ... process the element ...
    }

This stands in stark contrast to the overly verbose Java version:


    NodeList elements = doc.getElementsByTagName("tag");
    for (int i = 0; i < elements.getLength(); i++) {
        Element element = (Element) elements.item(i);
        // ... process the element ...
    }

One beauty of Perl is its ability to combine procedural, object-oriented, and core concepts in such powerful ways. The facts that GoF suggests implementing a pattern with objects and that object-only languages like Java require it do not mean that Perl programmers should ignore the non-object features of Perl.

Perl succeeds largely by excellent use of the principle of promotion. Essential patterns are integrated into the core of the language. Useful things are implemented in modules. Useless things are usually missing.

So the iterator pattern from GoF is a core part of Perl we hardly think about. The next pattern might actually require us to do some work.

Decorator

In normal operation, a decorator wraps an object, responding to the same API as the wrapped object. For example, suppose I add a compressing decorator to a file writing object. The caller passes a file writer to the decorator's constructor, and calls write on the decorator. The decorator's write method first compresses the data, then calls the write method of the file writer it wraps. Any other type of writer could be wrapped with the same decorator, so long as all writers respond to the same API. Other decorators can also be used in a chain. The text could be converted from ASCII to Unicode by one decorator and compressed by another. The order of the decorators is important.

In Perl, I can do this with objects, but I can also use a couple of language features to obtain most of the decorations I need, sometimes relying solely on built-in syntax.

I/O is the most common use of decoration. Perl provides I/O decoration directly. Consider the above example: compressing while writing. Here are two ways to do this.

Use the Shell and Its Tools

When I open a file for writing in Perl, I can decorate via shell tools. Here is the above example in code:


    open FILE, "| gzip > output.gz"
        or die "Couldn't open gzip and/or output.gz: $!\n";

Now everything I write is passed through gzip on its way to output.gz. This works great so long as (1) you are willing to use the shell, which sometimes raises security issues; and (2) the shell has a tool to do what you need done. There is also an efficiency concern here. The operating system will spawn a new process for the gzip step. Process creation is about the slowest thing the OS can do without performing I/O.
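If the shell or the extra process is a concern, the compression can be done in-process instead. This is a different technique from the pipe above, not a variation of it: the sketch assumes a Perl recent enough to ship IO::Compress::Gzip and IO::Uncompress::Gunzip in the core, and it writes to a temporary directory so it can be run anywhere.

```perl
use strict; use warnings;
use File::Temp             qw(tempdir);
use IO::Compress::Gzip     qw($GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/output.gz";

# The object behaves like a write handle; data is compressed
# inside this process, with no shell and no child process.
my $out = IO::Compress::Gzip->new($file)
    or die "Couldn't open $file: $GzipError\n";
print {$out} "first line\n";
print {$out} "second line\n";
$out->close;

# Read the file back to confirm the round trip.
my $text;
gunzip $file => \$text
    or die "Couldn't read $file back: $GunzipError\n";
print $text;
```

The calling code still just prints to a handle, which is the point of a decorator.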

Tying

If you need more control over what happens to your data, then you can decorate it yourself with Perl's tie mechanism. It will be even faster, easier to use, and more powerful in Perl 6, but it works in Perl 5. It does work within Perl's OO framework; see perltie for more information.

Suppose I want to preface each line of output on a handle with a time stamp. Here's a tied class to do it.


    package AddStamp;
    use strict; use warnings;

    sub TIEHANDLE {
        my $class  = shift;
        my $handle = shift;
        return bless \$handle, $class;
    }

    sub PRINT {
        my $handle = shift;
        my $stamp  = localtime();

        print $handle "$stamp ", @_;
    }

    sub CLOSE {
        my $self = shift;
        close $self;
    }

    1;

This class is minimal; in real life you need more code to make the decorator more robust and complete. For example, the above code does not check to make sure the handle is writable nor does it provide PRINTF, so calls to printf will fail. Feel free to fill in the details. (Again, see perldoc perltie for more information.)

Here's what these pieces do. The constructor for a tied file handle class is called TIEHANDLE. Its name is fixed and uppercase, because Perl calls this for you. This is a class method, so the first argument is the class name. The other argument is an open output handle. The constructor merely blesses a reference to this handle and returns that reference.

The PRINT method receives the object constructed in TIEHANDLE plus all the arguments supplied to print. It calculates the time stamp and sends that together with the original arguments to the handle using the real print function. This is typical decoration at work. The decorating object responds to print just like a regular handle would. It does a little work, then calls the same method on the wrapped object.

The CLOSE method closes the handle. I could have inherited from Tie::StdHandle to gain this method and many more like it.

Once I put AddStamp.pm in my lib path, I can use it like this:


    #!/usr/bin/perl
    use strict; use warnings;

    use AddStamp;

    open LOG, ">output.tmp" or die "Couldn't write output.tmp: $!\n";
    tie *STAMPED_LOG, "AddStamp", *LOG;

    while (<>) {
        print STAMPED_LOG;
    }

    close STAMPED_LOG;

After opening the file for writing as usual, I use the built-in tie function to bind the LOG handle to the AddStamp class under the name STAMPED_LOG. After that, I refer exclusively to STAMPED_LOG.
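As one way of filling in the missing PRINTF, here is a self-contained sketch. AddStampPlus is a hypothetical variant of the class above, not a drop-in replacement: it keeps the wrapped handle in a hash, and it writes to an in-memory handle so the result can be inspected without touching the disk.

```perl
use strict; use warnings;

package AddStampPlus;    # hypothetical variant of AddStamp

sub TIEHANDLE {
    my ($class, $handle) = @_;
    return bless { out => $handle }, $class;
}

sub PRINT {
    my $self  = shift;
    my $stamp = localtime();
    return print { $self->{out} } "$stamp ", @_;
}

# Format first, then reuse PRINT so the stamp logic lives in one place.
sub PRINTF {
    my $self   = shift;
    my $format = shift;
    return $self->PRINT(sprintf $format, @_);
}

sub CLOSE {
    my $self = shift;
    return close $self->{out};
}

package main;

# An in-memory handle stands in for the log file.
open my $log, '>', \my $buffer or die "Couldn't open in-memory handle: $!\n";
tie *STAMPED, 'AddStampPlus', $log;

print  STAMPED "started\n";
printf STAMPED "%d widgets\n", 3;
close  STAMPED;

print $buffer;
```

Each output line now carries a time stamp whether it came from print or printf.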

If there are other tied decorators, then I can pass the tied handle to them. The only downside is that Perl 5 ties are slower than normal operations. Yet, in my experience, disks and networks are my bottlenecks, so in-memory inefficiency like this tends not to matter. Even if I make the script code execute 90 percent faster, I don't save a noticeable amount of time, because it wasn't using much time in the first place.

This technique works for many of the built-in types: scalars, arrays, hashes, as well as file handles. perltie explains how to tie each of those.

Ties are great since they don't require the caller to understand the magic you are employing behind their back. That is also true of GoF decorators with one clear exception: In Perl, you can change the behavior of built-in types.
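To make the point about built-in types concrete, here is a minimal sketch of a tied scalar. The UpperCase package is hypothetical; it inherits from Tie::StdScalar (part of the core Tie::Scalar module) and overrides only STORE.

```perl
use strict; use warnings;

package UpperCase;    # hypothetical class, for illustration only
use Tie::Scalar;
our @ISA = ('Tie::StdScalar');

# Decorate assignment: upper-case the value on its way in.
sub STORE {
    my ($self, $value) = @_;
    $$self = uc $value;
}

package main;

tie my $shout, 'UpperCase';
$shout = "hello";
print "$shout\n";    # HELLO
```

The caller assigns to and reads from an ordinary-looking scalar and never sees the class behind it.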

Decorating Lists

One of the most common tasks in Perl is to transform a list in some way. Perhaps you need to skip all entries in the list that start with underscore. Perhaps you need to sort or reverse the list. Many built-in functions are list filters. They take a list, do something to it and return a resultant list. This is similar to Unix filters, which expect lines of data on standard input, which they manipulate in some way, before sending the result to standard output. Just as in Unix, Perl list filters can be chained together. For example, suppose you want a list of all subdirectories of the current directory in reverse alphabetical order. Here's one possible solution.


 1  #!/usr/bin/perl
 2  use strict; use warnings;
 3
 4  opendir DIR, "."
 5      or die "Can't read this directory, how did you get here?\n";
 6  my @files = reverse sort map { -d $_ ? $_ : () } readdir DIR;
 7  closedir DIR;
 8  print "@files\n";

Perl 6 will introduce a more meaningful notation for these operations, but you can learn to read them in Perl 5, with a little effort. Line 6 is the interesting one. Start reading it on the right (this is backward for Unix people). First, it reads the directory. Since map expects a list, readdir returns a list of all files in the directory. map generates a list with the name of each file which is a directory (or an empty list, which adds nothing to the result, if the -d test fails). sort puts the list in ASCII-betical order. reverse reverses that. The result is stored in @files for later printing.
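When a filter only keeps or drops entries, as this one does, the built-in grep says so more directly than map. The sketch below is one possible rewrite; the temporary directory and the names in it are made up so the script can run anywhere, and unlike the listing above it also skips the . and .. entries.

```perl
use strict; use warnings;
use File::Temp qw(tempdir);

# Build a small throwaway tree: three directories and one plain file.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/$_" or die "Can't create $_: $!\n" for qw(alpha Beta gamma);
open my $fh, '>', "$root/notes.txt" or die "Can't write notes.txt: $!\n";
close $fh;

opendir my $dh, $root or die "Can't read $root: $!\n";
my @dirs = reverse sort
           grep { !/^\.\.?$/ && -d "$root/$_" }    # keep real subdirectories
           readdir $dh;
closedir $dh;

print "@dirs\n";    # gamma alpha Beta
```

Beta lands last because the default sort is ASCII-betical, so upper-case letters sort before lower-case ones.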

You can make your own list filter quite easily. Suppose you wanted to replace the ugly map usage above (I tend to think map is always ugly) with a special purpose function. Here's how:


    #!/usr/bin/perl
    use strict; use warnings;

    sub dirs_only (@) {
        my @retval;
        foreach my $entry (@_) {
            push @retval, $entry if (-d $entry);
        }
        return @retval;
    }

    opendir DIR, "."
        or die "Can't read this directory, how did you get here?\n";
    my @files = reverse sort { lc($a) cmp lc($b) } dirs_only readdir DIR;
    closedir DIR;
    local $" = ";";
    print "@files\n";

The new dirs_only routine replaces map above, leaving out the entries we don't want to see.

The sort now has an explicit comparison subroutine. This is to prevent it from thinking that dirs_only is its comparison routine. Since I had to include this, I chose to take advantage of the situation and sort with more finesse: ignoring case.

You can make such list filters to your heart's content.
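As a further sketch, here are two more filters chained together. Both skip_private and upcase_all are hypothetical names. The first drops entries that start with an underscore; the second upper-cases whatever is left.

```perl
use strict; use warnings;

# Each filter takes a list and returns a list, so they chain freely.
sub skip_private { return grep { !/^_/ } @_ }
sub upcase_all   { return map  { uc }    @_ }

my @names = qw(_cache report _tmp index);

# The explicit comparison block keeps sort from mistaking
# upcase_all for its comparison routine.
my @public = sort { $a cmp $b } upcase_all(skip_private(@names));

print "@public\n";    # INDEX REPORT
```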

I have now shown you the most important types of decoration. Any others you need could be implemented in the traditional GoF way.

The next pattern feels like cheating, but then Perl often gives me that feeling.

Flyweight

The idea of reusing objects is the essence of the flyweight pattern. Thanks to Mark-Jason Dominus, Perl takes this far beyond what the GoF had in mind. Further, he did the work once and for all. Larry Wall likes this idea so much he's promoting it to the core for Perl 6 (there's that promotion concept again).

What I want is this:

For objects whose instances don't matter (they are constants or random), those requesting a new object should be given the same one they already received whenever possible.

This pattern fails dramatically if separate instances matter. But if they don't, then it would save time and memory.

Here's an example of how this works in Perl. Suppose I want to provide a die class for games like Monopoly or Craps. My die class might look like this: (Warning: This example is contrived to show you the technique.)


    package CheapDie;
    use strict; use warnings;

    use Memoize;
    memoize('new');

    sub new {
        my $class = shift;
        my $sides = shift;
        return bless \$sides, $class;
    }

    sub roll {
        my $self          = shift;
        my $sides         = $$self;    # the object is a reference to the side count
        my $random_number = rand;

        return int ($random_number * $sides) + 1;
    }

    1;

At first glance, this looks like many other classes. It has a constructor called new. The constructor stores the received number of sides into a subroutine lexical variable (a.k.a. a my variable), returning a blessed reference to it. The roll method calculates a random number, scales it according to the number of sides, and returns the result.

The only strange thing here is this pair of lines:


    use Memoize;
    memoize('new');

These exploit Perl's magic extraordinarily well. The memoize function modifies the calling package's symbol table so that new is wrapped. The wrapping function examines the incoming arguments (the class name and the number of sides in this case). If it has not seen those arguments before, it calls the function as the user intended, stores the result in a cache, and returns it to the user. This first call takes more time and memory than if I had not used the module.

The savings come when the method is called again. When the wrapper notices a call with the same arguments it used before, it does not call the method. Rather, it sends the cached object instead. We don't have to do anything special as a caller or as an object implementor. If your object is big, or slow to construct, then this technique would save you time and memory. In my case, it wastes both since the objects are so small.

The only thing to keep in mind is that some methods don't benefit from this technique. For example, if I memoize roll, then it would return the same number each time, which is not exactly the desired result.
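A short script makes the sharing visible. This sketch repeats the CheapDie class inline so it is self-contained, with roll dereferencing the object to get at the side count; comparing two references with == tests whether they point at the same object.

```perl
use strict; use warnings;

package CheapDie;
use Memoize;

sub new {
    my ($class, $sides) = @_;
    return bless \$sides, $class;
}
memoize('new');    # same class and sides => same cached object

sub roll {
    my $self = shift;
    return int(rand($$self)) + 1;    # $$self is the number of sides
}

package main;

my $d6_a = CheapDie->new(6);
my $d6_b = CheapDie->new(6);
my $d20  = CheapDie->new(20);

print "both six-sided dice are the same object\n" if $d6_a == $d6_b;
print "the twenty-sided die is a different object\n" if $d6_a != $d20;
print "rolled a ", $d6_a->roll, "\n";
```

Only new is memoized; roll is left alone so every roll is still random.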

Note too that Memoize can be used in non-object situations - in fact the documentation for it doesn't seem to contemplate using it for object factories.

Not only do languages such as Java not have core functions for caching method returns, they don't allow clever users to implement them. Mark-Jason Dominus did a fine thing implementing Memoize, but Larry Wall did a better thing by letting him. Imagine Java letting a user write a class that manipulated the caller's symbol table at run time - I can almost hear the screams of terror. Of course, these techniques can be abused, but precluding them is a greater loss than rejecting poor code on the few occasions that some less-than-stellar programmer improperly adjusts the symbol table.

In Perl all things are legal, but some are best left to modules with strong development communities. This allows regular users to take advantage of magic manipulations without worrying about whether our own magic will work. Memoize is an example. Instead of rolling your own wrapped call and caching scheme, use the well-tested one that ships with Perl (and look for the 'is cached' trait to do this for routines in Perl 6).

The next pattern is related to this one, so you can use flyweight to implement it.

Singleton

In the flyweight pattern, we saw that there are sometimes resources that everyone can share. GoF calls the special case when there is a single resource that everyone needs to share the singleton pattern. Perhaps the resource is a hash of configuration parameters. Everyone should be able to look there, but it should only be built on startup (and possibly rebuilt on some signal).

In most cases, you could just use Memoize. That seems most reasonable to me. (See the flyweight section above.) In that case, everyone who wants access to the resource calls the constructor. The first person to do so causes the construction to happen and receives the object. Subsequent people call the constructor, but they receive the originally constructed object.

There are many other ways to achieve this same effect. For instance, if you think your callers might pass you unexpected arguments, then Memoize would make multiple instances, one for each set of arguments. In this case, managing the singleton with modules like Cache::FastMemoryCache from CPAN may make more sense. You could even use a file lexical, assigning it a value in a BEGIN block. Remember bless doesn't have to be used in a method. You could say:


    package Name;
    my $singleton;
    BEGIN {
        $singleton = {
            attribute => 'value',
            another   => 'something',
        };
        bless $singleton, "Name";
    }

    sub new {
        my $class = shift;
        return $singleton;
    }

This avoids some of the overhead of Memoize and shows what I'm doing more directly. I made no attempt to take subclassing into account here. Maybe I should, but the pattern says a singleton should belong always to one class. The fundamental statement about singletons is:

``There can only be one singleton.''
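A lazily built variant of the same idea can be sketched as follows. SharedConfig and its instance method are hypothetical names. Unlike a memoized constructor, this version ignores any arguments, so unexpected arguments cannot produce a second instance.

```perl
use strict; use warnings;

package SharedConfig;    # hypothetical class name

my $instance;            # file lexical holding the one shared object

sub instance {
    my $class = shift;   # any further arguments are deliberately ignored
    $instance ||= bless { debug => 0 }, $class;
    return $instance;
}

package main;

my $first  = SharedConfig->instance;
my $second = SharedConfig->instance(debug => 1);

# A change made through one handle is visible through the other.
$first->{debug} = 1;
print "one shared object\n" if $first == $second;
print "debug is $second->{debug}\n";
```

The object is built on the first request instead of in a BEGIN block, which suits resources that are expensive and not always needed.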

Summary

All four of the patterns shown in this article use built-in features, or standard modules. The iterator is implemented with foreach. The decorator is implemented for I/O with Unix pipe and redirection syntax or with a tied file handle. For lists, decorators are just functions which take and return lists. So, I might call decorators filters. Flyweights are shared objects easily implemented with the Memoize module. Singletons can be implemented as flyweights or with simple object techniques.

The next time some uppity OO programmer starts going on about patterns, rest assured, you know how to use them. In fact, they are built into the core of your language (at least if you have the sense to use Perl).

Next time, I will look at patterns which rely on code references or data containers.

Acknowledgements and Background

I wrote these articles after taking a training course using GoF from a well-known training and consulting company. My writing is also informed by many people in the Perl community, including Mark-Jason Dominus, who showed at YAPC 2002, using his unique flair, how Perl deals with the iterator pattern. Though the writing here is mine, the inspiration comes from Dominus and many others in the Perl community, most of all Larry Wall, who have incorporated patterns into the heart of Perl during the years. As these patterns show, time and time again, Perl employs the principle of promotion carefully and well. Instead of adding a collection framework in source code modules, as Java and C++ do, Perl has only two collections: arrays and hashes. Both are core to the language. I think Perl's greatest strength is the community's choices of what to include in the core, what to ship along with the core, and what to leave out. Perl 6 will only make Perl more competitive in the war of language design ideas.

This week on Perl 6, week ending 2003-06-08

It's another Monday, it's another summary and I need to get this finished so I can start getting the house in order before we head off to Boca Raton and points north and west on the long road to Portland, Oregon. Via Vermont. (I'm English and as the poem comments, the rolling English road is ``A rare road, a rocky road and one that we did tread // The day we went to Birmingham by way of Beachy Head.'' Just because I'm in America doesn't mean I can't take an English route to OSCON.)

We'll start with the internals list this week (and, given that there are only 18 or so messages in my perl6-language inbox, we may well stop there).

Building IMCC as parrot

It's been pretty much decided that IMCC will soon become 'the' parrot executable. Josh Wilmes, Robert Spier and Leo ``Perl Foundation grant recipient'' Tötsch are looking into what needs to be done to make this so. It's looking like the build system may well see some vigorous cleanup action in this process.

http://groups.google.com/groups

The Horror! The Horror!

Clint Pierce continued to expand on the internals of his BASIC implementation. The more I see of his pathological examples, the gladder I am that I escaped BASIC as quickly as possible. Still, kudos to Clint once more for the effort, even if it is a tad embarrassing that the most advanced language hosted on Parrot is BASIC. (On IRC Leon Brocard and others have been heard to remark that they're unlikely to go all out at a real language until Parrot has objects. Dan?)

http://groups.google.com/groups

The Horror! The Horror! Part II

The timely destruction thread still doesn't want to go away. Dan has been heard muttering about this on IRC. Eventually, he did more than mutter on IRC -- he stated clearly on list that 'We aren't doing reference counting' and that as far as he is concerned the matter is closed.

Dan's blog also has another of his excellent ``What The Heck Is'' posts, this time about Garbage Collection.

http://groups.google.com/groups

http://groups.google.com/groups

http://www.sidhe.org/~dan/blog/archives/000200.html - What the Heck is: Garbage Collection

The Continu(ation)ing Saga

Jonathan Sillito posted a longish meditation on Parrot's new continuation passing calling conventions. He wondered if, now we have continuation passing, we really needed the various register stacks that were used in the old stack based calling conventions. Warnock's Dilemma currently applies.

http://groups.google.com/groups

Clint Pierce, IMCC tester extraordinaire

Over the past couple of weeks Clint Pierce has been porting his BASIC implementation over to run on IMCC. In the process of doing so he's been finding and reporting all sorts of IMCC bugs and/or misunderstandings and Leo Tötsch (usually) has either been correcting Clint's assumptions or fixing the bugs he's found. I've mentioned a few of these exchanges that generated longish threads in the past, but that hasn't covered everything that's been found, discussed and fixed. It's been great to see this sort of dialogue driving the design and implementation forward based on the needs of a real program.

The thread I've linked to below is another exchange in this ongoing dialogue. Clint found a way of reliably segfaulting IMCC. Leo fixed it. And on to the next.

http://groups.google.com/groups

And, on the subject of list stalwarts...

Jürgen Bömmels is still working away at the Parrot IO (PIO) subsystem. In this particular patch, he's gone through the Parrot source replacing occurrences of PIO_fprintf(interpreter, PIO_STDERR(interpreter), ...) with the better factored PIO_eprintf(interpreter, ...), which, as well as eliminating repetition, helps to keep the IO code slightly easier to maintain.

Leo applied the patch. (Although it's not mentioned explicitly elsewhere, Leo continues to keep up his astonishing productivity with various other patches to Parrot)

http://groups.google.com/groups

Make mine SuperSized

Bryan C. Warnock continued to discuss issues of the size of Parrot's various types, particularly the integer types that get used within a running Parrot. Bryan argues that these should ideally use a given platform's native types, worrying about guaranteed sizes only at the bytecode loading/saving stage. Dan and others commented on this (Dan essentially said that he understood what Bryan was driving at but wasn't quite sure of the way forward, and outlined his options). Discussion continues.

http://groups.google.com/groups

Call invoke call?

Jonathan Sillito submitted a patch which changes invoke to call, adds some PMC access macros and updates the tests. He and Leo Tötsch discussed things for a while and I think the patch is in the process of being rewritten as a result of that discussion.

http://groups.google.com/groups

Constant Propagation/Folding

Matt Fowles had another bite at his Constant Propagation cherry. His latest patch is rather more conservative, and actually has tests. Leo changed one line and applied it.

http://groups.google.com/groups

Destruction order

One of the good things about a simple minded reference counting Garbage Collector is that object destructors generally get called in a sensible order; if you have a tree of objects, the various node destructors will generally get called in such a way that a given node's children won't have been destroyed already. Garrett Goebel asked if we could keep this behaviour with the Parrot GC system. Dan was minded to say ``Yes'' as he's been wrestling with issues of non deterministic destruction order in another of his projects (So have I; it's a very long way from being fun, if I had the C chops I'd be trying to fix Perl 5's 'at exit' mark and sweep garbage collector to do something similar.)
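The ordering property described above can be seen in Perl 5 itself, which uses reference counting. The following is only a small sketch: the Node class and the log array are invented for illustration, and it says nothing about how Parrot's GC would provide the same guarantee.

```perl
use strict;
use warnings;

our @log;    # records the order in which destructors run

package Node;

sub new {
    my ($class, $name, @children) = @_;
    return bless { name => $name, children => [@children] }, $class;
}

sub DESTROY {
    my ($self) = @_;
    # Under reference counting the children are still alive here,
    # because this node still holds references to them.
    my $alive = grep { defined $_->{name} } @{ $self->{children} };
    push @main::log, "$self->{name} (live children: $alive)";
}

package main;

my $tree = Node->new('root', Node->new('left'), Node->new('right'));
undef $tree;    # last reference gone: root is destroyed, then its children

print "$_\n" for @log;
```

The first line logged is the root, and it still sees both of its children; the children are destroyed only afterwards.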

http://groups.google.com/groups

K Stol's Lua Compiler project

Klaas-Jan Stol announced that he's turned in his project implementing a Lua compiler that targets Parrot. He hasn't actually finished the compiler, his deadline being what it was, but he did post a link to his project report and commented that ``[Parrot is] a really cool project and VM to target'' and thanked everyone on the mailing list for their help. I think the parrot-internals people will echo my best wishes to Klaas-Jan; it's great to see someone who comes to a list with a project and, instead of saying ``Write this for me!'', asks sensible questions and makes a useful contribution to the ongoing task.

http://groups.google.com/groups

http://members.home.nl/joeijoei/parrot/report.pdf -- Project report

Meanwhile, in perl6-language

By gum, it's been quiet this week. I haven't seen any traffic in the language list since Wednesday. Maybe everyone's waiting for Damian's Exegesis to escape.

Multimethod dispatch?

Adam Turoff asked if multimethod dispatch (MMD) was really the Right Thing (it's definitely a Right Thing) and suggested that it would be more Perlish to allow the programmer to override the dispatcher, allowing for all sorts of more or less cunning dispatch mechanisms (which isn't to say we couldn't still have MMD tightly integrated, but it wouldn't be the only alternative to simple single dispatch). Luke Palmer gets the ``Pointy End Grandma'' award for pointing out that Perl 6 is a '``real'' programming language now' (as Adam pointed out, Perl's been a 'real' programming language for years), inspiring a particularly pithy bit of Cozeny. As far as I can tell, Adam wants to be able to dispatch on the runtime value of a parameter as well as on its runtime type (he's not alone in this). Right now you either have to do this explicitly in the body of the subroutine, or work out the correct macromantic incantations needed to allow the programmer to use 'nice' syntax for specifying such dispatch.

Assuming I'm not misunderstanding what Adam is after, this has come up before (I think I asked about value based dispatch a few months back) and I can't remember if the decision was that MMD didn't extend to dispatching based on value, or if that decision hasn't been taken yet. If it's not been taken, I still want to be able to do


   multi factorial (0) { 1 }
   multi factorial ($n) { $n * factorial($n - 1) }

It seems to me that if MMD is flexible enough to do this, then it becomes easy to express any other set of dispatch rules as special cases of this more general mechanism. (That said, I'm not sure how one would go about expressing Smalltalk like message specifiers, which is a shame, I like Smalltalk message specifiers).
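For comparison, the ``do it explicitly in the body of the subroutine'' option might look like this in Perl 5. This is only a sketch of the workaround, not of any proposed Perl 6 dispatcher:

```perl
use strict;
use warnings;

# Value-based "dispatch" done by hand: the special case for 0 lives
# inside the body instead of in a separate multi declaration.
sub factorial {
    my ($n) = @_;
    return 1 if $n == 0;              # plays the role of: multi factorial (0) { 1 }
    return $n * factorial($n - 1);    # plays the role of: multi factorial ($n) { ... }
}

print factorial(5), "\n";
```

The value test and the general case end up tangled together in one body, which is exactly what the multi syntax would pull apart.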

http://groups.google.com/groups

Acknowledgements, Announcements and Apologies

Well, that's about it for another week. Next week's summary will be coming to you from YAPC in Boca Raton. Then there should be one from chez Turoff in Washington DC (As far as I can tell the Washington summary will be the first summary of the second year of my summary writing, if you're going to be in the Greater Washington area around that time, consider getting in touch with either me or Ziggy and we'll see about having a celebratory something or other that evening). After Washington I'll be in Boston for the next summary, and at OSCON for the one after that. I fully expect to be writing either enormously long summaries or drastically curtailed ones while I'm over in the States. After OSCON, there'll be a summary from Seattle and then I'll be off back home. If you're in any of those localities at the appropriate times drop me a line, we'll try and arrange meet-ups to wet the appropriate summaries' heads.

If you've appreciated this summary, please consider one or more of the following options:

Regexp Power

Everyone knows that Perl works particularly well as a text processing language, and that it has a great many tools to help the programmer slice and dice text files. Most people know that Perl's regular expressions are the mainstay of its text processing capabilities, but do you know about all of the features which regexps provide in order to help you do your job?

In this short series of two articles, we'll take a look through some of the less well-known or less understood parts of the regular expression language, and see how they can be used to solve problems with more power and less fuss.

If you're not too familiar with the basics of the regexp language, a good place to start is perlretut, which comes as part of the Perl distribution. We're going to assume that you know about anchors, character classes, repetition, bracketing, and alternation. Where can we go from here?

Multi-line strings

Matching multi-line strings is one thing that I have to admit confuses me every time. I remember that it has something to do with the /m and /s modifiers, so when I think my strings will contain embedded newlines, I just slap both /ms on the end of my regular expression and hope for the best.

This is inexcusable behavior, especially since the distinction is pretty simple. /m has to do with anchors. /s has to do with dots. Let's start by looking at /s. The ``any'' character, ., does not actually match any character; by default, it matches any character except for a newline. So for instance, this won't match:


    "This is my\nmulti-line string" =~ /This.*string/;

Don't just take my word for it. Get into the habit of trying out these things for yourself; with Perl's -e switch, it's very easy to make up a quick test of regular expression behavior if you're unsure:


    % perl -e 'print "Matched!" if "This is my\nmulti-line string" =~
        /This.*string/;'

As predicted, it doesn't print Matched!.

This newline-phobia only relates to the . operator. It's nothing to do with regular expressions in general. If we use something other than a . to match the stuff in the middle, it will work:


    "This is my\nmulti-line string" =~ /This\D+string/;

This matches the first This, then one or more things that aren't digits, and then string. Because \n isn't a digit - and nor is anything else between This and string - the regular expression will match.

So the dot operator won't match a newline. If we want to change the behavior of the dot operator, we can use the /s modifier to the regular expression.


    "This is my\nmulti-line string" =~ /This.*string/s;

This time, it matches. If you're using the . operator in your regular expressions and you want it to be able to cross over newline boundaries, use the /s modifier. However, you can sometimes get the same result without using /s by choosing another way of matching.
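If you'd rather see the three cases side by side than run three one-liners, a small script along these lines should do. The variable names are mine; the patterns are the ones from the text:

```perl
use strict;
use warnings;

my $text = "This is my\nmulti-line string";

# Without /s the dot stops at the newline, so .* cannot bridge the two lines.
my $plain = ($text =~ /This.*string/) ? 1 : 0;

# With /s the dot matches the newline as well.
my $dot_all = ($text =~ /This.*string/s) ? 1 : 0;

# \D+ (one or more non-digits) crosses the newline with no modifier at all.
my $non_digit = ($text =~ /This\D+string/) ? 1 : 0;

printf "%-12s %s\n", 'no modifier', $plain     ? 'matched' : 'no match';
printf "%-12s %s\n", 'with /s',     $dot_all   ? 'matched' : 'no match';
printf "%-12s %s\n", 'using \D+',   $non_digit ? 'matched' : 'no match';
```

Only the first pattern fails to match.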

What about anchors? Well, there are two possible things that we might want anchors to do with a multi-line string. We might want them to match the start or end of any line in the string, or we might want them to match the start or end of the whole thing. Let's back up a little, and then see how the /m modifier can be used to choose between these two possible behaviors.

First, let's try something that we know doesn't work.


    "This is my\nmulti-line string" =~ /^(.*)$/;

This wants to match the start of the string, any amount of stuff that's not a newline and the end of the string. But we know that there is a newline between the start of the string and the end, so it won't match. We could, of course, allow . to match a newline using the /s trick we've just learnt, and then we can capture the whole lot:


     % perl -e 'print $1 if "This is my\nmulti-line string" =~ /^(.*)$/s'
     This is my
     multi-line string

But instead, we could use the /m modifier. Let's see what happens if we do that:


     % perl -e 'print $1 if "This is my\nmulti-line string" =~ /^(.*)$/m' 
     This is my

Aha! This time, we've changed the meanings of the anchors - instead of matching just the start and end of the string, they now match the start or end of any line in the string.

What happens when Perl runs this regular expression? Let's pretend we're the regular expression engine for a brief, mad moment.

We start at the beginning of the string. The ^ anchor tells us to match the beginning of a line, which is handy, since we're at one of those right now. Now we match and capture any amount of stuff - so long as it isn't a newline. This takes us up to This is my, and as the next character is a newline, that is where we must stop. Next, we have the $ anchor. Now without the /m modifier, this would want to find the end of the string. We're not at the end of the string - there's \nmulti-line string left to go - so without the /m modifier this match would fail. That's what happened just above.

However, this time we do have the /m modifier, so the meaning of $ has changed. This time, it means the end of any line in the string. As we've had to stop at the \n, that would mean we're at the end of a line. So that means that our $ matches, and the whole expression matches and all is well.

What if we use both the /m and /s modifiers here? Let's see:


    % perl -e 'print $1 if "This is my\nmulti-line string" =~ /^(.*)$/ms'
    This is my
    multi-line string

Well, it looks the same as when we had just used /s. Why? Because we do have /s, the .* can eat up absolutely everything right up to the end of the string. Now our /m-enabled $ matches the end of any line in the string, and indeed we are at the end of the second line in the string, so this matches too. In this case, the /m is superfluous.
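Here are all four combinations in one place, as a sketch you can paste into a file and run. The show helper is only there to make the embedded newline visible and is not part of anything standard:

```perl
use strict;
use warnings;

my $text = "This is my\nmulti-line string";

# In list context a failed match returns nothing, leaving the variable undefined.
my ($none) = $text =~ /^(.*)$/;
my ($s)    = $text =~ /^(.*)$/s;
my ($m)    = $text =~ /^(.*)$/m;
my ($ms)   = $text =~ /^(.*)$/ms;

# Hypothetical helper: print a capture with its newline shown as \n.
sub show {
    my ($label, $got) = @_;
    $got = defined $got ? $got : '(no match)';
    $got =~ s/\n/\\n/g;
    print "$label: $got\n";
}

show('no modifiers', $none);
show('/s',           $s);
show('/m',           $m);
show('/ms',          $ms);
```

No modifiers gives no match, /m captures only the first line, and both /s and /ms capture the whole string.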

Another trick to avoid confusion is to use explicit newlines in your expression. For instance, if you're dealing with data like this:


    Name: Mark-Jason Dominus
    Occupation: Perl trainer
    Favourite thing: Octopodes

    Name: Simon Cozens
    Occupation: Hacker
    Favourite thing: Sleep

then you can split it up with a newline-embedded regexp like so:


    /^Name: (.*)\nOccupation: (.*)\nFavourite thing: (.*)/

This time we don't need any modifiers at all - we want the .* to stop before the newline, and then the explicit newlines themselves obviate the need for start-of-line or end-of-line anchors. In our next article, we'll see how to use the /g modifier to read in multiple records.
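As a quick check, here is that expression applied to a single record. The variable names are mine, and reading several records at once is left for the /g discussion:

```perl
use strict;
use warnings;

my $record = "Name: Simon Cozens\nOccupation: Hacker\nFavourite thing: Sleep\n";

# Each .* stops at the end of its own line; the explicit \n moves on to the next.
my ($name, $job, $thing) =
    $record =~ /^Name: (.*)\nOccupation: (.*)\nFavourite thing: (.*)/;

print "$name is a $job who likes $thing\n";
```

The last capture stops before the trailing newline, so there is nothing to chomp.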

So those are the two rules for dealing with multi-line strings: /s changes the behavior of the dot operator. Without /s, . will not match a newline. With /s, . truly matches anything. On the other hand /m changes the behavior of the anchors ^ and $; without /m, these anchors only match the start and end of the whole string. With /m, they match the start or end of any line inside the string.

Spacing, Commenting and Quoting Regexps

Another modifier like /s and /m is /x; /x changes the behavior of whitespace inside a regular expression. Without /x, a literal space inside a regex matches a space in the string. This makes sense:


    "A string" =~ /A string/;

You would expect this to match, and without /x, it does match. Phew. With /x, however, the match fails. Why is this? /x strips literal whitespace of any meaning. If we want to match A string, we have to use either the \s whitespace character class or some other shenanigans:


    "A string" =~ /A\sstring/x;
    "A string" =~ /A[ ]string/x;

How can this conceivably be useful? Well, for a start, by removing the meaning of white space inside a regular expression, we can use whitespace at will; this is particularly useful to help us space out complicated expressions. The rather unpleasant


    ($postcode) = 
        ($address =~ 
    /([A-Z]{1,2}\d{1,3}[ \t]+\d{1,2}[A-Z][A-Z]|[A-Z][A-Z][\t ]+\d{5})/);

becomes the slightly more manageable


    ($postcode) = 
        ($address =~
    /(
          [A-Z]{1,2}\d{1,3} [ \t]+ \d{1,2} [A-Z][A-Z]
        | [A-Z][A-Z] [\t ]+ \d{5}
    )/x);

Without /x, we would be looking for literal spaces, tabs and carriage returns inside our postcode, which really wouldn't work out as we want.
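A short sketch of both effects of /x, using a made-up address chosen to fit the UK half of the pattern. Note that whitespace inside a bracketed character class keeps its meaning even under /x:

```perl
use strict;
use warnings;

# Under /x the space in the pattern is ignored, so this is really /Astring/.
my $ignored = ("A string" =~ /A string/x) ? 1 : 0;

# Inside a bracketed character class the space keeps its meaning.
my $kept = ("A string" =~ /A[ ]string/x) ? 1 : 0;

# The spaced-out postcode pattern, tried on an illustrative address.
my $address = "Oxford OX1 3QD";
my ($postcode) = $address =~ /(
      [A-Z]{1,2} \d{1,3} [ \t]+ \d{1,2} [A-Z][A-Z]   # UK style
    | [A-Z][A-Z] [\t ]+ \d{5}                        # US style
)/x;

print "bare space under x: ", ($ignored ? 'matched' : 'no match'), "\n";
print "bracketed space:    ", ($kept    ? 'matched' : 'no match'), "\n";
print "postcode:           $postcode\n";
```

The bare space fails to match, the bracketed one succeeds, and the postcode comes out as OX1 3QD.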

Another advantage of using /x is that it allows us to add comments to our regular expression, helping to make the example above even more maintainable:


    ($postcode) = 
        ($address =~
    /(
        # UK Postcode:
          [A-Z]{1,2} # Post town
          \d{1,3}    # Area 
          [ \t]+ 
          \d{1,2}    # Region
          [A-Z][A-Z] # Street part
        | 
        # US Postcode:
          [A-Z][A-Z]   # State
          [\t ]+ 
          \d{5}        # ZIP+5
    )/x);

Of course, to make it still tidier, we can put regular expression components into variables:


    my $post_town = '[A-Z]{1,2}';
    my $area      = '\d{1,3}';
    my $space     = '[ \t]+';
    my $region    = '\d{1,2}';
    my $street    = '[A-Z][A-Z]';
    
    my $uk_postcode = "$post_town $area $space $region $street";
    ...

Because variables are interpolated inside regular expressions:


    ($postcode) = 
        ($address =~ /($uk_postcode|$us_postcode)/x);

Perl 5.005 introduced the ability to package up regular expressions into variables using the qr// operator. This acts just like q// except that it follows the quoting, escaping and interpolation rules of the regular expression match operator. In our example above, we had to use single quotes for the ``basic'' components, and then double quotes to get the interpolation when we wanted to string them all together into $uk_postcode. Now, we can use the same qr// operator for all the parts of our regular expression:


    my $post_town = qr/[A-Z]{1,2}/;
    my $area      = qr/\d{1,3}/;
    my $space     = qr/[ \t]+/;
    my $region    = qr/\d{1,2}/;
    my $street    = qr/[A-Z][A-Z]/;
   
And we can also add modifiers to parts of a quoted regular expression:

    my $uk_postcode = qr/$post_town $area $space $region $street/x;

Because the modifiers are packaged up inside their own little component, we can ``mix and match'' modifiers inside a single regular expression. If, for instance, we want to match part of it case-insensitively and some case-sensitively:


    my $prefix = qr/zip code: /i;
    my $code   = qr/[A-Z][A-Z][ \t]+\d{5}/;

    $address =~ /$prefix $code/x;

In this example, the prefix part ``knows'' that it has to match case-insensitively and the code part ``knows'' that it should match case-sensitively like any other normal regular expression.
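Putting the qr// pieces together gives something you can actually run. The test strings below are invented for illustration; everything else comes from the examples above:

```perl
use strict;
use warnings;

my $post_town = qr/[A-Z]{1,2}/;
my $area      = qr/\d{1,3}/;
my $space     = qr/[ \t]+/;
my $region    = qr/\d{1,2}/;
my $street    = qr/[A-Z][A-Z]/;

# /x applies only to this outer expression; the spaces between parts are ignored.
my $uk_postcode = qr/$post_town $area $space $region $street/x;

my $prefix = qr/zip code: /i;                 # case-insensitive, spaces are literal
my $code   = qr/[A-Z][A-Z][ \t]+\d{5}/;       # case-sensitive

my ($uk)       = "Oxford OX1 3QD" =~ /($uk_postcode)/;
my $loud       = ("ZIP CODE: CA 90210" =~ /$prefix $code/x) ? 1 : 0;
my $lower_code = ("zip code: ca 90210" =~ /$prefix $code/x) ? 1 : 0;

print "UK postcode: $uk\n";
print "upper-case prefix accepted: $loud\n";
print "lower-case state accepted:  $lower_code\n";
```

The upper-case prefix is accepted because only $prefix carries /i, while the lower-case state code is rejected because $code is still case-sensitive.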

Another boon of using quoted regular expressions is a little off-the-wall. We can actually use them to create recursive regular expressions. For instance, an old chestnut is the question ``How do I extract parenthesized text?''. Well, such a simple problem turns out to be quite nasty to solve using regular expressions. Here's a simple-minded approach:


    $paren = qr/ \( [^)]+ \) /x;

This simple approach works in simple cases:


    "Some (parenthesized) text" =~ /($paren)/;
    print $1; # (parenthesized)

But fails in complex cases:


    "Some (parenthesised and (gratuitously) sub-parenthesised text"
        =~ /($paren)/;
    print $1; # (parenthesised and (gratuitously)

Oops. Our expression sees the first closing paren and stops. We need to find a way to tell it to count the number of opening and closing parens and make sure they're balanced before finishing. This actually turns out to be tremendously difficult, and the solution is too messy to show here. Regular expressions are not meant for iterative solutions.

Regular expressions aren't really meant for recursive solutions either, but if we have recursive regular expressions, we can define our balanced-paren expression like this: first match an opening paren; then match a series of things that can be non-parens or another balanced-paren group; then a closing paren. Turned into Perl code, this becomes:


    $paren = qr/
      \(
        ( 
           [^()]+  # Not parens
         | 
           $paren  # Another balanced group
        )*
      \)
    /x;

This is almost there, but it's not quite correct. Because qr// compiles a regular expression, it does the interpolation right there and then. And when our expression is compiled $paren isn't defined yet, so it's interpolated as an empty string, and we don't get the recursion.

That's OK. We can tell the expression not to interpolate the $paren quite yet with the super-secret regular expression ``don't interpolate this bit yet'' operator: (??{ }). (It has two question marks to remind you that it's doubly secret.) Now we have


    $paren = qr/
      \(
        ( 
           [^()]+  # Not parens
         | 
           (??{ $paren })  # Another balanced group (not interpolated yet)
        )*
      \)
    /x;

When this is run on some text like (lambda (x) (append x '(hacker))), the following happens: we see our opening paren, so all is well. Then we see some things which are not parens (lambda ) and all is still well. Now we see (, which definitely is a paren. Our first alternative fails, we try the second alternative. Now it's finally time to interpolate what's inside the double-secret operator, which just happens to be $paren. And what does $paren tell us to match? First, an open paren - ooh, we seem to have one of those handy. Then some things which are not parens, such as x, and then we can finish this part of the match by matching a close paren. This polishes off the sub-expression, so we can go back to looking for more things that aren't parens, and so on.
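To watch the walk-through above happen, here is a runnable sketch. It differs from the listing in two small ways that are my own choices: $paren is declared with my before it is assigned, so that the code block can refer to it under strict, and the inner group is non-capturing so that the only capture is the one added at match time:

```perl
use strict;
use warnings;

my $paren;    # declared first so the (??{ }) block can see it
$paren = qr/
  \(
    (?:
       [^()]+          # Not parens
     |
       (??{ $paren })  # Another balanced group, looked up at match time
    )*
  \)
/x;

my $lisp = "(lambda (x) (append x '(hacker)))";
my ($whole) = $lisp =~ /^($paren)$/;

my ($group) =
    "Some (parenthesised and (gratuitously) sub-parenthesised) text"
        =~ /($paren)/;

print "$whole\n";
print "$group\n";
```

The first match covers the whole Lisp expression, and the second returns the complete outer group rather than stopping at the first closing paren.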

Of course, if we need to get this confusing, you might wonder why we're using a regular expression at all. Thankfully, there's a much easier way of doing things: the Text::Balanced module helps extract all kinds of balanced, quoted and tagged texts, and this is one of the things we'll look at in our next article, next month.
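As a small taste, and only as a sketch, the module's extract_bracketed function handles the nested case directly. The sample text is invented; in list context the function returns the extracted group first and the remainder second:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = "(gratuitously (nested) parens) and the rest";

# Ask for a balanced run of round brackets at the start of the text.
my ($extracted, $remainder) = extract_bracketed($text, '()');

print "extracted: $extracted\n";
print "remainder:$remainder\n";
```

Note that the bracketed text has to come at the start of the string (leading whitespace aside), which is a different contract from an unanchored regular expression.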

In Conclusion

Regular expressions are like a microcosm of the Perl language itself: it's simple to use them to do simple things with, and most of the time you only need to do simple things with them. But sometimes you need to do more complex things, and you have to start digging around in the dark corners of the language to pull out the slightly more complex tools.

Hopefully this article has shed a little light on some of the dark corners: for dealing with multi-line strings and making expressions more readable with quoting and interpolation. In the next article, we'll look at the dreaded look-ahead and look-behind operators, splitting up text with more than just split, and some CPAN modules to help you get all this done.

This week on Perl 6, week ending 2003-06-01

Another Monday, another Perl 6 Summary. Does this man never take a holiday? (Yes, but only to go to Perl conferences this year, how did that happen?)

We start with the internals list as usual.

More on timely destruction

The discussion of how to get timely destruction with Parrot's Garbage Collection system continued. Last week Dan announced that, for languages that make commitments to call destructors as soon as a thing goes out of scope, there would be a set of helpful ops to allow languages to trigger conditional DOD (Dead Object Detection) runs at appropriate times. People weren't desperately keen on the performance hit that this entails (though the performance hit involved with reference counting is pretty substantial...) but we didn't manage to come up with a better solution to the issue.

http://groups.google.com/groups

Bryan C. Warnock, patchmonster of the week

Bryan C. Warnock seems to be attempting to outdo Leo Tötsch in the patchmonster stakes this week. He put in a miscellany of patches dealing with the Perl based assembler, opcode sizes, debugging flags and probably others. Most of them were applied with alacrity.

The Perl 6 Essentials book

Dan Sugalski gave a rundown of how the Perl 6 Essentials book came about, what's in it and all that jazz. He started by apologizing for not mentioning it before, but he thought he had. This led Clint Pierce to wonder if there was something up with Dan's Garbage Collection system. The existence of the book probably goes some way to explaining Leo Tötsch's relative silence over the last few weeks. Nicholas Clark wondered if it explains why Parrot doesn't have objects yet. Brent Dax wondered when it would be available (by OSCON this year apparently).

http://groups.google.com/groups

IMCC, PASM and constants/macros

Clint Pierce had some big headaches with moving his BASIC interpreter over to IMCC owing to problems with .constant which is legal for the assembler, but not for IMCC. Leo Tötsch pointed Clint at IMCC's .const operator. Bryan Warnock wondered if IMCC and the assembler's syntax couldn't be unified. Leo noted that it wasn't quite that straightforward because .constant declares an untyped constant, but .const requires a type as well. It turns out that .const wasn't quite what Clint needed, so Leo pointed him at .sym and .local which do seem to do what he needs.

http://groups.google.com/groups

3-arg opens

Bryan Warnock wondered if

    open I3, "temp.file", "r"

was valid code. Answer, no, the right way to do it is the Perlish open I3, "temp.file", "<". Jürgen Bömmels promised more and better documentation for the Parrot IO system. Eventually.

http://groups.google.com/groups

Smaller PMCs

Leo Tötsch's work on the new PMC layout continues apace. I'm afraid I don't quite understand what's going on in this area, which does make it rather tricky to summarize things. It seems to have a good deal to do with memory allocation and garbage collection... Leo thinks that it's the right thing, but there seem to be issues involved with good ways of allocating zeroed memory.

http://groups.google.com/groups

An experimental Wiki for parrot development

Mitchell N Charity has put up an experimental Wiki for Parrot and primed it with a few things. Stéphane Payrard pointed out that it's rather hard to make a WikiWord from, for example, PMC. (10 points to the first person to email p6summarizer@bofh.org.uk with the expansion of PMC).

http://groups.google.com/groups

IMCC packfile bug

While toying with pbc2c.pl, Luke Palmer discovered that it doesn't want to play with IMCC generated .pbc files. Apparently this is because we currently have two bytecode file formats. Leo Tötsch thought the problem lay with assemble.pl which is old and slow and doesn't produce 'proper' parrot bytecode. Leo also thought that the way pbc2c.pl worked wasn't actually any use. Dan reckoned the time had come to ditch assemble.pl too, and reckoned there was a case for renaming IMCC as parrot since it can run either .pbc or assembly files. Leo liked the idea, but is concerned about the state of the Tinderbox.

http://groups.google.com/groups

Method Calling

Dan tantalized all those waiting eagerly for objects in Parrot by discussing how to make method calls. This, of course, means a few new ops, called findmeth, callmeth and callccmeth for the time being. Jonathan Sillito had a few naming consistency issues with the ops. Dan agreed there were issues and asked for suggestions for an opcode naming convention.

http://groups.google.com/groups

Simple Constant Propagation in IMCC

Matt Fowles posted a patch to add simple constant propagation to IMCC. Essentially this means that, say

    set I0, 5
    set I1, 2
    set I2, I1
    add I2, I0

would compile as if it were:

    set I0, 5
    set I1, 2
    set I2, 7

Leo Tötsch liked the idea, modified it slightly and added it to the code base, but disabled. Apparently there are problems with it, but it's a good starting framework. There need to be lots more tests though...
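For anyone who hasn't met the technique, here is a toy version in Perl. It is emphatically not Matt's patch (IMCC is written in C and works on its own data structures); the propagate function and the triple format are invented for illustration, and only set and add are understood:

```perl
use strict;
use warnings;

# Toy constant propagation and folding over [op, dest, source] triples.
sub propagate {
    my @in = @_;
    my (%known, @out);
    for my $ins (@in) {
        my ($op, $dest, $src) = @$ins;
        # Replace a register operand by its constant value when we know it.
        $src = $known{$src} if exists $known{$src};
        if ($op eq 'set' && $src =~ /^-?\d+$/) {
            $known{$dest} = $src;
        }
        elsif ($op eq 'add' && exists $known{$dest} && $src =~ /^-?\d+$/) {
            # Both operands are constants: fold the add into a set.
            ($op, $src) = ('set', $known{$dest} + $src);
            $known{$dest} = $src;
            # Drop the previous store if this one overwrites it; safe here
            # only because nothing reads the register in between.
            pop @out if @out && $out[-1][0] eq 'set' && $out[-1][1] eq $dest;
        }
        else {
            delete $known{$dest};    # value no longer known at compile time
        }
        push @out, [$op, $dest, $src];
    }
    return @out;
}

my @code = (['set', 'I0', 5], ['set', 'I1', 2], ['set', 'I2', 'I1'], ['add', 'I2', 'I0']);
my @result = map { "$_->[0] $_->[1], $_->[2]" } propagate(@code);
print "$_\n" for @result;
```

Run on the four instructions above, it prints the same three-instruction sequence shown in the summary.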

http://groups.google.com/groups

Make mine SuperSized...

Bryan Warnock (in his own words) popped in to 'waffle on Parrot's core sizes'. He proposed a way of drastically simplifying Parrot's type system. He and Gopal V had a long discussion that I didn't quite follow. I think Leo thinks that what Bryan proposes is doable, but I'm not entirely sure whether he thinks it's a good idea...

http://groups.google.com/groups

Register allocation in IMCC

Clint Pierce had some problems with IMCC's register allocation. He posted an example that gave problems and wondered if the problem was with him or with IMCC. Leo Tötsch confirmed that it was a bug. Luke Palmer pointed Clint at find_global and friends as the 'correct' way to solve the problem. For bonus points, Clint showed off a pathological example of why BASIC should not be anyone's favourite language.

http://groups.google.com/groups

Meanwhile, in perl6-language

Cothreads

As if the Coroutine thread wasn't confusing enough, we now have the Cothread thread, in which Michael Lazzaro argued that we should blur the distinctions between coroutines and threads. Dave Whipp pointed everyone at 'Austin Hastings' draft for A17 (threads)' and argued that, whilst Coroutines, threads, closures, and various other things that Michael had argued were aspects of the same thing were related, they were sufficiently different that bundling them all up behind a single class would lead to badness (``a bloated blob with a fat interface'' was the phrase he used).

This thread saw even more unrestrained speculation than usual and saw the first use on the Perl 6 lists of the adjective 'Cozeny', from Simon Cozens, possibly meaning ``feeling that what is being discussed is over fussy and generally trying to take the language a long way from what Real Programmers need''. This would seem to imply a verb form 'to Cozen', ``To more or less forcibly express one's Cozeny feelings''.

I'm afraid this was another thread I had a hard time following. I reckon there are some interesting ideas in there, but I'm hoping that someone will pull it all together in an RFC type document so I can go ``Remember that Cothreads thread last week? Leon Brocard summarized it all neatly in a single proposal, you can find it here.'' (Except it almost certainly won't be Leon Brocard, it'll be Mike Lazzaro, Leon doesn't seem to do perl6-language very much).

http://groups.google.com/groups

http://archive.develooper.com/perl6-language@perl.org/msg14771.html -- Austin Hastings on threads

Compile time binding

In an effort to learn about Perl 6, Luke Palmer has been reading about Haskell. For reasons he doesn't understand, this set him to wondering what ::= is supposed to mean -- it means 'compile time binding', but what does that mean?

Damian Conway came through with the goods, summarizing his answer as ::= is to := as a macro call is to a subroutine call.

http://groups.google.com/groups

Threads and Progress Monitors

Dave Whipp had some more thread questions, and wondered what would be a good Perl 6ish way of implementing a threaded progress monitor. Whilst the discussion of all this was interesting, I'm not sure that it's really much to do with the language, more something that one would implement according to taste and the particular requirements of a given project.

http://groups.google.com/groups

Exegesis 6 Status Update

Damian announced that Exegesis six is mostly written, and should be undergoing final revisions while he and Larry are on the Perl Whirl. Hopefully we'll see the Exegesis before YAPC::America::North.

http://groups.google.com/groups

Acknowledgements, Announcements and Apologies

Thanks once again are due to all the good people on the Perl 6 lists. Apologies will almost certainly be due to the organizers of YAPC North America as I still haven't started writing the talks I'm supposed to be giving.

As I noted last week, I'm awarding points (and points mean prizes) to those kind people who spotted the deliberate mistake. Smylers gets 100 points for spotting the accidental mistake (last week was not in 2004.) Sam Smith, David Wheeler, David Cantrell and Leon Brocard all earned 50 points for spotting the deliberate mistake of not mentioning Leon Brocard. But they've helped me make up for it this week by mentioning him twice, so the karmic balance is restored.

The points I have awarded can be redeemed for the following, wonderful prizes:

  1. A lifetime subscription to the Perl 6 summaries.
  2. Er...
  3. That's it

If you've appreciated this summary, please consider one or more of the following options:

  • Send money to the Perl Foundation at http://donate.perl-foundation.org/ and help support the ongoing development of Perl.
  • Get involved in the Perl 6 process. The mailing lists are open to all. http://dev.perl.org/perl6/ and http://www.parrotcode.org/ are good starting points with links to the appropriate mailing lists.
  • Send feedback, flames, money, photographic and writing commissions, or patches to Camelbones making it possible to make Perl classes that inherit from Objective C classes (heck, if Ruby and Python can do it) to p6summarizer@bofh.org.uk.