September 1999 Archives

Topaz: Perl for the 22nd Century


Introduction

One of the more interesting talks at the O'Reilly 1999 Open Source Convention was by Chip Salzenberg, one of the core developers of Perl. He described his work on Topaz, a new effort to completely rewrite the internals of Perl in C++. The following article is an abridged version of the transcript of this talk that provides the basic context for Topaz and the objectives for this new project. You can also listen to the complete 85-minute talk using the RealPlayer.

Listen to Chip Salzenberg's Topaz talk! Choose either RealAudio or download the MP3.

Topaz is a project to re-implement all of Perl in C++. If it comes to fruition, if it actually works, it's going to be Perl 6. There is, of course, the possibility that for various reasons, things may change and it may not really work out, so that's why I'm not really calling it Perl 6 at this point. Occasionally I have been known to say, "It will be fixed in Perl 6," but I'm just speaking through my hat when I say that.

Who's doing it? Well, it's me mostly for now because when you're starting on something like this, there's really not a lot of room to fit more than one or two people. The core design decisions can't be done in a bazaar fashion (with an "a"), although obviously they can be bizarre (with an "i").

When? The official start was last year's Perl conference. I expected to have something more or less equivalent to Perl 4 by, well, now. That was a little optimistic.

So how will it be done? Well, it's being done in C++, and there are some reasons for that, one of which is, of course, I happen to like C++. Actually the very first discussion/argument on the Perl 6 Porters mailing list was what language to use. We had some runners-up that actually were under serious consideration.

Choosing A Systems Programming Language

Objective C has some nice characteristics. It's simple and, with a GNU implementation, it is pretty much available everywhere. The downside is that Objective C has no equivalent of inline functions, so you'd have to resort to heavy use of macros again, which is something I'd like to get away from. Also, it doesn't have any support for namespaces, which means that the entire mess we currently have would have to be carried forward: maintaining a separate list of external functions that need to be renamed by the preprocessor during compilation so that you don't conflict with somebody else when you embed it in another program. I really hate that part. Even though it's well done, it's just one of those things you wish you didn't have to do.

In C++ you solve that problem by saying "namespace Perl open curly brace," and the rest is automatic. So that is the reason why Objective C fell out of the running.
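As a minimal sketch of that point (not Topaz source; the `Host` namespace, the `Scalar` type and `new_scalar` are hypothetical names invented for illustration), two libraries can export the same function name without any preprocessor renaming table, because each definition lives in its own namespace:

```cpp
#include <string>

// Hypothetical embedding program that happens to use the same names.
namespace Host {
    struct Scalar { std::string text; };
    inline Scalar new_scalar(const std::string& s) { return Scalar{"host:" + s}; }
}

// Everything the interpreter exports sits inside one namespace, so there is
// nothing to rename in order to avoid clashes with the embedder's symbols.
namespace Perl {
    struct Scalar { std::string text; };
    inline Scalar new_scalar(const std::string& s) { return Scalar{"perl:" + s}; }
}

// Both definitions coexist in one program; callers choose by qualification.
inline std::string describe_both(const std::string& s) {
    return Host::new_scalar(s).text + " " + Perl::new_scalar(s).text;
}
```

Calling `describe_both("x")` uses both `new_scalar` functions side by side, which in C would require one of them to be renamed.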

Eiffel actually was a serious contender for a long time. That is, until I realized that to get decent performance, Eiffel compilers—or I should say the free Eiffel compiler, because there are multiple implementations—needed to do analysis at link-time as to all the classes that were actually in the program. Eiffel has no equivalent of declaring member functions—I'm using the C++ terminology—declaring them to be virtual or nonvirtual. It intuits this by figuring out the equivalent of the Java characteristic final, i.e., I have no derived classes, at link-time. And so it says, well, if there are no derived classes, then therefore I can just inline this function call. Which is clever and all, but the problem is that Topaz must be able to load classes dynamically at run time and incorporate them into the class structure, and so obviously anything that depends on link-time analysis is right out. So that was the end of Eiffel.

Ada, actually, as a language has much to recommend it. Conciseness is not one of its virtues, but it does have some good characteristics. I do secretly tend toward the bondage and discipline style of programming, i.e., the more you tell the compiler, the more it can help you to enforce the things you told it. However, the only free implementation of Ada, at least the only one I'm aware of, GNAT, is written in Ada. This is an interesting design decision and it obviously helped them. They obviously like Ada so they use it, right? The problem is that if Perl 6 were written in Ada, it would require people to bootstrap GNAT before they could even get to Perl. That's too much of a burden to put on anybody.

So, we're left with C++. It's rather like the comment that I believe Winston Churchill is reported to have made about democracy: It's absolutely the worst system except for all the others. So, C++ is the worst language we could have chosen, except for all the others.

So, where will it run? The plan is for it to run anywhere that there's an ANSI C++ compiler. Those of you who have seen the mini-series Shogun might remember when the pilot is supposed to learn Japanese, and if he doesn't learn it the entire village will be killed. He can't stand the possibility of all these deaths being on his head so he's about to commit suicide and finally the shogun says, "Well, whatever you learn, it will be considered enough," and so then he's okay with it. Well, that's kind of how I feel about Visual C++. Whatever Visual C++ implements, we shall call that "enough," because I really don't think that we can ignore Windows as a target market. If nothing else, we need the checklist item: works on Windows. Otherwise the people who don't understand what's going on will refuse to use Perl in situations where they really need to.

So, you know, unless there's an overriding reason why it's absolutely impossible, we will support Visual C++. We will use ANSI features as much as possible, because ANSI C++ really is a well-done specification for C++, with a few minor things I don't like, but Visual C++ is so common we really just can't afford to ignore it.

As for non-Windows platforms, and even for Windows platforms for some people, EGCS (which actually has now been renamed to GCC 2.95) is a really amazingly good implementation of the C++ language. The kinds of bugs and features that they're working on, on the mailing list, are so esoteric that it actually takes me two or three read-throughs of just the description of a bug before I understand it. The basic stuff is no problem at all.

The ANSI C++ library for EGCS/GCC is really not all that good at this point, but work is under way on that. I expected them to be more or less done by now, but obviously they're not. I still expect them to be done by the next conference. It's just that the next conference is now the conference 4.0. By then I hope that we'll be able to use that library in the Topaz implementation.

Now, the big question:

Why in the world would I do such a thing? Or rather start the ball rolling? Well, the primary reason was difficulty in maintenance. Perl's guts are, well, complicated. Nat Torkington described them well. I believe he said that they are "an interconnected mass of livers and pancreas and lungs and little sharp pointy things and the occasional exploding kidney." It really is hard to maintain Perl 5. Considering how many people have had their hands in it, it's not surprising that this is the situation. And you really need indoctrination in all the mysteries and magic structures and so on before you can really hope to make significant changes to the Perl core without breaking more things than you're adding.

Some design decisions have made certain bugs really hard to get rid of. For example, the fact that things on the stack do not have the reference counts incremented has made it necessary to fake the reference counting in some circumstances, à la the mortality concept, for those of you who have been in there.
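For readers who have not been in there, the idea can be sketched roughly as follows. This is a simplified illustration only, loosely modelled on the mortality concept and not taken from the Perl 5 source; `Value`, `make_mortal`, `free_mortals` and the global vectors are hypothetical names. The stack stores raw pointers without touching reference counts, so a temporary that nobody owns is marked "mortal" and its one reference is dropped later, in bulk.

```cpp
#include <cstddef>
#include <vector>

struct Value {
    int refcount;
    int payload;
};

static int g_freed = 0;                  // counts destroyed values, for demonstration

inline Value* new_value(int payload) {
    Value* v = new Value();
    v->refcount = 1;                     // the creator holds the only reference
    v->payload = payload;
    return v;
}

inline void release(Value* v) {
    if (--v->refcount == 0) { delete v; ++g_freed; }
}

// The argument stack holds raw pointers and does NOT increment refcounts.
static std::vector<Value*> g_stack;
// Mortal values: their reference is released later, all at once.
static std::vector<Value*> g_mortals;

inline Value* make_mortal(Value* v) {
    g_mortals.push_back(v);              // defer the release instead of doing it now
    return v;
}

inline void free_mortals() {
    for (std::size_t i = 0; i < g_mortals.size(); ++i) release(g_mortals[i]);
    g_mortals.clear();
}

inline int demo() {
    // A temporary result goes on the stack. Nobody owns it, so it is marked
    // mortal rather than released immediately.
    g_stack.push_back(make_mortal(new_value(42)));
    int seen = g_stack.back()->payload;  // still alive while it sits on the stack
    g_stack.pop_back();
    free_mortals();                      // end of statement: now it is destroyed
    return seen;
}
```

The cost of this arrangement is that every piece of code pushing a temporary has to remember the deferred release, which is the kind of bookkeeping the paragraph above describes as hard to get right.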

Really, when you think about it, the number of people who can do that sort of deep work because they're willing to or have been forced to put enough time into understanding it, is very limited, and that's bad for Perl, I think. It would be better if the barrier to entry to working on the core were lower. Right now the only thing that's really accessible to everyone is the surface language, so anytime anybody has the feeling that they want to contribute to Perl, the only thing they know how to do is suggest a new feature. I hope in the future they'll be able to do things like suggest an improvement in the efficiency layer or something like that.

The secondary reason actually is new features. There are some features there where people say, "Yeah, I want that just cuz it's cool." First of all, dynamic loading of basic types—and I'll give an example of that later—the basic concept is if you want to invent a new thing like a B-tree hash, you shouldn't have to modify the Perl core for that. You should just be able to create an add-on that's dynamically loaded and inserts itself and then you'd be able to use it.
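A rough sketch of what such an add-on mechanism might look like is given here. It is an assumption made for illustration, not the Topaz design: `HashImpl`, `register_hash_type`, `make_hash` and `OrderedHash` are hypothetical names, `std::map` stands in for a real B-tree, and the step of actually loading a shared library at run time is omitted.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical interface that every hash-like container would implement.
struct HashImpl {
    virtual ~HashImpl() {}
    virtual void store(const std::string& key, const std::string& value) = 0;
    virtual std::string fetch(const std::string& key) const = 0;
};

typedef std::unique_ptr<HashImpl> (*HashFactory)();

// Registry the core consults by name; add-ons insert themselves here.
inline std::map<std::string, HashFactory>& hash_registry() {
    static std::map<std::string, HashFactory> registry;
    return registry;
}

inline void register_hash_type(const std::string& name, HashFactory factory) {
    hash_registry()[name] = factory;
}

// Returns a null pointer when no add-on has registered the requested name.
inline std::unique_ptr<HashImpl> make_hash(const std::string& name) {
    std::map<std::string, HashFactory>::const_iterator it = hash_registry().find(name);
    if (it == hash_registry().end()) return std::unique_ptr<HashImpl>();
    return it->second();
}

// --- What an add-on might contain (std::map stands in for a B-tree) ---
struct OrderedHash : HashImpl {
    std::map<std::string, std::string> data;
    void store(const std::string& key, const std::string& value) { data[key] = value; }
    std::string fetch(const std::string& key) const {
        std::map<std::string, std::string>::const_iterator it = data.find(key);
        return it == data.end() ? std::string() : it->second;
    }
};

inline std::unique_ptr<HashImpl> make_ordered_hash() {
    return std::unique_ptr<HashImpl>(new OrderedHash());
}
```

An add-on would call `register_hash_type("btree", make_ordered_hash)` when it is loaded, after which the core can create the new type through `make_hash("btree")` without having been modified or recompiled.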

Robust byte-code compilation is another such feature. Now, in complete honesty, I don't know. I haven't looked at the existing byte-code compilation output, but I do know from examining how the internals work that retrofitting something like that is quite difficult. If you incorporate it into the structure of the OP-tree (for those of you who know what that is, the basic operations), there's the concept of a separation between designing the semantic tree (as in "this is what I want") versus designing the runtime representation for efficient execution. Once you've made that separation, now you can also have a separate implementation of the semantic tree, which is, say, just a list of byte codes that would be easy to write to a file and then read back later. So, separation of representing the OP-tree statically versus what you use dynamically is an important part of that part of the internals.
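The separation described here can be illustrated with a toy arithmetic language. This is a sketch under stated assumptions and bears no relation to the real OP-tree layout: `Node`, `emit`, `run` and the three opcodes are hypothetical. The tree records only what the program means, a lowering pass turns it into a flat list of integers, and the runtime executes that list without ever seeing the tree.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Semantic tree: describes what is wanted, nothing about how to execute it.
struct Node {
    enum Kind { Const, Add, Mul } kind;
    int value;                           // used only when kind == Const
    std::unique_ptr<Node> lhs, rhs;      // used only for Add and Mul
};

inline std::unique_ptr<Node> constant(int v) {
    std::unique_ptr<Node> n(new Node());
    n->kind = Node::Const;
    n->value = v;
    return n;
}

inline std::unique_ptr<Node> binary(Node::Kind k, std::unique_ptr<Node> l,
                                    std::unique_ptr<Node> r) {
    std::unique_ptr<Node> n(new Node());
    n->kind = k;
    n->value = 0;
    n->lhs = std::move(l);
    n->rhs = std::move(r);
    return n;
}

enum Op { PUSH = 0, ADD = 1, MUL = 2 };

// Lowering: walk the tree once and emit postfix byte code.
inline void emit(const Node& n, std::vector<int>& code) {
    if (n.kind == Node::Const) {
        code.push_back(PUSH);
        code.push_back(n.value);
        return;
    }
    emit(*n.lhs, code);
    emit(*n.rhs, code);
    code.push_back(n.kind == Node::Add ? ADD : MUL);
}

// Runtime: a small stack machine that only ever sees the byte code.
inline int run(const std::vector<int>& code) {
    std::vector<int> stack;
    for (std::size_t pc = 0; pc < code.size(); ++pc) {
        if (code[pc] == PUSH) {
            stack.push_back(code[++pc]); // the operand follows the opcode
            continue;
        }
        int rhs = stack.back(); stack.pop_back();
        int lhs = stack.back(); stack.pop_back();
        stack.push_back(code[pc] == ADD ? lhs + rhs : lhs * rhs);
    }
    return stack.back();
}

// Builds the tree for (2 + 3) * 4 and lowers it to byte code.
inline std::vector<int> compile_example() {
    std::unique_ptr<Node> tree = binary(Node::Mul,
        binary(Node::Add, constant(2), constant(3)), constant(4));
    std::vector<int> code;
    emit(*tree, code);
    return code;
}
```

Because the lowered form is just a vector of integers, writing it to a file and reading it back later is straightforward, which is the property the talk is after.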

Also, something that could be done currently but nobody's gotten around to: Micro Perl. Now if you've built Perl, you've noticed that there's a main Perl, and then there's Mini Perl, which you always expect to have a little price tag hanging off of, and then there's the concept of Micro Perl, which is even smaller than Mini Perl. The idea here is: what parts of Perl can you build without any knowledge that Configure would give you? Or perhaps with only very, very few Configure tests. For example, we could assume ANSI or we could assume pseudo-POSIX. In any case, even if you limit yourself to ANSI, you've got quite a bit of stuff. You, of course, have all the basic internal data structures in the language. You can make a call to system, to spawn children, and a few other things, and that basically gives you Perl as your scripting language. Then you can write Configure in Micro Perl. I don't know about you, but I'd much rather use Micro Perl as a language for configuration than sh, because who knows what particular weird variant of sh you're going to have, and really isn't it kind of a pain to have to spawn an external test program just to see if two strings are equal? Come on. Okay, so that's also part of the plan. We could do this with Perl 5, and who knows, maybe now that I've mentioned it somebody will, but that's also something I have in mind.

Why not use C? Certainly C does have a lot to recommend it. The necessity of using all those weird macros for namespace manipulation, which I'd rather just use the namespace operator for, and the proliferation of macros are all disadvantages. Stroustrup makes the persuasive argument that every time you can eliminate a macro and replace it with an inline function or a const declaration or something of that sort, you are benefiting yourself because the preprocessor is so uncontrolled and all of the information from it is lost when you get to the debugger. So I'd prefer to use C++ for that reason.
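A small, generic illustration of that argument follows. The names `MACRO_MAX`, `inline_max` and `kStackLimit` are made up for this sketch and do not come from the Perl source. The macro pastes its argument text into the expansion, so an argument with a side effect can run twice, while the inline function evaluates each argument exactly once and remains visible to the compiler's type checker and to the debugger.

```cpp
// Function-like macro: the argument text is substituted verbatim.
#define MACRO_MAX(a, b) ((a) > (b) ? (a) : (b))

// Typed replacements for a constant macro and a function-like macro.
const int kStackLimit = 1024;            // instead of: #define STACK_LIMIT 1024
inline int inline_max(int a, int b) { return a > b ? a : b; }

// Counts how many times the increment ran when passed through the macro.
inline int increments_with_macro() {
    int i = 5;
    int result = MACRO_MAX(i++, 0);      // expands to ((i++) > (0) ? (i++) : (0))
    (void)result;
    return i - 5;                        // the increment ran twice
}

// Counts how many times the increment ran when passed to the inline function.
inline int increments_with_inline() {
    int i = 5;
    int result = inline_max(i++, 0);     // the argument is evaluated once
    (void)result;
    return i - 5;                        // the increment ran once
}
```

The two helper functions differ only in which form of "max" they call, yet they leave `i` in different states, which is the kind of surprise the preprocessor makes easy.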

Would it be plausible to use Perl, presumably Perl 5, to automatically generate parts of Perl 6? And the answer is yes, that absolutely will be done. The equivalent of what is now opcode.pl will still exist, and it will be generating a whole bunch of C++ classes to implement the various types of OPs.

A perfect Perl doesn't have systems programming as part of its target problem domain. That's what C++ and C and those other languages are for. Those are systems programming languages. Perl is an application language, and in fact one of the things that I really felt uncomfortable about with Eiffel was that it also is really an applications programming language. Take the whole concept of pointers and pointer arithmetic and memory management. If you read Meyer's new book, the chapter on memory management begins with "Ideally, we would like to completely forget about memory management." And I thought to myself, well that's great if you're doing applications, but for systems programming, that's nuts. It's an example of what the language is for. When I was trying to figure out how to be persuasive on this subject, I finally realized that Perl may be competing with Java in the problem space, but when you're implementing the Perl runtime, really what you're doing is something equivalent to writing a JVM. You're writing the equivalent of a Java Virtual Machine. Now, would you write a JVM in Eiffel? I don't think so. No, so neither would you write the Perl runtime in Java or in Eiffel.

How or Why Perl Changes

The language changes only when Larry says so. What he has said on this subject is that anything that is officially deprecated is fair game for removal. Beyond that I really need to leave things as is. He's the language designer. I'm the language implementer, at least for this particular project. It seems like a good separation of responsibilities. You know, I've got enough on my plate without trying to redesign the language.

Larry is open to suggestions, and in fact that was an interesting discussion we had recently on the Perl 5 Porters mailing list: was the syntax appropriate for declaring variables to give appropriate hints to a hypothetical compiler? That is to say, my int $x or my str $y. I thought that the int and the str and the num should be suffixes, something like my $x : num, and in fact that suffix syntax is something that Larry officially has blessed, but just not for this purpose. That's the instinct of the language designer coming to the fore, saying that something that is a string or a number should not be so hard to type. It should read better.

Meanwhile, if you want to declare something as being a reference to a class, my Dog $spot, that's going to work. You can say that $spot, when it has a defined value, will be a reference to an object of type Dog, or at least of a type that is compatible with Dog, and the syntax is already permitted in the Perl parser; it doesn't do very much yet, but that will be more fully implemented in the future as well. Many of the detailed aspects of this came about not just springing fully formed from Larry's forehead but as a result of discussion. So yes, he absolutely is taking suggestions.

Getting into the Internals

Now I'd like to ask how many of you do not know anything about C++? Okay, a fair number, so I'm going to have to explain—everyone else is lying. Two kinds of people: people who say that they know C++ and the truthful. Okay. C++ is complicated, definitely. Actually that reminds me, I'm doing this in C++ and I use EMACS. Tom Christiansen asked me, "Chip, is there anything that you like that isn't big and complicated?" C++, EMACS, Perl, Unix, English—no, I guess not.

At this point, Chip begins to dive rather deep into a discussion of the internals. You can listen to the rest of his talk if you are interested in these details.

Open Source Highlights


Overview

August 21 - 24, 1999

My main reason for attending the Open Source Conference was to observe Open Source developments and to gather business intelligence for Chevron. I learned Python. I also concentrated on understanding the business case for Open Source and on correctly understanding and interpreting events in the industry.

Learning Python and Python for Windows.

Armed with my recent experience of ploughing through the most obfuscated Perl code, I chose to learn Python, a well-constructed, object-oriented language. Python was created by Guido van Rossum, who named it after his favorite TV show, Monty Python's Flying Circus. Python is easy to read and handles object-oriented programming in a natural and easy-to-learn way. Development in Python is fast. It also tends to encourage clarity in human communication, as its very execution depends on the use of white space and indentation. The Python development environment (IDLE) also, rather neatly, enables "grep" searches for strings in any Unix or NT files.

I attended an excellent Windows Python tutorial, which emphasized using Python with the array of Windows functions from COM and the registry, as a macro language, and as a test harness for other systems, in addition to its "normal" function of data processing and as systems "glue". Much of the tutorial attended to COM processing with Excel and Word examples, databases, systems administration, C++ and DLLs. Python Programming on Win32 will be published in November 1999.

I also asked about Python and LDAP and was able to locate sites at www.it.uq.edu.au/~leonard/dc-prj/ldapmodule and http://sites.inka.de/ms/python/ldap-client.cgi. Python LDAP calls are well constructed and clearer to read than Perl, but more research is needed here.

Keynote Speech—"Rules for Revolutionaries"

Guy Kawasaki, a venture capitalist and previously an evangelist for Apple, gave the keynote "Rules for Revolutionaries". His speech was both funny and inspirational. He suggested ten things to do to succeed, starting with examples of change in food preservation and regaled us with stories drawn from personal experience and the computer industry. Guy gave good advice for anyone attempting change. I have a video copy of this speech should anyone like to see it. See also book references, "Crossing the Chasm" by Geoffrey Moore and "Rules for Revolutionaries" by Guy Kawasaki. The mechanism in a revolution, he reminded us, was not "a rising tide floats all boats"...but..."in a tornado even turkeys can fly". He emphasized, it is our objective as revolutionaries to create the tornado.

People and Information.

I spoke with VA/Linux to clarify their confidence in the small-margin market of selling hardware with pre-loaded Linux. They have been in operation since 1993 and have a lot of Linux turnkey experience, particularly in service and support. Among their talents is Beowulf installation; I thought they might be of interest to our high-end computing people. Certainly their $899 personal computers are blindingly fast, and I consider they have good prospects when they go public. According to John Vrolyk of SGI (Silicon Graphics), VA/Linux has a strong business model. SGI has invested a large amount and is co-developing software.

Armed with knowledge of Unilever's success in managing a Sendmail backbone and IMAP connectivity with Outlook desktops, I considered it worth asking about the economics and possibility of running a similar system at Chevron. The Sendmail people were not able to present any base figures and told me their Director of Corporate Accounts would contact me. I said there was a point in doing that only if we could work up some comparative figures. Point pending.

I spoke with Derrick Story of O'Reilly and Bob McMillan of Linux Magazine, both of whom were encouraging in wanting to keep in contact and to consider any articles I might care to write. I have been published before, so it's a reasonable stretch. I also met with Andrew Leonard of Salon, who wants to include the story of Open Source at Chevron in a new book he's writing on the movement.

"The State of Python" Address

Guido van Rossum gave the keynote address, indicating increased interest in Python. During August alone, over 8,000 copies of the Windows version were downloaded and the Python website had over 63,000 hits. Guido reviewed the recent successes of Python in web development packages (Zope), Mailman, JPython, Windows (COM and ASP), XML, Open Classroom, Industrial Light and Magic (Star Wars), Yahoo and Lawrence Livermore Labs. He then referred to CP4E (Computer Programming for Everybody) and outlined why he expects Python to take over from Pascal in education. The next release of Python will be issued in 2000 and Python 2.0 in 2001. DARPA is supporting further development of the IDLE developer environment.

Linux in Wearable Computer Research

Thad Starner from Georgia Tech is a most friendly, intelligent and innovative man. He described the status of wearable research and how he personally uses a wearable for all his computing needs. He described why Linux is an ideal choice for research and alluded to the advantages of Linux:

  • Research needs several tries to do anything good...and since, at the start you don't know what you are doing (that's research), you need to be able to make changes quickly.
  • Market-driven research is a fallacy. Consumers don't know what they want and even though they may express interest, don't know enough to express what is reasonable.
  • It's a real problem when research degenerates into struggling with the interface problem of a proprietary system.
  • Commercial packages create balkanization of projects, where groups find it difficult to talk to one another. Code is "idea" exchange.
  • No black walls round bits of code makes training easier.
  • Complicated machines are possible within small budgets.
  • The usual arguments of Linux giving flexibility, stability, scalability, obsolescence protection, real time, drivers, rapid prototyping, remote access and networking at low cost.
  • Need for greater than 640 x 480 displays

Other factors he mentioned were a low fixed cost base for embedded devices, a great community, and easy porting to other platforms. Dr Starner then spoke of how he runs his research administration and announced the Wearable Computer Conference (ISWC), to be held on October 18-19, 1999 at the Cathedral Hill Hotel in San Francisco. The web site is http://iswc.gatech.com.

Other subjects he covered were wearable research to support the deaf or blind, and gambling, particularly the work of Shannon and Thorp, who did research with shoe-based computers in Las Vegas, running simulations timing the ball, the rotation of the wheel, etc., giving them a 44% advantage over the house. Dr Starner gave me names for "blind" research: Collins, who created a camera/tactile blind navigation system; John Goldwaite from Georgia Tech; and David Ascher, who taught me Python, at a San Francisco sight research organization (address da@python.net).

Getting back to wearables, Thad described the keyboard, the Twiddler (a combination keyboard and mouse enabling sixty wpm input), the retinal projection system and the now credit-card-sized processor. Very short-range wireless and IR are used for communication. He described the non-intrusive collaboration that is enabled and a nice remembrance agent that works under EMACS, and also how he used this to sit for his Ph.D. This technology has much significance for Chevron in supporting the disabled in computing and, more mainstream, in refinery and pipeline "hands-off" computer work.

Current research covers tracking fingertips, glasses that attach links to physical reality, messaging and a form of active badges, a baseball-cap-mounted sign-language-to-English translator, and circuits sewn into clothing. The future will include wearables. There are 8 billion computers on the planet, only two percent of which are desktops. Cray-like power can be carried and very short-range wireless communications used. Cell phones will run an OS...lots of other stuff...smart dust, etc.

References: http://www.gvu.gatech.edu/ccg and http://www.media.mit.edu/wearables. Andy Barrow has a tape recording of this talk should you wish to hear it.

Making the Business Case to Management for Open Source

Barry Caplin, a manager of USWest, gave a presentation on how to make the case for open source to management. The detail slides are at www.users.uswest.net/~bcaplin/talks

Barry spoke of management fears about Open Source, the problems with proprietary systems, and how to make the case. The last of these deserves a summary here. Making the case consists of:

  • Gather the information
  • Journal the current situation
  • Journal the company's current capabilities and skills
  • Determine company's needs and goals
  • Identify players and allies
  • Identify top-tech minds
  • Get feedback from a sympathetic manager
  • Identify people you have to convince and target the presentation accordingly
  • Publish a White Paper

There was considerable debate during question time about the economic viability of Open Source. Issues were discussed but hardly resolved (see Keynote—"Extreme Business" by John Vrolyk of SGI for a more definitive process).

The White Paper should be a "living document", address itself to the core purpose, vision and corporate culture, should contain some degree of "comfort factor" and must contain:

  • An Executive Summary.
  • Relevant Company History.
  • A Summary Expertise Matrix of People Skills
  • The Criteria for Choice including scalability, security, robustness, risk, cost of conversion, lowered operational cost, stability, training and standardization, advantages of shared code and shared people.
  • A Plan to Integrate other proprietary and commercial products.
  • Summary and Conclusions.

The White Paper must not contain too much opinion or technical depth; details can be discussed later.

Keynote Speech - "Sun and Open Source"

Bill Joy of Sun described the BSD Unix and vi editor developments and the difficulties and successes he had experienced before he joined Sun. He emphasized the strength of copyright, rather than contract law, in enforcing "good behavior", implying that the GNU General Public License created by Richard Stallman in 1989 was a particularly good method of ensuring that Open Source would not fragment.

"Extreme Business" - Is Linux Economically Viable?

This address, given by John Vrolyk, a senior VP of Silicon Graphics, was very impressive. I have an audio tape should anyone wish to listen to it. Mr Vrolyk considered the next phase of Linux development would be the alignment around brands. He had some interesting things to say:

  • The OS is a commodity, no end-user really cares what it is.
  • Microsoft should concentrate on desktop applications.
  • SGI had released their IRIX journalling file system to Open Source.
  • SGI, HP and Intel are guaranteeing smooth transfer of Linux to 64-bit chips.

Vrolyk illustrated economic sustainability using water as an example. Water is free. But Perrier and Pellegrino seem to do very well. Case closed.

Regarding business models, he alluded to those of the VA/Linux (turnkey), Sun (envelop), IBM (just throw money), SGI (hardware and service), O'Reilly (publishing), Stonehenge (training) or Red Hat (GIL-like distribution and service) type and said it was unclear which would succeed, but SGI was investing in VA/Linux as one with good potential. He also added that the industry in general is turning away from having to go cap-in-hand for compliance testing against Redmond's proprietary and often secret standards. This is a revolution. "Stupid ideas," he said, "only last for any time in large corporations". I reflected on recent events in Chevron. We have positioned ourselves reasonably well for the tsunami about to hit.

©Chevron Corporation.

White Camel Awards


The Perl Conference brought us the first "White Camel" awards. The White Camel Awards were created to honor individuals who devote remarkable creativity, energy, and time to the non-technical work that supports Perl's active and loyal user community. The awards were conceived of and administered by Perl Mongers, a not-for-profit organization whose mission is to establish Perl user groups. In addition to Perl Mongers, O'Reilly and sourceXchange sponsored the awards.

I recently had the chance to talk with the winners and ask them a few questions about the White Camel awards and their contributions to the Perl community.

Kevin Lenzo

You won the "Perl Community" White Camel Award for YAPC. Briefly (50 pages or less), what was YAPC and what went into planning it?

Yet Another Perl Conference is a grassroots Perl conference, and the first one (YAPC 1999) was hosted at Carnegie Mellon University. YAPC 19100 will stay in Pittsburgh, incubating here for another year, and then I'd like to find another host city somewhere in Eastern North America—probably somewhere there's a Perl Monger's stronghold near a university.

The conference was cheap: $60 conference cost, including food, covering the two-day event. We actually ran a little short on the budget in the end, but this was pried out of me in the closing session, and the community had $1000 on the stage in about 30 minutes! You don't see that very often. The next one will probably cost a wee bit more, just to avoid the shortfall, but the whole thing is intended to be zero profit.

The aim of YAPC is to be an affordable, regional conference, that people can get to, and that students, hobbyists and enthusiasts can get to without selling their computers. We had some great speakers show up and volunteer their time, also—I think the speakers did a great job, even when the mechanics of the conference occasionally broke down.

Next year, we have some interesting challenges—we're not the best-kept secret anymore. Even with double the capacity, I think we will have to turn people away, and we'll surely get more talk submissions than we'll be able to show. One goal I have is to bring some discussion of the internals and directions of Perl itself more to the next YAPC, and from the response so far, I'd say we'll be seeing it.

What does receiving the White Camel award mean to you personally?

It's pretty overwhelming—and I've felt that way ever since the conference. In a way it's a physical symbol of the goodwill that almost broke me up at the end of YAPC. It's not something one can scheme to get!

Does the award mean more to you because you were chosen by your peers?

Quite. I feel like I've been elected as a Tribune by my peers, and it's my responsibility to carry that trust, and use any power it may grant to battle evil magistrates.

Has receiving the award renewed your enthusiasm or have you always been active in the Perl community?

I've had increasing activity in the Perl community over the last five or so years. I'd like to point out that Internet Relay Chat, newsgroups, and mailing lists have been really important to me, and I'd say the EFNet IRC channel #perl really helped catalyze YAPC. If I have any great involvement, that was the gateway to it.

As far as renewing my enthusiasm—it had never waned. We do have some serious issues ahead of us in carrying Perl to the next level, as a language and as a platform. Perhaps my enthusiasm is tempered by the looming due diligence.

How do you think the White Camels will affect the Perl community?

I think the awards certainly help raise awareness, both inside and out, of the Perl community, which appears to be becoming self-aware. This sort of recognition certainly helps me when I ask major speakers to come and speak at YAPC. The award has, in some ways, legitimized me with respect to Perl, so that I can ask about things and speak of it as "for Perl." I think these awards are strong indicators to the places folks work, for instance, that their involvement with Perl is justified, and I know Carnegie Mellon University took it quite seriously. It has freed me to officially focus, but it has also brought certain responsibilities. The award made me look "presidential," as they say in election years, and that apparent legitimacy helps me work for Perl.

Do you have any exciting plans for your newfound fame and fortune?

Well, just gearing up for the next YAPC. If you've seen any of the Infobot work, you know I have an interest in group communication and discussion. Well, I'd really like to expose the planning process of YAPC itself through interactive means and web sites that can be group-authored. I'd like to see the community helping to structure the conference, and give feedback during the planning stages—to bring groupware to the whole conference planning process. It is a community event, after all.

I'd like to mention one other thing I'm working on here at CMU, though: we're about to release Sphinx, a major speech recognition engine, as open source, and I'm intending to make a Perl module to go with it, so you can start talking to your desktops. Speech isn't good for everything, and there are lots of things it's inappropriate for, but it's nice to say "turn on the lights in the kitchen." Desktop agents, global communication using the net, and speech interaction are going to change the way we work and talk, and I'd like to see speech technology available in Perl.

Adam Turoff

You won the "Perl User Group" White Camel Award. What were some of the things you did for the Perl community to earn this award?

  • Founding member of NY.pm
  • Founded PHL.pm, with the monthly "social" dinners
  • Started the monthly tech meeting series for PHL.pm
  • Started the biweekly perl reading group for PHL.pm
    (at this point, we meet 4 weeks/month)
  • Started the (mostly) weekly reading group at ISI (my employer).
  • Organized the signup lists at TPC2.0 (helping ~2-4 dozen groups form or grow IIRC)
  • Perl evangelism at general tech events (conferences, LUG meetings)
  • Free Stuff!! (Mostly FreeBSD goodies at large events.)
  • Hardware scout for pm.org. (Many thousand thank you's to our anonymous donor, Elaine Ashton.)

What does receiving the White Camel award mean to you personally?

I think it means that the Perl Community is hitting the next level. Two years ago, the Perl Community wasn't very well formed, except for groups like p5p and other people interested in extending Perl technically. Today, we have three categories where people can be rewarded for making non-technical contributions to the Perl Community.

Taking Perl to the next level means bringing Perl to new audiences and in new directions. One way to do that is to promote Perl user groups where we are interested in discussing common CGI idioms, venting about Microsoft, swapping Twilight Zone stories, talking about Postmodernism or sharing really cool observations about Perl. All at the same time, of course.

Has receiving the award renewed your enthusiasm or have you always been active in the Perl community?

I've been active in the Perl Community since TPC 1.0. [TPC 1.0 is The Perl Conference 1.0]

How do you think the White Camels will affect the Perl community?

The Perl community is both quite social and very eclectic. I hope that the White Camels (specifically the user groups and community awards) will help to highlight and encourage user groups everywhere to continue these traditions.

One of the themes that has been circulating since the conference is that of the quiet majority of Perl users. We may not be as big or as vocal as the Java or Win* communities, but we use Perl and get the job done without making lots of noise. That makes it more difficult to advocate Perl and help it to grow in new directions. I hope that over the next few years, Perl advocacy grows and becomes more effective and more visible. I can't wait to see the White Camel awards for Advocacy over the next few years.

Do you have any exciting plans for your new found fame and fortune?

Taking phl.pm out to dinner, and buying an exotic computer or two.


These were some of the first people who distinguished themselves enough in the Perl community to earn White Camel Awards. The awards were an excellent idea, and the presence of peer awards such as these will undoubtedly affect the Perl community. Any Perl programmer would be very proud to display a White Camel award of their own.

Bless My Referents


Introduction

Damian Conway is the author of the newly released Object Oriented Perl, the first of a new series of Perl books from Manning.

Object-oriented programming in Perl is easy. Forget the heavy theory and the sesquipedalian jargon: classes in Perl are just regular packages, objects are just variables, methods are just subroutines. The syntax and semantics are a little different from regular Perl, but the basic building blocks are completely familiar.

The one problem most newcomers to object-oriented Perl seem to stumble over is the notion of references and referents, and how the two combine to create objects in Perl. So let's look at how references and referents relate to Perl objects, and see who gets to be blessed and who just gets to point the finger.

Let's start with a short detour down a dark alley...

References and referents

Sometimes it's important to be able to access a variable indirectly, that is, to be able to use it without specifying its name. There are two obvious motivations: the variable you want may not have a name (it may be an anonymous array or hash), or you may only know which variable you want at run-time (so you don't have a name to offer the compiler).

To handle such cases, Perl provides a special scalar datatype called a reference. A reference is like the traditional Zen idea of the "finger pointing at the moon". It's something that identifies a variable, and allows us to locate it. And that's the stumbling block most people need to get over: the finger (reference) isn't the moon (variable); it's merely a means of working out where the moon is.

Making a reference

When you prefix an existing variable or value with the unary \ operator you get a reference to the original variable or value. That original is then known as the referent to which the reference refers.

For example, if $s is a scalar variable, then \$s is a reference to that scalar variable (i.e. a finger pointing at it) and $s is that finger's referent. Likewise, if @a is an array, then \@a is a reference to it.

In Perl, a reference to any kind of variable can be stored in another scalar variable. For example:

$slr_ref = \$s;     # scalar $slr_ref stores a reference to scalar $s
$arr_ref = \@a;     # scalar $arr_ref stores a reference to array @a
$hsh_ref = \%h;     # scalar $hsh_ref stores a reference to hash %h
Figure 1 shows the relationships produced by those assignments.

Note that the references are separate entities from the referents at which they point. The only time that isn't the case is when a variable happens to contain a reference to itself:

$self_ref = \$self_ref;     # $self_ref stores a reference to itself!
That (highly unusual) situation produces an arrangement shown in Figure 2.

Once you have a reference, you can get back to the original thing it refers to (its referent) simply by prefixing the variable containing the reference (optionally in curly braces) with the appropriate variable symbol. Hence to access $s, you could write $$slr_ref or ${$slr_ref}. At first glance, that might look like one too many dollar signs, but it isn't. The $slr_ref tells Perl which variable has the reference; the extra $ tells Perl to follow that reference and treat the referent as a scalar.

Similarly, you could access the array @a as @{$arr_ref}, or the hash %h as %{$hsh_ref}. In each case, the $whatever_ref is the name of the scalar containing the reference, and the leading @ or % indicates what type of variable the referent is. That type is important: if you attempt to prefix a reference with the wrong symbol (for example, @{$slr_ref} or ${$hsh_ref}), Perl produces a fatal run-time error.

[A series of scalar variables with arrows pointing to other variables]
Figure 1: References and their referents

[A scalar variable with an arrow pointing back to itself]
Figure 2: A reference that is its own referent
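
The steps described in this section can be tried out in a few lines. What follows is only a minimal sketch: the variable names echo the ones used above, the sample values are invented for illustration, and the eval block is simply one convenient way to trap the fatal error that a wrong prefix produces.

```perl
use strict;
use warnings;

my $s = "moon";
my @a = (1, 2, 3);
my %h = (first => "one");

# Take references: each one is an ordinary scalar value
my $slr_ref = \$s;
my $arr_ref = \@a;
my $hsh_ref = \%h;

# Dereference by prefixing with the symbol for the referent's type
my $copy  = ${$slr_ref};           # same as $$slr_ref
my $count = scalar @{$arr_ref};    # number of elements in @a
my @keys  = keys %{$hsh_ref};      # keys of %h

# Writing through the reference changes the original variable
${$slr_ref} = "finger";
print "\$s is now: $s\n";

# The wrong prefix is a fatal run-time error, trapped here with eval
my $ok = eval { my @oops = @{$slr_ref}; 1 };
print "Error: $@" unless $ok;
```

Note that assigning through ${$slr_ref} alters $s itself: the reference is only the finger, but it does lead to the real moon.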

The "arrow" operator

Accessing the elements of an array or a hash through a reference can be awkward using the syntax shown above. You end up with a confusing tangle of dollar signs and brackets:

${$arr_ref}[0] = ${$hsh_ref}{"first"};  # i.e. $a[0] = $h{"first"}
So Perl provides a little extra syntax to make life just a little less cluttered:
$arr_ref->[0] = $hsh_ref->{"first"};    # i.e. $a[0] = $h{"first"}
The "arrow" operator (->) takes a reference on its left and either an array index (in square brackets) or a hash key (in curly braces) on its right. It locates the array or hash that the reference refers to, and then accesses the appropriate element of it.
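
As a quick check that the two notations really are interchangeable, here is a small sketch. The array, the hash, and their contents are made up for the purpose of the example.

```perl
use strict;
use warnings;

my @a = ();
my %h = (first => "one");
my ($arr_ref, $hsh_ref) = (\@a, \%h);

# Explicit dereferencing and the arrow notation reach the same elements
${$arr_ref}[0] = ${$hsh_ref}{"first"};
$arr_ref->[1]  = $hsh_ref->{"first"};

# Both assignments landed in the original array @a
print "$a[0] $a[1]\n";
```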

Identifying a referent

Because a scalar variable can store a reference to any kind of data, and because dereferencing a reference with the wrong prefix leads to fatal errors, it's sometimes important to be able to determine what type of referent a specific reference refers to. Perl provides a built-in function called ref that takes a scalar and returns a description of the kind of reference it contains. Table 1 summarizes the string that is returned for each type of reference.

If $slr_ref contains...                 then ref($slr_ref) returns...
a scalar value                          "" (the empty string)
a reference to a scalar                 "SCALAR"
a reference to an array                 "ARRAY"
a reference to a hash                   "HASH"
a reference to a subroutine             "CODE"
a reference to a filehandle             "IO" or "IO::Handle"
a reference to a typeglob               "GLOB"
a reference to a precompiled pattern    "Regexp"
a reference to another reference        "REF"


Table 1: What ref returns

As Table 1 indicates, you can create references to many kinds of Perl constructs, apart from variables.
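
Most rows of Table 1 are easy to confirm for yourself. The sketch below collects the strings that ref returns for several kinds of reference; the variables are placeholders invented for the example, and the filehandle row is left out because its result varies between Perl versions.

```perl
use strict;
use warnings;

my $s = 42;
my @a;
my %h;

my @kinds = (
    ref(\$s),           # reference to a scalar
    ref(\@a),           # reference to an array
    ref(\%h),           # reference to a hash
    ref(sub { 1 }),     # reference to a subroutine
    ref(\*STDOUT),      # reference to a typeglob
    ref(qr/bug/),       # precompiled pattern
    ref(\\$s),          # reference to another reference
);
print join(", ", @kinds), "\n";

# A plain (non-reference) value produces a false result
print "not a reference\n" unless ref($s);
```

Because ref gives a false result for anything that is not a reference, it doubles as a simple "is this a reference at all?" test.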

If a reference is used in a context where a string is expected, then the ref function is called automatically to produce the expected string, and a unique hexadecimal value (the internal memory address of the thing being referred to) is appended. That means that printing out a reference:

print $hsh_ref, "\n";
produces something like:
HASH(0x10027588)
since each element of print's argument list is stringified before printing.
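
The same stringification happens whenever a reference is interpolated into a string. In this small sketch the hash contents are invented, and the address printed will differ from run to run and from machine to machine.

```perl
use strict;
use warnings;

my %h = (first => "one");
my $hsh_ref = \%h;

# Interpolating a reference stringifies it
my $as_string = "$hsh_ref";
print "$as_string\n";

# Two references to the same referent stringify identically
my $other = \%h;
print "same referent\n" if "$other" eq "$hsh_ref";
```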

The ref function has a vital additional role in object-oriented Perl, where it can be used to identify the class to which a particular object belongs. More on that in a moment.

References, referents, and objects

References and referents matter because they're both required when you come to build objects in Perl. In fact, Perl objects are just referents (i.e. variables or values) that have a special relationship with a particular package. References come into the picture because Perl objects are always accessed via a reference, using an extension of the "arrow" notation.

But that doesn't mean that Perl's object-oriented features are difficult to use (even if you're still unsure of references and referents). To do real, useful, production-strength, object-oriented programming in Perl you only need to learn about one extra function, one straightforward piece of additional syntax, and three very simple rules. Let's start with the rules...

Rule 1: To create a class, build a package

Perl packages already have a number of class-like features:

  • They collect related code together;
  • They distinguish that code from unrelated code;
  • They provide a separate namespace within the program, which keeps subroutine names from clashing with those in other packages;
  • They have a name, which can be used to identify data and subroutines defined in the package.
In Perl, those features are sufficient to allow a package to act like a class.

Suppose you wanted to build an application to track faults in a system. Here's how to declare a class named "Bug" in Perl:

package Bug;
That's it! In Perl, classes are packages. No magic, no extra syntax, just plain, ordinary packages. Of course, a class like the one declared above isn't very interesting or useful, since its objects will have no attributes or behaviour.

That brings us to the second rule...

Rule 2: To create a method, write a subroutine

In object-oriented theory, methods are just subroutines that are associated with a particular class and exist specifically to operate on objects that are instances of that class. In Perl, a subroutine that is declared in a particular package is already associated with that package. So to write a Perl method, you just write a subroutine within the package that is acting as your class.

For example, here's how to provide an object method to print Bug objects:

package Bug;
sub print_me
{
       # The code needed to print the Bug goes here
}
Again, that's it. The subroutine print_me is now associated with the package Bug, so whenever Bug is used as a class, Perl automatically treats Bug::print_me as a method.

Invoking the Bug::print_me method involves that one extra piece of syntax mentioned above—an extension to the existing Perl "arrow" notation. If you have a reference to an object of class Bug, you can access any method of that object by using a -> symbol, followed by the name of the method.

For example, if the variable $nextbug holds a reference to a Bug object, you could call Bug::print_me on that object by writing:

$nextbug->print_me();
Calling a method through an arrow should be very familiar to any C++ programmers; for the rest of us, it's at least consistent with other Perl usages:
$hsh_ref->{"key"};           # Access the hash referred to by $hsh_ref
$arr_ref->[$index];          # Access the array referred to by $arr_ref
$sub_ref->(@args);           # Access the sub referred to by $sub_ref
$obj_ref->method(@args);     # Access the object referred to by $obj_ref
The only difference with the last case is that the referent (i.e. the object) pointed to by $obj_ref has many ways of being accessed (namely, its various methods). So, when you want to access that object, you have to specify which particular way (which method) should be used. Hence, the method name after the arrow.

When a method like Bug::print_me is called, the argument list that it receives begins with the reference through which it was called, followed by any arguments that were explicitly given to the method. That means that calling Bug::print_me("logfile") is not the same as calling $nextbug->print_me("logfile"). In the first case, print_me is treated as a regular subroutine so the argument list passed to Bug::print_me is equivalent to:

( "logfile" )
In the second case, print_me is treated as a method so the argument list is equivalent to:
( $nextbug, "logfile" )
Having a reference to the object passed as the first parameter is vital, because it means that the method then has access to the object on which it's supposed to operate. Hence you'll find that most methods in Perl start with something equivalent to this:
package Bug;
sub print_me
{
    my ($self) = shift;
    # The @_ array now stores the arguments passed to &Bug::print_me
    # The rest of &print_me uses the data referred to by $self 
    # and the explicit arguments (still in @_)
}
or, better still:
package Bug;
sub print_me
{
    my ($self, @args) = @_;
    # The @args array now stores the arguments passed to &Bug::print_me
    # The rest of &print_me uses the data referred to by $self
    # and the explicit arguments (now in @args)
}
This second version is better because it provides a lexically scoped copy of the argument list (@args). Remember that the @_ array is "magical"—changing any element of it actually changes the caller's version of the corresponding argument. Copying argument values to a lexical array like @args prevents nasty surprises of this kind, as well as improving the internal documentation of the subroutine (especially if a more meaningful name than @args is chosen).
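
To see why the copy matters, compare two tiny subroutines. Both are hypothetical helpers written only for this illustration: the first changes its argument through @_, the second copies the argument into a lexical variable first.

```perl
use strict;
use warnings;

# Hypothetical helper: modifies the caller's variable through @_
sub shout_in_place {
    $_[0] = uc $_[0];
    return;
}

# Hypothetical helper: works on a lexical copy instead
sub shout_copy {
    my ($text) = @_;
    $text = uc $text;
    return $text;
}

my $first = "fatal";
shout_in_place($first);            # $first itself has been changed

my $second = "fatal";
my $loud = shout_copy($second);    # $second is left untouched

print "$first $second $loud\n";
```

The same reasoning applies to methods: once $self and the remaining arguments have been copied out of @_, nothing the method does to those copies can leak back into the caller's variables.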

The only remaining question is: how do you create the invoking object in the first place?

Rule 3: To create an object, bless a referent

Unlike other object-oriented languages, Perl doesn't require that an object be a special kind of record-like data structure. In fact, you can use any existing type of Perl variable—a scalar, an array, a hash, etc.—as an object in Perl.

Hence, the issue isn't how to create an object, because you create one exactly like any other Perl variable: declare it with a my, or generate it anonymously with a [...] or {...}. The real problem is how to tell Perl that such an object belongs to a particular class. That brings us to the one extra built-in Perl function you need to know about. It's called bless, and its only job is to mark a variable as belonging to a particular class.

The bless function takes two arguments: a reference to the variable to be marked, and a string containing the name of the class. It then sets an internal flag on the variable, indicating that it now belongs to the class.

For example, suppose that $nextbug actually stores a reference to an anonymous hash:

$nextbug = {
                id    => "00001",
                type  => "fatal",
                descr => "application does not compile",
           };
To turn that anonymous hash into an object of class Bug you write:
bless $nextbug, "Bug";
And, once again, that's it! The anonymous hash referred to by $nextbug is now marked as being an object of class Bug. Note that the variable $nextbug itself hasn't been altered in any way; only the nameless hash it refers to has been marked. In other words, bless sanctified the referent, not the reference. Figure 3 illustrates where the new class membership flag is set.

You can check that the blessing succeeded by applying the built-in ref function to $nextbug. As explained above, when ref is applied to a reference, it normally returns the type of that reference. Hence, before $nextbug was blessed, ref($nextbug) would have returned the string 'HASH'.

Once an object is blessed, ref returns the name of its class instead. So after the blessing, ref($nextbug) will return 'Bug'. Of course the object itself still is a hash, but now it's a hash that belongs to the Bug class. The various entries of the hash become the attributes of the newly created Bug object.

[A picture of an anonymous hash having a flag set within it]
Figure 3: What changes when an object is blessed
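
A short experiment makes the "referent, not reference" point concrete. This is just a sketch: the second variable, $alias, is an addition made for the demonstration and holds another reference to the same anonymous hash.

```perl
use strict;
use warnings;

my $nextbug = {
    id    => "00001",
    type  => "fatal",
    descr => "application does not compile",
};

my $before = ref($nextbug);     # the type of the referent
my $alias  = $nextbug;          # a second reference to the same hash

bless $nextbug, "Bug";

my $after     = ref($nextbug);  # now the class name
my $via_alias = ref($alias);    # the other reference sees the blessing too

print "$before -> $after (alias sees: $via_alias)\n";
print "id is still $nextbug->{id}\n";
```

Since the flag lives on the hash itself, every reference to that hash reports the new class, and the hash entries remain accessible exactly as before.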

Creating a constructor

Given that you're likely to want to create many such Bug objects, it would be convenient to have a subroutine that took care of all the messy, blessy details. You could pass it the necessary information, and it would then wrap it in an anonymous hash, bless the hash, and give you back a reference to the resulting object.

And, of course, you might as well put such a subroutine in the Bug package itself, and call it something that indicates its role. Such a subroutine is known as a constructor, and it generally looks like this:

package Bug;
sub new
{
    my $class = $_[0];
    my $objref = {
                     id    => $_[1],
                     type  => $_[2],
                     descr => $_[3],
                 };
    bless $objref, $class;
    return $objref;
}
Note that the middle bits of the subroutine (the creation of the anonymous hash and the call to bless) look just like the raw blessing that was handed out to $nextbug in the previous example.

The bless function is set up to make writing constructors like this a little easier. Specifically, it returns the reference that's passed as its first argument (i.e. the reference to whatever referent it just blessed into object-hood). And since Perl subroutines automatically return the value of their last evaluated statement, that means that you could condense the definition of Bug::new to this:

sub Bug::new
{
        bless { id => $_[1], type => $_[2], descr => $_[3] }, $_[0];
}
This version has exactly the same effects: slot the data into an anonymous hash, bless the hash into the class specified by the first argument, and return a reference to the hash.

Regardless of which version you use, now whenever you want to create a new Bug object, you can just call:

$nextbug = Bug::new("Bug", $id, $type, $description);
That's a little redundant, since you have to type "Bug" twice. Fortunately, there's another feature of the "arrow" method-call syntax that solves this problem. If the operand to the left of the arrow is the name of a class, rather than an object reference, then the appropriate method of that class is called. More importantly, if the arrow notation is used, the first argument passed to the method is a string containing the class name. That means that you could rewrite the previous call to Bug::new like this:
$nextbug = Bug->new($id, $type, $description);
There are other benefits to this notation when your class uses inheritance, so you should always call constructors and other class methods this way.
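
Here is a compact sketch that puts the two calling styles side by side. The constructor follows the pattern shown above, but unpacks @_ into named variables; the second bug's details are invented sample data.

```perl
use strict;
use warnings;

package Bug;

sub new {
    # The class name arrives first, however the constructor is called
    my ($class, $id, $type, $descr) = @_;
    my $objref = { id => $id, type => $type, descr => $descr };
    return bless $objref, $class;
}

package main;

# Plain subroutine call: the class name must be passed explicitly
my $bug1 = Bug::new("Bug", "00001", "fatal", "application does not compile");

# Class method call: Perl supplies "Bug" as the first argument
my $bug2 = Bug->new("00002", "minor", "typo in banner");

print ref($bug1), " ", ref($bug2), "\n";
```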

Method enacting

Apart from encapsulating the gory details of object creation within the class itself, using a class method like this to create objects has another big advantage. If you abide by the convention of only ever creating new Bug objects by calling Bug::new, you're guaranteed that all such objects will always be hashes. Of course, there's nothing to prevent you from "manually" blessing arrays or scalars as Bug objects, but it turns out to make life much easier if you stick to blessing one type of object into each class.

For example, if you can be confident that any Bug object is going to be a blessed hash, you can (finally!) fill in the missing code in the Bug::print_me method:

package Bug;
sub print_me
{
    my ($self) = @_;
    print "ID: $self->{id}\n";
    print "$self->{descr}\n";
    print "(Note: problem is fatal)\n" if $self->{type} eq "fatal";
}
Now, whenever the print_me method is called via a reference to any hash that's been blessed into the Bug class, the $self variable extracts the reference that was passed as the first argument and then the print statements access the various entries of the blessed hash.
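
Putting the constructor and the method together gives a complete, if tiny, class. The sketch below assumes a reasonably modern Perl (5.8 or later), because it captures what print_me writes by temporarily selecting an in-memory filehandle. That capturing step is only a convenience for inspecting the output and is not something the Bug class itself requires.

```perl
use strict;
use warnings;

package Bug;

sub new {
    my ($class, $id, $type, $descr) = @_;
    return bless { id => $id, type => $type, descr => $descr }, $class;
}

sub print_me {
    my ($self) = @_;
    print "ID: $self->{id}\n";
    print "$self->{descr}\n";
    print "(Note: problem is fatal)\n" if $self->{type} eq "fatal";
}

package main;

# Capture the output of print_me in $output via an in-memory filehandle
my $output = "";
open(my $capture, ">", \$output) or die "Cannot open in-memory file: $!";
my $previous = select($capture);

Bug->new("00001", "fatal", "application does not compile")->print_me();

select($previous);
close($capture);

print $output;
```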

Till death us do part...

Objects sometimes require special attention at the other end of their lifespan too. Most object-oriented languages provide the ability to specify a subroutine that is called automatically when an object ceases to exist. Such subroutines are usually called destructors, and are used to undo any side-effects caused by the previous existence of an object. That may include:

  • deallocating related memory (although in Perl that's almost never necessary since reference counting usually takes care of it for you);
  • closing file or directory handles stored in the object;
  • closing pipes to other processes;
  • closing databases used by the object;
  • updating class-wide information;
  • anything else that the object should do before it ceases to exist (such as logging the fact of its own demise, or storing its data away to provide persistence, etc.)
In Perl, you can set up a destructor for a class by defining a subroutine named DESTROY in the class's package. Any such subroutine is automatically called on an object of that class, just before that object's memory is reclaimed. Typically, this happens when the last variable holding a reference to the object goes out of scope, or has another value assigned to it.

For example, you could provide a destructor for the Bug class like this:

package Bug;
# other stuff as before
sub DESTROY
{
        my ($self) = @_;
        print "<< Squashed the bug: $self->{id} >>\n\n";
}
Now, every time an object of class Bug is about to cease to exist, that object will automatically have its DESTROY method called, which will print an epitaph for the object. For example, the following code:
package main;
use Bug;

open BUGDATA, "Bug.dat" or die "Couldn't find Bug data";

while (<BUGDATA>)
{
    chomp;                          # strip the trailing newline
    my @data = split ',', $_;       # extract comma-separated Bug data
    my $bug = Bug->new(@data);      # create a new Bug object
    $bug->print_me();               # print it out
}

print "(end of list)\n";
prints out something like this:
ID: HW000761
"Cup holder" broken
(Note: problem is fatal)
<< Squashed the bug: HW000761 >>

ID: SW000214
Word processor trashing disk after 20 saves.
<< Squashed the bug: SW000214 >>

ID: OS000633
Can't change background colour (blue) on blue screen of death.
<< Squashed the bug: OS000633 >>

(end of list)
That's because, at the end of each iteration of the while loop, the lexical variable $bug goes out of scope, taking with it the only reference to the Bug object created earlier in the same loop. That object's reference count immediately becomes zero and, because it was blessed, the corresponding DESTROY method (i.e. Bug::DESTROY) is automatically called on the object.
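
The timing is easy to observe without a data file. In this sketch the destructor records events in an array (@events, a name invented for the example) instead of printing, so the order of creation and destruction can be inspected afterwards.

```perl
use strict;
use warnings;

my @events;    # hypothetical event log, used here instead of printing

package Bug;

sub new {
    my ($class, $id) = @_;
    return bless { id => $id }, $class;
}

sub DESTROY {
    my ($self) = @_;
    push @events, "squashed $self->{id}";
}

package main;

for my $id ("HW000761", "SW000214") {
    my $bug = Bug->new($id);
    push @events, "created $id";
}   # $bug goes out of scope here, once per iteration

push @events, "end of list";
print "$_\n" for @events;
```

Each "squashed" entry appears before the next "created" entry, which shows that the destructor runs at the end of every iteration and not at the end of the program.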

Where to from here?

Of course, these fundamental techniques only scratch the surface of object-oriented programming in Perl. Simple hash-based classes with methods, constructors, and destructors may be enough to let you solve real problems in Perl, but there's a vast array of powerful and labor-saving techniques you can add to those basic components: autoloaded methods, class methods and class attributes, inheritance and multiple inheritance, polymorphism, multiple dispatch, enforced encapsulation, operator overloading, tied objects, genericity, and persistence.

Perl's standard documentation includes plenty of good material (perlref, perlreftut, perlobj, perltoot, perltootc, and perlbot) to get you started. But if you're looking for a comprehensive tutorial on everything you need to know, you may also like to consider my new book, Object Oriented Perl, from which this article has been adapted.
