April 2005 Archives

People Behind Perl: brian d foy

brian d foy is a longtime leader in the Perl community. Besides founding the Perl Mongers and being a trainer for Stonehenge Consulting Services, he founded and edits The Perl Review, a quarterly magazine for Perl users. If that weren't enough, he writes and contributes to several CPAN modules. Recently, Perl.com interviewed brian on his work and plans.

Can you give us a brief professional biography?

brian d foy: I started out studying nuclear physics, and I started using Perl to extract simulation results out of huge text reports produced by legacy systems. The researchers were doing it with highlighter pens before I wrote them some programs to turn a three-week task into a 15-minute one. Around that time, the dot.com bubble was rapidly expanding, and I got sucked into that. I worked for a couple of ventures in New York that never really went anywhere. Around 1998, Randal Schwartz hooked me into working with Stonehenge Consulting Services as a Perl trainer, which I've been doing ever since. Along the way I've done a lot of work in the Perl community, with technical and non-technical contributions.

Had you known Perl previously, or did you learn it for this?

bdf: I picked up Perl sometime around the beginning of graduate school. I wasn't a programmer, and had only gotten my first computer about a year before that (previously using school computers for anything I needed). At the start I just wanted to understand these weird characters that people were using in messages on a BBS I read. It turned out that they were Perl regular expressions. I bought Programming Perl, then Learning Perl in the same week. (I still have the receipts, oddly: Learning Perl receipt and Programming Perl receipt.) And I was on my way.

Stonehenge has its headquarters in Portland, Oregon, and you live in Chicago. Do they send you out to East Coast gigs? How does that work?

bdf: When I started with Stonehenge, I was living in New York, and we actually had a lot of business in New York. We also had a lot of business in the usual places like Silicon Valley, Research Triangle, and other tech centers. Teaching assignments are handed out more on availability than location, although each of the Stonehenge instructors has a favorite part of the country to visit, so we sometimes trade assignments to make it work out. Lately I've been taking the New England area assignments.

How'd you come up with the idea for "The Perl Review"?

bdf: It really had nothing to do with Perl. I had just left a dot.com company, and I was dating an opera singer. We moved to Chicago because she had a long-term engagement in Chicago. I went on several interviews, and being the dummy that I am, I was honest about my chances of staying in the Chicago area. I found Chicago much more hostile to job mobility than New York, and I wasn't going to get a job unless I committed the rest of my working life to it. I couldn't do that and follow my wife around the world as she pursued her career. I figured, she has to actually be where she wants to sing, whereas I could live in a virtual office from anywhere that had an internet connection. It's especially nice now that a lot of hotels have wireless connectivity.

I needed something I could do from a hotel room, literally. I published most of the first print issue of "The Perl Review" from a hotel during a month-long production of The Pirates of Penzance.

Can we expect to see small discounts for orphans then? (Just kidding.)

bdf: Yes, but only until their 21st birthday.

I looked around to see what was missing from the Perl community. "The Perl Journal" had disappeared almost completely, and "Perl Month", an online Perl magazine, had stopped publishing new stuff. No one seemed to be publishing much good stuff about Perl. A lot of magazines that had had some Perl content had disappeared in the dot.com shakeout too. I also realized that many of the people still writing about Perl had been writing about Perl for a while, and the community didn't have too many ways to develop new writing talent.

When I asked around to gauge the interest, no one spoke any words of doom or misery, so I just kept going with the idea.

Is that the key to success in Perl, or do you think it worked out only in your particular situation?

bdf: Well, I had made a lot of friends through my work with Perl Mongers, so I was able to talk to a lot of people and get their support because they knew me. I also had a track record in the Perl community, so I think my personal reputation gave the project a bit of a boost. If no one knew who I was, I think things would have been a lot tougher.

What's "Extreme Publishing" and how is that working out for you?

bdf: I started calling the process we used for "The Perl Review" Extreme Publishing as a bit of a joke on Extreme Programming. I had this idea that we'd be able to publish in short iterations with fast turn-around, and build a set of tools to automate the process.

On the technical side, that worked nicely. I could take a text file (usually in POD) from an author, run it through a converter to get LaTeX, then typeset the whole thing and turn it into a PDF file. If I changed some text, I only had to type a few characters to get a new PDF file. Everything was text-based and in source control, just how a good Perl programmer likes it.

On the aesthetic side, it wasn't so great. I had put the technical details before the creative ones, and no one had said anything about it because everyone working on it was a techie, and those were the bits that we liked to play with. Those weren't the hard parts of the process, and they weren't the parts that needed the most time. The process I had set up really restricted the aesthetic side. Publishing is not programming.

The real process in creating something better than an author can do on his own (for instance, putting something on his website with no help from us) doesn't involve many technical considerations: the value of a magazine or book is editorial help that develops and matures the content. All of that happens before we get to the point where we want to create the final output, and it's the part of the process that takes the longest. By paying attention only to the last part of the process like we were, and then only the technical details of it, we had fallen into the trap of premature optimization.

After we printed our first issue, I switched from LaTeX to Adobe InDesign because we could produce better output faster. LaTeX is very difficult to wrangle if you want to precisely control the placement of a lot of things: you don't want the last line of a paragraph at the top of a column or on the next page all by itself, or several hyphenated lines in a row. You can control all that in LaTeX, but you have to edit the input files, process them, and see what you get. You have to do that every time you want to make a change. We don't have that problem with InDesign. Not only that, a much larger work pool becomes available. A good designer will probably know how to really use InDesign or Quark but know nothing of LaTeX, while the situation is reversed for a good technical person.

Although we still prefer POD from our authors, I've backed off of that requirement too. The more restrictive we are with submissions, the less motivated the authors are to do the work. Now we let them submit their articles in any format they want; they were doing that anyway. We're here to publish good content, so the content should be first and the technical details later. If we have to do a couple extra minutes worth of work to convert a Microsoft Word document into another format, that isn't going to kill us.

What do you look for from your authors editorially? Do you put together a theme for an issue and look for contributions there, or do you try to find a mix of solicited and unsolicited articles?

bdf: The themes come from what we get. I tried to come up with themes, but it just never worked out. I go after new authors rather than the same ones people see in every other magazine. My friend Randal Schwartz writes a column for just about any magazine that has Perl in it, and he'd be a popular author for "The Perl Review", but I'd like to develop new talent too.

Most of the articles we publish now are solicited. One of the editors will see something cool, like a usenet post, new module registration, or blog entry, and we'll get the author to expand that into a story.

How do you motivate authors to hit deadlines? Seriously, I'd like to know.

bdf: Deadlines are tough for part-time (or one time) authors. I tell the authors not to worry about the time. Whoever gets their articles in first gets into the next issue. If we want a particular article, we'll just keep asking for it, or asking for rough drafts, or even outlines.

We have a big stack of articles waiting for space in the magazine, though. At first, when we were only a PDF file, every article could make it into the magazine because we didn't have to commit to a page count. That also meant we started from scratch for the next one. Now that we have to commit to a certain number of pages to coordinate the designers, printers, and everyone else, we have articles left over after each cycle. Some are longer than the space we have and we don't want to cut them, and some we want to give a more prominent position.

I don't know if I've discovered any special secret, or really been any more successful than anyone else regarding deadlines. If you see a short article by me in the magazine, that probably means something that was supposed to take up that space didn't get done in time.

How do you keep track of the dozens of little details? Sticky notes? Database? Notepads?

bdf: My personal to-do list is mostly a mail folder. Anything in that folder needs attention. Other than that, I have several whiteboards in my office. Around production time, every page in the next issue turns into a Post-it note, and those notes go onto my big corkboard. As things change, like an article going from 6 to 8 pages perhaps, I move around all the Post-its.

For long-term article tracking, I have a MySQL database and web application that track each submission. I know when I received it, what stage it's in, and who's worked on it. When it comes time to pick the articles we want to focus on, I look at those that have gone through most of the steps (such as technical reviews, re-writes, and peer review). We then concentrate on those articles.

I don't think I've found the best way to do that, though (and I've been meaning to ask you about Jellybean). In the end, I don't think it's a technical problem. I think the real solution is going to be something that an editor uses to collect and view information. It's not going to be something that automatically collects things because the value of the process is in the editor and what that person knows. I've found that that sort of information is not self-organizing, and that we really need one person who takes the article from start to finish. If we try to cut the human out of that process by automating things, we're going to lose out on the creativity and writing side. Whatever the right answer is, I think it's going to be something that helps rather than enforces.

I've noticed that you're now shipping paper copies to Europe. How has the process of collecting addresses, billing, printing, and mailing worked out?

bdf: I'm continuing to learn a lot, and the main thing I'm learning is that a lot of the service industries are still protecting their knowledge so they can get me to pay thousands of dollars for them to handle all that for me. It's not at all like the open source world.

Billing is pretty easy, although international credit cards fall into a higher risk category in the bank's calculus of fraud. Sometimes the bank flags a transaction as suspicious because the bank on the other side doesn't give it the right secret handshake. It's better to be cautious, though.

We haven't started printing in Europe yet, but that's a big part of the plan. Some people have asked about printing on A4 paper, but that would be a new layout. Whatever we do, it will be the same size wherever we print as long as we're using the same content in each edition.

Mailing is most of the excitement. Every country has their own format and includes different things in the address. I'm used to dealing with ASCII, but other languages (hence country, city, and street names) use lots of other characters. I'm learning about all the stuff I've been avoiding. People want to see everything in the right place and spelled correctly (and with the right characters) on the address labels. I'm still learning a lot, and plenty of people are being sufficiently patient with me to help. The information to make it all work is often closely guarded by service bureaus, but I've been learning to use the post office websites from all over the world.

What's one question no interviewer has ever asked you that you think someone should?

bdf: As for "The Perl Review", no one has asked me "Are you a masochist?" It's a big undertaking that takes a lot of work and I started with no experience in the publishing business. It's a business where the costs are high, the margins low, and there is an ongoing customer service commitment. I'm not in this to make money, though, so I think I'm safe.

What's next for "The Perl Review"?

bdf: The big goal for the first year is to just keep going. So far, we've paid all of our bills on time, and we've been getting a lot of subscribers. At the end of the first year, I hope enough people think we've done a good enough job that they want to renew.

This Week in Perl 6, April 12 - 19, 2005

All~

Sadly, a slip of the mouse caused me to delete a partially completed summary, so I am going to push ahead on the rewrite without a witty intro. Feel free to make one up for yourself involving stuffed animals, musicians, and dinner.

Perl 6 Compilers

Pugs 6.2.0

Autrijus released Pugs 6.2.0, marking the first major milestone for Pugs. This includes most of the control flow primitives of Perl 6 and is a testament to the solid work that all of the "lambdacamels" have put in.

CGI.pm and Multibyte Characters

BÁRTHÁZI András had trouble encoding and decoding multibyte characters in CGI.pm. This led to a general discussion of how to avoid such characters in URLs as well as when to call chr.

Auto Currying?

Matthew D. Swank wondered if he really needed an extra set of parens to call a function generator and its generated function simultaneously. Autrijus told him that yes he did as Perl 6 is not quite Haskell yet.

Case Insensitive P5 Regex

BÁRTHÁZI András wanted to use the :i switch on P5 regexes. Autrijus implemented it, but Larry noticed that this introduced a flag ordering dependency. As a result, the new way to supply flags to a Perl 5 regex is rx:P5<imsxg>/.../.

Cookbook Etiquette

Marcus Adair wondered if there were rules of etiquette he should obey when writing examples for the Perl 6 Cookbook. In particular, should examples run and be only one file? Ovid suggested that one file was a good idea, but was open to contrary arguments.

Austrian Parrot/Pugs Hackathon

Thomas Klausner announced that on June 9-10 in Vienna, Austria there would be a Hackathon featuring the collective might of Autrijus, Chip, Leo, and more. When that much brain power gets together only two things can happen: much hacking and much drinking.

Encoding Illegal Byte Sequences in Strings

BÁRTHÁZI András wanted to know if he could encode an illegal byte sequence in a string. Much discussion ensued, but Larry promised that it would be possible.

Test::TAP

Yuval Kogman announced the release of two new modules to CPAN that provide Pugs smoke HTML.

Quoting Constructs

Roie Marianer noticed that Pugs was missing some quoting constructs. He implemented them. This led to discussion of interpolation and corner cases. As usual, Larry provided both answers and questions. Roie produced a patch which Autrijus applied.

Code Block as Argument

Stevan Little found some bugs with passing a code block to a function in Pugs. Warnock applies.

push, pop, shift, and unshift on Infinite Lists

Stevan Little has been playing with push, pop, shift, and unshift on infinite lists. He thinks he has found a bug, although maybe he just hasn't let it run long enough. Larry provided answers regarding the correct semantics.

cd Issue in Makefile

Jonathan Worthington noticed a Win32 issue in the Makefile. He can point to the offending line in the autogenerated Makefile, but that is not where the fix belongs. Warnock applies.

Hyperoperator Tests

David Christensen provided a patch for hyperoperators. Unfortunately, character set transcodings ate his patch.

shift Oddity

Stevan Little noticed that shift did not act like pop. Larry noted that the examples were not semantically valid, but even so, Pugs should not freeze.

Pugs SEGV

Aaron Sherman managed to make Pugs segfault. Autrijus thinks someone might already have fixed it.

Parrot

Dynamic Perl 2

William Coleda provided the second of his patches to move Perl*PMC out of the core. Leo applied it.

SVN Revision in Bug Reports

Jens Rieks reported a difficult-to-reproduce bug. This caused Leo to pine for having the SVN revision in the bug report. Brent "Dax" Royal-Gordon commented that this was a good idea. Jens Rieks offered to implement it.

Win32 SDL

Jerry Gay tried to get SDL working on Windows. There was some give and take, but in the end, it works.

-l/path/to/icu

Andy Dougherty provided a patch making Configure.pl provide a link flag to ICU headers if provided. Jens Rieks applied it.

Svk and SVN issues

Roger Hale noticed a small problem with parrotcode.org. Robert Spier fixed it.

nci.t Failure Under MinGW

François Perrad fixed a MinGW test failure. Leo applied the patch.

Trailing Space with ${LD_OUT}

Andy Dougherty fixed an old bug with LD_OUT having trailing space. Leo applied the patch.

Warnings Cleaning

Jerry Gay cleaned some warnings from the source tree. Leo applied most of the patch.

Win32 Path Nit

Philip Taylor fixed a small Win32 path issue. Leo applied the patch.

cmp Op Bug

Leo found a bug in cmp_p_i_ic opcodes.

SDL Uninitialized Variable

Nick Glencross provided a patch fixing an uninitialized variable in SDL. Leo applied the patch.

PerlScalar Morph Bug

Nicholas Clark found a bug in morph for PerlScalar. Leo verified that it was a bug. Nick Glencross offered to fix it.

Infix Method Change

Leo threatened to continue with his plan to simplify infix methods. No one objected.

Used Before Set Warning?

Nick Glencross wanted a warning for using unset variables in imcc. Leo pointed out that this was not as simple as one might like.

Remove Temp Files for Win32 make clean

Jerry Gay provided a patch removing more files under make clean in Win32. Warnock applies.

Fix Typos

Nick Glencross provided a patch which fixes some typos in docs. chromatic applied it with a few extra tweaks.

Parrot Security

BÁRTHÁZI András wondered about the general security mechanisms that Parrot would provide. Dan assured him that security would be a fundamental part of Parrot. He also provided a sketch of the Parrot security model which sparked some discussion.

Debian ARM Failure

Falls Huang reported a build failure on Debian-ARM. Leo provided a pointer in the right direction.

Missing Make Target

François Perrad noticed that make src/revision.c couldn't handle .svn/entries. Jens Rieks fixed the problem.

JIT Generation Help

Adam Preble put out a call for some general advice on understanding Parrot's JIT. Leo provided some general advice.

"Attribute not found" Exception

Cory Spencer provided a patch changing getattribute to throw a real exception. Jens applied the patch with a few tweaks.

stderr during bc Configure Step

Jerry Gay provided a patch to suppress stderr during the bc configure step. There was some debate on how to make this cross platform. I don't think there was a resolution.

string.c Segfault

Nick Glencross provided a patch fixing a few segfaults in string.c. Jens applied it.

MSWin32 ICU Linkage

Ron Blaschke added ICU to the linkage for Win32. Jens applied the patch.

Win32 Readme Updates

Ron Blaschke updated the README.win32 document. chromatic applied the patch.

Discussion Before the Patch

Big Changes Afoot

Our pumpking Leo has some big changes underway and asked someone else to man the pumps for a little while. Jens volunteered to be someone.

More Registers Make Dan's Code less Unhappy

Dan has some very ugly generated code. It takes a LONG time to compile. Leo sped it up by giving Parrot more registers.

.const Weirdness

Nick Glencross found some weirdness with .const in IMCC. Warnock applies.

PMC Help

Bloves hoped for a pointer on PMC writing. Leo provided a helpful pointer.

&& in Commands is Not Cross Platform

Jens noticed that && in commands causes problems on some platforms. Michael G. Schwern fixed it and Jens applied the patch.

MMD Migration

Leo continued his slow but steady migration to a more MMD-like world.

Make Config Info Available at Runtime

Leo wants to have useful config information available at runtime. Steven Philip Schubiger offered to try.

Remove Old Files

Leo opened a ticket for removing some outdated files.

Small Spelling Errors

Steven Philip Schubiger provided a patch fixing some small spelling errors. He worried that perhaps he picked nits needlessly. I don't think so, but Warnock applies.

Win32 ICU Error

François Perrad fixed a small mistake in the naming of icudt.lib. chromatic applied the patch.

Drunken Parrot

Cory Spencer has succeeded in making LISP run on Parrot and uncovered a few GC bugs in the process, impressing everyone.

Python on Parrot

Kevin Tew wondered about the state of Pyrate. Sam Ruby provided a general explanation.

C => Parrot Compiler

Philip Taylor posted a few questions about Parrot for help with Carrot, his C to Parrot Compiler. Leo and Chip provided a few answers.

Perl 6 Language

Yet Another Perl Conference, North America

Gerard Lim announced YAPC::NA with much information and useful links.

Subscripts as Objects

Yuval Kogman threw out the idea of using subscripts as objects. Larry worried that this would hurt speed a little too much.

Statement Modifiers and Scopes

Paul Seamons posted some examples involving local scopes and statement modifiers. Larry decided that only curlies will delimit scopes, so as not to surprise too many people.

Whitespace in heredocs

Juerd posted a question from the p6cookbook asking about spaces v. tabs. Larry took a guess as to the context and pointed to A2 for info.

&nbsp; in \s, <?ws>, <>

Juerd wondered what sort of character classes matched nonbreaking spaces. Larry replied that they did, but postulated a <bws> class for breaking whitespace.

trim() and words()

Marcus Adair wondered about trim and words and if they actually existed. It seems that trim will exist although words might be <<$string>>.

<[]> Ugly and Hard to Type

Some people are complaining that character classes are difficult to type. The design team considers this a feature as character classes do not handle internationalization well.

Comparing Floats with Fudge

Marcus Adair wants an easy way to compare floats with a fudge factor. Larry seemed to feel that ~~ could use more DWIMery.

$*CWD vs cwd() and chdir()

Michael G. Schwern wanted a simple tied variable interface to $*CWD. This is apparently a sensitive topic. Much discussion ensued of changing directory in bizarre circumstances. It looks like Michael's suggestion will not reach the core, although it looks like a fairly simple module.

Junction Precedence Error

Brad Bowman noticed an error in some examples involving junction precedence. Larry confirmed the error. Patrick R. Michaud fixed it.

Spelling Mistake in A06

Steven Philip Schubiger found a spelling error in A06. Patrick R. Michaud applied the patch.

<[a-z]> to Become <[a..z]>

Larry decided that the range operator in character classes should change from - to .. (two dots). Much discussion ensued. I like it.

Tainted Variables

BÁRTHÁZI András wondered if he could mark variables as tainted. Luke Palmer showed him a way.

Re-declarable temp Variables

Aaron Sherman wants a way to re-declare variables without warnings. He suggested temp. Larry suggested ok to turn off a warning, but doesn't think Aaron's feature is really necessary. There's no official ruling yet, I think.

Hyper Slices

David Christensen wants to use hyper slices as a convenient way of dealing with multidimensional data structures. Luke Palmer showed him how.

Hyperoperator Corner Case

David Christensen wondered how hyperoperators know what to pad with when one side is not long enough. The answer appears to be slightly ill-defined magic, especially when considering subtraction or division.

Fine Granularity Sleeping/Events

Gaal Yahas wants to have an alarm function that takes a double (for systems with sub-second timing promises). He also wants a version that takes a callback (possibly called later). Warnock applies.

Junctions in Subscripts

David Christensen wants junctions in subscripts to autothread. Luke Palmer commented that they might. I think they do.

Quote Operators and Interpolation

Roie Marianer had a few questions about how interpolation and quote operators would work. Larry and Juerd provided some answers. Larry's short version: "We pretend we're a top-down parser even if we aren't".

Junctions with Adverbs

David Christensen wants to use adverbs to supply exceptions to junctions. Luke Palmer told him that it doesn't work that way.

++ Evaluation Order

Lam Fayland found an oddity in Pugs evaluation order for ++. Warnock applies.

Statement Modifiers for Setting Variables

Dave Whipp wants to use statement modifiers to restrict the scope for variables in his print statements. Larry provided a different technique.

Tie Hashes

Ingo Blechschmidt wondered what syntax to use for tying hashes. Larry began to muse aloud.

The Usual Footer

Posting via the Google Groups interface does not work. To post to any of these mailing lists please subscribe by sending email to perl6-internals-subscribe@perl.org, perl6-language-subscribe@perl.org, or perl6-compiler-subscribe@perl.org. If you find these summaries useful or enjoyable, please consider contributing to the Perl Foundation to help support the development of Perl. You might also like to send feedback to ubermatt@gmail.com.

Automating Windows Applications with Win32::OLE

Getting My Feet Wet

My first glimpse of the Internet happened at Lotus Development somewhere around 1994. A Mosaic demonstration duly impressed me. A few months later, working at a small Cambridge, MA company called Dataware brought me closer to the online revolution.

I landed a job at America Online in 1995 working for their browser team. They originally had their own browser, which they had purchased from CMGI. It was kind of cool at the time because it included a tabbed frames feature about eight years before the Gecko engine.

It was there that Pete Deschanes wrote a tool using Microsoft Visual Test to automate the America Online embedded web browser. I was especially interested in certain functions he was using that captured browser events.

Later, I moved on to a company in Chelmsford, MA that was using OLE Automation to drive the translation of Microsoft Word Documents into Fax documents. Rob Murtha explained a lot to me about OLE Automation and introduced me to Perl and Java.

Washed Up

The internet bubble burst for me in 2002 and I wound up stranded on the shores of Fidelity Investments as a manual tester for one of their many investment Web page groups. After a few weeks of manual testing, I was ready to automate several of my tasks.

There was a problem. There were plenty of available WinRunner licenses, but I was not allowed to use them because I was not part of their automation group. In fact, I was reprimanded and almost lost my job for being too persistent in asking to use one. Then I decided to write my own automation tool.

I had experience with C, C++, Java, and Perl. I decided to start with a scripting language just to get a prototype going. I also thought it would be cool to have a real open source script language to write code with instead of something that a few engineers had developed exclusively for Web automation.

My process consisted of:

  • Ask a question.
  • Do some research.
  • Write some code.

Getting Started

The first thing I needed to do was to see if I could start IExplore.exe from Perl. I knew I did not want to simply start the process with system("C:\\Program Files\\Internet Explorer\\IExplore.exe");. Yes, I could get IE up and running that way, but I would not be able to do anything useful with it except to kill it.

I noticed that ActivePerl contained an interesting module called Win32::OLE. I opened the OLE.pm file and began to read the comments.

Comments like this looked very promising:

This module provides an interface to OLE Automation from Perl. OLE Automation brings VisualBasic-like scripting capabilities and offers powerful extensibility and the ability to control many Win32 applications from Perl scripts.

This one also looked pretty good:

The MessageLoop() class method will run a standard Windows message loop, dispatching messages until the QuitMessageLoop() class method is called. It is used to wait for OLE events.

Keeping in mind that faith is the substance of things hoped for and the evidence of things not seen, I set about to write a Simple Automation Module for Internet Explorer using ActivePerl's Win32::OLE.

Going back to the Win32::OLE documentation I found out how to start IE through the COM object. I translated some examples for Excel and Word and wound up with this:

use Win32::OLE;
# Start Internet Explorer through its COM automation object.
$IE = Win32::OLE->new("InternetExplorer.Application")
	|| die "Could not start Internet Explorer.Application\n";

That was nice, but nothing appeared on my computer screen. I could hear the hard drive making a sound like it was starting an application but I couldn't see Internet Explorer. I decided to Google for some examples. The information out there was very sparse but I found something that set the visible attribute to 1:

$IE->{visible} = 1;

This time Internet Explorer appeared with a blank screen. It was a start. I figured that there were a bunch of IE processes that I could not see on my machine from my previous efforts, so I killed those using the task manager.
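Those leftover invisible processes suggest one habit worth adopting early: tell the browser to quit when the script is finished with it. What follows is a minimal sketch, not the code from this article. It assumes the optional destructor argument to Win32::OLE->new described in the module's documentation and the Quit method that the browser object exposes. The helper name start_ie is made up for illustration, and the helper returns nothing on machines where Win32::OLE cannot be loaded.

```perl
use strict;
use warnings;

# Hypothetical helper: start a visible IE that is told to quit when the
# Perl object goes away. Returns nothing if Win32::OLE cannot be loaded.
sub start_ie {
    # Load at run time so this file still compiles off Windows.
    return unless eval { require Win32::OLE; 1 };

    # The second argument names an OLE method to call as a destructor.
    my $ie = Win32::OLE->new("InternetExplorer.Application", "Quit")
        or die "Could not start InternetExplorer.Application\n";

    $ie->{Visible} = 1;    # without this the process runs unseen
    return $ie;
}

my $ie = start_ie();
print defined($ie)
    ? "Internet Explorer started\n"
    : "Win32::OLE not available; nothing started\n";
```

Because the destructor fires when $ie is destroyed, the browser in this sketch closes as soon as the script ends. A longer script would keep $ie alive for as long as it needs the window.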

Next, I started a free Microsoft tool called OLEVIEW.exe. This gave me a tree view of all automation objects registered on my machine. There were hundreds of them. I found the one called Internet Explorer (Ver. 1.0) and expanded the tree looking for methods. IWebBrowser2 looked interesting so I clicked on that and selected the View Type Info button. Up popped a new window with a list of methods. This was looking better all the time.

I clicked on a method called Navigate and saw:

[id(0x00000068), helpstring("Navigates to a URL or file.")]
void Navigate(
    [in] BSTR URL,
    [in, optional] VARIANT* Flags,
    [in, optional] VARIANT* TargetFrameName,
    [in, optional] VARIANT* PostData,
    [in, optional] VARIANT* Headers);

I decided to try it:

$IE->Navigate("http://www.google.com");

Now I had navigated to my first website with my new automation tool.
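Navigate returns before the page has finished loading, so a script that wants to do anything with the page needs some way to wait. A common stopgap, and not the event-driven approach pursued in the next section, is to poll the browser's Busy property. The sketch below assumes that property (it appears on IWebBrowser2 alongside Navigate). The helper navigate_and_wait and the FakeBrowser stand-in are hypothetical and exist only so the example runs on any machine.

```perl
use strict;
use warnings;

# Hypothetical helper: navigate, then poll the browser's Busy property
# until it reports idle or the timeout (in seconds) expires.
sub navigate_and_wait {
    my ($ie, $url, $timeout) = @_;
    $ie->Navigate($url);
    my $deadline = time + $timeout;
    while ($ie->{Busy}) {
        return 0 if time > $deadline;         # gave up waiting
        select(undef, undef, undef, 0.25);    # sleep a quarter of a second
    }
    return 1;                                 # browser reports idle
}

# Stand-in object so the sketch runs anywhere. A real run would pass
# the $IE object returned by Win32::OLE->new instead.
{
    package FakeBrowser;
    sub new      { return bless { Busy => 0 }, shift }
    sub Navigate { my ($self, $url) = @_; $self->{Location} = $url; return }
}

my $ok = navigate_and_wait(FakeBrowser->new, "http://www.google.com", 10);
print $ok ? "page settled\n" : "timed out\n";
```

Polling is simple but coarse: it reports that the browser has gone quiet, not what happened along the way, which is part of why capturing events is the more attractive route.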

On the Road

I knew I was getting close to hitting the wall. I had seen through Google and newsgroups that lots of folks had reached this point in the game and turned around. Remembering back to Pete Deschanes' Visual Test Tool, I was certain that I could make more progress if I could capture Internet Explorer's events. I went back to the OLE.pm documentation to do a little more reading about events.

=item Win32::OLE->WithEvents(OBJECT[, HANDLER[, INTERFACE]])

This class method enables and disables the firing of events by
the specified OBJECT.

If I could just grok what OBJECT, HANDLER, and INTERFACE represented, I felt that I could get my events. I made some guesses.

The Object: that would be what Win32::OLE->new() returned. Everyone knows you instantiate an object with the new operator.

The Handler: I read further through OLE.pm:

The HANDLER argument to Win32::OLE->WithEvents() can either be a CODE reference or a package name. In the first case, all events will invoke this particular function. The first two arguments to this function will be the OBJECT itself and the name of the event. The remaining arguments will be event-specific.

Win32::OLE->WithEvents($Obj, \&Event);

Now I understood that WithEvents was going to tell Internet Explorer to call my Perl Handler whenever IE fired an event. I had to give the WithEvents call a reference to my subroutine like this:

\&Event

What was the Interface name going to be? I went back to OLEVIEW and looked through the interface folder. It looked like DWebBrowserEvents2 would deliver what I wanted.

This is what I came up with:

Win32::OLE->WithEvents($IE,\&Event,"DWebBrowserEvents2");

Now all I needed was to write an Event subroutine for IE to call. The OLE.pm comments told me that "the first two arguments to this function will be the OBJECT itself and the name of the event", so I just used the document example.

sub Event {
    my ($Obj,$Event,@Args) = @_;
    print "Event triggered: $Event\n";
}

I noticed that there was a third argument called @Args and assumed that this was there to catch all other unknown parameters for each event.

How was I going to block my code to wait for the events to transpire? I returned to the Win32::OLE comments:

The MessageLoop() class method will run a standard Windows message loop, dispatching messages until the QuitMessageLoop() class method is called. It is used to wait for OLE events.

I added this code, just to see if I could capture IE events:

Win32::OLE->MessageLoop();

I've Been to the Mountain Top

It was quite a moment when I saw these events pouring out of my nine lines of code:

Event triggered: CommandStateChange
Event triggered: OnVisible
Event triggered: PropertyChange
Event triggered: BeforeNavigate2
Event triggered: DownloadBegin
Event triggered: StatusTextChange
Event triggered: ProgressChange
Event triggered: FileDownload
Event triggered: DownloadComplete
Event triggered: TitleChange
Event triggered: NavigateComplete2
Event triggered: OnQuit

Internet Explorer was firing off events and its COM object was calling my Perl Event subroutine.

From here, I kept searching the newsgroups. I found one sentence where someone mentioned acquiring the DOM from the DocumentComplete Event. I knew this was a key, but how could I take this reference using Perl?

I read up about the DOM. I borrowed a book from someone at work. Microsoft has their own version of the DOM called DHTML and I came across their Web page. After reading this documentation for a while, I saw that the DOM could give me everything I needed to have a full-blown automation tool. All I needed was the reference.

Finding the DOM

I broke up my Event subroutine into pieces. I wanted to do something different for each triggered event. Specifically I wanted to try to take a reference to the DOM when the DocumentComplete event triggered. My idea was that if I shifted out the first element of the @Args array, I would find the reference I was looking for. I rewrote the Event subroutine:

sub Event {
    my ($Obj,$Event,@Args) = @_;
    print "Here is the Event: $Event\n";
    if ($Event eq "DocumentComplete") {
        $IEObject = shift @Args;
        print "Here is my reference: $IEObject\n";
    }
}

This printed out:

Here is the Event: DocumentComplete
Here is my reference: Win32::OLE=HASH(0x1a524fc)

This was looking better all the time.
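
The idea of doing something different for each triggered event can also be expressed as a dispatch table. The sketch below is not how SAMIE does it; it runs without Internet Explorer by feeding the handler made-up events, and the names %handlers, %captured, and the 'fake-browser' argument are all hypothetical stand-ins.

```perl
use strict;

my %captured;    # hypothetical place to keep whatever the events hand us

# One small subroutine per event name we care about.
my %handlers = (
    DocumentComplete => sub {
        my ($obj, @args) = @_;
        $captured{browser} = shift @args;    # first event argument
        return "captured reference";
    },
    OnQuit => sub { return "browser closed" },
);

# Same calling convention as the Event subroutine in the article:
# the object, the event name, then event-specific arguments.
sub Event {
    my ($Obj, $Event, @Args) = @_;
    my $handler = $handlers{$Event} or return "ignored $Event";
    return $handler->($Obj, @Args);
}

# Simulate a few events instead of waiting for a real browser.
print Event( undef, $_, 'fake-browser' ), "\n"
    for qw( TitleChange DocumentComplete OnQuit );
```

With a real browser, the same Event subroutine could be passed to WithEvents unchanged; only the handlers would need to do real work.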

I had a reference, but was it to the DHTML on the page that had just loaded? There was only one way to find out: could I use it to make a DHTML call? I looked on the Microsoft DHTML reference page for a property that would tell me I had a reference. URL looked like a good one, so I tried this code:

print "URL: " . $IEObject->URL . "\n";

That gave me nothing. I went back to OLEVIEW and found this interesting method:

[id(0x000000cb), propget, helpstring("Returns the active Document automation object, if any.")]
IDispatch* Document();

The Active document sounded good, so I tried:

print "URL: " . $IEObject->Document->URL . "\n";

This gave me:

URL: http://www.google.com/

Golden! My next step was to find a way to break out of the MessageLoop() I was in.

Win32::OLE->QuitMessageLoop();

Going On Home

My last step was to do something more useful with my reference to the DHTML. It was time to write a subroutine that would enter text into an edit box. That would seal my proof of concept.

Other automation tools need you to make a GUI map (Mercury Winrunner) or an include file (Segue Silk) of every page you view before running the automation. I wanted something that would just look through the code that was already on the page and pick out a control on the fly using the power of regular expressions.

I found all of the methods and properties in the following example in the Microsoft DHTML API documentation. First I needed a name for my subroutine. SetEditBox seemed easy to understand.

sub SetEditBox {
}

Next I need to pass in two parameters. The first would be the name of the control and the second would set the text.

sub SetEditBox {
    my ($name,$value) = @_;

I had to start with the document object from my reference to the DOM:

sub SetEditBox {
    my ($name,$value) = @_;
    $IEDocument = $IEObject->{Document};

To save time iterating, I made the assumption that edit boxes would only appear inside forms. I used the collection called forms to return all forms on the page.

$forms = $IEDocument->forms;

Now it was time to iterate.

for ($i = 0; $i < $forms->length; $i++) {
}

First I needed each item in the form collection.

$form = $forms->item($i);

Inside this iteration I wanted to find a specific element of the form with the name of the edit box.

if (defined($form->elements($name))) {
}

Inside this if statement I wanted to set the value of the edit box to the value passed into the subroutine.

$form->elements($name)->{value} = $value;

Then it was time to get out of Dodge so I wouldn't waste time continuing the iteration.

return;

Here is the complete first version of the subroutine.

sub SetEditBox {
    my ($name, $value) = @_;
    my $IEDocument     = $IEObject->{Document};
    my $forms          = $IEDocument->forms;

    for (my $i = 0; $i < $forms->length; $i++) {
        my $form       = $forms->item($i);
        if (defined($form->elements($name))) {
            $form->elements($name)->{value} = $value;
            return;
        }
    }
}
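
The goal of picking out a control on the fly with regular expressions can be tried without a browser. The following is only a sketch under stated assumptions: plain arrays and hashes stand in for the DHTML forms collection, and set_edit_box_matching is a hypothetical name, not part of SAMIE or Win32::OLE.

```perl
use strict;

# Stand-in for the page: each form is a list of elements, and each
# element is a hash with "name" and "value" keys. No IE objects here.
my @forms = (
    [ { name => 'q',         value => '' }, { name => 'btnG',       value => 'Search' } ],
    [ { name => 'user_name', value => '' }, { name => 'user_email', value => ''       } ],
);

# Set the first element whose name matches $pattern, then stop looking.
# Returns the matched name, or nothing if no element matched.
sub set_edit_box_matching {
    my ($pattern, $value, @forms) = @_;
    for my $form (@forms) {
        for my $element (@$form) {
            if ( $element->{name} =~ $pattern ) {
                $element->{value} = $value;
                return $element->{name};
            }
        }
    }
    return;
}

my $hit = set_edit_box_matching( qr/name$/, 'samie', @forms );
print "Set $hit\n";
```

The early return sits inside the match test, so the search stops only once a control has actually been set.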

Here is something very similar to the original proof-of-concept version of SAMIE:

use Win32::OLE qw(EVENTS);
my $URL = "http://samie.sf.net/simpleform.html";
my $IE  = Win32::OLE->new("InternetExplorer.Application")
    || die "Could not start Internet Explorer.Application\n";
Win32::OLE->WithEvents($IE,\&Event,"DWebBrowserEvents2");

$IE->{visible} = 1;

$IE->Navigate($URL);

Win32::OLE->MessageLoop();
SetEditBox("name","samie");

sub Event {
    my ($Obj,$Event,@Args) = @_;
    print "Here is the Event: $Event\n";
    if ($Event eq "DocumentComplete") {
        $IEObject = shift @Args;
        print "Here is my reference: $IEObject\n";
        print "URL: " .  $IEObject->Document->URL . "\n";
        Win32::OLE->QuitMessageLoop();
    }
}

sub SetEditBox {
    my ($name, $value) = @_;
    my $IEDocument     = $IEObject->{Document};
    my $forms          = $IEDocument->forms;

    for (my $i = 0; $i < $forms->length; $i++) {
        my $form       = $forms->item($i);
        if (defined($form->elements($name))) {
            $form->elements($name)->{value} = $value;
            return;
        }
    }
}

It made me laugh and tip my hat to Larry Wall to think that I had the basic proof of concept of a $3,000-per-seat automation tool in about thirty lines of Perl. See more at the SAMIE home page.

This Week in Perl 6, April 4-11, 2005

Whoa! Deja vu! Where'd Matt go?

Don't worry, Matt's still writing summaries. As you may have noticed, Matt's been writing summaries every two weeks. Now so am I. Because we love you, we've decided to arrange things so I write summaries in the weeks when Matt doesn't. We could do it the other way, but some could see that as self-defeating. Heck, when I say "some" I probably mean "all."

So bear with me while I remember how to type all those accented characters and get back into the swing of writing these things (and of reading everything in the mailing lists once more--someone should write a summary for us summarizers).

I'll be sticking to my old "lists in alphabetical order" scheme of writing summaries. Let's get going.

This Week in perl6-compiler

Array of Arrays, Hash of Hashes, Elems, Last

Lev Selector asked for confirmation that Pugs didn't support compound data structures, @ar.elems, or @ar.last. Autrijus and others confirmed that they didn't then, but they do now.

MakeMaker6 Stalls on Takeoff

Darren Duncan pointed out that while last week's summary had claimed he was working on implementing MakeMaker in Perl 6, this is, sadly, not the case. He reckoned he'd possibly look into it again when he had tuits and Pugs was more complete (supporting objects, for instance).

Declaration Oddness

Roie Marianer pointed out what looks like some weirdness in Pugs' parsing of lexically scoped subroutines. Warnock applies.

Toronto Pugs Hackathon

John Macdonald asked for people who wanted to come to the YAPC::NA Pugs hackathon to get in touch with him beforehand, as there is limited space. If you're interested, drop him a line.

Pugs Slice Oddities

Andrew Savige noticed some weirdness in Pugs's slicing behavior. He posted some example code showing the problem. Autrijus agreed that there was a problem and explained that he was in the process of rewriting all the variable types, symbol tables, and casting rules to agree with the Perl 5 model as described in perltie.pod. The rewrite is currently failing tests, so he posted a patch for people who want to play. On Sunday, he bit the bullet and committed the entire 2500-line patch which "touches pretty much all evaluator code."

Meanwhile, in perl6-internals

Tcl, Unicode

William Coleda has been trying to add Unicode support to his Tcl implementation and he fell across issues with missing methods in charset/unicode.h. Leo waved a magic wand and checked in an implementation that he fenced around with disclaimers.

The Status of Ponie

Nicholas Clark confessed that Ponie had stalled for some time, but sweetened the pill by announcing that it's about to restart and that he will be able to allocate at least one day a week to the project. He pointed people at the Ponie intro/roadmap that breaks down the required tasks between here and a first release, complete with time estimates. If you're interested in getting Ponie to a ridable state, this is a good place to start.

People were pleased.

Monthly Release Schedule

Chip donned his "Fearless Leader" hat and announced that Parrot will move to a monthly release schedule (with an initial three-week "month" to get things into sync). There was some debate about whether Solaris/SPARC should be one of the officially required monthly release platforms (Darwin, linux-x86-gcc3.*, and win32-ms-cl were Chip's initial blessed three). This morphed into a discussion of Tinderbox; apparently there are cool things happening behind the scenes.

Calling Convention Abstraction

What do you know? You go away for n months and when you come back people are still talking about calling conventions.

Dynamic Perl, Part 1

William Coleda announced that he was starting work on removing the core's dependence on Perl* PMCs in favor of using language-agnostic PMCs internally and loading the Perl ones dynamically as required. He dealt with everything but PerlArray quickly and the list discussed names and ways forward with that tricky case. It looks like someone with tuits needs to add and write tests for some vtable methods for ResizablePMCArray.

Subversion

Another discussion that wouldn't go away back when I was last writing summaries has come to a head. Parrot's finally migrating from CVS to Subversion. By the time you read this, Parrot's main repository should be at svn.perl.org/parrot. Hurrah!

There were, of course, wrinkles to iron out.

The imcc/ Subdirectory

Matt Diephouse wondered if, now that IMCC has been integrated with Parrot, we really need the imcc/ subdirectory. He suggested that maybe its contents should be distributed about the rest of the Parrot directory structure. MrJoltCola (Melvin Smith?) thought it was best kept separate and thought of as a front end. Bernhard Schmalhofer pointed out PAST, another Parrot front end, and suggested that it may make sense to refactor imcc/main.c into (he suggests) src/main.c and imcc/frontend.c, which would make the distinction rather clearer and provide an opportunity to clean up the exported symbols. Leo pretty much agreed with Melvin (no comment on Bernhard's suggestions yet, though).

Perl Jobs for the Willing

Leo looked for volunteers to rejig t/src/manifest.t to use .svn/Entries instead of CVS/Entries when checking the MANIFEST. Michael Schwern (possibly accidentally) volunteered.

Meanwhile, in perl6-language

A quick note about notation: I've started borrowing notation from Ruby/Smalltalk to discuss methods. Where I write SomeClass#method, then I am referring to an instance method of SomeClass and where I write SomeClass.method, I am referring to a class method.

Identity Tests and Comparing Two References

By heck, but I've not been keeping up.

I started understanding what was going on when people started talking about implicit dereferencing of long chains of references. Larry's saying that even if $foo is a reference to a reference to a reference to a ... to "10," then $$foo will chase all the way along the reference chain and evaluate to "10." The general response seemed to be, "Wah! How do I make it not do that?"

How to help Larry

say What?

Ovid wondered if say (and print, come to that) should default to printing the current topic. An initial hunt through the Perl 6 documents proved to be "like trying to sip through a firehose," so he asked the list (and dropped a heavy hint about indexing the docs).

According to Luke, it should default to the current topic.

Blocks, Continuations, and eval()

Wolverian's been looking at the Perl 5 debugger and wondered if it would be possible to add an eval method to objects that represent scopes. The idea is that:

$scope.eval 'say $foo'

would evaluate say $foo with all of $scope's bindings, etc. in place. (I wonder what $scope.eval 'return' would do.) At least, that's what I think he meant. Others asked for clarification. Wolverian also wanted to know how to get hold of a scope's continuation (or at least the current continuation). Larry has in the past said that he wouldn't expose continuations in the core language. Others have noted that it wouldn't be beyond the bounds of possibility to write a Parrot-level module which would expose them, though.

Questions on $pair.kv

Stevan Little had some questions about the behavior of the Pair#kv method. Luke came through with the answer ("when all else fails, consider the pair to be a one-element hash").

Managing PLEAC

Prompted by a suggestion from Tim Bunce, Ovid started porting the examples in the Perl Cookbook into idiomatic Perl 6. He asked for comments and suggestions on how to proceed.

Marcus Adair proposed, and Luke Palmer strongly seconded, moving the development of the code onto a mediawiki (wikipedia)-style Wiki that has good support for "offline" discussion of code as it develops. Autrijus reckoned that his current practice of handing out SVN committer bits to anyone who expressed an interest and leaving discussion in the files themselves seems to be working pretty well so far. He pointed at pugs.kwiki.org, though.

Collaborative Synopses

Brian Ingerson posted a preliminary cut of Synopsis 26 and asked for comments. Yuval Kogman pointed out that the docs/ subdirectory of the Pugs distribution is filled with documentation that needed proofreading and nitpicking. Go to it, people.

Aliasing Swapped Values

Ovid wondered what:

($x,$y) := ($y, $x)

would do.

Juerd reckoned the answer is straightforward, and I must say I agree with him.

String#chars in a List Context

Marcus Adair argued that it seems natural that String#chars, when used in a list context, should return a list of the Unicode chars in the string. Opinion seemed to favor the idea, but there's been no ruling from Larry (or anyone else on the Design Team).

Whither use English

David Vergin wondered if Perl 6 would have an equivalent of use English, which will give sensible names to the various magic globals (those that still exist, at least). The answer: yes and no. There will be no English.pm module, but the magic globals will all have English names by default.

Heredocs and Their Workings

Marcus Adair wondered about the use of Heredocs as positional parameters. Luke confirmed that they should work just like they do in Perl 5, modulo minor matters of spelling (now qq::to/END/, etc.) and whitespace removal.

Slicing Conflict

Luke pointed out that:

my @a = (1,2,3,4);
my @b = @a[1...];
say +@b;

is potentially problematic (he argues it should print "3", but Perl 5 semantics imply that it should print "Inf"). He proposed breaking with the Perl 5 way. Autrijus agreed with him and has implemented his proposal in Pugs.

Coo--That Was Fun, I Think I'll Do It Again Some Time

If you find these summaries useful or enjoyable, please consider contributing to the Perl Foundation to help support the development of Perl.

Or you can check out my website. Maybe now that I'm back writing stuff, I'll start updating it. There are also vaguely pretty photos by me.

That's quite enough shameless self-promotion. See you in two weeks.

Building Good CPAN Modules

When you are planning to release a module to CPAN, one of your first tasks is to figure out what OS, Perl version(s), and other environments you will and will not support. Often, the answers will come from what you can and cannot support, based on the features you want to provide and the modules and libraries you have used.

Many CPAN modules, however, unintentionally limit the places where they can work. There are several steps you can take to remove those limitations. Often, these steps are very simple changes that can actually enhance your module's functionality and maintainability.

It Runs On My Machine

You have the latest PowerBook, update from CPAN every day, and run the latest Perl version. The people using your module do not. Remember, just because an application or OS is older than your grandmother doesn't mean that it isn't useful anymore. Code doesn't spontaneously develop bugs over time, nor does it collect cruft that makes it run slower. Some vitally important applications have run untouched for 30+ years in languages that were deprecated when you were in diapers. These applications keep the lights on and keep track of all the money in the world, for example, and they typically run on very old computers.

Companies want to keep using their older systems because these systems work and they want to use Perl because Perl works everywhere. If you can leverage CPAN, you already have 90 percent of every Perl application written.

When in Rome

Perl runs on at least 93 different operating systems. In addition, there are 18 different productionized Perl 5 versions floating around out there (not counting the development branches and build options). 93 x 18 = 1674. That means your module could run on one of well over 1500 different OS/Perl version environments. Add in threading, Unicode, and other options, and there is simply no way you can test your poor module in all of the places it will end up!

Luckily, Perl also provides (many of) the answers.

Defining Your Needs

If you know that your module simply will not run in a certain environment, you should set up prerequisites. These allow you to provide a level of safety for your users. Prerequisites include:

  • OSes that your module will not run under

    Check $^O and %Config for this. $^O will tell you the name of the operating system. Sometimes, this isn't specific enough, so you can check %Config.

    use Config;
    
    if ( $Config{ osname } ne 'solaris' || $Config{ osvers } < 2.9 )
    {
        die "This module needs Solaris 2.9 or higher to run.\n";
    }

    It's usually better to limit yourself to a specific set of OSes that you know to be good. As your module's popularity grows, users will let you know if it works elsewhere.

  • Perl versions/features

    Check $] and %INC for this. $] holds the Perl version and %INC lists the Perl modules loaded so far. (See the Threading section for an example.) If your module simply cannot run in Perl before a certain version, make sure you have a use 5.00# (where # is the version you need) within your module. Additionally, Module::Build allows you to specify a minimum Perl version in the requires option for the constructor.

  • Modules/libraries

    In ExtUtils::MakeMaker, you can specify a PREREQ_PM in your call to WriteMakefile() to indicate that your module needs other modules to run. That can include version numbers, both the minimum and maximum acceptable. Module::Build has a similar feature with the requires option to the constructor.

    If you depend on external, non-Perl libraries, you should see if they exist before continuing onwards. Like everything else, CPAN has a solution: App::Info.

    use App::Info::HTTPD::Apache;
    
    my $app = App::Info::HTTPD::Apache->new;
    
    unless ( $app->installed ) {
        die "Apache isn't installed!\n";
    }
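
One thing to watch when checking OS versions from %Config is that a dotted version string such as "2.10" compares as smaller than "2.9" when treated as a plain number. The sketch below compares versions one component at a time instead. The helper names version_parts, version_cmp, and platform_ok are hypothetical and not part of Config or any CPAN module.

```perl
use strict;
use Config;

# Split "2.10" into (2, 10); non-numeric pieces count as 0.
sub version_parts {
    my ($version) = @_;
    my @parts;
    for my $piece ( split /\./, $version ) {
        push @parts, ( $piece =~ /^(\d+)/ ? $1 : 0 );
    }
    return @parts;
}

# Returns -1, 0, or 1, like <=>, comparing component by component.
sub version_cmp {
    my ($left, $right) = @_;
    my @l = version_parts($left);
    my @r = version_parts($right);
    while ( @l || @r ) {
        my $x = @l ? shift @l : 0;
        my $y = @r ? shift @r : 0;
        return $x <=> $y if $x != $y;
    }
    return 0;
}

# True if the OS name matches and the version is at least $min_ver.
sub platform_ok {
    my ($osname, $osver, $want_os, $min_ver) = @_;
    return $osname eq $want_os && version_cmp( $osver, $min_ver ) >= 0;
}

print "This Perl reports $Config{osname} $Config{osvers}\n";
print "Solaris 2.9 or higher: ",
    ( platform_ok( $Config{osname}, $Config{osvers}, 'solaris', '2.9' ) ? "yes" : "no" ), "\n";
```

A module that needs such a check would typically run it in Makefile.PL or Build.PL, so the user finds out before installation rather than at run time.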

Operating System

What OS your module happens to land on is both less and more of an issue than most people realize. Most of us have had to work in both Unix-land and Windows-land, so we know of pitfalls with directory separators and hard-coding outside executables. However, there are other problems that only arise when your module lands in a place like VMS.

The VMS filesystem, for example, has the idea of a volume in a fully qualified filename. VMS also handles file permissions and file versioning very differently than the standard Unix/Win32/Mac model. An excellent example of how to handle these differences is the core module File::Spec.

Because this is an issue most authors have had to face at some point, there is a standard perlpod called, fittingly, perlport. If you follow what's in there, you will be just fine.
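
As a small illustration of leaning on File::Spec instead of hard-coding separators, here is a minimal sketch. The path components ('lib', 'My', 'Module.pm') are made up for the example.

```perl
use strict;
use File::Spec;

# Build a path from its pieces; File::Spec picks the right syntax
# for whatever OS this happens to run on.
my $path = File::Spec->catfile( 'lib', 'My', 'Module.pm' );

# Take it apart again: volume (empty on Unix), directories, file name.
my ($volume, $dirs, $file) = File::Spec->splitpath($path);

# Ask for a temporary directory rather than assuming /tmp exists.
my $tmp = File::Spec->tmpdir;

print "Path:     $path\n";
print "File:     $file\n";
print "Temp dir: $tmp\n";
```

The same script should run unmodified on other platforms, because no separator or volume syntax appears in the source.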

Perl Version

It's been over ten years since the release of Perl 5.0.0, and Perl has changed a lot in that time. Most installations, however, are not the latest and greatest version. The main reason is "If it ain't broke, don't fix it." There is no such thing as a safe upgrade.

Most applications have no need for the latest features and will never trip most of the bugs or security holes. They just aren't that complex. If you restrict your module to features only found in 5.8, or even 5.6, you will ignore a large number of potential users.

Security Improvements

Most security fixes are transparent to the programmer. If the algorithms behind Perl hashes improve, you won't see it. If a new release fixes a hole in suidperl, your module won't care.

Sometimes, however, a security fix is a new feature whose usage will (and should) become the accepted norm: for example, the three-arg form of open() introduced in 5.6. In these cases, I use string-eval to try the new feature and fall back to the old one if it doesn't compile. (Checking $] here isn't helpful because a pre-5.6 Perl will still try to compile the three-arg form and complain.)

eval q{
    open( INFILE, ">", $filename )
          or die "Cannot open '$filename' for writing: $!\n";
}; if ( $@ ) {
    # Check to see if it's a compile error
    if ( $@ =~ /Too many arguments for open/ ) {
        open( INFILE, "> $filename" )
            or die "Cannot open '$filename' for writing: $!\n";
    }
    else {
        # Otherwise, rethrow the error
        die $@;
    }
}

Bug Fixes

Like security fixes, most bug fixes are transparent to the programmer. Most of us didn't notice that the hashing algorithm was less than optimal in 5.8.0 and had several improvements in 5.8.1. I know I didn't. In general, these will not affect you at all.

Unlike security fixes, if your module breaks on a bug in a prior version of Perl, there's probably not much you can do other than require the version where the bug fix occurred.

New Features

Everyone knows about use warnings; and our appearing in 5.6.0. You may, however, not know about the smaller changes. A good example is sorting.

5.8.0 changed sorting to be stable. This means that if two items compare as equal, the resulting list will preserve their original order. Prior versions of Perl made no such guarantee. This means that code like this may not do what you expect:

my @input = qw( abcd abce efgh );
my @output = sort {
    substr( $a, 0, 3 ) cmp substr( $b, 0, 3 )
} @input;

If you depend on the fact that @output will contain qw( abcd abce efgh ), your module may run into problems on versions prior to 5.8.0. @output could contain qw( abce abcd efgh ) because the sorting function considers abcd and abce identical.
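
One way to avoid depending on the sort implementation is to sort positions rather than values and break ties on the original position. This is a sketch of that general technique, not something the article prescribes; stable_sort_by_prefix is a hypothetical name.

```perl
use strict;

# Sort by the first three characters, keeping the original order of
# items whose prefixes are equal, whatever sort algorithm Perl uses.
sub stable_sort_by_prefix {
    my @input = @_;
    my @order = sort {
        substr( $input[$a], 0, 3 ) cmp substr( $input[$b], 0, 3 )
            or $a <=> $b    # tie-breaker: original position
    } 0 .. $#input;
    return @input[@order];
}

my @output = stable_sort_by_prefix( qw( abce abcd efgh abca ) );
print "@output\n";    # prints "abce abcd abca efgh"
```

Because the tie-breaker makes every comparison unambiguous, the result no longer depends on whether the underlying sort is stable.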

Gotchas With OS and Perl Versions

Your module may be pristine when it comes to OS or Perl versions. Is the rest of your distribution? Your tests may betray a dependency that you weren't aware of.

For example, 5.6.0 added lexically scoped warnings. Instead of using the -w flag to the Perl executable, you can now say use warnings. Because enabling warnings is generally a good thing, this is a very common header for test files written by conscientious programmers using Perl 5.6.0+:

use strict;
use warnings;

use Test::More tests => 42;

Now, even if your module runs with Perls older than 5.6.0, your tests won't! This means your distribution will not install through CPAN or CPANPLUS. Administrators who install modules this way, and who have better things to do than debug a module's tests, won't install it.
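
A test header that still enables warnings but avoids the warnings pragma might look like the sketch below. It uses the -w switch and the $^W variable, both of which predate 5.6.0; the two tests are placeholders.

```perl
#!/usr/bin/perl -w
use strict;

# $^W is the global warnings switch that -w sets. Unlike "use warnings",
# it is available on Perls older than 5.6.0.
BEGIN { $^W = 1 }

use Test::More tests => 2;

ok( $^W, 'warnings are switched on' );
is( 1 + 1, 2, 'placeholder test' );
```

The trade-off is that $^W is global rather than lexically scoped, so it also turns on warnings inside any modules the test loads.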

Major New Features

Some new features are so large that they change the name of the game. These include Unicode and threading. Unicode has had support, in one form or another, in every version of Perl 5. That support has slowly moved from modules (such as Unicode::String) to the Perl core itself.

Threading

In 5.8.0, Perl's threading model changed from the 5.005 model (which never worked very well) to ithreads (which do). Additionally, multi-core processors are coming to the smaller servers. More and more, developers using 5.8+ choose to write threaded applications.

This means that your module might have to play in a threaded playground, which is a weird place indeed to process-oriented folks. Now, Perl's threading model is unshared by default, which means that global variables are safe from clobbering each other. This is different from the standard threading model, like Java's, which shares all variables by default. Because of this decision, most modules will run under threads with little to no changes.

The main issue you will need to resolve is what happens with your stateful variables. These are the variables that persist and keep a value in between invocations of a subroutine, yet need coordination across threads. A good example is:

{
    my $counter;
    sub next_value { return ++$counter; }
}

If you depend on this counter being coordinated across every invocation of the next_value() subroutine, you need to take three steps.

  • Sharing

    Because Perl doesn't share your variables for you, you must explicitly share $counter to make sure that it is correctly updated across threads.

  • Locking

    Because a context-switch between threads can happen at any time, you need to lock $counter within the next_value() subroutine.

  • Version safety

    Also, because ithreads is an optional 5.8.0+ feature and the lock() subroutine is undefined before 5.6.0, you may want to do some version checks.

    {
        my $counter = 0;
        if ( $] >= 5.008 && exists $INC{'threads.pm'} ) {
            require threads::shared;
            import threads::shared qw(share);
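            # share() is imported at run time, so its prototype is not
            # in effect here; pass the reference explicitly
            share( \$counter );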
        }
        else {
            *lock = sub (*) {}
        }
    
        sub next_value {
            lock( $counter );
            $counter++;
        }
    }

The best description that I've seen of what you need to do to make your application work successfully under threads is "Where Wizards Fear to Tread" on Perl 5.8 threads.

Unicode

Although Unicode had some support prior to 5.8.0, a major feature in 5.8.0 was the near-seamless handling of Unicode within Perl itself. Prior to that, developers had to use Unicode::String and other modules. This means that you should handle strings as gingerly as possible if you consider support for Unicode on Perls prior to 5.8.0 important. Luckily, most major modules already do this for you without you having to worry about it.

Discussing how to handle Unicode cleanly is an article in itself. Please see perlunicode and perluniintro for more information.

Playing Nicely with Others

If you're like me, you heard "Doesn't play well with others" a lot in kindergarten. While that's an admirable trait for a hacker, it's not something to praise in any modules that production systems depend upon. There are several common items to look out for when trying to play nicely with others.

Persistent Environments

Persistent environments, like mod_perl and FastCGI, are a fact of life. They make the WWW work. They are also a very different beast than a basic script that runs, does its thing, and ends. Basically, a persistent environment, such as mod_perl, does a few things.

  • Persistent interpreter

    Launching the Perl executable is expensive, relatively speaking. In an environment such as a web application, every request is a separate invocation of a Perl script. Persistence keeps a Perl interpreter around in memory between invocations, reducing the startup overhead dramatically.

  • Forked children

    In order to handle multiple requests at once, persistent environments tend to provide the capability for forked child processes, each with its own interpreter. Normally, this requires a copy of each module in every child's memory area.

  • Shared memory

    Nearly every request will use the same modules (CGI, DBI, etc). Instead of loading them every time, persistent environments load them into shared memory that each of the child processes can access. This can save a lot of memory that would otherwise be required to load DBI once for every child. This lets the same machine create many more children and handle many more requests simultaneously.

Caching needs a special mention. Because most persistent environments load most of the code into shared memory before forking off children, it makes sense to load as much code that won't change as possible before forking. (If the code does change, the child process receives a fresh copy of the modified memory space, reducing the benefit of shared memory.) This means that modules need to be able to pre-load what they need on demand. This is why CGI, which normally defers loading anything as much as possible, provides the :all option to load everything at once.

The mod_perl folks have an excellent set of documentation as to what's different about persistent environments, why you should care, and what you need to do for your module to work right.

Overloading

It's very easy to create an overloaded class that cannot work with other overloaded classes. For example, if I'm using Overload::Num1 and Overload::Num2, I would expect $num1 + $num2 to DWIM. Unfortunately, with most overloaded classes written as below, they won't. (For more information as to how this code works, please read overload, or the excellent article "Overloading.")

sub add {
    my ($l, $r, $inv) = @_;
    ($l, $r) = ($r, $l) if $inv;

    $l = ref $l ? $l->numify : $l;
    $r = ref $r ? $r->numify : $r;

    return $l + $r;
}

Overload::Num1 uses the numify() method to retrieve the number associated with the class. Overload::Num2 uses the get_number() method. If I tried to use the two classes together, I would receive an error that looks something like Can't locate object method "numify" via package "Overload::Num2".

The solution is very simple--don't define an add() method. Define a numify (0+) method, set fallback to true, and walk away. You don't need to define a method for each operation. You only need to do so if you have to do something special as part of that operation. For example, complex numbers have to add the real and imaginary parts separately.
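
Here is a minimal sketch of that advice. The class names come from the example above, but their internals (a hash-based object in one, an array-based object in the other) are assumptions made for illustration. Neither class defines add(); each only tells overload how to turn itself into a number.

```perl
use strict;
use warnings;

# Overload::Num1 and Overload::Num2 are the example names used above;
# their internals here are assumptions made for illustration.
package Overload::Num1;
use overload '0+' => sub { $_[0]->numify }, fallback => 1;
sub new    { my ($class, $n) = @_; return bless { value => $n }, $class }
sub numify { return $_[0]{value} }

package Overload::Num2;
use overload '0+' => sub { $_[0]->get_number }, fallback => 1;
sub new        { my ($class, $n) = @_; return bless [ $n ], $class }
sub get_number { return $_[0][0] }

package main;

my $num1 = Overload::Num1->new(3);
my $num2 = Overload::Num2->new(4);

# Neither class defines '+' or '*', so Perl numifies each operand
# through its '0+' handler and then performs ordinary arithmetic.
my $sum     = $num1 + $num2;
my $product = $num2 * 10;

print "$sum $product\n";    # prints "7 40"
```

The results are plain numbers rather than objects of either class, which is the trade-off of relying on fallback instead of writing add() yourself.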

If you absolutely have to define add(), though, use something like this:

sub add {
    my ($l, $r, $inv) = @_;
    ($l, $r) = ($r, $l) if $inv;

    my $pkg = ref($l) || ref($r);

    # This is to explicitly call the appropriate numify() method
    $l = do {
        my $s = overload::Method( $l, '0+' );
        $s ? $s->($l) : $l
    };

    $r = do {
        my $s = overload::Method( $r, '0+' );
        $s ? $s->($r) : $r
    };

    return $pkg->new( $l + $r );
}

This way, each overloaded class can handle things its way. The assumption, you'll notice, is to bless the return value into the class whose add() the caller called. This is acceptable; someone called its method, so someone thought it was top dog! (If you have an add method, no numify method, and fallback activated, you will enter an infinite loop because numify falls back to $x + 0.)

Finding Out What Something Is

At some point, your module needs to accept some data from somewhere. If you're like me, you want your module to DWIM based on what data it has received. Eventually, you want to know "Is it a scalar, arrayref, or hashref?" (Yes, I know there are seven different types in Perl.) There are many, many ways to do this. Some even work.

  • ref()

    ref() is the time-honored way to dispatch based on datatype, resulting in code that looks like:

    my $is_hash = ref( $data ) eq 'HASH';

    The problem is that ref( $data ) will return the class name of $data if it's an object. If someone has defined a class named HASH (don't do that!) that uses blessed array references, this will also break spectacularly.

  • isa()

    isa() will tell you whether a reference inherits from a class. The various datatypes are actually class-like. Some people suggest writing code like:

    my $is_hash = UNIVERSAL::isa( $data, 'HASH' );

    This will work whether or not $data is blessed. Again, though, if someone is mean enough to call a class HASH and bless an arrayref into it, you'll have trouble. Worse, this technique may break polymorphism spectacularly if $data is an object with an overridden isa() method.

  • eval blocks

    Just try the data as a hashref and see if it succeeds.

    my $is_hash = eval { %{$data}; 1 };

    This avoids the primary issue of the two options listed above, but this may unexpectedly succeed in the case of overloaded objects. If $data is a Number::Fraction, you will mistakenly use $data as a hash because Number::Fraction uses blessed hashes for objects, even though the intent is to use them as scalars.

  • Assume that objects are special

    By using Scalar::Util's blessed() and reftype() functions, you can determine if a given scalar is a blessed reference or what type of reference it really is. If you want to find out if something is a hash reference, but you want to avoid the pitfalls listed above, write:

    my $is_hash = ( !blessed( $data ) && ref $data eq 'HASH' );
    # or
    my $is_hash = reftype( $data ) eq 'HASH';

    Nearly every use of overloading is to make an object behave as a scalar, as in Number::Fraction and similar classes. Using this technique allows you to respect the client's wishes more easily. You will still miss a few possibilities, such as (the somewhat eccentric) Object::MultiType (an excellent example of what you can do in Perl, if you put your mind to it).

    My personal preference is to let $data tell you what it can do.

  • Object representations

    Not all objects are blessed hashrefs. I like to represent my objects as arrayrefs, and other people use Inside-Out objects which are references to undef that work with hidden data. This means that my overloaded numbers are arrays, but I want you to treat them as scalars. Unless you ask $data how it wants you to treat it, how will you handle it correctly?

  • Overloading accessors

    overload allows you to overload the accessor operators, such as @{} and %{}. This means that one can theoretically bless an array reference and provide the ability to access it as a hash reference. Object::MultiType is an example of this. It is a hashref that provides array-like access.

    Unfortunately, the CPAN module that would do this doesn't exist, yet.
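
One way to "let $data tell you what it can do" can be sketched with Scalar::Util and overload::Method(). The helper acts_as_hash() and the two demonstration classes are hypothetical names, and the policy it encodes (an object counts as a hash only if it overloads %{}) is an assumption for illustration, not an established convention.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);
use overload ();

# acts_as_hash() is a hypothetical helper, not an existing CPAN function.
sub acts_as_hash {
    my ($data) = @_;
    return 0 unless ref $data;

    # Objects count as hashes only if they say so by overloading %{}.
    if ( blessed($data) ) {
        return overload::Method( $data, '%{}' ) ? 1 : 0;
    }

    # Unblessed references: trust the underlying type.
    return reftype($data) eq 'HASH' ? 1 : 0;
}

# A blessed arrayref that offers hash-style access (illustrative class).
package My::HashLike;
use overload '%{}' => sub { +{ first => $_[0][0] } }, fallback => 1;
sub new { my $class = shift; return bless [@_], $class }

# A blessed hashref meant to be treated as an opaque object.
package My::Opaque;
sub new { return bless {}, shift }

package main;

my @cases = (
    [ 'plain hashref'    => { a => 1 } ],
    [ 'plain arrayref'   => [ 1, 2 ] ],
    [ 'plain string'     => 'hello' ],
    [ 'hash-like object' => My::HashLike->new(42) ],
    [ 'opaque object'    => My::Opaque->new ],
);

for my $case (@cases) {
    my ( $label, $data ) = @$case;
    printf "%-18s %s\n", $label, acts_as_hash($data) ? 'hash' : 'not a hash';
}
```

With this policy, a blessed hashref that never asked to be treated as a hash is left alone, while a blessed arrayref that overloads %{} is accepted.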

Letting Others Do Your Dirty Work

The modules that you and I use on a daily basis are, in general, as OS-portable, version-independent, and polite as possible. This means that the more your module depends upon other modules to do the dirty work, the less you have to worry about it. Modules like File::Spec and Scalar::Util exist to help you out. Other modules like XML::Parser will do their jobs, but also handle things like any Unicode you encounter so that you don't have to.

That said, you still have to be careful about whom your young module fraternizes with. Every module you add as a dependency is another module that can restrict where your module can live. If one of your module's dependencies is Windows-only, such as anything from the Win32 namespace, then your module is now Windows-only. If one of your dependencies has a bug, then you also have that bug. Fortunately, there are a few ways to bypass these problems.

  • Buggy dependencies

    Generally, module authors fix bugs relatively quickly, especially if you've provided a test file that demonstrates the bug and a patch that makes those tests pass. Once your module's dependency has a new version released, you can release a new version that requires the version with the bug fix.

  • OS-specific dependencies

    The first option is to accept it. If no one on Atari MiNT cares, then why should you? Alternatively, you can encapsulate the OS-dependent module and find another module that provides the same features on the OS you're trying to support. File::Spec is an excellent example of how to encapsulate OS-specific behavior behind a common API.
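
As a small illustration of leaning on File::Spec instead of hard-coding a path separator, the following sketch builds and splits a path using whatever rules apply on the current platform. The directory and file names are placeholders.

```perl
use strict;
use warnings;
use File::Spec;

# The directory and file names are placeholders for illustration.
my $path = File::Spec->catfile( 'lib', 'My', 'Module.pm' );

# splitdir() breaks the path apart again using the platform's rules.
my @parts = File::Spec->splitdir($path);

print "$path\n";
print scalar(@parts), " components\n";
```

The calling code never mentions "/" or "\", so the same lines work wherever File::Spec does.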

There's a lot to keep in mind when writing a module for CPAN: OS and Perl versions, Unicode, threading, persistence--it can be very overwhelming at times. With a few simple steps and a willingness to let your users tell you what they need, you'll be the toast of the town.

This Fortnight in Perl 6, March 22 - April 3, 2005

Perl 6 Language

ceil and floor

Ingo Blechschmidt wondered if ceil and floor would be in the core. Warnock applies, although Unicode operators would let me define circumfix \lfloor \rfloor (although I only know how to make those symbols in TeX). Hmmm...using TeX to write Perl 6 code is an interesting idea. At least then I could figure out how to make all the special symbols. Maybe someone should make a package for that.

s/// object?

Stevan Little wanted to know if s/// could return some sort of magic object to poke or prod. Larry said "no".

Markup-Like Features

Michele Dondi asked if Perl 6 would have markup-like features in it. Luke Palmer asked for a more full explanation of what he meant. Warnock applies.

The Many Moods of does

Thomas Sandlaß wondered if anyone would actually write S14 or if does ate up tie/bless, enumerating the many powers of does. Larry explained that does will probably have mutated bless and then explained the contexts under which does performs each of its powers.

Even More Moods of does

To follow up on his question about does, Thomas Sandlaß wondered about is, specifically whether it stubbed or initialized its variable. Larry explained that is would probably initialize its variable and explained how to use is Ref to stub but not initialize something.

Perl 5 -> Perl 6 Converter

Adam Kennedy dropped a line to the list about PPI 0.903, which could form a good base for a Perl 5 to Perl 6 converter. Larry explained that he is actually using PPD (the actual Perl parser) to construct such a tool. He also explained how he was going to do it. Actually, it's a really cool approach for those of you who like elegant design approaches. You should check it out. I'll give you a hint: it starts by writing a glorified version of cat.

p5 Library Compatibility?

Andrew Savige wondered if p6 would maintain the interface for most p5 libraries. chromatic almost died of fright from the suggestion. Juerd suggested a deprecated namespace for such things. Larry gave him a Ponie instead. Later, Larry thought that perhaps a special namespace for those libraries that could be automatically converted might be appropriate.

Follow-up

Importing Constants and Header Files

Song10 wondered if there was an easy way to import constants from a module and not have to specify their full scope in the includer's file. Larry explained that p6 will have "policy" modules which would allow this. He then began to let his mind explore the possibility of allowing these modules to return a string to evaluate in the user's scope. Then he realized how nasty textual inclusion was in C and C++, and figured that a hygienic policy would be better.

Giving Lists More Operators

Juerd constructed a table of string, integer, and list operators. He noticed that the list section had blank spots where string and integer both had items. He then suggested quite a few more operators to fill these blanks. This morphed into a discussion of code complexity and reading code.

String Pain

Chip wondered what exactly set str apart from Str and the impact this difference had on Unicode support. Larry and Rod Adams explained that str specifies a raw bytes view of strings and requires explicit casts between different Unicode levels.

xx on Coderefs

Luke Palmer wondered if the xx operator could include overloading to run a coderef repeatedly and slap the results together. Others liked it, but there was no word from on high.

Running Pugs

Adam Preble had some strange problems with Pugs' make install target. Warnock applies.

Manipulating Many Dimensional Hashes

Zhuang Li wanted to know how to manipulate hashes of unknown dimension. Luke Palmer provided the answer.

Semantics of Currying

Yuval Kogman has been implementing currying in Pugs. As such, he has found some of the under-specified corner cases. Thus he, Larry, Luke Palmer, and Thomas Sandlaß delved into these mysteries.

Multi-paradigmatic Perl 6

Someone named Eddie posted a fairly long message to p6l on the Google Groups interface suggesting that Perl 6 support delegation and other programming paradigms. Sadly, no one told him that it already does both of those things, because nobody saw his email. Google Groups does not send messages back to the list.

NB for Pugs on Low Memory Machines

Adam Preble posted a helpful warning about installing Pugs on machines with less than 200 MB of memory. Unfortunately he also posted it to Google Groups. People should stop doing that. Is there some way to tell Google to prevent them from doing that?

PLEAC

Tim Bunce suggested that people could add programming examples to PLEAC for Perl 6. Of course they should run in Pugs if they are being released to the world at large.

Annotating Code with Documentation

Chip wants to be able to document his code by attaching documentation directly to it. This would allow for nifty introspective features. Larry pointed out that code will have access to the surrounding POD.

Typo in S03

Aaron Sherman pointed out a typo in S03. Luke Palmer explained that dev.perl.org did not mirror the svn tree just yet. Juerd found one too and received the same answer. This time, though, Robert Spier put in the necessary magic so that dev.perl.org would update from svn.perl.org.

Optimization Hints

Yuval Kogman noted that Perl 6 has some ability to provide lexically scoped hints and suggested a few more things that might be hintable. Larry opened the door for him to try to design such features.

S29 Update

Rod Adams' efforts to update S29 continue to push a very large thread about things including numification of strings and various core operators.

String Positions

Aaron Sherman wanted a more OO way to look at the OS. Larry did not really agree but suggested that someone could create a proxy object which would reference all of those globals. Then a conversation about having units attached to numbers sprang up. That sounds like a good module to me.

modify and assign Operators

Andrew Savige wondered if there was a complete list of operators anywhere, because he could not find ~^= (string xor) documented anywhere. Larry explained that the assign should probably be a meta operator to allow for better extensibility.

p5 -> p6 Guide

Adam Preble wondered if there was a basic p5 -> p6 guide. Unfortunately he posted to Google Groups.

$_.method vs $self.method

The debate about whether .method should mean $self.method or $_.method continued. $self is still winning.

Typo Problems

It seems that Juerd has typing problems. He wanted to know if he could form a support group. Apparently he can only if he uses vim.

Renaming Flattening and Slurp

Terrence Brannon wants to change the terms "flatten" and "slurp" to something else. Larry told him that this usage was unlikely to change.

How Read-Only is Read-Only?

Chip wondered how deep read-only-ness or is copy-itude went on arguments. The answer appears to be: shallow. This led to a very long discussion of how much type checking will actually occur.

pick on Non-Junctions

Ingo Blechschmidt wondered what pick would do on an array or a hash. Many folk explained that it would remove and return an item or pair from the container respectively. Larry commented that pick on a hash could be harder than it looks.

Built-in Multi Methods

Wolverian wondered if some of the common functions called on strings would actually be methods. Larry answered that they would more likely be multis to allow for easier extension.

Comparing Two References

Darren Duncan wanted to know if =:= was the correct operator for testing if two variables refer to the same object. Larry explained that it was. This led to a debate about how easily people can deal with chains of references in Perl 6.

Perl 6 Compiler

Pugs Test Failures

Will Coleda worriedly reported 115 failing subtests in Pugs. Stevan Little explained that this was normal between releases and was really more of a TODO list than a problem.

Pugs Darcs Repo

Greg Buchholz noticed that the darcs repo for Pugs has trouble staying up to date. Tupshin Harper suggested using darcs whatsnew --look-for-adds --summary to find the offending files.

BEGIN {} Time

Autrijus wondered when BEGIN should run. Markus Laire posted a useful summary of when the various CAPITAL things should run. Larry confirmed Autrijus' suspicion.

YAML Test Output

Nathan Gray wondered if he should change his tests log to YAML output. Stevan Little pointed him to Test::Harness::Straps, which can collect test output and transform it.

Semicolons in p6

Andrew Savige found some strange behavior with respect to statement separation in Pugs. He thought that perhaps semicolons had changed their status. They haven't.

Ugly Dog Meets Ugly Bird

Pugs r2^10 can compile p6 to imc which Parrot can run. I think I speak for everyone when I say "Wow. Nice work, Autrijus."

String Interpolation and Various Special Variables

Andrew Savige noticed a couple of odd corner cases for string interpolation in Pugs. This led to a discussion of which special variables (like $!, $/, and $") will continue to exist.

Code Coverage Metadata

Paul Johnson posted a list of requirements he would like to see satisfied so that he can easily generate Perl 6 coverage reports. Warnock applies.

Pugs Release

As is the fashion, Pugs went through two minor releases during this two week period: 6.0.13 and 6.0.14.

Text Editor Support for Perl 6

Darren Duncan suggested that it might be a good idea for people to begin prepping their favorite text editors to handle Perl 6 syntax correctly. Why stop at syntax? I know I want to be able to type :perl6do in vim.

Makefile.PL

Darren Duncan noticed that most things in Pugs use Perl 6, while Makefile.PL was still Perl 5. He suggested writing the Makefile.PLs in various modules in Perl 6 also. He then began work on a Pugs::Makemaker module.

Work Begins

Pugs to Become a Perl 6 -> Parrot AST Compiler

Autrijus explained that he was planning on steering Pugs toward becoming a Perl 6 -> Parrot compiler which would interpret code (when Parrot is not available) by mapping imc to Haskell.

Pugs Re-org

Stevan Little suggested rearranging the Pugs repository a bit. The end result is that modules which don't run in Pugs (yet) should go in modules/ while those which do should go in ext/.

YAPC::NA Pugs hack-a-thon

John Macdonald posted his plan for the YAPC::NA Pugs hack-a-thon. His description of the location makes me want to take time off work to go.

split Semantics

Stevan Little found a bug in Pugs' split. Autrijus fixed it, but noted that he had not replicated the full, bizarre semantics of Perl 5 (which come from awk). Larry told him not to work too hard on it, as it would probably work in Perl 6 through a separate function.

Statement Modifiers

A bug in Pugs' parsing led Autrijus to seek information from a higher authority. Larry explained the power of statement modifiers.

Parrot

Move Perl Tests Out of pmc/pmc.t

Steven Schubiger volunteered to reorganize pmc/*.t last time. He did it, and Leo applied the patches.

Areas of Focus

Chip, in a circumloquacious attempt to come up to speed, indirectly asked what design issues needed attention. Leo explained the CPS issues that bogged down Parrot of late.

Improving MinGW Docs

François Perrad provided a patch improving the documentation for building with MinGW. Leo applied part of it.

Moving pmc2c2.pl or pmc2c.pl

Matt Diephouse opened an RT ticket for cleaning up the file system (specifically pmc2c2?.pl).

The Learned Parrot

Christian Aperghis-Tramoni reported that he has had success using Parrot assembly as a teaching tool.

Performance and Parrot

Falcon posted a series of questions about Parrot in a fairly general sense. Unfortunately, because he posted it to Google Groups, Warnock applies.

First MMD call

Leo posted a first MMD call which uses an MMD PMC and a fair amount of hand-made calling-convention setup.

OpenBSD atan2 Trouble

Steve Peters noticed that atan2 on OpenBSD is not quite right.

API Change

Leo changed various packfile functions to take an Interp* argument. This does change the embedding API, but it had to be done.

pmc2c2.pl Cleanup

Leo pointed out that pmc2c2.pl was not functioning correctly on all platforms. He put out a call for interested parties. Matt Diephouse provided a patch to clean up the internals of pmc2c2.pl a bit. Leo applied it. Peter Sinnott returned a $ that got lost in the shuffle, and chromatic applied it. Matt went on to add better comments.

README.win32 Update

Klaas-Jan Stol provided an update to the README.win32 directions. Warnock applies.

SET_NON_ZERO_NULL

Chip wondered why Parrot had a SET_NON_ZERO_NULL macro and suggested removing it. The answer was, of course, speed. On architectures with a zero null, this can be a no-op allowing the use of calloc(). Otherwise it has to do something.

PMC Constants

Leo added support for .const things to imc. Unfortunately, the GC eats them so you can't use it yet.

Garbage Collection and Hash Creation

Cory Spencer's LISP implementation revealed a bug in the hash creation sequence. Leo fixed it.

MD5 Update

Nick Glencross provided an update to the MD5 library. Leo applied it.

Tcl Updates

Will Coleda has been updating Tcl. He moved the parser into a PMC. Then he tried to add octal and hex escapes only to discover missing transcodings. He also found missing hash functions, but Leo fixed that.

Logging IRC

Someone suggested that we start to log IRC. Chip suggested that this might not be cost effective as such logs are 99% dross and 1% value. He suggests instead that people paste the good part into emails for the list. I know that I, for one, would not volunteer to summarize IRC.

Segfault Generating config.fpmc

chromatic (as his Linux PPC is wont to do) found a bug in the build. He fixed it, and Leo applied the patch.

Lazy, Lazy Steve

Leo added a first implementation of a Lazy PMC for Autrijus to play with.

Win32 make install

François Perrad provided a patch fixing MANIFEST.generated for Win32. Warnock applies.

Parrot on Win32

Ron Blaschke spent some time fixing Parrot on Win32, extending it to provide a shared library.

mod_parrot

Jeff Horwitz released mod_parrot 0.2. It includes nifty features like the beginning of an interpreter pool.

C90 Cleanup

Peter Sinnott moved a few declarations further up. Leo applied the patch.

MMD on Argument Count

Leo added the ability to MMD on argument count and PMC types.

Documentation Typos

Offer Kaye fixed a few typos. chromatic applied the patch.

pmc freeze.t

Leo admired the trickiness of Bernhard Schmalhofer in his freeze implementation.

sys.t failure on MinGW

François Perrad fixed a test failure on MinGW. Leo applied it.

Builtin Namespaces Issue

Peter Sinnott pointed out some failing tests. Leo fixed them.

locate_runtime_file Error

Bob Rogers provided a patch to switch PARROT_TEST to 1 by default. Leo applied it.

Bytecode Re-entrance

Nigel Sandever had some questions about how Parrot and threading worked. Melvin Smith provided the answers.

Pugs Questions for Parrot FAQ

Nicholas Clark noted that the question of why bother with Parrot when one has Pugs has come up recently. The answer went into the Parrot FAQ: speed.

Pascal for Perl

Sven Schubert wondered if people had any suggestions for how to get PAPAgei (his Pascal for Parrot compiler) up and running quickly. Leo told him to stick with the tools he knows rather than going too far afield.

Infix Op Proposal

Leo posted his proposal for how to revamp infix ops. Nicholas Clark and Luke Palmer asked a few questions.

Lexical Pad Depth

Cory Spencer wondered how to find the depth of the lexical pad stack. Leo told him.

Win32 Exit Codes

Ron Blaschke noticed that there were tests failing on Win32 because the exit code was not in the high 8 bits, but appeared directly. Leo suggested looking to Perl 5 for prior art on what to do.

Other Languages on Parrot

Bloves wondered if any other compilers were currently working toward targeting Parrot. I pointed him to Cardinal, a Ruby compiler for Parrot that appears dead.

Parrot64

Adam Preble wondered if there has been any work on Parrot for AMD64. The answer is: some, but nobody told him because he posted to Google Groups.

Parrot Win32-Setup

François Perrad provided a patch that creates a standard binary distribution for Win32. There was some debate over the name of the make target, but François is ready to send an updated version at Leo's command.

Calling Convention Abstraction

Leopold Toetsch proposed a calling convention abstraction that would allow Parrot to change its ABI a little more freely in the future. Roger Hale asked a few questions which Leo answered.

No 0 Size Arrays

Ron Blaschke noticed a broken Windows build, because of a 0 sized array. Leo fixed it.

Unicode String Literal Assertion Failed

Will Coleda discovered a failing assertion in utf8.c. Leo fixed it.

NCI Call Signature Change

Leo changed the call signature for NCI to make I mean INTVAL and J mean Parrot Interpreter.

Builtin Infix Multis

Leo added support for MMD on infix multis.

touch vs utime

Chip asked if there was a reason that the TOUCH variable doesn't use utime. Michael G. Schwern suggested ExtUtils::Command. Steve Peters pointed out that utime works only on existing files.

make imcc.l For Modern Flex

Chip opened a TODO for updating imcc.l to modern flex.

Mac OS X Build Broken

Will Coleda reported a broken build on Mac OS X with undefined symbols. Leo found the cause and reverted it.

SVN Switch

After much debate, the decision to switch from CVS to SVN has happened. The move will include the removal of ICU as a dependency. Good progress has occurred on that front.

MD5 on 64 Bits

Nick Glencross has been hard at work trying to fix the MD5 library for 64-bit systems. It would be easier if he had access to one.

Python Version Guesswork

Ron Blaschke noticed that ActiveState Python reports its build as 2.4 instead of 2.4.0. He provided a patch to account for this.

The Usual Footer

Posting via the Google Groups interface does not work. To post to any of these mailing lists please subscribe by sending email to perl6-internals-subscribe@perl.org, perl6-language-subscribe@perl.org, or perl6-compiler-subscribe@perl.org. If you find these summaries useful or enjoyable, please consider contributing to the Perl Foundation to help support the development of Perl. You might also like to send feedback to ubermatt@gmail.com.

Perl Code Kata: Mocking Objects

The last Perl Code Kata was on DBD::Mock, a mock DBI driver which is useful for testing Perl DBI applications. This Kata delves once again into the world of mock objects, this time using the more general Test::MockObject module.

What are Mock Objects?

Mock objects are exactly what they sound like: "mocked" or "fake" objects. Through the power of polymorphism, it's easy to swap one object for another object which implements the same interface. Mock objects take advantage of this fact, allowing you to substitute the most minimally mocked implementation of an object possible for the real one during testing. This allows a greater degree of isolation within your tests, which is just an all around good thing.
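
To make the substitution idea concrete before bringing in any module, here is a hand-rolled sketch in plain Perl. Mock::Mailer, its deliver() method, and notify() are hypothetical names invented for illustration; the point is only that the code under test cares about the interface, not about which class implements it.

```perl
use strict;
use warnings;

# Mock::Mailer, deliver(), and notify() are hypothetical names.
# A real mailer would talk to a mail server; the mock implements the
# same deliver() interface but only records what it was asked to send.
package Mock::Mailer;
sub new     { return bless { sent => [] }, shift }
sub deliver { my ( $self, $msg ) = @_; push @{ $self->{sent} }, $msg; return 1 }
sub sent    { return @{ $_[0]{sent} } }

package main;

# Code under test: it only needs an object that can deliver().
sub notify {
    my ( $mailer, $user ) = @_;
    return $mailer->deliver("Welcome, $user");
}

my $mailer = Mock::Mailer->new;
my $ok     = notify( $mailer, 'alice' );
my @sent   = $mailer->sent;

print "sent: $sent[0]\n";
```

A test can now assert on what notify() tried to send without any mail ever leaving the machine.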

What are Mock Objects Good For?

Mock objects are primarily useful when writing unit tests. They share a certain similarity with the Null Object pattern in that they are purposefully not meant to work. Mock objects take things one step further and allow you to mock certain actions or reactions that your mock object should have, so they are especially useful in scenarios usually considered hard to test. Here is a short list of some scenarios in which mock objects make hard things easy.

  • Tests which depend on outside resources such as networks, databases, etc.

    If your code properly encapsulates any outside resources, then it should be possible to substitute a mocked object in its place during testing. This is especially useful when you have little control over the execution environment of your module. The previous Perl Code Kata illustrated this by mocking the database itself. You need not stop with databases; you can mock any sufficiently encapsulated resource such as network connections, files, or miscellaneous external devices.

  • Tests for which dependencies require a lot of setup.

    Sometimes your object will have a dependency which requires a large amount of set-up code. The more non-test code in your tests, the higher the possibility that it will contain a bug which can then corrupt your test results. Many times your code uses only a small portion of this hard-to-setup dependency as well. Mock objects can help simplify things by allowing you to create the most minimally mocked implementation of an object and its dependencies possible, thus removing the burden of the set-up code and reducing the possibility of bugs in your non-test code.

  • Tests for failures; in particular, failure edge cases.

    Testing for failures can sometimes be very difficult to do, especially when the failure is not immediate, but triggered by a more subtle set of interactions. Using mock objects, it is possible to achieve exacting control over when, where, and why your object will fail. Mock objects often make this kind of testing trivial.

  • Tests with optional dependencies.

    Good code should be flexible code. Many times this means that your code needs to adapt to many different situations and many different environments based on the resources available at runtime. Requiring the presence of these situations and/or environments in order to test your code can be very difficult to set up or to tear down. Just as with testing failures, it is possible to use mock objects to achieve a high degree of control over your environment and mock the situations you need to test.

The Problem

The example code for this kata illustrates as many points as possible about what mock objects are good at testing. Here is the code:

package Site::Member;

use strict;
our $VERSION = '0.01';

sub new { bless { ip_address => '' }, shift }

sub ip_address { 
    my ($self, $ip_address) = @_;
    $self->{ip_address} = $ip_address if $ip_address;
    return $self->{ip_address};
}

# ...

sub city {
    my ($self) = @_;
    eval "use Geo::IP";
    if ($@) {
        warn "You must have Geo::IP installed for this feature";
        return;
    }
    my $geo = Geo::IP->open(
                "/usr/local/share/GeoIP/GeoIPCity.dat", 
                Geo::IP->GEOIP_STANDARD
            ) || die "Could not create a Geo::IP object with City data";
    my $record = $geo->record_by_addr($self->ip_address());
    return $record->city();
}

This example code comes from a fictional online community software package. Many such sites offer user homepages which can display all sorts of user information. As an optional feature, the software can use the member's IP address along with the Geo::IP module to determine the user's city. The reason this feature is optional is that while Geo::IP and the C library it uses are both free, the city data is not.

The use cases suggest testing for the following scenarios:

  • User does not have Geo::IP installed.
  • User has Geo::IP installed but does not have the city data.
  • User has Geo::IP and city data installed correctly.

Using Test::MockObject, take thirty to forty minutes and see if you can write tests which cover all these use cases.

Tips, Tricks, and Suggestions

Some of the real strengths of Test::MockObject lie in its adaptability and how simply it adapts. All Test::MockObject sessions begin with creating an instance.

my $mock = Test::MockObject->new();

Even just this much can be useful because a Test::MockObject instance warns about all un-mocked methods called on it. I have used this "feature" to help trace calls while writing complex tests.

The next step is to mock some methods. The simplest approach is to use the mock method. It takes a method name and a subroutine reference. Every time something calls that method on the object, your $mock instance will run that sub.

$mock->mock('greetings' => sub {
    my ($mock, $name) = @_;
    return "Hello $name";
});

How much simpler could it be?

Test::MockObject also offers several pre-built mock method builders, such as set_true, set_false, and set_always. These methods pretty much DWIM.

$mock->set_true('foo'); # the foo() method will return true
$mock->set_false('bar'); # the bar() method will return false
$mock->set_always('baz' => 100); # the baz() method will always return 100

It's even possible for the object to mock not only the methods, but its class as well. The simplest approach is to use the set_isa method to tell the $mock object to pretend that it belongs to another class.

$mock->set_isa('Foo::Bar');

Now, any code that calls this mock object's isa() method will believe that the $mock is a Foo::Bar object.
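
To make that concrete, here is a small sketch with a hypothetical describe() function standing in for the code under test. It only ever asks the object whether it isa('Foo::Bar'), so the mock passes the check even though no Foo::Bar class exists.

```perl
use strict;
use warnings;
use Test::MockObject;

# Hypothetical code under test: it accepts any object and checks its class.
sub describe {
    my ($obj) = @_;
    return $obj->isa('Foo::Bar') ? 'a Foo::Bar' : 'something else';
}

my $mock = Test::MockObject->new();
$mock->set_isa('Foo::Bar');

# An ordinary object of an unrelated (made-up) class, for comparison.
my $other = bless {}, 'Some::Other::Class';

my $mock_description  = describe($mock);
my $other_description = describe($other);

print "mock is $mock_description\n";      # prints "mock is a Foo::Bar"
print "other is $other_description\n";    # prints "other is something else"
```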

In many cases, it is enough to substitute a $mock instance for a real one and let polymorphism do the rest. Other times it is necessary to inject control into the code much earlier than this. This is where the fake_module method comes in.

With the fake_module method, Test::MockObject can subvert control of an entire package such that it will intercept any calls to that package. The following code:

my $mock = Test::MockObject->new();
$mock->fake_module('Foo::Bar' => (
    'import' => sub { die "Foo::Bar could not be loaded" }
));
use_ok('Foo::Bar');

...actually gives the illusion that the Foo::Bar module failed to load regardless of whether the user has it installed. These kinds of edge cases can be very difficult to test, but Test::MockObject simplifies them greatly.

But wait, that's not all.

After your tests have run using your mock objects, it is possible to inspect the methods called on them and query the order of their calls. You can even inspect the arguments passed into these methods. There are several methods for this, so I refer you to the POD documentation of Test::MockObject for details.
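
As a brief sketch of that kind of inspection, the snippet below uses called, call_pos, and call_args. The login and fetch method names and their arguments are made up for illustration. Note that call positions start at 1, and that call_args returns the invocant as the first element.

```perl
use strict;
use warnings;
use Test::MockObject;

my $mock = Test::MockObject->new();
$mock->set_true('login');
$mock->set_always('fetch' => 'data');

# Stand-in for the code under test calling methods on the mock.
$mock->login('alice');
$mock->fetch('users', 42);

# Was a given method called at all?
my $fetch_called = $mock->called('fetch');

# Which method was called first, and which second?
my $first  = $mock->call_pos(1);
my $second = $mock->call_pos(2);

# Arguments of the second call, including the invocant.
my @args = $mock->call_args(2);

print "first call: $first\n";      # prints "first call: login"
print "second call: $second\n";    # prints "second call: fetch"
print "fetch arguments: @args[1 .. $#args]\n";    # prints "fetch arguments: users 42"
```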

The Solution

I designed each use case to illustrate a different capability of Test::MockObject.

  • User does not have Geo::IP installed.

    use Test::More tests => 4;
    use Test::MockObject;
    
    my $mock = Test::MockObject->new();
    $mock->fake_module('Geo::IP' => (
        'import' => sub { die "Could not load Geo::IP" },
    ));
    
    use_ok('Site::Member');
    
    my $u = Site::Member->new();
    isa_ok($u, 'Site::Member');
    
    my $warning;
    local $SIG{__WARN__} = sub { $warning = shift };
    
    ok(!defined($u->city()), '... this should return undef');
    like($warning, 
            qr/^You must have Geo\:\:IP installed for this feature/, 
            '... and we should have our warning');

    This use case illustrates the use of Test::MockObject to simulate a failure to load an optional resource, which in this case is the Geo::IP module.

    The sample code attempts to load Geo::IP by calling eval "use Geo::IP". Because use always calls a module's import method, it is possible to exploit this and mock a Geo::IP load failure. This is easy to accomplish by using the fake_module method and making the import method die. This then triggers the warning code in the city method, which the $SIG{__WARN__} handler captures into $warning for a later test.

    This is an example of a failure edge case which would be difficult to test without Test::MockObject because it requires control of the Perl libraries installed. Testing this without Test::MockObject would require altering the @INC in subtle ways or mocking a Geo::IP package of your own. Test::MockObject does that for you, so why bother to re-invent a wheel if you don't need to?

  • User has Geo::IP installed but does not have the city data.

    use Test::More tests => 3;
    use Test::Exception;
    use Test::MockObject;
    
    my $mock = Test::MockObject->new();
    $mock->fake_module('Geo::IP' => (
        'open'           => sub { undef },
        'GEOIP_STANDARD' => sub { 0 }
    ));
    
    use_ok('Site::Member');
    
    my $u = Site::Member->new();
    isa_ok($u, 'Site::Member');
    
    $u->ip_address('64.40.146.219');
    
    throws_ok {
        $u->city()
    } qr/Could not create a Geo\:\:IP object/, '... got the error we expected';

    This next use case illustrates the use of Test::MockObject to mock a dependency relationship, in particular the failure case where Geo::IP cannot find the specified database file.

    Geo::IP follows the common Perl idiom of returning undef if the object constructor fails. The example code tests for this case and throws an exception if it comes up. Testing for this failure uses the fake_module method again to hijack Geo::IP and install a mocked version of its open method (the code also fakes the GEOIP_STANDARD constant here). The mocked open simply returns undef, which creates the proper conditions to trigger the exception in the example code. The throws_ok function of the Test::Exception module then catches the exception.

    This example illustrates that it is still possible to mock objects even if your code is not in the position to pass in a mocked instance itself. Again, to test this without using Test::MockObject would require control of the outside environment (the Geo::IP database file), or in some way having control over where Geo::IP looks for the database file. While well-written and well-architected code would probably allow you to alter the database file path and therefore test this without using mock objects, the mock object version makes no such assumptions and therefore works the same in either case.

  • User has Geo::IP and the Geo-IP city data installed correctly.

    use Test::More tests => 7;
    use Test::MockObject;
    
    my $mock = Test::MockObject->new();
    $mock->fake_module('Geo::IP' => (
        'open'           => sub { $mock },
        'GEOIP_STANDARD' => sub { 0 }
    ));
    
    my $mock_record = Test::MockObject->new();
    $mock_record->set_always('city', 'New York City');
    
    $mock->set_always('record_by_addr', $mock_record);
    
    use_ok('Site::Member');
    
    my $u = Site::Member->new();
    isa_ok($u, 'Site::Member');
    
    $u->ip_address('64.40.146.219');
    
    is($u->city(), 'New York City', '... got the right city');
    
    # call positions start at 1; call_pos() returns the name of the
    # method called at that position
    is($mock->call_pos(1), 'record_by_addr',
            '... our mock object was called');
    is_deeply(
            [ $mock->call_args(1) ],
            [ $mock, '64.40.146.219' ],
            '... our mock was called with the right args');
            
    is($mock_record->call_pos(1), 'city',
            '... our mock record object was called');
    is_deeply(
            [ $mock_record->call_args(1) ],
            [ $mock_record ],
            '... our mock record was called with the right args');

    This next case illustrates a success case, where Geo::IP finds the database file it wants and returns the expected results.

    Once again, the fake_module method of Test::MockObject mocks Geo::IP's open method, this time returning the $mock instance itself. The code creates another mock object, this time for the Geo::IP::Record instance which Geo::IP's record_by_addr returns. Test::MockObject's set_always method mocks the city method for the $mock_record instance. After this, Geo::IP's record_by_addr is mocked to return the $mock_record instance. With all of these mocks in place, the tests then run. After that, inspecting the mock objects ensures that the code called the correct methods on the mocked objects in the correct order and with the correct arguments.

    This example illustrates testing success without needing to worry about the existence of an outside dependency. Test::MockObject takes this test one step further by providing methods for inspecting the details of the interaction between the example code and the mocked Geo::IP module. Accomplishing this test without Test::MockObject would be almost impossible given the lack of control over the Geo::IP module and its internals.
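
The first use case depends on the fact that use always calls a module's import method. The sketch below shows that mechanism in plain Perl, without Test::MockObject. It is a rough illustration of the idea only, not Test::MockObject's actual implementation, and the My::Optional package name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an optional dependency whose import always fails.
package My::Optional;
sub import { die "My::Optional could not be loaded\n" }

package main;

# Mark the module as already loaded so that use does not search @INC for a file.
$INC{'My/Optional.pm'} = __FILE__;

# use still calls My::Optional->import(), which dies and makes the eval fail.
my $loaded = eval "use My::Optional; 1";
my $error  = $@;

print "optional module unavailable: $error" unless $loaded;
```

Code that guards an optional feature with eval "use ..." ends up on its fallback path here, which is the same situation the faked Geo::IP import creates in the first test.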

Conclusion

Mock objects can seem complex and overly abstract at first, but once grasped they can be a simple, clean way to make hard things easy. I hope to have shown how creating simple and minimal mock objects with Test::MockObject can help in testing cases which might be difficult using more traditional means.
