April 2002 Archives

Becoming a CPAN Tester with CPANPLUS

"This is a great war long-planned,
and we are but one piece in it,
whatever pride may say.
... And now, all realms shall be put to the test."
-- Beregond, The Lord of the Rings

Introduction

In CPANPLUS -- Like CPAN.pm only better?, Jos Boumans recounted the tale of how his CPANPLUS project came to be; one of the original goals was to provide a modular backend for CPANTS, the CPAN Testing Service, so that modules could be automatically built and tested on multiple platforms.

Although the initial CPANPLUS release did not support testing-related features, Jos considered them important enough to be listed near the top of his Current and Future Developments wish list, and quoted Michael Schwern's opinion on their potential benefits*:

It would alter the way free software is distributed. It would get a mark of quality; it would be tested and reviewed. And gosh darnit, it would be good.

At that time, despite having participated in CPANPLUS's development for more than four months, I was blissfully unaware of the project's original vision. But it seemed like a worthwhile goal to pursue, so I quickly jotted down several pieces that needed to work together:

  • Check test reports of modules before installation.

  • Report make test results to the author, and make them available for other users to check.

  • Tools to batch-test modules with or without human intervention.

  • A clean API for other programs to perform tests, reporting and queries.

Today, with the 0.032 release of CPANPLUS, all of the above pieces are complete. This article will show you how to configure and use these tools, as well as describe some important new features in CPANPLUS that made them possible.

Setting Up the Environment

First, you need a copy of CPANPLUS 0.032 or above. I recommend the tarball automatically built from the distribution branch, available at http://egb.elixus.org/cpanplus-dist.tar.gz; it should contain the latest bug-fixes and features marked as stable.

Since Jos' article and README already contain install instructions, I will not repeat the details. However, please note that the dependency detection logic in Makefile.PL has been changed since Jos' previous article, which means when faced with this prompt:

 [CPAN Test reporting]
 - Mail::Send           ...failed! (needs 0.01)
 - File::Temp           ...loaded. (0.13 >= 0.01)
 ==> Do you wish to install the 1 optional module(s)? [n]

You should generally answer y here, as only modules that can safely install on your machine are displayed. The default n does not mean we don't recommend the module; it merely means it isn't mandatory for a bare-bones CPANPLUS to function.

For this article's purpose, you need to answer y to all such questions, since we will be using LWP, Mail::Send and IPC::Run extensively in testing-related functions.

After running Makefile.PL, you should run make test* like every good user; if any dependent modules were selected in the previous step, then you will have to run make test as root, so it can fetch and install these modules automatically before testing itself.

After the comforting All tests successful message appears on your screen, just do a make install, and we are ready to go.

Why Is Testing Important?

... or not. What if, instead of All tests successful, you see a series of cryptic messages:

 % sudo make test
 /usr/bin/perl Makefile.PL --config=-target,skiptest
 --installdeps=Term::Size,0.01
 *** Installing dependencies...
 *** Installing Term::Size...
 Can't bless nonreference value at lib/CPANPLUS/Internals/Module.pm
 line 239.
 Compilation failed in require at Makefile.PL line 20.
 make: *** [installdeps] Error 255

An experienced Perl user will probably take the following steps:

  • Look at lib/CPANPLUS/Internals/Module.pm and see whether it's a trivial mistake. Unfortunately, it is not.
  • Search README for the project's mailing list address. Discovering that it's hosted on SourceForge, check Geocrawler (or any search engine) for existing bug reports and discussions. Unfortunately, there are none.
  • Copy-and-paste the error message into a bug-reporting e-mail to the mailing list, including your operating system name and version, Perl version, and wait for a fix.

Apparently, the above is exactly what Juerd Waalboer (the bug's original reporter) did when he ran into this trouble. Thanks, Juerd!

However, there are a number of problems with the procedure above. It might be thwarted at every step:

  • What if Juerd had been less experienced, and unaware of this bug-reporting procedure?

  • What if he decided to kluge around it by commenting out line 239, instead of taking the laborious copy-and-paste path to report it?
  • CPANPLUS has a mailing list; other modules might not. Worse, they may not even have a working contact address -- maybe the author is on vacation.
  • He might not have the sense to provide adequate debugging data, and might simply write CPANPLUS didn't work! It can't install!#!@#@#! in red blinking HTML email, wasting bandwidth and causing developers to ignore his mail completely.
  • Even after a fix was made available, and posted as a patch on the mailing list, other people about to install CPANPLUS 0.03 still have no way to find out about it beforehand. Thus, when the same problem occurs again, they may forget to search in the archives, resulting in duplicate bug reports, or (worse) more dissatisfied users.
  • Actually, we have not yet mentioned the most likely scenario: What if Juerd hadn't run make test at all? The developers would then have no way to know that this bug had ever happened.
  • Finally, while CPANPLUS (like most CPAN modules) includes a regression test suite, some modules may omit their tests altogether. Users may not feel comfortable mailing the author to request a test script to accompany the distribution; consequently, bugs will surface in harder-to-detect ways, and may never get fixed.

As you can see, both authors and users can certainly benefit a lot from an improved test reporting system -- one that simplifies the process of reporting a failed (or successful) installation, and allows users to check for existing reports before downloading a module.

Such a system already exists; it's called CPAN Testers.

What Is CPAN Testers?

When most people run make test, it's immediately followed by a make install; that is, people only test modules that they are going to use. But they may not have the time or inclination to report the outcome, for various reasons listed above.

CPAN Testers (http://testers.cpan.org/) is an effort to set up a Quality Assurance (QA) team for CPAN modules. As its homepage states:

 The objective of the group is to test as many of the
 distributions on CPAN as possible, on as many platforms
 as possible.
 The ultimate goal is to improve the portability of the
 distributions on CPAN, and provide good feedback to the
 authors.

Central to its operation is a mailing list, cpan-testers@perl.org, and a small program, cpantest. The program is a convenient way to report a distribution's test result, which is sent to the mailing list using a uniform format. For example:

 # test passed, send a report automatically (no comments)
 cpantest -g pass -auto -p CPANPLUS-0.032
 # test failed, launch an editor to edit the report, and cc to kane
 cpantest -g fail -p CPANPLUS-0.032 kane@cpan.org

The argument (pass/fail) after the -g option is called the grade -- the overall result of a test run. It may also be na (not available), which means the module is not expected to work on this platform at all (Win32::Process on FreeBSD, for example), and unknown in the case that no tests are provided in the distribution.
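
The other two grades are submitted in the same way; for instance (the distribution names below are hypothetical):

 # module not expected to work on this platform
 cpantest -g na -auto -p Win32-Sound-0.45
 # distribution ships without a test suite
 cpantest -g unknown -p Acme-Example-0.01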

All recent CPAN uploads, as well as all test results, are forwarded to the cpan-testers mailing list's subscribers -- the CPAN testers. After a distribution's release on CPAN, one or more testers will pick it up, test it, then feed the result back to the author and to the list. The testers don't have to know what the module is about; all they do is ensure it's working as advertised.

Test reports on the mailing list are automatically recorded in an ever-growing database. http://testers.cpan.org/search is its search interface, where you can query by distribution name, version, or the testing platform.

Theoretically, you can query that database before installing any CPAN module, to see whether it works on your platform, and check for associated bug reports. The same information is also present at each module's page in http://search.cpan.org/, and maybe http://rt.cpan.org/ in the near future.

Alas, while this system certainly works, it is far from perfect. Here are some of its shortcomings:

  • The integration between the Web site and mailing list hasn't extended to the CPAN.pm shell, so checking for test results requires the additional step of navigating to the Web site in a browser.
  • People who aren't subscribed to the cpan-testers list rarely submit test reports, so coverage is limited to a few popular platforms; a heavily used module may still be tested on only two or three platforms, hardly representative of its entire user base.
  • There are no smoking (automatic cross-platform testing) mechanisms; all testers must manually fetch, extract, test and submit their test reports, including the same copy-and-paste toil we described earlier. This entry barrier has seriously limited the number of active volunteer testers.

Fortunately, CPANPLUS addressed most of these issues, and made it significantly easier to participate in testing-related activities. Let's walk through them in the following sections.

Checking test reports

CPANPLUS offers a straightforward way to check existing reports. Simply enter cpanp -c ModuleName in the command line:

 % cpanp -c CPANPLUS
 [/K/KA/KANE/CPANPLUS-0.032.tar.gz]
 PASS freebsd 4.5-release i386-freebsd

As you can see, this shows the report for the most recent version of CPANPLUS, which passed its tests on FreeBSD as of this writing.

You can also specify a particular version of some distribution. For example, the version Juerd was having problems with is 0.03, so he can do this:

 % cpanp -c CPANPLUS-0.03
 [/K/KA/KANE/CPANPLUS-0.03.tar.gz]
     FAIL freebsd 4.2-stable i386-freebsd (*)
     PASS freebsd 4.5-release i386-freebsd (*)
     PASS linux 2.2.14-5.0 i686-linux
 ==> http://testers.cpan.org/search?request=dist&dist=CPANPLUS

As you can see, there are three reports, two of which contain additional details (marked with *), available at the URL listed above. The failed one says:

 This bug seems to be present in machines upgrading to 0.03
 from 0.01/0.02 releases.
 (It has since been fixed in the upcoming 0.031 release.)

This exactly addresses Juerd's original problem.

Another useful trick is using the o command in the CPANPLUS Shell to list newer versions of installed modules on CPAN, and check on them all with c *:

 % cpanp
 CPAN Terminal> o
 1   0.14     0.15   DBD::SQLite        MSERGEANT
 2   2.1011   2.1014 DBD::mysql         JWIED
 CPAN Terminal> c *
 [/M/MS/MSERGEANT/DBD-SQLite-0.15.tar.gz]
     FAIL freebsd 4.5-release i386-freebsd (*)
     PASS linux 2.2.14-5.0 i686-linux
     PASS solaris 2.8 sun4-solaris
 ==> http://testers.cpan.org/search?request=dist&dist=DBD-SQLite
 [/J/JW/JWIED/DBD-mysql-2.1014.tar.gz]
     FAIL freebsd 4.5-release i386-freebsd (*)
     PASS freebsd 4.5-release i386-freebsd
 ==> http://testers.cpan.org/search?request=dist&dist=DBD-mysql

This way, you can safely upgrade your modules, confident in the knowledge that the newer version won't break the system.

Reporting Test Results

Handy as the c command is, there will be no reports if nobody submits them. CPANPLUS lets you send a report whenever you've run make test in the course of installing a module; to enable this feature, turn on the cpantest configuration variable:

 CPAN Terminal> s cpantest 1
 CPAN Terminal> s save
 Your CPAN++ configuration info has been saved!

Afterward, just use CPANPLUS as usual. You will be prompted during an installation, as shown below:

 CPAN Terminal> i DBD::SQLite
 Installing: DBD::SQLite
 # ...
 t/30insertfetch....ok
 t/40bindparam......FAILED test 25
     Failed 1/28 tests, 96.43% okay
 t/40blobs..........dubious
     Test returned status 2 (wstat 512, 0x200)
     DIED.  FAILED tests 8-11
     Failed 4/11 tests, 63.64% okay
 Failed 2/19 test scripts, 89.47% okay.
 5/250 subtests failed, 98.00% okay.
 *** Error code 9
 Report DBD-SQLite-0.15's testing result (FAIL)? [y/N]: y

It always defaults to n, so you won't send out bogus reports by mistake. If you enter y, then different actions will happen, depending on the test's outcome:

  • For modules that passed their tests, a pass report is sent out immediately to the cpan-testers list.
  • For modules that didn't define a test, an unknown report is sent to the author and cpan-testers; the report includes a simple test script (see the sketch after this list), encourages the module author to include one in the next release, and points to Michael Schwern's Test::Tutorial manpage.
  • If the module fails at any point (perl Makefile.PL, make, or make test), then the default editor is launched to edit its fail report, which includes the point of failure and the captured error buffer.

    You are free to add any text below the Additional comments line; to cancel a report, just delete everything and save an empty file.
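
For a rough idea, the kind of starter test script such an unknown report might suggest looks something like this (the module name is hypothetical; the actual template bundled with CPANPLUS may differ):

 use strict;
 use Test;
 BEGIN { plan tests => 1 }
 use Acme::Example;   # load the distribution's main module
 ok(1);               # if we got this far, the module compiled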

Before sending out a fail report, be sure to double-check that it is really the module's problem. For example, if there's no libgtk on your system, then please don't send a report about failing to install Gtk.

Also, be prepared to follow up with additional information when asked. The author may ask you to apply some patches, or to try out an experimental release; do help the author whenever you can.

Batch Testing and cpansmoke

While regular users can test modules as they install via the cpantest option, and often provide important first-hand troubleshooting information, we still need dedicated testers -- they can broaden the coverage of available platforms, and may uncover problems previously unforeseen by the author. Dedicated testing and regular reports are complementary to each other.

Historically, CPAN testers would watch for notices of recent PAUSE uploads posted on the cpan-testers list, grab them, test them manually, and run the cpantest script, as we've seen in What is CPAN Testers?. That script is bundled with CPANPLUS; please refer to its POD documentation for additional options.

However, normally you won't be using it directly, as CPANPLUS offers the cpansmoke wrapper, which consolidates the download => test => copy buffer => run cpantest => file report procedure into a single command:
 % cpansmoke Mail::Audit Mail::SpamAssassin          # by module
 % cpansmoke Mail-Audit-2.1 Mail-SpamAssassin-2.20   # by distname
 % cpansmoke /S/SI/SIMON/Mail-Audit-2.1.tar.gz       # or full name

Because it needs to test new distributions as soon as they are uploaded, cpansmoke uses http://www.cpan.org/ as its primary host, instead of the local mirror configured in your CPANPLUS settings. This behavior may be overridden with the -l flag.

Since cpansmoke stops right after make test rather than installing any modules, it can test versions different from the ones installed on the system. However, since it won't resolve prerequisites by default, you'll need to specify the -p flag to test and report them.

The -d flag will display previous results before testing each module; similarly, -s will skip testing altogether if the module was already tested on the same platform. The latter should be used only for automated testing (to reduce list traffic), since the same platform can still produce different results.
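
For example, a dedicated tester's session might combine these flags like so (hypothetical invocations):

 % cpansmoke -d -p Mail::Audit       # show prior results, resolve prerequisites
 % cpansmoke -s -p DBD-SQLite-0.15   # skip if this platform already reported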

There are numerous other flags available; please consult cpansmoke's manpage for further information.

Automatic Testing

Besides interactive batch-testing, cpansmoke also provides the capability of unattended testing; the Mozilla Project's Tinderbox toolkit embodies the same concept, as explained on its homepage:

 There is a flock of dedicated build machines that do
 nothing but continually check out the source tree and
 build it, over and over again.

Essentially, if you have a machine with plenty of hard disk space and adequate bandwidth, you can set up cpansmoke so it tests each new module as it is uploaded.

The -a option puts cpansmoke into non-interactive mode. In this mode, only failures during make test are reported, because errors that occur during Makefile.PL or make are more likely the machine's problem, not the module's.

Additionally, you should specify the -s flag to avoid duplicate reports, and -p to avoid false-negatives caused by unsatisfied dependencies.

Setting up an auto-smoking machine is simple, using a mail filtering mechanism; just join the CPAN Testers list by sending an empty email to cpan-testers-subscribe@perl.org, and add the lines below to your Mail::Audit filter:

 fork || exec("sleep 40000 && cpansmoke -aps $1 >/dev/null 2>&1")
     if $mail->subject =~ /^CPAN Upload: (.*)$/;

This rule will fork a new cpansmoke process for each incoming mail about a CPAN upload. Note that it is safe only if such mail arrives no more than about once an hour; otherwise it might fork-bomb your system to a grinding halt.

In case you're wondering, the sleep 40000 line is due to the fact that a PAUSE upload needs several hours to appear on www.cpan.org. Michael Schwern suggested that using a smoke queue would be a better solution, since it allows error-checking and logging; implementations are welcome.
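
For the curious, here is one minimal sketch of the queue idea, with a made-up spool path: the mail filter merely records the distribution name, and a cron job (with appropriate locking) would periodically feed the file to cpansmoke -aps and truncate it.

 # In the Mail::Audit filter: queue the name instead of forking.
 if ($mail->subject =~ /^CPAN Upload: (.*)$/) {
     open my $queue, '>>', '/var/spool/cpansmoke.queue' or die $!;
     print $queue "$1\n";
     close $queue;
 }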

If you are stuck with procmail, then here's the recipe that does the same (but do consider upgrading to Mail::Audit):

 :0hc
 * ^Subject: CPAN Upload:
 |sh -c "sleep 40000 && grep Subject|cut -f4 -d' '|xargs cpansmoke -aps >/dev/null 2>&1"

If you have suggestions on how to make this work on non-Unix systems, then please do let me know.

Testing-Related APIs

A programmable interface is the primary strength of CPANPLUS. All features listed above are available as separate methods of the CPANPLUS::Backend object. Since the module's manpage already contains detailed information, I'll just list some sample snippets below.

To show all reports of Acme::* on FreeBSD:

 use CPANPLUS::Backend;
 my $cp = CPANPLUS::Backend->new;
 my $reports = $cp->reports( modules => [ map $_->{package}, values(
     %{ $cp->search(type => 'module', list => ['^Acme::\b']) }
 ) ] );
 while ( my ($name, $href) = each (%$reports) ) {
     next unless $href;
     my @entries = grep { $_->{platform} =~ /\bfreebsd\b/ } @$href;
     next unless @entries;
     print "[$name]\n";
     for my $rv (@entries) {
         printf "%8s %s%s\n", @{$rv}{'grade', 'platform'},
                              ($rv->{detail} ? ' (*)' : '');
     }
 }

To test all Acme::* modules, but install all needed prerequisites:

 use CPANPLUS::Backend;
 my $cp = CPANPLUS::Backend->new;
 $cp->configure_object->set_conf( cpantest => 1 );
 $cp->install(
     modules => [ map $_->{package}, values(
         %{ $cp->search(type => 'module', list => ['^Acme::\b']) }
     ) ], 
     target        => 'test',
     prereq_target => 'install',
 );

If you need to fine-tune various aspects during testing (timeout, prerequisite handling, etc.), then please consult the source code of cpansmoke; most tricks involved are documented in comments.

Unfortunately, the process of editing and sending out reports remains the only non-programmable part, since the cpantest script doesn't exist as a module. Although Skud has a CPAN::Test::Reporter on CPAN, its format is sadly out-of-sync with cpantest, and may generate invalid reports (as of version 0.02). Any offers to back-port the cpantest script into that module would be greatly appreciated.

Conclusion

As purl said, "CPAN is a cultural mechanism." It is not an abstraction filled with code, but rather depends on people caring enough to share code, as well as sharing useful feedback in order to improve each other's code.

With the advent of bug-tracking services and automatic testing, the social collaboration aspect of CPAN is greatly extended, and could be developed further into a full-featured Tinderbox system, or linked with each module's source-control repository.

Also, since CPANPLUS is designed to accommodate different package formats and distribution systems, it provides a solid foundation for projects like Module::Build (make-less installs), NAPC (distributed CPAN) and ExtUtils::AutoInstall (feature-based dependency probing)... the possibilities are staggering, and you certainly won't be disappointed in finding an interesting project to work on. Happy hacking!

Acknowledgements

Although I have done some modest work in integrating CPANPLUS with CPAN testing, the work is really built on contributions from several brilliant individuals:

First, I'd like to dedicate this article to Elaine -HFB- Ashton, for her tireless efforts on Perl advocacy, and for sponsoring my work as described in this article.

Thanks also go to Graham Barr and Chris Nandor, for establishing CPAN Testers in the first place, and for coming up with two other important cultural mechanisms: CPAN Search and Use Perl, respectively.

To Jos Boumans, Joshua Boschert, Ann Barcomb and everybody in the CPANPLUS team -- you guys rock! Special thanks to another team member, Michael Schwern, for constantly reminding everybody that Kwalitee Is Job One.

To Paul Schinder, the greatest tester of all time, who has submitted 10551 reports by hand out of the 23203 to date, and kept CPAN Testers alive for a long time. And thanks to every fellow CPAN Tester who has committed their valuable time -- I hope cpansmoke can shoulder your burden a little bit.

To Simon Cozens, for his swiss-nuke Mail::Audit module, and for persistently asking me to write 'something neat' for perl.com. :-)

To Jarkko Hietaniemi, for establishing CPAN; and to Andreas J. Koenig, for maintaining PAUSE and showing us what is possible with CPAN.pm.

Finally, if you decide to follow the steps in this article and participate in the testing process, then you have my utmost gratitude; let's make the world a better place.

Footnotes

  1. It's an introductory text for the CPANPLUS project, available at http://www.perl.com/pub/a/2002/03/26/cpanplus.html.
  2. Jos later confessed that these are not exactly Schwern's words; he just made the quote up for dramatic effect.
  3. With ActivePerl on Windows, simply replace all make with nmake. If you don't have nmake.exe in your %PATH, our Makefile.PL is smart enough to fetch and install it for you.
  4. Purl, the MagNET #perl infobot, is also a cultural mechanism herself; see http://www.infobot.org/ for additional information.
  5. That's Jesse Vincent's http://rt.cpan.org/, which tracks bugs in every distribution released through CPAN.

mod_perl Developer's Cookbook

I always feel uneasy getting review copies of books like this; review copies of books are for me to look through, tell people how good or bad they are, and then sit on the shelf looking pretty. This book, essentially, is far too useful just to be a review copy, and has been extremely useful for me in my daily work.

Based very much on the style of the Perl Cookbook, this new offering from SAMS provides question-and-answer guides to many areas of mod_perl. Whereas the Eagle Book (Writing Apache Modules with Perl and C) provides an in-depth tutorial on using and programming mod_perl, the mod_perl Developer's Cookbook is much more useful for dipping into when you have specific problems to solve.

The downside of this, of course, is that within the recipes, there isn't a large-scale example. The book is excellent if you already know what you need to do but not how to do it; it's not so great if you need your hand held. In this sense, I think it's best used as a companion book to the Eagle; use the Eagle to get an idea of what you can do, and use the Developer's Cookbook when you get stuck.

The range of material covered is pretty staggering, even for a book this thick - there's good coverage of the various Perl handlers in mod_perl, invaluable information on tuning Apache with mod_perl, and even, refreshingly, recipes for creating test suites. As well as the mod_perl API itself, applications of mod_perl such as AxKit, Template::Toolkit and the use of technologies such as SOAP are discussed.

Some of the organization of this material is a little suspect, though; for instance, there's a recipe for timing the lifetime of a request, but this is hidden away in Chapter 11, "The PerlInitHandler", rather than in the chapter on tuning. While this makes sense if you know that timing involves a PerlInitHandler, it doesn't jibe well with a book teaching you to do things you don't know how to do. In a similar vein, there's a recipe titled "Using AxKit", for if "you want to use AxKit in a mod_perl environment" - with no description of why you might want to use AxKit in the first place.

Unfortunately, this isn't helped by a deficient index - in fact, I'd encourage readers to use the table of contents as if it were the index, as that seems to be a much easier way to find relevant recipes.

Stylistically, the code presented is impeccable. The authors are well-versed in both Perl and mod_perl idiom, and are not afraid to share this with the reader. At the same time, the prose is crisp and concise, but not skimpy. Discussions following on from the recipes are quite thorough.

The appendices seem to be less useful than they might first appear; I would have preferred to have seen a summary of the mod_perl API here. However, these are minor details, and if that's all I can complain about, I might as well say it: I consider this book to be an invaluable companion to the Eagle if you're ever doing any serious work with mod_perl. It's helped me out on several occasions so far, and I expect it to do so many times in the future.

The Perl You Need To Know

Introduction

Before I delve into the details of mod_perl programming, it's probably a good idea to review some important Perl basics. You will find these invaluable when you start coding for mod_perl. I will start with pure Perl notes and gradually move on to the peculiarities of coding for mod_perl, presenting the traps one might fall into and explaining things that are obvious to some of us but may not be to others.

Using Global Variables and Sharing Them Between Modules/Packages

It helps when you code your application in a structured way, using Perl packages, but as you probably know, once you start using packages it's much harder to share variables between them. A configuration package is a good example of a package whose variables should be accessible from other modules.

Of course, using Object-Oriented (OO) programming is the best way to provide access to variables through accessor methods. But if you are not yet ready for OO techniques, then you can still benefit from the techniques I'm going to talk about.

Making Variables Global

When you first write $x in your code, you create a (package) global variable. It is visible everywhere in your program, although if it is used in a package other than the one in which it was declared (main:: by default), then it must be referred to by its fully qualified name, unless you have imported it with import(). This will work only if you do not use the strict pragma; but it's important to use this pragma if you want to run your scripts under mod_perl.
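
A minimal sketch of both points (no strict in effect; the package name Other is made up):


  $x = 5;           # this is $main::x, a package global

  package Other;
  print $main::x;   # prints 5: fully qualified names work anywhere
  print $x;         # prints nothing: a bare $x here means $Other::x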

Making Variables Global With strict Pragma On

First you use:


  use strict;

Then you use:


 use vars qw($scalar %hash @array);

This declares the named variables as package globals in the current package. They may be referred to within the same file and package with their unqualified names; and in different files/packages with their fully qualified names.

With Perl 5.6 you can use the our operator instead:


  our ($scalar, %hash, @array);

If you want to share package global variables between packages, then here is what you can do.

Using Exporter.pm to Share Global Variables

Assume that you want to share the CGI.pm object (I will use $q) between your modules. For example, you create it in script.pl, but you want it to be visible in My::HTML. First, you make $q global.


  script.pl:
  ----------------
  use vars qw($q);
  use CGI;
  use lib qw(.); 
  use My::HTML qw($q); # My/HTML.pm is in the same dir as script.pl
  $q = CGI->new;
  My::HTML::printmyheader();

Note that I have imported $q from My::HTML. And My::HTML does the export of $q:


  My/HTML.pm
  ----------------
  package My::HTML;
  use strict;

  BEGIN {
    use Exporter ();

    @My::HTML::ISA         = qw(Exporter);
    @My::HTML::EXPORT      = qw();
    @My::HTML::EXPORT_OK   = qw($q);

  }
  use vars qw($q);
  sub printmyheader{
    # Whatever you want to do with $q... e.g.
    print $q->header();
  }
  1;

So the $q is shared between the My::HTML package and script.pl. It will work vice versa as well, if you create the object in My::HTML but use it in script.pl. You have true sharing, since if you change $q in script.pl, then it will be changed in My::HTML as well.

What if you need to share $q between more than two packages? For example you want My::Doc to share $q as well.

You leave My::HTML untouched, and modify script.pl to include:


 use My::Doc qw($q);

Then you add the same Exporter code that I used in My::HTML, into My::Doc, so that it also exports $q.

One possible pitfall is when you want to use My::Doc in both My::HTML and script.pl. Only if you add


  use My::Doc qw($q);

into My::HTML will $q be shared. Otherwise, My::Doc will not share $q anymore. To make things clear, here is the code:


  script.pl:
  ----------------
  use vars qw($q);
  use CGI;
  use lib qw(.); 
  use My::HTML qw($q); # My/HTML.pm is in the same dir as script.pl
  use My::Doc  qw($q); # Ditto
  $q = CGI->new;

  My::HTML::printmyheader();  
  
  My/HTML.pm
  ----------------
  package My::HTML;
  use strict;

  BEGIN {
    use Exporter ();
    @My::HTML::ISA         = qw(Exporter);
    @My::HTML::EXPORT      = qw();
    @My::HTML::EXPORT_OK   = qw($q);
  }
  use vars     qw($q);
  use My::Doc  qw($q);
  sub printmyheader{
    # Whatever you want to do with $q... e.g.
    print $q->header();
    My::Doc::printtitle('Guide');
  }
  1;  
  
  My/Doc.pm
  ----------------
  package My::Doc;
  use strict;

  BEGIN {
    use Exporter ();
    @My::Doc::ISA         = qw(Exporter);
    @My::Doc::EXPORT      = qw();
    @My::Doc::EXPORT_OK   = qw($q);
  }
  use vars qw($q);
  sub printtitle{
    my $title = shift || 'None';
    print $q->h1($title);
  }
  1;

Using the Perl Aliasing Feature to Share Global Variables

As the title says, you can import a variable into a script or module without using Exporter.pm. I have found it useful to keep all the configuration variables in one module, My::Config. But then I would have to export all the variables in order to use them in other modules, which is bad for two reasons: it pollutes other packages' namespaces with extra symbols, increasing memory requirements; and it adds the overhead of keeping track of which variables should be exported from the configuration module and imported into each particular package. I solve this problem by keeping all the variables in one hash %c and exporting that. Here is an example of My::Config:


  package My::Config;
  use strict;
  use vars qw(%c);
  %c = (
    # All the configs go here
    scalar_var => 5,

    array_var  => [qw(foo bar)],
    hash_var   => {
                   foo => 'Foo',
                   bar => 'BARRR',
                  },
  );
  1;

Now, in packages that want to use the configuration variables, I either have to use fully qualified names such as $My::Config::test, which I dislike, or import them as described in the previous section. But hey, since I have only one variable to handle, I can make things even simpler and save loading the Exporter.pm package. I will use the Perl aliasing feature for exporting and saving keystrokes:


  package My::HTML;
  use strict;
  use lib qw(.);
    # Global Configuration now aliased to global %c
  use My::Config (); # My/Config.pm in the same dir as script.pl
  use vars qw(%c);
  *c = \%My::Config::c;

    # Now you can access the variables from the My::Config
  print $c{scalar_var};
  print $c{array_var}[0];
  print $c{hash_var}{foo};

Of course, %c is global when you use it as described above, and if you change it, then the change will be seen by every other package that has aliased its %c to %My::Config::c.

Note that aliases work with package global or local()ized variables only; you cannot write:


  my *c = \%My::Config::c; # ERROR!

But you can write:


  local *c = \%My::Config::c;

For more information about aliasing, refer to the Camel book, second edition, pages 51-52.

Using Non-Hardcoded Configuration Module Names

You have just seen how to use a configuration module for configuration centralization and easy access to the information stored in it. However, there is somewhat of a chicken-and-egg problem: how to let your other modules know the name of this file? Hardcoding the name is brittle -- if you have only a single project, then it should be fine. But if you have more projects that use different configurations and you want to reuse their code, then you will have to find all instances of the hardcoded name and replace them.

Another solution could be to use the same name for every configuration module, like My::Config, putting a different copy of it in each project's location. But this won't work under mod_perl because of namespace collision: you cannot load different modules that share the same name; only the first one will be loaded.
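
The collision comes from require()'s bookkeeping: modules are tracked by file name in %INC, so the second copy is silently skipped. A minimal sketch (the paths are made up):


  use lib '/home/httpd/projectA/lib';
  use My::Config;   # loads project A's My/Config.pm
  use lib '/home/httpd/projectB/lib';
  use My::Config;   # a no-op: %INC already lists "My/Config.pm"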

Luckily, there is another solution that allows us to be flexible: PerlSetVar comes to the rescue. Just as with environment variables, you can set server-wide Perl variables that can be retrieved from any module and script. These statements are placed in the httpd.conf file. For example:


  PerlSetVar FooBaseDir       /home/httpd/foo
  PerlSetVar FooConfigModule  Foo::Config

Now I require() the file where the above configuration will be used.


  PerlRequire /home/httpd/perl/startup.pl

In the startup.pl I might have the following code:


    # retrieve the configuration module path
  use Apache;
  my $s             = Apache->server;
  my $base_dir      = $s->dir_config('FooBaseDir')      || '';
  my $config_module = $s->dir_config('FooConfigModule') || '';
  die "FooBaseDir and FooConfigModule aren't set in httpd.conf" 
    unless $base_dir and $config_module;

    # build the real path to the config module
  my $path = "$base_dir/$config_module";
  $path =~ s|::|/|g;
  $path .= ".pm";
    # I have something like "/home/httpd/foo/Foo/Config.pm"
    # now I can pull in the configuration module
  require $path;

Now I know the module name and it's loaded, so for example if I need to use some variables stored in this module to open a database connection, then I will do:


  Apache::DBI->connect_on_init
  ("DBI:mysql:${$config_module.'::DB_NAME'}::${$config_module.'::SERVER'}",
   ${$config_module.'::USER'},
   ${$config_module.'::USER_PASSWD'},
   {
    PrintError => 1, # warn() on errors
    RaiseError => 0, # don't die on error
    AutoCommit => 1, # commit executes immediately
   }
  );

Here, a variable written as:


  ${$config_module.'::USER'}

really means, in my example:


  $Foo::Config::USER
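
Note that these computed names are symbolic references, which the strict pragma forbids; if the surrounding code runs under use strict, then the lookup needs a no strict 'refs' block. A minimal sketch:


  my $user = do {
      no strict 'refs';                # permit the symbolic lookup
      ${ $config_module . '::USER' };  # i.e. $Foo::Config::USER
  };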

If you want to access these variables from within your code at run time, instead of via the server object $s, then use the request object $r:


  my $r = shift;
  my $base_dir      = $r->dir_config('FooBaseDir')      || '';
  my $config_module = $r->dir_config('FooConfigModule') || '';

The Scope of the Special Perl Variables

Now let's talk about Special Perl Variables.

Special Perl variables like $| (buffering), $^T (script's start time), $^W (warnings mode), $/ (input record separator), $\ (output record separator) and many more are all true global variables; they do not belong to any particular package (not even main::) and are universally available. This means that if you change them, then you change them anywhere across the entire program; furthermore you cannot scope them with my(). However, you can local()ize them, which means that any changes you apply will only last until the end of the enclosing scope. In the mod_perl situation where the child server doesn't usually exit, if in one of your scripts you modify a global variable, then it will be changed for the rest of the process' life and will affect all the scripts executed by the same process. Therefore, localizing these variables is highly recommended; I'd say even mandatory.

I will demonstrate with the input record separator variable. If you undefine this variable, then the diamond operator (readline) will suck in the whole file at once, provided you have enough memory. With this in mind, you should never write code like the example below.


  $/ = undef; # BAD!
  open IN, "file" ....
    # slurp it all into a variable
  $all_the_file = <IN>;

The proper way is to have a local() keyword before the special variable is changed, like this:


  local $/ = undef; 
  open IN, "file" ....
    # slurp it all inside a variable
  $all_the_file = <IN>;

But there is a catch. local() will propagate the changed value to the code below it. The modified value will be in effect until the script terminates, unless it is changed again somewhere else in the script.

A cleaner approach is to enclose the whole of the code that is affected by the modified variable in a block, like this:


  {
    local $/ = undef; 
    open IN, "file" ....
      # slurp it all inside a variable
    $all_the_file = <IN>;
  }

That way when Perl leaves the block it restores the original value of the $/ variable, and you don't need to worry elsewhere in your program about its value being changed here.

Note that if you call a subroutine after you've set a global variable but within the enclosing block, the global variable will be visible with its new value inside the subroutine.
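
A small demonstration of that dynamic scoping:


  sub mode { print defined $/ ? "line mode\n" : "slurp mode\n" }

  {
    local $/ = undef;
    mode();   # prints "slurp mode": the sub sees the local()ized value
  }
  mode();     # prints "line mode": $/ was restored when the block ended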

Compiled Regular Expressions

And finally I want to cover a pitfall many people have fallen into. Let's talk about regular expression use under mod_perl.

When using a regular expression that contains an interpolated Perl variable, if it is known that the variable (or variables) will not change during the execution of the program, a standard optimization technique is to add the /o modifier to the regex pattern. This directs the compiler to build the internal table once, for the entire lifetime of the script, rather than each time the pattern is executed. Consider:


  my $pat = '^foo$'; # likely to be input from an HTML form field
  foreach( @list ) {
    print if /$pat/o;
  }

This is usually a big win in loops over lists, or when using the grep() or map() operators.

In long-lived mod_perl scripts, however, the variable may change with each invocation and this can pose a problem. The first invocation of a fresh httpd child will compile the regex and perform the search correctly. However, all subsequent uses by that child will continue to match the original pattern, regardless of the current contents of the Perl variables the pattern is supposed to depend on. Your script will appear to be broken.
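
Here is a self-contained sketch of the trap; in a plain script the symptom is identical to what a mod_perl child sees across requests:


  my @lines = ("foo\n", "bar\n");

  sub print_matching {
      my ($pat) = @_;
      foreach (@lines) {
          print if /$pat/o;   # compiled once, on the very first match
      }
  }

  print_matching('foo');   # correct: prints "foo"
  print_matching('bar');   # wrong: /o still uses 'foo', prints "foo" again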

There are two solutions to this problem:

The first is to use eval q//, to force the code to be evaluated each time. Just make sure that the eval block covers the entire loop of processing, and not just the pattern match itself.

The above code fragment would be rewritten as:


  my $pat = '^foo$';
  eval q{
    foreach( @list ) {
      print if /$pat/o;
    }
  }

Just saying:


  foreach( @list ) {
    eval q{ print if /$pat/o; };
  }

means that I recompile the regex for every element in the list, even though the regex doesn't change.

You can use this approach if you need more than one pattern match operator in a given section of code. If the section contains only one operator (be it an m// or s///), then you can rely on the property of the null pattern, which reuses the last successfully matched pattern. This leads to the second solution, which also eliminates the use of eval.

The above code fragment becomes:


  my $pat = '^foo$';
  "something" =~ /$pat/; # dummy match (MUST NOT FAIL!)
  foreach( @list ) {
    print if //;
  }

The only gotcha is that the dummy match that boots the regular expression engine must absolutely, positively succeed, otherwise the pattern will not be cached, and the // will match everything. If you can't count on fixed text to ensure the match succeeds, then you have two possibilities.

If you can guarantee that the pattern variable contains no meta-characters (things like *, +, ^, $...), then you can use the dummy match:


  $pat =~ /\Q$pat\E/; # guaranteed if no meta-characters present

If there is a possibility that the pattern can contain meta-characters, then you should search for the pattern or the nonsearchable \377 character as follows:


  "\377" =~ /$pat|^\377$/; # guaranteed if meta-characters present

Another approach:

How much it pays off depends on the complexity of the regexes to which you apply this technique. One common usage where a compiled regex is usually more efficient is to "match any one of a group of patterns" over and over again.

Maybe with a helper routine, it's easier to remember. Here is one slightly modified from Jeffrey Friedl's example in his book "Mastering Regular Expressions".


  #####################################################
  # Build_MatchMany_Function
  # -- Input:  list of patterns
  # -- Output: A code ref which matches its $_[0]
  #            against ANY of the patterns given in the
  #            "Input", efficiently.
  #
  sub Build_MatchMany_Function {
    my @R = @_;
    my $expr = join '||', map { "\$_[0] =~ m/\$R[$_]/o" } ( 0..$#R );
    my $matchsub = eval "sub { $expr }";
    die "Failed in building regex @R: $@" if $@;
    $matchsub;
  }

Example usage:


  @some_browsers = qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww);
  $Known_Browser=Build_MatchMany_Function(@some_browsers);

  while (<ACCESS_LOG>) {
    # ...
    $browser = get_browser_field($_);
    if ( ! &$Known_Browser($browser) ) {
      print STDERR "Unknown Browser: $browser\n";
    }
    # ...
  }

In the next article, I'll present a few other Perl basics directly related to the mod_perl programming.

References

  • The book "Mastering Regular Expressions" by Jeffrey E. Friedl.

  • The book "Learning Perl" by Randal L. Schwartz (also known as the "Llama" book, named after the llama picture on its cover).

  • The book "Programming Perl" by L. Wall, T. Christiansen and J. Orwant (also known as the "Camel" book, named after the camel picture on its cover).

  • The Exporter, perlre, perlvar, perlmod and perlmodlib man pages.

XSP, Taglibs and Pipelines

In the first article in this series, we saw how to install, configure and test AxKit, and we took a look at a simple processing pipeline. In this article, we will see how to write a simple 10-line XSP taglib and use it in a pipeline along with XSLT to build dynamic pages in such a way that gets the systems architects and coders out of the content maintenance business. Along the way, we'll discuss the pipeline processing model that makes AxKit so powerful.

First though, let us catch up on some changes in the AxKit world.

CHANGES

Matt and company have released AxKit v1.5.1, and 1.5.2 looks as though it will be out soon. 1.5.1 provides a few improvements and bug fixes, especially in the XSP engine discussed in this article. The biggest additions are the inclusion of a set of demonstration pages that can be used to test and experiment with AxKit's capabilities, and another module to help with writing taglibs (Jorge Walter's SimpleTaglib, which we'll look at in the next article).

There has also been a release policy change: The main AxKit distributions (AxKit-1.5.1.tar.gz for instance) will no longer contain the minimal set of prerequisites; these will now be bundled in a separate tarball. This policy change enables people to download just AxKit (when upgrading it, for instance) and recognizes the fact that AxKit is an Apache project while the prerequisites aren't. Until the prerequisite tarball gets released, the source tarball that accompanies this article contains them all (though it installs them in a local directory only, for testing purposes). The main AxKit tarball still includes all the core AxKit modules.

XSP and taglibs

We touched on eXtensible Server Pages and taglibs in the last article; this time we'll take a deeper look and try to see how XSP, taglibs and XSLT can be combined in a powerful and useful way.

A quick taglib refresher: A taglib is a collection of XML tags in an XML namespace that act like conditionals or subroutine calls. Essentially, taglibs are a way of encoding logic and dynamic content into pages without including raw "native" code (Perl, Java, COBOL; any of the popular Web programming languages) in the page; each taglib provides a set of related services, and multiple taglibs may be used in the same XSP page.

For our example, we will build a "data driven" processing pipeline:

[Diagram: the weather1.xsp processing pipeline, linking the source XSP document, the two XSLT stylesheets (weather.xsl and as_html.xsl), the outputs of the XSP and first XSLT processors, the final document, the My::Weather code, and the Util taglib.]

This pipeline has five stages:

  1. the XSP document (weather1.xsp) defines what chunks of raw data (current time and weather readings) are needed for this page in a simple XML format,
  2. the XSP processor applies taglibs to assemble the raw data for the page,
  3. the first XSLT processor and stylesheet (weather.xsl) format the raw data into usable content,
  4. the second XSLT processor and stylesheet (as_html.xsl) lays out and generates the final page ("Result Doc"), and
  5. the Gzip compressor automatically compresses the output if the client can comprehend compressed content (even when an older browser is bashful and does not announce that it can cope with gzip-encoded content).

This multipass approach mimics those found in real applications; each stage has a specific purpose and can be designed and tested independently of the others.

In a real application, there might well be more filters: Our second XSLT filter might be tweaked to build the document without any "look and feel," and an additional filter could be used to implement the "look and feel" of the presentation after the layout is complete. This would allow look and feel to be altered independently of the layout.

We call this a "data driven" pipeline because the document feeding the pipeline defines what data is needed to serve the page; it does not actually contain any content. Later stages add the content and format it. We'll look at a "document driven" pipeline, which feeds the pipeline with the document to be served, in the next article.

Why pipelines?

The XML pipeline-processing model used by AxKit is a powerful approach that brings technical and social advantages to document-processing systems such as Web applications.

On the social side, the concept of a pipeline or assembly line is simple enough to be taught to and grasped by nonprogrammers. Some of the approaches used by HTML-oriented tools (like many on CPAN) are not exactly designer-friendly: They rely on programmer-friendly concepts such as abstract data structures, miniature programming languages, "catch-all" pages and object-oriented techniques such as method dispatch. To be fair, some designers can and do learn the concepts, and others have Perl implementors who deploy these tools in simple patterns that are readily grasped.

The reason that HTML-oriented tools have a hard time with something as simple as a pipeline model is that HTML is a presentation language and does not lend itself to describing data structures or to incremental processing. The advantage of incremental processing is that the number of stages can be designed to fit the application and organization; with other tools, there's often a one- or two-stage approach where the first stage is almost entirely in the realm of the Perl coders ("prepare the data") and the second is halfway in the realm of the coders and halfway in the designer's realm.

XML can be used both to describe data structures and mark up prose documents; this allows a pipeline to mingle data and prose in flexible ways. Each processing stage is XML-in, XML-out (except for the first and last, which often consume and generate other formats). However, the stages aren't limited to dealing purely with XML: The taglibs we're using show one way that stages can use Perl code (and, by extension, many other languages; see the Inline series of modules), external libraries, and almost any non-XML data as needed. Not only does My::WeatherTaglib integrate with Perl code, it's also requesting data over the Internet from a remote site.

The use of XML as the carrier for data between the stages also allows each stage to be debugged and unit tested. The documents that are forwarded between the stages are often passed as internal data structures instead of XML strings for efficiency's sake, but they can be rendered (like the examples shown in this article) and examined, both for teaching and debugging purposes. The uniform look and feel of XML, whatever its disadvantages, is at least readily understood by any competent Web designer.

Pipelines are also handy mechanisms in that the individual stages are chosen at request time; different stages can be selected to deliver different views of the same content. AxKit provides several powerful mechanisms for configuring pipelines; the <xml-stylesheet ...> approach used in this article is merely the simplest. Future articles will explore building flexible pipelines.

Pipelines also make a useful distinction between the manager (AxKit) and the processing technologies. Not only does this allow you to mix and match processing techniques as needed (AxKit ships with nine "Languages", or types of XML processor, and several "Providers", or data sources), it also allows new technologies to be incorporated into existing sites as the need arises.

Moreover, technologies like XML and XSLT are standardized and are becoming widely accepted and supported. This means that most, if not all, of the documents in our example pipeline can be handed off to non-Perl coders without (much) fear of them mangling the code. When they do mangle it, tools such as xsltproc (shipped with libxslt, one of the AxKit XSLT processors) can be used to give the designers a first line of defense before calling in the programmers. Even taglibs, nonstandard though they are, leverage the XML standard to make logic and data available to noncoders in a (relatively) safe manner. For instance, there's an excellent online tutorial provided by ZVON.org.
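
For instance, a designer could check a stylesheet from the command line before it ever reaches the server; assuming the output of the XSP stage has been saved as weather-data.xml, something like:

 % xsltproc weather.xsl weather-data.xml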

Mind you, XML and XSLT have their rough spots; the trick is that you don't need to know all the quirky ins and outs of the XML specification or use XSLT for things that are painful to do in it. I mean, really, when was the last time you dealt with a notation declaration? Most uses of XML use a small subset of the XML specification, and other tools such as XSP, XPathScript and various schema languages can be used where XSLT would only make the problem more difficult.

What stages are appropriate depends on the application's requirements and those of the organization(s) involved in building, operating and maintaining it. In the next article, we'll examine a "document driven" pipeline and a taglib better suited for this approach that uses different stages.

All that being said, there will always be a place for non-XML and non-pipelined solutions: XML and pipelines are not panaceas. I still use other solutions when the applications or organizations I work with would not benefit from XML.

httpd.conf: the AxKit configuration

Before we get into the example code, let's glance at the AxKit configuration. Feel free to skip ahead to the code if you like; otherwise, here's the configuration we'll use in httpd.conf:


    ##
    ## Init the httpd to use our "private install" libraries
    ##
    PerlRequire startup.pl
    
    ##
    ## AxKit Configuration
    ##
    PerlModule AxKit
    
    <Directory "/home/me/htdocs">
        Options -All +Indexes +FollowSymLinks
    
        # Tell mod_dir to translate / to /index.xml or /index.xsp
        DirectoryIndex index.xml index.xsp
        AddHandler axkit .xml .xsp
    
        AxDebugLevel 10
    
        AxGzipOutput Off
    
        AxAddXSPTaglib AxKit::XSP::Util
        AxAddXSPTaglib AxKit::XSP::Param
        AxAddXSPTaglib My::WeatherTaglib
    
        AxAddStyleMap application/x-xsp Apache::AxKit::Language::XSP

        AxAddStyleMap text/xsl Apache::AxKit::Language::LibXSLT
    </Directory>

This is the same configuration as in the last article, where most of the directives, and the processing model Apache and AxKit use for them, are described in detail; two new directives have been added since then. The key directives for our example are AxAddXSPTaglib and AxAddStyleMap.

The AxAddXSPTaglib directives load three tag libraries: Kip Hampton's Util and Param taglibs, and our very own WeatherTaglib. Util will allow our example to get at the system time; Param will allow it to read parameters (such as a zip code) from the request URL; and WeatherTaglib will allow us to fetch the current weather conditions for that zip code.

The two AxAddStyleMap directives map a pair of MIME types to an XSP and an XSLT processor. Our example source document will refer to these MIME types to configure instances of the XSP and XSLT processors into the processing pipeline.

We're using Apache::AxKit::Language::LibXSLT to perform XSLT transforms, which uses the GNOME project's libxslt library under the hood. Different XSLT engines offer different sets of features. If you prefer, then you can also use Apache::AxKit::Language::Sablot for XSLT work. You can even use them in the same pipeline by assigning them to different mime types.
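
To give a taste of how a source document selects this pipeline, here is roughly what the processing instructions at the top of our XSP page will look like (the stylesheet names match this article's example; AxKit uses the special href NULL for a stage, like XSP, that needs no external stylesheet):


    <?xml-stylesheet href="NULL" type="application/x-xsp"?>
    <?xml-stylesheet href="weather.xsl" type="text/xsl"?>
    <?xml-stylesheet href="as_html.xsl" type="text/xsl"?>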

My::WeatherTaglib

Here's a taglib that uses the Geo::Weather module from CPAN to take a zip code, fetch some weather observations from weather.com, and convert them to XML:


    package My::WeatherTaglib;
    
    $NS = "http://slaysys.com/axkit_articles/weather/";
    @EXPORT_TAGLIB = ( 'report($zip)' );
    
    use strict;
    use Apache::AxKit::Language::XSP::TaglibHelper;
    use Geo::Weather;
    
    sub report { Geo::Weather->new->get_weather( @_ ) }
    
    1;

This taglib uses Steve Willer's TaglibHelper (included with AxKit) to automate the drudgery of dealing with XML. Because of this, our example taglib distills a lot of power into a few lines of Perl. Don't be fooled, though: there's a lot going on behind the curtains with this module.

When a tag like <weather:report zip="15206"/> is encountered in an XSP page, it will be translated into a call to report( "15206" ); the result of the call will be converted to XML and will replace the original tag in the XSP output document.

The $NS variable sets the namespace URI for the taglib; this configures XSP to direct all elements within the namespace http://slaysys.com/axkit_articles/weather/ to My::WeatherTaglib, as we'll see in a bit.

When used in an XSP page, all XML elements supplied by a taglib will have a namespace prefix. For instance, the prefix weather: is mapped to My::WeatherTaglib's namespace in the XSP page below. This prefix is not determined by the taglib; we could have chosen another. This section assumes the prefix weather: for the sake of clarity.

The @EXPORT_TAGLIB specifies what functions will be exported as elements in this namespace and what parameters they accept (see the documentation for details). The report($zip) export specification exports a tag that is invoked like <weather:report zip="..."/> or


    <weather:report>
        <weather:zip>15206</weather:zip>
    </weather:report>

The words "report" and "zip" in the @TAGLIB_EXPORT definition are used to determine the taglib element and attribute names; the order of the parameters in the definition determines the order they are passed in to the function. When invoking a taglib, the XML may specify the parameters in any order in the XML. The names they are specified with are not important to or visible from the Perl code by default (see the *argument function specification for how to accept an XML tree if need be).

All that's left for us to do is to write the "body" of the taglib by use()ing Geo::Weather and writing the report() subroutine.

There are two major conveniences provided by TaglibHelper. The first is welding Perl subroutines to XML tags (via the @EXPORT_TAGLIB definitions). The second is converting the Perl data structure returned by report(), a hash reference like


    {
      city  => "Pittsburgh",
      state => "PA",
      cond  => "Sunny",
      temp  => 76,
      pic   => "http://image.weather.com/web/common/wxicons/52/26.gif",
      url   => "http://www.weather.com/search/search?where=15206",
      ...
    }

into well-balanced XML like:


      <city>Pittsburgh</city>
      <state>PA</state>
      <cond>Sunny</cond>
      <temp>76</temp>
      <pic>http://image.weather.com/web/common/wxicons/52/26.gif</pic>
      <url>http://www.weather.com/search/search?where=15206</url>
      ...

TaglibHelper allows plain strings, data structures and strings of well-balanced XML to be returned. By writing a single one-line subroutine that returns a Perl data structure, we've written a taglib that requires no Perl expertise to use (any XHTML jock could use it safely using their favorite XML, HTML or text editor) and that can be used to serve up "data" responses for XML-RPC-like applications or "text" documents for human consumption.

The Data::Dumper module that ships with Perl is a good way to peer inside the data structures floating around in a request. When run under AxKit, a quick warn Dumper( $foo ); will dump the data structure referred to by $foo to the Apache error log.
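For example, to peek at the hash reference returned by Geo::Weather (the $report variable name is ours):


    use Data::Dumper;
    
    my $report = Geo::Weather->new->get_weather( "15206" );
    warn Dumper( $report );   # appears in Apache's error log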

The output XML from a taglib replaces the original tag in the result document. In our case, the replacement XML is not too useful as-is; it's just data that looks a bit XMLish. Representing data structures as XML may seem awkward to Perl gurus, but it's quite helpful if you want to get the Perl code safely out of the way and allow others to use XML tools to work with the data.

The "data documents" from our XSP processor will upgraded in later processing stages to content that is presentable to the client; our XSP page neither knows nor cares how it is to be presented. Exporting the raw data as XML here is intended to show how to get the Perl gurus out of the critical path for content development by allowing site designers and content authors to do it.

Emitting structured data from XSP pages is just one approach. Taglibs can return whatever the "best fit" is for a given application, whether that be raw data, pieces of content or an entire article.

Cocoon, the system AxKit was primarily inspired by, uses a different approach to writing taglibs. AxKit also supports that approach, but it tends to be more awkward, so it will be examined in the next article. We'll also look at Jörg Walter's new SimpleTaglib module then, which is newer and more flexible, but less portable than the TaglibHelper module we're looking at here.

weather1.xsp: Using My::WeatherTaglib


Here's a page (weather1.xsp) that uses My::WeatherTaglib and the "standard" XSP util taglib we used in the previous article:


    <?xml-stylesheet href="NULL"        type="application/x-xsp"?>
    <?xml-stylesheet href="weather.xsl" type="text/xsl"         ?>
    <?xml-stylesheet href="as_html.xsl" type="text/xsl"         ?>
    
    <xsp:page
        xmlns:xsp="http://www.apache.org/1999/XSP/Core"
        xmlns:util="http://apache.org/xsp/util/v1"
        xmlns:param="http://axkit.org/NS/xsp/param/v1"
        xmlns:weather="http://slaysys.com/axkit_articles/weather/"
    >
    <data>
      <title><a name="title"/>My weather report</title>
      <time>
        <util:time format="%H:%M:%S" />
      </time>
      <weather>
        <weather:report>
          <!-- Get the ?zip=12345 from the URI and pass it
               to the weather:report tag as a parameter -->
          <weather:zip><param:zip/></weather:zip>
        </weather:report>
      </weather>

    </data>
    </xsp:page>

When weather1.xsp is requested, AxKit parses the <?xml-stylesheet ...?> processing instructions and uses the AxAddStyleMap directives to build the processing chain shown above.


The XSP processor is the first processor in the pipeline. As it parses the page, it sends all elements with util:, param: or weather: prefixes to the Util, Param, and WeatherTaglib taglibs. This mapping is defined by the xmlns:... attributes and by the namespace URIs that are hardcoded into each taglib's implementation (see the $NS variable in My::WeatherTaglib).

In this page, the <util:time> element results in a call to Util's get_date() and the value of the format= attribute is passed in as a parameter. The string returned by get_date() is converted to XML and emitted instead of the <util:time> element in the output page. This shows how to pass simple constant parameters to a taglib.

We're getting slightly trickier with the <weather:report> element: This construct fetches the zip parameter from the request URI's query string (or form field) and passes it to WeatherTaglib's report() as the $zip parameter. Thanks to Kip Hampton for the help in using AxKit::XSP::Param in this manner.

Because we have the AxDebugLevel set to 10, you can see these calls in the compiled version of weather1.xsp; the generated Perl code is written to Apache's error log—usually $SERVER_ROOT/logs/error_log.

The <a name="title"/> in the <title> element is a contrivance put in this page to show off a feature later in the processing chain. Be glad it's not the dreaded <blink> tag!


The XML document output by the XSP processor and fed to the first XSLT processor looks like this (the children of the <weather> element are the taglib's output):


    <?xml version="1.0" encoding="UTF-8"?>
    <data>
      <title><a name="title"/>My weather report</title>
      <time>16:11:55</time>
      <weather>
        <state>PA</state>
        <heat>N/A</heat>
        <page>/search/search?where=15206</page>
        <wind>From the Southwest at 10</wind>
        <city>Pittsburgh</city>
        <temp>76</temp>
        <cond>Sunny</cond>
        <uv>2</uv>
        <visb>Unlimited</visb>
        <url>http://www.weather.com/search/search?where=15206</url>
        <dewp>53</dewp>
        <zip>15206</zip>
        <baro>29.75</baro>
        <pic>http://image.weather.com/web/common/wxicons/52/26.gif</pic>
        <humi>31</humi>
      </weather>
    </data>

This data is largely presentation-neutral—kindly overlook the U.S.-centric temperature scale—and can be styled as needed.

To generate this intermediate document, just comment out all but the first <?xml-stylesheet ... ?> processing instruction and request the page like so:

    $ lynx -source localhost:8080/02/weather1.xsp?zip=15206 | xmllint --format -

xmllint is installed with the GNOME libxml2 library used by various parts of AxKit.

Later in this series, we'll cover how to build pipelines in more dynamic ways than with these stodgy old xml-stylesheet PIs; those techniques can also be used to expose intermediate documents by varying the request URI.

weather.xsl: Converting Data to Content


Here's how we can convert the data document emitted by the XSP processor into more human-readable text. As described above, we're taking a two-step approach to simulate a "real-world" scenario of turning our data into chunks of content in one (reusable) step and then laying the HTML out in a second step.

weather.xsl is an XSLT stylesheet that uses several templates to convert the XSP output into something more readable:


    <xsl:stylesheet 
      version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    >
    
    <xsl:template match="/data/time">
      <time>Hi! It's <xsl:value-of select="/data/time" /></time>
    </xsl:template>
    
    <xsl:template match="/data/weather">
      <weather>The weather in
        <xsl:value-of select="/data/weather/city" />,
        <xsl:value-of select="/data/weather/state"/> is
        <xsl:value-of select="/data/weather/cond" /> and
        <xsl:value-of select="/data/weather/temp" />F
        (courtesy of <a href="{/data/weather/url}">The
        Weather Channel</a>).
      </weather>
    </xsl:template>
    
    <xsl:template match="@*|node()">
      <!-- Copy the rest of the doc verbatim -->
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>
    
    </xsl:stylesheet>

This stylesheet is applied by the first XSLT processor in the pipeline.

The interesting thing here is that we are using two templates (the first two above) to process different bits of the source XML. These templates "blurbify" the time and weather data into presentable chunks and, as a side-effect, throw away unused data from the weather report.

The third template just passes the rest through (XSLT has some annoying qualities, one of which is that it takes a complex bit of code to simply pass things through "as is"). However, this is boilerplate -- right from the XSLT specification, in fact -- and need not interfere with designers creating the two templates we actually want in this stylesheet.

Another annoying quality is that XSLT constructs look visually similar to the templates themselves. This violates the language design principle "different things should look different," which is used in Perl and many other languages. This can be ameliorated by using an XSLT-aware editor or syntax highlighting to make the differences between XSLT statements and "payload" XML clear.

The output from the first XSLT processor looks like this (the <time> and <weather> contents are the templates' output):


    <?xml version="1.0"?>
    <data>
      <title><a name="title"/>My weather report</title>
      <time>Hi! It's 16:50:36</time>
      <weather>The weather in
        Pittsburgh,
        PA is
        Sunny and
        76F
        (courtesy of <a href="http://www.weather.com/search/search?where=15206">The
        Weather Channel</a>)
      </weather>
    </data>

Now we have a set of chunks that can be placed on a Web page. This technique can be used to build sidebars, newspaper headlines, abstracts, contact lists, navigation cues, links, menus, etc., in a reusable fashion.

as_html.xsl: Laying out the page


The final step in this example is to insert the chunks we've built into a page of HTML using the as_html.xsl stylesheet:


    <xsl:stylesheet 
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        version="1.0">

    <xsl:output method="html" />

    <xsl:template match="/">
      <html>
        <head>
          <title><xsl:value-of select="/data/title" /></title>
        </head>
        <body>
          <h1><xsl:copy-of select="/data/title/node()"   /></h1>
          <p ><xsl:copy-of select="/data/time/node()"    /></p>
          <p ><xsl:copy-of select="/data/weather/node()" /></p>
        </body>
      </html>
    </xsl:template>

    </xsl:stylesheet>

Applying this stylesheet yields the final HTML:


    <html>
    <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
    <title>My weather report</title>
    </head>
    <body>
    <h1><a name="title"/>My weather report</h1>
    <p>Hi! It's 17:05:08</p>
    <p>The weather in
        Pittsburgh,
        PA is
        Sunny and
        76F
        (courtesy of <a href="http://www.weather.com/search/search?where=15206">The
        Weather Channel</a>).
    </p>
    </body>
    </html>

Using the /data/title from the data document in two places in the result document is a minor example of the benefit of separating the original data generation from the final presentation. In the <title> element, we're using xsl:value-of, which returns just the textual content; in the <h1> element, we're using xsl:copy-of, which copies the tags and the text. This allows the title to contain markup that we strip in one place and use in another.

This is similar to the situation often found in real applications where things like menus, buttons, location tell-tales ("Home >> Articles >> Foo" and the like) and links to related pages often occur in multiple places. Widgets like these make ideal "chunks" that the layout page can place as needed.

This is only one example of a "final rendering" stage; different filters could be used instead to deliver different formats. For instance, we could use XSLT to deliver XML, XHTML, and/or plain text versions, or we could use an AxKit-specific processor, XPathScript, to convert to things like RTF, nroff, and miscellaneous documentation formats that XML would otherwise have a hard time delivering.

AxKit optimizes this two-stage XSLT processing by passing the internal representation used by libxslt directly between the two stages. This means that output from one stage goes directly to the next stage without having to be reparsed.

Relating weather1.xsp to the real world

If you squint a little at the code in My::WeatherTaglib, then you can imagine using a DBI query instead of having Geo::Weather query a remote Web site (Geo::Weather is used instead of DBI in this article to keep the example code and tarball relatively simple).

Writing queries and other business logic into taglibs has several major advantages:

  • the XML taglib API puts a designer-friendly face on the queries, allowing the XSP page to be tweaked or maintained by non-Perl-literate folks with their preferred tools, hopefully getting you off the "please tweak the query parms" critical path.
  • Since the taglib API is XML, standard XML editors will catch basic syntax errors without needing to call in the taglib maintainer.
  • Schema validators and XSLT tools can also be used to allow the designers to check the higher-level syntax before pestering the taglib maintainer.
  • The query parameters and output can be touched up with Perl, making the "high level" XML interface simpler and more idiot-proof.
  • The output is XML, so other XML processors can be used to enhance the content and style. This allows, for instance, XSLT literate designers to work on the presentation without needing to learn or even see (and possibly corrupt) any Perl code or a new language (as is required with most HTML templating solutions).
  • It's quite difficult to accidentally generate malformed XML using XSP: a well-formed XSP page usually generates well-formed output.
  • The queries are decoupled from the source XML, so they can be maintained without touching the XSP pages.
  • The taglibs can be unit tested, unlike embedded code.
  • Taglibs can be wrappers around existing modules, so the same Perl code can be shared by both the web front end and any other scripts or tools that need them.
  • The plug-in nature of taglibs lets you draw on the many public and private XSP taglibs out there, which facilitates rapid prototyping. CPAN's chock full of 'em.
  • In addition to the "function" tags like the two demonstrated above, you can program "conditional" tags that control whether or not a block of the XSP page is included; this gives you the ability to respond to user preferences or rights, for instance.

The DBI module lets you work with almost any database, ranging from comma-separated-value files (with SQL JOIN support, no less) through MySQL, PostgreSQL, Oracle, etc., and returns Perl data structures just crying out to be returned from a taglib function and turned into XML.
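To make that concrete, here's a minimal sketch of a DBI-backed taglib function; the DSN, credentials, table, and column names are all hypothetical:


    package My::InventoryTaglib;
    
    $NS = "http://example.com/axkit_articles/inventory/";
    @EXPORT_TAGLIB = ( 'item($sku)' );
    
    use strict;
    use Apache::AxKit::Language::XSP::TaglibHelper;
    use DBI;
    
    sub item {
        my ($sku) = @_;
        my $dbh = DBI->connect( "dbi:mysql:store", "user", "secret",
                                { RaiseError => 1 } );
        # selectrow_hashref returns a hash reference, which TaglibHelper
        # converts to well-balanced XML just as with the weather report.
        return $dbh->selectrow_hashref(
            "SELECT name, price, qty FROM inventory WHERE sku = ?",
            undef, $sku );
    }
    
    1;

In a real server, you'd probably use Apache::DBI, which makes DBI connections persistent across requests, rather than reconnecting on every call.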

The ESQL taglib allows you to embed SQL directly in XSP pages, which is handy for quick one-off pages and prototypes. It's not recommended practice beyond that: it's not efficient enough for heavily trafficked sites (the database connection is rebuilt on each request), and mixing programming code in with the XML leads to some pretty unreadable and hard-to-maintain pages.

Help and thanks

In case of trouble, have a look at some of the helpful resources we listed last time.

Thanks to Kip Hampton, Jeremy Mates and Martin Oldfield, for their thorough reviews, though I'm sure I managed to sneak some bugs by them. AxKit and many of the Perl modules it uses are primarily written by Matt Sergeant with extensive contributions from these good folks and others, so many thanks to all contributors as well.

Copyright 2002, Robert Barrie Slaymaker, Jr. All Rights Reserved.

Installing mod_perl without superuser privileges

As you have seen from my previous articles, mod_perl enabled Apache consists of two main components: Perl modules and Apache itself. While installing Apache without root privileges is easy, you also need to know how to install Perl modules in a non-system-wide location. In this article, I'll demonstrate ways to complete this task.

In the examples, I'll use stas as a username, and /home/stas as the home directory for that user.

Installing Perl Modules Into a Directory of Choice

Since without superuser permissions you aren't allowed to install modules into system directories such as /usr/lib/perl5, you need to find out how to install the modules under your home directory. It's easy.

First, you have to decide where to install the modules. The simplest approach is to simulate the portion of the / file system relevant to Perl under your home directory. Actually we need only two directories:


  /home/stas/bin
  /home/stas/lib

We don't have to create them, since they will be created automatically when the first module is installed. Ninety-nine percent of the files will go into the lib directory. Occasionally, when a module distribution comes with Perl scripts, those will go into the bin directory.

Let's install the CGI.pm package, which includes a few other CGI::* modules. As usual, download the package from the CPAN repository, unpack it and chdir to the newly created directory.

Now do a standard perl Makefile.PL to prepare a Makefile, but this time tell MakeMaker to use your Perl installation directories instead of the defaults.


  % perl Makefile.PL PREFIX=/home/stas

PREFIX=/home/stas is the only part of the installation process that is different from usual. Note that if you don't like how MakeMaker chooses the rest of the directories, or if you are using an older version that requires an explicit declaration of all the target directories, then do this:


  % perl Makefile.PL PREFIX=/home/stas \
    INSTALLPRIVLIB=/home/stas/lib/perl5 \
    INSTALLSCRIPT=/home/stas/bin \
    INSTALLSITELIB=/home/stas/lib/perl5/site_perl \
    INSTALLBIN=/home/stas/bin \
    INSTALLMAN1DIR=/home/stas/lib/perl5/man  \
    INSTALLMAN3DIR=/home/stas/lib/perl5/man3

The rest is as usual:


  % make
  % make test
  % make install

make install installs all the files in the private repository. Note that all the missing directories are created automatically, so there is no need to create them. Here (slightly edited) is what it does:


  Installing /home/stas/lib/perl5/CGI/Cookie.pm
  Installing /home/stas/lib/perl5/CGI.pm
  Installing /home/stas/lib/perl5/man3/CGI.3
  Installing /home/stas/lib/perl5/man3/CGI::Cookie.3
  Writing /home/stas/lib/perl5/auto/CGI/.packlist
  Appending installation info to /home/stas/lib/perl5/perllocal.pod

If you have to use the explicit target parameters, then instead of a single PREFIX parameter, you will find it useful to create a file called, for example, ~/.perl_dirs (where ~ is /home/stas in our example) containing:


    PREFIX=/home/stas \
    INSTALLPRIVLIB=/home/stas/lib/perl5 \
    INSTALLSCRIPT=/home/stas/bin \
    INSTALLSITELIB=/home/stas/lib/perl5/site_perl \
    INSTALLBIN=/home/stas/bin \
    INSTALLMAN1DIR=/home/stas/lib/perl5/man  \
    INSTALLMAN3DIR=/home/stas/lib/perl5/man3

From now on, any time you want to install Perl modules locally, you simply execute:


  % perl Makefile.PL `cat ~/.perl_dirs`
  % make
  % make test
  % make install

Using this method, you can easily maintain several Perl module repositories. For example, you could have one for production Perl and another for development:


  % perl Makefile.PL `cat ~/.perl_dirs.production`

or


  % perl Makefile.PL `cat ~/.perl_dirs.develop`

Making Your Scripts Find the Locally Installed Modules

Perl modules are generally placed in four main directories. To find these directories, execute:


  % perl -V

The output contains important information about your Perl installation. At the end you will see:


  Characteristics of this binary (from libperl):
  Built under linux
  Compiled at Apr  6 1999 23:34:07
  @INC:
    /usr/lib/perl5/5.00503/i386-linux
    /usr/lib/perl5/5.00503
    /usr/lib/perl5/site_perl/5.005/i386-linux
    /usr/lib/perl5/site_perl/5.005
    .

It shows us the content of the Perl special variable @INC, which is used by Perl to look for its modules. It is equivalent to the PATH environment variable in Unix shells that is used to find executable programs.

Notice that Perl looks for modules in the . directory too, which stands for the current directory. It's the last entry in the above output.

Of course, this example is from version 5.00503 of Perl installed on my x86 architecture PC running Linux. That's why you see i386-linux and 5.00503. If your system runs a different version of Perl, operating system, processor or chipset architecture, then some of the directories will have different names.

I also have perl-5.6.1 installed under /usr/local/lib/, so when I do:


  % /usr/local/bin/perl5.6.1 -V

I see:


  @INC:
    /usr/local/lib/perl5/5.6.1/i586-linux
    /usr/local/lib/perl5/5.6.1
    /usr/local/lib/site_perl/5.6.1/i586-linux
    /usr/local/lib/site_perl


Note that it's still Linux, but the newer Perl build is targeted at my Pentium processor (thus the i586 rather than i386). This makes use of compiler optimizations for Pentium processors when the binary Perl extensions are created.

All the platform specific files, such as compiled C files glued to Perl with XS or SWIG, are supposed to go into the i386-linux-like directories.

Important: As we have installed the Perl modules into nonstandard directories, we have to let Perl know where to look for the four directories. There are two ways to accomplish this: You can set the PERL5LIB environment variable or you can modify the @INC variable in your scripts.

Assuming that we use perl-5.00503, in our example the directories are:


    /home/stas/lib/perl5/5.00503/i386-linux
    /home/stas/lib/perl5/5.00503
    /home/stas/lib/perl5/site_perl/5.005/i386-linux
    /home/stas/lib/perl5/site_perl/5.005

As mentioned before, you find the exact directories by executing perl -V and replacing the global Perl installation's base directory with your home directory.

Modifying @INC is quite easy. The best approach is to use the lib module (pragma), by adding the following snippet at the top of any of your scripts that require the locally installed modules.


  use lib qw(/home/stas/lib/perl5/5.00503/
             /home/stas/lib/perl5/site_perl/5.005);

Another way is to write code to modify @INC explicitly:


  BEGIN {
    unshift @INC,
      qw(/home/stas/lib/perl5/5.00503
         /home/stas/lib/perl5/5.00503/i386-linux
         /home/stas/lib/perl5/site_perl/5.005
         /home/stas/lib/perl5/site_perl/5.005/i386-linux);
  }

Note that with the lib module we don't have to list the corresponding architecture specific directories, since it adds them automatically if they exist (to be exact, when $dir/$archname/auto exists).

Also, notice that both approaches prepend the directories to be searched to @INC. This allows you to install a more recent module into your local repository and Perl will use it instead of the older one installed in the main system repository.

Both approaches modify the value of @INC at compilation time. The lib module uses the BEGIN block as well, but internally.
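You can check the effect from the command line by printing @INC after a use lib (using our example path):


  % perl -e 'use lib "/home/stas/lib/perl5/5.00503"; print map "$_\n", @INC'

The local directory (plus its architecture-specific subdirectory, if present) shows up at the front of the list.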

Now, let's assume the following scenario. I have installed the LWP package in my local repository. Now I want to install another module (e.g. mod_perl) that has LWP listed in its prerequisites list. I know that I have LWP installed, but when I run perl Makefile.PL for the module I'm about to install I'm told that I don't have LWP installed.

There is no way for Perl to know that we have some locally installed modules. All it does is search the directories listed in @INC, and since the latter contains only the default four directories (plus the . directory), it cannot find the locally installed LWP package. We cannot solve this problem by adding code to modify @INC, but changing the PERL5LIB environment variable will do the trick. If you are using t?csh for interactive work, then do this:


  setenv PERL5LIB /home/stas/lib/perl5/5.00503:
  /home/stas/lib/perl5/site_perl/5.005

It should be a single line with directories separated by colons (:) and no spaces. If you are a (ba)?sh user, then do this:


  export PERL5LIB=/home/stas/lib/perl5/5.00503:
  /home/stas/lib/perl5/site_perl/5.005

Again, make it a single line. If you use bash, then you can use multi-line commands by terminating split lines with a backslash (\), like this:


  export PERL5LIB=/home/stas/lib/perl5/5.00503:\
  /home/stas/lib/perl5/site_perl/5.005

As with use lib, Perl automatically prepends the architecture specific directories to @INC if those exist.

When you have done this, verify the value of the newly configured @INC by executing perl -V as before. You should see the modified value of @INC:


  % perl -V

  Characteristics of this binary (from libperl): 
  Built under linux
  Compiled at Apr  6 1999 23:34:07
  %ENV:
    PERL5LIB="/home/stas/lib/perl5/5.00503:
    /home/stas/lib/perl5/site_perl/5.005"
  @INC:
    /home/stas/lib/perl5/5.00503/i386-linux
    /home/stas/lib/perl5/5.00503
    /home/stas/lib/perl5/site_perl/5.005/i386-linux
    /home/stas/lib/perl5/site_perl/5.005
    /usr/lib/perl5/5.00503/i386-linux
    /usr/lib/perl5/5.00503
    /usr/lib/perl5/site_perl/5.005/i386-linux
    /usr/lib/perl5/site_perl/5.005
    .

When everything works as you want it to, add these commands to your .tcshrc or .bashrc file. The next time you start a shell, the environment will be ready for you to work with the new Perl.

Note that if you have a PERL5LIB setting, then you don't need to alter the @INC value in your scripts. But if, for example, someone else (who doesn't have this setting in the shell) tries to execute your scripts, then Perl will fail to find your locally installed modules. The best example is a crontab script that might use a different SHELL environment and, therefore, the PERL5LIB setting won't be available to it.

So the best approach is to have both the PERL5LIB environment variable and the explicit @INC extension code at the beginning of the scripts as described above.

The CPAN.pm Shell and Locally Installed Modules

The CPAN.pm shell saves a great deal of time when dealing with the installation of Perl modules and keeping them up to date. It does the job for us, even detecting the missing modules listed in prerequisites, fetching and installing them. So you may wonder whether you can use CPAN.pm to maintain your local repository as well.

When you start the CPAN interactive shell, it searches first for the user's private configuration file and then for the system-wide one. When I'm logged in as user stas, the two files on my setup are:


    /home/stas/.cpan/CPAN/MyConfig.pm
    /usr/lib/perl5/5.00503/CPAN/Config.pm

If there is no CPAN shell configured on your system, then when you start the shell for the first time it will ask you a dozen configuration questions and then create the Config.pm file for you.

If you already have it configured system-wide, then you should have a /usr/lib/perl5/5.00503/CPAN/Config.pm. If you have a different Perl version, then alter the path to use your version number when looking up the file. Create the directory (mkdir -p creates the whole path at once) where the local configuration file will go:


  % mkdir -p /home/stas/.cpan/CPAN

Now copy the system-wide configuration file to your local one.


  % cp /usr/lib/perl5/5.00503/CPAN/Config.pm \
  /home/stas/.cpan/CPAN/MyConfig.pm

The only thing left is to change the base directory of .cpan in your local file to the one under your home directory. On my machine, I replace /usr/src/.cpan (that's where my system's .cpan directory resides) with /home/stas. I use Perl, of course!


  % perl -pi -e 's|/usr/src|/home/stas|' \
  /home/stas/.cpan/CPAN/MyConfig.pm

Now that you have the local configuration file ready, you have to tell it what special parameters to pass when executing the perl Makefile.PL stage.

Open the file in your favorite editor and replace the line:


  'makepl_arg' => q[],

with:


  'makepl_arg' => q[PREFIX=/home/stas],

Now you've finished the configuration. Assuming that you are logged in as the same user you have prepared the local installation for (stas in our example), start it like this:


  % perl -MCPAN -e shell

From now on, any module you try to install will be installed locally. If you need to install some system modules, then just become the superuser and install them in the same way. When you are logged in as the superuser, the system-wide configuration file will be used instead of your local one.

If you have used more variables than just PREFIX, then modify MyConfig.pm to include them all. For example, if you have used these variables:


    perl Makefile.PL PREFIX=/home/stas \
    INSTALLPRIVLIB=/home/stas/lib/perl5 \
    INSTALLSCRIPT=/home/stas/bin \
    INSTALLSITELIB=/home/stas/lib/perl5/site_perl \
    INSTALLBIN=/home/stas/bin \
    INSTALLMAN1DIR=/home/stas/lib/perl5/man  \
    INSTALLMAN3DIR=/home/stas/lib/perl5/man3

then replace PREFIX=/home/stas in the line:


  'makepl_arg' => q[PREFIX=/home/stas],

with all the variables from above, so that the line becomes:


  'makepl_arg' => q[PREFIX=/home/stas \
    INSTALLPRIVLIB=/home/stas/lib/perl5 \
    INSTALLSCRIPT=/home/stas/bin \
    INSTALLSITELIB=/home/stas/lib/perl5/site_perl \
    INSTALLBIN=/home/stas/bin \
    INSTALLMAN1DIR=/home/stas/lib/perl5/man  \
    INSTALLMAN3DIR=/home/stas/lib/perl5/man3],

If you arrange all the above parameters in one line, then you can remove the backslashes (\).

Making a Local Apache Installation

Just like with Perl modules, if you don't have permissions to install files into the system area, then you have to install them locally under your home directory. It's almost the same as a plain installation, but you have to run the server listening to a port number greater than 1024, since only root processes can listen to lower-numbered ports.
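For example, with Apache 1.3 you can pick an unprivileged port by setting the Port directive in httpd.conf (8080 is a common choice):


  Port 8080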

Another important issue you have to resolve is how to add startup and shutdown scripts to the directories used by the rest of the system services. You will have to ask your system administrator to assist you with this issue.

To install Apache locally, all you have to do is tell the ./configure script in the Apache source directory what target directories to use. If you are following the convention that I use, which makes your home directory look like the / (base) directory, then the invocation parameters would be:


  ./configure --prefix=/home/stas

Apache will use the prefix for the rest of its target directories instead of the default /usr/local/apache. If you want to see what they are, then before you proceed add the --show-layout option:


  ./configure --prefix=/home/stas --show-layout

You might want to put all the Apache files under /home/stas/apache following Apache's convention:


  ./configure --prefix=/home/stas/apache

If you want to modify some or all of the names of the automatically created directories:


  ./configure --prefix=/home/stas/apache \
    --sbindir=/home/stas/apache/sbin \
    --sysconfdir=/home/stas/apache/etc \
    --localstatedir=/home/stas/apache/var \
    --runtimedir=/home/stas/apache/var/run \
    --logfiledir=/home/stas/apache/var/logs \
    --proxycachedir=/home/stas/apache/var/proxy

That's all!

Also remember that you can run the server only as a user and group you belong to. You must set the User and Group directives in httpd.conf to appropriate values.
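For example, if your login is stas and your primary group is also stas (adjust to your own account):


  User stas
  Group stas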

Manual Local mod_perl Enabled Apache Installation

Now that we have learned how to install local Apache and Perl modules separately, let's see how to install mod_perl enabled Apache in our home directory. It's almost as simple as doing each one separately, but there is one wrinkle you need to know about that I'll mention at the end of this section.

Let's say you have unpacked the Apache and mod_perl sources under /home/stas/src and they look like this:


  % ls /home/stas/src
  /home/stas/src/apache_x.x.x
  /home/stas/src/mod_perl-x.xx

where x.xx are the version numbers as usual. You want the Perl modules from the mod_perl package to be installed under /home/stas/lib/perl5 and the Apache files to go under /home/stas/apache. The following commands will do that for you:


  % perl Makefile.PL \
  PREFIX=/home/stas \
  APACHE_PREFIX=/home/stas/apache \
  APACHE_SRC=../apache_x.x.x/src \
  DO_HTTPD=1 \
  USE_APACI=1 \
  EVERYTHING=1
  % make && make test && make install 
  % cd ../apache_x.x.x
  % make install

If you need some parameters to be passed to the ./configure script, as we saw in the previous section, then use APACI_ARGS. For example:


  APACI_ARGS='--sbindir=/home/stas/apache/sbin, \
    --sysconfdir=/home/stas/apache/etc, \
    --localstatedir=/home/stas/apache/var, \
    --runtimedir=/home/stas/apache/var/run, \
    --logfiledir=/home/stas/apache/var/logs, \
    --proxycachedir=/home/stas/apache/var/proxy'

Note that the above multi-line splitting will work only with bash; tcsh users will have to list all the parameters on a single line.

Basically the installation is complete. The only remaining problem is the @INC variable. This won't be correctly set if you rely on the PERL5LIB environment variable unless you set it explicitly in a startup file that is require'd before loading any other module that resides in your local repository. A much nicer approach is to use the lib pragma as we saw before, but in a slightly different way -- we use it in the startup file and it affects all the code that will be executed under mod_perl handlers. For example:


  PerlRequire /home/stas/apache/perl/startup.pl

where startup.pl starts with:


  use lib qw(/home/stas/lib/perl5/5.00503/
             /home/stas/lib/perl5/site_perl/5.005);

Note that you can still use the hard-coded @INC modifications in the scripts themselves, but be aware that scripts modify @INC in BEGIN blocks and mod_perl executes the BEGIN blocks only when it performs script compilation. As a result, @INC will be reset to its original value after the scripts are compiled and the hard-coded settings will be forgotten.

The only place you can alter the "original" value is during the server configuration stage, either in the startup file or by putting


  PerlSetEnv PERL5LIB \
  /home/stas/lib/perl5/5.00503/:/home/stas/lib/perl5/site_perl/5.005

in httpd.conf, but the latter setting will be ignored if you use the PerlTaintCheck setting, and I hope you do use it.

The remainder of the mod_perl configuration and use is just the same as if you were installing mod_perl as a superuser.

Local mod_perl Enabled Apache Installation with CPAN.pm

Assuming that you have configured CPAN.pm to install Perl modules locally as explained earlier in this article, the installation is simple. Start the CPAN.pm shell, set the arguments to be passed to perl Makefile.PL (modify the example setting to suit your needs), and tell CPAN.pm to do the rest for you:


  % perl -MCPAN -eshell
  cpan> o conf makepl_arg 'DO_HTTPD=1 USE_APACI=1 EVERYTHING=1 \
        PREFIX=/home/stas APACHE_PREFIX=/home/stas/apache'
  cpan> install mod_perl

When you use CPAN.pm for local installations, after the mod_perl installation is complete, you must make sure that the value of makepl_arg is restored to its original value.

The simplest way to do this is to quit the interactive shell by typing quit and re-entering it. But if you insist, then here is how to make it work without quitting the shell. You really want to skip this :)

If you want to continue working with CPAN *without* quitting the shell, then you must:


  1. remember the value of makepl_arg
  2. change it to suit your new installation
  3. build and install mod_perl
  4. restore it after completing mod_perl installation

This is quite a cumbersome task as of this writing, but I believe that CPAN.pm will eventually be improved to handle it more easily.

So if you are still with me, then start the shell as usual:


  % perl -MCPAN -eshell

First, read the value of makepl_arg:


  cpan> o conf makepl_arg

  PREFIX=/home/stas

It will be something like PREFIX=/home/stas if you configured CPAN.pm to install modules locally. Save this value:


  cpan> o conf makepl_arg.save PREFIX=/home/stas

Second, set a new value, to be used by the mod_perl installation process. (You can add parameters to this line, or remove them, according to your needs.)


  cpan> o conf makepl_arg 'DO_HTTPD=1 USE_APACI=1 EVERYTHING=1 \
        PREFIX=/home/stas APACHE_PREFIX=/home/stas/apache'

Third, let CPAN.pm build and install mod_perl for you:


  cpan> install mod_perl

Fourth, restore the original value of makepl_arg. We do this by printing the value of the saved variable and assigning it back to makepl_arg.


  cpan> o conf makepl_arg.save

  PREFIX=/home/stas

  cpan> o conf makepl_arg PREFIX=/home/stas

Not so neat, but a working solution. You could have written the value on a piece of paper instead of saving it to makepl_arg.save, but you are more likely to make a mistake that way.


Exegesis 4

Editor's note: this document is out of date and remains here for historic interest. See Synopsis 4 for the current design information.

And I'd se-ell my-y so-oul for flow of con-tro-ol ... over Perl

-- The Motels, "Total Control" (Perl 6 remix)

In Apocalypse 4, Larry explains the fundamental changes to flow and block control in Perl 6. The changes bring fully integrated exceptions; a powerful new switch statement; a coherent mechanism for polymorphic matching; a greatly enhanced for loop; and unification of blocks, subroutines and closures.

Let's dive right in.

"Now, Witness the Power of This Fully Operational Control Structure"

We'll consider a simple interactive RPN calculator. The real thing would have many more operators and values, but that's not important right now.


    class Err::BadData is Exception {...}
    
    module Calc;
    
    my class NoData is Exception {
        method warn(*@args) { die @args }
    }
    
    my %var;
    
    my sub get_data ($data) {
        given $data {
            when /^\d+$/    { return %var{""} = $_ }
            when 'previous' { return %var{""} // fail NoData }
            when %var       { return %var{""} = %var{$_} }
            default         { die Err::BadData : msg=>"Don't understand $_" }
        }  
    }
    
    sub calc (str $expr, int $i) {
        our %operator is private //= (
            '*'  => { $^a * $^b },
            '/'  => { $^a / $^b },
            '~'  => { ($^a + $^b) / 2 },
        );
        
        my @stack;
        my $toknum = 1;
        for split /\s+/, $expr -> $token {
            try {
                when %operator {
                    my @args = splice @stack, -2;
                    push @stack, %operator{$token}(*@args)
                }
                when '.', ';', '=' {
                    last
                }
                
                use fatal;
                push @stack, get_data($token);
                
                CATCH {
                    when Err::Reportable     { warn $!; continue }
                    when Err::BadData        { $!.fail(at=>$toknum) }
                    when NoData              { push @stack, 0 }
                    when /division by zero/  { push @stack, Inf }
                }
            }
            
            NEXT { $toknum++ }
        }
        fail Err::BadData: msg=>"Too many operands" if @stack > 1;
        return %var{'$' _ $i} = pop(@stack) but true;
    }
    
    module main;
    
    for 1..Inf -> $i {
        print "$i> ";
        my $expr = <> err last;  
        print "$i> $( Calc::calc(i=>$i, expr=>$expr) )\n";
    }

An Exceptionally Promising Beginning

The calculator is going to handle internal and external errors using Perl 6's OO exception mechanism. This means that we're going to need some classes for those OO exceptions to belong to.

To create those classes, the class keyword is used. For example:


    class Err::BadData is Exception {...}


After this declaration, Err::BadData is a class name (or rather, by analogy to "filehandle," it's a "classname"). Either way, it can then be used as a type specifier wherever Perl 6 expects one. Unlike Perl 5, that classname is not a bareword string: It's a genuine first-class symbol in the program. In object-oriented terms, we could think of a classname as a meta-object -- an object that describes the attributes and behavior of other objects.

Modules and packages are also first class in Perl 6, so we can also refer to their names directly, or take references to them, or look them up in the appropriate symbol table.

Classes can take properties, just like variables and values. Generally, those properties will specify variations in the behavior of the class. For example:


    class B::Like::Me is interface;

specifies that the B::Like::Me class defines a (Java-like) interface that any subclass must implement.

The is Exception clause is not, however, a standard property. Indeed, Exception is the name of another (standard, built-in) class. When a classname like this is used as if it were a property, the property it confers is inheritance. Specifically, Err::BadData is defined as inheriting from the Exception base class. In Perl 5, that would have been:


    # Perl 5 code
    package Err::BadData;
    use base 'Exception';

So now class Err::BadData will have all the exceptionally useful properties of the Exception class.

Having classnames as "first class" symbols of the program means that it's also important to be able to pre-declare them (to avoid compile-time "no such class or module" errors). So we need a new syntax for declaring the existence of classes/modules/packages, without actually defining their behavior.

To do that we write:


    class MyClass {...}

That's right. That's real, executable, Perl 6 code.

We're defining the class, but using the new Perl 6 "yada-yada-yada" operator in a block immediately after the classname. By using the "I'm-eventually-going-to-put-something-here-but-not-just-yet" marker, we indicate that this definition is only a stub or placeholder. In this way, we introduce the classname into the current scope without needing to provide the complete description of the class.

By the way, this is also the way we can declare other types of symbols in Perl 6 without actually defining them:


    module Alpha {...}
    package Beta {...}
    method Gamma::delta(Gamma $self: $d1, $d2) {...}
    sub epsilon() {...}

In our example, the Err::BadData classname is introduced in precisely that way:


    class Err::BadData is Exception {...}

which means that we can refer to the class by name, even though it has not yet been completely defined.

In fact, in this example, Err::BadData is never completely defined. So we'd get a fatal compile-time error: "Missing definition for class Err::BadData." Then we'd realize we either forgot to eventually define the class, or that we had really meant to write:


    class Err::BadData is Exception {}   # Define new exception class with
                                         # no methods or attributes
                                         # except those it inherits
                                         # See below.


Lexical Exceptions

Most of the implementation of the calculator is contained in the Calc module. In Perl 6, modules are specified using the module keyword:


    module Calc;

which is similar in effect to a Perl 5:


    # Perl 5 code
    package Calc;

Modules are not quite the same as packages in Perl 6. Most significantly, they have a different export mechanism: They export via a new, built-in, declarative mechanism (which will be described in a future Apocalypse) and the symbols they export are exported lexically by default.

The first thing to appear in the module is a class declaration:


    my class NoData is Exception {
        method warn(*@args) { die @args }
    }

This is another class derived from Exception, but one that has two significant differences from the declaration of class Err::BadData:

  • The leading my makes it lexical in scope, and
  • the trailing braces give it an associated block in which its attributes and methods can be specified.

Let's look at each of those.

NoData exceptions are only going to be used within the Calc module itself. So it's good software engineering to make them visible only within the module itself.

Why? Because if we ever attempt to refer to the exception class outside Calc (e.g. if we tried to catch such an exception in main), then we'll get a compile-time "No such class: NoData" error. Any such errors would indicate a flaw in our class design or implementation.

In Perl 6, classes are first-class constructs. That is, like variables and subroutines, they are "tangible" components of a program, denizens of a symbol table, able to be referred to both symbolically and by explicit reference:


    $class = \Some::Previously::Defined::Class;
    
    # and later
    
    $obj = $class.new();

Note that the backslash is actually optional in that first line, just as it would be for an array or hash in the same position.

"First class" also means that classnames live in a symbol table. So it follows that they can be defined to live in the current lexical symbol table (i.e. %MY::), by placing a my before them.

A lexical class or module is only accessible in the lexical scope in which it's declared. Of course, like Perl 5 packages, Perl 6 classes and modules don't usually have an explicit lexical scope associated with their declaration. They are implicitly associated with the surrounding lexical scope (which is normally a file scope).

But we can give them their own lexical scope to preside over by adding a block at the end of their declaration:


    class Whatever {
        # definition here
    }

This turns out to be important. Without the ability to specify a lexical scope over which the class has effect, we would be stuck with no way to embed a "nested" lexical class:


    class Outer;
    # class Outer's namespace
    
    my class Inner;
    
    # From this line to the end of the file 
    # is now in class Inner's namespace

In Perl 6, we avoid this problem by writing:


    class Outer;
    # class Outer's namespace
    
    my class Inner {
        # class Inner's namespace
    }
    
    # class Outer's namespace again

In our example, we use this new feature to redefine NoData's warn method (upgrading it to a call to die). Of course, we could also have done that with just:


    my class NoData is Exception;       # Open NoData's namespace
    method warn(*@args) { die @args }   # Defined in NoData's namespace

but then we would have needed to "reopen" the Calc module's namespace afterward:


    module Calc;                        # Open Calc's namespace
    
    my class NoData is Exception;       # Open NoData's (nested) namespace
    method warn(*@args) { die @args }   # Defined in NoData's namespace
    
    module Calc;                        # Back to Calc's namespace

Being able to "nest" the NoData namespace:


    module Calc;                            # Open Calc's namespace
    
    my class NoData is Exception {          # Open NoData's (nested) namespace
        method warn(*@args) { die @args }   # Defined in NoData's namespace
    }
    
    # The rest of module Calc defined here.

is much cleaner.

By the way, because classes can now have an associated block, they can even be anonymous:


    $anon_class = class { 
        # definition here
    };
    
    # and later
    
    $obj = $anon_class.new();

which is a handy way of implementing "singleton" objects:


    my $allocator = class { 
                        my $.count = "ID_000001";
                        method next_ID { $.count++ }
                    }.new;
                    
    # and later...
    
    for @objects {
        $_.set_id( $allocator.next_ID );
    }

Maintaining Your State

To store the values of any variables used by the calculator, we'll use a single hash, with each key being a variable name:


    my %var;

Nothing more to see here. Let's move along.

It's a Given

The get_data subroutine may be given a number (i.e. a literal value), a numerical variable name (i.e. '$1', '$2', etc.), or the keyword 'previous'.

It then looks up the information in the %var hash, using a switch statement to determine the appropriate look-up:


    my sub get_data ($data) {
        given $data {

The given $data evaluates its first argument (in this case, $data) in a scalar context, and makes the result the "topic" of each subsequent when inside the block associated with the given. (Though, just between us, that block is merely an anonymous closure acting as the given's second argument -- in Perl 6 all blocks are merely closures that are slumming it.)

Note that the given $data statement also makes $_ an alias for $data. So, for example, if the when specifies a pattern:


    when /^\d+$/  { return %var{""} = $_ }

then that pattern is matched against the contents of $data (i.e. against the current topic). Likewise, caching and returning $_ when the pattern matches is the same as caching and returning $data.

After a when's block has been selected and executed, control automatically passes to the end of the surrounding given (or, more generally, to the end of whatever block provided the when's topic). That means that when blocks don't "fall through" in the way that case statements do in C.

You can also explicitly send control to the end of a when's surrounding given, using a break statement. For example:


    given $number {
        when /[02468]$/ {
            if ($_ == 2) {
                warn "$_ is even and prime\n";
                break;
            }           
            warn "$_ is even and composite\n";
        }
        when &is_prime {
            warn "$_ is odd and prime\n";
        }
        warn "$_ is odd and composite\n";
    }

Alternatively, you can explicitly tell Perl not to automatically break at the end of the when block. That is, tell it to "fall through" to the statement immediately after the when. That's done with a continue statement (which is the new name for The Statement Formerly Known As skip):


    given $number {
        when &is_prime   { warn "$_ is prime\n"; continue; }
        when /[13579]$/  { warn "$_ is odd"; }
        when /[02468]$/  { warn "$_ is even"; }
    }

In Perl 6, a continue means: "continue executing from the next statement after the current when, rather than jumping out of the surrounding given." It has nothing to do with the old Perl 5 continue block, which in Perl 6 becomes NEXT.

The "topic" that given creates can also be aliased to a name of our own choosing (though it's always aliased to $_ no matter what else we may do). To give the topic a more meaningful name, we just need to use the "topical arrow:"


    given check_online().{active}{names}[0] -> $name {
        when /^\w+$/  { print "$name's on first\n" }
        when /\?\?\?/    { print "Who's on first\n" }
    }

Having been replaced by the dot, the old Perl 5 arrow operator is given a new role in Perl 6. When placed after the topic specifier of a control structure (i.e. the scalar argument of a given, or the list of a for), it allows us to give an extra name (apart from $_) to the topic associated with that control structure.

In the above version, the given statement declares a lexical variable $name and makes it yet another way of referring to the current topic. That is, it aliases both $name and $_ to the value specified by check_online().{active}{names}[0].

This is a fundamental change from Perl 5, where $_ was only aliased to the current topic in a for loop. In Perl 6, the current topic -- whatever its name and however you make it the topic -- is always aliased to $_.

That implies that everywhere that Perl 5 used $_ as a default (i.e. print, chomp, split, length, eval, etc.), Perl 6 uses the current topic:


    for @list -> $next {        # iterate @list, aliasing each element to 
                                # $next (and to $_)
        print if length > 10;   # same as: print $next if length $next > 10
        %count{$next}++;
    }

This is subtly different from the "equivalent" Perl 5 code:


    # Perl 5 code
    for my $next (@list) {      # iterate @list, aliasing each element to 
                                # $next (but not to $_)
        print if length > 10;   # same as: print $_ if length $_ > 10
                                # using the $_ value from *outside* the loop
        $count{$next}++;
    }

If you had wanted this Perl 5 behavior in Perl 6, then you'd have to say explicitly what you meant:


    my $outer_underscore := $_;
    for @list -> $next {
        print $outer_underscore
            if length $outer_underscore > 10;
        %count{$next}++;
    }

which is probably a good thing in code that subtle.

Oh, and yes: the p52p6 translator program will take that new behavior into account and correctly convert something pathological like:


    # Perl 5 code
    while (<>) {
        for my $elem (@list) {
            print if $elem % 2;
        }
    }

to:


    # Perl 6 code
    for <> {
        my $some_magic_temporary_variable := $_;
        for @list -> $elem {
            print $some_magic_temporary_variable if $elem % 2;
        }
    }

Note that this works because, in Perl 6, a call to <> is lazily evaluated in list contexts, including the list of a for loop.


Other whens

The remaining cases of the data look-up are handled by subsequent when statements. The first:


    when 'previous' { return %var{""} // fail NoData }

handles the special keyword "previous". The previous value is always stored in the element of %var whose key is the empty string.

If, however, that previous value is undefined, then the defaulting operator -- // -- causes the right-hand side of the expression to be evaluated instead. That right-hand side is a call to the fail method of class NoData (and could equally have been written NoData.fail()).

The standard fail method inherited from the Exception class constructs an instance of the appropriate class (i.e. an exception object) and then either throws that exception (if the use fatal pragma is in effect) or else returns an undef value from the scope in which the fail was invoked. That is, the fail acts like a die SomeExceptionClass or a return undef, depending on the state of the use fatal pragma.

This is possible because, in Perl 6, all flow-of-control -- including the normal subroutine return -- is exception-based. So, when it is supposed to act like a return, the Exception::fail method simply throws the special Ctl::Return exception, which get_data's caller will (automagically) catch and treat as a normal return.

So then why not just write the usual:


    return undef;

instead?

The advantage of using fail is that it allows the callers of get_data to decide how that subroutine should signal failure. As explained above, normally fail fails by returning undef. But if a use fatal pragma is in effect, any invocation of fail instead throws the corresponding exception.

What's the advantage in that? Well, some people feel that certain types of failures ought to be taken deadly seriously (i.e. they should kill you unless you explicitly catch and handle them). Others feel that the same errors really aren't all that serious and you should be allowed to, like, chill man and just groove with the heavy consequences, dude.

The fail method allows you, the coder, to stay well out of that kind of fruitless religious debate.

When you use fail to signal failure, not only is the code nicely documented at that point, but the mode of failure becomes caller-selectable. Fanatics can use fatal and make each failure punishable by death; hippies can say no fatal and make each failure just return undef.
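
To make the mechanics concrete, here is a rough Perl 5 emulation of those two modes (the Exception base class and its $FATAL flag are inventions for illustration only; in real Perl 6 the choice is made by the use fatal pragma, and fail returns from the calling scope automatically instead of needing an explicit return):


    # Perl 5 code -- a hypothetical sketch of fail's two modes
    package Exception;
    our $FATAL = 0;          # stand-in for Perl 6's "use fatal" pragma

    sub new  { my ($class, %args) = @_; return bless { %args }, $class }

    sub fail {
        my ($class, %args) = @_;
        die $class->new(%args) if $FATAL;  # fanatics: throw the exception
        return undef;                      # hippies: just produce undef
    }

    package NoData;
    our @ISA = ('Exception');

    # in Perl 5 the caller must still write the return explicitly:
    #     return NoData->fail unless defined $var{""};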

You no longer have to get caught up in endless debate as to whether the exception-catching:


    try { $data = get_data($str) }
        // warn "Couldn't get data";

is inherently better or worse than the undef-sensing:


    do { $data = get_data($str) }
        // warn "Couldn't get data";

Instead, you can just write get_data such that There's More Than One Way To Fail It.

By the way, fail can fail in other ways, too: in different contexts or under different pragmas. The most obvious example would be inside a regex, where it would initiate back-tracking. More on that in Apocalypse 5.

Still Other Whens

Meanwhile, if $data isn't a number or the "previous" keyword, then maybe it's the name of one of the calculator's variables. The third when statement of the switch tests for that:


    when %var   { return %var{""} = %var{$_} }

If a when is given a hash, then it uses the current topic as a key in the hash and looks up the corresponding entry. If that value is true, then it executes its block. In this case, that block caches the value that was looked up (i.e. %var{$_}) in the "previous" slot and returns it.
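
In Perl 5 terms, that when is roughly this test (the full translation at the end of this article renders it the same way):


    # Perl 5 code
    if ($var{$data}) { return $var{""} = $var{$data} }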

"Aha!" you say, "that's a bug! What if the value of %var{$_} is false?!" Well, if it were possible for that to ever happen, then it certainly would be a bug, and we'd have to write something ugly:


    when defined %var{$_}   { return %var{""} = %var{$_} }

But, of course, it's much easier just to redefine Truth, so that any literal zero value stored in %var is no longer false. See below.

Finally, if the $data isn't a literal, a "previous", or a variable name, then it must be an invalid token, so the default alternative in the switch statement throws an Err::BadData exception:


    default     { die Err::BadData : msg=>"Don't understand $_" }

Note that, here again, we are actually executing a method call to:


    Err::BadData.die(msg=>"Don't understand $_");

as indicated by the use of the colon after the classname.

Of course, by using die instead of fail here, we're giving clients of the get_data subroutine no choice but to deal with Err::BadData exceptions.

An Aside: the "Smart Match" Operator

The rules governing how the argument of a when is matched against the current topic are designed to be as DWIMish as possible. Which means that they are actually quite complex. They're listed in Apocalypse 4, so we won't review them here.

Collectively, the rules are designed to provide a generic "best attempt at matching" behavior. That is, given two values (the current topic and the when's first argument), they try to determine whether those values can be combined to produce a "smart match" -- for some reasonable definitions of "smart" and "match."

That means that one possible use of a Perl 6 switch statement is simply to test whether two values match without worrying about how those two values match:


    sub hey_just_see_if_dey_match_willya ($val1, $val2) {
        given $val1 {
            when $val2 { return 1 }
            default    { return 0 }
        }
    }

That behavior is sufficiently useful that Larry wanted to make it much easier to use. Specifically, he wanted to provide a generic "smart match" operator.

So he did. It's called =~.

Yes, the humble Perl 5 "match a string against a regex" operator is promoted in Perl 6 to a "smart-match an anything against an anything" operator. So now:


    if ($val1 =~ $val2) {...}

works out the most appropriate way to compare its two scalar operands. The result might be a numeric comparison ($val1 == $val2) or a string comparison ($val1 eq $val2) or a subroutine call ($val1.($val2)) or a pattern match ($val1 =~ /$val2/) or whatever else makes the most sense for the actual run-time types of the two operands.

This new turbo-charged "smart match" operator will also work on arrays, hashes and lists:


    if @array =~ $elem {...}        # true if @array contains $elem
    
    if $key =~ %hash {...}          # true if %hash{$key}
    
    if $value =~ (1..10) {...}      # true if $value is in the list
    
    if $value =~ ('a',/\s/,7) {...} # true if $value is eq to 'a'
                                    #   or if $value contains whitespace
                                    #   or if $value is == to 7

That final example illustrates some of the extra intelligence that Perl 6's =~ has: When one of its arguments is a list (not an array), the "smart match" operator recursively "smart matches" each element and ORs the results together, short-circuiting if possible.
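
If you'd like a feel for the kind of dispatch =~ performs, here is a toy Perl 5 approximation (the smart_match subroutine and its case analysis are invented for illustration; an array reference stands in for a Perl 6 list, and it covers only a fraction of the real rules in Apocalypse 4):


    # Perl 5 code -- a toy model of "smart match" dispatch
    use Scalar::Util 'looks_like_number';

    sub smart_match {
        my ($left, $right) = @_;
        if (ref $right eq 'ARRAY') {               # a "list": OR the results
            smart_match($left, $_) && return 1 for @$right;
            return 0;
        }
        return scalar($left =~ $right) if ref $right eq 'Regexp';  # pattern match
        return $right->($left)         if ref $right eq 'CODE';    # subroutine call
        return $left == $right
            if looks_like_number($left) && looks_like_number($right);
        return $left eq $right;                    # fall back to string comparison
    }

    print "matched\n" if smart_match(7, ['a', qr/\s/, 7]);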

Being Calculating

The next component of the program is the subroutine that computes the actual results of each expression that the user enters. It takes a string to be evaluated and an integer indicating the current iteration number of the main input loop (for debugging purposes):


    sub calc (str $expr, int $i) {

Give us a little privacy, please

Perl 5 has a really ugly idiom for creating "durable" lexical variables: variables that are lexically scoped but stick around from call to call.

If you write:


    sub whatever {
        my $count if 0;
        $count++;
        print "whatever called $count times\n";
    }

then the compile-time aspect of a my $count declaration causes $count to be declared as a lexical in the subroutine block. However, at run-time -- when the variable would normally be (re-)allocated -- the if 0 prevents that process. So the original lexical variable is not replaced on each invocation, and is instead shared by them all.

This awful if 0 idiom works under most versions of Perl 5, but it's really just a freakish accident of Perl's evolution, not a carefully designed and lovingly crafted feature. So just say "No!".
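
For comparison, the clean (if slightly clumsy) Perl 5 alternative is to close over a lexical declared in an enclosing bare block:


    # Perl 5 code -- a durable lexical via an enclosing scope
    {
        my $count = 0;          # allocated once, when this block is reached
        sub whatever {
            $count++;           # every call to &whatever shares it
            print "whatever called $count times\n";
        }
    }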

Perl 6 allows us to do the same thing, but without feeling the need to wash afterward.

To understand how Perl 6 cleans up this idiom, notice that the durable variable is really much more like a package variable that just happens to be accessible only in a particular lexical scope. That kind of restricted-access package variable is going to be quite common in Perl 6 -- as an attribute of a class.

So the way we create such a variable is to declare it as a package variable, but with the is private property:


    module Wherever;
    
    sub whatever {
        our $count is private;
        $count++;
        print "whatever called $count times\n";
    }

Adding is private causes Perl to recognize the existence of the variable $count within the Wherever module, but then to restrict its accessibility to the lexical scope in which it is first declared. In the above example, any attempt to refer to $Wherever::count outside the &Wherever::whatever subroutine produces a compile-time error. It's still a package variable, but now you can't use it anywhere but in the nominated lexical scope.

Apart from the benefit of replacing an ugly hack with a clean explicit marker on the variable, the real advantage is that Perl 6 private variables can also be initialized:


    sub whatever {
        our $count is private //= 1;
        print "whatever called $count times\n";
        $count++;
    }

That initialization is performed the first time the variable declaration is encountered during execution (because that's the only time its value is undef, so that's the only time the //= operator has any effect).

In our example program we use that facility to do a one-time-only initialization of a private package hash. That hash will then be used as a (lexically restricted) look-up table to provide the implementations for a set of operator symbols:


        our %operator is private //= (
            '*'  => { $^a * $^b },
            '/'  => { $^a / $^b },
            '~'  => { ($^a + $^b) / 2 },
        );

Each key of the hash is an operator symbol and the corresponding value is an anonymous subroutine that implements the appropriate operation. Note the use of the "place-holder" variables ($^a and $^b) to implicitly specify the parameters of the closures.

Since all the data for the %operator hash is constant, we could have achieved a similar effect with:


        my %operator is constant = (
            '*'  => { $^a * $^b },
            '/'  => { $^a / $^b },
            '~'  => { ($^a + $^b) / 2 },
        );

Notionally this is quite different from the is private version, in that -- theoretically -- the lexical constant would be reconstructed and reinitialized on each invocation of the calc subroutine. Although, in practice, we would expect the compiler to notice the constant initializer and optimize the initialization out to compile-time.

If the initializer had been a run-time expression, then the is private and is constant versions would behave very differently:


    our %operator is private //= todays_ops();   # Initialize once, the first
                                                 # time statement is reached.
                                                 # Thereafter may be changed
                                                 # at will within subroutine.

    my %operator is constant = todays_ops();     # Re-initialize every time
                                                 # statement is reached.
                                                 # Thereafter constant
                                                 # within subroutine

Let's Split!

We then have to split the input expression into (whitespace-delimited) tokens, in order to parse and execute it. Since the calculator language we're implementing is RPN, we need a stack to store data and interim calculations:


    my @stack;

We also need a counter to track the current token number (for error messages):


    my $toknum = 1;

Then we just use the standard split built-in to break up the expression string, and iterate through each of the resulting tokens using a for loop:


    for split /\s+/, $expr -> $token {

There are several important features to note in this for loop. To begin with, there are no parentheses around the list. In Perl 6, they are not required (they're not needed for any control structure), though they are certainly still permissible:


    for (split /\s+/, $expr) -> $token {

More importantly, the declaration of the iterator variable ($token) is no longer to the left of the list:


    # Perl 5 code
    for my $token (split /\s+/, $expr) {

Instead, it is specified via a topical arrow to the right of the list.

By the way, somewhat surprisingly, the Perl 6 arrow operator isn't a binary operator. (Actually, neither is the Perl 5 arrow operator, but that's not important right now.)

Even more surprisingly, what the Perl 6 arrow operator is, is a synonym for the declarator sub. That's right, in Perl 6 you can declare an anonymous subroutine like so:


    $product_plus_one = -> $x, $y { $x*$y + 1 };

The arrow behaves like an anonymous sub declarator:


    $product_plus_one = sub($x, $y) { $x*$y + 1 };

except that its parameter list doesn't require parentheses. That implies:

  • The Perl 6 for, while, if, and given statements each take two arguments: an expression that controls them and a subroutine/closure that they execute. Normally, that closure is just a block (in Perl 6 all blocks are really closures):
    
        for 1..10 {         # no comma needed before opening brace
            print
        }

    but you can also be explicit:

    
        for 1..10, sub {    # needs comma if a regular anonymous sub
            print
        }

    or you can be pointed:

    
        for 1..10 -> {      # no comma needed with arrow notation
            print
        }

    or referential:

    
        for 1..10,          # needs comma if a regular sub reference
            &some_sub;

    • The variable after the arrow is effectively a lexical variable confined to the scope of the following block (just as a subroutine parameter is a lexical variable confined to the scope of the subroutine block). Within the block, that lexical becomes an alias for the topic (just as a subroutine parameter becomes an alias for the corresponding argument).
    • Topic variables created with the arrow notation are, by default, read-only aliases (because Perl 6 subroutine parameters are, by default, read-only aliases):
      
          for @list -> $i {
              if ($cmd =~ 'incr') {
                  $i++;   # Error: $i is read-only
              }
          }

      Note that the rule doesn't apply to the default topic ($_), which is given special dispensation to be a modifiable alias (as in Perl 5).

    • If you want a named topic to be modifiable through its alias, then you have to say so explicitly:
      
          for @list -> $i is rw {
              if ($cmd =~ 'incr') {
                  $i++;   # Okay: $i is read-write
              }
          }
    • Just as a subroutine can have more than one parameter, so too we can specify more than one named iterator variable at a time:
      
          for %phonebook.kv -> $name, $number {
              print "$name: $number\n"
          }

      Note that in Perl 6, a hash in a list context returns a list of pairs, not the Perl 5-ish "key, value, key, value, ..." sequence. To get the hash contents in that format, we have to call the hash's kv method explicitly.

      What actually happens in this iteration (and, in fact, in all such instances) is that the for loop looks at the number of arguments its closure takes and iterates that many elements at a time.

      Note that map and reduce can do that too in Perl 6 (a plain Perl 5 approximation appears just after this list):

      
          # process @xs_and_ys two-at-a-time...
          @list_of_powers = map { $^x ** $^y } @xs_and_ys;
          
          # reduce list three-at-a-time   
          $sum_of_powers  = reduce { $^partial_sum + $^x ** $^y } 0, @xs_and_ys;

      And, of course, since map and reduce take a subroutine reference as their first argument -- instead of using the higher-order placeholder notation -- we could use the arrow notation here too:

      
          @list_of_powers = map -> $x, $y { $x ** $y } @xs_and_ys;

      or even an old-fashioned anonymous subroutine:

      
          @list_of_powers = map sub($x,$y){ $x ** $y }, @xs_and_ys;

    Phew. If that all makes your head hurt, then don't worry. All you really need to remember is this: If you don't want to use $_ as the name of the current topic, then you can change it by putting an arrow and a variable name before the block of most control statements.
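
    As promised above, the n-at-a-time behavior of map can be approximated in ordinary Perl 5 with an explicit helper (the map_n subroutine below is an invention for illustration):


        # Perl 5 code -- a hypothetical n-at-a-time map
        sub map_n {
            my ($n, $code, @values) = @_;
            my @results;
            while (my @chunk = splice @values, 0, $n) {
                push @results, $code->(@chunk);   # pass $n values at once
            }
            return @results;
        }

        my @xs_and_ys = (2,3, 4,5);
        my @list_of_powers = map_n(2, sub { $_[0] ** $_[1] }, @xs_and_ys);
        # @list_of_powers is now (8, 1024)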


    A Trying Situation

    Once the calculator's input has been split into tokens, the for loop processes each one in turn, by applying them (if they represent an operator), or jumping out of the loop (if they represent an end-of-expression marker: '.', ';', or '='), or pushing them onto the stack (since anything else must be an operand):

    
        try {
            when %operator {                # apply operator
                my @args = splice @stack, -2;
                push @stack, %operator{$token}(*@args);
            }
            
            when '.', ';', '=' {           # or jump out of loop
                last;
            }
            
            use fatal;
            push @stack, get_data($token);  # or push operand

    The first two possibilities are tested for using when statements. Recall that a when tests its first argument against the current topic. In this case, however, the token was made the topic by the surrounding for. This is a significant feature of Perl 6: when blocks can implement a switch statement anywhere there is a valid topic, not just inside a given.

    The block associated with when %operator will be selected if %operator{$token} is true (i.e. if there is an operator implementation in %operator corresponding to the current topic). In that case, the top two arguments are spliced from the stack and passed to the closure implementing that operation (%operator{$token}(*@args)). Note that there would normally be a dot (.) operator between the hash entry (i.e. a subroutine reference) and the subroutine call, like so:

    
        %operator{$token}.(*@args)

    but in Perl 6 it may be omitted since it can be inferred (just as an inferrable -> can be omitted in Perl 5).

    Note too that we used the flattening operator (*) on @args, because the closure returned by %operator{$token} expects two scalar arguments, not one array.

    The second when simply exits the loop if it finds an "end-of-expression" token. In this example, the argument of the when is a list of strings, so the when succeeds if any of them matches the token.

    Of course, since the entire body of the when block is a single statement, we could also have written the when as a statement modifier:

    
            last when '.', ';', '=';

    The fact that when has a postfix version like this should come as no surprise, since when is simply another control structure like if, for, while, etc.

    The postfix version of when does have one interesting feature. Since it governs a statement, rather than a block, it does not provide the block-when's automatic "break to the end of my topicalizing block" behavior. In this instance, it makes no difference since the last would do that anyway.

    The final alternative -- pushing the token onto the stack -- is simply a regular Perl push command. The only interesting feature is that it calls the get_data subroutine to pre-translate the token if necessary. It also specifies a use fatal so that get_data will fail by throwing an exception, rather than returning undef.

    The loop tries each of these possibilities in turn. And "tries" is the operative word here, because either the application of operations or the pushing of data onto the stack may fail, resulting in an exception. To prevent that exception from propagating all the way back to the main program and terminating it, the various alternatives are placed in a try block.

    A try block is the Perl 6 successor to Perl 5's eval block. Unless it includes some explicit error handling code (see Where's the catch???), it acts exactly like a Perl 5 eval {...}, intercepting a propagating exception and converting it to an undef return value:

    
        try { $quotient = $numerator / $denominator } // warn "couldn't divide";
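
    For reference, the nearest Perl 5 equivalent of that line is the familiar eval idiom (assuming $quotient, $numerator, and $denominator are in scope, and remembering that Perl 5 has no // operator):


        # Perl 5 code
        defined eval { $quotient = $numerator / $denominator }
            or warn "couldn't divide";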

    Where's the Catch???

    In Perl 6, we aren't limited to just blindly catching a propagating exception and then coping with an undef. It is also possible to set up an explicit handler to catch, identify and deal with various types of exceptions. That's done in a CATCH block:

    
        CATCH {
            when Err::Reportable     { warn $!; continue }
            when Err::BadData        { $!.fail(at=>$toknum) }
            when NoData              { push @stack, 0 }
            when /division by zero/  { push @stack, Inf }
        }

    A CATCH block is like a BEGIN block (hence the capitalization). Its one argument is a closure that is executed if an exception ever propagates as far as the block in which the CATCH was declared. If the block eventually executes, then the current topic is aliased to the error variable $!. So the typical thing to do is to populate the exception handler's closure with a series of when statements that identify the exception contained in $! and handle the error appropriately. More on that in a moment.

    The CATCH block has one additional property. When its closure has executed, it transfers control to the end of the block in which it was defined. This means that exception handling in Perl 6 is non-resumptive: once an exception is handled, control passes outward, and the code that threw the exception is not automatically re-executed.

    If we did want "try, try, try again" exception handling instead, then we'd need to explicitly code a loop around the code we're trying:

    
        # generate exceptions (sometimes)
        sub getnum_or_die {
            given <> {                      # readline and make it the topic
                die "$_ is not a number"
                    unless defined && /^\d+$/;
                return $_;
            }
        }
    
        # non-resumptive exception handling
        sub readnum_or_cry {
            return getnum_or_die;       # maybe generate an exception
            CATCH { warn $! }           # if so, warn and fall out of sub
        }
    
        # pseudo-resumptive
        sub readnum_or_retry {
            loop {                      # loop endlessly...
                return getnum_or_die;   #   maybe generate an exception
                CATCH { warn $! }       #   if so, warn and fall out of loop
            }                           #   (i.e. loop back and try again)
        }

    Note that this isn't true resumptive exception handling. Control still passes outward -- to the end of the loop block. But then the loop reiterates, sending control back into getnum_or_die for another attempt.

    Catch as Catch Can

    Within the CATCH block, the example uses the standard Perl 6 exception handling technique: a series of when statements. Those when statements compare their arguments against the current topic. In a CATCH block, that topic is always aliased to the error variable $!, which contains a reference to the propagating exception object.

    The first three when statements use a classname as their argument. When matching a classname against an object, the =~ operator (and therefore any when statement) will call the object's isa method, passing it the classname. So the first three cases of the handler:

    
        when Err::Reportable   { warn $!; continue }
        when Err::BadData      { $!.fail(at=>$toknum) }
        when NoData            { push @stack, 0 }

    are (almost) equivalent to:

    
        if $!.isa(Err::Reportable)  { warn $! }
        elsif $!.isa(Err::BadData)  { $!.fail(at=>$toknum) }
        elsif $!.isa(NoData)        { push @stack, 0 }

    except far more readable.

    The first when statement simply passes the exception object to warn. Since warn takes a string as its argument, the exception object's stringification operator (inherited from the standard Exception class) is invoked and returns an appropriate diagnostic string, which is printed. The when block then executes a continue statement, which circumvents the default "break out of the surrounding topicalizer block" semantics of the when.

    The second when statement calls the propagating exception's fail method to cause calc either to return or rethrow the exception, depending on whether use fatal was set. In addition, it passes some extra information to the exception, namely the number of the token that caused the problem.

    The third when statement handles the case where there is no cached data corresponding to the calculator's "previous" keyword, by simply pushing a zero onto the stack.

    The final case that the handler tests for:

    
        when /division by zero/  { push @stack, Inf }

    uses a regex, rather than a classname. This causes the topic (i.e. the exception) to be stringified and pattern-matched against the regex. As mentioned above, by default, all exceptions stringify to their own diagnostic string. So this part of the handler simply tests whether that string includes the words "division by zero," in which case it pushes the Perl 6 infinity value onto the stack.
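
    That's a direct descendant of the Perl 5 reflex of pattern-matching $@, which is just how the full translation at the end of this article handles the same case:


        # Perl 5 code
        if ($@ =~ /division by zero/) { push @stack, ~0 }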

    One Dot Only

    The CATCH block handled bad data by calling the fail method of the current exception:

    
        when Err::BadData  { $!.fail(at=>$toknum) }

    That's a particular instance of a far more general activity: calling a method on the current topic. Perl 6 provides a shortcut for that -- the prefix unary dot operator. Unary dot calls the method that is its single operand, using the current topic as the implicit invocant. So the Err::BadData handler could have been written:

    
        when Err::BadData  { .fail(at=>$toknum) }

    One of the main uses of unary dot is to allow when statements to select behavior on the basis of method calls. For example:

    
        given $some_object {
            when .has_data('new') { print "New data available\n" }
            when .has_data('old') { print "Old data still available\n" }
            when .is_updating     { sleep 1 }
            when .can('die')      { .die("bad state") }    # $some_object.die(...)
            default               { die "internal error" } # global die
        }

    Unary dot is also useful within the definition of methods themselves. In a Perl 6 method, the invocant (i.e. the first argument of the method, which is a reference to the object on which the method was invoked) is always the topic, so instead of writing:

    
        method dogtag (Soldier $self) {
            print $self.rank, " ", $self.name, "\n"
                unless $self.status('covert');
        }

    we can just write:

    
        method dogtag (Soldier $self) {     # $self is automagically the topic
            print .rank, " ", .name, "\n"
                unless .status('covert');
        }

    or even just:

    
        method dogtag {                     # @_[0] is automagically the topic
            print .rank, " ", .name, "\n"
                unless .status('covert');
        }

    Yet another use of unary dot is as a way of abbreviating multiple accesses to hash or array elements. That is, given also implements the oft-coveted with statement. If many elements of a hash or array are to be accessed in a set of statements, then we can avoid the tedious repetition of the container name:

    
        # initialize from %options...
        
        $name  = %options{name} // %options{default_name};
        $age   = %options{age};
        $limit = max(%options{limit}, %options{rate} * %options{count});
        $count = $limit / %options{max_per_count};

    by making it the topic and using unary dot:

    
        # initialize from %options...
        
        given %options {
            $name  = .{name} // .{default_name};
            $age   = .{age};
            $limit = max(.{limit}, .{rate} * .{count});
            $count = $limit / .{max_per_count};
        }


    Onward and Backward

    Back in our example, once each token has been dealt with, its loop iteration is finished. All that remains is to increment the token number.

    In Perl 5, that would be done in a continue block at the end of the loop block. In Perl 6, it's done in a NEXT statement within the loop block:

    
        NEXT { $toknum++ }

    Like a CATCH, a NEXT is a special-purpose BEGIN block that takes a closure as its single argument. The NEXT pushes that closure onto the end of a queue of "next-iteration" handlers, all of which are executed each time a loop reaches the end of an iteration. That is, when the loop reaches the end of its block or when it executes an explicit next or last.

    The advantage of moving from Perl 5's external continue to Perl 6's internal NEXT is that it gives the "next-iteration" handler access to any lexical variables declared within the loop block. In addition, it allows the "next-iteration" handler to be placed anywhere in the loop that's convenient (e.g. close to the initialization it's later supposed to clean up).

    For example, instead of having to write:

    
        # Perl 5 code
        my ($in_file, $out_file);
        while (<>) {
            open $in_file, $_ or die;
            open $out_file, "> $_.out" or die;
            
            # process files here (maybe next'ing out early)
        }
        continue {
            close $in_file  or die;
            close $out_file or die;
        }

    we can just write:

    
        while (<>) {
            my $in_file  = open $_ or die;
            my $out_file = open "> $_.out" or die;
            NEXT {
                close $in_file  or die;
                close $out_file or die;
            }
            
            # process files here (maybe next'ing out early)
        }

    There's no need to declare $in_file and $out_file outside the loop, because they don't have to be accessible outside the loop (i.e. in an external continue).

    This ability to declare, access and clean up lexicals within a given scope is especially important because, in Perl 6, there is no reference counting to ensure that the filehandles close themselves automatically at the end of the block. Perl 6's full incremental garbage collector does guarantee to eventually call the filehandles' destructors, but makes no promises about when that will happen.

    Note that there is also a LAST statement, which sets up a handler that is called automatically when a block is left for the last time. For example, this:

    
        for reverse 1..10 {
            print "$_..." and flush;
            NEXT { sleep 1 }
            LAST { ignition() && print "lift-off!\n" }
        }

    prints:

    
        10...9...8...7...6...5...4...3...2...1...lift-off!

    sleeping one second after each iteration (including the last one), and then calling &ignition at the end of the countdown.

    LAST statements are also extremely useful in nonlooping blocks, as a way of giving the block a "destructor" with which it can clean up its state regardless of how it is exited:

    
        sub handler ($value, $was_handled is rw) {
            given $value {
                LAST { $was_handled = 1 }
                when &odd { return "$value is odd" }
                when /0$/ { print "decimal compatible" }
                when /2$/ { print "binary compatible"; break }
                $value %= 7;
                when 1,3,5 { die "odd residual" }
            }
        }

    In the above example, no matter how the given block exits -- i.e. via the return of the first when block, or via the (implicit) break of the second when, or via the (explicit and redundant) break of the third when, or via the "odd residual" exception, or by falling off the end of the given block -- the $was_handled parameter is always correctly set.

    Note that the LAST is essential here. It wouldn't suffice to write:

    
        sub handler ($value, $was_handled is rw) {
            given $value {
            when &odd { return "$value is odd" }
            when /0$/ { print "decimal compatible" }
                when /2$/ { print "binary compatible"; break }
                $value %= 7;
                when 1,3,5 { die "odd residual" }
            }
            $was_handled = 1;
        }

    because then $was_handled wouldn't be set if an exception were thrown. Of course, if that's actually the semantics you want, then you'd simply omit the LAST.

    WHY ARE YOU SHOUTING???

    You may be wondering why try is in lower case but CATCH is in upper. Or why NEXT and LAST blocks have those "loud" keywords.

    The reason is simple: CATCH, NEXT and LAST blocks are just specialized BEGIN blocks that install particular types of handlers into the block in which they appear.

    They install those handlers at compile-time so, unlike a try or a next or a last, they don't actually do anything when the run-time flow of execution reaches them. The blocks associated with them are only executed if the appropriate condition or exception is encountered within their scope. And, if that happens, then they are executed automatically, just like AUTOLOAD, or DESTROY, or TIEHASH, or FETCH, etc.

    So Perl 6 is merely continuing the long Perl tradition of using a capitalized keyword to highlight code that is executed automatically.

    In other words: I'M SHOUTING BECAUSE I WANT YOU TO BE AWARE THAT SOMETHING SUBTLE IS HAPPENING AT THIS POINT.

    Cache and Return

    Meanwhile, back in calc...

    Once the loop is complete and all the tokens have been processed, the result of the calculation should be the top item on the stack. If the stack of items has more than one element left, then it's likely that the expression was wrong somehow (most probably, because there were too many original operands). So we report that:

    
        fail Err::BadData : msg=>"Too many operands"
            if @stack > 1;

    If everything is OK, then we simply pop the one remaining value off the stack and make sure it will evaluate true (even if its value is zero or undef) by setting its true property. This avoids the potential bug discussed earlier.

    Finally, we record it in %var under the key '$n' (i.e. as the n-th result), and return it:

    
        return %var{'$' _ $i} = pop(@stack) but true;

    "But, but, but...", I hear you expostulate, "...shouldn't that be pop(@stack) is true???"

    Once upon a time, yes. But Larry has recently decided that compile-time and run-time properties should have different keywords. Compile-time properties (i.e. those ascribed to declarations) will still be specified with the is keyword:

    
        class Child is interface;
        my $heart is constant = "true";
        our $meeting is private;

    whereas run-time properties (i.e. those ascribed to values) will now be specified with the but keyword:

    
        $str = <$trusted_fh> but tainted(0);
        $fh = open($filename) but chomped;
        return 0 but true;

    The choice of but is meant to convey the fact that run-time properties will generally contradict some standard property of a value, such as its normal truth, chompedness or tainting.
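
    Perl 5 already hints at this idea with one special case: the exact string "0 but true" (which built-ins like ioctl and fcntl return to report a successful zero result) is true in boolean context, yet converts to the number 0 without so much as a warning:


        # Perl 5 code
        my $status = "0 but true";
        print "succeeded\n" if $status;    # prints "succeeded"
        print 0 + $status, "\n";           # prints 0, with no warning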

    It's also meant to keep people from writing the very natural, but very misguided:

    
        if ($x is true) {...}

    which now generates a (compile-time) error:

    
        Can't ascribe a compile-time property to the run-time value of $x.
        (Did you mean "$x but true" or "$x =~ true"?)

    The Forever Loop

    Once the Calc module has all its functionality defined, all that's required is to write the main input-process-output loop. We'll cheat a little and write it as an infinite loop, and then (in solemn Unix tradition) we'll require an EOF signal to exit.

    The infinite loop needs to keep track of its iteration count. In Perl 5 that would be:

    
        # Perl 5 code
        for (my $i=0; 1; $i++) {

    which would translate into Perl 6 as:

    
        loop (my $i=0; 1; $i++) {

    since Perl 5's C-like for loop has been renamed loop in Perl 6 -- to distinguish it from the Perl-like for loop.

    However, Perl 6 also allows us to create semi-infinite, lazily evaluated lists, so we can write the same loop much more cleanly as:

    
        for 0..Inf -> $i {

    When Inf is used as the right-hand operand to .., it signifies that the resulting list must be lazily built, and endlessly iterable. This type of loop will probably be common in Perl 6 as an easy way of providing a loop counter.

    If we need to iterate some list of values, as well as tracking a loop counter, then we can take advantage of another new feature of Perl 6: iteration streams.

    A regular Perl 6 for loop iterates a single stream of values, aliasing the current topic to each in turn:

    
        for @stream -> $topic_from_stream {
            ...
        }

    But it's also possible to specify two (or more) streams of values that the one for loop will step through in parallel:

    
        for @stream1 ; @stream2 -> $topic_from_stream1 ; $topic_from_stream2 {
            ...
        }

    Each stream of values is separated by a semicolon, and each topic variable is similarly separated. The for loop iterates both streams in parallel, aliasing the next element of the first stream (@stream1) to the first topic ($topic_from_stream1) and the next element of the second stream (@stream2) to the second topic ($topic_from_stream2).

    The commonest application of this will probably be to iterate a list and simultaneously provide an iteration counter:

    
        for @list; 0..@list.last -> $next; $index {
            print "Element $index is $next\n";
        }

    It may be useful to set that out slightly differently, to show the parallel nature of the iteration:

    
        for  @list ; 0..@list.last
         ->  $next ; $index   {
            print "Element $index is $next\n";
        }

    It's important to note that writing:

    
        for @a; @b -> $x; $y {...}
        # in parallel, iterate @a one-at-a-time as $x, and @b one-at-a-time as $y

    is not the same as writing:

    
        for @a, @b -> $x, $y {...}
        # sequentially iterate @a then @b, two-at-a-time as $x and $y

    The difference is that semicolons separate streams, while commas separate elements within a single stream.
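
    Until then, the closest plain Perl 5 gets to the stream form is a shared explicit index (a rough approximation, assuming @stream1 and @stream2 are the two parallel arrays):


        # Perl 5 code -- parallel iteration via a shared index
        for my $i (0 .. $#stream1) {
            my ($x, $y) = ($stream1[$i], $stream2[$i]);
            print "$x : $y\n";
        }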

    If we were brave enough, then we could even combine the two:

    
        for @a1, @a2; @b -> $x; $y1, $y2 {...}
        # sequentially iterate @a1 then @a2, one-at-a-time as $x
        # and, in parallel, iterate @b two-at-a-time as $y1 and $y2

    This is definitely a case where a different layout would help make the various iterations and topic bindings clearer:

    
        for @a1, @a2 ;  @b
         -> $x       ;  $y1, $y2   {...}

    Note, however, that the normal way in Perl 6 to step through an array's values while tracking its indices will almost certainly be to use the array's kv method. That method returns a list of interleaved indices and values (much like the hash's kv method returns alternating keys and values):

    
        for @list.kv -> $index, $next {
            print "Element $index is $next\n";
        }
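
    The corresponding Perl 5, for reference, tracks the index explicitly:


        # Perl 5 code
        for my $index (0 .. $#list) {
            print "Element $index is $list[$index]\n";
        }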


    Read or Die

    Having prompted for the next expression that the calculator will evaluate:

    
        print "$i> ";

    we read in the expression and check for an EOF (which will cause the <> operator to return undef, in which case we escape the infinite loop):

    
        my $expr = <> err last;

    Err...err???

    In Apocalypse 3, Larry introduced the // operator, which is like a || that tests its left operand for definedness rather than truth.

    What he didn't mention (but which you probably guessed) was that there is also the low-precedence version of //. Its name is err:

    
              Operation         High Precedence       Low Precedence
              
             INCLUSIVE OR             ||                     or
             EXCLUSIVE OR             ^^                    xor
              DEFINED OR              //                    err

    But why call it err?

    Well, the // operator looks like a skewed version of ||, so the low-precedence version should probably be a skewed version of or. We can't skew it visually (even Larry thought that using italics would be going a bit far), so we skew it phonetically instead: or -> err.

    err also has the two handy mnemonic connotations:

    • That we're handling an error marker (which a returned undef usually is)

    • That we're voicing a surprised double-take after something unexpected (which a returned undef often is).

    Besides all that, it just seems to work well. That is, something like this:

    
        my $value = compute_value(@args)
            err die "Was expecting a defined value";

    reads quite naturally in English (whether you think of err as an abbreviation of "on error...", or as a synonym for "oops...").

    Note that err is a binary operator, just like or, and xor, so there's no particular need to start it on a new line:

    
        my $value = compute_value(@args) err die "Was expecting a defined value";

    In our example program, the undef returned by the <> operator at end-of-file is our signal to jump out of the main loop. To accomplish that we simply append err last to the input statement:

    
        my $expr = <> err last;

    Note that an or last wouldn't work here, as both the empty string and the string "0" are valid (i.e. non-terminating) inputs to the calculator.

    Just Do It

    Then it's just a matter of calling Calc::calc, passing it the iteration number and the expression:

    
        Calc::calc(i=>$i, expr=>$expr)
    

    Note that we used named arguments, so it doesn't matter that we passed them in a different order from the parameter declaration.

    We then interpolate the result back into the output string using the $(...) scalar interpolator:

    
        print "$i> $( Calc::calc(i=>$i, expr=>$expr) )\n";

    We could even simplify that a little further, by taking advantage of the fact that subroutine calls interpolate directly into strings in Perl 6, provided we use the & prefix:

    
        print "$i> &Calc::calc(i=>$i, expr=>$expr)\n";

    Either way, that's it: we're done.

    Summing Up

    In terms of control structures, Perl 6:

    • provides far more support for exceptions and exception handling,
    • cleans up and extends the for loop syntax in several ways,
    • unifies the notions of blocks and closures and makes them interchangeable,
    • provides hooks for attaching various kinds of automatic handlers to a block/closure,
    • re-factors the concept of a switch statement into two far more general ideas: marking a value/variable as the current topic, and then doing "smart matching" against that topic.

    These extensions and cleanups offer us far more power and control, and -- amazingly -- in most cases require far less syntax. For example, here's (almost) the same program, written in Perl 5:

    
        package Err::BadData; 
        use base 'Exception';   # which you'd have to write yourself
        
        package NoData;         # not lexical
        use base 'Exception';
        sub warn { die @_ }
        
        package Calc;
        
        my %var;
        
        sub get_data  {
            my $data = shift;
            if ($data =~ /^\d+$/)       { return $var{""} = $data }
            elsif ($data eq 'previous') { return defined $var{""}
                                                     ? $var{""}
                                                     : die NoData->new() 
                                        }
            elsif ($var{$data})         { return $var{""} = $var{$data} }
        else                        { die Err::BadData->new(
                                             msg=>"Don't understand $data"
                                          )
                                     }
        }
        
        sub calc {
            my %data = @_;
            my ($i, $expr) = @data{'i', 'expr'};
            my %operator = (
                '*'  => sub { $_[0] * $_[1] },
                '/'  => sub { $_[0] / $_[1] },
                '~'  => sub { ($_[0] + $_[1]) / 2 },
            );
            
            my @stack;
            my $toknum = 1;
            LOOP: for my $token (split /\s+/, $expr) {
            defined eval {
                TRY: {              # bare block, so that "last TRY" is legal
                    if ($operator{$token}) {
                        my @args = splice @stack, -2;
                        push @stack, $operator{$token}->(@args);
                        last TRY;
                    }
                    last LOOP if $token eq '.' || $token eq ';' || $token eq '=';

                    push @stack, get_data($token);
                }
                1;
            } || do {
                    if ($@->isa(Err::Reportable))     { warn $@; }
                    if ($@->isa(Err::BadData))        { $@->{at} = $toknum; die $@ }
                    elsif ($@->isa(NoData))           { push @stack, 0     }
                    elsif ($@ =~ /division by zero/)  { push @stack, ~0 }
                }
            }
            continue { $toknum++ }
            die Err::BadData->new(msg=>"Too many operands") if @stack > 1;
            $var{'$'.$i} = $stack[-1] . ' but true';
            return 0+pop(@stack);
        }
        
        package main;
        
        for (my $i=1; 1; $i++) {
            print "$i> ";
            defined( my $expr = <> ) or last;
            print "$i> ${\Calc::calc(i=>$i, expr=>$expr)}\n";
        }

    Hmmmmmmm. I know which version I'd rather maintain.
