October 2004 Archives

This Fortnight on Perl 6, October 2004

Perl 6 Summary for 2004-10-01 through 2004-10-17

All~

Welcome to my first summary. Since I am relatively new at this game, I will just steal Piers' approach and start with Perl6 internals. But before that let me warn you that my ability to make strange characters with accents is not great, thus please do not be offended if I don't include them in your name. If you want them to appear in the future, a quick email about how to make them appear using a U.S. qwerty keyboard and Mozilla should suffice.

Also, groups.google.com does not seem to have picked up perl6.compiler yet.

With that legal disclaimer out of the way onward to:

Perl 6 Internals

Configure Problems and Improvements

Leo noticed that Configure doesn't rebuild things correctly. Takers welcome. Nicholas Clark added a --prefix option for the make install target.

http://xrl.us/divy
http://xrl.us/divz
http://xrl.us/div2

Non-core Module Dependency

Joshua Gatcomb accidentally introduced a dependency on Config::IniFiles. Since it is implemented in pure Perl he offered to add it to the repository. Warnock applies.

OpenBSD Troubles

Jens Rieks found and fixed a coredump on OpenBSD. Thanks, Jens.

Threads on Cygwin

Joshua Gatcomb discovered some trouble with threads on Cygwin. It seems that there are problems with both the thread implementation, and the tests not being generous enough if accepting out-of-order input. Still unresolved, I think.

Parrot IO Not Quite Threadsafe

Nicholas Clark discovered a double free in the IO system. While his problem was solved by Jens, the underlying problem still remains.

make install Portability Issues

Nicholas Clark asked why the install taget was not portable. Steve Fink responded that it was a quick hack at the time and made it better.

Namespaces

The namespace thread continues to churn. It is slowly making progress, but I believe there is a fair amount of people talking past each other going on. Perhaps Dan could step in and provide one final state of the namespaces?

http://xrl.us/div8
http://xrl.us/div9
http://xrl.us/diwa

Parrot Abstract Syntax Tree a.k.a. PAST

Sam Ruby decided to pick up the Python on Parrot ball. To that end he enquired as to what PAST is. Leo provided answers and help. Will asked for more help. Leo once again provided.

http://xrl.us/diwb
http://xrl.us/diwc

JIT for Non-Branching Compare opcodes

Stephane Peiry provided a patch to JIT with some more opcodes. Leo applied. Stephane then provided a patch with tests. Jens applied that one.

http://xrl.us/diwd
http://xrl.us/diwe

Comparing Pythons

Sam Ruby posted a link comparing various Pythons and their conformance to a test suite.

Sam Ruby: Comparing Pythons

Metaclasses?

Dan admitted confusion as to what exactly metaclasses are/do. Papers and explanations were provided by Aaron Sherman, Michael Walter, and Sam Ruby.

http://xrl.us/diwg
http://xrl.us/diwh
http://xrl.us/diwi

Plain Old Hash

William Coleda wondered if Parrot had a basic hash implementation (not a PerlHas). Dan said "D'oh!" and asked for takers. Coleda added a TODO.

Parakeet 0.3

Michael Pelletier's language Parakeet has hit 0.3 and been added to CVS. Everybody should play with it.

make install Thoughts

Leo conjectured about creating a parrot_config.c, which would encode all of Parrot's configuration. Then Parrot would know its own config. Jens suggested letting miniparrot generate it; Leo agreed.

More piethon

Dan's register spilling problems give him free time. He used it to work on piethon a little.

Privilege Implementation

Felix Gallo posted some questions/thoughts with respect to privileges. While Leo addressed Felix's question about the location of source files (and provided a nice plug for vim). The others all remain Warnocked.

make in Languages/Tcl

Matthew Zimmerman supplied a patch to fix Tcl's make. William Coleda modified and applied it.

MANIFEST Fix Up

Andy Dougherty provided a patch to remove some old files from the manifest. Steve Fink applied it.

Parrot 0.1.1 "Poicephalus"

There was a little talk about names. Then a little talk about getting it posted to perl.org. In the end the 0.1.1 release did happen and even made it to Slashdot. Thank you to everyone who contributed.

Data::Dumper TODO

Will Coleda added a TODO for Data::Dumper. Apparently it cannot dump FixedPMCArrays. Will conjectures that there are probably other new PMCs it cannot handle either. Patches welcome.

Emacs, XEmacs, and pir-mode

Jerome Quelin kicked off a thread that resulted in emacs getting better pir-mode support. Thanks to all involved, but I will continue to use vim. ;-)

A %= B

Dan noticed that we did not have support for %= in PIR. There was some confusion as several people rushed to the rescue. In the end, the problem was fixed. Thanks all.

Pushing and Popping Arrays

Will Coleda wondered why he could not push onto a Fixed Array. The answer was quickly provided that Fixed means that its size cannot be modified, not that it is bounded. This answer seemed reasonable enough to him. He then went on to ask why he could not pop a Resizable Array. The answer is, of course, because it is not implemented yet; patches welcome.

http://xrl.us/diwv
http://xrl.us/diww

imcc Reserved Words

Sam Ruby wondered how to create a subname num in imcc. The answer, "No," and a workaround was provided by Jens.

Quiet vs. Loud Build System

Andy Dougherty provided a patch to improve the build system. Steve Fink applied it. Leo disliked it. Consensus seems to be not to use it.

Cygwin Bugs

Joshua Gatacomb has been fighting with Cygwin getting Parrot to work. Apparently we trip a few of its bugs. Read more if you like.

http://xrl.us/diwz
http://xrl.us/diw2
http://xrl.us/diw3

Python Concerns

Jeff Clites raised some concerns about dealing with Python's bound and unbound methods. There was some discussion about it. In the end it just means a little more work.

Makefile Cleanup

Will submitted a patch to clean up the Makefile. There was some back and forth about what parts of the patch should be kept.

Win32 Issues

Ron Blaschke provided a few patches for Win32 which Jens and Leo applied.

http://xrl.us/diw6
http://xrl.us/diw7
http://xrl.us/diw8
http://xrl.us/diw9

Dynamically Loadable Modules

Steve Fink spent some time chugging away at dynclass and dynamically loaded modules. There was some discussion of proper variable names and some working out of problems. Friendly reminder -- when in doubt make clean; perl Configure.pl; make.

http://xrl.us/dixa
http://xrl.us/dixb

ICU Without --prefix Problems

Steve Fink was apparently having troubles with ICU and locating necessary data files. Leo confirmed that it is a problem.

MinGW/MSYS

Jens has apparently had success working with MinGW. Yay!

ICU Issues

Will had some ICU issues. Various suggestions and solutions were attempted. Any success?

Locals Inside Macros

Will wants .locals inside .macros. Apparently it can only be done if your macro only takes PMCs.

t/pmc/signal.t Improvements

Jeff Clites provided a patch with improvements to t/pmc/signal.t. Warnock applies.

Co-routines and --python

Michal at withoutane.com (whose last name I cannot find) wondered about the --python flag. The answers and discussion that followed indicate that it was a quick and dirty hack. Eventually it will be gone and Python will have PMCs of its own (relatively quickly if Sam Ruby has anything to say about it).

Small Patches

Sam Ruby fixed a small problem in PerlInt; Jeff Clites fixed a bug and cleaned some warnings in darwin/dl.c. He also fixed some tests and added a Parrot_memalign function for Mac OS X. Stephane Payrand provided a patch to allow multiple identifiers on one line in pir. Stephane also added a get_representation op. Bernhard Schmalhofer added support for synchronous callbacks. Ion Alexandru Morega added more functionality to complex PMC. Leo applied the patches.

Link Failure on amd64

Adam Thomason pointed out that amd64 lost a vital link flag somewhere in the shuffle.

Non-vtable Methods on Built-in pmcs

Sam Ruby wondered about allowing pmcs to implement additional non-vtable methods. Leo thought it would be good. Sam provided a patch, chromatic tweaked it, and Leo applied it.

http://xrl.us/dixs
http://xrl.us/dixt

Tinderbox?

Jens wondered what happened to tinderbox.perl.org. Apparently it died a while back and has not yet been resurrected. Robert Spier also noted that it was a little difficult to deal with and said he is interested in creating a new one. He is also looking for help in this endeavor.

Register Stacks [Again]

Leo has once again pushed forward his idea for register stacks and been Warnocked.

Ncurses Troubles

Andy Dougherty had some ncurses trouble. In particular, one of the error messages was exceptionally unhelpful. Now it is a little more helpful.

Coding Standards

Leo would like to draw everyone's attention to docs/pdds/pdd07_codingstd.pod, and the fact that it should also apply to Perl code where appropriate. Thank you.

Rx_compile and FSAs

Aaron Sherman has some ambitious stuff in the works with respect to regular expressions. Stay tuned for details.

Single Character from STDIN

Matt Diephouse wants to know how to get a single character from STDIN. The answer seems to be no, but this has triggered Dan to go back to the IO/Event doc. Best of luck, Dan.

Parrotcode.org Samples

Paul Seamons noticed that some of the samples on parrotcode.org are really out of date. Will Coleda agreed and suggested that they should all be updated to PIR, too. Takers welcome.

Newcomers

Pratik Roy wondered if he could join into the work on Parrot or if he would remain an outsider looking in. Many people rushed to suggest ways in which he could help. So, please don't feel afraid to contribute. At worst a patch of yours will be turned away with an explanation as to why. At best, you will start being referred to by first name only in the summaries.

Python and Perl Interop

Sam Ruby wondered about how Python and Perl would interoperate. Leo, Steve Fink, and Thomas Sandlass have ideas and suggestions. I think the answer will be "magically."

Inline::Parrot

Ovid pointed the world at Inline::Parrot. It looks cool.

http://xrl.us/dix4
http://xrl.us/dix5

Problems with 0.1.1 on x86-64

Brian Wheeler had a problem with Parrot on x86-64. He provided a patch, but Leo couldn't apply it and asked for a resend. Silence thereafter. . . .

Bug in Factorial?

James Ghofulpo wondered if the Parrot examples page's factorial program was supposed to truncate output. Warnock applies.

Configure.pl Auto-detection

Sridhar discovered that Configure.pl would fail if he told it to --cxx=/usr/bin/g++-3.3; Jens pointed out that he would also need to tell it --link=g++-3.3.

Parrot Forth 0.1

Matt Diephouse releases Parrot Forth 0.1. It has many cool features. Nice work. Perhaps this should go into languages/?

.return Directives

Stephane Payrard wants to remove the multiple meaning of .return to provide a cleaner path forward. Everyone agrees.

YAGCB (Yet Another Garbage Collection Bug)

Will thought he had turned up another GC bug with Tcl. This time he hadn't and it was his fault. Can't win them all I guess.

Testing a PMC for Truth

Will wondered how to determine a PMC's Boolean value. Leo pointed him toward istrue.

Grsecurity Problems

Christian Jaeger noticed taht grsecurity was stopping on Parrot's attempts to execute JITted code. Leo pointed out that we already have support for doing the right thing, we are just failing to detect that we need to.

PMCs and Inheritance

Sam Ruby wanted inheritance for PMCs. Apparently what we really need is a scheme for multiple inheritance of PMCs. Sam Ruby supplied several patches and Leo applied parts of them.

C++ and typedef struct Parrot_Interp *Parrot_Interp Don't Play Well

Jeff Clites discovered that Parrot has troubles on Mac OS X because C++ does not like typedef struct Parrot_Interp *Parrot_Interp. We need C++ to link ICU. Fortunately, that particular typedef is out of the ordinary and violates our normal conventions. Brent "Dax" Royal-Gordon threatened to apply the (fairly large) fix. Will tested Brent's patch and gave it the thumbs up.

JIT, Exec Core, Threads, and Architecture Proposal

Leo proposed mandating that all JIT must be done interpreter-relative. This allowed some happy optimizations including constant table access. General response was favorable.

pmc2c2.pl --tree Option

Bernhard Schmalhofer submitted a patch fixing the aforementioned --tree option. No answer yet, but it has not really been long enough to officially invoke Warnock.

Unununium

Jacques Mony asked for help/advice porting Parrot to unununium. No responses yet...

Perl 6 Compiler

Google groups still doesn't have Perl 6 Compiler, so this section won't have links. Sorry.

Compilation Paradigms

Jeff Clites put out a request for some basic P6 examples and resultant bytecode so that everyone would be on the same page. Warnock applies.

Internals, rules, REPL

Herbert Snorrason had some basic questions about Perl 6's interaction with Parrot. He also wanted to know about whether he should be playing with the re_tests or if that was wasted effort. Finally he wanted to know about read-eval-print loops. Sadly Warnock applies across the board.

Perl 6 Language

Google groups has nothing for Perl6.language between October 2 and 14. Is this really the case? (I had not signed up until shortly before volunteering to summarize.) If there is email that I just can't find, I would be appreciative if someone could produce a summary or a pointer to the missing mail for me. Thanks.

Updating Multiple Rows

Nitin Thakur attempted to de-Warnock himself, apparently unsuccessfully. Sorry, Nitin.

Perl 6 Summaries

Piers raised the white flag after several years as a wonderful summarizer. Having now just finished my first summary, let me say thank you for all of your hard work and if you ever want the job back its yours. ;-) I hope you don't mind my stealing your general format for these things.

The Usual Footer

If you find these summaries useful or enjoyable, please consider contributing to the Perl Foundation to help support the development of Perl. You might also like to send feedback to ubermatt@gmail.com.

The Perl Foundation
Perl 6 Development site
Parrot Blog aggregator

Installing Bricolage


Now that Content Management with Bricolage has piqued your interest, you might be wondering what you need to do to install it. I'll be the first to admit that installing Bricolage is not trivial, given that it requires several third-party applications and modules to do its job. That said, the installer tries hard to identify what pieces you have and which ones you don't, to help you through the process. Even still, it can help to have a nice guide to step you through the process.

This article is here to help.

Packaging Systems

First off, depending on your operating system, you may be able to install Bricolage via the supported packaging system. If you run FreeBSD, you can install a recent version from the Free BSD ports collection. To do so, update your ports tree, and then:

% cd /usr/ports/www/bricolage
% make
% make install

A Debian package is also available. To install it, add these lines to your /etc/apt/sources.list file:

# bricolage
deb http://tetsuo.geekhive.net/mark/debian/unstable /
deb-src http://tetsuo.geekhive.net/mark/debian/unstable /

Then you can install Bricolage using apt-get:

# apt-get install bricolage-db
# apt-get install bricolage

The packaged distributions of Bricolage are great because they handle all of the dependencies for you, making installation extremely easy. The downside, however, is that there is frequently a lag behind a new release of Bricolage and the updating of the relevant packages. For example, the current stable release of Bricolage is 1.8.2, but the FreeBSD ports package is currently at 1.8.1. The Debian port is at 1.8.0. Furthermore, as of this writing, neither packaging system supports upgrading an existing installation of Bricolage, which may require database updates.

Building Bricolage

The alternative is to compile and install Bricolage and all of its dependencies yourself. This is not as difficult as it might at first sound, because Bricolage is a 100% Perl application and therefore requires no compilation. Many of the dependencies, however, do require compilation and have their own histories of successful installation on a given platform. For the most part, however, they have solid histories of success, and in the event of trouble, there are lots of resources for help on the Internet (see, for example, my articles on building Apache/mod_perl on Mac OS X). The platform-specific README files that come with Bricolage also contain useful information to help with your installation.

The next few sections of this article cover manual installation of Bricolage. If you're happy with a package install, this information can still be very useful for understanding Bricolage's requirements. If you're antsy, skip to the end to find out where to go next.

Prerequisites

First: did you read the README file for your platform?

The most important prerequisites for Bricolage are:

Perl

What kind of Perl.com article wouldn't have this requirement? Bricolage requires Perl 5.6.1 or later, but if you're going to work with any kind of non-ASCII characters in your content, I strongly recommend Perl 5.8.3 or later for its solid Unicode support. All text content managed by Bricolage is UTF-8, so for sites such as Radio Free Asia the newer versions of Perl are a must.

Experience has also shown that some vendor versions of Perl don't work too well. Red Hat's Perl, in particular, seems to have several problems that just go away once a sys-admin decides to compile her own. Caveat Perler.

Apache

Bricolage doesn't serve content, but it does require a web server to serve its interface. It requires Apache 1.3.12 or later, with a strong recommendation for the latest, 1.3.31. Bricolage does not yet support Apache 2, though the upcoming release of mod_perl 2 will lead to a port.

mod_perl

Speaking of mod_perl, Bricolage requires mod_perl 1.25 or later, with a strong recommendation to use the latest, 1.29. You can either statically compile mod_perl into Apache or, as of the recent release of Bricolage 1.8.2, compile it as a dynamically shared object library (DSO). However, in order to use mod_perl as a DSO, you must have compiled with a Perl that was configured with -Uusemymalloc or -Ubincompat5005. See this mod_perl FAQ for more details. Bricolage's installer will check this configuration against the Perl you use to run the installation and will complain if the installing Perl lacks these attributes. However, this check is only valid if the Perl running the installation is the same as the Perl used by mod_perl, so it pays to be aware of this issue.

As I said, Bricolage does not currently support mod_perl 2. However, now that mod_perl 2 is nearing release, there is greater interest in porting Bricolage to it (and therefore to Apache 2). Some work has begun in this area, and we hope to be able to announce mod_perl 2 support by the end of the year.

PostgreSQL

Bricolage stores all of its data in a PostgreSQL database. For those not familiar with PostgreSQL, it is an advanced, ACID-compliant, open-source object-relational database management system. I've found the compilation very easy on all platforms I've tried it on (although I have had to install libreadline on Mac OS X, first). Bricolage requires PostgreSQL 7.3 or later and recommends version 7.4. Bricolage will support the forthcoming PostgreSQL 8.0 around the time of its release, but to date no one has tested them together.

The one other recommendation I make is that you specify --no-locale or --locale=C when you initialize the PostgreSQL database. This is especially important if you will be managing content in more than one language, as it will prevent searches and sort ordering from being specific to one language and possibly incompatible with others. A Unicode and searching discussion on the pgsql-general mail list provides a broader perspective.

mod_ssl or apache-ssl

If you want encrypted communications between Bricolage and its clients, install either mod_ssl or apache-ssl. SSL is optional in Bricolage, but I recommend using it for security purposes. Bricolage can use SSL for all requests or just for authentication and password changing requests. Tune in for the next article in this series, Bricolage Runtime Configuration, for information on configuring SSL support.

Expat

Bricolage uses the XML::Parser Perl module, which in turn requires the Expat XML parser library. Most Unix systems have a version of Expat installed already, but if you need it, install it from the Expat home page.

CPAN Modules

Bricolage uses a very large number of CPAN modules. Most of those required in turn require still more modules. For the most part, we recommend that you let the Bricolage installer install the required modules. It will determine which modules you need and install them using the CPAN module. If you want to get ahead of the game, use the CPAN module to install them yourself, first. The easiest way to do it is to install Bundle::Bricolage. This module bundles up all of the required modules so that CPAN will install them for you:

% perl -MCPAN -e 'install Bundle::Bricolage'

There are also several optional modules. Install these all in one command by using the Bundle::BricolagePlus module:

% perl -MCPAN -e 'install Bundle::BricolagePlus'

Installing the Perl modules yourself can be useful if you expect to have trouble with one or more of them, as you can easily go back and manually install any troublesome modules. If you want to install them all yourself, without using the bundles, the INSTALL file has a complete list (copied from Bric::Admin). I don't recommend this approach, however; it will take you all night!

Note: Bricolage currently does not run on Windows. This situation will likely change soon, with the forthcoming introduction of PostgreSQL 8.0 with native Windows support as well as mod_perl 2. Watch the Bricolage web site for announcements in the coming months.

Installation

With all of the major dependencies worked out, it's time to install Bricolage. Download it from the Bricolage download page to the directory of your choice. Bricolage is distributed as a tarball like most Perl modules. Decompress it and then execute the usual Perl module commands to install it:

% wget http://www.bricolage.cc/downloads/bricolage-1.8.2.tar.gz
% tar zxvf bricolage-1.8.2.tar.gz
% cd bricolage-1.8.2
% perl Makefile.PL
% make
% make test
% make install

OK, to be fair, the process is actually more complicated than that, principally during make. Let's walk through the process.

Installation Configuration

The first step, perl Makefile.PL, doesn't really do what it does with your typical Perl modules. It's really just a wrapper around a custom Makefile to make sure that everything thereafter uses the Perl binary with which you executed Makefile.PL. If you're using an installation of Perl somewhere other than in your path, use it to execute Makefile.PL explicitly, such as /path/to/my/perl Makefile.PL.

The next step, make, will take the most time as the installer pauses to ask several questions. Let's take it step-by-step.

% make
/usr/bin/perl inst/required.pl


==> Probing Required Software <==

looking for PostgreSQL with version >= 7.3.0...
Found PostgreSQL's pg_config at '/usr/local/pgsql/bin/pg_config'.
Is this correct? [yes] 

The first thing the Bricolage installer does is to check for all of its dependencies. Here, it asks for the location of pg_config, the PostgreSQL configuration program. The installer will use this application to determine the version number of PostgreSQL, among other things. If you're using a package-installed version of PostgreSQL, make sure that you have the PostgreSQL development tools installed, as well (yes, I'm looking at you, Red Hat users!). Bricolage will look in several common locations for pg_config; if it doesn't find it, or if it finds the wrong one (because you have more than one installed), type in the location of pg_config. Otherwise, simply accept the one it has found.

Is this correct? [yes] [Return]
Found acceptable version of Postgres: 7.4.3.
Looking for Apache with version >= 1.3.12...
Found Apache server binary at '/usr/sbin/httpd'.
Is this correct? [yes] 

Next, the Bricolage installer searches for an instance of Apache 1.3.x. This time it's looking for the httpd executable. The same comments that applied to PostgreSQL apply to the Apache Web server; either accept the instance of httpd or type in an alternate. On my Mac, I never use Apple's Apache (an old habit because Apple's Apache uses a DSO mod_perl, whereas I always compile my own with a static mod_perl).

Is this correct? [yes] no
Enter path to Apache server binary [/usr/sbin/httpd] /usr/local/apache/bin/httpd
Are you sure you want to use '/usr/local/apache/bin/httpd'? [yes] [Return]
Found Apache executable at /usr/local/apache/bin/httpd.
Found acceptable version of Apache: 1.3.31.
Looking for expat...
Found expat at /usr/local/lib/libexpat.so.

From here, the Bricolage installer continues looking for other dependencies, starting with the Expat XML parsing library. Then the installer probes for all of the required and optional Perl modules:

==> Finished Probing Required Software <==

/usr/bin/perl inst/modules.pl


==> Probing Required Perl Modules <==

Looking for Storable...found.
Looking for Time::HiRes...found.
Looking for Unix::Syslog...found.
Looking for Net::Cmd...found.
Looking for Devel::Symdump...found.
Looking for DBI...found.
Checking that DBI version is >= 1.18... ok.
&x2026;

As I said, Bricolage requires quite a few Perl modules, so I'm truncating the list here for the sake of space. If any required modules are missing, the installer makes a note of it. If any optional modules are missing, it will prompt you to find out if you want to install them. Respond as appropriate.

…
Looking for HTML::Template...found.
Looking for HTML::Template::Expr...found.
Looking for Template...found.
Checking that Template version is >= 2.14... ok.
Looking for Encode...found.
Looking for Pod::Simple...found.
Looking for Test::Pod...found.
Checking that Test::Pod version is >= 0.95... ok.
Looking for Devel::Profiler... found.
Checking that Devel::Profiler version is >= 0.03... ok.
Looking for Apache::SizeLimit...found.
Looking for Net::FTPServer...found.
Looking for Net::SFTP...not found.
Do you want to install the optional module Net::SFTP? [no] [Return]
Looking for HTTP::DAV...not found.
Do you want to install the optional module HTTP::DAV? [no] [Return]
Looking for Text::Levenshtein...not found.
Do you want to install the optional module Text::Levenshtein? [no] yes
Looking for Crypt::SSLeay...found.
Looking for Imager...found.
Looking for Text::Aspell...not found.

Do you want to install the optional module Text::Aspell? [no] [Return]
Looking for XML::DOM...not found.
Do you want to install the optional module XML::DOM? [no] [Return]
Looking for CGI...found.

In this example, I've elected to install the Text::Levenshtein module, but no other optional modules not already installed.

Optional Perl Modules

Of course, if you previously installed Bundle::BricolagePlus from CPAN, you will have all of the optional modules installed. Let me provide a bit of background on each optional module so that you can decide for yourself which you need and which you don't. If you're just starting out with Bricolage, I recommend you don't worry too much about the optional modules; you can always add them if you decide that you need them later.

HTML::Template and HTML::Template::Expr

These two modules are necessary to create HTML::Template templates to format your content in Bricolage. Most Bricolage users use the required HTML::Mason module, but you should elect to install these modules if you're an HTML::Template user.

Template 2.14

Install the Perl Template Toolkit if you plan to write your content formatting templates in Template Toolkit rather than in Mason or HTML::Template.

Encode

The Encode module comes with and only works with Perl 5.8.0 and later. Install it you plan to support any character encodings other than UTF-8 in the Bricolage UI.

Pod::Simple and Test::Pod 0.95

These modules help to test the Bricolage API documentation, but are not otherwise necessary.

Devel::Profiler 0.03

This module can be useful if you experience performance problems with Bricolage and need to profile it to identify the bottleneck. You can always install it later if you need it.

Apache::SizeLimit

This module is useful for busy Bricolage installations. Because Perl does not return memory to the operating system when it has finished with it, the Apache/mod_perl processes can sometimes get quite large. This is especially true if you use the SOAP interface to import or publish a lot of documents. Apache::SizeLimit allows you to configure mod_perl to kill off its processes when they exceed a certain size, thus returning the memory to the OS. This is the best way to keep the size of Bricolage under control in a busy environment.

Net::FTPServer

This module is necessary to use the Bricolage virtual FTP server. The virtual FTP server makes it easy to edit Bricolage templates via FTP. It's a very nice feature when you're doing a lot of template development work, offering a more integrated interface for your favorite editor than the cut-and-paste approach of the UI. The downside is that FTP is an unencrypted protocol, so it sends passwords used to log in to the Bricolage virtual FTP server sent in the clear. This may not be so important if you're using Bricolage behind a firewall or on a VPN, and is irrelevant if you're not using SSL, because you're already sending passwords in the clear; but don't do that.

Net::SFTP 0.08

This module is necessary if you plan to distribute document files to your delivery server via secure FTP. Bricolage supports file system copying, FTP, secure FTP, and DAV distribution.

HTTP::DAV

Install this module if you plan to distribute document files to your delivery server via DAV.

Text::Levenshtein

This module is an optional alternative to the required Text::Soundex module. Bricolage uses it to analyze field names and suggest alternatives for misspellings in the Super Bulk Edit interface. Either of these modules is fine, although many people consider Text::Levenshtein to have a superior algorithm. I'll show an example of how this works in the Super Bulk Edit interface in a later article.

Crypt::SSLeay

Install this module if you plan to use SSL with Bricolage. It allows the SOAP clients to negotiate an encrypted connection to Bricolage.

Imager

This module is necessary if you plan to enable thumbnail images in Bricolage — why wouldn't you want that? You'll need to make sure that you first have all of the supporting libraries you need installed, such as libpng, libtiff, and libgif (or giflib). I'll discuss enabling thumbnail support in the next article.

Text::Aspell, XML::DOM, and CGI

These modules are necessary to use the spell-checking available with the optional HTMLArea module. I'll discuss HTMLArea support in the next article.

Back to Installation Configuration

After the Bricolage installer has determined which Perl module dependencies need to be satisfied, it moves on to checking the Apache dependencies, using the path to the httpd binary we provided earlier:

==> Finished Probing Required Perl Modules <==

/usr/bin/perl inst/apache.pl


==> Probing Apache Configuration <==

Extracting configuration data from `/usr/local/apache/bin/httpd -V`.
Reading Apache conf file: /usr/local/apache/conf/httpd.conf.
Extracting static module list from `/usr/local/apache/bin/httpd -l`.
Your Apache supports loadable modules (DSOs).
Found Apache user: nobody
Found Apache group: nobody
Checking for required Apache modules...
All required modules found.
====================================================================

Your Apache configuration suggested the following defaults.  Press
[return] to confirm each item or type an alternative.  In most cases
the default should be correct.

Apache User:                     [nobody]

The most important settings relative to Apache are the Apache user, group, and port, as well as the domain name of your new Bricolage server. The Bricolage installer probes the default Apache httpd.conf file to select default values, so you can often accept these:

Apache User:                     [nobody] [Return]
Apache Group:                    [nobody] [Return]
Apache Port:                     [80] [Return]
Apache Server Name:              [geertz.example.com] bricolage.example.com

Here I've elected only to change the hostname for my Bricolage server. Because Bricolage requires its own hostname to run, I've just given it a meaningful name. Be sure to set up DNS as necessary to point to your Bricolage-specific domain name. You can also run Bricolage on alternate ports, which can be useful on a server running Bricolage in addition to an existing web server (see the Bricolage web site for more information on running Bricolage concurrent with another web server process).

Bricolage will also check to see if your Apache binary includes support for mod_ssl or Apache-SSL. If so, it will ask if you wish to use SSL support with Bricolage:

Do you want to use SSL? [no] yes
SSL certificate file location [/usr/local/apache/conf/ssl.crt/server.crt] [Return]
SSL certificate key file location [/usr/local/apache/conf/ssl.key/server.key] [Return]
Apache SSL Port:                 [443] [Return]

Here I've elected to use the default values. If your Apache server has both mod_ssl and Apache-SSL support, the installer will prompt to find out which you wish to use. The installer will pull the default SSL certificates from the Apache conf directory; type in alternatives if you want to use different certificates or if the installer couldn't find any.

Once it has all of the Apache configuration information in hand, the Bricolage installer moves on to gathering PostgreSQL information:

==> Finished Probing Apache Configuration <==

/usr/bin/perl inst/postgres.pl


==> Probing PostgreSQL Configuration <==

Extracting postgres include dir from /usr/local/pgsql/bin/pg_config.
Extracting postgres lib dir from /usr/local/pgsql/bin/pg_config.
Extracting postgres bin dir from /usr/local/pgsql/bin/pg_config.
Finding psql.
Finding PostgreSQL version.

In order to create the Bricolage database and populate it with default data, the installer needs access to the database server as the PostgreSQL administrative or Root user, usually postgres. Then it will ask you to pick names for the Bricolage database and PostgreSQL user, which it will create:

Postgres Root Username [postgres] [Return]
Postgres Root Password (leave empty for no password) [] [Return]
Postgres System Username [postgres] [Return]
Bricolage Postgres Username [bric] [Return]
Bricolage Postgres Password [NONE] password
Are you sure you want to use 'password'? [yes] [Return]
Bricolage Database Name [bric] [Return]

Here I've accepted the default value for the Postgres Root Username. I left the password empty because by default PostgreSQL allows local users to access the server without a username. Instances of PostgreSQL installed from a package may have other authentication rules; consult the documentation for your installation of PostgreSQL for details. The Postgres System Username is necessary only if you're running PostgreSQL on the same box as Bricolage. If so, then you'll need to type in the Unix username under which PostgreSQL runs (also usually postgres). If PostgreSQL is running on another box, enter root or some other real local username for this option.

You can give your Bricolage database and PostgreSQL user any names you like, but the defaults are typical. You must provide a password for the Bricolage PostgreSQL username (here I've entered password). Next, the Bricolage installer will prompt for the location of your PostgreSQL server:

Postgres Database Server Hostname (default is unset, i.e. local domain socket)
      [] [Return]
Postgres Database Server Port Number (default is local domain socket)
      [] [Return]

Here I've accepted the defaults, because I'm running PostgreSQL on the local box and on the default port. In fact, if you leave these two options to their empty defaults, Bricolage will use a Unix socket to communicate with the PostgreSQL server. This has the advantage of not only being faster than a TCP/IP connection, but it also allows you to turn off PostgreSQL's TCP/IP support if you worry about having another port open on your server. However, if PostgreSQL is running on a separate box, you must enter a host name or IP address. If it's running on a port other than the default port (5432), enter the appropriate port number.

Next, the Bricolage installer asks how you want to install its various parts:

==> Finished Probing PostgreSQL Configuration <==

/usr/bin/perl inst/config.pl


==> Gathering User Configuration <==

========================================================================

Bricolage comes with two sets of defaults.  You'll have the
opportunity to override these defaults but choosing wisely here will
probably save you the trouble.  Your choices are:

  s - "single"   one installation for the entire system

  m - "multi"    an installation that lives next to other installations
                 on the same machine

Your choice? [s] 

There are essentially two ways to install Bricolage: The first, single, assumes that you will only ever have a single instance of Bricolage installed on your server. In such a case, it will install all of the Perl modules into the appropriate Perl @INC directory like any other Perl module and the executables into the same bin directory as your instance of Perl (such as /usr/local/bin).

The second way to install Bricolage is with the multi option. This option allows you to have multiple versions of Bricolage installed on a single server. Even if you never intend to do this, I generally recommend taking this approach, because the upshot is that all of your Bricolage files (with the exception of the database, the location of which depends on your PostgreSQL configuration) will install into a single directory. This makes it very easy to keep track of where everything is.

Your choice? [s] m

Next, the Bricolage installer wants to know where to install Bricolage. The default option, /usr/local/bricolage, is the easiest, but you can put it anywhere you like. All of the other relevant directories will by default be subdirectories of this directory, but you can change them too. For example, you might prefer to have the error log file in the typical log directory for your OS, such as /var/log. Personally, I prefer to keep everything in one place.

Bricolage Root Directory [/usr/local/bricolage] [Return]
Temporary Directory [/usr/local/bricolage/tmp] [Return]
Perl Module Directory [/usr/local/bricolage/lib] [Return]
Executable Directory [/usr/local/bricolage/bin] [Return]
Man-Page Directory (! to skip) [/usr/local/bricolage/man] [Return]
Log Directory [/usr/local/bricolage/log] [Return]
PID File Location [/usr/local/bricolage/log/httpd.pid] [Return]
Mason Component Directory [/usr/local/bricolage/comp] [Return]
Mason Data Directory [/usr/local/bricolage/data] [Return]

If you elected for the single installation option, then your choices would look more like:

Bricolage Root Directory [/usr/local/bricolage] [Return]
Temporary Directory [/tmp] [Return]
Perl Module Directory [/usr/local/lib/perl5/site_perl/5.8.5] [Return]
Executable Directory [/usr/local/bin] [Return]
Man-Page Directory (! to skip) [/usr/local/man] [Return]
Log Directory [/usr/local/apache/logs/] [Return]
PID File Location [/usr/local/apache/logs/httpd.pid] [Return]
Mason Component Directory [/usr/local/bricolage/comp] [Return]
Mason Data Directory [/usr/local/bricolage/data] [Return]

Again, you can customize these as you like. That's it for the installation configuration!

==> Finished Gathering User Configuration <==


===========================================================
===========================================================

Bricolage Build Complete. You may now proceed to
"make cpan", which must be run as root, to install any
needed Perl modules; then to
"make test" to run some basic tests of the API; then to
"make install", which must be run as root.

===========================================================
===========================================================

Installing CPAN Modules

Whether you elected to install optional CPAN modules, the Bricolage installer still might have identified missing module dependencies, so it's a good idea to follow the helpful instructions and run make cpan. Of course, the cpan target will implicitly execute if you just moved on to make test, but it's a good idea to run it on its own to have more control over things and to identify any possible problems. My system had all of the dependencies satisfied already (I've done this once or twice before), but you'll recall that I had elected to install the optional Text::Leventshtein module. The Bricolage installer will therefore attempt to install it from CPAN.

% make cpan
/usr/bin/perl inst/cpan.pl
This process must (usually) be run as root.
Continue as non-root user? [yes] n
make: *** [cpan] Error 1

Whoops! Don't make the mistake I just made! make cpan must run as the root user.

% sudo make cpan 
/usr/bin/perl inst/cpan.pl


==> Installing Modules From CPAN <==

CPAN: Storable loaded ok
CPAN: LWP::UserAgent loaded ok

&x2026;

Found Text::Levenshtein.  Installing...
Running install for module Text::Levenshtein
Running make for J/JG/JGOLDBERG/Text-Levenshtein-0.05.tar.gz
Fetching with LWP:
  http://www.perl.com/CPAN/authors/id/J/JG/JGOLDBERG/Text-Levenshtein-0.05.tar.gz

&x2026;

Text::Levenshtein installed successfully.


==> Finished Installing Modules From CPAN <==

I've truncated the output here, but you should have the general idea. The Bricolage installer uses the Perl CPAN module to install any needed modules from CPAN. If you encounter any problems, you might need to stop and manually configure and install a module. If so, once you're ready to continue with the Bricolage installation, delete the modules.db file in order to force the installer to detect all modules again so that it notices that you now have the module installed:

% rm modules.db
% sudo make cpan

Running Tests

The next step in installing Bricolage is optional, but will help identify any pitfalls before going any further. That's running the test suite.

% make test                                    
PERL_DL_NONLAZY=1 /usr/bin/perl inst/runtests.pl
t/Bric/Test/Runner....ok                                                     
All tests successful, 7 subtests skipped.
Files=1, Tests=2510, 21 wallclock secs ( 8.83 cusr +  1.39 csys = 10.22 CPU)

Make it So!

Once all tests pass, you're ready to install Bricolage:

% sudo make install
/usr/bin/perl inst/is_root.pl
/usr/bin/perl inst/cpan.pl
All modules installed. No need to install from CPAN.
rm -f lib/Makefile
cd lib; /usr/bin/perl Makefile.PL; make install
&x2026;
==> Finished Copying Bricolage Files <==

If you happened to select a database name for Bricolage for a database that already exists, the installer will warn you about it:

/usr/bin/perl inst/db.pl


==> Creating Bricolage Database <==

Becoming postgres...
Creating database named bric...
Database named "bric" already exists.  Drop database? [no] 

Now you have a choice. If you elect to dropt the database, the Bricolage installer will drop it and then create a new copy — but it must have Root user access to the PostgreSQL server. In other situations you might want to continue with the installed database, as in the case when your ISP has created the database for you ahead of time. You will also receive a prompt if the PostgreSQL user for the Bricolage database already exists. Again, you can either opt to drop and recreate the user or continue with the existing username:

Database named "bric" already exists.  Drop database? [no] [Return]
Create tables in existing database? [yes] [Return]
Creating user named bric...
User named "bric" already exists. Continue with this user? [yes] [Return]
Loading Bricolage Database (this may take a few minutes). 

At this point, the Bricolage installer is creating the Bricolage database. On my Mac, it takes about a minute to create the database, but your mileage may vary. Once that ends, the installer grants the appropriate PostgreSQL permissions and the installation is complete!

Done.
Finishing database...
Done.
/usr/bin/perl inst/db_grant.pl
Becoming postgres...
Granting privileges...
Done.
/usr/bin/perl inst/done.pl


=========================================================================
=========================================================================

                   Bricolage Installation Complete

You may now start your Bricolage server with the command (as root):

  /usr/local/bricolage/bin/bric_apachectl start

If this command fails, look in your error log for more information:

  /usr/local/bricolage/log/error_log

Once your server is started, open a web browser and enter the URL for
your server:

  http://bricolage.example.com

Login in as "admin" with the default password "change me now!". Your
first action should be changing this password. Navigate into the ADMIN ->
SYSTEM -> Users menu, search for the "admin" user, click the "Edit"
link, and change the password.

=========================================================================
=========================================================================

Start 'er Up and Login

That's it. Bricolage should start with the command helpfully provided by the installer:

% sudo /usr/local/bricolage/bin/bric_apachectl start
bric_apachectl start: starting httpd
bric_apachectl start: httpd started

If you set the Bricolage root directory to something other than /usr/local/bricolage, you'll need to set the $BRICOLAGE_CONF environment variable, first. For example, using Bash or Zsh, do:

% BRICOLAGE_ROOT=/opt/bricolage \
> sudo /opt/bricolage/bin/bric_apachectl start
bric_apachectl start: starting httpd
bric_apachectl start: httpd started

Once Bricolage successfully starts, point your browser to the appropriate URL and login as the admin user and change the password!

Up Next: Bricolage Runtime Configuration

Now that you have Bricolage up and running, you can start using it. Consult the documentation as directed in the README file to get started. Feel free to also subscribe to the Bricolage mail lists to ask any questions and to learn from the brave souls who have gone before you.

If you're interested in tuning your Bricolage installation, be sure to catch my next article, Bricolage Runtime Configuration, in which I'll cover all of the options when configuring Bricolage for added functionality and features.

Perl Code Kata: Testing Taint

To be a better programmer, practice programming.

It's not enough to practice, though. You must practice well and persistently. You need to explore branches and ideas and combinations as they come to your attention. Set aside some time to experiment with a new idea to see what you can learn and what you can use in your normal programming.

How do you find new ideas? One way is through code katas, short pieces of code that start your learning.

This article is the first in a series of code kata for Perl programmers. All of these exercises take place in the context of writing tests for Perl programs.

Why give examples in the context of testing? First, to promote the idea of writing tests. One of the best techniques of writing good, simple, and effective software is to practice test-driven development. Second, because writing tests well is challenging. It often pushes programmers to find creative solutions to difficult problems.

Taint Testing Kata #1

One of Perl's most useful features is the idea of tainting. If you enable taint mode, Perl will mark every piece of data that comes from an insecure source, such as insecure input, with a taint flag. If you want to use a piece of tainted data in a potentially dangerous way, you must untaint the data by verifying it.

The CGI::Untaint module family makes this process much easier for web programs — which often need the most taint protection. There are modules to untaint dates, email addresses, and credit card numbers.

Recently, I wrote CGI::Untaint::boolean to untaint data that comes from checkboxes in web forms. It's a simple module, taking fewer than 20 lines of sparse code that untaints any incoming data and translates a form value of on into a true value and anything else (including a non-existent parameter) into false.

Writing the tests proved to be slightly more difficult. How could I make sure that the incoming parameter provided to the module was tainted properly? How could I make sure that the module untaints it properly?

Given the code for CGI::Untaint::boolean, how would you write the tests?

package CGI::Untaint::boolean;

use strict;

use base 'CGI::Untaint::object';

sub _untaint_re { qr/^(on)$/ }

sub is_valid
{
    my $self  = shift;
    my $value = $self->value();

    return unless $value and $value =~ $self->_untaint_re();

    $self->value( $value eq 'on' ? 1 : 0 );
    return 1;
}

1;

Your code should check that it passes in a tainted value and that it receives an untainted value. You should also verify that the resulting value, when extracted from the handler, is not tainted, no matter its previous status.

Write using one of Perl's core test modules. I prefer Test::Simple and Test::More, but if you must use Test, go ahead. Assume that Test::Harness will honor the -T flag passed on the command line.

Don't read the tests that come with CGI::Untaint::boolean unless you're really stuck. The next section has a further explanation of that technique. For best results, spend at least 30 minutes working through the kata on your own before looking at the hints.

Tips, Tricks, Suggestions, and One Solution

To test tainting properly, you must understand its effects. When Perl sees the -T or -t flags, it immediately marks some of its data and environment as tainted. This includes the PATH environment variable.

Also, taint is sticky. If you use a piece of tainted data in an expression, it will taint the results of that expression.

Both of those facts make it easy to find a source of taint. CGI::Untaint::boolean's do the following to make tainted data:

my $tainted_on = substr( 'off' . $ENV{PATH}, 0, 3 );

Concatenating the clean string off with the tainted value of the PATH environment variable produces a tainted string. The substr() expression then returns the equivalent of original string with tainting added.

How can you tell if a variable holds a tainted value? The Perl FAQ gives one solution that attempts to perform an unsafe operation with tainted data, but I prefer the Scalar::Util module's tainted() function. It's effectively the same thing, but I don't have to remember any abnormal details.

This technique does rely on Test::Harness launching the test program with the -T flag. If that's not an option, the test program itself could launch other programs with that flag, using the $^X variable to find the path of the currently executing Perl. It may be worthwhile to check that the -T flag is in effect before skipping the rest of the tests or launching a new process and reporting its results.

The prove utility included with recent versions of Test::Harness may come in handy; launch the test with prove -T testfile.t to run under taint mode. See perldoc prove for more information.

You could also use this approach to launch programs designed to abort if the untainting fails, checking for exit codes automatically. It seems much easier to use Scalar::Util though.

Conclusion

This should give you everything you need to solve the problem. Check your code against the tests for CGI::Untaint::boolean.

If you've found a differently workable approach, I'd like to hear from you. Also, if you have suggestions for another kata (or would like to write one), please let me know.

chromatic is the author of Modern Perl. In his spare time, he has been working on helping novices understand stocks and investing.

FMTYEWTK About Mass Edits In Perl

For those not used to the terminology, FMTYEWTK stands for Far More Than You Ever Wanted To Know. This one is fairly light as FMTYEWTKs usually go. In any case, the question before us is, "How do you apply an edit against a list of files using Perl?" Well, that depends on what you want to do....

The Beginning

If you only want to read in one or more files, apply a regex to the contents, and spit out the altered text as one big stream -- the best approach is probably a one-liner such as the following:

perl -p -e "s/Foo/Bar/g" <FileList>

This command calls perl with the options -p and -e "s/Foo/Bar/g" against the files listed in FileList. The first argument, -p, tells Perl to print each line it reads after applying the alteration. The second option, -e, tells Perl to evaluate the provided substitution regex rather than reading a script from a file. The Perl interpreter then evaluates this regex against every line of all (space separated) files listed on the command line and spits out one huge stream of the concatenated fixed lines.

In standard fashion, Perl allows you to concatenate options without arguments with following options for brevity and convenience. Therefore, you'll more often see the previous example written as:

perl -pe "s/Foo/Bar/g" <FileList>

In-place Editing

If you want to edit the files in place, editing each file before going on to the next, that's pretty easy, too:

perl -pi.bak -e "s/Foo/Bar/g" <FileList>

The only change from the last command is the new option -i.bak, which tells Perl to operate on files in-place, rather than concatenating them together into one big output stream. Like the -e option, -i takes one argument, an extension to add to the original file names when making backup copies; for this example I chose .bak. Warning: If you execute the command twice, you've most likely just overwritten your backups with the changed versions from the first run. You probably didn't want to do that.

Because -i takes an argument, I had to separate out the -e option, which Perl otherwise would interpret as the argument to -i, leaving us with a backup extension of .bake, unlikely to be correct unless you happen to be a pastry chef. In addition, Perl would have thought that "s/Foo/Bar/" was the filename of the script to run, and would complain when it could not find a script by that name.

Running Multiple Regexes

Of course, you may want to make more extensive changes than just one regex. To make several changes all at once, add more code to the evaluated script. Remember to separate each additional line of code with a semicolon (technically, you should place a semicolon at the end of each line of code, but the very last one in any code block is optional). For example, you could make a series of changes:

perl -pi.bak -e "s/Bill Gates/Microsoft CEO/g;
 	s/CEO/Overlord/g" <FileList>

"Bill Gates" would then become "Microsoft Overlord" throughout the files. (Here, as in all examples, we ignore such finicky things as making sure we don't change "HERBACEOUS" to "HERBAOverlordUS"; for that kind of information, refer to a good treatise on regular expressions, such as Jeffrey Friedl's impressive book Mastering Regular Expressions, 2nd Edition. Also, I've wrapped the command to fit, but you should type it in as just one line.)

Doing Your Own Printing

You may wish to override the behavior created by -p, which prints every line read in, after any changes made by your script. In this case, change to the -n option. -p -e "s/Foo/Bar/" is roughly equivalent to -n -e "s/Foo/Bar/; print". This allows you to write interesting commands, such as removing lines beginning with hash marks (Perl comments, C-style preprocessor directives, etc.):

perl -ni.bak -e "print unless /^\s*#/;" <FileList>

Fields and Scripts

Of course, there are far more powerful things you can do with this. For example, imagine a flat-file database, with one row per line of the file, and fields separated by colons, like so:

Bill:Hennig:Male:43:62000
Mary:Myrtle:Female:28:56000
Jim:Smith:Male:24:50700
Mike:Jones:Male:29:35200
...

Suppose you want to find everyone who was over 25, but paid less than $40,000. At the same time, you'd like to document the number and percentage of women and men found. This time, instead of providing a mini-script on the command line, we'll create a file, glass.pl, which contains the script. Here's how to run the query:

perl -naF':' glass.pl <FileList>

glass.pl contains the following:

BEGIN { $men = $women = $lowmen = $lowwomen = 0; }

next unless /:/;
/Female/ ? $women++ : $men++;
if ($F[3] > 25 and $F[4] < 40000)
    { print; /Female/ ? $lowwomen++ : $lowmen++; }

END {
print "\n\n$lowwomen of $women women (",
      int($lowwomen / $women * 100),
      "%) and $lowmen of $men men (",
      int($lowmen / $men * 100),
      "%) seem to be underpaid.\n";
}

Don't worry too much about the syntax, other than to note some of the awk and C similarities. The important thing here and in later sections is to see how Perl makes these problems easily solvable.

Several new features appear in this example; first, if there is no -e option to evaluate, Perl assumes the first filename listed, in this case glass.pl, refers to a Perl script for it to execute. Secondly, two new options make it easy to deal with field-based data. -a (autosplit mode) takes each line and splits its fields into the array @F, based on the field delimiter given by the -F (Field delimiter) option, which can be a string or a regex. If no -F option exists, the field delimiter defaults to ' ' (one single-quoted space). By default, arrays in Perl are zero-based, so $F[3] and $F[4] refer to the age and pay fields, respectively. Finally, the BEGIN and END blocks allow the programmer to perform actions before file reading begins and after it finishes, respectively.

File Handling

All of these little tidbits have made use only of data from within the files being operated on. What if you want to be able to read in data from elsewhere? For example, imagine that you had some sort of file that allows includes; in this case, we'll assume that you somehow specify these files by relative pathname, rather than looking them up in an include path. Perhaps the includes look like the following:

...
#include foo.bar, baz.bar, boo.bar
...

If you want to see what the file looks like with the includes placed into the master file, you might try something like this:

perl -ni.bak -e "if (s/#include\s+//) {foreach $file
 (split /,\s*/) {open FILE, '<', $file; print <FILE>}}
 else {print}" <FileList>

To make it easier to see what's going on here, this is what it looks like with a full set of line breaks added for clarity:

perl -ni.bak -e "
        if (s/#include\s+//) {
            foreach $file (split /,\s*/) {
                open FILE, '<', $file;
                print <FILE>
            }
        } else {
            print
        }
    " <FileList>

Of course, this only expands one level of include, but then we haven't provided any way for the script to know when to stop if there's an include loop. In this little example, we take advantage of the fact that the substitution operator returns the number of changes made, so if it manages to chop off the #include at the beginning of the line, it returns a non-zero (true) value, and the rest of the code splits apart the list of includes, opens each one in turn, and prints its entire contents.

There are some handy shortcuts as well: if you open a new file using the name of an old file handle (FILE in this case), Perl automatically closes the old file first. In addition, if you read from a file using the <> operator into a list (which the print function expects), it happily reads in the entire file at once, one line per list entry. The print call then prints the entire list, inserting it into the current file, as expected. Finally, the else clause handles printing non-include lines from the source, because we are using -n rather than -p.

Better File Lists

The fact that it is relatively easy to handle filenames listed within other files indicates that it ought to be fairly easy to deal entirely with files read from some other source than a list on the end of the command line. The simplest case is to read all of the file contents from standard input as a single stream, which is common when building up pipes. As a matter of fact, this is so common that Perl automatically switches to this mode if there are no files listed on the command line:

<Source> | perl -pe "s/Foo/Bar/g" | <Sink>

Here Source and Sink are the commands that generate the raw data and handle the altered output from Perl, respectively. Incidentally, the filename consisting of a single hyphen (-) is an explicit alias for standard input; this allows the Perl programmer to merge input from files and pipes, like so:

<Source> | perl -pe "s/Foo/Bar/g" header.bar - footer.bar
 | <Sink>

This example first reads a header file, then the input from the pipe source, and then a footer file — the whole mess. The program modifies this text and sends it through to the out pipe.

As I mentioned earlier, when dealing with multiple files it is usually better to keep the files separate, by using in-place editing or by explicitly handling each file separately. On the other hand, it can be a pain to list all of the files on the command line, especially if there are a lot of files, or when dealing with files generated programmatically.

The simplest method is to read the files from standard input, pushing them onto @ARGV in a BEGIN block; this has the effect of tricking Perl into thinking it received all of the filenames on the command line! Assuming the common case of one filename per input line, the following will do the trick:

<FilenamesSource> | perl -pi.bak -e "BEGIN {push @ARGV,
 <STDIN>; chomp @ARGV} s/Foo/Bar/g"

Here we once again use the shortcut that reading in a file in a list context (which push provides) will read in the entire file. This adds the entire contents, one filename per entry, to the @ARGV array, which normally contains the list of arguments to the script. To complete the trick, we chomp the line endings from the filenames, because Perl normally returns the line ending characters (a carriage return and/or a line feed) when reading lines from a file. We don't want to consider these to be part of the filenames. (On some platforms, you could actually have filenames containing line ending characters, but then you'd have to make the Perl code a little more complex, and you deserve to figure that out for yourself for trying it in the first place.)

Response Files

Another common design is to provide filenames on the command line as usual, treating filenames starting with an @ specially. The program should consider their contents to be lists of filenames to insert directly into the command line. For example, if the contents of the file names.baz (often called a response file) are:

two
three
four

then this command:

perl -pi.bak -e "s/Foo/Bar/g" one @names.baz five

should work equivalently to:

perl -pi.bak -e "s/Foo/Bar/g" one two three four five

To make this work, we once again need to do a little magic in a BEGIN block. Essentially, we want to parse through the @ARGV array, looking for filenames that begin with @. We pass through any unmarked filenames, but for each response file found, we read in the contents of the response file and insert the new list of filenames into @ARGV. Finally, we chomp the line endings, just as in the previous section. This produces a canonical file list in @ARGV, just as if we'd specified all of the files on the command line. Here's what it looks like in action:

perl -pi.bak -e "BEGIN {@ARGV = map {s/^@// ? @{open RESP,
 '<', $_; [<RESP>]} : $_} @ARGV; chomp @ARGV} s/Foo/Bar/g"
 <ResponseFileList>

Here's the same code with line breaks added so you can see what's going on:

perl -pi.bak -e "
        BEGIN {
            @ARGV = map {
                        s/^@// ? @{open RESP, '<', $_;
                                   [<RESP>]}
                               : $_
                    } @ARGV;
            chomp @ARGV
        }
        
        s/Foo/Bar/g
    " <ResponseFileList>

The only tricky part is the map block. map applies a piece of code to every element of a list, returning a list of the return values of the code; the current element is in the $_ special variable. The block here checks to see if it could remove a @ from the beginning of each filename. If so, it opens the file, reads the whole thing into an anonymous temporary array (that's what the square brackets are there for), and then inserts that array instead of the response file's name (that's the odd @{...} construct). If there is no @ at the beginning of the filename to remove, the filename goes directly into the map results. Once we've performed this expansion and chomped any line endings, we can then proceed with the main work, in this case our usual substitution, s/Foo/Bar/g.

Recursing Directories

For our final example, let's deal with a major weakness in the way we've been doing things so far — we're not recursing into directories, instead expecting all of the files we need to read to appear explicitly on the command line. To perform the recursion, we need to pull out the big guns: File::Find. This Perl module provides very powerful recursion methods. It also comes standard with any recent version of the Perl interpreter. The command line is deceptively simple, because all of the brains are in the script:

perl cleanup.pl <DirectoryList>

This script will perform some basic housecleaning, marking all files readable and writeable, removing those with the extensions .bak, .$$$, and .tmp, and cleaning up .log files. For the log files, we will create a master log file (for archiving or perusal) containing the contents of all of the other logs, and then delete the logs so that they remain short over time. Here's the script:

use File::Find;

die "All arguments must be directories!"
    if grep {!-d} @ARGV;
open MASTER, '>', 'master.lgm';
finddepth(\&filehandler, @ARGV);
close MASTER;
rename 'master.lgm', 'master.log';

sub filehandler
{
    chmod stat(_) | 0666, $_ unless (-r and -w);
    unlink if (/\.bak$/ or /\.tmp$/ or /\.\$\$\$$/);
    if (/\.log$/) {
        open LOG, '<', $_;
        print MASTER "\n\n****\n$File::Find::name\n****\n";
        print MASTER <LOG>;
        close LOG;
        unlink;
    }
}

This example shows just how powerful Perl and Perl modules can be, and at the same time just how obtuse Perl can appear to the inexperienced. In this case, the short explanation is that the finddepth() function iterates through all of the program arguments (@ARGV), recursing into each directory and calling the filehandler() subroutine for each file. That subroutine then can examine the file and decide what to do with it. The example checks for readability and writability with -r and -w, fixing the file's security settings if needed with chmod. It then unlinks (deletes) any file with a name ending in any of the three unwanted extensions. Finally, if the extension is .log, it opens the file, writes a few header lines to the master log, copies the file into the master log, closes it, and deletes it.

Instead of using finddepth(), which does a depth-first search of the directories and visits them from the bottom up, we could have used find(), which does the same depth-first search from the top down. As a side note, the program writes the master log file with the extension .lgm, then renames it at the end to have the extension .log, so as to avoid the possibility of writing the master log into itself if the program is searching the current directory.

Conclusion

That's it. Sure, there's a lot more that you could do with these examples, including adding error checking, generating additional statistics, producing help text, etc. To learn how to do this, find a copy of Programming Perl, 3rd Edition, by Larry Wall, Tom Christiansen, and Jon Orwant. This is the bible (or the Camel, rather) of the Perl community, and well worth the read. Good luck!

Why Review Code?

Richard Gabriel of Sun Microsystems suggests that beginning programmers should study the source code for great works of software as part of a Master of Fine Arts course in software. Prominent Extreme Programmers talk about finding the Quality Without a Name in software, listening to your code, and reworking your code when it "smells."

With this in mind, this article contains a review of one of the "base" Perl modules, available from CPAN but also included as part of any distribution of Perl. It's a module that I particularly like (with algorithms that I can more or less understand); I aim to show why I like it and what you can learn from it.

Code and Communication

Reviewing (even reading) source code is not always easy. Most software doesn't have human readability as a primary goal — just compilers or interpreters. The exception is software written as examples or to help others learn. This code is primarily of interest in itself and not for what it does. As far as a computer cares, software would have the same behavior if it had no comments and all variables named as a1, a2, ... -- as the obfuscated programming contests prove. They also prove that we humans then find it difficult to read. Most literature (including poems, essays, short stories, novels), on the other hand, has the primary purpose of communicating with other people, and so is often easier to understand.

Nonetheless, code reviews are still useful. Projects that perform them usually do so to find bugs or to suggest improvements; they exist for the benefit of the code writer or tester. Here, we seek to learn from the code, to take away from it lessons that we can apply to our own code. These are lessons, or patterns, but not design patterns; the patterns discussed here appear at a lower level and are more of the order of tips and tricks.

As mentioned above, there are parts of software programs that do exist to communicate: comments as well as variable and function (or method) names are all about communicating (with other people). The challenge for the considerate [1] software developer (who thinks beyond the short term) is to use comments and names to tell the story of the software: what it is, what it should do, and why the developers made certain choices. Ideally, the form of software should match its function [2]: comments and names should explain and clarify the rest of the code, rather than disagreeing with it, repeating what it says, or being irrelevant. Source code reviews are part of this process. They further explain the code to a wider audience, telling its story in a broader and hopefully deeper way.

The Review Itself

Math::Complex is an open source Perl package that is part of the core Perl distribution. It provides methods for complex arithmetic (and overloading of operators).

Raphael Manfredi created the module in 1996. Since then, first Jarkko Hietaniemi and, currently, Daniel S. Lewart have maintained it (according to its own comments). My comments below relate to version 1.34.

I should explain that while I've been using Perl for maybe three years, it's mostly been for test automation and text processing; my day-to-day programming is primarily in C. I tend to notice the things about Perl that are difficult (at times verging on impossible) to do in C.

Early on in the Math::Complex package, a huge (and often reused) regular expression $gre appears:

# Regular expression for floating point numbers.
my $gre =
qr'\s*([\+\-]?(?:(?:(?:\d+(?:_\d+)*(?:\.\d*(?:_\d+)*)?|\.\d+(?:_\d+)*)(?:[eE][\+\-]?\d+(?:_\d+)*)?)))';

This regular expression captures floating-point numbers (which may include isolated underscores) in a single reference. It thus provides flexibility and robustness (key motivations for using regular expressions) without repetition. It also uses ?: within the brackets to cluster parts of the regular expression without providing back references; this means that the whole regular expression provides one back reference rather than five or six.

This regular expression may be easier to understand when refactored into smaller chunks. The test I used when refactoring was quite simple:

my @testexpressions = ('100', '100.12', '100.12e-13', '100_000.545_123',
	'1e-3', '.1', '-100_000.545_123e+11_12');
foreach my $expr (@testexpressions)
{
   if ($expr =~ m/$gre/)
   {
      print "$expr matches gre\n";
   }
}

The same test code can run against the refactored regular expression to make sure that it still matches the same strings. Here, working from the inside out, I extracted some common expressions. For each expression extracted and captured by the quote-like operator qr, I also removed the ?: clustering. $gre above is equivalent to $newgre below:

my $underscoredigits = qr/_\d+/;
my $digitstring      = qr/\d+$underscoredigits*/;
my $fractional       = qr/\.\d*$underscoredigits*/;
my $mantissa         = qr/$digitstring$fractional?|\.$digitstring/;
my $exponent         = qr/[eE][\+\-]?$digitstring/;
my $newgre           = qr/\s*([\+\-]?$mantissa$exponent?)/;

For example, $underscoredigits matches _123; $digitstring matches 545_123; $fractional matches .545_123; $mantissa matches either 100_000.545_123 or .1; $exponent matches e+11_12; and finally $newgre matches the whole string -100_000.545_123e+11_12.

The construction method make is, again, flexible and robust. The author has extracted part of its complexity to different methods to avoid repetition. For example, the module uses the _cannot_make function internally to report errors, calling it from several places in make. It looks like:

# Die on bad *make() arguments.

sub _cannot_make {
    die "@{[(caller(1))[3]]}: Cannot take $_[0] of $_[1].\n";
}

In turn, it calls the built-in caller function to refer back to the original code (similar to an assert in C, or a lightweight version of the cluck or confess functions in the Carp package).

make also calls the the remake function, if make receives only one argument (for example, a string such as "1 + 2i") and must deduce the real and/or imaginary parts:

sub _remake {
    my $arg = shift;
    my ($made, $p, $q);

    if ($arg =~ /^(?:$gre)?$gre\s*i\s*$/) {
        ($p, $q) = ($1 || 0, $2);
        $made = 'cart';
    } elsif ($arg =~ /^\s*\[\s*$gre\s*(?:,\s*$gre\s*)?\]\s*$/) {
        ($p, $q) = ($1, $2 || 0);
        $made = 'exp';
    }

    if ($made) {
        $p =~ s/^\+//;
        $q =~ s/^\+//;
    }

    return ($made, $p, $q);
}

The regular expression $gre appears here as part of a larger regular expression, to interpret not only "1 + 2i" but also "2i" by itself. The expression $1 || 0, used here and elsewhere, replaces the undef value of Perl (treated as 0 in a Boolean context) with 0, while leaving other values unchanged.

The plus function, one of the several binary operators provided in the package, also has some interesting features:

#
# (plus)
#
# Computes z1+z2.
#
sub plus {
        my ($z1, $z2, $regular) = @_;
        my ($re1, $im1) = @{$z1->cartesian};
        $z2 = cplx($z2) unless ref $z2;
        my ($re2, $im2) = ref $z2 ? @{$z2->cartesian} : ($z2, 0);
        unless (defined $regular) {
                $z1->set_cartesian([$re1 + $re2, $im1 + $im2]);
                return $z1;
        }
        return (ref $z1)->make($re1 + $re2, $im1 + $im2);
}

It uses the trinary conditional operator * = * ? * : *, which is also present in C. In Perl, this can return not just scalars but also lists. Thus the code to calculate the values of $re2 and $im2 is much more compact than the equivalent code in C could be. This code uses the Cartesian coordinates of $z2 if it's already a complex number. Otherwise, it turns a real number into a complex number.

The plus function later uses ref $z1, the package name of $z1, to create the sum of $z1 and $z2; this allows subclasses of Math::Complex to reuse exactly the same function.

Finally, the Cartesian function mentioned above can either return existing values for the real and imaginary part, if these are "clean" (valid), or recalculate them from the polar form. Each complex number object stores the "cleanliness" (validity) of its Cartesian values, as follows:

sub cartesian {$_[0]->{c_dirty} ?
        $_[0]->update_cartesian : $_[0]->{'cartesian'}}

This is a neat trick to avoid recalculating Cartesian coordinates when it's not necessary.

Summary

The Math::Complex package is not only useful, but efficient, robust, and flexible. It does use brief variable names, but this is traditional in mathematics; given that it uses complicated (or at least long) expressions, this means that the full expression is easy to understand (or read aloud). The functions themselves tend also to be brief and easy to understand.

We see here the benefits of reuse and refactoring. Over 8 years and 34 versions of this code, it has no doubt seen heavy rewriting, to the point of perhaps each line being different from the corresponding line in version 1.1 (or 0.1!). This extended refactoring has removed unnecessary and repeated code, clarified comments and usage, and led to clear and clean code. It provides an example not only of how to write a mathematical package in Perl, using regular expressions and references as described above, but also of what code can look like. That is perhaps its best lesson.

Endnotes

[1] "Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live." (Martin Golding)

[2] Though perhaps not always to this extent.

Visit the home of the Perl programming language: Perl.org

Sponsored by

Monthly Archives

Powered by Movable Type 5.13-en