May 2005 Archives

This Week in Perl 6, May 18 - 24, 2005

Note to self: It's generally not a good idea to go installing Tiger on the day you return from holiday. It's especially not a good idea to fail to check that it didn't completely and utterly radish your Postfix configuration. And your Emacs. And the backing store for your website. And a bunch of other stuff. It's an especially bad idea not to have backups of things like your aliases file...

Nor is it a good idea to get preoccupied with all these joys and completely forget that you should be writing the Perl 6 summary.

Ahem.

I'm very, very sorry.

So, on with the show.

This Week in perl6-compiler

Inline::Pugs

Autrijus announced the availability of Inline::Pugs. If you've ever wanted to mix up Perls 5 and 6 in one program, your prayers have been answered. Just grab Pugs and Inline and you're set. Brian Ingerson made things even more delightfully evil:

#!perl
use pugs;
sub postfix:<!> { [*] 1..$_ }
sub sum_factorial { [+] 0..$_! }
no pugs;
print sum_factorial(3); # 21
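
For readers who don't speak Perl 6 yet, here is a rough Perl 5 rendering of what those two subs compute. It is only a sketch: the name factorial is made up for illustration, and List::Util's reduce and sum stand in for the [*] and [+] reduce metaoperators.

```perl
use strict;
use warnings;
use List::Util qw(reduce sum);

# Stand-in for postfix:<!>: multiply 1..$n together.
# The leading 1 makes factorial(0) come out as 1.
sub factorial {
    my ($n) = @_;
    return reduce { $a * $b } 1, 1 .. $n;
}

# Stand-in for sum_factorial: add up 0 .. $n!
sub sum_factorial {
    my ($n) = @_;
    return sum( 0 .. factorial($n) );
}

print sum_factorial(3), "\n";    # 3! is 6, and 0+1+...+6 is 21
```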

Experimental Coroutine Support

Autrijus announced that Pugs now has an experimental implementation of coroutines. It's almost certainly not final, but it's good enough for exploration and feedback purposes.

Graphing Tool for PerlGuts Illustrated

Yuval Kogman asked what tool generated the "pretty diagrams" in PerlGuts Illustrated because he wanted to use it for diagrams in a forthcoming PugsGuts Illustrated. Ingy said that Gisle had hand hacked PostScript based on initial diagrams drawn on graph paper. After some discussion, the plan seems to be that Yuval will just draw diagrams, scan them, and bung them into the Pugs repository. He'll rely on the LazyWeb to turn them into beautiful scalable graphics.

Perl Development Server

Okay everyone, repeat after me: "Juerd is a star!"

You may ask me why, and I shall tell you.

Juerd and his cosponsors, Twistspace, will make a Perl 6 development server available over the internet to any Perl 6 developers who are working on "everything that improves Perl 6 development". If you've been put off working on Pugs by the hassles of getting Haskell working on your machine, or if you have the kind of bandwidth that makes svn updates a painful prospect, worry no longer. Just sign up for a development account.

There was much rejoicing and suggesting of hostnames. Rather bizarrely, there was also discussion of the etymology of "sipuli" (Finnish for "onion", in case you were wondering).

Two Releases in One Day

Autrijus announced the release of Pugs 6.2.4. About half an hour later he announced the release of Pugs 6.2.5.

Undef Issues

Adrian Taylor thought he'd found some issues with Perl 6's understanding of undef. It turned out that he'd found some issues with his own understanding of same.

Method/Attribute Chaining

Alex Gutteridge found some weirdness with the chaining of autogenerated attribute methods (I wonder if the same weirdness occurs with hand rolled attribute methods). So far it remains unfixed, but given the speed of Pugs development I doubt it'll stay that way for long.

Meanwhile, in perl6-internals

Parrot as an Extension Language

Colin Adams continued to have problems using Parrot as an extension language for Eiffel. It turns out that interfacing between statically strongly typed languages and Parrot isn't easy.

Fixing t/src/manifest.t

Dino Morelli reported problems with t/src/manifest.t and wondered how some of the failures came about. Jürgen Bömmels thought that the problem was an overzealous test--the original version of which simply ensured that version control and the MANIFEST were in sync. He provided his suggested version of a less eager, but still svn compatible test. Further discussion thrashed out the various different use cases for manifest checking.

More t/p6rules Tests

Dino Morelli posted a bunch of tests for the Perl 6 rules. Well, he did once he'd done battling his mailer's somewhat bizarre choice of MIME type for his test files. Remember, if you're about to attach a .t file to a message you send to the list, make sure your mailer doesn't declare it to be an application/x-troff file--text/plain is your friend.

Then someone applied his patches.

Stressing the Hash

Leo asked for some stress and benchmark tests for hashes because he was in the process of redoing src/hash.c. Bob Rogers provided one.

In Other News, PyPy Gets an Initial Release

Leo crossposted the announcement of PyPy 0.6, a Python implementation written in Python. It's not bootstrapping yet, but it's getting there...

PIR Compilers Broken

Will Coleda had some problems with his TCL in PIR implementation. Nick Glencross helped to track down the problems with the snippet he posted. I'm not sure whether his fix is extendable to work with the real ParTCL.

MMD

While working on mod_parrot, Jeff Horwitz ran into an issue with Multi Method Dispatch (MMD). In particular, he didn't seem to be able to declare a multimethod that accepted an arbitrary PMC. Leo asked for a .t file so he could explore the issue further, which Jeff provided.

State of ParTCL

Will Coleda posted a summary of the current state of ParTCL. By the sound of things, it's getting there.

Meanwhile, in perl6-language

Virtual Methods

Whilst noting that Perl 6 doesn't really need to be able to declare methods as virtual in the same way as C++, since one can simply use the handy ... to do the job, Aaron Sherman noted that there was a case for something similar when declaring "stub" methods that Roles could override. The idea being that you would implement an initial behaviour that you could further decorate by a Role. Except, as Luke pointed out, the Roles system as currently defined treats all such methods as overridable.

Aaron wasn't sure that this was such a good idea and produced code to illustrate why.

Default Precedence of User-Defined infix ops

Ingo Blechschmidt pointed out that user defined infix operators work in Pugs now. He wondered what their default precedence should be and how to define the precedence explicitly. Luke came forth with an answer, Sam Vilain asked an evil question, and Damian Conway suggested that, given how drastically precedence weirdness can mess with a programmer's head, there shouldn't be a default precedence at all, and if there were, it should be looser than infix:<+>. I agree with Damian.

(1,(2,3),4)[2]

Argh! My head hurts!

However, if you're not sure how context works in Perl 6, Juerd provided a really good summary later in the thread.

Reduce Metaoperator on an Empty List

Matt Fowles wondered how the shiny new reduce metaoperator worked given an empty list. Various suggestions were forthcoming, but I lean towards Randal's inject solution--but I'm a Smalltalk fan, so there's no surprise there. Personally, I reckon that the metaoperator version should just return undef given an empty list. If you want anything clever you should eschew the syntactic sugar and use inject or something like it. It seems that the consensus leans towards using an identity attribute on the infix operator.

BTW, what happens when you apply [/] to a list with one element?
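
Perl 5's List::Util already has to answer the same questions: its reduce returns undef for an empty list and hands back a lone element untouched. The sketch below layers the "identity attribute" idea on top of it. The %identity and %op tables and the name reduce_with_identity are hypothetical, invented here for illustration; they are not anything the design team has specified.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Hypothetical identity table: what a reduction yields for an empty list.
my %identity = ( '+' => 0, '*' => 1 );

my %op = (
    '+' => sub { $_[0] + $_[1] },
    '*' => sub { $_[0] * $_[1] },
    '/' => sub { $_[0] / $_[1] },
);

sub reduce_with_identity {
    my ( $name, @list ) = @_;
    my $code = $op{$name} or die "unknown operator '$name'";

    # Operators with no identity entry (such as '/') give undef here.
    return $identity{$name} unless @list;

    # With a single element, reduce returns it without calling the block.
    return reduce { $code->( $a, $b ) } @list;
}

print reduce_with_identity( '*', 2, 3, 4 ), "\n";    # 24
print reduce_with_identity('+'), "\n";               # 0
```

Under these assumptions, the one-element [/] case simply returns that element.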

Complex Arithmetic

Doing complex arithmetic right in a programming language is tricky. Edward Cherlin wondered if Perl 6 should follow what seems to be the consensus among programming languages that care about this sort of thing and use the shared definitions used by APL, Common LISP, and Ada. Luke thought it might be better to leave this to C6PAN. (This is the discerning language designer's equivalent to paying no attention to the man behind the curtain methinks).

Syntax for Specifying Role Parameters

Ingo Blechschmidt wondered if the syntax for specifying role parameters should be the same as the standard subroutine signature syntax (with a slightly modified proposed meaning for :). Thomas Sandlaß had some related suggestions to add. Nothing from any of the design team yet.

./method

Martin Kuel can't make himself like ./method as a shortcut for $whatever_you_call_self_this_week.method. Frankly, I can't blame him, but then I continue to think that the originally specified semantics of .method (calls method on $_, whether in a method or a sub, or anywhere else for that matter) are fine.

uniq

Ingo wondered why uniq wasn't in the current draft of Synopsis 29. He also wondered if its default comparator should be =:=. It turns out that there's rather more to the semantics of uniq than you'd expect; Damian's "hyper correct" implementation blew my mind.

s/.../{ $junction }/

Junctions? In substitutions? What is Ingo thinking? Warnock applies.

Argument Type Checking

Joshua Gatcomb sought reassurance that

sub foo (Int $bar) { say $bar }
foo 'hello';

would do the right thing, namely throw an exception. Luke reassured him.

How to Create a New Meta Operator

Ingo's obviously been in a very wondering mood this week. This time he wondered if he could create a new meta operator in the obvious (to anyone who's read Apocalypse 12 carefully) way. What? You've not read Apocalypse 12 carefully? Shame on you! Like this:

sub infix_circumfix_meta_operator:{'->', '<-'} (Code &op, $left, $right)
{ op $left + 1, $right + 1 }

say 2 ->+<- 3; # 7

Luke thought so, but threw in his own question about how we'd specify meta operators that only work on particular classes of operators.

How to Invoke a Method Reference

Continuing to mine his wondering vein, Ingo asked how to invoke method references. Juerd thought it'd work pretty much as it does in Perl 5.

Junctive and Higher-Order Types

Sam Vilain's question about converting a Haskellish chunk of code into Perl 6 went Warnocked.

foo(1: 2: 3: 4:)

I was so tempted to use "Multimethod colonoscopy" as the heading for this section. Aren't you glad I resisted?

Autrijus has started to implement multi-level invocants in MMDs. He asked a bunch of sanity check questions before proceeding. Luke and Damian provided the sanity.

Lazy Context

Borrowing Ingo's wondering hat, Yuval Kogman had questions about the semantics of laziness. Laziness is one of many features of Perl 6 that's reasonably easy to understand from the point of view of the user, but which is a big old can of worms from the point of view of the implementer. I think I understood Yuval's proposed semantics/implementation, but it stumps me when it comes to summarizing it. People seemed to like it though.

Declaration of my() Variables Using Symbolic Referentiation

Snatching back his wondering hat, Ingo asked a question I didn't understand about declaring my variables using symbolic referentiation. Frankly, I don't even understand the subject. The consensus of those responding seemed to be that what Ingo wanted to do was pretty silly in the first place.

Explicit Laws about Whitespace in Rules

Jeff "japhy" Pinyan wanted to know what the rules were about permitting whitespace in rules. In particular, was it legal to write \c [CHARACTER NAME], or must he write \c[CHARACTER NAME]? Damian reckons only the second is legal.

And We're Done

That's it for another week. Tune in at the same time next week when Mr. Fowles will entertain you all with his interpretation of the coming week's events. I'll be back here the week after that.

If you find these summaries useful or enjoyable, please consider contributing to the Perl Foundation to help support the development of Perl.

Or, you can check out my website. Maybe now that I'm back writing stuff I'll start updating it. There are also vaguely pretty photos by me.

Manipulating Word Documents with Perl

In a recent lightning article, Customizing Emacs with Perl, Bob DuCharme explained how to use the Emacs shell-command-on-region function to invoke a Perl script on a marked region of text. Bob writes that he was reluctant to invest the time needed to write the elisp code needed for a particular string manipulation, especially when he knew how much easier it would be for him to do that manipulation with Perl. However, by using the Emacs function shell-command-on-region, Bob could have his cake and eat it too--keep editing with Emacs, while using Perl on demand for string manipulation.

I've often been in the same boat as Bob, though while using Microsoft Word. When facing a thorny string manipulation problem, I, too, have found myself thinking, This would be so easy if I could just use Perl!

Unfortunately, Word VBA doesn't include a feature analogous to the shell-command-on-region function ...which certainly sounded like a challenge to me. In Word Hacks, Sean M. Burke and I demonstrated how to use the Windows clipboard as a primitive but simple means of exchanging data between Word and a Perl script. I decided to see if I could generalize that technique to emulate the Emacs shell-command-on-region function from Word, using VBA and Perl. I'm happy to report that it works. Using the code shown in this article, you'll be able to run any DOS command that accepts standard input (like most in the fabulous UnxUtils package) on text you've selected in Microsoft Word, then either have the output echoed back or use it to replace the selected text. As an example, you can use the code in this article to run Bob's OLDate2ISO.pl script on a selected date. (The OLDate2ISO.pl script converts a Microsoft Outlook style date, like "Sat 4/16/2005 7:35 PM" to ISO 8601 format, like "2005-04-16T19:35".)

VBA doesn't offer any easy way to capture shell command output, so there are actually two scripts needed to emulate the Emacs shell-command-on-region function: a Perl script to interact with the DOS shell, and a VBA function to manage that Perl script and deal with the output. In addition, I also wrote a simple wrapper macro that emulates the interactive (Escape+|) form of the shell-command-on-region function, using an input box, as shown in Figure 2.

The clip2shell.pl script

It's true that VBA does include a Shell() function, but when using that function, there's no way to capture the output (if any) of the command invoked. To capture the output of a command run on the DOS shell, I've used a simple Perl script, named clip2shell.pl, that takes one argument: a single string containing the name of the command to run (including any arguments). The current contents of the Windows clipboard are the input to that command, and the command's output is then put on the clipboard. (The Win32::Clipboard module is included in the standard ActivePerl distribution.) Here's the code:

# clip2shell.pl 
# A script for running a DOS shell command
# on the contents of the clipboard, then
# putting the command's results back
# on the clipboard
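use strict;
use warnings;
use Win32::Clipboard;

my $TEMP          = $ENV{'TEMP'};
my $semaphore     = "$TEMP/$$.tmp";
my $cliptext_file = "$TEMP/$$.cliptext.tmp";

my $clipboard     = Win32::Clipboard();
my $cliptext      = $clipboard->Get();
my $shell_command = shift @ARGV;

# Save the clipboard text to a temporary file for the command to read
open my $fh, '>', $cliptext_file or die "Can't write $cliptext_file: $!";
print $fh $cliptext;
close $fh;

# Quote the file name, since TEMP paths often contain spaces
my $output = `$shell_command "$cliptext_file"`;
$clipboard->Set($output);

unlink($cliptext_file);
rmdir($semaphore);    # tells the VBA side that the result is ready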

Note that before exiting, the script deletes a folder named with $semaphore, which contains the script's PID, appended with a .tmp extension, located in the current user's TEMP folder. This is how clip2shell.pl tells VBA that it's safe to read the result back from the clipboard. As I mentioned earlier, the VBA Shell() command doesn't return the command's output--but it does return the command's PID.
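
The waiting half of that handshake lives in VBA, but the idea is easy to try out in Perl alone. The following is only a sketch of the polling logic, not part of either script: wait_for_release is a hypothetical name, and the demonstration uses a scratch directory from File::Temp rather than the real TEMP folder.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Time::HiRes qw(sleep time);

# Wait until the semaphore folder $dir disappears, or give up after
# $max_wait seconds. Returns 1 if it was released, 0 on timeout.
sub wait_for_release {
    my ( $dir, $max_wait ) = @_;
    my $start = time();
    while ( -d $dir ) {
        return 0 if time() - $start > $max_wait;
        sleep(0.05);    # poll roughly twenty times a second
    }
    return 1;
}

# Demonstration in a scratch directory (assumption: any writable folder will do).
my $scratch   = tempdir( CLEANUP => 1 );
my $semaphore = File::Spec->catdir( $scratch, "$$.tmp" );

mkdir $semaphore or die "Can't create $semaphore: $!";
print "still held\n" unless wait_for_release( $semaphore, 0.2 );

rmdir $semaphore or die "Can't remove $semaphore: $!";
print "released\n" if wait_for_release( $semaphore, 0.2 );
```

A folder works as the signal because creating and removing it are single operations that either side can check for without opening any file.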

The shell-command-on-region function, VBA style

The VBA function, ShellCommandOnRegion, manages the interaction between Word and the clip2shell.pl script. When this function invokes clip2shell.pl using the VBA Shell() function, it also creates the semaphore folder, named using clip2shell.pl's PID (the return value of the Shell function), then waits for the deletion of that folder before getting the output from the clipboard. At that point, it either echoes back the output (using the Status Bar, or, for output more than 80 characters long, a separate message box), or pastes it back into the document, replacing the current selection. The optional bReplaceCurrentSelection argument, which is False by default, controls the output method.

Function ShellCommandOnRegion(sShellCommand As String, _
               Optional ByVal sngMaxWaitSeconds As Single = 5, _
               Optional ByVal bReplaceCurrentSelection As Boolean = False) As Boolean
Dim lPID As Long
Dim sSemaphore As String
Dim sngStartTime As Single
Dim oClipboard As New DataObject
Dim sel As Selection
Dim sResult As String

Set sel = Selection

If sel.Type = wdSelectionIP Then
    StatusBar = "Please select some text first."
    Exit Function
End If

sel.Copy

sngStartTime = Timer

lPID = Shell("perl c:\clip2shell.pl " + Chr(34) + sShellCommand + Chr(34))
sSemaphore = Environ("TEMP") + "\" + CStr(lPID) + ".tmp"
MkDir sSemaphore

Do: DoEvents
Loop Until Len(Dir(sSemaphore, vbDirectory)) = 0 _
            Or ((Timer - sngStartTime) > sngMaxWaitSeconds)

If Not Len(Dir(sSemaphore, vbDirectory)) = 0 Then
    RmDir sSemaphore
    Exit Function
End If

oClipboard.GetFromClipboard
sResult = oClipboard.GetText

If bReplaceCurrentSelection Then
    Selection.Paste
Else
    If Len(sResult) > 80 Then
        MsgBox sResult
    Else
        StatusBar = sResult
    End If
End If

ShellCommandOnRegion = True
End Function

If there's a problem and clip2shell.pl doesn't delete the semaphore folder within a certain amount of time (the default shown here is 5 seconds, as represented with the optional sngMaxWaitSeconds argument), ShellCommandOnRegion gives up, deletes the semaphore folder itself, and returns a value of False.

NOTE: Before using this code, make sure you have a reference set to the Microsoft Forms 2.0 Object Library, as shown in Figure 1. Without this reference set, the DataObject object type won't be available to your code, which will cause a compilation error. To reach this dialog from Word, choose Tools, Macro, Visual Basic Editor. From within the Visual Basic Editor, choose Tools, References.

Figure 1. Setting a reference to the Microsoft Forms 2.0 Object Library.

Putting it Together

With the clip2shell.pl script and the ShellCommandOnRegion function set up, it's time to demonstrate how to use Bob DuCharme's OLDate2ISO.pl script on text selected in a Word document. First, I want to show a macro that runs OLDate2ISO.pl on selected text, then I'll show a more generalized macro, which emulates the interactive (Escape+|) version of the Emacs shell-command-on-region function.

Here's the OLDate2ISO.pl script, modified slightly to make its substitution globally on all the dates in its input:

# Convert Outlook format date to ISO 8601 date
# (e.g. Wed 2/16/2005 5:27 PM to 2005-02-16T17:27)
while (<>) {
  s{\w+ (\d+)/(\d+)/(\d{4}) (\d+):(\d+) ([AP])M} {
     # 12 o'clock becomes 0, so 12 AM is 00 and 12 PM is 12
     my $hour  = $4 % 12;
     $hour    += 12 if $6 eq 'P';
     sprintf( '%04d-%02d-%02dT%02d:%02d', $3, $1, $2, $hour, $5 );
  }gse;
  print;
}
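
Wrapping the same substitution in a function makes it easy to check the awkward cases, such as times in the 12 o'clock hour. This is a minimal sketch rather than Bob's code: outlook_to_iso is a hypothetical name, and it assumes 12 AM should map to hour 00 and 12 PM to hour 12.

```perl
use strict;
use warnings;

# Convert every Outlook-style date in $text to ISO 8601 (minute precision).
sub outlook_to_iso {
    my ($text) = @_;
    $text =~ s{\w+ (\d+)/(\d+)/(\d{4}) (\d+):(\d+) ([AP])M}{
        my ( $month, $day, $year, $hour, $min, $ampm ) = ( $1, $2, $3, $4, $5, $6 );
        $hour %= 12;                     # 12 o'clock becomes 0 before the PM shift
        $hour += 12 if $ampm eq 'P';
        sprintf '%04d-%02d-%02dT%02d:%02d', $year, $month, $day, $hour, $min;
    }ge;
    return $text;
}

print outlook_to_iso("Sat 4/16/2005 7:35 PM"), "\n";    # 2005-04-16T19:35
```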

Here's the OLDate2ISO macro. After putting it in the template or document of your choice, run it by pressing Alt-F8 (which is the same as choosing Tools -> Macro -> Macros), or by assigning it to a menu or keyboard shortcut:

Sub OLDate2ISO()
If ShellCommandOnRegion("perl c:\OLDate2ISO.pl", , True) = False Then
    MsgBox "Couldn't fix selected dates"
End If
End Sub

A dedicated macro is great for frequently used Perl scripts (or other shell commands), but it's also helpful to be able to run an arbitrary shell command on the selected Word text, and have the text echoed back, using the Status Bar (or, if longer than will easily fit on the Status Bar, in a separate message box--the Status Bar/message box route was the closest I could get to an Emacs mini-buffer).

Here's the code for the general-purpose macro, named RunShellCommandOnSelection:

Sub RunShellCommandOnSelection()
Static sShellCommand As String
sShellCommand = InputBox(prompt:="Enter full Shell Command to run on selection:", _
                Title:="ShellCommandOnRegion", _
                Default:=sShellCommand)

If Len(sShellCommand) = 0 Then Exit Sub

If Selection.Type = wdSelectionIP Then
    MsgBox "Please select some text first."
    Exit Sub
End If

If ShellCommandOnRegion(sShellCommand) = False Then
    MsgBox "Shell command failed"
End If
End Sub

Running this macro brings up the dialog shown in Figure 2.

Figure 2. The dialog that appears when running the RunShellCommandOnSelection macro.

By using a Static variable declaration, the command entered will be "sticky" within Word sessions (to some extent), making it easy to re-run the same command multiple times. If you wanted to replace the selected text instead of echoing the result, modify the RunShellCommandOnSelection macro by adding the optional bReplaceCurrentSelection argument to the ShellCommandOnRegion call (leaving a blank value for the sngMaxWaitSeconds argument):

If ShellCommandOnRegion(sShellCommand, ,True) = False Then

Word may not be quite as customizable as Emacs, but by emulating a single Emacs function using Perl and VBA, it's possible to add a powerful new tool to Word, and in doing so, give any Perl refugees who find themselves in the Word world--whether by choice or circumstance--some of the comforts of home.

Build a Wireless Gateway with Perl

You have set up and configured various wireless access devices, but could not find one that included all of the features you needed. You could wait for a firmware upgrade from your manufacturer, hoping that they will include the features you want. However, your chances of finding all of your issues addressed by a new firmware package, if it ever comes out, are slim to none. Now is the time to roll up your sleeves and build your own wireless access gateway from scratch. Don't let this idea scare you; this is all possible thanks to the open source world.

This article introduces an open source project called AWLP (Alptekin's Wireless Linux Project), which turns a PC with an appropriate wireless LAN card (Prism2/2.5/3) into a full-featured, web-managed wireless access gateway. That old Pentium 120 machine in your basement might march back up the stairs shortly.

Building your own wireless access device is nothing new. Around three years ago, Jouni Malinen released HostAP, a Linux driver for wireless LAN cards with Intersil's Prism2/2.5/3 chipset. When operated in a special mode called Master, the host computer acts as a wireless access point. HostAP does its job and does it well, but it is command line only, and that's not suitable for everyone. A complete solution that potentially competes with off-the-shelf devices needs more features, including DHCP, firewall, DNS, and NTP, along with a web interface for configuration. This is the de facto standard nowadays.

To use a car analogy, this is like building a custom car; you want to be able to put in a new clutch, change the suspension, or try a new braking system and see how it performs. In the case of the gateway, you might want to implement an outgoing port-based filter in addition to the incoming one in the firewall. You may want to try a different DNS server or develop a special module that will block MAC addresses through ACLs after exceeding a certain amount of bandwidth in a short period of time.

For AWLP, the chassis is GNU/Linux Slackware, the engine is Host AP Driver, the transmission is Host AP Utilities, and so on. AWLP code, written in Perl, is the part that makes all these components work together in harmony as a wireless access gateway. After you set up AWLP, you will have a functioning, preconfigured, wireless access gateway to start with. Then you can start modifying the Perl code and configuration files to test and implement the extra capabilities.

Hardware Requirements

You need a dedicated machine for this task. Running with less than 32 MB of RAM will be painful; the system is more RAM-intensive than it is CPU-intensive so you can run it even on a 486 processor--but be aware that most 486 machines have only ISA slots. You might have problems finding ISA 10/100 Mbps Ethernet cards and ISA wireless LAN cards on the market. An old Pentium machine with at least 1GB HDD is ideal for this task.

If you plan on running your wireless device on a 24/7 basis, and if availability and portability are major concerns, you might consider dedicated hardware. Check out OpenBrick Community for compact hardware platforms with HDD support.

In addition to the computer, you also need an Ethernet card compatible with Linux, a PCI or PCMCIA wireless LAN card that has a Prism2/2.5/3 chipset, and a PCI-to-PCMCIA converter, depending on your choice of wireless LAN card and the slots on your board. Refer to the AWLP Hardware Compatibility section for in-depth information on choosing these three components.

Installing AWLP

There are three phases of installation:

  1. Custom installing Slackware with tagfiles
  2. Upgrading packages in Slackware
  3. Installing AWLP codes and configuration files

The third step is the easiest one. The first step, installing Slackware with tagfiles provided by the AWLP package, will take most of your time and effort. You must have Slackware installation discs on hand. Consult the Slackware Linux Project site to obtain them. In order to keep this article readable, I again refer you to the related site, the step-by-step AWLP Installation Instructions. After completing the first and second phase of the installation instructions, you can install AWLP, the code, and the configuration files that will make everything work together.

I highly recommend that you take a disk image after successfully installing AWLP and before you start to modify the code if you have installed it on a slow machine. Installing the AWLP code and configuration files takes no more than a couple of minutes, but in order to reach this step, you must have custom-installed Slackware with the tagfiles provided, and this might take a couple of hours on an old (<200MHz) Pentium machine. Taking a disk image and restoring from that image when needed will be a lot easier and quicker than starting afresh from Step 1.

Under the Hood

AWLP uses several configuration files:

  • DHCP, the Dynamic Host Configuration Protocol: /etc/dhcpd.conf
  • Apache Web Server: /etc/apache/httpd.conf
  • DNS, the Domain Name System: /etc/named.conf, /var/named/caching-example/localhost.zone, /var/named/caching-example/named.ca, and /var/named/caching-example/named.local
  • NTP, the Network Time Protocol: /etc/ntp.conf
  • Firewalling: /etc/rc.d/rc.firewall

In addition to these standard configuration files, there are some custom configuration files such as /etc/awlp/oui_filtered.txt. This file contains the filtered version of oui.txt, representing the IEEE Organizationally Unique Identifiers. It makes it possible to find the company manufacturing a specific Ethernet card, wired or wireless, using the first 24 bits of the MAC address. In order to have an in-depth knowledge of all the configuration files and their layouts, I encourage you to examine installer.sh from the AWLP package.
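
To make the 24-bit lookup concrete, here is a minimal sketch of how such a lookup could work. It is not AWLP's code: the subroutine names are hypothetical, the hash stands in for whatever oui_filtered.txt actually contains, and the vendor names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the data in oui_filtered.txt.
# The vendor names are invented for this example.
my %vendor_for = (
    '001122' => 'Example Vendor A',
    'AABBCC' => 'Example Vendor B',
);

# Return the first 24 bits (six hex digits) of a MAC address, upper-cased.
sub oui_of {
    my ($mac) = @_;
    ( my $hex = uc $mac ) =~ s/[^0-9A-F]//g;    # drop ':' '-' '.' separators
    return unless length($hex) == 12;           # not a 48-bit MAC address
    return substr( $hex, 0, 6 );
}

sub vendor_of {
    my ($mac) = @_;
    my $oui = oui_of($mac);
    return 'unknown' unless defined $oui;
    return exists $vendor_for{$oui} ? $vendor_for{$oui} : 'unknown';
}

print vendor_of('00:11:22:33:44:55'), "\n";    # Example Vendor A
```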

AWLP code resides in /var/www/cgi-bin/awlp. The core of the AWLP code is index.pl. It does all of the error checking, manipulation, and modification. The web server, Apache 1.3.33, runs as the user and group apache. In order to manipulate configuration files through the web browser, the related configuration files have suid and sgid permissions set. This strategy is definitely more secure than running the web server as root.

The part that interacts with HostAP command line tools and utilities, along with standard Linux tools and utilities, is engines1.pl. It serves as an include file and contains various subroutines to do the work. engines2.pl also contains various subroutines, but these usually do sorting, searching, and conversion. radar.pl provides status checking and monitoring functionality. It's not crucial to the operation of the wireless access gateway, but it definitely does add value, because monitoring how your device is performing is a key factor in the success of your implementation.

extras.pl provides ASCII to HEX conversion, DHCP lease table display, and several other non-core functions. As the name implies, error_messages.pl has all the error messages and their descriptions. global_configuration.pl has pretty much all the configuration variables ranging from critical to non-critical. You must understand the inner workings of the system in order to change the configuration variables up to $MAIN_TITLE. $MAIN_TITLE and the following variables are also necessary, but for mostly cosmetic purposes, so you can customize them without worrying about accidentally disabling needed features.
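
As an illustration of the kind of helper extras.pl provides, an ASCII to HEX conversion takes only a line of Perl. This sketch is not taken from AWLP; the subroutine names are hypothetical and it relies on nothing beyond the built-in pack and unpack.

```perl
use strict;
use warnings;

# Hex-encode an ASCII string, two hex digits per character.
sub ascii_to_hex {
    my ($text) = @_;
    return uc( unpack( 'H*', $text ) );
}

# Reverse the conversion.
sub hex_to_ascii {
    my ($hex) = @_;
    return pack( 'H*', $hex );
}

print ascii_to_hex('AWLP'), "\n";    # 41574C50
```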

Sample Modification

Now you know what AWLP is and what it can do. What about all these modifications and customizations? It is time to walk through, step by step, adding a feature to AWLP. As of AWLP 1.0, you can configure the DHCP server only by manually changing /etc/dhcpd.conf. The feature I want to demonstrate will simply show the contents of the file. Once you are familiar with the code, you can improve it to modify the contents of dhcpd.conf.

The Apache web server runs as the user and group apache. The result of ls -l on /etc/dhcpd.conf shows:

# ls -l /etc/dhcpd.conf
-rwxrwx---  1 root apache 618 Mar 5 12:41 /etc/dhcpd.conf*

The script will be able to read and modify the /etc/dhcpd.conf file. Open the index.pl file. At the top, there are two configuration variables: @MainPageLinksAction and @MainPageLinksName. These two control the links on the left side. To add an additional link for DHCP, add DHCP to both arrays.

Next, find the line that says:

elsif ($FORM_Action1 eq "Administration") {

Just before this line, add the following lines of code, then save and close the file.

elsif ($FORM_Action1 eq "DHCP") {

    my $DHCPConfigFileContent;

    # Declare the lease range first so the fallback values stay visible below
    my ($DHCPLeaseRangeStart, $DHCPLeaseRangeEnd) = ("N/A", "N/A");

    if (open(FILE, "<", "/etc/dhcpd.conf")) {
        local $/;    # slurp the whole file in one read
        $DHCPConfigFileContent = <FILE>;
        close(FILE);
    }

    unless ($DHCPConfigFileContent) {
        $DHCPConfigFileContent = "/etc/dhcpd.conf could not be read or it is empty!";
    }

    if ($DHCPConfigFileContent =~ m/Range\s+(\d+\.\d+\.\d+\.\d+)\s+(\d+\.\d+\.\d+\.\d+)/i) {
        $DHCPLeaseRangeStart = $1;
        $DHCPLeaseRangeEnd   = $2;
    }

    $Right_Plane_Output .=<<HTMLCODE;
    <TABLE ALIGN=CENTER CELLSPACING=3 CELLPADDING=3 BGCOLOR="#000000">
    <TR BGCOLOR="#FFFFFF">
            <TD ALIGN=LEFT>
            <font face="Helvetica, Arial, Sans-serif, Verdana" size="2">
            <TABLE CELLSPACING=0 CELLPADDING=0>
            <TR>
                    <TD>
                    <font face="Helvetica, Arial, Sans-serif, Verdana" size="2">
                    <PRE><CODE>
                    ${DHCPConfigFileContent}
                    </CODE></PRE>

                    <BR><BR><BR>
                    <font color="#FF0000">Lease Range:</font> 
                    <B>${DHCPLeaseRangeStart} to ${DHCPLeaseRangeEnd}</B>
                    <BR><BR><BR>
                    </TD>
            </TR>
            </TABLE>
            </TD>
    </TR>
    </TABLE>
HTMLCODE

}

Once you make this modification, there will be a new menu section on the left side with the name DHCP. Clicking the DHCP link on the left will show you the contents of /etc/dhcpd.conf and the Lease Range values, which come from the contents of the file through a simple regular expression.
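
If you want to experiment with that regular expression outside the web interface, it can be pulled into a small function. The sketch below is not part of AWLP: lease_range is a hypothetical name, and the configuration fragment is invented purely to exercise it.

```perl
use strict;
use warnings;

# Return (start, end) of the first "range" statement, or ("N/A", "N/A").
sub lease_range {
    my ($conf) = @_;
    if ( defined $conf
        && $conf =~ m/range\s+(\d+\.\d+\.\d+\.\d+)\s+(\d+\.\d+\.\d+\.\d+)/i )
    {
        return ( $1, $2 );
    }
    return ( 'N/A', 'N/A' );
}

# Hypothetical dhcpd.conf fragment, used only to try the function out.
my $sample = <<'CONF';
subnet 192.168.1.0 netmask 255.255.255.0 {
    range 192.168.1.100 192.168.1.200;
    option routers 192.168.1.1;
}
CONF

my ( $start, $end ) = lease_range($sample);
print "Lease range: $start to $end\n";    # Lease range: 192.168.1.100 to 192.168.1.200
```

Note that the pattern takes the first range statement it finds, so it will also match one that has been commented out; a fuller version would skip comment lines first.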

Conclusion

You will have a functioning wireless access gateway once you install AWLP. The example above shows how easy it is to add features to this software-based wireless access gateway, provided that you are familiar with Perl. However, to accomplish more useful modifications tailored to your needs, you should examine index.pl, the other core and include scripts, and the configuration files.

This Week in Perl 6, May 3, 2005 - May 17, 2005

All~

Welcome to another fortnight's summary. Wouldn't it just figure that I can't think of anything sufficiently non-sequiturish to amuse myself? Perhaps I need a running gag like Leon Brocard or chromatic's cummingesque capitalization. Maybe I should start one and not tell you. That could be fun.

Perl 6 Compiler

Pugs commit emails

If you have ever been foolish enough to want an email for every commit in Pugs, Sam Vilain created a way to help you sip from the firehose. Have fun.

given when nested

Luke Palmer had a question about how nested when statements in a given block should act. His intuition disagreed with Pugs, but most others supported Pugs.

I don't need to walk around in circles

Autrijus has made Pugs into a registered compiler for Parrot. Because Pugs already allowed you to embed Parrot code (PIR anyway) directly into Perl 6, this allows you to embed the Perl 6 in your PIR in your Perl 6. Now the possibilities are endless, at least until you blow your mental stack. Those of you with tail call optimization in your mental stack may simply go into an infinite loop if you prefer.

xor on lists

Trewth Seeker expressed his opinion about the proper definition of xor quite strongly. Unfortunately, his opinion is at odds with established mathematics, as Mark Biggar pointed out to him.

PGE features update

Patrick provided an update on the state of the Perl Grammar Engine. It has many nifty new features.

Pugs on Cygwin

Rob Kinyon and Gaal Yahas worked to improve Pugs support for Cygwin. Unfortunately the thread winds down with an unanswered question. Fortunately Stevan clued me in on IRC that things are not working just yet.

Pugs gets some objects and some rules

Autrijus announced that Pugs now has basic support for Objects and Rules. Sometimes he scares me, though usually he just makes me really want to learn Haskell.

regression test

Miroslav Silovic provided a regression test for hyper ops. Some people just don't appreciate the fun of regressing.

basic test for classes

Stevan Little provided a patch for a simple object test. Autrijus applied it. That's odd, because I am pretty sure that Stevan has the commit bit...

torturing PGE

Juerd provided a link to a big rule that could segfault PGE. It kind of reminds me of a homework assignment I had to create a regular expression which matched all strings of numbers that did not contain any repeated digits. That's easy in Perl, but hard in math. I think the resultant regex was somewhere around 17 MB.
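For the curious, the "easy in Perl" half of that comparison might look something like the following Perl 5 sketch. It assumes that "no repeated digits" means no digit occurs more than once anywhere in the string, and the function name is made up for illustration. The backreference inside the lookahead is exactly the kind of thing a strictly regular expression cannot do, which is why the mathematical version balloons.

```perl
use strict;
use warnings;

# Returns 1 when the string consists only of digits and no digit occurs
# twice, 0 otherwise. The negative lookahead rejects any string in which
# some digit (captured as \1) shows up again later on.
sub has_no_repeated_digit {
    my ($s) = @_;
    return $s =~ /\A(?!.*(\d).*\1)\d+\z/ ? 1 : 0;
}

for my $candidate (qw(1234 1213 0123456789 11)) {
    printf "%-10s %s\n", $candidate,
        has_no_repeated_digit($candidate) ? "no repeats" : "repeated digit";
}
```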

Pugs 6.2.3 with Live CD

Autrijus released Pugs 6.2.3 which contains 10% more awesome than Pugs 6.2.2. You should check it out on the live CD that Ingo Blechschmidt released.

PXPerl meets Pugs

Grégoire Péan announced that he has added Pugs binaries to his Windows distribution of Perl. Pretty cool. Autrijus innocently asked him to take on the slightly larger task of producing binaries of Parrot too, so that Pugs could be at its most powerful.

Parrot

Wow, did you see how I mentioned Parrot before going into this new section? That was an awesome transition! My high school English teachers would be so proud.

character classes

Patrick wants character class opcodes of the form find first and find first not. Leo pointed him to some hysterical raisins who might help.

PGE on MinGW

François Perrad fixed a problem with building PGE on MinGW. Patrick applied the patch.

PIO_fdopen return value

Luke Palmer both introduced me to the wonderfully cute phrase "untodid" and provided a patch making PIO_fdopen return NULL when given bad flags. Leo applied the patch, but Melvin Smith warned that this might be a bad idea. Silence followed.

embedding initialization

Jeff Horwitz was having trouble embedding PIR into C. Leo provided some pointers. Jeff was happy.

Test::Builder updates

Previously, Michael G Schwern announced an update to Test::Builder. chromatic asked if it was worth the upgrade. Michael replied probably, but I don't think anyone has acted on it.

miniparrot

Robert Spier created a miniparrot at Bernhard Schmalhofer's request. This miniparrot does not replace our make system, but it does make our website less camel centric.

Autrijus gets the commit bit

Leo, Autrijus, and Chip had one of the nerdiest conversations ever. The summary of which is that Autrijus gets commit privileges for Parrot. The general consensus was that he was too productive in Haskell and we needed to hobble him with a real man's language like C.

Parrot 0.2.0 "NLnet"

Leo announced the release of Parrot 0.2.0. This one didn't seem to make it to Slashdot. That's kinda sad, because I always get a warm feeling when I know about stuff before /. Oddly, Google Groups seems to have lost the email.

really make realclean

make realclean failed to find a few files: a flaw forcefully fixed by Jerry Gay.

load_bytecode shouldn't segfault

Bob Rogers made it not. Leo applied the patch.

tell me sweet little lies

Patrick put out a request for a rudimentary set of lies and damn lies. People are welcome to provide benchmarks too.

Parrot Panic

Leo found that Parrot was panicking during start up. He rolled that patch back.

make testr

Leo put out a request for a make test target which would invoke Parrot twice, once to compile to PBC and once to run it. Dino Morelli provided a patch. Leo applied it.

trans test failures

Jens Rieks opened a ticket for some failing tests long ago. Now he wondered if there was a status update. Warnock applies.

parrotcode.org update

Leo noticed that parrotcode.org needed a little loving. Robert Spier provided it. He also mentioned that people could provide their own patches for it against https://svn.perl.org/perl.org/docs/live/parrotcode/ . Now is your chance to contribute to Parrot's public face.

runtime/parrot/library search

Jonathan Scott Duff wondered why runtime/parrot/library wasn't in Parrot's search paths. Leo added it for load_bytecode.

on the road to a tiny Parrot

Leo began down the road to miniparrot, creating first a Parrot without a config and using that to generate a config.fpmc for Parrot. The information provided there helps to create a larger Parrot.

MMD pmcs

Bob Rogers posted some questions about how to work with multi subs and provided a preliminary patch. Leo provided some answers but felt the need to pin down the calling conventions before the patch.

commit bit for Matt

Matt Diephouse received a commit bit. Congrats. Leo took the opportunity to remind himself to run make test before committing.

NULL deref in real_exception

Nicholas Clark found a NULL reference in real_exception. Leo explained that he needed to call Parrot_run_native to allocate the exception structure (and set the stack top pointer). Nicholas didn't want to set the stack top as he was tracking some Perl refcount bugs.

PGE::Hs

Autrijus provided a patch to make PGE escape strings as Haskell FFI expects. Patrick suggested a slightly different approach, which Autrijus took.

Bug in Boolean.pmc

John Lenz found and fixed a bug in Boolean.pmc. Leo applied the patch and Juergen Boemmels provided a test.

svn revision number for releases

Andy Dougherty noticed that Configure.pl printed "failed" for release tarballs as they don't have .svn directories. He changed it to print "done". Leo applied the patch.

spawnw @args

Jeff Horwitz provided a patch which allows spawnw to take an array. Leo applied it (with a brief reminder on platform-specific etiquette).

dynclasses build problem on Win32

Jerry Gay fixed a problem building dynclasses on Win32. Leo applied the patch.

basic JIT questions

Millsa Erlas had a few basic questions about Parrot's JIT. Leo provided answers.

@ANON tests and test fixes

Jerry Gay fixed some tests and added some more. Leo applied the patch.

filepath manipulations

Leo put out a request for some filepath and string manipulation support in Parrot.

call syntax abstraction

In a failed attempt to dewarnock himself, Leo reported his call syntax abstraction proposal.

Old Tags

Nick Glencross suggested renaming (or possibly removing) some old tag files from our CVS days. Leo was unsure about the removing option but liked the renaming one.

MinGW build problems

François Perrad provided a patch to fix some build problems on MinGW. Leo applied the patch.

omniscient debugging in parrot

Andy Bach wondered how much of Omniscient Debugging would be possible in Parrot. Leo reasoned that someone could add it with some work. It would involve replacing all mutating vtables with special versions that store extra information to allow them to roll back.

embedding/extending interface

Jeff Horwitz wondered who else was actively working on embedding Parrot. Nicholas Clark provided a very uncertain pointer.

OO support in Parrot

Autrijus explained that Parrot's current implementation made attribute access difficult. Leo went further saying that he felt it was wrong. The consensus is that Parrot needs to allow non-absolute access to attributes, so Leo made it so.

config.t fails

François Perrad found that config.t fails without first doing a make clean. Leo deemed his initial solution a little too quick and too dirty.

find ops return for not found

Patrick provided a patch which changes the return value of find and find_not to the string's length (instead of -1) if the character does not occur. Warnock applies.

clean *_config files

Jerry Gay provided a patch to clean the _config files during make clean. Leo applied it.

MMD for logical ops

Leo changed the logical ops to return one of their operands as appropriate.

warning cleanup

Jerry Gay provided a patch to remove a warning on Win32. Bernhard Schmalhofer applied it.

Namespace updates?

Tim Bunce wondered whether any resolution with respect to namespaces had come about. Leo told him not much.

rules questions

Dino Morelli was trying to add some unit tests when he ran into questions. Patrick provided some answers but suggested further conversation move to p6l, which it did.

MD5 library clean ups and speed ups

Nick Glencross posted some updates to the MD5 library. This led to a few rounds of speeding it up and comparing its speed with various other MD5 libraries. The final result is: slower than C but MUCH faster than pure Perl.

disassemble segfaults

Bob Rogers pointed out that disassemble was segfaulting on some byte code. Leo fixed it.

s/internal (exception)/real \1/

Jerry Gay provided a patch which changed some internal exceptions to real ones. Leo applied it.

failing tests

Tim Bunce reported some failing tests on Mac OS X. Leo fixed them.

small typo in PBC_COMPAT

Uwe Voelker provided a patch fixing a typo in PBC_COMPAT, which chromatic applied. He also noticed that p6rules/*.t did not have plans. Patrick welcomes all updates to PGE tests.

t/p6rules/ws.t

Dino Morelli added some tests for p6rules. Patrick applied the patch.

index up bug in PGE

Jerry Gay found a bug in PGE involving escaping sequences strangely. Leo tracked it down, and Patrick fixed it.

Parrot on Python

Kevin Tew wondered what the state of Python on Parrot was. Sam Ruby and Michal Wallace provided updates. Hopefully it will take off again soon.

paths with spaces need quoting

Ron Blaschke provided a patch to quote some paths that needed it in dynclasses. chromatic wondered if that would break with paths that already contain quotes.

.cvsignore

Juergen Boemmels noticed that the SVN repository still contains some .cvsignore files. He suggested removing them, but a few things need to be updated to the svn world before that can happen. Bernhard Schmalhofer made it happen.

NULL pointer deref

Adrian Taylor found a NULL pointer problem in Parrot. Leo fixed it.

Parrot embedded in XSLT 2.0

Colin Paul Adams wondered how he could get information back from an embedded parrot. Autrijus pointed him to Parrot_call_sub with a signature of SS (takes a string and returns a string).

thread detach hangs on Win32

Jerry Gay noticed that thread detach was hanging on Win32. He provided a patch to skip it so that other tests could fail in its place. Leo applied the patch.

Perl 6 Language

Semantics of Coroutines

Joshua Gatcomb wondered whether coroutines were invokable with new arguments on successive invocations. Some pointed out that allowing the arguments to change is a more powerful model, but I didn't see anything definitive.

function composition operator

Michele Dondi wondered if there was a function composition binary operator. While one does not exist, it's addable (as Ingo Blechschmidt demonstrated much later).
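As a rough illustration of why such an operator is easy to add in user code, here is a Perl 5 sketch of function composition. It is not Ingo's Perl 6 definition; the compose helper and the two sample subs are hypothetical names invented for this example.

```perl
use strict;
use warnings;

# compose($f, $g, $h) returns a sub that computes $f->($g->($h->(@args))).
# The functions are applied right to left, as in mathematical composition.
sub compose {
    my @fns = reverse @_;
    return sub {
        my @args = @_;
        @args = $_->(@args) for @fns;
        return wantarray ? @args : $args[0];
    };
}

my $inc    = sub { $_[0] + 1 };
my $double = sub { $_[0] * 2 };

# Increment first, then double the result.
my $inc_then_double = compose($double, $inc);
print $inc_then_double->(3), "\n";   # (3 + 1) * 2
```

In Perl 6 the same idea could be wrapped in a user-defined infix operator, which is what makes it "addable".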

initialization of state vars

Ingo Blechschmidt wondered how state vars and parentheses would interact. Larry replied that his examples were probably correct.

==> automap?

Brad Bowman wondered if a single arg sub or block would automatically map when used on the sharp side of a pipe. Luke Palmer thought this might be too much dwimmery.

refactoring IDE

J Matisse Enzer wondered if Perl 6 would have strong IDE tools like refactoring supported or automated syntax completion. Larry explained that he would like to make it possible for Perl 6 to support these things "just as Perl 1 built in all the system interfaces".

reduce precedence

Juerd wondered what the precedence for the reduce metaoperator was. Luke Palmer said "listop".

piping into random things

Juerd wondered about piping into various things such as arrays, scalars, hashes, and filehandles. Larry gave one of his characteristically speculative answers.

That's why they call me Mister Bitterness

Juerd wondered what "complain bitterly" meant in the context of the yada operator. Larry explained that ... would fail, ??? would warn, and !!! would die.

isa specifics

Stevan Little wondered how isa would act when called with junctions, nothing, classes, or instances. Larry explained that it would act intelligently by autothreading, returning a list of all options, returning a bool, or something I didn't follow.

available operators

Juerd created a somewhat lengthy list of available operators, hoping to inspire someone to come up with a good operator for block labels. This led to a very meandering thread.

reduce meta operator

Some of you might be confused by my earlier mention of a reduce metaoperator. Larry introduced one. Much debate ensued, but Larry seems fairly set on it.

override built ins

Andrew Savige wondered if he would be able to redefine built in functions such as read in Perl 6. Larry explained that Perl 6 will give you so much rope that you could hang yourself from several trees while blowing off your own foot with it.

opening stdout

Gaal Yahas wondered how to open stdout or a file named "-". Larry explained that io() would have the dwimmy parts like opening stdout, while open would not try and dwim.

adverbial blocks explained

Terrence Brannon stumbled upon the phrase "adverbial blocks" but didn't understand what it meant. Luke Palmer provided a very clear and cogent explanation.

circular dereferencing

Autrijus noticed that the autodereferencing of references would cause an infinite loop for circular references. Larry recanted and decided that the full on autodrill down was not as cool as he had initially thought, but he did warn us that next week he might think it was even cooler.

scoping of $/

Ingo Blechschmidt wondered what sort of scope $/ would have. Luke Palmer replied that it would be lexical just like Perl 5. Larry corrected him pointing out that it would actually be lexical, unlike Perl 5.

binding subs' return values

Joshua Gatcomb wondered what binding of subs' return values would do by default. Juerd answered that it would allow modification only for subs declared as lvalues.

use fatal, no fatal, exceptions, and undef

Aaron Sherman wondered how the fatality levels of various scopes would interact. Luke Palmer explained that you need to do 360s on the control pad while holding block. He also provided some thorough examples.

XML grammar in Perl 6

A while back Juerd wrote a Perl 5 script to transform the EBNF spec of XML into Perl 6 rules. Now that Pugs might be able to support it, he suggests that it would be a good project for a brave soul. No takers have yet appeared.

mailing list indexing

Aaron Sherman posted a link to his initial version of an annotated version of the mailing list. He asked for comments, but Warnock applies.

prefix adverbs

Someone who posted to Google Groups (receiving the unfortunate name mangling of arcadi.sheh...@gmail.com) asked if it made sense to write $a = stuff @foo, how => 'scrambled', 1, 2, 3; as $a = :>how('scrambled') stuff @foo, 1, 2, 3; or some such. Sadly, we will never know.

semantics of split

Autrijus asked if he had Pugs splitting correctly. It wasn't, but it is now.

S29: punt

Rod Adams announced that he found the real world intruding too much and was going to have to leave off his work on S29. Sam Vilain, Aaron Sherman, and Max Maischein all volunteered to take up the effort.

chomp!

Joshua Gatcomb wondered exactly what is chomped did. Larry Wall explained exactly what is chomped does.

character classes

Patrick, based on his experience with PGE, suggested a slightly new syntax for character classes in Perl 6. Larry liked the syntax and went on to muse about other unresolved issues involving character classes.

Numification of match objects

Autrijus noticed that numification of match objects made strings of digits numify to 1 (i.e., true). He didn't like this. Actually no one did, so it has changed to numify as one would expect. After all, it can numify to 0 but true.

traits and properties API

Stéphane Payrard wondered when and how traits would interact with properties. Brent "Dax" Royal-Gordon and Larry provided answers.

single element lists

Jonathan Scott Duff wondered what (1)[0] would do. Larry thought that we would have to specialize ()[] to parse as (,)[].

Void type?

Thomas Sandlaß, Rod Adams, and Autrijus speculated about ways to deal with a Void type. Nothing definitive came out of it though.

uniquely identifying objects

Stevan Little wondered if there was a way to uniquely identify objects in Perl 6. Larry pointed him to the .id and the associated =:= operator.

BEGIN and lexicals

Benjamin Smith wondered if BEGIN could modify lexicals that don't really exist yet. Larry expressed the opinion that one should be able to modify compiler state in BEGIN blocks; however, he did not answer the question of what Benjamin's example does.

:: vs ::: in rules

Patrick confused many people when he asked about the difference between :: and ::: at the top level of rules. The answer seems to be that ::: will fail the entire match while :: will simply fail it at that offset in the string on which it is attempting to match.

negatives of junctions

Larry noticed that != and !~ will confuse English speakers when applied to junctions. Luke noticed that defining $a != $b as !( $a == $b ) works, averting tragedy.
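The confusion is easier to see with concrete values. The Perl 5 sketch below emulates an any() junction with a plain list and grep; it only illustrates the two readings and is not how junctions are implemented. The variable names are made up for the example.

```perl
use strict;
use warnings;

# Stand-in for the junction any(1, 2): simply the list of its members.
my @any_members = (1, 2);
my $x = 1;

# Naive autothreading of "any(1, 2) != 1": true as soon as ANY member
# differs from 1. Here 2 != 1, so the test succeeds.
my $naive = scalar(grep { $_ != $x } @any_members) ? 1 : 0;

# Luke's definition, !(any(1, 2) == 1): evaluate the junctive equality
# first, then negate it. Here 1 == 1, so the equality holds and the
# negation is false, which is what an English speaker expects.
my $negated = scalar(grep { $_ == $x } @any_members) ? 0 : 1;

print "naive autothreading: $naive\n";
print "negated equality:    $negated\n";
```

An English speaker reading "if the value is not 1" about any(1, 2) expects false, since one of the alternatives is 1; only the second form agrees.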

boxed types from builtins

Aaron Sherman worried that many built in functions return boxed types which could cause a big speed hit. Rod Adams explained that this was necessary but optimizations would be made available.

./method

Juerd suggested using ./method to mean $?SELF.method (in an attempt to solve the long standing debate of $?SELF.method vs $_.method). Much discussion ensued although the general response seems favorable.

operators everywhere

Rob Kinyon noted that there seemed to be an extremely large number of operators. He expressed concern, because he had believed that P6 was going to have a small core with modules. Larry explained that most of these operators were the combination of a small set of operators and meta operators in a combinatorially explosive way, giving the wonderfully lucid example of [>>+^=<<]. Much discussion ensued.

BUILD and submethods

Ingo Blechschmidt wanted to be sure that all appropriate submethods would be called when they should and that only the correct one would be called when they shouldn't all be. Larry answered that it did work as he expected.

$. vs $:

Luke Palmer was having trouble understanding the difference between $. and $:. Aaron Sherman pointed out a few differences according to A12.

not 4,3,2,1,0;

Autrijus wondered what the signature for not was in Perl 6. Larry explained that unlike Perl 5, Perl 6's not function should act like !<<[4,3,2,1,0].

multiple colons in MMD

Luke Palmer caught Autrijus off guard when he pointed out that multis could have multiple levels of :, each of which is less important than the last. Larry surmised that they might not have documented this hard enough.

Nested Captures

Carl Franks started a very long thread when he noted that nested captures caused extra layers in the match array rather than counting parens like Perl 5. This led to discussion of 0 vs 1 indexing. Discussion ensued, with the resulting decision that $0 == $/[0]. There was much rejoicing.

'1.28' * '2.56'

Autrijus wondered what path '1.28' * '2.56' should take to arrive at 3.2768. Larry told him that infix * used prefix + to numify non-Num args.
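Perl 5 takes a similar path, which makes for a convenient point of comparison. In the sketch below, adding zero stands in for Perl 6's numifying prefix +, since unary plus does not numify in Perl 5; the variable names are hypothetical.

```perl
use strict;
use warnings;

# Both operands are strings; the multiplication operator converts each
# one to a number before multiplying.
my $implicit = '1.28' * '2.56';

# The same steps spelled out: numify each string explicitly (0 + ...),
# then multiply the resulting numbers.
my $left     = 0 + '1.28';
my $right    = 0 + '2.56';
my $explicit = $left * $right;

print "implicit: $implicit\n";
print "explicit: $explicit\n";
```

Both routes produce the same number, because the implicit conversion is the explicit one performed on your behalf.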

methods from roles vs classes

Aaron Sherman wants to lazily load a role but only the first time it is necessary. Luke Palmer assured him that his example would work correctly.

currying and defaults

Aaron Sherman, in his efforts to get up to speed on S29, wondered how currying would interact with default values. Larry explained that default values would not be bound until the invocation of the curried function.

precedence of custom infix ops

Ingo Blechschmidt wondered how to define the precedence of custom infix ops. Luke Palmer pointed out the looser, tighter, and equiv traits (by default it is equiv( &infix:<+> ) ).

The Usual Footer

Posting via the Google Groups interface does not work. To post to any of these mailing lists please subscribe by sending email to perl6-internals-subscribe@perl.org, perl6-language-subscribe@perl.org, or perl6-compiler-subscribe@perl.org. If you find these summaries useful or enjoyable, please consider contributing to the Perl Foundation to help support the development of Perl. You might also like to send feedback to

Inside YAPC::NA 2005

The Perl Foundation organizes and holds several community-based Perl conferences each year. This year's North American conference, YAPC::NA 2005 is in Toronto, Canada, June 27-29. chromatic recently interviewed Richard Dice, organizer of the conference this year, to discuss his plans and experiences.

Can you tell our readers a little bit about yourself?

Richard Dice: Richard's just this guy, see?

I'm 32, born in Montreal and grew up in the sleepy southern Ontario city of London, Ontario. I did my undergrad degree in astronomy and applied mathematics at the University of Western Ontario. I moved to The Big Smoke (Toronto) in 1996 and have lived here (with brief stints in Montreal again and in Waterloo, ON) since then. I finished my MBA at the University of Toronto about a year ago, and I have recently started a job as a senior IT consultant at a company in Toronto called Information Balance. I've been married for not quite 2 years now, and we're soon to be moving into our first house. (The day after YAPC! Ugh!)

I've been a Unix-head since 1992 and a Linux guy since 1994, which is around the same time that I started programming in Perl, coming to it from C. (The first Perl program I ever wrote took 45 minutes. To do the same in C would have taken me three days. I never looked back.)

I have a love of good beer and scotch (and know everywhere in Toronto to get both), overestimate my ability to speak French, have an ever-growing personal library that my wife laments, and I go to all the good raves in the city. (I get in to most of the after-parties, too.)

Also, I'm a cat person.

How did you decide that not only did you want someone to host a YAPC in Toronto, but that you were the one to write the proposal and organize things?

RD: I blame Damian.

Damian Conway is a professor of computer science at Monash University in Melbourne, Australia, a Perl programmer par excellence, and an amazing public speaker. In the latter half of 2000, Yet Another Society (also known as The Perl Foundation) raised money to "buy" Damian for the year 2001, releasing him from his professorial duties at Monash and tasking him in part to tour the world speaking about Perl. (The Perl 6 announcement had just recently been made at the time and it was thought that he would make an excellent "champion" of the cause.)

In early 2001, I was in the process of moving from Montreal to Toronto when I noticed that there was a small gap in Damian's tour schedule in June. I had seen him the year before at YAPC 19100 in Pittsburgh and thought he was simply amazing. I wanted to support his mission to evangelize Perl. So I wrote him an email, introduced myself, and asked him to come to Toronto to present a few free public talks on Perl. I didn't want for him to say no, so in order to help make the decision easier I told him I'd make it all expenses paid. He quickly replied to say yes. Then, I wrote the Toronto Perl Monger (TPM) mailing list (which had never heard of me before this) and asked for help with fundraising to bring Damian to Toronto. It took some doing on my part, but in the end Damian's trip was completely covered and I didn't have to back-stop it. Not only was the trip a success financially, but everyone really seemed to enjoy the experience. This was my introduction to the Toronto Perl Mongers, to Damian, and to the wide world of Perl community organization.

Just before he left Toronto, Damian suggested that I consider organizing the same for him during his North American summer tour in 2002. How could I say no? Only this time, with a year to plan things, I could turn it from a small event into Damian-pooloza. Spread out over a week in June 2002 he presented three talks, each with around 250 people in attendance. The budget was also an order of magnitude greater than what I raised the year before. That was when some people locally (and Damian) started suggesting to me that I organize a YAPC in Toronto. I cringed, because I had known the suggestion was going to be raised eventually, and because I knew how much work it would be. But I also knew that, again, there was no way I could say no.

In the summer of 2003 I organized TPM to put together a bid for YAPC::NA 2004, which lost to the Buffalo bid (as judged by The Perl Foundation). And I'm glad that it did! The Buffalo team did a great job, and it just spurred us on to try even harder for our 2005 bid, which we won.

You offered to fund Damian's trip to Toronto without having things lined up first?

RD: That's correct. People are often surprised when I mention this aspect of things.

Is that confidence or something else?

RD: Some of it is confidence, but only part of it would be confidence in myself. (Though certainly I've been accused of this more than once.) The other part is confidence in the Perl community, its generosity and its ability to recognize when something good (e.g. a Damian Conway series of talks) is about to happen.

When I first hatched the scheme to bring Damian to Toronto in my head, I realized that it had all the makings for an organizational race condition. That is, if I went to Damian telling him that we would like him to come to Toronto conditional on funding being found, then he'd say "Great! Get back to me when you've got the funding and I'll make my plans." And then I'd turn to the Toronto Perl community to ask for funding to bring Damian to Toronto, and the reply I'd get would be "I'd love to donate! Tell me when Damian is confirmed and I'll contribute something." That's the kind of situation that just eats planning time away until it's all gone. To break this vicious cycle before it even started I decided to guarantee the funds myself. It worked. There were risks, but the potential rewards were too great not to make the effort.

I remember Damian lobbying for the Toronto bid for 2004. Obviously your approach worked.

RD: I assure you that no kick-back schemes were involved. :-)

I think that whatever support Damian showed for Toronto would have come from his experience in seeing us "pull out all the stops" when we invited him to Toronto in the past. I guess he had confidence in us to do the same, but for hundreds of people this time and not just him.

What's the process of putting together the budget like? I'm not looking for specific details and secrets, just some idea of what goes into organizing a conference.

RD: The budget is everything. The Perl Foundation isn't some huge organization with bottomless pits of money--YAPC has to stand on its own, financially at least. I never allowed myself to plan a budget where I wasn't 99.9% sure that no losses would occur. On the other hand, I didn't want to have a conference that was so skimpy on goodies that people would be talking to each other years from now saying things like, "Remember YAPC in Toronto? That just sucked!"

This made for a rather iterative process. The conference budget was designed "modularly", so as I identified more funding I could add on to the conference. The original budgetary pass at the conference did look pretty sucky. It wouldn't lose any money, but it would only barely make the YAPC grade, too. So then the effort shifted over to finding corporate sponsorships. This was a lot of hard work, but it has been entirely worthwhile. We have had a lot of excellent people and companies support the conference well beyond the call of duty. (Please visit the sponsors page on the YAPC::NA web site to see what I mean and who I'm talking about.)

The other main point regarding the budgetary process is that I didn't do it alone. Eric Bower, one of the Toronto Mongers, stepped forward to create the budgetary spreadsheet models that he and I consulted (and still consult) regularly. The latest iteration of this process has so many smarts built in to it that I wonder sometimes if I could just step away from planning the conference from here on in and let the spreadsheet take care of things by itself. :-)

In the past couple of years, I've heard that there's more collaboration between YAPC organizers than before. (Jeff Bisbee talked about this quite a bit after YAPC::NA 2003 in Florida.) Is there a community of burned-out ex-organizers that still work together?

RD: The first rule of YAPC ex-organizer club is you do not talk about YAPC ex-organizer club.

Seeing as how I just broke the first rule, I may as well spill the beans entirely.

The majority of the contact I have with the ex-organizers is with Kevin Meltzer (YAPC::NA 2003 in Boca Raton, FL, alongside Jeff Bisbee) and Jim Brandt (YAPC::NA 2004 in Buffalo). Jeff and Kevin glommed themselves onto The Perl Foundation to act as the core of the Conferences Committee, and Jim is just a generally accessible nice guy who can't hide from me, as I've got him on AIM. I've had a few emails with Sarah Burcham (YAPC::NA 2002 in St. Louis) and Rich Lafferty (YAPC::NA 2001 in Montreal) but nothing major.

In addition to the YAPC organizers of previous years, I have been in a great deal of contact with Allison Randal, president of The Perl Foundation, and Kurt DeMaagd, treasurer and general "boy Friday" of The Perl Foundation. He's the unsung hero in all of this as far as I can tell.

Truth be told, even if the previous YAPC organizers were incredibly tight and available to me, I'm not sure what I would be able to do with the "wisdom of the past" they could offer. They might be able to give me a sense of what needs to be accomplished and by what T-minus-YAPC date, and maybe they could give me some leads in terms of corporate sponsorship, but these things really aren't the bottleneck. The early bottleneck for the conference this year has been money and making a budget for the conference that works. Doing that took so much time that things we would have liked to do sooner had to wait too long for my liking. (For example, opening up attendee registration.) The current bottleneck is finding people who can look after task implementation details on the ground here in Toronto. For that, I work with my local TPM members.

How does organizing a conference compare to organizing a software project?

RD: That's a really interesting question. I hadn't considered this before, but I can posit a few possibilities.

First, I want to make a distinction between types of software projects: commercial and open source. I think these two types have quite different organizational characteristics. Organizing a YAPC is more like organizing an open source software project, but it has elements of commercial organization, too.

The open source software aspect of YAPC organizing is that everyone involved is a volunteer. I can only get people to move in the needed direction by setting an example, moral suasion, persuasion, and by being the obvious best person to make decisions and lead the group. That said, I can't make anybody do anything they don't want to. Even if I was ostensibly "successful" in talking someone into something they were reluctant to do, the end result is that they wouldn't put in enough time and wouldn't do a good enough job. I have to be very sensitive to finding the right people to do the right things.

Fortunately, it's not like I'm trying to make an organization that sells widgets. I'm running a YAPC, which is an intrinsically interesting and exciting thing. There are lots of people in TPM who want to help out and are willing to put in entirely unreasonable amounts of time and effort to make it a great conference. What I have to do is provide a structure in which they can be effective, and in which they can feel like they're being effective. I'm very grateful to all the people in TPM who have helped out so much so far, and I know that this help will continue (and accelerate, even!) going from here all the way to the conference, which should be most excellent as a result of all this energy and planning. (The speakers must be commended, too; they're all volunteers who help out YAPC as well, even if they do it in a different way.)

The commercial software project aspect of things is that YAPC has a deadline, and it has a budget, and people are charged money to attend (not a lot, but still it makes for a different dynamic). There is no "release early, release often" aspect of YAPC. There is no "we'll only release it when it's ready" aspect. These constraints drive the process.

I've heard rumors that you have some new events planned. What's this about a scavenger hunt? How about the hackathon beforehand?

RD: You are well-connected to the rumor mill, I see.

The scavenger hunt actually originated with Kevin Meltzer. He and I were chatting on AIM one day and he suggested it to me, seemingly out of the blue. I guess what happened was he got interested in seeing what there was to do in Toronto for personal touristing reasons (people often come a few days early or stay a few days late to YAPCs in order just to see the sights). What he found was that there was just plain too much to do in Toronto, and he got excited about experiencing the city as a whole. This is actually something that the TPM group wanted to accomplish in hosting a YAPC. Toronto is one of the great urban environments of North America (on the scale of San Francisco or Chicago) and YAPC is being held right in the heart of downtown. We're happy and proud for an opportunity to share the city with all the people who are coming in from afar to visit, especially people who haven't been before.

So Kevin made the suggestion, and I thought it was a great idea. At the next TPM meeting I mentioned it to the whole group and everyone seemed really pleased with it. Basically, we'll divide up the conference attendees into groups of 10 and send people out into the city with their digicams to track down and photograph local landmarks and places of geek-interest. This will happen on the night of Monday 27 June. The winners will be announced...

... on Tuesday night, which will be the night of the social centerpiece of YAPC. For those who don't know their geography (or don't happen to have a map handy), Toronto is situated on the north shore of Lake Ontario. Lake Ontario is one of the Great Lakes, which means it's big. Really big. More of an inland sea than a lake. Tuesday night, we're taking a cruise. We've booked a major cruise ship to hold the entire conference, and for 4 hours it will navigate Toronto Harbor, the channels through the Toronto Islands, and the near coast of Lake Ontario. I've done this kind of cruise once before; it makes for a great combination of natural and urban beauty. (The Toronto skyline is one of the most impressive in the world, and it looks great from the lake as day turns to night.) We're also going to be having our banquet dinner on the cruise boat, and our fundraising auction. (There will be some special items to be had during the auction as well. They are sure to raise eyebrows.)

There will be happenings before the conference officially begins, too. Uri Guttman (of Boston.pm) is the official YAPC 'socialist czar', and he's arranging a group restaurant dinner for anyone who arrives for the conference on Sunday (which should be almost everyone from out of town). He's also trying to book a movie theatre in the downtown area for our exclusive use to see something; we're not quite sure what yet. (What are the big sci-fi films to be released towards the end of June?) We're still looking for sponsorship for this.

Though neither social nor officially part of the conference, the 'lambdacamels' will descend upon Toronto, too. This is the group led by Autrijus Tang, founder of the Pugs project. Pugs is an effort to implement Perl 6 in Haskell as a kind of proof-of-concept for the real Perl 6 (which will run on Parrot), as well as to give Larry a working Perl 6 example through which he can focus on the details in the Perl 6 language design. About a dozen Pugs hackers will show up in Toronto nearly a week before the conference happens, and then be whisked away to a wilderness hideaway where they will do nothing but hack on Pugs for several days before the conference starts. (We have a lot of wilderness in Canada, even near Toronto, and one of the Toronto Mongers has a large cottage where he'll house the whole gang.)

From time to time we will throw them scraps of food. If we are pleased with the results, we'll drive them back to civilization in time for the conference.

How did you go about wrangling sponsors?

RD: I'd say it was a three-edged sword.

Yours, mine, and the truth?

RD: You do not understand, but you will.

All 3 edges have something to do with simply being networked into various IT communities. The least of the 3 was me needing sponsors and flipping through my Rolodex. (Well, Palm Pilot, but same idea.) The first--and crucial--sponsor came as a result of my being active in CLUE, the Canadian Linux Users Exchange. It is an organization that attempts to act as a clearing-house for Linux community activities around Canada. The head of that organization, Matt Rice, is a big Perl programmer and advocate as well. And he is also closely connected with the people at LPI, the Linux Professional Institute.

So one day Matt and I were talking, and I told him how I was mustering TPM to make a YAPC bid. This was in July 2004, making it the 2nd bid that TPM would make for YAPC, as we lost the 2003 bid to Buffalo. Buffalo did a great job, and I was happy to have lost to them after I went to YAPC::NA 2004 there and saw the quality of their event. One of the really big things that they had going for them was that the CompSci department at SUNY Buffalo donated the entire venue to them, gratis. They just had a new building built on campus and they were eager for an opportunity to showcase it. This, plus a desire on the part of the chairman of the CS department there to 'give back' to the open source community in general (and the Perl community in specific) led to the donation of their new building as a venue. It's a really nice building. We were very fortunate to have had it for the conference.

Did this happen before TPF accepted the proposal or after?

RD: It was part of the 2004 bid that Buffalo.pm submitted to TPF in August 2003. As a member of the Perl community, I was really happy with the donation from the CS department there and how it made for a great YAPC. As a losing bidder for YAPC::NA 2004, though, I felt differently. (Like I said, I'm glad that they did [win]. You can't look that kind of gift horse in the mouth.)

I felt like I had a trump card played against me. We put a lot into our 2004 bid, and yet there was no argument I could make that would have changed the outcome, or should have changed the outcome.

So I felt vulnerable when it came to creating a bid in the summer of 2004 for 2005. Like, no matter how hard I worked, someone else could produce the same magical trump and we would be shut out. So I made finding that same trump card myself to put into the Toronto bid my first order of business. I explained it this way to Matt, and he brought it to some people at LPI, and they realized the value of supporting a great open source community like the Perl community, and they essentially committed to a sponsorship large enough that our venue requirements would be covered.

So LPI became the foundation sponsor of YAPC::NA 2005 in Toronto. I honestly can't imagine how I would have made the conference work without them.

How much of the previous bid did you re-use?

RD: Much of it. We cleaned it up a bit, and added "options" to it. In the previous year I identified one potential venue to put into the bid. For the 2005 bid, I kept that one but added another, in case TPF found one or the other preferable. But the main addition to the bid was the LPI sponsorship. To compare the work effort of the two bids, I'd say that the bid we put together in the summer of 2003 took about a dozen of us about 2.5 weeks to put together, in our off-hours, using a Kwiki. (That was when I first appreciated just how useful wikis can be in collaborative documentation development tasks!)

The bid in 2004, I assembled myself in about 2 days. (Not including the effort of finding the LPI sponsorship or in 'scouting out' the venue possibilities.)

Having potential venues and a real sponsor must have helped.

RD: I don't know if the 2005 bid benefited from having two venue possibilities as opposed to one, but it certainly benefited from having the foundational sponsorship of LPI.

How about the rest of the sponsors?

RD: So I've covered two edges of the sword. The third edge is by random chance. Although it must be said that chance favors the prepared mind.

In September 2004 there was a PHP conference happening in Toronto, and I kept up a degree of correspondence with the producer of that conference, Marco Tabini. (This was as part of the work I was doing for CLUE, which was making myself aware of open source events happening in Canada.) The day that conference was to wrap up, I eked some time out of work and went over to the conference venue to congratulate Marco on his conference. So we finally met in person there, and he invited me to stay and join him for lunch. I did... and promptly got separated from him and the rest of his core group in the buffet line.

So after I got my food, I went to whatever other random table appeared to have a free seat and I got into conversation with the rest of the table. At one point during conversation at the table one of the guys there asked us all what we thought of DB2 (the IBM database product). About a year earlier I had worked with it in a Perl project and I had had a hell of a time, so I heaped a bit of good-natured scorn upon it, mostly having to do with how difficult it was to integrate with Perl. After a few minutes of ranting I asked him why he asked. The reason was because he was the Product Manager for DB2 and Open Source Software at IBM. ($name = 'Dan Scott')

Moral: ask before ranting, unless you're the kind of person who asks and then doesn't rant.

RD: Fortunately, Dan is one of those people who takes criticism constructively! :-) He was really happy to hear that I had an opinion on Perl and DB2, and he wanted to get together with me to get down my thoughts in a more organized fashion, so that he could direct some resources within IBM toward addressing those concerns. So we made an appointment to do that. And then I mentioned to him that there was this YAPC thing that had just recently been awarded to Toronto, and maybe IBM would like to help out with that too. Actually, this may have been a few days before the official announcement that Toronto.pm had been awarded YAPC, but Kevin Meltzer and I had talked informally about it before that.

So anyhow, Dan was receptive to the idea, and he put me in touch with the right people at IBM for this kind of thing. It took a few months to move its way through the process there, but with the help of some great sympathetic people at IBM they came through with a tremendous sponsorship, which is how we are able to have the cruise/banquet I told you about earlier. To summarize: direct networking, Rolodex, and random. (The latter two being manifestations of indirect networking, I suppose).

This covers my sponsorship efforts. Since then I have deputized a few people at TPM to try to find some more. That is in process but they've had some promising responses so far.

Is the goal of a YAPC to be self-sufficient or to make a bit of money?

RD: The goal is to be self-sufficient. As I understand it, there is no official TPF position that YAPC should create a surplus that can go back to TPF to help it fund its other goals.

That said, the previous few YAPCs have had surpluses, though not huge ones. I really don't know if YAPC::NA 2005 in Toronto will or won't this year.

Thus every new sponsor you find can fund some other neat thing, like a movie trip or the cruise?

RD: Yes, that's exactly correct. As I mentioned earlier, we're still hoping to find a sponsor for the movie night that Uri is organizing. Sometimes sponsorship money won't be used for "fun" or "conference enhancing" items. For instance, I really don't know whether or not I can provide my conference volunteers with free admission to the conference. I need to find maybe $2000 in order just to cover their variable food costs.

A conference without volunteers definitely counts as "not fun" in my book.

RD: Oh, we'll have volunteers. It's just that, as things stand right now, they'll pay their conference fee just like everyone else. I've paid my conference fee. Many others on the volunteer side of things have already, too. It's like that bit in "Starship Troopers": In Raczak's Roughnecks, everyone jumps, everyone fights!

What were you looking for when you and your team put together the schedule this year?

RD: We kind of didn't know what we were looking for, at first. The call-for-papers (CFP) went active on January 24, I think. Up until maybe three days before the close of CFP (April 18), we had paper submissions totaling maybe only 75% of what was needed to fill the conference, even if we stretched some of the talks out. When the CFP ended, we were up to maybe 250% of what was needed to fill the conference. Yes, there were a LOT of last minute submissions.

I can't manage to feel any surprise!

RD: No, I don't blame you.

But there's a huge difference between being a dispassionate outside observer, and being the poor schmuck who won't know whether or not their life has meaning until the CFP totals are in in another couple of days. I was pretty tense leading up to that.

It was pretty obvious to us that there would be certain "tracks" that would form in the conference: Perl 6 and testing were the two big ones. The process of putting together the schedule went something like this:

  • Pick out the very few "absolute-must-have" talks from the stack.
  • Find a time for them in the utterly empty schedule.
  • Now, figure out the talks which thematically support those talks.
  • If there are too many for those themes, then figure out if any of those overlap, and pick one of the overlapping.
  • And now cut the list down by another talk or two, somehow.

So now we have our anchors, and our tracks. That left us with only so much schedule left to fill, and way too many talks to do it with. So we (the program selection committee) went into a kind of frenzied scrum for about two hours with back-and-forth advocating of personal favorites. When the dust cleared, we had our schedule.

I should say that all of this happened on Saturday, April 23, when six of us got together and spent the whole day working through the schedule.

Did you notice other trends?

RD: Other trends... hmm... well, we have a database track, more or less, which is heavily influenced by Class::DBI. Apart from that, I don't know if there is anything we can call a trend, but I can point out a few other features of the schedule.

The opening 1.5 hours of the first day is a plenary session--a bit of time for Toronto.pm to welcome everyone who came (and we've got people coming from all over the world!), followed by Larry giving a keynote, followed by Allison giving her "state of the carrot" update on the world of Perl over the past year. The afternoon of day three will be much as it has been the past few years; the Lightning Talks, followed by a closing keynote, followed by a Town Hall meeting.

One interesting thing that happened with the schedule that I was expecting is that, in day two, we are actually going to bump up the number of parallel tracks from three to four, which I think is a first for a YAPC::NA. This is to accommodate a "donated tutorial" from Stonehenge that brian d foy, Stonehenge trainer and originator of the Perl Monger movement, will present. We haven't yet identified the room at the conference venue that we will book for him, so we've been calling it the "brian d foyer". We're all just way too pleased with ourselves for that. Even once we have figured out a room, no doubt that name will stick.

We thought it was important to put this tutorial into the schedule so that newcomers to Perl would have something meaningful at the conference to cater to their needs. This was raised at the Town Hall meeting in Buffalo at the end of YAPC::NA 2004 and so we're trying to take those comments to heart. It has meant finding more money to book the extra room, but we all thought it was important enough to make the stretch to do this.

Do you expect a lot of newcomers? I didn't really catch the breakdown between experienced programmers and neophytes at the last couple of YAPCs.

RD: I wish I was in a better position to answer that. Almost by definition it's hard to predict, as these people don't circulate in the same communication channels that I do. I talked with brian d foy about what his expectations were in terms of the number of people who would attend his introductory Perl tutorial at YAPC, and he said that when he has given similar talks at similar conferences he's always had a good turn out.

That's as close as I can get to knowing, I'm afraid. As for previous YAPCs, well, what I recall from Buffalo was that the newcomers were certainly vocal in the Town Hall meeting. That's a good thing! I absolutely hate the idea that the Perl community could degenerate into an echo chamber. New perspectives are necessary.

How is registration going?

RD: Just a second...I'll have a look....

Wow! This was the first time I looked today--seven new since yesterday. Which is huge. It could be the biggest single day jump yet. The total is 106. Which is considerably ahead of any previous YAPC::NAs I have any data for this number of days before the conference starts. (T-minus-YAPC, as it were.)

My data set goes back to YAPC::NA 2002 in St. Louis. 1999 - 2001, I don't know. Maybe the information is out there somewhere within TPF, but I haven't figured out where yet. Right now, my projection for registrants + speakers + volunteers is between 300 and 400.

Do you have access to a lot of that through previous organizers?

RD: Only going back to 2002. I think that's when TPF instituted a new back-end data system.

How many speakers and volunteers do you estimate will be there?

RD: Speakers, I don't have to estimate--the number is 45. (Not including Lightning Talks.) Volunteers--maybe 20? 25? We're in low teens now, and there's a lot of buzz within the larger TPM group so I expect a lot of people will offer their time for the actual conference duration.

Regarding the range, my actual projections are for 340 right now. 300 is being conservative, assuming we lose quite a lot of momentum. My gut tells me that we're actually going to gain a fair bit more. 400 would be a blow-out success.

How many can the venue hold?

RD: The venue can hold 420 before I start getting the facility manager to start lining chairs up against the walls. More importantly, the cruise boat can hold 500. :-) While this is all going by gut, I think we'll be around the 400 total mark. Maybe I'm being influenced by the seven registrants today. Earlier this weekend there was a day when there wasn't a single registrant. That doesn't happen often. Maybe had you asked me on that day I wouldn't have been so optimistic.

How about the hotel and the dorms?

RD: Ahh... there's the rub.

Right now, I have 250 rooms reserved in the same hotel facility that is hosting the conference. I have 100 rooms reserved in a university dorm about 1km away from the conference, for students or other people who are particularly cost-conscious. I also have 50 rooms reserved in a hotel that is part of the same university that has the dorm rooms, so that's also about 1km away from the conference facility. So, we have plenty for the whole conference.

But...

The group reservation for the 250 rooms at the hotel facility expires on May 12. At that point, any unbooked rooms go back on the open market. They're still bookable with the conference code, but they just aren't reserved for us. So they'll probably be sold to the public with a half-life of 4 days.

Toronto is notoriously under-served by hotels, and compounding this is that AA is having its 75th anniversary conference in Toronto on June 30th and July 1. That conference is bringing 79,000 people into the city. The 'thin tail' of AA conference attendees who want to come to Toronto a bit early to do some touristing before their conference starts will put a great deal of strain on the hotel availability in the city.

We call that the long snout. Or the short snout, I forget.

RD: Right! That's a better term. I was wondering what to call it.

Looking at things from the opposite direction, YAPC attendees who decide at the last minute that they want to spend a few extra days in Toronto after the conference ends will be S.O.L. trying to find a hotel. So I'm concerned about all this.

Looking at the registration figures from the previous few years of YAPCs, about 40% of attendees register in the last 2.5 weeks. So for YAPC this year, that would mean 40% registering June 6 and after. There will still be a few rooms left in the dorms and the hotel attached to the university that contains the dorms (our booking arrangements are slightly different with them), but still, there will be a distinct shortage of hotel rooms.

Anything else to say in closing?

RD: YAPC is the Perl conference by Perl programmers, for Perl programmers (and people who think they might like to be Perl programmers). Everyone in the Toronto.pm organizing team is doing it because we all feel so fortunate for all the support we have received from our fellow Perl Mongers around the world, and we feel privileged to return the generosity as best we can through the conference.

We hope you can join us in Toronto for the conference, and we're hoping to make it a memorable one for everyone.

Massive Data Aggregation with Perl

This article is a case study of the use of Perl and XML/RDF technologies to channel disparate sources of data into a semi-structured repository. This repository helped to build structured OLAP warehouses by mining an RDF repository with SAX machines. Channels of data included user-contributed datasets, data from FTP and HTTP remote-based repositories, and data from other intra-enterprise based assets. We called the system the 'Kitchen Sync', but one of the project's visionaries best described it as akin to a device that accepts piles of random coins and returns them sorted for analysis. This system collected voter data and was the primary data collection point in a national organization for the presidential campaign during the 2004 election.

Introduction

My initial question was why anyone would want to store data in XML/RDF formats. It's verbose, it lacks widely accepted query interfaces (such as SQL), and it generally requires more work than a database. XML, in particular, is a great messaging interface, but a poor persistence medium.

Eventually, I concluded that this particular implementation did benefit from the use of XML and RDF as messaging protocols. The messaging interface involved the use of SAX machines to parse a queue of XML and RDF files. The XML files contained the metadata for what we called polls, and the RDF files contained data from those polls. We had a very large buffer, from which cron-based processes frequently constructed data warehouses for analysis.
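
To make the idea concrete, here is a minimal sketch of the kind of SAX handler that could sit at the end of such a pipeline. It is not the project's code: the package name PollDataHandler, the on_row callback, and the element names (response, voter_id, answer) are all hypothetical, since the real RDF vocabulary isn't shown here. The callbacks follow the Perl SAX2 convention, in which the parser passes each element as a hash reference with a LocalName key and each run of text as a hash reference with a Data key.

```perl
package PollDataHandler;    # hypothetical name, for illustration only
use strict;
use warnings;

# on_row is a code reference called once per completed record.
sub new {
    my ( $class, %args ) = @_;
    return bless { on_row => $args{on_row}, row => undef, text => '' }, $class;
}

# SAX2 callback: an opening tag. Start a fresh record on <response>.
sub start_element {
    my ( $self, $element ) = @_;
    $self->{row}  = {} if $element->{LocalName} eq 'response';
    $self->{text} = '';
}

# SAX2 callback: character data. A parser may deliver one text node
# in several pieces, so append rather than assign.
sub characters {
    my ( $self, $chars ) = @_;
    $self->{text} .= $chars->{Data};
}

# SAX2 callback: a closing tag. Store the field, or hand off the record.
sub end_element {
    my ( $self, $element ) = @_;
    my $name = $element->{LocalName};

    if ( $name eq 'response' ) {
        $self->{on_row}->( $self->{row} );
        $self->{row} = undef;
    }
    elsif ( $self->{row} ) {
        $self->{row}{$name} = $self->{text};
    }
    $self->{text} = '';
}

package main;    # back to the default package for calling code
```

In practice a SAX parser (for instance, one obtained from XML::SAX::ParserFactory with this object as its Handler) would fire these callbacks while reading each queued file. Because every completed record goes straight to the callback, the handler holds only one record in memory at a time, which suits large files.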

Hindsight and Realizations

The difficulty of this project was in the gathering of requirements and vendor interfacing. When implementing application workflow, it is critical to use a programming language that doesn't get in the way and allows you to do what you want--and that is where Perl really shined. A language that allows for quick development is an asset, especially in a rushed environment where projects are due "yesterday". The code samples here are not examples of how to write great object-oriented Perl code. They are real world examples of the code used to get things done in this project.

For example, when a voter-data vendor changed its poll format, our data collection spiders stopped returning data and alerted our staff immediately. In just minutes, we adapted our SAX machine to the vendor's new format and we had our data streams back up and running. It would have taken hours or days to call the vendor about the change and engage in a technical discussion to get them to do things our way. Instead, Perl allowed us to adapt to their ways quickly and efficiently.

Project Goals

The architects of this project specified several goals and metrics for the application. The main goals--with the ultimate objective being to accumulate as much data as possible before election day--were to:

  • Develop a web-based application for defining metadata of polls, and uploading sets of poll data to the system.

    The application had to give the user the ability to define sets of questions and answers known as polls. Poll metadata could contain related data contained in documents of standard business formats (.doc, .pdf). The users also needed an easy method, one that minimized possible errors, to upload data to the system.

  • Meet requirements of adding 50 million new records per day.

    That metric corresponds to an average of approximately 579 records per second (50 million records divided by the 86,400 seconds in a day). Assuming the load would not be spread evenly over the day, peak transaction requirements were likely to be orders of magnitude higher than that average.

  • Develop a persistent store for RDF and XML data representing polls and poll data.

    The web application had to generate XML documents from poll definitions and RDF documents from uploaded poll data. We stored the poll data in RDF. We needed an API to manage these documents.

  • Develop a mechanized data collection system for the retrieval of data from FTP- and HTTP-based data repositories.

    The plan was to assimilate data sources into our organization from several commercial and other types of vendors. Most vendors had varying schemas and formats for their data. We wanted to acquire as much data as possible before the election to gauge voter support levels and other key metrics crucial to winning a political election.
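
The throughput figure in the second goal is easy to check. The sketch below works out the average rate and, purely as an illustration of why peaks matter, the rate needed if a whole day's records arrived within a shorter window. The function names and the two-hour window are assumptions made for this example, not figures from the project.

```perl
use strict;
use warnings;

use constant SECONDS_PER_DAY => 24 * 60 * 60;    # 86,400

# Average records per second needed to absorb a daily total.
sub average_rate {
    my ($records_per_day) = @_;
    return $records_per_day / SECONDS_PER_DAY;
}

# Rate needed if the whole day's load arrives within $window_hours.
sub rate_in_window {
    my ( $records_per_day, $window_hours ) = @_;
    return $records_per_day / ( $window_hours * 60 * 60 );
}

printf "average over 24 hours: %.1f records/s\n", average_rate(50_000_000);
printf "hypothetical 2-hour window: %.1f records/s\n",
    rate_in_window( 50_000_000, 2 );
```

Even this mild assumption raises the required rate twelvefold; burstier arrival patterns push it higher still.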

Web Application

When I started this project, I had been using mod_perl2 extensively in prototyping applications and also as a means of finding all of the cool new features. Mod_perl2 had proven itself stable enough to use in production, so I implemented a Model-View-Controller application design pattern using a native mod_perl2 and a libapreq2-enabled Apache server. I adopted the controller design patterns from recipes in the Modperl Cookbook. The model classes subclassed Berkeley DBXML and XML::LibXML for object methods and persistence. We used Template Toolkit to implement views. (I will present more about the specifics of the persistence layer later in this article.)

Of primary importance with the web application component of the system was ease of use. If the system was not easy to use, then we would likely receive less data as a result of user frustration. The component of the web application that took extended transaction processing time was the poll data upload component.

If a user uploaded a 10MB file on a 10KB/s upstream connection (common for residential DSL lines), the transaction would take approximately twenty minutes. On a 100KB/s upstream connection (business grade DSL), the transaction would take two minutes--certainly much longer than most unsuspecting users would wait before clicking on the browser refresh button.
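
As a rough check on these estimates, the sketch below computes transfer times for a 10MB file. It assumes the rates are sustained kilobytes per second and uses decimal units (1MB = 1,000KB); the helper name upload_seconds is made up for illustration.

```perl
use strict;
use warnings;

# Seconds needed to send $size_kb kilobytes at $rate_kb_per_s
# kilobytes per second, ignoring protocol overhead.
sub upload_seconds {
    my ( $size_kb, $rate_kb_per_s ) = @_;
    return $size_kb / $rate_kb_per_s;
}

my $file_kb = 10 * 1000;    # a 10MB file, in decimal kilobytes

for my $rate ( 10, 100 ) {
    my $seconds = upload_seconds( $file_kb, $rate );
    printf "%3d KB/s: %4d s (about %.1f minutes)\n",
        $rate, $seconds, $seconds / 60;
}
```

This gives about 17 minutes at 10 kilobytes per second and under two minutes at 100, in line with the estimates above. If the rates were kilobits per second instead, each time would be eight times longer.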

To prevent the user from accidentally corrupting the lengthy upload process, I created a monitoring browser window which opened via the following JavaScript call when the user clicked the upload button.

<input type=submit name='submit' value='Upload'
    onClick="window.open('/ksync/dataset/monitor', 'Upload',
       'width=740,height=400')">

The server forked off a child process which read the upload status from a BerkeleyDB database. The parent process used a libapreq UPLOAD_HOOK-based approach to measure the amount of data uploaded, and to write that plus a few other metrics to the BerkeleyDB database. The following is a snippet of code from the upload handler:

<Location /ksync/poll/data/progress>
    PerlResponseHandler KSYNC::Apache::Data::Upload->progress
</Location>

sub progress : method {
    my ( $self, $r ) = @_;

    # We deal with commas and tabs as delimiters currently
    my $delimiter;

    # Create a BerkeleyDB to keep track of upload progress
    my $db = _init_status_db( DB_CREATE );

    # Get the specifics of the poll we're getting data for
    my $poll = $r->pnotes('SESSION')->{'poll'};

    # Generate a unique identifier for files based on the poll
    my $id = _file_id($poll);

    # Store any data which does not validate according to the poll schema
    my $invalid = IO::File->new();
    my $ivfn = join '', $config->get('data_root'), '/invalid/', $id, '.txt';
    $invalid->open( $ivfn, '>' )
      or KSYNC::Apache::Exception->throw("Cannot open $ivfn: $!");

    # Set the rdf filename
    my $gfn = join '', $config->get('data_root'), '/valid/', $id, '.rdf';

    # Create an RDF document object to store the data
    my $rdf = KSYNC::Model::Poll::Data::RDF->new(
                $gfn, 
                $poll,
                $r->pnotes('SESSION')->{'creator'}, 
                DateTime->now->ymd, 
    );

    # Get the poll questions to make sure the answers are valid
    my $questions = $poll->questions;

    # Create a data structure to hold the answers to validate against.
    my @valid_answers = _valid_answers($questions);

    # And a data structure to hold the validation results
    my $question_data = KSYNC::Model::Poll::validation_results($questions);

    # Set progress store parameters
    my $length              = 0;
    my $good_lines_total    = 0;
    my $invalid_lines_total = 0;
    my $began;              # Boolean to determine if we've started parsing data
    my $li                  = 1;    # Starting line number

    # The subroutine to process uploaded data
    my $fragment;
    my $upload_hook = sub {
        my ( $upload, $data, $data_len, $hook_data ) = @_;

        if ( !$began ) {   # If this is the first set check the array length

            # Chop up the stream
            my @lines = split "\n", $data;

            # Determine the delimiter for this line
            $delimiter = _delimiter(@lines);

            unless ( ( split( /$delimiter/, $lines[0] ) ) ==
                scalar( @{$question_data} ) + 1 )
            {
                $db->db_put( 'done', '1' );
                
                # The dataset isn't valid, so throw an exception
                KSYNC::Apache::Exception->throw('Invalid Dataset!');
            }
        }

        # Mark the start of the upload
        $began = 1;

        # Validate the data against the poll answers we've defined
        my ( $good_lines, $invalid_lines );

        ( $good_lines, $invalid_lines, $question_data, $li, $fragment ) =
          KSYNC::Model::Poll::Data::validate( \@valid_answers, 
                                              $data, 
                                              $question_data,
                                              $li, 
                                              $delimiter, 
                                              $fragment );

        # Keep up the running count of good and invalid lines
        $good_lines_total     += scalar( @{$good_lines} );
        $invalid_lines_total  += scalar( @{$invalid_lines} );

        # Increment the number of bytes processed
        $length += length($data);

        # Update the status for the monitor process
        $db->db_put(
                     valid     => $good_lines_total,
                     invalid   => $invalid_lines_total,
                     bytes     => $length,
                     filename  => $upload->filename,
                     filetype  => $upload->type,
                     questions => $question_data,
                   );

        # And store the data we've collected
        $rdf->write( $good_lines ) if scalar( @{$good_lines} );

        # Write out any invalid data points to a separate file
        _write_txt( $invalid, $invalid_lines ) if scalar( @{$invalid_lines} );
    };

    my $req = Apache::Request->new(
        $r,
        POST_MAX    => 1024 * 1024 * 1024,    # One Gigabyte
        HOOK_DATA   => 'Note',
        UPLOAD_HOOK => $upload_hook,
        TEMP_DIR    => $config->get('temp_dir'),
    );

    my $upload = eval { $req->upload( scalar +( $req->upload )[0] ) };
    if ( ref $@ and $@->isa("Apache::Request::Error") ) {

        # ... handle Apache::Request::Error object in $@
        $r->headers_out->set( Location => 'https://'
              . $r->construct_server
              . '/ksync/poll/data/upload/aborted' );
        return Apache::REDIRECT;
    }

    # Finish up
    $invalid->close;
    $rdf->save;

    # Set status so the progress window will close
    $db->db_put('done', '1');
    undef $db;
    
    # Send the user to the summary page
    $r->headers_out->set(
      Location => join('', 
                       'https://', 
                       $r->construct_server, 
                       '/poll/data/upload/summary',
                      )                   
    );
    return Apache::REDIRECT; 
}
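
The handler above calls a _delimiter helper that the snippet does not show. The following is only a guess at how such a helper might work, assuming (per the comment in the handler) that commas and tabs are the only delimiters in play. The name guess_delimiter and its behavior are hypothetical, not the project's actual code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the _delimiter helper (not shown in the article).
# It counts commas and tabs on the first non-empty line and returns the
# character that occurs more often, or nothing if neither appears.
sub guess_delimiter {
    my @lines = @_;

    # Use the first non-empty line as the sample
    my ($sample) = grep { defined($_) && length($_) } @lines;
    return unless defined $sample;

    # Count occurrences of each candidate delimiter
    my $commas = () = $sample =~ /,/g;
    my $tabs   = () = $sample =~ /\t/g;

    return if $commas == 0 && $tabs == 0;
    return $tabs > $commas ? "\t" : ',';
}

my $delimiter = guess_delimiter("voter_id,q1,q2");
print "Delimiter is a ", ( $delimiter eq ',' ? 'comma' : 'tab' ), "\n";
```

A helper like this only needs the first chunk of the upload, which is why the handler calls it once, before setting $began.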

During the upload process, the users saw a status window which refreshed every two seconds and had a pleasant animated GIF to enhance their experience, as well as several metrics on the status of the upload. One user uploaded a file that took 45 minutes because of a degraded network connection, but the uploaded file had no errors.

The system converted CSV files that users uploaded into RDF and saved them to the RDF store during the upload process. Because of the use of the UPLOAD_HOOK approach for processing uploaded data, the mod_perl-enabled Apache processes never grew in size or leaked memory as a result of handling the upload content.
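
The $fragment variable in the upload hook hints at how chunked processing can keep memory flat: each chunk is handled as it arrives, and only a trailing partial line is carried over to the next call. The sketch below illustrates that idea in plain Perl, without Apache or libapreq. The make_line_feeder function is hypothetical and is not the project's validate routine.

```perl
use strict;
use warnings;

# Build a closure that accepts arbitrary chunks of data and invokes a
# callback once per complete line. A trailing partial line is kept in
# $fragment until the next chunk arrives, so memory use stays bounded
# by the chunk size rather than the size of the whole upload.
sub make_line_feeder {
    my ($on_line) = @_;
    my $fragment = '';

    return sub {
        my ($chunk) = @_;

        # Prepend whatever was left over from the previous chunk
        my $buffer = $fragment . $chunk;

        # A limit of -1 keeps a trailing empty field, so a chunk that
        # ends in a newline leaves an empty fragment behind
        my @lines = split /\n/, $buffer, -1;

        # The last element is the incomplete line (possibly empty)
        $fragment = pop @lines;
        $fragment = '' unless defined $fragment;

        $on_line->($_) for @lines;
        return scalar @lines;
    };
}

my @seen;
my $feed = make_line_feeder( sub { push @seen, $_[0] } );

# Simulate an upload arriving in three chunks that break mid-line
$feed->($_) for "1,0,2\n2,1", ",3\n3,", "2,1\n";

print scalar(@seen), " complete lines\n";
```

In the real handler, the callback's role is played by the validation and RDF-writing steps, and the chunks come from the UPLOAD_HOOK rather than a hard-coded list.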

Poll and Poll Data Stores

Several parties involved raised questions about the use of XML and RDF as persistence mediums. Why not use a relational database? Our primary reasons for deciding against a relational database were that we had several different schemas and formats of incoming data, and we needed to be able to absorb huge influxes of data in very short time periods.

Consider how a relational database could have handled the variation in schemas and formats. Creating vendor-specific drivers to handle each format would have been straightforward. To handle the variations in schema, we could have normalized each data stream and its attributes so that we could store all the data in source, object, attribute, and value tables. The problem with that approach is that you get one really big table with all the values, which becomes more difficult to manage as time goes on. Another possible approach, which I have used in the past, is to create separate tables for each data stream to fit the schema, and then use the power of left, right, and outer joins to extract the needed information. It scales much better than the first approach but it is not as well suited for data mining as warehouses are.

With regard to absorbing a lot of data very quickly, transactional relational databases have limitations when you insert or update data in a table with many rows. Additionally, the insert and update transactions are not asynchronous. When inserting or updating a record, the transaction will not complete until the indexes associated with the indexed fields of that record have updated. This slows down as the database grows in size.

We wanted the transactions between users, machines, and the Kitchen Sync to be as asynchronous as possible. Our ability to take in data in RDF format would not degrade as the amount of data already taken in, but not yet warehoused for analysis, grew. Data exchange between vendors and us consisted of a few large transactions in RDF format per data set, and the length of each transaction depended solely on the speed of the network connection between the vendor and our data center.

With the decision to use XML for storing poll metadata and RDF for storing poll data in place, we turned our attention to the specifics of the persistence layer. We stored the poll objects in XML, as shown in this example:


<?xml version="1.0"?>
<poll>
    <creator>Fred Moyer</creator>
    <date>2005-03-01</date>
    <vendor>Voter Data Inc.</vendor>
    <location>https://www.voterdatainc.com/poll/1234</location>
    <questions>
        <question>
            <name>Who is buried in Grant's Tomb?</name>
            <answers>
                <answer>
                    <name>Ulysses Grant</name>
                    <value>0</value>
                </answer>
                <answer>
                    <name>John Kerry</name>
                    <value>1</value>
                </answer>
                <answer>
                    <name>George Bush</name>
                    <value>2</value>
                </answer>
                <answer>
                    <name>Alfred E. Neumann</name>
                    <value>3</value>
                </answer>
            </answers>
        </question>
    </questions>
    <media>
        <pdf>
            <name>Name of a PDF file describing this poll</name>
            <raw>The raw contents of the PDF file</raw>
            <text>The text of the PDF file, generated with XPDF libs</text>
        </pdf>
    </media>
</poll>

We also needed an API to manage those documents. We chose Berkeley DBXML because of its simple but effective API and its ability to scale to terabyte size if needed. We created a poll class which subclassed the Sleepycat and XML::LibXML modules and provided some Perlish methods for manipulating polls.

package KSYNC::Model::Poll;

use strict;
use warnings;

use base qw(KSYNC::Model);
use Sleepycat::DbXml qw(simple);
use XML::LibXML;
use KSYNC::Exception;

my $ACTIVITY_LOC = 'data/poll.dbxml';

# Initialize the DbXml container at file scope so that save() can use it
my $container = XmlContainer->new($ACTIVITY_LOC);

# Call base class constructor KSYNC::Model->new
sub new {
    my ($class, %args) = @_;

    my $self = $class->SUPER::new(%args);
    return $self;
}

# Transform the poll object into an xml document
sub as_xml {
    my ($self, $id) = @_;
    
    my $dom = XML::LibXML::Document->new();
    my $pi = $dom->createPI( 'xml-stylesheet', 
                             'href="/css/poll.xsl" type="text/xsl"' );
    $dom->appendChild($pi);
    my $element = XML::LibXML::Element->new('Poll');

    $element->appendTextChild('Type',        $self->type);
    $element->appendTextChild('Creator',     $self->creator);
    $element->appendTextChild('Description', $self->description);
    $element->appendTextChild('Vendor',      $self->vendor);
    $element->appendTextChild('Began',       $self->began);
    $element->appendTextChild('Completed',   $self->completed);

    my $questions = XML::LibXML::Element->new('Questions');

    for my $question ( @{ $self->{question} } ) {
        $questions->appendChild($question->as_element);
    }

    $element->appendChild($questions);

    $dom->setDocumentElement($element);
    return $dom;
}

sub save {
    my $self = shift;

    # Connect to the DbXml database
    $container->open(Db::DB_CREATE);

    # Create a new document for storage from xml serialization of $self
    my $doc = XmlDocument->new();
    $doc->setContent($self->as_xml->toString);
    
    # Save, throw an exception if problems happen
    eval { $container->putDocument($doc); };
    KSYNC::Exception->throw("Could not add document: $@") if $@;

    # Return the ID of the newly added document
    return $doc->getID();
}

We chose RDF as the format for poll data because the format contains links to resources that describe the namespaces of the document, making the document self-describing. The availability of standardized namespaces such as Dublin Core gave us predefined tags such as dc:date and dc:creator. We added our own namespaces for representation of poll data. Depending on what verbosity of data the vendors kept, we could add dc:date tags to different portions of the document to provide historical references. We constructed our URLs in a REST format for all web-based resources.

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:ourparty="http://www.ourparty.org/xml/schema#">

    <rdf:Description rdf:about="http://www.ourparty.org/poll/1234">
        <dc:date>2004-10-14</dc:date>
        <dc:creator>fmoyer@plusthree.com</dc:creator>
    </rdf:Description>

    <rdf:Bag>
        <rdf:li ourparty:id="6372095736" ourparty:question="1"
                ourparty:answer="1" dc:date="2005-03-01" />
        <rdf:li ourparty:id="2420080069" ourparty:question="2"
                ourparty:answer="3" dc:date="2005-03-02" />
    </rdf:Bag>
</rdf:RDF>

We used SAX machines as drivers to generate summary models of RDF files and LibXML streaming parsers to traverse the RDF files. We stacked drivers by using pipelined SAX machines and constructed SAX drivers for the different vendor data schemas. Cron-based machines scanned the RDF store, identified new poll data, and processed them into summary XML documents which we served to administrative users via XSLT transformations. Additionally, we used the SAX machines to create denormalized SQL warehouses for data mining.

An example SAX driver for Voter Data, Inc. RDF poll data:

package KSYNC::SAX::Voterdatainc;

use strict;
use warnings;

use base qw(KSYNC::SAX);

my %NS = (
    rdf      => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    dc       => 'http://purl.org/dc/elements/1.1/',
    ourparty => 'http://www.ourparty.org/xml/schema#',
);

my $VENDOR = 'Voter Data, Inc.';

sub new {
    my $class = shift;

    # Call the super constructor to create the driver
    my $self = $class->SUPER::new(@_, { vendor => $VENDOR });

    return $self;
}

sub start_element {
    my ($self, $data) = @_;
    
    # Process rdf:li elements
    if ( $data->{Name} eq 'rdf:li' ) {
    
        # Grab the data
        my $id      = $data->{Attributes}{ "{$NS{ourparty}}id" }{Value};
        my $answer  = $data->{Attributes}{ "{$NS{ourparty}}answer" }{Value};
        my $creator = $data->{Attributes}{ "{$NS{dc}}creator" }{Value};
        my $date    = $data->{Attributes}{ "{$NS{dc}}date" }{Value};

        # Map the data to a common response
        $self->add_response({ vendor        => $VENDOR,
                              voter_id      => $id, 
                              support_level => $answer, 
                              creator       => $creator,
                              date          => $date,
                           });

        # Call the base class start_element method to do something with the data
        $self->SUPER::start_element($data);
    }
}

1;

We stored RDF documents compressed in bzip2 format, because the bzip2 compression algorithm is especially efficient at compressing repeating element data. As shown below in the SAX machine example, using bzcat as the intake to a pipeline parser allowed decompression of the bzip2 documents for parsing and creating a summary of a poll data set.

#!/usr/bin/env perl

use strict;
use warnings;

use KSYNC::SAX::Voterdatainc;
use XML::SAX::Machines qw(Pipeline);

# The poll data
my $rdf = 'data/voterdatainc/1759265.rdf.bz2';

# Create a vendor specific driver
my $driver = KSYNC::SAX::Voterdatainc->new();

# Create a driver to add the data to a data warehouse handle
my $dbh = KSYNC::DBI->connect();
my $warehouser = KSYNC::SAX::DBI->new(
                    source => 'http://www.voterdatainc/ourparty/poll.xml',
                    dbh    => $dbh,
                );

# Create a parser which uncompresses the poll data set, summarizes it, and 
# outputs data to a filter which warehouses the denormalized data
my $parser = Pipeline(
                "bzcat $rdf |" =>
                $driver        => 
                $warehouser
);

# Parse the poll data
$parser->parse();

# Summarize the poll data
print "Average support level:  ", $driver->average_support_level, "\n";
print "Starting date:  ",         $driver->minimum_date,          "\n";
print "Ending date:  ",           $driver->maximum_date,          "\n";

Between the polls, the XML Schema dictionaries, and the RDF files, we know who the polls contacted, what they saw, and how they responded. A major benefit of keeping the collected information in RDF format is the preservation of historical information. We constructed SQL warehouses to analyze changes in voter support levels over time. This was critical for measuring the effect of events such as presidential debates on voter interest and support.
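
To make the "support over time" idea concrete, here is a minimal sketch of the kind of aggregation such a warehouse query performs, written in plain Perl rather than SQL. The response records are invented for illustration and merely mimic the shape of the hashes passed to add_response in the SAX driver above. The function name average_support_by_date is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical response records, shaped like the hashes the SAX driver
# passes to add_response. The values are made up for illustration.
my @responses = (
    { voter_id => '6372095736', support_level => 1, date => '2005-03-01' },
    { voter_id => '2420080069', support_level => 3, date => '2005-03-01' },
    { voter_id => '6372095736', support_level => 2, date => '2005-03-02' },
);

# Group responses by date and return a hash of date => average support level
sub average_support_by_date {
    my @records = @_;
    my ( %sum, %count );

    for my $r (@records) {
        $sum{ $r->{date} }   += $r->{support_level};
        $count{ $r->{date} } += 1;
    }

    return map { ( $_ => $sum{$_} / $count{$_} ) } keys %sum;
}

my %average = average_support_by_date(@responses);

# ISO 8601 dates sort correctly as plain strings
printf "%s  %.2f\n", $_, $average{$_} for sort keys %average;
```

Comparing the averages for dates on either side of an event, such as a debate, gives a rough measure of its effect.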

Using RDF also provided us with the flexibility to map new data sources as needed. If a vendor collected some information which we had not processed before, they would add an about tag such as <rdf:Description rdf:about="http://www.datavendor.com/ourparty/poll5.xml" />, which we would map to features of our SAX machines as needed.

We added some hooks to the SAX machines to match certain URIs and then process selected element data. Late in the campaign, when early voting started, we were able to quickly modify our existing SAX machines to collect early voting data from the data streams and produce SQL warehouses for analysis.
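
One way to picture such hooks is as a table of URI patterns paired with handlers, consulted whenever an rdf:about value is seen. The sketch below is an assumption about the general shape only. The patterns, handler bodies, the earlyvote URL, and the name dispatch_about are all hypothetical and do not come from the project.

```perl
use strict;
use warnings;

# Hypothetical hook table: each entry pairs a URI pattern with a handler.
# Entries are checked in order, so more specific patterns go first.
my @hooks = (
    [ qr{/earlyvote/}     => sub { "early voting data from $_[0]" } ],
    [ qr{/poll\d*\.xml\z} => sub { "poll data from $_[0]" } ],
);

# Run the first handler whose pattern matches the URI.
# Returns nothing when no hook applies.
sub dispatch_about {
    my ($uri) = @_;

    for my $hook (@hooks) {
        my ( $pattern, $handler ) = @$hook;
        return $handler->($uri) if $uri =~ $pattern;
    }

    return;
}

print dispatch_about('http://www.datavendor.com/ourparty/poll5.xml'), "\n";
```

With this layout, supporting a new kind of data, such as early voting, means adding one entry to the table rather than changing the parser.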

Mechanization of Data Collection

A major focus of the application was retrieving data from remote sources. Certain vendors used our secure FTP site to send us data, but most had web and FTP sites to which they posted the information. We needed a way to collect data from those servers. Some vendors were able to provide data to us in XML and RDF formats, but for the most part, we would receive data in CSV, TSV, or some form of XML. Each vendor generally had supplementary data beyond the normal voter data fields which we also wanted to capture. Using that additional data was not an immediate need, but by storing it in RDF format we could extract it and generate SQL warehouses whenever necessary.

We developed a part of the application known as the spider and created a database table containing information on the data source authentication, protocol, and data structure details. A factory class KSYNC::Model::Spider read the data source entries and constructed spider objects for each data source. These spiders used Net::FTP and LWP to retrieve poll data, and processed the data using the appropriate KSYNC::SAX machine. To add a new data source to our automated collection system, an entry in the database configured the spider, and if the new data source had data in a format that we did not support, we added a SAX machine for that data source.

An example of spider usage:

package KSYNC::Model::Spider;

use strict;
use warnings;

use Carp 'croak';
use base 'KSYNC::Model';

sub new {
    my ($class, %args) = @_;
    
    # Create an FTP or HTTP spider based on the type specified in %args
    my $spider_pkg = $class->_factory($args{type});
    my $self = $spider_pkg->new(%args);

    return $self;
}

sub _factory {
    my ($class, $type) = @_;

    # Create the package name for the spider type
    my $pkg = join '::', $class, $type;
    
    # Load the package
    eval "use $pkg";
    croak("Error loading factory module: $@") if $@;

    return $pkg;
}

1;

package KSYNC::Model::Spider::FTP;

use strict;
use warnings;

use Carp 'croak';
use Net::FTP;
use KSYNC::Exception;

sub new {
    my ($class, %args) = @_;
    
    my $self = { %args };

    # Load the appropriate authentication package via Spider::Model::Auth 
    # factory class
    $self->{auth} = Spider::Model::Auth->new(%{$args{auth}});

    return bless $self, $class;
}

sub authenticate {
    my $self = shift;
    
    # Login. Net::FTP's login returns false on failure rather than dying,
    # so check the return value and throw an exception if problems occurred
    $self->ftp->login($self->auth->username, $self->auth->password)
        or KSYNC::Exception->throw("Cannot login: " . $self->ftp->message);
}

sub crawl {
    my $self = shift;
    
    # Set binary retrieval mode
    $self->ftp->binary;

    # Find new poll data
    my @datasets = $self->_find_new();

    # Process that poll data
    foreach my $dataset (@datasets) {
        eval { $self->_process($dataset); };
        $self->error("Could not process poll data " . $dataset->id, $@) if $@;
    }
}

# Read-only accessor for the authentication object
sub auth { $_[0]->{auth} }

# Read-only accessor for the lazily created Net::FTP connection
sub ftp { 
    my $self = shift;
    croak("Method Not Implemented!") if @_; 
    return $self->{ftp} ||= Net::FTP->new($self->auth->host); 
}

1;

#!/usr/bin/env perl

use strict;
use warnings;

use KSYNC::Model::Spider;
use KSYNC::Model::Vendor;

# Retrieve a vendor so we can grab their latest data
my $vendor = KSYNC::Model::Vendor->retrieve({ 
  name => 'Voter Data, Inc.',
});

# Construct a spider to crawl their site
my $spider = KSYNC::Model::Spider->new( type => $vendor->type );

# Login
$spider->login();

# Grab the data
$spider->crawl();

# Logout
$spider->logout();

1;

Conclusions

In this project, getting things done was of paramount importance. Perl allowed us to deal with the complexities of the business requirements and the technical details of data schemas and formats without presenting additional technical obstacles, as programming languages occasionally do. The CPAN, mod_perl, and libapreq provided the components that allowed us to quickly build an application to deal with complex, semi-structured data on an enterprise scale. From creating a user friendly web application to automating data collection and SQL warehouse generation, Perl was central to the success of this project.

Credits

Thanks to the following people who made this possible and contributed to this project: Thomas Burke, Charles Frank, Lyle Brooks, Lina Brunton, Aaron Ross, Alan Julson, Marc Schloss, and Robert Vadnais.

Thanks to Plus Three LP for sponsoring work on this project.

This Week in Perl 6, April 26 - May 3, 2005

All,

Welcome to another week's summary. This week I shall endeavor not to delete my summary accidentally or destroy the world. Here we go with p6c.

Perl 6 Compilers

Implicit $_ on for Loops

Kiran Kumar found a bug in Pugs involving for loops which use $_ but don't iterate over it. Aaron Sherman and Luke Palmer confirmed the bug. There's no word as to its final status, but given the rate of development of Pugs...

Pugs Darcs Trouble

Glenn Ehrlich noticed that Pugs' Darcs repository wasn't updating. Sam Vilain explained that occasionally a daemon needed kicking.

Memory Game v0.2

BÁRTHÁZI András announced the release of the latest version of Memory. He also put out a call for 85x75 pixel photos for the next version.

Haddock for Pugs

Stuart Cook decided that the easiest way for him to understand Pugs internals was to provide better documentation. To that end he started working with Haddock to automatically generate cross-linked documentation for Pugs. He even met with some success.

is export Trait

Garrett Rooney wondered why the is export trait appeared to do nothing in Pugs. Stevan Little explained that it was just a place holder which, though it parses, does nothing semantically yet.

Pugs 6.2.2

Autrijus proudly announced the release of Pugs 6.2.2. It features many, many changes. High on the list is a bunch of speed ups, and also thread-safe, deadlock-free internal storage.

Pugs on Cygwin

Rob Kinyon noticed that Pugs had trouble on Cygwin. He has made some headway rectifying the situation, although work remains to be done.

Pugs TODO Model

Stevan has put some more thought into the TODO model for Pugs. His latest suggestion, annotating TODO tests with a flag indicating why they are not passing, seems a little less hackish than the last one and received general support.

Parrot Hiding Inside Pugs

Autrijus wanted to embed the newly released PGE, which is PIR code that runs on Parrot. He decided to embed Parrot into Pugs. He also posted an interesting link to JHC as a possible bootstrap solution.

New PGE Released

Maybe I should have mentioned this first. Patrick R. Michaud released a new version of the Parrot Grammar Engine. It is entirely PIR code and generates PIR code. It has many features but not enough tests... <cough>hint</cough>.

Parrot

Monthly Release?

Jared Rhine wondered if the monthly releases included April. Chip announced that April's release would be slushier than most, but would start on the fourth.

t/op/debuginfo.t Failure

François Perrad noticed a failure in debuginfo. Leo pointed out that it was an issue of flushing output handles. François provided a patch (well, actually two patches). Warnock applies to the second.

ParTcl Happy?

Will Coleda thought that ParTcl's GC bugs had finally gone away. Leo burst his bubble. Apparently these GC bugs can disappear and reappear according to sunspot activity.

Segfault in load_bytecode

Nick Glencross submitted a patch fixing a segfault in load_bytecode. Jens pointed out that it should use real_exception instead of internal_exception. chromatic offered to write the test. There is no official committed message though.

Large PackFile Tinker

Leo implemented a change in the interpreter PackFile structure which has been under discussion for a long time. Unfortunately, it has the potential to break a lot of JIT stuff. Tests and fixes would be wonderful.

PMC Inheritance Issue

Nicholas Clark had some trouble with his Perl5 PMCs. Later, he posted a "mea culpa" email, but Leo provided some useful pointers anyway.

RT Cleanup

Bernhard Schmalhofer cleaned out an old ticket from RT.

RFC assign Px, Py

Some time ago, Leo requested comments on the semantics of assign. Brent "Dax" Royal-Gordon tried to de-Warnock the thread with his support. He also suggested a clone operator.

NULL in real_exception

Nicholas Clark struggled with a NULL pointer deref in real_exception. Leo pointed him toward the correct approach.

Unary Operator Overhaul

Having finished overhauling the infix operators, Leo set to work updating the unary operators to provide variants which allocate their results.

Die die, die!

Leo announced that he was removing the die opcode and adding a Bruce Willis opcode in its place. Unlike Mr. Willis, the die_hard opcode actually died relatively easily. Leo then renamed it to just die out of popular demand.

Core Dump with Computed goto Core

Nick Glencross found a core dump with the computed goto core. Leo explained that a hackish attempted optimization caused it. He also fixed it.

Ignore Generated Files in SVN

Juergen Boemmels made SVN ignore some .so files. Leo asked if it could also ignore .rej and .orig files.

DYNSUPER Issues

Nicholas Clark found that DYNSUPER did not work well with his dynamic classes. Leo provided a suggestion for something to try and also suggested a super vtable call.

MMD Type Info

Autrijus wondered how Parrot carries around the type info for MMD. Leo provided answers.

IMC HTTP Server

Markus Amslser wrote a tiny webserver in IMC. This led to the discovery that binary-to-ASCII transcoding is missing. Leo suggested several possible solutions.

Ponie's Use of PMC Flag Bits

Nicholas Clark asked how best to use the 8 private bits of PMC in Ponie. His initial thought was to mark type with it. Leo suggested that adding the type to the flag bits was unnecessary, as one can usually just call a vtable or MMD function directly and the type will work out.

Built-in MMDs as Methods

Leo added support for calling built-in MMDs on objects as methods.

Unary and Infix Ops Update

Leo posted a summary and TODO for his overhaul of opcodes that return new results. Jerry Gay and Bob Rogers helped fill out the corners. Bob even provided some tests.

Parrot and Ref Counting

Robin Redeker wondered why Parrot had decided to go with a non-ref counting GC. Dan Sugalski took responsibility for the decision and explained his motivations. Non-ref counting GCs are essentially simpler, cleaner, and faster. You can get a more detailed answer from reading the thread or Dan's Squawks of the Parrot: What the heck is: Garbage Collection from way back when.

Deprecate fast_call?

Leo mused that he would like to deprecate the fast_call PIR construct. No one squawked, so I call it officially deprecated.

Win32 Thread Primitives

Vladimir Lipsky and Leo went back and forth a few times working out thread primitives for Windows. I am not sure what state they finally reached.

Call Syntax Abstraction

Leo posted a proposal for a calling convention abstraction. No comments have appeared yet, but it has not been up for long.

Tailcalls in PIR

Patrick R. Michaud wondered if PIR supported tailcalls and tailmethods yet. Leo provided a few pointers. Bob Rogers suggested a nice syntax, which Leo implemented.

Vtables in extend.h

Nicholas Clark noted a need for vtables in extend.h. Leo agreed that they should be auto-generated. chromatic eagerly requested the chance to write the Perl code that does this auto-generation.

s/BAILOUT/BAIL_OUT/

Michael G. Schwern announced that he'd removed BAILOUT from Test::Builder in favor of BAIL_OUT. He admitted that a deprecated BAILOUT would remain as undocumented. At some point we should update.

Perl 6 Language

Anonymous Roles and Closures

Aaron Sherman wondered if he could generate anonymous roles that are also closures. It made my head hurt. People seem to think it's possible though.

Peter Piper picked a hash...

Ingo Blechschmidt wondered what pick would return when called on a hash. Rod Adams suggested that it would return a pair. Larry thought that would be a good idea, if perhaps difficult to implement.

All (Junctions, Bad Subjects, Indecision)

Joshua Gatcomb found that he could use junctions to quickly answer questions but not provide specifics. Sadly, he is not the first person to have this problem. Junctions naturally provide Boolean operations but do not naturally explain what motivated that answer. Perhaps they should be thrown out along with the axiom of choice. After all, do we really need a basis for ALL of our vector spaces?

is rw a No-Op on References?

Ingo Blechschmidt wondered if is rw did anything for references, as you can still modify the value to which they refer. Juerd answered basically yes.

Complex Number Package

Jonathan Lang wondered about creating a complex number package, including returning junctions of values for roots of unity. Unfortunately these lists can be quite large, even infinite. Thus, he wondered if he could use lazy junctions. Thomas Sandlaß conjectured that he could by the "law of laziness preservation".

Auto-Threading of Junctions and Threads

Aaron Sherman worried that the auto-threading of junctions would actually run in separate threads. This is most assuredly not the common case, although some warped soul could implement it that way.

Context of Indices

Autrijus asked what context want would provide when used as an index. Larry provided answers.

Sun's Fortress

Autrijus posted a link to a next-generation computer language from Sun called Fortress and then went on to ask about parameterized types, tuple types, and block labels. Bad Autrijus! Make summarizers' lives hard. Then we have to punt like this rather than try to extract the three different threads at once.

Restricting Variable Scopes With while

Joshua Gatcomb wanted to know how he could restrict the scope of variables used in the conditionals of his while statement to the loop. This led to some discussion about the implementation of while as a macro or some other beast. Larry began to ponder the submacro.

FIRST, LAST et al. Support in Pugs

David Christensen decided to try to add support for FIRST, LAST blocks and the like into Pugs. This led him to p6l with some questions about traits which are closures. Answers and suggestions followed.

Junctions of Classes

Ingo Blechschmidt wondered how junctions of classes would act. The consensus seems to be that they act as type specifiers of a sort, restricting whatever the variables they describe can contain.

Subtype Declarations

Luke Palmer wondered what exactly subtype declarations provided and suggested jettisoning them. Larry remained unconvinced.

Labels on Blocks

Stevan Little wondered if there was some way to attach labels to blocks. Discussion ensued, including a comment from Larry that it won't need to be redone.

open and pipe

Gaal Yahas proposed a basic open and pipe built-in for discussion. Discussion followed. Larry mentioned that he has "pretty much" blessed io (from IO::All).

Is if a Function?

Juerd wondered if if as a function had unintended consequences. Fortunately, if is not a function, it is a statement-level construct.

Junctions in use Statements

Juerd wondered if he could say use strict & warnings. Larry explained that there were parsing problems with it and possibly limited utility, but he has not officially ruled it out, as the problems have workarounds.

.foo() == one($_.foo(), $?SELF.foo())

This great quandary continues. This week, Larry is leaning towards the .foo() == $_.foo() camp, but nothing is yet set. He also suggested $^ as being equivalent to $?SELF (which I really like). Time will tell how this one will work out. My prediction is that the argument will rage on until Pugs is ready to implement it and needs a definitive answer...then it will change a few more times.

Type System Questions

Autrijus asked a few questions about the type system, and at which times it did what things. Larry provided some answers.

The Usual Footer

Posting via the Google Groups interface does not work. To post to any of these mailing lists please subscribe by sending email to perl6-internals-subscribe@perl.org, perl6-language-subscribe@perl.org, or perl6-compiler-subscribe@perl.org. If you find these summaries useful or enjoyable, please consider contributing to the Perl Foundation to help support the development of Perl. You might also like to send feedback to

This Week in Perl 6, April 20 - 26, 2005

The Perl 6 Summary for the Week Ending 2005-04-26

It's my turn again. What fun.

"What," I hear you all ask, "has been going on in the crazy, mixed-up world of Perl 6 design and development?" Read this summary and, beginning with perl6-compiler, I shall tell you.

This Week In perl6-compiler

Refactoring Test.pm

Stevan Little had an idea while he was refactoring Test.pm. He wondered whether to get rid of the various todo_* functions in favour of just using a t/force_todo file. This led him to wonder about doing away with t/force_todo in favour of a force_todo() function. He asked for opinions before he started making the change (which isn't exactly a refactoring).

General opinion seemed favorable, though I confess to feeling perturbed by the proposed release trick of proclaiming all failures, whether expected or not, to be TODOs. The current system generates an explicit list of tests that fail on "core" systems. The proposed solution seems to make all failures equal, so that even unexpected "platform" failures would not be caught.

Weird Thing with say ++$

What do you know, say $i++, ++$i behaves weirdly.

Pugs 6.2.1 Released

Autrijus announced the availability of Pugs 6.2.1 which comes complete with much shininess.

This Week in perl6-internals

Parrot Common Lisp

Cory Spencer's port of Common Lisp to Parrot received much admiration (it has some way to go before it's really Common Lisp, but it's a cracking start). Uwe Volker suggested porting Emacs to it, and Lars Balker Rasmussen promptly accused him of being Erik Naggum.

Cory acquired (or is acquiring) a committer bit.

State of the Tcl

Will Coleda gave the list a heads up on the state of ParTCL, the Parrot TCL port. It still fails some tests, apparently because of GC issues.

A few days later, these problems went away. We're not quite sure how, though.

alarm() and later()

Leo stated that Parrot provides subsecond timer resolution as well as alarm callbacks and multiple timers. Hurrah!

RFC assign Px, Py

Leo posted a discussion of the semantics of assign and set, with a proposed change to PIR syntax. Warnock applies.

RFC Unary Operations

In another RFC, Leo discussed changes to Parrot's unary operators and proposed other changes.

One More MMD: Assignment

Dan noted that he was writing a great deal of code in his set_pmc vtable methods that looked very MMD-like. He suggested that adding assignment to the list of MMD functions might be a good idea. Leo pointed him at his assignment RFC.

Fun with morph()

Nicholas Clark wondered about the responsibilities of the morph method with respect to handling PMC_struct_val. In the subsequent discussion it became apparent that morph can grow complicated. Bob Rogers supplied a bunch of extra complications and wondered about the feasibility of making Parrot morph-free. Leo agreed that it seemed feasible and is probably a good idea. Another subthread made my head hurt; I can understand this stuff much better when I sit 'round a table with people and we have a plentiful supply of drinks, notepaper and, in Leo's case, industrial quantities of tobacco. (Ah... YAPC::Paris!)

Building An Incomplete Code Generator into Parrot

rocko@adampreble.com (that's the only name I have) has started work on implementing a JIT backend for the AMD64 processor. He asked a few questions and Leo provided answers.

Calling Convention Abstraction

This thread continues to rumble on. Leo said that what he wants is for the HLL folks to create a workable scheme for abstract and extendable calling conventions that could express all the various HLL-specific semantics of function calling. He pointed out that unless we have this we can forget interoperability (or at least easy interoperability).

Alpha Development Box

Bob Rogers "has" an Alpha development box that open source projects can use. He wondered if the Parrot project could make use of it, and if so what was the best way of doing this. Some discussion occurred on the list, but I assume (hope) more happened offline.

Meanwhile, in perl6-language...

Calling Junctions of Closures

Brad Bowman wondered about calling junctions of closures. He guessed that the rule is "call 'em all and return a similarly structured junction", but wasn't sure. Thomas Sandlaß wasn't so sure.

My head hurts.
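
As a rough illustration of Brad's guessed rule, here is a toy model in Python rather than Perl 6. The Junction class and the junction helper are invented for this sketch and are not how Pugs or any synopsis defines junctions; the only point is that calling the junction calls every member and wraps the results in a junction of the same kind.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Junction:
    """Hypothetical stand-in for a Perl 6 junction (not a real API)."""
    kind: str                 # "any", "all", "one" or "none"
    members: Tuple[Any, ...]

    def __call__(self, *args):
        # Call every member with the same arguments and keep the kind.
        # A member that is itself a Junction is callable too, so nested
        # structure is preserved.
        return Junction(self.kind, tuple(m(*args) for m in self.members))


def junction(kind, *members):
    """Hypothetical helper that builds a Junction from its members."""
    return Junction(kind, members)


def double(x):
    return x * 2


def succ(x):
    return x + 1


result = junction("any", double, succ)(10)
print(result)  # Junction(kind='any', members=(20, 11))

nested = junction("all", junction("any", double), succ)(1)
print(nested)
```

Whether Perl 6 should actually behave this way is exactly what the thread left open.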

{ => } Autocomposition

Autrijus asked about the following fragment:

%ret = map { $_ => uc $_ }, split '', $text;

which gives Pugs a massive headache.

From the ensuing discussion, it appears to cause Larry and Autrijus headaches, too. Also, it turns out that Larry's Perl 5 to Perl 5 translator has both madprops and madskills. Hurrah! Darren Duncan suggested that

%ret = map:{ $_ => uc $_ }, split '', $text;

should serve to say that the block is a block rather than a hash constructor. He thought that this came from one of the synopses, but couldn't remember which.

I wonder if:

%ret = map -> {$_ => uc $_}, split '', $text; 

wouldn't do the job. (Or did the syntax change on me when I wasn't looking?)
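
To make the two readings of the braces concrete, here is a small Python analogy. Python has no such ambiguity and none of this is Perl 6 syntax; the variable names are made up for the sketch. In the first reading the braces are code run once per element, and in the second they build a single hash value, which is not something map can call.

```python
text = "abc"

# Reading 1: the braces are a block, run once per character, each run
# yielding a (key, value) pair that ends up in the hash.
as_block = dict(map(lambda ch: (ch, ch.upper()), text))
print(as_block)  # {'a': 'A', 'b': 'B', 'c': 'C'}

# Reading 2: the braces are a hash constructor, evaluated once.
ch = "a"
as_hash = {ch: ch.upper()}
print(callable(as_hash))  # False
```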

Embedding Languages in Perl 6

BÁRTHÁZI András had some questions about introducing different parsing rules in the middle of a Perl 6 program. Larry's answer was essentially, "All's fair if you predeclare", but with an interesting idea about using ` as a way of introducing a "self-terminating construct". So one could do:

use XML;

$a = `<elems><elem>Content #1</elem><elem>Content #2</elem></elems>;

or

use SQL;

$a = `select * from table`;

Various possibilities came up, but nothing set in stone.

Closure/Block/Sub Multiplier

Matt Creenan wondered about doing @names = &get_next(...) XX 5; (which, obviously, would call &get_next five times and shove the results into @names). Juerd pointed out some subtleties to do with functions that return closures. Later in the thread he decided that he controlled both the horizontal and the vertical, with particular reference to redefining true, false, and undef.
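
The distinction Matt is after, repeating the call rather than repeating one result, can be sketched in Python. This is an analogy only: repeat_call and this get_next are hypothetical names for the sketch, and nothing here says how XX would be specified.

```python
import itertools

counter = itertools.count(1)


def get_next():
    """Hypothetical source of values: returns 1, 2, 3, ... on each call."""
    return next(counter)


def repeat_call(thunk, n):
    # Call the function n times and collect each result.
    return [thunk() for _ in range(n)]


names = repeat_call(get_next, 5)
print(names)  # [1, 2, 3, 4, 5]

# By contrast, calling once and repeating the value gives five copies.
repeated_value = [get_next()] * 5
print(repeated_value)  # [6, 6, 6, 6, 6]
```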

alarm() and later()

Remember the discussion of alarm and later in perl6-internals? It moved over to perl6-language. Larry agreed that Perl 6's time interfaces will favor floating point time values, but he wasn't quite sure if any of them will have the name alarm. Discussion ensued, both on the topic and on matters of naming style.

$?OS Globals, Etc.

Scott McWhirter proposed making the various $?OS, etc. variables into attributes of some GLOBAL class. Larry thought it was a good idea, but wasn't sure it was entirely right as proposed. This area is still under design.

Parens v. Subroutine Parameter

Autrijus had a question about how function signatures and various forms of paren magic interacted. He, Juerd, and Larry thrashed things out.

-Xs Auto-(Un)quoting

Michele Dondi had some questions/suggestions about the various file test operators. Larry answered and the thread spun off into a discussion of all sorts of aspects of these handy operators. Well, that was before it turned into a discussion of the semantics of <.foo>, or ^foo, or maybe _foo, or possibly ....foo. At this point, things grew a little heated. There's no decision yet. (Personally, I'm a fan of the scheme as originally proposed; .foo calls the method foo on the current topic, whatever that may be. If you need to hang on to old topics, give them a name. I appear to be in something of a minority on this.)

Unify cwd

It turns out that the "current working directory" isn't as obvious as it sounds. It also turns out that Larry would like to be able to pretend that it is until it turns out not to be.

Blocks, Continuations, and eval

When last we saw this thread, Larry had said that continuations would be available in Perl for people who ask specially, but that he wouldn't leave them lying around in the open where "some poor benighted pilgrim might trip over them, unaware." Wolverian asked what the interface would be. Larry thought it would probably start use Continuations;, or possibly use CONTINUATIONS;.

The thread prompted Stéphane Payrard to ask about the possibility of some of the more "out there" functional programming tricks making it into Perl 6. Once again, all's fair if you predeclare, but it looks like Perl 6 already has core access to some pretty "out there" stuff.

Accepted Abbreviations

Juerd wondered if we could compile a list of standard abbreviations for various terms so as to apply them consistently. He kicked off with a list of his own. There was some discussion, but I somehow doubt that people will use his list rigorously.

Thunking Semantics of :=

Once someone starts to implement a language, you have a wonderful driver for design decisions that need to be made and for finding ambiguities that need to be ironed out. On this occasion, Autrijus needed some clarification of the semantics of the binding operator, :=. Now we have ironed out ambiguities and written implementations (and yes, I do mean implementations.)

for all(@foo) {...}

Brad Bowman had questions about the workings of for all(@foo) {...} based on S03. It turns out that the Synopsis is wrong. Larry explained how it really should work (which is how it already works in Pugs).

Lazy Lists + Mutable Arrays + Garbage Collection

Brad also had questions about the workings of lazy lists. In particular, he wondered about treating streams as mutable arrays. Warnock applies.

map { $_ => uc $_ }, @foo Again

Autrijus proposed a cunning plan to deal with the ambiguities inherent in:

map { $_ => uc $_ }, @foo;

by suggesting that using a block without the comma should force the interpretation of said block as a block rather than as a hash constructor. Larry wasn't sure, arguing that it was best to disambiguate with something just before or just inside the block (in the same way that pattern modifiers now go before the pattern.)

Passing Hash to a Sub Expecting Named Params

Carl Franks wondered if he could pass a splatted hash (*%hash) to a function that expects named arguments. Answer: yep.
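
For comparison, Python offers the same convenience with its ** syntax. This is an analogy in another language, not Perl 6; the connect function and its parameters are invented for the sketch.

```python
def connect(*, host, port=5432):
    """Hypothetical function that accepts only named (keyword) arguments."""
    return f"{host}:{port}"


options = {"host": "localhost", "port": 8080}

# Unpack the dictionary so each key is matched to a named parameter.
print(connect(**options))  # localhost:8080
```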

Turning Off Warnings For a Function's Params?

David Storrs wanted to be able to selectively turn off some warnings when he's testing. He asked how to go about doing it. Luke and Juerd provided some answers.

How do I Tie Hashes/Arrays?

Discussion of how to tie hashes and arrays continued.

Surprising consequences

Juerd worried that code like if($foo) { say 'foo'} would throw syntax errors. It turns out that one of his givens wasn't quite as given as he thought, so it's not a syntax error.

Calls and Parens

Juerd posted a set of examples of the new rules for parsing parentheses in function calls and asked which of his assumptions were wrong. Luke Palmer reassured him.

Context and Index Expressions

Autrijus posted a set of examples of array indexes and asked if he had all the contexts right. There was no answer at the time of this writing.

Hmm...still fun. I could get used to this.

If you find these summaries useful or enjoyable, please consider contributing to the Perl Foundation to help support the development of Perl.

Or, you can check out my website. Maybe now that I'm back to writing stuff I'll start updating it.

There are also vaguely pretty photos by me.

See you all in a fortnight.
