July 2001 Archives

The State of the Onion 5

Larry's State of the Onion presentation this year was, as every year, completely different from previous years. As he noted, previous talks have been about chemistry, theology, biology, and music; this year, for once, Larry actually talked about Perl.

And this year the format was rather different. Based on the success of the lightning talks at previous Perl conferences, Larry decided to adopt this for his presentation - he gave us thirty-three lightning talks of fifty-five seconds each, with his daughter Heidi ruthlessly manning the bell. So ruthless, in fact, that Larry had to encourage us to laugh quickly to avoid cutting into his talking time.

Perl 6 and Apocalypses

Larry Wall during his State of the Onion speech at TPC 5. Larry's timekeeper for the lightning rounds was Heidi Alayne Wall. For more on TPC 5 and the O'Reilly Open Source Convention, visit O'Reilly Network's conference coverage page.

Photo courtesy D. Story/J. Blanchard/O'Reilly Network

Larry talked about both Perl 5 and Perl 6. He noted that many talented people were putting dedication and love into Perl 5, and Perl 5 is doing great. So much for Perl 5.

Perl 6 was obviously the major focus of Larry's talk, with each lightning talk laying out the major points of an Apocalypse. As before, Apocalypses mirrored chapters of the Camel book, Programming Perl.

Larry hinted that he'd already recanted part of the second Apocalypse; the third Apocalypse was due to come out, but Larry got sick. On the other hand, this gave him a chance to go slowly, and to do it right.

So his lightning talks really began with Apocalypse three, about operators. Larry noted that there had been a number of very specific proposals, but he wanted to concentrate on generalities. He also said that he was wavering on the idea of user-defined operators, and particularly Unicode operators. The -> operator will become . (start accepting that now). This will mean that the concatenation operator will have to change, and it will probably become ~.

The bell tolled, and so Larry had to move on to talk four - control structures. To loud applause, he announced that Perl 6 would include a switch statement; to some bemusement, however, he let on that it would be called "given" - case statements would be called "when". Another notable renaming: "local" will become "temp".

Larry reiterated the need for optional type and property declarations; this isn't the same as typing, but it's a way of specifying metadata about a variable or subroutine. As you supply Perl with more metadata about variables, it will gain efficiency in terms of storage and manipulation. As the bell went, Larry was just explaining how 90% of code wouldn't get that much faster but...

Regular Expressions, Formats, Packages and Modules

Larry's been thinking a lot about direct assignment to variables from within a regular expression, but has decided this isn't the real problem - the real problem is that people want to build up structures of anonymous hashes or arrays from a regular expression.

There will be set operations on character classes: for instance, you'll be able to specify a match of all letters without the vowels. Actually, this will probably be in Perl 5 as well. There was also time to tell us that he'd like the /x modifier to be on by default.

The next talk was about subroutines. Prototypes will be extended into a complete type signature system; the sub keyword would be discarded on closures - in fact, any curly braces which are not immediately recognisable will be assumed to be a closure. Parameters to closures will be "self-identifying", such as $^a and $^b.

Larry said that formats would be a module, so he didn't have to say anything more about it.

The reference talk opened to the statement that pseudo-hashes must die; there was loud applause. Larry reiterated that dereferences will be assumed when a scalar is followed by brackets or braces - that is to say, what is now $foo->[$a] will be reduced to $foo[$a] in Perl 6. This will require prohibiting whitespace before hash subscripts.

In terms of data types, there will be compact arrays; pseudo-hashes will be replaced by opaque objects with named parameters. The => operator will now create a pair type; the range operators will create a slightly different type of pair, which gets expanded out to a range on demand.

There will be a distinction between classes and modules. Within a class or a module, there will be subpackages: there will be no more need to type out Myclass::SubclassA::SubclassB; just like Unix has relative pathnames and directories, Perl will have modules that can be specified as relative namespaces.

On the theme of modules, module names would be extended to include metadata on the version and the author's name. The default use statement would allow the version and author's name to be wildcards. There will also be virtual interface modules, and better automation of documentation based on module metadata.

Objects should be easy to declare and have accessible metadata. When you put an attribute onto the module, it will appear as a variable inside of the class; outside of the class, it can be accessed as a method. There will also be optional multimethods, and syntax like Class.bless($ref) to bless a reference into a class.

Addressing Concerns

Larry mentioned that overloading was a headache; you hate it in languages that have it, and you hate it when languages don't have it. Sometimes there are gratuitous abuses of overloading, such as C++'s left-shift operator to add more arguments to a function. The overloading system will be a lot cleaner, by specifying operators as special method names. The vtable system will be leveraged to provide overloading on objects. There will also be overloading hooks in printf so that new formats can be defined.

Larry said that many of the proposals about tied variables missed the point; what is tied is the variable, and not the value. Tying needs to be naturally scoped to the variable's lifetime, and tying needs to be declared in advance for efficiency.

He also begged for compassion towards programmers when dealing with Unicode; we don't want to force people to have to deal with Unicode if they don't want to, and equally we don't want to leave Unicode people out in the cold. Strings need to be completely polymorphic, with internal routines able to specify what type of strings they're able to cope with. Normalization will normally be done at the filehandle level, and the type system must remember whether or not some data has been normalized.

The IPC talk asked for "no pain" installation of new protocols; there will be easy mapping of high-level structured data, such as XML-RPC or SOAP, from the network onto Perl's internal data structures. Perl 6 will continue to have safe signals. IPv6 will be supported.

The thread model will basically be ithreads, the new model in 5.6.0; variables may be shared by declaration, but will not be shared by default. The Pthread model will mean "share everything". Modules ought to be thread safe by default - they should declare their thread-safety or otherwise in their metadata.

Perl will have a parser of its own, written in Perl. This will allow us to bytecompile the parser, and also have the parser modifiable. It'll also help us port eval to Perl running on small virtual machines. Lexical analysis will remain as a one-pass process, and subroutines will be compiled immediately on parsing.

The command line interface will not dramatically change; only one RFC concerned it. Larry stressed Perl's role as a glue language, and that it must cooperate with its environment.

Larry also had a shameful confession; he wrote the Perl debugger, but hardly ever uses it - he's more a print-statement kinda guy. Hence, he's happy to delegate the writing of the debugger to other people. He did, however, point out the heavy dependency a debugger has on the debugging facilities of the platform it runs on.

He's also punting on the internals, leaving the details of that to Dan Sugalski, who's talking tomorrow. However, he said that the internals will be much more modular; they will comprise a software CPU, and regular expressions will compile down to normal Perl opcodes. It will implement a variety of garbage collection models, and will use vtables to despatch operations on values.

What about CPAN?

CPAN got too big to download, and there's a problem that ISPs never install enough of it for what their users want. Bundles are a partial solution to this, but of late there has been more interest in the development of an SDK for Perl.

Tainting will be implemented by the new property method, and sandboxing will be achieved by using separate interpreter threads.

Perl 6 will attempt to remove some common goofs: Perl 5 has already stopped taking up huge quantities of memory when you say foreach $i (1 .. 1_000_000_000) { ... };, but Perl 6 will also apply the same optimization to @big = 1 .. 1_000_000_000;. There'll also be support for embedding your tests in POD.

Speaking of POD, =begin and =end will work for commenting because now =end will go back to the previous state rather than going to POD. Larry wants multiple POD streams, noting that the DATA filehandle is essentially just another POD stream.

Perl culture has been mostly self-correcting; the Perl 6 announcement drove people to work together more strongly and fix up some of the flaws of the community. However, Larry encouraged everyone to do their part; newbie friendliness is one thing, but it's important to hold yourself to higher standards than those to which you hold others.

The cleanup of special variables will be an exercise in balancing cleanup with convenience. For instance $_ will obviously be staying; $(, on the other hand, will go. This allows us to free up $( ... ) for interpolating expressions. Bareword filehandles will go, and error status variables may be merged.

Built-in functions. For functions with terribly long return lists, such as stat, Perl will return modules; some subset of the proposed array operators (merge, partition, flatten, etc.) will be included in the core. The logical return values from functions such as system will be made more sensible; select will be removed.

The standard Perl library will be almost entirely removed. The point of this is to force ISPs to believe that Perl is essentially useless on its own, and hence they will need to install additional modules. Pragmatic modules will have more freedom to warp Perl's syntax and semantics, and will also provide real optimization hints.

Larry suggested that the standard module library could be downloaded basically on demand; there will be a few modules which support basic built-in functionality and their documentation. Perl 6 should deal with I18N and L10N issues, and also support the sort of exception handling that you would expect from languages of its type.

"There are 35 seconds left. Any questions?"

So that's the rudiments of the design of Perl 6. Over the coming months, we'll see the rest of the Apocalypses, backed up by Exegeses from Damian, and hopefully, eventually, code from Dan and the rest of the team. We'll bring you our analysis of how the Perl 6 language is shaping up when we all get home from the conference!

Mail Filtering with Mail::Audit

Let's face it. procmail is horrid. But for most of us, it's the only sensible way to handle mail filtering. I used to tolerate procmail, with its grotesque syntax and its less-than-helpful error messages, because it was the only way I knew to separate out my mail. One day, however, I decided that I'd been told "delivery failed, couldn't get lock" or similar garbage for the very last time, and decided to sit down and write a procmail replacement.

That's when it dawned on me that what I really disliked about procmail was the recipe format. I didn't want to handle my mail with a collection of colons, zeroes, and single-letter commands that made sendmail.cf look like a Shakespearean sonnet; I wanted to program my mail routing in a nice, high-level language. Something like Perl, for instance.

The result is the astonishingly simple Mail::Audit module. In this article, we'll examine what we can do with Mail::Audit and how we can use it to create mail filters. We'll also look at the News::Gateway module for turning mailing lists into newsgroups and back again.

What Is It?

Mail::Audit itself isn't a mail filter - it's a toolkit that makes it very easy for you to build mail filters. You write a program that describes what should happen to your mail, and this replaces your procmail command in your .forward or .qmail file.

Mail::Audit provides the functionality for extracting mail headers, bouncing, accepting, rejecting, forwarding, and filtering incoming mail.

A Very Simple Mail Filter

Here's the simplest filter program we can make with Mail::Audit.

    use Mail::Audit;
    my $incoming = Mail::Audit->new;
    $incoming->reject;

If you save this as ~/bin/chuckmail, then you can put the following in a .forward file:

    |~/bin/chuckmail

or in a .qmail file:

    preline ~/bin/chuckmail

Every mail message you receive will now pass through this program. The mail comes into the program via standard input, and the new() method takes it from there and turns it into a Mail::Audit object:

    my $incoming = Mail::Audit->new;

Next, we bounce it as undeliverable:

    $incoming->reject;

We could even get fancy, and supply a reason with the bounce:

    $incoming->reject(<<EOF);
        The local user was silly enough to leave chuckmail as his
        mail filter.  Too bad you can't mail him to let him know.
    EOF

This reason will be relayed back to the sender as part of the bounce message.

Separating Mail Into Folders

The one thing most people use procmail for is to separate mail out into several mail folders. Here's an example of how we'd do this:

    use Mail::Audit;
    my $item = Mail::Audit->new;

    if ($item->from =~ /perl5-porters/) {
        $item->accept("/home/simon/mail/p5p")
    }

    $item->accept;

Now any mail with perl5-porters in the From: line will be added to the file mail/p5p under my home directory. Any other mail will be accepted into my inbox as normal.

Two things to note here:

  • Once the mail has been filed to mail/p5p via accept(), it leaves the program. Game over, end of story. The same goes for the other methods such as reject(), pipe(), and bounce().
  • The last line in the program should probably be an accept() call; mail that reaches the end of the program without being deposited in a mailbox or rejected will be silently ignored. (This may change to an implicit accept() in a later version, to be more procmail-like.)

If you've got a few mailing lists or people you want to filter, you could do this:

  use Mail::Audit;
  my $item    = Mail::Audit->new;
  my $maildir = "/home/simon/mail/";

  my %lists = (
      'perl5-porters'  => "p5p",
      helixcode        => "gnome",
      uclinux          => "uclinux",
      'infobot\.org'   => "infobot",
      '@dion\.ne\.jp'  => "yamachan"
  );

  for my $pattern (keys %lists) {
      $item->accept($maildir.$lists{$pattern})
          if $item->from =~ /$pattern/ 
             or $item->to =~ /$pattern/;
  }

  $item->accept;

This time, we perform a regular expression match to see if either the From: line or the To: line match any of the patterns in our hash keys, and if they do, direct the mail to the corresponding folder. Since we're using ordinary Perl regular expressions, we can do this sort of thing:

  '\bxxx.*\.com$'  => "spam"

(And you'd be surprised at quite how much junk mail that one traps.)
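
One caveat with the hash-based loop above: keys %lists returns the patterns in no particular order, so a message that matches two patterns could end up in either folder. Here is a minimal sketch of an order-preserving alternative. It works on plain strings rather than a Mail::Audit object so that it can be run on its own, and @rules and folder_for are names invented for this illustration, not part of Mail::Audit.

```perl
use strict;
use warnings;

# Ordered list of [pattern, folder] pairs; earlier rules win.
# (Illustrative names and rules, not part of Mail::Audit.)
my @rules = (
    [ qr/perl5-porters/,  "p5p"      ],
    [ qr/infobot\.org/,   "infobot"  ],
    [ qr/\@dion\.ne\.jp/, "yamachan" ],
    [ qr/\bxxx.*\.com$/,  "spam"     ],
);

# Return the folder of the first rule matching From: or To:,
# or nothing at all if no rule matches (i.e. deliver to the inbox).
sub folder_for {
    my ($from, $to) = @_;
    for my $rule (@rules) {
        my ($pattern, $folder) = @$rule;
        return $folder if $from =~ $pattern or $to =~ $pattern;
    }
    return;
}

my $folder = folder_for('perl5-porters@perl.org', 'simon@example.org');
print "File under: $folder\n";
```

In a real filter, the returned folder name would be appended to $maildir and handed to accept().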

Here's another simple but remarkably effective spamtrap recipe:

  $item->accept("questionable")
      if $item->to !~ /simon/i and $item->cc !~ /simon/i;

We check the To: and CC: headers for my name, and if it's in neither, the mail probably isn't for me. This one only makes sense after we've filtered out mailing list messages, which could validly be sent from a subscriber to a generic list address.

Mail and News

I much prefer reading mailing lists as newsgroups; while a good mail client like mutt can display mail as threaded discussions, I personally prefer navigating in a newsreader. So, how do we gate mailing lists to newsgroups and back? Russ Allbery's News::Gateway module helps do just that - it provides a program called listgate which takes an incoming mailing list message, reformats it as a valid news article, and then posts it to the news server. We can plug this into our mail filter quite easily; assuming we've got the group lists.p5p set up on the local news server and we've configured listgate appropriately, we can just say:

  $item->pipe("listgate p5p") if $item->from =~ /perl5-porters/;

Again, if we've got multiple groups, we can use a hash to correlate patterns to groups as we did with mailing lists above.

So much for getting incoming mail to news - what about getting posted articles back into the mailing list? The key to this is in the newsgroup moderation system - when you post to a moderated newsgroup, the article is mailed to a moderator for approval. If we set the moderator of lists.p5p to the list address, we can get our outgoing posts sent to the list. In /usr/news/etc/moderators, you'd say

    lists.p5p:  perl5-porters@perl.org

Very easy. The only problem is that it doesn't work. Mail messages and news articles have a slightly different format, and some mailing list managers will reject mail messages that look like news articles. So we need to send our message through a clean-up phase first. Instead of sending it to perl5-porters@perl.org, we'll instead send it to news-outgoing@localhost:

    lists.*:    news-outgoing@localhost

Mail arriving at that account needs to go through another Perl program to clean up and dispatch the outgoing article, and that looks like this:

    #!/usr/bin/perl
    use News::Gateway;
    my $gw=News::Gateway->new(0);
    $gw->modules( 'newstomail', 'headers');
    $gw->config_line("newstomail /home/simon/bin/news2mail.h");
    $gw->config_line("header newsgroups drop");
    $gw->config_line("header organization drop");
    $gw->config_line("header nntp-posting-host drop");
    $gw->read(*STDIN) or die $!;
    $gw->apply();
    $gw->mail();

This reads an article from standard input, drops the Newsgroups, Organization, and NNTP-Posting-Host headers, reformats it as a mail message using the configuration file /home/simon/bin/news2mail.h to find the address, and then sends it. That config file is just a list of newsgroups and the addresses they belong to:

    lists.p5p perl5-porters@perl.org
    lists.tlug tlug@tlug.gr.jp
    lists.advocacy advocacy@perl.org
    lists.linux-kernel linux-kernel@vger.rutgers.edu
    lists.perl-friends perl-friends@perlsupport.com
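
To make the shape of that mapping concrete, here is a small sketch that reads lines in the same "newsgroup address" format into a hash. It is only an illustration of the file format and says nothing about how News::Gateway parses the file internally; %address_for is a name made up for the example.

```perl
use strict;
use warnings;

# Sample mapping in the "newsgroup address" format shown above.
my $config = <<'END';
lists.p5p perl5-porters@perl.org
lists.tlug tlug@tlug.gr.jp
END

# Build a newsgroup => address lookup table, one entry per line.
my %address_for;
for my $line (split /\n/, $config) {
    next unless $line =~ /\S/;                 # skip blank lines
    my ($group, $address) = split ' ', $line, 2;
    $address_for{$group} = $address;
}

print "$address_for{'lists.p5p'}\n";   # perl5-porters@perl.org
```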

So here's the recipe for filtering news to mail and back again:

  • Incoming messages will be trapped by a rule in your mail filter, and be piped to listgate via a line like

        $item->pipe("listgate p5p")
            if $item->from =~ /perl5-porters/;

    listgate will then post them to your news server, to the group lists.p5p.

  • Outgoing articles will be sent to the moderator address, news-outgoing@localhost, for cleanup. The cleanup program will drop unnecessary headers, reformat as a mail message, and then look at the configuration file to determine where to send them on. They'll be sent to the mailing list, and sometime later will be returned to you by mail, to appear in the newsgroup as above.

A Complete Filter

Here, to show off exactly what I do with Mail::Audit, is a suitably anonymized and annotated version of the filter I currently use to process my incoming mail.

    #!/usr/bin/perl
    use Mail::Audit;
    $folder = "/home/simon/mail/";

Anything that actually reaches me is going to be logged so that I can tail -f a summary of incoming mail to one of my terminals.

    open (LOG, ">>/home/simon/.audit_log");  # append, so earlier entries survive

Read in the new mail message, and extract the important headers from it:

    my $item = Mail::Audit->new;
    my $from = $item->from();
    study $from;
    my $to = $item->to();
    my $cc = $item->cc();
    my $subject = $item->subject();
    chomp($from, $to, $subject);

If I'm likely to be at the office, I appreciate a copy of all mail I receive, in case there's something I need to deal with immediately. So I need time-controlled filtering. Try doing this with procmail:

    my ($hour, $wday) = (localtime)[2,6];
    if ($wday != 0 and $wday != 6          # Not Saturday/Sunday
        and $hour >= 9 and $hour < 18) {   # Between 9am and 6pm
        print LOG "$subject: $from: Bouncing to work\n";
        $item->resend('simon@theoffice.com');
        # resend is the only action
        # which doesn't end the program.
    }
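
As an aside, boundary conditions in a time test like this are easy to get subtly wrong, so it can help to pull the check into a small function that can be exercised with fixed values. The sketch below is a standalone illustration, not part of the filter itself; at_office is a hypothetical name.

```perl
use strict;
use warnings;

# True on Monday-Friday from 09:00 up to (but not including) 18:00.
# $wday follows localtime's convention: 0 = Sunday ... 6 = Saturday.
sub at_office {
    my ($hour, $wday) = @_;
    return 0 if $wday == 0 or $wday == 6;
    return ($hour >= 9 and $hour < 18) ? 1 : 0;
}

my ($hour, $wday) = (localtime)[2, 6];
my $action = at_office($hour, $wday) ? "forward to work" : "keep at home";
print "$action\n";
```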

One of my users didn't have their own email address for a while, so they had their friends send mail to me instead. Now they have their own address, so the mail is bounced across to them:

    $item->bounce('ei@somewhere.com') if $subject =~ /^For Ei:/;

I maintain two FAQs: the perl5-porters FAQ and the Tokyo high speed connectivity FAQ. The mail comes to different email addresses, but it all ends up at my box. They need to go in separate folders.

    $item->accept("$folder/p5p-faq")   if $to=~ /p5p-faq/;
    $item->accept("$folder/tokyo-faq") if $to=~ /faq/;

I get some mail in Greek which needs to be processed with metamail to sort out the character sets. The pipe method squirts the mail to a separate program:

    $item->pipe("metamail -B -x > $folder/greek")
        if $from =~ /hri\.org$/;

Some people I definitely want to hear from, so they get accepted at this stage to save time:

    for (qw(goodguy dormouse locust)) {
        if ($from =~ /$_/) {
            print LOG "$from:$subject:Exception, accepting into inbox\n";
            $item->accept;
        }
    }

Some people I very definitely do not want to hear from:

    for (qw(badguy nasty enemy)) {
        if ($from =~ /$_/) {
            print LOG "$from:$subject:Dumped\n";
            $item->reject("Go away! Stop emailing me!");
        }
    }

Some people or mailing lists I currently just don't have time for, so they get silently ignored:

    for (qw(freshmeat.net microsoft news\@myhost cron)) {
        if ($from =~ /$_/) {
            print LOG "$from:$subject:Ignored\n";
            $item->ignore;
        }
    }

Some mailing lists I want to stay as lists:

    my %lists = (
        "pound.perl.org"	=> "purl",
        "helixcode"      	=> "gnome",
        "uclinux"        	=> "uclinux",
        "infobot"        	=> "infobot",
        "european-"      	=> "yapc",
        "tpm\@otherside" 	=> "tpm",
        "hellenic"       	=> "greeknews",
    );
    for my $what (keys %lists) {
        next unless $from =~ /$what/i or
            $to =~ /$what/i or $cc =~ /$what/i;
        my $where = $lists{$what};
        print LOG "$from:$subject:List, accepting to folder $where\n";
        $item->accept($folder.$where);
    }

And some I want to pipe to listgate as newsgroups:

    my %gated = (
        "tlug"	=> "tlug",
        "advocacy"	=> "advocacy",
        "security-sig"	=> "security",
        "iss.net"	=> "security",
        "securityfocus"	=> "security",
        "perl5-porters"	=> "p5p",
        "linux-kernel"	=> "linux-kernel",
        "perlsupport"	=> "perl-friends",
    );

    for my $what (keys %gated) {
        next unless $from =~ /$what/i or
            $to =~ /$what/i or $cc =~ /$what/i;
        my $where=$gated{$what};
        print LOG "$from:$subject:Gated to lists.$where\n";
        $item->pipe("/usr/local/bin/listgate $where");
    }

Some spammers just don't give up, so we actually reject their messages. We do this based on subject, which is a bit risky but seems to work:

    for ("Invest", "nude asian") {
        $item->reject("No! Go away!")
            if $subject =~ /\b$_\b/;
    }

Before we let the article in to the inbox, there's a long list of patterns at the end of the program which match known spam senders. We check the incoming mail against this list, and save it for analysis and reporting:

    while (<DATA>) {
        chomp;
        next unless $from =~ /$_/i or $to =~ /$_/i;
        print LOG "$from:$subject:Spam?\n";
        $item->accept($folder."spam");
    }
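
The pattern list itself isn't shown here; in the filter it would sit after a __DATA__ token at the very end of the file, one regular expression per line. The standalone sketch below imitates that with an in-memory filehandle so that it can be run and tested on its own. The patterns and the looks_like_spam name are invented for the example.

```perl
use strict;
use warnings;

# Stand-in for the lines after __DATA__: one pattern per line.
# (These patterns are made up for the example.)
my $patterns = <<'END';
bulkmailer\.example
free-offers
END

# Read the patterns through a filehandle, as the filter does with DATA.
open my $data, '<', \$patterns or die "Cannot open in-memory file: $!";

sub looks_like_spam {
    my ($from, $to) = @_;
    seek $data, 0, 0;                     # rewind so the sub is reusable
    while (my $pattern = <$data>) {
        chomp $pattern;
        next unless length $pattern;      # an empty pattern is not a useful rule
        return 1 if $from =~ /$pattern/i or $to =~ /$pattern/i;
    }
    return 0;
}

my $verdict = looks_like_spam('promo@bulkmailer.example', 'simon@example.org')
    ? "spam?" : "ok";
print "$verdict\n";
```

Skipping blank lines matters more than it might seem: an empty pattern in a Perl match does not mean "match nothing", so a stray blank line in the list could misfile legitimate mail.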

Now our final check for mail which doesn't appear to be for us:

    if ($item->to !~ /simon/i and $item->cc !~ /simon/i) {
        print LOG "$from:$subject:Badly addressed mail\n";
        $item->accept("questionable");
    }

Finally, we let the mail in:

    print LOG "INCOMING MAIL:$from:$subject:Accepting to inbox\n";
    $item->accept();

Caveats

I'm perfectly happy to trust Mail::Audit with all my incoming email. For a while it was running alongside procmail, but now it rules the roost. However, there are some things which you do need to take care about if you want to run it yourself.

Mail::Audit has been tested on qmail and postfix - it should work fine on other MTAs (Message Transfer Agents), so long as they believe that exit 100; means reject. If they don't, you can override the reject method like this:

    $item = Mail::Audit->new(
            reject => sub { exit 67; }
    );

It also assumes that the default mailbox is /var/spool/mail/name, where name is the user ID of the current user. If this isn't the case (I believe mh doesn't work like this), say accept("Mailbox") or override accept with a subroutine of your own.

Finally, Mail::Audit isn't sophisticated. It's little other than a wrapper around Mail::Internet. While it's probably perfectly fine for most filters you want to write, don't expect it to do everything for you.

Conclusion

Mail::Audit and News::Gateway are both available from CPAN; together they allow you to very easily construct mail filters and newsgroup gateways in Perl. It's a great way to filter your mail with Perl, and an excellent replacement for moldy old procmail.

Copyright The Perl Journal. Reprinted with permission from CMP Media LLC. All rights reserved.

This Week on p5p 2001/07/16

Notes

Please send corrections and additions to perl-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year and month. Changes and additions to the perl5-porters biographies are particularly welcome.

A somewhat abridged summary this week, since I'm out and on the road. Today Boston, tomorrow Montreal - Wednesday, the world!

5.7.2 is out!

Jarkko released Perl 5.7.2 on Friday the 13th, tempting fate a little.

5.7.2 has an odd-numbered version (the 7), so it's a development release; use it at your peril.

Jarkko says:

    Perl 5.7.2 can be considered to be the virtual Release Candidate
    Zero for the Perl 5.8.0, it is just not called a Release Candidate.
    It is in pretty good shape, it is just that it is not *quite* yet
    ready to be a major release.  No large changes are expected between
    now and 5.8.0.

Many threading fixes

Artur Bergman put in a lot of work to move PMOPs into the pad; PL_regex_padav will now give you a padlist of regexes. He also made Perl use re-entrant C library calls where available - workarounds for localtime and gmtime are used when Perl is configured with -Dusereentrant. There's also a per-interpreter memory buffer which helps out with the re-entrant stuff. (No, I'm not sure how.)

Artur notes that Win32 and Digital Unix are already re-entrant due to sane C libraries, but other C runtimes may need -Dusereentrant.

Abhijit also wrote re_dup, a cloning function for regular expressions, and there have been discussions of deep cloning functions as well.

package;

The "super-strict" package; construction - used for turning off the default package - has been judged to be confusing, buggy as hell and a pain in the neck. As such, it's being ripped out - Abhijit Menon-Sen is seeing to this.

SUPER:: debate rages on

Oh well, I tried to quickly brush off the discussion of SUPER:: last week, but it didn't work: 60 or so messages were wasted, uh, spent on this interminable argument, which essentially boils down to "SUPER:: is resolved at compile time, not run time. This is wrong. Oh, no it isn't. Oh yes it is." Great.
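
For anyone who skipped the thread, the behaviour being argued over fits in a few lines. In Perl 5, SUPER:: is looked up relative to the package in which the calling code was compiled, not the class of the object at run time. The sketch below shows the consequence; all of the class names are invented for the illustration.

```perl
use strict;
use warnings;

package Animal;  sub speak { "generic animal noise" }
package Robot;   sub speak { "beep" }

package Dog;     our @ISA = ('Animal');

package Toolbox; our @ISA = ('Robot');
# This sub is compiled in package Toolbox, so SUPER:: here means
# "search @Toolbox::ISA", whatever class the object belongs to.
sub loud_speak { my $self = shift; return uc $self->SUPER::speak() }

package main;
# Install the sub into Dog at run time.
{ no strict 'refs'; *{"Dog::loud_speak"} = \&Toolbox::loud_speak; }

print Dog->loud_speak(), "\n";   # BEEP, not GENERIC ANIMAL NOISE
```

Whether that is a bug or a feature is, of course, exactly what the 60 messages were about.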

Various

Russ Allbery released a new version of the podlators and Term::ANSIColor.

Sadayuki Tomohiro produced a bunch of very useful Encode fixes, which unfortunately just missed 5.7.2.

Jeff Pinyan spent ages perfecting a patch to warn on q//o, and then realised this was a bad idea. Roughly the same thing happened for his idea of Scalar::Util::curse, which was more complex than it might first appear. Oh, and Larry didn't like it the last time someone tried this. Schwern used the curse patch to take the opportunity to encourage people to use Test::More when writing new test suites. Jonathan Stowe added an option to h2xs to produce Test::More-aware test suites automatically.

Until next week I remain, your humble and obedient servant,


Simon Cozens

Symmetric Cryptography in Perl

Having purchased the $250 cookie recipe from Neiman-Marcus, Alice wants to send it to Bob, but keep it away from Eve, who snoops on everyone's network traffic from the cubicle down the hall.

How can Perl help her?

Ciphers

Cryptographic algorithms, or ciphers, offer Alice one way to protect her data. By encrypting the recipe before sending it over the network, she can render it useless to anyone but Bob, who alone possesses the secret information required to decrypt it.

Ciphers were once closely guarded secrets, but relying on the secrecy of an algorithm is a risky proposition. If your security were somehow compromised, adversaries could read all of your past messages, and (if you ever discovered the breach) you would have to find an entirely different algorithm to use in future.

Modern ciphers, usually publicly known and widely studied, rely on the secrecy of a key instead. They encrypt the same plaintext differently for each key; to decrypt a ciphertext, you must know the key used to produce it. New keys are easy to generate, so the compromise of a single key is a smaller problem. Although messages encrypted with the stolen key are rendered readable, the algorithm itself can be reused.

Algorithms that use the same key for both encryption and decryption are called symmetric ciphers. To use such an algorithm, Alice and Bob must agree on a key to use before they can exchange messages. Since decryption depends only on the knowledge of this key, they must ensure that they share the key by a secure channel that Eve cannot access (Alice could whisper the key into Bob's ear over dinner, for example).
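
The two properties described so far (different keys give different ciphertexts, and the same key reverses the operation) can be seen even in a toy cipher. The sketch below XORs the text with a repeating key. It is purely illustrative and trivially breakable, and it is not one of the ciphers discussed in this article; toy_crypt is a made-up name.

```perl
use strict;
use warnings;

# TOY cipher for illustration only: XOR with a repeating key.
# It is trivially breakable and must never protect real data.
sub toy_crypt {
    my ($key, $text) = @_;
    my $stream = $key x (1 + int(length($text) / length($key)));
    return $text ^ substr($stream, 0, length $text);
}

my $plaintext = "2 cups flour";
my $one = toy_crypt("alice+bob", $plaintext);
my $two = toy_crypt("other key", $plaintext);

print "different keys, different ciphertexts\n" if $one ne $two;
print toy_crypt("alice+bob", $one), "\n";   # 2 cups flour
```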

Most well-known symmetric ciphers are block ciphers. The plaintext to be encrypted must be split into fixed-length blocks (usually 64 or 128 bits long) and fed to the cipher one at a time. The resulting blocks (of the same length) are concatenated to form the ciphertext.
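
Splitting data into fixed-length blocks needs no cipher at all, so it can be sketched separately. This assumes a 16-byte (128-bit) block size and a Perl recent enough to support groups in unpack templates; blocks_of_16 is a name chosen for the example.

```perl
use strict;
use warnings;

# Split a string into 16-byte blocks; the last one may be shorter.
sub blocks_of_16 {
    my ($data) = @_;
    return unpack "(a16)*", $data;
}

my @blocks = blocks_of_16("x" x 18);
print scalar(@blocks), " blocks, last is ", length($blocks[-1]), " bytes\n";
# 2 blocks, last is 2 bytes
```

Note that the final block here is only 2 bytes long, which a block cipher cannot process as it stands.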

The ciphers in widespread use today vary in strength, key length, block size and their approach to encrypting data. Some of the popular ciphers (IDEA, Twofish, Rijndael) are implemented by eponymous modules in the Crypt:: namespace on the CPAN (Crypt::IDEA and so on).

To decide which cipher to use for a particular application, one must consider the strength and speed required, and the computational resources available. The decision cannot be made without research, but IDEA is often considered the best practical choice for a general purpose cipher.

Keys

Symmetric ciphers usually use randomly generated keys (typically between 64 and 256 bits in length), and computers are notoriously bad at truly random number generation. Fortunately, many modern systems have some support for the generation of cryptographically secure random numbers, ranging from expensive hardware to device drivers that gather entropy from the timing delay between interrupts.

Crypt::Random, available from the CPAN, is a convenient interface to the /dev/random device on many Unix systems. Once installed, it is simple to use:

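    use Crypt::Random qw( makerandom_octet );
    # 16 octets = 128 bits. makerandom_octet returns a raw byte string,
    # which is the form the cipher modules expect for a key.
    $key = makerandom_octet( Length => 16, Strength => 1 );

For cryptographic key generation, the Strength parameter should always be 1. The Length in bytes of the desired key depends on the cipher you want to use the key with. Typical symmetric key sizes range from 128 to 256 bits (16 to 32 bytes). The related makerandom function takes a Size in bits instead, but it returns a large integer rather than a byte string, so it is not directly usable as a cipher key.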

How Can I Use These in Perl?

The Crypt modules all support the same simple interface: new($key) creates a cipher object, and the encrypt() and decrypt() methods operate on single blocks of data. The responsibility for key generation and sharing, providing suitable blocks, and the transmission of the ciphertext, lies with the user. In the examples below, we will use the Crypt::Twofish module. Twofish is a free, unpatented 128-bit block cipher with a 128, 192, or 256-bit key.

    use Crypt::Twofish;
    # Create a new Crypt::Twofish object with the 128-bit key generated
    # above.
    $cipher = Crypt::Twofish->new($key);
    # Encrypt a block full of 0s...
    $ciphertext = $cipher->encrypt(pack "H*", "00"x16);
    # And then decrypt the result.
    print unpack "H*", $cipher->decrypt($ciphertext);

The implementation raises an important issue: What does one do with the second chunk of an 18-byte file? Twofish cannot operate on anything less than a 16-byte block, so padding must be added to the end of the last block to make it 16 bytes long. NULs (\000) are usually used to pad the block, but the value used doesn't matter, because the padding is removed after the ciphertext is decrypted.
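The padding step can be tried on its own, without any cipher module. What follows is only a sketch in core Perl: the helper names pad_blocks and unpad are made up for this illustration, and the 18-byte sample string stands in for the file.

```perl
use strict;
use warnings;

my $BLOCK_SIZE = 16;    # Twofish block size in bytes

# Hypothetical helper: split a string into 16-byte blocks and pad the
# last one with NULs so that every block is exactly $BLOCK_SIZE long.
sub pad_blocks {
    my ($plaintext) = @_;
    my @blocks = unpack "(a$BLOCK_SIZE)*", $plaintext;
    $blocks[-1] .= "\000" x ($BLOCK_SIZE - length $blocks[-1]) if @blocks;
    return @blocks;
}

# Hypothetical helper: rejoin the blocks and cut the padding off again,
# using the recorded length of the original plaintext.
sub unpad {
    my ($size, @blocks) = @_;
    return substr(join('', @blocks), 0, $size);
}

my $plaintext = "Two cups of flour\n";    # 18 bytes
my @blocks    = pad_blocks($plaintext);

printf "%d blocks, last block is %d bytes\n",
    scalar @blocks, length $blocks[-1];
print "Round trip OK\n" if unpad(length $plaintext, @blocks) eq $plaintext;
```

Recording the original length, rather than stripping trailing NULs, means a plaintext that happens to end in NUL bytes still survives the round trip.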

Alice can now use this code to encrypt her recipe:

    # Assume that $key contains a previously-generated key, and that
    # PLAINTEXT and CIPHERTEXT are filehandles opened for reading and
    # writing respectively.
    $cipher = Crypt::Twofish->new($key);
    while (read(PLAINTEXT, $block, 16)) {
        $len   = length $block;
        $size += $len;
        # Add padding if necessary
        $block .= "\000"x(16-$len) if $len < 16;
        $ciphertext .= $cipher->encrypt($block);
    }
    # Record the size of the plaintext, so that the recipient knows how
    # much padding to remove.
    print CIPHERTEXT "$size\n";
    print CIPHERTEXT $ciphertext;

The output of this program can be safely sent across the network to Bob, perhaps as an e-mail attachment. Bob, having received the secret key by some other means, can then use the following code to decrypt the message:

    $cipher = Crypt::Twofish->new($key);
    # The first line holds the plaintext size; strip its trailing newline.
    chomp($size = <CIPHERTEXT>);
    while (read(CIPHERTEXT, $ct, 16)) {
        $pt .= $cipher->decrypt($ct);
    }
    # Write only $size bytes of the output; ignore padding.
    print PLAINTEXT substr($pt, 0, $size);

This is really all we need for symmetric cryptography in Perl. Using a different cipher is simply a matter of installing another module and changing the "Twofish" above. From a cryptographic perspective, however, there are still some problems we must consider.

Cipher Modes

The code above uses the Twofish cipher in Electronic Code Book (ECB) mode, meaning that the nth ciphertext block depends only on the key and the nth plaintext block. For a particular key, one could build an exhaustive table (or Code Book) of plaintext blocks and their ciphertext counterparts. Then, instead of actually encrypting the plaintext, one could simply look at the relevant entries in the table to find the ciphertext.

Because of the highly repetitive nature of most texts, plaintext blocks and their corresponding blocks in the ciphertext tend to be repeated quite often. Further, it is often possible to make informed guesses about parts of the plaintext (Eve knows, for example, that Alice's messages all have a long Tolkien quote in the signature).

Given enough patience and ciphertext, Eve can start to build a code book that maps ciphertext blocks to plaintext ones. Then, without knowing either the algorithm or the key, she could simply look up the relevant blocks in the intercepted ciphertext and write down large parts of the original plaintext!

Several new cipher modes have been invented to address this problem. One of the most generally useful ones is Cipher Block Chaining. CBC starts by generating a random block (called an Initialization Vector, or IV). The first plaintext block is XORed with the IV before being encrypted. Thereafter, each plaintext block is XORed with the ciphertext of the block preceding it, and then encrypted.

Here, each ciphertext block depends on the preceding ciphertext block, and the plaintext blocks so far. Thus, the blocks must be decrypted in order, and none of the patterns displayed by ECB are present. The IV itself does not need to be kept secret, and is usually transmitted with the ciphertext like $size above.

Decryption of the ciphertext proceeds in the opposite order. The first ciphertext block is decrypted and XORed with the IV to form the first plaintext block, and each ciphertext block thereafter is decrypted and XORed with the ciphertext block preceding it to form a plaintext block. Other modes are similar in intent, but vary in detail, including the way errors in transmission affect the ciphertext, and the amount of feedback or dependency on previous blocks.
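The difference between the two modes is easy to see in a few lines of core Perl. The sketch below assumes the textbook form of CBC, in which the IV itself is XORed into the first plaintext block. The toy_encrypt and toy_decrypt functions are a deliberately trivial, insecure stand-in for a real block cipher such as Twofish, and every name in it is invented for this illustration.

```perl
use strict;
use warnings;

# Toy stand-in for a real 16-byte block cipher: XOR the block with the
# key, then reverse the bytes. It is NOT secure; it only needs to be
# invertible so that the chaining logic can be demonstrated.
sub toy_encrypt {
    my ($key, $block) = @_;
    my $mixed    = $block ^ $key;     # string XOR, byte by byte
    my $reversed = reverse $mixed;
    return $reversed;
}

sub toy_decrypt {
    my ($key, $block) = @_;
    my $reversed = reverse $block;
    return $reversed ^ $key;
}

# Split a string into 16-byte blocks (input is assumed already padded).
sub split_blocks { my ($data) = @_; return unpack "(a16)*", $data }

# ECB: every block is encrypted independently of the others.
sub ecb_encrypt {
    my ($key, $plaintext) = @_;
    return join '', map { toy_encrypt($key, $_) } split_blocks($plaintext);
}

# CBC: XOR each plaintext block with the previous ciphertext block
# (the IV for the first block) before encrypting it.
sub cbc_encrypt {
    my ($key, $iv, $plaintext) = @_;
    my ($prev, $out) = ($iv, '');
    for my $block (split_blocks($plaintext)) {
        $prev = toy_encrypt($key, $block ^ $prev);
        $out .= $prev;
    }
    return $out;
}

# CBC decryption: decrypt each block, then XOR with the previous
# ciphertext block (the IV for the first block).
sub cbc_decrypt {
    my ($key, $iv, $ciphertext) = @_;
    my ($prev, $out) = ($iv, '');
    for my $block (split_blocks($ciphertext)) {
        $out .= toy_decrypt($key, $block) ^ $prev;
        $prev = $block;
    }
    return $out;
}

my $key       = "0123456789abcdef";
my $iv        = "\x01" x 16;    # fixed only to keep the demo repeatable
my $plaintext = "ATTACK AT DAWN!!" x 2;    # two identical 16-byte blocks

my @ecb = split_blocks(ecb_encrypt($key, $plaintext));
my @cbc = split_blocks(cbc_encrypt($key, $iv, $plaintext));

print "ECB blocks identical: ", ($ecb[0] eq $ecb[1] ? "yes" : "no"), "\n";
print "CBC blocks identical: ", ($cbc[0] eq $cbc[1] ? "yes" : "no"), "\n";
print "CBC round trip OK\n"
    if cbc_decrypt($key, $iv, join '', @cbc) eq $plaintext;
```

In real use the IV would be freshly random for every message, and the block cipher would be a real one; only the chaining structure carries over.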

Alice and Bob could alter their code to perform cipher block chaining, but the handy Crypt::CBC module can save them the trouble. The module, available from the CPAN, is used in conjunction with a symmetric cipher module (like Crypt::Twofish). It handles padding, IV generation and all other details. The user only needs to specify a key, and the data to be encrypted or decrypted.

Thus, Alice could just do:

    use Crypt::CBC;
    $cipher = new Crypt::CBC ($key, 'Twofish');
    undef $/; $plaintext = <PLAINTEXT>;
    print CIPHERTEXT $cipher->encrypt($plaintext);

And Bob could do:

    use Crypt::CBC;
    $cipher = new Crypt::CBC ($key, 'Twofish');
    undef $/; $ciphertext = <CIPHERTEXT>;
    print PLAINTEXT $cipher->decrypt($ciphertext);

Much simpler!


Asymmetric Cryptography

Asymmetric (or public-key) ciphers use a pair of mathematically related keys, and the algorithms are so designed that data encrypted with one half of the key pair can only be decrypted by the other. Bob can generate a key pair and keep one half secret, while publishing the other half. Alice can then encrypt the recipe with Bob's public key, knowing that it can only be decrypted with the secret half. Although this eliminates the need to share keys over a secure channel, it has its problems, too. For one, most public key encryption schemes require much longer keys (often 2048 bits or more) and are much slower.

The Crypt namespace contains modules for public key cryptography as well. Crypt::RSA is a portable implementation of the (now free) RSA algorithm, one of the most widely studied public-key encryption schemes. There are interfaces to various versions of PGP (Crypt::PGP2, Crypt::PGP5, Crypt::GPG), as well as implementations of public-key based signature algorithms (Crypt::DSA).


Cryptanalysis

Unfortunately, our implicit assumption that the ciphertext is useless to Eve is not always true. Depending on the information and resources that are available to her, she can try various means to retrieve the recipe. The simplest strategy is to try and guess the key Alice used. This is known as a brute-force attack, and involves repeatedly generating random keys and trying to decrypt the ciphertext with each one.

The effectiveness of this approach depends on the size of the key: the longer it is, the more possible keys there are, and the more guesses will be required, on average, to find the right one. Thus, the only possible defense is to use a key long enough to make a key search computationally impractical.

How long is a safe key? DES with 56-bit keys was recently cracked in a little less than a day, but the 128-bit keyspace (range of possible keys) is 2**72, or roughly 4.7 * 10**21, times larger still. Although computing power is becoming cheaper, it seems likely that 128-bit keys will be safe from brute-force attacks for many years to come.
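The arithmetic behind that comparison is short enough to check. The sketch below uses Math::BigInt, which ships with Perl. The one-day figure for searching the whole 56-bit keyspace is an assumption for illustration, rounded from the figure above, and on average an attacker finds the key about halfway through the search.

```perl
use strict;
use warnings;
use Math::BigInt;

# Each extra key bit doubles the keyspace, so going from 56 to 128 bits
# multiplies it by 2**(128 - 56) = 2**72.
my $ratio = Math::BigInt->new(2)->bpow(128 - 56);
print "The 128-bit keyspace is $ratio times larger\n";

# Assumption for illustration: the same hardware that searches all
# 2**56 DES keys in one day is pointed at the 128-bit keyspace, so the
# full search takes 2**72 days.
my $years = $ratio->copy->bdiv(365);
printf "Exhaustive search would take about %.1e years\n", $years->numify;
```

Even if hardware became a million times faster, the result would shrink by only six of its decimal digits.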

Of course, there are far more sophisticated attacks that they may be vulnerable to. As we saw in the description of ECB, cryptanalysts can often exploit patterns in the plaintext (long signatures, repeated phrases) or ciphertext (repeated blocks) to great advantage, or they may look for weaknesses (or exploit known ones) in the algorithm. Often, a combination of such techniques reduces the potential keyspace enough that a brute-force attack becomes practical.

Cryptanalysis and cryptographic techniques advance hand-in-hand; new ciphers are designed to withstand old attacks, and newer attacks are attempted all the time. This makes it very important to stay abreast of current advances in cryptographic technology if you are serious about protecting your data for long periods of time.


Further Resources

google.com
The Great God Google knows enough about cryptography-related material to keep you occupied for a considerable amount of time.

perl-crypto-subscribe@perl.org
The perl-crypto mailing list, although not very active at the moment, is intended for discussion of all aspects (both user and developer level) of cryptography with Perl.

Crypt::Twofish
The Crypt::Twofish module, which has benefited from the inputs of several people, is a good example of how to write a portable cipher implementation.

This Week on p5p 2001/07/09

Notes

This Week on P5P

No 5.8.0 Yet

New Modules

Numbers, numbers, numbers

PerlIO considered evil

Asynchronous Callbacks

Various

Please send corrections and additions to perl-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year and month. Changes and additions to the perl5-porters biographies are particularly welcome.

This was a reasonably busy week, seeing just over 400 messages. I say that every week, don't I?

No 5.8.0 Yet

Jarkko sadly announced that 5.8.0 wasn't going to happen before the Perl Conference, but 5.7.2 is imminent:

I think it's time for me to give up the fantasy that 5.8.0 will happen before the Perl Conference. The stars just refuse to align properly, too many loose ends, too many delays, too many annoying mysterious little problems, too little testing of CPAN modules. Luckily, nothing is majorly broken, and I think that I've more or less achieved what I set out to do with Perl, so I still hope to be able wrap something up before TPC and call it 5.7.2 (it must happen next week or it will not happen), and soon after the conference put out the Release Candidate 1 for 5.8.0, and then keep cracking the whip till we are happy with what we've got.

New Modules

These aren't really very new, but they may have slipped through the net and you haven't noticed them yet, and since they're interesting, you might want to have a look...

I18N::LangTags detects and manipulates RFC3066 language tags; Locale::Maketext is extremely useful for localizing text; Unicode::UCD is a neat interface to the Unicode Character Database; Encode is coming on strong, and can now read IBM ICU character tables; Mark-Jason Dominus' Memoize module is now part of the core.

Numbers, numbers, numbers

Remember last week's weird Amdahl UTS bug, where Nicholas Clark was convinced UTS C was doing a decrement statement twice? He found the problem - the decrement statement shouldn't have been there at all...

This prompted him to find a bug in grok_number; this surprised me a little, because I didn't know that grok_number even existed. All of the useful, platform-independent code which deals with numeric operations - casting between different sizes, converting binary, hex, and octal numbers, recognising numbers in strings, and so on, has been moved to numeric.c. Take a look at it, there's a load of handy stuff in there.

Hal Morris, our UTS wizard, also pointed out some unpleasant casting assumptions, which needed a patch:

UV_MAX must NOT be defined as (unsigned long){whatever} for UTS, because then comparisons with double will not work correctly (there is no problem with (unsigned) typecasts, only with (unsigned long))

grok_number again came in handy on QNX, when Norton Allen found that strtoul wasn't setting errno correctly on overflow. It's sad when we have to start reimplementing people's broken C libraries, but this is the price of portability.

PerlIO considered evil

Ilya found a bug in PerlIO, then found another bug while attempting to demonstrate it. The original bug was:

The *actual* problem is that char-by-char input requires DUPLICATE pressing of ENTER key for this key to be seen by Perl. Debugging this problem (via Term::ReadKey test suite) shows the following logic:

pp_getc() calls is_eof() which does getc/ungetc calls getc()

[BTW, I see no logic in this sequence of events.]

The problem is that ungetc() can't unget "\n" if this \n is the first char in the buffer, and quietly drops "\n" to the floor.

Ilya had a lot of invective set aside for PerlIO, which we need not go into. Needless to say, he did not provide an alternative implementation of a multi-layered standard IO system as a patch. Or indeed any patch at all.

Vadim Konovalov did provide a simple patch to clean something up, but then Andy and Nick both showed that it didn't help at all, the generated code being the same and some compilers not being able to cope with lvalue casts, which Nick had carefully removed and Vadim's patch reintroduced. Guess Nick might actually know what he's doing after all.

Asynchronous Callbacks

David Lloyd asked how to safely do asynchronous callbacks from C to Perl. Benjamin Stuhl suggested hacking the core to introduce some checks during the inter-opcode PERL_ASYNC_CHECK, and suggested that Perl 5.8.x had a public way of registering inter-opcode callbacks. David Lloyd replied that PHP/Zend already had this, and you could even implement a signal checking add-on module without any core hacking. Paul Johnson went one further, and suggested using a pluggable runops routine. Surprisingly, this has actually been implemented but nobody really knows about it; Devel::Cover apparently makes use of it. Of course, the problem is that only one thing can use a custom op loop at a time, so David suggested writing an XS module that allowed other modules to add callbacks. I hope that happens.

Artur Bergman popped up and, predictably, suggested using ithreads.

Various

Rudi Farkas found a weird one on Win32 - on that platform, executableness (the -x test) is determined by the filename being examined. For instance, foo.bat is classed as executable, but foo.bar is not, even if they contain exactly the same data. This rather curious design decision leads to the fact that if you call stat with a filename, the execute bit is set depending on the extension. If, on the other hand, you call fstat with a filehandle, Windows can't retrieve the filename and test the extension, so it silently sets the execute bit to zero, no matter what it gets fed. This is Bad, and means that -x _ on Windows is unpredictable. Rudi provided a suggested workaround, but nobody cared enough about fixing something so obviously braindead to produce a patch.

Mike Schwern fixed up MakeMaker to stop producing extraneous -I...s when building extensions, and also found that the XS version of Cwd won't do much good as a separate CPAN module, since it relies on the core function sv_getcwd, which only appears in 5.7.1. Oops. Oh, and speaking of Cwd, Ilya patched it up a bit for OS/2, while noting that its results were untainted on that platform.

Ilya also fixed a glaring debugger bug (oh, the irony) prompting Jarkko to lament the lack of a test suite. Robin fixed up a couple of weird, weird bugs in B::Deparse.

Jonathan Stowe upgraded pl2pm to be warnings and strict clean. I'm not sure why.

Philip Newton patched a score of typos. Norton Allen updated the QNX documentation and provided a couple of other fixes.

Piers Cawley found something that looked like a bug in SUPER:: but was assured that it wasn't; Randal won the day with a reference to Smalltalk.

Abhijit Menon-Sen (look out for this guy...) made mkdir warn if it was given a non-octal literal, since that generally doesn't do what people want, and after prompting from Gisle, did the same for umask and chmod. Unfortunately, he forgot about constant folding...
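For anyone who has not been bitten by this, the trap looks like the following. This is just an illustrative sketch using printf and oct; it does not touch the filesystem.

```perl
use strict;
use warnings;

# 0755 is an octal literal: rwxr-xr-x, which is what people usually mean.
my $intended = sprintf "%o", 0755;

# 755 is a decimal literal. Read as a mode it is octal 1363, a very
# different set of permission bits.
my $accident = sprintf "%o", 755;

# A mode that arrives as a string needs oct() to be read as octal.
my $from_string = sprintf "%o", oct('755');

print "0755       -> $intended\n";
print "755        -> $accident\n";
print "oct('755') -> $from_string\n";
```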

Until next week I remain, your humble and obedient servant,


Simon Cozens

This Fortnight in Perl 6 (17 - 30 June 2001)

This Week

Perl Doesn't Suck

Multiple Classifications

Once Inherited, Twice Shy

Class::Object

The Internal String API

Miscellany

Last Words

You can subscribe to an email version of this summary by sending an empty message to perl6-digest-subscribe@netthink.co.uk.

Please send corrections and additions to bwarnock@capita.com.

The lists have been very light recently. During the last two weeks of June, three of the mailing lists received a mere 142 messages across 20 different threads. 40 different authors contributed. Only 5 threads generated much traffic. Eventually, I'll come up with a better way of reporting these meaningless metrics.

Perl Doesn't Suck

Adam Turoff provided a detailed summary of some recent battles against The Big Bean:

Now, at the end of the day, I have no fewer than five JVMs installed, all completely different implementations of two Java standards. As a Perl programmer, I find this abhorrent. Installing any version of Perl release in the last 7 years is no different from installing any other release: download, extract, ./configure -des, make, make test, make install. Done.

Elaine Ashton, however, disagreed, to a point:

I don't believe I was saying that. My point was that you had a bad experience installing Java on FreeBSD and have declared that it sucks to install it. Unsurprisingly, I have never had a problem installing or supporting Java on Solaris but there are plenty of things to grumble about Perl sometimes, especially if you deploy multiple versions and configurations across multiple platforms and multiple versions of those platforms.

Michael Schwern pointed out that Solaris is "Sun's Blessed Platform", and it shouldn't be surprising that Java should install easily there. The discussion then touched a bit on distributions, licensing, support roles, and, yes, even George Carlin.

Once Inherited, Twice Shy

Multiple Classifications

David Whipp asked if bless could take, and ref return, a list, allowing for a cleaner multiple-inheritance model for objects in Perl. Dan Sugalski simplified the request to object-based vice class-based inheritance, and then provided some potential trade-offs.

Damian, of course, submitted code to fake it in Perl 5. He did muse about an ISA property, though, which would act like @ISA, but at the object level.

Class::Object

Michael "Class::Object" Schwern asked why all this (Class::Object) had to be (Class::Object) in the core (Class::Object). Dan Sugalski opined:

Doing it properly in a module is significantly more of a pain than doing it in the core. Faking it with a module means a fair amount of (reasonably slow) perl code, doing it in the core requires a few extra lines of C code in the method dispatch opcode function.

To which, of course, Michael Class::Objected:

I've already done it, it was easy. Adding in an object-based inheritance system should be just as easy, I just need an interface. $obj->parents(@other_objs) is a little clunky.

...Look at Class::Object! Its really, really thin. Benchmark it, its no slower than regular objects. http://www.pobox.com/~schwern/src/Class-Object-0.01.tar.gz

[The Golden Troll Award goes to Dan Brien for this gem.]

The Internal String API

Dan Sugalski initiated discussion on the internal API for strings:

Since we're going to try and take a shot at being encoding-neutral in the core, we're going to need some form of string API so the core can actually manipulate string data. I'm thinking we'll need to be able to at least do this with string:

  • Convert from and to UTF-32
  • lengths in bytes, characters, and possibly glyphs
  • character size (with the variable length ones reporting in negative numbers)
  • get and set the locale (This might not be the spot for this)
  • normalize (a noop for non-Unicode data)
  • Get the encoding name
  • Do a substr operation by character and glyph

David Nicol suggested implementing strings as a tree, vice a contiguous memory block. After some pondering, this seemed to grow on Dan, and he is awaiting a yea-or-nay from Larry. Copy-On-Write for Strings will also be implemented, although there was no mention of a potential key signature.

Miscellany

Simon Cozens released an updated version of his Perl 6 emulator.

Marcel Grunauer announced a Proof-of-Concepts page for Perl 6, which contains info and links to Perl 5 modules that may provide a glimpse of things to come.

There were more complaints about operator choices. (Specifically, ~ for string concatenation, and . (the dot) for dereference (vice ->).)

Last Words

There's but three weeks till TPC 5.0 kicks off in San Diego.


Bryan C. Warnock

People Behind Perl: Nathan Torkington

So you use Perl, and you probably know that it was brought to you by "Larry Wall and a cast of thousands". But do you know these people that make up the Perl development team? Every month, we're going to be featuring an interview with a Perl Porter so you can get to know the people behind Perl. This time, Simon Cozens, www.perl.com editor, talks to Nathan Torkington, a long-time Perl developer and a mainstay of the Perl community.

Who are you and what do you do?

I'm Nathan Torkington. My day job is a book editor for O'Reilly and Associates. Before that I was a trainer, working for Tom Christiansen. Perl folks might know me as co-author (with Tom) of The Perl Cookbook, the schedule planner for The Perl Conference (and this year the Open Source Convention), or as the project manager for perl6.

How long have you been programming Perl?

Since the early 90s. I forget exactly when I first started programming in Perl. I think it was toward the end of 1992, when I was working with gopher and the early web. Back in those days we didn't talk about "the web", we were just trying to set up a Campus-Wide Information Service. I took the plunge and pushed for (and got) the www as the basis for the CWIS, even though Gopher was more mature and had more stuff on it.

Using the CERN httpd and line mode browser, I worked on interfacing a bunch of data sources to the web. Perl was, of course, the language for that. I did work with SGML, comma separated files, and even used oraperl once or twice (I feared and loathed Oracle, though, it was only once or twice!).

So I got into Perl with the web. When I moved to the US in 1996, I worked as a systems administrator at an ISP. There I rapidly became the Perl guy, writing admin scripts and CGI applications for in-house and customers. When I left in 1998, we were using mod_perl and had a bunch of good Perl programmers.

What, if anything, did you program before learning Perl?

I first learned Commodore BASIC when I was 8 or 9, then 6502 assembler. I learned Pascal on the IBM PC, as well as C and 8086 assembler. At university they taught us Pascal again, then Modula-2 or Modula-3 (whichever it was, it was with a MetroWerks Mac IDE that kept crashing). Through Pascal, and their eventual reteaching of C, I got the hang of pointers (yes, I wasn't a very GOOD assembly language programmer if it took me about six years to learn pointers and realize why my programs would sometimes die).

What got you into Perl?

The web. After I'd written a few programs, I really enjoyed it. The language was fun, and the culture good. It was so thrilling to be able to do something in 5 minutes and 20 lines that used to take two days in C. In some ways I miss that fun--over the years (and especially writing the Cookbook) I've learned so much about how to do things in Perl that there's not much discovery of fun new features any more. And I'm so used to being able to do things in 5 minutes and 20 lines that I'm no longer delighted when I can do so--I expect it!

Why do you still prefer Perl?

It can do everything I want to do, and I already know it. If I wasn't a Perl programmer, I'd probably be happy in Ruby or Python. But so long as there's Perl, I see no reason to become as familiar with those languages as I am with Perl.

That's not to say I think everyone should program exclusively in Perl. My friend Jules was the one who learned a lot of languages. He was writing extensions for Microsoft COBOL in 1992, and has done significant projects in C++ and Java. He works at a contracting company where the projects don't always lend themselves to Perl. I'm totally cool with that.

What sort of things do you do with Perl now? What was the last Perl program you wrote?

When I became a trainer, I was glad to stop being a full-time programmer. Since then I've begun to miss using Perl. These days I only write utilities and basic web applications. For instance, the last few projects I've written have been:

  • A small web-based database system for keeping track of the books I'm editing--author addresses, status, number of chapters to go, etc. That was a CGI program with home-grown templates and a DBM database.
  • Some small tools to unmangle Framemaker files for the Perl Conference proceedings.
  • A translation of a Python PDF-generating library to Perl. That's incomplete, because I struck Python code I need to research.

What coaxed you into the development side of Perl?

The feeling that I should know more about it. In some ways it's probably insecurity--"sure, you know all stuff about the Perl language, but can you tell an SV from an SUV?" So I started probing at the fringes of the internals.

Do you remember your first patch?

My first (and probably only ;-) real patch was to comment toke.c, the tokenizer. The code that works out where double-quoted strings and regular expressions end was a lot of fun (and by fun I mean "pain") to work out. Although I know a lot of low level (data structures) and high level principles about how the internals work (compile to op tree, interpret), I'm still missing a lot of the middle ground that would actually enable me to fix bugs.

What do you usually do for Perl development now?

Nothing, I'm a manager :-) I'm slowly herding Larry towards finishing the Apocalypses, and from that will spring perl 6. For Perl 5 I'm more behind the scenes. I'm often asked to nag or poke or otherwise get results from others.

Talking of Perl 6, how do you think the project's going so far?

Last year I'd naively hoped that we'd have an alpha for TPC, but that's not happening. Instead, we have the start of Larry's pronouncements. And while perl6 has been slower than we expected, perl5 has received a shot in the arm! Jarkko's patching like a madman, there are new internals hackers springing up, and there are cool new modules (SOAP::Lite, Inline, Attributes::*, Filter::Simple, etc.) coming out every week.

That's not to say that we're all blocked on Larry, though. We know the large shape (and a lot of the details) of how the internals will work. Dan's specing those out in Perl Design Documents, and your fair self has implemented a few of the projected perl6 syntactic features with patches to Perl 5. (You can download the Perl 6 emulator and play about with Perl 6.) If you haven't already seen Marcel Grunauer's page of Perl6-like modules, check it out!

So there's a lot of activity in perl6 as well as in perl5.

Finally, what's the best thing about Perl, and what's the worst thing about it?

Best thing? The way that it values programmer fun as much as anything else. Perl delights in being a language that is supposed to be fun. Having seen many people burn out, fun is a good thing. Fun is what keeps you sane, keeps you interested, keeps you going.

What's the worst thing? Probably the internals. They're ugly and resemble nothing so much as a Lovecraftian horror. We really want perl6 to have much nicer innards. Ideally it'll be almost as simple to program in perl6 innards as it is to program in perl6 itself. That's one of the main reasons to have perl6--a cleaner core.

This Week on p5p 2001/07/02

Notes

This week on P5P

Notes
Module Versioning and Testing
Carpentry
Regex Capture-to-variable
Perl on S390
UTS Amdahl
Various

Please send corrections and additions to perl-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year and month. Changes and additions to the perl5-porters biographies are particularly welcome.

This was a reasonably normal week, seeing the usual 500 or so messages.

Many thanks again to Leon for last week's summary.

Module Versioning and Testing

There's a move on to make the modules under ext/ free-standing: that is, to be able to say

    cd ext/File/Glob
    make dist

and get a bundle that can be uploaded to CPAN or otherwise distributed. The only problem with this is tests. Currently the tests are kept under t/ of the main root of the Perl source tree, and are run when a make test is done there. To make the modules freestanding, you'd have to move the tests to ext/Foo/Bar/t/ and have the main make test also traverse the extension subdirectories and run the tests there. But then, of course, there's another problem. Can you spot it?

make test is run from an uninstalled Perl, which needs explicit hints about where to find the Perl library. Hence, the tests in t/ directly wibble @INC. This wouldn't work if we're making the modules freestanding, or if we move the tests to ext/Foo/Bar/t/. So the trick, which nobody's done yet, is to move the tests to the right place, change the main make test to recurse extension subdirectories, but also to propagate an environment variable telling the module where to find the library. (PERL5LIB is what you want for this.) That would be a nice little task for someone...

Speaking of testing, Schwern got Test::Simple and Test::More added to the core, bringing the first All Your Base reference into the Perl source tree. Oh, and how many different testing mechanisms?

Robin Houston incremented B::Deparse's version number and added some change notes. Here's what's improved since 5.6.1:

    Changes between 0.60 and 0.61 (mostly by Robin Houston) 
    - many bug-fixes 
    - support for pragmas and 'use' 
    - support for the little-used $[ variable 
    - support for __DATA__ sections 
    - UTF8 support 
    - BEGIN, CHECK, INIT and END blocks 
    - scoping of subroutine declarations fixed 
    - compile-time output from the input program can be suppressed, so that the 
      output is just the deparsed code. (a change to O.pm in fact) 
    - our() declarations 
    - *all* the known bugs are now listed in the BUGS section 
    - comprehensive test mechanism (TEST -deparse)

The new test mechanism is great: it runs the standard Perl test suite through B::Deparse and then back through Perl again to ensure the deparsed code still passes the tests. And, as a testament to the work that's been done on B::Deparse, they mostly do pass.

Schwern also bumped up ExtUtils::Manifest, which caused Jarkko to appeal for a script which checks two Perl source trees to find updated versions of modules without a version change. Larry Shatzer provided one, and Jarkko used it to update the versions in the current tree. Schwern also put Cwd on CPAN, and found a weird dynamic loading bug with the XS version of Cwd. Oh, and noted that after his benchmarking, there's no significant performance loss between 5.6.1 without PerlIO and bleadperl with it.

Carpentry

Mike Guy partially fixed a problem whereby when a magic variable like $1 is passed as a subroutine parameter, carp and the debugger don't see it properly. Tony Bowden took the opportunity to ask for either more or less documentation for longmess and shortmess depending on whether or not they were meant to be internal to Carp. Tony wrote a documentation patch himself.

Schwern asked for a better name for those functions, perhaps more in line with the carp/croak/cluck theme. Jarkko suggested yodel and yelp which Schwern implemented.

Andreas had a quick bitch about Schwern's, uh, idiosyncratic naming style:

Your style of naming things is just plain sick:

AnyLoader does anything but interface to any loader. Ima::DBI has something to do with DBI, but nothing with Ima. D::oh is the only really funny one. But it's from 1999 and getting old. Sex: I still don't know if it is fun or what. Bone::Easy? Same here. Semi::Semicolon? Same here.

And now yodle!

Unfair, though, since yodel and yelp weren't his...

However, Hugo objected to yodel/yelp because the other verbs write to standard error, hence "speaking", whereas longmess and shortmess don't actually "say" anything. Jarkko agreed, and there the matter rested. (Modulo Rich Lafferty's suggestion of "flame" for objections in a written medium...)
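Hugo's point is easy to check for yourself. The following is a small sketch, with the made-up subroutines inner and outer standing in for real code: Carp::shortmess and Carp::longmess hand back the message as a string, and nothing reaches standard error unless the caller prints it.

```perl
use strict;
use warnings;
use Carp ();

# Hypothetical call chain, just so there is a stack to report on.
sub inner { return (Carp::shortmess("short"), Carp::longmess("long")) }
sub outer { return inner() }

# Both functions return text; neither one warns or dies.
my ($short, $long) = outer();
print "shortmess returned: $short";
print "longmess returned:  $long";
```

The exact file and line details in the returned strings depend on where the script lives and on the version of Carp, so they are not shown here.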

Regex Capture-to-variable

Jeffrey Friedl (he of the Regexp book) came up with an interesting patch which adds the new special variable $^N for the most-recently assigned bracket match. This is different from $+ which is the highest numbered match; that's to say, given

    /(foo(bar))/

then $+ is equivalent to $2, whereas $^N is equivalent to the bracket match for the last closing bracket; that is, $1. This essentially allows you to do capture-to-variable, like this:

    (?:(\d+)(?{ $phone_number = $^N }))

without having to worry about which number bracket the match was. (Especially useful if you have to change your regexp around.) Whether this makes regular expressions cleaner or dirtier, I'll leave up to you... However, Jeffrey also noted that you can use regexp overloading (my, that's an obscure feature - look at re.pm) to make such syntax as

    (?<$phone_number>\d+)

work. Now that's cool.

Phillip Newton added a mnemonic: $^N is the most recently closed Nested parenthesis.
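For the curious, here is a short sketch of the difference between $+ and $^N, followed by the capture-to-variable trick. It assumes a perl with $^N support; the sample strings and the use of a package variable for $phone_number are choices made for this illustration, not part of Jeffrey's patch.

```perl
use strict;
use warnings;

# Nested groups: the inner group has the higher number, but the
# outer group is the one whose closing bracket comes last.
"foobar" =~ /(foo(bar))/ or die "no match";
my ($highest, $last_closed) = ($+, $^N);
print "\$+  = $highest\n";       # the same text as $2
print "\$^N = $last_closed\n";   # the same text as $1

# Capture-to-variable: store the digits without caring which
# bracket number they ended up in.
our $phone_number;   # a package variable keeps the (?{ ... }) block simple
"call 5551234 now" =~ /(?:(\d+)(?{ $phone_number = $^N }))/
    or die "no match";
print "phone = $phone_number\n";
```

Wrap more brackets around that last pattern and the assignment still picks up the digits, which is the whole attraction.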

Perl on S390

Hal Morris got Perl going on Linux/390, with only one test failing. Good news for the new generation of mainframe hackers.

There's mixed news for the old-timers, though; Peter Prymmer has got it down to 10 test failures, but one of the tests completely hangs Perl. Apparently study under OS/390 is best avoided. He also started some investigation of a Bigint bug, under the direction of Tels and John Peacock, but left for his holidays and the discussion moved to the perl-mvs mailing list.

UTS Amdahl

Oh, and talking of weird platforms, UTS. Hal (who's actually from UTS Global, so much kudos there) has been testing out recent Perl builds on UTS, and turning up some ... take a deep breath ... icky numeric conversion issues.

Nicholas Clark was convinced they (well, some of them at least) were due to UTS C going through a foo-- statement twice, but Hal pointed out he didn't expect UTS C to be quite that braindead. On the other hand, Nick's analysis looked convincing...

Hal also fixed up hints/uts.sh so that UTS now configures and builds nicely at least.

Various

David Wheeler found a known but really, really weird bug with lexical arrays; if you do:

    my @a = foo(); my @b = foo();
    sub foo { $x = 0; my @ary; push @ary, "hi" if $x; push @ary, "ho"; return @ary }

@a gets "ho" as you'd expect, but @b gets "ho","ho". Ronald Kimball told him Not To Do That, Then.

Peter Prymmer noted that Perl on VMS was bailing out during the test suite, leading to lots of bogus failures. This only happens if none of the DBM libraries (GDBM, DB_File, NDBM or SDBM) were built. As the first three require external libraries that VMS doesn't have and the last one is currently broken, it's no wonder Perl is bailing out. The fix is to work out why SDBM has stopped building on VMS. Peter also produced a lot of other VMS and HPUX reports.

Andy was pleasantly surprised to note that the promised "binary compatibility with 5.005" actually works even in bleadperl. Perhaps we need to break more things.

John Peacock asked a portability question for XS bit-twiddling; he's trying to adapt a math library which depends on casting two numbers to a long and adding them together to avoid overflow. Jarkko's fantastic architecture experience was brought to bear as he revealed that Cray Unicos has long == short == int == double. Oh, and type casting has issues too, so you have to use a union. Nicholas Clark suggested the old trick of comparing the operands of an addition with the result; if the result is smaller than either of the operands, you've overflowed, so you add a carry and off you go.
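Nick's trick can be sketched in a few lines. Perl quietly promotes integers that overflow, so this illustration fakes fixed-width arithmetic by masking to 16-bit words; the word size and the names add_word and add_words are assumptions made for the example, not anything from John's library.

```perl
use strict;
use warnings;

use constant WORD_MASK => 0xFFFF;   # assumption: 16-bit unsigned words

# Add two words plus an incoming carry, wrapping like unsigned C
# arithmetic. Returns (wrapped sum, carry out).
sub add_word {
    my ($x, $y, $carry_in) = @_;
    my $sum   = ($x + $y) & WORD_MASK;
    my $carry = $sum < $x ? 1 : 0;      # wrapped sum is smaller than an operand
    my $total = ($sum + $carry_in) & WORD_MASK;
    $carry = 1 if $total < $sum;        # adding the carry wrapped as well
    return ($total, $carry);
}

# Add two equal-length numbers stored as lists of words,
# least significant word first.
sub add_words {
    my ($a_words, $b_words) = @_;
    my @result;
    my $carry = 0;
    for my $i (0 .. $#$a_words) {
        (my $word, $carry) = add_word($a_words->[$i], $b_words->[$i], $carry);
        push @result, $word;
    }
    push @result, $carry if $carry;     # the number grew by one word
    return \@result;
}

# 0x0001FFFF + 0x00000001, least significant word first.
my $sum_words = add_words([0xFFFF, 0x0001], [0x0001, 0x0000]);
print join(" ", map { sprintf "%04X", $_ } reverse @$sum_words), "\n";
```

The attraction is that no wider type is ever needed: the comparison alone tells you whether a carry has to be propagated to the next word.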

The news that GCC 3.0 was out brought a rush of people testing Perl out with it; I got it through on Linux with all tests successful, as did H. Merijn Brand on HPUX, but with rather a lot more warnings. This was because HPUX messed up the test for __attribute__ due to using an HP linker instead of a GNU one. Merijn and Jarkko got this fixed up.

Artur continued his iThreads quest; he renamed the "shared" attribute to "static" (and then again to "unique" after objections), presumably to free it up for an attribute which actually does share variables between interpreters, and also added a function which cloned the Perl host. He said that threads-0.01, the new threading module, will be released to CPAN when 5.7.2 hits the road. (Which Jarkko keeps hinting will be very, very, very soon now.) He also complained bitterly when Marcel Grunauer tried to document attributes.pm as useful, despite the fact that Marcel has some really very cool modules on CPAN based on it...

Marcel also found, and Radu Greab fixed, an insidious bug in split, whereby if the default whitespace pattern was used for one iteration of a loop, it would be used for all succeeding ones; the PMf_WHITE flag for the regular expression was being set but never unset. Urgh.
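The sort of loop that would trip over this looks roughly like the sketch below. It is a guess at the shape of the problem rather than Marcel's actual test case, and it assumes Perl 5.18 or later, where a run-time string holding a single space gets the same special whitespace treatment as a literal ' '.

```perl
use strict;
use warnings;

my $line = " a b,c ";
my %fields;

# Alternate between the special whitespace pattern and a comma.
# With the flag handled properly, each separator gets its own behaviour.
for my $separator (' ', ',', ' ') {
    my @parts = split $separator, $line;
    $fields{$separator} = join "|", @parts;
}

print "whitespace: $fields{' '}\n";
print "comma:      $fields{','}\n";
```

With the bug, the comma iteration would presumably have gone on splitting on whitespace, so the two results would come out the same.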

Ilya produced some rough changes documentation for OS/2, as well as some other little patches. Norton Allen provided some QNX updates.

Phillip Newton documented the neat

    $count = () = function()

idiom for counting the number of return values from an operation. That was something I hadn't seen before; you learn something new every day... Until next week I remain, your humble and obedient servant,

