September 2001 Archives

Asymmetric Cryptography in Perl


Symmetric cryptography allows Alice to exchange secret messages with Bob over the network, but only after they have shared a secret key. If Alice and Bob don't live within commuting distance, or are unable to meet in person for any number of reasons, they have no option other than to share this key over the network. This, however, poses a problem. The network is insecure, and Eve, who systematically sniffs traffic between Alice & Bob, can get hold of any key they share over the network and read all subsequent communications encrypted with it without having to break the symmetric cipher.

In the early days of public networks, Whitfield Diffie & Martin Hellman, then researchers at Stanford University, were battling against this fundamental problem: how to enable two complete strangers to bootstrap secure communications over an insecure network. Their work culminated in a seminal 1976 paper, titled ``New Directions In Cryptography,'' that laid down the foundations of what has come to be known as asymmetric cryptography. Asymmetric cryptography not only provided an elegant solution to the aforementioned bootstrapping problem, it simultaneously solved several other problems of digital privacy, including unforgeable signatures and non-repudiation.

In this article we look at the various methods of asymmetric cryptography and Perl modules that provide ready-to-use implementations of these methods.

Trapdoor One-Way Functions

Central to asymmetric cryptography is the notion of trapdoor one-way functions. One-way functions are easy to compute in one direction but computationally infeasible to reverse. Breaking an egg, for example, is a one-way function. Trapdoor one-way functions are much like one-way functions, except that they are easy to reverse if certain ``trapdoor'' information is available.

In an asymmetric cryptosystem, each user has two keys: a public key that others use to encrypt messages to the user and a secret key for decrypting messages encrypted with the public key. (For this reason, asymmetric cryptography is also commonly referred to as public-key cryptography.) The secret key is zealously guarded by the user while its public counterpart is indiscriminately distributed to the world. Unlike symmetric keys, which are usually random numbers of a certain length, asymmetric keys are numbers with special mathematical properties that are computed using a key generation algorithm.

To send a private message to Bob using public-key cryptography, Alice first acquires his public key. Public key exchange can be carried out over an insecure network because the knowledge of Bob's public key does not enable Eve to decrypt messages encrypted with it. Alice then encrypts her message with this key. The encryption operation consists of computing the trapdoor one-way function on the message (in the easy direction), using the public key, to generate the ciphertext. Finally, Alice sends the ciphertext to Bob.


To recover the plaintext, Bob and Eve (who has sniffed a copy of the ciphertext off the wire) must reverse the trapdoor one-way function. Bob uses the trapdoor information, his private key, to easily decrypt and read the message. Eve, on the other hand, without the benefit of trapdoor information, is faced with a task much harder than putting a broken egg back together.


When Diffie & Hellman proposed their theory of asymmetric cryptography and trapdoor one-way functions, they did not suggest a particular trapdoor one-way function. Rivest, Shamir and Adleman, then-professors at MIT, were the first to devise such a function, basing it on the conjectured difficulty of factoring the product of two large primes. Their cryptosystem, RSA, has withstood many years of cryptanalysis and is the most widely used public-key cryptosystem today.

Here's how RSA works: To generate an RSA key pair, Bob chooses two large primes, p and q (each about 1024 bits, or roughly 308 decimal digits, long), and computes their product,

    n = pq

Then, he chooses a random number e that is smaller than and relatively prime to (p-1)(q-1),

    1 < e < (p-1)(q-1)
    gcd(e, (p-1)(q-1)) = 1

Finally, with the extended Euclidean algorithm, he computes the number d that satisfies

    ed = 1 mod (p-1)(q-1)

(n,e) becomes his public key, and d the secret key.
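
The key-generation steps can be traced with numbers small enough to check by hand. This sketch only illustrates the arithmetic; it is not how Crypt::RSA generates keys. It assumes the common textbook values p = 61, q = 53 and e = 17, which are far too small to be secure. It uses the core Math::BigInt module, whose bmodinv() method returns the modular inverse that the extended Euclidean algorithm would compute.

```perl
use strict;
use warnings;
use Math::BigInt;

# Toy RSA key generation with tiny textbook primes (insecure; real keys
# use primes of roughly 1024 bits each)
my $p = Math::BigInt->new(61);
my $q = Math::BigInt->new(53);

my $n   = $p * $q;                 # n = pq = 3233
my $phi = ($p - 1) * ($q - 1);     # (p-1)(q-1) = 3120

# Public exponent e: 1 < e < (p-1)(q-1) and gcd(e, (p-1)(q-1)) = 1
my $e = Math::BigInt->new(17);
die "e is not relatively prime to (p-1)(q-1)\n"
    unless $e->copy->bgcd($phi) == 1;

# Secret exponent d: the modular inverse of e, so ed = 1 mod (p-1)(q-1).
# bmodinv() modifies its invocant, hence the copy().
my $d = $e->copy->bmodinv($phi);

print "public key (n, e) = ($n, $e)\n";   # (3233, 17)
print "secret key d      = $d\n";         # 2753
```

You can check the result directly: 17 * 2753 = 46801 = 15 * 3120 + 1.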

Upon getting Bob's public key, Alice encrypts her message with it. To do so, she converts her message into a number, m, between 1 and n, and computes:

    c = (m ^ e) mod n

c is the ciphertext; Alice sends it to Bob, who computes:

    m = (c ^ d) mod n

thereby recovering the original m. He converts m back to text form and reads the message.
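
Encryption and decryption are each a single modular exponentiation, which Math::BigInt provides as bmodpow(). This sketch assumes the toy key n = 3233, e = 17, d = 2753, built from the textbook primes 61 and 53. It also assumes the message has already been converted to the number 65. It is unpadded textbook RSA, so treat it as an illustration only.

```perl
use strict;
use warnings;
use Math::BigInt;

# Toy key from the textbook primes p = 61, q = 53 (insecure, illustration only)
my $n = Math::BigInt->new(3233);   # n = pq
my $e = Math::BigInt->new(17);     # public exponent
my $d = Math::BigInt->new(2753);   # secret exponent, ed = 1 mod 3120

# Alice: the message must already be a number m with 1 <= m < n
my $m = Math::BigInt->new(65);

# c = (m ^ e) mod n; bmodpow() modifies its invocant, so work on a copy
my $c = $m->copy->bmodpow($e, $n);

# Bob: m = (c ^ d) mod n
my $recovered = $c->copy->bmodpow($d, $n);

print "ciphertext c = $c\n";          # 2790
print "recovered  m = $recovered\n";  # 65
```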

Eve, who has been sniffing the traffic between Alice & Bob all this while, has Bob's public key, (n,e), and the encrypted message from Alice, c. To decrypt c, she must learn the value of d. Eve could compute d and decrypt the message if she knew p and q. But to discover p and q, she has to factorize n, which, given the known algorithms for factoring and the current state of computational technology, is considered intractable.
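
Everything hinges on the difficulty of factoring n. The following sketch shows what Eve could do if n were tiny. It assumes a toy public key (n = 3233, e = 17) and an intercepted ciphertext c = 2790, values chosen purely for illustration. Trial division factors such an n instantly. Once Eve has p and q, she repeats Bob's own computation of d.

```perl
use strict;
use warnings;
use Math::BigInt;

# What Eve sees: Bob's toy public key (n, e) and Alice's ciphertext c
my ($n, $e, $c) = (3233, 17, 2790);

# Trial division: trivial for a 12-bit n, hopeless for a 2048-bit one
my $p;
for (my $f = 2; $f * $f <= $n; $f++) {
    if ($n % $f == 0) { $p = $f; last }
}
die "no factor found\n" unless defined $p;
my $q = $n / $p;

# With p and q in hand, Eve computes d exactly as Bob did
my $phi = ($p - 1) * ($q - 1);
my $d   = Math::BigInt->new($e)->bmodinv($phi);

# and decrypts: m = (c ^ d) mod n
my $m = Math::BigInt->new($c)->bmodpow($d, $n);
print "p = $p, q = $q, d = $d, m = $m\n";   # p = 53, q = 61, d = 2753, m = 65
```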


If you are wondering where Perl comes into all this, then the answer is right here. Through extension modules on CPAN, Perl provides efficient, secure and DWIM implementations of several asymmetric cryptosystems. A comprehensive RSA implementation is provided by the Crypt::RSA package.

Here's how Alice sends an RSA encrypted message to Bob using Crypt::RSA:

Bob generates a 2048-bit key pair:

    use Crypt::RSA;
    my $rsa = Crypt::RSA->new;
    my ($public, $private) = $rsa->keygen( Size => 2048 )
        or die $rsa->errstr;

He sends $public to Alice, possibly by e-mail, and she encrypts her message (a string stored in $message) with it:

    my $c = $rsa->encrypt( Message => $message, Key => $public );

Alice sends the ciphertext, $c, to Bob, who decrypts it with his private key, $private, to recover the original message:

    my $message = $rsa->decrypt( Cyphertext => $c, Key => $private );

Usually, Alice & Bob don't need more than four or so lines of code to carry out a secure exchange using Crypt::RSA. However, it's instructive to look at what Crypt::RSA does underneath these lines and how it can be made to do things a little differently.

Crypt::RSA is structured as a bundle of modules that encapsulate different aspects of the RSA cryptosystem. Crypt::RSA itself is a convenient front end that sits between the user and the other modules in the bundle, pre-processing data to meet the expectations of both.

One of Crypt::RSA's responsibilities is to convert text to numbers and back. Since the RSA algorithm works on numbers smaller than n, plaintext is first split into blocks, each of which is converted into a number smaller than n. After encryption, the resulting numbers are converted back into text and concatenated to form the final ciphertext. Decryption reverses the process.
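
Here is a sketch of the text-to-number step. An octet string is read as one big-endian integer. It is converted back with a fixed output length so that leading zero octets survive the round trip. The helper names octets_to_num and num_to_octets are hypothetical. The idea follows PKCS #1's OS2IP and I2OSP conversions, but this is not Crypt::RSA's code, and it omits the block splitting.

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helpers: octet string <-> big-endian integer
sub octets_to_num {
    my ($octets) = @_;
    return Math::BigInt->new('0x' . unpack('H*', $octets));
}

sub num_to_octets {
    my ($num, $len) = @_;                 # $len: desired output length in octets
    (my $hex = $num->as_hex) =~ s/^0x//;
    $hex = "0$hex" if length($hex) % 2;   # pack('H*') needs whole octets
    my $octets = pack('H*', $hex);
    # Restore leading zero octets that the integer form cannot represent
    return ("\0" x ($len - length $octets)) . $octets;
}

my $num  = octets_to_num("Hi");
my $back = num_to_octets($num, 2);
print "$num\n";    # 18537 (0x4869)
print "$back\n";   # Hi
```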

The encryption modules of Crypt::RSA additionally carry out a process called ``padding.'' Textbook RSA, as described in the previous section, is unsafe to use directly. It is deterministic, so Eve can confirm a guess at the plaintext simply by encrypting the guess with Bob's public key and comparing the result with c. It is also malleable: Eve can turn a ciphertext into a valid ciphertext of a related message. This opens the door to chosen-ciphertext attacks if she can get Bob to decrypt messages of her choosing. To prevent such attacks, plaintext blocks must be padded with random data in a special way.
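
Both weaknesses are easy to demonstrate with a toy key. This sketch assumes n = 3233, e = 17, d = 2753, far too small to be secure, and uses unpadded textbook RSA throughout. raw_rsa is a hypothetical helper, not part of Crypt::RSA.

```perl
use strict;
use warnings;
use Math::BigInt;

# Toy textbook-RSA key from p = 61, q = 53 (insecure; illustration only)
my ($n, $e, $d) = (3233, 17, 2753);

# Raw RSA operation: (x ^ exp) mod n, with no padding at all
sub raw_rsa {
    my ($x, $exp) = @_;
    return Math::BigInt->new("$x")->bmodpow($exp, $n);
}

my $c = raw_rsa(65, $e);                 # Alice encrypts m = 65

# Determinism: Eve confirms a guessed plaintext by encrypting it herself
my $guess_ok = raw_rsa(65, $e) == $c;

# Malleability: multiplying ciphertexts multiplies the hidden plaintexts
my $c2      = raw_rsa(2, $e);            # Eve encrypts 2 with Bob's public key
my $forged  = ($c * $c2) % $n;           # a new ciphertext related to c
my $decoded = raw_rsa($forged, $d);      # what Bob's decryption would yield

print "guess confirmed: ", ($guess_ok ? "yes" : "no"), "\n";
print "forged ciphertext decrypts to $decoded\n";   # 130, i.e. 2 * 65
```

Padding defeats both tricks. The same plaintext no longer encrypts to the same ciphertext. A product of two padded ciphertexts also no longer decrypts to a well-formed message.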

Over the years, a padding standard known as PKCS #1 v1.5 has come into wide use. Crypt::RSA implements PKCS #1 v1.5 padding as well as the newer OAEP (Optimal Asymmetric Encryption Padding) scheme, introduced in PKCS #1 v2.0. OAEP provides the notion of plaintext-aware encryption, and is recommended for applications that don't need to interoperate with older PKCS #1 v1.5 applications. Crypt::RSA uses OAEP padding by default. However, it can be instructed to use PKCS #1 v1.5 padding by passing a parameter to new(), like so:

    my $rsa = Crypt::RSA->new( ES => 'PKCS1v15' );

ES stands for ``Encryption Scheme'' and PKCS1v15 means the PKCS #1 version 1.5 encryption scheme, as described in RFC 2437.

Digital Signatures and Non-repudiation

Encryption provides only half of the solution to digital privacy. Identities in the online world are malleable and can be faked trivially; this necessitates the use of a signature mechanism to ensure that people are who they say they are. When you receive a message, how can you really know that the sender is who he or she purports to be? The message could have been written by someone else entirely; or it could have been intercepted in transit, modified and then passed on to you.

Digital signatures provide you with the tools to unequivocally determine whether a message is legitimate. You can use digital signatures to detect in-transit message tampering and to determine whether a message is truly from the advertised sender. Such signatures are unforgeable guarantees of both a message's true author and of its validity. This concept, known as non-repudiation, also extends in the opposite direction: Since a digital signature provides absolute proof that the sender signed a piece of data, the sender cannot subsequently deny having signed that piece of data. Taken to its logical extension, once an author has signed his work, he can deny neither the validity of the signature nor the authorship of the work.

To digitally sign a message is to apply one's private key to that message. Digital signatures use the same public and private keys that are used for encrypting and decrypting messages. However, whereas a sender encrypts a message with the recipient's public key, she signs a message with her own private key. This works because some trapdoor one-way functions, RSA among them, are trapdoor permutations: the public and private operations undo each other in either order. Once the private key has scrambled the message, the resulting signature can only be unscrambled by the matching public key. If the recipient of the message holds a copy of this public key, he or she can then determine absolutely whether the sender signed it using the private key.

Here is an example:

Alice wishes to send a message to Bob, and wants Bob to be able to verify that she sent the message, and that it has not been modified in transit. After composing the message, Alice ``signs'' that message using her private key, then sends the message and the signature to Bob. This signature is computed over the data in the message, such that it is unique to Alice's private key and the message. In other words, this particular signature can only have been produced by Alice's private key.

To verify the signature, Bob uses his copy of Alice's public key to unscramble the signature; if he produces the original message, he knows that the message has not been tampered with. Meanwhile, if Eve is listening in on the network traffic and sniffs a copy of Alice's message, she can easily modify it and send it on to Bob; but once she has modified the original message, the signature no longer matches the message. When Bob receives the modified message with its original signature, his attempts at verification will fail, and he will know that the message has been tampered with. Eve has no way of ``fixing'' the signature such that it matches the modified message, because doing so requires possession of Alice's private key -- and only Alice has access to that key. Thus Alice's identity and authorship can be guaranteed.

Here's how Alice signs her message with Crypt::RSA:

    use Crypt::RSA;

    my $private = Crypt::RSA::Key::Private->new(
                      Filename => "keys/alice.private",
                      Password => "alice's passphrase",
                  );
    my $rsa = Crypt::RSA->new;
    my $signature = $rsa->sign( Message => $message, Key => $private )
        or die $rsa->errstr;

Alice reads her private key from a disk file. Private keys are usually stored in encrypted form (using a symmetric cipher) for additional security, so they must be decrypted before use. Alice provides a password with which the key is decrypted, and reads the key into $private. She then signs her message with the sign() method of Crypt::RSA, and sends $signature and $message to Bob. Bob verifies the signature with the verify() method of Crypt::RSA using Alice's public key:

    my $public = Crypt::RSA::Key::Public->new(
                     Filename => "keys/alice.public",
                 );
    my $rsa = Crypt::RSA->new;
    $rsa->verify( Message => $message, Signature => $signature,
                  Key => $public ) ||
        die "Signature doesn't verify!\n";

It should be noted that sign() computes a digest of the message and signs that instead of the original message. This is typically how real-world applications generate digital signatures. Signing the entire message is expensive and doesn't provide any benefit over signing a digest.
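
The digest-then-sign approach can be sketched with textbook RSA and the core Digest::SHA module. This is an illustration under stated assumptions, not what sign() does internally. The key is built from the Mersenne primes 2^89 - 1 and 2^127 - 1 only so that the example is self-contained and n exceeds a 160-bit SHA-1 digest. No signature padding is applied. digest_num, rsa_sign and rsa_verify are hypothetical helper names.

```perl
use strict;
use warnings;
use Math::BigInt;
use Digest::SHA qw(sha1_hex);

# ASSUMED toy key: two known (Mersenne) primes, unpadded textbook RSA
my $p   = Math::BigInt->new(2)->bpow(89)->bdec;    # 2^89 - 1
my $q   = Math::BigInt->new(2)->bpow(127)->bdec;   # 2^127 - 1
my $n   = $p * $q;                                 # about 216 bits > 160
my $phi = ($p - 1) * ($q - 1);
my $e   = Math::BigInt->new(65537);
my $d   = $e->copy->bmodinv($phi);
die "e has no inverse mod (p-1)(q-1)\n" if $d->is_nan;

# Sign the digest, not the message: h = SHA-1(message), s = (h ^ d) mod n
sub digest_num { return Math::BigInt->new('0x' . sha1_hex($_[0])) }
sub rsa_sign   { return digest_num($_[0])->bmodpow($d, $n) }

# Verify: recover h from the signature with the public key and compare
sub rsa_verify {
    my ($msg, $sig) = @_;
    return $sig->copy->bmodpow($e, $n) == digest_num($msg);
}

my $sig = rsa_sign("Meet me at noon.");
print rsa_verify("Meet me at noon.", $sig) ? "valid\n" : "INVALID\n";
print rsa_verify("Meet me at one.",  $sig) ? "valid\n" : "INVALID\n";  # tampered
```

Only the fixed-size digest goes through the expensive private-key operation, however long the message is.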

Diffie-Hellman Key Agreement

Key agreement is the process of two parties agreeing on a shared encryption key without exposing that key to potential intruders. Key agreement protocols differ from public-key encryption protocols in that they are not meant for encrypting communications. Rather, they are used to agree upon a secret that can then be used for encrypting communications with symmetric ciphers. Diffie & Hellman described such a protocol in their 1976 paper, before Rivest, Shamir and Adleman published RSA. It has come to be known as Diffie-Hellman Key Agreement.

Here's how it works:

To begin the key exchange, Alice and Bob choose two public parameters: a large prime p and a base g. Both parties share these values, which are often computed by a central authority rather than by either one of the parties. Each party then generates a random private-key integer x whose length is at most (number of bits in p) - 1. Each party then computes a public key y from g, x and p as follows:

    y = g^x mod p

The parties exchange these public keys.

The shared secret key is generated from the other party's public key, one's own private key, and p. If Bob's public key is denoted y_Bob and Alice's private key is x, then Alice computes the shared secret using the formula

    secret = y_Bob^x mod p

Bob performs the mirror-image computation with Alice's public key and his own private key. Write x_Alice and x_Bob for the two private keys. Both parties arrive at the same value, because y_Bob^x_Alice = g^(x_Alice * x_Bob) = y_Alice^x_Bob mod p. The shared secret never passes over the insecure network. Having computed it, Alice and Bob can use it as a key to encrypt their communications: all future messages between the two during this session can be symmetrically encrypted using the shared secret.
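
The arithmetic is easy to check with textbook-sized numbers. This sketch assumes p = 23, g = 5 and the fixed private keys 4 (Alice) and 3 (Bob), chosen so the run is reproducible. Real deployments use a prime of 1024 bits or more and draw private keys from a cryptographically secure random source. It uses the core Math::BigInt module rather than Crypt::DH.

```perl
use strict;
use warnings;
use Math::BigInt;

# Toy public parameters (assumed; insecure)
my $p = Math::BigInt->new(23);
my $g = Math::BigInt->new(5);

# Fixed private keys, for reproducibility only
my $x_alice = Math::BigInt->new(4);
my $x_bob   = Math::BigInt->new(3);

# Public keys: y = g^x mod p, exchanged in the clear
my $y_alice = $g->copy->bmodpow($x_alice, $p);   # 4
my $y_bob   = $g->copy->bmodpow($x_bob,   $p);   # 10

# Each side raises the OTHER party's public key to its own private key
my $secret_alice = $y_bob->copy->bmodpow($x_alice, $p);
my $secret_bob   = $y_alice->copy->bmodpow($x_bob, $p);

print "Alice: $secret_alice, Bob: $secret_bob\n";   # both 18
```

Eve sees p, g, 4 and 10 on the wire. To recover 18 she must solve a discrete logarithm, which is trivial at this size but infeasible for a well-chosen 1024-bit p.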

More information can be found in PKCS #3 (Diffie-Hellman Key Agreement Standard).

A Perl implementation of Diffie-Hellman key agreement can be found in Crypt::DH. To use Crypt::DH, Alice and Bob must agree ahead of time on the values of p and g. For example, the SSH-2 protocol, which uses Diffie-Hellman, uses a value of 2 for g, and a value of

      FFFFFFFF FFFFFFFF C90FDAA2 2168C234 C4C6628B 80DC1CD1
      29024E08 8A67CC74 020BBEA6 3B139B22 514A0879 8E3404DD
      EF9519B3 CD3A431B 302B0A6D F25F1437 4FE1356D 6D51C245
      E485B576 625E7EC6 F44C42E9 A637ED6B 0BFF5CB6 F406B7ED
      EE386BFB 5A899FA5 AE9F2411 7C4B1FE6 49286651 ECE65381
      FFFFFFFF FFFFFFFF

for p. We begin using Crypt::DH by initializing a Crypt::DH object with the chosen values for p and g:

    use Crypt::DH;
    my $dh = Crypt::DH->new;
    $dh->g(2);
    $dh->p($p);    # $p holds the agreed-upon prime shown above

Each party then generates a public and a private key:

    $dh->generate_keys;
    my $alice_public_key = $dh->pub_key;

After generating the keys, Alice must send her public key to Bob, and Bob must send his public key to Alice, over the network. Once Alice has received Bob's public key, she can generate the shared secret using that public key, and her own private key:

    my $shared_secret = $dh->compute_key( $bob_public_key );

This shared secret is a big integer. Often this big integer is linearized into a string of octets to be used as a key for symmetric encryption between Alice and Bob. Since both parties have generated the same shared secret, Alice is able to decrypt messages that Bob has encrypted with the shared secret, and vice versa.

Digital Signature Algorithm

DSA is the Digital Signature Algorithm, an algorithm developed by the U.S. government for use as the Digital Signature Standard. At the time of its creation, RSA was threatening to become a de facto standard: Microsoft and Lotus, for example, were already including it in their products. RSA provided both signatures and encryption, and this frightened the government: A future where regular citizens might secure their personal communications, preventing government agents from snooping on networks and phone lines, was terrifying to the NSA (National Security Agency). With this in mind, the government pushed for the standardization of the DSA, an algorithm that provided only digital signatures, and could not encrypt messages.

The government wanted cryptography back in the hands of the NSA rather than in those of a private company, RSA Data Security, Inc. For this reason they pushed DSA as a royalty-free standard, as an alternative to the license-restricted RSA. DSA was not subject to RSA licensing restrictions because it is based not on the RSA algorithm but on the digital signature algorithm published by Taher ElGamal. DSA is, in many people's opinion, an inferior standard to RSA: It is difficult to implement, more complicated mathematically and slower than RSA for signature verification (though faster for signing). More importantly, DSA keys are restricted to a maximum bitsize of 1024. Despite these concerns, or perhaps because of them, DSA was chosen as the government standard.

Here's how DSA works:

To begin generating a new DSA key pair, Alice generates a large prime q (generated either from a user-supplied seed or from a string of random octets), where

    2^159 < q < 2^160

The same seed is then used to construct a prime p such that q divides p-1 and

    2^(bits-1) < p < 2^(bits)

where bits is the number of bits in the key; the bitsize can range from 512 to 1024, in multiples of 64.

A number g is then chosen such that

    g = h^((p-1)/q) mod p

where h is any integer greater than 1 and less than p-1 for which the resulting g is greater than 1. A typical value for h is 2.

Finally, two numbers x and y are generated that represent the private and public portions of the key, respectively. x is a randomly generated integer such that x is greater than 0 and less than q; y is derived as follows:

    y = g^x mod p

A DSA public key consists of the values of p, q, g and y; a private key consists of those values, and the value of x. A DSA signature consists of two values, r and s, generated by applying the private DSA key to a SHA-1 hash of a message. To verify a signature, the verifier combines r and s with the SHA-1 hash of the received message and the sender's public key to compute a final value v. If v is equal to r, then the signature is valid. Otherwise, it is invalid.
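
The signing and verification equations can be traced with tiny parameters. In this sketch the values are assumptions chosen for hand-checking: q = 11 divides p - 1 = 22, h = 2 gives g = 4, and the private key is x = 7. A small stand-in digest replaces the SHA-1 hash, and the per-signature secret k is fixed at 3. The equations follow FIPS 186-2 for signing, r = (g^k mod p) mod q and s = k^-1 (H + xr) mod q. Verification computes w = s^-1 mod q, u1 = Hw mod q, u2 = rw mod q and v = ((g^u1 * y^u2) mod p) mod q. Real DSA must draw a fresh, secret, random k for every signature, because reusing k reveals x.

```perl
use strict;
use warnings;
use Math::BigInt;

# Toy DSA domain parameters and key pair (assumed; insecure)
my ($p, $q) = (23, 11);                                     # q divides p - 1
my $g = Math::BigInt->new(2)->bmodpow(($p - 1) / $q, $p);   # h = 2 gives g = 4
my $x = 7;                                                  # private key, 0 < x < q
my $y = $g->copy->bmodpow($x, $p);                          # public key y = g^x mod p

sub dsa_sign {
    my ($digest, $k) = @_;                                  # k fixed here only for reproducibility
    my $r    = $g->copy->bmodpow($k, $p) % $q;              # r = (g^k mod p) mod q
    my $kinv = Math::BigInt->new($k)->bmodinv($q);
    my $s    = ($kinv * ($digest + $x * $r)) % $q;          # s = k^-1 (H + xr) mod q
    return ($r, $s);
}

sub dsa_verify {
    my ($digest, $r, $s) = @_;
    return 0 unless $r > 0 && $r < $q && $s > 0 && $s < $q;
    my $w  = $s->copy->bmodinv($q);                         # w = s^-1 mod q
    my $u1 = ($digest * $w) % $q;
    my $u2 = ($r * $w) % $q;
    my $v  = ($g->copy->bmodpow($u1, $p) * $y->copy->bmodpow($u2, $p)) % $p % $q;
    return $v == $r;
}

my ($r, $s) = dsa_sign(5, 3);              # stand-in digest 5, nonce k = 3
print "signature (r, s) = ($r, $s)\n";     # (7, 7)
print dsa_verify(5, $r, $s) ? "valid\n" : "INVALID\n";
print dsa_verify(6, $r, $s) ? "valid\n" : "INVALID\n";   # tampered digest
```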

Perl and Crypt::DSA, a pure Perl implementation of DSA, make digital signatures with DSA very simple. Alice generates a new DSA key:

    use Crypt::DSA;
    my $dsa = Crypt::DSA->new;
    my $key = $dsa->keygen( Size => 512 );

After sending the public portion of this key to Bob, she signs her message (a string stored in $message) with her private key:

    my $sig = $dsa->sign( Message => $message, Key => $key );

She then sends the signature $sig and the message $message to Bob, who uses Alice's public key to verify the signature:

    my $valid = $dsa->verify( Signature => $sig, Message => $message,
                              Key => $alice_pub_key );

If $valid is true, then the signature $sig is valid.

Meanwhile, if Eve has sniffed the network traffic to intercept the message and signature in transit to Bob, then she can modify the message $message; but in order to regenerate the signature $sig, Eve must have access to Alice's private key. Since Eve is not in possession of that key, the signature will not match the message and signature verification will fail.


Recommended Reading

The sci.crypt FAQ

Diffie & Hellman, New Directions in Cryptography

Kaliski & Staddon, PKCS #1: RSA Cryptography Specifications

PKCS #3, Diffie-Hellman Key Agreement Standard

FIPS 186-2, Digital Signature Standard

Like DSA, and like Diffie-Hellman key agreement, the Schnorr signature scheme is based on the difficulty of the discrete logarithm problem. Schnorr was specifically designed in 1991 by Claus-Peter Schnorr for use in smart cards. It is both a signature scheme and an interactive identification scheme that makes extensive use of lookup tables for generating signatures, thereby minimizing the computational effort on the part of the signing device. A Perl implementation of Schnorr is provided by the Crypt::Schnorr::AuthSign module. For usage information, we refer readers to the module's POD documentation.

Future Articles

This concludes part one in a series of three articles on Asymmetric Cryptography in Perl. Part two will discuss asymmetric crypto in the real world: systems and protocols such as SSH, PGP and SSL, which are applications of asymmetric cryptography you use every day. Along with a discussion of these applications, we will also look at the computational and security constraints that go into the design of real-world cryptography systems. Part three will discuss the building blocks of asymmetric cryptography. It will cover several interesting topics such as prime-number generation, bitfield/buffer manipulations, large number mathematics and number theory, all of which have comprehensive implementations in Perl.

Parrot: Some Assembly Required

Last week, the first public version of Parrot was released. This week, we're going to take a close look at what Parrot is, how you can get hold of it and play with it, and what we intend for Parrot in the future.

What Is Parrot?

First, though, what is Parrot, and why are we making such a fuss about it? Well, if you haven't been living in a box for the past year, you'll know that the Perl community has embarked on the design and implementation of a new version of Perl, both the language and the interpreter.

Parrot is strongly related to Perl 6, but it is not Perl 6. To find out what it actually is, we need to know a little about how Perl works. When you feed your program into perl, it is first compiled into an internal representation, or bytecode; then this bytecode is fed to an almost separate subsystem inside perl to be interpreted. So there are two distinct phases of perl's operation: compilation to bytecode, and interpretation of bytecode. This is not unique to Perl; other languages following this design include Python, Ruby, Tcl and, believe it or not, even Java.

In previous versions of Perl, this arrangement has been pretty ad hoc: There hasn't been any overarching design to the interpreter or the compiler, and the interpreter has ended up being pretty reliant on certain features of the compiler. Nevertheless, the interpreter (some languages call it a Virtual Machine) can be thought of as a software CPU - the compiler produces "machine code" instructions for the virtual machine, which it then executes, much like a C compiler produces machine code to be run on a real CPU.

Perl 6 plans to separate the design of the compiler and the interpreter. This is why we've come up with a subproject, which we've called Parrot, that has a certain, limited amount of independence from Perl 6. Parrot is destined to be the Perl 6 Virtual Machine, the software CPU on which we will run Perl 6 bytecode. We're working on Parrot before we work on the Perl 6 compiler because it's much easier to write a compiler once you have a target to compile to!

The name "Parrot" was chosen after this year's April Fools' joke, which had Perl and Python collaborating on the next version of their interpreters. This is meant to reflect the idea that we'd eventually like other languages to use Parrot as their VM; in a sense, we'd like Parrot to become a "common language runtime" for dynamic languages.

Where We're At

After the release last Monday, we've seen a huge amount of activity on the development list, with more than 100 CVS commits in the past week. However, it should be stressed that we're still in the early stages of development.

But don't let that put you off! Parrot is still very much usable; we've already seen one mini-language emerge that compiles down to Parrot bytecode (more on that later) and Leon Brocard has been working on automatically converting Java bytecode to Parrot.

At the moment, it's possible to write simple programs in Parrot assembly language, use an assembler to convert them to machine code and then execute them on a test interpreter. We have support for a wide variety of ordinary and transcendental mathematical operations, some rudimentary string support and some conditional operators.

How to Get It

So let's get ourselves a copy of Parrot, so that we can start investigating how to program in the Parrot assembler.

We could get the initial release from CPAN, but an awful lot has changed since then. To really keep up to date with Parrot, we should get our copy from the CVS repository. Here's how we do that:

    % cvs -d login
    (Logging in to
    CVS password: [ and here we just press return ]
    % cvs -d co parrot
    cvs server: Updating parrot
    U parrot/.cvsignore
    U parrot/

For those of you who can't use CVS, there are CVS snapshots built every six hours that you can find here.

Now that we have downloaded Parrot, we need to build it:

    % cd parrot
    % perl
    Parrot Configure
    Copyright (C) 2001 Yet Another Society

    Since you're running this script, you obviously have
    Perl 5 -- I'll be pulling some defaults from its configuration.

You'll then be asked a series of questions about your local configuration; you can almost always hit return for each one. Finally, you'll be told to type make test_prog; with any luck, Parrot will successfully build the test interpreter. (If it doesn't, the address to complain to is at the end of the article ...)

The Test Suite

Now we should run some tests; so type make test and you should see a readout like the following:

    perl t/harness
    t/op/basic.....ok, 1/2 skipped:  label constants unimplemented in
    t/op/string....ok, 1/4 skipped:  I'm unable to write it!
    All tests successful, 2 subtests skipped.
    Files=2, Tests=6,  2 wallclock secs ( 1.19 cusr +  0.22 csys =  1.41 CPU)

(Of course, by the time you read this, there could be more tests, and some of those which skipped might not skip - but none of them should fail!)

Parrot Concepts

Before we dive into programming Parrot assembly, let's take a brief look at some of the concepts involved.


The Parrot CPU has four basic data types:

An integer type (IV), guaranteed to be wide enough to hold a pointer.
An architecture-independent floating-point type (NV).
An abstracted, encoding-independent string type.
A scalar type, the Parrot Magic Cookie (PMC).

The first three types are pretty much self-explanatory; the final type, the Parrot Magic Cookie, is slightly more difficult to understand. But that's OK, because PMCs aren't actually implemented yet! We'll talk more about PMCs at the end of the article.


The current Perl 5 virtual machine is a stack machine - it communicates values between operations by keeping them on a stack. Operations load values onto the stack, do whatever they need to do and put the result back onto the stack. This is easy to work with, but it's slow: To add two numbers together, you need to perform three stack pushes and two stack pops. Worse, the stack has to grow at runtime, and that means allocating memory just when you don't want to be allocating it.

So Parrot's going to break with the established tradition for virtual machines, and use a register architecture, more akin to the architecture of a real hardware CPU. This has another advantage: We can use all the existing literature on how to write compilers and optimizers for register-based CPUs for our software CPU!
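
A toy comparison makes the stack-versus-register point concrete. This sketch is a hypothetical pair of mini-interpreters written in Perl. The names spush, spop and add_op are made up and are not Parrot or perl internals. The sketch counts the stack traffic needed to add two numbers and contrasts it with a single three-operand register instruction in the style of Parrot's add.

```perl
use strict;
use warnings;

# Stack machine: operands travel through a shared stack
my @stack;
my %stack_ops = (pushes => 0, pops => 0);
sub spush { push @stack, $_[0]; $stack_ops{pushes}++ }
sub spop  { $stack_ops{pops}++; return pop @stack }

spush(10);                       # load a
spush(32);                       # load b
spush(spop() + spop());          # add: pop both operands, push the result
my $stack_result = $stack[-1];

# Register machine: one instruction names its destination and operands,
# like Parrot's "add I1, I2, I3"
my @I = (0) x 32;                # 32 integer registers
@I[2, 3] = (10, 32);             # operands already sit in registers
sub add_op { my ($dst, $x, $y) = @_; $I[$dst] = $I[$x] + $I[$y] }
add_op(1, 2, 3);

printf "stack: %d (%d pushes, %d pops); register: %d (1 op)\n",
    $stack_result, $stack_ops{pushes}, $stack_ops{pops}, $I[1];
```

The stack version matches the count given above, three pushes and two pops. It also needs the stack to be able to grow while the program runs. The register version does the same work in one instruction against fixed storage.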

Parrot has specialist registers for each type: 32 IV registers, 32 NV registers, 32 string registers and 32 PMC registers. In Parrot assembler, these are named I1...I32, N1...N32, S1...S32, P1...P32.

Now let's look at some assembler. We can set these registers with the set operator:

    set I1, 10
    set N1, 3.1415
    set S1, "Hello, Parrot"

All Parrot ops have the same format: the name of the operator, the destination register and then the operands.


There are a variety of operations you can perform: the file docs/parrot_assembler.pod documents them, along with a little more about the assembler syntax. For instance, we can print out the contents of a register or a constant:

    print "The contents of register I1 is: "
    print I1
    print "\n"

Or we can perform mathematical functions on registers:

    add I1, I1, I2  # Add the contents of I2 to the contents of I1
    mul I3, I2, I4  # Multiply I2 by I4 and store in I3
    inc I1          # Increment I1 by one
    dec N3, 1.5     # Decrement N3 by 1.5

We can even perform some simple string manipulation:

    set S1, "fish"
    set S2, "bone"
    concat S1, S2       # S1 is now "fishbone"
    set S3, "w"
    substr S4, S1, 1, 7
    concat S3, S4       # S3 is now "wishbone"
    length I1, S3       # I1 is now 8
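
For comparison, here is the same sequence of string operations written with Perl's built-ins. The variable names simply mirror the Parrot registers. In this example, Parrot's substr S4, S1, 1, 7 behaves like Perl's substr($s1, 1, 7): offset 1, length 7.

```perl
use strict;
use warnings;

my $s1 = "fish";
my $s2 = "bone";
$s1 .= $s2;                      # concat S1, S2       -> "fishbone"
my $s3 = "w";
my $s4 = substr($s1, 1, 7);      # substr S4, S1, 1, 7 -> "ishbone"
$s3 .= $s4;                      # concat S3, S4       -> "wishbone"
my $i1 = length $s3;             # length I1, S3       -> 8
print "$s3 $i1\n";               # wishbone 8
```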


Code gets a little boring without flow control; for starters, Parrot knows about branching and labels. The branch op is equivalent to Perl's goto:

         branch TERRY
JOHN:    print "fjords\n"
         branch END
MICHAEL: print " pining"
         branch GRAHAM
TERRY:   print "It's"
         branch MICHAEL
GRAHAM:  print " for the "
         branch JOHN
END:     end

It can also perform simple tests to see whether a register contains a true value:

         set I1, 12
         set I2, 5
         mod I3, I1, I2
         if I3, REMAIND, DIVISOR
REMAIND: print "5 divides 12 with remainder "
         print I3
         branch DONE
DIVISOR: print "5 is an integer divisor of 12"
DONE:    print "\n"
         end

Here's what that would look like in Perl, for comparison:

    my $i1 = 12;
    my $i2 = 5;
    my $i3 = $i1 % $i2;
    if ($i3) {
      print "5 divides 12 with remainder ";
      print $i3;
    } else {
      print "5 is an integer divisor of 12";
    }
    print "\n";

And speaking of comparison, we have the full range of numeric comparators: eq, ne, lt, gt, le and ge. Note that you can't use these operators on arguments of disparate types; you may even need to add the suffix _i or _n to the op to tell it what type of argument you are using - although the assembler ought to divine this for you, by the time you read this.

Some Parrot Programs

Now let's look at a few simple Parrot programs to give you a feel for the language.

Displaying the Time

This little program displays the Unix epoch time every second (or so):

        set I3, 3000000
REDO:   time I1
        print I1
        print "\n"
        set I2, 0
SPIN:   inc I2
        le I2, I3, SPIN, REDO

First, we set integer register 3 to contain 3 million. That's an arbitrary number, picked because Parrot averages a massive 6 million ops per second on my machine. The inner loop executes two ops (inc and le) per iteration, so 3 million iterations take about a second. The program consists of two loops. The outer loop stores the current Unix time in integer register 1, prints it out, prints a new line and resets register 2 to zero. The inner loop increments register 2 until it exceeds the 3 million we stored in register 3. When it is no longer less than or equal to 3 million, we go back to REDO, the top of the outer loop. In essence, we're just spinning around a busy loop to waste some time. We do this only because Parrot doesn't currently have a "sleep" op; we'll see how to implement one later.

How do we run this? First, we need to assemble it into Parrot bytecode with the assembler provided. So, save the code above to a file called showtime.pasm, and inside your Parrot directory, run:

      perl showtime.pasm > showtime.pbc

(.pbc is the file extension for Parrot bytecode.)

Finding a Fibonacci number

The Fibonacci series is defined like this: take two numbers, 1 and 1. Then repeatedly add together the last two numbers in the series to make the next one: 1, 1, 2, 3, 5, 8, 13, and so on. The Fibonacci number fib(n) is the nth number in the series. Here's a simple Parrot assembler program that finds the first 20 Fibonacci numbers:

# Some simple code to print some Fibonacci numbers
# Leon Brocard <>

        print   "The first 20 fibonacci numbers are:\n"
        set     I1, 0
        set     I2, 20
        set     I3, 1
        set     I4, 1
REDO:   eq      I1, I2, DONE, NEXT
NEXT:   set     I5, I4
        add     I4, I3, I4
        set     I3, I5
        print   I3
        print   "\n"
        inc     I1
        branch  REDO
DONE:   end

This is the equivalent code in Perl:

        print "The first 20 fibonacci numbers are:\n";
        my $i = 0;
        my $target = 20;
        my $a = 1;
        my $b = 1;
        until ($i == $target) {
           my $num = $b;
           $b += $a;
           $a = $num;
           print $a,"\n";
           $i++;
        }

(As a fine point of interest, one of the shortest and certainly the most beautiful ways of printing out a Fibonacci series in Perl is perl -le '$b=1; print $a+=$b while print $b+=$a'.)


So much for programming in assembler; let's move on and look at a medium-level language - Jako. Jako was written by Gregor Purdy who obviously got sick (as a parrot) of programming in assembler. Jako looks a little bit like C and a little bit like Perl, and it can do anything you can do in Parrot assembler, but a little tidier. Let's try that Fibonacci program again:

        print("The first 20 fibonacci numbers are:\n");
        var int i = 0;
        var int target = 20;
        var int a = 1;
        var int b = 1;
        var int num;
        while (i != target) {
           num = b;
           b += a;
           a = num;

Notice how similar this is to the Perl version? I stripped away the $ sigils, replaced the Perlish until with the more common while (inverting the test from == to !=), replaced my with var int and I was nearly done.

The Jako compiler, jakoc, ships with Parrot in the little_languages subdirectory:

 % perl little_languages/jakoc fib.jako > fib.pasm
 % perl fib.pasm > fib.pbc
 % ./test_prog fib.pbc
The first 20 fibonacci numbers are:

Where Next?

Parrot is obviously developing very rapidly, and we still have a long way to go before we are ready for a compiler targeting this platform. This section is for those who are interested in helping us take Parrot further.

Adding Operations

The first thing we need is many more operations, but these need to be carefully thought out, and all proposals for new operations should be checked by Dan Sugalski, the Parrot designer.

That said, adding operations to Parrot is actually simple. Let's add the sleep operator we were complaining about earlier.

To add an operator to the Parrot core, we need to edit two files: opcode_table, which contains the description of operations and their arguments, and basic_opcodes.ops, which contains the C implementation for our opcodes.

Although we've been able to say print "String" to print a string and print I3 to print a register, Parrot's ops are not polymorphic; that's some trickery carried out by the assembler. Those two statements are implemented by two different ops: print_sc to print a string constant, and print_i to print an integer register. So we'll add two ops: sleep_i, to sleep for the number of seconds held in an integer register, and sleep_ic, to sleep for a constant number of seconds. Each op has one argument. At the top of opcode_table there is a list of argument types:

# Revised arg types:
#      i       Integer constant
#      I       Integer register
#      n       Numeric constant
#      N       Numeric register
#      s       String constant?
#      S       String register
#      D       Destination

So sleep_ic has 1 argument, i, and sleep_i has 1 argument, I. Let's add these into opcode_table in the "Miscellaneous and debugging ops" category:

 # Miscellaneous and debugging ops
 time_i              1                   I
 print_i             1                   I
 print_ic            1                   i
 time_n              1                   N
 print_n             1                   N
 print_nc            1                   n
+sleep_i             1                   I
+sleep_ic            1                   i

And now we need to implement them by adding to basic_opcodes.ops. The format of this file is a little funny: it's C, which is preprocessed by a Perl program. The C functions should be declared AUTO_OP, which means that the preprocessor will automatically work out the address of the next op in the bytecode stream and return it. (Branch operators need special treatment, and as such are MANUAL_OPs.) Parameters are denoted by P1, P2 and so on; they aren't actual parameters to the C function, but are pulled out of the bytecode stream. Finally, we can access register n by saying INT_REG(n), NUM_REG(n) and so on.
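To make the preprocessing idea concrete, here is a toy version in Perl. It is a hypothetical model rather than Parrot's real preprocessor: the generated function signature, the name cur_opcode and the return statement are assumptions chosen for the illustration. It shows the two jobs described above: rewriting P1, P2 and so on into reads from the bytecode stream, and working out where the next op starts.

```perl
use strict;
use warnings;

# Toy model of AUTO_OP preprocessing. The output format (signature,
# cur_opcode, return statement) is an assumption, not Parrot's own.
sub expand_auto_op {
    my ($source, $nargs) = @_;    # $nargs: argument count from opcode_table
    my ($name, $body) = $source =~ /AUTO_OP\s+(\w+)\s*\{(.*)\}/s
        or die "Not an AUTO_OP definition\n";
    # P1, P2, ... are not C parameters: read them from the bytecode stream.
    $body =~ s/\bP(\d+)\b/cur_opcode[$1]/g;
    # The op plus its arguments occupy 1 + $nargs words, so the next op
    # starts that far along; this is what AUTO_OP works out for you.
    my $size = 1 + $nargs;
    return "opcode_t *$name(opcode_t *cur_opcode) {$body"
         . "    return cur_opcode + $size;\n}\n";
}

print expand_auto_op("AUTO_OP sleep_i {\n    sleep(INT_REG(P1));\n}", 1);
```

A MANUAL_OP would skip the generated return and compute its own destination, which is why branches can't use the automatic version.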

So let's do the constant sleep op first. We want to take the first parameter, P1, and pass it to sleep:

AUTO_OP sleep_ic {
    sleep(P1);
}

That was easy. The second op is only slightly more complex. We have to use INT_REG to retrieve the contents of the register:

AUTO_OP sleep_i {
    sleep(INT_REG(P1));
}

All that's missing is a test suite (see t/op/basic.t for an example) and some documentation (we need to add entries to docs/parrot_assembly.pod) and we've added our instructions to the Parrot CPU. The assembler will automatically determine whether we're sleeping for a constant time or a variable time, and will dispatch the right op when we just say sleep. Now we can rewrite the showtime code a little more neatly - or rather, you can, as a nice little exercise!
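The op names in opcode_table hint at how that dispatch can work: look at the argument, decide whether it is a register or a constant of a given type, and append the matching suffix. The following is a simplified, hypothetical model of that idea for one-argument ops (op_variant and arg_suffix are made-up names, and Parrot's real assembler also has to deal with labels and several arguments):

```perl
use strict;
use warnings;

# Map one argument to the suffix used in opcode_table names:
# a register (I3, N1, S2) gives its lower-cased type letter,
# a constant gives the type letter followed by "c".
sub arg_suffix {
    my ($arg) = @_;
    return lc($1) if $arg =~ /^([INS])\d+$/;     # register
    return 'sc'   if $arg =~ /^".*"$/s;          # string constant
    return 'nc'   if $arg =~ /^-?\d*\.\d+$/;     # numeric constant
    return 'ic'   if $arg =~ /^-?\d+$/;          # integer constant
    die "Unrecognised argument '$arg'\n";
}

# Pick the concrete op for a one-argument instruction such as "sleep 5".
sub op_variant {
    my ($op, $arg) = @_;
    return "${op}_" . arg_suffix($arg);
}

print op_variant('sleep', 'I3'), "\n";        # sleep_i
print op_variant('sleep', '5'), "\n";         # sleep_ic
print op_variant('print', '"String"'), "\n";  # print_sc
```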

Vtable Datatypes

The next major thing that Parrot needs to implement is PMCs; these are almost like Perl 5's SVs, only more so. A PMC is an object of some type, which can be instructed to perform various operations. So when we say

      inc P1

to increment the value in PMC register 1, the increment method is called on the PMC - and it's up to the PMC how it handles that method.

PMCs are how we plan to make Parrot language-independent - a Perl PMC would have different behavior from a Python PMC or a Tcl PMC. The individual methods are function pointers held in a structure called a vtable, and each PMC has a pointer to the vtable which implements the methods of its "class." Hence a Perl interpreter would link in a library full of Perl-like classes and its PMCs would have Perl-like behavior.

We've already designed the vtables and the structure of PMCs, but sitting down and implementing them is one of the priorities for Parrot development that will make it truly useful.
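The vtable idea can be modelled in a few lines of Perl, with a hash of code references standing in for the C structure of function pointers. Everything in this sketch is hypothetical: the two vtables, new_pmc and op_inc are invented for illustration, and Parrot's real vtables are C structures with many more entries. What it shows is that the same inc request does whatever the PMC's vtable says it should.

```perl
use strict;
use warnings;

# Two made-up vtables; each entry is a "method" implemented as a code ref.
my %int_vtable = (
    inc => sub { $_[0]{value} += 1 },   # plain numeric increment
);
my %string_vtable = (
    inc => sub { $_[0]{value}++ },      # Perl's magic string increment
);

# A PMC is just a value plus a pointer to the vtable of its "class".
sub new_pmc {
    my ($vtable, $value) = @_;
    return { vtable => $vtable, value => $value };
}

# The interpreter's "inc P1": find the method in the PMC's vtable and call it.
sub op_inc {
    my ($pmc) = @_;
    $pmc->{vtable}{inc}->($pmc);
    return $pmc->{value};
}

my $counter = new_pmc(\%int_vtable, 41);
my $label   = new_pmc(\%string_vtable, 'aa');
print op_inc($counter), "\n";   # 42
print op_inc($label),   "\n";   # ab
```

The interpreter code in op_inc never changes; linking in a different set of vtables is all it would take to get different behavior.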

More Todos

There are a huge number of things we want to do with Parrot: We need, for instance, to create some I/O operators to make programs actually interesting; we want to create a range of string functions to deal with various encodings and convert between them; we want more documentation; we really, really need more tests; we want to check Parrot's portability to various platforms; and finally, there are more ops that we need to implement.

Getting Involved

We have a good number of people working on Parrot, but we could always use a few more. To help out, you'll need a subscription to the perl6-internals mailing list, where all the development takes place; you should keep up to date with the CVS version of Parrot - if you want to be alerted to CVS commits, you can subscribe to the cvs-parrot mailing list. CVS commit access is given to those who have taken responsibility for a particular area of Parrot, or who often commit high-quality patches.

There is also a useful Web page that reminds you how to use CVS and allows you to browse the CVS repository; the Parrot code page summarizes this information along with other resources about Parrot.

So don't delay - pick up a Parrot today!

wxPerl: Another GUI for Perl


If you don't just use Perl for creating CGI scripts, you'll probably have to create some kind of front end for your applications sooner or later. You might use the Curses library, but if you want a nice GUI, you will probably use Tk. It's certainly the most stable, best documented and most widely used GUI that's available for Perl. However, more and more people are using other GUIs, such as Gtk and Win32::GUI. The main reason for this is probably that Tk's interface doesn't exactly match the environment that people use. Tk has a Motif-like interface, while Gnome users will want the Gtk look-and-feel, and Windows users will want the Windows look-and-feel. Of course, Tk looks more like Windows when you use it on a Win32 machine and more like Gtk when you run it under Gnome, but it is still a different interface.

Recently, I discovered another GUI for Perl. No, not the unmaintained Qt and FWTK modules, but wxPerl, which is being developed by Mattia Barbon. It's not on CPAN yet (he's working on that now), but it is on SourceForge. wxPerl is the Perl binding for wxWindows, a cross-platform GUI library for C++. When I say cross-platform, I really do mean cross-platform: there is wxWindows for Windows, Gtk, Motif and Macintosh. wxWindows has been developed since 1992, and version 2 (the current version) has been in development since 1997. It's not a GUI that has been ported from the platform where it had its roots: wx stands for Windows and X, and it was designed to be cross-platform. It has also been around for a while, so it has had the chance to become a stable product.

The wxPerl approach

wxPerl has a very rich set of standard widgets (called ``controls'' in Wx-terms), ranging from simple buttons to complex HTML windows and Font dialogs. This makes it a good GUI to create full-featured applications. All needed controls are available ``off the shelf,'' and if there is still a complex control you want to create yourself, then you can do so with little effort. This is a particularly big difference from, for example, Tk, where it is a real pain to define new widgets.

Programming wxPerl is different. It's not better or worse than the Tk or Gtk interfaces, but it's a totally different approach. The main reason for this is that the underlying library is written in C++. It is less Perlish, but it's more OO.

When you want to create a new wxPerl application, you start by creating a new class that inherits from Wx::App. This subclass has to have at least one method, called OnInit, which defines the windows (called ``Frames'' in Wx-terms) the application uses. If you want a default window, you use the default classes. If you want to add controls to a window, you subclass a default window class and add the controls to it.

This is a much more object-oriented approach than Tk and Gtk use. But unfortunately it lacks the named parameter approach Tk uses, which makes Tk look more Perlish.

Currently, there is one big disadvantage to wxPerl: It is very poorly documented. That is to say: wxWindows has lots of documentation. And if you try hard enough, then you can use that documentation for wxPerl. That takes quite a bit of effort, though. But after reading this first wxPerl tutorial you might become interested and find your own way into wxPerl.

Hello World!

Like every tutorial, this tutorial has its own ``Hello World!'' application to get started and create the first application. Examine this script:

   =1= #!/usr/bin/perl -w
   =2= use strict;
   =3= use Wx;
   =5= ###########################################################
   =6= #
   =7= # Define our HelloWorld class that extends Wx::App
   =8= #
   =9= package HelloWorld;
  =11= use base qw(Wx::App);   # Inherit from Wx::App
  =13= sub OnInit
  =14= # Every application has its own OnInit method that will
  =15= # be called when the constructor is called.
  =16= {
  =17=    my $self = shift;
  =18=    my $frame = Wx::Frame->new( undef,         # Parent window
  =19=                                -1,            # Window id
  =20=                                'Hello World', # Title
  =21=                                [1,1],         # position X, Y
  =22=                                [200, 150]     # size X, Y
  =23=                              );
  =24=   $self->SetTopWindow($frame);    # Define the toplevel window
  =25=   $frame->Show(1);                # Show the frame
  =26= }
  =28= ###########################################################
  =29= #
  =30= # The main program
  =31= #
  =32= package main;
  =34= my $wxobj = HelloWorld->new(); # New HelloWorld application
  =35= $wxobj->MainLoop;

Like every well-written Perl application, this one also begins with -w and use strict. After that we use Wx, the main wxPerl module. On line 9 we start our package HelloWorld, which inherits (line 11) from Wx::App, like all wxPerl applications. This new application now needs to have an OnInit method that defines the Frames (line 18) and defines which of the Frames is the TopWindow (line 24). Finally, we call the Show method (line 25), which makes the created $frame visible.

The frame is created using a few parameters. The first is the parent window. If we were creating two frames, then the second one could be appointed as a child of the first by using $frame as the first parameter of the constructor of the second frame. But in our example we have only one window, so the parent window is undef.

At this moment we don't care about the second parameter (the window id; -1 means the default value). The third parameter is the window title, and the fourth and fifth are more interesting: they define the position on the screen and the size, respectively. These parameters are passed as array references, and could also be the predefined wxDefaultPosition and wxDefaultSize, respectively.

After defining the HelloWorld package, we have to create the main program by defining the main package (line 32). This package creates a new Wx object (line 34) out of our defined HelloWorld package and then calls the MainLoop method on it (line 35).

The MainLoop is the only thing that resembles the Tk and Gtk GUIs. The whole approach of defining a new subclass of Wx::App is totally different.

When you execute this first example, it will look like this:

Hello World

Fill the empty window.

So this was a simple example that creates an empty window titled ``Hello World''. Not really exciting, huh? Now we want to see more controls in the window. Let's see how we can add a useless button that does nothing and a piece of text on the screen.

We need a bit of background information on how wxPerl applications (and wxWindows applications) work in general before we can create something inside the window. As you saw in the previous example, to create an application, we need to subclass Wx::App. To create our own contents in a frame, we first need to subclass Wx::Frame and create an instance of that subclassed frame in the OnInit method of the newly created subclass of Wx::App.

To put controls in our subclassed frame, you first have to create a Panel inside that frame, since controls are meant to be placed on an instance of Wx::Panel. To be able to access and modify properties of the Panel and the other things you put inside a Frame, you will have to store those items as attributes of the Frame object.

That's a lot of (potentially) confusing information. Let's take this example:

   =1= #!/usr/bin/perl -w
   =2= use strict;
   =3= use Wx;
   =5= ###########################################################
   =6= #
   =7= # Extend the Frame class to our needs
   =8= #
   =9= package MyFrame;
  =11= use base qw(Wx::Frame); # Inherit from Wx::Frame
  =13= sub new
  =14= {
  =15=     my $class = shift;
  =16=     my $self = $class->SUPER::new(@_); # call the superclass' constructor
  =18=     # Then define a Panel to put the button on
  =19=     my $panel = Wx::Panel->new( $self,  # parent
  =20=                                 -1      # id
  =21=                               );
  =22=     $self->{txt} = Wx::StaticText->new( $panel,             # parent
  =23=                                         1,                  # id
  =24=                                         "A button example.", # label
  =25=                                         [50, 15]            # position
  =26=                                        );
  =27=     $self->{btn} = Wx::Button->new(     $panel,             # parent
  =28=                                         1,                  # id
  =29=                                         ">>> Press me <<<", # label
  =30=                                         [50,50]             # position
  =31=                                        );
  =32=     return $self;
  =33= }
  =35= ###########################################################
  =36= #
  =37= # Define our ButtonApp class that extends Wx::App
  =38= #
  =39= package ButtonApp;
  =41= use base qw(Wx::App);   # Inherit from Wx::App
  =43= sub OnInit
  =44= {
  =45=     my $self = shift;
  =46=     my $frame = MyFrame->new(    undef,         # Parent window
  =47=                                  -1,            # Window id
  =48=                                  'Button example', # Title
  =49=                                  [1,1],         # position X, Y
  =50=                                  [200, 150]     # size X, Y
  =51=                                );
  =52=     $self->SetTopWindow($frame);    # Define the toplevel window
  =53=     $frame->Show(1);                # Show the frame
  =54= }
  =56= ###########################################################
  =57= #
  =58= # The main program
  =59= #
  =60= package main;
  =62= my $wxobj = ButtonApp->new(); # New ButtonApp application
  =63= $wxobj->MainLoop;

You can see here that again we define a subclass of Wx::App called ButtonApp (line 39). Only this time the created frame is not a Wx::Frame instance, but a MyFrame instance. This MyFrame is a new subclass of Wx::Frame that we define in line 9.

Basically we only have to override the new constructor of Wx::Frame. We want to extend the Wx::Frame class, so our constructor first calls its superclass's constructor (SUPER::new, line 16), and defines its extensions after that. Our extensions consist of a new Panel (line 19), which has a StaticText (line 22) and a Button (line 27) on it. Just like the original Wx::Frame class would do, our constructor also returns $self (line 32), which finishes the definition of MyFrame.

As you can see, we've defined the Button and the StaticText objects as attributes of MyFrame. This is not strictly necessary now, but if we want to add some interaction to this script, which we will do in the next example, we want to access those objects. Since they're stored as attributes of MyFrame, we can access the Button and StaticText anywhere we have access to the MyFrame object. In this example it's just a matter of style, because we don't actually do anything with them yet.
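Stripped of the GUI, the MyFrame constructor is ordinary Perl inheritance: let the superclass build the object, hang the extra pieces on it as attributes, and return it. The sketch below runs without wxPerl; Hypothetical::Frame is a made-up stand-in for Wx::Frame, and plain hashes stand in for the StaticText and Button objects.

```perl
use strict;
use warnings;

# A stand-in for Wx::Frame: it only records its constructor arguments.
package Hypothetical::Frame;
sub new {
    my ($class, $parent, $id, $title, $pos, $size) = @_;
    return bless { parent => $parent, id => $id, title => $title,
                   pos => $pos, size => $size }, $class;
}

# The same shape as MyFrame in the example above.
package MyFrame;
use parent -norequire, 'Hypothetical::Frame';
sub new {
    my $class = shift;
    my $self  = $class->SUPER::new(@_);   # let the superclass build the object
    # Extensions, stored as attributes so other methods can reach them later.
    $self->{txt} = { label => 'A button example.', pos => [50, 15] };
    $self->{btn} = { label => '>>> Press me <<<',  pos => [50, 50] };
    return $self;
}

package main;
my $frame = MyFrame->new(undef, -1, 'Button example', [1, 1], [200, 150]);
print ref($frame), ': ', $frame->{title}, "\n";   # MyFrame: Button example
```

Because SUPER::new is called with the class name MyFrame, the object comes back blessed into the subclass, so any methods we add to MyFrame later will work on it.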

When you execute this example it will look like this:

Button example

Adding interaction

But what does it do? Err ... it does nothing - yet. But a GUI application without interaction is useless. So we're going to implement some interaction. I already explained in the previous example: If you want to change the properties of the defined objects, then you will have to define them as attributes of the Frame object. That way you can always access any attribute of the object, be it a StaticText, a Button or a Menu.

Consider the following code:

   =1= #!/usr/bin/perl -w
   =2= use strict;
   =3= use Wx;
   =5= ###########################################################
   =6= #
   =7= # Extend the Frame class to our needs
   =8= #
   =9= package MyFrame;
  =11= use Wx::Event qw( EVT_BUTTON );
  =13= use base qw/Wx::Frame/; # Inherit from Wx::Frame
  =15= sub new
  =16= {
  =17=  my $class = shift;
  =19=  my $self = $class->SUPER::new(@_);  # call the superclass' constructor
  =21=     # Then define a Panel to put the button on
  =22=  my $panel = Wx::Panel->new( $self,  # parent
  =23=                              -1      # id
  =24=                            );
  =26=  $self->{txt} = Wx::StaticText->new( $panel,             # parent
  =27=                                      1,                  # id
  =28=                                      "A button example.", # label
  =29=                                      [50, 15]            # position
  =30=                                     );
  =32=  my $BTNID = 1;  # store the id of the button in $BTNID
  =34=  $self->{btn} = Wx::Button->new(     $panel,             # parent
  =35=                                      $BTNID,             # ButtonID
  =36=                                      ">>> Press me <<<", # label
  =37=                                      [50,50]             # position
  =38=                                     );
  =40=  EVT_BUTTON( $self,          # Object to bind to
  =41=              $BTNID,         # ButtonID
  =42=              \&ButtonClicked # Subroutine to execute
  =43=             );
  =45=  return $self;
  =46= }
  =48= sub ButtonClicked 
  =49= { 
  =50=  my( $self, $event ) = @_; 
  =51=  # Change the contents of $self->{txt}
  =52=  $self->{txt}->SetLabel("The button was clicked!"); 
  =53= } 
  =55= ###########################################################
  =56= #
  =57= # Define our ButtonApp2 class that extends Wx::App
  =58= #
  =59= package ButtonApp2;
  =61= use base qw(Wx::App);   # Inherit from Wx::App
  =63= sub OnInit
  =64= {
  =65=     my $self = shift;
  =66=     my $frame = MyFrame->new(   undef,         # Parent window
  =67=                                 -1,            # Window id
  =68=                                 'Button interaction example', # Title
  =69=                                 [1,1],         # position X, Y
  =70=                                 [200, 150]     # size X, Y
  =71=                                );
  =72=     $self->SetTopWindow($frame);    # Define the toplevel window
  =73=     $frame->Show(1);                # Show the frame
  =74= }
  =76= ###########################################################
  =77= #
  =78= # The main program
  =79= #
  =80= package main;
  =82= my $wxobj = ButtonApp2->new(); # New ButtonApp2 application
  =83= $wxobj->MainLoop;

This example is basically the same as the previous one, but the main difference here is the addition of some interaction. In the previous example, nothing happened when you tried to click the button. This time clicking the button will alter the text of the StaticText object. Let's see what has been changed in the code:

First of all, on line 11 we use Wx::Event and import EVT_BUTTON. EVT_BUTTON is the function that connects a handler to button-click events. There are many more event functions available, but we only need this one now.

On line 32 I introduce a variable called $BTNID to hold the button id. I could still have used the hard-coded 1 from the previous example, but using a variable makes it clearer where the id is referred to. For example, it's needed for the EVT_BUTTON call on line 40. This is where we define what to do when the button is clicked. EVT_BUTTON takes the $self object, the $BTNID and a subroutine reference as parameters. On line 48 we define that subroutine.

An event callback in wxPerl always takes two parameters: the first is the object the handler was bound to (here the frame, $self, which we passed to EVT_BUTTON) and the second is the event object itself. In our case we don't need that second parameter, but we do need the first, because we want to change the text of the StaticText object. This is where storing the StaticText object as an attribute of the MyFrame object pays off: we can simply call the SetLabel method on that attribute (line 52).
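Conceptually, EVT_BUTTON records in an event table that click events for a given id should be passed to a given sub, along with the object that sub belongs to. The sketch below is a hypothetical, GUI-free model of that bookkeeping (bind_button, click and the hash used as an event are invented names, not wxPerl internals), but it shows why the callback receives ($self, $event) and why keeping the label as an attribute lets the callback change it.

```perl
use strict;
use warnings;

my %handlers;   # button id => [ object to call back, code ref ]

# Rough analogue of EVT_BUTTON( $self, $BTNID, \&ButtonClicked ).
sub bind_button {
    my ($obj, $id, $code) = @_;
    $handlers{$id} = [ $obj, $code ];
}

# What the toolkit would do when the user clicks button $id.
sub click {
    my ($id) = @_;
    my $entry = $handlers{$id} or return;          # nothing bound: ignore
    my ($obj, $code) = @$entry;
    my $event = { type => 'button', id => $id };   # stand-in event object
    return $code->($obj, $event);                  # callback gets ($self, $event)
}

# A frame-like object with a label attribute, and a handler like ButtonClicked.
my $frame = { txt => { label => 'A button example.' } };
sub button_clicked {
    my ($self, $event) = @_;
    $self->{txt}{label} = 'The button was clicked!';
}

my $BTNID = 1;
bind_button($frame, $BTNID, \&button_clicked);
click($BTNID);
print $frame->{txt}{label}, "\n";   # The button was clicked!
```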

Before we press the button, the window will look like the one in the previous example. After we press the button, the application window will look like this:

Button interaction


I've shown a bit of the way wxPerl works. More precisely, I've shown how you can work with wxPerl. It's obvious that this is a different approach from other GUIs. I admit that at first I myself thought this was an unnatural way of programming Perl, not to mention programming Perl GUIs. But having done some exercises, I get the feeling this is in fact a more natural approach than Tk or Gtk use. Of course, it all comes down to a matter of taste. And there's no accounting for taste.

In the next wxPerl tutorial, I will show you how to create menus, show some more event handling and I'll even add some more advanced controls. But the goal will be the same: to show you the hidden beauties of wxPerl!

This Week on Perl 6 (2 - 8 September 2001)


This Week in Perl 6 News


You can subscribe to an email version of this summary by sending an empty message to

Please send corrections and additions to

It was a busy week in the Perl 6 community with 363 messages contributed by 42 authors across 32 threads. A fourth of the threads comprised over three-fourths of the traffic.

%MY:: Goodness

There were two huge discussions on the new %MY:: interface to the lexical symbol table.

(70 posts) The major thread, started by Ken Fox, centered around %MY:: as a language feature to be used and abused.

Is stuff like:

%MY::{'$lexical_var'} = \$other_var;

supposed to be a compile-time or run-time feature?

Modifying the caller's environment:

$lexscope = caller().{MY};

$lexscope{'&die'} = &die_hard;

is especially annoying because it means that I can't trust lexical variables anymore. The one good thing about Damian's caller() example is that it appears in an import() function. That implies compile-time, but isn't as clear as Larry's Apocalypse.

This feature has significant impact on all parts of the implementation, so it would be nice if a little more was known. A basic question: how much performance is this feature worth?

Most of the discussion addressed adjusting lexical variables at runtime, and how that would change the semantics of what are currently Perl 5's lexical variables. Of particular concern were whether runtime adjustment of lexical variables could defeat the compile-time optimizations for variable resolution that Perl 5 currently enjoys, and whether %MY:: symbol resolution is confined to a single physical scope, whereas a lexical $x may refer to a lexical in an outer scope. These issues are being mulled over.

(45 posts) Brent Dax also asked whether %MY:: should be a real symbol table instead of the scratchpad structure currently used by Perl 5. There was a lot of debate on the differences between lexical variables and localized global variables, and whether that distinction would help or hinder a transition to a true symbol table. Much of this decision will be affected by the linguistic questions discussed above.


(20 posts) I proposed a method for runtime prototype checking and value assignment that was generally accepted.

Documents Released

(71 posts) Simon Cozens released an overview of the Parrot interpreter. This is mostly codifying and coalescing much of the information that has been presented before. Feedback has been rolled into the docs that will be provided with the first Parrot release, so you can catch the updated info there.

Later stages of the thread turned into a debate between Paolo Molaro and Dan Sugalski, centered once again on the decision to build a register-based virtual machine.

(5 posts) Dan Sugalski re-released the second version of PDD 6: Parrot Assembly Language.

(3 posts) Dave Storrs released the next version of his Perl 6 Debugger API PDD.

(16 posts) I released versions one and two of my "Statements and Blocks" language specification.

Math Functions To Add

(30 posts) Dan Sugalski queried the Perl community for math functions to add to Parrot.

Parroty Bits

By the time you read this, the initial Parrot baseline should be available via anonymous CVS. You may find details here. Simon Cozens holds the source pumpkin.

Bryan C. Warnock

Changing Hash Behaviour with tie


In my experience, hashes are just about the most useful built-in datatype that Perl has. They are useful for so many things - from simple lookup tables to complex data structures. And, of course, most Perl objects have a blessed hash as their underlying implementation.

The fact that they have so many uses must mean that Larry and the Perl 5 Porters got the functionality of hashes pretty much right when designing them - it's simple, instinctive and effective. But have you ever come across a situation where you wanted to change the way that hashes worked? Perhaps you wanted hashes that only had a fixed set of keys. Faced with this requirement, it's tempting to move away from the hash interface completely and use an object. The downside to this decision is that you lose the easy-to-understand hash interface. But by using tied variables, it is possible to create an object and still use it like a hash.

Tied Objects

Tied objects are, in my opinion, an underused feature of Perl. The details (together with some very good examples) are in perltie, and there are some extended examples in the ``Tied variables'' chapter of Programming Perl. Despite all of this great documentation, most people seem to believe that tying is only used to tie a hash to a DBM file. The truth is that any type of Perl data structure can be tied to just about anything. It's simply a case of writing an object that includes certain predefined methods. If you want to create a tied object that emulates a standard Perl data type most of the time, then it's even easier, as the Perl distribution contains modules that define objects that mimic the behavior of the standard data types. For example, there is a class called Tie::StdHash (defined in the module Tie::Hash) that mimics the behavior of a real hash. To alter that behavior we simply have to subclass Tie::StdHash and override the methods that we're interested in.
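As a minimal sketch of that subclassing approach, here is a hash with case-insensitive keys. The class name My::CaseInsensitiveHash is invented for this example and defined inline; it inherits everything from Tie::StdHash and overrides only the four methods that take a key, folding the key to lower case before handing it on.

```perl
use strict;
use warnings;
use Tie::Hash;    # defines Tie::StdHash

package My::CaseInsensitiveHash;
our @ISA = ('Tie::StdHash');

# Override only the methods that receive a key; everything else
# (TIEHASH, CLEAR, FIRSTKEY, NEXTKEY, ...) is inherited unchanged.
sub STORE  { my ($self, $key, $value) = @_; $self->SUPER::STORE(lc $key, $value) }
sub FETCH  { my ($self, $key) = @_;         $self->SUPER::FETCH(lc $key) }
sub EXISTS { my ($self, $key) = @_;         $self->SUPER::EXISTS(lc $key) }
sub DELETE { my ($self, $key) = @_;         $self->SUPER::DELETE(lc $key) }

package main;

tie my %h, 'My::CaseInsensitiveHash';
$h{Name} = 'Alice';
print $h{NAME}, "\n";             # Alice
print join(',', keys %h), "\n";   # name
```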

Using Tied Objects

In your Perl program, you make use of a tied object by calling the tie function. tie takes two mandatory parameters: the variable that you are tying and the name of the class to tie it to, followed by any number of optional parameters. For example, if we had written the hash with fixed keys discussed earlier (which we will do soon), we could use the class in our program like this:

  use Tie::Hash::FixedKeys;

  my %person;

  my @keys = qw(forename surname date_of_birth gender);

  tie %person, 'Tie::Hash::FixedKeys', @keys;

After running this code, %person can still be used like a hash, but its behavior will have been changed. Any attempt to assign a value to a key outside the list that we used in the call to tie will fail in some way that we get to specify when we write the module.

If for some reason we wanted to get to the underlying object that is tied to the hash, then we can use the tied function. For example,

  my $obj = tied(%person);

will give us back the Tie::Hash::FixedKeys object that is tied to our %person hash. This is sometimes used to extend the functionality in ways that aren't available through the standard hash interface. In our fixed keys example, we might want the user to be able to extend or reduce the list of valid keys. There is no way to do this in the standard hash interface so we would need to add new methods called, say, add_keys and del_keys, which can be called like this:

  tied(%person)->add_keys('weight', 'height');

When you have finished with the tied object and want to return it to being an ordinary hash, you can use the untie function. For example,

  untie %person;

returns %person to being an ordinary hash.

To tie an object to a Perl hash, your object needs to define the following set of methods. Notice that they are all named in upper case. This is the standard for function names that Perl is going to call for you.

TIEHASH: This is the constructor function. It is called when the user calls the tie function. It is passed the name of the class and the list of parameters that were passed to tie. It should return a reference to the new tied object.
FETCH: This method is called when the user accesses a value from the hash. It is passed a reference to the tied object and the key that the user is trying to access. It should return the value associated with the given key (or undef if the key isn't found).
STORE: This method is called when the user tries to store a value against a key in the tied hash. It is passed a reference to the object, together with the key and value pair.
DELETE: This method is called when the user calls the delete function to remove one of the key/value pairs in the tied hash. It is passed a reference to the tied object and the key that the user wishes to remove. The return value becomes the return value from the delete call. To emulate the 'real' delete function, this should be the value that was stored in the hash before it was deleted.
CLEAR: This method is called when the user clears the whole hash (usually by assigning an empty list to the hash). It is passed a reference to the tied object.
EXISTS: This method is called when the user calls the exists function to see whether a given key exists in the hash. It is passed a reference to the tied object and the key to search for. It should return a true value if the key is found and false otherwise.
FIRSTKEY: This method is called when one of the hash iterator functions (each or keys) is called for the first time. It is passed a reference to the tied object and should return the first key in the hash.
NEXTKEY: This method is called on each subsequent call to one of the iterator functions. It is passed a reference to the tied object and the name of the last key that was processed. It should return the name of the next key, or undef if there are no more keys.
UNTIE: This method is called when the untie function is called. It is passed a reference to the tied object.
DESTROY: This method is called when the tied object is destroyed, typically when the tied variable goes out of scope or is untied. It is passed a reference to the tied object.
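To see the whole interface in one place, here is a minimal sketch of a tied hash that implements each of these methods directly, without inheriting from anything. The class My::LcHash is hypothetical. It folds every key to lower case, which gives a case-insensitive hash. It keeps its data in a separate inner hash, so the object has room for other state.

```perl
use strict;
use warnings;

# Hypothetical case-insensitive hash written against the raw tie interface.
package My::LcHash;

sub TIEHASH {
    my $class = shift;
    return bless { data => {} }, $class;    # real data lives in {data}
}

sub FETCH  { my ($self, $key) = @_;       return $self->{data}{ lc $key } }
sub STORE  { my ($self, $key, $val) = @_; $self->{data}{ lc $key } = $val }
sub DELETE { my ($self, $key) = @_;       return delete $self->{data}{ lc $key } }
sub CLEAR  { my $self = shift;            %{ $self->{data} } = () }
sub EXISTS { my ($self, $key) = @_;       return exists $self->{data}{ lc $key } }

sub FIRSTKEY {
    my $self = shift;
    keys %{ $self->{data} };                 # void context: reset the iterator
    return scalar each %{ $self->{data} };
}

sub NEXTKEY {
    my ($self, $lastkey) = @_;               # $lastkey unused: each() tracks position
    return scalar each %{ $self->{data} };
}

sub UNTIE   { }    # nothing to release in this example
sub DESTROY { }

package main;

tie my %h, 'My::LcHash';
$h{Name} = 'Alice';
print "$h{NAME}\n";                 # Alice
print join(',', keys %h), "\n";     # name
```

Resetting the inner hash's iterator in FIRSTKEY by calling keys in void context is the usual idiom for that method.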

As you can see, there are a large number of methods to implement, but in the next section we'll see how you can get away with only implementing some of them.

A First Example: Tie::Hash::FixedKeys

Let's take a look at the implementation of Tie::Hash::FixedKeys. This module is available on CPAN if you want to take a closer look.

Writing the module is made far easier for us by the existence of a package called Tie::StdHash. This is a tied hash that mirrors the behavior of a standard Perl hash. This package is stored in the module Tie::Hash. This means that if you wrote code like the following example, then you would have a tied hash that acts the same way as a 'real' hash.

    use Tie::Hash;

    my %hash;

    tie %hash, 'Tie::StdHash';

So far, so good. But it hasn't really achieved much. The hash %hash is now a tied object, but we haven't changed any of its functionality. Tie::StdHash works much better if it is used as a base class from which you inherit behavior. For example, the start of the Tie::Hash::FixedKeys class looks like this:

    package Tie::Hash::FixedKeys;

    use strict;
    use Tie::Hash;
    use Carp;
    use vars qw(@ISA);

    @ISA = qw(Tie::StdHash);

This is standard boilerplate for a Perl class, but notice that we've loaded the Tie::Hash module (with use Tie::Hash) and have told our package to inherit behavior from Tie::StdHash by putting Tie::StdHash in the @ISA package variable.

If we stopped there, our Tie::Hash::FixedKeys package would have the same behavior as a standard Perl hash. This is because each time Perl tried to find one of the tie interface methods (like FETCH or STORE) in our package it would fail and would call the version found in our parent class, Tie::StdHash.

At this point we can start to change the standard hash behavior by simply overriding the methods that we want to change. We'll start by implementing the TIEHASH method differently.

    sub TIEHASH {
      my $class = shift;
      my %hash;

      # Give every allowed key an initial value of undef.
      @hash{@_} = (undef) x @_;

      return bless \%hash, $class;
    }

The TIEHASH function is passed the name of the class as its first parameter, so we shift that into $class in the first line. The rest of the parameters in @_ are whatever extra parameters have been passed into the tie call. In the example of how to use our proposed class at the start of this article, we passed it the list of valid keys. Therefore, we take this list of keys and (using a hash slice) we initialize a hash so that it has undef as the value for each of these keys. Finally, we take a reference to this hash, bless it into the required class and return the reference.

It's worth pointing out the one caveat about using Tie::StdHash here. In order to use the default behavior, your new class must be based on a hash reference, and this hash must contain only real hash data. We couldn't, for example, invent a key called _keys to hold the list of valid key names, because that key would show up whenever the user called the keys function.

At this point we have a hash that has values (of undef) for each of the allowed keys. This doesn't yet prevent us from adding new keys. For that we need to override the STORE method.

    sub STORE {
      my ($self, $key, $val) = @_;

      # Refuse to create keys that weren't passed to tie.
      unless (exists $self->{$key}) {
        croak "invalid key [$key] in hash\n";
      }

      $self->{$key} = $val;
    }

The three parameters passed to the STORE method are a reference to the tied object and a new key/value pair. We need the STORE method to prevent new keys being added to the underlying hash, and we achieve that by checking that the given key exists before setting the value. Note that as our underlying object is a real hash, we can check this simply by using the exists function. If the key doesn't exist, we call croak. It dies with an error message reported from the caller's point of view, so the assignment fails and the hash is left unchanged.

We have now prevented the hash from growing by adding keys, but it is still possible to remove keys from the hash (and our STORE implementation would prevent them from being set once they had been removed), so we also need to override the implementation of DELETE.

    sub DELETE {
      my ($self, $key) = @_;

      return unless exists $self->{$key};

      # Keep the key, but reset its value, returning the old one.
      my $ret = $self->{$key};
      $self->{$key} = undef;

      return $ret;
    }

Once again, we don't actually want to change the existing set of keys in the hash, so we check to see whether the key already exists and return immediately if it doesn't. If the key does exist, then we don't want to actually delete it, so we simply set the value back to undef. Notice that we save the old value before resetting it so that we can return it from the method, thus mimicking the behavior of the real delete function.

There's one other way to affect the keys in our hash. Code like this:

    %hash = ();

will cause the CLEAR method to be called. The default behavior for this method is to remove all of the data from the hash. We need to replace this with a method that will reset all of the values to undef without changing the keys in any way.

    sub CLEAR {
      my $self = shift;

      # Reset every value, but keep all of the keys.
      $self->{$_} = undef foreach keys %$self;
    }

And that's all that we need to do. All of the other functionality of a standard hash is inherited from Tie::StdHash. You can fetch values from our hash as normal without us writing any more lines of code. Built-in Perl functions like each and keys also work as expected.
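The following sketch puts the pieces together. It collects TIEHASH, STORE, DELETE and CLEAR into one runnable file, and inherits the remaining methods from Tie::StdHash. It is renamed My::FixedKeys (a hypothetical name) so it can't be confused with the CPAN module, which may differ in detail.

```perl
use strict;
use warnings;

# The methods from this section, under a hypothetical package name.
package My::FixedKeys;
use Tie::Hash;
use Carp;
our @ISA = ('Tie::StdHash');

sub TIEHASH {
    my $class = shift;
    my %hash;
    @hash{@_} = (undef) x @_;
    return bless \%hash, $class;
}

sub STORE {
    my ($self, $key, $val) = @_;
    croak "invalid key [$key] in hash\n" unless exists $self->{$key};
    $self->{$key} = $val;
}

sub DELETE {
    my ($self, $key) = @_;
    return unless exists $self->{$key};
    my $ret = $self->{$key};
    $self->{$key} = undef;
    return $ret;
}

sub CLEAR {
    my $self = shift;
    $self->{$_} = undef foreach keys %$self;
}

package main;

tie my %person, 'My::FixedKeys', qw(forename surname);
$person{forename} = 'Ada';
$person{surname}  = 'Lovelace';

my $old = delete $person{forename};        # returns 'Ada'; the key survives
%person = ();                              # calls CLEAR: values reset, keys kept
my $ok  = eval { $person{age} = 36; 1 };   # STORE croaks on an unknown key

print "keys: ", join(',', sort keys %person), "\n";
print "rejected: ", ($ok ? 'no' : 'yes'), "\n";
```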

Another Example: Tie::Hash::Regex

Let's look at another example. This module came about from a discussion on Perlmonks a couple of months ago. Someone asked whether it was possible to match hash keys approximately. I suggested that a hash that matched keys as regular expressions might solve their problem and wrote the first draft of this module. I'm grateful to Jeff Pinyan, who made some suggestions for improvements to the module.

In order to make this change to the behavior of the hash, we need to override the behavior of the FETCH, EXISTS and DELETE methods. Here's the FETCH method.

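  sub FETCH {
    my $self = shift;
    my $key  = shift;

    my $is_re = (ref $key eq 'Regexp');

    # An exact (non-regex) key match takes priority.
    return $self->{$key} if !$is_re && exists $self->{$key};

    # Otherwise treat the key as a pattern, compiled once up front.
    $key = qr/$key/ unless $is_re;

    /$key/ and return $self->{$_} for keys %$self;

    # No key matched.
    return;
  }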

Knowing what we know about tied objects, this is pretty simple to follow. We start by getting the reference to the tied object (which will be a hash reference) and the required key. We then check to see whether the key is a reference to a precompiled regular expression (which would have been compiled with qr//). If the key isn't a regex, then we start by checking whether the key exists in the hash. If it does, we return the associated value. If the key isn't found, then we assume that it is a regex to search for. At this point we compile the regex if it isn't already precompiled. This gives us a performance boost, as we could potentially need to match the regex against all of the keys in the hash. Finally, we check each key in the hash in turn against the regex, and if it matches, we return the associated value. If there are no matches, we simply return.

At this point you may realize that it's possible for more than one key to match a regex, and you may suggest that it would be nice for FETCH to return all matches when it is called in list context. This is a nice idea, but in current versions of Perl the syntax $hash{$key} always calls FETCH in scalar context (and the syntax @hash{@keys} calls FETCH once in scalar context for each element of @keys), so this won't work. To get round this, you can use the slightly kludgey syntax @vals = tied(%hash)->FETCH($pattern), and the version of the module on CPAN supports this.
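The list-context variant might look something like the sketch below. This is an assumption, not the CPAN module's code. FETCH checks wantarray and, when it is called directly as a method in list context, returns the values of every matching key. The package name My::RegexHash is hypothetical. The matches are sorted by key only so that the output order is predictable.

```perl
use strict;
use warnings;

# Hypothetical regex-matching hash whose FETCH is context-sensitive.
package My::RegexHash;
use Tie::Hash;
our @ISA = ('Tie::StdHash');

sub FETCH {
    my ($self, $key) = @_;
    my $is_re = (ref $key eq 'Regexp');

    # An exact key still wins, as in the version described above.
    return $self->{$key} if !$is_re && exists $self->{$key};

    $key = qr/$key/ unless $is_re;
    my @matches = grep { /$key/ } sort keys %$self;

    # List context (only reachable via tied(%h)->FETCH): every match.
    return @{$self}{@matches} if wantarray;

    # Scalar context, as with $h{...}: the first match, if any.
    return @matches ? $self->{ $matches[0] } : undef;
}

package main;

tie my %h, 'My::RegexHash';
%h = (apple => 1, apricot => 2, banana => 3);

my @vals = tied(%h)->FETCH(qr/^ap/);    # values for apple and apricot
my $one  = $h{'^ban'};                  # value for banana
print "@vals / $one\n";
```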

The EXISTS method uses similar processing, but in this case we return 1 if the key is found instead of the associated value.

  sub EXISTS {
    my $self = shift;
    my $key  = shift;

    my $is_re = (ref $key eq 'Regexp');

    return 1 if !$is_re && exists $self->{$key};

    $key = qr/$key/ unless $is_re;

    # Search the keys of the tied object (not of the regex).
    /$key/ && return 1 for keys %$self;

    return;
  }

The DELETE method is somewhat different. In this case, we can delete all matching key/value pairs, which we do with the following code:

  sub DELETE {
    my $self = shift;
    my $key  = shift;

    my $is_re = (ref $key eq 'Regexp');

    return delete $self->{$key} if !$is_re && exists $self->{$key};

    $key = qr/$key/ unless $is_re;

    # Delete every key/value pair whose key matches the pattern.
    for (keys %$self) {
      if (/$key/) {
        delete $self->{$_};
      }
    }
  }

I should point out that there is another similar module on CPAN called Tie::RegexpHash, written by Robert Rothenberg. Tie::RegexpHash actually does the opposite of Tie::Hash::Regex. When you store a value in it, the key is a regular expression, and any time you look up a value with a key, you will get the value associated with the first regex key that matches your string. It's interesting to note that Tie::RegexpHash isn't based on Tie::StdHash and, as a result, contains a lot more code than Tie::Hash::Regex.

Another recent addition to CPAN is Tie::Hash::Approx, which was written by Briac Pilpré. This addresses a similar problem, but instead of using regex matching, it uses Jarkko Hietaniemi's String::Approx module.
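For reference, the sketch below assembles the FETCH, EXISTS and DELETE methods from the Tie::Hash::Regex discussion into one runnable file, inheriting the rest of the interface from Tie::StdHash. The package name My::HashRegex is hypothetical, and the CPAN module may differ. Because the hash is tied, a qr// object used as a subscript reaches these methods as a Regexp reference rather than being stringified.

```perl
use strict;
use warnings;

# Regex-matching hash, under a hypothetical package name.
package My::HashRegex;
use Tie::Hash;
our @ISA = ('Tie::StdHash');

sub FETCH {
    my ($self, $key) = @_;
    my $is_re = (ref $key eq 'Regexp');
    return $self->{$key} if !$is_re && exists $self->{$key};
    $key = qr/$key/ unless $is_re;
    /$key/ and return $self->{$_} for keys %$self;
    return;
}

sub EXISTS {
    my ($self, $key) = @_;
    my $is_re = (ref $key eq 'Regexp');
    return 1 if !$is_re && exists $self->{$key};
    $key = qr/$key/ unless $is_re;
    /$key/ && return 1 for keys %$self;
    return;
}

sub DELETE {
    my ($self, $key) = @_;
    my $is_re = (ref $key eq 'Regexp');
    return delete $self->{$key} if !$is_re && exists $self->{$key};
    $key = qr/$key/ unless $is_re;
    for (keys %$self) {
        delete $self->{$_} if /$key/;    # remove every matching pair
    }
    return;
}

package main;

tie my %colour, 'My::HashRegex';
%colour = (red => '#f00', green => '#0f0', grey => '#888');

my $r     = $colour{'^re'};                     # matches 'red'
my $has_b = exists $colour{qr/^bl/} ? 1 : 0;    # no key starts with 'bl'
delete $colour{qr/^gr/};                        # removes green and grey
print join(',', keys %colour), "\n";
```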

Conclusion: Tie::Hash::Cannabinol

As a final example, here's something that isn't quite so useful. This is a hash that forgets just about everything that you tell it. Its exists function isn't exactly to be trusted either.

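    package Tie::Hash::Cannabinol;

    use strict;
    use vars qw(@ISA $VERSION);
    use Tie::Hash;

    $VERSION = '0.01';
    @ISA = qw(Tie::StdHash);

    # Forget about a quarter of everything we're told.
    sub STORE {
      my ($self, $key, $val) = @_;

      return if rand > .75;

      $self->{$key} = $val;
    }

    # Sometimes draw a blank; otherwise return the value of a random key.
    sub FETCH {
      my ($self, $key) = @_;

      return if rand > .75;

      my @keys = keys %$self;
      return $self->{ $keys[rand @keys] };
    }

    # A coin toss.
    sub EXISTS {
      return rand > .5;
    }

    1;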

As you can see, it's simple to make some radical alterations to the behavior of Perl hashes using tie and the Tie::StdHash base class. As I said at the start of the article, this often enables you to create new ``objects'' without having to make the leap to full object orientation in your programs.

And it isn't just hashes that you can do it for. The standard Perl distribution also comes with packages called Tie::StdArray, Tie::StdHandle and Tie::StdScalar.
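Here is a small sketch that uses Tie::StdScalar as a base class. Tie::StdScalar lives in the standard Tie::Scalar module. The class My::Shouty is hypothetical. It overrides only STORE, so anything assigned to the scalar is upper-cased, and it inherits TIESCALAR and FETCH.

```perl
use strict;
use warnings;
use Tie::Scalar;

# Hypothetical tied scalar that upper-cases whatever is assigned to it.
package My::Shouty;
our @ISA = ('Tie::StdScalar');

sub STORE {
    my ($self, $value) = @_;
    $$self = uc $value;    # the object is a blessed scalar reference
}

package main;

tie my $word, 'My::Shouty';
$word = 'hello';
print "$word\n";    # HELLO
```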

Have fun with them.

This Week on p5p 2001/09/03


This Week on P5P

Testing, testing


Default random seed

local chdir()




Please send corrections and additions to where YYYYMM is the current year and month. Changes and additions to the perl5-porters biographies are particularly welcome.

Testing, testing

The focus this week has very definitely been on testing, with the great Michael Schwern providing all sorts of QA advice, tests and patches. He patched: t/op/rand.t, t/op/time.t, t/op/srand.t, t/op/local.t, t/op/concat.t, t/op/misc.t, t/run/segfault.t, pod/perlhack.pod, t/op/pack.t, lib/, lib/File/, and lib/File/Find/taint.t, in an earnest attempt to deprive himself of $500.

He also wrote a testing module, and passed on tests from Andrew Wilson (thanks, Andrew!) for CGI::Switch, CGI::Apache and CGI::Cookie.

Uhm. Wow.

Jonathan Stowe wrote a test suite for (Yes, honest) and Rafael Garcia-Suarez, who seems to have taken responsibility for the coderef-in-@INC feature, wrote some tests for that. Rafael also wins the prize for the funniest JAPH I've seen in a long time, but I'm going to make you hunt through the archives to find that. :)


Rafael also sought to make the information in %INC useful for modules loaded via the coderef-in-@INC feature. Now, for instance, you could see entries in %INC such as

     /loader/0x81095c8/ - CODE(0x81095c8)

The address in the "loader" section matches the address of the coderef.
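For readers who haven't met the coderef-in-@INC feature, here is a minimal sketch of a hook that serves a module's source from memory. The %source table and the My::Greeting module are hypothetical. The hook is called with itself and the relative file name that require is looking for. Returning a filehandle tells require to read the code from that handle. Opening a filehandle on a string needs PerlIO (Perl 5.8 or later). What ends up in %INC for such a module has varied between Perl versions, which is what this thread was about, so the example only prints it.

```perl
use strict;
use warnings;

# Hypothetical in-memory module source, keyed by the path require uses.
my %source = (
    'My/Greeting.pm' => "package My::Greeting;\nsub hello { 'hello from a hook' }\n1;\n",
);

# A code reference in @INC is called as $hook->($hook, $filename).
unshift @INC, sub {
    my (undef, $filename) = @_;
    return unless exists $source{$filename};    # let later @INC entries try
    open my $fh, '<', \$source{$filename} or return;    # in-memory file
    return $fh;
};

require My::Greeting;
print My::Greeting::hello(), "\n";
print "%INC entry: $INC{'My/Greeting.pm'}\n";
```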

Artur complained that the tests would only work on PerlIO, and could be rewritten to be more general; Rafael knew about this and tried to find a cleaner solution.

Gisle, on the other hand, was more concerned about the nature of what was going in %INC:

What is still missing is to make sure pp_require() invokes the hook again when an absolute filename starting with things like "/loader/0x81095c8/" is used. Currently this bypass the @INC search which is quite likely to make the require fail.

If you for instance try to serve up the Tk modules via a hook like this you will discover that it has a special AUTOLOAD function that construct absolute file names based on the %INC value. It will then load its .al files like this:

    require '/loader/0x81095c8/auto/Tk/Frame/';

Rafael said that AutoLoader falls back to a relative path if it has problems, and also discussed the possibility of having DynaLoader serve up binaries via a @INC hook. (Blugh.)

Nick Clark, who's one of the evil minds behind this whole thing, wanted something less plausible than /loader/whatever which could conceivably be a path if someone's really out to get us, and so Rafael counter-proposed &(0x...). Nick also expressed suitable disgust at the getting-binaries-from-a-coderef idea.

Default random seed

Michael Schwern (Yes, him again) noticed that in certain circumstances, calling srand twice with no argument can produce the same set of random numbers. He asked for more pseudo-random data that we can use to perturb the seed of srand on machines that don't have /dev/urandom. Merijn suggested times, but Jarkko said that the usual granularity for that was only a jiffy, which might not be enough. Jarkko and Mike Guy both pointed out that running srand twice was generally speaking a Don't Do That, Then error. Mike Guy's comments on srand bear repeating:

You shouldn't ever use srand() (i.e. without argument) more than once in a script. The internal state of the RNG should contain more entropy than can be provided by any seed, so calling srand() again actually *loses* randomness. And you shouldn't use srand() at all unless you need backward compatibility with *very* old Perls.

Of course, srand($x) with an explicit argument is a quite different kettle of fish. But you should only be doing that if you know what you are doing ...

Jarkko pointed out that

    @first_run  = mk_rand;

    @second_run = mk_rand;

might fail if we have really, really, really bad luck. But with 100 numbers in each array, it would have to be cosmically significantly bad luck. And Mike Guy pointed out that if we do get the same sequence back, then our rand isn't sufficiently random and this could be considered a bug.
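The distinction Mike Guy draws can be shown in a few lines. In this sketch, mk_rand is a hypothetical stand-in for the helper in the test Jarkko mentions, and it just returns 100 random numbers. Perl seeds the generator itself the first time rand is used, so two consecutive unseeded runs are different draws from the same stream. An explicit srand($x) makes a sequence repeatable.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the test's helper: 100 random numbers.
sub mk_rand { return map { rand() } 1 .. 100 }

# No srand call: Perl seeds itself on the first rand, and the second
# call carries on drawing from the same stream.
my @first_run  = mk_rand();
my @second_run = mk_rand();

# An explicit seed makes the sequence repeatable.
srand(42);
my @seeded_a = mk_rand();
srand(42);
my @seeded_b = mk_rand();

print "unseeded runs identical? ", ("@first_run" eq "@second_run" ? 'yes' : 'no'), "\n";
print "seeded runs identical?   ", ("@seeded_a"  eq "@seeded_b"   ? 'yes' : 'no'), "\n";
```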

local chdir()

Michael Schwern (Yes, him again) expressed his deep-seated longing for

    local chdir($foo);

which would change directory back once the scope was over. Kurt Starsinic said that we wouldn't always be able to go back, and so you might as well fork if that's what you want. (Amazingly, Artur didn't suggest using threads.) Jeremy Zawodny got really excited by the idea, and suggested being able to local an entire block of code and have the effects rolled back at the end of the scope. Yeah, right. However, he did suggest writing a little pushdir/popdir module, which was a little more sensible than hacking the core for fun and profit. Schwern wrote something using Abigail's DESTROY trick, and Abigail showed a nicer variant by using a tied scalar which changed directory when you assigned to it.
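Schwern's actual code isn't reproduced here, but the general DESTROY approach looks something like this sketch. An object changes directory when it is created and changes back in its DESTROY method. Perl calls DESTROY when the last reference goes away, which here is at the end of the enclosing block. The class name My::DirGuard is hypothetical. As Kurt pointed out, going back can fail (for instance, if the old directory has been removed), so DESTROY only warns in that case.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;

# Hypothetical guard: chdir on creation, chdir back on destruction.
package My::DirGuard;

sub new {
    my ($class, $dir) = @_;
    my $old = Cwd::getcwd();
    chdir $dir or die "Can't chdir to $dir: $!";
    return bless { old => $old }, $class;
}

sub DESTROY {
    my $self = shift;
    chdir $self->{old} or warn "Can't chdir back to $self->{old}: $!";
}

package main;

my $start = getcwd();
{
    my $guard = My::DirGuard->new(File::Spec->tmpdir);
    print "inside:  ", getcwd(), "\n";
}    # $guard goes out of scope here, so DESTROY runs
print "outside: ", getcwd(), "\n";
```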

Abhijit Menon-Sen got into a particularly silly mood and actually implemented local chdir($foo) (with a bit of help from yours truly, who's always in a particularly silly mood), which caused Schwern to ask for more, more, more... select, umask, chmod and so on were now on the table. Sarathy expressed some dismay at the waste of precious op_private bits, as well as the "semantic complexity" of the idea itself. Abhijit himself summed up the opinions of several of us:

I'm not convinced that we should allow localizing actions (as opposed to values), and adding destructors piecemeal for random ops would make me very uncomfortable.

That said, I wouldn't object to a module which -- with suitable hooks in the core -- allowed arbitrary leave_scope() actions to be registered. I might even write it sometime.

That'd be worth seeing.


Michael Schwern - oh no, it was Chris Nandor - had a problem with File::Spec:

In Mac OS, you can tell it to be relative by making the first argument a leading empty string. So catfile("a", "b") is absolute, while catfile("", "a", "b") is relative. In Unix/Windows, it is exactly the opposite: the default is relative, but adding a leading empty string makes it absolute!

As Chris rightly pointed out, "Yikes!" If File::Spec is supposed to make things more portable, we have a problem. Chris suggested making relative paths the default and breaking MacPerl in the interests of sanity. Barrie Slaymaker, who owns File::Spec, said that he'd be willing to accept patches if he thought the MacPerl community could cope with the breakage. Peter Prymmer said that the required change on VMS probably wouldn't break that much and that he was more interested in making the API cross-platform. Tim Jenness asked whether or not catdir ought to support volumes, such as on DOS or VMS. Phillip Newton reminded us that there is a difference between A:b\c and A:\b\c, and that he expected catdir('a:', 'b', 'c') to do the former rather than the latter. Yup, each drive has its own idea of the current working directory.

Anyway, it looks like the consensus was that Chris's suggestion will be adopted once Barrie gets Mac and VMS patches, File::Spec will be truly portable again, and all the world will rejoice.
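The Unix half of the behaviour Chris describes is easy to see. This sketch calls File::Spec::Unix directly, so the result is the same whatever platform it runs on. It doesn't show the Mac OS semantics described earlier.

```perl
use strict;
use warnings;
use File::Spec::Unix;

# On Unix, a leading empty string is what makes the path absolute.
my $relative = File::Spec::Unix->catfile('a', 'b');        # a/b
my $absolute = File::Spec::Unix->catfile('', 'a', 'b');    # /a/b

print "$relative\n$absolute\n";
print File::Spec::Unix->file_name_is_absolute($absolute) ? "absolute\n" : "relative\n";
```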


The daily build project is great! Quite a few interesting bug reports have come out of the fact that we have Perl being tested almost continuously now on different platforms and architectures with all manner of different build options. I started smoke testing my desktop this week and confirmed some bugs that other people were seeing on AIX and FreeBSD, probably saving Merijn from a lot of unpleasant AIX and pains. If you want to get involved and donate some of your processing power, subscribe to the daily-build mailing list.

Paul Marquess found a potential DB_File bug with the help of Merijn's smoke results, and Artur got to track down a bizarre segfault in the File::Taint tests. Will Coleda broke Cygwin, but Gerrit Hasse thinks that was his own fault. Nick Clark found a bug in MANIFEST of all places, but this was explained as being a bad time to resync.


Ed Peschko asked why we don't have $SIG{__EXIT__} and was told by various people to use an END block.

Paul Johnson produced the useful but perhaps dangerous coretest make target, which runs only a subset of the test suite, allowing you to dash off your latest patch without fully checking that you haven't broken anything subtle. (I know, because I did it last week.)

Nicholas Clark silently went on making more and more of Perl able to preserve IVs where possible. Nobody cared. He also found lots and lots of interesting bugs.

Artur, pumpkinging madly, found a lovely bug in HP-UX which we can probably blame on gcc: inet_ntoa always returns This is obviously not useful. He also found a bug in the tests of Time::HiRes where every so often the test fails due to rounding error between the ordinary integer version of time and the floating-point hi-res version.

Yusuf Goolamabbas asked whether something (lack of support for large files) was a Perl problem or a RedHat packaging problem. Two people took the opportunity to pass the buck. Amusingly, Chip Turner from RedHat turned up to pass it back:

I believe Large files are supported only with 2.4 kernels and certain glibcs. Which basically means for proper large file support, you will need Red Hat 7.1. Even then, some utilities don't work, Perl being one.


Daniel Lewart picked up a few miscellaneous bugs in Time::Local, and Rafael fixed up the example of an array shuffle in perlfaq4 to be less confusing to the learner. (It used the same variable name for an array and an array reference!)

Jarkko is now back, and rapidly catching up on his P5P backlog, so it's time for me to go and generate some more for him, and until next week I remain, your humble and obedient servant,

Simon Cozens

This Week on Perl 6 (26 August - 1 September 2001)


This Week in Perl 6 News


Expunge Implicit @_ Passing

Finalization and Deterministic Destruction

Multiple Dispatch on Objects

Program Metadata

!< versus >=

Parroty Bits

Last Words

You can subscribe to an email version of this summary by sending an empty message to

Please send corrections and additions to

The Perl 6 lists saw a little more traffic during this week: 137 messages across 19 threads, with 40 authors contributing.

Expunge Implicit @_ Passing

(22 posts) This topic from two weeks ago came up again, as Ken Fox mentioned its use in redirectors. Michael Schwern suggested using goto &code instead, and provided this final justification:

Why not just $method->(@_); or &{$method}(@_); or goto $method?

Any time you want to implicitly pass @_, you can just as easily *explicitly* pass it or use goto. As we're not doing pass-throughs all over the place, it's not the sort of thing you want implicit, as opposed to, say $_.

(This thread then devolved into a general debate on the usefulness of Java final classes.)
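In Perl 5 terms, the options in Schwern's reply look like the sketch below. The subroutine names are hypothetical. The point is that &name with no parentheses silently passes the caller's @_ along, while the alternatives make the pass-through visible. The goto option is written here as goto &$method. This illustrates Perl 5 behaviour only, since what Perl 6 would do was still being discussed.

```perl
use strict;
use warnings;

sub target { return join ',', @_ }

my $method = \&target;

# Implicit: &name with no parentheses hands on the caller's @_ as is.
sub implicit_redirect { &target }

# Explicit alternatives: pass @_ along by hand.
sub explicit_redirect { $method->(@_) }
sub deref_redirect    { &{$method}(@_) }

# goto replaces the current call, so target sees this sub's @_.
sub goto_redirect     { goto &$method }

print implicit_redirect(1, 2, 3), "\n";    # 1,2,3
print explicit_redirect(1, 2, 3), "\n";    # 1,2,3
print goto_redirect(1, 2, 3), "\n";        # 1,2,3
```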

Finalization and Deterministic Destruction

(11 posts) Hong Zhang, however, did branch off and talk about the differentiation between finalization and destruction. There were then quite a few posts lamenting the demise of deterministic destruction with the move away from ref counting towards a more complex garbage collection scheme.

Dan Sugalski pointed out:

GC has nothing to do with finalization. Many people want it to, and seem to conflate the two, but they're separate. Dead object detection and cleanup doesn't have to be tied to memory GC. It won't be in perl 6. The perl 6 engine will guarantee whatever cleanup/finalization order and timliness that Larry puts into the language definition. That's not a problem.

Multiple Dispatch on Objects

(11 posts) The first of two threads on multiple dispatch started here, with two examples here and here.

There was talk about whether it was an OO technique, how it should work with the dynamicness of Perl, and what the best, most efficient manner of implementing multimethod dispatch is. In the end, Perl will support some form of pluggable multimethod dispatcher, although that was about all that was agreed upon.

(10 posts) The second thread decoupled multiple dispatch from objects, creating what is essentially subroutine overloading (by signature).
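For anyone unfamiliar with the term, here is a minimal Perl 5 sketch of what multiple dispatch means. The code that runs is chosen by the classes of both arguments, not just the first. The Rock, Paper and Scissors classes and the %play table are hypothetical. This simple version only matches exact class names; a real multimethod dispatcher would also have to take inheritance into account.

```perl
use strict;
use warnings;

# Three hypothetical classes to dispatch on.
package Rock;     sub new { return bless {}, shift }
package Paper;    sub new { return bless {}, shift }
package Scissors; sub new { return bless {}, shift }

package main;

# Implementations keyed on the classes of *both* arguments.
my %play = (
    'Rock,Scissors'  => sub { 'rock blunts scissors' },
    'Scissors,Paper' => sub { 'scissors cut paper' },
    'Paper,Rock'     => sub { 'paper wraps rock' },
);

sub play {
    my ($x, $y) = @_;
    if (my $impl = $play{ ref($x) . ',' . ref($y) }) {
        return $impl->($x, $y);
    }
    if (my $impl = $play{ ref($y) . ',' . ref($x) }) {    # try swapped order
        return $impl->($y, $x);
    }
    return 'draw';
}

print play(Scissors->new, Rock->new), "\n";    # rock blunts scissors
```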

Program Metadata

(15 posts) I mentioned a few pieces of metadata that I would like access to from within a Perl 6 program. The bulk of the thread was about how to access the source of a script from within a script in Perl 5.

!< versus >=

(7 posts) Raptor suggested adding !> and !< to the logical operators as Another Way To Do It. Reactions were mixed, but no technical reason was given why it couldn't be. (It should be noted that in tri-state logic, where he saw this, !< is not the same as >=.)
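A familiar instance of the tri-state case is IEEE floating point, where NaN acts as an unknown. Every ordered comparison with NaN is false, so not-less-than and greater-than-or-equal give different answers. This sketch assumes a platform with IEEE arithmetic, which is nearly universal, and uses the 9**9**9 idiom to get infinity.

```perl
use strict;
use warnings;

# Assumes IEEE floating point: 9**9**9 overflows to infinity,
# and infinity minus infinity is NaN.
my $inf = 9**9**9;
my $nan = $inf - $inf;

my $not_less = !($nan < 1);    # true, because "NaN < 1" is false
my $ge       = ($nan >= 1);    # false, because "NaN >= 1" is false too

print "!(nan < 1): ", ($not_less ? 'true' : 'false'), "\n";    # true
print "nan >= 1:   ", ($ge       ? 'true' : 'false'), "\n";    # false
```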

Parroty Bits

Simon Cozens and Dan Sugalski are finishing up the seed code for the Parrot interpreter base.


The broad design of the Parrot internals is sufficiently complete to start implementing parts, and we are. We've the first cut core of an interpreter and, while I figure we'll probably rewrite the thing at least once before final release, it runs. You can now write code in Parrot assembler, assemble it, and run the results.

Most of the defined opcodes don't have corresponding code for them, so it's limited at the moment to integer and float operations with some control flow (branch, jump, and if) but more will be on the way soon.


I've been sitting down and writing bits of Perl 6 (I'm working primarily on the string functions at the moment, because I can do that without getting in Dan's hair too much) and also collecting our thoughts on the interpreter into documents that will specify the API and as much of the Grand Design as people need to know to be able to start helping.

The next phase of Parrot will be a code review - for the Perl internals community to poke and prod and make sense of what Dan and Simon have done. The community will provide feedback, and Dan and Simon will disappear for a brief period, before the code is opened up for development.

After going public, work will mostly progress according to Dan's To Do list.

Last Words

The last I heard, the next pair of stone tablets from Larry Wall and Damian Conway are coming down the mountain. That's my story, and I'm sticking to it.

Bryan C. Warnock