January 2002 Archives

Beginning PMCs

One of the best things about Parrot is that it's not just for Perl implementors. Parrot 0.0.3 came with support for extensible data types that can be used to implement the types used in your favorite language. The mechanism by which these types are extensible is called the PMC.

The PMC, or Parrot Magic Cookie, type is a special data container for user-defined data types. Because these user-defined types are essentially implementations of a set of methods, we refer to them as PMC classes. Currently, the legal PMC classes are the PerlInt, PerlNum, PerlString, PerlArray, and PerlHash types. The PerlInt, PerlNum and PerlString data types combine to form the PerlScalar data type.

PMC registers, unlike the basic Integer, Number, and String registers, must be specially allocated with the new P0, PMCType instruction. Other operations like set P0,5 are handled by special functions that are implemented by the PMC class. The rest of this article is about how to create your own PMC class implementation, alongside the PerlInt and PerlHash data types.

For our example, we're going to implement a simple queue data structure. Our queue will be a set of integers; the queue will grow when an integer is assigned to it, and will shrink when an integer is read from it. We'll use the PerlInt class as a basis, so it may be helpful to look at some examples of operations that use it:

  new P0, PerlInt  # Create a new PMC in the 'PerlInt' class
  set P0, 1234     # Set the value of the PMC to 1234
  set P0, "4567"   # Set the value of the PMC to 4567
  set P0, 12.34    # Set the value of the PMC to 12
  set I0, P0       # Set I0 to the current value of the PMC
  print P0
  print "\n"

Note that no special instructions like set_string or set_float were required to assign data of different types to the PMC. Each instruction does the Right Thing given the initial type of the PMC. This has several important consequences when designing new data types, the largest of which is that it generally isn't necessary to add special instructions to access data contained within a PMC.

On the other side, this means that PMCs should attempt to behave rationally in all situations. It's not an onerous requirement, but in some cases, rational behavior is hard to define. Queues are fairly simple to define though, in terms of behavior. A queue has one way to get data in and one way to get data out.

Since we can use one instruction for multiple classes, we'll use set Pn,In to add an integer to the queue, and set In,Pn to get an element out of the queue. The last operation we need to perform on a queue is to determine whether the queue is empty. The PerlArray class uses set In,Pn to return the length of the array into In, but we've already decided to use that to get an integer out of the queue.

Instead of set In,Pn to determine how many elements are in the queue, all we really need to know is whether the queue is empty or in use. For that, we can use the handy boolean operator, if Pn,In. Here, the integer register is actually the number of instructions to skip over if the condition is true. We'll have it branch if the queue is empty.

So, our IntQueue data type will implement three instructions. First, the set Pn,In instruction will add an integer to the queue. Second, when the queue is empty, if Pn,In will branch to the appropriate offset. Finally, set In,Pn will dequeue the oldest integer in the queue and place it into the appropriate integer register.

Some sample source using IntQueue may come in handy at this time:

  new P0, IntQueue   # Create the queue
  set P0, 7          # Enqueue 7
  set P0, -43        # Enqueue -43
  set I0, P0         # Dequeue 7
  print I0           # Should print '7'.
  if P0, QUEUE_EMPTY # Goto label 'QUEUE_EMPTY'

Core Operations

Before forging ahead with the IntQueue, let's take a look at the core operations file. Within your CVS tarball, open parrot/core.ops and search for the set operations. While there are files such as parrot/core_ops.c and parrot/Parrot/OpLib/core_ops.pm, this is the master file. Changes in parrot/core_ops.c will be overwritten the next time you build, so make your edits to parrot/core.ops.

Having said that, let's look at a sample PMC operation.

  inline op set(out PMC, in NUM) {
    $1->vtable->set_number(interpreter,$1,$2);
    goto NEXT();
  }

Since core.ops is preprocessed into both a Perl and a C source file, the syntax is, of necessity, a mixture of Perl and C. The 'inline' declaration is a hint to the JIT compiler, which is beyond the scope of this article. Parameters also have hints for the JIT compiler, but the most important bits here are the PMC and NUM tags, because these let the compiler know what types this operation can take.

When preprocessing into Perl, the prototype is the only piece of interest, as the assembler only needs to know the name and parameter list in order to build the assembly code.

C preprocessing is a bit more complicated, but still fairly straightforward. Tokens like $2 are replaced with the appropriate code to access the declared parameter, and a few keywords like NEXT() are replaced with code to return the next instruction in the stream.

With the exception of those tags, the rest of the code is pure C, with access to all of the Parrot internals. Of course, you shouldn't access such things as the register internals, but the rest of the C API is available, the most common APIs being located in parrot/include/parrot/string.h and parrot/include/parrot/key.h, the latter primarily being used for aggregate data structures.

The preprocessor, while slightly confusing, is much more flexible than the system of nested CPP macros that Perl currently uses, and hopefully easier to understand.

Virtual Tables

The code above used a curious construct:

  $1->vtable->set_number(interpreter,$1,$2);

Parameter $1 is a PMC, and since these are user-defined types, the code simply can't assign $2 to $1, as the non-PMC operations would do. Instead, each PMC has a table of function pointers assigned to it, and the interpreter calls the appropriate function.

For example, assuming that the P0 register is being initialized by the new P0,IntQueue instruction, the above code would run the set_number member of the IntQueue class. Since the type of P0 is decided on at runtime, the dispatch mechanism is completely independent of the parameter type. What this means in the case of the IntQueue type is that no modifications need to be made to the parrot/core.ops file.
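The dispatch described above can be sketched in plain C. This is only an illustration under stated assumptions: DemoPMC, DemoVtable, the two toy classes, and the op_* helpers are hypothetical names invented for this sketch and are not Parrot's real types or API. The point it shows is that the body of the "op" never changes; only the table of function pointers attached to the PMC decides what happens.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for a PMC and its vtable. */
typedef struct DemoPMC DemoPMC;

typedef struct DemoVtable {
    void (*set_integer_native)(DemoPMC *self, long value);
    long (*get_integer)(DemoPMC *self);
} DemoVtable;

struct DemoPMC {
    const DemoVtable *vtable;  /* chosen once, when the PMC is created */
    long int_val;
};

/* Toy class 1: assignment overwrites the stored value. */
static void scalar_set(DemoPMC *self, long value) { self->int_val = value; }
static long scalar_get(DemoPMC *self) { return self->int_val; }

/* Toy class 2: assignment adds to the stored value. */
static void accum_set(DemoPMC *self, long value) { self->int_val += value; }

static const DemoVtable scalar_vtable = { scalar_set, scalar_get };
static const DemoVtable accum_vtable  = { accum_set,  scalar_get };

/* Create a PMC of the class described by the given vtable. */
static DemoPMC demo_new(const DemoVtable *vtable) {
    DemoPMC pmc;
    pmc.vtable = vtable;
    pmc.int_val = 0;
    return pmc;
}

/* The "op" bodies: identical for every class, they only look up the
 * function pointer at run time and call it. */
static void op_set_pmc_int(DemoPMC *pmc, long value) {
    assert(pmc != NULL && pmc->vtable != NULL);
    pmc->vtable->set_integer_native(pmc, value);
}

static long op_set_int_pmc(DemoPMC *pmc) {
    assert(pmc != NULL && pmc->vtable != NULL);
    return pmc->vtable->get_integer(pmc);
}
```

Calling op_set_pmc_int twice with 5 and then 7 leaves 7 in a PMC built with scalar_vtable, but 12 in one built with accum_vtable, even though the calling code is the same in both cases.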

Parrot Class Files

parrot/classes contains all of the PMC classes used by Parrot. Like the parrot/core.ops file, this too is preprocessed before final compilation, so all edits should be made to the parrot/classes/*.pmc files.

Creating a new class file from scratch is somewhat daunting, so we'll base IntQueue on an existing class file. While IntQueue is an aggregate type like PerlHash, its interface most closely matches PerlInt's, in that it only deals with one element at a time.

Start by copying parrot/classes/PerlInt.pmc to parrot/classes/IntQueue.pmc, and replace all instances of PerlInt with IntQueue. There will be some additional C code necessary that will be available in the sample source at the end of the article, but not discussed beyond the API.

Registering parrot/classes/IntQueue.pmc is done in two files. Add the appropriate lines to parrot/global_setup.c to initialize the new PMC type, and add the new vtable entry to parrot/include/parrot/pmc.h. This is only done in the case of types that are intended to be part of Parrot itself; when Parrot has the ability to dynamically load PMC classes at runtime, a more flexible mechanism will be devised for registering classes, but for now, we'll pretend that IntQueue is going to be a core interpreter data type.

Within parrot/core.ops, the instructions the IntQueue type uses look like this:

  op new(out PMC, in INT) {
    PMC* newpmc;
    if ($2 < 0 || $2 >= enum_class_max) {
      abort(); /* Deserve to lose */
    }
    newpmc = pmc_new(interpreter, $2);
    $1 = newpmc;
    goto NEXT();
  }
  
  inline op set(out PMC, in INT) {
    $1->vtable->set_integer_native(interpreter, $1, $2);
    goto NEXT();
  }
  
  inline op set(out INT, in PMC) {
    $1 = $2->vtable->get_integer(interpreter, $2);
    goto NEXT();
  }
  
  op if(in PMC, in INT) {
    if ($1->vtable->get_bool(interpreter, $1)) {
      goto OFFSET($2);
    }
    goto NEXT();
  }

Naturally, each of these calls a PMC vtable entry, and each of those entries has to be implemented. As of this writing, the appropriate vtable entries as they are in parrot/classes/perlint.pmc look like this:

    void init () { /* This is called from pmc_new() */
        SELF->cache.int_val = 0;
    }
    void set_integer_native (INTVAL value) {
        SELF->cache.int_val = value;
    }
    INTVAL get_integer () {
        return SELF->cache.int_val;
    }
    BOOLVAL get_bool () {
        return pmc->cache.int_val != 0;
    }

Any code before the pmclass declaration in a parrot/classes/*.pmc file is literally copied into the C source, so we'll use this area to store our data structures and APIs. In order to make matters simple, we'll assume that the following API is available for our use:

  static CONTAINER* new_container ( void );
  static void enqueue ( CONTAINER* container, INTVAL value );
  static INTVAL dequeue ( CONTAINER* container );
  static INTVAL queue_length ( CONTAINER* container );

The API should be fairly straightforward to use. Initializing the container is done with new_container, which returns a pointer to our new queue data type. Adding a new queue element is done with enqueue, and deleting an element is done with dequeue. The queue's length can be found with queue_length.
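For readers who want something concrete behind those four prototypes, here is a minimal sketch built on a singly linked list. It is an assumption-laden illustration, not the code that was checked in to Parrot: the NODE type, the layout of CONTAINER, and the typedef of INTVAL as long are all invented here so that the block compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>

typedef long INTVAL;  /* assumption: stand-in for Parrot's INTVAL */

/* Hypothetical list node; not part of the article's API. */
typedef struct NODE {
    INTVAL value;
    struct NODE *next;
} NODE;

/* Hypothetical layout for the CONTAINER type named in the API. */
typedef struct CONTAINER {
    NODE *head;     /* oldest element, the next one to be dequeued */
    NODE *tail;     /* newest element */
    INTVAL length;  /* number of elements currently queued */
} CONTAINER;

static CONTAINER *new_container(void) {
    CONTAINER *container = malloc(sizeof *container);
    assert(container != NULL);  /* sketch only: no real error handling */
    container->head = NULL;
    container->tail = NULL;
    container->length = 0;
    return container;
}

static void enqueue(CONTAINER *container, INTVAL value) {
    NODE *node = malloc(sizeof *node);
    assert(node != NULL);
    node->value = value;
    node->next = NULL;
    if (container->tail != NULL) {
        container->tail->next = node;  /* append behind the newest element */
    } else {
        container->head = node;        /* first element in an empty queue */
    }
    container->tail = node;
    container->length++;
}

static INTVAL dequeue(CONTAINER *container) {
    NODE *node = container->head;
    INTVAL value;
    assert(node != NULL);  /* dequeuing from an empty queue is a caller bug */
    value = node->value;
    container->head = node->next;
    if (container->head == NULL) {
        container->tail = NULL;        /* queue became empty */
    }
    free(node);
    container->length--;
    return value;
}

static INTVAL queue_length(CONTAINER *container) {
    return container->length;
}
```

A linked list keeps enqueue and dequeue at constant cost without a fixed capacity; a ring buffer would work equally well if an upper bound on the queue size were acceptable.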

The CONTAINER data type has to be stored somewhere, and we look into parrot/include/parrot/pmc.h to find out where to store it. We find the definition of the PMC structure to be:

    struct PMC {
      VTABLE *vtable;
      INTVAL flags;
      DPOINTER *data;
      union {
        INTVAL int_val;
        FLOATVAL num_val;
        DPOINTER *struct_val;
      } cache;
      SYNC *synchronize;
    };

There are two areas where we can store data: data is used as a general dumping ground for a data type's internal data structures, and the cache union is used for fast access to simpler data structures. data is the right place to hang our CONTAINER structure.
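The difference between the two storage areas may be easier to see in a small stand-alone sketch. Everything here is hypothetical: MockPMC is a cut-down copy of the structure above with the vtable, flags, and synchronize members left out, DPOINTER is replaced by void, and the Stats type and the scalar_* and stats_* helpers are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef long INTVAL;      /* assumption: stand-in for Parrot's INTVAL */
typedef double FLOATVAL;  /* assumption: stand-in for Parrot's FLOATVAL */

/* Hypothetical, cut-down version of the PMC structure. */
typedef struct MockPMC {
    void *data;           /* general-purpose pointer to class-private state */
    union {
        INTVAL int_val;
        FLOATVAL num_val;
        void *struct_val;
    } cache;              /* fast storage for one simple value */
} MockPMC;

/* Scalar-style class: the whole value fits in the cache union. */
static void scalar_init(MockPMC *pmc) {
    pmc->data = NULL;
    pmc->cache.int_val = 0;
}

static void scalar_set(MockPMC *pmc, INTVAL value) {
    pmc->cache.int_val = value;
}

static INTVAL scalar_get(const MockPMC *pmc) {
    return pmc->cache.int_val;
}

/* Aggregate-style class: larger state lives on the heap, reached via data. */
typedef struct Stats {
    INTVAL count;
    INTVAL total;
} Stats;

static void stats_init(MockPMC *pmc) {
    Stats *stats = malloc(sizeof *stats);
    assert(stats != NULL);  /* sketch only: no real error handling */
    stats->count = 0;
    stats->total = 0;
    pmc->data = stats;
    pmc->cache.int_val = 0;
}

static void stats_add(MockPMC *pmc, INTVAL value) {
    Stats *stats = pmc->data;
    stats->count++;
    stats->total += value;
}

static INTVAL stats_total(const MockPMC *pmc) {
    const Stats *stats = pmc->data;
    return stats->total;
}

static void stats_destroy(MockPMC *pmc) {
    free(pmc->data);
    pmc->data = NULL;
}
```

The scalar-style class never allocates anything, while the aggregate-style class owns a heap object through data, which is the same arrangement IntQueue uses for its CONTAINER.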

Like most of the other files within Parrot, the IntQueue class is also preprocessed. The major preprocessing done here is to replace the SELF tag with a reference to the current PMC. In the rare case that you need a reference to the current interpreter, that tag is INTERP.

Initializing the IntQueue class is done with the init member. Since we're storing our queue in the data, we'll let the new_container function hand us a pointer to our new queue, and save that.

    void init () {
        SELF->data = new_container();
    }

Getting an integer out of the queue is done with the get_integer member. This isn't meant to be production-quality, so we won't worry about error checking; we'll simply return the integer from the container.

    INTVAL get_integer () {
        return dequeue((CONTAINER*)SELF->data);
    }

Adding an integer to the queue is done with the set_integer_native member. We'll simply use the enqueue function to place the integer onto the queue like so:

    void set_integer_native (INTVAL value) {
        enqueue(SELF->data,value);
    }

The final function we need to support is being able to determine whether the queue is empty, and we use the queue_length function for that. The PMC member function that does this is get_bool, and the code to access this is pretty straightforward:

    BOOLVAL get_bool () {
        return queue_length(SELF->data) != 0;
    }

This code has been checked in to the Parrot CVS, so feel free to look at the full version there. We've now walked through the major files needed to implement a Parrot Magic Cookie. Next time, we'll explore the functions needed to implement aggregate data types like hashes and arrays, and learn about the new garbage collection system.

In the meantime, if you want to play with implementing your own data types for Parrot, then take a look at docs/vtables.pod in the Parrot source tree for more information about the members that you can implement and how to design your own classes from scratch.

This Week on Perl 6 (13 - 19 Jan 2002)

This summary, as with past summaries, can be found here. (Note that this is an @Home address, and will change sometime in the next two months.) Please send additions, submissions, corrections, kudos, and complaints to bwarnock@capita.com.

Perl 6 is the major redesign and rewrite of the Perl language. Parrot is the virtual machine that Perl 6 (and other languages) will be written for. For more information on the Perl 6 and Parrot development efforts, visit dev.perl.org and parrotcode.org.

There were 166 messages across 69 threads, with 38 authors contributing. Again, most of the messages were patches.

Apocalypse 4

Larry Wall released Apocalypse 4, covering blocks (and scopes and statements). It's quite weighty, particularly up front, but here's a quick gloss on what's covered:

  • the new given / when switch block
  • exceptions
  • scope changes
  • no more required parentheses for expressions in block constructs
  • a lot more flow-control blocks
  • multiple iterators in looping blocks

There are quite a few other tidbits inside. The Apocalypse was released late Thursday night, so there has been little feedback so far, and no Exegesis from Damian Conway yet. I'll pick up community reaction next week.

Parrot Strings

Jarkko Hietaniemi, the Perl 5.8.0 pumpking, posted a developing PDD on string handling in Parrot, largely based on his experience providing Unicode support for Perl 5. The main tenets of the proposal are:

  • separate binary data and its API from textual data and its API - at both the language and internals level
  • convert all text - string constants, source code, input data - to the internal representation: UTF-16, or a UCS-2 (non-surrogate) and UTF-16 (surrogate) hybrid
  • handle localization as a separate layer

Most of the ensuing discussion centered around regular expression character classes, and how best to implement them. Brent Dax is currently using UTF-32 within the regex engine, with a hybrid bitmap and binary lookup scheme for character classes, similar to what Perl 5 does. Jarkko suggested using an inversion list.
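For readers unfamiliar with the term, an inversion list stores a set of code points as a sorted array of boundaries at which membership flips, starting from "not in the set". The sketch below shows the general technique only; the function name, the array, and the choice of unsigned long for code points are assumptions made for illustration, and nothing here is taken from the proposal or from the regex engine.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if cp is in the set described by the inversion list, else 0.
 * bounds must be sorted in ascending order; code points below bounds[0]
 * are outside the set, and membership flips at every boundary. */
static int invlist_contains(const unsigned long *bounds, size_t n,
                            unsigned long cp) {
    size_t lo = 0;
    size_t hi = n;
    /* Binary search for the number of boundaries that are <= cp. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (bounds[mid] <= cp) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    /* An odd number of flips means we are inside one of the ranges. */
    return (int)(lo & 1u);
}

/* The class [0-9A-Za-z] as half-open ranges:
 * [0x30,0x3A), [0x41,0x5B), [0x61,0x7B). */
static const unsigned long alnum_bounds[] = {
    0x30, 0x3A, 0x41, 0x5B, 0x61, 0x7B
};
static const size_t alnum_bounds_len =
    sizeof alnum_bounds / sizeof alnum_bounds[0];
```

The attraction for large character sets is that storage grows with the number of ranges rather than with the size of the code space, and a lookup costs one binary search.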

This discussion is ongoing, so there'll be more to report on this next week, too.

The Parrot Spotlight

Most folks already have an inkling of what Larry Wall, Damian Conway, Dan Sugalski, and Simon Cozens have been doing for Perl 6 and Parrot, so we're going to pad space with some brief introductions to some of the other Parrot Troopers getting things done.

Daniel Grunblatt is a 21-year-old university student in Argentina. He's been working in Perl for several years, but Parrot is his first time working on Perl internals. He's the creator of Parrot's JIT compiler, and also plays basketball and role-playing games.

Parroty Bits

The Perl Development Grant Fund exceeded $80,000, thanks to sizable contributions from DynDNS, pair Networks, and SAGE.


Bryan C. Warnock

Finding CGI Scripts

Introduction

No matter how much we try to convince people that Perl is a multi-purpose programming language, we'd be deluding ourselves if we didn't admit that the majority of programmers first come into contact with Perl through their experience with CGI programs. People have a small Web site and one day they decide that they need a guest book, a form mail script or a hit counter. Because these people aren't programmers, they go out onto the Web to see what pre-written scripts they can find.

And there are plenty to choose from. Try searching on "CGI scripts" at Google. I received about 2 million hits. The first two were those well-known sites - Matt's Script Archive and the CGI Resource Index. Our Web site owner will visit one of these sites, find the required scripts and install them on his site. What could be simpler? See, the Web is as easy as people make it out to be.

In this article, I'll take a closer look at this scenario and show that all is not as rosy as I've portrayed it above.

CGI Script Quality

An important factor that Google takes into account when displaying search results is the number of links to a given site. Google assumes that if there are a large number of links to a given Web page, then it must be a well-known page and that Google's visitors will want to visit that site first.

Notice that I said "well-known" in that previous paragraph. Not "useful" or "valuable." Think about this for a second. The types of people that I described in the introduction are not programmers. They certainly aren't Perl programmers. Therefore, they are in no position to make value judgments on the Perl code that they download from the Internet.

This means that the "most popular" site becomes a self-fulfilling prophecy. The best known site is listed first on the search engines. More people download scripts from that site, assuming that the most popular site must have the highest-quality scripts, and so the popular sites end up becoming even more popular.

At no point does any kind of quality control enter into the process.

OK, so that's not strictly true. If the scripts from a particular site just didn't work at all, then word would soon get out and that site's scripts would become unpopular. But what if the problems were more subtle and didn't manifest themselves on all sites? Here is a list of some potential problems:

  • Not checking the results of an open call. This will work fine if the expected file exists and has the right permissions. But what happens when the file doesn't exist? Or it exists but the CGI process doesn't have permissions to read from it or write to it?
  • Bad CGI parameter parsing code. CGI parameter parsing is one of those things that is easy to do badly and hard to do well. It's simple enough to write a parser function that handles most cases, but does it handle both GET and POST requests? What about keys with multiple associated values? And does it process file uploads correctly?
  • Lack of security. Installing a CGI program allows anyone with an Internet connection to run a program on your server. That's quite a scary thing to allow. You'd better be well aware of the security implications. Of course, if people only ever run the script from your HTML form, then everything will probably be fine, but a cracker won't do that. He'll fire "interesting" sets of parameters at your script in an attempt to find its weaknesses. Suddenly a form mail script is being used to send copies of vital system files to the cracker.

    It's also worth bearing in mind that because these scripts are available on the Web, crackers can easily get the source code. They can then work out any insecurities in the scripts and exploit them. Recently, a friend's Web site came under attack from crackers and amongst the traces left in the access log were a large number of calls to well-known CGI scripts.

    For this reason, it is even more important that you are careful about security when writing CGI scripts that are intended to be used by novice Webmasters.

The fact is, unfortunately, that these kinds of problems are commonplace in the scripts that you can download from many popular CGI script archives. That's not to say that the authors of these scripts are deliberately trying to give crackers access to your servers. It's simply evidence that Perl has moved on a great deal since the introduction of Perl 5 in 1994 and many of the CGI script authors haven't kept their scripts up to date with current practices. In other cases, the authors know only too well how out of date their scripts are and have produced newer, improved versions, but other people are still distributing the older versions.

Setting a Good Example

Although the people who are downloading these scripts aren't usually programmers, there often comes a time when they want to start changing the way a program works and perhaps even writing their own CGI programs. When this time comes, they will go to the scripts they already have for examples of how to write them. If the original script contained bad programming practices, then these will be copied in the new scripts. This is the way that many bad programming practices have become so common among Perl scripts. I, therefore, think that it's a good idea for any publicly distributed programs to follow best programming practices as much as possible.

Script Quality - A Checklist

So now we have an obvious problem. I said before that the people who are downloading and installing these scripts aren't qualified to make judgments on the quality of the code. Given that there are some problematic scripts out there, how are they supposed to know whether they should be using a particular script that they find on the Web?

It's a difficult question to answer, but there are some clues that you can look for that give an idea of how well-written a script is. Here's a brief checklist:

  • Does the script use -w and use strict? The vast majority of Perl experts recommend using these tools when writing Perl programs of any level of complexity. They make any Perl program more robust. Anyone distributing Perl programs without them probably doesn't know as much Perl as they think they do.
  • Does the script use Perl's taint mode? Accepting external data from a Web browser is a dangerous business. You can never be sure what you'll get. If you add -T to a program's shebang line, then Perl goes into taint mode. In this mode Perl distrusts any data that it gets from external sources. You need to explicitly check this data before using it. Using -T is a sign that the author is at least thinking about CGI security issues.
  • Does the script use CGI.pm? Since Perl 5.004, CGI.pm has been a part of the standard Perl distribution. This module contains a number of functions for handling various parts of the CGI protocol. The most important one is probably param, which deals with the parsing of the query string to extract the CGI parameters. Many CGI scripts write their own CGI parameter parsing routine that is missing features or has bugs. The one in CGI.pm has been well-tested over many years in thousands of scripts - why attempt to reinvent it?
  • How often is the script updated? One reason for a script not to use CGI.pm might be that it hasn't been updated since the module was added to the Perl distribution. This is generally a bad sign. You should look for scripts that are kept up to date. If there hasn't been a new version of the script for several years, then you should probably avoid it.
  • How good is the support? Any program is of limited use if it's unsupported. How do you get support for the program? Is there an e-mail address for the author? Or is there a support mailing list? Try dropping an e-mail to either the author or the mailing list and see how quickly you get a response.

Of course, these rules will have exceptions, but if a script scores badly on most of them, then you might have second thoughts on whether you should be using the script.

nms - A New CGI Program Archive

Having spent most of this article being quite negative about existing CGI program archives, let's now get a bit more positive. In the summer of 2001, a group of London Perl Mongers started to wonder what would be involved in writing a set of new CGI programs that could act as replacements for the ones in common use. After some discussion, the nms project was born. The name nms originally stood for a disparaging remark about one of the existing archives, but we decided that we didn't want that kind of negativity in the name. By that time, however, the abbreviated name was in common usage so we decided to keep it - but it no longer stands for anything.

The objectives for nms were quite simple. We wanted to provide a set of CGI programs which fulfilled the following:

  • As easy (or easier) to use as existing CGI scripts.
  • Use best programming practices
  • Secure
  • Bug-free (or, at least, well supported)

We decided that we would base our programs on the ones found in Matt's Script Archive. This wasn't because Matt Wright's scripts were the worst out there, but simply that they were the most commonly used. We made a rule that our scripts would be drop-in replacements for Matt's scripts. That meant that anyone who had existing data from using one of Matt's scripts would be able to take our replacement and simply put it in place of the old script. This, of course, meant that we had to become familiar with the inner workings of Matt's scripts. This actually turned out not to be as hard as I expected. The majority of Matt's scripts are simple. It's only really formmail, guestbook and wwwboard that are complex.

Sometimes our objectives contradicted one another. We decided early on that part of making the scripts as easy to use as possible meant not relying on any CPAN modules. We forced ourselves to use only modules that came as part of the standard Perl distribution. The reason for this is that our target audience probably doesn't know anything about CPAN modules and wouldn't find it easy to install them. A large part of our audience is probably operating a Web site on a hosted server where they may not be able to install new modules and in many cases won't have telnet access to their server. We felt that asking them to install extra modules would make them far less likely to use our programs. This, of course, goes against our objective of using best programming practices, as in many cases there is a CPAN module that implements functionality that we use. The best example of this is in formmail, where we resort to sending e-mails by talking directly to sendmail rather than using one of the e-mail modules. In these cases, we decided that getting people to use the scripts (by not relying on CPAN) was more important to us than following best practices.

nms is a SourceForge project. You can get the latest released versions of the scripts from http://nms-cgi.sourceforge.net or, if you're feeling braver, then you can get the leading edge versions from CVS at the project page at http://sourceforge.net/projects/nms-cgi/. Both of those pages also have links to the nms mailing lists. We have two lists, one for developers and one for support questions. There is also a FAQ that will hopefully answer any further questions that you have about the project.

Here is a list of the scripts available from nms:

  • Countdown: Count down the time to a certain date
  • Free For All Links: A simple Web link database
  • Formmail: Send e-mails from Web forms
  • Guestbook: A simple guest book script
  • Random Image: Display a random image
  • Random Links: Display a link chosen randomly from a list
  • Random Text: Display a randomly chosen piece of text
  • Simple Search: Simple Web site search engine
  • SSI Random Image: Display a random image using SSI
  • Text Clock: Display the time
  • Text Counter: Text counter

I should point out that this is very much a "work in progress." While we're happy with the way that the scripts work, we can always use more people looking at the code. The one advantage that Matt's scripts have over ours is that they've had many years of testing on a large number of Web sites.

A Plea for Help

So now we have a source of well-written CGI programs that we can point users to. What more needs to be done? Well, the whole point of writing this article was to ask more people to help. There's always more work to do :-)

  • Peer review. We think we've done a pretty good job on the scripts, but we're not interested in resting on our laurels. The more people that look at the scripts the more likely we'll catch bugs and insecurities. Please download the scripts and take a look at them. Pass any bugs on to the developers mailing list.
  • Testing. We test the scripts on as many platforms with as many different configurations as we can, but we'll always miss one or two. Please try to install the scripts on your systems and let us know about any problems you have.
  • Documentation. Our documentation isn't any worse than the documentation for the existing archives, but we think it could be much better. If you'd like to help out with this, then please get in touch with us.
  • Advocacy. This is the most important one. Please tell everyone that you know about nms. Everywhere that you see people using other CGI scripts, please explain to them the potential problems and show them where to get the nms scripts. Having written these scripts, we feel it's important that they get as wide exposure as possible. If you have any ideas for promoting nms, then please let us know.

While I don't pretend for a minute that these are the only well-written and secure CGI programs available, I do think that the Perl community needs a well-known and trusted set of CGI programs that we can point people to. With your help, that's what I want nms to become.

This Week on Perl 6 (6 - 12 Jan 2002)

Notes

You can subscribe to an email version of this summary by sending an empty message to perl6-digest-subscribe@netthink.co.uk.

This summary, as with past summaries, can be found here. (Note that this is an @Home address, and will change sometime in the next two months.) Please send additions, submissions, corrections, kudos, and complaints to bwarnock@capita.com.

For more information on the Perl 6 and Parrot development efforts, visit dev.perl.org and parrotcode.org.

There were 224 messages across 114 threads, with 39 authors contributing. Most of the messages were patches.

Regexes

Parrot now has primitive regex support, courtesy of Brent Dax. It includes a set of regex op primitives, including hooks for a generic regex compiler. Currently, however, there's some debate over where match should be implemented.

Naming Conventions

One of the larger threads discussed getting a handle on the current naming convention. Or, more accurately, the lack of one. No consensus has been reached.

Warnings

The Parrot community received a well-deserved slap on the wrist from Parrot hacker Nicholas Clark. A common complaint on checking out our code is the incessant stream of warnings when compiling. You're not alone.

Parroty Bits

Dan Sugalski posted some more random thoughts on Parrot's implementation direction. He also started on the new memory allocator, and modified life.pasm to be a new benchmark.

David M. Lloyd fixed some problems that exist when the size of an opcode is smaller than the configured size of an integer value. (A perfectly legal scenario in Parrot.)

Simon Glover also did significant hacking on arrays and hashes.

The Perl Development Grant Fund is holding steady at 26% (and change).


Bryan C. Warnock

Apocalypse 4

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 04 for the latest information.

Table of Contents
  • Accepted RFCs
  • Rejected RFCs
  • Withdrawn RFCs
  • Other decisions

    This Apocalypse is all about syntax in the large. The corresponding chapter in the Camel book is entitled "Statements and Declarations", but it could just as easily have been entitled, "All About Blocks". The basic underlying question is "What exactly do those curlies mean?"

    For Perl 5 and earlier, the answer to that question was, "Too many things". Or rather, too many things with inconsistent rules. We'll continue to use curlies for much of what we've used them for up till now, but by making a few critical simplifications, the rules will be much more consistent. In particular, built-ins will parse with the same rules as user-defined constructs. It should be possible to make user-extensible syntax look just like built-in syntax. Perl 5 started down this road, but didn't get all the way there. In Perl 6, all blocks operate under the same rules. Effectively, every block is a kind of closure that can be run by user-defined constructs as well as built-ins.

    Associated with block structure are the various constructs that make use of block structure. Compound constructs like loops and conditionals use blocks explicitly, whereas declarations refer to their enclosing block implicitly. This latter feature was also inconsistently applied in Perl 5. In Perl 6, the rule is simple: A lexically scoped declaration is in effect from the declaration to the end of its enclosing block. Since blocks are delimited only by curlies or by the ends of the current compilation unit (file or string), that implies that we can't allow multi-block constructs in which lexically scoped variables "leak" or "tunnel" from the end of one block to the beginning of the next. A right curly (without an intervening left curly) absolutely stops the current lexical scope. This has direct bearing on some of these RFCs. For instance, RFC 88 proposes to let lexical scope leak from a try block into its corresponding finally block. This will not be allowed. (We'll find a different way to solve that particular issue.)

    While lexical declarations may not leak out of a block, control flow must be able to leak out of blocks in a controlled fashion. Obviously, falling off the end of a block is the most "normal" way, but we need to exit blocks in other "abnormal" ways as well. Perl 5 has several different ways of exiting a block: return, next, last, redo, and die, for instance. The problem is that these various keywords are hard-wired to transfer control outward to a particular built-in construct, such as a subroutine definition, a loop, or an eval. That works against our unifying concept that every block is a closure. In Perl 6, all these abnormal means of block exit are unified under the concept of exceptions. A return is a funny kind of exception that is trapped by a sub block. A next is an exception that is trapped by a loop block. And of course die creates a "normal" exception that is trapped by any block that chooses to trap such exceptions. Perl 6 does not require that this block be an eval or try block.

    You may think that this generalization implies excessive overhead, since generally exception handling must work its way up the call stack looking for an appropriate handler. But any control flow exception can be optimized away to a "goto" internally when its target is obvious and there are no user-defined blocks to be exited in between. Most subroutine return and loop control operators will know which subroutine or loop they're exiting from because it'll be obvious from the surrounding lexical scope. However, if the current subroutine contains closures that are being interpreted elsewhere in user-defined functions, it's good to have the general exception mechanism so that all needed cleanup can be automatically accomplished and consistent semantics maintained. That is, we want user-defined closure handlers to stay out of the user's face in the same way that built-ins do. Control flow should pretend to work like the user expects, even when it doesn't.
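    The unification described above can be modeled in a few lines. This is a minimal Python sketch, not Perl 6: the class names Return, Next and Last and the helpers call_sub and run_loop are hypothetical, chosen only to show a "sub block" trapping one kind of control exception while a "loop block" traps another.

```python
class ControlException(Exception):
    """Base class for the 'abnormal' block exits modeled here."""


class Return(ControlException):
    def __init__(self, value):
        super().__init__(value)
        self.value = value


class Next(ControlException):
    pass


class Last(ControlException):
    pass


def call_sub(body, *args):
    """A 'sub block': traps Return and hands back its value."""
    try:
        body(*args)
    except Return as ret:
        return ret.value
    return None


def run_loop(items, body):
    """A 'loop block': traps Next and Last raised anywhere inside body."""
    for item in items:
        try:
            body(item)
        except Next:
            continue
        except Last:
            break


def first_even(numbers):
    """Return the first even number, exiting the loop via a Return exception."""
    def body():
        def visit(n):
            if n % 2:
                raise Next()   # like `next`: trapped by the loop block
            raise Return(n)    # like `return`: passes through the loop untouched
        run_loop(numbers, visit)
        raise Return(None)
    return call_sub(body)
```

    Here run_loop never sees the Return, because it only traps the exceptions it is responsible for. A real implementation would replace most of these raises with direct jumps when the target is lexically obvious.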

    Here are the RFCs covered in this Apocalypse. PSA stands for "problem, solution, acceptance", my private rating of how this RFC will fit into Perl 6. Interestingly, this time I've rejected more RFCs than I accepted. I must be getting cruel and callous in my old age. :-)

        RFC   PSA    Title
        ---   ---    -----
        006   acc    Lexical variables made default
        019   baa    Rename the C<local> operator
        022   abc    Control flow: Builtin switch statement
        063   rr     Exception handling syntax
        064   bdc    New pragma 'scope' to change Perl's default scoping
        083   aab    Make constants look like variables
        088   bbc    Omnibus Structured Exception/Error Handling Mechanism
        089   cdr    Controllable Data Typing
        106   dbr    Yet another lexical variable proposal: lexical variables made default
        113   rr     Better constants and constant folding
        119   bcr    Object neutral error handling via exceptions
        120   bcr    Implicit counter in for statements, possibly $#   
        167   bcr    Simplify do BLOCK Syntax
        173   bcc    Allow multiple loop variables in foreach statements
        199   abb    Short-circuiting built-in functions and user-defined subroutines
        209   cdr    Fuller integer support in Perl   
        262   cdr    Index Attribute
        279   cdr    my() syntax extensions and attribute declarations
        297   dcr    Attributes for compiler hints
        309   adr    Allow keywords in sub prototypes
        330   acc    Global dynamic variables should remain the default
        337   bcc    Common attribute system to allow user-defined, extensible attributes
        340   dcr    with takes a context
        342   bcr    Pascal-like "with"

    Accepted RFCs

    Note that, although these RFCs are in the "accepted" category, most are accepted with major caveats (a "c" acceptance rating), or at least some "buts" (a "b" rating). I'll try to list all those caveats here, but where there are systematic changes, I may indicate these generally in this document without attempting to rewrite the RFC in every detail. Those who implement these features must be sensitive to these systematic changes and not just uncritically implement everything the RFC says.

    I'd like to talk about exceptions first, but before that I have to deal with the switch statement, because I think it's silly not to unify exception handlers with switch statements.

    RFC 022: Control flow: Builtin switch statement

    Some OO purists say that any time you want to use a switch statement, you ought to make the discriminant of the switch statement into a type, and use method dispatch instead. Fortunately, we are not OO purists here, so forget that argument.

    Another argument against having a switch statement in Perl 6 is that we never had it in the first five versions of Perl. But it would be incorrect to say that we didn't miss it. What actually happened was that every time we started discussing how to add a switch statement, it wasn't obvious how far to go. A switch statement in Perl ought to do more than a switch statement in C (or in most any other language, for that matter). So the fact that we haven't added a switch statement so far says more about how hard it is to design a good one than about how much we wanted a lousy one. Eventually the ever inventive Damian Conway came up with his famous design, with a Perl 5 module as proof of concept, and pretty much everyone agreed that he was on the right track, for some definition of "right" (and "track"). This RFC is essentially that design (not surprisingly, since Damian wrote it), so it will be accepted, albeit with several tweaks.

    In the first place, as a quasi-linguist, I loathe the keywords switch and case. I would prefer keywords that read better in English. Much as I love verbing nouns, they don't work as well as real verbs or real prepositions when topicalizers are called for. After thrashing over several options with Damian and other folks, we've settled on using given instead of switch, and when instead of case:

        given EXPR {
            when EXPR { ... }
            when EXPR { ... }
            ...
        }

    The other great advantage of using different words is that people won't expect it to work exactly like any other switch statement they may be familiar with.

    That being said, I should point out that it is still called "the switch statement", and the individual components are still "cases". But you don't have to put "switch" or "case" into constant-width font, because they're not keywords.

    Because curlies are so extremely overloaded in Perl 5, I was at first convinced that we would need a separator of some sort between the expression and the block, maybe a : or => or some such. Otherwise it would be too ambiguous to come upon a left curly when expecting an operator--it would be interpreted as a hash subscript instead. Damian's RFC proposes to require parentheses in certain situations to disambiguate the expression.

    But I've come to the conclusion that I'd rather screw around (a little) with the "insignificant whitespace" rule than to require an extra unnatural delimiter. If we observe current practice, we note that 99% of the time, when people write a hash subscript they do so without any whitespace before it. And 99% of the time, when they write a block, they do put some whitespace in front of it. So we'll just dwim it using the whitespace. (No, we're not going all the way to whole-hog whitespace dwimmery--Python will remain the best/worst example of that approach.)

    Subscripts are the only valid use of curlies when an operator is expected. (That is, subscripts are essentially postfix operators.) In contrast, hash composers and blocks are terms, not operators. Therefore, we will make the rule that a left curly that has whitespace in front of it will never be interpreted as a subscript in Perl 6. (If you think this is a totally bizarre thing to do, consider that this new approach is actually consistent with how Perl 5 already parses variables within interpolated strings.) If there is any space before the curly, we force it to start a term, not an operator, which means that the curlies in question must delimit either a hash composer or a block. And it's a hash composer only if it contains a => pair constructor at the top level (or an explicit hash keyword on the front). Therefore it's possible to unambiguously terminate an expression by following it with a block, as in the constructs above.
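    The decision procedure for a left curly met where an operator is expected can be sketched as follows. This is an illustrative Python model with hypothetical function names. It assumes the parser already knows whether whitespace preceded the curly, and it scans the curly's contents naively, without understanding string literals or comments.

```python
def has_top_level_pair(body):
    """True if `=>` appears outside any nested brackets in body."""
    depth = 0
    for i, ch in enumerate(body):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif ch == "=" and depth == 0 and body[i:i + 2] == "=>":
            # Skip the `=>` that is merely the tail of a `<=>` comparison.
            if i == 0 or body[i - 1] != "<":
                return True
    return False


def classify_left_curly(space_before, body, hash_keyword=False):
    """Classify a `{` found where an operator is expected.

    space_before: whether whitespace preceded the curly
    body:         the text between the curlies
    hash_keyword: whether an explicit `hash` keyword came first
    """
    if not space_before:
        return "subscript"       # postfix operator on the preceding term
    if hash_keyword or has_top_level_pair(body):
        return "hash composer"   # a term
    return "block"               # also a term
```

    For example, classify_left_curly(False, "key") gives "subscript", while classify_left_curly(True, "a => 1") gives "hash composer" and classify_left_curly(True, "print 1") gives "block".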

    Interestingly, this one tweak to the whitespace rule also means that we'll be able to simplify the parentheses out of other similar built-in constructs:

        if $foo { ... }
        elsif $bar { ... }
        else { ... }
        while $more { ... }
        for 1..10 { ... }

    I think throwing out two required punctuation characters for one required whitespace is an excellent trade in terms of readability, particularly when it already matches common practice. (You can still put in the parens if you want them, of course, just for old times' sake.) This tweak also allows greater flexibility in how user-defined constructs are parsed. If you want to define your own constructs, they should be able to follow the same syntax rules as built-ins.

    By a similar chain of logic (or illogic), I also want to tweak the whitespace rules for the trailing curly. There are severe problems in any C-derived language that allows user-defined constructs containing curlies (as Perl does). Even C doesn't entirely escape the head-scratching puzzle of "When do I put a semicolon after a curly?" A struct definition requires a terminating semicolon, for instance, while an if or a while doesn't.

    In Perl, this problem comes up most often when people say "Why do I have to put a semicolon after do {} or eval {} when it looks like a complete statement?"

    Well, in Perl 6, you don't, if the final curly is on a line by itself. That is, if you use an expression block as if it were a statement block, it behaves as one. The win is that these rules are consistent across all expression blocks, whether user-defined or built-in. Any expression block construct can be treated as either a statement or a component of an expression. Here's a block that is being treated as a term in an expression:

        $x = do {
            ...
        } + 1;

    However, if you write

        $x = do {
            ...
        }
        + 1;

    then the + will be taken erroneously as the start of a new statement. (So don't do that.)
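    The two examples above differ only in where the closing curly sits. A toy statement splitter makes the rule concrete. This Python sketch is an assumption-laden model, not a parser: it works line by line, counts curlies naively, and ends a top-level statement either at a trailing semicolon or at a closing curly that is alone on its line.

```python
def split_statements(source):
    """Split source text into top-level statements using the trailing-curly rule."""
    statements = []
    current = []
    depth = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        current.append(stripped)
        depth += stripped.count("{") - stripped.count("}")
        # At top level, a lone `}` or a trailing `;` terminates the statement.
        if depth == 0 and (stripped == "}" or stripped.endswith(";")):
            statements.append(" ".join(current))
            current = []
    if current:
        statements.append(" ".join(current))
    return statements


TERM_FORM = """
$x = do {
    ...
} + 1;
"""

STATEMENT_FORM = """
$x = do {
    ...
}
+ 1;
"""
```

    With these inputs, split_statements(TERM_FORM) yields a single statement, while split_statements(STATEMENT_FORM) yields two, the second being the stray "+ 1;".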

    Note that this special rule only applies to constructs that take a block (that is, a closure) as their last (or only) argument. Operators like sort and map are unaffected. However, certain constructs that used to be in the statement class may become expression constructs in Perl 6. For instance, if we change BEGIN to an expression construct we can now use a BEGIN block inside an expression to force compile-time evaluation of a non-static expression:

        $value = BEGIN { call_me_once() } + call_me_again();

    On the other hand, a one-line BEGIN would then have to have a semicolon.

    Anyway, back to switch statements. Damian's RFC proposes various specific kinds of dwimmery, and while some of those dwims are spot on, others may need adjustment. In particular, there is an assumption that the programmer will know when they're dealing with an object reference and when they're not. But everything will be an object reference in Perl 6, at some level or other. The underlying characteristics of any object are most generally determined by the answer to the question, "What methods does this object respond to?"

    Unfortunately, that's a run-time question in general. But in specific, we'd like to be able to optimize many of these switch statements at compile time. So it may be necessary to supply typological hints in some cases to do the dwimmery efficiently. Fortunately, most cases are still fairly straightforward. A 1 is obviously a number, and a "foo" is obviously a string. But unary + can force anything to a number, and unary _ can force anything to a string. Unary ? can force a boolean, and unary . can force a method call. More complicated thoughts can be represented with closure blocks.

    Another thing that needs adjustment is that the concept of "isa" matching seems to be missing, or at least difficult to express. We need good "isa" matching to implement good exception handling in terms of the switch mechanism. This means that we need to be able to say something like:

        given $! {
            when Error::Overflow { ... }
            when Error::Type { ... }
            when Error::ENOTTY { ... }
            when /divide by 0/ { ... }
            ...
        }

    and expect it to check $!.isa(Error::Overflow) and such, along with more normal pattern matching. In the case of the actual exception mechanism, we won't use the keyword given, but rather CATCH:

        CATCH {
            when Error::Overflow { ... }
            when Error::Type { ... }
            when Error::ENOTTY { ... }
            when /divide by 0/ { ... }
            ...
        }

    CATCH is a BEGIN-like block that can turn any block into a "try" block from the inside out. But the insides of the CATCH are an ordinary switch statement, where the discriminant is simply the current exception object, $!. More on that later--see RFC 88 below.

    Some of you may recall that I've stated that Perl 6 will have no barewords. That's still the case. A token like Error::Overflow is not a bareword because it's a declared class. Perl 6 recognizes package names as symbolic tokens. So when you call a class method as Class::Name.method(), the Class::Name is actually a class object (that just happens to stringify to "Class::Name"). But the class method can be called without a symbolic lookup on the package name at run time, unlike in Perl 5.

    Since Error::Overflow is just such a class object, it can be distinguished from other kinds of objects in a switch statement, and an "isa" can be inferred. It would be nice if we could go as far as to say that any object can be called with any class name as a method name to determine whether it "isa" member of that class, but that could interfere with use of class name methods to implement casting or construction. So instead, since switch statements are into heavy dwimmery anyway, I think the switch statement will have to recognize any Class::Name known at compile time, and force it to call $!.isa(Class::Name).
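    To make the inferred "isa" concrete, here is a small Python model of a CATCH block whose cases are either classes or patterns. The names run_with_catch, Overflow, TypeMismatch and CASES are hypothetical, and Python's isinstance stands in for the $!.isa(Class::Name) call that the switch statement would generate.

```python
import re


class Overflow(Exception):
    pass


class TypeMismatch(Exception):
    pass


def run_with_catch(block, cases):
    """Run block; on an exception, try each (case, handler) pair in order."""
    try:
        return block()
    except Exception as err:
        for case, handler in cases:
            if isinstance(case, type):
                matched = isinstance(err, case)             # the inferred "isa"
            else:
                matched = case.search(str(err)) is not None  # ordinary pattern match
            if matched:
                return handler(err)
        raise  # no case matched, so the exception keeps propagating


CASES = [
    (Overflow, lambda err: "overflow"),
    (TypeMismatch, lambda err: "type"),
    (re.compile(r"divide by 0"), lambda err: "division"),
]
```

    A class case is recognized by what it is rather than by any extra syntax, which mirrors how a Class::Name known at compile time is distinguished from other case values.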

    Another possible adjustment will involve the use of switch statements as a means of parallelizing regular expression evaluation. Specifically, we want to be able to write parsers easily in Perl, which means that we need some way of matching a token stream against something like a set of regular expressions. You can think of a token stream as a funny kind of string. So if the "given" of a switch statement is a token stream, the regular expressions matched against it may have special abilities relating to the current parse's data structure. All the regular expressions of such a switch statement will likely be implicitly anchored to the current parse location, for instance. There may be special tokens referring to terminals and non-terminals. Basically, think of something like a yacc grammar, where alternative pattern/action grammar rules are most naturally expressed via switch statement cases. More on that in the next Apocalypse.

    Another possible adjustment is that the proposed else block could be considered unnecessary. The code following the final when is automatically an "else". Here's a duodecimal digit converter:

        $result = given $digit {
            when "T" { 10 }
            when "E" { 11 }
            $digit;
        }

    Nevertheless, it's probably good documentation to line up all the blocks, which means it would be good to have a keyword. However, for reasons that will become clearer when we talk about exception handlers, I don't want to use else. Also, because of the identification of when and if, it would not be clear whether an else should automatically supply a break at the end of its block as the ordinary when case does.

    So instead of else, I'd like to borrow a bit more from C and use default:

        $result = given $digit {
            when "T" { 10 }
            when "E" { 11 }
            default  { $digit }
        }

    Unlike in C, the default case must come last, since Perl's cases are evaluated (or at least pretend to be evaluated) in order. The optimizer can often determine which cases can be jumped to directly, but in cases where that can't be determined, the cases are evaluated in order much like cascaded if/elsif/else conditions. Also, it's allowed to intersperse ordinary code between the cases, in which case the code must be executed only if the cases above it fail to match. For example, this should work as indicated by the print statements:

        given $given {
            print "about to check $first";
            when $first { ... }
            print "didn't match $first; let's try $next";
            when $next { ... }
            print "giving up";
            default { ... }
            die "panic: shouldn't see this";
        }

    We can still define when as a variant of if, which makes it possible to intermix the two constructs when (or if) that is desirable. So we'll leave that identity in--it always helps people think about it when you can define a less familiar construct in terms of a more familiar one. However, the default isn't quite the same as an else, since else can't stand on its own. A default is more like an if that's always true. So the above code is equivalent to:

        given $given {
            print "about to check $first";
            if $given =~ $first { ...; break }
            print "didn't match $first; let's try $next";
            if $given =~ $next { ...; break }
            print "giving up";
            if 1 { ...; break; }
            die "panic: shouldn't see this";
        }
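    The in-order evaluation, the implied break, and the interspersed code can all be modeled with a short interpreter loop. In this Python sketch the step tuples and the given function are hypothetical, and plain equality stands in for the real matching rules.

```python
def given(topic, steps):
    """Evaluate steps in order, stopping at the first case that matches.

    Each step is one of:
      ("code", fn)           ordinary code between cases
      ("when", value, block) a case; runs block and breaks if topic == value
      ("default", block)     like an `if` that is always true
    """
    for step in steps:
        kind = step[0]
        if kind == "code":
            step[1]()
        elif kind == "when":
            _, value, block = step
            if topic == value:
                return block()   # the implied break
        elif kind == "default":
            return step[1]()     # always matches, then breaks
        else:
            raise ValueError(f"unknown step kind: {kind!r}")
    return None


def duodecimal(digit):
    """The duodecimal digit converter from the text, using `default`."""
    return given(digit, [
        ("when", "T", lambda: 10),
        ("when", "E", lambda: 11),
        ("default", lambda: digit),
    ])
```

    Code placed after a default step is never reached in this model, which corresponds to the "panic: shouldn't see this" line in the example.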

    We do need to rewrite the relationship table in the RFC to handle some of the tweaks and simplifications we've mentioned. The comparison of bare refs goes away. It wasn't terribly useful in the first place, since it only worked for scalar refs. (To match identities we'll need an explicit .id method in any event. We won't be relying on the default numify or stringify methods to produce unique representations.)

    I've rearranged the table to be applied in order, so that default interpretations come later. Also, the "Matching Code" column in the RFC gave alternatives that aren't resolved. In these cases I've chosen the "true" definition rather than the "exists" or "defined" definition. (Except for certain set manipulations with hashes, people really shouldn't be using the defined/undefined distinction to represent true and false, since both true and false are considered defined concepts in Perl.)

    Some of the table entries distinguish an array from a list. Arrays look like this:

        when [1, 3, 5, 7, 9] { "odd digit intersection" }
        when @array          { "array intersection" }

    while a list looks like this:

        when 1, 3, 5, 7, 9    { "odd digit" }
        when @foo, @bar, @baz { "intersection with at least one array" }

    Ordinarily lists and arrays would mean the same thing in scalar context, but when is special in differentiating explicit arrays from lists. Within a when, a list is a recursive disjunction. That is, the comma-separated values are treated as individual cases OR-ed together. We could use some other explicit notation for disjunction such as:

        when any(1, 3, 5, 7, 9) { "odd" }

    But that seems a lot of trouble for a very common case of case, as it were. We could use vertical bars as some languages do, but I think the comma reads better.

    Anyway, here's another simplification. The following table will also define how the Perl 6 =~ operator works! That allows us to use a recursive definition to handle matching against a disjunctive list of cases. (See the first entry in the table below.) Of course, for precedence reasons, to match a list of things using =~ you'll have to use parens:

        $digit =~ (1, 3, 5, 7, 9) and print "That's odd!";

    Alternatively, you can look at this table as the definition of the =~ operator, and then say that the switch statement is defined in terms of =~. That is, for any switch statement of the form

        given EXPR1 {
            when EXPR2 { ... }
        }

    it's equivalent to saying this:

        for (scalar(EXPR1)) {
            if ($_ =~ (EXPR2)) { ... }
        }

    Table 1: Matching a switch value against a case value

        $a      $b        Type of Match Implied    Matching Code
        ======  =====     =====================    =============
        expr    list      recursive disjunction    match if $a =~ any($b)
        list    list      recursive disjunction*   match if any($a) =~ any($b)
        hash    sub(%)    hash sub truth           match if $b(%$a)
        array   sub(@)    array sub truth          match if $b(@$a)
        expr    sub($)    scalar sub truth         match if $b($a)
        expr    sub()     simple closure truth*    match if $b()
        hash    hash      hash key intersection*   match if grep exists $a{$_}, $b.keys
        hash    array     hash value slice truth   match if grep {$a{$_}} @$b
        hash    regex     hash key grep            match if grep /$b/, keys %$a
        hash    scalar    hash entry truth         match if $a{$b}
        array   array     array intersection*      match if any(@$a) =~ any(@$b)
        array   regex     array grep               match if grep /$b/, @$a
        array   number    array entry truth        match if $a[$b]
        array   expr      array as list            match if any($a) =~ $b
        object  class     class membership         match if $a.isa($b)
        object  method    method truth             match if $a.$b()
        expr    regex     pattern match            match if $a =~ /$b/
        expr    subst     substitution match       match if $a =~ subst
        expr    number    numeric equality         match if $a == $b
        expr    string    string equality          match if $a eq $b
        expr    boolean   simple expression truth* match if $b
        expr    undef     undefined                match unless defined $a
        expr    expr      run-time guessing        match if ($a =~ $b) at runtime

    In order to facilitate optimizations, these distinctions are made syntactically at compile time whenever possible. For each comparison, the reverse comparison is also implied, so $a/$b can be thought of as either given/when or when/given. (We don't reverse the matches marked with * because it doesn't make sense in those cases.)

    If the type of match cannot be determined at compile time, the default is to try to apply the very same rules in the very same order at run time, using the actual types of the arguments, not their compile-time type appearance. Note that there are no run-time types corresponding to "method" or "boolean". Either of those notions can be expressed at runtime as a closure, of course.
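    A run-time dispatcher for a subset of Table 1 might look like the following Python sketch. It rests on several assumptions: a Python tuple stands in for a Perl list (the disjunction), a Python list for an array, a dict for a hash, and a compiled pattern for a regex. Only one-argument subs are modeled, the reverse comparisons are not implemented, and a few checks appear earlier than in the table because Python classes are callable and Python booleans are integers.

```python
import re


def smart_match(a, b):
    """Match switch value a against case value b, checking rules in order."""
    if isinstance(b, tuple):                      # expr/list: recursive disjunction
        return any(smart_match(a, item) for item in b)
    if isinstance(b, bool):                       # expr/boolean: truth of b
        return b
    if isinstance(b, type):                       # object/class: membership
        return isinstance(a, b)
    if callable(b):                               # expr/sub($): truth of b(a)
        return bool(b(a))
    if isinstance(a, dict):
        if isinstance(b, dict):                   # hash/hash: key intersection
            return any(key in a for key in b)
        if isinstance(b, list):                   # hash/array: value slice truth
            return any(a.get(key) for key in b)
        if isinstance(b, re.Pattern):             # hash/regex: key grep
            return any(b.search(str(key)) for key in a)
        return bool(a.get(b))                     # hash/scalar: entry truth
    if isinstance(a, list):
        if isinstance(b, list):                   # array/array: intersection
            return any(smart_match(x, y) for x in a for y in b)
        if isinstance(b, re.Pattern):             # array/regex: grep
            return any(b.search(str(x)) for x in a)
        if isinstance(b, int):                    # array/number: entry truth
            return -len(a) <= b < len(a) and bool(a[b])
        return any(smart_match(x, b) for x in a)  # array/expr: array as list
    if isinstance(b, re.Pattern):                 # expr/regex: pattern match
        return b.search(str(a)) is not None
    if b is None:                                 # expr/undef: undefined
        return a is None
    if isinstance(b, (int, float)):               # expr/number: numeric equality
        try:
            return float(a) == b
        except (TypeError, ValueError):
            return False
    if isinstance(b, str):                        # expr/string: string equality
        return str(a) == b
    return a == b                                 # fallback guess
```

    With this model, smart_match(5, (1, 3, 5, 7, 9)) is true through the first rule, and smart_match("10", 10) is true through numeric equality, even though the same pair would fail a string comparison against "10.0".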

    In fact, whenever the default behavior is not what you intend, there are ways to force the arguments to be treated as you intend:

        Intent      Natural           Forced
        ======      =======           ======
        array       @foo              [list] or @{expr}
        hash        %bar              {pairlist} or %{expr}
        sub(%)      { %^foo.aaa }     sub (%foo) { ... }
        sub(@)      { @^bar.bbb }     sub (@bar) { ... }
        sub($)      { $^baz.ccc }     sub ($baz) { ... }
        number      numeric literal   +expr int(expr) num(expr)
        string      string literal    _expr str(expr)
        regex       //, m//, qr//     /$(expr)/
        method      .foo(args)        { $_.$method(args) }
        boolean     $a == $b          ?expr or true expr or { expr }

    A method must be written with a unary dot to distinguish it from other forms. The method may have arguments. In essence, when you write

        .foo(1,2,3)

    it is treated as if you wrote

        { $_.foo(1,2,3) }

    and then the closure is evaluated for its truth.
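    Python's standard library happens to offer a close analogue of this rewriting. In the sketch below, operator.methodcaller builds the closure that calls a named method on whatever topic it is later given. The wrapper names method_case and when_method are hypothetical.

```python
from operator import methodcaller


def method_case(name, *args):
    """Build the closure that `.name(args)` stands for: call name on the topic."""
    return methodcaller(name, *args)


def when_method(topic, case):
    """Evaluate the closure against the topic and test its truth."""
    return bool(case(topic))
```

    For instance, when_method("parrot", method_case("startswith", "par")) is true, just as a case of .startswith("par") would be for a topic of "parrot".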

    A class match works only if the class name is known at compile time. Use .isa("Class") for more complicated situations.

    Boolean expressions are recognized at compile time by the presence of a top-level operator that is a comparison or logical operator. As the table shows, an argumentless closure (a sub (), that is) also functions as a boolean. However, it's probably better documentation to use the true function, which does the opposite of not. (Or the unary ? operator, which does the opposite of unary !.)

    It might be argued that boolean expressions have no place here at all, and that you should use if if that's what you mean. (Or use a sub() closure to force it to ignore the given.) However, the "comb" structure of a switch is an extremely readable way to write even ordinary boolean expressions, and rather than forcing people to write:

        anyblock {
            when { $a == 1 } { ... }
            when { $b == 2 } { ... }
            when { $c == 3 } { ... }
            default          { ... }
        }

    I'd rather they be able to write:

        anyblock {
            when $a == 1 { ... }
            when $b == 2 { ... }
            when $c == 3 { ... }
            default      { ... }
        }

    This also fits better into the use of "when" within CATCH blocks:

        CATCH {
            when $!.tag eq "foo" { ... }
            when $!.tag eq "bar" { ... }
            default              { die }
        }

    To force all the when clauses to be interpreted as booleans without using a boolean operator on every case, simply provide an empty given, to be read as "given nothing...":

        given () {
            when $a.isa(Ant) { ... }
            when $b.isa(Bat) { ... }
            when $c.isa(Cat) { ... }
            default          { ... }
        }

    A when can be used by other topicalizers than just given. Just as CATCH will imply a given of $!, a for loop (the foreach variety) will also imply a given of the loop variable:

        for @foo {
            when 1   { ... }
            when 2   { ... }
            when "x" { ... }
            default  { ... }
        }

    By symmetry, a given will by default alias $_ to the "given". Basically, the only difference between a given and a for is that a given takes a scalar expression, while a for takes a pre-flattened list and iterates over it.

    Suppose you want to preserve $_ and alias $g to the value instead. You can say that like this:

        given $value -> $g {
            when 1 { /foo/ }
            when 2 { /bar/ }
            when 3 { /baz/ }
        }

    In the same way, a loop's values can be aliased to one or more loop variables.

        for @foo -> $a, $b {  # two at a time
            ...
        }

    That works a lot like the definition of a subroutine call with two formal parameters, $a and $b. (In fact, that's precisely what it is.) You can use modifiers on the formal parameters just as you would in a subroutine type signature. This implies that the aliases are automatically declared as my variables. It also implies that you can modify the formal parameter with an rw property, which allows you to modify the original elements of the array through the variable. The default loop:

        for @foo { ... }

    is really compiled down to this:

        for @foo -> $_ is rw { ... }
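    Both ideas, taking several elements per iteration and writing back through the loop variable, can be approximated in Python. The helper names are hypothetical. Handing a shorter final group to the body when the list runs out is an assumption of this sketch, since the text does not say what happens in that case. Python has no aliasing loop variable, so the rw behavior is modeled by assigning through an index.

```python
def in_groups(items, size):
    """Yield successive tuples of `size` items, like `for @foo -> $a, $b`."""
    for start in range(0, len(items), size):
        yield tuple(items[start:start + size])


def double_in_place(items):
    """Model of `for @foo -> $_ is rw { $_ *= 2 }`: modify the original elements."""
    for index in range(len(items)):
        items[index] *= 2
```

    So list(in_groups([1, 2, 3, 4], 2)) produces the pairs (1, 2) and (3, 4), and double_in_place changes the caller's list rather than a copy.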

    Since for and given work by passing arguments to a closure, it's a small step to generalize that in the other direction. Any method definition is a topicalizer within the body of the method, and will assume a "given" of its $self object (or whatever you have named it). Bare closures topicalize their first argument, implicitly aliasing it to $_ unless $^a or some such is used. That is, if you say this:

        grep { $_ eq 3 } @list

    it's equivalent to this more explicit use of a curried function:

        grep { $^a eq 3 } @list

    But even a grep can use the aliasing syntax above:

        grep -> $x { $x eq 3 } @list

    Outside the scope of any topicalizer, a when will assume that its given was stored in $_ and will test implicitly against that variable. This allows you to use when in your main loop, for instance, even if that main loop was supplied by Perl's -n or -p switch. Whenever a loop is functioning as a switch, the break implied by finishing a case functions as a next, not a last. Use last if that's what you mean.
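    In a language without when, a loop acting as a switch looks like a chain of tests that each end in a "next". The following Python sketch uses a hypothetical label_all function with plain equality as the match, and shows the implied break behaving as continue rather than break.

```python
def label_all(items):
    """Model of `for @foo { when 1 {...} when 2 {...} when "x" {...} default {...} }`."""
    labels = []
    for item in items:             # the loop variable is the topic
        if item == 1:
            labels.append("one")
            continue               # implied break acts as `next`, not `last`
        if item == 2:
            labels.append("two")
            continue
        if item == "x":
            labels.append("letter x")
            continue
        labels.append("other")     # the default case
    return labels
```

    Every element is visited: label_all([1, "x", 5, 2]) returns ["one", "letter x", "other", "two"], whereas treating the implied break as "last" would have stopped after the first element.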

    A when is the only defaulting construct that pays attention to the current topicalizer regardless of which variable it is associated with. All other defaulting constructs pay attention to a fixed variable, typically $_. So be careful what you're matching against if the given is aliased to something other than $_:

        $_ = "foo";
        given "bar" -> $f {
            if /foo/   { ... } # true, matches against $_
            when /bar/ { ... } # true, matches against $f
        }

    Oh, one other tweak. The RFC proposes to overload next to mean "fall through to the next case". I don't think this is wise, since we'll often want to use loop controls within a switch statement. Instead, I think we should use skip to do that. (To be read as "Skip to the next statement.")

    Similarly, if we make a word to mean to explicitly break out of a topicalizer, it should not be last. I'd suggest break! It will, of course, be unnecessary to break out of the end of a when case because the break is implied. However, there are times when you might want to break out of a when block early. Also, since we're allowing when modifiers that do not implicitly break, we could use an explicit break for that situation. You might see cases like this:

        given $x {
            warn("Odd value")        when !/xxx/;
            warn("No value"), break  when undef;
            when /aaa/ { break when 1; ... }
            when /bbb/ { break when 2; ... }
            when /ccc/ { break when 3; ... }
        }

    So it looks to me like we need a break.

    RFC 088: Omnibus Structured Exception/Error Handling Mechanism

    This RFC posits some requirements for exception handling (all of which I agree with), but I do have some additional requirements of my own:

    • The exception-catching syntax must be considered a form of switch statement.
    • It should be easy to turn any kind of block into a "try" block, especially a subroutine.
    • Even try-less try blocks must also be able to specify mandatory cleanup on exit.
    • It should be relatively easy to determine how much cleanup is necessary regardless of how a block was exited.
    • It must be possible to base the operation of return, next, and last on exception handling.
    • The cleanup mechanism should mesh nicely with the notions of post condition processing under design-by-contract.
    • The exception-trapping syntax must not violate encapsulation of lexical scopes.
    • At the same time, the exception-trapping syntax should not force declarations out of their natural scope.
    • Non-linear control flow must stand out visually, making good use of block structure, indentation and even keyword case. BEGIN and END blocks are to be considered prior art.
    • Not-yet-thrown exceptions must be a useful concept.
    • Compatibility with the syntax of any other language is specifically NOT a goal.

    RFC 88 is massive, weighing in at more than 2400 lines. Annotating the entire RFC would make this Apocalypse far too big. ("Too late!" says Damian.) Nonetheless, I will take the approach of quoting various bits of the RFC and recasting those bits to work with my additional requirements. Hopefully this will convey my tweaks most succinctly.

    Here's what the RFC gives as its first example:

        exception 'Alarm';
        try {
            throw Alarm "a message", tag => "ABC.1234", ... ;
            }
    
        catch Alarm => { ... }
    
        catch Error::DB, Error::IO => { ... }
    
        catch $@ =~ /divide by 0/ => { ... }
    
        catch { ... }
    
        finally { ... }

    Here's how I see that being written in Perl 6:

        my class X::Alarm is Exception { }     # inner class syntax?
        try {
            throw X::Alarm "a message", tag => "ABC.1234", ... ;
            CATCH {
                when X::Alarm             { ... }
                when Error::DB, Error::IO { ... }
                when /divide by 0/        { ... }
                default                   { ... }
            }
            POST { ... }
        }

    The outer block does not have to be a try block. It could be a subroutine, a loop, or any other kind of block, including an eval string or an entire file. We will call such an outer block a try block, whether or not there is an explicit try keyword.

    The biggest change is that the various handlers are moved inside of the try block. In fact, the try keyword itself is mere documentation in our example, since the presence of a CATCH or POST block is sufficient to signal the need for trapping. Note that the POST block is completely independent of the CATCH block. (The POST block has a corresponding PRE block for design-by-contract programmers.) Any of these blocks may be placed anywhere in the surrounding block--they are independent of the surrounding control flow. (They do have to follow any declarations they refer to, of course.) Only one CATCH is allowed, but any number of PRE and POST blocks. (In fact, we may well encourage ourselves to place POST blocks near the constructors to be cleaned up after.) PRE blocks within a particular try block are evaluated in order before anything else in the block. POST blocks will be evaluated in reverse order, though order dependencies between POST blocks are discouraged. POST blocks are evaluated after everything else in the block, including any CATCH.
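
    The ordering rules for POST blocks resemble what Python's contextlib.ExitStack does with registered callbacks. The following is only an analogy, with run_block as a hypothetical name: callbacks registered anywhere in the block run in reverse order on exit, and they run after the handler because the handler sits inside the stack's scope. PRE blocks are not modeled.

```python
from contextlib import ExitStack


def run_block(fail):
    """Return the order in which body, handler, and cleanups execute."""
    order = []
    with ExitStack() as post:
        post.callback(order.append, "POST 1")      # registered first, runs last
        try:
            order.append("body")
            post.callback(order.append, "POST 2")  # registered later, runs first
            if fail:
                raise ValueError("boom")
        except ValueError:
            order.append("CATCH")                  # handler runs before any POST
    return order
```

    With fail set, the order is body, CATCH, POST 2, POST 1. Without it, the CATCH entry is simply absent.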

    A try {} without a CATCH is equivalent to Perl 5's eval {}. (In fact, eval will go back to evaluating only strings in Perl 6, and try will evaluate only blocks.)

    The CATCH and POST blocks are naturally in the lexical scope of the try block. They may safely refer to lexically scoped variables declared earlier in the try block, even if the exception is thrown during the elaboration sequence. (The run-time system will guarantee that individual variables test as undefined (and hence false) before they are elaborated.)

    The inside of the CATCH block is precisely the syntax of a switch statement. The discriminant of the switch statement is the exception object, $!. Since the exception object stringifies to the error message, the when /divide by 0/ case need not be explicitly compared against $!. Likewise, explicit mention of a declared class implies an "isa" lookup, another built-in feature of the new switch statement.

    In fact, a CATCH of the form:

        CATCH { 
            when xxx { ... }          # 1st case
            when yyy { ... }          # 2nd case
            ...                       # other cases, maybe a default
        }

    means something vaguely like:

        BEGIN {
            %MY.catcher = {
                given current_exception() -> $! {
                    when xxx { ... }          # 1st case from above
                    when yyy { ... }          # 2nd case from above
                    ...                       # other cases, maybe a default
                    die;            # rethrow $! as implicit default
                }
                $!.markclean;       # handled cleanly, in theory
            }
        }

    The unified "current exception" is $!. Everywhere this RFC uses $@, it should be read as $! instead. (And the too-precious @@ goes away entirely in favor of an array stored internally to the $! object that can be accessed as @$! or $![-1].) (For the legacy Perl 5 parser, $@ and $? will be emulated, but that will not be available to the Perl 6 parser.)

    Also note that the CATCH block implicitly supplies a rethrow (the die above) after the cases of the switch statement. This will not be reached if the user has supplied an explicit default case, since the break of that default case will always bypass the implicit die. And if the switch rethrows the exception (either explicitly or implicitly), $! is not marked as clean, since the die will bypass the code that marks the exception as "cleanly caught". It should be considered an invariant that any $! in the normal control flow outside of a CATCH is considered "cleanly caught", according to the definition in the RFC. Unclean exceptions should only be seen inside CATCH blocks, or inside any POST blocks that have to execute while an exception is propagating to an outer block because the current try block didn't handle it. (If the current try block does successfully handle the exception in its CATCH, any POST blocks at the same level see a $! that is already marked clean.)
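
    The behavior described above (cases tried in order, an implicit rethrow when none matches, and a clean mark only when a handler finishes) can be sketched in Python. This is an illustration under assumptions, not the Perl 6 mechanism: Caught, matches, catch, and the clean attribute are hypothetical names, and only three kinds of test are modeled.

```python
import re


class Caught(Exception):
    """Hypothetical exception carrying a clean flag, standing in for $!."""

    def __init__(self, message):
        super().__init__(message)
        self.clean = False


def matches(test, exc):
    """Dispatch on the kind of test: class, compiled regex, or callable."""
    if isinstance(test, type):
        return isinstance(exc, test)               # the implied "isa" lookup
    if isinstance(test, re.Pattern):
        return test.search(str(exc)) is not None   # match against the message
    return bool(test(exc))                         # closure receives the exception


def catch(exc, cases):
    """Run the first matching handler; rethrow if no case matches."""
    for test, handler in cases:
        if matches(test, exc):
            result = handler(exc)
            exc.clean = True     # reached only if the handler did not rethrow
            return result
    raise exc                    # the implicit rethrow after the cases
```

    A default case is just a test that always succeeds, such as (lambda exc: True, handler), which is why supplying one bypasses the implicit rethrow.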

    RFC:

    eval {die "Can't foo."}; print $@; continues to work as before.

    That will instead look like

        try { die "Can't foo" }; print $!;

    in Perl 6. A try with no CATCH:

        try { ... }

    is equivalent to:

        try { ... CATCH { default { } } }

    (And that's another reason I didn't want to use else for the default case of a switch statement--an else without an if looks really bizarre...)

    Just as an aside, what I'm trying to do here is untangle the exception trapping semantics of eval from its code parsing and running semantics. In Perl 6, there is no eval {}. And eval $string really means something like this:

        try { $string.parse.run }

    RFC:

    This RFC does not require core Perl functions to use exceptions for signalling errors.

    However, Perl core functions will by default signal failure using unthrown proto-exceptions (that is, interesting values of undef) that can easily be turned into thrown exceptions via die. By "interesting values of undef", I don't mean undef with properties. I mean full-fledged exception objects that just happen to return false from their .defined and .true methods. However, the .str method successfully returns the error message, and the .int method returns the error code (if any). That is, they do stringify and numify like $! ought to. An exception becomes defined and true when it is thrown. (Control exceptions become false when cleanly caught, to avoid spoofing old-style exception handlers.)
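
    A minimal Python sketch of such a proto-exception follows. Python has truthiness but no separate notion of definedness, so only the boolean side is modeled. Failure, Thrown, die, and first_word are hypothetical names, and the error code 22 is an arbitrary illustrative value.

```python
class Failure:
    """Hypothetical unthrown proto-exception: false until it is thrown."""

    def __init__(self, message, code=0):
        self.message = message
        self.code = code
        self.thrown = False

    def __bool__(self):
        return self.thrown       # false while unthrown, true once thrown

    def __str__(self):
        return self.message      # stringifies to the error message

    def __int__(self):
        return self.code         # numifies to the error code


class Thrown(Exception):
    """Wrapper used to actually raise a Failure."""

    def __init__(self, failure):
        super().__init__(str(failure))
        self.failure = failure


def die(failure):
    """Turn an unthrown Failure into a thrown exception."""
    failure.thrown = True
    raise Thrown(failure)


def first_word(text):
    """Illustrative 'core function': returns a word or a false Failure."""
    words = text.split()
    if not words:
        return Failure("no words in input", code=22)
    return words[0]
```

    A caller can test the result in boolean context, read str(result) or int(result) for details, or pass it to die to escalate.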

    RFC:

    This means that all exceptions propagate unless they are cleanly caught, just as in Perl 5. To prevent this, use:
        try { fragile(); } catch { } # Go on no matter what.

    This will simply be:

        try { fragile; }

    But it means the same thing, and it's still the case that all exceptions propagate unless they are cleanly caught. In this case, the caught exception lives on in $! as a new proto-exception that could be rethrown by a new die, much as we used to use $@. Whether an exception is currently considered "cleanly caught" can be reflected in the state of the $! object itself. When $! passes through the end of a CATCH, it is marked as clean, so that subsequent attempts to establish a new $! know that they can clear out the old @$! stack. (If the current $! is not clean, it should just add its information without deleting the old information--otherwise an error in a CATCH could delete the exception information you will soon be wanting to print out.)
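
    The bookkeeping described here is small enough to sketch. In this Python illustration ErrorState, record, mark_clean, and current are hypothetical names: a new error clears the old stack only if the previous error was cleanly caught, and otherwise piles on top of it.

```python
class ErrorState:
    """Hypothetical stand-in for $!: a stack of records plus a clean flag."""

    def __init__(self):
        self.stack = []
        self.clean = True

    def record(self, message):
        """Establish a new current error."""
        if self.clean:
            self.stack.clear()   # old, cleanly caught history can go
        self.stack.append(message)
        self.clean = False

    def mark_clean(self):
        """Called when the error passes through the end of a CATCH."""
        self.clean = True

    @property
    def current(self):
        """The most recent record, or None if nothing has been recorded."""
        return self.stack[-1] if self.stack else None
```

    An error raised inside a handler therefore leaves the original record in place underneath it, while an error raised after a clean catch starts a fresh stack.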

    RFC:

        try { ... } catch <test> => { ... } finally { ... }

    Now:

        { ... CATCH { when <test> { ... } } POST { ... } }

    (The angle brackets aren't really there--I'm just copying the RFC's metasyntax here.)

    Note that we're assuming a test that matches the "boolean" entry from the switch dwimmery matrix. If not, you can always wrap closure curlies around the test:

        { ... CATCH { when { <test> } { ... } } POST { ... } }

    That will force the test to be called as a subroutine that ignores its argument, which happens to be $!, the exception object. (Recall that the implied "given" of a CATCH statement sets $! as the given value. That given value is automatically passed to any "when" cases that look like subroutines or closures, which are free either to ignore the passed value, or access it as $_ or $^a.)

    Or you might just prefer to use the unary true operator:

        { ... CATCH { when true <test> { ... } } POST { ... } }

    I personally find that more readable than the closure.

    RFC:

    The test argument of the catch clause is optional, and is described below.

    The test argument of a when clause is NOT optional, since it would be impossible to distinguish a conditional closure from the following block. Use default for the default case.

    RFC:

    try, catch, and finally blocks should share the same lexical scope, in the way that while and continue do.

    Actually, this is not so--the while and continue blocks don't share the same lexical scope even in Perl 5. But we'll solve this issue without "tunneling" in any case. (And we'll change the continue block into a NEXT block that goes inside, so we can refer to lexical variables from within it.)

    RFC:

    Note that try is a keyword, not a function. This is so that a ; is not needed at the end of the last block. This is because a try/catch/finally now looks more like an if/elsif/else, which does not require such a ;, than like an eval, which does.

    Again, this entire distinction goes away in Perl 6. Any expression block that terminates with a right curly on its own line will be interpreted as a statement block. And try is such an expression block.

    RFC:

    $@ contains the current exception, and @@ contains the current exception stack, as defined above under die. The unshift rule guarantees that $@ == $@[0].

    Why an unshift? A stack is most naturally represented in the other direction, and I can easily imagine some kinds of handlers that might well treat it like a stack, stripping off some entries and pushing others.

    Also, @@ is a non-starter because everything about the current exception should all be in a single data structure. Keeping the info all in one place makes it easy to rethrow an exception without losing data, even if the exception was marked as cleanly caught. Furthermore I don't think that the exception stack needs to be Huffman coded that badly.

    So $! contains the current exception, and $!.stack accesses the current exception stack. Through the magic of overloading, the $! object can likely be used as an array even though it isn't one, in which case @$! refers to that stack member. The push rule guarantees that $!.id == $![-1].id.

    RFC (speaking of the exception declaration):

    If the given name matches /::/, something like this happens:
        @MyError::App::DB::Foo::ISA = 'MyError::App::DB';

    and all non-existent parent classes are automatically created as inheriting from their parent, or Exception in the tail case. If a parent class is found to exist and not inherit from Exception, a run-time error exception is raised.

    If I understand this, I think I disagree. A package ought to be able to contain exceptions without being an exception class itself. There certainly ought to be a shorthand for exceptions within the current package. I suspect they're inner classes of some sort, or inner classes of an inner package, or some such.

    RFC:

    If the given name does not match /::/ (say it's just Alarm), this happens instead:
        @Alarm::ISA = 'Exception';

    This means that every exception class isa Exception, even if Exception:: is not used at the beginning of the class name.

    Ack! This could be really bad. What if two different modules declare an Alarm exception with different derivations?

    I think we need to say that unqualified exceptions are created within the current package, or maybe within the X subpackage of the current package. If we have inner classes, they could even be lexically scoped (and hence anonymous exceptions outside the current module). That might or might not be a feature.

    I also happen to think that Exception is too long a name to prefix most common exceptions, even though they're derived from that class. I think exceptions will be better accepted if they have pithier names like X::Errno that are derived from Exception:

        our class X::Control is Exception;
        our class X::Errno is Exception;
        our class X::NumericError is Exception;
        our class C::NEXT is X::Control;
        our class E::NOSPC is X::Errno;
        our class X::FloatingUnderflow is X::NumericError;

    Or maybe those could be:

        c::NEXT
        e::NOSPC
        x::FloatingUnderflow

    if we decide uppercase names are too much like user-defined package names. But that looks strange. Maybe we just reserve single letter top-level package names for Perl. Heck, let's just reserve all top-level package names for Perl. Er, no, wait... :-)

    RFC 80 suggests that exception objects numerify to the system's errno number when those are available. That's a possibility, though by the current switch rules we might have to write

        CATCH {
            when +$ENOSPC { ... }
        }

    to force $ENOSPC to do a numeric comparison. It may well be better to go ahead and make the errno numbers into exception classes, even if we have to write something like this:

        CATCH {
            when X::ENOSPC { ... }
        }

    That's longer, but I think it's clearer. Possibly that's E::NOSPC instead. But in any event, I can't imagine getting people to prefix every exception with "Exception::". That's just gonna discourage people from using exceptions. I'm quite willing to at least reserve the X top-level class for exceptions. I think X:: is quite sufficiently distinctive.
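
    For comparison only, Python's standard library takes a similar route for some errno values: constructing an OSError with an errno yields a dedicated subclass where one exists, so a handler can test by class, and falls back to a numeric test otherwise. The classify function below is a hypothetical illustration of the two styles side by side.

```python
import errno


def classify(exc):
    """Label an OSError, first by class, then by errno number."""
    if isinstance(exc, FileNotFoundError):   # class-based test
        return "missing file"
    if exc.errno == errno.ENOSPC:            # numeric test
        return "disk full"
    return "other"


# OSError's constructor picks a subclass from the errno value where one exists.
missing = OSError(errno.ENOENT, "No such file or directory")
full = OSError(errno.ENOSPC, "No space left on device")
```

    The class-based test reads more clearly, which is the same trade-off argued above.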

    RFC:

        try { my $f = open "foo"; ... } finally { $f and close $f; }

    Now:

        {
            my $f = open "foo"; ...
            POST { $f and close $f }
        }

    Note that $f is naturally in scope and guaranteed to have a boolean value, even if the exception is thrown before the declaration statement is elaborated! (An implementation need not allocate an actual variable before the my. The code of the POST block could always be compiled to know that $f is to be assumed undefined if the allocating code has not yet been reached.)

    We could go as far as to make

            POST { close $f }

    do something reasonable even without the guard. Maybe an undefined object could "emulate" any method for you within a POST. Maybe try is really a unary operator:

            POST { try close $f }

    Or some such. I dunno. This needs more thought along transactional lines...

    Time passes...

    Actually, now that I've thought on it, it would be pretty easy to put wrappers around POST blocks that could do commit or rollback depending on whether the block exits normally. I'd like to call them KEEP and UNDO. KEEP blocks would only be executed if the block succeeded. UNDO blocks would only be executed if the block failed. One could even envision a syntax that ties the block to a particular variable:

        UNDO $f { close $f }

    After all, like the CATCH block, all of these blocks are just fancy BEGIN blocks that attach some meaning to some predefined property of the block.
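
    The commit-or-rollback idea can be sketched with a Python context manager. This is an illustration under assumptions, not a proposed implementation: Block, post, keep, undo, and demo are hypothetical names, and handlers are run in reverse registration order, following the rule given earlier for POST blocks.

```python
class Block:
    """Hypothetical scope with POST, KEEP, and UNDO handlers."""

    def __init__(self):
        self._handlers = []      # (kind, callable) in registration order

    def post(self, fn):
        self._handlers.append(("post", fn))

    def keep(self, fn):
        self._handlers.append(("keep", fn))

    def undo(self, fn):
        self._handlers.append(("undo", fn))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        failed = exc_type is not None
        skipped = "keep" if failed else "undo"
        for kind, fn in reversed(self._handlers):
            if kind != skipped:
                fn()
        return False             # never swallow the exception


def demo(fail):
    """Show which handlers run on success and on failure."""
    events = []
    try:
        with Block() as block:
            block.post(lambda: events.append("close"))
            block.undo(lambda: events.append("unlink"))
            block.keep(lambda: events.append("commit"))
            if fail:
                raise RuntimeError("write failed")
    except RuntimeError:
        events.append("error seen by caller")
    return events
```

    On failure the rollback and the unconditional cleanup both run before the caller sees the error. On success the commit runs instead of the rollback.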

    It's tempting to make the execution of UNDO contingent upon whether the block itself was passed during execution, but I'm afraid that might leave a window in which a variable could already be set, but subsequent processing might raise an exception before enabling the rollback in question. So it's probably better to tie it to a particular variable's state more directly than just by placing the block at some point after the declaration. In fact, it could be associated directly with the variable in question at declaration time via a property:

        my $f is undo { close $f } = open $file or die;

    Note that the block is truly a closure because it relies on the lexical scoping of $f. (This form of lexical scoping works in Perl 6 because the name $f is introduced immediately within the statement. This differs from the Perl 5 approach where the name is not introduced till the end of the current statement.)

    Actually, if the close function defaults to $_, we can say

        my $f is undo { close } = open $file;

    presuming the managing code is smart enough to pass $f as a parameter to the closure. Likewise one could attach a POST block to a variable with:

        my $f is post { close } = open $file;

    Since properties can be combined, you can set multiple handlers on a variable:

        my $f is post { close } is undo { unlink $file } = open ">$file" or die;

    There is, however, no catch property to go with the CATCH block.

    I suppose we could allow a pre property to set a PRE block on a variable.

    RFC:

        sub attempt_closure_after_successful_candidate_file_open
        {
            my ($closure, @fileList) = @_; local (*F);
            foreach my $file (@fileList) {
                try { open F, $file; } catch { next; }
                try { &$closure(*F); } finally { close F; }
                return;
                }
            throw Exception "Can't open any file.",
                   debug => @fileList . " tried.";
            }

    Now:

        sub attempt_closure_after_successful_candidate_file_open
          (&closure, @fileList)
        {
            foreach my $file (@fileList) {
                my $f is post { close }
                    = try { open $file or die; CATCH { next } }
                &closure($f);
                return;
            }
            throw Exception "Can't open any file.",
                   debug => @fileList . " tried.";
        }

    Note that the next within the CATCH refers to the loop, not the CATCH block. It is legal to next out of CATCH blocks, since we won't use next to fall through switch cases.
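
    The same control flow has a close Python analogue, shown here only for comparison. The function name run_on_first_openable is hypothetical. The continue inside the handler plays the role of next in the CATCH, and the with statement plays the role of the post property.

```python
def run_on_first_openable(action, paths):
    """Call action on the first file that opens; raise if none does."""
    for path in paths:
        try:
            handle = open(path)
        except OSError:
            continue             # like `next` inside the CATCH: try the next file
        with handle:             # closed on exit, like `is post { close }`
            return action(handle)
    raise FileNotFoundError(f"Can't open any file. {len(paths)} tried.")
```

    As in the Perl 6 version, the loop control statement in the handler refers to the enclosing loop, not to the handler itself.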

    However, X::Control exceptions (such as X::NEXT) are a subset of Exceptions, so

        CATCH {
            when Exception { ... }   # catch any exception
        }

    will stop returns and loop exits. This could be construed as a feature. When it's considered a bug, you could maybe say something like

        CATCH {
            when X::Control { die }  # propagate control exceptions
            when Exception  { ... }  # catch all others
        }

    to force such control exceptions to propagate outward. Actually, it would be nice to have a name for non-control exceptions. Then we could say (with a tip of the hat to Maxwell Smart):

        CATCH {
            when X::Chaos   { ... }  # catch non-control exceptions
        }

    And any control exceptions will then pass unimpeded (since by default uncaught exceptions are rethrown implicitly by the CATCH). Fortunately or unfortunately, an explicit default case will not automatically rethrow control exceptions.
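
    Python draws a comparable line: KeyboardInterrupt, SystemExit, and GeneratorExit derive from BaseException rather than Exception, so an ordinary except Exception clause lets them pass. The sketch below builds a small hierarchy of its own to show the idea. All class and function names in it are hypothetical stand-ins, not Perl 6 classes.

```python
class Base(Exception):
    """Root of a hypothetical hierarchy, standing in for Exception."""


class Control(Base):
    """Stand-in for X::Control: loop exits, returns, and the like."""


class Chaos(Base):
    """Stand-in for X::Chaos: every non-control error."""


class Next(Control):
    pass


class DivideByZero(Chaos):
    pass


def guarded(thunk):
    """Handle ordinary errors but let control exceptions propagate."""
    try:
        return thunk()
    except Chaos as err:
        return f"handled: {err}"


def raiser(exc):
    """Build a zero-argument function that raises exc when called."""
    def thunk():
        raise exc
    return thunk
```

    Calling guarded(raiser(DivideByZero("x"))) returns a handled result, while guarded(raiser(Next())) lets the control exception pass through to an outer loop or caller.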

    Following are some more examples of how the expression evaluation of when can be used. The RFC versions sometimes look more concise, but recall that the "try" is any block in Perl 6, whereas in the RFC form there would have to be an extra, explicit try block inside many subroutines, for instance. I'd rather establish a culture in which it is expected that subroutines handle their own exceptions.

    RFC:

        try { ... } catch $@->{message} =~ /.../ => { ... }

    Now:

        try {
            ...
            CATCH {
                when $!.message =~ /.../ { ... }
            }
        }

    This works because =~ is considered a boolean operator.

    RFC:

        catch not &TooSevere => { ... }

    Now:

        when not &TooSevere { ... }

    The unary not is also a boolean operator.

    RFC:

        try { ... } catch ref $@ =~ /.../ => { ... }

    Now:

        try { ... CATCH { when $!.ref =~ /.../ { ... } } }

    RFC:

        try { ... } catch grep { $_->isa("Foo") } @@ => { ... }

    Now:

        try {
            ...
            CATCH {
                when grep { $_.isa(Foo) } @$! { ... }
            }
        }

    I suppose we could also assume grep to be a boolean operator in a scalar context. But that's kind of klunky. If we accept Damian's superposition RFC, it could be written this way:

        try {
            ...
            CATCH {
                when true any(@$!).isa(Foo) { ... }
            }
        }

    Actually, by the "any" rules of the =~ table, we can just say:

        try {
            ...
            CATCH {
                when @$! =~ Foo { ... }
            }
        }

    The RFC proposes the following syntax for finalization:

        try { my $p = P->new; my $q = Q->new; ... }
        finally { $p and $p->Done; }
        finally { $q and $q->Done; }

    A world of hurt is covered over by that "...", which could move the finally clauses far, far away from what they're trying to clean up after. I think the intent is much clearer with POST. And note also that we avoid the "lexical tunneling" perpetrated by finally:

        {
            my $p = P.new;   POST { $p and $p.Done; }
            my $q = Q.new;   POST { $q and $q.Done; }
            ...
        }

    More concisely, we can say:

        {
            my $p is post { .Done } = P.new;
            my $q is post { .Done } = Q.new;
            ...
        }

    RFC:

        try     { TryToFoo; }
        catch   { TryToHandle; }
        finally { TryToCleanUp; }
        catch   { throw Exception "Can't cleanly Foo."; }

    How I'd write that:

        try {
            try {
                TryToFoo;
                POST    { TryToCleanUp; }
                CATCH   { TryToHandle; }
            }
            CATCH   { throw Exception "Can't cleanly Foo."; }
        }

    That also more clearly indicates to the reader that the final CATCH governs the inner try completely, rather than just relying on ordering.
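
    The nesting translates fairly directly into Python, sketched here for comparison only. The function name try_to_foo and its parameters are hypothetical. The outer handler covers failures in the inner handler and in the cleanup, and the cleanup runs after the inner handler.

```python
def try_to_foo(foo, handle, log):
    """Run foo; on failure run handle; always clean up; wrap anything that escapes."""
    try:
        try:
            foo()
        except Exception:
            log.append("handle")
            handle()                 # may itself fail
        finally:
            log.append("cleanup")    # runs after the inner handler
    except Exception as err:
        raise RuntimeError("Can't cleanly Foo.") from err
```

    If handle raises, cleanup still runs, and the caller sees only the wrapping error with the original attached as its cause.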

    RFC:

    Instances of the actual (non-subclassed) Exception class itself are used for simple exceptions, for those cases in which one more or less just wants to say throw Exception "My message.", without a lot of extra tokens, and without getting into higher levels of the taxonomy of exceptions.

    die "My message." has much the same effect. I think fail "My message."  will also default similarly, though with return-or-throw semantics that depend on the caller's use fatal settings.
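
    A minimal Python sketch of return-or-throw semantics follows, assuming the caller's fatal setting is passed in explicitly as a flag. Failure, fail, and parse_port are hypothetical names. Either way the failing path ends: the error is raised, or it is handed back as a false value.

```python
class Failure(Exception):
    """Hypothetical exception object that tests false when merely returned."""

    def __bool__(self):
        return False


def fail(message, fatal):
    """Raise if fatal, else hand back a false Failure.

    Meant to be used as `return fail(...)`, so the caller's normal path
    ends in both modes.
    """
    failure = Failure(message)
    if fatal:
        raise failure
    return failure


def parse_port(text, fatal=False):
    """Illustrative caller: returns an int, or fails in the chosen style."""
    if not text.isdigit():
        return fail(f"not a port number: {text!r}", fatal)
    return int(text)
```

    With fatal left off, parse_port("http") returns a false object that still carries the message. With fatal=True the same call raises.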

    RFC (regarding on_raise):

    Derived classes may override this method to attempt to "handle" an exception or otherwise manipulate it, just before it is raised. If on_raise throws or returns true the exception is raised, otherwise it is not. An exception can be manipulated or replaced and then propagated in modified form simply by re-raising it in on_raise.

    Offhand, I don't see this one. Not only does it seem to be making the $SIG{__DIE__} mistake all over again, it also makes little sense to me to use "throw" to do something that doesn't throw. A throw should guarantee termination of control, or you're just going to run user code that wasn't expected to be run. It'd be like return suddenly not returning! Let's please use a different method to generate an unthrown exception. I think a fail method is the right approach--it terminates the control flow one way or another, even if just returning the exception as a funny-looking undef.

    The on_catch might be a bit more useful.

    RFC:

    ...because the authors are of the opinion that overloading else and continue with unwind semantics not traditionally associated with else and continue can be confusing, especially when intermixed with local flow-control forms of else and continue (which may be present in any { ... } block), or when an else die $@ is forgotten on a switch that needs to re-throw.

    CATCH will rethrow by default (unless there is a user-specified default).

    RFC:

    Some perl6-language-error discussions have suggested leaving out the try altogether, as in simply writing { } else { } to indicate non-local flow-control at work. Yikes!

    The try is not for Perl's sake. It's for the developer's sake. It says, watch out, some sort of non-local flow control is going on here. It signals intent to deal with action at a distance (unwinding semantics). It satisfies the first requirement listed under MOTIVATION.

    try {} is the new spelling of eval {}, so it can still be used when self-documentation is desired. It's often redundant, however, since I think the all-caps CATCH and POST also serve the purpose of telling the developer to "watch out". I expect that developers will get used to the notion that many subroutines will end with a CATCH block. And I'm always in favor of reducing the bracket count of ordinary code where practical. (That's why the package declaration has always had a bracketless syntax. I hope to do the same for classes and modules in Perl 6.)

    RFC:

    The comma or => in a conditional catch clause is required so the expression can be parsed from the block, in the fashion of Perl 5's parsing of: map <expression>, <list>; Without the comma, the form catch $foo { ... } could be a test for $foo or a test for $foo{...} (the hash element).

    We now require whitespace before a non-subscript block, so this is not much of a problem.

    RFC:

    How can we subclass Exception and control the class namespace? For example, if the core can use any Exception::Foo, where does one connect non-core Exceptions into the taxonomy? Possibly the core exceptions can derive from Exception::CORE, and everyone else can use the Exception::MyPackage convention.

    I don't think defining things as core vs non-core is very useful--"core" is not a fundamental type of exception. I do think the standard exception taxonomy should be extensible, so that non-standard exceptions can migrate toward being standard over time. I also think that modules and classes should have their own subpackage in which to store exceptions.

    RFC:

    How can we add new instance variables and methods to classes derived from Exception and control those namespaces? Perhaps this will be covered by some new Perl 6 object technology. Otherwise, we will need yet another naming scheme convention.

    Instance variables and methods in a derived class will not interfere with base classes (except by normal hiding of duplicate method names).

    RFC:

    What should the default values be for Exception object instance variables not specified to the constructor? For example, tag could default to file + line number.

    Depends on the constructor, I suspect.

    RFC:

    What assertions should be placed on the instance variables, if any?

    Probably depends on the class.

    RFC:

    What should stringification return?

    I lean towards just the message, with a different method for more info. But this is somewhat dependent on which representational methods we define for all Objects. And that has not been entirely thunk through.

    RFC:

    Mixed Flow Control

    Some of the reference texts, when discussing exception handling, refer to the matter that it may be difficult to implement a go to across an unwinding semantics block, as in:

            try { open F, $f } catch { next; }

    This matter will have to be referred to the internals experts. It's ok if this functionality is not possible, it can always be simulated with lexical state variables instead.

    However, the authors would very much prefer that gotos across unwinding boundaries would dwim. If that is not possible, hopefully some sort of compile-time warning could be produced.

    We can do this with special control exceptions that aren't caught until it makes sense to catch them. (Where exactly control exceptions fit in the class hierarchy is still open to debate.) In any event, there's no problem throwing a control exception from a CATCH, since any exception thrown in a CATCH or POST would propagate outside the current try block in any event.

    Ordinary goto should work as long as it's leaving the current try scope. Reentering the try somewhere in the middle via goto is likely not possible, or even desirable. A failed try should be re-entered from the top, once things have been cleared up. (If the try is a loop block, going to the next iteration out of its CATCH will probably be considered safe, just as if there had been an explicit try block within the loop. But I could be wrong on that.)

    RFC:

    Use %@ for Errors from Builtins

    RFC 151 proposes a mechanism for consolidating the information provided by $@, $!, $?, and $^E. In the opinion of the author of RFC 88, merging $@ and $! should not be undertaken, because $@ should only be set if an exception is raised.

    The RFC appears to give no justification for this last assertion. If we unify the error variables, die with no arguments can simply raise the current value of $!, and we stay object oriented all the way down. Then $! indicates the current error whether or not it's being thrown. It keeps track of its own state, as to whether it is currently in an "unclean" state, and refuses to throw away information unless it's clean.

    %@ should be used to hold this fault-hash, based on the following arguments for symmetry.
            $@    current exception
            @@    current exception stack
            %@    current core fault information
            $@[0]        same as $@
            $@{type}     "IO::File::NotFound"
            $@{message}  "can't find file"
            $@{param}    "/foo/bar/baz.dat"
            $@{child}    $?
            $@{errno}    $!
            $@{os_err}   $^E
            $@{chunk}    That chunk thingy in some msgs.
            $@{file}     Source file name of caller.
            $@{line}     Source line number of caller.

    %@ should not contain a severity or fatality classification.

    Every call to a core API function should clear %@ if it returns successfully.

    Internally, Perl can use a simple structured data type to hold the whole canonical %@. The code that handles reading from %@ will construct it out of the internal data on the fly.

    If use fatal; is in scope, then just before returning, each core API function should do something like: %@ and internal_die %@;

    The internal_die becomes the one place where a canonical Exception can be generated to encapsulate %@ just before raising an exception, whether or not the use of such canonical Exceptions is controlled by a pragma such as use exceptions;.

    This %@ proposal just looks like a bunch of unnecessary complication to me. A proto-exception object with methods can be just as easily (and lazily) constructed, and will map straight into a real exception, unlike this hash. And an object can always be used as a hash to access parameterless methods such as instance variable accessors.
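
    To make the "object that can also be read as a hash" idea concrete, here is a minimal sketch in Python rather than Perl 6 (whose syntax was still unsettled at this point). The class name Fault and its fields are hypothetical, chosen only to mirror the keys listed in the RFC; this is an analogy, not the planned implementation.

```python
class Fault(Exception):
    """Hypothetical proto-exception: built eagerly or lazily, raised only if wanted."""

    def __init__(self, message, **fields):
        super().__init__(message)
        self.message = message
        # Remaining fields (type, errno, param, ...) become plain attributes.
        for name, value in fields.items():
            setattr(self, name, value)

    def __getitem__(self, key):
        # Hash-style access forwards to the attribute of the same name.
        try:
            return getattr(self, key)
        except AttributeError:
            raise KeyError(key) from None


fault = Fault("can't find file", type="IO::File::NotFound",
              param="/foo/bar/baz.dat", errno=2)

# Read it the way the RFC would read its fault hash ...
print(fault["type"], fault["errno"])

# ... or raise the very same object as a real exception, with no conversion.
try:
    raise fault
except Fault as caught:
    print(caught is fault, caught.message)
```

    The point of the sketch is that one object serves both roles, so no separate step is needed to turn fault information into an exception.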

    RFC:

    eval

    The semantics of eval are, "clear $@ and don't unwind unless the user re-dies after the eval". The semantics of try are "unwind after try, unless any raised exception was cleanly and completely handled, in which case clear $@".

    In the author's opinion, both eval and try should exist in Perl 6. This would also mean that the legacy of examples of how to use eval in Perl will still work.

    And, of course, we still need eval $string.

    Discussions on perl6-language-errors have shown that some would prefer the eval { ... } form to be removed from Perl 6, because having two exception handling methods in Perl could be confusing to developers. This would in fact be possible, since the same effect can be achieved with:

            try { } catch { } # Clears $@.
            my $e;
            try { ... } catch { $e = $@; }
            # now process $e instead of $@

    On the other hand, eval is a convenient synonym for all that, given that it already works that way.

    I don't think the exact semantics of eval {...} are worth preserving. I think having bare try {...} assume a CATCH { default {} } will be close enough. Very few Perl 5 programs actually care whether $@ is set within the eval. Given that and the way we've defined $!, the translation from Perl 5 to Perl 6 involves simply changing eval {...} to try {...} and $@ to $! (which lives on as a "clean" exception after being caught by the try). Perhaps some attempt can be made to pull an external handler into an internal CATCH block.

    RFC:

    catch v/s else + switch

    Some participants in discussions on perl6-language-errors have expressed the opinion that not only should eval be used instead of try, but else should be used instead of multiple catch blocks. They are of the opinion that an else { switch ... } should be used to handle multiple catch clauses, as in:

            eval { ... }
            else {
                switch ($@) {
                    case $@->isa("Exception::IO") { ... }
                    case $@->my_method { ... }
                    }
                }

    The problem with else { switch ... } is: how should the code implicitly rethrow uncaught exceptions? Many proponents of this model think that uncaught exceptions should not be implicitly rethrown; one suggests that the programmer should undef $@ at the end of *every* successful case block, so that Perl re-raises any $@ still extant at the end of the else.

    This RFC allows a switch to be used in a catch { ... } clause, for cases where that approach would minimize redundant code in catch <expr> { ... } clauses, but with the mechanism proposed in this RFC, the switch functionality shown above can be written like this, while still maintaining the automatic exception propagation when no cases match:

            try { ... }
            catch Exception::IO => { ... }
            catch $@->my_method => { ... }

    The switch construct works fine, because the implied break of each handled case jumps over the default rethrow supplied by the CATCH. There's no reason to invent a parallel mechanism, and lots of reason not to.
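
    The control flow described here (each handled case leaves the handler, and anything unhandled falls through to a default rethrow) can be sketched in Python as an analogy. The exception classes and the run and raises helpers are hypothetical names for illustration only.

```python
class IOFault(Exception):
    pass


class ParseFault(Exception):
    pass


def run(block):
    """Run block; handle the cases we recognise and re-raise everything else."""
    try:
        return block()
    except Exception as err:
        # Each matching case returns, which skips the rethrow at the bottom.
        if isinstance(err, IOFault):
            return "handled io"
        if isinstance(err, ParseFault):
            return "handled parse"
        raise  # no case matched: the default rethrow


def raises(exc):
    """Build a block that raises the given exception when called."""
    def block():
        raise exc
    return block


print(run(raises(IOFault("disk"))))
print(run(raises(ParseFault("syntax"))))
```

    An exception of any other class passes straight through run, which is the automatic propagation the RFC wanted, obtained without a second dispatch mechanism.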

    RFC:

    Mechanism Hooks

    In the name of extensibility and debugging, there should be hooks for callbacks to be invoked when a try, catch, or finally block is entered or exited, and when a conditional catch is evaluated. The callbacks would be passed information about what is happening in the context they are being called from.

    In order to scope the effect of the callbacks (rather than making them global), it is proposed that the callbacks be specified as options to the try statement, something like this:

        try on_catch_enter => sub { ... },
            on_catch_exit  => sub { ... },
        {
            ...
            }

    The (dynamic, not lexical) scope of these callbacks is from their try down through all trys nested under it (until overridden at a lower level). Nested callbacks should have a way of chaining to callbacks that were in scope when they come into scope, perhaps by including a reference to the outer-scope callback as a parameter to the callback. Basically, they could be kept in "global" variables overridden with local.

    Yuck. I dislike cluttering up the try syntax with what are essentially temp assignments to dynamically scoped globals. It should be sufficient to say something like:

        {
            temp &*on_catch_enter = sub { ... };
            temp &*on_catch_exit  = sub { ... };
            ...
        }

    provided, of course, the implementation is smart enough to look for those hooks when it needs them.
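
    As a rough analogy for temp on a dynamically scoped hook, the following Python sketch saves a global hook, overrides it for the duration of a block, and restores it afterwards. The hooks table, the temp helper, and guarded are all hypothetical; Perl 6's temp is a language feature, not a library call like this.

```python
from contextlib import contextmanager

hooks = {"on_catch_enter": None}  # hypothetical global hook table


@contextmanager
def temp(table, key, value):
    """Override table[key] for the duration of a with-block, then restore it."""
    saved = table[key]
    table[key] = value
    try:
        yield saved  # hand back the outer hook so an inner hook could chain to it
    finally:
        table[key] = saved


log = []


def guarded(block):
    """Stand-in for a try scope that looks for the hook only when it needs it."""
    try:
        block()
    except Exception as err:
        hook = hooks["on_catch_enter"]
        if hook is not None:
            hook(err)


def boom():
    raise RuntimeError("boom")


with temp(hooks, "on_catch_enter", lambda err: log.append(str(err))):
    guarded(boom)   # the hook is visible to everything called from here

guarded(boom)       # hook restored to None, so nothing is logged
print(log)          # ['boom']
```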

    RFC:

    Mixed-Mode Modules

    Authors of modules who wish to provide a public API that respects the current state of use fatal; if such a mechanism is available, can do so as follows.

    Internal to their modules, authors can use lexically scoped use fatal; to explicitly control whether or not they want builtins to raise exceptions to signal errors.

    Then, if and only if they want to support the other style, and only for public API subroutines, they do something like one of these:

    • Use return internally, now add support for throw at API:
           sub Foo
           {
              my $err_code = ... ; # real code goes here
              # Replace the old return $err_code with this:
              return $err_code unless $FATAL_MODE && $err_code != $ok;
              throw Error::Code "Couldn't Foo.", code => $err_code;
              }

    • Use throw internally, add support for return at API:
           sub Foo
           {
              try {
                  # real code goes here, may execute:
                  throw Exception "Couldn't foo.", code => $err_code;
                  }
              catch !$FATAL_MODE => { return $@->{code}; }
              return $ok;
              }

    Yow. Too much mechanism. Why not just:

        return proto Exception "Couldn't foo.", code => $err_code;

    The proto method can implement the standard use fatal semantics when that is desired by the calling module, and otherwise set things up so that

        Foo() or die;

    ends up throwing the proto-exception. (The current proto-exception can be kept in $! for use in messages, provided it's in thread-local storage.)

    Actually, this is really important to make simple. I'd be in favor of a built-in that clearly says what's going on, regardless of whether it ends in a throw or a return of undef:

        fail "Couldn't foo", errno => 2;

    Just as an aside, it could be argued that all such "built-ins" are really methods on an implicit class or object. In this case, the Exception class...
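
    A minimal sketch of the fail idea, in Python as an analogy: one call records the error and then either raises it or returns an undefined value, depending on the caller's preference. Here a module-level FATAL flag stands in for the lexically scoped use fatal, which is a simplification; the names Failure, last_error, and foo are hypothetical.

```python
FATAL = False      # hypothetical stand-in for the caller's "use fatal" setting
last_error = None  # plays the role of $! in this sketch


class Failure(Exception):
    def __init__(self, message, **info):
        super().__init__(message)
        self.info = info


def fail(message, **info):
    """Record the failure, then raise it or return None according to FATAL."""
    global last_error
    last_error = Failure(message, **info)
    if FATAL:
        raise last_error
    return None


def foo(ok):
    if not ok:
        return fail("Couldn't foo", errno=2)
    return "result"


value = foo(False)
if value is None:
    print("returned None:", last_error, last_error.info["errno"])

FATAL = True
try:
    foo(False)
except Failure as err:
    print("raised:", err)
```

    The body of foo is identical in both modes, which is the simplification being argued for.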

    RFC:

    $SIG{__DIE__}

    The try, catch, and finally clauses localize and undef $SIG{__DIE__} before entering their blocks. This behavior can be removed if $SIG{__DIE__} is removed.

    $SIG{__DIE__} must die. At least, that name must die--we may install a similar global hook for debugging purposes.

    RFC:

    Legacy

    The only changes in respect of Perl 5 behaviour implied by this RFC are that (1) $@ is now always an Exception object (which stringifies reasonably), it is now read-only, and it can only be set via die, and (2) the @@ array is now special, and it is now read-only too.

    Perhaps $! could be implicitly declared to have a type of Exception. But I see little reason to make $! readonly by default. All that does is prevent clever people from doing clever things that we haven't thought of yet. And it won't stop stupid people from doing stupid things. In any event, $! is just a reference to an object, and access to the object will be controlled by the class, not by Perl.

    Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 04 for the latest information.

    RFC 199: Short-circuiting built-in functions and user-defined subroutines

    First I should note in passing that it is likely that

        my ($found) = grep { $_ == 1 } (1..1_000_000);

    will be smart enough to stop on the first one without additional hints, since the left side will only demand one value of the right side.
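
    The same demand-driven behavior can be seen in Python generators, offered here only as an analogy for what a lazy grep would do. The counter shows how many candidates were actually tested before the single requested value was produced.

```python
checked = 0


def is_one(n):
    global checked
    checked += 1
    return n == 1


# The generator is lazy: next() asks for one value, so filtering stops there.
found = next(n for n in range(1, 1_000_001) if is_one(n))
print(found, checked)  # prints "1 1"
```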

    However, we do need to unify the behaviors of built-ins with user-defined control structures. From an internal point of view, all of these various ways of exiting a block will be unified as exceptions.

    It will be easy enough for a user-defined subroutine to catch the appropriate exceptions and do the right thing. For instance, to implement a loop wrapper (ignoring parser issues), you might write something like this:

        sub mywhile ($keyword, &condition, &block) {
            my $l = $keyword.label;
            while (&condition()) {
                &block();
                CATCH {
                    my $t = $!.tag;
                    when X::Control::next { die if $t && $t ne $l; next }
                    when X::Control::last { die if $t && $t ne $l; last }
                    when X::Control::redo { die if $t && $t ne $l; redo }
                }
            }
        }

    Remember that those die calls are just rethrows of the current exception to get past the current try scope (the while in this case).
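
    The label-matching logic can be tried out in a runnable form. This Python sketch is an analogy, not Perl 6: Next and Last are hypothetical exception classes standing in for the control exceptions, redo is omitted for brevity, and the label is passed in as a plain string.

```python
class Control(Exception):
    def __init__(self, tag=None):
        super().__init__(tag)
        self.tag = tag


class Next(Control):
    pass


class Last(Control):
    pass


def mywhile(label, condition, block):
    """User-defined loop that responds to Next/Last aimed at it (or untagged)."""
    while condition():
        try:
            block()
        except Next as ctl:
            if ctl.tag and ctl.tag != label:
                raise          # meant for an outer loop: rethrow
            continue
        except Last as ctl:
            if ctl.tag and ctl.tag != label:
                raise
            break


seen = []
i = [0]


def outer_body():
    i[0] += 1
    j = [0]

    def inner_body():
        j[0] += 1
        if j[0] == 2:
            raise Next()           # untagged: handled by the nearest loop
        if i[0] == 2:
            raise Last("OUTER")    # tagged: passes through the inner loop
        seen.append((i[0], j[0]))

    mywhile("INNER", lambda: j[0] < 3, inner_body)


mywhile("OUTER", lambda: i[0] < 5, outer_body)
print(seen)  # [(1, 1), (1, 3)]
```

    The tagged Last is rethrown by the inner loop because the tag does not match its label, and is then consumed by the outer loop, which is exactly the job of the die calls above.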

    How a block gets a label in general is an interesting question. It's all very well to say that the keyword is the label, but that doesn't help if you have two nested constructs with the same name. In Perl 5, labels are restricted to being at the beginning of the statement, but then how do you label a grep? Should there be some way of specifying a label on a keyword rather than on a statement? We could end up with something like this:

        my $found = grep:NUM { $_ == 1 and last NUM: $_ } (1..1_000_000);

    On the other hand, considering how often this feature is (not) going to be used, I think we can stick with the tried-and-true statement label:

        my $found = do { NUM: grep { $_ == 1 and last NUM: $_ } (1..1_000_000) };

    This has the advantage of matching the label syntax with a colon on the end in both places. I like that.

    I don't think every block should implicitly have a way to return, or we'll have difficulty optimizing away blocks that don't do anything blockish. That's because setting up a try environment is always a bit blockish, and does in fact impose some overhead that we'd just as soon avoid when it's unnecessary.

    However, it's probably okay if certain constructs that would know how to deal with a label are implicitly labelled by their keyword name when they don't happen to have an explicit label. So I think we can allow something like:

        last grep: $_

    Despite its appearance, that is not a method call, because grep is not a predefined class. What we have is a unary operator last that is taking an adverbial modifier specifying what to return from the loop.

    The interesting policy question as we go on will be whether a given construct responds to a given exception or not. Some exceptions will have to be restricted in their use. For instance, we should probably say that only explicit sub declarations may respond to a return. People will expect return to exit the subroutine they think they're in, even if there are blocks floating around that are actually closures being interpreted elsewhere. It might be considered antisocial for closure interpreters like grep or map or sort to trap X::Control::return sooner than the user expects.

    As for using numbers instead of labels to indicate how many levels to break out of, that would be fine, except that I don't believe in breaking out by levels. If the problem is complex enough that you need to break out more than one level, you need a name, not a number. Then it doesn't matter if you refactor your code to have more block levels or less. I find I frequently have to refactor my code that way.

    It's possible to get carried away and retrofit grep and map with every conceivable variety of abort, retry, accept, reject, reduce, reuse, recycle, or whatever exception. I don't think that's necessary. There has to be some reason for writing your own code occasionally. If we get rid of all the reasons for writing user-defined subroutines, we might as well pack our bags and go home. But it's okay at minimum to treat a looping construct like a loop.

    RFC 006: Lexical variables made default

    This RFC proposes that strict vars should be on by default. This is motivated by the desire that Perl better support (or cajole, in this case) the disciplines that enable successful programming in the large. This goal is laudable.

    However, the programming-in-the-small advocates also have a valid point: they don't want to have to go to all the trouble of turning off strictures merely to write a succinct one-liner, since keystrokes are at a premium in such programming, and in fact the very strictures that increase clarity in large programs tend to decrease clarity in small programs.

    So this is one of those areas where we desire to have it both ways, and in fact, we pretty much can. The only question is where to draw the line. Some discussion suggested that only programs specified on the command line via the -e switch should be exempt from stricture. But I don't want to force every little file-based script into the large model of programming. And we don't need to.

    Large programming requires the definition of modules and classes. The typical large program will (or should) consist mostly of modules and classes. So modules and classes will assume strict vars. Small programming does not generally require the definition of modules and classes, though it may depend on existing modules and classes. But even small programs that use a lot of external modules and classes may be considered throw-away code. The very fact that the main code of a program is not typically reused (in the sense that modules and classes are reused) means that that is where we should draw the line. So in Perl 6, the main program will not assume strict vars, unless you explicitly do something to turn it on, such as to declare "class Main".

    RFC 330: Global dynamic variables should remain the default

    This is fine for the main program, but modules and classes should be held to the higher standard of use strict.

    RFC 083: Make constants look like variables

    It's important to keep in mind the distinction between variables and values. In a pure OO environment, variables are merely references to values, and have no properties of their own--only the value itself would be able to say whether it is constant. Some values are naturally constant, such as a literal string, while other values could be marked constant, or created without methods that can modify the object, or some such mechanism. In such an environment, there is little use for properties on variables. Any time you put a property on a variable, it's potentially lying about its value.

    However, Perl does not aspire to be a pure OO environment. In Perl-think, a variable is not merely a container for a value. Rather, a variable provides a "view" of a value. Sometimes that view could even be construed as a lie. That's okay. Lying to yourself is a useful survival skill (except when it's not). We find it necessary to repeat "I think I can" to ourselves precisely when we think we can't. Conversely, it's often valuable psychologically to treat possible activities as forbidden. Abstinence is easier to practice if you don't have to decide anew every time there's a possible assignation, er, I mean, assignment.

    Constant declarations on variables fall into this category. The value itself may or may not naturally be constant, but we will pretend that it is. We could in theory go farther than that. We could check the associated object to make sure that it is constant, and blow up if it's not, but that's not necessary in this case for consistent semantics. Other properties may be stricter about this. If you have a variable property that asserts a particular shape of multidimensional array, for instance, the object in question had better be able to supply semantics consistent with that view, and it's probably a good idea to blow up sooner rather than later if it can't. This is something like strong typing, except that it's optional, because the variable property itself is optional.

    Nevertheless, the purpose of these variable properties is to allow the compiler to deduce things about the program that it could not otherwise deduce, and based on those deductions, produce both a more robust and more efficient compile-time interpretation of the semantics of the program. That is to say, you can do more optimizations without compromising safety. This is obviously true in the case of inlining constants, but the principle extends to other variable properties as well.

    The proposed syntax is fine, except that we'll be using is instead of : for properties, as discussed in Apocalypse 2. (And it's constant, not const.)

    RFC 337: Common attribute system to allow user-defined, extensible attributes

    As already revealed in Apocalypse 2, attributes will be known as "properties" in Perl 6, to avoid confusion with existing OO nomenclature for instance variables. Also, we'll use the is keyword instead of the colon.

    Setting properties on array and hash elements bothers me, particularly when those properties have names like "public" and "private". This seems to me to be an attempt to paper over the gap of some missing OO functionality. So instead, I'd rather keep arrays and hashes mostly for homogeneous data structures, and encourage people to use objects to store data of differing types. Then public and private can be properties of object attributes, which will look more like real variables in how they are declared. And we won't have to worry about the meaning of my @foo[2], because that still won't be allowed.

    Again, we need to be very clear that the object representing the variable is different than any objects contained by the variable. When we say

        my Dog @dogpound is loud;

    we mean that the individual elements of @dogpound are of type Dog, not that the array variable is of type Dog. But the loud property applies to the array, not to the dogs in the array. If the array variable needs to have a type, it can be supplied as if it were a property:

        my Dog @dogpound is DogPound is loud;

    That is, if a property is the name of a known package/class, it is taken to be a kind of tie. Given the declaration above, the following is always true:

        @dogpound.is.loud

    since the loud is a property of the array object, even if it contains no dogs. It turns out that

        @dogpound.is.DogPound

    is also true. This does not do an isa lookup. For that, say:

        @dogpound.isa(Pound)

    Note that you can use:

        @dogpound =~ Dog

    to test the individual elements for Doghood.

    RFC 173: Allow multiple loop variables in foreach statements

    Unfortunately, the proposed syntax could also be interpreted as parallel traversal:

      foreach ($a, $b) (@a, @b)

    Also the RFC assumes pairs will be passed as two elements, which is no longer necessarily the case. A hash by itself in list context will return a list of pair objects. We'll need to say something like:

        %hash.kv

    to get a flattened list of keys alternating with values. (The same method on arrays produces alternating indices and values.)
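
    To illustrate the shape of the list that .kv is described as returning, here is a small Python analogy. The kv function is hypothetical and only mimics the flattening; it says nothing about how Perl 6 implements the method.

```python
from itertools import chain


def kv(container):
    """Flatten a mapping or a sequence into alternating keys/indices and values."""
    if isinstance(container, dict):
        pairs = container.items()
    else:
        pairs = enumerate(container)
    return list(chain.from_iterable(pairs))


print(kv({"a": 1, "b": 2}))  # ['a', 1, 'b', 2]
print(kv(["x", "y"]))        # [0, 'x', 1, 'y']
```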

    I like the idea of this RFC, but the proposed syntax is not what I'd like. There are various possible syntaxes that could also potentially fulfill the intent of RFC 120:

        for [$i => $elem] (@array) { }
        for {$i => $elem} (@array) { }
        for ($i, $elem) = (@array.kv) { }

    But I like the idea of something that feels like repeated binding. We could use the := binding operator, but since binding is actually the operation performed by formal parameters of subroutines, and since we'd like to keep the list near the for and the formals near the closure, we'll use a variant of subroutine declaration to declare for loops:

        for @list -> $x { ... }         # one value at a time
        for @list -> $a, $b { ... }     # two values at a time

    You can un-interleave an array by saying:

        for @xyxyxy -> $x, $y { ... }
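
    Taking values several at a time from one list can be sketched in Python as follows. The take helper is a hypothetical name, and one behavior here is an assumption of the sketch rather than something the text specifies: a trailing incomplete group is silently dropped.

```python
def take(values, n):
    """Yield successive n-tuples from values by pulling n times from one iterator."""
    it = iter(values)
    return zip(*([it] * n))


xyxyxy = [1, 10, 2, 20, 3, 30]
xs, ys = [], []
for x, y in take(xyxyxy, 2):
    xs.append(x)
    ys.append(y)

print(xs, ys)  # [1, 2, 3] [10, 20, 30]
```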

    Iterating over multiple lists in parallel needs a syntax much like a multi-dimensional slice. That is, something like a comma that binds looser than a comma. Since we'll be using semicolon for that purpose to delimit the dimensions of multi-dimensional slices, we'll use similar semicolons to delimit a parallel traversal of multiple lists. So parallel arrays could be stepped through like this:

        for @xxx; @yyy; @zzz -> $x; $y; $z { ... }

    If there are semicolons on the right, there must be the same number as on the left.

    Each "stream" is considered separately, so you can traverse two arrays each two elements at a time like this:

        for @ababab; @cdcdcd -> $a, $b; $c, $d { ... }

    If there are no semicolons on the right, the values are taken sequentially across the streams. So you can say

        for @aaaa; @bbbb -> $a, $b { ... }

    and it ends up meaning the same thing as if the comma were a semicolon, but only because the number of variables on the right happens to be the same as the number of streams on the left. That doesn't have to be the case. To get values one at a time across three streams, you can say

        for @a; @b; @c -> $x { ... }
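
    The difference between semicolons on the right (one variable per stream) and no semicolons (values taken sequentially across the streams) can be sketched in Python. This is an analogy only; it assumes streams of equal length, and zip stopping at the shortest stream is a property of the sketch, not something the text specifies.

```python
from itertools import chain

xxx, yyy, zzz = [1, 2], ["a", "b"], [True, False]

# Like: for @xxx; @yyy; @zzz -> $x; $y; $z { ... }
# Each pass binds one value from every stream.
parallel = [(x, y, z) for x, y, z in zip(xxx, yyy, zzz)]

# Like: for @a; @b; @c -> $x { ... }
# Same round-robin order across the streams, but one value per pass.
one_at_a_time = list(chain.from_iterable(zip(xxx, yyy, zzz)))

print(parallel)       # [(1, 'a', True), (2, 'b', False)]
print(one_at_a_time)  # [1, 'a', True, 2, 'b', False]
```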

    Each semicolon delimited expression on the left is considered to be a list of generated values, so it's perfectly legal to use commas or "infinite" ranges on the left. The following prints "a0", "b1", "c2", and so on forever (or at least for a very long time):

        for 0 .. Inf; "a" .. "z" x 1000 -> $i; $a {
            print "$a$i";
        }
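
    Pairing an unbounded counter with a second stream is easy to try in Python, shown here as an analogy. Only the first three iterations are taken, so the plain lowercase alphabet is used as a stand-in for the second stream; the sketch does not attempt to model the full right-hand range.

```python
from itertools import count, islice
from string import ascii_lowercase

# count(0) never ends; zip simply pulls from both streams in step.
pairs = zip(count(0), ascii_lowercase)
first = [f"{a}{i}" for i, a in islice(pairs, 3)]
print(first)  # ['a0', 'b1', 'c2']
```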

    RFC 019: Rename the local operator

    We'll go with temp for the temporizing operator.

    In addition, we're going to be storing more global state in objects (such as file objects). So it ought to be possible to temporize (that is, checkpoint/restore) an attribute of an object, or at least any attributes that can be treated as an lvalue.

    RFC 064: New pragma 'scope' to change Perl's default scoping

    I can't stop people from experimenting, but I'm not terribly interested in performing this experiment myself. I made my short for a reason. So I'm accepting this RFC in principle, but only in principle. Standard Perl declarations will be plainly marked with my or our.

    Rejected RFCs

    Just because I've rejected these RFCs doesn't mean that they weren't addressing a valid need. Usually an RFC gets rejected simply because I think there's a better way to do it. Often there's little difference between a rejected RFC that I've borrowed ideas from and an RFC accepted with major caveats.

    We're already running long, so these descriptions will be terse. Please read the RFC if you don't understand the commentary.

    RFC 089: Controllable Data Typing

    This is pretty close to what we've been planning for Perl for a long time. However, a number of the specifics are suboptimal.

    If you declare a constant, it's a constant. There's no point in allowing warnings on that by default. It should be fatal to modify a constant. Otherwise you lose all your optimization possibilities.

    For historical reasons, the assignment in

         my ($a, $b) = new Foo;

    will not distribute automatically over $a and $b. If you want that, use the ^= hyperassignment instead, maybe.

    Constraint lists are vaguely interesting, but seem to be too much mechanism for the possible benefits. If you really want a data type that can be polymorphic, why not just define a polymorphic type?

    In general, there seems to be a lot of confusion in this RFC between constraints on variables and constraints on values. For constraints to be useful to the compiler, they have to be on the variable, and you can't be "pushing" constraints at runtime.

    On aliasing via subroutine calls, note that declared parameters will be constant by default.

    So anyway, although I'm rejecting this RFC, we'll certainly have a declaration syntax resembling some of the tables in the RFC.

    RFC 106: Yet another lexical variable proposal: lexical variables made default

    Yes, it's true that other widely-admired languages like Ruby do implicit declaration of lexicals, but I think it's a mistake, the results of which don't show up until things start getting complicated. (It's a sign of this weakness that in Ruby you see the workaround of faking up an assignment to force declaration of a variable.)

    I dislike the implicit declaration of lexicals because it tends to defeat the primary use of them, namely, catching typos. It's just too easy to declare additional variable names by accident. It's also too easy to broaden the scope of a variable by accident. You might have a bunch of separate subroutines each with their own lexical, and suddenly find that they're all the same variable because you accidentally used the same variable name in the module initialization code.
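
    The typo problem is easy to reproduce in any language that declares variables implicitly on assignment. Python is one such language, so this short sketch (with hypothetical names) shows a misspelling that silently creates a second variable instead of being reported.

```python
def total_implicit(values):
    total = 0
    for v in values:
        totl = total + v  # typo for "total": quietly declares a new variable
    return total


print(total_implicit([1, 2, 3]))  # 0, not 6, and no error is raised
```

    With a required declaration keyword, the misspelled name would have no declaration and could be rejected at compile time.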

    When you think about it, requiring my on declaration is a form of orthogonality. Otherwise you find your default scoping rules arbitrarily tied to an inner scope, or an outer scope, or a subroutine scope. All of these are suboptimal choices. And I don't buy the notion of using my optionally to disambiguate when you feel like it. Perl gives you a lot of rope to hang yourself with, but this is the wrong kind of rope, because it obscures a needful visual distinction. Declarations should look like declarations, not just to the programmer, but also to whoever has to read the program after them, whether carbon-based or silicon-based.

    And when it comes down to it, I believe that declarations with my are properly Huffman encoded. Declaring a lexical ought to be harder than assigning to one. And declaring a global ought to be harder than declaring a lexical (at least within classes and modules).

    RFC 119: Object neutral error handling via exceptions

    Good goals, but I don't want yet another independent system of exception handling. Simplicity comes through unification. Also, the proposed syntax is all just a little too intertwingled for my tastes. Let's see, how can I explain what I mean?

    The out-of-band stuff doesn't stand out visually enough to me, and I don't like thinking about it as control flow. Nevertheless, I think that what we've ended up with solves a number of the problems pointed out in this RFC. The RFC essentially asks for the functionality of POST, KEEP and UNDO at a statement level. Although POST, KEEP, and UNDO blocks cannot be attached to any statement, I believe that allowing post, keep, and undo properties in scoped declarations is powerful enough, and gives the compiler something tangible to attach the actions to. There is a kind of precision in attaching these actions to a specific variable--the state is bound to the variable in a transactionally instantaneous way. I'm afraid if we attach transactional actions to statements as the RFC proposes, it won't be clear exactly when the statement's state change is to be considered successful, since the transaction can't "know" which operation is the crucial one.

    Nonetheless, some ideas from this RFC will live on in the post, keep, and undo property blocks.

    RFC 120: Implicit counter in for statements, possibly $#.

    I am prejudiced against this one, simply because I've been burned too many times by implicit variables that mandate implicit overhead. I think if you need an index, you should declare one, so that if you don't declare one, the compiler knows not to bother setting up for it.

    Another problem is that people will keep asking what

        for (@foo,@bar) { print $# }

    is supposed to mean.

    I expect that we'll end up with something more like what we discussed earlier:

        for @array.kv -> $i, $elem { ... }

    RFC 262: Index Attribute

    Everyone has a use for : these days...

    This one seems not to be of very high utility, suffering from similar problems as the RFC 120 proposal. I don't think it's possible to efficiently track the container of a value within each contained object unless we know at compile time what a looping construct is, which is problematic with user-defined control structures.

    And what if an item is a member of more than one list?

    Again, I'd rather have something declared so we know whether to take the overhead. Then we don't have to pessimize whenever we can't do a complete static analysis.

    RFC 167: Simplify do BLOCK Syntax

    I think the "do" on a do block is useful to emphasize that the closure in the braces is to be executed immediately. Otherwise Perl (or the user (or both)) might be confused as to whether someone was trying to write a closure that is to be executed later, particularly if the block is the last item in a subroutine that might be wanting to return a closure. In fact, we'll probably outlaw bare blocks at the statement level as too ambiguous. Use for 1 {} or some such when you want a one-time loop, and use return or sub when you want to return a closure.

    We'll solve the ; problem by jiggering the definition of {...}, not by fiddling with do.

    RFC 209: Fuller integer support in Perl.

    The old use integer pragma was a hack. I think I'd rather use types and representation specs on individual declarations for compile-time selection, or alternate object constructors for run-time selection, particularly when infinite precision is desired. I'm not against using pragmas to alter the defaults, but I think it's generally better to be more specific when you have the capability. You can force your programs to be lexically scoped with pragmas, but data wants to flow wherever it likes to go, so your lexically scoped module had better be able to deal rationally with any data thrown at it, even if it isn't in the exact form that you prefer.

    By the way, the RFC is misleading when it asserts that 32-bit integer precision is lost when represented in floating point. That's only true if you use 32-bit floats. Perl has always used 64-bit doubles, which give approximately 15 digits of integer precision. (The issue does arise with 64-bit integers, of course.)
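
    The precision claim can be checked directly. Python floats are 64-bit doubles, and the struct module can round-trip a value through a 32-bit float; the helper name roundtrip_f32 is hypothetical.

```python
import struct


def roundtrip_f32(n):
    """Store n in a 32-bit float and read it back as an integer."""
    return int(struct.unpack("<f", struct.pack("<f", float(n)))[0])


big32 = 2**32 - 1

# Every 32-bit integer survives a trip through a 64-bit double.
print(float(big32) == big32)            # True

# A 32-bit float has a 24-bit significand, so 2**24 + 1 is already lost.
print(roundtrip_f32(2**24 + 1))         # 16777216

# A double has a 53-bit significand, so 64-bit integers can lose precision.
print(float(2**53 + 1) == 2**53 + 1)    # False
```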

    All that being said, Perl 6 will certainly have better support for integer types of various sorts. I just don't think that a pragma redefining what an "integer" is will provide good documentation to whoever is trying to understand the program. Better to declare things of type MagicNum, or whatever.

    I could be wrong, of course. If so, write your pragma, and have the appropriate amount of fun.

    RFC 279: my() syntax extensions and attribute declarations

    We already treated this in Apocalypse 2.

    The RFC assumes that the type always distributes over a my list. This is not what is necessary for function signatures, which need individual types for each formal argument.

    And again, it doesn't make much sense to me to put properties on a variable at run-time.

    It makes even less sense to me to be able to declare the type of an array element lexically. This is the province of objects, not arrays pretending to be structs.

    RFC 297: Attributes for compiler hints

    Sorry, we can't have the semantics suddenly varying drastically merely because the user decided to run the program through a different translator. I think there's a happy medium in there somewhere where we can have the same semantics for both interpreter and compiler.

    RFC 309: Allow keywords in sub prototypes

    This RFC is rejected only because it doesn't go far enough. What we'll eventually need is to allow a regex-ish syntax notation for parsing that may be separate from the argument declarations. (Then again, maybe not.) In any event, I think some kind of explicit regex notation is called for, not the promotion of identifiers to token matchers. We may want identifiers in signatures for something else later, so we'll hold them in reserve.

    RFC 340: with takes a context

    This seems like a solution in search of a problem. Even if we end up with a context stack as explicit as Perl 5's, I don't think the amount we'll deal with it warrants a keyword. (And I dislike "return with;" as a needlessly opaque linguistic construct.)

    That being said, if someone implements (as user-defined code) the Pascalish with as proposed in RFC 342 (and rejected), and if the caller function (or something similar) returns sufficient information to build references to the lexical scope associated with the call frame in question, then something like this could also be implemented as user code. I can't decide whether it's not clear that this is a good idea, or it's clear that this is not a good idea. In any event, I would warn anyone doing this that it's likely to be extremely confusing, akin to goto-considered-harmful, and for similar reasons, though in this case by displacing scopes rather than control flow.

    Note that some mechanism resembling this will be necessary for modules to do exportation to a lexical scope (see %MY in Apocalypse 2). However, lexical scope modification will be allowed only during the compile time of the lexical scope in question, since we need to be careful to preserve the encapsulation that lexical scoping provides. Turning lexical variables back into dynamic variables will tend to destroy that security.

    So I think we'll stick with closures and continuations that don't transport lexical scopes at runtime.

    RFC 342: Pascal-like "with"

    I expect Perl's parsing to be powerful enough that you could write a "with" if you wanted one.

    Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 04 for the latest information.

    Withdrawn RFCs

    RFC 063: Exception handling syntax

    RFC 113: Better constants and constant folding


    Other decisions

    C-style for loop

    Due to syntactic ambiguities with the new for syntax of Perl 6, the generalized C-style for loop is going to get its keyword changed to loop. And for will now always mean "foreach". The expression "pill" is now optional, so instead of writing an infinite loop like this:

        for (;;) {
            ...
        }

    you can now write it like this:

        loop {
            ...
        }

    C-style do {} while EXPR no longer supported

    In Perl 5, when you used a while statement modifier on a statement consisting of nothing but a do {}, something magical happened, and the block would be evaluated once before the condition was evaluated. This special-cased construct, seldom used and often misunderstood, will no longer be in Perl 6, and in fact will produce a compile-time error to prevent people from trying to use it. Where Perl 5 code has this:

        do {
            ...
        } while CONDITION;

    Perl 6 code will use a construct in which the control flow is more explicit:

        loop {
            ...
            last unless CONDITION;
        }

    Bare blocks

    In Perl 5, bare blocks (blocks used as statements) are once-through loops. In Perl 6, blocks are closures. It would be possible to automatically execute any closure in void context, but unfortunately, when a closure is used as the final statement in an outer block, it's ambiguous as to whether you wanted to return or execute the closure. Therefore the use of a closure at the statement level will be considered an error, whether or not it's in a void context. Use do {} for a "once" block, and an explicit return or sub when you want to return a reference to the closure.

    continue block

    The continue block changes its name to NEXT and moves inside the block it modifies, to work like POST blocks. Among other things, this allows NEXT blocks to refer to lexical variables declared within the loop, provided the NEXT block is placed after them. The generalized loop:

        loop (EXPR1; EXPR2; EXPR3) { ... }

    can now be defined as equivalent to:

        EXPR1;
        while EXPR2 {
            NEXT { EXPR3 }
            ...
        }

    (except that any variable declared in EXPR3 would have different lexical scope). The NEXT block is called only before attempting the next iteration of the loop. It is not called when the loop is done and about to exit. Use a POST for that.
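
    For comparison, here is the Perl 5 construct that NEXT is meant to replace. This is a small sketch in Perl 5, not Perl 6, and the variable names are only illustrative. It shows a C-style for loop and the equivalent while loop with a continue block, including the fact that next still runs the increment step.

```perl
use strict;
use warnings;

# C-style loop: for (EXPR1; EXPR2; EXPR3) { ... }
my @from_for;
for (my $i = 0; $i < 5; $i++) {
    next if $i % 2;            # skip odd values; $i++ still runs
    push @from_for, $i;
}

# The same loop written as while/continue.
my @from_while;
my $j = 0;                     # EXPR1
while ($j < 5) {               # EXPR2
    next if $j % 2;            # 'next' jumps to the continue block
    push @from_while, $j;
}
continue {
    $j++;                      # EXPR3, run before each new iteration
}

print "for:   @from_for\n";
print "while: @from_while\n";
```

    In Perl 5 the continue block sits outside the loop body, so it cannot see lexicals declared inside the body; moving the step inside the block, as NEXT does, removes that restriction.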

    Well, that about wraps it up for now. You might be interested to know that I'm posting this from the second sesquiannual Perl Whirl cruise, on board the Veendam, somewhere in the Caribbean. If the ship disappears in the Bermuda Triangle, you won't have to worry about the upcoming Exegesis, since Damian is also on board. But for now, Perl 6 is cruising along, the weather's wonderful, wish you were here.

    Creating Custom Widgets

    In this Perl/Tk article, I'll discuss balloon help, photos and widget subclassing. Help balloons can be attached to widgets, menu items, and, as we'll see here, individual canvas items. Subclassing a Perl/Tk widget is also known as creating a derived (mega) widget. For this article, I'll presume basic knowledge of mega widgets. If the subject is new to you, or if there are points you don't understand, then please read Mastering Perl/Tk, Chapter 14, Creating Custom Widgets in Pure Perl/Tk, for complete details. Photos are described in Chapter 17, Images and Animations, and balloon help is discussed in Chapter 23, Plethora of pTk Potpourri.

    We are going to develop a color picker, a window that allows us to select a color that we might use to configure an application. This widget differs from most other color pickers you've seen because our palette is a box of crayons.

    Tk::CrayolaCrayonColorPicker is a Tk::DialogBox-derived widget that allows a user to select a color from a photo of a box of 64 Crayola crayons. Nominally, one positions the cursor over the desired crayon and clicks button-1, whereupon the RGB values of the pixel under the cursor are returned. However, in reality, one can click anywhere over the photo.

    Balloon help is provided, so that if the cursor lingers over a crayon, then a balloon pops up, displaying the crayon's actual color - for instance, "robin's egg blue."

    Because Tk::CrayolaCrayonColorPicker is a subclass of Tk::DialogBox, the widget can have one or more buttons, with the default being a single Cancel button. This functionality is provided automatically by the superclass, Tk::DialogBox.

    Our widget also overrides the Tk::DialogBox::Show() method with one of its own. We do this because, by definition, dialogs are modal, which means they perform a grab. Unfortunately, balloon help does not work with a grab in effect, so Tk::CrayolaCrayonColorPicker::Show() deiconifies the color picker window itself, waits for a color selection or a click on the Cancel button, and then hides the window.

    The return value from our Show() method is either a reference to an array of three integers, the red, green and blue pixel triplet, or a string indicating which dialog button was clicked.

    Here's an example, which creates the window seen in Figure 1:

        use Tk::CrayolaCrayonColorPicker;
        my $cccp = $mw->CrayolaCrayonColorPicker(-title => 'Crayon Picker');
        my $color = $cccp->Show;
    
        if ( ref($color) =~ /ARRAY/ ) {
            my ($r, $g, $b) = @$color;
            print "r/g/b=$r/$g/$b!\n";
        } else {
            print "no color selected, response=$color!\n";
        }
    

      Figure 1. -- Box of Crayons


     

    Notice the use of the -title option. Since Tk::CrayolaCrayonColorPicker is derived from Tk::DialogBox, it supports all the option/value pairs defined by its superclass, of which -title is one.
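
    The array reference that Show() returns is often more useful as a color string that can be passed to options such as -background. The following pure-Perl sketch shows one way to do that; the helper names rgb_to_hex and describe_choice are hypothetical, and the code assumes each component is an integer from 0 to 255.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an [r, g, b] array reference into "#rrggbb".
sub rgb_to_hex {
    my ($rgb) = @_;
    return sprintf '#%02x%02x%02x', @$rgb;
}

# Hypothetical helper: branch on the two kinds of value Show() can return,
# an array reference for a color or a plain string for a button label.
sub describe_choice {
    my ($choice) = @_;
    return ref($choice) eq 'ARRAY'
        ? rgb_to_hex($choice)
        : "button: $choice";
}

print describe_choice([255, 0, 128]), "\n";
print describe_choice('Cancel'), "\n";
```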

    Now let's look at the definition of class Tk::CrayolaCrayonColorPicker. I like to place the module's version number as the first line of the file, making it easy for MakeMaker (and humans) to find it. (MakeMaker usage is also explained in Mastering Perl/Tk, Chapter 14, Creating Custom Widgets in Pure Perl/Tk.)

    Next is the package definition.

    Tk::widgets is a fast way to use a list of widgets. It expands to "use Tk::Widget1; use Tk::Widget2;", and so on.

    The "use base" statement is important. It tells us two things: First, that we are defining a derived widget (i.e. subclassing an existing widget), and, second, the precise widget being subclassed. Including Tk::Derived in a widget's @ISA array is the telltale marker of a derived widget. Without Tk::Derived, the assumption is that we are creating a composite widget.

    We then pre-declare a subroutine and enable a strict programming style.

    The final statement in the module prologue actually defines the widget constructor name by modifying our symbol table, and performs other heavy magic, allowing us to use the new widget in the same manner as any other Perl/Tk widget.

    $Tk::CrayolaCrayonColorPicker::VERSION = '1.0';
    
    package Tk::CrayolaCrayonColorPicker;
    
    use Tk::widgets qw/Balloon/;
    use base        qw/Tk::Derived Tk::DialogBox/;
    use subs        qw/pick_color/;
    
    use strict;
    
    Construct Tk::Widget 'CrayolaCrayonColorPicker';
      

    A CrayolaCrayonColorPicker widget is simply a canvas with a photo of a box of Crayola crayons covering it. Since photos are objects that persist until they are destroyed, all widget instances can share the same photo. So we can create the photo from an image file once, and store its reference in a class global variable. For sizing the canvas, we keep the photo's width and height in class variables, too.

    our (
         $crayons,                  # Photo of a bunch of crayons
         $cray_w,                   # Photo width
         $cray_h,                   # Photo height
    );
      

    As part of class initialization, Perl/Tk makes a call to the ClassInit() method. This method serves to perform tasks for the class as a whole. Here we create the photo object and define its dimensions.

    sub ClassInit {
    
        my ($class, $mw) = @_;
    
        $crayons = $mw->Photo(-file => 'crayons.gif', -format => 'gif');
        ($cray_w, $cray_h) = ($crayons->width, $crayons->height);
    
        $class->SUPER::ClassInit($mw);
    
    } # end ClassInit
      

    The heart of a widget module is Populate(), where we create new widget instances. A CrayolaCrayonColorPicker widget consists of a canvas with a photo of a box of Crayola crayons (taken with my handy digital camera). Clicking anywhere on the photo invokes a callback that fetches the RGB components of the pixel under the click.

    Additionally, transparent, trapezoidal canvas polygons are superimposed over the tip of each crayon, and each of these items has a balloon help message associated with it. The message indicates the crayon's color.

    sub Populate {
    
        my ($self, $args) = @_;
      

    Since we are a Tk::DialogBox widget at heart, set up a default Cancel button to ensure our superclass' Populate() has a chance to process the option list, then withdraw the window until it's shown.

        $args->{'-buttons'} = ['Cancel'] unless defined $args->{'-buttons'};
        $self->SUPER::Populate($args);
    
        $self->withdraw;
      

    Create the canvas with its photo, and store the canvas reference and the image id as instance variables. We'll need access to both later.

        $self->{can} = $self->Canvas(
            -width  => $cray_w,
            -height => $cray_h,
        )->pack;
        $self->{iid} = $self->{can}->createImage(0, 0,
            -anchor => 'nw',
            -image  => $crayons,
        );
      

    Define the canvas callback that fetches and returns an RGB triplet. The CanvasBind() method operates on the entire canvas, unlike the canvas' bind() method that operates on an individual canvas tag or id.

        $self->{can}->CanvasBind('<ButtonRelease-1>' => [\&pick_color, $self]);
    

    Next, create the tiny transparent trapezoids that cover the tips of the 64 crayons, and define the balloon help. When specifying balloon help for one or more canvas items, the balloon widget expects its -msg option to be a reference to a hash, where the hash keys are canvas tags or ids, and the hash values are the balloon help text.

    So, we first create an instance variable that references an empty anonymous hash, then invoke the private method make_balloon_items() to do the dirty work. The method creates the canvas polygon items and populates the hash pointed to by $self->{col}. Then, we create the balloon widget, and attach the canvas and help messages. The balloon text appears next to the cursor.

        $self->{col} = {};         # anonymous hash indexes colors by id
        $self->make_balloon_items;
    
        $self->{bal} = $self->Balloon;
        $self->{bal}->attach($self->{can},
            -balloonposition => 'mouse', 
            -msg             => $self->{col},
        );
    
    } # end Populate
      

    Here's the class private method make_balloon_items(), which simply makes 64 calls to make_poly().

    The 64-crayon Crayola box is divided into four sections of 16 crayons each. Each section contains two rows of eight crayons. These subroutine calls create each section, starting with the section's background row, followed by the section's foreground row.

    We create the polygon items from back to front so that the canvas stacking order is back to front. This ensures that the balloon help of foreground polygon items takes precedence over background items.

    For brevity, most of the make_poly() calls have been removed.

    sub make_balloon_items {
    
        my ($self) = @_;
    
        # 16 northwest crayons.
    
        $self->make_poly(132,   8, 'red');
    
        # 16 northeast crayons.
    
        $self->make_poly(306,  61, 'gray');
    
        # 16 southwest crayons.
    
        $self->make_poly(107,  97, 'brick red');
    
        # 16 southeast crayons.
    
        $self->make_poly(270, 157, 'tumbleweed');
    
    } # end make_balloon_items
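
    As a possible alternative to 64 separate calls, the tip coordinates could live in a table that the method walks in order. The sketch below is not the module's actual code: CrayonStub is a hypothetical stand-in whose make_poly() only records its arguments, so the example runs without Tk, and the table holds only the four entries shown above.

```perl
use strict;
use warnings;

package CrayonStub;    # hypothetical stand-in for the real widget class

sub new { return bless { calls => [] }, shift }

# Stub: the real make_poly() creates a canvas polygon; this one only
# records the arguments it was given.
sub make_poly {
    my ($self, $x, $y, $color) = @_;
    push @{ $self->{calls} }, [$x, $y, $color];
    return;
}

# One row per crayon tip, listed back to front so the stacking order
# stays the same as with explicit calls.
my @TIPS = (
    [132,   8, 'red'],
    [306,  61, 'gray'],
    [107,  97, 'brick red'],
    [270, 157, 'tumbleweed'],
);

sub make_balloon_items {
    my ($self) = @_;
    $self->make_poly(@$_) for @TIPS;
    return;
}

package main;

my $widget = CrayonStub->new;
$widget->make_balloon_items;
print scalar @{ $widget->{calls} }, " polygons requested\n";
```

    Because the rows are processed in list order, the back-to-front rule becomes a property of the data rather than of the call sequence.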
      

    Given the coordinates of the point of a crayon, the class private method make_poly() creates a transparent polygon over the tip so we can attach a balloon message to it. The message is the crayon's color, and is stored in the hash pointed to by $self->{col}, indexed by polygon canvas id.

    The transparent stipple is important, as it allows balloon events to be seen. The fill color is irrelevant; we just need something to fill the polygon items so events are registered.

    If we remove the stipple, then the polygon items covering the crayon tips become visible, as shown in Figure 2.

        sub make_poly {
    
            my ($self, $x, $y, $color) = @_;
    
            my $id = $self->{can}->createPolygon(
                $x-3, $y, $x+3, $y, $x+11, $y+38, $x-11, $y+38, $x-3, $y,
                -fill    => 'yellow',
                -stipple => 'transparent',
            );
    
            $self->{col}->{$id} = $color;
    
        } # end make_poly
      

      Figure 2. -- Crayons with Yellow Tips
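
    The vertex arithmetic in make_poly() can be isolated in a small pure-Perl helper, which makes the shape easier to see. The name trapezoid_points is hypothetical; the offsets are the ones used in make_poly(), minus the repeated first point, since a canvas polygon is closed automatically.

```perl
use strict;
use warnings;

# Hypothetical helper: given the point of a crayon at ($x, $y), return the
# four corners of the trapezoid as a flat (x1, y1, x2, y2, ...) list.
sub trapezoid_points {
    my ($x, $y) = @_;
    return (
        $x - 3,  $y,         # top left, 3 pixels left of the point
        $x + 3,  $y,         # top right
        $x + 11, $y + 38,    # bottom right, 38 pixels down and wider
        $x - 11, $y + 38,    # bottom left
    );
}

print join(', ', trapezoid_points(132, 8)), "\n";
```

    The result is a shape 6 pixels wide at the top and 22 pixels wide at the bottom, roughly following the taper of a crayon tip.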


     

    Subroutine pick_color() is our last class private method. It demonstrates a rather dubious object-oriented programming technique - meddling with the internals of its superclass! But we do this out of necessity, as a workaround for the "balloons do not work with a grab" bug.

    We want to override Tk::DialogBox::Show, so we need to know what its waitVariable() is waiting for. It's this variable that the dialog buttons set when we click on them, and it turns out to be $self->{'selected_button'}.

    We make pick_color() set the same variable when returning a pixel's RGB values, thus unblocking the waitVariable() and returning the RGB data to the user.

    In case you're interested, early on in the coding I determined the coordinates of each crayon's point by printing $x and $y in this callback.

        sub pick_color {
    
            my ($canvas, $self) = @_;
            my ($x, $y) = ($Tk::event->x, $Tk::event->y);
            my $i = $canvas->itemcget($self->{iid}, -image);
            $self->{'selected_button'} = $i->get($x, $y);
    
        } # end pick_color
      

    Here is our only class public method, Show(). We can't use the standard DialogBox Show() method because the grab interferes with balloon help. So we roll our own, forgoing the modal approach. Control returns from waitVariable() in one of two ways: 1) a color is selected (see pick_color() above), or 2) the Cancel button is activated.

        sub Show {
    
            my ($self) = @_;
            $self->Popup;
            $self->waitVariable(\$self->{'selected_button'});
            $self->withdraw;
            return $self->{'selected_button'};
    
        } # end Show
      
    And that's it. Until next time ... use Tk;
    You can download the class module, associated .GIF file and a test program that uses the new class.

    O'Reilly & Associates recently released (January 2002) Mastering Perl/Tk.

    This Week on Perl 6 (30 December 2001 - 5 Jan 2002)

    Notes

    You can subscribe to an email version of this summary by sending an empty message to perl6-digest-subscribe@netthink.co.uk.

    This summary, as with past summaries, can be found here. Please send additions, submissions, corrections, kudos, and complaints to bwarnock@capita.com.

    For more information on the Perl 6 and Parrot development efforts, visit dev.perl.org and parrotcode.org.

    There were 373 messages across 112 threads, with 50 authors contributing. Most of the messages were patches. For 2001, about 340 folks submitted over 9000 messages across 1300 or so threads.

    Generators

    (8 posts) Clark C. Evans asked whether Parrot will support generators, a cousin to continuations. Dan Sugalski said coroutines and continuations are in, but didn't really answer about generators. (As a sidebar, the rest of the thread was an interesting discussion about Python's recent addition of generators and how they work.)

    Platform Fixes

    Win32 has had more work done on it, in particular on the continual makefile problems. LP64 environments, such as 64-bit Solaris and Tru64, also received a much-needed fix, allowing them to finally work.

    Signed vs. Unsigned

    A huge signed-to-unsigned migration was finally begun. Somewhere between the original design and the initial code, the use of an unsigned partner to INTVAL (The Type Formerly Known As IV) was dropped. It's now been reintroduced, and implicitly unsigned values are slowly being converted.

    Strings

    A lot of cleanup and additions were made to string support in Parrot. There was a mild discussion on being able to dereference a stringified address.

    Fixed-sized Output Records

    The language list received a question from Tzadik Vanderhood, asking if a fixed-sized output record will be allowed, in line with its input record counterpart: $/ = \80;

    Aaron Sherman gave a good response.
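
    For readers who have not met the input-side feature the question refers to: in Perl 5, setting $/ to a reference to an integer makes readline return fixed-size records instead of lines. The sketch below is an illustration only, not taken from the thread; the helper name read_fixed_records is hypothetical, and it reads from an in-memory filehandle with a record size of 4 rather than 80 to keep the example short.

```perl
use strict;
use warnings;

# Hypothetical helper: split $data into records of at most $size characters
# by reading it through a filehandle with $/ set to a reference.
sub read_fixed_records {
    my ($data, $size) = @_;
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    local $/ = \$size;       # reference to an integer: fixed-size reads
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @records = read_fixed_records('AAAABBBBCC', 4);
print join('|', @records), "\n";
```

    The last record may be shorter than the requested size when the input runs out.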

    Parroty Bits

    The Perl Development Grant Fund crept up to 26%.


    Bryan C. Warnock

    Beginning Bioinformatics

    Bioinformatics, the use of computers in biology research, has been increasing in importance during the past decade as the Human Genome Project went from its beginning to the announcement last year of a "draft" of the complete sequence of human DNA.

    The importance of programming in biology stretches back before the previous decade. And it certainly has a significant future now that it is a recognized part of research into many areas of medicine and basic biological research. This may not be news to biologists. But Perl programmers may be surprised to find that their handsome language has become one of the most popular, if not the most popular, of the computer languages used in bioinformatics.

    My new book Beginning Perl for Bioinformatics from O'Reilly & Associates addresses the needs of biologists who want to learn Perl programming. In this article, I'm going to approach the subject from another, almost opposite, angle. I want to address the needs of Perl programmers who want to learn biology and bioinformatics.

    First, let me talk about ways to go from Perl programmer to "bioinformatician". I'll describe my experience, and give some ideas for making the jump. Then, I'll try to give you a taste of modern biology by talking about some of the technology used in the sequencing of genomes.

    My Experience

    Bioinformaticians generally have either a biology or programming background, and then receive additional training in the other field. The common wisdom is that it's easier for biologists to pick up programming than the other way around; but, of course, it depends on the individual. How does one take the skills learned while programming in, say, the telecommunications industry, and bring them to a job programming for biology?

    I used to work at Bell Labs in Murray Hill, N.J., in the Speech Research Department. It was my first computer programming job; I got to do stuff with computer sound, and learn about speech science and linguistics as well. I also got to do some computer music on the side, which was fantastic for me. I became interested in the theory of computer science, and entered academia full time for a few years.

    When it became time for me to get back to a regular salary, the Human Genome Project had just started a bioinformatics lab at the university where I was studying. I had a year of molecular biology some years before as an undergraduate, but that was before the PCR technique revolutionized the field. At that time, I read Watson's classic "The Molecular Biology of the Gene" and so I had an inkling about DNA, which probably helped, and I knew I liked the subject. I went over to meet the directors and leveraged my Unix and C and Bell Labs background to get a job as the systems manager. (PCR, the polymerase chain reaction, is the way we make enough copies ("clones") of a stretch of DNA to be able to do experiments on it. After learning the basics of DNA -- keep reading! -- PCR would be a great topic to start learning about molecular biology techniques. I'll explain how in just a bit.)

    In my new job I started working with bioinformatics software, both supporting and writing it. In previous years, I'd done practically no programming, having concentrated on complexity theory and parallel algorithms. Now I was plunged into a boatload of programming -- C, Prolog, Unix shell and FORTRAN were the principal languages we used. At that time, just as I was starting the job, a friend at the university pressed his copy of Programming Perl into my hands. It made a strong impression on me, and in short order I was turning to Perl for most of the programming jobs I did.

    O'Reilly Bioinformatics Technology Conference

    Don't miss the Beginning Perl for Bioinformatics session, Monday, January 28, 2002, at the O'Reilly Bioinformatics Technology Conference.

    I also started hanging out with the genome project people. I took some graduate courses in human genetics and molecular biology, which helped me a lot in understanding what everyone around me was doing.

    After a few years, when the genome project closed down at my university, I went to other organizations to do bioinformatics, first at a biotech startup, then at a national comprehensive cancer center, and now consulting for biology researchers. So that's my story in a nutshell, which I offer as one man's path from programming to bioinformatics.

    Bringing Programming to Biology

    Especially now that bioinformatics is seen as an important field, many biology researchers are adding bioinformatics to their grant proposals and research programs. I believe the kind of path that I took is even more possible now than then, simply due to the amount of bioinformatics funding and jobs that are now staffed. Find biology research organizations that are advertising for programmers, and let them know you have the programming skills and the interest in biology that would make you an asset to their work.

    But what about formal training? It's true that the ideal bioinformatician has graduate degrees in both computer science and biology. But such people are extremely rare. Most workers in the field have a good mix of computer and biology skills, but their degrees tend to come from one or the other. Still, formal training in biology is a good way for a computer programmer to learn about bioinformatics, either preceding or concurrently with a job in the field.

    I can understand the reluctance to face another degree. (I earned my degrees with a job and a family to support, and it was stressful at times.) Yes, it is best to get a degree if you're going to be working in biology. A master's degree is OK, but most of the best jobs go to those who have a doctorate. Doctorates are, however, in ample supply, and their holders often get relatively low pay, as in postdoc positions that are frequently inhabited for many years. So the economic benefit of formal training in biology is not great, compared to what you may be earning as a computer expert. But at present bioinformatics pays OK.

    On the other hand, to really work in biology, training is a good thing. It's a deep subject, and in many ways quite dissimilar to computer science or electrical engineering or similar fields. It has many surprises, and the whole "wet lab" experimental approach is hard to get out of books.

    For self-study, there's one book that I think is a real gem for Perl programmers who want to learn about modern biology research. The book is called "Recombinant DNA," by the co-discoverer of the structure of DNA, James Watson, and his co-authors Gilman, Witkowski, and Zoller. The book was deliberately written for a wide audience, so you can start at the beginning with an explanation of what, exactly, are DNA and proteins, the two most important types of molecules in biology. But it goes on to introduce a wide range of fundamental topics in biology research, including explanations of the molecular biology laboratory techniques that form the basis of the revolution and the golden age in biology that we're now experiencing. I particularly like the use of illustrations to explain the techniques and the biology -- they're outstanding. In my jobs as manager of bioinformatics, I've always strongly urged the programmers to keep the book around and to dip into it often.

    The book does have one drawback, however. It was published in 1992. Ten years is as long in biology as it is in computer technology; so "Recombinant DNA" will not go into newer stuff such as microarrays or SNPs. (And don't get the even earlier "Recombinant DNA: A Short Course" -- the 1992 edition is the one to get for now.) But what it does give you is a chance to really understand the fundamental techniques of modern molecular biology; and if you want to bring your Perl programming expertise to a biology research setting, then this is a great way to get a good start.

    There are a few other good books out, and several more coming during the next year, in the bioinformatics field. Waterman; Mount; Grant and Ewens; Baxevanis et al, and Pevzner are a few of the most popular books (some more theoretical than others). My book, although for beginning programmers, may be helpful in the later chapters to get an idea of basic biological data and programs. Gibas and Jambeck's book Developing Bioinformatics Computer Skills gives a good overview of much of the software and the general computational approach that's used in bioinformatics, although it also includes some beginning topics unsuitable for the experienced programmer.

    Of all the bioinformatics programs that one might want to learn about, the Perl programmer will naturally gravitate toward the Bioperl project. This is an open-source, international collaborative effort to write useful Perl bioinformatics modules, and it has reached a point during the past few years where it is quite useful stuff. The 1.0 release may be available by the time you read this. Exploring this software, available at http://www.bioperl.org, is highly recommended, with one caveat: It does not include much tutorial material, certainly not for teaching basic biology concepts. Still, you'll find lots of great stuff to explore and use in Bioperl. It's a must for the Perl bioinformatician.

    Apart from self-study, you may also want to try to get into some seminars or reading groups at the local university or biotech firm, or generally meet people. If you're job hunting, then you may want to go introduce yourself to the head of the biology department at the U, and let her (yes, there are a lot of women working in biology research, a much better situation than in programming) know that you want a bioinformatics job and that you are a wizard at 1) programming in general, 2) Web programming, and 3) getting a lot out of computers for minimal money. But be prepared for sticker shock when it comes to salaries. Maybe it's getting a little better now, but I've often found that biologists want to pay you about half of what you're worth on the market. Their pay level is just lower than that in computer programming. When you get to that point, you might have to be a bit hardnosed during salary negotiations to maintain your children's nutritional requirements.

    I don't know of a book or training program that's specifically targeted at programmers interested in learning about biology. However, many universities have started offering bioinformatics courses, training programs, and even degrees, and some of their course offerings are designed for the experienced programmer. You might also consider attending one of the major bioinformatics conferences. In particular, there will be a tutorial aimed at you at the upcoming O'Reilly bioinformatics conference -- indeed, the main focus of that conference is from the programming side more than the biology side.

    Apart from the upcoming O'Reilly conference already mentioned, there is the ISMB conference, the largest in the bioinformatics field, which is in Calgary this coming summer; a good place to meet people and learn. It will also play host to the Bioperl yearly meeting, which is directly on target. Actually, if you check out the presenters at the ISMB, RECOMB or O'Reilly conferences, then you will find computer science people who are specializing in biology-related problems, as well as biologists specializing in informatics, and many of these will be lab heads or managers who maintain staffs of programmers.

    The thing about biology is that it's a very large area. Most researchers stake out a claim to some particular system -- say, the regulation of nervous system development in the fly -- and work there. So it's hard to really prepare yourself for the particular biology you might find on the job. The "Recombinant DNA" book will give you an overview of some of the more important techniques that are commonly used in most labs.

    A Taste of Molecular Biology

    Now that I've given you my general take on how a Perl programmer could move into biology research, I'll turn my attention to two basic molecular biology techniques that are fundamental in biology research, as for instance in the Human Genome Project: restriction enzymes and cloning with PCR.

    First, we have to point out that the two most important biological molecules, DNA and proteins, are both polymers, which are chains of smaller building block molecules. DNA is made out of four building blocks, the nucleotides or "bases"; proteins are made from 20 amino acids. DNA has a regular structure, usually the famous double helix of two complementary strands intertwined; whereas proteins fold up in a wide variety of ways that have an important effect on what the proteins are able to do inside the cell. DNA is the primary genetic material that transmits traits to succeeding generations. Finally, DNA contains the coded templates from which proteins are made; and proteins accomplish much of the work of the cell.

    One important class of proteins is the enzymes, which promote certain specific chemical reactions in the cell. In 1978 the Nobel Prize was awarded to Werner Arber, Daniel Nathans, and Hamilton Smith for their discovery of and work on restriction enzymes in the 1960s and early 1970s. Restriction enzymes are a large group of enzymes that have the useful property of cutting DNA at specific locations called restriction sites. This has been exploited in several important ways. It has been an important technique in fingerprinting DNA, as used in forensic science to identify individuals. It has also been instrumental in developing physical maps, which are sets of known positions along DNA; these are used to zero in on the location of genes, and also serve as reference points for the lengthy process of determining the entire sequence of bases in DNA.

    Restriction enzymes are fundamental to modern biological research. To learn more about them, you could go to the REBASE restriction enzyme database where detailed information about all known restriction enzymes is collected. Many of them are easily ordered from supply houses for use in the lab.

    One of the most common restriction enzymes is called EcoRI. When it finds the six bases GAATTC along a strand of DNA, it cleaves the DNA.
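
    As a rough illustration of what "finding a restriction site" means to a programmer, here is a minimal Python sketch that scans a DNA string for the EcoRI site GAATTC and splits the string at each cut. The function names find_sites and digest, the cut_offset parameter, and the example sequence are all hypothetical, invented for this sketch. The cut position assumes the usual description of EcoRI as cutting between the G and the first A of its site, and the sketch looks at one strand only.

```python
# Hypothetical helpers for illustration; not taken from any library.
ECORI_SITE = "GAATTC"

def find_sites(dna, site=ECORI_SITE):
    """Return the 0-based start position of every occurrence of site in dna."""
    dna = dna.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        # Resume one base later so overlapping sites would also be found.
        start = dna.find(site, start + 1)
    return positions

def digest(dna, site=ECORI_SITE, cut_offset=1):
    """Split dna into fragments, cutting cut_offset bases into each site.

    The default of 1 models a cut between the G and the first A (G^AATTC),
    which is an assumption of this sketch.
    """
    dna = dna.upper()
    fragments = []
    previous = 0
    for pos in find_sites(dna, site):
        cut = pos + cut_offset
        fragments.append(dna[previous:cut])
        previous = cut
    fragments.append(dna[previous:])  # whatever follows the last cut
    return fragments

sequence = "ccgaattcatgggaattcta"  # made-up example sequence
print(find_sites(sequence))  # [2, 12]
print(digest(sequence))      # ['CCG', 'AATTCATGGG', 'AATTCTA']
```

    Joining the fragments back together gives the original sequence, which is a handy sanity check on any digest routine.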

    The other main technique I want to introduce is one already mentioned: PCR, or the polymerase chain reaction. This is the most important way that DNA samples are cloned, that is, copied. PCR is very powerful at this; in a short time many millions of copies of a stretch of DNA can be created, at which point there is enough of a "sample" of the DNA to perform other molecular biology experiments, such as determining the exact sequence of bases in the DNA (as has been accomplished for humans in the Human Genome Project).

    PCR also earned a Nobel Prize: Kary Mullis invented the technique in 1983 and received the prize for it in 1993. The basic idea is quite simple. We've mentioned that the two intertwined strands of the double helix of DNA are complementary. They are different, but given one strand we know what the other strand is, as the bases always pair in a specific way. PCR exploits this property.
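
    The claim that one strand determines the other is easy to show in code. The following minimal Python sketch assumes the standard pairing rules (A with T, C with G) and the convention that the two strands run in opposite directions, so the partner strand is written as the reverse complement. The names COMPLEMENT, complement, and reverse_complement are hypothetical, chosen for this sketch.

```python
# Hypothetical helpers for illustration; not taken from any library.
# Translation table for the standard base pairs: A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def complement(strand):
    """Replace each base with its pairing partner, keeping the order."""
    return strand.translate(COMPLEMENT)

def reverse_complement(strand):
    """Return the partner strand read in its own direction."""
    return complement(strand)[::-1]

print(complement("ATGC"))            # TACG
print(reverse_complement("ATGC"))    # GCAT
print(reverse_complement("GAATTC"))  # GAATTC
```

    The last line shows that the EcoRI site GAATTC is its own reverse complement, so the site reads the same on both strands.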

    Motivation

    It's clear that a short article is not going to get very far in introducing a major science such as biology. But I hope I've given you enough pointers to enable you to make a good start at learning about this explosive science, and about how a Perl programmer might be able to bring needed skills to the great challenge of understanding life and curing disease.

    In the 10 years I've been working in biology, I've found it to be a really exciting field, very stimulating intellectually; and I've found that going to work to try to help to cure cancer, Alzheimer's disease, and others, has been very satisfying emotionally.

    I wish you the very best of luck. If you make it to the O'Reilly conference, please look me up!
