December 2000 Archives

This Week on p5p 2000/12/24


Notes

You can subscribe to an email version of this summary by sending an empty message to p5p-digest-subscribe@plover.com.

Please send corrections and additions to perl-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year and month.

You'd think this week would be pretty quiet, but we saw the usual 300 or so messages. Don't you all have homes to go to?

5.6.1 Trial is out!

The big news this week is that the long-awaited first trial release of Perl 5.6.1 is out, and available for download from CPAN: get the patch against 5.6.0 from your nearest CPAN here. (Don't forget that you'll also need the 5.6.0 source.)

Please test it out thoroughly, run your favourite bugs through it and see if they've been fixed, and above all, tell us if it works or if it doesn't. Use the perlbug utility to get in touch, and the make ok or make nok Makefile targets to send build success and failure reports.

Profiling

Both Alan and I have been doing some work with profilers recently, and looking at perl's hotspots. They seem to have been in surprising places; Alan found a lot of time was spent setting up and destroying objects, and also dealing with sigsetjmp; these are all things that Ilya looked at last week, but Alan said he only saw a 6% speedup with Ilya's patches, which was a little less than we were expecting.

Nick suggested a couple of optimizations, including the rearrangement of pp_hot.c. This might cause people to wonder what the whole business about pp_hot.c is, anyway. "PP code" are the functions in perl which implement the interpreter's operations: as a simplification, they're the source code to things like print and +. They're called "PP code" because most of their work consists of Pushing and Popping things on and off the argument stack. The idea behind pp_hot.c is that all the frequently used functionality goes in that one file, which means that the object code can be cached by the processor. At least, that's what we hope will happen. So, we need to check that the functions we have in there actually are the most frequently used operations, which means we need to periodically profile and reorganise it. However, since processor caches vary wildly between machines, it's very hard to do the cache tuning accurately. Read about it.

Solaris and Sockets

Stephen Potter brought up an old bug related to sockets on Solaris, which Alan diagnosed as a change in Solaris' behaviour with respect to restarting interrupted system calls - specifically, the restarting of accept() and connect() after a child signal was caught. While this wasn't a bug in Perl, per se, it raised the question of how to shield the user from this sort of operating system change. Things like POSIX don't specify whether or not system calls should be restarted, so it's all down, essentially, to tradition. To make things worse, when the system call was interrupted, IO::Socket hadn't been catching the error, and carried on working, which created a dead filehandle - this in turn generated "bad filehandle" errors instead of "interrupted system call" errors, masking the true problem. And maybe IO::Socket should restart the interrupted operation as well, since that's usually what people want.

Either way, a fix to IO::Socket is needed, which nobody seems to have come up with yet. If you want to do this, you'll want to read the whole thread.
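The "restart the interrupted operation" idea could be sketched roughly as follows. This is not a patch to IO::Socket: retry_on_eintr is a hypothetical helper name, and the only real API it leans on is the EINTR constant from the core Errno module.

```perl
use strict;
use warnings;
use Errno qw(EINTR);

# Hypothetical helper, not part of IO::Socket: run $op until it either
# succeeds (returns a defined value) or fails for a reason other than
# an interrupted system call.
sub retry_on_eintr {
    my ($op) = @_;
    while (1) {
        my $result = $op->();
        return $result if defined $result;
        return unless $! == EINTR();    # a real error: give up, $! stays set
        # EINTR means a signal (such as SIGCHLD) interrupted the call; retry.
    }
}

# Intended use, assuming $server is a listening IO::Socket::INET object:
#   my $client = retry_on_eintr(sub { $server->accept })
#       or die "accept failed: $!";
```

Because the helper only retries on EINTR, any other failure still surfaces to the caller with $! intact, so the true error is reported rather than a later "bad filehandle".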

strtoul

The two Nicks discovered that Perl is being slowed down by the fact that it calls atol (which is sometimes strtoul in disguise), and this performs lots of integer division - a computationally expensive process which is quite unnecessary when you're dealing with base-ten integers.

Nicholas Clark suggested that it would be a lot more efficient if Perl implemented strtoul itself as part of the looks_like_number routine, the code called every time Perl wants to convert a string to a number. The only worry was locale-based grouping: the C library sometimes lets you turn "123,456" into 123456 (assuming that commas are your local thousands separator). We would lose this ability by doing the conversion as part of looks_like_number. But Nick Ing-Simmons pointed out that this is a red herring, since Perl won't pass non-digits through to strtoul anyway: $a = "123,456"; print 0+$a will give you 123. The only other dispute was as to whether this should be done for Perl 5.7 or Perl 6; we'll see whether or not it gets done - and, of course, how much it helps.
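A small sketch of both points, in Perl rather than the C the thread was actually about: the numeric conversion stops at the first character that can't be part of a number, and a base-ten conversion needs only a multiply and an add per digit. decimal_prefix_to_int is a hypothetical name for illustration, not Perl's internal routine.

```perl
use strict;
use warnings;

# Hypothetical illustration: convert the leading run of ASCII digits
# using one multiplication and one addition per digit, no division.
sub decimal_prefix_to_int {
    my ($str) = @_;
    my $value = 0;
    for my $ch (split //, $str) {
        last unless $ch =~ /\A[0-9]\z/;    # stop at the first non-digit
        $value = $value * 10 + $ch;
    }
    return $value;
}

my $string = "123,456";

# Perl's own conversion also stops at the comma (and warns, if enabled).
my $builtin = do { no warnings 'numeric'; 0 + $string };

print "builtin: $builtin\n";                               # builtin: 123
print "by hand: ", decimal_prefix_to_int($string), "\n";   # by hand: 123
```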

Language-sensitive editors

The section in the Perl FAQ on Perl-aware editors and IDEs has been updated, thanks to a very complete survey by Peter Primmer and others. Here's what it looks like now:

PerlBuilder (http://www.solutionsoft.com/perl.htm) is an integrated development environment for Windows that supports Perl development. PerlDevKit (http://www.activestate.com/Products/Perl_Dev_Kit/index.html) is an IDE from ActiveState supporting the ActivePerl. (VisualPerl, a Visual Studio (or Studio.NET, in time) component is currently (late 2000) in beta). The visiPerl+ IDE is available from Help Consulting (http://helpconsulting.net/visiperl/). Perl code magic is another IDE (http://www.petes-place.com/codemagic.html). CodeMagicCD (http://www.codemagiccd.com/) is a commercial IDE. The Object System (http://www.castlelink.co.uk/object_system/) is a Perl web applications development IDE.

Perl programs are just plain text, though, so you could download GNU Emacs (http://www.gnu.org/software/emacs/windows/ntemacs.html) or XEmacs (http://www.xemacs.org/Download/index.html), or a vi clone such as Elvis (ftp://ftp.cs.pdx.edu/pub/elvis/, see also http://www.fh-wedel.de/elvis/), nvi (http://www.bostic.com/vi/, or available from CPAN in src/misc/), or Vile (http://www.clark.net/pub/dickey/vile/vile.html), or vim (http://www.vim.org/) (win32: http://www.cs.vu.nl/~tmgil/vi.html). (For vi lovers in general: http://www.thomer.com/thomer/vi/vi.html)

The following are Win32 multilanguage editor/IDEs that support Perl: Codewright (http://www.starbase.com/), MultiEdit (http://www.MultiEdit.com/), SlickEdit (http://www.slickedit.com/).

There is also a toyedit Text widget based editor written in Perl that is distributed with the Tk module on CPAN. The ptkdb (http://world.std.com/~aep/ptkdb/) is a Perl/tk based debugger that acts as a development environment of sorts. Perl Composer (http://perlcomposer.sourceforge.net/vperl.html) is an IDE for Perl/Tk GUI creation.

In addition to an editor/IDE you might be interested in a more powerful shell environment for Win32. Your options include the Bash from the Cygwin package (http://sources.redhat.com/cygwin/), or the Ksh from the MKS Toolkit (http://www.mks.com/), or the Bourne shell of the U/WIN environment (http://www.research.att.com/sw/tools/uwin/), or the Tcsh (ftp://ftp.astron.com/pub/tcsh/, see also http://www.primate.wisc.edu/software/csh-tcsh-book/), or the Zsh (ftp://ftp.blarg.net/users/amol/zsh/, see also http://www.zsh.org/). MKS and U/WIN are commercial (U/WIN is free for educational and research purposes), Cygwin is covered by the GNU Public License (but that shouldn't matter for Perl use). The Cygwin, MKS, and U/WIN all contain (in addition to the shells) a comprehensive set of standard UNIX toolkit utilities.

Numeric conversion on HPUX

Merijn and Nicholas Clark went through twelve rounds with the HPUX compiler this week; it seems to have won. It all started with some innocent-looking numconvert.t test failures. (Well, all right, nothing about numconvert.t looks innocent, but let's not let that spoil the narrative.)

Then we discovered that the problem was in edge cases: "4294967296" was getting wrapped around to 0 when used as a UV. Nick thought this was a problem with casting, so had Merijn remove Perl's trust in the compiler's casting abilities - this still didn't help. Then Nick thought it was something to do with sv_2nv, but then we found out it was something to do with addition and writing back the value: $a += 0 was giving the wrong answer, but $a+0 was fine.

By now, it's time to go through the usual routine of blaming the optimiser and the compiler, but turning off optimisation didn't help. Nick had a brainwave, and tried testing earlier versions of Perl - all gave the same result, thankfully ruling out his recent UV-preservation code. But when he tried a simple C program to do the same thing, he found that C gets it wrong on FreeBSD x86, but Perl gets it right! Nick's gone away on holiday, but we can expect him to be vigorously scratching his head about this one all the while...

Various

Lots of fun little things happened this week.

Repository browser

I announced the repository browser, at http://the.earth.li/~simon/cgi-bin/repository; this allows you to grab files and patches from the repository, look at what patches affect a file, and get a cvs annotate-style blamelog of a file.

Dependency checker

Rocco Caputo's distribution dependency checker is up at http://poe.perl.org/poedown/deptest-0.1201.tar.gz - run it in the root of a module distribution, and it'll tell you what dependencies that module has.

Unicode task

Jarkko has a task for someone:

Here's a little something that someone might consider doing over the holiday season, a nice way to get to know UTF-8 if someone feels the urge or interest to do so.

If that's you, read about it.

use constant Updates

Casey Tweten has hacked around the constant pragma so that you can now declare multiple constants at once, which is pretty cool, but there's been no word on whether or not this is going in the tree.
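For reference, the list-constant form that the constant pragma supports in later Perl releases looks like the sketch below. Whether this is the exact syntax of Casey's patch is an assumption, and the constant names are made up for illustration.

```perl
use strict;
use warnings;

# Several constants declared in one statement via a hash reference.
use constant {
    DEBUG    => 0,
    MAX_TRY  => 3,
    GREETING => 'hello',
};

print "Greeting: ", GREETING, "; tries allowed: ", MAX_TRY, "\n";
print "debugging is off\n" unless DEBUG;
```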

Miscellany

The usual contingent of "OK" messages, test results, new bugs, old bugs (thanks to Stephen Potter for dredging up still-active old bugs) and non-bugs. And a few thanks messages - thanks for them!

So, that's all for now; enjoy the holidays, set $|=1 on the appropriate filehandle, and have yourselves a merry little Christmas.

Until next millennium I remain, your humble and obedient servant,

What every Perl programmer needs to know about .NET

.NET is the latest hype blitz from Microsoft (and you thought it was just a domain). If your eyes are glazing over or you're tempted to write it off as marketing speak, read on.

Although Microsoft is loath to admit it, .NET is really their answer to Sun. The Java language, the Java virtual machine and CORBA have proved to be a threat. .NET is the umbrella name for Microsoft's attempt to better Sun.

Whereas Java is the programming language for Sun's "computer is the network" effort, Microsoft has given us C# ("C sharp"). It's derived from C, and attempts to avoid some of the pitfalls of Java and C++. And much as Java compiles down to Java Virtual Machine (JVM) instructions, C# compiles down to Intermediate Language (IL).

Where Microsoft betters Sun is that while Java is the only real language that compiles to the JVM (see update below), Microsoft intends IL to be cross-language. That is, Perl, Visual Basic and C# can be compiled down to IL. The idea is to make it possible to integrate multiple languages into one system.

Sun chose CORBA as its distributed application platform, but Microsoft has gone with SOAP (Simple Object Access Protocol). In my opinion, SOAP has one huge benefit over CORBA: mere mortals can implement it! SOAP uses HTTP to send XML-encoded instance and method calls on remotely defined objects and receive return values. The umbrella term for SOAP and its cousin XML-RPC is "Web Services." There are already SOAP and XML-RPC modules for Perl.
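To make the "XML-encoded method call over HTTP" idea concrete, here is a rough sketch that builds the body of an XML-RPC request by hand (XML-RPC is used because its envelope is shorter than SOAP's). The method name examples.getStateName and the helper names are placeholders; in practice you would let a module such as Frontier::RPC build and send this for you.

```perl
use strict;
use warnings;

# Escape the characters that are special inside XML text.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: build an XML-RPC <methodCall> document whose
# parameters are all passed as strings.
sub xmlrpc_request {
    my ($method, @args) = @_;
    my $params = '';
    for my $arg (@args) {
        $params .= '<param><value><string>' . xml_escape($arg)
                 . '</string></value></param>';
    }
    return qq{<?xml version="1.0"?>\n}
         . '<methodCall>'
         . '<methodName>' . xml_escape($method) . '</methodName>'
         . "<params>$params</params>"
         . "</methodCall>\n";
}

# This string is what would be sent as the body of an HTTP POST.
print xmlrpc_request('examples.getStateName', 'Utah & friends');
```

The whole request is just text, which is a large part of why mere mortals can implement it.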

While it's still early days for .NET, and there's still considerable potential for it all to turn out to be vaporware, Microsoft has begun to release .NET components. And it's not just Microsoft - a number of people are using SOAP to give concrete APIs to their Web-based systems. Think of all the times you've tried to parse HTML to extract information - the SOAP way of the future is for that information to be directly accessible through SOAP calls. As I said, this is already happening.

I've heard .NET described as "Microsoft's tacit admission that most shops are not 100 percent Microsoft, so Microsoft products need to work better with other platforms." There are many places for Perl in this new world:

More Information

So where can you learn more? Microsoft has a lot of information about .NET, of course.

http://www.microsoft.com/net/default.asp

There's an introduction from the Windows point of view (covering the plans for COM and ADO and all the other Windowsy things I didn't mention):

http://www.vbxml.com/xml/articles/dotnetintro/default.asp

Your best bet for things you can use now, though, are the SOAP and XML-RPC modules. Be warned: The SOAP modules are hard to get into.

XML-RPC via the Frontier::RPC module

http://bitsko.slc.ut.us/~ken/xml-rpc/

SOAP Modules:

http://www.cpan.org/modules/by-module/SOAP/

Here are some references for Web services:


I meant that while Microsoft is funding and encouraging other languages to compile down to the IL, Sun never seemed to do that with the JVM. As far as I could tell (and I am the first to admit that I am on the fringe of the Java world), their main push was to have Java be The Language. I'm interested to hear whether they really did encourage other languages to compile to the JVM.

Poor choice of words on my part, sorry for the confusion. See http://grunge.cs.tu-berlin.de/~tolk/vmlanguages.html for a list of languages that compile to the JVM.

Beginners Intro to Perl - Part 5

Editor's note: this venerable series is undergoing updates. You might be interested in the newer versions, available at:

Beginners Intro to Perl

Part 1 of this series
Part 2 of this series
Part 3 of this series
Part 4 of this series
Part 6 of this series

What Is an Object?
Our Goal
Starting Off
What Does Our Object Do?
Our Goal, Part 2
Encapsulation
Play Around!

So far, we've mostly stuck to writing everything for our programs ourselves. One of the big advantages of Perl is that you don't need to do this. More than 1,000 people worldwide have contributed more than 5,000 utility packages, or modules, for common tasks.

In this installment, we'll learn how modules work by building one, and along the way we'll learn a bit about object-oriented programming in Perl.

What Is an Object?

Think back to the first article in this series, when we discussed the two basic data types in Perl, strings and numbers. There's a third basic data type: the object.

Objects are a convenient way of packaging information with the things you actually do with that information. The information an object contains is called its properties, and the things you can do with that information are called methods.

For example, you might have an AddressEntry object for an address book program - this object would contain properties that store a person's name, mailing address, phone number and e-mail address; and methods that print a nicely formatted mailing label or allow you to change the person's phone number.

During the course of this article, we'll build a small, but useful, class: a container for configuration file information.

Our Goal

So far, we've put the code for setting various options in our programs directly in the program's source code. This isn't a good approach. You may want to install a program and allow multiple users to run it, each with their own preferences, or you may want to store common sets of options for later. What you need is a configuration file to store these options.

We'll use a simple plain-text format, where name and value pairs are grouped in sections, and sections are indicated by a header name in brackets. When we want to refer to the value of a specific key in our configuration file, we call the key section.name. For instance, the value of author.firstname in this simple file is ``Doug'':

   [author]
   firstname=Doug
   lastname=Sheppard

   [site]
   name=Perl.com
   url=http://www.perl.com/

(If you used Windows in the ancient days when versions had numbers, not years, you'll recognize this as being similar to the format of INI files.)

Now that we know the real-world purpose of our module, we need to think about what properties and methods it will have: What do TutorialConfig objects store, and what can we do with them?

The first part is simple: We want the object's properties to be the values in our configuration file.

The second part is a little more complex. Let's start by doing the two things we need to do: read a configuration file, and get a value from it. We'll call these two methods read and get. Finally, we'll add another method that will allow us to set or change a value from within our program, which we'll call set. These three methods will cover nearly everything we want to do.

Starting Off

We'll use the name TutorialConfig for our configuration file class. (Class names are normally named in this InterCapitalized style.) Since Perl looks for a module by its filename, this means we'll call our module file TutorialConfig.pm.

Put the following into a file called TutorialConfig.pm:

    package TutorialConfig;

    warn "TutorialConfig is successfully loaded!\n";
    1;

(I'll be sprinkling debugging statements throughout the code. You can take them out in practice. The warn keyword is useful for warnings - things that you want to bring to the user's attention without ending the program the way die would.)

The package keyword tells Perl the name of the class you're defining. This is generally the same as the module name. (It doesn't have to be, but it's a good idea!) The 1; will return a true value to Perl, which indicates that the module was loaded successfully.

You now have a simple module called TutorialConfig, which you can use in your code with the use keyword. Put the following into a very simple, one-line program:

    use TutorialConfig;

When we run this program, we see the following:

    TutorialConfig is successfully loaded!

What Does Our Object Do?

Before we can create an object, we need to know how to create it. That means we must write a method called new that will set up an object and return it to us. This is also where you put any special initialization code that you might need to run for each object when it is created.

The new method for our TutorialConfig class looks like this, and goes into TutorialConfig.pm right after the package declaration:

    sub new {
        my ($class_name) = @_;

        my ($self) = {};
        warn "We just created our new variable...\n";

        bless ($self, $class_name);
        warn "and now it's a $class_name object!\n";

        $self->{'_created'} = 1;
        return $self;
    }

(Again, you won't need those warn statements in actual practice.)

Let's break this down line by line.

First, notice that we define methods by using sub. (All methods are really just a special sort of sub.) When we call new, we pass it one parameter: the type of object we want to create. We store this in a private variable called $class_name. (You can also pass extra parameters to new if you want. Some modules use this for special initialization routines.)

Next, we make $self a reference to a new, empty hash. The syntax my ($self) = {}; is an idiom that's used mostly in Perl object programming, and we'll see how it works in some of our methods. (The technical term is that $self holds a reference to an anonymous hash, if you want to read more about it elsewhere.)

Third, we use the bless function. You give this function two parameters: a variable that you want to make into an object, and the type of object you want it to be. This is the line that makes the magic happen!

Fourth, we'll set a property called ``_created''. This property isn't really that useful, but it does show the syntax for accessing the contents of an object: $object_name->{property_name}.

Finally, now that we've made $self into a new TutorialConfig object, we return it.

Our program to create a TutorialConfig object looks like this:

    use TutorialConfig;
    $tut = new TutorialConfig;

(You don't need to use parentheses here, unless your object's new method takes any extra parameters. But if you feel more comfortable writing $tut = new TutorialConfig();, it'll work just as well.)

When you run this code, you'll see:

    TutorialConfig is successfully loaded!
    We just created our new variable...
    and now it's a TutorialConfig object!
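If you'd like to confirm what bless did, ref() reports the class a reference has been blessed into. The sketch below repeats the new method (minus the warn statements) inside one file so that it runs on its own; it also uses the arrow form TutorialConfig->new, which calls the same method as new TutorialConfig.

```perl
use strict;
use warnings;

{
    package TutorialConfig;

    sub new {
        my ($class_name) = @_;
        my $self = {};
        bless $self, $class_name;
        $self->{'_created'} = 1;
        return $self;
    }
}

my $tut = TutorialConfig->new;

print ref($tut), "\n";             # the class the reference was blessed into
print $tut->{'_created'}, "\n";    # reading a property directly
print "It is a TutorialConfig\n" if $tut->isa('TutorialConfig');
```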

Now that we have a class and we can create objects with it, let's make our class do something!

Our Goal, Part 2

Look at our goals again. We need to write three methods for our TutorialConfig module: read, get and set.

The first method, read, obviously requires that we tell it what file we want to read. Notice that when we write the source code for this method, we must give it two parameters. The first parameter is the object we're using, and the second is the filename we want to use. We'll use return to indicate whether the file was successfully read.

   sub read {
      my ($self, $file) = @_;
      my ($line, $section);

      open (CONFIGFILE, $file) or return 0;

      # We'll set a special property 
      # that tells what filename we just read.
      $self->{'_filename'} = $file;



      while ($line = <CONFIGFILE>) {

         # Are we entering a new section?
         if ($line =~ /^\[(.*)\]/) {
            $section = $1;
         } elsif ($line =~ /^([^=]+)=(.*)/) {
            my ($config_name, $config_val) = ($1, $2);
            if ($section) {
               $self->{"$section.$config_name"} = $config_val;
            } else {
               $self->{$config_name} = $config_val;
            }
         }
      }

      close CONFIGFILE;
      return 1;
   }
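The two regular expressions in read do all of the parsing, so it can help to try them on a few lines without involving a file. The following sketch applies the same patterns to an in-memory list; parse_lines is a hypothetical helper used only for this demonstration, and the chomp is an addition so that the sample lines behave the same with or without trailing newlines.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the patterns from read() to a list of lines
# and return the resulting section.name => value pairs as a hash reference.
sub parse_lines {
    my (@lines) = @_;
    my (%config, $section);
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^\[(.*)\]/) {             # [section] header
            $section = $1;
        } elsif ($line =~ /^([^=]+)=(.*)/) {    # name=value pair
            my ($name, $value) = ($1, $2);
            my $key = $section ? "$section.$name" : $name;
            $config{$key} = $value;
        }
    }
    return \%config;
}

my $config = parse_lines(
    'debug=1',
    '[author]',
    'firstname=Doug',
    '[site]',
    'url=http://www.perl.com/',
);

print "$_ => $config->{$_}\n" for sort keys %$config;
```

Note that a pair appearing before any section header (debug here) is stored under its bare name, exactly as the else branch in read does.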

Now that we've read a configuration file, we need to look at the values we just read. We'll call this method get, and it doesn't have to be complex:


    sub get {
        my ($self, $key) = @_;

        return $self->{$key};
    }

These two methods are really all we need to begin experimenting with our TutorialConfig object. Take the module and sample configuration file from above (or download the configuration file here and the module here), save the configuration file as tutc.txt, and then run this simple program:


    use TutorialConfig;

    $tut = new TutorialConfig;
    $tut->read('tutc.txt') or die "Couldn't read config file: $!";

    print "The author's first name is ", 
             $tut->get('author.firstname'), 
             ".\n";

(Notice the syntax for calling an object's methods: $object->method(parameters).)

When you run this program, you'll see something like this:

    TutorialConfig is successfully loaded!
    We just created our new variable...
    and now it's a TutorialConfig object!
    The author's first name is Doug.
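The $object->method(parameters) syntax passes the object itself as the first parameter, which is why every method begins by unpacking $self from @_. A small sketch, using a made-up Greeter class:

```perl
use strict;
use warnings;

{
    package Greeter;    # hypothetical class for illustration only

    sub new {
        my ($class, $name) = @_;
        return bless { name => $name }, $class;
    }

    sub greet {
        my ($self, $greeting) = @_;    # $self is the object left of the arrow
        return "$greeting, $self->{name}!";
    }
}

my $g = Greeter->new('Doug');

# These two calls do the same thing; the arrow form also finds the
# right package for us (and respects inheritance).
print $g->greet('Hello'), "\n";
print Greeter::greet($g, 'Hello'), "\n";
```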

We now have an object that will read configuration files and show us values inside those files. This is good enough, but we've decided to make it better by writing a set method that allows us to add or change configuration values from within our program:

    sub set {
        my ($self, $key, $value) = @_;

        $self->{$key} = $value;
    }

Now let's test it out:

    use TutorialConfig;
    $tut = new TutorialConfig;

    $tut->read('tutc.txt') or die "Can't read config file: $!";
    $tut->set('author.country', 'Canada');

    print $tut->get('author.firstname'), " lives in ",
          $tut->get('author.country'), ".\n";

These three methods (read, get and set) are everything we'll need for our TutorialConfig.pm module. More complex modules might have dozens of methods!

Encapsulation

You may be wondering why we have get and set methods at all. Why are we using $tut->set('author.country', 'Canada') when we could use $tut->{'author.country'} = 'Canada' instead? There are two reasons to use methods instead of playing directly with an object's properties.

First, you can generally trust that a module won't change its methods, no matter how much their implementation changes. Someday, we might want to switch from using text files to hold configuration information to using a database like MySQL or Postgres. Our new TutorialConfig.pm module might have new, read, get and set methods that look like this:

      sub new {
          my ($class) = @_;
          my ($self) = {};
          bless $self, $class;
          return $self;
      }

      sub read {
          my ($self, $file) = @_;
          my ($db) = database_connect($file);
          if ($db) {
              $self->{_db} = $db;
              return $db;
          }
          return 0;
      }

      sub get {
          my ($self, $key) = @_;
          my ($db) = $self->{_db};

          my ($value) = database_lookup($db, $key);
          return $value;
      }

      sub set {
          my ($self, $key, $value) = @_;
          my ($db) = $self->{_db};

          my ($status) = database_set($db, $key, $value);
          return $status;
      }

(Our module would define the database_connect, database_lookup and database_set routines elsewhere.)

Even though the entire module's source code has changed, all of our methods still have the same names and syntax. Code that uses these methods will continue working just fine, but code that directly manipulates properties will break!

For instance, let's say you have some code that contains this line to set a configuration value:

     $tut->{'author.country'} = 'Canada';

This works fine with the original TutorialConfig.pm module, because when you call $tut->get('author.country'), it looks in the object's properties and returns ``Canada'' just like you expected. So far, so good. However, when you upgrade to the new version that uses databases, the code will no longer return the correct result. Instead of get() looking in the object's properties, it'll go to the database, which won't contain the correct value for ``author.country''! If you'd used $tut->set('author.country', 'Canada') all along, things would work fine.
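Here is a rough sketch of that failure. A plain hash stands in for the database, since database_connect and friends are left undefined above; the names NewConfig and %fake_db are invented for the demonstration.

```perl
use strict;
use warnings;

{
    package NewConfig;    # hypothetical stand-in for the database version

    my %fake_db;          # pretend this lives in MySQL or Postgres

    sub new { my ($class) = @_; return bless {}, $class; }

    sub set {
        my ($self, $key, $value) = @_;
        $fake_db{$key} = $value;
        return 1;
    }

    sub get {
        my ($self, $key) = @_;
        return $fake_db{$key};
    }
}

my $tut = NewConfig->new;

# Bypassing the method: the value lands in the object's own hash,
# where the new get() never looks.
$tut->{'author.country'} = 'Canada';
print defined($tut->get('author.country')) ? "found\n" : "not found\n";

# Going through the method keeps working.
$tut->set('author.country', 'Canada');
print $tut->get('author.country'), "\n";
```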

As a module author, writing methods will let you make changes (bug fixes, enhancements, or even complete rewrites) without requiring your module's users to rewrite any of their code.

Second, using methods lets you avoid impossible values. You might have an object that takes a person's age as a property. A person's age must be a positive number (you can't be -2 years old!), so the age() method for this object will reject negative numbers. If you bypass the method and directly manipulate $obj->{'age'}, you may cause problems elsewhere in the code (a routine to calculate the person's birth year, for example, might fail or produce an odd result).

As a module author, you can use methods to help programmers who use your module write better software. You can write a good error-checking routine once, and it will be used many times.
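A sketch of such an age() method might look like this. The Person class is hypothetical, and the rule it enforces (whole, non-negative numbers only) is one reasonable reading of "no impossible values", not the only one.

```perl
use strict;
use warnings;

{
    package Person;    # hypothetical class for illustration only

    sub new { my ($class) = @_; return bless { age => 0 }, $class; }

    # Combined getter/setter: with an argument it validates and stores,
    # without one it just returns the current age.
    sub age {
        my ($self, $new_age) = @_;
        if (defined $new_age) {
            if ($new_age !~ /\A[0-9]+\z/) {
                warn "Rejected invalid age '$new_age'\n";
                return;
            }
            $self->{age} = $new_age;
        }
        return $self->{age};
    }
}

my $p = Person->new;
$p->age(34);
$p->age(-2);              # warns and leaves the age alone
print $p->age(), "\n";    # 34
```

The check lives in one place, so every caller that goes through age() gets it for free.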

(Some languages, by the way, enforce encapsulation, by giving you the ability to make certain properties private. Perl doesn't do this. In Perl, encapsulation isn't the law, it's just a very good idea.)

Play Around!

1. Our TutorialConfig.pm module could use a method that will write a new configuration file to any filename you desire. Write your own write() method (use keys %$self to get the keys of the object's properties). Be sure to use the or operator to warn if the file couldn't be opened!

2. Write a BankAccount.pm module. Your BankAccount object should have deposit, withdraw, and balance methods. Make the withdraw method fail if you try to withdraw more money than you have, or deposit or withdraw a negative amount of money.

3. CGI.pm also lets you use objects if you want. (Each object represents a single CGI query.) The method names are the same as the CGI functions we used in the last article:

    use CGI;
    $cgi = new CGI;

    print $cgi->header(), $cgi->start_html();
    print "The 'name' parameter is ", $cgi->param('name'), ".\n";
    print $cgi->end_html();

Try rewriting one of your CGI programs to use CGI objects instead of the CGI functions.

4. A big advantage of using CGI objects is that you can store and retrieve queries on disk. Take a look in the CGI.pm documentation to learn how to use the save() method to store queries, and how to pass a filehandle to new to read them from disk. Try writing a CGI program that saves recently used queries for easy retrieval.


This Week on p5p 2000/12/17




Object creation and destruction

Ilya came up with another startling patch this week, but this one was a little more complex: he estimates, however, that it "decreases the overhead of creation/destruction of objects 3-4 times". It does this by applying the same "handler" mechanism that operator overloading uses to DESTROY methods. He also says that "other handlers can be easily done the same way", which (I think) means that similar speed-ups are possible in BEGIN and END blocks.
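For readers who haven't met DESTROY: it is the method Perl calls automatically when the last reference to an object goes away, which is why its call overhead matters for every object. A minimal sketch, with a made-up class name:

```perl
use strict;
use warnings;

my $destroyed = 0;

{
    package TempThing;    # hypothetical class for illustration only

    sub new { my ($class) = @_; return bless {}, $class; }

    # Called by perl itself when the object's reference count hits zero.
    sub DESTROY { $destroyed++; }
}

{
    my $thing = TempThing->new;
    print "inside the block: $destroyed destroyed\n";
}   # $thing goes out of scope here, so DESTROY runs

print "after the block: $destroyed destroyed\n";
```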

More cool PerlIO stuff

Nick has been working his magic yet again; you can now say

    use Encode;
    open($fh,"<encoding(iso8859-7)",$greek) 
      || die "Cannot open $greek:$!";

This makes me ecstatically happy.

He also suggested it should be possible to use a scalar as data to be read from a filehandle, something I imagine every programmer's wanted to do at least once in their life. That is:

    $data = "...";
    open FH, "<", \$data or die "This really can't happen";
    while (<FH>) {
        ...
    }

While he hasn't coded this yet, it shouldn't be very difficult, and would be a nice general solution to all sorts of problems.

What to do with bugs

Jarkko pointed out that there was a problem in the bug process, in that bugs can very easily be forgotten about without a trace, and there's no feedback between us and the bug reporter. We obviously want to fix this, so there's communication with the reporter, and so we make sure that every bug is dealt with and doesn't get forgotten.

Richard Foley suggested sending out a reminder of open bugs to the bug admins and p5p; there was then some wonderful set theory mathematics about which categories of currently open bugs we should start the ball rolling with and how many there were. This process will definitely happen with bugs submitted from now on, though. Read about it.

UV Preserving Arithmetic

Nicholas Clark's great work with UV/IV preserving arithmetic (You know, so that $a=3; $b=5; $a + $b results in an IV, not an NV) seems to have collapsed around his feet. It seemed like everything was going well, after a couple of 70-odd K patches starting this thread managed to get the whole thing working quickly and accurately, but then Jarkko discovered it was giving nasty results on some platforms. Helmut Jarausch found it failing on Irix, and Jarkko was having it cause strange problems on Digital Unix. The patches have been pulled out of the repository temporarily, and Nick is reportedly looking for access to "something slightly more esoteric than FreeBSD".

STOP PRESS! This from Nick:

Jarkko and I worked hard at hammering out the problems - I was relying on strtoul behaviour when presented with a number with leading "-" working everywhere the way it does on FreeBSD, linux (and Irix Jarkko reports) but it didn't work that way on Digital Unix, and that's where his nasty problems came from. I re-wrote that bit of code in sv.c to avoid even passing in the string with the leading "-" and with this solution it works everywhere we've tested, and seems to be in the repository as of yesterday (Saturday)
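If you want to see which representation a result ends up in, the core B module can report a scalar's internal flags. This is only a sketch: storage_kind is a hypothetical helper, and the integer result assumes a perl that includes IV-preserving arithmetic.

```perl
use strict;
use warnings;
use B ();

# Report how perl is currently storing a scalar, given a reference to it.
sub storage_kind {
    my ($ref) = @_;
    my $flags = B::svref_2object($ref)->FLAGS;
    return 'integer' if $flags & B::SVf_IOK();
    return 'float'   if $flags & B::SVf_NOK();
    return 'other';
}

my ($x, $y) = (3, 5);
my $sum     = $x + $y;

my ($p, $q) = (1.5, 1.25);
my $frac    = $p + $q;

print "3 + 5 is stored as: ",      storage_kind(\$sum),  "\n";
print "1.5 + 1.25 is stored as: ", storage_kind(\$frac), "\n";
```

On a perl with this work applied, the first line should report integer and the second float; without it, both sums would come back as floating-point NVs.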

Code checkers

Last time I asked:

(Hey, maybe someone would like to try writing a program that automatically extracts example code from the documentation and makes sure it compiles?)

Well, Tels did it, and produced Pod::Checker::Code; initially, it checked the synopsis and examples sections of module documentation, but I suggested it should be extended to the example code in /pod. Tels duly did that, but this raised another question: how to mark up the code appropriately so that it can be checked? Various ideas were raised, the most promising being

    =code perl

    =back

I don't think anything was actually decided; check the rest of the thread for details.
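
To give a feel for the idea, here is a rough sketch of such a checker. It is not how Pod::Checker::Code works (I haven't shown its interface here); it simply assumes that every indented paragraph in a hypothetical POD string is example code, and tries to compile each one without running it.

```perl
use strict;
use warnings;

# Hypothetical sample documentation: one good example, one broken one.
my $pod = <<'END_POD';
=head1 SYNOPSIS

    my $total = 2 + 3;

=head1 BROKEN

    my $oops = ;

=cut
END_POD

# Treat every indented paragraph as example code (a crude approximation
# of POD's "verbatim paragraph" rule).
my @examples = grep { /^[ \t]+\S/ } split /\n{2,}/, $pod;

my @results;
for my $code (@examples) {
    # Wrapping the code in an anonymous sub compiles it without running it.
    my $compiled = eval "sub { $code\n }";
    push @results, defined $compiled ? 'ok' : 'FAILED';
}

print "$_\n" for @results;
```

A real tool would use a proper POD parser and would need the markup discussed above to know which blocks are Perl at all.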

Precedence

Jeff Pinyan reminded us again that the output of

    $x = 10;
    print ++$x / $x;

is not what you might expect; Johan Vromans trumped this with

    $i = 1; @a = ($i++, $i++, $i++, $i++, $i++);
    $i = 0; @b = (++$i, ++$i, ++$i, ++$i, ++$i);
    print "@a\n@b\n";

Now, we all know that if you do things with multiple side-effects at the same time, Weird Things occur. However, there was some argument as to whether this ought to be the case. Johan said:

I expect

$x = 10; print ++$x / $x--;

to produce the same output as:

$x = 10; $x1 = sub { ++$x }; $x2 = sub { $x-- }; print $x1->() / $x2->();

If not, it's a bug, or we'd better have a _good_ explanation.

Various people disagreed; it's not just a question of side-effects, but also a question of the order of evaluation of operands. C leaves the order of evaluation undefined, but do we want Perl to go this way? Nicholas Clark maintained that keeping the order undefined allows for flexibility in the implementation: if we promise a certain evaluation order, but then someone comes up with a huge speed increase hack which jiggles the evaluation order, we can't use it. John Peacock summed it up rather differently: "Doctor, it hurts when I do this!". Read about it.
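
Whatever the outcome of that argument, the doctor's advice is easy to follow: give each side-effect its own statement, and the order of evaluation stops mattering. A small sketch (the variable names are just for illustration):

```perl
use strict;
use warnings;

my $x = 10;

# Evaluate each operand in its own statement so the order is explicit.
my $left  = ++$x;   # $x becomes 11, $left is 11
my $right = $x--;   # $right is 11, $x drops back to 10
my $ratio = $left / $right;

print "$ratio\n";
```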

Reminder about the FAQ

Daniel Stutz asked how to get the latest development sources (answer: read perlhack, it tells you - well, the perlhack in the latest development sources does...) and also complained that "it's very hard to find information about P5P". Various people pointed out that P5P is mentioned in the first part of the Perl FAQ, and I finally remembered to post the FAQ.

Do you know about the P5P FAQ? You should. Email perl5-porters-faq@perl.org for a copy, or read it here.

Various

Until next week I remain, your humble and obedient servant,


Simon Cozens

Beginners Intro to Perl - Part 4

Editor's note: this venerable series is undergoing updates. You might be interested in the newer versions, available at:

It's CGI time

Beginners Intro to Perl

Part 1 of this series
Part 2 of this series
Part 3 of this series
Part 5 of this series
Part 6 of this series

What is CGI?
A Real CGI Program
Uh-Oh!
Our Second Script
Sorting
Trust No One
Play Around!

So far, we've talked about Perl as a language for mangling numbers, strings, and files - the original purpose of the language. Now it's time to talk about what Perl does on the Web. In this installment, we're going to talk about CGI programming.

What is CGI?

The Web is based on a client-server model: your browser (the client) making requests to a Web server. Most of these are simple requests for documents or images, which the server delivers to the browser for display.

Of course, sometimes you want the server to do more than just dump the contents of a file. You'd like to do something with a server-side program - whether that "something" is using Web-based e-mail, looking up a phone number in a database or ordering a copy of Evil Geniuses in a Nutshell for your favorite techie. This means the browser must be able to send information (an e-mail address, a name to look up, shipping information for a book) to the server, and the server must be able to use that information and return the results to the user.

The standard for communication between a user's Web browser and a server-side program running on the Web server is called CGI, or Common Gateway Interface. It is supported by all popular Web server software. To get the most out of this article, you will need to have a server that supports CGI. This may be a server running on your desktop machine or an account with your ISP (though probably not a free Web-page service). If you don't know whether you have CGI capabilities, ask your ISP or a local sysadmin how to set things up.

Notice that I haven't described how CGI works; that's because you don't need to know. There's a standard Perl module called CGI.pm that will handle the CGI protocol for you. CGI.pm is part of the core Perl distribution, and any properly installed Perl should have it available.

Telling your CGI program that you want to use the CGI module is as simple as this:

use CGI ':standard';

The use CGI ':standard'; statement tells Perl that you want to use the CGI.pm module in your program. This will load the module and make a set of CGI functions available for your code.

A Real CGI Program

Let's write our first real CGI program. Instead of doing something complex, we'll write something that will simply throw back whatever we throw at it. We'll call this script backatcha.cgi:

#!/usr/local/bin/perl

use CGI ':standard';

print header();
print start_html();

for $i (param()) {
    print "<b>", $i, "</b>: ", param($i), "<br>\n";
}

print end_html();

If you've never used HTML, the pair of <b> and </b> tags mean "begin bold" and "end bold", respectively, and the <br> tag means "line break." (A good paper reference to HTML is O'Reilly's HTML & XHTML: The Definitive Guide, and online, I like the Web Design Group.)

Install this program on your server and do a test run. (If you don't have a Web server of your own, we've put a copy online for you here.) Here's a short list of what you do to install a CGI program:

  1. Make sure the program is placed where your Web server will recognize it as a CGI script. This may be a special cgi-bin directory or making sure the program's filename ends in .pl or .cgi. If you don't know where to place the program, your ISP or sysadmin should.
  2. Make sure the program can be run by the server. If you are using a Unix system, you may have to give the Web-server user read and execute permission for the program. It's easiest to give these permissions to everybody by using chmod 755 filename.
  3. Make a note of the program's URL (which will probably be something like http://server name/cgi-bin/backatcha.cgi) and go to that URL in your browser. (Take a guess what you should do if you don't know what the URL of the program is. Hint: It involves the words "ask," "your" and "ISP.")

If this works, you will see in your browser ... a blank page! Don't worry, this is what is supposed to happen. The backatcha.cgi script throws back what you throw at it, and we haven't thrown anything at it yet. We'll give it something to show us in a moment.

If it didn't work, you probably saw either an error message or the source code of the script. We'll try to diagnose these problems in the next section.

Uh-Oh!

If you saw an error message, your Web server had a problem running the CGI program. This may be a problem with the program or the file permissions.

First, are you sure the program has the correct file permissions? Did you set the file permissions on your program to 755? If not, do it now. (Windows Web servers will have a different way of doing this.) Try it again; if you see a blank page now, you're good.

Second, are you sure the program actually works? (Don't worry, it happens to the best of us.) Change the use CGI line in the program to read:

use CGI ':standard', '-debug';

Now run the program from the command line. You should see the following:

(offline mode: enter name=value pairs on standard input)

This message indicates that you're testing the script. You can now press Ctrl-D to tell the script to continue running without telling it any form items.

If Perl reports any errors in the script, you can fix them now.

(The -debug option is incredibly useful. Use it whenever you have problems with a CGI program, and ignore it at your peril.)

The other common problem is that you're seeing the source code of your program, not the result of running your program. There are two simple problems that can cause this.

First, are you sure you're going through your Web server? If you use your browser's "load local file" option (to look at something like /etc/httpd/cgi-bin/backatcha.cgi instead of something like http://localhost/cgi-bin/backatcha.cgi), you aren't even touching the Web server! Your browser is doing what you "wanted" to do: loading the contents of a local file and displaying them.

Second, are you sure the Web server knows it's a CGI program? Most Web server software will have a special way of designating a file as a CGI program, whether it's a special cgi-bin directory, the .cgi or .pl extension on a file, or something else. Unless you live up to these expectations, the Web server will think the program is a text file, and serve up your program's source code in plain-text form. Ask your ISP for help.

CGI programs are unruly beasts at the best of times; don't worry if it takes a bit of work to make them run properly.

Making the Form Talk Back

At this point, you should have a working copy of backatcha.cgi spitting out blank pages from a Web server. Let's make it actually tell us something. Take the following HTML code and put it in a file:

<FORM ACTION="putyourURLhere" METHOD=GET>
    <P>What is your favorite color? <INPUT NAME="favcolor"></P>
<INPUT TYPE=submit VALUE="Send form">
</FORM>

Be sure to replace putyourURLhere with the actual URL of your copy of backatcha.cgi! If you want, you can use the copy installed here at Perl.com.

This is a simple form. It will show a text box where you can enter your favorite color and a "submit" button that sends your information to the server. Load this form in your browser and submit a favorite color. You should see this returned from the server:

favcolor: green

CGI functions

The CGI.pm module loads several special CGI functions for you. What are these functions?

The first one, header(), is used to output any necessary HTTP headers before the script can display HTML output. Try taking this line out; you'll get an error from the Web server when you try to run it. This is another common source of bugs!

The start_html() function is there for convenience. It returns a simple HTML header for you. You can pass parameters to it by using a hash, like this:

print start_html( -title => "My document" );

(The end_html() function is similar, but outputs the footers for your page.)

Finally, the most important CGI function is param(). Call it with the name of a form item, and a list of all the values of that form item will be returned. (If you ask for a scalar, you'll only get the first value, no matter how many there are in the list.)

$yourname = param("firstname");
print "<P>Hi, $yourname!</P>\n";

If you call param() without giving it the name of a form item, it will return a list of all the form items that are available. This form of param() is the core of our backatcha script:

for $i (param()) {
    print "<b>$i</b>: ", param($i), "<br>\n";
}

Remember, a single form item can have more than one value. You might encounter code like this on the Web site of a pizza place that takes orders over the Web:

    <P>Pick your toppings!<BR>
       <INPUT TYPE=checkbox NAME=top VALUE=pepperoni> Pepperoni <BR>
       <INPUT TYPE=checkbox NAME=top VALUE=mushrooms> Mushrooms <BR>
       <INPUT TYPE=checkbox NAME=top VALUE=ham> Ham <BR>
    </P>

Someone who wants all three toppings would submit a form where the form item top has three values: "pepperoni," "mushrooms" and "ham." The server-side code might include this:

    print "<P>You asked for the following pizza toppings: ";
    @top = param("top");
    for $i (@top) {
        print $i, ". ";
    }
    print "</P>";

Now, here's something to watch out for. Take another look at the pizza-topping HTML code. Try pasting that little fragment into the backatcha form, just above the <INPUT TYPE=submit...> tag. Enter a favorite color, and check all three toppings. You'll see this:

    favcolor: burnt sienna
    top: pepperonimushroomsham

Why did this happen? When you call param('name'), you get back a list of all of the values for that form item. This could be considered a bug in the backatcha.cgi script, but it's easily fixed - use join() to separate the item values:

    print "<b>$i</b>: ", join(', ', param($i)), "<br>\n";

or call param() in a scalar context first to get only the first value:

    $j = param($i);
    print "<b>$i</b>: $j<br>\n";

Always keep in mind that form items can have more than one value!
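
If the difference between list and scalar context is still fuzzy, here is a small sketch you can run without a Web server. The form_value() sub and the %form hash are stand-ins invented for this example; they only imitate the behavior of param() described above.

```perl
use strict;
use warnings;

# Hypothetical form data, standing in for what a browser would submit.
my %form = (
    favcolor => ['burnt sienna'],
    top      => [qw(pepperoni mushrooms ham)],
);

# Stand-in for param(): all values in list context, the first in scalar context.
sub form_value {
    my ($name) = @_;
    my @values = @{ $form{$name} || [] };
    return wantarray ? @values : $values[0];
}

my $all   = join ', ', form_value('top');   # list context
my $first = form_value('top');              # scalar context

print "all: $all\nfirst: $first\n";
```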

Our Second Script

So now we know how to build a CGI program, and we've seen a simple example. Let's write something useful. In the last article, we wrote a pretty good HTTP log analyzer. Why not Web-enable it? This will allow you to look at your usage figures from anywhere you can get to a browser.

Download the source code for the HTTP log analyzer

First, let's decide what we want to do with our analyzer. Instead of showing all of the reports we generate at once, we'll show only those the user selects. Second, we'll let the user choose whether each report shows the entire list of items, or the top 10, 20 or 50 sorted by access count.

We'll use a form such as this for our user interface:

    <FORM ACTION="/cgi-bin/http-report.pl" METHOD=POST>
        <P>Select the reports you want to see:</P>

        <P><INPUT TYPE=checkbox NAME=report VALUE=url>URLs requested<BR>
           <INPUT TYPE=checkbox NAME=report VALUE=status>Status codes<BR>
           <INPUT TYPE=checkbox NAME=report VALUE=hour>Requests by hour<BR>
           <INPUT TYPE=checkbox NAME=report VALUE=type>File types
        </P>

        <P><SELECT NAME="number">
            <OPTION VALUE="ALL">Show all
            <OPTION VALUE="10">Show top 10
            <OPTION VALUE="20">Show top 20
            <OPTION VALUE="50">Show top 50
        </SELECT></P>

        <INPUT TYPE=submit VALUE="Show report">
    </FORM>

(Remember that you may need to change the URL!)

We're sending two different types of form item in this HTML page. One is a series of checkbox widgets, which set values for the form item report. The other is a single drop-down list which will assign a single value to number: either ALL, 10, 20 or 50.

Take a look at the original HTTP log analyzer. We'll start with two simple changes. First, the original program gets the filename of the usage log from a command-line argument:

      # We will use a command line argument to determine the log filename.
      $logfile = $ARGV[0];

We obviously can't do that now, since the Web server won't allow us to enter a command line for our CGI program! Instead, we'll hard-code the value of $logfile. I'll use "/var/log/httpd/access_log" as a sample value.

      $logfile = "/var/log/httpd/access_log";

Second, we must make sure that we output all the necessary headers to our Web server before we print anything else:

      print header();
      print start_html( -title => "HTTP Log report" );

Now look at the report() sub from our original program. It has one problem, relative to our new goals: It outputs all the reports instead of only the ones we've selected. We'll rewrite report() so that it will cycle through all the values of the report form item and show the appropriate report for each.

    sub report {
        for $i (param('report')) {
            if ($i eq 'url') {
                report_section("URL requests", %url_requests);
            } elsif ($i eq 'status') {
                report_section("Status code requests", %status_requests);
            } elsif ($i eq 'hour') {
                report_section("Requests by hour", %hour_requests);
            } elsif ($i eq 'type') {
                report_section("Requests by file type", %type_requests);
            }
        }
    }

Finally, we rewrite the report_section() sub to output HTML instead of plain text. (We'll discuss the new way we're using sort in a moment.)

    sub report_section {
        my ($header, %type) = @_;
        my (@type_keys);

        # Are we sorting by the KEY, or by the NUMBER of accesses?
        if (param('number') ne 'ALL') {
            @type_keys = sort { $type{$b} <=> $type{$a}; } keys %type;

            # Chop the list if we have too many results
            if ($#type_keys > param('number') - 1) {
                $#type_keys = param('number') - 1;
            }
        } else {
            @type_keys = sort keys %type;
        }

        # Begin an HTML table
        print "<TABLE>\n";

        # Print a table row containing a header for the table
        print "<TR><TH COLSPAN=2>", $header, "</TH></TR>\n";

        # Print a table row containing each item and its value
        for $i (@type_keys) {
            print "<TR><TD>", $i, "</TD><TD>", $type{$i}, "</TD></TR>\n";
        }

        # Finish the table
        print "</TABLE>\n";
    }

Sorting

Perl allows you to sort lists with the sort keyword. By default, the sort will happen alphanumerically: numbers before letters, uppercase before lowercase. This is sufficient 99 percent of the time. The other 1 percent of the time, you can write a custom sorting routine for Perl to use.

This sorting routine is just like a small sub. In it, you compare two special variables, $a and $b, and return one of three values depending on how you want them to show up in the list. Returning -1 means "$a should come before $b in the sorted list," 1 means "$b should come before $a in the sorted list" and 0 means "they're equal, so I don't care which comes first." Perl will run this routine to compare each pair of items in your list and produce the sorted result.

For example, if you have a hash called %type, here's how you might sort its keys in descending order of their values in the hash.

    sort {
        if ($type{$b} > $type{$a}) { return 1; }
        if ($type{$b} < $type{$a}) { return -1; }
        return 0;
    } keys %type;

In fact, numeric sorting happens so often, Perl gives you a convenient shorthand for it: the <=> operator. This operator will perform the above comparison between two values for you and return the appropriate value. That means we can rewrite that test as:

    sort { $type{$b} <=> $type{$a}; } keys %type

(And this, in fact, is what we use in our log analyzer.)
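
Here is a self-contained sketch of the same idea that you can run from the command line. The hash contents and the $limit variable are made up for the example; the sort block and the list-chopping trick are the ones used in the log analyzer.

```perl
use strict;
use warnings;

# Hypothetical access counts keyed by URL.
my %type = (
    '/index.html' => 42,
    '/about.html' => 7,
    '/faq.html'   => 19,
    '/logo.png'   => 130,
);

my $limit = 2;   # pretend the user asked for the "top 2"

# Highest counts first.
my @by_count = sort { $type{$b} <=> $type{$a} } keys %type;

# Keep only the first $limit keys if there are more than that.
$#by_count = $limit - 1 if @by_count > $limit;

print "$_: $type{$_}\n" for @by_count;
```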

You can also compare strings with sort. The lt and gt operators are the string equivalents of < and >, and cmp will perform the same test as <=>. (Remember, string comparisons will sort numbers before letters and uppercase before lowercase.)

For example, you have a list of names and phone numbers in the format "John Doe 555-1212." You want to sort this list by the person's last name, and sort by first name when the last names are the same. This is a job made for cmp!

     @sorted = sort {
         ($c) = ($a =~ / (\w+)/);
         ($d) = ($b =~ / (\w+)/);
         if ($c eq $d) {   # Last names are the same, sort on first name
             ($c) = ($a =~ /^(\w+)/);
             ($d) = ($b =~ /^(\w+)/);
             return $c cmp $d;
         } else {
             return $c cmp $d;
         }
     } @phone_numbers;
     for $i (@sorted) { print $i, "\n"; }
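
A more compact way to write the same comparison, offered as a sketch rather than the one true way, is to chain two cmp tests with ||. The sample names and numbers below are invented, and the code assumes every entry has exactly a first name, a last name and a number separated by spaces.

```perl
use strict;
use warnings;

# Hypothetical sample data in "First Last 555-1212" form.
my @phone_numbers = (
    'John Doe 555-1212',
    'Alice Smith 555-0100',
    'Jane Doe 555-3434',
);

my @sorted = sort {
    my ($first_a, $last_a) = split ' ', $a;
    my ($first_b, $last_b) = split ' ', $b;
    # cmp returns 0 for equal last names, so || falls through to first names.
    $last_a cmp $last_b || $first_a cmp $first_b;
} @phone_numbers;

print "$_\n" for @sorted;
```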

Trust No One

Now that we know how CGI programs can do what you want, let's make sure they won't do what you don't want. This is harder than it looks, because you can't trust anyone to do what you expect.

Here's a simple example: You want to make sure the HTTP log analyzer will never show more than 50 items per report, because it takes too long to send larger reports to the user. The easy thing to do would be to eliminate the "ALL" line from our HTML form, so that the only remaining options are 10, 20 and 50. It would be very easy - and wrong.

Download the source code for the HTTP analyzer with security enhancements.

We saw that you can modify HTML forms when we pasted the pizza-topping sample code into our backatcha page. You can also use the URL to pass form items to a script - try going to http://www.perl.com/2000/12/backatcha.cgi?itemsource=URL&typedby=you in your browser. Obviously, if someone can do this with the backatcha script, they can also do it with your log analyzer and stick any value for number in that they want: "ALL" or "25000", or "four score and seven years ago."

Your form doesn't allow this, you say. Who cares? People will write custom HTML forms to exploit weaknesses in your programs, or will just pass bad form items to your script directly. You cannot trust anything users or their browsers tell you.

You eliminate these problems by knowing what you expect from the user, and disallowing everything else. Whatever you do not expressly permit is totally forbidden. Secure CGI programs consider everything guilty until it is made innocent.

For example, we want to limit the size of reports from our HTTP log analyzer. We decide that means the number form item must have a value that is between 10 and 50. We'll verify it like this:

    # Make sure that the "number" form item has a reasonable value
    ($number) = (param('number') =~ /(\d+)/);
    if ($number < 10) {
        $number = 10;
    } elsif ($number > 50) {
        $number = 50;
    }

Of course, we also have to change the report_section() sub so it uses the $number variable. Now, whether your user tries to tell your log analyzer that the value of number is "10," "200," "432023," "ALL" or "redrum," your program will restrict it to a reasonable value.
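
You can convince yourself of this without a Web server. The sketch below wraps the same check in a sub (clamp_number is a name I've made up) and feeds it the values mentioned above; it assumes that a value with no digits at all should be treated as the minimum.

```perl
use strict;
use warnings;

# Reduce any submitted value to an integer between 10 and 50.
sub clamp_number {
    my ($raw) = @_;
    my ($number) = (defined $raw ? $raw : '') =~ /(\d+)/;
    $number = 0 unless defined $number;   # no digits at all, e.g. "ALL"
    return 10 if $number < 10;
    return 50 if $number > 50;
    return $number;
}

for my $input ('10', '200', '432023', 'ALL', 'redrum', '25') {
    printf "%-8s -> %d\n", $input, clamp_number($input);
}
```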

We don't need to do anything with report, because we only act when one of its values is something we expected. If the user tries to enter something other than our expressly permitted values ("url," "status," "hour" or "type"), we just ignore it.

Use this sort of logic everywhere you know what the user should enter. You might use s/\D//g to remove non-numeric characters from items that should be numbers (and then test to make sure what's left is within your range of allowable numbers!), or /^\w+$/ to make sure that the user entered a single word.
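
As a sketch of those two checks in action (the sub names and the 1 to 99 range are assumptions made for this example, not part of the log analyzer):

```perl
use strict;
use warnings;

# Strip everything except digits, then range-check what is left.
# The allowed range of 1 to 99 is a hypothetical choice.
sub clean_quantity {
    my ($raw) = @_;
    (my $digits = $raw) =~ s/\D//g;
    return undef if $digits eq '';
    return ($digits >= 1 && $digits <= 99) ? $digits + 0 : undef;
}

# Accept only a single word made of letters, digits and underscores.
sub is_single_word {
    my ($raw) = @_;
    return $raw =~ /^\w+$/ ? 1 : 0;
}

print "quantity: ", clean_quantity('12 pizzas'), "\n";
print "green is ", (is_single_word('green') ? "" : "not "), "a single word\n";
```

One thing to be aware of: /^\w+$/ will also accept a word followed by a single trailing newline. If that matters to you, /\A\w+\z/ is stricter.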

All of this has two significant benefits. First, you simplify your error-handling code, because you make sure as early in your program as possible that you're working with valid data. Second, you increase security by reducing the number of "impossible" values that might help an attacker compromise your system or mess with other users of your Web server.

Don't just take my word for it, though. The CGI Security FAQ has more information about safe CGI programming in Perl than you ever thought could possibly exist, including a section listing some security holes in real CGI programs.

Play Around!

You should now know enough about CGI programming to write a useful Web application. (Oh, and you learned a little bit more about sorting and comparison.)

  1. Write the quintessential CGI program: a guestbook. Users enter their name, e-mail address and a short message, which is appended to an HTML file for all to see.

    Be careful! Never trust the user! A good beginning precaution is to disallow all HTML by either removing < and > characters from all of the user's information or replacing them with the &lt; and &gt; character entities.

    Use substr(), too, to cut anything the user enters down to a reasonable size. Asking for a "short" message will do nothing to prevent the user dumping a 500k file into the message field!

  2. Write a program that plays tic-tac-toe against the user. Be sure that the computer AI is in a sub so it can be easily upgraded. (You'll probably need to study HTML a bit to see how to output the tic-tac-toe board.)
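
To get you started on the guestbook, here is a minimal sketch of the two precautions from the first exercise. The sub name and the 200-character limit are my own choices; escaping & as well as < and > goes slightly beyond what the exercise asks for, and is there so that text which already contains an ampersand displays as the user typed it.

```perl
use strict;
use warnings;

my $MAX_LENGTH = 200;   # hypothetical cap on message size

sub sanitize_message {
    my ($text) = @_;
    # Cut overlong input down to size before escaping.
    $text = substr($text, 0, $MAX_LENGTH) if length($text) > $MAX_LENGTH;
    # Escape & first, so the & in &lt; and &gt; is not escaped again.
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

print sanitize_message('<b>Hi & bye</b>'), "\n";
```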

This Week on p5p 2000/12/03



Notes

You can subscribe to an e-mail version of this summary by sending an empty message to p5p-digest-subscribe@plover.com.

Please send corrections and additions to simon@cozens.net.

Tests

I opted not to mention this last week, but Casey Tweten pointed out that it was quite important for module authors: regression tests. You write regression tests, right? Of course, you do. The problem is that there are umpteen gazillion ways to write regression tests, which makes it horrible to debug them and find out what's really happening when a test fails. There's a core module called Test that gives a neat framework for writing tests. There was some noise on perl5-porters to the effect that people wanted the core regression tests to use Test, but the counter-argument was that the core tests should be kept as free from outside interference as possible - the Test module may contain some constructs that the core tests are trying to test. It's no good loading a module to help with your tests if one of your tests is whether you can load a module or not! Nevertheless, it would be nice if some of the more advanced tests were converted to the Test interface. (That was a hint, by the way, for anyone who fancies doing that.)

The real outcome of this was a patch by Casey to convert the standard module template generated by h2xs to use the Test module, and to encourage (i.e. force) module authors to use it. So, module authors, if you don't know about Test.pm, you will soon.
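
If you haven't met Test.pm yet, a test script looks roughly like the sketch below. The add() function is a made-up stand-in for whatever your module does; plan and ok are the functions Test exports, and ok returns true when the test passes.

```perl
use strict;
use warnings;
use Test;

# Declare up front how many tests will run.
BEGIN { plan tests => 3 }

# Hypothetical function under test.
sub add { return $_[0] + $_[1] }

my @passed = (
    ok(1),               # a bare true value passes
    ok(add(2, 2), 4),    # what we got, then what we expected
    ok(add(-1, 1), 0),
);
```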

Charnames

This patch from Ilya took me a while to get my head around, but now I have and I think it's beautiful. When doing Unicode testing and entering Unicode data without a Unicode editor, we have to resort to things like

    $x =
    "\x{395}\x{3CD}\x{3B1}\x{3B3}\x{3B3}\x{3B5}\x{3BB}\x{3CA}\x{3B1}";

or

    $x =
    v917.973.945.947.947.949.955.970.945;

or even

    $x = 
    "\N{GREEK CAPITAL LETTER EPSILON}\N{GREEK SMALL LETTER UPSILON WITH TONOS}...";

This is a nightmare.
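
Nightmare or not, the three notations really are interchangeable. Here is a small sketch using only the first two characters of the string; it compares the strings rather than printing them, to sidestep the question of what your terminal does with Greek.

```perl
use strict;
use warnings;
use charnames ':full';   # lets \N{...} resolve official Unicode names

my $hex   = "\x{395}\x{3CD}";
my $vstr  = v917.973;    # the same code points in decimal
my $named = "\N{GREEK CAPITAL LETTER EPSILON}\N{GREEK SMALL LETTER UPSILON WITH TONOS}";

print "hex and v-string match\n" if $hex eq $vstr;
print "hex and named match\n"    if $hex eq $named;
```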

Ilya's solution allows you to enter Unicode texts in foreign languages as Latin transliterations. He gives a module that provides Russian transliterations, so with Ilya's module you can now do:

    use charnames qw(cyrillic);
    $x = "\N{Il'ya Zakharevich}";

and Perl will do the right thing. The suggestion is to have a few transliteration modules in the core for testing and to have less-commonly used ones on CPAN.

However, in many non-Latin languages, transliteration to the Latin alphabet is vague at best, and there are usually several different methods of doing so; worse, the mappings are sometimes nonreversible and/or non-one-to-one. Ilya's module for Russian is neat, but doesn't cover everything.

Regular Expression Bug

Jarkko has been turning up all sorts of wonders with his experiments in UTF8 regular-expression land. This time, he has found that

    use utf8; @a=("b" =~ /(.)/)

will cause a segmentation fault, which is horrid. Worse, this only seems to fail on 64-bit platforms, regardless of the setting of use64bitint, which suggested some hidden assumption. Eventually, it was traced to a careless read in sv_utf8_downgrade; Jarkko says:

Why the different platforms behave so differently (core dump vs. no core dump) on this bug is a bit of a mystery, but if I had to guess I would mumble something like 'alignment.'

This is why being the Configure pumpkin is such a demanding job.

Another core dump came from

    use utf8; "," =~ /([^,]*,)*/

and another from

    use utf8; 
    $x = $^R = 67;
    "foot" =~ /foo(?{ $^R + 12 })((?{ +$x = 12; $^R + 17 })[xy])?/;

which was traced to a failure to save and restore the parentheses count. Again, the symptoms were confusingly different on different machines.

xsubpp

Ilya produced a patch for xsubpp that allows the OUT and IN_OUT keywords; this is in addition to the old IN_OUTLIST and OUTLIST keywords.

These are somewhat confusing, but here's my understanding of what they do: A parameter in a C function marked OUTLIST will have its value at the end of the function added to the list of return values to Perl. A parameter labeled IN_OUT will be read from a Perl variable at the beginning of the C function, and the value of the C variable at the end of the C function will be put back into the Perl variable. In effect, IN_OUT gives you a pointer to write through, which is "tied" to a Perl variable. IN_OUTLIST does the same, but instead of writing the value back to the Perl variable, it goes onto the list of return values.

An OUT value is set to the return value of the C function - I think. Decide for yourself.

Perlipc Examples Buggy

Nicholas Clark gave what I shall call an "impassioned appeal" about the state of the perlipc documentation; some of the examples didn't even compile, much less do what they claimed to do. This also turned up a problem with Net::hostent, which was particularly embarrassing since Net::hostent didn't have a regression test. Nicholas wiped up the worst of the perlipc bugs, and provided a basic regression test, which Robert Spier expanded. As Jarkko pointed out, writing a portable test for it is tricky, but any test is better than none.

(Hey, maybe someone would like to try writing a program that automatically extracts example code from the documentation and makes sure it compiles?)

PerlIO news

Using my magic crystal ball, I found that this week saw 500 patches to the Perl repository. Naturally, the bulk of them - a massive 400 - were the development main line, 32 were Sarathy integrating bunches of patches into 5.6.1-to-be, but the remaining 67 were Nick beavering away on the PerlIO branch. This should remind you that most of the PerlIO improvements happen without much advertisement, and it's easy to be unaware of exactly how much work is going on there.

Here's what Nick says about how PerlIO is going:

-Duseperlio now works as a replacement for stdio on UNIX platforms. As of last weekend, it was also working in "same functions as before" mode on Win32 in Win32's "simple" configuration. There has been some progress, but not success, in getting OS/2 in line. (Nothing on VMS yet.)

This week's target is the PERL_IMPLICIT_SYS scheme on Win32 that is needed for fork() emulation. Once that is built, the plan is to replace low-level pseudo-unix read() on Win32 with our own version.

The other area of work is to turn on use of PerlIO to allow files to be read/written as utf8 under programmer control.

Once that works, then we hook PerlIO to Encode - and we are "done" ;-) (This is actually a bit messy right now as PerlIO is deep under the core, and Encode.pm is an external XS module.)

Since Nick is going to be allowing layers to be accessible under programmer control, we need to know what layers ought to do, and this was Nick's question: "So would anyone care to remind me what the Unicode issues were that we want to solve?"

Briefly, we want to be able to read in UTF8-encoded text into UTF8-encoded SVs, and have the same output. One of the other uses of layers would be the CRLF translation magic used on DOS-derived systems and to replace the source filter mechanism.
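
For a flavor of what "layers under programmer control" means, here is a sketch using the layer syntax of three-argument open as found in later Perl releases; it is not a description of the PerlIO branch as it stood this week. It writes two characters through a UTF-8 encoding layer into an in-memory file and reads them back.

```perl
use strict;
use warnings;

my $text = "\x{395}\x{3B1}";   # two Greek characters

# Write through an encoding layer into an in-memory file.
open(my $out, '>:encoding(UTF-8)', \my $bytes) or die "open failed: $!";
print {$out} $text;
close($out) or die "close failed: $!";

# Read the same bytes back through the layer to recover the characters.
open(my $in, '<:encoding(UTF-8)', \$bytes) or die "open failed: $!";
my $round_trip = <$in>;
close($in);

printf "%d characters became %d bytes\n", length($text), length($bytes);
```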

Dodgy Function Names

This causes a syntax error:

    sub f {}
    $x-f($y);

This is because Perl assumes that -f is a file-test operator, and wonders what it's doing next to a variable with no binary operator in the middle. Some people, including Jarkko, thought that was silly; if I define a sub f, Perl should know that I'm trying to call that subroutine.

This naturally applies not just for the file tests, but any other operators that look like functions, such as y and s. Several solutions were proposed, such as forcing Perl to use the subroutine, or outlawing subroutines with "reserved" names. In the end, Jarkko produced a patch for the file tests that spits out a warning in the above case - I think the y, m, and other cases are still on the loose. The whole thread (36 messages) is worth reading, if only so you can get an idea of what nefarious things Perl porters get up to when syntax goes bad.
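
In the meantime, two spellings avoid the ambiguity altogether, as this small sketch shows: put whitespace after the minus sign, or call the subroutine with an explicit &. (The body of f here is invented so there is something to compute.)

```perl
use strict;
use warnings;

sub f { return $_[0] * 2 }

my ($x, $y) = (10, 3);

my $spaced = $x - f($y);   # whitespace after the minus: plain subtraction
my $sigil  = $x-&f($y);    # & marks an explicit subroutine call

print "$spaced $sigil\n";
```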

Lvalue subs

Casey asked for more useful lvalue subroutines. At the moment, you can say things like:

  package Person;
  sub new { bless { name => $name }, shift }
  sub name : lvalue { $_[0]->{name} }

  package main;

  my $p    = Person->new;
  $p->name = "casey";
  print $p->name . "\n";

Note that $p->name on the left-hand side of that assignment is actually a method call returning an lvalue. Cool, huh?
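
If you want to try this yourself, here is a variant that runs cleanly under strict and warnings. The constructor arguments are my own addition; the lvalue accessor is the same as in Casey's example.

```perl
use strict;
use warnings;

package Person;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name} }, $class;
}

# The :lvalue attribute makes the returned hash element assignable.
sub name : lvalue { $_[0]->{name} }

package main;

my $p = Person->new(name => 'unknown');
$p->name = 'casey';    # a method call on the left-hand side
print $p->name, "\n";
```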

Casey mentioned that he'd really like some way of getting at the right-hand side of the assignment as well, in order to do things like implementing substr in pure Perl. Rick Delaney suggested that you could return a tied lvalue, but Casey replied that that was slow; the alternative was yet another global. Piers appealed for a faster tie system, which is fine, but someone has to design it, code it and make it better than the current one while doing all the same things.

Yitzchak pointed out that there's more to lvalues than just the assignment context, and having a way to get at the rvalue would probably break in nonassignment cases.

Various

Yes, floating-point numbers are imprecise. We know. However, thanks to Nicholas Clark, there are now far fewer of them.

Until next week, I remain, your humble and obedient servant,


Simon Cozens