February 2001 Archives

This Week on p5p 2001/02/26



Notes

You can subscribe to an email version of this summary by sending an empty message to perl5-porters-digest@netthink.co.uk; there's also a similar summary for the Perl 6 mailing lists, which you can subscribe to at perl6-digest@netthink.co.uk.

Please send corrections and additions to perl-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year and month.

This week was very busy, but there were a lot of cross-posts from other lists. Partly due to the volume of traffic and partly due to work getting horribly, horribly busy, I've had to delay finishing this summary for a couple of days. My apologies.

Smoke Testing

Mr. Schwern sprang into action yet again with another brilliant idea: automated builds of Perl in all possible configurations and reporting the "smoke test" results to P5P. OK, it's been talked about a number of times in the past, but this time someone's done something about it. Bravo, Schwern! Of course, as with all good ideas, it was nearly drowned out by lots and lots of trivial bickering. Schwern had called the mailing list smokers@perl.org, and the smoke testing software SmokingJacket. This produced objections from non-smokers and recovering smokers, and started a long and tedious objection-and-counterproposal cycle. Eventually, the list was given an alias of daily-build@perl.org.

Schwern, however, had the last laugh when a change was needed to SmokingJacket:

It seems we're overwhelming perlbug and p5p with redundant reports. A little too successful. :) The perlbug people are working on a way to accommodate us and we should produce a new version of SmokingJacket shortly.

Until then, please *STOP* using SmokingJacket. Sorry about the trouble, I know it can be difficult to stop smoking, but we'll be sure to issue a patch to help. :P

If you have spare cycles and you want to help put them to use without much effort on your part, join the daily build mailing list.

Overriding +=

Alex Gough noted that overriding += does unexpected things when the left-hand side is undefined or non-overloaded; in his words:

I'm not claiming overload has a problem, just that it is not possible to write overloading modules which do not warn on

$undef += $obj

without also not warning on

$whatever = $undef + $obj

Rick Delaney had a patch which makes the "add-assign" method (instead of the "add" method, which is the current behaviour) get called even on non-overloaded left-hand sides. This broke old code, so there was some discussion as to whether there was a neater way to do it. Tels suggested treating undefined left-hand sides as zero, but Ronald Kimball pointed out:

I think that, since the += is being overloaded, we *don't* know that the undef will be treated like a 0. An overloaded += could do whatever it wants with an undef.

Someone could even implement an overloaded += that's supposed to warn when the lefthand operand is undef. :)
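
Here's a minimal sketch of the bind being described (the Counter class is purely illustrative, not code from the thread): with only "+" overloaded, $undef += $obj falls back to the add handler, which receives the undefined operand, so a handler written not to warn there is also silent for $whatever = $undef + $obj.

    #!/usr/bin/perl -w
    package Counter;
    use overload
        '+' => sub {
            my ($self, $other, $swapped) = @_;
            # $other is the non-object operand; in "$undef += $obj" it is undef
            Counter->new($self->{n} + (defined $other ? $other : 0));
        },
        '""' => sub { $_[0]->{n} };

    sub new { my ($class, $n) = @_; bless { n => $n }, $class }

    package main;
    my $undef;
    my $obj = Counter->new(5);
    $undef += $obj;      # dispatched to Counter's '+' with an undef operand
    print "$undef\n";    # prints 5, and no "uninitialized" warning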

Read about it.

More Big Unicode Wars

Most of this week's (many) messages were taken up in various debates about the state of Unicode handling and how the Unicode semantics should work. I'm obviously too involved in the whole thing to give you an objective summary of what went on, but I can point you at the highlights and the starts of the threads.

One of the Unicode threads started here, and eventually led to some agreement between myself, Nick Ing-Simmons, Ilya and Jarkko, which is a feat in itself; we decided that the model for Unicode on EBCDIC will look like this. (Incidentally, thanks to Morgan Stanley Dean Witter, who've promised me a day's hacking time on their mainframes, this might even be implemented soon.)

Another of the threads started here with Karsten Sperling attempting to nail down the semantics of Unicode handling. Most of the ensuing discussion was a mixture of boring language-lawyering and acrimony. Karsten also found some interesting bugs related to character ranges on EBCDIC, which everyone swore had been fixed years ago, but still seem to remain.

Nick Ing-Simmons posted a well thought-out and informative list and discussion of the remaining conflicts between our Unicode implementation and the Camel III's discussion of what should happen.

Unintentional irony of the week award goes to Ilya, for breathtakingly accusing Jarkko of "unnecessarily obfuscating" the regular expression engine.

Patchlevel in $^V and 5.7.1

Nicholas Clark asked

Would it be possible to make the $^V version string for bleadperl have the devel number after a third dot?

ie instead of

    perl -we 'printf "%vd\n", $^V'
    5.7.0

I'd like it if I could get

    perl -we 'printf "%vd\n", $^V'
    5.7.0.8670

Jarkko noted that this would cause problems with CPAN.pm; Nick turned around and asked when 5.7.1 was likely to happen. The outstanding issues seem to be Unicode, PerlIO and numerical problems including casting.

PerlIO is now the default IO system, and isn't giving that many problems. Nick Ing-Simmons noted that Nicholas Clark had produced a PerlIO::gzip filter extension which had flushed out a bunch of bugs.
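
As a sketch of the kind of thing that extension enables (assuming a perl built with PerlIO and the PerlIO::gzip module installed; the file name is just an example), a compressed file can be read through an ordinary filehandle:

    use PerlIO::gzip;
    open my $fh, '<:gzip', 'data.txt.gz' or die "Can't open data.txt.gz: $!";
    print while <$fh>;    # lines arrive transparently decompressed
    close $fh;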

Philip Newton said that a 5.7.0.8670-style release number wouldn't help us much anyway, because features would get folded back into, say, 5.6.1 or 5.6.2, and if you said

    require 5.7.0.8760;

Perl would barf on 5.6.1 even if the features you need had been folded back in, bringing up the "feature test" discussion again.

Johan Vromans suggested that Perl could have a built-in Config.pm equivalent to report its configuration. Ted Ashton complained about the size of the resulting binary, but Robert Spier pointed out that Config.pm itself is pretty bloaty. Vadim Konovalov suggested that the advantage of having an external Config.pm is that you can change it and lie to Perl about how it was configured. Don't try this at home, folks.

Deleting stashes

Here's an interesting and probably not too hard job for someone. Alan Burlison found that if you delete a stash and then call a subroutine that was in it, Perl segfaults:

    $ perl -e 'sub bang { print "bang!\n"; } undef %main::; bang;'
    Segmentation Fault(coredump)

What Alan and I agreed should happen is that stash deletion should be allowed, but the method cache needs to be invalidated for that stash when it is deleted so that the next attempt to call a sub on it will give the ordinary "undefined subroutine" error.

IO on VMS

VMS seemed to be doing something very strange with output and pipes to the effect that Test::Harness couldn't properly see the results of Test.pm. Eventually it was simplified to

    print "a ";
    print "b ";
    print "c\n";

acting differently to

    print "a b c\n";

and this was explained by Dan Sugalski in a way that startled nearly everyone: "The way perl does communication between processes on VMS involves mailboxes."

But it transpired that the reality is somewhat more boring than we imagined: rather than a Heath-Robinsonian email-based IPC system, mailboxes are actually a little like Unix domain sockets. You send output as batches of records. Hence, there's a difference between sending the output as three records and as one. As there's a record separator between the prints, you get different output.

Various

Tim Jenness fixed up some long-standing known issues with File::Temp; if you were getting scary warning messages from File::Temp tests in the past, you won't any more.

Alan's been whacking at some more memory leaks; Jarkko was reproducing far more leaks than Alan until he turned up PERL_DESTRUCT_LEVEL, which actually frees the memory in use tidily, instead of allowing it to be reclaimed when the process exits. He asked why we don't do this all the time; the answer was "speed" - the exit's going to happen anyway, so why shut down gracefully? As Jarkko put it, "No point shaving in the morning if you are a kamikaze pilot?" Naturally, this led to a discussion about the grooming habits of kamikaze pilots.

Sarathy said "yikes" again, although on an unrelated topic.

Until next week I remain, your humble and obedient servant,


Simon Cozens

DBIx::Recordset VS DBI

Introduction

Writing this article was pure hell. No, actually, writing most of it was quite fun - it was just when I had to write the functional equivalent of my DBIx::Recordset code in DBI that I began to sweat profusely. It was only when I had finished writing the mountain of DBI code to do the same thing as my molehill of DBIx::Recordset that I could heave a sigh of relief. Since starting to use DBIx::Recordset, I have been loath to work on projects where the required database API was DBI. While it may seem like a play on words, it is crucial to understand that DBI is the standard database interface for Perl but it should not be the interface for most Perl applications requiring database functionality.

The key way to determine whether a particular module/library is matched to the level of a task is to count the number of lines of ``prep code'' you must write before you can do what you want. In other words, can the complex operations and data of your domain be dealt with in a unitary fashion by this module? In the case of DBI, we can say that it has made the tasks of connection, statement preparation, and data fetching tractable by reducing them to single calls to the DBI API. However, real-life applications have much larger and more practical complex units and it is in these respects that the DBI API falls short. To me, it comes as no surprise that DBI, a module whose only design intent was to present a uniform API to the wide variety of available databases, would lack such high-level functionality. But it does surprise me to no end that hordes of Perl programmers, some of whom may have had a software engineering course at some point in their careers, would make such an errant judgment. Thus the fault lies with the judgment of the programmers, not DBI.

In most cases the gap between DBI's API and Perl applications has been bridged by indiscriminately mixing generic application-level functionality with the specifics of the current application. This makes it difficult to reuse the generic routines in another part of the application or in an altogether different application. Another maladaptive way that the DBI API has been extended for application-level databasing is by developing a collection of generic application-level tools but not publishing them. Thus, as larger scale tools are built from two camps using differing generic application-level APIs, amends for discrepancies in calling conventions must be duct-taped between the code bodies. The final way to misuse DBI in an application is to use it directly.

However, an unsung module publicly available on CPAN that bridges the gap between DBI and application-level programming robustly and conveniently is DBIx::Recordset. It is built on top of DBI and is so well-matched to the level at which database-driven applications are conceived that in most cases one sentence in an application design specification equates to one line of DBIx::Recordset.

Problems Using DBI at Application-Level

Intolerance to Table and Data Mutation

Table mutation (the addition, deletion or rearrangement of fields in a table) or data mutation (the addition, removal or rearrangement of portions of the input sources intended for database commission) can break a large number of calls to the DBI API. This is due to the fact that most routines expect and return arrays or array references and thus fail when the expected arrays shrink or grow. For example, the following DBI code:

 $dbh->do("INSERT INTO students (id,name) VALUES (1,$name)");

would break once fields were removed from the table students. However, the equivalent DBIx::Recordset code(1):

 DBIx::Recordset->Insert({%dsn,'!Table'=>'students',%dbdata});

would work regardless of constructive or destructive mutations of the students table or %dbdata. If there are fewer field-value pairs in %dbdata than in the table, then the insert will be performed with the corresponding fields. If there are irrelevant fields in %dbdata, then the extra fields are by default silently ignored.

Now, the import of this intolerance for DBI usage is that changes in either the tables or the input data require changes in the source. For some, such rigidity is of merit because it forces both the source and target of database commission to be made explicitly manifest within the source code. However, for other Perl programmers, such rigidity is nothing more than an imposition on their highly cultivated sense of Laziness.

Error-Prone and Tedious Query Construction

A query string is presented to the DBI API in placeholder or literal notation. An example of DBI placeholder usage is shown below:

 $sql='insert into uregisternew
        (country, firstname, lastname, userid, password, address1, city,
        state, province, zippostal, email, phone, favorites, remaddr,
        gender, income, dob, occupation, age)
        values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)';
 my @data=($formdata{country}, $formdata{firstname}, $formdata{lastname},
        $formdata{email}, $formdata{password}, $formdata{address},
        $formdata{city},  $formdata{state},  $formdata{province},
        $formdata{zippostal}, $formdata{email}, $formdata{phone},
        $formdata{favorites}, $formdata{remaddr}, $formdata{gender},
        $formdata{income}, $formdata{date}, $formdata{occupation},
 $formdata{age});
 $sth2 = $dbh->prepare($sql);
 $sth2->execute(@data);
 $sth2->finish();

This is a slightly modified version of a minefield I had to tiptoe through during a recent contract I was on. This code has several accidents waiting to happen. For one, you must pray that the number of question-mark placeholders is the same as the number of fields you are inserting. Secondly, you must manually ensure that the field names in the insert statement correspond with the data array in both position and number.

If one were developing the same query using DBI's literal notation, one would have the same issues and would in addition devote more lines of code to manually quoting the data and embedding it in the query string.

In contrast, DBIx::Recordset's Insert() function takes a hash in which the keys are database field names and the values are the values to be inserted. Using such a data structure eliminates the correspondence difficulties mentioned above. Also, DBIx::Recordset generates the placeholder notation when it calls the DBI API.

Thus, with no loss of functionality(2), the entire body of code above could be written as:

 DBIx::Recordset->Insert({%dsn,%formdata});

Manual and Complex Mapping of Database Data to Perl Data Structures

 Operation            DBI                          DBIx::Recordset
 Single row fetch     selectrow_array              $set[0]
                      selectrow_arrayref
 Multiple row fetch   fetchall_arrayref            for $row (@set) {...}
                      selectall_arrayref              OR
                      fetchrow_array               while ($href=$set->Next())
                      fetchrow_arrayref
                      fetchrow_hashref

In DBI, database record retrieval is manual, complex and in most cases intolerant to table mutation. By manual, we mean that performing the query does not automatically map the query results onto any native Perl data structures. By complex, we mean that DBI can return the data in a multiplicity of ways: array, array reference and hash reference.

In DBIx::Recordset, however, retrieval of selected recordsets can be (3) automatic, simple and field-mutation tolerant. By automatic, we mean that requesting the records leads to an automatic tie of the result set to a hash.(4) No functions need be called for this transfer to take place. The retrieval process is simple because the only way to receive results is via a hash reference. Because DBIx::Recordset returns a hash, fields are referred to by name as opposed to position. This strategy is robust to all table mutations.
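
As a rough illustration of the contrast (the students table, its fields, and the connection variables are made up for this example), compare a typical DBI fetch loop with the tied-array access DBIx::Recordset provides:

 use DBI;
 use DBIx::Recordset;

 # DBI: explicit prepare/execute/fetch cycle
 my $dbh = DBI->connect($data_source, $username, $password);
 my $sth = $dbh->prepare('SELECT id, name FROM students');
 $sth->execute;
 while (my $row = $sth->fetchrow_hashref) {
     print "$row->{id}: $row->{name}\n";
 }

 # DBIx::Recordset: Search ties the result set to @set; each element
 # is a hash reference keyed by field name
 *set = DBIx::Recordset->Search({%dsn, '!Table' => 'students'});
 for my $row (@set) {
     print "$row->{id}: $row->{name}\n";
 }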

Having seen DBIx::Recordset's solution to some of the more troublesome aspects of DBI use, we now move on to explore the wealth of application-level benefits that DBIx::Recordset offers in the following areas:

  1. Succinct CGI to SQL transfer
  2. Control and monitoring of table access
  3. Scalability

As impressive as these topics may sound, DBIx::Recordset is designed to achieve each of them in one line of Perl!(5)

Succinct CGI-SQL Interaction (Database Control via ``CGI One-Liners'')

Assuming that the keys in the query string match the field names in the target database table, DBIx::Recordset can shuttle form data from a query string into a database in one line of Perl code:

 DBIx::Recordset->Insert({%formdata,%dsn});

One line of DBIx::Recordset can also drive recordset retrieval via form data as well as iterate through the results:


 # here we: SELECT * FROM sales and automatically tie
 # the selected records to @result
 *result = DBIx::Recordset->Search({
        %dsn,'!Table'=>'sales',%formdata
        });
 # here we iterate across our results...
 map { 
  printf ("<TR>Sucker # %d purchased item # %s on %s</TR>", 
        $_->{customer_id}, $_->{item_id}, $_->{purchase_date}) 
 } @result;

The data in the above code is automatically quoted, and no tiresome connect, prepare and execute ritual is required.

DBIx::Recordset also has helper functions which create the HTML for ``previous-next-first-last'' navigation of recordsets:

 $nav_html = $::result -> PrevNextForm ({
        -first => 'First',  -prev => '<<Back', 
        -next  => 'Next>>', -last => 'Last',
                -goto  => 'Goto #'}, 
        \%formdata);

In this case, we use the scalar aspect of the typeglob, which provides object-oriented access to the methods of the created recordset.

A final CGI nicety has to do with the fact that browsers send empty form fields as empty strings. While in some cases you may want this empty string to propagate into the database as a SQL null, it is also sometimes desirable to have empty form fields ignored. It is possible to specify which behavior you prefer through the DBIx::Recordset '!IgnoreEmpty' hash field of the Insert() and Update() function.
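
For example, something along these lines (the particular '!IgnoreEmpty' level is illustrative; the module's documentation defines the exact values) tells Insert() to skip empty form fields rather than commit them:

 DBIx::Recordset->Insert({%dsn, '!Table' => 'students',
                          '!IgnoreEmpty' => 2, %formdata});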

Control and Monitoring of Table Access

A database handle in DBI is a carte blanche to add, retrieve, or remove anything from a database that one can do with a console interface to the database with the same login. The problem with this is that the semantics and volatility of an application's database code are not self-consistent but instead vary as a function of database permission alteration.(6) Au contraire, a DBIx::Recordset handle is much more structured. A handle is created and configured with a number of attributes: table write modes, accessible tables, and the method of logging database usage, to name a few.

The first form of much-needed table access control that DBIx::Recordset offers is specification of the manners in which a particular database connection will be allowed to alter database tables. By the use of a binary-valued string, one specifies the subset of write operations (specifically none/insert/update/delete/clear) that are allowable. Such facilities are often needed when defining access levels to a database for various parties. For example, it is conceivable for a corporate intranet database to give insert access to sales employees, update access to customer service, delete access to processing and complete access to technical support. To implement such constraints in plain DBI would yield a confusing maelstrom of if-thens and 2-3 suicide attempts. With DBIx::Recordset, one simply creates a properly configured connection for each corporate intranet sector.

Tangential to the issue of write permission is the issue of which tables can be accessed at all. Control of table access is simply one more key-value pair in the connection setup hash. Finally, to monitor the use of database handles, one need only set up a debug file for each handle.

Thus, assuming the package company::database has a hash %write_mode which contains the write modes for the intranet, a hash %log_file with the log files for each handle, and a hash %table_access which contains the tables for each member of the intranet, one would specify the tables and their mode of access and usage logs for the entire intranet as follows:

{
  package company::database;

  for (keys %write_mode) {
    *{$handle{$_}} = DBIx::Recordset->Setup({
        %dsn,
        '!Writemode' => $write_mode{$_},
        '!Tables'    => $table_access{$_},
    });
    DBIx::Recordset->Debug({
        '!Level' => 4,
        '!File'  => $log_file{$_},
        '!Mode'  => '>',
    });
  }
}

Scalability

Operation: Adding or removing form elements from a webpage but still having the database code commit the generated query string properly.
  Code changes needed (DBI): For each change of the form (and hence the query string), the database code would have to be modified.
  Code changes needed (DBIx::Recordset): None.

Operation: Taking an un-normalized main table and splitting it into a number of "satellite" tables with foreign keys in the main table to reference the satellite tables.
  Code changes needed (DBI): For each table split, additional terms would have to be added to the WHERE or JOIN clause.
  Code changes needed (DBIx::Recordset): None.

Regardless of how well one plans a project, prototyping and early development are often evolutionary processes. Significant development time can be saved if database-processing routines remain invariant in the face of HTML and database re-design. It is in this regard that DBIx::Recordset dwarfs DBI, making it far more viable during the prototyping phases of a project. Having already shown DBIx::Recordset's scalability in the face of table mutations, this section will demonstrate DBIx::Recordset's scalability in the face of form data variations as well as database table splits.

Form Data Variations

Let's assume you were developing a user registration form that submitted its form data to db-commit.cgi for insertion into a database:

#!/usr/bin/perl
use CGI qw(:standard);
use DBIx::Recordset;

# %dsn holds the connection parameters, as elsewhere in this article
my %formdata;
$formdata{$_} = param($_) for param();

# SELECT * FROM user_registration WHERE username = <submitted value>;
# the matching records are tied to @result
*result = DBIx::Recordset->Search(
 { %dsn,
   '!Table'   => 'user_registration',
   'username' => $formdata{username}
 });

if (defined $result[0]) {
 &username_taken_error;
} else {
 DBIx::Recordset->Insert(
  { %dsn,
    '!Table' => 'user_registration',
    %formdata
  });
}

Now assume that you decided to add a new field called AGE to a table and a corresponding form. Under DBI, the insert query would have to be modified to account for the change. Because DBIx::Recordset takes a hash reference for its inserts, no code modification is required. Now of course, I can hear the DBI users squawking: ``I can develop a library that converts form data to hashes and turns this into query strings.'' And of course my hot retort is: ``But don't you see this is a homegrown, non-standard(7) API that will have to be duct-taped to other people's homegrown, non-standard solutions?''

Table Splits

For another example of DBIx::Recordset's flexibility to architecture changes, consider the case where a table is split, perhaps for reasons of normalization. Thus, in the core table where you once explicitly coded a user's name into a field user_name you now have a foreign key titled user_name_id which points to a table called user_name which has a field titled id. Assume that you also later decided to do the same sort of normalization for other fields such as age-bracket or salary-bracket. With plain DBI, each time that a query was supposed to retrieve all fields from each of the associated tables, the query would have to be rewritten to accommodate the splitting of the main table. With DBIx::Recordset, no query would have to be rewritten because the tables were described in a format recognizable by DBIx::Recordset's database meta-analysis.
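
For instance, a search spanning the split tables might look something like this sketch (the table and field names follow the example above, and the '!TabRelation' join condition should be treated as an illustration of the idea rather than a drop-in recipe):

 *people = DBIx::Recordset->Search({
        %dsn,
        '!Table'       => 'core, user_name',
        '!TabRelation' => 'core.user_name_id = user_name.id',
        %formdata,
        });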

Sample Code

On a recent contract I had to copy a user registration table (named uregister) to a new table (called uregisternew) which had all of the fields of the old table plus a few new fields designed to store profile information on the users.

The key thing to note about the DBIx::Recordset version of this code is that it is highly definitional: very little database mechanics clutters the main logic of the code, allowing one to focus on recordset migration from one table to another.

DBIx::Recordset Version of Code

 #!/usr/bin/perl
 =head1
 uregisternew is a table with all the fields of uregister plus a few
 profile fields (ie, salary bracket, occupation, age) which contain a
 positive integer which serves as index into the array for that
 particular profile field.
 The purpose of this script is to copy over the same fields and
 generate a valid array index for the new profile fields.
 =cut
 use Angryman::Database;
 use Angryman::User;
 use DBIx::Recordset;
 use Math::Random;
 use strict;
 $::table{in}  = 'uregister';
 $::table{out} = 'uregisternew';

 # connect to database and SELECT * FROM uregister
 *::uregister = DBIx::Recordset->Search ({            
        %Angryman::Database::DBIx::Recordset::Connect, 
        '!Table' => $::table{in}  
        });

 # because we will re-use the target table many times, we separate the 
 # connection and insert steps with this recordset
 *::uregisternew = DBIx::Recordset->Setup({  
        %Angryman::Database::DBIx::Recordset::Connect, 
        '!Table' => $::table{out} 
        });

 # iterate through the recordsets from the old table:
 for my $uregister (@::uregister) {
     &randomize_user_profile;
     # INSERT 
        # the old table data into the new table and
        # the computed hash of profile data
    $::uregisternew->Insert({%{$uregister},%::profile});
 }
 # Angryman::User::Profile is a hash in which each key is a reference 
 # to an array of profile choices. For example:
 # $Angryman::User::Profile{gender} = [ 'male', 'female' ];
 # $Angryman::User::Profile{age} = ['under 14', '14-19', '20-25', ... ];
 # Because we don't have the actual data for the people in uregister,
 # we randomly assign user profile data over a normal distribution.
 # when copying it to uregisternew.
 sub randomize_user_profile {
    for (keys %Angryman::User::Profile) {
        my @tmp=@{$Angryman::User::Profile{$_}};
        $::profile{$_} = random_uniform_integer(1,0,$#tmp);
        $::profile{dob}='1969-05-11';
    }
 }

DBI Version of Code

 #!/usr/bin/perl

 =head1
 uregisternew is a table with all the fields of uregister plus a few
 profile fields (ie, salary bracket, occupation, age) which contain
 a positive integer which serves as index into the array for that
 particular profile field.

 The purpose of this script is to copy over the same fields and
 generate a valid array index for the new profile fields.

 This file is twice as long as the DBIx::Recordset version and it 
 easily took me 5 times longer to write.
 =cut 

 use Angryman::Database;
 use Angryman::User;
 use DBI;
 use Math::Random;
 use strict;

 $::table{in}  = 'uregister';
 $::table{out} = 'uregisternew';

 # connect to database and SELECT * FROM uregister
 my $dbh = DBI->connect($Angryman::Database::DSN, 
                        $Angryman::Database::Username, 
                        $Angryman::Database::Password);
 my $sth = $dbh->prepare('SELECT * FROM uregister');
 my $ret = $sth->execute;

 &determine_target_database_field_order;

 # because we will re-use the target table many times, we separate the 
 # connection and insert steps with this recordset

 # iterate through the recordsets from the old table:
 while ($::uregister = $sth->fetchrow_hashref) {

     &randomize_user_profile;
     &fiddle_with_my_data_to_get_it_to_work_with_the_DBI_API();

     # INSERT 
         # the old table data into the new table and
         # the computed hash of profile data
     my $sql = "INSERT into $::table{out}($::sql_field_term) values($::INSERT_TERM)";
     $dbh->do($sql);
 }

 # Angryman::User::Profile is a hash in which each key is a reference 
 # to an array of profile choices. For example:
 # $Angryman::User::Profile{gender} = [ 'male', 'female' ];
 # $Angryman::User::Profile{age} = ['under 14', '14-19', '20-25',  ];
 # Because we don't have the actual data for the people in uregister,
 # we randomly assign user profile data over a normal distribution.
 # when copying it to uregisternew.
 sub randomize_user_profile {
     for (keys %Angryman::User::Profile) {
         my @tmp=@{$Angryman::User::Profile{$_}};
         $::profile{$_} = random_uniform_integer(1,0,$#tmp);
     }

     $::profile{dob}='';
 }

 # Hmm, I can't just give DBI my data and have it figure out the order
 # of the database fields... So here we go getting the field
 # order dynamically so this code doesn't break with the least little
 # switch of field position.
 sub determine_target_database_field_order {

     my $order_sth = $dbh->prepare("SELECT * FROM $::table{out} LIMIT 1");
     $order_sth->execute;

 # In DBIx::Recordset, I would just say $handle->Names()... but here we 
 # must iterate through the fields manually and get their names.

     for (my $i = 0; $i < $order_sth->{NUM_OF_FIELDS}; $i++) {
         push @::order_data, $order_sth->{NAME}->[$i];
     }

     $::sql_field_term = join ',',  @::order_data;

 }

 # As ubiquitous as hashes are in Perl, the DBI API does not
 # offer a way to commit hashes to disk.
 sub fiddle_with_my_data_to_get_it_to_work_with_the_DBI_API {

     my @output_data;
     for (@::order_data) {
         push @output_data, $dbh->quote
             (
              defined($::uregister->{$_}) 
              ? $::uregister->{$_} 
              : $::profile{$_}
              );
    }

    $::INSERT_TERM=join ',', @output_data;
 }

Empirical Results

                  DBI                   DBIx::Recordset
 Time to run      1.4 seconds (1,2)     3.7 seconds (3,4)

The average, minimum, and maximum number of seconds required to execute the sample code under DBI and DBIx::Recordset. The code was run on a database of 250 users.

Conclusion

DBI accelerated past the ODBC API as a database interface because it was simpler and more portable. Because DBIx::Recordset is built on top of DBI, it maintains these advantages and improves upon DBI's simplicity. Because it also adds much-needed application-level features to DBI, it is a clear choice for database-driven Perl applications.

A strong contender for an improvement of DBI is the recent effort by Simon Matthews to simplify DBI use via a Template Toolkit plugin. Many of the advantages of DBIx::Recordset are available to the DBI plugin either intrinsically or due to the context in which it was developed. For example, DBIx::Recordset allows filtering of recordsets through the !Filter key to its database processing functions. The plugin did not have to provide filtering because there are several generic, widely useful filters (e.g., HTML, date, etc.) already available for Template Toolkit. However, Matthews' DBI plugin uses the same level of abstraction as DBI. This shortcoming, along with the plugin's lack of application-level databasing conveniences, lands the plugin in the same functional boat as DBI with only nicer syntax to pad the same troublesome ride.

That being said, DBI is preferable to DBIx::Recordset when speed is of utmost importance. DBI's speed advantage is due to several factors(8). First, DBIx::Recordset is built on the DBI API and thus one has the overhead of at least one additional function call per application-level database command. Secondly, it takes time for DBIx::Recordset to decode its compact input algebra and produce well-formed SQL.

All theory aside, my experience and the timing results show that you don't lose more than a second or two when you reach for DBIx::Recordset instead of DBI. Such a slowdown is acceptable in light of what DBIx::Recordset offers over DBI: speed of development, power of expression and availability of standard and necessary application-level functionality.

Even if time constraints do lead one to decide that DBIx::Recordset is inappropriate for a finished product because it is slightly slower than DBI, it can prove especially handy during early prototyping or when one is solving a complex problem and wants to focus on the flow of recordsets as opposed to the mechanics of managing this flow.

Acknowledgements

I would like to thank Gerald Richter (richter@ecos.de) for authoring DBIx::Recordset, for commenting on an early version of this manuscript, and for providing me and others with free help on his great tool.

Footnotes

  1. Actually, the DBI code is not equivalent to the DBIx::Recordset code because connection and database operations are always separate calls to the DBI API. The additional work required to use DBI has been omitted for brevity.
  2. The DBIx::Recordset code is also more accurate because it uses database metadata to determine which data to quote, while DBI uses string-based heuristics.
  3. DBIx::Recordset can be automatic and simple, but, you can also operate in a more manual mode to afford yourself time/space efficiency on the same order as DBI.
  4. More precisely, each row in the recordset is an anonymous hash which is referred to by one element of an array whose name is determined by the typeglob bound during the call to the Search() function.
  5. I can't wait to see the next generation of obfuscated Perl now that major database operations only take one line!
  6. Be this alteration done by friend or foe.
  7. Now admittedly, the transfer of a CGI query string into a hash is non-standard as well, but, most high-end web application frameworks for Perl (e.g. HTML::Embperl and HTML::Mason) provide this transfer automatically as part of their application-level API to web site development.
  8. Maybe that's why there's a cheetah on the front of the DBI book.

The e-smith Server and Gateway: a Perl Case Study


The e-smith server and gateway system is a Linux distribution designed for small to medium enterprises. It's intended to simplify the process of setting up Internet and file-sharing services and can be administered by a non-technical user with no prior Linux experience.

We chose Perl as the main development language for the e-smith server and gateway because of its widespread popularity (making it easier to recruit developers) and because it's well suited to e-smith's blend of system administration, templating and Web-application development.

Of course, the system isn't just Perl. Other parts of the system include the base operating system (based on Red Hat 7.0), a customized installer using Red Hat's Anaconda (which is written in Python), and a range of applications including mail and Web servers, file sharing, and Web-based e-mail using IMP (which is written in PHP). However, despite the modifications and quick hacks we've made in other languages, the bulk of development performed by the e-smith team is in Perl.

The E-Smith Manager: a Perl CGI Application

Administration of an e-smith server and gateway system is performed primarily via a Web interface called the "e-smith manager." This is essentially a collection of CGI programs that display system information and allow the administrator to modify it as necessary.

This allows system owners with no previous knowledge of Linux to administer their systems easily without the need to understand the arcana of the command line, text configuration files, and so on.

The manager interface is based on the CGI module that comes standard with the Perl distribution. However, a module esmith::cgi has been written to provide further abstractions of common tasks such as:

  • generating page headers and footers
  • generating commonly used widgets
  • generating status report pages

It is likely that this module will be further extended in the next version to provide more abstract ways of building "wizard" style interfaces, so that developers don't have to copy and paste huge swathes of code calling the CGI module directly.

Global Configuration Files

The e-smith server and gateway is a collection of many parts, all of which can be configured in different ways depending on the user's needs. All the global configuration data is kept in a central repository and this information is used as the base for specific configurations for the various software on the system.

Much of the system configuration data is kept in a simple text file, /home/e-smith/configuration. The basic format is name=value, as shown below:

AccessType=dedicated
ExternalDHCP=off
ExternalNetmask=255.255.255.0

Obviously, this can be simply parsed with Perl, by using something equivalent to:

my %conf;
while (<>) {
    chomp;
    my ($name, $value) = split(/=/);
    $conf{$name} = $value;
}

print "$conf{DomainName}\n";

However, it is also possible to store more information in a single line by adding zero or more pairs of property names and values delimited by pipe symbols (|), like this:

fetchmail=service|status|enabled
flexbackup=backupservice|erase_rewind_only|true
ftp=service|access|private|status|enabled

Parsing this gets tricky, requiring an additional split and putting the property names and values into a hash.
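
A minimal hand-rolled version might look like this (the variable names and the shape of the resulting hash are illustrative):

my %conf;
while (<>) {
    chomp;
    my ($name, $rest) = split(/=/, $_, 2);
    my ($type, %properties) = split(/\|/, $rest);
    $conf{$name} = { type => $type, %properties };
}

# e.g. for "ftp=service|access|private|status|enabled":
# $conf{ftp} = { type => 'service', access => 'private', status => 'enabled' }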

As it happens, e-smith has a module with common utilities such as this built in, so a developer would only write something like this:

use esmith::config;
use esmith::db;

my %conf;
tie %conf, 'esmith::config', '/home/e-smith/configuration';

my $domain = db_get(\%conf, 'DomainName');

my ($type, %properties) = db_get(\%conf, 'ftp');

Similar to the main configuration file, user and group information is kept in /home/e-smith/accounts, in a format that is also simple to parse.

This simplicity is intentional; although the data could have been stored in a more complex database, the developers decided to keep the core of e-smith as simple as possible so that the learning curve for new developers would not be steep.

Templated Configuration Files Using Text::Template

Things become more complex when we examine the configuration files stored in /etc. Each piece of software has its own configuration format, and writing parsers for each one can be a complex, time-consuming and error-prone process. The e-smith software avoids this by using a template-driven system instead, using Text::Template.

Templates are stored in a directory hierarchy rooted at /etc/e-smith/templates. Each configuration file is either a Text::Template file or can be given a subdirectory in which template fragments are stored. These templates are then parsed (and in the case of a subdirectory of fragments, concatenated together) to generate the config files for each service on the system. The fragmented approach is part of e-smith's modular and extensible architecture; it allows third-party modules to add fragments to the configuration, if necessary.

For example, let's look at the ntpd service (which keeps the system's clock right by querying a time server using the Network Time Protocol). It usually has a config file /etc/ntp.conf. On an e-smith server, this is built out of the template fragments found in the /etc/e-smith/templates/etc/ntp.conf/ directory.

This is a simple template, and only requires basic variable substitutions. (Since Text::Template evaluates anything in braces and replaces it with the return value of the code, some templates have more complex code embedded within them.) Here is what is in the template fragment file /etc/e-smith/templates/etc/ntp.conf/template-begin:

Server { $NTPServer }
driftfile /etc/ntp/drift
authenticate no

In this example, $NTPServer would be replaced with the value of that variable.
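
For comparison, here is roughly what filling that fragment with Text::Template directly looks like (the server name is just an example value):

use Text::Template;

my $template = Text::Template->new(
    TYPE   => 'FILE',
    SOURCE => '/etc/e-smith/templates/etc/ntp.conf/template-begin',
) or die "Couldn't construct template: $Text::Template::ERROR";

print $template->fill_in(HASH => { NTPServer => 'ntp.example.com' });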

Instead of calling Text::Template directly, e-smith developers use the esmith::util module to automate the process. Here's how we would generate the ntp.conf file:

use esmith::util;

processTemplate({
    CONFREF => \%conf,  # this is the %conf from the last section
    TEMPLATE_PATH => '/etc/ntp.conf',
});

The above example takes advantage of a number of default values set by processTemplate(). If we want more control, we can specify such things as the user ID (UID), group ID (GID) and permissions to use when writing the configuration file.

use esmith::util;

processTemplate({
    CONFREF => \%conf,  # this is the %conf from the last section
    TEMPLATE_PATH => '/etc/ntp.conf',
    UID => $username,
    GID => $group,
    PERMS => 0644,
});

Incidentally, the way in which the processTemplate() routine lets the programmer override the default values is a good example of Perlish idiom:

# the parameter hash the programmer passed to the routine is merged
# over a hash of defaults, so any values the caller supplies win out:

my %p = (%defaults, %params_hash);

Events and Actions

When the user hits "submit" on a Web form, a number of things can occur:

  • master configuration files are updated
  • templated configuration files in /etc are updated
  • network services are restarted
  • a new user account is created
  • a backup is performed
  • ... or any of a number of other events.

The model used to make these things happen is one of actions and events. An event is something that happens on the system (such as the user submitting a Web form, an installation completing, a reboot, etc). An action is the atomic unit of things-that-need-doing, several of which may be called when an event occurs. For instance, the post-install event calls actions to configure and start various services, initialize the password file, and so on.

Actions are written as Perl scripts, and stored in /etc/e-smith/events/actions. Some of them just use system() calls to do what's needed, as in this example ( /etc/e-smith/events/actions/reboot):

package esmith;

use strict;
use Errno;

exec ("/sbin/shutdown", qw(-r now)) or die "Can't exec shutdown: $!";
exit (2);

Others are more complex and can contain as many as a few hundred lines of Perl code.

An event is defined by creating a subdirectory under /etc/e-smith/events and filling it with symlinks to the actions to be performed.

[root@e-smith events]# ls -l user-create/
total 0
lrwxrwxrwx    1 root     root           27 Jan 24 00:07 S15user-create-unix -> ../actions/user-create-unix
lrwxrwxrwx    1 root     root           27 Jan 24 00:07 S20conf-httpd-admin -> ../actions/conf-httpd-admin
lrwxrwxrwx    1 root     root           28 Jan 24 00:07 S20email-update-user -> ../actions/email-update-user
lrwxrwxrwx    1 root     root           30 Jan 24 00:07 S25email-update-shared -> ../actions/email-update-shared
lrwxrwxrwx    1 root     root           22 Jan 24 00:07 S25ldap-update -> ../actions/ldap-update
lrwxrwxrwx    1 root     root           29 Jan 24 00:07 S25reload-httpd-admin -> ../actions/reload-httpd-admin
lrwxrwxrwx    1 root     root           23 Jan 24 00:07 S50email-assign -> ../actions/email-assign
lrwxrwxrwx    1 root     root           21 Jan 24 00:07 S80pptpd-conf -> ../actions/pptpd-conf

Events are called via a script called /sbin/e-smith/signal-event, itself written in Perl. It's included here nearly in full, as a detailed example of e-smith code.

#!/usr/bin/perl -w

package esmith;
use strict;


my $event = $ARGV [0];
my $handlerDir = "/etc/e-smith/events/$event";


opendir (DIR, $handlerDir)
    || die "Can't open directory /etc/e-smith/events/$event\n";

my @handlers = sort (grep (!/^\.\.?$/, readdir (DIR)));

closedir (DIR);


open (LOG, "|/usr/bin/logger -i -t e-smith");


my $ofh = select (LOG);
$| = 1;
select ($ofh);

print LOG "Processing event: @ARGV\n";


my $exitcode = 0;
my $handler;
foreach $handler (@handlers)
{
    my $filename = "$handlerDir/$handler";
    if (-f $filename)
    {
        print LOG "Running event handler: $filename\n";
        print LOG `$filename @ARGV 2>&1`;
        if ($? != 0)
        {
            $exitcode = 1;
        }
    }
}

close LOG;
exit ($exitcode);

Future Development

There is currently a locking issue with the global configuration files. The techniques used to manipulate these files do not allow multiple processes to modify them concurrently. If two programs try to manipulate the files at the same time, one of them will overwrite the other's changes. This is obviously a serious issue, albeit one that seldom causes problems in normal use, as most e-smith servers do not have multiple administrators working on the system at the same time.

A seemingly obvious solution is to use DBM instead of the current flat-text file system. However, the flat-text files are important because they make the system config readable and modifiable using a standard text editor from the Linux shell prompt. A simple command can then regenerate other configs or stop or start services based on the changes, without requiring the Web interface to be used. This is useful in the situation when the Web interface might have been broken (a rare situation) or when a configuration option is hidden from less technical users.

A solution combining the benefits of text files and DBM has been suggested, in which the routine that reads the config database would check to see whether the text file has been changed recently. If it has been changed, it would convert it to DBM, otherwise it would just use the DBM directly. When a configuration option is changed, it would be written to both the DBM and the text file.
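
A sketch of the read side of that idea might look like the following (the file names, the choice of GDBM, and the read_config() helper itself are all hypothetical, not e-smith code):

use GDBM_File;

sub read_config {
    my ($text_file, $dbm_file) = @_;
    my %db;
    if (! -e $dbm_file or (stat $text_file)[9] > (stat $dbm_file)[9]) {
        # the text file is newer: re-parse it and rebuild the DBM cache
        tie %db, 'GDBM_File', $dbm_file, GDBM_NEWDB(), 0640
            or die "Can't create $dbm_file: $!";
        open CONF, $text_file or die "Can't read $text_file: $!";
        while (<CONF>) {
            chomp;
            next unless /=/;
            my ($name, $value) = split(/=/, $_, 2);
            $db{$name} = $value;
        }
        close CONF;
    }
    else {
        # the DBM cache is current: use it directly
        tie %db, 'GDBM_File', $dbm_file, GDBM_READER(), 0640
            or die "Can't read $dbm_file: $!";
    }
    return \%db;
}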

Another problem is the way that multiple instances of the Perl interpreter are invoked to run events and actions, causing some performance problems. A number of alternatives are being considered, including mod_perl and POE. The goal is to reduce the wait experienced by the user when they click "submit" via the Web interface; ideally, response should be near-instantaneous.

Other forthcoming improvements include a simpler way to create "wizard" interfaces for the e-smith manager (possibly using the FormMagick Perl module currently under development), and internationalisation (probably using the Locale::Maketext module).

In Conclusion ...

The e-smith server and gateway is a great example of a large project using Perl both as a system administration scripting tool and a Serious Programming Language. Although it has about 20,000 lines of Perl code, the system is easy to understand and the Perl code is maintainable and readable, even by relatively inexperienced Perl programmers.

If you're interested in taking a closer look at the e-smith code, or maybe contributing to it, more information is available from the e-smith developer Web site.

This Week on p6p 2001/02/18



Notes

Please send corrections and additions to perl6-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year and month.

We looked at over 400 messages this week, about a quarter of which were to do with garbage collection. Again. I'm afraid this week's summary is a little short, but I'd rather get it out early than leave it until it's a week old.

Garbage Collection

The GC fetish rages on, despite Dan's valiant efforts to call a temporary halt to the discussion. Dan also valiantly tried to distinguish between garbage collection (which is the freeing of unused memory) and destruction (which is what the DESTROY method provides for objects). When he claimed that "Perl needs some level of tracking for objects with finalization attached to them. Full refcounting isn't required, however", (Note: Jan Dubois later pointed out that what we were calling finalization is actually object destruction) Sam Tregar came back with three important questions:

I think I've heard you state that before. Can you be more specific? What alternate system do you have in mind? Is this just wishful thinking?

It has to be said that Dan seemed reluctant to answer the first two questions, and both Sam and Jan Dubois pulled him up on this. Dan said that he did not have time right now, but also said that most variables would not need finalization, and of those which did, most would not need reference counting because the lifespan of a variable can be determined by code analysis:

Most perl doesn't use that many objects that live on past what's obvious lexically, at least in my experience. (Which may be rather skewed, I admit) And the ratio of non-destructible objects to everything else is also very small. Even if dealing with destructable things is reasonably expensive, the number of places we pay that (and the number of times we pay that) will be small enough to balance out. If that turns out not to be the case, then we toss the idea and go with plan B.

A lot of people made noises to the effect that they want predictable destruction, so that's probably something that will happen - Perl 5 now claims to have predictable DESTROY calling, after a patch by Ilya a couple of months back. Unfortunately, it transpires that the only way to get predictable destruction is to use reference counting.

There was some discussion of the weird and usually unexpected interaction between AUTOLOAD and DESTROY, where the consensus seemed to be that AUTOLOAD should not, in future, be consulted for a DESTROY subroutine; Perl should do what its programmers actually want, instead of what they consider consistent. And there was a lot more discussion which unfortunately produced far more heat than light. On the other hand, stay tuned for a potential GC PDD from Dan next week.

More end of scope actions

(Thanks to Bryan Warnock for this report)

In response to various peripheral discussions, Tony Olekshy kicked off a revisit to RFC 88, dealing with end-of-scope matters, particularly in the area of exception handling. The bulk of the various discussions subtitled "Background", "Visibility", "POST blocks", "Reference model 2.0.2.1", "Error messages", and "Core Exceptions" resulted in light traffic - responses were generally limited to Q&A. (Although James Mastros did provide an alternate syntax for a POST block, in an effort to minimize the exception handling syntax.) The thread covering "do/eval duality" generated more discussion, but was mainly centered around the semantics of the duality in Perl 5. Likewise, the thread covering "Garbage collection" did little more than try to agree on proper terminology.

The only new material presented was in the sub-thread "Toward a hybrid approach", where Tony and Glenn Linderman attempted to consolidate a traditional static try/catch/finally exception model with a dynamic always/except model. Both Tony and Glenn posted a number of examples - too lengthy to do justice to here. But the whole discussion can basically be boiled down to these two messages: this one and this one.

Tony has a working model, and you may want to revisit RFCs 88 and 119.

Quality Assurance

(Thanks to Woodrow Hill for this summary; you wouldn't believe how much easier this job gets when other people do it for you.)

Michael got the whole ball rolling with a number of "wake up" postings to perl-qa, including such highlights as:

...we had some ideas about developing a sane patching process.[...] Patch, Test, Review, Integrate. Please comment/add/take away.

Which no one seems to have done. But his comment that:

As part of the QA process we need to do alot of test coverage analysis and, to a lesser extent, performance profiling. Our existing tools (Devel::Coverage, Devel::DProf, Devel::SmallProf) are a start, but need alot of work. We need really solid, tested, documented libraries *and* tools to pull this off.

got folks talking about how complex a topic this is, and how many different ways it can be looked at. Paul Johnson came to the rescue with a nice piece of work describing Code Coverage.

All this finally led to the creation of perl-qa-metrics, for the discussion of code metrics and other forms of analysis as they apply to Perl.

Michael also asked for Administrative help:

I need someone to maintain/take responsibility for:

  • A list of projects and their development status and needs.
  • Making sure things move forward
  • A "this week on perl-qa" style summary
  • The code repository
  • Mailing list organization (creating new lists when necessary, etc..)

Which he then clarified with:

I think that's what I need. A project manager. If anyone out there actually has experience in any of this, feel free to shout loudly.

Michael started another thread with his comment about Test::Harness. He noticed that there's an ill-documented option for it to allow certain tests to fail by design, for unimplemented features and the like.

This led to a discussion about how exactly to write the test, closures vs. if/then vs. CODE references, which seems to have come to this conclusion:

Michael: Okay, we'll file this discussion under YMMV.

Barrie: That's my point. Your style isn't the only one out here.

String Encoding

(Thanks again to Woodrow Hill)

Character representations in Perl 6

Hong Zhang started out the thread with:

I want to give some of my thougts about string encoding... Personally I like the UTF-8 encoding. ... The new style will be strange, but not very difficult to use. It also hide the internal representation.

The UTF-32 suggestion is largely ignorant to internationalization. Many user characters are composed by more than one unicode code point. If you consider the unicode normalization, canonical form, hangul conjoined, hindic cluster, combining character, varama, collation, locale, UTF-32 will not help you much, if at all.

Simon pointed out that the general direction for Perl 6 currently seemed to point towards the use of codepoints instead of an internal UTF-8 representation, for simplicity of tracking character positions, amongst other issues. Hong disagreed, and thus began an interesting little set of emails concerning the use of UTF-8, 16, or 32 vs. codepoints in Perl, the efficiency of determining the position of a character in Perl using the various encoding schemes, and so on. As Dan would maintain:

To find the character at position 233253 in a variable-length encoding requires scanning the string from the beginning, and has a rather significant potential cost. You've got a test for every character up to that point with a potential branch or two on each one. You're guaranteed to blow the heck out of your processor's D-cache, since you've just waded through between 200 and 800K of data that's essentially meaningless for the operation in question.

And Simon commented, towards the end of this thread, that:

I think you're confused. Codepoints *are* characters. Combining characters are taken care of as per the RFC.

The commentary seemed to end with Hong restating his basic position for the record, that UTF-8 was the way to go, and Dan's response:

Um, I hate to point this out, but perl isn't going to have a single string encoding. I thought you knew that.

Various

Branden tried to bring up the deadly ||| operator again. This did not go down well. Ziggy suggested a PDD to document all the hoary old crap that we don't want to drag up again.

Until next week I remain, your humble and obedient servant,


Simon Cozens

This Week on p5p 2001/02/12



Notes

You can subscribe to an email version of this summary by sending an empty message to p5p-digest-subscribe@plover.com.

Please send corrections and additions to perl-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year and month.

I've been having mail problems this week, so it's possible that I've missed one or two important things. Even so, it seems to be looking relatively quiet out there.

Perl FAQ updates

A number of bug reports and suggested fixes came in regarding the Perl FAQ this week; the most contentious one was a bug in the regular expression for matching email addresses given in perlfaq9. Everyone agreed that there was no point trying to be completely correct, since the RFC822 grammar for an email address is not easily reduced to a regular expression. There are complete regular expressions, such as the one in Abigail's RFC::RFC822::Address module, and the one given in Jeffrey Friedl's "Mastering Regular Expressions". However, the current one required some slight tweaking.
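
The pragmatic upshot of that agreement is something like this sketch (the pattern is deliberately loose, purely illustrative, and certainly not RFC822-complete):

    # accept anything that looks vaguely like user@host.domain,
    # then confirm by actually sending mail to the address
    if ($address =~ /^[^@\s]+@[^@\s]+\.[^@\s]+$/) {
        print "Looks plausible; send a confirmation message to be sure.\n";
    }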

This, combined with the other bug reports both on P5P and perl-friends, led Jarkko to set up a working group to look after the FAQ. If you're interested in helping out, see the list's home page.

Namespace for IO Layers

Nicholas Clark asked three difficult questions about IO layers:

1: How is it proposed to avoid namespace clashes with layers?

2: Is there a suggested namespace (eg Layer:: ?) for modules implementing layers?

3: Is there a standard (memorable) way of passing arguments to layers?

The other Nick said that he wanted the layer names ( :gzip and the like) to behave like (and possibly even converge with) attribute names, and so generalised the problem to "how do we avoid namespace clashes with attributes?". As for question two, he suggested the PerlIO:: or perlio:: namespaces. The final question caused more consternation. Nicholas wanted to avoid problems where an argument to a layer could have meaning to the layer system in general. It wouldn't be good, for instance, to say

    open (FOO, ':layer(name="(test1),:gzip")', $file)

He first proposed a system similar to URL encoding, using %XX to escape significant characters. Nick Ing-Simmons got a little worried at this point:

I have a feeling that putting too much in the one string is a mistake. I think we may need something akin to ioctl() which can call "method(s)" on the "layer object" to allow a more extensible approach.

Surprisingly, this is what Ilya was saying nearly a year ago.
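
For what it's worth, Nicholas's URL-style escaping idea boils down to something like this (a sketch of the proposal, not anything that was implemented):

    # Escape characters that would otherwise be significant to the layer syntax.
    sub escape_layer_arg {
        my $arg = shift;
        $arg =~ s/([():,%])/sprintf "%%%02X", ord $1/ge;
        return $arg;
    }
    print escape_layer_arg('(test1),:gzip'), "\n";    # %28test1%29%2C%3Agzip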

Memory Leak Plumbing

Alan has been working exceptionally hard of late on what he called the "south end of the camel": memory leaks. He's been using a software tool called "Purify" that checks for leaks and illegal memory accesses.

Firstly, he discovered something awry in Perl_pmtrans, part of the regular expression engine. This led him to issue the following plea:

The fact that there isn't even a comment explaining what it is supposed to do isn't helping - it is virtually impossible to figure out if it is working correctly if I have no idea of what is correct behaviour.

A heartfelt plea - *please* comment your code. Take pity on those of us following at the south end of the camel, carrying the shovel and bucket.

Next, he produced evidence of a leak in another part of the RE engine, along with some code which is probably doing something exceptionally evil indeed: it saves away a pointer, sets it to zero, saves it again and then re-allocates some memory for it. Obviously, when it's time for the saved-away values to be restored, Bad Things happen. Alan says that before he started work on Perl_pmtrans it was leaking 96% of available memory - a far cry from the usual claim that "the only thing that leaks memory is failed evals".

That was, however, the next area that Alan planned to take on but it's a very hard problem. Nick Ing-Simmons is going to take a look at it, and Nicholas Clark came up with his favourite memory leak, which Alan duly fixed.

He also found a rather major memory leak in the Perl destruction process; his fixes to the SV allocation process were coming up with strange errors about "unbalanced reference counts". He initially thought this was a problem in his patch, but eventually it became clear that strings in environment variables or parser tokens weren't being properly freed, because they weren't being allocated from designated blocks of memory ("arenas"). Worse, everything in stashes had problems with circular references: the variable would contain a reference to its parent stash, and vice versa. Just for fun, %main:: refers to itself, because as it's set up, it declares itself to be its own parent. (That's why you can do $main::main::main::c.)
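
If you want to see that self-reference for yourself, a few lines of throwaway code will do it (this just pokes at the symbol table, and has nothing to do with Alan's patches):

    $c = 42;
    print $main::main::main::c, "\n";              # prints 42 - main:: contains itself
    print "same stash\n"
        if \%main:: == \%{ $main::{'main::'} };    # the 'main::' entry points back at %main::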

Hopefully, Perl will be a lot less leaky in the very near future - and you have Alan to thank for that.

Shared functions

Last week, I mentioned Doug MacEachern's experiments with shared subs. This week, he's produced a complete patch which allows you to say (in an XSUB):

    void
    foo()
      ATTRS: shared

and the resulting sub will be shared between interpreters. Apache::* module authors, listen up. This one could be useful for you.

There was a little concern that there was no locking provided on the shared subroutine code, but Doug countered that the GvSHARED test combined with the SvREADONLY flag meant that if anyone messed around with the GV, they basically had to take responsibility for their actions...

Perl 6 Is Alive!

Rumours of Perl 6's death have been greatly exaggerated, and devotees of this weekly summary will be overjoyed to learn that I (together with my merry band of volunteers) will be putting together summaries of the Perl 6 mailing lists.

Later this week, and from next week onwards, the summaries will appear on the front page of http://www.perl.com/; this week's issue is temporarily located at http://simon-cozens.org/perl6/THISWEEK-20010211.html

Various

Quite a few bug reports and a few quick fixes, but nothing else of major note.

Until next week I remain, your humble and obedient servant,

Perl 6 Alive and Well! Introducing the perl6-mailing-lists Digest


"What *is* going on over there, anyway? It is unfortunately true that the effort looks stalled from the outside."
    - Eric S. Raymond, to the perl6-meta mailing list

The push towards the next major version of perl started with a bang -- literally, thanks to Jon Orwant and his now-infamous coffee-mug-tossing act. Mailing lists were set up, a flurry of RFCs were submitted, and now, almost five months since the RFC process ended... Quiet. What *is* going on with perl6, anyway?

As it stands now, the silence is mostly due to the fact a lot of the work has gone underground. First, of course, we continue to wait eagerly for Larry's language design, although as Larry himself points out, he's got his hands full with the 361 RFCs submitted last fall. Elsewhere, work continues at a steady murmur, especially on the perl6-language and perl6-internals lists. In particular, the perl6-internals group, led by the redoubtable Dan Sugalski, has borne some recent fruits, as discussions have started to coalesce into "Perl Design Documents," or PDDs.

PDDs are detailed white papers that will hopefully serve as guides when people actually sit down to write code. So far, PDDs have been submitted to perl6-internals relating to the structures of the interpreter itself, and of vtables, which will be used to implement primitive variables, like scalars. More PDDs are expected on other language-independent features, like garbage collection and I/O abstraction, which will need to be implemented somehow, regardless of what Larry's final language design looks like. Some preliminary code for the perl6 interpreter might even be written in the next month or so, once the existing PDDs are finalized.

So, contrary to all outward appearances, perl6 is indeed alive and well! In order to remedy this information deficit, Simon Cozens has stepped forward, and volunteered a companion to his perl5-porters digest. As such, the O'Reilly Network is pleased to introduce the first edition of the perl6 mailing lists digest. Simon plans to set up e-mail distribution (analogous to the p5p digest), so we'll be sure to let you know when that happens. Meanwhile, the perl6 digest will become a regular weekly feature of www.perl.com, hot off the mailing lists to you. Enjoy!

This week on perl6 (04--11 Feb 2001)

Please send corrections and additions to perl6-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year and month.

This week was reasonably quiet, seeing around 350 messages in all the groups. For a change, most of the traffic was on perl6-language.

Autoloading Modules

Last week, Dan asked people to think about ways to autoload modules when a function was used; the idea being that you'd be able to say, for instance:

    socket(SOCK, PF_INET, SOCK_STREAM, $proto);

(or moral equivalent) and Perl would load in all the socket functions. This is actually what Perl 5 already does for glob and some of the Unicode functionality. Some people went off on a bit of a tangent and started discussing ways to autoload modules more generally, by having modules declare what functionality they're providing.
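
For the Perl 5-minded, the flavour of the idea can be sketched with AUTOLOAD: load a module the first time one of its functions is called. (The module and function below are purely illustrative.)

    sub AUTOLOAD {
        our $AUTOLOAD;
        (my $name = $AUTOLOAD) =~ s/.*:://;
        require Socket;                      # pull the module in lazily
        my $code = Socket->can($name)
            or die "Undefined subroutine $AUTOLOAD";
        no strict 'refs';
        *{$AUTOLOAD} = $code;                # install it so AUTOLOAD isn't hit next time
        goto &$code;                         # re-dispatch with the original arguments
    }
    print inet_ntoa(pack("C4", 127, 0, 0, 1)), "\n";    # prints 127.0.0.1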

One big question for both sub-discussions was how we key the functions to code. Jarkko said:

A gut feeling that I have is we can't simply go by interface 'names', be they just simple names of funtions/methods or their full 'signatures' (let us not even start on (1) how difficult with Perl's type system and functions it is to define signatures (2) the difficulty in defining an ontology/vocabulary), either would not really be enough.

What I think is needed is some sort of opaque tag: the name of the 'contract' the API claims to fulfill. The name can be the name of the standard, the name of the company, the name of the individual.

Branden suggested that a URI should be used, leading to the inevitable but still horribly scary notion of

    use autoload { Bar => 'http://www.cpan.org/modules/Bar' },
                 { Baz => 'ftp://my.local.domain/perl-modules/Baz', VERSION => 2 };

Various people pointed out that this might not be secure.

Packaging

The autoloaded core functions idea got slightly left by the wayside, as the discussion finally veered off onto how to package up modules and programs to satisfy dependencies and make things easy for the user. A setup similar to Java's "jar"s was suggested. Dan came up with the very simple and neat idea of simply shipping a bytecode compiled version of a program instead. Schwern was a bit concerned that this would lose source code and would separate out documentation; Dan's brilliant answer was:

Not unless you strip the bytecode. I want to optionally package the source in the bytecode, since otherwise you can't do some optimizations after the fact on the generated bytecode stream.

He also suggested a text segment in bytecode so that, for instance, you can still get POD embedded in code.

That's something that may well happen anyway, but Branden came back on the packaging issue. He noted that Dan's suggestion wouldn't help for modules with C extensions, and also said:

Actually, I think the archive approach is more general, because it wouldn't have this kind of problems and would allow other resources to be deployed together with the code, like documentation, perhaps even text files and images used by a Perl/Tk application

Comparisons were made between this and the currently-existing PPM. Branden produced a draft PDD for his ideas.

Vtables

At long last, Dan produced the second PDD, specifying the vtable API. As expected, this exposed a lot of hidden confusion about what vtables are for and how they're going to be handled. Tim piped up with a few questions and corrections, including a discussion about how string handling is going to work, especially string encoding. Dan said he deliberately left UTF-8 off, because dealing with variable-length data is horrid. Most people disagreed, saying that UTF-32 was too big for most purposes, and UTF-8 was a good compromise for most data. It was generally agreed that an abstracted string handling layer would make most of the problem go away.

Edwin Steiner asked whether the vtable API should be made up of macros; I pointed out that this was the road that Perl 5 went down, and look what happened to that. Dan also said that there wouldn't be an "API" for vtables - they're to be used by the Perl guts only.

There was still a lot of confusion as to how overloading and aliasing would be accomplished. Branden came up with an alternative suggestion for how to handle vtables, which seemed to be rather more high-level. The current vtable PDD wants to make many core ops a single vtable call if possible. There seemed to be much confusion about how the key field worked, and what operation was being carried out on what. No doubt further revisions of the PDD will clear this up. Dan also said that once the PDD has matured a little more, he wants to start writing code for the base classes. We're nearly there, guys.

Subroutine return values

There was a lot of light but very little heat in the continuing saga of assigning to a magic return value variable. Some people seem to want to do this:

    sub foo {
        foo = 10;
    }

instead of return 10, just like Pascal, Algol and all those other failed, now-dead languages.

A (slightly) better suggestion was a magic variable to hold the return value, similar to what Parse::RecDescent (and of course, yacc) does. The names $__ and $^R were suggested, but there was no consensus on whether or not it would even be a good idea.

End of Scope Actions

A far better idea came out when people stopped looking at what they wanted and started looking at why they wanted it. A lot of the value in having an assignable return value is in the situation of subroutines which set something up, compute something, and then tear it down again. Another way of looking at that was to stipulate a block executed at the end of the scope, like this:

    sub lines {
        open IN, $_ or die $!;
        return scalar(<IN>);
    }
    post lines { # This is executed on return
        close IN;
    }

Damian had, of course, thought ahead, and this is covered by his RFC 271. However, he agreed that post-block actions should be allowed everywhere, not just on subroutines. The always keyword was agreed upon as a good way of doing this, although POST was also suggested. This led to the semi-inevitable rehash of the try-catch exception handling debate. According to John Porter,

There is no try, there is only do. :-)
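
Incidentally, you can approximate the proposed post block in Perl 5 today with an object whose DESTROY method fires at scope exit; here's a minimal sketch of that common idiom (not Damian's RFC 271):

    sub ScopeGuard::DESTROY { ${ $_[0] }->() }
    sub on_scope_exit { my $code = shift; bless \$code, 'ScopeGuard' }

    sub lines {
        open IN, $_[0] or die $!;
        my $guard = on_scope_exit(sub { close IN });   # runs however we leave lines()
        return scalar(<IN>);
    }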

Garbage Collection

Jamie Zawinski published his rant about Java, which caused certain sensible people to ponder how to make sure Perl avoids the same mistakes. A few of the things mentioned included platform independence, the size of SVs, and locking, but the discussion settled down to garbage collection, as rather a lot of discussions on perl6-internals are wont to do. (Oh, this was on perl6-language. Ho hum.)

The trigger was a question from Branden:

I actually don't understand how traversing a graph can be faster than incrementing/decrementing/testing for zero on a refcount. I believe you, but I just don't understand. Could you point me to some URLs that talk about this?

and a masterful answer from Piers:

There's a jolly good book on this called (would you believe) 'Garbage Collection'. The crux of the matter would appear to be that with refcounts you have to do a pretty small amount of work very, very often. With a well designed GC system you do a largish amount of work much less frequently. The total amount of work done tends to come out higher in the refcounting scenario.

This was coupled with a more comprehensive answer from Ken Fox. Dan said he wanted to put GC-related data at the end of a variable, so that it didn't always get loaded into memory. He also pointed out that

The less memory you chew through the faster your code will probably be (or at least you'll have less overhead). Reuse is generally faster and less resource-intensive than recycling. What's true for tin cans is true for memory.

and hinted that Perl 6 is likely to be using a generational semi-space garbage collection scheme.
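
Beyond raw speed, there's also the textbook case that pure reference counting can never reclaim; it's worth keeping in mind while reading these threads, though the example below is my own illustration rather than anything from the list:

    sub make_cycle {
        my ($x, $y) = ({}, {});
        $x->{peer} = $y;
        $y->{peer} = $x;      # each hash keeps the other's refcount above zero
    }
    make_cycle() for 1 .. 100_000;    # Perl 5 never reclaims these until interpreter exit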

kdb

Joshua Pritikin mentioned kdb, but had to be tortured before he would explain why. It eventually became clear he was talking about the K language and its interesting data model; he says:

Whether K is ultimately a failure or not, i find it to be an interesting mix of design trade-offs. Of course i'd have to use it in a real project to offer a detailed report of its weaknesses.

ESR on Perl 6

Eric Raymond released two more chapters of his on-line book The Art of Unix Programming, something Perl 6 people would do well to read. Unfortunately, he wasn't particularly complimentary about Perl, claiming that both Perl 5 and Perl 6 are currently stagnant and stalled. This led to a rather acrimonious discussion about our public image, and it was resolved that these summaries might help us let the public know what's going on. So here we are.

And there we were. Until next week I remain, your humble and obedient servant,

Simon Cozens

This Week on p5p 2001/02/06


Notes

You can subscribe to an email version of this summary by sending an empty message to p5p-digest-subscribe@plover.com.

Please send corrections and additions to perl-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year and month.

Wow. 600 messages this week, and that's not counting a lot of the test result messages.

Perl 5.6.1 not delayed after all

It had to happen. Just after I announced last week that 5.6.1 would be delayed, Sarathy announced the release of 5.6.1 trial 2. This is available for testing on CPAN at $CPAN/authors/id/G/GS/GSAR/perl-5.6.1-TRIAL2.tar.gz . Sarathy says:

Thanks largely to Jarkko's help, the second trial version of perl v5.6.1 is now available. (CPAN may need some time to catch up.)

If this release passes muster, I will update perldelta.pod and send it to CPAN as v5.6.1. Owing to the large number of patches, testing is very very important. So give it all you've got on all your favourite platforms!

In particular, I'd like to see some purify (or similar) runs. Patches to documentation are also welcome. Changes to *.[ch] are probably out unless they are fixing serious bugs.

Naturally, this produced a deluge of test results, the vast majority of which were successful. As usual, if you've got any weird and funky platforms, give it a spin.

And, of course, well done to Jarkko and Sarathy for putting this one together.

MacPerl

I forgot to mention this last week, but it's important enough for me to mention it this week: Chris Nandor has taken over the MacPerl pumpkin. If you have a Mac and you want to run Perl on it (or, even better, help move MacPerl up to 5.6.1), then you really ought to read Chris' State of MacPerl posting.

select() on Win32

Barrie Slaymaker mentioned that he wanted to get select() working on Win32, and that Perforce were interested in funding the work. Nick Ing-Simmons said that the PerlIO abstraction layer would help with this:

The problem is that on Win32 to use select() the Socket has to be in synchronous mode. While to use WaitForMultipleEvents() the Socket has to be in asynchronous mode - thus if you want to use Win32's native "poll-oid" API you cannot use select(). In addition MS's C runtime (read/fread etc) will not work on sockets in asynchronous mode (or non-blocking IO in general).

So you need to replace read and stdio with another IO subsystem and get perl to use it - hence PerlIO.

Uri Guttman predictably took this as a cue to push for a portable event interface; Rocco Caputo said that he'd added an event-driven IPC::Run-style process communication model to his POE module which worked fine on Win32, using TCP sockets as a form of select()-able pipe emulation.

Nick wanted to work at the problem from the other end, by building up a new PerlIO bottom layer for Windows, using the native Windows IO calls. Sean McCune, who's working with Barrie on this, said that's what he would try to do. As Jarkko pointed out:

First fork() emulation and now select()? If we are not careful in ten years or so NT/W2K/W2010 will be almost as useful as UNIX was in mid-1980's.

Test::Harness

At last, Schwern's Test::Harness patch made it in, after a tiny bit more messing around. The discussion turned into a useful thread on patching strategies. For instance, it's apparently not very widely known that if you add a new file to the Perl distribution, you also need to patch the MANIFEST file. There's also a load of good information in the file Porting/patching.pod. Andreas Koenig also put in a plug for Johan Vromans's makepatch utility:

Johan Vromans has written a powerful utility to create patches based on large directory trees -- makepatch. See the JV directory on CPAN for the current version. If you have this program available, it is recommended to create a duplicate of the perl directory tree against which you are intending to provide a patch and let makepatch figure out all the changes you made to your copy of the sources. As perl comes with a MANIFEST file, you need not delete object files and other derivative files from the two directory trees, makepatch is smart about them.

Nicholas Clark suggested that each time you plan to make a change, you can call Configure with -Dmksymlinks, which creates a symlink farm. Then when you change a file, remove the symlink and replace it with a real copy of the file. This means you can maintain multiple patch trees without the space overhead of full source trees. Other suggestions included various version control systems, and some people provided programs to sync up bleadperl with their local version control repository.

Schwern also came out with a load of documentation patches explaining the difference between chop and chomp from a portability point of view, and changing the examples to use chomp where they previously used chop.
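
For anyone hazy on the distinction being documented, it boils down to this:

    my $line = "data\n";
    chomp(my $copy1 = $line);    # "data" - chomp removes the input record separator only
    chop (my $copy2 = $line);    # "data" - but only because the line happened to end in "\n"
    chop (my $copy3 = "data");   # "dat"  - chop always removes the last character, whatever it is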

CHECK blocks

Piers Cawley asked about CHECK blocks:

I have a CHECK block that checks that all the methods requested by the interface are accessible. No problem.

Until, one of my class's client classes comes to do a deferred load using require and everything falls over in a heap because it's

 Too late to run CHECK block.

And I can't, for the life of me, understand why.

So, what's a CHECK block? The idea is that CHECK blocks are called after compilation is completed. They're intended to be used by the compiler backends, to save the program state once everything's been assembled into an op tree. However, there's no reason why you can't use them for other things instead.
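
A minimal illustration of when the various blocks fire (the expected output order is shown in the comments):

    print "run time\n";
    CHECK { print "after compilation\n" }
    BEGIN { print "during compilation\n" }

    # Output:
    #   during compilation
    #   after compilation
    #   run time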

The problem that Piers was coming up against was that we expect CHECK blocks to be run every time something is compiled, but this doesn't happen yet; Sarathy explains:

In the current implementation, there is exactly one point at which CHECK and INIT blocks are run (this being the point at which the Compiler would do its work, when it saves and restores program state, respectively).

But I believe Larry has stated that CHECK blocks should be able to run at the end of compilation of every individual "compilation unit", whatever that happens to be (file/BEGIN block/eval"").

As far as I'm aware, nobody is currently working on making those new semantics reality, but I don't think it would be too difficult.

C library functions in the core

I compiled a list of standard C library functions that are either reimplemented in the Perl core, or redefined to have more predictable semantics. This helps you write more `politically correct' internals code. For instance, instead of saying

 char *foo = malloc(10);

you should really say

 New(0, foo, 10, char);
Read about it.

Perl for Windows CE

So they beat me to it. Perl for Windows CE is finally available, at http://www.rainer-keuchel.de/software.html

Well done, Rainer, you mad individual.

Various

Lupe Christoph came up with some patches to make Solaris's malloc the default rather than Perl's malloc on that platform; this works around a known problem with Perl's malloc with more than 2G of memory.

Doug MacEachern had a really neat patch which shared globs containing XSUBs across cloned Perl interpreters, something that could save a lot of memory for those embedding Perl. (Especially things that clone a lot of interpreters, like mod_perl)

And that's about it. Until next week I remain, your humble and obedient servant,


Simon Cozens

Pathologically Polluting Perl



Table of Contents

Inline in Action - Simple examples in C
Hello, world
Just Another ____ Hacker
What about XS and SWIG?
One-Liners
Supported Platforms for C
The Inline Syntax
Fine Dining - A Glimpse at the C Cookbook
External Libraries
It Takes All Types
Some Ware Beyond the C
See Perl Run. Run, Perl, Run!
The Future of Inline
Conclusion

No programming language is Perfect. Perl comes very close. P! e! r! l? :-( Not quite ``Perfect''. Sometimes it just makes sense to use another language for part of your work. You might have a stable, pre-existing code base to take advantage of. Perhaps maximum performance is the issue. Maybe you just ``know how to do it'' that way. Or very likely, it's a project requirement forced upon you by management. Whatever the reason, wouldn't it be great to use Perl most of the time, but be able to invoke something else when you had to?

Inline.pm is a new module that glues other programming languages to Perl. It allows you to write C, C++, and Python code directly inside your Perl scripts and modules. This is conceptually similar to the way you can write inline assembly language in C programs. Thus the name: Inline.pm.

The basic philosophy behind Inline is this: ``make it as easy as possible to use Perl with other programming languages, while ensuring that the user's experience retains the DWIMity of Perl''. To accomplish this, Inline must do away with nuisances such as interface definition languages, makefiles, build directories and compiling. You simply write your code and run it. Just like Perl.

Inline will silently take care of all the messy implementation details and ``do the right thing''. It analyzes your code, compiles it if necessary, creates the correct Perl bindings, loads everything up, and runs the whole schmear. The net effect of this is you can now write functions, subroutines, classes, and methods in another language and call them as if they were Perl.

Inline in Action - Simple examples in C

Inline addresses an old problem in a completely revolutionary way. Just describing Inline doesn't really do it justice. It should be seen to be fully appreciated. Here are a couple examples to give you a feel for the module.

Hello, world

It seems that the first thing any programmer wants to do when he learns a new programming technique is to use it to greet the Earth. In keeping with that tradition, here is the ``Hello, world'' program using Inline.

    use Inline C => <<'END_C';
    void greet() {
        printf("Hello, world\n");
    }
    END_C
    greet;

Simply run this script from the command line and it will print (you guessed it):

    Hello, world

In this example, Inline.pm is instantiated with the name of a programming language, ``C'', and a string containing a piece of that language's source code. This C code defines a function called greet() which gets bound to the Perl subroutine &main::greet. Therefore, when we call the greet() subroutine, the program prints our message on the screen.

You may be wondering why there are no #include statements for things like stdio.h. That's because Inline::C automatically prepends the following lines to the top of your code:

    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"
    #include "INLINE.h"

These header files include all of the standard system header files, so you almost never need to use #include unless you are dealing with a non-standard library. This is in keeping with Inline's philosophy of making easy things easy. (Where have I heard that before?)

Just Another ____ Hacker

The next logical question is, ``How do I pass data back and forth between Perl and C?'' In this example we'll pass a string to a C function and have it pass back a brand new Perl scalar.

    use Inline C;
    print JAxH('Perl');

    __END__
    __C__
    SV* JAxH(char* x) {
        return newSVpvf("Just Another %s Hacker\n", x);
    }

When you run this program, it prints:

    Just Another Perl Hacker

You've probably noticed that this example is coded differently than the last one. The use Inline statement specifies the language being used, but not the source code. This is an indicator for Inline to look for the source at the end of the program, after the special marker '__C__'.

The concept being demonstrated is that we can pass Perl data in and out of a C function. Using the default Perl type conversions, Inline can easily convert all of the basic Perl data types to C and vice-versa.

This example uses a couple of the more advanced concepts of Inlining. The function's return value is of the type SV* (or Scalar Value). The Scalar Value is the most common Perl internal type. Also, the Perl internal function newSVpvf() is called to create a new Scalar Value from a string, using the familiar sprintf() syntax. You can learn more about simple Perl internals by reading the perlguts and perlapi documentation distributed with Perl.
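
To see the other default conversions at work, here is a small sketch of my own (not one of the article's examples) that passes integers, floating-point numbers and strings across the boundary:

    use Inline C => <<'END_C';
    int    add_ints(int a, int b)       { return a + b; }
    double scale   (double x, double f) { return x * f; }
    SV*    shout   (char* s)            { return newSVpvf("%s!", s); }
    END_C

    print add_ints(2, 3), "\n";    # 5
    print scale(1.5, 2), "\n";     # 3
    print shout("inline"), "\n";   # inline!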

What about XS and SWIG?

Let's detour momentarily to ponder ``Why Inline?''

There are already two major facilities for extending Perl with C. They are XS and SWIG. Both are similar in their capabilities, at least as far as Perl is concerned. And both of them are quite difficult to learn compared to Inline. Since SWIG isn't used in practice to nearly the degree that XS is, I'll only address XS.

There is a big fat learning curve involved with setting up and using the XS environment. You need to get quite intimate with the following docs:

 * perlxs
 * perlxstut
 * perlapi
 * perlguts
 * perlcall
 * perlmod
 * h2xs
 * xsubpp
 * ExtUtils::MakeMaker

With Inline you can be up and running in minutes. There is a C Cookbook with lots of short but complete programs that you can extend to your real-life problems. No need to learn about the complicated build process going on in the background. You don't even need to compile the code yourself. Perl programmers cannot be bothered with silly things like compiling. ``Tweak, Run, Tweak, Run'' is our way of life. Inline takes care of every last detail except writing the C code.

Another advantage of Inline is that you can use it directly in a script. As we'll soon see, you can even use it in a Perl one-liner. With XS and SWIG, you always set up an entirely separate module, even if you only have one or two functions. Inline makes easy things easy, and hard things possible. Just like Perl.

Finally, Inline supports several programming languages (not just C and C++). As of this writing, Inline has support for C, C++, Python, and CPR. There are plans to add many more.

One-Liners

Perl is famous for its one-liners. A Perl one-liner is a short piece of Perl code that can accomplish a task that would take much longer in another language. It is one of the popular techniques that Perl hackers use to flex their programming muscles.

So you may wonder: ``Is Inline powerful enough to produce a one-liner that is also a bona fide C extension?'' Of course it is! Here you go:

    perl -e 'use Inline C=>
	q{void J(){printf("Just Another Perl Hacker\n");}};J'

Try doing that with XS! We can even write the more complex Inline JAxH() discussed earlier as a one-liner:

    perl -le 'use Inline C=>
	q{SV*JAxH(char*x){return newSVpvf("Just Another %s Hacker",x);}};print JAxH+Perl'

I have been using this one-liner as my email signature for the past couple months. I thought it was pretty cool until Bernhard Muenzer posted this gem to comp.lang.perl.modules:

    #!/usr/bin/perl -- -* Nie wieder Nachtschicht! *- -- lrep\nib\rsu\!#
    use Inline C=>'void C(){int m,u,e=0;float l,_,I;for(;1840-e;putchar((++e>907
     &&942>e?61-m:u)["\n)moc.isc@rezneumb(rezneuM drahnreB"]))for(u=_=l=0;79-(m
      =e%80)&&I*l+_*_<6&&26-++u;_=2*l*_+e/80*.09-1,l=I)I=l*l-_*_-2+m/27.;}';&C

Supported Platforms for C

Inline C works on all of the Perl platforms that I have tested it with so far. This includes all common Unixes and recent versions of Microsoft Windows. The only catch is that you must have the same compiler and make utility that was used to build your perl binary.

Inline has been successfully used on Linux, Solaris, AIX, HPUX, and all the recent BSD's.

There are two common ways to use Inline on MS Windows. The first one is with ActiveState's ActivePerl for MSWin32. In order to use Inline in that environment, you'll need a copy of MS Visual C++ 6.0. This comes with the cl.exe compiler and the nmake make utility. Actually these are the only parts you need. The visual components aren't necessary for Inline.

The other alternative is to use the Cygwin utilities. This is an actual Unix porting layer for Windows. It includes all of the most common Unix utilities, such as bash, less, make, gcc and of course perl.

The Inline Syntax

Inline is a little bit different than most of the Perl modules that you are used to. It doesn't import any functions into your namespace and it doesn't have any object oriented methods. Its entire interface is specified through 'use Inline ...' commands. The general Inline usage is:

    use Inline C => source-code,
               config_option => value,
               config_option => value;

Where C is the programming language, and source-code is a string, filename, or the keyword 'DATA'. You can follow that with any number of optional 'keyword => value' configuration pairs. If you are using the 'DATA' option, with no configuration parameters, you can just say:

    use Inline C;
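
A more fully specified call, passing the source as a here-document together with one configuration pair (LIBS, which also appears in the External Libraries example below), might look like this - treat it as a sketch rather than canonical usage:

    use Inline C => <<'END_C', LIBS => '-lm';
    #include <math.h>
    double my_sqrt(double x) { return sqrt(x); }
    END_C

    print my_sqrt(2), "\n";    # 1.41421356...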

Fine Dining - A Glimpse at the C Cookbook

In the spirit of the O'Reilly book ``Perl Cookbook'', Inline provides a manpage called C-Cookbook. In it you will find the recipes you need to help satisfy your Inline cravings. Here are a couple of tasty morsels that you can whip up in no time. Bon Appetit!

External Libraries

The most common real world need for Inline is probably using it to access existing compiled C code from Perl. This is easy to do. The secret is to write a wrapper function for each function you want to expose in Perl space. The wrapper calls the real function. It also handles how the arguments get passed in and out. Here is a short Windows example that displays a text box with a message, a caption and an ``OK'' button:

    use Inline C => DATA =>
               LIBS => '-luser32',
               PREFIX => 'my_';
    MessageBoxA('Inline Message Box', 'Just Another Perl Hacker');

    __END__
    __C__
    #include <windows.h>
    int my_MessageBoxA(char* Caption, char* Text) {
      return MessageBoxA(0, Text, Caption, 0);
    }

This program calls a function from the MSWin32 user32.dll library. The wrapper determines the type and order of arguments to be passed from Perl. Even though the real MessageBoxA() needs four arguments, we can expose it to Perl with only two, and we can change the order. In order to avoid namespace conflicts in C, the wrapper must have a different name. But by using the PREFIX option (same as the XS PREFIX option) we can bind it to the original name in Perl.

It Takes All Types

Older versions of Inline only supported five C data types. These were: int, long, double, char* and SV*. This was all you needed. All the basic Perl scalar types are represented by these. Fancier things like references could be handled by using the generic SV* (scalar value) type, and then doing the mapping code yourself, inside the C function.

The process of converting between Perl's SV* and C types is called typemapping. In XS, you normally do this by using typemap files. A default typemap file exists in every Perl installation in a file called /usr/lib/perl5/5.6.0/ExtUtils/typemap or something similar. This file contains conversion code for over 20 different C types, including all of the Inline defaults.

As of version 0.30, Inline no longer has any built in types. It gets all of its types exclusively from typemap files. Since it uses Perl's default typemap file for its own defaults, it actually has many more types available automatically.

This setup provides a lot of flexibility. You can specify your own typemap files through the use of the TYPEMAPS configuration option. This not only allows you to override the defaults with your own conversion code, but it also means that you can add new types to Inline as well. The major advantage to extending the Inline syntax this way is that there are already many typemaps available for various APIs. And if you've done your own XS coding in the past, you can use your existing typemap files as is. No changes are required.

Let's look at a small example of writing your own typemaps. For some reason, the C type float is not represented in the default Perl typemap file. I suppose it's because Perl's floating point numbers are always stored as type double, which is higher precision than float. But if we wanted it anyway, writing a typemap file to support float is trivial.

Here is what the file would look like:

    float                   T_FLOAT

    INPUT
    T_FLOAT
            $var = (float)SvNV($arg)

    OUTPUT
    T_FLOAT
            sv_setnv($arg, (double)$var);

Without going into details, this file provides two snippets of code: one for converting an SV* to a float, and one for the opposite. Now we can write the following script:

    use Inline C => DATA =>
               TYPEMAPS => './typemap';

    print '1.2 + 3.4 = ', fadd(1.2, 3.4), "\n";

    __END__
    __C__
    float fadd(float x, float y) {
        return x + y;
    }

Some Ware Beyond the C

The primary goal of Inline is to make it easy to use other programming languages with Perl. This is not limited to C. The initial implementations of Inline only supported C, and the language support was built directly into Inline.pm. Since then things have changed considerably. Inline now supports multiple languages, both compiled and interpreted. The implementations are kept in an object-oriented structure, whereby each language has its own separate module but can inherit behavior from the base Inline module.

On my second day working at ActiveState, a young man approached me. ``Hi, my name is Neil Watkiss. I just hacked your Inline module to work with C++.''

Neil, I soon found out, was a computer science student at a local university. He was working part-time for ActiveState then, and had somehow stumbled across Inline. I was thrilled! I had wanted to pursue new languages, but didn't know how I'd find the time. Now I was sitting 15 feet away from my answer!

Over the next couple months, Neil and I spent our spare time turning Inline into a generic environment for gluing new languages to Perl. I ripped all the C specific code out of Inline and put it into Inline::C. Neil started putting together Inline::CPP and Inline::Python. Together we came up with a new syntax that allowed multiple languages and easier configuration.

Here is a sample program that makes use of Inline Python:

    use Inline Python;
    my $language = shift;
    print $language, 
          (match($language, 'Perl') ? ' rules' : ' sucks'),
          "!\n";
    __END__
    __Python__
    import sys
    import re
    def match(str, regex):
        f = re.compile(regex);
        if f.match(str): return 1
        return 0

This program uses a Python regex to show that ``Perl rules!''.

Since Python supports its own versions of Perl scalars, arrays, and hashes, Inline::Python can flip-flop between them easily and logically. If you pass a hash reference to python, it will turn it into a dictionary, and vice-versa. Neil even has mechanisms for calling back to Perl from Python code. See the Inline::Python docs for more info.
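
Assuming that documented hashref-to-dictionary conversion, a tiny sketch of my own (not one of Neil's examples) looks like this:

    use Inline Python => <<'END_PY';
    def count_keys(d):
        return len(d.keys())
    END_PY

    print count_keys({ apples => 1, pears => 2, plums => 3 }), "\n";    # prints 3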

See Perl Run. Run, Perl, Run!

Inline is a great way to write C extensions for Perl. But is there an equally simple way to embed a Perl interpreter in a C program? I pondered this question myself one day. Writing Inline functionality for C would not be my cup of tea.

The normal way to embed Perl into C involves jumping through a lot of hoops to bootstrap a perl interpreter. Too messy for one-liners. And you need to compile the C. Not very Inlinish. But what if you could pass your C program to a perl program that could pass it to Inline? Then you could write this program:

    #!/usr/bin/cpr
    int main(void) {
        printf("Hello, world\n");
    }

and just run it from the command line. Interpreted C!

And thus, a new programming language was born. CPR. ``C Perl Run''. The Perl module that gives it life is called Inline::CPR.

Of course, CPR is not really its own language, in the strict sense. But you can think of it that way. CPR is just like C except that you can call out to the Perl5 API at any time, without any extra code. In fact, CPR redefines this API with its own CPR wrapper API.

There are several ways to think of CPR: ``a new language'', ``an easy way to embed Perl in C'', or just ``a cute hack''. I lean towards the latter. CPR is probably a far stretch from meeting most people's embedding needs. But at the same time it's a very easy way to play around with, and perhaps redefine, the Perl5 internal API. The best compliment I've gotten for CPR is when my ActiveState coworker Adam Turoff said, ``I feel like my head has just been wrapped around a brick''. I hope this next example makes you feel that way too:

    #!/usr/bin/cpr
    int main(void) {
        CPR_eval("use Inline (C => q{
            char* greet() {
                return \"Hello world\";
            }
        })");
        printf("%s, I'm running under Perl version %s\n",
               CPR_eval("&greet"),
               CPR_eval("use Config; $Config{version}"));
        return 0;
    }

Running this program prints:

    Hello world, I'm running under Perl version 5.6.0

Using the CPR_eval() call, this CPR program calls Perl and tells it to use Inline C to add a new function, which the CPR program subsequently calls. I think I have a headache myself.

The Future of Inline

Inline version 0.30 was written specifically so that it would be easy for other people in the Perl community to contribute new language bindings for Perl. On the day of that release, I announced the birth of the Inline mailing list, inline@perl.org. This is intended to be the primary forum for discussion on all Inline issues, including the proposal of new features and the authoring of new ILSMs (Inline Language Support Modules).

In the year 2001, I would like to see bindings for Java, Ruby, Fortran and Bash. I don't plan on authoring all of these myself. But I may kickstart some of them, and see if anyone's interested in taking over. If you have a desire to get involved with Inline development, please join the mailing list (inline-subscribe@perl.org) and speak up.

My primary focus at the present time is to make the base Inline module as simple, flexible, and stable as possible. I also want to see Inline::C become an acceptable replacement for XS, at least for most situations.

Conclusion

Using XS is just too hard. At least when you compare it to the rest of the Perl we know and love. Inline takes advantage of the existing frameworks for combining Perl and C, and packages it all up into one easy to swallow pill. As an added bonus, it provides a great framework for binding other programming languages to Perl. You might say, ``It's a 'Perl-fect' solution!''
