Independently Parsing Perl
by Adam KennedyJune 09, 2005
A few years into my programming career, I found myself involved in a somewhat unusual web project for an enormous global IT company. Due to some odd platform issues, we could write the intranet half of the project only in Perl and the almost-identical public internet half only in Java.
In my efforts to pick up enough Java to help my Perl code interoperate with the code from the Java guys, I stumbled on a relatively new editor with the rather expansive name of JetBrains IntelliJ IDEA.
What a joy! It quite simply made learning Java an absolute pleasure, with comprehensive tab completion, light and simple API docs, easy exploration of the mountain of Java classes, and unobtrusive indicators showing me my mistakes and offering to fix them. In short, it had lots of brains and a fast, clean user interface.
Where Is IntelliPerl?
Although I only needed it heavily for a few months, it's been my gold standard ever since, and my definition of what a "great" editor should be. I install every new Perl editor and IDE I come across in the hope that Perl might one day get an editor half as good as what Java has had for years.
|
Related Reading
Advanced Perl Programming |
These great editors are spreading. Java is now up to one and a half (Eclipse is nearly great but still seems not quite "effortless" enough about what it does). Dreamweaver gave HTML people their great editor years ago, and I've heard that Python may now have something that qualifies.
Interestingly, these great editors seem to share one major thing in common.
How to Build a Great Editor
Rather than relying on the language's parser to examine code, great editors seem to implement special parsers of their own. These parsers treat a file less like code and more like a generic document (that just also happens to be code).
It's a key distinction, and one that provides two critical capabilities.
First, it creates a "round-trip" capability, parsing a file into an internal model and back out again without moving a single white space character out of place. Even if parts of a file are broken or badly formatted, you can still change other parts of the file and save it correctly without it changing anything you don't alter.
Second, it makes the parser extremely safe and error-tolerant. Any code open in an editor is there for a reason--generally because it isn't finished yet, is broken, or needs changing. A document parser can hit a problem, flag it, stumble for a character or so until it finds something it recognizes, and then continue on.
Parsing as code is an entirely different task, and one often unsuited to these type of faults.
For example, take the following.
print "Hello World!\n";
}
MyModule->foobar;
For an editor using Perl itself to understand this code, it's game over once it hits the naked closing brace, because the code is invalid. Without knowledge of what is below the brace, you lose all of the intelligence that needs the parser: syntax highlighting, module checking, helpful tips, the lot.
It's just simply not a reasonable way to build an editor, where a file can be both unfinished and have dozens of bugs.
Building a Document Parser for Perl
Even without an editor to put it in (yet), a document parser for Perl would be extraordinarily useful for all sorts of tasks. At the time, though, all I really wanted was a really accurate HTML syntax highlighter.
Some time in early 2002, I was bored one afternoon and had a first stab at the problem. The result was pretty predictable, given patterns I've seen in others trying the same thing. It was A) based on regular expressions, and B) useless for anything even remotely interesting.
Between then and the start of The Perl Foundation grant in December 2004, I've spent a day or so a month on the problem, rewriting and throwing away code. I've junked two tokenizers, one lexer, an analysis package, three syntax highlighters, an obfuscation package, a quote engine, and half of the classes in the current object tree.
Now, finally, PPI is complete, bar some minor features and testing. It is 100 percent round-trip safe, and it's been stress tested against the 38,000 (non-Acme) modules in CPAN, handling all but 28 of the most broken and bizarre.
What Does It Do?
PPI should be the basis for any task where you need to parse, analyze or manipulate Perl, and it finally provides a platform for doing these tasks to their full potential. This covers a huge range of possible tasks; far too many to cover in any depth here.
For this article, I want to demonstrate how PPI can improve existing tools that currently only do a very basic job, when there is the potential for so much more.
One of these is part of the PAR application-packaging module. When PAR bundles a module into the internal include directory, it tries to reduce the size of the modules by stripping out POD. Of course, what would be better would be to strip out everything that is excess and cut PAR file sizes even more.
This is a form of compression, but given the potential confusion in using something like "Compress::Perl" as a name, I'm picking my own term. I hereby anoint the term "Squish". A squished module occupies as little space as possible, having had redundant characters removed. It will be extremely small, although it might look a little "squished" to look at :)
Perl::Squish
Rather than showing you the final project, I prefer to show the process of squishing a single module.
# Where is File::Spec on our system?
use Class::Inspector;
my $filename = Class::Inspector->resolved_filename( 'File::Spec' );
# Load File::Spec as a document
use PPI;
my $Document = PPI::Document->new( $filename );
Everything you do with PPI starts and finishes with PPI::Document objects. If you find yourself using the lexer directly, you are probably doing something wrong.
Where can I start cutting out the fat? For starters, many core modules
have an __END__ section.
# Get the (one and only) __END__ section
my $End = $Document->find_first( 'Statement::End' );
# Delete it from the document
$End->delete if $End;
PPI provides a set of search methods that you can use on any element that
has children. find_first is a safe guess, because there can only
be one __END__ section. The search methods actually take
&wanted functions like File::Find, so
'Statement::End' is really syntactic sugar for:
sub wanted {
my ($Document, $Element) = @_;
$Element->isa('PPI::Statement::End');
}
Of course, there's a faster way to do the same thing. The
prune method finds and immediately deletes all elements that match
a particular condition.
# Delete all comments and POD
$Document->prune( 'Token::Pod' );
$Document->prune( 'Token::Comment' );
For a more serious example, here's how to strip the non-compulsory braces
from ->method():
# Remove useless braces
$Document->prune( sub {
my $Braces = $_[1];
$Braces->isa('PPI::Structure::List') or return '';
$Braces->children == 0 or return '';
my $Method = $Braces->sprevious_sibling or return '';
$Method->isa('PPI::Token::Word') or return '';
$Method->content !~ /:/ or return '';
my $Operator = $Method->sprevious_sibling or return '';
$Operator->isa('PPI::Token::Operator') or return '';
$Operator->content eq '->' or return '';
return 1;
} );
It's a little bit wordy, but is relatively straightforward to write. Just add conditions and discard as you go. You can get other elements, calculate anything or call sub-searches.
When you have finished, be sure to save the file.
# Save the file
$Document->save( "$filename.squish" );
Wrapping It All Up
All you need to do now is is wrap it all up in some typical module boilerplate.
package Perl::Squish;
use strict;
use PPI;
our $VERSION = '0.01';
# Squish a file in place
# Perl::Squish->file( $filename )
sub file {
my ($class, $file) = @_;
my $Document = PPI::Document->new( $file ) or return undef;
$class->document( $Document ) or return undef;
$Document->save( $file );
}
# Squish a document object
# Perl::Squish->document( $Document );
sub document {
my ($squish, $Document) = @_;
# Remove the stuff we did earlier
$Document->prune('Statement::End');
$Document->prune('Token::Comment');
$Document->prune('Token::Pod');
$Document->prune( sub {
my $Braces = $_[1];
$Braces->isa('PPI::Structure::List') or return '';
$Braces->elements == 0 or return '';
my $Method = $Braces->sprevious_sibling or return '';
$Method->isa('PPI::Token::Word') or return '';
$Method->content !~ /:/ or return '';
my $Operator = $Method->sprevious_sibling or return '';
$Operator->isa('PPI::Token::Operator') or return '';
$Operator->content eq '->' or return '';
return 1;
} );
# Let's also do some whitespace cleanup
my @whitespace = $Document->find('Token::Whitespace');
foreach ( @whitespace ) {
$_->{content} = $_->{content} =~ /\n/ ? "\n" : " ";
}
1;
}
1;
Finally, the last step is to wrap it all up as a proper module. You can see the finished product prettied up with PPI's syntax highlighter at CPAN::Squish. I've added a few additional small features to the basic code described above, but you get the idea. See also Perl::Squish for more details.
In 15 minutes, I've knocked together a pretty simple module that dramatically improves on what you could do without something like PPI. Now imagine the hard things it makes possible.
You must be logged in to the O'Reilly Network to post a talkback.
Showing messages 1 through 19 of 19.
- Python Editor?
2005-06-13 06:58:06 douglascalvert [Reply]
What is the python editor that you speak of?- Python Editor?
2005-06-13 21:34:36 AdamKennedy [Reply]
I'm afraid I don't have a name for you.
When I was writing this article and discussing IntelliJ with various people, I had a couple of them mention that "you can get something similar for Python now".
That's all the information I have.
However, given python's very simplistic syntax, compared to Perl, it would not surprise me in the least if one exists.- Python Editor?
2005-06-14 23:42:35 StefanHanke [Reply]
Eric?
The screenshots show promise.- Python Editor?
2005-06-15 21:14:18 YPERL [Reply]
Thea beast is called BOA:
http://boa-constructor.sourceforge.net/
- Python Editor?
- Python Editor?
- Python Editor?
- python editor?
2005-06-13 09:43:24 ksnkjj [Reply]
I am also curious about the python editor you mentioned- python editor?
2005-06-15 21:12:24 YPERL [Reply]
The beast is called BOA :
http://boa-constructor.sourceforge.net/
- python editor?
- I too have dreamed of IntelliPerl
2005-06-13 13:51:48 MaxDemian [Reply]
I also used java for a large project and stumbled onto IntelliJ IDEA. It has become the standard that all later IDE's are compared to.
I currently use Komodo as my perl IDE, and they have made strides in that direction but i'm hoping that someone in the OSS world will take this module and run with it to create a plugin to eclipse or scite or something or anything to rival IntelliJ.
Hell, if Komodo ever gets functionality like IntelliJ's they can double their price and I'll still buy it!- I too have dreamed of IntelliPerl
2005-06-13 21:39:49 AdamKennedy [Reply]
A couple of Perl editor authors have written to me since this article came out. In particular an EPIC contributor and the PCE perl editor ( http://proton-ce.sourceforge.net/site/en/news.shtml ( so there may be some movement in this regard now.- I too have dreamed of IntelliPerl
2005-06-14 14:49:34 MaxDemian [Reply]
That's great, e-p-i-c is coming along nicely already and PCE I had been looking at when i periodically look through the latest developments in perl IDE's. I'll have to look into contributing to one of these projects.
- I too have dreamed of IntelliPerl
2005-06-14 23:38:46 ADB [Reply]
What do you think of Active State's Visual Perl?- I too have dreamed of IntelliPerl
2005-09-22 03:05:25 AdamKennedy [Reply]
I'm afraid that would seem to require a license of Visual Studio, something I don't have. But I have spoken to it's author, and I believe it's at roughly a similar point to Komodo.
- I too have dreamed of IntelliPerl
- I too have dreamed of IntelliPerl
2005-06-16 03:58:15 AdamKennedy [Reply]
And to show how fast things can move, the following is a screen shot of my private version of PCE, hacked up to include support for the new PPI::Transform refactoring API.
http://ali.as/photos/IntelliPerl.png
As you can see, now I can Perl::Squish my documents in the comfort of an editor! :)
- I too have dreamed of IntelliPerl
2005-07-09 05:53:59 Algieba [Reply]
I just wonder why nobody here mentioned emacs (and its cperl-mode)? Is it the case that everybody with emacs are already happy with its perl support (as I do) or somebody already working on its integration with PPI?- I too have dreamed of IntelliPerl
2005-07-09 06:00:12 AdamKennedy [Reply]
From what I can see cperl is mainly about syntax highlighting and similar things. PPI and many of the creations based on it, will be for working at a higher level. Tab completion, autodoc, various things.
That said, I would imagine that someone could fairly easily create a console-launched emacs adapter than did transformations and fed data back and forth via pipes.
- Time to do your homework
2005-09-21 23:41:38 tactfulcactus [Reply]
You have implemented a stand-alone Perl parser to get good editor support, and you haven't even looked properly at Emacs cperl...?
Hint: It's not "mainly" about syntax highlighting; it certainly does tab completion and "various things" as well. I don't know what exactly you mean by "autodoc" but it certainly has support for POD.
I can understand that some people have a bit of a bias against Emacs; there is definitely a bit of a learning curve and a kind of "lock in", but it's lock-in of the good kind -- you simply don't want anything else once you get accustomed to how it works.- Time to do your homework
2005-09-22 03:01:28 AdamKennedy [Reply]
Well, I implemented a Perl parser to do a number of things, one of the most recent of which is editors.
By autodoc, I mean something that reads your code, understands the structure, and can generate useful documentation automatically, in addition to the POD.
Or if you don't have POD, use it to let the editor automatically generate all the boilerplate POD for existing code, or apply different transforms or metrics.
For example transforming code to bring it in line with Perl Best Practises rules (if that's your thing) or determining what version of perl you need for some bit of code, or any number of things that go far beyond what cperl does today, which according to #perl cperl users is "mostly about syntax highlighting and indenting" and according to purl is "about guessing (badly) the syntax so that indenting and stuff works".
- Time to do your homework
- Time to do your homework
- I too have dreamed of IntelliPerl
- I too have dreamed of IntelliPerl
- I too have dreamed of IntelliPerl
2005-09-22 03:03:54 AdamKennedy [Reply]
To follow this up there's been some recent talk about integrating PPI and related functionality into Komodo. While it might take a little while, it's looking like it will happen at some point.
- I too have dreamed of IntelliPerl
- Jetbrains MPS (IP)
2005-06-20 20:16:08 sozin [Reply]
The jetbrains guys are working on some very interesting stuff -- check out http://www.jetbrains.com/mps/. When Sergey says he's working on something that will competely change the way programmers work forever .. he isn't kidding.
- IDEA language OpenAPI ?
2005-07-23 22:20:30 Dash [Reply]
I can comiserate with the Adam. I have used IDEA for 2 years but switched to perl coding recently. I miss IDEA's features every single day since then.
In IDEA 5.0 they have added OpenAPI to use IDEA's capabilities on other languages. It would be cool PPI was used to add a perl extension to that IDE.



