More Lightning Articles
by Mark Leighton Fisher, chromatic, Shlomi Fish, Bob DuCharme
March 31, 2005
Customizing Emacs with Perl
by Bob DuCharme
Over time, I've accumulated a list of Emacs customizations I wanted to implement when I got the chance. For example, I'd like macros to perform certain global replaces just within a marked block, and I'd like a macro to reformat an Outlook-formatted date as an ISO 8601 date. I'm not overly intimidated by the elisp language used to customize Emacs behavior: I've copied elisp code and modified it to make some tweaks before, I had a healthy dose of Scheme and LISP programming in school, and I've done extensive work with XSLT, a descendant of these grand old languages. Still, as with a lot of postponed editor customization work, I knew I'd have to use these macros many, many times before they earned back the time invested in creating them, because I wasn't that familiar with string manipulation and other basic operations in a LISP-based language. I kept thinking to myself, "This would be so easy if I could just do the string manipulation in Perl!"
Then, I figured out how I could write Emacs functions that called Perl to
operate on a marked block (or, in Emacs parlance, a "region"). Many Emacs users
are familiar with the Escape+| keystroke, which invokes the
shell-command-on-region function. It brings up a prompt in the
minibuffer where you enter the command to run on the marked region, and after
you press the Enter key Emacs puts the command's output in the minibuffer if it
will fit, or into a new "*Shell Command Output*" buffer if not. For example,
after you mark part of an HTML file you're editing as the region, pressing
Escape+| and entering wc (for "word count") at the
minibuffer's "Shell command on region:" prompt will feed the text to this
command line utility if you have it in your path, and then display the number of
lines, words, and characters in the region in the minibuffer. If you enter
sort at the same prompt, Emacs will run that command instead of
wc and display the result in a buffer.
Entering perl /some/path/foo.pl at the same prompt will run the
named Perl script on the marked region and display the output appropriately.
This may seem like a lot of keystrokes if you just want to do a global replace
in a few paragraphs, but remember: Escape+| calls Emacs's built-in
shell-command-on-region function, and you can call this same
function from a new function that you define yourself. My recent great
discovery was that along with parameters identifying the region boundaries and
the command to run on the region, shell-command-on-region takes an
optional parameter that lets you tell it to replace the input region with the
output region. When you're editing a document with Emacs, this allows you to
pass a marked region outside of Emacs to a Perl script, let the Perl script do
whatever you like to the text, and then Emacs will replace the original text
with the processed version. (If your Perl script mangled the text, Emacs'
excellent undo command can come to the rescue.)
Consider an example. When I take notes about a project at work, I might
write that Joe R. sent an e-mail telling me that a certain system won't need any
revisions to handle the new data. I want to make a note of when he told me
this, so I copy and paste the date from the e-mail he sent. We use Microsoft
Outlook at work, and the dates have a format following the model "Tue 2/22/2005
6:05 PM". I already have an Emacs macro bound to alt+d to insert
the current date and time (also handy when taking notes) and I wanted the date
format that refers to e-mails to be the same format as the ones inserted with
my alt+d macro: an ISO 8601 format of the form
"2005-02-22T18:05".
The .emacs startup file holds customized functions that you want available during your Emacs session. The following shows a bit of code that I put in mine so that I could convert these dates:
(defun OLDate2ISO ()
  (interactive)
  (shell-command-on-region (point) (mark)
                           "perl c:/util/OLDate2ISO.pl" nil t))
The (interactive) declaration tells Emacs that the function
being defined can be invoked interactively as a command. For example, I can
enter "OLDate2ISO" at the Emacs minibuffer command prompt, or I can press a
keystroke or select a menu choice bound to this function. The
point and mark functions are built into Emacs to
identify the boundaries of the currently marked region, so they're handy for
the first and second arguments to shell-command-on-region, which
tell it which text is the region to act on. The third argument is the actual
command to execute on the region; enter any command available on your operating
system that can accept standard input. To define your own Emacs functions that
call Perl scripts, just change the function name from
OLDate2ISO to anything you like and then change this third
argument to shell-command-on-region to call your own Perl
script.
Leave the last two arguments as nil and t. Don't
worry about the fourth parameter, which controls the buffer where the shell
output appears. (Setting it to nil means "don't bother.") The
fifth parameter is the key to the whole trick: when non-nil, it tells Emacs to
replace the marked text in the editing buffer with the output of the command
described in the third argument instead of sending the output to a buffer.
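The same five arguments can be wrapped once and reused for any script. The following is only a sketch under a few assumptions: the name my-perl-on-region is hypothetical (it is not part of Emacs or of my original setup), perl is assumed to be on your path, and region-beginning and region-end stand in for point and mark so that the boundaries are always passed in order.

```elisp
;; Hypothetical helper, for illustration only: filter the region through
;; a Perl script and replace the region with the script's output.
(defun my-perl-on-region (script)
  "Run the Perl SCRIPT on the region and replace the region with its output."
  ;; "f" prompts in the minibuffer for the name of an existing file.
  (interactive "fPerl script: ")
  (shell-command-on-region
   (region-beginning)                 ; first argument: start of the region
   (region-end)                       ; second argument: end of the region
   (concat "perl "                    ; third argument: the command to run
           (shell-quote-argument (expand-file-name script)))
   nil                                ; fourth argument: default output buffer
   t))                                ; fifth argument: replace region with output
```

With this in your .emacs file, pressing Escape+x and entering "my-perl-on-region" prompts for a script name and then runs that script on the marked region. A dedicated function per script, like the OLDate2ISO one above, is still the better fit when you want to bind one specific conversion to a key.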
If you're familiar with Perl, there's nothing particularly interesting about the OLDate2ISO.pl script. It does some regular expression matching to split up the string, converts the time to a 24 hour clock, and rearranges the pieces:
# Convert Outlook format date to ISO 8601 date
# (e.g. Wed 2/16/2005 5:27 PM to 2005-02-16T17:27)
use strict;
use warnings;

while (<>) {
    if (m{\w+ (\d+)/(\d+)/(\d{4}) (\d+):(\d+) ([AP])M}) {
        my ($month, $day, $year, $hour, $minutes, $AorP) = ($1, $2, $3, $4, $5, $6);
        # 12-hour to 24-hour clock: 12 AM becomes 00, 12 PM stays 12.
        $hour %= 12;
        $hour += 12 if $AorP eq 'P';
        # Zero-pad month, day, and hour to two digits.
        $_ = sprintf("%04d-%02d-%02dT%02d:%02d",
                     $year, $month, $day, $hour, $minutes);
    }
    print;
}
When you start up Emacs with a function definition like the defun
OLDate2ISO one shown above in your .emacs file, the function is
available to you like any other in Emacs. Press Escape+x to bring
up the Emacs minibuffer command line and enter "OLDate2ISO" there to execute it
on the currently marked region. Like any other interactive command, you can
also assign it to a keystroke or a menu choice.
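As a rough illustration of that last step, the lines below bind the function to a key and add it to the Tools menu. The key C-c d and the menu label are my own assumptions here, not the bindings I actually use; pick whatever is free in your configuration.

```elisp
;; Assumed key choice: C-c followed by a letter is the range Emacs
;; reserves for user-defined bindings.
(global-set-key (kbd "C-c d") 'OLDate2ISO)

;; Assumed menu placement and label: add an entry under the Tools menu.
(define-key global-map [menu-bar tools oldate2iso]
  '("Outlook Date to ISO" . OLDate2ISO))
```

Both forms go in the .emacs file after the defun, so that the function already exists when the binding is made.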
There might be a more efficient way to do the Perl coding shown above, but I didn't spend too much time on it. That's the beauty of it: with five minutes of Perl coding and one minute of elisp coding, I had a new menu choice to quickly do the transformation I had always wished for.
Another example of something I always wanted is the following txt2htmlp.pl script, which is useful after plugging a few paragraphs of plain text into an HTML document:
# Turn lines of plain text into HTML p elements.
while (<>) {
    # Remove the trailing newline (chomp leaves other characters alone).
    chomp;
    # Turn ampersands and < into entity references.
    s/&/&amp;/g;
    s/</&lt;/g;
    # Wrap each non-blank line in a "p" element.
    print "<p>$_</p>\n\n" unless /^\s*$/;
}
Again, it's not a particularly innovative Perl script, but with the following bit of elisp in my .emacs file, I have something that greatly speeds up the addition of hastily written notes into a web page, especially when I create an Emacs menu choice to call this function:
(defun txt2htmlp ()
  (interactive)
  (shell-command-on-region (point) (mark)
                           "perl c:/util/txt2htmlp.pl" nil t))
Sometimes when I hear about hot new editors, I wonder whether they'll ever take the place of Emacs in my daily routine. Now that I can so easily add the power of Perl to my use of Emacs, it's going to be a lot more difficult for any other editor to compete with Emacs on my computer.





