Where Wizards Fear To Tread
by Artur Bergman
Altering Perl's behavior to be thread-safe, ex::threads::cwd
Some things change when you use threads; things that you or a module do may no longer behave the way they used to. Most of the changes are due to the way your operating system treats processes that use threads. Each process typically has a set of attributes, which include the current working directory, the environment table, the signal subsystem and the pid. Since threads are multiple paths of execution inside a single process, the operating system treats them as a single process and you have a single set of these attributes.
Yep. That's right - if you change the current working directory in one thread, it will also change in all the other threads! Whoops, better start using absolute paths everywhere, except that all the code that uses your module might still use relative paths. Aaargh...
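To see the problem for yourself, here is a minimal sketch. It assumes a Unix-like system and a perl built with ithreads; the scratch directory comes from File::Temp and is only there for the demonstration. On Windows, perl emulates a separate cwd per interpreter, so the result there may differ.

```perl
use strict;
use warnings;
use Config;

BEGIN {
    # threads.pm refuses to load on a perl built without ithreads.
    unless ($Config{useithreads}) {
        print "perl built without ithreads, nothing to demonstrate\n";
        exit 0;
    }
}

use threads;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $start  = getcwd();
my $target = tempdir(CLEANUP => 1);    # scratch directory for the demo

# The child thread changes directory and exits; the parent never calls chdir.
threads->create(sub { chdir $target or die "chdir failed: $!" })->join;

my $now = getcwd();
print "parent started in: $start\n";
print "parent is now in:  $now\n";

# Go back so the scratch directory can be cleaned up at exit.
chdir $start or die "cannot return to $start: $!";
```

The parent ends up in the directory the child picked, even though the parent's own code never asked for that.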
Don't worry, this is a solvable problem. In fact, it's solvable by a module.
Perl allows us to override functions using the CORE::GLOBAL namespace.
This will let us override the functions that deal with paths and set the
cwd correctly before issuing the command. So let's start off:
package ex::threads::safecwd;
use 5.008;
use strict;
use warnings;
use threads::shared;
our $VERSION = '0.01';
Nothing weird here, right? Now, when changing and dealing with the
current working directory one often uses the Cwd module, so let us make
the Cwd module safe first. How do we do that?
1) use Cwd;
2) our $cwd = cwd; #our per thread cwd, init on startup from cwd
3) our $cwd_mutex : shared; # the variable we use to sync
4) our $Cwd_cwd = \&Cwd::cwd;
5) *Cwd::cwd = *my_cwd;
sub my_cwd {
6) lock($cwd_mutex);
7) CORE::chdir($cwd);
8) $Cwd_cwd->(@_);
}
What's going on here? Let's analyze it line by line:
1) We include Cwd.
2) We declare a variable and assign to it the cwd we start in. This variable is not shared between threads and holds the cwd of this thread.
3) We declare a shared variable that we will lock to synchronize work.
4) Here we take a reference to &Cwd::cwd and store it in $Cwd_cwd.
5) Now we hijack Cwd::cwd and assign our own my_cwd to it, so whenever someone calls Cwd::cwd, my_cwd is called instead.
6) my_cwd starts off by locking $cwd_mutex so no one else will muck around with the cwd.
7) After that we call CORE::chdir() to actually set the cwd to what this thread expects it to be.
8) And we round off by calling the original Cwd::cwd that we stored in step 4, with any parameters that were handed to us.
In effect we have hijacked Cwd::cwd and wrapped it in a lock
and a chdir so it will report the correct thing!
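The same save-then-replace trick can be tried in isolation. The sketch below uses a made-up package, Demo::Greeter, standing in for Cwd, and upper-casing standing in for the lock and chdir; none of these names come from the module itself.

```perl
use strict;
use warnings;

package Demo::Greeter;    # hypothetical package standing in for Cwd
sub greet { return "hello, $_[0]" }

package main;

# Keep a reference to the original before replacing the symbol-table entry.
my $original = \&Demo::Greeter::greet;

{
    no warnings 'redefine';    # we are replacing an existing sub on purpose
    *Demo::Greeter::greet = sub {
        # Extra work would go here (the article locks and chdirs),
        # then we delegate to the original with the caller's arguments.
        my $result = $original->(@_);
        return uc $result;
    };
}

print Demo::Greeter::greet("world"), "\n";    # prints "HELLO, WORLD"
```

Assigning a code reference to the glob replaces only the subroutine slot, whereas assigning one glob to another, as in `*Cwd::cwd = *my_cwd`, aliases every slot of that name. Either form redirects the function call.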
Now that cwd() is fixed, we need a way to actually change the
directory. To do this, we install our own global chdir, like
this:
*CORE::GLOBAL::chdir = sub {
    lock($cwd_mutex);
    CORE::chdir($_[0]) || return undef;
    $cwd = $Cwd_cwd->();
};
Now, whenever someone calls chdir(), our chdir will be called
instead. In it we start by locking the variable controlling access,
then we try to chdir to the directory to see if it is possible.
If it is not, we do what the real chdir would do and return undef. If it
succeeds, we assign the new value to our per-thread $cwd by calling the
original Cwd::cwd().
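The CORE::GLOBAL mechanism can also be tried without any threads. The following is a small sketch in which the override only records what it was asked for before handing over to the real builtin; @chdir_log is a name invented for the demo.

```perl
use strict;
use warnings;
use File::Spec;

our @chdir_log;    # hypothetical log of requested directories, demo only

BEGIN {
    # The override has to be in place before the code that calls chdir()
    # is compiled, hence the BEGIN block. A module loaded with `use`
    # gets the same timing for free.
    *CORE::GLOBAL::chdir = sub {
        push @chdir_log, $_[0];
        return CORE::chdir($_[0]);    # CORE:: always reaches the real builtin
    };
}

my $tmp = File::Spec->tmpdir;
chdir($tmp) or die "chdir to $tmp failed: $!";    # goes through the override

print "intercepted: @chdir_log\n";
```

The module's chdir does the same kind of interception, with the lock and the per-thread $cwd bookkeeping added.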
The module code so far is actually enough to allow the following to work:
use threads;
use ex::threads::safecwd;
use Cwd;
chdir("/tmp");
threads->create(sub { chdir("/usr") } )->join();
print cwd() eq '/tmp' ? "ok" : "nok";
This works because the chdir("/usr") inside the thread does not affect
the other thread's $cwd variable. When cwd is called, we take the
lock, chdir() to the location that thread's $cwd contains and perform a
cwd().
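The whole scheme rests on the difference between ordinary and shared variables. Here is a minimal sketch of that difference, assuming a perl built with ithreads; the variable names are made up for the demo.

```perl
use strict;
use warnings;
use Config;

BEGIN {
    # threads.pm refuses to load on a perl built without ithreads.
    unless ($Config{useithreads}) {
        print "perl built without ithreads, nothing to demonstrate\n";
        exit 0;
    }
}

use threads;
use threads::shared;

our $private = "parent value";    # each new thread gets its own copy
our $common : shared;             # one value visible to every thread
$common = "parent value";

threads->create(sub {
    $private = "set by child";    # changes only the child's copy
    lock($common);                # held until the enclosing block ends
    $common = "set by child";     # changes the single shared value
})->join;

print "private: $private\n";    # prints "private: parent value"
print "common:  $common\n";     # prints "common:  set by child"
```

$cwd plays the role of $private here and $cwd_mutex the role of $common: every thread remembers its own directory, and all of them queue up on the same lock.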
While this is useful, we need to move along and provide some more functions to extend the functionality of this module.
*CORE::GLOBAL::mkdir = sub {
    lock($cwd_mutex);
    CORE::chdir($cwd);
    if(@_ > 1) {
        CORE::mkdir($_[0], $_[1]);
    } else {
        CORE::mkdir($_[0]);
    }
};
*CORE::GLOBAL::rmdir = sub {
    lock($cwd_mutex);
    CORE::chdir($cwd);
    CORE::rmdir($_[0]);
};
The above snippet does essentially the same thing for both mkdir and
rmdir. We lock the $cwd_mutex to synchronize access, then we chdir to
$cwd and finally perform the action. Worth noticing here is the
argument-count check for mkdir, which preserves its prototype behavior:
the optional mode is passed on only when the caller supplied one.
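To see why the argument count matters, compare a wrapper that checks it with one that always forwards two arguments. This is a sketch assuming POSIX-style permissions; both helper names are invented for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $base = tempdir(CLEANUP => 1);    # scratch area for the demo

# Hypothetical wrapper that forwards the mode only when one was given.
sub forwarding_mkdir {
    return @_ > 1 ? CORE::mkdir($_[0], $_[1]) : CORE::mkdir($_[0]);
}

# Hypothetical wrapper that always forwards two arguments.
sub naive_mkdir {
    no warnings 'uninitialized';
    return CORE::mkdir($_[0], $_[1]);    # a missing mode is passed as undef, i.e. 0
}

forwarding_mkdir("$base/default") or die "mkdir failed: $!";
naive_mkdir("$base/undef_mode")   or die "mkdir failed: $!";

my $mode_default = (stat "$base/default")[2]    & 0777;
my $mode_undef   = (stat "$base/undef_mode")[2] & 0777;

printf "mode forwarded only if given: %04o\n", $mode_default;
printf "undef forwarded as the mode:  %04o\n", $mode_undef;

rmdir "$base/default";
rmdir "$base/undef_mode";
```

With the naive version the directory is created with no permission bits at all, which is rarely what the caller of a one-argument mkdir had in mind.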
Let's move on to opendir, open, readlink, readpipe,
require, rmdir, stat, symlink, system and unlink. None
of these is really any different from the above, with the big exception
of open. open is a bit of a special case, since its first argument can be
either a HANDLE or an empty scalar for autovivification of an anonymous
handle.
*CORE::GLOBAL::open = sub (*;$@) {
    lock($cwd_mutex);
    CORE::chdir($cwd);
    if(defined($_[0])) {
        use Symbol qw();
        my $handle = Symbol::qualify($_[0],(caller)[0]);
        no strict 'refs';
        if(@_ == 1) {
            return CORE::open($handle);
        } elsif(@_ == 2) {
            return CORE::open($handle, $_[1]);
        } else {
            return CORE::open($handle, $_[1], @_[2..$#_]);
        }
    }
Starting off with the usual lock and chdir(), we then need to check if
the first value is defined. If it is, we have to qualify it to the
caller's namespace. This is the case when a user does
open FOO, "+>foo.txt". If the user instead does
open main::FOO, "+>foo.txt",
then Symbol::qualify notices that the handle is already qualified and
returns it unmodified. Now, since $_[0] is a read-only alias, we cannot
assign to it, so we need to create a temporary variable and then
proceed as usual.
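What Symbol::qualify returns is easy to check on its own. In this small sketch My::Module is a made-up calling package, used where the override passes (caller)[0].

```perl
use strict;
use warnings;
use Symbol ();

# "My::Module" is a hypothetical calling package, for illustration only.
my $bare      = Symbol::qualify('FOO',       'My::Module');
my $qualified = Symbol::qualify('main::FOO', 'My::Module');
my $special   = Symbol::qualify('STDERR',    'My::Module');

print "$bare\n";         # prints "My::Module::FOO"
print "$qualified\n";    # prints "main::FOO"
print "$special\n";      # prints "main::STDERR"
```

The last case shows that the standard handles are always forced into main, whichever package the call comes from.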
Now if the user used the new style open my $foo, "+>foo.txt", we need
to treat it differently. The following code will do the trick and
complete the function.
    else {
        if(@_ == 1) {
            return CORE::open($_[0]);
        } elsif(@_ == 2) {
            return CORE::open($_[0], $_[1]);
        } else {
            return CORE::open($_[0], $_[1], @_[2..$#_]);
        }
    }
};
Wonder why we couldn't just assign $_[0] to $handle and unify the code
path? You see, $_[0] is an alias to the $foo in
open my $foo, "+>foo.txt", so CORE::open will work correctly.
However, if we do $handle = $_[0], we take a copy of the undefined
variable and CORE::open won't do what I mean.
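The difference between the alias and the copy can be shown with two tiny wrappers. This is only a sketch; both subs and the scratch file are made up for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file to open, created only for this demonstration.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Hypothetical wrapper that passes the caller's variable straight through.
sub open_via_alias {
    return CORE::open($_[0], '<', $_[1]);
}

# Hypothetical wrapper that copies the (undefined) handle variable first.
sub open_via_copy {
    my $handle = $_[0];
    return CORE::open($handle, '<', $_[1]);    # fills the copy, not the caller's variable
}

open_via_alias(my $aliased, $path) or die "open failed: $!";
open_via_copy(my $copied, $path)   or die "open failed: $!";

print "alias: ", (defined $aliased ? "caller got a handle" : "still undef"), "\n";
print "copy:  ", (defined $copied  ? "caller got a handle" : "still undef"), "\n";
```

Both calls to CORE::open succeed, but only the aliasing wrapper leaves a usable handle in the caller's variable. The copy's handle is closed again as soon as the wrapper returns.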
So now we have a module that allows you to safely use relative paths in most cases and vastly improves your ability to port code to a threaded environment. The price we pay for this is speed, since every time you do an operation involving a directory you are serializing your program. Typically, you never do those kinds of operations in a hot path anyway. You might do work on your file in a hot path, but as soon as we have gotten the filehandle no more locking is done.
A couple of problems remain. Performance-wise, there is one big problem
with system(), since we don't get control back until
CORE::system() returns, so all path operations will hang waiting for
that. To solve that we would need to resort to XS and do some magic with
regard to the system call. We also haven't been able to override the
file test operators (-x and friends), nor can we do anything about
qx {}. Solving that problem requires working up and down the optree
using B::Generate and B::Utils. Perhaps a future version of the
module will attempt that together with a custom op to do the locking.
Conclusion
Threads in Perl are simple and straightforward; as long as we stay in pure-Perl land, everything behaves just about how we would expect it to. Converting your modules should be a simple matter of programming without any big wizardly things to be done. The important thing to remember is to think about how your module could possibly take advantage of threads to make it easier to use for the programmer.
Moving over to XS land is altogether different; stay tuned for the next article, which will take us through the pitfalls of converting various kinds of XS modules to thread-safe and thread-friendly levels.

