A Timely Start
by Jean-Louis Leroy
Making Perl Faster
Still pushing, I wrote a script that consolidates @INC. It creates a directory tree containing the union of all of the directory trees found in @INC and then populates it with symlinks to the .pm files. I could then replace the lengthy PERL5LIB with one that contained just one directory. Here's the resulting dependency listing:
ler@cougar: cm_perl -I lib/MAP/CONFIG -c -MDevel::Dependencies=origin ftp.pl
Devel::Dependencies 23 dependencies:
Carp.pm lib/MAP/CONFIG/Carp.pm (2)
Config.pm lib/MAP/CONFIG/PA-RISC2.0/Config.pm (1)
Errno.pm lib/MAP/CONFIG/PA-RISC2.0/Errno.pm (1)
Exporter.pm lib/MAP/CONFIG/Exporter.pm (2)
Exporter/Heavy.pm lib/MAP/CONFIG/Exporter/Heavy.pm (2)
IO.pm lib/MAP/CONFIG/PA-RISC2.0/IO.pm (1)
IO/Handle.pm lib/MAP/CONFIG/PA-RISC2.0/IO/Handle.pm (1)
IO/Socket.pm lib/MAP/CONFIG/PA-RISC2.0/IO/Socket.pm (1)
IO/Socket/INET.pm lib/MAP/CONFIG/IO/Socket/INET.pm (2)
IO/Socket/UNIX.pm lib/MAP/CONFIG/IO/Socket/UNIX.pm (2)
Net/Cmd.pm lib/MAP/CONFIG/Net/Cmd.pm (2)
Net/Config.pm lib/MAP/CONFIG/Net/Config.pm (2)
Net/FTP.pm lib/MAP/CONFIG/Net/FTP.pm (2)
SelectSaver.pm lib/MAP/CONFIG/SelectSaver.pm (2)
Socket.pm lib/MAP/CONFIG/PA-RISC2.0/Socket.pm (1)
Symbol.pm lib/MAP/CONFIG/Symbol.pm (2)
Time/Local.pm lib/MAP/CONFIG/Time/Local.pm (2)
XSLoader.pm lib/MAP/CONFIG/PA-RISC2.0/XSLoader.pm (1)
lib/MAP/CONFIG/Net/libnet.cfg lib/MAP/CONFIG/Net/libnet.cfg (2)
strict.pm lib/MAP/CONFIG/strict.pm (2)
vars.pm lib/MAP/CONFIG/vars.pm (2)
warnings.pm lib/MAP/CONFIG/warnings.pm (2)
warnings/register.pm lib/MAP/CONFIG/warnings/register.pm (2)
Total directory searches: 39
ftp.pl syntax OK
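The figure in parentheses looks like the position in @INC of the directory where each file was found (an inference from the listing, not something the article states). On that reading, 7 files found at position 1 and 16 found at position 2 give 7 + 2 × 16 = 39, which matches the reported total. A minimal sketch of that bookkeeping follows; `search_depth` is a hypothetical helper, not part of Devel::Dependencies.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the 1-based position in @$inc of the first
# directory that contains $file, or 0 if no directory does.
# Code references in @INC (hooks) are skipped and not counted.
sub search_depth {
    my ($file, $inc) = @_;
    my $depth = 0;
    for my $dir (@$inc) {
        next if ref $dir;
        $depth++;
        return $depth if -e File::Spec->catfile($dir, $file);
    }
    return 0;
}

# Report every file loaded so far, and sum the positions.
my $total = 0;
for my $file (sort keys %INC) {
    my $depth = search_depth($file, \@INC);
    $total += $depth;
    printf "%-30s %s (%d)\n", $file, $INC{$file} // '(not loaded)', $depth;
}
print "Total directory searches: $total\n";
```

The sum is only an estimate of the work Perl did, since it is computed after the fact from the final @INC rather than recorded while modules load.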
Why does Perl find Carp.pm in the second directory, considering Perl should search the directory passed via -I first? perl -V gives the answer:
(extract)
@INC:
lib/MAP/CONFIG/PA-RISC2.0
lib/MAP/CONFIG
XxB
/build/LIB/UTILS!11.162/ftpstuff/PA-RISC2.0
/build/LIB/UTILS!11.162/ftpstuff
/build/LIB/UTILS!11.162/alien/PA-RISC2.0
Under some circumstances, Perl adds architecture-specific paths to @INC; for more information on this, see the description of PERL5LIB in the perlrun manpage.
Finally I timed the ftp.pl program twice: with the normal PERL5LIB and with the consolidated PERL5LIB. Here are the results (u stands for "user time," s for "system time," and u+s is the sum; times are in seconds):
Running ft_ftp.pl..
47-element PERL5LIB : u: 0.07 s: 0.20 u+s: 0.27
Consolidated PERL5LIB: u: 0.05 s: 0.04 u+s: 0.09
Therefore, I recommended incorporating the consolidation script as part of the process that builds the various systems.
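The consolidation step described above could be sketched roughly as follows. This is not the author's script: `consolidate` and its arguments are hypothetical names, it links only .pm files as the text describes, and it flattens everything into one tree instead of keeping a separate architecture-specific subdirectory.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Find;
use File::Path qw(make_path);
use File::Spec;

# Hypothetical sketch: build $target as the union of the trees in @dirs,
# with one symlink per .pm file. Earlier directories win, which mirrors
# the order in which Perl searches @INC. Returns the number of links made.
sub consolidate {
    my ($target, @dirs) = @_;
    my $linked = 0;
    for my $dir (grep { !ref($_) && -d $_ } @dirs) {
        my $root = File::Spec->rel2abs($dir);
        find({
            no_chdir => 1,            # keep $_ as the full path of each entry
            wanted   => sub {
                return unless -f $_ && /\.pm\z/;
                my $rel  = File::Spec->abs2rel($_, $root);
                my $link = File::Spec->catfile($target, $rel);
                return if -e $link || -l $link;   # an earlier directory already supplied it
                make_path(dirname($link));
                symlink($_, $link) or die "symlink $link: $!";
                $linked++;
            },
        }, $root);
    }
    return $linked;
}

# Example (assumed layout): consolidate('lib/consolidated', @INC);
```

A real version would probably need to link more than .pm files: the listing above shows Net/libnet.cfg being loaded from the consolidated tree, and XS modules look for their shared objects under auto/ in @INC.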
Conclusion
It may seem silly to have a PERL5LIB that contains 47 directories. On the other hand, that kind of situation naturally arises once you try to use Perl in complex developments such as the Agency's. After all, Perl "is a real programming language," we like to say, so why can't it do what C++ or Ada can do?
I still think that we need a Perl compiler. The problem is not the length of PERL5LIB; it's the fact that Perl processes it each time it runs the script. My workaround, in effect, amounts to "compiling" a fast Perl lib.
- Why no compiler?
2006-07-21 12:39:56 agarsha [Reply]
I never understand/stood why there is so much resistance to the compiler idea. I used to have to distribute my code to perl-naive customers. It would have been great to give them an executable... and not have to worry about their perl version, path mods, custom tinkering, whatever... and if they had to install a module... look out... some enterprise *nixes don't come with a free C-compiler and not all admins are skilled. Some/Many admins are more like operators-of-old.
I solved my distribution scenario with code of course... "just run this". But it would have saved me a lot of time (devel/QA/debug/another-case) and hassle... not to have had to worry about this... and just give 'em an executable and short set of instructions for "installation".
ActiveState has the right idea with their compiler. But, when it is perl... hard to get customers or management to spend money. If there were a compiler that was part of the default/standard Perl installation... rock-on.
Perl is the best language... because I know it the best... :) But seriously, what puts Perl ahead in my mind is its maturity. The one thing I always wished for was a truly functional "compiler"-that-works-in-an-enterprise-way-and-is-standard.
I wish Java had that too. Statically linking in the VM is good enough for me... but it has to always work and not introduce goobs of new un-figure-out-able-for-non-C-type-folks bugs.
- Your solution is "Hack #33" of the Perl Hacks book
2006-07-14 22:48:47 TomDLux [Reply]
Your problem is that your site has a large search path for libraries. But once a program is completed, the search path is identical on each run. You can optimize away the flexibility of the wide search path for a hard-coded, FAST, fixed path. The solution is "Hack #33" of the Perl Hacks book by chromatic and friends.
Package Devel::Presolve installs a subroutine on the front of @INC which will keep track of all the modules that are loaded. Once the module loading is complete, at INIT time, it reads off the locations where the modules were found, and prints the information out in a form you can stick into a preload.pm module. Loading this module will then load all the modules from fixed locations, with no searching.
You can replace the preload.pm module with an empty one, if you need to re-determine module locations.
There are some details I've skimmed over, so check out pages 81 & 82.
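To illustrate the general idea (this is not the code from the book; `snapshot_inc` and `presolved_hook` are hypothetical names), a sketch could record where each file was found and then serve later requests from those fixed paths through an @INC hook:

```perl
use strict;
use warnings;

# Hypothetical helper: snapshot where every currently loaded file was
# found, taken from %INC (file name => resolved path).
sub snapshot_inc {
    my %map;
    for my $file (keys %INC) {
        my $path = $INC{$file};
        next unless defined $path && !ref($path) && -f $path;
        $map{$file} = $path;
    }
    return \%map;
}

# Hypothetical helper: build an @INC hook that serves files straight from
# a previously recorded map, so no directory search is needed for them.
sub presolved_hook {
    my ($map) = @_;
    return sub {
        my (undef, $file) = @_;
        my $path = $map->{$file} or return;   # unknown file: fall through to the normal search
        open my $fh, '<', $path or return;    # stale entry: fall through as well
        $INC{$file} = $path;                  # hooks are allowed to set the %INC entry
        return $fh;                           # require reads the source from this handle
    };
}

# Assumed workflow: run once normally and save snapshot_inc() somewhere,
# then on later runs do: unshift @INC, presolved_hook($saved_map);
```

Files missing from the map still go through the usual @INC search, so a stale or empty map only costs speed, not correctness.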
Let us know how it works compared to plain Perl or (gasp) ksh.
Tom Legrady
- Your solution is "Hack #33" of the Perl Hacks book
2006-10-06 17:30:28 chromatic1 [Reply]
Amusingly, I wrote Hack #33 in response to Jean-Louis's woes. It didn't solve his problem, but it was an alternate approach.
- Threads can help
2006-01-04 16:59:17 davewhipp [Reply]
A trick I've used in the past to solve this type of problem (searching multiple paths) is to use multiple threads (or processes). A file-stat is a synchronous operation, and parallelisation -- especially when using networked file systems -- can provide near perfect speedups (47 stats in parallel have the same latency as 1), and collecting the results is negligible compared to an NFS roundtrip time.
Theoretically you could implement this without actually changing perl itself: you could replace @INC with a coderef that implements the parallel search. I haven't tried that though.
- Don't forget to use the tools your OS provides
2005-12-27 07:48:38 mzraly [Reply]
Sometimes the OS provides tools that can help you diagnose the problem right out of the box.
Had you been running on Solaris or Linux, you could have run truss or strace to track the system calls executed by perl. This would have shown you a whole bunch of open calls returning ENOENT for the same file in each directory in @INC until an open succeeded. Even perl -v shows this, because it still must search for all those shared libraries (e.g. libc, libm, and many others).
Does HP-UX have an equivalent to truss or strace?
- Don't forget to use the tools your OS provides
2005-12-28 02:36:53 jll [Reply]
Yes, it's called tusc. And it did help me troubleshoot some problems related to loading dynamic libs.
- Here is the code and more
2005-12-23 07:16:01 jll [Reply]
I have uploaded files related to this article here: http://www.soundobjectlogic.com/articles/perl.com/a_timely_start/
You will find 5 files there.
Devel/Dependencies.pm - as I said it's not in CPAN format but I can do that and upload it if someone is interested.
bench contains both the benchmark and the consolidation code. Consolidation as part of the build process has not been implemented to this day, because we decided to keep the ksh version, so I never factored it out.
Finally, much of YAPC::Europe was recorded on video; you will find my lightning talk in perl-vs-ksh.avi along with the slides in OpenOffice and PDF format.
- Here is the code and more
2006-01-19 08:40:15 hexla [Reply]
The URL needs to be:
http://www.soundobjectlogic.com/articles/perl.com/a_timely_start/index.html
.. in order to view the page (permission denied otherwise)
- Here is the code and more
2006-01-02 04:31:35 glasswalk3r [Reply]
Hello,
I do think it's worth having your module on CPAN. Could you make it available there?
I tried to download the code from http://www.soundobjectlogic.com/articles/perl.com/a_timely_start/ but it is taking a lot of time to load (could be a problem with my access to the Internet too).
- Here is the code and more
2006-01-06 00:45:48 jll [Reply]
I have improved Devel::Dependencies a bit (it can also display total module load time), documented it and added a test suite...and sent it to CPAN:
http://search.cpan.org/~jlleroy/Devel-Dependencies-1.00/
- Here is the code and more
2006-01-03 07:01:45 jll [Reply]
The server runs in a virtual machine; try again (use Patience;-). I will discuss the module on c.l.p.modules and upload it in the next few days.
- Total execution time?
2005-12-22 12:34:18 revdiablo [Reply]
In the article, you pointed out that loading the modules takes a significant portion of the script's runtime. Later, I see 0.27s listed for the "47-element PERL5LIB" and 0.09s for the "Consolidated PERL5LIB".
Did I misread something, or is the implication that 0.27s is a significant portion of the script's runtime? If that's the case, I'd be curious to know what the aforementioned specs are. I'll grant you that 0.27s is a lot slower than 0.09s, but it still seems pretty darned fast.
- Total execution time?
2005-12-22 12:44:48 jll [Reply]
Yes the program spends a lot more time getting started than actually performing its task. That's the whole point of my article. By consolidating @INC I can make it run three times faster. To match the ksh equivalent script I would have needed a 5x speedup.
WRT 0.27 - more than one quarter of a second - being "darned fast", the short answer is: it's not up to me to decide. There were specs that said it ain't fast enough. The ksh on the other hand has no problem meeting the specs.
The context is this: the script is used to upload some files to a server and that can mean dozens of files per second. My colleague wrote a test program to simulate that situation and the Perl version quickly pushed the load to 20. The ksh version did not.
- Total execution time?
2005-12-22 13:15:40 revdiablo [Reply]
Thanks for the reply. FYI, I reiterated the seemingly "common sense" facts just to make sure I didn't misread your article. Mostly I was curious what your specs were that dictated 0.27s not being fast enough. Now I know. Though I might have gone a different route in optimizing it (e.g. you could perhaps have each iteration of the script process multiple files), at least I understand what you were aiming for.
- where is the code?
2005-12-22 04:21:27 glasswalk3r [Reply]
Good article. But where is the code for the module Devel::Dependencies and the script to consolidate @INC?
- where is the code?
2005-12-22 12:38:04 jll [Reply]
I will post the code tomorrow - can't do it right now, it's at the Agency. Devel::Dependencies is not in CPAN format, but if people find it worthy I can package it and send it to CPAN.
- Static vs. Dynamic linking
2005-12-21 16:41:11 humblen [Reply]
Compilation of perl would not solve the module loading issue. The module loading is handled as dynamic library loading. The best way to reduce the PERL5LIB traversal time and thus module loading time is to statically link the modules.
- Static vs. Dynamic linking
2005-12-22 12:34:57 jll [Reply]
Yes, I could build a dozen perls, one for each of the Agency's subsystems that contain modules. Why not?
However I think that the 'consolidated @INC' benchmark shows that an awful lot of time is spent looking for files, not loading them. Your suggestion complements my trick, it doesn't replace it.
Also, it won't help with perl programs that use pure Perl modules (that use pure Perl modules, that use pure Perl modules...). You can't statically link that. And even in the case of XS based modules, many of them also have a pure Perl part.
- Persistent Perl
2005-12-21 16:31:30 itub [Reply]
Did you try persistent perl (search CPAN for PersistentPerl, PPerl, or SpeedyCGI)? It keeps the perl process in memory, so your script only needs to be compiled/started once. The first time you run it, it takes the usual startup time, but subsequent runs start blazingly fast. The only caveat is that you need to make sure that your code works properly in that environment (be careful with global variables, for example...).
- Persistent Perl
2005-12-22 12:29:17 jll [Reply]
That's one of the things that got suggested at YAPC::Europe.
I did play with both pperl and SpeedyCGI. One of them didn't compile out-of-the-CPAN-box on HP-UX. Also, they open ports and it's not something you do lightly at the Agency. We have hundreds of developers trying to work together to create reliable software; some discipline is in order here. I would have needed a green light from other people (not all of them Perl-friendly), I would have had to document it, etc.
I also prototyped a custom solution based on named pipes. Then I discussed the situation with my manager and we agreed that the simple task at hand didn't mandate that complexity. The ksh script didn't need daemonization, it was clearer and more understandable, and it ran fast enough. So we went back to the ksh solution.
Your remark, while interesting, misses the point IMO. Yes we can play tricks to make Perl outrun ksh. The problem is that we have to play tricks.
Jean-Louis
- Persistent Perl
2005-12-23 12:00:40 itub [Reply]
Hey, I love Perl, but it doesn't always have to be the fastest! If ksh happens to be a better tool for a given job, I'd gladly use it.
I'd also be happy if Perl had a compiler, for the same reason of startup time. In the meantime, however, using persistent perl has been a simple enough solution for my needs (luckily, it did compile out of the box in my case).
I also have an interest in the topic of perl startup time, as I wrote at http://perlmonks.org/index.pl?node_id=476282 . I had never thought about the effect of @INC length, but once you actually sit and try to estimate the number of directories that are being read and consider typical disk read times, the startup times you observe are not that surprising.
Thanks for an interesting article that shows how oft-forgotten details can impact performance more than one would imagine. :)
- Persistent Perl
2005-12-23 11:21:00 perrin [Reply]
I wouldn't call a persistent daemon a trick. It's pretty much the route that all web development in the past several years has taken, i.e. switch from CGI to a persistent daemon in order to get rid of startup costs. It's also the recommended way to run SpamAssassin.
Putting a lot of effort into building a compiler in order to reduce startup times to less than a quarter of a second just doesn't sound worthwhile when the persistent daemon approach solves the problem already.



