Re: Myth busted: Shell isn't always faster than Perl
by Roy Johnson (Monsignor) on Dec 30, 2005 at 18:54 UTC
The problem with the shell version is that it's spawning a new process for every rm. The usual practice is to use xargs in conjunction with find.
time find . -type f -print | xargs rm
I don't know how much that will affect your timings, though.
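To make the batching concrete, here is a minimal sketch of the idiom (the scratch directory comes from `mktemp`; nothing else is assumed):

```shell
# Create a throwaway tree, then delete its files with the batched idiom.
tmp=$(mktemp -d)
touch "$tmp/f1" "$tmp/f2" "$tmp/f3"

# xargs hands rm many filenames per invocation, so only a couple of
# processes are spawned instead of one rm per file as with -exec:
find "$tmp" -type f -print | xargs rm

find "$tmp" -type f    # prints nothing: files gone, directory kept
rmdir "$tmp"
```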
Caution: Contents may have been coded under pressure.
I'd say it's a wash. Trying it several times, the best times I got were:
time perl -MFile::Find -e'finddepth sub { unlink if -f }, @ARGV' /tmp
real 0m3.111s
user 0m0.821s
sys 0m2.233s
time find /tmp -type f | xargs rm
real 0m3.312s
user 0m0.760s
sys 0m2.511s
And they varied widely, anywhere up to 5 seconds.
Remember: There's always one more bug.
I tested my original script with the
return if -d;
unlink $_;
against the golfed
unlink $_ if -f;
and the golfed -f test seems to be a bit slower. Maybe because unlink somehow gets called for each directory and then stopped, whereas the 'return if -d' returns immediately.
BUT the improved shell version with null-terminated output,
find . -type f -print0 | xargs -0 rm
seems to win :-(
time -d-test Gtk3
real 0m0.412s
user 0m0.074s
sys 0m0.337s
time -f-test Gtk3
real 0m0.478s
user 0m0.076s
sys 0m0.388s
time find . -type f -print0 | xargs -0 rm
real 0m0.334s
user 0m0.012s
sys 0m0.321s
I'm not really a human, but I play one on earth.
flash japh
Yeah, that brings the shell closer in speed, BUT it starts complaining AND skipping filenames with spaces in them. I believe that is why the original construct was the way it was.
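A small demonstration of the spaces problem and the -print0 fix (the scratch directory and filename are invented for illustration):

```shell
tmp=$(mktemp -d)
touch "$tmp/no space.txt"

# Plain form: xargs splits input on whitespace, so rm is handed two
# bogus arguments and the file survives (rm's errors are suppressed;
# the pipeline's nonzero status is ignored here on purpose):
find "$tmp" -type f -print | xargs rm 2>/dev/null || true
ls "$tmp"              # "no space.txt" is still there

# Null-terminated form: filenames pass through intact:
find "$tmp" -type f -print0 | xargs -0 rm
ls "$tmp"              # prints nothing now
rmdir "$tmp"
```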
I'm not really a human, but I play one on earth.
flash japh
find . -type f -print0 | xargs -0 rm
We're building the house of the future together.
Re: Myth busted: Shell isn't always faster than Perl
by jdporter (Paladin) on Dec 30, 2005 at 17:50 UTC
sub {
    -f or return;   # only unlink regular files ($_ is the default operand)
    unlink $_;
}
You really only want to unlink "regular" files; and this makes the comparison apples-to-apples with the shell version.
Also, explicitly testing '.' and '..' is superfluous, because they'd be caught by -d.
We're building the house of the future together.
sub { unlink if -f }
:-)
Yes, my reply originally looked like that; but as the OP said, you may want to do additional things, such as reporting.
We're building the house of the future together.
Re: Myth busted: Shell isn't always faster than Perl
by Perl Mouse (Chaplain) on Dec 31, 2005 at 00:40 UTC
I've never heard of the myth "Shell is always faster". Not that your test busts any myth: using find's '-exec' option to delete one file at a time is a far from optimal solution. As pointed out, '-print0' in combination with 'xargs -0' is much more efficient, as it saves spawning a gazillion processes.
I'm a bit surprised, however, that no one so far has piped in with the "programmer time is more costly than running time" mantra. Surely the 2-second running-time difference is dwarfed by all the extra typing you need for your Perl solution. Or are Perl programmers cheap, and shell programmers expensive?
I would always go for the shell solution. I'll have deleted all the files even before you've finished typing your Perl program.
I never type out a script more than once; it goes into a /bin directory in my path. "Damn it Jim, I'm a Perl hacker, NOT a typist" :-)
I'm not really a human, but I play one on earth.
flash japh
But if you haven't been on the system yet, you haven't had a chance to install your "delete files and leave the directory structure" program yet.
One way of doing system administration is to write a little program for every minor task you want. A small change means a different program. And then everyone has to carry disks with their personal libraries around. Granted, it's workable.
I myself prefer the Unix/POSIX solution. Lots of small tools, that can be stacked like legos. Tools that are everywhere, like find and xargs. When I sit down at a Unix system, I can type
find . -type f -print0 | xargs -0 rm
to delete files and leave the directory structure as is. I don't have to remember whether I installed a program doing this for me on the box, and if I did, what it's called. And I don't need to write a new program if I want to delete all files older than a week: just add an extra option to find. (Sure, you could enhance your program so that it takes all kinds of options, but if you have to type as many options to your program as to find, you might as well have used find in the first place.)
I'm not a monoculture programmer. For anything complex, I write a Perl or a C program (preferably Perl, but that isn't always available: if all you have is a few MB of RAM and a dozen or so MB of disk, there's no Perl, but busybox stacks a lot of goodies in just a few KB). But I don't bother writing programs for tasks that I don't do often and that only require a few simple commands. That's not efficient.
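That "just add an extra option" point can be sketched like this (the scratch directory, filenames, and backdated timestamp are all invented for illustration):

```shell
tmp=$(mktemp -d)
touch "$tmp/recent.txt"
touch -t 202001010000 "$tmp/old.txt"   # backdate one file to Jan 2020

# Same pipeline as before, restricted to files modified more than
# 7 days ago just by adding one find option:
find "$tmp" -type f -mtime +7 -print0 | xargs -0 rm

ls "$tmp"     # only recent.txt remains
rm "$tmp/recent.txt"; rmdir "$tmp"
```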
I would always go for the shell solution. I'll have deleted all the files even before you've finished typing your Perl program.
Well, since you are being snarky, I'll respond in kind: I doubt it. I reckon you'll still be fighting with the shell syntax, and double-checking that the switches and utilities you got so used to in bash are actually present in the shell you need to run it on. And even then you still won't be 100% confident that it will all work as expected.
Which to me is the reason that perl scripts beat shell scripts hands down pretty well every time. I can use the same perl script on pretty much every shell and OS I can find. Your shell script will only work on a small subset of them, and will require massive changes for some of them.
Shell scripts are only worth thinking about if you are a monoculture programmer. Since I'm not, I view them mostly with contempt. Who needs shell scripts when you have perl scripts instead?
---
$world=~s/war/peace/g
I reckon you'll still be fighting with the shell syntax
I can type find | xargs pipes in my sleep.
double-checking that the switches and utilities you got so used to in bash are actually present in the shell you need to run it on
Present in the shell? They’re external binaries; which shell you’re using is irrelevant. Maybe “present on the system,” except that if find, xargs and rm are not present, that is one very broken system. And the -print0/-0 switches are available on these commands on all Unixoid systems where I cared to look.
And all that is far more likely to be around than perl, in any case.
If your portability argument concerns moving between Windows and Unix, well, I can see how someone working on Windows would prefer to always use Perl… :-)
Makeshifts last the longest.
Well, since you are being snarky, I'll respond in kind: I doubt it. I reckon you'll still be fighting with the shell syntax, and double-checking that the switches and utilities you got so used to in bash are actually present in the shell you need to run it on. And even then you still won't be 100% confident that it will all work as expected.
Bollocks. find | xargs has worked on every Unix system I've used for the last 30 years. Out of the box. In any shell, as the only 'shell' thing here is the pipe, which is universal. It worked long before Larry released perl 1.0, and it will continue to work long after perl5 is a distant memory.
Which to me is the reason that perl scripts beat shell scripts hands down pretty well every time. I can use the same perl script on every shell and OS I can find pretty much. Your shell script will only work on a small subset of them, and will require massive changes for some of them.
The shell solution will work on at least anything that's POSIX-compliant. Will your Perl program work in perl6? How would you know? It may work on today's version of perl6, but maybe not on next week's. As for Perl being present on the OS by default: for many OSes, it's only quite recently that they have come with some version of perl5 installed.
Shell scripts are only worth thinking about if you are a monoculture programmer. Since I'm not I view them mostly with contempt. Who needs shell scripts when you have perl scripts instead?
So, you do everything with Perl scripts, so you're not a monoculture programmer? Interesting. What's your definition of monoculture then?
But you're right. Once you have a truck, you have no need for a bicycle. It's much easier to start up the truck and find a parking spot, just to get a newspaper from the shop around the corner. It's cheaper as well. Bicyclists are monoculture traffic participants - none of them know how to drive a car.
Re: Myth busted: Shell isn't always faster than Perl
by Tanktalus (Canon) on Dec 31, 2005 at 17:37 UTC
zentara, try this one. I wrote this many years ago to clean up hundreds of MB of source code (meaning hundreds of thousands of files) and it seems pretty fast. Way faster than rm -rf, for example. However, my goal wasn't to remove just the files, but the whole tree. I'll comment out the part that removes directories just to make it do what yours does. Granted, this is a bit more complex. But it can't easily be duplicated in shell.
use strict;
use warnings;

$| = 1;

foreach my $d (@ARGV)
{
    remove_dir($d);
    rmdir $d;
}
print "\nDone.\n";

sub remove_dir
{
    my $d = shift;
    if ( -f $d or -l $d )
    {
        unlink $d;
        return;
    }
    # must be a directory?
    my (@sfiles, @sdirs);
    local *DIR;
    opendir(DIR, $d) || do { print "Can't open $d: $!\n"; return };
    foreach (readdir(DIR))
    {
        next if $_ eq '.';
        next if $_ eq '..';
        my $sd = "$d/$_";
        if    ( -l $sd ) { push @sfiles, $sd }
        elsif ( -d $sd ) { push @sdirs,  $sd }
        else             { push @sfiles, $sd }
    }
    closedir(DIR);
    print ".";

    # process subdirectories via fork
    my $count = 0;    # initialise, so the test below doesn't warn on undef
    foreach my $sd (@sdirs)
    {
        my $pid;
        if ($pid = fork())
        {
            # parent
            ++$count;
        }
        elsif (defined $pid)
        {
            # child
            remove_dir($sd);
            exit;
        }
        else
        {
            # fork failure - try again in a bit
            sleep 5;
            redo;
        }
        # keep at most a few children in flight
        while ($count > 2) {
            wait();
            $count--;
        }
    }
    while (wait() != -1) {}

    #foreach (@sdirs) {
    #    rmdir $_ || do {
    #        warn "$0: Unable to remove directory $_: $!\n";
    #    };
    #}
    my @cannot = grep { !unlink($_) } @sfiles;
    if (@cannot) {
        warn "$0: cannot unlink @cannot\n";
    }
}
I'll also add that the difference in speed between 0.4s and 3s is quite negligible compared to the amount of time it takes to remember and write them. The example above was ludicrously expensive to write, but it is something I do often enough that I call it "RD" (yes, upper-case: it's too dangerous to get a short lower-case name) and put it in /usr/local/bin on all machines, all platforms, that I have access to (primarily as a symlink to a shared NFS partition). We really do use it that much ;-)
Re: Myth busted: Shell isn't always faster than Perl
by itub (Priest) on Dec 30, 2005 at 19:29 UTC
I had never heard that myth; actually, I tend to hear the opposite. The truth is, it depends. ;-)
Re: Myth busted: Shell isn't always faster than Perl
by runrig (Abbot) on Jan 02, 2006 at 19:31 UTC
It also depends on what you're doing. Once when I rewrote a third-party utility in perl, rewriting this bit caused it to go slower in perl:
grep "^function" *.4gl |
sed "s/\(.*\):function \(.*\)(.*/\2 \1 \/^function \2(/"
But the above was wrong, so I rewrote a "correct" perl version:
/^\s*function\s+(\w+)\s*\(/i
# and then use hashes to save data so there's no s///
I rewrote the new perl version in shell (grep/sed) for kicks, and it was slower than the perl version (and much uglier).
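A hedged sketch of that kind of extraction, run from the shell (the sample .4gl content and the filename demo.4gl are invented for illustration):

```shell
# Fabricate a tiny 4GL source file, then pull out the function names
# with the corrected perl regex from above.
cat > demo.4gl <<'EOF'
function main()
  display "hi"
end function
function helper(x)
end function
EOF

perl -ne 'print "$1\n" if /^\s*function\s+(\w+)\s*\(/i' demo.4gl
# prints:
#   main
#   helper

rm demo.4gl
```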
Re: Myth busted: Shell isn't always faster than Perl
by Anonymous Monk on Dec 30, 2005 at 19:59 UTC
That is outstanding. We had the same type of discussion at my company. We had a failed bash script that we needed to fix, but no one really knows bash; we are perl guys.
Re: Myth busted: Shell isn't always faster than Perl
by Anonymous Monk on Dec 30, 2005 at 23:51 UTC
Passing off your own inability to develop a quality shell script as a defect of the shell. Clever plan.
Well, that speaks to the point I'm making. The people who suggested the original slow shell script are well-respected and talented shell programmers, and my run-of-the-mill Perl script beat it. So when someone says "why use Perl, I can do it faster with a shell script", you'd better think twice, because maybe the Perl is faster. Also, the optimized shell script only beat the Perl version by a nose. Considering how much more flexible the Perl script is in processing the files as they are found, run-of-the-mill Perl is likely to be faster than run-of-the-mill shell doing some equivalent task. Shell, with its constant spawning of awk and sed, etc., is probably harder to run at optimized speed compared to Perl.
I'm not really a human, but I play one on earth.
flash japh