PerlMonks  

Using Perl saves time....

by szabgab (Priest)
on Jul 18, 2005 at 09:04 UTC

I just had to count how many files are in a specific directory on an AIX box. Obviously, I ran

ls -1 | wc

I knew there are lots of files but after 2-3 minutes I decided this is not a joke any more so I wrote the following:

#!/usr/bin/perl -w
use strict;

my $dir = $ARGV[0] || ".";
opendir my $dh, $dir or die "Could not open '$dir'\n";
my $c = 0;
$c++ while readdir $dh;
print "$c\n";
It finished in under 1 second and printed 239982, so I could kill the other process....

Thank you.

Replies are listed 'Best First'.
Re: Using Perl saves time....
by hv (Prior) on Jul 18, 2005 at 12:23 UTC

    One thing ls does that the Perl code does not is sort the output. My Linux system supports ls -U to leave the list unsorted; if there are that many files, this may fix the speed issue.

    Hugo
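For what it's worth, the unsorted flag is easy to try side by side; -U is in GNU coreutils ls (and several BSDs), but it is not strictly portable:

```shell
# -U skips the sort pass; when writing to a pipe, ls is one name per line anyway
ls -U | wc -l
```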

      more than one *nix way to do it ;)
      nl *
      Update: removed redundant cat *

        nl, the line-numbering utility, takes one or more file names on its command line. So nl * cats the contents of all the files with line numbers.

        46$ ls | wc
             15      15     233
        47$ nl * | wc
            857    3540   29899
        48$ ls | nl
             1  Makefile
             2  RCS
             3  Tests
             4  ...
            15  ...

        But you just want the count, not the list, so you would need to | tail -1 | cut -d' ' -f1.

        Or to be precise: ls | nl | tr -s ' ' ' ' | tr "\t" ' ' | cut -d' ' -f2

        --
        TTTATCGGTCGTTATATAGATGTTTGCA

        Hm, I think with this glob the shell will first try to put all the filenames on your command line. If you have a lot of files, the maximum size for a command line will be exceeded (this maximum can be pretty large on *nix machines, but the OP intended to count lots of files...).
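One way around that limit, as a sketch: hand the directory to find instead of a shell glob, since the kernel's ARG_MAX only constrains argument lists, not pipe output (-maxdepth is a GNU/BSD extension, not POSIX):

```shell
# Count entries in the current directory without building a huge argv
find . -maxdepth 1 ! -name . | wc -l
```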
Re: Using Perl saves time....
by fergal (Chaplain) on Jul 18, 2005 at 10:40 UTC
    Why did you use -l for the ls? This certainly made it slower because it had to stat every file it found. Also, ls could be aliased in your shell to something funky (like one that uses different colours for different file types), so for a fair comparison do
    /bin/ls | wc -l
      In Using Perl saves time....:
      ls -1 | wc
      In Re: Using Perl saves time....:
      Why did you use -l for the ls?
      I think your computer is not showing you the difference between a digit one and a letter ell. The original post uses an (unnecessary) digit one. You are rightly complaining about the expense of a letter-ell long listing (if that had indeed been the case).

      It's unnecessary because ls has two behaviors, depending on whether the output is a terminal or not (something I count as being broken, but oh well). To a terminal, it columnizes, but to a pipe or file, it's automatically one element per line (classic mode). Thank the idiots at Bezerkley for this abomination. This leads people to believe that they need to add "-1" to get one column, when in fact that's usually not necessary.

      As an example, compare "ls" with "ls | cat".

      -- Randal L. Schwartz, Perl hacker
      Be sure to read my standard disclaimer if this is a reply.

        Oh, you are right, I did not have to use -1 (one), as ls | wc counts the same as ls -1 | wc

        You learn something every time...

        I didn't even think of the -1 (one) option; I thought he was using -l to make sure it was one line per file or something.

        Still it's odd that it's so slow, maybe AIX is just nasty.

        ... Thank the idiots at Bezerkley ...

        It's a joke, right?

        Dodge This!

      He used -1, which gives you one item per line, not -l, which stats the files.

      Of course, under most OSes, when you pipe ls into something, it implies ls -1, so it streams faster, as opposed to trying to determine how many columns to build. (I've not used AIX, so it's possible that it doesn't do this).

      I would, however, recommend using the -l flag to wc, so that it only needs to count the lines, and not the words and characters as well.

      And most ufs systems choke hard on ls when you have too many items in a directory. I saw a poorly set-up system that was rolling its log files each minute (it might've been one per transaction). The backups were failing because it took more than 24 hours to generate the listing of a directory with more than 2 million entries ... so the next backup would start running before the first one had finished.

      Why did you use -l for the ls?

      He didn't use -l; he used -1, so as to list just 1 file per line, to make the counting right. Though at least some versions of ls spot when their output is being piped and infer -1 anyway.

      Smylers

Re: Using Perl saves time....
by sh1tn (Priest) on Jul 19, 2005 at 08:08 UTC
    localhost:/usr/share/doc$ time perl -e 'print scalar(()=glob"*"),$/'
    1164

    real    0m0.045s
    user    0m0.040s
    sys     0m0.000s


      Your code is slow too; you need a suitably large directory to test properly:

      $ time perl -e 'print scalar(()=glob"*"),$/'
      136631

      real    0m4.810s
      user    0m3.340s
      sys     0m1.320s

      $ time perl -le 'opendir f, $ARGV[0] or die $!; ++$c while readdir f; print $c' .
      136633

      real    0m0.440s
      user    0m0.400s
      sys     0m0.040s

      --
      Murray Barton
      Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho

        You are right - glob is slower than readdir.
        My point was just the short one-liner. Otherwise:
        # time ls | wc -l
        99999

        real    0m1.850s
        user    0m1.590s
        sys     0m0.210s

        # time perl -e '@_ = glob"*";print$#_'
        99998

        real    0m1.462s
        user    0m0.880s
        sys     0m0.570s

        # the best alternative:
        # time perl -MIO::Dir -e '@_ = IO::Dir->new(".")->read;print$#_'
        100000

        real    0m0.680s
        user    0m0.570s
        sys     0m0.100s


Re: Using Perl saves time....
by greenFox (Vicar) on Jul 19, 2005 at 01:57 UTC

    I thought this should be a one-liner. My first thought was:

    perl -le "@l = <*>; print scalar @l"

    which of course is slooow, so I borrowed from your script:

    perl -le 'opendir f, $ARGV[0] or die $!;++$c while readdir f; print $c'

    I am sure the golfers could slim it down some more :)

    --
    Murray Barton
    Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho

      perl -e '++$c for glob "*";print $c'

      Seems to be pretty quick, too.

      <-radiant.matrix->
      Larry Wall is Yoda: there is no try{} (ok, except in Perl6; way to ruin a joke, Larry! ;P)
      The Code that can be seen is not the true Code
      "In any sufficiently large group of people, most are idiots" - Kaa's Law

Node Type: perlmeditation [id://475678]
Approved by Arunbear
Front-paged by ghenry