PerlMonks  

Using READDIR runs out of memory

by DenairPete (Novice)
on Mar 20, 2018 at 19:06 UTC ([id://1211363])

DenairPete has asked for the wisdom of the Perl Monks concerning the following question:

I need the wisdom of PerlMonks!!! I am running a script that opens a directory and puts the files whose names end in .html into an array. The directory contains 1 million files in total, about half of which have the .html extension. When I run my script I get "Out of Memory". Here is how I am pushing the files into the array:

    opendir(DIR, $accumulatorDir) or die "$!\n";
    my @jrnFiles = map $_, grep /\.html$/, readdir DIR;
    closedir(DIR);

Is there an alternative I can use that is at least somewhat efficient? Java has no problem doing this with java.io.File.listFiles().

Re: Using READDIR runs out of memory
by afoken (Chancellor) on Mar 20, 2018 at 19:21 UTC
    I am running a script that opens a directory and puts files that end in .html into an array. The directory contains 1 million files total, with about half of them having the .html extension. When I run my script I get "Out of Memory":

    I would not expect that to happen. Perl should easily handle an array containing a million records. Anyway, another approach would be to iterate over the directory. Something like this:

        opendir my $d, $dirname or die "Could not open $dirname: $!";
        while (defined(my $item = readdir $d)) {
            $item =~ /\.html$/ or next;
            work_on_the_item($item);
        }
        closedir $d;
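A minimal runnable sketch of the iterate-over-the-directory approach above, assuming Perl 5.10 or later (for the // operator). The sub name for_each_html_file and its callback argument are hypothetical; the callback stands in for work_on_the_item(), which the reply leaves undefined.

```perl
use strict;
use warnings;

# Walk a directory one entry at a time: readdir in scalar context returns
# the next name on each call, so only one file name is held at a time.
# $callback is a hypothetical stand-in for work_on_the_item() above.
# Returns the number of ".html" entries passed to the callback.
sub for_each_html_file {
    my ($dirname, $callback) = @_;
    opendir(my $dh, $dirname) or die "Could not open $dirname: $!";
    my $count = 0;
    while (defined(my $item = readdir $dh)) {
        next unless $item =~ /\.html$/;
        $callback->($item);
        $count++;
    }
    closedir $dh;
    return $count;
}

# Usage: print every .html name in the directory given on the command
# line (default: the current directory), then the total.
my $dir   = $ARGV[0] // '.';
my $total = for_each_html_file($dir, sub { print "$_[0]\n" });
print "$total .html files\n";
```

Because each name is handed to the callback and then dropped, memory use stays roughly constant no matter how many files the directory holds.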

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Thanks Alexander! I don't believe it's the array storage that's the problem. It's the READDIR.

        I don't believe it's the array storage that's the problem.

        Well, I don't believe in the FSM, but it may still exist after all.

        Why don't you simply test if iterating solves the problem?

        It's the READDIR

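        It's called readdir, not READDIR. And its behaviour is very different in scalar context than in list context: in list context it returns all remaining directory entries at once, while in scalar context it returns only the next entry (or undef when none are left). RTFM.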
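To illustrate the context difference, here is a small sketch that reads the same directory both ways. The sub names list_all_entries and count_entries are hypothetical. Both should report the same number of entries, but only the list-context version holds the whole listing in memory at once.

```perl
use strict;
use warnings;

# List context: readdir returns ALL remaining entries in one go, so the
# whole listing (usually including "." and "..") is built in memory.
sub list_all_entries {
    my ($dirname) = @_;
    opendir(my $dh, $dirname) or die "Could not open $dirname: $!";
    my @entries = readdir $dh;                  # list context
    closedir $dh;
    return @entries;
}

# Scalar context: each call returns the NEXT entry, or undef once the
# directory is exhausted, so only one entry is held at a time.
sub count_entries {
    my ($dirname) = @_;
    opendir(my $dh, $dirname) or die "Could not open $dirname: $!";
    my $count = 0;
    while (defined(my $entry = readdir $dh)) {  # scalar context
        $count++;
    }
    closedir $dh;
    return $count;
}

# Usage: compare both counts for a directory (default: current directory).
# list_all_entries() called in scalar context returns the array's length.
my $dir = $ARGV[0] // '.';
printf "list context: %d entries, scalar context: %d entries\n",
    scalar(list_all_entries($dir)), count_entries($dir);
```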

        Alexander

Re: Using READDIR runs out of memory
by Marshall (Canon) on Mar 20, 2018 at 20:46 UTC
    I'm not sure what you intend the map to do there.

    To get only names ending in .html, I would do this:

        my @jrnFiles = grep { /\.html$/ } readdir DIR;

    Perl should be fine with 1 million files in the directory.
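A self-contained sketch of the grep-only version above, assuming a lexical directory handle is acceptable in place of the bareword DIR; html_files_grep is a hypothetical name. Note that readdir is still called in list context here.

```perl
use strict;
use warnings;

# grep straight over readdir, with no map. readdir is in list context,
# so the complete listing is produced first and grep then keeps only the
# ".html" names. The original "map $_," did no useful work, so dropping
# it at least saves one extra pass over the list.
sub html_files_grep {
    my ($dirname) = @_;
    opendir(my $dh, $dirname) or die "Could not open $dirname: $!";
    my @html = grep { /\.html$/ } readdir $dh;
    closedir $dh;
    return @html;
}

# Usage: collect the .html names in a directory (default: current one).
my @jrnFiles = html_files_grep($ARGV[0] // '.');
print scalar(@jrnFiles), " .html files\n";
```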

Re: Using READDIR runs out of memory
by Anonymous Monk on Mar 20, 2018 at 19:18 UTC
    Why are you using map? It only increases memory usage. For further savings, use a while loop and push instead of grep.
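A sketch of the while-and-push approach suggested above. The name html_files_push is hypothetical, and returning an array reference is an editorial choice, not something the reply specifies.

```perl
use strict;
use warnings;

# Read one entry at a time (readdir in scalar context) and push only the
# matches, so the full directory listing is never built as one list.
# Returns an array reference so the result is not copied on return.
sub html_files_push {
    my ($dirname) = @_;
    opendir(my $dh, $dirname) or die "Could not open $dirname: $!";
    my @html;
    while (defined(my $name = readdir $dh)) {
        push @html, $name if $name =~ /\.html$/;
    }
    closedir $dh;
    return \@html;
}

# Usage: collect the .html names in a directory (default: current one).
my $jrnFiles = html_files_push($ARGV[0] // '.');
print scalar(@$jrnFiles), " .html files\n";
```

Compared with the grep version, only the matching names (about half a million in the original question) are ever stored, rather than the full listing being built first and then filtered.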

Node Type: perlquestion [id://1211363]
Approved by marto