PerlMonks  

Re: Out of Memory Error : V-Lookup on Large Sized TEXT File

by thargas (Deacon)
on Apr 24, 2015 at 17:52 UTC ( [id://1124585]=note )


in reply to Out of Memory Error : V-Lookup on Large Sized TEXT File

Rather than read $LARGEFILE once for each line in $REFFILELIST, wouldn't it be more efficient to read it once and check each line against each line of $REFFILELIST?

Something like:

open(FILE, $ReferenceFilePath) or die "Can't open file";
chomp(@REFFILELIST = (<FILE>));
close(FILE);

open OUTFILE, ">$OUTPUTFILE" or die $!;
open(LARGEFILE, $LARGESIZEDFILE) or die "Can't open File";
while (<LARGEFILE>) {
    foreach my $line (@REFFILELIST) {
        # index() returns -1 when $line is not found, so test for >= 0
        print OUTFILE $_ if (index($_, $line) >= 0);
    }
}
close(LARGEFILE);
close(OUTFILE);

N.B. untested since the original is incomplete and doesn't provide any data.
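A slightly more defensive sketch of the same single-pass idea, in case it helps. The sub name filter_by_substring and its three path arguments are made up for illustration, and it assumes the reference list is small enough to hold in memory while the large file is streamed one line at a time. It also stops at the first matching reference string, so a line that matches several of them is written only once.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every line of $large_path that contains at
# least one string from $ref_path into $out_path. Returns the number of
# lines written.
sub filter_by_substring {
    my ($ref_path, $large_path, $out_path) = @_;

    # Assumption: the reference list fits comfortably in memory.
    open(my $ref_fh, '<', $ref_path) or die "Can't open $ref_path: $!";
    chomp(my @needles = <$ref_fh>);
    close($ref_fh);
    @needles = grep { length } @needles;    # an empty string would match every line

    open(my $in_fh,  '<', $large_path) or die "Can't open $large_path: $!";
    open(my $out_fh, '>', $out_path)   or die "Can't open $out_path: $!";

    my $written = 0;
    LINE: while (my $line = <$in_fh>) {     # only one line of the large file in memory
        for my $needle (@needles) {
            if (index($line, $needle) >= 0) {
                print {$out_fh} $line;
                $written++;
                next LINE;                  # write each matching line only once
            }
        }
    }

    close($in_fh);
    close($out_fh) or die "Can't close $out_path: $!";
    return $written;
}
```

It would be called as something like filter_by_substring($ReferenceFilePath, $LARGESIZEDFILE, $OUTPUTFILE). Memory use should then depend on the size of the reference list rather than on the size of the large file.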

Replies are listed 'Best First'.
Re^2: Out of Memory Error : V-Lookup on Large Sized TEXT File
by lonewolf28 (Beadle) on Apr 24, 2015 at 22:41 UTC

    Hi, with the limited information given, I have put together a script. Maybe you can use it to improve yours.

    use strict;
    use warnings;

    open( my $fh, '<', "input.txt" ) or die "Cannot open input file: $!";
    chomp( my @input_data = <$fh> );
    close($fh);

    open( my $frh, '<', "reference.txt" ) or die "Cannot open reference file: $!";
    chomp( my @ref_data = <$frh> );
    close($frh);

    # Keep each input line that is equal to some reference line.
    my @output = map {
        my $value = $_;
        grep { $value eq $_ } @ref_data;
    } @input_data;

    open( my $wh, '>', "output.txt" ) or die "Cannot open the output file. $!";
    # The lines were chomped above, so put the newline back on output.
    print {$wh} "$_\n" for @output;
    close($wh);
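If the lookup really is whole-line equality, as in the map/grep above, a hash of the reference lines avoids both the nested scan and holding the input file in memory. The following is only a sketch under that assumption. The sub name filter_by_exact_match and its path arguments are invented for illustration, and unlike the map/grep version it writes a matching line once even when the reference file contains duplicates.

```perl
use strict;
use warnings;

# Hypothetical helper: write every line of $input_path that exactly equals
# a line of $ref_path to $out_path. Returns the number of lines written.
sub filter_by_exact_match {
    my ($ref_path, $input_path, $out_path) = @_;

    # Assumption: the reference file is the small one and fits in memory.
    open(my $ref_fh, '<', $ref_path) or die "Cannot open $ref_path: $!";
    my %is_ref;
    while (my $ref = <$ref_fh>) {
        chomp $ref;
        $is_ref{$ref} = 1;                  # hash keys give constant-time lookups
    }
    close($ref_fh);

    open(my $in_fh,  '<', $input_path) or die "Cannot open $input_path: $!";
    open(my $out_fh, '>', $out_path)   or die "Cannot open $out_path: $!";

    my $written = 0;
    while (my $line = <$in_fh>) {           # stream the input, one line at a time
        chomp(my $value = $line);
        next unless exists $is_ref{$value};
        print {$out_fh} "$value\n";
        $written++;
    }

    close($in_fh);
    close($out_fh) or die "Cannot close $out_path: $!";
    return $written;
}
```

With the file names used above, the call would be filter_by_exact_match("reference.txt", "input.txt", "output.txt").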
Re^2: Out of Memory Error : V-Lookup on Large Sized TEXT File
by marinersk (Priest) on Apr 25, 2015 at 03:02 UTC

    Oh, sheesh, thargas -- your post made me realize I'd missed something basic in the original post. The first file he opens isn't the list of files -- it's the list of strings.

    On a gut feeling I'd say he's buffering a Cartesian Product of lines per file x lines in REFFILE. Can't prove it without the actual source code -- but it sure would fit the memory consumption pattern being presented.

    This only enhances what everyone has been saying -- post the actual code, not this mock-up of it -- there's something structurally wrong and we'll need to see the steel to find the rust.
