Re^2: Memory usage while tallying instances of lines in a .txt file

by TJCooper (Beadle)
on Dec 05, 2016 at 17:25 UTC


in reply to Re: Memory usage while tallying instances of lines in a .txt file
in thread Memory usage while tallying instances of lines in a .txt file

The intention is to grab $index from the header line of the .txt file (which appears only once, on line 1). It's nothing more than a set of tab-delimited headers:

Strand    Type    Pos    Length    Form    Adjustment

However, it can sometimes take the form:

ID   Strand    Type    Pos    Length    Form    Adjustment
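As a minimal sketch of how the position of "Strand" could be found for either header form, assuming the headers really are tab-delimited as described. The helper name column_index and the two sample header strings are made up for illustration and are not from the original script:

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical helper: return the zero-based position of a named column
# in a tab-delimited header line, or undef if the column is absent.
sub column_index {
    my ($header_line, $name) = @_;
    chomp $header_line;                      # works on a copy of the argument
    my @headers = split /\t/, $header_line;
    return first { $headers[$_] eq $name } 0 .. $#headers;
}

# The two header forms described above, rebuilt with real tabs.
my $plain   = join "\t", qw(Strand Type Pos Length Form Adjustment);
my $with_id = join "\t", qw(ID Strand Type Pos Length Form Adjustment);

printf "%d\n", column_index($plain,   'Strand');   # prints 0
printf "%d\n", column_index($with_id, 'Strand');   # prints 1
```

Because a valid result can be 0, callers should test the return value with defined rather than for truth.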

Re^3: Memory usage while tallying instances of lines in a .txt file
by stevieb (Canon) on Dec 05, 2016 at 17:59 UTC

    The following code does what you want, i.e. "Strand" can be at any position on the first line, and it avoids the extreme memory overhead of reading in the whole file at once.

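    use warnings;
    use strict;

    use Data::Dumper;
    use List::Util qw(first);

    my %hits;
    my $index;

    open my $fh, '<', 'file.txt' or die $!;

    # Process one line at a time so the whole file is never held in memory.
    while (<$fh>){
        chomp;
        my @F = split ' ';

        # Header line: record the position of the "Strand" column.
        if (/Strand/){
            $index = first { $F[$_] eq 'Strand' } 0..$#F;
            next;
        }

        # First sighting of this pair of fields: start both counters at zero.
        if (! exists $hits{$F[$index+1]}{$F[$index+2]}) {
            $hits{$F[$index+1]}{$F[$index+2]}{'w'} = 0;
            $hits{$F[$index+1]}{$F[$index+2]}{'c'} = 0;
        }

        # Tally by the value in the Strand column.
        $hits{$F[$index+1]}{$F[$index+2]}{$F[$index]}++;
    }

    print Dumper \%hits;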

    Data used:

    Strand 1 4 1 0 1 5 1 0 1 31 1 0 1 74 1 0

      I'm not sure where I was reading the entire file into memory. Given that the following lines occur outside the while loop, shouldn't they handle only the first line of the input file?

      my @headers = split("\t",<$IN>);
      my $index = first{$headers[$_] eq 'Strand'} 0..$#headers;
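One way to check this is the small sketch below. It relies on split evaluating its second argument in scalar context, so <$IN> should return a single line there. The in-memory filehandle and its sample data are assumptions made for the demonstration, not the poster's real file:

```perl
use strict;
use warnings;
use List::Util qw(first);

# In-memory stand-in for the input file (hypothetical sample data).
my $data = "ID\tStrand\tType\nx\tw\t1\ny\tc\t2\n";
open my $IN, '<', \$data or die $!;

# split evaluates its second argument in scalar context, so <$IN>
# returns exactly one line here: the header.
my @headers = split "\t", <$IN>;
chomp @headers;                              # strip the newline from the last field
my $index = first { $headers[$_] eq 'Strand' } 0 .. $#headers;

# In list context, <$IN> now returns every line that is still unread.
my @rest = <$IN>;
close $IN;

print "index=$index remaining=", scalar(@rest), "\n";   # prints index=1 remaining=2
```

Both data lines are still available after the header is split, which suggests that these two statements by themselves consume only the first line.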

      Indeed, your approach does not reduce the RAM requirement.
