The following code produces identical results to choroba's code but uses less than 1/4 of the memory (180MB vs 795MB for my test dataset) and runs more quickly:
#! perl -slw
use strict;
use List::Util qw[ first ];
my @headers = split ' ', scalar <>;
my $f = first { $headers[$_] eq 'Strand' } 0 .. $#headers;
my( $cCounts, $wCounts, $n, %index ) = ( '', '', 0 );
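while( <> ) {
    chomp;
    my @F = split ' ';
    ## Give each distinct pair of values in the two columns after 'Strand' its own slot number
    my $index = $index{ $F[ $f+1 ] }{ $F[ $f + 2 ] } //= $n++;
    ## Bump the 8-bit counter for that slot in the string matching the strand
    ++vec( $F[ $f ] eq 'w' ? $wCounts : $cCounts, $index, 8 );
}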
while( my( $key, $subhash ) = each %index ) {
    while( my( $subkey, $index ) = each %{ $subhash } ) {
        print join "\t", $key, $subkey, vec( $cCounts, $index, 8 ), vec( $wCounts, $index, 8 );
    }
}
__END__
1177246.pl 1177246.dat > 1177246.out
It assumes no count will be greater than 255, the largest value an 8-bit vec element can hold. If that's too small, change the three 8s to 16s (maximum 65535) for a small increase in memory consumption.
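To see the counter width trade-off in isolation, here is a minimal sketch of packed counters built with vec. The helper name bump and the variables $counts8 and $counts16 are made up for illustration and are not part of the script above; slot number 3 is arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper: increment counter number $i inside the packed
# string referenced by $buf, treating it as an array of $bits-wide integers.
sub bump {
    my( $buf, $i, $bits ) = @_;
    ++vec( $$buf, $i, $bits );
    return vec( $$buf, $i, $bits );
}

my $counts8  = '';   # one byte per counter
my $counts16 = '';   # two bytes per counter

bump( \$counts8,  3, 8  ) for 1 .. 255;   # 255 is the most an 8-bit slot can hold
bump( \$counts16, 3, 16 ) for 1 .. 300;   # a 16-bit slot has room for larger counts

printf "8-bit slot 3:  %d (string is %d bytes)\n",
    vec( $counts8, 3, 8 ), length $counts8;
printf "16-bit slot 3: %d (string is %d bytes)\n",
    vec( $counts16, 3, 16 ), length $counts16;
```

Each counter costs one byte at 8 bits and two bytes at 16 bits, so doubling the width doubles the size of the two count strings but leaves the (much larger) index hash unchanged.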