Here's how I'd do it (for clarity, this is basically what the first reply suggested). Code untested:
use strict;
use warnings;
use Tie::Hash::Indexed;
tie my %lines1, 'Tie::Hash::Indexed'; # gives you the ordered hash
open my $IN1, '<', "tmp12" or die "Cannot open tmp12: $!";
open my $IN2, '<', "donor_82_01.csv" or die "Cannot open donor_82_01.csv: $!";
# step 1, cache contents of $IN1 (read the first file once)
# populate %lines1 "cache"
while ( my $item1 = <$IN1> ) {
    chomp $item1;    # so a key in the last column does not keep its newline
    my @tmp1 = split( /\t+/, $item1 );
    $lines1{ $tmp1[1] } = \@tmp1;    # save the split fields of $item1, keyed on $tmp1[1]
}
# step 2, iterate over contents of $IN2 / look up in %lines1 to compare
open my $OUT, '>', "tmp12_01" or die "Cannot open tmp12_01: $!";
LOOKUP_AND_COMPARE:
while ( my $item2 = <$IN2> ) {
    #chomp $item2;    # not needed, see last line
    my @tmp2 = split( /,+/, $item2 );
    # -- look up
    if ( 'ARRAY' eq ref $lines1{ $tmp2[0] } ) {
        my @tmp1 = @{ $lines1{ $tmp2[0] } };    # for clarity, not actually needed; can get value via "$lines1{ $tmp2[0] }->[0]"
        print $OUT $tmp1[0], ",", $item2;    # <- updated to fix bareword from old code
        next LOOKUP_AND_COMPARE;    # "last" here would stop after the first match
    }
}
#print $OUT "\n";    # probably don't need if you don't "chomp $item2"
Additional optimizations, depending on your constraint (time versus space):
- if time, cache the larger of the 2 files
- if space, cache the smaller of the 2 files
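The lesson here, as stated below, is to not nest your loops. The underlying idea is called "computational complexity". With nested loops, every line of one file is compared against every line of the other, so the work grows with the product of the two line counts. You want at most one level of looping per file. The hash lookup in the if test, $lines1{ $tmp2[0] }, is the "constant time" lookup that you get from caching the first file in a hash, and it is how you avoid the inner loop. (Tie::Hash::Indexed only adds insertion ordering on top of that; as far as I can tell, a plain hash gives you the same lookup.)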
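
To make the difference concrete, here is a small side-by-side sketch of the nested-loop version and the hash-lookup version. Everything in it is an assumption for illustration: the in-memory arrays @file1 and @file2 stand in for the two files, the sample rows are made up, and join_nested / join_hashed are hypothetical helper names. It uses a plain core-Perl hash rather than Tie::Hash::Indexed so it runs without extra modules.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the two files (made-up sample rows).
my @file1 = ( "alice\tD001\n", "bob\tD002\n", "carol\tD003\n" );    # tab-separated
my @file2 = ( "D003,10\n", "D001,25\n", "D999,5\n" );               # comma-separated

# Nested loops: every line of file 2 is compared with the lines of file 1,
# so the work grows as (lines in file 1) x (lines in file 2).
sub join_nested {
    my ( $lines1, $lines2 ) = @_;
    my @out;
    for my $item2 (@$lines2) {
        my ($key2) = split /,/, $item2;
        for my $item1 (@$lines1) {
            chomp( my $copy = $item1 );
            my @tmp1 = split /\t+/, $copy;
            if ( $tmp1[1] eq $key2 ) {
                push @out, "$tmp1[0],$item2";
                last;    # stop scanning file 1 for this line of file 2
            }
        }
    }
    return @out;
}

# Hash lookup: one pass to build the cache, one pass to probe it.
sub join_hashed {
    my ( $lines1, $lines2 ) = @_;
    my %cache;
    for my $item1 (@$lines1) {
        chomp( my $copy = $item1 );
        my @tmp1 = split /\t+/, $copy;
        $cache{ $tmp1[1] } //= \@tmp1;    # keep the first line seen per key
    }
    my @out;
    for my $item2 (@$lines2) {
        my ($key2) = split /,/, $item2;
        push @out, "$cache{$key2}[0],$item2" if exists $cache{$key2};
    }
    return @out;
}

my @nested = join_nested( \@file1, \@file2 );
my @hashed = join_hashed( \@file1, \@file2 );
print @hashed;
```

Both subs produce the same two lines for this sample data ("carol,D003,10" and "alice,D001,25"); the row with key D999 has no partner and is skipped. The hashed version touches each line of each file once, which is the whole point of the cache.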