PerlMonks  

Re^5: Simple comparison of 2 files

by Q.and (Novice)
on Jul 27, 2016 at 19:38 UTC


in reply to Re^4: Simple comparison of 2 files
in thread Simple comparison of 2 files

Excellent, this is exactly the issue I was dealing with. Thank you!

Re^6: Simple comparison of 2 files
by pryrt (Abbot) on Jul 27, 2016 at 19:43 UTC

    Alternatively, to avoid len(file1) x len(file2) line reads, read each file into memory once:

    use autodie;
    use warnings;
    use strict;

    my (@data1, @data2) = ();
    my ($fh, $l, $n);

    open $fh, "<", $ARGV[0];
    while(<$fh>) {
        ($l, $n) = split;
        push @data1, [ $l, $n ];
    }
    close($fh);

    open $fh, "<", $ARGV[1];
    while(<$fh>) {
        ($l, $n) = split;
        push @data2, [ $l, $n ];
    }
    close($fh);

    foreach my $row1 ( @data1 ) {
        foreach my $row2 ( @data2 ) {
            my ($l1, $n1, $l2, $n2) = (@$row1, @$row2);
            my $match = $l1 eq $l2;
            print "$l1 from FILE1 with number $n1 and $l2 from FILE2 with number $n2" . ($match ? '' : " DO NOT") . " match\n";
        }
    }

    __END__
    A from FILE1 with number 1_1 and A from FILE2 with number 2_1 match
    A from FILE1 with number 1_1 and B from FILE2 with number 2_2 DO NOT match
    A from FILE1 with number 1_2 and A from FILE2 with number 2_1 match
    A from FILE1 with number 1_2 and B from FILE2 with number 2_2 DO NOT match
    B from FILE1 with number 1_3 and A from FILE2 with number 2_1 DO NOT match
    B from FILE1 with number 1_3 and B from FILE2 with number 2_2 match
    C from FILE1 with number 1_4 and A from FILE2 with number 2_1 DO NOT match
    C from FILE1 with number 1_4 and B from FILE2 with number 2_2 DO NOT match

    This will save a lot of time if the files are significantly larger than 4 and 2 lines, respectively, though it will end up using more memory...

      Reading file1 into memory doesn't save anything. Reading file2 into memory does. The "expensive" operation is the line by line text read of the input file. Saving the split from File1 is an idea, but not necessary since each line from File1 need only be read and dealt with once as per my code.

        Yeah, I thought of that after posting my second code... but I saw you'd already posted code like that, so didn't bother with another update/post.

        As a suggestion to Q.and, if using Marshall's code, I'd recommend picking the shorter file for "file2" and the longer file for "file1" -- it uses the least memory to store file2, but still requires only len(file1)+len(file2) line reads.
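
        For illustration, the "keep only file2 in memory, stream file1" idea might look roughly like the sketch below. This is not Marshall's actual code (it isn't quoted in this subthread); the sub names load_rows and compare_files are made up for the example, and it assumes each line holds a whitespace-separated label and number, as in the output above.

```perl
use strict;
use warnings;
use autodie;

# Hypothetical helper (not from the thread): read a whole file into memory
# as a list of [label, number] pairs. Intended for the shorter file.
sub load_rows {
    my ($path) = @_;
    my @rows;
    open my $fh, '<', $path;
    while (my $line = <$fh>) {
        my ($l, $n) = split ' ', $line;
        next unless defined $n;    # skip blank or malformed lines
        push @rows, [ $l, $n ];
    }
    close $fh;
    return \@rows;
}

# Hypothetical helper: stream file1 one line at a time and compare each row
# against the in-memory rows of file2. Each file is read exactly once.
sub compare_files {
    my ($file1, $file2) = @_;
    my $rows2 = load_rows($file2);
    my @report;
    open my $fh, '<', $file1;
    while (my $line = <$fh>) {
        my ($l1, $n1) = split ' ', $line;
        next unless defined $n1;
        for my $row2 (@$rows2) {
            my ($l2, $n2) = @$row2;
            my $verdict = $l1 eq $l2 ? 'match' : 'DO NOT match';
            push @report,
                "$l1 from FILE1 with number $n1 and $l2 from FILE2 with number $n2 $verdict\n";
        }
    }
    close $fh;
    return @report;
}

# Usage: perl compare.pl longer_file shorter_file
print compare_files(@ARGV[0, 1]) if @ARGV == 2;
```

        The number of comparisons is still len(file1) x len(file2), but only len(file1)+len(file2) lines are read from disk, and only file2's rows are held in memory.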