PerlMonks  

Re^6: Simple comparison of 2 files

by pryrt (Abbot)
on Jul 27, 2016 at 19:43 UTC ( [id://1168677]=note )


in reply to Re^5: Simple comparison of 2 files
in thread Simple comparison of 2 files

Alternatively, to avoid len(file1) x len(file2) line reads, load both files into memory first and loop over the in-memory rows:

use autodie;
use warnings;
use strict;

my (@data1, @data2) = ();
my ($fh, $l, $n);

# read FILE1 once, keeping each row as [ letter, number ]
open $fh, "<", $ARGV[0];
while(<$fh>) {
    ($l, $n) = split;
    push @data1, [ $l, $n ];
}
close($fh);

# read FILE2 once, same layout
open $fh, "<", $ARGV[1];
while(<$fh>) {
    ($l, $n) = split;
    push @data2, [ $l, $n ];
}
close($fh);

# compare every FILE1 row against every FILE2 row, entirely in memory
foreach my $row1 ( @data1 ) {
    foreach my $row2 ( @data2 ) {
        my ($l1, $n1, $l2, $n2) = (@$row1, @$row2);
        my $match = $l1 eq $l2;
        print "$l1 from FILE1 with number $n1 and $l2 from FILE2 with number $n2"
            . ($match ? '' : " DO NOT") . " match\n";
    }
}
__END__
A from FILE1 with number 1_1 and A from FILE2 with number 2_1 match
A from FILE1 with number 1_1 and B from FILE2 with number 2_2 DO NOT match
A from FILE1 with number 1_2 and A from FILE2 with number 2_1 match
A from FILE1 with number 1_2 and B from FILE2 with number 2_2 DO NOT match
B from FILE1 with number 1_3 and A from FILE2 with number 2_1 DO NOT match
B from FILE1 with number 1_3 and B from FILE2 with number 2_2 match
C from FILE1 with number 1_4 and A from FILE2 with number 2_1 DO NOT match
C from FILE1 with number 1_4 and B from FILE2 with number 2_2 DO NOT match

This will save a lot of time if the files are significantly larger than 4 and 2 lines, respectively, since each file is read only once, though it will end up using more memory.

Re^7: Simple comparison of 2 files
by Marshall (Canon) on Jul 27, 2016 at 20:17 UTC
    Reading file1 into memory doesn't save anything. Reading file2 into memory does. The "expensive" operation is the line by line text read of the input file. Saving the split from File1 is an idea, but not necessary since each line from File1 need only be read and dealt with once as per my code.

      Yeah, I thought of that after posting my second code... but I saw you'd already posted code like that, so didn't bother with another update/post.

      As a suggestion to Q.and, if using Marshall's code, I'd recommend picking the shorter file for "file2" and the longer file for "file1" -- it uses the least memory to store file2, but still requires only len(file1)+len(file2) line reads.
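For illustration, the approach discussed above (keep only file2 in memory and stream file1 line by line) might look roughly like the following. This is a minimal sketch under stated assumptions, not Marshall's actual code, which is not reproduced in this node: the sub name compare_files and the @report list are hypothetical, and the input format (a label and a number separated by whitespace) is taken from the script earlier in the thread.

```perl
use strict;
use warnings;
use autodie;

# Hypothetical helper (name assumed for illustration): compare every row of
# $file1 against every row of $file2. Only $file2 is held in memory;
# $file1 is streamed one line at a time.
sub compare_files {
    my ($file1, $file2) = @_;

    # Read the second (ideally shorter) file once and keep its rows.
    open my $fh2, '<', $file2;
    my @data2;
    while (my $line = <$fh2>) {
        my ($l, $n) = split ' ', $line;
        next unless defined $n;    # skip blank or malformed lines
        push @data2, [ $l, $n ];
    }
    close $fh2;

    # Stream the first file: each line is read and split exactly once.
    my @report;
    open my $fh1, '<', $file1;
    while (my $line = <$fh1>) {
        my ($l1, $n1) = split ' ', $line;
        next unless defined $n1;
        for my $row2 (@data2) {
            my ($l2, $n2) = @$row2;
            my $verdict = $l1 eq $l2 ? 'match' : 'DO NOT match';
            push @report,
                "$l1 from FILE1 with number $n1 and $l2 from FILE2 with number $n2 $verdict\n";
        }
    }
    close $fh1;

    return @report;
}

# Run as a script: perl compare.pl file1 file2
print compare_files(@ARGV) if @ARGV == 2;
```

Total line reads stay at len(file1)+len(file2), and memory use is proportional to file2 only, which is why passing the shorter file as the second argument keeps the footprint smallest.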
