comparing two files for duplicate entries

by Melly (Hermit)
on Sep 27, 2006

in reply to comparing two files for duplicate entries

I don't know how well it would scale, but I'd build a hash from the first file ($objects{'object1'} = 23.12;), then scan the second file. If an object in the second file already has a defined entry in the hash, print out the hash key and both values.

Untested code, and I'm assuming a space delimiter, as per your examples:

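use strict;
use warnings;

my %hash;

# Pass 1: remember each object's value from the first file.
open(my $in1, '<', 'file1') or die "Can't open file1: $!";
while (<$in1>) {
    $hash{$1} = $2 if /(\S+)\s+(\S+)/;
}
close $in1;

# Pass 2: report objects that also appear in the second file.
open(my $in2, '<', 'file2') or die "Can't open file2: $!";
while (<$in2>) {
    if (/(\S+)\s+(\S+)/) {
        print "$1: $hash{$1} $2\n" if defined $hash{$1};
    }
}
close $in2;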
Tom Melly,

comparing two files for duplicate entries
on Sep 27, 2006

    Yup, it's really just that simple (well, maybe exists rather than defined; but that's a minor nit). If your files are really, really big you probably want to use something like Berkeley_DB or one of the other DBM modules rather than reading everything into memory, but that's just an implementation detail; the basic algorithm remains the same.
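For what it's worth, here is a minimal sketch of the on-disk variant, assuming SDBM_File (a DBM module bundled with Perl) stands in for whichever DBM module you prefer. The sample lines, the temporary directory, and the names %seen and @duplicates are made up for illustration; in real use you would read the two files instead of the arrays. Note that SDBM limits the size of each key/value pair, so it only suits short entries like the ones in this thread. The sketch also uses exists rather than defined, so an object whose value is 0 is still reported.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # DBM implementation bundled with Perl
use File::Temp qw(tempdir);

# Hypothetical sample data standing in for the two input files.
my @file1_lines = ("object1 23.12", "object2 0", "object3 7.5");
my @file2_lines = ("object2 11.0", "object4 3.3", "object1 23.12");

# Keep the lookup table on disk instead of in memory.
my $dir = tempdir(CLEANUP => 1);
tie(my %seen, 'SDBM_File', "$dir/seen", O_RDWR | O_CREAT, 0666)
    or die "Cannot tie SDBM file: $!";

# Pass 1: store each object's value from the first data set.
for my $line (@file1_lines) {
    my ($key, $value) = split ' ', $line;
    next unless defined $value;          # skip lines without two fields
    $seen{$key} = $value;
}

# Pass 2: collect objects that also appear in the second data set.
my @duplicates;
for my $line (@file2_lines) {
    my ($key, $value) = split ' ', $line;
    next unless defined $value;
    push @duplicates, "$key: $seen{$key} $value" if exists $seen{$key};
}

untie %seen;
print "$_\n" for @duplicates;
```

The loop bodies are the same as in the in-memory version; only the tie call changes where the hash lives.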

comparing two files for duplicate entries
on Sep 27, 2006
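    Or, just slightly different:
    open(FILE1, '<', 'file1') or die "Can't open file1: $!";
    open(FILE2, '<', 'file2') or die "Can't open file2: $!";
    while (<FILE1>) {
        $hash{$1} = $2 if /(\S+)\s+(\S+)/;    # only store when the line matched
    }
    while (<FILE2>) {
        next unless /(\S+)\s+(\S+)/;          # skip lines that don't match
        print "$1 $hash{$1} $2\n" if exists $hash{$1};    # exists also keeps values such as 0
    }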

    "one who asks a question is a fool for five minutes; one who does not ask a question remains a fool forever."

    mk at perl dot org dot br

