
Re: comparing two files for duplicate entries

by Melly (Hermit)
on Sep 27, 2006 at 16:34 UTC (#575190)

in reply to comparing two files for duplicate entries

I don't know how well it would scale, but I'd build a hash from the first file ($objects{'object1'} = 23.12;), then scan the second file. If an object in the second file is already a key in the hash, print out the key and both values.

Untested code, and I'm assuming space-delimited fields as per your examples:

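    use strict;
    use warnings;

    my %hash;

    # Pass 1: remember each object's value from the first file.
    open(my $in1, '<', 'file1') or die "Cannot open file1: $!";
    while (<$in1>) {
        $hash{$1} = $2 if /(\S+)\s+(\S+)/;
    }
    close $in1;

    # Pass 2: print every object that also appeared in the first file.
    open(my $in2, '<', 'file2') or die "Cannot open file2: $!";
    while (<$in2>) {
        if (/(\S+)\s+(\S+)/) {
            print "$1: $hash{$1} $2\n" if defined $hash{$1};
        }
    }
    close $in2;
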
Tom Melly,

Re^2: comparing two files for duplicate entries
by Fletch (Bishop) on Sep 27, 2006 at 16:50 UTC

    Yup, it's really just that simple (well, maybe exists rather than defined, since exists checks for the key itself rather than its value; but that's a minor nit). If your files are really, really big you probably want to tie the hash to something like Berkeley DB (via DB_File or BerkeleyDB) or one of the other DBM modules rather than reading everything into memory, but that's just an implementation detail; the basic algorithm remains the same.
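
    A minimal sketch of both suggestions, for illustration only: the lookup uses exists, and the hash is tied to an on-disk DBM file so the first file never has to fit in memory. SDBM_File is used here only because it ships with Perl; it stands in for whichever DBM module you prefer. The helper name common_entries and the $dbm_path argument are made up for this example, and the path should point at a fresh location, since entries left over from an earlier run would still be in the file.

```perl
use strict;
use warnings;
use Fcntl;        # O_RDWR, O_CREAT
use SDBM_File;    # bundled with Perl; other DBM modules are tied the same way

# Hypothetical helper: load "name value" lines from $in into an on-disk hash
# stored under $dbm_path, then return one report line for every name in
# $other that was also present in $in.
sub common_entries {
    my ($in, $other, $dbm_path) = @_;

    tie(my %seen, 'SDBM_File', $dbm_path, O_RDWR | O_CREAT, 0666)
        or die "Cannot tie $dbm_path: $!";

    # Pass 1: store each name and its value on disk.
    while (my $line = <$in>) {
        my ($name, $value) = $line =~ /(\S+)\s+(\S+)/ or next;
        $seen{$name} = $value;
    }

    # Pass 2: report names that are keys in the tied hash.
    my @report;
    while (my $line = <$other>) {
        my ($name, $value) = $line =~ /(\S+)\s+(\S+)/ or next;
        push @report, "$name: $seen{$name} $value" if exists $seen{$name};
    }

    untie %seen;
    return @report;
}
```

    Called with two open filehandles and a path, e.g. print "$_\n" for common_entries($fh1, $fh2, "/tmp/seen"); the algorithm is unchanged, only the storage behind the hash differs.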

Re^2: comparing two files for duplicate entries
by mk. (Friar) on Sep 27, 2006 at 16:54 UTC
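    or, just slightly different:

        open(FILE1, "file1") or die "file1: $!";
        open(FILE2, "file2") or die "file2: $!";
        while (<FILE1>) {
            /(\S*)\s+(\S*)/ or next;    # skip non-matching lines so $1 and $2 are never stale
            $hash{$1} = $2;
        }
        while (<FILE2>) {
            /(\S*)\s+(\S*)/ or next;
            print "$1 $hash{$1} $2\n" if exists $hash{$1};    # exists, so a value of 0 still prints
        }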

    "one who asks a question is a fool for five minutes; one who does not ask a question remains a fool forever."

    mk at perl dot org dot br
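
    The lookups discussed in this thread (a plain truth test, defined, and exists) agree on ordinary values and differ only at the edges. A small sketch of the difference; the keys and values are made-up sample data, and tests_passed is a hypothetical helper written just for this comparison.

```perl
use strict;
use warnings;

# Made-up sample data: 'zero' holds a false value, 'empty' holds no value.
my %hash = (plain => '23.12', zero => '0', empty => undef);

# Return the names of the tests that accept $key in the given hash.
sub tests_passed {
    my ($href, $key) = @_;
    my @passed;
    push @passed, 'true'    if $href->{$key};            # value is true
    push @passed, 'defined' if defined $href->{$key};    # value is not undef
    push @passed, 'exists'  if exists $href->{$key};     # key is present
    return "@passed";
}

print "$_: ", tests_passed(\%hash, $_), "\n" for qw(plain zero empty missing);
# plain: true defined exists
# zero: defined exists
# empty: exists
# missing:
```

    With values captured by a regex from the file, the case that matters in practice is 'zero': an object whose value is the string 0 would be silently dropped by a plain truth test.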
