http://qs321.pair.com?node_id=1025809


in reply to Help creating HASH for file comparison

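If you're on a Unix system, don't use Perl. Sort each file on the second column, then use the 'join' command twice to find (a) lines whose second-column key appears in both files, and (b) lines in one file whose key does not appear in the other. 'join' needs both inputs sorted on the join field, in the same (lexical, not numeric) order it uses to compare keys, which is why the sort step comes first. The first group can then be processed further as needed.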

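```shell
# join compares keys as strings, so sort lexically (no -n) on field 2 only,
# and use the same locale for sort and join
export LC_ALL=C

# sort the first file by the second column
sort -t , -k 2,2 f1 > s1
# sort the second file by the second column
sort -t , -k 2,2 f2 > s2

# print lines whose second column is common to both files
# (output: key, remaining fields of file 1, remaining fields of file 2)
join -t , -1 2 -2 2 s1 s2

# print lines of file 1 whose second column is NOT in file 2 (-v 2 for the reverse)
join -t , -1 2 -2 2 -v 1 s1 s2
```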
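To see the pieces fit together, here is a small self-contained run on made-up data. The contents of f1 and f2 below are hypothetical (not from the original question), and the script assumes a POSIX sh with sort, join and mktemp available. It sorts with `-k 2,2` in the C locale rather than numerically, because join expects its input in the same byte-wise order it compares keys in. It leaves its scratch files in a temporary directory.

```shell
#!/bin/sh
# Hypothetical sample data: comma-separated lines keyed on field 2.
set -e
export LC_ALL=C               # same collation for sort and join

dir=$(mktemp -d)              # scratch directory (not removed afterwards)
cd "$dir"

printf '%s\n' 'a,10,x' 'b,2,y' 'c,30,z' > f1
printf '%s\n' 'p,2,q' 'r,30,s' 't,4,u' > f2

sort -t , -k 2,2 f1 > s1      # keys in byte order: 10, 2, 30
sort -t , -k 2,2 f2 > s2      # keys in byte order: 2, 30, 4

echo "keys in both files:"
join -t , -1 2 -2 2 s1 s2     # prints 2,b,y,p,q then 30,c,z,r,s

echo "keys only in f1:"
join -t , -1 2 -2 2 -v 1 s1 s2

echo "keys only in f2:"
join -t , -1 2 -2 2 -v 2 s1 s2
```

Note that "10" sorts before "2" here; that string order is exactly what join expects, and a numeric sort would break it.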