PerlMonks  

Re: Remove Duplicates!! Please Help

by kirillm (Friar)
on Jan 03, 2008 at 15:37 UTC


in reply to Remove Duplicates!! Please Help

A one-liner:

perl -lnae 'print unless $seen{$F[0]}++' < file2.txt > file1.txt
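Spelled out as a script, the one-liner relies on the fact that $seen{$F[0]}++ returns the old count, which is false (0 or undef) only the first time a key turns up. This is a minimal sketch, assuming, as -a does, that the key is the first whitespace-separated field. The sub name first_per_key and the sample records are made up for illustration.

```perl
use strict;
use warnings;

# Keep the first line seen for each value of the first whitespace-separated
# field, mirroring: perl -lane 'print unless $seen{$F[0]}++'
# (first_per_key is a hypothetical name used only for this sketch.)
sub first_per_key {
    my @lines = @_;
    my (%seen, @out);
    for my $line (@lines) {
        my ($key) = split ' ', $line;    # same split that -a uses for @F
        # Post-increment returns the OLD count: false on the first
        # occurrence, true afterwards, so only the first line is kept.
        push @out, $line unless $seen{$key}++;
    }
    return @out;
}

# Hypothetical sample records: "<hostname> <value>"
my @sample = ('alpha 1', 'beta 2', 'alpha 3', 'gamma 4', 'beta 5');
print "$_\n" for first_per_key(@sample);
```

With the sample above this keeps alpha 1, beta 2 and gamma 4, in input order.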

Update: The OP needed the last entry when there are duplicates, but the solution above keeps the first one. Here's an alternative that keeps the last entry; it requires the Tie::Hash::Indexed module to be installed:

$ perl -MTie::Hash::Indexed -lane '
sub BEGIN {tie %seen, "Tie::Hash::Indexed"};
sub END {print $seen{$_} for keys %seen};
$seen{$F[0]} = $_' file2.txt > file1.txt
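If installing Tie::Hash::Indexed isn't an option, a similar last-entry-wins result can be had with core Perl alone: an array remembers the order in which keys first appear, and a plain hash holds the most recent line for each key. This is a sketch under the assumptions that the key is the first whitespace-separated field and that output should follow each key's first appearance; that ordering is a choice made here, not something the OP specified. The sub name last_per_key and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Keep the LAST line seen for each first-field key, emitting keys in the
# order they first appeared. Core Perl only: @order records key order,
# %last holds the most recent line for each key.
# (last_per_key is a hypothetical name used only for this sketch.)
sub last_per_key {
    my @lines = @_;
    my (%last, @order);
    for my $line (@lines) {
        my ($key) = split ' ', $line;
        push @order, $key unless exists $last{$key};   # first sighting fixes position
        $last{$key} = $line;                           # later sightings overwrite
    }
    return map { $last{$_} } @order;
}

# Hypothetical sample records: "<hostname> <value>"
my @sample = ('alpha 1', 'beta 2', 'alpha 3', 'gamma 4', 'beta 5');
print "$_\n" for last_per_key(@sample);
```

With the sample above this keeps alpha 3, beta 5 and gamma 4. Like the hash in the one-liner, it holds one line per distinct key in memory.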

Re^2: Remove Duplicates!! Please Help
by NetWallah (Canon) on Jan 03, 2008 at 16:16 UTC
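    Marginally less overhead if the data is already sorted by the first field, since only the previous key has to be remembered instead of a hash of every key seen: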
    perl -lane 'print unless $F[0] eq $prev;$prev=$F[0]' < file2.txt > file1.txt
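    As a script, the sorted-input version only compares each key with the one before it, so memory use stays constant however many distinct keys there are. The trade-off is that duplicates which are not adjacent (unsorted input) slip through. This is a minimal sketch, with first_per_run and the sample data as hypothetical names. The defined() check is an addition here: the one-liner runs without warnings, so comparing against an undefined $prev on the first record is silent there.

```perl
use strict;
use warnings;

# For input already sorted (grouped) by the first field: keep the first line
# of each run of equal keys. Only the previous key is remembered.
# (first_per_run is a hypothetical name used only for this sketch.)
sub first_per_run {
    my @lines = @_;
    my ($prev, @out);
    for my $line (@lines) {
        my ($key) = split ' ', $line;
        # defined() avoids an "uninitialized" warning on the first record
        push @out, $line unless defined $prev && $key eq $prev;
        $prev = $key;
    }
    return @out;
}

# Hypothetical sample records, already sorted by hostname
my @sorted = ('alpha 1', 'alpha 3', 'beta 2', 'beta 5', 'gamma 4');
print "$_\n" for first_per_run(@sorted);
```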
    Also, the OP's code handles the boundary condition for the first record, but not the last.
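    The OP's code isn't quoted in this thread, so what follows is only a hypothetical illustration of the end-of-input boundary: when keeping the last line of each run in sorted input, a line can be emitted only once the key changes, so the final run must be flushed after the loop or it is lost. The sub name last_per_run and the sample data are assumptions made for this sketch.

```perl
use strict;
use warnings;

# For sorted input, keep the LAST line of each run of equal keys.
# A run is known to be complete only when the key changes, so the final
# run must be flushed after the loop: that is the last-record boundary case.
# (last_per_run is a hypothetical name used only for this sketch.)
sub last_per_run {
    my @lines = @_;
    my ($prev_key, $prev_line, @out);
    for my $line (@lines) {
        my ($key) = split ' ', $line;
        # Key changed: the previous run is complete, emit its last line.
        push @out, $prev_line if defined $prev_key && $key ne $prev_key;
        ($prev_key, $prev_line) = ($key, $line);
    }
    # Boundary: flush the final run (nothing to flush if there was no input).
    push @out, $prev_line if defined $prev_line;
    return @out;
}

# Hypothetical sample records, already sorted by hostname
my @sorted = ('alpha 1', 'alpha 3', 'beta 2', 'beta 5', 'gamma 4');
print "$_\n" for last_per_run(@sorted);
```

    Dropping the flush after the loop would silently lose gamma 4, the only record in the last run.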

         "As you get older three things happen. The first is your memory goes, and I can't remember the other two... " - Sir Norman Wisdom

Re^2: Remove Duplicates!! Please Help
by alexm (Chaplain) on Jan 03, 2008 at 20:12 UTC
    This one-liner would not give the same results as requested: it prints the first entry found for a given hostname, though per the example it should be printing the last one (unless there's a typo and the 1781183799.xxx11 should be 1781183799.xxxx1 as the first entry).