PerlMonks  

Re: Remove Duplicates!! Please Help

by kirillm (Friar)
on Jan 03, 2008 at 15:37 UTC (#660217=note)


in reply to Remove Duplicates!! Please Help

A one-liner:

perl -lnae 'print unless $seen{$F[0]}++' < file2.txt > file1.txt

Update: The OP needed the last entry when there are duplicates, but the solution above keeps the first one. Here's an alternative solution that keeps the last entry. It requires the CPAN module Tie::Hash::Indexed, which keeps hash keys in insertion order:

$ perl -MTie::Hash::Indexed -lane 'BEGIN { tie %seen, "Tie::Hash::Indexed" } $seen{$F[0]} = $_; END { print $seen{$_} for keys %seen }' file2.txt > file1.txt

Re^2: Remove Duplicates!! Please Help
by NetWallah (Canon) on Jan 03, 2008 at 16:16 UTC
    Marginally less overhead if the data is sorted, since only the previous key has to be remembered rather than a hash of every key:
    perl -lane 'print unless $F[0] eq $prev; $prev = $F[0]' < file2.txt > file1.txt
    Also, the OP's code handles the boundary condition for the first record, but not the last.

         "As you get older three things happen. The first is your memory goes, and I can't remember the other two... " - Sir Norman Wisdom

Re^2: Remove Duplicates!! Please Help
by alexm (Chaplain) on Jan 03, 2008 at 20:12 UTC
    This one-liner would not give the same results as requested: it prints the first entry found for a given hostname, though per the example it should be printing the last one (unless there's a typo and the 1781183799.xxx11 should be 1781183799.xxxx1 as the first entry).
