http://qs321.pair.com?node_id=721710


in reply to Remove duplicate lines in a file

If the duplicate records will always be grouped together, you could do something like the following to keep track of the last record you've seen. I'm assuming that the first column is the key you care about. If you really care about the first 3 columns, you'll have to modify accordingly.
use strict;
use warnings;

my $last = '';
while (<>) {
    my @columns = split;
    next if $columns[0] eq $last;
    $last = $columns[0];
    print;
}
If the duplicate records don't necessarily follow each other, then use a hash to determine which ones you've already seen.
use strict;
use warnings;

my %seen;
while (<>) {
    my @columns = split;
    next if exists $seen{$columns[0]};
    $seen{$columns[0]} = 1;
    print;
}

Replies are listed 'Best First'.
Re^2: Remove duplicate lines in a file
by Anonymous Monk on Nov 05, 2008 at 17:10 UTC
    Thanks! That was great. But a quick follow-up: what if in some cases I'd like to remove the first entry (40087) instead? Do I need to sort the file by the 4th column first and then proceed, or is there a better way of doing it? Once again, thanks a lot for your reply.
      I'm not sure I understand the question. I think you're asking how to print the last entry instead of the first. This code should do that:
      use strict;
      use warnings;

      # prime with the first line, and record its key so the first
      # record's key is tracked like every other
      my $last_line = <>;
      my $last_key  = defined $last_line ? (split ' ', $last_line)[0] : undef;
      while (<>) {
          my $key = (split)[0];
          if ($key ne $last_key) {
              # new key: print the final line seen for the previous key
              print $last_line;
          }
          $last_line = $_;
          $last_key  = $key;
      }
      print $last_line if defined $last_line;   # the very last entry isn't printed inside the loop