Re: Remove duplicate lines in a file

by RhetTbull (Curate)
on Nov 05, 2008 at 16:16 UTC


in reply to Remove duplicate lines in a file

If the duplicate records will always be grouped together, you could do something like the following to keep track of the last record you've seen. I'm assuming that the first column is the key you care about. If you really care about the first 3 columns, you'll have to modify accordingly.
    use strict;
    use warnings;

    my $last = '';
    while (<>) {
        my @columns = split;
        next if $columns[0] eq $last;    # same key as the previous line: skip
        $last = $columns[0];
        print;
    }
If the duplicate records don't necessarily follow each other, then use a hash to determine which ones you've already seen.
    use strict;
    use warnings;

    my %seen;
    while (<>) {
        my @columns = split;
        next if exists $seen{$columns[0]};    # key already printed: skip
        $seen{$columns[0]} = 1;
        print;
    }
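If the first three columns together make up the key, one way to adapt the hash approach is to join them into a single hash key. This is only a minimal sketch: the helper name `dedupe_by_columns` and the sample records are made up for illustration, and it assumes whitespace-separated columns as in the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first line seen for each distinct
# combination of the first $n whitespace-separated columns.
sub dedupe_by_columns {
    my ($n, @lines) = @_;
    my (%seen, @kept);
    for my $line (@lines) {
        my @columns = split ' ', $line;
        # Do not slice past the end of short lines.
        my $last = $#columns < $n - 1 ? $#columns : $n - 1;
        # Fields cannot contain a tab after splitting on whitespace,
        # so a tab is a safe separator for the composite key.
        my $key = join "\t", @columns[0 .. $last];
        next if $seen{$key}++;    # composite key already seen: skip
        push @kept, $line;
    }
    return @kept;
}

# Invented sample records; only the first three columns form the key.
my @sample = (
    "a 1 x 40087\n",
    "a 1 x 40090\n",
    "a 2 x 40100\n",
    "a 1 x 40120\n",
);
print dedupe_by_columns(3, @sample);
```

To run it over a file instead of the in-memory sample, something like `print dedupe_by_columns(3, <>);` should work, at the cost of reading the whole input into memory first.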

Re^2: Remove duplicate lines in a file
by Anonymous Monk on Nov 05, 2008 at 17:10 UTC
    Thanks! That was great. Just a quick thought: what if, in some cases, I would like to remove the first entry (40087) instead? Do I need to sort the file by the 4th column first and then proceed, or is there a better way of doing it? Once again, thanks a lot for your reply.
      I'm not sure I understand the question. I think you're asking how to print the last entry instead of the first. This code should do that:
      use strict;
      use warnings;

      my $last_line = <>;                # get first line
      exit unless defined $last_line;    # nothing to do on empty input
      my $last_key = (split ' ', $last_line)[0];

      while (<>) {
          my $key = (split)[0];
          if ($key ne $last_key) {
              # new key, print the last line from the old key
              print $last_line;
          }
          $last_line = $_;
          $last_key  = $key;
      }
      print $last_line;    # very last entry won't get printed in the while loop
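The code above relies on duplicates being grouped together. If they are not, a hash can remember the most recent line for each key, which avoids sorting the file first. A rough sketch, assuming the first column is the key; the helper name `keep_last_per_key` and the sample records are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the last line seen for each key
# (first column), emitted in order of each key's first appearance.
sub keep_last_per_key {
    my @lines = @_;
    my (%last_line, @order);
    for my $line (@lines) {
        my $key = (split ' ', $line)[0];
        next unless defined $key;    # skip blank lines
        push @order, $key unless exists $last_line{$key};
        $last_line{$key} = $line;    # later lines overwrite earlier ones
    }
    return map { $last_line{$_} } @order;
}

# Invented sample records with non-adjacent duplicates of key "k1".
my @sample = (
    "k1 40087\n",
    "k2 50000\n",
    "k1 40090\n",
);
print keep_last_per_key(@sample);
```

This holds one line per distinct key in memory, so it trades memory for not having to sort the input.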
