http://qs321.pair.com?node_id=127894

peppiv has asked for the wisdom of the Perl Monks concerning the following question:

I have a text file with many items:

Tofu
Ginseng
Marbles
Green Tea
Ginseng
IChing
etc.

How can I remove duplicate data from this file?

Thanks in advance

Uncle Peppi

Replies are listed 'Best First'.
Re: Remove duplicate data from text file
by mirod (Canon) on Nov 28, 2001 at 00:24 UTC

    If you are on Unix and don't care about the order of the output (or want it sorted), sort -u <file> is your friend.

    Otherwise you can try:

    perl -n -e 'print unless $seen{$_}++;' <file>
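
For readers new to the idiom: the one-liner works because `$seen{$_}++` returns the old count (0, which is false, the first time a line appears) and only then increments it. A minimal sketch of the same logic as a standalone subroutine follows. The name `unique_in_order` and the sample list are illustrative assumptions, not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: return the items with later duplicates dropped,
# keeping first-seen order (the same test the one-liner uses).
sub unique_in_order {
    my %seen;
    # $seen{$_}++ is false (0) the first time a value appears, true afterwards.
    return grep { !$seen{$_}++ } @_;
}

my @items = ('Tofu', 'Ginseng', 'Marbles', 'Green Tea', 'Ginseng', 'IChing');
print "$_\n" for unique_in_order(@items);
```

Unlike `sort -u`, this keeps the items in their original order and needs one hash entry per distinct line.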
Re: Remove duplicate data from text file
by dragonchild (Archbishop) on Nov 28, 2001 at 00:35 UTC
    1. Read the file into an array
    2. Create a hash whose keys are the values in the array
    3. Create an array from the keys of the hash
    4. Print the array out to a file
    The details are left as an exercise for the reader.
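
One possible way to fill in the four steps above, as a rough sketch rather than the poster's own solution. The subroutine name `dedupe_file` and its arguments are assumptions made for illustration. The `sort` in step 3 is also an addition, since hash keys come back in no particular order.

```perl
use strict;
use warnings;

# Hypothetical helper following the four steps; both file names are
# supplied by the caller.
sub dedupe_file {
    my ($in_file, $out_file) = @_;

    # 1. Read the file into an array, one line per element.
    open my $in_fh, '<', $in_file or die "Can't read '$in_file': $!";
    chomp(my @lines = <$in_fh>);
    close $in_fh;

    # 2. Create a hash whose keys are the values in the array.
    my %unique;
    @unique{@lines} = ();

    # 3. Create an array from the keys of the hash (sorted here so the
    #    output order is predictable).
    my @deduped = sort keys %unique;

    # 4. Print the array out to a file.
    open my $out_fh, '>', $out_file or die "Can't write '$out_file': $!";
    print {$out_fh} "$_\n" for @deduped;
    close $out_fh or die "Can't close '$out_file': $!";

    return @deduped;
}
```

A call such as `dedupe_file('in.txt', 'out.txt')` (placeholder file names) would write the distinct lines of in.txt to out.txt. The original line order is lost with this approach.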

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

    Code
    by ggoebel (Sexton) on Nov 28, 2001 at 01:31 UTC
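      my $filename = ...;
      local $/;    # slurp the whole file
      open IN, $filename or die "$filename: $!";
      my $text = <IN>;
      my %words;
      @words{split /\n/, $text} = ();    # split on lines so "Green Tea" stays one item
      my @words = keys %words;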
Re: Remove duplicate data from text file
by CharlesClarkson (Curate) on Nov 28, 2001 at 09:02 UTC

    If you need to maintain the original order and delete subsequent duplicates, you could try this.

    {
        my %seen;
        my ($in_file, $out_file) = qw| in.txt out.txt|;
        open my $in_fh, '<', $in_file or die "$in_file: $!";
        open my $out_fh, '>', $out_file or die "$out_file: $!";
        while ( <$in_fh> ) {
            # print a line only the first time it is seen
            print $out_fh $_ unless $seen{$_}++;
        }
    }



    HTH,
    Charles K. Clarkson


    Why isn't phonetic spelled the way it sounds?
Re: Remove duplicate data from text file
by jlongino (Parson) on Nov 28, 2001 at 08:26 UTC
    You could use this as well:
    use strict;
    my %hash;
    my $file = 'infile.txt';
    # 'or', not '||': '||' binds to the string, so die would never run
    open INFILE, "<$file" or die "Can't open '$file' $!\n";
    $hash{$_}++ while <INFILE>;
    close INFILE;
    open OUTFILE, ">$file" or die "Can't open '$file' $!\n";
    print OUTFILE $_ foreach keys %hash;    # note: original order is not kept

    --Jim

Re: Remove duplicate data from text file
by Dogma (Pilgrim) on Nov 28, 2001 at 00:28 UTC
    Have you tried using vi? {ED: Hey this was a joke, enough with the - votes... I admit I was bad}
      Please excuse my ignorance.

      Is vi a command line option?
        (I suspect that you're just joking with me now but...) Vi is by far the best (and most cryptic) text editor. It is also one half of the eternal Vi vs. Emacs debate. Although Emacs is really more of a religion than a text editor. :)