Remove duplicate data from text file

peppiv has asked for the wisdom of the Perl Monks concerning the following question:

I have a text file with many items:

Tofu
Ginseng
Marbles
Green Tea
Ginseng
IChing
etc.

How can I remove duplicate data from this file?

Thanks in advance

Uncle Peppi

Re: Remove duplicate data from text file by mirod (Canon) on Nov 28, 2001 at 00:24 UTC
If you are on Unix and don't care about the order of the output (or want it sorted), `sort -u <file>` is your friend. Otherwise you can try `perl -n -e 'print unless $seen{$_}++;' <file>`, which keeps the first occurrence of each line in its original order.
Re: Remove duplicate data from text file by dragonchild (Archbishop) on Nov 28, 2001 at 00:35 UTC
1. Read the file into an array.
2. Create a hash whose keys are the values in the array.
3. Create an array from the keys of the hash.
4. Print the array out to a file.

The details are left as an exercise for the reader.

------
We are the carpenters and bricklayers of the Information Age.

Don't go borrowing trouble. For programmers, this means: worry only about what you need to implement.
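For anyone who wants to check their answer to that exercise, here is one possible sketch of the four steps. The sub names `unique_items` and `dedupe_file` are made up for illustration and are not from the reply above. Because the distinct items come back from `keys`, the output order is not the input order.

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct items of a list.
# Order is NOT preserved, because hash keys come back in no particular order.
sub unique_items {
    my @items = @_;
    my %seen;
    @seen{@items} = ();    # hash slice: every item becomes a key
    return keys %seen;
}

# Hypothetical wrapper: read lines from $in_file and write the distinct
# lines to $out_file. Returns the number of distinct lines written.
sub dedupe_file {
    my ($in_file, $out_file) = @_;

    open my $in_fh, '<', $in_file or die "$in_file: $!";
    chomp(my @lines = <$in_fh>);           # step 1: file into an array
    close $in_fh;

    my @unique = unique_items(@lines);     # steps 2 and 3: hash keys, then array

    open my $out_fh, '>', $out_file or die "$out_file: $!";
    print {$out_fh} "$_\n" for @unique;    # step 4: array out to a file
    close $out_fh or die "$out_file: $!";

    return scalar @unique;
}
```

A call such as `dedupe_file('in.txt', 'out.txt')` would do the whole job; both file names are placeholders.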
Code by ggoebel (Sexton) on Nov 28, 2001 at 01:31 UTC
`my $filename = ...; open my $in, '<', $filename or die "$filename: $!"; my $text = do { local $/; <$in> }; my %words; @words{ split /\n/, $text } = (); my @words = keys %words;`
Re: Remove duplicate data from text file by CharlesClarkson (Curate) on Nov 28, 2001 at 09:02 UTC
If you need to maintain the original order and delete subsequent duplicates, you could try this:

`{ my %seen; my ($in_file, $out_file) = qw| in.txt out.txt |; open my $in_fh, '<', $in_file or die "$in_file: $!"; open my $out_fh, '>', $out_file or die "$out_file: $!"; while ( <$in_fh> ) { print $out_fh $_ unless $seen{$_}++; } }`

HTH,
Charles K. Clarkson

Why isn't phonetic spelled the way it sounds?
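One edge case with comparing raw lines: if the last line of the file has no trailing newline, a final "Ginseng" will not match an earlier "Ginseng\n" and slips through as a "new" item. Below is a minimal sketch of the same order-preserving idea that compares lines with their endings stripped. The sub name `unique_in_order` and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first occurrence of each line, in original order.
# Line endings (LF or CRLF) are stripped before comparing, so the last line of a
# file without a trailing newline still matches an earlier copy of itself.
sub unique_in_order {
    my @lines = @_;
    my %seen;
    my @kept;
    for my $line (@lines) {
        (my $key = $line) =~ s/\r?\n\z//;    # copy of the line without its ending
        push @kept, $key unless $seen{$key}++;
    }
    return @kept;
}

# Sample input: note the final "Ginseng" has no newline.
my @sample = ("Tofu\n", "Ginseng\n", "Marbles\n", "Green Tea\n", "Ginseng");
print "$_\n" for unique_in_order(@sample);
# prints Tofu, Ginseng, Marbles, Green Tea (one per line)
```

The returned items have no line endings, so the caller adds "\n" back when printing.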
Re: Remove duplicate data from text file by jlongino (Parson) on Nov 28, 2001 at 08:26 UTC
You could use this as well:

`use strict; my %hash; my $file = 'infile.txt'; open INFILE, "<$file" or die "Can't open '$file' $!\n"; $hash{$_}++ while <INFILE>; close INFILE; open OUTFILE, ">$file" or die "Can't open '$file' $!\n"; print OUTFILE "$_" foreach (keys %hash);`

--Jim
Re: Remove duplicate data from text file by Dogma (Pilgrim) on Nov 28, 2001 at 00:28 UTC
Have you tried using vi? {ED: Hey this was a joke, enough with the - votes... I admit I was bad}
Re: Re: Remove duplicate data from text file by peppiv (Curate) on Nov 28, 2001 at 00:34 UTC
Please excuse my ignorance. Is vi a command line option?
Re: Re: Re: Remove duplicate data from text file by Dogma (Pilgrim) on Nov 28, 2001 at 00:46 UTC
(I suspect that you're just joking with me now but...) Vi is by far the best (and most cryptic) text editor. It is also one half of the eternal Vi vs. Emacs debate. Although Emacs is really more of a religion than a text editor. :)