Detecting duplicate keywords passed in a form

hacker has asked for the wisdom of the Perl Monks concerning the following question:

I'm designing a portal that allows a user to configure a URL and some metadata associated with it in their user account, such as depth, maximum colors, follow-offsite links, and so on.

This portal stores URLs with this metadata in several tables. One of these contains "keywords" for each URL in the system (roughly 886 URLs thus far). When a user enters a new URL that is not yet in the system, they can add keywords to that record so that others searching can find it. The keywords are entered as a comma-separated list in the URL entry form.

My question is, how do I determine that a list of keywords contains duplicates, and remove them, or prompt the user to remove them? For example:

   my @keywords = ('this','that','other','foo','this',   # <- dupe
                   'bar');

Obviously this contains a dupe 'this', which should be noted and removed, or pointed out to the user.

Another example:

   my @keywords = ('this','that','other','foo','THIS',   # <- uc(dupe)
                   'bar');

Also a dupe, 'THIS', though uppercase.

I initially thought of lowercasing each word parsed from the submitted list, then walking the list and comparing each word to the one before it, but I don't think that will work for longer lists of keywords.

Has anyone done this before? Any insight as to how this should be designed?

Update:

This code now appears to do what I want, thanks to all who have helped arrive at a solution.

   use strict;

   my (@keywords, %keywords);

   @keywords = ('this', 'that', 'other', 
                'foo', 'THIS', 'bar');
   @keywords{map lc,@keywords}=();
   @keywords = sort keys %keywords;

   foreach my $word (@keywords) {
           print $word . "\n";
   }

Also, thanks to castaway on CB for pointing out that this is covered in 'perldoc -q duplicate' ("How can I remove duplicate elements from a list or array?").
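The code in the update removes duplicates silently and sorts the result. For the other half of the question (pointing the duplicates out to the user), here is a minimal sketch. It assumes the form field arrives as a single comma-separated string; the helper name `split_keywords` and the sample input are hypothetical and not taken from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split a comma-separated
# keyword string, normalise each keyword, and separate the unique
# keywords from the ones that were entered more than once.
sub split_keywords {
    my ($raw) = @_;
    my (%seen, @unique, @dupes);

    for my $word (split /,/, $raw) {
        $word =~ s/^\s+|\s+$//g;     # trim surrounding whitespace
        next unless length $word;    # skip empty entries such as "a,,b"
        my $key = lc $word;          # compare case-insensitively

        if ($seen{$key}++) {
            # already seen: report each repeated keyword only once
            push @dupes, $key if $seen{$key} == 2;
        }
        else {
            push @unique, $key;      # first occurrence, keep input order
        }
    }
    return (\@unique, \@dupes);
}

my ($unique, $dupes) = split_keywords('this, that, other, foo, THIS, bar');
print "keywords:   @$unique\n";
print "duplicates: @$dupes\n" if @$dupes;
```

Returning the duplicates separately leaves it up to the caller whether to drop them quietly or send the user back to the form with a message.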


Replies are listed 'Best First'.
Re: Detecting duplicate keywords passed in a form by diotalevi (Canon) on Mar 21, 2003 at 13:12 UTC
Use the uniqueness property of a hash to filter duplicates.

   my @keywords = keys %{{ map { lc, undef } qw(this that other foo this bar)}} ;

Update: I changed `$_ => undef` to `lc, undef` to be correctly case-insensitive.

Re: Re: Detecting duplicate keywords passed in a form by flounder99 (Friar) on Mar 21, 2003 at 16:36 UTC
diotalevi's method loses the order. This maintains the order.

   my @keywords;
   { #closure so %temphash gets garbage collected
       my %temphash;
       foreach ( map {lc} qw(this that other foo THIS bar)) {
           next if exists $temphash{$_};
           $temphash{$_} = undef;
           push @keywords, $_;
       }
   }
   print join " ", @keywords;
   ___OUTPUT____
   this that other foo bar

-- flounder

Re: Detecting duplicate keywords passed in a form by huguei (Scribe) on Mar 21, 2003 at 14:34 UTC
Something like:

   foreach my $keyword (@keywords) {
       $dups{$keyword}++;
       if ($dups{$keyword} > 1) {
           # duplicate keyword!
           ...
       }
   }

hugo.

Re: Re: Detecting duplicate keywords passed in a form by diotalevi (Canon) on Mar 21, 2003 at 14:42 UTC
That's better written using exists() and undef values for %dups. You've written more code than you had to. There's something to be said for only writing the necessary bits.

   my @keywords = ...;
   my %dups;
   for my $keyword (@keywords) {
       if (exists $dups{$keyword}) {
           # dup
       }
       else {
           $dups{$keyword} = undef;
       }
   }

Re: Detecting duplicate keywords passed in a form by dakkar (Hermit) on Mar 21, 2003 at 15:01 UTC
   @keywords{map lc,@keywords}=();
   @keywords = keys %keywords;

This maps all keywords to lowercase, puts them as keys to a hash (I believe this to be the fastest and most compact way), then extracts them again. Note that you would lose the order of the keywords.

-- dakkar - Mobilis in mobile

Re: Detecting duplicate keywords passed in a form by hiseldl (Priest) on Mar 21, 2003 at 14:42 UTC
   my %keywords;
   ...
   $word = uc($word);   # or lc(); uc() returns the new string, so assign it back
   $keywords{$word}++;
   if ($keywords{$word} > 1) {
       # deal with a duplicate
   }
   ...

-- hiseldl
What time is it? It's Camel Time!

(dkubb) Re: (1) Detecting duplicate keywords passed in a form by dkubb (Deacon) on Mar 22, 2003 at 10:19 UTC
Remove the duplicate words in a list and preserve the original order of the words:

   my %seen;
   my @keywords = grep ! $seen{lc()}++, qw(this that other foo THIS bar);

Dan Kubb, Perl Programmer

Re: Detecting duplicate keywords passed in a form by JayBonci (Curate) on Mar 22, 2003 at 10:33 UTC
Keeping order (but only keeping the last seen element of a string):

   #!/usr/bin/perl -w
   use strict;

   my ($i, $keywords, @finalwords);
   %$keywords = map { lc($_) => $i++} qw/foo bar bat hello foo HELLO/;
   print join " ", @finalwords = sort {$keywords->{$a} <=> $keywords->{$b}} keys %$keywords;

My comment here is that you can use the hash's value to preserve order as well. Comments welcome.

--jaybonci
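
If the position of the first occurrence is wanted instead of the last, the same idea of storing an index in the hash value still works. The following is only a sketch of that variation, not JayBonci's code; the sub name `first_seen_order` and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical variation: remember the index at which each lowercased
# keyword first appears, then sort the unique keywords by that index.
sub first_seen_order {
    my @words = @_;
    my %first_pos;
    my $i = 0;

    for my $word (map { lc } @words) {
        # only record the position the first time a keyword is seen
        $first_pos{$word} = $i unless exists $first_pos{$word};
        $i++;
    }
    return sort { $first_pos{$a} <=> $first_pos{$b} } keys %first_pos;
}

my @finalwords = first_seen_order(qw/foo bar bat hello foo HELLO/);
print join(" ", @finalwords), "\n";    # prints "foo bar bat hello"
```

The only change from the map version is the `exists` check, which stops a later duplicate from overwriting the stored position.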
