hacker has asked for the wisdom of the Perl Monks concerning the following question:

I'm designing a portal that allows a user to configure a URL and some metadata associated with it in their user account, such as depth, maximum colors, follow-offsite links, and so on.

This portal stores URLs with this metadata in several tables. One of these contains "keywords" for each URL in the system (roughly 886 URLs thus far). When a user enters a new URL that is not yet in the system, they can enter keywords for that record so that others searching can find it. The keywords are entered as a comma-separated list in the URL entry form.
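A minimal sketch of turning the comma-separated form field into a list, assuming the raw field value is already in a scalar. The sub name parse_keywords and the sample input are hypothetical. It splits on commas, trims surrounding whitespace, and drops empty entries so that stray or doubled commas don't produce blank keywords.

```perl
use strict;
use warnings;

# Hypothetical helper: split a comma-separated form field into keywords.
sub parse_keywords {
    my ($raw) = @_;
    my @words = split /,/, $raw;      # split drops trailing empty fields
    for (@words) {
        s/^\s+//;                     # trim leading whitespace
        s/\s+$//;                     # trim trailing whitespace
    }
    return grep { length } @words;    # drop empty entries, e.g. from ",,"
}

my @keywords = parse_keywords('this, that ,other,,foo, THIS , bar,');
print join('|', @keywords), "\n";   # prints "this|that|other|foo|THIS|bar"
```

Note that this only parses the field. Case is left untouched, so 'this' and 'THIS' are still both present at this stage.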

My question is, how do I determine that a list of keywords contains duplicates, and remove them, or prompt the user to remove them? For example:

my @keywords = ('this','that','other','foo','this', 'bar');   # dupe: 'this'

Obviously this contains a dupe 'this', which should be noted and removed, or pointed out to the user.

Another example:

my @keywords = ('this','that','other','foo','THIS', 'bar');   # dupe: 'THIS' (differs only in case)

Also a dupe, 'THIS', though uppercase.
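One way to point duplicates out to the user, rather than silently dropping them, is to count each keyword's lowercased form and collect every word whose form has already been seen. This is a sketch that assumes case-insensitive comparison is what you want. The sub name find_duplicates and the message text are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the entries of @_ that repeat an earlier
# entry, comparing case-insensitively, in the order they were entered.
sub find_duplicates {
    my %seen;
    return grep { $seen{lc $_}++ } @_;   # false on first sighting, true afterwards
}

my @keywords = ('this', 'that', 'other', 'foo', 'THIS', 'bar');
my @dupes    = find_duplicates(@keywords);
if (@dupes) {
    # prints "Please remove duplicate keyword(s): THIS"
    print "Please remove duplicate keyword(s): @dupes\n";
}
```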

I initially thought of lowercasing each word parsed from the submitted list, then walking the list and comparing each word to the one before it. I don't think that will work for longer lists of keywords, though, because it only catches duplicates that happen to sit next to each other.
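The adjacent-comparison idea does work for lists of any length if the list is sorted first, because sorting places equal (lowercased) words next to each other. Here is a sketch of that variant, which assumes lowercase, alphabetically sorted output is acceptable. The trade-off is an O(n log n) sort, whereas a hash lookup needs only a single pass.

```perl
use strict;
use warnings;

my @keywords = ('this', 'that', 'other', 'foo', 'THIS', 'bar');

# Lowercase, then sort, so that case-insensitive duplicates end up adjacent.
my @sorted = sort { $a cmp $b } map { lc($_) } @keywords;

# Keep a word only if it differs from the last word kept.
my @unique;
for my $word (@sorted) {
    push @unique, $word unless @unique && $unique[-1] eq $word;
}
print "@unique\n";   # prints "bar foo other that this"
```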

Has anyone done this before? Any insight as to how this should be designed?


This code now appears to do what I want. Thanks to everyone who helped me arrive at a solution.

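use strict;
use warnings;

my (@keywords, %keywords);
@keywords = ('this', 'that', 'other', 'foo', 'THIS', 'bar');
@keywords{map lc, @keywords} = ();   # lowercased words become hash keys; dupes collapse
@keywords = sort keys %keywords;     # distinct words, sorted alphabetically
foreach my $word (@keywords) {
    print $word . "\n";
}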

Also, thanks to castaway in the CB (Chatterbox): this is covered in the Perl FAQ, 'perldoc -q duplicate', under "How can I remove duplicate elements from a list or array?"

Re: Detecting duplicate keywords passed in a form
by diotalevi (Canon) on Mar 21, 2003 at 13:12 UTC

    Use the uniqueness property of a hash to filter duplicates.

    my @keywords = keys %{{ map { lc, undef } qw(this that other foo this bar)}} ;

    Update: I changed $_ => undef to lc, undef so the filtering is case-insensitive.

      diotalevi's method loses the order of the keywords. This version preserves it:
      my @keywords;
      {   # bare block, so %temphash goes out of scope (and can be freed) afterwards
          my %temphash;
          foreach (map {lc} qw(this that other foo THIS bar)) {
              next if exists $temphash{$_};   # already seen: skip it
              $temphash{$_} = undef;          # mark as seen
              push @keywords, $_;             # keep first occurrence, in order
          }
      }
      print join " ", @keywords;

      Output:
      this that other foo bar



Re: Detecting duplicate keywords passed in a form
by huguei (Scribe) on Mar 21, 2003 at 14:34 UTC
    something like
    my %dups;
    foreach my $keyword (@keywords) {
        $dups{lc $keyword}++;              # count each keyword, case-insensitively
        if ($dups{lc $keyword} > 1) {
            # duplicate keyword!
            ...
        }
    }

    hugo.

      That's better written using exists() and undef values for %dups. You've written more code than you had to. There's something to be said for only writing the necessary bits.

      my @keywords = ...;
      my %dups;
      for my $keyword (@keywords) {
          if (exists $dups{$keyword}) {
              # dup
          }
          else {
              $dups{$keyword} = undef;
          }
      }
Re: Detecting duplicate keywords passed in a form
by dakkar (Hermit) on Mar 21, 2003 at 15:01 UTC
    @keywords{map lc, @keywords} = ();
    @keywords = keys %keywords;

    This maps all keywords to lowercase, uses them as the keys of a hash (I believe this to be the fastest and most compact way), then extracts them again.

    Note that you would lose the order of the keywords.
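Here is what the hash-slice assignment does in this case. @keywords{LIST} = () creates one key per element of LIST. Because the right-hand side is empty, every value is left undef. Repeated keys collapse into one, which is where the deduplication comes from. It also means exists, not defined, is the right membership test afterwards. This small demonstration uses the question's sample data.

```perl
use strict;
use warnings;

my @keywords = ('this', 'that', 'other', 'foo', 'THIS', 'bar');
my %keywords;
@keywords{ map { lc($_) } @keywords } = ();   # six keys assigned, 'this' twice

print scalar(keys %keywords), "\n";                    # prints 5
print exists($keywords{this})  ? "yes" : "no", "\n";   # prints "yes"
print defined($keywords{this}) ? "yes" : "no", "\n";   # prints "no": the value is undef
```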

            dakkar - Mobilis in mobile
Re: Detecting duplicate keywords passed in a form
by hiseldl (Priest) on Mar 21, 2003 at 14:42 UTC
    my %keywords;
    ...
    $word = uc($word);   # or lc(); a bare uc($word) would discard the result
    $keywords{$word}++;
    if ($keywords{$word} > 1) {
        # deal with a duplicate
    }
    ...

    What time is it? It's Camel Time!

(dkubb) Re: (1) Detecting duplicate keywords passed in a form
by dkubb (Deacon) on Mar 22, 2003 at 10:19 UTC

    Remove the duplicate words in a list and preserve the original order of the words:

    my %seen;
    my @keywords = grep ! $seen{lc()}++, qw(this that other foo THIS bar);

    Dan Kubb, Perl Programmer

Re: Detecting duplicate keywords passed in a form
by JayBonci (Curate) on Mar 22, 2003 at 10:33 UTC
    Keeping order (but each word takes the position of its last occurrence):
    #!/usr/bin/perl -w
    use strict;

    my ($i, $keywords, @finalwords);
    # value = index of the word's last occurrence (later duplicates overwrite it)
    %$keywords = map { lc($_) => $i++ } qw/foo bar bat hello foo HELLO/;
    print join " ", @finalwords = sort { $keywords->{$a} <=> $keywords->{$b} } keys %$keywords;
    My comment here is that you can use the hash's value to preserve order as well. Comments welcome.
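Building on that idea, the hash value can record the first occurrence instead of the last. The only change is to assign a position only when the lowercased word has not been seen yet. The variable names in this sketch of that variant are illustrative. For this input it yields the order of first appearance rather than last.

```perl
use strict;
use warnings;

my @words = qw/foo bar bat hello foo HELLO/;

# Map each lowercased word to the index of its first occurrence.
my %position;
for my $i (0 .. $#words) {
    my $word = lc $words[$i];
    $position{$word} = $i unless exists $position{$word};
}

# Sort the distinct words by that index to restore input order.
my @finalwords = sort { $position{$a} <=> $position{$b} } keys %position;
print join(' ', @finalwords), "\n";   # prints "foo bar bat hello"
```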