Re: How do you find duplicates in a string?

In this particular example, since the words are tab-separated, I'd probably use a brute-force approach and split the string, then step through the resulting array, incrementing a hash. Admittedly, this is not a very efficient solution for any significant amount of data.

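use strict;
use warnings;
# Double-quoted so that \t is a real tab; split /\t/ needs tabs, not spaces.
my $string = "LOCAL\tAntony\t17\tAntony\t23\t1569";
my @tokens = split /\t/, $string;
my %duphash;
foreach (@tokens) {
  $duphash{$_}++;    # count occurrences of each token
}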

Now %duphash has a 1 for each unique value and something greater than 1 for any duplicates.

Update: I ignored your requirement for skipping over numbers -- adjust accordingly.
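
One way to make that adjustment, as a minimal sketch: skip any token that looks like a number before counting it. The sub name count_words is only illustrative, and looks_like_number comes from the core Scalar::Util module; whether "number" should also cover things like 1e5 or -3 depends on your data.

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

# Count how often each non-numeric, tab-separated token occurs.
# (count_words is a hypothetical helper name, not part of the original post.)
sub count_words {
    my ($string) = @_;
    my %count;
    for my $token ( split /\t/, $string ) {
        next if looks_like_number($token);    # skip 17, 23, 1569, ...
        $count{$token}++;
    }
    return \%count;
}

my $string = "LOCAL\tAntony\t17\tAntony\t23\t1569";
my $count  = count_words($string);
print "$_ => $count->{$_}\n" for sort keys %$count;
```

With the sample string this leaves only LOCAL and Antony in the hash, and Antony is the only key with a count above 1.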

Re^2: How do you find duplicates in a string? by jdporter (Paladin) on Sep 21, 2006 at 16:37 UTC
"Admittedly, this is not a very efficient solution for any significant amount of data."
For large (and increasingly large) amounts of data, it's better than the regex-with-backref solution (e.g. as presented by davido). Put simply, the split-hash approach is O(n), whereas the regex-backref approach is O(n²).
We're building the house of the future together.
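
To check that claim on your own machine, here is a rough sketch using the core Benchmark module. The sub names and the synthetic input (n distinct alphabetic words, which should be the worst case for the lookahead) are assumptions made for illustration; no timings are claimed here, and the numbers will vary by Perl version and hardware.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Split-and-hash: a single pass over the tokens.
# Only alphabetic tokens are counted so the result matches the regex version.
sub dups_by_hash {
    my ($string) = @_;
    my %seen;
    $seen{$_}++ for grep { /^[[:alpha:]]+$/ } split /\t/, $string;
    my $repeats = 0;
    $repeats += $_ - 1 for values %seen;
    return $repeats;
}

# Regex with a backreference: every word triggers a scan of the rest of the string.
sub dups_by_regex {
    my ($string) = @_;
    my $count = 0;
    $count++ while $string =~ m/ \b([[:alpha:]]+)\b (?=.*?\b\1\b) /xg;
    return $count;
}

# Build n distinct words ('aaa', 'aab', ...) and time both subs on them.
for my $n ( 100, 200, 400 ) {
    my @words;
    my $word = 'aaa';
    push @words, $word++ for 1 .. $n;    # magic string increment
    my $string = join "\t", @words;

    print "--- $n words ---\n";
    cmpthese( -0.2, {                    # at least 0.2 CPU seconds per sub
        hash  => sub { dups_by_hash($string) },
        regex => sub { dups_by_regex($string) },
    } );
}
```

If the O(n) versus O(n²) reasoning holds, the gap between the two rates should widen as the word count doubles.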
Re^3: How do you find duplicates in a string? by Anonymous Monk on Sep 21, 2006 at 16:54 UTC
Sorry guys, my mistake. All I want to know is whether anything appears in the string more than once. I understand that you split the string using tab as the delimiter, but what must I do to check whether the resulting array, which contains all the elements of the string, has any duplicates in it? Just that: I don't want to know what the duplicates are, I only want to know if there are any...
Re^4: How do you find duplicates in a string? by davido (Cardinal) on Sep 21, 2006 at 18:50 UTC
The hash approach is better for that, but I went ahead and re-implemented the regexp approach anyway, in case someone is interested in a pattern-matching solution rather than an 'equal key' solution. If you want to continue with the regexp approach, this solution counts the number of duplicate words. I modified the RE a little so that it counts "Antony Antony Antony" as two duplicates (Antony is repeated twice after the original). "Antony Antony Hank Antony Hank Mark" would count 3: Antony has two repeats, and Hank has one. `use warnings; use strict; my $string='LOCAL Antony 17 Antony 23 1569'; my $count = 0; $count++ while $string =~ m/ \b([[:alpha:]]+)\b (?=.*?\b\1\b) /xg; print $count, "\n";` Dave
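
If only a yes/no answer is wanted, the same pattern can be used without the /g modifier and the counter, since a single successful match is enough. This is a sketch rather than davido's own code; the sub name has_repeated_word is made up for the example.

```perl
use strict;
use warnings;

# Return 1 if any alphabetic word occurs again later in the string, else 0.
# (has_repeated_word is a hypothetical name used only for this sketch.)
sub has_repeated_word {
    my ($string) = @_;
    return $string =~ m/ \b([[:alpha:]]+)\b (?=.*?\b\1\b) /x ? 1 : 0;
}

for my $s ( "LOCAL\tAntony\t17\tAntony\t23\t1569",
            "LOCAL\tAntony\t17\tHank\t23\t1569" ) {
    my $verdict = has_repeated_word($s) ? 'duplicates' : 'no duplicates';
    print "$verdict\n";
}
```

The first string reports duplicates (Antony appears twice); the second does not. Numbers are ignored because [[:alpha:]] never matches digits.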
Re^4: How do you find duplicates in a string? by mk. (Friar) on Sep 21, 2006 at 17:27 UTC
you'll need a hash to check for the dupes. `my %hash; $hash{$_}++ foreach split /\t/, $string; foreach (keys %hash) { print "$_\n" if $hash{$_} > 1 && /^\D+$/ }` "one who asks a question is a fool for five minutes; one who does not ask a question remains a fool forever." mk at perl dot org dot br
Re^4: How do you find duplicates in a string? by jdporter (Paladin) on Sep 21, 2006 at 20:17 UTC
"I don't want to know what the duplicates are, I only want to know if there are any..."
I'm not sure it's possible to know the latter without also knowing the former. At any rate, any of the solutions shown so far will do the job; just ignore what the actual duplicate values are. For example, my solution can be modified very slightly: `use Scalar::Util qw( looks_like_number ); my %h; $h{$_}++ for split /\t/, $string; my $there_are_duplicates = grep { $h{$_}>1 and !looks_like_number($_) } sort keys %h;` (This exploits the fact that grep returns the list of matching values in list context, and returns the number of matches in scalar context.)
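
Since only the yes/no answer matters, the loop can also stop at the first repeat instead of counting everything. A minimal sketch, assuming tab-separated input and the same "skip numbers" rule as above; has_duplicates is a hypothetical name:

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

# Return 1 as soon as a non-numeric token is seen for the second time, else 0.
sub has_duplicates {
    my ($string) = @_;
    my %seen;
    for my $token ( split /\t/, $string ) {
        next if looks_like_number($token);    # numbers are not of interest
        return 1 if $seen{$token}++;          # already seen once: done
    }
    return 0;
}

my $string = "LOCAL\tAntony\t17\tAntony\t23\t1569";
print has_duplicates($string) ? "has duplicates\n" : "no duplicates\n";
```

The post-increment returns the old count, so the test is false the first time a token is seen and true on any later occurrence.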

