How do you find duplicates in a string?

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,
I have tried the FAQ section, but I'm not getting anywhere.
I have a string that contains words or numbers, separated by tabs, for example:

my $string = "LOCAL\tAntony\t17\tAntony\t23\t1569";

How can I check whether a given string contains the same word more than once (I only care about words, not numbers), and print that word?


Re: How do you find duplicates in a string? by davido (Cardinal) on Sep 21, 2006 at 16:20 UTC
You can use regular expression capturing and backreferences:

```perl
if ( $string =~ m/
        \b([[:alpha:]]+)\b    # Match and capture word
        .*?                   # Skip what you don't need
        \b\1\b                # Match the captured word
    /x ) {
    print $1, "\n";
}
```

Limitation: words can contain only alphabetic characters. You could modify the expression `[[:alpha:]]` to include whatever you consider legal word characters, such as ' (apostrophe) and - (hyphen).

I used the /x modifier so that the regular expression's sub-expressions can be laid out in meaningful, commented clusters, which makes it easier to read.

Hope this helps!
Dave
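Following the suggestion to widen `[[:alpha:]]`, here is a minimal sketch that also accepts apostrophes and hyphens. One assumption to flag: `\b` behaves awkwardly next to `'` and `-` (there is no word boundary between two non-word characters, so a field ending in `-` followed by a tab has no `\b` after it). This version therefore anchors on the tab separators instead of `\b`, which only works if every field really is tab-delimited. The sub name `first_dup_word` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first tab-separated field made only of
# letters, apostrophes or hyphens that appears again later in the string;
# return undef if there is no such field. Assumes fields are tab-delimited.
sub first_dup_word {
    my ($string) = @_;
    if ( $string =~ m/
            (?:^|\t)              # start of string or a tab separator
            ([[:alpha:]'-]+)      # capture a whole word field
            (?=\t|\z)             # ...which must end at a tab or end of string
            .*?                   # skip what you don't need
            \t\1                  # a later field starting with the same word
            (?=\t|\z)             # ...and ending right after it
        /x ) {
        return $1;
    }
    return;
}

my $string = "LOCAL\tAntony\t17\tAntony\t23\t1569";
my $dup    = first_dup_word($string);
print defined $dup ? "$dup\n" : "no duplicate words\n";
```

Like the original regex, this reports only the first duplicated word it finds, and numeric fields never match because digits are not in the character class.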
Re: How do you find duplicates in a string? by jdporter (Paladin) on Sep 21, 2006 at 16:27 UTC
```perl
use Scalar::Util qw( looks_like_number );

my %h;
$h{$_}++ for split /\t/, $string;    # count every tab-separated field
print "$_\n"                         # print fields seen more than once, skipping numbers
    for grep { $h{$_} > 1 and !looks_like_number($_) } sort keys %h;
```

We're building the house of the future together.
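The one-liner above prints the duplicates in sorted order. If you would rather see them in the order they first appear in the string, here is a minimal sketch of the same count-hash idea; the sub name `duplicate_words` is hypothetical. Like the original, it relies on `looks_like_number` from Scalar::Util, which also treats strings such as `1e5` as numbers.

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

# Hypothetical helper: same count-hash idea, but returns the repeated
# non-numeric fields in the order they first appear in the string.
sub duplicate_words {
    my ($string) = @_;
    my ( %count, @order );
    for my $field ( split /\t/, $string ) {
        next if looks_like_number($field);              # skip numbers such as 17 or 1569
        push @order, $field unless $count{$field}++;    # remember the first sighting
    }
    return grep { $count{$_} > 1 } @order;
}

my $string = "LOCAL\tAntony\t17\tAntony\t23\t1569";
print "$_\n" for duplicate_words($string);
```

Keeping a separate `@order` array costs one extra push per distinct word; the hash alone has no stable order, which is why the one-liner sorts its keys.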
Re: How do you find duplicates in a string? by ptum (Priest) on Sep 21, 2006 at 16:30 UTC
In this particular example, since the words are tab-separated, I'd probably use a brute-force approach: split the string, then step through the resulting array, incrementing a hash entry for each element. Admittedly, this is not a very efficient solution for any significant amount of data.

```perl
use strict;
use warnings;

my $string = "LOCAL\tAntony\t17\tAntony\t23\t1569";
my @tokens = split /\t/, $string;

my %duphash = ();
foreach (@tokens) {
    $duphash{$_}++;    # count occurrences of each token
}
```

Now %duphash holds a count of 1 for each value that occurs once and a count greater than 1 for each duplicate.

Update: I ignored your requirement to skip numbers; adjust accordingly.
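One way to make the adjustment for skipping numbers is sketched below. It assumes that "numbers" means plain digit strings such as `17` or `1569`; if decimals or signs can occur, a test like Scalar::Util's `looks_like_number` would be a safer filter.

```perl
use strict;
use warnings;

my $string = "LOCAL\tAntony\t17\tAntony\t23\t1569";
my @tokens = split /\t/, $string;

my %duphash;
foreach my $token (@tokens) {
    next if $token =~ /^\d+$/;    # assumption: "numbers" means plain digit strings
    $duphash{$token}++;
}

# Report every word that was seen more than once.
foreach my $word ( sort keys %duphash ) {
    print "$word\n" if $duphash{$word} > 1;
}
```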
Re^2: How do you find duplicates in a string? by jdporter (Paladin) on Sep 21, 2006 at 16:37 UTC
"Admittedly, this is not a very efficient solution for any significant amount of data."

For large, and increasingly large, amounts of data it is better than the regex-with-backreference solution (e.g. as presented by davido). Put simply, the split-and-hash approach is O(n), whereas the regex-backreference approach is O(n²) in the worst case.
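A rough worst-case count makes that comparison concrete; treat it as an illustration rather than a formal analysis. Assume the string holds $k$ words and none repeats, count work in words rather than characters, and assume hash updates take constant expected time. The backreference regex tries each word in turn as the capture and then scans every later word looking for it, while split-and-hash touches each field once:

$$
\underbrace{\sum_{i=1}^{k} (k - i)}_{\text{regex: later words scanned per candidate}} = \frac{k(k-1)}{2} \in O(k^2)
\qquad\text{vs.}\qquad
\underbrace{k}_{\text{split + hash: one increment per field}} \in O(k)
$$

For example, with $k = 1000$ words that is $499{,}500$ candidate comparisons for the regex against $1{,}000$ hash increments.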
Re^3: How do you find duplicates in a string? by Anonymous Monk on Sep 21, 2006 at 16:54 UTC
Sorry guys, my mistake. What I actually want is only to know whether the string contains anything more than once. I understand that you split the string using tab as the delimiter, but what must I do to check whether the resulting array, which contains all the elements of the string, has any duplicates in it? That's all: I don't need to know what the duplicates are, only whether there are any.
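For a plain yes/no answer, a minimal sketch is to split on tabs and stop at the first field seen twice, so the whole string never needs to be counted. The sub name `has_duplicates` is hypothetical, and it treats numbers like any other field because the follow-up asks about *anything* occurring more than once; if numbers should still be ignored, skip them inside the loop before the `%seen` check.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 as soon as any tab-separated field repeats,
# 0 if every field is unique.
sub has_duplicates {
    my ($string) = @_;
    my %seen;
    for my $field ( split /\t/, $string ) {
        return 1 if $seen{$field}++;    # second sighting: stop early
    }
    return 0;
}

my $string = "LOCAL\tAntony\t17\tAntony\t23\t1569";
print has_duplicates($string) ? "has duplicates\n" : "all unique\n";
```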
Re^4: How do you find duplicates in a string? by davido (Cardinal) on Sep 21, 2006 at 18:50 UTC
Re^4: How do you find duplicates in a string? by mk. (Friar) on Sep 21, 2006 at 17:27 UTC
Re^4: How do you find duplicates in a string? by jdporter (Paladin) on Sep 21, 2006 at 20:17 UTC
