PerlMonks  

Re: How do you find duplicates in a string?

by ptum (Priest)
on Sep 21, 2006 at 16:30 UTC ( [id://574184]=note )


in reply to How do you find duplicates in a string?

In this particular example, since the words are tab-separated, I'd probably use a brute-force approach and split the string, then step through the resulting array, incrementing a hash. Admittedly, this is not a very efficient solution for any significant amount of data.

use strict;
use warnings;

my $string = "LOCAL\tAntony\t17\tAntony\t23\t1569";   # tab-separated sample
my @tokens = split /\t/, $string;
my %duphash = ();
foreach (@tokens) {
    $duphash{$_}++;   # count how often each token occurs
}

Now %duphash has a 1 for each unique value and something greater than 1 for any duplicates.

Update: I ignored your requirement for skipping over numbers -- adjust accordingly.
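
One way to make that adjustment, as a minimal sketch: filter out the numeric tokens before counting. It assumes "numbers" means tokens made up entirely of digits, which is a guess at the requirement, and the @dups array is a name introduced here for illustration.

```perl
use strict;
use warnings;

# Tab-separated sample data, written with explicit \t escapes.
my $string = "LOCAL\tAntony\t17\tAntony\t23\t1569";

# Assumption: a "number" is a token consisting only of digits.
# Drop those tokens, then count how often each remaining token occurs.
my %duphash;
$duphash{$_}++ for grep { !/^\d+$/ } split /\t/, $string;

# Tokens seen more than once are the duplicates.
my @dups = grep { $duphash{$_} > 1 } sort keys %duphash;
print "@dups\n";    # prints "Antony"
```

Filtering first keeps the numeric tokens out of the hash entirely, so later code never has to re-check them.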

Re^2: How do you find duplicates in a string?
by jdporter (Paladin) on Sep 21, 2006 at 16:37 UTC
    Admittedly, this is not a very efficient solution for any significant amount of data.

    For large, and increasingly large, amounts of data, it's better than the regex-with-backreference solution (e.g. as presented by davido). Put simply, the split-hash approach is O(n), whereas the regex-backref approach is O(n^2).
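
To illustrate the single pass, here is a rough sketch of a split-hash check that stops at the first repeat. The helper name has_duplicate is hypothetical, numeric tokens are assumed to be all-digit strings, and the O(n) figure assumes hash lookups take constant time on average.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 as soon as any non-numeric token repeats.
sub has_duplicate {
    my ($string) = @_;
    my %seen;
    for my $token (split /\t/, $string) {
        next if $token =~ /^\d+$/;      # assumption: skip all-digit tokens
        return 1 if $seen{$token}++;    # second sighting, so stop early
    }
    return 0;
}

for my $line ("LOCAL\tAntony\t17\tAntony\t23\t1569",
              "LOCAL\tAntony\t17\tHank\t17") {
    printf "%s\n", has_duplicate($line) ? "yes" : "no";
}
# prints "yes" then "no"
```

Each token is looked at once, and the loop can return before reaching the end of the string.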

    We're building the house of the future together.
      Sorry guys, my mistake.
      What I want is only to know whether anything appears in the string more than once.
      I understand that you split the string using tab as the delimiter, but what must I do to check whether the resulting array, which contains all the elements of the string, has any duplicates in it? Just that: I don't want to know what the duplicates are, I only want to know if there are any...

        The hash approach is better for that, but I went ahead and re-implemented the regexp approach again anyway, just in case someone is interested in a pattern matching solution rather than an 'equal key' solution.

        If you wanted to continue with the regexp approach, this solution will count the number of duplicate words. I modified the RE a little so that it would count "Antony Antony Antony" as two duplicates (Antony is repeated twice after the original). "Antony Antony Hank Antony Hank Mark" would count 3: Antony has two repeats, and Hank has one.

        use warnings;
        use strict;

        my $string = 'LOCAL Antony 17 Antony 23 1569';
        my $count  = 0;   # start at 0 so the print is safe when nothing repeats
        $count++ while $string =~ m/ \b([[:alpha:]]+)\b (?=.*?\b\1\b) /xg;
        print $count, "\n";
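
The counts described above can be checked with a small sketch that wraps the same pattern in a function. The name count_repeats is made up for this illustration and is not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern above, for illustration only.
sub count_repeats {
    my ($string) = @_;
    my $count = 0;
    # Every alphabetic word that occurs again later in the string counts once.
    $count++ while $string =~ m/ \b([[:alpha:]]+)\b (?=.*?\b\1\b) /xg;
    return $count;
}

print count_repeats('Antony Antony Antony'), "\n";                 # prints 2
print count_repeats('Antony Antony Hank Antony Hank Mark'), "\n";  # prints 3
print count_repeats('LOCAL Antony 17 Antony 23 1569'), "\n";       # prints 1
```

The last occurrence of a word never matches, because the lookahead finds no later copy; that is why three occurrences of Antony give a count of two.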

        Dave

        You'll need a hash to check for the dupes.

        my %hash;
        $hash{$_}++ foreach split /\t/, $string;
        foreach (keys %hash) {
            print $_ . "\n" if $hash{$_} > 1 && /^\D+$/;   # repeated and digit-free
        }


        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        "one who asks a question is a fool for five minutes; one who does not ask a question remains a fool forever."

        mk at perl dot org dot br
        I don't want to know what the duplicates are, I only want to know if there are any...

        I'm not sure it's possible to know the latter without also knowing the former. At any rate, any of the solutions shown so far will do the job; just ignore what the actual duplicate values are. For example, my solution can be modified very slightly:

        use Scalar::Util qw( looks_like_number );

        my %h;
        $h{$_}++ for split /\t/, $string;
        my $there_are_duplicates = grep { $h{$_} > 1 and !looks_like_number($_) } sort keys %h;

        (This exploits the fact that grep returns the list of matching values in list context, and returns the number of matches in scalar context.)
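
The context-dependent behaviour of grep can be seen in isolation in a short sketch. The word list and variable names here are invented for the illustration.

```perl
use strict;
use warnings;

# Made-up sample list, for illustration only.
my @words = qw( Antony Hank Antony Mark Hank Antony );

# List context: grep returns the matching elements themselves.
my @matches = grep { $_ eq 'Antony' } @words;

# Scalar context: the same expression returns how many elements matched.
my $how_many = grep { $_ eq 'Antony' } @words;

print "@matches\n";    # prints "Antony Antony Antony"
print "$how_many\n";   # prints "3"
```

Because a count of zero is false and any positive count is true, the scalar result can be used directly as a yes/no answer.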

        We're building the house of the future together.
