PerlMonks  

Re^2: How do you find duplicates in a string?

by jdporter (Paladin)
on Sep 21, 2006 at 16:37 UTC ( [id://574191] )


in reply to Re: How do you find duplicates in a string?
in thread How do you find duplicates in a string?

Admittedly, this is not a very efficient solution for any significant amount of data.

For large, and increasingly large, amounts of data, it's better than the regex-with-backreference solution (e.g. as presented by davido). Put simply, the split-hash approach is O(n), whereas the regex-backref approach is O(n^2).
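As a rough illustration of why the split-hash approach stays linear, here is a minimal sketch that makes one pass over the tab-separated fields and does one hash lookup per field. The helper name has_duplicate_field is hypothetical, not something from this thread, and the tab delimiter is assumed from the rest of the discussion.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if any tab-separated field occurs more
# than once, 0 otherwise. One hash lookup per field, so the work grows
# linearly with the number of fields.
sub has_duplicate_field {
    my ($string) = @_;
    my %seen;
    for my $field ( split /\t/, $string ) {
        # Post-increment yields the old count: 0 on first sight, >0 on a repeat.
        return 1 if $seen{$field}++;
    }
    return 0;
}

print has_duplicate_field("LOCAL\tAntony\t17\tAntony\t23\t1569")
    ? "duplicates\n"
    : "no duplicates\n";
```

Returning as soon as the first repeat is seen also means the whole string need not be examined when a duplicate turns up early.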

We're building the house of the future together.

Replies are listed 'Best First'.
Re^3: How do you find duplicates in a string?
by Anonymous Monk on Sep 21, 2006 at 16:54 UTC
    Sorry guys, my mistake.
    What I want is only to know whether the string contains anything more than once.
    I understand that you split the string using tab as the delimiter, but what must I do to check whether the resulting array, which contains all the elements of the string, has any duplicates in it? Just that: I don't want to know what the duplicates are, I only want to know if there are any...

      The hash approach is better for that, but I went ahead and re-implemented the regexp approach anyway, just in case someone is interested in a pattern-matching solution rather than an 'equal key' solution.

      If you want to continue with the regexp approach, this solution counts the number of duplicate words. I modified the RE a little so that it counts "Antony Antony Antony" as two duplicates (Antony is repeated twice after the original). "Antony Antony Hank Antony Hank Mark" would count 3: Antony has two repeats, and Hank has one.

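      use warnings;
      use strict;

      my $string = 'LOCAL Antony 17 Antony 23 1569';
      my $count  = 0;    # start at 0 so a string with no repeats prints 0, not undef

      # Count each alphabetic word that appears again later in the string.
      $count++ while $string =~ m/ \b([[:alpha:]]+)\b (?=.*?\b\1\b) /xg;

      print $count, "\n";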
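To check the counts described above, the same pattern can be wrapped in a small sub and run over the example strings. This is only a sketch; the name count_repeats is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern above: counts every alphabetic
# word that is followed, somewhere later in the string, by the same word.
sub count_repeats {
    my ($string) = @_;
    my $count = 0;
    $count++ while $string =~ m/ \b([[:alpha:]]+)\b (?=.*?\b\1\b) /xg;
    return $count;
}

for my $s (
    'Antony Antony Antony',                  # 2
    'Antony Antony Hank Antony Hank Mark',   # 3
    'LOCAL Antony 17 Antony 23 1569',        # 1
) {
    printf "%-38s => %d\n", $s, count_repeats($s);
}
```

Each match consumes only the word itself, because the lookahead is zero-width; that is why the last occurrence of a word is never counted.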

      Dave

      You'll need a hash to check for the dupes.
      my %hash;
      $hash{$_}++ foreach split /\t/, $string;
      foreach (keys %hash) {
          # Print each non-numeric field that occurs more than once.
          print $_ . "\n" if $hash{$_} > 1 && /^\D+$/;
      }


      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      "one who asks a question is a fool for five minutes; one who does not ask a question remains a fool forever."

      mk at perl dot org dot br
      I don't want to know what the duplicates are, I only want to know if there are any...

      I'm not sure it's possible to know the latter without also knowing the former. At any rate, any of the solutions shown so far will do the job; just ignore what the actual duplicate values are. For example, my solution can be modified very slightly:

      use Scalar::Util qw( looks_like_number );

      my %h;
      $h{$_}++ for split /\t/, $string;
      my $there_are_duplicates = grep { $h{$_} > 1 and !looks_like_number($_) } sort keys %h;

      (This exploits the fact that grep returns the list of matching values in list context, and returns the number of matches in scalar context.)
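The context-dependent behaviour of grep can be seen side by side in a short sketch. The field list here is assumed sample data modelled on the string used earlier in the thread.

```perl
use strict;
use warnings;

# Assumed sample data, already split into fields.
my @fields = ( 'LOCAL', 'Antony', '17', 'Antony', '23', '1569' );

my %h;
$h{$_}++ for @fields;

# List context: grep returns the keys that satisfy the block.
my @dupes = grep { $h{$_} > 1 } sort keys %h;

# Scalar context: the same grep returns how many keys satisfied it.
my $n = grep { $h{$_} > 1 } sort keys %h;

print "dupes: @dupes\n";    # dupes: Antony
print "count: $n\n";        # count: 1
```

Because a count of zero is false in Perl, the scalar result can be used directly as a yes/no answer.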

