Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

How do I find if an array has duplicate elements, if so discard it?

by toddprof (Initiate)
on Jul 13, 2002 at 21:33 UTC ( [id://181532]=perlquestion: print w/replies, xml ) Need Help??

toddprof has asked for the wisdom of the Perl Monks concerning the following question: (arrays)

@A = (1, 2, 3, 4, 5) ; @B = (6, 7, 7, 8, 9) ; @C = (10, 11, 12, 13, 15) ; @D = ( @A , @B , @C );
Since @B has duplicates (7,7) It should not appear in @D. In the end @D should contain only sets @A and @C

Originally posted as a Categorized Question.

Replies are listed 'Best First'.
Re: How do I find if an array has duplicate elements, if so discard it?
by DamnDirtyApe (Curate) on Jul 13, 2002 at 22:12 UTC
    use strict ; use warnings ; use Data::Dumper ; sub has_dups { my $arr = shift ; my %counter ; foreach ( @$arr ) { return 1 if $counter{$_}++ ; } return 0 ; } my @A = (1, 2, 3, 4, 5) ; my @B = (6, 7, 7, 8, 9) ; my @C = (10, 11, 12, 13, 15) ; my @D = grep { !has_dups($_) } ( \@A, \@B, \@C ) ; print Dumper( \@D ) ;
      I took a look at this and wondered what on earth it was trying to do. Then I reread the question. This answer does try to provide a meaningful example of detecting an array with duplicate elements and discarding it. But I have a feeling the questioner meant to say "if so discard" them.

      If you are downvoting this for other than lack of interest, thanks for sending me a message or reply to let me know why so I can do better (or just stay quiet) next time.

Re: How do I find if an array has duplicate elements, if so discard it?
by gba (Initiate) on Jul 15, 2002 at 01:45 UTC
    this is some code i applied to an array:
    @legit is an array of elements
    @uniq is the resulting array of unique elements.
    foreach(@legit) { unless($b{$_}++) { push(@uniq,$_); } }
Re: How do I find if an array has duplicate elements, if so discard it?
by maraist (Initiate) on Jul 14, 2002 at 15:14 UTC
    I think, as other's have suggested that this is largely dependent on your data structures.
    e.g. Are we talking about small amounts of data run not so oftenly.
    Or are we talking about massive amounts of data that is rarely updated, or massive data that's updated regularly.
    Here's a chart:
    -----------
    
    small data, unoften updated, order not important
    # want simplicity
    sub remove_redundant {
      my %hdata = map { ($_, 1 ) } @_;
      return keys %hdata;
    }
    
    Note that you might just want to save things in a hash to begin with (and use "keys %data" whenever you want it as an array).

    ------------
    # small data, often update (more reads than writes)
    Keep in a hash and convert to a temp array when needed.
    
    %data{$val} = 1;
    
    for my $el ( keys %data ) { ... }
    
    You can even create an overloaded array object which is really just a hash.
    
    ------------
    # for large data that's rarely updated
    push @data, $val if ! grep { $_ eq $val } @data;
    
    
    It's a linear search, but if we're talking megs of data here, this is MUCH better than building a
    HUGE intermediate hash, then having to garbage collect it afterwards.
    -----------
    # for large data that's often updated. (more reads than writes)
    
    It's worth while building an overloaded class that stores the data as a hash..
    Ideally just use a hash (except that that'll waste a lot of memory).
    It's better to waste this memory up front than to fragement your dynamic memory
    pool by constantly generating intermediate memory scratch pads.

    There is one final solution, and that is to maintain sorted data, then implement a
    c-function to efficiently perform an insertion function. You can either use a red-black tree or
    an insertion sort. You could get very creative, and it would obviously be of general
    utilization (since you're storing scalars). There might be CPAN modules for this already. The red-black tree is a good compromise
    between performance and memory use whereas the insertion sort will give you the best
    memory utilization.
    =-=-=-=-=-=-==-
    One possible mechanis for using a hash for the array is the following:
    
    our %hash_cache;
    sub hash_array_push {
      my ( $hash_name, @vals ) = @_;
      for my $val ( @vals ) {
        $hash_cache{$hash_name}{$val} = 1;
      }
    }
    
    sub hash_array_del {
      my ($hash_name) = @_;
      delete $hash_cache{$hash_name};
    }
    
    sub hash_array_getref {
      my ( $hash_name) = @_;
      # note that the user can more efficiently
      # utilize an array-ref than an array.
      return $hash_cache{$hash_name};
    }
    
    Yes there's an additional function call overhead for these funcs, but you'd have that with just
    about any solution aside from manually maintaining the hash. Alternatively you could use it as oo, but OO in perl adds a slightly greater amount of
    function-call overhead (which is probably offset by the lack of need to do the symbolic
    hash-table name lookup).
Re: How do I find if an array has duplicate elements, if so discard it?
by hossman (Prior) on May 30, 2003 at 00:22 UTC
Re: Clarify: How do I find if an array has duplicate elements, if so discard it?
by BrowserUk (Patriarch) on Jul 14, 2002 at 03:18 UTC

    Depending how your building your data structure, it might make more sense to only add an array to @D, if that array has no duplicates. It being easier to not add it than to later remove it.

Re: How do I find if an array has duplicate elements, if so discard it?
by syphilis (Archbishop) on Oct 28, 2019 at 12:22 UTC
    I think the answers already given are fine for strings and IVs, but can fail where NVs (including NVs that contain certain integer values) are involved.
    In the following, I demonstrate the shortcomings of some of the (flawed) techniques provided by those answers.
    use strict; use warnings ; use Data::Dumper ; use List::MoreUtils; sub has_dups { my $arr = shift ; my %counter ; foreach ( @$arr ) { return 1 if $counter{$_}++ ; } return 0 ; } # @A contains pairs of duplicate values my @A = (1 << 60, 2 ** 60, 1 << 63, 2 ** 63) ; # @B contains two unique values my $approx = sqrt 2.0; my @B = ("$approx" + 0, sqrt 2.0) ; my @D = grep { !has_dups($_) } ( \@A, \@B ) ; print Dumper(\@D), "\n"; @D = grep {List::MoreUtils::uniq(@$_) == @$_} (\@A,\@B); print Dumper(\@D), "\n"; my @list = @A; my @uniq = sort keys %{ { map { $_, 1 } @list } }; print Dumper(\@uniq); __END__ Outputs: $VAR1 = [ [ '1152921504606846976', '1.15292150460684698e+18', '9223372036854775808', '9.22337203685477581e+18' ] ]; $VAR1 = [ [ '1152921504606846976', '1.15292150460684698e+18', '9223372036854775808', '9.22337203685477581e+18' ] ]; $VAR1 = [ '1.15292150460684698e+18', '1152921504606846976', '9.22337203685477581e+18', '9223372036854775808' ];


    On any perl whose nvtype is double, you'll find that the various routines used above will consider that 1.4142135623731000 (0x1.6a09e667f3be3) and sqrt 2.0 (0x1.6a09e667f3bcd) are duplicates of each other, though the hex representation demonstrates quite clearly that it's not so. (The same type of failure arises when nvtype is long double or __float128.)
    OTOH, when ivsize is 8, the same routines will determine that (eg) 1 << 60 and 2 ** 60 are different values, even though they are obviously equivalent.
    (Perl's whose nvtype is __float128 might not suffer this issue.)
    The uniqnum() implementation in the core module List::Util is similarly afflicted - both List::Util and List::MoreUtils pass their test suites by avoiding the testing of these cases.

    I believe the solutions given at How to find and remove duplicate elements from an array? are similarly flawed.

    Let me know if there's an answer in either of those Q&A threads that works flawlessly - it looks to me that there isn't, but I didn't test *all* of them.

    List::Util::PP::uniqnum(), which is part of List::Util::MaybeXS, is currently the most reliable detector of numeric duplicates that I found on CPAN.
    AFAIK it's only flaw is that (on perls whose nvsize == 8 && ivsize == 8) the IV 18446744073709551615 (0xffffffffffffffff) and the NV 2 ** 64 (0x1p+64) are deemed to be duplicates, even though they are exact representations of 2 different values.

    There's currently a pull request that will fix these and all other known issues for List::Util::uniqnum at https://github.com/Dual-Life/Scalar-List-Utils/pull/80.

    Cheers,
    Rob
Re: How do I find if an array has duplicate elements, if so discard it?
by rjimlad (Acolyte) on Jul 19, 2002 at 20:18 UTC
    Err, okay, my answer, short and sweet: use a hash and array slices:
    my @a=(1 .. 2); my @b=(2 .. 3); my %c; @c{@a,@b}=(@a,@b); warn "Duplicates exist!" if scalar @c{@a,@b} != (@a+@b);
    Of course, in this instance you don't want the last line - instead you'd probably want something like:
    my @d=keys %c;
    ...which will be out-of-order, but guaranteed duplicate free.
Re: How do I find if an array has duplicate elements, if so discard it?
by snoopy (Curate) on Sep 27, 2005 at 23:02 UTC
    use YAML; use List::MoreUtils; my @A = (1, 2, 3, 4, 5) ; my @B = (6, 7, 7, 8, 9) ; my @C = (10, 11, 12, 13, 15) ; my @D = grep {List::MoreUtils::uniq(@$_) == @$_} (\@A, \@B, \@C); print YAML::Dump(@D);
Re: Clarify: How do I find if an array has duplicate elements, if so discard it?
by toddprof (Initiate) on Jul 13, 2002 at 21:35 UTC
    I meant:
    @A = (1, 2, 3, 4, 5) ;
    @B = (6, 7, 7, 8, 9) ;
    @C = (10, 11, 12, 13, 15) ;
    @D = @A + @B + @C ;
    # here is where if array has duplicate elements (since @B does) then you would want to discard @B:
    foreach $element (@D) { print $element, "\n" ;
    Results should be this since @B has duplicate elements:
    1, 2, 3, 4, 5
    10, 11, 12, 13, 15

    Originally posted as a Categorized Answer.

Re: How do I find if an array has duplicate elements, if so discard it?
by Anonymous Monk on Jun 19, 2004 at 10:01 UTC
    duplicate(k, n) { int i for(i=o;i<n;i++) { if(a[i]==a[i+1]) { i++; } else i++; } }
    A reply falls below the community's threshold of quality. You may see it by logging in.
Re: How do I find if an array has duplicate elements, if so discard it?
by perladdict (Chaplain) on May 11, 2006 at 09:13 UTC

    hi monk,

    i am beginar to perl,i tried to remove the duplicates from a one dimentional array.Here is my code

    #!/usr/bin/perl -w @a=(1,2,3,4,5,5,6,3,7); undef %raw; @b=grep(!$raw{$_}++,@a); print "@b \n";

    Originally posted as a Categorized Answer.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://181532]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (5)
As of 2024-03-28 17:09 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found