
Re: How do I find if an array has duplicate elements, if so discard it?

by syphilis (Archbishop)
on Oct 28, 2019 at 12:22 UTC


in reply to How do I find if an array has duplicate elements, if so discard it?

I think the answers already given are fine for strings and IVs, but can fail where NVs (including NVs that contain certain integer values) are involved.
In the following, I demonstrate the shortcomings of some of the (flawed) techniques provided by those answers.
use strict;
use warnings;
use Data::Dumper;
use List::MoreUtils;

sub has_dups {
    my $arr = shift;
    my %counter;
    foreach ( @$arr ) {
        return 1 if $counter{$_}++;
    }
    return 0;
}

# @A contains pairs of duplicate values
my @A = (1 << 60, 2 ** 60, 1 << 63, 2 ** 63);

# @B contains two unique values
my $approx = sqrt 2.0;
my @B = ("$approx" + 0, sqrt 2.0);

my @D = grep { !has_dups($_) } ( \@A, \@B );
print Dumper(\@D), "\n";

@D = grep { List::MoreUtils::uniq(@$_) == @$_ } ( \@A, \@B );
print Dumper(\@D), "\n";

my @list = @A;
my @uniq = sort keys %{ { map { $_, 1 } @list } };
print Dumper(\@uniq);

__END__
Outputs:
$VAR1 = [
          [
            '1152921504606846976',
            '1.15292150460684698e+18',
            '9223372036854775808',
            '9.22337203685477581e+18'
          ]
        ];

$VAR1 = [
          [
            '1152921504606846976',
            '1.15292150460684698e+18',
            '9223372036854775808',
            '9.22337203685477581e+18'
          ]
        ];

$VAR1 = [
          '1.15292150460684698e+18',
          '1152921504606846976',
          '9.22337203685477581e+18',
          '9223372036854775808'
        ];


On any perl whose nvtype is double, you'll find that the various routines used above treat 1.4142135623731000 (0x1.6a09e667f3be3) and sqrt 2.0 (0x1.6a09e667f3bcd) as duplicates of each other, though the hex representations show quite clearly that they are not. (The same type of failure arises when nvtype is long double or __float128.)
OTOH, when ivsize is 8, the same routines will determine that (e.g.) 1 << 60 and 2 ** 60 are different values, even though they are obviously equivalent.
(Perls whose nvtype is __float128 might not suffer from this issue.)
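
Both failures come from keying on the stringified value, and can be seen in isolation with something like the following. This is only a minimal sketch: the variable names are mine, the second case assumes ivsize is 8, and the printed strings will vary with nvtype.

```perl
use strict;
use warnings;
use Config;

print "ivsize: $Config{ivsize}, nvtype: $Config{nvtype}\n";

# Case 1: two different NVs that stringify identically.
my $root   = sqrt 2.0;
my $approx = "$root" + 0;    # round trip through perl's default stringification

printf "strings equal: %s\n", ( "$root" eq "$approx" ? "yes" : "no" );
printf "numbers equal: %s\n", ( $root == $approx     ? "yes" : "no" );

# Case 2: the same value held as an IV and as an NV (assumes ivsize == 8).
my $shifted = 1 << 60;
my $power   = 2 ** 60;

print "as strings: $shifted vs $power\n";
printf "numbers equal: %s\n", ( $shifted == $power ? "yes" : "no" );
```

A hash key is always the string form, so any hash-based routine sees case 1 as one value and (on the builds discussed here) case 2 as two.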
The uniqnum() implementation in the core module List::Util is similarly afflicted - both List::Util and List::MoreUtils pass their test suites by avoiding the testing of these cases.

I believe the solutions given at How to find and remove duplicate elements from an array? are similarly flawed.

Let me know if there's an answer in either of those Q&A threads that works flawlessly - it looks to me that there isn't, but I didn't test *all* of them.

List::Util::PP::uniqnum(), which is part of List::Util::MaybeXS, is currently the most reliable detector of numeric duplicates that I found on CPAN.
AFAIK its only flaw is that (on perls whose nvsize == 8 && ivsize == 8) the IV 18446744073709551615 (0xffffffffffffffff) and the NV 2 ** 64 (0x1p+64) are deemed to be duplicates, even though they are exact representations of 2 different values.
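
For comparison, here is a naive pairwise check that uses numeric == instead of hash keys. It is a rough sketch, not taken from any of the modules mentioned: has_numeric_dups is a hypothetical name, it is O(n^2), and I would expect it to share the IV/NV flaw just described, because == falls back to comparing as NVs when one operand is not an exact integer type.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any CPAN module): returns 1 if any two
# elements of the referenced array compare equal under numeric ==.
sub has_numeric_dups {
    my ($arr) = @_;
    for my $i ( 0 .. $#$arr ) {
        for my $j ( $i + 1 .. $#$arr ) {
            return 1 if $arr->[$i] == $arr->[$j];
        }
    }
    return 0;
}

# Same test data as in the demonstration above (assumes ivsize == 8 for @A).
my @A    = ( 1 << 60, 2 ** 60 );
my $root = sqrt 2.0;
my @B    = ( "$root" + 0, $root );

print has_numeric_dups( \@A ) ? "A has duplicates\n" : "A has no duplicates\n";
print has_numeric_dups( \@B ) ? "B has duplicates\n" : "B has no duplicates\n";
```

It is only meant to show that the two cases above are artifacts of stringification; it is not a replacement for a properly fixed uniqnum().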

There's currently a pull request that will fix these and all other known issues for List::Util::uniqnum at https://github.com/Dual-Life/Scalar-List-Utils/pull/80.

Cheers,
Rob