http://qs321.pair.com?node_id=548760


in reply to How do I find unique Array in Array of Arrays?

To detect duplicates, one needs a way of testing equality. For non-scalar data structures, this can be tricky, or at least application-dependent.

As a generic solution, one can simply stringify each data structure and compare the results using string equality (eq). This technique probably breaks badly when any of the contents are objects or functions or other exotic beasts. For strings and numbers, it works pretty well.

Here, I use Data::Dumper for stringification:

my @a = (
    ['a','b','c'],
    ['a','b','c'],
    ['a','b','d'],
    ['a','b','d'],
);

my @b = do {
    use Data::Dumper;
    my %seen;
    map  { $_->[0] }
    grep { !$seen{$_->[1]}++ }
    map  { [ $_, Dumper($_) ] }
    @a
};
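
If the inner structures contain hashes, the same idea may need one adjustment: by default Data::Dumper does not sort hash keys, so two hashes with identical contents can stringify differently. The following is only a sketch under that assumption; the @records data and the @unique name are made up for illustration. It sets $Data::Dumper::Sortkeys so the key order is stable, and $Data::Dumper::Indent = 0 to keep the strings short.

```perl
use strict;
use warnings;
use Data::Dumper;

# Illustrative data (not from the original question): the first two
# records hold the same pairs, written in a different order.
my @records = (
    { name => 'a', id => 1 },
    { id => 1, name => 'a' },
    { name => 'b', id => 2 },
);

my @unique = do {
    # Sort hash keys so equal hashes always produce equal strings.
    local $Data::Dumper::Sortkeys = 1;
    # Drop newlines and indentation to keep the keys of %seen small.
    local $Data::Dumper::Indent = 0;
    my %seen;
    grep { !$seen{ Dumper($_) }++ } @records;
};

print scalar(@unique), " unique records\n";
```

The first occurrence of each distinct record is kept, in the original order, just as in the array version.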

Re: Answer: How do I find unique Array in Array of Array?
by TedPride (Priest) on May 12, 2006 at 06:33 UTC
    Since this stores the Data::Dumper output for each array in %seen as a key, wouldn't it be vastly inefficient in terms of memory use? It might be better to md5 your Dumper output before using it as a key:
    use strict;
    use warnings;
    use Data::Dumper;
    use Digest::MD5 qw/md5/;

    my @a = (
        ['a','b','c'],
        ['a','b','c'],
        ['a','b','d'],
        ['a','b','d'],
    );

    my %seen;
    my @b = grep { !$seen{md5 Dumper($_)}++ } @a;

    print Dumper(\@b);

      Good point; but as with anything, there's a time/space tradeoff, and it's the engineer's call.

      I would say that for something like your sample data, the time it takes to calculate the MD5 would not be worth it, especially given that the memory savings would be negligible.

      In really extreme cases, you'd probably want a function that could hash a complex data structure directly, rather than a stringification of it.
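
      To make that last idea concrete, here is a rough sketch of hashing a structure directly by walking it and feeding the pieces into a Digest::MD5 context, so the full stringified copy is never built. The helper names add_to_digest and structure_digest are hypothetical, the type tags and length prefixes are just one possible encoding, and the sketch only handles plain scalars, arrays, and hashes (it dies on anything else). It also treats the number 1 and the string '1' as equal.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: feed a nested structure into a digest context
# piece by piece. Type tags and length prefixes keep ['ab'] and
# ['a','b'] from producing the same byte stream.
sub add_to_digest {
    my ($ctx, $data) = @_;
    my $type = ref $data;
    if ($type eq 'ARRAY') {
        $ctx->add('A' . scalar(@$data) . ':');
        add_to_digest($ctx, $_) for @$data;
    }
    elsif ($type eq 'HASH') {
        $ctx->add('H' . scalar(keys %$data) . ':');
        # Sorted keys, so equal hashes give equal digests.
        for my $key (sort keys %$data) {
            add_to_digest($ctx, $key);
            add_to_digest($ctx, $data->{$key});
        }
    }
    elsif ($type eq '') {
        if (defined $data) {
            # Digest::MD5 wants bytes, so encode a copy as UTF-8.
            my $bytes = $data;
            utf8::encode($bytes);
            $ctx->add('S' . length($bytes) . ':' . $bytes);
        }
        else {
            $ctx->add('U');
        }
    }
    else {
        die "Unsupported reference type: $type";
    }
}

# Hypothetical wrapper: return the 16-byte binary MD5 of a structure.
sub structure_digest {
    my ($data) = @_;
    my $ctx = Digest::MD5->new;
    add_to_digest($ctx, $data);
    return $ctx->digest;
}

my @a = (
    ['a','b','c'],
    ['a','b','c'],
    ['a','b','d'],
    ['a','b','d'],
);

my %seen;
my @b = grep { !$seen{ structure_digest($_) }++ } @a;

print scalar(@b), " unique arrays\n";
```

      Whether this beats md5 on Dumper output is something you would have to benchmark on your own data; the recursion is pure Perl, so it trades CPU time for the memory it saves.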
