jbrugger has asked for the wisdom of the Perl Monks concerning the following question:
Hi all.
I've got the following question:
I've got n arrays, that all contain data.
I want to return a new array, containing only the data, that exist in all n arrays mentioned above.
(i hope i'm clear in what i want)
Right now, my following solution works, but i wonder if there is a more clever and fast way to do this:
# If my datasets contain
# @1 = 1,2,3,4,5
# @2 = 2,3,6,7,8,8,8,9,10
# @3 = 5,6,2,3
#
# the result should be: 2,3
#
my %intersection;
my @result;
foreach my $dataArray(@AllArraysIHave) {
my %lookuptable;
foreach(@{$dataArray}) {
if (!$lookuptable{$_}) {
$lookuptable{$_} = 1;
$intersection{$_}++;
}
}
}
my $amnt = scalar(@AllArraysIHave);
foreach my $key (keys %intersection) {
if ($intersection{$key} == $amnt) {
push @result, $key;
}
}
return \@result;
*update* Thanks for all the fast replies lima1, indeed, it has to work for duplicate items as well. (changed the example a little.)
"We all agree on the necessity of compromise. We just can't agree on when it's necessary to compromise." - Larry Wall.
Re: Getting the intersection of n collections.
by basje (Beadle) on Oct 11, 2006 at 12:30 UTC
|
this also works with duplicates
my $sets = [
[1,2,3,3,3,3,4,5,6,7,8,9,10],
[1,1,1,2,6,7],
[3,4,5,5,5,5,6,6,6,6,7,8,9],
];
my %hash = map { $_, 1 } @{$sets->[0]};
for(my $i=1; $i<scalar(@{$sets});$i++) {
%hash = map { $_, 1 } grep {%hash->{$_}} @{$sets->[$i]};
}
return (keys(%hash));
| [reply] [d/l] |
|
I think you may have a typo in your solution. The grep {%hash->{$_}} should perhaps be grep {$hash{$_}}.Nice solution, though. Cheers, JohnGG
| [reply] [d/l] [select] |
Re: Getting the intersection of n collections.
by dtr (Scribe) on Oct 11, 2006 at 11:59 UTC
|
Assuming that each number can only appear once in each list, then the following should work:-
my %counts = ();
foreach my $val (@d1, @d2, @d3) {
$counts{$val}++;
}
return grep { $counts{$_} == 3 } sort keys %counts;
| [reply] [d/l] |
Re: Getting the intersection of n collections.
by jdporter (Paladin) on Oct 11, 2006 at 14:17 UTC
|
I point out that solutions which perform set unions pair-wise, iteratively on the set of lists do not scale well. Consider the worst-case scenario, where all of the lists are identical, and there's a million of them.
A better approach is the counting one, like that suggested by dtr, but uniqifying each list by converting it to its corresponding set hash, as basje did.
my %counts;
for my $ar ( @arrays )
{
my %set;
@set{ @$ar } = (); # this is faster.
$counts{$_}++ for keys %set;
}
my @union = grep { $counts{$_} == @arrays } keys %counts;
We're building the house of the future together.
| [reply] [d/l] |
Re: Getting the intersection of n collections.
by japhy (Canon) on Oct 11, 2006 at 12:08 UTC
|
I'd probably start by setting one array as the resulting intersection list, and comparing each array to that one.
my @arrays = ([...], [...], [...]);
my %tmp_int;
# assume intersection is entirety of first array
# and set each "frequency" to the number of arrays
@tmp_int{@{ $arrays[0] }) = (@arrays+0) x @{ $arrays[0] };
# then check the other arrays against it
for my $set (@arrays[1..$#arrays]) {
my %seen;
$tmp_int{$_}-- for grep { !$seen{$_} } @$set;
}
# the ACTUAL intersection is all the
# keys in the hash whose value is 1
my @intersection = grep { $tmp_int{$_} == 1 } keys %tmp_int;
| [reply] [d/l] |
Re: Getting the intersection of n collections.
by lima1 (Curate) on Oct 11, 2006 at 12:10 UTC
|
If the array may NOT have the same value multiple times, then this (intersect is taken and slightly modified from the perl cookbook-one must be careful these days ;) ) should work:
my @a = ( [ 1,2,3,4,5] ,
[2,3,6,7,8,9,10],
[5,6,2,3], );
my @b = @{$a[0]};
sub intersect{
my (%union, %isect);
foreach my $e (@_) { $union{$e}++ && $isect{$e}++ }
return keys %isect
}
foreach my $a_ref (@a) {
@b = intersect(@$a_ref, @b);
}
UPDATE: only works for unduplicated items! | [reply] [d/l] |
Re: Getting the intersection of n collections.
by GrandFather (Saint) on Oct 12, 2006 at 07:59 UTC
|
use strict;
use warnings;
use List::Compare;
my @AllArraysIHave = ([1,2,3,4,5], [2,3,6,7,8,8,8,9,10], [5,6,2,3]);
my $lc = List::Compare->new('-u', @AllArraysIHave);
my @result = $lc->get_intersection;
print "@result";
Prints:
3 2
Update: speed is disapointing though (listCompareF use the functional interface):
manual: 3 2
listCompareF: 2 3
listCompare: 3 2
Rate listCompare listCompareF manual
listCompare 1128/s -- -70% -93%
listCompareF 3822/s 239% -- -78%
manual 17051/s 1411% 346% --
DWIM is Perl's answer to Gödel
| [reply] [d/l] [select] |
|
That's because List::Compare calculates lots of things you aren't interested in. It's doing them from new.
| [reply] |
Re: Getting the intersection of n collections.
by Anonymous Monk on Oct 12, 2006 at 07:43 UTC
|
Untested (should work with duplicates):
my $mask = "";
my %seen;
for(my $i=0; $i<@AllArraysIHave; ++$i) {
vec($mask, $i, 1) = 1;
foreach my $item (@{$AllArraysIHave[$i]}) {
$seen{$item} = "" unless exists($seen{$item});
vec($seen{$item}, $i, 1) = 1;
}
}
my @all_there = grep $seen{$_} eq $mask, keys(%seen);
| [reply] [d/l] |
|
|