Re: Efficient selection mechanism?
by choroba (Cardinal) on Jan 14, 2014 at 14:36 UTC
|
Just an idea using bit vectors: create a bit vector for each small integer, 1 means the integer is present in the current quadruple. Then just do logical OR an the four vectors corresponding to the selected integers and look for zeroes.
| [reply] |
|
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use constant MAX => 20;
use constant EMPTY => "\0" x ((MAX / 8) + 1);
my @AoA = map [ map int rand 1 + MAX, 1 .. 4 ], 0 .. 1000;
my @vectors = (EMPTY) x (MAX + 1);
for my $num (0 .. MAX) {
for my $pos (0 .. $#AoA) {
vec($vectors[$num], $pos, 1) = grep($num == $_, @{ $AoA[$pos]
+}) ? 1 : 0;
}
}
print "@$_; " for @AoA;
print "\n";
my $subarray = $AoA[0];
say "Selecting: @$subarray.";
my $result = EMPTY;
for my $num (@$subarray) {
$result |= $vectors[$num];
}
for my $i (0 .. $#AoA) {
say "@{ $AoA[$i] }" unless vec($result, $i, 1);
}
Update: Runs under 0.3s for 10_000 quadruples.
| [reply] [d/l] |
Re: Efficient selection mechanism?
by Corion (Patriarch) on Jan 14, 2014 at 14:40 UTC
|
This is a brute force approach:
I would pack the four small integers as bits in a 32-bit integer instead of using an array. Then the problem reduces to scanning that array and checking each bit against the bitmask.
If you sort the 32-bit integers, you get a quick way of eliminating half of the search space by looking at the highest bit.
| [reply] |
Re: Efficient selection mechanism?
by salva (Canon) on Jan 14, 2014 at 14:41 UTC
|
Supposing the number of distinct small numbers is really going to stay small and the number of entries on the array on the thousands:
# untested!
my %bitmap;
for my $ix (0..$#AoA) {
for my $sn (@{$AoA[$ix]}) {
vec($bitmap{$sn}, $ix, 1) = 1;
}
}
my @nope = (2, 13, 3, 16);
my $bad = ''
for my $bit (@nope) {
$bad ||= $bitmap{$bit};
}
my @ok = grep { !vec($bad, $_, 1) } 0..$#AoA
That should use no more than a few KB and be O(N). | [reply] [d/l] |
Re: Efficient selection mechanism?
by davido (Cardinal) on Jan 14, 2014 at 16:56 UTC
|
my @control = (
[ 2, 13, 3, 16 ],
[ 10, 1, 11, 6 ],
);
To this...
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 <-- Indicies rep
+resent original ints.
my @control = (
[ 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, ],
[ 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, ],
);
So that you can take a test array, "@test = ( 1, 4, 7, 9 )" and ask if sum(@{control[$n]}[@test]) == 0
If you turn off warnings for undefined values in addition, you won't even need to worry about placing zeros; only place '1' where needed. ;)
That becomes an O(n) solution that benefits from the fact that array slices are fast, and List::Util::sum is XS. Wrap it in a grep, and it should fly:
my @valid_row_ix = grep { sum( @{$control[$_]}[@test] ) == 0 } 0 .. $#
+control;
Update: I'm not benchmarking, but it might be useful instead to try:
my @valid_row_ix = grep { List::MoreUtils::none { $_ == 1 } @{$control
+[$_]}[@test] } 0 .. $#control;
...since it will short-circuit out of the "none" loop as soon as a 1 is detected, whereas 'sum' will look at all elements in your test array. However, the "$_==1" portion drops from XS back into a MULTICALL pure-Perl sub, which is more computationally expensive per iteration than a simple sum. Since the array you're testing seems to be small, my bet is with the 'sum' solution.
| [reply] [d/l] [select] |
Re: Efficient selection mechanism?
by ikegami (Patriarch) on Jan 14, 2014 at 17:13 UTC
|
Not indexed, but a regex should search an 8k string pretty fast.
my $AoA = pack 'C*', map @$_, @AoA;
my $omit = $AoA[0];
my $pat = '[^'.( join '', map quotemeta, pack 'C*', @$omit ).']{4}';
my $re = qr/\G(?:.{4})*?($pat)/s;
my @matches = map [ unpack 'C*', $_ ], $AoA =~ /$re/g;
The following avoids recreating the original arrays at the expense of two bytes per element of @AoA.
my $AoA = pack '(C4S)*', map { @{ $AoA[$_] }, $_ } 0..$#AoA;
my $omit = $AoA[0];
my $pat = '[^'.( join '', map quotemeta, pack 'C*', @$omit ).']{4}';
my $re = qr/\G(?:.{6})*?$pat(..)/s;
my @matches = @AoA[ map unpack 'S', $AoA =~ /$re/g ];
| [reply] [d/l] [select] |
Re: Efficient selection mechanism?
by kcott (Archbishop) on Jan 14, 2014 at 15:16 UTC
|
G'day BrowserUk,
Using your 5 subarrays, I selected the middle one ([ 0, 10, 11, 19 ]): that should find the first and last subarrays as being the only ones not containing any of the integers: 0, 10, 11, 19.
I don't believe this should consume huge amounts of memory:
#!/usr/bin/env perl -l
use strict;
use warnings;
my @AoA = (
[ 2, 13, 3, 16 ],
[ 10, 1, 11, 6 ],
[ 0, 10, 11, 19 ],
[ 6, 1, 19, 15 ],
[ 17, 6, 18, 12 ],
);
my $selected = $AoA[2];
my %unique = map { $_ => 1 } @$selected;
print "@$_" for grep { ! map { $_ ? $_ : () } @unique{@$_} } @AoA;
Output:
2 13 3 16
17 6 18 12
| [reply] [d/l] [select] |
|
Hi, this is also the approach that I was thinking to suggest. Using a hash to store the selected array is the easiest way. I do nt know whether it will be the fastest, but it will certainly be fairly fast.
| [reply] |
Re: Efficient selection mechanism?
by oiskuu (Hermit) on Jan 14, 2014 at 23:01 UTC
|
Here's a small bench to highlight the speed difference in some of the chosen strategies.
#! /usr/bin/perl -wl
use Benchmark 'cmpthese';
my (@v, @b);
@v[0 .. 19] = 'a' .. 'z';
for (1..1e6) {
my %r;
undef $r{$v[rand @v]} until keys %r == 4;
push @b, join '', keys %r;
}
my @AoA = map [map ord()-97, split //,$_], @b;
my @vectors;
for my $t (@v) {
push @vectors, pack "b*", pack "c*", map !!/$t/, @b;
}
my $AoA = pack 'C*', map @$_, @AoA;
cmpthese -5, {
bitmap => sub {
my $res = '';
$res |= $vectors[$_] for @{$AoA[0]};
int(@AoA) - unpack "%32b*", $res;
},
regex1 => sub {
my $omit = $AoA[0];
my $pat = '[^'.( join '', map quotemeta, pack 'C*', @$omit ).'
+]{4}';
my $re = qr/\G(?:.{4})*?($pat)/s;
int(() = $AoA =~ /$re/g);
},
regex2 => sub {
my $pat = qr/[$b[0]]/;
my $cnt = 0;
$cnt += !/$pat/ for @b;
$cnt;
},
};
| [reply] [d/l] |
Re: Efficient selection mechanism?
by hdb (Monsignor) on Jan 14, 2014 at 15:31 UTC
|
I'm far too late, my approach is already covered in the posts above. However.
I am translating the integers into a binary representation, pushing it onto each element (just for convenience) and the do a logical bitwise "and" to eliminate the unwanted elements:
use strict;
use warnings;
use List::Util 'sum';
use Data::Dumper;
sub asBinary { sum map { 2**$_ } @_ }
my @AoA = (
[ 2, 13, 3, 16 ],
[ 10, 1, 11, 6 ],
[ 0, 10, 11, 19 ],
[ 6, 1, 19, 15 ],
[ 17, 6, 18, 12 ],
);
push @$_, asBinary( @$_ ) for @AoA;
my $mask = asBinary( @{$AoA[0]}[0..3] );
my @good = grep { ~$mask & $_->[4] } @AoA;
print Dumper \@good;
Clearly, this has an issue if your integers get larger than 31 or 63 depending on your system.
| [reply] [d/l] |
Re: Efficient selection mechanism? (Thank you all)
by BrowserUk (Patriarch) on Jan 15, 2014 at 11:48 UTC
|
Thank you all for your suggestions.
As oiskuu demonstrated, the bit mapped index -- as suggested by Corion, Salva, Choroba & hdb -- is hands down winner in the performance stakes.
By using vec and string-wise boolean operations (Salva,Choroba) rather than numeral ops, I don't have to worry about the size of the small integers outgrowing the platform integer size, which is a slight but possible consideration.
Mixing code from various solutions, this is what I'm using:
use constant MAX => 20;
use constant BLANK => chr(0) x ((MAX / 8) + 1);
...
my $key = BLANK;
vec( $key, $_, 1 ) = 1 for @$_;
$cache{ $key } = $_;
...
my @subsel = grep{ ( $_ & $mask ) eq BLANK } keys %cache;
And that's it. Thank you all.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
|
I learnt the craft of vec from you here on PerlMonks.
| [reply] |
|
Sometimes, you just need to see your problem through someone else's eyes.
I'd been faffing around with multi-level hashes so that given the first quad's numbers I could use them to index down four levels to just that subset that didn't contain those four numbers. Which ought to work, but proved to be clumsy and produced a huge, unwieldy data structure.
But the real problem is that once I've found the second non-overlapping quad I then want to find a third that doesn't overlap either of the first two; then a fourth that doesn't overlap any of the first three. And that requires two more, deeper, multi-level data structures be built.
With the vec solution I just OR the masks and grep the previous subset again to produce the next.
Simple once you've seen it, but I was so locked in on my multi-level hash approach that using bitmasks didn't cross my mind.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] |
Re: Efficient selection mechanism?
by Lennotoecom (Pilgrim) on Jan 14, 2014 at 19:53 UTC
|
about regexpes,as ikegami offered,
my humble imput, turn every number into a letter from a to t 0 - 19 and match the original line upon every:
$v[$i++] = $_ for 'a'..'t';
@a = qw/2 13 3 16/;
$a = join '', map {$v[$_]} @a;
#generating 1 000 000 base of numbers
for (0..1000000){
%rnd = ();
while(keys %rnd < 4){$rnd{$v[int(rand(20))]} = undef;}
push @b, join '', keys %rnd;
}
#doing search
for (@b){
$counter++ if !/[$a]/;
}
print "$counter\n";
on my pc
1 000 000 base generated in 4 seconds,
search upon it done in 1 second
around ~370 000 unique numbers. | [reply] [d/l] |
Re: Efficient selection mechanism?
by Anonymous Monk on Jan 14, 2014 at 23:48 UTC
|
| [reply] |
|
| [reply] |