Efficient selection mechanism?

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Efficient selection mechanism? by choroba (Cardinal) on Jan 14, 2014 at 14:36 UTC
Just an idea using bit vectors: create a bit vector for each small integer, 1 means the integer is present in the current quadruple. Then just do logical OR an the four vectors corresponding to the selected integers and look for zeroes. لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ	[reply]
Re^2: Efficient selection mechanism? by choroba (Cardinal) on Jan 14, 2014 at 15:58 UTC
Working code: #!/usr/bin/perl use warnings; use strict; use feature qw{ say }; use constant MAX => 20; use constant EMPTY => "\0" x ((MAX / 8) + 1); my @AoA = map [ map int rand 1 + MAX, 1 .. 4 ], 0 .. 1000; my @vectors = (EMPTY) x (MAX + 1); for my $num (0 .. MAX) { for my $pos (0 .. $#AoA) { vec($vectors[$num], $pos, 1) = grep($num == $_, @{ $AoA[$pos] +}) ? 1 : 0; } } print "@$_; " for @AoA; print "\n"; my $subarray = $AoA[0]; say "Selecting: @$subarray."; my $result = EMPTY; for my $num (@$subarray) { $result \|= $vectors[$num]; } for my $i (0 .. $#AoA) { say "@{ $AoA[$i] }" unless vec($result, $i, 1); } [download] Update: Runs under 0.3s for 10_000 quadruples. لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ	[reply] [d/l]
Re: Efficient selection mechanism? by Corion (Patriarch) on Jan 14, 2014 at 14:40 UTC
This is a brute force approach: I would pack the four small integers as bits in a 32-bit integer instead of using an array. Then the problem reduces to scanning that array and checking each bit against the bitmask. If you sort the 32-bit integers, you get a quick way of eliminating half of the search space by looking at the highest bit.	[reply]
Re: Efficient selection mechanism? by salva (Canon) on Jan 14, 2014 at 14:41 UTC
Supposing the number of distinct small numbers is really going to stay small and the number of entries on the array on the thousands: `# untested! my %bitmap; for my $ix (0..$#AoA) { for my $sn (@{$AoA[$ix]}) { vec($bitmap{$sn}, $ix, 1) = 1; } } my @nope = (2, 13, 3, 16); my $bad = '' for my $bit (@nope) { $bad \|\|= $bitmap{$bit}; } my @ok = grep { !vec($bad, $_, 1) } 0..$#AoA` [download] That should use no more than a few KB and be O(N).	[reply] [d/l]
Re: Efficient selection mechanism? by davido (Cardinal) on Jan 14, 2014 at 16:56 UTC
Convert this AoA: `my @control = ( [ 2, 13, 3, 16 ], [ 10, 1, 11, 6 ], );` [download] To this... `# 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 <-- Indicies rep +resent original ints. my @control = ( [ 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, ], [ 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, ], );` [download] So that you can take a test array, "`@test = ( 1, 4, 7, 9 )`" and ask if `sum(@{control[$n]}[@test]) == 0` If you turn off warnings for undefined values in addition, you won't even need to worry about placing zeros; only place '1' where needed. ;) That becomes an O(n) solution that benefits from the fact that array slices are fast, and List::Util::sum is XS. Wrap it in a grep, and it should fly: `my @valid_row_ix = grep { sum( @{$control[$_]}[@test] ) == 0 } 0 .. $# +control;` [download] Update: I'm not benchmarking, but it might be useful instead to try: `my @valid_row_ix = grep { List::MoreUtils::none { $_ == 1 } @{$control +[$_]}[@test] } 0 .. $#control;` [download] ...since it will short-circuit out of the "none" loop as soon as a 1 is detected, whereas 'sum' will look at all elements in your test array. However, the "$_==1" portion drops from XS back into a MULTICALL pure-Perl sub, which is more computationally expensive per iteration than a simple `sum`. Since the array you're testing seems to be small, my bet is with the 'sum' solution. Dave	[reply] [d/l] [select]
Re: Efficient selection mechanism? by ikegami (Patriarch) on Jan 14, 2014 at 17:13 UTC
Not indexed, but a regex should search an 8k string pretty fast. `my $AoA = pack 'C', map @$_, @AoA; my $omit = $AoA[0]; my $pat = '[^'.( join '', map quotemeta, pack 'C', @$omit ).']{4}'; my $re = qr/\G(?:.{4})?($pat)/s; my @matches = map [ unpack 'C', $_ ], $AoA =~ /$re/g;` [download] The following avoids recreating the original arrays at the expense of two bytes per element of `@AoA`. `my $AoA = pack '(C4S)', map { @{ $AoA[$_] }, $_ } 0..$#AoA; my $omit = $AoA[0]; my $pat = '[^'.( join '', map quotemeta, pack 'C', @$omit ).']{4}'; my $re = qr/\G(?:.{6})*?$pat(..)/s; my @matches = @AoA[ map unpack 'S', $AoA =~ /$re/g ];` [download]	[reply] [d/l] [select]
Re: Efficient selection mechanism? by kcott (Archbishop) on Jan 14, 2014 at 15:16 UTC
G'day BrowserUk, Using your 5 subarrays, I selected the middle one (`[ 0, 10, 11, 19 ]`): that should find the first and last subarrays as being the only ones not containing any of the integers: `0, 10, 11, 19`. I don't believe this should consume huge amounts of memory: `#!/usr/bin/env perl -l use strict; use warnings; my @AoA = ( [ 2, 13, 3, 16 ], [ 10, 1, 11, 6 ], [ 0, 10, 11, 19 ], [ 6, 1, 19, 15 ], [ 17, 6, 18, 12 ], ); my $selected = $AoA[2]; my %unique = map { $_ => 1 } @$selected; print "@$_" for grep { ! map { $_ ? $_ : () } @unique{@$_} } @AoA;` [download] Output: `2 13 3 16 17 6 18 12` [download] -- Ken	[reply] [d/l] [select]
Re^2: Efficient selection mechanism? by Laurent_R (Canon) on Jan 14, 2014 at 17:52 UTC
Hi, this is also the approach that I was thinking to suggest. Using a hash to store the selected array is the easiest way. I do nt know whether it will be the fastest, but it will certainly be fairly fast.	[reply]
Re: Efficient selection mechanism? by oiskuu (Hermit) on Jan 14, 2014 at 23:01 UTC
Here's a small bench to highlight the speed difference in some of the chosen strategies. #! /usr/bin/perl -wl use Benchmark 'cmpthese'; my (@v, @b); @v[0 .. 19] = 'a' .. 'z'; for (1..1e6) { my %r; undef $r{$v[rand @v]} until keys %r == 4; push @b, join '', keys %r; } my @AoA = map [map ord()-97, split //,$_], @b; my @vectors; for my $t (@v) { push @vectors, pack "b", pack "c", map !!/$t/, @b; } my $AoA = pack 'C', map @$_, @AoA; cmpthese -5, { bitmap => sub { my $res = ''; $res \|= $vectors[$_] for @{$AoA[0]}; int(@AoA) - unpack "%32b", $res; }, regex1 => sub { my $omit = $AoA[0]; my $pat = '[^'.( join '', map quotemeta, pack 'C', @$omit ).' +]{4}'; my $re = qr/\G(?:.{4})?($pat)/s; int(() = $AoA =~ /$re/g); }, regex2 => sub { my $pat = qr/[$b[0]]/; my $cnt = 0; $cnt += !/$pat/ for @b; $cnt; }, }; [download]	[reply] [d/l]
Re: Efficient selection mechanism? by hdb (Monsignor) on Jan 14, 2014 at 15:31 UTC
I'm far too late, my approach is already covered in the posts above. However. I am translating the integers into a binary representation, pushing it onto each element (just for convenience) and the do a logical bitwise "and" to eliminate the unwanted elements: `use strict; use warnings; use List::Util 'sum'; use Data::Dumper; sub asBinary { sum map { 2**$_ } @_ } my @AoA = ( [ 2, 13, 3, 16 ], [ 10, 1, 11, 6 ], [ 0, 10, 11, 19 ], [ 6, 1, 19, 15 ], [ 17, 6, 18, 12 ], ); push @$_, asBinary( @$_ ) for @AoA; my $mask = asBinary( @{$AoA[0]}[0..3] ); my @good = grep { ~$mask & $_->[4] } @AoA; print Dumper \@good;` [download] Clearly, this has an issue if your integers get larger than 31 or 63 depending on your system.	[reply] [d/l]
Re: Efficient selection mechanism? (Thank you all) by BrowserUk (Patriarch) on Jan 15, 2014 at 11:48 UTC
Thank you all for your suggestions. As oiskuu demonstrated, the bit mapped index -- as suggested by Corion, Salva, Choroba & hdb -- is hands down winner in the performance stakes. By using vec and string-wise boolean operations (Salva,Choroba) rather than numeral ops, I don't have to worry about the size of the small integers outgrowing the platform integer size, which is a slight but possible consideration. Mixing code from various solutions, this is what I'm using: `use constant MAX => 20; use constant BLANK => chr(0) x ((MAX / 8) + 1); ... my $key = BLANK; vec( $key, $_, 1 ) = 1 for @$_; $cache{ $key } = $_; ... my @subsel = grep{ ( $_ & $mask ) eq BLANK } keys %cache;` [download] And that's it. Thank you all. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l]
Re^2: Efficient selection mechanism? (Thank you all) by choroba (Cardinal) on Jan 15, 2014 at 12:18 UTC
I learnt the craft of vec from you here on PerlMonks. لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ	[reply]
Re^3: Efficient selection mechanism? (Thank you all) by BrowserUk (Patriarch) on Jan 15, 2014 at 12:47 UTC
Sometimes, you just need to see your problem through someone else's eyes. I'd been faffing around with multi-level hashes so that given the first quad's numbers I could use them to index down four levels to just that subset that didn't contain those four numbers. Which ought to work, but proved to be clumsy and produced a huge, unwieldy data structure. But the real problem is that once I've found the second non-overlapping quad I then want to find a third that doesn't overlap either of the first two; then a fourth that doesn't overlap any of the first three. And that requires two more, deeper, multi-level data structures be built. With the vec solution I just OR the masks and grep the previous subset again to produce the next. Simple once you've seen it, but I was so locked in on my multi-level hash approach that using bitmasks didn't cross my mind. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]
Re: Efficient selection mechanism? by Lennotoecom (Pilgrim) on Jan 14, 2014 at 19:53 UTC
about regexpes,as ikegami offered, my humble imput, turn every number into a letter from a to t 0 - 19 and match the original line upon every: `$v[$i++] = $_ for 'a'..'t'; @a = qw/2 13 3 16/; $a = join '', map {$v[$_]} @a; #generating 1 000 000 base of numbers for (0..1000000){ %rnd = (); while(keys %rnd < 4){$rnd{$v[int(rand(20))]} = undef;} push @b, join '', keys %rnd; } #doing search for (@b){ $counter++ if !/[$a]/; } print "$counter\n";` [download] on my pc 1 000 000 base generated in 4 seconds, search upon it done in 1 second around ~370 000 unique numbers.	[reply] [d/l]
Re: Efficient selection mechanism? by Anonymous Monk on Jan 14, 2014 at 23:48 UTC
How many is "many?"	[reply]
Re^2: Efficient selection mechanism? by BrowserUk (Patriarch) on Jan 15, 2014 at 00:05 UTC
What does it matter to you? With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]


Problems? Is your data what you think it is?
	PerlMonks