ahem, all your points were exactly what occurred to me on day 2 or so when I took closer look at hv's code/algorithm, though I understood nothing of maths involved. Indeed, looking at list "through modulo p" is looking at 2-d table with p columns. Then subsequent column extraction (and imagining a subset as table again, etc.) reduces amount of data, to inspect, exponentially. I don't know if it can be simpler or faster. I wrote some Perl, stared at it a little, and realized it's translatable to C almost line by line, my C skill would suffice. If data is not Perl array but packed bytes (e.g. 255 as undef), code is easily adjusted.
Lines marked //*/ were added only today to fix latest glitch mentioned. As I said, the whole thing was just code refactoring exercise on my part without understanding what's going on. And it certainly can be further improved. (edit: forgot END_OF_C when pasting; + better names for benchmark tests. edit 2: Aargh, sorry, it was before coffee in the morning, when I added today's //*/ lines and also "optimized" i.e. screwed bit tests in z_mask. Fixed.)
my @data = (
[ 2, [0, 1, 0, 2, 0]],
[ 2, [1, 0, 5, 0, 1]],
[ 2, [2, undef, undef, undef, 2]],
[ 3, [1, 0, 0, 1, 0, 0, 1]],
[ 2, [1, 0, 8, 0, 1]],
[ 3, [undef, 0, 0, undef, 0, 0, 1]],
[ 3, [undef, 0, 0, undef, 0, 0]],
[ 3, [0, 1, undef, undef, 0, undef]],
);
use Inline C => <<'END_OF_C';
int valid_set_c( int p, SV *a_ref ) {
AV* array = (AV *)SvRV( a_ref );
int last = av_top_index( array );
int from = 0;
int incr = 1;
int initial_mask = ( 1 << p ) - 1;
for ( int j = 0;; j ++ ) {
int z_mask = initial_mask;
int nz_cnt = 0;
int nz_col;
for( int i = from; i <= last; i += incr ) {
SV* item = *av_fetch( array, i, 0 );
if ( SvOK( item )) {
int col = i / incr % p;
if ( SvIV( item ) == j ) {
if ( nz_cnt && nz_col == col ) return 0; //*/
if ( z_mask & ( 1 << col )) {
z_mask &= ~( 1 << col );
if ( !z_mask ) return 0;
}
}
else {
if ( nz_cnt ++ ) {
if ( nz_col != col ) return 0;
}
else nz_col = col;
if ( !( z_mask & ( 1 << nz_col ))) return 0;
+ //*/
}
}
}
if ( nz_cnt <= 1 ) return 1;
from += nz_col * incr;
incr *= p;
}
}
END_OF_C
__END__
Rate valid_set valid_set_c
valid_set 35669/s -- -89%
valid_set_c 339083/s 851% --