Re: decomposing binary matrices

At the risk of wasting your time a second time, I'm pretty sure that this method works. The method is based upon a half remembered technique for simplifying complex boolean conditions that I learnt at college and have never used since. You draw up a truth table for states and then swap whole columns or whole rows until the true values group together. I seem to recall that it didn't matter how many times you swapped things around so long as you always swapped whole columns or whole rows at a time

My POC below basically consists of sorting the matrix both horizontally and vertically, and then picking out the sets with a equal number of 'leading zeros'. The problem I had was tracking the positions the 1's came from. The proper way would be to use a matrix of anonymous arrays (or objects), each containing the row, column and boolean value. For simplicity, I used arrays of strings, replacing each '1' by the letter of it's starting column, and appending the row numbers (starting at 1) to each string. I then sort these strings, then tranforms the 'matrix' into another set of strings and sort again. AT this point you extract and retain the 'row numbers' item from the array.

What you end up with is the smallest group (least 1s/most 0s) sorted to the top. You count the number of leading zeros in group (min of the first two items) and select all the items with at least that number of leading zeros and place them into the first group. Then get the min leading zeros of the first two of the remaining items, and repeat. The final group will be the 'leftovers'.

The following shows the process step by step for the OP example.

c:\test>600418-2.pl
This input

        00CD01
        A0CDE2
        0B0D03
        AB00E4
        0BCD05
sorted

        00CD01
        0B0D03
        0BCD05
        A0CDE2
        AB00E4

Tranformed looks like this

        000AA
        0BB0B
        C0CC0
        DDDD0
        000EE
        13524
sorted

        000AA
        000EE
        0BB0B
        13524
        C0CC0
        DDDD0
These are the sets where the letters denote the columns of the origina
+l matrix
(0 mean column not used in this set).
And the numbers above, the rows they came from.

13524  ## Column numbers from the original data. Trim leading zeros.
000AA  ## Read this as; Group one contains 
000EE  ## columns A & E from rows 2 & 4.


13524  ## Leftovers group. Trim trailing columns used by earlier group
+s.
0BB0B  ## Columns B, C, & D from Rows 1, 3 & 5.
C0CC0
DDDD0
[download]

This works as is for the other examples you've supplied (see commented out data blocks).

I do remember from those far off college days that for some complex examples, it was possible for the grouping to 'wrap over the edges'. That is, if you draw the matrix on paper, and wrap it into a cylinder to bring the left and right edges together, a group could cross the boundary. The same was possible for the top & bottom edges.

I think that this would be catered for by repeating the sort/transform/sort and select process on the smaller groups, but I haven't taken it that far yet. Unless your matrices are huge, repeating the process to subdivide the smaller groups shouldn't be a problem.

#! perl -slw
use strict;

sub xform {
    my @out;
    for my $in ( @_ ) {
        $out[ $_ ] .= substr $in, $_, 1  for 0 .. length( $in ) -1;
    }
    return @out;
}

my @grid = ( "00CD01", "A0CDE2", "0B0D03", "AB00E4", "0BCD05", );
#my @grid=( "000DEF1", "000DEF2", "000DE03", "ABCDEF4", "ABCDEF5", "AB
+0DEF6", );
#my @grid=( "000DE01", "000D0F2", "0000EF3", "AB00004", "A0C0005", "AB
+00006", );
#my @grid=( "0B0D001", "000DE02", "0B00E03", "A0000F4", "A0C0005", "A0
+000F6", );

print "This input\n";
print "\t$_" for @grid;

@grid = sort @grid;
print "sorted\n";
print "\t$_" for @grid;

my @xformed = xform @grid;

print "\nTranformed looks like this\n";
print "\t$_" for @xformed;

@xformed = sort @xformed;
print "sorted\n";
print "\t$_" for @xformed;

## extra column label item.
my( $label ) = grep /1/, @xformed;
@xformed = grep !/1/, @xformed;

my @subsets;
while( @xformed ) {
    my $mask = $xformed[ 0 ] | $xformed[ 1 ];
    if( $mask =~ m[(^0+)]  ) {
        my $count = length( $1 ); ## Length of common zero prefix

        push @{ $subsets[ @subsets ] }, shift( @xformed ), shift( @xfo
+rmed );
        while( @xformed 
          and $xformed[ 0 ] =~ m[(^0+)] 
          and length( $1 ) == $count 
        ) {
            push @{ $subsets[ -1 ] }, shift @xformed;
        }
    }
    else {
        push @subsets, [];
        push @{ $subsets[ -1 ] }, shift @xformed while @xformed;
    }
}
print <<'EOS';

These are the sets where the letters denote the columns of the origina
+l matrix
(0 mean column not used in this set).
And the numbers above, the rows they came from.
EOS

print join "\n", $label, @$_, "\n" for @subsets;
[download]

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Comment on Re: decomposing binary matrices Select or Download Code

Replies are listed 'Best First'.
Re^2: decomposing binary matrices by hv (Prior) on Feb 16, 2007 at 22:56 UTC
I think that approach has possibilities, though it might need some modification - there is no guarantee that the row with fewest bits is separable. Consider, say: `A: 1 1 0 0 0 B: 1 0 1 1 0 C: 1 0 1 1 0 D: 1 0 1 1 0 E: 1 1 1 1 1` [download] .. from which {B, C, D} is a separable submatrix. I'd have to think further whether I can easily construct an example that defeats the approach on both rows and columns simultaneously - it'd certainly need a larger matrix (at least 7x7 I think), but I think the same general idea could throw up a proper counterexample. At the risk of wasting your time ... Not a waste by any means - many of the suggestions so far could maybe be adaptable to a full solution. It's just a question of whether the adapted version would end up any quicker than going brute force in the first place. :) Hugo	[reply] [d/l]
Re^3: decomposing binary matrices by BrowserUk (Patriarch) on Feb 16, 2007 at 23:11 UTC
I just ran that example through the code I posted (Note: My code uses letters for columns not rows as you have). The groupings produced: `23415 0000E 000BB 23415 AAAAA CCC0C DDD0D` [download] gives rows 1 & 5 (your A & E), as a group which when subtracted from the remainer leaves rows 2, 3 & 4 (your {BCD}) as a group which is what you are after? Full output from the unchanged program (except for inserting the new data (first line below)): `# my @grid = ( "AB0001","A0CD02","A0CD03","A0CD04","ABCDE5" ); c:\test>600418-2.pl This input AB0001 A0CD02 A0CD03 A0CD04 ABCDE5 sorted A0CD02 A0CD03 A0CD04 AB0001 ABCDE5 Tranformed looks like this AAAAA 000BB CCC0C DDD0D 0000E 23415 sorted 0000E 000BB 23415 AAAAA CCC0C DDD0D These are the sets where the letters denote the columns of the origina +l matrix (0 mean column not used in this set). And the numbers above, the rows they came from. 23415 0000E 000BB 23415 AAAAA CCC0C DDD0D` [download] Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply] [d/l] [select]
Re^4: decomposing binary matrices by hv (Prior) on Feb 17, 2007 at 13:11 UTC
Sorry, I transformed the data already, and included more than necessary. Try this instead (in which, again, {B, C, D} yield a separable submatrix): `my @grid = ( "ABCD1", "A0002", "0BCD3", "0BCD4");` [download] Hugo	[reply] [d/l]


Syntactic Confectionery Delight
	PerlMonks