comment on

I'm coming in a bit late to the party, but I think I've come up with an elegant and fast solution. I was tinkering around with the w() function from Weird number generator to see about different approaches. (that function, in case you all are curious will determine for a set of natural numbers N and a sum S, is there any subset of numbers in N that will add up to S). It's highly recursive and extremely golfed and really powerful, so be careful if you try to read it.

Anyway, my original approach to it was to generate powersets and sum them, but that gets out of control ridiculously fast with memory requirements. Yesterday, [id://Limbic~Region] msg'ed me and said he may monkey around with optimizing it, and I, hypercompetitive asshole that I am, started investigating it as well. While in process, I came up with a nifty powerset generator (that I'll probably put over in snippets), and modified it to solve this problem here.

Constraints on my version - it's all bitwise and binary, so it only handles sets up to 32 elements (or 64 if you're lucky), so it would require modification to handle larger sets. Also, it doesn't try to guess in advance which powersets it should and should not produce. It's based upon an iterator that generates sets as it goes along. If a set matches the condition, you should flag it so it knows not to bother generating any of the powersets of that subset.

It uses the simple concept of counting in binary to generate powersets, and this one starts at the high set (all elements in) and counts its way down. This way it should hit most (all?) "big" sets before hitting smaller subsets. I don't know if I can actually prove that's the case, but I think it is.

Since we're just counting, each set has a numeric index between 0 and 2ⁿ - 1. The assumption is, as we're going along, you can flag a set's as matching the condition. Then, any subsets of that set will not be generated.

First, the code.


# returns _2_ closures to generate certain powersets

sub limbic_power_generator {
  my $set = shift;
  
  #we start with the original set and count down to the null set
  my $set_idx = 2 ** @$set;
  
  #these are the set indexes we should skip
  my %skippers = ();
  
  # our first closure generates subsets
  my $generator = sub {
    #bow out if we're out of sets
    return () unless $set_idx;
    
    # Start decrementing the set_idx. We always do this at least once,
+ so
    # we get to the next set. Our original number is 2 ** @set, so we 
+start 
    # at 2 ** @set - 1 after the decrement
    while (1) {
      $set_idx--;
      
      #boolean to see if we should break out of this loop
      my $should_skip = 0;
      
      # here's the slick binary logic. Iterate over each superset we
      # should skip. if our current set_idx & (binary and) the skip se
+t is equal
      # to the set_idx we're at, then we know that we have a subset of
+ that 
      # skip set. So we skip this set. $should_skip is set to 1, which
+ means
      # we'll stay in our while loop and decrement down to the next se
+t.
      foreach my $skip (keys %skippers) {

        if (($skip & $set_idx) == $set_idx) {

          $should_skip = 1;
          last;
          
        }

      }
      #bow out if this set is NOT a subset of any set we're skipping
      last unless $should_skip;
      #bow out of the function completely with the null set if we've h
+it 0.
      return () unless $set_idx;
    }

    # Now we convert our set_idx to binary. Each bit stores whether th
+e element 
    # is in this subset. For example, set_idx 11 would be 1011, so we 
+keep
    # elements 0, 2, and 3.
    my @in_set = reverse split //, unpack("B*", pack("N", $set_idx));

  # now we return a list. The first element is an arrayref which is th
+e actual
  # subset we generated, the second is our set_idx.
  # we reverse it to make our lives easier, so don't be surprised when
+ the
  # counting gets out of order. Order is irrelevant in this case, anyw
+ay.
    return ([map { $set->[$_] } grep { $in_set[$_] } (0..$#$set)], $se
+t_idx);
  };
  
  # our second closure allows you to add sets to skip
  my $skipper = sub { $skippers{$_[0]}++ };
  
  # return both of our closures.
  return ($generator, $skipper)
}

#we'll use Limbic~Region's example set.
my $limbic_set = [qw(A B C D)];

#create our iterator and skipper
my ($limbic_iterator, $limbic_skipper) = limbic_power_generator($limbi
+c_set);

#and start cruising over our powersets.
while ( my ($set, $idx) = $limbic_iterator->() ) {
  #fancy crap to get it to print out properly.
  my $display = {map {$_ => 1} @$set};
  printf("%2s%2s%2s%2s (%d)\n", (map {$display->{$_} ? $_ : ' '}
    @$limbic_set), $idx);

  # now here's the trick, something here will determine a condition wh
+ether or
  # not this subset matches the search parameters. Let's say, for sake
+ of
  # argument, that set_idx 7 (ABC) matches. We'll just set it here.
  
  $limbic_skipper->(7);
  # that will prevent sets (AB, AC, A, BC, B, C) from printing out.
}
[download]

The slick binary logic deserves explanation. let's assume that set ABC (1110) is a valid set that meets the condition. Set BC (0110) may meet it. To see if BC is a subset of ABC, just binary and them together. You should end up with the set you're testing (1110 & 0110 = 0110). If you do, it's a subset and you can skip. If not, it's not a subset, so continue with it.

To try and help illustrate, here's a graphical representation of the order in which the powersets get generated. Each row is the order in which the sets are generated (ABCD first, ABC second, ABD third,etc). Each column represents a subset (excepting that everything shoul be under ABCD). So you can see that ABC is generated (idx 14), and if it matches the condition, then it will skip over everything else in that column (AB, AC, A, BC, B, C).

Note that the sets are repeated under each bucket in which they would match. Technically, there would be additional columns off to set side for (AB), which A and B underneath it, but the diagram was getting busy as is. Note that each subset is only generated once (A) is not created 4x, it's just repeated in each column that has it as a subset.

Whoops- These subsets are actually backwards relative to how they're actually generated (its ABCD, BCD; not ABC, ABC) because of the reversal of the binary digits. I didn't realize that until after I'd spent the time building up the spiffy diagram and didn't want to re-create it with the proper order. The concept is the same, just the sets are produced in a slightly different order.

ABCD|
ABC |ABC |
AB D|    |AB D
AB  |AB  |AB
A CD|    |    |A CD
A C |A C |    |A C
A  D|    |A  D|
A   |A   |A   |A
 BCD|    |    |    | BCD
 BC | BC |    |    | BC
 B D|    | B D|    | B D
 B  | B  | B  |    | B
  CD|    |    |  CD|  CD
  C |  C |    |  C |  C
   D|    |   D|   D|   D
()
[download]

And there you have it. It's lightning fast, and memory efficient. For each subset that matches, you only need to store a single integer to skip over generation of its powerset. I guess the algorithm is O(n²) (or is it O(n log n)? I always goof those up), but that makes it sound scarier than it is - you need to iterate over each set index to see if you should skip it, but at each spot you're doing m bitwise ands for each set you've already determined you should skip. So say you know you're skipping 5 sets, that's at most 5 bitwise ands for each possible set index. Should be pretty cheap.

In reply to Re: Powerset short-circuit optimization by jimt
in thread Powerset short-circuit optimization by Limbic~Region

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Welcome to the Monastery
	PerlMonks