Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Re: algorithm for 'best subsets'

by tall_man (Parson)
on Mar 03, 2005 at 09:01 UTC ( [id://436124]=note: print w/replies, xml ) Need Help??


in reply to algorithm for 'best subsets'

Here is my entry, making use of some nice operations from Bit::Vector::Overload. It still has some rough spots in combining the final results, but it narrows things down very quickly. No combinatoric generators are used.

Update: Revised to collect the groups much better. I believe it will now do what Roy Johnson suggested. I didn't change any hashes to arrays, though.

Timing on my machine for a 676-item test case generated by the benchmark program halley did:

4.830u 0.000s 0:08.29 58.2% 0+0k 0+0io 420pf+0w

Update2: I also tried a 17,576 item example (with 'kaaa' ... 'kzzz' and 'iaaa' .. 'izzz'). It ran for one hour to find all groups from 2 up to 4 (the maximum available in this case). The timing is consistent with O(I^2 * log K), where I is the item count and K is the keyword count.

Update3: Inner loop optimization -- better ways to test for empty sets (is_empty) and count bits in sets (Norm). Went from one hour to 54 minutes on the biggest case.

#!/usr/bin/perl use strict; use warnings; use Bit::Vector::Overload; my %items = ( a => [ qw/one six/ ], b => [ qw/two three five/ ], c => [ qw/one two five/ ], d => [ qw/one seven five/ ], e => [ qw/one two five/ ], f => [ qw/one two four seven/ ], g => [ qw/one two five/ ], h => [ qw/one two three five/ ], ); my $icount = keys %items; # Form a mapping from items to bit positions. # Collect a list of bitmaps for combination work. my $ix = 0; my %ipos; my @ilst; for my $itm ( sort keys %items ) { $ipos{$itm} = $ix; my $set1 = new Bit::Vector($icount + 1); $set1->Bit_On($ix); push @ilst, $set1; ++$ix; } my %revipos = reverse(%ipos); # Form a mapping from keywords to bit positions. my $scount = 0; my %kpos; for my $itm ( sort keys %items ) { for my $keyw ( @{ $items{ $itm }} ) { $kpos{$keyw} = $scount++ if (!exists $kpos{$keyw}); } } # Also form a reverse index for later printing. my %revkpos = reverse(%kpos); # Form bit vectors with ones in the keyword positions, # one for every item. my %keyword_vecs; my @lst1; for my $itm ( sort keys %items ) { my $set0 = new Bit::Vector($scount + 1); $keyword_vecs{$itm} = $set0; for my $keyw ( @{ $items{ $itm }} ) { $set0->Bit_On($kpos{$keyw}); } # hold both sets - items and keywords together. push @lst1,[$ilst[$ipos{$itm}], $set0]; } # Must have at least matching pairs. my @lst2; my %same_merger; # want to merge combos with common intersections. my $i; my $j; my $imax = @lst1; for ($i = 0; $i < $imax; ++$i) { for ($j = $i+1; $j < $imax; ++$j) { my $kcombo = $lst1[$i]->[1] & $lst1[$j]->[1]; next if $kcombo->is_empty(); next if $kcombo->Norm() < 2; my $k = "$kcombo"; if (exists $same_merger{$k}) { $same_merger{$k}->[0] |= $lst1[$i]->[0]; $same_merger{$k}->[0] |= $lst1[$j]->[0]; } else { $same_merger{$k} = [ ($lst1[$i]->[0] | $lst1[$j]->[0]), $kcom +bo ]; } } } for (keys %same_merger) { my $kref = $same_merger{$_}; my $icombo = $kref->[0]; my $kcombo = $kref->[1]; my @inames = @revipos{ $icombo->Index_List_Read() }; my @knames = @revkpos{ $kcombo->Index_List_Read() }; # Result could be externally sorted, or sort the lst2 array and the +n print. print scalar(@inames)," : @inames combo is @knames","\n"; push @lst2,[$icombo, $kcombo]; }

Replies are listed 'Best First'.
Re^2: algorithm for 'best subsets'
by Roy Johnson (Monsignor) on Mar 03, 2005 at 16:27 UTC
    Wow. It's certainly interesting code (that's not a backhanded compliment).

    However, I think you figured combinations that weren't what the OP was looking for. You tell us what the intersection is of each tuple of keys: for example, b and c have two and five in common. I think what the OP wanted was the intersection of a tuple of values: two and five appear together in b, c, e, g, and h.

    If I can figure out what's what in your code, I will see if it can be made to do that. Also, you can use arrays instead of hashes for revipos and revkpos; then you can use @revipos everywhere you use sort keys %items (because that's all it is).


    Caution: Contents may have been coded under pressure.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://436124]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (5)
As of 2024-03-28 20:37 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found