Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
Shouldn't you have MIN instead of MAX? You are, after all, interested in the things with the most in common.

Jumping on tall_man's idea to use Bit::Vector::Overload (and shamelessly stealing your data generator), here's a new solution. It's reasonably quick (about 15x faster than yours on my slow machine, though a chunk of the difference is printing time) to generate all the tuples and spit them out, nicely ordered by cardinality.

There is much less output, because only tuples that actually represent the intersection of some pair of elements are included. When such a tuple is found, then the rest of the elements are checked to see if they should be included with it, so that the list for the tuple is complete.

use strict; use warnings; use Bit::Vector::Overload; use List::Util 'shuffle'; my $MIN = 3; ## Some keywords my @keywords = qw[ zero one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen ]; ## Generate some test data my %items = map{ $_ => [ @keywords[ ( shuffle( 0 .. $#keywords ) )[ 0 .. rand @keyw +ords ] ] ] } 'a' .. 'z'; print "The item list:\n"; print "$_ => @{ $items{ $_ } }\n" for sort keys %items; print '=' x 30, "\n"; # First, build an index of the distinct values by building a hash # (anonymous) and taking the keys my @val_index = keys %{{map {($_ => undef)} map {@$_} values %items}}; # and a reverse lookup my %rev_val_index = map {($val_index[$_] => $_)} 0..$#val_index; # Now represent each entry as a bit vector my %vectors; while (my ($k, $v) = each %items) { $vectors{$k} = new Bit::Vector(scalar(@val_index)); $vectors{$k}->Bit_On($_) for @rev_val_index{@$v}; } # Compare elements pairwise and add each element # to an AoHoH indexed by size of tuple and tuple member list # if they have elements in common my @intersections; my @item_keys = keys %items; for my $i (0..$#item_keys-1) { for my $j ($i+1..$#item_keys) { my $intersect = $vectors{$item_keys[$i]} & $vectors{$item_keys[$j] +}; my @common_elements = @val_index[$intersect->Index_List_Read]; next if @common_elements < $MIN; my $name_list = join ' ', @common_elements; @{$intersections[scalar @common_elements]{$name_list}}{@item_keys[ +$i,$j]} = (); # Include any higher-order matches in lower-order matches for my $k (0..$#item_keys) { next if $k == $i or $k == $j; my $new_isect = $intersect & $vectors{$item_keys[$k]}; if ($new_isect eq $intersect) { $intersections[scalar @common_elements]{$name_list}{$item_keys +[$k]} = (); } } } } for ($MIN..$#intersections) { next unless keys %{$intersections[$_]}; print "=== $_-tuples of common elements:\n"; while (my ($k, $v) = each %{$intersections[$_]}) { print "$k: ", join(', ', sort keys %$v), "\n"; } }

Caution: Contents may have been coded under pressure.

In reply to Re^2: algorithm for 'best subsets' by Roy Johnson
in thread algorithm for 'best subsets' by halley

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others romping around the Monastery: (9)
As of 2024-04-18 16:52 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found