Re^2: Groups of Objects with Common Attributes

After a couple of days of reading up on what on earth clustering, k-means, etc. are (hopeless), I took a closer look at your code. Wow, so simple. Thanks, martink! Here is a refactored version, if I may: I discarded everything that seemed superfluous and simplified the rest (to fit my brain). It is effectively just two plain loops, one over all attributes and one over all items. The loop over items adds nothing for the pumpkin example, but other test cases need it.

I wonder: is it a mathematical fact that, even for 500 items and 75 attributes, there can be no more than 575 sets of common attributes? That somewhat contradicts what I remember from combinatorics.
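Where 575 comes from: in the refactored code below, each of the 75 attributes and each of the 500 items triggers at most one `_add_solution` call. So `%solutions` can never hold more than 75 + 500 = 575 entries. That is a limit of these two loops, not of the data.

Every intersection of some items' attribute sets is a set of common attributes, and there can be far more of those. For instance, n items that each lack a different one of n attributes produce 2^n − 1 distinct intersections.

The minimal sketch below enumerates all such intersections by closing the items' attribute sets under intersection. It is a swapped-in technique, not martink's method. The items A to D, the attributes p, q, x, y, z, and the names `%closed` and `%found` are all hypothetical. The approach can blow up on large inputs.

```perl
use strict;
use warnings;
use List::Util qw/ all /;

# Hypothetical data: A and B share both x and y, yet no single
# attribute is held by exactly A and B.
my %item2attr = (
    A => { x => 1, y => 1, p => 1 },
    B => { x => 1, y => 1, q => 1 },
    C => { x => 1, z => 1 },
    D => { y => 1, z => 1 },
);
my @items = sort keys %item2attr;

# Any set of attributes common to a group of items is the intersection
# of those items' attribute sets. Start from each item's own (sorted)
# set and intersect every new set with every set seen so far, until
# no new set appears.
my %closed;                                   # 'x,y' => [ 'x', 'y' ]
my @queue = map { [ sort keys %{ $item2attr{$_} } ] } @items;
while ( my $set = shift @queue ) {
    my $key = join ',', @$set;
    next if exists $closed{$key};
    $closed{$key} = $set;
    for my $other ( values %closed ) {
        my %in     = map { $_ => 1 } @$other;
        my @common = grep { $in{$_} } @$set;  # keeps sorted order
        push @queue, \@common if @common;
    }
}

# For each closed set, collect the items holding all of its attributes.
# Like _add_solution, ignore sets held by a single item.
my %found;
for my $key ( sort keys %closed ) {
    my @attrs   = @{ $closed{$key} };
    my @holders = grep { my $i = $_; all { $item2attr{$i}{$_} } @attrs } @items;
    next if @holders < 2;
    $found{$key} = \@holders;
    print "[@attrs] => [@holders]\n";
}
```

Run as is, it prints `[x] => [A B C]`, `[x y] => [A B]`, `[y] => [A B D]` and `[z] => [C D]`. For the same data, the two loops below report only the `[x]`, `[y]` and `[z]` groups and miss `[x y] => [A B]`.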

use strict;
use warnings;
use feature 'say';
use List::Util qw/ uniq all /;
use Data::Dump 'dd';

my $item2attr = {
    apple   => { red    => 1, round => 1, plant => 1, fruit     => 1 },
    orange  => { orange => 1, round => 1, plant => 1, fruit     => 1 },
    pumpkin => { orange => 1, round => 1, plant => 1, vegetable => 1 },
    ball    => { red    => 1, round => 1, toy   => 1 },
};

# list of all items and attributes
my @items = sort keys %$item2attr;
my @attr  = sort( uniq( map { keys %$_ } values %$item2attr ));

# flip the hash: attribute => { item => 1 }
my $attr2item;
for my $attr ( @attr ) {
    for ( @items ) {
        $attr2item-> { $attr }{ $_ } = 1
            if $item2attr-> { $_ }{ $attr }
    }
}
    
#dd $item2attr;
#say '-----------------------------------';
#dd $attr2item;
#say '-----------------------------------';

my %solutions;      # hash, to prevent duplicates

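# For each attribute: take the items that have it, then keep
# the attributes that all of those items share.
for ( @attr ) {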
    my @items_ = keys %{ $attr2item-> { $_ }};
    
    my @attr_ = grep { 
        my $attr = $_;
        all { $item2attr-> { $_ }{ $attr }} @items_
    } @attr;

    _add_solution( \@attr_, \@items_ )
}

# For each item: take its attributes, then keep the items
# that have all of those attributes.
for ( @items ) {
    my @attr_ = keys %{ $item2attr-> { $_ }};
    
    my @items_ = grep { 
        my $item = $_;
        all { $attr2item-> { $_ }{ $item }} @attr_
    } @items;

    _add_solution( \@attr_, \@items_ )
}

dd values %solutions;

# Then filter the solutions for the required number of common
# attributes, find the largest set of common attributes,
# or find the largest group of items sharing any attribute, etc.

sub _add_solution {             # writes to %solutions
    my ( $attr, $items ) = @_;
    
    return unless $#$items;     # skip sets held by only one item
    @$_ = sort @$_ for @_;      # sort both lists in place (stable keys)

    $solutions{ join ',', @$attr } = [ 
        scalar @$attr,          # count of attributes
        scalar @$items,         # count of items
        $attr,                  # attribute list
        $items                  # item list
    ]
}

__END__

(
  [2, 2, ["red", "round"], ["apple", "ball"]],
  [2, 3, ["plant", "round"], ["apple", "orange", "pumpkin"]],
  [1, 4, ["round"], ["apple", "ball", "orange", "pumpkin"]],
  [3, 2, ["fruit", "plant", "round"], ["apple", "orange"]],
  [3, 2, ["orange", "plant", "round"], ["orange", "pumpkin"]],
)
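The closing comment in the code leaves the post-processing open. Below is a minimal sketch of those three filters. It assumes the `[ count of attributes, count of items, attribute list, item list ]` layout that `_add_solution` stores, and uses the five printed solutions as literal data; in the program itself you would start from `values %solutions`. The names `$min_attr`, `@enough`, `@most_attr` and `@most_items` are illustrative only.

```perl
use strict;
use warnings;
use List::Util qw/ max /;

# The five entries printed above, in the layout _add_solution stores.
my @solutions = (
    [ 2, 2, [ 'red', 'round' ],             [ 'apple', 'ball' ] ],
    [ 2, 3, [ 'plant', 'round' ],           [ 'apple', 'orange', 'pumpkin' ] ],
    [ 1, 4, [ 'round' ],                    [ 'apple', 'ball', 'orange', 'pumpkin' ] ],
    [ 3, 2, [ 'fruit', 'plant', 'round' ],  [ 'apple', 'orange' ] ],
    [ 3, 2, [ 'orange', 'plant', 'round' ], [ 'orange', 'pumpkin' ] ],
);

# 1. Groups sharing at least $min_attr attributes (example threshold).
my $min_attr = 2;
my @enough   = grep { $_->[0] >= $min_attr } @solutions;

# 2. The largest set(s) of common attributes.
my $max_attr  = max map { $_->[0] } @solutions;
my @most_attr = grep { $_->[0] == $max_attr } @solutions;

# 3. The largest group(s) of items that still share something.
my $max_items  = max map { $_->[1] } @solutions;
my @most_items = grep { $_->[1] == $max_items } @solutions;

printf "%d groups share >= %d attributes\n", scalar @enough, $min_attr;
print "max attributes: [@{ $_->[2] }] => [@{ $_->[3] }]\n" for @most_attr;
print "max items:      [@{ $_->[2] }] => [@{ $_->[3] }]\n" for @most_items;
```

With `$min_attr = 2`, four of the five groups qualify. `fruit plant round` and `orange plant round` tie for the most attributes, and `round` alone covers all four items.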

Edit: fixed issue with sorting.



good chemistry is complicated, and a little bit messy -LW
	PerlMonks