Sets of subsets

jpearl has asked for the wisdom of the Perl Monks concerning the following question:

I have an interesting problem that i've been thinking about. Basically we have a series of different "things" (say 4 of them). I would like to group each thing together by a given set of attributes. The way this is calculated is something like:

    a1    a2    a3
t1    y    y    n
t2    y    n    y
t3    n    y    y
t4    y    y    n
[download]

From the table we can see similarities and differences between the things. For example the difference between t1 and t4 is 0, and the similarity is 3. My idea is to group the "t#"s together by maximizing the average difference between them, and minimize the similarity. To do that, however I need to be able to split the "things" into multiple different subsets. For example:

Subset(s)1 = {t1,t2}{t3}{t4}
Subset(s)2 = {t1}{t2,t3}{t4}
Subset(s)3 = {t1}{t2}{t3,t4}
[download]

etc.

Then basically iterate over all different possible combinations and find the maximum average distance and/or minimum similarity resulting from a comparison of the groups. My question is: is there a module available which will return sets of subsets like this? I'm finding a lot of modules that return permutations and the like, but not subsets. I can of course write my own, but reinventing the wheel and all that. Any suggestions of a better/different way to do a grouping of this nature are of course completely appreciated.

Thanks!

Comment on Sets of subsets Select or Download Code

Replies are listed 'Best First'.
Re: Sets of subsets by ELISHEVA (Prior) on Jul 22, 2009 at 20:49 UTC
This sounds a little like cluster analysis. If that is what you are interested in, you may want to check out the following links: PerlMonks: module for cluster analysis CPAN: Algorithm::Cluster CPAN: PDL::Stats::Kmeans Best, beth	[reply]
Re: Sets of subsets by ig (Vicar) on Jul 22, 2009 at 20:48 UTC
You might find Set::Partition or the partitions method of Algorithm::Combinatorics helpful.	[reply]
Re^2: Sets of subsets by jpearl (Scribe) on Jul 22, 2009 at 21:41 UTC
Ah yes, not thinking of searching for "partition" was not doing me any favors. Algorithm::Combinatorics had exactly the method I was looking for (partitions) :-P. Thanks!	[reply]
Re: Sets of subsets by mzedeler (Pilgrim) on Jul 22, 2009 at 20:59 UTC
It is easier to do as many class divisions as you have positions in each tuple (I guess tuple is a better word in stead of set, since you have the same number of elements in each of them): `Class(a1): {{t1, t2, t3}, {t4}} Class(a2): {{t1, t3, t4}, {t2}} Class(a3): {{t1, t4}, {t2, t3}}` [download] While you divide into classes, you'll easilly be able to maintain a distance map like so: `After processing Class(a1): \ t1 \| t2 \| t3 \| t4 t1 0 \| 0 \| 0 \| 1 t2 0 \| 0 \| 1 t3 0 \| 1 t4 0 After processing Class(a2): \ t1 \| t2 \| t3 \| t4 t1 0 \| 1 \| 0 \| 1 t2 0 \| 1 \| 2 t3 0 \| 1 t4 0 After processing Class(a3): \ t1 \| t2 \| t3 \| t4 t1 0 \| 2 \| 1 \| 1 t2 0 \| 1 \| 3 t3 0 \| 2 t4 0` [download] This way, you can get complexity O(m * n) where m is the number of tuples and n is the size of the tuples.	[reply] [d/l] [select]
Re: Sets of subsets by JavaFan (Canon) on Jul 22, 2009 at 20:17 UTC
Any suggestions of a better/different way to do a grouping of this nature are of course completely appreciated. I probably completely misunderstand what you are going to do, but it looks like a naive simple algorithm is `O(n²k)`, where `n` is the number of sets, and `k` the number of thingies in a set. But the number of subsets is already `Θ(2ⁿ)`. Going this way doesn't seem like an efficient solution to me.	[reply]
Re^2: Sets of subsets by jpearl (Scribe) on Jul 22, 2009 at 20:44 UTC
I probably didn't phrase it very well. Really I'm just interested in the partitions of the main set of things. So if there were three "things", there would only be 4 sets of subsets: `{t1}{t2}{t3} {t1,t2}{t3} {t1}{t2,t3} {t1,t3}{t2}` [download] Since ordering doesn't matter. From these partitions I can calculate the distance or similarity between the groups from the attribute table (which has already been calculated). So for the above example I might have a table like: `t1 t2 t3 t1 0 1 2 t2 1 0 3 t3 2 3 0` [download] Representing the difference between each thing. When I have the subsets from above I can calculate what the average difference is between the subsets using the difference values as the metric. Important to note as well that the number of "things" themselves are relatively few. The attributes previously calculated are in the hundreds, but the things themselves number a few tens (max 24). So, I'm not too worried about the numbers exploding on me... yet.	[reply] [d/l] [select]
Re^3: Sets of subsets by JavaFan (Canon) on Jul 22, 2009 at 21:16 UTC
Just look at the group t1 is in. t1 can be a group in itself. Or with each of t2, t3, ..., tk. Or any combination of them. Which means that if you have N "things", there are `2^N-1-1` ways to be t1 in a group with other "t's" (-1 comes from that you seem to exclude the subset of all things together). And that's not even counting the different ways you can split up the group of things that aren't in the subset t1 belongs to.	[reply]
Re^4: Sets of subsets by jpearl (Scribe) on Jul 22, 2009 at 22:00 UTC
Re: Sets of subsets by blokhead (Monsignor) on Jul 23, 2009 at 16:41 UTC
Here are a few iterative methods for generating set partitions: Re: Generator of integer partitionts of n (see second half of post) Partitioning a set into parts of given sizes blokhead	[reply]


There's more than one way to do things
	PerlMonks