http://qs321.pair.com?node_id=1214519


in reply to Groups of Objects with Common Attributes

Just some odd thoughts rather than a clear strategy. Say you have 75 attributes and 500 objects.

  1. Define a measure of distance between any pair of objects, e.g. 75 minus the number of attributes they share, or 1 over the number of attributes they share (which needs special handling for pairs that share none).
  2. Calculate the distances between all pairs. This gives you 500*499/2 = 124,750 distances.
  3. Apply some kind of clustering algorithm, e.g. nearest neighbors or similar. Implementations should be available on CPAN.
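
As a rough illustration of the three steps, here is a minimal sketch in Python rather than Perl, using no clustering library. The toy objects, the attribute count of 5, the `threshold` value and the single-linkage grouping are all assumptions made for the example, not something prescribed above.

```python
from itertools import combinations

# Hypothetical toy data: each object maps to the set of attributes it has.
objects = {
    "apple":   {"red", "plant", "fruit"},
    "cherry":  {"red", "plant", "fruit"},
    "pumpkin": {"orange", "plant", "fruit"},
    "ball":    {"red", "toy"},
    "kite":    {"orange", "toy"},
}
N_ATTRIBUTES = 5  # red, orange, plant, fruit, toy (75 in the original question)


def distance(a, b):
    """Step 1: total attribute count minus the number of shared attributes."""
    return N_ATTRIBUTES - len(a & b)


# Step 2: one distance per unordered pair, n*(n-1)/2 in total.
distances = {
    (x, y): distance(objects[x], objects[y])
    for x, y in combinations(sorted(objects), 2)
}


def cluster(names, distances, threshold):
    """Step 3: single-linkage grouping, merging pairs with distance <= threshold."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links up to the representative of n's group.
        while parent[n] != n:
            n = parent[n]
        return n

    for (x, y), d in distances.items():
        if d <= threshold:
            parent[find(x)] = find(y)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


clusters = cluster(sorted(objects), distances, threshold=3)
print(len(distances))  # 10 pairs for 5 objects
print(clusters)        # [['apple', 'cherry', 'pumpkin'], ['ball'], ['kite']]
```

Changing `threshold` or swapping in a different `distance` function changes the grouping, which is the kind of experimenting the next paragraph refers to.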

Playing with distance measures and clustering algorithms (and their parameters) will lead you to a number of different solutions.


Re^2: Groups of Objects with Common Attributes
by bliako (Monsignor) on May 15, 2018 at 10:18 UTC

    Continuing on what hdb suggested,

    Each object lives in a space whose coordinates are the attributes.

    Coordinates can be discrete, binary or continuous. A binary example: Apple -> (orange=0, red=1, plant=1, fruit=1, toy=0). Continuous means that each coordinate holds the probability that the attribute applies, e.g. 0.5 that an apple is orange or 0.01 that an apple is a toy. Notice that each object is characterised by a set of coordinates that includes ALL attributes. If an attribute does not relate to an object, that coordinate is set to zero.

    In this space there is a distance metric, e.g. Euclidean (others exist), that tells you how far apart an Apple and a pumpkin are.
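
    To make the coordinate picture concrete, here is a small Python sketch of the Euclidean distance between two binary attribute vectors. The Apple vector is the one given above; the pumpkin vector is an assumed example, as is the helper name `euclidean`.

```python
import math

# Coordinate order: every object gets a value for ALL attributes.
ATTRIBUTES = ["orange", "red", "plant", "fruit", "toy"]

apple = [0, 1, 1, 1, 0]    # from the example above
pumpkin = [1, 0, 1, 1, 0]  # assumed for illustration


def euclidean(u, v):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


d = euclidean(apple, pumpkin)
print(d)  # sqrt(2): the two objects differ only in "orange" and "red"
```

    The same function works unchanged for continuous coordinates such as probabilities.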

    Clustering is a process which groups nearby objects (in this space) together based on the distance metric chosen.

    I see a problem with the above approach: most objects will have most of their coordinates set to zero, e.g. an Apple has only 3 attributes turned on and I guess the remaining 72 will be off/zero. The problem is that clustering may group objects together because they share the absence of a lot of attributes, whereas you probably want to group objects together because they share the presence of an attribute. An obstacle, but not a tough one.
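
    The effect is easy to see with numbers. The Python sketch below compares a similarity that counts shared zeros with the Jaccard similarity, which ignores them. Jaccard is only one possible way around the obstacle and is my assumption here, not something proposed in the thread; the objects and attribute indices are made up.

```python
N = 75  # total number of attributes

# Each object is the set of attribute indices that are switched on (hypothetical).
apple = {0, 1, 2}
cherry = {0, 1, 3}
hammer = {10, 11, 12}


def matching_similarity(a, b, n=N):
    """Fraction of attributes on which both objects agree, shared zeros included."""
    differing = len(a ^ b)  # attributes present in exactly one of the two
    return (n - differing) / n


def jaccard_similarity(a, b):
    """Shared present attributes divided by attributes present in either object."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


print(matching_similarity(apple, hammer))  # 0.92, although nothing is shared
print(jaccard_similarity(apple, hammer))   # 0.0
print(jaccard_similarity(apple, cherry))   # 0.5
```

    With shared zeros counted, Apple and the unrelated object agree on 69 of 75 attributes and look almost identical; with Jaccard they have nothing in common, while Apple and the cherry still come out as related.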