Welcome to the Monastery PerlMonks

### RFC: Conceptual association

by SilasTheMonk (Chaplain)
 on Oct 09, 2008 at 20:33 UTC Need Help??

## Problem

I think that in a week or two I may have a problem like this. I will have a list of "concepts" and for every pair of concepts a measure of how often they are associated. I would like to draw a diagram/map showing this association graphically in a web page. This problem sounds very much like one from cladistics, but not being of that profession myself I would not know.

## My proposed solution (ignoring edge cases and ambiguities)

1. Find the pair with the highest association metric, draw them in the middle of the graph joined by a line.
2. Those two nodes form the edge of the graph.
3. For each node n in the edge of the graph, find the most highly associated two nodes not already taken and join them to n. They are now added to the edge and n removed from the edge.
4. Go back to step 3.

## Questions

• Is this algorithm (or variant) implemented as a perl module anywhere?
• If not any general thoughts?
• Are any you of you cladists?

## Update

I had another look on CPAN after looking at the word "cladogram" and came up with Bio::Phylo. Still it's rather a lot for me to take in.

Replies are listed 'Best First'.
Re: RFC: Conceptual association
by TGI (Parson) on Oct 09, 2008 at 22:10 UTC
Re: RFC: Conceptual association
by hawtin (Prior) on Oct 10, 2008 at 11:53 UTC

I had a similar problem (with a multi-dimensional set of distances between each pair, but I don't think that makes a difference). My solution was to use a "simulated-annealing" approach. A pseudo-code description would be something like

```
my @items
my @best_layout
my \$least_error

for (0..100)
{
foreach @items
add to layout at random x,y
best_error_measure = sum of badness of all inter-item distances
my \$jiggle_distance = largish
while(\$jiggle_distance > smallish)
{
randomly pick item try it in four alternate locations
(up, down, left, right)
if(new_error_measure < best_error_measure)
keep the item in its best new location
else
jiggle_distance *= 0.95
}
if(best_error_measure < least_error)
{
save layout into best_layout
least_error = best_error_measure
}
}

This has the advantage that you can take any consideration into account when measuring the "badness" of a layout, for example are there items you want near the centre. For my data it very rapidly converged on a reasonable solution (the optimal solution would have taken much longer but I just needed something that was "good enough").

For my particular data the standard "cluster analysis" approaches gave poor results, and I wanted a two dimensional display of the results.

For my particular data the standard "cluster analysis" approaches gave poor results, and I wanted a two dimensional display of the results.

Just out of curiosity, what method gave poor results? A PCA normally does a good job in such a visualization of multi-dimensional data.

Re: RFC: Conceptual association
by GrandFather (Saint) on Oct 09, 2008 at 23:18 UTC

A possibly related concept is the 'tag cloud'. CPAN has a number of modules related to managing tag clouds, although most of the interesting looking modules focus on an HTML context.

Perl reduces RSI - it saves typing
At one level the tag cloud is not what I was looking for. It seems to indicate only the importance of concepts not their relationships.

On the other hand it meets the broader, possibly unstated, objective admirably. It utilizes the data to provide an attractive visual interface to the data.

On the other hand the first phase of data that will become available represents relationships rather than value.

Unconventional use for a cloud, but could it work to:
1. gather stats for all combinations axb, axc, axd,..., bxc, bxd, ..., cxd, cxe, cxf ..., thus expressing the relationship quantitatively;
2. use the magnitudes of those relationships to drive the tag-cloud generator?

Disclaimer - I am totally naive about tag clouds.

This signature will be ready by Christmas
Re: RFC: Conceptual association
by lima1 (Curate) on Oct 10, 2008 at 08:06 UTC
I have some difficulties understanding your algorithm, but I assume it is something like UPGMA. Is your "concepts" graph really a bifurcating tree? In Phylogeny, one assumes that all observed "concepts" have a common ancestor. If this is the case, it is straightforward to apply simple algorithms like Neighbor-joining or UPGMA. They take as input a distance matrix.

Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://716297]
Approved by kyle
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (8)
As of 2021-12-01 16:37 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
R or B?

Results (13 votes). Check out past polls.

Notices?