http://qs321.pair.com?node_id=573648

rvosa has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I am looking to integrate a number of objects that need to notify each other of their state changes. I am looking for design pattern suggestions on how to proceed.
The problem space is that of a branch of biology called 'phylogenetics'. The main objects in this problem space are:
  • Matrices: Matrices are containers that hold biological data.

    The rows in the matrix are biological entities (usually species), the columns are comparative data points ("characters").

    For example, you could have a matrix with three rows: one for Homo sapiens, one for Pan paniscus (the pygmy chimpanzee) and one for Pan troglodytes (the common chimpanzee). If the matrix contains a single character ("has opposable big toes, yes/no"), the matrix would look like this:
    Homo_sapiens    0
    Pan_paniscus    1
    Pan_troglodytes 1
    In its simplest form, you could implement this as a two-dimensional array, and it is probably instructive (at least for me) to think of it that way. However, requirements of type safety mean there'd almost certainly be an abstraction layer that keeps an eye on what goes into the matrix: character matrices can contain DNA sequences, binary character states, "continuous characters" (i.e. floating point values), and a bunch of more esoteric data types.

    The typical kinds of operations you'd want to do on a matrix object are things like adding and removing rows and columns, renaming rows, annotating columns.


  • Trees: a tree is a graph (usually directed and acyclic) representing the inferred relationships between the rows in a matrix.

    A tree describing the relationships between the species in the matrix described here might show the chimpanzees as more closely related to one another than either is to humans, based on the distribution of "character states" (the exact inference depends on the assumed direction of evolutionary change, e.g. did humans lose opposable big toes in the course of their evolution, or did chimpanzees gain them?).

    Tree objects usually are recursive data structures of some form.

    Things one might want to do with a tree include performing several calculations on the tree shape, changing the shape, removing branches, renaming entities in the tree.


  • Taxa: the (often incorrectly applied) shorthand term for the intersection between trees and matrices.

    'Homo_sapiens', both in the tree and the matrix, is a "taxon". In its simplest form, you can think of this as a name constituting a unique primary/foreign key in the course of an analysis.

    For example, when we use 'Homo_sapiens' in the matrix, we refer to the same thing as when we use it in a tree.
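To make the three kinds of objects concrete, here is a minimal sketch of how I picture them. It is written in Python purely for brevity; the class and method names (`Node`, `BinaryMatrix`, `tip_names`, `add_row`) are hypothetical and are not taken from Bio::Phylo, Bio::NEXUS or bioperl. The matrix is keyed by taxon name and does a simple type check, the tree is a recursive structure, and the taxon names are the only thing the two have in common.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A tree node: tips carry a taxon name, internal nodes carry children."""
    name: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

    def tip_names(self):
        """Return the taxon names at the tips below this node, left to right."""
        if not self.children:
            return [self.name]
        names = []
        for child in self.children:
            names.extend(child.tip_names())
        return names


class BinaryMatrix:
    """Rows keyed by taxon name; every cell must be 0 or 1 (illustrative type check)."""

    def __init__(self):
        self.rows = {}

    def add_row(self, taxon, states):
        if any(s not in (0, 1) for s in states):
            raise ValueError(f"non-binary state in row {taxon!r}")
        self.rows[taxon] = list(states)


# The single-character matrix from the example above.
matrix = BinaryMatrix()
matrix.add_row("Homo_sapiens", [0])
matrix.add_row("Pan_paniscus", [1])
matrix.add_row("Pan_troglodytes", [1])

# A tree that groups the two chimpanzees together.
tree = Node(children=[
    Node("Homo_sapiens"),
    Node(children=[Node("Pan_paniscus"), Node("Pan_troglodytes")]),
])

# The taxon names act as the shared key between the two structures.
shared = set(matrix.rows) & set(tree.tip_names())
```

Note that nothing in this sketch keeps the names in the matrix and the tree in sync; that is exactly the gap I am asking about.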

These objects interact with one another during the course of a typical phylogenetic analysis, which might consist of:
  1. Collecting data
  2. Shoe-horning it into one or more matrices
  3. Inferring one or more trees from the matrices
  4. Analyzing the fit of the data on the trees. This might include things like removing outliers, fixing typos in taxon names, and other things in desperate need of a mechanism to preserve referential integrity between trees, taxa and matrices.
There are a number of packages on CPAN that deal with this kind of research: Bio::NEXUS, Bio::Phylo (I wrote Bio::Phylo, and I'm working with the authors of Bio::NEXUS) and bioperl (the 500-pound gorilla with which we want to stay compatible).

If you care to look at these packages, you'll notice that there are implementations of the tree, matrix and taxon objects I described, but no larger framework that deals coherently with the relationships between them. One part of the integration I am trying to achieve is a situation where changes to a taxon object cascade through to the matrix and tree objects that refer to this taxon. Likewise, changes in the matrix should be reflected in the tree and (sometimes) vice versa.
The problem I have is that I can't quite conceive of the right architecture to keep referential integrity between the different objects I'm dealing with. I am thinking of something like the Observer pattern, but I'm not sure if that's entirely appropriate: observing and handling go in multiple directions between the objects, and I fear spaghetti. I am very eager to hear your suggestions and comments.
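For what it's worth, this is roughly the Observer-style arrangement I have in mind for the simplest case, a taxon rename cascading to a matrix and a tree. It is only a sketch under my own assumptions, again in Python for brevity: `Taxon`, `Matrix`, `Tree`, `attach`, `rename` and `taxon_renamed` are made-up names, not existing API in any of the packages mentioned, and the tree is reduced to a flat list of tip labels.

```python
class Taxon:
    """Holds a name and notifies registered observers when it changes."""

    def __init__(self, name):
        self._name = name
        self._observers = []

    @property
    def name(self):
        return self._name

    def attach(self, observer):
        """Register an object that implements taxon_renamed(old, new)."""
        self._observers.append(observer)

    def rename(self, new_name):
        old_name, self._name = self._name, new_name
        for observer in self._observers:
            observer.taxon_renamed(old_name, new_name)


class Matrix:
    """Rows keyed by taxon name; subscribes to every taxon it holds."""

    def __init__(self):
        self.rows = {}

    def add_row(self, taxon, states):
        self.rows[taxon.name] = list(states)
        taxon.attach(self)

    def taxon_renamed(self, old_name, new_name):
        # Move the row to its new key so the data stays attached to the taxon.
        self.rows[new_name] = self.rows.pop(old_name)


class Tree:
    """Only tip labels are stored here; the real topology is omitted."""

    def __init__(self):
        self.tips = []

    def add_tip(self, taxon):
        self.tips.append(taxon.name)
        taxon.attach(self)

    def taxon_renamed(self, old_name, new_name):
        self.tips = [new_name if t == old_name else t for t in self.tips]


# Fixing a typo in one place updates both containers.
human = Taxon("Homo_sapeins")  # deliberate typo
matrix, tree = Matrix(), Tree()
matrix.add_row(human, [0])
tree.add_tip(human)
human.rename("Homo_sapiens")
```

This only covers the taxon-to-matrix and taxon-to-tree direction. Once matrices and trees also have to notify each other (and sometimes the taxa), every class ends up being both subject and observer, which is where I start to fear the spaghetti.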

Thanks!