http://qs321.pair.com?node_id=604276

rvosa has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I am debating how best to design the following: I have three types of objects ($nodes in evolutionary trees, DNA $sequences, and $taxon objects, i.e. "species"). I want to create bi-directional relationships between these things.
For example, the $taxon object (being a representation of our concept of a "species") with the name 'Homo sapiens' could be instantiated at some point, and subsequently a $sequence object containing a human DNA sequence could be created. I want to create a link between the sequence and the taxon. In fact, I might want to link many sequences to that one taxon (say, a set of genes sequenced from a human subject), and link many nodes (from different trees) to that one taxon. Simply put: there's a one-to-many relationship from taxon to nodes and sequences, and a one-to-one relationship from node to taxon, and from sequence to taxon.

The end result should be that I can do things like:
    my @sequences = $taxon->get_sequences;
    my @nodes     = $taxon->get_nodes;
    my $taxon     = $node->get_taxon;

I have so far used a design where the nodes and sequences have a set_taxon method (where the ref to the taxon is stored as a field in the object), and the taxon objects have set_nodes and set_sequences methods (also with fields for those, directly in the object). This strikes me as bug prone, because all the relationships automatically have to be bidirectional.

The quick fix is to say that whoever is called to set_whatever needs to check whether or not the link in the other direction exists, and make one if not. Then, you have to prevent these calls from bouncing back and forth, and pretty soon you're in if/elsif/elsif... territory.
I now wonder if I should create a $taxonlinker object that does the bookkeeping in both directions, e.g. $taxonlinker->make_link( $node, $taxon ) or something to that effect, and all other objects ($nodes, $sequences and $taxon objects) simply communicate with the bookkeeper and ask it for whatever else might be on the other side of the link.

I half and half got that idea from perldesignpatterns but the page doesn't give me that much. Could anyone give some examples how that might work in the wild? Some code?

Thanks!

Replies are listed 'Best First'.
Re: One to many, many to one relationships
by TOD (Friar) on Mar 12, 2007 at 04:34 UTC
    that's rather a SQL-problem than one of perl.

        USE database;

        DROP TABLE IF EXISTS taxon;
        CREATE TABLE taxon (
            t_id INT(n) PRIMARY KEY auto_increment,
            [...]
        );

        DROP TABLE IF EXISTS node;
        CREATE TABLE node (
            n_id   INT(n) PRIMARY KEY auto_increment,
            n_t_id INT(n) NOT NULL DEFAULT 0,
            [...]
        );

        DROP TABLE IF EXISTS sequence;
        CREATE TABLE sequence (
            s_id   INT(n) PRIMARY KEY auto_increment,
            s_t_id INT(n) NOT NULL DEFAULT 0,
            [...]
        );

    notice that every record in the node-table, and every record in the sequence-table has to contain the pointer to a record in the taxon-table. the one-to-many relationship will then be realized by something like:
        $dbh->selectall_arrayref("SELECT * FROM taxon, node WHERE n_t_id = t_id");

    and the one-to-one relationship you will find with a query like:

        $dbh->selectrow_arrayref("SELECT * FROM node, taxon WHERE n_id = XXX AND t_id = n_t_id");

    how you then wrap that in an object-oriented code is a matter of taste. but basically these are the things that should go on in the background.

    greez
    TOD
      I had some similar task recently and found Rose::DB::Object very, very nice. See Section "Relationships" in the Tutorial.
      Thanks. I agree that there's a similarity with rdbms, and elsewhere in the great big project I'm part of there's someone working on a database that models exactly these relationships in this same way, but that wasn't quite my question.
      how you then wrap that in an object-oriented code is a matter of taste.
      That was my question. I think it's not just a matter of taste, I think there are robust ways of doing this, and spaghetti-esque ways of doing it, and I can't quite figure out how to get it right. Thanks for the input, though!
Re: One to many, many to one relationships
by graff (Chancellor) on Mar 12, 2007 at 04:44 UTC
    I think the question hinges on when the linkage information first becomes available: is it intrinsically known at the point when an object instance is created, or might it get introduced at some later stage as an update to an existing instance?

    If it makes sense that instances of each object type are able to exist independently, without necessarily being linked to one of the other object types, then it would make sense for link creation to be a separate process, which takes a taxon instance and a node or sequence instance, and updates each one by adding a reference to the other. (In the taxon, a reference to a node or sequence would be pushed onto an array of nodes or sequences; in a node or sequence, there can be only one reference to a given taxon.)
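
    A minimal sketch of such a separate link-creation step, assuming plain hash-based objects. The `link_taxon` function and the `_taxon` and `_nodes` field names are hypothetical and not part of the original API. The back-reference from node to taxon is weakened with Scalar::Util so the taxon/node cycle does not keep both objects alive forever; that assumes taxa are held somewhere else (for example in a collection of taxa).

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken refaddr);

{
    package Taxon;
    sub new       { my $class = shift; return bless { _nodes => [] }, $class }
    sub get_nodes { return @{ $_[0]{_nodes} } }
}
{
    package Node;
    sub new       { my $class = shift; return bless { _taxon => undef }, $class }
    sub get_taxon { return $_[0]{_taxon} }
}

# Hypothetical helper: the only place where both directions are written,
# so neither class has to call back into the other.
sub link_taxon {
    my ( $taxon, $node ) = @_;
    my $old = $node->{_taxon};

    # Already linked to this taxon: nothing to do.
    return $node if $old && refaddr($old) == refaddr($taxon);

    # Detach from a previous taxon so a node never sits in two lists.
    if ($old) {
        @{ $old->{_nodes} } =
            grep { refaddr($_) != refaddr($node) } @{ $old->{_nodes} };
    }

    push @{ $taxon->{_nodes} }, $node;
    $node->{_taxon} = $taxon;
    weaken( $node->{_taxon} );    # assumption: taxa are owned elsewhere
    return $node;
}

my $human = Taxon->new;
my $guppy = Taxon->new;
my $node  = Node->new;

link_taxon( $human, $node );
link_taxon( $human, $node );    # repeated call adds no duplicate
link_taxon( $guppy, $node );    # moves the node to the other taxon

my @human_nodes = $human->get_nodes;
my @guppy_nodes = $guppy->get_nodes;
printf "human: %d node(s), guppy: %d node(s)\n",
    scalar @human_nodes, scalar @guppy_nodes;
# prints "human: 0 node(s), guppy: 1 node(s)"
```

    The helper reaches into both objects' fields directly, which is acceptable for a sketch; in real code it could call underscore-prefixed private methods instead.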

    OTOH, if every node and/or every sequence being created is inherently related to a specific taxon, then it would make sense that the "new()" method for node and sequence objects would take a required reference to a taxon instance, check to make sure this reference is defined, create the new node or sequence instance that contains the reference to that taxon, and finally invoke the "add_node" or "add_sequence" method of that taxon, to make the relation bidirectional.
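
    A rough sketch of that constructor-based variant, shown for sequences only. The `taxon`, `seq` and `name` arguments and the `_taxon` and `_sequences` fields are assumptions made for illustration.

```perl
use strict;
use warnings;

{
    package Taxon;

    sub new {
        my ( $class, %args ) = @_;
        return bless { name => $args{name}, _sequences => [] }, $class;
    }

    sub add_sequence {
        my ( $self, $seq ) = @_;
        push @{ $self->{_sequences} }, $seq;
        return $self;
    }

    sub get_sequences { return @{ $_[0]{_sequences} } }
}
{
    package Sequence;
    use Carp qw(croak);
    use Scalar::Util qw(blessed);

    sub new {
        my ( $class, %args ) = @_;
        my $taxon = $args{taxon};

        # The taxon is mandatory: refuse to build an unlinked sequence.
        croak "taxon is required"
            unless blessed($taxon) && $taxon->isa('Taxon');

        my $self = bless { seq => $args{seq}, _taxon => $taxon }, $class;

        # Register with the taxon so the relation is bidirectional.
        $taxon->add_sequence($self);
        return $self;
    }

    sub get_taxon { return $_[0]{_taxon} }
}

my $human = Taxon->new( name => 'Homo sapiens' );
my $seq   = Sequence->new( seq => 'ACGT', taxon => $human );

my @seqs = $human->get_sequences;
print $seq->get_taxon->{name}, " has ", scalar @seqs, " sequence(s)\n";

# A constructor call without a taxon dies instead of returning an object.
my $built = eval { Sequence->new( seq => 'ACGT' ); 1 };
print "constructor without taxon refused\n" unless $built;
```

    Because the link is only ever made in one place (the constructor), no back-and-forth checking is needed. Note that taxon and sequence now reference each other strongly, so one side would need Scalar::Util's weaken if the objects are expected to be freed.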

    (At least one of the object types needs to be able to exist independently, without requiring a relation to the other types. It sounds like the taxon object would be that way for sure.)

    (update: And I agree with TOD, that it makes more sense if the underlying data storage is a relational database, where foreign key relations can be enforced automatically once you define the tables properly, rather than just being a bunch of perl-internal objects.)

      I think the question hinges on when the linkage information first becomes available: is it intrinsically known at the point when an object instance is created, or might it get introduced at some later stage as an update to an existing instance?
      It might be introduced later - especially for nodes, which might never be linked to a taxon.

      Also, I don't want to/can't change the api such that the link has to be created in the constructor.
      If it makes sense that instances of each object type are able to exist independently, without necessarily being linked to one of the other object types, then it would make sense for link creation to be a separate process, which takes a taxon instance and a node or sequence instance, and updates each one by adding a reference to the other. (In the taxon, a reference to a node or sequence would be pushed onto an array of nodes or sequences; in a node or sequence, there can be only one reference to a given taxon.)
      Yup, that's what I tend towards. Thanks!
Re: One to many, many to one relationships
by eric256 (Parson) on Mar 12, 2007 at 21:38 UTC

    Just make sure that one of your link creation functions bails out if the link has already been set. For instance, in the code below, if the node is already in the taxon, the taxon doesn't bother trying to set itself as that node's taxon. The other option could be to have one more layer of indirection. So you have set_taxon($taxon) that users call; it automatically creates the link and then calls $taxon->_add_node($node). The private (underscored) versions of add_node and set_taxon don't automatically do anything.

        use strict;
        use warnings;

        {
            package Taxon;

            sub new { bless { nodes => [] }, shift }

            sub add_node {
                my $self = shift;
                my $node = shift;
                return 1 if $self->has_node($node);    # already linked: stop here
                push @{ $self->{nodes} }, $node;
                $node->set_taxon($self);
            }

            sub has_node {
                my $self = shift;
                my $node = shift;
                return grep { $_ eq $node } $self->get_nodes();
            }

            sub get_nodes { @{ shift->{nodes} } }
        }

        {
            package Node;

            sub new { bless { _taxon => '' }, shift }

            sub taxon {
                my $self = shift;
                return $self->{'_taxon'};
            }

            sub set_taxon {
                my $self  = shift;
                my $taxon = shift;
                return if $self->{'_taxon'} eq $taxon;    # already linked: stop here
                $self->{'_taxon'} = $taxon;
                $self->{'_taxon'}->add_node($self);
            }
        }

        # link from the node side
        my $node  = Node->new;
        my $taxon = Taxon->new;
        print '$node->taxon() = ', $node->taxon(), "\n";
        print '$taxon->get_nodes() = ', $taxon->get_nodes(), "\n";
        $node->set_taxon($taxon);
        print '$node->taxon() = ', $node->taxon(), "\n";
        print '$taxon->get_nodes() = ', $taxon->get_nodes(), "\n";

        # link from the taxon side
        $node  = Node->new;
        $taxon = Taxon->new;
        print '$node->taxon() = ', $node->taxon(), "\n";
        print '$taxon->get_nodes() = ', $taxon->get_nodes(), "\n";
        $taxon->add_node($node);
        print '$node->taxon() = ', $node->taxon(), "\n";
        print '$taxon->get_nodes() = ', $taxon->get_nodes(), "\n";

    Or the second way

        use strict;
        use warnings;

        {
            package Taxon;

            sub new { bless { nodes => [] }, shift }

            # public: sets both directions
            sub add_node {
                my $self = shift;
                my $node = shift;
                $self->_add_node($node);
                $node->_set_taxon($self);
            }

            # private: sets this side only
            sub _add_node {
                my $self = shift;
                my $node = shift;
                return 1 if $self->has_node($node);
                push @{ $self->{nodes} }, $node;
            }

            sub has_node {
                my $self = shift;
                my $node = shift;
                return grep { $_ eq $node } $self->get_nodes();
            }

            sub get_nodes { @{ shift->{nodes} } }
        }

        {
            package Node;

            sub new { bless { _taxon => '' }, shift }

            sub taxon {
                my $self = shift;
                return $self->{'_taxon'};
            }

            # public: sets both directions
            sub set_taxon {
                my $self  = shift;
                my $taxon = shift;
                $self->_set_taxon($taxon);
                $taxon->_add_node($self);
            }

            # private: sets this side only
            sub _set_taxon {
                my $self  = shift;
                my $taxon = shift;
                $self->{'_taxon'} = $taxon;
            }
        }

        # link from the node side
        my $node  = Node->new;
        my $taxon = Taxon->new;
        print '$node->taxon() = ', $node->taxon(), "\n";
        print '$taxon->get_nodes() = ', $taxon->get_nodes(), "\n";
        $node->set_taxon($taxon);
        print '$node->taxon() = ', $node->taxon(), "\n";
        print '$taxon->get_nodes() = ', $taxon->get_nodes(), "\n";

        # link from the taxon side
        $node  = Node->new;
        $taxon = Taxon->new;
        print '$node->taxon() = ', $node->taxon(), "\n";
        print '$taxon->get_nodes() = ', $taxon->get_nodes(), "\n";
        $taxon->add_node($node);
        print '$node->taxon() = ', $node->taxon(), "\n";
        print '$taxon->get_nodes() = ', $taxon->get_nodes(), "\n";

    ___________
    Eric Hodges
Re: One to many, many to one relationships
by Herkum (Parson) on Mar 12, 2007 at 17:16 UTC
      No, really, this is not a database design issue. It's about how to design my objects to model the relationships:
          One-to-many:
          taxon ---> node
                ---> node
                ---> sequence
                ---> sequence
          One-to-one:
          node ---> taxon
          One-to-one:
          sequence ---> taxon
      Such that I can do:
          my @nodes     = $taxon->get_nodes;
          my @sequences = $taxon->get_sequences;
          my $taxon     = $node->get_taxon;
          $taxon        = $sequence->get_taxon;
      ...with an underlying architecture where the relationships automatically are bi-directional in an elegant way. For example:
      $node->set_taxon( $taxon );
      ...might mean that the $node now holds a reference to $taxon, but the $taxon now also has to hold a reference to $node. So should I then do:
          sub set_taxon {
              my ( $node, $taxon ) = @_;
              $node->{'_taxon'} = $taxon;

              # check if taxon already knows about me
              for my $known ( $taxon->get_nodes ) {
                  return $node if $known == $node;
              }
              $taxon->add_node( $node );
              return $node;
          }
      ...but then the $taxon->add_node( $node ) sub would likewise need to check if $node already knows about $taxon, and if not make the link in the other direction - without recursing ad infinitum. All in all, that's not elegant.

      So my question was: should the link making be managed by a third object (some sort of manager), to which the set_whatever, add_whatever, get_whatever method calls are re-routed. Or should I do something else entirely?

      My question is not addressed by database designs, raw sql code, or object-relational mappers. Thank you for your input, though. I'm looking for suggestions for design patterns to deal with bi-directional relationships between objects. I understand that the suggestion for MVC has something to do with the issue (but the term MVC is so polluted at this point it's not unambiguously obvious how to apply it - perhaps the 'C' is the link manager?), so greatshots I think got what I was talking about, and so did graff.

      I apologize to all others if I've phrased things so poorly that I made you write about databases.
        I think the closest formal (GoF) pattern for what you are describing is the Mediator pattern, which acts as the third-party object you are describing. The main challenge is that this pattern is open to abuse, as described by the God object anti-pattern.
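
        One way such a mediator might look, as a sketch only. `TaxonLinker`, `Linkable` and all method names except `set_taxon`, `get_taxon`, `get_nodes` and `get_sequences` are hypothetical. The linker keeps both directions in two hashes keyed by reference address, and the domain objects store no links themselves. Since the linker holds strong references, linked objects stay alive until `remove_link` is called; that is an assumption of this sketch, not a requirement of the pattern.

```perl
use strict;
use warnings;

{
    package TaxonLinker;    # hypothetical mediator class
    use Scalar::Util qw(refaddr);

    sub new {
        my $class = shift;
        return bless { taxon_of => {}, linked_to => {} }, $class;
    }

    # Record the link in both directions, replacing any earlier link.
    sub make_link {
        my ( $self, $thing, $taxon ) = @_;
        $self->remove_link($thing);
        $self->{taxon_of}{ refaddr($thing) } = $taxon;
        $self->{linked_to}{ refaddr($taxon) }{ refaddr($thing) } = $thing;
        return $self;
    }

    sub remove_link {
        my ( $self, $thing ) = @_;
        my $old = delete $self->{taxon_of}{ refaddr($thing) } or return $self;
        delete $self->{linked_to}{ refaddr($old) }{ refaddr($thing) };
        return $self;
    }

    sub taxon_for {
        my ( $self, $thing ) = @_;
        return $self->{taxon_of}{ refaddr($thing) };
    }

    # All objects of a given class linked to this taxon (unordered).
    sub things_for {
        my ( $self, $taxon, $class ) = @_;
        my $linked = $self->{linked_to}{ refaddr($taxon) } || {};
        return grep { $_->isa($class) } values %$linked;
    }
}

my $LINKER = TaxonLinker->new;    # one shared bookkeeper

{
    package Linkable;             # behaviour shared by Node and Sequence
    sub new       { return bless {}, shift }
    sub set_taxon { my ( $self, $taxon ) = @_; $LINKER->make_link( $self, $taxon ); return $self }
    sub get_taxon { return $LINKER->taxon_for( $_[0] ) }
}
{ package Node;     our @ISA = ('Linkable') }
{ package Sequence; our @ISA = ('Linkable') }
{
    package Taxon;
    sub new           { return bless {}, shift }
    sub get_nodes     { return $LINKER->things_for( $_[0], 'Node' ) }
    sub get_sequences { return $LINKER->things_for( $_[0], 'Sequence' ) }
}

my $human = Taxon->new;
my $node  = Node->new->set_taxon($human);
my $seq1  = Sequence->new->set_taxon($human);
my $seq2  = Sequence->new->set_taxon($human);

my @nodes     = $human->get_nodes;
my @sequences = $human->get_sequences;
printf "%d node(s), %d sequence(s)\n", scalar @nodes, scalar @sequences;
# prints "1 node(s), 2 sequence(s)"
```

        Keeping the linker limited to link bookkeeping, as here, is one way to avoid it growing into the God object mentioned above.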
        You can do this with Class::DBIx and Rose::DB::Object. For example in RDBO: Assume this simple schema:
            DROP TABLE public.nodes;
            DROP TABLE public.sequences;
            DROP TABLE public.taxa;

            CREATE TABLE public.taxa (
                id          SERIAL NOT NULL PRIMARY KEY
              , common_name CHARACTER(60)
              , UNIQUE (common_name)
            );

            CREATE TABLE public.nodes (
                id          SERIAL NOT NULL PRIMARY KEY
              , description CHARACTER(60)
              , taxon_id    INT REFERENCES taxa (id)
            );

            CREATE TABLE public.sequences (
                id          SERIAL NOT NULL PRIMARY KEY
              , description CHARACTER(60)
              , seq         TEXT NOT NULL
              , taxon_id    INT REFERENCES taxa (id)
            );
        Then this perl code:
            package MyDB;
            use base qw(Rose::DB);
            __PACKAGE__->use_private_registry;
            __PACKAGE__->register_db(
                driver   => 'pg',
                database => 'mydb',
                host     => 'localhost',
                username => $ENV{USER},
                password => '',
            );
            1;

            package MyDB::Object;
            use MyDB;
            use base qw(Rose::DB::Object);
            sub init_db { MyDB->new };
            1;

            package MyDB::Node;
            use base qw(MyDB::Object);
            __PACKAGE__->meta->table('nodes');
            __PACKAGE__->meta->auto_initialize;
            __PACKAGE__->meta->make_manager_class('nodes');
            1;

            package MyDB::Sequence;
            use base qw(MyDB::Object);
            __PACKAGE__->meta->table('sequences');
            __PACKAGE__->meta->auto_initialize;
            __PACKAGE__->meta->make_manager_class('sequences');
            1;

            package MyDB::Taxon;
            use base qw(MyDB::Object);
            __PACKAGE__->meta->table('taxa');
            __PACKAGE__->meta->auto_initialize;
            1;

            my $t = MyDB::Taxon->new(common_name => 'Homo Sapiens');
            $t->add_sequences(
                { description => '1', seq => 'AA' },
                { description => '2', seq => 'GG' },
            );
            $t->add_nodes( { description => '1', }, );
            $t->save;

            # a little bit more complicated than necessary just
            # to demonstrate queries
            my $nodes = MyDB::Node::Manager->get_nodes(
                query           => [ 'taxon.id' => $t->id ],
                require_objects => [ 'taxon' ],
            );
            foreach my $node (@$nodes) {
                print "NODE: " . $node->description . ' ' . $node->taxon->common_name . "\n";
            };

            my $seqs = MyDB::Sequence::Manager->get_sequences(
                query => [ 'taxon_id' => $t->id ],
            );
            foreach my $seq (@$seqs) {
                print "SEQ: " . $seq->description . ' ' . $seq->taxon->common_name . "\n";
            };

            # change taxon
            $nodes->[0]->taxon(MyDB::Taxon->new(common_name=>'Guppy'));
            $nodes->[0]->save;

            # old taxon
            $nodes->[0]->taxon($t);
            #$nodes->[0]->save;
        ...does what you want, I think. Or not? Is there a reason why you want to write your own OO mapper (you said you will have an underlying database)? Update: Fixed indentation
Re: One to many, many to one relationships
by greatshots (Pilgrim) on Mar 12, 2007 at 04:46 UTC
      Thanks for the input. I saw a reference to MVC on the perl design patterns page as well - and I've used it before in other projects - but I don't see how this applies to my problem. Who's the model? Who's the view? Who's the controller? Some more explanation would be great!