Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello all.

I recently found a poster presentation at PyCon 2013 with the following summary:

We constructed a knowledge-based data model using Django's object-relation mapping (ORM) in which the cancer-related informations from 3 ontologies, i.e. Gene Ontology, Disease Ontology, and ChEBI, and 4 clinical databases, i.e. 1000 Genomes, Comparative Toxicogenomics Database, ClinicalTrials, and DrugBank were utilized and semantically related. Using our data model, the integrated information such as related genes, their mutations, single nucleotide polymorphisms, clinical trials, and related drugs for given cancer types can be retrieved.

I immediately thought this would make a fascinating student presentation, provided I could build a similar project in Perl (the source code for the aforementioned project was not provided). With that objective in mind, I was wondering what approach you (collectively) would take. I am considering Catalyst and DBIx::Class, but I have yet to find a suitable tutorial on either or, more specifically, a book on Catalyst whose reviews inspire confidence. Clearly, such a project is ambitious, but I would not need to present it until April 2014, which should give me ample time to develop the necessary skills and knowledge. Since I am completely unfamiliar with both Catalyst and DBIx::Class, I would appreciate it if someone could share with me (us) the purpose of these packages. Would either be suitable for the project described above? Do you suggest learning SQL, or would an object-relational mapper (ORM) make that unnecessary?

Thanks for the help.

Re: Using Perl in a personalized medicine application.
by thomas895 (Deacon) on Jul 24, 2013 at 08:20 UTC

    If you just want to output information from a database, there are many ways of doing that. I have heard great things about Dancer, which abstracts much of the work for you, allowing you to focus on the presentation of the data. I've never used it, because I don't like this Moose stuff (it just doesn't look right). But you may feel differently, so I encourage you to try it.

    If you want to use Catalyst, then try the tutorial -- Catalyst::Manual::Tutorial -- which is a very good starting point for both Catalyst and DBIx::Class.

    I know next to nothing about biology (except that during DNA replication, the RNA somehow gets the other RNA-thing (tRNA?) to move along it and translate every 3 codas into an amino acid, which it strings onto a chain of amino acids... I think), so unfortunately I can't help you with how to organize that data or how to format it to your wishes.

    "Excuse me for butting in, but I'm interrupt-driven..."
      Moose stuff
      Dancer (not talking about Dancer2, which is written in Moo) has nothing in common with Moose.
      لսႽÜ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Using Perl in a personalized medicine application.
by erix (Prior) on Jul 24, 2013 at 11:46 UTC

    Unless you are doing this only as a learning exercise, I think the main question you should answer for yourself is: is it actually necessary? I am not quite sure, but I think EnsEMBL, UniProt, and Entrez between them already have a lot, perhaps even all, of the things that the PyCon quote mentions. EBI, EMBL, and NCBI all have large teams working on data, websites, and interfaces. Make sure you find a useful addition or niche before you even start working on it.

    You also omit to mention what this would be used for. Who is the audience? What kind of queries do you want to answer? How fast, how many, and how reliably?

    In general, I think the size of the source databases involved (ChEBI, GO, human and perhaps other genomes) would make a database necessary, unless you envisage a system that makes use of their respective APIs.

    I don't think an ORM (or which ORM) should be your first concern. Such things are pretty much downstream of the above questions.

    A last consideration: have you contacted the poster presenters? Most people love it when others show interest in their work, and they may well be willing to share it. Even if you don't use it, it may give you good ideas.

      I would agree. The web framework (Catalyst or something else) and the ORM (DBIx::Class or something else) are fairly mundane. The really fun part is the "semantically related" bit. Bringing together diverse data sets and trying to tease apart similar stuff can be either a lot of fun or a pure nightmare ... it all depends on your definition of similar.
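
To make the "definition of similar" point concrete, here is a minimal sketch in core Perl. It assumes two tiny, made-up record sets (the drug and trial names are hypothetical placeholders, not real database entries) and one deliberately naive definition of similar: the same gene symbol after trimming whitespace and upper-casing. The helper names normalize_symbol and link_by_gene are invented for illustration. Real sources would need synonym tables and stable identifiers rather than string matching.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample records standing in for two unrelated sources.
# Real dumps would be far larger and use their own field names.
my @drug_targets = (
    { drug => 'DrugA', gene => 'brca1' },
    { drug => 'DrugB', gene => ' TP53 ' },
);
my @trials = (
    { trial => 'Trial-1', gene => 'BRCA1' },
    { trial => 'Trial-2', gene => 'TP53' },
    { trial => 'Trial-3', gene => 'EGFR' },
);

# Assumed definition of "similar": identical gene symbol once
# surrounding whitespace is removed and case is ignored.
sub normalize_symbol {
    my ($symbol) = @_;
    $symbol =~ s/^\s+|\s+$//g;
    return uc $symbol;
}

# Index the trials by normalized symbol, then look up each drug's gene.
# Returns a hash ref: drug name => array ref of matching trial names.
sub link_by_gene {
    my ($drugs, $trials) = @_;

    my %trials_for;
    for my $rec (@$trials) {
        push @{ $trials_for{ normalize_symbol($rec->{gene}) } }, $rec->{trial};
    }

    my %linked;
    for my $rec (@$drugs) {
        my $key = normalize_symbol($rec->{gene});
        $linked{ $rec->{drug} } = [ @{ $trials_for{$key} || [] } ];
    }
    return \%linked;
}

my $linked = link_by_gene(\@drug_targets, \@trials);
for my $drug (sort keys %$linked) {
    my $matches = join(', ', @{ $linked->{$drug} }) || '(none)';
    print "$drug -> $matches\n";
}
```

Changing normalize_symbol changes which records count as related, which is exactly where the fun (or the nightmare) comes in.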

Re: Using Perl in a personalized medicine application.
by Your Mother (Archbishop) on Jul 24, 2013 at 16:36 UTC

    Of possible further interest

    Not related to genetics analysis or everything you mention, but you might look into UMLS::Similarity and UMLS::Interface. They aren't drop-in: you have to have some MeSH stuff locally (requires registration and involves some very large files) and understand quite a bit of licensing (mostly open but not all). It's a technical challenge to set up, but it's extremely interesting and powerful for what it does if you take the trouble.