Re: Perl and Bioinformatics

Guys

I enjoyed the node. Here is a suggested topic that would be worth pursuing for the biological crowd:

Data Structures. So BioPerl gives you some pretty cool data structures that are easy to handle. It's when you run into custom structures that you get problems. For instance, if I'm working with E. coli, I have ~5e6 bp of DNA, or 1e7 positions if I'm working on each nucleotide on both strands. How do I manage an analysis that needs to annotate every base, e.g. working with coverage from next-gen sequencing? Using arrays or hashes gets ugly because a separate Perl scalar per base adds up fast and you will typically run out of memory.

I'm not aware of an out-of-the-box BioPerl solution, though I could stand to be corrected. You could use pack and unpack. You could use DB_File. You might even go to Berkeley DB. But the problem is general enough that it would be useful to see one or more tutorials on what to do for these larger analysis problems that are beyond simple scripts and not necessarily part of the BioPerl toolbox.
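To make the packed-storage idea concrete, here is a minimal sketch using Perl's built-in vec, a close relative of the pack/unpack approach, to keep one 16-bit depth counter per base in a single string. The sub names (new_coverage, add_read, depth_at), the 16-bit counter width, and the toy read coordinates are my own assumptions for illustration, not anything from BioPerl.

```perl
use strict;
use warnings;

# One unsigned 16-bit counter per base, packed into a single string.
# For ~5e6 bp that is about 10 MB per strand; a plain Perl array
# typically costs tens of bytes per element instead of two.
my $BITS = 16;                 # assumed counter width
my $MAX  = 2**$BITS - 1;       # counters saturate here instead of wrapping

# Allocate a zeroed coverage string for a sequence of $length bases.
sub new_coverage {
    my ($length) = @_;
    return "\0" x ( $length * $BITS / 8 );
}

# Add one read covering positions $start..$end (0-based, inclusive).
sub add_read {
    my ( $cov_ref, $start, $end ) = @_;
    for my $pos ( $start .. $end ) {
        my $depth = vec( $$cov_ref, $pos, $BITS );
        vec( $$cov_ref, $pos, $BITS ) = $depth + 1 if $depth < $MAX;
    }
    return;
}

# Look up the depth at a single position.
sub depth_at {
    my ( $cov_ref, $pos ) = @_;
    return vec( $$cov_ref, $pos, $BITS );
}

# Toy example: a 100 bp "genome" with two overlapping reads.
my $cov = new_coverage(100);
add_read( \$cov, 10, 19 );
add_read( \$cov, 15, 24 );
printf "depth at 12: %d, at 17: %d, at 30: %d\n",
    depth_at( \$cov, 12 ), depth_at( \$cov, 17 ), depth_at( \$cov, 30 );
```

The coverage string is passed by reference so it is never copied. If even the packed string is too big to hold in memory, the same interface could probably sit on top of an on-disk store instead, but I have not tried that here.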

MadraghRua
yet another biologist hacking perl....


Replies are listed 'Best First'.

Re^2: Perl and Bioinformatics
by BioLion (Curate) on Feb 19, 2010 at 15:27 UTC

I think this is a really good suggestion - certainly a topic that is becoming more and more relevant. We tried to touch on this in the text. Not to knock BioPerl, but its objects are generally huge (even for simple things), mainly because they all need to mesh well together and stay backwards compatible, amongst other things. As biohisham says, this interoperability of the whole suite is its greatest strength, but it can certainly be a weakness too.

I have generally taken to rolling my own stripped down objects, and using caching when things get really hairy.

I asked a question on this sort of topic before ( Storable Objects ), and for that problem I did end up setting 'store-points' where I would cache the appropriate info at certain critical points. This worked, but certainly isn't applicable to all cases, especially ones like you mention where the processing isn't so linear.
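For anyone who hasn't tried it, a 'store-point' can be as simple as the sketch below, which uses the core Storable module. The helper name load_or_compute, the checkpoint file name, and the toy data are made up for illustration; this is the general idea rather than the exact code I used.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);

# Temporary directory so the example cleans up after itself.
my $dir        = tempdir( CLEANUP => 1 );
my $checkpoint = "$dir/stage1.sto";    # hypothetical checkpoint file

# Stand-in for a slow parsing or analysis stage (toy data).
sub expensive_stage {
    return { gene_count => 3, genes => [qw(thrA thrB thrC)] };
}

# Return the cached result if a checkpoint exists; otherwise run
# the stage, save its result to disk, and return it.
sub load_or_compute {
    my ( $file, $compute ) = @_;
    return retrieve($file) if -e $file;
    my $data = $compute->();
    nstore( $data, $file );
    return $data;
}

# First call computes and stores; second call only reads the file.
my $first  = load_or_compute( $checkpoint, \&expensive_stage );
my $second = load_or_compute( $checkpoint, sub { die "should not rerun" } );
print "genes after reload: @{ $second->{genes} }\n";
```

This only pays off when the pipeline has clear stage boundaries, which is why it fits linear processing better than the per-base case.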

If you have an example problem (and solutions you tried), please post it here. It would be good to get discussion going - as I said, I think this is a very relevant problem.

Just a something something...

