PerlMonks
Re: renaming 1000's of FASTA files
by sundialsvc4 (Abbot) on Jul 11, 2011 at 20:30 UTC ( [id://913769] )
I agree that SQLite is quite probably “the thing to do” here. I would put each of the sequences into a table, along with (in a separate column) whatever key value you happen to be looking for: whatever (I’m not a biologist...) “identifies” the sequence, whether or not it is unique. If there are many keys that might identify a particular sequence, “okay, big deal, you have a many-to-one relationship,” which a second table of keys handles.

SQLite gives you a quite-robust SQL implementation, in a single file, without a server. (Really, the only “gotcha” is transactions: when you are changing data, wrap the changes in an explicit transaction. Otherwise every statement is committed on its own, and you’ll really regret it speed-wise.) Then build non-unique indexes on the key columns.

Once you have the data in this database form, I suspect that a lot of your “difficult processing” will be reduced to SQL queries that you might not even require a Perl program to run. SQLite is quite the tool. I really beat the heck out of it once, or tried to: 31 million rows at one point, and it still wasn’t flinching.
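To make the layout concrete, here is a minimal sketch of the idea, not a definitive implementation. It is written in Python only because the `sqlite3` module ships with the interpreter; the same SQL should work from Perl through DBI and DBD::SQLite. The table names (`sequences`, `seq_keys`), the index name, the helper functions, the sample data, and the choice of "first word of the FASTA header" as the key are all assumptions made for illustration; substitute whatever actually identifies your sequences.

```python
import sqlite3

# Hypothetical sample data; note that the key "seq1" appears twice on purpose.
SAMPLE = """\
>seq1 Homo sapiens
ACGT
TTGA
>seq2 Mus musculus
GGCC
>seq1 duplicate key
AAAA
"""

def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def build_db(conn, records):
    """Load records into two tables, then add a non-unique index on the key."""
    conn.execute(
        "CREATE TABLE sequences (id INTEGER PRIMARY KEY, header TEXT, seq TEXT)"
    )
    conn.execute(
        "CREATE TABLE seq_keys (key TEXT, seq_id INTEGER REFERENCES sequences(id))"
    )
    # All inserts run inside ONE transaction; "with conn" commits at the end
    # (or rolls back on an exception).
    with conn:
        for header, seq in records:
            cur = conn.execute(
                "INSERT INTO sequences (header, seq) VALUES (?, ?)", (header, seq)
            )
            # Assumption: the key is the first whitespace-delimited header token.
            words = header.split()
            key = words[0] if words else header
            conn.execute(
                "INSERT INTO seq_keys (key, seq_id) VALUES (?, ?)",
                (key, cur.lastrowid),
            )
    # Plain (non-unique) index, so duplicate keys are allowed.
    conn.execute("CREATE INDEX idx_seq_keys_key ON seq_keys (key)")

def find_by_key(conn, key):
    """Return every (header, seq) row filed under the given key."""
    return conn.execute(
        "SELECT s.header, s.seq FROM sequences s "
        "JOIN seq_keys k ON k.seq_id = s.id "
        "WHERE k.key = ? ORDER BY s.id",
        (key,),
    ).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path for a real database
    build_db(conn, parse_fasta(SAMPLE))
    for header, seq in find_by_key(conn, "seq1"):
        print(header, seq)
```

Because the index is not unique, looking up `seq1` returns both records that share that key. Once the data sits in a file-backed database, the same `SELECT` can also be run from the `sqlite3` command-line shell with no program at all.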