Re: renaming 1000's of FASTA files

by sundialsvc4 (Abbot)
on Jul 11, 2011 at 20:30 UTC (#913769)

in reply to renaming 1000's of FASTA files

I agree that SQLite is quite probably “the thing to do” here. I would put each sequence into a table, with a separate column for whatever key value you happen to be looking for: whatever (I’m not a biologist...) “identifies” the sequence, whether or not that value is unique.

If several keys might identify the same sequence, “okay, big deal, you have a many-to-one relationship.” Put the keys in a second table whose rows point back at the sequence.
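To make the table layout concrete, here is a minimal sketch. It uses Python's standard sqlite3 module only because that needs no extra installs; the SQL is what matters, and it should go through Perl's DBI unchanged. The table names, column names, and the sample record are all made up for illustration.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative.
conn = sqlite3.connect(":memory:")  # pass a file path for a real database
conn.executescript("""
    CREATE TABLE sequence (
        id       INTEGER PRIMARY KEY,
        header   TEXT NOT NULL,    -- the FASTA '>' line, minus the '>'
        residues TEXT NOT NULL
    );
    CREATE TABLE sequence_key (
        seq_id INTEGER NOT NULL REFERENCES sequence(id),
        ident  TEXT NOT NULL       -- any identifier; need not be unique
    );
""")

# One made-up sequence, reachable through two different identifiers.
seq_id = conn.execute(
    "INSERT INTO sequence (header, residues) VALUES (?, ?)",
    ("sample_1 made-up record", "ACGTACGT"),
).lastrowid
conn.executemany(
    "INSERT INTO sequence_key (seq_id, ident) VALUES (?, ?)",
    [(seq_id, "sample_1"), (seq_id, "alias_A")],
)
conn.commit()

# Many identifiers, one sequence.
keys = [k for (k,) in conn.execute(
    "SELECT ident FROM sequence_key WHERE seq_id = ? ORDER BY ident",
    (seq_id,),
)]
```

If every sequence only ever has one identifier, the second table is overkill and a plain column on the first table would do.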

Anyhow ... SQLite gives you a quite-robust SQL implementation, in a single file, without a server. (Really, the only “gotcha” that it has is transactions: when you are changing data, wrap the changes in an explicit transaction. Otherwise each statement is committed to disk on its own, and you’ll really regret it speed-wise.)
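A rough sketch of the transaction advice, again in Python's standard sqlite3 module with a made-up table and made-up rows. Setting isolation_level=None switches off the module's own implicit transaction handling, so the explicit BEGIN and COMMIT are the only transaction boundaries. In Perl the DBI methods begin_work and commit play the same role.

```python
import sqlite3

# Autocommit mode: nothing is wrapped in a transaction unless we say so.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE sequence (ident TEXT, residues TEXT)")

# Made-up rows standing in for parsed FASTA records.
rows = [(f"seq_{i}", "ACGT") for i in range(10_000)]

# One transaction around the whole batch instead of one commit per row.
conn.execute("BEGIN")
try:
    conn.executemany("INSERT INTO sequence VALUES (?, ?)", rows)
    conn.execute("COMMIT")
except Exception:
    conn.execute("ROLLBACK")
    raise

count = conn.execute("SELECT COUNT(*) FROM sequence").fetchone()[0]
```

On an in-memory database the difference will not show; it is with an on-disk file that the per-statement commits hurt.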

Now, build non-unique indexes on the key columns you will be searching by.
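For illustration, a non-unique index is just CREATE INDEX without the UNIQUE keyword. The sketch below (Python's standard sqlite3 module; table, column, and index names are hypothetical) also asks SQLite for its query plan to check that a lookup by identifier would actually use the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequence_key (seq_id INTEGER, ident TEXT)")

# Plain CREATE INDEX (no UNIQUE), so duplicate identifiers stay legal.
conn.execute("CREATE INDEX idx_sequence_key_ident ON sequence_key (ident)")

# Made-up data: 'sample_1' deliberately appears twice.
conn.executemany(
    "INSERT INTO sequence_key VALUES (?, ?)",
    [(1, "sample_1"), (2, "sample_1"), (3, "sample_2")],
)
conn.commit()

# The last column of each EXPLAIN QUERY PLAN row is the readable detail.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT seq_id FROM sequence_key WHERE ident = ?",
    ("sample_1",),
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
uses_index = "idx_sequence_key_ident" in plan_text

matches = conn.execute(
    "SELECT COUNT(*) FROM sequence_key WHERE ident = ?", ("sample_1",)
).fetchone()[0]
```

The exact wording of the plan differs between SQLite versions, which is why the check only looks for the index name.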

Once you have the data in this database form, I suspect that a lot of your “difficult processing” will be reduced to SQL queries that you might not even require a Perl program to run.
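As one example of the kind of question that becomes a single query, here is a sketch that lists identifiers attached to more than one sequence. The table, columns, and data are hypothetical; the SELECT itself could equally be typed into the sqlite3 command-line shell with no program around it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequence_key (seq_id INTEGER, ident TEXT)")
# Made-up data: 'sample_1' is shared by two sequences.
conn.executemany(
    "INSERT INTO sequence_key VALUES (?, ?)",
    [(1, "sample_1"), (2, "sample_1"), (3, "sample_2")],
)
conn.commit()

# Identifiers that point at more than one sequence, with their counts.
dupes = conn.execute("""
    SELECT ident, COUNT(*) AS n
    FROM sequence_key
    GROUP BY ident
    HAVING COUNT(*) > 1
    ORDER BY ident
""").fetchall()
```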

SQLite is quite the tool ... I really beat the heck out of it once, or tried to.   It didn’t flinch.   31 million rows at one point, and it still wasn’t flinching.
