PerlMonks  

Re: renaming 1000's of FASTA files

by sundialsvc4 (Abbot)
on Jul 11, 2011 at 20:30 UTC ( #913769=note )


in reply to renaming 1000's of FASTA files

I agree that SQLite is quite probably “the thing to do” here. I would put each of the sequences into a table, along with (in a separate column) whatever key value you happen to be looking for: whatever (I’m not a biologist...) “identifies” the sequence, whether or not it is unique.

If there are many keys that might identify a particular sequence, “okay, big deal, you have a many-to-one relationship.” A second table that maps each key to its sequence handles that.
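To make the table layout concrete, here is a minimal sketch using Python's built-in sqlite3 module (the same SQL should work from any SQLite client). The table names `sequences` and `seq_keys`, the column names, and the sample data are all made up for illustration; nothing in the original question dictates them.

```python
import sqlite3

# In-memory database for illustration; pass a file name to keep it on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per sequence, plus a key table so that
# several keys can point at the same sequence (many-to-one).
conn.executescript("""
    CREATE TABLE sequences (
        seq_id   INTEGER PRIMARY KEY,
        header   TEXT NOT NULL,   -- original FASTA header line
        residues TEXT NOT NULL    -- the sequence itself
    );
    CREATE TABLE seq_keys (
        key_value TEXT NOT NULL,  -- whatever "identifies" the sequence
        seq_id    INTEGER NOT NULL REFERENCES sequences(seq_id)
    );
""")

# Made-up sample data: one sequence reachable through two keys.
cur = conn.execute(
    "INSERT INTO sequences (header, residues) VALUES (?, ?)",
    ("example_header_1", "ACGTACGT"),
)
seq_id = cur.lastrowid
conn.executemany(
    "INSERT INTO seq_keys (key_value, seq_id) VALUES (?, ?)",
    [("keyA", seq_id), ("keyB", seq_id)],
)
conn.commit()

# Look sequences up by key through a join.
rows = conn.execute(
    "SELECT k.key_value, s.residues "
    "FROM seq_keys k JOIN sequences s USING (seq_id) "
    "ORDER BY k.key_value"
).fetchall()
print(rows)
```

If every sequence only ever has one key, the second table is unnecessary and the key can simply live in its own column of `sequences`.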

Anyhow ... SQLite gives you a quite robust SQL implementation, in a single file, without a server. (Really, the only “gotcha” that it has is transactions: when you are changing data, you need to be in an explicit transaction while doing so. Otherwise every statement is committed on its own, and you’ll really regret it speed-wise.)
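A rough sketch of the transaction point, again with Python's sqlite3 module and a made-up `sequences` table filled with placeholder records. The idea is simply that all the inserts are committed once, at the end, rather than one commit per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (header TEXT, residues TEXT)")

# Placeholder records standing in for parsed FASTA entries (hypothetical data).
records = [(f"seq_{i}", "ACGT" * 5) for i in range(10_000)]

# Using the connection as a context manager commits once when the block
# finishes, or rolls everything back if an exception is raised inside it.
with conn:
    conn.executemany(
        "INSERT INTO sequences (header, residues) VALUES (?, ?)",
        records,
    )

count = conn.execute("SELECT COUNT(*) FROM sequences").fetchone()[0]
print(count)
```

With a database on disk, committing once per batch instead of once per row is where the speed difference shows up, because each commit has to wait for the data to reach the disk.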

Now, build non-unique indexes on the key columns you intend to search by.
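For illustration, a non-unique index is just a plain `CREATE INDEX` without the `UNIQUE` keyword. The sketch below assumes a hypothetical `sequences` table with a `key_value` column and an index named `idx_sequences_key`, then asks SQLite how it plans to run a lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (key_value TEXT, residues TEXT)")

# No UNIQUE keyword, so repeated key values are allowed.
conn.execute("CREATE INDEX idx_sequences_key ON sequences (key_value)")

# Made-up rows; note that keyA appears twice.
conn.executemany(
    "INSERT INTO sequences (key_value, residues) VALUES (?, ?)",
    [("keyA", "ACGT"), ("keyA", "TTGA"), ("keyB", "GGCC")],
)
conn.commit()

matches = conn.execute(
    "SELECT residues FROM sequences WHERE key_value = ? ORDER BY residues",
    ("keyA",),
).fetchall()

# EXPLAIN QUERY PLAN reports how SQLite intends to execute the lookup;
# the last column of each row is a human-readable description.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT residues FROM sequences WHERE key_value = ?",
    ("keyA",),
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(matches)
print(plan_text)
```

The plan text should mention the index name, which is a quick way to confirm that a lookup is not scanning the whole table.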

Once you have the data in this database form, I suspect that a lot of your “difficult processing” will be reduced to SQL queries that you might not even require a Perl program to run.
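As an example of the kind of processing that collapses into a query, here is a sketch that counts the sequences per key and finds the longest one for each. The table, columns, and data are hypothetical; the SQL statement is the interesting part and could equally be typed into the sqlite3 command-line shell.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (key_value TEXT, residues TEXT)")

# Made-up rows for illustration.
conn.executemany(
    "INSERT INTO sequences (key_value, residues) VALUES (?, ?)",
    [("keyA", "ACGT"), ("keyA", "ACGTAC"), ("keyB", "GG")],
)
conn.commit()

# One query replaces a hand-written loop over the files:
# per key, how many sequences there are and how long the longest is.
summary = conn.execute(
    "SELECT key_value, COUNT(*), MAX(LENGTH(residues)) "
    "FROM sequences GROUP BY key_value ORDER BY key_value"
).fetchall()
print(summary)
```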

SQLite is quite the tool ... I really beat the heck out of it once, or tried to.   It didn’t flinch.   31 million rows at one point, and it still wasn’t flinching.
