Re: regular expression questions (from someone without experience)

by biohisham (Priest)
on Sep 22, 2010 at 14:54 UTC ([id://861298])


in reply to regular expression questions (from someone without experience)

Well, the miRBase data format is spooky. What I'd do first is clean up the file by removing the lines that have no value for the analysis at hand. You can keep the original file intact and generate the cleaned-up file(s) from it; each cleaned file can hold its own subset of the original and address its own subproblem, and together they lead to the overall analytical goal. (N.B. You haven't mentioned what you intend to do with the file sections you want captured.)
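
A minimal sketch of that clean-up step, assuming (since the thread does not say which lines are uninteresting) that blank lines and lines made up only of '*' and spaces are the ones to drop. The names is_unwanted and clean_file and the file names in the usage comment are hypothetical; the original file is only read, never modified.

```perl
use strict;
use warnings;

# Hypothetical rule: a line is unwanted if it is blank or consists only of
# '*' and whitespace (an alignment identity line). Adjust to your own data.
sub is_unwanted {
    my ($line) = @_;
    return $line =~ /^[\s*]*$/;
}

# Copy the wanted lines from $in_path to $out_path and return how many
# lines were kept. The input file is opened read-only.
sub clean_file {
    my ($in_path, $out_path) = @_;
    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    my $kept = 0;
    while (my $line = <$in>) {
        next if is_unwanted($line);
        print {$out} $line;
        $kept++;
    }
    close $in;
    close $out or die "Cannot close $out_path: $!";
    return $kept;
}

# Usage (file names are placeholders):
# my $kept = clean_file('mirbase_raw.txt', 'mirbase_clean.txt');
```

Swapping in a different is_unwanted gives a different cleaned subset from the same untouched original, one per subproblem.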

I have to disagree with Moritz's reliance on the '*' to separate the records (which arose from the OP's description), because the '*'s here have a different meaning altogether and aren't record separators at all. They mark the positions where two or more lines of letters are identical at the character level; this is known as Sequence Alignment. So if the sequences weren't identical at any position, no '*' would appear and two records could be inadvertently fused, and if an alignment appeared mid-record, a record could be split in two without anyone noticing. On a related note, the '-' is used to represent alignment gaps.

      gap
      |
      v
TTCCAG-CCAGCTTTGTGACT-CTA
TTCCAGCCCAGCTTTATGACT-GTA
TTCCAGCCCAGCTTCTTCGCT-CTG
****** ******   *  *
^
|
identity

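
To make the meaning of '*' concrete, here is a rough sketch that recomputes the identity line for the three sequences above. identity_line is a hypothetical helper, and treating gap columns as non-identical is an assumption of this sketch; the output of a real alignment tool (or a hand-drawn illustration) may mark columns slightly differently.

```perl
use strict;
use warnings;

# Build the identity line for equal-length aligned sequences:
# '*' where every sequence has the same residue, ' ' otherwise.
# Assumption: a column whose first residue is a gap ('-') is never starred.
sub identity_line {
    my @seqs = @_;
    my $len  = length $seqs[0];
    my $line = '';
    for my $i (0 .. $len - 1) {
        my $first = substr $seqs[0], $i, 1;
        # True only if no sequence differs from the first at this column.
        my $same = $first ne '-'
            && !grep { substr($_, $i, 1) ne $first } @seqs;
        $line .= $same ? '*' : ' ';
    }
    return $line;
}

my @aligned = qw(
    TTCCAG-CCAGCTTTGTGACT-CTA
    TTCCAGCCCAGCTTTATGACT-GTA
    TTCCAGCCCAGCTTCTTCGCT-CTG
);

# Print the alignment followed by its computed identity line.
print "$_\n" for @aligned;
print identity_line(@aligned), "\n";
```

The stars depend entirely on the sequences that happen to sit next to each other, which is why a line of '*' cannot be trusted as a record boundary.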
Back to the topic: refining the file by purging the unwanted lines will probably let you use one of the BioPerl modules to tackle the entire problem without writing much code. It would also give us a clearer definition of the problem, so that we can provide relevant assistance.

You may want to read Perl and Bioinformatics in addition.


Excellence is an Endeavor of Persistence. A Year-Old Monk :D .
