PerlMonks  

Re^10: how to read input from a file, one section at a time?

by davi54 (Sexton)
on Apr 02, 2019 at 15:32 UTC


in reply to Re^9: how to read input from a file, one section at a time?
in thread how to read input from a file, one section at a time?

Hi,

My apologies for not being clear. Multiple proteins can have different headers but identical sequence information. When I say duplicate entries, I mean the actual sequence (not the header). I want the script to read the input file, identify whether more than one entry has the same sequence information, and print those entries. Does that help? Again, sorry for the confusion, and thank you for your help.

Re^11: how to read input from a file, one section at a time?
by poj (Abbot) on Apr 02, 2019 at 15:43 UTC

    Try

    my %fasta_seen;
    FASTA_RECORD:
    while ( my $para = <$PROTFILE> ) {
        # Remove fasta header line
        if ( $para =~ s/^>(.*)//m ){
            $name = $1;
        };
        # Remove comment line(s)
        $para =~ s/^\s*#.*//mg;
        next FASTA_RECORD if $fasta_seen{ $para }++;
        …

    This may not be a sensible solution if your sequences are very long, since each full sequence is stored as a hash key. In that case, consider keying on a message digest instead, for example with Digest::MD5.

    poj
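    The digest suggestion above could look roughly like the following. This is a minimal sketch, not poj's code: the in-memory sample data, the paragraph-mode setting (`$/ = ''`), the whitespace stripping, and the `@duplicates` array are assumptions added for illustration. `md5_hex` is the function exported by Digest::MD5.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical sample data standing in for the real input file.
my $data = join "\n",
    '>protA', 'MKTAYIAK', 'QRQISFVK', '',
    '>protB', 'MKTAYIAK', 'QRQISFVK', '',
    '>protC', 'GSHMLEDP', '';

open my $PROTFILE, '<', \$data or die "Cannot open in-memory file: $!";

my %fasta_seen;    # digest of sequence => number of times seen
my @duplicates;    # headers of records whose sequence was already seen

{
    # Assumption: records are separated by blank lines (paragraph mode).
    local $/ = '';
    while ( my $para = <$PROTFILE> ) {
        my $name = '';
        # Remove the FASTA header line, remembering the name
        if ( $para =~ s/^>(.*)//m ) { $name = $1 }
        # Remove comment line(s)
        $para =~ s/^\s*#.*//mg;
        # Strip whitespace so line wrapping does not affect the comparison
        $para =~ s/\s+//g;
        # Key on the 32-character hex digest instead of the full sequence
        my $digest = md5_hex($para);
        push @duplicates, $name if $fasta_seen{$digest}++;
    }
}

print "DUPLICATE : $_\n" for @duplicates;
```

    With the sample data this reports protB only, because protA is the first record carrying that sequence. The hash keys stay a fixed size however long the sequences are; the trade-off is that the sequence itself is no longer available from the hash, and two different sequences sharing a digest would be treated as duplicates (very unlikely for ordinary data, but not impossible).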
      And how do I print the duplicate entries?
        #next FASTA_RECORD if $fasta_seen{ $para }++;
        if ( $fasta_seen{ $para }++ ){
            print "DUPLICATE : $name \n $para\n";
            next FASTA_RECORD;
        }
        poj
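The snippet above prints the second and later copies of a sequence as they are read, so the first header carrying that sequence is never shown. If every header sharing a sequence should be listed, one option is to collect headers per sequence and report after reading the whole file. The following is a rough sketch under assumptions: the sample data, paragraph mode, and the names `%headers_for`, `@order`, and `@report` are hypothetical and not taken from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real input file.
my $data = join "\n",
    '>protA', 'MKTAYIAK', 'QRQISFVK', '',
    '>protB', 'MKTAYIAK', 'QRQISFVK', '',
    '>protC', 'GSHMLEDP', '';

open my $PROTFILE, '<', \$data or die "Cannot open in-memory file: $!";

my %headers_for;    # sequence => [ headers that share it ]
my @order;          # sequences in the order first seen

{
    # Assumption: records are separated by blank lines (paragraph mode).
    local $/ = '';
    while ( my $para = <$PROTFILE> ) {
        # Remove the FASTA header line; skip records without one
        next unless $para =~ s/^>(.*)//m;
        my $name = $1;
        # Remove comment line(s), then all whitespace
        $para =~ s/^\s*#.*//mg;
        $para =~ s/\s+//g;
        push @order, $para unless exists $headers_for{$para};
        push @{ $headers_for{$para} }, $name;
    }
}

# Keep only sequences that occur under more than one header
my @report;
for my $seq (@order) {
    my @names = @{ $headers_for{$seq} };
    next if @names < 2;
    push @report, join( ', ', @names ) . " : $seq";
}

print "DUPLICATE : $_\n" for @report;
```

For the sample data this prints one line, `DUPLICATE : protA, protB : MKTAYIAKQRQISFVK`. The cost is that all sequences are held in memory until the end, so for very long sequences the digest approach mentioned earlier in the thread may be the better key.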
