Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Perl code for reverse complement of DNA sequence

by mashuk (Initiate)
on Apr 25, 2021 at 18:02 UTC ( [id://11131708]=perlquestion: print w/replies, xml ) Need Help??

mashuk has asked for the wisdom of the Perl Monks concerning the following question:

I was trying to make a perl code to get the reverse complement of fasta sequences of DNA in a .fna file format. the sequence02C.fna file contains 100s of DNA sequences :

>adbca3e TGCTCCCCACGCTTGCCTCTCCAGTACTCAACCAAAGCAGTCTCTAGAAAAACAGTTTCCAACGCAATAC +GATGGAATTCCACTTCCCAAATATCTC >4c2a958 TCCCCACGCTTTCGCGCTTCAGCGTCAGTATCTGTCCAGTGAGCTGACTTCTCCATCGGCATTCCTACAC +AGTACTCTAGAAAAACAGTTTCTGCTC >0639b5b TCGCGCCTCAGTGTCCAACGCAATACGAGTTGCAGACCAGGACACATGGAATTCCACTTCCCTCTCCAGT +ACTCAACCAAAGCAGTCTCTAGAAAAG

output file is only the complementary of the sequence but not reverse. Additionally, it does not contain the sequenceIDs(ASVs) ">adbca3e"

Could anyone please suggest me the appropriate code to do this reverse complementary of this sequence at once? Thanks in advance.

#!/usr/local/perl open (NS, "sequence02C.fna"); while (<NS>) { if ($_ =~ tr/ATGC/TACG/) {print $_;} }

Replies are listed 'Best First'.
Re: Perl code for reverse complement of DNA sequence (updated)
by AnomalousMonk (Archbishop) on Apr 25, 2021 at 19:17 UTC

    If I understand correctly, the reverse complement of a DNA sequence (or string, as it is in Perl) is literally the reverse of the complement of the sequence/string (update: or maybe it's the complement of the reverse :). See reverse.

    Win8 Strawberry 5.30.3.1 (64) Sun 04/25/2021 15:11:49 C:\@Work\Perl\monks >perl use 5.014; # /r modifier for tr///anslation & s///ubstitution use strict; use warnings; my $s = 'ACTGTTCCAAGG'; print "'$s' \n"; $s = reverse $s =~ tr/ATCG/TAGC/r; print "'$s' \n"; ^Z 'ACTGTTCCAAGG' 'CCTTGGAACAGT'
    If your version of Perl does not support the /r translation/substitution modifier (version 5.14+), let us know and alternate code can easily be given.

    Update: The result above is also returned for the given input by the http://www.bioinformatics.org/sms/rev_comp.html website | webpage. I'm not a bio-guy, so I hope this means I understand the problem somewhat and the code above is at least minimally useful.


    Give a man a fish:  <%-{-{-{-<

Re: Perl code for reverse complement of DNA sequence
by kcott (Archbishop) on Apr 27, 2021 at 10:07 UTC

    G'day mashuk,

    Welcome to the Monastery.

    [I appreciate this is your first post here. For future reference, please read "How do I post a question effectively?". Also consider the comments others have made in this thread.]

    For demonstration purposes, I've truncated your DNA sequences to 20 characters. I believe what you're after is code something like this (pm_11131708_fasta_rev_comp.pl):

    #!/usr/bin/env perl use strict; use warnings; use autodie; my $in_file = 'pm_11131708_in'; my $out_file = 'pm_11131708_out'; { open my $in_fh, '<', $in_file; open my $out_fh, '>', $out_file; while (<$in_fh>) { if (0 == index $_, '>') { print $out_fh $_; } else { chomp; my $rev = scalar reverse; $rev =~ y/ATGC/TACG/; print $out_fh "$rev\n"; } } }

    Sample run:

    $ cat pm_11131708_in >adbca3e TGCTCCCCACGCTTGCCTCT >4c2a958 TCCCCACGCTTTCGCGCTTC >0639b5b TCGCGCCTCAGTGTCCAACG $ ./pm_11131708_fasta_rev_comp.pl $ cat pm_11131708_out >adbca3e AGAGGCAAGCGTGGGGAGCA >4c2a958 GAAGCGCGAAAGCGTGGGGA >0639b5b CGTTGGACACTGAGGCGCGA

    See also: autodie, open, index and reverse.

    — Ken

      My understanding, feeble as it is, of the FASTA file format is that the base sequence portion of a FASTA record in such a file is of indeterminate length and can be multi-line, e.g., something like:

      ... >4c2a958... TCCCCACGCTTTCGCGCTTC... TAGTAGTAGTAGTAG... ... >0639b5b... ...

      I think it should be noted that your solution++ reverse-complements the DNA sequence on a line-by-line basis, so the entire DNA sequence of a FASTA record (possibly many lines of varying length) will not, as I understand it, be correctly processed.


      Give a man a fish:  <%-{-{-{-<

        The DNA sequences could be spread over multiple lines; however, the OP sample data only showed single lines. My code dealt with the sample data that was presented.

        Records could start with a semicolon. The OP sample data did not show any such records, so I made no attempt to process that either.

        There could be errors in the input — for instance, illegally embedded blank lines or letters that didn't represent a nucleotide — which I made no attempt to validate.

        I made it clear that the solution I presented was "code something like this". It provides the basic elements of what is required based on what was asked. If there are additional, or different, requirements which the OP cannot code, that should form a follow-up question.

        — Ken

Re: Perl code for reverse complement of DNA sequence
by BillKSmith (Monsignor) on Apr 26, 2021 at 15:51 UTC
    You would receive a lot more answers if you asked your question in the vocabulary of perl rather than biology. Provide a link to the definition of '.fna file'. Explain which features of that format are applicable. Most of your readers will not understand the word 'complement' the way you mean it. I am not sure about 'reverse'. Please post a working example that demonstrates the failure your describe. Include the output that it produces and the output that you intend.
    Bill

      Probably the OP means this? I have no idea what this means. Best regards, Peter

      «The Crux of the Biscuit is the Apostrophe»

Re: Perl code for reverse complement of DNA sequence
by jo37 (Deacon) on Apr 25, 2021 at 18:25 UTC
      Crossposted to SO.

      ... link?

Re: Perl code for reverse complement of DNA sequence
by 1nickt (Canon) on Apr 25, 2021 at 18:35 UTC

    Hi, do yourself a favour and use the SuperSearch on this site to search for many similar questions, with answers.

    Hope this helps!


    The way forward always starts with a minimal test.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11131708]
Approved by marto
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others surveying the Monastery: (3)
As of 2024-04-26 00:03 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found