Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Re: Perl code for reverse complement of DNA sequence

by kcott (Archbishop)
on Apr 27, 2021 at 10:07 UTC ( [id://11131755]=note: print w/replies, xml ) Need Help??


in reply to Perl code for reverse complement of DNA sequence

G'day mashuk,

Welcome to the Monastery.

[I appreciate this is your first post here. For future reference, please read "How do I post a question effectively?". Also consider the comments others have made in this thread.]

For demonstration purposes, I've truncated your DNA sequences to 20 characters. I believe what you're after is code something like this (pm_11131708_fasta_rev_comp.pl):

#!/usr/bin/env perl use strict; use warnings; use autodie; my $in_file = 'pm_11131708_in'; my $out_file = 'pm_11131708_out'; { open my $in_fh, '<', $in_file; open my $out_fh, '>', $out_file; while (<$in_fh>) { if (0 == index $_, '>') { print $out_fh $_; } else { chomp; my $rev = scalar reverse; $rev =~ y/ATGC/TACG/; print $out_fh "$rev\n"; } } }

Sample run:

$ cat pm_11131708_in >adbca3e TGCTCCCCACGCTTGCCTCT >4c2a958 TCCCCACGCTTTCGCGCTTC >0639b5b TCGCGCCTCAGTGTCCAACG $ ./pm_11131708_fasta_rev_comp.pl $ cat pm_11131708_out >adbca3e AGAGGCAAGCGTGGGGAGCA >4c2a958 GAAGCGCGAAAGCGTGGGGA >0639b5b CGTTGGACACTGAGGCGCGA

See also: autodie, open, index and reverse.

— Ken

Replies are listed 'Best First'.
Re^2: Perl code for reverse complement of DNA sequence
by AnomalousMonk (Archbishop) on Apr 27, 2021 at 18:54 UTC

    My understanding, feeble as it is, of the FASTA file format is that the base sequence portion of a FASTA record in such a file is of indeterminate length and can be multi-line, e.g., something like:

    ... >4c2a958... TCCCCACGCTTTCGCGCTTC... TAGTAGTAGTAGTAG... ... >0639b5b... ...

    I think it should be noted that your solution++ reverse-complements the DNA sequence on a line-by-line basis, so the entire DNA sequence of a FASTA record (possibly many lines of varying length) will not, as I understand it, be correctly processed.


    Give a man a fish:  <%-{-{-{-<

      The DNA sequences could be spread over multiple lines; however, the OP sample data only showed single lines. My code dealt with the sample data that was presented.

      Records could start with a semicolon. The OP sample data did not show any such records, so I made no attempt to process that either.

      There could be errors in the input — for instance, illegally embedded blank lines or letters that didn't represent a nucleotide — which I made no attempt to validate.

      I made it clear that the solution I presented was "code something like this". It provides the basic elements of what is required based on what was asked. If there are additional, or different, requirements which the OP cannot code, that should form a follow-up question.

      — Ken

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11131755]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (2)
As of 2024-04-19 21:43 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found