Re: Perl code for reverse complement of DNA sequence

Welcome to the Monastery.

[I appreciate this is your first post here. For future reference, please read "How do I post a question effectively?". Also consider the comments others have made in this thread.]

For demonstration purposes, I've truncated your DNA sequences to 20 characters. I believe what you're after is code something like this (pm_11131708_fasta_rev_comp.pl):

#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

my $in_file = 'pm_11131708_in';
my $out_file = 'pm_11131708_out';

{
    open my $in_fh, '<', $in_file;
    open my $out_fh, '>', $out_file;

    while (<$in_fh>) {
        if (0 == index $_, '>') {
            print $out_fh $_;
        }
        else {
            chomp;
            my $rev = scalar reverse;
            $rev =~ y/ATGC/TACG/;
            print $out_fh "$rev\n";
        }
    }
}
[download]

Sample run:

$ cat pm_11131708_in
>adbca3e
TGCTCCCCACGCTTGCCTCT
>4c2a958
TCCCCACGCTTTCGCGCTTC
>0639b5b
TCGCGCCTCAGTGTCCAACG

$ ./pm_11131708_fasta_rev_comp.pl

$ cat pm_11131708_out
>adbca3e
AGAGGCAAGCGTGGGGAGCA
>4c2a958
GAAGCGCGAAAGCGTGGGGA
>0639b5b
CGTTGGACACTGAGGCGCGA
[download]

See also: autodie, open, index and reverse.

— Ken

Comment on Re: Perl code for reverse complement of DNA sequence Select or Download Code

Replies are listed 'Best First'.
Re^2: Perl code for reverse complement of DNA sequence by AnomalousMonk (Archbishop) on Apr 27, 2021 at 18:54 UTC
My understanding, feeble as it is, of the FASTA file format is that the base sequence portion of a FASTA record in such a file is of indeterminate length and can be multi-line, e.g., something like: `... >4c2a958... TCCCCACGCTTTCGCGCTTC... TAGTAGTAGTAGTAG... ... >0639b5b... ...` [download] I think it should be noted that your solution++ reverse-complements the DNA sequence on a line-by-line basis, so the entire DNA sequence of a FASTA record (possibly many lines of varying length) will not, as I understand it, be correctly processed. Give a man a fish: `<%-{-{-{-<`	[reply] [d/l] [select]
Re^3: Perl code for reverse complement of DNA sequence by kcott (Archbishop) on Apr 27, 2021 at 23:51 UTC
The DNA sequences could be spread over multiple lines; however, the OP sample data only showed single lines. My code dealt with the sample data that was presented. Records could start with a semicolon. The OP sample data did not show any such records, so I made no attempt to process that either. There could be errors in the input — for instance, illegally embedded blank lines or letters that didn't represent a nucleotide — which I made no attempt to validate. I made it clear that the solution I presented was "code something like this". It provides the basic elements of what is required based on what was asked. If there are additional, or different, requirements which the OP cannot code, that should form a follow-up question. — Ken	[reply]


laziness, impatience, and hubris
	PerlMonks