Perl code for reverse complement of DNA sequence

mashuk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Perl code for reverse complement of DNA sequence (updated) by AnomalousMonk (Archbishop) on Apr 25, 2021 at 19:17 UTC
If I understand correctly, the reverse complement of a DNA sequence (or string, as it is in Perl) is literally the reverse of the complement of the sequence/string (update: or maybe it's the complement of the reverse :). See reverse. `Win8 Strawberry 5.30.3.1 (64) Sun 04/25/2021 15:11:49 C:\@Work\Perl\monks >perl use 5.014; # /r modifier for tr///anslation & s///ubstitution use strict; use warnings; my $s = 'ACTGTTCCAAGG'; print "'$s' \n"; $s = reverse $s =~ tr/ATCG/TAGC/r; print "'$s' \n"; ^Z 'ACTGTTCCAAGG' 'CCTTGGAACAGT'` [download] If your version of Perl does not support the `/r` translation/substitution modifier (version 5.14+), let us know and alternate code can easily be given. Update: The result above is also returned for the given input by the `http://www.bioinformatics.org/sms/rev_comp.html` ~~website~~ \| webpage. I'm not a bio-guy, so I hope this means I understand the problem somewhat and the code above is at least minimally useful. Give a man a fish: `<%-{-{-{-<`	[reply] [d/l] [select]
Re: Perl code for reverse complement of DNA sequence by kcott (Archbishop) on Apr 27, 2021 at 10:07 UTC
G'day mashuk, Welcome to the Monastery. [I appreciate this is your first post here. For future reference, please read "How do I post a question effectively?". Also consider the comments others have made in this thread.] For demonstration purposes, I've truncated your DNA sequences to 20 characters. I believe what you're after is code something like this (`pm_11131708_fasta_rev_comp.pl`): `#!/usr/bin/env perl use strict; use warnings; use autodie; my $in_file = 'pm_11131708_in'; my $out_file = 'pm_11131708_out'; { open my $in_fh, '<', $in_file; open my $out_fh, '>', $out_file; while (<$in_fh>) { if (0 == index $_, '>') { print $out_fh $_; } else { chomp; my $rev = scalar reverse; $rev =~ y/ATGC/TACG/; print $out_fh "$rev\n"; } } }` [download] Sample run: `$ cat pm_11131708_in >adbca3e TGCTCCCCACGCTTGCCTCT >4c2a958 TCCCCACGCTTTCGCGCTTC >0639b5b TCGCGCCTCAGTGTCCAACG $ ./pm_11131708_fasta_rev_comp.pl $ cat pm_11131708_out >adbca3e AGAGGCAAGCGTGGGGAGCA >4c2a958 GAAGCGCGAAAGCGTGGGGA >0639b5b CGTTGGACACTGAGGCGCGA` [download] See also: autodie, open, index and reverse. — Ken	[reply] [d/l] [select]
Re^2: Perl code for reverse complement of DNA sequence by AnomalousMonk (Archbishop) on Apr 27, 2021 at 18:54 UTC
My understanding, feeble as it is, of the FASTA file format is that the base sequence portion of a FASTA record in such a file is of indeterminate length and can be multi-line, e.g., something like: `... >4c2a958... TCCCCACGCTTTCGCGCTTC... TAGTAGTAGTAGTAG... ... >0639b5b... ...` [download] I think it should be noted that your solution++ reverse-complements the DNA sequence on a line-by-line basis, so the entire DNA sequence of a FASTA record (possibly many lines of varying length) will not, as I understand it, be correctly processed. Give a man a fish: `<%-{-{-{-<`	[reply] [d/l] [select]
Re^3: Perl code for reverse complement of DNA sequence by kcott (Archbishop) on Apr 27, 2021 at 23:51 UTC
The DNA sequences could be spread over multiple lines; however, the OP sample data only showed single lines. My code dealt with the sample data that was presented. Records could start with a semicolon. The OP sample data did not show any such records, so I made no attempt to process that either. There could be errors in the input — for instance, illegally embedded blank lines or letters that didn't represent a nucleotide — which I made no attempt to validate. I made it clear that the solution I presented was "code something like this". It provides the basic elements of what is required based on what was asked. If there are additional, or different, requirements which the OP cannot code, that should form a follow-up question. — Ken	[reply]
Re: Perl code for reverse complement of DNA sequence by BillKSmith (Monsignor) on Apr 26, 2021 at 15:51 UTC
You would receive a lot more answers if you asked your question in the vocabulary of perl rather than biology. Provide a link to the definition of '.fna file'. Explain which features of that format are applicable. Most of your readers will not understand the word 'complement' the way you mean it. I am not sure about 'reverse'. Please post a working example that demonstrates the failure your describe. Include the output that it produces and the output that you intend. Bill	[reply]
Re^2: Perl code for reverse complement of DNA sequence by karlgoethebier (Abbot) on Apr 26, 2021 at 19:38 UTC
Probably the OP means this? I have no idea what this means. Best regards, Peter �The Crux of the Biscuit is the Apostrophe�	[reply]
Re: Perl code for reverse complement of DNA sequence by jo37 (Deacon) on Apr 25, 2021 at 18:25 UTC
Crossposted to SO. Update: https://stackoverflow.com/questions/67256378/perl-code-for-reverse-complement-of-dna-sequence Greetings, -jo `$gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$`	[reply]
Re^2: Perl code for reverse complement of DNA sequence by Anonymous Monk on Apr 25, 2021 at 19:20 UTC
Crossposted to SO. ... link?	[reply]
Re: Perl code for reverse complement of DNA sequence by 1nickt (Canon) on Apr 25, 2021 at 18:35 UTC
Hi, do yourself a favour and use the SuperSearch on this site to search for many similar questions, with answers. Hope this helps! The way forward always starts with a minimal test.	[reply]


Keep It Simple, Stupid
	PerlMonks