char substitution without regexes

biohisham has asked for the wisdom of the Perl Monks concerning the following question:

A colleague of mine is learning Perl, and he approached me with an interesting problem that had him running in circles. He has a string of DNA letters (qw(a g c t)) and wants its reverse complement: reading the string backwards and replacing every letter with its complement (i.e. a<->t, g<->c).

The book exercise requires him to do this specifically with substr and without any regex at all. After he had tried his best, I gave him the following code:

use strict;
use warnings;

my $string = "aaaggctt";
my %hash = qw(a t g c c g t a);

print "The DNA string is: \n";
print $string,"\n";
print "The reverse complement is: \n";

#One way to do it... 1st chunk..
for (my $i = length($string) - 1; $i >= 0; $i--){   # last valid index is length - 1
        for (keys %hash){
                print substr($string, $i, 1) eq $_ ? $hash{$_} : '';
                }
        }


print "\n";

#Expansion of the previous solution.. 2nd chunk..
for (my $i = length($string) - 1; $i >= 0; $i--){   # last valid index is length - 1
        for (keys %hash){
                if(substr($string, $i, 1) eq $_ ){
                        print $hash{$_};
                        }
                }
        }

I am interested in two issues. Firstly, will ye wise monks bring forward a TIMTOWTDI solution that doesn't use a substitution regex, and that doesn't necessarily use substr, so we can learn something new?

Secondly, benchmarking the two fragments showed variation in speed: sometimes the first chunk executes faster than the second, and other times slower. Briefly, I am using the Benchmark module as follows:

use Benchmark;

my $t1 = Benchmark->new;
# First chunk from the code above
my $t2 = Benchmark->new;
my $t3 = Benchmark->new;
# Second chunk from the code above
my $t4 = Benchmark->new;

# timediff takes two Benchmark objects as separate arguments
my $td1 = timediff($t2, $t1);
my $td2 = timediff($t4, $t3);

print timestr($td1),"\n",timestr($td2);
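
One plausible reason for the flip-flopping timings is that a single pass over an eight-letter string is too short to measure reliably, and printing adds I/O noise. The following is a minimal sketch, not a definitive benchmark: it assumes the two chunks are wrapped in subs (the names `chunk_ternary` and `chunk_if` are made up here), builds the result in a string instead of printing, uses a longer input, and lets Benchmark's `cmpthese` run each sub for at least one CPU second.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Longer input so each call does a measurable amount of work (an assumption).
my $string = "aaaggctt" x 100;
my %hash   = qw(a t g c c g t a);

# First chunk: ternary inside the inner loop, appending instead of printing.
sub chunk_ternary {
    my $out = '';
    for (my $i = length($string) - 1; $i >= 0; $i--) {
        for (keys %hash) {
            $out .= substr($string, $i, 1) eq $_ ? $hash{$_} : '';
        }
    }
    return $out;
}

# Second chunk: explicit if block, otherwise identical.
sub chunk_if {
    my $out = '';
    for (my $i = length($string) - 1; $i >= 0; $i--) {
        for (keys %hash) {
            if (substr($string, $i, 1) eq $_) {
                $out .= $hash{$_};
            }
        }
    }
    return $out;
}

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    ternary  => \&chunk_ternary,
    if_block => \&chunk_if,
});
```

The table that `cmpthese` prints shows the rate of each sub and the percentage difference between them, which is usually easier to interpret than two raw `timestr` lines.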

Excellence is an Endeavor of Persistence. A Year-Old Monk :D .


Replies are listed 'Best First'.
Re: char substitution without regexes by jwkrahn (Abbot) on Sep 30, 2010 at 16:38 UTC
`( my $new = reverse $string ) =~ tr/atgc/tacg/; print "$new\n";` Update: Also, if you want to remove any characters other than a, t, c or g: `( my $new = reverse $string ) =~ tr/atgc\0-\277/tacg/d; print "$new\n";`
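
For readers on Perl 5.14 or later, the same idea can be written without the intermediate assignment by using the `/r` flag of `tr///`, which returns the translated copy instead of modifying its operand. This is only a sketch: the sub name `revcomp` is made up for illustration, and handling uppercase letters is an added assumption, not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical helper; needs Perl 5.14+ for tr///r.
sub revcomp {
    my ($dna) = @_;
    # scalar(reverse ...) flips the string; tr///r returns the translated copy.
    return scalar(reverse $dna) =~ tr/atgcATGC/tacgTACG/r;
}

print revcomp("aaaggctt"), "\n";   # prints aagccttt
```

Note that `tr///` is a transliteration operator and does not involve the regex engine, even though it is bound with `=~`.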
Re: char substitution without regexes by ikegami (Patriarch) on Sep 30, 2010 at 17:12 UTC
Awww, jwkrahn beat me to it. Here's one that doesn't use tr/// either: `my %complement = qw( a t g c ); %complement = ( %complement, reverse %complement ); print map $complement{$_}, reverse split //, $chain;` `%hash` and `$string` are awful names, so I changed them.
Re^2: char substitution without regexes by jwkrahn (Abbot) on Sep 30, 2010 at 17:59 UTC
Yes, but that uses a regex, and the OP said "without regexes".
Re^3: char substitution without regexes by ikegami (Patriarch) on Sep 30, 2010 at 18:10 UTC
It passes a pattern, but it's actually special-cased not to use the regex engine. Ok, so I did it without thinking. `print map $complement{$_}, reverse unpack '(a)*', $chain;`
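
As a rough check, the two character-splitting approaches from this sub-thread can be put side by side in one script. The sketch below assumes the sample string from the original question, and the variable names `$via_split` and `$via_unpack` are made up for illustration.

```perl
use strict;
use warnings;

my $chain = "aaaggctt";

# Build a two-way lookup: a<->t and g<->c.
my %complement = qw( a t g c );
%complement = ( %complement, reverse %complement );

# Variant 1: split on the empty pattern to get single characters.
my $via_split = join '', map { $complement{$_} } reverse split //, $chain;

# Variant 2: unpack with a repeated one-character group, no pattern at all.
my $via_unpack = join '', map { $complement{$_} } reverse unpack '(a)*', $chain;

print "$via_split\n$via_unpack\n";
```

For plain ASCII input like this, both variants produce the same list of one-letter strings, so the choice mainly comes down to whether a pattern argument is acceptable under the exercise's rules.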
Re^4: char substitution without regexes by jwkrahn (Abbot) on Sep 30, 2010 at 18:16 UTC
Re^5: char substitution without regexes by ikegami (Patriarch) on Sep 30, 2010 at 18:28 UTC
Re: char substitution without regexes by oko1 (Deacon) on Sep 30, 2010 at 18:41 UTC
Good rule of thumb: whenever you think "lookup table", your next thought should be "use a hash".

my (@list, %find);
@list = qw/a t g c/;
# You can do it manually...
@find{@list} = qw/t a c g/;
# ...or programmatically:
@find{@list} = map { reverse @list[$_*2, $_*2+1] } 0..@list/2-1;
print "$_: $find{$_}\n" for @list;

Output:

a: t
t: a
g: c
c: g

-- "Language shapes the way we think, and determines what we can think about." -- B. L. Whorf
Re: char substitution without regexes by TomDLux (Vicar) on Sep 30, 2010 at 20:25 UTC
You said you wanted different thinking... ikegami shows how to generate the map for one character. Use that to generate a set of maps for all possible two-character strings, then use that to generate a set of maps for all three-character strings. When you reach the length of the input string, do a hash lookup, and voila! Obviously slower than doing one string with tr///, but maybe worth it if you have a large number of strings to process. As Occam said: Entia non sunt multiplicanda praeter necessitatem.
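
One way to read this suggestion, scaled down so the table stays small, is to precompute the reverse complement of every two-letter string and then walk the input two letters at a time. The sketch below is an interpretation, not TomDLux's code: the fixed chunk size of two, the names `%single`, `%pair` and `revcomp_pairs`, and the odd-length handling are all assumptions.

```perl
use strict;
use warnings;

my %single = qw(a t g c c g t a);

# Table of all 16 two-letter strings => their reverse complements.
my %pair;
for my $x (keys %single) {
    for my $y (keys %single) {
        # Reverse complement of "xy" is complement(y) followed by complement(x).
        $pair{"$x$y"} = $single{$y} . $single{$x};
    }
}

sub revcomp_pairs {
    my ($dna) = @_;
    my $out = '';
    my $pos = length $dna;
    # Walk backwards through the string, two letters per hash lookup.
    while ($pos >= 2) {
        $pos -= 2;
        $out .= $pair{ substr($dna, $pos, 2) };
    }
    # An odd-length string leaves one unpaired letter at the front.
    $out .= $single{ substr($dna, 0, 1) } if $pos == 1;
    return $out;
}

print revcomp_pairs("aaaggctt"), "\n";   # prints aagccttt
```

Extending the table to the full length of the input, as the post describes, grows it by a factor of four per letter, so a small fixed chunk size is the practical compromise assumed here.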
Re: char substitution without regexes by Anonymous Monk on Oct 01, 2010 at 12:42 UTC
for (my $i = length($string) - 1; $i >= 0; $i--) { print $hash{ substr($string, $i, 1) }; }
Re^2: char substitution without regexes by Soul Singin' (Initiate) on Oct 02, 2010 at 16:12 UTC
Here's a variant on the above, but syntactically clearer:

#!/usr/bin/env perl

use strict ;
use warnings ;

## DNA string
my $string = "aaaggctt";

## syntactically clear hash
my %hash = ( "a" => "t" , "g" => "c" , "c" => "g" , "t" => "a" );

## print the DNA string and reverse complement
## add some extra spacing to make it look pretty
print "\n" ;
print "\t        The DNA string is: " . $string . "\n" ;
print "\tThe reverse complement is: " ;

## go through the string one element at a time, last to first,
## and print the reverse complement
for my $i (reverse 0..length($string)-1) {
    print $hash{ substr($string, $i, 1) } ;
}

## finish with a little extra spacing
print "\n\n" ;

