PerlMonks  

char substitution without regexes

by biohisham (Priest)
on Sep 30, 2010 at 16:30 UTC ( [id://862806]=perlquestion )

biohisham has asked for the wisdom of the Perl Monks concerning the following question:

A colleague of mine is learning Perl, and he approached me with an interesting problem that had him running in circles. He has a string of DNA letters (qw(a g c t)) and wants its reverse complement: reading the string backwards and replacing every letter with its complement (i.e. a<->t, g<->c).

The book exercise requires him to do this specifically with substr and without any regex at all. After he had tried his best, I gave him the following code:

    use strict;
    use warnings;

    my $string = "aaaggctt";
    my %hash   = qw(a t g c c g t a);

    print "The DNA string is: \n";
    print $string, "\n";
    print "The reverse complement is: \n";

    # One way to do it... 1st chunk..
    # Start at the last valid index, length - 1, and walk backwards.
    for (my $i = length($string) - 1; $i >= 0; $i--) {
        for (keys %hash) {
            print substr($string, $i, 1) eq $_ ? $hash{$_} : '';
        }
    }
    print "\n";

    # Expansion of the previous solution.. 2nd chunk..
    for (my $i = length($string) - 1; $i >= 0; $i--) {
        for (keys %hash) {
            if (substr($string, $i, 1) eq $_) {
                print $hash{$_};
            }
        }
    }

I am interested in two issues. Firstly, will ye wise monks put forward a TIMTOWTDI solution that doesn't use a substitution regex, and that doesn't necessarily use substr, so we can learn something new?

Secondly, benchmarking the two fragments showed variation in speed: sometimes the first code chunk executes faster than the second, and other times slower. Briefly, I am using the Benchmark module as follows:

    use Benchmark;

    my $t1 = Benchmark->new;
    # First chunk from the code above
    my $t2 = Benchmark->new;

    my $t3 = Benchmark->new;
    # Second chunk from the code above
    my $t4 = Benchmark->new;

    # timediff takes two Benchmark objects as separate arguments
    my $td1 = timediff($t2, $t1);
    my $td2 = timediff($t4, $t3);
    print timestr($td1), "\n", timestr($td2);
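A single pass over an eight-letter string finishes in microseconds, so one-shot timings like the above are mostly noise, which may explain why the "winner" keeps changing. A minimal sketch of a steadier comparison, using Benchmark's timethese and cmpthese to run each chunk many times. The sub names, the iteration count of 50_000, and the choice to build a string instead of printing (so that I/O does not dominate the timing) are all assumptions made for illustration, not part of the original code:

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

my $string = "aaaggctt";
my %hash   = qw(a t g c c g t a);

# 1st chunk: ternary inside the inner loop (appends instead of printing).
sub chunk_ternary {
    my $out = '';
    for (my $i = length($string) - 1; $i >= 0; $i--) {
        for (keys %hash) {
            $out .= substr($string, $i, 1) eq $_ ? $hash{$_} : '';
        }
    }
    return $out;
}

# 2nd chunk: explicit if inside the inner loop.
sub chunk_if {
    my $out = '';
    for (my $i = length($string) - 1; $i >= 0; $i--) {
        for (keys %hash) {
            if (substr($string, $i, 1) eq $_) {
                $out .= $hash{$_};
            }
        }
    }
    return $out;
}

# Run each chunk many times, then print a comparison table.
# The count is arbitrary; 'none' silences timethese's own report.
my $results = timethese(50_000, {
    ternary  => \&chunk_ternary,
    if_block => \&chunk_if,
}, 'none');
cmpthese($results);
```

If the two rows of the resulting table differ by only a few percent from run to run, the chunks are probably equivalent in practice.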


Excellence is an Endeavor of Persistence. A Year-Old Monk :D .

Replies are listed 'Best First'.
Re: char substitution without regexes
by jwkrahn (Abbot) on Sep 30, 2010 at 16:38 UTC
    ( my $new = reverse $string ) =~ tr/atgc/tacg/;
    print "$new\n";

    Update: Also if you want to remove any non a, t, c or g characters:

    ( my $new = reverse $string ) =~ tr/atgc\0-\277/tacg/d;
    print "$new\n";
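For readers who find the combined range-plus-delete form hard to parse, here is a sketch of the same idea split into two tr/// steps: first delete everything outside the set a, t, g, c (the /c modifier complements the search list, /d deletes), then complement what is left. The helper name revcomp_clean and the sample input are made up for this example:

```perl
use strict;
use warnings;

# Hypothetical helper; not from the thread.
sub revcomp_clean {
    my ($dna) = @_;
    my $new = reverse $dna;    # scalar context: reverses the characters
    $new =~ tr/atgc//cd;       # delete every character that is NOT a, t, g or c
    $new =~ tr/atgc/tacg/;     # swap each base for its complement
    return $new;
}

print revcomp_clean("aaNgg-ctt"), "\n";    # prints "aagcctt"
```

Like the one-liners above, this uses tr///, which is a transliteration operator and not a regex.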
Re: char substitution without regexes
by ikegami (Patriarch) on Sep 30, 2010 at 17:12 UTC
    Awww, jwkrahn beat me to it. Here's one that doesn't use tr/// either:
    my %complement = qw( a t g c );
    %complement = ( %complement, reverse %complement );
    print map $complement{$_}, reverse split //, $chain;

    %hash and $string are awful names, so I changed them.

      Yes, but, that uses a regex and the OP said "without regexes".

        It passes a pattern, but it's actually special-cased not to use the regex engine.

        Ok, so I did it without thinking.

        print map $complement{$_}, reverse unpack '(a)*', $chain;
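To make the comparison concrete, a small sketch that wraps both ways of breaking the string into characters, split // and unpack '(a)*', and shows that they give the same reverse complement. The helper names revcomp_split and revcomp_unpack are invented for this example:

```perl
use strict;
use warnings;

my %complement = qw( a t g c );
%complement = ( %complement, reverse %complement );   # add t => a, c => g

# Hypothetical helpers; not from the thread.
sub revcomp_split {
    return join '', map { $complement{$_} } reverse split //, $_[0];
}

sub revcomp_unpack {
    return join '', map { $complement{$_} } reverse unpack '(a)*', $_[0];
}

my $chain = "aaaggctt";
print revcomp_split($chain),  "\n";    # prints "aagccttt"
print revcomp_unpack($chain), "\n";    # prints "aagccttt"
```

Both assume the input contains only a, t, g and c; any other character has no entry in %complement and would trigger an uninitialized-value warning.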
Re: char substitution without regexes
by oko1 (Deacon) on Sep 30, 2010 at 18:41 UTC

    Good rule of thumb: whenever you think "lookup table", your next thought should be "use a hash".

    my (@list, %find);
    @list = qw/a t g c/;

    # You can do it manually...
    @find{@list} = qw/t a c g/;

    # ...or programmatically:
    @find{@list} = map { reverse @list[$_*2, $_*2+1] } 0..@list/2-1;

    print "$_: $find{$_}\n" for @list;

    Output:

    a: t
    t: a
    g: c
    c: g

    --
    "Language shapes the way we think, and determines what we can think about."
    -- B. L. Whorf
Re: char substitution without regexes
by TomDLux (Vicar) on Sep 30, 2010 at 20:25 UTC

    You said you wanted different thinking ....

    ikegami shows how to generate the map for one character. Use that to generate a set of maps for all possible two-character strings, then use that to generate a set of maps for all three-character strings, and so on.

    When you reach the length of the input string, do a hash lookup, and voila!

    Obviously slower than doing one string with tr///, but maybe worth it if you have a large number of strings to process.

    As Occam said: Entia non sunt multiplicanda praeter necessitatem.

Re: char substitution without regexes
by Anonymous Monk on Oct 01, 2010 at 12:42 UTC
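    # Start at length - 1: substr at offset length($string) returns '',
    # and $hash{''} is undef, which warns under "use warnings".
    for (my $i = length($string) - 1; $i >= 0; $i--) {
        print $hash{ substr($string, $i, 1) };
    }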

      Here's a variant on the above, but syntactically clearer:

      #!/usr/bin/env perl
      use strict ;
      use warnings ;

      ## DNA string
      my $string = "aaaggctt";

      ## syntactically clear hash
      my %hash = ( "a" => "t" ,
                   "g" => "c" ,
                   "c" => "g" ,
                   "t" => "a" );

      ## print the DNA string and reverse complement
      ## add some extra spacing to make it look pretty
      print "\n" ;
      print "\t The DNA string is: " . $string . "\n" ;
      print "\tThe reverse complement is: " ;

      ## go through the string one element at a time, last to first,
      ## and print reverse complement (a forward loop would print
      ## only the complement, not the reverse complement)
      for my $i (reverse 0..length($string)-1) {
          print $hash{ substr($string, $i, 1) } ;
      }

      ## finish with a little extra spacing
      print "\n\n" ;

Node Type: perlquestion [id://862806]
Approved by Corion
Front-paged by Corion