Re: motif finding

by quester (Vicar)
on Jan 31, 2012 at 07:58 UTC (#950903)

in reply to motif finding

A more "Perl-ish" way of doing the same thing...

use strict;
use warnings;
use Term::ANSIColor;
use autodie;

#Program to find motif site in a given protein sequence using files
my $motif = "AGGGGG";
open( my $read, "<dna.txt" );
my @e = <$read>;
$_ = join( " ", @e );
s/\s+//g;
my @c;
push @c, pos( ) - length( $motif ) + 1 while /$motif/g;
s/$motif/color( 'bold green' ) . $motif . color( 'black' )/eg;
print $_, "\n";
print "Number of sites the motif (AGGGGG) is present: ", scalar @c, "\n";
print "And the positions in the string are: ", join( ',', @c ), "\n\n";

This eliminates counting characters one at a time, as in the $i loop in the original, in favor of using pattern matching on the entire character string. I have found that eliminating loop counters wherever possible greatly reduces the number of bugs in my code.

Replies are listed 'Best First'.
Re^2: motif finding
by educated_foo (Vicar) on Jan 31, 2012 at 13:55 UTC
    Or, even more Perl-ish, with a bit less extra work (e.g. only one //g loop):
    use Term::ANSIColor;
    open(READ,"<dna.txt");
    $m = 'AGGGGG';
    $_ = do { local $/; <READ> };                 # read whole file
    s/\s+//g;                                     # remove blanks
    s{$m}{                                        # search the string
        push @c, $-[0] + 1;                       # remember 1-based start position
        color('bold green').$m.color('reset');    # remember to reset!
    }eg;
    print "$_\n";                                 # print transformed string
    print "NUMBER OF SITES THE MOTIF ($m) IS PRESENT: ".@c."\n";
    print "AND THE POSITION IN THE STRING IS:", join(',', @c), "\n\n";
      My motif input is a file. How can I modify the program to make it work?
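One way to handle a motif stored in a file, as a minimal sketch: read both files the same way, strip whitespace, and search for the motif as literal text. The helper names `read_squashed` and `motif_positions`, and the idea of passing both file names on the command line, are assumptions made for illustration; they are not from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper: read a file and return its contents with all
# whitespace (including line breaks) removed.
sub read_squashed {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "Cannot open $path: $!";
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    $text =~ s/\s+//g;
    return $text;
}

# Hypothetical helper: 1-based start positions of every non-overlapping
# occurrence of $motif in $seq.
sub motif_positions {
    my ( $seq, $motif ) = @_;
    die "Empty motif\n" unless length $motif;
    my @positions;
    # \Q...\E makes the motif match as literal text, not as a regex.
    while ( $seq =~ /\Q$motif\E/g ) {
        push @positions, $-[0] + 1;       # $-[0] is the 0-based match start
    }
    return @positions;
}

# Assumed usage: perl find_motif.pl dna.txt motif.txt
if ( @ARGV == 2 ) {
    my $seq   = read_squashed( $ARGV[0] );
    my $motif = read_squashed( $ARGV[1] );
    my @c     = motif_positions( $seq, $motif );
    print "Number of sites the motif ($motif) is present: ", scalar @c, "\n";
    print "And the positions in the string are: ", join( ',', @c ), "\n";
}
```

If the motif file holds several motifs, one per line, the same helper could be called once per line instead; that variation is not shown here.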
Re^2: motif finding
by RichardK (Parson) on Jan 31, 2012 at 14:19 UTC

    I think using File::Slurp is even easier and more perl-ish :)

    use File::Slurp;

    # read file as a string
    my $text = read_file('dna.txt');

    # now remove whitespace including line breaks
    $text =~ s/\s+//g;

    # stuff
    (update : removed a stray space)

      Thank you very much for the reply. In the code that I used, I supply the input (the motif sequence) myself. Treating the entire genome as a single string, if I want the most repeated elements of, say, 20 base pairs in the entire string, how can I find them?

        I'm not sure what you're looking for, can you explain with a simple example?

        Are you looking for repeats of a given string or something more complex?
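If the goal turns out to be simply "which 20-character substring occurs most often", a minimal sketch is to count every window of that length in a hash. This is only one reading of the question. The name `most_repeated_kmers` and the toy sequence are made up for illustration. Overlapping windows are all counted, only the given strand is examined, and on a whole genome the hash may need a lot of memory.

```perl
use strict;
use warnings;

# Hypothetical helper: count every length-$k substring of $seq and return
# the highest count followed by all substrings that reach it.
sub most_repeated_kmers {
    my ( $seq, $k ) = @_;
    my %count;
    # Slide a window of width $k across the string, one base at a time.
    for my $i ( 0 .. length($seq) - $k ) {
        $count{ substr( $seq, $i, $k ) }++;
    }
    my $max = 0;
    for my $n ( values %count ) {
        $max = $n if $n > $max;
    }
    my @best = sort grep { $count{$_} == $max } keys %count;
    return ( $max, @best );
}

# Toy example with k = 3; for the question above $k would be 20.
my ( $max, @best ) = most_repeated_kmers( 'ACGTACGTAC', 3 );
print "Most repeated 3-mers ($max times): @best\n";
```

The sequence passed in would be the whitespace-stripped string from the earlier posts.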
