motif finding

devi has asked for the wisdom of the Perl Monks concerning the following question:

hello every one i have a program to identify a given motif in the dna seq and color that particular region alone. when i am trying it by using ansi color my entire seq is being colored what should i do to color only the motif part here is the code that i have used

#use strict;

#use warnings;

#Progarm to find motif site in a given protein sequence using files

use Term::ANSIColor 

open(READ,"<dna.txt");

@e=<READ>;

$a=join(" ",@e);

$a=~s/\s+//gi;

$s=0;@c;

for($i=0;$i<length($a);$i++)

{    

$d=substr($a,$i,6);

if($d eq "AGGGGG")

{

print color 'bold green';

$s++;

push(@c,$i+1);

}

}
print $a ;

print"NUMBER OF SITES THE MOTIF (AGGGGG) IS PRESENT: $s\n";

print"AND THE POSITION IN THE STRING IS:", 

print join(',',(@c,"\n")),"\n\n";
[download]

Comment on motif finding Download Code

Replies are listed 'Best First'.
Re: motif finding by quester (Vicar) on Jan 31, 2012 at 07:12 UTC
Welcome to the Monastery! Please pay attention to the posting rules, especially the one about enclosing your code in `...` tags. It makes the code much easier to read, which results in more people wanting to help you. First for a line by line comment on you code the way you wrote it... `# use strict; # use warnings;` [download] Um... you commented out use strict and use warnings? That's like telling the compiler "I don't need your help! I know what I'm doing! I'm certain!". Leave them in and fix the errors that show up. For instance... `open(READ,"<dna.txt"); @e = <READ>; $a = join( " ", @e );` [download] should be... `open(READ,"<dna.txt"); my @e = <READ>; my $a = join( " ", @e );` [download] Use strict requires that every variable be declared with my (or with "our" or "state" in more complicated cases.) To fix the coloring, while still following your original logic, I would change `for ( $i = 0 ; $i < length($a) ; $i++ ) { $d = substr( $a, $i, 6 ); if ( $d eq "AGGGGG" ) { print color 'bold green'; $s++; push( @c, $i + 1 ); } } print $a ;` [download] to something like `my $printed = 0; for ( my $i = 0 ; $i < length($a) ; $i++ ) { my $d = substr( $a, $i, 6 ); if ( $d eq "AGGGGG" ) { print substr( $a, $printed, $i - $printed ), color( 'bold green' ), $d, color( 'black' ); $printed = $i + 6; $s++; push( @c, $i + 1 ); } } print substr( $a, $printed ), "\n";` [download] I am printing out the section before each match, switching to green, printing the match, then switching back to black (it doesn't do that automatically.) Note the extra $printed variable is there to keep track of where the last match ended. Finally, `print "AND THE POSITION IN THE STRING IS:", print join( ',', ( @c, "\n +" ) ), "\n\n";` [download] should be `print "AND THE POSITIONS IN THE STRING ARE:", join( ',', @c ), "\n\n";` [download] which fixes two problems; one is an extra comma after the position of the last match, because join was putting a comma between the last element of @c and the "\n". The second is that it was printing an extra "1". That's because the second print was interpreted as a function call. The print function works like the print statement but it also returns a value, usually 1 to indicate that it worked.	[reply] [d/l] [select]
Re: motif finding by Khen1950fx (Canon) on Jan 31, 2012 at 07:09 UTC
Try this: #!/usr/bin/perl use strict; use warnings; use Term::ANSIColor::Print open READ, '<', '/root/Desktop/rna.dna'; my (@e) = <READ>; my $a = join( " ", @e ); $a =~ s/\s+//gi; my $s = 0; my @c; my $i; for ( $i = 0; $i < length($a); ++$i ) { my $d = substr( $a, $i, 6 ); unless ( $d eq "AGGGGG" ) { my $print = Term::ANSIColor::Print->new(); $print->bold_green($d); ++$s; push( @c, $i + 1 ); } } print $a; print "NUMBER OF SITES THE MOTIF (AGGGGG) IS PRESENT: $s"; print "AND THE POSITION IN THE STRING IS: \n", print join( ',', ( @c, "\n" ) ), "\n"; [download]	[reply] [d/l]
Re: motif finding by quester (Vicar) on Jan 31, 2012 at 07:58 UTC
A more "Perl-ish" way of doing the same thing... use strict; use warnings; use Term::ANSIColor; use autodie; #Program to find motif site in a given protein sequence using files my $motif = "AGGGGG"; open( my $read, "<dna.txt" ); my @e = <$read>; $_ = join( " ", @e ); s/\s+//g; my @c; push @c, pos( ) - length( $motif ) + 1 while /$motif/g; s/$motif/color( 'bold green' ) . $motif . color( 'black' )/eg; print $_, "\n"; print "Number of sites the motif (AGGGGG) is present: ", scalar @c, "\ +n"; print "And the positions in the string are: ", join( ',', @c ), "\n\n" +; [download] The eliminates counting characters one at a time, as in the $i loop in the original, in favor of using pattern matching on the entire character string. I have found that eliminating loop counters wherever possible greatly reduces the number of bugs in my code.	[reply] [d/l]
Re^2: motif finding by educated_foo (Vicar) on Jan 31, 2012 at 13:55 UTC
Or, even more Perl-ish, with a bit less extra work (e.g. only one //g loop): `use Term::ANSIColor; open(READ,"<dna.txt"); $m = 'AGGGGG'; $_ = do { local $/; <READ> }; # read whole file s/\s+//g; # remove blanks s{$m}{ # search the string push @c, 1 - length($m) + pos; # remember position color('bold green').$m.color('reset'); # remember to reset! }eg; print "$_\n"; # print transformed string print "NUMBER OF SITES THE MOTIF ($m) IS PRESENT: ".@c."\n"; print "AND THE POSITION IN THE STRING IS:", join(',', @c), "\n\n";` [download]	[reply] [d/l]
Re^3: motif finding by Anonymous Monk on Oct 05, 2013 at 21:07 UTC
my motif input is a file, how i can modified the program to make it work?	[reply]
Re^2: motif finding by RichardK (Parson) on Jan 31, 2012 at 14:19 UTC
I think using File::Slurp is even easier and more perl-ish :) `use File::Slurp; # read file as a string my $text = read_file('dna.txt'); # now remove whitespace including line breaks $text =~ s/\s+//g; # ...do stuff` [download] (update : removed a stray space)	[reply] [d/l]
Re^3: motif finding by devi (Initiate) on Feb 03, 2012 at 06:08 UTC
Thank you very much for the reply. In the code that i have used i am giving the input (the motif sequence). considering entire genome as a single string if i want the most repeated elements of say 20 base pairs in the entire string how can i find it?	[reply]
Re^4: motif finding by RichardK (Parson) on Feb 03, 2012 at 12:47 UTC

Back to Seekers of Perl Wisdom