devi has asked for the wisdom of the Perl Monks concerning the following question:

hello every one i have a program to identify a given motif in the dna seq and color that particular region alone. when i am trying it by using ansi color my entire seq is being colored what should i do to color only the motif part here is the code that i have used
#use strict; #use warnings; #Progarm to find motif site in a given protein sequence using files use Term::ANSIColor open(READ,"<dna.txt"); @e=<READ>; $a=join(" ",@e); $a=~s/\s+//gi; $s=0;@c; for($i=0;$i<length($a);$i++) { $d=substr($a,$i,6); if($d eq "AGGGGG") { print color 'bold green'; $s++; push(@c,$i+1); } } print $a ; print"NUMBER OF SITES THE MOTIF (AGGGGG) IS PRESENT: $s\n"; print"AND THE POSITION IN THE STRING IS:", print join(',',(@c,"\n")),"\n\n";

Replies are listed 'Best First'.
Re: motif finding
by quester (Vicar) on Jan 31, 2012 at 07:12 UTC

    Welcome to the Monastery!

    Please pay attention to the posting rules, especially the one about enclosing your code in ... tags. It makes the code much easier to read, which results in more people wanting to help you.

    First for a line by line comment on you code the way you wrote it...

    # use strict; # use warnings;

    Um... you commented out use strict and use warnings? That's like telling the compiler "I don't need your help! I know what I'm doing! I'm certain!". Leave them in and fix the errors that show up. For instance...

    open(READ,"<dna.txt"); @e = <READ>; $a = join( " ", @e );

    should be...

    open(READ,"<dna.txt"); my @e = <READ>; my $a = join( " ", @e );

    Use strict requires that every variable be declared with my (or with "our" or "state" in more complicated cases.)

    To fix the coloring, while still following your original logic, I would change

    for ( $i = 0 ; $i < length($a) ; $i++ ) { $d = substr( $a, $i, 6 ); if ( $d eq "AGGGGG" ) { print color 'bold green'; $s++; push( @c, $i + 1 ); } } print $a ;

    to something like

    my $printed = 0; for ( my $i = 0 ; $i < length($a) ; $i++ ) { my $d = substr( $a, $i, 6 ); if ( $d eq "AGGGGG" ) { print substr( $a, $printed, $i - $printed ), color( 'bold green' ), $d, color( 'black' ); $printed = $i + 6; $s++; push( @c, $i + 1 ); } } print substr( $a, $printed ), "\n";

    I am printing out the section before each match, switching to green, printing the match, then switching back to black (it doesn't do that automatically.) Note the extra $printed variable is there to keep track of where the last match ended.


    print "AND THE POSITION IN THE STRING IS:", print join( ',', ( @c, "\n +" ) ), "\n\n";

    should be

    print "AND THE POSITIONS IN THE STRING ARE:", join( ',', @c ), "\n\n";

    which fixes two problems; one is an extra comma after the position of the last match, because join was putting a comma between the last element of @c and the "\n". The second is that it was printing an extra "1". That's because the second print was interpreted as a function call. The print function works like the print statement but it also returns a value, usually 1 to indicate that it worked.

Re: motif finding
by Khen1950fx (Canon) on Jan 31, 2012 at 07:09 UTC
    Try this:
    #!/usr/bin/perl use strict; use warnings; use Term::ANSIColor::Print open READ, '<', '/root/Desktop/rna.dna'; my (@e) = <READ>; my $a = join( " ", @e ); $a =~ s/\s+//gi; my $s = 0; my @c; my $i; for ( $i = 0; $i < length($a); ++$i ) { my $d = substr( $a, $i, 6 ); unless ( $d eq "AGGGGG" ) { my $print = Term::ANSIColor::Print->new(); $print->bold_green($d); ++$s; push( @c, $i + 1 ); } } print $a; print "NUMBER OF SITES THE MOTIF (AGGGGG) IS PRESENT: $s"; print "AND THE POSITION IN THE STRING IS: \n", print join( ',', ( @c, "\n" ) ), "\n";
Re: motif finding
by quester (Vicar) on Jan 31, 2012 at 07:58 UTC

    A more "Perl-ish" way of doing the same thing...

    use strict; use warnings; use Term::ANSIColor; use autodie; #Program to find motif site in a given protein sequence using files my $motif = "AGGGGG"; open( my $read, "<dna.txt" ); my @e = <$read>; $_ = join( " ", @e ); s/\s+//g; my @c; push @c, pos( ) - length( $motif ) + 1 while /$motif/g; s/$motif/color( 'bold green' ) . $motif . color( 'black' )/eg; print $_, "\n"; print "Number of sites the motif (AGGGGG) is present: ", scalar @c, "\ +n"; print "And the positions in the string are: ", join( ',', @c ), "\n\n" +;
    The eliminates counting characters one at a time, as in the $i loop in the original, in favor of using pattern matching on the entire character string. I have found that eliminating loop counters wherever possible greatly reduces the number of bugs in my code.
      Or, even more Perl-ish, with a bit less extra work (e.g. only one //g loop):
      use Term::ANSIColor; open(READ,"<dna.txt"); $m = 'AGGGGG'; $_ = do { local $/; <READ> }; # read whole file s/\s+//g; # remove blanks s{$m}{ # search the string push @c, 1 - length($m) + pos; # remember position color('bold green').$m.color('reset'); # remember to reset! }eg; print "$_\n"; # print transformed string print "NUMBER OF SITES THE MOTIF ($m) IS PRESENT: ".@c."\n"; print "AND THE POSITION IN THE STRING IS:", join(',', @c), "\n\n";
        my motif input is a file, how i can modified the program to make it work?

      I think using File::Slurp is even easier and more perl-ish :)

      use File::Slurp; # read file as a string my $text = read_file('dna.txt'); # now remove whitespace including line breaks $text =~ s/\s+//g; # stuff
      (update : removed a stray space)

        Thank you very much for the reply. In the code that i have used i am giving the input (the motif sequence). considering entire genome as a single string if i want the most repeated elements of say 20 base pairs in the entire string how can i find it?