
Does this do what you want? There is no need to split the sequence into an array, as pos will tell you where in a string a match has been made. Note that [^ACGT] is a negated character class, i.e. it matches anything that isn't A, C, G or T. Matching globally, m{ ... }g or / ... /g, in a while loop will advance along the sequence looking for invalid letters. The capturing parentheses, ( ... ), are not strictly needed here since only pos is used, not $1.

I am reading from a file that is held inside the script, just to keep things tidy on my system, but the code will work fine with STDIN. The code:

use 5.026;
use warnings;

open my $dnaFH, q{<}, \ <<__EOD__ or die $!;
TAAGAACAATAAGAACAAGAACAATAA
GAACAATAAGXAATAAGAAXXAACAAGAACAATAA
ACAATAAAAGAACAATAAGAA
__EOD__

while ( my $sequence = <$dnaFH> )
{
    chomp $sequence;
    my $length = length $sequence;
    say qq{Sequence: $sequence -- Length $length};
    if ( $sequence =~ m{^[ACGT]+$} )
    {
        say q{     Sequence is GOOD!};
    }
    else
    {
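        # Collect the zero-based offset of every invalid letter. The
        # look-ahead is zero-width, so pos is the offset of the bad
        # letter itself rather than the offset just after it.
        my @badPosns;
        push @badPosns, pos $sequence
           while $sequence =~ m{(?x) (?= ( [^ACGT] ) )}g;
        my $nBad = scalar @badPosns;
        my $perc = sprintf q{%.2f}, $nBad / $length * 100;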
        say qq{     Sequence is BAD at @badPosns};
        say qq{     $nBad bad positions, $perc\% of total};
    }
}

close $dnaFH or die $!;

The output.

Sequence: TAAGAACAATAAGAACAAGAACAATAA -- Length 27
     Sequence is GOOD!
Sequence: GAACAATAAGXAATAAGAAXXAACAAGAACAATAA -- Length 35
     Sequence is BAD at 10 19 20
     3 bad positions, 8.57% of total
Sequence: ACAATAAAAGAACAATAAGAA -- Length 21
     Sequence is GOOD!

I hope this is helpful. Please ask further if you need more help.

Update: There was a mistake in the code. I should have used a look-ahead assertion because, without it, pos gives the position just after the match, not that of the match itself. I also added extended syntax ((?x)) to make the regex clearer. My bad :-(
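
To illustrate the difference described in the update, here is a minimal sketch. It is not part of the original script, and the array names @afterMatch and @atMatch are just illustrative. It runs both forms of the regex against the second sequence from the data above.

```perl
use strict;
use warnings;

my $sequence = q{GAACAATAAGXAATAAGAAXXAACAAGAACAATAA};

# Plain match: each bad letter is consumed, so pos reports the
# offset just past it.
my @afterMatch;
push @afterMatch, pos $sequence
    while $sequence =~ m{[^ACGT]}g;

# Zero-width look-ahead: nothing is consumed, so pos stays at the
# offset of the bad letter itself.
my @atMatch;
push @atMatch, pos $sequence
    while $sequence =~ m{(?=[^ACGT])}g;

print qq{Without look-ahead: @afterMatch\n};
print qq{With look-ahead:    @atMatch\n};
```

The plain match should report 11 20 21 and the look-ahead 10 19 20, the latter agreeing with the zero-based offsets shown in the output above.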

Update 2: I should also have corrected the output, now done.

Cheers,

JohnGG

In reply to Re: Find element in array by johngg
in thread Find element in array by Sofie
