http://qs321.pair.com?node_id=11113042


in reply to Find element in array

G'day Sofie,

Welcome to the Monastery.

"I am trying to check if an input DNA sequence only contains nucleotides."

That's a good start: you've succinctly stated your main goal.
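Checking that main goal doesn't require looking at individual characters at all. Below is a minimal sketch that uses a single anchored match. It assumes that only uppercase A, C, G and T are valid, so lowercase bases or ambiguity codes such as N would fail. It also assumes that an empty sequence should fail. The helper name is_valid_dna is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the string consists only of
# uppercase A, C, G and T (at least one of them), otherwise 0.
# \A and \z anchor to the absolute start and end of the string,
# so a trailing newline is not silently accepted the way ^...$ would.
sub is_valid_dna {
    my ($seq) = @_;
    return $seq =~ /\A[ACGT]+\z/ ? 1 : 0;
}

for my $seq ('ACGTTGCA', 'XACGTYTGCAZ') {
    my $verdict = is_valid_dna($seq) ? 'valid' : 'invalid';
    print "$seq: $verdict\n";
}
```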

"And if it doesn't I want to print out the position in the sequence where an invalid character was entered."

Excellent: you have a subtask, also succinctly stated.

"From title: Find element in array"

In my opinion, this is where you started to go wrong. You decided that you needed to split the entire sequence into individual characters and assign those to an array, then go back and iterate over the entire array, checking each individual character. DNA sequences can be exceptionally long (you may be well aware of this), and all of this extra work is completely unnecessary for your stated goals.
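To illustrate the point about avoiding the array, here is a sketch that finds the invalid characters by scanning the string in place with a global regex match. This is a different technique from the substr loop used in the script that follows. It makes the same assumption that only uppercase A, C, G and T are valid. The helper invalid_positions is hypothetical. After each successful /g match, pos() returns the offset just past the matched character, and that offset equals the character's 1-based position.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns the 1-based positions of every character
# that is not A, C, G or T, without building an array of characters.
sub invalid_positions {
    my ($seq) = @_;
    my @positions;
    # Each pass of the scalar-context //g match finds the next invalid
    # character; pos($seq) is then the offset just after that character.
    while ($seq =~ /[^ACGT]/g) {
        push @positions, pos($seq);
    }
    return @positions;
}

my $DNA = 'XACGTYTGCAZ';
for my $p (invalid_positions($DNA)) {
    # substr is 0-based, so step back one to fetch the offending character.
    printf "%d:\t%s\n", $p, substr($DNA, $p - 1, 1);
}
```

With that sample sequence, this reports positions 1, 6 and 11 (X, Y and Z), matching the "nonvalid" section of the sample run below.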

Here's a script that does what you want. I've had to make some guesses about the output, since you didn't specify it.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

my $DNA = <STDIN>;
chomp($DNA);
my $lengthseq = length $DNA;
print "The length of the sequence is: $lengthseq\n";

my (@nucleotideDNA, @nonvalid);

for my $pos (0 .. $lengthseq - 1) {
    my $nucleotide = substr $DNA, $pos, 1;
    if ($nucleotide =~ /^[ACGT]$/) {
        push @nucleotideDNA, $pos+1 . ":\t$nucleotide";
    }
    else {
        push @nonvalid, $pos+1 . ":\t$nucleotide";
    }
}

print "*** nucleotideDNA ***\n";
print "$_\n" for @nucleotideDNA;
print "*** nonvalid ***\n";
print "$_\n" for @nonvalid;
```

Here's a sample run:

```
$ ./pm_11113020_parse_dna.pl
XACGTYTGCAZ
The length of the sequence is: 11
*** nucleotideDNA ***
2:      A
3:      C
4:      G
5:      T
7:      T
8:      G
9:      C
10:     A
*** nonvalid ***
1:      X
6:      Y
11:     Z
```

You may have noticed that I've structured my code in a similar way to yours. Let's look at the differences.

"... I am very new to perl ..."

That's fine, we all started knowing nothing about Perl. Note that Perl is the language and perl is the program.

I recommend you read through "perlintro" and bookmark that page. There's no need to try to learn it all in one sitting; just get a general feel for what it has to offer. It is peppered with links to FAQs, tutorials and more detailed information. Refer back to it whenever the need arises.

Finally, in case you had some genuine, but unstated, reason to use an array, you could have iterated it like this:

```perl
for my $pos (0 .. $#DNA) { ... }
```

You would then access each element with $DNA[$pos] and report the position with $pos+1, as I did.

Using the range operator (..) is a standard way to do this: see "perlop: Range Operators" for details.
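Here is a filled-in sketch of that outline, for the case where you really do need the array. The helper name invalid_positions_via_array is hypothetical. It returns the 1-based positions of invalid characters rather than printing them, and it keeps the same uppercase-ACGT assumption as the earlier script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: array-based version of the check.
sub invalid_positions_via_array {
    my ($seq) = @_;

    # Split the string into one-character elements.
    my @DNA = split //, $seq;

    my @positions;
    # $#DNA is the index of the last element, so 0 .. $#DNA visits
    # every index (and nothing at all for an empty array).
    for my $pos (0 .. $#DNA) {
        next if $DNA[$pos] =~ /\A[ACGT]\z/;    # skip valid nucleotides
        push @positions, $pos + 1;              # report 1-based position
    }
    return @positions;
}

my @bad = invalid_positions_via_array('XACGTYTGCAZ');
print "Invalid positions: @bad\n";
```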

I don't think that's what you wanted, or needed, here, but at least you now know how to do it if a more appropriate scenario comes up some other time.

— Ken