You can use
length to get the length of a string. You just need to make sure you get rid of the newlines or you could use
tr/ACGT/ACGT/
This might point you in the right direction:
use strict;
use warnings;
my ($chars, $gene);
for (<DATA>) {
chomp; #get rid of newline
if (/^\>/) {
print "$gene => $chars chars\n" if $chars;
$chars=0;
$gene = $_;
}
else {
$chars += length;
#$chars += tr/AGCT/AGCT/; #would work also
}
}
print "$gene => $chars chars\n" if $chars; #print the last gene at the
+ end of the file
__DATA__
>gb|AE008689|:2073051-2073458, Atu4895
TTGGTCACATATTTTCGCTATTCGAAACCCAAATATCCCTGCGGAACACGTTTTATTGGAGCCGCCGTTT
TGCAGTTGCTCGGTGCTGGATTCTTGGCGTTCGTCTTTTGTCTGCTGGATGGGCTCACTGCAAAACCAAC
GATAATTCTCGGTCAGTTTGTATCATGCCTAGTGGGGAGCGTCGCCGGCTTTCATTTCGTGGCTTTTCGT
CGCCCAGGCACGGACGGCCAACTTTACCTTATCGCGACGTCGCTCTTGGCATTTGGGACCCATTATTGGC
TGGTGTCATATTCATTACCTGACCTTTTGCTAGCAAGATTGATTTCAGGATTTGGATCGGGTGTGGTTGT
TGCAGGAACTTTCCGACGTCGCTTTCTGGAAAATCCGGTAATTCCCTGCGTTCGATAG
>gb|AE008689|:c2074151-2073549, Atu4896
GTGTTTGGACCATATTTTCTGTGCAAACTAAACGATGACATAGGGCGATTTTTAGTGGCGGACAAATACA
GACTTCCCGAAGAGTTTTTTACCACTCGGTTTCTCGTTAGACGCATCGTACCCACAGACGCTGAAGCTAT
TTTCGAAGGGTGGAACACCGATCCCGAGGTGACGAAGTACCTGACGTGGAAACCCCACTCCGAGCTTGGC
CAGACACAGCGGGCGATTGAAGAAAATTATAGTGCGTGGAATGCAGGTACATCGTTTCCAGCTGTCATCT
GCCATCGCGAACGGCCACATGAACTAATCGGCCGTATTGATGCACGTCCGATGGGCCACAAGGTCTCTTA
CGGGTGGCTTGTCCGAAGAACCTGGTGGGGCCGGGGTGTTGCAAGCGAGGTCGTTCAACTCGCTGTAGAA
CACGCGTTATCGCATCCGCGCATCTTTCGCACCGAAGCATCCTGCGACGTTCTGAACACGGCGTCAGCAA
GAGTGATGGAAAAAGTAGGGATGACAAAGGAGGCCGTGCTTCGACGGTACCTTTTTCACCCCAATTTTTC
GAATATGCCGCGAGACGCCTTCCTGTATTCCAAGGTACGTTAA
>gb|AE008689|:c2074749-2074345, Atu4897
ATGAAACATACCATCGCAGTTCTCGGCCTGATCACCTTCTCCAGCCCGGCCTTCGCAGCATCGTGCGAGA
AAAACTTCACCGTCTCAGGCGTACCGATGGTCACGGCTGTCTCTTACAAATCCTTTCAGGAACTGCCGAA
AGCCAAAGCACCAGCTGTCCTTCAAAAGCTCGCCCAGGCCGTCGCGGCAGAAGGTTTTTCAGGTATCCAG
ATCAACAAGGCACTGTCGTCAATCGATGCCCATCAGGAAACCAGCGGAAGTGGCAGGATTCAGACGCTGC
GGGTTGTCGCCCGCCAGAAAGGCGCCGCTGTCCGGATCGATGCTGTCTTCAATATTCAGGCAGGACAGAT
CGCCGACAAAGACGTCATCCGCAAGGGCATCTGCGACATCATAAAAGGCGCGTAA
outputs
>gb|AE008689|:2073051-2073458, Atu4895 => 408 chars
>gb|AE008689|:c2074151-2073549, Atu4896 => 603 chars
>gb|AE008689|:c2074749-2074345, Atu4897 => 405 chars