If I understand your question correctly, your exon and intron
regex was limiting the results. I reworked the code as follows:
    # Assumes INFO1 and OUT are already open and that $no and
    # $counter are initialised earlier, as in your original script.
    foreach (<INFO1>)
    {
        # Got rid of the '*', which is too greedy. The date match
        # is specific to the example input in your post; you may
        # need to allow more variants in the match depending on
        # how consistent your data is.
        if (/^DATE\s+(\d{2})-(\w{3})-(\d{4})/) {
            print OUT "DBACC\t $no\n";
            print OUT "Date\t $1-$2-$3\n";
            $no++;
        # One conditional now handles both exon and intron. The
        # character class [\d;-] replaces the old \d*- and the +
        # after it matches one or more digits, ';' or '-'. The '-'
        # goes last in the class so Perl reads it as a literal
        # hyphen rather than the start of a range.
        } elsif (/\s+\/(intr|ex)on="([\d;-]+)"\n/) {
            # Copy the captures right away; $1 and $2 are reset
            # by the next successful match.
            my ($type, $list) = ($1, $2);
            # Split on ';' in case you want or need to do
            # something with each value separately.
            my @values = split(/;/, $list);
            foreach (@values) {
                # ucfirst capitalises the matched lowercase prefix
                # ("ex" -> "Ex", "intr" -> "Intr") so the output
                # matches your example.
                print OUT ucfirst($type) . "on\t {Translation%$_}\n";
            }
            # If you don't need the split, just do this:
            # print OUT ucfirst($type) . "on\t {Translation%$list}\n";
        } else {
            print OUT "line $counter\n";
        }
        $counter++;
    }
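If you want to check the reworked loop end to end, here is a minimal self-contained sketch. The input lines are hypothetical (they only imitate the DATE/exon/intron layout, since your data isn't reproduced here), in-memory filehandles stand in for your real INFO1 and OUT files, and the starting values of $no and $counter are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input imitating the DATE / exon / intron lines.
my $input = <<'END';
DATE        12-JAN-2004
     /exon="1-120;300-450"
     /intron="121-299"
SOURCE      unrelated line
END

my $output = '';
# In-memory filehandles stand in for the real INFO1 and OUT files.
open(INFO1, '<', \$input)  or die "Cannot open input: $!";
open(OUT,   '>', \$output) or die "Cannot open output: $!";

my $no      = 1;   # assumed starting accession number
my $counter = 1;   # assumed starting line counter

# while reads one line at a time instead of slurping the whole file.
while (<INFO1>) {
    if (/^DATE\s+(\d{2})-(\w{3})-(\d{4})/) {
        print OUT "DBACC\t $no\n";
        print OUT "Date\t $1-$2-$3\n";
        $no++;
    } elsif (/\s+\/(intr|ex)on="([\d;-]+)"\n/) {
        my ($type, $list) = ($1, $2);
        # One output line per ';'-separated range.
        print OUT ucfirst($type) . "on\t {Translation%$_}\n" for split /;/, $list;
    } else {
        print OUT "line $counter\n";
    }
    $counter++;
}
close(INFO1);
close(OUT);

print $output;
```

With this sample it should print one DBACC/Date pair, two Exon lines (one per range), one Intron line and "line 4" for the unmatched SOURCE line. The only structural change from the loop above is while instead of foreach, which matters only for large files.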
There are several good nodes on regexes in the Tutorials
section. See the gotcha one in particular.
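Two of the points from the code comments above are easy to see in isolation: a greedy quantifier runs to the last possible match, and a hyphen inside a character class forms a range unless it sits at the end. A small sketch, using a hypothetical feature line with two quoted values:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical feature line with two quoted values on it.
my $line = '/exon="1-120" /note="partial"';

# Greedy: .* runs on to the LAST quote on the line.
my ($greedy) = $line =~ /"(.*)"/;
# Specific class: only digits, ';' and '-' can be captured.
my ($specific) = $line =~ /"([\d;-]+)"/;

print "greedy:   $greedy\n";
print "specific: $specific\n";

# Hyphen placement: between two characters it forms a range,
# at the end of the class it is a literal '-'.
my $as_range   = ('m' =~ /^[a-z]$/) ? 1 : 0;  # 'm' is inside a..z
my $as_literal = ('m' =~ /^[az-]$/) ? 1 : 0;  # only 'a', 'z' or '-'
print "range: $as_range, literal: $as_literal\n";
```

The greedy capture swallows everything up to the closing quote of /note, while the specific class stops at the first quote after the numbers.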