If it's an option to slurp the whole file into memory before processing, like this:
...
my $fn = 'data.dat';
my $stuff;
open my $fh, '<', $fn or die "$fn - $!";
{ local $/; $stuff = <$fh> }   # with $/ undefined, one read returns the whole file
close $fh;
...
then the program can be simplified to something like:
...
while ( $stuff =~ /^ (\d+): \s+          # the number xx: => $1
                     (\w+)  \s+          # the locus => $2
                     ( (?:.(?!^\d+:))+ ) # the remaining record => $3
                  /msgx ) {
    my ($locus, $name, $record, $kegg, $func, $proc, $comp) = ($2, '', $3, ('unknown') x 4);
    $name = $1 if $record =~ /([^\n\[]+)\s*/;
    $kegg = $1 if $record =~ /KEGG \s+ pathway: \s+ (.+?) \s+ Function \s+ Evidence/sx;
    $func = $1 if $record =~ /Function  \s+ Evidence \s+ (.+?) (?:\n\n|\z)/sx;
    $proc = $1 if $record =~ /Process   \s+ Evidence \s+ (.+?) (?:\n\n|\z)/sx;
    $comp = $1 if $record =~ /Component \s+ Evidence \s+ (.+?) (?:\n\n|\z)/sx;
    # separate records with a blank line so consecutive records don't run together
    print join("\n\n", $locus, $name, $kegg, $func, $proc, $comp), "\n\n";
}
...
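If you want to try this end to end, here is a self-contained sketch that wraps both steps in subs. The names `slurp_file` and `parse_records` are hypothetical, and so are the sample records: they are only shaped to fit the regexes above, since the real layout of data.dat isn't shown here. The sample is written to a temporary file with the core File::Temp module, so the script runs without data.dat.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a whole file into one scalar (hypothetical helper name).
sub slurp_file {
    my ($fn) = @_;
    open my $fh, '<', $fn or die "$fn - $!";
    local $/;               # undef $/: one read returns the entire file
    return scalar <$fh>;    # $fh is closed when it goes out of scope
}

# Split the slurped text into records and pull out the fields
# (hypothetical helper name; same regexes as the loop above).
sub parse_records {
    my ($stuff) = @_;
    my @records;
    while ( $stuff =~ /^ (\d+): \s+           # record number: capture 1
                         (\w+)  \s+           # locus: capture 2
                         ( (?:.(?!^\d+:))+ )  # rest of the record: capture 3
                      /msgx ) {
        my ($locus, $record) = ($2, $3);
        my ($name, $kegg, $func, $proc, $comp) = ('', ('unknown') x 4);
        $name = $1 if $record =~ /([^\n\[]+)\s*/;
        $kegg = $1 if $record =~ /KEGG \s+ pathway: \s+ (.+?) \s+ Function \s+ Evidence/sx;
        $func = $1 if $record =~ /Function  \s+ Evidence \s+ (.+?) (?:\n\n|\z)/sx;
        $proc = $1 if $record =~ /Process   \s+ Evidence \s+ (.+?) (?:\n\n|\z)/sx;
        $comp = $1 if $record =~ /Component \s+ Evidence \s+ (.+?) (?:\n\n|\z)/sx;
        push @records, [ $locus, $name, $kegg, $func, $proc, $comp ];
    }
    return @records;
}

# Hypothetical sample records; the real layout of data.dat is an assumption.
my ($tmp, $fn) = tempfile(UNLINK => 1);
print {$tmp} <<'END';
1: At1g01010 NAC domain protein [Arabidopsis]
KEGG pathway: none
Function Evidence
transcription factor activity

Process Evidence
regulation of transcription

Component Evidence
nucleus
2: At1g01020 ARV1 family protein
Function Evidence
unknown molecular function
END
close $tmp;

my @records = parse_records( slurp_file($fn) );
print join( "\n\n", @$_ ), "\n\n" for @records;
```

Collecting the fields into array refs instead of printing inside the loop makes the parser easy to reuse (e.g. for tab-separated output) or to test. Note that `(?:\n\n|\z)` keeps a trailing newline when a field is the last line of the file; `\Z` would stop before it.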
Regards
mwa