That's a very common problem: you are trying to add a record only when its successor is about to be parsed, but the last record has no successor (sic ;)!
Most people try to solve it by repeating the code that adds the last record after the loop.
But it's much cleaner this way (avoiding a posteriori state logic): append each line directly to the entry of the current header.
use strict;
use warnings;
use Data::Dump;
my $header;
my %sequence;
while ( my $line = <DATA> ) {
    chomp $line;
    if ( $line =~ /^>(.*)/ ) {
        $header = $1;                   # start of a new record
    } else {
        $sequence{$header} .= $line;    # append to the current record
    }
}
dd \%sequence;
__DATA__
>sequence_5849
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>sequence_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>sequence_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
Output:
{
  sequence_0808 => "CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGACTGGGAACCTGCGGGCAGTAGGTGGAAT",
  sequence_5849 => "CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG",
  sequence_5959 => "CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCTATATCCATTTGTCAGCAGACACGC",
}
Cheers Rolf
(addicted to the Perl Programming Language)
update
A *general pattern* to solve such problems while staying DRY is to use references:
if ( $line =~ /^(HEAD_PATTERN)/ ) {
    $data = \ $deeply{nested}{structure}{$1};   # reference data
} else {
    $$data .= $line;                            # dereference data
}
This way you don't need to repeat the path into a deeply nested data structure, which might vary in multiple dimensions.
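For illustration, here is a small runnable sketch of that reference pattern. The ">group/name" header format, the sample lines and the variable names are made up for this example and are not taken from the original question; the guard on $data is an extra assumption that skips any lines appearing before the first header.

```perl
use strict;
use warnings;

# Hypothetical input: ">group/name" headers, each followed by data lines.
my @lines = (
    '>alpha/one',   'AAA', 'CCC',
    '>beta/two',    'GGG',
    '>alpha/three', 'TTT', 'TT',
);

my %deeply;    # nested result: $deeply{$group}{$name} = concatenated data
my $data;      # reference to the slot of the current record

for my $line (@lines) {
    if ( $line =~ m{^>([^/]+)/(.+)} ) {
        # Taking the reference autovivifies the nested slot,
        # so the path is spelled out only once.
        $data = \ $deeply{$1}{$2};
    }
    elsif ($data) {
        # Append through the reference; no need to repeat the path.
        $$data .= $line;
    }
}

# Print the records collected under the "alpha" group.
print "$_: $deeply{alpha}{$_}\n" for sort keys %{ $deeply{alpha} };
```

The path appears only in the header branch; if the nesting changes later, only that one line has to be touched.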
update
added some explanation