Again, GrandFather's code above already seems to print the information in essentially the way you want, except the formatting is different. So change the print formatting. Is this what you need help with?
On the other hand, you may mean that you want the peptides encapsulated into an independent data structure that you can pass around to any function at will. Here's an adaptation of GrandFather's code to produce a data structure associating proteins with their split peptides:
c:\@Work\Perl\monks>perl -wMstrict -le
"use Data::Dump qw(dd);
 ;;
 my @proteins = qw(
     DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG
     ALTAMCMNVWEITYHKGSDVNRRASFAQPPPQPPPPLLAIKPASDASD
     DAAAAATTLTTTAMTTTTTTCK
     XXXXXXX
     );
 ;;
 my %protein_peptides;
 ;;
 for my $protein (@proteins) {
     my @peptides = split /(?<=[KR])(?!P)/, $protein;
     ;;
     next if @peptides < 2;
     ;;
     push @{ $protein_peptides{$protein} }, \@peptides;
     }
 ;;
 dd \%protein_peptides;
"
{
  ALTAMCMNVWEITYHKGSDVNRRASFAQPPPQPPPPLLAIKPASDASD => [
    ["ALTAMCMNVWEITYHK", "GSDVNR", "R", "ASFAQPPPQPPPPLLAIKPASDASD"],
  ],
  DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG => [
    ["DAAAAATTLTTTAMTTTTTTCK", "MMFRPPPPPGGGGGGGGGGGG"],
  ],
}
I have reformatted the native output of Data::Dump::dd() as it appeared on my monitor to make it more readable. (Update: I like Data::Dump as my dumper, but you may prefer Data::Dumper, which is core.)
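If you would rather stay with core modules, something along these lines should give a comparable dump with Data::Dumper. This is only a sketch: the hash literal is a hand-copied subset of the structure shown above, and the Sortkeys and Indent settings are just my choice for stable, compact output.

```perl
use strict;
use warnings;
use Data::Dumper;

# Optional settings: sorted keys and a compact indent style.
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;

# Hand-copied subset of the structure built above (for illustration only).
my %protein_peptides = (
    DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG => [
        [ 'DAAAAATTLTTTAMTTTTTTCK', 'MMFRPPPPPGGGGGGGGGGGG' ],
    ],
);

# Dumper() returns the dump as a string; print it or log it as needed.
my $dump = Dumper(\%protein_peptides);
print $dump;
```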
Note that the protein DAAAAATTLTTTAMTTTTTTCK does not appear in the output data structure. It ends in a K that is not followed by a P, so the pattern matches at the very end of the string, and the protein might in some cases be considered to be followed by an empty (or null) string. However, split will not produce trailing null fields when called as it is in the code, that is, without a LIMIT argument. (Update: Therefore, DAAAAATTLTTTAMTTTTTTCK is considered not to have been split at all: @peptides holds a single element, the next if @peptides < 2 test skips it, and so it does not appear in the output structure.) See split for the rules about producing null trailing (and leading) fields. Note also that the protein XXXXXXX does not appear in the output structure because it contains no split point whatsoever.
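To see the trailing-field behavior for yourself, here is a small sketch comparing the default call with a negative LIMIT, which per the split documentation preserves trailing empty fields. The variable names are mine, and the pattern is the same one used in the code above.

```perl
use strict;
use warnings;

my $protein = 'DAAAAATTLTTTAMTTTTTTCK';
my $rx      = qr/(?<=[KR])(?!P)/;

# Default call (no LIMIT): trailing empty fields are stripped,
# so the protein comes back as a single unsplit field.
my @stripped = split $rx, $protein;

# Negative LIMIT: trailing empty fields are kept, so the match at
# the end of the string leaves an empty string after the protein.
my @kept = split $rx, $protein, -1;

printf "no LIMIT: %d field(s)\n", scalar @stripped;
printf "LIMIT -1: %d field(s)\n", scalar @kept;
```

Whether a trailing empty peptide is ever meaningful is for you to decide; the code above simply relies on the default behavior.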
See Perl Data Structures Cookbook (perldsc) for more info on generating and accessing complex Perl data structures.
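As a rough illustration of passing the structure around, here is one way a function might walk it. The function name peptide_report is hypothetical, and the hash literal is a hand-copied subset of the dumped structure. Note the extra level of array nesting that the push of \@peptides creates.

```perl
use strict;
use warnings;

# Same shape as the dumped structure: protein => [ [ peptide, ... ] ]
my %protein_peptides = (
    DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG => [
        [ 'DAAAAATTLTTTAMTTTTTTCK', 'MMFRPPPPPGGGGGGGGGGGG' ],
    ],
);

# Hypothetical helper: takes a reference to the hash and returns
# one report line per stored peptide list.
sub peptide_report {
    my ($pp) = @_;
    my @lines;
    for my $protein (sort keys %$pp) {
        # Each value is an array of array references.
        for my $peptides (@{ $pp->{$protein} }) {
            push @lines, "$protein: " . join(' ', @$peptides);
        }
    }
    return @lines;
}

my @report = peptide_report(\%protein_peptides);
print "$_\n" for @report;
```

If you only ever store one peptide list per protein, assigning $protein_peptides{$protein} = \@peptides instead of pushing would drop the extra nesting level and simplify the inner loop.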
Give a man a fish: <%-{-{-{-<