PerlMonks  

Re^3: Bioinformatics: Regex loop, no output

by Not_a_Number (Prior)
on Nov 16, 2015 at 15:12 UTC ( [id://1147806]=note )


in reply to Re^2: Bioinformatics: Regex loop, no output
in thread Bioinformatics: Regex loop, no output

In what way does GrandFather's solution (which you replied to) not do what you want?

Re^4: Bioinformatics: Regex loop, no output
by TamaDP (Initiate) on Nov 16, 2015 at 15:37 UTC

    Sorry, I didn't explain myself very well there. I have an array of proteins that are digested with one enzyme that the user selects. That gives me an array of peptides, and I want to send that array to a subroutine for printing, where the output looks like this:

    >Protein 1 Peptide 1
    DAAAAATTLTTTAMTTTTTTCK
    >Protein 1 Peptide 2
    MMFRPPPPPGGGGGGGGGGGG
    >Protein 2 Peptide 1
    ALTAMCMNVWEITYHK

    And so on... So in order to format the printing like that, I need to track which peptide belongs to each protein. Or am I making things more complicated than necessary? Thx

      Again, GrandFather's code above already seems to print the information in essentially the way you want, except the formatting is different. So change the print formatting. Is this what you need help with?

      On the other hand, you may mean that you want the peptides encapsulated into an independent data structure that you can pass around to any function at will. Here's an adaptation of GrandFather's code to produce a data structure associating proteins with their split peptides:

      c:\@Work\Perl\monks>perl -wMstrict -le
      "use Data::Dump qw(dd);
       ;;
       my @proteins = qw(
         DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG
         ALTAMCMNVWEITYHKGSDVNRRASFAQPPPQPPPPLLAIKPASDASD
         DAAAAATTLTTTAMTTTTTTCK
         XXXXXXX
         );
       ;;
       my %protein_peptides;
       ;;
       for my $protein (@proteins) {
         my @peptides = split /(?<=[KR])(?!P)/, $protein;
         ;;
         next if @peptides < 2;
         ;;
         push @{ $protein_peptides{$protein} }, \@peptides
         }
       ;;
       dd \%protein_peptides;
      "
      {
        ALTAMCMNVWEITYHKGSDVNRRASFAQPPPQPPPPLLAIKPASDASD => [
          ["ALTAMCMNVWEITYHK", "GSDVNR", "R", "ASFAQPPPQPPPPLLAIKPASDASD"],
        ],
        DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG => [
          ["DAAAAATTLTTTAMTTTTTTCK", "MMFRPPPPPGGGGGGGGGGGG"]
        ],
      }

      I have reformatted the native output of Data::Dump::dd() as it appeared on my monitor to make it more readable. (Update: I like Data::Dump as my dumper, but you may prefer Data::Dumper, which is core.)
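
      For readers without Data::Dump installed, here is a minimal sketch of the same kind of dump using the core Data::Dumper module. The hash literal is copied from the output above rather than recomputed, and the two configuration variables are only a formatting preference, not something the original code used.

```perl
use strict;
use warnings;
use Data::Dumper;

# Optional settings: sorted keys and a compact indent style.
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;

# One entry copied from the dd() output above (protein => list of peptide lists).
my %protein_peptides = (
    DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG =>
        [ [ 'DAAAAATTLTTTAMTTTTTTCK', 'MMFRPPPPPGGGGGGGGGGGG' ] ],
);

# Dumper() returns the dump as a string; print it or keep it for logging.
my $dump = Dumper(\%protein_peptides);
print $dump;
```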

      Note that the protein DAAAAATTLTTTAMTTTTTTCK does not appear in the output data structure. It ends in a K that is not followed by a P, so the pattern does match at the very end of the string, but split does not return trailing empty (or null) fields when called as it is in the code. (Update: Therefore, DAAAAATTLTTTAMTTTTTTCK yields a single field, is considered not to have been split at all, and is skipped by the next if @peptides < 2 test.) See split for the rules about producing null trailing (and leading) fields. Note also that the protein XXXXXXX does not appear in the output structure because it contains no split point whatsoever.
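
      The trailing-field behaviour described above can be checked with a small sketch. It reuses the split pattern from the code and compares the default call with one that passes a negative LIMIT, which per the split documentation keeps trailing empty fields. The variable names are just for illustration.

```perl
use strict;
use warnings;

# Same cleavage pattern as above: after K or R, unless followed by P.
my $re = qr/(?<=[KR])(?!P)/;

# Default LIMIT: trailing empty fields are removed, so a protein whose only
# cleavage site is at its very end comes back as a single field.
my @default = split $re, 'DAAAAATTLTTTAMTTTTTTCK';

# Negative LIMIT: the trailing empty field is kept.
my @keep = split $re, 'DAAAAATTLTTTAMTTTTTTCK', -1;

# No K or R at all: there is nothing to split on.
my @none = split $re, 'XXXXXXX';

printf "default LIMIT: %d field(s)\n", scalar @default;
printf "LIMIT of -1:   %d field(s)\n", scalar @keep;
printf "no split site: %d field(s)\n", scalar @none;
```

      Whether an end-of-sequence cleavage should count as a split is a design decision for the digest; the original code treats such a protein as undigested.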

      See Perl Data Structures Cookbook (perldsc) for more info on generating and accessing complex Perl data structures.
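
      As a rough illustration of walking such a structure to get the ">Protein N Peptide M" layout asked for earlier in the thread, here is a sketch. It assumes an array of array refs instead of the hash above, because a hash does not preserve input order and the numbering depends on it. The helper name format_peptides is hypothetical.

```perl
use strict;
use warnings;

my @proteins = qw(
    DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG
    ALTAMCMNVWEITYHKGSDVNRRASFAQPPPQPPPPLLAIKPASDASD
);

# One inner array of peptides per protein, kept in input order.
my @peptides_by_protein =
    map { [ split /(?<=[KR])(?!P)/, $_ ] } @proteins;

# Hypothetical helper: takes a ref to the array of arrays and returns
# the report lines (a header line followed by the peptide sequence).
sub format_peptides {
    my ($digest) = @_;
    my @lines;
    for my $p (0 .. $#$digest) {
        my $peptides = $digest->[$p];
        for my $q (0 .. $#$peptides) {
            push @lines,
                sprintf('>Protein %d Peptide %d', $p + 1, $q + 1),
                $peptides->[$q];
        }
    }
    return @lines;
}

my @report = format_peptides(\@peptides_by_protein);
print "$_\n" for @report;
```

      Returning the lines rather than printing inside the helper keeps the formatting testable and lets the caller decide where the output goes.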


      Give a man a fish:  <%-{-{-{-<

        Many thanks for the help, it seems to be working now. I definitely need to learn more about CPAN.
