Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Re: Issues regarding for loops and recursion

by Marshall (Canon)
on Jan 12, 2023 at 12:38 UTC ( [id://11149548]=note: print w/replies, xml ) Need Help??


in reply to Issues regarding for loops and recursion

I see some trouble with your code to parse out the sequences from SEQRES lines. There is some redundant information in the .pdb format. I used the sequential letter (A,B,C,D,E,etc) to delineate start/end of the various sequences. Number of 3 letter groups in the sequence would be another way. I am not sure what you are trying to do. At the end of code, see a couple of example FASTA segments I cut n' pasted from the program output.

I have no ide what to do with the x,y,z atom coordinates or even what they mean. I averaged them for fun. I changed your regex a bit to make it easier to pick out the +-floating point numbers.

To run code, download the .pdb file and name it "6U9D.pdb". In production code, I probably would not use so much memory for temporary results, preferring to calculate and output results as they occur rather than save a bunch of data and process it at the end. However for these prototype things, having the data in intermediate forms can useful for debugging. Have fun.

use strict; use warnings; open my $pdb_fh, '<', "6U9D.pdb" or die "unable to open 6U9D.pdb for r +eading $!"; my %amino_acid_conversion = ( ALA => 'A', ARG => 'R', ASN => 'N', ASP => 'D', CYS => 'C', GLN => 'Q', GLU => 'E', GLY => 'G', HIS => 'H', ILE => 'I', LEU => 'L', LYS => 'K', MET => 'M', PHE => 'F', PRO => 'P', SER => 'S', THR => 'T', TRP => 'W', TYR => 'Y', VAL => 'V'); my @atoms; #2D array of x,y,z of all atoms (many thousands) my @seqs; #raw sequences with the 3 letter designations #convert later to FASTA format my $cur_seq = ''; my $cur_ltr = ''; # parse out data of interest from file # while (<$pdb_fh>) { if (my ($ltr, $seq) = (/^SEQRES\s+\d+\s+(\w+)\s+\d+\s+([A-Z ]+)$/) +) { $seq =~ s/\s+$//; if ($ltr eq $cur_ltr) { $cur_seq .= " $seq"; } else { push @seqs, $cur_seq if $cur_seq ne ''; # end of current seq +uence $cur_seq = $seq; # begin of the next +sequence $cur_ltr = $ltr; } } elsif (my ($x,$y,$z) = (/^ATOM\s+.*?([\d.-]+)\s+([\d.-]+)\s+([\d.-] ++)/) ) { push @atoms, [$x,$y,$z]; } } push @seqs, $cur_seq; # don't forget to finish the last seq! #### output collected data ### #make a fasta sequence segments foreach my $seq (@seqs) { # my $fasta = join '',map{$amino_acid_conversion{$_}}split ' ',$seq +; # without using a map: # my $fasta =''; foreach my $char3 (split ' ',$seq) { $fasta.= $amino_acid_conversion{$char3} } print ">Some Fasta Description Line\n"; #use 60 char lines while ($fasta) #fasta suggested max is 80 { print substr($fasta,0,60,''),"\n"; } } #print the data points # I am not sure what needs to be done with them # average of each coordinate? #foreach my $row_ref (@atoms) #uncomment to print #{ # print @$row_ref,"\n"; #} my $xsum; my $ysum; my $zsum; foreach my $row_ref (@atoms) # @atoms is a 2D array { my ($x, $y , $z ) = @$row_ref; $xsum+=$x; $ysum+=$y; $zsum+=$z; } print "avg x = ",$xsum/@atoms,"\n"; print "avg y = ",$ysum/@atoms,"\n"; print "avg z = ",$zsum/@atoms,"\n"; __END__ These are 2 examples: You will have to figure out what goes in the FASTA description line And perhaps not all of these sequences are relevant? Looks like a lot are duplicates. >Some Fasta Description Line MHHHHHHENLYFQGAPSFNVDPLEQPAEPSKLAKKLRAEPDMDTSFVGLTGGQIFNEMMS RQNVDTVFGYPGGAILPVYDAIHNSDKFNFVLPKHEQGAGHMAEGYARASGKPGVVLVTS GPGATNVVTPMADAFADGIPMVVFTGQVPTSAIGTDAFQEADVVGISRSCTKWNVMVKSV EELPLRINEAFEIATSGRPGPVLVDLPKDVTAAILRNPIPTKTTLPSNALNQLTSRAQDE FVMQSINKAADLINLAKKPVLYVGAGILNHADGPRLLKELSDRAQIPVTTTLQGLGSFDQ EDPKSLDMLGMHGCATANLAVQNADLIIAVGARFDDRVTGNISKFAPEARRAAAEGRGGI IHFEVSPKNINKVVQTQIAVEGDATTNLGKMMSKIFPVKERSEWFAQINKWKKEYPYAYM EETPGSKIKPQTVIKKLSKVANDTGRHVIVTTGVGQHQMWAAQHWTWRNPHTFITSGGLG TMGYGLPAAIGAQVAKPESLVIDIDGDASFNMTLTELSSAVQAGTPVKILILNNEEQGMV TQWQSLFYEHRYSHTHQLNPDFIKLAEAMGLKGLRVKKQEELDAKLKEFVSTKGPVLLEV EVDKKVPVLPMVAGGSGLDEFINFDPEVERQQTELRHKRTGGKH >Some Fasta Description Line MGSSHHHHHHSSGLVPRGSHMENLYFQGATRPPLPTLDTPSWNANSAVSSIIYETPAPSR QPRKQHVLNCLVQNEPGVLSRVSGTLAARGFNIDSLVVCNTEVKDLSRMTIVLQGQDGVI EQARRQIEDLVPVYAVLDYTNSEIIKRELVMARISLLGTEYFEDLLLHHHTSTNAGAADS QELVAEIREKQFHPANLPASEVLRLKHEHLNDITNLTNNFGGRVVDISETSCIVELSAKP TRISAFLKLVEPFGVLECARSGMMALPRTPLKTSTEEAADEDEKISEIVDISQLPPG I have no idea what these numbers would mean? avg x = 321.013155298296 avg y = 290.744642162734 avg z = 69.196842162731

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11149548]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chanting in the Monastery: (4)
As of 2024-03-28 21:09 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found