PerlMonks  

Perl script that will read two pdb files with different line numbers and will replace the chain letter from the first to the second file

by Nastazia (Initiate)
on Jun 22, 2018 at 09:57 UTC (id://1217177)

Nastazia has asked for the wisdom of the Perl Monks concerning the following question:

Hello everyone, I am trying to write a script in Perl that will do the following.

It will read a PDB file that contains only CA atoms, such as the following:

          1  2   3    4  5    6
    ATOM  1  CA  PRO  A  889  84.370  72.820  26.830  1.00  0.00
    ATOM  2  CA  THR  A  890  87.370  73.900  28.080  1.00  0.00
    ATOM  3  CA  VAL  A  891  90.920  72.490  27.750  1.00  0.00
    ATOM  4  CA  PHE  A  892  93.640  74.890  28.970  1.00  0.00
    ATOM  5  CA  HIS  B  893  97.060  74.200  27.360  1.00  0.00
    ATOM  6  CA  LYS  B  894  99.880  73.920  29.990  1.00  0.00

It will then read a second PDB file that contains every atom:

          1  2   3    4  5    6
    ATOM  1  N   PRO  A  889  16.220  12.185   1.804  1.00  71.54  N
    ATOM  2  CA  PRO  A  889  16.101  12.990   3.034  1.00  70.89  C
    ATOM  3  C   PRO  A  889  15.432  14.346   2.803  1.00  72.31  C
    ATOM  4  O   PRO  A  889  14.743  14.852   3.703  1.00  72.20  O
    ATOM  5  CB  PRO  A  889  17.553  13.151   3.502  1.00  72.96  C
    ATOM  6  CG  PRO  A  889  18.315  12.067   2.782  1.00  78.00  C
    ATOM  7  CD  PRO  A  889  17.626  11.907   1.465  1.00  73.35  C

(The files refer to the same molecule but have a different number of lines.)

If a residue number (column 5) in the second file matches one in the first file, the script should take the chain letter (column 4) from the first file and use it to replace the chain letter on every line of the second file that has that residue number. So far I've got this disaster :/

    print "\nEnter the network pdb file file: ";
    $inputFile = <STDIN>;
    chomp $inputFile;
    unless (open(INPUTFILE, $inputFile)) {
        print "Cannot read from '$inputFile'";
        <STDIN>;
        exit;
    }
    # load the file into an array
    chomp(@networkpdb = <INPUTFILE>);
    # close the file
    close(INPUTFILE);

    print "\nEnter the pdb output file: ";
    $inputFile2 = <STDIN>;
    chomp $inputFile2;
    unless (open(INPUTFILE, $inputFile2)) {
        print "Cannot read from '$inputFile2'";
        <STDIN>;
        exit;
    }
    chomp(@pdb = <INPUTFILE>);
    close(INPUTFILE);

    for ($line1 = 0; $line1 < scalar @networkpdb; $line1++) {
        if ($networkpdb[$line1] =~ m/ATOM\s+\d+\s+\w+\s+\w{3}\s*(\w+)\s*(\d*)\s+\S+\.\S+\s+\S+\.\S+\s+\S+\.\S+\s+.+\..+\..*/ig) {
            my $resnum=$2;
            my $chain=$1;
            for ($line = 0; $line < scalar @pdb; $line++) {
                if ($pdb[$line]=~ m/(ATOM\s+\d+\s+\w+\s+\w{3}\s*)(\w+)\s*(\d*)(\s+\S+\.\S+\s+\S+\.\S+\s+\S+\.\S+\s+.+\..+\..*)/ig) {
                    my $begining=$1;
                    my $resnum1=$3;
                    my $chain1=$2;
                    my $end=$4;
                    if ($resnum1=$resnum) {
                        $chain1=$chain;
                        $parsedData{$line} = $begining.$chain1."\s".$resnum1.$end;
                    }
                }
            }
        }
    }

    # create the output file name
    $outputFile = "WithNetwork_".$inputFile;
    # open the output file
    open (OUTFILE, ">$outputFile");
    # print the data lines
    foreach $line (sort {$a <=> $b} keys %parsedData) {
        print OUTFILE $parsedData{$line}."\n";
    }
    # close the output file
    close (OUTFILE);

Thank you very much in advance.


Replies are listed 'Best First'.
Re: Perl script that will read two pdb files with different line numbers and will replace the chain letter from the first to the second file
by hippo (Bishop) on Jun 22, 2018 at 10:18 UTC
    if ($resnum1=$resnum)

    Inside the brackets is an assignment. You almost certainly don't want that; you want to test for equality instead, i.e.:

    if ($resnum1 == $resnum)

    == is for comparing numbers and eq is for comparing strings.
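To make the difference concrete, here is a minimal sketch that contrasts the three operators. The variable names follow the question; the values 889 and 890 and the `$copy` variable are only illustrative.

```perl
use strict;
use warnings;

my $resnum  = 889;   # residue number from the first file (sample value)
my $resnum1 = 890;   # residue number from the second file (sample value)

# Assignment inside a condition: copies $resnum into $copy and then tests
# the assigned value, which is true for any non-zero number.
my $copy = $resnum1;
my $assign_result = ($copy = $resnum) ? 'true' : 'false';

# Numeric comparison: true only when the two numbers are equal.
my $numeric_result = ($resnum1 == $resnum) ? 'true' : 'false';

# String comparison: use eq for text such as chain letters.
my $string_result = ('A' eq 'B') ? 'true' : 'false';

print "assignment: $assign_result\n";
print "numeric ==: $numeric_result\n";
print "string eq:  $string_result\n";
```

With the assignment form every pair of lines "matches", which is why the original loop relabels far more lines than intended.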

    Is there any particular reason you use those massive regexes in preference to a simple split? That might make things a little clearer. Other tips: use strict and warnings, replace print ... exit with die and try to use consistent indenting to make your code more legible (this really does help).
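A rough sketch of those tips, assuming the fields are separated by whitespace as in the samples posted (real PDB files are fixed-column, so fields can run together and a column-based approach may be safer). The helper names `chain_and_resnum` and `read_lines` are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split one ATOM record on whitespace and return
# the chain letter and residue number (columns 4 and 5 in the question).
sub chain_and_resnum {
    my ($record) = @_;
    my @fields = split ' ', $record;
    return unless @fields >= 6 && $fields[0] eq 'ATOM';
    return @fields[4, 5];
}

# Hypothetical helper: open-or-die replaces the print/exit pair and
# reports the reason for the failure via $!.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot read '$path': $!\n";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

my ($chain, $resnum) = chain_and_resnum(
    'ATOM 5 CA HIS B 893 97.060 74.200 27.360 1.00 0.00');
print "chain=$chain resnum=$resnum\n";
```

Calling `read_lines` with a file name returns the chomped lines, or stops the script with a readable message if the file cannot be opened.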

    Good luck.

Re: Perl script that will read two pdb files with different line numbers and will replace the chain letter from the first to the second file
by Laurent_R (Canon) on Jun 22, 2018 at 12:09 UTC
    In addition to hippo's comments, please note that nested loops are likely to cripple performance if your files are even moderately large: every line of the first file is compared against every line of the second.

    You should probably store the values of interest from the first file in a hash and then look them up in the hash while reading the second file. This will be faster and easier to implement.
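The hash approach might be sketched as follows. This is only an outline under a few assumptions: fields are whitespace-separated as in the posted samples, each residue number maps to a single chain in the first file, and the sub names `build_chain_map` and `relabel` are invented here. The second all-atom record is made up to show a chain letter actually changing.

```perl
use strict;
use warnings;

# Pass 1: map residue number => chain letter from the CA-only records.
sub build_chain_map {
    my (@records) = @_;
    my %chain_for;
    for my $record (@records) {
        my @f = split ' ', $record;
        next unless @f >= 6 && $f[0] eq 'ATOM';
        $chain_for{ $f[5] } = $f[4];
    }
    return \%chain_for;
}

# Pass 2: rewrite the chain letter of each all-atom record whose residue
# number is in the map; every other line is returned unchanged.
sub relabel {
    my ($chain_for, @records) = @_;
    my @out;
    for my $record (@records) {
        # $1 = everything before the chain letter, $2 = the rest of the
        # line (original spacing kept), $3 = the residue number.
        if ($record =~ /^(ATOM\s+\d+\s+\S+\s+\S+\s+)\S(\s+(\d+)\b.*)$/
            && exists $chain_for->{$3}) {
            $record = $1 . $chain_for->{$3} . $2;
        }
        push @out, $record;
    }
    return @out;
}

my @network = (
    'ATOM 1 CA PRO A 889 84.370 72.820 26.830 1.00 0.00',
    'ATOM 5 CA HIS B 893 97.060 74.200 27.360 1.00 0.00',
);
my @all_atoms = (
    'ATOM 1 N PRO A 889 16.220 12.185 1.804 1.00 71.54 N',
    # hypothetical record, not from the question
    'ATOM 40 N HIS A 893 20.000 15.000 5.000 1.00 60.00 N',
);

my @relabelled = relabel(build_chain_map(@network), @all_atoms);
print "$_\n" for @relabelled;
```

Each file is read once, so the work grows with the sum of the two file sizes rather than their product, and lines that do not match are passed through instead of being dropped.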

Re: Perl script that will read two pdb files with different line numbers and will replace the chain letter from the first to the second file
by talexb (Chancellor) on Jun 22, 2018 at 14:25 UTC

    Another approach would be to dump the information into two database tables, and have the database do the heavy lifting.

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.
