Dear all,
Though this is a reply to meetraz, it involves having read the next 10 or so replies too.
I'm now posting an example of the file data that I use, and the pseudo-algorithm
But before I do so, I want to say that probably the best reply of the lot came from 'toma' I will truly go away and study computational geomtry now, thanks toma!
File:
ATOM 1 N GLY A 1 43.202 54.570 86.432 1.00 44.15
+ N
ATOM 2 CA GLY A 1 44.109 54.807 85.249 1.00 45.94
+ C
ATOM 3 C GLY A 1 44.984 56.034 85.443 1.00 43.20
+ C
ATOM 4 O GLY A 1 45.070 56.527 86.578 1.00 46.54
+ O
ATOM 5 N SER A 2 45.617 56.538 84.378 1.00 38.14
+ N
ATOM 6 CA SER A 2 46.461 57.710 84.544 1.00 33.00
+ C
ATOM 7 C SER A 2 46.057 59.017 83.842 1.00 29.38
+ C
ATOM 8 O SER A 2 46.522 60.071 84.270 1.00 32.21
+ O
ATOM 9 CB SER A 2 47.972 57.377 84.387 1.00 29.10
+ C
ATOM 10 OG SER A 2 48.326 56.931 83.084 1.00 25.44
+ O
ATOM 11 N HIS A 3 45.172 59.000 82.838 1.00 24.08
+ N
ATOM 12 CA HIS A 3 44.778 60.271 82.173 1.00 20.59
+ C
ATOM 13 C HIS A 3 43.299 60.425 81.787 1.00 18.87
+ C
ATOM 14 O HIS A 3 42.664 59.463 81.375 1.00 16.14
+ O
ATOM 15 CB HIS A 3 45.597 60.514 80.897 1.00 17.13
+ C
ATOM 16 CG HIS A 3 47.060 60.648 81.145 1.00 20.02
+ C
ATOM 17 ND1 HIS A 3 47.596 61.677 81.887 1.00 20.09
+ N
ATOM 18 CD2 HIS A 3 48.099 59.855 80.797 1.00 21.10
+ C
ATOM 19 CE1 HIS A 3 48.904 61.516 81.989 1.00 19.86
+ C
ATOM 20 NE2 HIS A 3 49.233 60.417 81.333 1.00 18.86
+ N
The co-ordinates are indeed cartesian, the last three columns are meaningless.
The other important columns are the second through to the sixth columns, in order:
- atom number
- atom name
- residue name
- chain id
- residue number
I should add that each line is for an atom and not for a residue, though I was originally talking about residue iteration, I use residue iteration to try and avoid any extra atomic iteration, but in reality, the number of lines is about 10-20 times the number of residues themselves, depending on the residue type
The pseudo-code, which I'l post below, will ignore the format listed above, because each line is for one atom, and I have a coded PDB::Atom object (home-made) that takes the line as it's read, and parses it.
For the sake of making the psuedo-code concise, it is indeed 'PSEUDO' so please dont expect it to run at all!
The function 'addToMemory' simply fills the different hashes I use for lookup, especially to recall all the atoms in a residue. It is in there also, that the atomic data is added to the database, so there is one DB call per line in the file, this is one bottleneck, but much less of a bottleneck than the sheer number of iterations themselves
I try to cut down on the iterations, by pre-calculating the 3D center of the residue, and then comparing the distance of a residue pair to a hard-coded cut-off (varies depending on residues themselves) in 'notClose' (not shown). This avoids having to iterate through all the atoms in a residue, if there is no chance of a bond. Finally the bond detection itself is in another function, not shown, and doesn't necessarily return a bond, there is more calculations depending on the nature of the atom itself. I just wanted to show the nature of the iterations themselves.
It should be noted that there is a large amount of processing for particular residues and atoms due to possible errors in the file, which I've excluded.
open (FILE, "< ".$file ) or die "\ncouldn't open FILE: ".$file.": $!
+";
my $i=0;
while(<FILE>){
if($_ =~ /^ATOM/){
$temp = new PDB::Atom('-line' => $_, '-number'=>$i);
addToMemory($self, $temp, $i);
########bond detection #################
###iterate through chains###
foreach my $ch (sort {$a <=> $b} keys %{$self->{'contacts'}){
next if $temp->chain eq $ch; #skip same chains
###iterate through residues###
foreach my $r (sort {$a <=> $b} keys %{$self->{'contacts'}{$ch
+}}){
next if notClose($temp,self->{'contacts'}{$ch}{$r});
foreach my $a (sort {$a <=> $b} keys %{$self->{'residues'}
+{$ch.$r}}){
$bonds->newBond($temp,$self->{'all'}{$
+a});
}
}
}
}
}
close(FILE);
sub addToMemory{
my ($self, $atom, $numb) = @_;
$self->{'all'}{$numb}=$atom;
unless($atom->proton){
$self->{'atoms'}{$numb} = 1;
}else{
$self->{'protons'}{$numb} = 1;
}
$self->{'contacts'}{$atom->chain}{$atom->resNumber}=residualAverag
+e($atom, $self->{'contacts'}{$atom->chain}{$atom->resNumber});
$self->{'residues'}{$ch.$atom->resNumber}{$numb}=1;
}
Edited by Chady -- converted pre to code tags.
|