Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re: Processing data with lot of math...

by monkey_boy (Priest)
on May 12, 2004 at 15:29 UTC ( [id://352786]=note: print w/replies, xml ) Need Help??


in reply to Processing data with lot of math...

I guess these are PDB files(?).
I have done similar manipulations to pdb files with few (speed) problems. There will always be some performance issues when calculating an "all by all" distance matrix, but i dont think the problem here is with RAM, as most pdb files are not that big. Perhaps if you post the code i may be able to help.

one optimisation is if the distance cutoff is fixed (i.e. 3 angstroms).
Then instead of calculating the euclidean distance, calculate the square euclidean distance & see if it is > 9 (i.e. 3**2).
This saves a great deal of time, as the sqrt() function is relativly expensive.
hope this helps.
I should really do something about this apathy ... but i just cant be bothered
  • Comment on Re: Processing data with lot of math...

Replies are listed 'Best First'.
Re: Re: Processing data with lot of math...
by itub (Priest) on May 12, 2004 at 16:00 UTC

    This saves a great deal of time, as the sqrt() function is relativly expensive.

    That used to be the true in the old times, when sqrt was calculated by software and the languages were simple. Well, simple languages still exist, but at least for Perl, the overhead of creating and destroying variables, moving things around and executing the opcodes are more important than what takes your hardware to do the floting point operations. Just see this benchmark for an example:

    use Benchmark ':all'; our $a = 347.3; our $b = 876.1; our $c; cmpthese( 5_000_000, { 'rand' => sub { rand() }, add => sub { $c = $a + $b }, mul => sub { $c = $a * $b }, div => sub { $c = $a / $b }, 'sqrt' => sub { $c = sqrt($b) }, } );

    Benchmark: timing 5000000 iterations of add, div, mul, rand, sqrt...
           add:  2 wallclock secs ( 1.32 usr +  0.00 sys =  1.32 CPU) @ 3787878.79/s (n=5000000)
           div:  1 wallclock secs ( 1.58 usr +  0.00 sys =  1.58 CPU) @ 3164556.96/s (n=5000000)
           mul:  2 wallclock secs ( 1.50 usr +  0.00 sys =  1.50 CPU) @ 3333333.33/s (n=5000000)
          rand:  1 wallclock secs ( 1.15 usr +  0.00 sys =  1.15 CPU) @ 4347826.09/s (n=5000000)
          sqrt:  0 wallclock secs ( 1.18 usr +  0.01 sys =  1.19 CPU) @ 4201680.67/s (n=5000000)
              Rate  div  mul  add sqrt rand
    div  3164557/s   --  -5% -16% -25% -27%
    mul  3333333/s   5%   -- -12% -21% -23%
    add  3787879/s  20%  14%   -- -10% -13%
    sqrt 4201681/s  33%  26%  11%   --  -3%
    rand 4347826/s  37%  30%  15%   3%   --
    

    Surprise! sqrt and rand may even be faster! The results are not entirely reproducible, but the short answer is: it doesn't really matter. As I said in a previous post, the problem here is the type of algorithm and how it scales.

      Erm... i dont understand, its not really the point how fast sqrt() is , the main thing is your doing an extra calculation that is completly redundant,
      If you have 10,000 atoms then 10,000*10,000 sqrt()'s has an effect on performance,
      It must be better to do just one $cutoff_dist**2?
      However i do agree that the any improovments to the brute force method would be better,
      Cheers , monkey_boy
      I should really do something about this apathy ... but i just cant be bothered

        I'm sorry, you are right that it's always good to avoid redundant operations. I just wanted to point out that sqrt was not a likely bottleneck, but that the algorithm was the problem (and in that we agree). By avoiding sqrt you might get a size-independent 5% percent improvement if you are lucky, but by using a better algorithm you get a 99% improvement for sufficiently large systems if you have N^1 scaling instead of N^2 (note: by improvement I mean reduction in execution time).

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://352786]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others musing on the Monastery: (2)
As of 2024-04-26 06:10 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found