Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Re: Re: Processing data with lot of math...

by itub (Priest)
on May 12, 2004 at 18:44 UTC ( [id://352862]=note: print w/replies, xml ) Need Help??


in reply to Re: Processing data with lot of math...
in thread Processing data with lot of math...

Hey, this approach works really well! I'll include it in my next version of Chemistry::Bond::Find. It scales almost linearly, maybe even better than the recursive partitioning algorithm as I implemented it, and it avoids some of the overhead of the recursive calls and the creation and passing of lots of data structures.

For my test file with 1717 atoms, my recursive algorithm took about 9.7 seconds, while the new method based on your post took 3.2 seconds.

I should remember that you can throw hashes at almost any problem with Perl...

  • Comment on Re: Re: Processing data with lot of math...

Replies are listed 'Best First'.
Re: Re: Re: Processing data with lot of math...
by BrowserUk (Patriarch) on May 12, 2004 at 19:24 UTC

    That's good. Unfortunately, anything involving recursion in perl tends to hit performance quite badly once the volumes of data rise.

    If the mapping to integer coords is working okay, then you could (potentially) save a bit more time on large volumes of data by concatenating the pack'd integers rather than the ascii representations.

    $hash{ join'', pack 'N3', $x, $y, $z } = 'name x.xx y.yy z.zz';

    This would allow you to do an alpha sort rather than a numeric one, and avoids the need to add leading zeros to the values. It might also conserve a little space, though you are unlikely to see this unless your resultant concatenated ascii values are greater than 12 characters.

    You'd need to benchmark some typical datasets to see what if any benefit this would have, and where an transitions occured.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail

      Regarding the concatenations, my implementation has several differences with respect to your original proposal:

      • The real coordinates are normalized by dividing by the cutoff (which is relatively small), and rounded to the next lower integer (i.e., floored). The "concatenation" is numerical; $n = 1000000*$x + 1000*$y + $z + 500500500. Problem here: I'm assuming that the entire system is within +/- 500*$cutoff. I'll have to fix that somehow.
      • I use a hash to store the atoms. Each "bucket" ($hash{$n}) may contain more than one atom (usually 1-4), so it's actually an array reference. Atoms are Chemistry::Atom objects, which know their own symbol, name, real coordinates, etc.
      • The buckets don't need to be sorted; I just loop for each of them, and then check for bonds with the N^2 algorithm within the bucket (remember N is small), and then check for bonds with the atoms in the neighboring buckets (this requires N*M checks if there are M atoms in the neighbor). "Neighbor" is defined in a way that avoids counting the same pair twice.

      Note: the neighbors of $n are $n+1, $n+1000, $n+1000000, etc., for a total of 13 neighbors (a cube is surrounded by 26 cubes, but you only need half of them to avoid counting twice)

        When you said "based on your post", you should have prefixed it with "loosely" :)

        I'm not sure that I fully understand the details of what you are doing. Maybe I'll pull the module and take a looksee if the code and POD clarifies anything. Note: It's not your description that is lacking, just my knowledge of the actual application.

        If the idea adapts to something useful, it's the best reward one can ask for. Thanks for mentioning it:)


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://352862]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others exploiting the Monastery: (4)
As of 2024-04-25 14:26 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found