Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Re^2: Heap structure for lookup?

by BrowserUk (Patriarch)
on May 27, 2015 at 08:43 UTC ( [id://1127947]=note: print w/replies, xml ) Need Help??


in reply to Re: Heap structure for lookup?
in thread Heap structure for lookup?

The problem is that Tree::RedBlack is implemented in pure Perl, using a blessed hash for each node.

Without having tried it, I estimate that a tree to hold my 150e6 values would require ~40GB of ram.

I haven't found an XS implementation, but even then it would require at least 2x64-bit pointers + a 64-bit pointer to an SvUV + 1-bit per node. If they stored the R/B in an unused bit in one of the pointers, then that's 56 bytes * 150e6 = ~8GB, which is too much.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Replies are listed 'Best First'.
Re^3: Heap structure for lookup?
by hdb (Monsignor) on May 27, 2015 at 09:09 UTC

    This means (binary) trees generally are out question, correct?

    Looking into a different direction, is there any structure on your values that might be leveraged? Or are they more or less random 64bit integers?

      This means (binary) trees generally are out question, correct?

      Certainly any CPAN modules that store generic perl scalars. A custom implementation that stored just a U64 value; giving 24 * 150e6 ~ 3.5GB would be possible.

      But it's the need for 2 64-bit pointers that is the main problem, hence my mind drifting to heaps. They self order as you insert, and avoid the pointers.

      But are any of the variations O(logN) searchable?

      is there any structure on your values that might be leveraged? Or are they more or less random 64bit integers?

      No. They can take on any of the 2^64 values and are likely to be distributed right across the range.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
      In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

        If you know the number of values upfront (or an upper barrier to the number) and if the values are roughly equally distributed across the full range (with some clustering of course), you could pre-allocate an array and then come up with a guess where a particular value belongs (the value divided by 2**64 or some algorithm based on the binary representation of that value). Then do some local sorting. If you made the array somewhat larger than needed you would have less clashes.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1127947]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having an uproarious good time at the Monastery: (3)
As of 2024-04-19 19:54 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found