Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Re: Sorting data that don't fit in memory

by dragonchild (Archbishop)
on Oct 06, 2007 at 18:33 UTC ( #643129=note: print w/replies, xml ) Need Help??


in reply to Sorting data that don't fit in memory

As an alternate to tilly's suggestion, consider using DBM::Deep. One of its design requirements was to provide access to more data than could fit in memory. I'm not sure about the performance comparisons, but if you happen to be dealing with multi-level datastructures (HoAoHoAoA or something similar), nothing is going to beat DBM::Deep. In addition, DBM::Deep is (I believe) the only DBM that correctly handles Perl references. That may also be a consideration.

My criteria for good software:
  1. Does it work?
  2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
  • Comment on Re: Sorting data that don't fit in memory

Replies are listed 'Best First'.
Re^2: Sorting data that don't fit in memory (DBM::Deep)
by tye (Sage) on Oct 10, 2007 at 16:05 UTC

    So, does the act of putting a hash into DBM::Deep cause the keys to be sorted (that would make sense for an efficient on-disk format and would be a very welcome feature to me) or are you suggesting that using perl's sort on an array from DBM::Deep would efficiently swap parts in and out of memory such that the sorting would work reasonably fast and also not require that much memory? (update: I guess the latter would also require storing the sorted results into a DBM::Deep array, perhaps the same one)

    - tye        

      The hashkeys are not stored sorted, no. But, in doing a sort (and I could be wrong), Perl doesn't necessarily pull all the data into memory. If it does, then I would certainly accept help in getting Perl to do the sort in the way you're describing.

      My criteria for good software:
      1. Does it work?
      2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

        Then you might want to test your suggestion of using sort on an enormous DBM::Deep'd array. I'm not convinced that it will be easy to actually avoid using huge amounts of memory or that it will be reasonably fast. More detailed instructions on how to accomplish the former might be helpful and you might find improvements to make while looking at the latter.

        - tye        

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://643129]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (5)
As of 2020-10-24 07:24 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    My favourite web site is:












    Results (242 votes). Check out past polls.

    Notices?