http://qs321.pair.com?node_id=643129


in reply to Sorting data that don't fit in memory

As an alternate to tilly's suggestion, consider using DBM::Deep. One of its design requirements was to provide access to more data than could fit in memory. I'm not sure about the performance comparisons, but if you happen to be dealing with multi-level datastructures (HoAoHoAoA or something similar), nothing is going to beat DBM::Deep. In addition, DBM::Deep is (I believe) the only DBM that correctly handles Perl references. That may also be a consideration.

My criteria for good software:
  1. Does it work?
  2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
  • Comment on Re: Sorting data that don't fit in memory

Replies are listed 'Best First'.
Re^2: Sorting data that don't fit in memory (DBM::Deep)
by tye (Sage) on Oct 10, 2007 at 16:05 UTC

    So, does the act of putting a hash into DBM::Deep cause the keys to be sorted (that would make sense for an efficient on-disk format and would be a very welcome feature to me) or are you suggesting that using perl's sort on an array from DBM::Deep would efficiently swap parts in and out of memory such that the sorting would work reasonably fast and also not require that much memory? (update: I guess the latter would also require storing the sorted results into a DBM::Deep array, perhaps the same one)

    - tye        

      The hashkeys are not stored sorted, no. But, in doing a sort (and I could be wrong), Perl doesn't necessarily pull all the data into memory. If it does, then I would certainly accept help in getting Perl to do the sort in the way you're describing.

      My criteria for good software:
      1. Does it work?
      2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

        Then you might want to test your suggestion of using sort on an enormous DBM::Deep'd array. I'm not convinced that it will be easy to actually avoid using huge amounts of memory or that it will be reasonably fast. More detailed instructions on how to accomplish the former might be helpful and you might find improvements to make while looking at the latter.

        - tye