large data structures

by genehack (Beadle)
on Apr 28, 2000 at 08:29 UTC (#9539=perlquestion)

genehack has asked for the wisdom of the Perl Monks concerning the following question:


I'm struggling with implementing a large two level data structure, and would like to get some input from the mass mind.

Conceptually the structure is a hash of arrays -- about 7000 different arrays, each with 65536 (4**8) elements. Figuring 1 byte per element [1] gives a total size of 458,752,000 bytes -- so that's not going to work. (And, yes, I do need random access at the upper level of the structure, so working on this one hash at a time isn't going to cut it.)
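As a quick sanity check on that estimate, the arithmetic can be written out in a few lines of Perl. The variable names are only illustrative, and the 25% figure is the rough fill rate quoted in this post, not a measurement.

```perl
use strict;
use warnings;

my $arrays        = 7000;      # approximate number of top-level keys
my $slots         = 4**8;      # 65536 elements per array
my $bytes_per_elt = 1;         # optimistic: real Perl scalars cost far more

my $dense_bytes  = $arrays * $slots * $bytes_per_elt;
my $filled_slots = $arrays * $slots * 0.25;   # assuming ~25% of slots are used

print "dense size:   $dense_bytes bytes\n";
print "filled slots: $filled_slots\n";
```

Even at one byte per element the dense layout is around 459 million bytes, which is why a sparse representation looks attractive.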

The arrays are pretty sparsely populated, maybe ~25% filled on average, and actually map to strings rather than numbers [2], so I considered using a hash of hashes... but I also need this to be persistent (i.e., written out to disk), and from the *DBM_File docs, multi-level structures are a no-go.
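To illustrate why a plain DBM tie cannot hold a nested structure: DBM values are flat strings, so a reference assigned as a value is reduced to its string form. The sketch below mimics that by stringifying a reference by hand; `%inner` and `$stored` are made-up names, and no DBM module is actually loaded.

```perl
use strict;
use warnings;

my %inner = (3 => 'ACGT');

# A DBM file stores plain strings, so a reference used as a value is
# flattened to something like "HASH(0x55d0c8a3e2a0)".
my $stored = "" . \%inner;

print "what would land on disk: $stored\n";

# That string is only a printed address. It cannot be turned back into
# the hash, so the nested data would be lost once the program exits.
```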

So, at this point, I'm looking at using pack to scrunch the second level hashes into blobs, which can then be unpacked on the fly as they are accessed. Of course, that means I've got to have a second structure, telling me how many key/value pairs I have in each packed hash (because each will differ)...
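A minimal sketch of the pack idea, under a few assumptions: indices fit in 16 bits (they do, for 0..65535), each string is shorter than 64 KB, and the Perl is recent enough (5.8 or later) to support `()` groups in pack templates. Because every pair carries its own length prefix, unpack can walk the blob to its end, which may make the separate count table unnecessary. `pack_sparse`, `unpack_sparse`, and the key `foo` are hypothetical names, and `%store` is a plain hash standing in for a tied DBM hash.

```perl
use strict;
use warnings;

# Pack a sparse second-level hash (index => string) into one blob.
# Each pair is a 16-bit index followed by a 16-bit-length-prefixed string.
sub pack_sparse {
    my ($href) = @_;
    return pack '(n n/a*)*',
        map { ($_, $href->{$_}) } sort { $a <=> $b } keys %$href;
}

# Reverse the operation: the group repeats until the blob is exhausted.
sub unpack_sparse {
    my ($blob) = @_;
    my %h = unpack '(n n/a*)*', $blob;
    return \%h;
}

my %store;    # stand-in for a hash tied to a DBM file

# Store one sparse row, then fetch it, change it, and store it again.
$store{foo} = pack_sparse({ 3 => 'ACGT', 65535 => 'TTGA' });

my $row = unpack_sparse($store{foo});
$row->{42} = 'GGCC';
$store{foo} = pack_sparse($row);

printf "blob is %d bytes for %d entries\n",
    length $store{foo}, scalar keys %{ unpack_sparse($store{foo}) };
```

The cost is that every access unpacks a whole row, so this suits workloads that touch a few rows at a time rather than single elements scattered across all of them.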

Someone on c.l.p.misc suggested MLDBM, which looks like it might be a solution -- but the inability to directly modify the structure could end up being a PITA in the long run.

So, here I am. Can the Monks come up with a more flexible idea than the c.l.p.misc group?


[1] Which is an underestimate, in all likelihood.
[2] I was using one global look-up array to map the strings back onto multiple data arrays.

Replies are listed 'Best First'.
by The Alien (Sexton) on May 01, 2000 at 07:41 UTC

    I am working on this sort of thing now. The inability to directly modify structures under MLDBM is not so much of an issue as you might think. I'm very new to MLDBM, so take this with NaCl.

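    My current idea is that one could prepare a subroutine that returns an array reference and use it like this...

    # We want to change element 4 of array foo to '12'
    $MLDBMhash{foo} = Revise_It($MLDBMhash{foo}, 4, 12);

    # Copy the stored array, change one element, and return a reference
    # to the copy so the whole record is stored again on assignment.
    sub Revise_It {
        my ($aref, $index, $value) = @_;
        my @somearray = @{$aref};
        $somearray[$index] = $value;
        return \@somearray;
    }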

    This is all off the top of my head. Obviously, under some conditions, you could just as easily have an UpdateArray function that returns only success or failure. It depends on what you want to do. That method could quite possibly make the code more readable, since anyone can see at a glance which array and which position are being updated with what value.
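One way such an UpdateArray function might look is sketched below. It is untested against MLDBM itself: `%db` is a plain hash standing in for the tied hash, and the argument order is an assumption. The fetch, modify, store sequence is the part that matters, because MLDBM only writes a record back when the top-level element is assigned.

```perl
use strict;
use warnings;

# Hypothetical helper: set one element of the array stored under $key.
# Returns 1 on success, 0 if the key is missing or does not hold an array.
sub UpdateArray {
    my ($db, $key, $index, $value) = @_;
    return 0 unless exists $db->{$key};

    my $record = $db->{$key};      # fetch (under MLDBM this is a fresh copy)
    return 0 unless ref $record eq 'ARRAY';

    $record->[$index] = $value;    # modify the fetched record
    $db->{$key} = $record;         # store: assignment writes it back
    return 1;
}

my %db = (foo => [0 .. 5]);        # stand-in for the MLDBM-tied hash

print "updated foo\n"   if UpdateArray(\%db, 'foo', 4, 12);
print "no such array\n" unless UpdateArray(\%db, 'bar', 0, 1);
```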
