Space Efficiency of Hashes

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Howdy

I have a basic conceptual question about Perl's implementation of hashes. First let me give the problem background.

I have two tables in my database a{id, acc} and b{build, acc}. I almost always need to have {id, build} tuples and because my tables are fairly large (5 million rows each, roughly) I would like to create a c{id, build} table. Note that there an acc is not necessarily found in either a or b -- each table contains some not found in the other.

A straight SQL join statement is running horribly slow on my DB and our DBA can offer no suggestions for tuning.

As a work-around, I was thinking of retrieving all of table b into a hash: $data{acc} = build. The underlying hash relationship of one build per acc is guaranteed to be valid.

So, the question is, how can I determine how much memory this will take? Naively I would figure that it would require bytes(acc) + bytes(build) which in my case is 60. At that point I could just do a (large) number of hash-lookups, which should be relatively quick. I would write to file, then bulk-load the data and build indexes.

Only... 60 bytes for 5 million records appears to be about 300 MB. That's easily do-able but it doesn't take into account overhead.

Aside from the caveat that doing this in the DB properly ought to be faster, is my approach reasonable and will memory usage be anywhere close to 300 MB (I have about 2 GB available for this)?

Back to Seekers of Perl Wisdom