Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris

comment on

( [id://3333] : superdoc . print w/replies, xml ) Need Help??


I have a basic conceptual question about Perl's implementation of hashes. First let me give the problem background.

I have two tables in my database a{id, acc} and b{build, acc}. I almost always need to have {id, build} tuples and because my tables are fairly large (5 million rows each, roughly) I would like to create a c{id, build} table. Note that there an acc is not necessarily found in either a or b -- each table contains some not found in the other.

A straight SQL join statement is running horribly slow on my DB and our DBA can offer no suggestions for tuning.

As a work-around, I was thinking of retrieving all of table b into a hash: $data{acc} = build. The underlying hash relationship of one build per acc is guaranteed to be valid.

So, the question is, how can I determine how much memory this will take? Naively I would figure that it would require bytes(acc) + bytes(build) which in my case is 60. At that point I could just do a (large) number of hash-lookups, which should be relatively quick. I would write to file, then bulk-load the data and build indexes.

Only... 60 bytes for 5 million records appears to be about 300 MB. That's easily do-able but it doesn't take into account overhead.

Aside from the caveat that doing this in the DB properly ought to be faster, is my approach reasonable and will memory usage be anywhere close to 300 MB (I have about 2 GB available for this)?

In reply to Space Efficiency of Hashes by Anonymous Monk

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.