http://qs321.pair.com?node_id=109898

pmas has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to decide how to implement my CGI script (reading from database into buffer-like structures). I am not ready to implement it in OO, DBIx::Recordset looks too complicated, Tie:DB claims is rather slow... I decided to go KISS way: just to implement simple layer around DBI calls to be used for 80% calls, and do rest by some custom way later, if and when needed.

Now, thinking about reading one record of data: I can fetch array, or hashref. Reading into hash structure looks more promising (field values will be placed in hash elements, conveniently named as fieldnames), obviously it is for the price. But what is the price? I do not know how to find it.

I am aware that premature optimization is evil, but before developing rules and guidelines I would like to know consequences of my decision. I can benchmark access time and compare hash and array, but how about memory usage? Right now it is not a big issue, but later my scripts should run website with dozens of daily users, using mod_perl.

Before reading source, I would like to tap vast knowledge of our monks here and ask:

  1. What strategy should I use when trying to access memory usage of different ways of implementing features? Do we have some Benchmark::Memory package for it? Any links to articles/approaches?
  2. Your gut feeling: How much more memory uses hash comparing with the same data placed in plain array? I read that some "buckets" exist, but do not know details. What to read? What to worry about?
  3. I probably will use many hashes. Is better to use many hashes with small number of items, or overhead is too big, and preferable is small number of hashes with large number of items? (I.e. JOIN multiple tables and read them into one hash, or read every table into own hash). I am aware that reading it in one DB query is better time-wise, I just want to know if putting all JOIN-able records in one hash will be another incentive - to save memory.

I know I should go object way. I promise version 2 will be object. I am not ready yet... ;-)

pmas
To make errors is human. But to make million errors per second, you need a computer.

  • Comment on Comparing memory requirements for hashes and arrays

Replies are listed 'Best First'.
Re: Comparing memory requirements for hashes and arrays
by Cine (Friar) on Sep 03, 2001 at 23:25 UTC
    Memory consumption of an array of size X is the space needed for X pointers (more specific SV pointers) plus a SV for the array itself. A hash needs both the pointer, a key and some overhead for a linked list (the buckets) and itself.

    Given you can program you program with little or no effort using arrays instead of hashes do just that. But if you need to use a lot of time rewriting something to use an array instead of a hash, use the hash.

    PS.
    OO is NOT better... Its just another way of doing things.
    Personally I dont see the big point, when you are not 20+ people working on the same project...

    T I M T O W T D I
      Don't forget that Perl uses power of 2 buffering, so your array probably takes up considerably more space than just the number of elements in it.

      Also the simple debuggability and extensibility of hash-based structures makes them a win over arrays in my books. Even if up front you think the development speed is going to be similar, the debugging speed will not be...

Re: Comparing memory requirements for hashes and arrays
by toma (Vicar) on Sep 04, 2001 at 09:30 UTC
    Many times a database table fits nicely in a list of lists (2 dimensional array). Other times a hash works better. I worked on a project where list of lists was chosen after extensive testing.

    Questions to think about for hash versus array:

    Is your data rectangular?
    Regular database tables without NULLs nicely fits a rectangular data structures. Irregular data fits better in hashes.

    What sort of iteration will you need to do?
    If you will know the key to find your data element, a hash is much better than searching over a list. If you need to do a comparison on each key anyway, an array might be easier to search. For example, if you are checking each key to see if it matches a regular expression, an array can be better. (Let me know if you want to know why :-).

    Are the keys of the hash simple to implement?
    If you have one database field that is the key, it is easy to use it as a hash key. If the database rows are keyed with multiple columns, the hash gets more complicated since you will need to combine columns to make the hash key.

    Do you need to keep reloading your data structures from the database, or are they static?
    If they are static, you can use a few tricks that save memory. If you have a machine with shared libraries and copy-on-write virtual memory, you can get multiple modperl http daemons to share the database data.

    You can presize arrays so that they have less memory overhead. For measuring memory consumption, the normal tools such as ps and top will work reasonably well.

    To see if high-level behavior such as copy-on-write is working properly, you need to stress test your server to see how much traffic load causes it to swap. You really need to use a development server for this type of testing.

    Writing a stress-test program is fun! Create a program using LWP to simulate the behavior of a single user. Run a bunch of these programs at the same time to simulate the load caused by many users. You can get typical user behavior patterns by examining your server log files. This approach allows you make impressive claims, such as "This system is scaled for two second page load times with 1000 simultaneous users."

    It should work perfectly the first time! - toma

Re: Comparing memory requirements for hashes and arrays
by perrin (Chancellor) on Sep 04, 2001 at 03:22 UTC
    In my experience, hashes use about 15% more memory than arrays for similar data with constants for array indexes. This is not a good enough reason to obscure your program. Use the data structure that makes the most sense for your code and will be the most readable for future maintainers.
      Your experienced guess is good enough for me, brother perrin. Thank you++.

      And 15% more for hash is is small price to pay for convenience and flexibility of hashes, as tilly said. We need to store also keys. 15% is less than I expected... /me have no more concerns about using hashes. Thank you all!

      pmas
      To make errors is human. But to make million errors per second, you need a computer.