List overhead

by malloc (Pilgrim)
on Oct 18, 2001 at 00:13 UTC

malloc has asked for the wisdom of the Perl Monks concerning the following question:

Masters,
I was just wondering if anyone had done any research into the overhead associated with lists and hashes in Perl. This question arises because I was just building an array in Perl and found my memory climbing to over 100 MB, when the output of that array takes up only 30 MB in a file! Now, this could be because the array had numerous small items, namely about 1.6 million entries of 30 or so bytes each. Is this all overhead? And generally, what overhead can be expected with lists? Is it proportional to the number of items plus the size of the items? Thank you for any wisdom you can shed. (I don't think I have any memory leaks; when I took out the array push, memory usage dropped to 2 MB.)

-malloc

Searches on list/array overhead were not fruitful.

Replies are listed 'Best First'.
Re: List overhead
by perrin (Chancellor) on Oct 18, 2001 at 01:09 UTC
    One thing to keep in mind is that Perl's dynamically growing arrays have a memory cost associated with them. Perl allocates memory in chunks when it thinks it needs more. One result of this is that Perl might allocate room for 50 elements, and when you hit element 50 it might grab RAM for another 50 elements, just in case you're going to use them.

    One way to cut down on reallocation is to pre-size the array for the number of elements you expect, all in one chunk. I recall this is done like so:

    $#foo = 99; # pre-extend @foo to hold 100 elements (indices 0..99)
    All of this is discussed in the back of the Camel in the section on optimizing memory.
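    A minimal sketch of pre-sizing both an array and a hash; the hash form, assigning to keys(), comes from perlfunc rather than from this reply:

    my @rows;
    $#rows = 99_999;          # pre-extend @rows: slots for 100,000 elements

    my %index;
    keys(%index) = 100_000;   # pre-allocate hash buckets (perl rounds up to a power of 2)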
Re: List overhead
by tadman (Prior) on Oct 18, 2001 at 02:36 UTC
    Hashes, by their very nature, are often full of empty pointers, each of which is at least 4 bytes on 32-bit platforms. If your elements are only 30 bytes each, then a large amount of this memory is probably sitting in unused hash slots. There's not much you can do about this, but if you are running short on RAM, you can use a "tied" hash, as described below.

    Even with lists, the Perl interpreter must store information besides the data itself, such as the length of the scalar, reference counts, and so forth, which normally doesn't add up to a whole lot. I would expect a minimum of 12-16 bytes to be allocated per scalar just for this kind of internal information. Of course, with several million tiny scalars being allocated, this can add up to a lot of data.

    In compiled languages like C, when you ask for an array of 1,000,000 30-byte strings, that's what you get, usually as one big contiguous chunk of RAM. With Perl, unless you want to do something really crazy, like pack your strings into one giant scalar and use substr to extract them (see the sketch below), you have to live with the overhead. Usually it's not so bad, but it might come as a bit of a surprise.
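    A rough sketch of that pack-into-one-scalar trick, assuming fixed-width 30-byte records; the record size, count, and helper names here are illustrative:

    use strict;
    use warnings;

    my $REC_LEN = 30;                     # fixed record width in bytes
    my $N       = 1_600_000;              # number of records
    my $buffer  = ' ' x ($N * $REC_LEN);  # one big ~48 MB scalar holds them all

    sub put_rec {                         # store record $i, space-padded
        my ($i, $data) = @_;
        substr($buffer, $i * $REC_LEN, $REC_LEN) = pack("A$REC_LEN", $data);
    }

    sub get_rec {                         # fetch record $i, strip the padding
        my ($i) = @_;
        (my $rec = substr($buffer, $i * $REC_LEN, $REC_LEN)) =~ s/\s+\z//;
        return $rec;
    }

    put_rec(42, 'some small item');
    print get_rec(42), "\n";              # prints "some small item"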

    Since you have about 30 MB of real data, you are experiencing about a 1:3 expansion when loaded into RAM. If this is a problem, you can use a "tied" hash, which uses far less RAM but is disk-based and a fair bit slower. Still, tied hashes work the same way as regular hashes from your code's point of view: it only takes a few lines to tie the hash, and the rest of your code can stay the same.
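    A minimal sketch of the tied-hash route using DB_File, the standard Berkeley DB bindings; the filename is made up:

    use strict;
    use warnings;
    use DB_File;

    # Defaults to a hash-style database, created read/write if absent.
    tie my %big, 'DB_File', 'records.db'
        or die "Cannot tie records.db: $!";

    $big{item_17} = 'a 30-or-so byte value';   # stored on disk, not in RAM
    print $big{item_17}, "\n";

    untie %big;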
Re: List overhead
by dragonchild (Archbishop) on Oct 18, 2001 at 00:20 UTC
    Perl allocates a whole bunch more RAM than you would think. For example, when you create a one-element array, something closer to 50 elements may be pre-allocated.

    As for your immediate question - Perl stores every value in a structure called an SV, or Scalar Value. Each SV contains slots for a number as well as a string, among other things. I don't know for sure, but I wouldn't be surprised if those were allocated a little bigger ahead of time for efficiency.

    Thus, you can easily use 3-5 times as much RAM as the data takes on disk. :-)
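    One way to see the per-scalar cost directly is Devel::Size from CPAN (not mentioned above); a sketch, and the exact numbers vary by perl version and platform:

    use strict;
    use warnings;
    use Devel::Size qw(size total_size);

    my $str = 'x' x 30;
    print size($str), " bytes for one 30-byte scalar\n";    # noticeably more than 30

    my @arr = ($str) x 1_000;
    print total_size(\@arr), " bytes for a 1000-element array\n";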

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means: worry only about what you need to implement.

Re: List overhead
by blakem (Monsignor) on Oct 18, 2001 at 00:24 UTC
Re: List overhead
by jeroenes (Priest) on Oct 18, 2001 at 12:52 UTC
    I analyze data consisting of many items but little information per item, and I quickly ran into Perl's hunger for memory. In my experience (Linux 2.2/perl 5.6/gcc), Perl incurs about 43 bytes of overhead per item (estimated from the total memory footprint with large lists in memory, so not terribly accurate). If your list has 10 million items, you need over 400 megabytes of memory. That's simply too much.

    There are solutions:

    1. BerkeleyDB
    2. pack/unpack bytestrings
    3. vec on bitstrings
    I coincidentally started to use the latter this week, and it works well. For large lists, I have to keep a few flags per item. vec returns or sets bits (the desired number of bits, actually), so you do not need pack conversions, and that really cleans up the code. A sketch follows below.
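    A minimal sketch of the vec approach, assuming two bits of flags per item; the sizes are illustrative:

    use strict;
    use warnings;

    my $flags = '';
    vec($flags, 1_599_999, 2) = 0;    # pre-extend: 1.6M 2-bit flags fit in ~400 KB

    vec($flags, 42, 2) = 3;           # set item 42's flag
    print vec($flags, 42, 2), "\n";   # prints 3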

    For really large lists, BerkeleyDB is the way to go. Find it at http://www.sleepycat.com.

    Cheers,

    Jeroen
    "We are not alone"(FZ)

Re: List overhead
by scott (Chaplain) on Oct 18, 2001 at 23:19 UTC

    You might also consider the Perl Data Language (PDL) if you're doing numerical operations on your arrays.
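    A quick sketch of what that buys you (PDL is on CPAN; the numbers are illustrative): a piddle stores its elements as one packed C array rather than one SV apiece.

    use strict;
    use warnings;
    use PDL;

    my $data = zeroes(1_600_000);   # 1.6M doubles in one ~12.8 MB block
    $data->set(42, 3.14);           # element access
    print $data->at(42), "\n";
    print $data->sum,    "\n";      # vectorized operations run in C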
