http://qs321.pair.com?node_id=725380

lightoverhead has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

my question is:

if I use the code:

#!/usr/bin/perl -w use strict; my @array; map {$_+1}@array[10000..90000]; map {$_+1}@array[91000..92000];

the array elements from 0 to 9999 and 90001 to 90999 will all be "undef". And at this point they will occupy a chunk of memory. Is there a method that I can save the memory that was occupied by these "undef" and use them later?

Thanks.

Replies are listed 'Best First'.
Re: Could we save the memory occupied by "undef" in an array?
by JavaFan (Canon) on Nov 23, 2008 at 03:42 UTC
    You can use a hash.

    An array will use memory for the undefined values. The upshot is that accessing elements and slices is fast.

      Thank you for your reply, the reason I would like to use array is as you said it is fast, however at the expense of eating memory. Also, for these undef elements, I may want to fill in something later.

      So,what if I can only use array to implement, and Is there a method or package that can squash the memory occupied by "undef" in an array and return the the positions back when I try to fill something into the position that was occupied by a undef, sort of like autovivification?

        Before you complain about the speed of a hash, benchmark it. The difference between a hash and an array shouldn't be much. I doubt a sparse array would be any faster.

        Your program is already heavily invested in the performance of hashes. Most objects use hashes as their underlying data structure. Every reference to a package variable results in multiple hash lookups (I think). And every call to a subroutine results in multiple hash lookups (I think).

        Is there a method or package that can squash the memory occupied by "undef" in an array and return the the positions back when I try to fill something into the position that was occupied by a undef, sort of like autovivification?

        The benefit of arrays come from knowing exactly where to look into memory for a given element knowing only a base address and the element index. That becomes impossible if some of the elements are missing.

        Now, there could be something more efficient data structure than a hash for your needs, but accessing it would require calling a subroutine/method. I believe that results in a hash lookup to find the code pointer associated with the subroutine name, so you're worse off than you started.

        On the plus side, arrays use the most efficient of two kinds of undef by default if I read what follows correctly. Instead of creating a new undef SV, uninitialized array elements refer to a global undef constant. That means only sizeof(void*) bytes are wasted per unused slot instead of sizeof(void*) + sizeof(SV).

        >perl -MDevel::Peek -e"$a[2]=undef; $a[3]=3; Dump \@a" SV = RV(0x183c518) at 0x22609c REFCNT = 1 FLAGS = (TEMP,ROK) RV = 0x226db0 SV = PVAV(0x22bfcc) at 0x226db0 REFCNT = 2 FLAGS = () IV = 0 NV = 0 ARRAY = 0x1833f9c FILL = 3 MAX = 3 ARYLEN = 0x0 FLAGS = (REAL) Elt No. 0 # Not allocated Elt No. 1 # Not allocated Elt No. 2 # Allocated but undef SV = NULL(0x0) at 0x225f7c REFCNT = 1 FLAGS = () Elt No. 3 # Allocated and defined SV = IV(0x18287b4) at 0x22606c REFCNT = 1 FLAGS = (IOK,pIOK) IV = 3
        So,what if I can only use array to implement, and Is there a method or package that can squash the memory occupied by "undef" in an array and return the the positions back when I try to fill something into the position that was occupied by a undef, sort of like autovivification?
        Why won't a hash do if you don't want to spend the memory? Why do you think someone will have written a module that basically does what a hash does?
Re: Could we save the memory occupied by "undef" in an array?
by perrin (Chancellor) on Nov 23, 2008 at 04:32 UTC
    Are you sure about that? I recall that there is only one copy of the undef value in memory, and I'm not sure that perl fills all the buckets from 0 to 9999 when you set 10000. Why don't you try just setting 10000 and see how much memory the program takes? If it's not much, you can just forget all about this.
      If perl wouldn't fill in the "buckets", then the buckets would contain random garbage. Since perl reuses memory, that garbage may well be old array elements. How would perl know which array elements are "valid" and which ones are "not filled in"?

      Note that (internal, C-level) arrays contain pointers to SVs.

        It could mark a range as empty without actually filling in each bucket.

        I just tried it. Setting element 10000 of an array takes 0.35 MB more RAM than setting element 0 on perl 5.8.8 for OS X on Intel. So, there's a cost, although not a large one.

Re: Could we save the memory occupied by "undef" in an array?
by ikegami (Patriarch) on Nov 23, 2008 at 04:47 UTC
    PDL might be useful. You haven't specified what you are doing, so it's hard to tell.
Re: Could we save the memory occupied by "undef" in an array?
by ig (Vicar) on Nov 23, 2008 at 09:06 UTC