http://qs321.pair.com?node_id=1232701


in reply to Difference between exists and defined

I am expecting after the assignment in the second line, the array should consists of four elements #0-3
Why would you expect that? $array[3] is the only element you've assigned a value to, or even mentioned, prior to the loop, so there's no reason for Perl to have allocated storage space for any other elements.

What you have missed is that Perl arrays are sparse data structures, designed to allow you to assign to $array[8675309] without consuming an unreasonable amount of memory to store the 8675309 unused elements which precede it. They are explicitly not C-style indexes into a region of contiguous memory.

(Also, as previous replies have mentioned, the docs warn against using exists on arrays, so this behavior should be considered implementation-dependent and other versions of the perl binary may potentially behave differently. I kind of doubt that they actually would behave differently in this case, but you still shouldn't rely on it in any code that you care about.)
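To make the distinction concrete, here is a minimal sketch of what exists and defined report after a single assignment to a high index. The helper name element_states is made up for this illustration. Since the docs discourage exists on arrays, treat the result as what current perls do, not as a guarantee.

```perl
use strict;
use warnings;

# For each index, return [ index, exists?, defined? ] as 0/1 flags.
# (element_states is a hypothetical helper, not part of any module.)
sub element_states {
    my ($aref) = @_;
    return map {
        [ $_, ( exists $aref->[$_] ? 1 : 0 ), ( defined $aref->[$_] ? 1 : 0 ) ]
    } 0 .. $#$aref;
}

my @array;
$array[3] = 'x';      # the only element given a real value
$array[1] = undef;    # explicitly assigned, but to undef

printf "length: %d\n", scalar @array;
printf "index %d: exists=%d defined=%d\n", @$_ for element_states( \@array );
```

The length is 4 because the highest index is 3. On the perls I know of, only indexes 1 and 3 pass exists, and only index 3 passes defined.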

Re^2: Difference between exists and defined (updated)
by AnomalousMonk (Archbishop) on Apr 17, 2019 at 21:31 UTC

    Update: The ideas I've expressed in this post are apparently neither entirely correct nor entirely incorrect! Please see the posts of LanX here, haukex here and dsheroh here.

    $array[3] is the only element you've assigned a value to ... so there's no reason for Perl to have allocated storage space for any other elements. ... Perl arrays are sparse data structures ... you can assign to $array[8675309] without consuming ... memory to store ... unused elements ...

    I think these statements are incorrect regarding Perl positional (if that's the correct term) arrays. (Perl associative arrays are sparse.) Using Windows Task Manager to graph memory usage in real time (Windoze gotta be good for something) when the following code is executed, one can see that assignment to an array element causes contiguous allocation of enough memory to "grow" the array sufficiently to include the assigned element.

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my @ra;  print 'array declared';  sleep 5;
     ;;
     $ra[ 100_000_000 ] = 42;  print '1st array assignment';  sleep 5;
     ;;
     $ra[ 200_000_000 ] = 137;  print '2nd array assignment';  sleep 5;
     ;;
     print 'byebye';
    "
    array declared
    1st array assignment
    2nd array assignment
    byebye

    The same effect is seen with assignment to array length rather than to any element:
        $#ra = 100_000_000;

    It's a question of what to do with the allocated memory. Perl arrays are arrays of scalars, and a scalar is constructed by default in the very well-defined state of un-defined-ness; an "undefined" scalar is a completely specified C/C++ object. So how do you initialize the space for 100,000,000 scalars allocated in the example above? The specific way this question is answered from one CPU/OS/Perl implementation to another is the basis of the ambiguity surrounding the use of exists on allocated but never-accessed array elements.

    My fuzzy understanding of the Perl guts is that to save time (not space!), array elements in the situation described above are quickly created in a state of quasi-existence: the memory is not left as random garbage, but neither is it a sequence of fully-fledged, default-initialized scalars. Hence the advice regarding use of exists with array elements: Don't Do That!™
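The "quasi-existence" described above can be observed from Perl itself without allocating gigabytes. This is a small sketch; the helper count_existing is invented for the example, and the behaviour of exists here is what current perls show, not something the documentation promises.

```perl
use strict;
use warnings;

# Count how many indexes of the array pass the exists test.
# (count_existing is a hypothetical helper for this illustration.)
sub count_existing {
    my ($aref) = @_;
    return scalar grep { exists $aref->[$_] } 0 .. $#$aref;
}

my @ra;
$#ra = 9;    # set the last index without assigning any element

my $length_after_resize   = scalar @ra;
my $existing_after_resize = count_existing( \@ra );

$ra[4] = 42;    # now store into one of the reserved slots
my $existing_after_store = count_existing( \@ra );

print "length after resize:  $length_after_resize\n";
print "existing after resize: $existing_after_resize\n";
print "existing after store:  $existing_after_store\n";
```

The array reports ten elements as soon as the length is set, yet none of them passes exists until something is stored in it.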

    Perhaps others more familiar with the details of this question can comment on specifics.


    Give a man a fish:  <%-{-{-{-<

      I don't remember where I read it (probably in the Panther book) but Perl arrays are designed to easily compete with linked lists, while keeping the benefits of indexed access.

      That is, they allow dynamic growth at both ends.

      An array has an internal index for the first and last element and allocates twice as much space as reserve for push or unshift.

      Basically only the range between the first and last existing elements needs to be stored, plus the reserve mentioned above.

      The existing elements are kind of pointers to scalars which are allocated separately.

      Allocation of new space is only needed once the reserve elements are filled. Since this happens in exponential steps of doubling*, it's statistically very efficient.

      Shrinking the array happens just by adjusting the indices for the first and last element.

      HTH

      Cheers Rolf
      (addicted to the Perl Programming Language :)

      update

      see here Shift, Pop, Unshift and Push with Impunity!

      *) not sure anymore about the doubling, maybe confusing that part with hashes.
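The bookkeeping described in this post (a last-used index plus some reserve) can be peeked at with the core B module, whose B::AV objects provide FILL and MAX methods. A rough sketch follows; the helper name fill_and_max is invented here, and how much reserve you see depends on the perl version, so no particular growth factor is assumed.

```perl
use strict;
use warnings;
use B ();

# Return (FILL, MAX) for an array:
#   FILL is the highest index currently in use,
#   MAX  is the highest index that fits in the slots allocated so far.
# (fill_and_max is a hypothetical helper for this illustration.)
sub fill_and_max {
    my ($aref) = @_;
    my $av = B::svref_2object($aref);
    return ( $av->FILL, $av->MAX );
}

my @queue = ( 1 .. 4 );
printf "initial:      FILL=%d MAX=%d\n", fill_and_max( \@queue );

push @queue, 5 .. 20;    # may force a reallocation with some reserve
printf "after push:   FILL=%d MAX=%d\n", fill_and_max( \@queue );

shift @queue for 1 .. 3;    # drops elements from the front
printf "after shifts: FILL=%d MAX=%d\n", fill_and_max( \@queue );
```

FILL always equals $#queue. Whenever MAX is larger than FILL, the difference is the reserve available to push before another allocation is needed.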

      I think these statements are incorrect regarding Perl positional (if that's the correct term) arrays. (Perl associative arrays are sparse.)

      Just wanted to confirm that you're correct that Perl's arrays are not sparse. I haven't yet found a reference in the official docs that says so explicitly, but I'm sure it's somewhere.

      use Devel::Size 'total_size';
      my @foo;
      print total_size(\@foo), "\n"; # prints 64
      $foo[100_000_000] = 'x';
      print total_size(\@foo), "\n"; # prints 800000114
      $foo[200_000_000] = 'x';
      print total_size(\@foo), "\n"; # prints 1760000156

      I stand corrected. At some point, I probably read the linked list thing that LanX mentioned and now misremembered it.

      To convince myself, I threw together:

      #!/usr/bin/env perl

      use strict;
      use warnings;
      use 5.010;

      use Memory::Usage;

      my @array1;
      my @array2;

      my $mu = Memory::Usage->new();
      $mu->record('ready to go');

      $array1[5268] = 1;
      $mu->record('array1 has an element');

      $array2[8675309] = 1;
      $mu->record('array2 has an element');

      $mu->dump();

      Running this on a Debian 8.11 machine with perl 5.20.2, I get the result:
        time    vsz (  diff)    rss (  diff) shared (  diff)   code (  diff)   data (  diff)
           0  20824 ( 20824)   2568 (  2568)   1916 (  1916)      8 (     8)    920 (   920) ready to go
           0  20824 (     0)   2568 (     0)   1916 (     0)      8 (     0)    920 (     0) array1 has an element
           0  88600 ( 67776)  70416 ( 67848)   2048 (   132)      8 (     0)  68696 ( 67776) array2 has an element

      The array index 5268 that I used for array1 is a magic number, apparently corresponding to the minimum size that my perl allocates for an array when it's initially declared. If I increase the index to 5269, it shows an additional 132k (all the numbers are in kilobytes) allocated when array1 is assigned to.
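As a sanity check on the measurements in this thread, the sketch below computes the smallest slot table a non-sparse array could use: one pointer per index from 0 up to the highest assigned one. The helper name min_slot_bytes is made up, and the one-pointer-per-slot model is an assumption taken from the "pointers to scalars" description earlier in the thread, not a statement about perl's exact allocator.

```perl
use strict;
use warnings;
use Config;

# Lower bound on the slot table size, assuming one pointer per index
# in 0 .. $last_index. (min_slot_bytes is a hypothetical helper.)
sub min_slot_bytes {
    my ($last_index) = @_;
    return ( $last_index + 1 ) * $Config{ptrsize};
}

for my $last_index ( 8_675_309, 100_000_000 ) {
    my $bytes = min_slot_bytes($last_index);
    printf "last index %9d: at least %d bytes (about %.0f kB)\n",
        $last_index, $bytes, $bytes / 1024;
}
```

On a build with 8-byte pointers this gives about 67776 kB for index 8675309, which matches the data diff reported by Memory::Usage, and 800000008 bytes for index 100_000_000, within about a hundred bytes of the Devel::Size figure. The 1760000156 reported after the second assignment is larger than the 1600000008 this bound gives for index 200_000_000; that would be consistent with some growth reserve, but this is only an inference from the numbers.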