PerlMonks  

Memory usage double expected

by sectokia (Pilgrim)
on Oct 27, 2022 at 04:50 UTC ( [id://11147727]=perlquestion )

sectokia has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

With 64-bit Strawberry Perl on Windows, when I have very large scalars, I notice that the RAM usage is twice as high as I would expect.

A basic example is:

use Devel::Size qw(total_size);
$x = 'a' x (2**32); # 4GiB
print total_size($x) . "\n";
sleep 60;

The output is 4294967330; however, the Windows commit size and active working set for perl.exe are 8GiB. Devel::Peek also reports 4GiB for the SV len... ?

Replies are listed 'Best First'.
Re: Memory usage double expected
by LanX (Saint) on Oct 27, 2022 at 08:51 UTC
    I reduced it to 1 GB and was still able to reproduce it.

    But the effect disappeared, after I changed the logic to avoid temporary data on the RHS.

    use v5.12;
    use warnings;
    use Devel::Size qw(total_size);
    my $x = 'a';
    $x x= 2**30; # 1GiB
    print total_size($x) . "\n";
    sleep 60;

    Seems like the allocated extra space for your 'a' x (2**32) wasn't released.

    update

    see also Re^2: Memory usage double expected (run-time)

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery

Re: Memory usage double expected
by bliako (Monsignor) on Oct 27, 2022 at 06:51 UTC

    If that's any help, linux's pmap also reports 8615556K

    Edit:

    Could it be because Perl (and others) re-allocates the size of $x to double the last requested size? Unlikely if total_size reports 4Gb I guess.

    Try reading a large file instead of creating a huge scalar in $x.

Re: Memory usage double expected
by Discipulus (Canon) on Oct 27, 2022 at 12:09 UTC
    Hello,

    what I see is nonsense to me

    This is perl 5, version 26, subversion 0 (v5.26.0) built for MSWin32-x64-multi-thread

    With your program (little modification to not watch the task manager) I see it doubled

    use strict;
    use Devel::Size qw(total_size);

    my $x = 'a' x (2**30);
    print "Devel::Size = ".human(total_size($x))."\n";
    open my $cmd, qq(tasklist /NH /FI "PID eq $$"|) or die;
    while (<$cmd>){
        print qq(tasklist PID $$ = $1\n) if /(\S+\s\w{1,2}$)/
    }

    sub human{
        my $size = shift;
        my @order= qw/Tb Gb Mb Kb byte/;
        if($size<1024){return"$size byte"}
        while ($size >= 1024){$size=$size/1024;pop @order;}
        return sprintf("%4.2f %2s", $size, (pop @order));
    }
    __END__
    Devel::Size = 1.00 Gb
    tasklist PID 37288 = 2.104.612 K

    But with this version of mine I see what everyone expects:

    use strict;
    use warnings;
    use Devel::Size qw(total_size);

    my $x;
    foreach my $order ( qw(20 24 30 32) ){
        $x = 'a' x ( 2 ** $order );
        print "\n\nsize of scalar 2**$order\n";
        print "Devel::Size = ".human(total_size($x))."\n";
        open my $cmd, qq(tasklist /NH /FI "PID eq $$"|) or die;
        while (<$cmd>){
            print qq(tasklist PID $$ = $1\n) if /(\S+\s\w{1,2}$)/
        }
    }

    sub human{
        my $size = shift;
        my @order= qw/Tb Gb Mb Kb byte/;
        if($size<1024){return"$size byte"}
        while ($size >= 1024){$size=$size/1024;pop @order;}
        return sprintf("%4.2f %2s", $size, (pop @order));
    }
    __END__

    size of scalar 2**20
    Devel::Size = 1.00 Mb
    tasklist PID 19660 = 8.452 K

    size of scalar 2**24
    Devel::Size = 16.00 Mb
    tasklist PID 19660 = 23.820 K

    size of scalar 2**30
    Devel::Size = 1.00 Gb
    tasklist PID 19660 = 1.056.012 K

    size of scalar 2**32
    Devel::Size = 4.00 Gb
    tasklist PID 19660 = 4.201.748 K

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Memory usage double expected
by NERDVANA (Deacon) on Oct 29, 2022 at 05:04 UTC
    Ok, here's a layman explanation:

    Perl is free to do whatever it thinks might help your program run faster, usually at the expense of using more memory. You can't make any assumptions about memory usage of a script. In this case, perl probably created a compile-time constant for that which it copies when you ask for its value.

    If you are working with extremely large data, you can't toss it around in the usual manners without large spikes of memory usage. You need to consider something like File::Map and then pass around references to that scalar instead of passing the scalar around by value. If you describe more of your needs, we can suggest better techniques to avoid loading it all into memory at once.
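To illustrate the pass-by-reference part of that advice, here is a minimal sketch using only core Perl. It does not use File::Map, and the helper name count_char is hypothetical, chosen just for this example. The point is only that the sub receives a reference, so the large buffer itself is never copied into the sub.

```perl
use strict;
use warnings;

# Hypothetical helper: counts occurrences of a one-character string in a
# large buffer. It receives a REFERENCE, so the buffer is never copied.
sub count_char {
    my ($buf_ref, $char) = @_;
    my ($count, $pos) = (0, 0);
    while (($pos = index($$buf_ref, $char, $pos)) >= 0) {
        $count++;
        $pos++;
    }
    return $count;
}

my $n   = 20;                # run-time value, so 'a' x ... is not constant-folded
my $big = 'a' x (2**$n);     # 1 MiB here; the thread uses far larger sizes
print count_char(\$big, 'a'), "\n";
```

The count is built from a variable on purpose, following the run-time workaround discussed elsewhere in this thread.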

Re: Memory usage double expected
by kcott (Archbishop) on Oct 28, 2022 at 06:43 UTC

    G'day sectokia,

    I have Perl v5.36.0 (via Perlbrew) running on Cygwin which is running on Win10. Cygwin and Win10 were both updated in the last 24 hours; so, everything is up-to-date.

    I ran your code and got the same results: Devel::Size & Devel::Peek showing 4GB; MSWin showing 8GB.

    I changed the size from 2**32 (4GB) to 2**31 (2GB). Devel::Size & Devel::Peek are now showing 2GB, as expected; however, MSWin is still showing 8GB.

    In both cases, all of the 8GB was freed when the Perl code completed.

    I don't want to jump to any conclusions: I'm sure there would be differences between PerlbrewPerl-Cygwin-MSWin and StrawberryPerl-MSWin. However, it does seem to me that issues are possibly related more to MSWin memory management than Perl itself.

    I'd suggest rerunning your code with a variety of sizes; checking Devel::Size, Devel::Peek & MSWin memory values for each.

    — Ken

Re: Memory usage double expected -- further questions
by Discipulus (Canon) on Oct 28, 2022 at 07:55 UTC
    Hello all,

    can someone explain this in a more layman way? I still have many doubts:

    • (1) $x = 'a' x (2**32) provokes the wrong behavior. I read that the RHS is a compile-time constant, so what I understand is that the right-hand side expression is evaluated at compile time. But why does that double the memory used? And why does 2**${\32} force it to be postponed to run time? And why does the latter not double the memory?
    • (2) COW, aka Copy On Write, should be the default in an assignment, and this should prevent the memory from being doubled. So is this case effectively a bug?
    • (3) In the two snippets I posted here I cannot spot any real difference in the assignment, but they behave differently. Why?

    # wrong behaviour as it doubles the memory
    # first code of my previous post, very similar to the OP one
    my $x = 'a' x (2**30);

    # RIGHT behaviour, it does NOT double memory used
    # second code posted above
    my $x;
    foreach my $order ( qw(20 24 30 32) ){
        $x = 'a' x ( 2 ** $order );
        ...

    # RIGHT behaviour, even with my $x declared inside the foreach loop
    foreach my $order ( qw(20 24 30 32) ){
        my $x = 'a' x ( 2 ** $order );
        ...

    In addition, with every perl I have atm ( strawberry portable: 5.26.0 5.22.3 5.24.2 5.26.2 ) I observe the same behaviours of the two programs above, i.e. doubled and not doubled; I read that Linux users experience the same. So it must be something really bound to Perl itself, and I'd like to know why and how to prevent this: a doubled memory footprint is not such a great feature to have :)

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      > (3) in the two snippets I posted here I cannot spot any real difference in the assignment but they behave differently. Why?

      $x = 'a' x (2**30); is different from $order=30; $x = 'a' x ( 2 ** $order );. The first one's RHS is considered a constant. The other's is not as it has that $order.

        > The first one's RHS is considered a constant

        here a little demo of the constant folding to make it clearer

        C:\tmp>perl -MO=Deparse -e "$_ =10; my $x = 'a' x 10"
        $_ = 10;
        my $x = 'aaaaaaaaaa';
        -e syntax OK

        C:\tmp>perl -MO=Deparse -e "$_ =10; my $x = 'a' x $_"
        $_ = 10;
        my $x = 'a' x $_;
        -e syntax OK

        C:\tmp>
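The same check can be scripted from inside a program. This is a minimal sketch using the core B::Deparse module; the sub names folded and deferred are made up for this example. It prints the deparsed body of each sub so you can see whether the repetition was replaced by a literal at compile time.

```perl
use strict;
use warnings;
use B::Deparse ();

# Both operands are literals: a candidate for constant folding.
sub folded   { my $x = 'a' x 10; return length $x }

# The count comes from a variable, so the repetition must happen at run time.
sub deferred { my $n = 10; my $x = 'a' x $n; return length $x }

my $deparse = B::Deparse->new;
print "folded:   ", $deparse->coderef2text(\&folded),   "\n";
print "deferred: ", $deparse->coderef2text(\&deferred), "\n";
```

If the first body shows the literal 'aaaaaaaaaa' and the second still shows 'a' x $n, the first was folded, which matches the one-liner session above.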

        Cheers Rolf
        (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
        Wikisyntax for the Monastery

Re: Memory usage double expected
by Anonymous Monk on Oct 27, 2022 at 07:14 UTC

    RHS is compile-time constant and so no COW for it? Try $x = 'a' x (2**${\32});

      > Try $x = 'a' x (2**${\32});

      yes, forcing the creation of the RHS into runtime works.

      edit

      Dave_The_M will know better, why the space of an inline constant wasn't released.

      I suppose it's just an overlooked optimization.

      Cheers Rolf
      (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
      Wikisyntax for the Monastery

        Tim Bunce's excellent Profiling Memory Usage talk gives some insight into what's going on at the 09:20 minute mark:

        sub foo {
            my $var = "X" x 10_000_000;
        }
        foo();

        Tim explains that:

        1. The buffer for the $var lexical is preserved (for next time you call the subroutine).
        2. The compiler says hey "X" is a constant and 10_000_000 is a constant, so I'm gonna build you a 10 MB constant! And I'm going to keep it here to one side so when you call this subroutine again I can just copy it in for you! (aka Constant folding).

        With recursion, things get much worse (sorry, I couldn't bring myself to watch that part :).

        Though Tim's Devel::SizeMe module might be useful, AFAIK it is up for adoption and not being actively developed ATM.

        I also keep a list of Memory Tools References (from this list, Mini-Tutorial: Perl's Memory Management by ikegami is definitely worth reading).

        My initial thoughts:

        In general an assign should COW, but the rules are complex. It's not happening in this case. It's probably a bug.

        Dave.

Re: Memory usage double expected
by bliako (Monsignor) on Oct 28, 2022 at 08:22 UTC

    For investigating this further (in Linux):

    print "pid:$$\n";
    my $x = 'monks' x (3);
    sleep 60;

    and then use gcore <pid> to dump the memory of said process to a file (called core.<pid>), and then find occurrences of "monks" either with strings core.<pid> | grep monks or hexdump -C core.<pid> | grep monks

    True to the explanation offered above (first by the Anonymous Monk), with the repeat count in a variable:

    print "$$\n";
    my $n = 3;
    $x = 'monks' x ($n);
    sleep 60;
    monksmonksmonks monks

    contrast to

    print "$$\n";
    $x = 'monks' x (3);
    sleep 60;
    monksmonksmonks monksmonksmonks monksmon<
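If you would rather count the matches than eyeball the grep output, something like the following could be used on the dump file. It is only a sketch: the sub name count_marker is hypothetical, it does not run gcore itself, and it simply counts every occurrence of a byte string in a file, reading in chunks so a multi-gigabyte core file does not have to fit in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: counts all occurrences of $marker in the file at $path.
# The last length($marker)-1 bytes of each buffer are carried over, so matches
# that straddle a chunk boundary are still found (and none is counted twice).
sub count_marker {
    my ($path, $marker) = @_;
    die "marker must not be empty\n" unless length $marker;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $keep = length($marker) - 1;
    my ($count, $tail) = (0, '');
    while (read($fh, my $chunk, 1 << 20)) {
        my $buf = $tail . $chunk;
        my $pos = 0;
        while (($pos = index($buf, $marker, $pos)) >= 0) {
            $count++;
            $pos++;
        }
        $tail = $keep == 0            ? ''
              : length($buf) > $keep  ? substr($buf, -$keep)
              :                         $buf;
    }
    close $fh;
    return $count;
}

# Usage: perl count_marker.pl core.<pid> monks
print count_marker(@ARGV), "\n" if @ARGV == 2;
```

Note that a count of "monks" includes each repetition inside "monksmonksmonks", so compare the totals between the two variants rather than reading the number as a count of copies.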

    bw, bliako

Re: Memory usage double expected
by harangzsolt33 (Chaplain) on Oct 27, 2022 at 19:58 UTC
    I have noticed the SAME behavior in TinyPerl 5.8 running on Windows XP. I reserve a large amount of memory, let's say 20 MB, using the 'x' operator. I want a string that is 20 million bytes long and is filled with the letter 'A' all the way. So, I do this: my $VAR = 'A' x 20000000; and boom! It uses 40 MB of memory. I used a memory viewer to look into the TINYPERL.EXE application to see what's going on. I thought it would be filled with 00 41 00 41 00 41, because it might store the letters as Unicode, reserving two bytes for each character. But nope! That's not what happens. Perl literally creates twice as many letter 'A's in memory as I asked for!

    Someone explained it this way: Since Perl sees that both the letter 'A' and the 20_000_000 are constants, it creates a backup copy in memory in order to use it later... Nah, that's not true, because you can replace the 20 million with a variable, read the number from STDIN, and whatever you punch in, it still fills up twice as many bytes with letter 'A's, which makes no sense.

    I have played around with this a little bit and discovered that if you use the vec() function, you can initialize a string without wasting memory. vec($A, 19999999, 8) = 0; will give you a string that is exactly 20 million bytes long filled with zero bytes. Now, if you do vec($A, 19999999, 8) = 65; it will still pad the string with zero bytes and insert a letter 'A' at the end. I would like to know if there's a way to tell the vec() function to use some other character for padding. It always uses the zero byte as padding. So, to fill up the string with letter 'A's, I would probably create a for loop and repeat the following 5 million times: vec($A, $PTR++, 32) = 0x41414141; That'll give you a string with 20 million letter 'A's. But if you want to write 4 gigs of letter 'A's that'll take quite a while! lol

    If anybody knows a shortcut to initialize a string with letter 'A's QUICKLY and without using the 'x' operator, please, do tell me!!!

      G'day harangzsolt33,

      "vec($A, 19999999, 8) = 0; will give you a string that is exactly 20 million bytes long filled with zero bytes."

      All good so far. You have a string of the length you wanted. This works for me.

      "So, to fill up the string with letter 'A's, I would probably create a for loop and repeat the following 5 million times: ..."

      Or you could just do this once:

      $A =~ y/\0/A/;

      That works for me. I ran a few tests and that seems to take about five times longer than "$A = 'A' x 20_000_000". Run your own benchmarks, but I think you'd be better off with the 'x' operator.
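      Putting the two steps together, here is a minimal sketch. The sub name fill_with_As is hypothetical, and it writes into the caller's scalar through $_[0] rather than returning the string, so no extra copy is made on return. Whether this actually saves memory on your perl is something to measure, not something this sketch proves.

```perl
use strict;
use warnings;

# Hypothetical helper: fills the caller's scalar (aliased through $_[0])
# with $len copies of 'A', without using the 'x' operator.
sub fill_with_As {
    my $len = $_[1];
    $_[0] = '';
    return 0 if $len <= 0;          # vec() rejects a negative offset as an lvalue
    vec($_[0], $len - 1, 8) = 0;    # extend the string to $len zero bytes
    $_[0] =~ tr/\0/A/;              # tr/// is the same operator as y///
    return $len;
}

my $buf;
fill_with_As($buf, 20);
print length($buf), " bytes: $buf\n";
```

      The character is fixed at 'A' because tr/// builds its translation table at compile time; filling with an arbitrary character would need a different approach.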

      — Ken

      So the simple answer is to avoid having both sides of the 'x' operator as constants. For example, this will fix your situations:

      my $VAR = makeA(20000000);
      sub makeA { return 'A' x (shift) }

      This would also fix it:

      my $VAR = make20m('A');
      sub make20m { return (shift) x 20000000 }

      The conclusion seems to be that in perl the constant itself occupies memory, and then the variable gets its own copy of the constant.

Re: Memory usage double expected
by harangzsolt33 (Chaplain) on Oct 27, 2022 at 20:13 UTC
    What's more, if you create a sub that creates a big string, and the sub uses that string as the return value, then two copies of that string will live in memory EVEN THOUGH you created it with the vec() function. Once you're outside the sub, you can only access one copy, and once you undef it outside the sub, it deletes only one copy! This is really weird behavior. I don't understand it at all. But again, I have played around... and I have noticed that if I pass the variable as an argument, then only one copy of the variable exists in memory:

    #!/usr/bin/perl -w
    use strict;
    use warnings;

    sub myfunc_1 {
        my $A = '';
        vec($A, 20000000, 8) = 0;
        return $A;
    }

    sub myfunc_2 {
        vec($_[0], 100000000, 8) = 0;
        return 0;
    }

    $b = <STDIN>;
    my $BIGSTRING = '';
    myfunc_2($BIGSTRING); # memory usage is normal.
    $b = <STDIN>;
    undef $BIGSTRING; # Pff! Gone from memory! TINYPERL.EXE memory
                      # usage visibly shrinks in Windows Task Manager.
    $b = <STDIN>;
    my $STRING2 = myfunc_1(); # memory usage is double!
    $b = <STDIN>;
    undef $STRING2; # Deletes one copy only
    $b = <STDIN>;

    CONCLUSION: If you write a sub that reads something, let's say, you write a sub called ReadFile() you don't want to return the contents of the file as the return value of the sub, because then two copies of the data will exist in memory. You pass the buffer as an argument to the sub, and then the sub fills it up using

     $_[1] = 'CONTENT';

    Perhaps this is why sysread() also works the same way; instead of returning the bytes that were read from the file, it expects the buffer to be passed to it as an argument. The first argument is the file handle, the second is the buffer and the third is the number of bytes to read. And it returns the number of bytes that were read instead of the actual bytes!
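    A sketch of that ReadFile() idea, under the assumption that the caller's buffer is the second argument. The sub name read_file_into is made up for this example. It uses sysread() with its optional fourth OFFSET argument to append each chunk directly to the caller's scalar via the $_[1] alias.

```perl
use strict;
use warnings;

# Hypothetical helper: reads the whole file at $path into the caller's
# buffer, which is accessed through the alias $_[1] and never copied.
# Returns the number of bytes read.
sub read_file_into {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    $_[1] = '';
    my $total = 0;
    while (1) {
        # The 4th argument is the offset in the buffer at which to store data.
        my $n = sysread($fh, $_[1], 65536, $total);
        die "Read error on $path: $!" unless defined $n;
        last if $n == 0;    # end of file
        $total += $n;
    }
    close $fh;
    return $total;
}

# Usage: perl read_file_into.pl some_file
if (@ARGV) {
    my $content;
    my $bytes = read_file_into($ARGV[0], $content);
    print "read $bytes bytes\n";
}
```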

    ALSO NOTE: Even if your sub does not spell out "return $BIGSTRING;" it still returns the multi-megabyte string if that was the return value of the last statement in the sub. And even if you do not use the return value of the sub, it still gets stuck in memory!!!

    #!/usr/bin/perl -w
    use strict;
    use warnings;

    sub MemoryEaterFunc {
        my $A = '';
        vec($A, 20000000, 8) = 0; # Create a large string
        $A .= $_[0]; # Add to it. Do something.
        # Evaluate the last statement,
        # and that is the return value of the sub! even if
        # you don't write return $A; it still returns it.
        # If you want the function to not return the big string,
        # then put 'return 0;' at the end of the sub.
    }

    $a = <STDIN>;
    MemoryEaterFunc(2); # Eats memory and doesn't release it
    $a = <STDIN>;

    You say, "All right fine! I will return zero and then see what happens!"

    Unfortunately, the MemoryEaterFunc() will still gobble up 20 megabytes of RAM even if it returns zero, and you do not use its return value. To actually make sure that it doesn't gobble up memory, we need to undef the variable $A before exiting the sub!

    Interestingly, if you do not undef $A and you use that same sub repeatedly, it doesn't gobble up an additional 20 MB EACH time. It only uses 20 MB total. Period. (In a way, this proves that Perl uses heap memory instead of stack.)

    MemoryEaterFunc(2);
    MemoryEaterFunc(2);
    MemoryEaterFunc(2);
    MemoryEaterFunc(2);
    # At this point, memory usage is 20 MB.
    $b = <STDIN>;

    Okay, so let's fix this thing so it doesn't waste memory anymore:

    sub MemoryEaterFunc_FIXED {
        my $A = '';
        vec($A, 20000000, 8) = 0; # Create a large string
        $A .= $_[0]; # Add to it. Do something.
        undef $A;
    }

    Now the memory eater doesn't eat memory anymore. It uses 20 MB of memory INSIDE the sub, but then once it gets done, it no longer holds onto that memory.
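    One way to look at this without a task manager is to ask perl how large a scalar's string buffer is. The sketch below uses the core B module to read the allocated length (the same LEN figure Devel::Peek shows); the sub names buffer_len and undef_effect are hypothetical. It only reports what perl has allocated for that one scalar, not what the operating system shows for the process.

```perl
use strict;
use warnings;
use B ();

# Allocated string-buffer size (LEN) of the scalar behind a reference.
sub buffer_len {
    my ($ref) = @_;
    return B::svref_2object($ref)->LEN;
}

# Builds a big string with vec(), then returns its allocation
# before and after 'undef $A'.
sub undef_effect {
    my ($size) = @_;
    my $A = '';
    vec($A, $size, 8) = 0;
    my $before = buffer_len(\$A);
    undef $A;
    my $after = buffer_len(\$A);
    return ($before, $after);
}

my ($before, $after) = undef_effect(20_000_000);
print "allocated before undef: $before bytes\n";
print "allocated after undef:  $after bytes\n";
```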
