PerlMonks  

Substitution with regex and memory consumption

by k-mx (Scribe)
on Feb 29, 2020 at 11:01 UTC ( [id://11113568]=perlquestion )

k-mx has asked for the wisdom of the Perl Monks concerning the following question:

Hello, fellow monks!

I want to share an observation about some interesting regex behavior:

#!/usr/bin/env perl
use strict;
use warnings;
use 5.022;

# WARNING! Will consume about 3GB of RAM
$| = 1;
my $size = 1024 * 1024 * 1000;
my $s = 'C' x $size;

# eval << 'EOD';
$s =~ s/C/1/; # copy of CCCCC... will stay in memory
$s =~ s/C/2/; # same but 1CCCC...
# EOD

print "Time to measure memory usage...\n";
sleep(9000);

Every substitution expression copies the original string and keeps that copy until its next call. So the memory is not leaked and gets reused (e.g. inside loop blocks), but this behavior can still lead to serious RAM consumption. The only workaround I know of is a string eval, which forces the memory to be reclaimed, but that is clumsy in my opinion.

What do you think about this behavior? Is there a more elegant way to free the memory used by 's///'?
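For reference, here is a minimal sketch of the string-eval workaround mentioned above, scaled down so it is safe to run. The 1 MiB size is an arbitrary choice for illustration (the test script uses about 1000 MiB), and whether the saved copies are actually released afterwards depends on the Perl version, so treat it as something to measure rather than a guarantee.

```perl
use strict;
use warnings;

# Small size chosen for illustration only; the original test used ~1000 MiB.
my $size = 1024 * 1024;
my $s    = 'C' x $size;

# The substitutions are compiled and run inside a string eval. The lexical
# $s from the enclosing file is visible inside the eval'ed code.
eval <<'EOD';
    $s =~ s/C/1/;
    $s =~ s/C/2/;
    1;
EOD
die $@ if $@;

print substr($s, 0, 5), "\n";   # prints "12CCC"
```

The first substitution turns the leading C into 1, and the second one then hits the next C, which is why the string starts with 12.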

UPD:

Reproduced on 5.22.2 (Slackware 14.2) and 5.30.0 (CentOS 7); no memory waste on 5.16.3 (CentOS 7).
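To compare versions without watching the process from outside, something like the following sketch could be used. It assumes a Linux system with /proc mounted; rss_kb is a hypothetical helper written for this example, and the 16 MiB string size is arbitrary. The numbers it prints will differ between Perl builds, so no expected output is given.

```perl
use strict;
use warnings;

# Hypothetical helper: resident set size of this process in kB,
# read from /proc (Linux only). Returns undef if it is not available.
sub rss_kb {
    open my $fh, '<', "/proc/$$/status" or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

my $s = 'C' x (16 * 1024 * 1024);   # arbitrary size for the experiment

printf "before s///: %s kB\n", rss_kb() // 'n/a';
$s =~ s/C/1/;
printf "after  s///: %s kB\n", rss_kb() // 'n/a';
```

Running the same script under each interpreter gives a rough before/after comparison for that version.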

Investigation with 'Test::LeakTrace'

Interesting links:

Replies are listed 'Best First'.
Re: Substitution with regex and memory consumption
by dave_the_m (Monsignor) on Feb 29, 2020 at 20:27 UTC
    In general after a successful match (or the match part of a substitution), the regex engine keeps a copy of the original string so that it can dynamically generate values for $1, $2, $&, $` etc on demand. This string needs to be kept for at least as long as the surrounding scope - i.e. the scope of $1 etc. The details are far more complex, but internally perl's Copy-on-Write mechanism often (but not always) avoids having to do a real copy. But it doesn't always work out for the most efficient use of memory.

    Dave.
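A small sketch of why the engine needs the pre-substitution string: after s/// the target has changed, yet the match variables still describe the original text, and they stay valid until the enclosing block ends. The strings and variable names below are made up for illustration.

```perl
use strict;
use warnings;

my $s = 'abcdef';
$s =~ s/(cd)/XY/;

# The target is modified, but the match variables refer to the old string.
print "target:    ", $s, "\n";   # abXYef
print "matched:   ", $&, "\n";   # cd
print "prematch:  ", $`, "\n";   # ab
print "postmatch: ", $', "\n";   # ef
print "group 1:   ", $1, "\n";   # cd

# Capture variables are scoped to the enclosing block: a match in an
# inner block does not clobber the outer $1 once that block is left.
my ($outer_before, $inner, $outer_after);
if ('hello' =~ /(h)/) {
    $outer_before = $1;          # h
    {
        'world' =~ /(w)/;
        $inner = $1;             # w
    }
    $outer_after = $1;           # h again
}
print "$outer_before $inner $outer_after\n";   # h w h
```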

      Okay, thank you! Some assumptions, please correct me if I'm wrong:

      1. Prior to 5.18.0, s/// copied the original only if one of $&, $`, $' was used. The interpreter then set the global PL_sawampersand flag, which could not be cleared later. m// was also affected by this flag.
      2. Between 5.18.0 and 5.20.0, Perl could track usage of these variables separately and copy only the requested part of the string.
      3. In Perl 5.20.0+, a successful s/// always modifies the string, so the COW mechanism always has to copy the original.

      So, before 5.20 we had a choice: avoid $&, $`, $' and use the /p modifier to explicitly request ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH}. Now we can use $&, $`, $' freely, and m// no longer suffers from PL_sawampersand, but s/// will always copy the original string regardless of the PL_sawampersand state, and there is nothing we can do about that.
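For readers who have not used it, this is roughly what the /p route looks like. The sample string is invented for the example; the three ${^...} variables are the documented counterparts of $`, $& and $'.

```perl
use strict;
use warnings;

my $text = 'key=value';

# /p asks the engine to preserve the matched string so that the
# ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} variables can be read.
my ($pre, $match, $post);
if ($text =~ /=/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

print "pre:   ", $pre,   "\n";   # key
print "match: ", $match, "\n";   # =
print "post:  ", $post,  "\n";   # value
```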

        That's roughly it, yes.

        Dave.
