PerlMonks  

Regex replace leaking memory.

by Striker1440 (Sexton)
on Aug 25, 2020 at 03:42 UTC ( [id://11121066]=perlquestion )

Striker1440 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
I've run into some strange behaviour, which I suspect is a bug in the regex engine of later Perl versions.
Using libdevel-leak-perl with this simplified code sample:
use strict;
use warnings;
use Devel::Leak;

my $string = "TESTING STRING";
my $count = Devel::Leak::NoteSV (my $handle);
print ($string =~ s/\ STRING//);
undef $string;
Devel::Leak::CheckSV ($handle);
Testing this with Perl 5.26 / 5.30, the code snippet reports a leak, showing:
new 0x55fd7b6db1e0 : SV = PV(0x55fd7b6dbe60) at 0x55fd7b6db1e0
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x55fd7b713c00 "TESTING STRING"\0
  CUR = 14
  LEN = 16
  COW_REFCNT = 1
Testing with Perl 5.16, this does not leak.
It looks like a copy of the input string is being made and stored in a SV but not being cleared up after the regex completes its replace?
My questions are: is this actually a bug within Perl? Am I doing something fundamentally wrong? If it is a fundamental bug, are there any workarounds that could avoid leaking?

Striker.

Edit: Adding Perl version information.

Tested and confirmed leaking on:

Ubuntu 16.04
This is perl 5, version 22, subversion 1 (v5.22.1) built for x86_64-linux-gnu-thread-multi

Ubuntu 18.04
This is perl 5, version 26, subversion 1 (v5.26.1) built for x86_64-linux-gnu-thread-multi

Ubuntu 20.04
This is perl 5, version 30, subversion 0 (v5.30.0) built for x86_64-linux-gnu-thread-multi

Tested and not leaking on:

Centos 7
This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

OEL 7
This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

Additional Edit: Adding full instructions for reproduction on WSL.

Seems like this hasn't been so easy to reproduce elsewhere, although I've seen it on multiple systems. After getting home I spun up a brand new Ubuntu 18.04 instance under Windows Subsystem for Linux version 2 and was able to reproduce it with these steps:

Run sudo apt update

Install the package with sudo apt install libdevel-leak-perl

Create the file regex_test.pl with the contents:
use strict;
use warnings;
use Devel::Leak;

my $string = "TESTING STRING";
my $count = Devel::Leak::NoteSV (my $handle);
print ($string =~ s/\ STRING//);
undef $string;
Devel::Leak::CheckSV ($handle);

Run the test with perl regex_test.pl

Which results in:
new 0x561fa5f971e0 : SV = PV(0x561fa5f97e60) at 0x561fa5f971e0
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x561fa5fc6320 "TESTING STRING"\0
  CUR = 14
  LEN = 16
  COW_REFCNT = 1

I'm not sure why it seems to be a bit flaky to reproduce, perhaps slight library / distribution differences that cause this to manifest. But thanks to everyone that's had a look at this so far! I'll investigate some more myself and if I do find a solution I'll make sure to add it to the end of this post.

Another, Another Update:

After doing some more testing I saw the posts indicating that the CPAN version behaved differently to the APT version. I believe this is the case although they seem to have the same version number?

If I use the CPAN version I can see the behaviour that the others are seeing, where it simply prints:
1 new 000000000072a728 :
To me this indicates that the CPAN module is "broken": it reports a leak, as shown by the "new 000000000072a728 :" line, but does not print the "debug" information associated with the SV that is being kept. If I switch from the CPAN module back to the APT version, I get the full trace of information associated with the SV, as previously noted. Thoughts?

Replies are listed 'Best First'.
Re: Regex replace leaking memory.
by dave_the_m (Monsignor) on Aug 25, 2020 at 08:12 UTC
    When perl does a match, it takes a copy of the original string (and a list of indices of where $1, $& etc matched) so that if $1 or $& etc are used later within the current scope, their value can be constructed on demand (the $1 etc variables are effectively tied). This extra SV is stored along with the regex object. I suspect in this case Devel::Leak is wrong.
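
    To see the retained copy at work, here is a minimal sketch using only core Perl. The variable names $first and $second are just for illustration. The captures are read after the original scalar has been undefined, so their values can only come from what the regex kept:

```perl
use strict;
use warnings;

my $string = "TESTING STRING";

# A successful match with two capture groups.
$string =~ /(\w+) (\w+)/ or die "no match";

# Throw away the original scalar's value.
undef $string;

# $1 and $2 are still readable: they are built on demand from the
# copy of the matched string that perl kept alongside the regex.
my ($first, $second) = ($1, $2);
print "captures after undef: $first / $second\n";
```

    This should print "captures after undef: TESTING / STRING", which fits the idea that the extra SV is something perl holds on to deliberately rather than something it has lost track of.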

    Note that the copy uses copy-on-write, which allows multiple string SVs to share the same string buffer. In your case, the constant string "TESTING STRING" is shared with $string when the latter is assigned to. When the match is done, the string becomes shared for a second time. When the substitution is then done, $string is modified and its string buffer becomes unshared. This leaves the original buffer shared by the constant SV and the SV in the regex. I think the regex's SV is the one being displayed by Devel::Leak.
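
    The same effect can be seen with the substitution from the original post. This is only a sketch in core Perl, and the names $replaced, $before and $matched are mine. After s/// has modified $string, the match variables still describe the string as it was before the change:

```perl
use strict;
use warnings;

my $string = "TESTING STRING";

# s/// returns the number of substitutions made.
my $replaced = ($string =~ s/ STRING//);

# $` (prematch) and $& (match) are reconstructed from the
# pre-substitution copy, not from the modified $string.
my ($before, $matched) = ($`, $&);

print "replaced: $replaced\n";
print "string now: '$string'\n";
print "prematch: '$before', match: '$matched'\n";
```

    Here $string ends up as 'TESTING' while $& is still ' STRING', so the pre-substitution text has to live somewhere other than $string once the replace is done.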

    Over various releases, perl has differed in how it manages the copy kept by the regex; in particular, when the pattern itself had no captures and $& hadn't yet been seen during compilation perl used to overly-optimise by not capturing, and eval '$&' could return garbage.

    Dave.

Re: Regex replace leaking memory.
by swl (Parson) on Aug 25, 2020 at 04:56 UTC

    I cannot reproduce your issue using Strawberry Perl 5.28.0.

    1 new 000000000072a728 :

    You might need to provide more details from perl -v (or perhaps perl -V).

    Update:

    Also cannot reproduce using perl 5.30 using perlbrew under Windows Subsystem for Linux.

    perl -v:

    This is perl 5, version 30, subversion 0 (v5.30.0) built for x86_64-linux
      Edited original message to include distribution and Perl versioning.

        Thanks for the update.

        I also cannot reproduce on Centos 7.8.2003:

        This is perl 5, version 30, subversion 0 (v5.30.0) built for x86_64-linux

        One difference between yours and mine is that yours are all multithreaded, with the failing ones being gnu-thread-multi. I don't know if that has any relevance, but hopefully others can comment on that.

        Update:

        I also tested under an Ubuntu version 18 with the system perl, version 5.26.1 x86_64-linux-gnu-thread-multi, and could not reproduce.

Re: Regex replace leaking memory.
by kcott (Archbishop) on Aug 25, 2020 at 08:37 UTC

    G'day Striker1440,

    Welcome to the Monastery (at least, as a first-time poster — I see you registered four years ago).

    I ran this for you using these:

    This is perl 5, version 30, subversion 0 (v5.30.0) built for cygwin-thread-multi
    This is perl 5, version 32, subversion 0 (v5.32.0) built for cygwin-thread-multi

    I had to install Devel::Leak for both. I used the cpan utility; both have v0.03.

    Here's the output for 5.30 and 5.32, respectively:

    new 0x600003a80 : 1
    new 0x600003b38 : 1

    — Ken

Re: Regex replace leaking memory.
by swl (Parson) on Aug 25, 2020 at 22:26 UTC

    It looks like it is to do with the libdevel-leak-perl package, as installing from CPAN does not have the issues, even with the system perl under Ubuntu. I can reproduce it using the Ubuntu system perl with that package (I also posted that in node 11121070, but it's easy to miss if you are only scanning for direct responses).

Re: Regex replace leaking memory.
by k-mx (Scribe) on Aug 30, 2020 at 12:33 UTC
    I think this is the same issue as discussed in this thread
