http://qs321.pair.com?node_id=957042

MaphsterB has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I've recently written some parser code and today discovered what appears to be a memory leak somewhere in it. While debugging and stripping down the code, I managed to construct a toy example that illustrates the problem. Here's the code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

open(my $fh, '<', 'EXAMPLE.TXT') or die "Cannot open EXAMPLE.TXT: $!";
my $regexp = qr/(?<value1>\d+)\s+(?<value2>\d+)/;
while (<$fh>) {
    next unless /$regexp/;
    my $value1 = $+{value1};
    my $value2 = $+{value2};
    print "GOT $value1 $value2\n";
}
```

The code simply uses two named capture buffers in a regexp to parse out numeric values.

'EXAMPLE.TXT' is just a text file consisting of a pair of numbers on each line. I used

1 2
3 4
5 6
7 8
...

And so on, for about 100,000 lines, though it doesn't really have to be that long.
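To reproduce the problem, here is a minimal sketch that writes a file of that shape. The consecutive-integer values and the 100,000-line count simply follow the sample above; any pairs of numbers should do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Writes EXAMPLE.TXT as pairs of consecutive integers:
# "1 2", "3 4", "5 6", ... (100,000 lines).
my $lines = 100_000;
open(my $out, '>', 'EXAMPLE.TXT') or die "Cannot write EXAMPLE.TXT: $!";
for my $i (0 .. $lines - 1) {
    printf {$out} "%d %d\n", 2 * $i + 1, 2 * $i + 2;
}
close($out) or die "Cannot close EXAMPLE.TXT: $!";
```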

I'm working in ActiveState Perl v5.10.0 on WinXP x86, and using Task Manager to observe how much memory perl uses as it parses the file. Usage increases steadily until the script finishes. For this toy example it's not much of an issue, but in my actual project it gets out of hand rather quickly.

I've noticed that switching over to $1 and $2 rather than $+{value1} and $+{value2} eliminates the problem, but I prefer using the named capture buffers for clarity as things get big & hairy.

My question is: why? I was assuming that the my-scoped variables within the loop would go out of scope on each iteration and free up any references to %+'s elements. I'm aware that %+ is a tied hash, but I'm not familiar enough with the details of tied hashes to figure out what's going wrong.

Thanks
-Maph

Re: Memory Leaks and %+
by JavaFan (Canon) on Feb 29, 2012 at 22:11 UTC
    IIRC, that's a known bug in 5.10, and fixed in 5.12.

    Can you test whether your program is still leaking memory in 5.14 or in 5.15.8?
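To check a given perl for the leak, one rough approach (a sketch, not a rigorous leak test) is to run the same %+ access pattern many times and compare the process's resident memory before and after. It assumes Linux, where /proc/self/status reports a VmRSS line; on Windows you would still watch Task Manager as the OP did. rss_kb is a hypothetical helper, and the generated input stands in for EXAMPLE.TXT.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: Linux, where /proc/self/status has a "VmRSS: <n> kB" line.
# Returns the resident set size in kB, or undef if it can't be read.
sub rss_kb {
    open(my $fh, '<', '/proc/self/status') or return undef;
    while (my $l = <$fh>) {
        return $1 if $l =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return undef;
}

my $regexp = qr/(?<value1>\d+)\s+(?<value2>\d+)/;
my $before = rss_kb();

# Same access pattern as the toy example, on generated input.
for my $i (1 .. 200_000) {
    my $line = "$i " . ($i + 1);
    next unless $line =~ $regexp;
    my $value1 = $+{value1};
    my $value2 = $+{value2};
}

my $after = rss_kb();
printf "perl %vd: RSS before %s kB, after %s kB\n",
    $^V, $before // 'n/a', $after // 'n/a';
```

If the leak is present, the "after" figure would be expected to grow with the iteration count; on an unaffected perl it should stay roughly flat.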

      This memory leak (I assume it's the same %+ leak, unless there were two different leaks associated with %+) affected me while writing Date::Manip, so I've been tracking it fairly closely. The leak exists in:
         5.10.1
         5.12.4
         5.14.2
         5.15.5
      
      The bug has been fixed in:
         5.15.6
      
      So unfortunately, to my knowledge, there is currently no released stable version that does not suffer from the leak, and unless a maintenance release of 5.14 picks up the fix, it won't be fixed in a stable release until 5.16.
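If the same script has to run on both affected and fixed perls, one option is to choose the extraction strategy at startup. This is a minimal sketch, assuming the 5.15.6 boundary reported above is accurate; the constant NAMED_CAPTURE_MAY_LEAK and the sub extract_pair are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: per the report above, the %+ leak is fixed as of 5.15.6.
# $] is the numeric perl version, e.g. 5.010000 for 5.10.0.
use constant NAMED_CAPTURE_MAY_LEAK => $] < 5.015006;

my $regexp = qr/(?<value1>\d+)\s+(?<value2>\d+)/;

# Hypothetical helper: returns (value1, value2), or an empty list on no match.
sub extract_pair {
    my ($line) = @_;
    return unless $line =~ $regexp;
    if (NAMED_CAPTURE_MAY_LEAK) {
        # Possibly affected perl: avoid %+ and slice the string using the
        # match offsets in @- (group starts) and @+ (group ends).
        return map { substr($line, $-[$_], $+[$_] - $-[$_]) } 1, 2;
    }
    # Fixed perl: the readable named-capture form.
    return ($+{value1}, $+{value2});
}
```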
Re: Memory Leaks and %+
by onelesd (Pilgrim) on Feb 29, 2012 at 23:15 UTC
    Perl 5.10 was released in 2008 - try a more recent release before you rip all of your hair out.
Re: Memory Leaks and %+
by MaphsterB (Initiate) on Mar 01, 2012 at 13:36 UTC

    Dang. I was hoping it was something wrong with my code, and therefore fixable. Unfortunately, for the purposes of my particular script I'm stuck with 5.10: network distribution, company policy and all that.

    I will try it out on a newer version when I get the chance and report back, though.

    In the meantime, I've found a quick 'n' dirty workaround in case anyone else runs into this archaic headache: switch to @- and @+ instead of the more convenient %+:
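For reference, $-[n] holds the offset where capture group n started and $+[n] the offset just past where it ended, so substr($str, $-[n], $+[n] - $-[n]) yields the same text as $n. Named groups are also numbered left to right, which is why indices 1 and 2 work with the named pattern. A small check of that equivalence; the sample string is arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $regexp = qr/(?<value1>\d+)\s+(?<value2>\d+)/;
my $str    = "id: 42   17";

$str =~ $regexp or die "no match";

# @- holds start offsets, @+ end offsets; index 0 is the whole match.
my @via_offsets = map { substr($str, $-[$_], $+[$_] - $-[$_]) } 1, 2;
my @via_numbers = ($1, $2);

print "offsets: @via_offsets\n";   # offsets: 42 17
print "numbers: @via_numbers\n";   # numbers: 42 17
```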

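    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    open(my $fh, '<', 'EXAMPLE.TXT') or die "Cannot open EXAMPLE.TXT: $!";
    my $regexp = qr/(?<value1>\d+)\s+(?<value2>\d+)/;
    while (<$fh>) {
        next unless /$regexp/;
        # $-[n] and $+[n] are the start and end offsets of capture group n;
        # the named groups are also numbered 1 and 2, left to right.
        my $value1 = substr($_, $-[1], $+[1] - $-[1]);
        my $value2 = substr($_, $-[2], $+[2] - $-[2]);
        print "GOT $value1 $value2\n";
    }
    ```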

    And thanks for the responses!

    -Maph

      Isn't it easier to just switch over to $1 and $2 at this point? Your new substr workaround isn't very readable (to me anyway), so it appears that you've lost the one thing you were hoping to gain by using %+.

      ```perl
      my $regexp = qr/(\d+)\s+(\d+)/;
      while (<$fh>) {
          next unless /$regexp/;
          my $value1 = $1;
          my $value2 = $2;
          print "GOT $value1 $value2\n";
      }
      ```

        I completely agree that yours is more readable than the OP's alternative version. But even more readable is:

        ```perl
        my $regexp = qr/(\d+)\s+(\d+)/;
        while (<$fh>) {
            my ($value1, $value2) = /$regexp/ or next;
            print "GOT $value1 $value2\n";
        }
        ```
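The "or next" idiom works because a list assignment in scalar context evaluates to the number of elements on the right-hand side: 0 when the match fails, 2 when it succeeds. If descriptive names are still wanted without touching %+, the same idiom can fill a hash slice. This is a sketch, not something proposed in the thread; parse_pair is a hypothetical helper and the sample lines are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $regexp = qr/(\d+)\s+(\d+)/;

# Hypothetical helper: returns a hashref with named fields, or undef
# (in scalar context) when the line does not match. %+ is never used.
sub parse_pair {
    my ($line) = @_;
    my %rec;
    # List assignment in scalar context yields the count of RHS elements,
    # so a failed match (empty list) is false.
    (@rec{qw(value1 value2)} = $line =~ $regexp) or return;
    return \%rec;
}

my @parsed;
for my $line ("1 2\n", "not a pair\n", "3 4\n") {
    my $rec = parse_pair($line) or next;
    push @parsed, "$rec->{value1} $rec->{value2}";
    print "GOT $rec->{value1} $rec->{value2}\n";
}
```

The loop prints "GOT 1 2" and "GOT 3 4" and skips the non-matching line, while the field names stay visible at the point of use.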
