PerlMonks  

Re^2: Memory use/leak with large number of (?{}) patterns in regex

by dave_the_m (Monsignor)
on Nov 24, 2019 at 20:00 UTC ( [id://11109155]=note )


in reply to Re: Memory use/leak with large number of (?{}) patterns in regex
in thread Memory use/leak with large number of (?{}) patterns in regex

It's the combination of captures and code blocks. Each time the regex engine is about to execute a code block, it saves the indices of all the captures done so far, so they can be restored at the end. It does this on the pessimistic assumption that code within the block can do anything, including recursively executing the same regex again, overwriting the existing capture indices.

This is why quadratic memory behaviour is being seen: with N captures and N code blocks, each block can end up saving up to N sets of capture indices.
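For illustration, here is a minimal sketch of the kind of pattern that hits this: a run of capture groups, each followed by a code block. The size `$n`, the counter `$blocks_run` and the input string are assumptions made up for the example, not taken from the original report, and the script does not measure memory itself.

```perl
use strict;
use warnings;

our $blocks_run = 0;    # hypothetical counter, bumped by every code block
my $n = 50;             # illustrative size only

# N capture groups, each immediately followed by a code block.
# Single quotes keep $main::blocks_run literal until the regex is compiled.
my $src = '(a)(?{ $main::blocks_run++ })' x $n;

my $re = do {
    use re 'eval';      # needed because the code blocks arrive via interpolation
    qr/\A$src\z/;
};

my $matched  = ( 'a' x $n ) =~ $re;
my $captures = $#+;     # number of capture groups in the last successful match

print "matched=$matched blocks_run=$blocks_run captures=$captures\n";
```

Every one of those blocks runs with a growing set of captures already made, which is the situation described above. Raising `$n` and watching process memory with an external tool would be one way to observe the growth.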

Not ideal, but it can be avoided if you use non-capturing groups, i.e. (?:...) rather than (...).
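A rough sketch of that workaround, assuming the captures were only there for grouping or to identify which piece matched: use `(?:...)` and let each code block record what it needs. The word list, `@seen` and the input string are hypothetical names chosen for the example.

```perl
use strict;
use warnings;

our @seen;                           # filled in by the code blocks
my @words = qw(alpha beta gamma);    # illustrative input

# Non-capturing groups: each block records the index of the piece that
# just matched, so no capture groups are needed at all.
my $src = join '',
    map { "(?:$words[$_])(?{ push \@main::seen, $_ })" } 0 .. $#words;

my $re = do {
    use re 'eval';                   # code blocks are interpolated at runtime
    qr/\A$src\z/;
};

my $ok     = 'alphabetagamma' =~ $re;
my $groups = $#+;                    # 0 when the pattern has no capture groups

print "ok=$ok seen=@seen groups=$groups\n";
```

With no capture groups in the pattern there are no capture indices for the engine to save before each block. Note that a block which pushes onto an array like this will also record entries for paths that are later backtracked over, so this only suits patterns where that is acceptable.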

Dave.


Re^3: Memory use/leak with large number of (?{}) patterns in regex
by jcb (Parson) on Nov 25, 2019 at 01:07 UTC

    Could saving the capture indices be lazily done, with some kind of "regex in use" flag set on the regex, such that recursively executing the same regex causes the capture indices to be preserved, but only if really needed?

    This would slightly add to the general regex overhead, from needing to check the "regex in use" flag on every pattern match, but perhaps that could be folded into the existing logic that handles compiling patterns when needed?

      There's a lot that *could* be done. It's a complete mess at the moment and needs an overhaul - it affects lots of things, not just this issue, e.g. unnecessary slowness using a regex object for a match compared with a literal pattern. Really, the capture state needs splitting off into a separate data structure from the main regex data structure, so that it can be swapped in and out easily, and so a qr// object can be used in multiple places without internally having to clone the whole thing each time.

      It's on my very long list of things to do, but I'm not likely to do it any time soon.

      Dave.
