PerlMonks  

Re: An optimization of last resort: eliminate capturing from your regexps

by demerphq (Chancellor)
on Jul 11, 2006 at 08:43 UTC ( [id://560360]=note )


in reply to An optimization of last resort: eliminate capturing from your regexps

I wonder... A switch could be added that would tell perl that you commit to not modifying the string before using $1. Then the memcpy would be unnecessary. That way you could say:

$str =~ /Hulk hate (\w+)/k;

And get the same effect. Of course the demons that would fly out of your nose if you said:

if ($str =~ /Hulk hate (\w+)/k) {
    $str = 'demons!';
    print $1;    # could and probably would throw a fatal error
}

Would be your problem...

BTW, if it isn't clear, this is the reason the memcpy is needed: $1 must still return the originally matched text after $str has been modified.
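
To make that concrete, here is a minimal sketch of the guarantee the copy exists to provide in stock perl. No hypothetical /k flag is involved, and the sub name capture_after_clobber is made up for illustration only.

```perl
use strict;
use warnings;

# Returns what $1 holds after the matched string has been overwritten.
# In stock perl the capture survives: the match variables reflect the text
# as it was at match time, not the current contents of $str.
sub capture_after_clobber {
    my ($str) = @_;
    if ($str =~ /Hulk hate (\w+)/) {
        $str = 'demons!';    # modify the string the regex was applied to
        return $1;           # still the text matched earlier
    }
    return;
}

print capture_after_clobber('Hulk hate puny humans'), "\n";    # prints "puny"
```

Under the proposed /k semantics, the same code would be the nasal-demons case, since nothing would preserve the matched text once $str was overwritten.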

Alternatively, perhaps magic could be introduced to $str so that the memcpy would only happen if $str was modified while the match vars pointed at it...

A last possibility would be to skip the memcpy if the string was read-only (RO). Then you could mark the string read-only by hand, do the match, use $1, and then, when and if you needed to modify the string, undo the RO flag.
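
A rough sketch of what that workflow might look like from the user's side. Internals::SvREADONLY is used here purely as one way to flip the read-only flag by hand; that choice is my assumption, and stock perl does not skip any copy because the flag is set.

```perl
use strict;
use warnings;

my $str = 'Hulk hate puny humans';

# Assumption: mark the string read-only by hand before matching.
Internals::SvREADONLY($str, 1);

my $hated;
$hated = $1 if $str =~ /Hulk hate (\w+)/;

# While the flag is set, assignment dies instead of changing the string.
my $ok    = eval { $str = 'demons!'; 1 };
my $error = $ok ? '' : $@;

# Undo the flag once the string really does need to change.
Internals::SvREADONLY($str, 0);
$str = 'demons!';

print "$hated\n";
```

The point of the idea is that an accidental write between the match and the use of $1 fails loudly rather than silently invalidating the capture.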

I guess another variant could be that /k would result in no copy, with the understanding that accessing $1 et al. would be a fatal error while that regex was the last one used. However, the @- and @+ arrays would still be populated. It's just that the user would be expected to do the substring operations by hand.
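
For illustration, the by-hand substring step could look something like this. It uses an ordinary capturing match, since /k does not exist, so it shows only the extraction half of the idea and does not avoid any copying in stock perl. The sub name hated_thing is hypothetical.

```perl
use strict;
use warnings;

# Pull capture group 1 out of $str using the offset arrays instead of $1.
sub hated_thing {
    my ($str) = @_;
    return unless $str =~ /Hulk hate (\w+)/;

    # $-[1] is the start offset of group 1; $+[1] is the offset just past its end.
    return substr($str, $-[1], $+[1] - $-[1]);
}

print hated_thing('Hulk hate puny humans'), "\n";    # prints "puny"
```

The offsets are only meaningful against the string as it was when the match ran, so the same "don't modify $str first" commitment applies.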

---
$world=~s/war/peace/g

Replies are listed 'Best First'.
Re^2: An optimization of last resort: eliminate capturing from your regexps
by tilly (Archbishop) on Jul 12, 2006 at 01:09 UTC
    Making that a flag on Perl is a horrible idea. Quick, I want to speed up my program, can I use that flag? I dunno, do you want to go through all of the modules you import and didn't write?

    Make that a flag on the regular expression instead. Then people can turn it on when they find a critical part of their code to optimize. And don't have to worry about looking through modules written by other people.

    As tye pointed out, I misread. Sorry.

      Am I misunderstanding you or did you just not notice the second paragraph:

      $str =~ /Hulk hate (\w+)/k;

      Note that demerphq has added /k to his regex. I don't see demerphq proposing any per-script or per-invocation flag that you seem to have imagined.

      - tye        

Re^2: An optimization of last resort: eliminate capturing from your regexps
by Aristotle (Chancellor) on Jul 24, 2006 at 22:04 UTC

    If a pattern doesn’t contain (?{}) or (??{}) bits, then $str cannot change during a match. So in that case it would be feasible to postpone the memcpy until right after the match (before the regex engine returns) and memcpy only the matched bits. That way, all regexen which don’t run Perl code would automatically avoid unnecessary copying. I think that would be a worthwhile patch.

    Makeshifts last the longest.

      I think that's what it does already.

      ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

        That would contradict tye’s explanation:

        Capturing in a regex imparts a performance hit because it means that a copy will be made of the string that the regex is being applied to (which makes it a worse performance hit when matching against really large strings – one of the worst cases being running a lot of little regexes with capturing against the same huge string, something a parser is likely to do).

        I don’t see how what tye said would apply if perl already behaved the way I said it should.

        Makeshifts last the longest.
