PerlMonks  

Improving performance

by dominic01 (Sexton)
on Mar 14, 2015 at 12:26 UTC ( [id://1120045]=perlquestion )

dominic01 has asked for the wisdom of the Perl Monks concerning the following question:

I have a big Perl script (~7000 lines) with thousands of regular expressions. It works fine, but I am now looking into ways of improving its performance. I use the pattern (?:[.,:;] ?| ) in about 250 places. If I define it once with qr/(?:[.,:;] ?| )/ and use the variable in the regular expressions, I don't see any improvement. I have identified a few more repeated patterns like this one.
I would appreciate any suggestions or pointers in this regard.

Replies are listed 'Best First'.
Re: Improving performance
by Athanasius (Archbishop) on Mar 14, 2015 at 12:46 UTC

    Hello dominic01,

    The Camel Book (4th Edition, 2012) has a section on “Time Efficiency” which contains the following (p. 692):

    Short-circuit alternation is often faster than the corresponding regex. So:

    print if /one-hump/ || /two/;

    is likely to be faster than:

    print if /one-hump|two/;

    at least for certain values of one-hump and two.

    Also, if your regex is more likely to find a space than a punctuation character, test for the space first. Only profiling will show whether these kinds of tweaks make significant improvements, but they’re worth a try.
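    Whether the short-circuit form actually wins can be measured rather than assumed. The following is a minimal sketch using the core Benchmark module; the sample lines and the sub names are made up for illustration, and the timings will vary with your data and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: most lines contain neither word,
# every tenth line contains "two".
my @lines = map { $_ % 10 ? "a plain line number $_" : "line $_ mentions two" } 1 .. 2000;

# One regex with alternation.
sub count_alternation {
    my $n = 0;
    for (@lines) { $n++ if /one-hump|two/ }
    return $n;
}

# Two simpler regexes joined by short-circuit ||.
sub count_short_circuit {
    my $n = 0;
    for (@lines) { $n++ if /one-hump/ || /two/ }
    return $n;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    alternation   => \&count_alternation,
    short_circuit => \&count_short_circuit,
});
```

    Both subs count the same lines, so any difference in the table comes from how the match is written, not from what it matches.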

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Improving performance
by Anonymous Monk on Mar 14, 2015 at 12:54 UTC

    Don't guess at which parts of your code might be slow; measure them, e.g. with Devel::NYTProf.

    Also, using the same regex 250 times sounds like your code might be in need of some refactoring, like pulling any repeated code blocks into subroutines.
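    As a rough sketch of both suggestions: the comments show the usual Devel::NYTProf invocation, and the code pulls the repeated pattern into one named variable and one subroutine. The names $SEPARATOR and split_on_separator are invented here, and split is only an assumed use of the pattern, since the original script is not shown.

```perl
use strict;
use warnings;

# To profile first (standard Devel::NYTProf commands):
#   perl -d:NYTProf your_script.pl   # writes nytprof.out
#   nytprofhtml                      # builds an HTML report in ./nytprof/

# The repeated pattern, compiled once and given a name (hypothetical name).
my $SEPARATOR = qr/(?:[.,:;] ?| )/;

# Hypothetical helper: one place to change if the separator rule changes.
sub split_on_separator {
    my ($text) = @_;
    return split /$SEPARATOR/, $text;
}

my @fields = split_on_separator('one, two; three four.');
print join('|', @fields), "\n";    # prints "one|two|three|four"
```

    This kind of refactoring is mainly about maintainability; it will not by itself make the matching faster.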

Re: Improving performance
by AnomalousMonk (Archbishop) on Mar 14, 2015 at 15:09 UTC
    If I define it in qr/(?:[.,:;] ?| )/ and use the variable in the regular expression ...

    This does not directly address your performance concerns, but when qr//-defined regex objects are interpolated into other regexes, whether qr//, m// or s///, the process is more or less like string interpolation; the new regex does not somehow "call back" to the interpolated regex the way a subroutine call within another subroutine would.

    To get a feel for this process, write and run (with full warnings and strictures) some code like the following, which just prints compiled regexes. (This is untested because I can't provide a working example at the moment.) Note that qr// automatically encapsulates its pattern in a (?:...) non-capturing group that preserves the regex modifier flags (printed as (?^:...) on Perl 5.14 and later), so the non-capturing grouping within qr/(?:[.,:;] ?| )/ is redundant (but does no harm); try it both ways.

    my $foo = qr/[.,:;] ?| /;
    print $foo, "\n";
    my $bar = qr{ $foo+ (?: hic | hac | hoc) }xms;
    print $bar, "\n";
    my $baz = qr{ $bar{42} f[eio]e }xms;
    print $baz, "\n";
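    To try the "both ways" comparison directly, here is a small sketch; the variable names and the sample text are made up. The stringified forms are only printed, not asserted, because their exact shape depends on the Perl version.

```perl
use strict;
use warnings;

my $plain   = qr/[.,:;] ?| /;          # no explicit group
my $grouped = qr/(?:[.,:;] ?| )/;      # explicit, redundant group

# The exact stringified form depends on the Perl version, so it is only printed.
print "plain:   $plain\n";
print "grouped: $grouped\n";

# The + quantifier applies to the whole interpolated pattern in both cases,
# because qr// supplies its own non-capturing wrapper.
my $text = 'alpha, beta;gamma delta';
my @from_plain   = split /$plain+/,   $text;
my @from_grouped = split /$grouped+/, $text;

print "same result\n" if "@from_plain" eq "@from_grouped";
```

    If qr// did not add its own wrapper, the + after $plain would bind only to the final space alternative, and the two splits could differ.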

    Give a man a fish:  <%-(-(-(-<

Re: Improving performance
by hdb (Monsignor) on Mar 14, 2015 at 13:05 UTC

    It also depends on how you use the results. You do not seem to capture the result of your match, so in some circumstances I could imagine that

    /[.,:; ]+/

    might do the job as well (for example when matching the bit between two words in a text, assuming no two punctuation marks or blanks occur in sequence). More context would help us give better advice!
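    The caveat about consecutive punctuation can be seen in a short sketch. The helper name fields and the sample strings are invented for illustration, and split is only one assumed way the pattern might be used.

```perl
use strict;
use warnings;

my $original = qr/(?:[.,:;] ?| )/;   # pattern from the question
my $class    = qr/[.,:; ]+/;         # character-class simplification

# Hypothetical helper: split $text on $re and join the fields for display.
sub fields {
    my ($re, $text) = @_;
    return join '|', split /$re/, $text;
}

for my $text ('one, two; three four', 'one,, two') {
    printf "%-24s original: %-22s class: %s\n",
        "'$text'", fields($original, $text), fields($class, $text);
}
```

    On ordinary text both patterns give the same fields; on 'one,, two' the original pattern yields an empty field between the two commas, while the character class swallows the whole run.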
