
There is some repetition across your regexes that can be factored out. This may be related to the underlying cause.

Each regex starts with the same pattern: \s* ^ \s*. Consuming that leading whitespace once, before running the if conditions, makes things about 250-260% faster under Strawberry Perl 5.32, testing with a file of 500 begfoo sets generated using the code in 11128154. See the code in sub parse_foo2; parse_foo1 is from the OP.

I also converted the bare block with redo into a while loop, mostly for style. The addition of the /aa flag makes a slight difference, which could just be noise.
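The factoring relies on how \G and /gc interact: a successful match moves pos(), and a failed match under /c leaves it where it was, so the whitespace only has to be skipped once per token. A minimal sketch of that behaviour, using a made-up one-line string rather than the OP's data:

```perl
use strict;
use warnings;

# Hypothetical sample text, not taken from the OP's file.
my $text = "   endfoo";

# Skip the leading whitespace once; a successful /g match advances pos().
$text =~ /\G \s* /gcx;
my $after_ws = pos $text;                        # 3, just past the spaces

# A failed match would normally reset pos(); /c keeps it in place.
my $hit_beg    = $text =~ /\G begfoo /gcx ? 1 : 0;
my $after_fail = pos $text;                      # still 3

# The next alternative therefore starts from the same offset.
my $hit_end   = $text =~ /\G endfoo /gcx ? 1 : 0;
my $after_end = pos $text;                       # 9, end of the string

print "after whitespace: $after_ws\n";
print "begfoo matched: $hit_beg, pos still $after_fail\n";
print "endfoo matched: $hit_end, pos now $after_end\n";
```

Without /c, the failed begfoo attempt would reset pos() and the endfoo attempt would start again from offset 0.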

Note that I have not checked if all begfoo sets are parsed correctly...
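One way to check that would be to collect every name from both styles and compare the lists. This is only a sketch under assumptions: make_sample is a hypothetical generator that loosely imitates the test file, and names_prefixed and names_factored are my names for trimmed copies of the two parsing styles that return all names instead of printing the last one.

```perl
use strict;
use warnings;

# Hypothetical generator: $n begfoo/endfoo sets (an assumption, not the real file).
sub make_sample {
    my ($n) = @_;
    return join '', map { "begfoo FOO_$_ (a, b);\n  bar baz;\nendfoo\n" } 1 .. $n;
}

# Original style: every alternative repeats the \G \s* ^ \s* prefix.
sub names_prefixed {
    my ($text) = @_;
    my @names;
    {
        last if $text =~ /\G \s* \Z/gcmsx;
        if    ($text =~ /\G \s* ^ \s* begfoo \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/gcmsx) { push @names, $1 }
        elsif ($text =~ /\G \s* ^ \s* endfoo /gcmsx) { }
        elsif ($text =~ /\G \s* ^ \s* \S+ \s+ .*? \s* ;/gcmsx) { }
        else  { die "ERROR: unknown syntax\n" }
        redo;
    }
    return @names;
}

# Factored style: whitespace is consumed once, then each alternative anchors at \G.
sub names_factored {
    my ($text) = @_;
    my @names;
    while (not $text =~ /\G \s* \Z/gcmsx) {
        $text =~ /\G \s* /gcmsx;
        if    ($text =~ /\G begfoo \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/gcmsx) { push @names, $1 }
        elsif ($text =~ /\G endfoo /gcmsx) { }
        elsif ($text =~ /\G \S+ \s+ .*? \s* ;/gcmsx) { }
        else  { die "ERROR: unknown syntax\n" }
    }
    return @names;
}

my $sample   = make_sample(5);
my @prefixed = names_prefixed($sample);
my @factored = names_factored($sample);

print "prefixed: @prefixed\n";
print "factored: @factored\n";
print "@prefixed" eq "@factored" ? "same names\n" : "names differ\n";
```

Agreement on a sample like this does not prove equivalence. The factored version no longer requires the ^ line-start anchor, so it may accept input that the original rejects, for example two statements on the same line.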

I also don't have a 5.8 install to test with.

use 5.022;
use warnings;

use Benchmark qw {:all};

open my $fh, '<', 'x.txt' or die "Cannot open x.txt: $!";

my $data = do {local $/ = undef; <$fh>};

cmpthese (
    10,
    {
        one => sub {parse_foo1($data)},
        two => sub {parse_foo2($data)},
    }
);


sub parse_foo1 {
    my ($text) = @_;
    my $name;
    {
        last if $text =~ /\G \s* \Z/gcmsx;

        if     ($text =~ /\G \s* ^ \s* begfoo \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/gcmsx) { $name = $1 }
        elsif  ($text =~ /\G \s* ^ \s* endfoo            /gcmsx) { }
        elsif  ($text =~ /\G \s* ^ \s* \S+ \s+  .*? \s* ;/gcmsx) { }
        else { die "ERROR: unknown syntax\n" }

        redo;
    }
    print "LAST FOO1: $name\n";
}

sub parse_foo2 {
    my ($text) = @_;
    my $name;
    while (not $text =~ /\G \s* \Z/gcmsx) {

        $text =~ /\G \s* /gcsmx;  #  march through any white space
        if     ($text =~ /\G begfoo \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/gcmsxaa) { $name = $1 }
        elsif  ($text =~ /\G endfoo            /gcmsx) { }
        elsif  ($text =~ /\G \S+ \s+  .*? \s* ;/gcmsx) { }
        else { die "ERROR: unknown syntax\n" }

    }
    print "LAST FOO2: $name\n";
}

Example results:

v5.32.0
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
      Rate  one  two
one 2.08/s   -- -72%
two 7.53/s 261%   --

In reply to Re: regex gotcha moving from 5.8.8 to 5.30.0? by swl
in thread regex gotcha moving from 5.8.8 to 5.30.0? by mordibity
