There is some repetition across your regexes that can be factored out. This may relate to the underlying cause.
Each regex starts with the same pattern: \s* ^ \s*. Consuming that leading white space once, before running the if conditions, makes things about 250-260% faster under Strawberry perl 5.32, testing with a file of 500 begfoo sets generated using the code in 11128154. See the code in sub parse_foo2 below; parse_foo1 is from the OP.
I also converted the condition to run in a while loop, mostly for style. The addition of the /aa flag makes a slight difference which could just be noise.
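Note that I have not checked if all begfoo sets are parsed correctly. Also, parse_foo2 only consumes the white space and drops the ^ anchor, so it is slightly more permissive than the original about where a statement may start.
I also don't have perl 5.8 to test with.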
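To illustrate why factoring out the leading white space can help: with /gc, a failed match leaves pos() untouched, so the white space only needs to be consumed once and each alternative can then be tried from the same spot. A minimal sketch, using a made-up two-line string (not the OP's data) and illustrative variable names:

```perl
use strict;
use warnings;

# Made-up input for illustration only: white space, a newline, then endfoo
my $text = "   \n   endfoo";

# Consume the leading white space once; /g records the new pos()
$text =~ /\G \s* /gcx;
my $after_ws = pos $text;

# This alternative fails, but /c keeps pos() where it was
my $hit_beg  = $text =~ /\G begfoo /gcx ? 1 : 0;
my $same_pos = pos($text) == $after_ws ? 1 : 0;

# The next alternative starts from the same spot, with no white space to re-scan
my $hit_end  = $text =~ /\G endfoo /gcx ? 1 : 0;

print "after_ws=$after_ws hit_beg=$hit_beg same_pos=$same_pos hit_end=$hit_end\n";
```

Without /c, the failed begfoo match would reset pos() and the endfoo test would start again from the beginning of the string.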
use 5.022;
use warnings;
use Benchmark qw {:all};

# slurp the whole test file into one string
open my $fh, '<', 'x.txt' or die "Cannot open x.txt: $!";
my $data = do {local $/ = undef; <$fh>};

cmpthese (
    10,
    {
        one => sub {parse_foo1($data)},
        two => sub {parse_foo2($data)},
    }
);

# the OP's version: every regex starts with \s* ^ \s*
sub parse_foo1 {
    my ($text) = @_;
    my $name;
    {
        last if $text =~ /\G \s* \Z/gcmsx;
        if ($text =~ /\G \s* ^ \s* begfoo \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/gcmsx) { $name = $1 }
        elsif ($text =~ /\G \s* ^ \s* endfoo /gcmsx) { }
        elsif ($text =~ /\G \s* ^ \s* \S+ \s+ .*? \s* ;/gcmsx) { }
        else { die "ERROR: unknown syntax\n" }
        redo;
    }
    print "LAST FOO1: $name\n";
}

# leading white space is consumed once per iteration, before the alternatives
sub parse_foo2 {
    my ($text) = @_;
    my $name;
    while (not $text =~ /\G \s* \Z/gcmsx) {
        $text =~ /\G \s* /gcsmx;  #  march through any white space
        if ($text =~ /\G begfoo \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/gcmsxaa) { $name = $1 }
        elsif ($text =~ /\G endfoo /gcmsx) { }
        elsif ($text =~ /\G \S+ \s+ .*? \s* ;/gcmsx) { }
        else { die "ERROR: unknown syntax\n" }
    }
    print "LAST FOO2: $name\n";
}
Example results:
v5.32.0
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
      Rate  one  two
one 2.08/s   -- -72%
two 7.53/s 261%   --
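Since the parsing itself was not verified above, one way to check would be to collect every captured name rather than just the last one and compare against what the input is known to contain. A rough sketch, assuming a hypothetical foo_names sub (same walk as parse_foo2) and made-up sample input shaped only by what the regexes accept, not the actual generator from 11128154:

```perl
use strict;
use warnings;

# Hypothetical sample input: three begfoo sets with one statement each
my $sample = join '', map {
    "begfoo FOO_$_ ( a, b ) ;\n  bar baz ;\nendfoo\n"
} 1 .. 3;

# Same walk as parse_foo2, but returns every captured name
sub foo_names {
    my ($text) = @_;
    my @names;
    while (not $text =~ /\G \s* \Z/gcmsx) {
        $text =~ /\G \s* /gcmsx;    # skip leading white space once
        if ($text =~ /\G begfoo \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/gcmsx) {
            push @names, $1;
        }
        elsif ($text =~ /\G endfoo /gcmsx) { }
        elsif ($text =~ /\G \S+ \s+ .*? \s* ;/gcmsx) { }
        else { die "ERROR: unknown syntax\n" }
    }
    return @names;
}

my @names    = foo_names($sample);
my @expected = map { "FOO_$_" } 1 .. 3;

# Compare the full list of names, not only the last one
print "@names" eq "@expected" ? "all names match\n" : "MISMATCH: @names\n";
```

Running the same comparison with the OP's regexes in place of the ones in foo_names would show whether the two versions agree on a given file.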