http://qs321.pair.com?node_id=11128173


in reply to regex gotcha moving from 5.8.8 to 5.30.0?

There is some repetition across your regexes that can be factored out. This may be related to the underlying cause.

Each regex starts with the same pattern: \s* ^ \s*. Consuming that leading white space once, before the if conditions run, makes the parse about 250-260% faster under Strawberry Perl 5.32. I tested with a file of 500 begfoo sets generated using the code in 11128154. The change is in sub parse_foo2; parse_foo1 is from the OP.
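As a stripped-down illustration of the same idea, here is a minimal sketch of a \G tokenizer that skips white space once per token and then tries each alternative from the current position. The sub name tokens and the num/word token types are made up for this example and are not part of the OP's parser.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: skip white space once, then try each
# alternative at \G. With /gc, a failed match leaves pos() untouched,
# so the next alternative starts from the same offset.
sub tokens {
    my ($text) = @_;
    my @out;
    until ($text =~ /\G \s* \z/gcx) {
        $text =~ /\G \s* /gcx;    # shared prefix, matched once per token
        if    ($text =~ /\G (\d+) /gcx) { push @out, "num:$1" }
        elsif ($text =~ /\G (\w+) /gcx) { push @out, "word:$1" }
        else  { die "unknown syntax at offset " . pos($text) . "\n" }
    }
    return @out;
}

print join(' ', tokens("  foo 42\n bar ")), "\n";
# prints: word:foo num:42 word:bar
```

The alternatives no longer carry their own \s* prefix, so a failing alternative does not rescan (and backtrack over) the same run of white space before the next one is tried.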

I also replaced the bare block and redo with a while loop, mostly for style. Adding the /aa flag makes a slight difference, which could just be noise.
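For reference, /a and /aa (available from Perl 5.14) restrict \s, \d, \w and the POSIX classes to ASCII. The small sketch below only shows that semantic difference; it says nothing about speed, and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $em_space = "\x{2003}";    # EM SPACE, a non-ASCII white space character

# Without /a or /aa, \s follows Unicode rules for this character.
my $unicode_match = $em_space =~ /\A\s\z/   ? 1 : 0;
# With /aa, \s only matches ASCII white space.
my $ascii_match   = $em_space =~ /\A\s\z/aa ? 1 : 0;

print "default: $unicode_match, /aa: $ascii_match\n";
# prints: default: 1, /aa: 0
```

So adding /aa can change what matches if the input contains non-ASCII white space; for pure ASCII input the results should be the same.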

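Note that I have not checked whether all begfoo sets are parsed correctly. In particular, parse_foo2 drops the ^ anchor, so it no longer requires each statement to start at the beginning of a line.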
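One way to check this would be to collect every captured name, not only the last one, and compare the two regex styles. The following is a rough sketch under assumptions: the sample input is invented (the real format generated in 11128154 may differ), and collect_names with its $style switch is a hypothetical helper, not code from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample input; the real format may differ.
my $sample = <<'END';
begfoo FOO_1 (a, b);
  item one;
endfoo

begfoo FOO_2 ( c );
endfoo
END

# Style 1 repeats the OP's prefix (\s* ^ \s*) in every alternative.
# Style 2 skips white space once per statement and uses an empty prefix.
sub collect_names {
    my ($text, $style) = @_;
    my $prefix = $style == 1 ? qr/\s* ^ \s*/mx : qr/(?:)/;
    my @names;
    until ($text =~ /\G \s* \Z/gcmsx) {
        $text =~ /\G \s* /gcx if $style == 2;
        if ($text =~ /\G $prefix begfoo \s+ (\S+?) \s* \( \s* .*? \s* \) \s* ;/gcmsx) {
            push @names, $1;
        }
        elsif ($text =~ /\G $prefix endfoo /gcmsx) { }
        elsif ($text =~ /\G $prefix \S+ \s+ .*? \s* ;/gcmsx) { }
        else { die "ERROR: unknown syntax\n" }
    }
    return @names;
}

my @v1 = collect_names($sample, 1);
my @v2 = collect_names($sample, 2);
my $verdict = "@v1" eq "@v2" ? "same names" : "MISMATCH";
print "style 1: @v1\nstyle 2: @v2\n$verdict\n";
```

Agreement on a small sample does not prove the two styles are equivalent, but running the same comparison over the full 500-set file would catch most differences.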

I also don't have a version 5.8 to work with.

    use 5.022;
    use warnings;
    use Benchmark qw {:all};

    # Slurp the whole test file into one string.
    open my $fh, '<', 'x.txt' or die "Cannot open x.txt: $!";
    my $data = do { local $/ = undef; <$fh> };

    cmpthese(
        10,
        {
            one => sub { parse_foo1($data) },
            two => sub { parse_foo2($data) },
        }
    );

    # From the OP: every alternative re-matches the \s* ^ \s* prefix.
    sub parse_foo1 {
        my ($text) = @_;
        my $name;
        {
            last if $text =~ /\G \s* \Z/gcmsx;
            if ($text =~ /\G \s* ^ \s* begfoo \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/gcmsx) {
                $name = $1;
            }
            elsif ($text =~ /\G \s* ^ \s* endfoo /gcmsx) {
            }
            elsif ($text =~ /\G \s* ^ \s* \S+ \s+ .*? \s* ;/gcmsx) {
            }
            else {
                die "ERROR: unknown syntax\n";
            }
            redo;
        }
        print "LAST FOO1: $name\n";
    }

    # Variant: consume leading white space once, then try each alternative.
    sub parse_foo2 {
        my ($text) = @_;
        my $name;
        while (not $text =~ /\G \s* \Z/gcmsx) {
            $text =~ /\G \s* /gcsmx;    # march through any white space
            if ($text =~ /\G begfoo \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/gcmsxaa) {
                $name = $1;
            }
            elsif ($text =~ /\G endfoo /gcmsx) {
            }
            elsif ($text =~ /\G \S+ \s+ .*? \s* ;/gcmsx) {
            }
            else {
                die "ERROR: unknown syntax\n";
            }
        }
        print "LAST FOO2: $name\n";
    }

Example results:

    v5.32.0
    LAST FOO1: FOO_500
    LAST FOO1: FOO_500
    LAST FOO1: FOO_500
    LAST FOO1: FOO_500
    LAST FOO1: FOO_500
    LAST FOO1: FOO_500
    LAST FOO1: FOO_500
    LAST FOO1: FOO_500
    LAST FOO1: FOO_500
    LAST FOO1: FOO_500
    LAST FOO2: FOO_500
    LAST FOO2: FOO_500
    LAST FOO2: FOO_500
    LAST FOO2: FOO_500
    LAST FOO2: FOO_500
    LAST FOO2: FOO_500
    LAST FOO2: FOO_500
    LAST FOO2: FOO_500
    LAST FOO2: FOO_500
    LAST FOO2: FOO_500
          Rate  one  two
    one 2.08/s   -- -72%
    two 7.53/s 261%   --
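To read the table: each cell is the percentage by which the row's rate differs from the column's rate. The short sketch below redoes that arithmetic from the rounded rates shown above; pct_diff is a made-up helper name. Because the inputs are already rounded, the result comes out as 262% rather than the 261% that Benchmark printed, presumably from the unrounded rates.

```perl
use strict;
use warnings;

# Rates copied from the table above (iterations per second, already rounded).
my %rate = (one => 2.08, two => 7.53);

# Percentage by which rate $x differs from rate $y.
sub pct_diff {
    my ($x, $y) = @_;
    return 100 * ($x / $y - 1);
}

printf "two vs one: %.0f%%\n", pct_diff($rate{two}, $rate{one});
printf "one vs two: %.0f%%\n", pct_diff($rate{one}, $rate{two});
printf "speed-up factor: %.2fx\n", $rate{two} / $rate{one};
# prints:
# two vs one: 262%
# one vs two: -72%
# speed-up factor: 3.62x
```

In other words, "261% faster" here means parse_foo2 runs roughly 3.6 times as many iterations per second as parse_foo1.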