
Re: regex gotcha moving from 5.8.8 to 5.30.0?

by choroba (Archbishop)
on Feb 09, 2021 at 20:22 UTC

in reply to regex gotcha moving from 5.8.8 to 5.30.0?

Can you please provide a sample input? I generated one using
my $str = ""; $str .= "\nbegfoo bar ( xyz ) ;\nendfoo\nqux 123 ;" while 100E+06 > length $str;

but the whole script including the data creation takes only 5 seconds.
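A minimal timing sketch for comparing the two perl binaries on this kind of generated input. The pattern in `$re` is hypothetical (the real one is in the original question), and the target size is reduced from 100E+06 to 1e6 characters so the sketch finishes quickly; both are assumptions, not part of the script under discussion.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Build the test string as above, but with a smaller target size
# (assumption: 1e6 characters instead of 100E+06) to keep the run short.
my $target = 1e6;
my $str = "";
$str .= "\nbegfoo bar ( xyz ) ;\nendfoo\nqux 123 ;" while $target > length $str;

# Hypothetical pattern; substitute the real one from the script under test.
my $re = qr/^begfoo\s+(\w+)\s*\(.*?\)\s*;/m;

# Time only the matching, not the data creation.
my $start   = time;
my $count   = () = $str =~ /$re/g;   # count all matches
my $elapsed = time - $start;

printf "perl %vd: %d matches in %.3f s\n", $^V, $count, $elapsed;
```

Running the same file under each interpreter gives directly comparable numbers, since the version is printed alongside the elapsed time.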

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

Re^2: regex gotcha moving from 5.8.8 to 5.30.0?
by mordibity (Acolyte) on Feb 09, 2021 at 21:17 UTC

    Hmm, that's interesting, there is some data-dependency! My first attempt to make some fake data, like yours, didn't lead to any performance difference between 5.8.8 and 5.30.0. So I made the data-faker a little smarter (in particular, multi-line begfoo declarations) and was able to get a delta to show up:

    my $num = shift or die "num?\n";
    for my $i (0 .. $num) {
        my @in = map { "input$_" } (;
        my @out = map { "output$_" } (;
        print "begfoo FOO_$i (\n", join(",\n", @in, @out), ");\n";
        print " input $_;\n" foreach @in;
        print " output $_;\n" foreach @out;
        print " foo inst$_ (j, k, l, m, n, o, p);\n" foreach 0 .. int(rand(100));
        print "endfoo\n\n";
    }

    I generated some dummy output with 50000 definitions: " 50000 >", giving a file about 263Mb and that was large/real enough to show a definite 2x difference:

    • 5.8.8 : 0.01s user 0.02s system 0% cpu 14.474 total
    • 5.30.0 : 0.01s user 0.02s system 0% cpu 37.312 total
      I usually use re 'debug'; when debugging regular expressions, but I'm not sure it's helpful in this case.
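For reference, a small sketch of what `use re 'debug'` looks like in practice. The input line and the pattern are made up for illustration. The trace goes to STDERR and grows with the input, which is one reason it may be of limited use on a file of several hundred megabytes.

```perl
use strict;
use warnings;

# Illustrative input in the shape used in this thread.
my $line = "begfoo bar ( xyz ) ;";

{
    # In recent perls the pragma is lexically scoped, so only regexes
    # compiled inside this block are traced (compile and match steps
    # are printed to STDERR).
    use re 'debug';
    print "matched\n" if $line =~ /begfoo(.*?);/;
}
```

Trying it on a single representative line, rather than on the whole file, keeps the trace readable.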

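      The usual suspects are .* and .*?: the greedy form starts by matching as much as it can and then backtracks to match less, while the lazy form extends one character at a time and retries the rest of the pattern at each step. Can't you replace them with [^;]* or similar?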
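A sketch of the suggested replacement, using a hypothetical pattern since the real one is in the original question. One difference to keep in mind: without /s, `.` stops at a newline, whereas `[^;]` happily crosses newlines, so the two forms are only interchangeable when the text up to the next `;` has the expected shape.

```perl
use strict;
use warnings;

# Sample text in the shape used earlier in the thread.
my $str = "\nbegfoo bar ( xyz ) ;\nendfoo\nqux 123 ;" x 3;

# Hypothetical pattern: capture everything between "begfoo" and the next ";".
my @lazy = $str =~ /begfoo(.*?);/g;

# Same idea with a negated character class: the capture cannot run past
# a ";", so the engine has nothing to reconsider inside it.
my @negated = $str =~ /begfoo([^;]*);/g;

print scalar(@lazy), " lazy matches, ", scalar(@negated), " negated-class matches\n";
print "identical captures\n" if "@lazy" eq "@negated";
```

Checking that both forms return the same captures on real data is a cheap sanity test before comparing their run times.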


        Thx for the idea; I just tried it (using [^;]*? instead of .*? in the two locations) but it didn't really change the (crude) runtimes -- 5.8.8 is over twice as fast as 5.30.0 (~14sec vs ~37sec) on the fake data, and 10x as fast (~6sec vs 105sec) on the real data...
