PerlMonks  

Perl6 Regex extremely slow

by czipperz (Initiate)
on May 10, 2015 at 21:31 UTC (id://1126261)

czipperz has asked for the wisdom of the Perl Monks concerning the following question:

I am using Perl6 (Rakudo) and am writing a simple regex. It correctly parses, so that is not the problem. It is so slow to parse that I am using the Perl5 m:P5 prefix to be able to parse in a reasonable amount of time. They should both do the same thing.

My Perl6 code is: /^^ (<[#.]>) [\" ([ <-[\\ \"]> | . <-[\"]> | <-[\\]> . ] +) \"   |   (<-[\ ]> +)]/
My Perl5 code is: m:P5/^([#.])(?:\"((?:[^\\\"]|.[^\"]|[\\].)+)\"|([^ ]+))/

They match the same thing (anything inside quotes, including escaped quotes, or else the first word), but the Perl5 version is much faster.
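Below is a minimal sketch of what the P5 pattern above captures. It uses Python's re module as a stand-in because its alternation is ordered, like Perl 5's. For comparison it also shows a tightened variant built on the common quoted-string idiom (?:[^\\"]|\\.)*. That variant, and the helper name capture, are assumptions for illustration. Neither was proposed in this thread.

```python
import re
from typing import Optional

# The question's Perl 5 pattern, transcribed for Python's re (the syntax carries over).
QUESTION = re.compile(r'^([#.])(?:"((?:[^\\"]|.[^"]|[\\].)+)"|([^ ]+))')

# Hypothetical tightened variant: each step is a plain character or a backslash escape.
IDIOM = re.compile(r'^([#.])(?:"((?:[^\\"]|\\.)*)"|([^ ]+))')


def capture(pattern, text: str) -> Optional[str]:
    """Return the quoted contents (group 2), else the bare word (group 3), else None."""
    m = pattern.match(text)
    if m is None:
        return None
    return m.group(2) if m.group(2) is not None else m.group(3)


for text in ['#"a \\"b\\" c"', '.word rest', '#"abc" def "x"']:
    print(repr(text), repr(capture(QUESTION, text)), repr(capture(IDIOM, text)))
```

The first two inputs give the same result under both patterns. On the third input, the question's pattern captures abc" def "x because the .[^"] branch can step over an unescaped quote. The tightened variant stops at abc.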

I am running Perl6 version 2015.03 built on MoarVM version 2015.03. To be sure, I put a print statement before the first test, between the tests, and after them, and the Perl6 version was definitely much slower (by seconds). Why is this?

Edit: I just changed my code to remove the <-[\\ \"]> | and the [^\\\"]| alternatives, and that pretty much removed the difference. That's really weird.
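The Edit's observation has one plausible contributor. This is a hedged illustration, not a diagnosis confirmed in this thread. The branches [^\\"] and .[^"] overlap, so a run of ordinary characters can be divided between them in many ways. When the overall match fails (for example, when there is no closing quote), a backtracking engine may try those divisions before giving up. The sketch below counts the divisions with a small dynamic program. The branch tables and count_splits are hypothetical names, and each branch is modeled as written in the P5 pattern.

```python
from typing import Callable, List, Tuple

# A branch is (width in characters, predicate on that slice of text).
# Simplification: the model ignores that regex . does not match a newline.
Branch = Tuple[int, Callable[[str], bool]]

# Inner group of the question's P5 pattern: (?:[^\\"]|.[^"]|[\\].)
QUESTION_BRANCHES: List[Branch] = [
    (1, lambda t: t not in ('\\', '"')),  # [^\\"]
    (2, lambda t: t[1] != '"'),           # .[^"]
    (2, lambda t: t[0] == '\\'),          # [\\].
]

# The same group after the Edit dropped [^\\"]: (?:.[^"]|[\\].)
REDUCED_BRANCHES: List[Branch] = QUESTION_BRANCHES[1:]

# Hypothetical unambiguous idiom (not from the thread): (?:[^\\"]|\\.)
IDIOM_BRANCHES: List[Branch] = [QUESTION_BRANCHES[0], QUESTION_BRANCHES[2]]


def count_splits(s: str, branches: List[Branch]) -> int:
    """Count the distinct ways a sequence of branches can consume all of s."""
    ways = [0] * (len(s) + 1)
    ways[0] = 1  # the empty prefix is consumed in exactly one way
    for end in range(1, len(s) + 1):
        for width, ok in branches:
            if end >= width and ok(s[end - width:end]):
                ways[end] += ways[end - width]
    return ways[len(s)]


for n in (10, 20, 30):
    run = "a" * n
    print(n,
          count_splits(run, QUESTION_BRANCHES),
          count_splits(run, REDUCED_BRANCHES),
          count_splits(run, IDIOM_BRANCHES))
```

For a run of n plain characters, the question's group admits F(n+1) divisions (Fibonacci numbers: 89 for n = 10, 1346269 for n = 30). The reduced and idiomatic groups admit exactly one, which is consistent with what the Edit reports.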

Re: Perl6 Regex extremely slow
by Anonymous Monk on May 10, 2015 at 22:28 UTC

    Your regexes are not identical:

    First, the P6 version reads <-[\\]> . (a negated character class), whereas your P5 version reads [\\]. without negation.

    Second, in P6 `|` is subject to longest-token matching, whereas you need to use `||` to match alternatives sequentially.
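In Rakudo, 'abc' ~~ /a | ab/ matches ab, because | prefers the longest alternative. By contrast, 'abc' ~~ /a || ab/ matches a, because || tries alternatives in order. The sketch below models the two policies in Python. first_alternative and longest_alternative are hypothetical helpers. Real longest-token matching compares only the declarative prefix of each branch, so treat this as a simplification.

```python
import re
from typing import List, Optional


def first_alternative(alternatives: List[str], text: str) -> Optional[str]:
    """Ordered choice, like Perl 6 `||` (and `|` in Perl 5 or Python's re)."""
    for alt in alternatives:
        m = re.match(alt, text)
        if m is not None:
            return m.group(0)
    return None


def longest_alternative(alternatives: List[str], text: str) -> Optional[str]:
    """Simplified model of Perl 6 `|`: try every branch and keep the longest match.
    In this model ties go to the earlier branch, since max() returns the first maximum."""
    matches = []
    for alt in alternatives:
        m = re.match(alt, text)
        if m is not None:
            matches.append(m.group(0))
    return max(matches, key=len) if matches else None


print(first_alternative(["a", "ab"], "abc"))    # a
print(longest_alternative(["a", "ab"], "abc"))  # ab
```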

Re: Perl6 Regex extremely slow
by moritz (Cardinal) on May 13, 2015 at 13:12 UTC
    It is so slow to parse that I am using the Perl5 m:P5 prefix to be able to parse in a reasonable amount of time.

    Rakudo uses the same regex engine, regardless of whether you use :P5 or not.

    But as mentioned before, the regexes don't do the same thing. For one, why use ^^ (start of line) in one and ^ (start of string) in the other? With ^^, the regex engine has to scan the entire string for newlines; with ^, it knows immediately that there is no match if the first attempt at matching fails.
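Python's re makes the same distinction. A bare ^ anchors only at the start of the string, like Perl 6 ^. With re.MULTILINE it also anchors after each newline, like ^^. The sketch below lists the offsets where each anchor can succeed, which are the candidate starting points the engine has to consider. start_positions is a hypothetical helper.

```python
import re
from typing import List


def start_positions(text: str, multiline: bool) -> List[int]:
    """Offsets at which a leading ^ can match (like Perl 6 ^, or ^^ when multiline)."""
    flags = re.MULTILINE if multiline else 0
    return [m.start() for m in re.finditer(r"^", text, flags)]


text = "plain line\nsecond line\n#target"
print(start_positions(text, multiline=False))  # [0]
print(start_positions(text, multiline=True))   # [0, 11, 23]

# A pattern that can only match on a later line:
print(re.search(r"^#(\w+)", text))                         # None
print(re.search(r"^#(\w+)", text, re.MULTILINE).group(1))  # target
```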

    Also, when talking about "slow", it helps to provide some numbers, both for run time and for the length of the input strings. Rakudo's regex engine is known to be slower than Perl 5's, but without anything concrete to look at, it's impossible for us to say what's wrong.
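Here is a minimal sketch of collecting such numbers. It uses Python's timeit module with Python's re as a stand-in. For Rakudo the shape would be the same: fix a set of input lengths, time each regex on them, and report both. The helper time_pattern and the unterminated-quote inputs are assumptions chosen for illustration. The lengths are kept small because the question's pattern can backtrack heavily on such input.

```python
import re
import timeit
from typing import List, Tuple

QUESTION = re.compile(r'^([#.])(?:"((?:[^\\"]|.[^"]|[\\].)+)"|([^ ]+))')
IDIOM = re.compile(r'^([#.])(?:"((?:[^\\"]|\\.)*)"|([^ ]+))')


def time_pattern(pattern, lengths: List[int], repeat: int = 5) -> List[Tuple[int, float]]:
    """Return (input length, best seconds per match) for unterminated quoted inputs."""
    results = []
    for n in lengths:
        # No closing quote, so the quoted branch must fail before the bare-word branch runs.
        text = '#"' + "a" * n
        best = min(timeit.repeat(lambda: pattern.match(text), number=10, repeat=repeat))
        results.append((n, best / 10))
    return results


for name, pattern in (("question", QUESTION), ("idiom", IDIOM)):
    for n, seconds in time_pattern(pattern, [8, 12, 16]):
        print(f"{name:8} length={n:3} {seconds * 1e6:10.1f} us")
```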

Front-paged by ww