Regexp::Assemble hangs with a certain case

kimmel has asked for the wisdom of the Perl Monks concerning the following question:

Okay, I am trying to figure out why 'one' works but 'two' just hangs the script. I read perlre and perlretut again to see if the answer would jump out at me, and it didn't.

Edit: I forgot to paste part of the code below. See my follow-up comment for the correct full program.

#!/usr/bin/perl

use v5.16;
use warnings;
use autodie qw( :all );
use utf8::all;
use File::Slurp qw( read_file );
use Regexp::Assemble;
use Benchmark qw( cmpthese :hireswallclock );

my %seen;
my %seen2;

my $fname   = 'dracula.txt';
my $content = read_file($fname);
$content =~ tr/!"#$%&'()*+,\-.\/:;<=>?@\[\\]^_`{|}~/ /;

my @patterns = read_file('sample_patterns');
chomp @patterns;
my $regex = join '|', map {quotemeta} @patterns;
$regex = qr/\b($regex)\b/ixms;


cmpthese(
    -5,
    {   
        'one' => sub {
            $seen{$1}++ while $content =~ /$regex/g;
        },
        'two' => sub {
            $seen2{$1}++ while $content =~ /$regex/;
        },
    }
);

The source text is Bram Stoker's Dracula, an 836KB file with 16,248 lines. The sample_patterns file contains 4,000 patterns, one per line. The only difference between 'one' and 'two' is the g modifier on the regexp.

Re: Regexp::Assemble hangs with a certain case by golux (Chaplain) on Nov 15, 2012 at 16:44 UTC
Hi Kimmel, it's because in the second case $content matches $regex at the same location every time, so the loop condition never changes and the loop never exits. Try changing "while" to "if", perhaps?
say substr+lc crypt(qw $i3 SI$),4,5
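A minimal sketch of that point, using a short made-up string and pattern in place of the book and the 4,000 patterns (both are assumptions, not taken from the original post). The iteration guard exists only so the demonstration terminates; without it the first loop would spin forever.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real text and pattern list.
my $text = 'the count and the castle';
my $re   = qr/\b(count|castle)\b/i;

# Without /g every evaluation of the condition starts again at offset 0
# and finds "count" again, so the condition never becomes false.
my ($iterations, %no_g);
while ($text =~ /$re/) {
    $no_g{$1}++;
    last if ++$iterations >= 5;    # guard added for this demo only
}

# With "if" the match is tested exactly once.
my %once;
$once{$1}++ if $text =~ /$re/;

print "no /g: count=$no_g{count} after $iterations iterations\n";
print "if:    count=$once{count}\n";
```

Here "castle" is never reached by the first loop, because the match at "count" is found again on every pass.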
Re: Regexp::Assemble hangs with a certain case by Anonymous Monk on Nov 16, 2012 at 03:18 UTC
Run this:
`perl -Mre=debug -le " 1 while q{234} =~ /\d/g "`
Compare with one or two pages of output from this infinite loop:
`perl -Mre=debug -le " 1 while q{234} =~ /\d/ "`
You should notice that without g in m//g the pos-ition doesn't advance: you're always matching against 2, it's always true, and it never ends, because 2 is always \d.
As for the title, "Regexp::Assemble hangs with a certain case": a regex produced by Regexp::Assemble is not Regexp::Assemble, and Regexp::Assemble is not hanging. You're not even using Regexp::Assemble to assemble a regex, so it has nothing to do with it.
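The same behaviour can be seen without the debug output by recording pos() after each match. This is a small sketch along the lines of the one-liners above; the loop without /g is deliberately capped at three attempts (an assumption made for the demo) so that it finishes.

```perl
use strict;
use warnings;

my $str = '234';

# With /g in scalar context, each successful match stores its end offset
# in pos($str), and the next attempt resumes from that offset.
my @with_g;
while ($str =~ /(\d)/g) {
    push @with_g, $1 . ':' . pos($str);
}

# Without /g each attempt starts from the beginning of the string again,
# so the first digit is matched every time. Bounded on purpose.
my @without_g;
for my $attempt (1 .. 3) {
    last unless $str =~ /(\d)/;
    push @without_g, $1;
}

print "with /g:    @with_g\n";       # digit:offset pairs
print "without /g: @without_g\n";
```

With /g the recorded pairs are 2:1, 3:2 and 4:3, and the loop ends on its own; without /g every attempt captures 2.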
Re^2: Regexp::Assemble hangs with a certain case by kimmel (Scribe) on Nov 16, 2012 at 17:20 UTC
Oops, I did not check the code I posted, again. I really need to stop doing that. Here is what the program should have looked like.

use v5.16;
use warnings;
use autodie qw( :all );
use utf8::all;
use File::Slurp qw( read_file );
use Regexp::Assemble;
use Benchmark qw( cmpthese :hireswallclock );

my %seen;
my %seen2;

my $fname = 'dracula.txt';
my $content = read_file($fname);
$content =~ tr/!"#$%&'()*+,\-.\/:;<=>?@\[\\]^_`{|}~/ /;

my @patterns = read_file('sample_patterns');
chomp @patterns;
my $regex = join '|', map {quotemeta} @patterns;
$regex = qr/\b($regex)\b/ixms;

my $regex2 = Regexp::Assemble->new->add(@patterns);
$regex2->anchor_word(1);
$regex2->flags('ixms');
$regex2->re();

cmpthese(
    -5,
    {
        'one' => sub {
            $seen{$1}++ while $content =~ /$regex/g;
        },
        'two' => sub {
            $seen2{$regex2->mvar(1)}++ while $content =~ /$regex2/;
        },
    }
);

I understand now why it was acting the way it was.
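For reference, a hedged sketch of a counting loop that does terminate. It uses only core Perl and a plain alternation, as in 'one', and does not attempt the Regexp::Assemble variant. The sample text and pattern list are hypothetical stand-ins for dracula.txt and sample_patterns, and folding keys with lc is an addition not present in the posted code.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for dracula.txt and sample_patterns.
my $content  = 'The Count left the castle. The count returned to the Castle at night.';
my @patterns = ('count', 'castle', 'night');

# Same construction as in the post: escape each pattern, join with '|',
# and anchor the group on word boundaries.
my $alternation = join '|', map { quotemeta($_) } @patterns;
my $regex       = qr/\b($alternation)\b/i;

# /g makes each iteration resume after the previous match, so the loop
# ends once no further match is found.
my %seen;
$seen{ lc $1 }++ while $content =~ /$regex/g;

print "$_: $seen{$_}\n" for sort keys %seen;
```

On this sample the counts are castle: 2, count: 2 and night: 1.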

