http://qs321.pair.com?node_id=649253


in reply to repeatedly delete words expressed in alternation from end of string [regexp]

First, wouldn't it be better to start with a list of words instead of a regex?

my @words = qw( SA NV LTD CO LLC );

So we'll need to build the regex programmatically.

my ($re) = map qr/$_/i, join '|', map quotemeta, @words;
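As a minimal sketch of what that one-liner does, the same steps can be wrapped in a helper and checked against a couple of strings. The name build_re and the extra word 'S.A.' are illustrative assumptions, not part of the original post; the second word is only there to show why quotemeta matters.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): build one
# case-insensitive alternation from a list of literal words.
sub build_re {
    my @words = @_;
    my $alternation = join '|', map { quotemeta($_) } @words;
    return qr/$alternation/i;
}

my $re = build_re(qw( SA NV LTD CO LLC ));

# quotemeta keeps metacharacters literal: the dots in 'S.A.' must
# match real dots, not arbitrary characters.
my $dotted = build_re('S.A.');

print "case-insensitive match\n" if 'ltd'  =~ /^$re$/;
print "dot stays literal\n"      if 'SxAx' !~ /^$dotted$/;
```

Without quotemeta, a word containing regex metacharacters would silently change the meaning of the alternation.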

Using Regexp::List to build the regex can greatly speed up matching, since it generates an optimized pattern from the word list.

use Regexp::List qw( );
my $re = Regexp::List->new(modifiers => 'i')->list2re(@words);

Now that we have the regex, let's avoid the fragility of 1 while s/// while properly removing spaces.

while (<>) {
    chomp;
    s/^ (?: $re \s+ )+//x;
    s/ (?: \s+ $re )+//xg;
    print("$_$/");
}
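For experimenting without piping in a file, the two substitutions can be wrapped in a function and run on a few strings. This is only a sketch: the helper name strip_words and the sample strings are assumptions, and the \b after the word in the second substitution is an addition that the loop above does not have. It is there so that a word such as "Company" is not cut short by the CO alternative.

```perl
use strict;
use warnings;

my @words = qw( SA NV LTD CO LLC );
my ($re) = map qr/$_/i, join '|', map quotemeta, @words;

# Hypothetical helper: the same two substitutions as the loop above,
# plus a \b after the word (an addition, not in the original post).
sub strip_words {
    my ($s) = @_;
    $s =~ s/^ (?: $re \s+ )+//x;      # leading words and the spaces after them
    $s =~ s/ (?: \s+ $re \b )+//xg;   # later words and the spaces before them
    return $s;
}

print strip_words($_), "\n"
    for 'SA Jims Fine Wines CO LLC', 'Acme Company LTD';
```

The first substitution already requires whitespace after each word, so only the second one needs the extra boundary check.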

Note: You were using capturing parens ((...)) when you only needed non-capturing parens ((?:...)). Removing the need to capture greatly improves the speed of regexes.

Update: Oops, it could still leave spaces. Fixed.
Update: Added Regexp::List method.

Re^2: regexp - repeatedly delete words expressed in alternation from end of string
by Roy Johnson (Monsignor) on Nov 06, 2007 at 17:42 UTC
    It should only remove the expression from the end of the string, so it's actually a little simpler:

    my @words = qw( SA NV LTD CO LLC );
    my ($re) = map qr/$_/i, join '|', map quotemeta, @words;

    while (<DATA>) {
        chomp;
        s/(?:\s*\b$re)+$//;
        print "[$_]\n";
    }

    __END__
    Bobs leave SA Warehouse SA LTD Jims Fine Wines CO LLC

    Caution: Contents may have been coded under pressure.
Re^2: regexp - repeatedly delete words expressed in alternation from end of string
by princepawn (Parson) on Nov 06, 2007 at 18:10 UTC
    Note: You were using capturing parens ((...)) when you only needed non-capturing parens ((?:...)). Removing the need to capture greatly improves the speed of regexes.
    Thanks for this. The thing is, I have an entire module full of this mistake. Unless there is a pragma to fix this, I will have to go and fix them all manually.


    Ivan Raikov says: the first step to understanding recursion is to begin by understanding recursion.
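On the question of a switch that turns off capturing: nothing in this thread offers one, but Perl 5.22 and later accept an /n modifier that makes plain parentheses group without capturing (named captures still capture). The sketch below assumes such a Perl is available; the helper first_capture and the sample string are made up for illustration.

```perl
use v5.22;    # the /n modifier was added in Perl 5.22
use warnings;

# Hypothetical helper: return $1 after a successful match, or undef.
sub first_capture {
    my ($string, $regex) = @_;
    return $string =~ $regex ? $1 : undef;
}

my $name = 'Jims Fine Wines CO LLC';

# Plain parentheses capture, so $1 holds the matched word.
my $captured = first_capture($name, qr/(CO|LLC)$/);

# With /n the same parentheses only group, so $1 stays undefined.
my $grouped = first_capture($name, qr/(CO|LLC)$/n);

printf "without /n: %s\n", $captured // 'undef';
printf "with /n:    %s\n", $grouped  // 'undef';
```

Adding /n is only safe for regexes whose captures are never read afterwards, so each use in a module would still need a quick check.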
      Note: You were using capturing parens ((...)) when you only needed non-capturing parens ((?:...)). Removing the need to capture greatly improves the speed of regexes.

      Well, it can. In many cases it has virtually no impact. In the cases where it causes the string being matched to be copied, the "greatly" only applies if you are matching against a large string.

      Re^6: Can we make $& better? (need) shows that it used to be only a regex w/o /g in a scalar context that incurred this penalty. demerphq patched Perl such that newer Perls also have the penalty for a regex w/o /g in a list context. (So for modern Perls, /g is necessary and sufficient to prevent the copying, it seems.)

      - tye
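Whether capturing costs anything measurable depends on the Perl version and on the size of the string, so the simplest check is to time both forms on your own setup. The sketch below uses the core Benchmark module; the input string, the iteration count, and the sub names are arbitrary assumptions, and no particular result is implied.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

my @words = qw( SA NV LTD CO LLC );
my ($re) = map qr/$_/i, join '|', map quotemeta, @words;

# Assumed test input: a fairly long string, since the copying cost
# described above grows with the length of the string being matched.
my $input = ('Jims Fine Wines ' x 2_000) . 'CO LLC';

# Each sub works on its own copy so the runs do not affect each other.
sub with_capture    { my $s = $input; $s =~ s/(\s*\b$re)+$//;   return $s }
sub without_capture { my $s = $input; $s =~ s/(?:\s*\b$re)+$//; return $s }

# Run each variant 50 times and print a comparison table.
cmpthese( 50, {
    capturing     => \&with_capture,
    non_capturing => \&without_capture,
} );
```

Increasing the repeat count or the string length gives steadier numbers at the cost of a longer run.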