http://qs321.pair.com?node_id=649239

princepawn has asked for the wisdom of the Perl Monks concerning the following question:

Given an alternating regexp:
my $RE = qr! \b ( SA | NV | LTD | CO | LLC ) \b !ix ;
How would you say: delete any and all of these from the end of the string, leading to
Bobs Warehouse SA LTD => Bobs Warehouse Jims Fine Wines CO LLC => Jims Fine Wines


Ivan Raikov says: the first step to understanding recursion is to begin by understanding recursion.

Replies are listed 'Best First'.
Re: regexp - repeatedly delete words from end of string with words in alternation
by gamache (Friar) on Nov 06, 2007 at 15:58 UTC
    while (s/\s*$RE$//) {}
    or
    s/(\s*$RE)*$//;
    will get you there without leaving trailing spaces in the finished $_. (Update: made it a little easier to read.)
Re: regexp - repeatedly delete words expressed in alternation from end of string
by ikegami (Patriarch) on Nov 06, 2007 at 16:33 UTC

    First, wouldn't it be better to start with a list of words instead of a regex?

    my @words = qw( SA NV LTD CO LLC );

    So we'll need to build the regex programatically.

    my ($re) = map qr/$_/i, join '|', map quotemeta, @words;

    Using Regexp::List can greatly speed up the process.

    use Regexp::List qw( ); my $re = Regexp::List->new(modifiers => 'i')->list2re(@words);

    Now that we have the regex, let's avoid the fragility of 1 while s/// while properly removing spaces.

    while (<>) { chomp; s/^ (?: $re \s+ )+//x; s/ (?: \s+ $re )+//xg; print("$_$/"); }

    Note: You were using capturing parens ((...)) when you only needed non-capturing parens ((?:...)). Removing the need to capture greatly improves the speed of regexs.

    Update: Oops, it could still leave spaces. Fixed.
    Update: Added Regexp::List method.

      Should only remove the expression from the end of the string. So it's actually a little simpler:
      my @words = qw( SA NV LTD CO LLC ); my ($re) = map qr/$_/i, join '|', map quotemeta, @words; while (<DATA>) { chomp; s/(?:\s*\b$re)+$//; print "[$_]\n"; } __END__ Bobs leave SA Warehouse SA LTD Jims Fine Wines CO LLC

      Caution: Contents may have been coded under pressure.
      Note: You were using capturing parens ((...)) when you only needed non-capturing parens ((?:...)). Removing the need to capture greatly improves the speed of regexs.
      Thanks for this. Thing is, I have an entire module full of this mistake. Unless there is a pragma to fix this, then I have to go fix them all manually.


      Ivan Raikov says: the first step to understanding recursion is to begin by understanding recursion.
        Note: You were using capturing parens ((...)) when you only needed non-capturing parens ((?:...)). Removing the need to capture greatly improves the speed of regexs.

        Well, it can. It has virtually no impact for many cases. For the cases where it causes the string being matched to be copied, then the "greatly" only applies if you are matching against a large string.

        Re^6: Can we make $& better? (need) shows that it used to be only a regex w/o /g in a scalar context that incurred this penalty. demerphq patched Perl such that newer Perls also have the penalty for a regex w/o /g in a list context. (So for modern Perls, /g is necessary and sufficient to prevent the copying, it seems.)

        - tye        

Re: regexp - repeatedly delete words expressed in alternation from end of string (KISS)
by tye (Sage) on Nov 06, 2007 at 17:56 UTC

    I'm a bit surprised that I didn't notice anybody offering the simple, obvious solution:

    s/(\s+$RE)+$//;

    (tested) Update: One was hidden in the middle of gamache's sentence.

    - tye        

Re: regexp - repeatedly delete words from end of string with words in alternation
by kyle (Abbot) on Nov 06, 2007 at 15:57 UTC
    my $RE = qr! \b ( SA | NV | LTD | CO | LLC ) \b !ix ; my $s1 = 'Bobs Warehouse SA LTD'; my $s2 = 'Jims Fine Wines CO LLC'; 1 while $s1 =~ s{$RE}{}; 1 while $s2 =~ s{$RE}{}; print "[$s1]\n[$s2]\n"; __END__ [Bobs Warehouse ] [Jims Fine Wines ]
Re: regexp - repeatedly delete words expressed in alternation from end of string
by toolic (Bishop) on Nov 06, 2007 at 16:00 UTC
    use warnings; use strict; my $RE = qr! \b ( SA | NV | LTD | CO | LLC ) \b !ix ; while (<DATA>) { s/$RE//g; print; } __DATA__ Bobs Warehouse SA LTD Jims Fine Wines CO LLC

    Outputs:

    Bobs Warehouse Jims Fine Wines