http://qs321.pair.com?node_id=298407


in reply to multiple matches with regexp

Well, the Perl way is obviously TIMTOWTDI, but I have a Perl-ish regex way for you ;-):
$a="aaaa"; $a=~m/(aa)(?{push @a, $1})(?!)/; print join ( "-", @a );
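This uses (?{}), a bit of code embedded in the regex that is executed every time the RE engine runs over it, and (?!), an empty negative look-ahead that always fails. The forced failure is the magic: after each failure the engine backtracks and retries at the next starting position, running the code block again on the way. Both constructs are explained in perlre. You could say it is ugly, but I personally like it :-).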
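To make the backtracking visible, here is a minimal sketch of the same trick under `use strict`, which also records where each attempt started. The names `@trace` and `trace_aa` are made up for illustration, and the offset relies on `pos()` reporting the current match position inside the code block, as perlre describes.

```perl
use strict;
use warnings;

our @trace;    # package variable, filled from inside the regex (hypothetical name)

# Hypothetical helper: list every place where "aa" can start in $text.
sub trace_aa {
    my ($text) = @_;
    local @trace = ();
    # (aa) captures, the code block records start offset and capture,
    # then (?!) fails and sends the engine on to the next starting position.
    $text =~ m/(aa)(?{ push @trace, (pos() - length $1) . ":$1" })(?!)/;
    my @copy = @trace;    # copy before local() restores the old value
    return @copy;
}

print join(" ", trace_aa("aaaa")), "\n";    # prints "0:aa 1:aa 2:aa"
```

The match as a whole always fails, so the return value of the m// is useless; only the side effect in the code block matters.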
Hope this helped.
CombatSquirrel.
Entropy is the tendency of everything going to hell.

Re: Re: multiple matches with regexp
by sandfly (Beadle) on Oct 10, 2003 at 21:08 UTC
    This is clever, and extended my understanding of the RE engine (++), but is it guaranteed to work?

    I got interested in why a negative look-ahead was required, and found that failing look-behinds, negative or positive, work too, but a simple mismatch doesn't, and neither does a failing zero-length positive look-ahead such as (?=x). For example, m/(aa)(?{push @a, $1})x/ does not work. Presumably the regex optimiser sees that there is no 'x' in 'aaaa', so it never bothers with the step-wise attempts to match the 'a's.
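    A small sketch for checking this on your own perl: it counts how often the code block runs for each failing tail. The names `$calls`, `%variant` and `count_calls` are invented for the example. Only the (?!) count of 3 follows directly from the idiom; the other two counts depend on what the optimiser of your perl version decides, so no output is claimed for them.

```perl
use strict;
use warnings;

our $calls;    # package variable, incremented from inside each regex

# Each entry pairs a failing tail with a match that ends in that tail.
my %variant = (
    '(?!)'  => sub { $_[0] =~ m/(aa)(?{ $calls++ })(?!)/ },
    'x'     => sub { $_[0] =~ m/(aa)(?{ $calls++ })x/ },
    '(?=x)' => sub { $_[0] =~ m/(aa)(?{ $calls++ })(?=x)/ },
);

# Hypothetical helper: how many times did the code block run for this tail?
sub count_calls {
    my ($tail, $text) = @_;
    local $calls = 0;
    $variant{$tail}->($text);
    my $n = $calls;    # copy before local() restores the old value
    return $n;
}

printf "%-6s ran the code block %d time(s)\n", $_, count_calls($_, "aaaa")
    for sort keys %variant;
```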

    Is it possible that a future regex engine will realise that (?!) can never match, conclude that the whole match must fail, and so break this code?

      A regex engine that smart would already break the (?{}) part of the code, which has to be evaluated every time the engine runs over it. The main problem is that (?{}) is an experimental feature which may be changed or removed in future Perl versions. Still, AFAIK, it is considered useful for some regexes (the one above is fairly standard), which will hopefully prevent major changes to the syntax. And don't forget we have Perl 6 coming up ;-).
      Cheers,
      CombatSquirrel.
        To avoid experimental features one may choose the following:
        $regexp="a{2}"; $_="aaaa"; push @a , $1 while m/(?=($regexp))./g; print join ( "-", @a ) . "\n";
        Greetings
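        The same look-ahead idea can also be written as a list-context match, which drops the while loop. This is a variation, not the code above: it leaves out the trailing `.` and relies on /g moving past a zero-length match by itself. The name `overlapping_matches` is made up for the sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: return every (possibly overlapping) match of $subpattern.
sub overlapping_matches {
    my ($text, $subpattern) = @_;
    # The look-ahead captures without consuming anything, so each /g
    # iteration resumes one character further along the string.
    my @hits = $text =~ /(?=($subpattern))/g;
    return @hits;
}

print join("-", overlapping_matches("aaaa", "a{2}")), "\n";    # prints "aa-aa-aa"
```

        No (?{}) is involved, so nothing experimental is needed.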
Re: Re: multiple matches with regexp
by almaric (Acolyte) on Oct 11, 2003 at 00:46 UTC
    I like it, and with
    use re 'eval';
    I was also able to use a regular expression instead of a fixed string: "aa" -> "a{2}"
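    For reference, a minimal sketch of what that might look like, assuming the sub-pattern arrives as a string in a variable. The names `@found` and `all_overlapping` are invented here. As far as I know, newer perls only insist on `use re 'eval'` when the interpolated string itself contains a code block, but declaring it does no harm.

```perl
use strict;
use warnings;
use re 'eval';    # allow (?{ }) in a pattern that interpolates a variable at run time

our @found;       # package variable, filled from inside the regex

# Hypothetical helper: collect every way $subpattern can match in $text.
sub all_overlapping {
    my ($text, $subpattern) = @_;
    local @found = ();
    # Capture, record, then force failure so every start position is tried.
    $text =~ m/($subpattern)(?{ push @found, $1 })(?!)/;
    my @copy = @found;    # copy before local() restores the old value
    return @copy;
}

print join("-", all_overlapping("aaaa", "a{2}")), "\n";    # prints "aa-aa-aa"
```

    Only interpolate patterns you trust: with `use re 'eval'` in effect, a string containing (?{ }) would run as code. Also note that a variable-width sub-pattern such as "a+" records every length the engine backtracks through, not just one match per position.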