debiandude has asked for the wisdom of the Perl Monks concerning the following question:

Hey. I have a string that contains data similar to this: 'aaabbcdd'. I was trying to write a regex that would match 'aaa', then 'bb', then 'c', and then 'dd'. That is, it would group together each run of identical characters.

I figured I would have to use a lookahead to check whether the character just matched is the same as the one that comes next, but I am not sure how to do this. I was also planning to do it in a while loop like this:

my $string = 'aaabbcddeef';
while ($string =~ m/\w(?!patternthatworks)/gi) {
    # do stuff...
}

Thanks for any help

Re: Lookahead regex help
by borisz (Canon) on Aug 16, 2004 at 16:24 UTC
    I'm not really sure if this is what you have in mind:
    my $string = 'aaabbcddeef';
    while ($string =~ m/((.)\2*)/gi) {
        print "$1\n";
    }
    __OUTPUT__
    aaa
    bb
    c
    dd
    ee
    f
      That seems to work. Thanks. But I am curious as to what the \2 does?
        \2 looks for what the second () pair has captured. In this case a single char.
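To make the role of \2 concrete, here is a minimal sketch that reuses the ((.)\2*) pattern from above and records both captures for each run. The helper name runs is hypothetical and not part of the thread, and the sketch drops /i and adds /s as assumptions of its own.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split a string into runs
# of identical characters using the ((.)\2*) pattern.
sub runs {
    my ($string) = @_;
    my @runs;
    # $2 is the single character captured by the inner (.);
    # \2* then repeats exactly that character, and $1 holds the whole run.
    # /s (an assumption here) lets . match a newline as well.
    while ($string =~ /((.)\2*)/gs) {
        push @runs, [ $2, length $1 ];
    }
    return @runs;
}

printf "%s x %d\n", @$_ for runs('aaabbcddeef');
# prints:
# a x 3
# b x 2
# c x 1
# d x 2
# e x 2
# f x 1
```

With /i, as in the original snippet, the backreference is also compared case-insensitively, so 'aA' would count as a single run.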
        $$ perl -MYAPE::Regex::Explain -e'die YAPE::Regex::Explain->new(qr/((.)\2*)/i)->explain'
        The regular expression:

        (?i-msx:((.)\2*))

        matches as follows:

        NODE                     EXPLANATION
        ----------------------------------------------------------------------
        (?i-msx:                 group, but do not capture (case-insensitive)
                                 (with ^ and $ matching normally) (with . not
                                 matching \n) (matching whitespace and #
                                 normally):
        ----------------------------------------------------------------------
          (                        group and capture to \1:
        ----------------------------------------------------------------------
            (                        group and capture to \2:
        ----------------------------------------------------------------------
              .                        any character except \n
        ----------------------------------------------------------------------
            )                        end of \2
        ----------------------------------------------------------------------
            \2*                      what was matched by capture \2 (0 or
                                     more times (matching the most amount
                                     possible))
        ----------------------------------------------------------------------
          )                        end of \1
        ----------------------------------------------------------------------
        )                        end of grouping
        ----------------------------------------------------------------------
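For comparison, the lookahead idea from the original question can also be made to work. The sketch below is one possible approach and not something posted in the thread: the negative lookahead (?!\1) finds the last character of each run, and pos() marks where the run ends. The helper name runs_by_lookahead is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): find run boundaries with a
# negative lookahead instead of consuming the whole run in one match.
sub runs_by_lookahead {
    my ($string) = @_;
    my @runs;
    my $start = 0;
    # (.)(?!\1) matches a character that is NOT followed by itself,
    # i.e. the last character of a run (or the last character overall).
    while ($string =~ /(.)(?!\1)/gs) {
        my $end = pos($string);    # offset just past the run
        push @runs, substr($string, $start, $end - $start);
        $start = $end;
    }
    return @runs;
}

print "$_\n" for runs_by_lookahead('aaabbcddeef');
# prints:
# aaa
# bb
# c
# dd
# ee
# f
```

This needs extra bookkeeping for the start offset, which is why the backreference form ((.)\2*) is the more direct choice here.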