http://qs321.pair.com?node_id=998232


in reply to RegEx + vs. {1,}

If you want a list of all two-letter patterns that appear at least twice somewhere in your string, you need to make three changes to your regex.

  1. You need to make (\w{2,}) non-greedy by adding a "?" to the end, e.g. (\w{2,}?).
  2. You need to wrap what comes after (\w{2,}?) in a zero-width lookahead group. Otherwise the search for the second occurrence consumes the text in between, and you will miss all the matches between the first and second occurrence of "ab".
  3. You need to handle repetitions of your regex slightly differently. Instead of /( mumblefoo )+/ you need /mumblefoo/g. Using a + the way you did will only get you the last match found, because each time the + causes the group to repeat, the new capture replaces the previous one.
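
To make the third point concrete, here is a minimal sketch of the difference between a quantified capture group and a /g match in list context. The string 'abcdef' and the variable names are only for illustration and are not taken from the original question.

```perl
use strict;
use warnings;

my $s = 'abcdef';    # illustrative input, not the OP's string

# A quantified capture group is overwritten on every repetition,
# so only the final two-character chunk is left in the capture.
my ($last) = $s =~ /^(\w{2})+$/;
print "quantified group: $last\n";    # ef

# With /g in list context, each match of the group is returned in turn.
my @all = $s =~ /(\w{2})/g;
print "global match:     @all\n";     # ab cd ef
```

The anchors in the first pattern are only there to force the group to repeat across the whole string.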

Taken together, these changes will make your regex look like this: /(\w{2,}?)(?=.*?\1)/g:

    print $x = "abcdefgxxabcdefgzzabcdsjfhkdfab", "\n";
    print "<" . join('|', $x =~ /(\w{2,}?)(?=.*?\1)/g), ">\n";
    # outputs: <ab|cd|ef|ab|cd|ab>

You can find more info on zero-width lookaheads in the Extended Patterns section of the perlre manpage on perldoc.

Re^2: RegEx + vs. {1,}
by choroba (Cardinal) on Oct 10, 2012 at 14:25 UTC
    Or, just remove the comma from the quantifier: \w{2}
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
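
A quick way to check this suggestion is to run both quantifiers against the sample string from the parent post. This is only a sketch; the array names are made up for the comparison. On this input the two patterns should return the same list, since the lazy {2,}? always tries exactly two characters first.

```perl
use strict;
use warnings;

my $x = 'abcdefgxxabcdefgzzabcdsjfhkdfab';

# Lazy "two or more" versus "exactly two", with the same lookahead.
my @lazy  = $x =~ /(\w{2,}?)(?=.*?\1)/g;
my @fixed = $x =~ /(\w{2})(?=.*?\1)/g;

print "<", join('|', @lazy),  ">\n";    # <ab|cd|ef|ab|cd|ab>
print "<", join('|', @fixed), ">\n";    # <ab|cd|ef|ab|cd|ab>
```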

      Indeed. There can never be a three-character sequence in your string which occurs more frequently than a two-character sequence. (Because the three-character sequence contains two two-character sequences.)

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
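
The counting argument above can be checked with a small sketch that tallies overlapping two- and three-character sequences. The hash names are invented for this example; the string is the sample from earlier in the thread.

```perl
use strict;
use warnings;

my $x = 'abcdefgxxabcdefgzzabcdsjfhkdfab';

# The lookahead captures without consuming anything, so each /g iteration
# starts one character further along and overlapping sequences are counted.
my (%pairs, %triples);
$pairs{$1}++   while $x =~ /(?=(\w{2}))/g;
$triples{$1}++ while $x =~ /(?=(\w{3}))/g;

# Print each triple next to the pair it starts with; the triple's count
# should never exceed the pair's count.
for my $t (sort keys %triples) {
    my $p = substr $t, 0, 2;
    printf "%s: %d  %s: %d\n", $t, $triples{$t}, $p, $pairs{$p};
}
```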
        But what should be returned for a string like 'abcabcabcdef', where a three-character sequence occurs as often as a two-character one? 'ab' or 'abc'? I assume the OP wants the longer one.
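
If the longer sequence is what is wanted, one possible reading (an assumption, since the OP has not confirmed it) is to drop the "?" and let the quantifier stay greedy. A rough sketch comparing the two on the string from the question above:

```perl
use strict;
use warnings;

my $s = 'abcabcabcdef';

# Lazy: settles for the shortest sequence that reappears later.
my @shortest = $s =~ /(\w{2,}?)(?=.*?\1)/g;

# Greedy: backs off from the longest candidate until one reappears later.
my @longest = $s =~ /(\w{2,})(?=.*?\1)/g;

print "lazy:   @shortest\n";    # ab ca bc
print "greedy: @longest\n";     # abc abc
```

Note that the lookahead starts after the current match, so only repeats that do not overlap the match itself are found.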