http://qs321.pair.com?node_id=519910

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Nub: Trying to use regular expressions to match any letter, provided it is a doubled letter, like balloon or subbookkeeper but not bar or zed. Neither [a-z]{2} nor \w{2} works. How would you-all do it, O wise Perl Monks?

The reason is a fairly simple guessing game: based on the test above, is foo behind the Green Glass Door or not? I'm writing a script to play that game as a simple project for kicks and giggles. Your help is invaluable.

jdporter added <code> tags

Replies are listed 'Best First'.
Re: Using regex to match double letters, and only double letters
by McDarren (Abbot) on Dec 30, 2005 at 01:01 UTC
    You need to use capturing parentheses, and then backreference. EG:
    #!/usr/bin/perl -w
    use strict;

    while (<DATA>) {
        next if ($_ !~ /([A-Za-z])\1/);
        print;
    }

    __DATA__
    balloon
    hello
    world
    foo
    bar
    perlmonks
    merry
    christmas

    Which gives:

    balloon
    hello
    foo
    merry

    See perlre for more information.

    Hope this helps,
    Darren :)
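To see why the quantifier attempts from the question fall short: `[a-z]{2}` asks for any two letters in a row, not the same letter twice, whereas `\1` has to repeat whatever the first group captured. A minimal sketch of the difference (the helper names `any_two` and `same_twice` are made up for this illustration and are not from the thread):

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this illustration.
sub any_two    { return $_[0] =~ /[a-z]{2}/   ? 1 : 0 }   # any two letters in a row
sub same_twice { return $_[0] =~ /([a-z])\1/i ? 1 : 0 }   # a letter, then the same letter again

for my $word (qw(balloon subbookkeeper bar zed)) {
    printf "%-14s any_two=%d same_twice=%d\n",
        $word, any_two($word), same_twice($word);
}
```

Here "bar" and "zed" satisfy `[a-z]{2}` (via "ba" and "ze") but not the backreference version, which is the behaviour the game needs.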

      Good solution! I couldn't help but golf it a little. :)
      #!/usr/bin/perl -w
      use strict;

      while (<DATA>) {
          print if (/([a-z])\1/i);
      }

      __DATA__
      balloon
      hello
      world
      foo
      bar
      perlmonks
      merry
      christmas
      That helps immensely, thank you very much indeed. Why I didn't think of that, I shall never know.
Re: Using regex to match double letters, and only double letters
by davido (Cardinal) on Dec 30, 2005 at 05:05 UTC

    I like using POSIX character classes when they're applicable because they will "work" nicely with the locale and utf8 pragmas, and Unicode. When you see [a-zA-Z] in code, you're looking at code that probably won't play nice with locales. A good way to accommodate locales is POSIX character classes. So re-written with that in mind, you have this:

    print "Match!\n" if $string =~ m/([[:alpha:]])\1/; # Probably good.

    ...instead of the possibly problematic:

    print "Match!\n" if $string =~ m/([a-zA-Z])\1/; # Maybe bad!

    Dave
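A small sketch of the difference, assuming a test word made of non-ASCII letters. The word below is my own choice, not from the thread: it is written with `\x{...}` escapes for Greek letters, all above code point 255, so Perl applies Unicode rules to it, and it contains a doubled lambda (U+03BB).

```perl
use strict;
use warnings;

# Assumed test data: five Greek letters, with U+03BB appearing twice in a row.
my $word = "\x{3ac}\x{3bb}\x{3bb}\x{3bf}\x{3c2}";

my $posix = $word =~ /([[:alpha:]])\1/ ? 1 : 0;   # POSIX class: any alphabetic character
my $ascii = $word =~ /([a-zA-Z])\1/    ? 1 : 0;   # ASCII range only

print "[[:alpha:]] matched: $posix\n";   # 1
print "[a-zA-Z] matched: $ascii\n";      # 0
```

Only the POSIX class sees the doubled letter here; the ASCII range never matches any character in the word.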

Re: Using regex to match double letters, and only double letters
by jdporter (Paladin) on Jan 09, 2021 at 19:36 UTC

    A little trick. (Still uses regexes, but nothing complicated, not even any captures.)

    sub find_runs {
        local $_ = shift;
        $_ ^= substr $_, 1; # string shifted left one char.
        my @r;
        push @r, [ $-[0], $+[0] - $-[0] + 1 ] while /\x00+/g;
        @r
    }

    # find the 'oo' and the 'ttt'
    my $s = "look for doubled lettters";
    for my $r ( find_runs($s) ) {
        print substr $s, $r->[0], $r->[1];
    }
Re: Using regex to match double letters, and only double letters
by Locutus (Beadle) on Apr 16, 2018 at 15:01 UTC

    Duck Duck Go pointed me to this quite old thread when searching for "regular expression matching double letters". Although the OP emphasized "and only double letters" in its title all solutions posted so far also match double letters which are part of triple, quadruple, ... letters (e.g. in "Helllo", "cooool") and within 12 years no one seems to have cared about that. For what it's worth I do care about it and found it surprisingly difficult to come up with a regex that matches doubles only. One of my closest approaches is

    /(.)((?!\1).)\2(?!\2)/

    which uses a negative lookahead to make sure there's something else before the double letter and another negative lookahead to make sure there's something else following the double letter. However, this regex still matches the "something else before" character, too - as I have to use a capturing group for it in order to have a corresponding backreference to use in the lookahead assertion.

    Could it be that finding doubles only is not a regular problem in the sense of Theoretical Computer Science? Or is it just me being unable to find a proper solution?
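The behaviour described above can be checked with a short sketch. The helper `whole_match` and the test words are my own additions. As far as I can tell, the sketch also shows a second limitation that follows from requiring a "something else before" character: a double at the very start of the string is missed.

```perl
use strict;
use warnings;

# The regex from the post, unchanged.
my $re = qr/(.)((?!\1).)\2(?!\2)/;

# Hypothetical helper: returns the full matched text ($&), or undef if no match.
sub whole_match { return $_[0] =~ $re ? $& : undef }

for my $w (qw(Hello Helllo oops)) {
    my $m = whole_match($w);
    printf "%-7s %s\n", $w, defined $m ? "matched '$m'" : "no match";
}
```

"Hello" matches as 'ell' (the preceding 'e' is included), "Helllo" is correctly rejected, and "oops" is rejected because nothing precedes its double.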

      Further to AnonyMonk's Re^2: Using regex to match double letters, and only double letters regex: note that the look-ahead is not strictly necessary:
          / (.) \g-1 (?: \g-1+ (*SKIP)(*FAIL))? /x
      works (or at least seems to work) equally well. Since regex extensions from Perl 5.10+ must be used anyway (for (*SKIP)(*FAIL)), I've also used \g-1 relative backreferencing so the regex can be defined in a qr// more safely; this can be more convenient in an extractive application.
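A sketch of what the relative backreference buys when the pattern lives in a `qr//`. This assumes Perl 5.10 or later; I have written `\g{-1}` with braces, and the sample strings and the names `$double`, `$tag` and `$pair` are my own. Because `\g{-1}` means "the group just before me", the pattern keeps working when it is embedded after other capture groups, where a hard-coded `\1` would point at the wrong group.

```perl
use strict;
use warnings;

# Doubles only: if a third copy follows, consume the run and give up on it.
my $double = qr/ (.) \g{-1} (?: \g{-1}+ (*SKIP)(*FAIL) )? /x;

# Stand-alone use: collect every exact double.
my $text = "axxbmmmcyyd";
my @found;
push @found, $& while $text =~ /$double/g;
print "@found\n";          # xx yy

# Embedded use: group 1 is now (\w+), yet \g{-1} still refers to the (.) inside $double.
my ($tag, $pair) = "n7: balloon" =~ /^(\w+):\s+\w*?($double)/;
print "$tag $pair\n";      # n7 ll
```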


      Give a man a fish:  <%-{-{-{-<

      #!/usr/bin/perl
      use warnings;
      use strict;
      use feature qw{ say };

      for my $s (qw( xxabyycdzz axxbmmmcyyd xxyy exxyy xxyyf xxgyy mmm mmmennn )) {
          say for '---', $s;
          say $2 while $s =~ /(?|^((.))\2(?!\2)
                               |(?<=(.)(?!\1))(.)\2(?!\2))/gx;
      }

      ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

        Your regex (as a whole) matches the "something else before" character as well - printing the content of a certain capture group is cheating! ;-)

      The Perl regex engine has supported (*SKIP) since version 5.10, so you could write:

      /(.)\1((?!\1)|\1*(*SKIP)(*FAIL))/
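To compare this against the plain backreference, here is a sketch with a few sample words of my own choosing (Perl 5.10+ assumed). When a run is longer than two, `\1*` swallows the rest of it and `(*SKIP)(*FAIL)` makes the engine resume after the run instead of retrying inside it.

```perl
use strict;
use warnings;

my $naive  = qr/(.)\1/;                             # any doubled character, even inside a longer run
my $strict = qr/(.)\1((?!\1)|\1*(*SKIP)(*FAIL))/;   # the regex from the post above

my @words = qw(Hello Helllo cooool balloon);
my @loose = grep { $_ =~ $naive }  @words;
my @exact = grep { $_ =~ $strict } @words;

print "naive:  @loose\n";   # Hello Helllo cooool balloon
print "strict: @exact\n";   # Hello balloon
```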

Re: Using regex to match double letters, and only double letters
by vennirajan (Friar) on Dec 30, 2005 at 08:33 UTC
    Hi,

    Using the above solution, you can also write programs that find palindromes. For example,
     print $_ if ( $_ =~ /([[:alpha:]])([[:alpha:]])([[:alpha:]])([[:alpha:]])\3\2\1/i );
    The code above matches a palindrome of length 7: four captured letters, the fourth acting as the middle, followed by the first three in reverse.

    In the same way you can apply this logic to find palindromes of other lengths.

    Hope this will enhance your view in regular expressions.

    Regards,
    S.Venni Rajan.
    "A Flair For Excellence."
                    -- BK Systems.
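The idea generalises: capture some letters, then backreference them in reverse order. A rough sketch for even lengths follows. The helper name `palindrome_re` and the test words are my own, and this is only one way to build such a pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: regex for an even-length palindrome of 2 * $half letters,
# built as $half capture groups followed by backreferences in reverse order.
sub palindrome_re {
    my ($half) = @_;
    my $groups = '([[:alpha:]])' x $half;
    my $mirror = join '', map("\\$_", reverse 1 .. $half);
    return qr/$groups$mirror/i;
}

for my $word (qw(redder banana noon)) {
    my $half = length($word) / 2;
    printf "%-7s %s\n", $word,
        $word =~ palindrome_re($half) ? "palindrome" : "not a palindrome";
}
```

For an odd length, one extra capture group in the middle that is never backreferenced would do the same job.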
      Lots of things you can do with backreferences. Even check for primes:
      perl -wle 'print "Prime" if (1 x shift) !~ /^1?$|^(11+?)\1+$/'
      (by Abigail)
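For readers unpicking the one-liner: the number n is turned into a string of n ones. The first alternative, `^1?$`, catches 0 and 1. The second, `^(11+?)\1+$`, succeeds when the string can be cut into two or more equal chunks of at least two ones, i.e. when n has a proper divisor. A sketch wrapping the same regex in a function (the name `is_prime_unary` is mine):

```perl
use strict;
use warnings;

# Hypothetical wrapper around the regex quoted above.
sub is_prime_unary {
    my ($n) = @_;
    my $unary = 1 x $n;                               # e.g. 5 becomes "11111"
    return $unary !~ /^1?$|^(11+?)\1+$/ ? 1 : 0;      # no match means prime
}

print join(' ', grep { is_prime_unary($_) } 0 .. 30), "\n";
# 2 3 5 7 11 13 17 19 23 29
```

The regex backtracks heavily, so this is a curiosity rather than a practical primality test.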
Re: Using regex to match double letters, and only double letters
by Anonymous Monk on Jan 09, 2021 at 15:54 UTC

    My Perl-fu is non-existent, but this caught my eye in another web search I was doing. I propose the following non-golf, trivial method, which does not require backreferencing:

    ([aA]{2}|[bB]{2}|[cC]{2}|[dD]{2}|[eE]{2}|[fF]{2}|[gG]{2}|[hH]{2}|[iI]{2}|[jJ]{2}|[kK]{2}|[lL]{2}|[mM]{2}|[nN]{2}|[oO]{2}|[pP]{2}|[qQ]{2}|[rR]{2}|[sS]{2}|[tT]{2}|[uU]{2}|[vV]{2}|[wW]{2}|[xX]{2}|[yY]{2}|[zZ]{2})

    Needless to say, yikes.
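If anyone does want that alternation, it need not be typed by hand. A sketch that generates it and checks it against the case-insensitive backreference version on a few words (the variable names and test words are my own):

```perl
use strict;
use warnings;

# Build "[aA]{2}|[bB]{2}|...|[zZ]{2}" programmatically.
my $alternation = join '|', map(sprintf('[%s%s]{2}', $_, uc $_), 'a' .. 'z');

my $long  = qr/($alternation)/;   # the backreference-free approach
my $short = qr/([a-z])\1/i;       # the backreference approach from earlier replies

# Words on which the two regexes disagree (expected: none in this sample).
my @disagree = grep { ($_ =~ $long ? 1 : 0) != ($_ =~ $short ? 1 : 0) }
               qw(balloon bar zed AArdvark oOps subbookkeeper);

print "alternation starts: ", substr($alternation, 0, 15), "\n";
print "disagreements: ", scalar(@disagree), "\n";
```

Both forms treat mixed case such as "oO" as a double, since `[oO]{2}` accepts either case in either position.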