http://qs321.pair.com?node_id=985255


in reply to regular expression

You can use this regex:
/(.*)(\1.*)/
The desired string is stored in $2.
لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

Re^2: regular expression
by tj_thompson (Monk) on Aug 03, 2012 at 18:33 UTC

    While this will work for this string, we're obviously dealing with a Perl novice here, and this solution is difficult to understand at best for someone learning the language.

    If you're ever uncertain as to why your regex is not working, put a capture '()' around each element of the regex and dump them out. You'll see exactly which regex elements are matching which pieces of your string. This will give you a lot of insight into what might be going wrong. Also be sure to check that the return value of your match is true; you might not be matching at all.
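    A minimal sketch of that debugging approach, applied to the regex from the parent post with \1 and the trailing .* split into their own captures. The helper name dump_captures is made up for illustration and is not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): run a match in list context
# and return whatever the capture groups picked up.
# An empty list means the pattern did not match at all.
sub dump_captures {
    my ( $str, $re ) = @_;
    my @parts = $str =~ $re;
    return @parts;
}

# Every element of the pattern gets its own '()'.
my @parts = dump_captures( 'foo_bar_foo_bar_12345', qr/(.*)(\1)(.*)/ );

# Check that the match succeeded before trusting any capture.
if (@parts) {
    for my $i ( 0 .. $#parts ) {
        printf "\$%d = '%s'\n", $i + 1, $parts[$i];
    }
}
else {
    print "No match at all\n";
}
```

    Running this shows which piece of the string each element claimed, which is usually enough to spot the element that is matching too much or too little.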

    As many previous posters pointed out, the OP should research greedy and non-greedy matches. Also read up on anchoring regexes with '^' and '$'.
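    For anyone following along, here is a small sketch of the difference, using the same test string that appears elsewhere in this thread. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $str = 'foo_bar_foo_bar_12345';

# Greedy: .* takes as much as it can, so the match runs up to the LAST underscore.
my ($greedy) = $str =~ /^(.*)_/;

# Non-greedy: .*? takes as little as it can, so it stops at the FIRST underscore.
my ($lazy) = $str =~ /^(.*?)_/;

# Anchored with '$': the digits must sit at the very end of the string.
my ($tail) = $str =~ /(\d+)$/;

# Anchored with '^': the digits would have to sit at the very start,
# so this one does not match.
my $starts_with_digits = $str =~ /^\d+/ ? 'yes' : 'no';

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "tail:   $tail\n";
print "starts with digits: $starts_with_digits\n";
```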

Re^2: regular expression
by pileofrogs (Priest) on Aug 03, 2012 at 16:51 UTC

    Woah!

    That works. I expected it would fail because the first group would match everything and then there would be nothing for the 2nd group to match against.

    Here's the code I used to test:

    #! /usr/bin/perl -w -T
    use strict;
    my $str = 'foo_bar_foo_bar_12345';
    print "$str\n";
    $str =~ /(.*)(\1.*)/ || die "Failed!\n";
    print "$2\n";

    Can anyone explain why that is?

      From Backtracking:

      For a regular expression to match, the entire regular expression must match, not just part of it. So if the beginning of a pattern containing a quantifier succeeds in a way that causes later parts in the pattern to fail, the matching engine backs up and recalculates the beginning part—that's why it's called backtracking.

      The regex engine begins as you say, by matching everything with the first .*, but when the rest of the pattern then fails to match, it backtracks one character and tries again. Eventually it has backtracked to the point at which $1 contains foo_bar_ (including the trailing underscore) and what remains of the string is foo_bar_12345. That remainder satisfies \1.*, so it becomes $2, the entire match succeeds, and the regex engine stops looking and returns.
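      To make the backtracking visible, here is a rough hand-rolled imitation of it, assuming the match starts at position 0. The sub simulate_backtracking is a made-up helper, and the real engine applies optimizations that this loop ignores, so treat it as a conceptual sketch only. The last lines cross-check the result against the actual regex.

```perl
use strict;
use warnings;

# Hypothetical helper: imitate by hand what happens for /(.*)(\1.*)/.
# Try the longest candidate for $1 first, then give back one character
# at a time until the rest of the string starts with that candidate.
sub simulate_backtracking {
    my ($str) = @_;
    for ( my $len = length $str ; $len >= 0 ; $len-- ) {
        my $first = substr $str, 0, $len;    # candidate for $1
        my $rest  = substr $str, $len;       # what is left for \1.*

        # \1 must match at the start of $rest; .* then takes the remainder.
        if ( index( $rest, $first ) == 0 ) {
            return ( $first, $rest, length($str) - $len );
        }
    }
    return;    # not reached: the empty candidate always succeeds
}

my $str = 'foo_bar_foo_bar_12345';
my ( $first, $second, $given_back ) = simulate_backtracking($str);
print "simulated: \$1 = '$first', \$2 = '$second' ",
  "($given_back characters given back)\n";

# Cross-check against the real regex engine.
$str =~ /(.*)(\1.*)/ or die "Failed!\n";
print "engine:    \$1 = '$1', \$2 = '$2'\n";
```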

      HTH,

      Athanasius <°(((><contra mundum

        Big ++!

Re^2: regular expression
by Rahul Gupta (Sexton) on Aug 04, 2012 at 05:31 UTC
    Thanks. It worked for me :-)