PerlMonks  

Re: Regex: matching character which happens exactly once

by QM (Parson)
on Oct 23, 2017 at 14:15 UTC


in reply to Regex: matching character which happens exactly once

Also, it seems you want a forward reference, in the same spirit as a backref -- something the regex engine will go check after "filling" the ref:
my ($match) = /^(?:[^\1]*)(.)(?:[^\1]*)$/;

Here rendered as \1, just like the backref.

Because the first capturing paren hasn't been encountered yet when the backref is mentioned, that mention triggers different behavior in the regex path. This will make backtracking more painful, and possibly more likely to be pathological, but one has to assume some risk...
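For comparison, here is a minimal sketch of the result the forward reference is meant to produce, written without any regex magic. The helper name `first_unique_char` is hypothetical and not from the thread, and returning the *first* such character is an assumption; the pseudocode regex above does not say which one it would pick. It assumes Perl 5.10 or later for the `//` operator.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the first character
# that occurs exactly once in $str, or undef if there is none.
sub first_unique_char {
    my ($str) = @_;
    my %count;
    $count{$_}++ for split //, $str;      # tally every character
    for my $ch (split //, $str) {         # scan again in original order
        return $ch if $count{$ch} == 1;
    }
    return;                               # no character occurs exactly once
}

print first_unique_char($_) // 'none', "\n" for qw(aba abb aaa);
```

This takes two passes over the string and never backtracks, which is the cost the forward-reference idea would have to beat.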

-QM
--
Quantum Mechanics: The dreams stuff is made of

Replies are listed 'Best First'.
Re^2: Regex: matching character which happens exactly once
by AnomalousMonk (Bishop) on Oct 24, 2017 at 03:05 UTC

    I understand that the [^\1] regex expression presented here is intended as pseudocode, but in addition to the radical changes to backreferencing it implies, there's another problem: the syntax of character classes would have to change radically to support it. Something like \1 in a character class is compiled as an octal character representation:

    c:\@Work\Perl\monks>perl -wMstrict -le "my $rx = qr{ [\1] }xms; print $rx; print 'match' if qq{\cA} =~ $rx;"
    (?msx-i: [\1] )
    match
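The same point can be checked for the negated class from the pseudocode. This is only an illustrative sketch; the variable names are made up for the example. As Perl stands today, `[^\1]` means "any character except the one with octal code 1 (Ctrl-A)", so it says nothing about capture group 1.

```perl
use strict;
use warnings;

# Inside a character class, \1 is the octal escape for chr(1),
# not a reference to capture group 1.
my $rx = qr/\A[^\1]*\z/;

# A string of ordinary characters consists entirely of "not chr(1)".
my $plain   = 'aba'     =~ $rx ? 1 : 0;
# A string containing chr(1) cannot be matched end to end.
my $with_01 = "a\x01a"  =~ $rx ? 1 : 0;

print "plain: $plain, with chr(1): $with_01\n";
```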


    Give a man a fish:  <%-{-{-{-<

      Yes. Like LanX, you're getting bogged down in implementation.

      What do you want it to do with that syntax? How would you redefine Perl regexes to do this?

      I think another symbol for backref would help. (I don't know what we'd use, but that's a different problem.) Then use the same thing for forward references. And for fun, we'll call those ferkcabs.

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

        How would you redefine Perl regexes to do this?

        But the  (?:[^\1]*) construct was envisioned to get around the lack of variable-width positive/negative lookbehind; I think I'd just implement that. (Of course, I suspect the reason it hasn't been implemented yet is because it's a royal pain to do so. Perhaps best to be careful what you wish for lest you find yourself with a new and very difficult assignment. As for implementing ferkcabs, I think I'd switch to another profession first. :)
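Until variable-width lookbehind exists, one way to approximate the "must not occur earlier" half is an embedded code condition. The following is a hedged sketch, not something proposed in the thread: the helper name `unique_char_regex` is hypothetical, and it assumes Perl 5.10 or later for `(*FAIL)`. The lookahead rejects a candidate that occurs later, and the `(?(?{...})...)` conditional rejects one that already occurs in the captured prefix.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the first character of
# the given string that occurs exactly once, or undef if there is none.
sub unique_char_regex {
    my ($str) = @_;
    return $str =~ /
        \A (.*?)        # prefix, as short as possible
        (.)             # candidate character
        (?! .* \2 )     # candidate must not occur later
        (?(?{ index($1, $2) >= 0 }) (*FAIL) )   # nor in the prefix
    /xs ? $2 : undef;
}

print unique_char_regex($_) // 'none', "\n" for qw(aba abb aab aaa);
```

The code condition stands in for the missing lookbehind; whether that is any less painful than the alternatives is a matter of taste.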


        Give a man a fish:  <%-{-{-{-<

Re^2: Regex: matching character which happens exactly once (using global memory)
by LanX (Cardinal) on Oct 23, 2017 at 14:33 UTC
    This can't work, because the first \1 will always be set to the last match ahead (and undef or "" at first encounter).

    This is because $1 is a global var that keeps its match instead of being erased when backtracking.

    FWIW I tried something similar by capturing the following character in $2 for the next run:

    m/ ^ (?:(?!\2).)*? (.) (?=(.|$)) (?!.*\1) /x

    But I couldn't get it to work, probably because the regex engine is not considering a redefined \2 while backtracking (or probably b/c I was too tired last night).

    DB<310> @inp = glob '{a,b}'x3

    DB<311> ;m/ ^ (?:(?!\2).)*? (?{say "<$_ $2>"}) (.) (?=(.|$)) (?!.*\1) /x and say ("found $1 in $_") for @inp
    <aaa >
    <aab >
    <aba >
    <aba b>
    found b in aba
    <abb >
    found a in abb
    <baa >
    found b in baa
    <bab >
    <bab a>
    found a in bab
    <bba >
    <bbb >

    DB<312>

    I probably have a bug in my logic, experts to the rescue! ;-)

    Didn't have the time yet for proper debugging.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

      I didn't expect my forward ref to work, if that's what you thought. The regex engine would have to include a flag to indicate when the first capturing parens were seen, and do the right thing.

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

        What is the right thing???

        I can only judge the implementation, which considers $1 ff to be global.

        update

        This demonstrates what's happening: $2 is always the next character or empty.

        DB<322> ;m/ ^ .*? (?{say "<$_ $2>"}) (.) (?=(.|$)) (*FAIL)/x
        <babab >
        <babab a>
        <babab b>
        <babab a>
        <babab b>
        <babab >

        Cheers Rolf
        (addicted to the Perl Programming Language and ☆☆☆☆ :)
        Je suis Charlie!
