Re: You won't believe what this regular expression does!

by LanX (Sage)
on Feb 25, 2021 at 13:01 UTC (#11128780)


in reply to You won't believe what this regular expression does!

Let's dissect this into smaller problems.

Simplification

I tried to simplify the case to avoid misunderstandings

DB<32> p "hello" =~ s/o*$/O/gr;
hellOO
DB<33> $_="hello"; s/o*$/O/g; print # for older Perls
hellOO
DB<34>

Surprise: the substitution fires twice at the end of the string, once for the o and once for the empty string right after it.

Explanation so far

You and Hauke already explained that

  • pos isn't changing after the first match b/c of the zero-width of $
  • the empty o* is matching again

(And I agree that the referenced perlre#Repeated-Patterns-Matching-a-Zero-length-Substring needs a rewrite)

DB<41> $_="hello"; say pos,"($1)" while m/(o*$)/g; # pos doesn't change
5(o)
5()
DB<42> p "hello" =~ s/x*$/O/gr; # empty match (no x)
helloO
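To make the two matches visible outside the debugger, here is a minimal sketch that records the start and end offset of every /g match. The helper name match_spans is made up for illustration; @- and @+ are Perl's built-in match offset arrays.

```perl
use strict;
use warnings;

# Hypothetical helper: collect [start, end, matched text] for every /g match.
sub match_spans {
    my ($str, $re) = @_;
    my @spans;
    while ($str =~ /$re/g) {
        # $-[0] and $+[0] hold the start and end offset of the whole match
        push @spans, [ $-[0], $+[0], substr($str, $-[0], $+[0] - $-[0]) ];
    }
    return @spans;
}

for my $span (match_spans("hello", qr/o*$/)) {
    printf "start=%d end=%d matched='%s'\n", @$span;
}
```

This should report one match covering the o (offsets 4 to 5) and a second, empty match sitting at offset 5, which is the position where the first one ended.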

Disappointments

Now, why is it surprising?

I think your case is that $ in combination with the /m modifier should act differently. Correct?

  • Would this be consistent?
  • Are there already examples of zero-width assertions that do it that way?
  • Are there work-arounds to achieve what you want? (i.e. skipping zero-length matches)

Workarounds

Here is a guess for the last question:

DB<44> p "hello\nfoo" =~ s/o*\n/O/gmr;
hellOfoo
DB<45> p "hello\nfoo\n" =~ s/o*\n/O/gmr; # added \n at the end of input
hellOfO
DB<46>
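Another possible way to skip the trailing empty match, not proposed in this thread and offered only as a sketch: a negative lookbehind that refuses to start a match directly after an o. It assumes the intent is "exactly one O at each line end, whether or not the line ends in o's". The helper name cap_line_ends is hypothetical, and /r needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the run of o's at each line end with one "O",
# appending an "O" where the line has no trailing o's.
sub cap_line_ends {
    my ($text) = @_;
    # (?<!o) rejects any start position that directly follows an "o", so the
    # empty match behind an already consumed run of o's can no longer happen.
    return $text =~ s/(?<!o)o*$/O/gmr;
}

print cap_line_ends("hello"), "\n";
print cap_line_ends("hell\nfoo"), "\n";
```

With these inputs the first call gives hellO, and the second gives hellO and fO on two lines; unlike the \n variant, the line break is kept because $ does not consume it.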

Meta

Question @all: Is the problem better understood now? :)

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

edit: added more code

update: added headlines for structuring

) because empty patterns are always matching

compare

DB<59> p "12345" =~ s/x*/ /gmr;
 1 2 3 4 5 
DB<60>

Replies are listed 'Best First'.
Re^2: You won't believe what this regular expression does!
by haukex (Archbishop) on Feb 25, 2021 at 14:42 UTC
    Are there work-arounds to achieve what you want?

    I sometimes use (?:\n|\z) to be explicit that I want the line endings to be consumed by the engine.

      Thanks! :)

      But please note the second fOO

      DB<55> p "hello\nfoo" =~ s/o*(?:\n|\z)/O/gmr;
      hellOfOO
      DB<56>

      I'm busy right now, but I seem to remember that one could use features for atomic matches...

      I'll try later...

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

      update

      ) nah doesn't help, since it's not a backtracking problem.

        But please note the second fOO

        Yes, good point! I think the main question is what the intent of the regex is. If it's "replace any o's at the end of each line", then the better solution is, as you said, /o+$/, and using o* is the "mistake".
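For that reading of the intent, a short sketch of the o+ variant; the sample text is made up for illustration, and the form without /r is used so it also runs on older Perls.

```perl
use strict;
use warnings;

# Illustrative input: two lines ending in o's, one line without.
my $text = "hello\nfoo\nbar";

# o+ needs at least one "o", so no empty match can follow the consumed run,
# and lines without a trailing "o" are left untouched.
(my $fixed = $text) =~ s/o+$/O/gm;

print "$fixed\n";
```

Here hello becomes hellO, foo becomes fO, and bar stays as it is, with a single O per affected line.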
