PerlMonks  

Re: perl indication of end of string already matched

by AnomalousMonk (Archbishop)
on Jun 08, 2020 at 15:35 UTC (id://11117824)


in reply to perl indication of end of string already matched

This sounds very much like an XY Problem. Can you give us a Short, Self-Contained, Correct Example to illustrate your immediate problem? In any event, is the following something like what you would want?

c:\@Work\Perl\monks>perl -wMstrict -le
"my $str = 'abc';
 while ($str =~ m/./gc) { printf qq{1: %d \n}, pos $str; }
 ;;
 printf qq{2: %d \n}, pos $str;
 print 'pos at end' if pos $str == length $str;
"
1: 1
1: 2
1: 3
2: 3
pos at end


Give a man a fish:  <%-{-{-{-<

Re^2: perl indication of end of string already matched
by nachumk (Initiate) on Jun 08, 2020 at 16:28 UTC
    print 'pos at end' if pos $str == length $str;

    This if statement is true whether or not the previous regex matched $. And the regex engine will only match $ once. How does the regex engine know that it matched $? And can I get access to that information? I prefer to not call the regex engine again.

    Using pos == length is sufficient. I was hoping there was a simpler call, something like pos() but for identifying whether $ was already matched. That would allow me to avoid two calls (length and pos) and instead call one function (perhaps eos($str)). I'm very sensitive to performance during parsing.
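A minimal sketch of the single call wished for above, assuming a hypothetical pure-Perl helper named eos (it is not a Perl builtin). It only wraps the pos == length comparison, so it buys readability rather than speed: a Perl sub call has its own overhead compared with writing the comparison inline.

```perl
use strict;
use warnings;

# Hypothetical helper, not a Perl builtin: true when pos() of the string
# passed in sits at the end of that string. $_[0] aliases the caller's
# variable, so pos() sees the same scalar that m//gc has been updating.
sub eos {
    return defined pos($_[0]) && pos($_[0]) == length $_[0];
}

my $str = 'abc';
while ($str =~ m/./gc) {
    printf "pos %d, at end: %s\n", pos $str, eos($str) ? 'yes' : 'no';
}
```

Because of /c, the final failed match in the loop leaves pos at 3, so eos($str) is still true after the loop ends.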

      print 'pos at end' if pos $str == length $str;

      This if statement is true whether or not the previous regex matched $.

      I don't understand this. Can you give an example of a non-lookahead regex that matches to the end of a string and does not match at the end of the string, i.e. does not leave pos sitting beyond the end of the string (or pos == length)?

      Using pos == length is sufficient. ... a simpler call ... avoid two calls ... and instead call one function ... I'm very sensitive to performance during parsing.

      It sounds as if you may have an answer (even though I'm still a bit confused about the question). I imagine that Inline::C would allow you to define a single function to examine the internals of a string scalar and return info on pos versus length. Good luck :)



        I guess, in its most simplistic form, my question is: is the state the regex engine uses to determine whether $ was already matched available to me? For example:
        my $str = "abc";
        $str =~ m/.|$/g;    # succeeds, and pos goes to 1
        $str =~ m/.|$/g;    # succeeds, and pos goes to 2
        $str =~ m/.|$/g;    # succeeds, and pos goes to 3
        $str =~ m/.|$/g;    # succeeds, and pos stays at 3
                            # Why does this behave differently than the next regex?
                            # pos($str) is 3 for both calls
        $str =~ m/.|$/g;    # fails and resets pos

        How does the engine know that it has already succeeded with the fourth regex (when pos is already 3)? There must be some mechanism that records that internally in the regex engine. I'd like to know if that information is accessible.

        FYI, I rewrite my code constantly; this issue came up during one rewrite, and I still have to go back and see whether there's a case where it is helpful. But the question piqued my interest, and I would like to understand whether the internal structure that holds this information ($ matched) is accessible.
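The match sequence described above can be reproduced with a short script. This is a sketch; trace_matches is a hypothetical helper name introduced here. It applies the same m/.|$/g five times to "abc" and records whether each attempt succeeded and where pos() ended up, showing the fourth (zero-length) match succeeding at pos 3 and the fifth failing and resetting pos to undef.

```perl
use strict;
use warnings;

# Hypothetical helper: apply m/.|$/g to a copy of $input $tries times and
# record "match:POS" or "fail:POS" after each attempt ("undef" once pos is reset).
sub trace_matches {
    my ($input, $tries) = @_;
    my $str = $input;
    my @log;
    for (1 .. $tries) {
        my $ok = $str =~ m/.|$/g;    # scalar context: one match per call
        push @log, ($ok ? 'match' : 'fail') . ':'
                 . (defined pos($str) ? pos($str) : 'undef');
    }
    return @log;
}

print "$_\n" for trace_matches('abc', 5);
```

Without /c, the failed fifth attempt discards pos entirely; adding /c would keep it at 3 instead.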

      >>How does the regex engine know that it matched $?

      Hi.

      I think the engine doesn't know that. What it knows is, e.g., that it has already matched a zero-length branch once at this position, so it refuses to match a zero-length string a second time at the same place, to avoid looping forever.
      I believe you can get similar results with regexes like these: m/.|(?:)/gc, m/.|(?=)|(?:)/gc...
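A small check of that claim, as a sketch: each pattern is run in a while (m//gc) loop over "abc" and the successful matches are counted. If the engine did not refuse a second zero-length match at the same position, these loops would never terminate; instead each one should stop after four matches (a, b, c, then the empty match at the end), with pos left at 3 because of /c. count_matches is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: count successful m//gc matches of $re against a copy
# of $input and return (count, final pos). /c keeps pos() after the last failure.
sub count_matches {
    my ($input, $re) = @_;
    my $str   = $input;
    my $count = 0;
    $count++ while $str =~ m/$re/gc;
    return ($count, pos $str);
}

# The $ version from the question plus the two zero-length alternatives above.
for my $re (qr/.|$/, qr/.|(?:)/, qr/.|(?=)|(?:)/) {
    my ($count, $pos) = count_matches('abc', $re);
    print "$re: $count matches, pos $pos\n";
}
```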
