PerlMonks  

Re^2: Applying regexes to streams: Perl enhancement idea (not that easy)

by tye (Sage)
on Jan 07, 2003 at 23:29 UTC ( [id://225110]=note )


in reply to Re: Applying regexes to streams: Perl enhancement idea
in thread Applying regexes to streams: Perl enhancement idea

No. That would prevent the regex from succeeding at end-of-string. What I want is to prevent the regex from backtracking due to end-of-string. That can happen at any point in the regex, so there is no single place in the pattern where you could put something to get this behavior. It would be like putting a special token at the end of the string such that every part of the regex treats that token specially.

It could/should actually do even more than that. Even "mel" =~ /l+/z should fail, because the search stopped only because it hit end-of-string. The next bytes on my stream might well be "low", in which case I'd want that regex to match both "l"s.
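Since /z is only a proposal and not an existing Perl modifier, the "mel" case can only be approximated in current Perl. The sketch below uses a hypothetical helper, match_not_at_end, that treats any match ending exactly at the end of the buffer as undecided. It is weaker than the proposed /z, which would also have to notice end-of-string being consulted in the middle of a pattern (in a lookahead or a failed alternative, say).

```perl
use strict;
use warnings;

# Hypothetical helper, not a Perl feature: a rough stand-in for the
# proposed /z. A match that ends exactly at the end of the buffer is
# rejected, because more stream data could still extend it.
sub match_not_at_end {
    my ($buf, $re) = @_;
    return unless $buf =~ $re;
    return if $+[0] == length $buf;    # touched end-of-string: undecided
    return substr($buf, $-[0], $+[0] - $-[0]);
}

my $re = qr/l+/;

# "mel": the trailing "l" might be followed by more "l"s, so no match yet.
my $first = match_not_at_end("mel", $re);

# "mellow": the run of "l"s is followed by "o", so it is known to be complete.
my $second = match_not_at_end("mellow", $re);

print defined $first ? "matched '$first'\n" : "undecided, need more data\n";
print "matched '$second'\n";
```

The second call returns both "l"s, which is the result wanted once "low" has arrived on the stream.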

                - tye

Replies are listed 'Best First'.
Re^3: Applying regexes to streams: Perl enhancement idea
by Aristotle (Chancellor) on Jan 08, 2003 at 08:15 UTC
    How would I go about telling /z to wrap it up and accept the end of string as end of match? There are really two things you are asking of the engine: to continue where it left off last time, and to fail without forgetting where it's at when it hits the end of string. You need a way to be able to ask for the first without the latter. Otherwise, as a silly example (but let's pretend it isn't), /.+/z would always fail, even at the end of my input stream where I'd want it to successfully match at end of string.

    Makeshifts last the longest.

      Good point. My original example code didn't handle that case correctly in part because it started out as an example of using a regular expression to match record terminators and in part because I had not fully considered the effect of //z on greedy matches until I replied to theorbtwo's node.

      We already have a separate "continue where it left off last time" feature for regular expressions: //g in a scalar context and pos(). So my example is easy to fix by dropping /z once I've found end-of-stream. I'll update it shortly to reflect this.

      Note that my example fetches pos() in order to strip stuff from the front of the buffer, so each match is performed with pos()=0. If you were instead matching record terminators, for example, you would fetch pos() in order to restore it before the next match (since the sysread updates the contents of the buffer, which also resets its pos()).
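The loop described here might look roughly like the following sketch. It is not the example code referred to above: the helper name stream_matches, the chunk size, and the token pattern are all assumptions, and the proposed /z is emulated with an explicit end-of-buffer check that is simply dropped once end-of-stream is seen. It also assumes the pattern never matches the empty string.

```perl
use strict;
use warnings;

# Sketch under assumptions: stream_matches and its arguments are
# illustrative names. $re must never match the empty string.
sub stream_matches {
    my ($fh, $re, $chunk_size) = @_;
    my $buf = '';
    my $eof = 0;
    my @found;
    while (1) {
        pos($buf) = 0;    # every match starts at the front of the buffer
        if ($buf =~ /$re/g and ($eof or pos($buf) < length $buf)) {
            # Accept the match: either it ended before the end of the
            # buffer, or no more data can arrive to extend it.
            push @found, substr($buf, $-[0], $+[0] - $-[0]);
            substr($buf, 0, pos($buf), '');    # strip the consumed text
            next;
        }
        last if $eof;
        # No match, or an undecided one touching the end: append more data.
        my $got = read($fh, $buf, $chunk_size, length $buf);
        die "read failed: $!" unless defined $got;
        $eof = 1 if $got == 0;
    }
    return @found;
}

my $data = "hello world";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
my @tokens = stream_matches($fh, qr/\w+/, 4);
print join(',', @tokens), "\n";
```

With a 4-byte chunk size, "hell" is held back until the rest of "hello" arrives, and "world" is accepted only after the read reports end-of-stream.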

      Thanks,
                      - tye

        But how does the regex engine know that the search sans /z is supposed to be the finishing search of the previous series of /z searches, as opposed to an entirely new pattern to be applied? What I'm talking about is analogous to using /gc and then finally only /c to conclude the series of matches. If there were no /g, the presence or absence of /c alone would be ambiguous.
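For reference, this is how the existing /g and /c modifiers interact with pos(), which is the behavior the analogy leans on. It is a small illustration with a made-up input string, and says nothing about how a /z modifier would actually be implemented.

```perl
use strict;
use warnings;

my $text = "abc 123";

# Successful /gc match: pos() advances past "abc".
$text =~ /\G[a-z]+/gc;
my $after_word = pos $text;

# Failed /gc match (a space follows, not a digit): /c keeps pos() where it was.
$text =~ /\G\d+/gc;
my $after_fail = pos $text;

# Failed /g match without /c: pos() is reset to undef.
$text =~ /\G\d+/g;
my $after_reset = pos $text;

printf "after word: %d, after failed /gc: %d, after failed /g: %s\n",
    $after_word, $after_fail, defined $after_reset ? $after_reset : 'undef';
```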

        Makeshifts last the longest.
