PerlMonks  

Re: Regexes on Streams - Revisited!

by Aristotle (Chancellor)
on Oct 14, 2003 at 09:00 UTC


in reply to Regexes on Streams - Revisited!

Be aware that switching from die to incrementing a variable means the engine will keep trying (potentially expensive) alternatives after it first hits a premature end of string, whereas die would short-circuit immediately. If you want to avoid recompiling once the stream has run out, you could use something like die unless $end_of_stream, where the main loop sets the flag once the stream is exhausted.
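
A minimal sketch of that idea, assuming a perl on which die inside a regex code block is safe (it reportedly is not on every build). The names $end_of_stream and try_match, and the sample pattern, are made up for illustration; the flag is a package variable so the code block looks it up at run time.

```perl
use strict;
use warnings;

# Hypothetical flag: the main loop sets it once the stream is exhausted.
our $end_of_stream = 0;

# After "abc", check whether the engine stands at the end of the buffer.
# If so, and more input may still arrive, abort the whole match by dying.
my $re = qr/abc(?(?=\z)(?{ die "premature end of input" unless $end_of_stream }))(def)?/;

# Hypothetical helper: returns 'match', 'nomatch', or 'incomplete'.
sub try_match {
    my ($buffer) = @_;
    my $ok = eval { $buffer =~ $re ? 1 : 0 };
    return 'incomplete' unless defined $ok;   # the code block died
    return $ok ? 'match' : 'nomatch';
}
```

The caller would read more input and retry on 'incomplete', and set $end_of_stream before the final attempt so that the last match runs to completion without dying.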

Makeshifts last the longest.

Re: Re: Regexes on Streams - Revisited!
by tsee (Curate) on Oct 14, 2003 at 17:11 UTC
    Yes, the bug I mentioned is related to that.
    Anyhow, die-ing out of the regex caused my perl to dump core (win32 multithread 5.6.1 -- ActivePerl), so that's not an option. Incrementing a lexical variable $end_of_stream worked only in *some* cases. Printing something from the (?{}) construct worked okay, but the incrementing action-at-a-distance did not work every time. Thus the use of a package variable.

    Back to that bug. The (?!) construct causes the current branch of the match to fail. That's nice but, as far as I can tell, entirely ineffective here because of the | at the end of the inserted end-of-string-tracking regex group, which lets the engine fall back to the empty alternative and carry on.
    Instead, one'd want to modify the inserted regular expression to quit trying to match once the code construct is reached. (Ideally, it'd just fail which would make the whole experimental code construct unnecessary.)

    Unfortunately, I'm currently not able to spend much time on finding such a regex. (Read: almost none, I have to continue studying now.)

    Steffen
      Btw, it is probably cleaner to write
      (?:\z(?{ die })|)
      as
      (?(?=\z)(?{ die }))
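
A quick way to check that the two spellings behave alike, as a sketch only: the "abc" prefix and the dies_on helper are invented for the test, and it assumes a perl where die inside a regex is safe.

```perl
use strict;
use warnings;

# The same end-of-buffer check, spelled both ways.
my $alternation = qr/abc(?:\z(?{ die "end of buffer" })|)/;
my $conditional = qr/abc(?(?=\z)(?{ die "end of buffer" }))/;

# Hypothetical helper: true if matching $buffer against $re died.
sub dies_on {
    my ($re, $buffer) = @_;
    my $ok = eval { $buffer =~ $re; 1 };
    return $ok ? 0 : 1;
}
```

Both patterns should die on "abc", where the engine reaches the end of the buffer, and match quietly on "abcd", where it does not.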

      It's too bad your Perl segfaults on die from inside a regex.. that doesn't happen for me (5.8.0 Linux nothreads).

      Your troubles with using a lexical are possibly due to these code blocks inside regexes being closures; were you aware of that?
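
For what it's worth, here is a sketch of the package-variable route, which sidesteps the closure question because the variable is looked up by name at run time instead of being captured. $HITS and count_end_hits are made-up names, and the pattern is only a stand-in. The perl 5.18 release notes describe these code blocks as having been reimplemented as proper closures, so lexicals may well behave differently on older perls.

```perl
use strict;
use warnings;

# Package variable: not captured by the code block, just looked up when it runs.
our $HITS;

# Hypothetical helper: counts how often the code block fired at end of string.
sub count_end_hits {
    my ($text) = @_;
    local $HITS = 0;                         # fresh count for every call
    $text =~ /a+(?(?=\z)(?{ $HITS++ }))/;    # bump only when a+ runs to the end
    my $hits = $HITS;                        # copy before local is undone
    return $hits;
}
```

Calling the sub repeatedly should give the same count for the same input, which is the property the lexical version reportedly lacked.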

      Unfortunately, there is currently no way to tell the regex engine to fail the entire match immediately, which is why die is necessary. It will work in Perl6, but then, so will matching on streams.. :)

      The only solution is to do what we did in the days of Pascal to cope with the lack of last and friends: nest conditionals. In terms of the pattern matching, that means an attempt to match

      .*?abc(def)?
      becomes something like
      (?(?!\z) .*? (?(?!\z) abc (?(?!\z) (def)? | (?{ $PREMATURE_INPUT_END++ }) ) | (?{ $PREMATURE_INPUT_END++ }) ) | (?{ $PREMATURE_INPUT_END++ }) )
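
Spelled out as something runnable, this might look like the sketch below. It assumes the /x flag (the pattern above is written with whitespace) and a package variable for the counter; the scan helper is invented for the example.

```perl
use strict;
use warnings;

our $PREMATURE_INPUT_END;

# Hypothetical helper: returns (matched?, number of premature-end hits).
sub scan {
    my ($buffer) = @_;
    local $PREMATURE_INPUT_END = 0;
    my $matched = $buffer =~ m{
        (?(?!\z) .*?
            (?(?!\z) abc
                (?(?!\z) (def)?
                  | (?{ $PREMATURE_INPUT_END++ }) )
              | (?{ $PREMATURE_INPUT_END++ }) )
          | (?{ $PREMATURE_INPUT_END++ }) )
    }x;
    my $hits = $PREMATURE_INPUT_END;   # copy before local is undone
    return ($matched ? 1 : 0, $hits);
}
```

Note that the match itself still succeeds when the input runs out early, because the code block in the "no" branch matches; the caller has to look at the counter to tell a real match from a truncated one.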

      Makeshifts last the longest.

        Just a quick note on using lexicals in those regex closures. The trouble was that the lexicals were seen sometimes. I could not find out when they were seen *exactly*. If those regex code constructs worked alright as closures, they would have seen the lexicals *all the time* because the closures were declared and used *in the same scope* that the lexical was declared. What happened was, usually the first two times the regular expression engine executed the closure, the lexical would be incremented, and after that, the code would still be executed (print() worked), but the lexicals weren't touched. (I tried it with a tied lexical that warns when touched, too.)

        Steffen
