PerlMonks  

Re: Re: Regexes on Streams - Revisited!

by tsee (Curate)
on Oct 14, 2003 at 17:11 UTC ( [id://299190]=note )


in reply to Re: Regexes on Streams - Revisited!
in thread Regexes on Streams - Revisited!

Yes, the bug I mentioned is related to that.
Anyhow, dying out of the regex caused my perl to dump core (win32 multithread 5.6.1 -- ActivePerl), so that's not an option. Incrementing a lexical variable $end_of_stream worked only in *some* cases. Printing something from the (?{}) construct worked okay, but the incrementing action-at-a-distance did not work every time. Thus the use of a package variable.

Back to that bug. The (?!) construct causes the current branch of the match to fail. That's nice but, as far as I can tell, entirely ineffective here because of the | at the end of the inserted end-of-string-tracking regex group: the engine simply backtracks into the empty alternative and carries on matching.
Instead, one'd want to modify the inserted regular expression to quit trying to match once the code construct is reached. (Ideally, it'd just fail which would make the whole experimental code construct unnecessary.)
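
To illustrate why the trailing | defeats the (?!), here is a minimal sketch. The group used is an assumed simplification of the inserted end-of-string tracker (the code construct is left out), not the exact regex from the thread, and "abc" merely stands in for the user's pattern.

```perl
use strict;
use warnings;

# Assumed simplification of the inserted group: \z detects end of input,
# (?!) fails that branch, and the trailing | offers an empty alternative.
my $with_alt    = qr/abc(?:\z(?!)|)/;
# The same group without the empty alternative.
my $without_alt = qr/abc(?:\z(?!))/;

my $input = "abc";

# The engine backtracks into the empty alternative, so the match succeeds
# even though the (?!) branch failed.
my $matched_with_alt    = $input =~ $with_alt    ? 1 : 0;
# With no alternative left to try, the overall match fails.
my $matched_without_alt = $input =~ $without_alt ? 1 : 0;

print "with |: $matched_with_alt, without |: $matched_without_alt\n";
```

Dropping the | makes the match fail at end of input, but it also fails the match for good rather than signalling "need more data", which is presumably why the code construct was there in the first place.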

Unfortunately, I'm currently not able to spend much time on finding such a regex. (Read: almost none, I have to continue studying now.)

Steffen

Re^3: Regexes on Streams - Revisited!
by Aristotle (Chancellor) on Oct 14, 2003 at 23:52 UTC
    Btw, it is probably cleaner to write
    (?:\z(?{ die })|)
    as
    (?(?=\z)(?{ die }))
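
A minimal sketch comparing the two spellings, assuming a perl on which die inside a regex code block is safe (it evidently was not on the 5.6.1 build mentioned earlier in the thread). The "abc" prefix and the try_match helper are hypothetical, added only for illustration.

```perl
use strict;
use warnings;

# Both forms die when the match position reaches the end of the input.
# "abc" stands in for whatever pattern is being matched on the stream.
my $alternation = qr/abc(?:\z(?{ die "end of input\n" })|)/;
my $conditional = qr/abc(?(?=\z)(?{ die "end of input\n" }))/;

# Hypothetical helper: returns 'matched', 'no match' or 'hit end'.
sub try_match {
    my ($re, $input) = @_;
    my $ok = eval { $input =~ $re ? 1 : 0 };
    return 'hit end' if !defined $ok;    # the code block died
    return $ok ? 'matched' : 'no match';
}

for my $re ($alternation, $conditional) {
    print try_match($re, "abcdef"), "\n";    # input continues after abc
    print try_match($re, "abc"), "\n";       # input ends right after abc
}
```

Wrapping the match in eval is what turns the die into an ordinary "ran out of input" signal for the caller.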

    It's too bad your Perl segfaults on die from inside a regex.. that doesn't happen for me (5.8.0 Linux nothreads).

    Your troubles with using a lexical are possibly due to these code blocks inside regexes being closures; were you aware of that?

    Unfortunately, there is currently no way to tell the regex engine to fail the entire match immediately, which is why die is necessary. It will work in Perl6, but then, so will matching on streams.. :)

    The only solution is to do what we did in the days of Pascal to cope with the lack of last and friends: nest conditionals. In terms of the pattern matching, that means an attempt to match

    .*?abc(def)?
    becomes something like (under /x, so that the whitespace is ignored)
    (?(?!\z) .*? (?(?!\z) abc (?(?!\z) (def)? | (?{ $PREMATURE_INPUT_END++ }) ) | (?{ $PREMATURE_INPUT_END++ }) ) | (?{ $PREMATURE_INPUT_END++ }) )
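
A runnable sketch of the nested-conditional pattern above, assuming a perl where code blocks in a literal qr// can see the package variable. The hit_end helper and the sample inputs are hypothetical.

```perl
use strict;
use warnings;

# Package variable, as in the thread, rather than a lexical.
our $PREMATURE_INPUT_END = 0;

# /x lets the pattern be laid out with whitespace.
my $re = qr/
    (?(?!\z) .*?
        (?(?!\z) abc
            (?(?!\z) (def)?
              | (?{ $PREMATURE_INPUT_END++ }) )
          | (?{ $PREMATURE_INPUT_END++ }) )
      | (?{ $PREMATURE_INPUT_END++ }) )
/x;

# Hypothetical helper: true if the match ran into the end of the input.
sub hit_end {
    my ($input) = @_;
    $PREMATURE_INPUT_END = 0;          # reset the flag before each attempt
    my $matched = $input =~ $re;
    return $PREMATURE_INPUT_END > 0 ? 1 : 0;
}

print hit_end("xxabcdefgh"), "\n";     # plenty of input after abc
print hit_end("xxabc"), "\n";          # input ends right after abc
```

Note that the end-of-input check only happens between the pieces of the pattern, so an input that stops in the middle of a piece (for example partway through def) is not flagged by this sketch.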

    Makeshifts last the longest.

      Just a quick note on using lexicals in those regex closures. The trouble was that the lexicals were seen only sometimes, and I could not find out *exactly* when. If those regex code constructs worked properly as closures, they would have seen the lexicals *all the time*, because the closures were declared and used *in the same scope* in which the lexical was declared. What happened was that, usually, the first two times the regular expression engine executed the closure, the lexical would be incremented; after that, the code would still be executed (print() worked), but the lexicals weren't touched. (I tried it with a tied lexical that warns when touched, too.)

      Steffen
