PerlMonks  

Re: what would you like to see in perl5.12?

by bart (Canon)
on Aug 20, 2007 at 11:45 UTC


in reply to what would you like to see in perl5.12?

First of all, the things that are promised for 5.10. :) These include:
  • defined-or: // and //=
  • speed improvements in regexes as promised (and implemented) by demerphq
  • recursive regexes! (ditto)
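For anyone who hasn't tried these yet, here is a minimal sketch of two of them as they work in perl 5.10. The variable names (%opt, $balanced, $group) are made up for illustration only.

```perl
use strict;
use warnings;
use 5.010;    # defined-or and recursive patterns need perl 5.10 or later

# Defined-or falls back only when the left side is undef,
# so a legitimate 0 or '' is kept (|| would replace it).
my %opt     = (verbose => 0);
my $verbose = $opt{verbose} // 1;    # stays 0
$opt{depth} //= 3;                   # assigned, because it was undef

# Recursive regex: (?1) re-enters capture group 1,
# which lets the pattern match nested, balanced parentheses.
my $balanced = qr/(\((?:[^()]++|(?1))*\))/;
my ($group)  = 'f(a(b)c) tail' =~ $balanced;    # $group is '(a(b)c)'
```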

Aside from that, I'd like to see support for matching regexes across buffer boundaries when the input is only partially loaded. That would make it easier to process files in blocks of a few KB each, instead of having to load the entire file into a string.

As an example: say you're looking for the word "SELECT" and the buffer contains:

my $sth = $dbh->prepare('SEL
It's possible that it would have matched "SELECT" if the buffer hadn't been cut off.

I'd like regexes to be able to catch that. Automatically.

I don't really care how it's done, but I personally favor a system that takes some action (die, set a variable, call a callback sub) when the lookahead "touches" the back end of the buffer. (I call that the "electric fence" approach: touch it and you're dead.)
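To make the problem concrete, here is a rough sketch of how you have to emulate the fence by hand today, and only for the easy case of a literal word. It is not the engine feature being asked for. Both partial_tail and count_in_stream are hypothetical helper names, and the approach (carry the dangling prefix over to the next block) is just one possible workaround.

```perl
use strict;
use warnings;

# Hypothetical helper: length of the longest proper prefix of $word
# that sits at the very end of $buf (0 if there is none).
# A non-zero result means the match "touched the fence".
sub partial_tail {
    my ($buf, $word) = @_;
    for my $len (reverse 1 .. length($word) - 1) {
        next if $len > length $buf;
        return $len if substr($buf, -$len) eq substr($word, 0, $len);
    }
    return 0;
}

# Count non-overlapping occurrences of a literal $word in a stream
# that is read in blocks of $block_size bytes.
sub count_in_stream {
    my ($fh, $word, $block_size) = @_;
    my ($count, $carry) = (0, '');
    while (read($fh, my $chunk, $block_size)) {
        my $buf      = $carry . $chunk;
        my $last_end = 0;
        while ($buf =~ /\Q$word\E/g) {
            $count++;
            $last_end = pos $buf;
        }
        # Only text after the last complete match can start a new one.
        my $rest = substr($buf, $last_end);
        my $keep = partial_tail($rest, $word);
        $carry = $keep ? substr($rest, -$keep) : '';
    }
    return $count;
}
```

With the buffer from the example above, partial_tail reports that three characters ("SEL") are dangling at the end. For a real regex instead of a literal word there is no such simple test, which is exactly why engine support would be welcome.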

Replies are listed 'Best First'.
Re^2: what would you like to see in perl5.12?
by sgt (Deacon) on Aug 22, 2007 at 20:56 UTC

    Yes. I do agree completely. This opens the realm of stream regexps and would greatly facilitate the construction of regexp-based tokenizers (scalar m//gc) that need to process their input in chunks. Currently you have to resort to contorted hacks to do stream tokenizing, which is a pity, as it limits the implementation of generic parser generators in pure Perl.

    What is needed is a way to keep the state of the regexp engine at the end of the buffer (the end-of-buffer-match case), so that when you add another chunk, the engine does not start again from the beginning. Considering all the goodies added by demerphq, maybe there is hope ;) of seeing something soon.
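As an illustration of the kind of hack meant here, a minimal sketch of a chunked tokenizer built on \G and m//gc. The name tokenize_stream and the token classes (numbers, identifiers, single punctuation characters) are assumptions chosen for the example. Note that a token touching the end of the buffer is thrown away and rescanned after the next read, which is the restart the engine state would make unnecessary.

```perl
use strict;
use warnings;

# Hypothetical chunked tokenizer: returns numbers, identifiers and
# single punctuation characters from a stream read in blocks.
sub tokenize_stream {
    my ($fh, $block_size) = @_;
    my (@tokens, $eof);
    my $buf = '';
    until ($eof) {
        my $n = read($fh, my $chunk, $block_size);
        if ($n) { $buf .= $chunk } else { $eof = 1 }
        pos($buf) = 0;
        my $safe = 0;    # everything before this offset is fully tokenized
        while (1) {
            $buf =~ /\G\s+/gc;    # skip whitespace, keep pos on failure
            $safe = pos $buf;
            last unless $buf =~ /\G(\d+|[A-Za-z_]\w*|\S)/gc;
            # A token that reaches the end of the buffer might continue
            # in the next chunk, so defer it until more input arrives.
            last if !$eof && pos($buf) == length $buf;
            push @tokens, $1;
        }
        substr($buf, 0, $safe, '');    # drop what has been consumed
    }
    return @tokens;
}
```

It can be tried on an in-memory handle, for example open my $fh, '<', \$text, with a deliberately small block size to force tokens to straddle chunk boundaries.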

    Also I'd like to be able to switch to a smaller but faster regexp implementation just for a block. Or maybe be able to turn off parts of the main engine -- locally -- that I know I am not going to use in a given block (supposing that doing so gives extra speed of course).

    cheers --stephan
