
Re: Re: New regex trick...

by japhy (Canon)
on Jul 22, 2002 at 19:56 UTC ( #184210=note )

in reply to Re: New regex trick...
in thread New regex trick...

You're thinking too hard. I cheated. My patch just fools the regex engine into thinking it hasn't actually started matching yet. Here's a drawn-out example.
    $str = "abc.def.ghi.jkl";
    $str =~ s{
      .*     # match as much as you can
      \K     # and then pretend HERE is where we start
      \. .*  # then a . and anything else
    }{}x;    # replace with nothing

    __END__
    abc.def.ghi.jkl         $&
    AAAAAAAAAAA        .*   "abc.def.ghi"
                       \K   ""
               B       \.   "."
               BCCC    .*   ".jkl"
Does that help you see what I do? My patch consists of a couple lines of support, but this is the beef:
    case KEEP:
      PL_regstartp[0] = locinput - PL_bostr;
      break;
That's what happens when the regex engine encounters the \K. The rest of the patch is just creating the "KEEP" node, and telling toke.c that "\K" is a valid escape sequence.
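
For anyone who wants to run the drawn-out example rather than read the diagram, here is a minimal sketch. It assumes a perl in which \K is available (it is part of core Perl from 5.10 on); the variable names $kept, $matched and $copy are only for illustration and are not part of the patch.

```perl
use strict;
use warnings;

my $str = "abc.def.ghi.jkl";

# Same pattern as the drawn-out example, but as a plain match first,
# so we can inspect where the engine says the match started.
my ($kept, $matched);
if ($str =~ / .* \K \. .* /x) {
    $kept    = substr($str, 0, $-[0]);  # text before the reported match start
    $matched = $&;                      # what is left in $& after \K
}

# Now the substitution itself: only the part after \K is replaced.
(my $copy = $str) =~ s{ .* \K \. .* }{}x;

print "kept=[$kept] matched=[$matched] result=[$copy]\n";
```

The first .* still has to match "abc.def.ghi" for the pattern to succeed, but the substitution only touches ".jkl".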

Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Replies are listed 'Best First'.
Re: Re: Re: New regex trick...
by erikharrison (Deacon) on Jul 22, 2002 at 20:34 UTC

    Okay, I know it's because I'm dumb, but I still don't get it. Please don't yell at me, but why does the \K anchor keep .* from matching .jkl? And if it backtracks like normal, then where does the speed come from? I think that may be the essence of my confusion - why is this faster?

    If I don't get it this time, I'll give up and just trust it ;-).


    Light a man a fire, he's warm for a day. Catch a man on fire, and he's warm for the rest of his life. - Terry Pratchett

      Oh, \K doesn't stop .* from matching the entire string. Perl is smart enough to back off to the last "." when the \. node comes up.

      What \K is doing is faking WHERE in the string (and the pattern) the regex started to match. Compare:

          $str = "Match 9 the 1 last 6 digit 2 blah";
          $str =~ /.*\d/;
          print "[$`] [$&] [$']\n";
          $str =~ /.*\K\d/;
          print "[$`] [$&] [$']\n";

          __END__
          [] [Match 9 the 1 last 6 digit 2] [ blah]
          [Match 9 the 1 last 6 digit ] [2] [ blah]
      See, \K tells $& that THIS is where it begins. This is useful in substitutions:
          # you go from this:
          s/(saveme)deleteme/$1/;
          # to this:
          s/saveme\Kdeleteme//;
      And you save time on replacing "saveme" with itself.
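
A small sketch of the two substitutions side by side, to show they leave the same string behind. The sample input is made up for illustration, and this assumes a perl that understands \K (core from 5.10 on). It only checks that the results agree; it does not measure the time saved.

```perl
use strict;
use warnings;

# Hypothetical sample input; "saveme" and "deleteme" are the placeholders
# used in the post above.
my $input = "xx savemedeleteme yy";

# Capture-and-put-back version: "saveme" is matched, removed, and re-inserted.
(my $with_capture = $input) =~ s/(saveme)deleteme/$1/;

# \K version: "saveme" must still match, but it is not part of what gets replaced.
(my $with_keep = $input) =~ s/saveme\Kdeleteme//;

print "$with_capture\n$with_keep\n";
```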

      Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

        Ah, excellent explanation. That clears things up nicely. ++ to erikharrison for asking the dumb questions that I needed answered.
      Erik, the bit about .* not matching .jkl is just plain old regex engine rules. That is to say, the match wouldn't succeed unless the last literal period (\.) is followed by 0 or more things (.*), eh? I'm assuming you're talking about the first .*, not the second one.
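
The backing-off is easy to see with plain captures and no \K at all. This is a minimal sketch using the string from the original example; the names $before and $after are just for illustration.

```perl
use strict;
use warnings;

my $str = "abc.def.ghi.jkl";

# The first .* grabs the whole string, then gives characters back until
# \. can match; the last "." is the first place where that works.
my ($before, $after) = $str =~ /(.*)\.(.*)/;

print "before=[$before] after=[$after]\n";
```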

      When there is no wind, row.
