http://qs321.pair.com?node_id=184210


in reply to Re: New regex trick...
in thread New regex trick...

You're thinking too hard. I cheated. My patch just fools the regex engine into thinking it hasn't actually started matching yet. Here's a drawn-out example.
    $str = "abc.def.ghi.jkl";
    $str =~ s{
      .*      # match as much as you can
      \K      # and then pretend HERE is where we start
      \. .*   # then a . and anything else
    }{}x;     # replace with nothing

    __END__
    abc.def.ghi.jkl          $&
    AAAAAAAAAAA        .*    "abc.def.ghi"
                       \K    ""
               B       \.    "."
               BCCC    .*    ".jkl"
Does that help you see what I do? My patch consists of a couple lines of support, but this is the beef:
    case KEEP:
        PL_regstartp[0] = locinput - PL_bostr;
        break;
That's what happens when the regex engine encounters the \K. The rest of the patch is just creating the "KEEP" node, and telling toke.c that "\K" is a valid escape sequence.
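For anyone who wants to watch that assignment from the Perl side, here is a minimal sketch. It assumes perl 5.10 or later, where \K is a built-in escape, and it assumes that $-[0] reports the same match-start offset that the KEEP case overwrites. The helper name match_span is made up for illustration and is not part of the patch.

```perl
use strict;
use warnings;

# Hypothetical helper: return the start offset, end offset, and matched text
# for $re against $str, or an empty list if the match fails.
sub match_span {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    return ($-[0], $+[0], substr($str, $-[0], $+[0] - $-[0]));
}

my $str = "abc.def.ghi.jkl";

# Without \K the match is reported as starting at offset 0.
my @plain = match_span($str, qr/.*\..*/);

# With \K the reported start moves to the point where \K was reached,
# which is offset 11 (the last dot) for this string.
my @kept = match_span($str, qr/.*\K\..*/);

print "plain: @plain\n";   # plain: 0 15 abc.def.ghi.jkl
print "kept:  @kept\n";    # kept:  11 15 .jkl
```

Only the start offset changes; the end of the match is the same in both cases.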

_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: Re: Re: New regex trick...
by erikharrison (Deacon) on Jul 22, 2002 at 20:34 UTC

    Okay, I know it's because I'm dumb, but I still don't get it. Please don't yell at me, but why does the \K anchor keep .* from matching .jkl? And if it backtracks like normal, then where does the speed come from? I think that may be the essence of my confusion - why is this faster?

    If I don't get it this time, I'll give up and just trust it ;-).

    Cheers,
    Erik

    Light a man a fire, he's warm for a day. Catch a man on fire, and he's warm for the rest of his life. - Terry Pratchett

      Oh, \K doesn't stop .* from matching the entire string. Perl is smart enough to back off to the last "." when the \. node comes up.
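The backing-off part has nothing to do with \K, so it can be checked with plain capture groups. This is only a sketch: split_at_last_dot is a hypothetical helper, and the comment describes the backtracking conceptually rather than what the optimizer literally does.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string at its LAST literal dot.
sub split_at_last_dot {
    my ($str) = @_;
    # Conceptually, (.*) first grabs the whole string, then gives back one
    # character at a time until \. can match. The first place that works
    # is the last dot in the string.
    return $str =~ /\A(.*)\.(.*)\z/ ? ($1, $2) : ();
}

my ($head, $tail) = split_at_last_dot("abc.def.ghi.jkl");
print "head=[$head] tail=[$tail]\n";   # head=[abc.def.ghi] tail=[jkl]
```

So the first .* ends up holding "abc.def.ghi" whether or not a \K follows it.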

      What \K is doing is faking WHERE in the string (and the pattern) the regex started to match. Compare:

          $str = "Match 9 the 1 last 6 digit 2 blah";

          $str =~ /.*\d/;
          print "[$`] [$&] [$']\n";

          $str =~ /.*\K\d/;
          print "[$`] [$&] [$']\n";

          __END__
          [] [Match 9 the 1 last 6 digit 2] [ blah]
          [Match 9 the 1 last 6 digit ] [2] [ blah]
      See, \K tells $& that THIS is where it begins. This is useful in substitutions:
          # you go from this:
          s/(saveme)deleteme/$1/;

          # to this:
          s/saveme\Kdeleteme//;
      And you save time on replacing "saveme" with itself.
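A quick way to convince yourself that the two forms produce the same result is sketched below, assuming a perl with \K available (5.10 or later). The sub names are invented for the example, and it makes no timing claim; how much time is actually saved would need a benchmark on your own data.

```perl
use strict;
use warnings;

# Capture-and-reinsert form: "saveme" is matched, captured, and written back.
sub strip_with_capture {
    my ($s) = @_;
    $s =~ s/(saveme)deleteme/$1/;
    return $s;
}

# \K form: "saveme" still has to match, but it is left out of $&,
# so only "deleteme" gets replaced (with nothing).
sub strip_with_keep {
    my ($s) = @_;
    $s =~ s/saveme\Kdeleteme//;
    return $s;
}

for my $in ("xx savemedeleteme yy", "no match here") {
    printf "%-22s capture=[%s] keep=[%s]\n",
        $in, strip_with_capture($in), strip_with_keep($in);
}
```

Both subs turn "xx savemedeleteme yy" into "xx saveme yy" and leave a non-matching string alone.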

      _____________________________________________________
      Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

        Ah, excellent explanation. That clears things up nicely. ++ to erikharrison for asking the dumb questions that I needed answered.
      Erik, the bit about .* not matching .jkl is just plain old regex engine rules. That is to say, the match wouldn't succeed unless the last literal period (\.) is followed by 0 or more things (.*), eh? I'm assuming you're talking about the first .*, not the second one.
      Paul

      When there is no wind, row.