http://qs321.pair.com?node_id=1229800

morgon has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am trying to construct a regex that extracts the string between two quotes, with the complication that the string may contain backslash-escaped quotes.

To illustrate:

my $regex = ???? # this is what I am after
my ($m1) = '"hubba bubba"' =~ $regex;
print "ok\n" if $m1 eq 'hubba bubba'; # should print "ok"
my ($m2) = '"hubba \"bubba\""' =~ $regex;
print "ok\n" if $m2 eq 'hubba "bubba"'; # should also print "ok"

I hope that is understandable...

I tried to do this with negative lookbehinds, but my attempt failed with "Variable length lookbehind not implemented", so I am looking for some help here.

Many thanks!

Replies are listed 'Best First'.
Re: regex for strings with escaped quotes
by haukex (Archbishop) on Feb 12, 2019 at 15:13 UTC
    use Regexp::Common qw/delimited/;
    my $str = q{ x "foo \"bar\"" y };
    $str =~ /($RE{delimited}{-delim=>'"'})/;
    print $1, "\n";
    print $RE{delimited}{-delim=>'"'}, "\n";
    __END__
    "foo \"bar\""
    (?:(?|(?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\")))
      It somehow stops working when I use the regex directly and I cannot see why:
      my $str = q{ x "foo \"bar\"" y };
      $str =~ /(?:(?|(?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\")))/;
      print $1, "\n"; # prints nothing
        $str =~ /(?:(?|(?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\")))/;

        It's missing the capture group that I added: /($RE{delimited}{-delim=>'"'})/
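To make that point concrete, here is a minimal sketch that pastes the printed pattern back in and wraps it in one extra pair of capturing parentheses. The variable name $match is only for illustration, and the sketch assumes Perl 5.10 or later, since the pattern uses the (?|...) branch-reset construct.

```perl
use strict;
use warnings;

my $str = q{ x "foo \"bar\"" y };

# Same pattern as printed by Regexp::Common above, wrapped in one added
# pair of capturing parentheses so that there is a group to capture into.
my ($match) = $str =~ /((?:(?|(?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\"))))/;

print "$match\n" if defined $match;
```

Without the outer parentheses the pattern still matches, but every group in it is non-capturing, so $1 stays undefined.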

      thanks, but I forgot to mention one detail:

      I do not need this for a Perl program but for a Perl 5-compatible regex engine in Go.

      Is there a way to print the regex that is used?

        Is there a way to print the regex that is used?

        That's what the second line of output above is. Simplification of that regex is left as an exercise to the reader :-) (Update: Nevermind.)

        BTW, you can also use the -keep feature to get only the part between the quotes:

        use Regexp::Common qw/delimited/;
        q{ x "foo \"bar\"" y } =~ /$RE{delimited}{-delim=>'"'}{-keep}/;
        print $3, "\n"; # prints: foo \"bar\"
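For the simplification, one possible hand-written reduction of the module's pattern is sketched below. This is an assumption on my part, not something Regexp::Common emits. It needs no lookbehind and no branch reset, so it may port more easily to other engines, though that is untested here. Note that the captured text still contains the backslashes (just as the -keep output does), so the test from the original question only passes after a separate unescaping step. The name extract_quoted is hypothetical.

```perl
use strict;
use warnings;

# Hand-simplified pattern (an assumption, not the module's own output):
# an opening quote, then runs of characters that are neither quote nor
# backslash, interleaved with backslash-escaped characters, then a closing quote.
my $regex = qr/"([^"\\]*(?:\\.[^"\\]*)*)"/s;

# Hypothetical helper: returns the text between the quotes with the
# backslash escapes removed, or undef if there is no quoted string.
sub extract_quoted {
    my ($input) = @_;
    return undef unless $input =~ $regex;
    my $inner = $1;
    $inner =~ s/\\(.)/$1/gs;    # turn \" into " and \\ into \
    return $inner;
}

print "ok\n" if extract_quoted('"hubba bubba"') eq 'hubba bubba';
print "ok\n" if extract_quoted('"hubba \"bubba\""') eq 'hubba "bubba"';
```

The unescaping is kept outside the regex on purpose: a match can only return a substring of the input, so it cannot drop the backslashes by itself.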