PerlMonks  

Match whitespace or start-of-string with lookbehind

by BigLug (Chaplain)
on Aug 24, 2004 at 02:39 UTC (id://385297)

BigLug has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to curl my quotes. To do that, I need to match a double quote that is preceded by whitespace and followed by a word character. But what if the quote is the first thing in the string? In that case I need to match start-of-string or whitespace. Here's code that works in my head:
$text =~ s/(?<=(^|\s))"(?=\w)/“/gs;
However, because ^ is zero-width and \s is one character wide, I get an error saying that variable-length lookbehind is not implemented.

How do I do this? Or will I need to use two regexes for this?


Cheers!
Rick
If this is a root node: Before responding, please ensure your clue bit is set.
If this is a reply: This is a discussion group, not a helpdesk ... If the discussion happens to answer a question you've asked, that's incidental.

Re: Match whitespace or start-of-string with lookbehind
by Enlil (Parson) on Aug 24, 2004 at 02:54 UTC
    I am not quite sure what you are asking, but here is something that I think is functionally equivalent to what you are trying to do:
    $text =~ s/(^|\s)"(?=\w)/$1“/gs;
    In short, I turn the variable-length lookbehind assertion into an ordinary capturing group and put the captured text back in the replacement.
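A minimal runnable sketch of the capture-and-restore approach, assuming a plain (non-/m) byte-free string. The sample text and the $lq variable (holding U+201C, the left curly double quote) are hypothetical illustrations, not from the thread. When the quote opens the text, the ^ alternative matches and $1 is the empty string, so nothing extra is reinserted.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical: left curly double quote (U+201C) used as the replacement.
my $lq = "\x{201C}";

# Hypothetical sample text.
my $text = q{"Hello," she said. "Goodbye."};

# Capture either start-of-string (empty) or one whitespace character,
# then put the captured text back in front of the curly quote.
(my $curled = $text) =~ s/(^|\s)"(?=\w)/$1$lq/g;

print "$curled\n";
```

The closing quote after "Hello," is left alone because it is preceded by a comma and followed by a space.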

    HTH

    -enlil

      Yeah, that works. I just hate capturing something only to put it back in the same place!
Re: Match whitespace or start-of-string with lookbehind
by etcshadow (Priest) on Aug 24, 2004 at 03:50 UTC
    It's not precisely what you're asking for, but would a word boundary (\b) be sufficient?

    If not, you can just re-nest the look-behind and the alternation into:

    $text =~ s/(^|(?<=\s))"(?=\w)/“/gs;
    That works. The key is that the ^ doesn't need to be inside a lookbehind, since it's zero-width anyway.
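A small sketch exercising the nested alternation on a few cases: quote at the start, after a space, after a newline, and glued to a preceding letter. The samples and $lq (U+201C) are hypothetical. Without /m, ^ matches only at the very start, so the quote after the newline is caught by the (?<=\s) branch instead.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

my $lq = "\x{201C}";   # assumption: U+201C as the opening quote

# Hypothetical samples; the last one should NOT be curled.
my @samples = (
    q{"start of string},
    q{after "a space},
    qq{line one\n"line two},
    q{glued x"y stays},
);

my @curled;
for my $s (@samples) {
    # ^ stays outside the lookbehind; only \s needs (?<=...).
    (my $t = $s) =~ s/(^|(?<=\s))"(?=\w)/$lq/g;
    push @curled, $t;
    print "$t\n";
}
```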
Re: Match whitespace or start-of-string with lookbehind
by ysth (Canon) on Aug 24, 2004 at 05:07 UTC
    In general you need one alternative lookbehind for each possible length: (?:(?<=^)|(?<=\s)). In this case you can negate it to make the lookbehind always one character: (?<!\S).
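A sketch comparing the two forms described above on a handful of hypothetical samples, with $lq (U+201C) again an assumed replacement. The negated lookbehind (?<!\S) succeeds at the start of the string because there is no preceding character that could be a non-space, so it behaves like the per-length alternation while staying a single one-character lookbehind.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

my $lq = "\x{201C}";   # assumption: U+201C as the opening quote

# One fixed-width lookbehind per alternative (each is valid on its own).
my $per_length = qr/(?:(?<=^)|(?<=\s))"(?=\w)/;

# Negated form: "not preceded by a non-space" holds after whitespace
# and also at the very start, where nothing precedes the quote.
my $negated = qr/(?<!\S)"(?=\w)/;

# Hypothetical samples.
my @samples = (q{"a}, q{b "c}, qq{d\t"e}, q{f"g}, q{"h "i j"k});

my (@per, @neg);
for my $s (@samples) {
    (my $x = $s) =~ s/$per_length/$lq/g;
    (my $y = $s) =~ s/$negated/$lq/g;
    push @per, $x;
    push @neg, $y;
    print( ($x eq $y ? 'same' : 'DIFFERENT') . ": $y\n" );
}
```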
Re: Match whitespace or start-of-string with lookbehind
by Your Mother (Archbishop) on Aug 24, 2004 at 06:28 UTC

    I think this does what you're after:

    $text =~ s/(?: (?<=\s) | (?<=\A) ) " (?=\w) /“/gx;
    (Update: I think etcshadow's answer is better in this case, though an alternation of non-capturing lookbehinds is sometimes useful.) (Update 2: Yikes, I don't know how I missed ysth's post; sorry for the duplication.)

Re: Match whitespace or start-of-string with lookbehind
by diotalevi (Canon) on Aug 24, 2004 at 13:35 UTC

    I like ysth's response best, but here's another approach: make each alternative the same width.

    (?<= (?s: ^. ) | \s )

Node Type: perlquestion [id://385297]
Approved by SciDude