PerlMonks  

Re^2: Defining Characters in Word Boundary?

by iaw4 (Monk)
on Jan 20, 2011 at 14:00 UTC ( [id://883317]=note )


in reply to Re: Defining Characters in Word Boundary?
in thread Defining Characters in Word Boundary?

Thanks. This is what I needed to learn. I did not know the extended regex sequences in the Camel book (i.e., the (?...) sequences; chapter 5, table 5.6). Is there a meaningful difference between (?![a-z]) and (?=[^a-z])? Is the former recommended? /iaw
Re^3: Defining Characters in Word Boundary?
by ikegami (Patriarch) on Jan 20, 2011 at 16:32 UTC
    Compare

        'ab' =~ /a(?!a)/
        'a'  =~ /a(?!a)/

    and

        'ab' =~ /a(?=[^a])/
        'a'  =~ /a(?=[^a])/
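
    The four matches above can be run side by side. This is only a minimal sketch, not part of the original reply; the @cases and @results names are made up for illustration. It shows that the two patterns disagree only when 'a' is the last character of the string.

```perl
use strict;
use warnings;

# Each case pairs a subject string with one of the two pattern sources
# from the comparison above. (@cases and @results are illustrative names.)
my @cases = (
    [ 'ab', 'a(?!a)'    ],   # negative lookahead, 'b' follows
    [ 'a',  'a(?!a)'    ],   # negative lookahead, nothing follows
    [ 'ab', 'a(?=[^a])' ],   # positive lookahead, 'b' follows
    [ 'a',  'a(?=[^a])' ],   # positive lookahead, nothing follows
);

my @results;
for my $case (@cases) {
    my ($str, $pat) = @$case;
    my $matched = $str =~ /$pat/ ? 1 : 0;
    push @results, $matched;
    printf "%-5s =~ /%s/ : %s\n", "'$str'", $pat,
        $matched ? 'matches' : 'does not match';
}
```

    Only the last case fails: (?=[^a]) needs an actual character to look at, while (?!a) is satisfied when there is nothing left in the string.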
Re^3: Defining Characters in Word Boundary?
by Jim (Curate) on Jan 20, 2011 at 17:17 UTC
    is there a meaningful difference between (?![a-z]) and (?=[^a-z])? is the former recommended?

    Yes, they're different regular expression patterns that match different things. (?![a-z]) asserts "not followed by any of the characters from 'a' through 'z', which includes not being followed by any character." (?=[^a-z]) asserts "followed by a single character that is not any of the characters from 'a' through 'z'." The former is a negative assertion; the latter is a positive assertion.

    In your case, (?![a-z]) is what you would want to use.

    [PerlMonks posting tip: Enclose Perl code in <code></code> tags, even code within paragraphs.]


      In your case, (?![a-z]) is what you would want to use.

      One behavioral difference between these two regexes, and probably the reason iaw4 would want (?![a-z]), is that it can match at the end of a string, whereas (?=[^a-z]) cannot. In that respect it emulates the behavior of the \b assertion. (Note: \b can also match at the start of a string.)

      >perl -wMstrict -le
      "my $str = 'abcd';
       for my $rx (qr{(?=[^a-z])}, qr{(?![a-z])}, qr{\b}) {
           my @offsets;
           push @offsets, $-[1] while $str =~ m{ ($rx) }xmsg;
           if (@offsets) { print qq{$rx matches '$str' at offset(s) @offsets}; }
           else          { print qq{$rx does not match '$str'}; }
       }
      "
      (?-xism:(?=[^a-z])) does not match 'abcd'
      (?-xism:(?![a-z])) matches 'abcd' at offset(s) 4
      (?-xism:\b) matches 'abcd' at offset(s) 0 4

        (?<![a-zA-Z]) emulates \b at the beginning of an alphabetic string and (?![a-zA-Z]) emulates \b at the end of an alphabetic string.

        But as I explained before, it doesn't seem from iaw4's problem description that asserting a boundary match is even needed.
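
        The lookaround emulation of \b described above can be checked with a short script. This is a sketch under assumptions: the literal word 'abcd', the sample strings, and the variable names are all invented for illustration, not taken from iaw4's problem. It also shows where the emulation and \b part ways: \b treats digits and underscore as word characters, whereas the [a-zA-Z] lookarounds do not.

```perl
use strict;
use warnings;

# Two ways to require that 'abcd' stands alone (illustrative patterns).
my $with_b    = qr/\babcd\b/;
my $with_look = qr/(?<![a-zA-Z])abcd(?![a-zA-Z])/;

# For each sample string, record a two-digit flag: first digit is the
# \b result, second digit is the lookaround result (1 = match, 0 = no match).
my %got;
for my $str ('abcd', 'xabcd', 'abcdx', 'say abcd.', 'abcd1') {
    my $via_b    = $str =~ $with_b    ? 1 : 0;
    my $via_look = $str =~ $with_look ? 1 : 0;
    $got{$str} = "$via_b$via_look";
    print "'$str': \\b=$via_b lookaround=$via_look\n";
}
```

        The two patterns agree on the purely alphabetic cases and differ on 'abcd1', where only the lookaround version matches. Which behavior is wanted depends on which characters should count as part of a word.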

