On zero-width negative lookahead assertions

bronto has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: On zero-width negative lookahead assertions by ccn (Vicar) on Sep 10, 2004 at 14:13 UTC
There are two errors in the code:
1. `@` and `.` must be backslashed.
2. Your `\s*` allows the regexp to match when `\s*` matches the empty string.

You are asking for a match, and Perl finds one for you by looking through all possible combinations.
Re^2: On zero-width negative lookahead assertions by bronto (Priest) on Sep 10, 2004 at 14:21 UTC
> @ and . must be backslashed

Backslashed: still matches too much.

> your \s* allows the regexp to match when \s* matches empty string

I know; I expressly want to match 0 or more spaces before the end of the line.

Ciao! `--bronto`
In theory, there is no difference between theory and practice. In practice, there is.
Re^3: On zero-width negative lookahead assertions by ccn (Vicar) on Sep 10, 2004 at 14:29 UTC
Note the difference:
`perl -ne '/^root:(?!\s*admin\@somewhere\.here)/ and print' alliases`
Re: On zero-width negative lookahead assertions by ikegami (Patriarch) on Sep 10, 2004 at 14:24 UTC
First, don't forget to escape `@` and `.`.

`>perl -lne "/^root:\s*(?!admin\@somewhere\.here)(.*)/ and print $1" aliases.txt
someone@somewhere.else
 admin@somewhere.here`

Note the leading space on the second line of output. When the regexp engine failed using all the spaces, it backtracked to `\s*` matching all but one space. One way to fix it is to anchor it as follows:

`>perl -ne "/^root:\s*(?!admin\@somewhere\.here)\S/ and print;" aliases.txt
root: someone@somewhere.else`
Re^2: On zero-width negative lookahead assertions by bronto (Priest) on Sep 10, 2004 at 14:37 UTC
That works, and I thank you for explaining why. Unfortunately, I can't understand why, in the first case, the parentheses match the leading space, and why putting the `\S` makes it match, even if there is no non-space character at the end... I would be glad if you (or anyone else) could explain that further. I think I'll then discover what I didn't understand about zwnla assertions.

Thanks a lot!
Ciao! `--bronto`
How backtracking works in regular expressions by ikegami (Patriarch) on Sep 10, 2004 at 15:33 UTC
Note: Perl regexp matching is not necessarily implemented as described below. I'm totally ignorant as to how it is actually implemented. One could say this document describes the specs rather than the implementation.

It has nothing to do with lookaheads, really. For example, let's look at `/^ab*bc/`.

The regexp can be read as:
1. Start at the beginning of the string.
2. Match an 'a'.
3. Match as many 'b's as possible, but not matching any is ok.
4. Match a 'b'.
5. Match a 'c'.

`Match against 'abbbbbbc'
               01234567
1) ok! pos = 0. (zw)
2) ok! Found an 'a' at pos 0. pos = 1.
3) ok! Found 6 'b's at pos 1 through 6. pos = 7.
4) fail! Did not find a 'b' at pos 7. Backtrack!
3) ok! Found 5 'b's at pos 1 through 5. pos = 6.
4) ok! Found a 'b' at pos 6. pos = 7.
5) ok! Found a 'c' at pos 7. pos = 8.
Match!`

Something similar is occurring with your `/^root:\s*(?!email)/`.

The regexp can be read as:
1. Start at the beginning of the string.
2. Match 'root:'.
3. Match as many '\s's as possible, but not matching any is ok.
4. Match something other than 'email'.

`Match against 'root: email'
               01234567890
1) ok! pos = 0. (zw)
2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
3) ok! Found 1 '\s' at pos 5. pos = 6.
4) fail! Found 'email' at pos 6 through 10. Backtrack!
3) ok! Found 0 '\s' at pos 5. pos = 5. (zw)
4) ok! Found something other than 'email' at pos 5. pos = 5. (zwla) (found ' email')
Match!`

Now let's look at my solution, `/^root:\s*(?!email)\S/`.

The regexp can be read as:
1. Start at the beginning of the string.
2. Match 'root:'.
3. Match as many '\s's as possible, but not matching any is ok.
4. Match something other than 'email'.
5. Match a '\S'.

`Match against 'root: email'
               01234567890
1) ok! pos = 0. (zw)
2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
3) ok! Found 1 '\s' at pos 5. pos = 6.
4) fail! Found 'email' at pos 6 through 10. Backtrack!
3) ok! Found 0 '\s' at pos 5. pos = 5. (zw)
4) ok! Found something other than 'email' at pos 5. pos = 5. (zwla) (found ' email')
5) fail! Did not find a '\S' at pos 5. Backtrack!
Nothing more to try.
No match!`

`Match against 'root: hisemail'
               01234567890123
1) ok! pos = 0. (zw)
2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
3) ok! Found 1 '\s' at pos 5. pos = 6.
4) ok! Found something other than 'email' at pos 6. pos = 6. (zwla) (found 'hisemail')
5) ok! Found a '\S' at pos 6. pos = 7.
Match!`

Backtracking means (this might not be an exhaustive list):
- In the case of the first rule, look for a match further on.
- In the case of a `*` rule or `?` rule, try matching less.
- In the case of a `*?` rule or `??` rule, try matching more.
- In the case of a `|` or `[]` rule, try matching the next choice.
- Otherwise, there is no match, so backtrack the last matching rule.
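The traces above can be checked with a short script. This is only a sketch: the sub names `greedy_run`, `spaces_consumed` and `matches_with_S` are made up for illustration, and the capture groups are added purely so that what the greedy quantifier ended up keeping becomes visible.

```perl
use strict;
use warnings;

# How many 'b's does the greedy b* keep in /^ab*bc/ after backtracking?
sub greedy_run {
    my ($s) = @_;
    return $s =~ /^a(b*)bc/ ? length($1) : undef;
}

# How many spaces does \s* keep in front of the negative lookahead?
sub spaces_consumed {
    my ($line) = @_;
    return $line =~ /^root:(\s*)(?!email)/ ? length($1) : undef;
}

# The same pattern with the trailing \S added.
sub matches_with_S {
    my ($line) = @_;
    return $line =~ /^root:\s*(?!email)\S/ ? 1 : 0;
}

print "b* kept ", greedy_run('abbbbbbc'), " of 6 'b's\n";

for my $line ('root: email', 'root: hisemail') {
    my $n = spaces_consumed($line);
    printf "%-16s \\s* kept %d space(s); with \\S: %s\n",
        "'$line'", $n, matches_with_S($line) ? 'match' : 'no match';
}
```

For 'root: email' the `\s*` ends up keeping zero spaces, which is the backtracking step in the trace; once `\S` is appended there is nothing left to try and the match fails.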
Re^3: On zero-width negative lookahead assertions by Eimi Metamorphoumai (Deacon) on Sep 10, 2004 at 14:46 UTC
The regexp engine will match if it can find any way to. So what you're asking for is "root:, followed by some number (possibly zero) of whitespace characters, followed by something that is not 'admin@somewhere.here'". So it matches with root:, followed by zero spaces, followed by ' admin@somewhere.here' (with a leading space). Since the string ' admin@somewhere.here' isn't 'admin@somewhere.here' (without the space), the lookahead works. That's why you need the `\s*` inside the lookahead, making it "try to find spaces followed by admin@somewhere.here, and if you can, fail" instead of "look for spaces, but make sure it's not followed by admin@somewhere.here". Subtle, but important.
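To see the two readings side by side, here is a minimal sketch that filters a few lines with each pattern. The sample lines in `@lines` are invented stand-ins for an aliases file, not data from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for an aliases file.
my @lines = (
    'root: someone@somewhere.else',
    'root: admin@somewhere.here',
    'root:admin@somewhere.here',
    'postmaster: root',
);

# \s* outside the lookahead: the engine may give the spaces back.
my @outside = grep { /^root:\s*(?!admin\@somewhere\.here)/ } @lines;

# \s* inside the lookahead: the spaces are part of what gets rejected.
my @inside = grep { /^root:(?!\s*admin\@somewhere\.here)/ } @lines;

print "outside the lookahead: ", scalar(@outside), " line(s)\n";
print "inside the lookahead:  ", scalar(@inside),  " line(s)\n";
print "  $_\n" for @inside;
```

With `\s*` outside, the line with a space before the admin address slips through, because matching zero spaces leaves ' admin@somewhere.here' in front of the lookahead. With `\s*` inside, only the non-admin line is kept.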
Re^4: On zero-width negative lookahead assertions by Crian (Curate) on Sep 10, 2004 at 14:54 UTC
Re^4: On zero-width negative lookahead assertions by bronto (Priest) on Sep 10, 2004 at 15:32 UTC
Re^5: On zero-width negative lookahead assertions by Roy Johnson (Monsignor) on Sep 10, 2004 at 15:51 UTC
Re^5: On zero-width negative lookahead assertions by ysth (Canon) on Sep 10, 2004 at 17:39 UTC
Re^3: On zero-width negative lookahead assertions by Anonymous Monk on Sep 10, 2004 at 14:54 UTC
There is a non-space character after the `\s*`. The `(?!)` part is a zero-width assertion. Zero-width means just that: it doesn't consume anything of the string to match. Instead of using the `\S`, one could also have used:

`/root:(?>\s*)(?!...)/`
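A quick way to try the atomic-group variant, as a sketch: the sub name `passes_atomic` and the two test lines are made up, and the elided lookahead is filled in with the address used elsewhere in the thread purely for demonstration.

```perl
use strict;
use warnings;

# (?>\s*) is an atomic group: once \s* has taken the spaces,
# the engine is not allowed to give any of them back.
sub passes_atomic {
    my ($line) = @_;
    return $line =~ /^root:(?>\s*)(?!admin\@somewhere\.here)/ ? 1 : 0;
}

for my $line ('root: someone@somewhere.else', 'root:   admin@somewhere.here') {
    print "$line => ", (passes_atomic($line) ? 'match' : 'no match'), "\n";
}
```

Because the spaces cannot be handed back, the lookahead is always tested right at the first non-space character, so the admin line is rejected however many spaces precede the address.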
Re: On zero-width negative lookahead assertions by pbeckingham (Parson) on Sep 10, 2004 at 14:14 UTC
The following works if you break it into two expressions, but I can't see why yours doesn't match.

`perl -ne '/^root:\s/ and $_ !~ /admin\@somewhere\.here/ and print' alias`

Update: Moving it around also works:

`perl -ne '/^root:(?!\s*admin\@somewhere\.here)/ and print' alias`

pbeckingham - typist, perishable vertebrate.
Re^2: On zero-width negative lookahead assertions by bronto (Priest) on Sep 10, 2004 at 14:25 UTC
> `perl -ne '/^root:\s/ and $_ !~ /admin\@somewhere\.here/ and print' alias`

That's ok, but I want to understand that blah-blah-look-ahead thing.

> `perl -ne '/^root:(?!\s*admin\@somewhere\.here)/ and print' alias`

This works! But I can't understand why it doesn't work if you put the `\s*` outside the parens, nor can I understand why it stops working if I put the `\s*$` at the end of the regex `:-(`

Thanks a lot
Ciao! `--bronto`
Re^3: On zero-width negative lookahead assertions by Crian (Curate) on Sep 10, 2004 at 14:51 UTC
> But I can't understand why it doesn't work if you put the \s* outside the parens, nor can I understand why it stops working if I put the \s*$ at the end of the regex :-(

That is because if you have the string

`root: admin@somewhere.here
11111233333333333333333333`

and the RE

`/^root:\s*(?!admin\@somewhere\.here)/
 ABBBBBCCC`

then part A in the RE matches the beginning of the string, part BBBBB matches 11111 ("root:"), and CCC matches an empty string (not a space, a string with zero chars in it). After this empty string follows a space, and the space is not the beginning of "admin@somewhere.here", because it is the beginning of " admin@somewhere.here".

I hope things are getting clearer for you :-)
Re: On zero-width negative lookahead assertions by antirice (Priest) on Sep 10, 2004 at 15:16 UTC
A few things:

Don't forget to escape your `@` and `.`.

> I also tried escaping the @ sign with a backslash, \Q...\E or using strict: no way.

You must escape @ in a regex no matter what. However, be careful with your escaping, as it exhibits different behavior depending upon what's around it.

`> perl -l
$,=$/;
print 'right:', qr(a\@b), qr(a\Q@\Eb), qr(\Qa@\Eb), qr(\Qa\E\@\Qb\E);
print 'wrong:', qr(\Qa@b\E), qr(\Qa\@b\E);
__END__
right:
(?-xism:a\@b)
(?-xism:a\@b)
(?-xism:a\@b)
(?-xism:a\@b)
wrong:
(?-xism:a)
(?-xism:a\\\@b)`

Your regex without an escaped @ is equivalent to `/^root:\s*(?!admin.here)/`; that is, unless `@somewhere` is defined within your program, of course.

`\s*` can also match the empty string, as the following code shows:

`> perl -l
$_='root: admin@somewhere.here';
print '(',join(")(",/^(root:)(\s*)(?!admin\@somewhere\.here)/),')';
print qq[Postmatch contained "$'"];
__END__
(root:)()
Postmatch contained " admin@somewhere.here"`

> More oddly (to me), if I add a \s*$ at the end of the regex to match any whitespace between the address and the end of line, then no line matches!!!

The reason for that is that you basically turned your regex into `/^root:\s*$/`.

How should you do it? There are a couple of ways:

`Hardcoded:
/^root:(?!\s*admin\@somewhere\.here)/

Variable:
my $admin_email = 'admin@somewhere.here';
/^root:(?!\s*\Q$admin_email\E)/`

Note that if you want more constraints on your regex, you need to add them at the end of the zero-width negative lookahead assertion.

Hope this helps.

Update: Wow, guess that took me a lot longer than I thought it would. Everyone else already said what I did =/

antirice
The first rule of Perl club is - use Perl
The ith rule of Perl club is - follow rule i - 1 for i > 1
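The last note, about where extra constraints belong, can be illustrated with a small sketch. The variable names `$inside` and `$outside` and the four test lines are invented for the example; the patterns themselves follow the "Variable" form above with `\s*$` added in two different places.

```perl
use strict;
use warnings;

# Single quotes, so @somewhere is not interpolated as an array.
my $admin_email = 'admin@somewhere.here';

# Trailing constraint placed at the end of the lookahead.
my $inside = qr/^root:(?!\s*\Q$admin_email\E\s*$)/;

# Trailing constraint placed after the lookahead instead.
my $outside = qr/^root:\s*(?!\Q$admin_email\E)\s*$/;

my @lines = (
    'root: someone@somewhere.else',
    'root: admin@somewhere.here  ',
    'root: admin@somewhere.hereabouts',
    'root:   ',
);

for my $line (@lines) {
    printf "%-36s inside: %-3s outside: %s\n", "'$line'",
        ($line =~ $inside  ? 'yes' : 'no'),
        ($line =~ $outside ? 'yes' : 'no');
}
```

With the constraint outside, only the line that has nothing but whitespace after `root:` matches, which is the `/^root:\s*$/` behaviour described above. With the constraint inside, every `root:` line is accepted except the one whose alias is exactly the admin address followed by optional trailing whitespace.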

