Re: On zero-width negative lookahead assertions

Replies are listed 'Best First'.
Re^2: On zero-width negative lookahead assertions by bronto (Priest) on Sep 10, 2004 at 14:37 UTC
That works, and I thank you for explaining why. Unfortunately, I can't understand why, in the first case, the parentheses match the leading space, and why putting the \S makes it match, even if there is no non-space character at the end... I would be glad if you (or anyone else) could explain that further. I think I'll discover what I didn't understand about zwnla assertions. Thanks a lot!

Ciao!
`--bronto`
In theory, there is no difference between theory and practice. In practice, there is.
How backtracking works in regular expressions by ikegami (Patriarch) on Sep 10, 2004 at 15:33 UTC
Note: Perl regexp matching is not necessarily implemented as described below. I'm totally ignorant as to how it is actually implemented. One could say this document describes the specs rather than the implementation.

It has nothing to do with lookaheads, really. For example, let's look at `/^ab*bc/`

The regexp can be read as:
1. Starting at the beginning of the string
2. Match an 'a'.
3. Match as many 'b's as possible, but not matching any is ok.
4. Match a 'b'.
5. Match a 'c'.

    Match against 'abbbbbbc'
                   01234567
    1) ok! pos = 0. (zw)
    2) ok! Found an 'a' at pos 0. pos = 1.
    3) ok! Found 6 'b's at pos 1 through 6. pos = 7.
    4) fail! Did not find a 'b' at pos 7. Backtrack!
    3) ok! Found 5 'b's at pos 1 through 5. pos = 6.
    4) ok! Found a 'b' at pos 6. pos = 7.
    5) ok! Found a 'c' at pos 7. pos = 8.
    Match!

Something similar is occurring with your `/^root:\s*(?!email)/`

The regexp can be read as:
1. Starting at the beginning of the string
2. Match 'root:'.
3. Match as many '\s's as possible, but not matching any is ok.
4. Match something other than 'email'.

    Match against 'root: email'
                   01234567890
    1) ok! pos = 0. (zw)
    2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
    3) ok! Found 1 '\s' at pos 5. pos = 6.
    4) fail! Found 'email' at pos 6 through 10. Backtrack!
    3) ok! Found 0 '\s' at pos 5. pos = 5. (zw)
    4) ok! Found something other than 'email' at pos 5. pos = 5. (zwla) (found ' email')
    Match!

Now let's look at my solution `/^root:\s*(?!email)\S/`

The regexp can be read as:
1. Starting at the beginning of the string
2. Match 'root:'.
3. Match as many '\s's as possible, but not matching any is ok.
4. Match something other than 'email'.
5. Match a '\S'.

    Match against 'root: email'
                   01234567890
    1) ok! pos = 0. (zw)
    2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
    3) ok! Found 1 '\s' at pos 5. pos = 6.
    4) fail! Found 'email' at pos 6 through 10. Backtrack!
    3) ok! Found 0 '\s' at pos 5. pos = 5. (zw)
    4) ok! Found something other than 'email' at pos 5. pos = 5. (zwla) (found ' email')
    5) fail! Did not find a '\S' at pos 5. Backtrack!
    Nothing more to try. No match!

    Match against 'root: hisemail'
                   01234567890123
    1) ok! pos = 0. (zw)
    2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
    3) ok! Found 1 '\s' at pos 5. pos = 6.
    4) ok! Found something other than 'email' at pos 6. pos = 6. (zwla) (found 'hisemail')
    5) ok! Found a '\S' at pos 6. pos = 7.
    Match!

Backtracking means: (might not be an exhaustive list)
In the case of the first rule, look for a match further on.
In the case of a `*` rule or `?` rule, try matching less.
In the case of a `*?` rule or `??` rule, try matching more.
In the case of a `|` or `[]` rule, try matching the next choice.
Else, no match, so backtrack the last matching rule.
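The outcomes of the traces above can be checked directly. This is only a minimal sketch using the sample strings from this post; `try_match` is a made-up helper name, and it shows the end result of each match, not the individual backtracking steps.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): reports whether $re matches
# $str and, if so, which part of the string was consumed.
sub try_match {
    my ($re, $str) = @_;
    return $str =~ $re ? "match '$&'" : 'no match';
}

# b* first grabs all six b's, then gives one back so the literal 'b' can match.
print try_match(qr/^ab*bc/, 'abbbbbbc'), "\n";                       # match 'abbbbbbc'

# \s* backs off to zero spaces, so the lookahead sees ' email', not 'email'.
print try_match(qr/^root:\s*(?!email)/, 'root: email'), "\n";        # match 'root:'

# The trailing \S cannot match the space at pos 5, so backtracking no longer helps.
print try_match(qr/^root:\s*(?!email)\S/, 'root: email'), "\n";      # no match

# With a different address, \s* keeps its space and \S consumes the 'h'.
print try_match(qr/^root:\s*(?!email)\S/, 'root: hisemail'), "\n";   # match 'root: h'
```

The second call is the surprising one: the pattern still matches, but only `root:` is consumed, which is the "found 0 '\s'" step in the trace.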
Re^3: On zero-width negative lookahead assertions by Eimi Metamorphoumai (Deacon) on Sep 10, 2004 at 14:46 UTC
The regexp engine will match if it can find any way to. So what you're asking for is "root, followed by some number (possibly zero) of whitespace characters, followed by something that is not 'admin@somewhere.here'". So it matches with root, followed by zero spaces, followed by ' admin@somewhere.here' (with a leading space). Since the string ' admin@somewhere.here' isn't 'admin@somewhere.here' (without the space), the lookahead works. That's why you need the \s* inside the lookahead, making it "try to find spaces followed by admin@somewhere.here, and if you can, fail" instead of "look for spaces, but make sure it's not followed by admin@somewhere.here". Subtle, but important.
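To make the difference concrete, here is a small sketch comparing the two placements of `\s*`. The sample line and the variable names `$outside` and `$inside` are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Assumed sample line, in the style of an aliases entry.
my $line = 'root: admin@somewhere.here';

# \s* outside the lookahead: the engine backs off to zero spaces,
# the lookahead then sees ' admin@...', and the pattern matches.
my $outside = $line =~ /^root:\s*(?!admin\@somewhere\.here)/ ? 1 : 0;

# \s* inside the lookahead: "spaces followed by the address" is rejected
# as a unit, so there is nothing left for the engine to back off from.
my $inside = $line =~ /^root:(?!\s*admin\@somewhere\.here)/ ? 1 : 0;

print "outside: $outside, inside: $inside\n";   # outside: 1, inside: 0
```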
Re^4: On zero-width negative lookahead assertions by Crian (Curate) on Sep 10, 2004 at 14:54 UTC
Not exactly. It is not "followed by something that is not 'admin@somewhere.here'"; it is "not followed by 'admin@somewhere.here'". That is a difference, because it matches if nothing follows at all.
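A quick sketch of that difference. The second pattern is only a rough stand-in for the "followed by something" reading (it adds a `.` that demands one more character), and the test strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

# "Not followed by the address": nothing at all may follow.
my $not_followed = qr/^root:(?!admin\@somewhere\.here)/;

# Rough stand-in for "followed by something that is not the address":
# the extra . requires at least one character after 'root:'.
my $followed_by_other = qr/^root:(?!admin\@somewhere\.here)./;

for my $str ('root:', 'root:admin@somewhere.here', 'root:bob') {
    printf "%-28s not_followed=%d followed_by_other=%d\n",
        "'$str'",
        ($str =~ $not_followed      ? 1 : 0),
        ($str =~ $followed_by_other ? 1 : 0);
}
```

Only the bare `'root:'` string separates the two readings: the plain negative lookahead accepts it, while the version that insists on a following character does not.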
Re^4: On zero-width negative lookahead assertions by bronto (Priest) on Sep 10, 2004 at 15:32 UTC
Uhmmmmm... so the old adage that "* is greedy" has an exception when zwnlaa come into play; I expected that the \s* would have eaten all the whitespace before the e-mail address. Ok. Now I still have to understand why that \S thing works...

Oh, by the way, I am doing: `perl -i.bak -pe 'BEGIN { $status = 0 } /^root:(?!\sadmin\@somewhere\.here\s$)/ and $status = 1 ; END { exit $status }' aliases` and it seems to work great!

Ciao!
`--bronto`
In theory, there is no difference between theory and practice. In practice, there is.
Re^5: On zero-width negative lookahead assertions by Roy Johnson (Monsignor) on Sep 10, 2004 at 15:51 UTC
Re^5: On zero-width negative lookahead assertions by ysth (Canon) on Sep 10, 2004 at 17:39 UTC
Re^3: On zero-width negative lookahead assertions by Anonymous Monk on Sep 10, 2004 at 14:54 UTC
There is a non-space character after the \s*. The (?!) part is a zero-width assertion. Zero-width means just that: it doesn't consume anything of the string to match. Instead of using the \S, one could also have used: `/root:(?>\s*)(?!...)/`
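A sketch of the atomic-group variant. The `...` in the pattern above is left as posted; for illustration only, the code below assumes the lookahead holds the address used elsewhere in this thread, and the second sample address is invented.

```perl
use strict;
use warnings;

# Assumption: the elided lookahead is filled with the thread's address.
# (?>\s*) is an atomic group: once it has taken the whitespace it will
# not give any of it back, so the lookahead is only ever tried right
# after the spaces.
my $re = qr/root:(?>\s*)(?!admin\@somewhere\.here)/;

my $blocked = 'root: admin@somewhere.here' =~ $re ? 1 : 0;
my $allowed = 'root: someone@else.there'   =~ $re ? 1 : 0;

print "blocked: $blocked, allowed: $allowed\n";   # blocked: 0, allowed: 1
```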

