On zero-width negative lookahead assertions

bronto has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks

Following the suggestions I had from this node, I started coding a one liner but I can't get it to work.

The problem: I have a UNIX alias file and I want to modify only root's alias, and only if it is different from a predefined one. For example:

root: admin@somewhere.here is OK and should be left untouched
root: someone@somewhere.else is NOT OK and should be modified
Any non-root alias should be left untouched

To test whether I had understood the lesson correctly, I created a file containing...

root: admin@somewhere.here
root: someone@somewhere.else
any: anybody@anywhere.else

...and wrote a regular expression that I would eventually put into an s/// operator; I expected it to match just the second line, but the one-liner below...

perl -ne '/^root:\s*(?!admin@somewhere.here)/ and print' alliases

actually outputs:

root: admin@somewhere.here
root: someone@somewhere.else

which looks quite odd to me, since I expected the first line not to match. I also tried quoting the @ sign with a backslash, \Q...\E or using strict: no way. More oddly (to me), if I add a \s*$ at the end of the regex to match any whitespace between the address and the end of the line, then no line matches!!!

I am getting a little confused: where am I going wrong?

Thanks in advance, and thanks to everyone who answered the original post

Ciao!
--bronto

The very nature of Perl to be like natural language--inconsistant and full of dwim and special cases--makes it impossible to know it all without simply memorizing the documentation (which is not complete or totally correct anyway).
--John M. Dlugosz


Replies are listed 'Best First'.
Re: On zero-width negative lookahead assertions by ccn (Vicar) on Sep 10, 2004 at 14:13 UTC
There are two errors in the code:

1. `@` and `.` must be backslashed.
2. Your `\s*` allows the regexp to match when `\s*` matches the empty string.

You are searching for a match, and Perl finds it for you by looking through all possible combinations.
Re^2: On zero-width negative lookahead assertions by bronto (Priest) on Sep 10, 2004 at 14:21 UTC
> @ and . must be backslashed

Backslashed: still matches too much.

> your \s* allows the regexp to match when \s* matches empty string

I know; I expressly want to match 0 or more spaces before line end.

Ciao!
--bronto

In theory, there is no difference between theory and practice. In practice, there is.
Re^3: On zero-width negative lookahead assertions by ccn (Vicar) on Sep 10, 2004 at 14:29 UTC
Note the difference:

    perl -ne '/^root:(?!\s*admin\@somewhere\.here)/ and print' alliases
Re: On zero-width negative lookahead assertions by ikegami (Patriarch) on Sep 10, 2004 at 14:24 UTC
First, don't forget to escape `@` and `.`.

    >perl -lne "/^root:\s*(?!admin\@somewhere\.here)(.*)/ and print $1" \
          aliases.txt
    someone@somewhere.else
     admin@somewhere.here

Note the leading space. When the regexp engine failed using all the spaces, it backtracked to \s* matching all but one space. One way to fix it is to anchor it as follows:

    >perl -ne "/^root:\s*(?!admin\@somewhere\.here)\S/ and print;" \
          aliases.txt
    root: someone@somewhere.else
Re^2: On zero-width negative lookahead assertions by bronto (Priest) on Sep 10, 2004 at 14:37 UTC
That works, and I thank you for explaining why. Unfortunately, I can't understand why, in the first case, the parentheses match the leading space, and why adding the \S makes it match, even if there is no non-space character at the end... I would be glad if you (or anyone else) could explain that further. I think I'll discover what I didn't understand about zwnla assertions.

Thanks a lot!

Ciao!
--bronto

In theory, there is no difference between theory and practice. In practice, there is.
How backtracking works in regular expressions by ikegami (Patriarch) on Sep 10, 2004 at 15:33 UTC
Note: Perl regexp matching is not necessarily implemented as described below. I'm totally ignorant as to how it is actually implemented. One could say this document describes the specs rather than the implementation.

It has nothing to do with lookaheads, really. For example, let's look at `/^ab*bc/`. The regexp can be read as:

1. Start at the beginning of the string.
2. Match an 'a'.
3. Match as many 'b's as possible, but not matching any is ok.
4. Match a 'b'.
5. Match a 'c'.

In the traces below, (zw) marks a zero-width step and (zwla) a zero-width lookahead.

    Match against 'abbbbbbc'
                   01234567
    1) ok! pos = 0. (zw)
    2) ok! Found an 'a' at pos 0. pos = 1.
    3) ok! Found 6 'b's at pos 1 through 6. pos = 7.
    4) fail! Did not find a 'b' at pos 7. Backtrack!
    3) ok! Found 5 'b's at pos 1 through 5. pos = 6.
    4) ok! Found a 'b' at pos 6. pos = 7.
    5) ok! Found a 'c' at pos 7. pos = 8.
    Match!

Something similar is occurring with your `/^root:\s*(?!email)/`. The regexp can be read as:

1. Start at the beginning of the string.
2. Match 'root:'.
3. Match as many '\s's as possible, but not matching any is ok.
4. Match something other than 'email'.

    Match against 'root: email'
                   01234567890
    1) ok! pos = 0. (zw)
    2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
    3) ok! Found 1 '\s' at pos 5. pos = 6.
    4) fail! Found 'email' at pos 6 through 10. Backtrack!
    3) ok! Found 0 '\s' at pos 5. pos = 5. (zw)
    4) ok! Found something other than 'email' at pos 5. pos = 5. (zwla) (found ' email')
    Match!

Now let's look at my solution, `/^root:\s*(?!email)\S/`. The regexp can be read as:

1. Start at the beginning of the string.
2. Match 'root:'.
3. Match as many '\s's as possible, but not matching any is ok.
4. Match something other than 'email'.
5. Match a '\S'.

    Match against 'root: email'
                   01234567890
    1) ok! pos = 0. (zw)
    2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
    3) ok! Found 1 '\s' at pos 5. pos = 6.
    4) fail! Found 'email' at pos 6 through 10. Backtrack!
    3) ok! Found 0 '\s' at pos 5. pos = 5. (zw)
    4) ok! Found something other than 'email' at pos 5. pos = 5. (zwla) (found ' email')
    5) fail! Did not find a '\S' at pos 5. Backtrack!
    Nothing more to try. No match!

    Match against 'root: hisemail'
                   01234567890123
    1) ok! pos = 0. (zw)
    2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
    3) ok! Found 1 '\s' at pos 5. pos = 6.
    4) ok! Found something other than 'email' at pos 6. pos = 6. (zwla) (found 'hisemail')
    5) ok! Found a '\S' at pos 6. pos = 7.
    Match!

Backtracking means (this might not be an exhaustive list):

- In the case of the first rule, look for a match further on.
- In the case of a `*` rule or `?` rule, try matching less.
- In the case of a `*?` rule or `??` rule, try matching more.
- In the case of a `|` or `[]` rule, try matching the next choice.
- Otherwise, there is no match, so backtrack the last matching rule.
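To watch the backtracking described in the traces, one can capture what the `\s*` ended up consuming. The following is only a minimal sketch: the helper `spaces_taken` and the variable names are made up for illustration, and 'email' stands in for the real address exactly as in the traces.

```perl
use strict;
use warnings;

# Hypothetical helper: returns how many characters the captured \s*
# consumed, or undef if the pattern did not match at all.
# $re must contain exactly one capture group around the \s*.
sub spaces_taken {
    my ($re, $line) = @_;
    return $line =~ $re ? length($1) : undef;
}

my $loose    = qr/^root:(\s*)(?!email)/;     # lookahead only
my $anchored = qr/^root:(\s*)(?!email)\S/;   # lookahead followed by \S

for my $line ('root: email', 'root: hisemail') {
    for my $case (['loose', $loose], ['anchored', $anchored]) {
        my ($name, $re) = @$case;
        my $n = spaces_taken($re, $line);
        printf "%-9s %-16s %s\n", $name, "'$line'",
            defined $n ? "match, \\s* consumed $n char(s)" : 'no match';
    }
}
```

For 'root: email' the loose pattern matches with `\s*` consuming zero characters (it gave the space back), while the anchored pattern finds no match. For 'root: hisemail' both patterns match with `\s*` keeping its one space.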
Re^3: On zero-width negative lookahead assertions by Eimi Metamorphoumai (Deacon) on Sep 10, 2004 at 14:46 UTC
The regexp engine will match if it can find any way to. So what you're asking for is "root, followed by some number (possibly zero) of whitespace characters, followed by something that is not 'admin@somewhere.here'". So it matches with root, followed by zero spaces, followed by ' admin@somewhere.here' (with a leading space). Since the string ' admin@somewhere.here' isn't 'admin@somewhere.here' (without the space), the lookahead works. That's why you need the \s* inside the lookahead, making it "try to find spaces followed by admin@somewhere.here, and if you can, fail" instead of "look for spaces, but make sure it's not followed by admin@somewhere.here". Subtle, but important.
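The difference between the two placements of `\s*` can be checked against the three sample lines from the question. This is a small sketch rather than anyone's actual script; the variable names `$outside` and `$inside` are chosen here for illustration only.

```perl
use strict;
use warnings;

# The sample alias file from the question, one entry per line.
my @aliases = (
    'root: admin@somewhere.here',
    'root: someone@somewhere.else',
    'any: anybody@anywhere.else',
);

# \s* outside the lookahead: it can shrink to zero width, so the
# lookahead is tested in front of the space and passes.
my $outside = qr/^root:\s*(?!admin\@somewhere\.here)/;

# \s* inside the lookahead: the spaces are part of what must NOT follow.
my $inside  = qr/^root:(?!\s*admin\@somewhere\.here)/;

my @hits_outside = grep { $_ =~ $outside } @aliases;
my @hits_inside  = grep { $_ =~ $inside  } @aliases;

print "outside: $_\n" for @hits_outside;
print "inside:  $_\n" for @hits_inside;
```

With `\s*` outside, both root lines are reported; with `\s*` inside, only the `someone@somewhere.else` line is.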
Re^4: On zero-width negative lookahead assertions by Crian (Curate) on Sep 10, 2004 at 14:54 UTC
Re^4: On zero-width negative lookahead assertions by bronto (Priest) on Sep 10, 2004 at 15:32 UTC
Re^5: On zero-width negative lookahead assertions by Roy Johnson (Monsignor) on Sep 10, 2004 at 15:51 UTC
Re^5: On zero-width negative lookahead assertions by ysth (Canon) on Sep 10, 2004 at 17:39 UTC
Re^3: On zero-width negative lookahead assertions by Anonymous Monk on Sep 10, 2004 at 14:54 UTC
There is a non-space character after the \s*. The (?!) part is a zero-width assertion. Zero-width means just that: it doesn't consume anything of the string to match. Instead of using the \S, one could also have used:

    /root:(?>\s*)(?!...)/
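As a rough illustration of the `(?>...)` suggestion, the sketch below fills in the elided lookahead with the address from the question and adds a `^` anchor; both of those are assumptions made here, not part of the reply. An atomic group keeps whatever it matched, so `\s*` cannot hand the space back when the lookahead fails.

```perl
use strict;
use warnings;

# (?>...) is an atomic group: once \s* has matched inside it, the engine
# may not backtrack into the group to give spaces back.
my $atomic = qr/^root:(?>\s*)(?!admin\@somewhere\.here)/;

my @aliases = (
    'root: admin@somewhere.here',
    'root: someone@somewhere.else',
    'any: anybody@anywhere.else',
);

# Keep only the lines whose root alias is something other than the admin address.
my @hits = grep { $_ =~ $atomic } @aliases;
print "$_\n" for @hits;
```

Only the `root: someone@somewhere.else` line should be printed. The lookahead still does not anchor the end of the address, so a longer address that merely starts with `admin@somewhere.here` would also be treated as already correct.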
Re: On zero-width negative lookahead assertions by pbeckingham (Parson) on Sep 10, 2004 at 14:14 UTC
The following works if you break it into two expressions, but I can't see why yours doesn't match.

    perl -ne '/^root:\s*/ and $_ !~ /admin\@somewhere\.here/ and print' alias

Update: Moving it around also works:

    perl -ne '/^root:(?!\s*admin\@somewhere\.here)/ and print' alias

pbeckingham - typist, perishable vertebrate.
Re^2: On zero-width negative lookahead assertions by bronto (Priest) on Sep 10, 2004 at 14:25 UTC
    perl -ne '/^root:\s*/ and $_ !~ /admin\@somewhere\.here/ and print' alias

That's ok, but I want to understand that blah-blah-look-ahead thing.

    perl -ne '/^root:(?!\s*admin\@somewhere\.here)/ and print' alias

This works! But I can't understand why it doesn't work if you put the \s* outside the parens, nor can I understand why it stops working if I put the \s*$ at the end of the regex :-(

Thanks a lot

Ciao!
--bronto

In theory, there is no difference between theory and practice. In practice, there is.
Re^3: On zero-width negative lookahead assertions by Crian (Curate) on Sep 10, 2004 at 14:51 UTC
> But I can't understand why it doesn't work if you put the \s* outside the parens, nor I can understand why it stops working if I put the \s*$ at the end of the regex :-(

That is because if you have the string

    root: admin@somewhere.here
    11111233333333333333333333

and the RE

    /^root:\s*(?!admin\@somewhere\.here)/
     ABBBBBCCC

then the part A in the RE matches the beginning of the string, part BBBBB matches 11111 ("root:") and CCC matches an empty string (not a space, a string with zero chars in it). After this empty string follows a space, and the space is not the beginning of "admin@somewhere.here", because it is the beginning of " admin@somewhere.here".

I hope things are getting clearer for you :-)
Re: On zero-width negative lookahead assertions by antirice (Priest) on Sep 10, 2004 at 15:16 UTC
A few things:

Don't forget to escape your `@` and `.`.

> I also tried quoting the @ sign with a backslash, \Q...\E or using strict: no way.

You must escape @ in a regex no matter what. However, be careful with your escaping, as it behaves differently depending on what's around it.

    > perl -l
    $,=$/;
    print 'right:', qr(a\@b), qr(a\Q@\Eb), qr(\Qa@\Eb), qr(\Qa\E\@\Qb\E);
    print 'wrong:', qr(\Qa@b\E), qr(\Qa\@b\E);
    __END__
    right:
    (?-xism:a\@b)
    (?-xism:a\@b)
    (?-xism:a\@b)
    (?-xism:a\@b)
    wrong:
    (?-xism:a)
    (?-xism:a\\\@b)

Your regex without an escaped @ is equivalent to `/^root:\s*(?!admin.here)/`, unless `@somewhere` is defined within your program, of course.

`\s*` can also match the empty string, as the following code shows:

    > perl -l
    $_='root: admin@somewhere.here';
    print '(',join(")(",/^(root:)(\s*)(?!admin\@somewhere\.here)/),')';
    print qq[Postmatch contained "$'"];
    __END__
    (root:)()
    Postmatch contained " admin@somewhere.here"

> More oddly (to me), if I add a \s*$ at the end of the regex to match any whitespace between the address and the end of line, then no line matches!!!

The reason is that you basically turned your regex into `/^root:\s*$/`.

How should you do it? There are a couple of ways:

    Hardcoded:
    /^root:(?!\s*admin\@somewhere\.here)/

    Variable:
    my $admin_email = 'admin@somewhere.here';
    /^root:(?!\s*\Q$admin_email\E)/

Note that if you want more constraints on your regex, you need to add them at the end of the zero-width negative lookahead assertion.

Hope this helps.

Update: Wow, guess that took me a lot longer than I thought it would. Everyone else already said what I did =/

antirice
The first rule of Perl club is - use Perl
The ith rule of Perl club is - follow rule i - 1 for i > 1
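Putting the pieces together, here is one possible sketch of the substitution the original question was heading towards. It assumes the variable form with `\Q...\E`, and it places the extra `\s*$` constraint inside the lookahead as suggested. The function name `fix_root_alias` and the choice to rewrite the line as `root: ` followed by the address are illustrative assumptions, not something taken from the thread.

```perl
use strict;
use warnings;

my $admin_email = 'admin@somewhere.here';

# Hypothetical helper: rewrites root's alias unless it is already exactly
# $admin_email (optional whitespace around it is tolerated).
# Every other line is returned unchanged.
sub fix_root_alias {
    my ($line) = @_;
    # The lookahead holds all the constraints: spaces, the quoted address,
    # optional trailing spaces, end of line. Only if that whole sequence
    # does NOT follow 'root:' is the rest of the line replaced.
    $line =~ s/^root:(?!\s*\Q$admin_email\E\s*$).*/root: $admin_email/;
    return $line;
}

print fix_root_alias($_), "\n" for (
    'root: admin@somewhere.here',
    'root: someone@somewhere.else',
    'any: anybody@anywhere.else',
);
```

With the sample file, only the second line changes. Because the end-of-line test sits inside the lookahead, an alias such as `admin@somewhere.hereafter` is not mistaken for the correct one and is rewritten as well.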
