Re: On zero-width negative lookahead assertions
by ccn (Vicar) on Sep 10, 2004 at 14:13 UTC
@ and . must be backslashed
Backslashed: still matches too much
your \s* allows the regexp to match when \s* matches the empty string
I know; I expressly want to match 0 or more spaces before the end of the line
Ciao! --bronto
In theory, there is no difference between theory and practice. In practice, there is.
Re: On zero-width negative lookahead assertions
by ikegami (Patriarch) on Sep 10, 2004 at 14:24 UTC
First, don't forget to escape both @ and the dot (.).
>perl -lne "/^root:\s*(?!admin\@somewhere\.here)(.*)/ and print $1" aliases.txt
someone@somewhere.else
 admin@somewhere.here
Note the leading space. When the match failed with \s* matching all the spaces, the regexp engine backtracked and tried \s* matching all but one space, which left a space in front of the address for the lookahead to see. One way to fix it is to anchor it as follows:
>perl -ne "/^root:\s*(?!admin\@somewhere\.here)\S/ and print;" aliases.txt
root: someone@somewhere.else
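To compare both one-liners without an aliases file, here is a small self-contained sketch. The two sample lines are an assumption standing in for aliases.txt, reconstructed from the output shown above.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the lines of aliases.txt
my @lines = (
    'root: someone@somewhere.else',
    'root: admin@somewhere.here',
);

my (@first, @second);
for (@lines) {
    # Original attempt: \s* can give back its space, so the lookahead
    # ends up looking at " admin@..." and succeeds.
    push @first, $1 if /^root:\s*(?!admin\@somewhere\.here)(.*)/;

    # Fixed version: \S must match right after the lookahead, so the
    # match can only succeed where \s* stops at a non-space character.
    push @second, $_ if /^root:\s*(?!admin\@somewhere\.here)\S/;
}

print "first:  [$_]\n" for @first;
print "second: [$_]\n" for @second;
```

The brackets in the output make the leading space in " admin@somewhere.here" visible.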
That works, and I thank you for explaining why. Unfortunately, I can't understand why, in the first case, the parentheses match the leading space, and why putting the \S makes it match, even if there is no non-space character at the end...
I would be glad if you (or anyone else) could explain that further. I think it will show me what I didn't understand about zero-width negative lookahead (zwnla) assertions.
Thanks a lot!
Ciao! --bronto
In theory, there is no difference between theory and practice. In practice, there is.
Note: Perl regexp matching is not necessarily implemented as described below. I'm totally ignorant as to how it is actually implemented. One could say this document describes the specs rather than the implementation.
It has nothing to do with lookaheads, really. For example, let's look at
/^ab*bc/
The regexp can be read as:
1. Starting at the beginning of the string
2. Match an 'a'.
3. Match as many 'b's as possible, but not matching any is ok.
4. Match a 'b'.
5. Match a 'c'.
Match against 'abbbbbbc'
               01234567
1) ok! pos = 0. (zw)
2) ok! Found an 'a' at pos 0. pos = 1.
3) ok! Found 6 'b's at pos 1 through 6. pos = 7.
4) fail! Did not find a 'b' at pos 7. Backtrack!
3) ok! Found 5 'b's at pos 1 through 5. pos = 6.
4) ok! Found a 'b' at pos 6. pos = 7.
5) ok! Found a 'c' at pos 7. pos = 8.
Match!
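The trace above can be checked from Perl itself. In this sketch b* is wrapped in a capture group, a change made only so we can see how many b's it kept after backtracking.

```perl
use strict;
use warnings;

my $str = 'abbbbbbc';

# Same pattern as /^ab*bc/, with b* captured so we can inspect it.
my $bstar = ($str =~ /^a(b*)bc/) ? $1 : undef;

# b* first grabs all six b's; the following 'b' then fails at pos 7,
# so the engine backtracks and b* settles for five.
printf "b* kept %d b's\n", length $bstar;
```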
Something similar is occurring with your
/^root:\s*(?!email)/
The regexp can be read as:
1. Starting at the beginning of the string
2. Match 'root:'.
3. Match as many '\s's as possible, but not matching any is ok.
4. Match something other than 'email'.
Match against 'root: email'
               01234567890
1) ok! pos = 0. (zw)
2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
3) ok! Found 1 '\s' at pos 5. pos = 6.
4) fail! Found 'email' at pos 6 through 10. Backtrack!
3) ok! Found 0 '\s' at pos 5. pos = 5. (zw)
4) ok! Found something other than 'email' at pos 5. pos = 5. (zwla)
(found ' email')
Match!
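This trace can be reproduced with Perl's (?{ code }) construct, used here purely for illustration: it runs a bit of Perl each time the engine reaches that point in the pattern. The sketch records the position at which the lookahead is about to be tried; @tried is a name made up for this example.

```perl
use strict;
use warnings;

our @tried;    # positions where the lookahead was about to be tried
my $str = 'root: email';

# Inside (?{ }), pos() is the current match position in the target.
my $matched = $str =~ /^root:\s*(?{ push @tried, pos() })(?!email)/;
my $end = $matched ? $+[0] : -1;    # $+[0]: offset just past the match

print "matched: ", ($matched ? 'yes' : 'no'), "\n";
print "lookahead tried at: @tried\n";    # first 6, then 5 after backtracking
print "match ended at pos $end\n";
```

The two recorded positions correspond to steps 3 and 4 being run twice in the trace: once with \s* holding the space, once after it gave the space back.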
Now let's look at my solution
/^root:\s*(?!email)\S/
The regexp can be read as:
1. Starting at the beginning of the string
2. Match 'root:'.
3. Match as many '\s's as possible, but not matching any is ok.
4. Match something other than 'email'.
5. Match a '\S'.
Match against 'root: email'
               01234567890
1) ok! pos = 0. (zw)
2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
3) ok! Found 1 '\s' at pos 5. pos = 6.
4) fail! Found 'email' at pos 6 through 10. Backtrack!
3) ok! Found 0 '\s' at pos 5. pos = 5. (zw)
4) ok! Found something other than 'email' at pos 5. pos = 5. (zwla)
(found ' email')
5) fail! Did not find a '\S' at pos 5. Backtrack!
Nothing more to try.
No match!
Match against 'root: hisemail'
               01234567890123
1) ok! pos = 0. (zw)
2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
3) ok! Found 1 '\s' at pos 5. pos = 6.
4) ok! Found something other than 'email' at pos 6. pos = 6. (zwla)
(found 'hisemail')
5) ok! Found a '\S' at pos 6. pos = 7.
Match!
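Both traces for the \S version can be confirmed with a short sketch. It uses @+ ($+[0] is the offset just past the end of the whole match) to check where the successful match stops.

```perl
use strict;
use warnings;

my %result;
for my $str ('root: email', 'root: hisemail') {
    if ($str =~ /^root:\s*(?!email)\S/) {
        # \S consumed one character, so the match ends one past it
        $result{$str} = 'match, ends at pos ' . $+[0];
    } else {
        $result{$str} = 'no match';
    }
}
print "$_ => $result{$_}\n" for sort keys %result;
```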
Backtracking means (this might not be an exhaustive list):
- In the case of the first rule,
  - look for a match further on in the string.
- In the case of a * rule or ? rule,
  - try matching less.
- In the case of a *? rule or ?? rule,
  - try matching more.
- In the case of a | or [] rule,
  - try matching the next choice.
- Otherwise,
  - there is nothing left to try for this rule, so backtrack into the previous rule that matched.
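To make the *? and | entries concrete, here is a small sketch. The patterns are illustrative variations on the /^ab*bc/ example, not taken from the thread.

```perl
use strict;
use warnings;

# *? starts by matching nothing and, on each backtrack, tries one more b.
my ($lazy) = 'abbbbbbc' =~ /^a(b*?)bc/;

# With |, backtracking moves on to the next alternative:
# 'a' is tried first, 'c' then fails at pos 1, so 'ab' is tried.
my ($alt) = 'abc' =~ /^(a|ab)c/;

print "b*? ended with ", length($lazy), " b's\n";
print "alternation picked '$alt'\n";
```

Here b*? ends up with the same five b's as the greedy b* did, but it reaches them from the other direction (0, 1, 2, ... instead of 6, 5).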
The regexp engine will match if it can find any way to. So what you're asking for is "root:, followed by some number (possibly zero) of whitespace characters, followed by something that is not 'admin@somewhere.here'". So it matches 'root:', followed by zero spaces, followed by ' admin@somewhere.here' (with a leading space). Since the string ' admin@somewhere.here' isn't 'admin@somewhere.here' (without the space), the negative lookahead succeeds. That's why you need the \s* inside the lookahead, making it "try to find spaces followed by admin@somewhere.here, and if you can, fail" instead of "look for spaces, but make sure they're not followed by admin@somewhere.here". Subtle, but important.
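A quick way to convince yourself is to run the \s*-inside version over a few lines. This is a sketch; apart from the two lines from the thread, the sample lines (extra spaces, a tab, no space) are made up to exercise the inner \s*.

```perl
use strict;
use warnings;

my @lines = (
    'root: someone@somewhere.else',
    'root: admin@somewhere.here',
    'root:     admin@somewhere.here',    # several spaces
    "root:\tadmin\@somewhere.here",       # a tab
    'root:admin@somewhere.here',           # no space at all
);

# The lookahead is tried right after 'root:', and its own \s* can
# consume any whitespace before the address, so none of the admin
# variants gets through.
my @kept = grep { /^root:(?!\s*admin\@somewhere\.here)/ } @lines;
print "$_\n" for @kept;
```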
There is a non-space character after the \s*: the first character of the address. The (?!) part is a zero-width assertion. Zero-width means just that: it doesn't consume any of the string being matched. Instead of using the \S, one could also have used:
/root:(?>\s*)(?!...)/
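Here is that atomic-group variant written out as a sketch. The address from the original question stands in for the "...", and the ^ anchor is added to match the earlier patterns. (?>\s*) never gives back the spaces it matched, so the lookahead always sees the first non-space character. Perl 5.10 and later also accept the possessive form \s*+, which means the same as (?>\s*); both are run below.

```perl
use strict;
use warnings;

my @lines = (
    'root: someone@somewhere.else',
    'root: admin@somewhere.here',
);

# Atomic group: once \s* has eaten the spaces, backtracking into it
# is not allowed, so the "give back one space" trick is impossible.
my @kept  = grep { /^root:(?>\s*)(?!admin\@somewhere\.here)/ } @lines;

# Possessive quantifier (Perl 5.10+), equivalent to the atomic group.
my @kept2 = grep { /^root:\s*+(?!admin\@somewhere\.here)/ } @lines;

print "$_\n" for @kept;
```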
Re: On zero-width negative lookahead assertions
by pbeckingham (Parson) on Sep 10, 2004 at 14:14 UTC
The following works if you break it into two expressions, but I can't see why yours doesn't match.
perl -ne '/^root:\s*/ and $_ !~ /admin\@somewhere\.here/ and print' alias
Update: Moving it around also works:
perl -ne '/^root:(?!\s*admin\@somewhere\.here)/ and print' alias
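The two versions above are not quite equivalent. In this sketch (the third line is hypothetical), the two-step test rejects any root: line that mentions the admin address anywhere, while the lookahead only rejects lines where the address comes right after "root:" and optional whitespace.

```perl
use strict;
use warnings;

my @lines = (
    'root: someone@somewhere.else',
    'root: admin@somewhere.here',
    'root: someone@somewhere.else, admin@somewhere.here',    # hypothetical
);

# Two separate tests: "starts with root:" and "no admin address anywhere".
my @two_step = grep { /^root:\s*/ and $_ !~ /admin\@somewhere\.here/ } @lines;

# One regex: the address must not come right after root: (and spaces).
my @lookahead = grep { /^root:(?!\s*admin\@somewhere\.here)/ } @lines;

print "two-step:  $_\n" for @two_step;
print "lookahead: $_\n" for @lookahead;
```

Which behaviour is right depends on whether a line that merely mentions the admin address should still count.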
pbeckingham - typist, perishable vertebrate.
perl -ne '/^root:\s*/ and $_ !~ /admin\@somewhere\.here/ and print' alias
That's ok, but I want to understand that blah-blah-look-ahead thing
perl -ne '/^root:(?!\s*admin\@somewhere\.here)/ and print' alias
This works! But I can't understand why it doesn't work if you put the \s* outside the parens, nor can I understand why it stops working if I put the \s*$ at the end of the regex :-(
Thanks a lot
Ciao! --bronto
In theory, there is no difference between theory and practice. In practice, there is.
> But I can't understand why it doesn't work if you put the \s* outside the parens, nor can I understand why it stops working if I put the \s*$ at the end of the regex :-(
That is because if you have the string
root: admin@somewhere.here
11111233333333333333333333
and the RE
/^root:\s*(?!admin\@somewhere\.here)/
 ABBBBBCCC
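then the part A in the RE matches the beginning of the string and part BBBBB matches 11111 ("root:"). The engine first lets CCC take the space (2), but then the lookahead finds "admin@somewhere.here" right there and fails, so the engine backtracks and lets CCC match an empty string instead (not a space, but a string with zero characters in it). After this empty string comes the space, and the space is not the beginning of "admin@somewhere.here", because it is the beginning of " admin@somewhere.here". So the lookahead succeeds, and so does the whole match.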
I hope things are getting clearer for you :-)
Re: On zero-width negative lookahead assertions
by antirice (Priest) on Sep 10, 2004 at 15:16 UTC
Hardcoded:
/^root:(?!\s*admin\@somewhere\.here)/
Variable:
my $admin_email = 'admin@somewhere.here';
/^root:(?!\s*\Q$admin_email\E)/
Note that if you want more constraints on your regex, you need to add them at the end of the zero-width negative lookahead assertion. Hope this helps.
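One way to read "add them at the end of the assertion" is to put extra conditions inside the lookahead, after the address. For instance, bronto's "0 or more spaces before the end of the line" could go there as \s*$. This sketch (sample lines made up) shows the effect: only a line holding just the admin address, possibly with trailing spaces, is rejected.

```perl
use strict;
use warnings;

my $admin_email = 'admin@somewhere.here';

my @lines = (
    'root: admin@somewhere.here',
    'root: admin@somewhere.here   ',              # trailing spaces
    'root: admin@somewhere.here, someone@else',   # more after the address
    'root: someone@somewhere.else',
);

# \Q...\E escapes the metacharacters (@ and the dots) in $admin_email;
# \s*$ inside the lookahead means "nothing but whitespace after it".
my @kept = grep { /^root:(?!\s*\Q$admin_email\E\s*$)/ } @lines;
print "$_\n" for @kept;
```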
Update: Wow, guess that took me a lot longer than I thought it would. Everyone else already said what I did =/
antirice The first rule of Perl club is - use Perl The ith rule of Perl club is - follow rule i - 1 for i > 1