bronto has asked for the wisdom of the Perl Monks concerning the following question:
Dear Monks
Following the suggestions I had from this node, I started coding a one liner but I can't get it to work.
The problem: I have a UNIX alias file and I want to modify only root's alias, and only if it is different from a predefined one. For example:
- root: admin@somewhere.here is OK and should be left untouched
- root: someone@somewhere.else is NOT OK and should be modified
- Any non-root alias should be left untouched
To test whether I had understood the lesson, I created a file containing...
root: admin@somewhere.here
root: someone@somewhere.else
any: anybody@anywhere.else
...and wrote a regular expression that I would eventually put into an s/// operator; I expected it to match just the second line, but the one-liner below...
perl -ne '/^root:\s*(?!admin@somewhere.here)/ and print' alliases
actually outputs:
root: admin@somewhere.here
root: someone@somewhere.else
which looks quite odd to me, since I expected the first line not to match. I also tried quoting the @ sign with a backslash, using \Q...\E, and using strict: no luck. Odder still (to me), if I add a \s*$ at the end of the regex to match any whitespace between the address and the end of the line, then no line matches at all!
I am getting a little confused: where am I going wrong?
Thanks in advance, and thanks to everyone who answered the original post
Ciao! --bronto
The very nature of Perl to be like natural language--inconsistent and full of dwim and special cases--makes it impossible to know it all without simply memorizing the documentation (which is not complete or totally correct anyway).
--John M. Dlugosz
Re: On zero-width negative lookahead assertions
by ccn (Vicar) on Sep 10, 2004 at 14:13 UTC
> @ and . must be backslashed
Backslashed: still matches too much
> your \s* allows the regexp to match when \s* matches empty string
I know, I expressly want to match 0 or more spaces before the end of the line
Ciao! --bronto
In theory, there is no difference between theory and practice. In practice, there is.
Re: On zero-width negative lookahead assertions
by ikegami (Patriarch) on Sep 10, 2004 at 14:24 UTC
First, don't forget to escape @ and . (the dot).
>perl -lne "/^root:\s*(?!admin\@somewhere\.here)(.*)/ and print $1" \
aliases.txt
someone@somewhere.else
 admin@somewhere.here
Note the leading space. When the regexp engine failed using all the spaces, it backtracked to \s* matching all but one space. One way to fix it is to anchor it as follows:
>perl -ne "/^root:\s*(?!admin\@somewhere\.here)\S/ and print;" \
aliases.txt
root: someone@somewhere.else
That works, and I thank you for explaining why. Unfortunately, I can't understand why, in the first case, the parentheses match the leading space, and why putting the \S makes it match, even if there is no non-space character at the end...
I would be glad if you (or anyone else) could explain that further. I think I'll then discover what I didn't understand about zero-width negative lookahead (zwnla) assertions
Thanks a lot!
Ciao! --bronto
In theory, there is no difference between theory and practice. In practice, there is.
Note: Perl regexp matching is not necessarily implemented as described below. I'm totally ignorant as to how it is actually implemented. One could say this document describes the specs rather than the implementation.
It has nothing to do with lookaheads, really. For example, let's look at
/^ab*bc/
The regexp can be read as:
1. Starting at the beginning of the string
2. Match an 'a'.
3. Match as many 'b's as possible, but not matching any is ok.
4. Match a 'b'.
5. Match a 'c'.
Match against 'abbbbbbc'
               01234567
1) ok! pos = 0. (zw)
2) ok! Found an 'a' at pos 0. pos = 1.
3) ok! Found 6 'b's at pos 1 through 6. pos = 7.
4) fail! Did not find a 'b' at pos 7. Backtrack!
3) ok! Found 5 'b's at pos 1 through 5. pos = 6.
4) ok! Found a 'b' at pos 6. pos = 7.
5) ok! Found a 'c' at pos 7. pos = 8.
Match!
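A quick way to see that give-back for yourself is to put capturing parentheses around the b* and look at what it ends up holding. This is only a small sketch: the capture group and the helper name greedy_bs are my additions for illustration and are not part of the regexp discussed above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns what the greedy b* holds once the whole
# pattern has matched, or undef if the pattern does not match at all.
sub greedy_bs {
    my ($string) = @_;
    return $string =~ /^a(b*)bc/ ? $1 : undef;
}

my $bs = greedy_bs('abbbbbbc');

# b* first grabs all six b's, then gives one back so the literal 'b' can match.
printf "b* kept %d of the 6 b's\n", length $bs;   # prints: b* kept 5 of the 6 b's
```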
Something similar is occurring with your
/^root:\s*(?!email)/
The regexp can be read as:
1. Starting at the beginning of the string
2. Match 'root:'.
3. Match as many '\s's as possible, but not matching any is ok.
4. Match something other than 'email'.
Match against 'root: email'
               01234567890
1) ok! pos = 0. (zw)
2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
3) ok! Found 1 '\s' at pos 5. pos = 6.
4) fail! Found 'email' at pos 6 through 10. Backtrack!
3) ok! Found 0 '\s' at pos 5. pos = 5. (zw)
4) ok! Found something other than 'email' at pos 5. pos = 5. (zwla)
(found ' email')
Match!
Now let's look at my solution
/^root:\s*(?!email)\S/
The regexp can be read as:
1. Starting at the beginning of the string
2. Match 'root:'.
3. Match as many '\s's as possible, but not matching any is ok.
4. Match something other than 'email'.
5. Match a '\S'.
Match against 'root: email'
               01234567890
1) ok! pos = 0. (zw)
2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
3) ok! Found 1 '\s' at pos 5. pos = 6.
4) fail! Found 'email' at pos 6 through 10. Backtrack!
3) ok! Found 0 '\s' at pos 5. pos = 5. (zw)
4) ok! Found something other than 'email' at pos 5. pos = 5. (zwla)
(found ' email')
5) fail! Did not find a '\S' at pos 5. Backtrack!
Nothing more to try.
No match!
Match against 'root: hisemail'
               01234567890123
1) ok! pos = 0. (zw)
2) ok! Found a 'root:' at pos 0 through 4. pos = 5.
3) ok! Found 1 '\s' at pos 5. pos = 6.
4) ok! Found something other than 'email' at pos 6. pos = 6. (zwla)
(found 'hisemail')
5) ok! Found a '\S' at pos 6. pos = 7.
Match!
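The traces above can be checked by asking Perl where each match ended. The sketch below assumes nothing beyond core Perl: $+[0] is the built-in offset of the end of the last successful match, while the helper match_end and the labels in %pattern are names I made up for the illustration.

```perl
use strict;
use warnings;

# The two regexps walked through above, with 'email' standing in for the address.
my %pattern = (
    'without \S' => qr/^root:\s*(?!email)/,
    'with \S'    => qr/^root:\s*(?!email)\S/,
);

# Hypothetical helper: offset at which the match ends, or undef on failure.
sub match_end {
    my ($re, $string) = @_;
    return $string =~ $re ? $+[0] : undef;
}

for my $name (sort keys %pattern) {
    for my $string ('root: email', 'root: hisemail') {
        my $end = match_end($pattern{$name}, $string);
        printf "%-10s %-16s %s\n", $name, "'$string'",
            defined $end ? "match ends at pos $end" : 'no match';
    }
}
```

Without the \S, 'root: email' still matches but the match ends at pos 5, which is the sign that \s* gave the space back. With the \S, the same string no longer matches, and 'root: hisemail' matches through pos 7.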
Backtracking means: (might not be an exhaustive list)
- In the case of the first rule,
  - look for a match further on.
- In the case of a * rule or ? rule,
  - try matching less.
- In the case of a *? rule or ?? rule,
  - try matching more.
- In the case of a | or [] rule,
  - try matching the next choice.
- Else,
  - no match, so backtrack the last matching rule.
The regexp engine will match if it can find any way to. So what you're asking for is "root, followed by some number (possibly zero) of whitespace characters, followed by something that is not 'admin@somewhere.here'". So it matches with root, followed by zero spaces, followed by ' admin@somewhere.here' (with a leading space). Since the string ' admin@somewhere.here' isn't 'admin@somewhere.here' (without the space), the lookahead works. That's why you need the \s* inside the lookahead, making it "try to find spaces followed by admin@somewhere.here, and if you can, fail" instead of "look for spaces, but make sure it's not followed by admin@somewhere.here". Subtle, but important.
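Here is a small sketch of the \s*-inside-the-lookahead version run against the three sample lines from the question. The array @lines and the helper wrong_root_aliases are names invented for this example. Note that, as written, an address that merely begins with admin@somewhere.here would also count as correct.

```perl
use strict;
use warnings;

# The sample alias file from the question, one entry per element.
my @lines = (
    'root: admin@somewhere.here',
    'root: someone@somewhere.else',
    'any: anybody@anywhere.else',
);

# Hypothetical helper: keep only root entries that do not point at the
# expected address. The \s* sits inside the lookahead, so there is nothing
# outside it that backtracking could shrink to dodge the assertion.
sub wrong_root_aliases {
    return grep { /^root:(?!\s*admin\@somewhere\.here)/ } @_;
}

print "$_\n" for wrong_root_aliases(@lines);   # prints: root: someone@somewhere.else
```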
There is a non-space character after the \s*. The (?!) part is a zero-width assertion. Zero-width means just that: it doesn't consume anything of the string to match. Instead of using the \S, one could also have used:
/root:(?>\s*)(?!...)/
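To make the effect of (?>...) concrete, here is a sketch with the address from the question filled in. Adding the ^ anchor, the address, and the helper name matches_atomic are my own choices for the example. The atomic group takes all the whitespace it can and never gives any of it back, so the lookahead is only ever tried right in front of the address.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the line is a root entry whose target is not
# the expected address, 0 otherwise.
sub matches_atomic {
    my ($line) = @_;
    return $line =~ /^root:(?>\s*)(?!admin\@somewhere\.here)/ ? 1 : 0;
}

for my $line ('root: admin@somewhere.here', 'root: someone@somewhere.else') {
    printf "%-30s %s\n", $line, matches_atomic($line) ? 'matches' : 'does not match';
}
```

The first line does not match and the second one does, which is the behaviour the original one-liner was after.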
Re: On zero-width negative lookahead assertions
by pbeckingham (Parson) on Sep 10, 2004 at 14:14 UTC
The following works if you break it into two expressions, but I can't see why yours doesn't match.
perl -ne '/^root:\s*/ and $_ !~ /admin\@somewhere\.here/ and print' alias
Update: Moving it around also works:
perl -ne '/^root:(?!\s*admin\@somewhere\.here)/ and print' alias
pbeckingham - typist, perishable vertebrate.
perl -ne '/^root:\s*/ and $_ !~ /admin\@somewhere\.here/ and print' alias
That's ok, but I want to understand that blah-blah-look-ahead thing
perl -ne '/^root:(?!\s*admin\@somewhere\.here)/ and print' alias
This works! But I can't understand why it doesn't work if you put the \s* outside the parens, nor can I understand why it stops working if I put the \s*$ at the end of the regex :-(
Thanks a lot
Ciao! --bronto
In theory, there is no difference between theory and practice. In practice, there is.
> But I can't understand why it doesn't work if you put the \s* outside the parens, nor can I understand why it stops working if I put the \s*$ at the end of the regex :-(
That is because if you have the string
root: admin@somewhere.here
11111233333333333333333333
and the RE
/^root:\s*(?!admin\@somewhere\.here)/
 ABBBBBCCC
then the part A in the RE matches the beginning of the string, part BBBBB matches 11111 ("root:") and CCC matches an empty string (not a space, a string with zero chars in it). After this empty string follows a space, and the space is not the beginning of "admin@somewhere.here", because it is the beginning of " admin@somewhere.here".
I hope things are getting clearer for you :-)
Re: On zero-width negative lookahead assertions
by antirice (Priest) on Sep 10, 2004 at 15:16 UTC
Hardcoded:
/^root:(?!\s*admin\@somewhere\.here)/
Variable:
my $admin_email = 'admin@somewhere.here';
/^root:(?!\s*\Q$admin_email\E)/
Note that if you want more constraints on your regex, you need to add them at the end of the zero-width negative lookahead assertion. Hope this helps.
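As a rough sketch of where this could lead, here is the variable form dropped into an s///, with a trailing \s*\z added inside the lookahead as the extra constraint. The helper name fix_root_alias, the replacement text, and the choice of \z as the end anchor are assumptions of mine, not something from the original question.

```perl
use strict;
use warnings;

my $admin_email = 'admin@somewhere.here';

# Hypothetical helper: rewrite a root entry unless it already points at
# $admin_email. The \s*\z inside the lookahead stops an address that merely
# starts with $admin_email from being treated as correct.
sub fix_root_alias {
    my ($line) = @_;
    $line =~ s/^root:(?!\s*\Q$admin_email\E\s*\z).*/root: $admin_email/;
    return $line;
}

for my $line ('root: admin@somewhere.here',
              'root: someone@somewhere.else',
              'any: anybody@anywhere.else') {
    print fix_root_alias($line), "\n";
}
```

Putting \s*$ after the lookahead instead would demand that only whitespace follows "root:", which no line with an address can satisfy. That would explain why no line matched in the original attempt.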
Update: Wow, guess that took me a lot longer than I thought it would. Everyone else already said what I did =/
antirice The first rule of Perl club is - use Perl The ith rule of Perl club is - follow rule i - 1 for i > 1