http://qs321.pair.com?node_id=1212995

mxb has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

I'm continuing my exploration of Perl and currently exploring regular expressions.

For this simple test, I have a small string of KEY=value pairs, separated by ';' characters. I am trying to check whether a particular KEY has a certain value and a different KEY does not contain a given value.

The following code does this with two regexes:

#!/usr/bin/env perl
use strict;
use warnings;
use 5.016;

for (<DATA>) {
    chomp;
    print "Match: $_\n" if /name=bob/ and not /flags=.*?cat.*?;/;
}

__DATA__
name=bob;flags=human;age=10
name=tiddles;flags=cat,black;age=3
name=bob;flags=cat,white;age=6

In this simple example, I'm looking to get the line which matches a name of 'bob' and flags does not contain 'cat'. This code works and returns the single matching line.

However, I tried combining the regexes and using lookaheads to match multiple conditions, but this is currently failing. My code for this follows:

#!/usr/bin/env perl
use strict;
use warnings;
use 5.016;

for (<DATA>) {
    chomp;
    print "Match: $_\n" if /(?=name=bob)(?!flags=.*?cat.*?;)/;
}

__DATA__
name=bob;flags=human;age=10
name=tiddles;flags=cat,black;age=3
name=bob;flags=cat,white;age=6

My understanding is that this matches if name is set to 'bob' and flags do not contain 'cat' (searching up to the first semicolon). However, it returns both 'bob' lines.

Am I missing something obvious here? I understand that lookaheads are zero-width matches, but maybe this is something to do with anchoring?

PS: I know this could probably be done more simply by splitting on ';' into a hash and validating the hash, but this is a learning exercise :)
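For comparison, the hash-based approach mentioned in the PS might look roughly like the sketch below. The helper names parse_record and wanted are hypothetical, not from the original post. The sketch also assumes that flags is a comma-separated list and compares each flag exactly against 'cat', which is stricter than the substring test in the regex versions.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: turn "KEY=value;KEY=value" into a hash reference.
# Assumes every field contains an '=' sign.
sub parse_record {
    my ($line) = @_;
    my %field = map { split /=/, $_, 2 } split /;/, $line;
    return \%field;
}

# Hypothetical helper: true if name is 'bob' and no flag is exactly 'cat'.
sub wanted {
    my ($line) = @_;
    my $rec = parse_record($line);
    return 0 unless defined $rec->{name} && $rec->{name} eq 'bob';

    # Assumption: flags is a comma-separated list (e.g. "cat,white").
    my @flags = split /,/, $rec->{flags} // '';
    return( (grep { $_ eq 'cat' } @flags) ? 0 : 1 );
}

for my $line (
    'name=bob;flags=human;age=10',
    'name=tiddles;flags=cat,black;age=3',
    'name=bob;flags=cat,white;age=6',
) {
    print "Match: $line\n" if wanted($line);
}
```

With the three sample records, only the first one (bob, flags=human) is printed.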

Many thanks

Replies are listed 'Best First'.
Re: Multiple regexs into single combined regex using lookaheads
by tybalt89 (Monsignor) on Apr 16, 2018 at 15:16 UTC
    #!/usr/bin/perl
    use strict;
    use warnings;

    for (<DATA>) {
        chomp;
        print "Match: $_\n" if /^(?=.*name=bob)(?!.*flags=.*?cat.*?;)/;
    }

    __DATA__
    name=bob;flags=human;age=10
    name=tiddles;flags=cat,black;age=3
    name=bob;flags=cat,white;age=6

      Ahh! I see you've added .* before each match. Many thanks for the working example.

      Am I correct in thinking that both lookaheads must be anchored at the same position (which is ^ in the example above)?

      Because they are both anchored at the start of the string, each lookahead can match any number of characters before the string I'm looking for (e.g. name=bob). Does this allow each lookahead to 'seek' forward independently and succeed at a different point in the string?

        In answer to your two questions: yes and yes.
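To make the difference concrete, here is a small sketch that reports where each pattern matches. The helper match_offset and the variable names are hypothetical and used only for illustration. It relies on the built-in @- array, whose first element holds the start offset of the last successful match.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Pattern from the question: both lookaheads are tested at the SAME position.
# At offset 0 the text starts with "name=bob" and does not start with
# "flags=", so the negative lookahead succeeds trivially.
my $unanchored = qr/(?=name=bob)(?!flags=.*?cat.*?;)/;

# Pattern from the reply: both lookaheads start at ^, and the leading .*
# lets each one scan forward through the string independently.
my $anchored = qr/^(?=.*name=bob)(?!.*flags=.*?cat.*?;)/;

# Hypothetical helper: start offset of the match, or undef if none.
sub match_offset {
    my ($re, $text) = @_;
    return $text =~ $re ? $-[0] : undef;
}

for my $text (
    'name=bob;flags=human;age=10',
    'name=tiddles;flags=cat,black;age=3',
    'name=bob;flags=cat,white;age=6',
) {
    for my $case ( [ unanchored => $unanchored ], [ anchored => $anchored ] ) {
        my ($label, $re) = @$case;
        my $off = match_offset($re, $text);
        printf "%-10s %-36s %s\n", $label, $text,
            defined $off ? "matches at offset $off" : 'no match';
    }
}
```

On the sample data, the unanchored pattern matches both 'bob' lines at offset 0, while the anchored pattern matches only the line whose flags do not contain 'cat'.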


        Give a man a fish:  <%-{-{-{-<