PerlMonks  

Multiple regexs into single combined regex using lookaheads

by mxb (Pilgrim)
on Apr 16, 2018 at 15:02 UTC (#1212995=perlquestion)

mxb has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

I'm continuing my exploration of Perl and am currently looking at regular expressions.

For this simple test, I have a small string of KEY=value pairs, separated by ';' characters. I am trying to identify whether a particular KEY has a certain value and a different KEY does not contain a certain value.

The following code does this with two regexes:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use 5.016;

    for (<DATA>) {
        chomp;
        print "Match: $_\n" if /name=bob/ and not /flags=.*?cat.*?;/;
    }

    __DATA__
    name=bob;flags=human;age=10
    name=tiddles;flags=cat,black;age=3
    name=bob;flags=cat,white;age=6

In this simple example, I'm looking for the line whose name is 'bob' and whose flags do not contain 'cat'. This code works and returns the single matching line.

However, I tried combining the regexes, using lookaheads to match multiple conditions, and this is currently failing. My code for this follows:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use 5.016;

    for (<DATA>) {
        chomp;
        print "Match: $_\n" if /(?=name=bob)(?!flags=.*?cat.*?;)/;
    }

    __DATA__
    name=bob;flags=human;age=10
    name=tiddles;flags=cat,black;age=3
    name=bob;flags=cat,white;age=6

My understanding is that this matches if name is set to 'bob' and flags do not contain 'cat' (searching up to the first semicolon). However, it returns both 'bob' lines.

Am I missing something obvious here? I understand that lookaheads are zero-width matches, but maybe this is something to do with anchoring?

PS: I know this could probably be done simpler by splitting on ';' into a hash and validating the hash, but this is a learning exercise :)

Many thanks
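
For comparison, the hash-based approach mentioned in the PS might look like the following minimal sketch. The helper name `wanted` is hypothetical, and the sketch assumes flags is a comma-separated list in which 'cat' appears as a whole item, which is slightly stricter than the substring test used in the regexes above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.016;

# Hypothetical helper: parse "KEY=value;KEY=value" into a hash, then
# return 1 if name is 'bob' and the flags list has no 'cat' item.
sub wanted {
    my ($line) = @_;

    # Split into fields on ';', then each field into key and value on the first '='.
    my %field = map {
        my ($key, $value) = split /=/, $_, 2;
        ($key => $value);
    } split /;/, $line;

    return 0 unless defined $field{name} && $field{name} eq 'bob';

    # Assumption: flags are comma-separated; a missing flags field means no flags.
    my @flags   = split /,/, $field{flags} // '';
    my $has_cat = grep { $_ eq 'cat' } @flags;
    return $has_cat ? 0 : 1;
}

my @lines = (
    'name=bob;flags=human;age=10',
    'name=tiddles;flags=cat,black;age=3',
    'name=bob;flags=cat,white;age=6',
);

print "Match: $_\n" for grep { wanted($_) } @lines;
```

With the three sample lines, only `name=bob;flags=human;age=10` is printed.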

Replies are listed 'Best First'.
Re: Multiple regexs into single combined regex using lookaheads
by tybalt89 (Prior) on Apr 16, 2018 at 15:16 UTC
    #!/usr/bin/perl
    use strict;
    use warnings;

    for (<DATA>) {
        chomp;
        print "Match: $_\n" if /^(?=.*name=bob)(?!.*flags=.*?cat.*?;)/;
    }

    __DATA__
    name=bob;flags=human;age=10
    name=tiddles;flags=cat,black;age=3
    name=bob;flags=cat,white;age=6

      Ahh! I see you've added .* before each match. Many thanks for the working example.

      Am I correct in thinking that both lookaheads must be anchored at the same position (which is ^ in the example above)?

      Because they are both anchored at the start of the string, each lookahead can match any number of arbitrary characters before the string I'm looking for (e.g. name=bob). Does this allow each lookahead to 'seek' forward independently and succeed at a different point in the string?

        In answer to your two questions: yes and yes.
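
To see why the original pattern matched both 'bob' lines, it may help to print where each pattern matches. The following is a minimal sketch, not code from the thread: the helper `match_offset` and the variable names are hypothetical, and the two patterns are copied from the posts above. Without the leading .*, both lookaheads are tested at the same position, so at offset 0 the text starts with name=bob and does not start with flags=, and the negative lookahead succeeds trivially.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.016;

# The pattern from the question and the pattern from the reply.
my $unanchored = qr/(?=name=bob)(?!flags=.*?cat.*?;)/;
my $anchored   = qr/^(?=.*name=bob)(?!.*flags=.*?cat.*?;)/;

# Hypothetical helper: return the offset where the match starts,
# or undef if the pattern does not match at all.
sub match_offset {
    my ($re, $line) = @_;
    return $line =~ $re ? $-[0] : undef;
}

my @lines = (
    'name=bob;flags=human;age=10',
    'name=tiddles;flags=cat,black;age=3',
    'name=bob;flags=cat,white;age=6',
);

for my $line (@lines) {
    my $un = match_offset($unanchored, $line);
    my $an = match_offset($anchored,   $line);
    printf "%-36s unanchored: %-5s anchored: %s\n",
        $line, $un // 'none', $an // 'none';
}
```

For the third line, the unanchored pattern matches at offset 0 while the anchored one reports no match, because its second lookahead is allowed to scan forward from the start of the string and finds flags=cat,white;.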


        Give a man a fish:  <%-{-{-{-<

Node Type: perlquestion [id://1212995]
Approved by marto