PerlMonks  

Need RegExp help - doing an AND match

by Anonymous Monk
on Jul 01, 2007 at 13:12 UTC [id://624296]

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

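I would like to convert this working code into a regular expression:
# @words is a bunch of words, $line is the line to search
my $found = 1;
foreach (@words) {
    # test for "" first: an empty pattern would reuse the last successful regex
    next if $_ eq "" || $line =~ /$_/i;
    $found = 0;
    last;    # one missing word is enough, so stop checking
}
if ($found == 1) {
    # success
}
It just does an AND match on the array @words: every non-empty word must occur in $line.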

I have no problem doing an OR match using |.
Thanks in advance

Re: Need RegExp help - doing an AND match - use grep instead
by imp (Priest) on Jul 01, 2007 at 13:33 UTC
    Is there a reason why you want to do this with a regex? It isn't really the right tool for the job.

    I would probably just use grep to get a list of the missing items.

    my @words = qw( foo bar );
    while (my $line = <DATA>) {
        chomp $line;
        my @missing = grep { $line !~ $_ } @words;
        printf "line: %-10s; missing: %s\n", $line, join ',', @missing;
    }
    __DATA__
    foo
    bar
    foo bar
    Or use List::Util's first, which will abort the search once one is missing.
    use List::Util qw( first );
    my @words = qw( foo bar );
    while (my $line = <DATA>) {
        chomp $line;
        my $missing = first { $line !~ $_ } @words;
        printf "line: %-10s; missing: %s\n", $line, $missing;
    }
    __DATA__
    foo
    bar
    foo bar
    Note that this will only return the first missing item, and if you're searching for something like "0", you should check whether $missing is defined rather than testing it for truth.
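    A minimal sketch of why the defined check matters: first returns the element itself, so a missing word of "0" is false in boolean context. The word list and the line are made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw( first );

# Made-up data: the word '0' does not occur in the line
my @words = ( 'foo', '0' );
my $line  = 'foo bar';

# first returns the first word that does NOT occur in $line, or undef if none is missing
my $missing = first { $line !~ /\Q$_\E/ } @words;

# Wrong: the missing word '0' is false, so this claims nothing is missing
print 'truth test:   ', ( $missing ? "missing '$missing'" : 'nothing missing' ), "\n";

# Right: only undef means that every word was found
print 'defined test: ', ( defined $missing ? "missing '$missing'" : 'nothing missing' ), "\n";
```

    The first print reports "nothing missing" and the second reports "missing '0'".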
      Ah, fantastic, hadn't even thought of using grep. Thanks very much, rather an embarrassing first post. No, there wasn't any special need for Regexp other than trying to learn a bit more about them. My code now looks like this and works:
      # @words is a bunch of words, $line is the line to search
      if (!grep { $line !~ $_ } @words) {
          # success
      }
      (I didn't actually need the missing ones.) Thanks a lot, imp!
      Apologies all. I've re-read all posts here and List::Util first does the trick. Thanks to all who replied!
Re: Need RegExp help - doing an AND match
by BrowserUk (Patriarch) on Jul 01, 2007 at 14:08 UTC

    This works okay, though I'm not sure how it fares performance-wise compared with other methods.

    #! perl -slw
    use strict;

    sub reAnd{
        my $re = '';
        $re .= '(?=^.*\b' . quotemeta() . '\b)' for @_;
        return qr[$re];
    }

    my @words = qw[ an of and ];
    my $re1 = reAnd( @words );
    #print $re1;

    my $re2 = reAnd( qw[ a great sweet mother by the wellfed voice beside him ] );
    #print $re2;

    while( <DATA> ) {
        m[$re1]i and print "1:$_";
        m[$re2] and print "2:$_";
    }

    __DATA__
    Stephen, an elbow rested on the jagged granite, leaned his palm against
    his brow and gazed at the fraying edge of his shiny black coat-sleeve.
    Pain, that was not yet the pain of love, fretted his heart. Silently,
    in a dream she had come to him after her death, her wasted body within
    its loose brown graveclothes giving off an odour of wax and rosewood,
    her breath, that had bent upon him, mute, reproachful, a faint odour of
    wetted ashes. Across the threadbare cuffedge he saw the sea hailed as
    a great sweet mother by the wellfed voice beside him. The ring of bay
    and skyline held a dull green mass of liquid. A bowl of white china had
    stood beside her deathbed holding the green sluggish bile which she had
    torn up from her rotting liver by fits of loud groaning vomiting.

    Prints

    C:\test>624296.pl
    1:its loose brown graveclothes giving off an odour of wax and rosewood,
    2:a great sweet mother by the wellfed voice beside him. The ring of bay

    The basic mechanism is to use a regex of the form (?=^.*\bword\b). That is a positive lookahead assertion that reads: starting at the beginning of the line, skip as much of anything as needed to locate the word 'word', delimited by word/non-word transitions (\b).

    As these are zero-length assertions, they do not advance the match position, so each subsequent one again starts from the beginning of the string. This allows matching any number of words in any order. If they all match, the regex succeeds and the AND operation is achieved.

    By generating the regex in a sub, the 'horrors' of the 'bunch of regex' can be hidden from the squeamish.

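    If you need case-insensitive matching, add the i flag where the regex is built (return qr[$re]i;). A qr// object keeps the flags it was compiled with, so adding /i only where the generated regex is used does not make the embedded lookaheads case-insensitive.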

    If you omit the ^, the lookaheads start from the current match position instead, so you can append the AND operation to a longer regex. However, continuing to match after the AND part succeeds is more involved.
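    A minimal sketch of appending the AND part to a longer regex, using made-up log lines: the lookaheads follow a literal ERROR: prefix instead of carrying their own ^, so both words must appear, in either order, somewhere after that prefix.

```perl
use strict;
use warnings;

# Made-up log lines for illustration
my @lines = (
    'ERROR: disk full on /var',
    'ERROR: full backup ran, disk ok',
    'ERROR: network down',
    'WARN: disk full',
);

# The lookaheads have no ^ of their own: each one scans forward from the
# position reached after 'ERROR:', and neither consumes any characters
my $re = qr{ ^ERROR: (?= .*\bdisk\b ) (?= .*\bfull\b ) }x;

my @hits = grep { $_ =~ $re } @lines;
print "$_\n" for @hits;    # prints the first two lines only
```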


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Ok, thanks everyone for the help!

      The regex approach was actually the fastest overall in the end, though that obviously depends on the input.
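      Because the winner depends on the input, it is worth timing both approaches on your own data. A rough sketch using the core Benchmark module's cmpthese; the word list and sample line are placeholders, and the lookahead regex is built the same way as reAnd above, but with the i flag applied when the qr// is compiled.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );
use List::Util qw( first );

# Placeholder data: substitute real words and a real line from your file
my @words = qw( an of and );
my $line  = 'its loose brown graveclothes giving off an odour of wax and rosewood,';

# Approach 1: a single regex of zero-length lookaheads, one per word
my $pattern = join '', map { '(?=^.*\b' . quotemeta($_) . '\b)' } @words;
my $and_re  = qr/$pattern/i;

# Approach 2: one precompiled regex per word, checked with List::Util::first
my @word_res = map { qr/\b\Q$_\E\b/i } @words;

sub by_regex { return $line =~ $and_re ? 1 : 0 }

# first returns the first regex that fails to match, or undef if all match;
# a qr// object is always true, so a plain truth test is safe here
sub by_first { return ( first { $line !~ $_ } @word_res ) ? 0 : 1 }

# Sanity check: both approaches must agree before timing them
die "results differ\n" unless by_regex() == by_first();

# Run each sub for at least 1 CPU second and print a comparison table
cmpthese( -1, { regex => \&by_regex, first => \&by_first } );
```

      The timings will vary with the machine, the number of words and how early a missing word typically turns up, so no particular result is implied here.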

      Unfortunately, I ran into a problem: I don't know how to handle regular expressions that contain a literal '.' character.

      For example, I may wish to search for a file extension.

        See perlre.

        /./ will match the character '.', but it also matches almost any other character (anything except a newline). What you likely want is to escape the dot so it loses its special meaning:

        "." =~ m!\.!

        Update: GrandFather spotted a missing ! at the end of the regular expression
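        When the search terms come from data, as with @words here, you don't have to escape each dot by hand: quotemeta, or the equivalent \Q...\E inside a pattern, escapes every regex metacharacter (BrowserUk's reAnd above already applies quotemeta to each word). A small sketch with made-up file names:

```perl
use strict;
use warnings;

my $ext   = '.txt';                        # search term containing a literal dot
my @files = qw( notes.txt notes_txt data.csv );

# Unescaped: '.' matches any character, so 'notes_txt' matches too
my @loose = grep { /$ext$/ } @files;

# \Q...\E escapes the dot, so only a real '.txt' ending matches
my @strict = grep { /\Q$ext\E$/ } @files;

print "loose:  @loose\n";     # notes.txt notes_txt
print "strict: @strict\n";    # notes.txt
```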

      You have to appreciate a site where James Joyce writes your sample data... ;-)

      BrowserUK++

Re: Need RegExp help - doing an AND match
by imp (Priest) on Jul 01, 2007 at 13:55 UTC
    Here's one way to do it with regex:
    use strict;
    use warnings;

    my $re = qr{ ^ (?=.*foo) (?=.*bar) }x;
    while (my $line = <DATA>) {
        print $line if $line =~ $re;
    }
    __DATA__
    foo
    bar
    foo bar
    bar foo
    abc foo bar
      Oh, hang on a second: I forgot that there actually was a valid reason why I wanted a regex.

      grep will still search the whole array @words, right?

      My original code terminates as soon as it fails to match one word in the array, which is what I wanted, as the file is huge.
      Ok, it's not the best code at all, but it doesn't have to search the entire array every time.

      I haven't benchmarked to check yet, but I guess grep might take longer.

      I was hoping for a regex that stops searching on a mismatch, so I think your second solution might be faster. I will check.

      Thanks!
        The List::Util module has a first function which may be what you want here instead of grep.
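        To see the difference asked about above, here is a small sketch that counts how many times each block runs. grep always visits every word, while List::Util's first (and all, provided by newer versions of List::Util) stop at the first word that settles the answer. The word list and line are made up.

```perl
use strict;
use warnings;
use List::Util qw( first all );

my @words = qw( missing foo bar baz );    # the first word is absent from the line
my $line  = 'foo bar baz';

my ( $grep_calls, $first_calls, $all_calls ) = ( 0, 0, 0 );

# grep tests every word, even after one is known to be missing
my @absent = grep { $grep_calls++; $line !~ /\Q$_\E/ } @words;

# first stops as soon as the block returns true (a missing word)
my $absent = first { $first_calls++; $line !~ /\Q$_\E/ } @words;

# all stops as soon as the block returns false (a missing word)
my $ok = all { $all_calls++; $line =~ /\Q$_\E/ } @words;

# prints: grep ran 4 times, first ran 1, all ran 1
print "grep ran $grep_calls times, first ran $first_calls, all ran $all_calls\n";
```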

Approved by imp