PerlMonks  

regexp exclude a string

by Murcia (Monk)
on Jan 19, 2006 at 13:31 UTC ( [id://524212]=perlquestion )

Murcia has asked for the wisdom of the Perl Monks concerning the following question:

Hi Confreres,

I want to find all files in a dir that have green or red or blue in the text, BUT not only as part of the string "green red blue"!

# the quotes are only there to show that this is one string!
# the string "green red blue" can occur several times in the text

When "green", "red" or "blue" is in the text on its own, together with the string "green red blue" anywhere in the text, then the file counts too!

example?

text1:
you can use green red blue.
the mountain was green!
yes, use green red blue.

text2:
you can use green red blue.
the mountain was more a hill!

text3:
you can use colors.
the mountain was blue!

text1, text2 and text3 are in one dir; the result should be: text1 text3

Replies are listed 'Best First'.
Re: regexp exclude a string
by BrowserUk (Patriarch) on Jan 19, 2006 at 14:06 UTC

    If you're on unix, you don't need the BEGIN{} block, and you'll need to use ^D instead of ^Z. This assumes that any occurrences of 'green red blue' are as shown in the examples, and not split across lines:

    P:\test>perl -ln - text*
    BEGIN{ @ARGV = map glob, @ARGV}
    END{ print for sort keys %list }
    s[green red blue][]g;
    $list{ $ARGV }=1 if m[\b(green|red|blue)\b];
    ^Z

    If the three words could be split across lines, and your files are not huge, then you could use this instead:

    P:\test>perl -0777 -ln - text*
    BEGIN{ @ARGV = map glob, @ARGV}
    END{ $\="\n"; print for sort keys %list }
    s[green\s+red\s+blue][]sg;
    $list{ $ARGV }=1 if m[\b(green|red|blue)\b];
    ^Z
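The strip-then-search idea above can also be written as a small standalone subroutine, which may be easier to test than a one-liner. This is only a sketch: the name `has_lone_colour` is made up for illustration, and it assumes the whole file content is handed over as one string.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): true if $text contains
# green, red or blue as a word somewhere outside the phrase
# "green red blue".
sub has_lone_colour {
    my ($text) = @_;
    # Work on a copy; drop every occurrence of the phrase, allowing any
    # whitespace (including newlines) between the three words.
    (my $rest = $text) =~ s/green\s+red\s+blue//g;
    return $rest =~ /\b(?:green|red|blue)\b/ ? 1 : 0;
}

# The three sample texts from the question.
my %sample = (
    text1 => "you can use green red blue.\nthe mountain was green!\nyes, use green red blue.\n",
    text2 => "you can use green red blue.\nthe mountain was more a hill!\n",
    text3 => "you can use colors.\nthe mountain was blue!\n",
);

print "$_\n" for grep { has_lone_colour($sample{$_}) } sort keys %sample;
```

With the samples from the question this prints text1 and text3, which is the result the original poster asked for.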

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: regexp exclude a string
by blazar (Canon) on Jan 19, 2006 at 13:48 UTC

    Often two regexen are better than one. Possible strategy: search for the strings you want and set a flag. Also check for the string you don't want and forget about that file immediately if you find it. (You may optimize it not to try to match the "good ones" if one has already been found.) Print the name of the file if the flag is set.

    Update: minimal example with glob. If you want/need to use File::Find or one of its relatives, then adapt as needed. Hack && improve at will.

    # untested
    FILE: for my $file (glob '*.txt') {   # named variable: while (<$fh>) overwrites $_
        open my $fh, '<', $file or die "$file: D'Oh! $!\n";
        my $found;
        while (<$fh>) {
            /green|red|blue/ and $found++ if !$found;
            next FILE if /green red blue/;
        }
        print $file if $found; # $\ eq "\n"
    }

    Or else, if you trust in advance that your files are not huge, you may slurp them in all at once and simply

    print if /green|red|blue/ and !/green red blue/;

    But do not cargo cult that into a (bad) habit: slurping is generally not recommended.
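For reference, one common way to slurp is to localize `$/` inside a block, so that the change does not leak into other code. The sketch below applies the two-regex test above to each slurped file. The helper name `slurp` and the `*.txt` pattern are assumptions for illustration. Like the one-line test above, it rejects any file that contains "green red blue" at all.

```perl
use strict;
use warnings;

# Hypothetical helper: return the whole content of $path as one string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "$path: $!\n";
    local $/;              # undef: read to end of file in one go
    my $content = <$fh>;
    close $fh;
    return defined $content ? $content : '';
}

for my $file (glob '*.txt') {
    my $text = slurp($file);
    print "$file\n" if $text =~ /green|red|blue/ and $text !~ /green red blue/;
}
```

Because `$/` is localized inside `slurp`, line-by-line reads elsewhere in the program keep working as usual.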

      #!/usr/bin/perl

      FILE: for my $file (glob '*.txt') {
          open my $fh, '<', $file or die "$file: D'Oh! $!\n";
          while (<$fh>) {
              /(?=^(?:(?!green red blue).)*$).*?(green|red|blue)/i;
              do { print "$file \n"; next FILE; } if $1;
          }
      }

        (Any good reason why the shebang line is out of the code block?)

        If you find that more readable... but if you knew, then why did you need to ask in the first place? In the meantime your regex has grown complex enough, and I am tired enough, that I won't even try to understand it. ;-)
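For readers who would rather not decode that regex by hand, here is the same pattern spelled out with the /x modifier, as I read it. The sub name `colour_on_clean_line` is made up for this sketch. It works line by line: the lookahead vetoes any line that contains "green red blue" anywhere, so a line holding both a lone colour and the phrase is skipped as well.

```perl
use strict;
use warnings;

# The pattern from the post above, spelled out with /x (my reading of it).
my $re = qr{
    (?=                              # look ahead without consuming:
        ^                            #   we are at the start of the line
        (?: (?!green\ red\ blue) . )*  #   no position up to the line end
        $                            #   begins the phrase green red blue
    )
    .*? (green|red|blue)             # then capture the first colour word
}xi;

# Hypothetical helper: the colour found on a phrase-free line, else undef.
sub colour_on_clean_line {
    my ($line) = @_;
    return $line =~ $re ? $1 : undef;
}

for my $line ("the mountain was green!\n", "you can use green red blue.\n") {
    my $hit = colour_on_clean_line($line);
    print defined $hit ? "hit: $hit\n" : "no hit\n";
}
```

The first line yields a hit on green and the second yields none, since the phrase is present on it.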

Re: regexp exclude a string
by turo (Friar) on Jan 19, 2006 at 15:06 UTC

    $headache++
    $myTime--
    $problem->{description}--
    ...

    perl -e 'undef $/; map { $c=0; $file=$_; open FH,$file; $_=<FH>;
      1 while m/(?:green red blue)|(?:(green|red|blue)(?{$c++}))/g;
      close FH; print "$file matches!\n" if $c } glob("*")'
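The trick in the one-liner above is to make the unwanted phrase one branch of the alternation, so that the regex engine consumes it whole before the single words get a chance to match inside it. A more verbose sketch of the same idea, without the `(?{ })` code block, counts only the matches in which the capture group took part. The sub name `count_lone_colours` is made up here. As in the one-liner, there are no word boundaries, so a word like "reduce" would count too.

```perl
use strict;
use warnings;

sub count_lone_colours {
    my ($text) = @_;
    my $count = 0;
    while ($text =~ /green red blue|(green|red|blue)/g) {
        # $1 is defined only when the second branch matched,
        # i.e. a colour outside the phrase.
        $count++ if defined $1;
    }
    return $count;
}

print count_lone_colours("use green red blue, the hill was green and red\n"), "\n";
```

This prints 2: the phrase itself is skipped, while the later "green" and "red" are counted.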
    perl -Te 'print map { chr((ord)-((10,20,2,7)[$i++])) } split //,"turo"'
      $headache++ # take an aspirin
      $murciaTime++
      $problem->{description}-- # how to do it better?
      #...
      #Thanks!
Re: regexp exclude a string
by murugu (Curate) on Jan 19, 2006 at 14:18 UTC

    Hi Murcia,

    If I understand your question correctly, here is my code.

    # get the files under a directory using readdir or glob.
    my @files = qw(text1 text2 text3); # filenames in an array.
    my @found;
    foreach my $file (@files) {
        open (my $FH, '<', $file) or die ("unable to open $file: $!\n");
        while (<$FH>) {
            if (m/green|blue|red/ && !m/green red blue/) {
                push @found, $file;
                last;
            }
        }
        close $FH;
    }
    print $_, $/ for (@found);

    Regards,
    Murugesan Kandasamy
    use perl for(;;);

      And if I understand his question correctly, this will fail if he has "this is green and yellow" on one line in a file and "green red blue" on the next one.
