in reply to Re: Regex solution needed
in thread Regex solution needed

You beat me to whipping up a solution, but plugging your regex into my test script shows that it works for the test cases I was able to think up:

#!/usr/bin/perl
use strict;
use warnings;

my @input = (
    "cocks are roosters!",
    "my cocks crow at dawn",
    "i'm a fan of the cocks",
    "game cocks all the way!",
    "do you like the gamecocks?"
);

# One entry per dictionary word, each with its own precompiled regex.
my $vulgar_list = {
    cocks => { regex => qr/(?<!a)(?<!the)(?<!game)(?<!\s)\s*cock/ }
};

my $foundvulgar = '';
for my $input (@input) {
    study($input);
    foreach my $word (keys %$vulgar_list) {
        my $regex = $vulgar_list->{$word}->{regex};
        if ($input =~ m/$regex/) {
            $foundvulgar = $word;
            last;
        }
    }
    print "phrase: $input found?: $foundvulgar\n";
    $foundvulgar = '';    # reset before the next phrase
}

__OUTPUT__
phrase: cocks are roosters! found?: cocks
phrase: my cocks crow at dawn found?: cocks
phrase: i'm a fan of the cocks found?:
phrase: game cocks all the way! found?:
phrase: do you like the gamecocks? found?:

s**lil*; $*=join'',sort split q**; s;.*;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.*,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$*,$/)

Re^3: Regex solution needed
by spivey3587 (Acolyte) on Feb 23, 2007 at 19:47 UTC
    Thanks for all the input, guys. I enhanced the initial suggestion to include a few more lookahead possibilities, which I think cover enough bases for me to keep the word in the dictionary. I omitted the dirty test cases since these are "sacred" boards, but they did get caught correctly :)

    my @tests = (
        "How bout them cocks?",
        "I'm a big cocks fan",
        "I love the cocks",
        "That cocks game was sweet",
        "Anyone know the cocks score from last night?",
        "gamecocks rule",
        "I love the gamecocks, but...",
        "My favorite cocks player is..."
    );

    foreach my $s (@tests) {
        if ($s =~ /
                (?<! \b a    )
                (?<! \b the  )
                (?<! \b them )
                (?<! \b game )
                (?<! \s )
                \s* cocks? \b
                (?! \s fan    )
                (?! \s game   )
                (?! \s score  )
                (?! \s player )
            /x) {
            print "Vulgar: '$s'\n";
        }
    }

      The \s in each lookahead should probably be \s+, and

      (?! \s+ fan ) (?! \s+ game ) (?! \s+ score ) (?! \s+ player )
      is slower than
      (?! \s+ (?: fan | game | score | player ) )

      This factors out the constant \s+, and it uses | which probably has a lower overhead than (?!...). Furthermore, alternations of constant strings can be highly optimized by re engine modifications demerphq added to 5.9. (I don't think those particular strings can be optimized, though.)
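
For what it's worth, here is a minimal sketch of the pattern with both suggestions applied: \s+ in the lookahead, and the four lookaheads merged into one alternation. The names $vulgar_re and is_flagged are made up for illustration and appear nowhere in the posts above. The first test phrase has extra spaces before "player", which is the case a single \s would let slip through.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: same lookbehinds as the earlier post, but with a single
# negative lookahead that uses \s+ and an alternation of the allowed words.
# $vulgar_re and is_flagged are hypothetical names for this example.
my $vulgar_re = qr/
    (?<! \b a    )
    (?<! \b the  )
    (?<! \b them )
    (?<! \b game )
    (?<! \s )
    \s* cocks? \b
    (?! \s+ (?: fan | game | score | player ) )
/x;

# Returns 1 if the phrase should be flagged, 0 otherwise.
sub is_flagged {
    my ($s) = @_;
    return $s =~ $vulgar_re ? 1 : 0;
}

for my $s ( "My favorite cocks   player is...", "cocks are roosters!" ) {
    print "'$s' => ", ( is_flagged($s) ? "flagged" : "ok" ), "\n";
}
```

Wrapping the match in a small sub makes it easy to run the same phrases through the original four-lookahead version and compare results before swapping it into the dictionary.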