PerlMonks  

grep match variable position

by Anonymous Monk
on May 16, 2016 at 11:33 UTC ( [id://1163128]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks

I use the following to discard stopwords, which are saved in an array, from my process

    my $word = "of";
    my @stopwords = ("a", "the", "of");
    if (grep /$word/i, @stopwords) {
        print "Stopword $word discarded\n";
    }
    else {
        print "$word is not a stop word\n";
    }

This works fine if $word contains only one word, since the match can then occur anywhere. It becomes complicated if $word is made up of several words, because whether $word counts as a stopword should then depend on the position of the match. What I need is to control the grep match exactly, namely:

the position of the match, i.e. where the stopword is found in $word, for example at the beginning or at the end of $word. With $word="of the blue" the match is true, with $word="the blue of" the match is true, with $word="out of blue" the match is not true.

No idea where to start. I would be glad if someone could give me a hint to work on.

Replies are listed 'Best First'.
Re: grep match variable position
by Laurent_R (Canon) on May 16, 2016 at 13:28 UTC
    Hi,

    It will be easier and most probably more efficient to store your words in a hash and check for existence of your candidates in the hash.

    Something like this:

    my $word = "of";
    my %stopwords = map { $_ => 1 } qw/ a the of /;
    if (exists $stopwords{$word}) {
        print "Stopword $word discarded\n";
    }
    else {
        print "$word is not a stop word\n";
    }
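The hash lookup above handles a single word. For a multi-word $word, one possible extension (a sketch only, not part of the reply above) is to split the phrase on whitespace and look up just its first and last words. The helper name is_stop_phrase is hypothetical, and lowercasing via lc is an assumption about the desired case handling.

```perl
use strict;
use warnings;

my %stopwords = map { $_ => 1 } qw/ a the of /;

# Hypothetical helper: true if the first or the last word of the phrase
# is a stopword (compared case-insensitively by lowercasing the phrase).
sub is_stop_phrase {
    my ($phrase) = @_;
    my @words = split ' ', lc $phrase;
    return 0 unless @words;    # empty phrase: nothing to match
    return ( exists $stopwords{ $words[0] } || exists $stopwords{ $words[-1] } ) ? 1 : 0;
}

for my $phrase ("of the blue", "the blue of", "out of blue") {
    print is_stop_phrase($phrase)
        ? "Stopword $phrase discarded\n"
        : "$phrase is not a stop word\n";
}
```

Because whole words are compared, a phrase such as "often blue" is not treated as starting with "of".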
Re: grep match variable position
by Marshall (Canon) on May 16, 2016 at 14:38 UTC
    You can just refine the regex within the grep. In this case using the block form of grep is I think better... If the stop word is either at the beginning or at the end of the "$word", then it is a match.
    #!/usr/bin/perl
    use warnings;
    use strict;

    my @words = ("of the blue", "the blue of", "out of blue");
    my @stopwords = ("a", "the", "of");

    foreach my $word (@words) {
        if (grep { $word =~ /^$_/i or $word =~ /$_$/i } @stopwords) {
            print "Stopword $word discarded\n";
        }
        else {
            print "$word is not a stop word\n";
        }
    }
    __END__
    Stopword of the blue discarded
    Stopword the blue of discarded
    out of blue is not a stop word
Re: grep match variable position
by AnomalousMonk (Archbishop) on May 16, 2016 at 14:53 UTC

    Yet another, and more general, approach:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my @stopwords= qw(a the of);
     my ($stopper) =
       map qr{ (?i) $_ }xms,
       join q{ | },
       reverse sort
       map quotemeta,
       @stopwords
       ;
     print $stopper;
     ;;
     my $l_bound = qr{ \A }xms;
     my $r_bound = qr{ \z }xms;
     ;;
     for my $string (
       'of sky blue', 'all blue of', 'out of blue', @ARGV,
       ) {
       printf qq{'$string' };
       if ($string =~ m{ $l_bound $stopper | $stopper $r_bound }xms) {
         print 'has stopper pattern';
         }
       else {
         print 'has no stopper';
         }
       }
    " of the a off and
    (?^msx: (?i) the | of | a )
    'of sky blue' has stopper pattern
    'all blue of' has stopper pattern
    'out of blue' has no stopper
    'of' has stopper pattern
    'the' has stopper pattern
    'a' has stopper pattern
    'off' has stopper pattern
    'and' has stopper pattern
    You may not need such generality.
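In the output above, 'off' and 'and' are reported as matches because 'of' and 'a' match as prefixes of those strings. If whole-word matching is wanted instead, one variation might wrap the alternation in \b anchors. This is only a sketch under that assumption; the name has_stopper is hypothetical and not from the reply.

```perl
use strict;
use warnings;

my @stopwords = qw(a the of);

# Build one case-insensitive alternation from the escaped stopwords,
# bounded by \b so that only whole words count.
my $alternation = join '|', map { quotemeta } @stopwords;
my $stopper     = qr{ \b (?: $alternation ) \b }xmsi;

# Hypothetical helper: true if a whole stopword sits at the very start
# or the very end of the string.
sub has_stopper {
    my ($string) = @_;
    return $string =~ m{ \A $stopper | $stopper \z }xms ? 1 : 0;
}

for my $string ('of sky blue', 'all blue of', 'out of blue', 'off', 'and') {
    printf "'%s' %s\n", $string,
        has_stopper($string) ? 'has stopper pattern' : 'has no stopper';
}
```

With the word boundaries in place, 'off' and 'and' should no longer be reported as containing a stopper, while 'of sky blue' and 'all blue of' still are.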


    Give a man a fish:  <%-{-{-{-<

Re: grep match variable position
by QuillMeantTen (Friar) on May 16, 2016 at 11:56 UTC

    Just my two cents here,
    for each $word why don't you create a custom regex array instead of just a stopword array? You might like reading this regarding the proper storage of your regexes.
    Your standard regex could look something like this:

    #!/usr/bin/perl
    my $string = $ARGV[0];
    if ($string =~ m#(\Aof|of\z)#) {
        print "matched out at start or end\n";
    }
    tested on my machine.
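The "regex array" idea can be generalized from the single hard-coded 'of' to a whole stopword list. The following is a minimal sketch, assuming one precompiled qr// pattern per stopword and case-insensitive matching; the names @patterns and matches_edge are hypothetical.

```perl
use strict;
use warnings;

my @stopwords = qw(a the of);

# One precompiled pattern per stopword: anchored at the very start
# or at the very end of the string. \Q...\E escapes any metacharacters.
my @patterns = map { qr/(?:\A\Q$_\E|\Q$_\E\z)/i } @stopwords;

# Hypothetical helper: returns the number of patterns that match
# (0 means no stopword at either edge).
sub matches_edge {
    my ($string) = @_;
    return scalar grep { $string =~ $_ } @patterns;
}

my $string = @ARGV ? $ARGV[0] : 'of the blue';
print matches_edge($string)
    ? "matched a stopword at start or end\n"
    : "no stopword at start or end\n";
```

Like the pattern in the reply, this matches substrings rather than whole words, so a string such as "often" also counts as starting with "of".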

Re: grep match variable position
by hippo (Bishop) on May 16, 2016 at 15:45 UTC

    TIMTOWTDI: with index

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Test::More tests => 5;

    my @yesmatch  = ('of', 'of the blue', 'the blue of', 'ending with of');
    my @nomatch   = ('out of blue');
    my @stopwords = qw/a of the/;

    for my $phrase (@yesmatch) {
        ok (stopped($phrase, \@stopwords), "$phrase stopped");
    }
    for my $phrase (@nomatch) {
        ok ((not stopped($phrase, \@stopwords)), "$phrase not stopped");
    }

    sub stopped {
        my ($phrase, $sw) = @_;
        my $rev;
        for my $word (@$sw) {
            return 1 unless index ($phrase, $word);
            $rev //= reverse $phrase;
            return 1 unless index ($rev, reverse $word);
        }
        return 0;
    }

    Note that "the blue of" starts with "the" and so doesn't really test the end-matching so I've added "ending with of" to the test patterns.
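The index approach relies on index returning 0 when the stopword sits at position 0, so `unless index(...)` is true only for a match at the very start; reversing both strings turns the end check into another start check. An alternative sketch, not from the reply, does the same start-or-end test with substr comparisons instead of reversing. The name stopped_substr is hypothetical, and the comparison is case-sensitive, as with index.

```perl
use strict;
use warnings;

# Hypothetical helper: true if any stopword is a prefix or a suffix
# of the phrase, compared with substr rather than index/reverse.
sub stopped_substr {
    my ($phrase, $sw) = @_;
    for my $word (@$sw) {
        my $len = length $word;
        # Skip empty stopwords and stopwords longer than the phrase.
        next if $len == 0 || $len > length $phrase;
        return 1 if substr($phrase, 0, $len) eq $word;    # prefix
        return 1 if substr($phrase, -$len) eq $word;      # suffix
    }
    return 0;
}

my @stopwords = qw/a of the/;
for my $phrase ('of the blue', 'ending with of', 'out of blue') {
    printf "%s: %s\n", $phrase,
        stopped_substr($phrase, \@stopwords) ? 'stopped' : 'not stopped';
}
```

This avoids building a reversed copy of the phrase, at the cost of two substr calls per stopword.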

Re: grep match variable position
by jdporter (Paladin) on May 18, 2016 at 13:08 UTC

    This question has been asked and answered here a few times in the past: Super Search it. I especially like bobf's solution using CPAN modules.

    PS - I do like to see people re-inventing the wheel, because it means they're exercising their technical creativity, and sometimes a novel solution shows up!

    I reckon we are the only monastery ever to have a dungeon stuffed with 16,000 zombies.
Re: grep match variable position
by Anonymous Monk on May 16, 2016 at 16:10 UTC

    Thank you very much for the suggestions, just brilliant!

Front-paged by Corion