
Re: Match strings in order of character length, and remove the string from further processing

by Athanasius (Archbishop)
on May 01, 2019 at 07:37 UTC ( [id://1233219]=note )


in reply to Match strings in order of character length, and remove the string from further processing

Hello ScarletRoxanne,

Here’s an approach which replaces each phrase/term with a temporary marker, then removes stopwords, then replaces the markers with their original terms:

    use strict;
    use warnings;
    use Const::Fast;
    use Data::Dump;

    const my $DELIM => '\034';

    my %stops = map { lc $_ => 1 } qw( I am the of and you are );
    my @terms = ('manager of sales', 'chairman of the board');
    @terms = sort { length $b <=> length $a } @terms;    # longest first

    my $file3 = 'I am the Senior Manager of Sales and of Marketing. ' .
                'You are the Chairman of the Board of Directors.';
    $file3 =~ tr/A-Z/a-z/;                      # convert to lower case

    # replace terms with temporary markers
    $file3 =~ s{$terms[$_]}{$DELIM$_$DELIM}gi for 0 .. $#terms;

    my @file3 = split /\s+/, $file3;
    @file3 = grep { ! exists $stops{$_} } @file3;

    for my $entry (@file3)
    {
        if ($entry =~ /\Q$DELIM\E(\d+)\Q$DELIM\E/)
        {
            $entry = '*' . $terms[$1] . '*';
        }
        else
        {
            $entry =~ s{[[:punct:]]}{}g;        # remove punctuation
        }
    }

    print "$_\n" for @file3;

Output:

    17:35 >perl 1997_SoPW.pl
    senior
    *manager of sales*
    marketing
    *chairman of the board*
    directors

    17:35 >
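If the terms might contain regex metacharacters (say, 'c++ developer'), the same marker technique can be wrapped in a sub that escapes each term with \Q...\E. What follows is only a minimal sketch under assumptions, not part of the script above: the sub name protect_terms, its argument order, and the sample sentence are all hypothetical, and it assumes the delimiter character never occurs in the input.

```perl
use strict;
use warnings;

# Hypothetical helper (name and interface are assumptions): same marker
# technique, but each term is escaped so metacharacters match literally.
sub protect_terms {
    my ($text, $stops, @terms) = @_;
    my $delim = "\034";                                 # ASCII FS, assumed absent from the input
    @terms = sort { length $b <=> length $a } @terms;   # longest first
    $text  = lc $text;

    # replace each term with a numbered temporary marker
    $text =~ s{\Q$terms[$_]\E}{$delim$_$delim}gi for 0 .. $#terms;

    my @out;
    for my $word (split ' ', $text) {
        next if exists $stops->{$word};                 # drop stopwords
        if ($word =~ /\Q$delim\E(\d+)\Q$delim\E/) {
            push @out, '*' . $terms[$1] . '*';          # restore the original term
        }
        else {
            $word =~ s{[[:punct:]]}{}g;                 # remove punctuation
            push @out, $word if length $word;
        }
    }
    return @out;
}

my %stops = map { lc $_ => 1 } qw( I am the of and you are );
my @words = protect_terms(
    'I am the C++ Developer of the Team.',
    \%stops,
    'c++ developer',
);
print "$_\n" for @words;
```

With that sample sentence this should print *c++ developer* and team on separate lines; without the \Q...\E, the unescaped '++' would be rejected as a nested quantifier when the pattern is compiled.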

Hope that helps,

Athanasius <°(((>< contra mundum
Iustus alius egestas vitae, eros Piratica,
