Word Exclusion Regex (was Re: regex problem)
by japhy (Canon) on Feb 09, 2002 at 21:46 UTC
Well, a general principle is this:
# $re = exclude(@words);
sub exclude {
    # Group the (escaped) tail of each word under its (escaped) first character.
    my %words;
    push @{ $words{ quotemeta substr($_, 0, 1) } },
        quotemeta substr($_, 1)
        for @_;

    # Any run of characters that cannot start an excluded word.
    my $first = "[^@{[ join '', keys %words ]}]*";

    # A starting character, as long as the rest of an excluded word does not follow.
    my $rest =
        join "|",
        map "$_(?!" . join("|", @{ $words{$_} }) . ")",
        keys %words;

    return qr/^$first(?:(?:$rest)$first)*$/;
}

my $re = exclude(qw( this that those ));
# print $re;  # for debugging purposes
for ("I like this", "give me that one", "these rock!") {
    print "$_ => " . /$re/ . "\n";
}
_____________________________________________________
Jeff[japhy]Pinyan:
Perl,
regex,
and perl
hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
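For comparison, here is a minimal sketch of a simpler (and probably slower) way to get the same effect: an anchored negative look-ahead over an alternation of the escaped words. This is not japhy's technique, and the name exclude_lookahead is made up for illustration. It assumes "excluded" means "does not appear anywhere in the string", including inside longer words.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a regex that matches
# only strings containing none of the given words.
sub exclude_lookahead {
    my @words = @_;
    return qr/^/ unless @words;    # nothing to exclude: match everything

    # Escape each word so metacharacters are treated literally.
    my $alt = join '|', map { quotemeta($_) } @words;

    # Anchored at the start, so the look-ahead inspects the whole string.
    # /s lets "." cross newlines in multi-line input.
    return qr/^(?!.*(?:$alt))/s;
}

my $re = exclude_lookahead(qw( this that those ));
for my $s ("I like this", "give me that one", "these rock!") {
    printf "%s => %d\n", $s, ($s =~ $re ? 1 : 0);
}
# I like this => 0
# give me that one => 0
# these rock! => 1
```

The look-ahead rescans the string from the start, whereas the regex built by exclude() walks the string once, which is presumably why it is put together the way it is.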
(?-xism:^[^p c d]*(?:(?:p(?!ig)|c(?!at)|d(?!og))[^p c d]*)*$)
dog =>
cat =>
pig =>
owl => 1
ddog =>
ccat =>
ppig =>
pdog =>
pcat =>
elephant =>
ppppcatgggg =>
-Blake
Re: regex problem
by Juerd (Abbot) on Feb 09, 2002 at 21:24 UTC
Use a negative look-ahead assertion; see perlre for details.
$pattern = '(?!.*foo)';
It is, however, very expensive. If changing the code is possible, do so.
2;0 juerd@ouranos:~$ perl -e'undef christmas'
Segmentation fault
2;139 juerd@ouranos:~$
Yikes! /(?!.*foo)/ matches ALL strings! Unanchored, the look-ahead can always succeed at some position past the last 'foo' (at the very end of the string, if nowhere else), so I'm not sure this pattern is meaningful. /bar(?!foo)/ matches strings containing a 'bar' that isn't immediately followed by 'foo'.
The problem is that matching a negative pattern is subtle. There's a reason for !~. The person who posed this question is S.O.L.: the application needs to branch to handle negative patterns. That is the easiest and most readable solution.
The other thing to consider is how this pattern is formed at all. If the poster is getting these patterns off the command line or through CGI, it's unwise to run them directly. Remember that Perl regexes can (now) execute arbitrary Perl code. For instance:
# don't run this
$pattern = '(?{`rm -rf *`})';
$str =~ /$pattern/;
The beauty here is that no match is required for the perl code to be run.
What's my point? The poster needs better control over incoming patterns anyway, so adding a little branching logic for negative matches shouldn't be burdensome.
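A small sketch of the difference between the unanchored look-ahead, an anchored one, and plain !~. The three sub names are made up for illustration, and 'foo' stands in for whatever word is being excluded.

```perl
use strict;
use warnings;

# Hypothetical helpers, one per way of asking "does this string lack 'foo'?"
sub unanchored { $_[0] =~ /(?!.*foo)/  ? 1 : 0 }   # succeeds at some position in any string
sub anchored   { $_[0] =~ /^(?!.*foo)/ ? 1 : 0 }   # look-ahead is only tried at the start
sub negated    { $_[0] !~ /foo/        ? 1 : 0 }   # branch on !~ instead

for my $s ('foo bar', 'bar baz', 'xfoo') {
    printf "%-8s unanchored=%d anchored=%d negated=%d\n",
        $s, unanchored($s), anchored($s), negated($s);
}
# foo bar  unanchored=1 anchored=0 negated=0
# bar baz  unanchored=1 anchored=1 negated=1
# xfoo     unanchored=1 anchored=0 negated=0
```

Note that "." does not match a newline by default, so the anchored form would need /s to cover multi-line strings. The !~ form has no such caveat, which is one more argument for branching.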
Your regex there is safe, unless you've turned on use re 'eval'. Perl refuses to run code blocks such as (?{ ... }) in a pattern that is interpolated from a variable at run time.
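A harmless way to see that refusal, as a sketch: the embedded code here only prints, try_match is a made-up helper, and the exact wording of the error may differ between Perl versions.

```perl
use strict;
use warnings;

# Hypothetical untrusted input. The embedded code only prints; it deletes nothing.
my $pattern = '(?{ print "embedded code ran\n" })';

# Returns 1 if the match could be attempted, 0 if Perl died while
# compiling the interpolated pattern. The reason is left in $@.
sub try_match {
    my ($str, $pat) = @_;
    return eval { my $matched = $str =~ /$pat/; 1 } ? 1 : 0;
}

if (try_match('anything', $pattern)) {
    print "pattern was compiled and run\n";
}
else {
    # Without "use re 'eval'" in scope, this branch is the one taken,
    # with an "Eval-group not allowed at runtime" error in $@.
    print "refused: $@";
}
```

This protection only covers embedded code. A hostile pattern can still be slow or match things you did not intend, so checking incoming patterns remains worthwhile.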
Re: regex problem
by djantzen (Priest) on Feb 09, 2002 at 21:22 UTC
Well, there are two differences between your examples -- the use of variables and the switch from !~ to =~. Is the latter a typo? If not, then you can accomplish the same thing by using [^$pattern].
In any case, your real question is whether regexes can be constructed, and the answer is yes. It's simply a matter of writing it as a normal string, and using perlfunc:eval to compile at runtime. (Note, you may have to do some character escaping (using '\') in your original string if it gets at all complicated.) For example:
my $string = "$foo|$bar";
for ( list ) { eval !~ /$string/; }
As far as expense goes, I can't see any reason why this would be particularly taxing.
Update: Juerd++ for the correction regarding [^$pattern]. My mistake.
If not, then you can accomplish the same thing by using [^$pattern].
[^$pattern] is misleading. [] creates a character class, so [^$characters] would be a better example. That negates a set of single characters, though, not a regex.
It's simply a matter of writing it as a normal string, and using perlfunc:eval to compile at runtime.
That requires parsing. Parsing is bad, because it's too easy to do it the wrong way. This too doesn't really answer the question.
eval !~ /$string/;
Unless you meant eval($_) !~ /$string/;, and I'm sure you didn't, that's wrong.
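For what it's worth, here is a minimal sketch of building a pattern from strings without eval, using qr// and \Q...\E, and of what a negated character class actually does. The values of $foo, $bar and @list are made up for illustration.

```perl
use strict;
use warnings;

my ($foo, $bar) = ('foo', 'b.r');    # hypothetical search terms

# \Q...\E escapes metacharacters, so the "." in $bar is matched literally.
# qr// compiles the pattern once; no string eval is involved.
my $re = qr/\Q$foo\E|\Q$bar\E/;

my @list = ('has foo', 'has b.r', 'has bar', 'nothing');
my @kept = grep { $_ !~ $re } @list;    # keep the strings that do NOT match
print "kept: @kept\n";
# kept: has bar nothing

# A negated character class excludes single characters, not whole words:
# [^foo] means "one character that is neither f nor o".
my $class_matches = ('foo bar' =~ /[^foo]/) ? 1 : 0;
print "class matches 'foo bar': $class_matches\n";
# class matches 'foo bar': 1
```

The character class matches 'foo bar' because the space (or the b, a, r) satisfies it, even though the string plainly contains 'foo'.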