in reply to How to identify invalid reg. expr.?
I have the following code in a CGI thing I've been working on off-and-on:
my $searchstring = $obj->param('words');
eval { no warnings; "" =~ /$searchstring/};
if ($@) { ... }
Something like "***" will raise some flags, since "*" is a quantifier and must follow something other than another quantifier. A single left paren or an unclosed character class (a missing right bracket) will also do it. I'm sure there are other malformed regexen that will trip this check.
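For anyone who wants to try the check outside the CGI script, here is a minimal sketch of the same idea wrapped in a function. The name regex_error and the sample patterns are made up for illustration; it compiles the pattern with qr// inside a block eval rather than matching against an empty string, which should trap the same compile-time errors.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): returns the compile
# error for a pattern, or undef if the pattern compiles cleanly.
sub regex_error {
    my ($pattern) = @_;
    return 'no pattern given' unless defined $pattern;

    # qr// compiles the pattern without matching anything; the block eval
    # traps the fatal error Perl raises for a malformed pattern.
    my $ok = eval { no warnings; my $re = qr/$pattern/; 1 };
    return $ok ? undef : $@;
}

# Sample inputs, including the broken ones mentioned above.
for my $candidate ('foo.*bar', '***', '(', '[abc') {
    my $err = regex_error($candidate);
    printf "%-10s %s\n", $candidate, defined $err ? 'invalid' : 'ok';
}
```

This only tells you whether the pattern compiles. It says nothing about whether the pattern is safe or cheap to run.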
I should point out that I got the idea for using that method from here somewhere, but I can't find the node in question. I also don't know the ins and outs of eval (block vs. string, for example), but this solution works for me. YMMV. HTH.
--
There are 10 kinds of people -- those that understand binary, and those that don't.
Re: Re: How to identify invalid reg. expr.?
by samtregar (Abbot) on Jun 05, 2002 at 18:40 UTC
**** sound of gongs and security buzzers going off, ****
**** red lights flashing, robots running in circles ****
That code is one heck of a security vulnerability waiting to happen. What happens when I pass (?{ dump }) as the contents of the words parameter?
Of course, you might have protected this functionality from untrusted access, but since I don't know that you have, I think a warning is in order.
-sam
At my insistence on the P5P mailing list, the inline-execute feature was restricted to prevent this. From perldoc perlre:
For reasons of security, this construct is for-
bidden if the regular expression involves run-
time interpolation of variables, unless the per-
ilous "use re 'eval'" pragma has been used (see
the re manpage), or the variables contain
results of "qr//" operator (see the
qr/STRING/imosx entry in the perlop manpage).
This restriction is because of the wide-spread
and remarkably convenient custom of using run-
time determined strings as patterns. For exam-
ple:
$re = <>;
chomp $re;
$string =~ /$re/;
Before Perl knew how to execute interpolated
code within a pattern, this operation was com-
pletely safe from a security point of view,
although it could raise an exception from an
illegal pattern. If you turn on the "use re
'eval'", though, it is no longer secure, so you
should only do so if you are also using taint
checking. Better yet, use the carefully con-
strained evaluation within a Safe module. See
the perlsec manpage for details about both these
mechanisms.
I forced the issue when Ilya was initially hesitant by saying that I would have a
CERT warning prepared against Perl 5.6.0 if this feature went in without the
restriction, as it would open up holes worldwide to many naive sites.
-- Randal L. Schwartz, Perl hacker
Here we go: Just to verify, I just changed it to
eval { use re 'eval'; no warnings; "" =~ /$searchstring/ };
Run from the browser, it generates an internal server error (the log complains of premature end of script headers, regardless of how the script is run) but no core dump. When run from the command line, the script is aborted and does dump core. Umm, yay. (Maybe the lack of a core dump is due to Apache? I'll look into it.)
So, I'll consider removing the regex search completely, or building one from pieces as samtregar mentioned in a previous post. (I already have AND and ANY searches, though. We'll see what I can come up with.) Thanks for the tips, folks.
--
There are 10 kinds of people -- those that understand binary, and those that don't.
That solves the worst problem, certainly, but what about a denial-of-service attack? It's not hard to craft a regex that won't be solved before the heat death of the universe. Or one that crashes Perl through stack exhaustion.
-sam
104: if ($@) {
DB<2> x $@
0 '/(?{ dump })/: Eval-group not allowed at runtime, use re \'eval\' at dbsearch.pl line 103.
'
DB<3>
I have no idea what this means. Like I said in my earlier post, I don't know the finer points of using eval, or what's causing it to not dump core. Anyway, how can I make this safer? (I plan to post the entire script for a review one of these years, after I tweak one or two more things, and find a place to host the script.)
--
There are 10 kinds of people -- those that understand binary, and those that don't.
Interesting. I didn't know you couldn't eval code in a regex at run-time. Well, even without that I could still hog your CPU by passing it a regex with an exponential solving time.
As far as what you can do: don't accept a regex from an untrusted user. I don't believe there's any way to fully validate the friendliness of a regex. Maybe you could offer your users a set of pre-canned searches ("full-word search", "phrase search", "starts with", "ends with", etc.), then use the input to build the appropriate regex with \Q$term\E to quarantine the input.
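A rough sketch of what that could look like, assuming the script gets a search mode and a term from the form. The function name build_search_regex and the mode names are invented here, and quotemeta is used because it does the same escaping as \Q...\E.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex from a pre-canned search mode and an
# untrusted term. The term is escaped, so it is only ever matched literally.
sub build_search_regex {
    my ($mode, $term) = @_;
    my $quoted = quotemeta $term;    # same escaping as \Q$term\E

    # Illustrative mode names; only these templates can reach the regex engine.
    my %pattern_for = (
        full_word   => "\\b$quoted\\b",
        phrase      => $quoted,
        starts_with => "\\A$quoted",
        ends_with   => "$quoted\\z",
    );
    die "unknown search mode: $mode\n" unless exists $pattern_for{$mode};

    my $pattern = $pattern_for{$mode};
    return qr/$pattern/i;
}

# The quantifiers in the term are matched as plain characters.
my $re = build_search_regex('phrase', '***');
print "matched\n" if 'a***b' =~ $re;
```

Since the user never supplies regex syntax, the shape of every pattern is fixed by the script, so neither the eval-group trick nor a hand-crafted exponential pattern can get through.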
-sam