PerlMonks  

Re: How to identify invalid reg. expr.?

by mephit (Scribe)
on Jun 05, 2002 at 17:11 UTC ( [id://171915] )


in reply to How to identify invalid reg. expr.?

I have the following code in a CGI thing I've been working on off-and-on:
my $searchstring = $obj->param('words');
eval { no warnings; "" =~ /$searchstring/ };
if ($@) { ... }
Entering something like "***" will raise some flags, since "*" is a quantifier and must follow something other than another quantifier. A single left paren or an "unclosed" character class (a missing right bracket) will also do it. I'm sure there are other regexen that will "break" this.
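
As a minimal sketch of the same idea, the check can be wrapped in a small sub that compiles the pattern with qr// inside an eval block and hands back the error text. The name check_pattern and the sample patterns are assumptions for illustration; they are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (1, '') if $pattern compiles as a regex,
# or (0, $error_message) if Perl rejects it.
sub check_pattern {
    my ($pattern) = @_;
    my $compiled = eval {
        no warnings;      # keep regexp warnings out of the server log
        qr/$pattern/;     # compile only; nothing is matched here
    };
    return (1, '') if defined $compiled;
    return (0, $@);
}

# Try one valid pattern and the three broken ones described above.
for my $candidate ('foo.*bar', '***', '(', '[abc') {
    my ($ok, $err) = check_pattern($candidate);
    printf "%-10s %s\n", $candidate, $ok ? "ok\n" : "invalid: $err";
}
```

This only tells you whether the pattern compiles. It says nothing about whether the pattern is safe or cheap to run.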

I should point out that I got the idea for using that method from here somewhere, but I can't find the node in question. I also don't know the ins and outs of eval (block vs. string, for example), but this solution works for me. YMMV. HTH.

--

There are 10 kinds of people -- those that understand binary, and those that don't.

Re: Re: How to identify invalid reg. expr.?
by samtregar (Abbot) on Jun 05, 2002 at 18:40 UTC
    **** sound of gongs and security buzzers going off, **** **** red lights flashing, robots running in circles ****

    That code is one heck of a security vulnerability waiting to happen. What happens when I pass:

    (?{ dump })

    as the contents of the words parameter?

    Of course, you might have protected this functionality from untrusted access, but since I don't know that you have I think a warning is in order.

    -sam

      At my insistence on the P5P mailing list, the inline-execute feature was restricted to prevent this. From perldoc perlre:
      For reasons of security, this construct is forbidden if the regular expression involves run-time interpolation of variables, unless the perilous "use re 'eval'" pragma has been used (see the re manpage), or the variables contain results of "qr//" operator (see the qr/STRING/imosx entry in the perlop manpage). This restriction is because of the wide-spread and remarkably convenient custom of using run-time determined strings as patterns. For example:

          $re = <>;
          chomp $re;
          $string =~ /$re/;

      Before Perl knew how to execute interpolated code within a pattern, this operation was completely safe from a security point of view, although it could raise an exception from an illegal pattern. If you turn on the "use re 'eval'", though, it is no longer secure, so you should only do so if you are also using taint checking. Better yet, use the carefully constrained evaluation within a Safe module. See the perlsec manpage for details about both these mechanisms.
      I forced the issue when Ilya was initially hesitant by saying that I would have a CERT warning prepared against Perl 5.6.0 if this feature went in without the restriction, as it would open up holes worldwide to many naive sites.
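
The restriction quoted from perlre can be observed with a short script. This is only a sketch: the flag variable $ran and the sample input are assumptions, and the embedded code sets a flag rather than calling dump so that it is harmless to run.

```perl
use strict;
use warnings;

our $ran = 0;    # would become 1 if the embedded code block ever executed

# Hypothetical hostile input; it only sets a flag instead of calling dump.
my $input = '(?{ $main::ran = 1 })';

# No "use re 'eval'" is in effect, so Perl refuses to compile the
# interpolated pattern and the eval block traps the exception.
my $matched = eval { "" =~ /$input/ };
my $error   = $@;

print "embedded code ran: ", ($ran ? "yes" : "no"), "\n";
print "error: $error" if $error;
```

The exception text begins with "Eval-group not allowed at runtime", the same message that shows up in $@ elsewhere in this thread.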

      -- Randal L. Schwartz, Perl hacker

        Here we go: Just to verify, I just changed it to
        eval { use re 'eval'; no warnings; "" =~ /$searchstring/ };
        Run from the browser, it generates an internal server error but no core dump. (The log complains of premature end of script headers, regardless of how the script is run.) When run from the command line, the script is aborted and does dump core. Umm, yay. (Maybe the lack of core dump is due to apache? I'll look into it.)

        So, I'll consider removing the regex search completely, or building one from pieces as samtregar mentioned in a previous post. (I already have AND and ANY searches, though. We'll see what I can come up with.) Thanks for the tips, folks.

        --

        There are 10 kinds of people -- those that understand binary, and those that don't.

        That solves the worst problem, certainly, but what about a denial-of-service attack? It's not hard to craft a regex that won't be solved before the heat death of the universe. Or one that crashes Perl through stack exhaustion.

        -sam

      Eep! I hadn't even thought of that. I just ran that in the browser, and got my "invalid regex" warning. After examining that bit, it looks like it *should* dump core, but it doesn't. Maybe something in my system (apache configuration, security configuration, quotas, something like that) is preventing the core from being dumped.

      I just ran the script through the debugger, and it turns out that $@ contains the following:

      104:    if ($@) {
        DB<2> x $@
      0  '/(?{ dump })/: Eval-group not allowed at runtime, use re \'eval\' at dbsearch.pl line 103.
      '
        DB<3>
      I have no idea what this means. Like I said in my earlier post, I don't know the finer points of using eval, or what's causing it to not dump core. Anyway, how can I make this safer? (I plan to post the entire script for a review one of these years, after I tweak one or two more things, and find a place to host the script.)

      --

      There are 10 kinds of people -- those that understand binary, and those that don't.

        Interesting. I didn't know you couldn't eval code in a regex at run-time. Well, even without that I could still hog your CPU by passing it a regex with an exponential solving time.

        As far as what you can do: don't accept a regex from an untrusted user. I don't believe there's any way to fully validate the friendliness of a regex. Maybe you could offer your users a set of pre-canned searches ("full-word search", "phrase search", "starts with", "ends with", etc.), then use the input to build the appropriate regex, with \Q$term\E to quarantine the input.
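
A rough sketch of the pre-canned search idea follows. The mode names, the %builders table and the build_search sub are all assumptions made for illustration. The part taken from the suggestion above is \Q...\E, which escapes every metacharacter in the user's term so that it can only match literally.

```perl
use strict;
use warnings;

# Hypothetical search modes. Each builder wraps the user's term in
# \Q...\E, so the term is treated as literal text, never as regex syntax.
my %builders = (
    phrase      => sub { my ($term) = @_; qr/\Q$term\E/i },
    full_word   => sub { my ($term) = @_; qr/\b\Q$term\E\b/i },
    starts_with => sub { my ($term) = @_; qr/^\Q$term\E/i },
    ends_with   => sub { my ($term) = @_; qr/\Q$term\E\z/i },
);

# Build a compiled regex for a known mode; die on anything else.
sub build_search {
    my ($mode, $term) = @_;
    my $builder = $builders{$mode}
        or die "Unknown search mode: $mode\n";
    return $builder->($term);
}

# The hostile input from earlier in the thread becomes a plain string to find.
my $re = build_search('ends_with', '(?{ dump })');
print "literal match\n" if 'user typed (?{ dump })' =~ $re;
```

Since the user only supplies the mode name and the term, the shape of every regex is decided by the script, which also rules out patterns with exponential solving time.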

        -sam
