Re: Hash Clash on purpose

by l2kashe (Deacon)
on Jun 02, 2003 at 20:08 UTC


in reply to Hash Clash on purpose

I read the paper, and the crux of the issue is failing to validate incoming data before passing it to the hashing function. I think it's something along the lines of
chomp($f = <>); $input{$f}++;
Though it might be something closer to
chomp($f = <>); if ( $input{$f} ) { blah; }
The problem described is that pre-crafted keys all collide, so the hash degenerates into one long chain and insertion degrades to O(n * n) overall. But it assumes the attacker has a quasi-direct pipe to the hash function's input.
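
To make that concrete, here is a toy sketch (it assumes nothing about Perl's real internals, which live in C): a tiny chained hash table with a deliberately weak hash function, showing how keys crafted to collide all pile into one bucket, so each insert has to scan everything inserted before it.

#!/usr/bin/perl
# Toy sketch only -- NOT Perl's real hash implementation.
use strict;
use warnings;

my $NBUCKETS = 64;
my @buckets  = map { [] } 1 .. $NBUCKETS;

# Deliberately weak hash: sum of character codes.  Any two strings that
# are permutations of each other collide.
sub weak_hash {
    my $sum = 0;
    $sum += ord for split //, $_[0];
    return $sum % $NBUCKETS;
}

sub insert {
    my ($key) = @_;
    my $chain = $buckets[ weak_hash($key) ];
    # Each insert scans the whole chain; if every key lands in one
    # bucket, the i-th insert costs O(i), so n inserts cost O(n*n).
    for my $pair (@$chain) { return if $pair->[0] eq $key }
    push @$chain, [ $key, 1 ];
}

insert($_) for qw(abc acb bac bca cab cba);   # all permutations of "abc"
printf "bucket %d holds %d keys\n",
    weak_hash('abc'), scalar @{ $buckets[ weak_hash('abc') ] };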

There are situations where it makes sense to track collected data in a hash (letter/number counting comes to mind off the top of my head), but the kind of pre-constructed data required for this attack to succeed raises other questions in my mind.

If this is something other than trivial tracking of data, and is in fact some larger piece of production code we are talking about, then you don't trust anything coming in from outside, right?

I would be interested to hear what other monks think after reading the paper. Personally, it would be interesting to hear from individuals who actually work on the Perl codebase, and what they think of this situation. Also, I didn't see any code to reproduce this, but I also didn't look too hard. I think I may revisit that now, though.

MMMMM... Chocolaty Perl Goodness.....

Replies are listed 'Best First'.
Re: Re: Hash Clash on purpose
by iburrell (Chaplain) on Jun 02, 2003 at 21:29 UTC
    Since this issue hasn't been recognized before, no one worries about this form of bad input.

    There is one very common module that accepts input from the outside and inserts strings into a hash table: CGI.pm. It stores the parameter names as keys in a hash table. As an experiment, I took the 10,000 strings from the paper, turned them into a query string of about 200 KB, and POSTed it to a CGI script. The CGI process ran for 3 minutes at 99% CPU before it finished parsing.
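
    A rough sketch of that kind of measurement, for anyone who wants to reproduce it locally without a web server (the file name collisions.txt is just a placeholder for wherever you keep the colliding strings; CGI.pm will parse a query string handed straight to its constructor):

    #!/usr/bin/perl
    # Sketch: time how long CGI.pm takes to parse a synthetic query string
    # built from a list of colliding keys, one per line in "collisions.txt"
    # (a placeholder name -- substitute your own file).
    use strict;
    use warnings;
    use CGI ();
    use Time::HiRes qw(time);

    open my $fh, '<', 'collisions.txt' or die "collisions.txt: $!";
    chomp(my @keys = <$fh>);

    # Keys are assumed to be URL-safe here; escape them if they are not.
    my $query = join '&', map { "$_=1" } @keys;
    printf "query string: %d bytes, %d params\n", length($query), scalar @keys;

    my $start = time;
    my $cgi   = CGI->new($query);    # parse directly, no web server involved
    printf "CGI.pm parsed it in %.1f seconds\n", time - $start;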

    Lots of CGI scripts don't have any POST limits for forms. Even for those that do, plenty of strings fit within the sizes I have seen used (1-10 MB). POST limits are usually designed to prevent file uploads, not pathological queries.

      Good insight; can you tell what type of Perl development I don't do regularly? ;)

      On another note, though: aren't the hash keys in a form and such predefined? Thinking this through, it would take a maliciously crafted URL to actually exploit a backend Perl-based CGI, as opposed to going through the "normal" processing channel, since in this exploit it's really the keys that are the issue rather than the values. Since we are talking about DoS attacks on systems, this is a valid attack. It would most certainly be easy to track via some simple log searching unless some proxying is involved, but that's kinda outside the scope of the thread...

      New question: Is there a way to "fix" CGI.pm?
      Thinking it through, it feels knee-jerkish to point at CGI, and I think it might be outside the scope of the module's varied duties. Though it would be interesting if you could code something to the effect of...

      use CGI; my $cgi = new CGI; $cgi->allow_param( qw(foo bar baz) );
      Which would allow only foo, bar, and baz, and silently drop everything else from the URL. But that kind of breaks the standalone nature of many CGI programs, and it would need to be checked for/cleaned prior to actually parsing the parameters. On the plus side, this could lead to many more sites being far more secure than they presently are, as it's one more obstacle to hurdle in the never-ending hunt for other people's processing cycles. Kinda like 'use strict' for CGI :)
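
      For what it's worth, something with a similar effect can already be hacked up after the fact with CGI.pm's existing param() and delete() methods (allow_param above is imaginary). Note, though, that the query still has to be parsed into the hash before anything can be dropped, so this is a sketch of the filtering idea rather than a defense against the parsing cost itself:

      use strict;
      use warnings;
      use CGI;

      my $cgi     = CGI->new;
      my %allowed = map { $_ => 1 } qw(foo bar baz);

      # Silently drop every parameter we did not expect.
      for my $name ($cgi->param) {
          $cgi->delete($name) unless $allowed{$name};
      }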

      I'm still hoping to get feedback from more "senior" members of the community, to get a handle on what they really think of the issue. Also, does/will this affect Perl6?

      MMMMM... Chocolaty Perl Goodness.....
        There currently is no way to define the parameters that CGI.pm will accept. Any good CGI program will check its parameters and ignore the ones it doesn't expect. Some will even produce errors if there are extra, unexpected parameters. In any case, this happens after the query string has been parsed and too late to prevent this DoS attack. It wouldn't be too complicated to modify CGI.pm to take a code reference, regular expression, or list to validate the parameters against.

        The simplest fix is probably to change the hash function inside Perl. Very few programs depend on the internals of the hash implementation, and those that do deserve to be broken. The quickest fix, if the hash function takes arbitrary parameters, is to vary those parameters randomly on startup or per hash table. The paper suggests using a better, less deterministic hash function.
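
        For illustration only (perl's actual hashing is in C; this is just the idea sketched in Perl): the one-at-a-time hash seeded with a random per-process value, so an attacker can no longer precompute colliding keys against a known function.

        use strict;
        use warnings;

        # Bob Jenkins' one-at-a-time hash, started from a value chosen
        # randomly at startup.  Masking assumes 64-bit integer arithmetic.
        my $SEED = int rand(2**32);

        sub oaat_hash {
            my ($key) = @_;
            my $h = $SEED;
            for my $c (unpack 'C*', $key) {
                $h = ($h + $c)         & 0xffffffff;
                $h = ($h + ($h << 10)) & 0xffffffff;
                $h = ($h ^ ($h >> 6))  & 0xffffffff;
            }
            $h = ($h + ($h << 3))  & 0xffffffff;
            $h = ($h ^ ($h >> 11)) & 0xffffffff;
            $h = ($h + ($h << 15)) & 0xffffffff;
            return $h;
        }

        printf "%08x\n", oaat_hash('foo');   # differs from run to run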

        It wouldn't break anything in existing CGI programs unless you made it mandatory. It would help a great deal to make it available but not enforce any such limit unless someone calls the method. That way, adding one line to the code provides the security.

        This shouldn't be kept to just CGI.pm, of course. Any program or module which currently takes outside data and stuffs it into a hash should be modified with the same sort of logic if the Perl hash implementation isn't going to be changed.

        I don't know what Larry would say to BSD-licensed code in every case, but I am pretty sure the default answer is that perl is licensed under the Artistic License / GPL, and therefore will not take on the BSD license. That pretty much kills the inclusion of the code the authors of that paper developed and recommend, which is BSD-licensed.

        Adding the logic to only accept certain keys for hashes at the language level is likely to be a real performance killer, although it'd be kind of neat. Probably, though, the way to go will be to do this at the module or application level, as applicable.

        Christopher E. Stith
        use coffee;
        > On another note, though: aren't the hash keys in a form and such predefined? Thinking this through, it would take a maliciously crafted URL to actually exploit a backend Perl-based CGI, as opposed to going through the "normal" processing channel, since in this exploit it's really the keys that are the issue rather than the values.

        The CGI script reads the keys (params, in this case) via HTTP -- end of story, since the client can ship anything over HTTP. Of course the content will be "maliciously crafted" -- it's an attack, and a rather trivial one, as iburrell just demonstrated.

        > New question: Is there a way to "fix" CGI.pm?

        Sure, and you've mentioned one -- limit which params are allowed. But a simpler way to deal with this attack would be to limit the number of params allowed, with a sensible built-in default (something considerably less than 10000). Another possibility is to refuse to enter any more entries into the hash if its size exceeds some threshold. However, that doesn't work in Perl6, where scalar(%hash) just returns the number of entries rather than the bucket usage. (But maybe there's some other way to get the table size?)
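
        A pre-parse guard along those lines might look like the following sketch. The limit of 100 params and the 100 KB cap are arbitrary examples; the point is that counting separators in the raw query string happens before CGI.pm ever builds its hash.

        use strict;
        use warnings;
        use CGI ();

        $CGI::POST_MAX = 100 * 1024;    # have CGI.pm refuse POST bodies over 100 KB
        my $MAX_PARAMS = 100;           # arbitrary example limit

        # Count parameters in the raw query string before letting CGI.pm
        # parse it.  This only covers GET; POST bodies would need a similar
        # check (or have to rely on POST_MAX above).
        my $raw     = $ENV{QUERY_STRING} || '';
        my $nparams = () = $raw =~ /[&;]/g;
        $nparams++ if length $raw;

        if ($nparams > $MAX_PARAMS) {
            print "Status: 413 Request Entity Too Large\r\n\r\n";
            exit;
        }

        my $cgi = CGI->new;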

        > I'm still hoping to get feedback from more "senior" members of the community, to get a handle on what they really think of the issue.

        It's a real problem, as iburrell trivially demonstrated. The linux kernel folks are addressing it, and the perl folks would be wise to do the same.

        > Also, does/will this affect Perl6?

        Only if they don't do something about it. The current Perl6 hash function is the same as the so-called one-at-a-time function of perl 5.8.0. Scott Crosby's solution of using a universal hash function is expensive, and the linux folks have rejected it. Their approach, however, makes the hash function indeterminate, which would be a problem for many perl applications where reproducibility is important. Perl6 could have a "use determinate;" that would prevent it from implicitly randomizing the hash function (or anything else). If perl doesn't do it by default, then it falls upon CGI.pm, among others, to deal with the problem, and that gets rather ugly in terms of "the module's varied duties", as you put it.

      I'm impressed. 10,000 strings of 24 bytes sent in a 200 KB query. I want to know how you pulled that off. :)

      Thanks for the experimental result; I'll note this thread on the webpage. Also remember that the behavior is quadratic. 20,000 strings, about 500 KB, would take 12 minutes. 100,000 strings would run for 5 hours.

      Scott Crosby

        200 KB comes from sloppy rounding.

        It is possible to use shorter strings. There is a bug in how Perl maps hash values to buckets that makes pathological hashes more likely. The paper generated strings that all have the same hash value; it is much easier to find strings that merely go into the same bucket. tye wrote a patch that fixes this bug and recently got it accepted.

        tilly wrote a generator that uses the bug to quickly find strings that produce pathological hashes. About 1/8 of all strings fit the criteria. Using simple alphabetic strings, I can get 60,000 strings, encoded in about 300,000 bytes, that all go into the same bucket. Inserting all of those into a hash takes 10 minutes for a direct insert and much longer for CGI parsing.
