PerlMonks  

Re^5: On showing the weakness in the MD5 digest function and getting bitten by scalar context

by Anonymous Monk
on Aug 27, 2004 at 21:18 UTC ( [id://386496] )


in reply to Re^4: On showing the weakness in the MD5 digest function and getting bitten by scalar context
in thread On showing the weakness in the MD5 digest function and getting bitten by scalar context

Which part of "computationally infeasible" did you not understand? But I think it's better stated, "meant to be computationally infeasible to find a different message with the same hash", with two subcategories of infeasibility: where the original message is known, and where it is not.

Re^6: On showing the weakness in the MD5 digest function and getting bitten by scalar context
by BrowserUk (Patriarch) on Aug 27, 2004 at 22:31 UTC

    See 386498. Then point me at the vulnerability that this discovery opens up?

    I recently ran a process that produced 100 million md5s of randomly generated data. I hit duplicates in that process on at least two runs. For my purposes, using the md5 Digest as a hashing function, I simply added a space to the end of the text to maintain uniqueness. For that application, the trailing whitespace was irrelevant.

    But 2 runs out of 4 or 5 of 100 million showed duplicates. A total runtime of less than 24 hours. Was I just extraordinarily (un)lucky? I don't think so. As I said in another thread recently, stats ain't my strong suit, but I think that the odds of generating 2 matching pairs from 500 million are probably well within statistical norms.

    However, if you gave me an md5 and asked me to find a plaintext that matched it, without giving me the plaintext you had used to generate it, that would be computationally infeasible. This, I believe, is what the md5 algorithm is intended to achieve.

    But if you need the original plaintext in order to generate the new plaintext?

    Alternatively, take my trojan binary and the md5 from some trusted piece of code, and then tell me what bytes I need to insert into data space (and where) within that binary in order for its md5 to match that of the trusted piece of software. That would be a vulnerability that would make me consider md5 broken.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
      People are rat-holing on the "I can make two random-data chunks which have the same MD5 hash" claim. Sure, that's an interesting data point.

      But my point is (and has been) that the complexity is far higher if you want to (1) engineer a new data stream which will match an original data stream's hash, while (2) maintaining a plausible protocol and formatting.

      • Forge a new JPEG image (same hash as the original image) which has no broken data fields.
      • Forge a new tarball which can still be decompressed.
      • Forge a new text file without using random line noise or dictionary gibberish.

      If you can do that, even once, THEN I will be impressed and reconsider the value of MD5's distribution fingerprinting.

      --
      [ e d @ h a l l e y . c c ]

      I recently ran a process that produced 100 million md5s of randomly generated data. I hit duplicates in that process on at least two runs.
      Are you absolutely certain that the random data itself contained no duplicates? I would be very interested to see these MD5 collisions of yours.
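One way to answer that question is to keep the input alongside each digest, so that a repeated digest can be classified as either a repeated input or a genuine collision. The following is a minimal Python sketch, not the process described above; the function name `find_digest_repeats` and the tiny 2-byte random inputs are assumptions chosen only to make duplicates easy to provoke.

```python
import hashlib
import os


def find_digest_repeats(messages):
    """Split repeated MD5 digests into duplicate inputs and real collisions.

    Returns (duplicate_inputs, collisions), where collisions is a list of
    (first_message, other_message) pairs that differ but share a digest.
    """
    seen = {}  # digest -> first message that produced it
    duplicate_inputs = []
    collisions = []
    for msg in messages:
        digest = hashlib.md5(msg).digest()
        if digest not in seen:
            seen[digest] = msg
        elif seen[digest] == msg:
            # Same digest because the input itself was repeated.
            duplicate_inputs.append(msg)
        else:
            # Same digest from different inputs: a genuine collision.
            collisions.append((seen[digest], msg))
    return duplicate_inputs, collisions


# Hypothetical demo: 2-byte random inputs repeat often, so any repeated
# digest here is almost certainly a repeated input, not an MD5 collision.
samples = [os.urandom(2) for _ in range(5000)]
dups, cols = find_digest_repeats(samples)
print(len(dups), "duplicate inputs,", len(cols), "genuine collisions")
```

Storing every input costs memory at the 100-million scale, so a real run would more likely log only the inputs whose digest repeats and compare those afterwards.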
      I think that the odds of generating 2 matching pairs from 500 million is probably well within statistical norms.
      Sure. If you reduce MD5 to about 30 bits.
      However, if you gave me an md5 and asked me to find a plaintext that matched it, without giving me the plaintext you had used to generate it, that would be computationally infeasible. This, I believe, is what the md5 algorithm is intended to achieve.
      This is known as the Preimage Problem. It's much more difficult than the Collision Problem (finding two messages with the same MD5). Cryptographic hashes are supposed to prevent someone from doing either one. You can answer "no they aren't" again if you want, but you will still be wrong.
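The gap between the two problems can be seen on a toy scale by truncating MD5 to 16 bits. The sketch below is an illustration under that assumption only: the helper names (`truncated_md5`, `tries_to_collision`, `tries_to_preimage`) and the message formats are made up, and brute force on a 16-bit tag says nothing about attacks on the full 128-bit digest. For an ideal n-bit hash, a generic collision search needs on the order of 2^(n/2) attempts, while a generic preimage search needs on the order of 2^n.

```python
import hashlib
from itertools import count


def truncated_md5(data, n_bytes=2):
    """First n_bytes of the MD5 digest (a deliberately weak toy hash)."""
    return hashlib.md5(data).digest()[:n_bytes]


def tries_to_collision(n_bytes=2):
    """Hash distinct messages until any two share a truncated digest."""
    seen = {}  # truncated digest -> message
    for i in count():
        msg = str(i).encode()
        tag = truncated_md5(msg, n_bytes)
        if tag in seen:
            return i + 1, seen[tag], msg
        seen[tag] = msg


def tries_to_preimage(target, n_bytes=2):
    """Hash messages until one matches a fixed, prespecified digest."""
    for i in count():
        msg = b"forged-" + str(i).encode()
        if truncated_md5(msg, n_bytes) == target:
            return i + 1, msg


collision_tries, a, b = tries_to_collision()
target = truncated_md5(b"trusted release")  # hypothetical target message
preimage_tries, forged = tries_to_preimage(target)
print("collision after", collision_tries, "tries:", a, b)
print("preimage after", preimage_tries, "tries:", forged)
```

With a 16-bit tag the collision search typically ends after a few hundred attempts, while the preimage search typically needs tens of thousands; the exact counts depend on the messages tried.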
      Alternatively, take my trojan binary and the md5 from some trusted piece of code, and then tell me what bytes I need to insert into data space (and where) within that binary in order for its md5 to match that of the trusted piece of software. That would be a vulnerability that would make me consider md5 broken.
      There are more uses of MD5 than are dreamt of in your philosophy, Horatio.
        Sure. If you reduce MD5 to about 30 bits.
        60 bits. My bad. Point stands.
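The arithmetic behind the "60 bits" correction can be checked with the usual birthday approximation, P ≈ 1 - exp(-n(n-1)/2 / 2^b), for n items hashed to b bits. The sketch below assumes the digests behave like uniformly random b-bit values; the function name `collision_probability` is made up for illustration.

```python
import math


def collision_probability(n_items, hash_bits):
    """Birthday approximation of P(at least one collision) among n_items
    values drawn uniformly at random from 2**hash_bits possibilities."""
    pairs = n_items * (n_items - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


n = 500_000_000  # roughly five runs of 100 million digests
for bits in (30, 60, 128):
    print(bits, "bits:", collision_probability(n, bits))
```

Under that assumption, 500 million digests give a collision chance of roughly 10% at 60 bits and a practical certainty at 30 bits, but only on the order of 1e-22 at the full 128 bits, which is why accidental MD5 collisions in a run of that size would be so surprising.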

        From the RFC (which you appear to be (mis)quoting) -- my highlighting:

        This document describes the MD5 message-digest algorithm. The algorithm takes as input a message of arbitrary length and produces as output a 128-bit "fingerprint" or "message digest" of the input.

        It is conjectured that it is computationally infeasible to produce two messages having the same message digest, or to produce any message having a given prespecified target message digest.

        The MD5 algorithm is intended for digital signature applications, where a large file must be "compressed" in a secure manner before being encrypted with a private (secret) key under a public-key cryptosystem such as RSA.


        Cryptographic hashes are supposed to prevent someone from doing either one.

        Nowhere in that do I see MD5 described as a "cryptographic hash". Any application that uses a "digital signature" as a "cryptographic hash" based upon "conjectured...computational infeasibility" is a misapplication of the algorithm.

        If the application needs a "cryptographic hash", it should be using one.

        There are more uses of MD5 than are dreamt of in your philosophy, Horatio.

        Ah yes, my dear Josephine Hardy*, but how many of them are misuses?


