
Calculating corruption

by james28909 (Deacon)
on Oct 18, 2014 at 21:19 UTC

james28909 has asked for the wisdom of the Perl Monks concerning the following question:

Hey, I am trying to figure out the best approach to checking a file for corruption. The file is encrypted and it cannot be decrypted, so I honestly just need some different points of view on how I can even go about checking this encrypted file for corruption. Since it is encrypted, it should always be a string of random-looking bits, but if anyone has any input I would sure appreciate it.

Replies are listed 'Best First'.
Re: Calculating corruption
by BrowserUk (Patriarch) on Oct 18, 2014 at 22:16 UTC

    You acknowledge in your question that an encrypted file will closely simulate random noise.

    If the encryption method was perfect, the (uncorrupted) file would be indistinguishable from random noise. (No encryption is perfect!)

    Corruption could entail any or all of:

    • The inversion of a single bit somewhere in the file.
    • The inversion of every bit in the file.
    • The removal of a single byte somewhere in the file.
    • The removal of every byte in the file.
    • The insertion of a single extra byte in the file.
    • The insertion of any number of extra bytes within the file.
    • The replacement of a single byte in the file.
    • The replacement of every byte within the file.
    • Other...

    Bottom line: unless a checksum of the encrypted file was generated at the same time the file was encrypted; and you know how that checksum was generated; and you have access to the checksum; and you can guarantee that the checksum could not itself have been corrupted; you might just as well be asking how to reverse time for all the possibility of getting a useful answer to your question.

    Stop pursuing the solution to an impossible problem.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Well, I have been reading some more and came across standard deviation and spread. Would this be a pursuable possibility? Because you could compare that against all other encrypted files (even though they use a different key) and the outcome should be similar, right?

      Std dev /should/ be moderately comparable from encrypted file to encrypted file of the same data, right? Or at least within a certain range. If it is outside this range, then you can safely say it is more than likely corrupted, right?
        Well, I have been reading some more and came across standard deviation. Would this be a pursuable possibility? Because you could compare that against all other encrypted files (even though they use a different key) and the outcome should be similar, right?

        Why? (Why would they have a similar StdDev?)

        Standard Deviation measures deviation from the mean. Given the full (i.e. exhaustive) set of all possible datasets of a given size, the standard deviations of those datasets would themselves range, and be spread, over the whole of their possible range.

        Hence, the StdDev of any single sample -- of anything -- means exactly nothing!

        That is, if the inputs are truly 'random', then the standard deviations are spread right across that range, and thus completely uninformative.
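
        For concreteness, here is a minimal Perl sketch of the statistic under discussion -- the mean and standard deviation of the byte values in a file. The filename is only a placeholder; as argued above, the number it prints for a single file tells you nothing about corruption on its own.

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Compute the mean and standard deviation of the byte values in a file.
        # Illustrative only: a single StdDev from a single (encrypted) file is
        # not, by itself, evidence for or against corruption.
        my $file = shift // 'encrypted.bin';    # hypothetical filename

        open my $fh, '<:raw', $file or die "Cannot open $file: $!";
        my @count = (0) x 256;
        my $total = 0;
        while (read $fh, my $buf, 65536) {
            $count[$_]++ for unpack 'C*', $buf;
            $total += length $buf;
        }
        close $fh;
        die "Empty file\n" unless $total;

        my $mean = 0;
        $mean += $_ * $count[$_] / $total for 0 .. 255;

        my $var = 0;
        $var += (($_ - $mean) ** 2) * $count[$_] / $total for 0 .. 255;

        printf "bytes: %d  mean: %.2f  stddev: %.2f\n", $total, $mean, sqrt $var;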


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        The statistical method you describe to determine the likelihood that a stream of bytes is "corrupted" (i.e., altered in some way from its original state) will only work for a very specific kind of corruption:  the kind that results in the assumed randomness of the bytes (due to encryption) being measurably reduced. If this is exactly the kind of corruption you expect and want to identify when it occurs, and you don't expect or want to identify any other kind of corruption, then the statistical method you describe may be useful to you.
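
        If that is the route taken, one conventional way to quantify "measurably less random" is a chi-square test on the byte frequencies: gross corruption such as long runs of 0x00 or 0xFF bytes inflates the statistic far beyond what uniform random data would produce. A rough Perl sketch follows; the filename and the cut-off are purely illustrative assumptions, not calibrated values.

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Chi-square test of byte frequencies against a uniform distribution.
        # Random-looking ciphertext should give a statistic near 255 (the
        # degrees of freedom); heavy, structured corruption inflates it.
        my $file = shift // 'encrypted.bin';    # hypothetical filename

        open my $fh, '<:raw', $file or die "Cannot open $file: $!";
        my @count = (0) x 256;
        my $total = 0;
        while (read $fh, my $buf, 65536) {
            $count[$_]++ for unpack 'C*', $buf;
            $total += length $buf;
        }
        close $fh;
        die "Empty file\n" unless $total;

        my $expected = $total / 256;
        my $chi2 = 0;
        $chi2 += ($count[$_] - $expected) ** 2 / $expected for 0 .. 255;

        printf "chi-square (255 dof): %.1f\n", $chi2;
        print $chi2 > 400    # ballpark cut-off, not a proper significance level
            ? "byte distribution looks suspiciously non-uniform\n"
            : "byte distribution is consistent with random data\n";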

        Let's say you have an encrypted file that consists of 1,234,567,890 bytes. One arbitrary bit of one arbitrary byte is switched from 0 to 1, or vice versa. The file is now "corrupted" (i.e., altered from its original state). You will never discover this corruption after the fact by any statistical method (guesswork).

Re: Calculating corruption
by LanX (Saint) on Oct 18, 2014 at 21:33 UTC
      Well, yeah, that would work great if all these files were identical and encrypted the exact same way, but sadly they are not. Upon further reading, I started looking for "checking a string for randomness", and that seems to be a dead end as well, because how would you define "random"? Isn't everything "random"? lol
        You asked how to check if a file is corrupted.

        Store a checksum of the encrypted file, best together with the file, maybe appended to the end.

        If you meant something different, you may want to try to explain it...

        Cheers Rolf

        (addicted to the Perl Programming Language and ☆☆☆☆ :)

        update

        btw: Trying to decrypt it should be enough; I'm not aware of any common encryption that can still be successfully decrypted once the ciphertext is corrupted.

        The idea is to generate the checksum right after encryption has occurred. Then, whenever the integrity of the file needs to be checked, the checksum of the encrypted file can be recomputed and compared to the baseline value. This will detect corruption with a high level of confidence, if a suitable checksum function is used. It appears that in your case this was not done, so there is no baseline with which to compare. Detecting whether the encrypted files have been corrupted from their original state is now impossible.

        Computing the checksum of the files in their current state will enable you to detect any further corruption. If the concern is deteriorating media, this might be useful.
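
        For completeness, a minimal sketch of that workflow using the core Digest::SHA module; the file names are placeholders, and the baseline digest is assumed to live in a small sidecar file next to the encrypted one.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Digest::SHA;

        # Record a baseline SHA-256 digest of the encrypted file, then on later
        # runs recompute it and compare against the stored baseline.
        my $encrypted = 'payload.enc';            # hypothetical file name
        my $baseline  = "$encrypted.sha256";      # sidecar file for the digest

        sub file_digest {
            my ($path) = @_;
            return Digest::SHA->new(256)->addfile($path, 'b')->hexdigest;
        }

        if (!-e $baseline) {
            # First run: store the baseline digest next to the file.
            open my $out, '>', $baseline or die "Cannot write $baseline: $!";
            print {$out} file_digest($encrypted), "\n";
            close $out;
            print "baseline digest recorded\n";
        }
        else {
            # Later runs: recompute and compare against the stored baseline.
            open my $in, '<', $baseline or die "Cannot read $baseline: $!";
            chomp(my $expected = <$in>);
            close $in;
            print file_digest($encrypted) eq $expected
                ? "digest matches: no further corruption detected\n"
                : "digest mismatch: file has been altered since the baseline\n";
        }

        Whether the digest is kept in a sidecar file, appended to the encrypted file, or stored elsewhere is a design choice; the only requirement is that the baseline itself stays intact.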

        I understand that certain encryption methods leave telltales in their encrypted files, some of which could be detected via a statistical analysis. This would allow you to tell if a certain file still looks "pretty much like most files encrypted using this method." If that would actually provide any level of reassurance to you, go for it.
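
        One such statistical look is a Shannon entropy estimate of the byte distribution: well-encrypted data sits very close to 8 bits per byte, while structured or zero-padded regions drag the figure down. A rough sketch, with the filename and the 7.9 cut-off being assumptions (small files score lower even when perfectly random):

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Estimate the Shannon entropy (bits per byte) of a file.
        # Ciphertext from a sound cipher is close to 8.0; a noticeably lower
        # value hints at structure, e.g. blocks overwritten with zeros.
        my $file = shift // 'encrypted.bin';    # hypothetical filename

        open my $fh, '<:raw', $file or die "Cannot open $file: $!";
        my @count = (0) x 256;
        my $total = 0;
        while (read $fh, my $buf, 65536) {
            $count[$_]++ for unpack 'C*', $buf;
            $total += length $buf;
        }
        close $fh;
        die "Empty file\n" unless $total;

        my $entropy = 0;
        for my $c (grep { $_ > 0 } @count) {
            my $p = $c / $total;
            $entropy -= $p * log($p) / log(2);
        }

        printf "entropy: %.4f bits/byte\n", $entropy;
        print "looks less random than expected of ciphertext\n" if $entropy < 7.9;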

        1 Peter 4:10
