So with this clearer statement of your actual problem, we can see there's a statistical method you can use to determine whether a collection of bytes is less random than expected. And I may be better placed to help you nail down the simplest method than an expert would be, precisely because I don't know mathematics or statistics very well.
In an encrypted file, each of the 256 possible byte values, 0 through 255, will occur about the same number of times. They won't occur the exact same number of times, of course, but they'll mostly be very close in frequency. (This is one of your stated assumptions.) You can easily measure the maximum deviation from the mean frequency in one or more example encrypted files. I remember learning the word "epsilon" a few years ago. I think it applies here. You compute a useful epsilon from known-good files, then use it as a threshold to decide whether any byte value in an encrypted file occurs more or less frequently than expected. Wild outliers imply corruption.
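Here is a minimal sketch of that idea in Python, not a vetted corruption detector. The names `max_relative_deviation`, `calibrate_epsilon`, `looks_corrupt` and the `margin` safety factor are my own inventions for illustration. It assumes the whole file fits in memory and that your known-good samples are about the same size as the files you'll test, since small files naturally show larger relative swings.

```python
from collections import Counter


def max_relative_deviation(data: bytes) -> float:
    """Return the largest |count - mean| / mean over all 256 byte values."""
    if not data:
        raise ValueError("need at least one byte")
    counts = Counter(data)      # iterating over bytes yields ints 0..255
    mean = len(data) / 256      # expected count per value if evenly spread
    # Counter returns 0 for byte values that never occur.
    return max(abs(counts[value] - mean) / mean for value in range(256))


def calibrate_epsilon(samples, margin: float = 1.5) -> float:
    """Derive epsilon from known-good files.

    `margin` is an assumed safety factor, not a derived constant.
    """
    return margin * max(max_relative_deviation(s) for s in samples)


def looks_corrupt(data: bytes, epsilon: float) -> bool:
    """Flag data whose furthest outlier strays more than epsilon from the mean."""
    return max_relative_deviation(data) > epsilon
```

To try it, read a few known-good encrypted files as bytes, pass them to `calibrate_epsilon`, and then call `looks_corrupt(suspect_bytes, epsilon)` on the file in question.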
I used the word "variance" above, loosely. I think standard deviation is a measure of statistical spread (the square root of the variance, if memory serves). (I'm not going to google it now. I'm winging this explanation on intuition and poor memory.) I think of the epsilon I described above as the greatest percentage difference from the mean, taken at the furthest outlier, in a viable encrypted file. I don't know enough about standard deviation to know exactly how it relates to my naïve conception of "percentage difference from the mean." But I suspect it does: both describe how far the frequencies stray from the mean, one as a typical distance and the other as the worst case.
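To see how the two notions sit side by side, here is another hedged Python sketch. The function name `frequency_stats` and the dictionary keys are hypothetical. It computes the population standard deviation of the 256 byte counts and reports the furthest outlier both as a percentage of the mean and in units of standard deviation.

```python
import math
from collections import Counter


def frequency_stats(data: bytes) -> dict:
    """Summarize how far the 256 byte-value counts stray from their mean."""
    if not data:
        raise ValueError("need at least one byte")
    counts = Counter(data)
    freqs = [counts[value] for value in range(256)]
    mean = len(data) / 256
    # Population variance: average squared difference from the mean.
    variance = sum((c - mean) ** 2 for c in freqs) / 256
    std_dev = math.sqrt(variance)
    worst = max(abs(c - mean) for c in freqs)
    return {
        "mean": mean,
        "std_dev": std_dev,
        # The "percentage difference from the mean" of the furthest outlier.
        "max_pct_diff": 100 * worst / mean,
        # The same outlier measured in standard deviations (0.0 if no spread).
        "max_in_std_devs": worst / std_dev if std_dev else 0.0,
    }
```

The percentage figure is the epsilon-style measure from above. The standard-deviation figure is the same outlier on a scale that depends on how spread out the file's own counts are, so it can serve as a cross-check rather than a replacement.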