http://qs321.pair.com?node_id=640714


in reply to Module for Approximate File Comparison

A pretty neat and simple method is one outlined by Zaxo in Re: similar texts !?. Basically, you measure how much it helps a compression algorithm to concatenate the two files together, compared to when you compress them independently.

If you think of a compression algorithm as a mild approximation to Shannon entropy, then this approach is essentially computing the corresponding approximation of mutual information (normalized to the range 0.5 - 1), which is the intuitive concept you seem to be looking for.

blokhead