http://qs321.pair.com?node_id=1233626


in reply to Re^2: Data compression by 50% + : is it possible?
in thread Data compression by 50% + : is it possible?

It makes a small -- insignificant -- difference to the outcome.

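A quick check (quicker than trying to derive a formula) shows that of the 729,000 3-value combinations, 15,932 contain a value immediately followed by its successor:

use Algorithm::Combinatorics qw[ variations_with_repetition ];;
@c = variations_with_repetition( [ 1 .. 90 ], 3 );; print scalar @c;;
729000
## the test must be parenthesised: 'and' binds tighter than 'or', so without the parens only 7922 are counted
$d = 0; ( $_->[0]+1 == $_->[1] or $_->[1]+1 == $_->[2] ) and ++$d for @c; print $d;;
15932

That leaves 713,068 legal sets of three. And when two sets of three are joined, the pairs in which the last value of the first set is one less than the first value of the second set must also be excluded. Tallying the legal sets by their last and first values shows there are 5,586,863,748 such pairs:

@v = grep{ $_->[0]+1 != $_->[1] and $_->[1]+1 != $_->[2] } @c;; print scalar @v;;
713068
++$last{ $_->[2] }, ++$first{ $_->[0] } for @v;;
$d = 0; $d += $last{ $_ } * $first{ $_+1 } for 1 .. 89; print $d;;
5586863748

Which means that instead of 531,441,000,000 6-value combinations, there are ( 729,000 - 15,932 )**2 - 5,586,863,748 = 502,879,108,876. That is still more than 2**38, so it still takes 39 bits, which is 5 whole bytes (40 bits), to represent any legal 6-value string; so the math doesn't change: ( 1 - 40/48 ) * 100 = 16.67% is the best possible for *any* dataset.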
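As a cross-check that needs no enumeration, the legal strings can also be counted directly, one position at a time. This is only a sketch: the names count_legal and bits_needed are made up for illustration, "legal" is assumed to mean that no value is immediately followed by its successor, and each of the 6 values is assumed to occupy one byte in the uncompressed form.

```perl
use strict;
use warnings;

# Count strings of $len values drawn from 1 .. $max in which no value
# is immediately followed by its successor (v then v+1).
sub count_legal {
    my( $max, $len ) = @_;
    # $ending[ v ] = number of legal strings so far that end in v
    # (index 0 is a zero sentinel so that v = 1 has no forbidden predecessor).
    my @ending = ( 0, ( 1 ) x $max );
    for my $n ( 2 .. $len ) {
        my $total = 0;
        $total += $_ for @ending;
        # Any legal string may be extended by v unless it ends in v-1.
        @ending = ( 0, map { $total - $ending[ $_ - 1 ] } 1 .. $max );
    }
    my $total = 0;
    $total += $_ for @ending;
    return $total;
}

# Smallest number of bits whose 2**bits patterns cover $count items.
sub bits_needed {
    my( $count ) = @_;
    my( $bits, $capacity ) = ( 0, 1 );
    ( $capacity *= 2, ++$bits ) while $capacity < $count;
    return $bits;
}

my $legal = count_legal( 90, 6 );
my $bits  = bits_needed( $legal );
my $bytes = int( ( $bits + 7 ) / 8 );    # round up to whole bytes

printf "%s legal strings need %d bits, i.e. %d whole bytes\n", $legal, $bits, $bytes;
printf "best byte-aligned saving over 6 bytes: %.2f%%\n", ( 1 - $bytes / 6 ) * 100;
```

Because only the count of strings ending in each value is carried forward, this runs instantly for any length, whereas enumerating every 6-value string would not be practical.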

Any algorithm that achieves better on a given dataset will not achieve the same results for all datasets, nor even for a high percentage of them.

I.e., to achieve better, you'd need to reduce the size of the domain so that you could compress 48 bits into 36, which would give 25% compression. But that would require throwing away about 87% of the possible datasets, because 2**36 covers only about 13% of the 90**6 possible 6-value strings.
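The arithmetic behind that trade-off can be sketched as follows. The helper name discarded_fraction is hypothetical, and the figures assume the unconstrained domain of 6 values drawn from 1 .. 90.

```perl
use strict;
use warnings;

# Fraction of the $max ** $len possible strings that can NOT be given
# a distinct code when only $bits bits are available.
sub discarded_fraction {
    my( $bits, $max, $len ) = @_;
    return 1 - 2 ** $bits / $max ** $len;
}

my $target_bits = 36;    # 48 bits -> 36 bits would be 25% compression
printf "compression: %.0f%%\n", ( 1 - $target_bits / 48 ) * 100;
printf "datasets that must be excluded: %.1f%%\n",
    discarded_fraction( $target_bits, 90, 6 ) * 100;
```

The same function can be called with other bit budgets to see how quickly the usable share of the domain shrinks as the target size drops.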

