in reply to Re^2: Data compression by 50% + : is it possible?
in thread Data compression by 50% + : is it possible?
It makes a small -- insignificant -- difference to the outcome.
A quick check -- because it is quicker than trying to derive a formula -- shows that of the 729,000 3-value combinations, 7,922 contain consecutive numbers:
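    use Algorithm::Combinatorics qw[ variations_with_repetition ];

    # All 90**3 ordered triples drawn from 1 .. 90, repeats allowed.
    @c = variations_with_repetition( [ 1 .. 90 ], 3 );
    print scalar @c;    # 729000

    # NB: 'and' binds tighter than 'or', so this parses as A or ( B and ++$d )
    # and counts the triples where only the second pair is consecutive.
    $d = 0;
    $_->[0]+1 == $_->[1] or $_->[1]+1 == $_->[2] and ++$d for @c;
    print $d;           # 7922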
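The count above depends on how Perl groups `or` and `and`: `and` binds tighter, so the one-liner increments `$d` only when the second pair is consecutive and the first is not. As a cross-check, here is a minimal core-Perl sketch that counts both readings with plain nested loops. The helper name `count_triples` is made up for this illustration and is not part of the original post or of Algorithm::Combinatorics.

```perl
use strict;
use warnings;

# Count 3-value variations (with repetition) over 1 .. $n using nested loops.
# Returns ( total, second-pair-only count, either-pair count ).
sub count_triples {
    my ($n) = @_;
    my ( $total, $second_only, $either ) = ( 0, 0, 0 );
    for my $x ( 1 .. $n ) {
        for my $y ( 1 .. $n ) {
            for my $z ( 1 .. $n ) {
                my $first  = ( $x + 1 == $y );    # first pair ascends by 1
                my $second = ( $y + 1 == $z );    # second pair ascends by 1
                ++$total;
                ++$second_only if $second && !$first;    # what the one-liner counts
                ++$either      if $first || $second;     # explicit ( A or B )
            }
        }
    }
    return ( $total, $second_only, $either );
}

my ( $total, $second_only, $either ) = count_triples(90);
print "total=$total second_only=$second_only either=$either\n";
# prints: total=729000 second_only=7922 either=15932
```

With explicit grouping the figure rises from 7,922 to 15,932 (8,010 + 8,010 - 88, the 88 being triples of the form n, n+1, n+2). Either figure leaves the remaining domain far above 2**38, so the number of bits needed does not move.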
And there are 8,010 combinations of 2 sets of three that must be excluded because the last value of the first set of 3 is one less than the first value of the second set:
    # Counts the triples in @c whose first element is one more than their last.
    $d = 0;
    $_->[0] == $_->[2]+1 and ++$d for @c;
    print $d;    # 8010
Which means that instead of 531,441,000,000 6-value combinations, there are only (729,000 - 7,922)**2 - 8,010 = 519,953,474,074. It would still take 5 bytes (40 bits) to represent any legal 6-value string, so the math doesn't change: ( 1 - 40/48 ) * 100 = 16.67% is the best possible for *any* dataset.
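The arithmetic above can be replayed in a few lines. This is only a sketch: the helper `bits_needed` is invented for the illustration, and the counts 7,922 and 8,010 are taken as given from the checks in the post. Strictly, both domain sizes fit in 39 bits, which rounds up to the 5 whole bytes (40 bits) used in the calculation.

```perl
use strict;
use warnings;

# Smallest b such that 2**b >= $count, i.e. the bits needed to index $count values.
sub bits_needed {
    my ($count) = @_;
    my $bits = 0;
    $bits++ while 2**$bits < $count;
    return $bits;
}

my $unrestricted = 90**6;                               # 531,441,000,000
my $restricted   = ( 729_000 - 7_922 )**2 - 8_010;      # 519,953,474,074

for my $count ( $unrestricted, $restricted ) {
    my $bits  = bits_needed($count);
    my $bytes = int( ( $bits + 7 ) / 8 );               # round up to whole bytes
    printf "%.0f values -> %d bits -> %d bytes, saving %.2f%% of 6 bytes\n",
        $count, $bits, $bytes, ( 1 - $bytes / 6 ) * 100;
}
```

Both lines report 39 bits, 5 bytes and a 16.67% saving, so excluding the consecutive cases does not buy a smaller representation.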
Any algorithm that achieves better on a given dataset will not achieve the same results for all datasets, nor even for a high percentage of them.
I.e. to achieve better, you'd need to reduce the size of the domain so that you could compress 48 bits into 36, which would give 25% compression. But that would require throwing away 87% of the possible datasets, since 2**36 (about 68.7 billion) covers only some 13% of them.
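To put a number on that last point, here is a rough sketch of how much of the domain a 36-bit code would have to give up. The helper `percent_discarded` is a made-up name for this illustration, and the domain sizes are the two figures quoted earlier in the post.

```perl
use strict;
use warnings;

# Percentage of a domain of $count values that cannot be indexed with $bits bits.
sub percent_discarded {
    my ( $count, $bits ) = @_;
    my $kept = 2**$bits / $count;    # fraction of values that still get a code
    $kept = 1 if $kept > 1;          # everything fits, nothing discarded
    return ( 1 - $kept ) * 100;
}

printf "saving with 36 of 48 bits: %.0f%%\n", ( 1 - 36 / 48 ) * 100;
printf "discarded at 36 bits: %.1f%% of %.0f values\n", percent_discarded( $_, 36 ), $_
    for 90**6, 519_953_474_074;
```

For either domain size the discarded share comes out at roughly 87%.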