http://qs321.pair.com?node_id=11100027

Hmmm. I was just thinking of a 90-bit vec as a set. So the bits in vec(expr, offset, bits) is 1.

I've probably missed some criterion why the vec can't be interpreted as 8bit chars for output.

-QM
--
Quantum Mechanics: The dreams stuff is made of

Replies are listed 'Best First'.
Re^4: Data compression by 50% + : is it possible?
by LanX (Sage) on May 15, 2019 at 16:47 UTC
The average line size of the old format like demonstrated by the OP's code is ~ 14 characters + newline

90 bits need 11.25 bytes per line.

Not that much of a big win.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

I just modified my 54 bits/line solution to do base 50 math like you suggested to get 51 bits/line.

It does work --- but ---it is SLOW!!!

```#!/usr/bin/perl

# https://perlmonks.org/?node_id=1233613

use strict;
use warnings;
use bignum;

my @legal = grep !/11/ && tr/1// <= 3, glob '{0,1}' x 8;
my %code;
@code{@legal} = 0 .. \$#legal;
my %decode = reverse %code;
\$_ = [ split ' ', '23456789' & tr/01/ ?/r ] for values %decode;

sub tobits
{
my \$n = shift;
my \$bits = '';
\$bits = \$n % 2 . \$bits, \$n >>= 1 for 1 .. 51;
\$bits;
}

sub tonum
{
my \$n = 0;
\$n = 2 * \$n + \$_ for split //, shift;
\$n;
}

#print tobits(42), "\n";
#exit;

sub compress
{
my \$coded = '';
for ( shift =~ /(.*)\n/g )
{
my @lookup = (0) x 123;
@lookup[ unpack 'C*', \$_ ] = (1) x length;
my \$n = 0;
for( my \$group = 35; \$group < 123; \$group += 10 )
{
\$n = \$n * 50 + \$code{ join '', @lookup[\$group .. \$group + 7] };
}
#   print "\$n\n";
\$coded .= tobits(\$n);
}
return pack 'b*', \$coded;
}

sub decompress
{
my \$decoded = '';
for my \$line ( unpack('b*', shift) =~ /.{51}/g )
{
my \$n = tonum(\$line);
#   print "\$n\n";
my \$digit = 8;
for( my \$group = 33; \$group < 123; \$group += 10 )
{
\$decoded .= pack 'C*',
map \$group + \$_, @{ \$decode{ int \$n / 50 ** \$digit % 50 } };
\$digit--;
}
\$decoded .= "\n";
}
return \$decoded;
}

my \$input = '';
for (1 .. 80)
{
for (my \$x=0; \$x<90; \$x+=10)
{
my @c;
push(@c, int (rand(10)+\$x));
push(@c, int (rand(10)+\$x));
push(@c, int (rand(10)+\$x));
push(@c, int (rand(10)+\$x));
@c = sort{\$a<=>\$b}@c;
for (my \$i = 1; \$i < @c; \$i++)
{
\$input .= chr(33+\$c[\$i]) if \$c[\$i] != \$c[\$i-1] && \$c[\$i] != \$c[\$
+i-1]+1;
}
}
\$input .= "\n";
}

#use Data::Dump 'dd'; dd \$_ for \$input =~ /.*\n/g;
print "\n     input length ", length \$input, "\n";

my \$compressed = compress(\$input);
my \$compressedlength = length \$compressed;
print "compressed length \$compressedlength\n";
my \$restored = decompress(\$compressed);
if( \$input eq \$restored )
{
printf "\nMatched, compression ratio = %.1f%%\n",
100 * (1 - length(\$compressed) / length(\$restored));
}
else
{
print "----------------------------------------failed\n";
use Data::Dump 'dd'; dd \$_ for \$restored =~ /.*\n/g;
}
Hi thanks. :)

I'm not surprised, I expected a trade off of speed and compression.

But look at the bright side,  BUK owes you (or Roboticus?) a dinner now. ;)

Honestly if I really needed it I'd rather try to precompute 9 look up tables.

But at this point I'd most probably just try Huffman coding with a single table and even higher compression.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery FootballPerl is like chess, only without the dice