PerlMonks  

baxy77bax:

That depends on how much redundant information there is in your data. The more redundancy you have, the easier it is to squeeze it.

So the first thing you ought to do is find out how random your data appears to be. I ran your script a few times, and it seems to generate about 15 characters per line.

I did a little hacking on your inner loop and discovered that due to your if statements in the encoding of your array @c, there are only 50 different possibilities for it, which you could encode easily in 1 character (six bits). Since you've got 9 iterations of that loop, each line could be encoded as 9 characters. That's not quite enough to crunch out half the space, but it should get you a good start.
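As a rough sketch of the one-character-per-iteration idea (not your script, which I'm only guessing at here): enumerate the patterns with the same rule as the toStr() routine in the script below, give each one a printable character, and pack nine of them per line. The names pattern, encode_line and decode_line, the choice of chr(48 + index) as the alphabet, and the random demo data are all made up for illustration.

```perl
use strict;
use warnings;

# Same rule as toStr() in the script below: after sorting, keep $c[$i]
# only when it is at least 2 greater than its predecessor.
sub pattern {
    my @c   = sort { $a <=> $b } @_;
    my @idx = grep { $c[$_] > $c[$_ - 1] + 1 } 1 .. $#c;
    return join ':', map { $c[$_] } @idx;
}

# Enumerate every 4-digit combination and collect the distinct patterns.
my %seen;
for my $n (0 .. 9999) {
    my @digits = split //, sprintf '%04d', $n;
    $seen{ pattern(@digits) } = 1;
}
my @patterns = sort keys %seen;

# Hypothetical alphabet: pattern number k is stored as chr(48 + k),
# i.e. the printable range '0' .. 'a' for 50 patterns.
my %code_for    = map { $patterns[$_] => chr(48 + $_) } 0 .. $#patterns;
my %pattern_for = reverse %code_for;

sub encode_line { return join '', map { $code_for{$_} } @_ }
sub decode_line { return map { $pattern_for{$_} } split //, $_[0] }

# Demo: nine random patterns stand in for one line of real data.
my @line;
push @line, pattern(map { int rand 10 } 1 .. 4) for 1 .. 9;

my $packed = encode_line(@line);
my @back   = decode_line($packed);

printf "%d patterns, packed line is %d characters\n",
    scalar @patterns, length $packed;
print "round trip ok\n" if "@back" eq "@line";
```

The decoder only gives back the offset pattern for each iteration, so whatever else your inner loop adds to each character (the $x in the chr() calls) has to be recoverable from the position in the line.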

The 50 possibilities are far from uniformly distributed: the 4 most common account for about 29% of the cases, and the 9 most common for over 50%, so there's definitely some room to squeeze out even more space.

use strict;
use warnings;

my %cnt;

# Generate all possibilities:
for my $a1 (0 .. 9) {
    for my $a2 (0 .. 9) {
        for my $a3 (0 .. 9) {
            for my $a4 (0 .. 9) {
                my @c = sort ($a1, $a2, $a3, $a4);
                my $s = toStr(@c);
                $cnt{$s}++;
            }
        }
    }
}

my $ttl = 0;
print <<EOHDR;
 num    pct    ttl  ttl pct offsets to print
----- ------- ----- ------- ----------------
EOHDR
for my $k (sort { $cnt{$b} <=> $cnt{$a} } keys %cnt) {
    $ttl += $cnt{$k};
    printf "%5u %6.2f %5u %6.2f <%s>\n",
        $cnt{$k}, 100*$cnt{$k}/10000.0,
        $ttl, 100*$ttl/10000.0,
        $k;
}

sub toStr {
    my @c = @_;
    my @rv = ();
    for (my $i=1; $i < @c; $i++) {
        if ($c[$i] != $c[$i-1] && $c[$i] != $c[$i-1]+1) {
            push @rv, $c[$i];
        }
    }
    return join(":",@rv);
}

When I run it, it shows:

$ perl funky.pl
 num    pct    ttl  ttl pct offsets to print
----- ------- ----- ------- ----------------
  840   8.40   840   8.40 <7>
  830   8.30  1670  16.70 <8>
  682   6.82  2352  23.52 <6>
  592   5.92  2944  29.44 <>
  524   5.24  3468  34.68 <5>
  508   5.08  3976  39.76 <9>
  408   4.08  4384  43.84 <5:8>
  396   3.96  4780  47.80 <6:9>
  396   3.96  5176  51.76 <6:8>
  366   3.66  5542  55.42 <4>
  336   3.36  5878  58.78 <7:9>
  312   3.12  6190  61.90 <5:7>

What the table means is that the most common result would be that the array @c would yield chr(33 + 7 + $x), and would do so 8.4% of the time. The fourth row shows that the array @c would print nothing at all 5.92% of the time. The ninth row shows that we would print chr(33 + 6 + $x), chr(33 + 8 + $x) 3.96% of the time.
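One way to put a number on how skewed the table above is: compute the Shannon entropy of the pattern distribution, which is roughly the fewest bits per pattern any encoding could average. This is only a sketch, and it assumes what the enumeration assumes (all 10,000 digit combinations equally likely) plus that the nine patterns on a line are independent, which may not hold for your real data.

```perl
use strict;
use warnings;

# Count how often each offset pattern occurs over all 10,000 combinations,
# using the same keep-if-gap-of-2-or-more rule as toStr().
my %cnt;
for my $n (0 .. 9999) {
    my @c    = sort { $a <=> $b } split //, sprintf '%04d', $n;
    my @kept = map { $c[$_] } grep { $c[$_] > $c[$_ - 1] + 1 } 1 .. 3;
    $cnt{ join ':', @kept }++;
}

my $total = 0;
$total += $_ for values %cnt;

# Shannon entropy in bits: -sum(p * log2(p)) over all patterns.
my $entropy = 0;
for my $count (values %cnt) {
    my $p = $count / $total;
    $entropy -= $p * log($p) / log(2);
}

printf "distinct patterns   : %d\n",        scalar keys %cnt;
printf "entropy per pattern : %.3f bits\n", $entropy;
printf "per 9-pattern line  : %.1f bits (54 bits at 6 bits each)\n",
    9 * $entropy;
```

If the per-line figure comes out well under 54 bits, that gap is the room a variable-length code has to work with.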

By using the table and encoding the values in 9 characters, you could also omit the "\n" and read the data as fixed-length records, if you don't want to work out a method to squeeze out a bit more. I'd suggest reading up on Huffman coding and/or arithmetic coding for other ideas.
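For a feel of what Huffman coding would buy here, a minimal sketch that builds a Huffman code from the same pattern counts and reports the average code length. It is illustration only: the codes are kept as strings of '0' and '1' rather than packed into bytes, the node layout and the assign routine are made up for this example, and the counts again assume every digit combination is equally likely.

```perl
use strict;
use warnings;

# Pattern counts, using the same rule as toStr().
my %cnt;
for my $n (0 .. 9999) {
    my @c    = sort { $a <=> $b } split //, sprintf '%04d', $n;
    my @kept = map { $c[$_] } grep { $c[$_] > $c[$_ - 1] + 1 } 1 .. 3;
    $cnt{ join ':', @kept }++;
}

# A leaf is [weight, pattern]; an internal node is [weight, undef, left, right].
my @nodes = map { [ $cnt{$_}, $_ ] } keys %cnt;

# Repeatedly merge the two lightest nodes until one tree remains.
while (@nodes > 1) {
    @nodes = sort { $a->[0] <=> $b->[0] } @nodes;
    my ($left, $right) = splice @nodes, 0, 2;
    push @nodes, [ $left->[0] + $right->[0], undef, $left, $right ];
}

# Walk the tree: append '0' going left and '1' going right.
my %code;
sub assign {
    my ($node, $prefix) = @_;
    if (defined $node->[1]) {
        $code{ $node->[1] } = $prefix;
        return;
    }
    assign($node->[2], $prefix . '0');
    assign($node->[3], $prefix . '1');
}
assign($nodes[0], '');

# Expected code length in bits, weighted by how often each pattern occurs.
my $avg = 0;
$avg += $cnt{$_} / 10000 * length($code{$_}) for keys %cnt;

for my $k ((sort { $cnt{$b} <=> $cnt{$a} } keys %cnt)[0 .. 4]) {
    printf "%-8s %5d  %s\n", "<$k>", $cnt{$k}, $code{$k};
}
printf "average code length: %.3f bits per pattern\n", $avg;
```

Note that with variable-length codes the lines are no longer fixed-length records, so you'd need some way to find the line boundaries again when reading the data back.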

Edit: tweaked a phrase, fixed the links to wikipedia articles.

...roboticus

When your only tool is a hammer, all problems look like your thumb.


In reply to Re: Data compression by 50% + : is it possible? by roboticus
in thread Data compression by 50% + : is it possible? by baxy77bax
