Pathologically Eclectic Rubbish Lister | |
PerlMonks |
compression challengeby cLive ;-) (Prior) |
on Apr 25, 2001 at 02:18 UTC ( [id://75290]=perlmeditation: print w/replies, xml ) | Need Help?? |
I've put this here, coz I'm not sure which section to post in....
Just spent the worst part of an hour on this, only to spot an obvious problem - chances of matching n bytes is 256^n, not 256^(n-1). D'oh ;-) So this is NOT A SUGGESTED SOLUTION!, just my thought processes before I went wrong... The challenge, discussed on /.: I will attach a prize of $5,000 to anyone who successfully meets this challenge. First, the contestant will tell me HOW LONG of a data file to generate. Second, I will generate the data file, and send it to the contestant. Last, the contestant will send me a decompressor and a compressed file, which will together total in size less than the original data file, and which will be able to restore the compressed file to the original state. With this offer, you can tune your algorithm to my data. You tell me the parameters of size in advance. All I get to do is arrange the bits within my file according to the dictates of my whim. As a processing fee, I will require an advance deposit of $100 from any contestant. This deposit is 100% refundable if you meet the challenge. Although it would never get paid (see this :), I couldn't help but wonder. My thoughts... Compressor:parse the file starting at byte $i=0 search for $n byte repetitions for $n>1, existing when $i is between 256**($n-2) and 256**($n-1) - 1 inclusive splice these out of the data and store a placeholder (ie $i) to show where to resplice bytes to. So, the matches reduce storage space as follows: repeat position match storage 0-255 2 bytes 1 byte 256-65535 3 bytes 2 bytes 65536-16777215 4 bytes 3 bytes etc delimit matches of different byte size with a single delimiter, ie: single byte matches=double byte matches=etc determine the delimiter based on lowest number of occurences in matches (as these would have to be deleted) So for a file of length $i and $m matches (not containing delimiter), bytes saved = $m - (int(log(base 256)$i) possibly minus another one - can't remember my logs :) Only problem is that this saves a negative amount on average... Decompressor:Take the values stored, and parse. Here's a wordy example to illustrate how I'd decompress... my @matches = split '=', 'single byte matches=double byte matches=etc'; my $i=0; for (@matches) { while(substr($_,0,$i+1,'')) { ... (un?)pack to decimal ... add repetion of length $i+1 at this position in the string } $i++; } Obviously, this is pseudocode. we'd be dealing with 2 bytes representing a number between 256 and 65535 etc, but I hope you see the idea. Damn, and I was so sure I was on to something... Ah well, I really must get back to work, but your thoughts/code welcomed on this... cLive ;-)
Back to
Meditations
|
|