Problems? Is your data what you think it is? | |
PerlMonks |
Re: More details on compressing a set of integersby BrowserUk (Patriarch) |
on Jan 26, 2003 at 06:48 UTC ( [id://229950]=note: print w/replies, xml ) | Need Help?? |
Its worth pointing out that attempting to compress a string containing a list of comma delimited ascii numbers using Base64, will make the string grow in size, not shrink. Eg. A string containing 1000 5 digit numbers separated by commas, is 6000 bytes. Encode that base64 and you get a string nearly 8K in length. Pack the same string using 16-bits and you get 2000 bytes. If you really need to make it smaller still, then you might consider using Compress::Zlib::memGzip which compresses the same ascii list into 1800 bytes. There are caveats associated with this 10% reduction in size. A) your compression will vary with the data set. B) Compress::Zlib::memGunzip is 10 to 15 times slower than unpack. If your main concern is that using pack & unpack with format 'S*', will limit you to 16-bit numbers, then move to 32-bit using 'L*'. This will pack the same 1000 numbers into 4000 bytes which makes it look not such a good option until you think about what happens when you start using 6-digit numbers. The string containing 100,000 .. 101,000, comma delimited ascii is 7000 bytes. Base64:9467, zipped:1817, packed 32-bit 4000. Again, Zlib is looking good, but now the figures for a random mixed of 1 to 6-digit numbers ascii:6458, base64:8726, zipped:3169, pack 32-bit:4000. Suddenly the trade for decompression speed looks less valuable, and if you move to a random mix of 8-digit numbers, Zlib starts averaging 4500 bytes for 1000 numbers, pack remains at 4000 bytes. Finally, with Zlib, there is a heavy penalty for small lists. Lists of 1 x 5-digits average around 25 bytes; 4 x 5-digits numbers around 45 bytes and 4 x 8-digit numbers around 75 bytes. The corresponding pack'd lists come out at 2/8 for 'S*' and 4/16/16 for 'L*'. Examine what is said, not who speaks. The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.
In Section
Seekers of Perl Wisdom
|
|