Compressing a set of integers - Solution

by toma (Vicar)
on Jan 27, 2003 at 05:14 UTC ( [id://230128]=note )


in reply to Compressing a set of integers

Many thanks for all the responses, especially to I0 for the brilliant snippet of code and to bart for the clear explanation of the benefits of pack with the "w" template. All of the responses were valuable; they really improved my understanding. I have implemented a solution along these lines, and I now need only about 26% as much memory as I did with the ASCII dataset. I am delighted with this solution!

With this solution, I expect to store about 1.5 million numbers in about 2 megabytes. Wow!
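
For readers new to the technique, here is a minimal sketch of why the numbers come out so small. The "w" template packs each integer as a BER compressed integer, 7 bits per byte, so values below 128 take one byte and values below 16384 take two. Storing the gaps between sorted values instead of the values themselves keeps most of them in that range, which fits the estimate of roughly 1.3 to 1.4 bytes per number. The helper names encode_row and decode_row are made up for this illustration and are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: sort the values, take successive differences,
# and pack the differences as BER compressed integers ("w").
sub encode_row {
    my @sorted = sort { $a <=> $b } @_;
    my $prev   = 0;
    my @gaps;
    for my $n (@sorted) {
        push @gaps, $n - $prev;
        $prev = $n;
    }
    return pack('w*', @gaps);
}

# Hypothetical helper: unpack the differences and rebuild the
# sorted values with a running sum.
sub decode_row {
    my ($packed) = @_;
    my $sum = 0;
    my @values;
    for my $gap (unpack('w*', $packed)) {
        $sum += $gap;
        push @values, $sum;
    }
    return @values;
}

my @row    = (3, 150, 151, 400, 20000);
my $packed = encode_row(@row);
my @back   = decode_row($packed);

printf "%d values packed into %d bytes\n", scalar(@row), length($packed);
print "round trip OK\n" if "@back" eq "@row";
```

Note that the sort means the original order of a row is lost; that is fine when each row is a set, as in this thread.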

Here is the proof of concept code that I ended up with:

use Storable;
use File::Flat;

# Build test data: rows of increasing integers, kept as comma-separated text.
my @to_save;
my $nrows  = 10000;
my $rowmax = 300;
for (0..$nrows) {
    my @r;
    $r[0] = int(rand(20));
    for (1..int(rand($rowmax))) {
        push @r, $r[$_-1] + 1 + int(rand(200));
    }
    push @to_save, join ',', @r;
}

# Keep the text form in $in, then replace each row in place with its
# packed form: sort, take successive differences, pack with "w".
my $in = "";
foreach (@to_save) {
    $in .= $_."\n";
    my @a = sort {$a<=>$b} split /,/;
    my $d = 0;
    for (@a) { ($d,$_) = ($_,$_-$d); }
    $_ = pack 'w*', @a;
}

open(TEXT, ">array.txt") or die "Can't open output file";
print TEXT $in;
close TEXT;

store(\@to_save, 'array.sto');
my $to_retrieve = retrieve('array.sto');

# Rebuild the text form from the packed rows with a running sum.
my $out = "";
foreach (@$to_retrieve) {
    my @a = unpack 'w*', $_;
    my $c = 0;
    foreach my $n (@a) {
        $c += $n;
        $out .= $c.",";
    }
    chop $out;    # drop the trailing comma
    $out .= "\n";
}

if ($in eq $out) { print "OK\n" }

# Compare the size of the packed Storable file with the plain text file.
my $sz_in  = File::Flat->fileSize('array.txt') or die "Can't size array.txt";
my $sz_out = File::Flat->fileSize('array.sto') or die "Can't size array.sto";
print "Final size = ", int(100*$sz_out/$sz_in+0.5), "% of input\n";
It should work perfectly the first time! - toma

Replies are listed 'Best First'.
Re: Re: Compressing a set of integers
by BrowserUk (Patriarch) on Jan 27, 2003 at 05:39 UTC

    I'd seen, but never really taken any notice of the 'w' format for pack & unpack. Totally made for the job.

    Perhaps the only downside of using 'w' instead of 'S' or 'L' is that you need to unpack the whole string to get to an individual value. With 'S' or 'L' you could have indexed into the string with substr to read or write an individual value. How much of a penalty this is will depend on your application and whether you will always need to access all the values in any given string each time.

    Probably worth it if the savings on space are enough.
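
To make the trade-off concrete, here is a rough sketch of both access patterns. It uses 'L' (exactly 32 bits), so element $i always starts at byte 4*$i; the helper names fixed_get, fixed_set and ber_get are invented for this example.

```perl
use strict;
use warnings;

# Fixed width: read element $i straight out of the buffer.
sub fixed_get {
    my ($buf, $i) = @_;
    my ($value) = unpack('L', substr($buf, 4 * $i, 4));
    return $value;
}

# Fixed width: overwrite element $i in place with 4-argument substr.
sub fixed_set {
    my ($buf_ref, $i, $value) = @_;
    substr($$buf_ref, 4 * $i, 4, pack('L', $value));
    return;
}

# Variable width ("w"): element boundaries are not known in advance,
# so the whole string is unpacked before indexing.
sub ber_get {
    my ($buf, $i) = @_;
    my @all = unpack('w*', $buf);
    return $all[$i];
}

my @values = (10, 300, 70000, 5);
my $fixed  = pack('L*', @values);
my $ber    = pack('w*', @values);

printf "fixed: %d bytes, ber: %d bytes\n", length($fixed), length($ber);
printf "element 2: fixed=%d ber=%d\n", fixed_get($fixed, 2), ber_get($ber, 2);

fixed_set(\$fixed, 1, 12345);
printf "element 1 after update: %d\n", fixed_get($fixed, 1);
```

If the strings hold delta-encoded values, as in toma's solution, every lookup needs the running sum of the preceding gaps anyway, so the full unpack costs less than it might first appear.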


    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

      It wouldn't be necessary to use substr to extract a value in a buffer full of packed shorts or longs. The '@' code can be used to move to a specific offset in the buffer before extracting.
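
A small sketch of that approach, assuming a buffer of 'S' values (exactly 16 bits each); short_at is a name invented for this example. In an unpack template, '@N' moves to byte offset N from the start of the string before the next code is read.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch the 16-bit value at a given index by
# jumping to its byte offset with '@' inside the unpack template.
sub short_at {
    my ($buf, $index) = @_;
    my $offset = 2 * $index;    # each 'S' occupies 2 bytes
    my ($value) = unpack(sprintf('@%d S', $offset), $buf);
    return $value;
}

my $buf = pack('S*', 7, 1000, 65535, 42);
printf "index 2 holds %d\n", short_at($buf, 2);
```

This covers reading only. Changing a value inside an existing buffer still calls for substr or a similar tool, since '@' in a pack template pads or truncates the output being built instead of editing a string in place.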

      --- print map { my ($m)=1<<hex($_)&11?' ':''; $m.=substr('AHJPacehklnorstu',hex($_),1) } split //,'2fde0abe76c36c914586c';
