MimisIVI has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks,
I am building a search engine which uses an inverted index.
For a small number of documents it is quite fast, but not for
a large number.
The solution is to compress the data that I am saving in
the index, but I have a problem understanding the two most
helpful Perl functions for this (pack and unpack).
There are a lot of compression techniques, like Elias delta
code, Golomb code, etc.
I searched the net for Perl code but with
no luck. I found this code in C++ which simply encodes and
decodes positive integers using the byte-aligned compression technique:
Encode Integers:

    int outPos = 0, previous = 0;
    for (int inPos = 0; inPos < n; inPos++) {
        int delta = uncompressed[inPos] - previous;
        previous = uncompressed[inPos];  // remember this posting for the next delta
        while (delta >= 128) {
            // low 7 bits, with the high bit set to mark "more bytes follow"
            compressed[outPos++] = (delta & 127) | 128;
            delta = delta >> 7;
        }
        compressed[outPos++] = delta;    // last byte: high bit clear
    }

Decode Integers:

    int inPos = 0, previous = 0;
    for (int outPos = 0; outPos < n; outPos++) {
        for (int shift = 0; ; shift += 7) {
            int temp = compressed[inPos++];
            previous += ((temp & 127) << shift);  // add the delta back, 7 bits at a time
            if (temp < 128) break;                // high bit clear: end of this posting
        }
        uncompressed[outPos] = previous;
    }
In each case, n postings are processed. Uncompressed
postings are stored in the integer array uncompressed,
compressed postings in the byte array compressed.
Unfortunately my knowledge of C++ is very limited and I
need your help to translate this code into Perl!
If anyone knows any other compression techniques for positive integers (except Huffman, because it is too complex for me), that
would be a huge help for me too.
Thanks in advance!
Mimis
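
For reference, the byte-aligned scheme quoted above can be sketched as a small self-contained C++ pair of functions. This is only an illustration under some assumptions: the postings are sorted in ascending order, every value fits in 32 bits, and the names vbyte_encode and vbyte_decode are made up for this sketch (they are not from the original source). Malformed or truncated input is not validated.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode ascending postings as variable-byte deltas (gaps).
// Each byte carries 7 bits of the gap, lowest bits first; the high bit
// is set on every byte except the last byte of a gap.
std::vector<std::uint8_t> vbyte_encode(const std::vector<std::uint32_t>& postings) {
    std::vector<std::uint8_t> out;
    std::uint32_t previous = 0;
    for (std::uint32_t value : postings) {
        std::uint32_t delta = value - previous;  // gap to the previous posting
        previous = value;
        while (delta >= 128) {
            out.push_back(static_cast<std::uint8_t>((delta & 127) | 128));
            delta >>= 7;
        }
        out.push_back(static_cast<std::uint8_t>(delta));
    }
    return out;
}

// Decode a byte stream produced by vbyte_encode back into postings.
std::vector<std::uint32_t> vbyte_decode(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint32_t> out;
    std::uint32_t previous = 0;
    std::size_t pos = 0;
    while (pos < bytes.size()) {
        std::uint32_t delta = 0;
        // At most 5 bytes (shifts 0..28) are needed for a 32-bit gap.
        for (int shift = 0; pos < bytes.size() && shift <= 28; shift += 7) {
            std::uint8_t b = bytes[pos++];
            delta |= static_cast<std::uint32_t>(b & 127) << shift;
            if (b < 128) break;  // high bit clear: last byte of this gap
        }
        previous += delta;       // undo the delta coding
        out.push_back(previous);
    }
    return out;
}
```

For example, the postings 3, 130, 131, 20000 have gaps 3, 127, 1, 19869 and encode to six bytes (3, 127, 1, 157, 155, 1), compared with sixteen bytes when stored as 32-bit integers. Small gaps cost one byte each, which is why the scheme suits dense posting lists.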
Replies are listed 'Best First'. | |
---|---|
Re: Compress positive integers
by ikegami (Patriarch) on Apr 07, 2008 at 23:08 UTC | |
Re: Compress positive integers
by BrowserUk (Patriarch) on Apr 08, 2008 at 02:02 UTC | |
Re: Compress positive integers
by FunkyMonk (Chancellor) on Apr 07, 2008 at 23:09 UTC | |
Re: Compress positive integers
by tachyon-II (Chaplain) on Apr 08, 2008 at 01:13 UTC | |
Re: Compress positive integers
by Cody Pendant (Prior) on Apr 07, 2008 at 23:47 UTC | |
by radiantmatrix (Parson) on Apr 11, 2008 at 17:24 UTC | |
by MimisIVI (Acolyte) on Apr 11, 2008 at 17:54 UTC | |
Re: Compress positive integers
by MimisIVI (Acolyte) on Apr 08, 2008 at 02:41 UTC | |
by BrowserUk (Patriarch) on Apr 08, 2008 at 07:33 UTC | |
by MimisIVI (Acolyte) on Apr 08, 2008 at 12:49 UTC | |
by tachyon-II (Chaplain) on Apr 09, 2008 at 00:35 UTC | |
by MimisIVI (Acolyte) on Apr 09, 2008 at 13:19 UTC | |
| |
by BrowserUk (Patriarch) on Apr 09, 2008 at 19:07 UTC |