
Hi again,

I am afraid you didn't understand the schema that I use...

One more time...

Document Collection: 5000 documents
Average size of each document: 554 words
Number of distinct words that appear in the collection: 15000

Now, for each distinct word that appears in the collection (15000 in this case)
I save one posting list in the DBMS, so 15000 posting lists in total, like below...

word="Gratis" --> word id=15
column word Id ==> 15
column Posting List ==> "1 2;3;4; 2 5;2; 3 1;15;......

word="Twee" --> word id=10
column word Id ==> 10
column Posting List ==> "10 20;3 21 2; 43 100;105;......

And so on for each of the 15000 words....

So if the user gives the query -> "gratis twee"

I fetch the two posting lists above and merge them
to find which documents contain both terms, and I rank higher the docs in which
the terms are near each other, by checking their positions (proximity score).
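
To make the merge step concrete, here is a rough sketch. It assumes each posting list has already been parsed into a hash of document id => list of positions (my reading of the "doc pos;pos;" format above). The `%twee` data and the names `min_distance` and `rank_common_docs` are made up for illustration, and "smallest gap between the two terms" is only one possible proximity score.

```perl
use strict;
use warnings;

# Assumed parsed form of a posting list: { doc_id => [ positions ] }.
my %gratis = ( 1 => [ 2, 3, 4 ], 2 => [ 5, 2 ], 3 => [ 1, 15 ] );
my %twee   = ( 2 => [ 6 ], 3 => [ 9 ], 7 => [ 4 ] );    # made-up data

# Smallest distance between any position of one term and any of the other.
sub min_distance {
    my ( $pos_x, $pos_y ) = @_;
    my $best;
    for my $p (@$pos_x) {
        for my $q (@$pos_y) {
            my $d = abs( $p - $q );
            $best = $d if !defined $best || $d < $best;
        }
    }
    return $best;
}

# Keep only documents that contain both terms, closest terms first.
sub rank_common_docs {
    my ( $list_x, $list_y ) = @_;
    my %distance;
    for my $doc ( keys %$list_x ) {
        next unless exists $list_y->{$doc};
        $distance{$doc} = min_distance( $list_x->{$doc}, $list_y->{$doc} );
    }
    # Ties are broken by document id so the order is stable.
    return map { [ $_, $distance{$_} ] }
        sort { $distance{$a} <=> $distance{$b} || $a <=> $b } keys %distance;
}

my @ranked = rank_common_docs( \%gratis, \%twee );
printf "doc %d: distance %d\n", @$_ for @ranked;
# prints:
# doc 2: distance 1
# doc 3: distance 6
```

Documents 1 and 7 drop out because each contains only one of the two terms.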

I am afraid your solution isn't what I am actually looking for,
but thanks for your time anyway.

As I wrote in my previous post, I found a way to encode
positive integers, no matter how big they are, by
using the Elias gamma code...

Some examples of the code are shown below...

Number (decimal) --> Gamma representation (bits)
         1   =  1
         2   =  010
         3   =  011
         4   =  00100
etc.
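
For reference, the table above can be reproduced with a few lines of Perl. This is only a sketch of the convention the table uses (leading zeros, then the number in plain binary); the name `gamma_encode` is made up for illustration.

```perl
use strict;
use warnings;

# Elias gamma code of a positive integer, using the table's convention:
# (length - 1) zero bits, followed by the number in plain binary.
sub gamma_encode {
    my ($n) = @_;
    die "gamma code is only defined for integers >= 1\n" if $n < 1;
    my $binary = sprintf '%b', $n;
    return ( '0' x ( length($binary) - 1 ) ) . $binary;
}

print "$_ = ", gamma_encode($_), "\n" for 1 .. 4;
# prints:
# 1 = 1
# 2 = 010
# 3 = 011
# 4 = 00100
```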

My first question is: how can I convert the string below into a real bit string (packed bits instead of characters)?

posting list==> "1110001110101011111101101111011"

And my last question: when I fetch this string back, how do I
unpack it and read it one bit at a time?
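
One possible approach, sketched here with Perl's built-in `pack` and `unpack`: the `B*` template turns a string of '0'/'1' characters into packed bytes and back. Because the last byte is padded with zero bits, the sketch also stores the bit count in front of the data (the `N` field). That extra field is my own assumption about how to tell padding from data, not something the scheme above requires.

```perl
use strict;
use warnings;

my $bits = '1110001110101011111101101111011';    # 31 bits

# 'N' stores the bit count as a 32-bit integer, 'B*' packs the
# '0'/'1' characters eight to a byte (the last byte is zero-padded).
my $packed = pack 'N B*', length($bits), $bits;
print length($packed), " bytes\n";               # prints "8 bytes" (4 + 4)

# Later, after fetching $packed from the database:
my ( $count, $padded ) = unpack 'N B*', $packed;
my $restored = substr $padded, 0, $count;        # drop the padding bits

# Read one bit at a time.
for my $i ( 0 .. $count - 1 ) {
    my $bit = substr $restored, $i, 1;
    # ... use $bit ('0' or '1') here ...
}
```

`vec` can also read single bits straight from the packed bytes, but its bit order matches the `b*` template rather than `B*`, so the two should not be mixed.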

The decoding process goes like this:

1. Read and count 1s from the stream until you reach the first 0. Call this count of ones N.
2. Read the next N bits of the stream and interpret them as a binary number
(e.g. 101 = 5).
3. To decode the number, add 2 to the power N (2^N) to the number from step 2.

So the first number is 9 = 1110001.
The second is 6 = 11010, and so on.
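
The three steps above can be sketched in Perl as follows, working on a plain string of '0'/'1' characters. The name `decode_posting_bits` is made up for illustration. Note that these steps count leading 1s, whereas the table earlier uses leading 0s, so the sketch follows the steps and not the table.

```perl
use strict;
use warnings;

# Decode a string of '0'/'1' characters following the three steps above.
sub decode_posting_bits {
    my ($bits) = @_;
    my @numbers;
    my $pos = 0;
    while ( $pos < length $bits ) {
        # Step 1: count the 1s up to the first 0.
        my $n = 0;
        $n++ while substr( $bits, $pos + $n, 1 ) eq '1';
        $pos += $n + 1;    # skip the 1s and the terminating 0
        die "truncated input\n" if $pos + $n > length $bits;

        # Step 2: read the next N bits as a binary number.
        my $low = $n ? oct( '0b' . substr( $bits, $pos, $n ) ) : 0;
        $pos += $n;

        # Step 3: add 2**N.
        push @numbers, 2**$n + $low;
    }
    return @numbers;
}

my @numbers = decode_posting_bits('1110001110101011111101101111011');
print "@numbers\n";    # prints "9 6 3 59 7"
```

Under these steps a single 0 bit decodes to 1, so any zero padding at the end of the last byte would be read as extra 1s unless the real bit count is stored somewhere.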

I hope you can help me ....

Regards
Mimis

In reply to Re^3: Compress positive integers by MimisIVI
in thread Compress positive integers by MimisIVI
