PerlMonks
Hi again,
I am afraid you didn't understand the schema that I use, so one more time:

Document collection: 5000 documents
Average size of each document (number of words): 554 words
Number of distinct words that appear in the collection: 15000

For each distinct word in the collection (15000 in this case) I save one posting list in the DBMS, so 15000 posting lists in total, like below:

word="Gratis" --> word id=15
    column Word Id ==> 15
    column Posting List ==> "1 2;3;4; 2 5;2; 3 1;15;......

word="Twee" --> word id=10
    column Word Id ==> 10
    column Posting List ==> "10 20;3 21 2; 43 100;105;......

And so on for each of the 15000 words.

So if the user gives the query "gratis twee", I fetch the two posting lists above and merge them to find which documents contain both terms. I rank higher the documents in which the terms are near each other, by checking their positions (proximity score).

I am afraid your solution isn't what I am actually looking for, but thanks anyway for your time.

As I wrote in my previous post, I found a way to encode positive integers, no matter how big they are, by using the Elias gamma code. An example of the code is below...
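A minimal sketch of an encoder consistent with the decoding steps and the two worked values given in this post (9 encodes to 1110001, 6 to 11010). It is an illustration under those assumptions, not necessarily the code originally posted; the sub name `gamma_encode` is hypothetical.

```perl
use strict;
use warnings;

# Encode one positive integer as: N ones, a single zero, then the N bits
# that follow the leading 1 of its binary form, where N = floor(log2(n)).
# This is the variant described in the post (9 -> 1110001).
sub gamma_encode {
    my ($n) = @_;
    die "gamma_encode needs a positive integer\n"
        unless defined $n && $n =~ /^\d+$/ && $n > 0;

    my $bin = sprintf '%b', $n;       # e.g. 9 -> "1001"
    my $len = length($bin) - 1;       # N: number of bits after the leading 1
    return ('1' x $len) . '0' . substr($bin, 1);
}

# Concatenate the codes of several integers into one '0'/'1' string.
my $encoded = join '', map { gamma_encode($_) } 9, 6, 3, 59, 7;
print "$encoded\n";   # prints 1110001110101011111101101111011
```

The result is still a string of '0' and '1' characters, one byte per bit, so it has to be packed into real bits before it saves any space.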
My first question is how I can translate the string below into a bit string:

posting list ==> "1110001110101011111101101111011"

My last question is: when I fetch this string back, how do I unpack it and read one bit at a time?

The decode process is like this:

1. Read and count 1s from the stream until you reach the first 0. Call this count of ones N.
2. Read the next N bits of the stream and interpret them as a binary number (e.g. 101 = 5).
3. To decode the number, add 2 to the power N to the number from step 2.

So the first number is 9 = 1110001, the second is 6 = 11010, and so on.

I hope you can help me.

Regards,
Mimis

In reply to Re^3: Compress positive integers
by MimisIVI
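One possible way to approach both questions, sketched with Perl's built-in `pack`/`unpack` and the `B*` template (bit string, high bit first in each byte). The sub names `bits_to_bytes`, `bytes_to_bits` and `gamma_decode` are hypothetical, and storing the bit count next to the packed bytes is an assumption of this sketch: `pack 'B*'` pads the last byte with zero bits, and under the decoding steps in the question every stray 0 would decode as an extra 1.

```perl
use strict;
use warnings;

# Pack a string of '0'/'1' characters into raw bytes (8 bits per byte).
# Returns the bytes and the real bit count, since the last byte is zero-padded.
sub bits_to_bytes {
    my ($bits) = @_;
    return (pack('B*', $bits), length $bits);
}

# Expand raw bytes back into a '0'/'1' string and drop the padding bits.
sub bytes_to_bits {
    my ($bytes, $nbits) = @_;
    return substr(unpack('B*', $bytes), 0, $nbits);
}

# Decode a '0'/'1' string following the three steps listed in the question.
sub gamma_decode {
    my ($bits) = @_;
    my @numbers;
    my ($pos, $end) = (0, length $bits);

    while ($pos < $end) {
        # Step 1: count the 1s up to the first 0; the count is N.
        my $n = 0;
        while ($pos < $end && substr($bits, $pos, 1) eq '1') { $n++; $pos++ }
        die "truncated code: missing 0 terminator\n" if $pos >= $end;
        $pos++;                                   # skip the 0
        die "truncated code: fewer than N bits left\n" if $pos + $n > $end;

        # Step 2: read the next N bits as a binary number (N may be 0).
        my $rest = $n ? oct('0b' . substr($bits, $pos, $n)) : 0;
        $pos += $n;

        # Step 3: the value is 2**N plus the number from step 2.
        push @numbers, (1 << $n) + $rest;
    }
    return @numbers;
}

my ($bytes, $nbits) = bits_to_bytes('1110001110101011111101101111011');
printf "%d bits stored in %d bytes\n", $nbits, length $bytes;   # 31 bits stored in 4 bytes
print join(' ', gamma_decode(bytes_to_bits($bytes, $nbits))), "\n";   # prints 9 6 3 59 7
```

Here the bits are walked with `substr` on the unpacked string, one character per bit, which keeps the sketch close to the numbered steps. It trades memory for simplicity, since the unpacked string is eight times the size of the stored bytes.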