Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re: Re: Re: Refining a 'vector space search'.

by gjb (Vicar)
on Jun 28, 2003 at 17:35 UTC ( #269911=note: print w/replies, xml ) Need Help??


in reply to Re: Re: Refining a 'vector space search'.
in thread Refining a 'vector space search'.

You can get away with a sparse representation of the vector, i.e. you can keep just the positions of the elements of the vector that are non-zero. Say the vector looks like:

(0, 1, 0, 0, 0, 1, 0, 0, 1, 0)
than you can encode it by a list containing just:
( 1, 5, 8 )
This can be a huge space saver if the vector is really large. It is very easy to calculate the cosine between vectors in that representation: just count the number of common elements in the lists and divide by the square root of product of the length of each of the two lists.

Hope this helps, -gjb-

Replies are listed 'Best First'.
Re: Re: Re: Re: Refining a 'vector space search'.
by Seumas (Curate) on Jun 28, 2003 at 18:56 UTC
    Is it enough to represent the values in 0/non-zero like that? That is, do we just need to know that "places 4, 11, 145, 519, 1238 are all non-zero"? I mean, in the vector I displayed as the example above, you see that they are broken down into what appears to be cos themselves(?) and everything is either 0 or a point between 0 and 1. So when performing a calculation on your above example, would "places 1,5 and 8 are positive" be just as fruitful as "place 1 is 0.02453, place 5 is 0.42128 and place 8 is 0.242112"?

      This depends on the algorithm. Some information retrieval algorithms just work with boolean values, others keep track of the frequency of a term in a document.

      If you want to keep track of the frequencies, you can either store position/frequency pairs or use two lists, one for the position, the other for the frequencies. The former approach is cleaner, the latter should be faster.

      Hope this helps, -gjb-

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://269911]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (2)
As of 2021-12-04 14:50 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    R or B?



    Results (30 votes). Check out past polls.

    Notices?