http://qs321.pair.com?node_id=256391


in reply to Re: Re: Re: Re: speeding up a file-based text search
in thread speeding up a file-based text search

The reasons you gave for an inverted index not being practical are that

a) you need to support searching for phrases

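Matching phrases against an index is a case of splitting the phrase into its constituent words, and then intersecting the sets of record numbers that the index returns for each word (see Re: Idea for XPath implementation for a slightly better explanation of this). That leaves only the records that contain every word; if word order matters, you can then check for the exact phrase in just those few candidates.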
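A minimal sketch of the idea, assuming a hypothetical in-memory %records hash (record number => text) and a simple whitespace tokenizer; the sub names records_with_all_words and records_with_phrase are made up for illustration. The first pass intersects the per-word record sets; the optional second pass checks word order only on the survivors.

```perl
use strict;
use warnings;

# Hypothetical sample data: record number => text.
my %records = (
    1 => 'the quick brown fox',
    2 => 'a brown dog',
    3 => 'quick brown thinking',
);

# Inverted index: word => { record number => 1 }
my %index;
for my $recno (keys %records) {
    for my $word (split ' ', lc $records{$recno}) {
        $index{$word}{$recno} = 1;
    }
}

# Intersect the record sets of every word in the phrase.
sub records_with_all_words {
    my ($phrase) = @_;
    my @words = split ' ', lc $phrase;
    return () unless @words;
    my $first = shift @words;
    my %candidates = %{ $index{$first} || {} };
    for my $word (@words) {
        my $set = $index{$word} || {};
        # Drop candidates that do not contain this word.
        delete @candidates{ grep { !$set->{$_} } keys %candidates };
    }
    return sort { $a <=> $b } keys %candidates;
}

# Optional second pass: keep only candidates containing the exact phrase.
sub records_with_phrase {
    my ($phrase) = @_;
    my $re = join '\s+', map { quotemeta } split ' ', lc $phrase;
    return grep { lc($records{$_}) =~ /\b$re\b/ } records_with_all_words($phrase);
}

print join(',', records_with_all_words('quick brown')), "\n";  # prints 1,3
print join(',', records_with_phrase('brown quick')), "\n";     # prints an empty line
```

Only the (usually small) intersection is ever scanned as text, which is where the speed-up over a full file search comes from.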

AND, OR and NOT are just further set manipulations: intersection, union and difference respectively.
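As a rough sketch, assuming the index stores each word's records as a hash of record numbers (the sample %index and the helpers set_and, set_or, set_not and show are hypothetical), the three boolean operators reduce to a few lines each:

```perl
use strict;
use warnings;

# Hypothetical posting sets: word => { record number => 1 }
my %index = (
    perl   => { map { $_ => 1 } 1, 2, 4 },
    python => { map { $_ => 1 } 2, 3 },
    regex  => { map { $_ => 1 } 1, 3, 4 },
);

# AND: records present in both sets (intersection).
sub set_and { my ($x, $y) = @_; return +{ map { $_ => 1 } grep { $y->{$_} } keys %$x } }

# OR: records present in either set (union).
sub set_or  { my ($x, $y) = @_; return +{ %$x, %$y } }

# NOT: records in the first set but not the second (difference).
sub set_not { my ($x, $y) = @_; return +{ map { $_ => 1 } grep { !$y->{$_} } keys %$x } }

# Helper to print a set as a sorted, comma-separated list.
sub show { return join ',', sort { $a <=> $b } keys %{ $_[0] } }

print show(set_and($index{perl}, $index{regex})), "\n";   # prints 1,4
print show(set_or($index{perl}, $index{python})), "\n";   # prints 1,2,3,4
print show(set_not($index{perl}, $index{python})), "\n";  # prints 1,4
```

Because each operator returns a new set in the same shape, queries like "perl AND NOT python" are just nested calls.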

b) you need to support partial matches.

Partial matches are a bit more complex, but you could use davorg's Tie::Hash::Regex as the basis for your inverted index, or simply use grep /partial.*/, keys %index; (which is what it uses under the covers).
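A minimal sketch of the plain-grep approach, not of Tie::Hash::Regex itself: assuming the same word => { record number => 1 } layout (the sample %index and partial_match are hypothetical), find every indexed word that matches, then union their record sets.

```perl
use strict;
use warnings;

# Hypothetical inverted index: word => { record number => 1 }
my %index = (
    biology   => { 1 => 1, 3 => 1 },
    biography => { 2 => 1 },
    symbiotic => { 4 => 1 },
    chemistry => { 3 => 1 },
);

# Grep the index keys for the pattern, then merge the matching words' records.
sub partial_match {
    my ($pattern) = @_;
    my $re = qr/$pattern/;
    my %hits;
    for my $word (grep { /$re/ } keys %index) {
        $hits{$_} = 1 for keys %{ $index{$word} };
    }
    return sort { $a <=> $b } keys %hits;
}

print join(',', partial_match('^bio')), "\n";  # prints 1,2,3
print join(',', partial_match('bio')), "\n";   # prints 1,2,3,4 (unanchored, so "symbiotic" matches too)
```

Scanning the keys is linear in the vocabulary size rather than in the size of the data, which is normally far smaller than the text being searched.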

This would probably involve some manipulation of the input query to convert partial matches into regex notation (e.g. bio* => bio[^\s]*), unless your users are comfortable writing regex notation themselves.
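One possible way to do that conversion, assuming "*" is the only wildcard users type and everything else should match literally (wildcard_to_regex is a hypothetical helper). Anchoring with ^ and $ is a design choice here so that bio* only matches words that start with "bio":

```perl
use strict;
use warnings;

# Turn a wildcard term such as "bio*" into an anchored regex.
# Assumption: "*" is the only wildcard; other characters are quoted literally.
sub wildcard_to_regex {
    my ($term) = @_;
    # The -1 limit keeps a trailing empty field, so "bio*" splits to ("bio", "").
    my $re = join '[^\s]*', map { quotemeta } split /\*/, $term, -1;
    return qr/^$re$/;   # anchored: "bio*" must not match "symbiotic"
}

my @words = qw(biology biography symbiotic bio);
my $re = wildcard_to_regex('bio*');
print join(',', grep { /$re/ } @words), "\n";  # prints biology,biography,bio
```

The resulting regex can be fed straight to a grep over keys %index.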

Just a thought in case you haven't already considered this.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller