The reasons you gave for an inverted index not being practical are that
a) you need to support searching for phrases
Matching phrases against an index is a case of splitting the phrase into its constituent words, and then intersecting the sets of record numbers that are returned from the index (see Re: Idea for XPath implementation for a slightly better explanation of this). The intersection only tells you which records contain all of the words, so you would then check just those candidate records for the exact phrase.
AND, OR and NOT are just extensions of the same set manipulations.
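As a rough sketch of those set manipulations, assuming the index is a hash of word => { record number => 1 } (the %index contents and the sub names here are made up for illustration, not taken from your code):

```perl
use strict;
use warnings;

# Hypothetical inverted index: word => { record number => 1 }
my %index = (
    inverted => { 1 => 1, 2 => 1, 4 => 1 },
    index    => { 1 => 1, 3 => 1, 4 => 1 },
    search   => { 2 => 1, 4 => 1 },
);

# Return the record set for a word (empty set if the word is unknown)
sub lookup {
    my ($word) = @_;
    return $index{ lc $word } || {};
}

# AND: keep only records present in every set
sub and_sets {
    my @sets  = @_;
    my $first = shift @sets;
    return {} unless $first;
    my %result = %$first;
    for my $set (@sets) {
        delete @result{ grep { !exists $set->{$_} } keys %result };
    }
    return \%result;
}

# OR: any record present in at least one set
sub or_sets {
    my %result;
    $result{$_} = 1 for map { keys %$_ } @_;
    return \%result;
}

# NOT: records in the first set that are absent from the second
sub not_set {
    my ( $keep, $drop ) = @_;
    my %result = map { $_ => 1 } grep { !exists $drop->{$_} } keys %$keep;
    return \%result;
}

# Candidate records for the phrase "inverted index": those containing both words.
# The phrase itself still has to be confirmed against these records.
my $candidates = and_sets( map { lookup($_) } split ' ', 'inverted index' );
my @phrase_candidates = sort { $a <=> $b } keys %$candidates;

my @either  = sort { $a <=> $b } keys %{ or_sets( lookup('search'), lookup('index') ) };
my @without = sort { $a <=> $b } keys %{ not_set( lookup('inverted'), lookup('search') ) };

print "phrase candidates: @phrase_candidates\n";
print "search OR index: @either\n";
print "inverted NOT search: @without\n";
```

Hashes keyed on record number make each of these a couple of lines; if your record sets are large, bit vectors (vec) would be another way to do the same thing.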
b) you need to support partial matches.
Partial matches are a bit more complex, but you could use davorg's Tie::Hash::Regex as the basis for your inverted index,
or use grep /partial.*/, keys %index; (which is what is used under the covers).
This would probably involve some manipulation of the input query to convert partial matches to regex notation (e.g. bio* => bio[^\s]*), unless your users are comfortable using regex notation.
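Something along these lines might do for the conversion and the key lookup. It is only a sketch: the %index contents and the sub names wildcard_to_regex and partial_lookup are invented for the example, and it assumes * is the only wildcard your users can type.

```perl
use strict;
use warnings;

# Hypothetical inverted index: word => { record number => 1 }
my %index = (
    biology      => { 1 => 1, 3 => 1 },
    biochemistry => { 2 => 1 },
    symbiosis    => { 4 => 1 },
    geology      => { 3 => 1, 5 => 1 },
);

# Turn a user term such as 'bio*' into an anchored regex.
# Everything except '*' is escaped, so other regex metacharacters
# in the query are matched literally.
sub wildcard_to_regex {
    my ($term) = @_;
    my $pattern = join '[^\s]*', map { quotemeta($_) } split /\*/, $term, -1;
    return qr/^$pattern$/i;
}

# Find every index key matching the term and merge their record sets
sub partial_lookup {
    my ($term) = @_;
    my $re = wildcard_to_regex($term);
    my %records;
    for my $key ( grep { $_ =~ $re } keys %index ) {
        $records{$_} = 1 for keys %{ $index{$key} };
    }
    my @sorted = sort { $a <=> $b } keys %records;
    return @sorted;
}

my @bio   = partial_lookup('bio*');      # matches biology and biochemistry
my @ology = partial_lookup('*ology');    # matches biology and geology

print "bio*: @bio\n";
print "*ology: @ology\n";
```

Note that the grep walks every key in the index, so the cost grows with the number of distinct words rather than the number of records.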
Just a thought in case you haven't already considered this.