Re: Re: Re: Re: Re: speeding up a file-based text search
by BrowserUk (Pope)
on May 07, 2003 at 21:56 UTC
The reasons you have given for an inverted index not being practical are that
a) you need to support searching for phrases
Matching phrases against an index is a case of splitting the phrase into its constituent words, and then intersecting the sets of record numbers that are returned from the index (see Re: Idea for XPath implementation for a slightly better explanation of this).
AND, OR and NOT are just extensions of the same set manipulations.
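As a rough sketch of what I mean, assuming a small in-memory index (the sample records and the sub names all_of, any_of, but_not and phrase are made up for illustration), the set operations might look something like this. Note that the final check against the record text in phrase() is my own addition: the intersection alone only tells you that all the words occur in a record, not that they are adjacent.

```perl
use strict;
use warnings;

# Hypothetical sample records; in practice these would be the lines
# or documents of the file being searched.
my @records = (
    'the quick brown fox',
    'a quick search of the index',
    'brown paper bags',
    'the fox is quick and brown',
);

# Build the inverted index: word => { record number => 1 }.
my %index;
for my $recno ( 0 .. $#records ) {
    $index{ lc $_ }{$recno} = 1 for split ' ', $records[$recno];
}

# AND: records containing every one of the words (set intersection).
sub all_of {
    my @words = map { lc $_ } @_;
    my %count;
    for my $word (@words) {
        $count{$_}++ for keys %{ $index{$word} || {} };
    }
    return sort { $a <=> $b } grep { $count{$_} == @words } keys %count;
}

# OR: records containing any of the words (set union).
sub any_of {
    my %seen;
    for my $word ( map { lc $_ } @_ ) {
        $seen{$_} = 1 for keys %{ $index{$word} || {} };
    }
    return sort { $a <=> $b } keys %seen;
}

# NOT: records in the first list that are absent from the second
# (set difference). Both arguments are array references.
sub but_not {
    my ( $keep, $drop ) = @_;
    my %drop = map { $_ => 1 } @$drop;
    return grep { !$drop{$_} } @$keep;
}

# Phrase: intersect to get candidate records, then confirm the words
# appear next to each other in the record itself.
sub phrase {
    my ($phrase) = @_;
    my @words = split ' ', $phrase;
    my $re = join '\s+', map { quotemeta $_ } @words;
    return grep { $records[$_] =~ /\b$re\b/i } all_of(@words);
}
```

With the sample data, all_of('quick', 'brown') returns records 0 and 3, while phrase('quick brown') narrows that to record 0. Only the candidate records are ever read back, which is where the saving over scanning the whole file would come from.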
b) you need to support partial matches.
or use grep /partial.*/, keys %index; (which is what is used under the covers).
This would probably involve doing some manipulation of the input query to convert partial matches to regex notation (eg. bio* => bio[^\s]*), unless your users are comfortable using regex notation.
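A minimal sketch of that conversion, assuming the index maps each word to a list of record numbers (the sample %index and the sub names wildcard_to_regex and partial_match are hypothetical). Everything in the query except the * is escaped, so users cannot accidentally inject other regex metacharacters.

```perl
use strict;
use warnings;

# Hypothetical index: word => list of record numbers.
my %index = (
    biology   => [ 1, 4 ],
    biography => [ 2 ],
    bionic    => [ 4, 7 ],
    symbiosis => [ 3 ],
    bio       => [ 9 ],
);

# Turn a user query such as "bio*" into an anchored, case-insensitive
# regex. Each '*' becomes [^\s]*; all other characters are matched
# literally via quotemeta.
sub wildcard_to_regex {
    my ($query) = @_;
    # The -1 limit keeps trailing empty fields, so a trailing '*' survives.
    my $pattern = join '[^\s]*', map { quotemeta $_ } split /\*/, $query, -1;
    return qr/\A$pattern\z/i;
}

# Grep the index keys for matching words, then merge their record numbers.
sub partial_match {
    my ($query) = @_;
    my $re = wildcard_to_regex($query);
    my %recs;
    for my $word ( grep { /$re/ } keys %index ) {
        $recs{$_} = 1 for @{ $index{$word} };
    }
    return sort { $a <=> $b } keys %recs;
}
```

Here partial_match('bio*') picks up biology, biography, bionic and bio but not symbiosis. The grep still walks every key, so the cost grows with the number of distinct words in the index rather than with the size of the file.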
Just a thought in case you haven't already considered this.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller