PerlMonks
Re: Re: Re: Re: Re: speeding up a file-based text search
by BrowserUk (Patriarch) on May 07, 2003 at 21:56 UTC ( [id://256391]=note )
The reasons you have given for an inverted index not being practical are that:

a) you need to support searching for phrases

Matching phrases against an index is a case of splitting the phrase into its constituent words, and then intersecting the sets of record numbers that are returned from the index (see Re: Idea for XPath implementation for a slightly better explanation of this). And, or & not are just extensions of the set manipulations.

b) you need to support partial matches.

Partial matches are a bit more complex, but you could use davorg's Tie::Hash::Regex as the basis for your inverted index, or use grep /partial.*/, keys %index; (which is what is used under the covers). This would probably involve doing some manipulation of the input query to convert partial matches to regex notation (eg. bio* => bio[^\s]*), unless your users are comfortable using regex notation.

Just a thought in case you haven't already considered this.

Examine what is said, not who speaks.
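For what it's worth, here is a minimal sketch of both ideas, assuming the index is a plain hash of word => { record number => 1 }. The sample @records and the names intersect, phrase_candidates and partial_match are made up for illustration, and it uses the plain grep-over-keys approach rather than Tie::Hash::Regex. Note that the intersection only gives you candidate records that contain every word; checking that the words are actually adjacent would need a further pass over those records, which isn't shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample records; in practice these would be the
# records (or lines) of the file being searched.
my @records = (
    'the biology of marine mammals',
    'marine biochemistry and the sea',
    'a history of the sea',
    'biometric analysis of marine data',
);

# Build the inverted index: word => { record number => 1 }.
my %index;
for my $recno ( 0 .. $#records ) {
    $index{ lc $_ }{$recno} = 1 for split ' ', $records[$recno];
}

# Intersect any number of record-number sets (hash refs).
sub intersect {
    my ( $first, @rest ) = @_;
    return () unless $first;
    my @common = keys %$first;
    for my $set (@rest) {
        @common = grep { exists $set->{$_} } @common;
    }
    return sort { $a <=> $b } @common;
}

# Records that contain every word of the phrase (candidates only).
sub phrase_candidates {
    my ($phrase) = @_;
    my @sets = map { $index{ lc $_ } || {} } split ' ', $phrase;
    return intersect(@sets);
}

# Convert a user wildcard such as 'bio*' into an anchored regex
# (bio[^\s]*), then collect the records of every matching index key.
sub partial_match {
    my ($query) = @_;
    my $pattern = join '[^\s]*',
        map { quotemeta($_) } split /\*/, lc($query), -1;
    my $re = qr/^$pattern$/;
    my %hits;
    for my $word ( grep { /$re/ } keys %index ) {
        $hits{$_} = 1 for keys %{ $index{$word} };
    }
    return sort { $a <=> $b } keys %hits;
}

print "phrase 'marine biochemistry': @{[ phrase_candidates('marine biochemistry') ]}\n";
print "partial 'bio*': @{[ partial_match('bio*') ]}\n";
```

With the sample data above, the phrase query returns record 1 and the partial query returns records 0, 1 and 3. Or and not would be a union and a difference over the same hashes.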
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller