Re: Re: Re: speeding up a file-based text search

in reply to Re: Re: speeding up a file-based text search
in thread speeding up a file-based text search

Could you show us a few example queries that you wish to support and, specifically, the format in which the queries are defined?

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

Re: Re: Re: Re: speeding up a file-based text search by perrin (Chancellor) on May 07, 2003 at 20:42 UTC
Here are the options:

query type: phrase, and, or
case-sensitive: yes, no
whole words only: yes, no
Re: Re: Re: Re: Re: speeding up a file-based text search by BrowserUk (Patriarch) on May 07, 2003 at 21:56 UTC
The reasons you have given for an inverted index not being practical are that:

a) you need to support searching for phrases.

Matching phrases against an index is a case of splitting the phrase into its constituent words and then intersecting the sets of record numbers returned from the index (see Re: Idea for XPath implementation for a slightly better explanation of this). And, or & not are just extensions of the set manipulations.

b) you need to support partial matches.

Partial matches are a bit more complex, but you could use davorg's Tie::Hash::Regex as the basis for your inverted index, or use `grep /partial../, keys %index;` (which is what is used under the covers). This would probably involve some manipulation of the input query to convert partial matches to regex notation (e.g. `bio => bio[^\s]*`), unless your users are comfortable using regex notation.

Just a thought in case you haven't already considered this.
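For anyone following along, here is a rough sketch of the set manipulations described above: an inverted index mapping each word to the record numbers that contain it, with "and" as an intersection, "or" as a union, and a partial match done by grepping the index keys. The sample records and the sub names (`search_and`, `search_or`, `search_prefix`) are made up for illustration and are not from the original poster's code. Note that this only handles "and"/"or"; it does not check word order, so it is not a complete phrase search.

```perl
use strict;
use warnings;

# Hypothetical sample records standing in for lines of the searched file.
my @records = (
    'perl biology toolkit',
    'biography of a perl hacker',
    'text search in large files',
);

# Inverted index: lowercased word => { record number => 1 }.
my %index;
for my $n (0 .. $#records) {
    $index{ lc($_) }{$n} = 1 for split ' ', $records[$n];
}

# "and" query: record numbers that contain every given word (intersection).
sub search_and {
    my %uniq  = map { lc($_) => 1 } @_;
    my @words = keys %uniq;
    my %count;
    for my $w (@words) {
        $count{$_}++ for keys %{ $index{$w} || {} };
    }
    return sort { $a <=> $b } grep { $count{$_} == @words } keys %count;
}

# "or" query: record numbers that contain any given word (union).
sub search_or {
    my %seen;
    for my $w (map { lc($_) } @_) {
        $seen{$_} = 1 for keys %{ $index{$w} || {} };
    }
    return sort { $a <=> $b } keys %seen;
}

# Partial match: full scan of the index keys for words starting with
# the prefix, then the union of their record sets.
sub search_prefix {
    my ($prefix) = @_;
    my $re = qr/^\Q$prefix\E/i;
    return search_or(grep { $_ =~ $re } keys %index);
}

print join(',', search_and('perl', 'biology')), "\n";   # 0
print join(',', search_or('biology', 'search')), "\n";  # 0,2
print join(',', search_prefix('bio')), "\n";            # 0,1
```

The `\Q...\E` in `search_prefix` quotes the user's input so it is treated as literal text, which fits the case where users are not expected to type regex notation.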
Re: Re: Re: Re: Re: Re: speeding up a file-based text search by perrin (Chancellor) on May 07, 2003 at 22:06 UTC
Phrase matching is not the same as "and" matching. It's not enough for two words to both be in the same record; they have to be there next to each other in the correct order. A word list can't do that, although it can be used to qualify records for further checking. I can do partial matching as part of that, although it requires a full scan of the word list. I'm going to try it. Incidentally, I'm using index() instead of m// for partial matching, which should be faster. Giving users regex search capability is not a goal.
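A minimal sketch of that two-step approach, under assumptions of my own: the word list first qualifies candidate records (every word of the phrase must be present), and `index()` then confirms that the words actually appear together and in order. The sample records and the sub names (`phrase_search`, `words_containing`) are hypothetical, the match is case-sensitive, and words are assumed to be separated by single spaces.

```perl
use strict;
use warnings;

# Hypothetical sample records.
my @records = (
    'speeding up a text search',
    'search the text quickly',
    'a context search tool',
);

# Word list: word => { record number => 1 } (case-sensitive here).
my %index;
for my $n (0 .. $#records) {
    $index{$_}{$n} = 1 for split ' ', $records[$n];
}

sub phrase_search {
    my ($phrase) = @_;
    my %uniq = map { $_ => 1 } split ' ', $phrase;
    my $need = keys %uniq;
    return unless $need;

    # Step 1: qualify records that contain every word of the phrase.
    my %count;
    for my $w (keys %uniq) {
        $count{$_}++ for keys %{ $index{$w} || {} };
    }
    my @candidates = grep { $count{$_} == $need } keys %count;

    # Step 2: check only the candidates for the phrase itself, using
    # index() rather than a regex match.
    return sort { $a <=> $b }
           grep { index($records[$_], $phrase) >= 0 } @candidates;
}

# Partial word match: a full scan of the word list with index().
sub words_containing {
    my ($part) = @_;
    my @hits = grep { index($_, $part) >= 0 } keys %index;
    return sort @hits;
}

print join(',', phrase_search('text search')), "\n";   # 0
print join(',', words_containing('text')), "\n";       # context,text
```

Record 1 contains both words but not the phrase, so step 2 rejects it. Record 2 never becomes a candidate because "text" is not one of its words. Be aware that `index()` is a plain substring test: a candidate containing both whole words plus something like "context search" would still pass step 2, so a "whole words only" option would need an extra boundary check.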
Re7: speeding up a file-based text search by dragonchild (Archbishop) on May 07, 2003 at 22:23 UTC
Re: Re: Re: Re: Re: Re: Re: speeding up a file-based text search by BrowserUk (Patriarch) on May 07, 2003 at 22:33 UTC
Re^7: speeding up a file-based text search (word list for phrase search) by Aristotle (Chancellor) on May 09, 2003 at 19:04 UTC

In Section Seekers of Perl Wisdom