PerlMonks |
One possibility for phrase matching is to build a second layer of indexing. The first layer is "This word exists". The second layer is "This other word is right after me at some place in the document".
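Here is a minimal sketch of that two-layer idea, written in Python purely for illustration and untested against any real corpus. All the names (`tokenize`, `build_index`, `phrase_candidates`) are made up for this example, and the tokenizer assumes that lowercasing and dropping punctuation is acceptable, which is only one of several ways you might index.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Assumption: lowercase everything and keep only runs of letters,
    # digits and apostrophes, so punctuation is thrown away.
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(docs):
    """docs maps a document id to its text. Returns (words, pairs)."""
    words = defaultdict(set)  # layer 1: word -> ids of docs containing it
    pairs = defaultdict(set)  # layer 2: (word, next word) -> ids of docs
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        for tok in tokens:
            words[tok].add(doc_id)
        # Record every word together with the word right after it.
        for a, b in zip(tokens, tokens[1:]):
            pairs[(a, b)].add(doc_id)
    return words, pairs

def phrase_candidates(phrase, words, pairs):
    """Return ids of docs in which every adjacent pair of the phrase occurs."""
    tokens = tokenize(phrase)
    if not tokens:
        return set()
    if len(tokens) == 1:
        # A single word only needs the first layer.
        return set(words.get(tokens[0], set()))
    result = None
    for pair in zip(tokens, tokens[1:]):
        hits = pairs.get(pair, set())
        result = set(hits) if result is None else result & hits
    return result

docs = {
    "a": "Come on in. The tea is on the stove.",
    "b": "The cat sat in the box.",
}
words, pairs = build_index(docs)
print(sorted(phrase_candidates("tea is on", words, pairs)))
print(sorted(phrase_candidates("the box", words, pairs)))
```

What comes back is a set of candidate documents, not a confirmed phrase hit. The second layer only records that one word follows another somewhere in a document.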
Now, this opens the door to false matches, depending on how you index. For example, the phrase "in the" might end up matching "Come on in. The tea is on the stove." Another problem is phrases of three or more words. The system I'm proposing only tells you whether each pair of words appears in the right order somewhere in the document. So, using the above snippet, "in the stove" would match that document because "in the" and "the stove" both exist as pairs, even though "in the stove" isn't there.

But, it all depends on how perfect you want to be. "Good enough, I can give you now. Perfect will be along tomorrow."

------
Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.
Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.

In reply to Re7: speeding up a file-based text search
by dragonchild