PerlMonks
Re: Guessing/Ordering Partial Data by mattr (Curate)
on Apr 14, 2005 at 07:29 UTC (id://447667)
I'd like to point you somewhere and then offer my own swing at this.
One approach is to build a reverse (inverted) index. You might like to check out an old favorite article of mine, Building a Vector Space Search Engine in Perl. Lingua::Stem::Fr may also help improve accuracy, and you can follow that article's suggestion of keeping a stop-word ("bad words") list so that de, la, du, etc. are removed from your dictionary. However, your guesses seem to call for phrase matching, which a word index does not directly support.

There are more sophisticated algorithms, but if you want phrases I'd say brute force (grepping and keeping track of hits) is best for this case. It is not difficult algorithmically, and with only a hundred items it will not be slow as long as you loop through the list only once for each word. Note that a hash key can contain spaces, so a whole phrase can serve as a key.

That said, here is my shot at it. My strategy was simple, and it has the added attraction of keeping score, showing only the highest-scoring hits, and allowing you to search for phrases (at least, it seems to work that way so far). If you want to pass search terms on the command line, take a look at @ARGV.
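The brute-force scoring idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not mattr's original code (which is not reproduced here): the item list, the stop-word list, and the subroutine names `score_items` and `best_hits` are all hypothetical, and weighting a phrase by its word count is just one possible scoring choice. Scores are kept in a hash keyed by the item text, which works because hash keys may contain spaces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data; the original poster's list of about a
# hundred items is not reproduced here.
my @items = (
    'Chateau de la Loire',
    'Vin du Rhone',
    'Fromage de chevre',
    'Vin de la Loire',
);

# Hypothetical stop words ("bad words") that never earn a hit on their own.
my %stop = map { $_ => 1 } qw(de la du le les des);

# Score every item against each search term. A term may be a single word
# or a multi-word phrase; each term costs exactly one pass over @$items.
# Scores live in a hash keyed by the item text (keys may contain spaces).
sub score_items {
    my ($items, @terms) = @_;
    my %score;
    for my $term (map { lc } @terms) {
        next if $stop{$term};                  # skip lone stop words
        my $weight = () = $term =~ /\S+/g;     # phrases weigh more: 1 per word
        # Whole-word, case-insensitive match; \Q...\E escapes metacharacters.
        for my $item (grep { /\b\Q$term\E\b/i } @$items) {
            $score{$item} += $weight;
        }
    }
    return %score;
}

# Keep only the items that share the top score, sorted alphabetically.
sub best_hits {
    my (%score) = @_;
    return () unless %score;
    my ($max) = sort { $b <=> $a } values %score;
    return sort { $a cmp $b } grep { $score{$_} == $max } keys %score;
}

# Search terms come from the command line when given, otherwise a demo query.
my @terms = @ARGV ? @ARGV : ('vin de la', 'loire');
my %score = score_items(\@items, @terms);
printf("%d  %s\n", $score{$_}, $_) for best_hits(%score);
```

Run it as, e.g., `perl guess.pl 'vin de la' loire` (the script name is hypothetical); with that query it prints `4  Vin de la Loire`, since the three-word phrase and the word `loire` both hit that item while `Chateau de la Loire` only gets 1. Each term makes a single pass over the items, which is fine for a hundred entries; for much larger lists, the reverse-index approach from the article avoids rescanning every item per term.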
In Section
Seekers of Perl Wisdom