Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Re: Perl, MySQL, and Full Text Searches

by jlawrenc (Scribe)
on Mar 18, 2001 at 03:26 UTC ( [id://65216]=note: print w/replies, xml ) Need Help??


in reply to Perl and mySQL Searches

You might be pleased to know that MySQL does support the creation of full text indexes. Check out Chapter 25.2 at MySQL.com for details.

When it comes to parsing your user input forms you do have a few options for syntaxes. You might want to check out some of the text processing modules such as Text::Balanced to reliably extract your quoted strings. However, you might be disappointed to discover that the FTR (Full Text Retrieval) functions of MySQL do not support phrase searching.

I am a fan of a couple of search criteria synaxes:
- "alta vista" - the + and - of keywords syntax
- what I call "simplified boolean" where the users can put AND's and OR's in their searches without needing to do lots of quoting. - you choose your syntax with what you think your users are going to most easily be able to use.

Once you have decided how you want your users to express their queries and how you wanna bust up your searches and you feel that the FTR searches in MySQL are sufficient you could translate user terms into SELECTs.

Keep in mind that it is easy to think "arg, MySQL FTR is not sophisticated enough for my searches". Don't forget You can still do ANDs, ORs and NOTs by stringing successive clauses such as:

create table documents ( docid int not null auto_increment primary key, title varchar(255) null, doctext text, fulltext index (text) ); select title,doctext from documents where match(doctext) against ('perl') or match(doctext) against ('monger') and not match(doctext) against ('java');
Your results will come back weighted by a common relevancy ranking algorithm (vector space) and truncated at 50% "threshold". There are few compile-time tunings you can make other than this threshold, the minimum size of index term, and stopwords.

A good example of current IR technology development is Managing Gigabytes. These guys are pretty damn smart.

I hope that is at least a bit of lead for building your system.

Cheers, Jay

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://65216]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others examining the Monastery: (4)
As of 2024-03-29 15:22 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found