When you say "database", do you mean a SQL database?
If so, you need to do some additional work to index
the documents (unless you already have a full-text search engine, of course).
For a very simple system you can do something like this:
create table words
( doc_id int -- or numeric?
, word varchar(32)
)
For each document, extract the words (skipping stop words like "I", "the", etc.), and store each (doc_id, word) combination in the words table.
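A minimal sketch of that extraction step, using Python's built-in sqlite3 module. The stop-word list, the layout of the documents table, the index on word and the helper names are all my assumptions for illustration, so adjust them to your own schema:

```python
import re
import sqlite3

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"i", "a", "an", "the", "and", "or", "of", "to", "in", "is"}

def extract_words(text):
    """Return the distinct lowercase words in text, minus stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Truncate to 32 characters to fit the varchar(32) column.
    return {w[:32] for w in words if w not in STOP_WORDS}

def index_document(conn, doc_id, text):
    """Store the document and one (doc_id, word) row per distinct word."""
    conn.execute(
        "insert into documents (doc_id, doc_text) values (?, ?)",
        (doc_id, text),
    )
    conn.executemany(
        "insert into words (doc_id, word) values (?, ?)",
        [(doc_id, w) for w in sorted(extract_words(text))],
    )

conn = sqlite3.connect(":memory:")
# Assumed layout of the documents table (only doc_id and doc_text are needed).
conn.execute("create table documents (doc_id int primary key, doc_text text)")
conn.execute("create table words (doc_id int, word varchar(32))")
# Assumption: an index on word keeps lookups fast as the table grows.
conn.execute("create index words_word_idx on words (word)")

index_document(conn, 1, "The man had a receding hairline")
```

Storing each word once per document keeps the table small; if you want counts or positions later, you would store one row per occurrence instead.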
Now you can search your documents with something like this:
select d.doc_text, d.doc_id
from documents d
, words w
where w.doc_id = d.doc_id
and w.word in (<list of words to match>)
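Now you have all the documents that contain any of the words you searched for (a document is returned once per matching word). To keep only documents that contain every word, group by doc_id and require count(distinct w.word) to equal the number of search words. You can then apply additional logic to find phrases (e.g. "receding hairline").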
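Requiring every search word and then checking for a phrase could look roughly like this. It is only a sketch: the function names and sample rows are made up, and the phrase check is a naive case-insensitive substring test on the document text rather than anything clever:

```python
import sqlite3

def find_all_words(conn, search_words):
    """Return (doc_id, doc_text) for documents containing every search word."""
    words = sorted({w.lower() for w in search_words})
    if not words:
        return []
    # One "?" per word; the words themselves are bound as parameters.
    placeholders = ", ".join("?" for _ in words)
    sql = f"""
        select d.doc_id, d.doc_text
        from documents d
        join words w on w.doc_id = d.doc_id
        where w.word in ({placeholders})
        group by d.doc_id, d.doc_text
        having count(distinct w.word) = ?
        order by d.doc_id
    """
    return conn.execute(sql, (*words, len(words))).fetchall()

def find_phrase(conn, phrase):
    """Narrow the all-words candidates to those containing the exact phrase."""
    candidates = find_all_words(conn, phrase.split())
    return [(doc_id, text) for doc_id, text in candidates
            if phrase.lower() in text.lower()]

# Made-up sample data, indexed by hand.
conn = sqlite3.connect(":memory:")
conn.execute("create table documents (doc_id int primary key, doc_text text)")
conn.execute("create table words (doc_id int, word varchar(32))")
samples = {
    1: "receding hairline treatments",
    2: "hairline crack kept receding",
    3: "hairline fracture",
}
for doc_id, text in samples.items():
    conn.execute("insert into documents values (?, ?)", (doc_id, text))
    conn.executemany("insert into words values (?, ?)",
                     [(doc_id, w) for w in text.split()])

all_words = find_all_words(conn, ["receding", "hairline"])  # documents 1 and 2
phrase_hits = find_phrase(conn, "receding hairline")        # document 1 only
```

The word index does the cheap filtering first, so the slower text comparison only runs on the few documents that already contain all the words.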
To help rank documents, you can expand this by adding a count column, a position column, or both to the words table. The position column holds the offset of the word within the document; you can use it to handle "near" queries, and also to rank documents.
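Here is one way a position column might be used for a "near" query. I am assuming the position is the word's index within the document (0 for the first word) and that one row is stored per occurrence; stop words are left in so the positions stay simple. The helper names and sample text are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table words (doc_id int, word varchar(32), position int)")

def index_positions(conn, doc_id, text):
    """Store one row per word occurrence, with its word index as position."""
    for position, word in enumerate(text.lower().split()):
        conn.execute("insert into words values (?, ?, ?)",
                     (doc_id, word, position))

def find_near(conn, first, second, max_distance):
    """Return (doc_id, hits) where the two words occur within max_distance
    words of each other, ordered by number of such pairs (most first)."""
    sql = """
        select a.doc_id, count(*) as hits
        from words a
        join words b on b.doc_id = a.doc_id
        where a.word = ? and b.word = ?
          and abs(a.position - b.position) <= ?
        group by a.doc_id
        order by hits desc, a.doc_id
    """
    return conn.execute(sql, (first, second, max_distance)).fetchall()

index_positions(conn, 1, "receding hairline and receding hairline again")
index_positions(conn, 2, "hairline that is slowly receding")
index_positions(conn, 3, "a receding hairline")

adjacent = find_near(conn, "receding", "hairline", 1)
```

With these rows, document 1 ranks first because it has two adjacent pairs, document 3 has one, and document 2 is left out because the words are four positions apart. The pair count is just one crude ranking signal; a count column would give you another.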
Michael