http://qs321.pair.com?node_id=306942


in reply to Re: Building a search engine
in thread Building a search engine

Perlfect is good and I tried it. The problem is updating. I have over 100,000 files, and adding even a single file (or up to 200 files per day) is a big problem, because I have to re-index everything (i.e. all 100,200 files). Re-indexing everything takes a lot of time. Is there any way I can do incremental indexing, or merge two indexes? How do I tell Perlfect to index only a given list of files?
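One workaround for the incremental-update problem is to keep a manifest of files already indexed and hand only the new ones to the indexer on each run. A minimal sketch (the manifest name and directory layout are hypothetical; Perlfect itself is not invoked here):

```perl
#!/usr/bin/perl
# Sketch of incremental indexing around a batch indexer: keep a manifest
# of already-indexed files and find only the files added since last run.
# The manifest name and directory layout are hypothetical.
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir tempfile);

# Return files under $dir that are not yet listed in $manifest.
sub new_files {
    my ($dir, $manifest) = @_;
    my %seen;
    if (open my $fh, '<', $manifest) {
        chomp(my @lines = <$fh>);
        @seen{@lines} = ();
        close $fh;
    }
    my @new;
    find(sub {
        return unless -f;
        push @new, $File::Find::name unless exists $seen{$File::Find::name};
    }, $dir);
    return sort @new;
}

# Record files as indexed so the next run skips them.
sub record_indexed {
    my ($manifest, @files) = @_;
    open my $fh, '>>', $manifest or die "append $manifest: $!";
    print {$fh} "$_\n" for @files;
    close $fh;
}

# Demo on a throwaway directory:
my $dir = tempdir(CLEANUP => 1);
my ($mfh, $manifest) = tempfile(UNLINK => 1);
close $mfh;
open my $out, '>', "$dir/new_doc.txt" or die $!;
print {$out} "hello\n";
close $out;
my @todo = new_files($dir, $manifest);
print "to index: @todo\n";
# A real run would now invoke the indexer on @todo only, then:
record_indexed($manifest, @todo);
```

This only solves the "which files are new" half; merging the partial index into the main one still depends on what the indexer's data files allow.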

Thanks.
artist

Re: Re: Re: Building a search engine
by zakzebrowski (Curate) on Nov 14, 2003 at 13:15 UTC

    Do you have access to a second machine? Build the index on one machine, and then scp the necessary files back to the host that runs the web page...? (Not sure if you can do that...)
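    The build-on-one-machine, copy-to-the-other workflow could be driven by a small script. A minimal sketch (host name, paths, and the indexer invocation are all hypothetical):

```perl
#!/usr/bin/perl
# Sketch of the build-then-copy workflow: rebuild the index on a second
# machine, then scp the finished index files to the web host.
# Host name, paths, and the indexer invocation are hypothetical.
use strict;
use warnings;

sub build_commands {
    my ($host, $index_dir) = @_;
    return (
        ['perl', 'indexer.pl'],                             # full rebuild, off the web host
        ['scp', '-r', "$index_dir/", "$host:$index_dir/"],  # push finished index across
    );
}

# Print the plan; a real run would use system(@$_) and check the
# exit status ($? == 0) before moving to the next step.
for my $cmd (build_commands('www.example.com', '/home/search/perlfect/data')) {
    print "@$cmd\n";
}
```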

    BTW (I know you don't have access to a database, but) someone mentioned above that you could do keyword searching by building an appropriate interface in MySQL. Additionally, MySQL (and Oracle) have full-text search on text / varchar / CLOB fields. You build a full-text index on the table (exercise left to the student), and after inserting your documents you should be able to run full-text queries against it. (You may need to rebuild the index to get it to work, but again, it's left as an exercise to the student.) The basic idea is a clause that asks whether a document contains the given words and brings back a 'match score' for each document — Oracle calls this CONTAINS; MySQL's equivalent is MATCH ... AGAINST. Google search result: free text php/mysql tutorial
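    To make the full-text idea concrete, here is a minimal sketch assuming a hypothetical `docs` table with a FULLTEXT index on its `body` column; the DBI calls that would execute the SQL are shown in comments only, since they need a live MySQL server:

```perl
#!/usr/bin/perl
# Sketch of MySQL full-text search over a hypothetical `docs` table.
# MySQL's syntax is MATCH ... AGAINST; CONTAINS is the Oracle equivalent.
use strict;
use warnings;

# DDL you would run once (MyISAM is the engine with FULLTEXT support
# in MySQL of this era):
sub docs_ddl {
    return <<'SQL';
CREATE TABLE docs (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  path VARCHAR(255),
  body TEXT,
  FULLTEXT (body)
) ENGINE=MyISAM;
SQL
}

# Relevance-ranked query; the ? placeholders are bound to the search
# terms at execution time.
sub fulltext_sql {
    return <<'SQL';
SELECT path, MATCH(body) AGAINST(?) AS score
FROM docs
WHERE MATCH(body) AGAINST(?)
ORDER BY score DESC
SQL
}

# With DBI it would run roughly as:
#   my $sth = $dbh->prepare(fulltext_sql());
#   $sth->execute($terms, $terms);
print fulltext_sql();
```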



    ----
    Zak
    undef$/;$mmm="J\nutsu\nutss\nuts\nutst\nuts A\nutsn\nutso\nutst\nutsh\ +nutse\nutsr\nuts P\nutse\nutsr\nutsl\nuts H\nutsa\nutsc\nutsk\nutse\n +utsr\nuts";open($DOH,"<",\$mmm);$_=$forbbiden=<$DOH>;s/\nuts//g;print +;