Dear Monks,
I am looking to build or use a search engine for my Internet website. I searched here and on CPAN but found nothing useful (maybe I am missing the right search terms on CPAN). The search only needs to cover HTML files. The search engine has to search a different set of files depending on which customer the query comes from. The platform could be Windows or Unix, running an Apache server (no mod_perl), if that matters.
Building a Google-like search engine would be excellent, but anything of good value would help.
I don't want to put a Google front-end on my website.
So what I am looking for is an existing search mechanism, or CPAN modules that would help me build a search engine. Of course, I am open to any other suggestions.
Thank you,
{artist}
Update: Please note that I do not have access to a database for this purpose. Existing files are not modified; files are only added or deleted, on a daily basis. Volume: around 10,000 files.
Update2 (20031114):
- Volume: around 200,000 files in total, with about 5,000 files added daily. Some are removed at periodic intervals.
- Perlfect issues: no incremental/differential indexing, and it takes a long time to build the index, so recently added content cannot be searched. The data has to be published ASAP, so I cannot wait for indexing to finish before publishing. If someone has an idea how to incorporate differential indexing into Perlfect, I would really appreciate it.
- Cannot use Google. The main reason is that I have password-protected sites, and not everybody should be able to see everybody else's content. Think of it as building a search system for mail accounts.
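To make the differential-indexing requirement concrete, here is a rough sketch of the behaviour I mean. It is not Perlfect code and does not use any CPAN search module; the function names (update_index, search, forget_file, words_in) and the hash layout are made up for illustration, and the regex-based tag stripping is only a placeholder for a real HTML parser. The idea is to remember each file's modification time, read only files that are new since the last run, and drop postings for files that have disappeared.

```perl
use strict;
use warnings;
use File::Find ();

# Hypothetical index layout (a plain hash, no database needed; it could be
# saved between runs with Storable's store/retrieve):
#   $idx->{seen}{$path}            = mtime when the file was last indexed
#   $idx->{terms}{$path}           = [ distinct words found in that file ]
#   $idx->{postings}{$word}{$path} = 1

# Remove every posting that belongs to one file.
sub forget_file {
    my ($idx, $path) = @_;
    for my $word (@{ delete($idx->{terms}{$path}) || [] }) {
        delete $idx->{postings}{$word}{$path};
        delete $idx->{postings}{$word} unless %{ $idx->{postings}{$word} };
    }
    delete $idx->{seen}{$path};
}

# Crude tokenizer: strip tags, lowercase, keep words of 2+ characters.
sub words_in {
    my ($html) = @_;
    $html =~ s/<[^>]*>/ /g;
    my %uniq;
    $uniq{lc $_} = 1 for $html =~ /(\w{2,})/g;
    return sort keys %uniq;
}

# Bring the index up to date with the HTML files under $dir.
# Returns (number of files indexed, number of files dropped).
sub update_index {
    my ($idx, $dir) = @_;
    my %on_disk;
    File::Find::find({ no_chdir => 1, wanted => sub {
        return unless -f $_ && /\.html?$/i;
        $on_disk{$_} = (stat _)[9];    # mtime from the -f stat
    }}, $dir);

    my $added = 0;
    for my $path (keys %on_disk) {
        # Skip files already indexed with the same mtime.
        next if defined $idx->{seen}{$path}
             && $idx->{seen}{$path} == $on_disk{$path};
        forget_file($idx, $path);
        open my $fh, '<', $path or next;
        my $html = do { local $/; <$fh> };
        close $fh;
        my @words = words_in($html);
        $idx->{terms}{$path} = \@words;
        $idx->{postings}{$_}{$path} = 1 for @words;
        $idx->{seen}{$path} = $on_disk{$path};
        $added++;
    }

    my $removed = 0;
    my @gone = grep { !exists $on_disk{$_} } keys %{ $idx->{seen} || {} };
    for my $path (@gone) {
        forget_file($idx, $path);
        $removed++;
    }
    return ($added, $removed);
}

# Return the files that contain every word of the query (AND search).
sub search {
    my ($idx, $query) = @_;
    my @words = map { lc } $query =~ /(\w{2,})/g;
    return unless @words;
    my %hits;
    for my $word (@words) {
        $hits{$_}++ for keys %{ $idx->{postings}{$word} || {} };
    }
    my @found = grep { $hits{$_} == @words } keys %hits;
    return sort @found;
}
```

Keeping one such index per customer directory (for example one hash per customer, each updated from that customer's own directory) would also keep every customer's search confined to their own files. Whether this scales to 200,000 files without a database is exactly the part I am unsure about.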
20031113 Edit by Corion: Fixed unclosed blockquote