PerlMonks  

Re^4: Running SuperSearch off a fast full-text index.

by clinton (Priest)
on Jun 11, 2007 at 17:53 UTC (#620547)


in reply to Re^3: Running SuperSearch off a fast full-text index.
in thread Running SuperSearch off a fast full-text index.

The cache-loading can be significant with large indexes, but is only felt once if you are working in a persistent environment (mod_perl, FastCGI).

Does this mean that for mod_perl running the prefork MPM, each child process needs to load the cache? That must use a lot of memory, no?

And how do you handle cache updates across all the child processes (whether they're on the same machine or different machines)?

thanks

Clint

Re^5: Running SuperSearch off a fast full-text index.
by creamygoodness (Curate) on Jun 11, 2007 at 18:58 UTC
    Does this mean that for mod_perl running the prefork MPM, each child process needs to load the cache? That must use a lot of memory, no?

    Yes, and KinoSearch is not thread safe. The memory requirements can be significant for large indexes, even though the data structures are not Perl's and attempts have been made to keep things compact.

    And how do you handle cache updates across all the child processes (whether they're on the same machine or different machines)?

    A Searcher instance represents a snapshot of the index in time. Until you manually reload by creating a new Searcher, changes to the index are not visible.

    --
    Marvin Humphrey
    Rectangular Research ― http://www.rectangular.com
      So maybe a reasonable solution would be:
      • a separate mod_perl search server, which takes search requests from the web server and returns (e.g.) an XML or SOAP list of IDs
      • each child process checks (e.g.) a last_cache_update file once a minute to decide whether to reload its caches
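
      The second point could be sketched roughly as below. This is only an illustration in core Perl, not KinoSearch code: make_reloader, its marker/loader/interval arguments and the returned closure are all hypothetical names. The loader coderef stands in for whatever builds a fresh Searcher, since a new snapshot only becomes visible by creating a new one.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of KinoSearch): returns a closure that hands
# back a cached object and rebuilds it when the marker file's mtime changes.
# The marker file is stat'ed at most once every $interval seconds.
sub make_reloader {
    my (%args) = @_;
    my $marker   = $args{marker};    # e.g. path to a last_cache_update file
    my $loader   = $args{loader};    # coderef that builds a fresh searcher
    my $interval = defined $args{interval} ? $args{interval} : 60;

    my ( $object, $seen_mtime, $last_check );

    return sub {
        my $now = time;
        if ( !defined $object or $now - $last_check >= $interval ) {
            $last_check = $now;
            my $mtime = ( stat $marker )[9];
            $mtime = 0 unless defined $mtime;    # marker missing: use 0
            if ( !defined $object or $mtime != $seen_mtime ) {
                $object     = $loader->();       # take a new snapshot
                $seen_mtime = $mtime;
            }
        }
        return $object;
    };
}
```

      Each child would create one such closure at startup and call it on every request; the indexing process would touch the marker file after finishing an update. Because every child keeps its own timer, the children reload at slightly different moments, so for up to one interval different children may be serving different snapshots.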

      Clint

        Yes, that'll work. Tip: if you're fetching docs from a DB rather than from the index, you may want to turn off stored and vectorized for your fields to save disk space, though that will prevent you from using KinoSearch's built-in Highlighter. Those properties are on by default because disk space is cheap and stuff should be easy.
        --
        Marvin Humphrey
        Rectangular Research ― http://www.rectangular.com
