PerlMonks  

Bring Back the Old Supersearch

by dru145 (Friar)
on Apr 12, 2002 at 15:07 UTC ( [id://158603]=monkdiscuss )

OK, I admit I wasn't a big fan of the old super search, but now I see how useful it is compared to its replacement. I guess the old saying applies here: "You don't know what you have until it's gone."

I can't even search on the string "DBI" now, because of the 4-letter requirement. Plus, I can't speed up my query by restricting it to a specific date range.

Why o why monks did you take away my beloved, but often cursed, super search? Please bring it back.

Thanks,
Dru
Another satisfied monk.

Replies are listed 'Best First'.
(tye)Re: Bring Back the Old Supersearch
by tye (Sage) on Apr 12, 2002 at 15:30 UTC

    We've been working on that for some time. It isn't a simple problem.

    "Why o why monks did you take away my beloved, but often cursed, super search?"

    Because it made the site unstable. A single search with an unfortunate choice of criteria could lock up the site for minutes. This is related to the move to pair.com, which resulted in mysqld being more sensitive to long-running queries: they don't monopolize resources, but (under FreeBSD) they somehow interfere with other queries.

    Much, perhaps all, of the site slowdowns and lockups were probably due to the old super search.

            - tye (but my friends call me "Tye")
Re: Bring Back the Old Supersearch
by rinceWind (Monsignor) on Apr 12, 2002 at 15:47 UTC
    Fellow brethren and monastic powers that be,

    Have you considered having a duplicate of the Perlmonks database on a different host, just for searching? This could be a data warehouse, refreshed once per day via a dump, as most of the very recent stuff is not that vital for searching.

    There could also be some keyword indices built to speed up searching.
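For what it's worth, here is a minimal sketch of what such a keyword index might look like. It is written in Python purely for illustration and assumes nothing about the real PerlMonks schema: the `nodes` dictionary, the node ids, and the function names are all made up. It builds an inverted index (word to set of node ids) once, then answers AND queries by intersecting sets instead of scanning every node.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9_]+", text.lower())

def build_index(nodes):
    """Map each word to the set of node ids whose text contains it."""
    index = defaultdict(set)
    for node_id, text in nodes.items():
        for word in tokenize(text):
            index[word].add(node_id)
    return index

def search_all(index, query):
    """Return the sorted node ids that contain every word in the query (AND)."""
    words = tokenize(query)
    if not words:
        return []
    # Intersect the posting sets, smallest first, to keep the working set small.
    postings = sorted((index.get(w, set()) for w in words), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return sorted(result)

# Hypothetical sample data: node id -> node text.
nodes = {
    101: "Connecting to MySQL with DBI",
    102: "DBI placeholders and quoting",
    103: "Sorting a hash by value",
}
index = build_index(nodes)
print(search_all(index, "DBI"))        # [101, 102]
print(search_all(index, "dbi mysql"))  # [101]
```

Because lookups are exact dictionary hits, a short term like "DBI" costs no more than a long one, so an index along these lines would not need a minimum word length.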

    My $0.02 worth --rW

Use Google Site Keyword - Re: Bring Back the Old Supersearch
by metadoktor (Hermit) on Apr 12, 2002 at 19:56 UTC

        Yes, you'll have better luck searching against the above mirror, which is easily accomplished by clicking the Google link that I put on the Super Search page quite a while ago. (:

                - tye (but my friends call me "Tye")
Re: Bring Back the Old Supersearch
by mrbbking (Hermit) on Apr 12, 2002 at 16:51 UTC
    We now know the "long term cost" of its being "pumped up on steroids", I suppose...

    :-)

Re: Bring Back the Old Supersearch
by belg4mit (Prior) on Apr 15, 2002 at 05:58 UTC
    Not knowing what's under the hood, some things to consider if not yet implemented:
  • Making the search AND by default; I've actually found it quite frustrating that it is OR. Or allow boolean logic. I am not sure how this affects search time, but it should result in a smaller record set to handle and a smaller page to return.
  • Forbid searching for Perl keywords (operators, functions...) that are not ANDed with something; short-circuit those to Simple Search.
  • Likewise rule out bad words (common words), at least when they are not ANDed with something else.
  • Paginating the results. Bandwidth then goes out in bursts, and a good database given a LIMIT ought to stop once it reaches the limit. I imagine subsequent pages would need to be told the node ID from which to resume the search, to continue with the performance boost?

    --
    perl -pe "s/\b;([mnst])/'\1/mg"
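
The pagination idea above (LIMIT plus "the node ID whence to search") can be sketched roughly as follows. This is only an illustration under loud assumptions: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the `node` table, its columns, and the `search_page` helper are hypothetical, not the site's actual schema or code.

```python
import sqlite3

PAGE_SIZE = 2  # deliberately tiny so the example spans more than one page

def search_page(conn, term, after_id=0, page_size=PAGE_SIZE):
    """Return one page of (node_id, title) matches with node_id > after_id."""
    rows = conn.execute(
        "SELECT node_id, title FROM node "
        "WHERE node_id > ? AND title LIKE ? "
        "ORDER BY node_id LIMIT ?",
        (after_id, f"%{term}%", page_size),
    ).fetchall()
    # A full page may have more results behind it, so hand back the last id
    # as the starting point for the next request; a short page is the end.
    next_after = rows[-1][0] if len(rows) == page_size else None
    return rows, next_after

# Hypothetical sample table standing in for the real node store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (node_id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO node VALUES (?, ?)",
    [(1, "DBI basics"), (2, "CGI forms"), (3, "DBI and MySQL"),
     (4, "Regex tips"), (5, "DBI placeholders")],
)

page1, cursor = search_page(conn, "DBI")
page2, cursor2 = search_page(conn, "DBI", after_id=cursor)
print(page1)  # [(1, 'DBI basics'), (3, 'DBI and MySQL')]
print(page2)  # [(5, 'DBI placeholders')]
```

Passing the last node id forward, rather than an OFFSET, means each page starts its scan where the previous one stopped instead of re-reading the rows already shown.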

      A thought I just had. I thought this existed before, the ability to bound a search by date. This being useful for the frequent "I remember there was something about foo last week".
      --
      perl -pew "s/\b;([mnst])/'$1/g"

Re: Bring Back the Old Supersearch
by mattr (Curate) on Apr 18, 2002 at 14:48 UTC
    Hmm, when I search for my own login name I only get 15 or so results. So another limitation to put on the page is "only top 15 results returned"?

    Not that I was in love with the old SuperSearch, but the new one seems of very limited use. Could you describe what the basic problem is in terms of amount of data to be searched, number of records, and number of searches per minute?

    I'm not convinced MySQL has such an incredible text search mechanism. Perhaps using more Perl or something else? I have had great results with htdig on spidered content files, maybe better results than most, since I didn't lose sleep over the last security hole that was found recently: my mod_perl wrapper was suitably paranoid.

    Might I suggest that text to be searched is saved in another database designed solely for text searching? At the very least, it will not impact mysql at all. It also will be based on first learning which words are in each page (not depending on regexes) and using inverted indices. Synonyms, homonyms, misspellings, and fuzzy weighting of these algorithms are possible, and the redesigned engine would output only a certain number of results at a time.
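
As a rough illustration of the misspelling part only, here is a small sketch in Python. It assumes the search engine already has a vocabulary (the list of words in its inverted index); the `vocabulary` list and the `expand_term` helper are invented for this example, and the standard library's `difflib` similarity ratio is just one convenient stand-in for whatever fuzzy weighting a real engine would use.

```python
import difflib

def expand_term(term, vocabulary, cutoff=0.8):
    """Return up to three vocabulary words similar to `term`, best match first."""
    return difflib.get_close_matches(term.lower(), vocabulary, n=3, cutoff=cutoff)

# Hypothetical vocabulary, as it might be read from the index's word list.
vocabulary = ["placeholder", "placeholders", "regex", "closure", "reference"]

# A misspelled query term is mapped to indexed words before the lookup.
matches = expand_term("placehoder", vocabulary)
print(matches)
```

The expanded words would then be looked up in the index as usual, so the fuzzy step costs one pass over the vocabulary rather than a pass over the node text.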

    A very straightforward hack using the htdig system might be to periodically output new nodes as files to disk, with some embedded fields for node id/title/author. For example it can search mail header fields in mailing lists. Or maybe the extra fields are looked up through a separate b-tree. Then the htdig database would be updated with those files, and the files are erased. Your mod_perl code slurps up the results and builds a search page the way you like it using the tag data.

    Though I'm sure you've banged at this for a while, I just feel there are other solutions to the problem; TMTOWTDI. I'd be willing to do it. Anyway, you can try a boolean search on a gigabyte of data (60 sites) with word stemming here. A Perl-only solution may still be totally doable, though. My system (I call it EyeLatitude) is meant to allow various search engines to be plugged into the back of it, all bound up in Perl happiness. I'm selling it for significant bucks, but it is free to the monks if you want the code.

Node Type: monkdiscuss [id://158603]
Approved by Sidhekin