PerlMonks  

Re^3: Running SuperSearch off a fast full-text index.

by educated_foo (Vicar)
on Jun 10, 2007 at 21:02 UTC (#620366)


in reply to Re^2: Running SuperSearch off a fast full-text index.
in thread Running SuperSearch off a fast full-text index.

A lot of these are obsoleted by a good ranking function, which will tend to pull the best hits to the top even without the additional metadata. For example, a search for "rectangular humphrey" turns up this: "I'm starting to get offers from people who want to sponsor features in my CPAN distro, KinoSearch," which is very relevant -- I didn't realize you were the author of KinoSearch, which you are also suggesting as a platform.

I agree that node ratings, etc., can be useful, but one of Google's big lessons is that quantity can beat quality: intelligent analysis of huge amounts of generic data can beat analysis of specialized data. This is particularly visible in its approach to natural language translation, but is nearly as important in search.

Re^4: Running SuperSearch off a fast full-text index.
by creamygoodness (Curate) on Jun 10, 2007 at 22:03 UTC
    I didn't realize you were the author of KinoSearch, which you are also suggesting as a platform.

    And I, in turn, was unaware that dmitri was my sock puppet. ;)

    intelligent analysis of huge amounts of generic data can beat analysis of specialized data.

    Sure, those techniques are powerful... The brute force "did you mean" stratagem[1] is tough to top, no question!

    As for whether we'll be able to deliver an overall improvement on the PerlMonks search experience, I guess we'll just have to present something, and people can vote with their clicks.

    [1] Major search engines decide what to suggest based on search history: what most people have typed in after misspelling something. This has proven superior to algorithms based on edit distance.
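
    To make footnote [1] concrete, here is a minimal Python sketch of suggesting a correction from search history rather than from edit distance. The session data, the function names, and the min_count threshold are all hypothetical, invented for illustration; this is not a description of how any real search engine or KinoSearch does it.

```python
from collections import Counter, defaultdict


def build_rewrite_counts(sessions):
    """Count how often each query was immediately followed by a different one.

    `sessions` is an iterable of lists; each list holds one user's
    consecutive queries (a hypothetical log format).
    """
    rewrites = defaultdict(Counter)
    for queries in sessions:
        for first, second in zip(queries, queries[1:]):
            if first != second:
                rewrites[first][second] += 1
    return rewrites


def did_you_mean(query, rewrites, min_count=2):
    """Return the most common follow-up to `query`, or None if it is too rare."""
    followups = rewrites.get(query)
    if not followups:
        return None
    suggestion, count = followups.most_common(1)[0]
    return suggestion if count >= min_count else None


# Made-up session logs, for illustration only.
sessions = [
    ["kinoserch", "kinosearch"],
    ["kinoserch", "kinosearch"],
    ["kinoserch", "kino search"],
    ["supersearch"],
]

rewrites = build_rewrite_counts(sessions)
print(did_you_mean("kinoserch", rewrites))
print(did_you_mean("supersearch", rewrites))
```

    No string-similarity measure is computed here: the suggestion comes purely from what earlier users typed next, which is why it needs a large query log to work well.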

    --
    Marvin Humphrey
    Rectangular Research ― http://www.rectangular.com
