PerlMonks  

Logical text search

by Wonko the sane (Deacon)
on Oct 13, 2004 at 01:14 UTC ( [id://398749]=perlquestion )

Wonko the sane has asked for the wisdom of the Perl Monks concerning the following question:

I have a large database of documents, and I want to give users a way to run boolean
queries against it. For example, I want to be able to interpret search strings like:
    perl and code -java
or
    perl + code not java
Both should find documents that contain the words 'perl' and 'code', but not the word 'java'.

I would think that someone must have done this already, but I haven't found
anything in my searches.

Does anyone know of a module that would provide this type of text searching functionality?

Best Regards,
Wonko

Replies are listed 'Best First'.
Re: Logical text search
by tachyon (Chancellor) on Oct 13, 2004 at 01:53 UTC

    I highly recommend swish-e for search tasks. The core indexing and search engine is written in C, which is what you want for speed. It comes with a very nice Perl XS interface called SWISH::API, as well as a Perl CGI script. This is the code that runs the search on a lot of open source sites like Apache. There is also htdig, which is the GNU search engine. It works well, of course, but I don't like it as much.

    cheers

    tachyon

      Heh. Back in 95-96, I used Swish (the original one) for my first Perl project. I ended up hacking on the C code to fix some bugs and enhance it a bit (doubled speed, etc). When I turned my patch back in, he told me that others had improved it 50-fold. I was crushed. *laughs*

      If Swish is still around with a Perl API now, I give it two and a half thumbs up.

      Being right, does not endow the right to be rude; politeness costs nothing.
      Being unknowing, is not the same as being stupid.
      Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
      Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: Logical text search
by Zaxo (Archbishop) on Oct 13, 2004 at 01:31 UTC

    I think you should try splitting this into two problems.

    The first is to parse the user input, getting some kind of parse tree with logical operations in the interior nodes and search words at the leaves. You may want to do this systematically with a parser generator module.

    The second is to translate the parse tree into whatever mechanism your search uses, whether a bunch of greppy file scans or a database query.
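    The two steps above can be sketched in a few dozen lines of Perl. This is only an illustration, not a recommendation of a particular module: the grammar (adjacent terms are ANDed, 'and' and '+' are optional connectors, 'not' and '-' negate the next term, 'or' and parentheses are allowed) is an assumption based on the examples in the question, and the names parse_query, matches and search are made up for this sketch. The "search mechanism" here is a plain in-memory scan.

```perl
use strict;
use warnings;

# Step 1: parse the query into a tree of ['and', L, R], ['or', L, R],
# ['not', X] and ['word', 'perl'] nodes (hypothetical node layout).
sub parse_query {
    my ($query) = @_;
    my @tokens = lc($query) =~ /[()+-]|\w+/g;
    my $tree = _parse_or(\@tokens);
    die "Unexpected token '$tokens[0]'\n" if @tokens;
    return $tree;
}

sub _parse_or {
    my ($t) = @_;
    my $left = _parse_and($t);
    while (@$t && $t->[0] eq 'or') {
        shift @$t;
        $left = ['or', $left, _parse_and($t)];
    }
    return $left;
}

sub _parse_and {
    my ($t) = @_;
    my $left = _parse_unary($t);
    # Adjacent terms are ANDed; 'and' and '+' are optional connectors.
    while (@$t && $t->[0] ne 'or' && $t->[0] ne ')') {
        shift @$t if $t->[0] eq 'and' || $t->[0] eq '+';
        $left = ['and', $left, _parse_unary($t)];
    }
    return $left;
}

sub _parse_unary {
    my ($t) = @_;
    my $tok = shift @$t;
    die "Query ended unexpectedly\n" unless defined $tok;
    return _parse_unary($t)          if $tok eq '+';    # leading '+' is a no-op
    return ['not', _parse_unary($t)] if $tok eq 'not' || $tok eq '-';
    if ($tok eq '(') {
        my $inner = _parse_or($t);
        my $close = shift @$t;
        die "Missing ')'\n" unless defined $close && $close eq ')';
        return $inner;
    }
    die "Unexpected token '$tok'\n" if $tok =~ /^(?:and|or|\))$/;
    return ['word', $tok];
}

# Step 2: translate the tree into a search. Here it is evaluated directly
# against the set of words in one document; a real backend would walk the
# same tree to emit a database or index query instead.
sub matches {
    my ($node, $words) = @_;    # $words: hash ref of words in one document
    my ($op, $l, $r) = @$node;
    return exists $words->{$l}                        if $op eq 'word';
    return !matches($l, $words)                       if $op eq 'not';
    return matches($l, $words) && matches($r, $words) if $op eq 'and';
    return matches($l, $words) || matches($r, $words);    # 'or'
}

sub search {
    my ($query, $docs) = @_;    # $docs: hash ref of name => text
    my $tree = parse_query($query);
    my @hits;
    for my $name (sort keys %$docs) {
        my %words = map { $_ => 1 } lc($docs->{$name}) =~ /\w+/g;
        push @hits, $name if matches($tree, \%words);
    }
    return @hits;
}

my %docs = (
    'a.txt' => 'Perl code for text search',
    'b.txt' => 'Perl and Java code compared',
    'c.txt' => 'Notes on Java',
);
print join(', ', search('perl and code -java',  \%docs)), "\n";    # a.txt
print join(', ', search('perl + code not java', \%docs)), "\n";    # a.txt
```

    Note that the simple tokenizer treats every '-' as a negation, so a hyphenated term such as 'swish-e' would be read as 'swish' and not 'e'. For anything beyond a toy, a parser generator module as suggested above is the more systematic route.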

    After Compline,
    Zaxo

Re: Logical text search
by jaldhar (Vicar) on Oct 13, 2004 at 02:12 UTC

    And just to add one more choice, there is Plucene, which is a whole toolkit for building search engines.

    --
    જલધર

Re: Logical text search
by perrin (Chancellor) on Oct 13, 2004 at 02:58 UTC
    SWISH-E is great. Use the Perl modules from the distribution, not the older ones on CPAN. There are also some other search modules on CPAN: Search::InvertedIndex and DBIx::FullTextSearch. Finally, MySQL and PostgreSQL have full-text indexing options available.
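
    As one illustration of the MySQL option: its boolean full-text mode uses a leading '+' for required words and a leading '-' for excluded words, so the two query forms from the question can be rewritten into that syntax with a small helper. This is a rough sketch only. The name to_mysql_boolean is made up, it handles just flat queries (no 'or', no parentheses), and the table and column names in the comment at the end are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite 'perl and code -java' or 'perl + code not java'
# into MySQL boolean-mode syntax, '+perl +code -java'.
sub to_mysql_boolean {
    my ($query) = @_;
    my @out;
    my $negate = 0;
    for my $tok (lc($query) =~ /[+-]|\w+/g) {
        die "'or' is not handled by this sketch\n" if $tok eq 'or';
        next if $tok eq 'and' || $tok eq '+';    # connectors add nothing
        if ($tok eq 'not' || $tok eq '-') { $negate = 1; next }
        push @out, ($negate ? '-' : '+') . $tok;
        $negate = 0;
    }
    return join ' ', @out;
}

print to_mysql_boolean('perl and code -java'),  "\n";    # +perl +code -java
print to_mysql_boolean('perl + code not java'), "\n";    # +perl +code -java

# With DBI, the result would be bound as a placeholder, assuming a table
# 'documents' with a FULLTEXT index on a 'body' column (both hypothetical):
#   my $ids = $dbh->selectcol_arrayref(
#       'SELECT id FROM documents WHERE MATCH(body) AGAINST(? IN BOOLEAN MODE)',
#       undef, to_mysql_boolean($user_query));
```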
