PerlMonks
Lucy is very interesting in that it is a Perl port of Apache's Java Lucene/Solr, which YaCy is based on, I think.
My search engine, if it can actually be called that, does not use full-text search, or any actual text search whatsoever, except possibly on the site description. It basically ignores all text and focuses on structured metadata. It functions much like a public library's book indexing system, where the indexing code has no real relation to any of the actual words in the books it indexes. For example, all books on computer programming are represented by the code 005, and books on Perl are encoded with 005.13.

Personally, in many, if not most, instances, I'm looking for a topic to read about. I don't need every word in two dozen different books on a particular topic indexed. What kind of mad lunatic would go into a library and expect the librarian to scan through every word of every book in the entire library system to find books containing a particular word or two? The librarian just points a finger. Yet, on the internet, full-text scanning is the status quo. Comparatively speaking, what are the database requirements of full-text indexing vs. this kind of conceptual indexing, which libraries have used for a hundred years and which has the added advantage of being language independent?

My "search engine" is, fundamentally, more a method for packaging and unpackaging metadata. Everything I generally ever need or want to know about a website can be encapsulated into a metadata string which more often than not takes up less space in the database than the website's URL.

A text search of an entire document can sometimes be useful, but wouldn't it usually be better to at least narrow the text search down to the resources within a more well-defined topic area first? So I'm interested, to some degree, in how to strip all that full-text searching stuff out, or at least give it a secondary status of: use rarely and only if really needed.

But a SUBJECT (like "Perl programming": 005.13) is just one facet of a website that can be encoded.
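To make the packing idea concrete, here is a minimal sketch in Python. It is not the actual format used by the search engine described in this post. The `SiteMeta` fields, the `|` separator, the example.org URLs, and every code other than 005 and 005.13 are assumptions made up for illustration. It packs a few facets into one short string, unpacks it again, and narrows an index by subject-code prefix, the way a librarian points at a shelf.

```python
from dataclasses import dataclass


# Hypothetical record layout "subject|lang|kind". These field names are
# NOT from the original post; they only illustrate the packing idea.
@dataclass(frozen=True)
class SiteMeta:
    subject: str  # library-style class code, e.g. "005.13"
    lang: str     # language of the site, e.g. "en"
    kind: str     # resource type, e.g. "tutorial"


def pack(meta: SiteMeta) -> str:
    """Join the facets into one compact metadata string."""
    return "|".join((meta.subject, meta.lang, meta.kind))


def unpack(packed: str) -> SiteMeta:
    """Split a packed string back into its facets."""
    subject, lang, kind = packed.split("|")
    return SiteMeta(subject, lang, kind)


def by_subject(index: dict[str, str], prefix: str) -> list[str]:
    """Return the URLs whose subject code starts with the given prefix."""
    return sorted(
        url for url, packed in index.items()
        if unpack(packed).subject.startswith(prefix)
    )


# Illustrative index: URL -> packed metadata (all entries are invented).
index = {
    "https://example.org/perl-tutorial": pack(SiteMeta("005.13", "en", "tutorial")),
    "https://example.org/sorting-algorithms": pack(SiteMeta("005.1", "en", "reference")),
    "https://example.org/garden-calendar": pack(SiteMeta("635", "en", "events")),
}

programming_sites = by_subject(index, "005")   # everything under 005
perl_sites = by_subject(index, "005.13")       # only the 005.13 shelf
```

Because the subject code is hierarchical, a shorter prefix widens the search and a longer one narrows it, and no page text is ever read. Any full-text search could then be run only over the URLs that the prefix filter returns.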
As mentioned, there are many other things that are often neglected by both website creators and search engines, or that can only be accessed through proprietary database systems. An events calendar, perhaps.

Tom

In reply to Re^2: RFC: Peer to Peer Conceptual Search Engine
by PerlGuy(Tom)