
Re: RFC: Peer to Peer Conceptual Search Engine

by soonix (Canon)
on Jan 28, 2020 at 11:55 UTC (#11111967)

in reply to RFC: Peer to Peer Conceptual Search Engine

To reverse engineer a system like YaCy, programming skills, even in both languages (Perl and Java), will not be sufficient. Much more important is knowledge of the underlying concepts (as opposed to the concepts that your search engine is to search/find) and understanding how they are connected.

Why? Simply because the two languages are built on different concepts, and translating the code 1:1 would result in a behemoth.

Most likely the structure of YaCy is dictated (at least partially) by the structures that Java supports best, which are not necessarily those a Perl programmer would even consider. There might be similar, but not identical, libraries for both languages. And so on.

While writing this, I stumbled across A Tagcloud For Cory Doctorow, P2P Homework and Lucy, which might not be usable but are interesting in this context.

Re^2: RFC: Peer to Peer Conceptual Search Engine
by PerlGuy(Tom) (Acolyte) on Jan 28, 2020 at 14:12 UTC
    Lucy is very interesting in that it is a Perl port of the Apache Java Lucene/Solr, which YaCy is based on, I think.

    My search engine, though, if it can actually be called that, does not use full-text search or any actual text search whatsoever, except possibly on the site description. It basically ignores all text and focuses only on structured metadata.
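    To make "ignores all text and just focuses on structured metadata" concrete, here is a minimal Python sketch, assuming each indexed entry is a flat dictionary of metadata fields. The record layout and the names `RECORDS`, `build_metadata_index` and `lookup` are hypothetical, not taken from the poster's code: the index maps each (field, value) pair to the entries carrying it, and a query is just a set intersection, with no tokenizing or text matching at all.

```python
from collections import defaultdict

# Hypothetical records: each entry is described only by structured metadata
# fields; there is no body text to index.
RECORDS = [
    {"id": 1, "type": "Organization", "language": "en", "location": "Kenya"},
    {"id": 2, "type": "Solution", "language": "es", "location": "Peru"},
    {"id": 3, "type": "Organization", "language": "es", "location": "Peru"},
]


def build_metadata_index(records):
    """Map each (field, value) pair to the set of record ids carrying it."""
    index = defaultdict(set)
    for rec in records:
        for field, value in rec.items():
            if field != "id":
                index[(field, value)].add(rec["id"])
    return index


def lookup(index, **criteria):
    """Return sorted ids matching every field=value criterion.

    Exact matches only: no text search, just intersecting id sets.
    """
    result = None
    for field, value in criteria.items():
        ids = index.get((field, value), set())
        # The first criterion seeds the result; later ones narrow it.
        result = ids if result is None else result & ids
    return sorted(result or set())
```

    For example, `lookup(build_metadata_index(RECORDS), type="Organization", language="es")` returns `[3]`. Because every lookup is an exact match on a field value, result quality depends entirely on how consistently the metadata was assigned, which is the trade-off compared with full-text search.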

      The closest thing I ever came across in terms of an IDEAL search engine was the custom site search for

      Over 100,000 groups and organizations, plus countless individuals worldwide, were networked and organized through this social network, which would have been impossible without its unique multifaceted search interface.

      What happened to this social network? One day, it was simply announced that the site was shutting down. All that remains, it seems, is some of the non-functional static pages archived on the Wayback Machine.

      Here is an Internet Archive page showing the deceptively simple search interface:

      It had conceptual indexing of facets such as "Solutions" (to world problems, issues and concerns), along with Organizations, Groups, People, Events, Resources, etc. These facets could also be searched simultaneously by language, location and, if desired, keyword. I really loved that search engine.
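      The kind of multifaceted query described above (a conceptual facet combined with language, location and an optional keyword) can be sketched as a filter in which every unset facet means "any". This is only an illustration under assumed data: the field names, sample entries and the `faceted_search` function are hypothetical, and nothing here reflects how the original site actually implemented its search. The per-facet counts mimic the "Organizations (2)" style totals a faceted interface typically shows next to each category.

```python
from collections import Counter

# Hypothetical catalogue: "facet" holds the conceptual category
# (e.g. "Solutions", "Organizations"); "keywords" is a small tag set.
ENTRIES = [
    {"name": "Solar Co-op", "facet": "Organizations", "language": "en",
     "location": "India", "keywords": {"energy", "solar"}},
    {"name": "Rainwater Harvesting", "facet": "Solutions", "language": "en",
     "location": "India", "keywords": {"water"}},
    {"name": "Agua Limpia", "facet": "Organizations", "language": "es",
     "location": "Bolivia", "keywords": {"water"}},
    {"name": "Climate Teach-in", "facet": "Events", "language": "en",
     "location": "Kenya", "keywords": {"climate", "education"}},
]


def faceted_search(entries, facet=None, language=None, location=None,
                   keyword=None):
    """Filter on every facet that was given; None means 'any'.

    Returns the names of matching entries and, for those matches,
    a count per conceptual facet.
    """
    def matches(e):
        return ((facet is None or e["facet"] == facet)
                and (language is None or e["language"] == language)
                and (location is None or e["location"] == location)
                and (keyword is None or keyword.lower() in e["keywords"]))

    hits = [e for e in entries if matches(e)]
    counts = Counter(e["facet"] for e in hits)
    return [e["name"] for e in hits], dict(counts)
```

      For example, `faceted_search(ENTRIES, keyword="water")` returns both water-related entries, one counted under Solutions and one under Organizations. Adding `language="es"` narrows that to the single Spanish-language organization.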

      I may be a wee bit paranoid or something, but it seems that nearly every trace of the original free, open-source WiserEarth API and all its documentation has been scrubbed from the internet, including the Internet Archive. If anyone has a tip on where it can still be found, I'd appreciate it.

      So this brings to the foreground one of the problems of centralized indexing. If a well-organized, worldwide social activist community becomes problematic, it is all too easy to take out a central server. Or maybe the maintainers of the site just got tired of maintaining it. Either way, something that hundreds of thousands of world-betterment groups, organizations and individual activists depended on, really depended on, vanished.
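      One way a peer-to-peer design can address the single-central-server problem is to store each index entry on several peers, chosen deterministically so that any peer can work out where to look. The sketch below uses rendezvous (highest-random-weight) hashing with an assumed replication factor of 3. It is a generic technique chosen for illustration, not a description of YaCy's own distributed index, and the peer names, key format and function names are hypothetical.

```python
import hashlib


def _score(peer, key):
    """Deterministic pseudo-random weight for a (peer, key) pair."""
    digest = hashlib.sha256(f"{peer}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def responsible_peers(key, peers, replicas=3):
    """Rendezvous hashing: the `replicas` peers with the highest
    score for `key` are the ones that store its index entries."""
    ranked = sorted(peers, key=lambda p: _score(p, key), reverse=True)
    return ranked[:replicas]


def still_findable(key, peers, removed, replicas=3):
    """Entries were placed using the full `peers` list; afterwards the
    peers in `removed` vanish. True if at least one holder survives."""
    holders = responsible_peers(key, peers, replicas)
    return any(p not in removed for p in holders)
```

      With three replicas, losing any single peer still leaves the entry reachable; only losing all three holders makes it disappear. Rendezvous hashing also means that when a peer that did not hold a key leaves, that key's holders stay the same, so little data has to move.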

      What essentially pulled all these groups and organizations together was a database with a functional search engine geared towards real human needs.


        Your "conceptual indexing" sounds a lot like what's today called "social bookmarking", which tries to apply to web pages a process similar to the one used in libraries. The Wikipedia page has a section "Comparison with search engines".

        The Search API was probably derived from (or identical to) the WiserEarth API, which is (still) in the Internet Archive (FAQ and Documentation).

        I don't think there's active scrubbing going on; the "normal" entropic force is strong enough already, especially if the information in question needs active maintenance.
