RFC: Peer to Peer Conceptual Search Engine
by PerlGuy(Tom) (Acolyte)
on Jan 28, 2020 at 10:37 UTC
I consider myself a very novice Perl programmer, though I've been studying and using Perl for, I don't even know how many years. 30 maybe. Still so much to learn and so little time.
I got into programming because I wanted a better search engine than any that were available, back in the day, say 1995. (I still think a better search engine is needed and possible).
But all the programmers I approached about my idea said it was impossible, or impossibly difficult.
I knew what needed to be done (or what I wanted at least, needed or not) from a user perspective, but I knew absolutely nothing about computer programming.
I thought, how hard could it be? So I started studying Perl, the best programming language around, as far as I was able to determine. Well, it was the only programming language whose study made me laugh and gave me joy. Perl was poetry, sometimes literally. Studying Perl made me happy and gave me hope. Best of all, it was open source.
After about ten years of reading old heavy Perl textbooks from Books-A-Million's discount table, I was finally able to get a web server with Perl installed running on a 386, type a shebang (#!) line, and write a very simple CGI script. Something like echo: a web page with a form field, with the program returning whatever was typed in, without the program mysteriously erasing itself.
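The kind of CGI echo script I mean would look something like this. This is a sketch from memory, not my original code; the field name and the hand-rolled query parsing are just examples, in the period spirit of doing it without modules:

```perl
#!/usr/bin/perl
# Minimal CGI "echo" script: read a form field from the query
# string and print it back. The field name "text" is arbitrary.
use strict;
use warnings;

# Parse key=value pairs out of a GET query string by hand.
sub parse_query {
    my ($qs) = @_;
    my %param;
    for my $pair (split /[&;]/, $qs // '') {
        my ($k, $v) = split /=/, $pair, 2;
        next unless defined $k;
        for ($k, $v) {
            next unless defined;
            tr/+/ /;                              # '+' means space
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;    # decode %XX escapes
        }
        $param{$k} = $v;
    }
    return \%param;
}

my $param = parse_query($ENV{QUERY_STRING});
my $text  = $param->{text} // '';
print "Content-type: text/html\r\n\r\n";
print "<html><body><form><input name='text'></form>",
      "You said: $text</body></html>\n";
```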
Ten more years and I finally had my own basic proof-of-concept search engine up and running on a free web-hosting service. Just barely. Along with a primitive one-page-at-a-time web crawler.
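A one-page-at-a-time crawler of that sort can be sketched roughly like this. This is an illustration of the idea, not my actual code; fetching could be done with LWP::Simple's get() if installed, and a regex for link extraction is crude but serviceable for a proof of concept:

```perl
#!/usr/bin/perl
# Sketch of a primitive one-page-at-a-time crawler.
use strict;
use warnings;
# use LWP::Simple qw(get);   # one common way to fetch a page

# Crudely pull href targets out of a page of HTML.
sub extract_links {
    my ($html) = @_;
    return $html =~ /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi;
}

# One page per run: fetch it, then save its links to visit later.
# my $html = get('http://example.org/') or die "fetch failed\n";
# print "$_\n" for extract_links($html);
```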
All this, in order to prove that a search engine could find websites by parameters other than keywords. Things like concepts.
The reason I needed and wanted a more capable search engine was because I had been selected to take charge of a research organization and publisher that was networking with thousands of other organizations. I had to keep abreast of what all of these other organizations were doing. Their events and activities. All of this mass of information had to be prioritized. Part of my work was to attend events. This involved knowing what events were scheduled far enough in advance so as to reserve a space or table, or just schedule a lunch meeting while in a particular region with various individuals.
Reading through thousands of organizations' fliers, newsletters, and the like for the pertinent information was a never-ending task that was never completed. I wanted to automate it.
I wanted a computer to scan through all this material and find when and where events were happening around the globe, so that I could travel in a circuit and attend as many of them as possible, meeting with various individuals along the way.
Also, many of these organizations had ongoing activities, but some were run-of-the-mill, and some were high priority. Some activities could wait, some required immediate action.
All of this kind of information is now up on websites. But sorting through it all is still mostly a manual chore. Technology for extracting various kinds of metadata from a website is available, but not often used for what is really important.
What I needed, in order to be effective and to sort through the most important data and have it organized and prioritized by location, event schedule, and the like, was a search engine that could search on much more than keywords: event dates, locations, and categories such as agricultural, political, scientific, environmental, human rights, and so on. There would need to be some means of prioritizing data by importance and urgency, and perhaps credibility and other such parameters. It would also have to cut across languages, that is, be language-independent. In other words, if I want to find events relating to organic heirloom gardening around the world, I need to be able to search for that as a concept, regardless of whether the actual data is presented in a language I may not understand: German, French, Spanish, Portuguese, whatever.
So I incorporated all of these features into my search engine. I think I can now demonstrate how all that I've described is possible. Oh, and, of course, the search engine needs to be able to search with any or all of these parameters simultaneously.
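To give a flavor of what searching on any or all of those parameters at once means in code, here is a toy sketch. The record fields, sample data, and query interface are assumptions for illustration, not my engine's actual metadata format:

```perl
#!/usr/bin/perl
# Toy multi-parameter search over site metadata records.
use strict;
use warnings;

# Hypothetical metadata records; a concept tag is language-independent,
# so the German-language site below is still found by English queries.
my @records = (
    { url => 'http://example.org/a', concepts => ['gardening', 'heirloom'],
      category => 'agricultural',  lang => 'de',
      event_date => '2020-05-01',  location => 'Berlin',  priority => 2 },
    { url => 'http://example.org/b', concepts => ['water'],
      category => 'environmental', lang => 'en',
      event_date => '2020-06-15',  location => 'Chicago', priority => 1 },
);

# Every criterion supplied must match; omitted criteria are ignored,
# so any combination of parameters works in a single query.
sub search {
    my (%q) = @_;
    my @hits = @records;
    @hits = grep { my $r = $_;
                   grep { $_ eq $q{concept} } @{ $r->{concepts} } } @hits
        if defined $q{concept};
    @hits = grep { $_->{category}   eq $q{category} }     @hits if defined $q{category};
    @hits = grep { $_->{event_date} ge $q{after} }        @hits if defined $q{after};
    @hits = grep { $_->{priority}   <= $q{max_priority} } @hits if defined $q{max_priority};
    return @hits;
}

# Concept plus category in one query, regardless of the page's language.
my @found = search(concept => 'heirloom', category => 'agricultural');
print $_->{url}, "\n" for @found;
```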
Now, I figure, if I can get a proof-of-concept search engine/spider/database working as a mere self-taught, amateur Perl programmer, how much better could it be with some actual, experienced programmers doing something with it?
But my search engine still resides on a single server. It is therefore centralized. Ideally, I believe, it should be peer-to-peer, but I haven't learned how to do that kind of programming yet.
Recently, however, I discovered that there is, and has been for about the past 15 years, an open source, peer-to-peer search engine, unbeknownst to me. Unfortunately, it is written in Java. It is fortunate at least that it is open source: https://YaCy.net
I don't know Java. But I learned some Perl, PHP, HTML, and CSS, and I started studying many computer languages before settling on Perl, so, how hard could it be?
I shall be studying Java while trying to reverse engineer and transmute YaCy from Java into Perl (if possible) and somehow or another integrate it with my own search engine's metadata format.