Write Google Desktop Search plug-ins in Perl.

Re: Write Google Desktop Search plug-ins in Perl. by Tortue (Scribe) on Sep 08, 2005 at 13:07 UTC
Google's Könguló plug-in "crawls websites you specify and makes them searchable via GDS". Its source is available (in Python...). According to an ONLamp article, it's a good way to get started writing Google Desktop plug-ins.
Re^2: Write Google Desktop Search plug-ins in Perl. by QM (Parson) on Sep 08, 2005 at 17:52 UTC
> Google's Könguló plug-in "crawls websites you specify and makes them searchable via GDS"

And Könguló apparently doesn't work with the new Desktop beta.

-QM
--
Quantum Mechanics: The dreams stuff is made of
Re: Write Google Desktop Search plug-ins in Perl. ("words") by tye (Sage) on Sep 08, 2005 at 20:27 UTC
On my to-do list is to write a Google Desktop Search plug-in (probably in Perl) that hands Google only the searchable words, or only the unique searchable words, from text files (unless a file is short enough not to need such processing), and that logs when this still leaves parts of a file unindexed. Then I'd override many of Google's "filters" to use this one instead.

GDS has some unfortunate design problems (last I checked and as near as I can tell -- I haven't checked out GDS v2 because I could find absolutely no mention of improvements in these problems, and it sounds like just a lots-of-flash, Microsoft-imitating interface do-over that I'd probably hate):

- It will only index 5000 "words" per file.
- Repeated words count against this total.
- Punctuation counts against this total, even though you can't search on punctuation.
- 80 punctuation characters in a row count as 80 words! (But you can't search on any of them.)
- The tool gives you no way to check which files have been indexed, and makes no mention of the fact that it only indexed the first 2% or 5% or whatever of tons of your files.

Note that I have many files with well under 5000 unique "words" (where "words" means things that GDS will actually let me search for) of which GDS silently indexed only the first tiny fraction, in part because they contained chunks of punctuation characters (I dislike speed-bump comments, but they were enshrined in the company coding standard before I arrived). It was a long and frustrating task to figure out that this was the problem. But I'm sure that if I only had a PhD or two, I'd understand why this design is actually superior to one that, I don't know, indexes most of the words of a file so that you can search for them, or at least tells you when its indexing of a file missed the vast majority of its content.
(Yes, I do understand that most people have limited disk space, that a hard upper bound is sometimes a necessary evil, and that there is some validity to the Microsoft^WGoogle mindset of not showing people too much information because it confuses many of them. But it appears that Google felt it much more important to give me the ability to look at "cached" copies of the first few kilobytes of every previous version of a file than to let me search beyond the first few kilobytes. I also understand that resorting to only unique words means that searches for "adjacent words" won't work if those two words weren't adjacent the first time they appear in the document. Silly me, I find being able to search the entire content of larger files, without "adjacent words" always working, far superior to being able to use "adjacent words" and other searches over 100% of the first 2% of the file.)

Yes, I'm bitter; thanks for noticing. (:

- tye
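The preprocessing step tye describes could be sketched roughly as follows. This is a minimal sketch under loud assumptions: going only on the description above, it assumes GDS counts each run of word characters and each individual punctuation character as one "word", and that searches are case-insensitive (so duplicates are folded with `lc`). Neither is documented GDS behaviour. The names `GDS_LIMIT`, `gds_token_count`, `searchable_words` and `words_for_index` are hypothetical, and the sketch does not touch the real GDS plug-in interface; it only decides what text such a plug-in would hand over.

```perl
use strict;
use warnings;

# Per-file "word" limit as reported in the post above (assumed, not an official constant).
use constant GDS_LIMIT => 5000;

# Hypothetical approximation of how GDS counts "words": each run of word
# characters is one token, and each punctuation character is its own token.
sub gds_token_count {
    my ($text) = @_;
    my @tokens = $text =~ /(\w+|[^\w\s])/g;
    return scalar @tokens;
}

# Things a user can actually search for: runs of word characters only.
sub searchable_words {
    my ($text) = @_;
    return $text =~ /(\w+)/g;
}

# Return the text unchanged if it already fits under the limit; otherwise
# return only the unique searchable words (first-occurrence order, folded
# case-insensitively by assumption), warning if even those don't fit.
sub words_for_index {
    my ($name, $text) = @_;
    return $text if gds_token_count($text) <= GDS_LIMIT;

    my %seen;
    my @unique = grep { !$seen{ lc($_) }++ } searchable_words($text);
    if (@unique > GDS_LIMIT) {
        warn sprintf "%s: %d unique words, only the first %d will be indexed\n",
            $name, scalar @unique, GDS_LIMIT;
        $#unique = GDS_LIMIT - 1;    # truncate to the limit
    }
    return join ' ', @unique;
}
```

For example, under this counting model an 80-character speed-bump comment followed by `sub foo { return 1; }` costs 87 "words", of which only 4 (`sub`, `foo`, `return`, `1`) are searchable. As tye notes, handing over unique words in first-occurrence order sacrifices "adjacent words" searches in exchange for covering the whole file.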
Re^2: Write Google Desktop Search plug-ins in Perl. ("words") by zby (Vicar) on Sep 09, 2005 at 19:55 UTC
Interesting. Is it likely that the web Google indexer has similar limitations?
Re: Write Google Desktop Search plug-ins in Perl. by b10m (Vicar) on Sep 08, 2005 at 12:08 UTC
`grep`, `locate` and `find` should be enough to look for files ;-)

--
b10m

'Google is Evil'
-rw-rw-rw- 1 satan demons 0 Jun 06 06:06 google

