PerlMonks  

Write Google Desktop Search plug-ins in Perl.

by techcode (Hermit)
on Sep 08, 2005 at 08:35 UTC

I noticed an interesting "New!" button in my Google Toolbar, so I went to check it out. It looks like there is a new version of Google Desktop Search out.

More importantly, it lets you write plug-ins for it: Google has published an SDK. And most important of all, Perl is on the list of supported languages.

http://desktop.google.com/developer.html

Does anyone have ideas (and the knowledge) for some plug-ins?


Replies are listed 'Best First'.
Re: Write Google Desktop Search plug-ins in Perl.
by Tortue (Scribe) on Sep 08, 2005 at 13:07 UTC
    Google's Könguló plug-in "crawls websites you specify and makes them searchable via GDS". Its source is available (in Python...). It's a good way to get started writing Google Desktop plug-ins, according to this OnLamp article.
      Google's Könguló plug-in "crawls websites you specify and makes them searchable via GDS"
      And Könguló apparently doesn't work with the new Desktop beta.

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

Re: Write Google Desktop Search plug-ins in Perl. ("words")
by tye (Sage) on Sep 08, 2005 at 20:27 UTC

    On my to-do list is to write a google-desktop-search plug-in (probably in Perl) that hands to google only searchable words or only the unique searchable words from text files (unless the text file is short enough to not require such processing) and to log when that leaves parts of the text file unindexed. And then override many of Google's "filters" to use this one instead.
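The word-reduction step described above might look roughly like the sketch below. It is not a GDS plug-in and does not touch the SDK. The sub names, the `\w+` definition of a "searchable word", the case-insensitive comparison, and the caller-supplied limit are all assumptions made for illustration, and the shortcut for files short enough to be handed over untouched is left out.

```perl
use strict;
use warnings;

# Assumption: a "searchable word" is a run of word characters (\w+),
# compared case-insensitively. GDS's real tokenizer may differ.
sub unique_searchable_words {
    my ($text) = @_;
    my %seen;
    # Keep only the first occurrence of each word, in document order.
    return grep { !$seen{ lc $_ }++ } $text =~ /(\w+)/g;
}

# Returns the reduced text to hand to the indexer, plus the number of
# unique words that did not fit under $limit (0 if everything fits).
sub reduce_for_indexing {
    my ($text, $limit) = @_;
    my @words   = unique_searchable_words($text);
    my $dropped = @words > $limit ? @words - $limit : 0;
    # Log when part of the file would still go unindexed.
    warn "reduce_for_indexing: $dropped unique words left unindexed\n" if $dropped;
    $#words = $limit - 1 if $dropped;    # truncate to the first $limit words
    return (join(' ', @words), $dropped);
}

my ($reduced, $dropped) = reduce_for_indexing("foo bar ### foo baz", 5000);
# $reduced is "foo bar baz", $dropped is 0
```

Keeping the first occurrence of each word preserves document order, but as noted further down, phrase searches will only match where two words happened to be adjacent on first appearance.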

    GDS has some unfortunate design problems (last I checked and as near as I can tell -- I haven't checked out GDS v2 because I could find absolutely no mention of improvements in these problems and it sounds like just a lots-of-flash, Microsoft-imitating interface do-over that I'd probably hate):

    1. It will only index 5000 "words" per file
    2. Repeated words count against this total
    3. Punctuation counts against this total even though you can't search on punctuation
    4. 80 punctuation characters in a row count as 80 words! (but you can't search on any of them)
    5. The tool gives you no way to check which files have been indexed and makes no mention of the fact that it only indexed the first 2% or 5% or whatever of tons of your files
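To see how quickly punctuation eats the budget, here is a rough sketch that counts "words" the way items 2 to 4 describe. The tokenizing rule (each run of word characters is one word, each single punctuation character is one word) is inferred from the list above, not taken from any Google documentation, and the sub name is made up.

```perl
use strict;
use warnings;

# Assumption: every run of word characters counts as one "word", and so
# does every single punctuation character. Repeats are counted again.
sub count_budget_tokens {
    my ($text) = @_;
    # The "= () =" idiom forces list context and yields the match count.
    my $count = () = $text =~ /\w+|[^\w\s]/g;
    return $count;
}

my $banner = '#' x 80;                            # a speed-bump comment line
print count_budget_tokens($banner), "\n";         # prints 80
print count_budget_tokens('my $x = 1;'), "\n";    # prints 6: my $ x = 1 ;
```

Under that assumption, 63 such banner lines alone would use up 5040 "words", which is already past the 5000 limit.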

    Note that I have many files that have well under 5000 unique "words" (where here "words" means things that GDS will actually let me search for) that GDS silently only bothered to index the first tiny fraction of, in part, because they contained chunks of punctuation characters (I dislike speed-bump comments, but they were enshrined in the company coding standard before I arrived). It was a long and frustrating task to figure out that this was the problem.

    But I'm sure that if I only had a PhD or two, I'd understand why this design is actually superior to one that, I don't know, indexes most of the words of files such that you can search for them or at least tells you when its indexing of a file missed the vast majority of its content.

    (Yes, I do understand that most people have limited disk space, that sometimes a hard upper bound is a necessary evil, and that there is some validity to the Microsoft^WGoogle mindset of not showing people too much information because it confuses many of them. But it appears that Google felt it much more important to give me the ability to look at "cached" copies of the first few kilobytes of every previous version of a file over letting me search beyond the first few kilobytes. I also understand that resorting to only unique words will mean that searches for "adjacent words" won't work if those two words weren't adjacent the first time they appear in the document. Silly me, I find being able to search the entire content of larger files w/o "adjacent words" always working to be far superior to being able to use "adjacent words" and other searches over 100% of the first 2% of the file.) Yes, I'm bitter; thanks for noticing. (:

    - tye        

      Interesting. Is it likely that Google's web indexer has similar limitations?
Re: Write Google Desktop Search plug-ins in Perl.
by b10m (Vicar) on Sep 08, 2005 at 12:08 UTC

    `grep`, `locate` and `find` should be enough to look for files ;-)

    --
    b10m
       'Google is Evil'
       -rw-rw-rw-  1 satan demons  0 Jun 06 06:06 google
    
