http://qs321.pair.com?node_id=75137

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to write a program that searches the documents on someone else's website, simply because nobody puts a search engine on their own site. I get how to search for words in documents on my own site, but how do I do it on someone else's? My dream is to be able to just type search:www.whatever.com for:perl, and have it search all the documents in that directory for the word perl and then send back links. IS THAT TOO MUCH TO ASK?!?!?! : )

Replies are listed 'Best First'.
Re: search a foreign directory
by suaveant (Parson) on Apr 24, 2001 at 21:04 UTC
    Well... the easy way would be to use something like wget, which supports recursive downloads, and then search the downloaded copy locally...
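    A rough sketch of the wget route, assuming the site was mirrored beforehand with something like "wget -r -l 2 -np http://www.whatever.com/" (which puts the files in a directory named after the host). The sub names search_mirror and path_to_url are made up for illustration, and the match is a plain word search over the raw file contents, markup included:

```perl
use strict;
use warnings;
use File::Find;

# Return the .html/.htm/.txt files under $dir whose contents mention $word
# (case-insensitive, whole word). $dir is the directory wget was run in.
sub search_mirror {
    my ($dir, $word) = @_;
    my @hits;
    find({
        no_chdir => 1,                       # keep $_ as the full path
        wanted   => sub {
            return unless -f $_ && /\.(?:html?|txt)$/i;
            open my $fh, '<', $_ or return;  # skip unreadable files
            local $/;                        # slurp the whole file
            my $text = <$fh>;
            push @hits, $_ if defined $text && $text =~ /\b\Q$word\E\b/i;
        },
    }, $dir);
    return sort @hits;
}

# Turn a path inside the mirror back into a link to the live site.
# Assumes wget's default layout: <dir>/<hostname>/<path on site>.
sub path_to_url {
    my ($dir, $path) = @_;
    (my $rel = $path) =~ s{^\Q$dir\E/*}{};
    return "http://$rel";
}

# Usage: perl search_mirror.pl <mirror-dir> <word>
if (@ARGV == 2) {
    my ($dir, $word) = @ARGV;
    print path_to_url($dir, $_), "\n" for search_mirror($dir, $word);
}
```

    Run it from the directory where wget was run, e.g. "perl search_mirror.pl . perl", and it prints one link per matching file.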

    Or you could write a web spider of your own in Perl using LWP and search the pages each time (or make a local copy, as with wget). It would probably be a good idea to cache the pages locally for a while and search them there, then rebuild the link so each result points to the actual site.
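    A minimal sketch of the spider idea, assuming LWP::UserAgent, HTML::LinkExtor and URI are installed. The sub names, the user-agent string and the 50-page cap are illustrative choices, not anything from this thread. It does no caching, ignores robots.txt, and matches against the raw HTML, so treat it as a starting point only:

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;

# Return the absolute http(s) links in $html that stay on the same host as $base.
sub same_host_links {
    my ($html, $base) = @_;
    my $host  = URI->new($base)->host;
    my $extor = HTML::LinkExtor->new(undef, $base);  # $base makes links absolute
    $extor->parse($html);
    $extor->eof;
    my %seen;
    for my $link ($extor->links) {
        my ($tag, %attr) = @$link;
        next unless $tag eq 'a' && $attr{href};
        my $uri = URI->new("$attr{href}");
        next unless $uri->scheme && $uri->scheme =~ /^https?$/;
        next unless $uri->host eq $host;
        $uri->fragment(undef);                       # drop "#section" parts
        $seen{ $uri->canonical->as_string } = 1;
    }
    return sort keys %seen;
}

# Breadth-first crawl from $start, returning the URLs whose HTML mentions $word.
sub spider_search {
    my ($start, $word, $max_pages) = @_;
    my $ua = LWP::UserAgent->new(timeout => 10, agent => 'site-search-sketch/0.1');
    my @queue = ($start);
    my (%visited, @hits);
    while (@queue && keys(%visited) < $max_pages) {
        my $url = shift @queue;
        next if $visited{$url}++;
        my $res = $ua->get($url);
        next unless $res->is_success && $res->content_type eq 'text/html';
        my $html = $res->decoded_content;
        next unless defined $html;
        push @hits, $url if $html =~ /\b\Q$word\E\b/i;
        push @queue, grep { !$visited{$_} } same_host_links($html, $res->base);
    }
    return @hits;
}

# Usage: perl spider.pl http://www.whatever.com/ perl
if (@ARGV == 2) {
    print "$_\n" for spider_search(@ARGV, 50);
}
```

    Because every search re-fetches the site, this is slow and not very polite to the remote server, which is the reason for caching the pages locally.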

    Of course, Google and AltaVista have an option to search within a specific domain, so if the pages are indexed there you could just use them :)

    Update: BTW, to use the domain searching in AV and Google, go to their Advanced Search pages.
                    - Ant

      You don't have to go to the advanced search page. AltaVista supports (amongst others) these nice little shortcuts:

      +host:domain.com - only search for results on this domain
      +link:domain.com - only search for results that link to domain.com

      The latter is useful when you want to see who's linked to your site :)

      Just include them along with your search term to restrict the results. This means you can easily create a search of the site by auto-populating a search box with the +host:domain.com string and letting users enter their term. Or use JavaScript to hide the +host: string and present an empty search box.
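      The same trick can be done server-side: take the user's term, prepend the +host: shortcut, and build the link to the search engine yourself. A small sketch, with the caveat that the endpoint URL and the "q" parameter name below are placeholders (check the engine's real search form for the actual ones), and the escaping only handles plain byte strings:

```perl
use strict;
use warnings;

# Combine the +host: shortcut with the user's search term.
sub host_query {
    my ($domain, $term) = @_;
    return "+host:$domain $term";
}

# Percent-encode everything except unreserved URL characters.
sub url_escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build the full search link. $endpoint and the "q" parameter are hypothetical.
sub search_url {
    my ($endpoint, $domain, $term) = @_;
    return $endpoint . '?q=' . url_escape(host_query($domain, $term));
}

print search_url('http://search.example/query', 'www.whatever.com', 'perl'), "\n";
# prints http://search.example/query?q=%2Bhost%3Awww.whatever.com%20perl
```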

      cLive ;-)